See How Easily you can Solve "Log File Sync" Wait Event ?

Log File Sync - I believe every Database Administrator has seen this wait event during his work. Few can solve log file sync wait Event easily but for other this could be a pain. Before going into how to solve log file sync wait event. Let's understand.


What is Log File Sync Wait Event ?


Whenever user issue a commit statement, LGWR start writing data from log buffer cache to redo log files. Starting from writing to redo log upto get back confirmation from LGWR that data is written in redo log file user session will wait into log file sync wait event. This is basically time when your data is getting written from log buffer to redo log files.

Whenever you see log file sync, I would suggest to take following and analyze.


1. OS Watcher: OS Watcher will give you a fair idea about over all load on the system. Check do you really has high load on system which can cause delay in system I/O process and finally high log file sync wait event. How to Analyze or Read OS Watcher Output in three easy steps ?


2. AWR Report: Next, collect at least two AWR Reports one for good time and another at time of high log file sync wait event. 30 minutes is the good time interval for AWR report, report has longer time interval doesn't help much to figure out root cause of the issue. Database administrator has to compare and Analyze these AWR repots.


3. Alert Log File: Alert log file of database shows how frequently redo logs are switching, recommended time for redo log switching is 15 to 20 minutes if database administrator find that redo log switching is happening more frequently. He should consider resizing redo log files.


How to Solve Log File Sync Wait Event ?


There could be so many reasons for Log file Sync wait event, Here I would explain most obvious reasons from them.

1. Slow Write on Hard Disk: One of the reason for Log file sync wait event could be slow write in to hard disk by LGWR process. To verify this DBA has to compare AVG wait of "log file sync" and "log file parallel write" wait event. Since log file parallel write is a part of log file sync wait event. So DBA has to check if Proportion of log file parallel write is high in log file sync if yes, this issue is due to disk I/O issue. In below AWR report snapshot


Total time wait for Log file sync is 868,667 and out of it 93,144 times it was waiting for log file parallel write wait event which is quite high. One more important criteria to major I/O is If an average time for 'log file parallel write' is over 20 milliseconds this means problem with IO subsystem.

Since you have confirmed the issue is with System I/O. DBA can take following steps to solve this.

1. Check if you are using Hard disk having redo log files for some other files, if yes move redo log files to a separate Hard disk which should not have any frequently accessed data.
2. Do not put redo logs on RAID 5.
3. Ensure that the log_buffer is not too big. A very large log_buffer can have an adverse affect  as waits will be longer when flushes occur. When the buffer fills up,it has to write all the data into the redo log file and the LGWR will wait until the last I/O is completed.
4. Ask you network admin to look into and find out root cause of the issue.


2. Excessive Commits from Application: Sometimes Application running on the database also cause high log file wait events due to frequent issue of commit command. Every time user execute commit command LGWR write log buffer data into redo log files which initiate log file sync wait event. 

To identify a high commit rate, if the average wait time for 'log file sync' is much higher than the average wait time for 'log file parallel write', then this means that most of the time waiting is not due to waiting for the redo to be written and thus slow IO is not the cause of the problem. The surplus time is CPU activity and is most commonly contention caused by over committing.

DBA can also see, In the AWR or Statspack report, if  the  average user calls per commit or rollback  calculated as "user calls/(user commits+user rollbacks)"   is less than 30, then  commits are happening too frequently.
 

In the above example we see an average of 5.76 user calls  per commit which is considered high - about 5x higher that recommended. Rule of thumb, we should expect at least 25 user calls / commit

Recommendations for reducing high Application commits:

 1. If there are lots of short duration transactions, see if it is possible to group transactions together so there are fewer distinct COMMIT operations. Since it is mandatory for each commit to receive confirmation that the relevant REDO is on disk.
2. See if any activity can safely be done with NOLOGGING / UNRECOVERABLE options.
3. Talk to development team for making some modifications into application code to reduce high commits.

3. Redo Log file Size: If database redo log file size is not adequate, which cause high number log file switching. During log file switching process LGWR writes all log buffer data into redo log files and goes into log file sync wait event. To verify frequency of log file switching check alert log file which keep record of each log file switch. Recommended time for log file switch is 15 to 20 minutes. If you are seeing frequenct log file switch consider resizing redo log file size.


If you don't find any of the above issue matching your situation. Then there could be some known bugs into the database. I would suggest to open a Service Request with Oracle Support for further assistance.

1 comment:

  1. Nice article...Thanks a ton for such a valuable information

    ReplyDelete