A number of users were receiving a “request timeout” error message when using a SharePoint workspace:
After logging into the SharePoint web front-end, you could immediately feel the high CPU usage, as the Windows desktop felt sluggish and unresponsive. A glance at Task Manager confirmed the high CPU usage from a w3wp process.
The event log recorded slightly more detail about the timeouts, giving each one a generic “ApplicationException” label with a message of: ” This operation returned because the timeout period expired. (Exception from HRESULT: 0x800705B4).” Since neither of these items were particularly informative, I turned my attention to top of the stack trace from the error:
Stack trace: at System.Threading.ReaderWriterLock.AcquireReaderLockInternal(Int32 millisecondsTimeout)
at System.Threading.ReaderWriterLock.AcquireReaderLock(Int32 millisecondsTimeout)
The ReaderWriterLock line was interesting; a reader-writer lock allows multiple threads to read data concurrently, while any thread that needs to write must acquire an exclusive lock to write data. And when a writer is writing data, readers will be blocked until the writer is finished writing.
This error is often associated with a deadlock on the database server. My next step was to enable deadlock tracing on the SQL Server via the “DBCC TRACEON” query, and watch the error logs to see if any deadlocks were recorded. After a full day of normal usage, no deadlocks were found. I turned my attention back to the web server…
I took a dump of the W3WP process that was chewing up all the CPU cycles with the “adplus” command, and opened it in Windbg with the sosex.dll loaded. I then ran the !rwlock command to list all the ReaderWriterLock instances:
Address ReaderCount WaitingReaderCount WriterThread WaitingWriterCount
1437fcb0 0 7 0x2e 1
The command revealed that thread 46 (hex value 0x2e) owned the WriterThread, which meant no other thread could acquire a ReaderLock or WriterLock. You can also see that seven other Reader threads and one Writer thread were queued, waiting for this lock to clear.
Digging into thread 46 provided more detail about the other threads connected to it:
0:033> !rwlock 1437fcb0
I decided to have a deeper look at thread 46, and issued the “kb” command to print its call stack:
ChildEBP RetAddr Args to Child
30aefe9c 7c822114 77e6bb08 000022e8 00000000 ntdll!KiFastSystemCallRet
30aefea0 77e6bb08 000022e8 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
30aeff10 77e6ba72 000022e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
30aeff24 36232fd0 000022e8 ffffffff 30aeff68 kernel32!WaitForSingleObject+0x12
30aeff34 36233707 ffffffff 6fde9fa0 00000000 STSWEL!VvirusCheckUploadStatus::addDocument+0xf1
30aeff68 35f9b408 00000000 00000000 30aeffb0 STSWEL!VvirusCheckUploadStatus::addDocument+0x828
30aeff78 781329bb 2c44cbe8 b4845c59 00000000 ONETUTIL!Vthread::kill+0x86
30aeffb0 78132a47 00000000 77e660b9 0663acb8 msvcr80!_endthreadex+0x3b
30aeffb8 77e660b9 0663acb8 00000000 00000000 msvcr80!_endthreadex+0xc7
30aeffec 00000000 781329e1 0663acb8 00000000 kernel32!BaseThreadStart+0x34
Note the presence of the “virusCheckUploadStatus::addDocument” method. By its name, I guessed that this might be related to the ForeFront Server Security functionality that scans documents on upload to SharePoint. I decided to have a look at thread 48 (hex value 0x30), which was listed as a WaitingReader thread:
48 Id: 2d4.1e90 Suspend: 1 Teb: 7ff0d000 Unfrozen
ChildEBP RetAddr Args to Child
312efe9c 7c822114 77e6bb08 00002254 00000000 ntdll!KiFastSystemCallRet
312efea0 77e6bb08 00002254 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
312eff10 77e6ba72 00002254 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
312eff24 36232fd0 00002254 ffffffff 312eff68 kernel32!WaitForSingleObject+0x12
312eff34 36233707 ffffffff 6e5e9fa0 00000000 STSWEL!VvirusCheckUploadStatus::addDocument+0xf1
312eff68 35f9b408 00000000 00000000 312effb0 STSWEL!VvirusCheckUploadStatus::addDocument+0x828
312eff78 781329bb 2c44d0b8 b5045c59 00000000 ONETUTIL!Vthread::kill+0x86
312effb0 78132a47 00000000 77e660b9 0663acb8 msvcr80!_endthreadex+0x3b
312effb8 77e660b9 0663acb8 00000000 00000000 msvcr80!_endthreadex+0xc7
312effec 00000000 781329e1 0663acb8 00000000 kernel32!BaseThreadStart+0x34
As you can see, it also had the “virusCheckUploadStatus::addDocument” method in its stack. The other waiting threads looked the same. Based on this data, I decided to disable upload scanning in SharePoint, and see if it impacted CPU usage:
It did! Over the course of the next few business days, CPU usage returned to normal levels and subsequent memory dumps of the w3wp process revealed no ReaderWriter locks. However, since scanning documents on upload is the primary reason we use ForeFront, we had to figure out what was causing the problem. We dug into the SecurityAdministrator console, and reviewed the “Report” section. There we found ForeFront having repeated problems with quarantine of documents containing the “MSWord/Dropper.B!Camelot” virus.
We disabled quarantine of files for the Realtime Scan Job, which is a Microsoft recommended best practice. This allowed us to re-enable upload document scanning while avoiding the high CPU problem.