Debugging high CPU usage on ForeFront Security for SharePoint

A number of users were receiving a “request timeout” error message when using a SharePoint workspace:

After logging into the SharePoint web front-end, you could immediately feel the high CPU usage, as the Windows desktop felt sluggish and unresponsive.  A glance at Task Manager confirmed the high CPU usage from a w3wp process.

The event log recorded slightly more detail about the timeouts, giving each one a generic “ApplicationException” label with a message of: ” This operation returned because the timeout period expired. (Exception from HRESULT: 0x800705B4).”  Since neither of these items were particularly informative, I turned my attention to top of the stack trace from the error:

Stack trace:    at System.Threading.ReaderWriterLock.AcquireReaderLockInternal(Int32 millisecondsTimeout)
at System.Threading.ReaderWriterLock.AcquireReaderLock(Int32 millisecondsTimeout)

The ReaderWriterLock line was interesting;  a reader-writer lock allows multiple threads to read data concurrently, while any thread that needs to write must acquire an exclusive lock to write data.  And when a writer is writing data, readers will be blocked until the writer is finished writing.

This error is often associated with a deadlock on the database server.  My next step was to enable deadlock tracing on the SQL Server via the “DBCC TRACEON” query, and watch the error logs to see if any deadlocks were recorded.  After a full day of normal usage, no deadlocks were found.  I turned my attention back to the web server…

I took a dump of the W3WP process that was chewing up all the CPU cycles with the “adplus” command, and opened it in Windbg with the sosex.dll loaded. I then ran the !rwlock command to list all the ReaderWriterLock instances:

0:022> !rwlock
Address    ReaderCount   WaitingReaderCount     WriterThread   WaitingWriterCount
-----------------------------------------------------------------------------------------
1437fcb0             0                    7            0x2e                     1

The command revealed that thread 46 (hex value 0x2e) owned the WriterThread, which meant no other thread could acquire a ReaderLock or WriterLock.  You can also see that seven other Reader threads and one Writer thread were queued, waiting for this lock to clear.

Digging into thread 46 provided more detail about the other threads connected to it:

0:033> !rwlock 1437fcb0
WriterThread:           0x2e
WriterLevel:            1
WaitingWriterCount:     1
WriterEvent:            1e40
WaitingWriterThreadIds: 0x5
ReaderCount:            0
CurrentReaderThreadIds: None
WaitingReaderCount:     7
ReaderEvent:            2cb8
WaitingReaderThreadIds: 0xa,0x28,0x29,0xe,0x2d,0x30,0x31

I decided to have a deeper look at thread 46, and issued the “kb” command to print its call stack:

0:046> kb
ChildEBP RetAddr  Args to Child
30aefe9c 7c822114 77e6bb08 000022e8 00000000 ntdll!KiFastSystemCallRet
30aefea0 77e6bb08 000022e8 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
30aeff10 77e6ba72 000022e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
30aeff24 36232fd0 000022e8 ffffffff 30aeff68 kernel32!WaitForSingleObject+0x12
30aeff34 36233707 ffffffff 6fde9fa0 00000000 STSWEL!VvirusCheckUploadStatus::addDocument+0xf1
30aeff68 35f9b408 00000000 00000000 30aeffb0 STSWEL!VvirusCheckUploadStatus::addDocument+0x828
30aeff78 781329bb 2c44cbe8 b4845c59 00000000 ONETUTIL!Vthread::kill+0x86
30aeffb0 78132a47 00000000 77e660b9 0663acb8 msvcr80!_endthreadex+0x3b
30aeffb8 77e660b9 0663acb8 00000000 00000000 msvcr80!_endthreadex+0xc7
30aeffec 00000000 781329e1 0663acb8 00000000 kernel32!BaseThreadStart+0x34

Note the presence of the “virusCheckUploadStatus::addDocument” method.  By its name, I guessed that this might be related to the ForeFront Server Security functionality that scans documents on upload to SharePoint. I decided to have a look at thread 48 (hex value 0x30), which was listed as a WaitingReader thread:

48  Id: 2d4.1e90 Suspend: 1 Teb: 7ff0d000 Unfrozen
ChildEBP RetAddr  Args to Child
312efe9c 7c822114 77e6bb08 00002254 00000000 ntdll!KiFastSystemCallRet
312efea0 77e6bb08 00002254 00000000 00000000 ntdll!ZwWaitForSingleObject+0xc
312eff10 77e6ba72 00002254 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac
312eff24 36232fd0 00002254 ffffffff 312eff68 kernel32!WaitForSingleObject+0x12
312eff34 36233707 ffffffff 6e5e9fa0 00000000 STSWEL!VvirusCheckUploadStatus::addDocument+0xf1
312eff68 35f9b408 00000000 00000000 312effb0 STSWEL!VvirusCheckUploadStatus::addDocument+0x828
312eff78 781329bb 2c44d0b8 b5045c59 00000000 ONETUTIL!Vthread::kill+0x86
312effb0 78132a47 00000000 77e660b9 0663acb8 msvcr80!_endthreadex+0x3b
312effb8 77e660b9 0663acb8 00000000 00000000 msvcr80!_endthreadex+0xc7
312effec 00000000 781329e1 0663acb8 00000000 kernel32!BaseThreadStart+0x34

As you can see, it also had the “virusCheckUploadStatus::addDocument” method in its stack.  The other waiting threads looked the same. Based on this data, I decided to disable upload scanning in SharePoint, and see if it impacted CPU usage:

It did! Over the course of the next few business days, CPU usage returned to normal levels and subsequent memory dumps of the w3wp process revealed no ReaderWriter locks.  However, since scanning documents on upload is the primary reason we use ForeFront, we had to figure out what was causing the problem.  We dug into the SecurityAdministrator console, and reviewed the “Report” section.  There we found ForeFront having repeated problems with quarantine of documents containing the “MSWord/Dropper.B!Camelot” virus.

We disabled quarantine of files for the Realtime Scan Job, which is a Microsoft recommended best practice.  This allowed us to re-enable upload document scanning while avoiding the high CPU problem.

WinDBG: "The call to LoadLibrary (exts) failed. Win32 error 0n2"

You have started WinDBG and immediately run into the “The call to LoadLibrary (exts) failed. Win32 error 0n2” error, like this:

This seemed to be an odd error, as it was the first time I had started the debugger. To verify what extensions were, in fact, loaded, I ran the .chain command and noticed that the DLL for ntsdexts was showing in a “(Not loaded)” status:

This told me that a needed extension was not loaded, and that might have been the result of my custom install of WinDBG. Custom installs are typically done on servers to conserve disk space and not load unnecessary files (such as the SDK and samples). When I re-ran the installer and stepped through the wizard, I noticed that I had not checked the Windows XP Extensions. I did not check this extension as I was installing the debugger on a Windows 2003 Server, not a desktop OS. However, if you look more closely at the description, you will see that the extension is geared to “Windows XP and Windows Server 2003.” This is an omission in the installer UI that can easily be missed:

I simply ensured I selected the Windows XP extention for install, and completed running the installer. Once the installer completed, I restarted WinDBG, and all extentions loaded without a problem:

Hopefully this might fix you up if you run into a similar situation.

Take a minidump without breaking a process

Lets say you want to take a dump of an IIS application pools. Login to your server, open a command prompt, and run the following command (where the -p parameter is the PID of the application pool)

ntsd -pvr -p 1640 -c “.dump /ma /u w3wp.dmp; q”

This will popup another DOS box with a variety of ntsd debugger information flashing by. It will look like this:

The window will eventually close, and you will be left with a dump file in the root of your C: drive, like this:

You now have access to a minidump to open with WinDBG.

Don't debug an unmanaged application

Since most of the world is .NET these days, its my tendency to think I can find everything using SOS. Recently, I was attempting to debug a Windows service (Office SharePoint Server Search), and mentally thought “Well, its SharePoint, which is nothing more than a large .NET application, do its .NET as well.” After opening Windbg, and trying to load SOS, I got the following error:

“Failed to find runtime DLL (mscorwks.dll), 0x80004005
Extension commands need mscorwks.dll in order to have something to do.”

This perplexed me until I realized I was trying to debug an application that is not managed. To prove this, I ran the “lmvm mscorwks” command to see if mscorwks.dll was loaded. It wasn’t.

Debugging a different version of .NET runtime

Copy the mscorwks.dll from the machine that has a different .NET version (than the version running on your debugging desktop) to the C:temp directory. Rename the file with the name “mscordacwks” and the version of .NET framework it came from, like this: mscordacwks_2.0.50727.1433.63.dll

In WinDBG, issue the following command: cdb -z c:tempmscordacwks_2.0.50727.1433.63.dll