Friday, November 18, 2011

Watson dump analysis with SOS

This article explains how to analyze a Watson dump with WinDbg.

Why Watson?

Unlike live debugging, postmortem debugging often takes more time, and it sometimes fails if the dump does not contain the right data.
Even worse, a Watson dump can take longer to analyze than a full dump because its content is generally limited.
However, the value of a Watson dump is that it often captures the real crash the customer encountered, which we normally can never reproduce in the lab.

Setting up debugger (WinDbg)

1) The first thing to do is set up the debugger, which in this article is WinDbg.
Go to the Microsoft Download Center and install the latest Debugging Tools for Windows.

2) Next, set the symbol server path. There are several ways to do this. I prefer to set the symbol path in the WinDbg base workspace so that all other workspaces inherit it.
To do that, run windbg.exe (with no arguments) from an elevated command prompt, click File->Symbol File Path, and enter the symbol server path.

3) Click File->Save Workspace.

Opening Watson Dump

To open a Watson dump with WinDbg, I typically use the -z command-line option.

C> windbg -z memory.hdmp

Once it is open, WinDbg prints a summary of the dump. In this case it basically says that the dump contains limited information,
that the server where the dump was generated is a 4-CPU Windows 2008 x86 machine, and that the cause of the crash was a stack buffer overflow.


Using !analyze -v

Using !analyze -v is usually a good start. This command was developed by MS Research.
I think it generally provides valuable information about where to start.

Call stack investigation

Management Tools make heavy use of C# managed code. To look at the managed call stack, we have to use the SOS debugger extension.
SOS ships with the .NET Framework, so if .NET is installed you do not have to install anything else.
(Another favorite debugger extension of mine is SOSEX, but let's focus on SOS here.)
To load the SOS debugger extension:

0:00> .loadby SOS mscorwks (CLR 2.0)
0:00> .loadby SOS clr (CLR 4.0)

I prefer to reduce typing, so I copied sos.dll from the .NET Framework folder (under Windows) to C:\Debuggers so that I can simply type:

0:00> .load sos


Since we don't know where the error originated, we generally investigate both the native and the managed call stack.
For the native call stack, my favorite command is kp (or kP), which automatically shows parameters and source lines.
Here is an example:

0:00> kp
….
098ef14c 7571898e user32!InternalCallWinProc(void)+0x23 [d:\longhorn\windows\core\ntuser\client\i386\callproc.asm @ 106]
098ef1c4 75718ab9 user32!UserCallWinProcCheckWow(struct _ACTIVATION_CONTEXT * pActCtx = 0x00000000, * pfn = 0x06fa095a, struct HWND__ * hwnd = 0x0003040e, unsigned int msg = 0x202, unsigned int wParam = 0, long lParam = 0n917517, void * pww = 0x00f038f8, int fEnableLiteHooks = 0n1)+0x109 [d:\longhorn\windows\core\ntuser\client\clmsg.c @ 163]
098ef228 75718b10 user32!DispatchMessageWorker(struct tagMSG * pmsg = 0x06fa095a, int fAnsi = 0n0)+0x380 [d:\longhorn\windows\core\ntuser\client\clmsg.c @ 2440]
098ef238 0689510e user32!DispatchMessageW(struct tagMSG * lpMsg = 0x098ef2c4)+0xf [d:\longhorn\windows\core\ntuser\client\cltxt.h @ 971]
098ef254 6e0b8d2e CLRStub[StubLinkStub]@2b0e2400689510e()
098ef308 6e0b8997 System_Windows_Forms_ni!System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop()+0x24e [f:\dd\ndp\fx\src\WinForms\Managed\System\WinForms\Application.cs @ 2106]
098ef360 6e0b87e1 System_Windows_Forms_ni!System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner()+0x177 [f:\dd\ndp\fx\src\WinForms\Managed\System\WinForms\Application.cs @ 3377]
098ef390 6e5cde2b System_Windows_Forms_ni!System.Windows.Forms.Application+ThreadContext.RunMessageLoop()+0x61 [f:\dd\ndp\fx\src\WinForms\Managed\System\WinForms\Application.cs @ 3261]
098ef3a8 6e6025ab System_Windows_Forms_ni!System.Windows.Forms.Application.RunDialog()+0x33 [f:\dd\ndp\fx\src\WinForms\Managed\System\WinForms\Application.cs @ 1488]
098ef434 6e6027c3 System_Windows_Forms_ni!System.Windows.Forms.Form.ShowDialog()+0x373 [f:\dd\ndp\fx\src\WinForms\Managed\System\WinForms\Form.cs @ 6120]
098ef468 6f16274a System_Windows_Forms_ni!System.Windows.Forms.Form.ShowDialog()+0x7 [f:\dd\ndp\fx\src\WinForms\Managed\System\WinForms\Form.cs @ 6020]
098ef468 70116e96 SqlMgmt_ni!Microsoft.SqlServer.Management.SqlMgmt.RunningFormsTable+RunningFormsTableImpl+ThreadStarter.StartThread()+0xb2 [e:\sql10_katmai_t\sql\mpu\ssms\shared\SqlMgmt\src\RunningFormsTable.cs @ 415]
098ef474 7012031f mscorlib_ni!System.Threading.ThreadHelper.ThreadStart_Context()+0x66 [f:\dd\ndp\clr\src\BCL\System\Threading\Thread.cs @ 61]
098ef488 70116e14 mscorlib_ni!System.Threading.ExecutionContext.Run()+0x6f [f:\dd\ndp\clr\src\BCL\System\Threading\ExecutionContext.cs @ 359]
….

To get the managed call stack once SOS is loaded, use !clrstack -a.

0:00> !clrstack -a

098ef39c 6e5cde2b System.Windows.Forms.Application.RunDialog(System.Windows.Forms.Form)
PARAMETERS:
form =
098ef3b0 6e6025ab System.Windows.Forms.Form.ShowDialog(System.Windows.Forms.IWin32Window)
PARAMETERS:
this = 0x032cf95c
owner =
LOCALS:

0x098ef410 = 0x00000000
0x098ef40c = 0x00000000
0x098ef3dc = 0x00000000
098ef43c 6e6027c3 System.Windows.Forms.Form.ShowDialog()
PARAMETERS:
this =
098ef440 6f16274a Microsoft.SqlServer.Management.SqlMgmt.RunningFormsTable+RunningFormsTableImpl+ThreadStarter.StartThread()
PARAMETERS:
this = 0x032ce1a0
LOCALS:
<CLR reg> = 0x032cf95c
0x098ef444 = 0x00000001


098ef470 70116e96 System.Threading.ThreadHelper.ThreadStart_Context(System.Object)
PARAMETERS:
state =
LOCALS:



Sometimes we need to dig into local variables or parameters by using commands such as !dumpobj, !dumpvc, and !dumparray.
For example, we can investigate 0x032cf95c by using !dumpobj (!do for short).
The output shows that MaintenancePlanWizardForm was launched from SSMS.

0:009> !do 0x032cf95c
Name: Microsoft.SqlServer.Management.MaintenancePlanWizard.MaintenancePlanWizardForm
MethodTable: 66438a04
EEClass: 6639483c
Size: 568(0x238) bytes
(E:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\Microsoft.SqlServer.Management.MaintenancePlanWizard.dll)



098ee844 6e09f435 System.Windows.Forms.ButtonBase.OnPaint(System.Windows.Forms.PaintEventArgs)
PARAMETERS:
this = 0x033bbb8c
pevent = 0x035ef130

0:009> !do 0x033bbb8c
Name: System.Windows.Forms.RadioButton
MethodTable: 6e0f6478
EEClass: 6debc0d0
Size: 168(0xa8) bytes
(C:\Windows\assembly\GAC_MSIL\System.Windows.Forms\2.0.0.0__b77a5c561934e089\System.Windows.Forms.dll)
Fields:
MT Field Offset Type VT Attr Value Name
70170b54 4001130 20 System.String 0 instance 033bc774 text


0:009> !do 033bc774
Name: System.String
MethodTable: 70170b54
EEClass: 6ff2d65c
Size: 138(0x8a) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: All &user databases (excluding master, model, msdb, tempdb)
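When many objects have to be inspected, it can help to save the !do output to a log and pull out the header fields with a script. The following is only a minimal Python sketch, assuming the output has the "Key: value" shape shown above; parse_dumpobj is a hypothetical helper name, not part of SOS or WinDbg.

```python
import re

def parse_dumpobj(text):
    """Collect the 'Key: value' header lines of !dumpobj output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Only lines such as "Name: ..." or "Size: ..." match; the module
        # path in parentheses and the field table rows are skipped.
        match = re.match(r"^(\w+):\s*(.*)$", line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

# Sample text copied from the !do output above.
sample = r"""Name: System.String
MethodTable: 70170b54
EEClass: 6ff2d65c
Size: 138(0x8a) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: All &user databases (excluding master, model, msdb, tempdb)"""

info = parse_dumpobj(sample)
print(info["Name"], "->", info["String"])
```

The sketch keeps every value as a string, so addresses such as the MethodTable can be pasted straight back into the debugger.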

For help on SOS, you can simply type !sos.help in WinDbg or visit http://msdn.microsoft.com/en-us/library/bb190764.aspx.


Wednesday, October 12, 2011

How to call other function manually in Debugger

Is it possible to call another function manually in the middle of a debugging session? If we set up the stack and jump to the function properly, it is doable. Here I take the example of calling a Win32 function while the process is stopped at a debug break. We are going to call the RtlAdjustPrivilege() function, which is in NTDLL.DLL, from WinDbg. This example uses x86 debugging. (64-bit debugging is different; more information can be found in the documentation of the 64-bit calling convention.)
In this example, I launched mspaint.exe from WinDbg.

C> windbg mspaint.exe

In the middle of the debugging session, let's say I want to adjust one of the privileges of the mspaint process.
To check the current privilege status, the !token command can be used.

0:008> !token -n
Thread is not impersonating. Using process token...
TS Session ID: 0x1
Privs: 
 00 0x000000005 SeIncreaseQuotaPrivilege          Attributes - 
 ......
 11 0x000000013 SeShutdownPrivilege               Attributes - 
 12 0x000000014 SeDebugPrivilege                  Attributes - Enabled 
 13 0x000000016 SeSystemEnvironmentPrivilege      Attributes - 
 14 0x000000017 SeChangeNotifyPrivilege           Attributes - Enabled Default 
 15 0x000000018 SeRemoteShutdownPrivilege         Attributes - 
 16 0x000000019 SeUndockPrivilege                 Attributes - 
 17 0x00000001c SeManageVolumePrivilege           Attributes - 
 18 0x00000001d SeImpersonatePrivilege            Attributes - Enabled Default 
 19 0x00000001e SeCreateGlobalPrivilege           Attributes - Enabled Default 
Auth ID: 0:4d37b
Impersonation Level: Anonymous
TokenType: Primary
Is restricted token: no.

Here, let's try to enable SeUndockPrivilege (0x19). To check the current registers, use the r command. The process is stopped at ntdll!DbgBreakPoint at this point.

0:008> r
eax=7ffd6000 ebx=00000000 ecx=00000000 edx=77add23d esi=00000000 edi=00000000
eip=77a73540 esp=026ef9ec ebp=026efa18 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
ntdll!DbgBreakPoint:
77a73540 cc              int     3

Now in order to call RtlAdjustPrivilege() function in NTDLL.DLL, we first manipulate thread stack manually.

 NTSTATUS RtlAdjustPrivilege
 (
  ULONG    Privilege,
  BOOLEAN  Enable,
  BOOLEAN  CurrentThread,
  PBOOLEAN Enabled
 )

RtlAdjustPrivilege() takes 4 parameters and uses the stdcall calling convention on x86, so the 4 values must be placed on the stack from the 4th parameter down to the 1st. ESP points to the current top of the stack. The 4th parameter goes at ESP-8; because it is an output pointer, it must point to another memory location, and here it points to ESP-4. The 3rd parameter goes at ESP-C, the 2nd at ESP-10, and the 1st at ESP-14. Once the 4 parameters are in place, the return address must be added below them. Since we want to return to the current point, the example sets ESP-18 to . (which means the current instruction address).

0:008> ed esp-4 0
0:008> ed esp-8 @esp-4
0:008> ed esp-c 0
0:008> ed esp-10 1
0:008> ed esp-14 19
0:008> ed esp-18 .
0:008> r esp=@esp-18
0:008> r $ip=ntdll!RtlAdjustPrivilege
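The offsets used in the ed commands above are easy to get wrong, so it can be worth checking them on paper first. The following is a small Python sketch, not a debugger script, that models the same layout for a 32-bit stdcall call; build_stdcall_frame is a hypothetical helper, and the ESP and return address values are the ones from the register dump above.

```python
def build_stdcall_frame(esp, args, return_address, scratch_slots=1):
    """Return ({address: value}, new_esp) for a hand-made 32-bit stdcall call.

    scratch_slots dwords directly below esp are reserved for output data.
    The arguments follow, last argument first, so that the first argument
    ends up just above the return address.
    """
    writes = {}
    addr = esp
    for _ in range(scratch_slots):
        addr -= 4
        writes[addr] = 0            # storage for the output BOOLEAN
    for value in reversed(args):    # 4th parameter is written first
        addr -= 4
        writes[addr] = value
    addr -= 4
    writes[addr] = return_address   # where the function returns to
    return writes, addr

esp = 0x026EF9EC                    # ESP from the r output above
args = [0x19, 1, 0, esp - 4]        # Privilege, Enable, CurrentThread, Enabled
writes, new_esp = build_stdcall_frame(esp, args, 0x77A73540)

# Print the equivalent ed commands with absolute addresses.
for address in sorted(writes, reverse=True):
    print(f"ed {address:08x} {writes[address]:x}")
print(f"r esp={new_esp:08x}")
```

On return, a stdcall function pops the return address and its own arguments, so ESP ends up at new_esp + 4 + 4 * len(args), which is 4 bytes below the original ESP because the scratch slot is not popped.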

The two r commands above set the stack pointer (esp) to ESP-18 and the instruction pointer to the ntdll!RtlAdjustPrivilege function, so all registers are ready to go. Since the current instruction pointer now points to ntdll!RtlAdjustPrivilege, the 'uf .' command makes WinDbg show the whole RtlAdjustPrivilege function in assembly language.

0:008> uf .
ntdll!RtlAdjustPrivilege+0x9d:
77a45414 8b45e8          mov     eax,dword ptr [ebp-18h]
77a45417 d1e8            shr     eax,1
77a45419 2401            and     al,1
77a4541b e9b45a0000      jmp     ntdll!RtlAdjustPrivilege+0xa4 (77a4aed4)
.....

Now, to execute the RtlAdjustPrivilege function, run the 'gu' command. The gu command runs until the current function returns.

0:008> gu
(16a0.ec0): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=00000000 ecx=5ef2406b edx=77a864f4 esi=00000000 edi=00000000
eip=77a73540 esp=026ef9e8 ebp=026efa18 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
ntdll!DbgBreakPoint:
77a73540 cc              int     3

0:008> r esp=@esp+4
0:008> r
eax=00000000 ebx=00000000 ecx=5ef2406b edx=77a864f4 esi=00000000 edi=00000000
eip=77a73540 esp=026ef9ec ebp=026efa18 iopl=0         nv up ei pl zr na pe nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
ntdll!DbgBreakPoint:
77a73540 cc              int     3

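Once the function has run successfully, we can check the new privilege list by using the !token command again. But how do we know the run was successful? The return value (an NTSTATUS for this function) is in the EAX register. Since it is 0, the function call succeeded. Note that ESP came back as 026ef9e8, 4 bytes below its original value of 026ef9ec, because the function pops the return address and the 4 parameters but not the extra slot at ESP-4 that held the output value; this is why r esp=@esp+4 is run above to restore the stack pointer.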
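Zero is not the only successful NTSTATUS value: the NT_SUCCESS macro treats the status as a signed 32-bit integer and accepts any non-negative value. A minimal Python sketch of that check, with nt_success as a hypothetical helper name, could look like this:

```python
def nt_success(status):
    """Mirror the NT_SUCCESS macro for a raw 32-bit EAX value.

    NTSTATUS is a signed 32-bit integer; any value with the top bit
    clear (success or informational) counts as success.
    """
    status &= 0xFFFFFFFF
    signed = status - 0x100000000 if status & 0x80000000 else status
    return signed >= 0

print(nt_success(0x00000000))   # STATUS_SUCCESS, as seen in EAX above
print(nt_success(0xC0000061))   # STATUS_PRIVILEGE_NOT_HELD
```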

0:008> !token -n
Thread is not impersonating. Using process token...
TS Session ID: 0x1
......
Privs: 
 00 0x000000005 SeIncreaseQuotaPrivilege          Attributes - 
 .......
 12 0x000000014 SeDebugPrivilege                  Attributes - Enabled 
 13 0x000000016 SeSystemEnvironmentPrivilege      Attributes - 
 14 0x000000017 SeChangeNotifyPrivilege           Attributes - Enabled Default 
 15 0x000000018 SeRemoteShutdownPrivilege         Attributes - 
 16 0x000000019 SeUndockPrivilege                 Attributes - Enabled 
 17 0x00000001c SeManageVolumePrivilege           Attributes - 
 18 0x00000001d SeImpersonatePrivilege            Attributes - Enabled Default 
 19 0x00000001e SeCreateGlobalPrivilege           Attributes - Enabled Default 

As you can see, SeUndockPrivilege is now enabled...
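With a long privilege list it is easy to miss what changed between two !token runs. The following Python sketch compares two saved outputs; it assumes the line layout shown above, and enabled_privileges is a hypothetical helper, not a debugger feature.

```python
import re

# One privilege row: index, LUID value, privilege name, attribute list.
PRIV_LINE = re.compile(
    r"^\s*[0-9a-fA-F]+\s+0x[0-9a-fA-F]+\s+(\S+)\s+Attributes -\s*(.*)$"
)

def enabled_privileges(token_output):
    """Return the set of privilege names whose attributes include Enabled."""
    enabled = set()
    for line in token_output.splitlines():
        match = PRIV_LINE.match(line)
        if match and "Enabled" in match.group(2).split():
            enabled.add(match.group(1))
    return enabled

# Shortened excerpts of the two !token -n outputs above.
before = """
 15 0x000000018 SeRemoteShutdownPrivilege         Attributes -
 16 0x000000019 SeUndockPrivilege                 Attributes -
 18 0x00000001d SeImpersonatePrivilege            Attributes - Enabled Default
"""
after = """
 15 0x000000018 SeRemoteShutdownPrivilege         Attributes -
 16 0x000000019 SeUndockPrivilege                 Attributes - Enabled
 18 0x00000001d SeImpersonatePrivilege            Attributes - Enabled Default
"""

print(sorted(enabled_privileges(after) - enabled_privileges(before)))
```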

NOTE: Setting SeUndockPrivilege might not be useful in the real world, but enabling other privileges such as the TCB privilege or the DEBUG privilege from the debugger can be useful in some situations.

Thursday, September 22, 2011

ba (Break on Access) in WinDbg

There was an Access Violation case during a test run. It turned out that one private member, which is supposed to be accessed only from the application init and exit routines, was overwritten by an unknown thread or function.

The break on memory access command, ba, is very useful for this kind of case. ba allows us to see who is touching the variable. To simplify the case and demonstrate it easily, I will use notepad.exe in WinDbg.

1) Run notepad.exe from WinDBG

   C> Windbg  notepad.exe

2) Type "g" to run notepad.exe until its UI appears
3) Pause the notepad process in WinDbg
4) In WinDbg, list the global variables of the notepad process
0:000> x notepad!g_*
00000000`ff6f0050 notepad!g_fFontSettingChanged = 
00000000`ff6f0b28 notepad!g_fUISettingChanged = 
00000000`ff6f0200 notepad!g_PageSetupDlg = 
00000000`ff6f1e70 notepad!g_lExisting = 
00000000`ff6f2720 notepad!g_ftSaveAs = 
00000000`ff6f0b2c notepad!g_fPageSetupChanges = 
00000000`ff6f0100 notepad!g_wpOrig = 
00000000`ff6f0088 notepad!g_ftOpenedAs = 

Here let's pick a variable (notepad!g_ftOpenedAs).

5) Set breakpoint with ba command.
0:001> ba w4 notepad!g_ftOpenedAs ".echo ******; ~. ; kp; g;"

The ba command breaks into the debugger when the specified memory is accessed by any thread or function. w4 means the debugger breaks when anything writes to the 4 bytes starting at the specified address. The next parameter, notepad!g_ftOpenedAs, is the memory location to watch. The last parameter is the command string that runs when the break occurs. If no command is specified, the debugger breaks and waits at the prompt. The command string here (1) prints six asterisks (.echo), (2) shows the current thread (~.), (3) shows the current thread's call stack (kp), and (4) resumes execution instead of staying in the debugger (g).



6) Run notepad process again ("g")

7) In notepad UI, click File->Open. In Open File Dialog, click Cancel.
This makes a thread write a value to the global variable.

8) Now you can see who is touching the notepad!g_ftOpenedAs variable by looking at the call stack output.

******
.  0  Id: 1290.dbc Suspend: 1 Teb: 000007ff`fffdd000 Unfrozen
      Start: notepad!WinMainCRTStartup (00000000`ff6e3570) 
      Priority: 0  Priority class: 32  Affinity: f
Child-SP          RetAddr           Call Site
00000000`0019f7b0 00000000`ff6e14eb notepad!NPCommand+0x3fa
00000000`0019f8e0 00000000`77b8c3c1 notepad!NPWndProc+0x540
00000000`0019f920 00000000`77b8c60a USER32!UserCallWinProcCheckWow+0x1ad
00000000`0019f9e0 00000000`ff6e10bc USER32!DispatchMessageWorker+0x3b5
00000000`0019fa60 00000000`ff6e133c notepad!WinMain+0x16f
00000000`0019fae0 00000000`77a6f56d notepad!DisplayNonGenuineDlgWorker+0x2da
00000000`0019fba0 00000000`77ca2cc1 kernel32!BaseThreadInitThunk+0xd
00000000`0019fbd0 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

If you're in live debug mode, you can simply break into the debugger to look into the suspicious thread.
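The ba syntax used in step 5 has a few constraints that are easy to trip over: the access type is one of e, r, w or i, the size is 1, 2 or 4 bytes (8 is also allowed on 64-bit targets), an execute breakpoint must use size 1, and the address must be aligned to the size. The following Python sketch checks these rules before building the command text; ba_command is a hypothetical helper and only produces a string to paste into WinDbg.

```python
def ba_command(access, size, address, target=None, action=None, x64=True):
    """Build a 'ba' command string after checking access, size and alignment.

    address is the numeric address used for the alignment check; target is
    the text to put in the command (for example a symbol name). If target
    is omitted, the address is written in hex.
    """
    if access not in ("e", "r", "w", "i"):
        raise ValueError("access must be e, r, w or i")
    allowed = (1, 2, 4, 8) if x64 else (1, 2, 4)
    if size not in allowed:
        raise ValueError(f"size must be one of {allowed}")
    if access == "e" and size != 1:
        raise ValueError("execute breakpoints must use size 1")
    if address % size:
        raise ValueError("address must be aligned to the breakpoint size")
    command = f"ba {access}{size} {target or format(address, 'x')}"
    if action:
        command += f' "{action}"'
    return command

# notepad!g_ftOpenedAs was listed at 00000000`ff6f0088 in step 4.
print(ba_command("w", 4, 0xFF6F0088,
                 target="notepad!g_ftOpenedAs",
                 action=".echo ******; ~. ; kp; g;"))
```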

Tuesday, September 20, 2011

How to use UMDH

The UMDH (User-Mode Dump Heap) tool in 'Debugging Tools for Windows' analyzes Windows heap memory and is useful for detecting native memory leaks.

Here is brief summary of how to use it.

(a) Run gflags.exe. Select [Image File], type the file name and check [Create user mode stack trace database].
     This enables data collection for the specific process.


     Alternatively the command line below can be run:
                   C> gflags -i notepad.exe +ust

(b) Set symbol path.
     C>  SET _NT_SYMBOL_PATH=c:\symbols

(c) Run UMDH.EXE
     Take the first snapshot of memory and run your application until the memory leak occurs.
     Take the second snapshot, then compare the two snapshots to get the difference.

    C>umdh -p:2868 > firstsnap.txt             <== take first snapshot
    C>umdh -p:2868 > secondsnap.txt        <== take 2nd snapshot
    C>umdh firstsnap.txt secondsnap.txt > diff.txt       <== create diff file
    C>notepad.exe diff.txt      <== check the result
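When snapshots have to be taken repeatedly, the same three umdh invocations can be driven from a script. The following is a minimal Python sketch, assuming umdh.exe is on the PATH; the helper names are hypothetical, and the arguments are exactly the ones used in step (c), with stdout redirected to a file from Python instead of by the shell.

```python
import subprocess

def snapshot_command(pid):
    """Arguments for taking a heap snapshot of the given process ID."""
    return ["umdh", f"-p:{pid}"]

def diff_command(first_snapshot, second_snapshot):
    """Arguments for comparing two snapshot files."""
    return ["umdh", first_snapshot, second_snapshot]

def run_to_file(command, out_path):
    """Run a command and write its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(command, stdout=out, check=True)

print(" ".join(snapshot_command(2868)))
print(" ".join(diff_command("firstsnap.txt", "secondsnap.txt")))
```

A typical sequence would call run_to_file(snapshot_command(pid), "firstsnap.txt"), exercise the application, take the second snapshot the same way, and finish with run_to_file(diff_command("firstsnap.txt", "secondsnap.txt"), "diff.txt").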

Monday, August 15, 2011

Windbg - Set break on DLL load

When an application crashes while a certain DLL is loaded, we normally see the call stack in WinDbg at the point of the second-chance exception. If the application catches the exception internally, things can get difficult to track down. In this case, we probably want to break on DLL load to see which code is accessing the DLL's functions.

Here are simple steps:

1) Run the application under Windbg

    C> Windbg MyApp.exe

2) In WinDbg, run the commands below. The sxe command makes the debugger break when the given event occurs; here ld:suspect.dll selects the load of suspect.dll.

    0:000> sxe ld:suspect.dll
    0:000> g

3) When the application loads the DLL, the application stops and you can check the call stack as below:

   0:004> kp

The result of the k command (the call stack) can provide some clues about what's going on.

Monday, January 17, 2011

How to set Microsoft Symbol Server

When it comes to debugging, symbol files are essential. Whenever you build an application in VS, the build system normally generates symbol files (.pdb files) alongside the binaries, regardless of whether the build configuration is Release or Debug. So you have symbol files for your own application. But how can you get symbol files for the OS? For Windows, Microsoft provides a public symbol server.

To set the symbol server in your debugger, simply use the following symbol server path:
http://msdl.microsoft.com/download/symbols
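In practice the URL is combined with a local cache directory in the SRV*cache*server form, so that downloaded symbols are reused. The following Python sketch composes such a path; symbol_server_path is a hypothetical helper and C:\Symbols is just an example cache folder.

```python
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"

def symbol_server_path(cache_dir, server_url=MS_SYMBOL_SERVER):
    """Compose a symbol path that caches downloads in cache_dir."""
    return f"SRV*{cache_dir}*{server_url}"

# Text that can be pasted into the WinDbg command window.
print(".sympath " + symbol_server_path(r"C:\Symbols"))
```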



For example in WinDbg,

(1) You can set it in command mode:

     .sympath SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
 
(2) Or set the symbol path in File menu. Select File-> Symbol File Path... and type the path as follows: