64-bit calling convention: what it means for debugging
While 32-bit (x86) has multiple calling conventions such as cdecl, stdcall, fastcall, and thiscall, 64-bit (x64) on Windows has a single calling convention with its own characteristics. Some important characteristics are:
- The 64-bit calling convention passes the first 4 parameters in 4 registers (RCX, RDX, R8, R9) and any additional parameters on the stack (similar to the fastcall calling convention). Even if there are fewer than 4 parameters, stack space for 4 parameters is always reserved (this area is called the home space or home area). (Note: Fastcall calling conventions pass one or more parameters in registers to make function calls faster. The x86 fastcall calling convention passes the first 2 parameters in the ECX and EDX registers.)
- The stack is kept 16-byte aligned to aid performance. This means that if there are 5 parameters, 48 bytes are reserved for parameters (5 params x 8 bytes + 8 bytes for alignment).
- The stack pointer (rsp) typically does not change within a given function. The stack size a function needs is pre-calculated, so the stack pointer does not change once the prolog is done.
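The reserved-size arithmetic in the list above can be sketched as a small C helper. This is only an illustration of the simplified rule described here (at least 4 slots of 8 bytes, rounded up to a multiple of 16). The name param_area_bytes is hypothetical, and a real compiler sizes the whole frame, not just the parameter area.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes reserved for outgoing parameters,
   following the simplified rule described in the text. */
size_t param_area_bytes(size_t param_count)
{
    const size_t slot_bytes = 8;   /* every parameter slot is 8 bytes on x64 */
    const size_t home_slots = 4;   /* home space for RCX, RDX, R8, R9 */

    /* Home space is reserved even when fewer than 4 parameters exist. */
    size_t slots = param_count < home_slots ? home_slots : param_count;
    size_t bytes = slots * slot_bytes;

    assert(bytes / slot_bytes == slots);   /* guard against overflow */

    /* Round up to the next multiple of 16 for stack alignment. */
    return (bytes + 15) & ~(size_t)15;
}
```

With this sketch, 5 parameters give 48 bytes (40 bytes of slots plus 8 bytes of padding), and anything from 0 to 4 parameters gives the 32 bytes of home space.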
Let's look at a small sample.
#include <stdio.h>
#include <tchar.h>

int Calc(int a, int b, int c, int d, int e)
{                                           // <= breakpoint 1
    int result = 0;                         // <= breakpoint 2
    for (int i = 0; i < 10; i++)
    {
        result += a*i + b - c + d * 2 + e;
        printf("%d : %d\n", i, result);
    }
    result += a - b + c - d + e;
    return result;
}

int _tmain(int argc, _TCHAR* argv[])
{
    int s1, s2, s3, s4, s5;
    scanf("%d %d %d %d %d", &s1, &s2, &s3, &s4, &s5);
    int result = Calc(s1, s2, s3, s4, s5);  // <= breakpoint 0
    printf("Result = %d", result);
    return 0;
}

I set 3 breakpoints as marked above.
0:000> bl
 0 e 00000001`3f5e10ed 0001 (0001) 0:**** Simple!wmain+0x3d
 1 e 00000001`3f5e1000 0001 (0001) 0:**** Simple!Calc
 2 e 00000001`3f5e1016 0001 (0001) 0:**** Simple!Calc+0x16

Right before calling the function at breakpoint 0, we can inspect the assembly code to see how the parameters are passed. It passes the first 4 parameters (I entered 1, 2, 3, 4, 5 for scanf()) in the ECX, EDX, R8D, and R9D registers. (Since the parameters are 32-bit ints, the 32-bit ECX register is used instead of RCX.) The 5th parameter is passed on the stack (rsp+20h).
0:000> u .
Simple!wmain+0x3d [c:\temp\simple\simple.cpp @ 18]:
00000001`3fd910ed 8b442434     mov eax,dword ptr [rsp+34h]
00000001`3fd910f1 89442420     mov dword ptr [rsp+20h],eax   //5th param: 5
00000001`3fd910f5 448b4c2440   mov r9d,dword ptr [rsp+40h]   // 4
00000001`3fd910fa 448b442430   mov r8d,dword ptr [rsp+30h]   // 3
00000001`3fd910ff 8b542438     mov edx,dword ptr [rsp+38h]   // 2
00000001`3fd91103 8b4c243c     mov ecx,dword ptr [rsp+3Ch]   //1st param: 1
00000001`3fd91107 e8f4feffff   call Simple!Calc (00000001`3fd91000)

Now let's continue to breakpoint 1 at the beginning of the Calc() function. This is the point where we can check the function's prolog assembly code. For a non-optimized build, you can see here that the parameter registers are copied to the stack home area.
0:000> uf .
Simple!Calc [c:\temp\simple\simple.cpp @ 4]:
 4 00000001`3f5d1000 44894c2420 mov dword ptr [rsp+20h],r9d
 4 00000001`3f5d1005 4489442418 mov dword ptr [rsp+18h],r8d
 4 00000001`3f5d100a 89542410   mov dword ptr [rsp+10h],edx
 4 00000001`3f5d100e 894c2408   mov dword ptr [rsp+8],ecx

Once the function prolog has executed, that is, when we move to breakpoint 2, the stack holds all 5 parameters correctly, and thus the kP call stack command and the dv command display correct parameter values. Below we can see the 5 parameters at stack addresses 00000000`0026feb0 ~ 00000000`0026fed0. Stack slot 00000000`0026fed8 holds a garbage value; it exists only for 16-byte alignment.
0:000> p
Breakpoint 2 hit
Simple!Calc+0x16:
00000001`3f5d1016 c744242000000000 mov dword ptr [rsp+20h],0
0:000> dq /c 1 @rsp
00000000`0026fe70 00000000`00000000
00000000`0026fe78 00000000`5fca10b1
00000000`0026fe80 00000000`00000001
00000000`0026fe88 00000000`00000000
00000000`0026fe90 00000000`00000000
00000000`0026fe98 00000001`3f5d11ac
00000000`0026fea0 00000001`3f5d2150
00000000`0026fea8 00000001`3f5d110c //return address
00000000`0026feb0 00000001`00000001 //param 1
00000000`0026feb8 00000000`00000002
00000000`0026fec0 00000000`00000003
00000000`0026fec8 00000000`00000004
00000000`0026fed0 00000000`00000005 //param 5
00000000`0026fed8 00000000`0026fee4 //for alignment
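The layout in the dump above follows a simple rule: the parameter slots sit directly above the return address, 8 bytes apart. A minimal sketch of that rule is shown here. The function name param_slot_address is hypothetical and the addresses are just the ones from this particular run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: address of the n-th parameter slot (1-based),
   given the stack address that holds the return address.
   Slots 1..4 are the home space; slot 5 and up are stack-passed parameters. */
uint64_t param_slot_address(uint64_t return_addr_slot, unsigned n)
{
    assert(n >= 1);                       /* parameters are numbered from 1 */
    return return_addr_slot + 8u * (uint64_t)n;
}
```

In the dump, the return address is stored at 00000000`0026fea8, so param_slot_address(0x26fea8, 1) gives 0x26feb0 (param 1) and param_slot_address(0x26fea8, 5) gives 0x26fed0 (param 5).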
And here is what I got when running the kP and dv commands.

0:000> kP
Child-SP          RetAddr           Call Site
00000000`0026fe70 00000001`3f5d110c Simple!Calc(
            int a = 0n1,
            int b = 0n2,
            int c = 0n3,
            int d = 0n4,
            int e = 0n5)+0x16 [c:\temp\simple\simple.cpp @ 5]
0:000> dv /i /V
prv param  00000000`0026feb0 @rsp+0x0040 a = 0n1
prv param  00000000`0026feb8 @rsp+0x0048 b = 0n2
prv param  00000000`0026fec0 @rsp+0x0050 c = 0n3
prv param  00000000`0026fec8 @rsp+0x0058 d = 0n4
prv param  00000000`0026fed0 @rsp+0x0060 e = 0n5
prv local  00000000`0026fe90 @rsp+0x0020 result = 0n0

Now what if we have an optimized build? I recompiled the source code with Maximum Speed optimization (/O2). For the optimized build, the prolog of the Calc() function starts like this.
0:000> uf Simple!Calc
Simple!Calc [c:\temp\simple\simple.cpp @ 4]:
 4 00000001`3ff51000 48895c2408 mov qword ptr [rsp+8],rbx
 4 00000001`3ff51005 48896c2410 mov qword ptr [rsp+10h],rbp
 4 00000001`3ff5100a 4889742418 mov qword ptr [rsp+18h],rsi
 4 00000001`3ff5100f 57         push rdi
 4 00000001`3ff51010 4154       push r12
 4 00000001`3ff51012 4155       push r13
 4 00000001`3ff51014 4156       push r14
 4 00000001`3ff51016 4157       push r15
 4 00000001`3ff51018 4883ec20   sub rsp,20h

As you can see, there are no mov instructions that copy the parameters. Instead, the first three home slots ([rsp+8], [rsp+10h], [rsp+18h]) are reused to save the rbx, rbp, and rsi registers. By the time I reached breakpoint 2, where the whole prolog has executed, the first 4 parameter values had not been copied to the stack at all; only the registers held them.
0:000> p
Breakpoint 2 hit
Simple!Calc+0x1c:
00000001`3ff5101c 448b6c2470 mov r13d,dword ptr [rsp+70h] ss:00000000`0022f8f0=00000005
0:000> kP L1
Child-SP          RetAddr           Call Site
00000000`0022f880 00000001`3ff510e1 Simple!Calc(
            int a = 0n1,
            int b = 0n0,
            int c = 0n0,
            int d = 0n2291968,
            int e = 0n5)+0x1c [c:\temp\simple\simple.cpp @ 5]
0:000> dv /i
prv param  a = 0n1
prv param  b = 0n0
prv param  c = 0n0
prv param  d = 0n2291968
prv param  e = 0n5
0:000> r rcx
rcx=0000000000000001
0:000> r rdx
rdx=0000000000000002
0:000> r r8
r8=0000000000000003
0:000> r r9
r9=0000000000000004

As you might have noticed, this behavior of optimized builds can cause a lot of headaches in 64-bit debugging. It means that the parameter information shown in the call stack of a 64-bit optimized build cannot be trusted for the register-passed parameters. It is even more painful when we need to analyze a regular dump file or a Watson dump file, which has less debugging information.

So how can we find the correct parameter values? We know from the previous inspection that only registers hold those 4 parameter values. Starting from this point, we have to trace which parameter values were passed in from the previous call frame. When the caller calls a function, it loads the first 4 parameters into registers. Since this is visible in the assembly code, we can unassemble the caller and track down the parameter values. But what if the caller does not pass constant values as parameters? Then the investigation becomes much more tedious, since we have to dig into the history of the registers or the stack area. In unfortunate cases, we might need to inspect many call stack frames and their assembly code to figure out how the parameters were passed all the way down to the current stack frame.
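The 5th parameter is the exception in the optimized output above: the caller stores it on the stack, so it can still be located from the prolog alone. The sketch below adds up how far the prolog lowers rsp (8 bytes per push plus the sub rsp amount) and then steps over the return address and home space. The function names are hypothetical, and the model assumes a prolog made only of push and sub rsp instructions, as in this sample.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes by which the prolog lowers rsp.
   Each push takes 8 bytes; sub rsp,N takes N bytes. */
size_t prolog_frame_bytes(size_t push_count, size_t sub_rsp_bytes)
{
    return push_count * 8 + sub_rsp_bytes;
}

/* Hypothetical helper: rsp-relative offset, after the prolog, of the
   n-th parameter when it is passed on the stack (n >= 5).
   At rsp + frame_bytes sits the return address; slot n is 8*n above it. */
size_t stack_param_offset(size_t frame_bytes, size_t n)
{
    assert(n >= 5);   /* parameters 1..4 travel in registers */
    return frame_bytes + 8 * n;
}
```

For the optimized Calc() prolog (5 pushes and sub rsp,20h), prolog_frame_bytes(5, 0x20) is 0x48 and stack_param_offset(0x48, 5) is 0x70, which matches the mov r13d,dword ptr [rsp+70h] instruction that reads the value 5.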