Functions
Functions (or procedures) are groups of instructions that together perform a specific task and are relatively independent of the remaining code. The main code calls a function, which temporarily transfers execution to it, and execution returns to the main code when the function finishes. Let's start with a story.
A person named CPU is an efficient worker but can't remember things well. The work still has to get done, so CPU came up with a plan: keep a list (call it the stack) of tasks to do. Whenever more important work comes up in the middle of a task, CPU first writes down where the current task was left off and only then switches to the important work. When the important task is finished, CPU reads that entry and resumes the earlier task.
CPU sets off for the office and passes a coffee shop on the way. He wants to go in, but since he can't remember things, he first writes 'Go to office' in the list so that he will know what he was doing. He goes into the coffee shop, gets his coffee, and comes back out. At the door of the shop he checks the list to see what he was doing earlier, finds "Go to office", and continues to the office. This way he completes all of his tasks seamlessly.
The same thing happens in a real CPU. Suppose Function-1 is executing and it contains a call to Function-2. Before execution jumps to Function-2, the CPU saves the address of the instruction that immediately follows the call in Function-1. That way the CPU knows where to resume after Function-2 returns.
Function1:
    mov eax, 0x1
    mov ebx, 0x2
    .
    .
    call Function2
    cmp eax, 0x1
    .
    .
Here, before execution of Function2 starts, the address of 'cmp eax, 0x1' is pushed onto the stack, so that execution can return to it once Function2 has finished.
Let's understand this via code and binary analysis.
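The save-and-resume behaviour described above can be sketched as a toy model in C. This is only an illustration, not how real hardware is implemented: the `ToyCpu` type and the `cpu_init`, `cpu_call` and `cpu_ret` helpers are hypothetical names, and the stack is modelled as a small array of 32-bit slots that grows toward lower indices.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (an assumption for illustration): 16 slots of 32 bits each. */
#define STACK_WORDS 16

typedef struct {
    uint32_t mem[STACK_WORDS]; /* stack memory, one 32-bit word per slot    */
    uint32_t sp;               /* index of the current top-of-stack slot    */
    uint32_t ip;               /* address of the instruction being executed */
} ToyCpu;

/* Start with an empty stack: sp sits one past the last slot. */
void cpu_init(ToyCpu *cpu, uint32_t entry) {
    cpu->sp = STACK_WORDS;
    cpu->ip = entry;
}

/* call target: push the address of the next instruction, then jump. */
void cpu_call(ToyCpu *cpu, uint32_t next_insn, uint32_t target) {
    assert(cpu->sp > 0);             /* no free slot would be a stack overflow */
    cpu->mem[--cpu->sp] = next_insn; /* remember where to resume               */
    cpu->ip = target;                /* transfer execution to the callee       */
}

/* ret: pop the saved address and resume there. */
void cpu_ret(ToyCpu *cpu) {
    assert(cpu->sp < STACK_WORDS);   /* a saved address must exist */
    cpu->ip = cpu->mem[cpu->sp++];
}
```

In this model, `cpu_call(&cpu, A, B)` followed by `cpu_ret(&cpu)` leaves `cpu.ip` equal to `A` and the stack empty again, which mirrors CPU checking his list at the door of the coffee shop.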
(procedure.nasm)
In this analysis, we will look at the value of ESP just before the 'call print2' instruction executes and again as we step into print2.
Load the binary in gdb, set the Intel disassembly flavor, and put a breakpoint at the 'call print2' instruction.
Here we can see that the instruction after 'call print2', i.e. 'mov eax, 0x1', is at address 0x0804903d. This address should be pushed onto the stack automatically when the call to print2 executes.
The example above contained a simple function. Functions can also take arguments and return a value at the end.
Let's look at a few concepts:
Stack Frame
Prologue & Epilogue
Calling Conventions
Each function uses a specific portion of the stack, which we call its stack frame. The frame gives the function access to both its parameters and its automatic (local) variables. The idea behind a stack frame is that each function can act independently of its location on the stack, and each function can act as if it is at the top of the stack.
Each time a function is called, a new stack frame is created. A function keeps its own stack frame until it returns, at which time the caller's stack frame is restored and execution is transferred back to the caller.
The stack frame for each function is divided into three parts: function parameters, a back-pointer to the previous stack frame, and local variables.
Let's take some dummy code for better understanding.
void sampleFunction(int a, int b, int c)
{
    int localVar1, localVar2, localVar3;
    localVar1 = 1;
    localVar2 = 2;
    localVar3 = 3;
    /* some operation */
    return;
}
a, b, c are the function parameters. (The Calling Conventions part shows how these parameters are stored.)
The back-pointer contains the address of the stack frame of the previous function (the one that called sampleFunction).
localVar1, localVar2, localVar3 are the local variables.
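One way to picture this layout is as a C struct whose fields are listed from the lowest address to the highest. This is a sketch under stated assumptions: parameters are passed on the stack with `a` at the lowest address (as happens when they are pushed right to left), every slot is 4 bytes, and the locals are placed in the order shown. The `SampleFrame` struct and the `ebp_displacement` helper are hypothetical names. The sketch also includes the return address pushed by the `call` instruction, which sits between the parameters and the back-pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical picture of sampleFunction's frame, lowest address first.
 * The displacements in the comments are relative to ebp, assuming ebp
 * points at the saved back-pointer. */
typedef struct {
    int32_t  localVar3;      /* [ebp - 12]                                  */
    int32_t  localVar2;      /* [ebp - 8]                                   */
    int32_t  localVar1;      /* [ebp - 4]                                   */
    uint32_t saved_ebp;      /* [ebp]      back-pointer to the caller frame */
    uint32_t return_address; /* [ebp + 4]  pushed by the call instruction   */
    int32_t  a;              /* [ebp + 8]                                   */
    int32_t  b;              /* [ebp + 12]                                  */
    int32_t  c;              /* [ebp + 16]                                  */
} SampleFrame;

/* Convert a field offset inside SampleFrame into a displacement from ebp. */
long ebp_displacement(size_t field_offset) {
    assert(field_offset < sizeof(SampleFrame));
    return (long)field_offset - (long)offsetof(SampleFrame, saved_ebp);
}
```

For example, `ebp_displacement(offsetof(SampleFrame, a))` evaluates to 8 and `ebp_displacement(offsetof(SampleFrame, localVar1))` to -4: parameters live at positive displacements from ebp and locals at negative ones.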
When a stack frame is set up, there is a standard entry sequence.
push ebp ; Save the value of ebp (the base pointer of the previous function)
mov ebp, esp ; ebp now points to the top of the stack (esp)
sub esp, X ; X is the number of bytes sampleFunction needs for its local variables
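Now, local variables can be accessed relative to the ebp register. The dword size specifier is needed because the assembler cannot infer the operand size from a memory operand and an immediate.
mov dword [ebp - 4], 1 ; location of localVar1
mov dword [ebp - 8], 2 ; location of localVar2
mov dword [ebp - 12], 3 ; location of localVar3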
Just as there is a standard entry sequence, there is a standard exit sequence that undoes what the entry sequence did. It releases the space reserved for local variables, restores the old ebp value, and returns to the calling function with a ret instruction.
mov esp, ebp
pop ebp
ret
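To see that the exit sequence really undoes the entry sequence, the two can be simulated in C on a toy stack. This is a minimal sketch, not real machine behaviour: the `Frame` type and the helpers `push32`, `pop32`, `prologue`, `epilogue` and `store_local` are hypothetical, `esp` and `ebp` are byte offsets into a 64-byte array, and the final `ret` is left out.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64 /* toy stack size, an assumption for illustration */

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* stack memory as 32-bit words           */
    uint32_t esp;                  /* stack pointer (byte offset into mem)   */
    uint32_t ebp;                  /* base pointer  (byte offset into mem)   */
} Frame;

void push32(Frame *f, uint32_t v) {
    assert(f->esp >= 4);
    f->esp -= 4;                   /* the stack grows toward lower addresses */
    f->mem[f->esp / 4] = v;
}

uint32_t pop32(Frame *f) {
    assert(f->esp + 4 <= STACK_BYTES);
    uint32_t v = f->mem[f->esp / 4];
    f->esp += 4;
    return v;
}

/* push ebp / mov ebp, esp / sub esp, locals_bytes */
void prologue(Frame *f, uint32_t locals_bytes) {
    push32(f, f->ebp);
    f->ebp = f->esp;
    assert(locals_bytes % 4 == 0 && f->esp >= locals_bytes);
    f->esp -= locals_bytes;
}

/* mov esp, ebp / pop ebp (the ret instruction is not modelled here) */
void epilogue(Frame *f) {
    f->esp = f->ebp;
    f->ebp = pop32(f);
}

/* Store v at [ebp - offset], e.g. offset 4 for the first local variable. */
void store_local(Frame *f, uint32_t offset, uint32_t v) {
    assert(offset % 4 == 0 && offset >= 4 && offset <= f->ebp);
    f->mem[(f->ebp - offset) / 4] = v;
}
```

Calling `prologue(&f, 12)`, writing the three locals with `store_local`, and then calling `epilogue(&f)` leaves `f.esp` and `f.ebp` exactly where they started, whatever happened to the local variable area in between.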
The prologue is the set of instructions at the beginning of a function that prepares the stack and registers for use within the function.
push ebp
mov ebp, esp
sub esp, N
The epilogue, on the other hand, sits at the end of the function and restores the stack and registers to their previous state.
mov esp, ebp
pop ebp
ret
A calling convention governs the way a function call occurs. It covers the order in which parameters are placed on the stack or otherwise passed to the function, and whether the calling function (the caller) or the called function (the callee) is responsible for cleaning up the stack when the function completes. The convention in use depends on the compiler, among other factors.
In short, the calling convention specifies how a function call in C or C++ is converted into assembly language. There are three major calling conventions used with C: STDCALL, CDECL and FASTCALL. C++ adds another convention, THISCALL.
CDECL
This is one of the most popular conventions. Parameters are pushed onto the stack from right to left, and the caller cleans up the stack when the function completes.
Consider the following code:
int __cdecl MyFunction1(int a, int b) { return a + b; }
And the following function call:
x = MyFunction1(2, 3);
The assembly for MyFunction1 will be:
_MyFunction1:
    push ebp            ; Prologue
    mov ebp, esp        ; Prologue. No local variables, therefore no 'sub esp, N' instruction.
    mov eax, [ebp + 8]  ; a
    mov edx, [ebp + 12] ; b
    add eax, edx        ; Result is returned in eax
    pop ebp             ; Restore old ebp
    ret                 ; Return to caller
And this is what the caller's instructions will look like (parameters are passed in right-to-left order):
push 3              ; 2nd parameter is pushed first
push 2              ; 1st parameter is pushed last
call _MyFunction1   ; Caller calls the callee
add esp, 8          ; Caller cleans the stack: 2 values (3, 2) of 4 bytes each, so 8 bytes are released
The names of CDECL functions are almost always prefixed with an underscore, which is why the callee appears as '_MyFunction1'.
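The CDECL sequence can be traced step by step with a small C simulation. This is a sketch on a toy stack rather than real compiler output: the `Cpu` type and the `push32`, `pop32`, `load32`, `my_function1_cdecl` and `call_cdecl` helpers are hypothetical, registers are plain struct fields, and the return address is a placeholder value.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64 /* toy stack size, an assumption for illustration */

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* stack memory as 32-bit words        */
    uint32_t esp, ebp, eax;        /* esp and ebp are byte offsets in mem */
} Cpu;

void push32(Cpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t pop32(Cpu *c) {
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

uint32_t load32(const Cpu *c, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return c->mem[addr / 4];
}

/* Callee: follows the CDECL listing for MyFunction1. */
void my_function1_cdecl(Cpu *c) {
    push32(c, c->ebp);                /* push ebp                      */
    c->ebp = c->esp;                  /* mov ebp, esp                  */
    c->eax = load32(c, c->ebp + 8);   /* mov eax, [ebp + 8]   (a)      */
    c->eax += load32(c, c->ebp + 12); /* add the value at [ebp + 12]   */
    c->ebp = pop32(c);                /* pop ebp                       */
    (void)pop32(c);                   /* ret: pops the return address  */
}

/* Caller: pushes right to left, calls, then cleans up the arguments. */
uint32_t call_cdecl(Cpu *c, uint32_t a, uint32_t b) {
    push32(c, b);           /* push 3: second parameter goes first       */
    push32(c, a);           /* push 2: first parameter goes last         */
    push32(c, 0xdeadbeefu); /* call: pushes a placeholder return address */
    my_function1_cdecl(c);
    c->esp += 8;            /* add esp, 8: caller releases the arguments */
    return c->eax;
}
```

With `esp` starting at 64, `call_cdecl(&c, 2, 3)` returns 5 and leaves `esp` back at 64. Without the caller's `add esp, 8` step, `esp` would end at 56 and the two arguments would stay on the stack.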
STDCALL
This convention is used by Microsoft as the standard calling convention for the Win32 API. It is similar to CDECL, except that the callee (the called function) cleans up the stack instead of the caller (the calling function). The callee's epilogue takes care of it.
Consider the following code:
int __stdcall MyFunction2(int a, int b) { return a + b; }
And the following function call:
x = MyFunction2(2, 3);
The assembly for MyFunction2 will be:
_MyFunction2@8:
    push ebp            ; Prologue
    mov ebp, esp        ; Prologue. No local variables, therefore no 'sub esp, N' instruction.
    mov eax, [ebp + 8]  ; a
    mov edx, [ebp + 12] ; b
    add eax, edx        ; Result is returned in eax
    pop ebp             ; Restore old ebp
    ret 8               ; Return to caller and release 8 bytes of arguments
In the listing above, the ret instruction takes an optional operand that indicates how many bytes to pop from the stack, in addition to the return address, when the function returns.
STDCALL function names start with an underscore and end with '@' followed by the number of bytes of arguments passed on the stack.
And this is what the caller's instructions will look like (parameters are passed in right-to-left order). Note that there is no 'add esp, 8' after the call:
push 3              ; 2nd parameter is pushed first
push 2              ; 1st parameter is pushed last
call _MyFunction2@8 ; Caller calls the callee
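The effect of `ret 8` can be checked with a small C simulation of the STDCALL sequence. As before, this is only a sketch on a toy stack: the `StdCpu` type and the `std_push`, `std_pop`, `ret_n`, `my_function2_stdcall` and `call_stdcall` helpers are hypothetical names, and the return address is a placeholder value.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64 /* toy stack size, an assumption for illustration */

typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* stack memory as 32-bit words        */
    uint32_t esp, ebp, eax;        /* esp and ebp are byte offsets in mem */
} StdCpu;

void std_push(StdCpu *c, uint32_t v) {
    assert(c->esp >= 4);
    c->esp -= 4;
    c->mem[c->esp / 4] = v;
}

uint32_t std_pop(StdCpu *c) {
    assert(c->esp + 4 <= STACK_BYTES);
    uint32_t v = c->mem[c->esp / 4];
    c->esp += 4;
    return v;
}

/* ret n: pop the return address, then release n more bytes of arguments. */
void ret_n(StdCpu *c, uint32_t n) {
    (void)std_pop(c);
    assert(c->esp + n <= STACK_BYTES);
    c->esp += n;
}

/* Callee: follows the STDCALL listing for MyFunction2. */
void my_function2_stdcall(StdCpu *c) {
    std_push(c, c->ebp);                  /* push ebp                    */
    c->ebp = c->esp;                      /* mov ebp, esp                */
    assert(c->ebp + 12 < STACK_BYTES);    /* both arguments must exist   */
    c->eax = c->mem[(c->ebp + 8) / 4];    /* mov eax, [ebp + 8]   (a)    */
    c->eax += c->mem[(c->ebp + 12) / 4];  /* add the value at [ebp + 12] */
    c->ebp = std_pop(c);                  /* pop ebp                     */
    ret_n(c, 8);                          /* ret 8: callee cleans up     */
}

/* Caller: pushes right to left and calls; no cleanup afterwards. */
uint32_t call_stdcall(StdCpu *c, uint32_t a, uint32_t b) {
    std_push(c, b);  /* push 3 */
    std_push(c, a);  /* push 2 */
    std_push(c, 0);  /* call: pushes a placeholder return address */
    my_function2_stdcall(c);
    return c->eax;   /* esp is already restored by ret 8 */
}
```

With `esp` starting at 64, `call_stdcall(&c, 2, 3)` returns 5 and `esp` is back at 64 as soon as the callee returns, even though the caller never adjusts it.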
FASTCALL
This calling convention is not standardized across compilers. In FASTCALL, the first 2 or 3 arguments that fit in 32 bits are passed in registers, most commonly ECX, EDX and EAX. Additional arguments, and arguments larger than 4 bytes (32 bits), are passed on the stack in right-to-left order, as in CDECL. Who cleans up any stack arguments depends on the compiler; with Microsoft's __fastcall, which passes the first two arguments in ECX and EDX, the callee does.
Consider the following code:
int __fastcall MyFunction3(int a, int b) { return a + b; }
And the following function call:
x = MyFunction3(2, 3);
The assembly for MyFunction3 will be:
@MyFunction3@8:
    push ebp            ; A prologue may be generated even if the frame is not used
    mov ebp, esp
    mov eax, ecx        ; a is in ECX
    add eax, edx        ; b is in EDX; the result is returned in EAX
    pop ebp             ; Restore old ebp
    ret                 ; Return to caller
And this is what the caller's instructions will look like (both parameters fit in registers, so nothing is pushed):
mov ecx, 2          ; 1st parameter in ECX
mov edx, 3          ; 2nd parameter in EDX
call @MyFunction3@8 ; Caller calls the callee
Since no arguments were passed on the stack, there is nothing to clean up after the call in this case.
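The register-based hand-off can be sketched in C as well. This is a deliberately small model: the `FastRegs` type and the `my_function3_fastcall` and `call_fastcall` helpers are hypothetical, the assignment of `a` to ECX and `b` to EDX follows the example above, and the push and pop of the return address (which cancel out) are not modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t eax, ecx, edx; /* registers used for arguments and the result */
    uint32_t esp;           /* tracked only to show that it does not move  */
} FastRegs;

/* Callee: reads its arguments straight from registers. */
void my_function3_fastcall(FastRegs *r) {
    r->eax = r->ecx;  /* a arrives in ECX                   */
    r->eax += r->edx; /* b arrives in EDX; result is in EAX */
}

/* Caller: loads the registers and calls; nothing is pushed or cleaned. */
uint32_t call_fastcall(FastRegs *r, uint32_t a, uint32_t b) {
    uint32_t esp_before = r->esp;
    r->ecx = a;                    /* mov ecx, 2 */
    r->edx = b;                    /* mov edx, 3 */
    my_function3_fastcall(r);
    assert(r->esp == esp_before);  /* no stack arguments, so esp is unchanged */
    return r->eax;
}
```

In this model `call_fastcall(&r, 2, 3)` returns 5 without touching `esp`, which is the point of the convention: small arguments avoid the memory traffic of the stack.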
FASTCALL function names start with '@' and end with '@' followed by the number of bytes of arguments (8 in this example, even though both arguments are passed in registers).
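The three naming patterns seen in this section can be summarised in a short C helper. This is a sketch of the patterns as they appear in the examples here, not a complete name-mangling implementation: the `CallConv` enum and the `decorate` function are hypothetical names, and `arg_bytes` is the byte count that follows the '@' suffix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL } CallConv;

/* Write the decorated form of name into out (at most out_size bytes,
 * including the terminator). Returns the snprintf result, or -1 for an
 * unknown convention. arg_bytes is ignored for CDECL. */
int decorate(char *out, size_t out_size, CallConv conv,
             const char *name, unsigned arg_bytes) {
    assert(out != NULL && out_size > 0);
    assert(name != NULL && strlen(name) > 0);
    switch (conv) {
    case CONV_CDECL:    /* _name   */
        return snprintf(out, out_size, "_%s", name);
    case CONV_STDCALL:  /* _name@N */
        return snprintf(out, out_size, "_%s@%u", name, arg_bytes);
    case CONV_FASTCALL: /* @name@N */
        return snprintf(out, out_size, "@%s@%u", name, arg_bytes);
    }
    return -1;
}
```

For the functions used in this section, `decorate` yields `_MyFunction1` for CDECL, `_MyFunction2@8` for STDCALL and `@MyFunction3@8` for FASTCALL, so the decorated name alone often hints at the calling convention when reading a disassembly.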