Friday, July 27, 2012

Native x86 User-mode System Calls Hooking

In this post I am going to explain how to implement system call hooking from user-mode for native x86 processes (here I mean 32-bit processes running on 32-bit versions of Windows XP SP2 and SP3).

Let's have a look at the "ZwOpenProcess" function of Windows XP SP2 and of Windows XP SP3.

1) XP SP2


2) XP SP3

As you can see in the images above, EAX is set to 0x7A, the system call ordinal, and EDX is made to point at 0x7FFE0300 in the _KUSER_SHARED_DATA page. Then comes a CALL instruction which transfers control to the "KiFastSystemCall" function, whose address is stored at 0x7FFE0300 (_KUSER_SHARED_DATA::SystemCall).
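
Since the screenshots are not reproduced here, the following is a minimal C sketch of how the ordinal could be read back out of such a stub. The byte sequence is the usual encoding of the instructions just described (plus an assumed trailing RETN 0x10) and is not a dump taken from a live system; the names xp_zw_open_process_stub, read_le32 and read_syscall_ordinal are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding of the XP "ZwOpenProcess" stub described above. */
const uint8_t xp_zw_open_process_stub[] = {
    0xB8, 0x7A, 0x00, 0x00, 0x00,   /* MOV EAX, 0x7A (system call ordinal) */
    0xBA, 0x00, 0x03, 0xFE, 0x7F,   /* MOV EDX, 0x7FFE0300                 */
    0xFF, 0x12,                     /* CALL DWORD PTR [EDX]                */
    0xC2, 0x10, 0x00                /* RETN 0x10 (assumed)                 */
};

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores the ordinal if the stub starts with MOV EAX, imm32;
   returns 0 otherwise. */
int read_syscall_ordinal(const uint8_t *stub, size_t len, uint32_t *ordinal)
{
    if (len < 5 || stub[0] != 0xB8)
        return 0;
    *ordinal = read_le32(stub + 1);
    return 1;
}
```

Calling read_syscall_ordinal on the array above yields 0x7A, and read_le32 at offset 6 yields the 0x7FFE0300 pointer loaded into EDX.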

One difference we can see is that SYSENTER of XP SP2 is followed by 5 NOPs while in XP SP3 SYSENTER is directly followed by the RET of the "KiFastSystemCallRet" function.
 
The first thing one may think of to implement the user-mode system call hook in Windows XP SP3/SP2 is to overwrite the "_KUSER_SHARED_DATA::SystemCall" and "_KUSER_SHARED_DATA::SystemCallRet" fields. Unfortunately, this is not possible since the page is not writable from user-mode and any attempt to change its memory protection fails.

So, we should now turn to the "KiFastSystemCall" function and try to overwrite its very first instruction with a JMP instruction. Is this all? Let's see.

For XP SP2, it is okay to write a near JMP instruction (5 bytes long) since we have enough space (the 5 NOPs) and the write does not touch the RET instruction of the "KiFastSystemCallRet" function. But for XP SP3, the RET sits only 4 bytes into "KiFastSystemCall", so a 5-byte near JMP would overwrite it. Is there a common method for both XP SP2 and SP3?
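
To make the difference concrete, here is a small C sketch that checks whether a patch of a given length would reach the RET. The two byte arrays are assumed layouts reconstructed from the description in this post (MOV EDX,ESP and SYSENTER are two bytes each), not dumps from ntdll.dll, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed byte layouts of "KiFastSystemCall", reconstructed from the text. */
const uint8_t ki_fast_system_call_sp2[] = {
    0x8B, 0xD4,                     /* MOV EDX, ESP                */
    0x0F, 0x34,                     /* SYSENTER                    */
    0x90, 0x90, 0x90, 0x90, 0x90,   /* five NOPs                   */
    0xC3                            /* RET ("KiFastSystemCallRet") */
};
const uint8_t ki_fast_system_call_sp3[] = {
    0x8B, 0xD4,                     /* MOV EDX, ESP                */
    0x0F, 0x34,                     /* SYSENTER                    */
    0xC3                            /* RET ("KiFastSystemCallRet") */
};

/* Offset of the first 0xC3 byte, or -1 if there is none. A plain byte scan
   is enough for these two short sequences; it is not a disassembler. */
int find_ret_offset(const uint8_t *code, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (code[i] == 0xC3)
            return (int)i;
    return -1;
}

/* Nonzero if a patch of patch_len bytes written at offset 0 covers the RET. */
int patch_overwrites_ret(const uint8_t *code, size_t len, size_t patch_len)
{
    int ret_offset = find_ret_offset(code, len);
    return ret_offset >= 0 && patch_len > (size_t)ret_offset;
}
```

With a patch length of 5, the check passes for the SP2 layout (RET at offset 9) and fails for the SP3 layout (RET at offset 4).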

I thought about that and came up with something that works for both service packs: allocate a memory page at an address which, when converted from absolute to relative, gives 0xC3 (the RET opcode) as the fifth byte of the new JMP instruction, so that the byte at the "KiFastSystemCallRet" address is still a RET after the patch. For example, if we allocate a memory page at 0x3F910000, given that the "KiFastSystemCall" function is at 0x7C90E510, we get the new JMP instruction as the byte sequence "\xE9\xEB\x1A\x00\xC3". You can check the source code of InjectHookLib for more information.

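The arithmetic behind this trick can be sketched in C as follows. This is only an illustration, not the InjectHookLib code: the function names are hypothetical, and rounding up to a 64 KB boundary assumes that the page is allocated at the usual Windows allocation granularity.

```c
#include <stdint.h>

/* Encode "JMP rel32" (opcode 0xE9) placed at src and targeting dst. */
void encode_near_jmp(uint32_t src, uint32_t dst, uint8_t out[5])
{
    /* The displacement is relative to the end of the 5-byte instruction;
       unsigned arithmetic wraps modulo 2^32, which is what we want. */
    uint32_t rel = dst - (src + 5u);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel);
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);   /* this byte must stay 0xC3 (RET) */
}

/* Lowest 64 KB-aligned target whose rel32 has 0xC3 as its top byte.
   Wrap-around at the very top of the address space is not handled. */
uint32_t lowest_base_keeping_ret(uint32_t src)
{
    uint32_t lowest = src + 5u + 0xC3000000u;   /* rel32 == 0xC3000000 */
    return (lowest + 0xFFFFu) & ~0xFFFFu;       /* round up to 64 KB   */
}
```

For the "KiFastSystemCall" address quoted above (0x7C90E510), lowest_base_keeping_ret returns 0x3F910000, and encoding a jump to that address gives the E9 EB 1A 00 C3 sequence from the text.
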
N.B. We can still use a short JMP (2 bytes long) to reach a near JMP placed in any vacant 5 bytes within the short JMP's range of -128 to +127 bytes, measured from the end of the short JMP. The LEA ESP,[ESP] padding seems to be okay for both service packs.
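
A short JMP can be encoded along these lines. This is a hedged sketch: encode_short_jmp is a hypothetical helper, and the example target 0x7C90E515 assumes that the vacant padding starts at the byte right after the RET on XP SP3.

```c
#include <stdint.h>

/* Encode "JMP rel8" (opcode 0xEB) placed at src and targeting dst.
   Returns 1 on success, 0 if dst is out of reach of a short JMP. */
int encode_short_jmp(uint32_t src, uint32_t dst, uint8_t out[2])
{
    /* rel8 is relative to the end of the 2-byte instruction. */
    int64_t rel = (int64_t)dst - ((int64_t)src + 2);
    if (rel < -128 || rel > 127)
        return 0;
    out[0] = 0xEB;
    out[1] = (uint8_t)(int8_t)rel;
    return 1;
}
```

With src = 0x7C90E510 and dst = 0x7C90E515 this produces EB 03, which fits exactly over the 2-byte MOV EDX,ESP and leaves the RET untouched; the near JMP to the hook code would then be written at the target of the short JMP.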

N.B. With certain processors or under certain conditions, e.g. disabled VT-x/AMD-V when using VirtualBox, the "KiFastSystemCall" function is not used at all and the "KiIntSystemCall" function is used instead. In these cases, you can safely overwrite the first instructions of the "KiIntSystemCall" function with a near JMP instruction, as long as the code you jump to takes that into account.


Any ideas or suggestions are always very welcome.

You can follow me @waleedassar

Thursday, July 26, 2012

Wow64 User-mode System Calls Hooking

The idea started as an attempt to implement "System Calls Hooking" from user-mode under Wow64 processes (32-bit processes running on 64-bit versions of Windows). I later extended it to include native 32-bit processes. The whole thing ended up as an OllyDbg plugin which you may find useful for many purposes e.g. malware analysis and unpacking.

Now let's quickly see how 32-bit code issues system calls in Wow64 processes and in native 32-bit processes.

1) In Wow64 Processes:
If we take the "ZwOpenProcess" function of the 64-bit version of Windows 7, we can see that EAX holds 0x23, the system call ordinal, and EDX points at the stack arguments.

Then comes a CALL instruction rather than SYSENTER, SYSCALL, or INT 2E. The CALL DWORD PTR FS:[C0] instruction transfers control to the address stored at TEB+0xC0, here 0x74DE2320; see the image below.

If we move to this address, we see a FAR jump. It jumps to 0x74DE271E setting the code segment to 0x33 and this is where 32-bit debuggers can't go any further.

So, we can now conclude that loading 0x33 into CS, the code segment register, switches the processor from 32-bit mode to 64-bit mode, and this is where system calls are taken care of.
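
For readers without the screenshot, the FAR jump can be pictured as the 7-byte direct far JMP encoding: opcode 0xEA, a 32-bit offset, then a 16-bit selector. The C sketch below decodes such an instruction. The FarJump type and decode_far_jmp are hypothetical names, and the byte sequence used with it is built from the addresses quoted above rather than copied from wow64cpu.dll.

```c
#include <stddef.h>
#include <stdint.h>

/* Target of a direct far jump: selector:offset. */
typedef struct FarJump {
    uint32_t offset;     /* 32-bit destination address             */
    uint16_t selector;   /* new CS value, 0x33 for the 64-bit side */
} FarJump;

/* Decode "JMP FAR ptr16:32" (opcode 0xEA, 7 bytes in 32-bit code).
   Returns 1 on success, 0 if the bytes are not such an instruction. */
int decode_far_jmp(const uint8_t *code, size_t len, FarJump *out)
{
    if (len < 7 || code[0] != 0xEA)
        return 0;
    out->offset = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                  ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    out->selector = (uint16_t)(code[5] | (code[6] << 8));
    return 1;
}
```

Decoding EA 1E 27 DE 74 33 00 gives offset 0x74DE271E and selector 0x33, i.e. the jump described above.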

0x74DE271E does not exist in the "Executable modules" list that you see if you press ALT+E in OllyDbg. But if we try to dump the memory that this address belongs to, we can see that the module name is wow64cpu.dll.


N.B. wow64cpu.dll is a 64-bit dynamic link library that resides in the "system32" directory and is always loaded into the address space of Wow64 processes along with other 64-bit libraries. The other 64-bit libraries are wow64.dll, wow64win.dll, and the 64-bit version of ntdll.dll. They are hidden by being left out of the doubly linked lists of PEB.LoaderData (here I mean the 32-bit PEB; we will see this later).

N.B. If we apply symbols to wow64cpu.dll, we find that 0x74DE2320 is the address of the non-exported symbol "X86SwitchTo64BitMode" and 0x74DE271E is the address of the non-exported symbol "CpupReturnFromSimulatedCode".




2) In Native 32-bit Processes:

The image above is the "ZwOpenProcess" function of Windows XP SP3. EAX is 0x7A, the system call ordinal, and EDX points at 0x7FFE0300 in the _KUSER_SHARED_DATA page. Then comes a CALL instruction which transfers control to the "KiFastSystemCall" function, whose address is stored at 0x7FFE0300 (_KUSER_SHARED_DATA::SystemCall).

Actually, depending on the underlying processor architecture, that CALL instruction may jump to the "KiIntSystemCall" function. In this post, I will just focus on the "KiFastSystemCall" function.


Looking at the "KiFastSystemCall" function, we can see it is as simple as pointing EDX at the stack and issuing SYSENTER to enter kernel-mode. Then comes a RET instruction which represents the "KiFastSystemCallRet" function.


Now, let's see how we can implement a user-mode system calls hook in Windows 7 64-bit (Wow64 processes).

1) Wow64 processes

To implement a hook, the first method one may think of is replacing the address stored at FS:[0xC0] (0x74DE2320, as seen above) with the address of our own hooking code. While this seems very easy, it has one drawback: the field is per-thread, i.e. we have to keep track of all new threads and, for each new thread, replace the address at FS:[0xC0] with the address of our own hooking code.

Imagine the scenario where we CreateProcess our target process in a suspended state, overwrite the address stored at FS:[0xC0], and finally ResumeThread. In this scenario, we can't keep track of any new thread created after we call the "ResumeThread" function and hence all its system calls will be missed.

Imagine the second scenario where we call the "CreateProcess" function on our target process with the "dwCreationFlags" parameter set to DEBUG_ONLY_THIS_PROCESS, i.e. we are debugging our target process. In this scenario we can see all new threads as we intercept the "CREATE_THREAD_DEBUG_EVENT" events. Once we receive the "CREATE_THREAD_DEBUG_EVENT" event, FS:[0xC0] should contain the address of the FAR jump, but this is not always the case. To explore this fact, let's use the 64bit version of Debugging Tools for Windows to debug a demo 32-bit executable that does nothing but create a new thread.

We instruct WinDbg to break on new threads and then place a software breakpoint on the "Wow64cpu!CpuThreadInit" function, the function responsible for storing the address of the FAR jump into FS:[0xC0].



After repeating the above-mentioned step a few times, you can see that the "Wow64cpu!CpuThreadInit" function does not always run before the debugger is notified of the new thread, so FS:[0xC0] may not be initialized yet at that point.
Now we have seen that overwriting the pointer at FS:[0xC0] is not the best way to implement the user-mode system call hook.

Let's try the second method. Actually it is the one I prefer. By overwriting the FAR jump instruction itself in wow64cpu.dll, we can get rid of the new threads' annoyance. All we have to do in this method is set the proper memory protection on the wow64cpu.dll page that contains the FAR jump, overwrite the FAR jump with a near JMP instruction to our hook code, and finally restore the original memory protection. This method has been implemented in my open source OllyDbg plugin. A link to the source code is found at the end of this blog post.
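
The byte-level part of this method can be sketched in C as below. This is not the plugin's code. It only builds the 7 bytes to write over the FAR jump and keeps a copy of the original bytes; changing the page protection and writing into the debuggee (for example with VirtualProtectEx and WriteProcessMemory) are left out. The FarJumpPatch type, the function name and the hook address 0x00350000 are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define FAR_JMP_SIZE 7   /* EA + 32-bit offset + 16-bit selector */

typedef struct FarJumpPatch {
    uint8_t original[FAR_JMP_SIZE];   /* saved FAR jump, for undo or for the hook */
    uint8_t patched[FAR_JMP_SIZE];    /* near JMP to the hook + 2 NOPs            */
} FarJumpPatch;

/* site: the 7 bytes currently at site_addr (the FAR jump).
   hook_addr: where the near JMP should land (our hook code). */
void build_far_jump_patch(const uint8_t site[FAR_JMP_SIZE], uint32_t site_addr,
                          uint32_t hook_addr, FarJumpPatch *p)
{
    uint32_t rel = hook_addr - (site_addr + 5u);   /* relative to end of JMP */

    memcpy(p->original, site, FAR_JMP_SIZE);
    p->patched[0] = 0xE9;                          /* JMP rel32 */
    p->patched[1] = (uint8_t)(rel);
    p->patched[2] = (uint8_t)(rel >> 8);
    p->patched[3] = (uint8_t)(rel >> 16);
    p->patched[4] = (uint8_t)(rel >> 24);
    p->patched[5] = 0x90;                          /* NOP padding: the near JMP   */
    p->patched[6] = 0x90;                          /* is 2 bytes shorter than the */
                                                   /* FAR jump it replaces        */
}
```

The two NOPs are never executed; they only keep the disassembly after the patch tidy. The saved original bytes tell the hook code where the FAR jump was going, so that it can pass the call on.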

One more method would be manipulating the "Wow64cpu!CpuThreadInit" function to force it to store the address of our own code at offset 0xC0 instead of storing the address of the FAR jump.


Side notes:
1) As you can see in the "Wow64cpu!CpuThreadInit" function code, each Wow64 thread has two TEBs, a 32bit TEB and a 64bit TEB. The 64bit TEB always precedes the 32bit TEB by two pages.

2) I have also noticed that the 32bit PEB always precedes the 64bit PEB by one page. So, in a single-threaded application, the sequence is 64bit TEB --> 32bit TEB --> 32bit PEB --> 64bit PEB.

3) Wow64 processes, at their startup, always raise a special exception called "STATUS_WX86_BREAKPOINT" with exception code 0x4000001f. This is something that 64bit debuggers are supposed to be aware of.

4) I have also noticed that Wow64 threads seem to have two stacks, 64bit stack and 32bit one.
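
Side notes 1 and 2 boil down to simple address arithmetic, sketched below in C. The helper names are hypothetical, the page size is assumed to be 4 KB, and the offsets are only the ones observed in this post on Windows 7 64-bit, not a documented contract.

```c
#include <stdint.h>

#define WOW64_PAGE_SIZE 0x1000u   /* assumed 4 KB pages */

/* Side note 1: the 64bit TEB precedes the 32bit TEB by two pages. */
uint64_t teb32_from_teb64(uint64_t teb64)
{
    return teb64 + 2u * WOW64_PAGE_SIZE;
}

/* Side note 2: the 32bit PEB precedes the 64bit PEB by one page. */
uint64_t peb64_from_peb32(uint64_t peb32)
{
    return peb32 + WOW64_PAGE_SIZE;
}
```

For a hypothetical 64bit TEB at 0x7EFDB000, the 32bit TEB would then be expected at 0x7EFDD000.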

In a later post, I will show how to implement the user-mode system call hook in Windows XP SP3. Don't worry, it is even easier.

Let's see how we can implement the system call hook for OllyDbg v1.10.

First, I split the hook into two DLLs. The first is the OllyDbg plugin, or the injector DLL, which I named InjectHookLib.dll. The second is the injected DLL, which has your own code for logging or manipulating system calls. I will show you the steps I have taken to write InjectHookLib.dll. I will also show you how to write a simple library to inject.

1) Injector DLL
Once you choose to inject a library, a common dialog box is opened for you to choose the library. One memory page is allocated into the address space of the target process (the debuggee) and a few x86 instructions are copied into it.



In the image above you can see the code cave copied into the target process address space.

If it is the first system call issued by the target process, this code cave injects the library you chose into the address space. After the library has successfully been injected, the code then jumps to its "DllMain" function where you can manipulate intercepted system calls.


One difficulty that I ran into was filtering out the system calls originating from the "LoadLibraryA" function and from inside the "DllMain" function itself. That was overcome by having global variables which are checked upon any call to the "DllMain" function.
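
The idea of such a guard can be sketched in portable C as follows. This is an illustration of the technique only, not the plugin's code: all names are hypothetical, and a single global flag like this is only good enough for a single-threaded demonstration.

```c
/* Hypothetical guard flag: nonzero while our own handler is running. */
static int g_inside_handler = 0;

int g_handled = 0;   /* system calls seen by the handler      */
int g_skipped = 0;   /* nested system calls that were filtered */

typedef void (*SyscallHandler)(unsigned ordinal);

/* Returns 1 if the handler ran, 0 if the call was filtered out because it
   was issued by the handler itself. */
int dispatch_system_call(unsigned ordinal, SyscallHandler handler)
{
    if (g_inside_handler)
        return 0;            /* call comes from our own code: pass it through */
    g_inside_handler = 1;
    handler(ordinal);
    g_inside_handler = 0;
    return 1;
}

/* Demo handler that itself triggers a system call, the way a logging
   handler would when it writes to a file. */
void demo_handler(unsigned ordinal)
{
    g_handled++;
    if (!dispatch_system_call(ordinal, demo_handler))
        g_skipped++;
}
```

Dispatching one system call to demo_handler runs the handler once and filters the nested call instead of recursing.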

When the code cave calls the "DllMain" function of the injected library, "fdwReason" is always set to 0x4 to tell "DllMain" that a system call is being passed, and "lpvReserved" is made to point to the stack location where the registers are saved (those registers are the ones pushed by PUSHAD and PUSHFD).


Now let's see how we write the "DllMain" function of the library to be injected. I will take my dumpSysCalls.dll to be my first example. More examples will be released soon.

I will rename the third parameter from void* lpvReserved to MyContext* pContext so that the "DllMain" function prototype looks like below.


If the "fdwReason" parameter is DLL_PROCESS_ATTACH, I recommend that you call the "DisableThreadLibraryCalls" function.

The "MyContext" structure, as its name implies, holds all the registers passed via the code cave mentioned above, and its definition looks like below.

If the "fdwReason" parameter is 0x4, this means that a system call is being passed and we can start playing with it. Given the "pContext" pointer and the platform-specific information discussed earlier in the post, this is easy. For example, in Windows 7 64-bit, EAX (pContext->Eax) always holds the system call ordinal. We can look up this ordinal to determine the system call name. Depending on the system call ordinal, we can use pContext->Esp to get the return address and the system call arguments. See the image below.
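
As the screenshots are missing here, the following portable C sketch shows roughly what such a structure and lookup could look like. It is not the definition from dumpSysCalls.dll. The field order assumes that the code cave executes PUSHAD followed by PUSHFD, which this post does not state. The names REASON_SYSTEM_CALL, syscall_name and on_reason are hypothetical, and the table contains only the one Windows 7 64-bit ordinal quoted in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout: registers as they lie in memory (lowest address first)
   after PUSHAD followed by PUSHFD. The real MyContext may differ. */
typedef struct MyContext {
    uint32_t EFlags;                    /* pushed last by PUSHFD         */
    uint32_t Edi, Esi, Ebp, Esp;        /* Esp: value before PUSHAD      */
    uint32_t Ebx, Edx, Ecx, Eax;        /* Eax: the system call ordinal  */
} MyContext;

#define REASON_SYSTEM_CALL 0x4u         /* fdwReason value used by the cave */

typedef struct SyscallName {
    uint32_t ordinal;
    const char *name;
} SyscallName;

/* Only the ordinal quoted in this post; a real table would be filled from
   the per-OS system call lists. */
static const SyscallName win7_x64_names[] = {
    { 0x23, "ZwOpenProcess" }
};

const char *syscall_name(uint32_t ordinal)
{
    size_t i;
    for (i = 0; i < sizeof win7_x64_names / sizeof win7_x64_names[0]; i++)
        if (win7_x64_names[i].ordinal == ordinal)
            return win7_x64_names[i].name;
    return "unknown";
}

/* What a DllMain-like entry point might do with its last two parameters:
   returns the system call name, or NULL if no system call is being passed. */
const char *on_reason(uint32_t fdwReason, const MyContext *pContext)
{
    if (fdwReason != REASON_SYSTEM_CALL || pContext == NULL)
        return NULL;
    return syscall_name(pContext->Eax);
}
```

Under that assumed layout the structure is 36 bytes long, and a context with Eax set to 0x23 resolves to "ZwOpenProcess".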


N.B. The library entry point must be "DllMain" (not "DllMainCRTStartup"). This is accomplished by ignoring all default libraries and setting the "/entry" to "DllMain".

Example libraries:

1) dumpSysCalls.dll

Function: It logs all system calls to c:\syscall_log.txt 
Source code:
VC6.0-compatible source code from here.
VC8.0-compatible source code from here.

2) FindWindow.dll

Function: It logs calls to the "FindWindow", "FindWindowA", "FindWindowExA", and "FindWindowExW" functions and their parameters to "c:\FindWindow.txt"

Source code:
VC6.0-compatible source code from here.
VC8.0-compatible source code from here.



For lists of system calls per OS, you can refer to the following links:
http://j00ru.vexillium.org/ntapi/
http://j00ru.vexillium.org/ntapi_64/
http://j00ru.vexillium.org/win32k_syscalls/

You can find the "InjectHookLib" plugin here and its source code here. It has been tested with Windows 7 SP0 64-bit and Windows XP SP2/SP3.

You can follow me on Twitter @waleedassar