Saturday, November 24, 2012

SuppressDebugMsg As Anti-Debug Trick

In this post I will show you a new anti-debug trick that affects many debuggers, e.g. WinDbg and the IDA debugger.

When you load a module into the address space of a process, usually by calling e.g. the kernel32 "LoadLibrary" function, the debugger is notified through the LOAD_DLL_DEBUG_EVENT event. This happens at the point where the "NtMapViewOfSection" function calls the "DbgkMapViewOfSection" function.

As we saw in the previous post, the "HideFromDebugger" flag of the "_ETHREAD" structure and the "DebugPort" field of the "_EPROCESS" structure are queried. If the "HideFromDebugger" flag is not set and the "DebugPort" field is set, the debug event is delivered to the debugger but only after the return value of the "DbgkpSuppressDbgMsg" function is checked.

If the "DbgkpSuppressDbgMsg" function returns false, the debug event is delivered to the debugger; if it returns true, the event is dropped. Now let's see the "DbgkpSuppressDbgMsg" function in disassembly.


As you can see in the image below, it checks the "SuppressDebugMsg" flag of the 64-bit TEB of the thread. If it is set, the function returns true and the debug event is not delivered to the debugger.

Also, the "SuppressDebugMsg" field of the 32-bit TEB is queried, if the "Wow64Process" field of the "_EPROCESS" structure is set.
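The checks described above can be summarized in a small model. This is only a sketch of the decision logic as this post describes it, not kernel code; the function and parameter names are made up for illustration.

```python
def suppress_debug_msg(teb64_suppress, is_wow64=False, teb32_suppress=False):
    """Model of the DbgkpSuppressDbgMsg check (hypothetical helper name)."""
    if teb64_suppress:
        # SuppressDebugMsg is set in the 64-bit TEB.
        return True
    # The 32-bit TEB is only consulted when Wow64Process is set.
    return bool(is_wow64 and teb32_suppress)


def debugger_sees_dll_event(hide_from_debugger, debug_port_set,
                            teb64_suppress, is_wow64=False,
                            teb32_suppress=False):
    """True if the (UN)LOAD_DLL debug event would reach the debugger."""
    if hide_from_debugger or not debug_port_set:
        return False
    return not suppress_debug_msg(teb64_suppress, is_wow64, teb32_suppress)
```

For example, debugger_sees_dll_event(False, True, True) is False: the thread is not hidden and a debugger is attached, yet the event is dropped because the suppress bit is set.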

Notes:
1) Each Wow64 process has two Process Environment Blocks (PEBs), a 64-bit one and a 32-bit one.

2) Each thread in a Wow64 process has two Thread Environment Blocks (TEBs), a 64-bit one and a 32-bit one. The 64-bit TEB is two pages in size and the 32-bit TEB is one page. The 32-bit TEB always follows the 64-bit TEB.

3) If the "Wow64Process" field of the "_EPROCESS" structure is set, then it is a Wow64 process (a 32-bit process running on a 64-bit system). This field holds the address of the process's 32-bit PEB.

In WinDbg and the IDA debugger, if our process loads a module, e.g. walied.dll, by calling e.g. the "LoadLibrary" function, the debugger receives the LOAD_DLL_DEBUG_EVENT event and caches the "hFile" field of the "LOAD_DLL_DEBUG_INFO" structure. It uses this handle with ReadFile to read information, e.g. debug info, from walied.dll.

The problem here is that WinDbg and the IDA debugger don't call CloseHandle(hFile) until the UNLOAD_DLL_DEBUG_EVENT event for walied.dll is received. So, if we set the "SuppressDebugMsg" bit of the TEB and then unload walied.dll by calling FreeLibrary, the debugger never receives the UNLOAD_DLL_DEBUG_EVENT for walied.dll and keeps its handle open. Any subsequent attempt to acquire exclusive access to walied.dll by calling the "CreateFile" function will fail, which is a clear sign that a debugger is present.
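The handle leak can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the demo's code: ModelDebugger and run_check are hypothetical names, and the "exclusive open" is reduced to checking whether the modeled debugger still holds a handle to the file.

```python
class ModelDebugger:
    """Toy debugger that caches hFile from LOAD_DLL_DEBUG_EVENT."""

    def __init__(self, close_on_load=False):
        # close_on_load=True models a debugger that reads what it needs
        # and closes hFile right away.
        self.close_on_load = close_on_load
        self.open_files = set()

    def on_load_dll(self, dll):
        if not self.close_on_load:
            self.open_files.add(dll)

    def on_unload_dll(self, dll):
        self.open_files.discard(dll)


def run_check(debugger, suppress_unload_event):
    """Return True if the debuggee would conclude a debugger is present."""
    dll = "walied.dll"
    debugger.on_load_dll(dll)          # LoadLibrary: load event is delivered
    if not suppress_unload_event:
        debugger.on_unload_dll(dll)    # FreeLibrary: unload event is delivered
    # An exclusive CreateFile fails while someone else holds a handle.
    exclusive_open_ok = dll not in debugger.open_files
    return not exclusive_open_ok
```

With the unload event suppressed, run_check(ModelDebugger(), True) reports a debugger, while a debugger that closes the handle immediately is not detected.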

A demo can be found here and its source code from here.

The trick mentioned above affects WinDbg and IDA debugger. OllyDbg v1.10 is affected but in a slightly different way. OllyDbg v1.10 does not CloseHandle(hFile) even if the corresponding UNLOAD_DLL_DEBUG_EVENT event is received.

N.B. OllyDbg v2.x is not affected since it closes the "hFile" handle of the "LOAD_DLL_DEBUG_INFO" structure as soon as it receives the LOAD_DLL_DEBUG_EVENT event.

Conclusion:
Setting the "SuppressDebugMsg" bit of thread's TEB prevents the attached debugger from receiving UN/LOAD_DLL_DEBUG_EVENT's from this thread.

For debuggers to be immune to this trick, they should read whatever they need through the "hFile" handle and close it immediately, instead of waiting for the unload event.

Any comments or ideas are very welcome.

You can follow me on Twitter @waleedassar

Thursday, November 22, 2012

Hiding Threads From Debuggers

In this post I will discuss an old anti-debug trick that many of us know well: the ability of our code to hide specific threads from debuggers. This is usually achieved by calling the ntdll "ZwSetInformationThread" function with the "ThreadInformationClass" parameter set to ThreadHideFromDebugger (0x11). Sample code for this trick can be found here.

If we take the "ZwSetInformationThread" function into disassembly, we can easily see that the "ThreadInformationLength" parameter must be zero for the function call to succeed; otherwise STATUS_INFO_LENGTH_MISMATCH (which maps to the Win32 error ERROR_BAD_LENGTH) is returned. See image below.

And here is the 64-bit version.

As you can see from the two images above, the whole function call ends up setting the "HideFromDebugger" bit of the "_ETHREAD" structure. Once this flag has been set, the kernel guarantees that the debugger will never receive any debug events from the corresponding thread.

For example, let's take the LOAD_DLL_DEBUG_EVENT events. As you know, any time a module is loaded into the address space of a process, the debugger is notified through a LOAD_DLL_DEBUG_EVENT event. The debugger then inspects various interesting fields in the "LOAD_DLL_DEBUG_INFO" structure, e.g. the image base. Depending on its configuration, the debugger may or may not notify you. You can see this if you instruct OllyDbg to break on new modules.

The two images above show how OllyDbg acts if a normal (not hidden) thread loads a new DLL. It is as follows:
1) The thread loads a new DLL, e.g. by calling the "LoadLibrary" function.

2) The "LoadLibrary" function eventually calls the ntdll "ZwMapViewOfSection" function.

3) The kernel-mode part of ZwMapViewOfSection calls the "DbgkMapViewOfSection" function.

4) The "DbgkMapViewOfSection" function queries both the "HideFromDebugger" bit of the "_ETHREAD" structure and the value of the "DebugPort" field of the "_EPROCESS" structure. If the "HideFromDebugger" bit is not set and the "DebugPort" field is set, then the function builds the "LOAD_DLL_DEBUG_INFO" structure and calls the "DbgkpSendApiMessage" function, which is responsible for delivering the debug event to the attached debugger.
On the other hand, if the "HideFromDebugger" bit is set, DbgkMapViewOfSection returns immediately without delivering the debug event. See images below.


N.B. Regarding the UN/LOAD_DLL_DEBUG_EVENT's, there are other factors that determine whether or not the debug event is going to be delivered to debugger e.g. the "SuppressDebugMsg" bit of the Thread Environment Block (TEB).

5)  In the debugger, the "WaitForDebugEvent" function returns with the "dwDebugEventCode" field set to LOAD_DLL_DEBUG_EVENT 0x6. Given this, the debugger figures out that a new module has just been loaded and that it should inspect the "LOAD_DLL_DEBUG_INFO" structure to extract the new image base, file handle, etc.

6) After extracting info. from the "LOAD_DLL_DEBUG_INFO" structure, the debugger calls the "ContinueDebugEvent" function to continue executing the thread.

Similar to LOAD_DLL_DEBUG_EVENT's, debuggers never get notified of exceptions raised in hidden threads. To confirm that, let's have a look at the "DbgkForwardException" function.

As you can see in the image above, the "HideFromDebugger" bit of the "_ETHREAD" structure is queried here as well.

Conclusion: When the "HideFromDebugger" bit of the "_ETHREAD" structure is set, the debugger will not receive any debug events from that thread.

If we look again at the "NtSetInformationThread" function in disassembly, we will see that the function call is one-way i.e. you can make this function call to hide the thread from debugger but you can not make this call to un-hide the thread from debuggers.

Let's have a look at the "ZwQueryInformationThread" function. As the name implies, we can use this function to determine if a specific thread is hidden from debuggers. See below.

And here is the 64-bit version.

As you can see from the two images above, the "ThreadInformationLength" parameter must be one for this function call to succeed. If it is one as expected, nothing surprising happens: the function sets the first byte pointed to by the "ThreadInformation" parameter to one if the "HideFromDebugger" bit of the "_ETHREAD" structure is set. Given this knowledge, I have created a small OllyDbg v1.10 plugin to detect any hidden threads in the process being debugged, which is especially useful when attaching to an active process. The plugin is called HiddenThreads. You can download it from here and its source code from here.
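The set and query rules described in this post can be captured in a short model. It is a sketch, not the kernel's implementation: ModelThread is a hypothetical class, and returning STATUS_INFO_LENGTH_MISMATCH for a wrong query length is an assumption (the post only states that the length must be one).

```python
STATUS_SUCCESS = 0x00000000
# Maps to the Win32 error ERROR_BAD_LENGTH.
STATUS_INFO_LENGTH_MISMATCH = 0xC0000004


class ModelThread:
    """Models only the HideFromDebugger bit of _ETHREAD."""

    def __init__(self):
        self.hide_from_debugger = False

    def set_hide_from_debugger(self, information_length):
        # ZwSetInformationThread(ThreadHideFromDebugger): length must be 0.
        if information_length != 0:
            return STATUS_INFO_LENGTH_MISMATCH
        # One-way: there is no path that clears the bit again.
        self.hide_from_debugger = True
        return STATUS_SUCCESS

    def query_hide_from_debugger(self, information_length):
        # ZwQueryInformationThread(ThreadHideFromDebugger): length must be 1.
        if information_length != 1:
            return STATUS_INFO_LENGTH_MISMATCH, None  # assumed status
        return STATUS_SUCCESS, 1 if self.hide_from_debugger else 0
```

A detector such as the plugin mentioned above would, in this model, call query_hide_from_debugger(1) on every thread and flag those that return 1.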

Unfortunately, in older versions of Windows e.g. XP, the "ZwQueryInformationThread" function can't be used to detect if a thread is hidden from debuggers as the ThreadHideFromDebugger information class 0x11 is simply not implemented. The function call returns ERROR_INVALID_PARAMETER.

Now that we have seen how to hide a thread from debuggers, how this works under the hood, and how to detect if a thread is hidden from debuggers, let's try to find another way to hide the thread other than calling the "ZwSetInformationThread" function.

With the "ZwCreateThreadEx" function, introduced in later versions of Windows (e.g. Windows Vista and 7), a new flags parameter is available. If we set this parameter (the 7th parameter) to 0x4, the new thread is created hidden from debuggers, i.e. you don't need to call the "ZwSetInformationThread" function. In this case, the "HideFromDebugger" bit is set in the "PspAllocateThread" function. See image below.


You can find a demo here and its source code from here.


This post was written based on debugging sessions on Windows 7 64-bit. This is why you see me switching from x86 to x64.

Any comments or ideas are very welcome.


Monday, November 12, 2012

VirtualBox CPUID Discrepancy

In this post I will show you a weird issue I recently found in VirtualBox. The issue is seen only if VirtualBox is running without hardware virtualization support (VT-x/AMD-V).

For example, when Windows XP is running in VirtualBox with no hardware virtualization support, it is forced to use INT 2E to make system calls instead of SYSENTER. This is because SYSENTER is apparently not supported by VirtualBox. The problem here is that in this case the CPUID instruction still detects supported SYSENTER/SYSEXIT instructions.

We can use this discrepancy to detect VirtualBox (only if it is running with no hardware virtualization). All we have to do is execute CPUID (leaf 1) and, if bit 11 of EDX (mask 0x800, the SEP flag) is set, execute SYSENTER in the form of any system call, e.g. ZwDelayExecution. If an EXCEPTION_ILLEGAL_INSTRUCTION (0xC000001D) exception is raised, then VirtualBox is present.
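The decision itself is simple bit logic. The sketch below assumes the caller has already obtained the EDX value from CPUID leaf 1 and the exception code (if any) raised by the SYSENTER attempt; both helper names are hypothetical, and actually executing CPUID or SYSENTER requires native code.

```python
CPUID_1_EDX_SEP = 0x800                      # bit 11: SYSENTER/SYSEXIT supported
EXCEPTION_ILLEGAL_INSTRUCTION = 0xC000001D


def cpuid_reports_sysenter(edx):
    """True if CPUID leaf 1 advertises SYSENTER/SYSEXIT."""
    return bool(edx & CPUID_1_EDX_SEP)


def looks_like_virtualbox(edx, sysenter_exception_code):
    """sysenter_exception_code is None when SYSENTER executed normally."""
    return (cpuid_reports_sysenter(edx)
            and sysenter_exception_code == EXCEPTION_ILLEGAL_INSTRUCTION)
```

The check only fires on the discrepancy: CPUID claims support, yet the instruction faults.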


You can find a demo here and source code from here.

Any comments or ideas are very welcome.


OllyDbg RaiseException Bug

In this post I will show you a bug in OllyDbg that can be used to detect its presence. The trick is easy: all you have to do is call the "RaiseException" function with the "dwExceptionCode" parameter set to EXCEPTION_BREAKPOINT (0x80000003). The response depends on the OllyDbg version used. In v1.10, the exception is silently swallowed by the debugger and the registered exception handler is not called. In v2.01 (alpha 4), several message boxes pop up and the exception handler is not called either. Only v2.01 (beta 2) is immune to this bug.



The reason behind this bug is that OllyDbg tries to read the x86 instruction pointed to by the "ExceptionAddress" field of the "EXCEPTION_RECORD" structure to ensure it is 0xCC or 0x03. In the case of EXCEPTION_BREAKPOINT exceptions raised by explicitly calling the "RaiseException" function, the instruction at ExceptionAddress is definitely not 0xCC or 0x03.


You can find a demo here and its source code from here.

Any comments or ideas are very welcome.


Defeating Memory Breakpoints

In this post I will show you a couple of tricks that can be used to defeat memory breakpoints. First I should explain what memory breakpoints are and how they work.

Anyone who has spent some time in the field of software protection and debuggers must have heard of memory breakpoints. Memory breakpoints were not used extensively in the past, but since more and more protection schemes implement anti-INT3 and anti-hardware-breakpoint tricks, reverse engineers have started to use memory breakpoints to avoid detection.

The idea of memory breakpoints is simple. Imagine that we want to place an On-Execution memory breakpoint at address 0x402005. What the debugger theoretically does is as follows:

1) Marks the memory page which the address 0x402005 belongs to (page 0x402000) as guarded via calling the "VirtualProtectEx" or "ZwProtectVirtualMemory" function with the "flNewProtect" parameter having the "PAGE_GUARD" protection attribute set. In this case page 0x402000 is originally PAGE_EXECUTE_READ 0x20 and after placing the memory breakpoint it becomes PAGE_EXECUTE_READ|PAGE_GUARD 0x120.

2) Each time the guarded page is touched, whether it is read from, written to, or executed, a STATUS_GUARD_PAGE_VIOLATION (0x80000001) exception is raised and the debugger receives a debug event of type EXCEPTION_DEBUG_EVENT.

3) The debugger then inspects various fields in the "EXCEPTION_RECORD" structure of the "DEBUG_EVENT" structure to determine the reason why the exception was raised.
If the following conditions are met, then the debugger figures out that the instruction at 0x402005 is about to execute, i.e. the breakpoint has been reached and it should break accordingly.
a) The "ExceptionCode" field is set to STATUS_GUARD_PAGE_VIOLATION 0x80000001.
b) The "NumberParameters" field is greater than or equal to 2.
c) The "ExceptionInformation[0]" field is set to 8.
d) The "ExceptionInformation[1]" field is set to 0x402005.
The image below represents something very similar.


If any of the above mentioned conditions is not met, then the debugger figures out it is not the breakpoint. Whether the breakpoint is hit or not, the debugger resets the "PAGE_GUARD" protection attribute.
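The four conditions above translate directly into a predicate. This is a minimal sketch of that check, assuming a simplified ExceptionRecord type that is hypothetical and only mirrors the fields named in the text.

```python
from dataclasses import dataclass, field

PAGE_EXECUTE_READ = 0x20
PAGE_GUARD = 0x100
STATUS_GUARD_PAGE_VIOLATION = 0x80000001
ACCESS_TYPE_EXECUTE = 8          # value of ExceptionInformation[0]


@dataclass
class ExceptionRecord:
    exception_code: int
    exception_information: list = field(default_factory=list)

    @property
    def number_parameters(self):
        return len(self.exception_information)


def is_exec_breakpoint_hit(record, breakpoint_address):
    """Apply conditions a) to d) to one exception record."""
    return (record.exception_code == STATUS_GUARD_PAGE_VIOLATION
            and record.number_parameters >= 2
            and record.exception_information[0] == ACCESS_TYPE_EXECUTE
            and record.exception_information[1] == breakpoint_address)
```

A read of 0x402005, or execution elsewhere in page 0x402000, raises the same exception code but fails condition c) or d), so the debugger re-arms the guard and continues.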

Surprisingly, even though this is the typical way debuggers should implement memory breakpoints, OllyDbg and many other user-mode debuggers implement memory breakpoints in a slightly different way.

Let's first take OllyDbg v1.10 and see how it implements memory breakpoints.

If you already use OllyDbg v1.10, you should already know that it has only two kinds of memory breakpoints, On-Access and On-Write. On-Access memory breakpoints trigger anytime the page is touched and On-Write memory breakpoints trigger anytime the page is written to.

Reversing OllyDbg v1.10 to see how it implements each type, I found out that:

1) On-Access memory breakpoints are implemented by marking the page that the breakpoint address belongs to as PAGE_NOACCESS. PAGE_NOACCESS means that any time the page is touched, a STATUS_ACCESS_VIOLATION exception is raised. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above.

2) On-Write memory breakpoints are implemented by depriving the page which the breakpoint address belongs to of the write access right, by setting the "flNewProtect" parameter passed to the "VirtualProtectEx" function to PAGE_EXECUTE_READ. Every time the page is written to, a STATUS_ACCESS_VIOLATION exception is raised. The debugger then receives the debug event and inspects fields in the "EXCEPTION_RECORD" structure in a similar way to the conventional method mentioned above. Here lies a bug in OllyDbg v1.10: it assumes that the memory protection of any page in the process address space can be changed to PAGE_EXECUTE_READ, which is not true. For example, the memory page at 0x10000 can never be made executable (Windows 7).

Now that we have seen how memory breakpoints are implemented, I will show you two tricks that can be used as anti-memory-breakpoints.

Trick 1)

Given the knowledge above, we can conclude that in order to defeat memory breakpoints, especially those of type On-Execution, we should cause the "VirtualProtectEx" function to fail. How is that possible?
By copying our code to a dynamically allocated memory page that can be made executable but at the same time cannot be guarded or made no-access. Such memory pages really do exist. For every thread you create, the kernel allocates one page (three pages in the case of Wow64 processes) for the TEB. The TEB page(s) can't be made non-writable and can't be assigned the "PAGE_GUARD" protection attribute. How can this be implemented?
All you have to do is call the "CreateThread" function with the "dwCreationFlags" parameter set to CREATE_SUSPENDED. At this point, we have the new thread's TEB with the page protection attributes set to PAGE_READWRITE. The next step is to make the TEB page executable by calling the "VirtualProtect" function with the "flNewProtect" parameter set to PAGE_EXECUTE_READWRITE.

You can use this demo to test this trick.

N.B. For a stealthier way to conceal the point at which the page protection is changed to executable, use the "VirtualAlloc" function instead of "VirtualProtect". The allocation type in this case must be MEM_COMMIT only.

Trick 2)

This trick can easily detect memory breakpoints. It relies on the fact that the "ReadProcessMemory" function returns false if you try to read guarded or no-access memory. To use this trick, all you have to do is call the "ReadProcessMemory" function with the "hProcess" parameter set to 0xFFFFFFFF (the pseudo-handle for the current process), the "lpBaseAddress" parameter set to the image base, and the "nSize" parameter set to the size of the image. If it returns false, then at least one memory breakpoint is present.
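The logic behind the check can be modeled without touching the Windows API. The sketch below is not the demo's code: it takes a list of per-page protection values for the image, which in a real implementation would come from reading the image page by page, and the helper names are hypothetical. The expected_gap_pages argument is an assumed way to skip pages that are known to be inaccessible by design.

```python
PAGE_NOACCESS = 0x01
PAGE_GUARD = 0x100


def page_is_unreadable(protection):
    """ReadProcessMemory fails on no-access and on guarded pages."""
    return (protection & 0xFF) == PAGE_NOACCESS or bool(protection & PAGE_GUARD)


def memory_breakpoint_suspected(page_protections, expected_gap_pages=()):
    """page_protections[i] is the protection of the i-th image page."""
    for index, protection in enumerate(page_protections):
        if index in expected_gap_pages:
            continue  # inaccessible on purpose, not a breakpoint
        if page_is_unreadable(protection):
            return True
    return False
```

For an image whose pages are normally 0x02, 0x20 and 0x04, a middle page showing 0x120 (PAGE_EXECUTE_READ|PAGE_GUARD) or 0x01 (PAGE_NOACCESS) would be reported.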

You can use this demo to test this trick.

N.B. Certain executables have inaccessible gap pages, e.g. the pages intended for anti-dumping described in a previous post. So you have to account for them when implementing this trick.

N.B. ReadProcessMemory has also been used as a stealthy way to read memory without triggering Hardware Breakpoints.


Any comments or ideas are very welcome.






Monday, November 5, 2012

SizeOfStackReserve As Anti-Attaching Trick

In this post I will show you a new anti-attaching trick that has been tested on Windows 7. It does not work on Windows XP because of the changes Microsoft later introduced in the way threads are created.

Let's first see how thread creation in Windows 7 is different from that of Windows XP.

In Windows XP, whenever you call the kernel32 "CreateRemoteThread" or the ntdll "RtlCreateUserThread" function to create a new thread, the following occurs underneath:

The kernel32 "BaseCreateStack" or ntdll "RtlpCreateStack" function is called by "CreateRemoteThread" or "RtlCreateUserThread" respectively to allocate space for the new thread's stack in the address space of the target process.

N.B. The kernel32 "CreateThread" function is only a call to the kernel32 "CreateRemoteThread" function with the "hProcess" parameter set to -1.

Since there is no big difference between the "BaseCreateStack" and "RtlpCreateStack" functions, it is enough for us to take the "BaseCreateStack" function in disassembly in this post.

The "BaseCreateStack" function takes four parameters, only three of which are of interest. The first parameter is the handle to the process in which we are about to allocate user stack memory. The second parameter is the size in bytes of user stack memory to COMMIT in the target process's address space. The third parameter is the size in bytes of user stack memory to RESERVE in the target process's address space. Hereafter, I will refer to them as hProcess, CommitSize, and ReserveSize.

N.B. If you call the "CreateRemoteThread" function with the "dwStackSize" parameter set to e.g. 0x10000, then BaseCreateStack commits 0x10000 bytes. On the other hand, if the "CreateRemoteThread" function is called with the "dwCreationFlags" parameter having the "STACK_SIZE_PARAM_IS_A_RESERVATION" flag set, then BaseCreateStack reserves 0x10000 bytes instead.

Now, let's dive into the "BaseCreateStack" function and see what is going on inside.

1) It extracts the value of ImageBase from the PEB of the process in which it is called and passes it to the "RtlImageNtHeader" function. If the "RtlImageNtHeader" function fails, the error ERROR_BAD_EXE_FORMAT is returned.


2) If the "ReserveSize" parameter passed to it is zero, it uses the value of the "SizeOfStackReserve" field of the IMAGE_OPTIONAL_HEADER structure.



3) Similarly, if the "CommitSize" parameter passed to it is zero, it uses the value of the "SizeOfStackCommit" field of the IMAGE_OPTIONAL_HEADER structure. Please remember that these values are extracted from the PE header of the main executable of the process that is calling the "CreateRemoteThread" function, not the target process.



4) It then performs some sanity checks on ReserveSize and CommitSize, for example to ensure that the commit size is never greater than the reserve size. It also ensures that the commit size is never lower than the value of the "MinimumStackCommit" field of the PEB.




5) It calls the "ZwAllocateVirtualMemory" function to reserve memory of size ReserveSize into the address space of the target process with the PAGE_READWRITE protection attribute.


6) It calls the "ZwAllocateVirtualMemory" function to commit CommitSize+0x1000 of the memory reserved in the previous step.



7) The extra page committed in the previous step is then given the PAGE_GUARD protection attribute.


Similar reverse-engineered code for the "BaseCreateStack" function can be found here.


A PAGE_GUARD page always exists at the end of the committed stack so that the kernel is notified each time the stack needs to be expanded. For example, if a thread touches its own stack's PAGE_GUARD page, a STATUS_GUARD_PAGE_VIOLATION exception is raised and swallowed by the kernel, which automatically commits one more page.

N.B. If a thread tries to touch the PAGE_GUARD page of another thread's stack, the exception is passed to the application or the debugger.
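The guard-page mechanism can be illustrated with a simplified simulation. This is a sketch under assumptions, not how the kernel is implemented: ModelStack is a hypothetical class, pages are numbered from the top of the stack downwards (page 0 is the first page used), and the "stack overflow" result for an exhausted reservation is an assumption that the text above does not cover.

```python
class ModelStack:
    """Committed pages 0..committed-1, followed by one guard page."""

    def __init__(self, reserved_pages, committed_pages):
        self.reserved_pages = reserved_pages
        self.committed_pages = committed_pages

    @property
    def guard_page(self):
        return self.committed_pages

    def touch(self, page, by_owner=True):
        if page < self.committed_pages:
            return "ok"
        if page != self.guard_page:
            return "access violation"          # reserved but not committed
        if not by_owner:
            # Another thread touched the guard page.
            return "guard page violation passed to application or debugger"
        if self.committed_pages + 1 >= self.reserved_pages:
            return "stack overflow"            # assumed: no room for a new guard
        # The kernel swallows the exception and commits one more page;
        # the guard page moves one page further down.
        self.committed_pages += 1
        return "stack grown"
```

Starting from ModelStack(4, 1), touching page 1 grows the stack and moves the guard to page 2, while touching page 3 directly is an access violation.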

After the stack has been allocated in the target process's address space, the "CreateRemoteThread" function formulates a CONTEXT structure for the new thread. After the previous steps have completed successfully, the "ZwCreateThread" function is called to initiate the new remote thread.

Now let's see how threads are created in Windows 7.

In Windows 7, if we take the "CreateRemoteThread" or "RtlCreateUserThread" function into disassembly, we will see that "dwStackSize" is passed directly to the "ZwCreateThreadEx" function.
So, our first assumption here is that stack allocation is now delegated to the kernel. We can also note that in versions of Windows later than XP, the "ZwCreateThreadEx" function is used by default for thread creation instead of the "ZwCreateThread" function.

Now let's check the "NtCreateThreadEx" function in ntoskrnl.exe.

We can easily see in "NtCreateThreadEx" a call to the "PspCreateThread" function.
The "PspCreateThread" function calls the "PspAllocateThread" function, which calls the "RtlCreateUserStack" function.


The "RtlCreateUserStack" function is called after attaching to the target process's address space. Now let's look at the "RtlCreateUserStack" function in disassembly.

Now it is easy to see that it reads the PE header from the main executable of the process in which the remote thread is being created unlike XP where information was extracted from the main executable of the process that creates the thread. Yeah, it seems Microsoft fixed a very minor issue.


From the image above, it is also easy to conclude that if we force the "RtlImageNtHeader" function to fail, we can prevent any foreign process, including a debugger, from attaching to our process. The easiest way to accomplish that is by erasing the PE header at runtime. Any call to ZwCreateThreadEx made as part of calling the "DebugActiveProcess" function (used for attaching to a running process) would then fail. For more information and examples, please refer to my previous post.

N.B. DebugActiveProcess calls DbgUiIssueRemoteBreakin, which calls "RtlCreateUserThread", which calls "ZwCreateThreadEx".

One may say, "Erasing the whole PE header may render many APIs which read from the PE header useless e.g. FindResource or GetProcAddress". My answer will be "Yes, you are right".

So, we should find a smarter way to do it.

Okay, let's continue disassembling the "RtlCreateUserStack" function.


As you can see in the image above if the size of stack commit argument passed to it is zero, it takes the value of the "SizeOfStackCommit" field from the PE header. The same measure is taken if the size of stack reserve passed is zero. It is also noteworthy that if both the size of stack commit argument passed and "SizeOfStackCommit" of the PE header are zero, the commit size becomes 0x4000 (The default commit size is 0x4000).

The function then checks the size of stack commit against the size of stack reserve. If the size of stack commit happens to be greater, then the size of stack reserve is adjusted to be greater.

The function then ensures that the size to be committed is not less than the "MinimumStackCommit" field of  the process's PEB. If it is less, the size to be committed is adjusted.


The function then calls the "ZwSetInformationProcess" function with the "ProcessInformationClass" parameter set to 0x29 (ProcessThreadStackAllocation). The size to be reserved is passed in the 4th member of the structure passed in the "ProcessInformation" parameter.

Now let's quickly have a look at the "NtSetInformationProcess" function.

As you can see in the two images above, the value of the 4th member of the structure passed to the "ZwSetInformationProcess" function is used as the "RegionSize" parameter passed to the "ZwAllocateVirtualMemory" function.

Given this knowledge, if we change the value of the "SizeOfStackReserve" field of the PE header to a huge value at runtime, then we can cause the "ZwAllocateVirtualMemory", "ZwSetInformationProcess", "RtlCreateUserStack", "PspAllocateThread", "PspCreateThread", and "NtCreateThreadEx" functions to fail in turn, preventing any foreign process, including debuggers, from creating any thread in our process.
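The size resolution in "RtlCreateUserStack" and the effect of a huge "SizeOfStackReserve" can be modeled as follows. This is a sketch of the behavior as described in this post, with hypothetical function names. Two details are assumptions: the post only says the reserve is made larger than the commit, so rounding up to the next 1 MB boundary is a guess, and the allocation failure is reduced to comparing the reserve against the largest free region of the address space.

```python
MB = 0x100000
DEFAULT_COMMIT = 0x4000


def resolve_stack_sizes(commit_arg, reserve_arg, hdr_commit, hdr_reserve,
                        min_commit=0):
    """hdr_* come from the target's PE header, min_commit from its PEB."""
    # A zero argument falls back to the PE header, then to the default.
    commit = commit_arg or hdr_commit or DEFAULT_COMMIT
    reserve = reserve_arg or hdr_reserve
    if commit > reserve:
        # Assumption: grow the reserve to the next 1 MB boundary.
        reserve = (commit // MB + 1) * MB
    if commit < min_commit:
        commit = min_commit
    return commit, reserve


def thread_creation_fails(reserve, largest_free_region):
    """The reservation cannot succeed if no free region is big enough."""
    return reserve > largest_free_region
```

A debugger attaching with default arguments passes zero for both sizes, so the reserve comes straight from the patched header: with hdr_reserve set to 0x7FFF0000 and, say, 0x40000000 bytes of contiguous free address space, thread_creation_fails returns True.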

A demo can be found here and its source code from here.

Any comments or ideas are more than welcome.
