HuntingCallbacks – Enumerating the Entire system32

What are Callbacks?

Certain Windows APIs accept a function pointer as one of their parameters. That function is then called when a particular event is triggered or a scenario takes place. Either way, the pointer is usually user-controlled and can be abused from an offensive perspective by passing a malicious function or shellcode. Some of the popularly known callback APIs are EnumChildWindows, RegisterClass, etc. There has been much research on callbacks, where they have been abused for call stack evasion, sleep timers, evasion from memory scanners, DLL loading and execution, etc.

In this blog post, we try to uncover previously unknown callbacks that could be abused maliciously and produce a tool for the same. In this research, we aim to show how various static analysis methods can be used to automate the discovery of previously unknown scenarios and details, and we hope to see more similar contributions in the future.

Overview

There are two main types of branching in most assembly languages: direct and indirect.

Windows APIs that support callback opportunities usually take a function pointer address (or a structure, as you will see) as one of their function arguments. The prototype of the function usually is of the format below:

To qualify as a potential target, a function that lets the user pass in a callback function pointer needs to satisfy two conditions:

As mentioned above, indirect calls are usually compiled to call reg, where reg is any of the registers. But since this research is focused on scanning through multiple Windows DLLs, we need to accommodate Control Flow Guard, or CFG.

Control Flow Guard

CFG, in a very simple sense, is a protection mechanism that is placed during compile time within Windows executables to prevent malicious indirect calls that could be abused using ROPs and other memory corruption bugs.
Most if not all of the DLLs that we will be scanning are compiled with Control Flow Guard, meaning a function pointer passed through a potential function will not result in a call reg signature. As shown in the following picture, when CFG is enabled, the callee is passed via the RAX register to another wrapper function called __guard_xfg_dispatch_icall_fptr. This function takes the RAX register and performs the call further.

The function __guard_xfg_dispatch_icall_fptr is just a wrapper for a bunch of other sub-functions, which eventually boil down to _guard_dispatch_icall_nop. This function is the final wrapper around the instruction jmp rax, where the control flow is redirected toward the user-controlled function pointer.

So our plan to scan for potential target Windows APIs that support a callback opportunity is as follows:

Note: Some functions have certain checks and verifications that need to be passed in order to reach the CFG dispatch. While executing those potential target functions, the user might need to pass in other parameters or sometimes initialize structures.

Prepping Miasm

To implement my scanning idea, I chose the Miasm framework because it contains a lot of the features that I needed to overengineer the solution. There are several alternatives that would perhaps be faster or more efficient, but coming from a CTF background, Miasm seemed the simplest and most appropriate choice, although the concepts described could very well be applied with other tools.

Miasm is a reverse engineering framework that supports symbolic execution and contains its own lifter and IR. It supports Use-Def graphs and comes with a built-in disassembler, PE loader, and emulator, amongst other things. Each of these features will be necessary down the line.
Before we proceed to scan every function, we need to initialize a bunch of things with Miasm that will be queried moving forward.

First, we read the file and pass it to the PE container, which is defined by the ContainerPE class. The file data can be passed to the function Container.from_string(data, loc_db), which returns a ContainerPE object.

Since we are going to scan an entire directory, we need to check that the files we are scanning are DLLs (some executables also have exported functions, but that is for a future update). We check whether the file is a DLL by parsing the Characteristics field from its COFF header. If the Characteristics field contains the IMAGE_FILE_DLL flag, then we can be mostly sure that we are dealing with a DLL.

We then proceed to extract all the functions in the export table of the DLL.

Miasm uses an object called LocationDB() that, simply speaking, keeps track of the symbol names and the corresponding offsets of those symbols within the binary. It is sort of a separate database that is populated whenever a new symbol for an offset is defined in the binary. Since the Import Address Table (IAT) contains symbols that we will need to check against, we populate the loc_db by adding their names and corresponding offsets within the binary. This can be done by parsing the IAT and retrieving both the offset and the symbol using Miasm’s built-in functions.

Once all the previously mentioned objects and details are initialized, we can proceed to the core part of the hunt: finding the potential target functions.

Hunting for Indirect Function Calls

In some DLLs, we notice that exported APIs behave as a proxy call to some other DLL; these functions are usually not in the .text section.
Attempting to disassemble such an exported API will raise an exception, since there is no actual code block to disassemble and the exported API is merely a proxy call. Hence, to keep the tool from crashing midway, we need to add a check that the address of the exported API we are scanning is within the .text section and that the section carries the characteristics value 0x60000020 (code, executable, and readable). This can be implemented as follows:

We then initialize the disassembly engine of Miasm and
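The two PE checks described in this post (the IMAGE_FILE_DLL flag in the COFF header and the 0x60000020 characteristics of the .text section) can be illustrated without Miasm. The following is a minimal sketch using only Python's struct module; it is not the tool's actual implementation, and the helper names is_dll, iter_sections, and export_is_code are hypothetical names chosen for this example.

```python
import struct

IMAGE_FILE_DLL = 0x2000          # COFF Characteristics bit marking a DLL
CODE_SECTION_FLAGS = 0x60000020  # CNT_CODE | MEM_EXECUTE | MEM_READ


def coff_header_offset(data: bytes) -> int:
    """Return the file offset of the COFF header (just past 'PE\\0\\0')."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew + 4


def is_dll(data: bytes) -> bool:
    """Check the IMAGE_FILE_DLL bit in the COFF Characteristics field."""
    coff = coff_header_offset(data)
    (characteristics,) = struct.unpack_from("<H", data, coff + 18)
    return bool(characteristics & IMAGE_FILE_DLL)


def iter_sections(data: bytes):
    """Yield (name, virtual_address, virtual_size, characteristics)."""
    coff = coff_header_offset(data)
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + opt_size  # section table follows the optional header
    for index in range(num_sections):
        off = table + 40 * index  # each section header is 40 bytes
        name = data[off:off + 8].rstrip(b"\x00").decode("ascii", "replace")
        vsize, vaddr = struct.unpack_from("<II", data, off + 8)
        (flags,) = struct.unpack_from("<I", data, off + 36)
        yield name, vaddr, vsize, flags


def export_is_code(data: bytes, rva: int) -> bool:
    """True if rva lies in a .text section carrying the code flags."""
    for name, vaddr, vsize, flags in iter_sections(data):
        if vaddr <= rva < vaddr + vsize:
            return (name == ".text"
                    and (flags & CODE_SECTION_FLAGS) == CODE_SECTION_FLAGS)
    return False
```

An export whose RVA falls outside .text (a forwarded export, for example) makes export_is_code return False, so a scanner built this way could skip it instead of attempting to disassemble it.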

LayeredSyscall – Abusing VEH to Bypass EDRs

Asking any offensive security researcher how an EDR could be bypassed will result in one of many possible answers, such as removing hooks, direct syscalls, indirect syscalls, etc. In this blog post, we take a different perspective and abuse Vectored Exception Handlers (VEH) as a foundation to produce a legitimate thread call stack and employ indirect syscalls to bypass user-land EDR hooks.

Disclaimer: The research below must only be used for ethical purposes. Please be responsible and do not use it for anything illegal. This is for educational purposes only.

Introduction

EDRs use user-land hooks that are usually placed in ntdll.dll, or sometimes in kernel32.dll, which are loaded into every process in the Windows operating system. They typically implement their hooking procedure in one of two ways:

Hooks are not placed in every function within the target DLL. Within ntdll.dll, most of the hooks are placed in the Nt* syscall wrapper functions. These hooks are often used to redirect execution safely to the EDR’s DLL, which examines the parameters to determine whether the process is performing any malicious actions.

Some popular bypasses for circumventing these hooks are:

There are more bypass techniques, such as blocking any unsigned DLL from being loaded, blocking the EDR’s DLL from being loaded by monitoring LdrLoadDll, etc.

On the flip side, there are detection strategies that could be employed to detect and perhaps prevent the above-mentioned evasion techniques:

The research presented below attempts to address the above detection strategies.

LayeredSyscall – Overview

The general idea is to generate a legitimate call stack before performing the indirect syscall while switching modes to the kernel land, and also to support up to 12 arguments. Additionally, the call stack could be of the user’s choice, with the assumption that one of the stack frames satisfies the size requirement for the number of arguments of the intended Nt* syscall.
The implemented concept could also allow the user to produce not only the legitimate call stack but also the indirect syscall in between the user’s chosen Windows API, if needed.

Vectored Exception Handler (VEH) is used to provide us with control over the context of the CPU without raising any alarms. Exception handlers are not widely regarded as malicious behavior, and they give us access to hardware breakpoints, which will be abused to act as hooks.

Note that the call stack mentioned here is not constructed by the tool or by the user but by the system, without the need to perform unwinding operations of our own or separate allocations in memory. This means the call stack could be changed by simply calling another Windows API if detections for one are present.

VEH Handler #1 – AddHwBp

We register the first handler, which sets up hardware breakpoints at two key locations, the syscall opcode and the ret opcode, both within the Nt* syscall wrappers in ntdll.dll.

The handler is registered to handle EXCEPTION_ACCESS_VIOLATION, which is generated by the tool just before the actual call to the syscall takes place. This could be done in many ways, but we’ll use a basic read of a null pointer to generate the exception.

However, since we must support any syscall that the user could call, we need a generic approach to set the breakpoint. We can implement a wrapper function that takes one argument and proceeds to trigger the exception. The handler can then retrieve the address of the Nt* function from the RCX register, which stores the first argument passed to the wrapper function.

Once retrieved, we perform a memory scan to find the offsets of the syscall opcode and the ret opcode (just after the syscall opcode). We can do this by checking that the opcodes 0x0F and 0x05 are adjacent to each other like in the code below.
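As a language-neutral illustration of that byte scan, here is a minimal Python sketch that searches a byte buffer for the syscall bytes 0x0F 0x05 immediately followed by the ret byte 0xC3. It is not the tool's own code: the function name find_syscall_ret, the max_scan bound, and the simplified sample stub are assumptions made for this example.

```python
SYSCALL_BYTES = (0x0F, 0x05)  # encoding of the syscall instruction
RET_BYTE = 0xC3               # encoding of ret


def find_syscall_ret(stub: bytes, max_scan: int = 32):
    """Return (syscall_offset, ret_offset) within stub, or None.

    Only the first max_scan bytes are examined; that bound is an
    assumption for this sketch, not a value taken from the post.
    """
    limit = min(len(stub), max_scan)
    for i in range(limit - 2):
        if (stub[i] == SYSCALL_BYTES[0]
                and stub[i + 1] == SYSCALL_BYTES[1]
                and stub[i + 2] == RET_BYTE):
            # ret sits two bytes after the start of syscall
            return i, i + 2
    return None


# Simplified, synthetic stub used only for illustration:
#   mov r10, rcx ; mov eax, 0x18 ; syscall ; ret
sample_stub = bytes.fromhex("4c8bd1" "b818000000" "0f05" "c3")
offsets = find_syscall_ret(sample_stub)
```

The two offsets returned correspond to the two locations where the handler described above places its breakpoints.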
Syscalls in Windows, as seen in the following screenshot, are encoded using the opcode bytes 0x0F and 0x05. Two bytes after the start of the syscall instruction, you can find the ret opcode, 0xC3.

Hardware breakpoints are set using the registers Dr0, Dr1, Dr2, and Dr3, while Dr6 and Dr7 hold the status and control flags for those registers. The handler uses Dr0 and Dr1 to set the breakpoints at the syscall and the ret offsets. As seen in the code below, we set them by accessing ExceptionInfo->ContextRecord->Dr0 or Dr1. We also set bit 0 and bit 2 of the Dr7 register (the local-enable bits for Dr0 and Dr1) to let the processor know that the breakpoints are enabled.

As you can see in the image below, the exception is thrown because we are trying to read a null pointer address. Once the exception is thrown, the handler takes charge and places the breakpoints.

Take note: once the exception is triggered, it is necessary to advance the RIP register by the number of bytes required to skip the instruction that generated the exception. In this case, it was 2 bytes. After that, the CPU will continue with the rest of the execution, and the breakpoints will act as our hooks. We will see this performed in the second handler below.

VEH Handler #2 – HandlerHwBp

This handler contains three major parts:

Part #1 – Handling the Syscall Breakpoint

Hardware breakpoints, when hit, generate the exception code EXCEPTION_SINGLE_STEP, which we check for in order to handle our breakpoints. First in the control flow, we check whether the exception was generated at the Nt* syscall start using the member ExceptionInfo->ExceptionRecord->ExceptionAddress, which points to the address where the exception was generated.

We proceed to save the context of the CPU when the exception was generated.
This allows us to query the first four arguments, which according to Microsoft’s x64 calling convention are stored in RCX, RDX, R8, and R9, and also allows us to use the RSP register to query the rest of the arguments, which will be further explained later.

Once stored, we can change the RIP to point to our demo function; in