While working on a reverse engineering project, I came across a binary that appeared to be malformed since it couldn’t be disassembled, but when running the executable, it worked. After researching for a bit I was able to discover that parts of the executable were encrypted. What does it mean to encrypt a code segment and why would anyone want to attempt to reverse engineer such a thing?! Well, let’s take a dive into a Windows PE (Portable Executable) file as an example and look into what segments make up a PE program.
Windows uses a page-based virtual memory system, and having one large code section is easier for the operating system to manage. Paging is a memory management scheme that eliminates the need for contiguous allocation of physical memory. The CPU generates a logical (virtual) address. For example, if a logical address is 31 bits wide, the logical address space is 2³¹ words = 2 G words (1 G = 2³⁰). The Memory Management Unit (MMU) maps each virtual/logical address to a physical address, and this mapping is known as paging.
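As a rough illustration of the arithmetic above, the sketch below computes the size of a 31-bit address space and splits a virtual address into a page number and an offset. The 4 KiB page size is an assumption (it is the usual small-page size on x86), and `address_space_size` and `split_address` are hypothetical helpers, not operating system APIs.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual small-page size on x86


def address_space_size(bits):
    """Number of addressable units for a logical address of the given width."""
    return 2 ** bits


def split_address(virtual_address, page_size=PAGE_SIZE):
    """Split a virtual address into (page number, offset within the page)."""
    return divmod(virtual_address, page_size)


if __name__ == "__main__":
    print(address_space_size(31))  # 2**31 addressable words
    page, offset = split_address(0x00401620)  # example virtual address
    print(hex(page), hex(offset))
```

The MMU performs the second half of this job in hardware: it looks up the page number in the page tables and keeps the offset unchanged.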
Before diving into the segments, further information on the layout of a PE file compared to a typical C program structure can be seen below:
All underlying code in an executable will reside in the section called .text.
Now for the segments of interest. The .text segment contains the program’s entry point, and the code there reaches imported functions through the IAT (Import Address Table). The principal structure for imports is the import directory table, which has one entry per DLL (Dynamic Link Library) being imported. Each entry contains, among other things, a pointer to an ILT (Import Lookup Table) and a pointer to the IAT. The next segment of importance is .bss, which holds the uninitialized data for the executable. Finally, a Windows PE has three data segments: .rdata, .edata, and .idata. Oh lord… that’s a lot of data!
The .rdata segment is the read-only section, where static constants and strings live. The .edata segment holds the exported functions, along with their names and addresses. Last but not least (for the data gods), .idata contains all the information on imported functions.
The next important segment to discuss is where most malware will hide embedded executables, DLLs, and sometimes even system files: the resource section. Also known as .rsrc, it contains the resources that belong to a module. These resources, such as dialog boxes, menus, icons and so on, are located through the data directory entry selected by the USHORT index IMAGE_DIRECTORY_ENTRY_RESOURCE.
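To see which of these sections a given file actually carries, a minimal sketch along the following lines can walk the PE section table using only the standard library. The function name `list_sections` is hypothetical. The header offsets follow the published PE/COFF layout, but the sketch does no validation beyond the two signatures.

```python
import struct


def list_sections(data):
    """Return (name, virtual_address, virtual_size, raw_size) for each PE section."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C of the DOS header) points to the "PE\0\0" signature
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF header: NumberOfSections at +2, SizeOfOptionalHeader at +16
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + opt_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        entry = table + i * 40  # each section header is 40 bytes
        name, vsize, vaddr, raw_size = struct.unpack_from("<8sIII", data, entry)
        label = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((label, vaddr, vsize, raw_size))
    return sections
```

Reading a file in binary mode and passing its contents to `list_sections` should return one tuple per section, which makes unusual names or a missing .rsrc easy to spot.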
Cool. So what now? Before we dive into the exercise, let’s discuss the purpose behind encrypting the .text section. One would typically observe code encryption in malware that attempts to stop or deter a reverse engineer. Another purpose is to bypass signature-based or behavior-based anti-virus mechanisms. Malware is not the only scenario where we see this, though: commercial software often uses the same technique to protect against intellectual property theft.
Now you might be wondering... how does an encrypted binary run if the opcodes are all scrambled? How does the processor decode unknown instructions? Well, the short answer… it doesn’t. Code segment encryption can make static analysis very complicated, but it doesn’t necessarily make dynamic analysis any more challenging. This is because at run-time, these encrypted binaries rely on something called a decryption stub. The stub decrypts the protected part of the binary and loads it into memory. Most crypters take the important pieces of your binary file (your code), encrypt them, and place them in the stub. The stub holds the same key that the crypter used to encrypt the target file, which makes this a symmetric encryption problem to solve. Ultimately, the stub acts as a loader that executes before the malware does and redirects the execution of the program. When analyzing programs like these, your best bet is to look at the segments of the file and search for anything labelled “.stub”. If nothing turns up, and the malware author attempted to be a tad bit sneakier, we’ll have to trace from the first entry point of the program before we can even attempt to find where the original decrypted entry point should have been.
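The symmetric idea can be sketched in a few lines. This is only an illustration, assuming a single-byte XOR cipher, which is the simplest case. Real crypters may use stronger ciphers, and both the key `0xAA` and the sample bytes are hypothetical.

```python
def xor_bytes(data, key):
    """XOR every byte with a single-byte key; the same call encrypts and decrypts."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    KEY = 0xAA  # hypothetical key shared by the crypter and the stub
    code = bytes.fromhex("5589e5")  # stand-in for real machine code
    encrypted = xor_bytes(code, KEY)       # what the crypter writes to disk
    decrypted = xor_bytes(encrypted, KEY)  # what the stub rebuilds at run-time
    print(encrypted.hex(), decrypted.hex())
```

Because the operation is its own inverse, recovering the key from the stub is enough to undo the crypter’s work.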
Let’s look at an example of a binary that uses this method of anti-reversing as well as other techniques like anti-debugging. The file we are going to use as our lab example is called debugme.
SHA256: da9814d7773262d69631d8c0f6f17cb1a55e8ac4e56d0e18f0ea05fbd0f6da1d debugme.exe
SHA1: 28f71b3065341b0aebc6c16a13cc8a5cd0233f93 debugme.exe
MD5: 5b037e1355ad2f469971255ff93dba84 debugme.exe
debugme.exe: PE32 executable (console) Intel 80386, for MS Windows
The file we will be carefully examining under a microscope is a 32-bit portable executable, so we are going to be doing all of our dynamic analysis in a Windows 10 VM (Virtual Machine).
Let’s run this program and see what we get:
I heard you like bugs so I put bugs in your debugger so you can have bugs while you debug!!!
Seriously though try and find the flag, you will find it in your debugger!!!
First thing we will do is search for the above string and see where it is getting referenced during static analysis in radare2 since IDA Pro had some problems processing the strings.
Now we search for references, also known as xrefs, to the virtual address 0x00409035, but we come up empty. The next idea would be to attempt to locate the entry point to this program and search for where our main() function is.
If you have never heard of the subroutine WinMainCRTStartup, I don’t blame you. In short, CRT’s entry point initializes the global state and then retrieves command line and startup information given by Windows OS and hands it off to main. Let’s try to inspect both _main and ___main functions to see what is really going on.
This doesn’t look right. Doesn’t the order of the instructions seem to be off? This leads us to believe some good old fashioned code segment encryption is going on here. It is safe to assume the _main at address 0x00401620 will most likely be decrypted at runtime, so we have to debug this program. It is also safe to assume that we are going to see some anti-debug tricks given the obvious name of this program, “debugme”. The next step is to look at entry0, which is just an alias for the _start symbol, the program’s entry point.
Right off the bat, this looks very interesting. At the start of entry0, it immediately jumps to the address 0x00408904 which, lo and behold, has some anti-debug instructions, rdtsc, at the address 0x0040892b.
What is this instruction?! There are two of them showing up!
The rdtsc instruction loads the high-order 32 bits of the processor’s timestamp counter into EDX and the low-order 32 bits into EAX. A shift and a bitwise OR are then typically used to reassemble the two halves and store the 64-bit value in a local variable. Malware often reads the counter twice, before and after a block of code, to measure how many ticks elapsed between the two reads. An unusually large gap suggests that someone is single-stepping in a debugger. The counter holds the number of ticks since the processor was last reset, as a 64-bit value in EDX:EAX. An example of this calculation will be shown shortly below in the image with the caption Calculated Timestamp.
Finally, the juicy parts of this program are starting to show up. From what we saw at _main(), the subroutine looked fairly obfuscated. After looking here, we can see that the address of _main() is being MOV’d into EAX, then each byte it points to is XOR’d with the key 0x5c. EAX is then incremented by one using the INC instruction. At address 0x00408977, there is a compare operation, where EAX is checked against the address 0x00401791.
In this area, the LAHF and XCHG instructions are used to transfer the low byte of the flags word into AH (bits 8 to 15 of EAX, the upper half of AX). This is done to obscure the fact that the loop is comparing the pointer in EAX against the address where the encrypted code ends. Once the loop is finished, the program JMPs to where _main() is finally executed.
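The reassembly of EDX:EAX can be modelled with a short sketch. The register values below are hypothetical, and `combine_tsc` and `elapsed_ticks` are illustrative helpers, not the routine used by debugme.

```python
def combine_tsc(edx, eax):
    """Rebuild the 64-bit timestamp counter from the EDX:EAX register pair."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def elapsed_ticks(first, second):
    """Ticks between two readings, wrapping at 64 bits like the hardware counter."""
    return (second - first) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # Hypothetical register values from two consecutive rdtsc reads
    t1 = combine_tsc(0x00000012, 0x00000100)
    t2 = combine_tsc(0x00000012, 0x00000500)
    print(hex(t1), hex(t2), elapsed_ticks(t1, t2))
```

Two back-to-back reads normally differ by a small number of ticks, whereas a human stepping through the code between them makes the difference enormous.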
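The loop described above can be emulated in Python to make its effect concrete. This is a sketch under assumptions: memory is modelled as a `bytearray` whose first byte sits at `base`, and the loop is assumed to stop as soon as EAX equals the end address, so the end byte itself is not touched. The sample bytes are the decrypted bytes quoted later in this walkthrough (55 89 E5), XOR’d with 0x5c.

```python
KEY = 0x5C          # XOR key seen in the stub
START = 0x00401620  # address of _main loaded into EAX
END = 0x00401791    # address EAX is compared against


def run_stub(memory, base, start=START, end=END, key=KEY):
    """Emulate the stub: XOR the byte at EAX, INC EAX, stop when EAX reaches end."""
    eax = start
    while eax != end:
        memory[eax - base] ^= key
        eax += 1
    return memory


if __name__ == "__main__":
    # Three sample encrypted bytes followed by zero padding for the rest of the range
    mem = bytearray([0x09, 0xD5, 0xB9]) + bytearray(END - START - 3)
    run_stub(mem, base=START)
    print(mem[:3].hex())
```

Whether the real loop includes the final byte depends on the exact compare and jump pair, so the boundary is worth checking in the debugger.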
Our next step is to extract the decrypted code segment and repair the Import Address Table. We can do this in one of two ways: we can write a Python script that takes each byte and decrypts it using the key we found, or we can resolve this dynamically using x32dbg, which also ships with a plugin called Scylla that repairs the imports once we find the OEP (Original Entry Point).
When we load the executable into our debugger, our goal is to do two things: patch the timestamp checking instructions, and extract our decrypted code. If we step into the program twice after load, we will get to the address 0x004010F9 that instructs the program to jump to our decryption stub. You’ll notice that once we step-into, we land at a part of the program that we previously saw in our radare2 disassembly.
Notice the two breakpoints highlighted at the rdtsc instruction. If we step through these instructions without patching, we will arrive at a check like this:
This takes the value derived from rdtsc in EAX and checks whether it is greater than 0x3E8 (1,000 in decimal).
EAX at this point of the compare is 0xF1335A17, which is 4,046,674,455 as an unsigned 32-bit value. The program detects that it is being executed in a debugger, jumps past the decryption stub, and essentially exits right after moving the stack pointer into the base pointer. So let’s rewind and patch these instructions as such:
Make sure to keep the size and fill it with NOPs. After the patch you will jump to the area where the address of the encrypted code segment is moved into EAX.
The address 0x00401620 gets moved into the EAX register, and if we take a look at the memory dump, we see a bunch of garbage bytes that don’t look like much. The next instruction takes each byte from memory and XORs it with 0x5c, as we discussed earlier. We are going to keep this address in the memory dump window to see what this turns into.
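The check itself is simple enough to mirror in a sketch. The comparison is assumed to be unsigned, which matches the unsigned value quoted above. `looks_debugged` is a hypothetical name, and `0x200` is a made-up value standing in for an undisturbed run.

```python
THRESHOLD = 0x3E8  # 1000, the constant used in the compare


def looks_debugged(value, threshold=THRESHOLD):
    """Mirror the check: a 32-bit value above the threshold is treated as a debugger."""
    return (value & 0xFFFFFFFF) > threshold


if __name__ == "__main__":
    observed = 0xF1335A17  # EAX value seen while single-stepping
    print(observed, looks_debugged(observed))
    print(looks_debugged(0x200))  # hypothetical value from an undisturbed run
```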
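The same size-preserving patch can be expressed on a byte buffer. This is a sketch rather than the exact patch applied in the debugger: it replaces the two-byte rdtsc encoding (0F 31) with two one-byte NOPs (90), and `nop_out` is a hypothetical helper.

```python
RDTSC = bytes([0x0F, 0x31])  # two-byte encoding of rdtsc
NOP = 0x90                   # one-byte x86 NOP


def nop_out(code, pattern=RDTSC):
    """Replace every occurrence of pattern with NOPs of the same total length."""
    patched = bytearray(code)
    index = patched.find(pattern)
    while index != -1:
        patched[index:index + len(pattern)] = bytes([NOP]) * len(pattern)
        index = patched.find(pattern, index + len(pattern))
    return bytes(patched)


if __name__ == "__main__":
    sample = bytes.fromhex("550f3189e50f31c3")  # made-up bytes containing two rdtsc
    print(nop_out(sample).hex())
```

A blind pattern search like this can also hit 0F 31 byte pairs that are really data or part of another instruction, so on a real binary it is safer to patch only the addresses confirmed in the disassembly.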
If you set the next breakpoint after the decryption loop, we can see the magic of what happened:
Aaaahhhh! Well, this looks a lot more readable! You may be asking yourself, how is this readable?! It still looks like just a bunch of numbers, but the first few bytes say it all. Let’s take a look at the first DWORD block (55 89 E5 31). What is the significance of these hexadecimal values? If you take these opcode bytes and look them up in the x86 assembly instruction manual, they look like this:
If you start to disassemble enough programs, you’ll start to see patterns in the starting region of a subroutine and their referenced operands. This is called the Function Prologue and it is the few lines of code (as seen in the image above) at the beginning of a function which set up the stack and registers for use within that function. Pushing the base pointer onto the stack is an extremely important step: it saves the caller’s frame. Moving the stack pointer into the base pointer then configures EBP so that local data can be accessed at fixed offsets. Together, these two steps set up a starting point for the function’s own activation record.
Knowing this, it seems fairly clear that if we go to the address 0x00401620 in the disassembly view, we should find readable assembly code and most likely our original entry point.
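Spotting a prologue can even be automated. The sketch below recognises only the two classic encodings, 55 (push ebp) and 89 E5 (mov ebp, esp), and is in no way a full disassembler. `decode_prologue` is a hypothetical helper.

```python
KNOWN = [
    (bytes([0x55]), "push ebp"),
    (bytes([0x89, 0xE5]), "mov ebp, esp"),
]


def decode_prologue(code):
    """Decode leading bytes for as long as they match the known prologue encodings."""
    decoded, position = [], 0
    for encoding, text in KNOWN:
        if code[position:position + len(encoding)] != encoding:
            break
        decoded.append(text)
        position += len(encoding)
    return decoded


if __name__ == "__main__":
    print(decode_prologue(bytes.fromhex("5589e531")))  # decrypted bytes from the dump
    print(decode_prologue(bytes.fromhex("09d5b96d")))  # the same bytes XOR'd with 0x5c
```

A check like this is a handy way to confirm that a decryption key is right: the wrong key leaves bytes that do not decode as a prologue.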
Like magic! Immediately there appears to be plaintext in our disassembly that we weren’t able to reference before in our static analysis using radare2. Nice!
Next, we fix our import address table so we can get a proper disassembly view of the real program. Keep your cursor at this address, and pop open Scylla by going to your plugins menu. Once this is open, you’re going to fill in the OEP field with your given address. Then use the IAT Autosearch button and be sure to fill in the virtual address and size field. When that is finished, we will hit Get Imports. If it worked, you should see something like this:
Looks like we have resolved some imports. For the finale, we have to dump this to a binary file.
Go ahead and open up this new file in IDA Pro to see how this challenge can be solved. Head over to the main() subroutine.
This area stands out the most because we can see two things. 1) Looking over at address 0x00401789, it shows another decryption routine that XORs EAX with EBX. 2) The last place EBX gets set is the address 0x00401780, which moves one byte (0x4B) into EBX. It is safe to assume that is our decryption key. EAX likely holds our encrypted flag data!
Let’s do the math and figure out what bytes will go into our array in the Python script. A lot of data is being moved into EAX, and we can gather it quickly like this:
[6A253E2Dh, EAX - 560C29FCh, EAX & 41414141h & 3E3E3E3Eh, 6A253E2Dh, EAX - 49FD1BF4h, 6A253E2Dh, EAX - 5E190004h, EAX & 41414141h & 3E3E3E3Eh, 6A253E2Dh, EAX - 3E001C06h, ..] ^ 0x4B
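The arithmetic in this list can be worked through in Python. The sketch assumes each pair means "load 6A253E2Dh into EAX, then subtract the constant", which is an interpretation of the notation rather than something taken from the disassembly. Only the four subtractions visible above are included, since the list is truncated.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits, like the EAX register
BASE = 0x6A253E2D
SUBTRAHENDS = [0x560C29FC, 0x49FD1BF4, 0x5E190004, 0x3E001C06]  # visible entries only


def zero_eax(eax):
    """AND with two masks that share no set bits always clears the register."""
    return eax & 0x41414141 & 0x3E3E3E3E


def build_dwords(base=BASE, subtrahends=SUBTRAHENDS):
    """Emulate 'mov eax, base' followed by 'sub eax, value' for each value."""
    return [(base - value) & MASK for value in subtrahends]


if __name__ == "__main__":
    print(zero_eax(0xDEADBEEF))
    print([hex(dword) for dword in build_dwords()])
```

The double AND is a common way to zero a register using only printable byte values, since 0x41 and 0x3E have no bits in common.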
After all is said and done, our array can be prepared one byte at a time:
The decryption routine is fairly simple and can be solved with an easy for loop, iterating through each byte of the array and XOR’ing it by the key we retrieved from the disassembly.
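A minimal version of such a script might look like the sketch below. It is not the original script from this write-up. The four dwords come from the visible entries of the list above (6A253E2Dh minus each subtracted constant), so the array is incomplete, and splitting each dword in little-endian order is an assumption about how the values sit in memory.

```python
KEY = 0x4B  # byte moved into EBX at 0x00401780

# Derived from the visible list entries only; the original list is truncated.
DWORDS = [0x14191431, 0x20282239, 0x0C0C3E29, 0x2C252227]


def to_bytes(dwords):
    """Flatten dwords into single bytes, assuming x86 little-endian memory order."""
    data = []
    for dword in dwords:
        data.extend(dword.to_bytes(4, "little"))
    return data


def decrypt(data, key=KEY):
    """XOR each byte of the array with the key and build the resulting string."""
    result = ""
    for byte in data:
        result += chr(byte ^ key)
    return result


if __name__ == "__main__":
    print(decrypt(to_bytes(DWORDS)))
```

If the output reads backwards in four-character chunks, the values were probably pushed onto the stack in reverse, and reversing the order of `DWORDS` should fix it.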
Save this as a .py file and execute the program. After running the script in our terminal, the results should be pleasing.
Finally, in this challenge we have covered a variety of topics that range from symmetric encryption, anti-debugging, anti-reversing, all the way to how packed binaries operate.
Thank you for following along! I hope you enjoyed it as much as I did. If you have any questions on this article or where to find the challenge, please DM me at my instagram: @hackersclub
Happy Hunting :)