Using Z3 Theorem on AVR Firmware

As many of you may or may not be aware of, I have a serious obsession with embedded systems security. It wasn’t until about two years ago where I started my journey of incorporating my knowledge in reverse engineering software applications into pulling apart firmware from embedded devices. Additionally, I also started learning hardware security concepts such as side-channel attacks, fault-injections/glitch attacks, bit flipping, and more.

Anyone who has began their careers in embedded security will tell you that the Arduino board is the best device to get started on. They have a pretty intricate IDE, and an abundant amount of HAL (Hardware Abstraction Layer) API calls to use. Needless to say, writing your first hello world application in the embedded systems industry is fairly rudimentary, which is just blinking an LED.

The goal of this paper to show you how to use the Z3 formula (with python) to break weak password checkers that use linear inequalities or equalities such as satisfiability modulo theories.

AVR Architecture

To give you some background on what Arduino uses as their main MCU (Microcontroller Unit), it will help to understand what AVR is. AVR is an 8-bit RISC architecture (Reduced Instruction Set Computing). AVR based microcontrollers became widely popular for being one of the first in the game to have on-chip flash storage. For the lab exercise, we will be using an Arduino Nano board with an ATmega328P MCU.

What we have here is a 28-pin chip with an operating voltage that ranges from +1.8V to +5.5V. Only 23 of the 28 pins are programmable as GPIOs (General Purpose Input/Output). It does come with your standard set of communication interfaces such as Master/Slave SPI (Serial Peripheral Interface), USB, USART (Universal Synchronous/Asynchronous Receiver Transmitter), and Two-Wire peripheral protocols such as I2C (Inter-Integrated Circuit). We will discuss USB-to-USART more in depth later on as this will become important when flashing our chip with the target firmware and communicating with our chip by sending and receiving data.

Let’s go over this diagram above;

  • The data bus handles data in 8-bit chunks that travel inside the microcontroller, as the bus line is 8-bits wide.
  • General Purpose Registers are used for arithmetic operations such adding and subtracting numbers, setting jump pointers, and compare operations. From the previous article I wrote, we discussed registers for x86 architecture, and in this case of AVR we have R0…R31.
  • ALU stands for Arithmetic Logic Unit. This works in direct connection with the general purpose registers. Arithmetic operations between a GPR and an immediate operand can be executed within a single clock cycle. We can divide these operations into three categories: arithmetic, logical, and bit-wise functions.
  • Instruction Decoder is a combinational circuit whose purpose is to translate an instruction code into the address at micro-memory.
  • Program Counter (PC) is 14 bits wide and addresses the program memory locations.
  • As we can see from the picture above, there are three different types of memory being used here. 1) Flash Memory, which is programmable read-only memory (ROM), and firmware can only be changed using a programmer or bootloader. 2) SRAM, an area for volatile memory that holds data only when electrically powered. 3) Lastly, EEPROM (Electronically Erasable Programmable Read Only Memory), a semi permanent data storage that can only be accessed using special registers inside the AVR. It has the ability to control things like addresses to be written, data to be written, and flags used to read data.
  • I/O Lines are used for controlling applications through the digital input and output pins. These pins can detect any voltage changes such as HIGH or LOW electromagnetic force digital output. All I/O pins are connected to diodes that trace to a VCC and GND line.

Now that we have a better understanding of AVR architecture, let’s briefly go over how we are going to communicate with our ATmega328P chip using USB-to-USART.

USART

The USB-to-USART converter/bridge is a serial port to your computer and is responsible for sending serial data over two wires. There is no clock signal and no parleying between the two devices. To communicate accurately, both devices must be configured beforehand to use the same speed of communication, otherwise known as the baud rate. The baud rate is simply the rate at which information is transferred in a communication channel. In the serial port context, 9600 baud means that the serial port is capable of transferring at a maximum of 9600 bits per second. It can be calculated in two ways:

Where B is the defined as units of total bits per second, D is the application of data bits per second, and N is the bits that are conveyed per symbol, and the gross bit rate is R, inclusive of channel coding overhead, the symbol rate fs can be calculated as:

The USART clock must operate at 16 times the desired baud rate. The clock is based around the operation of a crystal oscillator which, in the case of the ATmega328P USART, is set to a constant 3.6881 MHz.

Lab Assignment

There are two files we are going to need for this exercise. Both of these files can be found at VirusTotal.

  • One is the bin file that is a dumped firmware image -> 7afa940272694061bde3d1eea7f4827a.
  • The second is the hex firmware file used to upload and be processed by our Arduino Nano board -> 4f055c15d3841872ba8156ffb968a8ab.

The next step to getting our board ready will be to upload the hex file we just downloaded by using avrdude. Connect the board to your computer via USB, and be sure to have installed avr-tools. The typical baud rate upload speed is going to be 57,600. We are going to run this at the command line to get the firmware on the board:

At this point we need to figure out the what is the baud rate the USART is using to communicate all of its serial data. After running baudrate.py, you should get the result of 19,200. Now we can use the application screen to communicate with the board from our machine.

Looks like a pretty straightforward single-password check. Let’s move on over to the jumpy.bin file and start our reverse engineering process. We are going to use Ghidra for this project as I found it to be a lot cleaner when using the AVR disassembler. After we load the file into a project directory, specifying the language is a must; so in our case, AVR8 Little Endian 16-bit assembly was the best option.

Ghidra will automatically analyze all the code including the vector table at the start of the disassembly output. One thing to note: you will see a lot of opcodes that have not been analyzed in code section, and in that case we will highlight all those bytes and press D, this will automatically disassemble the opcodes and you can use the F key afterwards to create the newly examined code into a function.

After Ghidra has completed analyzing the binary file, the output can look a bit convoluted. The first set of subroutines are responsible for initializing the peripherals such as the USART/TX-RX, SPI, TIMER, PCINT, Interrupts, and Reset table.

Side note: If you ever want to find the main() function in firmware code, trace to the base address to RESET -> code:000000 of the bootloader. It will always contain a simple jump (I.E. jmp lab_000045) to an address outside of the bootloader that will directly lead to a start routine that holds main() after performing a lot of operations to setup the environment.

We will most likely have better luck doing dynamic analysis, which won’t be easy, but will definitely give a lot more satisfying results. The strings table doesn’t show much and there are no XREF’s to any of the offsets in code segment.

Emulator

The goal right now is to emulate the code and figure out where the firmware is asking for input from the user in the USART. We have to find an offset in the beginning to set a breakpoint at so we can commence step-tracing. The first offset (0xaa) I noticed that shows logical code is at function 0x003cc.

Before we start up our debugger, there are three tools you are going to need to download:

Open three terminal windows and let’s start with the first one. Run this simavr-simduino command:

Next is picocom, which will be used as a bridge for the USART communication. In the next terminal window, run this command:

This setup will now get us ready to connect to a GDB server. We can now run avr-gdb in the third terminal.

Looking at picocom, it doesn’t look like we have hit any sort of text output yet, so we haven’t yet reached the point where the code is sending data through the USART terminal. Using the gdb command stepi 20 is a good trial and error technique to see how much we should step in the beginning before the first letter is output to the pico terminal. Interestingly, we hit the breakpoint again at 0xaa, and if we switch back to picocom, we can see the letters In being printed on the screen. This function looks like it is responsible for sending data over TX. Tracing back into Ghidra, it looks like we are hitting this peripheral component:

After stepping a few more times, we come to a point where we can find directly what is printing to the picocom terminal.

Hex 0x75 is ‘u’ which makes sense if it’s printing the string Input. Whats happening here in the code is the register r18 holds the next byte to store into Z using the st instruction. Looking at the Atmel instruction manual, we can see exactly what this operation is doing. It indirectly stores from the register to the data space using index X or Z. If you look how Ghidra decompiles this logic, it will look like this:

Y is the array index that holds our string table, and the st instruction is seen as a volatile write to the DR0 (Data/Debug Register). Now we can make note that in GDB offset 0xaa (0x66 in Ghidra) is part of the USART-RX, and offset 0xbe (0x70 in Ghidra) is responsible for the USART-TX. Side Note: Some register pairs can be used for 16-bit operations. These can only be used with the register pairs R26:R27 (X), R28:R29 (Y), and R30:R31 (Z). I’m clarifying this because there was some confusion around debugging information in memory; so if we wanted to dump contents of Y using GDB, the command would be crafted like this -> x/10x $R28.

Now that the TX/RX interrupts are located, we need to be able to find where our input is actually processed. After stepping through a bunch of the code, it looks like we may have found in Ghidra where our input gets processed:

CPI is responsible for comparing an immediate value. What sparked my interest when looking at this was the value 0xd and 0xa which is \r and \n in ascii. My theory is, if we set a breakpoint at the start of this function which is offset 0x2ac (0x167 in Ghidra), the picocom terminal will prompt us for an input and allow us to enter a string. After hitting enter, we should hit the breakpoint.

It looks like the breakpoint was hit without giving us the opportunity to input a string. Let’s continue browsing the stack:

From where we set our breakpoint, it looks like there is a function call at offset 0x154. Decompiling this function, we get the output:

Subroutine looks simple enough to understand… it sets up the W register to interpret bits from the USART status and control register, if successful, it then takes a byte at a time and reads from the USART data register. Until a null terminating character is found, W will hold the entire value. Knowing this, we have to set a breakpoint after this function call, which will be at offset 0x2bc (0x15b in Ghidra). Let’s try this again, set a breakpoint at the new offset, and see if it works this time.

Perfect! From what we saw before $r24 holds each byte and compares it to an \n or an \r. To confirm we can see if the register holds our first byte ‘h’, (gdb) i r $r24 == 0x68 <=> 104; and it does! At this offset, we will be hitting this breakpoint X amount of times, X being equal to the size of our string + 1. Once the code processes the null terminating string, the function exits. Our input string will be held at this address below.

Time to step out of this function and see where this routine is being cross-referenced. Once we reach the return instruction, the $pc will be redirected to a module at:

This could be the winning ticket for what happens next with our string. Just to keep track, we are at offset 0x302 (0x192 in Ghidra). I started to step through this function one instruction at a time, and it looks like it wasn’t actually comparing anything of interest once this function returned. Something interesting started to show once we got a bit further down the code base:

Looks like two registers are being added then compared to a value of 0xd3. Let’s set a breakpoint here in gdb at offset 0x36c, and see what values of our input are possibly being added together.

We can see index 7 (‘r’) and index 8 (‘l’) added; this will give us the hexadecimal value of 0xde. Now that we know this routine begins at 0x1bb in Ghidra, we can rename this function compare_index_7_8(). As we move further down the code base, we start to see a lot of similar code. Side note: start labeling all comparative functions to keep track of what indices your input string are being are being compared to.

Over at address 0x2b5, there is an operation that looks a bit different from the other comparisons. Index 6 and 7 of our input string gets their ordinals multiplied by each other, then gets compared to a lower WORD byte value of the result to 0xc and higher byte value of that WORD to 0x2b. In our example, we can take the two indices of ‘o’ and ‘r’ from helloworld923 and multiply the two integers to obtain the value 0x316e; the lower byte of this WORD (LOWORD) in this equation is 0x6e and the higher byte (HIWORD) value is 0x31. This is how the comparison will look like in pseudo-code:

The sbci instruction subtracts with the carry immediate command, and our final value that index 6 and 7 will need to equal is 0x2b0c. Static analysis will really take care of the rest from here on out because we know what specific mnemonics to keep an eye out for. We can build up our constraints table and use the Z3 theorem to solve the rest. The size of our string has to be 13 characters as seen in this snippet of decompiled code:

Inside your Ghidra Functions tab, you should have labeled all 13 subroutines responsible for comparing each byte of your input.

Z3 Theorem Solution

To put it simply, Z3 Prover was created for Satisfiability Modulo Theories (SMT) problems. These are decision problems for logical formulas with combinations that include arithmetics, bit-vectors, arrays, and unresolved functions. Z3 is an effective SMT solver with specialized algorithms to solve background theories as mentioned above. You can download the python library here.

At this point we can formulate the conditionals on how this firmware code checks for the right password and what constraints we can enter into our bit-vector that can be used as shorthands for sub-terms.

Side Note: The flag string gets loaded at Ghidra’s offset 0x55. It will use the instruction lpm to load a data byte from the FLASH program memory into the register file. The Z-register in the register file is then used to access the program memory and place the data in register R0. A final check function will be called at offset 0x3a7 to confirm how many times the variable isCompared was set to be true | 1. If equal to (1<<14)–1, then the code will proceed to print the flag.

After all values are added to your constraints, you can use Z3’s check() function to examine whether the assertions in the given solver plus the optional assumptions are consistent or not. If the assertion returns true, the next step would be to return a model() for the last check(). This function raises an exception if a model is not available (e.g., last check() returned unsat; your assignment doesn’t satisfy the quantified axiom). A simple test to see how the library works would be to use a three unknown linear equation from our challenge like x+y = 0xd3 && y*z = 0x15c0:

We can apply the same logic to our unknowns mentioned above. But as you can see, if Z3 is unable to satisfy the requirements needed to model the solution, it will return an unknown and everything added after will be considered unsatisfiable.

After running our solver, we finally get the password that is needed to decrypt the flag:

The next step would be to program a module that takes this password, automatically sends it to the ATMega’s USART serial port, and retrieves the flag. The source code for this script can be found here. In order to do this, make sure to have your board plugged into the USB port and connected to your linux VM USB ports. To ensure your VM recognizes this device, check your kernel message logs right after its plugged in and you should see that the usb-core registered a new interface driver called ch341:

In my case, the port I’m going to connect to in my python script will be located at /dev/ttyUSB0. After running our script, we get a successful decrypted flag from the microcontroller.

Thank you for following along! I hope you enjoyed it as much as I did. If you have any questions on this article or where to find the challenge, please DM me at my Instagram: @hackersclub or Twitter: @ringoware

Happy Hunting :)

P.S. If you missed the link to my Z3 solver program above, you can find the source code here.

Security Researcher | Reverse Engineer | Embedded Systems

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store