Using Z3 Theorem on AVR Firmware

17 min readMay 26, 2020

As many of you may or may not be aware of, I have a serious obsession with embedded systems security. It wasn’t until about two years ago where I started my journey of incorporating my knowledge in reverse engineering software applications into pulling apart firmware from embedded devices. Additionally, I also started learning hardware security concepts such as side-channel attacks, fault-injections/glitch attacks, bit flipping, and more.

Anyone who has began their careers in embedded security will tell you that the Arduino board is the best device to get started on. They have a pretty intricate IDE, and an abundant amount of HAL (Hardware Abstraction Layer) API calls to use. Needless to say, writing your first hello world application in the embedded systems industry is fairly rudimentary, which is just blinking an LED.

The goal of this paper to show you how to use the Z3 formula (with python) to break weak password checkers that use linear inequalities or equalities such as satisfiability modulo theories.

AVR Architecture

To give you some background on what Arduino uses as their main MCU (Microcontroller Unit), it will help to understand what AVR is. AVR is an 8-bit RISC architecture (Reduced Instruction Set Computing). AVR based microcontrollers became widely popular for being one of the first in the game to have on-chip flash storage. For the lab exercise, we will be using an Arduino Nano board with an ATmega328P MCU.

What we have here is a 28-pin chip with an operating voltage that ranges from +1.8V to +5.5V. Only 23 of the 28 pins are programmable as GPIOs (General Purpose Input/Output). It does come with your standard set of communication interfaces such as Master/Slave SPI (Serial Peripheral Interface), USB, USART (Universal Synchronous/Asynchronous Receiver Transmitter), and Two-Wire peripheral protocols such as I2C (Inter-Integrated Circuit). We will discuss USB-to-USART more in depth later on as this will become important when flashing our chip with the target firmware and communicating with our chip by sending and receiving data.

Let’s go over this diagram above;

The data bus handles data in 8-bit chunks that travel inside the microcontroller, as the bus line is 8-bits wide.
General Purpose Registers are used for arithmetic operations such adding and subtracting numbers, setting jump pointers, and compare operations. From the previous article I wrote, we discussed registers for x86 architecture, and in this case of AVR we have R0…R31.
ALU stands for Arithmetic Logic Unit. This works in direct connection with the general purpose registers. Arithmetic operations between a GPR and an immediate operand can be executed within a single clock cycle. We can divide these operations into three categories: arithmetic, logical, and bit-wise functions.
Instruction Decoder is a combinational circuit whose purpose is to translate an instruction code into the address at micro-memory.
Program Counter (PC) is 14 bits wide and addresses the program memory locations.
As we can see from the picture above, there are three different types of memory being used here. 1) Flash Memory, which is programmable read-only memory (ROM), and firmware can only be changed using a programmer or bootloader. 2) SRAM, an area for volatile memory that holds data only when electrically powered. 3) Lastly, EEPROM (Electronically Erasable Programmable Read Only Memory), a semi permanent data storage that can only be accessed using special registers inside the AVR. It has the ability to control things like addresses to be written, data to be written, and flags used to read data.
I/O Lines are used for controlling applications through the digital input and output pins. These pins can detect any voltage changes such as HIGH or LOW electromagnetic force digital output. All I/O pins are connected to diodes that trace to a VCC and GND line.

Now that we have a better understanding of AVR architecture, let’s briefly go over how we are going to communicate with our ATmega328P chip using USB-to-USART.

USART

The USB-to-USART converter/bridge is a serial port to your computer and is responsible for sending serial data over two wires. There is no clock signal and no parleying between the two devices. To communicate accurately, both devices must be configured beforehand to use the same speed of communication, otherwise known as the baud rate. The baud rate is simply the rate at which information is transferred in a communication channel. In the serial port context, 9600 baud means that the serial port is capable of transferring at a maximum of 9600 bits per second. It can be calculated in two ways:

𝐷 = 𝐵⋅(𝑛/(𝑏+𝑛)) 
𝐵 = 𝐷⋅((𝑏+𝑛)/𝑛)

Where B is the defined as units of total bits per second, D is the application of data bits per second, and N is the bits that are conveyed per symbol, and the gross bit rate is R, inclusive of channel coding overhead, the symbol rate fs can be calculated as:

f⋅(s) = R / N

The USART clock must operate at 16 times the desired baud rate. The clock is based around the operation of a crystal oscillator which, in the case of the ATmega328P USART, is set to a constant 3.6881 MHz.

Lab Assignment

There are two files we are going to need for this exercise. Both of these files can be found at VirusTotal.

One is the bin file that is a dumped firmware image -> 7afa940272694061bde3d1eea7f4827a.
The second is the hex firmware file used to upload and be processed by our Arduino Nano board -> 4f055c15d3841872ba8156ffb968a8ab.

The next step to getting our board ready will be to upload the hex file we just downloaded by using avrdude. Connect the board to your computer via USB, and be sure to have installed avr-tools. The typical baud rate upload speed is going to be 57,600. We are going to run this at the command line to get the firmware on the board:

➜ jumpy git:(master) avrdude -c arduino -p atmega328p -P /dev/cu.wchusbserial1410 -b57600 -u -V -U flash:w:jumpy.hex

At this point we need to figure out the what is the baud rate the USART is using to communicate all of its serial data. After running baudrate.py, you should get the result of 19,200. Now we can use the application screen to communicate with the board from our machine.

screen /dev/cu.wchusbserial1410 19200

Looks like a pretty straightforward single-password check. Let’s move on over to the jumpy.bin file and start our reverse engineering process. We are going to use Ghidra for this project as I found it to be a lot cleaner when using the AVR disassembler. After we load the file into a project directory, specifying the language is a must; so in our case, AVR8 Little Endian 16-bit assembly was the best option.

Ghidra will automatically analyze all the code including the vector table at the start of the disassembly output. One thing to note: you will see a lot of opcodes that have not been analyzed in code section, and in that case we will highlight all those bytes and press D, this will automatically disassemble the opcodes and you can use the F key afterwards to create the newly examined code into a function.

After Ghidra has completed analyzing the binary file, the output can look a bit convoluted. The first set of subroutines are responsible for initializing the peripherals such as the USART/TX-RX, SPI, TIMER, PCINT, Interrupts, and Reset table.

Side note: If you ever want to find the main() function in firmware code, trace to the base address to RESET -> code:000000 of the bootloader. It will always contain a simple jump (I.E. jmp lab_000045) to an address outside of the bootloader that will directly lead to a start routine that holds main() after performing a lot of operations to setup the environment.

We will most likely have better luck doing dynamic analysis, which won’t be easy, but will definitely give a lot more satisfying results. The strings table doesn’t show much and there are no XREF’s to any of the offsets in code segment.

Emulator

The goal right now is to emulate the code and figure out where the firmware is asking for input from the user in the USART. We have to find an offset in the beginning to set a breakpoint at so we can commence step-tracing. The first offset (0xaa) I noticed that shows logical code is at function 0x003cc.

FUN_code_0003cc:0003d2(c)  
     code:0000aa cf 93           push       Ylo
     code:0000ab df 93           push       Yhi
     code:0000ac cd b7           in         Ylo,SPL
     code:0000ad de b7           in         Yhi,SPH
     code:0000ae 60 97           sbiw       Y,0x10

Before we start up our debugger, there are three tools you are going to need to download:

SimAvr => https://github.com/buserror/simavr
Picocom => https://github.com/npat-efault/picocom
Avr-Gdb => https://github.com/igormiktor/build-avr-gcc

Open three terminal windows and let’s start with the first one. Run this simavr-simduino command:

$ board_simduino/obj-x86_64-linux-gnu/simduino.elf -d jumpy.hex
atmega328p booloader 0x00000: 3402 bytes
avr_special_init
avr_gdb_init listening on port 1234
uart_pty_init bridge on port *** /dev/pts/1 ***
uart_pty_connect: /tmp/simavr-uart0 now points to /dev/pts/1
note: export SIMAVR_UART_XTERM=1 and install picocom to get a terminal

Next is picocom, which will be used as a bridge for the USART communication. In the next terminal window, run this command:

$ picocom /tmp/simavr-uart0
picocom v2.2port is        : /tmp/simavr-uart0
flowcontrol    : none
baudrate is    : 9600
parity is      : none
databits are   : 8
stopbits are   : 1
escape is      : C-a
local echo is  : no
noinit is      : no
noreset is     : no
nolock is      : no
send_cmd is    : sz -vv
receive_cmd is : rz -vv -E
imap is        : 
omap is        : 
emap is        : crcrlf,delbs,Type [C-a] [C-h] to see available commandsTerminal ready

This setup will now get us ready to connect to a GDB server. We can now run avr-gdb in the third terminal.

$ avr-gdb ./jumpy.bin
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x00000000 in ?? ()
(gdb) break *($pc + 0x000000aa)
Breakpoint 1 at 0xaa
(gdb) c
Continuing.Breakpoint 1 hit, 0x000000aa in ?? ()

Looking at picocom, it doesn’t look like we have hit any sort of text output yet, so we haven’t yet reached the point where the code is sending data through the USART terminal. Using the gdb command stepi 20 is a good trial and error technique to see how much we should step in the beginning before the first letter is output to the pico terminal. Interestingly, we hit the breakpoint again at 0xaa, and if we switch back to picocom, we can see the letters In being printed on the screen. This function looks like it is responsible for sending data over TX. Tracing back into Ghidra, it looks like we are hitting this peripheral component:

undefined USART2_RX()
             undefined         Wlo:1          <RETURN>
             undefined1[255]   Stack[-0x100   buf
                             USART2_RX                                       XREF[1]:     Entry Point(*)  
     code:000066 1f 92           push       R1
     code:000067 cd b7           in         Ylo,SPL

After stepping a few more times, we come to a point where we can find directly what is printing to the picocom terminal.

(gdb) x/10i $pc
=> 0xd0: st Z, r18
   0xd2: pop r0
   0xd4: pop r29
   0xd6: pop r28
   0xd8: ret
   0xda: push r28
   0xdc: push r29
   0xde: in r28, 0x3d ; 61
   0xe0: in r29, 0x3e ; 62
   0xe2: ldi r24, 0xC5 ; 197
(gdb) i r $r18
r18            0x75 11

Hex 0x75 is ‘u’ which makes sense if it’s printing the string Input. Whats happening here in the code is the register r18 holds the next byte to store into Z using the st instruction. Looking at the Atmel instruction manual, we can see exactly what this operation is doing. It indirectly stores from the register to the data space using index X or Z. If you look how Ghidra decompiles this logic, it will look like this:

undefined2 USART3_TX(byte param_1)
{
  ...
  R18 = *(undefined *)(Y + 1);
  write_volatile_1(UDR0,R18);
}

Y is the array index that holds our string table, and the st instruction is seen as a volatile write to the DR0 (Data/Debug Register). Now we can make note that in GDB offset 0xaa (0x66 in Ghidra) is part of the USART-RX, and offset 0xbe (0x70 in Ghidra) is responsible for the USART-TX. Side Note: Some register pairs can be used for 16-bit operations. These can only be used with the register pairs R26:R27 (X), R28:R29 (Y), and R30:R31 (Z). I’m clarifying this because there was some confusion around debugging information in memory; so if we wanted to dump contents of Y using GDB, the command would be crafted like this -> x/10x $R28.

Now that the TX/RX interrupts are located, we need to be able to find where our input is actually processed. After stepping through a bunch of the code, it looks like we may have found in Ghidra where our input gets processed:

code:000171 8a 30           cpi        r24,0xa
code:000172 99 f0           brbs       LAB_code_000186,Zflg
code:000173 8b 81           ldd        r24,Y+0x3
code:000174 8d 30           cpi        r24,0xd

CPI is responsible for comparing an immediate value. What sparked my interest when looking at this was the value 0xd and 0xa which is \r and \n in ascii. My theory is, if we set a breakpoint at the start of this function which is offset 0x2ac (0x167 in Ghidra), the picocom terminal will prompt us for an input and allow us to enter a string. After hitting enter, we should hit the breakpoint.

/* GDB Window */
(gdb) break *($pc + 0x000002ac)
Breakpoint 1 at 0x2ac
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses.Breakpoint 1, 0x000002ac in ?? ()
(gdb)/* Pico Terminal */
Terminal ready
Input:

It looks like the breakpoint was hit without giving us the opportunity to input a string. Let’s continue browsing the stack:

process_input                                   
     code:000167 1f 92           push       R1
     code:000168 cd b7           in         Ylo,SPL
     code:000169 de b7           in         Yhi,SPH
     code:00016a 1a 82           std        Y+0x2,R1
     code:00016b 19 82           std        Y+0x1,R1
     code:00016c 15 c0           rjmp       LAB_code_000182
                             LAB_code_00016d                                  
     code:00016d 0e 94 54 01     call       FUN_code_000154

From where we set our breakpoint, it looks like there is a function call at offset 0x154. Decompiling this function, we get the output:

uint FUN_code_000154(void)
{
  byte bVar1;
  
  do {
    W._0_1_ = uart_read_char();
  } while ((char)W == '\0');
  bVar1 = read_volatile_1(UDR0);
  W = (uint)bVar1;
  Z = 0xc6;
  return W;
}

Subroutine looks simple enough to understand… it sets up the W register to interpret bits from the USART status and control register, if successful, it then takes a byte at a time and reads from the USART data register. Until a null terminating character is found, W will hold the entire value. Knowing this, we have to set a breakpoint after this function call, which will be at offset 0x2bc (0x15b in Ghidra). Let’s try this again, set a breakpoint at the new offset, and see if it works this time.

/* GDB Window */
(gdb) break *($pc + 0x000002bc)
Breakpoint 1 at 0x2bc
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses./* Pico Terminal */
Terminal ready
Input:helloworld/* Breakpoint hit in GDB Window after hitting enter */
Breakpoint 1, 0x000002bc in ?? ()
(gdb)

Perfect! From what we saw before $r24 holds each byte and compares it to an \n or an \r. To confirm we can see if the register holds our first byte ‘h’, (gdb) i r $r24 == 0x68 <=> 104; and it does! At this offset, we will be hitting this breakpoint X amount of times, X being equal to the size of our string + 1. Once the code processes the null terminating string, the function exits. Our input string will be held at this address below.

(gdb) x/1s 0x80013e
0x80013e: "helloworld"

Time to step out of this function and see where this routine is being cross-referenced. Once we reach the return instruction, the $pc will be redirected to a module at:

undefined FUN_code_000192()
             undefined         Wlo:1          <RETURN>
             undefined1[256]   Stack[-0x100]   buffer
                             FUN_code_000192
     code:000192 cf 93           push       Ylo
     code:000193 df 93           push       Yhi
     code:000194 1f 92           push       R1
     code:000195 cd b7           in         Ylo,SPL
     code:000196 de b7           in         Yhi,SPH
     code:000197 19 82           std        Y+0x1,R1

This could be the winning ticket for what happens next with our string. Just to keep track, we are at offset 0x302 (0x192 in Ghidra). I started to step through this function one instruction at a time, and it looks like it wasn’t actually comparing anything of interest once this function returned. Something interesting started to show once we got a bit further down the code base:

code:0001c7 82 0f           add        Wlo,R18
code:0001c8 93 1f           adc        Whi,R19
code:0001c9 83 3d           cpi        Wlo,0xd3
code:0001ca 91 05           cpc        Whi,R1
code:0001cb 51 f4           brbc       LAB_code_0001d6,Zflg

Looks like two registers are being added then compared to a value of 0xd3. Let’s set a breakpoint here in gdb at offset 0x36c, and see what values of our input are possibly being added together.

/* GDB Window */
(gdb) break *($pc + 0x0000036c)
Breakpoint 1 at 0x36c
(gdb) c
Continuing.
Note: automatically using hardware breakpoints for read-only addresses./* Pico Terminal */
Terminal ready
Input:helloworld923/* Breakpoint hit in GDB Window after hitting enter */
Breakpoint 1, 0x0000036c in ?? ()
(gdb) x/3i $pc
=> 0x36c: add r24, r18
   0x36e: adc r25, r19
   0x370: cpi r24, 0xD3 ; 211
(gdb) i r $r24
r24            0x6c 108
(gdb) i r $r18
r18            0x72 114

We can see index 7 (‘r’) and index 8 (‘l’) added; this will give us the hexadecimal value of 0xde. Now that we know this routine begins at 0x1bb in Ghidra, we can rename this function compare_index_7_8(). As we move further down the code base, we start to see a lot of similar code. Side note: start labeling all comparative functions to keep track of what indices your input string are being are being compared to.

/* Comparing Index 9 and 10 */
code:00021f 82 0f           add        Wlo,R18
code:000220 93 1f           adc        Whi,R19
code:000221 8f 38           cpi        Wlo,0x8f/* Comparing Index 11 and 12 */
code:0002fa 82 0f           add        Wlo,R18
code:0002fb 93 1f           adc        Whi,R19
code:0002fc 80 3a           cpi        Wlo,0xa0/* Comparing Index 6 and 7 */
code:0002b5 52 9f           mul        R21,R18
code:0002b6 90 0d           add        Whi,R0
code:0002b7 11 24           eor        R1,R1
code:0002b8 8c 30           cpi        Wlo,0xc
code:0002b9 9b 42           sbci       Whi,0x2b
code:0002ba 51 f4           brbc       LAB_code_0002c5,Zflg

Over at address 0x2b5, there is an operation that looks a bit different from the other comparisons. Index 6 and 7 of our input string gets their ordinals multiplied by each other, then gets compared to a lower WORD byte value of the result to 0xc and higher byte value of that WORD to 0x2b. In our example, we can take the two indices of ‘o’ and ‘r’ from helloworld923 and multiply the two integers to obtain the value 0x316e; the lower byte of this WORD (LOWORD) in this equation is 0x6e and the higher byte (HIWORD) value is 0x31. This is how the comparison will look like in pseudo-code:

if (ord(‘o’) * ord(‘r’)) >> 8 == 0x2b && 
   (ord(‘o’) * ord(‘r’)) & 0xFF == 0x0c ⇒

The sbci instruction subtracts with the carry immediate command, and our final value that index 6 and 7 will need to equal is 0x2b0c. Static analysis will really take care of the rest from here on out because we know what specific mnemonics to keep an eye out for. We can build up our constraints table and use the Z3 theorem to solve the rest. The size of our string has to be 13 characters as seen in this snippet of decompiled code:

if (var_37 == 0xd) {
  W = (char *)(CONCAT11(DAT_mem_013f,DAT_mem_013e) | 1);
  input_string = (char)W;
}

Inside your Ghidra Functions tab, you should have labeled all 13 subroutines responsible for comparing each byte of your input.

Z3 Theorem Solution

To put it simply, Z3 Prover was created for Satisfiability Modulo Theories (SMT) problems. These are decision problems for logical formulas with combinations that include arithmetics, bit-vectors, arrays, and unresolved functions. Z3 is an effective SMT solver with specialized algorithms to solve background theories as mentioned above. You can download the python library here.

At this point we can formulate the conditionals on how this firmware code checks for the right password and what constraints we can enter into our bit-vector that can be used as shorthands for sub-terms.

if string[7] + string[8] == 0xd3 ⇒
  if string[8] * string[9] == 0x15c0 ⇒
     if string[0] * string[1] == 0x13b7 ⇒
        if string[2] * string[3] == 0x1782 ⇒
           if string[3] + string[4] == 0x92 ⇒
              if string[6] * string[7] == 0x2b0c ⇒
                 if string[5] + string[6] == 0xa5 ⇒
                    if string[9] + string[10] == 0x8f ⇒
                       if string[1] + string[2] == 0xa7 ⇒
                          if string[10] * string[11] == 0x2873 ⇒         
                             if string[12] * 13 == 0x297 ⇒
                                 ...
                                 -> decrypt_flag()

Side Note: The flag string gets loaded at Ghidra’s offset 0x55. It will use the instruction lpm to load a data byte from the FLASH program memory into the register file. The Z-register in the register file is then used to access the program memory and place the data in register R0. A final check function will be called at offset 0x3a7 to confirm how many times the variable isCompared was set to be true | 1. If equal to (1<<14)–1, then the code will proceed to print the flag.

LAB_code_000051:                              
code:000051    lpm        R0,Z=>s__FLAG:_code_0006e4+ = "\r\nFLAG:"
...
code:0003af 8f 3f           cpi        Wlo,0xff
code:0003b0 9f 43           lsl        Whi,0x0e
code:0003b1 99 f4           brbc       LAB_0003c5,Zflg ; not equal
code:0003b2 80 e0           ldi        Wlo,0x0
code:0003b3 91 e0           ldi        Whi,0x1
code:0003b4 0e 94 aa 00     call       usart_print
...
LAB_0003c5:  // if landed here, you'll get the "wrong password" msg                         
code:0003c5 8b e0           ldi        Wlo,0xb
code:0003c6 91 e0           ldi        Whi,0x1
code:0003c7 0e 94 aa 00     call       usart_print

After all values are added to your constraints, you can use Z3’s check() function to examine whether the assertions in the given solver plus the optional assumptions are consistent or not. If the assertion returns true, the next step would be to return a model() for the last check(). This function raises an exception if a model is not available (e.g., last check() returned unsat; your assignment doesn’t satisfy the quantified axiom). A simple test to see how the library works would be to use a three unknown linear equation from our challenge like x+y = 0xd3 && y*z = 0x15c0:

>>> from z3 import *
>>> x,y,z = Int('x'), Int('y'), Int('z')# Test with no solutions
>>> s = SimpleSolver()
>>> s.add(2**x == 4)
>>> s.check()
unknown
>>> s.add(2**x == 16)
>>> s.check()
unsat# Reset general purpose solver with limited amount of preprocessing
>>> s = SimpleSolver()
>>> s.add(x + y == 0xd3)
>>> s.add(y * z == 0x15c0)
>>> s.check()
sat
>>> s.model()
[z = 348, y = 16, x = 195]

We can apply the same logic to our unknowns mentioned above. But as you can see, if Z3 is unable to satisfy the requirements needed to model the solution, it will return an unknown and everything added after will be considered unsatisfiable.

After running our solver, we finally get the password that is needed to decrypt the flag:

ryancor@ryancor-VirtualBox:~$ python solve.py 
Found Password: g1v3_1t_t0_m3

The next step would be to program a module that takes this password, automatically sends it to the ATMega’s USART serial port, and retrieves the flag. The source code for this script can be found here. In order to do this, make sure to have your board plugged into the USB port and connected to your linux VM USB ports. To ensure your VM recognizes this device, check your kernel message logs right after its plugged in and you should see that the usb-core registered a new interface driver called ch341:

ryancor@ryancor-VirtualBox:~$ dmesg
[  231.342712] VBOXGUEST_IOCTL_HGCM_CALL: 64 Failed. rc=-54 
[ 1054.110148] usb 2-2: new full-speed USB device number 3 using 
[ 1054.447882] usb 2-2: New USB device found, idVendor=1a86, 
[ 1054.447885] usb 2-2: New USB device strings: Mfr=0, Product=2, 
[ 1054.447886] usb 2-2: Product: USB2.0-Serial
[ 1054.477153] usbcore: registered new interface driver
[ 1054.477430] usbserial: USB Serial support registered for generic
[ 1054.479961] usbcore: registered new interface driver ch341
[ 1054.480080] usbserial: USB Serial support registered for ch341-
[ 1054.480141] ch341 2-2:1.0: ch341-uart converter detected
[ 1054.502669] usb 2-2: ch341-uart converter now attached to ttyUSB0

In my case, the port I’m going to connect to in my python script will be located at /dev/ttyUSB0. After running our script, we get a successful decrypted flag from the microcontroller.

ryancor@ryancor-VirtualBox:~$ python solve.py 
[+] Found Password: g1v3_1t_t0_m3
[!] Sending input to driver...FLAG:D0_you_3ven_ROP?

Thank you for following along! I hope you enjoyed it as much as I did. If you have any questions on this article or where to find the challenge, please DM me at my Instagram: @hackersclub or Twitter: @ringoware

Happy Hunting :)

P.S. If you missed the link to my Z3 solver program above, you can find the source code here.