Writing a crappy DBI harness for ASISCTF 2019: Silkroad
I spent a disproportionately large amount of time solving the first portion of this binary, because I was having fun learning things via it. I did not solve anything past the first function, but hoo boy, there’s a great pleasure in diving really deeply into a small problem and learning a lot.
We are given a pwn binary that asks for a magic value and runs through six
obfuscated constraints. It reads from stdin, stores the ASCII, strtol
s
(str to long int) it, and strlen
s it.
I ultimately made a Python script that finds the value efficiently, and a C harness that instruments a function from the binary and iterates through all possible inputs.
Interestingly, I tried a case that the designer may not have considered:
leading 0s on the input change the strlen
result in such a way that it’s
possible to cause the remote binary to run into a divide by 0 exception.
Here’s my code in a GitHub gist for easy viewing. All of the files I created / used are here.
Process
After a little reversing (Ghidra + Binja), I noticed there’s 6 constraints (roughly six basic blocks. I modeled these in Python, first sampling randomly, then iterating from 0 to 2^32-1. I split this into four workers, and it takes around 80m.
I poked around at Z3 and angr to see if I could model the constraints with them, but my hello-world-y attempts at both weren’t fruitful. Z3 ran forever, and angr didn’t return anything. Also, parens in lisp suck. See those attemps in the gist.
Spending some more time with the Python, I rewrote the loop to cut down the runtime by a factor of about 100x (4 workers on 4 cores in 80m to about 2m). I simplified some of the constraints into the loop, avoiding a lot of iterations. I had bugs, so the script didn’t return any valid values.
At this point I was really tired and really confused by indexing, and I had the idea of trying to instrument just the one C function. I used Binja to patch it — looping on a failed check, printing the string on all passed.
I used r2 to extract all of the contiguous assembly and write it to a file,
bin
. I mmap
d the file into memory, set that page as executable, and
wrote some inline ASM to jmp
to it.
This revealed a problem — none of the library functions would work, since the offsets were incorrect, and this mmapped blob didn’t have any compile-time information. However, the calls all tried passing control flow to somewhere slightly above the code blob. So, I padded the blob with nops, and then wrote jmps of the form
mov rax, 0x40xxxx
jmp rax
as a replacement PLT. I had to manually patch these every time the offsets
changed in the harness binary (iterate
). I did this a lot. It wasn’t fun.
There’s probably a good way to do this with a Python library to read the
ELF and ask for the address of those functions, then construct & assemble
the jmp
instruction and write it to an offset in bin
.
I had a really weird issues where I’d jmp
to the loaded code, hit ni
,
and gdb would run until a segfault. I asked a few friends and they helped
me realize that I was running into some difference between ni
and si
—
doing ni
would continue out of the function until segfaulting later,
unless I stepped in a few instructions first. I’m not sure what the reason
is, but my guess is that GDB has some heuristic for determining if you’re
in a function or not, and that landing on the first instruction after a
call in my weird mmappy situation didn’t set up those heuristics correctly.
If you have any idea why, @ me on twitter or something.
After a lot of tedious debugging, I ended up with a statically compiled
binary iterate
that pretty quickly (<1m) finds the value. I used this to
find the bugs in my Python code, which were due to not really understanding
how the binary chopped up the int representation of the input. I rewrote
the Python code to make it work.
I had fun! I learned a lot. Like, I’m going to reach for DynamoRIO / PINTools / QEMU process emulation next time.