Locating main() in a stripped & dynamically linked binary
Learned:
- stripped binaries don’t point to main
- binja can find it anyway
- the dynamic linker runs before any of your program code
- trying
break _start
in gdb will probably hit the linker’s instead
I was working on TokyoWesterns 2018 load
and this came up
(stackoverflow). Sebastian
and I both tried identifying main this way and scratched our heads at why
this approach wasn’t working, so I’m taking a second stab at it now. Note
that the binary is both 64b and dynamically linked.
I don’t actually know what strip()
does so let’s man strip
:
GNU strip discards all symbols from object files objfile. The list of
object files may include archives. At least one object file must be
given.
OK, so it removes the symbol that tells us main is main.
So, I can’t b main
in gdb.
(gdb doesn’t have command separation but you can add
it though it won’t work for
this case because the b * 0
thing is a hack and it throws a python
exception)
I can b * 0 ; r ; print $rip
to get the address of _start
, but that’s not want I want.
_start
calls __libc_start_main()
, which has the following prototype:
int __libc_start_main(int (*main) (int, char**, char**),
int argc,
char *__unbounded *__unbounded ubp_av,
void (*init) (void),
void (*fini) (void),
void (*rtld_fini) (void),
void (*__unbounded stack_end));
So we can look at the args of __libc_start_main
to get the address of main.
I need to:
- break on the entrypoint
- find
__libc_start_main
call - set breakpoint on: last addr saved on stack before
__libc_start_main
call - continue (until breakpoint for main is hit)
Which is
b * 0
r
b * $rip
d 1
r
Except, there’s a bunch of dlopen
stuff that I don’t expect, since it doesn’t
come up in Binja. objdump -d
’s output matches up with Binja, so there’s
something up here involving dynamic loading.
Why doesn’t code just start executing at the beginning of .text
? It’s clearly
the entry point of the binary. I’m guessing the linker must be doing some
shenanigans.
I notice that rip points to 0x7ffff7dd7c30
, where static disassembly points
to 0x400720
as the address of _start
. I can still x/16i 0x400720
and see
what binja/objdump call _start/the start of .text, but we’re clearly not
there.
Stepped through dl_main
forever, kinda fruitlessly. ldd load
gives
linux-vdso.so.1 => (0x00007ffe1a792000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa016032000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa0163fc000)
We can look at the disassembly objdump -d /lib64/ld-linux-x86-64.so.2 | less
and it’s long.
Breaking on _start
takes me to the first instruction of the dynamic loader,
which is weird to me. I would’ve expected the first instruction of .text
section. But my guess is that the dynamic loader is what’s called first, and it
loads everything else. That or gdb just uses it b/c the binary is stripped. So,
the solution is to pull out the entry point manually and then break on that.
And with that, we can see that rdi
contains 0x400816
, which is main. (had a
brief moment of confusion where I didn’t realize that I was x64 and x64 is
register-based arguments). So, that confusion is likely what got us during
the CTF as well.
Binja does _start
identification automatically anyway, which is nice.