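16.30 Running an ELF with a specific dynamic linker and LIBC

https://scz.617.cn/unix/202104091427.txt

Q:

An x64 environment contains a 32-bit ELF, call it some. some and its
dependencies (including the dynamic linker) are copied to another
environment. The first goal is to get some running; the second is to debug
some and its libraries with gdb. This was a real requirement, not a
contrived one.

In the original environment, "ldd some" confirms some and its dependencies
(including the dynamic linker) as follows:

some
ld-2.12.2.so
libc-2.12.2.so
libcrypto.so.1.0.2
libdl-2.12.2.so

A: scz 2021-04-09 14:27

The problem is fairly involved, but it does have a solution. To make the
demonstration harder, some is migrated to an x64 environment running a
different distribution. If the kernels and GLIBC versions of the two
environments differ hugely you are probably out of luck; that case is not
considered. All operations below are performed in the new environment.

A quick look at the new environment:

$ uname -a
Linux ... 3.10.0-862.14.4.el7.x86_64 #1 SMP ... x86_64 GNU/Linux

$ ldd $(which id)
linux-vdso.so.1 => (0x00007ffff7ffa000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007ffff7bb4000)
libc.so.6 => /lib64/libc.so.6 (0x00007ffff77e7000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007ffff7585000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007ffff7381000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffff7ddb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007ffff7165000)

Check some and its dependencies:

$ chmod +x *
$ file -b some
ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.4.9, not stripped

file reports a minimum kernel of 2.4.9 for the binary while the new kernel
is 3.10.0, but this does not count as a huge difference.

$ ldd some
not a dynamic executable

ldd cannot handle some, so try another way:

$ LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some
-bash: ./some: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

The message is clear: the dynamic linker that some asks for does not exist.
See:

"Disaster recovery after the dynamic linker symlink is broken"
https://scz.617.cn/unix/201809191202.txt

"Viewing and modifying the dynamic linker of an ELF"
https://scz.617.cn/unix/201907041603.txt

patchelf can show and change the dynamic linker used by some:

$ patchelf --print-interpreter some
/lib/ld-linux.so.2

$ cp some some-new
$ patchelf --set-interpreter "./ld-2.12.2.so" some-new
$ patchelf --print-interpreter some-new
./ld-2.12.2.so

Why ld-2.12.2.so? Because that is the dynamic linker some ends up using in
the original environment.

$ LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new
linux-gate.so.1 => (0xf7fdf000)
libcrypto.so.1.0.2 => not found
libc.so.6 => not found
Segmentation fault

Progress: the dependencies are now listed. Time for LD_LIBRARY_PATH.

$ LD_LIBRARY_PATH=. \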
LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new
linux-gate.so.1 => (0xf7fdf000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xf7dc8000)
libc.so.6 => not found
libdl.so.2 => not found
libc.so.6 => not found
Segmentation fault

Only libcrypto is taken from the current directory; libc and libdl are
still not found. Why? Time for LD_DEBUG.

$ LD_DEBUG=libs LD_LIBRARY_PATH=. LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new
123059: find library=libcrypto.so.1.0.2 [0]; searching
123059:  search path=./tls/i686/sse2:./tls/i686:./tls/sse2:./tls:./i686/sse2:./i686:./sse2:. (LD_LIBRARY_PATH)
123059:   trying file=./tls/i686/sse2/libcrypto.so.1.0.2
...
123059:   trying file=./libcrypto.so.1.0.2
123059:
123059: find library=libc.so.6 [0]; searching
123059:  search path=./tls/i686/sse2:./tls/i686:./tls/sse2:./tls:./i686/sse2:./i686:./sse2:. (LD_LIBRARY_PATH)
123059:   trying file=./tls/i686/sse2/libc.so.6
...
123059:   trying file=./libc.so.6
...
123059: find library=libdl.so.2 [0]; searching
...
123059:   trying file=./libdl.so.2
...
linux-gate.so.1 => (0xf7fdf000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xf7dc8000)
libc.so.6 => not found
libdl.so.2 => not found
libc.so.6 => not found
Segmentation fault

some-new is looking for

./libc.so.6
./libdl.so.2

but the current directory only contains

libc-2.12.2.so
libdl-2.12.2.so

so the lookup fails. Symlinks solve this:

$ ln -s libc-2.12.2.so libc.so.6
$ ln -s libdl-2.12.2.so libdl.so.2

Run the ldd trick again:

$ LD_LIBRARY_PATH=. LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new
linux-gate.so.1 => (0xf7fdf000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xf7dc8000)
libc.so.6 => ./libc.so.6 (0xf7c66000)
libdl.so.2 => ./libdl.so.2 (0xf7c62000)
./ld-2.12.2.so (0xf7fe0000)

If the new environment were x86 rather than x64, and the two symlinks above
were missing, you would most likely see:

$ LD_LIBRARY_PATH=. \
LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new
linux-gate.so.1 => (0xb7f79000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xb7d5e000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb7b5a000)
libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xb7b54000)
./ld-2.12.2.so (0xb7f7b000)
Segmentation fault

Here some silently picks up the new environment's libc and libdl, whereas
the requirement was that some use only files from the original environment.
It is easy to fall into this trap; LD_DEBUG shows what happened. Whether
the new environment is x86 or x64, using ldd directly for this kind of
problem is strongly discouraged; use the technique shown above instead.

Next, test debugging some-new with gdb:

$ gdb -q -nx ./some-new

(gdb) info files
...
Entry point: 0x804ea38
...

(gdb) x/15i 0x804ea38
0x804ea38 <_start>:     xor    %ebp,%ebp
0x804ea3a <_start+2>:   pop    %esi
0x804ea3b <_start+3>:   mov    %esp,%ecx
0x804ea3d <_start+5>:   and    $0xfffffff0,%esp
0x804ea40 <_start+8>:   push   %eax
0x804ea41 <_start+9>:   push   %esp
0x804ea42 <_start+10>:  push   %edx
0x804ea43 <_start+11>:  push   $0x80631c0
0x804ea48 <_start+16>:  push   $0x8063160
0x804ea4d <_start+21>:  push   %ecx
0x804ea4e <_start+22>:  push   %esi
0x804ea4f <_start+23>:  push   $0x804d200
0x804ea54 <_start+28>:  call   0x804cef0 <__libc_start_main@plt>
0x804ea59 <_start+33>:  hlt
0x804ea5a <_start+34>:  nop

My first attempt used only LD_LIBRARY_PATH and the symlinks to locate the
ordinary .so files, without changing the dynamic linker used by some.
Debugging some then hit SIGSEGV inside __libc_start_main(), caused by the
version mismatch between the dynamic linker and LIBC.

(gdb) b *0x804d200
Breakpoint 1 at 0x804d200

This sets a breakpoint on main().

set environment LD_LIBRARY_PATH=.
set startup-with-shell off

The two settings above are required. Strictly speaking the second depends
on the environment: it is required on some x86 systems and not on some x64
systems, but I will skip the details.

display/5i $pc
set backtrace past-main on
set backtrace past-entry on
set pagination off
set disassembly-flavor intel

The settings above are optional.

(gdb) r

Breakpoint 1, 0x0804d200 in main ()
1: x/5i $pc
=> 0x804d200
<main>:    push   ebp
   0x804d201 <main+1>:  mov    ebp,esp
   0x804d203 <main+3>:  push   edi
   0x804d204 <main+4>:  push   esi
   0x804d205 <main+5>:  push   ebx

(gdb) si
0x0804d201 in main ()
1: x/5i $pc
=> 0x804d201 <main+1>:  mov    ebp,esp
   0x804d203 <main+3>:  push   edi
   0x804d204 <main+4>:  push   esi
   0x804d205 <main+5>:  push   ebx
   0x804d206 <main+6>:  and    esp,0xfffffff0

(gdb) bt
#0  0x0804d201 in main ()
#1  0xf7c7eb67 in __libc_start_main () from ./libc.so.6
#2  0x0804ea59 in _start ()

(gdb) info proc mappings
...
Start Addr   End Addr       Size     Offset objfile
 0x8048000  0x8066000    0x1e000        0x0 /tmp/scz/some-new
...
0xf7c62000 0xf7c64000     0x2000        0x0 /tmp/scz/libdl-2.12.2.so
...
0xf7c66000 0xf7dc1000   0x15b000        0x0 /tmp/scz/libc-2.12.2.so
...
0xf7dc8000 0xf7fc0000   0x1f8000        0x0 /tmp/scz/libcrypto.so.1.0.2
...
0xf7fe0000 0xf7ffc000    0x1c000        0x0 /tmp/scz/ld-2.12.2.so
...

A: scz & bluerust 2021-04-09

$ cp some some-new-2
$ patchelf --set-interpreter "./ld-2.12.2.so" some-new-2
$ patchelf --set-rpath "." some-new-2
$ patchelf --print-rpath some-new-2
.

In theory this removes the need for LD_LIBRARY_PATH, but there are some
subtle pitfalls.

By default "patchelf --set-rpath" sets DT_RUNPATH, not DT_RPATH; DT_RPATH
is now deprecated for ELF.

$ readelf -d some-new-2 | grep RPATH
(no output)

$ readelf -d some-new-2 | grep RUNPATH
0x0000001d (RUNPATH) Library runpath: [.]
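Which directories ld.so consults depends on which of the two tags is present and on who requests the library. The following is a minimal Python sketch of the order described in ld.so(8), not glibc's actual code. The dict layout and the names search_dirs, DEFAULT_DIRS, some2, some3 are hypothetical; hwcap subdirectories (./tls/i686/sse2 and so on), /etc/ld.so.cache and $ORIGIN are ignored, and each object is assumed to carry at most one of the two tags.

```python
# Toy model of the directory list ld.so builds for one DT_NEEDED lookup.
# Assumption: simplified from ld.so(8); not a reimplementation of glibc.

DEFAULT_DIRS = ["/lib", "/usr/lib"]


def search_dirs(requester, ld_library_path=()):
    """Return the directories searched for a library needed by `requester`.

    `requester` is a dict with optional "rpath" / "runpath" lists and a
    "parent" key naming the object that loaded it (None for the program).
    """
    dirs = []
    if not requester.get("runpath"):
        # DT_RPATH is consulted only when the requester has no DT_RUNPATH,
        # and it is inherited: walk up the chain of loaders.
        obj = requester
        while obj is not None:
            dirs += obj.get("rpath", [])
            obj = obj.get("parent")
    dirs += list(ld_library_path)          # LD_LIBRARY_PATH comes second
    dirs += requester.get("runpath", [])   # DT_RUNPATH: requester's own only
    return dirs + DEFAULT_DIRS


# Program patched with DT_RUNPATH, and a library it loads directly.
some2 = {"name": "some-new-2", "runpath": ["."], "parent": None}
crypto2 = {"name": "libcrypto.so.1.0.2", "parent": some2}

# Program patched with DT_RPATH (patchelf --force-rpath), same library.
some3 = {"name": "some-new-3", "rpath": ["."], "parent": None}
crypto3 = {"name": "libcrypto.so.1.0.2", "parent": some3}

if __name__ == "__main__":
    print("direct dep, RUNPATH  :", search_dirs(some2))
    print("indirect dep, RUNPATH:", search_dirs(crypto2))
    print("indirect dep, RPATH  :", search_dirs(crypto3))
    print("indirect dep, env    :", search_dirs(crypto2, ["."]))
```

In this model a library requested by the program itself is searched in ".", while a library requested by libcrypto is searched in "." only with DT_RPATH or LD_LIBRARY_PATH, never with the program's DT_RUNPATH.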
With DT_RUNPATH set, some-new-2 uses libcrypto and libc from the current
directory, but not libdl from the current directory.

$ LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new-2
linux-gate.so.1 => (0xf7fdf000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xf7dc8000)
libc.so.6 => ./libc.so.6 (0xf7c66000)
./ld-2.12.2.so (0xf7fe0000)
libdl.so.2 => not found
undefined symbol: dlclose, version GLIBC_2.0 (./libcrypto.so.1.0.2)
undefined symbol: dlerror, version GLIBC_2.0 (./libcrypto.so.1.0.2)
undefined symbol: dlsym, version GLIBC_2.0 (./libcrypto.so.1.0.2)
undefined symbol: dladdr, version GLIBC_2.0 (./libcrypto.so.1.0.2)
undefined symbol: dlopen, version GLIBC_2.1 (./libcrypto.so.1.0.2)

libdl was not found. Use LD_DEBUG to see what happened:

$ LD_DEBUG=libs LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new-2
54009: find library=libcrypto.so.1.0.2 [0]; searching
54009:  search path=./tls/i686/sse2:./tls/i686:./tls/sse2:./tls:./i686/sse2:./i686:./sse2:. (RUNPATH from file ./some-new-2)
...
54009:   trying file=./libcrypto.so.1.0.2
54009:
54009: find library=libc.so.6 [0]; searching
54009:  search path=./tls/i686/sse2:./tls/i686:./tls/sse2:./tls:./i686/sse2:./i686:./sse2:. (RUNPATH from file ./some-new-2)
...
54009:   trying file=./libc.so.6
54009:
54009: find library=libdl.so.2 [0]; searching
54009:  search cache=/etc/ld.so.cache
54009:  search path=/lib/tls/i686/sse2:/lib/tls/i686:/lib/tls/sse2:/lib/tls:/lib/i686/sse2:/lib/i686:/lib/sse2:/lib:/usr/lib/tls/i686/sse2:/usr/lib/tls/i686:/usr/lib/tls/sse2:/usr/lib/tls:/usr/lib/i686/sse2:/usr/lib/i686:/usr/lib/sse2:/usr/lib (system search path)
...
54009:
linux-gate.so.1 => (0xf7fdf000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xf7dc8000)
libc.so.6 => ./libc.so.6 (0xf7c66000)
./ld-2.12.2.so (0xf7fe0000)
libdl.so.2 => not found
...

RUNPATH is used when looking up libcrypto and libc, but not when looking up
libdl. The behaviour is the same on x86 and x64.

$ cp some some-new-3
$ patchelf --set-interpreter "./ld-2.12.2.so" some-new-3
$ patchelf --force-rpath --set-rpath "." \
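some-new-3

The commands above set the deprecated DT_RPATH instead of the default
DT_RUNPATH.

$ readelf -d some-new-3 | grep RPATH
0x0000000f (RPATH) Library rpath: [.]

With DT_RPATH set, some-new-3 uses libcrypto, libc and libdl from the
current directory and no longer depends on LD_LIBRARY_PATH.

$ LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some-new-3
linux-gate.so.1 => (0xb7f4f000)
libcrypto.so.1.0.2 => ./libcrypto.so.1.0.2 (0xb7d34000)
libc.so.6 => ./libc.so.6 (0xb7bd2000)
libdl.so.2 => ./libdl.so.2 (0xb7bce000)
./ld-2.12.2.so (0xb7f51000)

The lookup of libdl honours RPATH and LD_LIBRARY_PATH but not RUNPATH. What
is special about libdl?

Search priority decreases in the order RPATH, LD_LIBRARY_PATH, RUNPATH, but
RPATH is officially discouraged.

To sum up, one workable way to make an ELF self-contained:

patchelf --set-interpreter "./ld-*.so" some
patchelf --force-rpath --set-rpath "." some

Put some and its dependencies (including the dynamic linker) in the same
directory, then check with:

LD_TRACE_LOADED_OBJECTS=1 LD_WARN=yes LD_BIND_NOW=yes ./some

D: scz & houyunsong 2021-04-12

$ readelf -d some | grep NEEDED
0x00000001 (NEEDED) Shared library: [libcrypto.so.1.0.2]
0x00000001 (NEEDED) Shared library: [libc.so.6]

$ readelf -d libcrypto.so.1.0.2 | grep NEEDED
0x00000001 (NEEDED) Shared library: [libdl.so.2]
0x00000001 (NEEDED) Shared library: [libc.so.6]

$ readelf -d libdl | grep NEEDED
libdl-2.12.2.so libdl.so.2

$ readelf -d libdl-2.12.2.so | grep NEEDED
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x00000001 (NEEDED) Shared library: [ld-linux.so.2]

$ readelf -d libc-2.12.2.so | grep NEEDED
0x00000001 (NEEDED) Shared library: [ld-linux.so.2]

The direct dependencies of some are only libcrypto and libc; libdl is a
dependency of libcrypto.

$ patchelf --print-needed some
libcrypto.so.1.0.2
libc.so.6

"patchelf --print-needed" shows only the direct dependencies of the target
ELF, while ldd and its variants list all dependencies recursively.

A reasonable guess: during the recursion, indirect dependencies do not
honour RUNPATH but do honour RPATH and LD_LIBRARY_PATH, and the official
advice against RPATH may be related to this. libdl is an indirect
dependency of some. ld.so(8) documents this behaviour: DT_RUNPATH
directories are searched only for an object's own DT_NEEDED entries,
whereas DT_RPATH applies to the whole dependency tree.

D: scz 2021-04-12 14:55

I thought that was the end of it, but Yunhai reported new trouble. In the
original environment some runs as a daemon, but in the new environment the
patched some-new-3 exits by itself and "ps auwx | grep some" finds no
process. This needs investigating.

strace -v -i -f -ff -o some.log ./some-new-3

strace usually produces a lot of output, so write it to files and keep the
output of parent and child processes separate. The parent's strace output
mentions these files:

/var/log/some.log
/var/run/some.ctl
/var/run/some.pid

The tail of some.log shows

Switching to daemon user
A FATAL Error has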
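occured: missing 'daemon' id, exiting

Disassembling some-new-3 with IDA and following the cross-references to
"missing 'daemon' id, exiting" shows that the process exits on purpose
because getpwnam("daemon") fails.

Yunhai found an option that stops some-new-3 from trying to daemonize,
which works around the problem for now, but he wanted me to find out why
getpwnam("daemon") fails and fix it.

Why does getpwnam("daemon") fail? The function just looks up the user named
daemon. It would fail if there were no such user, but /etc/passwd in the
new environment does contain daemon. I briefly suspected a passwd format
difference between the two environments, but that format was settled many
years ago, so this is very unlikely. getpwnam() just reads passwd and fills
in a structure; how complicated can it be, to the point of failing?

gdb -q -nx -x /tmp/gdbinit_x64.txt -x "/tmp/ShellPipeCommand.py" -x "/tmp/GetOffset.py" -ex 'display/5i $pc' ./some-new-3

Parent and child processes are involved this time, so debugging the child
needs special settings:

set follow-fork-mode child
set follow-exec-mode new
catch fork
r

After the catchpoint hits:

ni

Make sure the child is the process being debugged. Set a breakpoint at the
call site of getpwnam("daemon"), single-step, and follow it into libc. The
following locations were reached along the way:

b *__nscd_get_map_ref
b *__nss_lookup

(gdb) bt
#0  0xf7d6e300 in __nscd_get_map_ref () from ./libc.so.6
#1  0xf7d6b5d7 in nscd_getpw_r () from ./libc.so.6
#2  0xf7d6b98d in __nscd_getpwnam_r () from ./libc.so.6
#3  0xf7d050d1 in getpwnam_r@@GLIBC_2.1.2 () from ./libc.so.6
#4  0xf7d04a8f in getpwnam () from ./libc.so.6
#5  0x0804dc0a in main ()

(gdb) bt
#0  0xf7d4bde0 in __nss_lookup () from ./libc.so.6
#1  0xf7d4ce5c in __nss_passwd_lookup2 () from ./libc.so.6
#2  0xf7d05135 in getpwnam_r@@GLIBC_2.1.2 () from ./libc.so.6
#3  0xf7d04a8f in getpwnam () from ./libc.so.6
#4  0x0804dc0a in main ()

I did not expect the implementation underneath getpwnam() to be this
complex. I downloaded the GLIBC source and browsed it with Source Insight.

https://ftp.gnu.org/gnu/libc/glibc-2.12.2.tar.bz2
https://ftp.gnu.org/gnu/libc/glibc-2.1.2.tar.gz

I could not find the body of getpwnam() and made do with reading related
functions, without getting to the point. __nscd_get_map_ref() looks as if
it searches for an in-memory mapping of /etc/passwd; dynamic debugging
showed that it found none.

Rereading getpwnam(3), it occurred to me to check /etc/nsswitch.conf in the
new environment in case the files source was not configured. But
nsswitch.conf contains:

passwd: files sss

Rereading nsswitch.conf(5), I noticed:

/lib/libnss_files.so.X implements "files" source.

So the current directory in the new environment has no libnss_files
library, and getpwnam(3) needs that library to read passwd. Another deep
pit. Confirm with:

$ LD_DEBUG=libs ./some-new-3
...
27081: transferring control: ./some-new-3
27081:
27081: find library=libnss_files.so.2 [0]; searching
27081:  search path=./tls/i686/sse2:./tls/i686:./tls/sse2:./tls:./i686/sse2:./i686:./sse2:.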
(RPATH from file ./some-new-3)
...
27081: find library=libnss_dns.so.2 [0]; searching
27081:  search path=./tls/i686/sse2:./tls/i686:./tls/sse2:./tls:./i686/sse2:./i686:./sse2:. (RPATH from file ./some-new-3)
...
27081: find library=libnss_myhostname.so.2 [0]; searching
...

Because libnss_files is not found, libnss_dns and libnss_myhostname are
tried next. Copy libnss_files-2.12.2.so from the original environment into
the current directory of the new environment and create a symlink:

ln -s libnss_files-2.12.2.so libnss_files.so.2

Run again:

./some-new-3
ps auwx | grep some

The daemonized some is now visible. The original problem is solved; the
rest is additional discussion.

getpwnam("daemon") can fail for at least two reasons:

a) the daemon user does not exist; check /etc/passwd
b) the libnss_files library is not in place; check with LD_DEBUG=libs

When does some-new-3 load libnss_files?

gdb -q -nx -x /tmp/gdbinit_x64.txt -x "/tmp/ShellPipeCommand.py" -x "/tmp/GetOffset.py" -ex 'display/5i $pc' ./some-new-3

catch load nss_files

(gdb) bt
#0  0xf7fef120 in _dl_debug_state () from ./ld-2.12.2.so
#1  0xf7ff283c in dl_open_worker () from ./ld-2.12.2.so
#2  0xf7fee756 in _dl_catch_error () from ./ld-2.12.2.so
#3  0xf7ff2366 in _dl_open () from ./ld-2.12.2.so
#4  0xf7d71992 in do_dlopen () from ./libc.so.6
#5  0xf7fee756 in _dl_catch_error () from ./ld-2.12.2.so
#6  0xf7d71a86 in dlerror_run () from ./libc.so.6
#7  0xf7d71afb in __libc_dlopen_mode () from ./libc.so.6
#8  0xf7d4bc85 in __nss_lookup_function () from ./libc.so.6
#9  0xf7d4bdff in __nss_lookup () from ./libc.so.6
#10 0xf7d4cc7c in __nss_hosts_lookup2 () from ./libc.so.6
#11 0xf7d51e46 in gethostbyname_r@@GLIBC_2.1.2 () from ./libc.so.6
#12 0xf7d51566 in gethostbyname () from ./libc.so.6
#13 0x0805e9c8 in ... ()
#14 0x080554e1 in ...
()
#15 0x0804d382 in main ()

The parent process already loads libnss_files. "catch load" only triggers
when loading succeeds. To intercept every attempt to load a .so, for
example when the library does not exist but you want to know where the
attempt is made, use "b *do_dlopen".

Remove the symlink for a second experiment:

rm -f libnss_files.so.2

gdb -q -nx -x /tmp/gdbinit_x64.txt -x "/tmp/ShellPipeCommand.py" -x "/tmp/GetOffset.py" -ex 'display/5i $pc' ./some-new-3

b *_start
set follow-fork-mode child
set follow-exec-mode new
catch fork
r

After _start() is hit, add a breakpoint:

b *do_dlopen
c

After it hits, look at the backtrace:

(gdb) bt
#0  0xf7d71930 in do_dlopen () from ./libc.so.6
#1  0xf7fee756 in _dl_catch_error () from ./ld-2.12.2.so
#2  0xf7d71a86 in dlerror_run () from ./libc.so.6
#3  0xf7d71afb in __libc_dlopen_mode () from ./libc.so.6
#4  0xf7d4bc85 in __nss_lookup_function () from ./libc.so.6
#5  0xf7d4bdff in __nss_lookup () from ./libc.so.6
#6  0xf7d4cc7c in __nss_hosts_lookup2 () from ./libc.so.6
#7  0xf7d51e46 in gethostbyname_r@@GLIBC_2.1.2 () from ./libc.so.6
#8  0xf7d51566 in gethostbyname () from ./libc.so.6
#9  0x0805e9c8 in ... ()
#10 0x080554e1 in ... ()
#11 0x0804d382 in main ()

(gdb) x/s *(*(char***)($esp+4))
0xffffcfe0: "libnss_files.so.2"

In the parent, "b *do_dlopen" hits two more times, for libnss_dns and
libnss_myhostname. Keep going until "catch fork" hits:

ni
c

In the child, "b *do_dlopen" hits again, this time for libnss_sss.

(gdb) bt
#0  0xf7d71930 in do_dlopen () from ./libc.so.6
#1  0xf7fee756 in _dl_catch_error () from ./ld-2.12.2.so
#2  0xf7d71a86 in dlerror_run () from ./libc.so.6
#3  0xf7d71afb in __libc_dlopen_mode () from ./libc.so.6
#4  0xf7d4bc85 in __nss_lookup_function () from ./libc.so.6
#5  0xf7d4be34 in __nss_lookup () from ./libc.so.6
#6  0xf7d4ce5c in __nss_passwd_lookup2 () from ./libc.so.6
#7  0xf7d05135 in getpwnam_r@@GLIBC_2.1.2 () from ./libc.so.6
#8  0xf7d04a8f in getpwnam () from ./libc.so.6
#9  0x0804dc0a in main ()

(gdb) x/s *(*(char***)($esp+4))
0xffffd140: "libnss_sss.so.2"

It looks as if __nss_passwd_lookup2() is what handles /etc/passwd; I did
not confirm this further.

some-new-3 never calls dlopen() explicitly; gethostbyname() and getpwnam()
call do_dlopen() implicitly.

Do not use "b *dlopen". libc may have no symbol named dlopen, so
"b *dlopen" may actually stop at "dlopen@plt" in some other library. That
is not low-level enough and will quite possibly miss what you want to
catch.

(gdb) info symbol dlopen
dlopen@plt in section
.plt of ./libcrypto.so.1.0.2

(gdb) info symbol do_dlopen
do_dlopen in section .text of ./libc.so.6

For more on this topic, see:

"Debugging techniques for analysing unknown network services"
https://scz.617.cn/unix/201812111322.txt

Restore the symlink for a third experiment:

ln -s libnss_files-2.12.2.so libnss_files.so.2

gdb -q -nx -x /tmp/gdbinit_x64.txt -x "/tmp/ShellPipeCommand.py" -x "/tmp/GetOffset.py" -ex 'display/5i $pc' ./some-new-3

set follow-fork-mode child
set follow-exec-mode new
catch fork
r

After the catchpoint hits:

ni
b *do_dlopen
b *__nscd_get_map_ref
b *__nss_lookup

Because the parent has already loaded libnss_files successfully,
"b *do_dlopen" does not hit in the child; the other two breakpoints still
hit in turn. After c, Ctrl-C cannot interrupt the process, but "kill -INT"
from another terminal works.

The parent's strace log does show the failed attempt to load libnss_files,
but that is hindsight. Many failed system calls do not really affect
functionality, and it is hard to know in advance which failure is the fatal
one.

Suppose some-new-3 exits by itself and there is no /var/log/some.log to
consult. Then the only option is "b *_exit" and a look at the backtrace
once it hits. This is the general-purpose approach.

rm -f libnss_files.so.2

gdb -q -nx -x /tmp/gdbinit_x64.txt -x "/tmp/ShellPipeCommand.py" -x "/tmp/GetOffset.py" -ex 'display/5i $pc' ./some-new-3

set follow-fork-mode child
set follow-exec-mode new
catch fork
r

After the catchpoint hits:

ni
b *_exit
c

(gdb) bt
#0  0xf7d06464 in _exit () from ./libc.so.6
#1  0xf7c95b9a in __run_exit_handlers () from ./libc.so.6
#2  0xf7c95bdf in exit () from ./libc.so.6
#3  0x080609ef in ...
()
#4  0x0804de88 in main ()

To wrap up: this case shows that when checking the dependencies of an ELF
you should not rely only on ldd or its variants. Dynamic loading,
especially implicit dynamic loading, must be taken into account, and
"LD_DEBUG=libs" is more effective. However, "LD_DEBUG=libs" does not show
the libraries a child process tries to load unless it is exported so that
it applies to the child as well. "strace -f -ff" does show the libraries a
child process tries to load.

A: John Reiser 2004-06-29

There is no need to patch out some-new or some-new-3; debug "dynamic linker
+ some" directly.

$ gdb -q -nx ./ld-2.12.2.so

(gdb) disas _dl_start_user
Dump of assembler code for function _dl_start_user:
   0x000010b7 <+0>:     mov    %eax,%edi
   0x000010b9 <+2>:     call   0x10a0
   0x000010be <+7>:     add    $0x1bf36,%ebx
   0x000010c4 <+13>:    mov    -0x188(%ebx),%eax
   0x000010ca <+19>:    pop    %edx
   0x000010cb <+20>:    lea    (%esp,%eax,4),%esp
   0x000010ce <+23>:    sub    %eax,%edx
   0x000010d0 <+25>:    push   %edx
   0x000010d1 <+26>:    mov    0x2c(%ebx),%eax
   0x000010d7 <+32>:    lea    0x8(%esp,%edx,4),%esi
   0x000010db <+36>:    lea    0x4(%esp),%ecx
   0x000010df <+40>:    mov    %esp,%ebp
   0x000010e1 <+42>:    and    $0xfffffff0,%esp
   0x000010e4 <+45>:    push   %eax
   0x000010e5 <+46>:    push   %eax
   0x000010e6 <+47>:    push   %ebp
   0x000010e7 <+48>:    push   %esi
   0x000010e8 <+49>:    xor    %ebp,%ebp
   0x000010ea <+51>:    call   0xe970 <_dl_init_internal>
   0x000010ef <+56>:    lea    -0xe2d4(%ebx),%edx
   0x000010f5 <+62>:    mov    (%esp),%esp
   0x000010f8 <+65>:    jmp    *%edi
   0x000010fa <+67>:    lea    0x0(%esi),%esi
End of assembler dump.

display/5i $pc
set backtrace past-main on
set backtrace past-entry on
set pagination off
set disassembly-flavor intel
set startup-with-shell off
b *_dl_init_internal
r --library-path . ./some

"--library-path ." is equivalent to "set environment LD_LIBRARY_PATH=."

When stopped in _dl_init_internal(), check the memory mappings:

(gdb) info proc mappings
process 13445
Mapped address spaces:

Start Addr   End Addr       Size     Offset objfile
 0x8048000  0x8066000    0x1e000        0x0 /tmp/scz/some
 0x8066000  0x8067000     0x1000    0x1e000 /tmp/scz/some
 0x8067000  0x8068000     0x1000    0x1f000 /tmp/scz/some
...
Confirm the entry point (e_entry) of some:

(gdb) shell objdump -f some | grep start
start address 0x0804ea38

Locate main():

(gdb) x/15i 0x0804ea38
0x804ea38:   xor    ebp,ebp
0x804ea3a:   pop    esi
0x804ea3b:   mov    ecx,esp
0x804ea3d:   and    esp,0xfffffff0
0x804ea40:   push   eax
0x804ea41:   push   esp
0x804ea42:   push   edx
0x804ea43:   push   0x80631c0
0x804ea48:   push   0x8063160
0x804ea4d:   push   ecx
0x804ea4e:   push   esi
0x804ea4f:   push   0x804d200
0x804ea54:   call   0x804cef0
0x804ea59:   hlt
0x804ea5a:   nop

(gdb) b *0x804d200
Breakpoint 2 at 0x804d200

(gdb) c
Continuing.

Breakpoint 2, 0x0804d200 in ?? ()
1: x/5i $pc
=> 0x804d200:   push   ebp
   0x804d201:   mov    ebp,esp
   0x804d203:   push   edi
   0x804d204:   push   esi
   0x804d205:   push   ebx

Execution is now inside some, but without symbols. The symbols can be
loaded by hand. See:

"2.49 Loading debug information in GDB"

Determine the base addresses of .text and .data in some:

(gdb) shell objdump -h some | grep -F .text
 11 .text         00015ffc  0804d200  0804d200  00005200  2**4

(gdb) shell objdump -h some | grep -F .data
 21 .data         000000ac  08067000  08067000  0001f000  2**5

Load the symbols by hand:

(gdb) add-symbol-file some 0x804d200 -s .data 0x8067000
add symbol table from file "some" at
.text_addr = 0x804d200
.data_addr = 0x8067000

(gdb) x/5i $pc
=> 0x804d200
<main>:    push   ebp
   0x804d201 <main+1>:  mov    ebp,esp
   0x804d203 <main+3>:  push   edi
   0x804d204 <main+4>:  push   esi
   0x804d205 <main+5>:  push   ebx

(gdb) bt
#0  0x0804d200 in main ()
#1  0xf7c7eb67 in __libc_start_main () from ./libc.so.6
#2  0x0804ea59 in _start ()

A: scz 2021-04-10 14:36

John Reiser's method is good and can be streamlined.

$ gdb -q -nx ./ld-2.12.2.so

display/5i $pc
set backtrace past-main on
set backtrace past-entry on
set pagination off
set disassembly-flavor intel
set startup-with-shell off
b *_dl_start_user
r --library-path . ./some

Execution stops in _dl_start_user():

(gdb) disas _dl_start_user
Dump of assembler code for function _dl_start_user:
=> 0xf7fe10b7 <+0>:     mov    edi,eax
   0xf7fe10b9 <+2>:     call   0xf7fe10a0
   0xf7fe10be <+7>:     add    ebx,0x1bf36
   0xf7fe10c4 <+13>:    mov    eax,DWORD PTR [ebx-0x188]
   0xf7fe10ca <+19>:    pop    edx
   0xf7fe10cb <+20>:    lea    esp,[esp+eax*4]
   0xf7fe10ce <+23>:    sub    edx,eax
   0xf7fe10d0 <+25>:    push   edx
   0xf7fe10d1 <+26>:    mov    eax,DWORD PTR [ebx+0x2c]
   0xf7fe10d7 <+32>:    lea    esi,[esp+edx*4+0x8]
   0xf7fe10db <+36>:    lea    ecx,[esp+0x4]
   0xf7fe10df <+40>:    mov    ebp,esp
   0xf7fe10e1 <+42>:    and    esp,0xfffffff0
   0xf7fe10e4 <+45>:    push   eax
   0xf7fe10e5 <+46>:    push   eax
   0xf7fe10e6 <+47>:    push   ebp
   0xf7fe10e7 <+48>:    push   esi
   0xf7fe10e8 <+49>:    xor    ebp,ebp
   0xf7fe10ea <+51>:    call   0xf7fee970 <_dl_init_internal>
   0xf7fe10ef <+56>:    lea    edx,[ebx-0xe2d4]
   0xf7fe10f5 <+62>:    mov    esp,DWORD PTR [esp]
   0xf7fe10f8 <+65>:    jmp    edi
   0xf7fe10fa <+67>:    lea    esi,[esi+0x0]
End of assembler dump.

The final "jmp edi" is the jump to the e_entry of some.

b *_dl_start_user+65
c
x/15i $edi

See:

"24.27 Functions executed before main()"
https://scz.617.cn/unix/201507251602.txt

"Unix series (4): Unix anti-debugging techniques" (Unix_1.txt)

Before _start(), the flow is roughly as follows:

ld-linux.so e_entry
    _dl_start               // returns "normal e_entry"
        _dl_start_final
            _dl_sysdep_start
                dl_main     // handles LD_PRELOAD, loads fake.so; symbol resolution is already hijacked
    _dl_init
        call_init           // calls .init_array[] of fake.so
    normal e_entry
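The "normal e_entry" that the dynamic linker finally jumps to, and the interpreter path that patchelf rewrites, are both plain fields in the ELF file. As a hedged illustration, here is a minimal Python sketch that reads them without objdump or patchelf. The function name elf_entry_and_interp is hypothetical, only little-endian ELF32/ELF64 is handled, and no validation beyond the magic bytes is attempted.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_entry_and_interp(data):
    """Return (e_entry, interpreter path or None) for a little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[5] != 1:  # EI_DATA: 1 = little-endian
        raise ValueError("only little-endian ELF is handled in this sketch")
    is64 = data[4] == 2  # EI_CLASS: 1 = ELF32, 2 = ELF64
    if is64:
        e_entry, e_phoff = struct.unpack_from("<QQ", data, 24)
        e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    else:
        e_entry, e_phoff = struct.unpack_from("<II", data, 24)
        e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)

    interp = None
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, off)
        if p_type != PT_INTERP:
            continue
        # Field order differs between Elf32_Phdr and Elf64_Phdr.
        if is64:
            (p_offset,) = struct.unpack_from("<Q", data, off + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, off + 32)
        else:
            (p_offset,) = struct.unpack_from("<I", data, off + 4)
            (p_filesz,) = struct.unpack_from("<I", data, off + 16)
        interp = data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
        break
    return e_entry, interp


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        entry, interp = elf_entry_and_interp(f.read())
    print("e_entry     = %#x" % entry)
    print("interpreter = %s" % interp)
```

Run against the unpatched some, this should agree with "objdump -f some" and "patchelf --print-interpreter some" shown earlier; run against ld-2.12.2.so it should report no interpreter.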