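Title:   How to Debug a Child Process with GDB on a Legacy System
Created: 2019-07-05 18:34
Updated: 2019-07-09 09:22
Link:    https://scz.617.cn/unix/201907051834.txt

This article is not an introduction. It has strong prerequisites: readers need
a solid understanding of operating systems and GDB. Beginners should not waste
their time here.

On Windows, windbg supports the "-o" command-line option, or you can enter
".childdbg 1" at the interactive prompt to explicitly enable child-process
debugging.

On modern Linux, gdb supports child-process debugging through these settings:

set follow-fork-mode child
set follow-exec-mode new
set detach-on-fork on

See also:

--------------------------------------------------------------------------
"How to Debug a Child Process with GDB"
https://scz.617.cn/unix/201206011217.txt

"How to Stop as Early as Possible When GDB Starts the Debuggee"
https://scz.617.cn/unix/201901161404.txt

"start the inferior without using a shell"
https://scz.617.cn/unix/201901171036.txt
--------------------------------------------------------------------------

Across the various *nix variants, whether gdb directly supports child-process
debugging depends on the specific kernel, libc, and gdb implementations, so the
gdb settings above may not work. This article considers debugging a child
process on a legacy system under extremely inconvenient conditions.

The following approaches are out of scope:

a) Intercepting fork() and modifying its return value to force execution down
   the child path
b) Having the source code and inserting sleep() into the child path to wait
   for an attach

The test system is x86/FreeBSD 6.1. It is old, but some devices are derived
from it, so this odd requirement clearly does not come from an ivory tower.
The experiment: debug what happens when id is run from csh.

Locate the PLT entries for fork(), exec(), and sleep() in csh:

$ objdump -j .plt -d csh | grep -A3 "fork@plt>:"
08049ea4 <fork@plt>:
 8049ea4:       ff 25 c4 fa 08 08       jmp    *0x808fac4
 8049eaa:       68 98 02 00 00          push   $0x298
 8049eaf:       e9 b0 fa ff ff          jmp    8049964 <_init+0x14>
--
0804a054 <vfork@plt>:
 804a054:       ff 25 30 fb 08 08       jmp    *0x808fb30
 804a05a:       68 70 03 00 00          push   $0x370
 804a05f:       e9 00 f9 ff ff          jmp    8049964 <_init+0x14>

$ objdump -j .plt -d csh | grep -A3 "exec.*@plt>:"
08049eb4 <execv@plt>:
 8049eb4:       ff 25 c8 fa 08 08       jmp    *0x808fac8
 8049eba:       68 a0 02 00 00          push   $0x2a0
 8049ebf:       e9 a0 fa ff ff          jmp    8049964 <_init+0x14>

$ objdump -j .plt -d csh | grep -A3 "sleep@plt>:"
08049ba4 <sleep@plt>:
 8049ba4:       ff 25 04 fa 08 08       jmp    *0x808fa04
 8049baa:       68 18 01 00 00          push   $0x118
 8049baf:       e9 b0 fd ff ff          jmp    8049964 <_init+0x14>

This version of csh has fork(), vfork(), execv(), and sleep().

Get the PID of csh:

$ echo $$
28416

Attach to it from another SSH session:

$ gdb-7.6 -q -nx -x /tmp/gdbinit_x86_bsd.txt -p 28416
(gdb) display/5i $pc

Dynamic debugging confirms that csh uses vfork() when it runs id. Analyzing the
code around the vfork() call site with IDA:

--------------------------------------------------------------------------
08064DB8 A1 38 08 09 08                 mov     eax, ds:dword_8090838
...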
08064DD8 85 C0                          test    eax, eax
...
08064E0E 0F 84 1A 03 00 00              jz      loc_806512E
08064E14 E8 8B 50 FE FF                 call    _fork
...
0806512E E8 21 4F FE FF                 call    _vfork
--------------------------------------------------------------------------
if ( dword_8090838 )
{
    pid = fork();
}
else
{
    pid = vfork();
}
--------------------------------------------------------------------------

The following snippet sits at the first instruction executed when pid is 0,
that is, the start of the child path after vfork():

--------------------------------------------------------------------------
08064E31 A1 E0 10 0A 08                 mov     eax, ds:dword_80A10E0
08064E36 31 FF                          xor     edi, edi
08064E38 85 C0                          test    eax, eax
08064E3A 0F 85 9C 05 00 00              jnz     loc_80653DC
--------------------------------------------------------------------------

(gdb) b *0x8064e36
Breakpoint 1 at 0x8064e36
(gdb) c

If a breakpoint is set directly at 0x8064e36 and id is run in csh, this is the
result:

$ id
Trace/BPT trap (core dumped)

The child hits the int3 (0xcc) and a SIGTRAP is raised. Ideally, a gdb that
supports child-process debugging would catch and handle that signal. In the
"FreeBSD 6.1 + GDB 7.6" scenario, however, gdb cannot debug the child directly
and does not catch it, so the signal is delivered straight to the child. The
child certainly has no SIGTRAP handler installed, and the default action for
the signal terminates it.

So simple child-process debugging is known not to work. The idea is to patch a
code snippet in the child path to create an infinite loop, then attach to the
child with GDB from another SSH session. The patch point is 0x8064e36.

$ kstoolex x32nasm "jmp ." 0 0x8064e36
0000000008064e36 [ eb fe ] jmp .

$ rasm2 -a x86 -b 32 -s intel -o 0x8064e36 -D "eb fe"
0x08064e36   2                     ebfe  jmp 0x8064e36

(gdb) x/3i 0x8064e36
   0x8064e36:   xor    edi,edi
   0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc
(gdb) x/1wx 0x8064e36
0x8064e36:      0xc085ff31
(gdb) set *(short int*)0x8064e36=0xfeeb
(gdb) x/1i 0x8064e36
   0x8064e36:   jmp    0x8064e36
(gdb) x/1wx 0x8064e36
0x8064e36:      0xc085feeb

There is now an infinite loop at 0x8064e36.

(gdb) c

Run id in csh to trigger the infinite loop in the child. Find the child PID
from another SSH session:

$ ps l | grep csh | grep 28416
 1000 28416 28601   0   8  0  4728  3632 ppwait DX+   p1    0:00.01 csh
 1000 28605 28416 172 117  0  4728  3632 -      RV+   p1    0:46.02 csh

$ ps uwx -p 28605
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
scz   28605 99.0  1.4  4728  3632  p1  RV+  11:25AM  13:00.37 csh

28605 is the child PID. The 99% CPU usage is the symptom of the infinite loop.

The STAT column is interpreted as follows:

D       Marks a process in disk (or other short term, uninterruptible) wait.
I       Marks a process that is idle (sleeping for longer than about 20
        seconds).
L       Marks a process that is waiting to acquire a lock.
R       Marks a runnable process.
S       Marks a process that is sleeping for less than about 20 seconds.
T       Marks a stopped process.
W       Marks an idle interrupt thread.
Z       Marks a dead process (a ``zombie'').

+       The process is in the foreground process group of its control
        terminal.
<       The process has raised CPU scheduling priority.
E       The process is trying to exit.
J       Marks a process which is in jail(2). The hostname of the prison can
        be found in /proc/<pid>/status.
L       The process has pages locked in core (for example, for raw I/O).
N       The process has reduced CPU scheduling priority (see setpriority(2)).
s       The process is a session leader.
V       The process is suspended during a vfork(2).
W       The process is swapped out.
X       The process is being traced or debugged.

FreeBSD and Linux display the STAT column differently. Do not consult the
Linux man page.

Attach to the child from another SSH session:

$ gdb-7.6 -q -nx -x /tmp/gdbinit_x86_bsd.txt -p 28605

After the attach, the expected "(gdb)" prompt never appeared. This gdb hung,
the first gdb hung as well, and Ctrl-C had no effect in either. I expected the
second gdb to stop near 0x8064e36, but evidently something else interferes.

$ ps uwx -p 28605
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
scz   28605 99.0  1.4  4728  3632  p1  RXV+ 11:25AM  18:49.31 csh

The X in the STAT column shows that the child is being debugged, so why did
gdb not stop?

$ kill -9 28605

The second gdb shows:

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb)

The first gdb is no longer hung and can be interrupted with Ctrl-C.

Since the infinite loop misbehaves, try sleep() instead.

$ echo -n -e "push 0x7fffffff\ncall 0x8049ba4" | kstoolex x32nasm - 0 0x8064e36
0000000008064e36 [ 68 ff ff ff 7f ] push 0x7fffffff
0000000008064e3b [ e8 64 4d fe ff ] call 0x8049ba4

$ echo -n -e "push 0x7fffffff\ncall 0x8049ba4" | kstoolex x32nasm - q 0x8064e36
68 ff ff ff 7f e8 64 4d fe ff

$ rasm2 -a x86 -b 32 -s intel -o 0x8064e36 -D "68 ff ff ff 7f e8 64 4d fe ff"
0x08064e36   5               68ffffff7f  push 0x7fffffff
0x08064e3b   5               e8644dfeff  call 0x8049ba4

Inspect the original bytes:

(gdb) x/3i 0x8064e36
   0x8064e36:   xor    edi,edi
   0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc
(gdb) x/3wx 0x8064e36
0x8064e36:      0xc085ff31      0x059c850f      0x94a10000
(gdb) db 0x8064e36 10
08064e36: 31 ff 85 c0 0f 85 9c 05 00 00   1.........

Patch:

set *(int*)0x8064e36=0xffffff68
set *(int*)(0x8064e36+4)=0x4d64e87f
set *(short int*)(0x8064e36+8)=0xfffe

UnPatch:

set *(int*)0x8064e36=0xc085ff31
set *(int*)(0x8064e36+4)=0x059c850f
set *(int*)(0x8064e36+8)=0x94a10000

Inspect the patched code:

(gdb) x/3i 0x8064e36
   0x8064e36:   push   0x7fffffff
   0x8064e3b:   call   0x8049ba4
   0x8064e40:   mov    eax,ds:0x80d6194
(gdb) x/3wx 0x8064e36
0x8064e36:      0xffffff68      0x4d64e87f      0x94a1fffe
(gdb) db 0x8064e36 10
08064e36: 68 ff ff ff 7f e8 64 4d fe ff   h.....dM..
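
The patch bytes and the three "set" commands above can be re-derived by hand.
The following is only a sketch in Python, not part of the original workflow
(which used kstoolex and rasm2). The helper names push_imm32, call_rel32, and
gdb_set_commands are made up for illustration. It encodes "push imm32" and
"call rel32" directly and splits the result into little-endian stores.

```python
import struct

PATCH_ADDR = 0x8064E36   # patch point in csh, taken from the article
SLEEP_PLT = 0x8049BA4    # sleep@plt in csh, taken from the article


def push_imm32(value):
    """Encode x86 `push imm32`: opcode 0x68 followed by a little-endian imm32."""
    return b"\x68" + struct.pack("<I", value & 0xFFFFFFFF)


def call_rel32(insn_addr, target):
    """Encode x86 `call rel32`: opcode 0xE8 plus a displacement that is
    relative to the address of the NEXT instruction (insn_addr + 5)."""
    rel = (target - (insn_addr + 5)) & 0xFFFFFFFF
    return b"\xe8" + struct.pack("<I", rel)


def gdb_set_commands(addr, data):
    """Split `data` into 4-, 2-, or 1-byte little-endian stores and render
    each one as a gdb `set` command (hypothetical helper)."""
    cmds = []
    off = 0
    while off < len(data):
        remaining = len(data) - off
        if remaining >= 4:
            size, ctype, fmt = 4, "int", "<I"
        elif remaining >= 2:
            size, ctype, fmt = 2, "short int", "<H"
        else:
            size, ctype, fmt = 1, "char", "<B"
        (value,) = struct.unpack_from(fmt, data, off)
        cmds.append(f"set *({ctype}*)0x{addr + off:x}=0x{value:0{size * 2}x}")
        off += size
    return cmds


# The push occupies 5 bytes, so the call instruction starts at PATCH_ADDR + 5.
patch = push_imm32(0x7FFFFFFF) + call_rel32(PATCH_ADDR + 5, SLEEP_PLT)
print(patch.hex(" "))
for cmd in gdb_set_commands(PATCH_ADDR, patch):
    print(cmd)
```

The generated commands use absolute addresses (0x8064e3a, 0x8064e3e) instead
of the "0x8064e36+4" form, but they write the same bytes. The same arithmetic
explains "jmp .": a short jump (eb) with displacement -2 (fe) lands on itself.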

0x8064e36 now calls sleep(0x7fffffff). sleep() takes seconds, so this is long
enough.

(gdb) c

Run id in csh to trigger sleep(0x7fffffff) in the child. Find the child PID
from another SSH session:

$ ps l | grep csh | grep 28416
 1000 28416 28601   0   8  0  4728  3632 ppwait DX+   p1    0:00.01 csh
 1000 28704 28416   0   8  0  4728  3632 nanslp IV+   p1    0:00.00 csh

$ ps uwx -p 28704
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
scz   28704  0.0  1.4  4728  3632  p1  IV+  12:01PM   0:00.00 csh

28704 is the child PID. The I in the STAT column shows that the child is in
sleep().

Attach to the child from another SSH session:

$ gdb-7.6 -q -nx -x /tmp/gdbinit_x86_bsd.txt -p 28704
0x2804e3c8 in .rtld_start () from /libexec/ld-elf.so.1
(gdb) display/5i $pc
1: x/5i $pc
=> 0x2804e3c8 <.rtld_start>:    xor    ebp,ebp
   0x2804e3ca <.rtld_start+2>:  mov    eax,esp
   0x2804e3cc <.rtld_start+4>:  mov    esi,esp
   0x2804e3ce <.rtld_start+6>:  and    esp,0xfffffff0
   0x2804e3d1 <.rtld_start+9>:  sub    esp,0x10

Unlike the infinite-loop case, the child in sleep() does stop after the attach:

(gdb) x/3i 0x8064e36
   0x8064e36:   Cannot access memory at address 0x8064e36
(gdb) info file
Symbols from "/usr/bin/id".
Unix child process:
        Using the running image of attached process 28704.
        While running this, GDB does not access memory from...
Local exec file:
        `/usr/bin/id', file type elf32-i386-freebsd.
        Entry point: 0x80489ec
...

I expected the child to still be csh at this point, but gdb shows id, the
memory at 0x8064e36 is inaccessible, and $PC is neither in sleep() nor near
0x8064e36 as I had imagined.

Execution has already switched into the id process image, but very early,
before id's e_entry has been reached. A temporary breakpoint on id's e_entry
will be hit:

(gdb) tb *0x80489ec
Temporary breakpoint 1 at 0x80489ec
(gdb) c
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x2804e3c8 in .rtld_start () from /libexec/ld-elf.so.1
1: x/5i $pc
=> 0x2804e3c8 <.rtld_start>:    xor    ebp,ebp
   0x2804e3ca <.rtld_start+2>:  mov    eax,esp
   0x2804e3cc <.rtld_start+4>:  mov    esi,esp
   0x2804e3ce <.rtld_start+6>:  and    esp,0xfffffff0
   0x2804e3d1 <.rtld_start+9>:  sub    esp,0x10
(gdb) c
Continuing.

Temporary breakpoint 1, 0x080489ec in ?? ()
1: x/5i $pc
=> 0x80489ec:   push   ebp
   0x80489ed:   mov    ebp,esp
   0x80489ef:   push   edi
   0x80489f0:   push   esi
   0x80489f1:   push   ebx
(gdb) c
Continuing.
[Inferior 1 (process 28704) exited normally]

Keep issuing c and id finishes normally, with its output printed.

The original logic at 0x8064e36 is:

--------------------------------------------------------------------------
v35 = 0;
if ( dword_80A10E0 )
{
    sigsetmask( ::mask );
    dword_80A10E0 = 0;
}
--------------------------------------------------------------------------

Once sleep() is patched in, the snippet above never executes, but this small
difference does not affect the later execution of id, which is why id finished
normally after the repeated c commands.

At this point a question comes to mind: is this weird behavior related to
vfork()?

vfork(2) says:

--------------------------------------------------------------------------
The vfork() system call can be used to create new processes without fully
copying the address space of the old process, which is horrendously
inefficient in a paged environment. It is useful when the purpose of fork(2)
would have been to create a new system context for an execve(2). The vfork()
system call differs from fork(2) in that the child borrows the parent's
memory and thread of control until a call to execve(2) or an exit (either by
a call to _exit(2) or abnormally). The parent process is suspended while the
child is using its resources.

The vfork() system call returns 0 in the child's context and (later) the pid
of the child in the parent's context.

The vfork() system call can normally be used just like fork(2). It does not
work, however, to return while running in the child's context from the
procedure that called vfork() since the eventual return from vfork() would
then return to a no longer existent stack frame. Be careful, also, to call
_exit(2) rather than exit(3) if you cannot execve(2), since exit(3) will
flush and close standard I/O channels, and thereby mess up the parent
processes standard I/O data structures. (Even with fork(2) it is wrong to
call exit(3) since buffered data would then be flushed twice.)

This system call will be eliminated when proper system sharing mechanisms are
implemented. Users should not depend on the memory sharing semantics of
vfork() as it will, in that case, be made synonymous to fork(2).
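
To avoid a possible deadlock situation, processes that are children in the
middle of a vfork() are never sent SIGTTOU or SIGTTIN signals; rather, output
or ioctl(2) calls are allowed and input attempts result in an end-of-file
indication.
--------------------------------------------------------------------------

In section 8.4 of APUE, W. Richard Stevens compares fork() and vfork(). Two
points deserve attention:

--------------------------------------------------------------------------
a) Until it calls exec*(), a child created by vfork() shares the parent's
   address space. Any memory the child modifies during this period, including
   global variables and the stack, affects the parent.

b) A child created by vfork() is scheduled ahead of the parent. The parent
   gets a chance to run only after the child calls exec*(). If the child
   depends on any action by the parent during this period, a deadlock results.
--------------------------------------------------------------------------

A guess at the behavior observed above when attaching to the child in the
infinite loop and in sleep():

--------------------------------------------------------------------------
For a child created by vfork(), an attaching GDB reaches the "(gdb)" prompt
only after the child calls exec*(). Execution before that point is affected by
the attach, but gdb does not stop at the prompt.

In the infinite-loop case, the child never calls exec*(), so the attached gdb
never stops and the second gdb appears hung. The parent never gets scheduled,
so the first gdb appears hung too.

In the sleep() case, the child's nanosleep() is interrupted by the attach and
returns -1 with errno set to EINTR. The child continues from 0x8064e40 until
it calls exec*() and then stops at the "(gdb)" prompt. This explains why
"info file" shows id rather than csh, and why 0x8064e36 cannot be accessed:
the address-space layout is no longer that of csh.
--------------------------------------------------------------------------

The above is only a plausible guess. I did not debug the FreeBSD kernel or the
GDB 7.6 code.

The code at 0x8064db8 shows that fork() is called if the value at 0x8090838 is
nonzero, and vfork() otherwise.

(gdb) x/1wx 0x8090838
0x8090838:      0x00000000

As gdb shows, the value at 0x8090838 is 0. As an experiment, set it to 1 to
force the parent to call fork(), and see whether the child can be debugged this
time.

set *(int*)0x8090838=1
set *(int*)0x8064e36=0xffffff68
set *(int*)(0x8064e36+4)=0x4d64e87f
set *(short int*)(0x8064e36+8)=0xfffe

Test the sleep() case.

(gdb) c

Run id in csh to trigger sleep(0x7fffffff) in the child. Find the child PID
from another SSH session:

$ ps l | grep csh | grep 28416
 1000 28416 28601   0  20  0  4728  3632 pause  IX+   p1    0:00.02 csh
 1000 28888 28416   0   8  0  4728  3632 nanslp I+    p1    0:00.00 csh

$ ps uwx -p 28888
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
scz   28888  0.0  1.4  4728  3632  p1  I+    1:15PM   0:00.00 csh

28888 is the child PID.

Attach to the child from another SSH session:

$ gdb-7.6 -q -nx -x /tmp/gdbinit_x86_bsd.txt -p 28888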
0x281aa217 in nanosleep () from /lib/libc.so.6
(gdb) display/5i $pc
1: x/5i $pc
=> 0x281aa217:  jb     0x281aa1fc
   0x281aa219:  ret
   0x281aa21a:  nop
   0x281aa21b:  nop
   0x281aa21c:  push   ebx
(gdb) bt
#0  0x281aa217 in nanosleep () from /lib/libc.so.6
#1  0x2818e669 in sleep () from /lib/libc.so.6
#2  0x08064e40 in ?? ()
#3  0x08064a24 in ?? ()
#4  0x0804a8f5 in ?? ()
#5  0x0804c8bc in ?? ()
#6  0x0804a26a in ?? ()
#7  0x00000001 in ?? ()
(gdb) info file
Symbols from "/bin/csh".
Unix child process:
        Using the running image of attached process 28888.
        While running this, GDB does not access memory from...
Local exec file:
        `/bin/csh', file type elf32-i386-freebsd.
        Entry point: 0x804a1f4
...

As expected, it stops in nanosleep(), the backtrace contains 0x8064e40, and the
child is still csh at this point.

Restore the child to its pre-patch state:

set *(int*)0x8090838=0
set *(int*)0x8064e36=0xc085ff31
set *(int*)(0x8064e36+4)=0x059c850f
set *(int*)(0x8064e36+8)=0x94a10000

(gdb) x/1wx 0x8090838
0x8090838:      0x00000000
(gdb) x/3i 0x8064e36
   0x8064e36:   xor    edi,edi
   0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc

Modify sleep()'s return address saved on the stack so that it points to
0x8064e36:

(gdb) frame 1
#1  0x2818e669 in sleep () from /lib/libc.so.6
(gdb) x/2wx $ebp
0xbfbf1be4:     0xbfbf40d8      0x08064e40
(gdb) set *(int*)($ebp+4)=0x8064e36
(gdb) bt 3
#0  0x281aa217 in nanosleep () from /lib/libc.so.6
#1  0x2818e669 in sleep () from /lib/libc.so.6
#2  0x08064e36 in ?? ()
(More stack frames follow...)

Set a temporary breakpoint at 0x8064e36 and hit it:

(gdb) tb *0x08064e36
Temporary breakpoint 1 at 0x8064e36
(gdb) c
Continuing.

Temporary breakpoint 1, 0x08064e36 in ?? ()
1: x/5i $pc
=> 0x8064e36:   xor    edi,edi
   0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc
   0x8064e40:   mov    eax,ds:0x80d6194
   0x8064e45:   test   eax,eax
(gdb) i r eax esp
eax            0x7ffffff7       2147483639
esp            0xbfbf1bec       0xbfbf1bec

Restore eax. The instruction at 0x8064e31, just before the patch point, loads
eax from 0x80a10e0, so eax should hold that value:

(gdb) x/1wx 0x80a10e0
0x80a10e0:      0x00000001
(gdb) set $eax=1

The patched "push 0x7fffffff" at 0x8064e36 consumed an extra 4 bytes of stack,
so esp must be restored:

(gdb) set $esp+=4

The child is now strictly back in its original, unpatched state, stopped at
0x8064e36, the point identified earlier as the start of the child path.

Intercept the call to execv() in the child:

(gdb) tb *0x8049eb4
(gdb) c
Continuing.

Temporary breakpoint 2, 0x08049eb4 in execv@plt ()
1: x/5i $pc
=> 0x8049eb4 <execv@plt>:       jmp    DWORD PTR ds:0x808fac8
   0x8049eba <execv@plt+6>:     push   0x2a0
   0x8049ebf <execv@plt+11>:    jmp    0x8049964

It stops as hoped. Locate the call site:

(gdb) x/1wx $esp
0xbfbf1b48:     0x08052c57
(gdb) bt
#0  0x08049eb4 in execv@plt ()
#1  0x08052c57 in ?? ()
#2  0x08053219 in ?? ()
#3  0x080648c5 in ?? ()
#4  0x08064a24 in ?? ()
#5  0x0804a8f5 in ?? ()
#6  0x0804c8bc in ?? ()
#7  0x0804a26a in ?? ()
#8  0x00000001 in ?? ()

--------------------------------------------------------------------------
08052C52 E8 5D 72 FF FF                 call    _execv
08052C57 C7 05 60 83 0F 08 00 00+       mov     ds:dword_80F8360, 0
--------------------------------------------------------------------------
080648C0 E8 AB E6 FE FF                 call    sub_8052F70
080648C5 83 C4 10                       add     esp, 10h
--------------------------------------------------------------------------

0x80648c0 (the call that eventually reaches execv) and 0x806512e (the call to
vfork) are in the same function, sub_80643B8.

(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x2804e3c8 in ?? ()
1: x/5i $pc
=> 0x2804e3c8:  xor    ebp,ebp
   0x2804e3ca:  mov    eax,esp
   0x2804e3cc:  mov    esi,esp
   0x2804e3ce:  and    esp,0xfffffff0
   0x2804e3d1:  sub    esp,0x10
(gdb) info file
Symbols from "/bin/csh".
Unix child process:
        Using the running image of attached process 28888.
        While running this, GDB does not access memory from...
Local exec file:
        `/bin/csh', file type elf32-i386-freebsd.
        Entry point: 0x804a1f4
...

The SIGTRAP received after c is caused by execv(). The current process has
already changed from csh to id, but "info file" does not reflect the change
and still wrongly reports csh. "ps uwx -p 28888" shows that the COMMAND column
has changed from csh to id.

Note that 0x2804e3c8 appeared once before, where it carried the symbol
".rtld_start".

A temporary breakpoint on id's e_entry will be hit:

(gdb) tb *0x80489ec
Temporary breakpoint 3 at 0x80489ec
(gdb) c
Continuing.

Temporary breakpoint 3, 0x080489ec in ?? ()
2: x/5i $pc
=> 0x80489ec:   push   ebp
   0x80489ed:   mov    ebp,esp
   0x80489ef:   push   edi
   0x80489f0:   push   esi
   0x80489f1:   push   ebx

This system does not support "catch exec", but the demonstration above
achieves the same effect in practice.

So with fork() forced, the child can be debugged successfully both before and
after execv(). What about using the infinite loop instead of sleep() once
fork() is forced?
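
set *(int*)0x8090838=1
set *(short int*)0x8064e36=0xfeeb

Test the infinite-loop case.

(gdb) c

Run id in csh to trigger the infinite loop in the child. Find the child PID
from another SSH session:

$ ps l | grep csh | grep 28416
 1000 28416 28601   0  20  0  4728  3632 pause  SX+   p1    0:00.03 csh
 1000 29526 28416 119 110  0  4728  3632 -      R+    p1    0:02.27 csh

$ ps uwx -p 29526
USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
scz   29526 97.0  1.4  4728  3632  p1  R+    6:09PM   0:17.18 csh

29526 is the child PID.

Attach to the child from another SSH session:

$ gdb-7.6 -q -nx -x /tmp/gdbinit_x86_bsd.txt -p 29526
0x08064e36 in ?? ()
(gdb) display/5i $pc
1: x/5i $pc
=> 0x8064e36:   jmp    0x8064e36
   0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc
   0x8064e40:   mov    eax,ds:0x80d6194
   0x8064e45:   test   eax,eax

It stops at 0x8064e36 as hoped, right on the patched infinite loop, and the
child is still csh at this point.

Restore the child to its pre-patch state:

set *(int*)0x8090838=0
set *(int*)0x8064e36=0xc085ff31

Single-stepping works normally:

(gdb) x/3i $pc
=> 0x8064e36:   xor    edi,edi
   0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc
(gdb) i r eax esp
eax            0x1      1
esp            0xbfbf1bf0       0xbfbf1bf0
(gdb) si
0x08064e38 in ?? ()
1: x/5i $pc
=> 0x8064e38:   test   eax,eax
   0x8064e3a:   jne    0x80653dc
   0x8064e40:   mov    eax,ds:0x80d6194
   0x8064e45:   test   eax,eax
   0x8064e47:   je     0x8064e78

With fork() forced, the infinite loop is much simpler than sleep(): neither eax
nor esp needs restoring. The initial infinite-loop attempt failed because of
vfork().

In practice nobody needs to debug "running id from csh". It only serves as an
example of debugging a child process under extremely inconvenient conditions.
gdb-7.6 is used here because it is the highest version that compiles on
FreeBSD 6.1.

Summary of the key points:

--------------------------------------------------------------------------
a) Find a way to replace vfork() with fork(). Consider modifying a global
   variable that steers the control flow, the GOT[i] (.got.plt) or PLT[j]
   (.plt) entry for vfork(), or even patching .text directly.

b) While debugging the parent, patch an infinite loop near the start of the
   future child path (where fork()'s return value pid is confirmed to be 0).
   This is CPU-specific and requires assembly skills.

c) Trigger the fork(). The child falls into the patched infinite loop. Identify
   the child PID from another SSH session and attach to the child with another
   GDB.

d) Restore the code around the patch point in the child. From then on, the
   child path after fork() and before execv() can be debugged.

e) After the child calls execv(), it receives SIGTRAP and stops at the "(gdb)"
   prompt, similar to "catch exec". Do not be misled by "info file". From then
   on, the new process after execv() can be debugged.
--------------------------------------------------------------------------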
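
Step c) identifies the child by eye from "ps l" output. As a rough sketch only,
the same lookup could be scripted in Python. The function name find_children is
hypothetical. The column order (UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT
TIME COMMAND) is an assumption inferred from the sample lines in this article.
The sample text is copied from the two experiments above.

```python
def find_children(ps_l_output, parent_pid):
    """Return (pid, stat, has_v_flag) for each process whose PPID is parent_pid.

    Assumed `ps l` column order, inferred from the article's sample lines:
    UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
    """
    children = []
    for line in ps_l_output.splitlines():
        fields = line.split()
        # Skip blank lines, a header line, or anything too short to parse.
        if len(fields) < 13:
            continue
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        pid, ppid, stat = int(fields[1]), int(fields[2]), fields[9]
        if ppid == parent_pid:
            # A "V" in STAT marks a vfork(2) in progress; in the article it
            # shows up on the child, and attaching to such a child hung gdb.
            children.append((pid, stat, "V" in stat))
    return children


# Sample lines copied from the article: the vfork() case and the forced fork() case.
VFORK_SAMPLE = """\
 1000 28416 28601   0   8  0  4728  3632 ppwait DX+   p1    0:00.01 csh
 1000 28605 28416 172 117  0  4728  3632 -      RV+   p1    0:46.02 csh
"""
FORK_SAMPLE = """\
 1000 28416 28601   0  20  0  4728  3632 pause  SX+   p1    0:00.03 csh
 1000 29526 28416 119 110  0  4728  3632 -      R+    p1    0:02.27 csh
"""

for name, sample in (("vfork", VFORK_SAMPLE), ("fork", FORK_SAMPLE)):
    for pid, stat, has_v in find_children(sample, 28416):
        note = "still inside vfork(), do not attach yet" if has_v else "safe to attach"
        print(f"{name}: child pid {pid}, STAT {stat}: {note}")
```

Checking the V flag before attaching is a cheap way to notice that step a) has
not taken effect yet.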