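Title: Some technical principles of kd debugging

Author: Tarik Soulami

Link: https://scz.617.cn/windows/201711151139.txt

Setting a breakpoint in kernel-mode debugging works the same way as in
user-mode debugging: 0xCC (int3) is written to the specified address. When
the breakpoint is hit, the OS suspends execution and the debugger takes over;
before entering the "break-in send/receive loop", it restores the original
byte at the breakpoint address. That is why you never see the 0xCC at the
breakpoint from the kd prompt. Hardware breakpoints, memory (attribute)
breakpoints and the like are not discussed here.

When you set a breakpoint in kernel mode, the memory page containing the
target address may have been paged out (to disk). In that case the debug
subsystem records the breakpoint information; when the target page is paged
back in (to memory), the code that handles the page fault (for example
nt!MmAccessFault) actually writes the 0xCC. I am not sure about the details
of this claim.

To set a breakpoint on user-mode memory in kd, only the current process
context can be used; you must use ".process /i" to make the target process
the current process.

Single-stepping in kernel mode likewise uses the single-step interrupt
(int1). Sometimes, while you are doing t/p, execution suddenly runs away and
RIP ends up at a seemingly random address that has nothing to do with what
you were looking at. Usually this is because the time slice of the thread
being single-stepped ran out; for example, you p over a function, that
function internally puts the current thread into a wait state, and the OS
scheduler starts switching threads. Fortunately, kernel-mode single-stepping
does not rely on the TF flag alone; it also writes 0xCC at the next
instruction, so when the target thread is scheduled as the current thread
again, the earlier single-step still takes effect.

When ".process /i" is used to switch process context, the OS first exits the
"break-in send/receive loop", then schedules a high-priority worker thread
and uses it to switch to the target process context.

For now the current process is System:

kd> !process -1 0
PROCESS ffff948de7c9a480
    SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
    DirBase: 001ab000  ObjectTable: ffffa80a3bc02240  HandleCount: 2180.
    Image: System

Switch to cmd.exe:

kd> !process 0 0 cmd.exe
PROCESS ffff948de82e35c0
    SessionId: 1  Cid: 15f4    Peb: 5160305000  ParentCid: 0fe8
    DirBase: 5378d000  ObjectTable: ffffa80a42a170c0  HandleCount: 238.
    Image: cmd.exe

kd> .process /i 0xffff948de82e35c0
You need to continue execution (press 'g' <enter>) for the context
to be switched. When the debugger breaks in again, you will be in
the new process context.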
kd> g
Break instruction exception - code 80000003 (first chance)
rax=0000000000000000 rbx=00000000000000bd rcx=0000000000000007
rdx=0000000000000000 rsi=0000000000000000 rdi=ffff948de82e35c0
rip=fffff80119bee4e0 rsp=ffffa501274f4ae8 rbp=ffff948de82e35c0
 r8=ffff948deaae6798  r9=7ffff8011a26fbb0 r10=7ffffffffffffffc
r11=ffffa501274f4ae8 r12=0000000000000200 r13=0000000000000000
r14=0000000000000000 r15=fffff80119e97900
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b  efl=00000286
nt!DbgBreakPointWithStatus:
fffff801`19bee4e0 cc              int     3

View the current thread and its call stack backtrace:

kd> !thread -p -1 0xf
kd> !thread -p -1 0x1f
PROCESS ffff948de82e35c0
    SessionId: 1  Cid: 15f4    Peb: 5160305000  ParentCid: 0fe8
    DirBase: 5378d000  ObjectTable: ffffa80a42a170c0  HandleCount: 238.
    Image: cmd.exe

THREAD ffff948deaae6700  Cid 0004.1ae0  Teb: 0000000000000000 Win32Thread: 0000000000000000 RUNNING on processor 1
Not impersonating
DeviceMap                 ffffa80a41ef25e0
Owning Process            ffff948de7c9a480       Image:         System
Attached Process          ffff948de82e35c0       Image:         cmd.exe
Wait Start TickCount      43813331       Ticks: 2 (0:00:00:00.031)
Context Switch Count      1887           IdealProcessor: 1
UserTime                  00:00:00.000
KernelTime                00:00:00.031
Win32 Start Address nt!ExpWorkerThread (0xfffff80119adf740)
Stack Init ffffa501274f4c90 Current ffffa501274f4810
Base ffffa501274f5000 Limit ffffa501274ef000 Call 0000000000000000
Priority 12 BasePriority 12 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
ffffa501`274f4ae8 fffff801`1a212137 nt!DbgBreakPointWithStatus
ffffa501`274f4af0 fffff801`19adf835 nt!ExpDebuggerWorker+0x107
ffffa501`274f4b80 fffff801`19b684e7 nt!ExpWorkerThread+0xf5
ffffa501`274f4c10 fffff801`19bedef6 nt!PspSystemThreadStartup+0x47
ffffa501`274f4c60 00000000`00000000 nt!KiStartSystemThread+0x16

The current thread is unusual: whether or not bit 0x10 of Flags is set, its
call stack backtrace contains no user-mode frames. The current process is
indeed cmd.exe, but the Cid of the thread above shows "0004.1ae0", where the
leading 4 corresponds to the System process; "Owning Process" is System and
"Attached Process" is cmd.exe.

View the current process context:

kd> !process -1 0xf
kd> !process -1 0x1f
PROCESS ffff948de82e35c0
    SessionId: 1  Cid: 15f4    Peb: 5160305000  ParentCid: 0fe8
    DirBase: 5378d000  ObjectTable: ffffa80a42a170c0  HandleCount: 238.
    Image: cmd.exe
    VadRoot ffff948de8494d90 Vads 92 Clone 0 Private 498. Modified 988. Locked 8.
    DeviceMap ffffa80a41ef25e0
    Token                             ffffa80a42ae4a60
    ElapsedTime                       14 Days 20:24:33.429
    UserTime                          00:00:00.031
    KernelTime                        00:00:00.093
    QuotaPoolUsage[PagedPool]         202648
    QuotaPoolUsage[NonPagedPool]      12832
    Working Set Sizes (now,min,max)  (473, 50, 345) (1892KB, 200KB, 1380KB)
    PeakWorkingSetSize                3756
    VirtualSize                       2097279 Mb
    PeakVirtualSize                   2097292 Mb
    PageFaultCount                    9333
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      1046

THREAD ffff948de82e4080  Cid 15f4.15f8  Teb: 0000005160306000 Win32Thread: ffff948deb9bc790 WAIT: (Executive) KernelMode Alertable
    ffff948deb97bd38  NotificationEvent
IRP List:
    ffff948de873f9a0: (0006,0160) Flags: 00060030  Mdl: 00000000
Not impersonating
DeviceMap                 ffffa80a41ef25e0
Owning Process            ffff948de82e35c0       Image:         cmd.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      28271013       Ticks: 15542320 (2:19:27:28.750)
Context Switch Count      1351           IdealProcessor: 1
UserTime                  00:00:00.031
KernelTime                00:00:00.125
Win32 Start Address cmd!mainCRTStartup (0x00007ff65c066b00)
Stack Init ffffa501278fbc90 Current ffffa501278fb480
Base ffffa501278fc000 Limit ffffa501278f6000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           Call Site
ffffa501`278fb4c0 fffff801`19b05a40 nt!KiSwapContext+0x76
ffffa501`278fb600 fffff801`19b053be nt!KiSwapThread+0x190
ffffa501`278fb6c0 fffff801`19b04b99 nt!KiCommitThreadWait+0x10e
ffffa501`278fb760 fffff801`19f7bb6e nt!KeWaitForSingleObject+0x1c9
ffffa501`278fb840 fffff801`19f7b30c nt!IopSynchronousServiceTail+0x23e
ffffa501`278fb8f0 fffff801`19f7ac86 nt!IopXxxControlFile+0x66c
ffffa501`278fba20 fffff801`19bf3d53 nt!NtDeviceIoControlFile+0x56
ffffa501`278fba90 00007ffa`c815ff24 nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffa501`278fbb00)
00000051`6010f808 00007ffa`c46be86d ntdll!NtDeviceIoControlFile+0x14
00000051`6010f810 00007ffa`c4755e20 KERNELBASE!ConsoleCallServerGeneric+0x10d
00000051`6010f970 00007ffa`c4755eaa KERNELBASE!ReadConsoleInternal+0x174
00000051`6010fac0 00007ff6`5c07444e KERNELBASE!ReadConsoleW+0x1a
00000051`6010fb00 00007ff6`5c06b8b9 cmd!ReadBufFromConsole+0x10e
00000051`6010fbd0 00007ff6`5c05dfb5 cmd!_chkstk+0x3a29
00000051`6010fc30 00007ff6`5c05d8a6 cmd!Lex+0x485
00000051`6010fca0 00007ff6`5c05d5e8 cmd!GeToken+0x26
00000051`6010fcd0 00007ff6`5c06d097 cmd!Parser+0x118
00000051`6010fd00 00007ff6`5c066a89 cmd!_chkstk+0x5207
00000051`6010fda0 00007ffa`c5cd1fe4 cmd!wil::details_abi::ProcessLocalStorage::~ProcessLocalStorage+0x289
00000051`6010fde0 00007ffa`c812ef91 KERNEL32!BaseThreadInitThunk+0x14
00000051`6010fe10 00000000`00000000 ntdll!RtlUserThreadStart+0x21

As you can see, cmd.exe has only one thread, but it is not the current
thread.

2017-11-21 11:06 scz

Even when the target address has not been paged in, an ordinary breakpoint
(0xCC) can still be set on it. In that case the debug subsystem records the
breakpoint information; when the target page is paged in (to memory), the
code that handles the page fault actually writes the 0xCC.

That is the gist of what Tarik Soulami says. Today I looked at the
page-fault handling code and can add some details. Taking
x64/Win10 16299.15.amd64fre.rs3_release.170928-1534 as an example:

nt!KiPageFault+0x167
nt!KdpSetOwedBreakpoints+0x13e
nt!KdpInsertBreakpoint+0x3d
nt!KdpCopyCodeStream+0x30
nt!KdpCopyMemoryChunks+0x89
nt!MmDbgCopyMemory+0x70
nt!MiDbgCopyMemory+0x264
nt!MiCopyToUntrustedMemory+0xfd

--------------------------------------------------------------------------
nt!KiPageFault ()
{
    /*
     * Single-byte boolean variable
     */
    if ( nt!KdpOweBreakpoint == TRUE )
    {
        /*
         * Some 0xCC breakpoints were "owed" earlier and never written;
         * make good on them here
         */
        KdpSetOwedBreakpoints( ... );
    }
}
--------------------------------------------------------------------------

At nt!KdpSetOwedBreakpoints+0xce the global variable nt!KdpOweBreakpoint is
set to FALSE.

2017-11-22 12:18 scz

After a 0xCC breakpoint has been hit, the 0xCC is written back to the
breakpoint address once you enter g. The code path involved is:

nt!KiDispatchException+0x272
nt!KdTrap+0x27
nt!KdpTrap+0x148
nt!KdpReport+0xaf
nt!KdpReportExceptionStateChange+0x96
nt!KdpSendWaitContinue+0x28c
nt!KdpAddBreakpoint+0x14a
nt!KdpInsertBreakpoint+0x38
nt!KdpCopyCodeStream+0x2b
nt!KdpCopyMemoryChunks+0x84
nt!MmDbgCopyMemory+0x6b
nt!MiDbgCopyMemory+0x25f
nt!MiCopyToUntrustedMemory+0xfa
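
The "owed" breakpoint bookkeeping described in these notes can be sketched as
a small simulation. This is a toy model in Python, not the kernel's
implementation: the class ToyTarget, its methods and its page dictionaries
are hypothetical names invented for illustration, and only owe_breakpoint is
meant to mirror the role of nt!KdpOweBreakpoint. The model also assumes the
flag stays set while any breakpoint is still unwritten, which the notes above
do not confirm.

```python
INT3 = 0xCC
PAGE_SIZE = 0x1000


class ToyTarget:
    """Toy model of a debug target. Every name here is hypothetical."""

    def __init__(self):
        self.resident = {}     # page number -> bytearray, pages in memory
        self.paged_out = {}    # page number -> bytearray, pages on disk
        self.breakpoints = {}  # address -> saved original byte, None if owed
        self.owe_breakpoint = False  # plays the role of nt!KdpOweBreakpoint

    def _try_write(self, addr):
        """Write 0xCC at addr if its page is resident; report success."""
        page, offset = divmod(addr, PAGE_SIZE)
        if page not in self.resident:
            return False
        self.breakpoints[addr] = self.resident[page][offset]  # save original
        self.resident[page][offset] = INT3
        return True

    def set_breakpoint(self, addr):
        """Debugger request: write 0xCC now, or record it as owed."""
        if not self._try_write(addr):
            self.breakpoints[addr] = None
            self.owe_breakpoint = True

    def set_owed_breakpoints(self):
        """Retry every owed breakpoint; keep the flag only if some remain."""
        self.owe_breakpoint = False
        for addr, saved in list(self.breakpoints.items()):
            if saved is None and not self._try_write(addr):
                self.owe_breakpoint = True  # assumption: still owed

    def page_fault(self, page):
        """Page-in path: bring the page back, then settle owed breakpoints."""
        self.resident[page] = self.paged_out.pop(page)
        if self.owe_breakpoint:
            self.set_owed_breakpoints()
```

For example, setting a breakpoint at offset 0x10 of a paged-out page 5
leaves memory untouched and sets owe_breakpoint; a later page_fault(5)
writes 0xCC at that offset, saves the original byte and clears the flag.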