Title: Introduction to IDA Hex-Rays Microcode
Created: 2025-07-07 16:31
Updated: 2025-07-24 12:23
Links: https://scz.617.cn/python/202507071631.txt
       https://bbs.kanxue.com/thread-287634.htm

--------------------------------------------------------------------------

Contents:

    ☆ Background
    ☆ Extracting function arguments with Hex-Rays Microcode
    ☆ Using the decrypted plaintext strings in F5 pseudocode
    ☆ idaapi.Hexrays_Hooks example
    ☆ Other discussion
        1) Optimizers
        2) Further reading
        3) Debugging microcode programs

--------------------------------------------------------------------------

☆ Background

See:

--------------------------------------------------------------------------
"Angr Symbolic Execution Exercise: Decrypting Strings in an XorDDoS Sample"
https://scz.617.cn/unix/202504242226.txt

A sample from the XorDDoS botnet family
https://www.virustotal.com/gui/file/0e9e859d22b009e869322a509c11e342
https://www.virustotal.com/gui/file/cad80071b9af3fa742ec7fbbeae0e2ffe2566742b20bfdf436b8138da3fd20e9
--------------------------------------------------------------------------

The article above analyzed a sample from the XorDDoS botnet family. The sample no longer has any value for threat analysis; it is used here only to demonstrate reverse-engineering techniques.

Disassembling the sample in IDA32 shows it calling dec_conf() to decrypt its strings:

--------------------------------------------------------------------------
0804CFA3 C7 44 24 08 0B 00 00 00    mov     dword ptr [esp+8], 0Bh
0804CFAB C7 44 24 04 B1 2F 0B 08    mov     dword ptr [esp+4], offset aM7a4nqNa_0 ; "m7A4nQ_/nA"
0804CFB3 8D 85 B3 EA FF FF          lea     eax, [ebp+var_154D]
0804CFB9 89 04 24                   mov     [esp], eax
0804CFBC E8 67 B2 FF FF             call    dec_conf
0804CFC1 C7 44 24 08 07 00 00 00    mov     dword ptr [esp+8], 7
0804CFC9 C7 44 24 04 BC 2F 0B 08    mov     dword ptr [esp+4], offset aMN3_0 ; "m [(n3"
0804CFD1 8D 85 B3 E9 FF FF          lea     eax, [ebp+var_164D]
0804CFD7 89 04 24                   mov     [esp], eax
0804CFDA E8 49 B2 FF FF             call    dec_conf
--------------------------------------------------------------------------

That article used the r2pipe module to analyze the sample statically: it located each "call dec_conf" instruction and then extracted dec_conf()'s arguments, such as the address and length of the encrypted string, from the surrounding assembly. This only works if the instructions follow a fixed layout; otherwise every variation has to be handled separately, and the implementation gets tedious.

When IDA decompiles main() with F5, note the following:

--------------------------------------------------------------------------
dec_conf(v23, "m7A4nQ_/nA", 11);
dec_conf(v22, "m [(n3", 7);
dec_conf(v21, "m6_6n3", 7);
dec_conf(v19, aM4s4nacNZv, 18);
dec_conf(v18, aMN4C, 17);
dec_conf(v17, "m.[$n3", 7);
dec_conf(v16, a6f6, 512);
dec_conf(v20, "m4S4nAC/nA", 11);
--------------------------------------------------------------------------

Hex-Rays has already recognized dec_conf()'s arguments. An IDAPython script can call the Microcode API to reuse that analysis and extract the arguments from it. This approach does not depend on the layout of the assembly instructions; it depends on the microcode IDA generates, which sits at a higher level of abstraction than assembly. In the F5 context the pipeline is roughly

    assembly instructions => microcode => cfunc

Once the microcode has yielded the address and length of each encrypted string, the decryption algorithm is simple enough to run inside the IDAPython script. With the plaintext in hand, the more natural goal is to make the F5 output read:

--------------------------------------------------------------------------
dec_conf(v23, "/usr/bin/", 11);
dec_conf(v22, "/bin/", 7);
dec_conf(v21, "/tmp/", 7);
dec_conf(v19, "/var/run/gcc.pid", 18);
dec_conf(v18, "/lib/libudev.so", 17);
dec_conf(v17, "/lib/", 7);
dec_conf(v16, "http://pcdown.gddos.com:8080/cfg.rar", 512);
dec_conf(v20, "/var/run/", 11);
--------------------------------------------------------------------------

The rest of this article builds toward that goal and demonstrates the techniques involved.

☆ Extracting function arguments with Hex-Rays Microcode

The Microcode API is thinly documented, so being able to debug IDAPython scripts that use it inside an IDE helps a lot when learning it. See:

--------------------------------------------------------------------------
"Debugging IDAPython Plugins with VSCode+debugpy (Short Version)"
https://scz.617.cn/python/202507021618.txt

"Debugging IDAPython Plugins with VSCode+debugpy"
https://scz.617.cn/python/202506261718.txt

https://gist.github.com/icecr4ck/ec39ddedf3f1948fdf7873094561739a
--------------------------------------------------------------------------

When debugging IDAPython scripts with VSCode, avoid directory names containing Chinese characters; otherwise breakpoints may silently fail to trigger.

The sample's decryption algorithm is trivial, plain XOR:

--------------------------------------------------------------------------
char *__cdecl encrypt_code(char *buf, int size)
{
    char   *p;
    int     i;

    p   = buf;
    for ( i = 0; i < size; ++i )
        *p++ ^= xorkeys[i % 16];
    return buf;
}
--------------------------------------------------------------------------
080CF3E8 42 42 32 46 41 33 36 41… xorkeys db 'BB2FA36AAA9541F0'
--------------------------------------------------------------------------

XorDDoS_analyse_b.py

--------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: cp936 -*-

import traceback
import idaapi, idautils

#
# Wrap a function in try-except block
#
def catch ( func ) :
    def wrapper ( *args, **kwargs ) :
        try :
            return func( *args, **kwargs )
        except Exception as e :
            traceback.print_exc()
            print( e )
    return wrapper

#
# Visit all mop_t
#
class MopVisitor ( idaapi.mop_visitor_t ) :

    def __init__ ( self, target, xorkey, keysize ) :
        super().__init__()
        self.target     = target
        self.xorkey     = xorkey
        self.keysize    = keysize

    @catch
    def visit_mop ( self, op, var_type, is_target ) :
        del var_type, is_target
        # logging.debug( "Operand: %s", op.dstr() )
        #
        # mop_f list of arguments
        #
        if op.t != idaapi.mop_f :
            return 0
        call_info   = op.f
        if call_info.callee != self.target :
            return 0
        #
        # mop_a address of an operand (here: of a global)
        # mop_n immediate number constant
        #
        if len( call_info.args ) < 3 or \
           call_info.args[0].t != idaapi.mop_a or \
           call_info.args[1].t != idaapi.mop_a or \
           call_info.args[2].t != idaapi.mop_n :
            return 0
        #
        # mop_v global variable
        # mop_S local stack variable
        #
        # The ready-made is_glbaddr() performs the same check:
        #
        # t == mop_a && a->t == mop_v
        #
        if call_info.args[1].a.t != idaapi.mop_v :
            return 0
        addr    = call_info.args[1].a.g
        size    = call_info.args[2].nnn.value
        dec_buf = bytearray()
        for i in range( size ) :
            enc_byte    = idaapi.get_byte( addr + i )
            #
            # Could be optimized: there is no need to read the key from
            # some.idb every time, it could be hard-coded
            #
            key_byte    = idaapi.get_byte( self.xorkey + ( i % self.keysize ) )
            dec_buf.append( enc_byte ^ key_byte )
        try :
            #
            # Note: latin-1 maps every byte value, so this except branch
            # never actually fires
            #
            dec_str = dec_buf.split( b'\0' )[0].decode( 'latin-1' )
        except UnicodeDecodeError :
            dec_str = f"Unprintable data: {dec_buf.hex()}"
        print( f"{addr:#x}: {dec_str}" )
        #
        # repeatable comment
        #
        idaapi.set_cmt( addr, dec_str, 1 )
        #
        # Returning 0 means continue the traversal
        #
        return 0

def main () :
    target  = idaapi.get_name_ea( 0, "dec_conf" )
    if target == idaapi.BADADDR :
        print( "Not found dec_conf()" )
        return
    xorkey      = 0x80cf3e8
    keysize     = 16
    callerset   = set()
    for ref in idautils.XrefsTo( target ) :
        f   = idaapi.get_func( ref.frm )
        #
        # Several xrefs may sit in the same caller; call gen_microcode()
        # only once per caller
        #
        if f and f.start_ea not in callerset :
            callerset.add( f.start_ea )
            if not idaapi.is_code( idaapi.get_flags( f.start_ea ) ) :
                continue
            hf  = idaapi.hexrays_failure_t()
            mbr = idaapi.mba_ranges_t( f )
            try :
                #
                # Generate microcode (Hex-Rays Microcode) for the given
                # function, here a caller of dec_conf()
                #
                mba = idaapi.gen_microcode( mbr, hf, None, idaapi.DECOMP_WARNINGS, idaapi.MMAT_CALLS )
                if mba :
                    #
                    # Extract the call arguments from the microcode, working
                    # at its higher level of abstraction rather than directly
                    # with assembly
                    #
                    visitor = MopVisitor( target, xorkey, keysize )
                    mba.for_all_ops( visitor )
                else :
                    print( f"gen_microcode() failed for func at {f.start_ea:#x}: {hf.desc()}" )
            except idaapi.DecompilationFailure :
                print( f"Decompilation failed for func at {f.start_ea:#x}" )

if "__main__" == __name__ :
    if '__idacode__' in globals() and __idacode__ :
        dbg.bp( __idacode__, 'Hit breakpoint' )
    main()
--------------------------------------------------------------------------

An overview of the structure: the script finds dec_conf()'s callers through cross-references and generates microcode for each of them in turn. To extract call arguments, gen_microcode() must be given MMAT_CALLS, the maturity level at which call arguments have been detected. for_all_ops() walks every mop in the function and hands each one to the custom visit_mop().

visit_mop() is where the Microcode API does the real work. It uses op.t to recognize a call and then checks that the callee is dec_conf(), that there are at least three arguments, and what their types are: the second argument must be the address of a global variable and the third must be an integer constant. These checks cut down on false hits. Once they pass, it takes the address and length of the encrypted string, XOR-decrypts it, and stores the result as a repeatable comment.

More complex decryption logic is a separate matter; this article only uses the sample to demonstrate the Microcode API.

☆ Using the decrypted plaintext strings in F5 pseudocode

XorDDoS_analyse_e.py

--------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: cp936 -*-

import traceback
import idaapi, idautils

#
# The name must not clash with an existing segment name
#
def PrivateGetSeg ( name ) :
    seg = idaapi.get_segm_by_name( name )
    if seg :
        return seg
    seg_start       = idaapi.inf_get_max_ea()
    seg_start       = ( seg_start + 0xf ) & ~0xf
    seg_size        = 0x1000
    seg             = idaapi.segment_t()
    seg.start_ea    = seg_start
    seg.end_ea      = seg_start + seg_size
    seg.perm        = idaapi.SEGPERM_READ
    #
    # Use the inf_* accessor: get_inf_structure() is gone in IDA 9
    #
    seg.bitness     = 2 if idaapi.inf_is_64bit() else 1
    if not idaapi.add_segm_ex( seg, name, "CONST", idaapi.ADDSEG_NOSREG ) :
        return None
    return idaapi.get_segm_by_name( name )

def Private_add_bytes_to_idb ( segname, off, buf ) :
    seg = PrivateGetSeg( segname )
    if not seg :
        return idaapi.BADADDR
    if off < seg.start_ea or off + len( buf ) > seg.end_ea :
        return idaapi.BADADDR
    #
    # This function does not save the original values of bytes. See also
    # patch_bytes()
    #
    idaapi.put_bytes( off, buf )
    return off + len( buf )

def catch ( func ) :
    def wrapper ( *args, **kwargs ) :
        try :
            return func( *args, **kwargs )
        except Exception as e :
            traceback.print_exc()
            print( e )
    return wrapper

class MopVisitor ( idaapi.mop_visitor_t ) :

    def __init__ ( self, target, xorkey, keysize, addrdict, seg ) :
        super().__init__()
        self.target     = target
        self.xorkey     = xorkey
        self.keysize    = keysize
        self.addrdict   = addrdict
        self.seg        = seg

    @catch
    def visit_mop ( self, op, var_type, is_target ) :
        del var_type, is_target
        if op.t != idaapi.mop_f :
            return 0
        call_info   = op.f
        if call_info.callee != self.target :
            return 0
        if len( call_info.args ) < 3 or \
           call_info.args[0].t != idaapi.mop_a or \
           call_info.args[1].t != idaapi.mop_a or \
           call_info.args[2].t != idaapi.mop_n :
            return 0
        if call_info.args[1].a.t != idaapi.mop_v :
            return 0
        addr    = call_info.args[1].a.g
        #
        # Already redirected into ".patch", nothing left to do
        #
        seg     = PrivateGetSeg( self.seg["name"] )
        if seg and addr >= seg.start_ea and addr < seg.end_ea :
            return 0
        if addr not in self.addrdict :
            size    = call_info.args[2].nnn.value
            dec_buf = bytearray()
            for i in range( size ) :
                enc_byte    = idaapi.get_byte( addr + i )
                key_byte    = idaapi.get_byte( self.xorkey + ( i % self.keysize ) )
                dec_buf.append( enc_byte ^ key_byte )
            dec_buf = dec_buf.split( b'\0' )[0]
            try :
                dec_str = dec_buf.decode( 'latin-1' )
            except UnicodeDecodeError :
                dec_str = f"Unprintable data: {dec_buf.hex()}"
            print( f"{self.curins.ea:#x} {addr:#x} {dec_str}" )
            idaapi.set_cmt( addr, dec_str, 1 )
            #
            # Re-append one NUL terminator and store the plaintext in ".patch"
            #
            dec_buf.append( 0 )
            off = Private_add_bytes_to_idb( self.seg["name"], self.seg["off"], bytes( dec_buf ) )
            if idaapi.BADADDR == off :
                return 0
            idaapi.create_strlit( self.seg["off"], len( dec_buf ), idaapi.STRTYPE_C )
            self.addrdict[addr] \
                = self.seg["off"]
            self.seg["off"] \
                = off
        #
        # Point the second argument at the plaintext copy
        #
        newaddr = idaapi.mop_addr_t()
        newaddr.make_gvar( self.addrdict[addr] )
        call_info.args[1].a.assign( newaddr )
        return 0

def find_pseudocode_view ( func_ea, auto_open=False ) :
    widget  = idaapi.find_widget( "Pseudocode-A" )
    if widget and idaapi.get_widget_type( widget ) == idaapi.BWN_PSEUDOCODE :
        vu  = idaapi.get_widget_vdui( widget )
        if vu and vu.cfunc.entry_ea == func_ea :
            return vu
    if auto_open :
        idaapi.open_pseudocode( func_ea, idaapi.OPF_REUSE )
        return find_pseudocode_view( func_ea, False )
    return None

#
# main() and decompile_func() share this state, so it has to live in globals
#
addrdict    = {}
seg         = {}
seg["name"] = ".patch"
seg["off"]  = PrivateGetSeg( seg["name"] ).start_ea

def decompile_func ( func_ea ) :
    target  = idaapi.get_name_ea( 0, "dec_conf" )
    if target == idaapi.BADADDR :
        print( "Not found dec_conf()" )
        return
    xorkey  = 0x80cf3e8
    keysize = 16
    vu      = find_pseudocode_view( func_ea, True )
    if not vu :
        return
    mba     = vu.cfunc.mba
    if not mba :
        return
    visitor = MopVisitor( target, xorkey, keysize, addrdict, seg )
    mba.for_all_ops( visitor )
    mba.verify( True )
    #
    # false means to rebuild ctree without regenerating microcode
    #
    vu.refresh_view( False )

def decompile () :
    ea      = idaapi.get_screen_ea()
    func    = idaapi.get_func( ea )
    if func :
        decompile_func( func.start_ea )

def main () :
    target  = idaapi.get_name_ea( 0, "dec_conf" )
    if target == idaapi.BADADDR :
        print( "Not found dec_conf()" )
        return
    callerset   = set()
    for ref in idautils.XrefsTo( target ) :
        f   = idaapi.get_func( ref.frm )
        if f and f.start_ea not in callerset :
            callerset.add( f.start_ea )
            if not idaapi.is_code( idaapi.get_flags( f.start_ea ) ) :
                continue
            decompile_func( f.start_ea )

if "__main__" == __name__ :
    if '__idacode__' in globals() and __idacode__ :
        dbg.bp( __idacode__, 'Hit breakpoint' )
    main()
--------------------------------------------------------------------------

To affect the F5 pseudocode, visit_mop() must do more than recover the plaintext: it also has to modify, at the microcode level, the argument passed to dec_conf(). The pointer that used to point at the ciphertext is redirected to the plaintext; in the code, newaddr is that plaintext address.

The traditional way to handle this is to statically patch either the sample or the IDA database (some.idb). This article demonstrates a different technique; whether it is better or worse is a separate discussion.

PrivateGetSeg() creates a segment named ".patch", and Private_add_bytes_to_idb() places the plaintext strings in it. newaddr points at the appropriate spot in ".patch", and the microcode is modified so that newaddr becomes dec_conf()'s argument.
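The bookkeeping in XorDDoS_analyse_e.py can be exercised outside IDA. The sketch below is a hypothetical, IDA-free model, not IDA API: xor_decrypt() applies the same transform as encrypt_code() with the xorkeys bytes, and the made-up class PatchArea stands in for the ".patch" segment, seg["off"], addrdict and put_bytes(). The segment start address and the BADADDR value are assumptions; the two ciphertext addresses and sizes are the ones encoded in the mov instructions of the background section.

```python
# Hypothetical, IDA-free model of the bookkeeping in XorDDoS_analyse_e.py.
# Nothing here is IDA API: PatchArea stands in for the ".patch" segment,
# seg["off"], addrdict and put_bytes().

BADADDR = 0xFFFFFFFF                  # assumed stand-in for idaapi.BADADDR
XORKEY = b"BB2FA36AAA9541F0"          # the xorkeys bytes at 0x080CF3E8


def xor_decrypt(enc: bytes, key: bytes = XORKEY) -> bytes:
    """Same transform as encrypt_code(): byte i ^= key[i % len(key)]."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(enc))


class PatchArea:
    def __init__(self, start: int, size: int = 0x1000):
        self.start = (start + 0xF) & ~0xF   # 16-byte alignment, as in PrivateGetSeg()
        self.end = self.start + size
        self.off = self.start               # plays the role of seg["off"]
        self.addrdict = {}                  # ciphertext address -> plaintext address
        self.mem = {}                       # plaintext address -> bytes written

    def place(self, enc_addr: int, enc: bytes) -> int:
        """Return the plaintext address for enc_addr, decrypting only once."""
        if enc_addr in self.addrdict:
            return self.addrdict[enc_addr]
        # Cut at the first NUL, then append exactly one terminator
        plain = xor_decrypt(enc).split(b"\0")[0] + b"\0"
        if self.off + len(plain) > self.end:
            return BADADDR                  # mirrors Private_add_bytes_to_idb()
        self.mem[self.off] = plain
        self.addrdict[enc_addr] = self.off
        self.off += len(plain)
        return self.addrdict[enc_addr]


if __name__ == "__main__":
    area = PatchArea(start=0x080D1234)      # hypothetical inf_get_max_ea() value
    # Ciphertext addresses and sizes taken from the two mov/call pairs
    p1 = area.place(0x080B2FB1, b"m7A4nQ_/nA\0")
    p2 = area.place(0x080B2FBC, b"m [(n3\0")
    print(hex(p1), area.mem[p1])
    print(hex(p2), area.mem[p2])
```

The script only rewrites the second argument and never touches the third, which is why the target F5 output still shows the original sizes (11, 7, 18 and so on) next to the shorter plaintext strings.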
This time gen_microcode() cannot be used: it would produce a separate mba unrelated to what is on screen. The script has to find the microcode behind the "Pseudocode-A" window and modify that, otherwise the F5 pseudocode does not change. find_pseudocode_view() looks for the window and opens one if none is found. visit_mop() is applied to vu.cfunc.mba, and vu.refresh_view(False) must be called at the end.

The change is not persistent, so this approach is not recommended. For example, pressing F5 again in a window that shows the plaintext strings brings the ciphertext back, because F5 generates fresh microcode that has not been modified. Running decompile() at the IDAPython prompt shows the plaintext version of the F5 pseudocode again.

☆ idaapi.Hexrays_Hooks example

The previous section's approach is a hack: once the plaintext is lost, you have to run decompile() by hand or reload XorDDoS_analyse_e.py, which is clumsy. This section demonstrates idaapi.Hexrays_Hooks. Once the script is loaded, the microcode modification persists and the pseudocode keeps showing the plaintext strings.

The previous section created a new segment to hold the decrypted strings. bluerust demonstrated a better approach that replaces the ciphertext strings with plaintext ones without creating a new segment.

XorDDoS_analyse_c_1.py

--------------------------------------------------------------------------
#!/usr/bin/env python
# -*- coding: cp936 -*-

import sys, traceback
import idaapi, idautils

def catch ( func ) :
    def wrapper ( *args, **kwargs ) :
        try :
            return func( *args, **kwargs )
        except Exception as e :
            traceback.print_exc()
            print( e )
    return wrapper

class MopVisitor ( idaapi.mop_visitor_t ) :

    def __init__ ( self, target, xorkey, keysize, addrdict ) :
        super().__init__()
        self.target     = target
        self.xorkey     = xorkey
        self.keysize    = keysize
        self.addrdict   = addrdict

    @catch
    def visit_mop ( self, op, var_type, is_target ) :
        del var_type, is_target
        if op.t != idaapi.mop_f :
            return 0
        call_info   = op.f
        if call_info.callee != self.target :
            return 0
        if len( call_info.args ) < 3 or \
           call_info.args[0].t != idaapi.mop_a or \
           call_info.args[1].t != idaapi.mop_a or \
           call_info.args[2].t != idaapi.mop_n :
            return 0
        if call_info.args[1].a.t != idaapi.mop_v :
            return 0
        addr    = call_info.args[1].a.g
        if addr not in self.addrdict :
            size    = call_info.args[2].nnn.value
            dec_buf = bytearray()
            for i in range( size ) :
                enc_byte    = idaapi.get_byte( addr + i )
                key_byte    = idaapi.get_byte( self.xorkey + ( i % self.keysize ) )
                dec_buf.append( enc_byte ^ key_byte )
            dec_buf = dec_buf.split( b'\0' )[0]
            try :
                dec_str = dec_buf.decode( 'latin-1' )
            except UnicodeDecodeError :
                dec_str = f"Unprintable data: {dec_buf.hex()}"
            print( f"{self.curins.ea:#x} {addr:#x} {dec_str}" )
            idaapi.set_cmt( addr, dec_str, 1 )
            #
            # Cache the plaintext so later F5 runs skip the decryption
            #
            self.addrdict[addr] = dec_str
        dec_str = self.addrdict[addr]
        #
        # Every F5 produces fresh mops, so the replacement has to be copied
        # in every time
        #
        newmop      = idaapi.mop_t()
        newmop.t    = idaapi.mop_str
        newmop.cstr = dec_str
        #
        # Must match the size of the original argument, otherwise
        # mba.verify() reports INTERR 50735
        #
        newmop.size = call_info.args[1].size
        call_info.args[1].copy_mop( newmop )
        return 0

class PrivateHexraysHooks ( idaapi.Hexrays_Hooks ) :

    def __init__ ( self, target, xorkey, keysize, addrdict ) :
        super().__init__()
        self.target     = target
        self.xorkey     = xorkey
        self.keysize    = keysize
        self.addrdict   = addrdict

    #
    # All calls have been analyzed.
    #
    # Triggered on every F5
    #
    def calls_done ( self, *args ) :
        #
        # mba means "micro block array". This object contains the microcode
        # of the function being decompiled
        #
        mba     = args[0]
        print( f"Hexrays_Hooks calls_done() triggered for function at {mba.entry_ea:#x}" )
        visitor = MopVisitor( self.target, self.xorkey, self.keysize, self.addrdict )
        mba.for_all_ops( visitor )
        mba.verify( True )
        return 0

HOOKS_CONTAINER = "Private Hooks"

def install_hooks () :
    if HOOKS_CONTAINER in sys.modules :
        print( "Hooks are already installed." )
        return
    target      = idaapi.get_name_ea( 0, "dec_conf" )
    xorkey      = 0x80cf3e8
    keysize     = 16
    addrdict    = {}
    hooks       = PrivateHexraysHooks( target, xorkey, keysize, addrdict )
    if hooks.hook() :
        class HooksContainer :
            pass
        container       = HooksContainer()
        container.hooks = hooks
        #
        # Park the hooks object in sys.modules so that it survives script
        # reloads and can be found again by remove_hooks()
        #
        sys.modules[HOOKS_CONTAINER] \
            = container
        print( "Hex-Rays hooks installed successfully and will persist." )
    else :
        print( "Failed to install Hex-Rays hooks." )

def remove_hooks () :
    if HOOKS_CONTAINER in sys.modules :
        container   = sys.modules[HOOKS_CONTAINER]
        container.hooks.unhook()
        del sys.modules[HOOKS_CONTAINER]
        print( "Hex-Rays hooks removed." )
    else :
        print( "No active hooks found." )

def main () :
    target  = idaapi.get_name_ea( 0, "dec_conf" )
    if target == idaapi.BADADDR :
        print( "Not found dec_conf()" )
        return
    install_hooks()
    # remove_hooks()
    #
    # This is the bluntest option; if you don't want to flush the F5 cache
    # of every function, use the next one instead
    #
    # Flush all cached decompilation results.
    #
    # idaapi.clear_cached_cfuncs()
    #
    # Force re-decompilation of each caller of target; pick either this or
    # the previous option
    #
    callerset   = set()
    for ref in idautils.XrefsTo( target ) :
        f   = idaapi.get_func( ref.frm )
        if f and f.start_ea not in callerset :
            callerset.add( f.start_ea )
            if not idaapi.is_code( idaapi.get_flags( f.start_ea ) ) :
                continue
            idaapi.mark_cfunc_dirty( f.start_ea )

if "__main__" == __name__ :
    if '__idacode__' in globals() and __idacode__ :
        dbg.bp( __idacode__, 'Hit breakpoint' )
    main()
--------------------------------------------------------------------------

The script installs a Hexrays_Hooks subclass that overrides calls_done(). This callback fires on every F5 and runs visit_mop() over the microcode, so every F5 yields the modified microcode; that is what makes the change persistent.

☆ Other discussion

1) Optimizers

bluerust has shown another persistent technique, the so-called optimizer: derive a class from idaapi.optinsn_t and override func(), which drives visit_minsn() and in turn visit_mop(). This approach may be the more orthodox one; curious readers may want to try it.

2) Further reading

Read the docstrings of the following functions in ida_hexrays.py, then ask an AI about them:

    refresh_view

        false means to rebuild ctree without regenerating microcode

    refresh_ctext

        refreshes the pseudocode window by regenerating its text from cfunc_t

    refresh_func_ctext
    refresh_pseudocode
    mark_lists_dirty
    mark_chains_dirty
    mark_cfunc_dirty

Questions worth asking an AI:

When do you need to call mark_chains_dirty? Its docstring reads:

    Mark the microcode use-def chains dirty. Call this function is any inter-block data dependencies got changed because of your modifications to the microcode. Failing to do so may cause an internal error.

What are "use-def chains"?
What are "inter-block data dependencies"?
When do you need to call mark_lists_dirty?

3) Debugging microcode programs

IDAPython scripts that modify microcode easily run into "internal errors". Sometimes IDA simply crashes; sometimes it shows a dialog reporting the error and asking you to send some.idb to Hex-Rays so they can track down a potential bug.

bluerust recommends always calling mba.verify(True) explicitly after modifying microcode. If microcode consistency has been broken, a warning dialog pops up, for example:

    Internal error 50735 has occurred and further decompilation is impossible.
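The SDK ships the verifier source code. verify.cpp implements "Verify microcode consistency"; searching it for the error code usually reveals the cause directly, with no reverse engineering needed. For example:

--------------------------------------------------------------------------
if ( a.type.get_size() != a.size )
    INTERR(50735); // argument size and its type size mismatch
--------------------------------------------------------------------------

The first version of XorDDoS_analyse_c_1.py did not set newmop.size correctly, so the argument's size no longer matched the size of its type and verification failed with error 50735.

Similarly, cverify.cpp implements "Verify ctree consistency" and can be searched for the corresponding error codes.

Some errors are more involved; for those, step through the verifier in a debugger.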
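Searching for the error code can be scripted. A minimal sketch, assuming you point it at wherever your copy of the SDK keeps the verifier sources (the exact directory layout is not given here): the hypothetical helper find_interr() locates INTERR(<code>) and returns it together with the preceding line, which usually holds the failing condition, and the equally hypothetical search_sdk() applies it to every file whose name ends in verify.cpp, which covers both verify.cpp and cverify.cpp.

```python
import re
from pathlib import Path

# Matches e.g. "INTERR(50735); // argument size and its type size mismatch"
_INTERR = re.compile(r"INTERR\((\d+)\)")


def find_interr(code, text, context=1):
    """Return [(line_no, snippet)] for every INTERR(code) in text.

    snippet also carries `context` preceding lines, because the condition
    that triggers the internal error usually sits right above it.
    """
    hits = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if any(int(m) == code for m in _INTERR.findall(line)):
            lo = max(0, i - context)
            hits.append((i + 1, "\n".join(lines[lo:i + 1])))
    return hits


def search_sdk(code, root):
    """Scan every *verify.cpp below root (a path you supply) and print hits."""
    results = []
    for path in sorted(Path(root).rglob("*verify.cpp")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, snippet in find_interr(code, text):
            print(f"{path}:{line_no}\n{snippet}\n")
            results.append((path.name, line_no, snippet))
    return results
```

Usage would be search_sdk(50735, sdk_dir) with sdk_dir set to your SDK location. The exact integer comparison avoids false matches such as 50735 inside 507350.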