标题: IDA插件epanos(MIPS to C decompiler)简介 创建: 2014-11-25 17:10 更新: 链接: https://scz.617.cn/misc/201411251710.txt https://github.com/drvink/electroportis ElectroPortis是ElectroPaint(一个IRIX上的屏保程序)的Windows版本,它不是后者 的重新实现,也不是手工逆向工程的产物,它是一个定制版MIPS反编译器的杰作。 drvink用他开发的MIPS反编译器处理了ElectroPaint,得到一份C代码,编译成Windows 版本。 https://github.com/drvink/epanos epanos是一款IDA插件,通过IDAPython加载执行,号称可以将MIPS汇编反编译成C代 码,目前Hex-Rays没有这种功能。作者好像是日本人,因为他说自己在东京上班。 下载源码,假设存放在"Z:\epanos"。它依赖另外三个Python库: -------------------------------------------------------------------------- pycparser http://github.com/eliben/pycparser flufl.enum http://pythonhosted.org/flufl.enum/ pyc-fmtstr-parser http://github.com/drvink/pyc-fmtstr-parser -------------------------------------------------------------------------- 它们的安装都一样: setup.py build setup.py install 用IDA 6.6反编译XXXXX的xxx,Alt-F7加载epanos.py,一堆异常: -------------------------------------------------------------------------- Traceback (most recent call last): File "Z:\IDA 6.6\python\idaapi.py", line 601, in IDAPython_ExecScript execfile(script, g) File "Z:/epanos/epanos.py", line 86, in print run(decompile=True) File "Z:/epanos/epanos.py", line 67, in run return decomp.run(externs, text_fns, cpp_filter, cpp_all, decompile) File "Z:/epanos\decomp\c\gen.py", line 110, in run data_txt = data.get_data(data_segs, cpp_filter) File "Z:/epanos\decomp\data.py", line 123, in get_data cdecl.get_decls(declstrs, cpp_in).decls[utils.cpp_decomp_tag]) File "Z:/epanos\decomp\c\decl.py", line 98, in get_decls decls = cpp.preprocess('%s\n%s' % (cpp_in, decls)) File "Z:/epanos\decomp\c\cpp.py", line 49, in preprocess ('Original error: %s' % e)) RuntimeError: Unable to invoke 'cpp'. Make sure its path was passed correctly Original error: [Error 2] -------------------------------------------------------------------------- 找不到cpp.exe。参看"https://github.com/eliben/pycparser"了解更多关于cpp的 解释,pycparser\utils\目录下有一个cpp.exe。检查epanos.py中的run(),变量 pycparser_dir被赋成"%HOMEPATH%\local\pycparser\utils",在Windows上这个位置 绝不可能找到cpp.exe。修改pycparser_dir成cpp.exe所在目录,比如: Z:\pycparser\utils -------------------------------------------------------------------------- Traceback (most recent call last): File "Z:\IDA 6.6\python\idaapi.py", line 601, in IDAPython_ExecScript execfile(script, g) File "Z:/epanos/epanos.py", line 87, in print run(decompile=True) File "Z:/epanos/epanos.py", line 68, in run return decomp.run(externs, text_fns, cpp_filter, cpp_all, decompile) File "Z:/epanos\decomp\c\gen.py", line 128, in run (lib_fns, tds) = data.get_fns_and_types(fn_segs, externs, cpp_all) File "Z:/epanos\decomp\data.py", line 145, in get_fns_and_types wanted_fns): File "Z:/epanos\decomp\utils.py", line 39, in zip_prefixes raise BugError('lists vary in length or unable to match all items') BugError: lists vary in length or unable to match all items -------------------------------------------------------------------------- 需要修改decomp\utils.py中的zip_prefixes(),它原来是: -------------------------------------------------------------------------- return [(max([s for s in shorts if l.startswith(s)]),l) for l in longs] -------------------------------------------------------------------------- 不知什么原因导致max()的形参为[],抛出异常。可以改成: -------------------------------------------------------------------------- def zip_prefixes(shorts, longs): xxx = [] for l in longs : yyy = [s for s in shorts if l.startswith(s)] if yyy : xxx.append( ( max( yyy ), l ) ) return xxx -------------------------------------------------------------------------- 抛出新异常: -------------------------------------------------------------------------- Traceback (most recent call last): File "Z:\IDA 6.6\python\idaapi.py", line 601, in IDAPython_ExecScript execfile(script, g) File "Z:/epanos/epanos.py", line 87, in print run(decompile=True) File "Z:/epanos/epanos.py", line 68, in run return decomp.run(externs, text_fns, cpp_filter, cpp_all, decompile) File "Z:/epanos\decomp\c\gen.py", line 132, in run STKVAR_MAP = data.get_stkvars(our_fns) File "Z:/epanos\decomp\data.py", line 173, in get_stkvars return reduce(make_input, fns, {}) File "Z:/epanos\decomp\data.py", line 166, in make_input var_map = ida.get_stkvar_map(ida.loc_by_name(fn)) File "Z:/epanos\decomp\ida.py", line 169, in get_stkvar_map return reduce(make_map, idautils.StructMembers(frame.id), {}) AttributeError: 'NoneType' object has no attribute 'id' -------------------------------------------------------------------------- 这个异常的起因未明,追溯到epanos.py中的run(),修改成: -------------------------------------------------------------------------- text_fns = frozenset(['main']) -------------------------------------------------------------------------- 原来的text_fns被赋了一堆函数名,除了"main"之外,其他函数不知是哪儿冒出来的, 像是一种Hacking。 抛出新异常: -------------------------------------------------------------------------- Traceback (most recent call last): File "Z:\IDA 6.6\python\idaapi.py", line 601, in IDAPython_ExecScript execfile(script, g) File "Z:/epanos/epanos.py", line 90, in print run(decompile=True) File "Z:/epanos/epanos.py", line 71, in run return decomp.run(externs, text_fns, cpp_filter, cpp_all, decompile) File "Z:/epanos\decomp\c\gen.py", line 142, in run for decl in protos))) File "Z:/epanos\decomp\c\gen.py", line 142, in for decl in protos))) File "Z:/epanos\decomp\c\gen.py", line 66, in generate c_for_insn(start_ea, our_fns, extern_reg_map, stkvars)] File "Z:/epanos\decomp\c\gen.py", line 49, in c_for_insn ea, our_fns, extern_reg_map, stkvars, from_delay=False) File "Z:/epanos\decomp\cpu\mips\gen.py", line 434, in fmt_insn delayed = extern_call(callee, sig, mnem, ea) File "Z:/epanos\decomp\cpu\mips\gen.py", line 190, in extern_call va_arg = data.get_arg_for_va_function(callee, ea) File "Z:/epanos\decomp\cpu\mips\data.py", line 102, in get_arg_for_va_function rd = ida.get_op(ea, 0) File "Z:/epanos\decomp\ida.py", line 237, in get_op opd = cmd[op] File "Z:\IDA 6.6\python\idaapi.py", line 50036, in __getitem__ return self.Operands[idx] IndexError: list index out of range -------------------------------------------------------------------------- 修改decomp\ida.py中的get_op(),在函数入口加一句: -------------------------------------------------------------------------- print "ea=0x%08X" % ea -------------------------------------------------------------------------- 确认抛出异常时的地址: ea=0x00404CA0 这是一条位于延迟插槽的nop指令。 -------------------------------------------------------------------------- 00404C58 00 40 C8 21 move $t9, $v0 00404C5C 03 20 F8 09 jalr $t9 ; strcmp 00404C60 00 00 00 00 nop 00404C64 8F DC 00 10 lw $gp, 0xE8+var_D8($fp) 00404C68 14 40 00 74 bnez $v0, loc_404E3C 00404C6C 00 00 00 00 nop 00404C70 3C 02 00 43 lui $v0, 0x43 00404C74 24 50 F1 08 addiu $s0, $v0, (aS_ - 0x430000) # "\r\n%s.\n" 00404C78 8F 82 82 F4 la $v0, XXX_XXX_XXXXXXXXX 00404C7C 00 40 C8 21 move $t9, $v0 00404C80 03 20 F8 09 jalr $t9 ; XXX_XXX_XXXXXXXXX 00404C84 00 00 00 00 nop 00404C88 8F DC 00 10 lw $gp, 0xE8+var_D8($fp) 00404C8C 02 00 20 21 move $a0, $s0 # format 00404C90 00 40 28 21 move $a1, $v0 00404C94 8F 82 80 CC la $v0, printf 00404C98 00 40 C8 21 move $t9, $v0 00404C9C 03 20 F8 09 jalr $t9 ; printf 00404CA0 00 00 00 00 nop -------------------------------------------------------------------------- ea=0x00404C5C // 同样是jalr指令,后面的nop指令被越过去了 ea=0x00404C64 ea=0x00404C64 ea=0x00404C68 ea=0x00404C68 ea=0x00404C70 ea=0x00404C70 ea=0x00404C74 ea=0x00404C74 ea=0x00404C74 ea=0x00404C78 ea=0x00404C78 ea=0x00404C7C ea=0x00404C7C ea=0x00404C80 // 同样是jalr指令,后面的nop指令被越过去了 ea=0x00404C88 ea=0x00404C88 ea=0x00404C8C ea=0x00404C8C ea=0x00404C90 ea=0x00404C90 ea=0x00404C94 ea=0x00404C94 ea=0x00404C98 ea=0x00404C98 ea=0x00404C9C // 这条jalr指令的延迟插槽未被越过,printf()是变参函数 ea=0x00404CA0 // 延迟插槽 -------------------------------------------------------------------------- 检查decomp\cpu\mips\data.py中的get_arg_for_va_function(),这个函数有很多问 题。它试图从延迟插槽开始向低址方向寻找fmtstr。 检查decomp\ida.py中的get_op(),它调用idautils.DecodeInstruction()对指定地 址的指令进行解码。可以在给cmd赋值之后加一条代码: print "0x%08X | %s" % ( ea, cmd.get_canon_mnem() ) get_operands()靠idaapi.o_void区分有效操作数,并返回由它们组成的list;如果 当前指令(操作码)是nop,返回空list([])。get_op()的第二形参是前述list的下标, 返回指定位置的操作数。该函数假设肯定有操作数,如果碰上没有操作数的情形,就 会触发"list index out of range"。 get_op()的主调函数get_arg_for_va_function()没有考虑遭遇nop指令的情形,估计 drvink反编译IRIX屏保时通篇就没有nop指令出现。修改get_op(),没有操作数时返 回None,而不是抛出异常: -------------------------------------------------------------------------- def get_op ( ea, op, stkvars=None ) : cmd = idautils.DecodeInstruction( ea ) cmd.Operands = get_operands( cmd ) if not cmd.Operands : return None ... -------------------------------------------------------------------------- 修改get_arg_for_va_function(),检查get_op()的返回值: -------------------------------------------------------------------------- # # 取目标寄存器 # rd = ida.get_op( ea, 0 ) if rd is not None and rd.val == wanted_reg : -------------------------------------------------------------------------- 另一种不太好的修改是: -------------------------------------------------------------------------- if "nop" == ida.get_mnem( ea ) : ea = ida.prev_head( ea ) continue rd = ida.get_op( ea, 0 ) -------------------------------------------------------------------------- get_arg_for_va_function()假设fmtstr的地址直接赋给$a0或$a1,但0x00404C94处 printf()所用fmtstr是先赋给$s0,再从$s0赋给$a0。解析到0x00404C8C时, ida.get_opvals()返回由2个操作数组成的list,这两个操作数都是寄存器(o_reg), 对于"s = ida.get_string( opvals[-1].target )"来说,target等于0。 ida.get_string()实际调的是idc.GetString(),后者的帮助信息: https://www.hex-rays.com/products/ida/support/idadoc/1516.shtml https://www.hex-rays.com/products/ida/support/idadoc/283.shtml 函数原型是: string GetString ( long ea, long len, long type ); 在IDA里测试: Message( GetString( 0x00430000, -1, 0 ) ); 确实需要3个形参,为啥ida.get_string()只传了第一形参,难道后两个形参有缺省 值?先不管这个,ida.get_string( opvals[-1].target )本意是获取fmtstr,结果 返回None。get_arg_for_va_function()继续向低址方向无谓地寻找fmtstr,抛出新 异常: -------------------------------------------------------------------------- Traceback (most recent call last): File "Z:\IDA 6.6\python\idaapi.py", line 601, in IDAPython_ExecScript execfile(script, g) File "Z:/epanos/epanos.py", line 90, in print run(decompile=True) File "Z:/epanos/epanos.py", line 71, in run return decomp.run(externs, text_fns, cpp_filter, cpp_all, decompile) File "Z:/epanos\decomp\c\gen.py", line 142, in run for decl in protos))) File "Z:/epanos\decomp\c\gen.py", line 142, in for decl in protos))) File "Z:/epanos\decomp\c\gen.py", line 66, in generate c_for_insn(start_ea, our_fns, extern_reg_map, stkvars)] File "Z:/epanos\decomp\c\gen.py", line 49, in c_for_insn ea, our_fns, extern_reg_map, stkvars, from_delay=False) File "Z:/epanos\decomp\cpu\mips\gen.py", line 434, in fmt_insn delayed = extern_call(callee, sig, mnem, ea) File "Z:/epanos\decomp\cpu\mips\gen.py", line 190, in extern_call va_arg = data.get_arg_for_va_function(callee, ea) File "Z:/epanos\decomp\cpu\mips\data.py", line 100, in get_arg_for_va_function 'between %s..%s' % (ida.atoa(ea), ida.atoa(start_ea))) VarargsError: encountered branch/jump while looking for varargs argument between .text:404C80...text:404C9C -------------------------------------------------------------------------- 意思是在0x00404C80处碰到流程转移指令(jalr),寻找fmtstr失败,无法继续。一路 辛辛苦苦修改至此,发现epanos所谓的反编译针对性极强,碰上稍微复杂一些的编译 (优化)结果,就没辙了。其实这个问题就是MIPS的交叉引用问题,只能针对特定模式 编写特定解析代码。结论,epanos是个美好的PoC,真正想用起来,需要做大量修改。 多年不写IDAPython、IDC脚本,有些帮助信息都找不到了(怀念与hume一起战斗的日 子),重新找一些: -------------------------------------------------------------------------- Alphabetical list of IDC functions https://www.hex-rays.com/products/ida/support/idadoc/162.shtml Module idautils https://www.hex-rays.com/products/ida/support/idapython_docs/idautils-module.html Class insn_t https://www.hex-rays.com/products/ida/support/idapython_docs/idaapi.insn_t-class.html Class op_t https://www.hex-rays.com/products/ida/support/idapython_docs/idaapi.op_t-class.html https://www.hex-rays.com/products/ida/support/idadoc/index.shtml https://www.hex-rays.com/products/ida/support/idapython_docs/ -------------------------------------------------------------------------- 更新了idahelper_20141125.py,主要使用dump_here()。 编写hello_mips.c如下: -------------------------------------------------------------------------- /* * gcc-2.95 -Wall -pipe -O2 -s -o hello_mips hello_mips.c */ #include #include int main ( int argc, char * argv[] ) { printf( "hello world\n" ); return( EXIT_SUCCESS ); } -------------------------------------------------------------------------- epanos抛出异常: -------------------------------------------------------------------------- Traceback (most recent call last): File "Z:\IDA 6.6\python\idaapi.py", line 601, in IDAPython_ExecScript execfile(script, g) File "Z:/epanos/epanos.py", line 90, in print run(decompile=True) File "Z:/epanos/epanos.py", line 71, in run return decomp.run(externs, text_fns, cpp_filter, cpp_all, decompile) File "Z:/epanos\decomp\c\gen.py", line 110, in run data_txt = data.get_data(data_segs, cpp_filter) File "Z:/epanos\decomp\data.py", line 123, in get_data cdecl.get_decls(declstrs, cpp_in).decls[utils.cpp_decomp_tag]) KeyError: '' -------------------------------------------------------------------------- 这个异常我就完全看不懂了,不知所云。此时utils.cpp_decomp_tag等于""。 写信问问drvink,撞个大运。 按lyx的观点,这类反编译器产生的C代码其实很低级,还不如直接看汇编。从ep.c以 及前面的分析看,似乎lyx说的是对的。 https://github.com/drvink/electroportis/tree/master/ElectroPortis/src https://github.com/drvink/electroportis/blob/master/ElectroPortis/src/ep.c