Title: "mov edx,0x01234567" zeroes the upper 32 bits of rdx

Created: 2017-08-14 17:05
Link: https://scz.617.cn/misc/201708141705.txt

It started when I wanted to test the following code snippet by hand:

mov edx,0x01234567
neg rdx
shl rdx,2

> .dvalloc 0x1000
Allocated 1000 bytes starting at 00000000`00060000
> r $t0=0`00060000
> !vprot @$t0
BaseAddress:       0000000000060000
AllocationBase:    0000000000060000
AllocationProtect: 00000040  PAGE_EXECUTE_READWRITE
RegionSize:        0000000000001000
State:             00001000  MEM_COMMIT
Protect:           00000040  PAGE_EXECUTE_READWRITE
Type:              00020000  MEM_PRIVATE
> eb @$t0 ba 67 45 23 01 48 f7 da 48 c1 e2 02
> u @$t0 l 3
00000000`00060000 ba67452301      mov     edx,1234567h
00000000`00060005 48f7da          neg     rdx
00000000`00060008 48c1e202        shl     rdx,2
> r rip=@$t0
> r rdx=0
> p 3
> r rdx
rdx=fffffffffb72ea64

> r rip=@$t0
> r rdx=0xffffffff00000000
> p 3
> r rdx
rdx=fffffffffb72ea64

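To my surprise, the final result is always 0xfffffffffb72ea64, whatever the
initial value of rdx. Single-stepping showed that "mov edx,0x01234567" zeroes
the upper 32 bits of rdx, whereas "mov dl,1" leaves the upper 56 (32+24) bits
of rdx untouched. This overturned years of 32-bit assembly experience, and for
a while I suspected I had made some elementary mistake.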
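
The arithmetic can be checked outside the debugger. Below is a minimal Python
sketch that models the three instructions on a 64-bit value. The function name
run_snippet and the zero_extend flag are hypothetical, introduced only for
illustration; zero_extend=False models the behaviour expected from 32-bit
habits (upper half preserved), which is not what the CPU does in 64-bit mode.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def run_snippet(rdx: int, zero_extend: bool = True) -> int:
    """Model mov edx,0x01234567 / neg rdx / shl rdx,2 on a 64-bit rdx."""
    imm = 0x01234567
    if zero_extend:
        # Real 64-bit mode behaviour: the 32-bit write is zero-extended,
        # so the initial value of rdx is discarded entirely.
        rdx = imm & MASK32
    else:
        # Hypothetical behaviour (NOT what the CPU does): keep the upper half.
        rdx = (rdx & ~MASK32 & MASK64) | (imm & MASK32)
    # neg rdx: 64-bit two's-complement negation.
    rdx = (-rdx) & MASK64
    # shl rdx,2: 64-bit left shift; bits shifted past bit 63 are dropped.
    rdx = (rdx << 2) & MASK64
    return rdx


if __name__ == "__main__":
    for initial in (0, 0xFFFFFFFF00000000):
        print(f"{initial:016x} -> {run_snippet(initial):016x}")
```

Both initial values from the WinDbg session give fffffffffb72ea64 under the
zero-extending model. With zero_extend=False the second run would instead give
00000003fb72ea64, which is the kind of result I had been expecting.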

I asked hume, who pointed out:

Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf

Section "3.4.1.1 General-Purpose Registers in 64-Bit Mode" contains the
following:

--------------------------------------------------------------------------
When in 64-bit mode, operand size determines the number of valid bits in
the destination general-purpose register:

64-bit operands generate a 64-bit result in the destination
general-purpose register.

32-bit operands generate a 32-bit result, zero-extended to a 64-bit result
in the destination general-purpose register.

8-bit and 16-bit operands generate an 8-bit or 16-bit result. The upper 56
bits or 48 bits (respectively) of the destination general-purpose register
are not modified by the operation. If the result of an 8-bit or 16-bit
operation is intended for 64-bit address calculation, explicitly
sign-extend the register to the full 64-bits.
--------------------------------------------------------------------------
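
The three rules quoted above can be sketched as a small Python model of a
register write. The name write_gpr is hypothetical, and the 8-bit case here
models only the low-byte registers (such as dl), not the legacy high-byte
registers (such as dh); it is an illustration of the rules, not an emulator.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, value: int, bits: int) -> int:
    """Return the new 64-bit register value after a write of `bits` width."""
    if bits == 64:
        # 64-bit operand: the full register is replaced.
        return value & MASK64
    if bits == 32:
        # 32-bit operand: result is zero-extended, upper 32 bits become 0.
        return value & 0xFFFFFFFF
    if bits in (8, 16):
        # 8-bit / 16-bit operand: upper 56 / 48 bits are not modified.
        low = (1 << bits) - 1
        return (old & ~low & MASK64) | (value & low)
    raise ValueError("operand size must be 8, 16, 32 or 64 bits")


if __name__ == "__main__":
    old = 0xFFFFFFFF00000000
    print(f"mov edx,0x01234567 -> {write_gpr(old, 0x01234567, 32):016x}")
    print(f"mov dl,1           -> {write_gpr(old, 1, 8):016x}")
```

Starting from rdx=0xffffffff00000000, the 32-bit write yields
0000000001234567 while the 8-bit write yields ffffffff00000001, matching what
single-stepping showed.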

In addition, bluerust found:

Why do most x64 instructions zero the upper part of a 32 bit register
https://stackoverflow.com/questions/11177137/why-do-most-x64-instructions-zero-the-upper-part-of-a-32-bit-register

Its top answer is excerpted below:

--------------------------------------------------------------------------
I'm not AMD or speaking for them, but I would have done it the same way.
Because zeroing the high half doesn't create a dependency on the previous
value, that the cpu would have to wait on. The register renaming mechanism
would essentially be defeated if it wasn't done that way. This way you can
write fast 32bit code in 64bit mode without having to explicitly break
dependencies all the time. Without this behaviour, every single 32bit
instruction in 64bit mode would have to wait on something that happened
before, even though that high part would almost never be used.

The behaviour for 16bit instructions is the strange one. The dependency
madness is one of the reasons that 16bit instructions are avoided now.
--------------------------------------------------------------------------

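Well, I am completely unfamiliar with 64-bit assembly and had never tinkered
with it, so today's pitfall was a big one for me. I was only reading a piece of
IDA disassembly; however I read it, the logic did not add up, and it never
occurred to me that a trap like this existed.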