Title: The liblzma backdoor: a suspected nation-state APT

Created: 2024-03-29 09:00
Updated: 2024-05-08 09:32
Link: https://scz.617.cn/unix/202403290900.txt

Below are the technical articles on the liblzma backdoor that I had read as
of 2024-04-01, with some excerpts.

--------------------------------------------------------------------------
backdoor in upstream xz/liblzma leading to ssh server compromise - Andres Freund & Florian Weimer [2024-03-29]
https://www.openwall.com/lists/oss-security/2024/03/29/4
(the original discoverer, a Microsoft engineer)
(contains a wealth of preliminary technical details)

To analyze I primarily used "perf record -e intel_pt//ub" to observe where
execution diverges between the backdoor being active and not. Then also
gdb, setting breakpoints before the divergence.
--------------------------------------------------------------------------
XZ Utils backdoor - Lasse Collin
https://tukaani.org/xz-backdoor/
(a short statement from Larhzu, the original xz author)

Everything I Know About the XZ Backdoor - Evan Boehs [2024-03-29]
https://boehs.org/node/everything-i-know-about-the-xz-backdoor
(replaces safe_fprintf with an unsafe variant fprintf)
(traces Jia Tan's background and history)
(185.128.24.163 Singapore/Jia Cheong Tan)

FAQ on the xz-utils backdoor
https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78baad9e27
(IFUNC, a mechanism in glibc that allows for indirect function calls)
(diff build-to-host.m4)

xz/liblzma: Bash-stage Obfuscation Explained - Gynvael Coldwind [2024-03-30]
https://gynvael.coldwind.pl/?lang=en&id=782
(explains build-to-host.m4 and the bash scripts)
(an RC4 variant implemented in awk)
(how liblzma_la-crc64-fast.o is extracted)
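Gynvael's write-up says the bash stage decrypts the hidden payload with an RC4 variant written in awk. The specific tweaks of that variant are not reproduced in these notes. The Python sketch below is therefore plain textbook RC4: the key-scheduling step (KSA) followed by keystream generation (PRGA). It only illustrates the general shape of the algorithm and is not the backdoor's code.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4; the same call encrypts and decrypts."""
    # Key-scheduling algorithm (KSA): permute 0..255 under the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA): XOR the keystream into data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

# Well-known RC4 test vector: key "Key", plaintext "Plaintext".
print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```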

RC4 recognizer here - nugxperience
https://twitter.com/nugxperience/status/1773906926503591970

A second explanation of build-to-host.m4 and the bash scripts
https://pastebin.com/5gnnL2yT

A third explanation of build-to-host.m4 and the bash scripts - Jonathan Schleifer [2024-03-30]
https://github.com/Midar/xz-backdoor-documentation/wiki

An overview diagram of the liblzma backdoor - Thomas Roccia (@fr0gger_)
https://twitter.com/fr0gger_/status/1774342248437813525

XZ Backdoor: Times, damned times, and scams - Rhea Karty & Simon Henniger [2024-03-30]
https://rheaeve.substack.com/p/xz-backdoor-times-damned-times-and
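(looks at the liblzma backdoor from a time-zone angle)
(the authors argue that Jia Tan tried to make people conclude he was Chinese,
but that Jia more likely worked in UTC+02/03)
(the comments contain dissenting views)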

Backdoor in XZ Utils allows RCE: everything you need to know - Merav Bar, Amitai Cohen, Danielle Aminov [2024-03-30]
https://www.wiz.io/blog/cve-2024-3094-critical-rce-vulnerability-found-in-xz-utils
(an opportunistic bandwagon piece with a clickbait title)
--------------------------------------------------------------------------
CVE-2024-3094 XZ Backdoor: All you need to know - Shachar Menashe, Jonathan Sar Shalom, Brian Moussalli [2024-03-31]
https://jfrog.com/blog/xz-backdoor-attack-cve-2024-3094-all-you-need-to-know/
(Timeline of the attack)

Another possible workaround is to take advantage of the backdoor's "kill
switch". Adding the following string to /etc/environment will disable the
malicious backdoor functionality (applies after restarting SSH and Systemd)

yolAbejyiejuvnup=Evjtgvsh5okmkAvj
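The kill switch is simply the environment-variable assignment shown above. The following is a minimal sketch that is not part of the JFrog post. It checks /etc/environment-style text for that exact line and appends the line if it is missing. The function names are hypothetical. The code works on strings only, so you can review the result before writing it back as root. According to JFrog, the change only takes effect after restarting SSH and systemd.

```python
KILL_SWITCH = "yolAbejyiejuvnup=Evjtgvsh5okmkAvj"

def has_kill_switch(env_text: str) -> bool:
    """True if the kill-switch assignment appears as its own line."""
    return any(line.strip() == KILL_SWITCH for line in env_text.splitlines())

def add_kill_switch(env_text: str) -> str:
    """Return env_text with the kill-switch line appended if it is missing."""
    if has_kill_switch(env_text):
        return env_text
    if env_text and not env_text.endswith("\n"):
        env_text += "\n"  # keep the existing last line intact
    return env_text + KILL_SWITCH + "\n"

sample = 'PATH="/usr/local/sbin:/usr/local/bin:/usr/bin"\n'
print(has_kill_switch(sample))                   # False
print(has_kill_switch(add_kill_switch(sample)))  # True
```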

The payload hooks the RSA_public_decrypt function, a function originally
used for validating RSA signatures. The malicious hook code examines the
RSA public modulus ("N" value) passed inside the RSA struct (4th argument
of RSA_public_decrypt). Note that this modulus is completely controlled by
the connecting SSH client (in our case, the attackers).

The malicious hook code examines the first 16 bytes of the "N" value,
which are used in a simple calculation to derive a "Command Number"
between 0 and 3. The command number sets the backdoor's current operation:

Command 0x00    Unknown
Command 0x01    SSH authentication bypass
Command 0x02    Execute shell command
Command 0x03    Execute shell command with specified UID/GID
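JFrog only calls the derivation of the command number "a simple calculation" over the first 16 bytes of N. The Python sketch below makes an explicit assumption about that header, taken from other public analyses: three little-endian integers, a (32-bit), b (32-bit) and c (64-bit), with command = a * b + c. Treat that layout as unverified. The sketch splits a 256-byte N, a 2048-bit modulus, into that 16-byte header and the 240-byte encrypted tail described in the next paragraph. It returns None when the value falls outside 0..3.

```python
import struct

HEADER_LEN = 16   # first 16 bytes of N (per the JFrog write-up)
BODY_LEN = 240    # last 240 bytes of N, ChaCha20-encrypted

COMMANDS = {
    0: "unknown",
    1: "SSH authentication bypass",
    2: "execute shell command",
    3: "execute shell command with specified UID/GID",
}

def parse_modulus(n_bytes: bytes):
    """Split N into (command, header, body); command is None if not a payload.

    ASSUMPTION: header = <u32 a><u32 b><u64 c>, little-endian,
    and command = a * b + c computed with 64-bit unsigned wraparound.
    """
    if len(n_bytes) != HEADER_LEN + BODY_LEN:
        return None, b"", b""
    header, body = n_bytes[:HEADER_LEN], n_bytes[HEADER_LEN:]
    a, b, c = struct.unpack("<IIQ", header)
    command = (a * b + c) % 2**64
    if command > 3:
        return None, header, body  # not a command: the original path would run
    return command, header, body

n = struct.pack("<IIQ", 1, 2, 0) + bytes(BODY_LEN)  # a * b + c = 2
command, header, body = parse_modulus(n)
print(command, COMMANDS[command])  # 2 execute shell command
```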

The malicious hook code then decrypts the last 240 bytes of the "N" value
using the ChaCha20 symmetric stream cipher, with a hardcoded decryption
key:

0a 31 fd 3b 2f 1f c6 92 92 68 32 52 c8 c1 ac 28
34 d1 f2 c9 75 c4 76 5e b1 f6 88 58 88 93 3e 48

Since this is a symmetric, hardcoded key it can be used to decrypt network
captures of real-world attack attempts to understand which commands were
sent from the attacker to the victim.

The decrypted data contains 114 bytes of signature which are checked for
validity by using the Ed448 asymmetric elliptic curve signing algorithm,
specifically using the following Ed448 public key:

0a 31 fd 3b 2f 1f c6 92 92 68 32 52 c8 c1 ac 28
34 d1 f2 c9 75 c4 76 5e b1 f6 88 58 88 93 3e 48
10 0c b0 6c 3a be 14 ee 89 55 d2 45 00 c7 7f 6e
20 d3 2c 60 2b 2c 6d 31 00

While the public key is well-known, only the attackers have the
corresponding Ed448 private signing key, ensuring that only the attackers
can generate valid payloads for the backdoor. Furthermore, the signature
is bound to the host's public key, meaning that a valid signature for one
host cannot be reused on a different host.

If the signature was verified as valid, the backdoor uses the bytes
directly following the signature bytes as command-specific payload data.
For example in command 0x02, the payload bytes contain a NULL-terminated
shell command string (ex. cat /etc/shadow), that is directly passed to
system().
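The two paragraphs above describe a layout for the decrypted 240 bytes: a 114-byte Ed448 signature first, then command-specific data. For command 0x02, that data is a NUL-terminated shell command. Below is a minimal parsing sketch that assumes exactly this layout. The verification helper uses the Ed448 API of the Python `cryptography` package. The excerpts only say the signature is bound to the host public key and do not spell out the exact bytes it covers. For that reason, `signed_message` is left for the caller to reconstruct.

```python
# 57-byte Ed448 public key quoted in the JFrog write-up.
ED448_PUBKEY = bytes.fromhex(
    "0a31fd3b2f1fc69292683252c8c1ac28"
    "34d1f2c975c4765eb1f6885888933e48"
    "100cb06c3abe14ee8955d24500c77f6e"
    "20d32c602b2c6d3100"
)
SIG_LEN = 114  # Ed448 signatures are 114 bytes; public keys are 57 bytes

def split_signed_payload(plaintext: bytes):
    """Return (signature, command_payload) from the decrypted body."""
    if len(plaintext) < SIG_LEN:
        raise ValueError("too short to hold an Ed448 signature")
    return plaintext[:SIG_LEN], plaintext[SIG_LEN:]

def shell_command(payload: bytes) -> str:
    """Command 0x02: the payload starts with a NUL-terminated command string."""
    end = payload.find(b"\0")
    if end < 0:
        raise ValueError("command string is not NUL-terminated")
    return payload[:end].decode("utf-8", errors="replace")

def signature_is_valid(signature: bytes, signed_message: bytes) -> bool:
    """Verify with the `cryptography` package.

    ASSUMPTION: the caller rebuilds signed_message (host-key binding etc.).
    """
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed448 import Ed448PublicKey

    key = Ed448PublicKey.from_public_bytes(ED448_PUBKEY)
    try:
        key.verify(signature, signed_message)
        return True
    except InvalidSignature:
        return False
```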

If the data is invalid in any way (malformed payload, invalid signature),
the original implementation of RSA_public_decrypt is resumed in a
transparent manner. This means the detection of vulnerable machines over
the network may be impossible for anyone besides the attackers.

The sophisticated nature of this attack and the use of highly future-proof
crypto algorithms (Ed448 vs the more standard Ed25519) led many to believe
that the attack may be a nation-state level cyberattack.
--------------------------------------------------------------------------
It's RCE, not auth bypass, and gated/unreplayable - Filippo Valsorda [2024-03-31]
https://bsky.app/profile/filippo.abyssdomain.expert/post/3kowjkx2njy2b

The hooked RSA_public_decrypt verifies a signature on the server's host
key by a fixed Ed448 key, and then passes a payload to system(). The
payload is extracted from the N value (the public key) passed to
RSA_public_decrypt, checked against a simple fingerprint, and decrypted
with a fixed ChaCha20 key before the Ed448 signature verification.
RSA_public_decrypt is a (weirdly named) signature verification function.
Why "decrypt"? RSA sig verification is the same op as RSA encryption.
The RSA_public_decrypt public key can be attacker-controlled pre-auth by
using OpenSSH certificates. OpenSSH certs are weird in that they include
the signer's public key. OpenSSH checks the signature on parsing. Here's a
script by Keegan Ryan for sending a custom public key in a certificate,
which on a backdoored system will reach the hooked function.

modify_ssh_rsa_pubkey.py
https://gist.github.com/keeganryan/a6c22e1045e67c17e88a606dfdf95ae4
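Filippo's remark that RSA signature verification is the same operation as RSA encryption can be checked with the textbook toy key p = 61, q = 53 (so n = 3233, e = 17, d = 2753). This sketch has no padding and no hashing, so it only illustrates the modular arithmetic. Never use numbers like these for real keys.

```python
# Textbook toy RSA key (insecure; arithmetic illustration only).
p, q = 61, 53
n = p * q   # 3233
e = 17      # public exponent
d = 2753    # private exponent: e * d = 46801 = 15 * 3120 + 1, 3120 = (p-1)(q-1)

def rsa_public_op(x: int) -> int:
    """Encryption and signature verification are both x^e mod n."""
    return pow(x, e, n)

def rsa_private_op(x: int) -> int:
    """Decryption and signing are both x^d mod n."""
    return pow(x, d, n)

m = 65
signature = rsa_private_op(m)          # "sign" with the private key
recovered = rsa_public_op(signature)   # RSA_public_decrypt-style verification
print(recovered == m)                  # True
print(rsa_public_op(m))                # 2790, the "encryption" of 65
```

Because the public-key operation takes n from the key it is given, an attacker who controls the certificate's public key also controls the N value that reaches the hooked function.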

Apparently the backdoor reverts back to regular operation if the payload
is malformed or the signature from the attacker's key doesn't verify.
Unfortunately, this means that unless a bug is found, we can't write a
reliable/reusable over-the-network scanner. To clarify, by "gated" I mean
it takes the attacker's private key to use the backdoor (it's NOBUS); by
"unreplayable" I mean that even if we observe an attack against one host,
we can't reuse it against another host (the attacker's signature is bound
to the host public key, but not to the command).
--------------------------------------------------------------------------
Information about the liblzma (xz-utils) backdoor - karcherm [2024-03-31]
https://github.com/karcherm/xz-malware
(Stuff discovered while analyzing the malware hidden in xz-utils 5.6.0 and 5.6.1)
(recovered the strings from the .o)

I am a reverse engineer, and tried some static analysis on that code. One
key feature is that the code does not contain any ASCII strings, neither
in clear text nor in obfuscated form. Instead, it recognizes all relevant
strings using one single deterministic finite automaton, a technique
commonly used to search for terms given by regular expressions.

I wrote a script that decodes the tables for the table-driven DFA and
outputs the strings recognized by it accompanied with the "ID" assigned to
the terminal accepting state that represents that string.
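karcherm describes the backdoor recognizing every relevant string with a single table-driven DFA and then checking the ID of the accepting state, rather than storing the strings themselves. The Python sketch below is a minimal version of that idea. It assumes a trie-shaped DFA flattened into a transition table. The example strings and IDs are made up, and the backdoor's actual table encoding is not reproduced. The real binary ships only the tables; this sketch builds them from strings at runtime purely to show how a lookup works.

```python
def build_dfa(words):
    """Build a trie-shaped DFA from {string: id}.

    transitions[state][byte] -> next state; state 0 is the dead (reject) state.
    """
    transitions = [[0] * 256, [0] * 256]  # state 0 = dead, state 1 = start
    accept = {}                           # accepting state -> string ID
    for word, ident in words.items():
        state = 1
        for byte in word.encode("ascii"):
            if transitions[state][byte] == 0:
                transitions.append([0] * 256)
                transitions[state][byte] = len(transitions) - 1
            state = transitions[state][byte]
        accept[state] = ident
    return transitions, accept

def lookup(transitions, accept, data: bytes) -> int:
    """Run the DFA over data; return the string ID, or 0 if not recognized."""
    state = 1
    for byte in data:
        state = transitions[state][byte]
        if state == 0:
            return 0
    return accept.get(state, 0)

# Hypothetical IDs; callers compare numbers instead of calling strcmp.
TABLES = build_dfa({"RSA_public_decrypt": 101, "system": 102, "sshd": 103})
STRING_ID_SYSTEM = 102
if lookup(*TABLES, b"system") == STRING_ID_SYSTEM:
    print("matched by ID, not by strcmp")
```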
--------------------------------------------------------------------------
XZ Backdoor Analysis and symbol mapping - smx-smx
https://gist.github.com/smx-smx/a6112d54777845d389bd7126d6e9f504
(outstanding reverse engineering that explains what the symbols in the 5.6.0 .o actually mean)
(Retrieves the index of the encoded string given the plaintext string in memory)

XZ backdoor reverse engineering - Stefano Moioli <smxdev4@gmail.com>
https://github.com/smx-smx/xzre

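From 0x指纹 (5845952017):

xzre.h contains the function declarations, struct declarations and extensive
explanatory comments for liblzma_la-crc64-fast.o; the whole project is
practically the author sharing an extremely detailed version of their
reverse-engineering idb.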
--------------------------------------------------------------------------
Analysis of the xz-utils backdoor vulnerability CVE-2024-3094 - 360 Vulnerability Research Institute [2024-04-01]
https://mp.weixin.qq.com/s/4ju-aG027mFJC2r9HbCaPQ

Exploration of the xz backdoor (CVE-2024-3094) - [2024-04-01]
https://github.com/amlweems/xzbot
(notes, honeypot, and exploit demo for CVE-2024-3094)
(RCE PoC; requires patching the backdoor's signature-verification key)

An ssh honeypot with the XZ backdoor - [2024-04-02]
https://github.com/lockness-Ko/xz-vulnerable-honeypot

Analysis of how the xz/liblzma backdoor injects its malicious code - Lenny Wang, UID(2045181921) [2024-04-03]
https://lennysec.github.io/xz-backdoor-code-injection-analysis/
(explains the initial hook)
--------------------------------------------------------------------------
the xz sshd backdoor rabbithole - [2024-04-07]
https://twitter.com/bl4sty/status/1776691497506623562
(besides RCE, there is also a login authentication bypass)

whoever designed this stuff had to take a deep dive into openSSH(d)
internals.
--------------------------------------------------------------------------
XZ backdoor story Initial analysis - GReAT (Global Research & Analysis Team, Kaspersky Lab) [2024-04-12]
https://securelist.com/xz-backdoor-story-part-1/112354/
(Kaspersky's analysis, showing the details of the malicious hooks)

https://en.wikipedia.org/wiki/Trie
(digital tree or prefix tree)

rtld-audit - auditing API for the dynamic linker
https://www.man7.org/linux/man-pages/man7/rtld-audit.7.html

Backdoored source distributions

xz-5.6.0
MD5     c518d573a716b2b2bc2413e6c9b5dbde
SHA1    e7bbec6f99b6b06c46420d4b6e5b6daa86948d3b
SHA256  0f5c81f14171b74fcc9777d302304d964e63ffc2d7b634ef023a7249d9b5d875

xz-5.6.1
MD5     5aeddab53ee2cbd694f901a080f84bf1
SHA1    675fd58f48dba5eceaf8bfc259d0ea1aab7ad0a7
SHA256  2398f4a8e53345325f44bdd9f0cc7401bd9025d736c6d43b372f4dea77bf75b8

https://www.virustotal.com/gui/file/<SHA256>/detection
https://s.threatbook.com/report/file/<SHA256>
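Before looking anything up on VirusTotal or ThreatBook, you can check a downloaded source distribution locally against the SHA256 values listed above. The Python sketch below does that; the function names are hypothetical. A non-match only means the file is not one of these two known backdoored distributions.

```python
import hashlib

# SHA256 of the backdoored source distributions listed above.
BACKDOORED_SHA256 = {
    "0f5c81f14171b74fcc9777d302304d964e63ffc2d7b634ef023a7249d9b5d875": "xz-5.6.0",
    "2398f4a8e53345325f44bdd9f0cc7401bd9025d736c6d43b372f4dea77bf75b8": "xz-5.6.1",
}

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def known_backdoored_release(path: str):
    """Return the matching release name, or None if the hash is not listed."""
    return BACKDOORED_SHA256.get(sha256_of_file(path))
```

For example, call `known_backdoored_release("xz-5.6.1.tar.gz")`; the filename here is only a placeholder.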

One of the distinctive features of the backdoor is the use of a single
trie structure for string operations. Instead of directly comparing
strings or using string hashes to match a particular constant (for example,
the name of a library function), the code performs a trie lookup, and
checks if the result is equal to a certain constant number.

The primary objective of the backdoor initialization is to successfully
hook functions. To do so, the backdoor makes use of rtld-audit, a feature
of the dynamic linker that enables the creation of custom shared libraries
to be notified when certain events occur within the linker, such as symbol
resolution.
--------------------------------------------------------------------------

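Having read all these technical analyses of the liblzma backdoor, I have a
few reflections. The reverse engineering of what happens once the hooks are
in place has already been done and published by some brilliant security
researchers; both the attackers and the analysts are extremely impressive.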

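As an onlooker, I had been waiting for the analysts to show how the hook gets
installed: that is, once the .so is loaded, how does execution end up calling
into that .o? I mean the very first time. In other words, does it involve
IFUNC, and how exactly is the first hook accomplished? (Resolved)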

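From what has surfaced so far, this looks like a nation-state APT, and an
unusually capable one. But no attacker can hold out against the world's top
security researchers all dissecting a single static sample. Everyone who
managed to fully analyze that .o is top-tier; my deep respect to them.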

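We must thank Andres Freund & Florian Weimer. Had they not discovered and
disclosed the liblzma backdoor, the catastrophic consequences would be
unimaginable. In a sense, we all got lucky this time: it was stopped before
it became a truly nuclear-grade security disaster. But can we always be this
lucky?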

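As for the attackers, the names and such vaguely point to the East, but,
speaking off the cuff and purely personally, it doesn't look that way to me.
Leave aside technical skill and meticulousness, and take just one trait:
patient restraint. The foresight summed up by "one generation plants the
trees so the next can enjoy the shade; success need not come in my own time;
take the long view", together with actually carrying it out: among that group
of true believers before 1949, I could believe it; among today's
state-affiliated cybersecurity teams in the East, honestly, I don't really
believe it. If you ask me what grounds I have for saying this, I really don't
dare answer. But I believe quite a few peers whom I have never met will agree
with me.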