PyTorch is one of the most popular and widely used machine learning toolkits out there.
(We won’t be drawn on where it ranks on the artificial intelligence leaderboard: as with many open source tools widely used in a competitive field, the answer seems to depend on whom you ask, and which toolkit they happen to use themselves.)
Originally developed and released as an open source project by Facebook, now Meta, the software was handed over to the Linux Foundation in late 2022, which now runs it under the auspices of the PyTorch Foundation.
Unfortunately, the project was compromised by a supply chain attack during the holiday season at the end of 2022, between Christmas Day [2022-12-25] and the day before New Year’s Eve [2022-12-30].
The attackers created a Python package called torchtriton on PyPI, the popular Python Package Index repository.
The name torchtriton was chosen so that it would match the name of a package in the PyTorch system itself, leading to a dangerous situation explained by the PyTorch team (our emphasis) as follows:
[A] malicious dependency package (torchtriton) […] was uploaded to the Python Package Index (PyPI) code repository with the same package name as the one we ship on the PyTorch nightly package index. Since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository. This design enables somebody to register a package by the same name as one that exists in a third party index, and pip will install their version by default.
pip, by the way, used to be called pyinstall, and is apparently a recursive joke that is short for pip installs packages. Despite its original name, it’s not intended to install Python itself: it’s the standard way for Python users to manage software libraries and applications written in Python, such as PyTorch and many other popular tools.
Pwned by a supply chain trick
Anyone who was unfortunate enough to install the pwned version of PyTorch during the threat period almost certainly ended up with data-stealing malware implanted on their computer.
According to PyTorch’s own brief but useful analysis of the malware, the attackers stole some, most or all of the following significant data from infected systems:
- System information, including hostname, username, known users on the system, and the content of all system environment variables. Environment variables are a way of providing memory-only input data that programs can access when they start up, often including data that’s not supposed to be saved to disk, such as cryptographic keys and authentication tokens giving access to cloud-based services. The list of known users is extracted from /etc/passwd, which, fortunately, doesn’t actually contain any passwords or password hashes.
- Your local Git configuration. This is stolen from $HOME/.gitconfig, and typically contains useful information about the personal setup of anyone using the popular Git source code management system.
- Your SSH keys. These are stolen from the directory $HOME/.ssh. SSH keys typically include the private keys used for connecting securely via SSH (secure shell) or SCP (secure copy) to other servers on your own networks or in the cloud. Many developers keep at least some of their private keys unencrypted, so that scripts and software tools they use can automatically connect to remote systems without having to ask for a password or hardware security key every time.
- The first 1000 files in your home directory that are smaller than 100 kilobytes. The PyTorch malware description doesn’t say how the “first 1000 files” list is calculated. The content and ordering of file listings depend on whether the list is sorted alphabetically; whether subdirectories are visited before, during or after processing the files in any directory; whether hidden files are included; and whether any randomness is used in the code that walks its way through the directories. You should probably assume that any file below the size threshold could be one of those that got stolen.
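To illustrate why the ordering question matters, here is a minimal, hypothetical sketch of a “first N small files” selector. It is not the malware’s code, and the function and parameter names are invented for illustration; it simply shows that the same directory can yield a different “first” file depending on sorting, traversal direction and hidden-file handling.

```python
import os

SIZE_LIMIT = 100_000  # bytes; mirrors the "smaller than 100 kilobytes" threshold
MAX_FILES = 1000      # mirrors the "first 1000 files" limit


def first_small_files(root, limit=MAX_FILES, size_limit=SIZE_LIMIT,
                      sort=True, topdown=True, include_hidden=True):
    """Hypothetical selector: up to `limit` files under `root` smaller than
    `size_limit` bytes. Which files count as "first" depends entirely on the
    traversal choices passed in, not on anything intrinsic to the files."""
    picked = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=topdown):
        if not include_hidden:
            # Pruning dirnames in place only affects traversal when topdown=True.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
            filenames = [f for f in filenames if not f.startswith(".")]
        if sort:
            dirnames.sort()   # changes visiting order only when topdown=True
            filenames.sort()
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) < size_limit:
                    picked.append(path)
            except OSError:
                continue  # file vanished or is unreadable
            if len(picked) >= limit:
                return picked
    return picked
```

Changing only sort, topdown or include_hidden can change which files end up in the list, which is why it’s safest to treat every small file in your home directory as potentially exposed.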
At this point, we’ll mention the good news: only those who fetched the so-called “nightly”, or experimental, version of the software were at risk. (The name “nightly” comes from the fact that it’s the very latest build, typically created automatically at the end of each working day.)
Most PyTorch users will probably stick to the so-called “stable” version, which was not affected by this attack.
Also, from PyTorch’s report, it seems that the Triton malware executable specifically targeted 64-bit Linux environments.
We are therefore assuming that this malware would only run on Windows computers if the Windows Subsystem for Linux (WSL) was installed.
Don’t forget, though, that the people most likely to install regular “nightlies” include developers of PyTorch itself, or of applications that use it, perhaps even your own in-house developers, who may have private-key-based access to corporate build, test and production servers.
DNS data stealing
Interestingly, the Triton malware doesn’t exfiltrate its data (the militaristic jargon term that the cybersecurity industry likes to use instead of steal or copy illegally) using HTTP, HTTPS, SSH, or any other high-level protocol.
Instead, it encrypts and encodes the data it wants to steal into a sequence that looks like “server names” associated with a domain name controlled by the criminals.
This means that by performing a sequence of DNS lookups, the crooks can extract a small amount of data in each fake request.
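To make the idea concrete, here is a minimal sketch of packing data into DNS-style hostnames, assuming base32 encoding and the reserved placeholder domain dodgy.example. It is not the malware’s actual format (which, according to PyTorch, encrypts the data first), the function names are hypothetical, and it makes no network requests at all.

```python
import base64

MAX_LABEL = 63   # RFC 1035: each dot-separated label is at most 63 octets
MAX_NAME = 253   # practical limit on a full hostname in dotted text form


def encode_as_dns_names(data: bytes, domain: str, labels_per_name: int = 3) -> list[str]:
    """Split `data` into hostnames under `domain`, each one a 'fake' lookup."""
    # Base32 output uses only A-Z and 2-7, all legal hostname characters;
    # the '=' padding is not, so strip it (the decoder restores it).
    text = base64.b32encode(data).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    names = []
    for i in range(0, len(labels), labels_per_name):
        name = ".".join(labels[i:i + labels_per_name] + [domain])
        if len(name) > MAX_NAME:
            raise ValueError("name too long; use fewer labels per name")
        names.append(name)
    return names


def decode_dns_names(names: list[str], domain: str) -> bytes:
    """Reverse the encoding, as whoever runs the domain's DNS server could."""
    suffix = "." + domain
    text = "".join(n[:-len(suffix)].replace(".", "") for n in names).upper()
    text += "=" * (-len(text) % 8)  # restore base32 padding
    return base64.b32decode(text)
```

This sketch assumes the names arrive in order; a real scheme would also need sequence numbers, because DNS queries can be retried, cached or reordered along the way.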
This is the same sort of trick that was used by Log4Shell hackers at the end of 2021, who leaked encryption keys by doing DNS lookups for “servers” with “names” that just happened to be the value of your secret AWS access key, plundered from an in-memory environment variable.
So what looked like an innocent, if pointless, DNS lookup for a “server” such as S3CR3TPA55W0RD.DODGY.EXAMPLE would quietly leak your access key under the guise of a simple lookup directed to the official DNS server listed for DODGY.EXAMPLE.
Video: Log4Shell live demo explaining data exfiltration through DNS
If the crooks own the domain DODGY.EXAMPLE, they get to tell the world which DNS server to connect to when doing those lookups.
More importantly, even networks that strictly filter TCP-based network connections using HTTP, SSH and other high-level data sharing protocols…
…sometimes don’t filter UDP-based network connections used for DNS lookups at all.
The only downside for the crooks is that DNS requests have a rather limited size.
Individual labels in a server name (the parts between the dots) are limited to 63 characters, conventionally drawn from a set of 37 (A-Z, 0-9 and the dash or hyphen symbol), and many networks limit individual DNS packets, including all enclosed requests, headers and metadata, to just 512 bytes each.
We’re guessing that’s why the malware in this case started by going after your private keys, and then limited itself to at most 1000 files, each smaller than 100,000 bytes.
That way, the crooks get to steal plenty of private data, especially including server access keys, without generating an unmanageably large number of DNS lookups.
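A back-of-the-envelope estimate shows why those caps matter. Assuming, purely for illustration, that each lookup carries three 63-character base32 labels (5 bits per character, so about 118 bytes of payload) and ignoring encryption, sequencing and retry overhead, the hypothetical helper below counts the lookups needed:

```python
BITS_PER_CHAR = 5  # base32: each hostname character carries 5 bits of payload


def lookups_needed(total_bytes: int, label_chars: int = 63, labels_per_name: int = 3) -> int:
    """Rough lower bound on lookups needed to leak `total_bytes`, under the
    assumed encoding; real overheads would only push the number up."""
    payload_per_lookup = (label_chars * labels_per_name * BITS_PER_CHAR) // 8
    return -(-total_bytes // payload_per_lookup)  # integer ceiling division


print(lookups_needed(100_000))         # one file at the size cap  -> 848
print(lookups_needed(1000 * 100_000))  # 1000 files at the cap     -> 847458
```

Even at this optimistic rate, one maximum-size file costs hundreds of lookups, and a full haul of 1000 such files would need hundreds of thousands.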
An unusually large number of DNS lookups could get noticed for routine operational reasons, even in the absence of any scrutiny applied specifically for cybersecurity purposes.
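As a rough illustration of that kind of routine check, here is a minimal sketch that counts queries per parent domain and flags the noisy ones. It assumes you have already extracted the queried hostnames from your resolver logs into a list; the threshold of 1000 is an arbitrary, hypothetical value, and taking the last two labels as the “parent domain” is a naive shortcut that mishandles suffixes such as .co.uk.

```python
from collections import Counter


def lookups_per_domain(queried_names, levels=2):
    """Count queries per parent domain, taken naively as the last `levels`
    labels; this mishandles multi-part suffixes such as .co.uk."""
    counts = Counter()
    for name in queried_names:
        labels = name.rstrip(".").lower().split(".")
        counts[".".join(labels[-levels:])] += 1
    return counts


def noisy_domains(queried_names, threshold=1000, levels=2):
    """Parent domains with at least `threshold` queries (arbitrary default)."""
    counts = lookups_per_domain(queried_names, levels)
    return sorted(d for d, n in counts.items() if n >= threshold)
```

Busy but legitimate domains (content delivery networks, telemetry services) will also show up, so a list like this is a starting point for investigation, not a verdict.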
What to do?
PyTorch has already taken action to shut down this attack, so if you haven’t been hit yet, you almost certainly won’t get hit now, because the malicious torchtriton package on PyPI has been replaced with a deliberately “dud” package of the same name.
This means that anyone, or any software, that tried to install torchtriton from PyPI after 2022-12-30T08:38:06Z, whether by accident or by design, would not receive the malware.
PyTorch has published a handy list of IoCs, or indicators of compromise, which you can search for across your network.
Remember, as mentioned above, that even if almost all your users stick to the “stable” version, which was not affected by this attack, you may have developers or enthusiasts who experiment with “nightlies”, even if they use the stable release as well.
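For a quick first look at a developer’s machine, a minimal sketch like the one below reports whether a distribution named torchtriton is installed in the Python environment that runs it. It checks only that one environment, so you would need to run it with each interpreter or virtual environment in turn, and the helper name is hypothetical. Presence alone is not proof of infection, because both PyTorch’s legitimate nightly dependency and the later “dud” replacement used the same name.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("torchtriton")
    if version is None:
        print("No torchtriton distribution found in this Python environment")
    else:
        print(f"torchtriton {version} is installed: check it against PyTorch's IoCs")
```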
According to PyTorch:
- The malware is installed with the filename triton. By default, you’d expect to find it in the subdirectory triton/runtime in your Python site packages directory. Given that filenames alone are weak malware indicators, however, treat the presence of this file as evidence of danger; don’t treat its absence as an all-clear.
- The malware in this particular attack has the SHA256 checksum 2385b29489cd9e35f92c072780f903ae2e517ed422eae67246ae50a5cc738a0e. Once again, the malware could easily be recompiled to produce a different checksum, so the absence of this file is not a sure sign of health, but you can treat its presence as a sign of infection.
- DNS lookups used for stealing data ended with the domain name H4CK.CFD. If you have network logs that record DNS lookups by name, you can search for this text string as evidence that confidential data leaked out.
- The malicious DNS requests apparently went to, and replies, if any, came from, a DNS server called WHEEZY.IO. At the moment, we can’t find any IP numbers associated with that service, and PyTorch hasn’t provided any IP data that would tie DNS traffic to this malware, so we’re not sure how useful this information is for threat hunting right now [2023-01-01T21:05:00Z].
Fortunately, we’re guessing that most PyTorch users won’t be affected by this, either because they don’t use nightly builds, or haven’t been working over the holidays, or both.
But if you’re a PyTorch enthusiast who tinkers with a nightly build, and you’re working over the holidays, then even if you can’t find any clear evidence that you were compromised…
…you might still want to consider generating new SSH key pairs as a precaution, and updating the public keys you uploaded to the various servers you access via SSH.
If you suspect you’ve been compromised, of course, don’t put off those SSH key updates: if you haven’t done them already, do them now!