A Deep Dive on Monero Tracing And Key Image Analysis

Sep 12, 2024

In the past I’ve written about Monero’s creation history, which is full of fraudulent dealings in its own right. Today I will write about Monero’s privacy tech, which has just been proven to be a honeypot. The icebreaker is this Chainalysis video, which shows a presentation given by an investigator to the IRS. For reasons that will become clear by the end of this article, there is an XMR mob of medium-sized social media accounts that has been trying to gloss over the actual video and deflect attention to its own interpretation of it. So before continuing, let me quickly debunk some of the main talking points in their deflection strategy:

Monero’s vulnerability is not IP addresses but key images (I will explain what key images are later in this article).
The fact that Monero’s community* shared the video first is not proof that there is nothing damning in it. It does suggest that they opted for pre-emptive disclosure, a well-known PR tactic (people are going to find out anyway, so let them find out from us so we can control the narrative).
The fact that Chainalysis is using copyright strikes to suppress the video is not proof that Monero is unbroken, because Chainalysis sells Monero tracing services. The more people believe Monero to be untraceable, the more valuable such services are, so it is not in Chainalysis’ commercial interest to publish proof that Monero is traceable.

That said, let’s break down Monero and see how it is chain-analyzed to deanonymize transactions.

The structure of a Monero transaction

To understand how a Monero transaction is built we must understand a few key concepts. First, Monero is a UTXO blockchain with a small tweak: UTXO balances are not public but are replaced with Pedersen commitments. Addresses are not public either; only one-time stealth addresses appear on chain. So Monero is a UTXO blockchain where the actual addresses are also not visible. The only thing we can publicly see is the public key of each UTXO. Since UTXO stands for “unspent transaction output” and in Monero we are not supposed to know whether an output has been spent, we refer to UTXOs as TXOs. Each TXO also has exactly one key image. By default, nobody knows the key image of a TXO except its owner, who publishes it only when the TXO is actually spent. Finally we have RingCT, which refers to the rings of TXOs that are grouped together in each transaction. To summarize, the key concepts are:

TXO: an index that identifies an underlying UTXO one to one. A TXO can be spent only once, and when that happens its key image is included in the transaction.
Key image: an alphanumeric string unique to each TXO, included in the transaction where that TXO is spent. Because a TXO can be spent only once, each key image appears only once.
RingCT: a ring of TXOs used to hide the TXO being spent. This is what was supposed to ensure privacy in Monero: by mixing the TXO being spent with other TXOs (decoys), there is supposedly no way to know which TXO the key image belongs to.
Transaction outputs: when a transaction happens, old TXOs are spent and new TXOs are created. Each transaction creates at least 2 TXOs: the destination TXO and the change TXO.
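To make these concepts concrete, here is a minimal Python sketch of how they fit together. It is a toy model, not Monero’s actual data format: the class names, the string identifiers and the `accept` helper are all hypothetical, and no cryptography is involved.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class TXO:
    """A transaction output, identified by a hypothetical string id."""
    txo_id: str


@dataclass(frozen=True)
class RingInput:
    """One spent input as seen on chain: a key image plus its ring."""
    key_image: str  # published only when the real TXO is spent
    ring: tuple     # candidate TXOs: the real spend hidden among decoys


@dataclass(frozen=True)
class Transaction:
    txid: str
    inputs: tuple   # one RingInput per key image
    outputs: tuple  # newly created TXOs (destination, change, ...)


def accept(tx: Transaction, seen_key_images: Set[str]) -> None:
    """Reject a transaction that reuses a key image (a double spend)."""
    new_images = [inp.key_image for inp in tx.inputs]
    if len(set(new_images)) != len(new_images):
        raise ValueError("key image repeated within the transaction")
    for key_image in new_images:
        if key_image in seen_key_images:
            raise ValueError(f"key image {key_image} was already spent")
    seen_key_images.update(new_images)


# Toy example: one input with a ring of 16 candidates, two outputs.
ring = tuple(TXO(f"txo{i}") for i in range(16))
tx = Transaction("tx_a", (RingInput("ki_1", ring),), (TXO("dest"), TXO("change")))
seen: Set[str] = set()
accept(tx, seen)
```

The point of the model is that an outside observer sees the key image and all 16 ring members, but nothing in the structure itself says which member the key image belongs to.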

With this in mind, let’s look at a random Monero transaction from a Monero block explorer. Here is one I picked at the time of writing, and below is a screenshot of the displayed information:

On top we can see the key image. That key image corresponds to one of the 16 TXOs shown in the red square number 2, and in the red square number 3 we can see the transaction outputs. Note that a transaction can have more than one key image. In a consolidation transaction, for example, balances from multiple TXOs are consolidated into one TXO, so consolidation transactions commonly have multiple key images. Looking among recent transactions I found an example of a transaction with 2 key images. Below is a screenshot of what the transaction looks like on the blockchain explorer.

I recommend spending some time playing around in the Monero explorer to familiarize yourself with the structure of transactions and to understand how a core part of Monero’s privacy model is the ostensible inability to tie a key image to a specific TXO. As you can see, for each key image we have 16 TXOs and there is no way to tell (from the outside) which TXO that key image belongs to. However, if we can find ways to connect a TXO to a specific key image, it becomes easier to deanonymize other transactions where that TXO appears among the inputs, because we can quickly discard it as a decoy. Now that the transaction structure is clear, we can look at how Monero transactions are traced phase by phase and why there is no privacy in Monero today.

Day 0: Centralized service data and Clusters

Day 0 refers to a time when the blockchain is young and there is not much data. We have no database of TXOs whose key images we have been able to expose, so we are starting from zero. How do we tie key images to TXOs in such a scenario? We know that on each side of a transaction there is a sender and a receiver. As you can see in the transactions above, it is easy for a sender to tell which of the outputs is the receiver’s TXO: one of the outputs is the change TXO that goes back to the sender, so the other one is certainly, from the sender’s perspective, that of the receiver. If we are able to aggregate data like this across many transactions and many people, we can start mapping key images to TXOs. But where would we find such information in aggregated form? Any exchange or centralized service provider that deals with Monero would be a very useful data trove.

Exchanges, for example, handle deposits and withdrawals from multiple users. This of course generates and consumes a high number of TXOs. As a chain analysis firm we could start building a database of CEX TXOs that CEXes have confirmed to have spent. Since whenever a TXO is spent its key image is unmasked, then a CEX would have to report something along these lines for all TXOs spent:

[TXO1; keyimage1; TXID1], [TXO2; keyimage2; TXID2], [TXO3; keyimage3; TXID3]….[TXOn; keyimagen; TXIDn]

Let’s call this database DB. DB would of course be small in the beginning (consisting only of confirmed CEX transactions), but as it grows it becomes an invaluable weapon for exposing decoys: any TXO contained in DB can be excluded as a decoy in every transaction where it appears other than its own TXID. We already have its key image in the transaction marked by that TXID, so if it is included in the ring of any other transaction it can only be a decoy (since a TXO cannot be spent twice).
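The elimination step can be sketched in a few lines of Python. This is only an illustration of the logic described above, under the assumption that DB is a simple mapping from TXO to its known (key image, TXID) pair; the function name, identifiers and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# DB: TXO id -> (key image, txid of the transaction where it was really spent)
SpentDB = Dict[str, Tuple[str, str]]


def eliminate_decoys(txid: str, key_image: str, ring: List[str], db: SpentDB) -> List[str]:
    """Return the ring members that could still be the real spend."""
    # If DB already pairs a ring member with this key image and txid,
    # that member is the real spend and everything else is a decoy.
    for txo in ring:
        if db.get(txo) == (key_image, txid):
            return [txo]
    # Otherwise, any ring member found in DB was spent under another
    # key image, so here it can only be a decoy.
    return [txo for txo in ring if txo not in db]


# Toy example: a ring of 16 where 13 members are already in DB.
db: SpentDB = {f"txo{i}": (f"ki_old{i}", f"tx_old{i}") for i in range(13)}
ring = [f"txo{i}" for i in range(16)]
remaining = eliminate_decoys("tx_new", "ki_new", ring, db)
```

In this toy run the nominal ring of 16 shrinks to 3 plausible candidates. The larger DB gets, the closer that number gets to 1.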

Another way in which DB can grow is by analyzing clusters of TXOs. For example, if a user withdraws Monero from an exchange, as seen above, the exchange can mark the TXOs connected to that specific user. If multiple such TXOs appear among the inputs of the same transaction, that is a strong indicator that those TXOs are being spent in that transaction. Consider that decoys are picked randomly from a pool that went from hundreds of thousands of TXOs in the early days to hundreds of millions today. With ring size 16 (15 decoys per ring) and a pool of about a million TXOs, the odds that a given TXO is picked as a decoy in a given ring are roughly 15/1,000,000, or 1.5 in 100,000. The odds of 2 TXOs associated with the same entity both being picked as decoys in the same transaction are then roughly that figure squared: 2.25/10,000,000,000, or 2.25 in 10 billion. As the pool of decoys grows, those odds get smaller and smaller. What this means is that we can unmask more TXO-key image pairs by cross-referencing exchange data and monitoring TXOs associated with the same entity (as per CEX data). Whenever we see 2, 3 or more related TXOs appear in the same transaction, we can assume with increasing certainty that the key images in that transaction belong to those related TXOs, and we can compile more pairs like these to add to DB:

[TXO’1, TXO’2, TXO’3; keyimage’1,keyimage’2,keyimage’3]
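Here is a rough Python sketch of the cluster heuristic, together with the odds calculation behind it. Two things are my assumptions for illustration: decoys are drawn uniformly at random (a simplification), and the pool holds 1,000,000 TXOs, which is the size that reproduces the 1.5 in 100,000 figure. The function names and sample data are hypothetical.

```python
from typing import Dict, List


def chance_as_decoy(ring_size: int, pool_size: int) -> float:
    """Chance that one given TXO lands in one ring as a decoy (uniform pick assumed)."""
    return (ring_size - 1) / pool_size


def cluster_hits(rings: List[List[str]], owner_of: Dict[str, str],
                 min_rings: int = 2) -> Dict[str, Dict[int, List[str]]]:
    """Find entities whose tagged TXOs show up in several rings of one transaction.

    rings: the rings of a single transaction, one per key image.
    owner_of: TXO id -> entity label, e.g. from exchange withdrawal records.
    Returns entity -> {ring index: tagged TXOs found in that ring}.
    """
    per_owner: Dict[str, Dict[int, List[str]]] = {}
    for index, ring in enumerate(rings):
        for txo in ring:
            owner = owner_of.get(txo)
            if owner is not None:
                per_owner.setdefault(owner, {}).setdefault(index, []).append(txo)
    return {owner: by_ring for owner, by_ring in per_owner.items()
            if len(by_ring) >= min_rings}


# Odds for ring size 16 and an assumed pool of one million TXOs.
p = chance_as_decoy(16, 1_000_000)  # about 1.5 in 100,000
p_two = p * p                       # about 2.25 in 10 billion

# Toy transaction with 3 rings (shortened for readability).
owner_of = {"a1": "alice", "a2": "alice", "a3": "alice", "b1": "bob"}
rings = [["a1", "x1", "x2"], ["a2", "b1", "x3"], ["a3", "x4", "x5"]]
hits = cluster_hits(rings, owner_of)
```

In the toy transaction, "alice" has a tagged TXO in every ring, so those three TXOs are flagged as the likely real spends, while the single "bob" TXO is treated as a plausible decoy.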

Say a transaction has 3 inputs associated with a single party and 3 key images: we add three TXOs to DB at once. DB can grow very fast this way, especially if we have access to the data of centralized parties (we could simply buy the data from them, although CEXes are required by law to report all transactions quarterly). For an estimate of how fast a database like this can grow over the years, consider that an exchange like Binance has 200 million users. Such a database would be fatal for Monero’s privacy, because it would considerably reduce the real anonymity of rings, and might eliminate it completely. Does such a database exist? Privacy-savvy users stopped using Monero long ago for this reason, but there was no proof of its existence until last week, when the Chainalysis presentation was dropped. The video contains proof that such a database exists: in it we can see how their internal block explorer automatically eliminates the decoys it recognizes. Below are a few examples from the leaked presentation showing decoys eliminated automatically by their internal block explorer.

This alone invalidates any claims to privacy for Monero, because the existence of such a database makes it extremely easy to trace any TXO, on par with Bitcoin where the blockchain is transparent. In fact, if I’m a wealthy entity with access to Chainalysis and I send you some money via Monero, I can trace your TXO indefinitely, and even more easily if you’re a naïve user who thinks that Monero is private and impossible to trace. This is why it’s convenient for Chainalysis if people think en masse that Monero is untraceable.

Key Image Analysis and Triangulation

So far we have seen how TXO tracing works through aggregation of data from centralized parties like exchanges (for an in-depth analysis of how TXO tracing works without the help of centralized exchanges, please read the second part of this article here). The leaked presentation has also shown strong evidence that such a database exists. On top of TXO tracing, there is an additional attack vector that goes from tracing a single transaction to establishing a footprint and looking for other transactions that show similar patterns and can be tied to the same user. To understand how key image analysis works we must understand how 1 key image can be tied to multiple TXOs. I recently made an in-depth Twitter thread about this that I recommend reading. Below you can find a diagram of a key image (KI) TXO set, starting from a transaction whose receiver we know to be our target (perhaps because we lured him into accepting some money from us).

By using the database DB of TXOs whose key images are known, we can easily check when the TXOs that we know belong to the targeted user are spent. When we find a transaction where one is spent, we can triangulate with centralized services to find out where the money goes and to detect the change. We then keep doing the same with the change until the trail stops. This way we glean behavioral information about the user, not only thanks to the centralized parties where the money goes, but also through other metadata such as the time when those transactions took place, the nodes where they were propagated, IPs, fee structure and so on. Once we have drawn behavioral patterns, manually or through AI, we can use an AI to look for the patterns that emerged from this specific onchain trail (the KI TXO set) in other KI TXO sets. For example, when a user withdraws from an exchange, that also constitutes the start of a new onchain trail (the creation of a new KI TXO set). So we can look for KI TXO sets originating from a CEX that show patterns similar to the one we have at hand: i.e., a user from a specific CEX who uses monezon, a specific mining pool and a specific darknet vendor.
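The trail-following loop can be sketched as follows. This is a simplified illustration, not the method shown in the presentation: it assumes we already hold three hypothetical lookup tables (where each known TXO was spent, which outputs each transaction created, and which outputs centralized services have confirmed as theirs), and it treats the one output that no service claims as the change.

```python
from typing import Dict, List, Set, Tuple

Hop = Tuple[str, str, List[str]]  # (TXO spent, txid, outputs that went to services)


def follow_trail(start_txo: str,
                 spent_in: Dict[str, str],
                 outputs_of: Dict[str, List[str]],
                 service_txos: Set[str],
                 max_hops: int = 100) -> List[Hop]:
    """Follow the change output hop by hop until the trail stops."""
    trail: List[Hop] = []
    current = start_txo
    for _ in range(max_hops):
        txid = spent_in.get(current)
        if txid is None:
            break  # this spend is not in DB (or has not happened): trail stops
        outputs = outputs_of.get(txid, [])
        to_services = [o for o in outputs if o in service_txos]
        change = [o for o in outputs if o not in service_txos]
        trail.append((current, txid, to_services))
        if len(change) != 1:
            break  # cannot single out the change output
        current = change[0]
    return trail


# Toy data: the target's TXO "t0" is spent in txA, its change in txB.
spent_in = {"t0": "txA", "chg1": "txB"}
outputs_of = {"txA": ["cex_dep1", "chg1"], "txB": ["vendor1", "chg2"]}
service_txos = {"cex_dep1", "vendor1"}
trail = follow_trail("t0", spent_in, outputs_of, service_txos)
```

Each hop in the result records which service received funds. Together with timestamps and other metadata, those hops are the raw material for the behavioral pattern matching.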

The introduction of AI technologies makes the detection of subtle patterns even more effective and the determination of the full onchain footprint (all trail marks left by a user onchain) even easier.

Conclusion

The core vulnerability in Monero is its key images, because they allow us to build a huge database of TXOs that we know to have been spent. Such a database can then be used to deanonymize senders quickly, making the tracing of TXOs as effective as in Bitcoin. This was shown to be part of the process in the leaked Chainalysis presentation. Then, by studying key image TXO sets, we can glean behavioral patterns that can unearth other past activity of the user under surveillance, including onchain activity that bears no direct TXO connection to the starting key image TXO set. Therefore, despite not seeing amounts, in Monero we can draw the same behavioral information on a user as in Bitcoin. One could even argue that Monero currently reveals more because, contrary to Bitcoin, most Monero users erroneously think they are operating privately, whereas Bitcoin users are more careful because they know they are operating on a transparent ledger. Since making sure people believe Monero is untraceable is key to extracting even more information from its users, which makes Monero chain analysis services even more valuable, it should be clear why I believe that a good part of the mob referenced in the opening of the article are PR employees for chain analysis firms that offer Monero chain analysis services.

*If we are to believe that they are community and not PR agents.

AnnonRex

Sep 27, 2024

Very interesting article. Would it be possible to write one that details the effort involved here and just how likely governments are to use such analysis to go after an individual? From the above it looks like quite an involved process.

Other thoughts: how big of a net does the above process cast? Is it all-encompassing, capturing large and little fish alike, or does it go after one individual/entity at a time? How expensive and time-consuming is it? What about time delay: suppose you own XMR and keep it in an offline wallet for months or years, does this affect the process?

Please, more thoughts from a practical perspective would be useful here.


A Nun Mouse

Nov 12, 2024 (edited)

Based on this article, to deanonymise a wallet (not even a person) you either need:

1. A database with every Monero transaction/key image to deanonymize a wallet

2. Many transactions from the same wallet

Since Monero has, fortunately, been banned from most central exchanges, because governments are afraid of its anonymity capability, the database for 1 does not exist.

Spending only a handful of times or changing wallets makes 2 impossible, too.

Nice try! Monero is still extremely secure. It may not be 100% bulletproof. The best you might get would be a probability, likely not even a majority.


TechLeaks24’s Substack
