How Compiler Risk Exposes Gaps in Adversarial Thinking Among Today's Crypto Founders
Nothing worse than when you think you know something you know nothing about
In his 1984 Turing Award lecture, Reflections on Trusting Trust, Ken Thompson described an attack vector that allows practically undetectable backdoors to be planted in a program by compromising the compiler, even when the program's source code is open and audited. The attack can be explained in simple terms for the layman, which I will do in a moment. What makes it of particular interest, however, is that today there is widespread ignorance of its nature among the founders and creators of cryptocurrencies that aim to be decentralized and private. As delicately and non-offensively as possible, we have to ask ourselves an important question: can minds that do not understand a well-documented attack vector like compiler risk design a censorship-resistant and private crypto-economic system? That is a question every reader must, after careful reading, decide for themselves.
Compiler Risk and Code Execution
Computers and machines run on binary code; it is the only language they understand. Binary code consists of strings of 0s and 1s that a machine can interpret to perform specific actions. Writing directly in binary is extremely laborious for a human, which is why computer programs are not written in binary code but in so-called high-level languages that are much easier for humans to learn and write.
The difference between binary code and a high-level language is a bit like the difference between medical jargon and colloquial language. Medical reports are written in jargon so that doctors can quickly and effectively communicate a specific clinical condition to one another. The same condition, however, has to be broken down in simple terms for the patient to understand his own disease. A high-level programming language is like the colloquial language in which a medical condition is explained to the patient, and in which the patient later explains how he wishes to treat it. Python, Go, Solidity and Rust are all examples of high-level programming languages. These are languages humans can learn relatively easily, just as you can learn Chinese or Japanese. Machines, however, do not understand high-level languages. So once code is written in a high-level language, it must be converted into binary code before a machine can run it.
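As a tiny illustration of that gap, here is a sketch in Python (Python translates source into bytecode for its own virtual machine rather than into native machine code, but the principle is the same): what actually gets executed is the output of a translation step, not the text the developer wrote.

```python
import dis

# The "high level" text a developer writes and audits.
source = "balance = 100\nbalance = balance - 25\nprint(balance)"

# The translation step: Python's built-in compile() turns the text into a
# code object (bytecode), which is the form that actually gets executed.
code_object = compile(source, "<example>", "exec")

# The executed form looks nothing like the source. Nothing guarantees the two
# match except trust in the translator itself.
dis.dis(code_object)

# Running the translated form, not the original text.
exec(code_object)  # prints 75
```

Everything hinges on that translation step sitting between what the developer wrote and what the machine runs.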
This process of converting a high-level language into binary code is known as compiling, and the program that performs it is known as a compiler. The invention of compilers dates back to 1952 and it was, for obvious reasons, a massive breakthrough in computer programming: it marked the end of the era in which humans had to write programs in machine code. At the same time, however, compilers introduced a new trust point into the creation of computer programs. Before compilers, programmers talked directly to the machine in binary code; with a high-level language, they have to rely on a compiler that inserts itself between the developer and the machine. When a developer writes a program in a high-level language like Solidity, they are trusting that the compiler they use will compile that code correctly.

This is a bit like a patient trusting that a doctor will correctly describe their medical condition, in medical jargon, in their medical record. Just as a doctor can forget, or intentionally omit, that someone is allergic to amoxicillin (which may lead to complications the next time that person is admitted to the ER), a compiler can skip parts of the source code it compiles, or intentionally remove parts and introduce new ones. The source code is the high-level code written by the developer and handed to the compiler. Compiler risk is exactly this: the possibility that the high-level code a developer feeds into the compiler has not been translated accurately into binary code.

In 1984, Ken Thompson described a compiler that introduced a backdoor into the login routine of any program it was given to compile. This meant that the programs of every developer who used that compiler carried a backdoor controlled by yet another third party: whoever had compromised the compiler. In other words, even the developers themselves wouldn't know that their code had been corrupted.
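A minimal sketch of that idea, assuming a deliberately toy setup: the "compiler" below is just a Python function that turns source text into something runnable, and every name in it (check_password, BACKDOOR, the master password) is hypothetical. A real attack operates on machine code, but the trust relationship is identical: the backdoor exists only in the compiler's behaviour, never in the program's source.

```python
# Toy stand-ins for compilers: they take source text and return something runnable.
# A real compiler emits machine code; compiling Python to a code object is enough
# to show where the trust sits. All names here are hypothetical.

BACKDOOR = '"letmein" == password or '   # master password known only to the attacker

def honest_compile(source: str):
    """Translate the source faithfully."""
    return compile(source, "<program>", "exec")

def trojaned_compile(source: str):
    """Translate the source, but quietly patch any password check on the way."""
    if "def check_password" in source:
        source = source.replace(
            "return password == stored",
            "return " + BACKDOOR + "password == stored",
        )
    return compile(source, "<program>", "exec")

# The developer's source code: clean, auditable, no backdoor anywhere in it.
program = '''
def check_password(password, stored):
    return password == stored

print(check_password("letmein", "hunter2"))
'''

exec(honest_compile(program))    # prints False: the wrong password is rejected
exec(trojaned_compile(program))  # prints True: the attacker's password is accepted
```

Auditing the program's source reveals nothing, because there is nothing to find: the corruption happens during translation.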
How A Compiler Vulnerability Can Be Undetectable
A program is said to be open source when its source code is public and can be compiled independently by users. Because of compiler risk, however, even an airtight open source program can carry a backdoor if it is compiled with a compromised compiler. In fact, the whole point of Ken Thompson's lecture was to show that, instead of introducing the backdoor in the source code of the program, where it would most likely get caught by other programmers, compromising the compiler makes the backdoor invisible to anyone who limits themselves to auditing the source code. But what if we don't trust the compiler, and instead check the source code of the compiler? Surely that will catch the backdoor, no? Well, that is essentially what Alex Chepurnoy argued, and the answer is: no, not really.
“If compiler is open sourced, you can check what it is doing during compilation and then use reproducible build to check its output vs provided contract” - Alex Chepurnoy, Co-Founder of Ergo & Darkfi
Recently I wrote a brief post on compiler risk in my Telegram channel, and one of the responses it received is quoted above, from Alex Chepurnoy. Alex Chepurnoy is the founder of Ergo and the co-founder of Darkfi together with (among others) cryptoanarchist Amir Taaki. His comment shows that he badly underestimates compiler risk. The whole point of a compiler attack is that it introduces vulnerabilities in code that cannot be caught by inspecting the source code. If you are only looking for vulnerabilities in source code, then you are implicitly trusting a compiler, because compilers are programs themselves: they too must be converted into binary code before they can run on a machine and do their job. So, unknowingly, Alex is just displacing his trust from compiler A (the compiler used to compile the program) to compiler B (the compiler used to compile the source code of compiler A). If compiler B is compromised, the binary of compiler A can be backdoored even though compiler A's source code shows nothing malicious. In other words, inspecting source code is never enough, because you are always implicitly trusting a compiler somewhere: the compiler of the program, or the compiler of the compiler.

Going back to our medical analogy, as long as the patient blindly trusts doctors, he can still die of a fatal allergic reaction to amoxicillin in the ER. Asking a senior resident to double-check your medical file to make sure nothing important (such as an allergy to amoxicillin) has been omitted does not eliminate the risk of omission, because the senior resident can also be corrupted. In other words, your chances of dying from an amoxicillin allergy in your next ER admission stay high unless you make sure to tell the doctor yourself.
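Thompson's lecture took this one step further, and that further step is exactly what both source auditing and compiler-source auditing miss. The sketch below reuses the toy Python setup from earlier (every name is hypothetical): the compromised compiler recognizes when it is compiling a compiler and re-inserts its own trojan, so even a compiler rebuilt from perfectly clean source comes out dirty.

```python
# Condensed sketch of the two-stage "trusting trust" trick. "Binary" here is just
# whatever the compiler hands back; a real attack lives in machine code, but the
# chain of trust is identical. All names are hypothetical.

# The compiler's published source: auditors can read it and find nothing wrong.
CLEAN_COMPILER_SOURCE = "def compile_program(src): return honest_translate(src)"

def trojaned_compiler(src):
    """The compromised compiler binary already in circulation."""
    if "check_password" in src:
        # Case 1: compiling an ordinary program -> slip a master password
        # into its login check.
        src = src.replace(
            "password == stored",
            '"letmein" == password or password == stored',
        )
    if "def compile_program" in src:
        # Case 2: compiling the compiler itself -> hand back another trojaned
        # compiler, no matter how clean the source being compiled is.
        return trojaned_compiler
    return compile(src, "<program>", "exec")

# An auditor reviews CLEAN_COMPILER_SOURCE, approves it, and rebuilds the
# compiler -- using the only compiler binary available: the trojaned one.
rebuilt_compiler = trojaned_compiler(CLEAN_COMPILER_SOURCE)

# The rebuilt, "verified" compiler still backdoors login code.
login_program = '''
def check_password(password, stored):
    return password == stored
print(check_password("letmein", "hunter2"))
'''
exec(rebuilt_compiler(login_program))   # prints True: the backdoor survived the rebuild
```

The compiler's published source is spotless, the program's source is spotless, every audit passes, and the backdoor is still there.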
“Then reproducible building will output different binary. Seems you dont understand what you are talking about.” - Alex Chepurnoy, Co-Founder of Ergo & Darkfi Contributor
Alex kept insisting on his stance, and he seemed to miss the point of reproducible builds. The point of reproducible builds is not to eliminate compiler risk but to eliminate open source risk: to mitigate the risk that the official binaries a developer distributes were not actually compiled from the source code he claims they were compiled from. In my snorkelling around the deep waters of crypto chats and conversations, this is one of the biggest gaps in adversarial thinking I have come across. And it is concerning for me, as a privacy activist, because Alex is a developer involved in the creation of privacy tech. But Alex is not alone. Before him, Amir Taaki (the original founder of Darkfi) had chimed in, and he too showed that he does not know what compiler risk is.
“Do you really think an interpreted compiler protects against "compiler risk"? Do you think there's a little man watching the code running on your CPU? Tell us more about these voices in your head schizo 😂” - Amir Taaki, Founder of Darkfi
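Setting the insults aside: since the dispute keeps coming back to reproducible builds, it is worth being concrete about what such a check actually proves. Here is a minimal sketch in Python, with hypothetical file names; this is not any project's official verification procedure, just the bare logic of comparing two builds.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a file so two builds can be compared byte for byte."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Hypothetical file names, for illustration only.
official_binary = "wallet-v1.2.3-official.bin"   # binary published by the project
local_binary = "wallet-v1.2.3-local.bin"         # binary I compiled from the tagged source

if sha256_of(official_binary) == sha256_of(local_binary):
    # Proves: the published binary corresponds to the published source,
    # as translated by the same toolchain I used. Nothing more.
    print("reproducible: the official binary matches my build")
else:
    # Catches: a developer shipping binaries that do not match the public source.
    print("mismatch: the official binary was not built from the published source")

# What the check does NOT prove: that the compiler both builds relied on is honest.
# If that compiler is trojaned, both binaries carry the same backdoor and still match.
```

In other words, a matching hash rules out a dishonest release, not a dishonest compiler.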

Interpreter VM versus VM that runs on pre-compiled code
In high-stakes environments like crypto, compiler risk should be eliminated completely. Ethereum is the most popular smart contract platform today, yet its virtual machine runs on pre-compiled code: any smart contract written in Solidity, or in any other high-level language, must be compiled into EVM bytecode before it can run on Ethereum. With the many different compilers out there, relying on pre-compiled code opens up a huge attack surface, and it has already been exploited at least once, through the Vyper compiler. Vyper is a Python-like high-level language used to write Ethereum smart contracts. In 2023 it came out that certain releases of the Vyper compiler silently produced broken reentrancy locks, which meant that perfectly reproducible, open source Vyper contracts could still be drained through reentrancy attacks.

Opting for an interpreter VM removes this class of risk entirely, because there is no offchain compile step left to corrupt. Of course, regardless of whether a VM is an interpreter or runs pre-compiled code, there will always be VM bug risk. But the logic of the VM is public and auditable, so malicious changes or exploitable bugs introduced into the VM are much easier to detect. Compiler risk, on the other hand, rests on all the offchain compilers that developers use to compile their code. Those compilers are invisible to end users and, in many cases, effectively opaque even to the developers themselves. So with a VM like Ethereum's (which is the standard today), an immense amount of trust is aggregated onchain and passed on to every smart contract user, who is unknowingly trusting all of those compilers, and their auditors, not to have been corrupted.
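To make the contrast concrete, here is a deliberately toy sketch in Python of the two models (the two "VMs", the contract format and the withdrawal rule are all hypothetical): in the interpreter model the chain stores and executes the same text everyone audits, while in the pre-compiled model the chain executes opaque compiler output that may or may not match the audited source.

```python
# (a) Interpreter-style VM: the text the developer wrote is what lives on chain
# and what gets executed, so users audit exactly what runs.
def interpreter_vm(contract_source: str, amount: int, balance: int) -> int:
    rules = dict(line.split("=") for line in contract_source.splitlines() if "=" in line)
    limit = int(rules["max_withdrawal"])
    return balance - amount if amount <= limit and amount <= balance else balance

contract_text = "max_withdrawal=50"               # this exact text is deployed
print(interpreter_vm(contract_text, 40, 100))     # 60: within the limit everyone can read

# (b) Pre-compiled-style VM: what lives on chain is the output of an offchain
# compiler. Users see only the bytes; the rule they audited in the source may or
# may not be the rule the compiler actually emitted.
deployed_bytecode = bytes([0x01, 0xFF])           # stands in for compiler output
def bytecode_vm(bytecode: bytes, amount: int, balance: int) -> int:
    limit = bytecode[1]                           # 255 here, not the 50 from the source
    return balance - amount if amount <= limit and amount <= balance else balance

print(bytecode_vm(deployed_bytecode, 90, 100))    # 10: a limit nobody audited was enforced
```

Both toy machines can have bugs, but in the first one the bug has to be visible in text everyone can read; in the second it can hide in a translation step nobody onchain ever sees.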
Conclusion
In high-stakes environments like crypto and privacy, compiler risk should be eliminated completely, to avoid massive hidden points of failure that can cause financial loss or the loss of privacy and other sensitive information. Despite its critical importance, most crypto founders and privacy tech contributors seem to be completely unaware of compiler risk, let alone properly address it in the design of their systems. The only solution to compiler risk is to have no compiler in the code supply chain. One alternative is to have no smart contracts at all, which is better than having a VM that runs on pre-compiled code like Ethereum's does. The other is an interpreter VM that directly interprets the code developers write, rather than requiring precompiled code. To date, the only smart contract platform designed with this in mind is Dero: Dero's VM is an interpreter that directly interprets code written in DVM-BASIC.