Fraud proofs — Secure on-chain scalability
Securely scaling blockchains is a significant problem to solve. Failing to do so could compromise real-world impact of public-blockchain technology.
Here I’ll give a light-talk about Fraud Proofs and how they can help, but before some little introduction into scalability.
Blockchain scalability mini-review
If we dig a little in the numbers, we usually find that Visa and MasterCard have a transaction throughput at least 2–3 orders of magnitude higher than most public blockchains. Bitcoin has a transaction throughput of 6–7 transactions per second (tps), whereas Visa is within the range of 2,000–20,000 tps. These numbers are repeated all over the Internet so take them with a grain of salt; accept the order of magnitude difference.
Comparing Visa and Bitcoin throughput is quite unfair since their design priority orders are different. Public blockchains compromise throughput for decentralization. This order of priorities is by design.
In public blockchains decentralization is king.
There are plenty of ideas on how to solve this problem which can be grouped depending on the layer of the blockchain-stack. Also, you can hear about on-chain vs. off-chain solutions.
A recent paper from Mustafa Al-Bassam, Alberto Sonnino, and Vitalik Buterin discusses a solution to reduce the ratio between the number of light-nodes vs. full-nodes without compromising decentralization and thus security.
In their own words:
Fraud and data availability proofs are key to enabling on-chain scaling of blockchains (e.g., via sharding or bigger blocks) while maintaining a strong assurance that on-chain data is available and valid.
The full details of the paper involve Sparse Merkle Trees, Reed-Solomon as erasure codes, and random-sampling analysis. If you don’t have a clue about this, don’t worry.
The goal of this article is to explain the big picture in plain English mentioning how those tools fit in the puzzle.
An analogy of Fraud Proofs
Before getting into fraud proofs in the context of blockchains, it may be useful to go through the following mental exercise.
Suppose you participate in a network where multiple peers share a database that contains big prime numbers. Each node in this network works quite hard in adding new big prime numbers and sharing them with other peers.
If we are a peer in this network, we have two options:
- Trust that new numbers are prime numbers.
- For every new prime number candidate, run some primality test to be sure that our database is consistent.
If the network is comprised of trusted peers, then choosing the first option is quite attractive, since running primality tests to big prime number candidates are costly. Additionally, this problem gets worse if the rate of new numbers to test is higher than our computing power so we may fall behind.
On the contrary, if we network is trustless, then choosing the first option is risky. What if some malicious peer tries to trick the network and commits a big composite number? Having composite numbers in our database is unacceptable.
What if we know that other peers can send us proofs that some number in our database isn’t prime? For any given number n, the proof would imply sending a single number less than n and greater than one that divides n. This idea would have the following advantages:
- We could avoid running primality tests for every new number in the database. If after some time threshold we don’t receive a fraud-proof, we can consider the number is prime.
- If at least one peer is checking the numbers, it could alert the whole network about newly non.prime added numbers.
- The size of the proof is small.
- Verifying the proof is fast.
- We could penalize the peer that added that number to the database for his misbehavior.
This idea is similar to fraud proofs. Instead of validating all the rules of the system, others can demonstrate that the current system state is wrong.
Light-nodes vs. Full-nodes
To understand why fraud-proofs have value in blockchains, we should consider an important fact. To validate a new block we should obtain the data corresponding to the block, and then run the full validation ourselves.
We can think about blockchains as state-machines. The current state of the blockchain corresponds to the state after processing the last known block. When a new block appears, we process its transactions and define the new current state. The nodes that do this work are called full-nodes.
Other participants in the network may not accept to be full-nodes. If a node runs on an Android device, it has limited resources, such as computing power, storage, and bandwidth. These nodes are named light-nodes since they only download block headers instead of all the data associated with it.
Light-nodes can do some minimal validation with the little information held in block headers, e.g., checking proof-of-work of the block. Proof-of-work checking doesn’t mean that transactions are valid, nor the new state. They trust that the majority of full-nodes are honest since they are the ones that are making the real validation.
As mentioned before, increasing block size is a proposed on-chain solution to the scaling problem. If we increase the maximum amount of transactions in each block, then the throughput will be higher. On the other hand, it pressures full-nodes on committing more resources to not falling behind.
Nobody wants to buy a high-end laptop or a supercomputer to run a full-node; thus there will be fewer full-nodes. The preceding implies that most light-nodes are trusting a smaller set of full-nodes which may incentivize them to act dishonestly. Increasing the block-size isn’t absurd, but it doesn’t scale securely.
What if a light-node receive a proof that a block is invalid without trusting the majority of full-nodes?.
If we read the paper we can find that it talks about two kinds of fraud-proofs:
- State transition fraud-proofs.
- Data availability fraud proofs.
Each of them tries to construct a proof that some undesired situation has happened.
State transition Fraud Proofs
Let’s come back to the idea of blockchains as state-machines. New states are created after all transactions in a block are applied.
The first core concept described in the paper is including intermediate states between transactions. Instead of only having a single Merkle root at the end of each block, we could add some intermediate Merkle root between every transaction.
If a full-node detects that a new block has an invalid state transition, it could tell a light-node the following:
- Hey, I found that this block is wrong.
- After processing n-1 transactions from the beginning of the block, if I process the next transaction, the Merkle root of the state is different from the following intermediate state said by the miner.
- I’m sending you a bunch of Merkle proofs with the minimal data needed so you can check by yourself.
As in many parts of most blockchains, Merkle proofs are a crucial tool for proving that some data corresponds to a block. The light-node only have to trust the block header it already has.
We don’t need to save intermediate states between each transaction. An option is saving them every two, five or more transactions. Doing this would save space in the state tree, but make fraud proofs bigger since more Merkle proofs are necessary between states.
Fraud proofs like this are powerful since only one honest full-node can prove to any number of light-nodes that a block is invalid; no majorities are needed. If after some time threshold a light-node doesn’t receive a fraud-proof, it seems reasonable that the block is valid.
Including economic incentives and penalizations for valid and invalid fraud proofs respectively is essential for desired full-nodes behavior.
The Data Availability problem
Fraud proofs for state transitions are great, but they rely on a critical assumption: all the block data is available. If the block miner only publishes the block header without the corresponding correct data, then it can’t be proven wrong. The same way I can’t show that a sum of two numbers is wrong if I don’t know them.
Moreover, even if 99% the data is available, the remaining 1% may be essential to prove that a block is invalid. We need 100% data availability. For block validation, this is a strict requirement, since data could be unavailable for multiple reasons, not only intentionally by malicious nodes. The correct rationale is making data unavailability hard for a malicious node.
This converts a 100% data availability problem into a 50% data availability problem. — Vitalik Buterin
The proposed solution uses Erasure Codes, in particular, multi-dimensional Reed-Solomon codes. In a nutshell, this means transforming the block data from M chunks to N chunks (M<N), so that any M chunks of N are enough to reconstruct the original data. In particular, the paper establishes N=2*M. The multi-dimensional characteristic allows reconstructing partial data without the need to restore all the M chunks. Moreover, this partial data has its own Merkle root that will be useful to prove if the data is correctly available (more on this later).
The paper provides a more helpful way to see this idea:
This powerful tool together with pulling data using random-sampling provides strong mathematical guarantees for the reconstruction of the original data. We can find a full mathematical justification of this in the paper, but here’s an extract of an important conclusion:
In plain English, say you have 1000 light-nodes doing the random-sampling of the extended chunks of data. Increasing the sample-size will ensure that, eventually, they will ask for a chunk that the malicious party won’t like to share. Thus giving more certainty about detecting a data-availability problem.
In the paper, they also discuss another kind of fraud proofs within data availability, in particular proving that the miner generated the Reed-Solomon codes incorrectly. Since we need M out of N chunks to reconstruct a partition of block data, this may be enough to detect if the N chunks match the corresponding Merkle root of the partition.
Fraud proofs and erasure codes are useful tools for scaling public blockchains. They empower light-nodes to have their own opinion in discarding blocks without trusting a majority of honest full-nodes.
Moreover, solving the data-availability problem not only allows fraud proofs to be constructed but also addresses other issues:
Even if succinct zero knowledge proofs can be used to verify correctness, an attacker getting away with publishing unavailable blocks and getting them included in the chain is still very bad, as such a thing happening denies all other validators the ability to fully calculate the state, or to make blocks that interact with the portion of the state that is no longer accessible.
- Merkle proofs are essential for proving that data belongs to a block.
- Intermediate states significantly reduce the amount of Merkle proofs needed to prove that state transitions are wrong.
- Being able to construct the Merkle proofs naively implies 100% data availability guarantees, which isn’t desirable.
- Erasure codes plus random-sampling give strong mathematical guarantees that all the data needed from the block will be available, in particular, if the miner isn’t malicious.
- A miner trying to trick the network by sharing wrong information can be detected thanks to the multi-dimensional characteristic of Reed-Solomon codes plus Merkle roots of partial data.
The mentioned paper, as well as the paper-related post, also indicate that zk-SNARKS/zk-STARKS are other options for proving correct state transitions and data-availability. But those are different and exciting monsters.
Most of what I mentioned in this article is a simple summary of the full idea within those resources. I learned a lot from reading them, so I encourage you to do the same!