The tale of 5 DBs

Demystifying state persistence in go-ethereum source code.

Jul 31, 2024

Special thanks to Gary Rong and Guillaume Ballet for feedback and discussion.

In the mythical land of Geth, there were 5 DBs. One particular inhabitant of this land (yours truly, the author of this article) had a hard time telling them apart and calling them by the right name. statedb is well-known across the land, while not many have heard the tale of state.Database. So the author wrote this reference to aid his memory in the future. He would much rejoice, were it to benefit other visitors to this mysterious land.

If you’ve ever opened the go-ethereum source code you’ve probably encountered various objects with the word DB in them. After all, don’t they say that blockchains are glorified databases?

The backing database(s)

Users of Geth might be familiar with the choice of the underlying databases. This is configured via --db.engine. The options are leveldb and pebble (which is the default now). These are 3rd-party key-value databases that Geth depends on. They handle the files in datadir/geth/chaindata and datadir/geth/nodes.

There is a second type of backing database which is known as freezer or ancients. Users might recognize it from datadir/geth/chaindata/ancients. Up until last year it only included ancient chain history. Nowadays it is also being used to keep state history. More on that later. History is mostly static, hence it doesn’t need the faster I/O speed that SSDs offer which means precious SSD space can be saved for the key-value store. There are plans to use the freezer for other data as well, since it offers efficient reads for both single and range accesses, for any type of data as long as they are linked by a monotonically increasing integer index.
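To make the "monotonically increasing integer index" requirement concrete, here is a minimal in-memory sketch. The type toyFreezer and its methods are names I made up for illustration. The real freezer is file-based and considerably more involved, so treat this only as a picture of the access pattern: appends must arrive in order, and both single and range reads are plain index lookups.

```go
package main

import (
	"errors"
	"fmt"
)

// toyFreezer is a hypothetical append-only store: item i lives in slot i.
// It is NOT Geth's freezer, only an illustration of index-based access.
type toyFreezer struct {
	items [][]byte
}

// Append accepts an item only if its index is exactly the next free slot.
func (f *toyFreezer) Append(index uint64, data []byte) error {
	if index != uint64(len(f.items)) {
		return errors.New("out-of-order append")
	}
	f.items = append(f.items, data)
	return nil
}

// Get returns a single item by index.
func (f *toyFreezer) Get(index uint64) ([]byte, error) {
	if index >= uint64(len(f.items)) {
		return nil, errors.New("index out of range")
	}
	return f.items[index], nil
}

// Range returns up to count consecutive items starting at start.
func (f *toyFreezer) Range(start, count uint64) [][]byte {
	n := uint64(len(f.items))
	if start >= n {
		return nil
	}
	end := start + count
	if end > n {
		end = n
	}
	return f.items[start:end]
}

func main() {
	f := &toyFreezer{}
	for i, s := range []string{"block-0", "block-1", "block-2"} {
		if err := f.Append(uint64(i), []byte(s)); err != nil {
			panic(err)
		}
	}
	one, _ := f.Get(1)
	fmt.Println(string(one), len(f.Range(1, 10)))
}
```

Because the index doubles as the position, no key lookup structure is needed, which is what makes this layout a good fit for static history.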

The focus of the article is state, which is stored in the key-value store. As such, references to the backing database mean the key-value store.

Ethdb

ethdb, first of his name, is a package that abstracts away the backing database. In fact, nowhere else is the backing database used directly. This allows easily switching from one database to another. To fulfil that purpose, ethdb comes with an interface and a few implementations that are widely used across the codebase. Other than leveldb and pebble, memorydb is another notable implementation as it powers the Geth dev-mode and is widely used for testing.

I would like to point out here that ethdb has a collection of multiple interfaces. The one we care about here is ethdb.KeyValueStore which roughly (I’ve simplified it for readability) looks like the following. As expected, it has methods to retrieve, set and delete key values:

// KeyValueStore wraps the basic read and write methods of a backing data store.
type KeyValueStore interface {
 // Has retrieves if a key is present in the key-value data store.
 Has(key []byte) (bool, error)

 // Get retrieves the given key if it's present in the key-value data store.
 Get(key []byte) ([]byte, error)

 // Put inserts the given value into the key-value data store.
 Put(key []byte, value []byte) error

 // Delete removes the key from the key-value data store.
 Delete(key []byte) error
}
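To get a feel for what sits behind such an interface, here is a minimal sketch of an in-memory implementation backed by a Go map. The names mapStore, newMapStore and errNotFound are mine and purely illustrative. Geth's actual memorydb has more to it (locking, batches, iterators), so read this as a toy under those assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// KeyValueStore mirrors the simplified interface shown in the article.
type KeyValueStore interface {
	Has(key []byte) (bool, error)
	Get(key []byte) ([]byte, error)
	Put(key []byte, value []byte) error
	Delete(key []byte) error
}

// errNotFound is a hypothetical sentinel error for missing keys.
var errNotFound = errors.New("not found")

// mapStore is a hypothetical in-memory store; it is not safe for concurrent use.
type mapStore struct {
	data map[string][]byte
}

func newMapStore() *mapStore {
	return &mapStore{data: make(map[string][]byte)}
}

func (m *mapStore) Has(key []byte) (bool, error) {
	_, ok := m.data[string(key)]
	return ok, nil
}

func (m *mapStore) Get(key []byte) ([]byte, error) {
	v, ok := m.data[string(key)]
	if !ok {
		return nil, errNotFound
	}
	// Return a copy so callers cannot mutate the stored value.
	return append([]byte(nil), v...), nil
}

func (m *mapStore) Put(key []byte, value []byte) error {
	// Store a copy so later changes to the caller's slice do not leak in.
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

func (m *mapStore) Delete(key []byte) error {
	delete(m.data, string(key))
	return nil
}

func main() {
	var db KeyValueStore = newMapStore()
	if err := db.Put([]byte("k"), []byte("v")); err != nil {
		panic(err)
	}
	v, err := db.Get([]byte("k"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(v))
}
```

Anything that accepts the interface works unchanged with this store or with a disk-backed one, which is the whole point of the abstraction.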

The ethdb.Database interface extends ethdb.KeyValueStore to add methods for read and write access to the freezer. This interface is often used for the chain data, because the more recent blocks are stored in the key-value store, whereas blocks further back, which are considered immutable, are migrated to the freezer.

The lifetime of an ethdb instance is that of the program. It is spun up right in the beginning and wound down when the node stops. It is the only DB-related object passed to core.Blockchain and from there passed down to various other structs. It is really the essence when it comes to data persistence. The tree of life of Geth if you will. Its roots reach deep into the disk and its branches stretch upward into the EVM and beyond.

ethdb: grounded in disk and serving features such as EVM, state trie, and RPC

Triedb

Buckle up, this will be a meaty part of the article. Next up is triedb. It sits between the trie and the disk layer. It is all about storing and retrieving trie nodes. A triedb instance is created at the beginning of the program and wound down when the node stops. It takes in an ethdb instance as parameter when being created and keeps a handle around for processing the actual persistence. Currently triedb has 2 backends to choose from: hashdb and pathdb.

Let’s first check node retrieval as it is a simpler operation. Triedb backends must return a database.Reader which has the following interface:

type Reader interface {
 Node(owner common.Hash, path []byte, hash common.Hash) ([]byte, error)
}

It looks up a node from a trie given the path and corresponding node hash. Note the return value is a simple byte array. Triedb is agnostic to the actual content of the trie nodes. It has no sense of accounts and storage, or even leaf node and branch node. The owner parameter determines which trie the node is located in. For the account trie the owner will be left empty. As you might know, the storage of each contract is kept in a separate trie. When looking up a storage slot the owner will refer to the contract it belongs to.

Hashdb

Triedb historically persisted trie nodes with their hash as the key and encoded node as the value. This scheme of persisting trie nodes is referred to as hashdb now. It is straightforward yet lets us achieve storing multiple tries. Each trie can be retrieved and traversed given its root hash. This scheme also means that equal subtries, because they will have the exact same node hashes, will be deduplicated with no extra effort. This is a good feature because the state trie is very large and most of it stays the same from one block to the next.
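The deduplication property is easy to see in a toy version. The sketch below stores blobs under their content hash. Note the assumptions: hashStore and put are hypothetical names, and I use SHA-256 from the standard library purely as a stand-in for the Keccak-256 hashing that Ethereum tries use.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash [32]byte

// hashStore is a hypothetical node store keyed by content hash.
type hashStore struct {
	nodes map[hash][]byte
}

// put stores blob under its hash and returns that hash as the key.
func (s *hashStore) put(blob []byte) hash {
	h := hash(sha256.Sum256(blob)) // stand-in for Keccak-256
	s.nodes[h] = blob
	return h
}

func main() {
	s := &hashStore{nodes: make(map[hash][]byte)}

	// Trie of block N: a root referencing a subtrie.
	shared := s.put([]byte("unchanged subtrie"))
	rootA := s.put(append([]byte("rootA:"), shared[:]...))

	// Trie of block N+1: a different root, same subtrie.
	again := s.put([]byte("unchanged subtrie"))
	rootB := s.put(append([]byte("rootB:"), again[:]...))

	// The subtrie is stored once; each root is still reachable by its hash.
	fmt.Println(shared == again, rootA != rootB, len(s.nodes)) // true true 3
}
```

Two roots and one shared subtrie end up as three entries rather than four, with no bookkeeping on the writer's side.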

It’s important to realize that hashdb does not persist the trie for every block. That only happens in case the node is in archive mode which is true for a small minority of nodes. Instead it keeps updates to the trie for possibly many blocks in memory. So what are the conditions for the in-memory updates to be flushed to disk, you might ask?

In-memory updates are regularly flushed. The interval between the flushes depends on the execution time of the blocks and is not directly predictable. Just to give you an idea, the default value is 5 minutes worth of block processing.
Or when the cache capacity is reached.
Or when the node shuts down.

A lot of the complexity of hashdb comes from the fact that it tries to garbage-collect the in-memory nodes. Let’s say a contract is created in one block and destroyed in the next. There is no reason to keep the trie nodes concerning that contract in memory anymore. Furthermore if that contract had a corresponding storage trie, well then that whole storage trie should be cleared as well.

Pathdb

Pathdb is a new backend for triedb. It changes how trie nodes are persisted to disk and kept in-memory. As seen above, hashdb stores nodes under their hashes. It turns out this approach makes pruning unused parts of state really difficult. It has been a long-standing project within Geth to solve this.

https://github.com/ethereum/go-ethereum/issues/23427

Pathdb has major differences to hashdb:

Trie nodes are stored in the key-value db by their trie path. Nodes in a storage trie are prefixed with the account hash they belong to.
Hashdb persists the full state for a block to disk regularly. This means your node will still have the full state for an old block you might not care about. In pathdb only one trie is persisted on disk at all times. The same trie is updated every block. Nodes that are modified can simply be overwritten as the keys are paths instead of hashes. Cleared nodes can be safely deleted from the db because no other trie references them.
The persisted trie is not at the head of the chain but rather at least 128 blocks behind. For each of the most recent 128 blocks there is a corresponding in-memory layer which tracks its trie changes. This allows handling small reorgs easily in memory. The nodes which are supposed to be flushed to disk are aggregated in memory first and persisted in a batch.
To handle larger reorgs pathdb keeps reverse state differences per block in the freezer. On demand it can apply those stateDiffs to the disk layer in reverse to get to the fork point.
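The layering described in the list above can be sketched roughly as follows. Everything here is a simplification of my own: layeredDB, diffLayer and nodeKey are hypothetical names, the key format is invented, the limit is 2 instead of 128, and the reverse state diffs kept in the freezer are left out entirely.

```go
package main

import "fmt"

// nodeKey builds a path-based key. The owner is empty for the account trie.
// The format is made up for this sketch and is not Geth's on-disk schema.
func nodeKey(owner, path string) string { return owner + "/" + path }

// diffLayer holds the trie node changes of one block. A nil value marks a deletion.
type diffLayer struct {
	nodes map[string][]byte
}

// layeredDB is a hypothetical, heavily simplified pathdb: one persisted trie
// plus a bounded stack of in-memory diff layers (oldest first).
type layeredDB struct {
	disk  map[string][]byte
	diffs []diffLayer
	limit int
}

// push adds the changes of a new block. Once more than limit layers are held
// in memory, the oldest layer is flattened into the disk layer.
func (db *layeredDB) push(changes map[string][]byte) {
	db.diffs = append(db.diffs, diffLayer{nodes: changes})
	for len(db.diffs) > db.limit {
		for k, v := range db.diffs[0].nodes {
			if v == nil {
				delete(db.disk, k) // safe: no other trie references the node
			} else {
				db.disk[k] = v // same path, so simply overwrite
			}
		}
		db.diffs = db.diffs[1:]
	}
}

// node resolves a key from the newest layer down to the disk layer.
func (db *layeredDB) node(key string) ([]byte, bool) {
	for i := len(db.diffs) - 1; i >= 0; i-- {
		if v, ok := db.diffs[i].nodes[key]; ok {
			return v, v != nil
		}
	}
	v, ok := db.disk[key]
	return v, ok
}

func main() {
	db := &layeredDB{disk: map[string][]byte{}, limit: 2}
	k := nodeKey("", "0a")
	db.push(map[string][]byte{k: []byte("v1")})
	db.push(map[string][]byte{k: []byte("v2")})
	db.push(map[string][]byte{k: nil}) // node removed in the newest block
	v, ok := db.node(k)
	fmt.Println(string(db.disk[k]), v == nil, ok) // v1 true false
}
```

In this picture a small reorg amounts to dropping the newest layers, while the disk layer lags behind and is never touched by it.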

If you’re curious about the path-based scheme I encourage you to check out the issue linked above. What I’d like you to take away from this section is that triedb and its backends sit between the trie and disk. Triedb allows efficient fetching and persistence of nodes. Trie operations such as inserting or deleting a leaf are not performed here and lie in a higher abstraction layer.

State.Database

This is a thin interface layer with utility methods to open an account or storage trie for a given block. It also exposes the underlying ethdb and triedb instances. As such it acts as a one-stop shop, or a loyal side-kick, for all of statedb’s database needs. It is implemented by state.cachingDB. The crucial function of state.cachingDB is caching contract code across multiple blocks. As such the lifetime of a state.Database is that of core.Blockchain.

You might well say that state.Database is just a code cache plus some utility methods. However soon this will drastically change. This object will play a crucial role when Verkle trees arrive. It will be the central piece which tracks the Verkle conversion. The conversion is the process through which the whole state is migrated from the current structure, i.e. Merkle Patricia Tree, to the new Verkle structure. Due to the size of state, this process will happen over the span of many blocks.

Statedb

Ladies and gentlemen, I present you the StateDB.

Ok but joking aside, the reason why I insist on statedb's popularity is that this is the struct most Geth forks modify to suit their logic. E.g. Arbitrum changes StateDB to manage their Stylus programs. EVMOS changes StateDB to track calls to their stateful precompiles. The reason is that StateDB is the sole state-related interface exposed to EVM. EVM cares about accounts and storage slots, not trie nodes and key-value stores. It turns out most projects relying on Geth’s source code also don’t care about the exact underlying ethdb or triedb. Those are working so why touch them.

Let’s start with the fact that StateDB has the lifetime of a single block. It is discarded and is not functional anymore after a block has been processed and committed. It manages a set of state objects in-memory. Each state object represents an account. The first time EVM reads an address, it is fetched from the db and a fresh state object is initialized for it. This is considered a clean object. As the transaction interacts with the account and makes changes to it the object becomes dirty. A state object tracks the original account data as well as the data after all mutations. It also manages its corresponding storage slots and their clean/dirty status.
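Here is a rough sketch of that clean/dirty lifecycle, reduced to balances only. The types toyStateDB and stateObject and the loads counter are hypothetical names for this example. Geth's real state objects carry far more (nonce, code, storage, and dirtiness tracked through the journal), so this only illustrates the load-once, mutate-in-memory idea.

```go
package main

import "fmt"

// stateObject is a hypothetical, stripped-down account.
type stateObject struct {
	origin  uint64 // balance as loaded from the db
	balance uint64 // balance after all mutations so far
	dirty   bool   // set on the first mutation
}

// toyStateDB is a stand-in for StateDB that lives for one block.
type toyStateDB struct {
	db      map[string]uint64       // stand-in for the trie and database below
	objects map[string]*stateObject // live objects for the current block
	loads   int                     // number of reads that reached the db
}

// getObject returns the live object, loading it from the db on first access.
func (s *toyStateDB) getObject(addr string) *stateObject {
	if obj, ok := s.objects[addr]; ok {
		return obj
	}
	s.loads++
	bal := s.db[addr]
	obj := &stateObject{origin: bal, balance: bal}
	s.objects[addr] = obj
	return obj
}

// GetBalance reads through the object cache.
func (s *toyStateDB) GetBalance(addr string) uint64 {
	return s.getObject(addr).balance
}

// SetBalance mutates the in-memory object only and marks it dirty.
func (s *toyStateDB) SetBalance(addr string, v uint64) {
	obj := s.getObject(addr)
	obj.balance = v
	obj.dirty = true
}

func main() {
	s := &toyStateDB{
		db:      map[string]uint64{"alice": 5},
		objects: map[string]*stateObject{},
	}
	fmt.Println(s.GetBalance("alice"), s.objects["alice"].dirty) // 5 false
	s.SetBalance("alice", 9)
	obj := s.objects["alice"]
	fmt.Println(obj.origin, obj.balance, obj.dirty, s.loads) // 5 9 true 1
}
```

Note that the backing map is never written to here: mutations stay in the object until commit time.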

As you probably know, calls and transactions can revert. On revert the state must return to what it was just prior to the reverted call or transaction. StateDB manages this by keeping a journal of all the modifications to the state. The journal is more like a stack of change sets, because a call can succeed even if it calls a contract that reverts. If the transaction as a whole succeeds, statedb.Finalise is invoked, which is responsible for clearing out the selfdestructed contracts as well as resetting the journal and refund counter.
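A journal of this kind can be sketched in a few lines. The sketch assumes a toy state of balances only. The method names Snapshot and RevertToSnapshot echo the ones on StateDB, but the body is my own simplification (undo closures on a slice) and not how Geth implements its journal entries.

```go
package main

import "fmt"

// toyState is a hypothetical stand-in for StateDB that tracks balances only.
type toyState struct {
	balances map[string]uint64
	journal  []func() // undo actions, newest last
}

// SetBalance mutates state and records how to undo the change.
func (s *toyState) SetBalance(addr string, amount uint64) {
	prev, existed := s.balances[addr]
	s.journal = append(s.journal, func() {
		if existed {
			s.balances[addr] = prev
		} else {
			delete(s.balances, addr)
		}
	})
	s.balances[addr] = amount
}

// Snapshot returns an identifier for the current journal position.
func (s *toyState) Snapshot() int { return len(s.journal) }

// RevertToSnapshot undoes, newest first, every change made after the snapshot.
func (s *toyState) RevertToSnapshot(id int) {
	for i := len(s.journal) - 1; i >= id; i-- {
		s.journal[i]()
	}
	s.journal = s.journal[:id]
}

func main() {
	s := &toyState{balances: map[string]uint64{}}

	s.SetBalance("alice", 10) // change made by the outer call

	snap := s.Snapshot()     // the outer call invokes another contract...
	s.SetBalance("alice", 3) // ...which makes some changes...
	s.SetBalance("bob", 7)
	s.RevertToSnapshot(snap) // ...and then reverts

	_, bobExists := s.balances["bob"]
	fmt.Println(s.balances["alice"], bobExists) // 10 false
}
```

The outer call's write survives while the inner call's writes are rolled back, which is why a single flat "undo everything" would not be enough.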

Finally, after all transactions have been processed in the block, statedb.Commit is invoked. Until this point the trie has not been changed at all. Only now does statedb update the storage tries based on the accumulated changes to compute their respective roots. This in turn determines the final state for accounts. Then, the dirty state objects are flushed to the account trie to update its structure and compute the new state root. Finally, the dirty node set is passed to triedb, who we met earlier. Depending on the backend, triedb caches these nodes and eventually persists them to disk in case they are not reorged out.
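The ordering matters because an account's leaf contains its storage root. The sketch below illustrates only that dependency. It makes heavy assumptions: rootOf is a hypothetical stand-in that hashes sorted key/value pairs with SHA-256 instead of building a Merkle Patricia Trie, and the account encoding is invented.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// rootOf is a stand-in for a trie root: a hash over the sorted key/value pairs.
func rootOf(kv map[string]string) [32]byte {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep the encoding unambiguous.
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(kv[k]), kv[k])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// account is a hypothetical, minimal account with a balance and storage.
type account struct {
	balance uint64
	storage map[string]string
}

// commit mirrors the order described above: storage roots first, then the
// account "trie" whose leaves embed those storage roots.
func commit(accounts map[string]*account) [32]byte {
	leaves := make(map[string]string, len(accounts))
	for addr, acc := range accounts {
		storageRoot := rootOf(acc.storage) // step 1: per-account storage root
		leaves[addr] = fmt.Sprintf("%d|%x", acc.balance, storageRoot)
	}
	return rootOf(leaves) // step 2: state root over all accounts
}

func main() {
	accounts := map[string]*account{
		"contract": {balance: 1, storage: map[string]string{"slot0": "a"}},
		"eoa":      {balance: 2, storage: map[string]string{}},
	}
	before := commit(accounts)
	accounts["contract"].storage["slot0"] = "b"
	after := commit(accounts)
	fmt.Println(before != after) // true
}
```

A single changed storage slot changes the storage root, which changes the account leaf, which changes the state root.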

Honorary mention: rawdb

You can think of rawdb as a schema layer over the key-value store, i.e. it handles how various data objects are keyed to their read/write locations in the DB. It additionally defines getters and setters for those same objects. To take an example, let’s see how code is fetched from the key-value store.

var CodePrefix = []byte("c") // CodePrefix + code hash -> account code

// codeKey = CodePrefix + hash
func codeKey(hash common.Hash) []byte {
 return append(CodePrefix, hash.Bytes()...)
}

// ReadCodeWithPrefix retrieves the contract code of the provided code hash.
func ReadCodeWithPrefix(db ethdb.KeyValueReader, hash common.Hash) []byte {
 data, _ := db.Get(codeKey(hash))
 return data
}

This package also contains the freezer. I hinted at the freezer in the section about backing databases. Apart from the freezer logic itself there are helpers in rawdb that can fetch data regardless of where it resides. To illustrate this with an example, recent blocks are stored in the key-value store. Once they have matured they will be transferred to the freezer. The block body getter function will therefore first search the freezer and if the block wasn’t found will attempt to find it in the key-value store.
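The lookup order just described can be sketched as follows. The types and helpers here (store, bodyKey, readBody, freeze) are hypothetical and only model the idea. In particular the key format is invented, and the sketch keys bodies by number alone.

```go
package main

import "fmt"

// store is a hypothetical pair of backends: a freezer indexed by block
// number and a key-value store for recent blocks.
type store struct {
	freezer [][]byte          // matured bodies, slot i holds block i
	kv      map[string][]byte // recent bodies
}

// bodyKey is a made-up key scheme for this sketch.
func bodyKey(number uint64) string { return fmt.Sprintf("b%d", number) }

// readBody checks the freezer first and falls back to the key-value store.
// It returns nil if the block is in neither.
func (s *store) readBody(number uint64) []byte {
	if number < uint64(len(s.freezer)) {
		return s.freezer[number]
	}
	return s.kv[bodyKey(number)]
}

// freeze migrates the next matured block from the key-value store into the
// freezer. It reports whether a block was moved.
func (s *store) freeze() bool {
	next := uint64(len(s.freezer))
	body, ok := s.kv[bodyKey(next)]
	if !ok {
		return false
	}
	s.freezer = append(s.freezer, body)
	delete(s.kv, bodyKey(next))
	return true
}

func main() {
	s := &store{kv: map[string][]byte{
		bodyKey(0): []byte("body-0"),
		bodyKey(1): []byte("body-1"),
	}}
	s.freeze() // block 0 matures and moves to the freezer

	// Callers do not need to know where each block lives.
	fmt.Println(string(s.readBody(0)), string(s.readBody(1)), len(s.kv))
}
```

The caller-facing getter stays the same before and after migration, which is the convenience these rawdb helpers provide.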

Fin.

You have now been introduced to these 5 characters, visitor. Are you ready to get to know them better? Then it’s best to directly head to the source. The best time is now as many adventures await the 5 DBs in the short-term future. There are rumors that pathdb is eyeing up archive mode. A new backend to triedb with the name of verkledb has been training hard at the gym, waiting for its moment to take the stage. And with that, I say farewell.

Sina’s Substack