This repository has been archived by the owner on Dec 8, 2023. It is now read-only.

1.1_regions.asciidoc

File metadata and controls

1543 lines (1216 loc) · 130 KB

Regions

Files | Layer | Description
net.{h,cpp} | Network | Handles node communication with the P2P network
net_processing.{h,cpp} | Network Processing | Adapts the incoming network messages to the Validation layer
validation.{h,cpp} | Validation | Handles modifying in-memory data structures for chainstate and transactions
txmempool.{h,cpp} | Mempool | Manages the in-memory data structure for the unconfirmed transactions the node has seen
coins.{h,cpp} & txdb.{h,cpp} | Coins | Manages the UTXO cache and chainstate database
dbwrapper.{h,cpp} & indexes/ | Database and Indexes | Manages LevelDB database operations and the creation of and access to indexes
script/ | Script | Executes Bitcoin scripts and signs transactions
consensus/ | Consensus | Enforces the consensus rules
policy/ | Policy | Contains logic for assessing transactions and for fee estimation
interface/ | Interface | Provides a common interface for components to interact with each other
qt/ | GUI | Contains all the code for the graphical user interface
rpc/ | RPC Server | Manages the RPC server and handles the requests
wallet/ | Wallet | Manages the node's keys and the transactions related to them
miner.{h,cpp} | Mining | Includes utilities for generating blocks to be mined

net.{h,cpp}

The src/net.{h,cpp} files implement the most basic network level. It is the "bottom" of the Bitcoin Core stack. It handles node communication with the P2P network.

The network connection is enabled when node.connman->Start(*node.scheduler, connOptions) is called in the application's main function, src/init.cpp:AppInitMain(...). Note that there are two parameters: node.scheduler and connOptions.

The node variable is an instance of the struct NodeContext, which contains references to chain state and connection state. It is used by the init code, RPC, GUI, and test code to pass object references around without declaring the same variables and parameters repeatedly or resorting to globals. The struct is defined in src/node/context.h.

Before this struct was created, the global variable g_connman was used to manage the connection. But global variables reduce the modularity and flexibility of the program, so PR #16839 got rid of some global variables and made g_connman a NodeContext member (now called connman).

struct NodeContext {
    std::unique_ptr<CAddrMan> addrman;
    std::unique_ptr<CConnman> connman;
    std::unique_ptr<CTxMemPool> mempool;
    std::unique_ptr<CBlockPolicyEstimator> fee_estimator;
    std::unique_ptr<PeerManager> peerman;
    // ...
};

The connOptions parameter is a CConnman::Options object which stores many configurable network parameters that the user can define when starting the node. For any parameter left undefined, the default value from net.h is used.

// src/init.cpp
bool AppInitMain(...)
{
    // ...
    CConnman::Options connOptions;
    connOptions.nLocalServices = nLocalServices;
    connOptions.nMaxConnections = nMaxConnections;
    connOptions.m_max_outbound_full_relay = std::min(MAX_OUTBOUND_FULL_RELAY_CONNECTIONS, connOptions.nMaxConnections);
    connOptions.m_max_outbound_block_relay = std::min(MAX_BLOCK_RELAY_ONLY_CONNECTIONS, connOptions.nMaxConnections-connOptions.m_max_outbound_full_relay);
    connOptions.nMaxAddnode = MAX_ADDNODE_CONNECTIONS;
    connOptions.nMaxFeeler = MAX_FEELER_CONNECTIONS;
    // ...
}

The scheduler parameter is a CScheduler object. In this function, it is used to schedule how often the peers' IP addresses are dumped to disk: every 15 minutes, as defined by the DUMP_PEERS_INTERVAL constant. The file that stores the peer information is called peers.dat.
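
The periodic-dump mechanism can be pictured with a toy scheduler that, like CScheduler's repeating tasks, keeps work in a time-ordered queue and has a repeating task re-queue itself after each run. This is a minimal sketch with a simulated clock; the class and method names are illustrative, not Bitcoin Core's.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Toy scheduler: tasks are ordered by their scheduled time, and a
// repeating task re-queues itself with time += interval after running.
class MiniScheduler {
public:
    using Task = std::function<void()>;

    void ScheduleAt(int64_t t, Task task) {
        m_queue.emplace(t, std::move(task));
    }

    // Re-queues the task every `interval` time units after it runs.
    void ScheduleEvery(int64_t start, int64_t interval, Task task) {
        ScheduleAt(start, [=]() {
            task();
            ScheduleEvery(start + interval, interval, task);
        });
    }

    // Runs every task whose scheduled time is <= now (simulated clock).
    void RunUntil(int64_t now) {
        while (!m_queue.empty() && m_queue.top().first <= now) {
            Task task = m_queue.top().second;  // copy before pop
            m_queue.pop();
            task();
        }
    }

private:
    using Entry = std::pair<int64_t, Task>;
    struct Later {
        bool operator()(const Entry& a, const Entry& b) const { return a.first > b.first; }
    };
    std::priority_queue<Entry, std::vector<Entry>, Later> m_queue;
};
```

With a 15-unit interval, running the clock to t=45 fires the task at t=0, 15, 30 and 45, mirroring how the peers.dat dump fires every DUMP_PEERS_INTERVAL.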

The function bool CConnman::Start(...) loads the addresses from peers.dat and stores them in the CAddrMan& addrman member. CAddrMan keeps a table with information about all stored peers, std::map<int, CAddrInfo> mapInfo, and a second map from the peers' network addresses to their internal IDs, std::map<CNetAddr, int> mapAddr.
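
The two lookup tables can be pictured with a stripped-down address manager. This is an illustrative sketch, with plain strings standing in for CNetAddr/CAddrInfo; only the id-to-info and address-to-id mapping from the text is modeled.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Illustrative peer record (not Bitcoin Core's CAddrInfo).
struct AddrInfo {
    std::string addr;
    int attempts = 0;
};

class MiniAddrMan {
public:
    // Adds an address, returning its internal id (existing id if known).
    int Add(const std::string& addr) {
        auto it = mapAddr.find(addr);
        if (it != mapAddr.end()) return it->second;  // already known
        int id = next_id++;
        mapInfo[id] = AddrInfo{addr, 0};
        mapAddr[addr] = id;
        return id;
    }
    std::size_t size() const { return mapInfo.size(); }
    const AddrInfo* Find(const std::string& addr) const {
        auto it = mapAddr.find(addr);
        return it == mapAddr.end() ? nullptr : &mapInfo.at(it->second);
    }

private:
    int next_id = 0;
    std::map<int, AddrInfo> mapInfo;     // id -> peer info (cf. mapInfo)
    std::map<std::string, int> mapAddr;  // address -> id  (cf. mapAddr)
};
```

The second map exists so that "have we seen this address?" is a single lookup rather than a scan over every stored peer.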

There is another file, anchors.dat, which is also deserialized during startup. It contains addresses that were saved during the previous clean shutdown; the node will attempt to make block-relay-only connections to them. These addresses are stored in std::vector<CAddress> m_anchors.

The reason there are two files is a risk mitigation measure implemented in PR #15759 and PR #17428. The first change was motivated by the TxProbe paper, which describes how transaction relay leaks information that adversaries can use to infer the network topology. The second was motivated by the Eclipse Attack paper, which presents an attack that allows an adversary controlling a sufficient number of IP addresses to monopolize all connections to and from a victim bitcoin node.

Connections to the peers from anchors.dat are called block-relay-only, and connections to the peers from peers.dat are called outbound-full-relay. The first type relays only block and block header messages; the second includes all message types.
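
The difference between the two connection types can be sketched as a message filter. The message names follow the protocol, but this filter function itself is an illustrative simplification, not Bitcoin Core's actual relay logic.

```cpp
#include <set>
#include <string>

enum class ConnType { BLOCK_RELAY_ONLY, OUTBOUND_FULL_RELAY };

// Illustrative: full-relay connections carry everything; block-relay-only
// connections carry only block/header traffic.
bool RelaysMessage(ConnType type, const std::string& msg_type) {
    if (type == ConnType::OUTBOUND_FULL_RELAY) return true;
    static const std::set<std::string> block_msgs{
        "block", "headers", "cmpctblock", "blocktxn"};
    return block_msgs.count(msg_type) > 0;
}
```

Under this model a block-relay-only peer never sees tx or addr traffic, which is exactly what makes such connections harder for topology-inference attacks to observe.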

bool CConnman::Start(CScheduler& scheduler, const Options& connOptions)
{
    // ...
    // Load addresses from peers.dat
    int64_t nStart = GetTimeMillis();
    {
        CAddrDB adb;
        if (adb.Read(addrman))
            LogPrintf("Loaded %i addresses from peers.dat  %dms\n", addrman.size(), GetTimeMillis() - nStart);
        else {
            addrman.Clear(); // Addrman can be in an inconsistent state after failure, reset it
            LogPrintf("Recreating peers.dat\n");
            DumpAddresses();
        }
    }

    if (m_use_addrman_outgoing) {
        // Load addresses from anchors.dat
        m_anchors = ReadAnchors(GetDataDir() / ANCHORS_DATABASE_FILENAME);
        if (m_anchors.size() > MAX_BLOCK_RELAY_ONLY_ANCHORS) {
            m_anchors.resize(MAX_BLOCK_RELAY_ONLY_ANCHORS);
        }
        LogPrintf("%i block-relay-only anchors will be tried for connections.\n", m_anchors.size());
    }
    // ...
}

After the addresses are loaded from these files, threadSocketHandler is started. It enables the node to accept new connections (CConnman::AcceptConnection(...)) and to send and receive data.

Next, the following threads are initiated sequentially: threadDNSAddressSeed, threadOpenAddedConnections, threadOpenConnections and the threadMessageHandler.

The first one (threadDNSAddressSeed) checks whether the node managed to connect to at least 2 peers loaded from the files. If so, querying DNS is skipped and the thread finishes. Otherwise, if there is a reasonable number of peers in CAddrMan addrman, the node spends some time trying them first. This improves user privacy by creating fewer identifying DNS requests, reduces trust by giving the seeds less influence on the network topology, and reduces traffic to the seeds.
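
The decision logic described above can be condensed into a small illustrative function. The >= 2 threshold comes from the text; the addrman condition is simplified here (any non-empty addrman delays seeding), whereas the real code uses a larger threshold.

```cpp
#include <cstddef>

enum class SeedAction {
    SKIP,                   // enough connections from peers.dat/anchors.dat
    TRY_KNOWN_PEERS_FIRST,  // delay DNS, try stored addresses first
    QUERY_NOW               // addrman is empty, ask the seeds right away
};

// Illustrative sketch of threadDNSAddressSeed's decision.
SeedAction DnsSeedAction(int successful_connections, std::size_t addrman_size) {
    if (successful_connections >= 2) return SeedAction::SKIP;
    if (addrman_size > 0) return SeedAction::TRY_KNOWN_PEERS_FIRST;
    return SeedAction::QUERY_NOW;
}
```

The ordering matters: DNS seeds are a fallback of last resort, so every cheaper and more private source of peers is tried before them.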

The threadOpenAddedConnections calls GetAddedNodeInfo() to get information about the nodes added through the addnode RPC command. These nodes are stored in std::vector<std::string> vAddedNodes, which is protected by the cs_vAddedNodes mutex. ThreadOpenAddedConnections() is an infinite loop that checks whether the added addresses are connected and, if not, tries to connect to them.

ThreadOpenConnections tries to open connections to the peers. Opening block-relay-only connections to addresses from anchors.dat gets the highest priority; after that, opening outbound-full-relay connections takes priority until the node reaches its full-relay capacity.

Finally, ThreadMessageHandler is the thread that receives messages, processes them in src/net_processing.cpp:PeerManagerImpl::ProcessMessages(...), and sends messages to the peers.

net_processing.{h,cpp}

The main class of this region is PeerManagerImpl. It implements three interfaces: CValidationInterface, NetEventsInterface and PeerManager.
CValidationInterface was already discussed in [notification-mechanism].
NetEventsInterface handles network events triggered by the peers, such as initializing or removing a peer.
The PeerManager interface covers the high-level interaction with a peer, such as processing its messages, managing its misbehavior score, or relaying transactions.

Figure 1. Class PeerManagerImpl
// src/net_processing.h
class PeerManager : public CValidationInterface, public NetEventsInterface
{
    // ...
};
// src/net_processing.cpp
class PeerManagerImpl final : public PeerManager
{
    //...
    /** Overridden from CValidationInterface. */
    void BlockConnected(const std::shared_ptr<const CBlock>& pblock, const CBlockIndex* pindexConnected) override;
    void BlockDisconnected(const std::shared_ptr<const CBlock> &block, const CBlockIndex* pindex) override;
    // ...

    /** Implement NetEventsInterface */
    void InitializeNode(CNode* pnode) override;
    void FinalizeNode(const CNode& node) override;
    // ...

    /** Implement PeerManager */
    void CheckForStaleTipAndEvictPeers() override;
    bool GetNodeStateStats(NodeId nodeid, CNodeStateStats& stats) override;
    // ...
};

Note that there are two methods with very similar names: bool PeerManagerImpl::ProcessMessages(...) and void PeerManagerImpl::ProcessMessage(...). The first thing to observe is that they come from different interfaces: the former from NetEventsInterface and the latter from PeerManager.

bool PeerManagerImpl::ProcessMessages(...) is a lower-level method called from the net.{h,cpp} region.
First, it checks whether there are getdata requests from the peer and, if so, calls PeerManagerImpl::ProcessGetData(...). Then it checks for orphan transactions, calling PeerManagerImpl::ProcessOrphanTx(...) if there are any.
If neither is the case, PeerManagerImpl::ProcessMessage(...) is called to handle the message.

bool PeerManagerImpl::ProcessMessages(...)
{
    bool fMoreWork = false;

    PeerRef peer = GetPeerRef(pfrom->GetId());
    if (peer == nullptr) return false;

    {
        LOCK(peer->m_getdata_requests_mutex);
        if (!peer->m_getdata_requests.empty()) {
            ProcessGetData(*pfrom, *peer, interruptMsgProc);
        }
    }

    {
        LOCK2(cs_main, g_cs_orphans);
        if (!peer->m_orphan_work_set.empty()) {
            ProcessOrphanTx(peer->m_orphan_work_set);
        }
    }

    try {
        ProcessMessage(*pfrom, msg_type, msg.m_recv, msg.m_time, interruptMsgProc);
        // ...
    }

    // ...
}

src/net_processing.cpp:PeerManagerImpl::ProcessMessage(...) is the main function of this region: a giant conditional that handles the messages sent by peers.
It is a high-level network function that understands the message types and knows how to handle them, extracting the data and passing it on to the next region, validation.{h,cpp}.

void PeerManagerImpl::ProcessMessage(...)
{
    // ...
    if (msg_type == NetMsgType::VERACK) {
        // ...
        return;
    }

    if (msg_type == NetMsgType::SENDHEADERS) {
        // ...
        return;
    }

    if (msg_type == NetMsgType::SENDCMPCT) {
        // ...
        return;
    }

    // ...

    if (msg_type == NetMsgType::INV) {
        // ...
        return;
    }
}

The PeerManager interface also provides the method void Misbehaving(...) to handle potentially malicious nodes by incrementing a peer's misbehavior score. Whenever possibly harmful behavior is identified, this method is called with the node ID (pnode), the number of points to add (howmuch), and a message describing the misbehavior (message).

void PeerManagerImpl::Misbehaving(const NodeId pnode, const int howmuch, const std::string& message)
{
    // ..

    LOCK(peer->m_misbehavior_mutex);
    peer->m_misbehavior_score += howmuch;
    const std::string message_prefixed = message.empty() ? "" : (": " + message);
    if (peer->m_misbehavior_score >= DISCOURAGEMENT_THRESHOLD && peer->m_misbehavior_score - howmuch < DISCOURAGEMENT_THRESHOLD) {
        LogPrint(BCLog::NET, "Misbehaving: peer=%d (%d -> %d) DISCOURAGE THRESHOLD EXCEEDED%s\n", pnode, peer->m_misbehavior_score - howmuch, peer->m_misbehavior_score, message_prefixed);
        peer->m_should_discourage = true;
    } else {
        LogPrint(BCLog::NET, "Misbehaving: peer=%d (%d -> %d)%s\n", pnode, peer->m_misbehavior_score - howmuch, peer->m_misbehavior_score, message_prefixed);
    }
}

If the peer's m_misbehavior_score is equal to or greater than the DISCOURAGEMENT_THRESHOLD value (100), the peer is marked to be discouraged, meaning it may be disconnected and added to the discouragement filter. The discouraged nodes are stored in src/banman.h:BanMan::m_discouraged.
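
The threshold check in Misbehaving() can be isolated into a small predicate. It shows why a peer is flagged exactly once: the condition fires only on the call that moves the score from below the threshold to at or above it.

```cpp
// Mirrors the condition in PeerManagerImpl::Misbehaving(): new_score is
// the score after adding howmuch points, so (new_score - howmuch) is the
// score before the call. Discouragement triggers only on the crossing.
constexpr int DISCOURAGEMENT_THRESHOLD = 100;

bool CrossesDiscouragementThreshold(int new_score, int howmuch) {
    return new_score >= DISCOURAGEMENT_THRESHOLD &&
           new_score - howmuch < DISCOURAGEMENT_THRESHOLD;
}
```

For example, a peer going from 80 to 100 points is flagged, but a later bump from 100 to 120 is not: the peer was already over the line, so only the log line without the "DISCOURAGE THRESHOLD EXCEEDED" suffix is printed.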

There are two methods focused on applying penalties if something is wrong. They are PeerManagerImpl::MaybePunishNodeForBlock(…​) and PeerManagerImpl::MaybePunishNodeForTx(…​). However, not all conflicts are necessarily invalid, as can be seen in the code for these methods.

Misbehavior | Points Added
Peer provides a block whose data does not match the data committed to by the PoW | 100
Peer sends a block that has been cached as invalid | 100
Peer sends a block whose previous block is invalid | 100
Peer sends a block when the node doesn't have its previous block | 100
Peer sends a transaction that does not comply with consensus rules | 100
Peer requests an index in GETBLOCKTXN higher than the total number of transactions in a block | 100
Peer sends non-connecting headers | 20
Peer sends a non-continuous headers sequence | 20
Peer sends an ADDR or ADDRv2 message larger than allowed (1000 addresses, as defined in src/net.h:MAX_ADDR_TO_SEND) | 20
Peer sends an INV message with more entries than allowed (50000 entries, as defined in src/net_processing.h:MAX_INV_SZ) | 20
Peer sends a GETDATA message with more entries than allowed (same limit as INV) | 20

validation.{h,cpp}

The validation files handle verifying received data and modifying the in-memory data structures for chainstate and transactions (the mempool) based on certain acceptance rules.

Although CValidationInterface is not defined in the validation.cpp file, almost all of this interface's events are triggered in that file; the exception is the TransactionRemovedFromMempool event, which is triggered in src/txmempool.cpp. All the events are triggered through the publisher GetMainSignals().

One of the most important tasks of this region is managing the UTXO set. The Unspent Transaction Output (UTXO) set is the subset of Bitcoin transaction outputs that have not been spent at a given moment. Bitcoin relies on it to verify newly generated transactions efficiently. Every unspent output, regardless of its type, age or value, is stored by every full node, which keeps a copy of the UTXO set in order to validate transactions and produce new ones without having to check the whole blockchain.
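
A minimal model makes the efficiency argument concrete: the UTXO set is a map from outpoint (txid, output index) to the unspent output, and validating a transaction's inputs is a handful of map lookups rather than a chain scan. This is an illustrative sketch with plain strings for txids and bare values for outputs; no scripts or amount checks.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

using OutPoint = std::pair<std::string, int>;  // (txid, output index)

struct Tx {
    std::string txid;
    std::vector<OutPoint> inputs;
    std::vector<long long> output_values;
};

class UtxoSet {
public:
    // Spends the inputs and creates the outputs; rejects the tx if any
    // input is missing from the set (unknown or already spent).
    bool ApplyTx(const Tx& tx) {
        for (const auto& in : tx.inputs)
            if (!m_utxos.count(in)) return false;
        for (const auto& in : tx.inputs) m_utxos.erase(in);
        for (std::size_t i = 0; i < tx.output_values.size(); ++i)
            m_utxos[{tx.txid, static_cast<int>(i)}] = tx.output_values[i];
        return true;
    }
    bool Have(const OutPoint& op) const { return m_utxos.count(op) > 0; }
    void AddCoin(const OutPoint& op, long long value) { m_utxos[op] = value; }

private:
    std::map<OutPoint, long long> m_utxos;
};
```

Note how double-spending falls out for free: once an outpoint is erased, a second transaction referencing it fails the lookup.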

In Bitcoin Core, the UTXO set is also called the chain state, and the class that represents the most recent UTXO state is CChainState. It was created in PR #10279 as a way to clarify the internal interfaces. More recently, however, a new class called ChainstateManager has been added.

This class was introduced in PR #17737 as part of the assumeutxo project. assumeutxo is an idea similar to assumevalid. With assumevalid, a block hash is hard-coded into the code, and the node assumes that all blocks in the chain ending in that hash contain transactions with valid scripts. This is a startup optimization, but the node skips script validation, implicitly trusting the developers who hard-coded the default block hash. Bitcoin Core still validates everything else in those blocks, including Proof of Work, UTXOs, amounts, etc. Only the scripts are not validated, because they are expensive to check. assumevalid was introduced in PR #9484.

assumeutxo does something similar, but for the UTXO set. It is a way to initialize a node using a headers chain and a serialized version of the UTXO state generated by another node at some predetermined block height. The initializing node syncs the headers chain from the network, then obtains and loads one of these UTXO snapshots.

Based upon the snapshot, the node is able to quickly reconstruct its chainstate, and compare a hash of the resulting UTXO set to a preordained hash hard-coded in the software (exactly like assumevalid).

The node then syncs to the network tip and afterward begins a simultaneous background validation (conventional IBD) up to the base height of the snapshot in order to achieve full validation. Crucially, even while the background validation is happening, the node can validate incoming blocks and transact with the benefit of the full (assumed-valid) UTXO set. Snapshots could be obtained from multiple separate peers in the same way as block download.

The project is in progress at the time of writing, and much of the code is still being refactored. ChainstateManager is one of the newly created classes for the project. It provides an interface for managing one or two chainstates: an IBD chainstate generated by downloading blocks and an optional snapshot chainstate loaded from a UTXO snapshot.

class ChainstateManager
{
private:
    std::unique_ptr<CChainState> m_ibd_chainstate GUARDED_BY(::cs_main);
    std::unique_ptr<CChainState> m_snapshot_chainstate GUARDED_BY(::cs_main);
    CChainState* m_active_chainstate GUARDED_BY(::cs_main) {nullptr};
    // ...
};

The m_ibd_chainstate field is the chainstate used under normal operation (regular IBD). If a snapshot is in use, it is used for background validation while downloading the chain. The m_snapshot_chainstate field is the chainstate initialized on the basis of a UTXO snapshot. If this is non-null, it is always the active chainstate. m_active_chainstate points to either the IBD or snapshot chainstate and indicates the most-work chain. The method below demonstrates this behavior.

CChainState& ChainstateManager::InitializeChainstate(CTxMemPool& mempool, const uint256& snapshot_blockhash)
{
    bool is_snapshot = !snapshot_blockhash.IsNull();
    std::unique_ptr<CChainState>& to_modify =
        is_snapshot ? m_snapshot_chainstate : m_ibd_chainstate;

    if (to_modify) {
        throw std::logic_error("should not be overwriting a chainstate");
    }
    to_modify.reset(new CChainState(mempool, m_blockman, snapshot_blockhash));

    // Snapshot chainstates and initial IBD chainstates always become active.
    if (is_snapshot || (!is_snapshot && !m_active_chainstate)) {
        LogPrintf("Switching active chainstate to %s\n", to_modify->ToString());
        m_active_chainstate = to_modify.get();
    } else {
        throw std::logic_error("unexpected chainstate activation");
    }

    return *to_modify;
}

The chainman.InitializeChainstate(*Assert(node.mempool)) call initializes a new chainstate when the node starts up. If, for some reason, one has already been created, an exception is thrown. Note that the second parameter, snapshot_blockhash, is not passed. At the time of writing, it is not yet possible to start the node with a snapshot block hash as a parameter. In the function, m_ibd_chainstate becomes the active chainstate (m_active_chainstate) only when snapshot_blockhash is null. This code snippet makes clear that the snapshot chainstate takes priority as the active chainstate.

ChainstateManager has other methods related to assumeutxo, such as ActivateSnapshot(...) and ValidatedChainstate(...), but they are not used yet except in unit tests. There are also methods related to block management, such as ProcessNewBlockHeaders(...) and ProcessNewBlock(...). These were originally stand-alone functions declared in validation.h, but PR #18698 made them members of ChainstateManager.

ProcessNewBlockHeaders(...) is called in src/net_processing.cpp when a cmpctblock message arrives, or through the PeerManagerImpl::ProcessHeadersMessage(...) function when a headers message arrives.
ProcessNewBlock(...) is called when a block, blocktxn or cmpctblock message arrives.
So that the net_processing.{h,cpp} region can communicate with the validation.{h,cpp} region, the PeerManagerImpl class has a ChainstateManager& m_chainman member variable.

class PeerManagerImpl final : public PeerManager
{
    // ...
    ChainstateManager& m_chainman;
    // ...

    void ProcessMessage(...) {
        if (msg_type == NetMsgType::CMPCTBLOCK)
        {
            if (!m_chainman.ProcessNewBlockHeaders(...)) {
                // ...
            }
            // ...
            if (fBlockReconstructed) {
                // ...
                m_chainman.ProcessNewBlock(...);
                //...
            }
        }

        if (msg_type == NetMsgType::BLOCKTXN)
        {
            // ...
            if (fBlockRead) {
                //...
                m_chainman.ProcessNewBlock(...);
                // ...
            }
        }

        if (msg_type == NetMsgType::BLOCK) {
            // ...
            m_chainman.ProcessNewBlock(...);
            // ...
        }
    }
};

Another important method is ChainstateManager::ActiveChainstate(), which is used to find out which chainstate is active (m_snapshot_chainstate or m_ibd_chainstate) and returns a CChainState object.

CChainState provides an API to update and store our local knowledge of the current best chain. When a new block arrives, this class performs most of the work. ChainstateManager::ProcessNewBlock() triggers the following methods sequentially: CChainState::AcceptBlock(...), CChainState::ActivateBestChain(...), CChainState::ActivateBestChainStep(...), CChainState::ConnectTip(...) and CChainState::ConnectBlock(...). Note that all these methods are members of CChainState, and together they manage the entire cycle of accepting or rejecting a new block.

When accepting a received block, it is necessary to save it to a file in order to track and store the block information. Thus, CChainState::AcceptBlock(...) calls src/node/blockstorage.cpp:SaveBlockToDisk(...), which calls src/validation.cpp:FindBlockPos(...) to find the current file position (e.g., 157 for blk00157.dat) and then src/node/blockstorage.cpp:WriteBlockToDisk(...), which writes the block to the history file.

The SaveBlockToDisk(...) and WriteBlockToDisk(...) stand-alone functions were originally in the src/validation.cpp file.
PR #21575 moved them to src/node/blockstorage.cpp, which focuses on block storage. This PR is part of the effort to break the massive src/init.cpp and src/validation.cpp files down into single-responsibility logical units.

// src/node/blockstorage.cpp
static bool WriteBlockToDisk(....)
{
    // Open history file to append
    CAutoFile fileout(OpenBlockFile(pos), SER_DISK, CLIENT_VERSION);
    //...

    // Write index header
    unsigned int nSize = GetSerializeSize(block, fileout.GetVersion());
    fileout << messageStart << nSize;

    // Write block
    //...
    fileout << block;

    return true;
}

The method above serializes the block to the file (fileout << block).
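
The record layout the function produces (network magic bytes, then the serialized size, then the raw block) can be sketched as a plain byte-building function. The helper name and types here are illustrative; the size field is encoded as a 4-byte little-endian integer, as Bitcoin's serialization does.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch of the on-disk record WriteBlockToDisk appends:
// [magic bytes][size, uint32 little-endian][block bytes].
std::vector<uint8_t> WriteRecord(const std::vector<uint8_t>& magic,
                                 const std::vector<uint8_t>& block_bytes) {
    std::vector<uint8_t> out(magic);
    uint32_t size = static_cast<uint32_t>(block_bytes.size());
    for (int i = 0; i < 4; ++i)                       // little-endian size
        out.push_back(static_cast<uint8_t>(size >> (8 * i)));
    out.insert(out.end(), block_bytes.begin(), block_bytes.end());
    return out;
}
```

The magic prefix is what lets tools scanning a blk*.dat file resynchronize on record boundaries, and the size field is what makes skipping from one block to the next cheap.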

Although WriteBlockToDisk(...) and ReadBlockFromDisk(...) have been removed from the validation.{h,cpp} region, it still contains other utility functions for storing and reading data on disk, like WriteUndoDataForBlock(...), DumpMempool(...), LoadMempool(...) and bool CChainState::FlushStateToDisk(...). The latter is particularly important.

This method is called frequently, on any change to the chain state or during shutdown via CChainState::ForceFlushStateToDisk(...). It checks several conditions to decide whether to write the data to disk: for example, the cache is over its limit, it has been a while since the block index was written to disk, or it has been a long time since the cache was last flushed. All these conditions are combined into a variable called fDoFullFlush.

bool CChainState::FlushStateToDisk(...)
{
    // ...
    bool fPeriodicWrite = mode == FlushStateMode::PERIODIC && nNow > nLastWrite + DATABASE_WRITE_INTERVAL;
    // ...
    // Combine all conditions that result in a full cache flush.
    fDoFullFlush = (mode == FlushStateMode::ALWAYS) || fCacheLarge || fCacheCritical || fPeriodicFlush || fFlushForPrune;

    // Write blocks and block index to disk.
    if (fDoFullFlush || fPeriodicWrite) {
        {
            // ...
            FlushBlockFile();
        }
        // Then update all block file information (which may refer to block and undo files).
        {
            // ...
            if (!pblocktree->WriteBatchSync(vFiles, nLastBlockFile, vBlocks)) {
                return AbortNode(state, "Failed to write to block index database");
            }
        }
        // Flush best chain related state. This can only be done if the blocks / block index write was also done.
        if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
            // Flush the chainstate (which may refer to block index entries).
            if (!CoinsTip().Flush())
                return AbortNode(state, "Failed to write to coin database");
        }
    }
}

Note that there are three data writes in this code.

The first one is FlushBlockFile(), which makes sure that all block and undo data are flushed to disk. They are usually stored in ~/.bitcoin/blocks/. Block files have names like blk02031.dat and undo files names like rev02031.dat. The number after blk or rev increases once a file reaches its maximum size, defined by MAX_BLOCKFILE_SIZE, currently 128 MiB. A file can contain multiple blocks until it reaches this limit.
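
The naming scheme is simply a zero-padded five-digit counter after the blk/rev prefix; a sketch (the helper name is illustrative):

```cpp
#include <cstdio>
#include <string>

// Formats block/undo file names like blk00157.dat or rev02031.dat.
std::string BlockFileName(const char* prefix, unsigned int n) {
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%s%05u.dat", prefix, n);
    return buf;
}
```

Block file 157 and its undo data thus live in blk00157.dat and rev00157.dat: the two files always share the same number, which is how undo records are matched back to the blocks they reverse.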

The second write operation is pblocktree->WriteBatchSync(...). pblocktree is a CBlockTreeDB object and represents the block database (usually stored in ~/.bitcoin/blocks/index/), a LevelDB database that contains metadata about all known blocks. CBlockTreeDB is a subclass of CDBWrapper, which is a wrapper class for LevelDB operations.

CoinsTip().Flush() ends up calling CCoinsViewDB::BatchWrite(...). CCoinsViewDB represents the coins database (chainstate/) and has a std::unique_ptr<CDBWrapper> m_db member to access it. This is also a LevelDB database, containing a compact representation of all currently unspent transaction outputs (the UTXO set). In simplified terms, the chainstate directory stores the state as of the latest block: every spendable coin, who can spend it, and how much it is worth.

In short, all these operations handle basically four pieces of data:

  • blocks/blk*.dat: the actual Bitcoin blocks, in network format, dumped raw on disk. They are only needed for rescanning missing transactions in a wallet, reorganizing to a different part of the chain, and serving the block data to other synchronizing nodes.

  • blocks/index/*: this is a LevelDB database that contains metadata about all known blocks, and where to find them on the disk. Without this, finding a block would be very slow.

  • chainstate/*: this is a LevelDB database with a compact representation of all currently unspent transaction outputs and some metadata about the transactions they are from. The data here is necessary for validating new incoming blocks and transactions.

  • blocks/rev*.dat: these contain "undo" data. Blocks are like 'patches' to the chain state (they consume some unspent outputs and produce new ones), and the undo data acts as reverse patches. It is necessary for rolling back the chainstate, which happens in case of reorganizations.
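
The "reverse patch" idea can be sketched with a toy chainstate where coins are just name -> value entries. Connecting a block erases the spent coins (saving them as undo records) and adds the created ones; disconnecting replays the undo records. All names here are illustrative.

```cpp
#include <map>
#include <string>
#include <vector>

// Undo record for one block: the coins it spent (to restore) and the
// coins it created (to delete) when rolling the block back.
struct Undo {
    std::map<std::string, long long> spent;
    std::vector<std::string> created;
};

Undo ConnectBlock(std::map<std::string, long long>& utxos,
                  const std::vector<std::string>& spends,
                  const std::map<std::string, long long>& creates) {
    Undo undo;
    for (const auto& name : spends) {
        undo.spent[name] = utxos.at(name);  // remember the coin's value
        utxos.erase(name);
    }
    for (const auto& [name, value] : creates) {
        utxos[name] = value;
        undo.created.push_back(name);
    }
    return undo;
}

void DisconnectBlock(std::map<std::string, long long>& utxos, const Undo& undo) {
    for (const auto& name : undo.created) utxos.erase(name);
    for (const auto& [name, value] : undo.spent) utxos[name] = value;
}
```

A reorganization is then just a sequence of DisconnectBlock calls down to the fork point followed by ConnectBlock calls up the new branch.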

Another important detail about the LevelDB-stored chainstate is that the data can be obfuscated. To do so, a randomly generated key consisting of 8 random bytes is XORed over the data. The CDBWrapper::CreateObfuscateKey() method creates the key, which is stored in the std::vector<unsigned char> obfuscate_key member variable. This was implemented in PR #6650 to avoid spurious detection by anti-virus software.
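
Since the obfuscation is a repeating XOR, applying the same key twice restores the original bytes; there is no separate "decrypt" step. A minimal sketch (the key value in the test is illustrative, not a real obfuscate_key):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// XORs the key over the data byte by byte, repeating the key as needed.
// Applying the same key a second time undoes the obfuscation.
void XorWithKey(std::vector<uint8_t>& data, const std::vector<uint8_t>& key) {
    if (key.empty()) return;  // an empty key leaves the data untouched
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] ^= key[i % key.size()];
}
```

This is deliberately not cryptography: the key is stored in the database itself. The only goal is that the raw bytes on disk do not look like wallet or malware signatures to a scanner.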

There is one more data file managed by the validation.{h,cpp} region: mempool.dat. This file persists the mempool across node restarts. It was implemented in PR #8448. As can be seen in the PR comments, the functionality was directly requested by miners, because restarted nodes otherwise ended up producing small blocks until their mempools refilled. Mempool sync is also a bandwidth concern: if the mempool is lost on a restart, every quick restart wastes bandwidth.

The methods that handle mempool persistence are DumpMempool(...) and LoadMempool(...).

bool DumpMempool(...)
{
    // ...
    std::vector<TxMempoolInfo> vinfo;
    // ...
    try {
        FILE* filestr{mockable_fopen_function(GetDataDir() / "mempool.dat.new", "wb")};
        // ...
        CAutoFile file(filestr, SER_DISK, CLIENT_VERSION);

        uint64_t version = MEMPOOL_DUMP_VERSION;
        file << version;

        file << (uint64_t)vinfo.size();
        for (const auto& i : vinfo) {
            file << *(i.tx);
            file << int64_t{count_seconds(i.m_time)};
            file << int64_t{i.nFeeDelta};
            mapDeltas.erase(i.tx->GetHash());
        }

        // ...

        file.fclose();
        if (!RenameOver(GetDataDir() / "mempool.dat.new", GetDataDir() / "mempool.dat")) {
            throw std::runtime_error("Rename failed");
        }
        // ...
    }
    // ...
    return true;
}

Mempool persistence is a simple (de)serialization operation using the CAutoFile stream. DumpMempool(...) is called during src/init.cpp:Shutdown(...), and the load operation, LoadMempool(...), is called in src/node/blockstorage.cpp:ThreadImport(...), which, as seen in the [concurrency_model] section, is started at startup.

This region accumulates many responsibilities (chain state, validation and persistence) because it is the result of refactoring the old main.{h,cpp} file. Efforts are underway to break it down into smaller units.

The diagram below shows the most relevant classes in the validation.{h,cpp} region and summarizes what has been demonstrated so far.

Figure 2. Validation Region Classes

txmempool.{h,cpp}

Unlike a bank, the Bitcoin protocol does not have a central server to which users send their payments. It is purely peer-to-peer. When a transaction is broadcast, it is sent from a node to its peers, who, in turn, pass it on to their peers.

Nodes run a series of checks to ensure that the transaction is valid: verifying that signatures are correct, that outputs do not exceed inputs, and that the funds have not already been spent. The class that performs these tasks is validation.cpp:MemPoolAccept. Note that validation happens in the previously discussed validation.{h,cpp} region before the transaction is handed to the txmempool.{h,cpp} region.

When a new transaction arrives (a tx message), it is deserialized (vRecv >> ptx) into the CTransaction& tx variable. The node then checks whether it already has this transaction (if (AlreadyHaveTx(...))). If not, MempoolAcceptResult validation.cpp:AcceptToMemoryPool(...) is called to validate the transaction against multiple rules.

void PeerManagerImpl::ProcessMessage(...)
{
    if (msg_type == NetMsgType::TX) {
        // ...
        CTransactionRef ptx;
        vRecv >> ptx;
        const CTransaction& tx = *ptx;

        const uint256& txid = ptx->GetHash();
        const uint256& wtxid = ptx->GetWitnessHash();
        // ...
        if (AlreadyHaveTx(GenTxid(/* is_wtxid=*/true, wtxid))) {
            // ...
        }
        const MempoolAcceptResult result = AcceptToMemoryPool(m_chainman.ActiveChainstate(), m_mempool, ptx, false /* bypass_limits */);
        // ...
    }
}

It is interesting to note that a member variable called m_mempool is passed as a parameter to AcceptToMemoryPool(…​). This variable (of type CTxMemPool) allows the net_processing.{h,cpp} region to communicate with the txmempool.{h,cpp} region.

AcceptToMemoryPoolWithTime(…​) tries to add the transaction and the current time (nAcceptTime parameter) to the memory pool.
The first line creates a vector (coins_to_uncache) that will be used to remove coins that were not previously present in the coins cache but were added to assist in validating the transaction.
Then, a set of parameters that is useful for validation is created. This set of parameters includes the transaction to be added and the coins_to_uncache vector.
Next, a MemPoolAccept object is instantiated, receiving the mempool and the active chainstate as arguments. This object calls AcceptSingleTransaction(…​) to process the transaction acceptance.

static MempoolAcceptResult AcceptToMemoryPoolWithTime(...) EXCLUSIVE_LOCKS_REQUIRED(cs_main)
{
    std::vector<COutPoint> coins_to_uncache;
    MemPoolAccept::ATMPArgs args { chainparams, nAcceptTime, bypass_limits, coins_to_uncache, test_accept };

    assert(std::addressof(::ChainstateActive()) == std::addressof(active_chainstate));
    const MempoolAcceptResult result = MemPoolAccept(pool, active_chainstate).AcceptSingleTransaction(tx, args);
    if (result.m_result_type != MempoolAcceptResult::ResultType::VALID) {
        for (const COutPoint& hashTx : coins_to_uncache)
            active_chainstate.CoinsTip().Uncache(hashTx);
    }
    BlockValidationState state_dummy;
    active_chainstate.FlushStateToDisk(chainparams, state_dummy, FlushStateMode::PERIODIC);
    return result;
}

The validation.cpp:MemPoolAccept class manages all validation steps and, if the transaction passes all of the checks, adds it to the mempool. This is done through five functions:

  • bool PreChecks(…​): Runs the policy checks on a given transaction, excluding any script checks. Looks up inputs, calculates the feerate, considers replacement, evaluates package limits, etc. All checks done here are computationally cheap, to avoid CPU Denial of Service (DoS).

  • bool PolicyScriptChecks(…​): Runs the script checks using the policy flags. As this can be slow, we should only invoke this on transactions that have already passed the policy checks performed by the previously mentioned PreChecks(…​) function.

  • bool ConsensusScriptChecks(…​): Re-runs the script checks, using consensus flags, and tries to cache the result in the scriptcache. This should be done after PolicyScriptChecks(…​). This requires that all the inputs are in the UTXO set or the mempool.

  • bool Finalize(…​): Tries to add the transaction to the mempool, removing any conflicts first. Returns true if the transaction is in the mempool after any size limiting is performed. Otherwise, it returns false.

  • bool CheckFeeRate(…​): Checks that the transaction is not below the minimum fee rate allowed.

By knowing the purpose of each function, it is easier to understand the MemPoolAccept::AcceptSingleTransaction(…​) code. It calls the validation methods in increasing order of CPU effort, starting with PreChecks(args, ws). This way, if validation fails, it fails as cheaply as possible, without overloading the CPU.

In the last method, MemPoolAccept::Finalize(…​), if everything has been successfully validated, the command m_pool.addUnchecked(…​) is called to add the transaction to the mempool.

MempoolAcceptResult MemPoolAccept::AcceptSingleTransaction(const CTransactionRef& ptx, ATMPArgs& args)
{
    // ...

    Workspace ws(ptx);

    if (!PreChecks(args, ws)) return MempoolAcceptResult(ws.m_state);

    PrecomputedTransactionData txdata;

    if (!PolicyScriptChecks(args, ws, txdata)) return MempoolAcceptResult(ws.m_state);

    if (!ConsensusScriptChecks(args, ws, txdata)) return MempoolAcceptResult(ws.m_state);

    // Tx was accepted, but not added
    if (args.m_test_accept) {
        return MempoolAcceptResult(std::move(ws.m_replaced_transactions), ws.m_base_fees);
    }

    if (!Finalize(args, ws)) return MempoolAcceptResult(ws.m_state);

    GetMainSignals().TransactionAddedToMempool(ptx, m_pool.GetAndIncrementSequence());

    return MempoolAcceptResult(std::move(ws.m_replaced_transactions), ws.m_base_fees);
}

Note that there is also a Workspace ws variable. It represents all the intermediate states that get passed between the various levels of checking a given transaction. But more importantly, it has a std::unique_ptr<CTxMemPoolEntry> m_entry member variable, which represents the new entry that will be added to the mempool if the transaction is completely valid.

CTxMemPoolEntry represents not only the transaction that is in the mempool, but also stores data about it, like the fee, the weight, the memory usage, the local time when it entered the mempool, and others.

There are also two important fields: Parents m_parents and Children m_children. A child transaction is one that spends one or more of the UTXOs of another transaction, called the parent transaction. More generally, a descendant transaction spends a UTXO of a transaction that itself descends from one or more earlier transactions, called ancestor transactions.

Typically, when a new transaction is added to the mempool, it has no in-mempool children, but it can have parents. If such a child already existed, it would have been spending a UTXO that did not yet exist and would therefore have been invalid.

class CTxMemPoolEntry
{
public:
    typedef std::set<CTxMemPoolEntryRef, CompareIteratorByHash> Parents;
    typedef std::set<CTxMemPoolEntryRef, CompareIteratorByHash> Children;

private:
    const CTransactionRef tx;
    mutable Parents m_parents;
    mutable Children m_children;
    const CAmount nFee;
    const size_t nTxWeight;
    const size_t nUsageSize;
    const int64_t nTime;
    // ...
}
// ...
class CTxMemPool
{
    // ...
public:
    typedef boost::multi_index_container<
        CTxMemPoolEntry,
        boost::multi_index::indexed_by<
            // sorted by txid
            boost::multi_index::hashed_unique<mempoolentry_txid, SaltedTxidHasher>,
            // sorted by wtxid
            boost::multi_index::hashed_unique<
                boost::multi_index::tag<index_by_wtxid>,
                mempoolentry_wtxid,
                SaltedTxidHasher
            >,
            // ...
        >
    > indexed_transaction_set;
    // ...
    indexed_transaction_set mapTx GUARDED_BY(cs);

    using txiter = indexed_transaction_set::nth_index<0>::type::const_iterator;
    std::vector<std::pair<uint256, txiter>> vTxHashes GUARDED_BY(cs);
    // ...
}

CTxMemPool::mapTx is a container of CTxMemPoolEntry objects that represents the mempool. It uses boost::multi_index, which sorts the mempool on 5 criteria:

  • transaction hash (txid)

  • witness-transaction hash (wtxid)

  • descendant feerate

  • time in mempool

  • ancestor feerate

For Mempool operations to be executed (such as removing a transaction or updating its descendants), several attributes of the transactions need to be indexed and directly accessed. A common solution for this case would be to store the transactions in multiple data structures, but boost::multi_index is a container that offers a customizable interface and allows the same elements to be accessed in different ways.

Instead of having to store the CTxMemPoolEntry elements in a vector or a set, and then synchronizing them continuously, the boost::multi_index container can be used since it provides a unique interface with one or more indexes with different sorting and access semantics.

This way, new transactions (i.e., CTxMemPoolEntry objects) can be added simply by calling mapTx.insert(entry) in CTxMemPool::addUnchecked(…​) and can be accessed according to any of the 5 criteria mentioned above.

Likewise, transactions that already exist in the mempool can be removed by calling mapTx.erase(it) in CTxMemPool::removeUnchecked(…​). This usually happens when a transaction is included in a block, expires (as specified by -mempoolexpiry), conflicts with another transaction, or when the memory pool size limit (set by -maxmempool) has been reached. The lowest-fee transactions are removed first.

The eviction logic for removing transactions due to size limit can be found in the void CTxMemPool::TrimToSize(…​) method.

Another relevant class in this region is CCoinsViewMemPool, which provides access to all coins that are either unspent in the UTXO set or are outputs of any mempool transaction. Thus, all the inputs of a transaction can be checked before inserting it into the mempool, even if those inputs are not in the coins cache. It also allows signing a double-spend transaction directly in signrawtransactionwithkey and signrawtransactionwithwallet, as long as the conflicting transaction has not yet been confirmed.

coins.{h,cpp} & txdb.{h,cpp}

Both CCoinsViewMemPool and CCoinsViewCache are classes derived from CCoinsViewBacked, which basically functions as a common interface for these two subclasses.

CCoinsViewMemPool has already been explained in the previous section. CCoinsViewCache represents a cache of some coins available in UTXO Set, and it keeps as many coins in the memory as can fit according to the -dbcache setting. Using the cache reduces the frequency of expensive read operations from the chainstate/* LevelDB database, in which the most recent UTXO set is stored.

Access to chainstate/* database is managed by CCoinsViewDB class through the std::unique_ptr<CDBWrapper> m_db property.
CDBWrapper, as the name implies, is a wrapper for common database operations, such as Write(), Read(), Erase(), Exists() and WriteBatch(). All unspent coins reside in the chainstate database.

CCoinsViewDB and CCoinsViewBacked are classes derived from the CCoinsView, which is an abstract class that defines the methods to be used to access both the database and the cache.

To access the coins.{h,cpp} & txdb.{h,cpp} region and manage the UTXO set, CChainState has the member field CoinsViews m_coins_views.
CoinsViews is a convenience class for constructing the CCoinsView hierarchy and is used to facilitate access to the UTXO set. This class consists of an arrangement of layered CCoinsView objects. It prefers to store and retrieve coins in memory via m_cacheview but ultimately falls back on disk, m_dbview.

class CoinsViews {

public:
    CCoinsViewDB m_dbview GUARDED_BY(cs_main);

    CCoinsViewErrorCatcher m_catcherview GUARDED_BY(cs_main);

    std::unique_ptr<CCoinsViewCache> m_cacheview GUARDED_BY(cs_main);

    CoinsViews(std::string ldb_name, size_t cache_size_bytes, bool in_memory, bool should_wipe);

    void InitCache() EXCLUSIVE_LOCKS_REQUIRED(::cs_main);
};

The diagram below shows the CoinsViews classes.

ccviews
Figure 3. CoinsViews Classes

dbwrapper.{h,cpp} & indexes/

The CDBWrapper is a class that manages the access and the operations for the LevelDB database. It has already been presented in the previous section.
The section on validation.{h,cpp} also mentioned the obfuscation mechanism used by this class to avoid spurious detection by anti-virus software.

It was also previously stated that CDBWrapper is used with CCoinsViewDB to manage the UTXO Set database and with CBlockTreeDB to manage metadata about all known blocks. Two classes, however, have not yet been mentioned: BlockFilterIndex and TxIndex. Both are derived from BaseIndex.

BlockFilterIndex was introduced in PR #14121 to implement a new index, which stores the compact block filters for blocks that have been validated.
This is part of BIP 157, which defines a light client protocol based on deterministic filters of block content. The filters are designed to minimize the expected bandwidth consumed by light clients downloading filters and full blocks.

The filter construction proposed is an alternative to Bloom filters, used in BIP 37, which have known flaws that weaken security and privacy. BIP 157 can be seen as the opposite of BIP 37: instead of the client sending a filter to a full node peer, full nodes generate deterministic filters on block data that are served to the client. A light client can then download an entire block if the filter matches the data it is looking for. As filters are deterministic, they only need to be constructed once and stored on the disk whenever a new block is connected to the chain.

Note that BaseIndex implements the CValidationInterface, so that it can listen to BlockConnected(…​) events. When a new block is connected, BaseIndex calls the virtual method WriteBlock(…​), which should be implemented by the derived class. Therefore, BlockFilterIndex only needs to implement this method to write the new block's filter to the database.

void BaseIndex::BlockConnected(...)
{
    // ...
    if (WriteBlock(*block, pindex)) {
        m_best_block_index = pindex;
    } else {
        FatalError("%s: Failed to write block %s to index",
                   __func__, pindex->GetBlockHash().ToString());
        return;
    }
    // ...
}

bool BlockFilterIndex::WriteBlock(...)
{
    // ...
    BlockFilter filter(m_filter_type, block, block_undo);

    size_t bytes_written = WriteFilterToDisk(m_next_filter_pos, filter);
    if (bytes_written == 0) return false;

    std::pair<uint256, DBVal> value;
    value.first = pindex->GetBlockHash();
    value.second.hash = filter.GetHash();
    value.second.header = filter.ComputeHeader(prev_header);
    value.second.pos = m_next_filter_pos;

    if (!m_db->Write(DBHeightKey(pindex->nHeight), value)) {
        return false;
    }
    // ...
}

To enable compact block filters, the node should be started with -blockfilterindex=1. The block filter database is located in indexes/blockfilter/.

TxIndex class was introduced in PR #13033, which refactored the transaction index code. Like BlockFilterIndex, this class builds the transaction index, listens to the CValidationInterface events, and overrides the WriteBlock(…​) method.
TxIndex looks up transactions included in the blockchain by hash. The index is written to a LevelDB database and records the filesystem location of each transaction by transaction hash. The txindex database is located in indexes/txindex/.

By default, Bitcoin Core doesn’t maintain any transaction-level data, except for those in the mempool or pertinent to the user’s wallet addresses. But if the node is started with the -txindex=1 argument, Bitcoin Core will build and maintain an index of all transactions that have ever happened. Block explorers require -txindex=1.

The -txindex option is incompatible with prune mode (-prune). -blockfilterindex used to be incompatible as well, but this changed with PR #15946, which allows maintaining the block filter index when pruning.

All the index files are located in the src/index/ folder.

dbwrapper
Figure 4. DB Wrapper Classes

script/

The script.{h,cpp} file originally concentrated all the functionality related to creating and executing scripts on Bitcoin Core. But, in PR #5093, it was split into standard.{h,cpp} (commit c4408a), sign.{h,cpp} (commit e088d6) and interpreter.{h,cpp} (commit da03e6).

The src/script/script.{h,cpp} file has the opcodes and the CScript class.
CScript was initially a class derived from std::vector<unsigned char>, but PR #6914 changed it to derive from prevector<28, unsigned char>, reducing memory consumption.

CScript basically represents a sequence of opcodes, each stored as a byte. There is also a CScriptNum class to handle the result of numeric operations between numeric opcodes. Although operands are restricted to 4-byte integers, the results may overflow. CScriptNum enforces these semantics by storing results as an int64.

The src/script/standard.{h,cpp} defines the common Bitcoin script templates (PKHash, ScriptHash, WitnessV0ScriptHash and WitnessV0KeyHash). There is also CNoDestination and WitnessUnknown for unknown or incorrect patterns.

The method src/script/standard.cpp:GetScriptForDestination(…​) is used when creating a new transaction to decode the recipient’s address and return the corresponding script. It is also used by the descriptor wallet for fetching new addresses through the DescriptorScriptPubKeyMan::TopUp(..) and std::vector<CScript> MakeScripts(…​) methods.

using CTxDestination = std::variant<CNoDestination, PKHash, ScriptHash, WitnessV0ScriptHash, WitnessV0KeyHash, WitnessUnknown>;
// ...
class CScriptVisitor
{
public:
    //...
    CScript operator()(const PKHash& keyID) const
    {
        return CScript() << OP_DUP << OP_HASH160 << ToByteVector(keyID) << OP_EQUALVERIFY << OP_CHECKSIG;
    }

    CScript operator()(const ScriptHash& scriptID) const
    {
        return CScript() << OP_HASH160 << ToByteVector(scriptID) << OP_EQUAL;
    }

    CScript operator()(const WitnessV0KeyHash& id) const
    {
        return CScript() << OP_0 << ToByteVector(id);
    }
    // ...
};

CScript GetScriptForDestination(const CTxDestination& dest)
{
    return std::visit(CScriptVisitor(), dest);
}
//...

The src/script/sign.{h,cpp} handles transaction signing. sign.cpp:SignTransaction(…) is used by legacy and descriptor wallets to sign transactions.

The src/script/interpreter.cpp:EvalScript(…​) receives CScript& script as a parameter, reads each opcode and processes them. This function is used in src/script/interpreter.cpp:VerifyScript(…​), which is called by src/validation.cpp:CheckInputScripts(…​), which validates the scripts of each input of a transaction. It is called every time a new transaction or new block is announced.

CheckInputScripts(…​), after validating all the provided scripts, stores an entry for the transaction in a cache called g_scriptExecutionCache. Note that the cache entry is keyed only by the script execution flags and the transaction witness hash. Therefore, if the node sees the transaction again, it will avoid a costly script verification.
g_scriptExecutionCache is initialized in validation.cpp:InitScriptExecutionCache() and has its size defined by the -maxsigcachesize argument. If the node is started without this argument, the DEFAULT_MAX_SIG_CACHE_SIZE (32 MB) will be used.
This functionality was introduced in PR #10192.

bool CheckInputScripts(...)
{
    // ...
    uint256 hashCacheEntry;
    CSHA256 hasher = g_scriptExecutionCacheHasher;
    hasher.Write(tx.GetWitnessHash().begin(), 32).Write((unsigned char*)&flags, sizeof(flags)).Finalize(hashCacheEntry.begin());
    AssertLockHeld(cs_main); //TODO: Remove this requirement by making CuckooCache not require external locks
    if (g_scriptExecutionCache.contains(hashCacheEntry, !cacheFullScriptStore)) {
        return true;
    }
    // ...
    if (cacheFullScriptStore && !pvChecks) {
        g_scriptExecutionCache.insert(hashCacheEntry);
    }

    return true;
}

There is another cache, CSignatureCache, which stores valid signatures to avoid doing expensive ECDSA signature checking twice for every transaction (once when it is accepted into the memory pool, and again when it is accepted into the blockchain).
The ECDSA signature cache was introduced in PR #1349 but has changed significantly since then. In PR #4890 it was moved to the src/script/sigcache.cpp file, and in PR #8895 a new cache mechanism called CuckooCache was adopted, replacing the previous data structure, boost::unordered_set.

This cache is initialized in sigcache.cpp:InitSignatureCache(). It also uses DEFAULT_MAX_SIG_CACHE_SIZE or -maxsigcachesize as a reference for the cache size.
When EvalScript(…​) is processing OP_CHECKSIG, OP_CHECKSIGVERIFY, OP_CHECKMULTISIG or OP_CHECKMULTISIGVERIFY opcodes, it calls src/script/sigcache.cpp:CachingTransactionSignatureChecker::VerifyECDSASignature(..) to do the check. If the signature exists in the cache (signatureCache.Get(entry, !store)), it will return true. Otherwise, the ECDSA signature will be verified and, if valid, it will be stored in the cache (signatureCache.Set(entry)).

bool CachingTransactionSignatureChecker::VerifyECDSASignature(...) const
{
    uint256 entry;
    signatureCache.ComputeEntryECDSA(entry, sighash, vchSig, pubkey);
    if (signatureCache.Get(entry, !store))
        return true;
    if (!TransactionSignatureChecker::VerifyECDSASignature(vchSig, pubkey, sighash))
        return false;
    if (store)
        signatureCache.Set(entry);
    return true;
}

consensus/

This region contains procedures for consensus-critical actions, such as computing the Merkle tree, the maximum allowed size for a serialized block, the maximum allowed weight for a block (BIP 141), coinbase maturity, and so on.

// ...
static const unsigned int MAX_BLOCK_SERIALIZED_SIZE = 4000000;
static const unsigned int MAX_BLOCK_WEIGHT = 4000000;
static const int64_t MAX_BLOCK_SIGOPS_COST = 80000;
static const int COINBASE_MATURITY = 100;
static const int WITNESS_SCALE_FACTOR = 4;
// ...

These constants are defined in the src/consensus/consensus.h file and are used at various points in the application. For instance, COINBASE_MATURITY is used when validating a new transaction or connecting a new block. If an input is a coinbase output and it is not mature enough (fewer than 100 blocks deep), validation will fail.

This validation, in particular, is done in another file in the consensus/ region, called tx_verify.h, which contains a set of functions to check whether a transaction follows the consensus rules.
The Consensus::CheckTxInputs(…​) function, for example, checks that the transaction inputs are available, that there are no negative input values, and that the transaction fee does not exceed the MAX_MONEY constant (21 million BTC).

To keep track of the validation results in various parts of the application, there are the TxValidationState and BlockValidationState classes. These states are used in the PeerManagerImpl::MaybePunishNodeForTx(…​) and MaybePunishNodeForBlock(…​) methods to penalize peers who have sent blocks or transactions that do not comply with the consensus rules.

bool PeerManagerImpl::MaybePunishNodeForTx(...)
{
    switch (state.GetResult()) {
    // ...
    // The node is providing invalid data:
    case TxValidationResult::TX_CONSENSUS:
        Misbehaving(nodeid, 100, message);
        return true;
    // ...
    }
}

bool PeerManagerImpl::MaybePunishNodeForBlock(...)
{
    switch (state.GetResult()) {
    // ...
    // The node is providing invalid data:
    case BlockValidationResult::BLOCK_CONSENSUS:
    case BlockValidationResult::BLOCK_MUTATED:
        if (!via_compact_block) {
            Misbehaving(nodeid, 100, message);
            return true;
        }
        break;
    // ...
    }
}

The Merkle tree is another important component of the consensus rules. There are two important functions, BlockMerkleRoot(…​) and BlockWitnessMerkleRoot(…​); they are used when checking an incoming block in validation.cpp:CheckBlock(…) and when generating a new block in src/rpc/mining.cpp:generateblock().

The file src/consensus/params.h defines important chain validation parameters such as block heights in which critical consensus rules have been implemented (like BIP 16, BIP 34, Segwit and others).

struct Params {
    uint256 hashGenesisBlock;
    int nSubsidyHalvingInterval;
    uint256 BIP16Exception;
    int BIP34Height;
    uint256 BIP34Hash;
    int BIP65Height;
    int BIP66Height;
    int CSVHeight;
    int SegwitHeight;
    // ...
}

These parameters are set in src/chainparams.cpp for each chain (mainnet, testnet, regtest or signet) and they are used when validating new blocks or new transactions.

class CMainParams : public CChainParams {
public:
    CMainParams() {
        strNetworkID = CBaseChainParams::MAIN;
        //...
        consensus.BIP16Exception = uint256S("0x00000000000002dc756eebf4f49723ed8d30cc28a5f108eb94b1ba88ac4f9c22");
        consensus.BIP34Height = 227931;
        consensus.BIP34Hash = uint256S("0x000000000000024b89b42a942fe0d9fea3bb44ab7bd1b19115dd6a759c0808b8");
        consensus.BIP65Height = 388381;
        consensus.BIP66Height = 363725;
        consensus.CSVHeight = 419328;
        consensus.SegwitHeight = 481824;
        // ...
    }
    // ...
}

policy/

This region contains logic for making various assessments about transactions and for doing fee estimation.

Methods for transactions such as IsStandardTx(…​), GetVirtualTransactionSize(…​) or IsRBFOptIn(…​) can be found in rbf.h, policy.h and settings.h. In these files, there are also some constants related to transactions, like MAX_STANDARD_TX_WEIGHT or MIN_STANDARD_TX_NONWITNESS_SIZE. These methods and constants are usually used in validation.cpp for validation purposes, in methods like PreChecks(…​), which verifies a new transaction before inserting it into the mempool.

bool MemPoolAccept::PreChecks(ATMPArgs& args, Workspace& ws)
{
    if (fRequireStandard && !IsStandardTx(tx, reason))
        return state.Invalid(TxValidationResult::TX_NOT_STANDARD, reason);
    // ...
    if (::GetSerializeSize(tx, PROTOCOL_VERSION | SERIALIZE_TRANSACTION_NO_WITNESS) < MIN_STANDARD_TX_NONWITNESS_SIZE)
        return state.Invalid(TxValidationResult::TX_NOT_STANDARD, "tx-size-small");
    // ...
}

The CBlockPolicyEstimator class is used for estimating the fee rate needed for a transaction to be included in a block within a certain number of blocks. When a transaction is accepted to the mempool, the method CBlockPolicyEstimator::processTransaction(…​) is called to consider the new transaction in the fee calculations. This method stores the fee and the size of the newly added transaction in a CFeeRate object.

The CFeeRate class is used to represent the fee rate in satoshis per kilobyte (CAmount / kB). The fee rate and the current block height are then added to the fee statistics (represented by the TxConfirmStats class, which tracks historical data on transaction confirmations).

void CBlockPolicyEstimator::processTransaction(...)
{
    // ...
    CFeeRate feeRate(entry.GetFee(), entry.GetTxSize());

    mapMemPoolTxs[hash].blockHeight = txHeight;
    unsigned int bucketIndex = feeStats->NewTx(txHeight, (double)feeRate.GetFeePerK());
    // ...
}

The method used by the wallet and by the RPC to get a fee estimate is CBlockPolicyEstimator::estimateSmartFee(…​).

interface/

An attentive reader may have noticed in the previous section that the wallet does not access CBlockPolicyEstimator::estimateSmartFee(…​) directly, but through the property interfaces::Chain* m_chain. A look at the interface’s code shows that it accesses the fee estimator through a NodeContext, which, as mentioned in the net.{h,cpp} section, is a struct that provides a single point of access to the chain state and the connection state.

But what would be the problem if the wallet accessed the fee estimator directly? As mentioned in the [executables] section, the node and the wallet are completely different concepts, although they can be implemented together in the same software.

The fee estimator is a node function. The node needs to access the mempool and the history blocks to calculate the best fee rate. In a good separation of concerns, the wallet should never access any node function directly.

In general, software with tightly coupled components is difficult to maintain and understand. This happens in Bitcoin Core, especially in the older, legacy code.
An example is the monolithic architecture: bitcoind runs P2P code, validation code, and wallet code.
bitcoin-qt runs everything the daemon runs plus the GUI code. Therefore, only one of them can be run at a time.

A better approach would be a multiprocess architecture with three executables: bitcoin-node, which would run only the node and validation code; bitcoin-wallet, which would run only the wallet code; and bitcoin-gui, which would run only the GUI code. The processes could then communicate with each other and be started and stopped independently. That is exactly the proposal of the Process Separation project.

There is a page in the bitcoin-devwiki dedicated to this project, with a more detailed description and links to presentations and answers about the project.

PR #15288 (which is part of this project) removes all calls from the wallet to the global node. So instead of accessing the NodeContext directly, CWallet calls the intermediate Chain interface to access the chain state. To do this, CWallet has an interfaces::Chain* m_chain member variable.

// src/wallet/wallets.h
class CWallet final : public WalletStorage, public interfaces::Chain::Notifications
{
    interfaces::Chain* m_chain;
    // ...
    bool HaveChain() const { return m_chain ? true : false; }
    // ...
    interfaces::Chain& chain() const { assert(m_chain); return *m_chain; }
    // ...
}

// src/wallet/fees.cpp
CFeeRate GetMinimumFeeRate(...)
{
    CFeeRate feerate_needed;
    // ...
    feerate_needed = wallet.chain().estimateSmartFee(target, conservative_estimate, feeCalc);
    // ...
}

At the time of this writing, the following other interfaces are defined in src/interfaces/:

  • Chain — used by the wallet to access blockchain and mempool state. Added in #14437, #14711, #15288, and #10973.

  • ChainClient — used by the node to start and stop Chain clients. Added in #14437.

  • Node — used by the GUI to start and stop bitcoin node. Added in #10244.

  • Wallet — used by the GUI to access wallets. Added in #10244.

  • Handler — returned by handle[Event] methods on interfaces above and used to manage the lifetime of event handlers.

qt/

This region contains all the code for the graphical user interface.

The entry point for starting Bitcoin Core in graphical mode is src/qt/bitcoin.cpp.

There are two main classes there: BitcoinCore and BitcoinApplication. The first one encapsulates the startup and the shutdown logic and also allows running startup and shutdown in a different thread from the UI thread. BitcoinApplication extends QApplication and is the main Bitcoin Core application object.

In int GuiMain(int argc, char* argv[]), the node interface (interfaces::Node) is created, and the splash screen is launched. This interface was briefly mentioned in the last section. It acts as a bridge between the GUI wallet and the node. This way, the wallet can obtain any information about the node (UTXO Set, mempool, etc …​) without directly accessing it. This interface also provides methods for starting or shutting down the node. This approach provides better modularization and isolation between components.

// src/qt/bitcoin.cpp
// ...
void BitcoinCore::initialize()
{
    // ...
    bool rv = m_node.appInitMain(&tip_info);
    // ...
}
// ...
void BitcoinApplication::setNode(interfaces::Node& node)
{
    assert(!m_node);
    m_node = &node;
    if (optionsModel) optionsModel->setNode(*m_node);
    if (m_splash) m_splash->setNode(*m_node);
}
//...
int GuiMain(int argc, char* argv[])
{
    // ...
    util::ThreadSetInternalName("main");

    NodeContext node_context;
    std::unique_ptr<interfaces::Node> node = interfaces::MakeNode(&node_context);
    // ...
    if (gArgs.GetBoolArg("-splash", DEFAULT_SPLASHSCREEN) && !gArgs.GetBoolArg("-min", false))
        app.createSplashScreen(networkStyle.data());

    app.setNode(*node);
    // ...
}

The m_node.appInitMain(&tip_info) starts Bitcoin Core using init.cpp:AppInitMain(…​), just like when running the daemon.

The UI files are located in the qt/forms folder, and the translation files are in the qt/locale folder.

rpc/

Remote Procedure Call (RPC) allows a program to request a service from a program located on another computer on a network without having to understand the network’s details. Bitcoin Core’s JSON-RPC server allows the node to be accessed and operated remotely. It is often used by indexers and by client wallets that connect to a node.

Like almost all Bitcoin Core features, the RPC server is started at init.cpp:AppInitMain(…​) by calling AppInitServers(…​). Making the server available requires the -server argument, but note that bitcoind sets this argument as true by default. The default port for the server is 8332.

// src/init.cpp
static bool AppInitServers(NodeContext& node)
{
    const ArgsManager& args = *Assert(node.args);
    RPCServer::OnStarted(&OnRPCStarted);
    RPCServer::OnStopped(&OnRPCStopped);
    //..
    StartRPC();
    node.rpc_interruption_point = RpcInterruptionPoint;
    if (!StartHTTPRPC(&node))
        return false;
    //...
}

bool AppInitMain(...)
{
    // ...
    if (args.GetBoolArg("-server", false)) {
        uiInterface.InitMessage_connect(SetRPCWarmupStatus);
        if (!AppInitServers(node))
            return InitError(_("Unable to start HTTP server. See debug log for details."));
    }
    // ...
}
// src/bitcoind.cpp
static bool AppInit(int argc, char* argv[])
{
    // ...
    // -server defaults to true for bitcoind but not for the GUI so do this here
    args.SoftSetBoolArg("-server", true);
    // ...
}

The files that implement the RPC commands are located in src/rpc (except src/wallet/rpcwallet.cpp). These functions return an RPCHelpMan object, which contains not only the result but also the help message, the function name, the supported arguments, and examples for the command.
The result of an RPC command must be of the UniValue ("universal value") type, a class with JSON encoding and decoding.
UniValue is an abstract data type that may be a null, boolean, string, number, array container, or a key/value dictionary container, nested to an arbitrary depth. An example is shown below:

// src/rpc/blockchain.cpp
static RPCHelpMan getblockcount()
{
    return RPCHelpMan{"getblockcount",
                "\nReturns the height of the most-work fully-validated chain.\n"
                "The genesis block has height 0.\n",
                {},
                RPCResult{
                    RPCResult::Type::NUM, "", "The current block count"},
                RPCExamples{
                    HelpExampleCli("getblockcount", "")
            + HelpExampleRpc("getblockcount", "")
                },
        [&](const RPCHelpMan& self, const JSONRPCRequest& request) -> UniValue
{
    LOCK(cs_main);
    return ::ChainActive().Height();
},
    };
}
//...
void RegisterBlockchainRPCCommands(CRPCTable &t)
{
// clang-format off
static const CRPCCommand commands[] =
{ //  category              actor (function)
  //  --------------------- ------------------------
    { "blockchain",         &getblockchaininfo,                  },
    { "blockchain",         &getchaintxstats,                    },
    { "blockchain",         &getblockstats,                      },
    { "blockchain",         &getbestblockhash,                   },
    { "blockchain",         &getblockcount,                      },
    // ...
}
//..
}
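The UniValue results returned by commands like getblockcount above can hold arbitrarily nested data. The sketch below is a minimal stand-in type illustrating that idea — it is not the real UniValue API (which lives in src/univalue), just a simplified illustration of a value that can be null, boolean, number, string, array, or object, serialized roughly the way UniValue::write(…) would:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for the idea behind UniValue. Illustration only;
// the real UniValue class has a different API.
struct JsonValue {
    enum Kind { NUL, BOOL, NUM, STR, ARR, OBJ } kind = NUL;
    bool boolean = false;
    long long num = 0;
    std::string str;
    std::vector<JsonValue> arr;    // ARR: elements
    std::vector<std::string> keys; // OBJ: keys, parallel to vals
    std::vector<JsonValue> vals;   // OBJ: values
};

// Serialize to JSON text, roughly what UniValue::write(...) does.
std::string Write(const JsonValue& v)
{
    switch (v.kind) {
    case JsonValue::NUL:  return "null";
    case JsonValue::BOOL: return v.boolean ? "true" : "false";
    case JsonValue::NUM:  return std::to_string(v.num);
    case JsonValue::STR:  return '"' + v.str + '"';
    case JsonValue::ARR: {
        std::string out = "[";
        for (size_t i = 0; i < v.arr.size(); ++i) {
            if (i) out += ',';
            out += Write(v.arr[i]);
        }
        return out + "]";
    }
    case JsonValue::OBJ: {
        std::string out = "{";
        for (size_t i = 0; i < v.keys.size(); ++i) {
            if (i) out += ',';
            out += '"' + v.keys[i] + "\":" + Write(v.vals[i]);
        }
        return out + "}";
    }
    }
    return "null"; // unreachable
}
```

A reply like {"result":42,"error":null} is then just an OBJ whose values are a NUM and a NUL.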

To register the RPC functions, each file ends with a registration function such as RegisterBlockchainRPCCommands(…), RegisterNetRPCCommands(…) or RegisterMiningRPCCommands(…). This pattern applies to all RPC files.

The commands are stored in a global variable named tableRPC, of type CRPCTable. CRPCTable::execute(…) executes the command sent via RPC and returns the UniValue result to the server.
UniValue::write(…) then converts the result to a string to respond to the request.

// src/httprpc.cpp
static bool HTTPReq_JSONRPC(const std::any& context, HTTPRequest* req)
{
    // ...
        } else if (valRequest.isObject()) {
            //....
            UniValue result = tableRPC.execute(jreq);

            // Send reply
            strReply = JSONRPCReply(result, NullUniValue, jreq.id);
        }
    // ...
}
// src/rpc/server.cpp
UniValue CRPCTable::execute(const JSONRPCRequest &request) const
{
    //...
    auto it = mapCommands.find(request.strMethod);
    if (it != mapCommands.end()) {
        UniValue result;
        if (ExecuteCommands(it->second, request, result)) {
            return result;
        }
    }
    throw JSONRPCError(RPC_METHOD_NOT_FOUND, "Method not found");
}
// src/rpc/register.h
static inline void RegisterAllCoreRPCCommands(CRPCTable &t)
{
    RegisterBlockchainRPCCommands(t);
    RegisterNetRPCCommands(t);
    RegisterMiscRPCCommands(t);
    RegisterMiningRPCCommands(t);
    RegisterRawTransactionRPCCommands(t);
}

The simplest way to make requests to the RPC server is through bitcoin-cli, but requests can also be made using cURL or any programming language, such as Python, Java, Go, or C#.
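As a sketch, the body that a JSON-RPC client POSTs to the server for a parameterless command could be assembled as below. MakeRequestBody is a hypothetical helper written for illustration, not part of Bitcoin Core:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: assembles the JSON-RPC request body that a client
// such as bitcoin-cli or curl sends to the server (default port 8332).
// Only parameterless commands are covered in this sketch.
std::string MakeRequestBody(const std::string& method, const std::string& id)
{
    return "{\"jsonrpc\":\"1.0\",\"id\":\"" + id +
           "\",\"method\":\"" + method + "\",\"params\":[]}";
}
```

With curl, such a body would be POSTed to http://127.0.0.1:8332/ using HTTP basic authentication (the -rpcuser/-rpcpassword arguments or the cookie file).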

wallet/

Bitcoin wallets have three main functions: key management, persistence, and transaction construction.

Regarding key management, Bitcoin Core v0.21 introduced a new type of wallet - Descriptor Wallets - which stores scriptPubKey information using output descriptors. This contrasts with the Legacy Wallet structure, where keys are used to implicitly generate scriptPubKeys and addresses. Descriptor Wallets were added in PR #16528.

When the wallet is created, one of two methods can be called to set its seed: LegacyScriptPubKeyMan::SetupGeneration(…) if the wallet is of the legacy type, or CWallet::SetupDescriptorScriptPubKeyMans(…) if it is of the descriptor type.

The CExtKey master_key is used to build the descriptor for each output type (LEGACY, P2SH_SEGWIT and BECH32) in the method DescriptorScriptPubKeyMan::SetupDescriptorGeneration(…). This is done for both internal and external derivation paths.
Finally, the DescriptorScriptPubKeyMan::TopUp(…) method is called to generate the wallet’s addresses. The number of addresses that will be generated and stored is defined by the -keypool argument or by the default value DEFAULT_KEYPOOL_SIZE (1000).

// src/wallet/wallet.cpp
void CWallet::SetupDescriptorScriptPubKeyMans()
{
    if (!IsWalletFlagSet(WALLET_FLAG_EXTERNAL_SIGNER)) {
        CKey seed_key;
        seed_key.MakeNewKey(true);
        CPubKey seed = seed_key.GetPubKey();
        assert(seed_key.VerifyPubKey(seed));

        // Get the extended key
        CExtKey master_key;
        master_key.SetSeed(seed_key.begin(), seed_key.size());

        for (bool internal : {false, true}) {
            for (OutputType t : OUTPUT_TYPES) {
                // ...
                spk_manager->SetupDescriptorGeneration(master_key, t);
                // ...
            }
        }
        // ...
    }
}

// src/wallet/scriptpubkeyman.cpp
bool DescriptorScriptPubKeyMan::SetupDescriptorGeneration()
{
    // ...
    // Build descriptor string
    std::string desc_prefix;
    std::string desc_suffix = "/*)";
    switch (addr_type) {
    case OutputType::LEGACY: {
        desc_prefix = "pkh(" + xpub + "/44'";
        break;
    }
    case OutputType::P2SH_SEGWIT: {
        desc_prefix = "sh(wpkh(" + xpub + "/49'";
        desc_suffix += ")";
        break;
    }
    case OutputType::BECH32: {
        desc_prefix = "wpkh(" + xpub + "/84'";
        break;
    }
    }
    // ...
}

LegacyScriptPubKeyMan::SetupGeneration(…) creates a new seed with GenerateNewSeed() and then sets it as the root seed of the HD wallet with SetHDSeed(…). NewKeyPool() calls LegacyScriptPubKeyMan::TopUp(…) to generate the wallet’s addresses. As with descriptor wallets, the number of addresses generated is defined by -keypool or by DEFAULT_KEYPOOL_SIZE.

// src/wallet/wallet.cpp
bool LegacyScriptPubKeyMan::SetupGeneration(bool force)
{
    if ((CanGenerateKeys() && !force) || m_storage.IsLocked()) {
        return false;
    }

    SetHDSeed(GenerateNewSeed());
    if (!NewKeyPool()) {
        return false;
    }
    return true;
}

bool LegacyScriptPubKeyMan::TopUp(unsigned int kpSize)
{
    // ...
    unsigned int nTargetSize;
    if (kpSize > 0)
        nTargetSize = kpSize;
    else
        nTargetSize = std::max(gArgs.GetArg("-keypool", DEFAULT_KEYPOOL_SIZE), (int64_t) 0);
    //..
    int64_t missingExternal = std::max(std::max((int64_t) nTargetSize, (int64_t) 1) - (int64_t)setExternalKeyPool.size(), (int64_t) 0);
    int64_t missingInternal = std::max(std::max((int64_t) nTargetSize, (int64_t) 1) - (int64_t)setInternalKeyPool.size(), (int64_t) 0);
    for (int64_t i = missingInternal + missingExternal; i--;)
    {
        // ...
        CPubKey pubkey(GenerateNewKey(batch, m_hd_chain, internal));
        AddKeypoolPubkeyWithDB(pubkey, internal, batch);
    }
}
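The "missing keys" arithmetic in TopUp(…) above can be condensed into a small helper. This is a sketch for illustration, not Bitcoin Core code: given the target keypool size and the number of keys already in the pool, it returns how many new keys must be generated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Sketch of the "missing keys" computation from TopUp(...): always keep
// at least one key, and never return a negative count when the pool is
// already at or above the target size.
int64_t MissingKeys(int64_t target_size, int64_t current_pool_size)
{
    const int64_t target = std::max<int64_t>(target_size, 1);
    return std::max<int64_t>(target - current_pool_size, 0);
}
```

TopUp(…) runs this computation once for the external pool and once for the internal (change) pool, then generates that many keys in total.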

Note that both TopUp() methods (legacy and descriptor) call a database function to store the public key or the descriptor (AddKeypoolPubkeyWithDB(…) and AddDescriptorKeyWithDB(…), respectively).
This is another wallet function: persistence. Wallets must be able to store addresses, coins, transaction history, and so on.

The legacy wallet uses Berkeley DB 4.8, which was released in 2010 and is thus more than a decade old. Since Descriptor Wallets are a new, backward-incompatible type of wallet, a new database backend was introduced along with them: Descriptor Wallets use SQLite, while Berkeley DB is still used for Legacy Wallets. The SQLite backend was implemented in PR #19077.

The src/wallet/walletdb.{h,cpp} files handle higher-level database read/write/erase operations. The WalletBatch class accesses the wallet database: it opens the database and provides read and write access to it.

src/wallet/db.{cpp,h} is for the low-level interaction with bdb or sqlite (e.g., setting up environment, opening and closing database, batch writes, etc). src/wallet/bdb.{h,cpp} handles Berkeley DB 4.8 functions and the src/wallet/sqlite.{h,cpp} handles SQLite functions.

In order to communicate with both databases, WalletBatch has two fields: std::unique_ptr<DatabaseBatch> m_batch and WalletDatabase& m_database.
WalletDatabase is an interface representing an instance of a database; it is implemented by the BerkeleyDatabase and SQLiteDatabase classes.
The database is not accessed directly, however. Access goes through another interface, DatabaseBatch, which is implemented by the BerkeleyBatch and SQLiteBatch classes.

class WalletBatch
{
    // ...
private:
    std::unique_ptr<DatabaseBatch> m_batch;
    WalletDatabase& m_database;
};
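The two-interface layering can be sketched as follows. This is a simplified illustration: the real DatabaseBatch exposes templated Read/Write/Erase helpers, and BackendName/DescribeBackend below are invented for the example:

```cpp
#include <cassert>
#include <string>

// Simplified sketch of the batch interface implemented by both backends.
struct DatabaseBatch {
    virtual ~DatabaseBatch() = default;
    virtual std::string BackendName() const = 0; // illustrative method only
};

struct BerkeleyBatch : DatabaseBatch {
    std::string BackendName() const override { return "bdb"; }
};

struct SQLiteBatch : DatabaseBatch {
    std::string BackendName() const override { return "sqlite"; }
};

// WalletBatch-like code only sees the abstract interface, so the same
// call sites work for legacy (Berkeley DB) and descriptor (SQLite) wallets.
std::string DescribeBackend(const DatabaseBatch& batch)
{
    return "wallet db backend: " + batch.BackendName();
}
```

This is why WalletBatch never needs to know which wallet type it is serving; the concrete batch class is chosen when the wallet database is opened.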

The other wallet function is the ability to create transactions. This is done by the method CWallet::CreateTransaction(…), which is essentially a wrapper around CreateTransactionInternal(…).

The creation of a transaction can be divided into three main steps: selecting the UTXOs that cover the payment amount (coin selection), signing them, and broadcasting the transaction over the network.

Coin selection is done by CWallet::SelectCoins(…). By preference, coins with more confirmations are chosen. The actual logic for selecting which UTXOs to use is in src/wallet/coinselection.cpp, which implements the branch-and-bound algorithm in SelectCoinsBnB(…). If that fails, the knapsack algorithm (KnapsackSolver(…)) is used as a fallback.

Manual coin selection is also possible. If the user has chosen manual selection, the method CCoinControl::HasSelected() will return true.

Signing is one of the last steps in CreateTransactionInternal(…) and is done by calling CWallet::SignTransaction(…). Descriptor and legacy wallets have different methods for obtaining the signing provider (the private keys), but both end up calling src/script/sign.cpp:SignTransaction(…) to produce the signatures.

The last step - broadcasting the transaction - is initiated by CWallet::CommitTransaction(…). This method is used by both the GUI and the RPC commands after the transaction is created; it adds the transaction to the wallet and marks the used coins as spent.

To broadcast the transaction, however, the wallet does not call the node directly, due to the separation of concerns presented in the interface/ section. Instead, it calls the ChainImpl::broadcastTransaction interface method, which in turn calls the node’s BroadcastTransaction(…) function.

BroadcastTransaction(…) calls AcceptToMemoryPool(…) to add the transaction to the mempool and PeerManagerImpl::RelayTransaction(…) to propagate it to the connected peers.

miner.{h,cpp}

This region includes utilities for generating blocks to be mined. It is used in conjunction with rpc/mining.cpp by miners.

One of the RPC commands used by mining pools is getblocktemplate, which returns the block structure and lets the miner (optionally) customize and assemble it. This block template can be distributed by a mining pool so that all participants work on the same block. It was implemented in PR #936.

Another common command is submitblock, which, as the name implies, tries to submit a new block. To do so, ChainstateManager::ProcessNewBlock(…) is used. Note that submitblock_StateCatcher is registered as a validation interface in order to capture the resulting BlockValidationState.

static RPCHelpMan submitblock()
{
    // ...
    bool new_block;
    auto sc = std::make_shared<submitblock_StateCatcher>(block.GetHash());
    RegisterSharedValidationInterface(sc);
    bool accepted = EnsureChainman(request.context).ProcessNewBlock(Params(), blockptr, /* fForceProcessing */ true, /* fNewBlock */ &new_block);
    UnregisterSharedValidationInterface(sc);
    if (!new_block && accepted) {
        return "duplicate";
    }
    if (!sc->found) {
        return "inconclusive";
    }
    return BIP22ValidationResult(sc->state);
};

Summary

Bitcoin Core has been the reference implementation since its first version. It is a solution that includes a node, a graphical interface, and a command-line interface.

The Bitcoin protocol has two different core concepts: the node and the wallet. Ideally, they would be separate codebases, but this is not how Bitcoin Core was originally implemented.

There is an ongoing project called Process Separation, which implements this separation of functions in the Bitcoin Core software.

In order to be able to perform several activities simultaneously, a multithreading environment is required. For example, there are threads to query the DNS seeds, connect to peers, process incoming messages, and so on.

Dividing the code in Regions provides a high-level view of which parts of the system perform specific tasks.