Who Am I

Valentin Galea - @valentin_galea
more than 10 years doing mobile, indie and AAA games

currently at Splash Damage in London, UK

Disclaimer

Games coder
I bought some Bitcoin in the 2017 craze
I analyzed a bunch of times the source code

Therefore I am perfectly suited to give this talk! :)

No investment advice!
No technical how-to!
Mostly C++ trivia!

Agenda

Bitcoin overview and history
‎Bitcoin Core
‎C++ overview and stats
‎Usage of STL and Boost
Serialization
Data structures

Bitcoin - Summary

First cryptocurrency, most well known
Leading the digital currency revolution
Meteoric rise in price and popularity as of 2017, along with others

Bitcoin - What Is It

decentralized, peer-to-peer digital currency
invented the public distributed ledger - the blockchain
solves the "double spending" problem
achieves secure transactions in an unsecure environment
utilizes a proof-of-work algorithm ("mining") to help achieve all this

Bitcoin - Overview

Bitcoin - History

Introduced by the unknown Satoshi Nakamoto in 2009
- in the paper "Bitcoin: A Peer-to-Peer Electronic Cash System"
- a reference C++ implementation
Fully open source
- bitcoin.org project
- https://github.com/bitcoin/bitcoin
Satoshi disappeared around 2010, other devs continued
It inspired and allowed the birth of other crypto currencies
Evolved economically
- from everyday people running the software on their PC’s
- to large conglomerate mining pools in China!

Bitcoin - Example

Alice wants to pay Bob 1 BTC
Alice doesn’t own any bitcoin physically, her coins are scattered across the network as unspent transactions
She does own cryptographic keys for signing or generating payments
She uses a wallet app that monitors the network, tracks her balance and creates a new transaction signed with Bob’s public key
The wallet sends it into the p2p node network where it will float around
A miner will pick it up, include it with others in a block and it will begin Proof-Of-Work to sign the block
- this usually takes 10 min, and the difficulty is adjusted by consensus to keep that true
When it’s successful (or when others are) the new block is broadcast to the network and added to the blockchain
- fees are collected by the miner as well as newly minted BTC as reward
Bob’s wallet app sees the new confirmed transaction. Bob has now unspent coins he can redeem later!

Bitcoin Core

Bitcoin Core (continued)

the reference implementation of the bitcoin system
originally started by Satoshi as companion to his paper
implements all aspects of the system: transactions, validation, network peer to peer, etc
GUI written in QT
- CLI interface as well
has wallet to "store" bitcoin but it’s not recommended to use
PoW algorithms and it can mine but again not recommended
lots of other coins are adapted from it: Litecoin, Verge, ZCash, Dash, Doge, Qtum, etc

The focus of this talk

C++ - Overview

https://github.com/bitcoin/bitcoin
- 16K commits since 2009
- 500+ contributors, 60+ peek
- approx. 100k SLOC ¹

C++ - Overview (continued)

...
-a----        14-Jan-18   3:30 PM           6353 bitcoind.cpp
-a----        14-Jan-18   3:30 PM          10559 blockencodings.cpp
-a----        14-Jan-18   3:30 PM           7320 blockencodings.h
-a----        14-Jan-18   3:30 PM          11457 bloom.cpp
-a----        14-Jan-18   3:30 PM           5496 bloom.h
-a----        14-Jan-18   3:30 PM           5750 chain.cpp
-a----        14-Jan-18   3:30 PM          16257 chain.h
-a----        14-Jan-18   3:30 PM          19276 chainparams.cpp
-a----        14-Jan-18   3:30 PM           4574 chainparams.h
-a----        14-Jan-18   3:30 PM           2857 chainparamsbase.cpp
-a----        14-Jan-18   3:30 PM           1944 chainparamsbase.h
-a----        14-Jan-18   3:30 PM         140338 chainparamsseeds.h
-a----        14-Jan-18   3:30 PM            880 checkpoints.cpp
-a----        14-Jan-18   3:30 PM            689 checkpoints.h
-a----        14-Jan-18   3:30 PM           7020 checkqueue.h
-a----        14-Jan-18   3:30 PM           3894 clientversion.cpp
-a----        14-Jan-18   3:30 PM           1955 clientversion.h
-a----        14-Jan-18   3:30 PM          10919 coins.cpp
-a----        14-Jan-18   3:30 PM          11020 coins.h
...

C++ - Overview (continued)

relatively flat structure, most things are split in a .h/.cpp pair
initially a lot of functionality was buried into a giant main.cpp file
- with time that has been refactored out
primarily relies on STL and Boost and some other embedded libs like:
- Qt for UI
- leveldb for key-value storage
- secp256k1 for low-level crypto ops
quite well commented - in Doxygen format
multiplatform - with macro magic compatibility glue layer
MIT license

C++ - `class` Design

Design and tone largely set by Satoshi’s original implementation

straightforward C++ class design
- little polymorphism
templates usually only for container-like things or helper functions
- no TMP
RAII used for wrappers over synchronization primitives and files

C++ - `class` Design (continued)

A Null-ify technique is used to complement constructors:

class CBlockHeader
{
public:
    int32_t nVersion;
    uint256 hashPrevBlock;
    uint256 hashMerkleRoot;
    uint32_t nTime;
    uint32_t nBits;
    uint32_t nNonce;

    CBlockHeader()
    {
        SetNull();
    }

C++ - `class` Design (continued)

    void SetNull()
    {
        nVersion = 0;
        hashPrevBlock.SetNull();
        hashMerkleRoot.SetNull();
        nTime = 0;
        nBits = 0;
        nNonce = 0;
    }

    bool IsNull() const
    {
        return (nBits == 0);
    }

not that much used, seems relic from early days
in some instances not all members are cleared

Modern C++

Occurrences of C++11/14 specific constructs in all the files

	Bitcoin	Ethereum	Ripple
Files	659	477	3672
`auto`	12%	36%	36%
`std::move`	7%	7%	13%
`override`	6%	13%	19%
`static_assert`	2%	4%	3%
lambda expressions	2%	13%	11%
`std::enable_if`	none	one file	1%

C++ Extensions

clang

Thread Safety Analysis
- attribute decorations for static analysis of race conditions
- macro wrappers for compatibility:
  - GUARDED_BY(x), SCOPED_LOCKABLE, SHARED_LOCKS_REQUIRED etc
- see more: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html

GCC

Just a couple of __attribute__ for various hints

Deterministic Build

In order to increase the confidence of packaged binaries they are built deterministically

That means that the source code is handled in such a way that it always produces the same binary no matter the triggering conditions/environment

People are encouraged to build their own using a controlled environment (usually a VM with special scripts) rather than rely on packaged distributions in the wild

More info: https://gitian.org/

STL

Major usage: 60% of the files

std::vector major work horse
- used in 1/3 of files
- used vanilla, with no custom allocation
std::string gets major usage as well
the various flavours of map see good usage
std::runtime_error
- primary exception handler

STL - Examples

The blockchain in memory:

class CChain {
private:
    std::vector<CBlockIndex*> vChain;
/* ... */

The unspent coins database:

std::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher> CCoinsMap;

STL - Allocators

No custom memory management allocator is used. Instead custom allocators are used to enforce security:

zero_after_free_allocator
- simple std::allocator wrapper that zeroes the memory when it gets released so it’s harder to snoop
secure_allocator
- zeroes the released memory but it also keeps it locked and not paged to disk, to discourage attacks

Boost

Present in about 20% of the files

A lot of the usage is due to code predating C++11 adoption, before Boost constructs made it into the standard, for ex:

call_once, thread, mutex, unique_lock
filesystem,
chrono, etc

signals and bind prevalent in the Qt UI code

Testing handled with the Boost Unit Test framework

Boost - Example

boost::multi_index used to store organized transaction data:

typedef boost::multi_index_container<
    CTransactionRef,
    boost::multi_index::indexed_by<
        // sorted by txid
        boost::multi_index::hashed_unique<
            boost::multi_index::tag<txid_index>,
            mempoolentry_txid,
            SaltedTxidHasher
        >,
        // sorted by order in the blockchain
        boost::multi_index::sequenced<
            boost::multi_index::tag<insertion_order>
        >
    >
> indexed_disconnected_transactions;

Serialization

A reflection mechanism that helps with loading/saving/transfer of objects

To automate work, every class can declare which members gets serialized/deserialized

This is implemented via a combination of template-ed helper functions and macro glue:

Serialization - Example

class CBlockFileInfo
{
public:
    unsigned int nBlocks;      //!< number of blocks stored in file
    unsigned int nSize;        //!< number of used bytes of block file
    /* ... */
    uint64_t nTimeLast;        //!< latest time of block in file

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        READWRITE(VARINT(nBlocks));
        READWRITE(VARINT(nSize));
        /* ... */
        READWRITE(VARINT(nTimeLast));
    }

Serialization - ADD macro

/**
 * Implement three methods for serializable objects. These are actually wrappers over
 * "SerializationOp" template, which implements the body of each class' serialization
 * code. Adding "ADD_SERIALIZE_METHODS" in the body of the class causes these wrappers to be
 * added as members.
 */
#define ADD_SERIALIZE_METHODS                                         \
    template<typename Stream>                                         \
    void Serialize(Stream& s) const {                                 \
        NCONST_PTR(this)->SerializationOp(s, CSerActionSerialize());  \
    }                                                                 \
    template<typename Stream>                                         \
    void Unserialize(Stream& s) {                                     \
        SerializationOp(s, CSerActionUnserialize());                  \
    }

Serialization - Macro magic

Within the body of SerializationOp the READWRITE generic macro is used. It will expand differently depending if a read(unserialize) or write(serialize) is taking place

#define READWRITE(obj)      (::SerReadWrite(s, (obj), ser_action))

template<typename Stream, typename T>
inline void SerReadWrite(Stream& s, const T& obj, CSerActionSerialize ser_action)
{
    ::Serialize(s, obj);
}

template<typename Stream, typename T>
inline void SerReadWrite(Stream& s, T& obj, CSerActionUnserialize ser_action)
{
    ::Unserialize(s, obj);
}

Serialization - Template helpers - basic types

template<typename Stream> inline void Serialize(Stream& s, char a    ) { ser_writedata8(s, a); } // TODO Get rid of bare char
template<typename Stream> inline void Serialize(Stream& s, int8_t a  ) { ser_writedata8(s, a); }
template<typename Stream> inline void Serialize(Stream& s, uint8_t a ) { ser_writedata8(s, a); }
/* ... */
template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); }
template<typename Stream> inline void Serialize(Stream& s, float a   ) { ser_writedata32(s, ser_float_to_uint32(a)); }
template<typename Stream> inline void Serialize(Stream& s, double a  ) { ser_writedata64(s, ser_double_to_uint64(a)); }

template<typename Stream> inline void Unserialize(Stream& s, char& a    ) { a = ser_readdata8(s); } // TODO Get rid of bare char
template<typename Stream> inline void Unserialize(Stream& s, int8_t& a  ) { a = ser_readdata8(s); }
template<typename Stream> inline void Unserialize(Stream& s, uint8_t& a ) { a = ser_readdata8(s); }
/* ... */
template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); }
template<typename Stream> inline void Unserialize(Stream& s, float& a   ) { a = ser_uint32_to_float(ser_readdata32(s)); }
template<typename Stream> inline void Unserialize(Stream& s, double& a  ) { a = ser_uint64_to_double(ser_readdata64(s)); }

Serialization - Template helpers - std::pair

template<typename Stream, typename K, typename T>
void Serialize(Stream& os, const std::pair<K, T>& item)
{
    Serialize(os, item.first);
    Serialize(os, item.second);
}

template<typename Stream, typename K, typename T>
void Unserialize(Stream& is, std::pair<K, T>& item)
{
    Unserialize(is, item.first);
    Unserialize(is, item.second);
}

Serialization - Template helpers - std::vector

template<typename Stream, typename T, typename A, typename V>
void Serialize_impl(Stream& os, const std::vector<T, A>& v, const V&)
{
    WriteCompactSize(os, v.size());
    for (typename std::vector<T, A>::const_iterator vi = v.begin(); vi != v.end(); ++vi)
        ::Serialize(os, (*vi));
}

Serialization - Template helpers - std::vector (continued)

template<typename Stream, typename T, typename A, typename V>
void Unserialize_impl(Stream& is, std::vector<T, A>& v, const V&)
{
    v.clear();
    unsigned int nSize = ReadCompactSize(is);
    unsigned int i = 0;
    unsigned int nMid = 0;
    while (nMid < nSize)
    {
        nMid += 5000000 / sizeof(T);
        if (nMid > nSize)
            nMid = nSize;
        v.resize(nMid);
        for (; i < nMid; i++)
            Unserialize(is, v[i]);
    }
}

Serialization - Trivia

a cheeky hack:

/**
 * Used to bypass the rule against non-const reference to temporary
 * where it makes sense with wrappers such as CFlatData or CTxDB
 */
template<typename T>
inline T& REF(const T& val)
{
    return const_cast<T&>(val);
}

#define FLATDATA(obj) REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
/* ... */
unsigned char ip[16];
/* ... */
READWRITE(FLATDATA(ip));

Custom Data Structures

Interesting data structures that are relatively isolated and reusable

I/O

CDataStream - relatively thin abstraction over a secure std::vector of bytes
- used as the backbone of serialization
CAutoFile - non-refcounted RAII wrapper for FILE*
CBufferedFile - same as above
- with ring buffer support

JSON / RPC

JSON is used everywhere for a RPC driven communication layer

UniValue getblockcount(const JSONRPCRequest& request)
{
    if (request.fHelp || request.params.size() != 0)
        throw std::runtime_error(
            "getblockcount\n"
            "\nReturns the number of blocks in the longest blockchain.\n"
            "\nResult:\n"
            "n    (numeric) The current block count\n"
            "\nExamples:\n"
            + HelpExampleCli("getblockcount", "")
            + HelpExampleRpc("getblockcount", "")
        );

    LOCK(cs_main);
    return chainActive.Height();
}

JSON / UniValue

Variant like structure that represents a JSON object value

Stores key/values as a vector of std::string

UniValue entry(UniValue::VOBJ);
entry.pushKV("txid", tx.GetHash().GetHex());
entry.pushKV("hash", tx.GetWitnessHash().GetHex());
entry.pushKV("version", tx.nVersion);

According to the README: "[it] minimizes template use (contra json_spirit)"

prevector

Drop in replacement for std::vector that stores the first N elements in-place

An interesting mix of the standard array and a dynamic vector

size_type _size;
union direct_or_indirect {
    char direct[sizeof(T) * N];
    struct {
        size_type capacity;
        char* indirect;
    };
} _union;

elements must be POD that can be realloc-ed
written in STL style, has internal iterator and the reverse, const variants
support for move semantics by just std::swap-ing the union

prevector - usage

Only usage case is for storing the transaction script opcodes where apparently:

We use a prevector for the script to reduce the considerable memory overhead
of vectors in cases where they normally contain a small number of small elements.
Tests in October 2015 showed use of this reduced dbcache memory usage by 23%
and made an initial sync 13% faster.

CVarInt

A quite simple form of encoding integers depending on their numeric range, in order to save on space/bandwidth

Range	Encoding in bytes
0 .. 252	XX
253 .. 2¹⁶ - 1	0xFD XX XX
2¹⁶ .. 2³² – 1	0xFE XX XX XX XX
2³² .. 2⁶⁴ – 1	0xFF XX XX XX XX XX XX XX XX

good for things that have low values most of the time, for everything else it wastes bytes
there are other better techniques - see https://www.deadalnix.me/2017/01/08/variable-size-integer-encoding/

std::map variations

A couple of small convenience driven modifications:

indirectmap
- stores pointers to elements but offers utility member functions that work with the element type directly
- one time usage in the transaction pool
limitedmap
- map that only stores the N highest values inserted
- one time usage as a sort of priority queue

CuckooCache

Unique Set data structure based on the principles of the cuckoo hash map

Used to avoid double checking transactions - once for the mempool and the other time for the blocks. Replaced a boost::unique_set as optimization in Oct 2016

elements are stored in std::vector and a series of hash functions spread them around
constat-time find
lockfree erase
- next insert will GC

CuckooCache - Hashing

8 way hashing is used to better distribute elements in buckets

An interesting technique is used to avoid the need of a modulus when mapping a random 32 bit number to a fixed N: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

inline std::array<uint32_t, 8> compute_hashes(const Element& e) const
{
    return {{(uint32_t)((hash_function.template operator()<0>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<1>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<2>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<3>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<4>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<5>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<6>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<7>(e) * (uint64_t)size) >> 32)}};
}

BloomFilter

Storing the entire blockchain is not trivial: approx. 155 GB Jan 2018

Lightweight clients use a Bloom filter data structure to optimize validity checking before getting full blocks

BloomFilter (continued)

minimal and straightforward C++ implementation

for (unsigned int i = 0; i < nHashFuncs; i++)
{
    unsigned int nIndex = Hash(i, vKey);
    vData[nIndex >> 3] |= (1 << (7 & nIndex));
}

the hashing function:

return MurmurHash3(nHashNum * 0xFBA4C795 + nTweak, vDataToHash) % (vData.size() * 8);

memusage

A sort of generalized sizeof to measure dynamic memory usage for structures

Consists of a large overload set for the function DynamicUsage<T>:

/** Dynamic memory usage for built-in types is zero. */
static inline size_t DynamicUsage(const int8_t& v) { return 0; }
static inline size_t DynamicUsage(const uint8_t& v) { return 0; }
/* ... */

template<typename X>
static inline size_t DynamicUsage(const std::vector<X>& v)
{
    return MallocUsage(v.capacity() * sizeof(X));
}
/* ... */

memusage (continued)

For STL types they shadow the underlying type structures:

template<typename X>
struct stl_tree_node
{
private:
    int color;
    void* parent;
    void* left;
    void* right;
    X x;
};

template<typename X, typename Y, typename Z>
static inline size_t DynamicUsage(const std::map<X, Y, Z>& m)
{
    return MallocUsage(sizeof(stl_tree_node<std::pair<const X, Y> >)) * m.size();
}

MallocUsage is just a small utility function that accounts for alignment

The End

Thank you!

Valentin Galea - @valentin_galea
http://valentingalea.github.io/research/bitcoin/

Attributions

made with Asciidoc
uses GNU Source-highlight
uses Slidy2 plugin
bitcoin price graph https://www.buybitcoinworldwide.com/price/
"Mastering Bitcoin" https://github.com/bitcoinbook/bitcoinbook
Cuckoo hashing image by Rasmus Pagh - File:Cuckoo.png, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20556837
Bloom filter image By david Eppstein - self-made, originally for a talk at WADS 2007, Public Domain, https://commons.wikimedia.org/w/index.php?curid=2609777
all other images under "Fair Use"/"Fair Dealing"

Bitcoin Code Review: How C++ Powers The Biggest Cryptocurrency

Who Am I

Disclaimer

Agenda

Bitcoin - Summary

Bitcoin - What Is It

Bitcoin - Overview

Bitcoin - History

Bitcoin - Example

Bitcoin Core

Bitcoin Core (continued)

C++ - Overview

C++ - Overview (continued)

C++ - Overview (continued)

C++ - class Design

C++ - class Design (continued)

C++ - class Design (continued)

Modern C++

C++ Extensions

Deterministic Build

STL

STL - Examples

STL - Allocators

Boost

Boost - Example

Serialization

Serialization - Example

Serialization - ADD macro

Serialization - Macro magic

Serialization - Template helpers - basic types

Serialization - Template helpers - std::pair

Serialization - Template helpers - std::vector

Serialization - Template helpers - std::vector (continued)

Serialization - Trivia

Custom Data Structures

I/O

JSON / RPC

JSON / UniValue

prevector

prevector - usage

CVarInt

std::map variations

CuckooCache

CuckooCache - Hashing

BloomFilter

BloomFilter (continued)

memusage

memusage (continued)

The End

Attributions

C++ - `class` Design

C++ - `class` Design (continued)

C++ - `class` Design (continued)