Who Am I

splash
  • Valentin Galea - @valentin_galea
  • more than 10 years doing mobile, indie and AAA games
Games worked on
  • currently at Splash Damage in London, UK

Disclaimer

  • Games coder
  • I bought some Bitcoin in the 2017 craze
  • I analyzed a bunch of times the source code

Therefore I am perfectly suited to give this talk! :)

  • No investment advice!
  • No technical how-to!
  • Mostly C++ trivia!

Agenda

  • Bitcoin overview and history
  • ‎Bitcoin Core
  • ‎C++ overview and stats
  • ‎Usage of STL and Boost
  • Serialization
  • Data structures

Bitcoin - Summary

  • First cryptocurrency, most well known
  • Leading the digital currency revolution
  • Meteoric rise in price and popularity as of 2017, along with others
price graph

Bitcoin - What Is It

  • decentralized, peer-to-peer digital currency
  • invented the public distributed ledger - the blockchain
  • solves the "double spending" problem
  • achieves secure transactions in an unsecure environment
  • utilizes a proof-of-work algorithm ("mining") to help achieve all this

Bitcoin - Overview

network

Bitcoin - History

  • Introduced by the unknown Satoshi Nakamoto in 2009
    • in the paper "Bitcoin: A Peer-to-Peer Electronic Cash System"
    • a reference C++ implementation
  • Fully open source
  • Satoshi disappeared around 2010, other devs continued
  • It inspired and allowed the birth of other crypto currencies
  • Evolved economically
    • from everyday people running the software on their PC’s
    • to large conglomerate mining pools in China!

Bitcoin - Example

  1. Alice wants to pay Bob 1 BTC
  2. Alice doesn’t own any bitcoin physically, her coins are scattered across the network as unspent transactions
  3. She does own cryptographic keys for signing or generating payments
  4. She uses a wallet app that monitors the network, tracks her balance and creates a new transaction signed with Bob’s public key
  5. The wallet sends it into the p2p node network where it will float around
  6. A miner will pick it up, include it with others in a block and it will begin Proof-Of-Work to sign the block
    • this usually takes 10 min, and the difficulty is adjusted by consensus to keep that true
  7. When it’s successful (or when others are) the new block is broadcast to the network and added to the blockchain
    • fees are collected by the miner as well as newly minted BTC as reward
  8. Bob’s wallet app sees the new confirmed transaction. Bob has now unspent coins he can redeem later!

Bitcoin Core

bitcoin core diagram

Bitcoin Core (continued)

  • the reference implementation of the bitcoin system
  • originally started by Satoshi as companion to his paper
  • implements all aspects of the system: transactions, validation, network peer to peer, etc
  • GUI written in QT
    • CLI interface as well
  • has wallet to "store" bitcoin but it’s not recommended to use
  • PoW algorithms and it can mine but again not recommended
  • lots of other coins are adapted from it: Litecoin, Verge, ZCash, Dash, Doge, Qtum, etc

The focus of this talk

C++ - Overview

contrib graph

C++ - Overview (continued)

...
-a----        14-Jan-18   3:30 PM           6353 bitcoind.cpp
-a----        14-Jan-18   3:30 PM          10559 blockencodings.cpp
-a----        14-Jan-18   3:30 PM           7320 blockencodings.h
-a----        14-Jan-18   3:30 PM          11457 bloom.cpp
-a----        14-Jan-18   3:30 PM           5496 bloom.h
-a----        14-Jan-18   3:30 PM           5750 chain.cpp
-a----        14-Jan-18   3:30 PM          16257 chain.h
-a----        14-Jan-18   3:30 PM          19276 chainparams.cpp
-a----        14-Jan-18   3:30 PM           4574 chainparams.h
-a----        14-Jan-18   3:30 PM           2857 chainparamsbase.cpp
-a----        14-Jan-18   3:30 PM           1944 chainparamsbase.h
-a----        14-Jan-18   3:30 PM         140338 chainparamsseeds.h
-a----        14-Jan-18   3:30 PM            880 checkpoints.cpp
-a----        14-Jan-18   3:30 PM            689 checkpoints.h
-a----        14-Jan-18   3:30 PM           7020 checkqueue.h
-a----        14-Jan-18   3:30 PM           3894 clientversion.cpp
-a----        14-Jan-18   3:30 PM           1955 clientversion.h
-a----        14-Jan-18   3:30 PM          10919 coins.cpp
-a----        14-Jan-18   3:30 PM          11020 coins.h
...

C++ - Overview (continued)

  • relatively flat structure, most things are split in a .h/.cpp pair
  • initially a lot of functionality was buried into a giant main.cpp file
    • with time that has been refactored out
  • primarily relies on STL and Boost and some other embedded libs like:
    • Qt for UI
    • leveldb for key-value storage
    • secp256k1 for low-level crypto ops
  • quite well commented - in Doxygen format
  • multiplatform - with macro magic compatibility glue layer
  • MIT license

C++ - class Design

Design and tone largely set by Satoshi’s original implementation

  • straightforward C++ class design
    • little polymorphism
  • templates usually only for container-like things or helper functions
    • no TMP
  • RAII used for wrappers over synchronization primitives and files

C++ - class Design (continued)

A Null-ify technique is used to complement constructors:

class CBlockHeader
{
public:
    int32_t nVersion;
    uint256 hashPrevBlock;
    uint256 hashMerkleRoot;
    uint32_t nTime;
    uint32_t nBits;
    uint32_t nNonce;

    CBlockHeader()
    {
        SetNull();
    }

C++ - class Design (continued)

    void SetNull()
    {
        nVersion = 0;
        hashPrevBlock.SetNull();
        hashMerkleRoot.SetNull();
        nTime = 0;
        nBits = 0;
        nNonce = 0;
    }

    bool IsNull() const
    {
        return (nBits == 0);
    }
  • not that much used, seems relic from early days
  • in some instances not all members are cleared

Modern C++

Occurrences of C++11/14 specific constructs in all the files

Bitcoin Ethereum Ripple

Files

659

477

3672

auto

12%

36%

36%

std::move

7%

7%

13%

override

6%

13%

19%

static_assert

2%

4%

3%

lambda expressions

2%

13%

11%

std::enable_if

none

one file

1%

C++ Extensions

clang
GCC
  • Just a couple of __attribute__ for various hints

Deterministic Build

In order to increase the confidence of packaged binaries they are built deterministically

That means that the source code is handled in such a way that it always produces the same binary no matter the triggering conditions/environment

People are encouraged to build their own using a controlled environment (usually a VM with special scripts) rather than rely on packaged distributions in the wild

More info: https://gitian.org/

STL

Major usage: 60% of the files

  • std::vector major work horse
    • used in 1/3 of files
    • used vanilla, with no custom allocation
  • std::string gets major usage as well
  • the various flavours of map see good usage
  • std::runtime_error
    • primary exception handler

STL - Examples

The blockchain in memory:

class CChain {
private:
    std::vector<CBlockIndex*> vChain;
/* ... */

The unspent coins database:

std::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher> CCoinsMap;

STL - Allocators

No custom memory management allocator is used. Instead custom allocators are used to enforce security:

  • zero_after_free_allocator
    • simple std::allocator wrapper that zeroes the memory when it gets released so it’s harder to snoop
  • secure_allocator
    • zeroes the released memory but it also keeps it locked and not paged to disk, to discourage attacks

Boost

Present in about 20% of the files

A lot of the usage is due to code predating C++11 adoption, before Boost constructs made it into the standard, for ex:

  • call_once, thread, mutex, unique_lock
  • filesystem,
  • chrono, etc

signals and bind prevalent in the Qt UI code

Testing handled with the Boost Unit Test framework

Boost - Example

boost::multi_index used to store organized transaction data:

typedef boost::multi_index_container<
    CTransactionRef,
    boost::multi_index::indexed_by<
        // sorted by txid
        boost::multi_index::hashed_unique<
            boost::multi_index::tag<txid_index>,
            mempoolentry_txid,
            SaltedTxidHasher
        >,
        // sorted by order in the blockchain
        boost::multi_index::sequenced<
            boost::multi_index::tag<insertion_order>
        >
    >
> indexed_disconnected_transactions;

Serialization

A reflection mechanism that helps with loading/saving/transfer of objects

To automate work, every class can declare which members gets serialized/deserialized

This is implemented via a combination of template-ed helper functions and macro glue:

Serialization - Example

class CBlockFileInfo
{
public:
    unsigned int nBlocks;      //!< number of blocks stored in file
    unsigned int nSize;        //!< number of used bytes of block file
    /* ... */
    uint64_t nTimeLast;        //!< latest time of block in file

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        READWRITE(VARINT(nBlocks));
        READWRITE(VARINT(nSize));
        /* ... */
        READWRITE(VARINT(nTimeLast));
    }

Serialization - ADD macro

/**
 * Implement three methods for serializable objects. These are actually wrappers over
 * "SerializationOp" template, which implements the body of each class' serialization
 * code. Adding "ADD_SERIALIZE_METHODS" in the body of the class causes these wrappers to be
 * added as members.
 */
#define ADD_SERIALIZE_METHODS                                         \
    template<typename Stream>                                         \
    void Serialize(Stream& s) const {                                 \
        NCONST_PTR(this)->SerializationOp(s, CSerActionSerialize());  \
    }                                                                 \
    template<typename Stream>                                         \
    void Unserialize(Stream& s) {                                     \
        SerializationOp(s, CSerActionUnserialize());                  \
    }

Serialization - Macro magic

Within the body of SerializationOp the READWRITE generic macro is used. It will expand differently depending if a read(unserialize) or write(serialize) is taking place

#define READWRITE(obj)      (::SerReadWrite(s, (obj), ser_action))
template<typename Stream, typename T>
inline void SerReadWrite(Stream& s, const T& obj, CSerActionSerialize ser_action)
{
    ::Serialize(s, obj);
}
template<typename Stream, typename T>
inline void SerReadWrite(Stream& s, T& obj, CSerActionUnserialize ser_action)
{
    ::Unserialize(s, obj);
}

Serialization - Template helpers - basic types

template<typename Stream> inline void Serialize(Stream& s, char a    ) { ser_writedata8(s, a); } // TODO Get rid of bare char
template<typename Stream> inline void Serialize(Stream& s, int8_t a  ) { ser_writedata8(s, a); }
template<typename Stream> inline void Serialize(Stream& s, uint8_t a ) { ser_writedata8(s, a); }
/* ... */
template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); }
template<typename Stream> inline void Serialize(Stream& s, float a   ) { ser_writedata32(s, ser_float_to_uint32(a)); }
template<typename Stream> inline void Serialize(Stream& s, double a  ) { ser_writedata64(s, ser_double_to_uint64(a)); }

template<typename Stream> inline void Unserialize(Stream& s, char& a    ) { a = ser_readdata8(s); } // TODO Get rid of bare char
template<typename Stream> inline void Unserialize(Stream& s, int8_t& a  ) { a = ser_readdata8(s); }
template<typename Stream> inline void Unserialize(Stream& s, uint8_t& a ) { a = ser_readdata8(s); }
/* ... */
template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); }
template<typename Stream> inline void Unserialize(Stream& s, float& a   ) { a = ser_uint32_to_float(ser_readdata32(s)); }
template<typename Stream> inline void Unserialize(Stream& s, double& a  ) { a = ser_uint64_to_double(ser_readdata64(s)); }

Serialization - Template helpers - std::pair

template<typename Stream, typename K, typename T>
void Serialize(Stream& os, const std::pair<K, T>& item)
{
    Serialize(os, item.first);
    Serialize(os, item.second);
}
template<typename Stream, typename K, typename T>
void Unserialize(Stream& is, std::pair<K, T>& item)
{
    Unserialize(is, item.first);
    Unserialize(is, item.second);
}

Serialization - Template helpers - std::vector

template<typename Stream, typename T, typename A, typename V>
void Serialize_impl(Stream& os, const std::vector<T, A>& v, const V&)
{
    WriteCompactSize(os, v.size());
    for (typename std::vector<T, A>::const_iterator vi = v.begin(); vi != v.end(); ++vi)
        ::Serialize(os, (*vi));
}

Serialization - Template helpers - std::vector (continued)

template<typename Stream, typename T, typename A, typename V>
void Unserialize_impl(Stream& is, std::vector<T, A>& v, const V&)
{
    v.clear();
    unsigned int nSize = ReadCompactSize(is);
    unsigned int i = 0;
    unsigned int nMid = 0;
    while (nMid < nSize)
    {
        nMid += 5000000 / sizeof(T);
        if (nMid > nSize)
            nMid = nSize;
        v.resize(nMid);
        for (; i < nMid; i++)
            Unserialize(is, v[i]);
    }
}

Serialization - Trivia

  • a cheeky hack:
/**
 * Used to bypass the rule against non-const reference to temporary
 * where it makes sense with wrappers such as CFlatData or CTxDB
 */
template<typename T>
inline T& REF(const T& val)
{
    return const_cast<T&>(val);
}
#define FLATDATA(obj) REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
/* ... */
unsigned char ip[16];
/* ... */
READWRITE(FLATDATA(ip));

Custom Data Structures

Interesting data structures that are relatively isolated and reusable

I/O

  • CDataStream - relatively thin abstraction over a secure std::vector of bytes
    • used as the backbone of serialization
  • CAutoFile - non-refcounted RAII wrapper for FILE*
  • CBufferedFile - same as above
    • with ring buffer support

JSON / RPC

JSON is used everywhere for a RPC driven communication layer

UniValue getblockcount(const JSONRPCRequest& request)
{
    if (request.fHelp || request.params.size() != 0)
        throw std::runtime_error(
            "getblockcount\n"
            "\nReturns the number of blocks in the longest blockchain.\n"
            "\nResult:\n"
            "n    (numeric) The current block count\n"
            "\nExamples:\n"
            + HelpExampleCli("getblockcount", "")
            + HelpExampleRpc("getblockcount", "")
        );

    LOCK(cs_main);
    return chainActive.Height();
}

JSON / UniValue

Variant like structure that represents a JSON object value

Stores key/values as a vector of std::string

UniValue entry(UniValue::VOBJ);
entry.pushKV("txid", tx.GetHash().GetHex());
entry.pushKV("hash", tx.GetWitnessHash().GetHex());
entry.pushKV("version", tx.nVersion);

According to the README: "[it] minimizes template use (contra json_spirit)"

prevector

Drop in replacement for std::vector that stores the first N elements in-place

An interesting mix of the standard array and a dynamic vector

size_type _size;
union direct_or_indirect {
    char direct[sizeof(T) * N];
    struct {
        size_type capacity;
        char* indirect;
    };
} _union;
  • elements must be POD that can be realloc-ed
  • written in STL style, has internal iterator and the reverse, const variants
  • support for move semantics by just std::swap-ing the union

prevector - usage

Only usage case is for storing the transaction script opcodes where apparently:

We use a prevector for the script to reduce the considerable memory overhead
of vectors in cases where they normally contain a small number of small elements.
Tests in October 2015 showed use of this reduced dbcache memory usage by 23%
and made an initial sync 13% faster.

CVarInt

A quite simple form of encoding integers depending on their numeric range, in order to save on space/bandwidth

Range Encoding in bytes

0 .. 252

XX

253 .. 216 - 1

0xFD XX XX

216 .. 232 – 1

0xFE XX XX XX XX

232 .. 264 – 1

0xFF XX XX XX XX XX XX XX XX

std::map variations

A couple of small convenience driven modifications:

  • indirectmap
    • stores pointers to elements but offers utility member functions that work with the element type directly
    • one time usage in the transaction pool
  • limitedmap
    • map that only stores the N highest values inserted
    • one time usage as a sort of priority queue

CuckooCache

Unique Set data structure based on the principles of the cuckoo hash map

cuckoo

Used to avoid double checking transactions - once for the mempool and the other time for the blocks. Replaced a boost::unique_set as optimization in Oct 2016

  • elements are stored in std::vector and a series of hash functions spread them around
  • constat-time find
  • lockfree erase
    • next insert will GC

CuckooCache - Hashing

8 way hashing is used to better distribute elements in buckets

An interesting technique is used to avoid the need of a modulus when mapping a random 32 bit number to a fixed N: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/

inline std::array<uint32_t, 8> compute_hashes(const Element& e) const
{
    return {{(uint32_t)((hash_function.template operator()<0>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<1>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<2>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<3>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<4>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<5>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<6>(e) * (uint64_t)size) >> 32),
                (uint32_t)((hash_function.template operator()<7>(e) * (uint64_t)size) >> 32)}};
}

BloomFilter

Storing the entire blockchain is not trivial: approx. 155 GB Jan 2018

Lightweight clients use a Bloom filter data structure to optimize validity checking before getting full blocks

bloom

BloomFilter (continued)

  • minimal and straightforward C++ implementation
for (unsigned int i = 0; i < nHashFuncs; i++)
{
    unsigned int nIndex = Hash(i, vKey);
    vData[nIndex >> 3] |= (1 << (7 & nIndex));
}
  • the hashing function:
return MurmurHash3(nHashNum * 0xFBA4C795 + nTweak, vDataToHash) % (vData.size() * 8);

memusage

A sort of generalized sizeof to measure dynamic memory usage for structures

Consists of a large overload set for the function DynamicUsage<T>:

/** Dynamic memory usage for built-in types is zero. */
static inline size_t DynamicUsage(const int8_t& v) { return 0; }
static inline size_t DynamicUsage(const uint8_t& v) { return 0; }
/* ... */
template<typename X>
static inline size_t DynamicUsage(const std::vector<X>& v)
{
    return MallocUsage(v.capacity() * sizeof(X));
}
/* ... */

memusage (continued)

For STL types they shadow the underlying type structures:

template<typename X>
struct stl_tree_node
{
private:
    int color;
    void* parent;
    void* left;
    void* right;
    X x;
};

template<typename X, typename Y, typename Z>
static inline size_t DynamicUsage(const std::map<X, Y, Z>& m)
{
    return MallocUsage(sizeof(stl_tree_node<std::pair<const X, Y> >)) * m.size();
}
  • MallocUsage is just a small utility function that accounts for alignment

The End

Attributions