Therefore I am perfectly suited to give this talk! :)
The focus of this talk
... -a---- 14-Jan-18 3:30 PM 6353 bitcoind.cpp -a---- 14-Jan-18 3:30 PM 10559 blockencodings.cpp -a---- 14-Jan-18 3:30 PM 7320 blockencodings.h -a---- 14-Jan-18 3:30 PM 11457 bloom.cpp -a---- 14-Jan-18 3:30 PM 5496 bloom.h -a---- 14-Jan-18 3:30 PM 5750 chain.cpp -a---- 14-Jan-18 3:30 PM 16257 chain.h -a---- 14-Jan-18 3:30 PM 19276 chainparams.cpp -a---- 14-Jan-18 3:30 PM 4574 chainparams.h -a---- 14-Jan-18 3:30 PM 2857 chainparamsbase.cpp -a---- 14-Jan-18 3:30 PM 1944 chainparamsbase.h -a---- 14-Jan-18 3:30 PM 140338 chainparamsseeds.h -a---- 14-Jan-18 3:30 PM 880 checkpoints.cpp -a---- 14-Jan-18 3:30 PM 689 checkpoints.h -a---- 14-Jan-18 3:30 PM 7020 checkqueue.h -a---- 14-Jan-18 3:30 PM 3894 clientversion.cpp -a---- 14-Jan-18 3:30 PM 1955 clientversion.h -a---- 14-Jan-18 3:30 PM 10919 coins.cpp -a---- 14-Jan-18 3:30 PM 11020 coins.h ...
.h/.cpp
pair
main.cpp
file
STL
and Boost
and some other embedded libs like:
Qt
for UI
leveldb
for key-value storage
secp256k1
for low-level crypto ops
class
DesignDesign and tone largely set by Satoshi’s original implementation
class
design
class
Design (continued)A Null
-ify technique is used to complement constructors:
class CBlockHeader { public: int32_t nVersion; uint256 hashPrevBlock; uint256 hashMerkleRoot; uint32_t nTime; uint32_t nBits; uint32_t nNonce; CBlockHeader() { SetNull(); }
class
Design (continued)void SetNull() { nVersion = 0; hashPrevBlock.SetNull(); hashMerkleRoot.SetNull(); nTime = 0; nBits = 0; nNonce = 0; } bool IsNull() const { return (nBits == 0); }
Occurrences of C++11/14 specific constructs in all the files
Bitcoin | Ethereum | Ripple | |
---|---|---|---|
Files |
659 |
477 |
3672 |
|
12% |
36% |
36% |
|
7% |
7% |
13% |
|
6% |
13% |
19% |
|
2% |
4% |
3% |
lambda expressions |
2% |
13% |
11% |
|
none |
one file |
1% |
GUARDED_BY(x)
, SCOPED_LOCKABLE
, SHARED_LOCKS_REQUIRED
etc
__attribute__
for various hints
In order to increase the confidence of packaged binaries they are built deterministically
That means that the source code is handled in such a way that it always produces the same binary no matter the triggering conditions/environment
People are encouraged to build their own using a controlled environment (usually a VM with special scripts) rather than rely on packaged distributions in the wild
More info: https://gitian.org/
Major usage: 60% of the files
std::vector
major work horse
std::string
gets major usage as well
map
see good usage
std::runtime_error
The blockchain in memory:
class CChain { private: std::vector<CBlockIndex*> vChain; /* ... */
The unspent coins database:
std::unordered_map<COutPoint, CCoinsCacheEntry, SaltedOutpointHasher> CCoinsMap;
No custom memory management allocator is used. Instead custom allocators are used to enforce security:
zero_after_free_allocator
std::allocator
wrapper that zeroes the memory when it gets released so it’s harder to snoop
secure_allocator
Present in about 20% of the files
A lot of the usage is due to code predating C++11 adoption, before Boost constructs made it into the standard, for ex:
call_once
, thread
, mutex
, unique_lock
filesystem
,
chrono
, etc
signals
and bind
prevalent in the Qt UI code
Testing handled with the Boost Unit Test framework
boost::multi_index
used to store organized transaction data:
typedef boost::multi_index_container< CTransactionRef, boost::multi_index::indexed_by< // sorted by txid boost::multi_index::hashed_unique< boost::multi_index::tag<txid_index>, mempoolentry_txid, SaltedTxidHasher >, // sorted by order in the blockchain boost::multi_index::sequenced< boost::multi_index::tag<insertion_order> > > > indexed_disconnected_transactions;
A reflection mechanism that helps with loading/saving/transfer of objects
To automate work, every class can declare which members gets serialized/deserialized
This is implemented via a combination of template
-ed helper functions and macro glue:
class CBlockFileInfo { public: unsigned int nBlocks; //!< number of blocks stored in file unsigned int nSize; //!< number of used bytes of block file /* ... */ uint64_t nTimeLast; //!< latest time of block in file ADD_SERIALIZE_METHODS; template <typename Stream, typename Operation> inline void SerializationOp(Stream& s, Operation ser_action) { READWRITE(VARINT(nBlocks)); READWRITE(VARINT(nSize)); /* ... */ READWRITE(VARINT(nTimeLast)); }
/** * Implement three methods for serializable objects. These are actually wrappers over * "SerializationOp" template, which implements the body of each class' serialization * code. Adding "ADD_SERIALIZE_METHODS" in the body of the class causes these wrappers to be * added as members. */ #define ADD_SERIALIZE_METHODS \ template<typename Stream> \ void Serialize(Stream& s) const { \ NCONST_PTR(this)->SerializationOp(s, CSerActionSerialize()); \ } \ template<typename Stream> \ void Unserialize(Stream& s) { \ SerializationOp(s, CSerActionUnserialize()); \ }
Within the body of SerializationOp
the READWRITE
generic macro is used. It will expand differently depending if a read(unserialize) or write(serialize) is taking place
#define READWRITE(obj) (::SerReadWrite(s, (obj), ser_action))
template<typename Stream, typename T> inline void SerReadWrite(Stream& s, const T& obj, CSerActionSerialize ser_action) { ::Serialize(s, obj); }
template<typename Stream, typename T> inline void SerReadWrite(Stream& s, T& obj, CSerActionUnserialize ser_action) { ::Unserialize(s, obj); }
template<typename Stream> inline void Serialize(Stream& s, char a ) { ser_writedata8(s, a); } // TODO Get rid of bare char template<typename Stream> inline void Serialize(Stream& s, int8_t a ) { ser_writedata8(s, a); } template<typename Stream> inline void Serialize(Stream& s, uint8_t a ) { ser_writedata8(s, a); } /* ... */ template<typename Stream> inline void Serialize(Stream& s, uint64_t a) { ser_writedata64(s, a); } template<typename Stream> inline void Serialize(Stream& s, float a ) { ser_writedata32(s, ser_float_to_uint32(a)); } template<typename Stream> inline void Serialize(Stream& s, double a ) { ser_writedata64(s, ser_double_to_uint64(a)); } template<typename Stream> inline void Unserialize(Stream& s, char& a ) { a = ser_readdata8(s); } // TODO Get rid of bare char template<typename Stream> inline void Unserialize(Stream& s, int8_t& a ) { a = ser_readdata8(s); } template<typename Stream> inline void Unserialize(Stream& s, uint8_t& a ) { a = ser_readdata8(s); } /* ... */ template<typename Stream> inline void Unserialize(Stream& s, uint64_t& a) { a = ser_readdata64(s); } template<typename Stream> inline void Unserialize(Stream& s, float& a ) { a = ser_uint32_to_float(ser_readdata32(s)); } template<typename Stream> inline void Unserialize(Stream& s, double& a ) { a = ser_uint64_to_double(ser_readdata64(s)); }
template<typename Stream, typename K, typename T> void Serialize(Stream& os, const std::pair<K, T>& item) { Serialize(os, item.first); Serialize(os, item.second); }
template<typename Stream, typename K, typename T> void Unserialize(Stream& is, std::pair<K, T>& item) { Unserialize(is, item.first); Unserialize(is, item.second); }
template<typename Stream, typename T, typename A, typename V> void Serialize_impl(Stream& os, const std::vector<T, A>& v, const V&) { WriteCompactSize(os, v.size()); for (typename std::vector<T, A>::const_iterator vi = v.begin(); vi != v.end(); ++vi) ::Serialize(os, (*vi)); }
template<typename Stream, typename T, typename A, typename V> void Unserialize_impl(Stream& is, std::vector<T, A>& v, const V&) { v.clear(); unsigned int nSize = ReadCompactSize(is); unsigned int i = 0; unsigned int nMid = 0; while (nMid < nSize) { nMid += 5000000 / sizeof(T); if (nMid > nSize) nMid = nSize; v.resize(nMid); for (; i < nMid; i++) Unserialize(is, v[i]); } }
/** * Used to bypass the rule against non-const reference to temporary * where it makes sense with wrappers such as CFlatData or CTxDB */ template<typename T> inline T& REF(const T& val) { return const_cast<T&>(val); }
#define FLATDATA(obj) REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj))) /* ... */ unsigned char ip[16]; /* ... */ READWRITE(FLATDATA(ip));
Interesting data structures that are relatively isolated and reusable
CDataStream
- relatively thin abstraction over a secure std::vector
of bytes
CAutoFile
- non-refcounted RAII wrapper for FILE*
CBufferedFile
- same as above
JSON is used everywhere for a RPC driven communication layer
UniValue getblockcount(const JSONRPCRequest& request) { if (request.fHelp || request.params.size() != 0) throw std::runtime_error( "getblockcount\n" "\nReturns the number of blocks in the longest blockchain.\n" "\nResult:\n" "n (numeric) The current block count\n" "\nExamples:\n" + HelpExampleCli("getblockcount", "") + HelpExampleRpc("getblockcount", "") ); LOCK(cs_main); return chainActive.Height(); }
Variant like structure that represents a JSON object value
Stores key/values as a vector of std::string
UniValue entry(UniValue::VOBJ); entry.pushKV("txid", tx.GetHash().GetHex()); entry.pushKV("hash", tx.GetWitnessHash().GetHex()); entry.pushKV("version", tx.nVersion);
According to the README
: "[it] minimizes template use (contra json_spirit)"
Drop in replacement for std::vector
that stores the first N elements in-place
An interesting mix of the standard array and a dynamic vector
size_type _size; union direct_or_indirect { char direct[sizeof(T) * N]; struct { size_type capacity; char* indirect; }; } _union;
realloc
-ed
iterator
and the reverse, const variants
std::swap
-ing the union
Only usage case is for storing the transaction script opcodes where apparently:
We use a prevector for the script to reduce the considerable memory overhead of vectors in cases where they normally contain a small number of small elements. Tests in October 2015 showed use of this reduced dbcache memory usage by 23% and made an initial sync 13% faster.
A quite simple form of encoding integers depending on their numeric range, in order to save on space/bandwidth
Range | Encoding in bytes |
---|---|
0 .. 252 |
XX |
253 .. 216 - 1 |
0xFD XX XX |
216 .. 232 – 1 |
0xFE XX XX XX XX |
232 .. 264 – 1 |
0xFF XX XX XX XX XX XX XX XX |
A couple of small convenience driven modifications:
indirectmap
limitedmap
Unique Set data structure based on the principles of the cuckoo hash map
Used to avoid double checking transactions - once for the mempool and the other time for the blocks. Replaced a boost::unique_set
as optimization in Oct 2016
std::vector
and a series of hash functions spread them around
8 way hashing is used to better distribute elements in buckets
An interesting technique is used to avoid the need of a modulus when mapping a random 32 bit number to a fixed N: https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
inline std::array<uint32_t, 8> compute_hashes(const Element& e) const { return {{(uint32_t)((hash_function.template operator()<0>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<1>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<2>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<3>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<4>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<5>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<6>(e) * (uint64_t)size) >> 32), (uint32_t)((hash_function.template operator()<7>(e) * (uint64_t)size) >> 32)}}; }
Storing the entire blockchain is not trivial: approx. 155 GB Jan 2018
Lightweight clients use a Bloom filter data structure to optimize validity checking before getting full blocks
for (unsigned int i = 0; i < nHashFuncs; i++) { unsigned int nIndex = Hash(i, vKey); vData[nIndex >> 3] |= (1 << (7 & nIndex)); }
return MurmurHash3(nHashNum * 0xFBA4C795 + nTweak, vDataToHash) % (vData.size() * 8);
A sort of generalized sizeof
to measure dynamic memory usage for structures
Consists of a large overload set for the function DynamicUsage<T>
:
/** Dynamic memory usage for built-in types is zero. */ static inline size_t DynamicUsage(const int8_t& v) { return 0; } static inline size_t DynamicUsage(const uint8_t& v) { return 0; } /* ... */
template<typename X> static inline size_t DynamicUsage(const std::vector<X>& v) { return MallocUsage(v.capacity() * sizeof(X)); } /* ... */
For STL types they shadow the underlying type structures:
template<typename X> struct stl_tree_node { private: int color; void* parent; void* left; void* right; X x; }; template<typename X, typename Y, typename Z> static inline size_t DynamicUsage(const std::map<X, Y, Z>& m) { return MallocUsage(sizeof(stl_tree_node<std::pair<const X, Y> >)) * m.size(); }
MallocUsage
is just a small utility function that accounts for alignment