dash/src/test/streams_tests.cpp

462 lines
15 KiB
C++
Raw Normal View History

// Copyright (c) 2012-2015 The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
Merge #16577: util: CBufferedFile fixes and unit test efd2474d17098c754367b844ec646ebececc7c74 util: CBufferedFile fixes (Larry Ruane) Pull request description: The `CBufferedFile` object guarantees its user is able to "rewind" the data stream (that's being read from a file) up to a certain number of bytes, as specified by the user in the constructor. This guarantee is not honored due to a bug in the `SetPos` method. Such rewinding is done in `LoadExternalBlockFile()` (currently the only user of this object), which deserializes a series of `CBlock` objects. If that function encounters something unexpected in the data stream, which is coming from a `blocks/blk00???.dat` file, it "rewinds" to an earlier position in the stream to try to get in sync again. The `CBufferedFile` object does not actually rewind its file offset; it simply repositions its internal offset, `nReadPos`, to an earlier position within the object's private buffer; this is why there's a limit to how far the user may rewind. If `LoadExternalBlockFile()` needs to rewind (call `blkdat.SetPos()`), the stream may not be positioned as it should be, causing errors in deserialization. This need to rewind is probably rare, which is likely why this bug hasn't been noticed already. But if this object is used elsewhere in the future, this could be a serious problem, especially as, due to the nature of the bug, the `SetPos()` _sometimes_ works. This PR adds a unit test for `CBufferedFile` that fails due to this bug. (Until now it has had no unit tests.) The unit test provides good documentation and examples for developers trying to understand `LoadExternalBlockFile()` and for future users of this object. This PR also adds code to throw an exception from the constructor if the rewind argument is not less than the buffer size (since that doesn't make any sense). Finally, I discovered that the object is too restrictive in one respect: When the deserialization methods call this object's `read` method, a check ensures that the number of bytes being requested is less than the size of the buffer (adjusting for the rewind size), else it throws an exception. This restriction is unnecessary; the object being deserialized can be larger than the buffer because multiple reads from disk can satisfy the request. ACKs for top commit: laanwj: ACK ~after squash.~ efd2474d17098c754367b844ec646ebececc7c74 mzumsande: I had intended to follow up earlier on my last comment, ACK efd2474d17098c754367b844ec646ebececc7c74. I reviewed the code, ran tests and did a successful reindex on testnet with this branch. Tree-SHA512: 695529e0af38bae2af4e0cc2895dda56a71b9059c3de04d32e09c0165a50f6aacee499f2042156ab5eaa6f0349bab6bcca4ef9f6f9ded4e60d4483beab7e4554
2019-09-26 13:37:06 +02:00
#include <random.h>
Backport 11651 (#3358) * scripted-diff: Replace #include "" with #include <> (ryanofsky) -BEGIN VERIFY SCRIPT- for f in \ src/*.cpp \ src/*.h \ src/bench/*.cpp \ src/bench/*.h \ src/compat/*.cpp \ src/compat/*.h \ src/consensus/*.cpp \ src/consensus/*.h \ src/crypto/*.cpp \ src/crypto/*.h \ src/crypto/ctaes/*.h \ src/policy/*.cpp \ src/policy/*.h \ src/primitives/*.cpp \ src/primitives/*.h \ src/qt/*.cpp \ src/qt/*.h \ src/qt/test/*.cpp \ src/qt/test/*.h \ src/rpc/*.cpp \ src/rpc/*.h \ src/script/*.cpp \ src/script/*.h \ src/support/*.cpp \ src/support/*.h \ src/support/allocators/*.h \ src/test/*.cpp \ src/test/*.h \ src/wallet/*.cpp \ src/wallet/*.h \ src/wallet/test/*.cpp \ src/wallet/test/*.h \ src/zmq/*.cpp \ src/zmq/*.h do base=${f%/*}/ relbase=${base#src/} sed -i "s:#include \"\(.*\)\"\(.*\):if test -e \$base'\\1'; then echo \"#include <\"\$relbase\"\\1>\\2\"; else echo \"#include <\\1>\\2\"; fi:e" $f done -END VERIFY SCRIPT- Signed-off-by: Pasta <pasta@dashboost.org> * scripted-diff: Replace #include "" with #include <> (Dash Specific) -BEGIN VERIFY SCRIPT- for f in \ src/bls/*.cpp \ src/bls/*.h \ src/evo/*.cpp \ src/evo/*.h \ src/governance/*.cpp \ src/governance/*.h \ src/llmq/*.cpp \ src/llmq/*.h \ src/masternode/*.cpp \ src/masternode/*.h \ src/privatesend/*.cpp \ src/privatesend/*.h do base=${f%/*}/ relbase=${base#src/} sed -i "s:#include \"\(.*\)\"\(.*\):if test -e \$base'\\1'; then echo \"#include <\"\$relbase\"\\1>\\2\"; else echo \"#include <\\1>\\2\"; fi:e" $f done -END VERIFY SCRIPT- Signed-off-by: Pasta <pasta@dashboost.org> * build: Remove -I for everything but project root Remove -I from build system for everything but the project root, and built-in dependencies. Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/Makefile.test.include * qt: refactor: Use absolute include paths in .ui files * qt: refactor: Changes to make include paths absolute This makes all include paths in the GUI absolute. Many changes are involved as every single source file in src/qt/ assumes to be able to use relative includes. Signed-off-by: Pasta <pasta@dashboost.org> # Conflicts: # src/qt/dash.cpp # src/qt/optionsmodel.cpp # src/qt/test/rpcnestedtests.cpp * test: refactor: Use absolute include paths for test data files * Recommend #include<> syntax in developer notes * refactor: Include obj/build.h instead of build.h * END BACKPORT #11651 Remove trailing whitespace causing travis failure * fix backport 11651 Signed-off-by: Pasta <pasta@dashboost.org> * More of 11651 * fix blockchain.cpp Signed-off-by: pasta <pasta@dashboost.org> * Add missing "qt/" in includes * Add missing "test/" in includes * Fix trailing whitespaces Co-authored-by: Wladimir J. van der Laan <laanwj@gmail.com> Co-authored-by: Russell Yanofsky <russ@yanofsky.org> Co-authored-by: MeshCollider <dobsonsa68@gmail.com> Co-authored-by: UdjinM6 <UdjinM6@users.noreply.github.com>
2020-03-19 23:46:56 +01:00
#include <streams.h>
#include <support/allocators/zeroafterfree.h>
#include <test/test_dash.h>
#include <boost/test/unit_test.hpp>
BOOST_FIXTURE_TEST_SUITE(streams_tests, BasicTestingSetup)
BOOST_AUTO_TEST_CASE(streams_vector_writer)
{
unsigned char a(1);
unsigned char b(2);
unsigned char bytes[] = { 3, 4, 5, 6 };
std::vector<unsigned char> vch;
// Each test runs twice. Serializing a second time at the same starting
// point should yield the same results, even if the first test grew the
// vector.
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 0, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{1, 2}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 0, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{1, 2}}));
vch.clear();
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 2, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 1, 2}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 2, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 1, 2}}));
vch.clear();
vch.resize(5, 0);
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 2, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 1, 2, 0}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 2, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 1, 2, 0}}));
vch.clear();
vch.resize(4, 0);
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 3, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 0, 1, 2}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 3, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 0, 1, 2}}));
vch.clear();
vch.resize(4, 0);
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 4, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 0, 0, 1, 2}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 4, a, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{0, 0, 0, 0, 1, 2}}));
vch.clear();
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 0, bytes);
BOOST_CHECK((vch == std::vector<unsigned char>{{3, 4, 5, 6}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 0, bytes);
BOOST_CHECK((vch == std::vector<unsigned char>{{3, 4, 5, 6}}));
vch.clear();
vch.resize(4, 8);
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 2, a, bytes, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{8, 8, 1, 3, 4, 5, 6, 2}}));
CVectorWriter(SER_NETWORK, INIT_PROTO_VERSION, vch, 2, a, bytes, b);
BOOST_CHECK((vch == std::vector<unsigned char>{{8, 8, 1, 3, 4, 5, 6, 2}}));
vch.clear();
}
Merge #12254: BIP 158: Compact Block Filters for Light Clients 254c85b68794ada713dbdae415db72adf5fcbaf3 bench: Benchmark GCS filter creation and matching. (Jim Posen) f33b717a85363e067316c133a542559d2f4aaeca blockfilter: Optimization on compilers with int128 support. (Jim Posen) 97b64d67daf0336dfb64b132f3e4d6a4c1967da4 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen) a4afb9cadbaecb0676e6475ab8d32a52faecb47a blockfilter: Additional helper methods to compute hash and header. (Jim Posen) cd09c7925b5af4104834971cfe072251e3ac2bda blockfilter: Serialization methods on BlockFilter. (Jim Posen) c1855f6052aca806fdb51be01b30dfeee8b55f40 blockfilter: Construction of basic block filters. (Jim Posen) 53e7874e079f9ddfe8b176f11d46e6b59c7283d5 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen) 558c536e35a25594881693e6ff01d275c88d7af1 blockfilter: Implement GCSFilter Match methods. (Jim Posen) cf70b550054eed36f194eaa13f4a9cb31e32df38 blockfilter: Implement GCSFilter constructors. (Jim Posen) c454f0ac63c6028f54c7eb51683b3ccdb475b19b blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen) 9b622dc72279b027c59d6541cddff53800fc689b streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen) fe943f99bf0a2bbb12e30bc4803c0337e3c95b93 streams: Implement BitStreamReader/Writer classes. (Jim Posen) 87f2d9ee43a9220076b1959d1ca65245d9591be9 streams: Unit test for VectorReader class. (Jim Posen) 947133dec92cd25ec2b3358c09b8614ba6fb40d4 streams: Create VectorReader stream interface for vectors. (Jim Posen) Pull request description: This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block. ### Filter Sizes [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain. As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks: ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png) The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements. The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*: | Stat | Filter Type | |-------|--------------| | Weighted Size Ratio Mean | 0.0198 | | Size Ratio Mean | 0.0224 | | Size Ratio Std Deviation | 0.0202 | | Mean Element Size (bits) | 21.145 | | Approx Theoretical Min Element Size (bits) | 21.025 | Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
2018-08-26 16:57:01 +02:00
BOOST_AUTO_TEST_CASE(streams_vector_reader)
{
std::vector<unsigned char> vch = {1, 255, 3, 4, 5, 6};
VectorReader reader(SER_NETWORK, INIT_PROTO_VERSION, vch, 0);
BOOST_CHECK_EQUAL(reader.size(), 6);
BOOST_CHECK(!reader.empty());
// Read a single byte as an unsigned char.
unsigned char a;
reader >> a;
BOOST_CHECK_EQUAL(a, 1);
BOOST_CHECK_EQUAL(reader.size(), 5);
BOOST_CHECK(!reader.empty());
// Read a single byte as a signed char.
signed char b;
reader >> b;
BOOST_CHECK_EQUAL(b, -1);
BOOST_CHECK_EQUAL(reader.size(), 4);
BOOST_CHECK(!reader.empty());
// Read a 4 bytes as an unsigned int.
unsigned int c;
reader >> c;
BOOST_CHECK_EQUAL(c, 100992003); // 3,4,5,6 in little-endian base-256
BOOST_CHECK_EQUAL(reader.size(), 0);
BOOST_CHECK(reader.empty());
// Reading after end of byte vector throws an error.
signed int d;
BOOST_CHECK_THROW(reader >> d, std::ios_base::failure);
// Read a 4 bytes as a signed int from the beginning of the buffer.
VectorReader new_reader(SER_NETWORK, INIT_PROTO_VERSION, vch, 0);
new_reader >> d;
Merge #12254: BIP 158: Compact Block Filters for Light Clients 254c85b68794ada713dbdae415db72adf5fcbaf3 bench: Benchmark GCS filter creation and matching. (Jim Posen) f33b717a85363e067316c133a542559d2f4aaeca blockfilter: Optimization on compilers with int128 support. (Jim Posen) 97b64d67daf0336dfb64b132f3e4d6a4c1967da4 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen) a4afb9cadbaecb0676e6475ab8d32a52faecb47a blockfilter: Additional helper methods to compute hash and header. (Jim Posen) cd09c7925b5af4104834971cfe072251e3ac2bda blockfilter: Serialization methods on BlockFilter. (Jim Posen) c1855f6052aca806fdb51be01b30dfeee8b55f40 blockfilter: Construction of basic block filters. (Jim Posen) 53e7874e079f9ddfe8b176f11d46e6b59c7283d5 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen) 558c536e35a25594881693e6ff01d275c88d7af1 blockfilter: Implement GCSFilter Match methods. (Jim Posen) cf70b550054eed36f194eaa13f4a9cb31e32df38 blockfilter: Implement GCSFilter constructors. (Jim Posen) c454f0ac63c6028f54c7eb51683b3ccdb475b19b blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen) 9b622dc72279b027c59d6541cddff53800fc689b streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen) fe943f99bf0a2bbb12e30bc4803c0337e3c95b93 streams: Implement BitStreamReader/Writer classes. (Jim Posen) 87f2d9ee43a9220076b1959d1ca65245d9591be9 streams: Unit test for VectorReader class. (Jim Posen) 947133dec92cd25ec2b3358c09b8614ba6fb40d4 streams: Create VectorReader stream interface for vectors. (Jim Posen) Pull request description: This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block. ### Filter Sizes [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain. As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks: ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png) The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements. The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*: | Stat | Filter Type | |-------|--------------| | Weighted Size Ratio Mean | 0.0198 | | Size Ratio Mean | 0.0224 | | Size Ratio Std Deviation | 0.0202 | | Mean Element Size (bits) | 21.145 | | Approx Theoretical Min Element Size (bits) | 21.025 | Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
2018-08-26 16:57:01 +02:00
BOOST_CHECK_EQUAL(d, 67370753); // 1,255,3,4 in little-endian base-256
BOOST_CHECK_EQUAL(new_reader.size(), 2);
BOOST_CHECK(!new_reader.empty());
Merge #12254: BIP 158: Compact Block Filters for Light Clients 254c85b68794ada713dbdae415db72adf5fcbaf3 bench: Benchmark GCS filter creation and matching. (Jim Posen) f33b717a85363e067316c133a542559d2f4aaeca blockfilter: Optimization on compilers with int128 support. (Jim Posen) 97b64d67daf0336dfb64b132f3e4d6a4c1967da4 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen) a4afb9cadbaecb0676e6475ab8d32a52faecb47a blockfilter: Additional helper methods to compute hash and header. (Jim Posen) cd09c7925b5af4104834971cfe072251e3ac2bda blockfilter: Serialization methods on BlockFilter. (Jim Posen) c1855f6052aca806fdb51be01b30dfeee8b55f40 blockfilter: Construction of basic block filters. (Jim Posen) 53e7874e079f9ddfe8b176f11d46e6b59c7283d5 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen) 558c536e35a25594881693e6ff01d275c88d7af1 blockfilter: Implement GCSFilter Match methods. (Jim Posen) cf70b550054eed36f194eaa13f4a9cb31e32df38 blockfilter: Implement GCSFilter constructors. (Jim Posen) c454f0ac63c6028f54c7eb51683b3ccdb475b19b blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen) 9b622dc72279b027c59d6541cddff53800fc689b streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen) fe943f99bf0a2bbb12e30bc4803c0337e3c95b93 streams: Implement BitStreamReader/Writer classes. (Jim Posen) 87f2d9ee43a9220076b1959d1ca65245d9591be9 streams: Unit test for VectorReader class. (Jim Posen) 947133dec92cd25ec2b3358c09b8614ba6fb40d4 streams: Create VectorReader stream interface for vectors. (Jim Posen) Pull request description: This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block. ### Filter Sizes [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain. As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks: ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png) The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements. The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*: | Stat | Filter Type | |-------|--------------| | Weighted Size Ratio Mean | 0.0198 | | Size Ratio Mean | 0.0224 | | Size Ratio Std Deviation | 0.0202 | | Mean Element Size (bits) | 21.145 | | Approx Theoretical Min Element Size (bits) | 21.025 | Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
2018-08-26 16:57:01 +02:00
// Reading after end of byte vector throws an error even if the reader is
// not totally empty.
BOOST_CHECK_THROW(new_reader >> d, std::ios_base::failure);
Merge #12254: BIP 158: Compact Block Filters for Light Clients 254c85b68794ada713dbdae415db72adf5fcbaf3 bench: Benchmark GCS filter creation and matching. (Jim Posen) f33b717a85363e067316c133a542559d2f4aaeca blockfilter: Optimization on compilers with int128 support. (Jim Posen) 97b64d67daf0336dfb64b132f3e4d6a4c1967da4 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen) a4afb9cadbaecb0676e6475ab8d32a52faecb47a blockfilter: Additional helper methods to compute hash and header. (Jim Posen) cd09c7925b5af4104834971cfe072251e3ac2bda blockfilter: Serialization methods on BlockFilter. (Jim Posen) c1855f6052aca806fdb51be01b30dfeee8b55f40 blockfilter: Construction of basic block filters. (Jim Posen) 53e7874e079f9ddfe8b176f11d46e6b59c7283d5 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen) 558c536e35a25594881693e6ff01d275c88d7af1 blockfilter: Implement GCSFilter Match methods. (Jim Posen) cf70b550054eed36f194eaa13f4a9cb31e32df38 blockfilter: Implement GCSFilter constructors. (Jim Posen) c454f0ac63c6028f54c7eb51683b3ccdb475b19b blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen) 9b622dc72279b027c59d6541cddff53800fc689b streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen) fe943f99bf0a2bbb12e30bc4803c0337e3c95b93 streams: Implement BitStreamReader/Writer classes. (Jim Posen) 87f2d9ee43a9220076b1959d1ca65245d9591be9 streams: Unit test for VectorReader class. (Jim Posen) 947133dec92cd25ec2b3358c09b8614ba6fb40d4 streams: Create VectorReader stream interface for vectors. (Jim Posen) Pull request description: This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block. ### Filter Sizes [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain. As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks: ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png) The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements. The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*: | Stat | Filter Type | |-------|--------------| | Weighted Size Ratio Mean | 0.0198 | | Size Ratio Mean | 0.0224 | | Size Ratio Std Deviation | 0.0202 | | Mean Element Size (bits) | 21.145 | | Approx Theoretical Min Element Size (bits) | 21.025 | Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
2018-08-26 16:57:01 +02:00
}
BOOST_AUTO_TEST_CASE(streams_vector_reader_rvalue)
{
std::vector<uint8_t> data{0x82, 0xa7, 0x31};
VectorReader reader(SER_NETWORK, INIT_PROTO_VERSION, data, /* pos= */ 0);
uint32_t varint = 0;
// Deserialize into r-value
reader >> VARINT(varint);
BOOST_CHECK_EQUAL(varint, 54321);
BOOST_CHECK(reader.empty());
}
Merge #12254: BIP 158: Compact Block Filters for Light Clients 254c85b68794ada713dbdae415db72adf5fcbaf3 bench: Benchmark GCS filter creation and matching. (Jim Posen) f33b717a85363e067316c133a542559d2f4aaeca blockfilter: Optimization on compilers with int128 support. (Jim Posen) 97b64d67daf0336dfb64b132f3e4d6a4c1967da4 blockfilter: Unit test against BIP 158 test vectors. (Jim Posen) a4afb9cadbaecb0676e6475ab8d32a52faecb47a blockfilter: Additional helper methods to compute hash and header. (Jim Posen) cd09c7925b5af4104834971cfe072251e3ac2bda blockfilter: Serialization methods on BlockFilter. (Jim Posen) c1855f6052aca806fdb51be01b30dfeee8b55f40 blockfilter: Construction of basic block filters. (Jim Posen) 53e7874e079f9ddfe8b176f11d46e6b59c7283d5 blockfilter: Simple test for GCSFilter construction and Match. (Jim Posen) 558c536e35a25594881693e6ff01d275c88d7af1 blockfilter: Implement GCSFilter Match methods. (Jim Posen) cf70b550054eed36f194eaa13f4a9cb31e32df38 blockfilter: Implement GCSFilter constructors. (Jim Posen) c454f0ac63c6028f54c7eb51683b3ccdb475b19b blockfilter: Declare GCSFilter class for BIP 158 impl. (Jim Posen) 9b622dc72279b027c59d6541cddff53800fc689b streams: Unit tests for BitStreamReader and BitStreamWriter. (Jim Posen) fe943f99bf0a2bbb12e30bc4803c0337e3c95b93 streams: Implement BitStreamReader/Writer classes. (Jim Posen) 87f2d9ee43a9220076b1959d1ca65245d9591be9 streams: Unit test for VectorReader class. (Jim Posen) 947133dec92cd25ec2b3358c09b8614ba6fb40d4 streams: Create VectorReader stream interface for vectors. (Jim Posen) Pull request description: This implements the compact block filter construction in [BIP 158](https://github.com/bitcoin/bips/blob/master/bip-0158.mediawiki). The code is not used anywhere in the Bitcoin Core code base yet. The next step towards [BIP 157](https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki) support would be to create an indexing module similar to `TxIndex` that constructs the basic and extended filters for each validated block. ### Filter Sizes [Here](https://gateway.ipfs.io/ipfs/QmRqaAAQZ5ZX5eqxP7J2R1MzFrc2WDdKSWJEKtQzyawqog) is a CSV of filter sizes for blocks in the main chain. As you can see below, the ratio of filter size to block size drops after the first ~150,000 blocks: ![filter_sizes](https://user-images.githubusercontent.com/881253/42900589-299772d4-8a7e-11e8-886d-0d4f3f4fbe44.png) The reason for the relatively large filter sizes is that Golomb-coded sets only achieve good compression with a sufficient number of elements. Empirically, the average element size with 100 elements is 14% larger than with 10,000 elements. The ratio of filter size to block size is computed without witness data for basic filters. Here is a summary table of filter size ratios *for blocks after height 150,000*: | Stat | Filter Type | |-------|--------------| | Weighted Size Ratio Mean | 0.0198 | | Size Ratio Mean | 0.0224 | | Size Ratio Std Deviation | 0.0202 | | Mean Element Size (bits) | 21.145 | | Approx Theoretical Min Element Size (bits) | 21.025 | Tree-SHA512: 2d045fbfc3fc45490ecb9b08d2f7e4dbbe7cd8c1c939f06bbdb8e8aacfe4c495cdb67c820e52520baebbf8a8305a0efd8e59d3fa8e367574a4b830509a39223f
2018-08-26 16:57:01 +02:00
BOOST_AUTO_TEST_CASE(bitstream_reader_writer)
{
CDataStream data(SER_NETWORK, INIT_PROTO_VERSION);
BitStreamWriter<CDataStream> bit_writer(data);
bit_writer.Write(0, 1);
bit_writer.Write(2, 2);
bit_writer.Write(6, 3);
bit_writer.Write(11, 4);
bit_writer.Write(1, 5);
bit_writer.Write(32, 6);
bit_writer.Write(7, 7);
bit_writer.Write(30497, 16);
bit_writer.Flush();
CDataStream data_copy(data);
uint32_t serialized_int1;
data >> serialized_int1;
BOOST_CHECK_EQUAL(serialized_int1, (uint32_t)0x7700C35A); // NOTE: Serialized as LE
uint16_t serialized_int2;
data >> serialized_int2;
BOOST_CHECK_EQUAL(serialized_int2, (uint16_t)0x1072); // NOTE: Serialized as LE
BitStreamReader<CDataStream> bit_reader(data_copy);
BOOST_CHECK_EQUAL(bit_reader.Read(1), 0);
BOOST_CHECK_EQUAL(bit_reader.Read(2), 2);
BOOST_CHECK_EQUAL(bit_reader.Read(3), 6);
BOOST_CHECK_EQUAL(bit_reader.Read(4), 11);
BOOST_CHECK_EQUAL(bit_reader.Read(5), 1);
BOOST_CHECK_EQUAL(bit_reader.Read(6), 32);
BOOST_CHECK_EQUAL(bit_reader.Read(7), 7);
BOOST_CHECK_EQUAL(bit_reader.Read(16), 30497);
BOOST_CHECK_THROW(bit_reader.Read(8), std::ios_base::failure);
}
BOOST_AUTO_TEST_CASE(streams_serializedata_xor)
{
std::vector<char> in;
std::vector<char> expected_xor;
std::vector<unsigned char> key;
CDataStream ds(in, 0, 0);
// Degenerate case
key.push_back('\x00');
key.push_back('\x00');
ds.Xor(key);
BOOST_CHECK_EQUAL(
std::string(expected_xor.begin(), expected_xor.end()),
std::string(ds.begin(), ds.end()));
in.push_back('\x0f');
in.push_back('\xf0');
expected_xor.push_back('\xf0');
expected_xor.push_back('\x0f');
// Single character key
ds.clear();
ds.insert(ds.begin(), in.begin(), in.end());
key.clear();
key.push_back('\xff');
ds.Xor(key);
BOOST_CHECK_EQUAL(
std::string(expected_xor.begin(), expected_xor.end()),
std::string(ds.begin(), ds.end()));
// Multi character key
in.clear();
expected_xor.clear();
in.push_back('\xf0');
in.push_back('\x0f');
expected_xor.push_back('\x0f');
expected_xor.push_back('\x00');
ds.clear();
ds.insert(ds.begin(), in.begin(), in.end());
key.clear();
key.push_back('\xff');
key.push_back('\x0f');
ds.Xor(key);
BOOST_CHECK_EQUAL(
std::string(expected_xor.begin(), expected_xor.end()),
std::string(ds.begin(), ds.end()));
}
Merge #16577: util: CBufferedFile fixes and unit test efd2474d17098c754367b844ec646ebececc7c74 util: CBufferedFile fixes (Larry Ruane) Pull request description: The `CBufferedFile` object guarantees its user is able to "rewind" the data stream (that's being read from a file) up to a certain number of bytes, as specified by the user in the constructor. This guarantee is not honored due to a bug in the `SetPos` method. Such rewinding is done in `LoadExternalBlockFile()` (currently the only user of this object), which deserializes a series of `CBlock` objects. If that function encounters something unexpected in the data stream, which is coming from a `blocks/blk00???.dat` file, it "rewinds" to an earlier position in the stream to try to get in sync again. The `CBufferedFile` object does not actually rewind its file offset; it simply repositions its internal offset, `nReadPos`, to an earlier position within the object's private buffer; this is why there's a limit to how far the user may rewind. If `LoadExternalBlockFile()` needs to rewind (call `blkdat.SetPos()`), the stream may not be positioned as it should be, causing errors in deserialization. This need to rewind is probably rare, which is likely why this bug hasn't been noticed already. But if this object is used elsewhere in the future, this could be a serious problem, especially as, due to the nature of the bug, the `SetPos()` _sometimes_ works. This PR adds a unit test for `CBufferedFile` that fails due to this bug. (Until now it has had no unit tests.) The unit test provides good documentation and examples for developers trying to understand `LoadExternalBlockFile()` and for future users of this object. This PR also adds code to throw an exception from the constructor if the rewind argument is not less than the buffer size (since that doesn't make any sense). Finally, I discovered that the object is too restrictive in one respect: When the deserialization methods call this object's `read` method, a check ensures that the number of bytes being requested is less than the size of the buffer (adjusting for the rewind size), else it throws an exception. This restriction is unnecessary; the object being deserialized can be larger than the buffer because multiple reads from disk can satisfy the request. ACKs for top commit: laanwj: ACK ~after squash.~ efd2474d17098c754367b844ec646ebececc7c74 mzumsande: I had intended to follow up earlier on my last comment, ACK efd2474d17098c754367b844ec646ebececc7c74. I reviewed the code, ran tests and did a successful reindex on testnet with this branch. Tree-SHA512: 695529e0af38bae2af4e0cc2895dda56a71b9059c3de04d32e09c0165a50f6aacee499f2042156ab5eaa6f0349bab6bcca4ef9f6f9ded4e60d4483beab7e4554
2019-09-26 13:37:06 +02:00
BOOST_AUTO_TEST_CASE(streams_buffered_file)
{
FILE* file = fsbridge::fopen("streams_test_tmp", "w+b");
// The value at each offset is the offset.
for (uint8_t j = 0; j < 40; ++j) {
fwrite(&j, 1, 1, file);
}
rewind(file);
// The buffer size (second arg) must be greater than the rewind
// amount (third arg).
try {
CBufferedFile bfbad(file, 25, 25, 222, 333);
BOOST_CHECK(false);
} catch (const std::exception& e) {
BOOST_CHECK(strstr(e.what(),
"Rewind limit must be less than buffer size") != nullptr);
}
// The buffer is 25 bytes, allow rewinding 10 bytes.
CBufferedFile bf(file, 25, 10, 222, 333);
BOOST_CHECK(!bf.eof());
// These two members have no functional effect.
BOOST_CHECK_EQUAL(bf.GetType(), 222);
BOOST_CHECK_EQUAL(bf.GetVersion(), 333);
uint8_t i;
bf >> i;
BOOST_CHECK_EQUAL(i, 0);
bf >> i;
BOOST_CHECK_EQUAL(i, 1);
// After reading bytes 0 and 1, we're positioned at 2.
BOOST_CHECK_EQUAL(bf.GetPos(), 2);
// Rewind to offset 0, ok (within the 10 byte window).
BOOST_CHECK(bf.SetPos(0));
bf >> i;
BOOST_CHECK_EQUAL(i, 0);
// We can go forward to where we've been, but beyond may fail.
BOOST_CHECK(bf.SetPos(2));
bf >> i;
BOOST_CHECK_EQUAL(i, 2);
// If you know the maximum number of bytes that should be
// read to deserialize the variable, you can limit the read
// extent. The current file offset is 3, so the following
// SetLimit() allows zero bytes to be read.
BOOST_CHECK(bf.SetLimit(3));
try {
bf >> i;
BOOST_CHECK(false);
} catch (const std::exception& e) {
BOOST_CHECK(strstr(e.what(),
"Read attempted past buffer limit") != nullptr);
}
// The default argument removes the limit completely.
BOOST_CHECK(bf.SetLimit());
// The read position should still be at 3 (no change).
BOOST_CHECK_EQUAL(bf.GetPos(), 3);
// Read from current offset, 3, forward until position 10.
for (uint8_t j = 3; j < 10; ++j) {
bf >> i;
BOOST_CHECK_EQUAL(i, j);
}
BOOST_CHECK_EQUAL(bf.GetPos(), 10);
// We're guaranteed (just barely) to be able to rewind to zero.
BOOST_CHECK(bf.SetPos(0));
BOOST_CHECK_EQUAL(bf.GetPos(), 0);
bf >> i;
BOOST_CHECK_EQUAL(i, 0);
// We can set the position forward again up to the farthest
// into the stream we've been, but no farther. (Attempting
// to go farther may succeed, but it's not guaranteed.)
BOOST_CHECK(bf.SetPos(10));
bf >> i;
BOOST_CHECK_EQUAL(i, 10);
BOOST_CHECK_EQUAL(bf.GetPos(), 11);
// Now it's only guaranteed that we can rewind to offset 1
// (current read position, 11, minus rewind amount, 10).
BOOST_CHECK(bf.SetPos(1));
BOOST_CHECK_EQUAL(bf.GetPos(), 1);
bf >> i;
BOOST_CHECK_EQUAL(i, 1);
// We can stream into large variables, even larger than
// the buffer size.
BOOST_CHECK(bf.SetPos(11));
{
uint8_t a[40 - 11];
bf >> a;
for (uint8_t j = 0; j < sizeof(a); ++j) {
BOOST_CHECK_EQUAL(a[j], 11 + j);
}
}
BOOST_CHECK_EQUAL(bf.GetPos(), 40);
// We've read the entire file, the next read should throw.
try {
bf >> i;
BOOST_CHECK(false);
} catch (const std::exception& e) {
BOOST_CHECK(strstr(e.what(),
"CBufferedFile::Fill: end of file") != nullptr);
}
// Attempting to read beyond the end sets the EOF indicator.
BOOST_CHECK(bf.eof());
// Still at offset 40, we can go back 10, to 30.
BOOST_CHECK_EQUAL(bf.GetPos(), 40);
BOOST_CHECK(bf.SetPos(30));
bf >> i;
BOOST_CHECK_EQUAL(i, 30);
BOOST_CHECK_EQUAL(bf.GetPos(), 31);
// We're too far to rewind to position zero.
BOOST_CHECK(!bf.SetPos(0));
// But we should now be positioned at least as far back as allowed
// by the rewind window (relative to our farthest read position, 40).
BOOST_CHECK(bf.GetPos() <= 30);
// We can explicitly close the file, or the destructor will do it.
bf.fclose();
fs::remove("streams_test_tmp");
}
BOOST_AUTO_TEST_CASE(streams_buffered_file_rand)
{
// Make this test deterministic.
SeedInsecureRand(true);
for (int rep = 0; rep < 50; ++rep) {
FILE* file = fsbridge::fopen("streams_test_tmp", "w+b");
size_t fileSize = InsecureRandRange(256);
for (uint8_t i = 0; i < fileSize; ++i) {
fwrite(&i, 1, 1, file);
}
rewind(file);
size_t bufSize = InsecureRandRange(300) + 1;
size_t rewindSize = InsecureRandRange(bufSize);
CBufferedFile bf(file, bufSize, rewindSize, 222, 333);
size_t currentPos = 0;
size_t maxPos = 0;
for (int step = 0; step < 100; ++step) {
if (currentPos >= fileSize)
break;
// We haven't read to the end of the file yet.
BOOST_CHECK(!bf.eof());
BOOST_CHECK_EQUAL(bf.GetPos(), currentPos);
// Pretend the file consists of a series of objects of varying
// sizes; the boundaries of the objects can interact arbitrarily
// with the CBufferFile's internal buffer. These first three
// cases simulate objects of various sizes (1, 2, 5 bytes).
switch (InsecureRandRange(5)) {
case 0: {
uint8_t a[1];
if (currentPos + 1 > fileSize)
continue;
bf.SetLimit(currentPos + 1);
bf >> a;
for (uint8_t i = 0; i < 1; ++i) {
BOOST_CHECK_EQUAL(a[i], currentPos);
currentPos++;
}
break;
}
case 1: {
uint8_t a[2];
if (currentPos + 2 > fileSize)
continue;
bf.SetLimit(currentPos + 2);
bf >> a;
for (uint8_t i = 0; i < 2; ++i) {
BOOST_CHECK_EQUAL(a[i], currentPos);
currentPos++;
}
break;
}
case 2: {
uint8_t a[5];
if (currentPos + 5 > fileSize)
continue;
bf.SetLimit(currentPos + 5);
bf >> a;
for (uint8_t i = 0; i < 5; ++i) {
BOOST_CHECK_EQUAL(a[i], currentPos);
currentPos++;
}
break;
}
case 3: {
// Find a byte value (that is at or ahead of the current position).
size_t find = currentPos + InsecureRandRange(8);
if (find >= fileSize)
find = fileSize - 1;
bf.FindByte(static_cast<char>(find));
// The value at each offset is the offset.
BOOST_CHECK_EQUAL(bf.GetPos(), find);
currentPos = find;
bf.SetLimit(currentPos + 1);
uint8_t i;
bf >> i;
BOOST_CHECK_EQUAL(i, currentPos);
currentPos++;
break;
}
case 4: {
size_t requestPos = InsecureRandRange(maxPos + 4);
bool okay = bf.SetPos(requestPos);
// The new position may differ from the requested position
// because we may not be able to rewind beyond the rewind
// window, and we may not be able to move forward beyond the
// farthest position we've reached so far.
currentPos = bf.GetPos();
BOOST_CHECK_EQUAL(okay, currentPos == requestPos);
// Check that we can position within the rewind window.
if (requestPos <= maxPos &&
maxPos > rewindSize &&
requestPos >= maxPos - rewindSize) {
// We requested a position within the rewind window.
BOOST_CHECK(okay);
}
break;
}
}
if (maxPos < currentPos)
maxPos = currentPos;
}
}
fs::remove("streams_test_tmp");
}
BOOST_AUTO_TEST_SUITE_END()