86b47fa741408b061ab0bda784b8678bfd7dfa88 speed up Unserialize_impl for prevector (Akio Nakamura)
Pull request description:
The unserializer for prevector uses `resize()` to reserve space, but `reserve()` is preferable because `resize()` incurs the overhead of constructing every element.
However, `reserve()` does not change the value of `_size` (a private member of prevector).
This PR moves the logic that reads from the stream into a callback function; prevector initializes the new values through that callback and adjusts the value of `_size`.
The changes are as follows:
1. prevector.h
Add a public member function named `append`.
This function takes two parameters: the number of elements to append and a callback function that initializes the newly appended values.
2. serialize.h
In the following two functions:
- `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
- `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
Turn each function's original stream-reading logic into a callback function, and call prevector's `append()` with it.
3. test/prevector_tests.cpp
Add a test for `append()`.
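The idea can be illustrated with a minimal sketch. This is a toy byte container, not the real prevector: `ByteVec`, `FakeStream` and the exact `append()` signature are assumptions made for illustration only. It shows a callback filling freshly reserved slots in place, after which the container bumps its size, so no separate initialization pass is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Toy byte container (NOT the real prevector): just enough to contrast
// resize-then-overwrite with reserve-then-append.
class ByteVec {
    unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;

public:
    ByteVec() = default;
    ByteVec(const ByteVec&) = delete;
    ByteVec& operator=(const ByteVec&) = delete;
    ~ByteVec() { std::free(data_); }

    std::size_t size() const { return size_; }
    const unsigned char* data() const { return data_; }

    // Grow the allocation without touching size_.
    void reserve(std::size_t n) {
        if (n <= capacity_) return;
        void* p = std::realloc(data_, n);
        assert(p != nullptr);
        data_ = static_cast<unsigned char*>(p);
        capacity_ = n;
    }

    // Hypothetical append(): the callback writes n new elements directly
    // into the reserved area, then size_ is adjusted to publish them.
    template <typename Fill>
    void append(std::size_t n, Fill fill) {
        if (n == 0) return;
        reserve(size_ + n);
        fill(data_ + size_, n);
        size_ += n;
    }
};

// Stand-in for a stream: copies bytes out of a fixed buffer.
struct FakeStream {
    const unsigned char* src;
    void read(unsigned char* dst, std::size_t n) {
        std::memcpy(dst, src, n);
        src += n;
    }
};
```

A caller would pass a lambda such as `[&](unsigned char* dst, std::size_t n) { stream.read(dst, n); }` as the second argument.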
## Benchmark results:
[Machine]
MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
[result]
DeserializeAndCheckBlockTest => 22% faster
DeserializeBlockTest => 29% faster
[before PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187
[After PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527
ACKs for top commit:
laanwj:
utACK 86b47fa741408b061ab0bda784b8678bfd7dfa88
Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
5aad635 Use memset() to optimize prevector::resize() (Evan Klitzke)
e46be25 Reduce redundant code of prevector and speed it up (Akio Nakamura)
f0e7aa7 Add new prevector benchmarks. (Evan Klitzke)
Pull request description:
This branch optimizes various `prevector` operations, especially resizing vectors. While profiling the `loadblk` thread I noticed that a lot of time was being spent in `prevector::resize()` which led to this work. I have some data here indicating that it takes up **37%** of the time in `ReadBlockFromDisk()`: https://monad.io/readblockfromdisk.svg
This branch improves things significantly. For trivial types, the new results for the prevector benchmark are:
* `PrevectorClearTrivial` which tests `prevector::clear()` becomes 24.6x faster
* `PrevectorDestructorTrivial` which tests `prevector::~prevector()` becomes 20.5x faster
* `PrevectorResizeTrivial` which tests `prevector::resize()` becomes 20.3x faster
Note that in practice it looks like the prevector is only used to contain `unsigned char` types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for `ReadBlockFromDisk()` I linked to above.
The pull request here consists of a series of three commits:
* The first adds new benchmarks but does not change the prevector code.
* The second is from @AkioNak , and merges some prevector optimizations he submitted in #11988
* The third optimizes `prevector::resize()` to use `memset()` when the prevector contains trivially constructible types
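A rough sketch of the technique in the third commit, not the actual prevector code: the helper names `fill_new` and `fill_new_impl` and the example type `Tagged` are hypothetical. It dispatches on `std::is_trivially_constructible` so that trivial types are zeroed with a single `memset()` while other types are still constructed one by one. It assumes that all-zero bytes are a valid value-initialized representation, which holds for `unsigned char`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>

// Fast path: trivially constructible types are zeroed in one memset() call.
template <typename T>
void fill_new_impl(T* dst, std::size_t count, std::true_type) {
    std::memset(dst, 0, count * sizeof(T));
}

// Slow path: every element is constructed in place with placement new.
template <typename T>
void fill_new_impl(T* dst, std::size_t count, std::false_type) {
    for (std::size_t i = 0; i < count; ++i) {
        new (static_cast<void*>(dst + i)) T();
    }
}

// Initialize `count` uninitialized slots starting at `dst`,
// choosing the path at compile time via tag dispatch.
template <typename T>
void fill_new(T* dst, std::size_t count) {
    fill_new_impl(dst, count, std::is_trivially_constructible<T>{});
}

// Example of a type that must take the slow path.
struct Tagged {
    int v;
    Tagged() : v(7) {}
};
```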
Tree-SHA512: 28f7cbb91a19f9f43b6a5942781d7eb2e3197389186b666f086b69df12bee37773140f765426d715bfb8ebff79cb27a5f1206d0325b54b4aa65598b50fb18368
* Add timeout params to wait_for*_chainlock methods
* Give chainlocks more time in specific case
* Add logs to llmq-chainlock.py
* Replace wait_for_chainlocked_tip_all_nodes with wait_for_chainlocked_block_all_nodes
wait_for_chainlocked_tip_all_nodes waited for the tip of each individual
node, and those tips are not necessarily the same block. Callers should
instead specify explicitly which block to wait for.
* Get rid of wait_for_chainlocked_tip
Same as with wait_for_chainlocked_tip_all_nodes
This speeds up assumevalid.py from 22s to 7s on my machine. On travis, this
should be an improvement of a few minutes. Without this, Travis actually
fails due to block download timeouts.
* scripted-diff: Rename `wait_for_chainlock*` test functions
-BEGIN VERIFY SCRIPT-
sed -i 's/wait_for_chainlock_tip_all_nodes(/wait_for_chainlocked_tip_all_nodes(/g' test/functional/*.py
sed -i 's/wait_for_chainlock_tip(/wait_for_chainlocked_tip(/g' test/functional/*.py
sed -i 's/wait_for_chainlock(/wait_for_chainlocked_block(/g' test/functional/*.py
sed -i 's/wait_for_chainlock /wait_for_chainlocked_block /g' test/functional/*.py
-END VERIFY SCRIPT-
* Move `wait_for_*chainlock*` functions from individual tests to DashTestFramework
* Use `wait_until` in most Dash-specific `wait_for*` functions instead of custom timers
63c16ed50770bc3d4f0ecd2ffa971fcfa0688494 Use __cpuid_count for gnu C to avoid gitian build fail. (Chun Kuan Lee)
Pull request description:
Fixes #13538
Tree-SHA512: 161ae4db022288ae8631a166eaea2d08cf2c90bcd27218a094a754276de30b92ca9cfb5a79aa899c5a9d0534c5d7261037e7e915e1b92bc7067ab1539dc2b51e
4207c1b35c configure: Initialise assembly enable_* variables (Luke Dashjr)
afe0875577 configure: Skip assembly support checks, when assembly is disabled (Luke Dashjr)
d8ab8dc12d configure: Invert --enable-asm help string since default is now enabled (Luke Dashjr)
Pull request description:
Fixes #13759
Also inverts the help (so it shows `--disable-asm` like other enabled-by-default options), and initialises the flag variables.
ACKs for commit 4207c1:
laanwj:
makes sense, utACK 4207c1b35c2e2ee1c9217cc7db3290a24c3b4b52
achow101:
utACK 4207c1b35c2e2ee1c9217cc7db3290a24c3b4b52
ken2812221:
ACK 4207c1b35c2e2ee1c9217cc7db3290a24c3b4b52
practicalswift:
tACK 4207c1b35c2e2ee1c9217cc7db3290a24c3b4b52
Tree-SHA512: a30be1008fd8f019db34073f78e90a3c4ad3767d88d7c20ebb83e99c7abc23552f7da3ac8bd20f727405799aff1ecb6044cf869653f8db70478a074d0b877e0a
66b2cf1ccfad545a8ec3f2a854e23f647322bf30 Use immintrin.h everywhere for intrinsics (Pieter Wuille)
4c935e2eee456ff66cdfb908b0edffdd1e8a6c04 Add SHA256 implementation using using Intel SHA intrinsics (Pieter Wuille)
268400d3188200c9e3dcd3482c4853354388a721 [Refactor] CPU feature detection logic for SHA256 (Pieter Wuille)
Pull request description:
Based on #13191.
This adds SHA256 implementations that use Intel's SHA Extension instructions (using intrinsics). This needs GCC 4.9 or Clang 3.4.
In addition to #13191, two extra implementations are provided:
* (a) A variable-length SHA256 implementation using SHA extensions.
* (b) A 2-way 64-byte input double-SHA256 implementation using SHA extensions.
Benchmarks for 9001-element Merkle tree root computation on an AMD Ryzen 1800X system:
* Using generic C++ code (pre-#10821): 6.1ms
* Using SSE4 (master, #10821): 4.6ms
* Using 4-way SSE4 specialized for 64-byte inputs (#13191): 2.8ms
* Using 8-way AVX2 specialized for 64-byte inputs (#13191): 2.1ms
* Using 2-way SHA-NI specialized for 64-byte inputs (this PR): 0.56ms
Benchmarks for 32-byte SHA256 on the same system:
* Using SSE4 (master, #10821): 190ns
* Using SHA-NI (this PR): 53ns
Benchmarks for 1000000-byte SHA256 on the same system:
* Using SSE4 (master, #10821): 2.5ms
* Using SHA-NI (this PR): 0.51ms
Tree-SHA512: 2b319e33b22579f815d91f9daf7994a5e1e799c4f73c13e15070dd54ba71f3f6438ccf77ae9cbd1ce76f972d9cbeb5f0edfea3d86f101bbc1055db70e42743b7
57ba401abcfe564a2c4d259e0f758401ed74616d Enable double-SHA256-for-64-byte code on 32-bit x86 (Pieter Wuille)
Pull request description:
The SSE4 and AVX2 double-SHA256-for-64-byte input code from #13191 compiles fine on 32-bit x86 systems, but the autodetection logic in sha256.cpp doesn't enable it. Fix this.
Note that these instruction sets are only available on CPUs that support 64-bit mode as well, so it is only beneficial in the (perhaps unlikely) scenario where a 64-bit CPU is running a 32-bit Bitcoin Core binary.
Tree-SHA512: 39d5963c1ba8c33932549d5fe98bd184932689a40aeba95043eca31dd6824f566197c546b60905555eccaf407408a5f0f200247bb0907450d309b0a70b245102
32d153fa360f73b4999701b97d55b12318fd2659 For AVX2 code, also check for AVX, XSAVE, and OS support (Pieter Wuille)
Pull request description:
Fixes #12903.
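For illustration, the combined check named in the commit title could look roughly like the sketch below. It is not the code from this commit: the function name `Avx2Usable` and the choice to pass raw register values in (rather than executing CPUID and XGETBV) are assumptions made to keep the example portable. The bit positions follow the documented x86 CPUID and XCR0 layout.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether AVX2 code may run, given raw register values.
//   ecx1: ECX returned by CPUID leaf 1
//   ebx7: EBX returned by CPUID leaf 7, subleaf 0
//   xcr0: value returned by XGETBV with ECX = 0
//         (only meaningful when the OSXSAVE bit is set)
bool Avx2Usable(std::uint32_t ecx1, std::uint32_t ebx7, std::uint64_t xcr0) {
    const bool xsave   = (ecx1 >> 26) & 1;  // CPU supports XSAVE
    const bool osxsave = (ecx1 >> 27) & 1;  // OS has enabled XSAVE
    const bool avx     = (ecx1 >> 28) & 1;  // CPU supports AVX
    const bool avx2    = (ebx7 >> 5) & 1;   // CPU supports AVX2
    // XCR0 bits 1 (SSE state) and 2 (AVX state) must both be set,
    // otherwise the OS does not save the YMM registers on context switch.
    const bool os_ymm = (xcr0 & 0x6) == 0x6;
    return xsave && osxsave && avx && avx2 && os_ymm;
}
```

Checking the AVX2 feature bit alone is not sufficient, because the CPU can advertise AVX2 while the operating system has not enabled saving of the wider register state.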
Tree-SHA512: 01e71efb5d3a43c49a145a5b1dc4fe7d0a491e1e78479e7df830a2aaac57c3dcfc316e28984c695206c76f93b68e4350fc037ca36756ca579b7070e39c835da2
1e1eb6367f67dcf968bb62993b98b5873b926fc0 Improve coverage of SHA256 SelfTest code (Pieter Wuille)
Pull request description:
The existing SelfTest code does not cover the specialized double-SHA256-for-64-byte-inputs transforms added in #13191. Fix this.
Tree-SHA512: 593c7ee5dc9e77fc4c89e0a7753a63529b0d3d32ddbc015ae3895b52be77bee8a80bf16b754b30a22c01625a68db83fb77fa945a543143542bebb5b0f017ec5b
f68049dd879c216d1e98b6635eec488f8e936ed4 crypto: cleanup sha256 build (Cory Fields)
Pull request description:
Requested by @sipa in #13386.
Rather than appending all possible cpu variants to all targets, create a convenience variable that encompasses all.
Tree-SHA512: 8e9ab2185515672b79bb7925afa4f3fbfe921bfcbe61456833d15457de4feba95290de17514344ce42ee81cc38b252476cd0c29432ac48c737c2225ed515a4bd
4defdfab94504018f822dc34a313ad26cedc8255 [MOVEONLY] Move unused Merkle branch code to tests (Pieter Wuille)
4437d6e1f3107a20a8c7b66be8b4b972a82e3b28 8-way AVX2 implementation for double SHA256 on 64-byte inputs (Pieter Wuille)
230294bf5fdeba7213471cd0b795fb7aa36e5717 4-way SSE4.1 implementation for double SHA256 on 64-byte inputs (Pieter Wuille)
1f0e7ca09c9d7c5787c218156fa5096a1bdf2ea8 Use SHA256D64 in Merkle root computation (Pieter Wuille)
d0c96328833127284574bfef26f96aa2e4afc91a Specialized double sha256 for 64 byte inputs (Pieter Wuille)
57f34630fb6c3e218bd19535ac607008cb894173 Refactor SHA256 code (Pieter Wuille)
0df017889b4f61860092e1d54e271092cce55f62 Benchmark Merkle root computation (Pieter Wuille)
Pull request description:
This introduces a framework for specialized double-SHA256 with 64 byte inputs. 4 different implementations are provided:
* Generic C++ (reusing the normal SHA256 code)
* Specialized C++ for 64-byte inputs, but no special instructions
* 4-way using SSE4.1 intrinsics
* 8-way using AVX2 intrinsics
On my own system (AVX2 capable), I get these benchmarks for computing the Merkle root of 9001 leaves (supported lengths / special instructions / parallelism):
* 7.2 ms with varsize/naive/1way (master, non-SSE4 hardware)
* 5.8 ms with size64/naive/1way (this PR, non-SSE4 capable systems)
* 4.8 ms with varsize/SSE4/1way (master, SSE4 hardware)
* 2.9 ms with size64/SSE4/4way (this PR, SSE4 hardware)
* 1.1 ms with size64/AVX2/8way (this PR, AVX2 hardware)
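The reason a 64-byte specialization helps Merkle root computation is that every inner node hashes exactly two concatenated 32-byte hashes. The sketch below shows that structure only: `ToyHash64` is a deliberately fake stand-in, NOT SHA256, and `ToyD64` and `MerkleRoot` are hypothetical names. The batch interface (many 64-byte blocks per call) is where a 4-way or 8-way implementation would process several blocks at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for double-SHA256 of one 64-byte input. NOT a real hash.
void ToyHash64(std::uint8_t* out32, const std::uint8_t* in64) {
    for (int i = 0; i < 32; ++i) {
        out32[i] = static_cast<std::uint8_t>(in64[i] * 3 + in64[32 + i] + 1);
    }
}

// Batch interface: hash `blocks` consecutive 64-byte inputs into
// consecutive 32-byte outputs. A SIMD version would handle several
// blocks per step instead of looping one at a time.
void ToyD64(std::uint8_t* out, const std::uint8_t* in, std::size_t blocks) {
    for (std::size_t i = 0; i < blocks; ++i) {
        ToyHash64(out + 32 * i, in + 64 * i);
    }
}

// `level` holds n concatenated 32-byte hashes. Each round pairs them up
// (duplicating the last one if n is odd) and hashes all pairs in one batch.
std::vector<std::uint8_t> MerkleRoot(std::vector<std::uint8_t> level) {
    assert(level.size() % 32 == 0);
    std::size_t n = level.size() / 32;
    if (n == 0) return std::vector<std::uint8_t>(32, 0);
    while (n > 1) {
        if (n & 1) {
            level.resize(level.size() + 32);
            std::memcpy(&level[32 * n], &level[32 * (n - 1)], 32);
            ++n;
        }
        std::vector<std::uint8_t> next(32 * (n / 2));
        ToyD64(next.data(), level.data(), n / 2);
        level.swap(next);
        n /= 2;
    }
    return level;
}
```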
Tree-SHA512: efa32d48b32820d9ce788ead4eb583949265be8c2e5f538c94bc914e92d131a57f8c1ee26c6f998e81fb0e30675d4e2eddc3360bcf632676249036018cff343e
538cc0ca8 build: Mention use of asm in summary (Wladimir J. van der Laan)
ce5381e7f build: Rename --enable-experimental-asm to --enable-asm and enable by default (Wladimir J. van der Laan)
Pull request description:
Now that 0.15 is branched off, enable assembler SHA256 optimizations by default, but still allow disabling them, for example if something goes wrong with auto-detection on a platform.
Also add mention of the use of asm in the configure summary.
Tree-SHA512: cd20c497f65edd6b1e8b2cc3dfe82be11fcf4777543c830ccdec6c10f25eab4576b0f2953f3957736d7e04deaa4efca777aa84b12bb1cecb40c258e86c120ec8
* Cleanup p2p-instantsend.py
Bump mocktime before generating new blocks and generate a few blocks at the end of test_mempool_doublespend to clean things up
* Update test/functional/p2p-instantsend.py
Co-Authored-By: PastaPastaPasta <6443210+PastaPastaPasta@users.noreply.github.com>
* Fix `wait_for_instantlock` to make it fail if instantlock wasn't acquired, use `wait_until`
Currently it simply returns False if islock failed but that's not the way we use it (we never check results).
* Wait for txes to propagate before checking for instantlock
* Ignore recent rejects filter for locked txes
If we had a conflicting tx in the mempool before the locked tx arrived and the locked one arrived before the corresponding islock (i.e. we don't really know it's the one that should be included yet), the locked one is going to be rejected due to a mempool conflict. The old tx is going to be removed from the mempool by an incoming islock a bit later, however, we won't be able to re-request the locked tx until the tip changes because of the recentRejects filter. This patch fixes it.
* Add some explanation
* Remove check for mempool size in CInstantSendManager::CheckCanLock
This should not have been here at all and is already removed in develop.
Recent InstantSend failures on mainnet were partly related to this check.
* Skip autois tests for new IS system
* Remove LogPrints which have been commented out.
We have version control systems for a reason; if we want code to not run, it should be removed. I personally see no value in keeping these around. I presume at one point they were spamming debug.log so we commented them out, but we really should have just removed them.
I believe all of this is Dash-specific code, but any conflicts this does create are so minor that they are not of concern imo.
Signed-off-by: Pasta <pasta@dashboost.org>
* remove a couple of extra comments
Signed-off-by: Pasta <pasta@dashboost.org>
* remove commented out code
Signed-off-by: Pasta <pasta@dashboost.org>
2f041f0e7 contrib/init: Update openrc-run filename (Luke Dashjr)
Pull request description:
OpenRC changed their program binary names in 2014 (3 years ago), and using the old names has loud warnings now
Tree-SHA512: 2b81802b21c32b8df6010142f9593c0b6cc814a052f83b7f5654f6885566e8dbcaf4da772145fa2cf5d94c16c2fb488c5d4879f71021407c4d7b3a3b7e7ed21e
c098c58 Wrap dumpwallet warning and note scripts aren't dumped (MeshCollider)
a38bfbc Add wallet backup text to import*, add* and dumpwallet RPCs (MeshCollider)
Pull request description:
Closes https://github.com/bitcoin/bitcoin/issues/11243
Adds "Requires a new wallet backup" text to `addwitnessaddress`, `importprivkey`, `importmulti`, `importaddress`, `importpubkey`, and `addmultisigaddress`. Also adds a warning to `dumpwallet` that backing up the seed alone is not sufficient to back up non-HD addresses
Tree-SHA512: 76d7cdca54d5b458acf479154620322391b889922525fddd6153f4164cfee393ad743757400cb8f6b1b30f24947df68ea9043b4e509f7df77a8fa05dda370933
4526d21 Add test for multiwallet batch RPC calls (Russell Yanofsky)
74182f2 Add missing batch rpc calls to python coverage logs (Russell Yanofsky)
505530c Add missing multiwallet rpc calls to python coverage logs (Russell Yanofsky)
9f67646 Make AuthServiceProxy._batch method usable (Russell Yanofsky)
e02007a Limit AuthServiceProxyWrapper.__getattr__ wrapping (Russell Yanofsky)
edafc71 Fix uninitialized URI in batch RPC requests (Russell Yanofsky)
Pull request description:
This fixes "Wallet file not specified" errors when making batch wallet RPC calls with more than one wallet loaded. This issue was reported by @NicolasDorier in https://github.com/bitcoin/bitcoin/issues/11257
Request URI is not used for anything except multiwallet request dispatching, so this change has no other effect.
Tree-SHA512: b3907af48a6323f864bb045ee2fa56b604188b835025ef82ba3d81673244c04228d796323cec208a676e7cd578a95ec7c7ba1e84d0158b93844d5dda8f6589b9
720d9e8fa [Wallet] always show help-line of wallet encryption calls (Jonas Schnelli)
Pull request description:
We currently show or hide the wallet encryption RPC calls in the help depending on whether the current wallet is encrypted.
When the wallet is encrypted, `encryptwallet` is hidden and `walletpassphrase`, `walletpassphrasechange` and `walletlock` appear in the help.
This is no longer ideal with multiwallet, because one may want help information in order to target a specific wallet.
IMO it's preferable to have a static help screen (always show everything). The calls that are currently shown or hidden already handle an invalid encryption state fine.
Fixes #11588
Tree-SHA512: 513fecd15248a31361f5143685e8cdeb63dfd3fa7120828917e1db54d936dc3db60d48ce46efa5c3a563a48157fe962689879856eeeed53f904686b12aec204e
d23be30 [verify-commits] Allow revoked keys to expire (Matt Corallo)
Pull request description:
This should fix verify-commits on master.
Tree-SHA512: 9bfca41fdfcdb11f6d07fcbc80a7b2de37706051e963292e0fbb4c608f146c87b65ab1e8395792197b4a7099e89fa045f278a60276672f6540b68d5e15b5a4a7
659b206 Make listsinceblock refuse unknown block hash (Russell Yanofsky)
Pull request description:
Change suggested by @theuni who noticed listsinceblock would ignore invalid block hashes causing it to return a completely unfiltered list of transactions.
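The behavior change can be sketched as follows. This is a simplified model rather than the wallet RPC code: `Tx`, `ListSinceBlock`, the height map and the exception type are all hypothetical. The point is that an unknown hash is now an error instead of silently falling through to the unfiltered case.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

struct Tx {
    std::string id;
    int height;  // height of the block containing the transaction
};

// Return transactions confirmed after the block `since_hash`.
// An empty hash means "no filter"; an unknown hash is rejected.
std::vector<Tx> ListSinceBlock(const std::map<std::string, int>& block_heights,
                               const std::vector<Tx>& txs,
                               const std::string& since_hash) {
    if (since_hash.empty()) return txs;
    const auto it = block_heights.find(since_hash);
    if (it == block_heights.end()) {
        // Previously this case behaved like "no filter" and returned everything.
        throw std::invalid_argument("unknown block hash");
    }
    std::vector<Tx> result;
    for (const Tx& tx : txs) {
        if (tx.height > it->second) result.push_back(tx);
    }
    return result;
}
```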
Tree-SHA512: 3c8fb160265780d1334e856e853ab48e2e18372b8f1fc71ae480c3f45317048cc1fee0055d5c58031981a91b9c2bdbeb8e49a889d04ecba61729ce8109f2ce3f
fa81534 Add share/rpcuser to dist. source code archive (MarcoFalke)
Pull request description:
As the legacy rpcuser and rpcpassword are deprecated since 0.12.0, we should actually include the script to generate the new auth pair in the distributed source code archive.
Ref: #6753
(Tagging for backport, since it is a trivial bugfix)
Tree-SHA512: f2737957a92396444573f41071a785be5fb318df9efeb3ade7e56b3b56d512e5f9ca36723365fe5be8aaee69c5e8d8ed1178510bf02186c848b3910ee001ecb9
3d1c311 Revert "travis: filter out pyenv" (Cory Fields)
a86e81b travis: move back to the minimal image (Cory Fields)
Pull request description:
The most recent update replaced the minimal image with a large one for the
'generic' image. Switching back to 'minimal' should reduce dependencies and
maybe speed us up some.
It should also eliminate the need for aa2e0f09e.
Tree-SHA512: 0e5f3e97e8d97add07ea228bc5ce1e51e8e069950dbb2871a7eece297995f20b671afdf1c68211ce404cba3ba393d61dfef30ed54d46d6805fde9388f6b4455e
97932cd rpc: further constrain the libevent workaround (Cory Fields)
6b58360 rpc: work-around an upstream libevent bug (Cory Fields)
Pull request description:
A rare race condition may trigger while awaiting the body of a message.
This may fix some reported rpc hangs/crashes.
This work-around mimics what libevent does internally once a write has started, which is what usually happens, but not always due to the processing happening on a different thread: e7ff4ef2b4/http.c (L373)
Fixed upstream at: 5ff8eb2637
Tree-SHA512: b9fa97cae9da2a44101c5faf1e3be0b9cbdf722982d35541cf224be31430779c75e519c8ed18d06ab7487bfb1211069b28f22739f126d6c28ca62d3f73b79a52
6262915 Add unit test for stale tip checking (Suhas Daftuar)
83df257 Add CConnmanTest to mutate g_connman in tests (João Barbosa)
ac7b37c Connect to an extra outbound peer if our tip is stale (Suhas Daftuar)
db32a65 Track tip update time and last new block announcement from each peer (Suhas Daftuar)
2d4327d net: Allow connecting to extra outbound peers (Suhas Daftuar)
Pull request description:
This is an alternative approach to #11534. Rather than disconnect an outbound peer when our tip looks stale, instead try to connect to an additional outbound peer.
Periodically, check to see if we have more outbound peers than we target (i.e. if any extra peers are in use), and if so, disconnect the one that least recently announced a new block (breaking ties by choosing the newest peer that we connected to).
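The eviction rule described above can be sketched like this. The struct and function names are hypothetical, and the sketch assumes that a larger peer id means a more recently connected peer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct OutboundPeer {
    std::int64_t id;                       // assumed to grow with connection order
    std::int64_t last_block_announcement;  // timestamp in seconds, 0 if never
};

// Returns the id of the peer to disconnect, or -1 if we are not over target.
std::int64_t PickExtraPeerToEvict(const std::vector<OutboundPeer>& peers,
                                  std::size_t target) {
    if (peers.size() <= target) return -1;
    const OutboundPeer* worst = &peers[0];  // non-empty because size > target
    for (const OutboundPeer& p : peers) {
        const bool older = p.last_block_announcement < worst->last_block_announcement;
        const bool tie = p.last_block_announcement == worst->last_block_announcement;
        // Least recent announcement loses; on a tie the newest connection loses.
        if (older || (tie && p.id > worst->id)) worst = &p;
    }
    return worst->id;
}
```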
Tree-SHA512: 8f19e910e0bb36867f81783e020af225f356451899adfc7ade1895d6d3bd5afe51c83759610dfd10c62090c4fe404efa0283b2f63fde0bd7da898a1aaa7fb281
f3d4adf Make p2p-acceptablock not an extended test (Matt Corallo)
00dcda6 [qa] test that invalid blocks on an invalid chain get a disconnect (Matt Corallo)
015a525 Reject headers building on invalid chains by tracking invalidity (Matt Corallo)
932f118 Accept unrequested blocks with work equal to our tip (Matt Corallo)
3d9c70c Stop always storing blocks from whitelisted peers (Matt Corallo)
3b4ac43 Rewrite p2p-acceptblock in preparation for slight behavior changes (Matt Corallo)
Pull request description:
@sdaftuar pointed out that the version in #11487 was somewhat DoS-able as someone could feed you a valid chain that forked off the last checkpoint block and force you to do lots of work just walking backwards across blocks for each new block they gave you. We came up with a few proposals but settled on the one implemented here as likely the simplest without obvious DoS issues. It uses our existing on-load mapBlockIndex walk to make sure everything that descends from an invalid block is marked as such, and then simply caches blocks which we attempted to connect but which were found to be invalid. To avoid DoS issues during IBD, this will need to depend on #11458.
Includes tests from #11487.
Tree-SHA512: 46aff8332908e122dae72ceb5fe8cd241902c2281a87f58a5fb486bf69d46458d84a096fdcb5f3e8e07fbcf7466232b10c429f4d67855425f11b38ac0bf612e1
2530bf2 net: Add missing lock in ProcessHeadersMessage(...) (practicalswift)
Pull request description:
Add missing lock in `ProcessHeadersMessage(...)`.
Reading the variable `mapBlockIndex` requires holding the mutex `cs_main`.
The new "Disconnect outbound peers relaying invalid headers" code added in commit 37886d5e2f and merged as part of #11568 two days ago did not lock `cs_main` prior to accessing `mapBlockIndex`.
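The general pattern behind the fix, in a minimal sketch that uses standard library types instead of the project's own locking macros. `g_index_mutex` and `g_block_index` are stand-ins playing the roles of `cs_main` and `mapBlockIndex`, and both function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

std::mutex g_index_mutex;                   // plays the role of cs_main
std::map<std::string, int> g_block_index;   // plays the role of mapBlockIndex,
                                            // guarded by g_index_mutex

// Every reader takes the mutex before touching the shared map.
// Returns the stored height, or -1 if the hash is unknown.
int LookupHeight(const std::string& hash) {
    std::lock_guard<std::mutex> lock(g_index_mutex);
    const auto it = g_block_index.find(hash);
    return it == g_block_index.end() ? -1 : it->second;
}

// Writers take the same mutex, so lookups never observe a map mid-update.
void AddBlock(const std::string& hash, int height) {
    std::lock_guard<std::mutex> lock(g_index_mutex);
    g_block_index[hash] = height;
}
```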
Tree-SHA512: b799c234be8043d036183a00bc7867bbf3bd7ffe3baa94c88529da3b3cd0571c31ed11dadfaf29c5b8498341d6d0a3c928029a43b69f3267ef263682c91563a3