* Implement assign_to in prevector
* Implement optimized fill() methods for trivially constructible types in prevector
No need to invoke the "new" operator on every element when the elements
are trivially constructible (e.g. unsigned char)
* Benchmark prevector<...>::const_iterator vs vector.assign()
* Manually invoke ::memmove instead of relying on the stl
Some compilers do not automatically switch to memmove internally, so lets
do this manually.
* Use prevector::assign_to in benchmark
* Rename prevector benchmarks
* Use larger copy ranges in benchmarks
Co-authored-by: UdjinM6 <UdjinM6@users.noreply.github.com>
bc70ab5 Fix header guards using reserved identifiers (Dan Raviv)
Pull request description:
Identifiers beginning with an underscore followed immediately by an uppercase letter are reserved.
Tree-SHA512: 32b45e0aef6f6325bc3cbdea399532437490b753621149374df27e1c1eed6739ad1a09ae368e888cab8d01fb757f1b190c45a0854d2861de39a9296f17e29d9e
86b47fa741408b061ab0bda784b8678bfd7dfa88 speed up Unserialize_impl for prevector (Akio Nakamura)
Pull request description:
The unserializer for prevector uses `resize()` for reserve the area, but it's prefer to use `reserve()` because `resize()` have overhead to call its constructor many times.
However, `reserve()` does not change the value of `_size` (a private member of prevector).
This PR make the logic of read from stream to callback function, and prevector handles initilizing new values with that call-back and ajust the value of `_size`.
The changes are as follows:
1. prevector.h
Add a public member function named 'append'.
This function has 2 params, number of elemenst to append and call-back function that initilizing new appended values.
2. serialize.h
In the following two function:
- `Unserialize_impl(Stream& is, prevector<N, T>& v, const unsigned char&)`
- `Unserialize_impl(Stream& is, prevector<N, T>& v, const V&)`
Make a callback function from each original logic of reading values from stream, and call prevector's `append()`.
3. test/prevector_tests.cpp
Add a test for `append()`.
## A benchmark result is following:
[Machine]
MacBook Pro (macOS 10.13.3/i7 2.2GHz/mem 16GB/SSD)
[result]
DeserializeAndCheckBlockTest => 22% faster
DeserializeBlockTest => 29% faster
[before PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 94.4901, 0.0094644, 0.0104715, 0.0098339
DeserializeBlockTest, 60, 130, 65.0964, 0.00800362, 0.00895134, 0.00824187
[After PR]
# Benchmark, evals, iterations, total, min, max, median
DeserializeAndCheckBlockTest, 60, 160, 77.1597, 0.00767013, 0.00858959, 0.00805757
DeserializeBlockTest, 60, 130, 49.9443, 0.00613926, 0.00691187, 0.00635527
ACKs for top commit:
laanwj:
utACK 86b47fa741408b061ab0bda784b8678bfd7dfa88
Tree-SHA512: 62ea121ccd45a306fefc67485a1b03a853435af762607dae2426a87b15a3033d802c8556e1923727ddd1023a1837d0e5f6720c2c77b38196907e750e15fbb902
5aad635 Use memset() to optimize prevector::resize() (Evan Klitzke)
e46be25 Reduce redundant code of prevector and speed it up (Akio Nakamura)
f0e7aa7 Add new prevector benchmarks. (Evan Klitzke)
Pull request description:
This branch optimizes various `prevector` operations, especially resizing vectors. While profiling the `loadblk` thread I noticed that a lot of time was being spent in `prevector::resize()` which led to this work. I have some data here indicating that it takes up **37%** of the time in `ReadBlockFromDisk()`: https://monad.io/readblockfromdisk.svg
This branch improves things significantly. For trivial types, the new results for the prevector benchmark are:
* `PrevectorClearTrivial` which tests `prevector::clear()` becomes 24.6x faster
* `PrevectorDestructorTrivial` which tests `prevector::~prevector()` becomes 20.5x faster
* `PrevectorResizeTrivial` which tests `prevector::resize()` becomes 20.3x faster
Note that in practice it looks like the prevector is only used to contain `unsigned char` types, which is a trivial type. The benchmarks are testing a bit of an extreme case, but the changes here are motivated by the profiling data for `ReadBlockFromDisk()` I linked to above.
The pull request here consists of a series of three commits:
* The first adds new benchmarks but does not change the prevector code.
* The second is from @AkioNak , and merges some prevector optimizations he submitted in #11988
* The third optimizes `prevector::resize()` to use `memset()` when the prevector contains trivially constructible types
Tree-SHA512: 28f7cbb91a19f9f43b6a5942781d7eb2e3197389186b666f086b69df12bee37773140f765426d715bfb8ebff79cb27a5f1206d0325b54b4aa65598b50fb18368
45a5aaf Only call clear on prevector if it isn't trivially destructible and don't loop in clear (Jeremy Rubin)
aaa02e7 Add prevector destructor benchmark (Jeremy Rubin)
Tree-SHA512: 52bc8163b65b71310252f2d578349d0ddc364a6c23795c5e06e101f5449f04c96cbdca41c0cffb1974b984b8e33006471137d92b8dd4a81a98e922610a94132a
f00705a serialize: Deprecate `begin_ptr` / `end_ptr` (Wladimir J. van der Laan)
47314e6 prevector: add C++11-like data() method (Wladimir J. van der Laan)
swap was using an incorrect condition to determine when to apply an optimization
(not swapping the full direct[] when swapping two indirect prevectors).
Rather than correct the optimization I'm removing it for simplicity. Removing
this optimization minutely improves performance in the typical (currently only)
usage of member swap(), which is swapping with a freshly value-initialized
object.
Fixes a bug in which pop_back did not call the deleted item's destructor.
Using the most general erase() implementation to implement all the others
prevents similar bugs because the coupling between deallocation and destructor
invocation only needs to be maintained in one place.
Also reduces duplication of complex memmove logic.