2013-08-27 07:51:57 +02:00
// Copyright (c) 2009-2010 Satoshi Nakamoto
2023-08-16 19:27:31 +02:00
// Copyright (c) 2009-2020 The Bitcoin Core developers
2014-11-17 03:29:09 +01:00
// Distributed under the MIT software license, see the accompanying
2013-08-27 07:51:57 +02:00
// file COPYING or http://www.opensource.org/licenses/mit-license.php.
2020-03-19 23:46:56 +01:00
# include <txmempool.h>
# include <consensus/consensus.h>
# include <consensus/tx_verify.h>
# include <consensus/validation.h>
2021-02-08 20:36:33 +01:00
# include <hash.h>
2020-03-19 23:46:56 +01:00
# include <policy/fees.h>
2021-02-08 20:36:33 +01:00
# include <policy/policy.h>
2021-11-16 16:19:47 +01:00
# include <policy/settings.h>
2020-03-19 23:46:56 +01:00
# include <reverse_iterator.h>
2024-01-12 04:43:01 +01:00
# include <util/check.h>
2021-06-27 08:33:13 +02:00
# include <util/moneystr.h>
2021-02-08 20:36:33 +01:00
# include <util/system.h>
2021-06-27 08:33:13 +02:00
# include <util/time.h>
2021-02-08 20:36:33 +01:00
# include <validation.h>
2019-11-22 16:35:00 +01:00
# include <validationinterface.h>
2020-03-19 23:46:56 +01:00
# include <evo/specialtx.h>
2023-07-24 18:39:38 +02:00
# include <evo/assetlocktx.h>
2020-03-19 23:46:56 +01:00
# include <evo/providertx.h>
2021-04-16 05:41:16 +02:00
# include <evo/deterministicmns.h>
2021-10-02 19:32:24 +02:00
# include <llmq/instantsend.h>
2019-02-28 13:59:57 +01:00
2023-08-26 11:50:37 +02:00
# include <cmath>
2022-10-15 22:11:49 +02:00
# include <optional>
2024-09-08 18:08:41 +02:00
CTxMemPoolEntry : : CTxMemPoolEntry ( const CTransactionRef & tx , CAmount fee ,
int64_t time , unsigned int entry_height ,
bool spends_coinbase , int64_t sigops_count , LockPoints lp )
: tx { tx } ,
nFee { fee } ,
nTxSize ( tx - > GetTotalSize ( ) ) ,
nUsageSize { RecursiveDynamicUsage ( tx ) } ,
nTime { time } ,
entryHeight { entry_height } ,
spendsCoinbase { spends_coinbase } ,
sigOpCount { sigops_count } ,
lockPoints { lp } ,
nSizeWithDescendants { GetTxSize ( ) } ,
nModFeesWithDescendants { nFee } ,
nSizeWithAncestors { GetTxSize ( ) } ,
nModFeesWithAncestors { nFee } ,
nSigOpCountWithAncestors { sigOpCount } { }
2013-11-11 08:35:14 +01:00
2015-10-26 19:06:06 +01:00
void CTxMemPoolEntry : : UpdateFeeDelta ( int64_t newFeeDelta )
2014-03-17 13:19:54 +01:00
{
2015-11-19 17:18:28 +01:00
nModFeesWithDescendants + = newFeeDelta - feeDelta ;
2016-03-17 13:33:31 +01:00
nModFeesWithAncestors + = newFeeDelta - feeDelta ;
2015-10-26 19:06:06 +01:00
feeDelta = newFeeDelta ;
}
2014-03-17 13:19:54 +01:00
2015-12-04 21:01:22 +01:00
void CTxMemPoolEntry : : UpdateLockPoints ( const LockPoints & lp )
{
lockPoints = lp ;
}
2021-11-08 19:43:24 +01:00
size_t CTxMemPoolEntry : : GetTxSize ( ) const
{
return GetVirtualTransactionSize ( nTxSize , sigOpCount ) ;
}
2015-07-15 20:47:45 +02:00
// Update the given tx for any in-mempool descendants.
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// Assumes that CTxMemPool::m_children is correct for the given tx and all
2015-07-15 20:47:45 +02:00
// descendants.
2016-03-17 13:33:31 +01:00
void CTxMemPool : : UpdateForDescendants ( txiter updateIt , cacheMap & cachedDescendants , const std : : set < uint256 > & setExclude )
2015-07-15 20:47:45 +02:00
{
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
CTxMemPoolEntry : : Children stageEntries , descendants ;
stageEntries = updateIt - > GetMemPoolChildrenConst ( ) ;
2014-03-17 13:19:54 +01:00
2015-07-15 20:47:45 +02:00
while ( ! stageEntries . empty ( ) ) {
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
const CTxMemPoolEntry & descendant = * stageEntries . begin ( ) ;
descendants . insert ( descendant ) ;
stageEntries . erase ( descendant ) ;
const CTxMemPoolEntry : : Children & children = descendant . GetMemPoolChildrenConst ( ) ;
for ( const CTxMemPoolEntry & childEntry : children ) {
cacheMap : : iterator cacheIt = cachedDescendants . find ( mapTx . iterator_to ( childEntry ) ) ;
2015-07-15 20:47:45 +02:00
if ( cacheIt ! = cachedDescendants . end ( ) ) {
// We've already calculated this one, just add the entries for this set
// but don't traverse again.
2018-06-24 16:34:56 +02:00
for ( txiter cacheEntry : cacheIt - > second ) {
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
descendants . insert ( * cacheEntry ) ;
2015-07-15 20:47:45 +02:00
}
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
} else if ( ! descendants . count ( childEntry ) ) {
2016-03-17 13:33:31 +01:00
// Schedule for later processing
stageEntries . insert ( childEntry ) ;
2015-07-15 20:47:45 +02:00
}
2014-03-17 13:19:54 +01:00
}
}
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// descendants now contains all in-mempool descendants of updateIt.
2015-07-15 20:47:45 +02:00
// Update and add to cached descendant map
int64_t modifySize = 0 ;
CAmount modifyFee = 0 ;
int64_t modifyCount = 0 ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
for ( const CTxMemPoolEntry & descendant : descendants ) {
if ( ! setExclude . count ( descendant . GetTx ( ) . GetHash ( ) ) ) {
modifySize + = descendant . GetTxSize ( ) ;
modifyFee + = descendant . GetModifiedFee ( ) ;
2015-07-15 20:47:45 +02:00
modifyCount + + ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
cachedDescendants [ updateIt ] . insert ( mapTx . iterator_to ( descendant ) ) ;
2016-03-17 13:33:31 +01:00
// Update ancestor state for each descendant
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
mapTx . modify ( mapTx . iterator_to ( descendant ) , update_ancestor_state ( updateIt - > GetTxSize ( ) , updateIt - > GetModifiedFee ( ) , 1 , updateIt - > GetSigOpCount ( ) ) ) ;
2014-03-17 13:19:54 +01:00
}
}
2015-07-15 20:47:45 +02:00
mapTx . modify ( updateIt , update_descendant_state ( modifySize , modifyFee , modifyCount ) ) ;
}
2014-03-17 13:19:54 +01:00
2015-07-15 20:47:45 +02:00
// vHashesToUpdate is the set of transaction hashes from a disconnected block
// which has been re-added to the mempool.
2017-03-23 08:16:14 +01:00
// for each entry, look for descendants that are outside vHashesToUpdate, and
2015-07-15 20:47:45 +02:00
// add fee/size information for such descendants to the parent.
2016-03-17 13:33:31 +01:00
// for each such descendant, also update the ancestor state to include the parent.
2015-07-15 20:47:45 +02:00
void CTxMemPool : : UpdateTransactionsFromBlock ( const std : : vector < uint256 > & vHashesToUpdate )
{
2022-05-05 16:24:51 +02:00
AssertLockHeld ( cs ) ;
2015-07-15 20:47:45 +02:00
// For each entry in vHashesToUpdate, store the set of in-mempool, but not
// in-vHashesToUpdate transactions, so that we don't have to recalculate
// descendants when we come across a previously seen entry.
cacheMap mapMemPoolDescendantsToUpdate ;
// Use a set for lookups into vHashesToUpdate (these entries are already
// accounted for in the state of their ancestors)
std : : set < uint256 > setAlreadyIncluded ( vHashesToUpdate . begin ( ) , vHashesToUpdate . end ( ) ) ;
2017-06-22 20:36:16 +02:00
// Iterate in reverse, so that whenever we are looking at a transaction
2015-07-15 20:47:45 +02:00
// we are sure that all in-mempool descendants have already been processed.
// This maximizes the benefit of the descendant cache and guarantees that
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// CTxMemPool::m_children will be updated, an assumption made in
2015-07-15 20:47:45 +02:00
// UpdateForDescendants.
2019-07-13 06:58:15 +02:00
for ( const uint256 & hash : reverse_iterate ( vHashesToUpdate ) ) {
2015-07-15 20:47:45 +02:00
// calculate children from mapNextTx
txiter it = mapTx . find ( hash ) ;
if ( it = = mapTx . end ( ) ) {
continue ;
2014-03-17 13:19:54 +01:00
}
2016-06-03 01:11:02 +02:00
auto iter = mapNextTx . lower_bound ( COutPoint ( hash , 0 ) ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// First calculate the children, and update CTxMemPool::m_children to
// include them, and update their CTxMemPoolEntry::m_parents to include this tx.
2020-01-14 20:31:36 +01:00
// we cache the in-mempool children to avoid duplicate updates
{
2021-02-24 09:07:10 +01:00
WITH_FRESH_EPOCH ( m_epoch ) ;
2020-01-14 20:31:36 +01:00
for ( ; iter ! = mapNextTx . end ( ) & & iter - > first - > hash = = hash ; + + iter ) {
const uint256 & childHash = iter - > second - > GetHash ( ) ;
txiter childIter = mapTx . find ( childHash ) ;
assert ( childIter ! = mapTx . end ( ) ) ;
// We can skip updating entries we've encountered before or that
// are in the block (which are already accounted for).
if ( ! visited ( childIter ) & & ! setAlreadyIncluded . count ( childHash ) ) {
UpdateChild ( it , childIter , true ) ;
UpdateParent ( childIter , it , true ) ;
}
2014-03-17 13:19:54 +01:00
}
2020-01-14 20:31:36 +01:00
} // release epoch guard for UpdateForDescendants
2016-03-17 13:33:31 +01:00
UpdateForDescendants ( it , mapMemPoolDescendantsToUpdate , setAlreadyIncluded ) ;
2015-07-15 20:47:45 +02:00
}
}
2014-07-26 03:29:54 +02:00
Merge bitcoin/bitcoin#21800: mempool/validation: mempool ancestor/descendant limits for packages
accf3d5868460b4b14ab607fd66ac985b086fbb3 [test] mempool package ancestor/descendant limits (glozow)
2b6b26e57c24d2f0abd442c1c33098e3121572ce [test] parameterizable fee for make_chain and create_child_with_parents (glozow)
313c09f7b7beddfdb74c284720d209c81dfdb94f [test] helper function to increase transaction weight (glozow)
f8253d69d6f02850995a11eeb71fedc22e6f6575 extract/rename helper functions from rpc_packages.py (glozow)
3cd663a5d33aa7ef87994e452bced7f192d021a0 [policy] ancestor/descendant limits for packages (glozow)
c6e016aa139c8363e9b38bbc1ba0dca55700b8a7 [mempool] check ancestor/descendant limits for packages (glozow)
f551841d3ec080a2d7a7988c7b35088dff6c5830 [refactor] pass size/count instead of entry to CalculateAncestorsAndCheckLimits (glozow)
97dd1c729d2bbedf9527b914c0cc8267b8a7c21b MOVEONLY: add helper function for calculating ancestors and checking limits (glozow)
f95bbf58aaf72aab8a9c5827b1f162f3b8ac38f4 misc package validation doc improvements (glozow)
Pull request description:
This PR implements a function to calculate mempool ancestors for a package and enforces ancestor/descendant limits on them as a whole. It reuses a portion of `CalculateMemPoolAncestors()`; there's also a small refactor to move the reused code into a generic helper function. Instead of calculating ancestors and descendants on every single transaction in the package and their ancestors, we use a "worst case" heuristic, treating every transaction in the package as each other's ancestor and descendant. This may overestimate everyone's counts, but is still pretty accurate in the our main package use cases, in which at least one of the transactions in the package is directly related to all the others (e.g. 1 parent + 1 child, multiple parents with 1 child, or chains).
Note on Terminology: While "package" is often used to describe groups of related transactions _within_ the mempool, here, I only use package to mean the group of not-in-mempool transactions we are currently validating.
#### Motivation
It would be a potential DoS vector to allow submission of packages to mempool without a proper guard for mempool ancestors/descendants. In general, the purpose of mempool ancestor/descendant limits is to limit the computational complexity of dealing with families during removals and additions. We want to be able to validate multiple transactions on top of the mempool, but also avoid these scenarios:
- We underestimate the ancestors/descendants during package validation and end up with extremely complex families in our mempool (potentially a DoS vector).
- We expend an unreasonable amount of resources calculating everyone's ancestors and descendants during package validation.
ACKs for top commit:
JeremyRubin:
utACK accf3d5
ariard:
ACK accf3d5.
Tree-SHA512: 0d18ce4b77398fe872e0b7c2cc66d3aac2135e561b64029584339e1f4de2a6a16ebab3dd5784f376e119cbafc4d50168b28d3bd95d0b3d01158714ade2e3624d
Signed-off-by: Vijay <vijaydas.mp@gmail.com>
2021-08-09 05:53:10 +02:00
bool CTxMemPool : : CalculateAncestorsAndCheckLimits ( size_t entry_size ,
size_t entry_count ,
setEntries & setAncestors ,
CTxMemPoolEntry : : Parents & staged_ancestors ,
uint64_t limitAncestorCount ,
uint64_t limitAncestorSize ,
uint64_t limitDescendantCount ,
uint64_t limitDescendantSize ,
std : : string & errString ) const
2015-07-15 20:47:45 +02:00
{
Merge bitcoin/bitcoin#21800: mempool/validation: mempool ancestor/descendant limits for packages
accf3d5868460b4b14ab607fd66ac985b086fbb3 [test] mempool package ancestor/descendant limits (glozow)
2b6b26e57c24d2f0abd442c1c33098e3121572ce [test] parameterizable fee for make_chain and create_child_with_parents (glozow)
313c09f7b7beddfdb74c284720d209c81dfdb94f [test] helper function to increase transaction weight (glozow)
f8253d69d6f02850995a11eeb71fedc22e6f6575 extract/rename helper functions from rpc_packages.py (glozow)
3cd663a5d33aa7ef87994e452bced7f192d021a0 [policy] ancestor/descendant limits for packages (glozow)
c6e016aa139c8363e9b38bbc1ba0dca55700b8a7 [mempool] check ancestor/descendant limits for packages (glozow)
f551841d3ec080a2d7a7988c7b35088dff6c5830 [refactor] pass size/count instead of entry to CalculateAncestorsAndCheckLimits (glozow)
97dd1c729d2bbedf9527b914c0cc8267b8a7c21b MOVEONLY: add helper function for calculating ancestors and checking limits (glozow)
f95bbf58aaf72aab8a9c5827b1f162f3b8ac38f4 misc package validation doc improvements (glozow)
Pull request description:
This PR implements a function to calculate mempool ancestors for a package and enforces ancestor/descendant limits on them as a whole. It reuses a portion of `CalculateMemPoolAncestors()`; there's also a small refactor to move the reused code into a generic helper function. Instead of calculating ancestors and descendants on every single transaction in the package and their ancestors, we use a "worst case" heuristic, treating every transaction in the package as each other's ancestor and descendant. This may overestimate everyone's counts, but is still pretty accurate in the our main package use cases, in which at least one of the transactions in the package is directly related to all the others (e.g. 1 parent + 1 child, multiple parents with 1 child, or chains).
Note on Terminology: While "package" is often used to describe groups of related transactions _within_ the mempool, here, I only use package to mean the group of not-in-mempool transactions we are currently validating.
#### Motivation
It would be a potential DoS vector to allow submission of packages to mempool without a proper guard for mempool ancestors/descendants. In general, the purpose of mempool ancestor/descendant limits is to limit the computational complexity of dealing with families during removals and additions. We want to be able to validate multiple transactions on top of the mempool, but also avoid these scenarios:
- We underestimate the ancestors/descendants during package validation and end up with extremely complex families in our mempool (potentially a DoS vector).
- We expend an unreasonable amount of resources calculating everyone's ancestors and descendants during package validation.
ACKs for top commit:
JeremyRubin:
utACK accf3d5
ariard:
ACK accf3d5.
Tree-SHA512: 0d18ce4b77398fe872e0b7c2cc66d3aac2135e561b64029584339e1f4de2a6a16ebab3dd5784f376e119cbafc4d50168b28d3bd95d0b3d01158714ade2e3624d
Signed-off-by: Vijay <vijaydas.mp@gmail.com>
2021-08-09 05:53:10 +02:00
size_t totalSizeWithAncestors = entry_size ;
2014-03-17 13:19:54 +01:00
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
while ( ! staged_ancestors . empty ( ) ) {
const CTxMemPoolEntry & stage = staged_ancestors . begin ( ) - > get ( ) ;
txiter stageit = mapTx . iterator_to ( stage ) ;
2014-03-17 13:19:54 +01:00
2015-07-15 20:47:45 +02:00
setAncestors . insert ( stageit ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
staged_ancestors . erase ( stage ) ;
2015-07-15 20:47:45 +02:00
totalSizeWithAncestors + = stageit - > GetTxSize ( ) ;
2014-03-17 13:19:54 +01:00
Merge bitcoin/bitcoin#21800: mempool/validation: mempool ancestor/descendant limits for packages
accf3d5868460b4b14ab607fd66ac985b086fbb3 [test] mempool package ancestor/descendant limits (glozow)
2b6b26e57c24d2f0abd442c1c33098e3121572ce [test] parameterizable fee for make_chain and create_child_with_parents (glozow)
313c09f7b7beddfdb74c284720d209c81dfdb94f [test] helper function to increase transaction weight (glozow)
f8253d69d6f02850995a11eeb71fedc22e6f6575 extract/rename helper functions from rpc_packages.py (glozow)
3cd663a5d33aa7ef87994e452bced7f192d021a0 [policy] ancestor/descendant limits for packages (glozow)
c6e016aa139c8363e9b38bbc1ba0dca55700b8a7 [mempool] check ancestor/descendant limits for packages (glozow)
f551841d3ec080a2d7a7988c7b35088dff6c5830 [refactor] pass size/count instead of entry to CalculateAncestorsAndCheckLimits (glozow)
97dd1c729d2bbedf9527b914c0cc8267b8a7c21b MOVEONLY: add helper function for calculating ancestors and checking limits (glozow)
f95bbf58aaf72aab8a9c5827b1f162f3b8ac38f4 misc package validation doc improvements (glozow)
Pull request description:
This PR implements a function to calculate mempool ancestors for a package and enforces ancestor/descendant limits on them as a whole. It reuses a portion of `CalculateMemPoolAncestors()`; there's also a small refactor to move the reused code into a generic helper function. Instead of calculating ancestors and descendants on every single transaction in the package and their ancestors, we use a "worst case" heuristic, treating every transaction in the package as each other's ancestor and descendant. This may overestimate everyone's counts, but is still pretty accurate in the our main package use cases, in which at least one of the transactions in the package is directly related to all the others (e.g. 1 parent + 1 child, multiple parents with 1 child, or chains).
Note on Terminology: While "package" is often used to describe groups of related transactions _within_ the mempool, here, I only use package to mean the group of not-in-mempool transactions we are currently validating.
#### Motivation
It would be a potential DoS vector to allow submission of packages to mempool without a proper guard for mempool ancestors/descendants. In general, the purpose of mempool ancestor/descendant limits is to limit the computational complexity of dealing with families during removals and additions. We want to be able to validate multiple transactions on top of the mempool, but also avoid these scenarios:
- We underestimate the ancestors/descendants during package validation and end up with extremely complex families in our mempool (potentially a DoS vector).
- We expend an unreasonable amount of resources calculating everyone's ancestors and descendants during package validation.
ACKs for top commit:
JeremyRubin:
utACK accf3d5
ariard:
ACK accf3d5.
Tree-SHA512: 0d18ce4b77398fe872e0b7c2cc66d3aac2135e561b64029584339e1f4de2a6a16ebab3dd5784f376e119cbafc4d50168b28d3bd95d0b3d01158714ade2e3624d
Signed-off-by: Vijay <vijaydas.mp@gmail.com>
2021-08-09 05:53:10 +02:00
if ( stageit - > GetSizeWithDescendants ( ) + entry_size > limitDescendantSize ) {
2015-07-15 20:47:45 +02:00
errString = strprintf ( " exceeds descendant size limit for tx %s [limit: %u] " , stageit - > GetTx ( ) . GetHash ( ) . ToString ( ) , limitDescendantSize ) ;
return false ;
Merge bitcoin/bitcoin#21800: mempool/validation: mempool ancestor/descendant limits for packages
accf3d5868460b4b14ab607fd66ac985b086fbb3 [test] mempool package ancestor/descendant limits (glozow)
2b6b26e57c24d2f0abd442c1c33098e3121572ce [test] parameterizable fee for make_chain and create_child_with_parents (glozow)
313c09f7b7beddfdb74c284720d209c81dfdb94f [test] helper function to increase transaction weight (glozow)
f8253d69d6f02850995a11eeb71fedc22e6f6575 extract/rename helper functions from rpc_packages.py (glozow)
3cd663a5d33aa7ef87994e452bced7f192d021a0 [policy] ancestor/descendant limits for packages (glozow)
c6e016aa139c8363e9b38bbc1ba0dca55700b8a7 [mempool] check ancestor/descendant limits for packages (glozow)
f551841d3ec080a2d7a7988c7b35088dff6c5830 [refactor] pass size/count instead of entry to CalculateAncestorsAndCheckLimits (glozow)
97dd1c729d2bbedf9527b914c0cc8267b8a7c21b MOVEONLY: add helper function for calculating ancestors and checking limits (glozow)
f95bbf58aaf72aab8a9c5827b1f162f3b8ac38f4 misc package validation doc improvements (glozow)
Pull request description:
This PR implements a function to calculate mempool ancestors for a package and enforces ancestor/descendant limits on them as a whole. It reuses a portion of `CalculateMemPoolAncestors()`; there's also a small refactor to move the reused code into a generic helper function. Instead of calculating ancestors and descendants on every single transaction in the package and their ancestors, we use a "worst case" heuristic, treating every transaction in the package as each other's ancestor and descendant. This may overestimate everyone's counts, but is still pretty accurate in the our main package use cases, in which at least one of the transactions in the package is directly related to all the others (e.g. 1 parent + 1 child, multiple parents with 1 child, or chains).
Note on Terminology: While "package" is often used to describe groups of related transactions _within_ the mempool, here, I only use package to mean the group of not-in-mempool transactions we are currently validating.
#### Motivation
It would be a potential DoS vector to allow submission of packages to mempool without a proper guard for mempool ancestors/descendants. In general, the purpose of mempool ancestor/descendant limits is to limit the computational complexity of dealing with families during removals and additions. We want to be able to validate multiple transactions on top of the mempool, but also avoid these scenarios:
- We underestimate the ancestors/descendants during package validation and end up with extremely complex families in our mempool (potentially a DoS vector).
- We expend an unreasonable amount of resources calculating everyone's ancestors and descendants during package validation.
ACKs for top commit:
JeremyRubin:
utACK accf3d5
ariard:
ACK accf3d5.
Tree-SHA512: 0d18ce4b77398fe872e0b7c2cc66d3aac2135e561b64029584339e1f4de2a6a16ebab3dd5784f376e119cbafc4d50168b28d3bd95d0b3d01158714ade2e3624d
Signed-off-by: Vijay <vijaydas.mp@gmail.com>
2021-08-09 05:53:10 +02:00
} else if ( stageit - > GetCountWithDescendants ( ) + entry_count > limitDescendantCount ) {
2015-07-15 20:47:45 +02:00
errString = strprintf ( " too many descendants for tx %s [limit: %u] " , stageit - > GetTx ( ) . GetHash ( ) . ToString ( ) , limitDescendantCount ) ;
return false ;
} else if ( totalSizeWithAncestors > limitAncestorSize ) {
errString = strprintf ( " exceeds ancestor size limit [limit: %u] " , limitAncestorSize ) ;
return false ;
}
2014-03-17 13:19:54 +01:00
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
const CTxMemPoolEntry : : Parents & parents = stageit - > GetMemPoolParentsConst ( ) ;
for ( const CTxMemPoolEntry & parent : parents ) {
txiter parent_it = mapTx . iterator_to ( parent ) ;
2015-07-15 20:47:45 +02:00
// If this is a new ancestor, add it.
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
if ( setAncestors . count ( parent_it ) = = 0 ) {
staged_ancestors . insert ( parent ) ;
2015-07-15 20:47:45 +02:00
}
Merge bitcoin/bitcoin#21800: mempool/validation: mempool ancestor/descendant limits for packages
accf3d5868460b4b14ab607fd66ac985b086fbb3 [test] mempool package ancestor/descendant limits (glozow)
2b6b26e57c24d2f0abd442c1c33098e3121572ce [test] parameterizable fee for make_chain and create_child_with_parents (glozow)
313c09f7b7beddfdb74c284720d209c81dfdb94f [test] helper function to increase transaction weight (glozow)
f8253d69d6f02850995a11eeb71fedc22e6f6575 extract/rename helper functions from rpc_packages.py (glozow)
3cd663a5d33aa7ef87994e452bced7f192d021a0 [policy] ancestor/descendant limits for packages (glozow)
c6e016aa139c8363e9b38bbc1ba0dca55700b8a7 [mempool] check ancestor/descendant limits for packages (glozow)
f551841d3ec080a2d7a7988c7b35088dff6c5830 [refactor] pass size/count instead of entry to CalculateAncestorsAndCheckLimits (glozow)
97dd1c729d2bbedf9527b914c0cc8267b8a7c21b MOVEONLY: add helper function for calculating ancestors and checking limits (glozow)
f95bbf58aaf72aab8a9c5827b1f162f3b8ac38f4 misc package validation doc improvements (glozow)
Pull request description:
This PR implements a function to calculate mempool ancestors for a package and enforces ancestor/descendant limits on them as a whole. It reuses a portion of `CalculateMemPoolAncestors()`; there's also a small refactor to move the reused code into a generic helper function. Instead of calculating ancestors and descendants on every single transaction in the package and their ancestors, we use a "worst case" heuristic, treating every transaction in the package as each other's ancestor and descendant. This may overestimate everyone's counts, but is still pretty accurate in the our main package use cases, in which at least one of the transactions in the package is directly related to all the others (e.g. 1 parent + 1 child, multiple parents with 1 child, or chains).
Note on Terminology: While "package" is often used to describe groups of related transactions _within_ the mempool, here, I only use package to mean the group of not-in-mempool transactions we are currently validating.
#### Motivation
It would be a potential DoS vector to allow submission of packages to mempool without a proper guard for mempool ancestors/descendants. In general, the purpose of mempool ancestor/descendant limits is to limit the computational complexity of dealing with families during removals and additions. We want to be able to validate multiple transactions on top of the mempool, but also avoid these scenarios:
- We underestimate the ancestors/descendants during package validation and end up with extremely complex families in our mempool (potentially a DoS vector).
- We expend an unreasonable amount of resources calculating everyone's ancestors and descendants during package validation.
ACKs for top commit:
JeremyRubin:
utACK accf3d5
ariard:
ACK accf3d5.
Tree-SHA512: 0d18ce4b77398fe872e0b7c2cc66d3aac2135e561b64029584339e1f4de2a6a16ebab3dd5784f376e119cbafc4d50168b28d3bd95d0b3d01158714ade2e3624d
Signed-off-by: Vijay <vijaydas.mp@gmail.com>
2021-08-09 05:53:10 +02:00
if ( staged_ancestors . size ( ) + setAncestors . size ( ) + entry_count > limitAncestorCount ) {
2015-07-15 20:47:45 +02:00
errString = strprintf ( " too many unconfirmed ancestors [limit: %u] " , limitAncestorCount ) ;
return false ;
}
}
2014-03-17 13:19:54 +01:00
}
2015-07-15 20:47:45 +02:00
return true ;
}
2014-03-17 13:19:54 +01:00
Merge bitcoin/bitcoin#21800: mempool/validation: mempool ancestor/descendant limits for packages
accf3d5868460b4b14ab607fd66ac985b086fbb3 [test] mempool package ancestor/descendant limits (glozow)
2b6b26e57c24d2f0abd442c1c33098e3121572ce [test] parameterizable fee for make_chain and create_child_with_parents (glozow)
313c09f7b7beddfdb74c284720d209c81dfdb94f [test] helper function to increase transaction weight (glozow)
f8253d69d6f02850995a11eeb71fedc22e6f6575 extract/rename helper functions from rpc_packages.py (glozow)
3cd663a5d33aa7ef87994e452bced7f192d021a0 [policy] ancestor/descendant limits for packages (glozow)
c6e016aa139c8363e9b38bbc1ba0dca55700b8a7 [mempool] check ancestor/descendant limits for packages (glozow)
f551841d3ec080a2d7a7988c7b35088dff6c5830 [refactor] pass size/count instead of entry to CalculateAncestorsAndCheckLimits (glozow)
97dd1c729d2bbedf9527b914c0cc8267b8a7c21b MOVEONLY: add helper function for calculating ancestors and checking limits (glozow)
f95bbf58aaf72aab8a9c5827b1f162f3b8ac38f4 misc package validation doc improvements (glozow)
Pull request description:
This PR implements a function to calculate mempool ancestors for a package and enforces ancestor/descendant limits on them as a whole. It reuses a portion of `CalculateMemPoolAncestors()`; there's also a small refactor to move the reused code into a generic helper function. Instead of calculating ancestors and descendants on every single transaction in the package and their ancestors, we use a "worst case" heuristic, treating every transaction in the package as each other's ancestor and descendant. This may overestimate everyone's counts, but is still pretty accurate in the our main package use cases, in which at least one of the transactions in the package is directly related to all the others (e.g. 1 parent + 1 child, multiple parents with 1 child, or chains).
Note on Terminology: While "package" is often used to describe groups of related transactions _within_ the mempool, here, I only use package to mean the group of not-in-mempool transactions we are currently validating.
#### Motivation
It would be a potential DoS vector to allow submission of packages to mempool without a proper guard for mempool ancestors/descendants. In general, the purpose of mempool ancestor/descendant limits is to limit the computational complexity of dealing with families during removals and additions. We want to be able to validate multiple transactions on top of the mempool, but also avoid these scenarios:
- We underestimate the ancestors/descendants during package validation and end up with extremely complex families in our mempool (potentially a DoS vector).
- We expend an unreasonable amount of resources calculating everyone's ancestors and descendants during package validation.
ACKs for top commit:
JeremyRubin:
utACK accf3d5
ariard:
ACK accf3d5.
Tree-SHA512: 0d18ce4b77398fe872e0b7c2cc66d3aac2135e561b64029584339e1f4de2a6a16ebab3dd5784f376e119cbafc4d50168b28d3bd95d0b3d01158714ade2e3624d
Signed-off-by: Vijay <vijaydas.mp@gmail.com>
2021-08-09 05:53:10 +02:00
bool CTxMemPool : : CheckPackageLimits ( const Package & package ,
uint64_t limitAncestorCount ,
uint64_t limitAncestorSize ,
uint64_t limitDescendantCount ,
uint64_t limitDescendantSize ,
std : : string & errString ) const
{
CTxMemPoolEntry : : Parents staged_ancestors ;
size_t total_size = 0 ;
for ( const auto & tx : package ) {
total_size + = GetVirtualTransactionSize ( * tx ) ;
for ( const auto & input : tx - > vin ) {
std : : optional < txiter > piter = GetIter ( input . prevout . hash ) ;
if ( piter ) {
staged_ancestors . insert ( * * piter ) ;
if ( staged_ancestors . size ( ) + package . size ( ) > limitAncestorCount ) {
errString = strprintf ( " too many unconfirmed parents [limit: %u] " , limitAncestorCount ) ;
return false ;
}
}
}
}
// When multiple transactions are passed in, the ancestors and descendants of all transactions
// considered together must be within limits even if they are not interdependent. This may be
// stricter than the limits for each individual transaction.
setEntries setAncestors ;
const auto ret = CalculateAncestorsAndCheckLimits ( total_size , package . size ( ) ,
setAncestors , staged_ancestors ,
limitAncestorCount , limitAncestorSize ,
limitDescendantCount , limitDescendantSize , errString ) ;
// It's possible to overestimate the ancestor/descendant totals.
if ( ! ret ) errString . insert ( 0 , " possibly " ) ;
return ret ;
}
bool CTxMemPool : : CalculateMemPoolAncestors ( const CTxMemPoolEntry & entry ,
setEntries & setAncestors ,
uint64_t limitAncestorCount ,
uint64_t limitAncestorSize ,
uint64_t limitDescendantCount ,
uint64_t limitDescendantSize ,
std : : string & errString ,
bool fSearchForParents /* = true */ ) const
{
CTxMemPoolEntry : : Parents staged_ancestors ;
const CTransaction & tx = entry . GetTx ( ) ;
if ( fSearchForParents ) {
// Get parents of this transaction that are in the mempool
// GetMemPoolParents() is only valid for entries in the mempool, so we
// iterate mapTx to find parents.
for ( unsigned int i = 0 ; i < tx . vin . size ( ) ; i + + ) {
std : : optional < txiter > piter = GetIter ( tx . vin [ i ] . prevout . hash ) ;
if ( piter ) {
staged_ancestors . insert ( * * piter ) ;
if ( staged_ancestors . size ( ) + 1 > limitAncestorCount ) {
errString = strprintf ( " too many unconfirmed parents [limit: %u] " , limitAncestorCount ) ;
return false ;
}
}
}
} else {
// If we're not searching for parents, we require this to already be an
// entry in the mempool and use the entry's cached parents.
txiter it = mapTx . iterator_to ( entry ) ;
staged_ancestors = it - > GetMemPoolParentsConst ( ) ;
}
return CalculateAncestorsAndCheckLimits ( entry . GetTxSize ( ) , /* entry_count */ 1 ,
setAncestors , staged_ancestors ,
limitAncestorCount , limitAncestorSize ,
limitDescendantCount , limitDescendantSize , errString ) ;
}
2015-07-15 20:47:45 +02:00
void CTxMemPool : : UpdateAncestorsOf ( bool add , txiter it , setEntries & setAncestors )
{
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
CTxMemPoolEntry : : Parents parents = it - > GetMemPoolParents ( ) ;
2015-07-15 20:47:45 +02:00
// add or remove this tx as a child of each parent
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
for ( const CTxMemPoolEntry & parent : parents ) {
UpdateChild ( mapTx . iterator_to ( parent ) , it , add ) ;
2015-07-15 20:47:45 +02:00
}
const int64_t updateCount = ( add ? 1 : - 1 ) ;
const int64_t updateSize = updateCount * it - > GetTxSize ( ) ;
2015-11-19 17:18:28 +01:00
const CAmount updateFee = updateCount * it - > GetModifiedFee ( ) ;
2019-07-05 09:06:28 +02:00
for ( txiter ancestorIt : setAncestors ) {
2015-07-15 20:47:45 +02:00
mapTx . modify ( ancestorIt , update_descendant_state ( updateSize , updateFee , updateCount ) ) ;
}
}
2014-03-17 13:19:54 +01:00
2016-03-17 13:33:31 +01:00
void CTxMemPool : : UpdateEntryForAncestors ( txiter it , const setEntries & setAncestors )
{
int64_t updateCount = setAncestors . size ( ) ;
int64_t updateSize = 0 ;
CAmount updateFee = 0 ;
int updateSigOps = 0 ;
2019-07-05 09:06:28 +02:00
for ( txiter ancestorIt : setAncestors ) {
2016-03-17 13:33:31 +01:00
updateSize + = ancestorIt - > GetTxSize ( ) ;
updateFee + = ancestorIt - > GetModifiedFee ( ) ;
updateSigOps + = ancestorIt - > GetSigOpCount ( ) ;
}
mapTx . modify ( it , update_ancestor_state ( updateSize , updateFee , updateCount , updateSigOps ) ) ;
}
2015-07-15 20:47:45 +02:00
void CTxMemPool : : UpdateChildrenForRemoval ( txiter it )
{
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
const CTxMemPoolEntry : : Children & children = it - > GetMemPoolChildrenConst ( ) ;
for ( const CTxMemPoolEntry & updateIt : children ) {
UpdateParent ( mapTx . iterator_to ( updateIt ) , it , false ) ;
2014-03-17 13:19:54 +01:00
}
2015-07-15 20:47:45 +02:00
}
2014-03-17 13:19:54 +01:00
2016-03-17 13:33:31 +01:00
void CTxMemPool : : UpdateForRemoveFromMempool ( const setEntries & entriesToRemove , bool updateDescendants )
2015-07-15 20:47:45 +02:00
{
// For each entry, walk back all ancestors and decrement size associated with this
// transaction
const uint64_t nNoLimit = std : : numeric_limits < uint64_t > : : max ( ) ;
2016-03-17 13:33:31 +01:00
if ( updateDescendants ) {
// updateDescendants should be true whenever we're not recursively
// removing a tx and all its descendants, eg when a transaction is
// confirmed in a block.
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// Here we only update statistics and not data in CTxMemPool::Parents
// and CTxMemPoolEntry::Children (which we need to preserve until we're
// finished with all operations that need to traverse the mempool).
2019-07-05 09:06:28 +02:00
for ( txiter removeIt : entriesToRemove ) {
2016-03-17 13:33:31 +01:00
setEntries setDescendants ;
CalculateDescendants ( removeIt , setDescendants ) ;
setDescendants . erase ( removeIt ) ; // don't update state for self
int64_t modifySize = - ( ( int64_t ) removeIt - > GetTxSize ( ) ) ;
CAmount modifyFee = - removeIt - > GetModifiedFee ( ) ;
int modifySigOps = - removeIt - > GetSigOpCount ( ) ;
2019-07-05 09:06:28 +02:00
for ( txiter dit : setDescendants ) {
2016-03-17 13:33:31 +01:00
mapTx . modify ( dit , update_ancestor_state ( modifySize , modifyFee , - 1 , modifySigOps ) ) ;
}
}
}
2019-07-05 09:06:28 +02:00
for ( txiter removeIt : entriesToRemove ) {
2015-07-15 20:47:45 +02:00
setEntries setAncestors ;
const CTxMemPoolEntry & entry = * removeIt ;
std : : string dummy ;
2015-09-23 19:37:32 +02:00
// Since this is a tx that is already in the mempool, we can call CMPA
// with fSearchForParents = false. If the mempool is in a consistent
// state, then using true or false should both be correct, though false
// should be a bit faster.
// However, if we happen to be in the middle of processing a reorg, then
// the mempool can be in an inconsistent state. In this case, the set
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// of ancestors reachable via GetMemPoolParents()/GetMemPoolChildren()
// will be the same as the set of ancestors whose packages include this
// transaction, because when we add a new transaction to the mempool in
// addUnchecked(), we assume it has no children, and in the case of a
// reorg where that assumption is false, the in-mempool children aren't
// linked to the in-block tx's until UpdateTransactionsFromBlock() is
// called.
2015-09-23 19:37:32 +02:00
// So if we're being called during a reorg, ie before
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// UpdateTransactionsFromBlock() has been called, then
// GetMemPoolParents()/GetMemPoolChildren() will differ from the set of
// mempool parents we'd calculate by searching, and it's important that
// we use the cached notion of ancestor transactions as the set of
// things to update for removal.
2015-09-23 19:37:32 +02:00
CalculateMemPoolAncestors ( entry , setAncestors , nNoLimit , nNoLimit , nNoLimit , nNoLimit , dummy , false ) ;
2015-07-15 20:47:45 +02:00
// Note that UpdateAncestorsOf severs the child links that point to
2016-03-17 13:33:31 +01:00
// removeIt in the entries for the parents of removeIt.
2015-07-15 20:47:45 +02:00
UpdateAncestorsOf ( false , removeIt , setAncestors ) ;
}
// After updating all the ancestor sizes, we can now sever the link between each
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// transaction being removed and any mempool children (ie, update CTxMemPoolEntry::m_parents
2015-07-15 20:47:45 +02:00
// for each direct child of a transaction being removed).
2019-07-05 09:06:28 +02:00
for ( txiter removeIt : entriesToRemove ) {
2015-07-15 20:47:45 +02:00
UpdateChildrenForRemoval ( removeIt ) ;
2014-03-17 13:19:54 +01:00
}
2015-07-15 20:47:45 +02:00
}
2014-03-17 13:19:54 +01:00
2016-03-17 13:33:31 +01:00
void CTxMemPoolEntry : : UpdateDescendantState ( int64_t modifySize , CAmount modifyFee , int64_t modifyCount )
2015-07-15 20:47:45 +02:00
{
2016-03-17 13:33:31 +01:00
nSizeWithDescendants + = modifySize ;
assert ( int64_t ( nSizeWithDescendants ) > 0 ) ;
nModFeesWithDescendants + = modifyFee ;
nCountWithDescendants + = modifyCount ;
assert ( int64_t ( nCountWithDescendants ) > 0 ) ;
2015-07-15 20:47:45 +02:00
}
2014-07-29 06:09:57 +02:00
2017-11-10 21:33:28 +01:00
void CTxMemPoolEntry : : UpdateAncestorState ( int64_t modifySize , CAmount modifyFee , int64_t modifyCount , int64_t modifySigOps )
2015-07-15 20:47:45 +02:00
{
2016-03-17 13:33:31 +01:00
nSizeWithAncestors + = modifySize ;
assert ( int64_t ( nSizeWithAncestors ) > 0 ) ;
nModFeesWithAncestors + = modifyFee ;
nCountWithAncestors + = modifyCount ;
assert ( int64_t ( nCountWithAncestors ) > 0 ) ;
nSigOpCountWithAncestors + = modifySigOps ;
assert ( int ( nSigOpCountWithAncestors ) > = 0 ) ;
2015-07-15 20:47:45 +02:00
}
2014-03-17 13:19:54 +01:00
2023-02-20 20:31:40 +01:00
CTxMemPool : : CTxMemPool ( CBlockPolicyEstimator * estimator , int check_ratio )
: m_check_ratio ( check_ratio ) , minerPolicyEstimator ( estimator )
2013-08-27 07:51:57 +02:00
{
2015-10-26 14:55:17 +01:00
_clear ( ) ; //lock free clear
2013-08-27 07:51:57 +02:00
}
2024-05-28 17:24:36 +02:00
void CTxMemPool : : ConnectManagers ( gsl : : not_null < CDeterministicMNManager * > dmnman )
2024-04-04 12:27:33 +02:00
{
// Do not allow double-initialization
assert ( m_dmnman = = nullptr ) ;
2024-05-28 17:24:36 +02:00
m_dmnman = dmnman ;
2024-04-04 12:27:33 +02:00
}
Merge #12717: [REST] Handle UTXO retrieval when ignoring the mempool
9cb9af8 [REST] Handle UTXO retrieval when ignoring the mempool (Roman Zeyde)
1fdc7c4 Make CTxMemPool::isSpent() const (Roman Zeyde)
Pull request description:
Current REST API always returns empty UTXO when invoked without `/checkmempool/` URL part.
After the fix:
```
$ curl -s http://localhost:8332/rest/getutxos/0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098-0.json | jq
{
"chainHeight": 514109,
"chaintipHash": "0000000000000000001fe76d1445e8a6432fd2de04261dc9c5915311dc7ad6de",
"bitmap": "1",
"utxos": [
{
"height": 1,
"value": 50,
"scriptPubKey": {
"asm": "0496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858ee OP_CHECKSIG",
"hex": "410496b538e853519c726a2c91e61ec11600ae1390813a627c66fb8be7947be63c52da7589379515d4e0a604f8141781e62294721166bf621e73a82cbf2342c858eeac",
"reqSigs": 1,
"type": "pubkey",
"addresses": [
"12c6DSiU4Rq3P4ZxziKxzrL5LmMBrzjrJX"
]
}
}
]
}
```
Before the fix:
```
$ curl -s http://localhost:8332/rest/getutxos/0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098-0.json | jq
{
"chainHeight": 514109,
"chaintipHash": "0000000000000000001fe76d1445e8a6432fd2de04261dc9c5915311dc7ad6de",
"bitmap": "0",
"utxos": []
}
```
Tree-SHA512: 994a350cb34a3c8f5a7afbc169c6b177c5be6cf223b2071c62d63644819d416d3e10d1c58b244d9d351bae7233d2974aa5e9ebadd1b5d6218f5245558675be0d
2018-03-27 21:31:27 +02:00
bool CTxMemPool : : isSpent ( const COutPoint & outpoint ) const
2013-08-27 07:51:57 +02:00
{
LOCK ( cs ) ;
2017-06-02 00:47:58 +02:00
return mapNextTx . count ( outpoint ) ;
2013-08-27 07:51:57 +02:00
}
unsigned int CTxMemPool : : GetTransactionsUpdated ( ) const
{
return nTransactionsUpdated ;
}
void CTxMemPool : : AddTransactionsUpdated ( unsigned int n )
{
nTransactionsUpdated + = n ;
}
2021-07-19 00:54:57 +02:00
void CTxMemPool : : addUnchecked ( const CTxMemPoolEntry & entry , setEntries & setAncestors , bool validFeeEstimate )
2013-08-27 07:51:57 +02:00
{
// Add to memory pool without checking anything.
2017-02-16 14:01:00 +01:00
// Used by AcceptToMemoryPool(), which DOES do
2013-08-27 07:51:57 +02:00
// all the appropriate checks.
2015-07-15 20:47:45 +02:00
indexed_transaction_set : : iterator newit = mapTx . insert ( entry ) . first ;
2015-11-19 17:18:28 +01:00
// Update transaction for any feeDelta created by PrioritiseTransaction
2021-08-10 07:21:20 +02:00
CAmount delta { 0 } ;
ApplyDelta ( entry . GetTx ( ) . GetHash ( ) , delta ) ;
if ( delta ) {
2019-03-14 15:44:42 +01:00
mapTx . modify ( newit , update_fee_delta ( delta ) ) ;
2015-11-19 17:18:28 +01:00
}
2015-07-15 20:47:45 +02:00
// Update cachedInnerUsage to include contained transaction's usage.
// (When we update the entry for in-mempool parents, memory usage will be
// further updated.)
cachedInnerUsage + = entry . DynamicMemoryUsage ( ) ;
const CTransaction & tx = newit - > GetTx ( ) ;
std : : set < uint256 > setParentTransactions ;
for ( unsigned int i = 0 ; i < tx . vin . size ( ) ; i + + ) {
2016-06-03 01:11:02 +02:00
mapNextTx . insert ( std : : make_pair ( & tx . vin [ i ] . prevout , & tx ) ) ;
2015-07-15 20:47:45 +02:00
setParentTransactions . insert ( tx . vin [ i ] . prevout . hash ) ;
}
// Don't bother worrying about child transactions of this one.
// Normal case of a new transaction arriving is that there can't be any
// children, because such children would be orphans.
// An exception to that is if a transaction enters that used to be in a block.
// In that case, our disconnect block logic will call UpdateTransactionsFromBlock
// to clean up the mess we're leaving here.
// Update ancestors with information about this tx
2021-08-10 07:21:20 +02:00
for ( const auto & pit : GetIterSet ( setParentTransactions ) ) {
2015-07-15 20:47:45 +02:00
UpdateParent ( newit , pit , true ) ;
2013-08-27 07:51:57 +02:00
}
2015-07-15 20:47:45 +02:00
UpdateAncestorsOf ( true , newit , setAncestors ) ;
2016-03-17 13:33:31 +01:00
UpdateEntryForAncestors ( newit , setAncestors ) ;
2015-07-15 20:47:45 +02:00
2014-08-26 22:28:32 +02:00
nTransactionsUpdated + + ;
totalTxSize + = entry . GetTxSize ( ) ;
2021-02-08 20:36:33 +01:00
m_total_fee + = entry . GetFee ( ) ;
if ( minerPolicyEstimator ) {
minerPolicyEstimator - > processTransaction ( entry , validFeeEstimate ) ;
}
2014-08-26 22:28:32 +02:00
2021-07-19 02:07:01 +02:00
vTxHashes . emplace_back ( entry . GetTx ( ) . GetHash ( ) , newit ) ;
Backport compact blocks functionality from bitcoin (#1966)
* Merge #8068: Compact Blocks
48efec8 Fix some minor compact block issues that came up in review (Matt Corallo)
ccd06b9 Elaborate bucket size math (Pieter Wuille)
0d4cb48 Use vTxHashes to optimize InitData significantly (Matt Corallo)
8119026 Provide a flat list of txid/terators to txn in CTxMemPool (Matt Corallo)
678ee97 Add BIP 152 to implemented BIPs list (Matt Corallo)
56ba516 Add reconstruction debug logging (Matt Corallo)
2f34a2e Get our "best three" peers to announce blocks using cmpctblocks (Matt Corallo)
927f8ee Add ability to fetch CNode by NodeId (Matt Corallo)
d25cd3e Add receiver-side protocol implementation for CMPCTBLOCK stuff (Matt Corallo)
9c837d5 Add sender-side protocol implementation for CMPCTBLOCK stuff (Matt Corallo)
00c4078 Add protocol messages for short-ids blocks (Matt Corallo)
e3b2222 Add some blockencodings tests (Matt Corallo)
f4f8f14 Add TestMemPoolEntryHelper::FromTx version for CTransaction (Matt Corallo)
85ad31e Add partial-block block encodings API (Matt Corallo)
5249dac Add COMPACTSIZE wrapper similar to VARINT for serialization (Matt Corallo)
cbda71c Move context-required checks from CheckBlockHeader to Contextual... (Matt Corallo)
7c29ec9 If AcceptBlockHeader returns true, pindex will be set. (Matt Corallo)
96806c3 Stop trimming when mapTx is empty (Pieter Wuille)
* Merge #8408: Prevent fingerprinting, disk-DoS with compact blocks
1d06e49 Ignore CMPCTBLOCK messages for pruned blocks (Suhas Daftuar)
1de2a46 Ignore GETBLOCKTXN requests for unknown blocks (Suhas Daftuar)
* Merge #8418: Add tests for compact blocks
45c7ddd Add p2p test for BIP 152 (compact blocks) (Suhas Daftuar)
9a22a6c Add support for compactblocks to mininode (Suhas Daftuar)
a8689fd Tests: refactor compact size serialization in mininode (Suhas Daftuar)
9c8593d Implement SipHash in Python (Pieter Wuille)
56c87e9 Allow changing BIP9 parameters on regtest (Suhas Daftuar)
* Merge #8505: Trivial: Fix typos in various files
1aacfc2 various typos (leijurv)
* Merge #8449: [Trivial] Do not shadow local variable, cleanup
a159f25 Remove redundand (and shadowing) declaration (Pavel Janík)
cce3024 Do not shadow local variable, cleanup (Pavel Janík)
* Merge #8739: [qa] Fix broken sendcmpct test in p2p-compactblocks.py
157254a Fix broken sendcmpct test in p2p-compactblocks.py (Suhas Daftuar)
* Merge #8854: [qa] Fix race condition in p2p-compactblocks test
b5fd666 [qa] Fix race condition in p2p-compactblocks test (Suhas Daftuar)
* Merge #8393: Support for compact blocks together with segwit
27acfc1 [qa] Update p2p-compactblocks.py for compactblocks v2 (Suhas Daftuar)
422fac6 [qa] Add support for compactblocks v2 to mininode (Suhas Daftuar)
f5b9b8f [qa] Fix bug in mininode witness deserialization (Suhas Daftuar)
6aa28ab Use cmpctblock type 2 for segwit-enabled transfer (Pieter Wuille)
be7555f Fix overly-prescriptive p2p-segwit test for new fetch logic (Matt Corallo)
06128da Make GetFetchFlags always request witness objects from witness peers (Matt Corallo)
* Merge #8882: [qa] Fix race conditions in p2p-compactblocks.py and sendheaders.py
b55d941 [qa] Fix race condition in sendheaders.py (Suhas Daftuar)
6976db2 [qa] Another attempt to fix race condition in p2p-compactblocks.py (Suhas Daftuar)
* Merge #8904: [qa] Fix compact block shortids for a test case
4cdece4 [qa] Fix compact block shortids for a test case (Dagur Valberg Johannsson)
* Merge #8637: Compact Block Tweaks (rebase of #8235)
3ac6de0 Align constant names for maximum compact block / blocktxn depth (Pieter Wuille)
b2e93a3 Add cmpctblock to debug help list (instagibbs)
fe998e9 More agressively filter compact block requests (Matt Corallo)
02a337d Dont remove a "preferred" cmpctblock peer if they provide a block (Matt Corallo)
* Merge #8975: Chainparams: Trivial: In AppInit2(), s/Params()/chainparams/
6f2f639 Chainparams: Trivial: In AppInit2(), s/Params()/chainparams/ (Jorge Timón)
* Merge #8968: Don't hold cs_main when calling ProcessNewBlock from a cmpctblock
72ca7d9 Don't hold cs_main when calling ProcessNewBlock from a cmpctblock (Matt Corallo)
* Merge #8995: Add missing cs_main lock to ::GETBLOCKTXN processing
dfe7906 Add missing cs_main lock to ::GETBLOCKTXN processing (Matt Corallo)
* Merge #8515: A few mempool removal optimizations
0334430 Add some missing includes (Pieter Wuille)
4100499 Return shared_ptr<CTransaction> from mempool removes (Pieter Wuille)
51f2783 Make removed and conflicted arguments optional to remove (Pieter Wuille)
f48211b Bypass removeRecursive in removeForReorg (Pieter Wuille)
* Merge #9026: Fix handling of invalid compact blocks
d4833ff Bump the protocol version to distinguish new banning behavior. (Suhas Daftuar)
88c3549 Fix compact block handling to not ban if block is invalid (Suhas Daftuar)
c93beac [qa] Test that invalid compactblocks don't result in ban (Suhas Daftuar)
* Merge #9039: Various serialization simplifcations and optimizations
d59a518 Use fixed preallocation instead of costly GetSerializeSize (Pieter Wuille)
25a211a Add optimized CSizeComputer serializers (Pieter Wuille)
a2929a2 Make CSerAction's ForRead() constexpr (Pieter Wuille)
a603925 Avoid -Wshadow errors (Pieter Wuille)
5284721 Get rid of nType and nVersion (Pieter Wuille)
657e05a Make GetSerializeSize a wrapper on top of CSizeComputer (Pieter Wuille)
fad9b66 Make nType and nVersion private and sometimes const (Pieter Wuille)
c2c5d42 Make streams' read and write return void (Pieter Wuille)
50e8a9c Remove unused ReadVersion and WriteVersion (Pieter Wuille)
* Merge #9058: Fixes for p2p-compactblocks.py test timeouts on travis (#8842)
dac53b5 Modify getblocktxn handler not to drop requests for old blocks (Russell Yanofsky)
55bfddc [qa] Fix stale data bug in test_compactblocks_not_at_tip (Russell Yanofsky)
47e9659 [qa] Fix bug in compactblocks v2 merge (Russell Yanofsky)
* Merge #9160: [trivial] Fix hungarian variable name
ec34648 [trivial] Fix hungarian variable name (Russell Yanofsky)
* Merge #9159: [qa] Wait for specific block announcement in p2p-compactblocks
dfa44d1 [qa] Wait for specific block announcement in p2p-compactblocks (Russell Yanofsky)
* Merge #9125: Make CBlock a vector of shared_ptr of CTransactions
b4e4ba4 Introduce convenience type CTransactionRef (Pieter Wuille)
1662b43 Make CBlock::vtx a vector of shared_ptr<CTransaction> (Pieter Wuille)
da60506 Add deserializing constructors to CTransaction and CMutableTransaction (Pieter Wuille)
0e85204 Add serialization for unique_ptr and shared_ptr (Pieter Wuille)
* Merge #8872: Remove block-request logic from INV message processing
037159c Remove block-request logic from INV message processing (Matt Corallo)
3451203 [qa] Respond to getheaders and do not assume a getdata on inv (Matt Corallo)
d768f15 [qa] Make comptool push blocks instead of relying on inv-fetch (mrbandrews)
* Merge #9199: Always drop the least preferred HB peer when adding a new one.
ca8549d Always drop the least preferred HB peer when adding a new one. (Gregory Maxwell)
* Merge #9233: Fix some typos
15fa95d Fix some typos (fsb4000)
* Merge #9260: Mrs Peacock in The Library with The Candlestick (killed main.{h,cpp})
76faa3c Rename the remaining main.{h,cpp} to validation.{h,cpp} (Matt Corallo)
e736772 Move network-msg-processing code out of main to its own file (Matt Corallo)
87c35f5 Remove orphan state wipe from UnloadBlockIndex. (Matt Corallo)
* Merge #9014: Fix block-connection performance regression
dd0df81 Document ConnectBlock connectTrace postconditions (Matt Corallo)
2d6e561 Switch pblock in ProcessNewBlock to a shared_ptr (Matt Corallo)
2736c44 Make the optional pblock in ActivateBestChain a shared_ptr (Matt Corallo)
ae4db44 Create a shared_ptr for the block we're connecting in ActivateBCS (Matt Corallo)
fd9d890 Keep blocks as shared_ptrs, instead of copying txn in ConnectTip (Matt Corallo)
6fdd43b Add struct to track block-connect-time-generated info for callbacks (Matt Corallo)
* Merge #9240: Remove txConflicted
a874ab5 remove internal tracking of mempool conflicts for reporting to wallet (Alex Morcos)
bf663f8 remove external usage of mempool conflict tracking (Alex Morcos)
* Merge #9344: Do not run functions with necessary side-effects in assert()
da9cdd2 Do not run functions with necessary side-effects in assert() (Gregory Maxwell)
* Merge #9273: Remove unused CDiskBlockPos* argument from ProcessNewBlock
a13fa4c Remove unused CDiskBlockPos* argument from ProcessNewBlock (Matt Corallo)
* Merge #9352: Attempt reconstruction from all compact block announcements
813ede9 [qa] Update compactblocks test for multi-peer reconstruction (Suhas Daftuar)
7017298 Allow compactblock reconstruction when block is in flight (Suhas Daftuar)
* Merge #9252: Release cs_main before calling ProcessNewBlock, or processing headers (cmpctblock handling)
bd02bdd Release cs_main before processing cmpctblock as header (Suhas Daftuar)
680b0c0 Release cs_main before calling ProcessNewBlock (cmpctblock handling) (Suhas Daftuar)
* Merge #9283: A few more CTransactionRef optimizations
91335ba Remove unused MakeTransactionRef overloads (Pieter Wuille)
6713f0f Make FillBlock consume txn_available to avoid shared_ptr copies (Pieter Wuille)
62607d7 Convert COrphanTx to keep a CTransactionRef (Pieter Wuille)
c44e4c4 Make AcceptToMemoryPool take CTransactionRef (Pieter Wuille)
* Merge #9375: Relay compact block messages prior to full block connection
02ee4eb Make most_recent_compact_block a pointer to a const (Matt Corallo)
73666ad Add comment to describe callers to ActivateBestChain (Matt Corallo)
962f7f0 Call ActivateBestChain without cs_main/with most_recent_block (Matt Corallo)
0df777d Use a temp pindex to avoid a const_cast in ProcessNewBlockHeaders (Matt Corallo)
c1ae4fc Avoid holding cs_most_recent_block while calling ReadBlockFromDisk (Matt Corallo)
9eb67f5 Ensure we meet the BIP 152 old-relay-types response requirements (Matt Corallo)
5749a85 Cache most-recently-connected compact block (Matt Corallo)
9eaec08 Cache most-recently-announced block's shared_ptr (Matt Corallo)
c802092 Relay compact block messages prior to full block connection (Matt Corallo)
6987219 Add a CValidationInterface::NewPoWValidBlock callback (Matt Corallo)
180586f Call AcceptBlock with the block's shared_ptr instead of CBlock& (Matt Corallo)
8baaba6 [qa] Avoid race in preciousblock test. (Matt Corallo)
9a0b2f4 [qa] Make compact blocks test construction using fetch methods (Matt Corallo)
8017547 Make CBlockIndex*es in net_processing const (Matt Corallo)
* Merge #9486: Make peer=%d log prints consistent
e6111b2 Make peer id logging consistent ("peer=%d" instead of "peer %d") (Matt Corallo)
* Merge #9400: Set peers as HB peers upon full block validation
d4781ac Set peers as HB peers upon full block validation (Gregory Sanders)
* Merge #9499: Use recent-rejects, orphans, and recently-replaced txn for compact-block-reconstruction
c594580 Add braces around AddToCompactExtraTransactions (Matt Corallo)
1ccfe9b Clarify comment about mempool/extra conflicts (Matt Corallo)
fac4c78 Make PartiallyDownloadedBlock::InitData's second param const (Matt Corallo)
b55b416 Add extra_count lower bound to compact reconstruction debug print (Matt Corallo)
863edb4 Consider all (<100k memusage) txn for compact-block-extra-txn cache (Matt Corallo)
7f8c8ca Consider all orphan txn for compact-block-extra-txn cache (Matt Corallo)
93380c5 Use replaced transactions in compact block reconstruction (Matt Corallo)
1531652 Keep shared_ptrs to recently-replaced txn for compact blocks (Matt Corallo)
edded80 Make ATMP optionally return the CTransactionRefs it replaced (Matt Corallo)
c735540 Move ORPHAN constants from validation.h to net_processing.h (Matt Corallo)
* Merge #9587: Do not shadow local variable named `tx`.
44f2baa Do not shadow local variable named `tx`. (Pavel Janík)
* Merge #9510: [trivial] Fix typos in comments
cc16d99 [trivial] Fix typos in comments (practicalswift)
* Merge #9604: [Trivial] add comment about setting peer as HB peer.
dd5b011 [Trivial] add comment about setting peer as HB peer. (John Newbery)
* Fix using of AcceptToMemoryPool in PrivateSend code
* add `override`
* fSupportsDesiredCmpctVersion
* bring back tx ressurection in DisconnectTip
* Fix delayed headers
* Remove unused CConnman::FindNode overload
* Fix typos and comments
* Fix minor code differences
* Don't use rejection cache for corrupted transactions
Partly based on https://github.com/bitcoin/bitcoin/pull/8525
* Backport missed cs_main locking changes
Missed from https://github.com/bitcoin/bitcoin/commit/58a215ce8c13b900cf982c39f8ee4879290d1a95
* Backport missed comments and mapBlockSource.emplace call
Missed from two commits:
https://github.com/bitcoin/bitcoin/commit/88c35491ab19f9afdf9b3fa9356a072f70ef2f55
https://github.com/bitcoin/bitcoin/commit/7c98ce584ec23bcddcba8cdb33efa6547212f6ef
* Add CheckPeerHeaders() helper and check in (nCount == 0) too
2018-04-11 13:06:01 +02:00
newit - > vTxHashesIdx = vTxHashes . size ( ) - 1 ;
2019-02-06 17:57:27 +01:00
// Invalid ProTxes should never get this far because transactions should be
// fully checked by AcceptToMemoryPool() at this point, so we just assume that
// everything is fine here.
2024-04-04 12:27:33 +02:00
if ( m_dmnman ) {
2024-04-04 12:02:07 +02:00
addUncheckedProTx ( newit , tx ) ;
2018-02-14 14:42:06 +01:00
}
2013-08-27 07:51:57 +02:00
}
2024-06-26 19:51:48 +02:00
void CTxMemPool : : addAddressIndex ( const CTxMemPoolEntry & entry , const CCoinsViewCache & view )
2016-04-04 22:37:43 +02:00
{
LOCK ( cs ) ;
const CTransaction & tx = entry . GetTx ( ) ;
std : : vector < CMempoolAddressDeltaKey > inserted ;
uint256 txhash = tx . GetHash ( ) ;
for ( unsigned int j = 0 ; j < tx . vin . size ( ) ; j + + ) {
const CTxIn input = tx . vin [ j ] ;
2017-09-26 17:58:49 +02:00
const Coin & coin = view . AccessCoin ( input . prevout ) ;
const CTxOut & prevout = coin . out ;
2023-09-14 20:59:48 +02:00
2023-09-25 10:16:15 +02:00
AddressType address_type { AddressType : : UNKNOWN } ;
2023-09-14 20:59:48 +02:00
uint160 address_bytes ;
2023-09-24 22:54:01 +02:00
if ( ! AddressBytesFromScript ( prevout . scriptPubKey , address_type , address_bytes ) ) {
2023-09-14 20:59:48 +02:00
continue ;
2016-04-04 22:37:43 +02:00
}
2023-09-14 20:59:48 +02:00
CMempoolAddressDeltaKey key ( address_type , address_bytes , txhash , j , /* tx_spent */ true ) ;
CMempoolAddressDelta delta ( entry . GetTime ( ) , prevout . nValue * - 1 , input . prevout . hash , input . prevout . n ) ;
mapAddress . insert ( std : : make_pair ( key , delta ) ) ;
inserted . push_back ( key ) ;
2016-04-04 22:37:43 +02:00
}
for ( unsigned int k = 0 ; k < tx . vout . size ( ) ; k + + ) {
const CTxOut & out = tx . vout [ k ] ;
2023-09-14 20:59:48 +02:00
2023-09-25 10:16:15 +02:00
AddressType address_type { AddressType : : UNKNOWN } ;
2023-09-14 20:59:48 +02:00
uint160 address_bytes ;
2023-09-24 22:54:01 +02:00
if ( ! AddressBytesFromScript ( out . scriptPubKey , address_type , address_bytes ) ) {
2023-09-14 20:59:48 +02:00
continue ;
2016-04-04 22:37:43 +02:00
}
2023-09-14 20:59:48 +02:00
CMempoolAddressDeltaKey key ( address_type , address_bytes , txhash , k , /* tx_spent */ false ) ;
mapAddress . insert ( std : : make_pair ( key , CMempoolAddressDelta ( entry . GetTime ( ) , out . nValue ) ) ) ;
inserted . push_back ( key ) ;
2016-04-04 22:37:43 +02:00
}
2017-01-30 13:13:07 +01:00
mapAddressInserted . insert ( std : : make_pair ( txhash , inserted ) ) ;
2016-04-04 22:37:43 +02:00
}
2024-06-26 21:24:21 +02:00
bool CTxMemPool : : getAddressIndex ( const std : : vector < CMempoolAddressDeltaKey > & addresses ,
2024-06-26 21:09:14 +02:00
std : : vector < CMempoolAddressDeltaEntry > & results ) const
2016-04-04 22:37:43 +02:00
{
LOCK ( cs ) ;
2023-09-25 19:23:15 +02:00
for ( const auto & address : addresses ) {
2024-06-26 21:24:21 +02:00
addressDeltaMap : : const_iterator ait = mapAddress . lower_bound ( address ) ;
while ( ait ! = mapAddress . end ( ) & & ( * ait ) . first . m_address_bytes = = address . m_address_bytes
& & ( * ait ) . first . m_address_type = = address . m_address_type ) {
2016-04-04 22:37:43 +02:00
results . push_back ( * ait ) ;
ait + + ;
}
}
return true ;
}
bool CTxMemPool : : removeAddressIndex ( const uint256 txhash )
{
LOCK ( cs ) ;
addressDeltaMapInserted : : iterator it = mapAddressInserted . find ( txhash ) ;
if ( it ! = mapAddressInserted . end ( ) ) {
std : : vector < CMempoolAddressDeltaKey > keys = ( * it ) . second ;
for ( std : : vector < CMempoolAddressDeltaKey > : : iterator mit = keys . begin ( ) ; mit ! = keys . end ( ) ; mit + + ) {
mapAddress . erase ( * mit ) ;
}
mapAddressInserted . erase ( it ) ;
}
return true ;
}
2024-06-26 19:51:48 +02:00
void CTxMemPool : : addSpentIndex ( const CTxMemPoolEntry & entry , const CCoinsViewCache & view )
2016-05-16 20:23:01 +02:00
{
LOCK ( cs ) ;
const CTransaction & tx = entry . GetTx ( ) ;
std : : vector < CSpentIndexKey > inserted ;
uint256 txhash = tx . GetHash ( ) ;
for ( unsigned int j = 0 ; j < tx . vin . size ( ) ; j + + ) {
const CTxIn input = tx . vin [ j ] ;
2017-09-26 17:58:49 +02:00
const Coin & coin = view . AccessCoin ( input . prevout ) ;
const CTxOut & prevout = coin . out ;
2023-09-14 20:59:48 +02:00
2023-09-25 10:16:15 +02:00
AddressType address_type { AddressType : : UNKNOWN } ;
2023-09-14 20:59:48 +02:00
uint160 address_bytes ;
2016-05-16 20:23:01 +02:00
2023-09-24 22:54:01 +02:00
if ( ! AddressBytesFromScript ( prevout . scriptPubKey , address_type , address_bytes ) ) {
continue ;
2016-05-16 20:23:01 +02:00
}
CSpentIndexKey key = CSpentIndexKey ( input . prevout . hash , input . prevout . n ) ;
2023-09-14 20:59:48 +02:00
CSpentIndexValue value = CSpentIndexValue ( txhash , j , - 1 , prevout . nValue , address_type , address_bytes ) ;
2016-05-16 20:23:01 +02:00
2017-01-30 13:13:07 +01:00
mapSpent . insert ( std : : make_pair ( key , value ) ) ;
2016-05-16 20:23:01 +02:00
inserted . push_back ( key ) ;
}
mapSpentInserted . insert ( make_pair ( txhash , inserted ) ) ;
}
2024-06-26 19:51:48 +02:00
bool CTxMemPool : : getSpentIndex ( const CSpentIndexKey & key , CSpentIndexValue & value ) const
2016-05-16 20:23:01 +02:00
{
LOCK ( cs ) ;
2024-06-26 19:51:48 +02:00
mapSpentIndex : : const_iterator it = mapSpent . find ( key ) ;
2016-05-16 20:23:01 +02:00
if ( it ! = mapSpent . end ( ) ) {
value = it - > second ;
return true ;
}
return false ;
}
bool CTxMemPool : : removeSpentIndex ( const uint256 txhash )
{
LOCK ( cs ) ;
mapSpentIndexInserted : : iterator it = mapSpentInserted . find ( txhash ) ;
if ( it ! = mapSpentInserted . end ( ) ) {
std : : vector < CSpentIndexKey > keys = ( * it ) . second ;
for ( std : : vector < CSpentIndexKey > : : iterator mit = keys . begin ( ) ; mit ! = keys . end ( ) ; mit + + ) {
mapSpent . erase ( * mit ) ;
}
mapSpentInserted . erase ( it ) ;
}
return true ;
}
2024-04-04 12:02:07 +02:00
void CTxMemPool : : addUncheckedProTx ( indexed_transaction_set : : iterator & newit , const CTransaction & tx )
{
2024-04-04 12:27:33 +02:00
assert ( m_dmnman ) ;
2024-04-04 12:02:07 +02:00
if ( tx . nType = = TRANSACTION_PROVIDER_REGISTER ) {
auto proTx = * Assert ( GetTxPayload < CProRegTx > ( tx ) ) ;
if ( ! proTx . collateralOutpoint . hash . IsNull ( ) ) {
mapProTxRefs . emplace ( tx . GetHash ( ) , proTx . collateralOutpoint . hash ) ;
}
mapProTxAddresses . emplace ( proTx . addr , tx . GetHash ( ) ) ;
mapProTxPubKeyIDs . emplace ( proTx . keyIDOwner , tx . GetHash ( ) ) ;
mapProTxBlsPubKeyHashes . emplace ( proTx . pubKeyOperator . GetHash ( ) , tx . GetHash ( ) ) ;
if ( ! proTx . collateralOutpoint . hash . IsNull ( ) ) {
mapProTxCollaterals . emplace ( proTx . collateralOutpoint , tx . GetHash ( ) ) ;
} else {
mapProTxCollaterals . emplace ( COutPoint ( tx . GetHash ( ) , proTx . collateralOutpoint . n ) , tx . GetHash ( ) ) ;
}
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_SERVICE ) {
auto proTx = * Assert ( GetTxPayload < CProUpServTx > ( tx ) ) ;
mapProTxRefs . emplace ( proTx . proTxHash , tx . GetHash ( ) ) ;
mapProTxAddresses . emplace ( proTx . addr , tx . GetHash ( ) ) ;
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REGISTRAR ) {
auto proTx = * Assert ( GetTxPayload < CProUpRegTx > ( tx ) ) ;
mapProTxRefs . emplace ( proTx . proTxHash , tx . GetHash ( ) ) ;
mapProTxBlsPubKeyHashes . emplace ( proTx . pubKeyOperator . GetHash ( ) , tx . GetHash ( ) ) ;
2024-04-04 12:27:33 +02:00
auto dmn = Assert ( m_dmnman - > GetListAtChainTip ( ) . GetMN ( proTx . proTxHash ) ) ;
2024-04-04 12:02:07 +02:00
newit - > validForProTxKey = : : SerializeHash ( dmn - > pdmnState - > pubKeyOperator ) ;
if ( dmn - > pdmnState - > pubKeyOperator ! = proTx . pubKeyOperator ) {
newit - > isKeyChangeProTx = true ;
}
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REVOKE ) {
auto proTx = * Assert ( GetTxPayload < CProUpRevTx > ( tx ) ) ;
mapProTxRefs . emplace ( proTx . proTxHash , tx . GetHash ( ) ) ;
2024-04-04 12:27:33 +02:00
auto dmn = Assert ( m_dmnman - > GetListAtChainTip ( ) . GetMN ( proTx . proTxHash ) ) ;
2024-04-04 12:02:07 +02:00
newit - > validForProTxKey = : : SerializeHash ( dmn - > pdmnState - > pubKeyOperator ) ;
if ( dmn - > pdmnState - > pubKeyOperator . Get ( ) ! = CBLSPublicKey ( ) ) {
newit - > isKeyChangeProTx = true ;
}
} else if ( tx . nType = = TRANSACTION_ASSET_UNLOCK ) {
auto assetUnlockTx = * Assert ( GetTxPayload < CAssetUnlockPayload > ( tx ) ) ;
mapAssetUnlockExpiry . insert ( { tx . GetHash ( ) , assetUnlockTx . getHeightToExpiry ( ) } ) ;
} else if ( tx . nType = = TRANSACTION_MNHF_SIGNAL ) {
PrioritiseTransaction ( tx . GetHash ( ) , 0.1 * COIN ) ;
}
}
2017-01-24 10:07:50 +01:00
void CTxMemPool : : removeUnchecked ( txiter it , MemPoolRemovalReason reason )
2015-07-15 20:47:45 +02:00
{
2024-08-26 17:35:12 +02:00
// We increment mempool sequence value no matter removal reason
// even if not directly reported below.
uint64_t mempool_sequence = GetAndIncrementSequence ( ) ;
2020-03-19 17:09:15 +01:00
if ( reason ! = MemPoolRemovalReason : : BLOCK ) {
// Notify clients that a transaction has been removed from the mempool
// for any reason except being included in a block. Clients interested
// in transactions included in blocks can subscribe to the BlockConnected
// notification.
2024-08-26 17:35:12 +02:00
GetMainSignals ( ) . TransactionRemovedFromMempool ( it - > GetSharedTx ( ) , reason , mempool_sequence ) ;
2019-11-22 16:35:00 +01:00
}
2015-07-15 20:47:45 +02:00
const uint256 hash = it - > GetTx ( ) . GetHash ( ) ;
2019-07-05 09:06:28 +02:00
for ( const CTxIn & txin : it - > GetTx ( ) . vin )
2015-07-15 20:47:45 +02:00
mapNextTx . erase ( txin . prevout ) ;
2022-05-06 06:04:50 +02:00
RemoveUnbroadcastTx ( hash , true /* add logging because unchecked */ ) ;
Backport compact blocks functionality from bitcoin (#1966)
* Merge #8068: Compact Blocks
48efec8 Fix some minor compact block issues that came up in review (Matt Corallo)
ccd06b9 Elaborate bucket size math (Pieter Wuille)
0d4cb48 Use vTxHashes to optimize InitData significantly (Matt Corallo)
8119026 Provide a flat list of txid/terators to txn in CTxMemPool (Matt Corallo)
678ee97 Add BIP 152 to implemented BIPs list (Matt Corallo)
56ba516 Add reconstruction debug logging (Matt Corallo)
2f34a2e Get our "best three" peers to announce blocks using cmpctblocks (Matt Corallo)
927f8ee Add ability to fetch CNode by NodeId (Matt Corallo)
d25cd3e Add receiver-side protocol implementation for CMPCTBLOCK stuff (Matt Corallo)
9c837d5 Add sender-side protocol implementation for CMPCTBLOCK stuff (Matt Corallo)
00c4078 Add protocol messages for short-ids blocks (Matt Corallo)
e3b2222 Add some blockencodings tests (Matt Corallo)
f4f8f14 Add TestMemPoolEntryHelper::FromTx version for CTransaction (Matt Corallo)
85ad31e Add partial-block block encodings API (Matt Corallo)
5249dac Add COMPACTSIZE wrapper similar to VARINT for serialization (Matt Corallo)
cbda71c Move context-required checks from CheckBlockHeader to Contextual... (Matt Corallo)
7c29ec9 If AcceptBlockHeader returns true, pindex will be set. (Matt Corallo)
96806c3 Stop trimming when mapTx is empty (Pieter Wuille)
* Merge #8408: Prevent fingerprinting, disk-DoS with compact blocks
1d06e49 Ignore CMPCTBLOCK messages for pruned blocks (Suhas Daftuar)
1de2a46 Ignore GETBLOCKTXN requests for unknown blocks (Suhas Daftuar)
* Merge #8418: Add tests for compact blocks
45c7ddd Add p2p test for BIP 152 (compact blocks) (Suhas Daftuar)
9a22a6c Add support for compactblocks to mininode (Suhas Daftuar)
a8689fd Tests: refactor compact size serialization in mininode (Suhas Daftuar)
9c8593d Implement SipHash in Python (Pieter Wuille)
56c87e9 Allow changing BIP9 parameters on regtest (Suhas Daftuar)
* Merge #8505: Trivial: Fix typos in various files
1aacfc2 various typos (leijurv)
* Merge #8449: [Trivial] Do not shadow local variable, cleanup
a159f25 Remove redundand (and shadowing) declaration (Pavel Janík)
cce3024 Do not shadow local variable, cleanup (Pavel Janík)
* Merge #8739: [qa] Fix broken sendcmpct test in p2p-compactblocks.py
157254a Fix broken sendcmpct test in p2p-compactblocks.py (Suhas Daftuar)
* Merge #8854: [qa] Fix race condition in p2p-compactblocks test
b5fd666 [qa] Fix race condition in p2p-compactblocks test (Suhas Daftuar)
* Merge #8393: Support for compact blocks together with segwit
27acfc1 [qa] Update p2p-compactblocks.py for compactblocks v2 (Suhas Daftuar)
422fac6 [qa] Add support for compactblocks v2 to mininode (Suhas Daftuar)
f5b9b8f [qa] Fix bug in mininode witness deserialization (Suhas Daftuar)
6aa28ab Use cmpctblock type 2 for segwit-enabled transfer (Pieter Wuille)
be7555f Fix overly-prescriptive p2p-segwit test for new fetch logic (Matt Corallo)
06128da Make GetFetchFlags always request witness objects from witness peers (Matt Corallo)
* Merge #8882: [qa] Fix race conditions in p2p-compactblocks.py and sendheaders.py
b55d941 [qa] Fix race condition in sendheaders.py (Suhas Daftuar)
6976db2 [qa] Another attempt to fix race condition in p2p-compactblocks.py (Suhas Daftuar)
* Merge #8904: [qa] Fix compact block shortids for a test case
4cdece4 [qa] Fix compact block shortids for a test case (Dagur Valberg Johannsson)
* Merge #8637: Compact Block Tweaks (rebase of #8235)
3ac6de0 Align constant names for maximum compact block / blocktxn depth (Pieter Wuille)
b2e93a3 Add cmpctblock to debug help list (instagibbs)
fe998e9 More agressively filter compact block requests (Matt Corallo)
02a337d Dont remove a "preferred" cmpctblock peer if they provide a block (Matt Corallo)
* Merge #8975: Chainparams: Trivial: In AppInit2(), s/Params()/chainparams/
6f2f639 Chainparams: Trivial: In AppInit2(), s/Params()/chainparams/ (Jorge Timón)
* Merge #8968: Don't hold cs_main when calling ProcessNewBlock from a cmpctblock
72ca7d9 Don't hold cs_main when calling ProcessNewBlock from a cmpctblock (Matt Corallo)
* Merge #8995: Add missing cs_main lock to ::GETBLOCKTXN processing
dfe7906 Add missing cs_main lock to ::GETBLOCKTXN processing (Matt Corallo)
* Merge #8515: A few mempool removal optimizations
0334430 Add some missing includes (Pieter Wuille)
4100499 Return shared_ptr<CTransaction> from mempool removes (Pieter Wuille)
51f2783 Make removed and conflicted arguments optional to remove (Pieter Wuille)
f48211b Bypass removeRecursive in removeForReorg (Pieter Wuille)
* Merge #9026: Fix handling of invalid compact blocks
d4833ff Bump the protocol version to distinguish new banning behavior. (Suhas Daftuar)
88c3549 Fix compact block handling to not ban if block is invalid (Suhas Daftuar)
c93beac [qa] Test that invalid compactblocks don't result in ban (Suhas Daftuar)
* Merge #9039: Various serialization simplifcations and optimizations
d59a518 Use fixed preallocation instead of costly GetSerializeSize (Pieter Wuille)
25a211a Add optimized CSizeComputer serializers (Pieter Wuille)
a2929a2 Make CSerAction's ForRead() constexpr (Pieter Wuille)
a603925 Avoid -Wshadow errors (Pieter Wuille)
5284721 Get rid of nType and nVersion (Pieter Wuille)
657e05a Make GetSerializeSize a wrapper on top of CSizeComputer (Pieter Wuille)
fad9b66 Make nType and nVersion private and sometimes const (Pieter Wuille)
c2c5d42 Make streams' read and write return void (Pieter Wuille)
50e8a9c Remove unused ReadVersion and WriteVersion (Pieter Wuille)
* Merge #9058: Fixes for p2p-compactblocks.py test timeouts on travis (#8842)
dac53b5 Modify getblocktxn handler not to drop requests for old blocks (Russell Yanofsky)
55bfddc [qa] Fix stale data bug in test_compactblocks_not_at_tip (Russell Yanofsky)
47e9659 [qa] Fix bug in compactblocks v2 merge (Russell Yanofsky)
* Merge #9160: [trivial] Fix hungarian variable name
ec34648 [trivial] Fix hungarian variable name (Russell Yanofsky)
* Merge #9159: [qa] Wait for specific block announcement in p2p-compactblocks
dfa44d1 [qa] Wait for specific block announcement in p2p-compactblocks (Russell Yanofsky)
* Merge #9125: Make CBlock a vector of shared_ptr of CTransactions
b4e4ba4 Introduce convenience type CTransactionRef (Pieter Wuille)
1662b43 Make CBlock::vtx a vector of shared_ptr<CTransaction> (Pieter Wuille)
da60506 Add deserializing constructors to CTransaction and CMutableTransaction (Pieter Wuille)
0e85204 Add serialization for unique_ptr and shared_ptr (Pieter Wuille)
* Merge #8872: Remove block-request logic from INV message processing
037159c Remove block-request logic from INV message processing (Matt Corallo)
3451203 [qa] Respond to getheaders and do not assume a getdata on inv (Matt Corallo)
d768f15 [qa] Make comptool push blocks instead of relying on inv-fetch (mrbandrews)
* Merge #9199: Always drop the least preferred HB peer when adding a new one.
ca8549d Always drop the least preferred HB peer when adding a new one. (Gregory Maxwell)
* Merge #9233: Fix some typos
15fa95d Fix some typos (fsb4000)
* Merge #9260: Mrs Peacock in The Library with The Candlestick (killed main.{h,cpp})
76faa3c Rename the remaining main.{h,cpp} to validation.{h,cpp} (Matt Corallo)
e736772 Move network-msg-processing code out of main to its own file (Matt Corallo)
87c35f5 Remove orphan state wipe from UnloadBlockIndex. (Matt Corallo)
* Merge #9014: Fix block-connection performance regression
dd0df81 Document ConnectBlock connectTrace postconditions (Matt Corallo)
2d6e561 Switch pblock in ProcessNewBlock to a shared_ptr (Matt Corallo)
2736c44 Make the optional pblock in ActivateBestChain a shared_ptr (Matt Corallo)
ae4db44 Create a shared_ptr for the block we're connecting in ActivateBCS (Matt Corallo)
fd9d890 Keep blocks as shared_ptrs, instead of copying txn in ConnectTip (Matt Corallo)
6fdd43b Add struct to track block-connect-time-generated info for callbacks (Matt Corallo)
* Merge #9240: Remove txConflicted
a874ab5 remove internal tracking of mempool conflicts for reporting to wallet (Alex Morcos)
bf663f8 remove external usage of mempool conflict tracking (Alex Morcos)
* Merge #9344: Do not run functions with necessary side-effects in assert()
da9cdd2 Do not run functions with necessary side-effects in assert() (Gregory Maxwell)
* Merge #9273: Remove unused CDiskBlockPos* argument from ProcessNewBlock
a13fa4c Remove unused CDiskBlockPos* argument from ProcessNewBlock (Matt Corallo)
* Merge #9352: Attempt reconstruction from all compact block announcements
813ede9 [qa] Update compactblocks test for multi-peer reconstruction (Suhas Daftuar)
7017298 Allow compactblock reconstruction when block is in flight (Suhas Daftuar)
* Merge #9252: Release cs_main before calling ProcessNewBlock, or processing headers (cmpctblock handling)
bd02bdd Release cs_main before processing cmpctblock as header (Suhas Daftuar)
680b0c0 Release cs_main before calling ProcessNewBlock (cmpctblock handling) (Suhas Daftuar)
* Merge #9283: A few more CTransactionRef optimizations
91335ba Remove unused MakeTransactionRef overloads (Pieter Wuille)
6713f0f Make FillBlock consume txn_available to avoid shared_ptr copies (Pieter Wuille)
62607d7 Convert COrphanTx to keep a CTransactionRef (Pieter Wuille)
c44e4c4 Make AcceptToMemoryPool take CTransactionRef (Pieter Wuille)
* Merge #9375: Relay compact block messages prior to full block connection
02ee4eb Make most_recent_compact_block a pointer to a const (Matt Corallo)
73666ad Add comment to describe callers to ActivateBestChain (Matt Corallo)
962f7f0 Call ActivateBestChain without cs_main/with most_recent_block (Matt Corallo)
0df777d Use a temp pindex to avoid a const_cast in ProcessNewBlockHeaders (Matt Corallo)
c1ae4fc Avoid holding cs_most_recent_block while calling ReadBlockFromDisk (Matt Corallo)
9eb67f5 Ensure we meet the BIP 152 old-relay-types response requirements (Matt Corallo)
5749a85 Cache most-recently-connected compact block (Matt Corallo)
9eaec08 Cache most-recently-announced block's shared_ptr (Matt Corallo)
c802092 Relay compact block messages prior to full block connection (Matt Corallo)
6987219 Add a CValidationInterface::NewPoWValidBlock callback (Matt Corallo)
180586f Call AcceptBlock with the block's shared_ptr instead of CBlock& (Matt Corallo)
8baaba6 [qa] Avoid race in preciousblock test. (Matt Corallo)
9a0b2f4 [qa] Make compact blocks test construction using fetch methods (Matt Corallo)
8017547 Make CBlockIndex*es in net_processing const (Matt Corallo)
* Merge #9486: Make peer=%d log prints consistent
e6111b2 Make peer id logging consistent ("peer=%d" instead of "peer %d") (Matt Corallo)
* Merge #9400: Set peers as HB peers upon full block validation
d4781ac Set peers as HB peers upon full block validation (Gregory Sanders)
* Merge #9499: Use recent-rejects, orphans, and recently-replaced txn for compact-block-reconstruction
c594580 Add braces around AddToCompactExtraTransactions (Matt Corallo)
1ccfe9b Clarify comment about mempool/extra conflicts (Matt Corallo)
fac4c78 Make PartiallyDownloadedBlock::InitData's second param const (Matt Corallo)
b55b416 Add extra_count lower bound to compact reconstruction debug print (Matt Corallo)
863edb4 Consider all (<100k memusage) txn for compact-block-extra-txn cache (Matt Corallo)
7f8c8ca Consider all orphan txn for compact-block-extra-txn cache (Matt Corallo)
93380c5 Use replaced transactions in compact block reconstruction (Matt Corallo)
1531652 Keep shared_ptrs to recently-replaced txn for compact blocks (Matt Corallo)
edded80 Make ATMP optionally return the CTransactionRefs it replaced (Matt Corallo)
c735540 Move ORPHAN constants from validation.h to net_processing.h (Matt Corallo)
* Merge #9587: Do not shadow local variable named `tx`.
44f2baa Do not shadow local variable named `tx`. (Pavel Janík)
* Merge #9510: [trivial] Fix typos in comments
cc16d99 [trivial] Fix typos in comments (practicalswift)
* Merge #9604: [Trivial] add comment about setting peer as HB peer.
dd5b011 [Trivial] add comment about setting peer as HB peer. (John Newbery)
* Fix using of AcceptToMemoryPool in PrivateSend code
* add `override`
* fSupportsDesiredCmpctVersion
* bring back tx ressurection in DisconnectTip
* Fix delayed headers
* Remove unused CConnman::FindNode overload
* Fix typos and comments
* Fix minor code differences
* Don't use rejection cache for corrupted transactions
Partly based on https://github.com/bitcoin/bitcoin/pull/8525
* Backport missed cs_main locking changes
Missed from https://github.com/bitcoin/bitcoin/commit/58a215ce8c13b900cf982c39f8ee4879290d1a95
* Backport missed comments and mapBlockSource.emplace call
Missed from two commits:
https://github.com/bitcoin/bitcoin/commit/88c35491ab19f9afdf9b3fa9356a072f70ef2f55
https://github.com/bitcoin/bitcoin/commit/7c98ce584ec23bcddcba8cdb33efa6547212f6ef
* Add CheckPeerHeaders() helper and check in (nCount == 0) too
2018-04-11 13:06:01 +02:00
if ( vTxHashes . size ( ) > 1 ) {
vTxHashes [ it - > vTxHashesIdx ] = std : : move ( vTxHashes . back ( ) ) ;
vTxHashes [ it - > vTxHashesIdx ] . second - > vTxHashesIdx = it - > vTxHashesIdx ;
vTxHashes . pop_back ( ) ;
if ( vTxHashes . size ( ) * 2 < vTxHashes . capacity ( ) )
vTxHashes . shrink_to_fit ( ) ;
} else
vTxHashes . clear ( ) ;
2024-04-04 12:27:33 +02:00
if ( m_dmnman ) {
2024-04-04 12:02:07 +02:00
removeUncheckedProTx ( it - > GetTx ( ) ) ;
}
totalTxSize - = it - > GetTxSize ( ) ;
m_total_fee - = it - > GetFee ( ) ;
cachedInnerUsage - = it - > DynamicMemoryUsage ( ) ;
cachedInnerUsage - = memusage : : DynamicUsage ( it - > GetMemPoolParentsConst ( ) ) + memusage : : DynamicUsage ( it - > GetMemPoolChildrenConst ( ) ) ;
mapTx . erase ( it ) ;
nTransactionsUpdated + + ;
if ( minerPolicyEstimator ) { minerPolicyEstimator - > removeTx ( hash , false ) ; }
removeAddressIndex ( hash ) ;
removeSpentIndex ( hash ) ;
}
void CTxMemPool : : removeUncheckedProTx ( const CTransaction & tx )
{
2018-12-10 06:03:57 +01:00
auto eraseProTxRef = [ & ] ( const uint256 & proTxHash , const uint256 & txHash ) {
auto its = mapProTxRefs . equal_range ( proTxHash ) ;
for ( auto it = its . first ; it ! = its . second ; ) {
if ( it - > second = = txHash ) {
it = mapProTxRefs . erase ( it ) ;
} else {
+ + it ;
}
}
} ;
2024-04-04 12:02:07 +02:00
if ( tx . nType = = TRANSACTION_PROVIDER_REGISTER ) {
auto proTx = * Assert ( GetTxPayload < CProRegTx > ( tx ) ) ;
2018-12-10 06:03:57 +01:00
if ( ! proTx . collateralOutpoint . IsNull ( ) ) {
2024-04-04 12:02:07 +02:00
eraseProTxRef ( tx . GetHash ( ) , proTx . collateralOutpoint . hash ) ;
2018-12-10 06:03:57 +01:00
}
2018-03-12 12:14:11 +01:00
mapProTxAddresses . erase ( proTx . addr ) ;
2018-02-14 14:42:06 +01:00
mapProTxPubKeyIDs . erase ( proTx . keyIDOwner ) ;
2018-10-21 21:45:16 +02:00
mapProTxBlsPubKeyHashes . erase ( proTx . pubKeyOperator . GetHash ( ) ) ;
2018-10-25 16:29:50 +02:00
mapProTxCollaterals . erase ( proTx . collateralOutpoint ) ;
2024-04-04 12:02:07 +02:00
mapProTxCollaterals . erase ( COutPoint ( tx . GetHash ( ) , proTx . collateralOutpoint . n ) ) ;
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_SERVICE ) {
auto proTx = * Assert ( GetTxPayload < CProUpServTx > ( tx ) ) ;
eraseProTxRef ( proTx . proTxHash , tx . GetHash ( ) ) ;
2018-03-12 12:14:11 +01:00
mapProTxAddresses . erase ( proTx . addr ) ;
2024-04-04 12:02:07 +02:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REGISTRAR ) {
auto proTx = * Assert ( GetTxPayload < CProUpRegTx > ( tx ) ) ;
eraseProTxRef ( proTx . proTxHash , tx . GetHash ( ) ) ;
2018-10-21 21:45:16 +02:00
mapProTxBlsPubKeyHashes . erase ( proTx . pubKeyOperator . GetHash ( ) ) ;
2024-04-04 12:02:07 +02:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REVOKE ) {
auto proTx = * Assert ( GetTxPayload < CProUpRevTx > ( tx ) ) ;
eraseProTxRef ( proTx . proTxHash , tx . GetHash ( ) ) ;
} else if ( tx . nType = = TRANSACTION_ASSET_UNLOCK ) {
mapAssetUnlockExpiry . erase ( tx . GetHash ( ) ) ;
2018-02-14 14:42:06 +01:00
}
2015-07-15 20:47:45 +02:00
}
// Calculates descendants of entry that are not already in setDescendants, and adds to
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
// setDescendants. Assumes entryit is already a tx in the mempool and CTxMemPoolEntry::m_children
2015-07-15 20:47:45 +02:00
// is correct for tx and all descendants.
// Also assumes that if an entry is in setDescendants already, then all
// in-mempool descendants of it are already in setDescendants as well, so that we
// can save time by not iterating over those entries.
2021-04-08 21:35:16 +02:00
void CTxMemPool : : CalculateDescendants ( txiter entryit , setEntries & setDescendants ) const
2015-07-15 20:47:45 +02:00
{
setEntries stage ;
if ( setDescendants . count ( entryit ) = = 0 ) {
stage . insert ( entryit ) ;
}
// Traverse down the children of entry, only adding children that are not
// accounted for in setDescendants already (because those children have either
// already been walked, or will be walked in this iteration).
while ( ! stage . empty ( ) ) {
txiter it = * stage . begin ( ) ;
setDescendants . insert ( it ) ;
stage . erase ( it ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
const CTxMemPoolEntry : : Children & children = it - > GetMemPoolChildrenConst ( ) ;
for ( const CTxMemPoolEntry & child : children ) {
txiter childiter = mapTx . iterator_to ( child ) ;
2015-07-15 20:47:45 +02:00
if ( ! setDescendants . count ( childiter ) ) {
stage . insert ( childiter ) ;
}
}
}
}
2013-08-27 07:51:57 +02:00
2017-01-24 10:07:50 +01:00
void CTxMemPool : : removeRecursive ( const CTransaction & origTx , MemPoolRemovalReason reason )
2013-08-27 07:51:57 +02:00
{
// Remove transaction from memory pool
2022-05-05 16:24:51 +02:00
AssertLockHeld ( cs ) ;
2015-07-15 20:47:45 +02:00
setEntries txToRemove ;
txiter origit = mapTx . find ( origTx . GetHash ( ) ) ;
if ( origit ! = mapTx . end ( ) ) {
txToRemove . insert ( origit ) ;
2016-03-17 13:33:31 +01:00
} else {
// When recursively removing but origTx isn't in the mempool
2015-03-25 18:13:09 +01:00
// be sure to remove any children that are in the pool. This can
// happen during chain re-orgs if origTx isn't re-accepted into
// the mempool for any reason.
for ( unsigned int i = 0 ; i < origTx . vout . size ( ) ; i + + ) {
2016-06-03 01:11:02 +02:00
auto it = mapNextTx . find ( COutPoint ( origTx . GetHash ( ) , i ) ) ;
2015-03-25 18:13:09 +01:00
if ( it = = mapNextTx . end ( ) )
continue ;
2016-06-03 01:11:02 +02:00
txiter nextit = mapTx . find ( it - > second - > GetHash ( ) ) ;
2015-07-15 20:47:45 +02:00
assert ( nextit ! = mapTx . end ( ) ) ;
txToRemove . insert ( nextit ) ;
2015-03-25 18:13:09 +01:00
}
}
2015-07-15 20:47:45 +02:00
setEntries setAllRemoves ;
2019-07-05 09:06:28 +02:00
for ( txiter it : txToRemove ) {
2016-03-17 13:33:31 +01:00
CalculateDescendants ( it , setAllRemoves ) ;
2015-07-15 20:47:45 +02:00
}
2017-01-24 10:07:50 +01:00
RemoveStaged ( setAllRemoves , false , reason ) ;
2013-08-27 07:51:57 +02:00
}
2023-03-23 08:47:33 +01:00
void CTxMemPool : : removeForReorg ( CChainState & active_chainstate , int flags ) EXCLUSIVE_LOCKS_REQUIRED ( cs_main )
2014-11-12 05:57:54 +01:00
{
2015-09-06 06:40:21 +02:00
// Remove transactions spending a coinbase which are now immature and no-longer-final transactions
2022-05-05 16:24:51 +02:00
AssertLockHeld ( cs ) ;
2016-10-25 14:00:27 +02:00
setEntries txToRemove ;
2015-06-24 10:32:20 +02:00
for ( indexed_transaction_set : : const_iterator it = mapTx . begin ( ) ; it ! = mapTx . end ( ) ; it + + ) {
const CTransaction & tx = it - > GetTx ( ) ;
2015-12-04 21:01:22 +01:00
LockPoints lp = it - > GetLockPoints ( ) ;
2023-03-23 08:47:33 +01:00
bool validLP = TestLockPointValidity ( active_chainstate . m_chain , & lp ) ;
2024-01-18 20:17:25 +01:00
CCoinsViewMemPool view_mempool ( & active_chainstate . CoinsTip ( ) , * this ) ;
2024-01-29 16:01:14 +01:00
if ( ! CheckFinalTx ( active_chainstate . m_chain . Tip ( ) , tx , flags )
2024-01-18 20:17:25 +01:00
| | ! CheckSequenceLocks ( active_chainstate . m_chain . Tip ( ) , view_mempool , tx , flags , & lp , validLP ) ) {
2015-12-04 21:01:22 +01:00
// Note if CheckSequenceLocks fails the LockPoints may still be invalid
// So it's critical that we remove the tx and not depend on the LockPoints.
2016-10-25 14:00:27 +02:00
txToRemove . insert ( it ) ;
2015-10-29 19:06:13 +01:00
} else if ( it - > GetSpendsCoinbase ( ) ) {
2019-07-05 09:06:28 +02:00
for ( const CTxIn & txin : tx . vin ) {
2015-08-27 03:58:17 +02:00
indexed_transaction_set : : const_iterator it2 = mapTx . find ( txin . prevout . hash ) ;
if ( it2 ! = mapTx . end ( ) )
continue ;
2023-03-23 08:47:33 +01:00
const Coin & coin = active_chainstate . CoinsTip ( ) . AccessCoin ( txin . prevout ) ;
2023-02-20 20:31:40 +01:00
if ( m_check_ratio ! = 0 ) assert ( ! coin . IsSpent ( ) ) ;
2023-03-23 08:47:33 +01:00
unsigned int nMemPoolHeight = active_chainstate . m_chain . Tip ( ) - > nHeight + 1 ;
2017-06-02 00:47:58 +02:00
if ( coin . IsSpent ( ) | | ( coin . IsCoinBase ( ) & & ( ( signed long ) nMemPoolHeight ) - coin . nHeight < COINBASE_MATURITY ) ) {
2016-10-25 14:00:27 +02:00
txToRemove . insert ( it ) ;
2015-08-27 03:58:17 +02:00
break ;
}
2014-11-12 05:57:54 +01:00
}
}
2015-12-04 21:01:22 +01:00
if ( ! validLP ) {
mapTx . modify ( it , update_lock_points ( lp ) ) ;
}
2014-11-12 05:57:54 +01:00
}
2016-10-25 14:00:27 +02:00
setEntries setAllRemoves ;
for ( txiter it : txToRemove ) {
CalculateDescendants ( it , setAllRemoves ) ;
2014-11-12 05:57:54 +01:00
}
2017-01-24 10:07:50 +01:00
RemoveStaged ( setAllRemoves , false , MemPoolRemovalReason : : REORG ) ;
2014-11-12 05:57:54 +01:00
}
2016-12-10 01:22:33 +01:00
void CTxMemPool : : removeConflicts ( const CTransaction & tx )
2013-08-27 07:51:57 +02:00
{
// Remove transactions which depend on inputs of tx, recursively
2018-07-22 15:41:34 +02:00
AssertLockHeld ( cs ) ;
2019-07-05 09:06:28 +02:00
for ( const CTxIn & txin : tx . vin ) {
2016-06-03 01:11:02 +02:00
auto it = mapNextTx . find ( txin . prevout ) ;
2013-08-27 07:51:57 +02:00
if ( it ! = mapNextTx . end ( ) ) {
2016-06-03 01:11:02 +02:00
const CTransaction & txConflict = * it - > second ;
2013-08-27 07:51:57 +02:00
if ( txConflict ! = tx )
2014-02-15 22:38:28 +01:00
{
2021-05-14 19:55:03 +02:00
if ( txConflict . nType = = TRANSACTION_PROVIDER_REGISTER ) {
// Remove all other protxes which refer to this protx
// NOTE: Can't use equal_range here as every call to removeRecursive might invalidate iterators
while ( true ) {
auto itPro = mapProTxRefs . find ( txConflict . GetHash ( ) ) ;
if ( itPro = = mapProTxRefs . end ( ) ) {
break ;
}
auto txit = mapTx . find ( itPro - > second ) ;
if ( txit ! = mapTx . end ( ) ) {
ClearPrioritisation ( txit - > GetTx ( ) . GetHash ( ) ) ;
removeRecursive ( txit - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
} else {
mapProTxRefs . erase ( itPro ) ;
}
}
}
2015-08-03 21:49:01 +02:00
ClearPrioritisation ( txConflict . GetHash ( ) ) ;
2017-01-24 10:07:50 +01:00
removeRecursive ( txConflict , MemPoolRemovalReason : : CONFLICT ) ;
2014-02-15 22:38:28 +01:00
}
2013-08-27 07:51:57 +02:00
}
}
}
2018-03-19 08:44:00 +01:00
void CTxMemPool : : removeProTxPubKeyConflicts ( const CTransaction & tx , const CKeyID & keyId )
{
if ( mapProTxPubKeyIDs . count ( keyId ) ) {
uint256 conflictHash = mapProTxPubKeyIDs [ keyId ] ;
if ( conflictHash ! = tx . GetHash ( ) & & mapTx . count ( conflictHash ) ) {
removeRecursive ( mapTx . find ( conflictHash ) - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
}
}
}
2023-06-04 22:45:56 +02:00
void CTxMemPool : : removeProTxPubKeyConflicts ( const CTransaction & tx , const CBLSLazyPublicKey & pubKey )
2018-10-21 21:45:16 +02:00
{
if ( mapProTxBlsPubKeyHashes . count ( pubKey . GetHash ( ) ) ) {
uint256 conflictHash = mapProTxBlsPubKeyHashes [ pubKey . GetHash ( ) ] ;
if ( conflictHash ! = tx . GetHash ( ) & & mapTx . count ( conflictHash ) ) {
removeRecursive ( mapTx . find ( conflictHash ) - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
}
}
}
2018-10-25 16:29:50 +02:00
void CTxMemPool : : removeProTxCollateralConflicts ( const CTransaction & tx , const COutPoint & collateralOutpoint )
{
if ( mapProTxCollaterals . count ( collateralOutpoint ) ) {
uint256 conflictHash = mapProTxCollaterals [ collateralOutpoint ] ;
if ( conflictHash ! = tx . GetHash ( ) & & mapTx . count ( conflictHash ) ) {
removeRecursive ( mapTx . find ( conflictHash ) - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
}
}
}
2018-12-10 06:03:57 +01:00
void CTxMemPool : : removeProTxSpentCollateralConflicts ( const CTransaction & tx )
{
2024-04-04 12:27:33 +02:00
assert ( m_dmnman ) ;
2024-04-04 12:02:07 +02:00
2018-12-10 06:03:57 +01:00
// Remove TXs that refer to a MN for which the collateral was spent
auto removeSpentCollateralConflict = [ & ] ( const uint256 & proTxHash ) {
// Can't use equal_range here as every call to removeRecursive might invalidate iterators
2021-06-23 10:12:11 +02:00
AssertLockHeld ( cs ) ;
2018-12-10 06:03:57 +01:00
while ( true ) {
auto it = mapProTxRefs . find ( proTxHash ) ;
if ( it = = mapProTxRefs . end ( ) ) {
break ;
}
auto conflictIt = mapTx . find ( it - > second ) ;
if ( conflictIt ! = mapTx . end ( ) ) {
removeRecursive ( conflictIt - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
} else {
// Should not happen as we track referencing TXs in addUnchecked/removeUnchecked.
// But lets be on the safe side and not run into an endless loop...
2020-01-28 11:04:47 +01:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: found invalid TX ref in mapProTxRefs, proTxHash=%s, txHash=%s \n " , __func__ , proTxHash . ToString ( ) , it - > second . ToString ( ) ) ;
2018-12-10 06:03:57 +01:00
mapProTxRefs . erase ( it ) ;
}
}
} ;
2024-04-04 12:27:33 +02:00
auto mnList = m_dmnman - > GetListAtChainTip ( ) ;
2018-12-10 06:03:57 +01:00
for ( const auto & in : tx . vin ) {
auto collateralIt = mapProTxCollaterals . find ( in . prevout ) ;
if ( collateralIt ! = mapProTxCollaterals . end ( ) ) {
// These are not yet mined ProRegTxs
removeSpentCollateralConflict ( collateralIt - > second ) ;
}
auto dmn = mnList . GetMNByCollateral ( in . prevout ) ;
if ( dmn ) {
2021-07-17 21:15:21 +02:00
// These are updates referring to a mined ProRegTx
2018-12-10 06:03:57 +01:00
removeSpentCollateralConflict ( dmn - > proTxHash ) ;
}
}
}
2018-12-10 09:14:19 +01:00
void CTxMemPool : : removeProTxKeyChangedConflicts ( const CTransaction & tx , const uint256 & proTxHash , const uint256 & newKeyHash )
{
std : : set < uint256 > conflictingTxs ;
for ( auto its = mapProTxRefs . equal_range ( proTxHash ) ; its . first ! = its . second ; + + its . first ) {
auto txit = mapTx . find ( its . first - > second ) ;
if ( txit = = mapTx . end ( ) ) {
continue ;
}
if ( txit - > validForProTxKey ! = newKeyHash ) {
conflictingTxs . emplace ( txit - > GetTx ( ) . GetHash ( ) ) ;
}
}
for ( const auto & txHash : conflictingTxs ) {
auto & tx = mapTx . find ( txHash ) - > GetTx ( ) ;
removeRecursive ( tx , MemPoolRemovalReason : : CONFLICT ) ;
}
}
2018-02-14 14:42:06 +01:00
void CTxMemPool : : removeProTxConflicts ( const CTransaction & tx )
{
2018-12-10 06:03:57 +01:00
removeProTxSpentCollateralConflicts ( tx ) ;
2018-03-12 12:14:11 +01:00
if ( tx . nType = = TRANSACTION_PROVIDER_REGISTER ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProRegTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-11-13 13:47:28 +01:00
return ;
2018-03-12 12:14:11 +01:00
}
2024-01-12 04:43:01 +01:00
auto & proTx = * opt_proTx ;
2018-02-14 14:42:06 +01:00
2018-03-12 12:14:11 +01:00
if ( mapProTxAddresses . count ( proTx . addr ) ) {
uint256 conflictHash = mapProTxAddresses [ proTx . addr ] ;
if ( conflictHash ! = tx . GetHash ( ) & & mapTx . count ( conflictHash ) ) {
removeRecursive ( mapTx . find ( conflictHash ) - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
}
2018-02-14 14:42:06 +01:00
}
2018-03-19 08:44:00 +01:00
removeProTxPubKeyConflicts ( tx , proTx . keyIDOwner ) ;
2018-10-21 21:45:16 +02:00
removeProTxPubKeyConflicts ( tx , proTx . pubKeyOperator ) ;
2018-10-25 16:29:50 +02:00
if ( ! proTx . collateralOutpoint . hash . IsNull ( ) ) {
removeProTxCollateralConflicts ( tx , proTx . collateralOutpoint ) ;
2021-05-14 19:55:03 +02:00
} else {
removeProTxCollateralConflicts ( tx , COutPoint ( tx . GetHash ( ) , proTx . collateralOutpoint . n ) ) ;
2018-10-25 16:29:50 +02:00
}
2018-03-12 12:14:11 +01:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_SERVICE ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProUpServTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-11-13 13:47:28 +01:00
return ;
2018-03-12 12:14:11 +01:00
}
2024-01-12 04:43:01 +01:00
if ( mapProTxAddresses . count ( opt_proTx - > addr ) ) {
uint256 conflictHash = mapProTxAddresses [ opt_proTx - > addr ] ;
2018-03-12 12:14:11 +01:00
if ( conflictHash ! = tx . GetHash ( ) & & mapTx . count ( conflictHash ) ) {
removeRecursive ( mapTx . find ( conflictHash ) - > GetTx ( ) , MemPoolRemovalReason : : CONFLICT ) ;
}
2018-02-14 14:42:06 +01:00
}
2018-03-19 08:44:00 +01:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REGISTRAR ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProUpRegTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-11-13 13:47:28 +01:00
return ;
2018-03-19 08:44:00 +01:00
}
2024-01-12 04:43:01 +01:00
removeProTxPubKeyConflicts ( tx , opt_proTx - > pubKeyOperator ) ;
removeProTxKeyChangedConflicts ( tx , opt_proTx - > proTxHash , : : SerializeHash ( opt_proTx - > pubKeyOperator ) ) ;
2018-12-10 09:14:19 +01:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REVOKE ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProUpRevTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-12-10 09:14:19 +01:00
return ;
}
2024-01-12 04:43:01 +01:00
removeProTxKeyChangedConflicts ( tx , opt_proTx - > proTxHash , : : SerializeHash ( CBLSPublicKey ( ) ) ) ;
2018-02-14 14:42:06 +01:00
}
}
2014-11-17 03:29:09 +01:00
/**
* Called when a block is connected . Removes from mempool and updates the miner fee estimator .
*/
2017-01-05 23:14:23 +01:00
void CTxMemPool : : removeForBlock ( const std : : vector < CTransactionRef > & vtx , unsigned int nBlockHeight )
2014-03-17 13:19:54 +01:00
{
2022-05-05 16:24:51 +02:00
AssertLockHeld ( cs ) ;
2017-01-05 23:14:23 +01:00
std : : vector < const CTxMemPoolEntry * > entries ;
2016-11-21 10:51:32 +01:00
for ( const auto & tx : vtx )
2014-03-17 13:19:54 +01:00
{
2016-11-21 10:51:32 +01:00
uint256 hash = tx - > GetHash ( ) ;
2015-06-24 10:32:20 +02:00
indexed_transaction_set : : iterator i = mapTx . find ( hash ) ;
if ( i ! = mapTx . end ( ) )
2017-01-05 23:14:23 +01:00
entries . push_back ( & * i ) ;
2014-03-17 13:19:54 +01:00
}
2017-01-05 23:14:23 +01:00
// Before the txs in the new block have been removed from the mempool, update policy estimates
2017-04-20 21:16:19 +02:00
if ( minerPolicyEstimator ) { minerPolicyEstimator - > processBlock ( nBlockHeight , entries ) ; }
2016-11-21 10:51:32 +01:00
for ( const auto & tx : vtx )
2014-03-17 13:19:54 +01:00
{
2016-11-21 10:51:32 +01:00
txiter it = mapTx . find ( tx - > GetHash ( ) ) ;
2016-03-17 13:33:31 +01:00
if ( it ! = mapTx . end ( ) ) {
setEntries stage ;
stage . insert ( it ) ;
2017-01-24 10:07:50 +01:00
RemoveStaged ( stage , true , MemPoolRemovalReason : : BLOCK ) ;
2016-03-17 13:33:31 +01:00
}
2016-12-10 01:22:33 +01:00
removeConflicts ( * tx ) ;
2024-04-04 12:27:33 +02:00
if ( m_dmnman ) {
2024-04-04 12:02:07 +02:00
removeProTxConflicts ( * tx ) ;
}
2016-11-21 10:51:32 +01:00
ClearPrioritisation ( tx - > GetHash ( ) ) ;
2014-03-17 13:19:54 +01:00
}
2015-10-02 23:19:55 +02:00
lastRollingFeeUpdate = GetTime ( ) ;
blockSinceLastRollingFeeBump = true ;
2014-03-17 13:19:54 +01:00
}
2023-07-24 18:39:38 +02:00
/**
* Called when a lenght of chain is increased . Removes from mempool expired asset - unlock transactions
*/
2020-05-11 05:33:42 +02:00
void CTxMemPool : : removeExpiredAssetUnlock ( int nBlockHeight )
2023-07-24 18:39:38 +02:00
{
AssertLockHeld ( cs ) ;
// items to removed should be firstly collected to independed list,
// because removing items by `removeRecursive` changes the mapAssetUnlockExpiry
std : : vector < CTransactionRef > entries ;
for ( const auto & item : mapAssetUnlockExpiry ) {
if ( item . second < nBlockHeight ) {
entries . push_back ( get ( item . first ) ) ;
}
}
for ( const auto & tx : entries ) {
removeRecursive ( * tx , MemPoolRemovalReason : : EXPIRY ) ;
}
}
2015-10-26 14:55:17 +01:00
void CTxMemPool : : _clear ( )
2013-08-27 07:51:57 +02:00
{
2022-04-06 14:02:04 +02:00
vTxHashes . clear ( ) ;
2013-08-27 07:51:57 +02:00
mapTx . clear ( ) ;
mapNextTx . clear ( ) ;
2018-03-12 12:14:11 +01:00
mapProTxAddresses . clear ( ) ;
2018-02-14 14:42:06 +01:00
mapProTxPubKeyIDs . clear ( ) ;
2014-08-07 05:58:19 +02:00
totalTxSize = 0 ;
2021-02-08 20:36:33 +01:00
m_total_fee = 0 ;
2015-07-09 19:56:31 +02:00
cachedInnerUsage = 0 ;
2015-10-02 23:19:55 +02:00
lastRollingFeeUpdate = GetTime ( ) ;
blockSinceLastRollingFeeBump = false ;
rollingMinimumFeeRate = 0 ;
2013-08-27 07:51:57 +02:00
+ + nTransactionsUpdated ;
}
2015-10-26 14:55:17 +01:00
void CTxMemPool : : clear ( )
{
LOCK ( cs ) ;
_clear ( ) ;
}
2017-10-11 10:42:55 +02:00
static void CheckInputsAndUpdateCoins ( const CTransaction & tx , CCoinsViewCache & mempoolDuplicate , const int64_t spendheight )
{
2019-10-30 15:27:22 +01:00
TxValidationState dummy_state ; // Not used. CheckTxInputs() should always pass
2017-10-11 10:42:55 +02:00
CAmount txfee = 0 ;
2019-10-30 15:27:22 +01:00
bool fCheckResult = tx . IsCoinBase ( ) | | Consensus : : CheckTxInputs ( tx , dummy_state , mempoolDuplicate , spendheight , txfee ) ;
2017-10-11 10:42:55 +02:00
assert ( fCheckResult ) ;
2019-05-29 12:21:33 +02:00
UpdateCoins ( tx , mempoolDuplicate , std : : numeric_limits < int > : : max ( ) ) ;
2017-10-11 10:42:55 +02:00
}
2023-03-13 16:37:00 +01:00
void CTxMemPool : : check ( CChainState & active_chainstate ) const
2013-08-27 07:51:57 +02:00
{
2023-02-20 20:31:40 +01:00
if ( m_check_ratio = = 0 ) return ;
2015-10-07 23:34:55 +02:00
2023-02-20 20:31:40 +01:00
if ( GetRand ( m_check_ratio ) > = 1 ) return ;
2013-08-27 07:51:57 +02:00
2021-01-20 20:43:27 +01:00
AssertLockHeld ( : : cs_main ) ;
2023-02-20 20:31:40 +01:00
LOCK ( cs ) ;
2019-05-22 23:51:39 +02:00
LogPrint ( BCLog : : MEMPOOL , " Checking mempool with %u transactions and %u inputs \n " , ( unsigned int ) mapTx . size ( ) , ( unsigned int ) mapNextTx . size ( ) ) ;
2013-08-27 07:51:57 +02:00
2014-08-07 05:58:19 +02:00
uint64_t checkTotal = 0 ;
2021-02-08 20:36:33 +01:00
CAmount check_total_fee { 0 } ;
2015-07-09 19:56:31 +02:00
uint64_t innerUsage = 0 ;
2014-08-07 05:58:19 +02:00
2023-03-13 16:37:00 +01:00
CCoinsViewCache & active_coins_tip = active_chainstate . CoinsTip ( ) ;
CCoinsViewCache mempoolDuplicate ( const_cast < CCoinsViewCache * > ( & active_coins_tip ) ) ;
const int64_t spendheight = active_chainstate . m_chain . Height ( ) + 1 ;
2014-11-12 06:06:15 +01:00
2017-01-30 13:13:07 +01:00
std : : list < const CTxMemPoolEntry * > waitingOnDependants ;
2015-06-24 10:32:20 +02:00
for ( indexed_transaction_set : : const_iterator it = mapTx . begin ( ) ; it ! = mapTx . end ( ) ; it + + ) {
2013-08-27 07:51:57 +02:00
unsigned int i = 0 ;
2015-06-24 10:32:20 +02:00
checkTotal + = it - > GetTxSize ( ) ;
2021-02-08 20:36:33 +01:00
check_total_fee + = it - > GetFee ( ) ;
2015-06-24 10:32:20 +02:00
innerUsage + = it - > DynamicMemoryUsage ( ) ;
const CTransaction & tx = it - > GetTx ( ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
innerUsage + = memusage : : DynamicUsage ( it - > GetMemPoolParentsConst ( ) ) + memusage : : DynamicUsage ( it - > GetMemPoolChildrenConst ( ) ) ;
2014-11-12 06:06:15 +01:00
bool fDependsWait = false ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
CTxMemPoolEntry : : Parents setParentCheck ;
2019-07-05 09:06:28 +02:00
for ( const CTxIn & txin : tx . vin ) {
2013-08-27 07:51:57 +02:00
// Check that every mempool transaction's inputs refer to available coins, or other mempool tx's.
2015-06-24 10:32:20 +02:00
indexed_transaction_set : : const_iterator it2 = mapTx . find ( txin . prevout . hash ) ;
2013-08-27 07:51:57 +02:00
if ( it2 ! = mapTx . end ( ) ) {
2015-06-24 10:32:20 +02:00
const CTransaction & tx2 = it2 - > GetTx ( ) ;
2013-11-11 08:35:14 +01:00
assert ( tx2 . vout . size ( ) > txin . prevout . n & & ! tx2 . vout [ txin . prevout . n ] . IsNull ( ) ) ;
2014-11-12 06:06:15 +01:00
fDependsWait = true ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
setParentCheck . insert ( * it2 ) ;
2013-08-27 07:51:57 +02:00
} else {
2023-03-13 16:37:00 +01:00
assert ( active_coins_tip . HaveCoin ( txin . prevout ) ) ;
2013-08-27 07:51:57 +02:00
}
// Check whether its inputs are marked in mapNextTx.
2016-06-03 01:11:02 +02:00
auto it3 = mapNextTx . find ( txin . prevout ) ;
2013-08-27 07:51:57 +02:00
assert ( it3 ! = mapNextTx . end ( ) ) ;
2016-06-03 01:11:02 +02:00
assert ( it3 - > first = = & txin . prevout ) ;
assert ( it3 - > second = = & tx ) ;
2013-08-27 07:51:57 +02:00
i + + ;
}
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
auto comp = [ ] ( const CTxMemPoolEntry & a , const CTxMemPoolEntry & b ) - > bool {
return a . GetTx ( ) . GetHash ( ) = = b . GetTx ( ) . GetHash ( ) ;
} ;
assert ( setParentCheck . size ( ) = = it - > GetMemPoolParentsConst ( ) . size ( ) ) ;
assert ( std : : equal ( setParentCheck . begin ( ) , setParentCheck . end ( ) , it - > GetMemPoolParentsConst ( ) . begin ( ) , comp ) ) ;
2016-03-17 13:33:31 +01:00
// Verify ancestor state is correct.
setEntries setAncestors ;
uint64_t nNoLimit = std : : numeric_limits < uint64_t > : : max ( ) ;
std : : string dummy ;
CalculateMemPoolAncestors ( * it , setAncestors , nNoLimit , nNoLimit , nNoLimit , nNoLimit , dummy ) ;
uint64_t nCountCheck = setAncestors . size ( ) + 1 ;
uint64_t nSizeCheck = it - > GetTxSize ( ) ;
CAmount nFeesCheck = it - > GetModifiedFee ( ) ;
unsigned int nSigOpCheck = it - > GetSigOpCount ( ) ;
2019-07-05 09:06:28 +02:00
for ( txiter ancestorIt : setAncestors ) {
2016-03-17 13:33:31 +01:00
nSizeCheck + = ancestorIt - > GetTxSize ( ) ;
nFeesCheck + = ancestorIt - > GetModifiedFee ( ) ;
nSigOpCheck + = ancestorIt - > GetSigOpCount ( ) ;
}
assert ( it - > GetCountWithAncestors ( ) = = nCountCheck ) ;
assert ( it - > GetSizeWithAncestors ( ) = = nSizeCheck ) ;
assert ( it - > GetSigOpCountWithAncestors ( ) = = nSigOpCheck ) ;
assert ( it - > GetModFeesWithAncestors ( ) = = nFeesCheck ) ;
2015-07-15 20:47:45 +02:00
// Check children against mapNextTx
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
CTxMemPoolEntry : : Children setChildrenCheck ;
2016-06-03 01:11:02 +02:00
auto iter = mapNextTx . lower_bound ( COutPoint ( it - > GetTx ( ) . GetHash ( ) , 0 ) ) ;
2018-06-11 15:05:49 +02:00
uint64_t child_sizes = 0 ;
2016-06-03 01:11:02 +02:00
for ( ; iter ! = mapNextTx . end ( ) & & iter - > first - > hash = = it - > GetTx ( ) . GetHash ( ) ; + + iter ) {
txiter childit = mapTx . find ( iter - > second - > GetHash ( ) ) ;
2015-07-15 20:47:45 +02:00
assert ( childit ! = mapTx . end ( ) ) ; // mapNextTx points to in-mempool transactions
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
if ( setChildrenCheck . insert ( * childit ) . second ) {
2018-06-11 15:05:49 +02:00
child_sizes + = childit - > GetTxSize ( ) ;
2015-07-15 20:47:45 +02:00
}
}
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
assert ( setChildrenCheck . size ( ) = = it - > GetMemPoolChildrenConst ( ) . size ( ) ) ;
assert ( std : : equal ( setChildrenCheck . begin ( ) , setChildrenCheck . end ( ) , it - > GetMemPoolChildrenConst ( ) . begin ( ) , comp ) ) ;
2015-11-19 17:18:28 +01:00
// Also check to make sure size is greater than sum with immediate children.
2015-07-15 20:47:45 +02:00
// just a sanity check, not definitive that this calc is correct...
2018-06-11 15:05:49 +02:00
assert ( it - > GetSizeWithDescendants ( ) > = child_sizes + it - > GetTxSize ( ) ) ;
2015-07-15 20:47:45 +02:00
2014-11-12 06:06:15 +01:00
if ( fDependsWait )
2015-06-24 10:32:20 +02:00
waitingOnDependants . push_back ( & ( * it ) ) ;
2014-11-12 06:06:15 +01:00
else {
2017-10-11 10:42:55 +02:00
CheckInputsAndUpdateCoins ( tx , mempoolDuplicate , spendheight ) ;
2014-11-12 06:06:15 +01:00
}
}
unsigned int stepsSinceLastRemove = 0 ;
while ( ! waitingOnDependants . empty ( ) ) {
const CTxMemPoolEntry * entry = waitingOnDependants . front ( ) ;
waitingOnDependants . pop_front ( ) ;
if ( ! mempoolDuplicate . HaveInputs ( entry - > GetTx ( ) ) ) {
waitingOnDependants . push_back ( entry ) ;
stepsSinceLastRemove + + ;
assert ( stepsSinceLastRemove < waitingOnDependants . size ( ) ) ;
} else {
2017-10-11 10:42:55 +02:00
CheckInputsAndUpdateCoins ( entry - > GetTx ( ) , mempoolDuplicate , spendheight ) ;
2014-11-12 06:06:15 +01:00
stepsSinceLastRemove = 0 ;
}
2013-08-27 07:51:57 +02:00
}
2016-06-03 01:11:02 +02:00
for ( auto it = mapNextTx . cbegin ( ) ; it ! = mapNextTx . cend ( ) ; it + + ) {
uint256 hash = it - > second - > GetHash ( ) ;
2015-06-24 10:32:20 +02:00
indexed_transaction_set : : const_iterator it2 = mapTx . find ( hash ) ;
const CTransaction & tx = it2 - > GetTx ( ) ;
2013-08-27 07:51:57 +02:00
assert ( it2 ! = mapTx . end ( ) ) ;
2016-06-03 01:11:02 +02:00
assert ( & tx = = it - > second ) ;
2013-08-27 07:51:57 +02:00
}
2014-08-07 05:58:19 +02:00
assert ( totalTxSize = = checkTotal ) ;
2021-02-08 20:36:33 +01:00
assert ( m_total_fee = = check_total_fee ) ;
2015-07-09 19:56:31 +02:00
assert ( innerUsage = = cachedInnerUsage ) ;
2013-08-27 07:51:57 +02:00
}
2016-05-05 13:14:29 +02:00
bool CTxMemPool : : CompareDepthAndScore ( const uint256 & hasha , const uint256 & hashb )
{
2023-05-11 15:02:14 +02:00
/* Return `true` if hasha should be considered sooner than hashb. Namely when:
* a is not in the mempool , but b is
* both are in the mempool and a has fewer ancestors than b
* both are in the mempool and a has a higher score than b
*/
2016-05-05 13:14:29 +02:00
LOCK ( cs ) ;
indexed_transaction_set : : const_iterator j = mapTx . find ( hashb ) ;
2023-05-11 15:02:14 +02:00
if ( j = = mapTx . end ( ) ) return false ;
indexed_transaction_set : : const_iterator i = mapTx . find ( hasha ) ;
if ( i = = mapTx . end ( ) ) return true ;
2016-05-05 13:14:29 +02:00
uint64_t counta = i - > GetCountWithAncestors ( ) ;
uint64_t countb = j - > GetCountWithAncestors ( ) ;
if ( counta = = countb ) {
return CompareTxMemPoolEntryByScore ( ) ( * i , * j ) ;
}
return counta < countb ;
}
namespace {
class DepthAndScoreComparator
{
public :
2016-06-08 14:01:05 +02:00
bool operator ( ) ( const CTxMemPool : : indexed_transaction_set : : const_iterator & a , const CTxMemPool : : indexed_transaction_set : : const_iterator & b )
{
uint64_t counta = a - > GetCountWithAncestors ( ) ;
uint64_t countb = b - > GetCountWithAncestors ( ) ;
if ( counta = = countb ) {
return CompareTxMemPoolEntryByScore ( ) ( * a , * b ) ;
}
return counta < countb ;
}
2016-05-05 13:14:29 +02:00
} ;
2017-06-26 13:37:42 +02:00
} // namespace
2016-05-05 13:14:29 +02:00
2016-06-08 14:01:05 +02:00
std : : vector < CTxMemPool : : indexed_transaction_set : : const_iterator > CTxMemPool : : GetSortedDepthAndScore ( ) const
2013-08-27 07:51:57 +02:00
{
2016-06-08 14:01:05 +02:00
std : : vector < indexed_transaction_set : : const_iterator > iters ;
AssertLockHeld ( cs ) ;
iters . reserve ( mapTx . size ( ) ) ;
2013-08-27 07:51:57 +02:00
2016-06-08 14:01:05 +02:00
for ( indexed_transaction_set : : iterator mi = mapTx . begin ( ) ; mi ! = mapTx . end ( ) ; + + mi ) {
iters . push_back ( mi ) ;
}
std : : sort ( iters . begin ( ) , iters . end ( ) , DepthAndScoreComparator ( ) ) ;
return iters ;
}
2019-02-23 17:04:20 +01:00
void CTxMemPool : : queryHashes ( std : : vector < uint256 > & vtxid ) const
2016-06-08 14:01:05 +02:00
{
2013-08-27 07:51:57 +02:00
LOCK ( cs ) ;
2016-06-08 14:01:05 +02:00
auto iters = GetSortedDepthAndScore ( ) ;
vtxid . clear ( ) ;
2013-08-27 07:51:57 +02:00
vtxid . reserve ( mapTx . size ( ) ) ;
2016-05-05 13:14:29 +02:00
2016-06-08 14:01:05 +02:00
for ( auto it : iters ) {
vtxid . push_back ( it - > GetTx ( ) . GetHash ( ) ) ;
}
2013-08-27 07:51:57 +02:00
}
2016-11-02 11:12:48 +01:00
static TxMempoolInfo GetInfo ( CTxMemPool : : indexed_transaction_set : : const_iterator it ) {
2019-10-04 19:27:52 +02:00
return TxMempoolInfo { it - > GetSharedTx ( ) , it - > GetTime ( ) , it - > GetFee ( ) , it - > GetTxSize ( ) , it - > GetModifiedFee ( ) - it - > GetFee ( ) } ;
2016-11-02 11:12:48 +01:00
}
2016-06-08 14:01:05 +02:00
std : : vector < TxMempoolInfo > CTxMemPool : : infoAll ( ) const
2013-08-27 07:51:57 +02:00
{
LOCK ( cs ) ;
2016-06-08 14:01:05 +02:00
auto iters = GetSortedDepthAndScore ( ) ;
std : : vector < TxMempoolInfo > ret ;
ret . reserve ( mapTx . size ( ) ) ;
for ( auto it : iters ) {
2016-11-02 11:12:48 +01:00
ret . push_back ( GetInfo ( it ) ) ;
2016-06-08 14:01:05 +02:00
}
return ret ;
2013-08-27 07:51:57 +02:00
}
2013-11-05 02:47:07 +01:00
2016-11-21 10:51:32 +01:00
CTransactionRef CTxMemPool : : get ( const uint256 & hash ) const
2016-05-31 15:47:15 +02:00
{
2016-06-08 14:01:05 +02:00
LOCK ( cs ) ;
indexed_transaction_set : : const_iterator i = mapTx . find ( hash ) ;
if ( i = = mapTx . end ( ) )
return nullptr ;
return i - > GetSharedTx ( ) ;
2016-05-31 15:47:15 +02:00
}
2016-06-08 14:01:05 +02:00
TxMempoolInfo CTxMemPool : : info ( const uint256 & hash ) const
2016-03-21 18:02:47 +01:00
{
LOCK ( cs ) ;
indexed_transaction_set : : const_iterator i = mapTx . find ( hash ) ;
if ( i = = mapTx . end ( ) )
2016-06-08 14:01:05 +02:00
return TxMempoolInfo ( ) ;
2016-11-02 11:12:48 +01:00
return GetInfo ( i ) ;
2016-03-21 18:02:47 +01:00
}
2018-02-14 14:42:06 +01:00
bool CTxMemPool : : existsProviderTxConflict ( const CTransaction & tx ) const {
2024-04-04 12:27:33 +02:00
assert ( m_dmnman ) ;
2024-04-04 12:02:07 +02:00
2018-02-14 14:42:06 +01:00
LOCK ( cs ) ;
2018-12-10 09:14:19 +01:00
auto hasKeyChangeInMempool = [ & ] ( const uint256 & proTxHash ) {
2021-06-23 10:12:11 +02:00
AssertLockHeld ( cs ) ;
2018-12-10 09:14:19 +01:00
for ( auto its = mapProTxRefs . equal_range ( proTxHash ) ; its . first ! = its . second ; + + its . first ) {
auto txit = mapTx . find ( its . first - > second ) ;
if ( txit = = mapTx . end ( ) ) {
continue ;
}
if ( txit - > isKeyChangeProTx ) {
return true ;
}
}
return false ;
} ;
2018-03-19 08:57:43 +01:00
if ( tx . nType = = TRANSACTION_PROVIDER_REGISTER ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProRegTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-11-13 13:47:28 +01:00
return true ; // i.e. can't decode payload == conflict
}
2024-01-12 04:43:01 +01:00
auto & proTx = * opt_proTx ;
2018-10-25 16:29:50 +02:00
if ( mapProTxAddresses . count ( proTx . addr ) | | mapProTxPubKeyIDs . count ( proTx . keyIDOwner ) | | mapProTxBlsPubKeyHashes . count ( proTx . pubKeyOperator . GetHash ( ) ) )
return true ;
2018-12-13 07:49:50 +01:00
if ( ! proTx . collateralOutpoint . hash . IsNull ( ) ) {
if ( mapProTxCollaterals . count ( proTx . collateralOutpoint ) ) {
// there is another ProRegTx that refers to the same collateral
return true ;
}
if ( mapNextTx . count ( proTx . collateralOutpoint ) ) {
// there is another tx that spends the collateral
return true ;
}
}
2018-10-25 16:29:50 +02:00
return false ;
2018-03-19 08:57:43 +01:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_SERVICE ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProUpServTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-11-13 13:47:28 +01:00
return true ; // i.e. can't decode payload == conflict
}
2024-01-12 04:43:01 +01:00
auto it = mapProTxAddresses . find ( opt_proTx - > addr ) ;
return it ! = mapProTxAddresses . end ( ) & & it - > second ! = opt_proTx - > proTxHash ;
2018-03-19 08:44:00 +01:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REGISTRAR ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProUpRegTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-11-13 13:47:28 +01:00
return true ; // i.e. can't decode payload == conflict
}
2024-01-12 04:43:01 +01:00
auto & proTx = * opt_proTx ;
2018-12-10 09:14:19 +01:00
2019-02-06 17:57:27 +01:00
// this method should only be called with validated ProTxs
2024-04-04 12:27:33 +02:00
auto dmn = m_dmnman - > GetListAtChainTip ( ) . GetMN ( proTx . proTxHash ) ;
2019-02-06 17:57:27 +01:00
if ( ! dmn ) {
2020-01-28 11:04:47 +01:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Masternode is not in the list, proTxHash: %s \n " , __func__ , proTx . proTxHash . ToString ( ) ) ;
2019-02-06 17:57:27 +01:00
return true ; // i.e. failed to find validated ProTx == conflict
}
// only allow one operator key change in the mempool
2023-06-04 22:45:56 +02:00
if ( dmn - > pdmnState - > pubKeyOperator ! = proTx . pubKeyOperator ) {
2018-12-10 09:14:19 +01:00
if ( hasKeyChangeInMempool ( proTx . proTxHash ) ) {
return true ;
}
}
2018-10-21 21:45:16 +02:00
auto it = mapProTxBlsPubKeyHashes . find ( proTx . pubKeyOperator . GetHash ( ) ) ;
return it ! = mapProTxBlsPubKeyHashes . end ( ) & & it - > second ! = proTx . proTxHash ;
2018-12-10 09:14:19 +01:00
} else if ( tx . nType = = TRANSACTION_PROVIDER_UPDATE_REVOKE ) {
2024-01-12 04:43:01 +01:00
const auto opt_proTx = GetTxPayload < CProUpRevTx > ( tx ) ;
if ( ! opt_proTx ) {
2023-07-17 23:03:40 +02:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Invalid transaction payload, tx: %s \n " , __func__ , tx . GetHash ( ) . ToString ( ) ) ;
2018-12-10 09:14:19 +01:00
return true ; // i.e. can't decode payload == conflict
}
2024-01-12 04:43:01 +01:00
auto & proTx = * opt_proTx ;
2019-02-06 17:57:27 +01:00
// this method should only be called with validated ProTxs
2024-04-04 12:27:33 +02:00
auto dmn = m_dmnman - > GetListAtChainTip ( ) . GetMN ( proTx . proTxHash ) ;
2019-02-06 17:57:27 +01:00
if ( ! dmn ) {
2020-01-28 11:04:47 +01:00
LogPrint ( BCLog : : MEMPOOL , " %s: ERROR: Masternode is not in the list, proTxHash: %s \n " , __func__ , proTx . proTxHash . ToString ( ) ) ;
2019-02-06 17:57:27 +01:00
return true ; // i.e. failed to find validated ProTx == conflict
}
// only allow one operator key change in the mempool
2019-06-13 11:01:26 +02:00
if ( dmn - > pdmnState - > pubKeyOperator . Get ( ) ! = CBLSPublicKey ( ) ) {
2018-12-10 09:14:19 +01:00
if ( hasKeyChangeInMempool ( proTx . proTxHash ) ) {
return true ;
}
}
2018-03-19 08:57:43 +01:00
}
return false ;
2018-02-14 14:42:06 +01:00
}
2019-03-14 15:44:42 +01:00
void CTxMemPool : : PrioritiseTransaction ( const uint256 & hash , const CAmount & nFeeDelta )
2012-07-11 20:52:41 +02:00
{
{
LOCK ( cs ) ;
2019-03-14 15:44:42 +01:00
CAmount & delta = mapDeltas [ hash ] ;
delta + = nFeeDelta ;
2015-10-26 19:06:06 +01:00
txiter it = mapTx . find ( hash ) ;
if ( it ! = mapTx . end ( ) ) {
2019-03-14 15:44:42 +01:00
mapTx . modify ( it , update_fee_delta ( delta ) ) ;
2015-11-19 17:18:28 +01:00
// Now update all ancestors' modified fees with descendants
setEntries setAncestors ;
uint64_t nNoLimit = std : : numeric_limits < uint64_t > : : max ( ) ;
std : : string dummy ;
CalculateMemPoolAncestors ( * it , setAncestors , nNoLimit , nNoLimit , nNoLimit , nNoLimit , dummy , false ) ;
2019-07-05 09:06:28 +02:00
for ( txiter ancestorIt : setAncestors ) {
2015-11-19 17:18:28 +01:00
mapTx . modify ( ancestorIt , update_descendant_state ( 0 , nFeeDelta , 0 ) ) ;
}
2017-04-05 08:36:34 +02:00
// Now update all descendants' modified fees with ancestors
setEntries setDescendants ;
CalculateDescendants ( it , setDescendants ) ;
setDescendants . erase ( it ) ;
2019-07-05 09:06:28 +02:00
for ( txiter descendantIt : setDescendants ) {
2017-04-05 08:36:34 +02:00
mapTx . modify ( descendantIt , update_ancestor_state ( 0 , nFeeDelta , 0 , 0 ) ) ;
}
2017-05-17 22:17:28 +02:00
+ + nTransactionsUpdated ;
2015-10-26 19:06:06 +01:00
}
2012-07-11 20:52:41 +02:00
}
2021-04-17 21:23:54 +02:00
LogPrint ( BCLog : : MEMPOOL , " PrioritiseTransaction: %s feerate += %s \n " , hash . ToString ( ) , FormatMoney ( nFeeDelta ) ) ;
2012-07-11 20:52:41 +02:00
}
2020-09-04 12:34:38 +02:00
void CTxMemPool : : ApplyDelta ( const uint256 & hash , CAmount & nFeeDelta ) const
2012-07-11 20:52:41 +02:00
{
2020-09-04 12:34:38 +02:00
AssertLockHeld ( cs ) ;
2019-03-14 15:44:42 +01:00
std : : map < uint256 , CAmount > : : const_iterator pos = mapDeltas . find ( hash ) ;
2012-07-11 20:52:41 +02:00
if ( pos = = mapDeltas . end ( ) )
return ;
2019-03-14 15:44:42 +01:00
const CAmount & delta = pos - > second ;
nFeeDelta + = delta ;
2012-07-11 20:52:41 +02:00
}
2020-09-04 12:34:38 +02:00
void CTxMemPool : : ClearPrioritisation ( const uint256 & hash )
2012-07-11 20:52:41 +02:00
{
2020-09-04 12:34:38 +02:00
AssertLockHeld ( cs ) ;
2012-07-11 20:52:41 +02:00
mapDeltas . erase ( hash ) ;
}
2021-08-10 07:21:20 +02:00
const CTransaction * CTxMemPool : : GetConflictTx ( const COutPoint & prevout ) const
{
const auto it = mapNextTx . find ( prevout ) ;
return it = = mapNextTx . end ( ) ? nullptr : it - > second ;
}
2022-10-15 22:11:49 +02:00
std : : optional < CTxMemPool : : txiter > CTxMemPool : : GetIter ( const uint256 & txid ) const
2021-08-10 07:21:20 +02:00
{
auto it = mapTx . find ( txid ) ;
if ( it ! = mapTx . end ( ) ) return it ;
2021-03-22 06:42:01 +01:00
return std : : nullopt ;
2021-08-10 07:21:20 +02:00
}
CTxMemPool : : setEntries CTxMemPool : : GetIterSet ( const std : : set < uint256 > & hashes ) const
{
CTxMemPool : : setEntries ret ;
for ( const auto & h : hashes ) {
const auto mi = GetIter ( h ) ;
if ( mi ) ret . insert ( * mi ) ;
}
return ret ;
}
2014-08-26 22:28:32 +02:00
bool CTxMemPool : : HasNoInputsOf ( const CTransaction & tx ) const
{
for ( unsigned int i = 0 ; i < tx . vin . size ( ) ; i + + )
if ( exists ( tx . vin [ i ] . prevout . hash ) )
return false ;
return true ;
}
2014-03-17 13:19:54 +01:00
2016-05-05 12:52:28 +02:00
CCoinsViewMemPool : : CCoinsViewMemPool ( CCoinsView * baseIn , const CTxMemPool & mempoolIn ) : CCoinsViewBacked ( baseIn ) , mempool ( mempoolIn ) { }
2013-11-05 02:47:07 +01:00
2017-06-02 00:47:58 +02:00
bool CCoinsViewMemPool : : GetCoin ( const COutPoint & outpoint , Coin & coin ) const {
2024-01-29 16:01:14 +01:00
// Check to see if the inputs are made available by another tx in the package.
// These Coins would not be available in the underlying CoinsView.
if ( auto it = m_temp_added . find ( outpoint ) ; it ! = m_temp_added . end ( ) ) {
coin = it - > second ;
return true ;
}
2014-07-23 15:28:45 +02:00
// If an entry in the mempool exists, always return that one, as it's guaranteed to never
// conflict with the underlying cache, and it cannot have pruned entries (as it contains full)
// transactions. First checking the underlying cache risks returning a pruned entry instead.
2016-11-21 10:51:32 +01:00
CTransactionRef ptx = mempool . get ( outpoint . hash ) ;
2016-06-08 14:01:05 +02:00
if ( ptx ) {
if ( outpoint . n < ptx - > vout . size ( ) ) {
coin = Coin ( ptx - > vout [ outpoint . n ] , MEMPOOL_HEIGHT , false ) ;
2017-06-02 00:47:58 +02:00
return true ;
} else {
return false ;
}
2013-11-05 02:47:07 +01:00
}
2017-06-27 08:49:44 +02:00
return base - > GetCoin ( outpoint , coin ) ;
2013-11-05 02:47:07 +01:00
}
2015-07-09 19:56:31 +02:00
2024-01-29 16:01:14 +01:00
void CCoinsViewMemPool : : PackageAddTransaction ( const CTransactionRef & tx )
{
for ( unsigned int n = 0 ; n < tx - > vout . size ( ) ; + + n ) {
m_temp_added . emplace ( COutPoint ( tx - > GetHash ( ) , n ) , Coin ( tx - > vout [ n ] , MEMPOOL_HEIGHT , false ) ) ;
}
}
2015-07-09 19:56:31 +02:00
size_t CTxMemPool : : DynamicMemoryUsage ( ) const {
LOCK ( cs ) ;
2018-01-15 09:55:03 +01:00
// Estimate the overhead of mapTx to be 12 pointers + an allocation, as no exact formula for boost::multi_index_contained is implemented.
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
return memusage : : MallocUsage ( sizeof ( CTxMemPoolEntry ) + 12 * sizeof ( void * ) ) * mapTx . size ( ) + memusage : : DynamicUsage ( mapNextTx ) + memusage : : DynamicUsage ( mapDeltas ) + memusage : : DynamicUsage ( vTxHashes ) + cachedInnerUsage ;
2015-07-15 20:47:45 +02:00
}
2022-05-06 06:04:50 +02:00
void CTxMemPool : : RemoveUnbroadcastTx ( const uint256 & txid , const bool unchecked ) {
LOCK ( cs ) ;
if ( m_unbroadcast_txids . erase ( txid ) )
{
LogPrint ( BCLog : : MEMPOOL , " Removed %i from set of unbroadcast txns%s \n " , txid . GetHex ( ) , ( unchecked ? " before confirmation that txn was sent out " : " " ) ) ;
}
}
2017-01-24 10:07:50 +01:00
void CTxMemPool : : RemoveStaged ( setEntries & stage , bool updateDescendants , MemPoolRemovalReason reason ) {
2015-07-15 20:47:45 +02:00
AssertLockHeld ( cs ) ;
2016-03-17 13:33:31 +01:00
UpdateForRemoveFromMempool ( stage , updateDescendants ) ;
2018-09-04 15:36:09 +02:00
for ( txiter it : stage ) {
2017-01-24 10:07:50 +01:00
removeUnchecked ( it , reason ) ;
2015-07-15 20:47:45 +02:00
}
}
2019-10-02 16:55:03 +02:00
int CTxMemPool : : Expire ( std : : chrono : : seconds time )
{
2022-05-05 16:24:51 +02:00
AssertLockHeld ( cs ) ;
2016-03-05 06:49:47 +01:00
indexed_transaction_set : : index < entry_time > : : type : : iterator it = mapTx . get < entry_time > ( ) . begin ( ) ;
2015-10-02 23:43:30 +02:00
setEntries toremove ;
2016-03-05 06:49:47 +01:00
while ( it ! = mapTx . get < entry_time > ( ) . end ( ) & & it - > GetTime ( ) < time ) {
2018-03-29 17:07:40 +02:00
// locked txes do not expire until mined and have sufficient confirmations
2019-07-09 16:50:08 +02:00
if ( llmq : : quorumInstantSendManager - > IsLocked ( it - > GetTx ( ) . GetHash ( ) ) ) {
2018-03-29 17:07:40 +02:00
it + + ;
continue ;
}
2015-10-02 23:43:30 +02:00
toremove . insert ( mapTx . project < 0 > ( it ) ) ;
it + + ;
}
setEntries stage ;
2019-07-05 09:06:28 +02:00
for ( txiter removeit : toremove ) {
2015-10-02 23:43:30 +02:00
CalculateDescendants ( removeit , stage ) ;
}
2017-01-24 10:07:50 +01:00
RemoveStaged ( stage , false , MemPoolRemovalReason : : EXPIRY ) ;
2015-10-02 23:43:30 +02:00
return stage . size ( ) ;
}
2021-07-19 00:54:57 +02:00
void CTxMemPool : : addUnchecked ( const CTxMemPoolEntry & entry , bool validFeeEstimate )
2015-07-15 20:47:45 +02:00
{
setEntries setAncestors ;
uint64_t nNoLimit = std : : numeric_limits < uint64_t > : : max ( ) ;
std : : string dummy ;
CalculateMemPoolAncestors ( entry , setAncestors , nNoLimit , nNoLimit , nNoLimit , nNoLimit , dummy ) ;
2021-07-19 00:54:57 +02:00
return addUnchecked ( entry , setAncestors , validFeeEstimate ) ;
2015-07-15 20:47:45 +02:00
}
void CTxMemPool : : UpdateChild ( txiter entry , txiter child , bool add )
{
2020-09-04 12:34:38 +02:00
AssertLockHeld ( cs ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
CTxMemPoolEntry : : Children s ;
if ( add & & entry - > GetMemPoolChildren ( ) . insert ( * child ) . second ) {
2015-07-15 20:47:45 +02:00
cachedInnerUsage + = memusage : : IncrementalDynamicUsage ( s ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
} else if ( ! add & & entry - > GetMemPoolChildren ( ) . erase ( * child ) ) {
2015-07-15 20:47:45 +02:00
cachedInnerUsage - = memusage : : IncrementalDynamicUsage ( s ) ;
}
}
void CTxMemPool : : UpdateParent ( txiter entry , txiter parent , bool add )
{
2020-09-04 12:34:38 +02:00
AssertLockHeld ( cs ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
CTxMemPoolEntry : : Parents s ;
if ( add & & entry - > GetMemPoolParents ( ) . insert ( * parent ) . second ) {
2015-07-15 20:47:45 +02:00
cachedInnerUsage + = memusage : : IncrementalDynamicUsage ( s ) ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
} else if ( ! add & & entry - > GetMemPoolParents ( ) . erase ( * parent ) ) {
2015-07-15 20:47:45 +02:00
cachedInnerUsage - = memusage : : IncrementalDynamicUsage ( s ) ;
}
}
2015-10-02 23:19:55 +02:00
CFeeRate CTxMemPool : : GetMinFee ( size_t sizelimit ) const {
LOCK ( cs ) ;
if ( ! blockSinceLastRollingFeeBump | | rollingMinimumFeeRate = = 0 )
2017-09-30 18:07:44 +02:00
return CFeeRate ( llround ( rollingMinimumFeeRate ) ) ;
2015-10-02 23:19:55 +02:00
int64_t time = GetTime ( ) ;
if ( time > lastRollingFeeUpdate + 10 ) {
double halflife = ROLLING_FEE_HALFLIFE ;
if ( DynamicMemoryUsage ( ) < sizelimit / 4 )
halflife / = 4 ;
else if ( DynamicMemoryUsage ( ) < sizelimit / 2 )
halflife / = 2 ;
rollingMinimumFeeRate = rollingMinimumFeeRate / pow ( 2.0 , ( time - lastRollingFeeUpdate ) / halflife ) ;
lastRollingFeeUpdate = time ;
2017-01-16 19:32:51 +01:00
if ( rollingMinimumFeeRate < ( double ) incrementalRelayFee . GetFeePerK ( ) / 2 ) {
2015-10-02 23:19:55 +02:00
rollingMinimumFeeRate = 0 ;
2015-10-14 21:44:18 +02:00
return CFeeRate ( 0 ) ;
}
2015-10-02 23:19:55 +02:00
}
2017-09-30 18:07:44 +02:00
return std : : max ( CFeeRate ( llround ( rollingMinimumFeeRate ) ) , incrementalRelayFee ) ;
2015-10-02 23:19:55 +02:00
}
void CTxMemPool : : trackPackageRemoved ( const CFeeRate & rate ) {
AssertLockHeld ( cs ) ;
if ( rate . GetFeePerK ( ) > rollingMinimumFeeRate ) {
rollingMinimumFeeRate = rate . GetFeePerK ( ) ;
blockSinceLastRollingFeeBump = false ;
}
}
2017-06-02 00:47:58 +02:00
void CTxMemPool : : TrimToSize ( size_t sizelimit , std : : vector < COutPoint > * pvNoSpendsRemaining ) {
2022-05-05 16:24:51 +02:00
AssertLockHeld ( cs ) ;
2015-10-02 23:19:55 +02:00
unsigned nTxnRemoved = 0 ;
CFeeRate maxFeeRateRemoved ( 0 ) ;
2016-06-20 15:21:21 +02:00
while ( ! mapTx . empty ( ) & & DynamicMemoryUsage ( ) > sizelimit ) {
2016-03-05 06:49:47 +01:00
indexed_transaction_set : : index < descendant_score > : : type : : iterator it = mapTx . get < descendant_score > ( ) . begin ( ) ;
2015-10-02 23:19:55 +02:00
2015-10-19 11:40:25 +02:00
// We set the new mempool min fee to the feerate of the removed set, plus the
// "minimum reasonable fee rate" (ie some value under which we consider txn
// to have 0 fee). This way, we don't allow txn to enter mempool with feerate
// equal to txn which were removed with no block in between.
2015-11-19 17:18:28 +01:00
CFeeRate removed ( it - > GetModFeesWithDescendants ( ) , it - > GetSizeWithDescendants ( ) ) ;
2017-01-16 19:32:51 +01:00
removed + = incrementalRelayFee ;
2015-10-02 23:19:55 +02:00
trackPackageRemoved ( removed ) ;
maxFeeRateRemoved = std : : max ( maxFeeRateRemoved , removed ) ;
setEntries stage ;
CalculateDescendants ( mapTx . project < 0 > ( it ) , stage ) ;
nTxnRemoved + = stage . size ( ) ;
2015-10-22 02:44:00 +02:00
std : : vector < CTransaction > txn ;
if ( pvNoSpendsRemaining ) {
txn . reserve ( stage . size ( ) ) ;
2019-07-05 09:06:28 +02:00
for ( txiter iter : stage )
2016-09-27 13:25:42 +02:00
txn . push_back ( iter - > GetTx ( ) ) ;
2015-10-22 02:44:00 +02:00
}
2017-01-24 10:07:50 +01:00
RemoveStaged ( stage , false , MemPoolRemovalReason : : SIZELIMIT ) ;
2015-10-22 02:44:00 +02:00
if ( pvNoSpendsRemaining ) {
2019-07-05 09:06:28 +02:00
for ( const CTransaction & tx : txn ) {
for ( const CTxIn & txin : tx . vin ) {
2017-06-02 00:47:58 +02:00
if ( exists ( txin . prevout . hash ) ) continue ;
2017-06-21 03:18:09 +02:00
pvNoSpendsRemaining - > push_back ( txin . prevout ) ;
2015-10-22 02:44:00 +02:00
}
}
}
2015-10-02 23:19:55 +02:00
}
2019-05-22 23:51:39 +02:00
if ( maxFeeRateRemoved > CFeeRate ( 0 ) ) {
LogPrint ( BCLog : : MEMPOOL , " Removed %u txn, rolling minimum fee bumped to %s \n " , nTxnRemoved , maxFeeRateRemoved . ToString ( ) ) ;
}
2015-10-02 23:19:55 +02:00
}
2017-06-02 00:47:58 +02:00
2018-06-11 15:35:42 +02:00
uint64_t CTxMemPool : : CalculateDescendantMaximum ( txiter entry ) const {
// find parent with highest descendant count
std : : vector < txiter > candidates ;
setEntries counted ;
candidates . push_back ( entry ) ;
uint64_t maximum = 0 ;
while ( candidates . size ( ) ) {
txiter candidate = candidates . back ( ) ;
candidates . pop_back ( ) ;
if ( ! counted . insert ( candidate ) . second ) continue ;
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
const CTxMemPoolEntry : : Parents & parents = candidate - > GetMemPoolParentsConst ( ) ;
2018-06-11 15:35:42 +02:00
if ( parents . size ( ) = = 0 ) {
maximum = std : : max ( maximum , candidate - > GetCountWithDescendants ( ) ) ;
} else {
Merge #19478: Remove CTxMempool::mapLinks data structure member
296be8f58e02b39a58f017c52294aceed22c3ffd Get rid of unused functions CTxMemPool::GetMemPoolChildren, CTxMemPool::GetMemPoolParents (Jeremy Rubin)
46d955d196043cc297834baeebce31ff778dff80 Remove mapLinks in favor of entry inlined structs with iterator type erasure (Jeremy Rubin)
Pull request description:
Currently we have a peculiar data structure in the mempool called maplinks. Maplinks job is to track the in-pool children and parents of each transaction. This PR can be primarily understood and reviewed as a simple refactoring to remove this extra data structure, although it comes with a nice memory and performance improvement for free.
Maplinks is particularly peculiar because removing it is not as simple as just moving it's inner structure to the owning CTxMempoolEntry. Because TxLinks (the class storing the setEntries for parents and children) store txiters to each entry in the mempool corresponding to the parent or child, it means that the TxLinks type is "aware" of the boost multiindex (mapTx) it's coming from, which is in turn, aware of the entry type stored in mapTx. Thus we used maplinks to store this entry associated data we in an entirely separate data structure just to avoid a circular type reference caused by storing a txiter inside a CTxMempoolEntry.
It turns out, we can kill this circular reference by making use of iterator_to multiindex function and std::reference_wrapper. This allows us to get rid of the maplinks data structure and move the ownership of the parents/child sets to the entries themselves.
The benefit of this good all around, for any of the reasons given below the change would be acceptable, and it doesn't make the code harder to reason about or worse in any respect (as far as I can tell, there's no tradeoff).
### Simpler ownership model
No longer having to consistency check that mapLinks did have records for our CTxMempoolEntry, impossible to have a mapLinks entry outlive or incorrectly die before a CTxMempoolEntry.
### Memory Usage
We get rid of a O(Transactions) sized map in the mempool, which is a long lived data structure.
### Performance
If you have a CTxMemPoolEntry, you immediately know the address of it's children/parents, rather than having to do a O(log(Transactions)) lookup via maplinks (which we do very often). We do it in *so many* places that a true benchmark has to look at a full running node, but it is easy enough to show an improvement in this case.
The ComplexMemPool shows a good coherence check that we see the expected result of it being 12.5% faster / 1.14x faster.
```
Before:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.40462, 0.277222, 0.285339, 0.279793
After:
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 1.22586, 0.243831, 0.247076, 0.244596
```
The ComplexMemPool benchmark only checks doing addUnchecked and TrimToSize for 800 transactions. While this bench does a good job of hammering the relevant types of function, it doesn't test everything.
Subbing in 5000 transactions shows a that the advantage isn't completely wiped out by other asymptotic factors (this isn't the only bottleneck in growing the mempool), but it's only a bit proportionally slower (10.8%, 1.12x), which adds evidence that this will be a good change for performance minded users.
```
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 59.1321, 11.5919, 12.235, 11.7068
# Benchmark, evals, iterations, total, min, max, median
ComplexMemPool, 5, 1, 52.1307, 10.2641, 10.5206, 10.4306
```
I don't think it's possible to come up with an example of where a maplinks based design would have better performance, but it's something for reviewers to consider.
# Discussion
## Why maplinks in the first place?
I spoke with the author of mapLinks (sdaftuar) a while back, and my recollection from our conversation was that it was implemented because he did not know how to resolve the circular dependency at the time, and there was no other reason for making it a separate map.
## Is iterator_to weird?
iterator_to is expressly for this purpose, see https://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/tutorial/indices.html#iterator_to
> iterator_to provides a way to retrieve an iterator to an element from a pointer to the element, thus making iterators and pointers interchangeable for the purposes of element pointing (not so for traversal) in many situations. This notwithstanding, it is not the aim of iterator_to to promote the usage of pointers as substitutes for real iterators: the latter are specifically designed for handling the elements of a container, and not only benefit from the iterator orientation of container interfaces, but are also capable of exposing many more programming bugs than raw pointers, both at compile and run time. iterator_to is thus meant to be used in scenarios where access via iterators is not suitable or desireable:
>
> - Interoperability with preexisting APIs based on pointers or references.
> - Publication of pointer-based interfaces (for instance, when designing a C-compatible library).
> - The exposure of pointers in place of iterators can act as a type erasure barrier effectively decoupling the user of the code from the implementation detail of which particular container is being used. Similar techniques, like the famous Pimpl idiom, are used in large projects to reduce dependencies and build times.
> - Self-referencing contexts where an element acts upon its owner container and no iterator to itself is available.
In other words, iterator_to is the perfect tool for the job by the last reason given. Under the hood it should just be a simple pointer cast and have no major runtime overhead (depending on if the function call is inlined).
Edit by laanwj: removed at sign from the description
ACKs for top commit:
jonatack:
re-ACK 296be8f per `git range-diff ab338a19 3ba1665 296be8f`, sanity check gcc 10.2 debug build is clean.
hebasto:
re-ACK 296be8f58e02b39a58f017c52294aceed22c3ffd, only rebased since my [previous](https://github.com/bitcoin/bitcoin/pull/19478#pullrequestreview-482400727) review (verified with `git range-diff`).
Tree-SHA512: f5c30a4936fcde6ae32a02823c303b3568a747c2681d11f87df88a149f984a6d3b4c81f391859afbeb68864ef7f6a3d8779f74a58e3de701b3d51f78e498682e
2020-09-07 12:06:38 +02:00
for ( const CTxMemPoolEntry & i : parents ) {
candidates . push_back ( mapTx . iterator_to ( i ) ) ;
2018-06-11 15:35:42 +02:00
}
}
}
return maximum ;
}
void CTxMemPool : : GetTransactionAncestry ( const uint256 & txid , size_t & ancestors , size_t & descendants ) const {
2016-12-20 13:12:46 +01:00
LOCK ( cs ) ;
auto it = mapTx . find ( txid ) ;
2018-06-11 15:35:42 +02:00
ancestors = descendants = 0 ;
if ( it ! = mapTx . end ( ) ) {
ancestors = it - > GetCountWithAncestors ( ) ;
descendants = CalculateDescendantMaximum ( it ) ;
}
2016-12-20 13:12:46 +01:00
}
2019-05-01 16:06:11 +02:00
bool CTxMemPool : : IsLoaded ( ) const
{
LOCK ( cs ) ;
return m_is_loaded ;
}
void CTxMemPool : : SetIsLoaded ( bool loaded )
{
LOCK ( cs ) ;
m_is_loaded = loaded ;
}