Merge #19968: doc: clarify CRollingBloomFilter size estimate

d9141a0002bb508b2e94e206a1bd28ef8f97ffde doc: clarify CRollingBloomFilter size estimate (Anthony Towns)

Pull request description:

  Based on #19130, this change improves the comment for `CRollingBloomFilter` in `bloom.h`:

  - Give examples to illustrate the heuristic "1.8 bytes per element per factor 0.1 of false positive rate"
  - Add some Python code which can be copy/pasted for convenient filter size calculation (in an interpreter)
  - Reconcile the newly added code with the existing approximation

ACKs for top commit:
  laanwj:
    ACK d9141a0002bb508b2e94e206a1bd28ef8f97ffde

Tree-SHA512: e7138b3c531883a750ead06368975c750863fde7ef6f2633b137eca011079226e9205316217322014399fba05a48f294c788dd700bb7d479c58fe1f23e40419f
Wladimir J. van der Laan 2020-11-19 11:44:25 +01:00 committed by pasta
parent f2f4a8f085
commit a9b5723556

@@ -109,7 +109,18 @@ public:
  * insert()'ed ... but may also return true for items that were not inserted.
  *
  * It needs around 1.8 bytes per element per factor 0.1 of false positive rate.
- * (More accurately: 3/(log(256)*log(2)) * log(1/fpRate) * nElements bytes)
+ * For example, if we want 1000 elements, we'd need:
+ * - ~1800 bytes for a false positive rate of 0.1
+ * - ~3600 bytes for a false positive rate of 0.01
+ * - ~5400 bytes for a false positive rate of 0.001
+ *
+ * If we make these simplifying assumptions:
+ * - logFpRate / log(0.5) doesn't get rounded or clamped in the nHashFuncs calculation
+ * - nElements is even, so that nEntriesPerGeneration == nElements / 2
+ *
+ * Then we get a more accurate estimate for filter bytes:
+ *
+ * 3/(log(256)*log(2)) * log(1/fpRate) * nElements
  */
 class CRollingBloomFilter
 {