3284e6c09a84e9557ec72723ad636053d3ef7122 scripts: search for next position of magic bytes rather than fail (Tim Akinbo)
Pull request description:
When using the `linearize-data.py` contrib script to export block data, there are edge cases where the script fails with an `Invalid magic: 00000000` error. This error occurs due to the presence of padding bytes that occasionally appears between consecutive blocks in the block data file.
There's an ongoing conversation about this in #14986. sipa also admitted that it is a bug in #5028. Fortunately, this is not an issue in bitcoin core as it handles this type of situation gracefully and so no fix in bitcoin core is required.
This PR is an improvement on how the script handles these "invalid magic bytes". Rather than failing, this patch allows the script to search for the next occurrence of the magic bytes and then starts reading the block from there.
ACKs for top commit:
laanwj:
ACK 3284e6c09a84e9557ec72723ad636053d3ef7122
Tree-SHA512: 18067ae0b4b62e822dfc558a86439ad6acaf939b98479e38e8e4248536574643b26eb48e96ec7139375c88b42cbe7705a64deb13a3c239e16025a6aad3d69bfa
4de11a3682 Remove Python 2 import workarounds (practicalswift)
Pull request description:
Remove Python 2 import workarounds.
As noted by @jnewbery in https://github.com/bitcoin/bitcoin/pull/14903#discussion_r241396925:
> This exception handling is a vestige from when github-merge.py supported Python 2 and Python 3. We only support Python 3 now so we should be able to remove it entirely and just import from urllib.request.
Tree-SHA512: e0d21e6299dd62fb669ad95cbd3d19f7c803195fd336621aac72fd10ddc7431d90443831072a2e1eb2fc880d1d88eb7c3e2ead3da59f545f6db07d349af98fb3
317fb96de9c6257972f1213b4ef2c3fe87dde99f Add search for first blk file with pruned node (Rjected)
Pull request description:
<!--
*** Please remove the following help text before submitting: ***
Pull requests without a rationale and clear improvement may be closed
immediately.
-->
<!--
Please provide clear motivation for your patch and explain how it improves
Bitcoin Core user experience or Bitcoin Core developer experience
significantly:
* Any test improvements or new tests that improve coverage are always welcome.
* All other changes should have accompanying unit tests (see `src/test/`) or
functional tests (see `test/`). Contributors should note which tests cover
modified code. If no tests exist for a region of modified code, new tests
should accompany the change.
* Bug fixes are most welcome when they come with steps to reproduce or an
explanation of the potential issue as well as reasoning for the way the bug
was fixed.
* Features are welcome, but might be rejected due to design or scope issues.
If a feature is based on a lot of dependencies, contributors should first
consider building the system outside of Bitcoin Core, if possible.
* Refactoring changes are only accepted if they are required for a feature or
bug fix or otherwise improve developer experience significantly. For example,
most "code style" refactoring changes require a thorough explanation why they
are useful, what downsides they have and why they *significantly* improve
developer experience or avoid serious programming bugs. Note that code style
is often a subjective matter. Unless they are explicitly mentioned to be
preferred in the [developer notes](/doc/developer-notes.md), stylistic code
changes are usually rejected.
-->
When bitcoind is running in pruned mode, producing a hashlist with `./linearize-hashes.py linearize.cfg > hashlist.txt` and then executing `linearize-data.py linearize.cfg` will produce:
```
Read 313001 hashes
Input file /home/dan/.bitcoin/blocks/blk00000.dat
Premature end of block data
```
This happens because `linearize-data` starts by attempting to process `blk00000.dat` regardless of whether or not `blk00000.dat` actually exists - this may not be the case if working with a pruned node.
This PR adds a function which finds the first block file that does exist, and calls that function when the `BlockDataCopier` is initialized.
This is a refactor of #16431.
<!--
Bitcoin Core has a thorough review process and even the most trivial change
needs to pass a lot of eyes and requires non-zero or even substantial time
effort to review. There is a huge lack of active reviewers on the project, so
patches often sit for a long time.
-->
ACKs for top commit:
darosior:
ACK 317fb96de9c6257972f1213b4ef2c3fe87dde99f
laanwj:
Code review ACK 317fb96de9c6257972f1213b4ef2c3fe87dde99f
theStack:
Code review ACK 317fb96de9
Tree-SHA512: fc8014282df6cfe7b267e64db8ce7d82b86b758c302fbfea4a3c39b62d93512f5c2e31a0de4e9c5ec18fc0268c917f011257d37b45afaef6033eec90e4aa585f
643aad17fa Enable additional flake8 rules (practicalswift)
f020aca297 Minor Python cleanups to make flake8 pass with the new rules enabled (practicalswift)
Pull request description:
Enabled rules:
```
* E242: tab after ','
* E266: too many leading '#' for block comment
* E401: multiple imports on one line
* E402: module level import not at top of file
* E701: multiple statements on one line (colon)
* E901: SyntaxError: invalid syntax
* E902: TokenError: EOF in multi-line string
* F821: undefined name 'Foo'
* W293: blank line contains whitespace
* W606: 'async' and 'await' are reserved keywords starting with Python 3.7
```
Note to reviewers:
* In general we don't allow whitespace cleanups to existing code, but in order to allow for enabling Travis checking for these rules a few smaller whitespace cleanups had to made as part of this PR.
* Use [this `?w=1` link](https://github.com/bitcoin/bitcoin/pull/12987/files?w=1) to show a diff without whitespace changes.
Before this commit:
```
$ flake8 -qq --statistics --ignore=B,C,E,F,I,N,W --select=E112,E113,E115,E116,E125,E131,E133,E223,E224,E242,E266,E271,E272,E273,E274,E275,E304,E306,E401,E402,E502,E701,E702,E703,E714,E721,E741,E742,E743,F401,E901,E902,F402,F404,F406,F407,F601,F602,F621,F622,F631,F701,F702,F703,F704,F705,F706,F707,F811,F812,F821,F822,F823,F831,F841,W292,W293,W504,W601,W602,W603,W604,W605,W606 .
5 E266 too many leading '#' for block comment
4 E401 multiple imports on one line
6 E402 module level import not at top of file
5 E701 multiple statements on one line (colon)
1 F812 list comprehension redefines 'n' from line 159
4 F821 undefined name 'ConnectionRefusedError'
28 W293 blank line contains whitespace
```
After this commit:
```
$ flake8 -qq --statistics --ignore=B,C,E,F,I,N,W --select=E112,E113,E115,E116,E125,E131,E133,E223,E224,E242,E266,E271,E272,E273,E274,E275,E304,E306,E401,E402,E502,E701,E702,E703,E714,E721,E741,E742,E743,F401,E901,E902,F402,F404,F406,F407,F601,F602,F621,F622,F631,F701,F702,F703,F704,F705,F706,F707,F811,F812,F821,F822,F823,F831,F841,W292,W293,W504,W601,W602,W603,W604,W605,W606 .
$
```
Tree-SHA512: fc7d5e752298a50d4248afc620ee2c173135b4ca008e48e02913ac968e5a24a5fd5396926047ec62f1d580d537434ccae01f249bb2f3338fa59dc630bf97ca7a
Signed-off-by: pasta <pasta@dashboost.org>
It appears bitcoin has already done this previously(seemingly forever), however, we for some reason have tabs. This was causing the linter to fail because a backport touched lines which resulted in linter yelling at me for adding lines beginning with tabs. To fix this, replacing all tabs with spaces in these two files.
-BEGIN VERIFY SCRIPT-
sed -i 's/\t/ /g' contrib/linearize/linearize-*.py
-END VERIFY SCRIPT-
Signed-off-by: pasta <pasta@dashboost.org>
c8176b3cc7556d7bcec39a55ae4d6ba16453baaa Add linter: Make sure we explicitly open all text files using UTF-8 or ASCII encoding in Python (practicalswift)
634bd970013eca90f4b4c1f9044eec8c97ba62c2 Explicitly specify encoding when opening text files in Python code (practicalswift)
Pull request description:
Add linter: Make sure we explicitly open all text files using UTF-8 encoding in Python.
As requested by @laanwj in #13440.
Tree-SHA512: 1651c00fe220ceb273324abd6703aee504029b96c7ef0e3029145901762c733c9b9d24927da281394fd4681a5bff774336c04eed01fafea997bb32192c334c06
Signed-off-by: pasta <pasta@dashboost.org>
# Conflicts:
# contrib/devtools/circular-dependencies.py
# contrib/linearize/linearize-data.py
# contrib/linearize/linearize-hashes.py
# contrib/seeds/generate-seeds.py
# contrib/verify-commits/verify-commits.py
# test/functional/multiwallet.py
# test/functional/notifications.py
# test/functional/test_runner.py
# test/util/rpcauth-test.py
- Add a space after the fixed string prepended to file names when input or
output file changes
- Clarify the error message when the genesis block is not found in the
hash list (...why do we have this at all?)
Make it possible to read blocks in any order. This will be required
after headers-first (#4468), so should be merged before that.
- Read block header. For expected blocks, continue, else skip.
- For in-order blocks: copy block contents directly. Write prior
out-of-order blocks if this connects a consecutive span.
- For out-of-order blocks, store extents of block data for later
retrieval. Cache out-of-order blocks in memory up to 100MB
(configurable).
This was typically ensured implicitly by virtue of normal bitcoind
operation. Adding an explicit check provides a stronger guarantee, and
it is cheap to add.
Break into two steps:
* Generate hash list
* Build data file(s) from local bitcoind blocks/ directory.
This supports building one large bootstrap.dat, or multiple
smaller blocks/blkNNNNN.dat files.