From 516cfb54c574952d7bf015c80fa03603e64f3719 Mon Sep 17 00:00:00 2001
From: Jonas Schnelli
Date: Wed, 2 Sep 2020 09:09:08 +0200
Subject: [PATCH] Merge #14687: zmq: enable tcp keepalive

c276df775914e4e42993c76e172ef159e3b830d4 zmq: enable tcp keepalive (mruddy)

Pull request description:

  This addresses https://github.com/bitcoin/bitcoin/issues/12754.

  These changes enable node operators to address the silent dropping (by network middle boxes) of long-lived low-activity ZMQ TCP connections via further operating system level TCP keepalive configuration. For example, ZMQ sockets that publish block hashes can be affected in this way due to the length of time it sometimes takes between finding blocks (e.g.- sometimes more than an hour).

  Prior to this patch, operating system level TCP keepalive configurations would not take effect since the SO_KEEPALIVE option was not enabled on the underlying socket.

  There are additional ZMQ socket options related to TCP keepalive that can be set. However, I decided not to implement those options in this changeset because doing so would require adding additional bitcoin node configuration options, and would not yield a better outcome. I preferred a small, easily reviewable patch that doesn't add a bunch of new config options, with the tradeoff that the fine tuning would have to be done via well-documented operating system specific configurations.

  I tested this patch by running a node with:

  `./src/qt/bitcoin-qt -regtest -txindex -datadir=/tmp/node -zmqpubhashblock=tcp://127.0.0.1:28332 &`

  and connecting to it with:

  `python3 ./contrib/zmq/zmq_sub.py`

  Without these changes, `ss -panto | grep 28332 | grep ESTAB | grep bitcoin` will report no keepalive timer information. With these changes, the output from the prior command will show keepalive timer information consistent with the configuration at the time of connection establishment, e.g.-: `timer:(keepalive,119min,0)`.

  I also tested with a non-TCP transport and did not witness any adverse effects:

  `./src/qt/bitcoin-qt -regtest -txindex -datadir=/tmp/node -zmqpubhashblock=ipc:///tmp/bitcoin.block &`

ACKs for top commit:
  adamjonas:
    Just to summarize for those looking to review - as of c276df775914e4e42993c76e172ef159e3b830d4 there are 3 tACKs (n-thumann, Haaroon, and dlogemann), 1 "looks good to me" (laanwj) with no NACKs or any show-stopping concerns raised.
  jonasschnelli:
    utACK c276df775914e4e42993c76e172ef159e3b830d4

Tree-SHA512: b884c2c9814e97e666546a7188c48f9de9541499a11a934bd48dd16169a900c900fa519feb3b1cb7e9915fc7539aac2829c7806b5937b4e1409b4805f3ef6cd1
---
 doc/zmq.md                     | 14 ++++++++++++++
 src/zmq/zmqpublishnotifier.cpp |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/doc/zmq.md b/doc/zmq.md
index b7b93aaf26..ee8fd82430 100644
--- a/doc/zmq.md
+++ b/doc/zmq.md
@@ -127,6 +127,20 @@ ZMQ_SUBSCRIBE option set to one or either of these prefixes (for
 instance, just `hash`); without doing so will result in no messages
 arriving. Please see [`contrib/zmq/zmq_sub.py`](/contrib/zmq/zmq_sub.py) for a working example.
 
+The ZMQ_PUB socket's ZMQ_TCP_KEEPALIVE option is enabled. This means that
+the underlying SO_KEEPALIVE option is enabled when using a TCP transport.
+The effective TCP keepalive values are managed through the underlying
+operating system configuration and must be configured prior to connection establishment.
+
+For example, when running on GNU/Linux, one might use the following
+to lower the keepalive setting to 10 minutes:
+
+sudo sysctl -w net.ipv4.tcp_keepalive_time=600
+
+Setting the keepalive values appropriately for your operating environment may
+improve connectivity in situations where long-lived connections are silently
+dropped by network middle boxes.
+
 ## Remarks
 
 From the perspective of dashd, the ZeroMQ socket is write-only; PUB
diff --git a/src/zmq/zmqpublishnotifier.cpp b/src/zmq/zmqpublishnotifier.cpp
index 62b98c1502..1da8401fa0 100644
--- a/src/zmq/zmqpublishnotifier.cpp
+++ b/src/zmq/zmqpublishnotifier.cpp
@@ -117,6 +117,14 @@ bool CZMQAbstractPublishNotifier::Initialize(void *pcontext)
             return false;
         }
 
+        const int so_keepalive_option {1};
+        rc = zmq_setsockopt(psocket, ZMQ_TCP_KEEPALIVE, &so_keepalive_option, sizeof(so_keepalive_option));
+        if (rc != 0) {
+            zmqError("Failed to set SO_KEEPALIVE");
+            zmq_close(psocket);
+            return false;
+        }
+
         rc = zmq_bind(psocket, address.c_str());
         if (rc != 0)
         {
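
Reviewer note: the hunk above sets ZMQ_TCP_KEEPALIVE to 1, which the description says enables SO_KEEPALIVE on the underlying TCP socket. As a rough illustration of what that option amounts to at the operating system level, here is a minimal sketch using plain POSIX sockets. It does not use libzmq and is not part of this patch; the helper names `enable_keepalive`, `keepalive_enabled`, and `demo_keepalive_roundtrip` are hypothetical and exist only for this example.

```cpp
// Illustrative sketch only: plain POSIX sockets, not libzmq, not part of the patch.
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: turn on SO_KEEPALIVE for an existing socket descriptor.
bool enable_keepalive(int fd)
{
    const int on{1};
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0;
}

// Hypothetical helper: report whether SO_KEEPALIVE is currently set on fd.
bool keepalive_enabled(int fd)
{
    int value{0};
    socklen_t len{sizeof(value)};
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}

// Hypothetical demo: create a TCP socket, enable keepalive, and read the flag back.
bool demo_keepalive_roundtrip()
{
    const int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return false;
    }
    const bool ok = enable_keepalive(fd) && keepalive_enabled(fd);
    close(fd);
    return ok;
}
```

Calling `demo_keepalive_roundtrip()` should return true on a typical GNU/Linux system. The sketch only flips the flag; as the documentation hunk notes, the probe timing itself still comes from the operating system configuration (for example `net.ipv4.tcp_keepalive_time`).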