Merge #5962: fix: deadlock over cs_main and contributionsCacheCs in dkssessionmgr

ded1b5a3df fix: deadlock over cs_main and contributionsCacheCs in dkssessionmgr (Konstantin Akimov)

Pull request description:

  ## Issue being fixed or feature implemented
  **It fixes rpc failure: "Work queue depth exceeded"**

  As I checked on running `dashd` in deadlock condition:
  Thread 78 is a thread that keep `cs_main`:
  ```
  #14 0x0000aaaad1f8d604 in BuildSimplifiedMNListDiff () at evo/simplifiedmns.cpp:364
  ```
  but it is locked by `contributionsCacheCs`
  ```
  #8  llmq::CDKGSessionManager::GetVerifiedContributions () at llmq/dkgsessionmgr.cpp:392
  ```

  On other hand, `contributionsCacheCs` is blocked by Thread 59
  ```
  #17 0x0000aaaad1ba1940 in llmq::CDKGSessionManager::GetVerifiedContributions () at llmq/dkgsessionmgr.cpp:393
  ```
  and it makes circuit lock by waiting `cs_main` in
  ```
  #9  ReadBlockFromDisk () at node/blockstorage.cpp:75
  ```

  See https://github.com/dashpay/dash-issues/issues/69 for more details

  Seems introduced there: dashpay/dash#3911

  ## What was done?
  Deadlock is removed by reducing scope of mutex

  ## How Has This Been Tested?
  I reviewed 2 different servers which have status `work queue exceeded`, both have same deadlock, so, this patch should fix this issue. Once this fix is merged, we can test it on testnet.

  ## Breaking Changes
  N/A

  ## Checklist:
  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [ ] I have added or updated relevant unit/integration/functional/e2e tests
  - [ ] I have made corresponding changes to the documentation
  - [x] I have assigned this pull request to a milestone

Top commit has no ACKs.

Tree-SHA512: 4fe5c03c464ee6934fb927b897f007b65a8995723196edaffdae067edee7067da151130d4c4bac47d3418fdad5c8e130682f42d7ef9c044380a8c8fff78ee008
This commit is contained in:
pasta 2024-04-02 09:37:58 -05:00
parent ab7bc12e81
commit fd46c4ccda
No known key found for this signature in database
GPG Key ID: 52527BEDABE87984

View File

@ -389,7 +389,6 @@ void CDKGSessionManager::WriteEncryptedContributions(Consensus::LLMQType llmqTyp
bool CDKGSessionManager::GetVerifiedContributions(Consensus::LLMQType llmqType, const CBlockIndex* pQuorumBaseBlockIndex, const std::vector<bool>& validMembers, std::vector<uint16_t>& memberIndexesRet, std::vector<BLSVerificationVectorPtr>& vvecsRet, std::vector<CBLSSecretKey>& skContributionsRet) const
{
LOCK(contributionsCacheCs);
auto members = utils::GetAllQuorumMembers(llmqType, pQuorumBaseBlockIndex);
memberIndexesRet.clear();
@ -398,6 +397,9 @@ bool CDKGSessionManager::GetVerifiedContributions(Consensus::LLMQType llmqType,
memberIndexesRet.reserve(members.size());
vvecsRet.reserve(members.size());
skContributionsRet.reserve(members.size());
// NOTE: the `cs_main` should not be locked under scope of `contributionsCacheCs`
LOCK(contributionsCacheCs);
for (const auto i : irange::range(members.size())) {
if (validMembers[i]) {
const uint256& proTxHash = members[i]->proTxHash;