fix: improve concurrency handling of users#2012
Closed
MarcosNicolau wants to merge 18 commits intostagingfrom
Closed
fix: improve concurrency handling of users#2012MarcosNicolau wants to merge 18 commits intostagingfrom
MarcosNicolau wants to merge 18 commits intostagingfrom
Conversation
instead wait until it finishes and update the last block number with the receipt
MarcosNicolau
commented
Jul 1, 2025
MauroToscano
reviewed
Jul 2, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Previously, we were performing proof pre-verification as soon as a new proof message arrived. This verification could take a few hundred milliseconds, and since it was done before even validating the message the lock would be held far too long.
We’ve moved the proof verification step after performing basic validation checks, such as nonce, max fee, and replacement.s This way, only well-formed messages proceed to the verification step.
However, this introduced a new problem: proof verification still needs to run with access to shared state
batch_state, and locking this state for the duration of the verification step could block all other processing for too much time. To solve this, we introduced a mutex per user:user_proof_processing_mutexes: this allows us to process proofs in parallel between users by not locking the wholebatch_state.This new mutex setup, means that the batch building and proof processing can run in parallel. Before, the proof message handler kept the
batch_state_lockfor the entire processing, making it more sequential. Now, they’ll only block each other when they need access thebatch_state.As long as we ensure the batch_state_lock is held while modifying shared state (adding/removing from the batch queue + updating the user state), things are safe. For example, if a replacement is validated but the proof is no longer in the state by the time we lock and check (because it was removed by the batch builder), we can just return an error to the user.
The tldr is: there's no need to synchronize the two processes explicitly, the
batch_state_lockalready ensures consistency. At worst, we'll hit a race that results in an error in one side, which might lead to a poorer ux, but I prever to avoid unnecessary mutex complexity.Another improvement was introduced. The
batch_postingmutex was used to synchronize the tasks spawned to submit a batch when a new block arrived. This could be solved much easily by simply not spawning a task and awaiting it.Finally, other improvements/refactors have been improved:
Moving forward
There are two other checks to ensure a better prevention of ddos:
n%with respect to the previous one.mproofs at least.How to test
To test this, I would suggest doing it in two. One with a few proofs so you can see the logs how everything is synchronized and another one with a high load of proofs to make sure it can handle it and nothing breaks.
Set up
seconds_per_slotinnetwork_params.yamland set it to2seconds. This is to create more frequent dependencies between the batch building process and the proof processing.Few proofs test
The idea of this test is to see the mutex behaviour step by step. For this I suggest:
verify_prooffunction incrates/batcher/src/lib.rsBig load test
For this test, we use the task-sender to spam the batcher with different wallets and make sure it doesn't lock and continues to build batches.
crates/task-sender/wallets/devnetand copy the 10 rich accounts of ethereum-package:# Modify this params as you wish make task_sender_send_infinite_proofs_devnet BURST_SIZE=50 BURST_TIME_SECS=1Type of change
Checklist
testnet, everything else tostaging