Conversation
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
Pull request overview
Introduces a fork-gated “deterministic state sync” flow for Bor by resolving a cutoff-time → Heimdall height, then querying state-sync events “visible at” that fixed Heimdall height to avoid non-determinism from querying a moving Heimdall head.
Changes:
- Added Bor config fork flag `DeterministicStateSyncBlock` and plumbing to include it in fork ID gathering.
- Extended Heimdall client interfaces (HTTP/gRPC/in-process app + failover wrapper) with `GetBlockHeightByTime` and `StateSyncEventsAtHeight`, and switched `Bor.CommitStates` to use them post-fork.
- Added/updated tests and mocks to cover fork switching and URL formatting.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `consensus/bor/bor.go` | Switches state sync fetching logic to deterministic (height-anchored) mode post-fork. |
| `consensus/bor/heimdall.go` | Extends `IHeimdallClient` with the new deterministic state-sync APIs. |
| `consensus/bor/heimdall/client.go` | Implements the new endpoints for the HTTP Heimdall client + URL builders. |
| `consensus/bor/heimdallgrpc/state_sync.go` | Implements gRPC equivalents for height-anchored state sync and height-by-time lookup. |
| `consensus/bor/heimdallapp/state_sync.go` | Implements in-process HeimdallApp equivalents using keeper/query server. |
| `consensus/bor/heimdall/failover_client.go` | Extends failover endpoint interface and wrapper calls for the new APIs. |
| `consensus/bor/heimdall/metrics.go` | Adds request-type metrics for the new endpoints. |
| `consensus/bor/heimdall/state_sync_url_test.go` | Adds URL-format tests for the new HTTP paths (and validates RFC3339 formatting). |
| `params/config.go` | Adds `DeterministicStateSyncBlock` and `IsDeterministicStateSync`. |
| `params/config_test.go` | Unit tests for `IsDeterministicStateSync`. |
| `core/forkid/forkid.go` | Includes the new fork block in Bor fork gathering for fork ID computation. |
| `consensus/bor/bor_test.go` | Adds tests verifying pre-/post-fork behavior and failure semantics post-fork. |
| `tests/bor/mocks/IHeimdallClient.go` | Regenerated gomock for updated `IHeimdallClient`. |
| `consensus/bor/span_store_test.go` | Updates test mocks to satisfy the extended Heimdall client interface. |
| `eth/handler_bor_test.go` | Updates test mock to satisfy the extended Heimdall client interface. |
| `eth/ethconfig/config_test.go` | Updates test mock to satisfy the extended Heimdall client interface. |
| `consensus/bor/heimdall/failover_client_test.go` | Updates mock endpoint used by failover tests to satisfy the extended interface. |
| `go.mod` | Pins/upgrades `github.com/0xPolygon/heimdall-v2` to a newer pseudo-version (via replace). |
| `go.sum` | Updates checksums for the new heimdall-v2 version. |
@claude review
Claude finished @marcello33's task in 1m 50s — Code Review: Deterministic State Syncs

Overall this is a well-structured PR. The fork-gating approach is correct, the error handling post-fork is properly strict (returning errors instead of silently swallowing), and the test coverage for the fork switch is good. Below are findings organized by severity.

HIGH: HTTP client does client-side filtering that gRPC/app clients do not. The HTTP client filters fetched events with:

```go
if e.Id >= fromID && e.RecordTime.Before(time.Unix(toTime, 0)) {
```

But the gRPC implementation does not. This means the three transport implementations have different behavior if the Heimdall endpoint returns events outside the expected range. For a deterministic state sync feature, all paths must produce identical results: either all transports should apply the same filter, or none should.

MEDIUM: The existing …
Codecov Report

❌ Your patch check has failed because the patch coverage (24.71%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##           develop    #2177   +/-   ##
===========================================
- Coverage    51.92%   51.89%   -0.03%
===========================================
  Files          884      884
  Lines       155431   155607     +176
===========================================
+ Hits         80701    80752      +51
- Misses       69524    69647     +123
- Partials      5206     5208       +2
```

... and 20 files with indirect coverage changes
@claude addressed your points.
Re-review.
Claude finished @marcello33's task in 1m 14s — Re-review: Deterministic State Syncs

All six items from the previous review have been addressed correctly.

Remaining finding: sort inconsistency across transports. The HTTP client sorts events after fetching:

```go
sort.SliceStable(eventRecords, func(i, j int) bool {
	return eventRecords[i].ID < eventRecords[j].ID
})
```

But neither the gRPC client nor the in-process app client does. Either all three should sort, or none should (if the server guarantees ordering). Given this feature targets determinism, I'd recommend all three apply the same sort for defense-in-depth. Everything else looks good. The core design is sound and the error handling is correct.
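The defense-in-depth option suggested above could be a shared helper all three transports call after fetching. A sketch: `sortByID` and the trimmed `eventRecord` type are hypothetical, while the `sort.SliceStable` call mirrors the HTTP client's snippet.

```go
package main

import "sort"

// eventRecord carries only the field needed for canonical ordering.
type eventRecord struct {
	ID uint64
}

// sortByID is one canonical ordering all three transports could apply after
// fetching, so the final event order never depends on server behavior.
func sortByID(records []*eventRecord) {
	sort.SliceStable(records, func(i, j int) bool {
		return records[i].ID < records[j].ID
	})
}
```

`sort.SliceStable` (rather than `sort.Slice`) preserves the server's relative order of any equal-ID records, which keeps the result reproducible even in degenerate inputs.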
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
@claude addressed your points. Review once more and resolve the related open conversations if the comments are addressed. |
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.
```diff
 if c.config.IsDeterministicStateSync(header.Number) {
 	log.Info("Using deterministic state sync", "cutoff", to.Unix())
 	eventRecords, err = c.HeimdallClient.StateSyncEventsByTime(c.ctx, from, to.Unix())
 	if err != nil {
 		// Match pre-fork resilience: log and return empty on transient errors.
 		// Determinism is preserved because all validators independently skip
 		// the same sprint, and events will be picked up in the next sprint.
 		log.Error("Error fetching deterministic state sync events", "fromID", from, "to", to.Unix(), "err", err)
-		stateSyncs := make([]*types.StateSyncData, 0)
-		return stateSyncs, nil
+		return make([]*types.StateSyncData, 0), nil
```
Post-fork deterministic mode currently logs and returns an empty state-sync set when StateSyncEventsByTime errors. That outcome depends on each node’s local Heimdall connectivity; if some validators succeed and others hit a transient error, they will apply different state updates and compute different state roots for the same block. For consensus-critical finalization, this should fail block processing (return the error) or use a deterministic fallback that guarantees all nodes derive the same result (e.g., only treat specific, globally-observable errors as skippable).
The pre-fork path has the same behavior (log + return empty). I'll probably keep this comment open and discuss with the team.
Anyway, the risk is mitigated by multiple factors:
- FetchWithRetry retries aggressively
- waitUntilHeimdallIsSynced gates CommitStates until heimdall is caught up
- CommitStates only runs at sprint boundaries
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.
Additional findings (outside current diff — PR may have been updated during review):

🟡 `consensus/bor/heimdall/failover_client.go:257-271` — The `defer cancel()` inside the nested for loop in `cascadeClients` (failover_client.go:261-265) is the classic Go "defer in loop" anti-pattern introduced by this PR. The original code called `cancel()` immediately after each `fn()` invocation, correctly limiting each child context's lifetime, but the new code defers all cancels until `cascadeClients` returns. As a result, each loop iteration that satisfies the `!hasDeadline` condition accumulates a deferred `cancel` from `context.WithTimeout`, meaning timer goroutines from earlier cascade attempts run concurrently with later ones until the function exits. The fix is to call `cancel()` directly after `fn()` rather than using `defer`.

Extended reasoning...

What the bug is and how it manifests

In `cascadeClients` (failover_client.go), the PR changed from the original pattern of calling `cancel()` immediately after each `fn()` call to using `defer cancel()` inside the nested for loop. The relevant code (lines 257-271) iterates over candidate endpoint passes and, for each candidate that satisfies `!hasDeadline`, creates a child context via `context.WithTimeout(ctx, f.attemptTimeout)` and then defers the corresponding cancel with `defer cancel()`. In Go, `defer` statements are function-scoped, not block- or loop-scoped, so each iteration appends a new cancel to the deferred list rather than calling it at the end of the loop body.

The specific code path that triggers it

When `cascadeClients` iterates through multiple failover candidates — three passes of cooled, uncooled, and unhealthy endpoints — each candidate that lacks a parent deadline creates a new `context.WithTimeout`. The original code in the diff shows the pre-PR pattern with `cancel()` called right after `fn(subCtx, f.clients[i])`. The new code replaces this with `defer cancel()`, which defers execution until `cascadeClients` returns entirely.

Why existing code does not prevent it

Go's compiler does not warn about deferred functions inside loops. The change looks superficially similar to the correct pattern used in `callWithFailover` (where `defer cancel()` is acceptable because there is no loop), but the semantics differ critically in the loop context. Additionally, `go vet` with the `-shadow` flag and `staticcheck` would normally flag `defer` in a loop, but the build may not enforce this.

What the impact would be

For each cascade attempt through multiple unhealthy endpoints, a timer goroutine is created and kept alive until `cascadeClients` returns rather than being immediately released after each failed attempt. If cascading through N candidates, N timer goroutines run concurrently during the cascade. Since `cascadeClients` is typically short-lived (connection-refused errors return in milliseconds) and the number of endpoints is small (2-3 in practice), the practical severity is low — all contexts are cleaned up within seconds and no permanent leak occurs. However, this is a real anti-pattern regression from the original code.

Step-by-step proof

1. `cascadeClients` iterates: pass 1 (cooled), pass 2 (uncooled), pass 3 (unhealthy).
2. For each candidate `i` where `ctx` has no deadline, `subCtx, cancel = context.WithTimeout(ctx, f.attemptTimeout)` is called.
3. `defer cancel()` appends to the function's defer stack (not executed at loop iteration end).
4. `fn(subCtx, f.clients[i])` runs; if it fails, the next iteration begins with a new `context.WithTimeout` and another `defer cancel()`.
5. After all iterations, `cascadeClients` returns and all deferred cancels fire — but only then.
6. Contrast with the pre-PR code: `cancel()` was called immediately after `fn()`, so each timer goroutine was released as soon as that attempt completed.

How to fix it

Replace `defer cancel()` with an immediate `cancel()` call after `fn()` returns (both on success and failure paths), matching the original code pattern. Alternatively, extract the inner loop body into a helper function so that `defer cancel()` is function-scoped to that helper and fires correctly at the end of each attempt.
🟡 `consensus/bor/heimdall/failover_client.go:257-271` — This PR changed `cascadeClients` (failover_client.go:261-265) to use `defer cancel()` inside the inner `for` loop, introducing the classic Go "defer in loop" anti-pattern. In Go, `defer` is function-scoped, not iteration-scoped: each loop iteration that creates a `context.WithTimeout` accumulates a deferred cancel that fires only when `cascadeClients` returns, not at the end of the iteration. The original code correctly called `cancel()` immediately after `fn()`. Fix: replace `defer cancel()` with an explicit `cancel()` call immediately after `fn()` returns.

Extended reasoning...

What the bug is and how it manifests

In `cascadeClients` (consensus/bor/heimdall/failover_client.go lines 258-290), the PR introduced a conditional block inside the inner for loop:

```go
subCtx := ctx
if _, hasDeadline := ctx.Deadline(); !hasDeadline {
	var cancel context.CancelFunc
	subCtx, cancel = context.WithTimeout(ctx, f.attemptTimeout)
	defer cancel() // ← defers accumulate, not cleaned up per iteration
}
result, err := fn(subCtx, f.clients[i])
```

In Go, `defer` statements are scoped to the enclosing function, not the enclosing loop iteration. Every loop iteration that satisfies `!hasDeadline` and calls `context.WithTimeout` registers a new deferred cancel that accumulates until `cascadeClients` returns. The pre-PR code called `cancel()` immediately after `fn()`, properly releasing the timer goroutine after each attempt.

The specific code path that triggers it

`cascadeClients` iterates through three passes (cooled, uncooled, unhealthy candidates). For each candidate endpoint where the caller context has no deadline, a new `context.WithTimeout` is created and its cancel is deferred. If N candidates are tried before a success, N timer goroutines accumulate and run concurrently until the function returns rather than being cleaned up after each attempt.

Why existing code does not prevent it

Go's `defer` semantics do not scope to loops. There is no compiler warning for this pattern — it requires `go vet -shadow` or explicit code review to catch. The pre-PR code was correct; the regression was introduced when the PR added the `!hasDeadline` guard.

What the impact would be

In practice, the severity is low/nit: (1) `cascadeClients` is typically short-lived since connection-refused errors return in milliseconds; (2) the goroutines ARE eventually cleaned up when `cascadeClients` returns; (3) the maximum accumulation is bounded by the number of endpoints, typically 2-3. However, this is a real code quality regression that `go vet` would flag, and the original code was strictly better.

Step-by-step proof

1. Caller context has no deadline (typical case).
2. Iteration 1 (cooled[0] fails): `context.WithTimeout` creates cancel1, `defer cancel1()` registered, timer goroutine T1 running.
3. Iteration 2 (uncooled[0] fails): `context.WithTimeout` creates cancel2, `defer cancel2()` registered, timer goroutine T2 running.
4. Iteration 3 (unhealthy[0] succeeds): `cascadeClients` returns, deferred cancel1() and cancel2() fire.
5. Between steps 3 and 4, T1 and T2 are live concurrently even though their respective attempt is complete.
6. The pre-PR code called `cancel()` after each `fn()` call, so T1 was cleaned up before T2 was even created.

How to fix it

Replace `defer cancel()` with an explicit call after `fn()` returns:

```go
subCtx := ctx
var cancel context.CancelFunc
if _, hasDeadline := ctx.Deadline(); !hasDeadline {
	subCtx, cancel = context.WithTimeout(ctx, f.attemptTimeout)
}
result, err := fn(subCtx, f.clients[i])
if cancel != nil {
	cancel() // explicit, immediate cleanup
}
```

No verifiers refuted this bug.
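The iteration-vs-function scoping at the heart of this finding can be reproduced in miniature. The toy below illustrates Go's defer semantics, not the PR's actual code: appended strings stand in for context cancellations.

```go
package main

// deferredInLoop mimics the anti-pattern: defer is function-scoped, so every
// per-iteration "cancel" is queued and only runs when the function returns,
// in LIFO order, after all attempts have finished.
func deferredInLoop() (order []string) {
	for _, name := range []string{"A", "B", "C"} {
		name := name // copy for pre-Go 1.22 closure semantics
		defer func() { order = append(order, "cancel"+name) }()
		order = append(order, "attempt"+name)
	}
	return // all three deferred cancels fire here
}

// immediateCancel mimics the pre-PR (correct) pattern: each "cancel" runs as
// soon as its attempt completes, before the next attempt begins.
func immediateCancel() (order []string) {
	for _, name := range []string{"A", "B", "C"} {
		order = append(order, "attempt"+name)
		order = append(order, "cancel"+name) // released immediately
	}
	return
}
```

`deferredInLoop` yields all attempts first and then the cancels in reverse, while `immediateCancel` interleaves each cancel with its attempt — exactly the difference between the regressed and original `cascadeClients`.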
Code Review — 1 issue found. Checked for bugs and CLAUDE.md compliance.

Bug: Missing global timeout on `StateSyncEventsAtHeight` HTTP client pagination loop

File: consensus/bor/heimdall/client.go L310-313

`StateSyncEventsAtHeight` has no `context.WithTimeout` bounding its pagination loop. Every sibling method added in this PR wraps its loop with a 1-minute timeout (HTTP `StateSyncEventsByTime` at L365-368, gRPC `StateSyncEventsAtHeight` at state_sync.go L95-97, gRPC `StateSyncEventsByTime` at state_sync.go L161-163). This method is the only one missing it. If the Heimdall server consistently returns full pages, the loop runs indefinitely. The comment on L364-365 of `StateSyncEventsByTime` states the intent to match the gRPC implementation, suggesting this was an oversight.

Suggested fix: add a global timeout at the top of `StateSyncEventsAtHeight`:

```go
ctx, cancel := context.WithTimeout(ctx, 1*time.Minute)
defer cancel()
```

Flagged per consensus-security.md (timeout handling for Heimdall RPC calls) and security-common.md (context propagation for operations that can block).
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
Code review — No issues found. Checked for bugs and CLAUDE.md compliance.
LGTM — all review feedback addressed including the failover pagination timeout fix (1-minute budget set at MultiHeimdallClient level before callWithFailover, per the recommendation in the prior inline comments).
Extended reasoning...
Overview
19-file PR introducing deterministic state syncs for the Bor consensus layer. Core change: post-fork, CommitStates calls the new combined StateSyncEventsByTime endpoint (which resolves Heimdall height internally) instead of the non-deterministic (fromID, toTime) pattern. Changes span config (DeterministicStateSyncBlock), consensus logic, all three Heimdall transports (HTTP, gRPC, in-process app), the failover client, fork ID, and tests.
Security risks
No auth or crypto changes. The new API calls go to Heimdall endpoints (trusted internal dependency). The main risk is liveness: post-fork, Heimdall connectivity failures cause block proposals with empty state sync sets rather than hard crashes, matching the pre-fork resilience posture. This was an explicit design decision documented in code comments and accepted by the team.
Level of scrutiny
High scrutiny was appropriate and applied across 5+ review rounds. All raised issues were addressed:
- Sort consistency across HTTP/gRPC/app transports
- URL encoding with `url.Values` (not `fmt.Sprintf`)
- 1-minute global pagination timeout at `MultiHeimdallClient` level (set before `callWithFailover` applies its 30s per-attempt cap)
- `StateSyncEventsByTime` method in all mocks
- Comprehensive failover tests for all three new methods
- `DeterministicStateSyncBlock` in startup banner, fork ID, and JSON serialization
- err shadowing, context chain growth, and other code quality issues
Other factors
The feature is gated by DeterministicStateSyncBlock: nil on mainnet/amoy, so it cannot activate until explicitly scheduled. The author confirmed manual testing on kurtosis devnets with chaos testing (2-vs-2 partition) with zero errors and state roots matching Erigon. The 19:53 inline comments from the previous approver run were about an intermediate implementation; the final diff implements the correct fix (timeout at MultiHeimdallClient level), as described in those same comments as the recommended fix.
My previous comment flagging a missing timeout in StateSyncEventsAtHeight was incorrect — the 1-minute context.WithTimeout is already present in the current diff at client.go:313. This PR is ready for human sign-off.
Extended reasoning...
Overview
This PR adds deterministic state sync support to Bor's consensus engine. It introduces DeterministicStateSyncBlock fork gating, a new StateSyncEventsByTime combined endpoint, StateSyncEventsAtHeight and GetBlockHeightByTime for backward-compatibility, and wires all three through the HTTP/gRPC/in-process-app transports plus the MultiHeimdallClient failover wrapper. 19 files changed.
Security risks
The PR touches consensus-critical code (CommitStates), Heimdall client authentication, and fork-ID peer compatibility. The error-handling choice (log + return empty post-fork, matching pre-fork behavior) was explicitly discussed and accepted. No auth bypass or injection risks identified.
Level of scrutiny
This is a high-scrutiny change. It modifies how validators derive state sync event sets — any divergence between validators produces consensus failures. The PR has gone through six rounds of review; all previously raised issues (filtering inconsistency, sort ordering, URL encoding, context shadowing, timeout propagation through callWithFailover, missing mock stubs) have been addressed. My last inline comment was a false positive — the fix was already in place.
Other factors
- Codecov shows 11.8% patch coverage on the new transport code; the gRPC and app-client paths are 0% covered by automated tests. SonarQube flagged 26.4% duplication, primarily from the repeated mock stubs across test files.
- The pre-existing dead code in `internalFetchWithTimeout` (unconditional 30s override) was acknowledged but not fixed in this PR.
- `go.mod` uses a pseudo-version replace directive for `heimdall-v2`; a TODO marks it for cleanup post-release.
- `DeterministicStateSyncBlock` is `nil` on all live networks, so the new path is not yet active in production.
A human reviewer should confirm the overall design and sign off before merge.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.
Code Review — 1 issue found. Checked for bugs and CLAUDE.md compliance.

Bug: Per-attempt timeout (30s) silently caps the intended 1-minute pagination budget

File: consensus/bor/heimdall/failover_client.go:113-122

Additionally, `HeimdallClient.StateSyncEventsByTime` (client.go:278) sets its own redundant 1-minute timeout inside the already-tighter 30s context — this inner timeout can never fire.

Effect: With default settings, the paginated fetch for all pages must complete within 30s per failover attempt, not the intended 1 minute. During periods with many queued state sync events requiring multiple pages, this could cause premature timeouts, triggering unnecessary failover cascading and ultimately returning empty state syncs (the proposer misses a slot).

Note: The test …

Suggested fix: Remove the 1-minute timeout from …
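The "inner timeout can never fire" observation follows directly from how context deadlines compose: a child's deadline is clamped to its parent's, so a 1-minute timeout created inside a 30-second context is effectively 30 seconds. A small check, scaled down to milliseconds so it runs instantly — the durations are stand-ins for the 30s/1m values above.

```go
package main

import (
	"context"
	"time"
)

// nestedDeadlines returns the deadlines of an outer (short) context and an
// inner context created with a longer timeout. The inner deadline can never
// be later than the outer one, so the longer inner timeout is dead code.
func nestedDeadlines() (outerDeadline, innerDeadline time.Time) {
	outer, cancelOuter := context.WithTimeout(context.Background(), 30*time.Millisecond) // stands in for the 30s per-attempt cap
	defer cancelOuter()
	inner, cancelInner := context.WithTimeout(outer, 60*time.Millisecond) // stands in for the redundant 1-minute timeout
	defer cancelInner()

	outerDeadline, _ = outer.Deadline()
	innerDeadline, _ = inner.Deadline()
	return
}
```

This is why the budget has to be widened at the outermost layer (here, `MultiHeimdallClient`) rather than inside the per-attempt context.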
```go
func (m *mockHeimdallClient) GetBlockHeightByTime(context.Context, int64) (int64, error) {
	return 0, nil
}

func (m *mockHeimdallClient) StateSyncEventsAtHeight(context.Context, uint64, int64, int64) ([]*clerk.EventRecordWithTime, error) {
	return nil, nil
}
```
🟡 Multiple test mock structs across 5 files implement GetBlockHeightByTime and StateSyncEventsAtHeight as extra methods that are not declared in either the IHeimdallClient or Endpoint interface, making these stubs unreachable dead code. These are leftovers from an earlier two-step design iteration (get Heimdall height, then fetch by height) that was replaced by the unified StateSyncEventsByTime call; removing them would avoid falsely implying that failover test coverage exists for an AtHeight/ByHeight path that no longer exists in any interface.
Extended reasoning...
What the bug is and how it manifests
The PR refactored the deterministic state sync path to use a single unified StateSyncEventsByTime call instead of the previous two-step approach (GetBlockHeightByTime + StateSyncEventsAtHeight). The final IHeimdallClient interface in consensus/bor/heimdall.go and the Endpoint interface in consensus/bor/heimdall/failover_client.go both only declare StateSyncEventsByTime as the new method — neither interface contains GetBlockHeightByTime or StateSyncEventsAtHeight. Despite this, all test mock structs added or modified by this PR still implement both of those methods as extra struct methods:
eth/ethconfig/config_test.golines 52-57:mockHeimdallClient.GetBlockHeightByTimeandmockHeimdallClient.StateSyncEventsAtHeighteth/handler_bor_test.golines 68-73:mockHeimdall.GetBlockHeightByTimeandmockHeimdall.StateSyncEventsAtHeightconsensus/bor/bor_test.golines 122-127 and 2986-2991: same pattern onfailingHeimdallClientandmockHeimdallClientconsensus/bor/span_store_test.goat multiple locations:MockHeimdallClient,MockOverlappingHeimdallClient,dynamicHeimdallClient,MockSyncStatusClient, andTimeoutHeimdallClientconsensus/bor/heimdall/failover_client_test.golines 35-37: struct fieldsstateSyncEventsAtHeightFnandgetBlockHeightByTimeFn, plus method implementations at lines 128-145
The specific code path that triggers it
In Go, interface satisfaction is structural — the compiler only checks that required methods are present. Extra struct methods not listed in the interface are completely invisible when using the interface type. Since GetBlockHeightByTime and StateSyncEventsAtHeight appear in neither IHeimdallClient nor Endpoint, no call site using either interface type can ever dispatch to these methods. They can only be called via concrete type, and no call site does that.
Why existing code does not prevent it
Go silently accepts extra methods on a struct beyond what an interface requires, and the compiler emits no warning. The tests all compile and pass correctly. This class of dead code can only be identified by cross-referencing interface definitions against struct method sets, which no automated tool enforces here.
Addressing the refutation
The sole refutation notes that "having extra methods on a struct beyond what an interface requires is valid Go code that compiles and runs correctly." This is entirely accurate. The code is not broken. However, correctness and the absence of dead code are separate concerns. The specific harm here is that failover_client_test.go carries two configurable function fields (stateSyncEventsAtHeightFn, getBlockHeightByTimeFn) and their corresponding method implementations, while the PR adds 5 failover tests each for GetBlockHeightByTime and StateSyncEventsAtHeight. This combination creates the strong false impression that there are working failover tests exercising those paths — but since neither method is in any interface, no production code dispatches through them. A reviewer auditing test coverage for the new deterministic sync path would count these as coverage that does not exist.
What the impact would be
No functional impact — all tests compile and pass. The issue is purely one of code clarity and reviewer confidence: dead stubs waste struct layout, make the interface contract harder to read at a glance, and the configurable function fields in failover_client_test.go create a false sense of test coverage for a code path (GetBlockHeightByTime + StateSyncEventsAtHeight) that was explicitly replaced by this PR.
How to fix it
Remove all GetBlockHeightByTime and StateSyncEventsAtHeight method implementations from the test mocks across all affected files, and remove the stateSyncEventsAtHeightFn and getBlockHeightByTimeFn fields from the mockHeimdallClient struct in failover_client_test.go. These methods are not required by any interface and will not break compilation or any test when removed.


Description
Implements the bor side of deterministic state syncs. Post-fork, Bor switches from the non-deterministic `(from_id, to_time)` state sync query pattern to the combined `GetStateSyncsByTime` endpoint that resolves the Heimdall height internally and returns events in a single call. This ensures all validators derive identical state-sync sets regardless of when they query Heimdall.

Changes

- Added `DeterministicStateSyncBlock` to `BorConfig` with an `IsDeterministicStateSync()` check
- `CommitStates` branches on fork activation — post-fork it uses the combined `StateSyncEventsByTime` endpoint. On errors, it logs and returns empty (matching pre-fork resilience) instead of crashing block production
- Added `StateSyncEventsByTime` to all three transports (HTTP, gRPC, in-process app), with consistent sorting and retry behavior across all transports
- `MultiHeimdallClient` wraps all new methods with the existing failover mechanism; fixed flaky failover tests that used global hit counters instead of method-specific tracking
- `DeterministicStateSyncBlock` included in `GatherForks()` for peer compatibility and in `ChainConfig.Description()`

Activation

- `DeterministicStateSyncBlock`: `nil` (disabled) on mainnet/amoy until the HF is scheduled
Breaking changes
HF needed.
Checklist
Cross repository changes
Testing
Manual tests
Tested on kurtosis devnets with chaos testing (2-vs-2 network partition) and self-heal recovery. Many rounds of chaos, zero errors, state roots match erigon.