chore: Better check for stored meta not stale by alpe · Pull Request #3221 · evstack/ev-node

alpe · 2026-03-31T15:22:01Z

Overview

Enhanced stale block detection during failover to prevent resuming outdated execution states.

Summary by CodeRabbit

Bug Fixes
- Enhanced validation of execution metadata to prevent resumption of stale in-progress executions with mismatched transaction inputs or timestamps.

github-actions · 2026-03-31T15:22:16Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Mar 31, 2026, 3:24 PM

claude · 2026-03-31T15:22:17Z

Claude finished @alpe's task in 1m 53s —— View job

Code Review

Gather context and read changed files
Review execution/evm/execution.go
Review execution/evm/execution_reconcile_test.go
Summarize findings

Summary

This PR tightens the staleness check for in-progress (ExecStageStarted) execution metadata before resuming from a cached payloadID. Previously, the code would attempt to resume any stored payload at a given height without verifying that the stored timestamp and transaction set match the current request — a subtle correctness bug that could cause resume attempts with mismatched block inputs. The fix is correct and well-scoped.

The hashTxs helper extraction is a clean refactor with no behavior change.

Issues

1. Missing test: inputs match but EL payload is stale (execution_reconcile_test.go)

The resume_when_inputs_match case only tests the happy path where GetPayload succeeds. There's no test for the branch where inputs match but GetPayload returns an error (stale payload after node restart). That path should return found=false and payloadID=nil while logging a debug message. Currently, that branch is covered only by existing tests elsewhere (if any) — consider adding:

"stale_payload_when_inputs_match": {
    execMetaTimestamp: 1700000012,
    execMetaTxs:       [][]byte{[]byte("tx-1")},
    requestedTxs:      [][]byte{[]byte("tx-1")},
    requestedTime:     time.Unix(1700000012, 0),
    expectFound:       false,
    expectPayloadID:   false,
    expectGetPayloads: 1, // tried but failed
    // requires payloads map to NOT contain the payloadID
},

Fix this →

2. Log message missing tx hash context (execution.go:761-765)

When the mismatch is due to differing tx hashes (not timestamps), the log field names are misleading — only execmeta_timestamp and requested_timestamp are logged, implying the mismatch is always a timestamp issue. Consider adding a field or adjusting the message to reflect when it's a tx hash mismatch:

c.logger.Debug().
    Uint64("height", height).
    Int64("execmeta_timestamp", execMeta.Timestamp).
    Int64("requested_timestamp", timestamp.Unix()).
    Bool("timestamp_match", execMeta.Timestamp == timestamp.Unix()).
    Bool("txhash_match", bytes.Equal(execMeta.TxHash, requestedTxHash)).
    Msg("ExecuteTxs: ignoring stale in-progress execution for different block inputs")

Fix this →

Observations (non-blocking)

hashTxs with nil vs empty: hashTxs(nil) and hashTxs([][]byte{}) both return nil. Since bytes.Equal(nil, nil) == true, two empty-transaction blocks will correctly match. This is consistent between saveExecMeta and the comparison site. Worth noting that existing ExecMeta entries saved before this field existed will have TxHash == nil, which will match any empty-tx request — but that's an existing edge case, not introduced here.

Test isolation: mockReconcileEngineRPCClient uses a non-atomic getPayloadCalls counter. Since tests are sequential this is fine, but worth noting for future parallel test refactors.

Overall: This is a focused, well-reasoned correctness fix. The refactored hashTxs helper is cleaner than the inline version. The table-driven tests cover the three primary scenarios clearly.

coderabbitai · 2026-03-31T15:24:45Z

📝 Walkthrough

Walkthrough

The changes add improved validation for cached execution metadata during reconciliation. When resuming a started execution, the system now verifies that the execution timestamp and transaction set match the stored metadata before attempting to retrieve the payload, reducing the risk of stale cached data being used. Supporting changes include a refactored transaction hash computation helper and comprehensive test coverage.

Changes

Cohort / File(s)	Summary
Documentation Update `CHANGELOG.md`	Reformatted changelog entries to use consistent bullet-point notation across all version sections; added entry for stale metadata check improvement.
Execution Metadata Reconciliation `execution/evm/execution.go`	Added pre-check validation when reconciling `ExecStageStarted` metadata by comparing stored metadata's timestamp and transaction hash against current inputs before attempting payload retrieval. Refactored transaction hash computation into new `hashTxs()` helper function that returns `nil` for empty transaction sets. Updated control-flow logging to handle stale execution cases.
Reconciliation Testing `execution/evm/execution_reconcile_test.go`	Added new test `TestReconcileExecutionAtHeight_StartedExecMeta` covering three scenarios: matching inputs (returns found with payload ID and one `GetPayload` call), mismatched timestamp, and mismatched transactions (both return not found with no payload calls). Includes local mock implementations for engine and Ethereum RPC clients.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Suggested reviewers

tac0turtle
randygrok

Poem

🐰 Hops of joy through execution's flow,
Stale metadata? No, no, no!
With timestamps and hashes we now align,
Fresh payloads only—such fine design! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically summarizes the main change: improving validation of stored execution metadata to prevent using stale cached state during failover scenarios.
Description check	✅ Passed	The description is minimal but covers the essential context—it explains the goal and rationale, though without explicit linking to issue `#3221` or detailed technical explanation of the validation improvements.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch alex/exec_check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-03-31T15:29:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.43%. Comparing base (4a70e0b) to head (0d5a6f5).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3221   +/-   ##
=======================================
  Coverage   61.43%   61.43%           
=======================================
  Files         120      120           
  Lines       12504    12504           
=======================================
  Hits         7682     7682           
  Misses       3960     3960           
  Partials      862      862

Flag	Coverage Δ
combined	`61.43% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

julienrbrt

lgtm! this makes #3172 more readable as well so great

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

execution/evm/execution_reconcile_test.go (1)
19-57: Use a slice-backed table so the subtests run in a stable order.

range over a map randomizes the case order, which makes failures a bit noisier to reproduce. A slice keeps this table-driven without losing determinism.

As per coding guidelines, "Use table-driven tests in Go unit tests" and "Ensure tests are deterministic in Go."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@execution/evm/execution_reconcile_test.go` around lines 19 - 57, The test
currently builds specs as a map (variable specs) and ranges over it, which
randomizes subtest order; change specs to a slice of structs that include a name
field (e.g., type case struct { name string; execMetaTimestamp int64;
execMetaTxs [][]byte; requestedTxs [][]byte; requestedTime time.Time;
expectFound bool; expectPayloadID bool; expectGetPayloads int }) and populate it
in the desired stable order, then iterate over that slice (for _, spec := range
cases { t.Run(spec.name, func(t *testing.T) { ... }) }) instead of ranging the
map so subtests run deterministically while keeping the same assertions and
using the same symbols (specs -> cases, t.Run).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 333: Remove the orphan list marker '*' from the links footer in
CHANGELOG.md (the stray asterisk that renders as an empty bullet); open
CHANGELOG.md, locate the lone '*' in the footer area and delete that line so the
list renders correctly without the empty bullet.
- Around line 12-14: Move the entry "chore: Better check for stored meta not
stale" out of the "### Added" section into the "### Fixed" section and rewrite
it in user-facing language that explains the bug and the user impact (e.g.,
"Fixed stale execution-resume behavior by improving validation of stored
metadata to prevent resuming with outdated state"). Update the changelog entry
to remove commit-style phrasing and use a concise, user-first sentence
describing the fix and its effect.

In `@execution/evm/execution.go`:
- Around line 758-785: The resume path for ExecStageStarted currently only
checks timestamp and tx hash (execMeta.Timestamp, execMeta.TxHash via
hashTxs(txs)) before returning the stored PayloadID (pid); you must also
validate that the current parent hash matches execMeta.ParentHash to avoid
resuming on a changed parent. Retrieve the current parent hash for the block
(height-1) from the chain/headers API available in this package and compare it
to execMeta.ParentHash; if they differ, treat the payload as stale (log and fall
through to fresh execution) rather than returning pid even if
c.engineClient.GetPayload(ctx, pid) succeeds. Ensure you still keep the existing
GetPayload existence check (c.engineClient.GetPayload) but only return pid when
both the payload exists and the parent hash matches execMeta.ParentHash.
- Around line 1048-1059: The hashTxs function currently concatenates raw tx
bytes which can produce ambiguous collisions; modify hashTxs to make the tx-list
encoding unambiguous by prefixing each tx with its length (or hashing each tx
separately with a domain separator) before writing into the sha256 hasher: e.g.,
for each tx in hashTxs write the tx length as a fixed-size varint/uint64 or a
clear separator and then write the tx bytes (or write sha256(tx) with a
separator), then return h.Sum(nil) so the digest uniquely represents the ordered
list.

---

Nitpick comments:
In `@execution/evm/execution_reconcile_test.go`:
- Around line 19-57: The test currently builds specs as a map (variable specs)
and ranges over it, which randomizes subtest order; change specs to a slice of
structs that include a name field (e.g., type case struct { name string;
execMetaTimestamp int64; execMetaTxs [][]byte; requestedTxs [][]byte;
requestedTime time.Time; expectFound bool; expectPayloadID bool;
expectGetPayloads int }) and populate it in the desired stable order, then
iterate over that slice (for _, spec := range cases { t.Run(spec.name, func(t
*testing.T) { ... }) }) instead of ranging the map so subtests run
deterministically while keeping the same assertions and using the same symbols
(specs -> cases, t.Run).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e12b9912-7b63-430f-b6da-57e2d9c96b2c

📥 Commits

Reviewing files that changed from the base of the PR and between 4a70e0b and 0d5a6f5.

📒 Files selected for processing (3)

CHANGELOG.md
execution/evm/execution.go
execution/evm/execution_reconcile_test.go

coderabbitai · 2026-03-31T15:38:16Z

CHANGELOG.md

+### Added
+
+* chore: Better check for stored meta not stale [#3221](https://github.com/evstack/ev-node/pull/3221)


⚠️ Potential issue | 🟡 Minor

Move this entry under Fixed and rewrite it user-first.

This change hardens stale execution-resume behavior, so it reads like a bug fix rather than a new feature, and chore: Better check for stored meta not stale is still commit-title/internal wording instead of changelog wording.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 12 - 14, Move the entry "chore: Better check for stored meta not stale" out of the "### Added" section into the "### Fixed" section and rewrite it in user-facing language that explains the bug and the user impact (e.g., "Fixed stale execution-resume behavior by improving validation of stored metadata to prevent resuming with outdated state"). Update the changelog entry to remove commit-style phrasing and use a concise, user-first sentence describing the fix and its effect.

coderabbitai · 2026-03-31T15:38:16Z

CHANGELOG.md

 <!-- Links -->

-
+*


⚠️ Potential issue | 🟡 Minor

Remove the orphan list marker in the links footer.

Line 333 renders as an empty bullet and looks like a leftover from the list-marker normalization.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 333, Remove the orphan list marker '*' from the links footer in CHANGELOG.md (the stray asterisk that renders as an empty bullet); open CHANGELOG.md, locate the lone '*' in the footer area and delete that line so the list renders correctly without the empty bullet.

coderabbitai · 2026-03-31T15:38:16Z

execution/evm/execution.go

 		if execMeta.Stage == ExecStageStarted && len(execMeta.PayloadID) == 8 {
-			var pid engine.PayloadID
-			copy(pid[:], execMeta.PayloadID)
+			requestedTxHash := hashTxs(txs)
+			if execMeta.Timestamp != timestamp.Unix() || !bytes.Equal(execMeta.TxHash, requestedTxHash) {
+				c.logger.Debug().
+					Uint64("height", height).
+					Int64("execmeta_timestamp", execMeta.Timestamp).
+					Int64("requested_timestamp", timestamp.Unix()).
+					Msg("ExecuteTxs: ignoring stale in-progress execution for different block inputs")
+			} else {
+				var pid engine.PayloadID
+				copy(pid[:], execMeta.PayloadID)

-			// Validate payload still exists by attempting to retrieve it
-			if _, err = c.engineClient.GetPayload(ctx, pid); err == nil {
-				c.logger.Info().
+				// Validate payload still exists by attempting to retrieve it
+				if _, err = c.engineClient.GetPayload(ctx, pid); err == nil {
+					c.logger.Info().
+						Uint64("height", height).
+						Str("stage", execMeta.Stage).
+						Msg("ExecuteTxs: found in-progress execution with payloadID, returning payloadID for resume")
+					return nil, &pid, true, nil
+				}
+				// Payload is stale (expired or node restarted) - proceed with fresh execution
+				c.logger.Debug().
 					Uint64("height", height).
-					Str("stage", execMeta.Stage).
-					Msg("ExecuteTxs: found in-progress execution with payloadID, returning payloadID for resume")
-				return nil, &pid, true, nil
+					Str("payloadID", pid.String()).
+					Err(err).
+					Msg("ExecuteTxs: stale ExecMeta payloadID no longer valid in EL, will re-execute")
+				// Don't return - fall through to fresh execution
 			}


⚠️ Potential issue | 🔴 Critical

ExecStageStarted resumes still ignore parent-chain changes.

This gate only checks timestamp and tx hash before reusing pid. After a rollback/reorg, the parent at height-1 can change while both still match, so we can resume and later promote a payload that was built on the old parent. Please also validate the current parent hash before returning pid; ExecMeta already has a ParentHash field you can use.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@execution/evm/execution.go` around lines 758 - 785, The resume path for ExecStageStarted currently only checks timestamp and tx hash (execMeta.Timestamp, execMeta.TxHash via hashTxs(txs)) before returning the stored PayloadID (pid); you must also validate that the current parent hash matches execMeta.ParentHash to avoid resuming on a changed parent. Retrieve the current parent hash for the block (height-1) from the chain/headers API available in this package and compare it to execMeta.ParentHash; if they differ, treat the payload as stale (log and fall through to fresh execution) rather than returning pid even if c.engineClient.GetPayload(ctx, pid) succeeds. Ensure you still keep the existing GetPayload existence check (c.engineClient.GetPayload) but only return pid when both the payload exists and the parent hash matches execMeta.ParentHash.

coderabbitai · 2026-03-31T15:38:16Z

execution/evm/execution.go

+func hashTxs(txs [][]byte) []byte {
+	if len(txs) == 0 {
+		return nil
+	}
+
+	h := sha256.New()
+	for _, tx := range txs {
+		h.Write(tx)
+	}
+
+	return h.Sum(nil)
+}


⚠️ Potential issue | 🟠 Major

Make the tx-list hash unambiguous.

hashTxs hashes the raw concatenation of every tx, so different [][]byte values can collide when they flatten to the same byte stream. That weakens the stale-meta guard this PR is adding. Hash each element independently or include a separator/length per element.

Possible fix

func hashTxs(txs [][]byte) []byte { if len(txs) == 0 { return nil } h := sha256.New() for _, tx := range txs { - h.Write(tx) + txHash := sha256.Sum256(tx) + h.Write(txHash[:]) } return h.Sum(nil) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func hashTxs(txs [][]byte) []byte {

if len(txs) == 0 {

return nil

}

h := sha256.New()

for _, tx := range txs {

h.Write(tx)

}

return h.Sum(nil)

}

func hashTxs(txs [][]byte) []byte {

if len(txs) == 0 {

return nil

}

h := sha256.New()

for _, tx := range txs {

txHash := sha256.Sum256(tx)

h.Write(txHash[:])

}

return h.Sum(nil)

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@execution/evm/execution.go` around lines 1048 - 1059, The hashTxs function currently concatenates raw tx bytes which can produce ambiguous collisions; modify hashTxs to make the tx-list encoding unambiguous by prefixing each tx with its length (or hashing each tx separately with a domain separator) before writing into the sha256 hasher: e.g., for each tx in hashTxs write the tx length as a fixed-size varint/uint64 or a clear separator and then write the tx bytes (or write sha256(tx) with a separator), then return h.Sum(nil) so the digest uniquely represents the ordered list.

Better check for store meta not stale

e2ad7e6

alpe requested a review from julienrbrt March 31, 2026 15:22

github-actions bot assigned alpe Mar 31, 2026

alpe changed the title ~~Better check for store meta not stale~~ chore: Better check for store meta not stale Mar 31, 2026

alpe changed the title ~~chore: Better check for store meta not stale~~ chore: Better check for stored meta not stale Mar 31, 2026

Changelog

0d5a6f5

julienrbrt approved these changes Mar 31, 2026

View reviewed changes

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Better check for stored meta not stale#3221

chore: Better check for stored meta not stale#3221
alpe wants to merge 2 commits intomainfrom
alex/exec_check

alpe commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

claude bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 31, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

codecov bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

julienrbrt left a comment •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 31, 2026

Uh oh!

coderabbitai bot Mar 31, 2026

Uh oh!

coderabbitai bot Mar 31, 2026

Uh oh!

coderabbitai bot Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		### Added

		* chore: Better check for stored meta not stale [#3221](https://github.com/evstack/ev-node/pull/3221)

Conversation

alpe commented Mar 31, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Summary by CodeRabbit

Uh oh!

github-actions bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

Issues

Observations (non-blocking)

Uh oh!

coderabbitai bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

Uh oh!

codecov bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

julienrbrt left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alpe commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

github-actions bot commented Mar 31, 2026 •

edited

Loading

claude bot commented Mar 31, 2026 •

edited

Loading

coderabbitai bot commented Mar 31, 2026 •

edited

Loading

codecov bot commented Mar 31, 2026 •

edited

Loading

julienrbrt left a comment •

edited

Loading