fix: renderer OOM crash recovery and memory management #1843

Open
mihai2mn wants to merge 6 commits into pingdotgg:main from mihai2mn:fix/renderer-oom-crash-recovery

Conversation

@mihai2mn

@mihai2mn mihai2mn commented Apr 8, 2026

Summary

Fixes #1686 — V8 OOM crash that kills the renderer during long sessions, causing the white-screen freeze.

Root cause: The renderer holds ALL thread data (messages, activities, tool outputs) for ALL threads in the Zustand store indefinitely. During long sessions with heavy tool use, the V8 heap grows to ~3.7GB and crashes. There is no crash recovery handler, so the user sees a dead white window while agents continue running invisibly in the background.

This PR adds three layers of defense:

  • Crash recovery: a render-process-gone handler auto-reloads the renderer 500ms after a crash or OOM, restoring real-time visibility into running agents. It also logs unresponsive/responsive events for diagnostics.
  • V8 heap cap: --max-old-space-size=2048 forces GC to run more aggressively at 2GB instead of letting the heap grow unchecked to 3.7GB+.
  • Thread data eviction: only 5 threads stay fully hydrated in memory at a time. Inactive threads are dehydrated (messages/activities/plans/diffs cleared, sidebar metadata preserved) and re-fetched from the server on navigation. Threads with running agents are never evicted.
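
The crash-recovery layer can be sketched as a small decision function that a `render-process-gone` listener might call. This is only an illustration, not the PR's actual handler: `reloadDelayFor` and `RELOAD_DELAY_MS` are hypothetical names, and the reason strings are a subset of the values Electron reports in `details.reason`.

```typescript
// A subset of the reasons Electron reports in `details.reason`
// for the `render-process-gone` event.
type GoneReason =
  | "clean-exit"
  | "abnormal-exit"
  | "killed"
  | "crashed"
  | "oom"
  | "launch-failed"
  | "integrity-failure";

// Delay taken from the PR description; the constant name is hypothetical.
const RELOAD_DELAY_MS = 500;

// Hypothetical helper: returns the delay before reloading the renderer,
// or null when the exit should not trigger a reload.
function reloadDelayFor(reason: GoneReason): number | null {
  switch (reason) {
    case "crashed":
    case "oom":
    case "killed":
      return RELOAD_DELAY_MS;
    default:
      return null;
  }
}

// Wiring sketch (not executed here; assumes an Electron BrowserWindow `win`):
//   win.webContents.on("render-process-gone", (_event, details) => {
//     const delay = reloadDelayFor(details.reason);
//     if (delay !== null) {
//       setTimeout(() => {
//         if (!win.isDestroyed()) win.webContents.reload();
//       }, delay);
//     }
//   });
```

Keeping the decision separate from the Electron wiring makes it testable without launching a window.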

Changes

| File | Change |
| --- | --- |
| apps/desktop/src/main.ts | render-process-gone handler, unresponsive/responsive logging, --max-old-space-size=2048 |
| apps/web/src/lib/threadEviction.ts | Eviction policy: which threads to dehydrate, when |
| apps/web/src/types.ts | hydrated: boolean on Thread |
| apps/web/src/store.ts | evictThreadData / hydrateThread pure actions |
| apps/web/src/routes/__root.tsx | Eviction triggered on navigation in EventRouter |
| apps/web/src/components/ChatView.tsx | Hydration trigger + loading state for dehydrated threads |
| Tests | threadEviction.test.ts (5 tests), store.test.ts (+2 tests), updated Thread factories |

Evidence from live crash

Captured during development — the renderer crashed after 39 minutes of use:

coredumpctl: PID 393635 (t3-code-desktop --type=zygote)
Signal: 5 (TRAP)  ← V8 heap exhaustion
Core dump: 927.2MB (truncated)

Main process + backend + Claude agents all survived — only the renderer died with zero recovery. The desktop log showed no awareness of the crash (no render-process-gone handler existed).

Test plan

  • bun typecheck — 0 errors
  • bun lint — 0 errors
  • bun fmt — clean
  • bun run test — 688/689 pass (1 pre-existing timeout in MessagesTimeline.test.tsx)
  • New tests: 7/7 pass (5 eviction policy + 2 store actions)
  • Manual: long session stress test with heavy tool use
  • Manual: verify dehydrated threads show loading state, then re-hydrate on navigation
  • Manual: verify threads with running agents are not evicted

🤖 Generated with Claude Code


Note

Medium Risk
Touches Electron process flags/crash recovery and adds thread data eviction/hydration in the web store, which can affect app stability and state consistency if hydration/eviction edge cases are missed. Server-side change adds caching around filesystem checks and could mask newly-created worktrees for up to the TTL if incorrect.

Overview
Prevents long-session renderer OOM/white-screen issues by capping V8 heap to 2GB (main + renderer) and adding renderer crash/unresponsive diagnostics plus an auto-reload on render-process-gone for crash/OOM/kill.

Adds thread memory eviction in the web app: introduces a Thread.hydrated flag, a keep-limit policy (EVICTION_KEEP_COUNT=5) to dehydrate inactive threads on navigation, and store actions to evictThreadData (drop messages/activities/plans/diffs) and hydrateThread (restore full thread from a server snapshot). ChatView now shows a loading state for dehydrated threads and re-hydrates them on demand.

On the server, adds a negative cache for missing git worktree paths to avoid repeated stat()/ENOENT loops when worktrees are deleted.

Reviewed by Cursor Bugbot for commit e7b400c.

Note

Fix renderer OOM crash recovery and add thread memory eviction in desktop app

  • Caps V8 heap at 2 GB for main and renderer processes in main.ts via --max-old-space-size=2048, and auto-reloads the renderer window 500ms after a crash, OOM, or kill event.
  • Adds a hydrated flag to the Thread type and implements evictThreadData/hydrateThread store reducers to dehydrate inactive threads (clearing messages, activities, plans, and diff summaries) and restore them on demand.
  • On route changes, __root.tsx calls selectThreadsToEvict from the new threadEviction.ts module to dehydrate threads beyond a keep threshold, skipping the active thread and any with running sessions.
  • When navigating to a dehydrated thread, ChatView shows a "Loading conversation..." placeholder and re-fetches the full thread from the server.
  • Adds a TTL-based negative cache in GitCore.ts to skip repeated stat() calls for missing worktree paths for up to 5 minutes.

Macroscope summarized e7b400c.

mihai2mn and others added 4 commits April 8, 2026 22:32
When the renderer hits V8 heap limit (~3.7GB) and crashes, the main
process stays alive with the backend and agents still running. This
adds a render-process-gone handler that auto-reloads the renderer
after 500ms, restoring real-time visibility into running agent work.

Also caps the renderer V8 heap at 2GB via --max-old-space-size so
GC runs more aggressively instead of growing unchecked.

Fixes pingdotgg#1686

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pure logic module that decides which threads to dehydrate based on
activity, session status, and recency. Keeps up to 5 fully-hydrated
threads in memory; never evicts the active thread or threads with
running agents. Oldest idle threads are evicted first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
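
The policy described in this commit message can be sketched roughly as follows. `selectThreadsToEvict` and `EVICTION_KEEP_COUNT` are named in the PR summaries, but the signature and the `ThreadLike` shape here (in particular `sessionRunning` and `lastActiveAt`) are assumptions made for illustration, not the module's real types.

```typescript
// Minimal thread shape for illustration; the real Thread type has more fields.
interface ThreadLike {
  id: string;
  hydrated: boolean;
  sessionRunning: boolean; // assumption: stands in for the session status check
  lastActiveAt: number; // assumption: epoch ms used as the recency signal
}

const EVICTION_KEEP_COUNT = 5;

// Returns the ids of threads to dehydrate so that at most `keepCount`
// threads stay hydrated, where the protected threads allow it.
function selectThreadsToEvict(
  threads: readonly ThreadLike[],
  activeThreadId: string | null,
  keepCount: number = EVICTION_KEEP_COUNT,
): string[] {
  const hydrated = threads.filter((t) => t.hydrated);
  const excess = hydrated.length - keepCount;
  if (excess <= 0) return [];

  // Only idle, non-active threads are candidates; the oldest go first.
  const candidates = hydrated
    .filter((t) => t.id !== activeThreadId && !t.sessionRunning)
    .sort((a, b) => a.lastActiveAt - b.lastActiveAt);

  return candidates.slice(0, excess).map((t) => t.id);
}
```

In this sketch, if the active thread and running threads together leave fewer candidates than the excess, fewer threads are evicted and the hydrated count temporarily stays above the limit.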
Threads can be 'dehydrated' to free renderer memory — messages,
activities, plans, and diffs are cleared while preserving sidebar
metadata. They are re-hydrated from the server when the user
navigates back. The hydrated flag on Thread tracks load state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
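
A minimal sketch of the two store actions, assuming a simplified state shape. `evictThreadData`, `hydrateThread`, and the cleared field names come from the PR; the `Thread` and `AppState` interfaces below are trimmed-down stand-ins, with `title` representing the preserved sidebar metadata.

```typescript
// Simplified stand-ins for the real store types.
interface Thread {
  id: string;
  title: string; // represents sidebar metadata that survives eviction
  hydrated: boolean;
  messages: string[];
  activities: string[];
  proposedPlans: string[];
  turnDiffSummaries: string[];
}

interface AppState {
  threads: Thread[];
}

// Pure action: drops the heavy arrays for one thread and marks it dehydrated.
function evictThreadData(state: AppState, threadId: string): AppState {
  return {
    ...state,
    threads: state.threads.map((t) =>
      t.id === threadId
        ? {
            ...t,
            messages: [],
            activities: [],
            proposedPlans: [],
            turnDiffSummaries: [],
            hydrated: false,
          }
        : t,
    ),
  };
}

// Pure action: replaces a thread with a fresh server snapshot.
function hydrateThread(state: AppState, fresh: Thread): AppState {
  return {
    ...state,
    threads: state.threads.map((t) =>
      t.id === fresh.id ? { ...fresh, hydrated: true } : t,
    ),
  };
}
```

Both functions return new state objects and leave their input untouched, which keeps them easy to unit test.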
When navigating between threads, inactive threads beyond the keep
limit (5) are dehydrated to free memory. When viewing a dehydrated
thread, its data is re-fetched from the server automatically.
Dehydrated threads show a loading indicator while data loads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 8, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


github-actions bot added the labels size:L (100-499 changed lines, additions + deletions) and vouch:unvouched (PR author is not yet trusted in the VOUCHED list) on Apr 8, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 489dc1b3c4


Comment on lines +1147 to +1150
messages: [],
activities: [],
proposedPlans: [],
turnDiffSummaries: [],

P1: Retain sidebar status when evicting thread payloads

evictThreadData clears messages, activities, and proposedPlans, and updateThreadState then rebuilds sidebarThreadsById from this dehydrated thread. Because sidebar fields like latestUserMessageAt, hasPendingApprovals, hasPendingUserInput, and hasActionableProposedPlan are derived from those arrays, eviction drops pending-action badges and recency/sort metadata for inactive threads even though that state still exists on the server.


Comment on lines +1162 to +1164
...fresh,
// Preserve any client-only state
session: existing.session ?? fresh.session,

P2: Use snapshot session state during thread hydration

hydrateThread replaces the fetched snapshot with session: existing.session ?? fresh.session, which can keep stale/optimistic local session data and ignore authoritative server session fields from getSnapshot(). If the local session is outdated (for example after reconnects or stop-request races), the hydrated thread can show incorrect status and be treated as non-running for later eviction decisions.
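
One way to read this suggestion is to flip the precedence so the server snapshot wins and local state is only a fallback. The sketch below is an interpretation, not the reviewer's or author's code: `Session`, `ThreadSnapshot`, and `mergeSessionOnHydrate` are hypothetical names and shapes.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Session {
  status: string;
}

interface ThreadSnapshot {
  id: string;
  session: Session | null;
}

// Prefer the authoritative snapshot session; fall back to the local one
// only when the snapshot carries none.
function mergeSessionOnHydrate(
  existing: ThreadSnapshot,
  fresh: ThreadSnapshot,
): Session | null {
  return fresh.session ?? existing.session;
}
```

Whether a local fallback is wanted at all depends on how the app treats optimistic session state, which this page does not show.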


const allThreads = useStore((store) => store.threads);

// Evict inactive threads when navigating to keep memory bounded.
// This is a separate effect from the WS subscription lifecycle.

🔴 Critical routes/__root.tsx:596

The activeThreadId extraction at line 598 checks for pathname.startsWith("/chat/"), but thread routes are /$threadId (not /chat/$threadId). Since pathname never starts with /chat/, activeThreadId is always null and the currently viewed thread is never protected from eviction. Consider updating the prefix check to match the actual route structure.

Evidence trail:
- apps/web/src/routes/__root.tsx lines 598-600: `const activeThreadId = pathname.startsWith("/chat/") ? (pathname.split("/chat/")[1]?.split("/")[0] ?? null) : null;`
- apps/web/src/routes/_chat.$threadId.tsx line 249: `export const Route = createFileRoute("/_chat/$threadId")({` - underscore prefix makes it pathless layout route
- apps/web/src/components/Sidebar.tsx line 826: `to: "/$threadId"` - confirms thread URLs are /$threadId
- apps/web/src/components/ChatView.tsx lines 939, 962, 1598, 3524, 3902: all navigate `to: "/$threadId"`
- apps/web/src/routes/__root.tsx line 255: `to: "/$threadId"` - root navigation also uses /$threadId pattern
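
A sketch of an extraction that matches the `/$threadId` URL shape described in the evidence trail. `extractActiveThreadId` and `NON_THREAD_SEGMENTS` are hypothetical, and the `"settings"` entry is a made-up example of a non-thread top-level route; the real route list is not shown on this page.

```typescript
// Hypothetical set of top-level path segments that are not thread ids.
// "settings" is an invented example, not taken from the repository.
const NON_THREAD_SEGMENTS: ReadonlySet<string> = new Set(["settings"]);

// Treats the first non-empty path segment as the thread id.
function extractActiveThreadId(pathname: string): string | null {
  const first = pathname.split("/").filter((segment) => segment.length > 0)[0];
  if (first === undefined || NON_THREAD_SEGMENTS.has(first)) return null;
  return first;
}
```

String parsing like this stays fragile when routes change; reading the thread id from the router's matched route params would likely be more robust.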

});
}

export function evictThreadData(state: AppState, threadId: ThreadId): AppState {

🟡 Medium src/store.ts:1144

After evictThreadData clears the thread's messages, activities, and proposedPlans, updateThreadState rebuilds the sidebar summary from the now-empty arrays. This causes hasPendingApprovals, hasPendingUserInput, hasActionableProposedPlan, and latestUserMessageAt to be recomputed as false/null even when the server still has pending items, so the sidebar incorrectly hides status indicators until re-hydration. Consider preserving the original sidebar summary during eviction, or ensuring the sidebar reflects server state rather than client-evicted state.

Evidence trail:
apps/web/src/store.ts:1144-1152 (evictThreadData clears arrays and calls updateThreadState), apps/web/src/store.ts:536-573 (updateThreadState calls buildSidebarThreadSummary on the updated thread), apps/web/src/store.ts:214-234 (buildSidebarThreadSummary computes latestUserMessageAt, hasPendingApprovals, hasPendingUserInput, hasActionableProposedPlan from thread.messages, thread.activities, thread.proposedPlans)
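
The first suggested fix, preserving the sidebar summary during eviction, could look roughly like this. The state shape is heavily simplified and `buildSummary` and `evictKeepingSummary` are hypothetical stand-ins for `buildSidebarThreadSummary` and `evictThreadData`; only `latestUserMessageAt` is modeled, and messages are assumed to be in chronological order.

```typescript
// Simplified stand-ins for the real store types.
interface Message {
  role: "user" | "assistant";
  createdAt: string;
}

interface Thread {
  id: string;
  hydrated: boolean;
  messages: Message[];
}

interface SidebarSummary {
  latestUserMessageAt: string | null;
}

interface State {
  threadsById: Record<string, Thread>;
  sidebarThreadsById: Record<string, SidebarSummary>;
}

// Derives the summary from thread contents (assumes chronological messages).
function buildSummary(thread: Thread): SidebarSummary {
  const userMessages = thread.messages.filter((m) => m.role === "user");
  const last = userMessages[userMessages.length - 1];
  return { latestUserMessageAt: last === undefined ? null : last.createdAt };
}

// Clears the thread payload but deliberately leaves the existing sidebar
// summary in place instead of rebuilding it from the emptied arrays.
function evictKeepingSummary(state: State, threadId: string): State {
  const thread = state.threadsById[threadId];
  if (thread === undefined) return state;
  return {
    ...state,
    threadsById: {
      ...state.threadsById,
      [threadId]: { ...thread, messages: [], hydrated: false },
    },
  };
}
```

Calling `buildSummary` on the evicted thread returns `null` for `latestUserMessageAt`, which is the regression the review describes; skipping the rebuild avoids it until the thread is re-hydrated.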


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.

Reviewed by Cursor Bugbot for commit 489dc1b. Configure here.

useEffect(() => {
  const activeThreadId = pathname.startsWith("/chat/")
    ? (pathname.split("/chat/")[1]?.split("/")[0] ?? null)
    : null;

Wrong pathname pattern makes active thread always null

High Severity

The thread eviction effect incorrectly assumes thread URLs start with /chat/. TanStack Router's pathless layout means actual thread URLs are /$threadId, causing activeThreadId to always be null. This prevents the active thread from being protected, leading to an infinite evict-hydrate loop and a "Loading conversation..." screen.


turnDiffSummaries: [],
hydrated: false,
}));
}

Eviction corrupts sidebar summary derived from cleared arrays

Medium Severity

Evicting thread data clears messages, activities, and proposedPlans. This causes updateThreadState to rebuild the sidebar summary from empty arrays, incorrectly setting latestUserMessageAt to null and pending action flags to false. As a result, evicted threads lose their correct sort order and status badges in the sidebar.


}
}, 500);
}
});

No retry limit risks infinite crash-reload loop

Medium Severity

The render-process-gone handler unconditionally schedules a reload after 500ms for every crash/OOM/killed event, with no maximum retry count or backoff. If the renderer crashes immediately after loading (e.g., due to corrupted persisted state or a deterministic startup error), this creates an infinite crash-reload loop at ~2Hz. The user would see a rapidly flickering white screen with no way to recover except force-quitting the app — arguably worse than the previous static white screen behavior.
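
A retry limit with backoff, as this comment asks for, could be sketched as a small budget object. Everything here is hypothetical: `createReloadBudget`, the option names, and the default limits (3 reloads per 60 seconds, doubling from 500ms) are illustrative choices, not values from the PR.

```typescript
// Hypothetical options; the defaults are illustrative, not from the PR.
interface ReloadBudgetOptions {
  maxReloads: number; // reloads allowed inside the window
  windowMs: number; // sliding window length
  baseDelayMs: number; // delay for the first reload in a window
}

const DEFAULT_RELOAD_BUDGET: ReloadBudgetOptions = {
  maxReloads: 3,
  windowMs: 60_000,
  baseDelayMs: 500,
};

// Tracks recent crashes and decides whether another reload is allowed.
function createReloadBudget(options: ReloadBudgetOptions = DEFAULT_RELOAD_BUDGET) {
  let crashTimes: number[] = [];
  return {
    // Returns the delay before the next reload, or null when the budget
    // for the current window is used up and the app should stop reloading.
    nextDelay(nowMs: number): number | null {
      crashTimes = crashTimes.filter((t) => nowMs - t < options.windowMs);
      if (crashTimes.length >= options.maxReloads) return null;
      const delay = options.baseDelayMs * 2 ** crashTimes.length;
      crashTimes.push(nowMs);
      return delay;
    },
  };
}
```

When `nextDelay` returns `null`, the handler could show an error page or dialog instead of reloading again, so a deterministic startup crash does not turn into a flicker loop.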


@macroscopeapp

macroscopeapp bot commented Apr 8, 2026

Approvability

Verdict: Needs human review

2 blocking correctness issues found. This PR introduces a significant new memory management system with thread eviction/hydration, crash recovery, and V8 heap limits. Multiple critical bugs have been identified in review comments, including a pathname pattern mismatch that would cause infinite evict-hydrate loops on the active thread, and no retry limit on crash recovery that could cause infinite reload loops.


mihai2mn and others added 2 commits April 9, 2026 17:34
app.commandLine.appendSwitch("js-flags", ...) alone doesn't
propagate to sandboxed renderer child processes. Add the flag
via webPreferences.additionalArguments so the renderer's V8
heap is actually capped at 2GB. Without this, the renderer was
observed growing to 4.7GB over 19 hours before crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
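
The change in this commit might be shaped like the following options builder. This is a guess at the form, not the PR's code: the commit message names `webPreferences.additionalArguments` and the 2GB cap, but the exact argument string (`--js-flags=--max-old-space-size=...`) and the names `buildWindowOptions`, `WindowOptionsSketch`, and `HEAP_CAP_MB` are assumptions.

```typescript
// Heap cap from the PR description, in megabytes; the name is hypothetical.
const HEAP_CAP_MB = 2048;

// Structural subset of the BrowserWindow constructor options used here.
interface WindowOptionsSketch {
  webPreferences: {
    sandbox: boolean;
    additionalArguments: string[];
  };
}

// Builds window options that pass the V8 heap flag to the renderer process.
// Assumption: the flag is forwarded as a `--js-flags=` argument.
function buildWindowOptions(heapCapMb: number = HEAP_CAP_MB): WindowOptionsSketch {
  return {
    webPreferences: {
      sandbox: true,
      additionalArguments: [`--js-flags=--max-old-space-size=${heapCapMb}`],
    },
  };
}

// Usage sketch (not executed here; requires Electron):
//   const win = new BrowserWindow(buildWindowOptions());
```

Whether the renderer honors the cap is worth confirming at runtime, for example by watching the renderer's heap size during a long session.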
When a worktree directory is deleted, fileSystem.stat() was called
on every 30-second git status poll with no caching, creating a tight
ENOENT error loop with rapidly growing fiber IDs (observed up to
#228210). This adds a 5-minute negative cache so missing paths are
only re-checked periodically instead of hammered every poll cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
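
The negative cache described in this commit can be sketched as a map from path to the time it was last seen missing. `createMissingPathCache` and its method names are hypothetical; only the 5-minute TTL comes from the commit message, and the real implementation in GitCore.ts is not shown on this page.

```typescript
// TTL from the commit message: missing paths are re-checked every 5 minutes.
const MISSING_PATH_TTL_MS = 5 * 60 * 1000;

// Hypothetical negative cache keyed by worktree path.
function createMissingPathCache(ttlMs: number = MISSING_PATH_TTL_MS) {
  const missingSince = new Map<string, number>();
  return {
    // Record that a stat() on this path just failed with ENOENT.
    markMissing(path: string, nowMs: number): void {
      missingSince.set(path, nowMs);
    },
    // True while the path was seen missing less than `ttlMs` ago,
    // in which case the caller can skip the stat() call.
    isKnownMissing(path: string, nowMs: number): boolean {
      const recordedAt = missingSince.get(path);
      if (recordedAt === undefined) return false;
      if (nowMs - recordedAt >= ttlMs) {
        missingSince.delete(path); // entry expired; allow a fresh check
        return false;
      }
      return true;
    },
    // Drop the entry, for example when the path is known to exist again.
    clear(path: string): void {
      missingSince.delete(path);
    },
  };
}
```

The trade-off noted in the risk summary applies here too: a worktree re-created at a cached path stays invisible until the entry expires or is cleared.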
Development

Successfully merging this pull request may close these issues.

V8 OOM crash → white screen after extended sessions (Linux)