fix: renderer OOM crash recovery and memory management #1843
mihai2mn wants to merge 6 commits into pingdotgg:main
When the renderer hits the V8 heap limit (~3.7 GB) and crashes, the main process stays alive with the backend and agents still running. This adds a render-process-gone handler that auto-reloads the renderer after 500 ms, restoring real-time visibility into running agent work. It also caps the renderer V8 heap at 2 GB via --max-old-space-size so GC runs more aggressively instead of letting the heap grow unchecked.
Fixes pingdotgg#1686
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
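A minimal sketch of the reload decision this commit describes. It assumes the handler reacts only to the `crashed`, `oom`, and `killed` reasons (the three named in the PR summary) and injects the scheduler and reload callbacks so the logic can run outside Electron. `handleRenderProcessGone` is a hypothetical name, not the PR's actual code.

```typescript
// Hypothetical helper: decide whether a `render-process-gone` event should
// trigger a delayed reload. Scheduler and reload are injected for testability.
const RELOAD_REASONS: ReadonlySet<string> = new Set(["crashed", "oom", "killed"]);
const RELOAD_DELAY_MS = 500;

type Schedule = (fn: () => void, delayMs: number) => void;

function handleRenderProcessGone(
  reason: string,
  schedule: Schedule,
  reload: () => void,
): boolean {
  // Other reasons (for example "clean-exit" when a window closes) are ignored.
  if (!RELOAD_REASONS.has(reason)) return false;
  schedule(reload, RELOAD_DELAY_MS);
  return true;
}

// Possible wiring in the Electron main process (not executed here):
// win.webContents.on("render-process-gone", (_event, details) => {
//   handleRenderProcessGone(details.reason, (fn, ms) => setTimeout(fn, ms), () => {
//     if (!win.isDestroyed()) win.webContents.reload();
//   });
// });
```

Keeping the decision in a pure function makes it easy to add a retry budget later without touching the Electron wiring.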
Pure logic module that decides which threads to dehydrate based on activity, session status, and recency. Keeps up to 5 fully hydrated threads in memory; never evicts the active thread or threads with running agents. Oldest idle threads are evicted first.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
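The policy in this commit can be sketched as a pure function. This is an illustration under assumptions: the `EvictionCandidate` shape and the exact signature are hypothetical; only the name `selectThreadsToEvict` and the keep count of 5 come from the PR text.

```typescript
// Hypothetical shape of the per-thread info the policy needs.
interface EvictionCandidate {
  id: string;
  hydrated: boolean;
  sessionRunning: boolean; // an agent is actively working in this thread
  lastActivityAt: number; // epoch ms of the most recent activity
}

const EVICTION_KEEP_COUNT = 5;

// Returns ids of threads to dehydrate so that at most `keepCount` stay
// hydrated. The active thread and running threads are never chosen, so if
// they alone exceed the limit, more than `keepCount` threads stay hydrated.
function selectThreadsToEvict(
  threads: readonly EvictionCandidate[],
  activeThreadId: string | null,
  keepCount = EVICTION_KEEP_COUNT,
): string[] {
  const hydrated = threads.filter((t) => t.hydrated);
  const excess = hydrated.length - keepCount;
  if (excess <= 0) return [];

  // Only idle, inactive threads are eligible; least recently active first.
  const eligible = hydrated
    .filter((t) => t.id !== activeThreadId && !t.sessionRunning)
    .sort((a, b) => a.lastActivityAt - b.lastActivityAt);

  return eligible.slice(0, excess).map((t) => t.id);
}
```

Protected threads still count toward the limit here, which keeps the number of evictions per navigation small; a different design could exclude them from the count instead.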
Threads can be "dehydrated" to free renderer memory: messages, activities, plans, and diffs are cleared while sidebar metadata is preserved. They are re-hydrated from the server when the user navigates back. The hydrated flag on Thread tracks load state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
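A sketch of the dehydration step as a pure reducer. It assumes a simplified `Thread` with only the fields the commit mentions and a plain map of threads keyed by id; the real `evictThreadData` takes the app's `AppState` instead.

```typescript
// Simplified stand-in for the app's Thread type (assumption: only the fields
// relevant to eviction are shown).
interface Thread {
  id: string;
  title: string; // sidebar metadata, kept across eviction
  hydrated: boolean;
  messages: unknown[];
  activities: unknown[];
  proposedPlans: unknown[];
  turnDiffSummaries: unknown[];
}

type ThreadsById = Readonly<Record<string, Thread>>;

// Returns a new map with the thread's heavy payload dropped. Returns the same
// object when there is nothing to evict, so subscribers that compare by
// reference do not re-render.
function evictThreadData(threads: ThreadsById, threadId: string): ThreadsById {
  const thread = threads[threadId];
  if (!thread || !thread.hydrated) return threads;
  return {
    ...threads,
    [threadId]: {
      ...thread,
      messages: [],
      activities: [],
      proposedPlans: [],
      turnDiffSummaries: [],
      hydrated: false,
    },
  };
}
```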
When navigating between threads, inactive threads beyond the keep limit (5) are dehydrated to free memory. When a dehydrated thread is viewed, its data is re-fetched from the server automatically, and a loading indicator is shown while it loads.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
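One detail the on-demand re-fetch has to handle is repeated renders of the same dehydrated thread. A hedged sketch, assuming a hypothetical `fetchThread` function that returns the server snapshot: it keeps at most one request in flight per thread.

```typescript
// Hypothetical helper: start at most one hydration request per thread, so
// repeated renders of a dehydrated thread do not issue duplicate fetches.
function createHydrator<T>(fetchThread: (threadId: string) => Promise<T>) {
  const inFlight = new Map<string, Promise<T>>();

  return function ensureHydrated(threadId: string, isHydrated: boolean): Promise<T> | null {
    if (isHydrated) return null; // already loaded; nothing to fetch
    const pending = inFlight.get(threadId);
    if (pending) return pending; // reuse the request that is already running
    const request = fetchThread(threadId).finally(() => {
      inFlight.delete(threadId); // allow a retry after success or failure
    });
    inFlight.set(threadId, request);
    return request;
  };
}
```

In a component, this could run inside an effect keyed on the thread id, passing the resolved snapshot to the store's hydrate action and rendering the "Loading conversation..." placeholder while `hydrated` is false.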
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 489dc1b3c4
messages: [],
activities: [],
proposedPlans: [],
turnDiffSummaries: [],
Retain sidebar status when evicting thread payloads
evictThreadData clears messages, activities, and proposedPlans, and updateThreadState then rebuilds sidebarThreadsById from this dehydrated thread. Because sidebar fields like latestUserMessageAt, hasPendingApprovals, hasPendingUserInput, and hasActionableProposedPlan are derived from those arrays, eviction drops pending-action badges and recency/sort metadata for inactive threads even though that state still exists on the server.
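One way to address this, sketched under assumptions: when a thread is not hydrated, reuse the last sidebar summary instead of rebuilding it from the emptied arrays. The summary fields come from the review; `nextSidebarSummary` is hypothetical and `build` stands in for a builder such as `buildSidebarThreadSummary`.

```typescript
interface SidebarThreadSummary {
  latestUserMessageAt: string | null;
  hasPendingApprovals: boolean;
  hasPendingUserInput: boolean;
  hasActionableProposedPlan: boolean;
}

// Hypothetical wrapper around the existing summary builder.
function nextSidebarSummary<T extends { hydrated: boolean }>(
  thread: T,
  previous: SidebarThreadSummary | undefined,
  build: (thread: T) => SidebarThreadSummary,
): SidebarThreadSummary {
  // A dehydrated thread has empty arrays by design, so rebuilding would report
  // no pending actions and a null latestUserMessageAt. Keep the last summary
  // computed while the thread was still hydrated.
  if (!thread.hydrated && previous) return previous;
  return build(thread);
}
```

The trade-off: a preserved summary can go stale if server events (for example a new approval request) arrive while the thread is dehydrated, so those events would still need to update the summary directly.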
...fresh,
// Preserve any client-only state
session: existing.session ?? fresh.session,
Use snapshot session state during thread hydration
hydrateThread replaces the fetched snapshot with session: existing.session ?? fresh.session, which can keep stale/optimistic local session data and ignore authoritative server session fields from getSnapshot(). If the local session is outdated (for example after reconnects or stop-request races), the hydrated thread can show incorrect status and be treated as non-running for later eviction decisions.
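A sketch of the suggested fix: let every server-owned field, including `session`, come from the snapshot and carry over only fields the server does not own. The simplified types are assumptions, and `draftInput` is a hypothetical client-only field used for illustration.

```typescript
interface Session {
  status: "idle" | "running" | "stopped";
}

// Simplified Thread; `draftInput` is a hypothetical client-only field.
interface Thread {
  id: string;
  hydrated: boolean;
  session: Session | null;
  messages: string[];
  draftInput?: string | undefined;
}

function hydrateThread(existing: Thread | undefined, snapshot: Thread): Thread {
  return {
    ...snapshot, // server fields win, including the session
    // Carry over only fields the server does not own.
    draftInput: existing?.draftInput ?? snapshot.draftInput,
    hydrated: true,
  };
}
```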
const allThreads = useStore((store) => store.threads);

// Evict inactive threads when navigating to keep memory bounded.
// This is a separate effect from the WS subscription lifecycle.
🔴 Critical routes/__root.tsx:596
The activeThreadId extraction at line 598 checks for pathname.startsWith("/chat/"), but thread routes are /$threadId (not /chat/$threadId). Since pathname never starts with /chat/, activeThreadId is always null and the currently viewed thread is never protected from eviction. Consider updating the prefix check to match the actual route structure.
Evidence trail:
- apps/web/src/routes/__root.tsx lines 598-600: `const activeThreadId = pathname.startsWith("/chat/") ? (pathname.split("/chat/")[1]?.split("/")[0] ?? null) : null;`
- apps/web/src/routes/_chat.$threadId.tsx line 249: `export const Route = createFileRoute("/_chat/$threadId")({` - underscore prefix makes it pathless layout route
- apps/web/src/components/Sidebar.tsx line 826: `to: "/$threadId"` - confirms thread URLs are /$threadId
- apps/web/src/components/ChatView.tsx lines 939, 962, 1598, 3524, 3902: all navigate `to: "/$threadId"`
- apps/web/src/routes/__root.tsx line 255: `to: "/$threadId"` - root navigation also uses /$threadId pattern
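A hedged sketch of an extraction that matches the `/$threadId` structure described in the evidence. The helper name and the `reservedSegments` parameter (other top-level routes that are not thread ids) are hypothetical; reading the `threadId` route param from the router would avoid string parsing entirely.

```typescript
// Hypothetical helper. Thread URLs look like "/<threadId>" because the
// "_chat" layout route is pathless, so the id is the first path segment.
function activeThreadIdFromPathname(
  pathname: string,
  reservedSegments: ReadonlySet<string> = new Set<string>(),
): string | null {
  const first = pathname.split("/").find((segment) => segment.length > 0);
  if (first === undefined || reservedSegments.has(first)) return null;
  return first;
}
```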
});
}

export function evictThreadData(state: AppState, threadId: ThreadId): AppState {
🟡 Medium src/store.ts:1144
After evictThreadData clears the thread's messages, activities, and proposedPlans, updateThreadState rebuilds the sidebar summary from the now-empty arrays. This causes hasPendingApprovals, hasPendingUserInput, hasActionableProposedPlan, and latestUserMessageAt to be recomputed as false/null even when the server still has pending items, so the sidebar incorrectly hides status indicators until re-hydration. Consider preserving the original sidebar summary during eviction, or ensuring the sidebar reflects server state rather than client-evicted state.
Evidence trail:
apps/web/src/store.ts:1144-1152 (evictThreadData clears arrays and calls updateThreadState), apps/web/src/store.ts:536-573 (updateThreadState calls buildSidebarThreadSummary on the updated thread), apps/web/src/store.ts:214-234 (buildSidebarThreadSummary computes latestUserMessageAt, hasPendingApprovals, hasPendingUserInput, hasActionableProposedPlan from thread.messages, thread.activities, thread.proposedPlans)
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 489dc1b.
useEffect(() => {
  const activeThreadId = pathname.startsWith("/chat/")
    ? (pathname.split("/chat/")[1]?.split("/")[0] ?? null)
    : null;
Wrong pathname pattern makes active thread always null
High Severity
The thread eviction effect incorrectly assumes thread URLs start with /chat/. TanStack Router's pathless layout means actual thread URLs are /$threadId, causing activeThreadId to always be null. This prevents the active thread from being protected, leading to an infinite evict-hydrate loop and a "Loading conversation..." screen.
turnDiffSummaries: [],
hydrated: false,
}));
}
Eviction corrupts sidebar summary derived from cleared arrays
Medium Severity
Evicting thread data clears messages, activities, and proposedPlans. This causes updateThreadState to rebuild the sidebar summary from empty arrays, incorrectly setting latestUserMessageAt to null and pending action flags to false. As a result, evicted threads lose their correct sort order and status badges in the sidebar.
}
}, 500);
}
});
No retry limit risks infinite crash-reload loop
Medium Severity
The render-process-gone handler unconditionally schedules a reload after 500ms for every crash/OOM/killed event, with no maximum retry count or backoff. If the renderer crashes immediately after loading (e.g., due to corrupted persisted state or a deterministic startup error), this creates an infinite crash-reload loop at ~2Hz. The user would see a rapidly flickering white screen with no way to recover except force-quitting the app, which is arguably worse than the previous static white screen.
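A minimal retry budget with exponential backoff would bound this loop. This is a sketch under assumptions: the option names, the sliding-window counting, and the example thresholds are hypothetical; only the 500 ms base delay comes from the PR.

```typescript
interface CrashReloadPolicyOptions {
  maxReloads: number; // reloads allowed within the window
  windowMs: number; // how far back crashes are counted
  baseDelayMs: number; // first reload delay (500 ms in this PR)
}

// Hypothetical helper: returns the delay before the next reload, or null when
// too many crashes happened recently (likely a deterministic startup crash).
function createCrashReloadPolicy(
  opts: CrashReloadPolicyOptions,
  now: () => number = Date.now,
): () => number | null {
  let recent: number[] = [];
  return () => {
    const t = now();
    recent = recent.filter((ts) => t - ts < opts.windowMs); // drop old crashes
    if (recent.length >= opts.maxReloads) return null; // stop the reload loop
    recent.push(t);
    return opts.baseDelayMs * 2 ** (recent.length - 1); // exponential backoff
  };
}
```

When the policy returns null, the handler would skip the reload and leave the window in a static error state (or prompt the user) instead of flickering.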
Approvability verdict: Needs human review. 2 blocking correctness issues found. This PR introduces a significant new memory management system with thread eviction/hydration, crash recovery, and V8 heap limits. Multiple critical bugs have been identified in review comments, including a pathname pattern mismatch that would cause infinite evict-hydrate loops on the active thread, and no retry limit on crash recovery that could cause infinite reload loops.
app.commandLine.appendSwitch("js-flags", ...) alone doesn't
propagate to sandboxed renderer child processes. Add the flag
via webPreferences.additionalArguments so the renderer's V8
heap is actually capped at 2GB. Without this, the renderer was
observed growing to 4.7GB over 19 hours before crashing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a worktree directory is deleted, fileSystem.stat() was called on every 30-second git status poll with no caching, creating a tight ENOENT error loop with rapidly growing fiber IDs (observed up to #228210). This adds a 5-minute negative cache so missing paths are only re-checked periodically instead of on every poll cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
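A sketch of a negative cache with the 5-minute TTL this commit describes. It assumes an injected clock and a synchronous `exists` callback standing in for the real `stat()` call; the class and helper names are hypothetical. The `invalidate` method addresses the risk that a newly created worktree stays hidden until the TTL lapses.

```typescript
// Hypothetical negative cache: remembers paths that were missing so the
// poller can skip stat() for them until the TTL expires.
class MissingPathCache {
  private readonly missingUntil = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  isKnownMissing(path: string): boolean {
    const until = this.missingUntil.get(path);
    if (until === undefined) return false;
    if (this.now() >= until) {
      this.missingUntil.delete(path); // expired: allow a fresh check
      return false;
    }
    return true;
  }

  markMissing(path: string): void {
    this.missingUntil.set(path, this.now() + this.ttlMs);
  }

  // Call when a worktree is (re)created so it is not hidden for up to the TTL.
  invalidate(path: string): void {
    this.missingUntil.delete(path);
  }
}

// One poll step; `exists` stands in for the real stat call.
function worktreeExists(path: string, cache: MissingPathCache, exists: (p: string) => boolean): boolean {
  if (cache.isKnownMissing(path)) return false; // skip the stat entirely
  const found = exists(path);
  if (!found) cache.markMissing(path);
  return found;
}
```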


Summary
Fixes #1686: V8 OOM crash that kills the renderer during long sessions, causing the white-screen freeze.
Root cause: The renderer holds ALL thread data (messages, activities, tool outputs) for ALL threads in the Zustand store indefinitely. During long sessions with heavy tool use, the V8 heap grows to ~3.7GB and crashes. There is no crash recovery handler, so the user sees a dead white window while agents continue running invisibly in the background.
This PR adds three layers of defense:
- A `render-process-gone` handler auto-reloads the renderer after 500 ms on crash/OOM, restoring real-time visibility into running agents. It also logs `unresponsive`/`responsive` events for diagnostics.
- `--max-old-space-size=2048` forces GC to run more aggressively at 2 GB instead of letting the heap grow unchecked to 3.7 GB+.
- Inactive threads beyond the keep limit (5) are dehydrated on navigation and re-fetched from the server when viewed again.

Changes
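| File | Change |
| --- | --- |
| `apps/desktop/src/main.ts` | `render-process-gone` handler, `unresponsive`/`responsive` logging, `--max-old-space-size=2048` |
| `apps/web/src/lib/threadEviction.ts` | New pure eviction-policy module |
| `apps/web/src/types.ts` | `hydrated: boolean` on `Thread` |
| `apps/web/src/store.ts` | `evictThreadData` / `hydrateThread` pure actions |
| `apps/web/src/routes/__root.tsx` | Eviction on navigation in `EventRouter` |
| `apps/web/src/components/ChatView.tsx` | Loading state and re-hydration for dehydrated threads |
| Tests | `threadEviction.test.ts` (5 tests), `store.test.ts` (+2 tests), updated `Thread` factories |

Evidence from live crash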
Captured during development: the renderer crashed after 39 minutes of use.
Main process + backend + Claude agents all survived; only the renderer died, with no recovery. The desktop log showed no awareness of the crash (no `render-process-gone` handler existed).

Test plan
- `bun typecheck`: 0 errors
- `bun lint`: 0 errors
- `bun fmt`: clean
- `bun run test`: 688/689 pass (1 pre-existing timeout in `MessagesTimeline.test.tsx`)

🤖 Generated with Claude Code
Note
Medium Risk
Touches Electron process flags/crash recovery and adds thread data eviction/hydration in the web store, which can affect app stability and state consistency if hydration/eviction edge cases are missed. The server-side change adds caching around filesystem checks and, if incorrect, could mask newly created worktrees for up to the TTL.
Overview
Prevents long-session renderer OOM/white-screen issues by capping the V8 heap at 2 GB (main + renderer) and adding renderer crash/unresponsive diagnostics plus an auto-reload on `render-process-gone` for crash/OOM/kill.

Adds thread memory eviction in the web app: introduces a `Thread.hydrated` flag, a keep-limit policy (`EVICTION_KEEP_COUNT=5`) to dehydrate inactive threads on navigation, and store actions `evictThreadData` (drop messages/activities/plans/diffs) and `hydrateThread` (restore the full thread from a server snapshot). `ChatView` now shows a loading state for dehydrated threads and re-hydrates them on demand.

On the server, adds a negative cache for missing git worktree paths to avoid repeated `stat()`/ENOENT loops when worktrees are deleted.

Reviewed by Cursor Bugbot for commit e7b400c.
Note
Fix renderer OOM crash recovery and add thread memory eviction in desktop app
- Logs renderer crash/unresponsive events, caps the V8 heap with `--max-old-space-size=2048`, and auto-reloads the renderer window 500 ms after a crash, OOM, or kill event.
- Adds a `hydrated` flag to the `Thread` type and implements `evictThreadData`/`hydrateThread` store reducers to dehydrate inactive threads (clearing messages, activities, plans, and diff summaries) and restore them on demand.
- Uses `selectThreadsToEvict` from the new threadEviction.ts module to dehydrate threads beyond a keep threshold, skipping the active thread and any with running sessions.
- For dehydrated threads, `ChatView` shows a "Loading conversation..." placeholder and re-fetches the full thread from the server.
- Skips repeated `stat()` calls for missing worktree paths for up to 5 minutes.

Macroscope summarized e7b400c.