Summary
When the remote ACP subprocess crashes or closes its stdout, Connection._receive_loop exits cleanly on EOF without rejecting pending outgoing requests. Any in-flight send_request() futures hang forever.
This means a subprocess crash during initialize(), new_session(), or prompt() is silently converted into an infinite hang instead of raising an error.
Reproduction
- Spawn an ACP subprocess that crashes immediately on startup (e.g., claude-agent-acp on Node.js < 20, which crashes with a SyntaxError)
- Call conn.initialize(); it sends the JSON-RPC request and awaits the response future
- The subprocess exits with code 1 and its stdout closes
- _filter_jsonrpc_lines reads EOF and feeds EOF to filtered_reader
- _receive_loop reads an empty line and breaks normally (no exception)
- TaskSupervisor._on_done calls task.result(), which returns None (clean exit)
- _on_receive_error is never called (it only fires on exceptions)
- _state.reject_all_outgoing() is never called
- The initialize() future hangs forever
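The sequence above can be reproduced without Node.js or the SDK. The sketch below is a hypothetical, stripped-down model: MiniConnection, its _pending table and send_request() are illustrative stand-ins, not the SDK's real classes. It keeps only a pending-request future and a receive loop that breaks on EOF, then checks whether the future ever resolves.

```python
import asyncio
import itertools


class MiniConnection:
    """Hypothetical stand-in for Connection: a pending-request table plus a
    receive loop that mirrors the original EOF handling."""

    def __init__(self, reader: asyncio.StreamReader) -> None:
        self._reader = reader
        self._pending: dict[int, asyncio.Future] = {}
        self._ids = itertools.count(1)

    def send_request(self) -> asyncio.Future:
        # Register a response future; nothing is written because the
        # "remote" in this model is already dead.
        fut = asyncio.get_running_loop().create_future()
        self._pending[next(self._ids)] = fut
        return fut

    async def _receive_loop(self) -> None:
        # EOF breaks out of the loop without touching _pending.
        while True:
            line = await self._reader.readline()
            if not line:
                break


async def main() -> str:
    reader = asyncio.StreamReader()
    conn = MiniConnection(reader)
    recv = asyncio.create_task(conn._receive_loop())
    fut = conn.send_request()  # like conn.initialize() awaiting its reply
    reader.feed_eof()          # subprocess exits, stdout closes
    await recv                 # receive loop returns normally
    try:
        await asyncio.wait_for(fut, timeout=0.5)
    except asyncio.TimeoutError:
        return "hung"
    return "resolved"


print(asyncio.run(main()))  # hung
```

The timeout exists only so the demo terminates; in the real SDK there is no timeout, so the caller waits forever.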
Root Cause
In connection.py:
```python
async def _receive_loop(self) -> None:
    try:
        while True:
            line = await self._reader.readline()
            if not line:
                break  # EOF: exits cleanly, no exception raised
            ...
    except asyncio.CancelledError:
        return
```
The _on_receive_error callback is registered via TaskSupervisor.create(..., on_error=self._on_receive_error), but _on_done only calls on_error when task.result() raises an exception. A clean EOF exit does not raise, so reject_all_outgoing is never invoked.
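To make the supervisor behaviour concrete, here is a minimal sketch of a done-callback with the semantics described above. The supervise() helper is hypothetical and only models what TaskSupervisor._on_done is described as doing; it is not the SDK's code. A task that returns normally never reaches on_error, while a task that raises does.

```python
import asyncio
from typing import Callable


def supervise(task: asyncio.Task, on_error: Callable[[BaseException], None]) -> None:
    """Hypothetical model of TaskSupervisor._on_done: on_error fires only
    when the task finished by raising."""

    def _on_done(t: asyncio.Task) -> None:
        if t.cancelled():
            return
        exc = t.exception()
        if exc is not None:
            on_error(exc)
        # A task that returned normally (e.g. a receive loop that broke on
        # EOF) falls through here and nothing is reported.

    task.add_done_callback(_on_done)


async def clean_exit() -> None:
    return None  # like _receive_loop breaking on EOF


async def crash() -> None:
    raise RuntimeError("read failed")


async def main() -> list[str]:
    errors: list[str] = []
    for coro in (clean_exit(), crash()):
        task = asyncio.create_task(coro)
        supervise(task, lambda e: errors.append(repr(e)))
        await asyncio.gather(task, return_exceptions=True)
        await asyncio.sleep(0)  # give the done callback a chance to run
    return errors


print(asyncio.run(main()))  # ["RuntimeError('read failed')"]
```

Only the crashing task is reported, which is why the clean EOF path never triggers reject_all_outgoing.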
Suggested Fix
Reject all pending requests when the receive loop exits on EOF:
```python
async def _receive_loop(self) -> None:
    try:
        while True:
            line = await self._reader.readline()
            if not line:
                break
            ...
    except asyncio.CancelledError:
        return
    # EOF: remote end closed. Reject any in-flight requests so callers
    # get an exception instead of hanging forever.
    self._state.reject_all_outgoing(
        ConnectionError("Connection closed: remote end sent EOF")
    )
```
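The effect of the fix can be checked against a self-contained model. In this sketch MiniConnection and its reject_all_outgoing() method are hypothetical stand-ins for Connection and _state.reject_all_outgoing(); the assumption is that the real method fails every pending future with the given exception. With the rejection after the loop, the caller sees a ConnectionError instead of hanging.

```python
import asyncio
import itertools


class MiniConnection:
    """Hypothetical model of Connection with the suggested fix applied."""

    def __init__(self, reader: asyncio.StreamReader) -> None:
        self._reader = reader
        self._pending: dict[int, asyncio.Future] = {}
        self._ids = itertools.count(1)

    def send_request(self) -> asyncio.Future:
        fut = asyncio.get_running_loop().create_future()
        self._pending[next(self._ids)] = fut
        return fut

    def reject_all_outgoing(self, exc: BaseException) -> None:
        # Assumed behaviour of _state.reject_all_outgoing(): fail every
        # in-flight request that is still pending, then clear the table.
        for fut in self._pending.values():
            if not fut.done():
                fut.set_exception(exc)
        self._pending.clear()

    async def _receive_loop(self) -> None:
        try:
            while True:
                line = await self._reader.readline()
                if not line:
                    break
                # (message dispatch would happen here)
        except asyncio.CancelledError:
            return
        # EOF: fail pending requests instead of leaving them to hang.
        self.reject_all_outgoing(
            ConnectionError("Connection closed: remote end sent EOF")
        )


async def main() -> str:
    reader = asyncio.StreamReader()
    conn = MiniConnection(reader)
    recv = asyncio.create_task(conn._receive_loop())
    fut = conn.send_request()
    reader.feed_eof()
    await recv
    try:
        await asyncio.wait_for(fut, timeout=0.5)
    except ConnectionError as e:
        return f"raised: {e}"
    except asyncio.TimeoutError:
        return "hung"
    return "resolved"


print(asyncio.run(main()))  # raised: Connection closed: remote end sent EOF
```

As in the suggested fix, cancellation still returns early without rejecting anything. Moving the rejection into a finally block would also fail pending requests on cancellation; whether that is desirable depends on how the SDK tears down connections.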
Impact
We discovered this while debugging why ACP evals (SWE-bench Multimodal) hang indefinitely for certain repos. The ACP subprocess crashed on startup due to incompatible Node.js versions, but instead of getting an error, the SDK hung forever at conn.initialize(). This affected ~65% of eval instances.
Environment
- agent-client-protocol version: 0.8.1
- Python: 3.12/3.13
- OS: Linux (K8s pods)