fix(flows): resume long-running tools after matching responses#5072
fix(flows): resume long-running tools after matching responses#5072zeel2104 wants to merge 3 commits intogoogle:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
Hi @zeel2104 , Thank you for your contribution! We appreciate you taking the time to submit this pull request. |
tottenjordan
left a comment
There was a problem hiding this comment.
Tested this against our production interactive_creative agent (3 sequential LongRunningFunctionTool checkpoints with streaming enabled, ResumabilityConfig(is_resumable=True)).
Verification
- Installed PR branch (
zeel2104/adk-python@fix-5064-long-running-resume) — installs asgoogle-adk==1.28.0 - All 4 patches confirmed present:
InvocationContext.has_unresolved_long_running_tool_calls()replaces oldevents[-2:]checkpreserve_existing_function_call_ids()added tofunctions.py_finalize_model_response_eventcallspreserve_existing_function_call_idsbeforepopulate_client_function_call_idLlmAgent._run_async_impluseshas_unresolved_long_running_tool_calls
- 3 upstream unit tests pass:
test_has_unresolved_long_running_tool_calls_with_matching_response,test_has_unresolved_long_running_tool_calls_without_matching_response,test_finalize_model_response_event_preserves_function_call_ids - 125 downstream project tests pass with no regressions
Review of the fix
Bug 1 fix — Clean approach. Collecting all functionResponse IDs across the event list and checking against long_running_tool_ids is correct and handles the general case (not just the last 2 events). Using the same method in both llm_agent.py and base_llm_flow.py eliminates the inconsistency.
Bug 2 fix — preserve_existing_function_call_ids correctly carries forward IDs by matching on function name and only filling in missing IDs (if current_function_call.id: continue). Calling it before populate_client_function_call_id ensures existing IDs are preserved and only truly new calls get fresh UUIDs.
LGTM — this fixes both issues cleanly with minimal surface area. Thanks @zeel2104!
1. pyink formatting: collapse method signature to single line 2. Only count author='user' function_responses as resolutions — agent- generated auto-responses from LongRunningFunctionTool should not resolve the pause, only actual user resume responses should 3. Add null guard on event.long_running_tool_ids to fix mypy type error All 5158 unit tests pass with these changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@zeel2104 I dug into the 3 CI failures and opened a PR against your branch with fixes: zeel2104#1 All 3 issues are in 1. pyink — method signature needs to be on one line per pyink rules. 2. mypy — 3. 5 test failures — the Fix: filter to function_response_ids = {
function_response.id
for event in events
for function_response in event.get_function_responses()
if function_response.id and event.author == 'user'
}After these fixes: 5,158 tests pass (including the 5 previously failing pause/resume tests + your 3 new tests). |
1. pyink formatting: collapse method signature to single line 2. Only count author='user' function_responses as resolutions — agent- generated auto-responses from LongRunningFunctionTool should not resolve the pause, only actual user resume responses should 3. Add null guard on event.long_running_tool_ids to fix mypy type error All 5158 unit tests pass with these changes.
|
@zeel2104 The CLA check is failing because my commit had a I've force-pushed an amended commit to my branch ( git reset --hard a7b2b73 # back to before the merge
git pull https://github.com/tottenjordan/adk-python.git fix-5064-long-running-resume-ci-fixes
git push --forceOr just squash everything when merging to main — that would also resolve the CLA issue. |
c1af36b to
d361c12
Compare
|
@tottenjordan |
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
2. Or, if no issue exists, describe the change:
Problem:
LongRunningFunctionToolresume could fail in resumable flows when streaming was enabled. There were two compounding issues:functionResponsehad already resolved the long-running tool call.Solution:
This change fixes both sides of the resume path:
functionResponseIDs in the current invocation branch before deciding to stop execution.Testing Plan
I verified the change with targeted unit tests covering both parts of the fix:
resumable invocation logic now continues when a long-running tool call has a matching functionResponse
streaming finalization preserves function call IDs across partial and final events
Commands run
python -m pytest tests\unittests\agents\test_invocation_context.py tests\unittests\flows\llm_flows\test_base_llm_flow.py -q --basetemp=.pytest_tmp
python -m pytest tests\unittests\agents\test_invocation_context.py -q -k "has_unresolved_long_running_tool_calls" --basetemp=.pytest_tmp
python -m pytest tests\unittests\flows\llm_flows\test_base_llm_flow.py -q -k "preserves_function_call_ids" --basetemp=.pytest_tmp
Unit Tests:
Passed locally: