
Normalize surrogate characters in ADK invoke_model responses #1427

Open
xumaple wants to merge 4 commits into main from maplexu/surrogate-bug-repro


@xumaple (Contributor) commented Apr 8, 2026

Summary

  • LLM responses may contain Unicode surrogate characters in Part.text fields (e.g., surrogate pairs representing emoji, or lone surrogates from encoding issues)
  • pydantic_core.to_json() fails on these strings, raising PydanticSerializationError: UnicodeEncodeError: surrogates not allowed, when it serializes the invoke_model activity result
  • Normalize surrogates via UTF-16 encode/decode before returning from invoke_model: surrogate pairs are combined into proper code points (lossless), and lone surrogates are replaced with U+FFFD (per the Unicode spec)
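A minimal sketch of the UTF-16 round trip described above, written as a standalone helper. The names `normalize_surrogates` and `is_utf8_encodable` are hypothetical and for illustration only; the PR's actual helper may be named and placed differently. The UTF-8 check reproduces the same "surrogates not allowed" error that appears in the pydantic_core traceback, using only the standard library.

```python
def normalize_surrogates(text: str) -> str:
    """Normalize surrogate code points in a Python str (hypothetical helper).

    Encoding with "surrogatepass" writes each surrogate code point as its raw
    16-bit code unit. Decoding those bytes as UTF-16 then merges valid
    high/low pairs into a single code point and, with "replace", turns any
    unpaired surrogate into U+FFFD.
    """
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


def is_utf8_encodable(text: str) -> bool:
    # A str that still contains surrogates cannot be encoded as UTF-8; this
    # raises UnicodeEncodeError with the message "surrogates not allowed".
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


pair = "\ud83d\ude00"  # one emoji split into two surrogate code points
lone = "x\ud800y"      # an unpaired high surrogate between two letters

print(is_utf8_encodable(pair), is_utf8_encodable(lone))  # False False
print(normalize_surrogates(pair) == "\U0001F600")        # True
print(normalize_surrogates(lone) == "x\ufffdy")          # True
```

Text without surrogates (including emoji already stored as a single code point) passes through the round trip unchanged, so the normalization is safe to apply to every `Part.text` unconditionally.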

LLM responses may contain Unicode surrogate characters in Part.text
fields, which cause pydantic_core.to_json() to crash with
PydanticSerializationError when serializing activity results.

Normalize surrogates via UTF-16 encode/decode before returning from
invoke_model: surrogate pairs are combined into proper code points
(lossless), and lone surrogates are replaced with U+FFFD (per the
Unicode spec).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
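A minimal sketch of applying the normalization to a response before it is returned and serialized, under stated assumptions: `Part` and `Content` below are hypothetical dataclass stand-ins, not the ADK or google-genai classes, and `normalize_content` / `to_utf8_json` are illustrative helpers. The standard-library `json` module followed by a UTF-8 encode stands in for pydantic_core.to_json(), which also has to produce UTF-8 output.

```python
import json
from dataclasses import asdict, dataclass, field, replace
from typing import List, Optional


def normalize_surrogates(text: str) -> str:
    """Combine surrogate pairs into code points; replace lone surrogates with U+FFFD."""
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


# Hypothetical stand-ins for the response types; the real classes are
# pydantic models with many more fields.
@dataclass
class Part:
    text: Optional[str] = None


@dataclass
class Content:
    parts: List[Part] = field(default_factory=list)


def normalize_content(content: Content) -> Content:
    """Return a copy of `content` with every non-None Part.text normalized."""
    return Content(
        parts=[
            replace(p, text=normalize_surrogates(p.text)) if p.text is not None else p
            for p in content.parts
        ]
    )


def to_utf8_json(content: Content) -> bytes:
    # Emit UTF-8 JSON bytes; the final encode raises UnicodeEncodeError
    # ("surrogates not allowed") if any string still contains surrogates.
    return json.dumps(asdict(content), ensure_ascii=False).encode("utf-8")


raw = Content(parts=[Part(text="hi \ud83d\ude00"), Part(text="bad \udc80"), Part()])
clean = normalize_content(raw)
```

Calling `to_utf8_json(raw)` on the unnormalized content raises `UnicodeEncodeError`, while `to_utf8_json(clean)` succeeds. The sketch builds a new `Content` rather than mutating the input, so the caller's original object is left untouched.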
@xumaple requested a review from a team as a code owner April 8, 2026 18:34
xumaple and others added 3 commits April 8, 2026 14:41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>