fix(langchain): handle prompt_tokens_details as dict in _parse_usage_model#1614
Open
tanyelai wants to merge 1 commit into langfuse:main from
Conversation
When LiteLLM proxy or OpenAI returns prompt_tokens_details as a dict
(e.g. {"cached_tokens": 12000}), _parse_usage_model only handled the
Vertex AI list format and silently dropped the dict via the
isinstance(v, int) filter on line 1318.
This caused cached token counts to be lost and input costs to be
inflated in Langfuse, since prompt_tokens was never adjusted for
cache hits.
Add dict handling for prompt_tokens_details mirroring the existing
input_token_details pattern: flatten keys as input_{key}, subtract
from input total. Existing Vertex AI list handling is preserved
via elif.
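The dict branch described above can be sketched as standalone code (a hypothetical illustration, not the SDK's actual implementation; the function name `parse_prompt_tokens_details` and the flat usage-dict shape are assumptions):

```python
def parse_prompt_tokens_details(usage_model: dict) -> dict:
    """Sketch of the fix: flatten prompt_tokens_details keys as
    input_{key} and subtract each value from the input total,
    mirroring the existing input_token_details pattern."""
    details = usage_model.get("prompt_tokens_details")

    if isinstance(details, dict):
        for key, value in details.items():
            if not isinstance(value, int):
                continue  # non-int values are skipped, as in the existing filter
            usage_model[f"input_{key}"] = value
            # clamp so cached tokens can never drive the input total below zero
            usage_model["input"] = max(0, usage_model.get("input", 0) - value)
        usage_model.pop("prompt_tokens_details", None)
    elif isinstance(details, list):
        # existing Vertex AI list handling stays on this branch, unchanged
        pass

    return usage_model
```

For example, `{"input": 20000, "prompt_tokens_details": {"cached_tokens": 12000}}` would become `{"input": 8000, "input_cached_tokens": 12000}`, so the 12,000 cache-hit tokens are no longer billed at the full input rate.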
Closes langfuse/langfuse#13024
Why
`_parse_usage_model` handles `input_token_details` as a dict (flattening cache keys and subtracting from `input`), but `prompt_tokens_details`, the OpenAI/LiteLLM field name for the same data, is only handled as a list (Vertex AI format). When it arrives as a dict (e.g. `{"cached_tokens": 12000}` from LiteLLM proxy or OpenAI), the SDK skips it and the `isinstance(v, int)` filter on line 1318 silently drops it. This causes:

- `prompt_tokens_details.cached_tokens` is dropped entirely
- `input` includes cached tokens priced at the full rate instead of being reduced

Reported in langfuse/langfuse#13024.
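The pre-fix failure mode can be reproduced in miniature (a simplified sketch; the real `_parse_usage_model` is more involved, but the final int filter behaves as described above):

```python
# OpenAI/LiteLLM-style usage payload: prompt_tokens_details is a dict.
usage = {
    "input": 20000,
    "prompt_tokens_details": {"cached_tokens": 12000},
}

# With no dict branch, the only thing that touches this key is the
# final isinstance(v, int) filter, which drops the dict wholesale.
cleaned = {k: v for k, v in usage.items() if isinstance(v, int)}

print(cleaned)  # {'input': 20000} -- cached_tokens gone, input never reduced
```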
What changed
Added dict handling for `prompt_tokens_details` before the existing Vertex AI list handling, mirroring the `input_token_details` pattern (lines 1233-1244):

- Keys are flattened as `input_{key}` (e.g. `cached_tokens` → `input_cached_tokens`)
- Values are subtracted from the `input` total
- The Vertex AI list handling is reached via `elif`, so there is no behavioral change for the list format

Test plan
4 new test cases in `tests/test_parse_usage_model.py`:

- `test_prompt_tokens_details_dict_cached_tokens`: dict with `cached_tokens`; verifies `input` is reduced and `input_cached_tokens` is set
- `test_prompt_tokens_details_dict_with_cache_creation`: dict plus top-level `cache_creation_input_tokens`; verifies both fields are preserved
- `test_prompt_tokens_details_list_vertex_ai`: list format; verifies existing Vertex AI behavior is unchanged
- `test_prompt_tokens_details_dict_empty`: empty dict; verifies no crash and `input` unchanged

All 6 tests pass (2 existing + 4 new).
Disclaimer: Experimental PR review
Greptile Summary
This PR fixes a silent data-loss bug in `_parse_usage_model` where `prompt_tokens_details` arriving as a dict (OpenAI/LiteLLM format, e.g. `{"cached_tokens": 12000}`) was unhandled, causing cached tokens to be omitted from the usage model and inflating reported input costs.

- Previously, the function only handled `prompt_tokens_details` as a Vertex AI list; a dict value fell through with no processing.
- An `if isinstance(..., dict)` branch is added before the existing `elif isinstance(..., list)` branch, mirroring the established `input_token_details` pattern: values are flattened as `input_{key}`, non-int values are skipped, and each value is subtracted from `input` with `max(0, ...)` clamping.
- The `elif` ensures the list path is unchanged when `prompt_tokens_details` is a list.
- The new branch lacks the `priority`/`priority_*` key skip guard present in the `input_token_details` handler; while no known provider sends priority data via `prompt_tokens_details`, adding the guard would keep both handlers symmetric.

Confidence Score: 5/5
Safe to merge — the fix is correct, well-tested, and introduces no regressions.
The change is a targeted, well-scoped bug fix with comprehensive tests covering all new and existing paths. The only finding is a P2 style suggestion (missing priority-key guard for forward-safety), which does not affect current behavior or any known provider.
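The reviewer's forward-safety suggestion amounts to one extra guard in the new branch. A sketch (hypothetical; the helper name `flatten_details` and the exact guard condition are assumptions modeled on the `input_token_details` handler described above):

```python
def flatten_details(details: dict, usage_model: dict) -> dict:
    """Flatten detail keys as input_{key}, skipping priority keys,
    mirroring the guard the input_token_details handler already has."""
    for key, value in details.items():
        if key == "priority" or key.startswith("priority_"):
            continue  # forward-safety: ignore priority accounting keys
        if not isinstance(value, int):
            continue  # keep the existing non-int skip
        usage_model[f"input_{key}"] = value
        usage_model["input"] = max(0, usage_model.get("input", 0) - value)
    return usage_model
```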
No files require special attention.
Vulnerabilities
No security concerns identified. The change only processes dict keys from an already-trusted usage model object; no external input is executed or exposed.
Important Files Changed
Added dict handling for `prompt_tokens_details` before the existing Vertex AI list handling; the logic correctly mirrors the `input_token_details` pattern with an int guard and `max(0, ...)` clamping, with one minor inconsistency: no priority-key skip guard.

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_parse_usage_model called] --> B{input_token_details\nin usage_model?}
    B -- yes --> C[Flatten keys as input_key\nSkip priority keys\nSubtract from input]
    B -- no --> D{output_token_details\nin usage_model?}
    C --> D
    D -- yes --> E[Flatten keys as output_key\nSkip priority keys\nSubtract from output]
    D -- no --> F{prompt_tokens_details\nin usage_model?}
    E --> F
    F -- dict --> G["NEW: Flatten keys as input_key\nSkip non-int values\nSubtract from input\nmax(0, ...)"]
    F -- list --> H["EXISTING (Vertex AI): Extract modality+token_count\nStore as input_modality_X\nSubtract from input"]
    F -- absent/other --> I[Skip]
    G --> J[Filter: keep only int values]
    H --> J
    I --> J
    J --> K[Return cleaned usage_model or None]
```

Reviews (1): Last reviewed commit: "fix(langchain): handle prompt_tokens_det..."