fix(langchain): handle prompt_tokens_details as dict in _parse_usage_model#1614
Open
tanyelai wants to merge 1 commit into langfuse:main from
Conversation
When LiteLLM proxy or OpenAI returns prompt_tokens_details as a dict
(e.g. {"cached_tokens": 12000}), _parse_usage_model only handled the
Vertex AI list format and silently dropped the dict via the
isinstance(v, int) filter on line 1318.
This caused cached token counts to be lost and input costs to be
inflated in Langfuse, since prompt_tokens was never adjusted for
cache hits.
Add dict handling for prompt_tokens_details mirroring the existing
input_token_details pattern: flatten keys as input_{key}, subtract
from input total. Existing Vertex AI list handling is preserved
via elif.
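The dict branch described above can be sketched as standalone code (a hypothetical illustration, not the SDK's actual implementation; the function name `parse_prompt_tokens_details` and the flat usage-dict shape are assumptions):

```python
def parse_prompt_tokens_details(usage_model: dict) -> dict:
    """Sketch of the fix: flatten prompt_tokens_details keys as
    input_{key} and subtract each value from the input total,
    mirroring the existing input_token_details pattern."""
    details = usage_model.get("prompt_tokens_details")

    if isinstance(details, dict):
        for key, value in details.items():
            if not isinstance(value, int):
                continue  # non-int values are skipped, as in the existing filter
            usage_model[f"input_{key}"] = value
            # clamp so cached tokens can never drive the input total below zero
            usage_model["input"] = max(0, usage_model.get("input", 0) - value)
        usage_model.pop("prompt_tokens_details", None)
    elif isinstance(details, list):
        # existing Vertex AI list handling stays on this branch, unchanged
        pass

    return usage_model
```

For example, `{"input": 20000, "prompt_tokens_details": {"cached_tokens": 12000}}` would become `{"input": 8000, "input_cached_tokens": 12000}`, so the 12,000 cache-hit tokens are no longer billed at the full input rate.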
Closes langfuse/langfuse#13024
Why
`_parse_usage_model` handles `input_token_details` as a dict (flattening cache keys and subtracting from `input`), but `prompt_tokens_details`, the OpenAI/LiteLLM field name for the same data, is only handled as a list (Vertex AI format). When it arrives as a dict (e.g. `{"cached_tokens": 12000}` from LiteLLM proxy or OpenAI), the SDK skips it and the `isinstance(v, int)` filter on line 1318 silently drops it. This causes:

- `prompt_tokens_details.cached_tokens` is dropped entirely
- `input` includes cached tokens priced at the full rate instead of being reduced

Reported in langfuse/langfuse#13024.
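The pre-fix failure mode can be reproduced in miniature (a simplified sketch; the real `_parse_usage_model` is more involved, but the final int filter behaves as described above):

```python
# OpenAI/LiteLLM-style usage payload: prompt_tokens_details is a dict.
usage = {
    "input": 20000,
    "prompt_tokens_details": {"cached_tokens": 12000},
}

# With no dict branch, the only thing that touches this key is the
# final isinstance(v, int) filter, which drops the dict wholesale.
cleaned = {k: v for k, v in usage.items() if isinstance(v, int)}

print(cleaned)  # {'input': 20000} -- cached_tokens gone, input never reduced
```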
What changed
Added dict handling for `prompt_tokens_details` before the existing Vertex AI list handling, mirroring the `input_token_details` pattern (lines 1233-1244):

- Keys are flattened as `input_{key}` (e.g. `cached_tokens` → `input_cached_tokens`)
- Values are subtracted from the `input` total
- The Vertex AI list handling is reached via `elif`, so there is no behavioral change for the list format

Test plan
4 new test cases in `tests/test_parse_usage_model.py`:

- `test_prompt_tokens_details_dict_cached_tokens`: dict with `cached_tokens`; verifies `input` is reduced and `input_cached_tokens` is set
- `test_prompt_tokens_details_dict_with_cache_creation`: dict plus top-level `cache_creation_input_tokens`; verifies both fields are preserved
- `test_prompt_tokens_details_list_vertex_ai`: list format; verifies existing Vertex AI behavior is unchanged
- `test_prompt_tokens_details_dict_empty`: empty dict; verifies no crash and `input` unchanged

All 6 tests pass (2 existing + 4 new).
Disclaimer: Experimental PR review
Greptile Summary
This PR fixes a silent data-loss bug in `_parse_usage_model` where `prompt_tokens_details` arriving as a dict (OpenAI/LiteLLM format, e.g. `{"cached_tokens": 12000}`) was unhandled, causing cached tokens to be omitted from the usage model and inflating reported input costs.

- Previously, the function only handled `prompt_tokens_details` as a Vertex AI list; a dict value fell through with no processing.
- An `if isinstance(..., dict)` branch is added before the existing `elif isinstance(..., list)` branch, mirroring the established `input_token_details` pattern: values are flattened as `input_{key}`, non-int values are skipped, and each value is subtracted from `input` with `max(0, ...)` clamping.
- The `elif` ensures the list path is unchanged when `prompt_tokens_details` is a list.
- The new branch lacks the `priority`/`priority_*` key skip guard present in the `input_token_details` handler; while no known provider sends priority data via `prompt_tokens_details`, adding the guard would keep both handlers symmetric.

Confidence Score: 5/5
Safe to merge — the fix is correct, well-tested, and introduces no regressions.
The change is a targeted, well-scoped bug fix with comprehensive tests covering all new and existing paths. The only finding is a P2 style suggestion (missing priority-key guard for forward-safety), which does not affect current behavior or any known provider.
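The reviewer's forward-safety suggestion amounts to one extra guard in the new branch. A sketch (hypothetical; the helper name `flatten_details` and the exact guard condition are assumptions modeled on the `input_token_details` handler described above):

```python
def flatten_details(details: dict, usage_model: dict) -> dict:
    """Flatten detail keys as input_{key}, skipping priority keys,
    mirroring the guard the input_token_details handler already has."""
    for key, value in details.items():
        if key == "priority" or key.startswith("priority_"):
            continue  # forward-safety: ignore priority accounting keys
        if not isinstance(value, int):
            continue  # keep the existing non-int skip
        usage_model[f"input_{key}"] = value
        usage_model["input"] = max(0, usage_model.get("input", 0) - value)
    return usage_model
```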
No files require special attention.
Vulnerabilities
No security concerns identified. The change only processes dict keys from an already-trusted usage model object; no external input is executed or exposed.
Important Files Changed
Added dict handling for `prompt_tokens_details` before the existing Vertex AI list handling; the logic correctly mirrors the `input_token_details` pattern with an int guard and `max(0, ...)` clamping, with one minor inconsistency: no priority-key skip guard.

Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_parse_usage_model called] --> B{input_token_details\nin usage_model?}
    B -- yes --> C[Flatten keys as input_key\nSkip priority keys\nSubtract from input]
    B -- no --> D{output_token_details\nin usage_model?}
    C --> D
    D -- yes --> E[Flatten keys as output_key\nSkip priority keys\nSubtract from output]
    D -- no --> F{prompt_tokens_details\nin usage_model?}
    E --> F
    F -- dict --> G["NEW: Flatten keys as input_key\nSkip non-int values\nSubtract from input\nmax(0, ...)"]
    F -- list --> H["EXISTING (Vertex AI): Extract modality+token_count\nStore as input_modality_X\nSubtract from input"]
    F -- absent/other --> I[Skip]
    G --> J[Filter: keep only int values]
    H --> J
    I --> J
    J --> K[Return cleaned usage_model or None]
```

Reviews (1): Last reviewed commit: "fix(langchain): handle prompt_tokens_det..."