fix(langchain): handle prompt_tokens_details as dict in _parse_usage_model#1614

Open
tanyelai wants to merge 1 commit into langfuse:main from tanyelai:fix/prompt-tokens-details-dict-handling

Conversation


@tanyelai tanyelai commented Apr 8, 2026

Why

_parse_usage_model handles input_token_details as a dict (flattening cache keys and subtracting from input), but prompt_tokens_details — the OpenAI/LiteLLM field name for the same data — is only handled as a list (Vertex AI format). When it arrives as a dict (e.g. {"cached_tokens": 12000} from LiteLLM proxy or OpenAI), the SDK skips it and the isinstance(v, int) filter on line 1318 silently drops it.

This causes:

  • Cache read tokens lost: prompt_tokens_details.cached_tokens is dropped entirely
  • Inflated input cost: input includes cached tokens priced at the full rate instead of being reduced

Reported in langfuse/langfuse#13024.
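To make the failure mode concrete, here is a minimal, self-contained sketch. This is not the SDK's actual code; the payload shape and the cleanup filter are assumptions based on the description above, but they show how an int-only filter silently discards a dict value:

```python
# Hypothetical usage payload in the OpenAI / LiteLLM dict format.
usage_model = {
    "prompt_tokens": 20000,
    "completion_tokens": 500,
    "prompt_tokens_details": {"cached_tokens": 12000},  # dict, not a list
}

# Equivalent of the isinstance(v, int) filter described above: only
# int-valued entries survive, so the unflattened dict is dropped whole.
cleaned = {k: v for k, v in usage_model.items() if isinstance(v, int)}
print(cleaned)
# {'prompt_tokens': 20000, 'completion_tokens': 500}  (cached_tokens is gone)
```

Nothing errors, which is why the data loss goes unnoticed until costs look wrong.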

What changed

Added dict handling for prompt_tokens_details before the existing Vertex AI list handling, mirroring the input_token_details pattern (lines 1233-1244):

  • Dict values are flattened as input_{key} (e.g. cached_tokens → input_cached_tokens)
  • Each value is subtracted from input total
  • Non-int values are skipped
  • Existing Vertex AI list handling is preserved via elif — no behavioral change for list format
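The new branch can be sketched as follows. This is a standalone approximation, not the SDK code: the key names ("input", "prompt_tokens_details") and the helper name are assumptions chosen to match the bullets above.

```python
def flatten_prompt_tokens_details(usage_model: dict) -> dict:
    """Sketch of the dict branch added before the Vertex AI list branch."""
    details = usage_model.pop("prompt_tokens_details", None)
    if isinstance(details, dict):
        # OpenAI / LiteLLM form, e.g. {"cached_tokens": 12000}
        for key, value in details.items():
            if not isinstance(value, int):
                continue  # non-int values are skipped
            usage_model[f"input_{key}"] = value
            # subtract from the input total, clamped at zero
            usage_model["input"] = max(0, usage_model.get("input", 0) - value)
    elif isinstance(details, list):
        # existing Vertex AI list handling stays here, unchanged
        pass
    return usage_model

print(flatten_prompt_tokens_details(
    {"input": 20000, "prompt_tokens_details": {"cached_tokens": 12000}}
))
# {'input': 8000, 'input_cached_tokens': 12000}
```

Because the list case is reached only via elif, a list-valued payload never enters the new code path.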

Test plan

4 new test cases in tests/test_parse_usage_model.py:

  • test_prompt_tokens_details_dict_cached_tokens — dict with cached_tokens, verifies input is reduced and input_cached_tokens is set
  • test_prompt_tokens_details_dict_with_cache_creation — dict + top-level cache_creation_input_tokens, verifies both fields are preserved
  • test_prompt_tokens_details_list_vertex_ai — list format, verifies existing Vertex AI behavior is unchanged
  • test_prompt_tokens_details_dict_empty — empty dict, verifies no crash and input unchanged

All 6 tests pass (2 existing + 4 new):

tests/test_parse_usage_model.py  6 passed in 1.08s

Disclaimer: Experimental PR review

Greptile Summary

This PR fixes a silent data-loss bug in _parse_usage_model where prompt_tokens_details arriving as a dict (OpenAI / LiteLLM format, e.g. {"cached_tokens": 12000}) was unhandled, causing cached tokens to be omitted from the usage model and inflating reported input costs.

  • Root cause: the existing code only handled prompt_tokens_details as a Vertex AI list; a dict value fell through with no processing.
  • Fix: a new if isinstance(..., dict) branch is added before the existing elif isinstance(..., list) branch, mirroring the established input_token_details pattern — values are flattened as input_{key}, non-int values are skipped, and each value is subtracted from input with max(0, ...) clamping.
  • Vertex AI regression risk: zero — the elif ensures the list path is unchanged when prompt_tokens_details is a list.
  • Tests: 4 new targeted cases (dict, dict+cache_creation, Vertex AI list, empty dict) plus the 2 pre-existing tests all pass.
  • Minor inconsistency: the new dict handler omits the priority / priority_* key skip guard present in the input_token_details handler; while no known provider sends priority data via prompt_tokens_details, adding the guard would keep both handlers symmetric.
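The symmetry suggestion in the last bullet could be sketched like this. The guard key names are assumptions modeled on the input_token_details handler described above, not code from the repository:

```python
def flatten_details(details: dict, prefix: str = "input") -> dict:
    """Flatten token-detail keys, skipping priority keys for symmetry."""
    flattened = {}
    for key, value in details.items():
        # mirror the priority-key guard from the input_token_details handler
        if key == "priority" or key.startswith("priority_"):
            continue
        if not isinstance(value, int):
            continue
        flattened[f"{prefix}_{key}"] = value
    return flattened

print(flatten_details({"cached_tokens": 100, "priority_tokens": 5}))
# {'input_cached_tokens': 100}
```

Since no known provider sends priority data under prompt_tokens_details, this is purely forward-safety, which is why the review rates it P2.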

Confidence Score: 5/5

Safe to merge — the fix is correct, well-tested, and introduces no regressions.

The change is a targeted, well-scoped bug fix with comprehensive tests covering all new and existing paths. The only finding is a P2 style suggestion (missing priority-key guard for forward-safety), which does not affect current behavior or any known provider.

No files require special attention.

Vulnerabilities

No security concerns identified. The change only processes dict keys from an already-trusted usage model object; no external input is executed or exposed.

Important Files Changed

Filename Overview
langfuse/langchain/CallbackHandler.py Adds dict-typed handling for prompt_tokens_details before existing Vertex AI list handling; logic correctly mirrors input_token_details pattern with int guard and max(0, ...) clamping, with one minor inconsistency: no priority-key skip guard.
tests/test_parse_usage_model.py Adds 4 well-targeted test cases covering dict-with-cached-tokens, dict-with-top-level-cache-creation, Vertex AI list (regression), and empty-dict edge cases; all assertions are correct against the updated implementation.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_parse_usage_model called] --> B{input_token_details\nin usage_model?}
    B -- yes --> C[Flatten keys as input_key\nSkip priority keys\nSubtract from input]
    B -- no --> D{output_token_details\nin usage_model?}
    C --> D
    D -- yes --> E[Flatten keys as output_key\nSkip priority keys\nSubtract from output]
    D -- no --> F{prompt_tokens_details\nin usage_model?}
    E --> F
    F -- dict --> G["NEW: Flatten keys as input_key\nSkip non-int values\nSubtract from input\nmax(0, ...)"]
    F -- list --> H["EXISTING (Vertex AI): Extract modality+token_count\nStore as input_modality_X\nSubtract from input"]
    F -- absent/other --> I[Skip]
    G --> J[Filter: keep only int values]
    H --> J
    I --> J
    J --> K[Return cleaned usage_model or None]


…model

When LiteLLM proxy or OpenAI returns prompt_tokens_details as a dict
(e.g. {"cached_tokens": 12000}), _parse_usage_model only handled the
Vertex AI list format and silently dropped the dict via the
isinstance(v, int) filter on line 1318.

This caused cached token counts to be lost and input costs to be
inflated in Langfuse, since prompt_tokens was never adjusted for
cache hits.

Add dict handling for prompt_tokens_details mirroring the existing
input_token_details pattern: flatten keys as input_{key}, subtract
from input total. Existing Vertex AI list handling is preserved
via elif.

Closes langfuse/langfuse#13024

@claude claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.


CLAassistant commented Apr 8, 2026

CLA assistant check
All committers have signed the CLA.

