Fix #480: Multi Vector > 10000 documents throws scoring error #574
JiwaniZakir wants to merge 1 commit into redis:main
When an index has more than 10,000 documents, Redis FT.AGGREGATE can produce pipeline rows where a VECTOR_RANGE distance attribute is absent, causing "Could not find the value for a parameter name" errors. Wrap each score APPLY expression with if(exists(@distance_i), ..., 0) so missing distances default to 0 instead of crashing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Fixes an aggregation scoring failure in MultiVectorQuery when Redis does not yield a distance_i value for some documents (reported when querying indexes with >10,000 documents), by guarding the distance-to-similarity conversion with exists() and defaulting missing distances to a score of 0.
Changes:
- Wrap per-vector similarity computation in `if(exists(@distance_i), ..., 0)` to prevent `APPLY` from erroring when `@distance_i` is missing.
- Update the unit test’s expected serialized aggregation string to match the new guarded `APPLY` expressions.
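The guarded expression described above can be sketched as a small string builder. This is only an illustration: the helper name `guarded_score_expr` is hypothetical and is not taken from `redisvl`, and the `distance_0`, `distance_1` field naming is an assumption based on the `@distance_i` placeholder used in this PR.

```python
def guarded_score_expr(i: int) -> str:
    """Build the guarded APPLY expression for the i-th vector (hypothetical helper).

    Assumes the distance field for vector i is named ``distance_{i}``.
    """
    field = f"@distance_{i}"
    # If the distance attribute exists on the row, convert it to a similarity
    # with (2 - d)/2; otherwise fall back to a score of 0.
    return f"if(exists({field}), (2 - {field})/2, 0)"


# One expression per vector in a two-vector query.
exprs = [guarded_score_expr(i) for i in range(2)]
for expr in exprs:
    print(expr)
```

Each string would be passed as the expression of one `APPLY` step, replacing the bare `(2 - @distance_i)/2` form.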
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `redisvl/query/aggregate.py` | Adds an `exists()` guard around per-vector distance normalization to avoid aggregation errors when `distance_i` is absent. |
| `tests/unit/test_aggregation_types.py` | Updates the expected query string for `MultiVectorQuery` to reflect the new `if(exists(...))` scoring expression. |
Closes #480
When a `MultiVectorQuery` runs against an index with more than 10,000 documents, Redis does not yield a distance value for every document, causing the bare `(2 - @distance_i)/2` expression in `MultiVectorQuery.__init__` (`redisvl/query/aggregate.py`) to raise "Could not find the value for a parameter name". The fix wraps the expression in each `apply()` call in an `if(exists(@distance_i), (2 - @distance_i)/2, 0)` guard so that documents missing a distance score default to 0 instead of erroring.

- `redisvl/query/aggregate.py`, `MultiVectorQuery.__init__`: replaced the two bare `apply()` expressions with `if(exists(...), ..., 0)` variants.
- `tests/unit/test_aggregation_types.py`, `test_multi_vector_query_string`: updated the expected query string to match the new `if(exists(...))` form.

The fix was verified by updating the unit test in `test_aggregation_types.py`, which asserts the full serialized aggregation string and now passes with the new expression.

This PR was created with AI assistance (Claude). The changes were reviewed by quality gates and a critic model before submission.
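To make the effect of the guard concrete, here is a plain-Python emulation of what the guarded expression computes for one pipeline row. It is a sketch under stated assumptions and does not call Redis or `redisvl`: the function names are hypothetical, rows are modeled as dictionaries, and a missing key stands in for an absent `@distance_i` attribute.

```python
from typing import List, Mapping, Optional


def score_from_distance(distance: Optional[float]) -> float:
    """Mirror if(exists(@distance_i), (2 - @distance_i)/2, 0) for one value."""
    if distance is None:
        # Distance attribute missing from the row: default the score to 0.
        return 0.0
    return (2 - distance) / 2


def row_scores(row: Mapping[str, float], num_vectors: int) -> List[float]:
    """Compute one score per vector; absent distance_{i} keys yield 0.0."""
    return [score_from_distance(row.get(f"distance_{i}")) for i in range(num_vectors)]


# Hypothetical rows: the second one has no distance for vector 1.
complete_row = {"distance_0": 0.5, "distance_1": 1.0}
partial_row = {"distance_0": 0.25}

print(row_scores(complete_row, 2))  # [0.75, 0.5]
print(row_scores(partial_row, 2))   # [0.875, 0.0]
```

One consequence visible in this sketch: a missing distance produces the same score as a distance of 2, since `(2 - 2)/2` is also 0.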
Note
Low Risk
Low risk: small query-string change that only affects how missing `@distance_i` values are handled, plus a matching unit-test update.

Overview
Fixes `MultiVectorQuery` scoring failures when Redis does not return `@distance_i` for some documents by wrapping each per-vector score calculation in `if(exists(@distance_i), (2 - @distance_i)/2, 0)` so missing distances default to 0 instead of erroring.

Updates the unit test asserting the serialized aggregation query string to match the new guarded `APPLY` expressions.

Reviewed by Cursor Bugbot for commit c1e69dc. Bugbot is set up for automated code reviews on this repo.