perf: Vectorize fuzzy_lookup_embedding with numpy ops #234

Merged
gvanrossum merged 4 commits into microsoft:main from KRRT7:perf/vectorbase-numpy
Apr 11, 2026

KRRT7 (Contributor) commented Apr 10, 2026

Summary

  • Replace Python-level list comprehension + sort in fuzzy_lookup_embedding with numpy vectorized operations (np.flatnonzero, np.argpartition)
  • Rewrite fuzzy_lookup_embedding_in_subset to compute dot products only for subset indices instead of scanning all vectors + predicate filter
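
As an illustration of the second bullet, here is a minimal sketch of a subset lookup that scores only the requested rows. It is not the code from this PR: the function name `subset_scores`, the `(index, score)` tuple return type, and the assumption that embeddings are stored as a 2-D float array are all hypothetical.

```python
import numpy as np


def subset_scores(vectors, query, subset_indices, min_score=0.0):
    """Score only the rows listed in subset_indices (hypothetical helper).

    vectors: (n, d) array of embeddings; query: (d,) array.
    Returns (index, score) tuples sorted by descending score.
    """
    idx = np.asarray(subset_indices, dtype=np.intp)
    # Fancy indexing copies just the selected rows, so the product below
    # touches len(idx) rows rather than all n.
    scores = vectors[idx] @ query
    keep = np.flatnonzero(scores >= min_score)
    # Sort the surviving entries by descending score.
    order = keep[np.argsort(-scores[keep], kind="stable")]
    return [(int(idx[j]), float(scores[j])) for j in order]
```

The cost scales with the subset size rather than the total number of stored vectors, which is consistent with the "1K of 10K" benchmark row.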

Benchmark (Azure Standard_D2s_v5, 384-dim embeddings)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| fuzzy_lookup_embedding (1K vecs) | 259μs | 64μs | 4.1x |
| fuzzy_lookup_embedding (10K vecs) | 6.1ms | 531μs | 11.5x |
| fuzzy_lookup_embedding_in_subset (1K of 10K) | 3.2ms | 244μs | 13.0x |

Why this matters

These functions run on every fuzzy_lookup call, which is the core search path for conversation queries. At 10K vectors (a long conversation), the Python iteration path takes about 6ms per query. With multiple queries per request (e.g., searching across properties, timestamps, and topics), this adds up.

The numpy path stays in C for the heavy lifting: score filtering via np.flatnonzero, O(n) top-k via np.argpartition, and subset dot products via fancy indexing. ScoredInt objects are only created for the final top-k results.
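
The no-predicate path described above can be sketched roughly as follows. This is an illustrative reconstruction, not the merged implementation: the function name `top_k_above_threshold` and its parameters are assumptions, and plain `(index, score)` tuples stand in for ScoredInt.

```python
import numpy as np


def top_k_above_threshold(vectors, query, max_hits=None, min_score=0.0):
    """Hypothetical sketch: return (index, score) tuples, best first."""
    if max_hits is not None and max_hits <= 0:
        return []
    scores = vectors @ query  # one dot product per stored vector
    # Indices of all vectors whose score meets the threshold.
    candidates = np.flatnonzero(scores >= min_score)
    if max_hits is not None and len(candidates) > max_hits:
        # argpartition places the max_hits largest scores in the last
        # max_hits positions in O(n), without fully sorting.
        part = np.argpartition(scores[candidates], -max_hits)[-max_hits:]
        candidates = candidates[part]
    # Only the (at most max_hits) survivors are fully sorted.
    ranked = candidates[np.argsort(-scores[candidates], kind="stable")]
    # Python objects are built only for the final results.
    return [(int(i), float(scores[i])) for i in ranked]
```

Only the final sort and the tuple construction run over the small result set; everything proportional to the number of vectors stays inside numpy.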

Test plan

  • make format check test passes (470 passed, 12 pre-existing online test failures)
  • Benchmark included: tests/benchmarks/test_benchmark_vectorbase.py

KRRT7 added 4 commits April 10, 2026 13:02
Replace Python-level iteration + sort with numpy operations:
- No-predicate path: np.flatnonzero for score filtering, np.argpartition
  for O(n) top-k selection — avoids building ScoredInt for every vector
- Predicate path: numpy pre-filters by score, applies predicate only to
  candidates above threshold
- Subset lookup: numpy fancy indexing computes dot products only for
  subset indices instead of delegating to full-vector scan with predicate
- Add pytest-benchmark to dev dependency group so CI has the benchmark
  fixture available
- Replace hand-rolled StubEmbeddingModel with create_test_embedding_model()
  to satisfy IEmbeddingModel protocol (fixes pyright)
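
For the predicate path mentioned in the commit notes above, one possible shape is sketched below. It assumes the predicate takes an integer ordinal and returns a bool; the name `lookup_with_predicate`, the tuple return type, and the early exit once `max_hits` is reached are illustrative choices, not details confirmed by the PR.

```python
import numpy as np


def lookup_with_predicate(vectors, query, predicate, max_hits=None, min_score=0.0):
    """Hypothetical sketch: score filter in numpy, predicate in Python."""
    if max_hits is not None and max_hits <= 0:
        return []
    scores = vectors @ query
    # numpy discards everything below the threshold before any Python call.
    candidates = np.flatnonzero(scores >= min_score)
    # Visit candidates best-first so the loop can stop at max_hits.
    ordered = candidates[np.argsort(-scores[candidates], kind="stable")]
    hits = []
    for i in ordered:
        if predicate(int(i)):
            hits.append((int(i), float(scores[i])))
            if max_hits is not None and len(hits) >= max_hits:
                break
    return hits
```

The predicate is never invoked for vectors that fail the score threshold, which is where the savings come from when the predicate itself is expensive.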
@gvanrossum gvanrossum merged commit 8c8f67a into microsoft:main Apr 11, 2026
16 checks passed