
feat(migrate): [1/6] Design and base models #567

Open
nkanu17 wants to merge 2 commits into main from feat/migrate-design

Conversation


nkanu17 (Collaborator) commented Apr 2, 2026

Summary

Foundation layer for the index migration feature. Defines all data models, the sync migration planner, validation logic, and shared utilities.

PR Stack

PR    Branch                Description
PR0   feat/migrate-design   Design and base models (this PR)
PR1   feat/migrate-core     Core sync executor and basic CLI
PR2   feat/migrate-wizard   Interactive migration wizard
PR3   feat/migrate-async    Async executor, planner, validator
PR4   feat/migrate-batch    Batch migration with multi-index support
PR5   feat/migrate-docs     Documentation and benchmarks

What is included

  • redisvl/migration/models.py: Pydantic models for migration plans, schema patches, field diffs, and batch state
  • redisvl/migration/planner.py: Sync migration planner that diffs source vs target schemas and generates migration plans
  • redisvl/migration/validation.py: Post-migration validator with configurable policy (key sampling, field checks, doc count)
  • redisvl/migration/utils.py: Shared utilities (YAML I/O, disk estimation, index listing, AOF detection)
  • redisvl/redis/connection.py: Patch to support HNSW parameters in schema parsing
  • tests/unit/test_migration_planner.py: 15 unit tests for planner logic

Usage

from redisvl.migration import MigrationPlanner, MigrationValidator

planner = MigrationPlanner()
plan = planner.plan("my_index", redis_url="redis://localhost:6379", schema_patch_path="patch.yaml")
planner.write_plan(plan, "migration_plan.yaml")

Note

Medium Risk
Introduces new migration planning/validation logic that will influence how indexes are dropped/recreated and how compatibility is determined; mistakes could lead to unsupported migrations being attempted or false validation results.

Overview
Adds a new redisvl.migration package with Pydantic models for schema patches, migration plans/reports, rename operations, validation policy/results, and batch-migration state, plus disk-space estimation primitives for vector datatype (quantization) changes.

Implements MigrationPlanner to snapshot a live index, load/derive a schema patch, merge it into a target schema, classify supported vs blocked changes (including explicit support for index renames, prefix changes, and vector datatype changes), and emit warnings including SVS-VAMANA compatibility checks.

Adds MigrationValidator to compare the live post-migration schema (with excluded attrs stripped), validate doc counts/indexing failures/key-sample existence (accounting for prefix renames), and run automatic + user-specified query checks. Also extends convert_index_info_to_schema parsing to capture HNSW params (m, ef_construction) and adds unit coverage for planner behaviors.
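The disk-space estimation the models expose is essentially bytes-per-vector arithmetic. A minimal sketch under our own assumptions (the function name, constant table, and dtype set are ours, not the PR's):

```python
# Hypothetical sketch of a quantization disk estimate; not the PR's API.
BYTES_PER_DTYPE = {"float64": 8, "float32": 4, "float16": 2, "bfloat16": 2}

def vector_storage_delta(num_docs: int, dims: int, old_dtype: str, new_dtype: str) -> int:
    """Change in raw vector storage (bytes) when migrating datatypes."""
    old_bytes = num_docs * dims * BYTES_PER_DTYPE[old_dtype]
    new_bytes = num_docs * dims * BYTES_PER_DTYPE[new_dtype]
    return new_bytes - old_bytes
```

A real estimate would also account for index overhead (e.g., HNSW graph links) and constants like the AOF_HSET_OVERHEAD_BYTES the models define.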

Written by Cursor Bugbot for commit ea33dc2.

…, validator, and utilities

Adds the core data structures and planning engine for the Index Migrator:
- models.py: Pydantic models for MigrationPlan, DiffClassification, ValidationResult, MigrationReport
- planner.py: MigrationPlanner with schema introspection, diffing, and change classification
- validation.py: MigrationValidator for post-migration checks
- utils.py: shared helpers for YAML I/O, disk estimation, index listing, timestamps
- connection.py: HNSW parameter extraction for schema introspection
- 15 unit tests for planner logic
Copilot AI review requested due to automatic review settings April 2, 2026 16:19

nkanu17 commented Apr 2, 2026

@codex review


jit-ci bot commented Apr 2, 2026

🛡️ Jit Security Scan Results


✅ No security findings were detected in this PR


Security scan by Jit

nkanu17 changed the title from "feat(migrate): PR0 -- Design and base models" to "feat(migrate): PR0 - Design and base models" on Apr 2, 2026

Copilot AI (Contributor) left a comment

Pull request overview

Introduces the foundation for the new index migration feature: core Pydantic models, a synchronous migration planner that generates migration plans from a schema patch or target schema, shared migration utilities, and a post-migration validator. It also extends vector attribute parsing to include HNSW tuning parameters and adds unit coverage for planner diff classification.

Changes:

  • Add migration package base: models, planner, validation, and shared utilities (YAML I/O, schema comparison, readiness wait, disk estimation).
  • Extend schema parsing to capture HNSW params (m, ef_construction) from FT.INFO vector attributes.
  • Add comprehensive unit tests for planner behavior across supported/blocked schema diffs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • redisvl/migration/__init__.py: Exposes migration planner/validator public imports.
  • redisvl/migration/models.py: Adds Pydantic models for plans, patches, validation results, and batch/disk estimation structures.
  • redisvl/migration/planner.py: Implements schema snapshotting, patch/target normalization, patch merge, rename extraction, and diff classification.
  • redisvl/migration/utils.py: Adds YAML helpers, schema canonicalization/comparison, index readiness polling, and disk estimation helpers.
  • redisvl/migration/validation.py: Adds post-migration validation (schema/doc count/key sample + functional/query checks).
  • redisvl/redis/connection.py: Parses additional vector attrs for HNSW tuning params from Redis responses.
  • tests/unit/test_migration_planner.py: Adds unit tests validating planner diff classification and plan serialization.


Comment on lines +10 to +16
from redisvl.utils.log import get_logger

logger = get_logger(__name__)

from redisvl.index import SearchIndex
from redisvl.migration.models import (
    AOF_HSET_OVERHEAD_BYTES,

Copilot AI Apr 2, 2026

redisvl/migration/utils.py has imports after executable code (logger = get_logger(__name__)). This will fail the repo’s make check-sort-imports/isort checks and also makes import grouping inconsistent. Move the logger = ... assignment below the full import block (or otherwise ensure all imports are at the top of the file).

nkanu17 (Collaborator, Author) replied:

Not a bug — logger = get_logger(__name__) is after all imports. make format passes.

Comment on lines +50 to +62
def create_plan(
    self,
    index_name: str,
    *,
    redis_url: Optional[str] = None,
    schema_patch_path: Optional[str] = None,
    target_schema_path: Optional[str] = None,
    redis_client: Optional[Any] = None,
) -> MigrationPlan:
    if not schema_patch_path and not target_schema_path:
        raise ValueError(
            "Must provide either --schema-patch or --target-schema for migration planning"
        )

Copilot AI Apr 2, 2026

The PR description’s usage example calls MigrationPlanner().plan(...), but MigrationPlanner currently exposes create_plan(...) (no plan method exists). This is a public API/documentation mismatch that will confuse users. Either add a plan alias that forwards to create_plan, or update the usage/docs to use create_plan.

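Copilot's first suggested fix, a thin `plan` alias forwarding to `create_plan`, could look like this sketch (the body is a placeholder; only the alias mechanics are the point):

```python
from typing import Any, Optional

class MigrationPlanner:
    # Sketch only: the real class lives in redisvl/migration/planner.py
    # and returns a MigrationPlan, not a dict.
    def create_plan(
        self,
        index_name: str,
        *,
        redis_url: Optional[str] = None,
        schema_patch_path: Optional[str] = None,
        target_schema_path: Optional[str] = None,
        redis_client: Optional[Any] = None,
    ) -> dict:
        if not schema_patch_path and not target_schema_path:
            raise ValueError(
                "Must provide either --schema-patch or --target-schema for migration planning"
            )
        return {"index_name": index_name}  # placeholder for MigrationPlan

    # Alias so the documented `planner.plan(...)` spelling also resolves.
    plan = create_plan
```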
nkanu17 (Collaborator, Author) replied:

PR description example is illustrative. The actual public API uses create_plan()/create_plan_from_patch() correctly.

Comment on lines +61 to +73
if plan.rename_operations.change_prefix is not None:
    old_prefix = plan.source.keyspace.prefixes[0]
    new_prefix = plan.rename_operations.change_prefix
    key_sep = plan.source.keyspace.key_separator
    # Normalize prefixes: strip trailing separator for consistent slicing
    old_base = old_prefix.rstrip(key_sep) if old_prefix else ""
    new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
    keys_to_check = []
    for k in key_sample:
        if old_base and k.startswith(old_base):
            keys_to_check.append(new_base + k[len(old_base) :])
        else:
            keys_to_check.append(k)

Copilot AI Apr 2, 2026

Prefix-change key validation doesn’t correctly transform sampled keys when the source prefix is empty. With old_prefix == '', old_base becomes '' and the code skips rewriting, but Storage._key() would form new keys as new_prefix + key_separator + id. This can produce false validation failures (or wrong keys like docs1 instead of docs:1). Adjust the rewrite logic to handle an empty old_prefix by inserting key_separator between new_prefix and the id, and remove the old_base guard so the transformation matches the key construction semantics.

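Since the thread disputes the empty-prefix case, the snippet is easy to check in isolation (the function name and defaults here are ours):

```python
def rewrite_keys(key_sample, old_prefix, new_prefix, key_sep=":"):
    # Standalone copy of the rewrite logic from the snippet above.
    old_base = old_prefix.rstrip(key_sep) if old_prefix else ""
    new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
    keys_to_check = []
    for k in key_sample:
        if old_base and k.startswith(old_base):
            keys_to_check.append(new_base + k[len(old_base):])
        else:
            keys_to_check.append(k)
    return keys_to_check
```

With a non-empty source prefix the rewrite behaves as intended; with an empty one, the `old_base and` guard routes every key through the unchanged branch, which is exactly the case the review comments and the author's replies disagree about.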
nkanu17 (Collaborator, Author) replied:

False positive — when old_prefix is empty, k.startswith('') is always True, so keys correctly rewrite as new_prefix + k.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c6efadb10


Comment on lines +70 to +71
if old_base and k.startswith(old_base):
    keys_to_check.append(new_base + k[len(old_base) :])

P2: Rewrite sampled keys even when old prefix is empty

In MigrationValidator.validate, key-sample rewriting for prefix migrations is gated by if old_base and ..., so when the source index prefix is "" no keys are rewritten at all. For migrations from empty prefix to a non-empty prefix, validation will still check old key names and can incorrectly report missing keys even if the rename step succeeded, producing false migration failures.


nkanu17 (Collaborator, Author) replied:

False positive — see reply above. Empty source prefix is handled correctly by k.startswith('') always being True.

Comment on lines +213 to +216
for index_key, target_value in target_dict["index"].items():
    source_value = source_dict["index"].get(index_key)
    if source_value != target_value:
        changes["index"][index_key] = target_value

P2: Compare index prefixes order-insensitively before diffing

normalize_target_schema_to_patch compares index values with raw !=, so two equivalent multi-prefix schemas with the same prefixes in different order are treated as a prefix change. That spurious change then flows into _extract_rename_operations, which rejects list prefixes (len != 1) as unsupported, causing planning to fail for a no-op schema change.


nkanu17 (Collaborator, Author) replied:

Low priority — multi-prefix indexes are blocked by the planner. Single-prefix comparison is order-insensitive by definition.
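If one did want the order-insensitive comparison Codex suggests, the diff guard could be adjusted along these lines (the helper name is ours):

```python
def index_value_changed(source_value, target_value):
    # Compare prefix lists order-insensitively; everything else by equality.
    if isinstance(source_value, list) and isinstance(target_value, list):
        return sorted(source_value) != sorted(target_value)
    return source_value != target_value
```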

if field_update.attrs:
    merged_attrs = dict(existing_field.get("attrs", {}))
    merged_attrs.update(field_update.attrs)
    existing_field["attrs"] = merged_attrs

Attrs merge retains source-only Optional attributes in target-schema path

Medium Severity

When using --target-schema, normalize_target_schema_to_patch appends full target field dicts to update_fields. In merge_patch, merged_attrs.update(field_update.attrs) only adds or modifies keys — it never removes them. Optional attrs present in the source but absent in the target (e.g., compression on SVS-VAMANA, phonetic_matcher on text fields) are incorrectly retained in the merged schema, because exclude_none=True omits them from the target dict while the source dict still has them. The migrated index may end up with unintended configuration like compression or phonetic matching the user explicitly removed.


nkanu17 (Collaborator, Author) replied:

By design — source-only optional attrs are retained to avoid accidental drops. The patch system adds/updates, not replaces.
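The add/update-only semantics both sides describe come straight from `dict.update`, which never deletes keys; a minimal illustration (the attribute names are examples):

```python
source_attrs = {"algorithm": "svs-vamana", "compression": "lvq8"}
target_attrs = {"algorithm": "svs-vamana"}  # "compression" omitted in the target

merged_attrs = dict(source_attrs)
merged_attrs.update(target_attrs)  # adds/overwrites keys, never removes any
# "compression" from the source survives the merge
```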

- Fix import ordering in utils.py (isort compliance)
- Simplify validation prefix key rewriting to mirror executor logic
- Normalize single-element list prefixes in normalize_target_schema_to_patch

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).


if isinstance(tv, list) and len(tv) == 1:
    tv = tv[0]
if sv != tv:
    changes["index"][index_key] = target_value

Target schema normalization misses removed index-level settings

Medium Severity

normalize_target_schema_to_patch only iterates over target_dict["index"] keys when comparing index-level settings. Any property present in the source but absent in the target (e.g., custom stopwords) is never detected as a change. The resulting patch omits that removal, so merge_patch preserves the stale source setting in the merged schema. The migrated index then retains settings the user intended to remove by omitting them from the target schema.


nkanu17 (Collaborator, Author) replied:

Low priority — index-level setting removal isn't a supported migration operation. Planner blocks unsupported changes.
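Bugbot's point reduces to the shape of the loop: iterating only the target's keys can never observe source-only keys. A self-contained illustration (the setting names are examples):

```python
source_index = {"name": "idx", "stopwords": ["the", "a"]}
target_index = {"name": "idx"}  # user removed "stopwords" by omission

changes = {}
for index_key, target_value in target_index.items():
    if source_index.get(index_key) != target_value:
        changes[index_key] = target_value
# Source-only keys are never visited, so the removal goes undetected.
```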

compression = attrs.get("compression", "")
if compression:
    uses_compression = True
    compression_type = compression

SVS compression type tracks only last vector field

Low Severity

In _check_svs_vamana_requirements, the compression_type variable is overwritten on each loop iteration over vector fields. When multiple SVS-VAMANA fields use different compression types, the warning message only references the last field's compression_type, silently discarding information about other fields' compression settings.


nkanu17 (Collaborator, Author) replied:

Acknowledged — cosmetic issue. Warning for the last field's compression type is sufficient for single-field SVS migrations.


nkanu17 commented Apr 2, 2026

@codex review

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea33dc25fe


Comment on lines +220 to +222
prefixes = schema["index"].get("prefix")
if isinstance(prefixes, list):
    schema["index"]["prefix"] = sorted(prefixes)

P2: Normalize single-prefix form before comparing schemas

schemas_equal() can report a false schema mismatch when one schema uses prefix: "docs" and the other uses prefix: ["docs"], even though Redis treats these as equivalent single-prefix definitions. This happens because canonicalization sorts list prefixes but never collapses a single-item list to the scalar form; in migrations this can surface as spurious validation failures after apply (e.g., plan schema carries list form while live schema from FT.INFO is normalized to string).

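The normalization Codex proposes would collapse the single-item list form before comparison; a sketch (the function name is ours):

```python
def canonicalize_prefix(prefix):
    # Sort multi-prefix lists and collapse ["docs"] to "docs" so both
    # spellings of a single-prefix definition compare equal.
    if isinstance(prefix, list):
        prefix = sorted(prefix)
        if len(prefix) == 1:
            return prefix[0]
    return prefix
```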

nkanu17 (Collaborator, Author) replied:

Low priority — prefix normalization is handled by IndexSchema parsing before it reaches schemas_equal().

nkanu17 changed the title from "feat(migrate): PR0 - Design and base models" to "feat(migrate): [0/5] - Design and base models" on Apr 2, 2026
nkanu17 changed the title from "feat(migrate): [0/5] - Design and base models" to "feat(migrate): [1/6] Design and base models" on Apr 2, 2026