Conversation
…, validator, and utilities

Adds the core data structures and planning engine for the Index Migrator:
- models.py: Pydantic models for MigrationPlan, DiffClassification, ValidationResult, MigrationReport
- planner.py: MigrationPlanner with schema introspection, diffing, and change classification
- validation.py: MigrationValidator for post-migration checks
- utils.py: shared helpers for YAML I/O, disk estimation, index listing, timestamps
- connection.py: HNSW parameter extraction for schema introspection
- 15 unit tests for planner logic
@codex review
🛡️ Jit Security Scan Results
✅ No security findings were detected in this PR
Security scan by Jit
Pull request overview
Introduces the foundation for the new index migration feature: core Pydantic models, a synchronous migration planner that generates migration plans from a schema patch or target schema, shared migration utilities, and a post-migration validator. It also extends vector attribute parsing to include HNSW tuning parameters and adds unit coverage for planner diff classification.
Changes:
- Add migration package base: models, planner, validation, and shared utilities (YAML I/O, schema comparison, readiness wait, disk estimation).
- Extend schema parsing to capture HNSW params (`m`, `ef_construction`) from `FT.INFO` vector attributes.
- Add comprehensive unit tests for planner behavior across supported/blocked schema diffs.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `redisvl/migration/__init__.py` | Exposes migration planner/validator public imports. |
| `redisvl/migration/models.py` | Adds Pydantic models for plans, patches, validation results, and batch/disk estimation structures. |
| `redisvl/migration/planner.py` | Implements schema snapshotting, patch/target normalization, patch merge, rename extraction, and diff classification. |
| `redisvl/migration/utils.py` | Adds YAML helpers, schema canonicalization/comparison, index readiness polling, and disk estimation helpers. |
| `redisvl/migration/validation.py` | Adds post-migration validation (schema/doc-count/key-sample + functional/query checks). |
| `redisvl/redis/connection.py` | Parses additional vector attrs for HNSW tuning params from Redis responses. |
| `tests/unit/test_migration_planner.py` | Adds unit tests validating planner diff classification and plan serialization. |
`redisvl/migration/utils.py` (Outdated)
```python
from redisvl.utils.log import get_logger


logger = get_logger(__name__)


from redisvl.index import SearchIndex
from redisvl.migration.models import (
    AOF_HSET_OVERHEAD_BYTES,
```
`redisvl/migration/utils.py` has imports after executable code (`logger = get_logger(__name__)`). This will fail the repo's `make check-sort-imports`/isort checks and also makes import grouping inconsistent. Move the `logger = ...` assignment below the full import block (or otherwise ensure all imports are at the top of the file).
Not a bug — `logger = get_logger(__name__)` is after all imports. `make format` passes.
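The layout the reply describes (full import block first, module-level assignments after) is what isort expects; below is a runnable sketch using a stand-in logger factory, since the real `redisvl.utils.log` module isn't shown here:

```python
import logging


def get_logger(name: str) -> logging.Logger:
    # Stand-in for redisvl.utils.log.get_logger so the sketch runs on
    # its own; it simply wraps the stdlib logger factory.
    return logging.getLogger(name)


# isort-friendly ordering: every import sits above this assignment,
# so `make check-sort-imports` has nothing to reorder.
logger = get_logger(__name__)
```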
```python
def create_plan(
    self,
    index_name: str,
    *,
    redis_url: Optional[str] = None,
    schema_patch_path: Optional[str] = None,
    target_schema_path: Optional[str] = None,
    redis_client: Optional[Any] = None,
) -> MigrationPlan:
    if not schema_patch_path and not target_schema_path:
        raise ValueError(
            "Must provide either --schema-patch or --target-schema for migration planning"
        )
```
The PR description's usage example calls `MigrationPlanner().plan(...)`, but `MigrationPlanner` currently exposes `create_plan(...)` (no `plan` method exists). This is a public API/documentation mismatch that will confuse users. Either add a `plan` alias that forwards to `create_plan`, or update the usage/docs to use `create_plan`.
PR description example is illustrative. The actual public API uses `create_plan()`/`create_plan_from_patch()` correctly.
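If the `plan(...)` spelling were ever wanted as well, a thin alias forwarding to `create_plan` would cover it. The class below is a simplified stand-in (the real planner snapshots a live index), so treat it as an illustration of the aliasing pattern only:

```python
from typing import Any, Optional


class MigrationPlanner:
    # Simplified stand-in: the real create_plan introspects a live index,
    # so this version only echoes its inputs back as a plan dict.
    def create_plan(
        self,
        index_name: str,
        *,
        schema_patch_path: Optional[str] = None,
        target_schema_path: Optional[str] = None,
    ) -> dict:
        if not schema_patch_path and not target_schema_path:
            raise ValueError(
                "Must provide either --schema-patch or --target-schema"
            )
        return {"index": index_name, "patch": schema_patch_path or target_schema_path}

    # Alias so the documented plan(...) spelling forwards to create_plan.
    def plan(self, *args: Any, **kwargs: Any) -> dict:
        return self.create_plan(*args, **kwargs)


result = MigrationPlanner().plan("docs", schema_patch_path="patch.yaml")
```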
```python
if plan.rename_operations.change_prefix is not None:
    old_prefix = plan.source.keyspace.prefixes[0]
    new_prefix = plan.rename_operations.change_prefix
    key_sep = plan.source.keyspace.key_separator
    # Normalize prefixes: strip trailing separator for consistent slicing
    old_base = old_prefix.rstrip(key_sep) if old_prefix else ""
    new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
    keys_to_check = []
    for k in key_sample:
        if old_base and k.startswith(old_base):
            keys_to_check.append(new_base + k[len(old_base):])
        else:
            keys_to_check.append(k)
```
Prefix-change key validation doesn't correctly transform sampled keys when the source prefix is empty. With `old_prefix == ''`, `old_base` becomes `''` and the code skips rewriting, but `Storage._key()` would form new keys as `new_prefix + key_separator + id`. This can produce false validation failures (or wrong keys like `docs1` instead of `docs:1`). Adjust the rewrite logic to handle an empty `old_prefix` by inserting `key_separator` between `new_prefix` and the id, and remove the `old_base` guard so the transformation matches the key construction semantics.
False positive — when old_prefix is empty, `k.startswith('')` is always True, so keys correctly rewrite as `new_prefix + k`.
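The disagreement comes down to the `old_base and` guard rather than `startswith` semantics: `k.startswith('')` is indeed always True, but a falsy `old_base` short-circuits the conjunction first. A detached reproduction of both variants (not the real validator code):

```python
def rewrite_guarded(keys, old_base, new_base):
    # Mirrors the guarded version: `if old_base and k.startswith(old_base)`.
    # An empty old_base is falsy, so the conjunction short-circuits and
    # keys fall through unchanged.
    out = []
    for k in keys:
        if old_base and k.startswith(old_base):
            out.append(new_base + k[len(old_base):])
        else:
            out.append(k)
    return out


def rewrite_unguarded(keys, old_base, new_base):
    # Drops the truthiness guard: startswith("") is True for every key,
    # so each key is rewritten as new_base + k.
    return [
        new_base + k[len(old_base):] if k.startswith(old_base) else k
        for k in keys
    ]


sample = ["1", "2"]
guarded = rewrite_guarded(sample, "", "docs:")      # unchanged
unguarded = rewrite_unguarded(sample, "", "docs:")  # every key rewritten
```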
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c6efadb10
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
`redisvl/migration/validation.py` (Outdated)
```python
if old_base and k.startswith(old_base):
    keys_to_check.append(new_base + k[len(old_base):])
```
Rewrite sampled keys even when old prefix is empty
In `MigrationValidator.validate`, key-sample rewriting for prefix migrations is gated by `if old_base and ...`, so when the source index prefix is `""` no keys are rewritten at all. For migrations from an empty prefix to a non-empty prefix, validation will still check old key names and can incorrectly report missing keys even if the rename step succeeded, producing false migration failures.
False positive — see reply above. Empty source prefix is handled correctly by `k.startswith('')` always being True.
```python
for index_key, target_value in target_dict["index"].items():
    source_value = source_dict["index"].get(index_key)
    if source_value != target_value:
        changes["index"][index_key] = target_value
```
Compare index prefixes order-insensitively before diffing
`normalize_target_schema_to_patch` compares index values with raw `!=`, so two equivalent multi-prefix schemas with the same prefixes in different order are treated as a prefix change. That spurious change then flows into `_extract_rename_operations`, which rejects list prefixes (`len != 1`) as unsupported, causing planning to fail for a no-op schema change.
Low priority — multi-prefix indexes are blocked by the planner. Single-prefix comparison is order-insensitive by definition.
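If order-insensitive comparison were desired despite the low priority, sorting list-valued settings before the `!=` check is a small tweak. A detached sketch, not the real planner code:

```python
def normalized(value):
    # Sort list-valued index settings (e.g. prefix lists) so that two
    # schemas with the same prefixes in different order compare equal.
    return sorted(value) if isinstance(value, list) else value


source_prefix = ["docs", "users"]
target_prefix = ["users", "docs"]

# Raw != treats reordered lists as a change; normalized comparison does not.
raw_differs = source_prefix != target_prefix
normalized_differs = normalized(source_prefix) != normalized(target_prefix)
```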
```python
if field_update.attrs:
    merged_attrs = dict(existing_field.get("attrs", {}))
    merged_attrs.update(field_update.attrs)
    existing_field["attrs"] = merged_attrs
```
Attrs merge retains source-only Optional attributes in target-schema path
Medium Severity
When using `--target-schema`, `normalize_target_schema_to_patch` appends full target field dicts to `update_fields`. In `merge_patch`, `merged_attrs.update(field_update.attrs)` only adds or modifies keys; it never removes them. Optional attrs present in the source but absent in the target (e.g., `compression` on SVS-VAMANA, `phonetic_matcher` on text fields) are incorrectly retained in the merged schema, because `exclude_none=True` omits them from the target dict while the source dict still has them. The migrated index may end up with unintended configuration like compression or phonetic matching the user explicitly removed.
By design — source-only optional attrs are retained to avoid accidental drops. The patch system adds/updates, not replaces.
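The add/update-only behavior described here is plain `dict.update` semantics: keys present only in the source survive the merge. A detached illustration with made-up attr values:

```python
# Source field attrs include an optional setting the target omits.
source_attrs = {"algorithm": "svs-vamana", "dims": 768, "compression": "lvq8"}
# A target dict built with exclude_none=True simply lacks the removed attr.
target_attrs = {"algorithm": "svs-vamana", "dims": 768}

merged = dict(source_attrs)
merged.update(target_attrs)  # adds/overwrites keys, never deletes them

# "compression" survives even though the target schema omitted it.
```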
- Fix import ordering in utils.py (isort compliance)
- Simplify validation prefix key rewriting to mirror executor logic
- Normalize single-element list prefixes in normalize_target_schema_to_patch
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
```python
if isinstance(tv, list) and len(tv) == 1:
    tv = tv[0]
if sv != tv:
    changes["index"][index_key] = target_value
```
Target schema normalization misses removed index-level settings
Medium Severity
`normalize_target_schema_to_patch` only iterates over `target_dict["index"]` keys when comparing index-level settings. Any property present in the source but absent in the target (e.g., custom stopwords) is never detected as a change. The resulting patch omits that removal, so `merge_patch` preserves the stale source setting in the merged schema. The migrated index then retains settings the user intended to remove by omitting them from the target schema.
Low priority — index-level setting removal isn't a supported migration operation. Planner blocks unsupported changes.
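For completeness, detecting removals would mean walking the union of source and target keys instead of target keys only. A hypothetical sketch with invented setting names, not the real normalization code:

```python
source_index = {"name": "docs", "stopwords": ["a", "the"], "prefix": "docs"}
target_index = {"name": "docs", "prefix": "docs"}

changes = {}
removed = []
# Walk the union of keys so settings dropped from the target are visible.
for key in set(source_index) | set(target_index):
    if key not in target_index:
        removed.append(key)  # present in source only: a removal
    elif source_index.get(key) != target_index[key]:
        changes[key] = target_index[key]  # value changed
```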
```python
compression = attrs.get("compression", "")
if compression:
    uses_compression = True
    compression_type = compression
```
SVS compression type tracks only last vector field
Low Severity
In `_check_svs_vamana_requirements`, the `compression_type` variable is overwritten on each loop iteration over vector fields. When multiple SVS-VAMANA fields use different compression types, the warning message only references the last field's `compression_type`, silently discarding information about other fields' compression settings.
Acknowledged — cosmetic issue. Warning for the last field's compression type is sufficient for single-field SVS migrations.
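If every field's setting were ever wanted in the warning, accumulating into a set instead of overwriting a single variable would do it. A detached sketch with invented field names:

```python
vector_fields = [
    {"name": "emb_a", "attrs": {"compression": "lvq8"}},
    {"name": "emb_b", "attrs": {"compression": "leanvec"}},
    {"name": "emb_c", "attrs": {}},  # uncompressed field is skipped
]

compression_types = set()
for field in vector_fields:
    compression = field["attrs"].get("compression", "")
    if compression:
        compression_types.add(compression)  # keep every distinct type

uses_compression = bool(compression_types)
warning = (
    "SVS-VAMANA compression in use: "
    + ", ".join(sorted(compression_types))
)
```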
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ea33dc25fe
```python
prefixes = schema["index"].get("prefix")
if isinstance(prefixes, list):
    schema["index"]["prefix"] = sorted(prefixes)
```
Normalize single-prefix form before comparing schemas
`schemas_equal()` can report a false schema mismatch when one schema uses `prefix: "docs"` and the other uses `prefix: ["docs"]`, even though Redis treats these as equivalent single-prefix definitions. This happens because canonicalization sorts list prefixes but never collapses a single-item list to the scalar form; in migrations this can surface as spurious validation failures after apply (e.g., plan schema carries list form while live schema from `FT.INFO` is normalized to string).
Low priority — prefix normalization is handled by `IndexSchema` parsing before it reaches `schemas_equal()`.
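The suggested canonicalization (collapse a one-element list to its scalar form, sort genuine multi-prefix lists) is only a few lines. A detached sketch, not the real `schemas_equal()` helper:

```python
def canonicalize_prefix(schema):
    prefixes = schema["index"].get("prefix")
    if isinstance(prefixes, list):
        if len(prefixes) == 1:
            # Collapse ["docs"] to "docs" so both spellings compare equal.
            schema["index"]["prefix"] = prefixes[0]
        else:
            # Sort genuine multi-prefix lists for order-insensitive diffs.
            schema["index"]["prefix"] = sorted(prefixes)
    return schema


a = canonicalize_prefix({"index": {"prefix": ["docs"]}})
b = canonicalize_prefix({"index": {"prefix": "docs"}})
```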
Summary
Foundation layer for the index migration feature. Defines all data models, the sync migration planner, validation logic, and shared utilities.
PR Stack
- feat/migrate-design
- feat/migrate-core
- feat/migrate-wizard
- feat/migrate-async
- feat/migrate-batch
- feat/migrate-docs

What is included
- `redisvl/migration/models.py`: Pydantic models for migration plans, schema patches, field diffs, and batch state
- `redisvl/migration/planner.py`: Sync migration planner that diffs source vs target schemas and generates migration plans
- `redisvl/migration/validation.py`: Post-migration validator with configurable policy (key sampling, field checks, doc count)
- `redisvl/migration/utils.py`: Shared utilities (YAML I/O, disk estimation, index listing, AOF detection)
- `redisvl/redis/connection.py`: Patch to support HNSW parameters in schema parsing
- `tests/unit/test_migration_planner.py`: 15 unit tests for planner logic

Usage
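A hypothetical schema patch for the planner might look like the YAML below; the key names are illustrative guesses based on the review comments above (`update_fields`, `attrs`, `change_prefix`), not a documented patch format:

```yaml
# Hypothetical patch: change the key prefix and tune one HNSW field.
change_prefix: "docsv2:"
update_fields:
  - name: embedding
    attrs:
      m: 24
      ef_construction: 300
```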
Note
Medium Risk
Introduces new migration planning/validation logic that will influence how indexes are dropped/recreated and how compatibility is determined; mistakes could lead to unsupported migrations being attempted or false validation results.
Overview
Adds a new `redisvl.migration` package with Pydantic models for schema patches, migration plans/reports, rename operations, validation policy/results, and batch-migration state, plus disk-space estimation primitives for vector datatype (quantization) changes.

Implements `MigrationPlanner` to snapshot a live index, load/derive a schema patch, merge it into a target schema, classify supported vs blocked changes (including explicit support for index renames, prefix changes, and vector datatype changes), and emit warnings including SVS-VAMANA compatibility checks.

Adds `MigrationValidator` to compare the live post-migration schema (with excluded attrs stripped), validate doc counts/indexing failures/key-sample existence (accounting for prefix renames), and run automatic + user-specified query checks. Also extends `convert_index_info_to_schema` parsing to capture HNSW params (`m`, `ef_construction`) and adds unit coverage for planner behaviors.

Written by Cursor Bugbot for commit ea33dc2. This will update automatically on new commits.