
feat(migrate): [1/6] Design and base models #567

Open
nkanu17 wants to merge 2 commits into main from feat/migrate-design

Conversation


nkanu17 (Collaborator) commented Apr 2, 2026

Summary

Foundation layer for the index migration feature. Defines all data models, the sync migration planner, validation logic, and shared utilities.

PR Stack

PR    Branch                Description
PR0   feat/migrate-design   Design and base models (this PR)
PR1   feat/migrate-core     Core sync executor and basic CLI
PR2   feat/migrate-wizard   Interactive migration wizard
PR3   feat/migrate-async    Async executor, planner, validator
PR4   feat/migrate-batch    Batch migration with multi-index support
PR5   feat/migrate-docs     Documentation and benchmarks

What is included

  • redisvl/migration/models.py: Pydantic models for migration plans, schema patches, field diffs, and batch state
  • redisvl/migration/planner.py: Sync migration planner that diffs source vs target schemas and generates migration plans
  • redisvl/migration/validation.py: Post-migration validator with configurable policy (key sampling, field checks, doc count)
  • redisvl/migration/utils.py: Shared utilities (YAML I/O, disk estimation, index listing, AOF detection)
  • redisvl/redis/connection.py: Patch to support HNSW parameters in schema parsing
  • tests/unit/test_migration_planner.py: 15 unit tests for planner logic

Usage

from redisvl.migration import MigrationPlanner, MigrationValidator

planner = MigrationPlanner()
plan = planner.plan("my_index", redis_url="redis://localhost:6379", schema_patch_path="patch.yaml")
planner.write_plan(plan, "migration_plan.yaml")

Note

Medium Risk
Introduces new migration planning/validation logic that will influence how indexes are dropped/recreated and how compatibility is determined; mistakes could lead to unsupported migrations being attempted or false validation results.

Overview
Adds a new redisvl.migration package with Pydantic models for schema patches, migration plans/reports, rename operations, validation policy/results, and batch-migration state, plus disk-space estimation primitives for vector datatype (quantization) changes.

Implements MigrationPlanner to snapshot a live index, load/derive a schema patch, merge it into a target schema, classify supported vs blocked changes (including explicit support for index renames, prefix changes, and vector datatype changes), and emit warnings including SVS-VAMANA compatibility checks.

Adds MigrationValidator to compare the live post-migration schema (with excluded attrs stripped), validate doc counts/indexing failures/key-sample existence (accounting for prefix renames), and run automatic + user-specified query checks. Also extends convert_index_info_to_schema parsing to capture HNSW params (m, ef_construction) and adds unit coverage for planner behaviors.
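The disk-space estimation the models expose is essentially bytes-per-vector arithmetic. A minimal sketch under our own assumptions (the function name, constant table, and dtype set are ours, not the PR's):

```python
# Hypothetical sketch of a quantization disk estimate; not the PR's API.
BYTES_PER_DTYPE = {"float64": 8, "float32": 4, "float16": 2, "bfloat16": 2}

def vector_storage_delta(num_docs: int, dims: int, old_dtype: str, new_dtype: str) -> int:
    """Change in raw vector storage (bytes) when migrating datatypes."""
    old_bytes = num_docs * dims * BYTES_PER_DTYPE[old_dtype]
    new_bytes = num_docs * dims * BYTES_PER_DTYPE[new_dtype]
    return new_bytes - old_bytes
```

A real estimate would also account for index overhead (e.g., HNSW graph links) and constants like the AOF_HSET_OVERHEAD_BYTES the models define.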

Written by Cursor Bugbot for commit ea33dc2.

…, validator, and utilities

Adds the core data structures and planning engine for the Index Migrator:
- models.py: Pydantic models for MigrationPlan, DiffClassification, ValidationResult, MigrationReport
- planner.py: MigrationPlanner with schema introspection, diffing, and change classification
- validation.py: MigrationValidator for post-migration checks
- utils.py: shared helpers for YAML I/O, disk estimation, index listing, timestamps
- connection.py: HNSW parameter extraction for schema introspection
- 15 unit tests for planner logic
Copilot AI review requested due to automatic review settings April 2, 2026 16:19

nkanu17 commented Apr 2, 2026

@codex review


jit-ci bot commented Apr 2, 2026

🛡️ Jit Security Scan Results


✅ No security findings were detected in this PR


Security scan by Jit

nkanu17 changed the title from "feat(migrate): PR0 -- Design and base models" to "feat(migrate): PR0 - Design and base models" on Apr 2, 2026

Copilot AI (Contributor) left a comment

Pull request overview

Introduces the foundation for the new index migration feature: core Pydantic models, a synchronous migration planner that generates migration plans from a schema patch or target schema, shared migration utilities, and a post-migration validator. It also extends vector attribute parsing to include HNSW tuning parameters and adds unit coverage for planner diff classification.

Changes:

  • Add migration package base: models, planner, validation, and shared utilities (YAML I/O, schema comparison, readiness wait, disk estimation).
  • Extend schema parsing to capture HNSW params (m, ef_construction) from FT.INFO vector attributes.
  • Add comprehensive unit tests for planner behavior across supported/blocked schema diffs.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • redisvl/migration/__init__.py: Exposes migration planner/validator public imports.
  • redisvl/migration/models.py: Adds Pydantic models for plans, patches, validation results, and batch/disk estimation structures.
  • redisvl/migration/planner.py: Implements schema snapshotting, patch/target normalization, patch merge, rename extraction, and diff classification.
  • redisvl/migration/utils.py: Adds YAML helpers, schema canonicalization/comparison, index readiness polling, and disk estimation helpers.
  • redisvl/migration/validation.py: Adds post-migration validation (schema/doc count/key sample + functional/query checks).
  • redisvl/redis/connection.py: Parses additional vector attrs for HNSW tuning params from Redis responses.
  • tests/unit/test_migration_planner.py: Adds unit tests validating planner diff classification and plan serialization.


Comment on lines +10 to +16
from redisvl.utils.log import get_logger

logger = get_logger(__name__)

from redisvl.index import SearchIndex
from redisvl.migration.models import (
    AOF_HSET_OVERHEAD_BYTES,

Copilot AI Apr 2, 2026

redisvl/migration/utils.py has imports after executable code (logger = get_logger(__name__)). This will fail the repo’s make check-sort-imports/isort checks and also makes import grouping inconsistent. Move the logger = ... assignment below the full import block (or otherwise ensure all imports are at the top of the file).

nkanu17 (Collaborator, Author) replied:

Not a bug — logger = get_logger(__name__) is after all imports. make format passes.

Comment on lines +50 to +62
def create_plan(
    self,
    index_name: str,
    *,
    redis_url: Optional[str] = None,
    schema_patch_path: Optional[str] = None,
    target_schema_path: Optional[str] = None,
    redis_client: Optional[Any] = None,
) -> MigrationPlan:
    if not schema_patch_path and not target_schema_path:
        raise ValueError(
            "Must provide either --schema-patch or --target-schema for migration planning"
        )

Copilot AI Apr 2, 2026

The PR description’s usage example calls MigrationPlanner().plan(...), but MigrationPlanner currently exposes create_plan(...) (no plan method exists). This is a public API/documentation mismatch that will confuse users. Either add a plan alias that forwards to create_plan, or update the usage/docs to use create_plan.

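Copilot's first suggested fix, a thin `plan` alias forwarding to `create_plan`, could look like this sketch (the body is a placeholder; only the alias mechanics are the point):

```python
from typing import Any, Optional

class MigrationPlanner:
    # Sketch only: the real class lives in redisvl/migration/planner.py
    # and returns a MigrationPlan, not a dict.
    def create_plan(
        self,
        index_name: str,
        *,
        redis_url: Optional[str] = None,
        schema_patch_path: Optional[str] = None,
        target_schema_path: Optional[str] = None,
        redis_client: Optional[Any] = None,
    ) -> dict:
        if not schema_patch_path and not target_schema_path:
            raise ValueError(
                "Must provide either --schema-patch or --target-schema for migration planning"
            )
        return {"index_name": index_name}  # placeholder for MigrationPlan

    # Alias so the documented `planner.plan(...)` spelling also resolves.
    plan = create_plan
```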
nkanu17 (Collaborator, Author) replied:

PR description example is illustrative. The actual public API uses create_plan()/create_plan_from_patch() correctly.

Comment on lines +61 to +73
if plan.rename_operations.change_prefix is not None:
    old_prefix = plan.source.keyspace.prefixes[0]
    new_prefix = plan.rename_operations.change_prefix
    key_sep = plan.source.keyspace.key_separator
    # Normalize prefixes: strip trailing separator for consistent slicing
    old_base = old_prefix.rstrip(key_sep) if old_prefix else ""
    new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
    keys_to_check = []
    for k in key_sample:
        if old_base and k.startswith(old_base):
            keys_to_check.append(new_base + k[len(old_base) :])
        else:
            keys_to_check.append(k)

Copilot AI Apr 2, 2026

Prefix-change key validation doesn’t correctly transform sampled keys when the source prefix is empty. With old_prefix == '', old_base becomes '' and the code skips rewriting, but Storage._key() would form new keys as new_prefix + key_separator + id. This can produce false validation failures (or wrong keys like docs1 instead of docs:1). Adjust the rewrite logic to handle an empty old_prefix by inserting key_separator between new_prefix and the id, and remove the old_base guard so the transformation matches the key construction semantics.

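Since the thread disputes the empty-prefix case, the snippet is easy to check in isolation (the function name and defaults here are ours):

```python
def rewrite_keys(key_sample, old_prefix, new_prefix, key_sep=":"):
    # Standalone copy of the rewrite logic from the snippet above.
    old_base = old_prefix.rstrip(key_sep) if old_prefix else ""
    new_base = new_prefix.rstrip(key_sep) if new_prefix else ""
    keys_to_check = []
    for k in key_sample:
        if old_base and k.startswith(old_base):
            keys_to_check.append(new_base + k[len(old_base):])
        else:
            keys_to_check.append(k)
    return keys_to_check
```

With a non-empty source prefix the rewrite behaves as intended; with an empty one, the `old_base and` guard routes every key through the unchanged branch, which is exactly the case the review comments and the author's replies disagree about.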
nkanu17 (Collaborator, Author) replied:

False positive — when old_prefix is empty, k.startswith('') is always True, so keys correctly rewrite as new_prefix + k.

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c6efadb10


Comment on lines +70 to +71
if old_base and k.startswith(old_base):
    keys_to_check.append(new_base + k[len(old_base) :])

P2: Rewrite sampled keys even when old prefix is empty

In MigrationValidator.validate, key-sample rewriting for prefix migrations is gated by if old_base and ..., so when the source index prefix is "" no keys are rewritten at all. For migrations from empty prefix to a non-empty prefix, validation will still check old key names and can incorrectly report missing keys even if the rename step succeeded, producing false migration failures.


nkanu17 (Collaborator, Author) replied:

False positive — see reply above. Empty source prefix is handled correctly by k.startswith('') always being True.

Comment on lines +213 to +216
for index_key, target_value in target_dict["index"].items():
    source_value = source_dict["index"].get(index_key)
    if source_value != target_value:
        changes["index"][index_key] = target_value

P2: Compare index prefixes order-insensitively before diffing

normalize_target_schema_to_patch compares index values with raw !=, so two equivalent multi-prefix schemas with the same prefixes in different order are treated as a prefix change. That spurious change then flows into _extract_rename_operations, which rejects list prefixes (len != 1) as unsupported, causing planning to fail for a no-op schema change.


nkanu17 (Collaborator, Author) replied:

Low priority — multi-prefix indexes are blocked by the planner. Single-prefix comparison is order-insensitive by definition.
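If one did want the order-insensitive comparison Codex suggests, the diff guard could be adjusted along these lines (the helper name is ours):

```python
def index_value_changed(source_value, target_value):
    # Compare prefix lists order-insensitively; everything else by equality.
    if isinstance(source_value, list) and isinstance(target_value, list):
        return sorted(source_value) != sorted(target_value)
    return source_value != target_value
```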

if field_update.attrs:
    merged_attrs = dict(existing_field.get("attrs", {}))
    merged_attrs.update(field_update.attrs)
    existing_field["attrs"] = merged_attrs

Attrs merge retains source-only Optional attributes in target-schema path

Medium Severity

When using --target-schema, normalize_target_schema_to_patch appends full target field dicts to update_fields. In merge_patch, merged_attrs.update(field_update.attrs) only adds or modifies keys — it never removes them. Optional attrs present in the source but absent in the target (e.g., compression on SVS-VAMANA, phonetic_matcher on text fields) are incorrectly retained in the merged schema, because exclude_none=True omits them from the target dict while the source dict still has them. The migrated index may end up with unintended configuration like compression or phonetic matching the user explicitly removed.


nkanu17 (Collaborator, Author) replied:

By design — source-only optional attrs are retained to avoid accidental drops. The patch system adds/updates, not replaces.
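The add/update-only semantics both sides describe come straight from `dict.update`, which never deletes keys; a minimal illustration (the attribute names are examples):

```python
source_attrs = {"algorithm": "svs-vamana", "compression": "lvq8"}
target_attrs = {"algorithm": "svs-vamana"}  # "compression" omitted in the target

merged_attrs = dict(source_attrs)
merged_attrs.update(target_attrs)  # adds/overwrites keys, never removes any
# "compression" from the source survives the merge
```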

- Fix import ordering in utils.py (isort compliance)
- Simplify validation prefix key rewriting to mirror executor logic
- Normalize single-element list prefixes in normalize_target_schema_to_patch

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

There are 3 total unresolved issues (including 1 from previous review).


if isinstance(tv, list) and len(tv) == 1:
    tv = tv[0]
if sv != tv:
    changes["index"][index_key] = target_value

Target schema normalization misses removed index-level settings

Medium Severity

normalize_target_schema_to_patch only iterates over target_dict["index"] keys when comparing index-level settings. Any property present in the source but absent in the target (e.g., custom stopwords) is never detected as a change. The resulting patch omits that removal, so merge_patch preserves the stale source setting in the merged schema. The migrated index then retains settings the user intended to remove by omitting them from the target schema.


nkanu17 (Collaborator, Author) replied:

Low priority — index-level setting removal isn't a supported migration operation. Planner blocks unsupported changes.
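Bugbot's point reduces to the shape of the loop: iterating only the target's keys can never observe source-only keys. A self-contained illustration (the setting names are examples):

```python
source_index = {"name": "idx", "stopwords": ["the", "a"]}
target_index = {"name": "idx"}  # user removed "stopwords" by omission

changes = {}
for index_key, target_value in target_index.items():
    if source_index.get(index_key) != target_value:
        changes[index_key] = target_value
# Source-only keys are never visited, so the removal goes undetected.
```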

compression = attrs.get("compression", "")
if compression:
    uses_compression = True
    compression_type = compression

SVS compression type tracks only last vector field

Low Severity

In _check_svs_vamana_requirements, the compression_type variable is overwritten on each loop iteration over vector fields. When multiple SVS-VAMANA fields use different compression types, the warning message only references the last field's compression_type, silently discarding information about other fields' compression settings.


nkanu17 (Collaborator, Author) replied:

Acknowledged — cosmetic issue. Warning for the last field's compression type is sufficient for single-field SVS migrations.


nkanu17 commented Apr 2, 2026

@codex review

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea33dc25fe


Comment on lines +220 to +222
prefixes = schema["index"].get("prefix")
if isinstance(prefixes, list):
    schema["index"]["prefix"] = sorted(prefixes)

P2: Normalize single-prefix form before comparing schemas

schemas_equal() can report a false schema mismatch when one schema uses prefix: "docs" and the other uses prefix: ["docs"], even though Redis treats these as equivalent single-prefix definitions. This happens because canonicalization sorts list prefixes but never collapses a single-item list to the scalar form; in migrations this can surface as spurious validation failures after apply (e.g., plan schema carries list form while live schema from FT.INFO is normalized to string).

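The normalization Codex proposes would collapse the single-item list form before comparison; a sketch (the function name is ours):

```python
def canonicalize_prefix(prefix):
    # Sort multi-prefix lists and collapse ["docs"] to "docs" so both
    # spellings of a single-prefix definition compare equal.
    if isinstance(prefix, list):
        prefix = sorted(prefix)
        if len(prefix) == 1:
            return prefix[0]
    return prefix
```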

nkanu17 (Collaborator, Author) replied:

Low priority — prefix normalization is handled by IndexSchema parsing before it reaches schemas_equal().

nkanu17 changed the title from "feat(migrate): PR0 - Design and base models" to "feat(migrate): [0/5] - Design and base models" on Apr 2, 2026
nkanu17 changed the title from "feat(migrate): [0/5] - Design and base models" to "feat(migrate): [1/6] Design and base models" on Apr 2, 2026