Skip to content

Fix Array::validity#7285

Merged
gatesn merged 2 commits intodevelopfrom
ngates/validity
Apr 3, 2026
Merged

Fix Array::validity#7285
gatesn merged 2 commits intodevelopfrom
ngates/validity

Conversation

@gatesn
Copy link
Copy Markdown
Contributor

@gatesn gatesn commented Apr 3, 2026

The vtables refactor ended up obscuring the validity function in favor of ValidityHelper, which was incorrect in a few places.

Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn added the changelog/break A breaking API change label Apr 3, 2026
Signed-off-by: Nicholas Gates <nick@nickgates.com>
@gatesn gatesn requested a review from connortsui20 April 3, 2026 20:59
@connortsui20
Copy link
Copy Markdown
Contributor

@claude review

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 3, 2026

Claude finished @connortsui20's task in 2m 58s —— View job


PR Review: Fix Array::validity

  • Read PR context and description
  • Analyze git diff and key changed files
  • Review core validity changes
  • Review encoding-specific changes
  • Check for bugs, correctness issues, style problems
  • Post final review

Summary

This PR fixes a correctness bug introduced by the vtable refactor: previously, some ArrayData types cached a Validity field directly (via the ValidityHelper trait), but this cached value could become stale when with_slots updated the array's child slots (e.g., during deserialization or re-sharding). The fix is to store only Nullability in ArrayData and compute Validity on demand from the slots via the new child_to_validity helper. The Array<V>::validity() and ArrayView<V>::validity() methods now return VortexResult<Validity> instead of &Validity.

The approach is architecturally sound and the change is mechanically consistent across ~150 files. A few minor issues are worth noting:


Issues

1. .unwrap() in test code — minor style violation

encodings/datetime-parts/src/canonical.rs:163 uses .unwrap() instead of ? in a test that returns VortexResult<()>. All other tests in this PR were updated to use either ? or vortex_expect. This is inconsistent with the project's style guide:

// Current (inconsistent):
assert!(primitive_values.validity().unwrap().mask_eq(&validity, &mut ctx)?);

// Expected:
assert!(primitive_values.validity()?.mask_eq(&validity, &mut ctx)?);

Fix this →


2. vortex_expect in production (non-test) code

Several production paths use vortex_expect on validity() where ? would be cleaner and maintain error propagation. Since all current vtable implementations are infallible (Ok(child_to_validity(...))), these panics will never be triggered for well-formed arrays — but the pattern is inconsistent with the rest of the codebase which prefers ?:

File Usage
vortex-array/src/arrays/primitive/array/accessor.rs vortex_expect("primitive validity should be derivable")
vortex-array/src/arrays/primitive/array/cast.rs (reinterpret_cast) vortex_expect(...)
vortex-array/src/patches.rs (patch values) vortex_expect(...)
encodings/fastlanes/src/rle/array/rle_compress.rs (padded_validity) vortex_expect(...)
encodings/alp/src/alp/decompress.rs (decompress_chunked_core) vortex_expect(...)
encodings/alp/src/alp_rd/mod.rs vortex_expect(...)

The accessor.rs case is worth calling out specifically: ArrayAccessor::with_iterator is a hot path and using vortex_expect here means a validity error is an unrecoverable panic even though the function signature already returns R, not VortexResult<R>. If the intent is that this is always infallible, a comment explaining that invariant would help.


3. Nice micro-optimization in accumulator_grouped.rs

The change from calling groups.validity() multiple times in a loop to caching groups_validity upfront is a good improvement — avoids repeated vtable dispatch and slot reads.


4. Breaking API change in public-api.lock — looks intentional and correct

The removal of:

  • ValidityHelper trait
  • ValidityVTableFromValidityHelper
  • Array<V>::validity(&self) -> &Validity (with ValidityHelper bound)
  • ArrayView<'a, V>::validity(&self) -> &'a Validity (with ValidityHelper bound)

…and the addition of unconstrained Array<V>::validity() -> VortexResult<Validity> is the right direction. The old APIs were error-prone by design.


5. Old with_slots side-effect was subtly incomplete

The removed code in bytebool and parquet-variant:

// Old bytebool with_slots:
array.validity = match &slots[VALIDITY_SLOT] {
    Some(arr) => Validity::Array(arr.clone()),
    None => Validity::from(array.validity.nullability()),
};

This would miss the AllInvalid case — a slot that was a ConstantArray(false, ...) would be treated as Validity::Array(...) rather than Validity::AllInvalid. The new child_to_validity function handles this correctly by inspecting ConstantArray with a ScalarValue::Bool check.


Overall

The core fix is correct and well-motivated. The child_to_validity/validity_to_child symmetry is clean, and the with_slots side-effect removal is the right long-term approach. The main actionable item is the .unwrap() in datetime-parts/canonical.rs. The vortex_expect usages in production code are defensible but worth a follow-up cleanup.

@connortsui20
Copy link
Copy Markdown
Contributor

lets yolo it

I still think that it is weird that validity returns a result (if something has gone wrong with looking up my validity then I feel like that is on me). See #7282

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 3, 2026

Merging this PR will degrade performance by 29.97%

❌ 4 regressed benchmarks
✅ 1118 untouched benchmarks
⏩ 1530 skipped benchmarks1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation rebuild_naive 117.9 µs 133.2 µs -11.48%
Simulation take_map[(0.1, 0.5)] 967.7 µs 1,209.5 µs -19.99%
Simulation take_map[(0.1, 1.0)] 1.6 ms 2.3 ms -29.97%
Simulation take_map[(0.1, 0.05)] 379.4 µs 424.2 µs -10.56%

Comparing ngates/validity (0092911) with develop (bf97ba2)

Open in CodSpeed

Footnotes

  1. 1530 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@gatesn gatesn merged commit c1b9223 into develop Apr 3, 2026
108 of 111 checks passed
@gatesn gatesn deleted the ngates/validity branch April 3, 2026 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/break A breaking API change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants