
Replace AsThreadPool with owned RayonThreadPool and borrowed RayonThreadPoolRef#906

Open
Copilot wants to merge 6 commits into main from copilot/whittle-down-scope-of-rayon-utils

Conversation


Copilot AI commented Apr 6, 2026

Replaces the generic AsThreadPool trait with a concrete two-type design: an owned RayonThreadPool and a borrowed RayonThreadPoolRef<'a>. All pool-accepting APIs now take RayonThreadPoolRef<'_>, which is Copy for zero-friction passing.
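
A minimal sketch of the owned/borrowed split described above, not the code in rayon_util.rs. It makes several assumptions: the `ThreadPool` struct is a std-only stand-in for `rayon::ThreadPool` (its `install` simply runs the closure on the calling thread), `RayonThreadPool::new` is infallible here whereas the PR's constructor returns a result, and `sum_in_pool` / `sum_twice` are hypothetical pool-accepting functions invented for illustration.

```rust
/// Std-only stand-in for `rayon::ThreadPool` (assumption: the real type comes
/// from the rayon crate and schedules work on worker threads).
pub struct ThreadPool {
    pub num_threads: usize,
}

impl ThreadPool {
    /// Similar in spirit to `rayon::ThreadPool::install`, but runs `op` inline.
    pub fn install<R>(&self, op: impl FnOnce() -> R) -> R {
        op()
    }
}

/// Owned pool: responsible for creating and dropping the underlying pool.
pub struct RayonThreadPool(ThreadPool);

impl RayonThreadPool {
    /// Simplified constructor (the PR's version is fallible).
    pub fn new(num_threads: usize) -> Self {
        RayonThreadPool(ThreadPool { num_threads })
    }

    /// Borrow the owned pool as a cheap, copyable handle.
    pub fn as_ref(&self) -> RayonThreadPoolRef<'_> {
        RayonThreadPoolRef(&self.0)
    }
}

/// Borrowed handle. It can be `Copy` because it only holds a shared reference.
#[derive(Clone, Copy)]
pub struct RayonThreadPoolRef<'a>(&'a ThreadPool);

impl<'a> RayonThreadPoolRef<'a> {
    /// Wrap a pool owned by someone else (bring-your-own-pool).
    pub fn new(pool: &'a ThreadPool) -> Self {
        RayonThreadPoolRef(pool)
    }

    /// Run `op` inside the wrapped pool.
    pub fn install<R>(self, op: impl FnOnce() -> R) -> R {
        self.0.install(op)
    }
}

/// Hypothetical pool-accepting API: takes the handle by value.
pub fn sum_in_pool(pool: RayonThreadPoolRef<'_>, data: &[u64]) -> u64 {
    pool.install(|| data.iter().sum::<u64>())
}

/// The handle is `Copy`, so it can be forwarded twice without `&` or `.clone()`.
pub fn sum_twice(pool: RayonThreadPoolRef<'_>, data: &[u64]) -> u64 {
    sum_in_pool(pool, data) + sum_in_pool(pool, data)
}
```

Because the handle wraps only a shared reference, copying it costs the same as copying a pointer, which is what lets the APIs take it by value.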

Core changes (rayon_util.rs)

Removed:

  • AsThreadPool trait + sealed module
  • forward_threadpool! macro
  • Free functions create_thread_pool, create_thread_pool_for_test, create_thread_pool_for_bench

Added:

  • RayonThreadPool — owned pool with proper constructors:
    • ::new(num_threads) — creates a pool (0 = auto-detect)
    • ::for_test() — reads DISKANN_TEST_POOL_THREADS env var, defaults to 3
    • ::for_bench() — auto-detect thread count
    • ::as_ref() → RayonThreadPoolRef<'_>
  • RayonThreadPoolRef<'a>, a Copy borrowed handle wrapping &'a rayon::ThreadPool:
    • ::new(&rayon::ThreadPool) — wrap an external pool (bring-your-own-pool support)
    • ::install(op) — run op inside this pool
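
How `::for_test()` might resolve its thread count can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `test_pool_threads` and `test_pool_threads_from_env` are hypothetical helper names, and falling back to 3 when the variable is set but unparsable is a guess (the description only says the default is 3).

```rust
use std::env;

/// Hypothetical helper: turn the raw value of DISKANN_TEST_POOL_THREADS into a
/// thread count. Assumption: unset or unparsable values fall back to 3.
pub fn test_pool_threads(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(3)
}

/// Hypothetical wrapper that reads the environment variable named in the
/// description and applies the same default.
pub fn test_pool_threads_from_env() -> usize {
    test_pool_threads(env::var("DISKANN_TEST_POOL_THREADS").ok().as_deref())
}
```

Separating the parsing from the environment lookup keeps the default testable without mutating process-wide state.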

Updated:

  • All ParallelIteratorInPool methods now take RayonThreadPoolRef<'_> instead of &RayonThreadPool

API signature changes (30 files, 6 crates)

All pool: &RayonThreadPool parameters → pool: RayonThreadPoolRef<'_>:

  • diskann-providers: train_pq, generate_pq_pivots, generate_pq_data_from_pivots, k_means_clustering, run_lloyds, compute_vecs_l2sq, compute_closest_centers, process_residuals, and related functions
  • diskann-disk: PQGenerationContext.pool, builder methods, partition utilities
  • diskann-benchmark: disk index search benchmark
  • diskann-tools: build_pq, ground_truth, search_disk_index, range_search_disk_index
  • diskann-providers benches: all criterion and iai benchmarks

Call site pattern

Before:

let pool = create_thread_pool(num_threads)?;
generate_pq_pivots(&pool)?;

After:

let pool = RayonThreadPool::new(num_threads)?;
generate_pq_pivots(pool.as_ref())?;
// Or bring your own rayon pool:
let external = rayon::ThreadPoolBuilder::new().build()?;
generate_pq_pivots(RayonThreadPoolRef::new(&external))?;

Conflict resolution

Merged with main's deletion of normalizing_util.rs (#902) — accepted the deletion since those functions were dead code.

Verification

  • cargo check --workspace --all-targets passes
  • cargo clippy --workspace --all-targets -- -D warnings passes

Copilot AI and others added 2 commits April 6, 2026 20:54
…er args

The train_pq function signature changed from taking a generic
AsThreadPool parameter to taking &RayonThreadPool. Update all call
sites that passed integer values (1usize, 1, 2) to instead create
a RayonThreadPool via create_thread_pool() and pass a reference.

Files updated:
- diskann_async.rs: 12 call sites
- debug_provider.rs: 3 call sites
- wrapped_async.rs: 1 call site
- index_storage.rs: 1 call site

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
…ool parameters

Remove the AsThreadPool trait, sealed module, forward_threadpool! macro,
and execute_with_rayon function. Replace all generic Pool: AsThreadPool
parameters with concrete &RayonThreadPool references across the codebase.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Refactor rayon_util to reduce generics usage" to "Remove AsThreadPool generic trait; take &RayonThreadPool directly" Apr 6, 2026
Copilot AI requested a review from harsha-simhadri April 6, 2026 21:16
…scope-of-rayon-utils

# Conflicts:
#	diskann-providers/src/utils/normalizing_util.rs

Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>

codecov-commenter commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 81.60920% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.34%. Comparing base (de98ea6) to head (3e4e87b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| diskann-tools/src/utils/ground_truth.rs | 4.65% | 41 Missing ⚠️ |
| diskann-tools/src/utils/build_pq.rs | 0.00% | 4 Missing ⚠️ |
| diskann-tools/src/utils/search_disk_index.rs | 0.00% | 2 Missing ⚠️ |
| diskann-disk/src/build/builder/quantizer.rs | 75.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (81.60%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
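
The reported percentages can be cross-checked with a little arithmetic. The sketch below uses a hypothetical helper, `patch_coverage_percent`; the covered-line counts in its comments are back-computed from the percentages and missing-line counts above and are not figures Codecov reports directly.

```rust
/// Hypothetical helper: patch coverage as a percentage of changed lines.
/// Assumption: a patch with no measurable lines counts as fully covered.
pub fn patch_coverage_percent(covered: u64, missing: u64) -> f64 {
    let total = covered + missing;
    if total == 0 {
        return 100.0;
    }
    100.0 * covered as f64 / total as f64
}

/// Back-computed check of the overall figure: 48 missing lines at 81.60920%
/// is consistent with 261 changed lines, 213 of them covered.
pub fn overall_patch_coverage() -> f64 {
    patch_coverage_percent(213, 48)
}

/// Back-computed check for ground_truth.rs: 41 missing lines at 4.65%
/// is consistent with 43 changed lines, 2 of them covered.
pub fn ground_truth_patch_coverage() -> f64 {
    patch_coverage_percent(2, 41)
}
```

The four per-file missing counts (41 + 4 + 2 + 1) add up to the 48 missing lines in the summary.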

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #906      +/-   ##
==========================================
- Coverage   89.34%   89.34%   -0.01%     
==========================================
  Files         444      444              
  Lines       83986    83931      -55     
==========================================
- Hits        75036    74986      -50     
+ Misses       8950     8945       -5     
| Flag | Coverage Δ | |
|---|---|---|
| miri | 89.34% <81.60%> (-0.01%) | ⬇️ |
| unittests | 89.18% <81.60%> (-0.01%) | ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ | |
|---|---|---|
| diskann-benchmark/src/backend/index/product.rs | 100.00% <ø> (ø) | |
| diskann-disk/src/build/builder/build.rs | 94.16% <100.00%> (ø) | |
| diskann-disk/src/build/builder/core.rs | 95.25% <100.00%> (ø) | |
| diskann-disk/src/search/provider/disk_provider.rs | 91.02% <100.00%> (ø) | |
| diskann-disk/src/storage/quant/generator.rs | 92.67% <100.00%> (-0.06%) | ⬇️ |
| diskann-disk/src/storage/quant/pq/pq_generation.rs | 93.22% <100.00%> (-0.12%) | ⬇️ |
| diskann-disk/src/utils/partition.rs | 92.51% <100.00%> (-0.04%) | ⬇️ |
| diskann-providers/src/index/diskann_async.rs | 96.40% <100.00%> (+0.02%) | ⬆️ |
| diskann-providers/src/index/wrapped_async.rs | 46.17% <100.00%> (+0.31%) | ⬆️ |
| .../src/model/graph/provider/async_/debug_provider.rs | 84.99% <100.00%> (+0.05%) | ⬆️ |
... and 9 more

@harsha-simhadri harsha-simhadri marked this pull request as ready for review April 7, 2026 00:55
@harsha-simhadri harsha-simhadri requested review from a team and Copilot April 7, 2026 00:55

Copilot AI left a comment


Pull request overview

This PR removes the AsThreadPool abstraction and updates the workspace to pass &RayonThreadPool explicitly, simplifying thread-pool handling across PQ training, k-means utilities, disk build workflows, and benchmarks.

Changes:

  • Removed AsThreadPool and the forward_threadpool macro; APIs now take &RayonThreadPool directly.
  • Updated PQ/k-means/math utilities and all call sites (including tests) to create and pass a RayonThreadPool reference.
  • Adjusted generic signatures in PQ construction functions to drop the former thread-pool type parameter.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
|---|---|
| diskann-tools/src/utils/build_pq.rs | Creates a thread pool once and passes &RayonThreadPool into PQ pivot/data generation. |
| diskann-providers/src/utils/rayon_util.rs | Removes AsThreadPool + macro; keeps RayonThreadPool helpers and in-pool parallel iterator extensions. |
| diskann-providers/src/utils/mod.rs | Stops re-exporting removed thread-pool abstractions/utilities. |
| diskann-providers/src/utils/math_util.rs | Updates math helpers to accept &RayonThreadPool directly. |
| diskann-providers/src/utils/kmeans.rs | Updates k-means entrypoints to accept &RayonThreadPool directly. |
| diskann-providers/src/storage/index_storage.rs | Updates tests to pass a &RayonThreadPool into train_pq. |
| diskann-providers/src/model/pq/pq_construction.rs | Removes thread-pool generic type parameters and accepts &RayonThreadPool. |
| diskann-providers/src/model/graph/provider/async_/debug_provider.rs | Updates tests to create/pass &RayonThreadPool to PQ training. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates tests to create/pass &RayonThreadPool to PQ training. |
| diskann-providers/src/index/wrapped_async.rs | Updates tests to create/pass &RayonThreadPool to PQ training. |
| diskann-providers/src/index/diskann_async.rs | train_pq now takes &RayonThreadPool; updates all internal tests/callers. |
| diskann-disk/src/utils/partition.rs | Partitioning now takes &RayonThreadPool directly and threads it through helpers. |
| diskann-disk/src/storage/quant/pq/pq_generation.rs | PQ generation context/compressor updated to store/pass &RayonThreadPool. |
| diskann-disk/src/storage/quant/generator.rs | Quant data generator now takes &RayonThreadPool directly. |
| diskann-disk/src/build/builder/quantizer.rs | Creates a pool and passes &RayonThreadPool into PQ training during quantizer training. |
| diskann-disk/src/build/builder/core.rs | Updates partition call to match new partitioning signature. |
| diskann-disk/src/build/builder/build.rs | Updates quant data generation pipeline to pass &RayonThreadPool directly. |
| diskann-benchmark/src/backend/index/product.rs | Benchmark build path now creates a pool and passes &RayonThreadPool to PQ training. |
Comments suppressed due to low confidence (1)

diskann-providers/src/utils/rayon_util.rs:15

  • create_thread_pool is documented to treat num_threads == 0 as “default to logical CPUs”, but the implementation always calls ThreadPoolBuilder::num_threads(num_threads). This makes the behavior for 0 rely on Rayon internals and can fail (or at least be ambiguous), and there are callers in the workspace that pass 0. Consider handling num_threads == 0 explicitly (e.g., skip num_threads(...) / use available_parallelism, or clamp to >= 1) so the documented contract is enforced deterministically.
/// Creates a new thread pool with the specified number of threads.
/// If `num_threads` is 0, it defaults to the number of logical CPUs.
pub fn create_thread_pool(num_threads: usize) -> ANNResult<RayonThreadPool> {
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(num_threads)
        .build()
        .map_err(|err| ANNError::log_thread_pool_error(err.to_string()))?;
    Ok(RayonThreadPool(pool))


Mark Hildebrand and others added 2 commits April 7, 2026 17:05
…hreadPool

- Add RayonThreadPoolRef<'a>(&'a ThreadPool) as the borrowed handle all
  pool-accepting APIs take. It is Copy, enabling zero-friction passing.
- Replace free functions (create_thread_pool, create_thread_pool_for_test,
  create_thread_pool_for_bench) with RayonThreadPool::new(), ::for_test(),
  ::for_bench() constructors.
- Callers can one-liner pool creation:
    train_pq(data, RayonThreadPool::new(4)?.as_ref())
- External pools are supported via RayonThreadPoolRef::new(&rayon_pool).
- Update all 30 files across 6 crates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hildebrandmw

@copilot - I pushed some local changes I had an agent make that both simplified the interface a little more and expanded it to use RayonThreadPoolRef, so users can provide their own pre-existing thread pool if they want. An unfortunate consequence of the latter is that some of the deeply nested parallel loops changed indentation level, which blows up the size of the diff. Can you please review the code again and update the PR description in light of the new changes?


Copilot AI commented Apr 8, 2026

@copilot - I pushed some local changes I had an agent do that both simplified the interface a little more and expanded it to use RayonThreadPoolRef to allow users to provide their own pre-existing ...

Reviewed the new changes — the RayonThreadPoolRef<'a> design with Copy semantics and bring-your-own-pool support via RayonThreadPoolRef::new(&external) looks clean. All pool-accepting APIs consistently take RayonThreadPoolRef<'_>, and the constructors on RayonThreadPool (::new(), ::for_test(), ::for_bench()) are a nice consolidation of the old free functions. Build and clippy pass. Updated the PR description to reflect the full scope of changes.


Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.



Comment on lines +53 to +59
/// A borrowed handle to a Rayon thread pool.
///
/// This trait should be "sealed" to avoid external users being able to implement it.
/// See [as_threadpool_tests] for examples of how to use this trait.
pub trait AsThreadPool: sealed::Sealed + Send + Sync {
    type Returns: std::ops::Deref<Target = RayonThreadPool>;
    fn as_threadpool(&self) -> ANNResult<Self::Returns>;
}
/// This is the type that all pool-accepting APIs should take, enabling callers
/// to either use an owned [`RayonThreadPool`] (via [`RayonThreadPool::as_ref`])
/// or bring their own `rayon::ThreadPool`.
#[derive(Clone, Copy)]
pub struct RayonThreadPoolRef<'a>(&'a rayon::ThreadPool);

Copilot AI Apr 8, 2026


The PR description/title says APIs should take &RayonThreadPool directly, but the new public surface is RayonThreadPoolRef<'_> (a new wrapper type). That’s still a breaking API change and may be surprising to callers; consider either (a) updating the PR title/description to reflect the RayonThreadPoolRef design, or (b) adjusting the API to accept &RayonThreadPool / &rayon::ThreadPool directly if that was the intended outcome.

Copilot AI changed the title from "Remove AsThreadPool generic trait; take &RayonThreadPool directly" to "Replace AsThreadPool with owned RayonThreadPool and borrowed RayonThreadPoolRef" Apr 8, 2026
Copilot AI requested a review from hildebrandmw April 8, 2026 00:17
    ($out:ident = $in:ident: $type:ty) => {
        let $out = &*<$type as $crate::utils::AsThreadPool>::as_threadpool(&$in)?;
    };
}


@wuw92, I recall we added abstractions to help guard against potential deadlocks when using Rayon thread pools with nested loops. Could you please take a look at this PR to make sure those guardrails are still intact? If not, perhaps we can connect with Mark to ensure we’re making an informed decision about dropping these checks.

@arrayka arrayka requested a review from wuw92 April 10, 2026 20:49

Development

Successfully merging this pull request may close these issues.

whittle down scope of rayon_utils.rs

6 participants