Replace AsThreadPool with owned RayonThreadPool and borrowed RayonThreadPoolRef#906
Replace AsThreadPool with owned RayonThreadPool and borrowed RayonThreadPoolRef#906
AsThreadPool with owned RayonThreadPool and borrowed RayonThreadPoolRef#906Conversation
…er args The train_pq function signature changed from taking a generic AsThreadPool parameter to taking &RayonThreadPool. Update all call sites that passed integer values (1usize, 1, 2) to instead create a RayonThreadPool via create_thread_pool() and pass a reference. Files updated: - diskann_async.rs: 12 call sites - debug_provider.rs: 3 call sites - wrapped_async.rs: 1 call site - index_storage.rs: 1 call site Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
…ool parameters Remove the AsThreadPool trait, sealed module, forward_threadpool! macro, and execute_with_rayon function. Replace all generic Pool: AsThreadPool parameters with concrete &RayonThreadPool references across the codebase. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
…scope-of-rayon-utils # Conflicts: # diskann-providers/src/utils/normalizing_util.rs Co-authored-by: harsha-simhadri <5590673+harsha-simhadri@users.noreply.github.com>
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (81.60%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #906 +/- ##
==========================================
- Coverage 89.34% 89.34% -0.01%
==========================================
Files 444 444
Lines 83986 83931 -55
==========================================
- Hits 75036 74986 -50
+ Misses 8950 8945 -5
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR removes the AsThreadPool abstraction and updates the workspace to pass &RayonThreadPool explicitly, simplifying thread-pool handling across PQ training, k-means utilities, disk build workflows, and benchmarks.
Changes:
- Removed
AsThreadPooland theforward_threadpoolmacro; APIs now take&RayonThreadPooldirectly. - Updated PQ/k-means/math utilities and all call sites (including tests) to create and pass a
RayonThreadPoolreference. - Adjusted generic signatures in PQ construction functions to drop the former thread-pool type parameter.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-tools/src/utils/build_pq.rs | Creates a thread pool once and passes &RayonThreadPool into PQ pivot/data generation. |
| diskann-providers/src/utils/rayon_util.rs | Removes AsThreadPool + macro; keeps RayonThreadPool helpers and in-pool parallel iterator extensions. |
| diskann-providers/src/utils/mod.rs | Stops re-exporting removed thread-pool abstractions/utilities. |
| diskann-providers/src/utils/math_util.rs | Updates math helpers to accept &RayonThreadPool directly. |
| diskann-providers/src/utils/kmeans.rs | Updates k-means entrypoints to accept &RayonThreadPool directly. |
| diskann-providers/src/storage/index_storage.rs | Updates tests to pass a &RayonThreadPool into train_pq. |
| diskann-providers/src/model/pq/pq_construction.rs | Removes thread-pool generic type parameters and accepts &RayonThreadPool. |
| diskann-providers/src/model/graph/provider/async_/debug_provider.rs | Updates tests to create/pass &RayonThreadPool to PQ training. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates tests to create/pass &RayonThreadPool to PQ training. |
| diskann-providers/src/index/wrapped_async.rs | Updates tests to create/pass &RayonThreadPool to PQ training. |
| diskann-providers/src/index/diskann_async.rs | train_pq now takes &RayonThreadPool; updates all internal tests/callers. |
| diskann-disk/src/utils/partition.rs | Partitioning now takes &RayonThreadPool directly and threads it through helpers. |
| diskann-disk/src/storage/quant/pq/pq_generation.rs | PQ generation context/compressor updated to store/pass &RayonThreadPool. |
| diskann-disk/src/storage/quant/generator.rs | Quant data generator now takes &RayonThreadPool directly. |
| diskann-disk/src/build/builder/quantizer.rs | Creates a pool and passes &RayonThreadPool into PQ training during quantizer training. |
| diskann-disk/src/build/builder/core.rs | Updates partition call to match new partitioning signature. |
| diskann-disk/src/build/builder/build.rs | Updates quant data generation pipeline to pass &RayonThreadPool directly. |
| diskann-benchmark/src/backend/index/product.rs | Benchmark build path now creates a pool and passes &RayonThreadPool to PQ training. |
Comments suppressed due to low confidence (1)
diskann-providers/src/utils/rayon_util.rs:15
create_thread_poolis documented to treatnum_threads == 0as “default to logical CPUs”, but the implementation always callsThreadPoolBuilder::num_threads(num_threads). This makes the behavior for 0 rely on Rayon internals and can fail (or at least be ambiguous), and there are callers in the workspace that pass 0. Consider handlingnum_threads == 0explicitly (e.g., skipnum_threads(...)/ useavailable_parallelism, or clamp to >= 1) so the documented contract is enforced deterministically.
/// Creates a new thread pool with the specified number of threads.
/// If `num_threads` is 0, it defaults to the number of logical CPUs.
pub fn create_thread_pool(num_threads: usize) -> ANNResult<RayonThreadPool> {
let pool = rayon::ThreadPoolBuilder::new()
.num_threads(num_threads)
.build()
.map_err(|err| ANNError::log_thread_pool_error(err.to_string()))?;
Ok(RayonThreadPool(pool))
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…hreadPool
- Add RayonThreadPoolRef<'a>(&'a ThreadPool) as the borrowed handle all
pool-accepting APIs take. It is Copy, enabling zero-friction passing.
- Replace free functions (create_thread_pool, create_thread_pool_for_test,
create_thread_pool_for_bench) with RayonThreadPool::new(), ::for_test(),
::for_bench() constructors.
- Callers can one-liner pool creation:
train_pq(data, RayonThreadPool::new(4)?.as_ref())
- External pools are supported via RayonThreadPoolRef::new(&rayon_pool).
- Update all 30 files across 6 crates.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@copilot - I pushed some local changes I had an agent do that both simplified the interface a little more and expanded it to use |
Reviewed the new changes — the |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// A borrowed handle to a Rayon thread pool. | ||
| /// | ||
| /// This trait should be "sealed" to avoid external users being able to implement it. | ||
| /// See [as_threadpool_tests] for examples of how to use this trait. | ||
| pub trait AsThreadPool: sealed::Sealed + Send + Sync { | ||
| type Returns: std::ops::Deref<Target = RayonThreadPool>; | ||
| fn as_threadpool(&self) -> ANNResult<Self::Returns>; | ||
| } | ||
| /// This is the type that all pool-accepting APIs should take, enabling callers | ||
| /// to either use an owned [`RayonThreadPool`] (via [`RayonThreadPool::as_ref`]) | ||
| /// or bring their own `rayon::ThreadPool`. | ||
| #[derive(Clone, Copy)] | ||
| pub struct RayonThreadPoolRef<'a>(&'a rayon::ThreadPool); |
There was a problem hiding this comment.
The PR description/title says APIs should take &RayonThreadPool directly, but the new public surface is RayonThreadPoolRef<'_> (a new wrapper type). That’s still a breaking API change and may be surprising to callers; consider either (a) updating the PR title/description to reflect the RayonThreadPoolRef design, or (b) adjusting the API to accept &RayonThreadPool / &rayon::ThreadPool directly if that was the intended outcome.
AsThreadPool with owned RayonThreadPool and borrowed RayonThreadPoolRef
| ($out:ident = $in:ident: $type:ty) => { | ||
| let $out = &*<$type as $crate::utils::AsThreadPool>::as_threadpool(&$in)?; | ||
| }; | ||
| } |
There was a problem hiding this comment.
@wuw92, I recall we added abstractions to help guard against potential deadlocks when using Rayon thread pools with nested loops. Could you please take a look at this PR to make sure those guardrails are still intact? If not, perhaps we can connect with Mark to ensure we’re making an informed decision about dropping these checks.
Replaces the generic
AsThreadPooltrait with a concrete two-type design: an ownedRayonThreadPooland a borrowedRayonThreadPoolRef<'a>. All pool-accepting APIs now takeRayonThreadPoolRef<'_>, which isCopyfor zero-friction passing.Core changes (
rayon_util.rs)Removed:
AsThreadPooltrait +sealedmoduleforward_threadpool!macrocreate_thread_pool,create_thread_pool_for_test,create_thread_pool_for_benchAdded:
RayonThreadPool— owned pool with proper constructors:::new(num_threads)— creates a pool (0 = auto-detect)::for_test()— readsDISKANN_TEST_POOL_THREADSenv var, defaults to 3::for_bench()— auto-detect thread count::as_ref()→RayonThreadPoolRef<'_>RayonThreadPoolRef<'a>—Copyborrowed handle wrapping&'a rayon::ThreadPool:::new(&rayon::ThreadPool)— wrap an external pool (bring-your-own-pool support)::install(op)— runopinside this poolUpdated:
ParallelIteratorInPoolmethods now takeRayonThreadPoolRef<'_>instead of&RayonThreadPoolAPI signature changes (30 files, 6 crates)
All
pool: &RayonThreadPoolparameters →pool: RayonThreadPoolRef<'_>:train_pq,generate_pq_pivots,generate_pq_data_from_pivots,k_means_clustering,run_lloyds,compute_vecs_l2sq,compute_closest_centers,process_residuals, and related functionsPQGenerationContext.pool, builder methods, partition utilitiesbuild_pq,ground_truth,search_disk_index,range_search_disk_indexCall site pattern
Before:
After:
Conflict resolution
Merged with main's deletion of
normalizing_util.rs(#902) — accepted the deletion since those functions were dead code.Verification
cargo check --workspace --all-targetspassescargo clippy --workspace --all-targets -- -D warningspasses