Add fast skip pre-check to avoid loading full datasets for up-to-date entries #73

Merged
koenvo merged 4 commits into main from feature/fast-skip-precheck on Apr 9, 2026

@koenvo koenvo commented Apr 9, 2026

Before processing batches, a lightweight {identifier_key: last_modified_at} dict is loaded from the database in a single query (no joins to the revision or file tables). Datasets where last_modified_at >= max(file.last_modified) are skipped immediately, without the expensive get_dataset_collection call.
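The pre-check described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `dataset` table, its column names, and the helpers `load_last_modified_map` and `can_skip` are hypothetical, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3
from datetime import datetime, timezone


def load_last_modified_map(conn, provider, dataset_type):
    """Load {identifier_key: last_modified_at} in one query, without joins.

    Table and column names are assumptions made for this sketch.
    """
    rows = conn.execute(
        "SELECT identifier_key, last_modified_at FROM dataset "
        "WHERE provider = ? AND dataset_type = ?",
        (provider, dataset_type),
    ).fetchall()
    # Rows without a timestamp are left out, so they can never be skipped.
    return {key: datetime.fromisoformat(ts) for key, ts in rows if ts is not None}


def can_skip(last_modified_map, identifier_key, file_last_modified):
    """Return True only when the stored dataset is provably up to date.

    Any uncertainty (unknown key, no files, missing file timestamp) returns
    False, so the caller falls through to the full should_refetch check.
    """
    stored = last_modified_map.get(identifier_key)
    if stored is None or not file_last_modified:
        return False
    if any(ts is None for ts in file_last_modified):
        return False
    return stored >= max(file_last_modified)


# Small demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (provider TEXT, dataset_type TEXT, "
    "identifier_key TEXT, last_modified_at TEXT)"
)
conn.execute(
    "INSERT INTO dataset VALUES ('demo', 'match', 'match_id=1', "
    "'2026-04-09T09:00:00+00:00')"
)
known = load_last_modified_map(conn, "demo", "match")

older_file = [datetime(2026, 4, 9, 8, 0, tzinfo=timezone.utc)]
newer_file = [datetime(2026, 4, 9, 10, 0, tzinfo=timezone.utc)]
skip_unchanged = can_skip(known, "match_id=1", older_file)  # stored copy is newer
skip_changed = can_skip(known, "match_id=1", newer_file)    # a file changed since
```

The helper is deliberately one-sided: it only answers "definitely up to date", and every other case is left to the existing, more expensive check.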

The cache is built once per (provider, dataset_type) in the loader and reused across selectors within the same run.
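One way to build such a map once per (provider, dataset_type) and reuse it across selectors is a small memoizing wrapper. The sketch below is illustrative only: `LastModifiedCache` and the `fetch` callable are hypothetical names, not the loader's actual API.

```python
from datetime import datetime
from typing import Callable, Dict, Tuple

LastModifiedMap = Dict[str, datetime]


class LastModifiedCache:
    """Holds one {identifier_key: last_modified_at} map per (provider, dataset_type).

    `fetch` is a hypothetical callable that runs the single database query.
    """

    def __init__(self, fetch: Callable[[str, str], LastModifiedMap]) -> None:
        self._fetch = fetch
        self._maps: Dict[Tuple[str, str], LastModifiedMap] = {}

    def get(self, provider: str, dataset_type: str) -> LastModifiedMap:
        key = (provider, dataset_type)
        # Query the database only the first time this pair is requested.
        if key not in self._maps:
            self._maps[key] = self._fetch(provider, dataset_type)
        return self._maps[key]
```

Creating one cache instance per run keeps the maps from going stale between runs, which matches the "within the same run" scope described above.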

No false negatives: datasets that might need updating always fall through to the full should_refetch check.

koenvo added 4 commits April 9, 2026 09:50
… entries

@koenvo koenvo merged commit 3c77e1b into main Apr 9, 2026
13 checks passed
@koenvo koenvo deleted the feature/fast-skip-precheck branch April 9, 2026 11:22