Skip to content

feat: lazy per-bundle loading for vault, driver, datastore, and report registries#1089

Merged
stack72 merged 5 commits intomainfrom
lazy-loading-all-registries
Apr 6, 2026
Merged

feat: lazy per-bundle loading for vault, driver, datastore, and report registries#1089
stack72 merged 5 commits intomainfrom
lazy-loading-all-registries

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Apr 4, 2026

Summary

  • Extends the lazy per-bundle loading pattern (from Lazy per-bundle loading — import individual extension bundles on demand #1053) to all remaining extension registries: vault, driver, datastore, and report
  • Each registry now supports LazyEntry types, setTypeLoader/ensureTypeLoaded for on-demand single-bundle imports, and registerLazy/promoteFromLazy lifecycle
  • Each loader gains buildIndex() (mtime-based catalog population without importing) and loadSingleType() (on-demand single bundle import)
  • Shared ExtensionCatalogStore (SQLite at .swamp/_extension_catalog.db) passed from CLI startup to all loaders
  • Vault type search returns lazy entries from catalog without importing bundles

Performance

Before/after on swamp vault type search --json with @swamp/aws-sm pulled:

Run Old binary New binary
First run (bootstrap) 1.56s 3.67s
Second run (cached) 1.56s 0.70s (55% faster)

Test Plan

  • Added lazy loading tests to all 4 registry test files (registerLazy, ensureTypeLoaded, concurrent dedup, retry after failure)
  • All 4159 existing tests pass
  • Before/after validation with real pulled extensions:
    • @swamp/s3-datastore (datastore), @swamp/aws-sm (vault), @john/tensorlake (driver), @webframp/aws/cost-report (report)
    • Verified catalog entries populated correctly across all extension kinds
    • Verified search commands return results from catalog without importing bundles

Closes #1062

🤖 Generated with Claude Code

@stack72
Copy link
Copy Markdown
Contributor Author

stack72 commented Apr 4, 2026

Heads up: merge conflicts with #1103 + #1104

PRs #1103 and #1104 (fixing #1100 — S3 datastore bundle path resolution) landed on main and touch the same files this PR modifies. Here's what changed and what you'll need to account for when rebasing:

What changed

All 4 extension loaders (UserVaultLoader, UserDriverLoader, UserDatastoreLoader, UserReportLoader) now accept an optional DatastorePathResolver parameter and have a resolveBundlePath() helper method. The constructors changed from:

constructor(denoRuntime: DenoRuntime, repoDir: string | null = null)

to:

constructor(
  denoRuntime: DenoRuntime,
  repoDir: string | null = null,
  datastoreResolver?: DatastorePathResolver,
)

Except UserDatastoreLoader — it must NOT receive the resolver due to bootstrap ordering (it loads datastore extensions that configure the resolver). Its constructor is unchanged.

What this PR needs to do after rebasing

  1. New buildIndex() and loadSingleType() methods added by this PR will construct bundle paths. These should use this.resolveBundlePath(...) instead of join(this.repoDir, SWAMP_DATA_DIR, SWAMP_SUBDIRS.*, ...) — for vault, driver, and report loaders. The datastore loader's new methods should keep using the hardcoded local path (no resolver).

  2. mod.ts wiring — the loader instantiation sites changed. Model, vault, driver, and report loaders now receive a lazy DatastorePathResolver factory. The datastore loader is explicitly excluded. Your new buildIndex calls and catalog wiring need to work with this pattern.

  3. auto_resolver_adapters.tsInstallerAdapterConfig now has an optional datastoreResolver field. Hot-load functions pass it to model and vault loaders.

Key constraint

datastore-bundles is intentionally excluded from DEFAULT_DATASTORE_SUBDIRS (#1104). The datastore loader always reads from local .swamp/datastore-bundles/. Including it in the datastore tier causes the S3 setup migration to delete the bundles, breaking subsequent extension loading.

stack72 and others added 3 commits April 6, 2026 15:47
…t registries

Extend the lazy per-bundle loading pattern (from #1053) to all remaining
extension registries. Each registry now supports:

- LazyEntry types for catalog-indexed but not-yet-imported types
- setTypeLoader/ensureTypeLoaded for on-demand single-bundle imports
- registerLazy/promoteFromLazy for lazy entry lifecycle
- buildIndex on loaders for mtime-based catalog population
- loadSingleType on loaders for on-demand bundle import

The shared ExtensionCatalogStore (SQLite at .swamp/_extension_catalog.db)
is passed from CLI startup to all loaders. Search commands benefit from
zero-import catalog reads on subsequent runs.

Before/after on vault type search with @swamp/aws-sm pulled:
- Old binary: 1.56s (full import every time)
- New binary (cached): 0.70s (catalog read, no bundle import)

Closes #1062

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After rebasing onto main (which added DatastorePathResolver in #1103),
update the new catalog-related methods (populateCatalogFromRegistry,
getVaultBundlePath, getDriverBundlePath, getReportBundlePath) to use
this.resolveBundlePath() instead of hardcoded join() paths. This ensures
lazy loading works correctly with S3 datastores.

The datastore loader intentionally keeps hardcoded local paths due to
bootstrap ordering constraints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…oaders

Port fixes from the model loader to vault, driver, datastore, and report
loaders to prevent regressions of known bugs:

- #1107: Add sourceDirsFingerprint() and catalog invalidation when
  extension source directories change between runs. Uses per-kind
  fingerprint keys so loaders don't overwrite each other.

- #1094: Update findStaleFiles() to check transitive dependency mtimes
  via resolveLocalImports(), not just the entry point file. Catches edits
  to imported .ts files that don't touch the entry point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stack72 stack72 force-pushed the lazy-loading-all-registries branch from 86a0f18 to 4302aee Compare April 6, 2026 15:15
@stack72 stack72 marked this pull request as ready for review April 6, 2026 15:37
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

stack72 and others added 2 commits April 6, 2026 16:51
…lib skip

Three fixes to align the report loader with vault/driver/datastore loaders:

1. Freshness check failure now falls through to rebundle attempt instead of
   immediately returning stale cache
2. bundleExtension() wrapped in try/catch with cache fallback +
   isExpectedBundleFailure logging (critical for pulled extensions lacking
   a local deno.json import map)
3. Async discoverFiles skips _-prefixed directories (e.g., _lib/) matching
   the sync version and all other loaders

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three improvements across all extension loaders:

- importBundle now accepts baseDir and includes bundleNamespace in the
  bundle path, matching bundleWithCache. Previously the file URL import
  always failed (wrong path) and fell through to data URL overhead.

- Extract sourceDirsFingerprint to a shared export in
  extension_catalog_store.ts, removing identical copies from all 5
  loader files (model, vault, driver, datastore, report).

- Update model loader to use the shared function for consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Well-structured PR that extends the lazy per-bundle loading pattern from models to the remaining four extension registries (vault, driver, datastore, report). The shared ExtensionCatalogStore is correctly hoisted to CLI startup and passed down, the sourceDirsFingerprint utility is properly extracted for reuse, and all four registries follow the same consistent pattern.

Blocking Issues

None.

Suggestions

  1. High code duplication across loaders: The buildIndex, findStaleFiles, rebundleAndUpdateCatalog, populateCatalogFromRegistry, populateCatalogFromDir, and discoverFilesSync methods are nearly identical across all four loaders (vault, driver, datastore, report), differing only in the kind string, export name, schema, and registry calls. This is a significant amount of copy-paste (~400 lines × 4). A shared base class or a generic helper parameterized by kind/schema/registry would reduce the maintenance surface considerably. Not blocking since the pattern is proven from the model loader and all four copies are consistent.

  2. Report loader bundleWithCache refactor: The report loader's bundleWithCache method received a significant rewrite in this PR (the "try rebundle, fall back to cache" logic with isExpectedBundleFailure). This looks like a separate improvement that could have been its own commit for easier bisection, but the logic itself is sound.

  3. vault_type_search.ts imports from internal path: src/cli/commands/vault_type_search.ts:33 imports getVaultTypes from ../../domain/vaults/vault_types.ts rather than from ../../libswamp/mod.ts. This predates this PR, so not blocking here, but worth noting for future cleanup per the CLAUDE.md libswamp import boundary rule.

DDD Assessment

The changes correctly keep domain logic (registries, lazy entries, type promotion) in the domain layer and infrastructure concerns (catalog store, SQLite) in the infrastructure layer. The LazyEntry types are proper value objects (immutable metadata with no identity). The registries continue to function as domain services. The ExtensionCatalogStore remains a clean infrastructure concern. Good separation.

Test Coverage

All four registries have comprehensive lazy loading tests covering: registerLazy, ensureTypeLoaded (happy path, already loaded, unknown type), concurrent dedup, and retry after failure. The vault registry has additional tests for promoteFromLazy, getAllLazy, and registerLazy skip-if-already-loaded. Good coverage.

Copy link
Copy Markdown

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found.

Medium

  1. ReportRegistry case normalization mismatchsrc/domain/reports/report_registry.ts:65-68 vs user_report_loader.ts:363

    registerLazyFromCatalog stores lazy entries keyed by entry.type_normalized (lowercased from populateCatalogFromDir). But promoteFromLazy(parsed.data.name, ...) (line 363 of the loader) uses the original-case report name. Since promoteFromLazy does this.lazyTypes.delete(name) with the original-case key, it will miss the lowercased lazy entry if the name contains uppercase characters.

    Breaking example: A report named "Cost-Report" is cataloged as "cost-report" (lowercased). registerLazy stores it under key "cost-report". After loading, promoteFromLazy("Cost-Report", ...) tries this.lazyTypes.delete("Cost-Report") — no-op, since the key is "cost-report". The lazy entry is orphaned. If ensureTypeLoaded("cost-report") is later called, it sees the orphaned lazy, triggers re-import, and importAndRegisterBundle checks reportRegistry.get(entry.type_normalized) = get("cost-report") which misses "Cost-Report" in reports — causing a redundant bundle import.

    Not currently triggered since no production code calls reportRegistry.ensureTypeLoaded() yet, but this is a latent bug waiting to surface.

    Suggested fix: Either normalize the name in promoteFromLazy and ensureTypeLoaded (using .toLowerCase() like all other registries), or store lazy entries with the original-case key. The former is more consistent with the other registries.

  2. listAvailableTypes doesn't include lazy typessrc/libswamp/vaults/create.ts:88

    listAvailableTypes() calls vaultTypeRegistry.getAll() which only returns fully loaded types, not lazy ones. When vault creation fails with "Unknown vault type", the error message listing available types will omit extension vault types that are indexed but not yet imported. Not a correctness issue — the error message is just incomplete.

Low

  1. getVaultBundlePath / getDriverBundlePath / getDatastoreBundlePath / getReportBundlePath return empty string when repoDir is null — e.g. user_vault_loader.ts around the getVaultBundlePath method. The empty string is then passed to catalog.upsert as bundle_path. If loadSingleType later reads this entry, Deno.readTextFile("") would throw. In practice, repoDir is always set when buildIndex is called (the code path requires a repo), so this is theoretical.

  2. discoverFilesSync has unbounded recursion — All four loader implementations recursively descend directories with no depth limit. A symlink loop (even accidental) could cause a stack overflow. The async discoverFiles has the same pattern, so this is pre-existing.

Verdict

PASS — Well-structured extension of the existing lazy-loading pattern. The four registries and loaders follow a consistent template with good test coverage (lazy registration, concurrent dedup, retry-after-failure). The report registry case mismatch (Medium #1) is latent and not yet triggered in production code. The catalog store's per-kind fingerprint approach is backward-compatible with the existing model loader. The bundleWithCache refactor in the report loader improves resilience by attempting a rebundle before falling back to cache.

@stack72 stack72 merged commit 8f12f22 into main Apr 6, 2026
10 checks passed
@stack72 stack72 deleted the lazy-loading-all-registries branch April 6, 2026 16:19
stack72 added a commit that referenced this pull request Apr 6, 2026
PR #1089 introduced lazy per-bundle loading for vault, datastore, driver,
and report registries. The get() method only returns fully-loaded types,
returning undefined for lazy entries — but many consumer code paths call
get() without first calling ensureTypeLoaded() to promote lazy entries.

This caused extension types like @swamp/s3-datastore and @swamp/aws-sm
to be treated as unknown, breaking 11 UATs across vault and datastore
operations.

The fix adds await registry.ensureTypeLoaded(type) before every .get()
call across all four registries, matching the pattern already used by
resolveModelType() in extension_auto_resolver.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
stack72 added a commit that referenced this pull request Apr 6, 2026
Adds 10 tests verifying the lazy registry promotion contract introduced
by PR #1089 and fixed by PR #1116:

Category 1 (8 tests, fresh instances): Documents the registry invariant
that get() returns undefined for lazy entries and ensureTypeLoaded()
must be called first to promote them. Covers all four registries (vault,
datastore, driver, report).

Category 2 (2 tests, global singletons): Verifies that resolveVaultType()
and resolveDatastoreType() promote lazy entries before returning, so
callers can safely use get() afterwards. These tests would have caught
the PR #1089 regression — they fail without the ensureTypeLoaded() calls
added in PR #1116.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Lazy per-bundle loading for vault, driver, datastore, and report registries

1 participant