Add strict_provenance config flag for upstream-only enforcement in make() #1425
Summary
Add a dj.config["strict_provenance"] flag that, when enabled, enforces the upstream-only convention at runtime: inside make(), self.upstream is the only way to read data, and self (including its Part tables) is the only way to write.
Context
Discussion: #1232
Depends on: #1423 (Diagram.trace()), #1424 (self.upstream in make())
Problem
The upstream-only convention — that make() should only access declared upstream dependencies — is the foundation of DataJoint's provenance guarantee. Today the framework defines this convention but does not enforce it. A make() method can fetch() from any table, making the undeclared dependency invisible to the provenance graph. Similarly, make() can insert into arbitrary tables, not just the target table and its parts.
Design
dj.config["strict_provenance"] = True
Read enforcement
When enabled, only self.upstream[Table] can access data inside make(). Direct fetch() / to_dicts() / to_pandas() / to_arrays() calls on table objects that are not part of the pre-restricted ancestor graph raise an error.
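A toy model may help make the read rule concrete. The sketch below does not use DataJoint; UpstreamView, ProvenanceError, and the string table names are hypothetical stand-ins for self.upstream, DataJointError, and table classes. It assumes that "pre-restricted" means each ancestor's rows are filtered to those that agree with the current key.

```python
class ProvenanceError(Exception):
    """Hypothetical stand-in for the DataJointError strict mode would raise."""


class UpstreamView:
    """Toy model of self.upstream: declared ancestors, pre-restricted by key."""

    def __init__(self, ancestors, key):
        self._ancestors = ancestors  # table name -> list of row dicts
        self._key = key

    def __getitem__(self, table_name):
        if table_name not in self._ancestors:
            raise ProvenanceError(f"{table_name!r} is not a declared ancestor")
        # Keep only rows that agree with the key on every attribute they share.
        return [
            row
            for row in self._ancestors[table_name]
            if all(row[k] == v for k, v in self._key.items() if k in row)
        ]


# Hypothetical ancestor data for a make() call with key {"session_id": 1}.
ancestors = {
    "Session": [
        {"session_id": 1, "date": "2024-01-01"},
        {"session_id": 2, "date": "2024-01-02"},
    ],
}
upstream = UpstreamView(ancestors, key={"session_id": 1})

session_rows = upstream["Session"]  # only the row for session_id 1

try:
    upstream["Unrelated"]  # not a declared ancestor
    read_blocked = False
except ProvenanceError:
    read_blocked = True
```

In this model an undeclared table is unreachable by construction, and a declared one can only return rows belonging to the entity being computed.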
Write enforcement
When enabled, inserts inside make() are restricted to:
- self: the target table being populated
- self's Part tables, e.g. self.PartName.insert(...)
Inserts into any other table raise an error. Additionally, every inserted row's primary key must be consistent with the current key — preventing make() from inserting rows for keys it wasn't called with.
| Operation | Allowed target | Blocked |
|---|---|---|
| Read (fetch) | self.upstream[Ancestor] | All other tables |
| Write (insert) | self and self's Part tables | All other tables |
| Key scope | Must match current key | Mismatched primary keys |
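The write and key-scope rules could be checked along the following lines. This is a standalone sketch, not DataJoint code: check_insert, ProvenanceError, and the table names are hypothetical, and it assumes "consistent with the current key" means that every attribute in key appears in the row with the same value, while extra attributes (such as a Part table's own key fields) are allowed.

```python
class ProvenanceError(Exception):
    """Hypothetical stand-in for the error strict mode would raise."""


def check_insert(target, allowed_targets, rows, key):
    """Validate one insert call against the strict-mode write rules."""
    if target not in allowed_targets:
        raise ProvenanceError(f"insert into {target!r} is not allowed inside make()")
    for row in rows:
        # Assumption: the row must repeat every attribute of key unchanged;
        # additional attributes are not checked.
        for name, value in key.items():
            if row.get(name) != value:
                raise ProvenanceError(
                    f"row has {name}={row.get(name)!r}, expected {value!r}"
                )


key = {"session_id": 3}
allowed = {"Analysis", "Analysis.Trace"}  # self and its Part tables

# Both inserts target allowed tables and carry the current key: no error.
check_insert("Analysis", allowed, [{"session_id": 3, "score": 0.9}], key)
check_insert("Analysis.Trace", allowed, [{"session_id": 3, "trace_id": 0}], key)

errors = []
for target, rows in [
    ("OtherTable", [{"session_id": 3}]),  # wrong target
    ("Analysis", [{"session_id": 4}]),    # key mismatch
]:
    try:
        check_insert(target, allowed, rows, key)
    except ProvenanceError as err:
        errors.append(str(err))
```

Whether a row that omits a key attribute should be rejected or auto-completed from key is a design choice the proposal leaves open; the sketch rejects it.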
Default behavior
When not enabled (the default), everything works as before. Zero breaking changes.
Provenance guarantee
Strict mode ensures:
- The trace only contains declared ancestors (provenance-complete)
- The trace is restricted by the key (no access to unrelated entities)
- Writes are scoped to the target table and its parts with matching keys
- Every data access is mediated and auditable
- Undeclared dependencies become impossible, not just unconventional
Implementation approach
The framework sets a context flag during make() execution.
- Read gating: Query execution on table objects checks whether strict provenance is active and whether the table is part of the current self.upstream ancestor graph. If not, it raises a DataJointError.
- Write gating: Insert calls check whether the target is self or one of self's Part tables, and whether the primary key is consistent with the current key. The existing _allow_insert mechanism on the class can be narrowed to enforce this.
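One possible shape for the context flag is a contextvars variable that is set when make() starts and reset when it returns. The sketch below is an assumption about how this could be done, not the planned implementation; make_scope, guarded_fetch, and ProvenanceError are hypothetical names, and tables are represented by plain strings.

```python
import contextvars
from contextlib import contextmanager

# None means "no make() in progress"; otherwise the set of readable table names.
_active_upstream = contextvars.ContextVar("active_upstream", default=None)


class ProvenanceError(Exception):
    """Hypothetical stand-in for the DataJointError strict mode would raise."""


@contextmanager
def make_scope(upstream_tables):
    """Mark the duration of one make() call and record its ancestor graph."""
    token = _active_upstream.set(frozenset(upstream_tables))
    try:
        yield
    finally:
        # Restore the previous state even if make() raises.
        _active_upstream.reset(token)


def guarded_fetch(table_name):
    """Toy query entry point that consults the context flag before reading."""
    allowed = _active_upstream.get()
    if allowed is not None and table_name not in allowed:
        raise ProvenanceError(f"{table_name!r} is outside the upstream graph")
    return f"rows from {table_name}"


outside = guarded_fetch("Unrelated")  # no make() in progress: unrestricted

with make_scope({"Session"}):
    inside = guarded_fetch("Session")  # declared ancestor: allowed
    try:
        guarded_fetch("Unrelated")
        blocked = False
    except ProvenanceError:
        blocked = True

after = guarded_fetch("Unrelated")  # flag was reset when the scope exited
```

A context variable, unlike a module-level global, keeps the flag isolated per thread and per asyncio task, which matters if several make() calls run concurrently in one process.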
This is an operational concern, not a schema property: the same schema definitions work in both modes. Teams can enable it globally without touching schema constructors, which makes it practical to turn strict mode on in production while leaving it off during development and debugging.