This repository was archived by the owner on Feb 27, 2026. It is now read-only.
## Problem

The semantic-recall hook loads entity profile context using specific canonical keys:

- `timezone`
- `current_timezone`
- `communication_style`
- `expertise`
- `preferences`
- `location`
- `occupation`
- `pronouns`
The memory extraction pipeline captures relevant information but stores it under ad-hoc keys like `sleep_pattern`, `diet`, `career_transition`, `schedule`, `current_location`, `company_role`, etc. As a result, the entity profile injected into agent context is empty or sparse even when the underlying data exists in `entity_facts`.
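The shape of the gap can be sketched in a few lines (names here are illustrative, not the actual resolver API): a profile lookup that only reads canonical keys sees nothing when the extractor stored everything under ad-hoc keys.

```typescript
// Sketch of the mismatch. CANONICAL_PROFILE_KEYS mirrors the list above;
// getProfile() stands in for the real lookup and is an assumption.
const CANONICAL_PROFILE_KEYS = [
  "timezone", "current_timezone", "communication_style", "expertise",
  "preferences", "location", "occupation", "pronouns",
] as const;

type Facts = Record<string, string>;

function getProfile(facts: Facts): Record<string, string> {
  const profile: Record<string, string> = {};
  for (const key of CANONICAL_PROFILE_KEYS) {
    // Only canonical keys are consulted; ad-hoc keys are invisible here.
    if (facts[key] !== undefined) profile[key] = facts[key];
  }
  return profile;
}

// Facts were extracted, but stored under ad-hoc keys:
const facts: Facts = {
  home_timezone: "America/Chicago",
  current_location: "Austin",
  company_role: "founder",
};

// The injected profile comes back empty despite three relevant facts.
console.log(Object.keys(getProfile(facts)).length); // 0
```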
### Example

Entity #2 (I)ruid) had 66 facts, but only 2 of the 8 profile keys were populated:

- ✅ `current_timezone` → America/Chicago
- ✅ `expertise` → vulnerability research, exploitation, ...
- ❌ `communication_style` — data existed as `sleep_do_not_nag`, `acknowledgment_style`, `name_preference`
- ❌ `preferences` — data existed as `diet`, `caffeine_source`, `gambling_strategy`, `lifestyle`
- ❌ `location` — data existed as `current_location`
- ❌ `occupation` — data existed as `company_role`, `career_transition`
- ❌ `timezone` — data existed as `home_timezone`
- ❌ `pronouns` — not captured at all
## Generalizing: Canonical Key Classification

This problem extends beyond entity profiles. The memory extraction pipeline stores information under ad-hoc keys without awareness of how downstream systems consume that data. Any system that queries `entity_facts` by specific keys will hit this same gap.
### Broader Applications

- **Entity profiles** — the immediate case (`timezone`, `communication_style`, `expertise`, etc.)
- **Entity relationships** — the extractor captures facts like `partner_chris`, `partner_tara`, `partner_tabby` as flat key-value pairs, but the `entity_relationships` table in nova-relationships exists specifically to model structured relationships between entities (type, direction, metadata). The extractor should recognize relationship information and populate both `entity_facts` (for raw context) and `entity_relationships` (for structured queries like "who are I)ruid's partners?" or "who works at Trammell Ventures?").
- **Contact info** — keys like `phone`, `email`, `signal_username`, `telegram_id` are used by sender resolution; the extractor should normalize these (see #135, "feat: normalize phone numbers to E.164 format before storage")
- **Temporal facts** — some facts have time relevance (`current_location` vs `home_location`, `current_timezone` vs `home_timezone`) that the extractor should classify
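For the relationship case, the routing could look roughly like this (the record shape and key prefixes are assumptions for illustration, not the actual nova-relationships schema):

```typescript
// Hypothetical sketch: detect relationship-shaped facts and emit a
// structured record alongside the raw key/value fact, so both
// entity_facts and entity_relationships can be populated.
interface RelationshipRecord {
  type: string;                      // e.g. "partner", "employer"
  target: string;                    // the related entity
  direction: "outbound" | "inbound";
}

const RELATIONSHIP_PREFIXES = ["partner_", "employer_", "collaborator_"];

function classifyFact(key: string, value: string):
    { fact: [string, string]; relationship?: RelationshipRecord } {
  for (const prefix of RELATIONSHIP_PREFIXES) {
    if (key.startsWith(prefix)) {
      return {
        fact: [key, value], // keep the raw fact for context
        relationship: {
          type: prefix.slice(0, -1), // "partner_" -> "partner"
          target: value,
          direction: "outbound",
        },
      };
    }
  }
  return { fact: [key, value] }; // not relationship-shaped
}

const r = classifyFact("partner_chris", "Chris");
console.log(r.relationship?.type); // "partner"
```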
### Design Principle
The extraction pipeline should have a canonical key registry — a defined set of well-known keys that downstream systems depend on, with classification rules that map extracted information to those keys. Ad-hoc keys remain useful for raw detail, but canonical keys ensure interoperability.
This is essentially a schema contract between the extractor (producer) and downstream consumers (semantic-recall hook, entity resolver, relationship queries, etc.).
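One possible shape for that contract, sketched as a typed config (the issue leaves the storage format open — a database table or config file would both work, so this structure is an assumption):

```typescript
// Sketch of a canonical key registry: each rule names a canonical key
// that downstream consumers query, plus the ad-hoc keys that feed it.
interface CanonicalKeyRule {
  canonical: string;      // key downstream systems depend on
  adHocSources: string[]; // extractor keys that map into it
  aggregate: boolean;     // combine multiple sources into one value?
}

const REGISTRY: CanonicalKeyRule[] = [
  { canonical: "timezone",    adHocSources: ["home_timezone", "current_timezone"], aggregate: false },
  { canonical: "preferences", adHocSources: ["diet", "caffeine_source", "lifestyle"], aggregate: true },
  { canonical: "occupation",  adHocSources: ["company_role", "career_transition"], aggregate: true },
];

// Either side of the contract can ask which canonical key an ad-hoc
// key feeds:
function canonicalFor(adHocKey: string): string | undefined {
  return REGISTRY.find(r => r.adHocSources.includes(adHocKey))?.canonical;
}

console.log(canonicalFor("diet")); // "preferences"
```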
## Proposed Solution

The memory extraction pipeline should:

1. **Recognize profile-relevant information** when extracting entity facts
2. **Map to canonical profile keys** in addition to (or instead of) ad-hoc keys
3. **Aggregate related facts into the profile key** — e.g., combine `diet`, `caffeine_source`, `lifestyle`, `gambling_strategy` into a single `preferences` value
4. **Update profile keys incrementally** — when new relevant info is learned, append/update the canonical key rather than only creating new ad-hoc keys
5. **Populate `entity_relationships`** when relationship information is detected (e.g., partner, employer, collaborator)
### Canonical Profile Keys

| Canonical key | Ad-hoc keys where the data currently lives |
| --- | --- |
| `timezone` | `home_timezone`, `current_timezone` |
| `communication_style` | `acknowledgment_style`, `name_preference`, `sleep_do_not_nag` |
| `expertise` | `expertise`, `research`, `contributions` |
| `preferences` | `diet`, `caffeine_source`, `lifestyle`, `schedule`, `gambling_strategy` |
| `location` | `current_location` |
| `occupation` | `company_role`, `career_transition`, `business_name` |
| `pronouns` | (not currently captured — extractor should identify these) |
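The ad-hoc-to-canonical mapping and aggregation steps could be sketched as follows (the mapping entries and the join-with-`"; "` merge strategy are assumptions for illustration, not the pipeline's actual behavior):

```typescript
// Sketch: fold ad-hoc facts into canonical profile keys. Ad-hoc keys
// not in the mapping are left alone (they stay in entity_facts).
const AD_HOC_TO_CANONICAL: Record<string, string> = {
  home_timezone: "timezone",
  diet: "preferences",
  caffeine_source: "preferences",
  lifestyle: "preferences",
  current_location: "location",
  company_role: "occupation",
  career_transition: "occupation",
};

function buildProfile(facts: Record<string, string>): Record<string, string> {
  const profile: Record<string, string> = {};
  for (const [key, value] of Object.entries(facts)) {
    const canonical = AD_HOC_TO_CANONICAL[key];
    if (!canonical) continue;
    // Aggregate related facts into one canonical value.
    profile[canonical] = profile[canonical]
      ? `${profile[canonical]}; ${value}`
      : value;
  }
  return profile;
}

const profile = buildProfile({
  diet: "vegetarian",
  caffeine_source: "cold brew",
  home_timezone: "America/Chicago",
});
console.log(profile.preferences); // "vegetarian; cold brew"
```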
## Implementation Options

1. **Post-extraction mapping** — after extracting facts, check if any map to profile keys and upsert
2. **Extraction-time classification** — train the extractor prompt to recognize profile categories and canonical keys
3. **Periodic aggregation job** — a cron job that scans `entity_facts` and builds/updates profile keys
Option 2 is preferred as it catches information at the source. A canonical key registry (possibly a database table or config file) would define the contract.
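As a fallback, Option 1 would reduce to a small pass over newly written facts; a rough sketch (the `upsertFact` helper and its signature are hypothetical, not an existing API):

```typescript
// Sketch of Option 1 (post-extraction mapping): after facts are
// written, mirror any that map to a canonical key via an upsert.
type UpsertFn = (entityId: number, key: string, value: string) => void;

function postExtractionPass(
  entityId: number,
  newFacts: Record<string, string>,
  mapping: Record<string, string>, // ad-hoc key -> canonical key
  upsertFact: UpsertFn,            // hypothetical persistence helper
): void {
  for (const [adHocKey, value] of Object.entries(newFacts)) {
    const canonical = mapping[adHocKey];
    if (canonical) upsertFact(entityId, canonical, value);
  }
}

// Usage with an in-memory store standing in for entity_facts:
const store = new Map<string, string>();
postExtractionPass(
  2,
  { home_timezone: "America/Chicago" },
  { home_timezone: "timezone" },
  (id, k, v) => store.set(`${id}:${k}`, v),
);
console.log(store.get("2:timezone")); // "America/Chicago"
```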
## Context

- `hooks/semantic-recall/handler.ts`
- `nova-relationships/lib/entity-resolver/resolver.ts` (`getEntityProfile()`)
- `entity_relationships` in nova-relationships schema