This repository was archived by the owner on Feb 27, 2026. It is now read-only.

feat: memory extractor should populate canonical entity profile keys #138

@NOVA-Openclaw

Problem

The semantic-recall hook loads entity profile context using specific canonical keys:

  • timezone
  • current_timezone
  • communication_style
  • expertise
  • preferences
  • location
  • occupation
  • pronouns

The memory extraction pipeline captures relevant information but stores it under ad-hoc keys like sleep_pattern, diet, career_transition, schedule, current_location, company_role, etc. As a result, the entity profile injected into agent context is empty or sparse even when the underlying data exists in entity_facts.
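
To make the gap concrete, here is a minimal sketch. It assumes an entity's rows in entity_facts can be viewed as a plain key-to-value mapping; `load_profile` is a hypothetical stand-in for the hook's lookup, not its actual code, and every value except the timezone is a placeholder.

```python
# Hypothetical stand-in for the semantic-recall hook's profile lookup.
CANONICAL_PROFILE_KEYS = [
    "timezone", "current_timezone", "communication_style", "expertise",
    "preferences", "location", "occupation", "pronouns",
]

def load_profile(facts: dict) -> dict:
    """Return only the facts stored under a canonical profile key."""
    return {k: facts[k] for k in CANONICAL_PROFILE_KEYS if k in facts}

# Facts as the extractor stores them today: mostly ad-hoc keys.
# Values other than the timezone are placeholders, not real data.
facts = {
    "current_timezone": "America/Chicago",
    "home_timezone": "<placeholder>",
    "current_location": "<placeholder>",
    "company_role": "<placeholder>",
    "diet": "<placeholder>",
}

profile = load_profile(facts)
missing = [k for k in CANONICAL_PROFILE_KEYS if k not in profile]
print(profile)  # only current_timezone survives the lookup
print(missing)  # canonical keys with no value, even though related data exists
```

Only one of the five stored facts reaches the profile; the other four are invisible to any consumer that queries by canonical key.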

Example

Entity #2 (I)ruid) had 66 facts, but only 2 of the 8 profile keys listed above were populated:

  • current_timezone: populated (America/Chicago)
  • expertise: populated (vulnerability research, exploitation, ...)
  • communication_style: empty; data existed as sleep_do_not_nag, acknowledgment_style, name_preference
  • preferences: empty; data existed as diet, caffeine_source, gambling_strategy, lifestyle
  • location: empty; data existed as current_location
  • occupation: empty; data existed as company_role, career_transition
  • timezone: empty; data existed as home_timezone
  • pronouns: empty; not captured at all

Generalizing: Canonical Key Classification

This problem extends beyond entity profiles. The memory extraction pipeline stores information under ad-hoc keys without awareness of how downstream systems consume that data. Any system that queries entity_facts by specific keys will hit this same gap.

Broader Applications

  • Entity profiles: the immediate case (timezone, communication_style, expertise, etc.)
  • Entity relationships: the extractor captures facts like partner_chris, partner_tara, partner_tabby as flat key-value pairs, but the entity_relationships table in nova-relationships exists specifically to model structured relationships between entities (type, direction, metadata). The extractor should recognize relationship information and populate both entity_facts (for raw context) and entity_relationships (for structured queries like "who are I)ruid's partners?" or "who works at Trammell Ventures?").
  • Contact methods: keys like phone, email, signal_username, telegram_id are used by sender resolution; the extractor should normalize these (see #135, "feat: normalize phone numbers to E.164 format before storage", for E.164 phone normalization)
  • Temporal facts: some facts have time relevance (current_location vs home_location, current_timezone vs home_timezone) that the extractor should classify
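
The categories above could be told apart with a small classifier. The sketch below is only a heuristic on key names, under the assumption that keys follow the `prefix_rest` pattern seen in the examples; `classify_key` and the prefix lists are hypothetical, and a real extractor would more likely classify from the fact's content than from its key.

```python
# Contact keys named in this issue as used by sender resolution.
CONTACT_KEYS = {"phone", "email", "signal_username", "telegram_id"}
# Assumed prefixes: "partner" appears in the examples; the others are guesses.
RELATIONSHIP_PREFIXES = ("partner", "employer", "collaborator")
TEMPORAL_PREFIXES = ("current", "home")

def classify_key(key: str) -> dict:
    """Classify an ad-hoc fact key by name alone (heuristic sketch)."""
    if key in CONTACT_KEYS:
        return {"kind": "contact", "key": key}
    prefix, _, rest = key.partition("_")
    if prefix in RELATIONSHIP_PREFIXES and rest:
        # Candidate row for entity_relationships: type plus target entity.
        return {"kind": "relationship", "type": prefix, "target": rest}
    if prefix in TEMPORAL_PREFIXES and rest:
        # "current" vs "home" scope of the same underlying attribute.
        return {"kind": "temporal", "scope": prefix, "key": rest}
    return {"kind": "ad_hoc", "key": key}

for key in ["partner_chris", "current_location", "home_timezone", "phone", "diet"]:
    print(key, "->", classify_key(key))
```

A relationship result would be written to both entity_facts and entity_relationships, while the other kinds stay in entity_facts with the extra classification attached.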

Design Principle

The extraction pipeline should have a canonical key registry — a defined set of well-known keys that downstream systems depend on, with classification rules that map extracted information to those keys. Ad-hoc keys remain useful for raw detail, but canonical keys ensure interoperability.

This is essentially a schema contract between the extractor (producer) and downstream consumers (semantic-recall hook, entity resolver, relationship queries, etc.).
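
One way to picture the contract is a registry that records which consumers depend on each key, so a consumer can check its requirements against it. This is a sketch under assumptions: `CanonicalKey`, `REGISTRY`, `missing_from_registry`, and the consumer names are all hypothetical, and the registry could equally live in a database table or config file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class CanonicalKey:
    name: str
    description: str
    consumers: Tuple[str, ...]  # downstream systems that read this key

# A few illustrative entries; a real registry would list every canonical key.
REGISTRY: Dict[str, CanonicalKey] = {
    k.name: k
    for k in [
        CanonicalKey("timezone", "Home timezone", ("semantic-recall",)),
        CanonicalKey("pronouns", "Gender pronouns", ("semantic-recall",)),
        CanonicalKey("phone", "Phone number", ("sender-resolution",)),
    ]
}

def missing_from_registry(required: List[str]) -> List[str]:
    """Keys a consumer depends on that the registry does not define."""
    return [k for k in required if k not in REGISTRY]

# A consumer declares what it reads; anything returned here breaks the contract.
print(missing_from_registry(["timezone", "location"]))
```

Running such a check in CI or at startup would surface a mismatch between producer and consumer before it shows up as an empty profile.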

Proposed Solution

The memory extraction pipeline should:

  1. Recognize profile-relevant information when extracting entity facts
  2. Map to canonical profile keys in addition to (or instead of) ad-hoc keys
  3. Aggregate related facts into the profile key — e.g., combine diet, caffeine_source, lifestyle, gambling_strategy into a single preferences value
  4. Update profile keys incrementally — when new relevant info is learned, append/update the canonical key rather than only creating new ad-hoc keys
  5. Populate entity_relationships when relationship information is detected (e.g., partner, employer, collaborator)
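
Step 4 (incremental updates) can be sketched as a merge that appends new items to the existing canonical value instead of overwriting it. The semicolon-separated string format and the `merge_profile_value` helper are assumptions for illustration; the issue does not specify how aggregated values are stored, and the sample values are made up.

```python
from typing import List, Optional

def merge_profile_value(existing: Optional[str], new_items: List[str]) -> str:
    """Append new items to a semicolon-separated value, skipping duplicates."""
    items = [p.strip() for p in existing.split(";") if p.strip()] if existing else []
    seen = {i.lower() for i in items}
    for item in new_items:
        item = item.strip()
        # Case-insensitive de-duplication keeps repeated extractions idempotent.
        if item and item.lower() not in seen:
            items.append(item)
            seen.add(item.lower())
    return "; ".join(items)

# Illustrative values only.
value = merge_profile_value(None, ["vegetarian"])
value = merge_profile_value(value, ["Vegetarian", "night owl"])
print(value)
```

Because the merge is idempotent, re-running extraction over the same conversation does not grow the canonical value.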

Canonical Profile Keys

| Key | Description | Sources to aggregate |
| --- | --- | --- |
| timezone | Home timezone | home_timezone, current_timezone |
| communication_style | How the person prefers to interact | acknowledgment_style, name_preference, sleep_do_not_nag |
| expertise | Professional/technical skills | expertise, research, contributions |
| preferences | Lifestyle, diet, habits | diet, caffeine_source, lifestyle, schedule, gambling_strategy |
| location | Current location | current_location |
| occupation | Current role/career | company_role, career_transition, business_name |
| pronouns | Gender pronouns | (not currently captured; the extractor should identify these) |
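
The table above translates directly into a source map. The following sketch aggregates whatever source facts exist into each canonical key; `SOURCES` copies the table, while `build_profile` and the "source: value" output format are assumptions, and the sample values are placeholders.

```python
# Source keys per canonical key, copied from the table above.
# pronouns has no sources yet, so it is omitted here.
SOURCES = {
    "timezone": ["home_timezone", "current_timezone"],
    "communication_style": ["acknowledgment_style", "name_preference", "sleep_do_not_nag"],
    "expertise": ["expertise", "research", "contributions"],
    "preferences": ["diet", "caffeine_source", "lifestyle", "schedule", "gambling_strategy"],
    "location": ["current_location"],
    "occupation": ["company_role", "career_transition", "business_name"],
}

def build_profile(facts: dict) -> dict:
    """Aggregate available source facts into canonical profile keys."""
    profile = {}
    for canonical, sources in SOURCES.items():
        parts = [f"{src}: {facts[src]}" for src in sources if src in facts]
        if parts:  # leave the key absent rather than storing an empty value
            profile[canonical] = "; ".join(parts)
    return profile

# Placeholder values; only the key names come from this issue.
facts = {"home_timezone": "<tz>", "diet": "<diet>", "lifestyle": "<lifestyle>"}
print(build_profile(facts))
```

For single-valued keys such as timezone, taking the first available source instead of concatenating may be a better fit; that choice is left open here.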

Implementation Options

  1. Post-extraction mapping: after extracting facts, check whether any map to profile keys and upsert them
  2. Extraction-time classification: extend the extractor prompt so it recognizes profile categories and emits canonical keys directly
  3. Periodic aggregation job: a cron job that scans entity_facts and builds/updates profile keys

Option 2 is preferred as it catches information at the source. A canonical key registry (possibly a database table or config file) would define the contract.
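
For Option 2, the registry could be rendered into the extractor prompt so the two never drift apart. This is a rough sketch: `render_key_instructions` and the prompt wording are hypothetical, and the registry is reduced to a key-to-description mapping for brevity.

```python
def render_key_instructions(registry: dict) -> str:
    """Render canonical keys as a prompt fragment for the extractor."""
    lines = [
        "When a fact fits one of these canonical keys, store it under that key",
        "(ad-hoc keys are still allowed for extra detail):",
    ]
    for key, description in registry.items():
        lines.append(f"- {key}: {description}")
    return "\n".join(lines)

# Descriptions taken from the Canonical Profile Keys table.
registry = {"timezone": "Home timezone", "pronouns": "Gender pronouns"}
prompt_fragment = render_key_instructions(registry)
print(prompt_fragment)
```

Generating the fragment from the registry means adding a canonical key is a one-place change, rather than an edit to both the registry and the prompt.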
