Document Intelligence Layer

The Document Intelligence Layer, implemented in the memory-store-documents repository, is responsible for the structural parsing, coordinate mapping, and raw storage of source documents.

Core Responsibilities

  • Blob Storage: Managing raw binary data of documents (PDFs, images, text files) using MinIO.
  • Structural Parsing: Extracting the physical hierarchy of documents (pages, text elements, lines).
  • Coordinate Mapping: Storing the exact X/Y coordinates and bounding boxes for every piece of text.
  • Source Location Finding: Locating an extracted memory within its source document (page + coordinates) via fuzzy search.

Architecture

graph TD
Upload[User Upload] --> Ingest[Ingest API]
Ingest[Ingest API] --> MinIO[MinIO Blob Storage]
Ingest --> DB[(PostgreSQL)]
Ingest --> Task[Structural Parser]
Task --> Pages[Page & Text Element Extraction]
Pages --> DB
Pages --> Event[document.parsed Event]

Data Model (PostgreSQL)

DocumentModel

Base metadata for the document (title, hash, status, subject).

PageModel

Represents a physical page, including dimensions and rotation.

TextElementModel

The most granular unit, storing:

  • Text: The actual string content.
  • Coordinates: x, y, width, height (normalized 0.0-1.0).
  • Offsets: Character start/end within the page.
  • Style: Font, size, bold/italic markers.
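To make the normalized coordinates concrete, here is a minimal sketch of how a text element's box might be scaled back to page units for rendering. The TextElement dataclass and to_pixels helper are hypothetical illustrations, not the repository's actual classes; the sketch also assumes a top-left origin, with x/width normalized against page width and y/height against page height, which the description above does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextElement:
    """Hypothetical in-memory mirror of a TextElementModel row."""
    text: str
    x: float           # left edge, normalized 0.0-1.0 (assumed relative to page width)
    y: float           # top edge, normalized 0.0-1.0 (assumed relative to page height)
    width: float       # normalized width
    height: float      # normalized height
    char_start: int    # character offset within the page
    char_end: int
    font: str = ""
    size: float = 0.0
    bold: bool = False
    italic: bool = False


def to_pixels(el, page_width, page_height):
    """Scale a normalized box to (left, top, right, bottom) in page units."""
    left = el.x * page_width
    top = el.y * page_height
    right = (el.x + el.width) * page_width
    bottom = (el.y + el.height) * page_height
    return (left, top, right, bottom)
```

Because the stored values are resolution-independent, the same record can be drawn at any zoom level by passing the rendered page size.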

Integration Flow

  1. Ingestion: memory-store-documents saves the blob and performs structural parsing.
  2. Notification: It emits a document.parsed event via NATS.
  3. Extraction: memory-curator-worker receives the event and fetches the structural data to perform LLM-based memory extraction.
  4. Source Mapping: When a memory is created, it references the document_id. The visualizer then uses the SourceLocationFinder to highlight the specific text in the PDF.
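The notification step can be pictured as a small JSON message passed between the two services. The sketch below only shows serialization and decoding; the payload fields (document_id, page_count) and both function names are assumptions made for illustration, since the actual event schema is not described here, and the NATS publish/subscribe calls are deliberately left out.

```python
import json

SUBJECT = "document.parsed"  # event name taken from the flow above


def build_parsed_event(document_id, page_count):
    """Producer side: serialize a hypothetical payload (field names are assumptions)."""
    payload = {"document_id": document_id, "page_count": page_count}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def handle_parsed_event(data):
    """Consumer side: decode the payload and return the document_id to fetch."""
    payload = json.loads(data.decode("utf-8"))
    return payload["document_id"]
```

Carrying only identifiers in the event, and letting the worker fetch structural data separately, keeps messages small regardless of document size.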

Source Location Finder

The layer provides a specialized component, SourceLocationFinder, which allows the system to:

  1. Receive an extracted snippet of text.
  2. Search through the stored TextElementModel records using fuzzy matching.
  3. Reconstruct a SourceLocation (Page # + Bounding Boxes) for highlighting.
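The three steps above can be sketched with the standard library's difflib. This is an illustrative stand-in, not the actual SourceLocationFinder: the function name, the dict-based element shape, and the 0.8 threshold are all assumptions, and the real component may use a different matching algorithm.

```python
from difflib import SequenceMatcher


def normalize(s):
    """Lowercase and collapse whitespace so layout differences do not hurt the score."""
    return " ".join(s.lower().split())


def find_source_location(snippet, elements, min_ratio=0.8):
    """Find the run of consecutive same-page elements that best matches snippet.

    elements: list of dicts with "page", "text", and "box" (x, y, width, height),
    in reading order (a hypothetical shape). Returns a dict with "page", "boxes",
    and "score", or None when nothing reaches min_ratio.
    """
    target = normalize(snippet)
    best = None
    for start in range(len(elements)):
        page = elements[start]["page"]
        parts = []
        for end in range(start, len(elements)):
            if elements[end]["page"] != page:
                break  # a match never spans pages in this sketch
            parts.append(elements[end]["text"])
            candidate = normalize(" ".join(parts))
            if len(candidate) > 2 * len(target):
                break  # window is already far longer than the snippet
            score = SequenceMatcher(None, target, candidate).ratio()
            if best is None or score > best["score"]:
                boxes = [e["box"] for e in elements[start:end + 1]]
                best = {"page": page, "boxes": boxes, "score": score}
    if best is None or best["score"] < min_ratio:
        return None
    return best
```

The sliding window is quadratic in the number of elements, which is acceptable for a sketch; a production finder would more likely narrow candidates first, for example by page or by character offsets.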