Unified Retrieval System

Platform 2.0 implements a hybrid retrieval strategy that combines graph-based memory navigation with high-performance vector search over document chunks.

Architecture

The UnifiedSearchService orchestrates multiple retrieval streams to provide the most relevant context for AI agents.

graph TD
    Query[User Query] --> Unified[Unified Search Service]
    Unified --> Graph[JanusGraph Retrieval]
    Unified --> Vector[Qdrant Chunk Search]
    
    Graph --> Memories[Memory Blocks]
    Vector --> Chunks[Document Chunks]
    
    Memories --> RRF[Reciprocal Rank Fusion]
    Chunks --> RRF
    
    RRF --> Final[Ranked Working Set]
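
The fan-out in the diagram can be sketched as follows. This is a minimal illustration, not the actual UnifiedSearchService: `graph_retrieve`, `vector_retrieve`, and `retrieve_streams` are hypothetical names, the returned IDs are made up, and running the two streams concurrently is an assumption (the diagram shows only that both streams are queried).

```python
import asyncio


async def graph_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for the JanusGraph traversal; returns memory block IDs, best first.
    return ["memory-1", "memory-2"]


async def vector_retrieve(query: str) -> list[str]:
    # Hypothetical stand-in for the Qdrant chunk search; returns chunk IDs, best first.
    return ["chunk-9", "chunk-4"]


async def retrieve_streams(query: str) -> dict[str, list[str]]:
    # Assumption: both streams are queried concurrently and awaited together.
    graph_hits, vector_hits = await asyncio.gather(
        graph_retrieve(query),
        vector_retrieve(query),
    )
    # Each stream keeps its own ranked list; fusion happens in a later step.
    return {"graph": graph_hits, "vector": vector_hits}


streams = asyncio.run(retrieve_streams("example query"))
```

Keeping each stream's ranked list separate until the fusion step means neither backend needs to know how the other scores its results.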

Retrieval Streams

1. Graph-Based Memory Retrieval

Navigates the Knowledge Graph in JanusGraph to find memory blocks related to the subject and context. This stream is optimized for high-level facts and behavioral patterns.

2. Semantic Document Search

Performs vector similarity search in Qdrant across document chunks. This stream provides granular evidence and raw technical data that might not have been fully distilled into the graph yet.

Ranking & Fusion

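The graph and vector streams score results on different scales (each system uses its own distance metric), so their raw scores cannot be compared directly. We therefore merge the streams with Reciprocal Rank Fusion (RRF), which uses only each candidate's rank position within a stream.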

RRF Calculation

The score for a candidate $d$ is calculated as:

$$\mathrm{RRFscore}(d \in D) = \sum_{r \in R} \frac{1}{k + r(d)}$$

where:

$R$ is the set of rankings (Graph, Vector).
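$r(d)$ is the rank of candidate $d$ in ranking $r$ (1 for the top result).
$k$ is a smoothing constant (typically 60) that dampens the weight of top-ranked positions, so a high rank in a single outlier ranking cannot dominate the fused score.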
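
The formula above can be sketched in Python as follows. This is an illustrative implementation, not the platform's own code: the function name `reciprocal_rank_fusion` and the candidate IDs are hypothetical, ranks are assumed to start at 1, and a candidate missing from a ranking is assumed to contribute nothing to the sum.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of candidate IDs into [(candidate, score)], best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # enumerate from 1 so the top result of each ranking has rank 1
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / (k + rank)
    # Sort by descending score; break ties by candidate ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up rankings: "a" and "b" appear in both lists, "c" and "d" in only one.
graph_ranking = ["a", "b", "c"]
vector_ranking = ["b", "d", "a"]

fused = reciprocal_rank_fusion([graph_ranking, vector_ranking])
fused_ids = [candidate for candidate, _ in fused]
```

With $k = 60$, candidate `b` scores $1/62 + 1/61$ and candidate `a` scores $1/61 + 1/63$, so `b` ranks first; `d` ($1/62$) and `c` ($1/63$) follow because each appears in only one ranking.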

Source Transparency

Every result returned by the unified retrieval system includes a SourceLocation where applicable. When a result is a document chunk, the UI can use this location to pinpoint the chunk's origin in the raw PDF/DOCX file.
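
One way to model an optional source location is sketched below. Only the name SourceLocation comes from this page; the fields (`document_path`, `page`), the `RetrievalResult` wrapper, and the `describe_origin` helper are assumptions made for illustration, and the real schema may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    # Hypothetical fields; the actual SourceLocation schema is not specified here.
    document_path: str
    page: Optional[int] = None


@dataclass(frozen=True)
class RetrievalResult:
    # Hypothetical result wrapper: `source` stays None when no location applies.
    candidate_id: str
    text: str
    source: Optional[SourceLocation] = None


def describe_origin(result: RetrievalResult) -> str:
    """Return a short label a UI could show for a result's origin."""
    if result.source is None:
        return "no source location"
    if result.source.page is None:
        return result.source.document_path
    return f"{result.source.document_path}, page {result.source.page}"


chunk = RetrievalResult(
    candidate_id="chunk-9",
    text="example chunk text",
    source=SourceLocation(document_path="manual.pdf", page=12),
)
memory = RetrievalResult(candidate_id="memory-1", text="example memory text")
```

Making `source` optional mirrors the "where applicable" wording: a result without a traceable file simply carries no location.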
