Unified Retrieval System
Platform 2.0 implements a hybrid retrieval strategy that combines graph-based memory navigation with high-performance vector search over document chunks.
Architecture
The UnifiedSearchService orchestrates multiple retrieval streams to provide the most relevant context for AI agents.
graph TD
Query[User Query] --> Unified[Unified Search Service]
Unified --> Graph[JanusGraph Retrieval]
Unified --> Vector[Qdrant Chunk Search]
Graph --> Memories[Memory Blocks]
Vector --> Chunks[Document Chunks]
Memories --> RRF[Reciprocal Rank Fusion]
Chunks --> RRF
RRF --> Final[Ranked Working Set]
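The fan-out step in the diagram can be sketched as follows. This is a minimal illustration, not the UnifiedSearchService implementation: `graph_stream`, `vector_stream`, and `run_streams` are hypothetical names, and both retrievers return canned IDs instead of calling JanusGraph or Qdrant. It assumes each stream returns candidate IDs ordered best first.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical type: a retriever takes a query and returns candidate IDs, best first.
Retriever = Callable[[str], Awaitable[List[str]]]


async def graph_stream(query: str) -> List[str]:
    # Stand-in for the JanusGraph memory retrieval; returns canned IDs.
    await asyncio.sleep(0)
    return ["mem-7", "mem-2"]


async def vector_stream(query: str) -> List[str]:
    # Stand-in for the Qdrant chunk search; returns canned IDs.
    await asyncio.sleep(0)
    return ["chunk-14", "chunk-3", "chunk-9"]


async def run_streams(query: str, streams: Dict[str, Retriever]) -> Dict[str, List[str]]:
    """Run every stream concurrently and return one ranked list per stream."""
    names = list(streams)
    results = await asyncio.gather(*(streams[name](query) for name in names))
    return dict(zip(names, results))


rankings = asyncio.run(
    run_streams("example query", {"graph": graph_stream, "vector": vector_stream})
)
print(rankings)
```

Running the streams concurrently means total latency is bounded by the slower stream, not by the sum of both. The per-stream ranked lists are what the fusion step consumes.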
Retrieval Streams
1. Graph-Based Memory Retrieval
Navigates the Knowledge Graph in JanusGraph to find memory blocks related to the subject and context. This stream is optimized for high-level facts and behavioral patterns.
2. Semantic Document Search
Performs vector similarity search in Qdrant across document chunks. This stream provides granular evidence and raw technical data that might not have been fully distilled into the graph yet.
Ranking & Fusion
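The two streams produce scores on different scales (they use different distance metrics), so their raw scores cannot be compared directly. We therefore combine the results with Reciprocal Rank Fusion (RRF), which uses only each candidate's rank position within each stream.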
RRF Calculation
The score for a candidate $d$ is calculated as:
$$\mathrm{RRFscore}(d \in D) = \sum_{r \in R} \frac{1}{k + r(d)}$$
where:
- $R$ is the set of rankings (Graph, Vector).
- $r(d)$ is the rank of candidate $d$ in ranking $r$.
- $k$ is a smoothing constant (typically 60) that dampens the influence of top-ranked results, so a high position in a single outlier ranking cannot dominate the fused score.
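The formula above can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not the platform's code: the function name `reciprocal_rank_fusion` is hypothetical, ranks are taken to be 1-based, a candidate missing from a ranking contributes nothing for that ranking, and the same candidate ID may appear in more than one stream.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(
    rankings: Dict[str, List[str]], k: int = 60
) -> List[Tuple[str, float]]:
    """Fuse several best-first ID lists into one list sorted by RRF score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        # Assumption: ranks start at 1, so the top item contributes 1 / (k + 1).
        for rank, candidate_id in enumerate(ranked_ids, start=1):
            scores[candidate_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = reciprocal_rank_fusion(
    {
        "graph": ["a", "b"],        # a is rank 1, b is rank 2
        "vector": ["b", "c", "a"],  # b is rank 1, c is rank 2, a is rank 3
    }
)
for candidate_id, score in fused:
    print(candidate_id, round(score, 6))
```

With $k = 60$, candidate `b` scores $1/62 + 1/61$ and `a` scores $1/61 + 1/63$, so `b` ranks first even though `a` tops the graph list. `c` appears in only one ranking and comes last with $1/62$.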
Source Transparency
Every result returned by the unified retrieval system includes a SourceLocation where applicable. When a result is a document chunk, the UI can use this location to pinpoint the chunk's origin in the raw PDF/DOCX file.
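One way such a result could be shaped is sketched below. Only the name SourceLocation comes from this document. Every field (`file_path`, `page`, `char_start`, `char_end`), the `RetrievalResult` wrapper, and the `describe_origin` helper are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    # Hypothetical fields; the real SourceLocation schema is not given here.
    file_path: str
    page: Optional[int] = None
    char_start: Optional[int] = None
    char_end: Optional[int] = None


@dataclass(frozen=True)
class RetrievalResult:
    # Hypothetical wrapper for one fused result.
    candidate_id: str
    stream: str  # e.g. "graph" or "vector"
    text: str
    source: Optional[SourceLocation] = None  # None when no origin applies


def describe_origin(result: RetrievalResult) -> str:
    """Return a short human-readable pointer to where a result came from."""
    if result.source is None:
        return f"{result.candidate_id}: no source location"
    loc = result.source
    page = f", page {loc.page}" if loc.page is not None else ""
    return f"{result.candidate_id}: {loc.file_path}{page}"


chunk = RetrievalResult(
    candidate_id="chunk-14",
    stream="vector",
    text="example chunk text",
    source=SourceLocation(file_path="specs/design.pdf", page=12),
)
memory = RetrievalResult(candidate_id="mem-7", stream="graph", text="example memory")

print(describe_origin(chunk))
print(describe_origin(memory))
```

Making `source` optional mirrors the "where applicable" wording: a result with no traceable file origin simply carries no location.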