
Working Memory

Working Memory provides a high-speed, Redis-backed session buffer for real-time access to recent context. While the main memory store (Qdrant + JanusGraph) handles long-term retrieval, Working Memory delivers sub-10ms access to the current conversation context.

Three-Tier Memory Architecture

The Memory Platform uses a tiered approach:

| Tier | Store | Latency | Use Case |
|---|---|---|---|
| Working Memory | Redis | Under 10ms | Current session, recent messages |
| Short-term Cache | Redis | 50-100ms | Last 24 hours of memories |
| Long-term Store | Qdrant + PostgreSQL | 100-500ms | Full semantic retrieval |

Session Structure

A Working Memory session contains:

| Field | Description |
|---|---|
| Session ID | Unique identifier for the session |
| Tenant ID | Organization context |
| Subject | Who the session is about |
| Items | Recent messages/context items |
| Token Count | Total tokens in the buffer |
| Primed Memory IDs | Pre-loaded memory references |
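
For client code, it can help to mirror these fields in a small typed structure. The sketch below is one possible shape, not the platform's actual schema: the class names, the attribute names, and the per-item `tokens` field are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextItem:
    """One buffered message or context item (field names are assumed)."""
    role: str
    content: str
    tokens: int


@dataclass
class WorkingMemorySession:
    """Hypothetical client-side mirror of the session fields listed above."""
    session_id: str
    tenant_id: str
    subject: str
    items: list[ContextItem] = field(default_factory=list)
    primed_memory_ids: list[str] = field(default_factory=list)

    @property
    def token_count(self) -> int:
        # Total tokens in the buffer, derived from the buffered items.
        return sum(item.tokens for item in self.items)
```

Deriving `token_count` from the items keeps the total from drifting out of sync with the buffer; a real client might instead trust the count reported by the server.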

Using Working Memory

In the Visualizer

Navigate to Working Memory in the sidebar to:

  1. View active session status
  2. Monitor buffer contents
  3. See token usage and item count
  4. Inspect individual context items

Session Management

Sessions are automatically created when needed. You can:

  • View session contents in real-time
  • Monitor token usage for context window management
  • See which memories are "primed" for quick access

API Usage

Get or Create Session

GET /api/v1/working-memory/sessions/{session_id}
Headers:
x-tenant-id: your-tenant-id

If the session doesn't exist, create it:

POST /api/v1/working-memory/sessions
Headers:
x-tenant-id: your-tenant-id
Content-Type: application/json

{
  "session_id": "chat-12345",
  "tenant_id": "your-tenant-id",
  "ttl": 3600
}
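
The two calls above can be combined into a get-or-create helper. This is a minimal sketch using only the Python standard library. The host in `BASE_URL` is a placeholder, and the status codes are assumptions: the sketch treats 404 as "session not found" and 200/201 as success, which the documentation above does not confirm.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://memory.example.com"  # placeholder host, not from the docs
SESSIONS = "/api/v1/working-memory/sessions"


def http_send(req):
    """Send a request and return (status_code, parsed_json_or_None)."""
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None


def build_get(session_id, tenant_id):
    url = f"{BASE_URL}{SESSIONS}/{urllib.parse.quote(session_id, safe='')}"
    return urllib.request.Request(
        url, headers={"x-tenant-id": tenant_id}, method="GET"
    )


def build_create(session_id, tenant_id, ttl=3600):
    body = {"session_id": session_id, "tenant_id": tenant_id, "ttl": ttl}
    return urllib.request.Request(
        BASE_URL + SESSIONS,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-tenant-id": tenant_id, "Content-Type": "application/json"},
        method="POST",
    )


def get_or_create_session(session_id, tenant_id, ttl=3600, send=http_send):
    """Fetch the session, creating it if the lookup reports it missing."""
    status, body = send(build_get(session_id, tenant_id))
    if status == 200:
        return body
    if status != 404:  # assumption: 404 means the session does not exist
        raise RuntimeError(f"unexpected status {status} on session lookup")
    status, body = send(build_create(session_id, tenant_id, ttl))
    if status not in (200, 201):  # assumption: either code signals success
        raise RuntimeError(f"unexpected status {status} on session create")
    return body
```

The `send` parameter exists so the transport can be swapped out, for example with a stub in tests or a client that adds authentication.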

Add Message to Session

POST /api/v1/working-memory/sessions/{session_id}/messages
Headers:
x-tenant-id: your-tenant-id
Content-Type: application/json

{
  "role": "user",
  "content": "What was discussed in our last meeting?",
  "timestamp": "2024-01-15T10:30:00Z"
}
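
As an illustration, the request above could be built in Python as follows. The `BASE_URL` host and the function name are placeholders. The sketch assumes the `timestamp` field expects UTC in the same `YYYY-MM-DDTHH:MM:SSZ` form as the example payload.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://memory.example.com"  # placeholder host, not from the docs


def build_message_request(session_id, tenant_id, role, content, when=None):
    """Build the POST that appends one message to a session."""
    # Default to the current time; convert any aware datetime to UTC.
    # Note: a naive datetime is interpreted as local time by astimezone().
    when = when or datetime.now(timezone.utc)
    payload = {
        "role": role,
        "content": content,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    sid = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/working-memory/sessions/{sid}/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-tenant-id": tenant_id, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_message_request(
        "chat-12345", "your-tenant-id", "user",
        "What was discussed in our last meeting?",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Pass the returned object to `urllib.request.urlopen` to actually send it.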

Prime Memories

Pre-load specific memories for fast access:

POST /api/v1/working-memory/sessions/{session_id}/prime
Headers:
x-tenant-id: your-tenant-id
Content-Type: application/json

["mem_abc123", "mem_def456"]
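
Note that the request body is a bare JSON array rather than an object. A minimal Python sketch of building this request is shown below. The `BASE_URL` host and the function name are placeholders, and the client-side validation is an added safeguard, not something the API is documented to require.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://memory.example.com"  # placeholder host, not from the docs


def build_prime_request(session_id, tenant_id, memory_ids):
    """Build the POST that primes a session with the given memory IDs."""
    ids = list(memory_ids)
    # Client-side sanity check (an assumption, not a documented API rule).
    if not ids or not all(isinstance(m, str) and m for m in ids):
        raise ValueError("memory_ids must be a non-empty list of non-empty strings")
    sid = urllib.parse.quote(session_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/working-memory/sessions/{sid}/prime",
        data=json.dumps(ids).encode("utf-8"),  # body is a bare JSON array
        headers={"x-tenant-id": tenant_id, "Content-Type": "application/json"},
        method="POST",
    )
```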

Configuration

Working Memory sessions are bounded by a time-to-live (TTL), an item cap, and a token limit:

  • Default TTL: 3600 seconds (1 hour)
  • Maximum Items: 50 context items
  • Token Limit: Configurable based on your LLM context window
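
To show how the item cap and token limit interact, here is a client-side sketch that trims a buffer by dropping the oldest items first. How the server itself enforces these limits is not documented here, so the oldest-first policy, the `trim_buffer` name, and the 4000-token default are assumptions.

```python
from collections import deque


def trim_buffer(items, max_items=50, token_limit=4000):
    """Drop the oldest items until both limits are satisfied.

    `items` is a list of (content, tokens) pairs, oldest first.
    Returns the kept items and their total token count.
    """
    buf = deque(items)
    total = sum(tokens for _, tokens in buf)
    while buf and (len(buf) > max_items or total > token_limit):
        _, tokens = buf.popleft()  # evict the oldest item
        total -= tokens
    return list(buf), total
```

For example, 60 items of 100 tokens each under a 50-item cap and a 4000-token limit would be cut to the newest 40 items, since the token limit is the tighter constraint.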

Integration Tips

  1. Create sessions early - Initialize the session at the start of the conversation
  2. Prime relevant memories - Pre-load memories that match the user or topic
  3. Monitor token count - Keep the buffer within your LLM's context window
  4. Let sessions expire - Redis removes a session automatically once its TTL elapses