
mnesis

Lossless Context Management for long-horizon LLM agents



LLMs suffer from context rot: accuracy degrades 30–40% before hitting nominal token limits — not because the model runs out of space, but because reasoning quality collapses as the window fills with stale content.

The standard fix — telling the model to "summarize itself" — is unreliable. The model may silently drop constraints, forget file paths, or produce a summary that is itself too large.

mnesis solves this by making the engine — not the model — responsible for memory. It is a Python implementation of the Lossless Context Management (LCM) architecture.


Benchmarks

Evaluated on OOLONG, a long-context reasoning and aggregation benchmark. Both LCM-managed and Claude Code agents are built on Claude Opus 4.6; the gap comes entirely from context architecture.

Chart summary

The charts below compare LCM-managed context against Claude Code and unmanaged Opus 4.6 across context lengths from 8K to 1M tokens. Raw Opus 4.6 uses no context management — scores collapse past 32K tokens.

Score improvement over raw Opus 4.6 at each context length:

OOLONG benchmark — score improvement over raw Opus 4.6

Absolute scores vs raw Opus 4.6 baseline:

OOLONG benchmark — absolute scores


How it works

Traditional agentic frameworks ("RLM" — Recursive Language Models) ask the model to manage its own context via tool calls. LCM moves that responsibility to a deterministic engine layer:

RLM vs LCM approach

The engine handles memory deterministically so the model can focus entirely on the task.
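To make the engine-layer idea concrete, here is a minimal, self-contained sketch of a deterministic compaction trigger. It is an illustration only, not mnesis's internals: the class name, the whitespace "tokenizer", and the keep-last policy are all assumptions standing in for the real engine.

```python
# Illustrative sketch only — not mnesis's actual implementation.
from dataclasses import dataclass, field

def count_tokens(text: str) -> int:
    # Placeholder tokenizer: whitespace split stands in for a real tokenizer.
    return len(text.split())

@dataclass
class EngineContext:
    threshold: int
    messages: list = field(default_factory=list)

    def append(self, msg: str) -> None:
        # The trigger is a token count, checked by the engine on every append —
        # deterministic, with no model judgment involved.
        self.messages.append(msg)
        if sum(count_tokens(m) for m in self.messages) > self.threshold:
            self.compact()

    def compact(self) -> None:
        # Deterministic policy: keep the newest message and replace the rest
        # with a marker (a stand-in for structured summarization).
        keep = self.messages[-1:]
        archived = len(self.messages) - len(keep)
        self.messages = [f"[{archived} earlier message(s) archived]"] + keep
```

The point of the sketch is the control flow: the model never decides when to compact; the engine does, on a fixed threshold.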


Key properties

  • Context trigger

RLM uses model judgment. mnesis uses a configurable token threshold — deterministic, not probabilistic.

  • Summarization failure

RLM risks silent data loss. mnesis has a three-level fallback that never fails — Level 3 is always deterministic.

  • Tool output growth

RLM lets tool outputs grow unbounded. mnesis uses a backward-scan pruner to tombstone stale outputs.

  • Large files

RLM inlines files, eating the token budget. mnesis uses content-addressed file references with structural summaries.

  • Parallel workloads

RLM is sequential or ad-hoc. mnesis provides LLMMap and AgenticMap operators for true parallelism.

  • History

RLM history is ephemeral. mnesis keeps an append-only SQLite log — nothing is ever deleted.
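The backward-scan pruner mentioned above can be sketched in a few lines. This is a hypothetical illustration of the technique, not mnesis's API — the function name, the transcript shape, and the tombstone string are assumptions.

```python
# Illustrative sketch only — assumed names, not mnesis's actual API.

TOMBSTONE = "[tool output pruned]"

def prune_tool_outputs(transcript: list[dict], keep_latest: int = 2) -> list[dict]:
    """Scan the transcript backwards, keep the newest `keep_latest`
    tool outputs, and tombstone every older one in place."""
    kept = 0
    result = list(transcript)  # leave the caller's transcript untouched
    for i in range(len(result) - 1, -1, -1):  # backward scan: newest first
        entry = result[i]
        if entry.get("role") != "tool":
            continue
        if kept < keep_latest:
            kept += 1
        else:
            result[i] = {**entry, "content": TOMBSTONE}
    return result
```

Scanning backwards means the most recent tool outputs — the ones the model is most likely to still need — are always the ones preserved, regardless of transcript length.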


Quick install

```
uv add mnesis
# or
pip install mnesis
```

```python
import asyncio
from mnesis import MnesisSession

async def main():
    async with MnesisSession.open(
        model="anthropic/claude-opus-4-6",
        system_prompt="You are a helpful assistant.",
    ) as session:
        result = await session.send("Explain the GIL in Python.")
        print(result.text)

asyncio.run(main())
```

No API key needed to try it — set MNESIS_MOCK_LLM=1 and run any of the examples.


Provider support

mnesis works with any LLM provider via litellm. Pass the model string and set the corresponding API key:

| Provider | Model string | API key env var |
| --- | --- | --- |
| Anthropic | `"anthropic/claude-opus-4-6"` | `ANTHROPIC_API_KEY` |
| OpenAI | `"openai/gpt-4o"` | `OPENAI_API_KEY` |
| Google Gemini | `"gemini/gemini-1.5-pro"` | `GEMINI_API_KEY` |
| OpenRouter | `"openrouter/meta-llama/llama-3.1-70b-instruct"` | `OPENROUTER_API_KEY` |

See the Provider Configuration guide for the full list.
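The model-string convention above is regular enough to sketch: the text before the first `/` names the provider, which determines the key variable. This helper is purely illustrative (the function name and dict are not part of mnesis), using only the pairs from the table:

```python
# Illustrative helper — not part of mnesis. Pairs taken from the table above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def key_env_var(model: str) -> str:
    # The provider is everything before the first "/" in the model string.
    provider = model.split("/", 1)[0]
    return PROVIDER_KEYS[provider]
```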


Jump to Getting Started for a full walkthrough.