Configuration¶
All configuration is done through MnesisConfig, which groups settings into sub-configs. Every field has a sensible default — you only need to override what you want to change.
```python
from mnesis import MnesisSession, MnesisConfig, CompactionConfig, FileConfig, StoreConfig, OperatorConfig

config = MnesisConfig(
    compaction=CompactionConfig(...),
    file=FileConfig(...),
    store=StoreConfig(...),
    operators=OperatorConfig(...),
)

session = await MnesisSession.create(model="openai/gpt-4o", config=config)
```
CompactionConfig¶
Controls when and how context compaction fires.
| Field | Default | Description |
|---|---|---|
| `auto` | `True` | Auto-trigger compaction on overflow |
| `compaction_output_budget` | `20_000` | Tokens reserved as headroom for compaction summary output |
| `prune` | `True` | Run tool output pruning before compaction |
| `prune_protect_tokens` | `40_000` | Token window from the end of history that is never pruned |
| `prune_minimum_tokens` | `20_000` | Minimum prunable volume required before pruning fires |
| `compaction_model` | `None` | Model for summarisation. `None` = use the session model |
| `level2_enabled` | `True` | Attempt Level 2 compression before falling back to Level 3 |
| `compaction_prompt` | `None` | Custom prompt string for Level 1/2 LLM summarisation. `None` = use the built-in agentic prompt |
| `soft_threshold_fraction` | `0.6` | Fraction of usable context at which background compaction triggers (before the hard threshold). Advanced. |
| `max_compaction_rounds` | `10` | Cap on summarise+condense cycles in the multi-round loop. Advanced. |
| `condensation_enabled` | `True` | Whether to attempt condensation of accumulated summary nodes. Advanced. |
Tuning for large models¶
For models with 1M+ token contexts (e.g. Gemini 1.5 Pro), raise the budget and protect window:
```python
CompactionConfig(
    compaction_output_budget=100_000,
    prune_protect_tokens=200_000,
    prune_minimum_tokens=50_000,
)
```
Custom compaction prompt¶
```python
CompactionConfig(
    compaction_prompt="Summarise this conversation focusing on technical decisions only.",
)
```
FileConfig¶
Controls how large files are handled.
| Field | Default | Description |
|---|---|---|
| `inline_threshold` | `10_000` | Files estimated above this token count are stored as `FileRefPart` objects |
| `storage_dir` | `~/.mnesis/files/` | Directory for external file storage |
| `exploration_summary_model` | `None` | Reserved for future LLM-based structural summaries (AST, key lists, headings). Currently ignored — structural exploration summaries are generated deterministically only. |
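For example, a project that works with many medium-sized files and wants a project-local storage directory might set (values are illustrative, not recommendations):

```python
from mnesis import FileConfig

file_config = FileConfig(
    inline_threshold=25_000,               # keep more files inline before spilling to FileRefPart
    storage_dir="/var/lib/mnesis/files",   # custom location instead of ~/.mnesis/files/
)
```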
StoreConfig¶
Controls the SQLite persistence layer.
| Field | Default | Description |
|---|---|---|
| `db_path` | `~/.mnesis/sessions.db` | Path to the SQLite database file (`~` is expanded at runtime) |
| `wal_mode` | `True` | Use WAL journal mode for better concurrent read performance |
| `connection_timeout` | `30.0` | Seconds to wait for the database connection |
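For example, to keep session data alongside your project instead of in the home directory (the path and timeout below are illustrative):

```python
from mnesis import StoreConfig

store_config = StoreConfig(
    db_path="./data/sessions.db",  # project-local database instead of ~/.mnesis/sessions.db
    connection_timeout=60.0,       # tolerate longer waits under write contention
)
```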
OperatorConfig¶
Controls LLMMap and AgenticMap parallelism.
| Field | Default | Description |
|---|---|---|
| `llm_map_concurrency` | `16` | Maximum concurrent LLM calls in `LLMMap.run()` |
| `agentic_map_concurrency` | `4` | Maximum concurrent sub-agent sessions in `AgenticMap.run()` |
| `max_retries` | `3` | Per-item retry attempts on validation or transient errors |
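If your provider enforces tight rate limits, lowering concurrency is usually the first lever. A minimal sketch (values are illustrative):

```python
from mnesis import OperatorConfig

operator_config = OperatorConfig(
    llm_map_concurrency=4,      # fewer simultaneous LLMMap calls
    agentic_map_concurrency=2,  # fewer simultaneous AgenticMap sub-agents
    max_retries=5,              # more per-item attempts on validation errors
)
```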
SessionConfig¶
Controls session-level behaviour.
| Field | Default | Description |
|---|---|---|
| `doom_loop_threshold` | `3` | Consecutive identical tool calls before `DOOM_LOOP_DETECTED` fires |
| `retry` | `RetryConfig()` | Automatic retry configuration for transient LLM errors in `send()` |
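A minimal sketch of wiring this into MnesisConfig (the threshold value is illustrative; the retry field is covered in detail under RetryConfig below):

```python
from mnesis import MnesisConfig
from mnesis.models.config import SessionConfig

config = MnesisConfig(
    session=SessionConfig(
        doom_loop_threshold=5,  # tolerate more identical tool calls before DOOM_LOOP_DETECTED
    )
)
```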
RetryConfig¶
Controls automatic retry of transient LLM errors inside send(). Retry is opt-in: the default max_retries=0 preserves the pre-0.3.0 behaviour of failing immediately.
| Field | Default | Description |
|---|---|---|
| `max_retries` | `0` | Maximum retry attempts. 0 disables retry entirely |
| `base_delay` | `1.0` | Base delay in seconds for exponential backoff |
| `max_delay` | `60.0` | Maximum delay cap in seconds |
| `jitter` | `True` | Add random jitter (sampled from [0, base_delay)) to spread retries |
Backoff formula: `min(base_delay × 2^(attempt-1) + jitter, max_delay)`, where `attempt` is the 1-based retry number matching `LlmRetryPayload.attempt` (first retry = 1, so the first backoff = `base_delay × 2^0` = `base_delay`).
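For intuition, here is the formula evaluated directly in Python. This is an illustration of the schedule, not the library's internal code:

```python
import random

def backoff_delay(attempt: int, base_delay: float = 1.0,
                  max_delay: float = 60.0, jitter: bool = True) -> float:
    """Delay before the given 1-based retry attempt, per the formula above."""
    j = random.uniform(0.0, base_delay) if jitter else 0.0
    return min(base_delay * 2 ** (attempt - 1) + j, max_delay)

# With defaults and jitter disabled, retries sleep 1s, 2s, 4s, 8s, ...
print([backoff_delay(a, jitter=False) for a in range(1, 5)])  # [1.0, 2.0, 4.0, 8.0]
```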
Retryable errors¶
Only transient provider-side errors are retried:
| litellm exception | HTTP status | Scenario |
|---|---|---|
| `RateLimitError` | 429 | Provider rate limit |
| `InternalServerError` | 500 | Provider server error |
| `ServiceUnavailableError` | 503 | Provider temporarily down |
| `Timeout` | — | Request timed out |
| `APIConnectionError` | — | Network connection failed |
All other exceptions (including AuthenticationError, ContextWindowExceededError, BadRequestError) fail immediately without retry — they indicate caller mistakes that retrying will not fix.
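If you need to branch on these yourself, a sketch, assuming send() propagates the underlying litellm exception once retries are exhausted (check the exceptions your version actually raises):

```python
import litellm

# Inside an async context, with `session` created as in the examples above.
try:
    result = await session.send("Hello!")
except litellm.RateLimitError:
    # Transient: surfaces only after RetryConfig retries are exhausted
    # (or immediately when max_retries=0).
    ...
except litellm.AuthenticationError:
    # Permanent: never retried; fix credentials rather than retrying.
    raise
```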
litellm num_retries interaction¶
Mnesis explicitly passes num_retries=0 to litellm.acompletion() to disable litellm's own built-in retry mechanism. All retry logic is owned by Mnesis via RetryConfig, so there is no double-retrying. Do not set num_retries in call_kwargs passed to litellm alongside Mnesis.
Difference from OperatorConfig.max_retries¶
OperatorConfig.max_retries handles per-item validation errors inside LLMMap and AgenticMap operators — a different failure domain (schema validation, JSON parse failures). RetryConfig handles transient LLM transport errors in the send() call path. Both can be set independently.
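Both can live in the same MnesisConfig. A minimal sketch showing the two knobs side by side (values are illustrative):

```python
from mnesis import MnesisConfig, OperatorConfig
from mnesis.models.config import SessionConfig, RetryConfig

config = MnesisConfig(
    operators=OperatorConfig(max_retries=3),  # per-item validation retries in LLMMap/AgenticMap
    session=SessionConfig(
        retry=RetryConfig(max_retries=2),     # transport-level retries inside send()
    ),
)
```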
AgenticMap sub-sessions¶
Each AgenticMap sub-session has its own independent RetryConfig. The effective maximum number of LLM calls per sub-agent is (max_retries + 1) × max_turns; for example, max_retries=3 with max_turns=10 allows up to 40 calls. Plan capacity accordingly.
Example¶
```python
from mnesis import MnesisSession, MnesisConfig
from mnesis.models.config import SessionConfig, RetryConfig

config = MnesisConfig(
    session=SessionConfig(
        retry=RetryConfig(
            max_retries=3,
            base_delay=1.0,
            max_delay=30.0,
            jitter=True,
        )
    )
)

async with MnesisSession.open(model="openai/gpt-4o", config=config) as session:
    # send() will now automatically retry up to 3 times on rate limits,
    # server errors, timeouts, and connection failures.
    result = await session.send("Hello!")
```
Monitoring retries via the event bus¶
Each retry attempt publishes an LLM_RETRY event before the backoff sleep:
```python
from mnesis.events.bus import MnesisEvent
from mnesis.events.payloads import LlmRetryPayload

def on_retry(event: MnesisEvent, payload: LlmRetryPayload) -> None:
    print(
        f"Retry {payload['attempt']}/{payload['max_retries']}: "
        f"{payload['error_type']} — sleeping {payload['delay_seconds']:.1f}s"
    )

session.event_bus.subscribe(MnesisEvent.LLM_RETRY, on_retry)  # type: ignore[arg-type]
```
model_overrides¶
MnesisConfig.model_overrides lets you correct or override the context and
output token limits that Mnesis auto-detects from the model string. This is
useful when you are using a fine-tuned model, a custom deployment, or a model
that litellm does not yet know about.
Supported keys:
| Key | Type | Description |
|---|---|---|
| `context_limit` | `int` | Total input + output token limit for the model |
| `max_output_tokens` | `int` | Maximum tokens the model can generate per response |
Example — custom or fine-tuned model¶
```python
from mnesis import MnesisSession, MnesisConfig

config = MnesisConfig(
    model_overrides={
        "context_limit": 128_000,
        "max_output_tokens": 16_384,
    }
)

async with MnesisSession.open(model="openai/acme-support-ft-v1", config=config) as session:
    result = await session.send("Hello!")
    print(result.text)
```
Example — correcting an underestimated limit¶
If Mnesis falls back to conservative defaults for a model it does not recognise, override just the field that is wrong without touching any other configuration:
```python
config = MnesisConfig(
    model_overrides={"max_output_tokens": 32_768},  # provider raised the output limit
)
```
The overrides are applied after ModelInfo.from_model_string() resolves the
base limits, so only the keys you specify are changed — the rest (encoding,
provider, etc.) are inferred normally. Both context_limit and
max_output_tokens affect how Mnesis sizes the compaction budget, so incorrect
values can cause over-limit contexts to reach the provider or leave headroom
unused. Always set them to match the model's true limits.
model_overrides applies to the session model only. If you configure a
separate compaction.compaction_model, overrides are not applied to it — the
compaction model's limits are always auto-detected from the model string.
Note
model_overrides only affects how Mnesis allocates the context budget and
compaction thresholds — it does not change how litellm routes the request.
You still need to configure litellm (API base, headers, etc.) separately
for non-standard endpoints. See LLM Providers for the full
list of supported providers and model string formats.