Configuration¶
All configuration is done through MnesisConfig, which groups settings into sub-configs. Every field has a sensible default — you only need to override what you want to change.
```python
from mnesis import MnesisSession, MnesisConfig, CompactionConfig, FileConfig, StoreConfig, OperatorConfig

config = MnesisConfig(
    compaction=CompactionConfig(...),
    file=FileConfig(...),
    store=StoreConfig(...),
    operators=OperatorConfig(...),
)

session = await MnesisSession.create(model="openai/gpt-4o", config=config)
```
CompactionConfig¶
Controls when and how context compaction fires.
| Field | Default | Description |
|---|---|---|
| `auto` | `True` | Auto-trigger compaction on overflow |
| `compaction_output_budget` | `20_000` | Tokens reserved as headroom for compaction summary output |
| `prune` | `True` | Run tool output pruning before compaction |
| `prune_protect_tokens` | `40_000` | Token window from the end of history that is never pruned |
| `prune_minimum_tokens` | `20_000` | Minimum prunable volume required before pruning fires |
| `compaction_model` | `None` | Model for summarisation. `None` = use the session model |
| `level2_enabled` | `True` | Attempt Level 2 compression before falling back to Level 3 |
| `compaction_prompt` | `None` | Custom prompt string for Level 1/2 LLM summarisation. `None` = use the built-in agentic prompt |
| `soft_threshold_fraction` | `0.6` | Fraction of usable context at which background compaction triggers (before the hard threshold). Advanced. |
| `max_compaction_rounds` | `10` | Cap on summarise+condense cycles in the multi-round loop. Advanced. |
| `condensation_enabled` | `True` | Whether to attempt condensation of accumulated summary nodes. Advanced. |
Tuning for large models¶
For models with 1M+ token contexts (e.g. Gemini 1.5 Pro), raise the budget and protect window:
```python
CompactionConfig(
    compaction_output_budget=100_000,
    prune_protect_tokens=200_000,
    prune_minimum_tokens=50_000,
)
```
Custom compaction prompt¶
```python
CompactionConfig(
    compaction_prompt="Summarise this conversation focusing on technical decisions only.",
)
```
FileConfig¶
Controls how large files are handled.
| Field | Default | Description |
|---|---|---|
| `inline_threshold` | `10_000` | Files estimated above this token count are stored as `FileRefPart` objects |
| `storage_dir` | `~/.mnesis/files/` | Directory for external file storage |
| `exploration_summary_model` | `None` | Reserved for future LLM-based structural summaries (AST, key lists, headings). Currently ignored — structural exploration summaries are generated deterministically only. |
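For example, a project that works with many medium-sized files and wants a project-local storage directory might set (values are illustrative, not recommendations):

```python
from mnesis import FileConfig

file_config = FileConfig(
    inline_threshold=25_000,               # keep more files inline before spilling to FileRefPart
    storage_dir="/var/lib/mnesis/files",   # custom location instead of ~/.mnesis/files/
)
```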
StoreConfig¶
Controls the SQLite persistence layer.
| Field | Default | Description |
|---|---|---|
| `db_path` | `~/.mnesis/sessions.db` | Path to the SQLite database file (`~` is expanded at runtime) |
| `wal_mode` | `True` | Use WAL journal mode for better concurrent read performance |
| `connection_timeout` | `30.0` | Seconds to wait for the database connection |
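For example, to keep session data alongside your project instead of in the home directory (the path and timeout below are illustrative):

```python
from mnesis import StoreConfig

store_config = StoreConfig(
    db_path="./data/sessions.db",  # project-local database instead of ~/.mnesis/sessions.db
    connection_timeout=60.0,       # tolerate longer waits under write contention
)
```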
OperatorConfig¶
Controls LLMMap and AgenticMap parallelism.
| Field | Default | Description |
|---|---|---|
| `llm_map_concurrency` | `16` | Maximum concurrent LLM calls in `LLMMap.run()` |
| `agentic_map_concurrency` | `4` | Maximum concurrent sub-agent sessions in `AgenticMap.run()` |
| `max_retries` | `3` | Per-item retry attempts on validation or transient errors |
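If your provider enforces tight rate limits, lowering concurrency is usually the first lever. A minimal sketch (values are illustrative):

```python
from mnesis import OperatorConfig

operator_config = OperatorConfig(
    llm_map_concurrency=4,      # fewer simultaneous LLMMap calls
    agentic_map_concurrency=2,  # fewer simultaneous AgenticMap sub-agents
    max_retries=5,              # more per-item attempts on validation errors
)
```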
SessionConfig¶
Controls session-level behaviour.
| Field | Default | Description |
|---|---|---|
| `doom_loop_threshold` | `3` | Consecutive identical tool calls before `DOOM_LOOP_DETECTED` fires |
| `retry` | `RetryConfig()` | Automatic retry configuration for transient LLM errors in `send()` |
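A minimal sketch of wiring this into MnesisConfig (the threshold value is illustrative; the retry field is covered in detail under RetryConfig below):

```python
from mnesis import MnesisConfig
from mnesis.models.config import SessionConfig

config = MnesisConfig(
    session=SessionConfig(
        doom_loop_threshold=5,  # tolerate more identical tool calls before DOOM_LOOP_DETECTED
    )
)
```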
RetryConfig¶
Controls automatic retry of transient LLM errors inside send(). Retry is opt-in: the default max_retries=0 preserves the pre-0.3.0 behaviour of failing immediately.
| Field | Default | Description |
|---|---|---|
| `max_retries` | `0` | Maximum retry attempts. 0 disables retry entirely |
| `base_delay` | `1.0` | Base delay in seconds for exponential backoff |
| `max_delay` | `60.0` | Maximum delay cap in seconds |
| `jitter` | `True` | Add random jitter (sampled from [0, base_delay)) to spread retries |
Backoff formula: `min(base_delay × 2^(attempt-1) + jitter, max_delay)`, where `attempt` is the 1-based retry number matching `LlmRetryPayload.attempt` (first retry = 1, so the first backoff = `base_delay × 2^0` = `base_delay`).
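For intuition, here is the formula evaluated directly in Python. This is an illustration of the schedule, not the library's internal code:

```python
import random

def backoff_delay(attempt: int, base_delay: float = 1.0,
                  max_delay: float = 60.0, jitter: bool = True) -> float:
    """Delay before the given 1-based retry attempt, per the formula above."""
    j = random.uniform(0.0, base_delay) if jitter else 0.0
    return min(base_delay * 2 ** (attempt - 1) + j, max_delay)

# With defaults and jitter disabled, retries sleep 1s, 2s, 4s, 8s, ...
print([backoff_delay(a, jitter=False) for a in range(1, 5)])  # [1.0, 2.0, 4.0, 8.0]
```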
Retryable errors¶
Only transient provider-side errors are retried:
| litellm exception | HTTP status | Scenario |
|---|---|---|
| `RateLimitError` | 429 | Provider rate limit |
| `InternalServerError` | 500 | Provider server error |
| `ServiceUnavailableError` | 503 | Provider temporarily down |
| `Timeout` | — | Request timed out |
| `APIConnectionError` | — | Network connection failed |
All other exceptions (including AuthenticationError, ContextWindowExceededError, BadRequestError) fail immediately without retry — they indicate caller mistakes that retrying will not fix.
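If you need to branch on these yourself, a sketch, assuming send() propagates the underlying litellm exception once retries are exhausted (check the exceptions your version actually raises):

```python
import litellm

# Inside an async context, with `session` created as in the examples above.
try:
    result = await session.send("Hello!")
except litellm.RateLimitError:
    # Transient: surfaces only after RetryConfig retries are exhausted
    # (or immediately when max_retries=0).
    ...
except litellm.AuthenticationError:
    # Permanent: never retried; fix credentials rather than retrying.
    raise
```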
litellm num_retries interaction¶
Mnesis explicitly passes num_retries=0 to litellm.acompletion() to disable litellm's own built-in retry mechanism. All retry logic is owned by Mnesis via RetryConfig, so there is no double-retrying. Do not set num_retries in call_kwargs passed to litellm alongside Mnesis.
Difference from OperatorConfig.max_retries¶
OperatorConfig.max_retries handles per-item validation errors inside LLMMap and AgenticMap operators — a different failure domain (schema validation, JSON parse failures). RetryConfig handles transient LLM transport errors in the send() call path. Both can be set independently.
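Both can live in the same MnesisConfig. A minimal sketch showing the two knobs side by side (values are illustrative):

```python
from mnesis import MnesisConfig, OperatorConfig
from mnesis.models.config import SessionConfig, RetryConfig

config = MnesisConfig(
    operators=OperatorConfig(max_retries=3),  # per-item validation retries in LLMMap/AgenticMap
    session=SessionConfig(
        retry=RetryConfig(max_retries=2),     # transport-level retries inside send()
    ),
)
```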
AgenticMap sub-sessions¶
Each AgenticMap sub-session has its own independent RetryConfig. The effective maximum number of LLM calls per sub-agent is (max_retries + 1) × max_turns; for example, max_retries=3 with max_turns=10 allows up to 40 calls. Plan capacity accordingly.
Example¶
```python
from mnesis import MnesisSession, MnesisConfig
from mnesis.models.config import SessionConfig, RetryConfig

config = MnesisConfig(
    session=SessionConfig(
        retry=RetryConfig(
            max_retries=3,
            base_delay=1.0,
            max_delay=30.0,
            jitter=True,
        )
    )
)

async with MnesisSession.open(model="openai/gpt-4o", config=config) as session:
    # send() will now automatically retry up to 3 times on rate limits,
    # server errors, timeouts, and connection failures.
    result = await session.send("Hello!")
```
Monitoring retries via the event bus¶
Each retry attempt publishes an LLM_RETRY event before the backoff sleep:
```python
from mnesis.events.bus import MnesisEvent
from mnesis.events.payloads import LlmRetryPayload

def on_retry(event: MnesisEvent, payload: LlmRetryPayload) -> None:
    print(
        f"Retry {payload['attempt']}/{payload['max_retries']}: "
        f"{payload['error_type']} — sleeping {payload['delay_seconds']:.1f}s"
    )

session.event_bus.subscribe(MnesisEvent.LLM_RETRY, on_retry)  # type: ignore[arg-type]
```
model_overrides¶
MnesisConfig.model_overrides lets you correct or override the context and
output token limits that Mnesis auto-detects from the model string. This is
useful when you are using a fine-tuned model, a custom deployment, or a model
that litellm does not yet know about.
Supported keys:
| Key | Type | Description |
|---|---|---|
| `context_limit` | `int` | Total input + output token limit for the model |
| `max_output_tokens` | `int` | Maximum tokens the model can generate per response |
Example — custom or fine-tuned model¶
```python
from mnesis import MnesisSession, MnesisConfig

config = MnesisConfig(
    model_overrides={
        "context_limit": 128_000,
        "max_output_tokens": 16_384,
    }
)

async with MnesisSession.open(model="openai/acme-support-ft-v1", config=config) as session:
    result = await session.send("Hello!")
    print(result.text)
```
Example — correcting an underestimated limit¶
If Mnesis falls back to conservative defaults for a model it does not recognise, override just the field that is wrong without touching any other configuration:
```python
config = MnesisConfig(
    model_overrides={"max_output_tokens": 32_768},  # provider raised the output limit
)
```
The overrides are applied after ModelInfo.from_model_string() resolves the
base limits, so only the keys you specify are changed — the rest (encoding,
provider, etc.) are inferred normally. Both context_limit and
max_output_tokens affect how Mnesis sizes the compaction budget, so incorrect
values can cause over-limit contexts to reach the provider or leave headroom
unused. Always set them to match the model's true limits.
model_overrides applies to the session model only. If you configure a
separate compaction.compaction_model, overrides are not applied to it — the
compaction model's limits are always auto-detected from the model string.
Note
model_overrides only affects how Mnesis allocates the context budget and
compaction thresholds — it does not change how litellm routes the request.
You still need to configure litellm (API base, headers, etc.) separately
for non-standard endpoints. See LLM Providers for the full
list of supported providers and model string formats.