mnesis.compaction

Mnesis compaction components.

CompactionEngine

CompactionEngine(
    store: ImmutableStore,
    dag_store: SummaryDAGStore,
    token_estimator: TokenEstimator,
    event_bus: EventBus,
    config: MnesisConfig,
    id_generator: Any = None,
    session_model: str | None = None,
)

Orchestrates the full compaction protocol (summarise → condense → loop).

Guarantees:

  • run_compaction() never raises: errors are caught and Level 3 runs.
  • The resulting summary always fits within the token budget.
  • Level 3 (deterministic) is the final fallback and always succeeds.
  • Atomic SQLite commit per round: partial failures leave no inconsistent state.
  • The EventBus receives COMPACTION_COMPLETED (or COMPACTION_FAILED).

Threshold semantics:

  • Soft threshold (soft_threshold_fraction, default 60 %): triggers early background compaction via check_and_trigger(). This keeps the context lean well before the hard limit.
  • Hard threshold (100 % of usable): checked inside session.send() before the LLM call; if exceeded, the caller should await wait_for_pending() to block until compaction completes.
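The threshold arithmetic can be sketched concretely. Only the 60 % soft default comes from the description above; the derivation of the usable budget from a reserved-output deduction, and all numbers, are assumptions for illustration:

```python
# Illustrative numbers only: how the soft and hard thresholds relate.
context_limit = 200_000        # model's context window
reserved_output = 20_000       # ASSUMED reservation for the model's reply
usable = context_limit - reserved_output          # 180,000

soft_threshold_fraction = 0.60                    # documented default
soft_limit = soft_threshold_fraction * usable     # 108,000
hard_limit = usable                               # 180,000

def is_soft_overflow(tokens: int) -> bool:
    """Crossed the early, non-blocking compaction trigger."""
    return tokens > soft_limit

def is_hard_overflow(tokens: int) -> bool:
    """Context must be compacted before the next LLM call."""
    return tokens > hard_limit
```

At 120,000 cumulative tokens the session is past the soft limit (background compaction starts) but still under the hard limit (the next call can proceed).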

Example::

    engine = CompactionEngine(store, dag_store, estimator, event_bus, config)
    if engine.is_soft_overflow(tokens, model_info):
        engine.check_and_trigger(session_id, tokens, model_info)  # non-blocking
    if engine.is_hard_overflow(tokens, model_info):
        await engine.wait_for_pending()  # block until compaction completes before the LLM call

is_soft_overflow

is_soft_overflow(
    tokens: TokenUsage, model: ModelInfo
) -> bool

Return True if the soft threshold has been crossed.

The soft threshold triggers early background compaction (non-blocking).

Parameters:

  • tokens (TokenUsage, required): Current cumulative token usage.
  • model (ModelInfo, required): Model metadata providing context limit.

Returns:

  • bool: True if tokens exceed soft_threshold_fraction * usable.

is_hard_overflow

is_hard_overflow(
    tokens: TokenUsage, model: ModelInfo
) -> bool

Return True if the hard threshold has been crossed.

The hard threshold means the current context must be compacted before the next LLM call to avoid an over-limit request.

Parameters:

  • tokens (TokenUsage, required): Current cumulative token usage.
  • model (ModelInfo, required): Model metadata providing context limit.

Returns:

  • bool: True if tokens exceed the full usable budget.

is_overflow

is_overflow(tokens: TokenUsage, model: ModelInfo) -> bool

Return True if compaction should be triggered (soft threshold check).

Preserved for backwards compatibility; delegates to is_soft_overflow().

Parameters:

  • tokens (TokenUsage, required): Current cumulative token usage for the session.
  • model (ModelInfo, required): Model metadata providing context limit.

Returns:

  • bool: True if tokens exceed the soft threshold.

check_and_trigger

check_and_trigger(
    session_id: str,
    tokens: TokenUsage,
    model: ModelInfo,
    abort: Event | None = None,
) -> bool

Check for overflow and trigger async background compaction if needed.

Non-blocking — schedules a background task and returns immediately. The task handle is stored in self._pending_task so callers can await or cancel it during shutdown.

Parameters:

  • session_id (str, required): The session to compact.
  • tokens (TokenUsage, required): Current token usage.
  • model (ModelInfo, required): Model info for overflow detection.
  • abort (Event | None, default None): Optional event to signal early termination.

Returns:

  • bool: True if compaction was triggered, False otherwise.
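The non-blocking trigger plus wait_for_pending() pattern can be sketched as follows. Only the _pending_task attribute name comes from the description above; the class, the boolean overflow argument, and all other details are illustrative stand-ins, not the library's API:

```python
import asyncio

class EngineSketch:
    """Illustrative only: schedule compaction in the background, await later."""

    def __init__(self) -> None:
        self._pending_task: asyncio.Task | None = None

    async def _run_compaction(self, session_id: str) -> None:
        await asyncio.sleep(0)  # stand-in for the real compaction rounds

    def check_and_trigger(self, session_id: str, overflowed: bool) -> bool:
        # Non-blocking: store the task handle and return immediately.
        if not overflowed or self._pending_task is not None:
            return False
        self._pending_task = asyncio.ensure_future(self._run_compaction(session_id))
        return True

    async def wait_for_pending(self) -> None:
        # Block until the in-flight compaction finishes, then clear the handle.
        if self._pending_task is not None:
            await self._pending_task
            self._pending_task = None
```

In the real engine the overflow decision comes from is_soft_overflow(tokens, model); the boolean here stands in for that check. Keeping a single task handle also means a second trigger while one is pending is a no-op.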

wait_for_pending async

wait_for_pending() -> None

Await any in-flight background compaction task until it completes, then clear the stored handle.

run_compaction async

run_compaction(
    session_id: str,
    abort: Event | None = None,
    model_override: str | None = None,
) -> CompactionResult

Run the full compaction protocol. Never raises.

Steps (per round, up to max_compaction_rounds):

  1. Run the tool output pruner (reduce input size first).
  2. Summarise raw messages: level 1 → level 2 → level 3.
  3. Condense accumulated summary nodes if still over budget: level 1 → 2 → 3.
  4. If no progress was made, break early to avoid spinning.
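The round structure can be sketched abstractly. The callables stand in for the escalating summarise/condense levels; every name below is illustrative, and the real engine threads stores, budgets, and an event bus through each step:

```python
from typing import Callable, Optional

def run_rounds(
    levels: list[Callable[[int], Optional[int]]],
    tokens: int,
    budget: int,
    max_rounds: int = 3,
) -> int:
    """Each level takes the current token count and returns the reduced
    count, or None if that level failed and the next should be tried."""
    for _ in range(max_rounds):
        if tokens <= budget:
            break                    # already fits: done
        progress = False
        for level in levels:         # level 1 -> level 2 -> level 3
            result = level(tokens)
            if result is not None and result < tokens:
                tokens = result
                progress = True
                break                # next round restarts from level 1
        if not progress:
            break                    # no progress: avoid spinning
    return tokens
```

With levels [fail, halve] and a 300-token budget, 1000 tokens shrinks to 500 in round one and 250 in round two, then the loop exits because the budget is met.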

Parameters:

  • session_id (str, required): The session to compact.
  • abort (Event | None, default None): Optional asyncio.Event, checked before each level attempt.
  • model_override (str | None, default None): Override compaction model (for testing).

Returns:

  • CompactionResult: Describes what happened.

SummaryCandidate dataclass

SummaryCandidate(
    text: str,
    token_count: int,
    span_start_message_id: str,
    span_end_message_id: str,
    compaction_level: int,
    messages_covered: int,
)

A candidate compaction summary before it is committed to the store.
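For illustration, here is the dataclass mirrored with a constructed instance; the field names match the signature above, while every value is made up:

```python
from dataclasses import dataclass

@dataclass
class SummaryCandidate:
    text: str
    token_count: int
    span_start_message_id: str
    span_end_message_id: str
    compaction_level: int
    messages_covered: int

# A Level 1 candidate spanning twelve messages (all values illustrative).
candidate = SummaryCandidate(
    text="User debugged a failing build; fix was a missing dependency.",
    token_count=14,
    span_start_message_id="msg-001",
    span_end_message_id="msg-012",
    compaction_level=1,
    messages_covered=12,
)
```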

ToolOutputPruner

ToolOutputPruner(
    store: ImmutableStore,
    estimator: TokenEstimator,
    config: MnesisConfig,
)

Bases: ToolOutputPruner

Async version that properly resolves part IDs.

prune async

prune(session_id: str) -> PruneResult

Run prune pass with async part ID resolution.

level1_summarise async

level1_summarise(
    messages: list[MessageWithParts],
    model: str,
    budget: ContextBudget,
    estimator: TokenEstimator,
    llm_call: Any,
    compaction_prompt: str | None = None,
    model_context_limit: int = 200000,
) -> SummaryCandidate | None

Attempt Level 1 (selective) summarisation via LLM.

File IDs found in the input messages are automatically appended to the summary via a [LCM File IDs: ...] footer.

Input messages are capped at 75 % of the compaction model's context window to prevent the summarisation call itself from overflowing.
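The 75 % input cap can be sketched like this. The trim-from-oldest strategy and the estimate callable are assumptions for illustration; the docs above only state that input is capped at 75 % of the compaction model's window:

```python
from typing import Callable

def cap_input(
    messages: list[str],
    estimate: Callable[[str], int],
    model_context_limit: int,
) -> list[str]:
    """Keep the newest messages whose estimated tokens fit under 75 %
    of the compaction model's context window. Dropping oldest-first is
    an assumed strategy, not necessarily what the library does."""
    cap = int(model_context_limit * 0.75)
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):   # newest first
        cost = estimate(msg)
        if total + cost > cap:
            break
        kept.append(msg)
        total += cost
    kept.reverse()                   # restore chronological order
    return kept
```

With a 100-token window the cap is 75 tokens, so three 50-token messages are trimmed down to only the newest one.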

Parameters:

  • messages (list[MessageWithParts], required): All non-summary messages in the session.
  • model (str, required): Model string to use for compaction.
  • budget (ContextBudget, required): Token budget for validation.
  • estimator (TokenEstimator, required): Token estimator for result validation.
  • llm_call (Any, required): Async callable (model, messages, max_tokens) -> str.
  • compaction_prompt (str | None, default None): Custom system prompt override.
  • model_context_limit (int, default 200000): Context window of the compaction model.

Returns:

  • SummaryCandidate | None: SummaryCandidate if successful and fits budget, or None to escalate.

level2_summarise async

level2_summarise(
    messages: list[MessageWithParts],
    model: str,
    budget: ContextBudget,
    estimator: TokenEstimator,
    llm_call: Any,
    compaction_prompt: str | None = None,
    model_context_limit: int = 200000,
) -> SummaryCandidate | None

Attempt Level 2 (aggressive) summarisation via LLM.

Uses a more compressed prompt format and drops reasoning details. File IDs found in the input messages are propagated to the summary. Input is capped at 75 % of the compaction model's context window.

Parameters:

  • messages (list[MessageWithParts], required): All non-summary messages in the session.
  • model (str, required): Model string to use for compaction.
  • budget (ContextBudget, required): Token budget for validation.
  • estimator (TokenEstimator, required): Token estimator for result validation.
  • llm_call (Any, required): Async callable (model, messages, max_tokens) -> str.
  • compaction_prompt (str | None, default None): Custom system prompt override.
  • model_context_limit (int, default 200000): Context window of the compaction model.

Returns:

  • SummaryCandidate | None: SummaryCandidate if successful and fits budget, or None to escalate.

level3_deterministic

level3_deterministic(
    messages: list[MessageWithParts],
    budget: ContextBudget,
    estimator: TokenEstimator,
) -> SummaryCandidate

Level 3 deterministic fallback (no LLM required).

Keeps the most recent messages that fit within 85% of the usable budget, prefixed with a truncation notice. This always produces a valid result.

File IDs found in all input messages (not just kept ones) are preserved in a [LCM File IDs: ...] footer even when their surrounding context is truncated — this is the lossless guarantee.
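A minimal sketch of this fallback, using character counts as the token estimate. The 85 % cut-off, keep-newest behaviour, and lossless file-ID footer come from the description above; the "file-" ID pattern and the notice wording are assumptions:

```python
import re
from typing import Callable

def level3_sketch(
    messages: list[str],
    usable: int,
    estimate: Callable[[str], int],
) -> str:
    """Keep the newest messages under 85 % of `usable`; preserve file IDs
    from ALL messages, even truncated ones (the lossless guarantee).
    The 'file-' ID regex and notice text are illustrative assumptions."""
    cap = int(usable * 0.85)
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):           # newest first
        cost = estimate(msg)
        if total + cost > cap:
            break
        kept.append(msg)
        total += cost
    kept.reverse()
    # Collect file IDs from every input message, not just the kept ones.
    file_ids = sorted({fid for msg in messages for fid in re.findall(r"file-\w+", msg)})
    parts = ["[Earlier conversation truncated]"] + kept
    if file_ids:
        parts.append(f"[LCM File IDs: {', '.join(file_ids)}]")
    return "\n".join(parts)
```

Note how a file ID mentioned only in a truncated message still appears in the footer: no LLM is involved, so this path always produces a valid result.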

Parameters:

  • messages (list[MessageWithParts], required): All non-summary messages in the session.
  • budget (ContextBudget, required): Token budget; result is guaranteed to fit within usable.
  • estimator (TokenEstimator, required): Token estimator.

Returns:

  • SummaryCandidate: Always fits within budget.usable.