mnesis.compaction.engine

engine

Compaction orchestration engine with condensation and multi-round loop.

This engine implements the full Volt-compatible compaction flow:

  1. Tool output pruning — backward-scan and tombstone oversized tool outputs.
  2. Summarisation — level 1 → level 2 → level 3 escalation for raw messages.
  3. Condensation — if accumulated summary nodes still exceed the hard threshold after summarisation, condense them (level 1 → 2 → 3).
  4. Multi-round loop — repeat steps 2-3 up to max_compaction_rounds times until either the context fits or no progress is made.
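The summarise, condense, and repeat steps above can be read as a bounded loop with a progress check. The following is a minimal sketch of that control flow only, not the engine's actual code: `compact_until_fits`, `RoundOutcome`, and the `summarise` / `condense` callables are hypothetical names, token counts are plain integers, and tool output pruning (step 1) is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundOutcome:
    rounds_run: int
    final_tokens: int
    fits: bool


def compact_until_fits(
    tokens: int,
    budget: int,
    max_rounds: int,
    summarise: Callable[[int], int],  # hypothetical: returns token count after summarisation
    condense: Callable[[int], int],   # hypothetical: returns token count after condensation
) -> RoundOutcome:
    """Repeat summarise -> condense until the context fits or progress stalls."""
    rounds = 0
    for _ in range(max_rounds):
        if tokens <= budget:
            break  # context already fits, nothing left to do
        before = tokens
        tokens = summarise(tokens)          # step 2: summarise raw messages
        if tokens > budget:
            tokens = condense(tokens)       # step 3: condense summary nodes if still over
        rounds += 1
        if tokens >= before:
            break  # no progress this round, stop instead of spinning
    return RoundOutcome(rounds, tokens, tokens <= budget)
```

With a `summarise` that halves the count and a `condense` that removes 50 tokens, a 1000-token context against a 300-token budget fits after two rounds; with callables that change nothing, the loop stops after a single round.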

Soft/hard threshold distinction:

  • Soft (soft_threshold_fraction * usable, default 60 %) — triggers early background compaction so the next turn is likely already compact.
  • Hard (100 % of usable) — blocks the next send until compaction finishes, preventing an over-limit context from reaching the LLM.
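As a worked illustration of the two thresholds, the sketch below classifies a token count against a given usable budget. It assumes `usable` is already known as an integer (how mnesis derives it from the model's context limit is not shown here), and `classify` is a hypothetical helper, not part of the engine's API.

```python
def classify(tokens: int, usable: int, soft_threshold_fraction: float = 0.6) -> str:
    """Return "hard", "soft", or "ok" for a token count (illustrative only)."""
    if tokens > usable:
        return "hard"  # over 100 % of usable: must compact before the next send
    if tokens > usable * soft_threshold_fraction:
        return "soft"  # over the soft fraction: start background compaction
    return "ok"


# With usable = 100_000 and the default 60 % fraction, the soft threshold
# sits at 60_000 tokens and the hard threshold at 100_000 tokens.
print(classify(50_000, 100_000))
print(classify(70_000, 100_000))
print(classify(100_001, 100_000))
```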

File IDs are propagated through every compaction round; see mnesis.compaction.file_ids and mnesis.compaction.levels.

CompactionEngine

CompactionEngine(
    store: ImmutableStore,
    dag_store: SummaryDAGStore,
    token_estimator: TokenEstimator,
    event_bus: EventBus,
    config: MnesisConfig,
    id_generator: Any = None,
    session_model: str | None = None,
)

Orchestrates the full compaction protocol (summarise → condense → loop).

Guarantees:

  • run_compaction() never raises: errors are caught and Level 3 runs.
  • The resulting summary always fits within the token budget.
  • Level 3 (deterministic) is the final fallback and always succeeds.
  • Atomic SQLite commit per round: partial failures leave no inconsistent state.
  • The EventBus receives COMPACTION_COMPLETED (or COMPACTION_FAILED).
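The "never raises" and "always fits" guarantees both rest on ending the escalation with a deterministic step. A minimal sketch of that pattern follows, under stated assumptions: character counts stand in for tokens, plain truncation stands in for whatever Level 3 actually does, and `summarise_with_escalation`, `truncate_fallback`, and the `fallible_levels` callables are hypothetical names.

```python
from typing import Callable, Sequence


def truncate_fallback(text: str, budget_chars: int) -> str:
    """Deterministic last resort (assumed here to be plain truncation)."""
    return text[:budget_chars]


def summarise_with_escalation(
    text: str,
    budget_chars: int,
    fallible_levels: Sequence[Callable[[str, int], str]],
) -> tuple[str, int]:
    """Try each fallible level in order; fall back to the deterministic one.

    Returns (summary, level_used). Never raises for errors in the levels.
    """
    for level, attempt in enumerate(fallible_levels, start=1):
        try:
            summary = attempt(text, budget_chars)
        except Exception:
            continue  # swallow the error and escalate to the next level
        if len(summary) <= budget_chars:
            return summary, level  # fits the budget, stop escalating
    # Every fallible level failed or overshot: the fallback always fits.
    return truncate_fallback(text, budget_chars), len(fallible_levels) + 1
```

With two fallible levels, the fallback is reported as level 3, which mirrors the level 1 → 2 → 3 escalation described above.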

Threshold semantics:

  • Soft threshold (soft_threshold_fraction, default 60 %): triggers early background compaction via check_and_trigger(). This keeps the context lean well before the hard limit.
  • Hard threshold (100 % of usable): checked inside session.send() before the LLM call; if exceeded, the caller should await wait_for_pending() to block until compaction completes.

Example:

engine = CompactionEngine(store, dag_store, estimator, event_bus, config)
if engine.is_soft_overflow(tokens, model_info):
    engine.check_and_trigger(session_id, tokens, model_info)  # non-blocking
if engine.is_hard_overflow(tokens, model_info):
    await engine.wait_for_pending()  # block until compaction finishes, before the LLM call

is_soft_overflow

is_soft_overflow(
    tokens: TokenUsage, model: ModelInfo
) -> bool

Return True if the soft threshold has been crossed.

The soft threshold triggers early background compaction (non-blocking).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tokens | TokenUsage | Current cumulative token usage. | required |
| model | ModelInfo | Model metadata providing context limit. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if tokens exceed soft_threshold_fraction * usable. |

is_hard_overflow

is_hard_overflow(
    tokens: TokenUsage, model: ModelInfo
) -> bool

Return True if the hard threshold has been crossed.

The hard threshold means the current context must be compacted before the next LLM call to avoid an over-limit request.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tokens | TokenUsage | Current cumulative token usage. | required |
| model | ModelInfo | Model metadata providing context limit. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if tokens exceed the full usable budget. |

is_overflow

is_overflow(tokens: TokenUsage, model: ModelInfo) -> bool

Return True if compaction should be triggered (soft threshold check).

Preserved for backwards compatibility; delegates to is_soft_overflow.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tokens | TokenUsage | Current cumulative token usage for the session. | required |
| model | ModelInfo | Model metadata providing context limit. | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if tokens exceed the soft threshold. |

check_and_trigger

check_and_trigger(
    session_id: str,
    tokens: TokenUsage,
    model: ModelInfo,
    abort: Event | None = None,
) -> bool

Check for overflow and trigger async background compaction if needed.

Non-blocking — schedules a background task and returns immediately. The task handle is stored in self._pending_task so callers can await or cancel it during shutdown.
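The schedule-now, await-later pattern described here can be sketched with `asyncio.create_task`. This is an illustration under assumptions, not the engine's implementation: `BackgroundCompactor` is a hypothetical class, the overflow check is reduced to a boolean argument, and refusing to schedule a second task while one is in flight is an assumed policy.

```python
import asyncio


class BackgroundCompactor:
    """Hypothetical stand-in showing how a pending task handle can be kept."""

    def __init__(self) -> None:
        self._pending_task: asyncio.Task[None] | None = None
        self.completed: list[str] = []

    async def _compact(self, session_id: str) -> None:
        await asyncio.sleep(0)  # placeholder for the real compaction work
        self.completed.append(session_id)

    def check_and_trigger(self, session_id: str, over_soft: bool) -> bool:
        if not over_soft:
            return False  # below the soft threshold, nothing to schedule
        if self._pending_task is not None and not self._pending_task.done():
            return False  # assumption: at most one compaction in flight
        # Requires a running event loop; returns immediately.
        self._pending_task = asyncio.create_task(self._compact(session_id))
        return True

    async def wait_for_pending(self) -> None:
        # Take the handle and clear it, then await the task to completion.
        task, self._pending_task = self._pending_task, None
        if task is not None:
            await task


async def demo() -> list[str]:
    compactor = BackgroundCompactor()
    compactor.check_and_trigger("session-1", over_soft=True)  # non-blocking
    await compactor.wait_for_pending()                         # blocks until done
    return compactor.completed
```

Keeping the handle on the instance is what lets a caller either await the task before a send or cancel it during shutdown.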

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session_id | str | The session to compact. | required |
| tokens | TokenUsage | Current token usage. | required |
| model | ModelInfo | Model info for overflow detection. | required |
| abort | Event \| None | Optional event to signal early termination. | None |

Returns:

| Type | Description |
| --- | --- |
| bool | True if compaction was triggered, False otherwise. |

wait_for_pending async

wait_for_pending() -> None

Await any in-flight background compaction task to natural completion, then clear it.

run_compaction async

run_compaction(
    session_id: str,
    abort: Event | None = None,
    model_override: str | None = None,
) -> CompactionResult

Run the full compaction protocol. Never raises.

Steps (per round, up to max_compaction_rounds):

  1. Run tool output pruner (reduce input size first).
  2. Summarise raw messages: level 1 → level 2 → level 3.
  3. Condense accumulated summary nodes if still over budget: level 1 → 2 → 3.
  4. If no progress was made, break early to avoid spinning.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session_id | str | The session to compact. | required |
| abort | Event \| None | Optional asyncio.Event, checked before each level attempt. | None |
| model_override | str \| None | Override compaction model (for testing). | None |

Returns:

| Type | Description |
| --- | --- |
| CompactionResult | CompactionResult describing what happened. |