mnesis.tokens.estimator

estimator

Multi-model token estimation with caching and graceful fallback.

TokenEstimator

TokenEstimator(*, heuristic_only: bool = False)

Multi-model token counting with caching and graceful fallback.

Priority order:

1. tiktoken for OpenAI model families (gpt-4, gpt-3.5, o1, o3)
2. Character-based heuristic (`len // 3`) for Claude models
3. Character-based heuristic (`len // 4`) for all other models

Caching:

- Encoder objects are cached by encoding name (one load per process).
- Token counts are cached by the SHA-256 of the content for immutable content (file references, summary nodes). Use `estimate_cached()` for this.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `heuristic_only` | `bool` | When `True`, always use the character-based heuristic and skip tiktoken entirely. Useful for testing, benchmarking, or environments where tiktoken is not installed. | `False` |

estimate

estimate(text: str, model: ModelInfo | None = None) -> int

Estimate the token count for a string.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `text` | `str` | The text to estimate. | *required* |
| `model` | `ModelInfo \| None` | Optional model info for accurate tokenisation. Falls back to the heuristic when `None` or when the model's encoding is unknown. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `int` | Estimated token count; always >= 1 for non-empty text. |

estimate_cached

estimate_cached(
    text: str,
    cache_key: str,
    model: ModelInfo | None = None,
) -> int

Estimate with caching, keyed by cache_key.

Use for immutable content (file references, summary nodes) where the same text will be estimated multiple times across context builds.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `text` | `str` | The text to estimate. | *required* |
| `cache_key` | `str` | A stable identifier for this content (e.g. a SHA-256 hash). | *required* |
| `model` | `ModelInfo \| None` | Optional model info for accurate tokenisation. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `int` | Estimated token count, from the cache or a fresh computation. |
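The memoisation pattern described above can be sketched as follows. The `CachedEstimator` class and its `len // 4` stand-in estimator are hypothetical; only the cache-by-key behaviour mirrors the documented semantics.

```python
class CachedEstimator:
    """Sketch of estimate_cached(): memoise counts by a caller-supplied key."""

    def __init__(self) -> None:
        self._count_cache: dict[str, int] = {}

    def estimate(self, text: str) -> int:
        # Stand-in for the real estimator: bare len // 4 heuristic.
        return max(1, len(text) // 4) if text else 0

    def estimate_cached(self, text: str, cache_key: str) -> int:
        # Compute once per key; later calls with the same key skip estimation.
        if cache_key not in self._count_cache:
            self._count_cache[cache_key] = self.estimate(text)
        return self._count_cache[cache_key]
```

Because the cache is keyed by `cache_key` rather than the text itself, it is only safe for immutable content: if the text behind a key changes, the stale count is returned.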

estimate_message

estimate_message(
    msg: MessageWithParts, model: ModelInfo | None = None
) -> int

Estimate total tokens for a message including all non-pruned parts.

Pruned tool outputs (compacted_at set) contribute only the tombstone string length rather than the full output length.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `msg` | `MessageWithParts` | The message with its associated parts. | *required* |
| `model` | `ModelInfo \| None` | Optional model info for accurate tokenisation. | `None` |

Returns:

| Type | Description |
|------|-------------|
| `int` | Total estimated token count for the message. |

content_hash staticmethod

content_hash(text: str) -> str

Return a stable SHA-256 hex digest for use as a cache key.
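A minimal equivalent of this helper, using only the standard library (the encoding choice of UTF-8 is an assumption; the source only specifies SHA-256 hex digest):

```python
import hashlib

def content_hash(text: str) -> str:
    # SHA-256 hex digest of the UTF-8 bytes: stable across processes,
    # suitable as the cache_key argument to estimate_cached().
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```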