mnesis.tokens.estimator¶
Multi-model token estimation with caching and graceful fallback.
TokenEstimator¶
Multi-model token counting with caching and graceful fallback.
Priority order:
1. tiktoken for OpenAI model families (gpt-4, gpt-3.5, o1, o3)
2. Character-based heuristic (len // 3) for Claude models
3. Character-based heuristic (len // 4) for all other models
Caching:
- Encoder objects are cached by encoding name (one load per process).
- Token counts are cached by SHA-256 of content for immutable content
(file references, summary nodes). Use estimate_cached() for this.
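The two caching layers can be illustrated with a minimal stand-in (function and variable names are hypothetical, and the heuristic replaces real tokenisation):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def _get_encoder(encoding_name: str):
    """One tiktoken encoder load per process, cached by encoding name."""
    import tiktoken
    return tiktoken.get_encoding(encoding_name)

# Token counts for immutable content, keyed by a stable cache key.
_count_cache: dict[str, int] = {}

def estimate_cached(text: str, cache_key: str) -> int:
    """Return a cached token count, computing it once per cache key."""
    if cache_key not in _count_cache:
        _count_cache[cache_key] = max(1, len(text) // 4)  # heuristic stand-in
    return _count_cache[cache_key]
```

A second call with the same key returns the stored count without re-estimating, which is the point of keying by content hash for immutable text.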
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `heuristic_only` | `bool` | When | `False` |
estimate¶
Estimate the token count for a string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | The text to estimate. | *required* |
| `model` | `ModelInfo \| None` | Optional model info for accurate tokenisation. Uses heuristic when `None` or model encoding is unknown. | `None` |
Returns:

| Type | Description |
|---|---|
| `int` | Estimated token count, always >= 1 for non-empty text. |
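The floor of one token for non-empty text can be expressed compactly (heuristic path shown; `floor_estimate` is an illustrative name, not part of the API):

```python
def floor_estimate(text: str) -> int:
    # Non-empty text always counts as at least one token; empty text as zero.
    return max(1, len(text) // 4) if text else 0
```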
estimate_cached¶
Estimate with caching, keyed by cache_key.
Use for immutable content (file references, summary nodes) where the same text will be estimated multiple times across context builds.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | The text to estimate. | *required* |
| `cache_key` | `str` | A stable identifier for this content (e.g. SHA-256 hash). | *required* |
| `model` | `ModelInfo \| None` | Optional model info for accurate tokenisation. | `None` |
Returns:

| Type | Description |
|---|---|
| `int` | Estimated token count from cache or fresh computation. |
estimate_message¶
Estimate total tokens for a message including all non-pruned parts.
Pruned tool outputs (compacted_at set) contribute only the tombstone
string length rather than the full output length.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `msg` | `MessageWithParts` | The message with its associated parts. | *required* |
| `model` | `ModelInfo \| None` | Optional model info for accurate tokenisation. | `None` |
Returns:

| Type | Description |
|---|---|
| `int` | Total estimated token count for the message. |
content_hash staticmethod¶
Return a stable SHA-256 hex digest for use as a cache key.
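A stable SHA-256 hex digest of this kind can be produced with the standard library; this sketch assumes UTF-8 encoding of the input text:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable SHA-256 hex digest of the UTF-8 bytes, usable as a cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

The same text always yields the same 64-character digest, making it a suitable `cache_key` for `estimate_cached`.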