Auto Compact & Tool Result Truncation

Feature Information

Feature ID: FEAT-011
Created: 2026-02-28
Last Updated: 2026-02-28
Status: Draft
Priority: P1 (Should Have)
Owner: TBD
Related RFC: TBD
Related Design: UI Design Spec

User Story

As a user of OneClaw, I want the app to automatically manage conversation context length so that long conversations never fail by exceeding the model’s context window, and I can have extended conversations without worrying about technical limits or losing important context.

Typical Scenarios

User has a long conversation (50+ messages) with an AI model. The accumulated token count approaches the model’s context window limit. The app automatically compresses older messages into a summary before the next request, and the conversation continues seamlessly.
User asks the AI to fetch a large web page using the HTTP request tool. The tool result (hundreds of KB) is automatically truncated to a reasonable size before being stored and sent to the model.
User switches from a model with a 200K context window to a model with an 8K context window mid-session. The app detects the next message would exceed the new model’s limit and triggers a compact before sending.

Feature Description

Overview

Auto Compact is a context window management system that prevents conversations from exceeding a model’s context window limit. When the accumulated input tokens exceed a threshold (default: 85% of the model’s context window), the app automatically summarizes older messages using the current model and replaces them with a compact summary in future API requests. The original messages are preserved in the database untouched.

Additionally, tool call results that are excessively large (e.g., fetching a web page) are truncated at the point of storage, preventing database bloat and ensuring tool results don’t consume disproportionate context space.

Detailed Description

Auto Compact

Trigger: After each API response completes (including all tool call rounds), the system checks whether the total token count of the conversation history exceeds 85% of the current model’s context window size.

Process:

Calculate total tokens for all messages in the session
If total exceeds the threshold, determine which messages to compact:
- Starting from the most recent message, walk backwards accumulating token counts
- Messages whose cumulative tokens fit within 25% of the context window are “protected” (kept as-is)
- All older messages are candidates for compaction
Send a summarization request to the current model with the candidate messages (plus any existing prior summary)
Store the resulting summary in the Session’s compactedSummary field
Record which messages have been compacted by storing the compaction boundary (the timestamp of the oldest non-compacted message) in the Session’s compactBoundaryTimestamp field
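The protected-window selection in the steps above can be sketched as follows. This is a minimal illustration, not the final implementation: `Msg`, `CompactSplit`, and `splitForCompaction` are hypothetical names, and the message list is assumed to be ordered oldest-first with a token count already known for each message.

```kotlin
// Hypothetical minimal message shape; the real entity carries more fields.
data class Msg(val id: String, val tokens: Int)

// Result of the split: older messages to summarize, recent messages kept as-is.
data class CompactSplit(val toCompact: List<Msg>, val kept: List<Msg>)

/**
 * Walks backwards from the newest message and protects messages while their
 * cumulative token count stays within protectedRatio of the context window.
 * Assumes `messages` is ordered oldest-first.
 */
fun splitForCompaction(
    messages: List<Msg>,
    contextWindowSize: Int,
    protectedRatio: Double = 0.25,
): CompactSplit {
    val budget = (contextWindowSize * protectedRatio).toLong()
    var used = 0L
    var firstKept = messages.size // index of the oldest protected message
    for (i in messages.indices.reversed()) {
        if (used + messages[i].tokens > budget) break
        used += messages[i].tokens
        firstKept = i
    }
    return CompactSplit(
        toCompact = messages.subList(0, firstKept),
        kept = messages.subList(firstKept, messages.size),
    )
}
```

In this sketch, a newest message that alone exceeds the 25% budget leaves nothing protected. The spec does not say how that case should be handled, so the behavior here is an assumption.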

Subsequent requests: When building the messages array for API calls, the system prepends the compactedSummary as a system-level context message, followed by only the non-compacted (recent) messages.
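A rough sketch of how the request array might be assembled from the summary and the boundary timestamp. `StoredMessage`, `ApiMessage`, `buildRequestMessages`, and the wording of the summary prefix are illustrative assumptions; only the compactedSummary and compactBoundaryTimestamp fields come from this document.

```kotlin
// Hypothetical shapes for a persisted message and an outgoing API message.
data class StoredMessage(val role: String, val content: String, val createdAt: Long)
data class ApiMessage(val role: String, val content: String)

/**
 * Prepends the cumulative summary (if any) as a system-level message and keeps
 * only messages at or after the compaction boundary.
 */
fun buildRequestMessages(
    all: List<StoredMessage>,
    compactedSummary: String?,
    compactBoundaryTimestamp: Long?,
): List<ApiMessage> {
    // The boundary is the timestamp of the oldest non-compacted message.
    val recent = if (compactBoundaryTimestamp == null) all
                 else all.filter { it.createdAt >= compactBoundaryTimestamp }
    val summary = if (compactedSummary.isNullOrBlank()) emptyList()
                  else listOf(ApiMessage("system", "Summary of earlier conversation:\n$compactedSummary"))
    return summary + recent.map { ApiMessage(it.role, it.content) }
}
```

The database rows are only read here, which matches the requirement that compaction never modifies or deletes the original messages.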

Multiple compactions: If a session undergoes multiple compact cycles, each new compact incorporates the previous compactedSummary plus newly compacted messages into a single merged summary. The Session always holds exactly one cumulative summary.
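One possible way to feed the previous summary and the newly compacted messages into a single summarization request is sketched below. The function name, the section labels, and the `role: content` line format are all assumptions; the actual summarization prompt wording is still an open issue.

```kotlin
/**
 * Builds the text handed to the summarizer: the prior cumulative summary
 * (when one exists) followed by the messages being compacted in this cycle.
 * Each candidate is a (role, content) pair.
 */
fun buildSummarizationInput(
    previousSummary: String?,
    candidates: List<Pair<String, String>>,
): String = buildString {
    if (!previousSummary.isNullOrBlank()) {
        appendLine("Previous summary:")
        appendLine(previousSummary)
        appendLine()
    }
    appendLine("Messages to summarize:")
    for ((role, content) in candidates) {
        appendLine("$role: $content")
    }
}
```

Because the prior summary is always part of the input, the model's output can replace compactedSummary outright, keeping exactly one cumulative summary on the Session.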

Tool Result Truncation

Trigger: Immediately before storing a tool result message in the database.

Process:

Check the character length of the tool result content
If it exceeds the truncation limit (default: 30,000 characters), keep the first 30,000 characters and discard the rest
Append a truncation marker: \n\n[... content truncated, showing first {kept} characters of {total} total ...]
Store the truncated version in the database

The truncation happens once at storage time. All downstream consumers (API requests, UI display) work with the already-truncated version.
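The truncation step could look roughly like this. `truncateToolResult` is a hypothetical name. The surrogate-pair check and the guard against a negative limit are additions not stated in this spec; they are there so that multi-byte Unicode content is not cut mid-character and the function does not throw.

```kotlin
/**
 * Keeps the head of the content and appends the truncation marker.
 * Content at or under the limit is returned unchanged.
 */
fun truncateToolResult(content: String, maxChars: Int = 30_000): String {
    val limit = maxChars.coerceAtLeast(0) // assumption: treat negative limits as 0
    if (content.length <= limit) return content

    var kept = limit
    // Assumption: back off one char rather than split a UTF-16 surrogate pair.
    if (kept > 0 && content[kept - 1].isHighSurrogate()) kept--

    return content.substring(0, kept) +
        "\n\n[... content truncated, showing first $kept characters of ${content.length} total ...]"
}
```

Note that the stored string is slightly longer than the limit once the marker is appended. Whether the marker should count toward the 30,000 characters is not specified here.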

User Interaction Flow

Auto Compact (mostly invisible)

1. User sends a message and receives a response
2. System checks token count against threshold
3. If threshold exceeded:
   a. System sends a background summarization request
   b. A brief indicator appears (e.g., "Optimizing conversation context...")
   c. Summary is stored on the Session
   d. Indicator disappears
4. User continues chatting normally
5. If compact fails after retry, system falls back to truncation
   and shows a Snackbar: "Conversation history has been automatically trimmed"

Tool Result Truncation (invisible)

AI invokes a tool (e.g., HTTP request to fetch a web page)
Tool executes and returns a large result
System truncates the result before storing
Truncated result is displayed in the tool call message
Truncated result is sent back to the model

Acceptance Criteria

Must pass (all required):

Optional (nice to have):

User can manually trigger compact from the UI
Token count display in the chat screen showing current usage vs. context window
Configurable threshold percentage in settings

UI/UX Requirements

Auto Compact Indicator

During compact: a subtle, non-blocking indicator (e.g., a small progress message below the latest AI response or a brief Snackbar-style banner)
Text: “Optimizing conversation context…” (or localized equivalent)
Duration: visible only during the summarization API call
The indicator must NOT block user input – the user can still type while compact runs

Compact Fallback Notification

When compact fails and falls back to truncation: a Snackbar notification
Text: “Conversation history has been automatically trimmed”
Duration: standard Snackbar duration (short)

Tool Result Truncation

No UI indication needed – truncation is invisible to the user
The truncation marker text is displayed as part of the tool result if the user expands the tool call detail view

Feature Boundary

Included

Automatic context compaction triggered by token threshold
Token-count-based protected window (most recent 25% of context)
Summary generation using current conversation model
Cumulative summary storage on Session entity
Fallback to message truncation on summary failure
Tool result truncation at storage time
Context window size field on AiModel entity

Not Included

User-configurable compact threshold in settings (future)
Manual compact trigger button (future)
Client-side token counting / tokenizer (we rely on API-reported token counts)
Compression of individual messages (only whole-message granularity)
Tool result streaming or pagination
Context window size auto-detection from API (we use pre-configured values)

Business Rules

Compact Rules

Compact triggers only after a complete API response (including all tool call rounds), never mid-stream
The 85% threshold is calculated against the current model’s contextWindowSize
If contextWindowSize is null (unknown), compact does not trigger (no-op)
The protected window (recent messages) targets 25% of the context window by token count
The summarization request uses the same model, provider, and API key as the current conversation
The summary prompt is a fixed system prompt instructing the model to produce a concise factual summary
Compact is idempotent – triggering it when already under threshold is a no-op
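The trigger rules above reduce to a small predicate. The sketch below assumes the total token count has already been computed; `shouldCompact` is a hypothetical name.

```kotlin
/**
 * Returns true when compaction should run: the context window size is known
 * and the total token count exceeds thresholdRatio of it.
 */
fun shouldCompact(
    totalTokens: Long,
    contextWindowSize: Int?,
    thresholdRatio: Double = 0.85,
): Boolean {
    // Unknown context window: compact never triggers.
    if (contextWindowSize == null) return false
    return totalTokens > contextWindowSize * thresholdRatio
}
```

Calling this check when the session is already under the threshold simply returns false, which is what makes a repeated trigger a no-op.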

Token Counting Rules

Token counts come from API-reported Usage events stored on each message (tokenCountInput, tokenCountOutput)
For messages without token counts (e.g., user messages before they’ve been sent), use a character-based estimate (1 token per 4 characters)
The total token count for threshold comparison is the sum of estimated input tokens for all messages that would be sent in the next API request
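A minimal sketch of the counting rules, under two assumptions: each message exposes a single hypothetical `reportedTokens` value derived from the API Usage fields (how it is derived from tokenCountInput and tokenCountOutput is not specified here), and the character-based estimate rounds up.

```kotlin
const val CHARS_PER_TOKEN_ESTIMATE = 4

// Hypothetical shape: reportedTokens is null when the API has not reported usage yet.
data class CountedMessage(val content: String, val reportedTokens: Int?)

/** Uses the API-reported count when present, else 1 token per 4 characters (rounded up). */
fun estimateTokens(m: CountedMessage): Int =
    m.reportedTokens
        ?: ((m.content.length + CHARS_PER_TOKEN_ESTIMATE - 1) / CHARS_PER_TOKEN_ESTIMATE)

/** Sum over all messages that would be sent in the next API request. */
fun totalTokens(messages: List<CountedMessage>): Long =
    messages.sumOf { estimateTokens(it).toLong() }
```

For example, an unsent 9-character user message is estimated at 3 tokens, while a message with a reported count of 50 contributes exactly 50.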

Tool Result Truncation Rules

Truncation limit: 30,000 characters per tool result
Truncation preserves the beginning of the content (head), discards the tail
A truncation marker is appended to indicate content was truncated
Truncation happens before database insertion – the full content is never stored
Truncation applies to all tool types equally

Failure Handling Rules

If summarization fails, retry exactly once (same request)
If retry fails, fall back to truncation: remove the oldest messages from the API request (but NOT from the database) until under the 85% threshold
Show a Snackbar notification on fallback
Log the failure for debugging
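The retry-then-fallback flow might be sketched as below. Both function names are hypothetical. The summarizer is modeled as a plain blocking lambda to keep the example self-contained, whereas the real call would be asynchronous. Treating a blank response as a failed attempt that is also retried is an assumption of this sketch.

```kotlin
/**
 * Runs the summarization attempt at most twice (initial try plus one retry).
 * Returns null when both attempts throw or return a blank summary.
 */
fun summarizeWithRetry(attempt: () -> String): String? {
    repeat(2) {
        val result = try {
            attempt()
        } catch (e: Exception) {
            null // a real implementation would log the failure here
        }
        if (!result.isNullOrBlank()) return result
    }
    return null
}

/**
 * Fallback: drops the oldest messages from the request (not from the database)
 * until the total token count is at or under `limit`.
 */
fun <T> dropOldestUntilUnder(messages: List<T>, limit: Long, tokensOf: (T) -> Int): List<T> {
    var total = messages.sumOf { tokensOf(it).toLong() }
    var start = 0
    while (start < messages.size && total > limit) {
        total -= tokensOf(messages[start])
        start++
    }
    return messages.subList(start, messages.size)
}
```

A caller would try `summarizeWithRetry` first and, on a null result, send the list returned by `dropOldestUntilUnder` with the limit set to 85% of the context window, then show the Snackbar.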

Non-Functional Requirements

Performance

Compact should complete within 15 seconds (summarization API call)
Compact must not block the UI thread or prevent user input
Tool result truncation is a synchronous string operation and must complete in < 10ms
Token count checking after each response should add < 5ms overhead

Reliability

Compact failure must never prevent the user from continuing the conversation
Tool result truncation must never throw an exception
The system must handle edge cases: empty sessions, sessions with only user messages, sessions where all messages are within the protected window

Data Integrity

Original messages are never modified or deleted by compact
The compactedSummary field can be cleared if the user wants to “reset” context (future feature)
Database size is bounded by tool result truncation

Dependencies

Depends On

FEAT-001 (Chat Interaction): Compact integrates into the chat message flow
FEAT-003 (Model/Provider Management): Needs model context window size; uses the provider’s API for summarization
FEAT-005 (Session Management): Stores compact summary on Session entity

Depended On By

FEAT-006 (Token/Cost Tracking): Token counting infrastructure supports both features

Data Requirements

New/Modified Data Fields

Entity	Field	Type	Required	Description
AiModel	contextWindowSize	Int?	No	Model’s maximum context window in tokens. Null if unknown.
Session	compactedSummary	String?	No	Cumulative summary of compacted messages. Null if no compaction has occurred.
Session	compactBoundaryTimestamp	Long?	No	Timestamp of the oldest non-compacted message. Messages older than this are covered by the summary.

Pre-seeded Model Context Window Sizes

Model	Context Window
claude-sonnet-4-20250514	200,000
claude-3-5-haiku-20241022	200,000
claude-opus-4-20250514	200,000
gpt-4o	128,000
gpt-4o-mini	128,000
gpt-4-turbo	128,000
gemini-2.0-flash	1,048,576
gemini-2.5-pro-preview-05-06	1,048,576

Error Handling

Error Scenarios

Summarization API call fails (network error)
- Action: Retry once silently
- If retry fails: fall back to truncation, show Snackbar
- Conversation continues normally
Summarization API returns empty or malformed response
- Action: Treat as failure, fall back to truncation
- Show Snackbar notification
Model context window size unknown (null)
- Action: Skip compact entirely (no-op)
- No user notification needed
All messages fit within the protected window
- Action: No compact needed (no-op)
- This can happen when the conversation consists of only a few long, recent messages
Tool result truncation edge case (content is exactly at limit)
- Action: No truncation needed, store as-is

Constants

object CompactConstants {
    /** Compact triggers when tokens exceed this fraction of the context window */
    const val COMPACT_THRESHOLD_RATIO = 0.85

    /** Recent messages within this fraction of the context window are protected from compaction */
    const val PROTECTED_WINDOW_RATIO = 0.25

    /** Maximum character length for a single tool result before truncation */
    const val TOOL_RESULT_MAX_CHARS = 30_000

    /** Character-to-token estimation ratio (1 token per N characters) */
    const val CHARS_PER_TOKEN_ESTIMATE = 4
}

Test Points

Functional Tests

Verify compact triggers when token count exceeds 85% threshold
Verify compact does not trigger when under threshold
Verify compact does not trigger when model context window is null
Verify protected window calculation (most recent 25% by tokens)
Verify summary is stored in Session.compactedSummary
Verify original messages remain unchanged in DB
Verify subsequent API requests use summary + recent messages only
Verify multiple compactions merge into one cumulative summary
Verify compact failure retry (one retry)
Verify fallback to truncation on double failure
Verify Snackbar shown on fallback
Verify tool result truncation at 30K characters
Verify truncation marker is appended
Verify tool results under 30K are not modified

Edge Cases

Session with 0 messages
Session with only 1 user message
Session where all messages are within the protected window
Tool result exactly at 30,000 characters
Tool result with multi-byte Unicode characters
Model switch mid-session to a smaller context window
Compact triggered while compact is already in progress (must not double-compact)
Session already has a compactedSummary from a previous compact

Performance Tests

Compact completion time with 100+ messages
Token counting overhead per message send
Tool result truncation speed with 1MB+ content

Open Issues

Exact wording of the summarization system prompt (to be finalized during RFC)
Whether to show a token usage indicator in the chat UI (deferred to future)
Whether dynamically fetched models should have context window sizes auto-populated (deferred)

Change History

Date	Version	Changes	Owner
2026-02-28	0.1	Initial version	-