RFC-011: Auto Compact & Tool Result Truncation

Document Information

RFC ID: RFC-011
Related PRD: FEAT-011 (Auto Compact & Tool Result Truncation)
Related Design: UI Design Spec
Related Architecture: RFC-000 (Overall Architecture)
Depends On: RFC-001 (Chat Interaction), RFC-003 (Provider Management), RFC-005 (Session Management)
Depended On By: None
Created: 2026-02-28
Last Updated: 2026-02-28
Status: Draft
Author: TBD

Overview

Background

Currently, SendMessageUseCase sends the entire message history of a session to the API on every request, with no context window management. As conversations grow long (50+ messages, heavy tool usage), the accumulated token count approaches or exceeds the model’s context window limit, causing API errors. Additionally, tool results (e.g., fetching a web page via HttpRequestTool) can be hundreds of KB, bloating the database and consuming disproportionate context space.

This RFC introduces two related features:

Auto Compact: Automatically summarize older messages when the conversation approaches the model’s context window limit. The summary replaces older messages in API requests while preserving all original messages in the database.
Tool Result Truncation: Truncate excessively large tool results at storage time, before they enter the database.

Goals

Add contextWindowSize to the AiModel entity and populate it for all preset models
Add compact summary storage fields to the Session entity
Implement database migration from version 1 to version 2
Implement ToolResultTruncator to truncate tool results exceeding 30K characters before DB storage
Implement TokenEstimator for character-based token estimation
Implement AutoCompactUseCase to detect when compaction is needed and generate summaries
Implement CompactAwareMessageBuilder to construct API requests using compact summaries
Integrate all components into SendMessageUseCase
Add compact-related ChatEvent types for UI feedback
Provide comprehensive unit tests for all new components

Non-Goals

Client-side tokenizer (we use character-based estimation)
User-configurable compact threshold in settings
Manual compact trigger button in the UI
Context window size auto-detection from provider APIs
Compression of individual messages (compaction works at whole-message granularity only)
Tool result streaming or pagination
Token usage indicator in the chat UI

Technical Design

Architecture Overview

+--------------------------------------------------------------------------+
|                              UI Layer                                     |
|  ChatViewModel                                                            |
|    |-- handles ChatEvent.CompactStarted / CompactCompleted                |
|    |-- shows brief indicator during compaction                            |
|    |-- shows Snackbar on compact fallback                                 |
+--------------------------------------------------------------------------+
|                            Domain Layer                                   |
|  SendMessageUseCase                                                       |
|    |-- CompactAwareMessageBuilder.build()  (prepare API messages)         |
|    |-- adapter.sendMessageStream()         (call AI API)                  |
|    |-- ToolResultTruncator.truncate()      (before saving tool results)   |
|    |-- AutoCompactUseCase.compactIfNeeded() (after response completes)    |
|                                                                           |
|  AutoCompactUseCase                                                       |
|    |-- TokenEstimator.estimateTotalTokens()                               |
|    |-- splitMessages() (protected window vs. older)                       |
|    |-- adapter.generateSimpleCompletion()  (summarize)                    |
|    |-- sessionRepository.updateCompactedSummary()                         |
+--------------------------------------------------------------------------+
|                             Data Layer                                    |
|  SessionEntity  (+ compacted_summary, compact_boundary_timestamp)         |
|  ModelEntity    (+ context_window_size)                                   |
|  Migration(1,2) (ALTER TABLE + UPDATE preset values)                      |
+--------------------------------------------------------------------------+

Core Components

1. ToolResultTruncator

A stateless utility that truncates tool result strings exceeding a character limit.

File: app/src/main/kotlin/com/oneclaw/shadow/core/util/ToolResultTruncator.kt

package com.oneclaw.shadow.core.util

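/** Caps the size of tool output before it is stored. Stateless; never throws. */
object ToolResultTruncator {
    const val MAX_CHARS = 30_000

    /** Returns [result] unchanged if it fits, otherwise its first [MAX_CHARS] characters plus a marker. */
    fun truncate(result: String): String {
        if (result.length <= MAX_CHARS) return result
        val suffix = "\n\n[... content truncated, showing first $MAX_CHARS of ${result.length} characters ...]"
        return result.substring(0, MAX_CHARS) + suffix
    }
}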

Integration point: In SendMessageUseCase, where tool result messages are created (the toolOutput field), apply ToolResultTruncator.truncate() before constructing the Message.
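
As a rough illustration of the marker format, the sketch below repeats the same logic with a configurable limit so the behaviour is visible on short strings. The function name truncateTo and the maxChars parameter are hypothetical and exist only for this example.

```kotlin
// Hypothetical standalone copy of the truncator with a configurable limit,
// so the behaviour can be shown on short inputs.
fun truncateTo(result: String, maxChars: Int): String {
    if (result.length <= maxChars) return result
    val suffix = "\n\n[... content truncated, showing first $maxChars of ${result.length} characters ...]"
    return result.substring(0, maxChars) + suffix
}

// Exactly at the limit: returned unchanged.
val atLimit = truncateTo("a".repeat(10), 10)

// Over the limit: first 10 characters, then the marker naming both lengths.
val overLimit = truncateTo("a".repeat(25), 10)
```

Note that the stored string ends up slightly longer than the limit, because the marker is appended after the first maxChars characters.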

2. TokenEstimator

Estimates token counts for messages using a character-based heuristic. Used for threshold detection only – precision is not critical because the estimate only triggers a best-effort optimization; it does not enforce an exact limit.

File: app/src/main/kotlin/com/oneclaw/shadow/core/util/TokenEstimator.kt

package com.oneclaw.shadow.core.util

import com.oneclaw.shadow.core.model.Message

/** Character-based token estimation; roughly one token per [CHARS_PER_TOKEN] characters. */
object TokenEstimator {
    const val CHARS_PER_TOKEN = 4

    /** Sums the per-message estimates for the whole list. */
    fun estimateTotalTokens(messages: List<Message>): Int =
        messages.sumOf { estimateMessageTokens(it) }

    /** Counts every text-bearing field of a message; absent optional fields cost nothing. */
    fun estimateMessageTokens(msg: Message): Int {
        val contentTokens = estimateFromText(msg.content)
        val thinkingTokens = msg.thinkingContent?.let { estimateFromText(it) } ?: 0
        val toolInputTokens = msg.toolInput?.let { estimateFromText(it) } ?: 0
        val toolOutputTokens = msg.toolOutput?.let { estimateFromText(it) } ?: 0
        return contentTokens + thinkingTokens + toolInputTokens + toolOutputTokens
    }

    /** Integer division rounds down; any non-empty text counts as at least one token. */
    fun estimateFromText(text: String): Int {
        if (text.isEmpty()) return 0
        return (text.length / CHARS_PER_TOKEN).coerceAtLeast(1)
    }
}
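
A few worked values may help when reasoning about the heuristic. The sketch below copies estimateFromText into a standalone function (an assumption made so the example runs without the Message type) and shows integer division rounding down, with a floor of one token for any non-empty text.

```kotlin
// Standalone copy of the heuristic above, for illustration only.
val charsPerToken = 4

fun estimateFromText(text: String): Int =
    if (text.isEmpty()) 0 else (text.length / charsPerToken).coerceAtLeast(1)

val emptyEstimate = estimateFromText("")                          // 0: empty text costs nothing
val shortEstimate = estimateFromText("hi")                        // 2 / 4 = 0, raised to the floor of 1
val tenCharEstimate = estimateFromText("0123456789")              // 10 / 4 = 2 (rounds down)
val maxToolResultEstimate = estimateFromText("x".repeat(30_000))  // 30,000 / 4 = 7,500
```

So a tool result truncated to 30,000 characters is estimated at roughly 7,500 tokens, plus a handful for the truncation marker.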

3. AutoCompactUseCase

The central orchestrator for the compaction feature.

File: app/src/main/kotlin/com/oneclaw/shadow/feature/chat/usecase/AutoCompactUseCase.kt

Constructor dependencies:

SessionRepository
MessageRepository
ApiKeyStorage
ModelApiAdapterFactory

Key method: suspend fun compactIfNeeded(sessionId, model, provider): CompactResult

Algorithm:

1. Check model.contextWindowSize -- if null, return (no-op)
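2. Get all messages for the session; if the session already has a compact boundary, keep only messages at or after it (earlier messages are already covered by the stored summary and are no longer sent to the API)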
3. Estimate total tokens via TokenEstimator
4. If totalTokens <= contextWindowSize * 0.85, return (no-op)
5. Split messages into (olderMessages, protectedMessages):
   - Walk backwards from newest, accumulating tokens
   - Stop when accumulated tokens reach contextWindowSize * 0.25
   - Everything before the split point = olderMessages
6. If olderMessages is empty, return (no-op)
7. Build summarization prompt:
   - If session already has compactedSummary, include it as "Previous summary"
   - Append all olderMessages as conversation transcript
   - Instruct model to produce a concise factual summary (200-500 words)
8. Call adapter.generateSimpleCompletion(prompt, maxTokens=2048)
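9. If success: store the summary and the boundary timestamp (the createdAt of the oldest protected message, matching the createdAt >= boundary filter in CompactAwareMessageBuilder) on Session, return CompactResult(true)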
10. If failure: retry once
11. If retry fails: return CompactResult(false) -- caller handles fallback
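
Steps 4 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the final implementation: SketchMessage stands in for the real Message type and carries a precomputed token estimate, and the message that crosses the 25% budget is kept in the protected window (the steps above do not say which side it falls on).

```kotlin
// Simplified stand-in for the real Message type (assumption for this sketch).
data class SketchMessage(val createdAt: Long, val estimatedTokens: Int)

// Step 4: compaction is needed only when the estimate exceeds 85% of the window.
fun needsCompaction(totalTokens: Int, contextWindowSize: Int): Boolean =
    totalTokens > contextWindowSize * 0.85

// Steps 5-6: walk backwards from the newest message, adding messages to the
// protected window until its budget (25% of the context window) is reached.
// Returns (olderMessages, protectedMessages); olderMessages may be empty.
fun splitMessages(
    messages: List<SketchMessage>,
    contextWindowSize: Int
): Pair<List<SketchMessage>, List<SketchMessage>> {
    val protectedBudget = contextWindowSize * 0.25
    var accumulated = 0
    var splitIndex = messages.size // index of the first protected message
    for (i in messages.indices.reversed()) {
        if (accumulated >= protectedBudget) break
        accumulated += messages[i].estimatedTokens
        splitIndex = i
    }
    return Pair(messages.subList(0, splitIndex), messages.subList(splitIndex, messages.size))
}
```

With a 1,000-token window and five 100-token messages, the three newest messages land in the protected window and the two oldest become olderMessages. The createdAt of the first protected message is then a natural candidate for the boundary timestamp.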

CompactResult data class:

data class CompactResult(
    val didCompact: Boolean,
    val fallbackToTruncation: Boolean = false
)

Summarization prompt:

You are summarizing a conversation for context continuity. Create a concise but
comprehensive summary that preserves:
- Key topics discussed
- Important decisions or conclusions
- Any pending questions or tasks
- Tool calls made and their results (briefly)

[If existing summary exists:]
Previous conversation summary:
{existingSummary}

Additional conversation to incorporate:

[Conversation transcript with role labels]

Provide a summary in 200-500 words. Be factual and concise.
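
A prompt builder along these lines could assemble the template. This is a sketch under assumptions: TranscriptLine is a hypothetical stand-in for whatever role and text the real Message exposes, the "role: content" label format is a guess, and the "Additional conversation to incorporate:" header is emitted even when there is no previous summary, which the template leaves open.

```kotlin
// Hypothetical transcript entry; the real code would derive role and text from Message.
data class TranscriptLine(val role: String, val content: String)

fun buildSummarizationPrompt(older: List<TranscriptLine>, existingSummary: String?): String =
    buildString {
        appendLine("You are summarizing a conversation for context continuity. Create a concise but")
        appendLine("comprehensive summary that preserves:")
        appendLine("- Key topics discussed")
        appendLine("- Important decisions or conclusions")
        appendLine("- Any pending questions or tasks")
        appendLine("- Tool calls made and their results (briefly)")
        appendLine()
        // Only mention a previous summary when one exists (cumulative compaction).
        if (!existingSummary.isNullOrBlank()) {
            appendLine("Previous conversation summary:")
            appendLine(existingSummary)
            appendLine()
        }
        appendLine("Additional conversation to incorporate:")
        appendLine()
        // Assumed label format: "role: content", one message per line.
        older.forEach { appendLine("${it.role}: ${it.content}") }
        appendLine()
        append("Provide a summary in 200-500 words. Be factual and concise.")
    }
```

Keeping the builder a pure function of the older messages and the existing summary makes it straightforward to cover in AutoCompactUseCaseTest without a network call.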

4. CompactAwareMessageBuilder

Replaces the direct allMessages.toApiMessages() call in SendMessageUseCase. Handles injecting the compact summary into the system prompt and filtering messages.

File: app/src/main/kotlin/com/oneclaw/shadow/feature/chat/usecase/CompactAwareMessageBuilder.kt

package com.oneclaw.shadow.feature.chat.usecase

import com.oneclaw.shadow.core.model.Message
import com.oneclaw.shadow.core.model.Session
import com.oneclaw.shadow.data.remote.adapter.ApiMessage

/** Builds the (system prompt, API messages) pair for a request, honoring any stored compact summary. */
object CompactAwareMessageBuilder {

    fun build(
        session: Session,
        allMessages: List<Message>,
        originalSystemPrompt: String?
    ): Pair<String?, List<ApiMessage>> {
        val summary = session.compactedSummary
        val boundary = session.compactBoundaryTimestamp

        // No compaction has happened yet: send the full history unchanged.
        if (summary == null || boundary == null) {
            return Pair(originalSystemPrompt, allMessages.toApiMessages())
        }

        // Only messages at or after the boundary are sent; older ones are covered by the summary.
        val recentMessages = allMessages.filter { it.createdAt >= boundary }
        val apiMessages = recentMessages.toApiMessages()

        // Prepend the summary to the system prompt so it reads as background context.
        val summaryPrefix = "Previous conversation summary:\n$summary\n\n---\n\n"
        val enhancedPrompt = if (originalSystemPrompt != null) {
            summaryPrefix + originalSystemPrompt
        } else {
            summaryPrefix + "Continue the conversation based on the summary above."
        }

        return Pair(enhancedPrompt, apiMessages)
    }
}

Key design decision: The compact summary is prepended to the system prompt rather than injected as a separate user/assistant message. This ensures the model treats it as background context rather than part of the conversation turn sequence, which avoids confusing the role alternation pattern.

Data Model

Modified Entities

AiModel (core/model/AiModel.kt):

data class AiModel(
    val id: String,
    val displayName: String?,
    val providerId: String,
    val isDefault: Boolean,
    val source: ModelSource,
    val contextWindowSize: Int? = null  // max context window in tokens; null = unknown
)

ModelEntity (data/local/entity/ModelEntity.kt):

@Entity(tableName = "models", primaryKeys = ["id", "provider_id"], ...)
data class ModelEntity(
    val id: String,
    @ColumnInfo(name = "display_name") val displayName: String?,
    @ColumnInfo(name = "provider_id") val providerId: String,
    @ColumnInfo(name = "is_default") val isDefault: Boolean,
    val source: String,
    @ColumnInfo(name = "context_window_size") val contextWindowSize: Int? = null  // NEW
)

Session (core/model/Session.kt):

data class Session(
    val id: String,
    val title: String,
    val currentAgentId: String,
    val messageCount: Int,
    val lastMessagePreview: String?,
    val isActive: Boolean,
    val deletedAt: Long?,
    val createdAt: Long,
    val updatedAt: Long,
    val compactedSummary: String? = null,           // NEW
    val compactBoundaryTimestamp: Long? = null       // NEW
)

SessionEntity (data/local/entity/SessionEntity.kt):

@Entity(tableName = "sessions", ...)
data class SessionEntity(
    ...existing fields...,
    @ColumnInfo(name = "compacted_summary") val compactedSummary: String? = null,           // NEW
    @ColumnInfo(name = "compact_boundary_timestamp") val compactBoundaryTimestamp: Long? = null  // NEW
)

Database Migration

New file: app/src/main/kotlin/com/oneclaw/shadow/data/local/db/Migrations.kt

import androidx.room.migration.Migration
import androidx.sqlite.db.SupportSQLiteDatabase

val MIGRATION_1_2 = object : Migration(1, 2) {
    override fun migrate(db: SupportSQLiteDatabase) {
        // Add context_window_size to models
        db.execSQL("ALTER TABLE models ADD COLUMN context_window_size INTEGER DEFAULT NULL")

        // Populate preset model context window sizes
        db.execSQL("UPDATE models SET context_window_size = 128000 WHERE id = 'gpt-4o'")
        db.execSQL("UPDATE models SET context_window_size = 128000 WHERE id = 'gpt-4o-mini'")
        db.execSQL("UPDATE models SET context_window_size = 200000 WHERE id = 'o1'")
        db.execSQL("UPDATE models SET context_window_size = 200000 WHERE id = 'o3-mini'")
        db.execSQL("UPDATE models SET context_window_size = 200000 WHERE id = 'claude-opus-4-5-20251101'")
        db.execSQL("UPDATE models SET context_window_size = 200000 WHERE id = 'claude-sonnet-4-5-20250929'")
        db.execSQL("UPDATE models SET context_window_size = 200000 WHERE id = 'claude-haiku-4-5-20251001'")
        db.execSQL("UPDATE models SET context_window_size = 1048576 WHERE id = 'gemini-2.0-flash'")
        db.execSQL("UPDATE models SET context_window_size = 1048576 WHERE id = 'gemini-2.5-pro'")

        // Add compact fields to sessions
        db.execSQL("ALTER TABLE sessions ADD COLUMN compacted_summary TEXT DEFAULT NULL")
        db.execSQL("ALTER TABLE sessions ADD COLUMN compact_boundary_timestamp INTEGER DEFAULT NULL")
    }
}

AppDatabase changes:

Bump version = 1 to version = 2
Update seed callback INSERT statements to include context_window_size for fresh installs
Register MIGRATION_1_2 in DatabaseModule.kt via .addMigrations(MIGRATION_1_2)

Preset Model Context Window Sizes

Model ID	Provider	Context Window
gpt-4o	OpenAI	128,000
gpt-4o-mini	OpenAI	128,000
o1	OpenAI	200,000
o3-mini	OpenAI	200,000
claude-opus-4-5-20251101	Anthropic	200,000
claude-sonnet-4-5-20250929	Anthropic	200,000
claude-haiku-4-5-20251001	Anthropic	200,000
gemini-2.0-flash	Gemini	1,048,576
gemini-2.5-pro	Gemini	1,048,576

API Design

SessionDao additions

@Query("UPDATE sessions SET compacted_summary = :summary, compact_boundary_timestamp = :boundaryTimestamp, updated_at = :updatedAt WHERE id = :id")
suspend fun updateCompactedSummary(id: String, summary: String?, boundaryTimestamp: Long?, updatedAt: Long)

SessionRepository additions

suspend fun updateCompactedSummary(id: String, summary: String?, boundaryTimestamp: Long?)

Mapper updates

ProviderMapper.kt: Map contextWindowSize in both ModelEntity.toDomain() and AiModel.toEntity().

SessionMapper.kt: Map compactedSummary and compactBoundaryTimestamp in both directions.

ChatEvent additions

sealed class ChatEvent {
    ...existing events...
    data object CompactStarted : ChatEvent()
    data class CompactCompleted(val didCompact: Boolean) : ChatEvent()
}
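
One way ChatViewModel might react to these events is a small pure reducer over UI state. The sketch below is an assumption-laden illustration: CompactEvent, CompactUiState, and reduce are hypothetical names, and the two event types are trimmed-down copies so the example runs without the rest of ChatEvent.

```kotlin
// Trimmed-down stand-ins for the two new events; the real ChatEvent has more cases.
sealed class CompactEvent {
    object Started : CompactEvent()
    data class Completed(val didCompact: Boolean) : CompactEvent()
}

// Hypothetical slice of the chat UI state that the compaction indicator needs.
data class CompactUiState(
    val isCompacting: Boolean = false,
    val lastDidCompact: Boolean? = null // null until a compaction check has finished
)

// Show the indicator on Started and hide it again on Completed.
fun reduce(state: CompactUiState, event: CompactEvent): CompactUiState = when (event) {
    CompactEvent.Started -> state.copy(isCompacting = true)
    is CompactEvent.Completed -> state.copy(isCompacting = false, lastDidCompact = event.didCompact)
}
```

Because CompactStarted is emitted before the use case knows whether compaction is needed, the indicator has to be cleared on every CompactCompleted, including those where didCompact is false.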

SendMessageUseCase Integration

Three changes to SendMessageUseCase:

Change 1: Compact-aware message building

Replace:

val allMessages = messageRepository.getMessagesSnapshot(sessionId)
val apiMessages = allMessages.toApiMessages()
// ...
adapter.sendMessageStream(..., systemPrompt = agent.systemPrompt)

With:

val allMessages = messageRepository.getMessagesSnapshot(sessionId)
val session = sessionRepository.getSessionById(sessionId)
    ?: error("Session $sessionId not found")  // fail with a clear message instead of a bare NPE from !!
val (effectiveSystemPrompt, apiMessages) = CompactAwareMessageBuilder.build(
    session = session,
    allMessages = allMessages,
    originalSystemPrompt = agent.systemPrompt
)
// ...
adapter.sendMessageStream(..., systemPrompt = effectiveSystemPrompt)

Change 2: Tool result truncation

In the tool result saving section, truncate before creating the Message:

val rawOutput = tr.result.result ?: tr.result.errorMessage ?: ""
val truncatedOutput = ToolResultTruncator.truncate(rawOutput)
// Use truncatedOutput as toolOutput

Change 3: Post-response compaction trigger

After send(ChatEvent.ResponseComplete(...)) and before break:

if (pendingToolCalls.isEmpty()) {
    sessionRepository.updateMessageStats(...)
    send(ChatEvent.ResponseComplete(aiMessage, usage))

    // Trigger auto-compact check
    send(ChatEvent.CompactStarted)
    val compactResult = autoCompactUseCase.compactIfNeeded(sessionId, model, provider)
    send(ChatEvent.CompactCompleted(compactResult.didCompact))

    break
}

Constructor change

Add autoCompactUseCase: AutoCompactUseCase parameter.

DI Registration

FeatureModule.kt:

// FEAT-011: Auto Compact
factory { AutoCompactUseCase(get(), get(), get(), get()) }

// Update SendMessageUseCase to include AutoCompactUseCase
factory { SendMessageUseCase(get(), get(), get(), get(), get(), get(), get(), get(), get()) }

Constants

object CompactConstants {
    const val COMPACT_THRESHOLD_RATIO = 0.85
    const val PROTECTED_WINDOW_RATIO = 0.25
    const val TOOL_RESULT_MAX_CHARS = 30_000
    const val CHARS_PER_TOKEN_ESTIMATE = 4
    const val SUMMARY_MAX_TOKENS = 2048
    const val MAX_RETRIES = 1
}

These can be defined inline in their respective classes (AutoCompactUseCase.Companion, ToolResultTruncator, TokenEstimator) rather than in a separate constants object, following the existing pattern in the codebase (e.g., SendMessageUseCase.MAX_TOOL_ROUNDS).
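
To make the ratios concrete, the following sketch computes the trigger threshold and protected-window budget for a given context window. The names CompactBudget and budgetFor are hypothetical; the ratios and the chars-per-token value come from the constants above.

```kotlin
// Ratios and chars-per-token value taken from the constants above.
val compactThresholdRatio = 0.85
val protectedWindowRatio = 0.25
val charsPerTokenEstimate = 4

data class CompactBudget(
    val thresholdTokens: Int,   // compaction triggers above this estimate
    val protectedTokens: Int,   // budget for the most recent messages kept verbatim
    val thresholdChars: Int     // the trigger threshold expressed in characters
)

fun budgetFor(contextWindowSize: Int): CompactBudget {
    val thresholdTokens = (contextWindowSize * compactThresholdRatio).toInt()
    val protectedTokens = (contextWindowSize * protectedWindowRatio).toInt()
    return CompactBudget(thresholdTokens, protectedTokens, thresholdTokens * charsPerTokenEstimate)
}

val gpt4oBudget = budgetFor(128_000)    // threshold 108,800 tokens, protected 32,000 tokens
val claudeBudget = budgetFor(200_000)   // threshold 170,000 tokens, protected 50,000 tokens
```

In character terms, a 128,000-token model triggers compaction at roughly 435,200 characters of history.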

Data Flow

Auto Compact Flow

User sends message
  |
  v
SendMessageUseCase.execute()
  |
  +-> Get session + all messages
  +-> CompactAwareMessageBuilder.build(session, messages, systemPrompt)
  |     |
  |     +-> session.compactedSummary == null?
  |     |     YES -> return (originalPrompt, allMessages.toApiMessages())
  |     |     NO  -> filter messages >= boundaryTimestamp
  |     |            prepend summary to systemPrompt
  |     |            return (enhancedPrompt, recentMessages.toApiMessages())
  |
  +-> adapter.sendMessageStream(enhancedPrompt, filteredMessages)
  +-> Collect streaming response
  +-> Save AI response message
  |
  +-> pendingToolCalls.isEmpty()?
  |     NO  -> Execute tools, save results (with truncation), next round
  |     YES -> send(ResponseComplete)
  |            |
  |            v
  |            AutoCompactUseCase.compactIfNeeded(sessionId, model, provider)
  |              |
  |              +-> contextWindowSize == null? -> return (no-op)
  |              +-> TokenEstimator.estimateTotalTokens(messages)
  |              +-> totalTokens <= threshold? -> return (no-op)
  |              +-> splitMessages(messages, protectedBudget)
  |              +-> olderMessages.isEmpty()? -> return (no-op)
  |              +-> buildSummarizationPrompt(older, existingSummary)
  |              +-> adapter.generateSimpleCompletion(prompt, 2048)
  |              +-> success? -> sessionRepository.updateCompactedSummary()
  |              +-> failure? -> retry once -> still fail? -> return (no compact)
  |
  +-> break (flow ends)

Tool Result Truncation Flow

Tool executes -> returns result string
  |
  v
ToolResultTruncator.truncate(result)
  |
  +-> result.length <= 30,000? -> return as-is
  +-> result.length > 30,000? -> return first 30K chars + truncation marker
  |
  v
Save truncated result to Message.toolOutput -> DB

Error Handling

Scenario	Action
`contextWindowSize` is null	Skip compact entirely (no-op). Feature gracefully disabled.
All messages within protected window	Skip compact (nothing to summarize).
Summarization API fails (network error)	Retry once silently.
Retry also fails	Return `CompactResult(false)`. UI shows no error. Next request sends all messages – may hit API limit, which is handled by existing error flow.
Summarization returns empty/blank	Treat as failure, retry once.
Compact already in progress (race)	Not possible – compaction runs synchronously within the channelFlow after ResponseComplete, and the UI blocks new messages during streaming.
Tool result truncation	Never throws. Returns input unchanged if under limit.
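
The retry rows above can be sketched as a small helper. This is an illustration only: summarizeWithRetry and its attempt parameter are hypothetical, the lambda stands in for the suspend call to adapter.generateSimpleCompletion(), and a null return represents any failure.

```kotlin
// Calls `attempt` up to maxRetries + 1 times and returns the first non-blank summary.
// A null (failed call) or blank result counts as a failure; null is returned when
// every attempt fails, which the caller would map to CompactResult(false).
fun summarizeWithRetry(maxRetries: Int = 1, attempt: () -> String?): String? {
    repeat(maxRetries + 1) {
        val summary = attempt()
        if (!summary.isNullOrBlank()) return summary
    }
    return null
}
```

With the default of one retry, a blank first response followed by a valid second response still yields a summary, and two consecutive failures leave the session uncompacted.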

Performance Considerations

TokenEstimator: O(n) scan over messages, string length checks only. < 1 ms for typical sessions.
CompactAwareMessageBuilder: O(n) filter by timestamp. Negligible overhead.
AutoCompactUseCase: The generateSimpleCompletion call is the bottleneck (network I/O). Expected 2-10 seconds depending on model/provider. Runs after ResponseComplete has been sent, so user already sees the response.
ToolResultTruncator: O(1) length check, O(n) substring only when needed. < 1 ms.
DB migration: One-time cost. ALTER TABLE + UPDATE on 9 rows. < 100 ms.

Security Considerations

Compact summaries may contain sensitive information from the conversation. They are stored in the same local Room database with the same access controls as the original messages.
The summarization request sends conversation content to the same API provider already handling the conversation. No new data exposure.
Tool result truncation reduces the amount of potentially sensitive external data stored locally.

Implementation Steps

Phase 1: Database Schema Changes

Add contextWindowSize: Int? to AiModel and ModelEntity
Add compactedSummary: String? and compactBoundaryTimestamp: Long? to Session and SessionEntity
Create Migrations.kt with MIGRATION_1_2
Bump AppDatabase to version 2, update seed callback
Register migration in DatabaseModule.kt
Update ProviderMapper (model mapping)
Update SessionMapper (session mapping)
Add updateCompactedSummary to SessionDao, SessionRepository, SessionRepositoryImpl

Phase 2: Tool Result Truncation

Create ToolResultTruncator.kt
Integrate into SendMessageUseCase (tool result saving)
Write ToolResultTruncatorTest.kt

Phase 3: Auto Compact Core

Create TokenEstimator.kt
Create AutoCompactUseCase.kt
Create CompactAwareMessageBuilder.kt
Write TokenEstimatorTest.kt
Write AutoCompactUseCaseTest.kt
Write CompactAwareMessageBuilderTest.kt

Phase 4: Integration

Add CompactStarted / CompactCompleted to ChatEvent
Modify SendMessageUseCase: compact-aware message building, compact trigger, add dependency
Handle compact events in ChatViewModel (brief indicator / Snackbar on fallback)
Register AutoCompactUseCase in FeatureModule.kt, update SendMessageUseCase factory
Update existing SendMessageUseCaseTest for new constructor parameter

Phase 5: Testing

Run ./gradlew test – all unit tests pass
Run ./gradlew connectedAndroidTest – all instrumented tests pass (update DAO tests for new columns)
Write migration instrumented test
Layer 2 adb verification if applicable
Write test report

Dependencies

Room: Database migration support (already available)
ModelApiAdapter.generateSimpleCompletion(): Already implemented in all 3 adapters, used by GenerateTitleUseCase
No new external libraries required

Risks and Mitigations

Risk	Likelihood	Impact	Mitigation
Character-based token estimation is inaccurate	Medium	Low	Over-estimation is acceptable; triggers compact slightly early. Under-estimation may cause API errors, handled by existing error flow.
Summary quality degrades context continuity	Low	Medium	Prompt is explicit about what to preserve. Summaries are cumulative. User can start a new session if context drifts.
Migration failure on existing installs	Low	High	ALTER TABLE ADD COLUMN is safe in SQLite. Thoroughly tested with instrumented migration tests.
`generateSimpleCompletion` timeout	Low	Low	Retry once. If both fail, conversation continues without compaction.

Alternatives Considered

Sliding window (drop oldest messages): Simpler but loses all context from older messages. Summary approach preserves key information.
Client-side tokenizer: More accurate token counts but adds dependency complexity (tiktoken/sentencepiece). Character estimation is sufficient for threshold detection.
Store summary as a Message: Would require a new MessageType.COMPACT_SUMMARY. Storing on Session is cleaner – one field per session, not mixed into the message sequence.
Fixed message count for protected window: Unreliable because message lengths vary wildly (a tool result can be 30K chars). Token-based proportion is more robust.

Future Extensions

User-configurable compact threshold in settings
Manual compact trigger button
Token usage indicator in chat UI
Auto-populate contextWindowSize when fetching models from provider APIs
“Reset context” button to clear compactedSummary

Change History

Date	Version	Changes	Owner
2026-02-28	0.1	Initial version	-