RFC-042: Telegram Bridge Image Delivery Fix
Document Information
- RFC ID: RFC-042
- Related PRD: FEAT-042 (Telegram Image Delivery Fix)
- Created: 2026-03-01
- Last Updated: 2026-03-01
- Status: Draft
- Author: TBD
Overview
Background
When a user sends an image (with or without a text caption) via Telegram to the bridge bot, the agent receives neither the image nor the caption. Two independent bugs cause this:
- Wrong field for photo caption (TelegramChannel.kt): Telegram places the text accompanying a photo in message["caption"], not message["text"]. The current code reads only message["text"], which is null for photo messages, so the caption is silently discarded.
- imagePaths dropped at the executor boundary (BridgeAgentExecutorImpl.kt): BridgeAgentExecutorImpl.executeMessage() receives imagePaths from the channel layer but does not pass them to SendMessageUseCase.execute(), so the downloaded image files never reach the agent.
Goals
- Fix caption extraction in TelegramChannel to read message["caption"] for photo messages.
- Fix BridgeAgentExecutorImpl to convert imagePaths into PendingAttachment objects and pass them to SendMessageUseCase.
Non-Goals
- Multi-photo album support (Telegram sends each album photo as a separate update; out of scope)
- Image support for other channels (Discord, etc.)
- Outbound image responses from the agent to Telegram
Technical Design
Changed Files Overview
bridge/src/main/kotlin/com/oneclaw/shadow/bridge/channel/telegram/
└── TelegramChannel.kt # MODIFIED (caption extraction)
app/src/main/kotlin/com/oneclaw/shadow/
└── feature/bridge/
└── BridgeAgentExecutorImpl.kt # MODIFIED (imagePaths -> PendingAttachment)
bridge/src/test/kotlin/com/oneclaw/shadow/bridge/channel/telegram/
└── TelegramChannelTest.kt # MODIFIED (caption tests)
app/src/test/kotlin/com/oneclaw/shadow/feature/bridge/
└── BridgeAgentExecutorImplTest.kt # MODIFIED (imagePaths forwarding tests)
Detailed Design
Fix 1: Caption Extraction in TelegramChannel
Location: TelegramChannel.kt, in the polling loop where the update message is parsed.
Current code (reads only message["text"]):
val text = message["text"]?.jsonPrimitive?.content ?: ""
Fixed code (falls back to message["caption"] for photo messages):
val text = message["text"]?.jsonPrimitive?.content
?.takeIf { it.isNotBlank() }
?: message["caption"]?.jsonPrimitive?.content
?: ""
Rationale: Telegram's Bot API places message text in message.text for text-only messages and the caption in message.caption for photo, document, and video messages. The two fields are mutually exclusive: a photo message never has a text field. Reading caption as the fallback therefore covers every photo message variant without affecting text-only messages.
Behavior after fix:
| Telegram message type | message["text"] | message["caption"] | Resulting text |
|---|---|---|---|
| Text only | "Hello" | null | "Hello" |
| Photo with caption | null | "What is this?" | "What is this?" |
| Photo without caption | null | null | "" |
| Text only (unchanged) | "Hi" | null | "Hi" |
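The fallback order in the table can be exercised without a Telegram connection by pulling the expression into a pure function. This is a minimal sketch, assuming the parsed message is modelled as a plain `Map<String, String?>` rather than the kotlinx `JsonObject` that `TelegramChannel` actually handles; `extractMessageText` is a hypothetical helper name, not existing code.

```kotlin
// Hypothetical helper mirroring the fixed expression in TelegramChannel.
// Assumption: the message is a Map<String, String?> instead of a kotlinx JsonObject.
fun extractMessageText(message: Map<String, String?>): String =
    message["text"]
        ?.takeIf { it.isNotBlank() } // text-only messages: use message["text"]
        ?: message["caption"]        // photo/document/video messages: use the caption
        ?: ""                        // photo without caption: empty text
```

Each row of the table maps to one call, e.g. `extractMessageText(mapOf("caption" to "What is this?"))` yields `"What is this?"`. Note that a blank `text` value also falls through to the caption, which matches the `takeIf { it.isNotBlank() }` in the fixed code.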
Fix 2: imagePaths Forwarded in BridgeAgentExecutorImpl
Location: BridgeAgentExecutorImpl.kt
Current code (imagePaths received but not forwarded):
override suspend fun executeMessage(
conversationId: String,
userMessage: String,
imagePaths: List<String>
) {
val agentId = resolveAgentId()
sendMessageUseCase.execute(
sessionId = conversationId,
userText = userMessage,
agentId = agentId
// imagePaths is silently dropped here
).collect()
}
Fixed code (imagePaths converted to PendingAttachment and forwarded):
override suspend fun executeMessage(
conversationId: String,
userMessage: String,
imagePaths: List<String>
) {
val agentId = resolveAgentId()
val pendingAttachments = imagePaths.mapNotNull { path ->
val file = File(path)
if (!file.exists()) return@mapNotNull null
AttachmentFileManager.PendingAttachment(
id = UUID.randomUUID().toString(),
type = AttachmentType.IMAGE,
fileName = file.name,
mimeType = mimeTypeFromExtension(file.extension),
fileSize = file.length(),
filePath = path,
thumbnailPath = null,
width = null,
height = null,
durationMs = null
)
}
sendMessageUseCase.execute(
sessionId = conversationId,
userText = userMessage,
agentId = agentId,
pendingAttachments = pendingAttachments
).collect()
}
private fun mimeTypeFromExtension(ext: String): String = when (ext.lowercase()) {
"jpg", "jpeg" -> "image/jpeg"
"png" -> "image/png"
"gif" -> "image/gif"
"webp" -> "image/webp"
"bmp" -> "image/bmp"
else -> "image/jpeg"
}
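The path-to-attachment conversion can be tried outside Android with the sketch below. It is a hedged approximation: `PendingAttachmentSketch` and `toPendingAttachments` are hypothetical names, the data class models only some of the fields the RFC sets (the real `AttachmentFileManager.PendingAttachment` also carries `thumbnailPath`, `width`, `height` and `durationMs`), and `type` is a plain string here instead of `AttachmentType.IMAGE`.

```kotlin
import java.io.File
import java.util.UUID

// Hypothetical stand-in for AttachmentFileManager.PendingAttachment (subset of fields).
data class PendingAttachmentSketch(
    val id: String,
    val type: String,      // the real code uses AttachmentType.IMAGE
    val fileName: String,
    val mimeType: String,
    val fileSize: Long,
    val filePath: String
)

// Same extension table as mimeTypeFromExtension() in the fix, including the
// "image/jpeg" fallback discussed under Open Questions.
fun mimeTypeFromExtension(ext: String): String = when (ext.lowercase()) {
    "jpg", "jpeg" -> "image/jpeg"
    "png" -> "image/png"
    "gif" -> "image/gif"
    "webp" -> "image/webp"
    "bmp" -> "image/bmp"
    else -> "image/jpeg"
}

// Converts downloaded image paths into attachments, skipping any path
// whose file does not exist (e.g. a download that was never written).
fun toPendingAttachments(imagePaths: List<String>): List<PendingAttachmentSketch> =
    imagePaths.mapNotNull { path ->
        val file = File(path)
        if (!file.exists()) return@mapNotNull null
        PendingAttachmentSketch(
            id = UUID.randomUUID().toString(),
            type = "IMAGE",
            fileName = file.name,
            mimeType = mimeTypeFromExtension(file.extension),
            fileSize = file.length(),
            filePath = path
        )
    }
```

Because the helper is a pure function of the path list, the missing-file and empty-list cases listed under Testing can be checked without mocking `SendMessageUseCase`.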
Why PendingAttachment directly (no AttachmentFileManager.copyFromUri)?
BridgeImageStorage.downloadAndStore() already writes the downloaded image to context.filesDir/bridge_images/ and returns its absolute path, so the file is already in app-private storage. PendingAttachment only requires a valid filePath; the AttachmentFileManager.copyFromUri() step, which exists for user-picked URIs, is not needed here. We therefore construct PendingAttachment directly and skip the URI copy.
How SendMessageUseCase processes it:
SendMessageUseCase.execute() at lines 169–184 reads pendingAttachments, filters them by ProviderCapability.supportsAttachmentType(provider.type, it.type), then calls attachmentFileManager.readAsBase64(pending.filePath) to convert each file to base64 for the API call. attachmentFileManager is already injected in the Koin featureModule, so no DI changes are required.
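The consumer side described above (filter by provider capability, then base64-encode each file) can be sketched as follows. This is an illustration under stated assumptions, not the use case's actual code: `supportsType` stands in for `ProviderCapability.supportsAttachmentType(provider.type, it.type)`, `encodeFileAsBase64` stands in for `attachmentFileManager.readAsBase64()`, and `AttachmentRef` and `EncodedImage` are hypothetical types.

```kotlin
import java.io.File
import java.util.Base64

// Hypothetical minimal attachment shape: only what the filter and encoder need.
data class AttachmentRef(val type: String, val mimeType: String, val filePath: String)

// Hypothetical payload handed to the model provider.
data class EncodedImage(val mimeType: String, val base64Data: String)

// Stand-in for attachmentFileManager.readAsBase64(filePath).
fun encodeFileAsBase64(filePath: String): String =
    Base64.getEncoder().encodeToString(File(filePath).readBytes())

// Drops attachment types the provider cannot accept, then encodes the rest.
// Filtering first means unsupported files are never read from disk.
fun buildImagePayloads(
    attachments: List<AttachmentRef>,
    supportsType: (String) -> Boolean
): List<EncodedImage> =
    attachments
        .filter { supportsType(it.type) }
        .map { EncodedImage(it.mimeType, encodeFileAsBase64(it.filePath)) }
```

Under this reading, a provider without image support simply receives the text, which is consistent with the executor change needing no capability checks of its own.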
The image is also persisted to AttachmentRepository (lines 106–125), so it appears in the chat UI alongside the user message.
Required imports added to BridgeAgentExecutorImpl:
import com.oneclaw.shadow.core.model.AttachmentType
import com.oneclaw.shadow.data.local.AttachmentFileManager
import java.io.File
import java.util.UUID
Testing
Unit Tests
TelegramChannelTest – add:
- photoMessage_withCaption_extractsCaption: mock an update with a photo array and a caption field; assert ChannelMessage.text == "caption text".
- photoMessage_withoutCaption_textIsEmpty: mock an update with a photo array and no caption; assert ChannelMessage.text == "".
- textMessage_noChange_readsTextField: mock an update with a text field only; assert ChannelMessage.text == "text content".
BridgeAgentExecutorImplTest – add:
- executeMessage_withImagePaths_createsPendingAttachments: mock sendMessageUseCase.execute(); call executeMessage() with one imagePath pointing to a temp file; verify execute() is called with a pendingAttachments list of size 1.
- executeMessage_withMissingImageFile_skipsIt: call executeMessage() with a path to a non-existent file; verify execute() is called with an empty pendingAttachments list.
- executeMessage_noImagePaths_forwardsEmpty: call executeMessage() with empty imagePaths; verify execute() is called with an empty pendingAttachments list.
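As an alternative to a mocking library, the three executor cases can be written against a hand-written recording fake. The sketch below is deliberately non-suspending to stay dependency-free, and every name in it (`MessageSender`, `RecordingSender`, `ExecutorSketch`) is hypothetical; the real `executeMessage()` is a `suspend` function that collects the `Flow` returned by `SendMessageUseCase.execute()`.

```kotlin
import java.io.File

// Hypothetical seam standing in for SendMessageUseCase.execute().
interface MessageSender {
    fun send(sessionId: String, userText: String, attachmentPaths: List<String>)
}

// Records the forwarded attachment list of every call for later assertions.
class RecordingSender : MessageSender {
    val calls = mutableListOf<List<String>>()
    override fun send(sessionId: String, userText: String, attachmentPaths: List<String>) {
        calls.add(attachmentPaths)
    }
}

// Simplified executor: keeps only the existence filter from the fix.
class ExecutorSketch(private val sender: MessageSender) {
    fun executeMessage(conversationId: String, userMessage: String, imagePaths: List<String>) {
        val existing = imagePaths.filter { File(it).exists() }
        sender.send(conversationId, userMessage, existing)
    }
}
```

A fake like this asserts on exactly what was forwarded, which keeps each test's expectation (size 1, empty, empty) visible in one place.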
Manual Verification
- Send a Telegram photo with the caption "describe this image". Verify that the caption and the image both appear in the active session in the app, and that the agent responds with a description of the image.
- Send a Telegram photo with no caption. Verify that the image appears in the session and that the agent receives and responds to it.
- Send a plain text message via Telegram. Verify that behavior is unchanged (no regression).
- If the image download fails (e.g., a malformed file_id), verify that the message is still processed as text-only.
Migration Notes
- No database schema changes.
- BridgeAgentExecutorImpl adds four new imports (AttachmentType, AttachmentFileManager, File, UUID). No constructor changes are required.
- TelegramChannel polling loop: a single-statement change to text extraction. No interface or signature changes.
Open Questions
- Should the fallback mimeType default to "image/jpeg" (current proposal) or to "application/octet-stream" when the extension is unknown?
- Should the bridge-downloaded image files in bridge_images/ be cleaned up after they are forwarded to the agent, or retained for potential reuse?