RFC-033: PDF Tools
RFC-033: PDF Tools
Document Information
- RFC ID: RFC-033
- Related PRD: FEAT-033 (PDF Tools)
- Related Architecture: RFC-000 (Overall Architecture)
- Related RFC: RFC-004 (Tool System)
- Created: 2026-03-01
- Last Updated: 2026-03-01
- Status: Draft
- Author: TBD
Overview
Background
AI agents frequently need to work with PDF documents – summarizing reports, extracting data from invoices, reading research papers, or analyzing scanned forms. Currently, OneClaw has no built-in capability to read PDF files. Users can attach files (FEAT-026) and browse the file system (FEAT-025), but agents cannot extract content from PDFs.
OneClaw 1.0 has a mature PDF tools implementation in lib-pdf that provides three tools: pdf_info, pdf_extract_text, and pdf_render_page. This RFC ports that functionality to OneClaw’s tool architecture as Kotlin built-in tools, adapting the code to use OneClaw’s Tool interface, ToolResult, and ToolDefinition data types.
Goals
- Implement three Kotlin built-in tools:
PdfInfoTool,PdfExtractTextTool,PdfRenderPageTool - Create shared utility
PdfToolUtilsfor path resolution and page range parsing - Add PDFBox Android dependency to the project
- Initialize PDFBox in the application startup
- Register all three tools in
ToolModule - Add unit tests for all tools and utilities
Non-Goals
- PDF creation, editing, or annotation
- OCR for scanned PDFs (vision-capable models can analyze rendered images)
- Password-protected PDF support (deferred to future iteration)
- PDF form interaction
- PDF-to-Markdown conversion (beyond raw text extraction)
Technical Design
Architecture Overview
┌──────────────────────────────────────────────────────────────┐
│ Chat Layer (RFC-001) │
│ SendMessageUseCase │
│ │ │
│ │ tool call: pdf_info / pdf_extract_text / │
│ │ pdf_render_page │
│ v │
├──────────────────────────────────────────────────────────────┤
│ Tool Execution Engine (RFC-004) │
│ executeTool(name, params, availableToolIds) │
│ │ │
│ v │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ ToolRegistry │ │
│ │ │ │
│ │ ┌─────────────┐ ┌──────────────────┐ ┌───────────┐ │ │
│ │ │ pdf_info │ │ pdf_extract_text │ │pdf_render │ │ │
│ │ │(PdfInfoTool) │ │(PdfExtractText │ │ _page │ │ │
│ │ │ │ │ Tool) │ │(PdfRender │ │ │
│ │ │ │ │ │ │ PageTool) │ │ │
│ │ └──────┬──────┘ └────────┬─────────┘ └─────┬─────┘ │ │
│ │ │ │ │ │ │
│ │ v v v │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ PdfToolUtils │ │ │
│ │ │ - initPdfBox(context) │ │ │
│ │ │ - parsePageRange(spec, totalPages) │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ │ │ │ │ │ │
│ │ v v v │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │ │
│ │ │ PDFBox │ │ PDFBox │ │ Android │ │ │
│ │ │ PDDocument │ │ PDFText │ │ PdfRenderer │ │ │
│ │ │ .docInfo │ │ Stripper │ │ + Bitmap │ │ │
│ │ └──────────────┘ └──────────────┘ └─────────────┘ │ │
│ └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Core Components
New:
PdfInfoTool– Kotlin built-in tool that reads PDF metadataPdfExtractTextTool– Kotlin built-in tool that extracts text from PDFsPdfRenderPageTool– Kotlin built-in tool that renders PDF pages as PNG imagesPdfToolUtils– Shared utility for PDFBox initialization and page range parsing
Modified:
ToolModule– Register the three PDF toolsbuild.gradle.kts– Add PDFBox Android dependency
Detailed Design
Directory Structure (New & Changed Files)
app/src/main/
├── kotlin/com/oneclaw/shadow/
│ ├── tool/
│ │ ├── builtin/
│ │ │ ├── PdfInfoTool.kt # NEW
│ │ │ ├── PdfExtractTextTool.kt # NEW
│ │ │ ├── PdfRenderPageTool.kt # NEW
│ │ │ ├── WebfetchTool.kt # unchanged
│ │ │ ├── BrowserTool.kt # unchanged
│ │ │ ├── LoadSkillTool.kt # unchanged
│ │ │ ├── CreateScheduledTaskTool.kt # unchanged
│ │ │ └── CreateAgentTool.kt # unchanged
│ │ └── util/
│ │ ├── PdfToolUtils.kt # NEW
│ │ └── HtmlToMarkdownConverter.kt # unchanged
│ └── di/
│ └── ToolModule.kt # MODIFIED
app/src/test/kotlin/com/oneclaw/shadow/
└── tool/
├── builtin/
│ ├── PdfInfoToolTest.kt # NEW
│ ├── PdfExtractTextToolTest.kt # NEW
│ └── PdfRenderPageToolTest.kt # NEW
└── util/
└── PdfToolUtilsTest.kt # NEW
PdfToolUtils
/**
* Located in: tool/util/PdfToolUtils.kt
*
* Shared utilities for PDF tools: PDFBox initialization
* and page range parsing.
*/
object PdfToolUtils {
private const val TAG = "PdfToolUtils"
private var initialized = false
/**
* Initialize PDFBox resource loader. Must be called once
* before any PDFBox operations. Safe to call multiple times.
*/
fun initPdfBox(context: Context) {
if (!initialized) {
PDFBoxResourceLoader.init(context.applicationContext)
initialized = true
Log.i(TAG, "PDFBox initialized")
}
}
/**
* Parse a page range specification string.
*
* Supported formats:
* - Single page: "3"
* - Range: "1-5"
* - Comma-separated: "1,3,5-7"
*
* @param spec Page range specification string
* @param totalPages Total number of pages in the document
* @return Pair of (startPage, endPage) 1-based inclusive, or null if invalid
*/
fun parsePageRange(spec: String, totalPages: Int): Pair<Int, Int>? {
val trimmed = spec.trim()
// Comma-separated: find overall min and max
if (trimmed.contains(",")) {
val parts = trimmed.split(",").map { it.trim() }
var min = Int.MAX_VALUE
var max = Int.MIN_VALUE
for (part in parts) {
val range = parsePageRange(part, totalPages) ?: return null
min = minOf(min, range.first)
max = maxOf(max, range.second)
}
return Pair(min, max)
}
// Range: "start-end"
if (trimmed.contains("-")) {
val parts = trimmed.split("-", limit = 2)
val start = parts[0].trim().toIntOrNull() ?: return null
val end = parts[1].trim().toIntOrNull() ?: return null
if (start < 1 || end > totalPages || start > end) return null
return Pair(start, end)
}
// Single page
val page = trimmed.toIntOrNull() ?: return null
if (page < 1 || page > totalPages) return null
return Pair(page, page)
}
}
PdfInfoTool
/**
* Located in: tool/builtin/PdfInfoTool.kt
*
* Reads PDF metadata: page count, file size, title, author,
* subject, creator, producer, and creation date.
*/
class PdfInfoTool(
private val context: Context
) : Tool {
companion object {
private const val TAG = "PdfInfoTool"
}
override val definition = ToolDefinition(
name = "pdf_info",
description = "Get metadata and info about a PDF file. " +
"Returns page count, file size, title, author, and other document properties.",
parametersSchema = ToolParametersSchema(
properties = mapOf(
"path" to ToolParameter(
type = "string",
description = "Path to the PDF file"
)
),
required = listOf("path")
),
requiredPermissions = emptyList(),
timeoutSeconds = 15
)
override suspend fun execute(parameters: Map<String, Any?>): ToolResult {
val path = parameters["path"]?.toString()
?: return ToolResult.error("validation_error", "Parameter 'path' is required")
val file = File(path)
if (!file.exists()) {
return ToolResult.error("file_not_found", "File not found: $path")
}
if (!file.canRead()) {
return ToolResult.error("permission_denied", "Cannot read file: $path")
}
PdfToolUtils.initPdfBox(context)
return try {
val doc = PDDocument.load(file)
val result = try {
val info = doc.documentInformation
val lines = mutableListOf<String>()
lines.add("File: ${file.name}")
lines.add("Path: $path")
lines.add("Pages: ${doc.numberOfPages}")
lines.add("File size: ${file.length()} bytes")
info.title?.takeIf { it.isNotBlank() }?.let {
lines.add("Title: $it")
}
info.author?.takeIf { it.isNotBlank() }?.let {
lines.add("Author: $it")
}
info.subject?.takeIf { it.isNotBlank() }?.let {
lines.add("Subject: $it")
}
info.creator?.takeIf { it.isNotBlank() }?.let {
lines.add("Creator: $it")
}
info.producer?.takeIf { it.isNotBlank() }?.let {
lines.add("Producer: $it")
}
info.creationDate?.time?.let {
lines.add("Created: $it")
}
ToolResult.success(lines.joinToString("\n"))
} finally {
doc.close()
}
result
} catch (e: Exception) {
Log.e(TAG, "Failed to read PDF info: $path", e)
ToolResult.error("pdf_error", "Failed to read PDF info: ${e.message}")
}
}
}
PdfExtractTextTool
/**
* Located in: tool/builtin/PdfExtractTextTool.kt
*
* Extracts text content from PDF files using PDFBox's PDFTextStripper.
* Supports page range selection and output truncation.
*/
class PdfExtractTextTool(
private val context: Context
) : Tool {
companion object {
private const val TAG = "PdfExtractTextTool"
private const val DEFAULT_MAX_CHARS = 50_000
}
override val definition = ToolDefinition(
name = "pdf_extract_text",
description = "Extract text content from a PDF file. " +
"Supports page range selection. For scanned PDFs with no text layer, " +
"use pdf_render_page to get page images instead.",
parametersSchema = ToolParametersSchema(
properties = mapOf(
"path" to ToolParameter(
type = "string",
description = "Path to the PDF file"
),
"pages" to ToolParameter(
type = "string",
description = "Page range to extract (e.g. \"1-5\", \"3\", \"1,3,5-7\"). " +
"Omit to extract all pages."
),
"max_chars" to ToolParameter(
type = "integer",
description = "Maximum characters to return (default 50000)"
)
),
required = listOf("path")
),
requiredPermissions = emptyList(),
timeoutSeconds = 30
)
override suspend fun execute(parameters: Map<String, Any?>): ToolResult {
val path = parameters["path"]?.toString()
?: return ToolResult.error("validation_error", "Parameter 'path' is required")
val maxChars = (parameters["max_chars"] as? Number)?.toInt()
?: DEFAULT_MAX_CHARS
val pagesArg = parameters["pages"]?.toString()
val file = File(path)
if (!file.exists()) {
return ToolResult.error("file_not_found", "File not found: $path")
}
if (!file.canRead()) {
return ToolResult.error("permission_denied", "Cannot read file: $path")
}
PdfToolUtils.initPdfBox(context)
return try {
val doc = PDDocument.load(file)
val result = try {
val stripper = PDFTextStripper()
val totalPages = doc.numberOfPages
if (pagesArg != null) {
val range = PdfToolUtils.parsePageRange(pagesArg, totalPages)
?: return ToolResult.error(
"invalid_page_range",
"Invalid page range: $pagesArg (document has $totalPages pages)"
)
stripper.startPage = range.first
stripper.endPage = range.second
}
val text = stripper.getText(doc)
if (text.isBlank()) {
ToolResult.success(
"No text content found in PDF. This may be a scanned document. " +
"Use pdf_render_page to render pages as images for visual inspection."
)
} else {
val truncated = if (text.length > maxChars) {
text.take(maxChars) +
"\n\n[Truncated at $maxChars characters. " +
"Total text length: ${text.length}. " +
"Use 'pages' parameter to extract specific pages.]"
} else {
text
}
val header = "Extracted text from ${file.name}" +
(if (pagesArg != null) " (pages: $pagesArg)" else "") +
" [$totalPages total pages]:\n\n"
ToolResult.success(header + truncated)
}
} finally {
doc.close()
}
result
} catch (e: Exception) {
Log.e(TAG, "Failed to extract PDF text: $path", e)
ToolResult.error("pdf_error", "Failed to extract PDF text: ${e.message}")
}
}
}
PdfRenderPageTool
/**
* Located in: tool/builtin/PdfRenderPageTool.kt
*
* Renders a PDF page to a PNG image using Android's PdfRenderer.
* Saves the output to the app's internal pdf-renders/ directory.
*/
class PdfRenderPageTool(
private val context: Context
) : Tool {
companion object {
private const val TAG = "PdfRenderPageTool"
private const val DEFAULT_DPI = 150
private const val MIN_DPI = 72
private const val MAX_DPI = 300
}
override val definition = ToolDefinition(
name = "pdf_render_page",
description = "Render a PDF page to a PNG image. " +
"Useful for scanned PDFs or pages with complex layouts, charts, or images. " +
"The rendered image is saved to pdf-renders/ in the app's storage.",
parametersSchema = ToolParametersSchema(
properties = mapOf(
"path" to ToolParameter(
type = "string",
description = "Path to the PDF file"
),
"page" to ToolParameter(
type = "integer",
description = "Page number to render (1-based)"
),
"dpi" to ToolParameter(
type = "integer",
description = "Render resolution in DPI (default 150, min 72, max 300)"
)
),
required = listOf("path", "page")
),
requiredPermissions = emptyList(),
timeoutSeconds = 30
)
override suspend fun execute(parameters: Map<String, Any?>): ToolResult {
val path = parameters["path"]?.toString()
?: return ToolResult.error("validation_error", "Parameter 'path' is required")
val pageNum = (parameters["page"] as? Number)?.toInt()
?: return ToolResult.error("validation_error", "Parameter 'page' is required")
val dpi = ((parameters["dpi"] as? Number)?.toInt() ?: DEFAULT_DPI)
.coerceIn(MIN_DPI, MAX_DPI)
val file = File(path)
if (!file.exists()) {
return ToolResult.error("file_not_found", "File not found: $path")
}
if (!file.canRead()) {
return ToolResult.error("permission_denied", "Cannot read file: $path")
}
return try {
val fd = ParcelFileDescriptor.open(file, ParcelFileDescriptor.MODE_READ_ONLY)
val renderer = PdfRenderer(fd)
val pageIndex = pageNum - 1
if (pageIndex < 0 || pageIndex >= renderer.pageCount) {
renderer.close()
fd.close()
return ToolResult.error(
"invalid_page",
"Page $pageNum out of range (document has ${renderer.pageCount} pages)"
)
}
val page = renderer.openPage(pageIndex)
val scale = dpi / 72f
val width = (page.width * scale).toInt()
val height = (page.height * scale).toInt()
val bitmap = Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888)
bitmap.eraseColor(Color.WHITE)
page.render(
bitmap,
null,
null,
PdfRenderer.Page.RENDER_MODE_FOR_DISPLAY
)
page.close()
renderer.close()
fd.close()
// Save PNG to app's internal storage
val outputDir = File(context.filesDir, "pdf-renders").also { it.mkdirs() }
val baseName = file.nameWithoutExtension
val outputFile = File(outputDir, "${baseName}-page${pageNum}.png")
FileOutputStream(outputFile).use { out ->
bitmap.compress(Bitmap.CompressFormat.PNG, 100, out)
}
bitmap.recycle()
ToolResult.success(
"Page $pageNum rendered and saved to: ${outputFile.absolutePath}\n" +
"Resolution: ${width}x${height} (${dpi} DPI)\n" +
"File size: ${outputFile.length()} bytes"
)
} catch (e: Exception) {
Log.e(TAG, "Failed to render PDF page: $path page $pageNum", e)
ToolResult.error("pdf_error", "Failed to render PDF page: ${e.message}")
}
}
}
PDFBox Android Dependency
Add to app/build.gradle.kts:
dependencies {
// ... existing dependencies ...
// PDF tools: PDFBox for text extraction and metadata
implementation("com.tom-roush:pdfbox-android:2.0.27.0")
}
PDFBox Android:
- Size: ~2.5MB (includes fonts and resources)
- License: Apache 2.0
- Transitive dependencies: none significant
- Compatible with Android API 21+
- Used by OneClaw 1.0 (
lib-pdfmodule)
ToolModule Changes
// In ToolModule.kt, add imports:
import com.oneclaw.shadow.tool.builtin.PdfInfoTool
import com.oneclaw.shadow.tool.builtin.PdfExtractTextTool
import com.oneclaw.shadow.tool.builtin.PdfRenderPageTool
import com.oneclaw.shadow.tool.util.PdfToolUtils
val toolModule = module {
// ... existing declarations ...
// RFC-033: PDF tools
single {
PdfToolUtils.initPdfBox(androidContext())
PdfInfoTool(androidContext())
}
single {
PdfExtractTextTool(androidContext())
}
single {
PdfRenderPageTool(androidContext())
}
single {
ToolRegistry().apply {
// ... existing tool registrations ...
// RFC-033: PDF tools
try {
register(get<PdfInfoTool>(), ToolSourceInfo.BUILTIN)
} catch (e: Exception) {
Log.e("ToolModule", "Failed to register pdf_info: ${e.message}")
}
try {
register(get<PdfExtractTextTool>(), ToolSourceInfo.BUILTIN)
} catch (e: Exception) {
Log.e("ToolModule", "Failed to register pdf_extract_text: ${e.message}")
}
try {
register(get<PdfRenderPageTool>(), ToolSourceInfo.BUILTIN)
} catch (e: Exception) {
Log.e("ToolModule", "Failed to register pdf_render_page: ${e.message}")
}
// ... JS tool loading (unchanged) ...
}
}
// ... rest of module unchanged ...
}
Implementation Plan
Phase 1: Dependencies and Utilities
- Add
com.tom-roush:pdfbox-android:2.0.27.0toapp/build.gradle.kts - Create
PdfToolUtils.ktintool/util/ - Create
PdfToolUtilsTest.ktwith tests forparsePageRange() - Verify build compiles successfully
Phase 2: PdfInfoTool
- Create
PdfInfoTool.ktintool/builtin/ - Create
PdfInfoToolTest.ktwith unit tests - Register in
ToolModule.kt - Verify
./gradlew testpasses
Phase 3: PdfExtractTextTool
- Create
PdfExtractTextTool.ktintool/builtin/ - Create
PdfExtractTextToolTest.ktwith unit tests - Register in
ToolModule.kt - Verify
./gradlew testpasses
Phase 4: PdfRenderPageTool
- Create
PdfRenderPageTool.ktintool/builtin/ - Create
PdfRenderPageToolTest.ktwith unit tests - Register in
ToolModule.kt - Verify
./gradlew testpasses
Phase 5: Integration Testing
- Run full Layer 1A test suite (
./gradlew test) - Run Layer 1B tests if emulator available
- Manual testing with real PDF files on device
- Write test report
Data Model
No data model or database changes. The tools operate on files and return string results through the existing ToolResult type.
API Design
Tool Interfaces
Tool: pdf_info
Parameters:
- path: string (required) -- Path to the PDF file
Returns on success:
Multi-line text with file info, page count, and metadata fields
Returns on error:
ToolResult.error with error type and message
Tool: pdf_extract_text
Parameters:
- path: string (required) -- Path to the PDF file
- pages: string (optional) -- Page range specification
- max_chars: integer (optional, default: 50000) -- Output character limit
Returns on success:
Header line + extracted text content
Returns on error:
ToolResult.error with error type and message
Tool: pdf_render_page
Parameters:
- path: string (required) -- Path to the PDF file
- page: integer (required) -- Page number (1-based)
- dpi: integer (optional, default: 150) -- Render resolution (72-300)
Returns on success:
Text with output file path, resolution, and file size
Returns on error:
ToolResult.error with error type and message
Error Handling
| Error Type | Cause | Response |
|---|---|---|
validation_error |
Missing or invalid required parameter | ToolResult.error("validation_error", "Parameter 'X' is required") |
file_not_found |
File does not exist at given path | ToolResult.error("file_not_found", "File not found: <path>") |
permission_denied |
Cannot read the file | ToolResult.error("permission_denied", "Cannot read file: <path>") |
invalid_page |
Page number out of document range | ToolResult.error("invalid_page", "Page N out of range (document has M pages)") |
invalid_page_range |
Malformed page range string | ToolResult.error("invalid_page_range", "Invalid page range: <spec>") |
pdf_error |
PDFBox or PdfRenderer exception | ToolResult.error("pdf_error", "Failed to ...: <exception message>") |
All errors follow the existing ToolResult.error(errorType, errorMessage) pattern used by other built-in tools.
Security Considerations
-
File access: Tools accept file paths within app-private storage. No external storage permissions are required – file access is confined to app-private storage (context.filesDir) via FsBridge allowlist validation.
-
Resource management: PDDocument, PdfRenderer, ParcelFileDescriptor, and Bitmap objects are properly closed/recycled in try-finally blocks to prevent resource leaks.
-
Memory safety: PDFBox may consume significant memory for large PDFs. The tools do not load the entire document into memory at once – PDFTextStripper processes pages sequentially. PdfRenderer renders one page at a time with the bitmap recycled after saving.
-
Output isolation: Rendered PNG files are saved to the app’s internal storage (
context.filesDir/pdf-renders/), not to shared external storage. -
No network access: All operations are local file operations. No data is sent externally.
Performance
| Operation | Expected Time | Notes |
|---|---|---|
pdf_info (typical PDF) |
< 500ms | Opens file, reads metadata, closes |
pdf_extract_text (10 pages) |
< 1s | PDFTextStripper sequential processing |
pdf_extract_text (100 pages) |
< 5s | Linear scaling with page count |
pdf_render_page (150 DPI) |
< 2s | Render + PNG compression |
pdf_render_page (300 DPI) |
< 5s | 4x pixels vs 150 DPI |
Memory usage:
- PDDocument: ~2-10MB for typical PDFs (closed after use)
- Bitmap at 150 DPI (A4 page): ~3.5MB (recycled after saving)
- Bitmap at 300 DPI (A4 page): ~14MB (recycled after saving)
Testing Strategy
Unit Tests
PdfToolUtilsTest.kt:
testParsePageRange_singlePage– “3” returns (3, 3)testParsePageRange_range– “1-5” returns (1, 5)testParsePageRange_commaSeparated– “1,3,5-7” returns (1, 7)testParsePageRange_rangeWithSpaces– “1 - 5” returns (1, 5)testParsePageRange_invalidRange– “5-2” returns nulltestParsePageRange_outOfBounds– “0” and “11” for 10-page doc return nulltestParsePageRange_nonNumeric– “abc” returns nulltestParsePageRange_singlePageRange– “1-1” returns (1, 1)
PdfInfoToolTest.kt:
testDefinition– Tool definition has correct name, parameters, permissionstestExecute_missingPath– Returns validation errortestExecute_fileNotFound– Returns file_not_found errortestExecute_validPdf– Returns page count, file size, metadata (using test PDF resource)
PdfExtractTextToolTest.kt:
testDefinition– Tool definition has correct name, parameters, permissionstestExecute_missingPath– Returns validation errortestExecute_fileNotFound– Returns file_not_found errortestExecute_extractAllPages– Returns full text (using test PDF resource)testExecute_extractPageRange– Returns text from specified pagestestExecute_invalidPageRange– Returns invalid_page_range errortestExecute_truncation– Returns truncated text with notice when exceeding max_charstestExecute_defaultMaxChars– Default max_chars is 50000
PdfRenderPageToolTest.kt:
testDefinition– Tool definition has correct name, parameters, permissionstestExecute_missingPath– Returns validation errortestExecute_missingPage– Returns validation errortestExecute_fileNotFound– Returns file_not_found errortestExecute_pageOutOfRange– Returns invalid_page errortestExecute_dpiClamping– DPI values outside 72-300 are clamped
Test PDF Resources
Place test PDF files in app/src/test/resources/:
test-document.pdf– A simple text-based PDF with known content (2-3 pages)- Created programmatically in test setup using PDFBox, or checked in as a small test fixture
Integration Tests (Manual)
- Install app on device with PDFs in
/sdcard/Documents/ - Ask agent to summarize a text-based PDF
- Ask agent to render a page from a scanned PDF
- Ask agent to extract specific pages from a long document
- Verify tool calls appear correctly in chat history
Alternatives Considered
1. Single PdfTool class with mode parameter
Approach: One PdfTool class with a mode parameter (“info”, “extract_text”, “render_page”), similar to BrowserTool.
Rejected because: The three operations have very different parameter sets. Combining them into one tool with a mode parameter would make the parameter schema complex and confusing for AI models. Separate tools with focused parameter schemas produce better AI tool-calling behavior.
2. JavaScript-based PDF tools
Approach: Implement PDF tools as JS tools using a PDF library in QuickJS.
Rejected because: QuickJS has no DOM APIs and limited file I/O. PDF parsing libraries require native code or full JVM support. PDFBox Android and PdfRenderer are Kotlin/Java native and cannot run in QuickJS. The Kotlin built-in approach is the correct fit.
3. Use only Android PdfRenderer (no PDFBox)
Approach: Use Android’s built-in PdfRenderer for everything, including text extraction (by rendering pages and using OCR).
Rejected because: PdfRenderer can only render pages as bitmaps. It cannot extract text directly. Using OCR for text extraction would be slow, inaccurate, and add unnecessary complexity. PDFBox provides direct text extraction via PDFTextStripper.
Dependencies
External Dependencies
| Dependency | Version | Size | License |
|---|---|---|---|
| com.tom-roush:pdfbox-android | 2.0.27.0 | ~2.5MB | Apache 2.0 |
| Android PdfRenderer | Built-in (API 21+) | 0 | Android Framework |
Internal Dependencies
Toolinterface fromtool/engine/ToolResult,ToolDefinition,ToolParametersSchema,ToolParameterfromcore/model/ToolRegistry,ToolSourceInfofromtool/engine/andcore/model/Contextfrom Android framework (injected via KoinandroidContext())
Future Extensions
- Password-protected PDFs: Add optional
passwordparameter topdf_extract_textandpdf_info - Batch page rendering: Add
pagesparameter topdf_render_pageto render multiple pages at once - PDF search: Add a
pdf_searchtool to search for text within a PDF and return page numbers and context - PDF table extraction: Specialized table extraction using PDFBox’s table detection heuristics
- PDF bookmark/outline extraction: Extract the document outline/table of contents
Change History
| Date | Version | Changes | Owner |
|---|---|---|---|
| 2026-03-01 | 0.1 | Initial version | - |