RFC-021: Kotlin Webfetch Tool
Document Information
- RFC ID: RFC-021
- Related PRD: FEAT-021 (Kotlin Webfetch Tool)
- Related Architecture: RFC-000 (Overall Architecture)
- Related RFC: RFC-004 (Tool System), RFC-015 (JS Tool Migration)
- Created: 2026-03-01
- Last Updated: 2026-03-01
- Status: Draft
- Author: TBD
Overview
Background
RFC-015 introduced a JavaScript-based webfetch tool that fetches web pages and converts HTML to Markdown. The original design intended to use the Turndown library for DOM-based HTML-to-Markdown conversion. However, Turndown requires DOM APIs (document.createElement, DOMParser, etc.) that are unavailable in the QuickJS JavaScript runtime. The current implementation falls back to regex-based conversion, which has fundamental limitations:
- Cannot handle nested structures (lists within lists, tables within blockquotes)
- Fragile content extraction using a single-match regex for `<main>`/`<article>` tags
- Fails on malformed HTML (unclosed tags, overlapping elements)
- Cannot properly handle character encoding edge cases
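To make the nesting limitation concrete: a non-greedy pattern stops at the first closing tag it sees, so a nested list cuts its parent short. The pattern below is a hypothetical illustration of that style of regex, not the actual source of `webfetch.js`.

```kotlin
// Hypothetical regex in the style of a regex-based converter (not the real webfetch.js code).
// DOT_MATCHES_ALL lets '.' span newlines; '.*?' is non-greedy.
val listRegex = Regex("<ul>(.*?)</ul>", RegexOption.DOT_MATCHES_ALL)

/** Returns the inner HTML of the first <ul>...</ul> match, or null if there is none. */
fun firstListBody(html: String): String? =
    listRegex.find(html)?.groupValues?.get(1)

// A list whose first item contains a nested list
val nestedHtml = "<ul><li>a<ul><li>b</li></ul></li><li>c</li></ul>"
```

`firstListBody(nestedHtml)` returns `<li>a<ul><li>b</li>`: the match ends at the inner `</ul>`, so item `c` is lost. A DOM parser builds the tree first, so nesting depth does not affect it.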
RFC-021 replaces the JS webfetch tool with a Kotlin-native implementation using Jsoup, a battle-tested Java HTML parser that provides a full DOM API. This gives us proper HTML parsing, robust content extraction, and accurate Markdown conversion without requiring DOM APIs in the JS runtime.
Goals
- Implement `WebfetchTool.kt` as a Kotlin built-in tool in `tool/builtin/`
- Implement `HtmlToMarkdownConverter` utility class for DOM-based HTML-to-Markdown conversion
- Add Jsoup dependency to the project
- Remove JS `webfetch.js` and `webfetch.json` from `assets/js/tools/`
- Update `ToolModule` to register the new Kotlin `WebfetchTool`
- Add output truncation with configurable character limit
Non-Goals
- Implementing full Mozilla Readability scoring algorithm
- JavaScript rendering for dynamic/SPA pages (deferred to RFC-022)
- Response caching
- PDF or binary content extraction
- Cookie/session management
Technical Design
Architecture Overview
┌──────────────────────────────────────────────────────────────┐
│ Chat Layer (RFC-001) │
│ SendMessageUseCase │
│ │ │
│ │ tool call: webfetch(url, max_length?) │
│ v │
├──────────────────────────────────────────────────────────────┤
│ Tool Execution Engine (RFC-004) │
│ executeTool(name, params, availableToolIds) │
│ │ │
│ v │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ ToolRegistry │ │
│ │ ┌──────────────────┐ │ │
│ │ │ webfetch │ Kotlin built-in [NEW] │ │
│ │ │ (WebfetchTool.kt) │ │ │
│ │ └───────┬──────────┘ │ │
│ │ │ │ │
│ │ v │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ WebfetchTool │ │ │
│ │ │ 1. Validate URL │ │ │
│ │ │ 2. Fetch HTML via OkHttpClient │ │ │
│ │ │ 3. Parse with Jsoup │ │ │
│ │ │ 4. Extract main content │ │ │
│ │ │ 5. Convert to Markdown (HtmlToMarkdownConverter) │ │ │
│ │ │ 6. Truncate if needed │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Core Components
New:
- `WebfetchTool` -- Kotlin built-in tool that fetches web pages and returns Markdown
- `HtmlToMarkdownConverter` -- Utility class for DOM-based HTML-to-Markdown conversion
Modified:
- `ToolModule` -- Register `WebfetchTool` as a Kotlin built-in tool
Removed:
- `assets/js/tools/webfetch.js` -- JS implementation
- `assets/js/tools/webfetch.json` -- JS tool definition
Detailed Design
Directory Structure (New & Changed Files)
app/src/main/
├── kotlin/com/oneclaw/shadow/
│ ├── tool/
│ │ ├── builtin/
│ │ │ ├── WebfetchTool.kt # NEW
│ │ │ ├── LoadSkillTool.kt # unchanged
│ │ │ ├── CreateScheduledTaskTool.kt # unchanged
│ │ │ └── CreateAgentTool.kt # unchanged
│ │ └── util/
│ │ └── HtmlToMarkdownConverter.kt # NEW
│ └── di/
│ └── ToolModule.kt # MODIFIED
├── assets/
│ └── js/
│ └── tools/
│ ├── webfetch.js # DELETED
│ ├── webfetch.json # DELETED
│ ├── get_current_time.js # unchanged
│ └── ... # other JS tools unchanged
app/src/test/kotlin/com/oneclaw/shadow/
└── tool/
├── builtin/
│ └── WebfetchToolTest.kt # NEW
└── util/
└── HtmlToMarkdownConverterTest.kt # NEW
WebfetchTool
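import android.util.Log
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import okhttp3.OkHttpClient
// Tool, ToolDefinition, ToolParameters, ToolParameter and ToolResult come from the
// project's tool/ package; HtmlToMarkdownConverter from tool/util/
/**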
* Located in: tool/builtin/WebfetchTool.kt
*
* Kotlin-native webfetch tool that fetches web pages and converts
* HTML to Markdown using Jsoup. Replaces the JS webfetch implementation.
*/
class WebfetchTool(
private val okHttpClient: OkHttpClient
) : Tool {
companion object {
private const val TAG = "WebfetchTool"
private const val DEFAULT_MAX_LENGTH = 50_000
private const val MAX_RESPONSE_SIZE = 5 * 1024 * 1024 // 5MB
private const val USER_AGENT = "Mozilla/5.0 (Linux; Android 14) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36"
}
override val definition = ToolDefinition(
name = "webfetch",
description = "Fetch a web page and return its content as Markdown",
parameters = ToolParameters(
properties = mapOf(
"url" to ToolParameter(
type = "string",
description = "The URL to fetch"
),
"max_length" to ToolParameter(
type = "integer",
description = "Maximum output length in characters. Default: 50000"
)
),
required = listOf("url")
)
)
override suspend fun execute(
params: Map<String, Any?>,
env: Map<String, String>
): ToolResult {
val url = params["url"]?.toString()
?: return ToolResult.error("Parameter 'url' is required")
val maxLength = (params["max_length"] as? Number)?.toInt()
?: DEFAULT_MAX_LENGTH
// Validate URL scheme
val parsedUrl = try {
java.net.URL(url)
} catch (e: Exception) {
return ToolResult.error("Invalid URL: ${e.message}")
}
if (parsedUrl.protocol !in listOf("http", "https")) {
return ToolResult.error("Only HTTP and HTTPS URLs are supported")
}
return try {
// Both the call and reading the body block, so run them on the IO dispatcher.
// use {} closes the Response (releasing its connection) on every path.
withContext(Dispatchers.IO) {
fetchUrl(url).use { response -> processResponse(response, maxLength) }
}
} catch (e: kotlin.coroutines.cancellation.CancellationException) {
throw e // Never swallow coroutine cancellation
} catch (e: java.net.SocketTimeoutException) {
ToolResult.error("Request timed out: ${e.message}")
} catch (e: java.net.UnknownHostException) {
ToolResult.error("DNS resolution failed: ${e.message}")
} catch (e: java.io.IOException) {
ToolResult.error("Network error: ${e.message}")
} catch (e: Exception) {
Log.e(TAG, "Unexpected error fetching $url", e)
ToolResult.error("Error: ${e.message}")
}
}
private fun fetchUrl(url: String): okhttp3.Response {
val request = okhttp3.Request.Builder()
.url(url)
.header("User-Agent", USER_AGENT)
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.header("Accept-Language", "en-US,en;q=0.5")
.build()
// Blocking call; the caller must close the returned Response
return okHttpClient.newCall(request).execute()
}
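private fun processResponse(
response: okhttp3.Response,
maxLength: Int
): ToolResult {
if (!response.isSuccessful) {
// peekBody reads at most 1000 bytes, so a huge error page is never fully buffered
val body = response.peekBody(1000).string()
return ToolResult.error(
"HTTP ${response.code}: ${response.message}\n$body"
)
}
val contentType = response.header("Content-Type")?.lowercase() ?: ""
val responseBody = response.body ?: return ToolResult.error("Empty response body")
// Limit response size to prevent OOM. request(n) buffers until n bytes are available or
// the stream ends, so requesting MAX + 1 bytes tells us whether the body exceeds the limit.
val source = responseBody.source()
val charset = responseBody.contentType()?.charset() ?: Charsets.UTF_8
val body = if (source.request(MAX_RESPONSE_SIZE + 1L)) {
// Oversized: decode only the first MAX_RESPONSE_SIZE bytes
source.buffer.readString(MAX_RESPONSE_SIZE.toLong(), charset)
} else {
// Whole body fits; string() also honors a BOM and the declared charset
responseBody.string()
}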
// Non-HTML: return raw body (truncated)
if (!contentType.contains("text/html") && !contentType.contains("application/xhtml")) {
return ToolResult.success(truncateText(body, maxLength))
}
// HTML: parse and convert to Markdown
val markdown = HtmlToMarkdownConverter.convert(body, response.request.url.toString())
return ToolResult.success(truncateText(markdown, maxLength))
}
private fun truncateText(text: String, maxLength: Int): String {
if (maxLength <= 0 || text.length <= maxLength) return text
// Find the last paragraph/block boundary before the limit
val truncateAt = text.lastIndexOf("\n\n", maxLength)
val cutoff = if (truncateAt > maxLength / 2) truncateAt else maxLength
return text.substring(0, cutoff) + "\n\n[Content truncated at $maxLength characters]"
}
}
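The block-aware truncation in `truncateText` can be checked in isolation. The copy below is the same logic as a standalone top-level function (only its visibility differs), so it runs without OkHttp or Android:

```kotlin
/**
 * Truncates text to at most maxLength characters (plus a notice), preferring to cut
 * at the last blank line ("\n\n") as long as that keeps more than half the budget.
 * A non-positive maxLength disables truncation.
 */
fun truncateText(text: String, maxLength: Int): String {
    if (maxLength <= 0 || text.length <= maxLength) return text
    // lastIndexOf searches backwards, starting at index maxLength
    val truncateAt = text.lastIndexOf("\n\n", maxLength)
    // Only honor the paragraph boundary if it does not discard too much content
    val cutoff = if (truncateAt > maxLength / 2) truncateAt else maxLength
    return text.substring(0, cutoff) + "\n\n[Content truncated at $maxLength characters]"
}
```

For example, `truncateText("aaaaaa\n\nbbbbbb", 10)` cuts at index 6 (the blank line) and returns `aaaaaa` followed by the notice. When the only boundary lies in the first half of the budget, the text is cut at exactly `maxLength` instead.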
HtmlToMarkdownConverter
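import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import org.jsoup.nodes.Element
import org.jsoup.nodes.TextNode
/**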
* Located in: tool/util/HtmlToMarkdownConverter.kt
*
* Converts HTML to Markdown using Jsoup DOM traversal.
* Handles content extraction (article/main detection),
* noise removal, and element-by-element Markdown rendering.
*/
object HtmlToMarkdownConverter {
// Elements to remove entirely (content and tag)
private val NOISE_TAGS = setOf(
"script", "style", "nav", "header", "footer", "aside",
"noscript", "svg", "iframe", "form", "button", "input",
"select", "textarea"
)
// Block elements that produce paragraph breaks
private val BLOCK_TAGS = setOf(
"p", "div", "section", "article", "main", "figure",
"figcaption", "details", "summary", "address"
)
/**
* Convert HTML string to Markdown.
*
* @param html The raw HTML string
* @param baseUrl Optional base URL for resolving relative links
* @return Markdown string
*/
fun convert(html: String, baseUrl: String? = null): String {
val doc = if (baseUrl != null) {
Jsoup.parse(html, baseUrl)
} else {
Jsoup.parse(html)
}
// Extract title
val title = doc.title().takeIf { it.isNotBlank() }
// Remove noise elements
NOISE_TAGS.forEach { tag ->
doc.select(tag).remove()
}
// Find main content area
val contentElement = findMainContent(doc)
// Convert to Markdown
val markdown = convertElement(contentElement, depth = 0)
// Clean up whitespace
val cleaned = cleanupWhitespace(markdown)
// Prepend title if not already present in content
return if (title != null && !cleaned.startsWith("# ") && title !in cleaned.take(200)) {
"# $title\n\n$cleaned"
} else {
cleaned
}
}
/**
* Find the main content element using a priority-based strategy.
* article > main > [role="main"] > body
*/
private fun findMainContent(doc: Document): Element {
// Try <article> first -- most specific content marker
doc.selectFirst("article")?.let { return it }
// Try <main>
doc.selectFirst("main")?.let { return it }
// Try role="main"
doc.selectFirst("[role=main]")?.let { return it }
// Fallback to <body>
return doc.body() ?: doc
}
/**
* Recursively convert a Jsoup Element to Markdown.
*/
private fun convertElement(element: Element, depth: Int): String {
val sb = StringBuilder()
for (node in element.childNodes()) {
when (node) {
is TextNode -> {
val text = node.text()
if (text.isNotBlank()) {
sb.append(text)
} else if (text.isNotEmpty() && sb.isNotEmpty() && !sb.endsWith(" ")) {
sb.append(" ")
}
}
is Element -> {
sb.append(convertTag(node, depth))
}
}
}
return sb.toString()
}
/**
* Convert a specific HTML tag to its Markdown equivalent.
*/
private fun convertTag(el: Element, depth: Int): String {
val tag = el.tagName().lowercase()
return when (tag) {
// Headings
"h1" -> "\n\n# ${inlineText(el)}\n\n"
"h2" -> "\n\n## ${inlineText(el)}\n\n"
"h3" -> "\n\n### ${inlineText(el)}\n\n"
"h4" -> "\n\n#### ${inlineText(el)}\n\n"
"h5" -> "\n\n##### ${inlineText(el)}\n\n"
"h6" -> "\n\n###### ${inlineText(el)}\n\n"
// Paragraphs and block elements
"p" -> "\n\n${convertElement(el, depth)}\n\n"
"div", "section", "article", "main" -> "\n\n${convertElement(el, depth)}\n\n"
// Links
"a" -> {
val href = el.absUrl("href").ifEmpty { el.attr("href") }
val text = inlineText(el)
if (text.isNotBlank() && href.isNotBlank()) {
"[$text]($href)"
} else if (text.isNotBlank()) {
text
} else {
""
}
}
// Emphasis
"strong", "b" -> "**${inlineText(el)}**"
"em", "i" -> "*${inlineText(el)}*"
"del", "s", "strike" -> "~~${inlineText(el)}~~"
// Code
"code" -> {
if (el.parent()?.tagName() == "pre") {
// Handled by <pre> case
el.wholeText()
} else {
"`${el.text()}`"
}
}
"pre" -> {
val codeEl = el.selectFirst("code")
val code = codeEl?.wholeText() ?: el.wholeText()
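// Highlighters often add extra classes ("hljs language-kotlin"), so pick the
// first language-*/lang-* class rather than parsing the whole class attribute
val lang = codeEl?.classNames()
?.firstOrNull { it.startsWith("language-") || it.startsWith("lang-") }
?.substringAfter('-')
?: ""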
"\n\n```$lang\n${code.trimEnd()}\n```\n\n"
}
// Lists
"ul" -> "\n\n${convertList(el, ordered = false, indent = depth)}\n\n"
"ol" -> "\n\n${convertList(el, ordered = true, indent = depth)}\n\n"
// Blockquote
"blockquote" -> {
val content = convertElement(el, depth).trim()
val quoted = content.lines().joinToString("\n") { "> $it" }
"\n\n$quoted\n\n"
}
// Images
"img" -> {
val alt = el.attr("alt")
val src = el.absUrl("src").ifEmpty { el.attr("src") }
if (src.isNotBlank()) "![$alt]($src)" else ""
}
// Horizontal rule
"hr" -> "\n\n---\n\n"
// Line break
"br" -> "\n"
// Tables
"table" -> "\n\n${convertTable(el)}\n\n"
// Definition lists
"dl" -> "\n\n${convertDefinitionList(el)}\n\n"
// Figure
"figure" -> "\n\n${convertElement(el, depth)}\n\n"
"figcaption" -> "\n*${inlineText(el)}*\n"
// Other block elements
in BLOCK_TAGS -> "\n\n${convertElement(el, depth)}\n\n"
// Unknown/inline elements -- recurse into children
else -> convertElement(el, depth)
}
}
/**
* Convert a <ul> or <ol> to Markdown list items with proper nesting.
*/
private fun convertList(
listEl: Element,
ordered: Boolean,
indent: Int
): String {
val sb = StringBuilder()
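// CommonMark nests a sub-list only when it is indented to the parent item's content
// column (2 for "- ", 3-4 for "1. "/"10. "), so use 4 spaces per nesting level
val prefix = "    ".repeat(indent)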
var index = 1
for (li in listEl.children()) {
if (li.tagName().lowercase() != "li") continue
val bullet = if (ordered) "${index}. " else "- "
val content = StringBuilder()
for (child in li.childNodes()) {
when (child) {
is TextNode -> {
// Keep surrounding spaces ("Hello <b>world</b>"); the item is trimmed once below
content.append(child.text())
}
is Element -> {
if (child.tagName().lowercase() in listOf("ul", "ol")) {
// Nested list
content.append("\n")
content.append(convertList(
child,
ordered = child.tagName().lowercase() == "ol",
indent = indent + 1
))
} else {
content.append(inlineText(child))
}
}
}
}
sb.append("$prefix$bullet${content.toString().trim()}\n")
index++
}
return sb.toString().trimEnd()
}
/**
* Convert a <table> to Markdown table format.
*/
private fun convertTable(table: Element): String {
val rows = mutableListOf<List<String>>()
// Collect all rows from thead, tbody, tfoot
for (row in table.select("tr")) {
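// Keep each row on one line and escape literal pipes so cells cannot split the row
val cells = row.select("th, td").map {
inlineText(it).replace("\n", " ").replace("|", "\\|").trim()
}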
if (cells.isNotEmpty()) {
rows.add(cells)
}
}
if (rows.isEmpty()) return ""
// Determine column count
val colCount = rows.maxOf { it.size }
// Pad rows to equal column count
val paddedRows = rows.map { row ->
row + List(colCount - row.size) { "" }
}
val sb = StringBuilder()
// Header row
sb.append("| ${paddedRows[0].joinToString(" | ")} |\n")
// Separator
sb.append("| ${paddedRows[0].map { "---" }.joinToString(" | ")} |\n")
// Data rows
for (i in 1 until paddedRows.size) {
sb.append("| ${paddedRows[i].joinToString(" | ")} |\n")
}
return sb.toString().trimEnd()
}
/**
* Convert a <dl> to Markdown format.
*/
private fun convertDefinitionList(dl: Element): String {
val sb = StringBuilder()
for (child in dl.children()) {
when (child.tagName().lowercase()) {
"dt" -> sb.append("**${inlineText(child)}**\n")
"dd" -> sb.append(": ${inlineText(child)}\n\n")
}
}
return sb.toString().trimEnd()
}
/**
* Extract inline text from an element, stripping all HTML tags.
* Preserves inline Markdown formatting from child elements.
*/
private fun inlineText(el: Element): String {
val sb = StringBuilder()
for (node in el.childNodes()) {
when (node) {
is TextNode -> sb.append(node.text())
is Element -> {
when (node.tagName().lowercase()) {
"strong", "b" -> sb.append("**${inlineText(node)}**")
"em", "i" -> sb.append("*${inlineText(node)}*")
"code" -> sb.append("`${node.text()}`")
"a" -> {
val href = node.absUrl("href").ifEmpty { node.attr("href") }
val text = inlineText(node)
if (text.isNotBlank() && href.isNotBlank()) {
sb.append("[$text]($href)")
} else {
sb.append(text)
}
}
"br" -> sb.append("\n")
"img" -> {
val alt = node.attr("alt")
val src = node.absUrl("src").ifEmpty { node.attr("src") }
if (src.isNotBlank()) sb.append("![$alt]($src)")
}
else -> sb.append(inlineText(node))
}
}
}
}
return sb.toString()
}
/**
* Clean up whitespace in the final Markdown output.
*/
private fun cleanupWhitespace(markdown: String): String {
return markdown
.replace(Regex("[ \t]+\n"), "\n") // Strip trailing whitespace (also empties whitespace-only lines)
.replace(Regex("\n{3,}"), "\n\n") // Then collapse runs of blank lines
.trim()
}
}
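The table step in `convertTable` is easiest to reason about once cell text has been extracted. A stdlib-only sketch of the same rendering (first row as header, short rows padded with empty cells), which also escapes literal `|` and flattens embedded newlines, since either would otherwise break a row. `escapeCell` and `renderMarkdownTable` are hypothetical names:

```kotlin
/** Escapes a cell so it cannot break the pipe-table row it is placed in. */
fun escapeCell(cell: String): String =
    cell.replace("\n", " ").replace("|", "\\|").trim()

/**
 * Renders rows of cell text as a Markdown pipe table.
 * The first row is treated as the header; shorter rows are padded with empty cells.
 */
fun renderMarkdownTable(rows: List<List<String>>): String {
    if (rows.isEmpty()) return ""
    val colCount = rows.maxOf { it.size }
    val padded = rows.map { row -> row.map(::escapeCell) + List(colCount - row.size) { "" } }
    val sb = StringBuilder()
    sb.append("| ").append(padded[0].joinToString(" | ")).append(" |\n")
    // Separator row: one "---" per column
    sb.append("| ").append(List(colCount) { "---" }.joinToString(" | ")).append(" |\n")
    for (row in padded.drop(1)) {
        sb.append("| ").append(row.joinToString(" | ")).append(" |\n")
    }
    return sb.toString().trimEnd()
}
```

Treating the first row as the header mirrors `convertTable`; pipe tables have no headerless form, so this is the usual fallback when a `<table>` lacks `<thead>`.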
Jsoup Dependency
Add to app/build.gradle.kts:
dependencies {
// ... existing dependencies ...
implementation("org.jsoup:jsoup:1.18.3")
}
Jsoup is:
- ~400KB in size
- MIT licensed
- Has zero transitive dependencies
- Compatible with Android API 21+
- Widely used in Android projects
ToolModule Changes
// In ToolModule.kt, add WebfetchTool registration
val toolModule = module {
// ... existing tool definitions ...
// WebfetchTool (replaces JS webfetch); get() resolves the OkHttpClient from the network module
single { WebfetchTool(get()) }
// Declare ToolRegistry exactly once and register tools while building it. A second
// single<ToolRegistry> that calls get<ToolRegistry>() clashes with the first definition
// and can end up resolving itself.
single {
val registry = ToolRegistry()
// ... existing tool registrations ...
registry.register(get<WebfetchTool>())
// ... rest of initialization ...
registry
}
}
JS Tool Removal
Remove the following files from assets/js/tools/:
- `webfetch.js` -- The regex-based JS implementation
- `webfetch.json` -- The JS tool definition
The JsToolLoader will no longer find a webfetch JS tool, and the Kotlin WebfetchTool will be registered directly in ToolModule.
Implementation Plan
Phase 1: HtmlToMarkdownConverter (Core Logic)
- Add Jsoup dependency to `build.gradle.kts`
- Create `HtmlToMarkdownConverter.kt` in `tool/util/`
- Create `HtmlToMarkdownConverterTest.kt` with comprehensive test cases
- Verify conversion quality against the current regex-based implementation
Phase 2: WebfetchTool (Tool Integration)
- Create `WebfetchTool.kt` in `tool/builtin/`
- Create `WebfetchToolTest.kt`
- Update `ToolModule.kt` to register `WebfetchTool`
- Remove `webfetch.js` and `webfetch.json` from assets
Phase 3: Testing & Verification
- Run Layer 1A tests (`./gradlew test`)
- Run Layer 1B tests if an emulator is available
- Manual testing with various real-world URLs
- Compare output quality against the JS implementation
Data Model
No data model changes. WebfetchTool implements the existing Tool interface.
API Design
Tool Interface
Tool Name: webfetch
Parameters:
- url: string (required) -- The URL to fetch
- max_length: integer (optional, default: 50000) -- Maximum output length
Returns on success:
Markdown string of the page content
Returns on error:
ToolResult.error with descriptive message
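How the `url` and `max_length` parameters map onto the `Map<String, Any?>` that `WebfetchTool.execute` receives can be sketched in plain Kotlin. JSON decoders may deliver `max_length` as Int, Long or Double, so any `Number` is accepted, as in the RFC code. Also accepting a numeric string is an assumption added here for models that quote numbers; `WebfetchArgs` and `parseWebfetchArgs` are hypothetical names.

```kotlin
const val DEFAULT_MAX_LENGTH = 50_000

/** Validated arguments for a webfetch call (hypothetical helper type). */
data class WebfetchArgs(val url: String, val maxLength: Int)

/** Parses raw tool-call parameters into WebfetchArgs, or fails if 'url' is missing. */
fun parseWebfetchArgs(params: Map<String, Any?>): Result<WebfetchArgs> {
    val url = params["url"]?.toString()
        ?: return Result.failure(IllegalArgumentException("Parameter 'url' is required"))
    val maxLength = when (val raw = params["max_length"]) {
        is Number -> raw.toInt()
        // Assumption: tolerate "200" as well as 200
        is String -> raw.trim().toIntOrNull() ?: DEFAULT_MAX_LENGTH
        // Absent or unexpected type: fall back to the default, like the RFC code
        else -> DEFAULT_MAX_LENGTH
    }
    return Result.success(WebfetchArgs(url, maxLength))
}
```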
HtmlToMarkdownConverter Public API
object HtmlToMarkdownConverter {
fun convert(html: String, baseUrl: String? = null): String
}
Migration Strategy
The migration is a direct replacement:
- `WebfetchTool` is registered in `ToolModule` as a Kotlin built-in
- JS `webfetch.js` and `webfetch.json` are deleted from assets
- The `JsToolLoader` no longer loads a JS webfetch tool
- The tool name `webfetch` and the `url` parameter remain identical; `max_length` is optional, so existing calls keep working
- The output format is unchanged (Markdown), so AI models and users see no change in the tool's interface
The only visible difference is improved Markdown quality for complex HTML pages.
Error Handling
| Error | Cause | Handling |
|---|---|---|
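| Invalid URL | Malformed URL | ToolResult.error("Invalid URL: ...") |
| Unsupported scheme | Any scheme other than http/https | ToolResult.error("Only HTTP and HTTPS URLs are supported") |
| DNS failure | Unknown host | ToolResult.error("DNS resolution failed: ...") |
| Timeout | Server unreachable or slow | ToolResult.error("Request timed out: ...") |
| Other network error | Connection reset, TLS failure, other I/O errors | ToolResult.error("Network error: ...") |
| HTTP 4xx/5xx | Client or server error | ToolResult.error("HTTP {code}: {message}") followed by the start of the error body |
| Response too large | Body exceeds 5MB | Body truncated to its first 5MB before parsing |
| Empty response | No body in response | ToolResult.error("Empty response body") |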
| Parse failure | Jsoup cannot parse | Jsoup handles malformed HTML gracefully; returns best-effort output |
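The "truncated before parsing" behavior depends on reading one byte past the cap to learn whether the body is oversized, without ever buffering the whole response. The sketch below shows that idea over a plain `java.io.InputStream` using only the JDK; `CappedBody` and `readCapped` are hypothetical names, and the RFC code buffers the body through Okio's `BufferedSource.request` instead.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.InputStream

/** Bytes kept from a capped read, and whether the source held more than the cap. */
class CappedBody(val bytes: ByteArray, val truncated: Boolean)

/**
 * Reads at most limit bytes from input. One extra byte beyond the cap is read
 * (and then dropped) purely to detect whether the body was larger than allowed.
 */
fun readCapped(input: InputStream, limit: Int): CappedBody {
    val out = ByteArrayOutputStream()
    val chunk = ByteArray(8192)
    var total = 0
    while (total <= limit) {
        // Never ask for more than the remaining budget of limit + 1 bytes
        val n = input.read(chunk, 0, minOf(chunk.size, limit + 1 - total))
        if (n == -1) break
        out.write(chunk, 0, n)
        total += n
    }
    val bytes = out.toByteArray()
    return if (bytes.size > limit) {
        CappedBody(bytes.copyOf(limit), truncated = true)
    } else {
        CappedBody(bytes, truncated = false)
    }
}
```

Decoding the kept bytes (for example with `String(bytes, Charsets.UTF_8)`) may end in a U+FFFD replacement character if the cap splits a multi-byte sequence, which is harmless for Markdown output.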
Security Considerations
- URL scheme validation: Only HTTP and HTTPS are accepted; `file://`, `content://`, and `javascript:` schemes are rejected.
- Response size limit: Responses larger than 5MB are truncated before parsing to prevent OOM.
- No credential forwarding: WebfetchTool adds no cookies, sessions, or authentication headers of its own. This holds only if the injected OkHttpClient has no CookieJar or authentication interceptors; if the shared client from the network module attaches credentials, WebfetchTool should receive a dedicated client.
- User-Agent: A standard mobile browser User-Agent string is sent to avoid bot-blocking.
- Jsoup safety: Jsoup’s parser is safe against malicious HTML (no script execution, no external entity resolution).
- Output is Markdown: The output is plain text (Markdown), not HTML, so XSS concerns do not apply.
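The scheme check can be exercised without a network. This sketch uses `java.net.URI` rather than the `java.net.URL` constructor used in `WebfetchTool` (URI parses any scheme, including `javascript:`, without needing a protocol handler) and adds a missing-host check as an assumption; `validateFetchUrl` is a hypothetical name.

```kotlin
import java.net.URI
import java.net.URISyntaxException

/**
 * Returns null if url is an absolute http(s) URL with a host, otherwise an error
 * message worded like WebfetchTool's. Hypothetical helper, not part of the RFC code.
 */
fun validateFetchUrl(url: String): String? {
    val uri = try {
        URI(url)
    } catch (e: URISyntaxException) {
        return "Invalid URL: ${e.message}"
    }
    // Scheme comparison is case-insensitive (HTTP://example.com is valid)
    val scheme = uri.scheme?.lowercase() ?: return "Invalid URL: missing scheme"
    if (scheme != "http" && scheme != "https") {
        return "Only HTTP and HTTPS URLs are supported"
    }
    // Assumption: reject URLs such as "http:///path" that have no host at all
    if (uri.host.isNullOrEmpty()) return "Invalid URL: missing host"
    return null
}
```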
Performance
| Operation | Expected Time | Notes |
|---|---|---|
| Network fetch | Variable | Depends on server/network |
| Jsoup parse (500KB HTML) | ~50ms | Single-threaded DOM construction |
| Markdown conversion | ~30ms | DOM traversal, string building |
| Total (excluding network) | < 100ms | Well within the 30s tool timeout |
Memory usage:
- Jsoup DOM: ~3-5x the HTML size in memory (temporary)
- Markdown output: typically smaller than source HTML
- All objects are garbage collected after the tool call returns
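Combining the 3-5x DOM estimate with the 5MB response cap gives a rough worst-case bound on the temporary DOM (an extrapolation of the estimates above, not a measurement):

$$
M_{\text{DOM}} \approx (3 \text{ to } 5) \times 5\,\text{MB} = 15 \text{ to } 25\,\text{MB}
$$

For the 500KB page used in the timing table, the same factor gives roughly 1.5 to 2.5MB.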
Testing Strategy
Unit Tests
HtmlToMarkdownConverterTest.kt:
- `testConvertSimpleHtml` -- Basic HTML with paragraphs and headings
- `testConvertLinks` -- Absolute and relative links
- `testConvertNestedLists` -- Nested unordered and ordered lists
- `testConvertTables` -- Tables with headers and data rows
- `testConvertCodeBlocks` -- Inline code and fenced code blocks
- `testConvertBlockquotes` -- Single and nested blockquotes
- `testNoiseRemoval` -- Script, style, nav elements removed
- `testContentExtraction_article` -- Prefers `<article>` content
- `testContentExtraction_main` -- Falls back to `<main>`
- `testContentExtraction_body` -- Falls back to `<body>`
- `testTitlePrepend` -- Title added when not in content
- `testEmptyHtml` -- Empty or minimal HTML
- `testMalformedHtml` -- Unclosed tags, invalid nesting
- `testWhitespaceCleanup` -- Multiple blank lines collapsed
WebfetchToolTest.kt:
- `testExecute_success_html` -- Successful HTML fetch and conversion
- `testExecute_success_nonHtml` -- Non-HTML content returned as-is
- `testExecute_httpError` -- HTTP error response
- `testExecute_networkError` -- Network failure
- `testExecute_invalidUrl` -- Invalid URL parameter
- `testExecute_missingUrl` -- Missing required parameter
- `testExecute_truncation` -- Output truncated at `max_length`
- `testExecute_customMaxLength` -- Custom `max_length` parameter
- `testDefinition` -- Tool definition has correct name and parameters
Integration Tests
Manual verification with real URLs:
- Simple blog post / article page
- Documentation page with code blocks and tables
- Page with nested lists and complex structure
- Non-English page (CJK characters)
- Page with large amount of content (truncation test)
Alternatives Considered
1. Keep JS webfetch with Turndown in WebView
Approach: Load Turndown in a WebView instead of QuickJS, using real DOM APIs. Rejected because: WebView is heavyweight, requires Android context and main thread coordination, and introduces unnecessary complexity for a simple conversion task. Jsoup is purpose-built for this.
2. Use Mozilla Readability port
Approach: Port or use a Java version of Mozilla’s Readability algorithm for content extraction. Rejected because: Full Readability is complex (scoring, candidate selection, etc.) and overkill for V1. The simple article/main/body priority works well for most pages. Can be added later.
3. Use a different HTML parser (TagSoup, HtmlCleaner)
Approach: Use an alternative to Jsoup. Rejected because: Jsoup is the de facto standard for Java/Android HTML parsing. It’s actively maintained, well-documented, has zero dependencies, and is the most popular choice.
Dependencies
External Dependencies
| Dependency | Version | Size | License |
|---|---|---|---|
| org.jsoup:jsoup | 1.18.3 | ~400KB | MIT |
Internal Dependencies
- `Tool` interface from `tool/` package
- `ToolResult`, `ToolDefinition`, `ToolParameters` from `tool/` package
- `OkHttpClient` from network module (already available via Koin)
Change History
| Date | Version | Changes | Owner |
|---|---|---|---|
| 2026-03-01 | 0.1 | Initial version | - |