WebView Browser Tool
Feature Information
- Feature ID: FEAT-022
- Created: 2026-03-01
- Last Updated: 2026-03-01
- Status: Draft
- Priority: P2 (Nice to Have)
- Owner: TBD
- Related RFC: RFC-022 (pending)
User Story
As an AI agent using OneClaw, I want a browser tool that can render web pages in a real browser environment, take screenshots, and extract content from dynamically-rendered pages (SPAs), so that I can interact with modern web applications that rely on JavaScript for content rendering, and visually inspect pages when text extraction is insufficient.
Typical Scenarios
- The agent needs to check the visual layout of a web page. It calls browser_screenshot with a URL and receives a screenshot image that it can analyze visually.
- The agent needs content from a React/Vue/Angular SPA. The webfetch tool returns mostly empty HTML because the content is rendered by JavaScript. The agent uses browser_extract to load the page in a real WebView, wait for JS rendering, and extract the final DOM content as Markdown.
- The agent needs to verify that a web form looks correct. It takes a screenshot and confirms the layout, button placement, and text are as expected.
- The agent needs to extract structured data from a page that loads content via AJAX after initial page load. browser_extract waits for the page to settle before extracting.
Feature Description
Overview
FEAT-022 adds a WebView-based browser tool that provides two capabilities to AI agents:
- browser_screenshot: Render a web page in an off-screen Android WebView and capture a screenshot as an image.
- browser_extract: Render a web page in a WebView, execute JavaScript in the browser context to extract content, and return the result as Markdown or structured text.
Unlike webfetch (FEAT-021), which fetches static HTML and parses it without executing scripts, the browser tool uses a real browser engine (Android WebView / Chromium) that executes JavaScript, loads dynamic content, and renders the page visually. This lets the agent read and visually inspect modern SPAs and JavaScript-heavy pages.
Architecture Overview
AI Model
    | tool call: browser_screenshot(url="...") or browser_extract(url="...")
    v
ToolExecutionEngine (Kotlin, unchanged)
    |
    v
ToolRegistry
    |
    v
BrowserTool [NEW - Kotlin built-in tool]
    |
    +-- WebViewManager [NEW - manages off-screen WebView lifecycle]
    |       |
    |       +-- WebView (off-screen, created on main thread)
    |       |       |
    |       |       +-- JavaScript execution (evaluateJavascript)
    |       |       +-- Page rendering (Chromium engine)
    |       |
    |       +-- ScreenshotCapture [NEW - captures WebView as bitmap]
    |       |       |
    |       |       +-- Canvas-based capture
    |       |       +-- Image compression (PNG/JPEG)
    |       |
    |       +-- ContentExtractor [NEW - JS-based DOM extraction]
    |               |
    |               +-- Built-in extraction script (Readability-style)
    |               +-- Optional Turndown (runs in WebView with real DOM)
    |
    +-- Output handling
            |
            +-- Screenshot: file path or base64
            +-- Extract: Markdown string
Two Capabilities, One Tool
The browser tool exposes two modes via a single tool registration with a mode parameter:
browser_screenshot
Renders a page and captures a screenshot:
| Field | Value |
|---|---|
| Mode | screenshot |
| Description | Render a web page and capture a screenshot |
| Parameters | url (string, required): The URL to render |
| | mode (string, required): "screenshot" |
| | width (integer, optional): Viewport width in pixels. Default: 412 (Pixel-like) |
| | height (integer, optional): Viewport height in pixels. Default: 915 |
| | wait_seconds (number, optional): Seconds to wait after page load for JS rendering. Default: 2 |
| | full_page (boolean, optional): Capture full scrollable page, not just viewport. Default: false |
| Returns | Object with image_path (file path to saved screenshot) |
browser_extract
Renders a page and extracts content via JavaScript:
| Field | Value |
|---|---|
| Mode | extract |
| Description | Render a web page and extract content as Markdown |
| Parameters | url (string, required): The URL to render |
| | mode (string, required): "extract" |
| | wait_seconds (number, optional): Seconds to wait after page load. Default: 2 |
| | max_length (integer, optional): Maximum output length. Default: 50000 |
| | javascript (string, optional): Custom JS to execute. Must return a string. |
| Returns | Markdown string of the extracted content |
Tool Definition
| Field | Value |
|---|---|
| Name | browser |
| Description | Render a web page in a browser, then take a screenshot or extract content |
| Parameters | url (string, required): The URL to load |
| | mode (string, required): "screenshot" or "extract" |
| | width (integer, optional): Viewport width. Default: 412 |
| | height (integer, optional): Viewport height. Default: 915 |
| | wait_seconds (number, optional): Wait time after load. Default: 2 |
| | full_page (boolean, optional): Full-page screenshot. Default: false |
| | max_length (integer, optional): Max output for extract mode. Default: 50000 |
| | javascript (string, optional): Custom JS for extract mode |
| Timeout | 60 seconds |
| Returns | Screenshot file path (screenshot mode) or Markdown string (extract mode) |
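As a rough illustration of the parameter table above, the following sketch applies the documented defaults and rejects an unsupported mode. It is written in JavaScript only to keep the examples in one language; the real validation would presumably live in the Kotlin BrowserTool. The function name normalizeBrowserArgs and the { ok, value, error } return shape are hypothetical and not part of this spec.

```javascript
// Hypothetical helper: applies the defaults listed in the Tool Definition table.
// The function name and the result shape are assumptions for illustration.
const DEFAULTS = {
  width: 412,
  height: 915,
  wait_seconds: 2,
  full_page: false,
  max_length: 50000,
};

function normalizeBrowserArgs(args) {
  if (typeof args.url !== "string" || args.url.length === 0) {
    return { ok: false, error: "Invalid URL: url is required" };
  }
  if (args.mode !== "screenshot" && args.mode !== "extract") {
    return { ok: false, error: "Invalid mode. Use 'screenshot' or 'extract'" };
  }
  // Caller-supplied values override the defaults.
  return { ok: true, value: { ...DEFAULTS, ...args } };
}
```

For example, normalizeBrowserArgs({ url: "https://example.com", mode: "extract" }) would yield a value with wait_seconds set to 2 and max_length set to 50000.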
Screenshot Capture
The screenshot mode uses Android’s WebView rendering pipeline:
- Create or reuse an off-screen WebView (not visible to the user)
- Set viewport dimensions per parameters
- Load the URL and wait for the onPageFinished callback
- Wait an additional wait_seconds for dynamic content to settle
- Capture the WebView content to a Bitmap:
  - Viewport-only: Draw the WebView to a Canvas at the viewport size
  - Full-page: Measure the full content height via computeVerticalScrollRange(), create an appropriately-sized Bitmap, scroll and capture sections
- Compress the Bitmap to PNG
- Save to app-internal cache directory
- Return the file path
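The full-page branch can be sketched as a planning step that decides which sections to capture. This is a minimal sketch under stated assumptions: planCaptureSections is a hypothetical name, contentHeight stands in for the value computeVerticalScrollRange() would report, and the 10,000px cap is borrowed from the Memory and Error Handling sections. It is written in JavaScript for consistency with the extraction scripts; the real logic would run in Kotlin.

```javascript
// Hypothetical planner for the "scroll and capture sections" step.
const MAX_FULL_PAGE_HEIGHT = 10000; // cap taken from the Memory section

function planCaptureSections(contentHeight, viewportHeight) {
  if (viewportHeight <= 0) {
    throw new RangeError("viewportHeight must be positive");
  }
  const totalHeight = Math.min(contentHeight, MAX_FULL_PAGE_HEIGHT);
  const sections = [];
  for (let top = 0; top < totalHeight; top += viewportHeight) {
    // The last section may be shorter than the viewport.
    sections.push({
      scrollY: top,
      height: Math.min(viewportHeight, totalHeight - top),
    });
  }
  return { totalHeight, sections };
}
```

A real implementation would likely need to clamp the final scroll offset to what the WebView can actually scroll to and crop the overlapping rows; the sketch only shows how the target bitmap height and the section boundaries relate.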
Content Extraction
The extract mode uses evaluateJavascript() to run JavaScript in the WebView’s browser context:
- Load the page and wait for rendering (same as screenshot flow)
- If the javascript parameter is provided, execute that custom script
- Otherwise, execute a built-in extraction script that:
  a. Finds the main content area (same heuristic as FEAT-021: article > main > body)
  b. Strips noise elements (script, style, nav, footer, etc.)
  c. Uses Turndown (loaded in the WebView context where DOM APIs are available) to convert HTML to Markdown
  d. Falls back to innerText extraction if Turndown fails
- Return the extracted text, truncated to max_length
This is where Turndown finally works as originally intended by FEAT-015 – running in a real browser environment with full DOM API access.
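The built-in extraction steps could look roughly like the sketch below. It is an illustration, not the shipped script: the function name extractContent, the exact noise selector list, and the choice to pass the Markdown converter in as a parameter are all assumptions. Only the article > main > body order, the noise stripping, the innerText fallback, and the max_length truncation come from the steps above.

```javascript
// Sketch of a built-in extraction script; names and selector list are assumptions.
const NOISE_SELECTOR = "script, style, nav, footer";

function extractContent(doc, maxLength, htmlToMarkdown) {
  // a. Find the main content area: article, then main, then body.
  const root = doc.querySelector("article") || doc.querySelector("main") || doc.body;
  // b. Strip noise from a clone so the live page is left untouched.
  const clone = root.cloneNode(true);
  clone.querySelectorAll(NOISE_SELECTOR).forEach((node) => node.remove());
  let text;
  try {
    // c. Convert the cleaned HTML to Markdown.
    text = htmlToMarkdown(clone.innerHTML);
  } catch (err) {
    // d. Fall back to plain text if the conversion fails.
    text = clone.innerText || clone.textContent || "";
  }
  // Truncate to max_length.
  return text.length > maxLength ? text.slice(0, maxLength) : text;
}
```

Inside the WebView, with Turndown already loaded, a call might look like extractContent(document, 50000, (html) => new TurndownService().turndown(html)). Passing the converter in keeps the fallback path easy to exercise without a browser.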
WebView Lifecycle
Managing WebView instances requires care to avoid memory leaks:
- WebViews must be created on the main (UI) thread
- A single reusable WebView instance is maintained by WebViewManager
- The WebView is created lazily on first use
- After each tool call, the WebView is reset (loadUrl("about:blank"), clear cache)
- The WebView is destroyed when the app goes to background or on explicit cleanup
- A timeout ensures the WebView is destroyed if not used for 5 minutes
Relationship to webfetch
browser and webfetch are complementary tools:
| | webfetch (FEAT-021) | browser (FEAT-022) |
|---|---|---|
| Engine | OkHttp + Jsoup | Android WebView (Chromium) |
| JavaScript | Not executed | Fully executed |
| Dynamic content | No (static HTML only) | Yes (SPAs, AJAX, etc.) |
| Screenshot | No | Yes |
| Speed | Fast (< 1s typical) | Slower (2-5s typical) |
| Memory | Low (~5MB peak) | High (~50-100MB for WebView) |
| Use case | Static pages, docs, articles | SPAs, JS-rendered pages, visual inspection |
| Recommended for | Most web fetching | When webfetch returns empty/incomplete content |
The AI model should prefer webfetch for most tasks and fall back to browser when content requires JavaScript rendering.
User Interaction Flows
Screenshot Flow
1. User: "Show me what google.com looks like"
2. AI calls browser(url="https://google.com", mode="screenshot")
3. BrowserTool:
a. Creates/reuses off-screen WebView
b. Loads URL, waits for page load + 2s
c. Captures screenshot to PNG
d. Returns file path
4. AI returns the screenshot image to the user
5. Chat displays the screenshot inline
Extract Flow
1. User: "Get the content from this React app: https://example-spa.com"
2. AI first tries webfetch -- gets mostly empty HTML (JS not executed)
3. AI calls browser(url="https://example-spa.com", mode="extract")
4. BrowserTool:
a. Loads URL in WebView, waits for JS rendering
b. Runs extraction script via evaluateJavascript()
c. Returns Markdown content
5. AI summarizes the content for the user
Custom JavaScript Flow
1. User: "Get the price from this product page"
2. AI calls browser(url="https://shop.example.com/product/123", mode="extract",
javascript="document.querySelector('.product-price')?.textContent || 'Price not found'")
3. BrowserTool:
a. Loads URL, waits for rendering
b. Executes custom JS
c. Returns the price text
4. AI reports the price to the user
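Because the javascript parameter must return a string, a custom script that needs several fields can serialize them as JSON. The sketch below assumes hypothetical selectors (.product-title, .product-price) and a hypothetical function name; a real page would need its own selectors.

```javascript
// Hypothetical custom script body for the javascript parameter.
// The selectors are illustrative and do not come from a real page.
function extractProduct(doc) {
  const text = (selector) =>
    doc.querySelector(selector)?.textContent?.trim() ?? null;
  // The tool expects a string, so structured data is serialized as JSON.
  return JSON.stringify({
    title: text(".product-title"),
    price: text(".product-price"),
  });
}
```

One way to pass this to the tool is as a self-invoking expression, for example javascript: "(" + extractProduct.toString() + ")(document)". Missing elements come back as null rather than causing an exception.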
Acceptance Criteria
Must pass (all required):
- browser tool is registered in ToolRegistry with both screenshot and extract modes
- mode parameter is required and validated (only "screenshot" or "extract" accepted)
- Screenshot mode renders a page and saves a PNG file
- Screenshot file is accessible by the AI model (returned as file path)
- Extract mode loads a page, executes JavaScript, and returns text content
- Extract mode handles SPA pages that render content via JavaScript
- Default extraction script finds main content and converts to Markdown
- Custom javascript parameter is executed and its return value is used
- wait_seconds parameter controls the wait time after page load
- WebView is created on the main thread and properly managed
- WebView is cleaned up after each tool call (no state leaks between calls)
- URL validation rejects non-HTTP schemes
- Tool timeout (60s) prevents hanging on pages that never finish loading
- All Layer 1A tests pass
Optional (nice to have for V1):
- full_page screenshot captures the entire scrollable page
- Viewport size is configurable via width and height parameters
- WebView reuse across multiple tool calls in the same session
- Screenshot format selection (PNG vs JPEG)
UI/UX Requirements
This feature has no new UI for the user. The WebView is off-screen and invisible:
- Screenshots are saved to cache and displayed in chat as image results
- Extract results are displayed as text tool results
- No browser window or WebView is shown to the user
Feature Boundary
Included
- Off-screen WebView management (WebViewManager)
- Screenshot capture via Canvas/Bitmap
- Content extraction via evaluateJavascript()
- Built-in extraction script with Turndown in WebView
- Custom JavaScript execution in extract mode
- Configurable wait time for dynamic content
- Output truncation for extract mode
- WebView lifecycle management (create, reuse, cleanup, destroy)
Not Included (V1)
- Interactive browsing (clicking, scrolling, form filling)
- Multi-page navigation within a single tool call
- Cookie/session persistence across tool calls
- Authentication (login flows)
- PDF rendering or download
- Video/audio content capture
- Browser DevTools or network inspection
- Headless Chrome or external browser integration
- Accessibility tree extraction
Business Rules
- browser only accepts HTTP and HTTPS URLs
- WebView JavaScript execution is enabled (required for SPA support)
- Custom javascript parameter executes in the page’s context (has access to page DOM)
- Screenshots are saved as PNG files in the app’s cache directory
- Screenshot files are temporary and may be cleaned up by the system
- WebView does not persist cookies, localStorage, or session data between tool calls
- Tool timeout is 60 seconds (longer than webfetch due to rendering time)
- The WebView is not visible to the user at any time
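The HTTP/HTTPS-only rule can be illustrated with the standard URL parser. This is a sketch of the check's logic, not the shipped code: isAllowedUrl is a hypothetical name, and the production check would presumably be done in Kotlin before the URL reaches the WebView.

```javascript
// Sketch of the HTTP/HTTPS-only rule using the WHATWG URL parser.
// isAllowedUrl is a hypothetical name used for illustration.
function isAllowedUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    return false; // malformed URL
  }
  // The parser lowercases the scheme, so "HTTP://..." is also accepted.
  return parsed.protocol === "http:" || parsed.protocol === "https:";
}
```

Under this check, file://, content://, and javascript: URLs are all rejected because their scheme is neither http nor https.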
Non-Functional Requirements
Performance
- Page load + screenshot: 3-8 seconds typical (depends on page complexity)
- Page load + extraction: 3-6 seconds typical
- WebView creation (cold start): ~500ms
- WebView reuse (warm): < 100ms overhead
- Screenshot PNG compression: < 200ms for viewport-size images
Memory
- WebView process: ~50-100MB (managed by Android system, separate process)
- Screenshot bitmap: ~5-15MB (viewport size, ARGB_8888)
- Full-page screenshot: potentially larger (capped at 10,000px height)
- WebView is destroyed when not in use to free memory
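The bitmap figures follow from ARGB_8888 using 4 bytes per pixel. The sketch below works through the arithmetic; the density parameter is an assumption, since this spec does not say whether bitmaps are allocated in CSS pixels or physical pixels.

```javascript
// ARGB_8888 stores 4 bytes per pixel.
const BYTES_PER_PIXEL = 4;

// density is an assumed device scale factor (CSS px to physical px).
// The spec does not state which unit the bitmap dimensions use.
function bitmapBytes(widthPx, heightPx, density = 1) {
  const w = Math.round(widthPx * density);
  const h = Math.round(heightPx * density);
  return w * h * BYTES_PER_PIXEL;
}
```

At the default 412 x 915 viewport and density 1 this gives 1,507,920 bytes (about 1.5MB). With an assumed density of 2.625 it gives 10,395,856 bytes (about 10.4MB), which is one reading under which the 5-15MB estimate holds. A full-page capture at the 10,000px cap and density 1 would need 16,480,000 bytes.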
Compatibility
- Requires Android API 24+ (WebView evaluateJavascript and modern features)
- WebView version depends on user’s system WebView update
- Most modern web pages render correctly in Android WebView
Security
- WebView JavaScript is enabled but sandboxed by the Android WebView security model
- The WebView does not expose Android interfaces to web content (addJavascriptInterface is NOT used for page scripts)
- Custom javascript parameter runs in the same sandbox as page scripts
- No file:// or content:// URLs allowed
- WebView cache is cleared after each use
Dependencies
Depends On
- FEAT-004 (Tool System): Tool interface, registry, execution engine
- FEAT-021 (Kotlin Webfetch): Establishes the static HTML fetching baseline; browser tool complements it
Depended On By
- No other features currently depend on FEAT-022
External Dependencies
- Android WebView: System component, no additional dependency needed
- Turndown.js (~20KB): Already bundled as assets/js/lib/turndown.min.js (from FEAT-015). Loaded into WebView for DOM-to-Markdown conversion.
Error Handling
Error Scenarios
- Invalid URL
  - Cause: Malformed URL or non-HTTP scheme
  - Handling: Return ToolResult.error("Invalid URL: <message>")
- Invalid mode
  - Cause: mode is not "screenshot" or "extract"
  - Handling: Return ToolResult.error("Invalid mode. Use 'screenshot' or 'extract'")
- Page load failure
  - Cause: Network error, DNS failure, SSL error
  - Handling: WebViewClient.onReceivedError captures the error; return ToolResult.error("Page load failed: <message>")
- Page load timeout
  - Cause: Page never finishes loading within tool timeout
  - Handling: Cancel loading, return partial result or error
- JavaScript execution error
  - Cause: Custom JS throws an exception or returns undefined
  - Handling: Return ToolResult.error("JavaScript error: <message>")
- Screenshot capture failure
  - Cause: WebView not rendered, bitmap allocation failure
  - Handling: Return ToolResult.error("Screenshot capture failed: <message>")
- WebView unavailable
  - Cause: System WebView not installed or disabled (rare)
  - Handling: Return ToolResult.error("WebView not available on this device")
- Out of memory
  - Cause: Full-page screenshot of extremely long page
  - Handling: Cap full-page height at 10,000px; fall back to viewport-only
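For the JavaScript execution error case, one possible approach is to wrap the caller's script so that exceptions and non-string results become explicit values. evaluateJavascript typically hands its result to the native callback as a JSON-encoded value, so an uncaught exception and an undefined result can otherwise be hard to tell apart. The sketch below is an assumption about how such a wrapper might look; wrapCustomScript and the { ok, value, error } envelope are hypothetical names, not part of this spec.

```javascript
// Hypothetical wrapper the tool could build around the caller's script.
// It returns source text for a self-invoking function that always yields
// a JSON string describing success or failure.
function wrapCustomScript(userJs) {
  return `(function () {
  try {
    const value = eval(${JSON.stringify(userJs)});
    if (typeof value !== "string") {
      return JSON.stringify({ ok: false, error: "Script must return a string, got " + typeof value });
    }
    return JSON.stringify({ ok: true, value: value });
  } catch (err) {
    return JSON.stringify({ ok: false, error: String((err && err.message) || err) });
  }
})()`;
}
```

The native side could then parse the envelope and map ok: false to ToolResult.error("JavaScript error: <message>"). Pages with a strict Content-Security-Policy may block eval, so a real implementation might need a different way to inline the script.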
Future Improvements
- Interactive mode: Support clicking, scrolling, and form filling for multi-step web interactions
- Session persistence: Allow the AI to maintain a browser session across multiple tool calls
- Network inspection: Capture XHR/fetch requests and responses for debugging
- Accessibility tree: Extract the accessibility tree for structured page understanding
- Element screenshot: Screenshot a specific CSS selector rather than the full page
- Video/GIF capture: Record short animations or page transitions
- Multiple tabs: Support opening and switching between multiple pages
Test Points
Functional Tests
- Verify screenshot mode produces a valid PNG file
- Verify screenshot file path is returned in the tool result
- Verify extract mode returns Markdown content
- Verify extract mode handles JS-rendered content (mock SPA page)
- Verify custom javascript parameter is executed and result returned
- Verify wait_seconds parameter delays extraction
- Verify URL validation rejects non-HTTP schemes
- Verify mode parameter validation
- Verify WebView cleanup after tool call (no state leaks)
- Verify tool timeout works (60s)
Edge Cases
- Page with infinite scroll (screenshot should capture viewport only by default)
- Page that never finishes loading (timeout handling)
- Page with SSL certificate error
- Page with HTTP Basic Auth prompt
- javascript parameter that returns a very large string
- javascript parameter that takes longer than wait_seconds
- Concurrent browser tool calls (should queue or reject)
- App goes to background during browser tool execution
- WebView process crash during rendering
- Page with <meta http-equiv="refresh"> redirect
Change History
| Date | Version | Changes | Owner |
|---|---|---|---|
| 2026-03-01 | 0.1 | Initial version | - |