browsy
Zero-render browser engine for AI agents. browsy.dev
browsy converts web pages into a structured Spatial DOM -- a flat list of interactive and text elements with bounding boxes, roles, and states -- without rendering pixels. On top of this, it layers page intelligence: automatic page type detection, suggested actions with stable element IDs, CAPTCHA detection, and hidden content exposure.
$ browsy fetch https://github.com/login
page_type: Login
suggested_actions:
Login { username: 19, password: 21, submit: 34 }
[19:input "Username or email address" @top-C]
[21:input "Password" @mid-C]
[34:button "Sign in" @mid-C]
// 203ms. No Chromium. No LLM needed.
Why browsy?
Every AI agent that touches the web today launches a 300MB Chromium instance, waits 5 seconds for it to render, then asks an LLM "what am I looking at?"
browsy skips all of that:
| | Chromium-based tools | browsy |
|---|---|---|
| Speed | 5-30 seconds per page | ~200ms |
| Dependencies | 282MB+ Chromium | 6MB binary |
| Page intelligence | None (LLM must figure it out) | 12 page types, 13 action recipes |
| Hidden content | Not accessible | Exposed with hidden: true |
| CAPTCHA detection | None | reCAPTCHA, hCaptcha, Turnstile, Cloudflare, image grid |
| Output | Raw accessibility tree | Structured Spatial DOM |
| Deterministic | No (LLM variance) | Yes (same HTML = same output) |
When to use browsy
browsy handles server-rendered HTML -- the 90% of the web that doesn't need a browser to understand. Login forms, search pages, news sites, government portals, documentation, e-commerce product pages.
For JS-rendered SPAs (React, Angular, Vue apps that render client-side), you still need a real browser. browsy is the fast path, not a full browser replacement.
Key features
- Page intelligence -- 12 page types detected automatically, 13 action recipes with element IDs
- CAPTCHA detection -- identifies reCAPTCHA, hCaptcha, Cloudflare Turnstile, image grids with sitekey extraction
- Hidden content exposure -- dropdowns, modals, accordions included with hidden: true
- Session API -- navigate, click, type, select, search -- with cookie persistence
- Built-in web search -- DuckDuckGo and Google, search and fetch results in one call
- Smart deduplication -- 34-42% element reduction on real sites
- Delta output -- only changes after first load
- MCP server -- use browsy from Claude Code or any MCP client
- Python bindings -- PyO3-based, full session API
- 6MB binary -- zero runtime dependencies
Quickstart
This guide covers the core browsy-core workflow: parse HTML, fetch live pages, read page intelligence, and interact with forms.
1. Install
cargo add browsy-core
This pulls in the fetch feature by default, which includes HTTP fetching via reqwest. See Installation for other installation methods.
2. Parse HTML
The simplest entry point is browsy_core::parse. Pass it an HTML string and a viewport size, and it returns a SpatialDom -- a flat list of elements with bounding boxes, roles, and states.
let html = r#"
<html>
  <body>
    <h1>Hello, world</h1>
    <a href="/about">About</a>
    <input type="text" placeholder="Search..." />
  </body>
</html>
"#;

let dom = browsy_core::parse(html, 1920.0, 1080.0);

// Iterate over elements
for el in &dom.els {
    println!("[{}:{} {:?}]", el.id, el.tag, el.text);
}
The viewport dimensions (1920x1080 here) affect layout computation -- elements get positioned and sized as they would in a real browser at that resolution.
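To see the effect, parse the same document at two viewport sizes and compare what lands above the fold. This is only a sketch -- page.html is a stand-in for any saved HTML, and the exact counts depend on the page and its CSS:

let html = std::fs::read_to_string("page.html")?;

// Desktop-sized viewport
let desktop = browsy_core::parse(&html, 1920.0, 1080.0);
// Narrow, phone-sized viewport
let mobile = browsy_core::parse(&html, 390.0, 844.0);

// Same elements, but fold classification shifts with the viewport height
println!("above fold at 1920x1080: {}", desktop.above_fold().len());
println!("above fold at 390x844:   {}", mobile.above_fold().len());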
SpatialDom serializes to JSON via serde:
let json = serde_json::to_string_pretty(&dom).unwrap();
println!("{}", json);
3. Fetch and parse a live page
The Session API handles HTTP fetching, cookie persistence, and page interaction. It requires the fetch feature (enabled by default).
use browsy_core::fetch::Session;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut session = Session::new()?;
    let dom = session.goto("https://example.com")?;

    println!("Title: {}", dom.title);
    println!("Elements: {}", dom.els.len());

    // Elements are accessible by ID
    if let Some(el) = dom.get(1) {
        println!("First element: {} {:?}", el.tag, el.text);
    }

    // Filter to visible-only or above-the-fold
    let visible = dom.visible();
    let above_fold = dom.above_fold();

    Ok(())
}
Sessions persist cookies across navigations. Each call to goto returns a fresh SpatialDom for the new page.
4. Read page intelligence
Every SpatialDom includes two forms of page intelligence: a detected page type and a list of suggested actions with stable element IDs.
use browsy_core::fetch::Session;
use browsy_core::output::{PageType, SuggestedAction};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut session = Session::new()?;
    let dom = session.goto("https://github.com/login")?;

    // Page type: Login, Search, Article, Form, List, Dashboard, etc.
    println!("Page type: {:?}", dom.page_type);

    // Suggested actions tell the agent exactly what to do
    for action in &dom.suggested_actions {
        match action {
            SuggestedAction::Login { username_id, password_id, submit_id, .. } => {
                println!("Login form found:");
                println!("  Username field: element {}", username_id);
                println!("  Password field: element {}", password_id);
                println!("  Submit button: element {}", submit_id);
            }
            SuggestedAction::Search { input_id, submit_id } => {
                println!("Search: input={}, submit={}", input_id, submit_id);
            }
            SuggestedAction::EnterCode { input_id, submit_id, code_length } => {
                println!("2FA code: input={}, submit={}, length={:?}", input_id, submit_id, code_length);
            }
            _ => println!("Action: {:?}", action),
        }
    }

    Ok(())
}
Page types
browsy detects the following page types automatically:
| PageType | Meaning |
|---|---|
Login | Login form with username/password fields |
TwoFactorAuth | Verification code entry (2FA, email confirmation) |
OAuthConsent | OAuth authorization prompt |
Captcha | CAPTCHA challenge page |
Search | Search page (empty query state) |
SearchResults | Search results page |
Inbox | Email or message inbox |
EmailBody | Single email or message view |
Dashboard | Dashboard or admin panel |
Form | Generic form (registration, contact, settings) |
Article | Article, blog post, documentation page |
List | List or catalog page (products, directory) |
Error | Error page (404, 500, access denied) |
Other | No specific type detected |
CAPTCHA detection
When browsy detects a CAPTCHA, it sets page_type to Captcha and populates captcha with details:
if dom.page_type == PageType::Captcha {
    if let Some(captcha) = &dom.captcha {
        println!("CAPTCHA type: {:?}", captcha.captcha_type);
        // ReCaptcha, HCaptcha, Turnstile, CloudflareChallenge, ImageGrid, TextCaptcha
        if let Some(sitekey) = &captcha.sitekey {
            println!("Site key: {}", sitekey);
        }
    }
}
Or use the session convenience methods:
if session.is_captcha() {
    println!("CAPTCHA: {:?}", session.captcha_info());
}
5. Log in to a site
browsy provides two ways to interact with login forms: manual (using element IDs) and automatic (using session.login).
Manual login
Use the element IDs from SuggestedAction::Login to type credentials and submit:
use browsy_core::fetch::Session;
use browsy_core::output::SuggestedAction;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut session = Session::new()?;
    let dom = session.goto("https://github.com/login")?;

    // Find the login action
    for action in &dom.suggested_actions {
        if let SuggestedAction::Login { username_id, password_id, submit_id, .. } = action {
            session.type_text(*username_id, "user@example.com")?;
            session.type_text(*password_id, "my-password")?;
            let result = session.click(*submit_id)?;
            println!("After login: {:?}", result.page_type);
            break;
        }
    }

    Ok(())
}
Automatic login
session.login detects the login form from suggested_actions and fills it in one call:
let mut session = Session::new()?;
session.goto("https://github.com/login")?;

let result = session.login("user@example.com", "my-password")?;
println!("After login: {:?}", result.page_type);
This fails with FetchError::ActionError if no SuggestedAction::Login is detected on the current page.
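If you would rather fall back to the manual flow than propagate the error, match on that variant. A sketch, assuming FetchError is exported alongside Session from browsy_core::fetch:

use browsy_core::fetch::{FetchError, Session};

let mut session = Session::new()?;
session.goto("https://github.com/login")?;

match session.login("user@example.com", "my-password") {
    Ok(dom) => println!("After login: {:?}", dom.page_type),
    Err(FetchError::ActionError(msg)) => {
        // No Login recipe detected -- fall back to the manual element-ID flow above
        eprintln!("no login form found: {}", msg);
    }
    Err(other) => eprintln!("fetch failed: {}", other),
}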
2FA / verification codes
If the login redirects to a 2FA page, use enter_code:
if result.page_type == PageType::TwoFactorAuth {
    let final_page = session.enter_code("123456")?;
    println!("After 2FA: {:?}", final_page.page_type);
}
6. Search the web
browsy has built-in web search via DuckDuckGo and Google. No API keys required.
Get search results
use browsy_core::fetch::{Session, SearchEngine};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut session = Session::new()?;

    // DuckDuckGo (default)
    let results = session.search("rust web frameworks")?;
    for r in &results {
        println!("{}: {}", r.title, r.url);
        println!("  {}", r.snippet);
    }

    // Google
    let results = session.search_with("rust web frameworks", SearchEngine::Google)?;

    Ok(())
}
Search and read pages
search_and_read fetches the top N results and returns each page's SpatialDom:
let pages = session.search_and_read("browsy browser engine", 3)?;

for page in &pages {
    println!("--- {} ---", page.result.title);
    if let Some(dom) = &page.dom {
        println!("  Page type: {:?}", dom.page_type);
        println!("  Elements: {}", dom.els.len());
    } else {
        println!("  (fetch failed)");
    }
}
Next steps
- Spatial DOM -- understand the output format in detail
- Page Intelligence -- all 13 action recipes explained
- Session API -- full reference for navigation, forms, and interaction
- MCP Server -- use browsy from Claude Code
Installation
browsy is available as a Rust library, a CLI binary, a Python package, and an MCP server.
Rust library
Add browsy-core to your project:
cargo add browsy-core
This enables the fetch feature by default, which includes HTTP fetching, session management, and web search via reqwest.
Without networking
To use browsy as a pure HTML-to-Spatial-DOM parser with no network dependencies:
cargo add browsy-core --no-default-features
This disables the fetch feature. You get browsy_core::parse(html, width, height) and nothing else -- no Session, no HTTP, no reqwest. Useful for embedding browsy in contexts where you handle fetching yourself.
// Available without fetch feature
let dom = browsy_core::parse(html, 1920.0, 1080.0);

// Requires fetch feature (enabled by default)
use browsy_core::fetch::Session;
let mut session = Session::new()?;
Feature flags
| Feature | Default | Description |
|---|---|---|
fetch | Yes | HTTP fetching, Session API, web search, cookie persistence |
CLI
Install the browsy CLI binary:
cargo install browsy
Usage:
# Fetch and parse a live page
browsy fetch https://example.com
# Parse local HTML from stdin
cat page.html | browsy parse
# JSON output
browsy fetch https://example.com --format json
REST API server
The CLI includes a built-in REST API + A2A server:
browsy serve
browsy serve --port 8080
browsy serve --allow-private-network
See REST API for endpoint documentation and A2A Protocol for agent-to-agent integration.
Python
browsy has PyO3 bindings published as the browsy-ai package:
pip install browsy-ai
import browsy
# Parse HTML directly
dom = browsy.parse(html, 1920.0, 1080.0)
print(dom.page_type)
print(dom.suggested_actions)
# Session-based browsing
session = browsy.Session()
dom = session.goto("https://example.com")
session.type_text(19, "hello")
session.click(34)
The Python bindings expose the same Session API as the Rust library, including login, search, enter_code, and all form interaction methods.
Framework integrations
Install browsy with framework-specific extras:
pip install browsy-ai[langchain] # LangChain tools
pip install browsy-ai[crewai] # CrewAI tool
pip install browsy-ai[openai] # OpenAI function calling
pip install browsy-ai[autogen] # AutoGen integration
pip install browsy-ai[smolagents] # HuggingFace smolagents
pip install browsy-ai[all] # All integrations
See Framework Integrations for usage guides.
Requirements
- Python 3.9+
- No native dependencies (the compiled extension includes everything)
JavaScript / TypeScript
The browsy-ai npm package provides a TypeScript SDK with integrations for LangChain.js, OpenAI, and Vercel AI SDK:
npm install browsy-ai
import { BrowsyClient, BrowsyContext } from "browsy-ai"; // Core SDK
import { getTools } from "browsy-ai/langchain"; // LangChain.js
import { getToolDefinitions, handleToolCall } from "browsy-ai/openai"; // OpenAI
import { browsyTools } from "browsy-ai/vercel-ai"; // Vercel AI SDK
Framework dependencies are optional peer dependencies -- install only what you need:
npm install browsy-ai @langchain/core # LangChain.js
npm install browsy-ai openai # OpenAI
npm install browsy-ai ai # Vercel AI SDK
Requires Node.js 22+ and the browsy CLI (cargo install browsy) for the REST server.
See JavaScript / TypeScript for the full SDK guide.
MCP Server
browsy ships an MCP server that exposes the full Session API as tools. This works with Claude Code, Claude Desktop, and any MCP-compatible client.
Install
cargo install browsy-mcp
Configure for Claude Code
Add to your Claude Code MCP configuration (.claude/mcp.json or equivalent):
{
"mcpServers": {
"browsy": {
"command": "browsy-mcp",
"args": []
}
}
}
Configure for Claude Desktop
Add to your Claude Desktop config (claude_desktop_config.json):
{
"mcpServers": {
"browsy": {
"command": "browsy-mcp",
"args": []
}
}
}
Available MCP tools
The MCP server exposes these tools:
| Tool | Description |
|---|---|
browse | Navigate to a URL, returns Spatial DOM |
click | Click an element by ID |
type_text | Type into an input field by ID |
check / uncheck | Toggle checkboxes and radio buttons |
select | Select a dropdown option |
get_page | Get the current page DOM with form state |
back | Go back in navigation history |
search | Web search via DuckDuckGo or Google |
find | Find elements by text or ARIA role |
login | Fill and submit a login form |
enter_code | Fill and submit a verification code |
tables | Extract structured table data |
page_info | Get page metadata, type, and suggested actions |
Building from source
git clone https://github.com/GhostPeony/browsy
cd browsy
# Build everything (library + CLI + MCP server)
cargo build --release
# Run tests
cargo test -p browsy-core
# Install CLI and MCP server from local source
cargo install --path crates/cli
cargo install --path crates/mcp
Spatial DOM
The Spatial DOM is the primary output of browsy. It converts an HTML document into a flat list of SpatialElement structs -- each representing an interactive element, text block, or structural landmark -- with bounding boxes, ARIA roles, and form state. No tree traversal, no pixel rendering.
use browsy_core::parse;

let dom = parse(html, 1920.0, 1080.0);
// dom.els: Vec<SpatialElement> -- flat, ordered, ready for agent consumption
SpatialElement fields
Every element in the Spatial DOM is a SpatialElement with these fields:
| Field | Type | Description |
|---|---|---|
id | u32 | Stable numeric ID, assigned sequentially. Used for all interactions (click, type_text, etc.) |
tag | String | HTML tag name (a, button, input, p, h1, etc.) |
role | Option<String> | ARIA role -- explicit from role attr or implicit from tag. link, button, textbox, heading, navigation, etc. |
text | Option<String> | Visible text content. For images, this is the alt text |
href | Option<String> | Link destination (resolved to absolute URL when parsed via Session) |
b | [i32; 4] | Bounding box: [x, y, width, height] in pixels relative to the document |
hidden | Option<bool> | Some(true) if the element is hidden. Absent (None) when visible |
name | Option<String> | HTML name attribute (form fields only: input, textarea, select) |
val | Option<String> | Current value from the HTML value attribute |
ph | Option<String> | Placeholder text |
label | Option<String> | Associated <label> text (resolved via <label for="id">) |
input_type | Option<String> | Input type (text, password, email, checkbox, radio, search, etc.). Serializes as type in JSON |
checked | Option<bool> | Whether a checkbox/radio is checked |
disabled | Option<bool> | Whether the element is disabled |
expanded | Option<bool> | ARIA expanded state (dropdowns, accordions) |
selected | Option<bool> | ARIA selected state (tabs, options) |
required | Option<bool> | Whether the field is required |
alert_type | Option<String> | Alert classification: "alert", "status", "error", "success", "warning" |
All Option fields use skip_serializing_if -- absent fields are omitted from JSON output to keep payloads compact.
Hidden content exposure
Elements with display: none, visibility: hidden, aria-hidden="true", or the hidden attribute are not discarded. They appear in the Spatial DOM with hidden: Some(true).
This is a deliberate design decision. Without JavaScript execution, browsy cannot toggle visibility. By including hidden elements, agents can see:
- Dropdown menus -- <ul> inside a nav that only appears on hover
- Modal dialogs -- login forms, cookie consent, popups
- Accordion panels -- FAQ content behind collapsed sections
- Tab content -- inactive tab panels
- Off-canvas navigation -- mobile menus hidden at desktop widths
// All elements including hidden
let all = &dom.els;

// Only visible elements
let visible = dom.visible();

// Hidden elements are distinguishable
for el in &dom.els {
    if el.hidden == Some(true) {
        // This element is hidden in the rendered page
    }
}
Hidden elements are exempt from zero-size filtering -- they are preserved regardless of bounding box dimensions. Visible elements with zero width and height are skipped as layout artifacts.
Deduplication
HTML commonly wraps interactive elements in container tags that carry no additional meaning:
<li><a href="/about">About</a></li>
<td><span><button>Submit</button></span></td>
browsy collapses these wrappers. When a wrapper tag (li, td, th, span, p, dt, dd) contains only interactive children and no meaningful text of its own, the wrapper is skipped. Only the inner interactive element is emitted.
This produces a 34-42% element reduction on real sites without losing any semantic content.
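A quick way to see this is to parse a wrapper-heavy fragment and print the compact output. Exact IDs and boxes will vary; the point is that no standalone li, td, or span entries appear:

use browsy_core::output::to_compact_string;

let html = r#"
<ul>
  <li><a href="/about">About</a></li>
  <li><a href="/pricing">Pricing</a></li>
</ul>
<table><tr><td><span><button>Submit</button></span></td></tr></table>
"#;

let dom = browsy_core::parse(html, 1920.0, 1080.0);
// Expect the two links and the button, with the wrappers collapsed away
println!("{}", to_compact_string(&dom));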
Landmark markers
HTML5 landmark elements (nav, header, footer, main, aside, section, form) and elements with explicit landmark ARIA roles (navigation, banner, contentinfo, complementary, region, main, form) emit as role-only structural markers.
A landmark element appears in the output with its role but no recursive text. Its children carry the actual content as separate elements. This prevents the entire navigation bar's text from being duplicated into a single massive nav element.
{"id": 1, "tag": "nav", "role": "navigation", "b": [0, 0, 1920, 60]},
{"id": 2, "tag": "a", "role": "link", "text": "Home", "href": "/", "b": [20, 10, 80, 40]},
{"id": 3, "tag": "a", "role": "link", "text": "About", "href": "/about", "b": [120, 10, 80, 40]}
Element lookup
The SpatialDom maintains an internal HashMap<u32, usize> index for O(1) element lookup by ID:
// O(1) -- does not scan the element list
let element = dom.get(42);
The index is built automatically during parsing and can be rebuilt after mutation:
dom.els.push(new_element);
dom.rebuild_index();
Filtering
// Only visible (non-hidden) elements
let visible: Vec<&SpatialElement> = dom.visible();

// Elements whose top edge is within the viewport
let above: Vec<&SpatialElement> = dom.above_fold();

// Elements whose top edge is below the viewport
let below: Vec<&SpatialElement> = dom.below_fold();

// New SpatialDom containing only above-fold elements (for token-limited contexts)
let trimmed: SpatialDom = dom.filter_above_fold();
The fold line is determined by dom.vp[1] (viewport height, default 1080px).
Tables
dom.tables() extracts structured table data by grouping th and td elements by their Y coordinates:
let tables: Vec<TableData> = dom.tables();

for table in &tables {
    println!("Headers: {:?}", table.headers); // Vec<String>
    for row in &table.rows {
        println!("Row: {:?}", row); // Vec<String>
    }
}
Elements within 5px of the same Y coordinate are grouped into the same row. Cells are sorted left-to-right by X position within each row.
Alerts
dom.alerts() returns elements with a detected alert_type:
let alerts: Vec<&SpatialElement> = dom.alerts();

for alert in &alerts {
    println!("{}: {}",
        alert.alert_type.as_deref().unwrap(),
        alert.text.as_deref().unwrap_or(""));
    // "error: Invalid password"
    // "success: Account created"
}
Alert types are detected from ARIA role attributes (alert, status) and CSS class patterns (alert-error, msg-danger, flash-success, etc.). Only compound class patterns are matched -- a bare error class is too ambiguous.
Verification codes
dom.find_codes() extracts 4-8 digit verification codes from page text:
let codes: Vec<String> = dom.find_codes();
// ["847291"] -- extracted from "Your verification code is 847291"
Codes are found near keyword context (verification code, security code, your code, otp, passcode, one-time). Year-like 4-digit numbers (1900-2099) are filtered out. Proximity matching also checks nearby elements within 100px Y distance for keyword context.
Text fallback chain
For interactive elements (links, buttons) that contain no direct text -- only images or icons -- browsy walks a fallback chain to find meaningful text:
1. aria-label attribute
2. title attribute
3. Child <img> alt text
4. Child <svg><title> text
This ensures that icon-only buttons and image links always have text for the agent to read.
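A small sketch: an icon-only button and an image link still come back with readable text. The expected values in the comments follow from the chain above:

let html = r#"
<button aria-label="Close dialog">
  <svg viewBox="0 0 24 24"><title>Close</title><path d="M6 6l12 12"/></svg>
</button>
<a href="/"><img src="/logo.svg" alt="Acme home"></a>
"#;

let dom = browsy_core::parse(html, 1920.0, 1080.0);
for el in &dom.els {
    // Expect "Close dialog" for the button and "Acme home" for the link
    println!("{}: {:?}", el.tag, el.text);
}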
Page Intelligence
Page intelligence is browsy's deterministic classification layer. Given a Spatial DOM, browsy computes a page type and a set of suggested actions (action recipes) -- each with concrete element IDs that agents can use directly. No LLM inference, no probabilistic guessing.
let dom = session.goto("https://github.com/login")?;

assert_eq!(dom.page_type, PageType::Login);
// dom.suggested_actions[0] == Login { username_id: 19, password_id: 21, submit_id: 34 }
Page types
browsy classifies pages into one of 14 types, detected via priority-ordered heuristics applied to the Spatial DOM. The first matching rule wins.
| Page Type | Detection Signal |
|---|---|
Error | Alert elements with alert_type == "error", or title contains 404, 500, 403, not found, error |
Captcha | CAPTCHA service detected in HTML (reCAPTCHA, hCaptcha, Turnstile), or title/heading contains captcha, verify you're human, just a moment |
Login | Visible password input field present |
TwoFactorAuth | Title/heading contains verification keywords (verification, 2fa, otp, one-time, passcode) AND a visible text/number/tel input exists |
OAuthConsent | Title/heading contains authorize, allow access, grant permission, oauth, consent |
Inbox | Title contains inbox, mail, messages AND page has 10+ links |
EmailBody | 3+ email markers present in element text (from:, to:, subject:, date:) |
Dashboard | Title/heading contains dashboard, welcome back, overview AND both nav and main landmarks exist |
Article | 3+ headings AND 2+ long paragraphs (>100 chars). When link count >= 20, requires 10+ long paragraphs. Heading-heavy pages (15+ headings with low paragraph ratio) are excluded |
SearchResults | Search input present AND 8+ links AND (title/heading contains search results/results for OR URL contains search query params like ?q=) |
List | 10+ visible links |
Search | Visible search input (type search, role searchbox, name q, or placeholder/name containing search) |
Form | 2+ visible data-entry inputs (excludes checkbox, radio, hidden, submit, button, image) |
Other | No other type matched |
Detection order matters. A page with a password field and a search bar is classified as Login, not Search, because Login is checked first.
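A minimal illustration -- the markup is synthetic and exactly which recipes get emitted depends on the detectors, but the password field should outrank the search box:

let html = r#"
<form action="/search"><input type="search" name="q" placeholder="Search docs"></form>
<form action="/login">
  <input type="email" name="email">
  <input type="password" name="password">
  <button>Sign in</button>
</form>
"#;

let dom = browsy_core::parse(html, 1920.0, 1080.0);
// Login is checked before Search, so the password field decides the page type;
// the Search recipe can still show up in suggested_actions alongside Login
println!("{:?}", dom.page_type);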
Action recipes
Alongside page type, browsy detects suggested actions -- structured recipes telling the agent exactly what to do and which element IDs to use. Each action maps directly to Session API calls.
Login
Detected when a visible password input exists near a text/email input.
{
"action": "Login",
"username_id": 19,
"password_id": 21,
"submit_id": 34,
"remember_me_id": 36
}
Agent usage: session.type_text(19, "user@example.com"), session.type_text(21, "pass"), session.click(34). Or simply: session.login("user@example.com", "pass").
Register
Detected when a password field is accompanied by a confirm-password field or registration keywords in the title/heading. Login takes priority when both login and registration sections are present on the same page.
{
"action": "Register",
"email_id": 12,
"username_id": 14,
"password_id": 16,
"confirm_password_id": 18,
"name_id": 10,
"submit_id": 22
}
EnterCode
Detected on verification/2FA pages with code-related keywords in the title or heading.
{
"action": "EnterCode",
"input_id": 8,
"submit_id": 12,
"code_length": 6
}
code_length is set when the page uses separate narrow digit inputs (4-8 inputs each <60px wide).
Search
Detected when an input has type search, role searchbox, name q, or a name/placeholder containing search.
{
"action": "Search",
"input_id": 5,
"submit_id": 7
}
Consent
Detected on OAuth/authorization pages with approve/deny buttons.
{
"action": "Consent",
"approve_ids": [15, 18],
"deny_ids": [20]
}
CookieConsent
Detected when a substantial text block mentions cookies/GDPR and accept/reject buttons are present.
{
"action": "CookieConsent",
"accept_id": 42,
"reject_id": 44
}
Contact
Detected on pages with contact-related keywords and a visible textarea for the message body.
{
"action": "Contact",
"name_id": 5,
"email_id": 7,
"message_id": 9,
"submit_id": 11
}
FillForm
Generic form detection. Emitted when visible form fields exist and no more specific action (Login, Register, Contact) matched. Includes labeled field metadata.
{
"action": "FillForm",
"fields": [
{"id": 10, "label": "First Name", "name": "first_name", "type": "text"},
{"id": 12, "label": "Email", "name": "email", "type": "email"}
],
"submit_id": 20
}
SelectFromList
Detected when 5+ links are arranged in distinct vertical rows (list-like layout).
{
"action": "SelectFromList",
"items": [3, 8, 13, 18, 23]
}
Paginate
Detected when next/previous navigation links are found (text matching next, previous, >, >>, etc.).
{
"action": "Paginate",
"next_id": 95,
"prev_id": 91
}
Download
Detected when links point to downloadable file types.
{
"action": "Download",
"items": [{"id": 30, "text": "Report Q4 2024", "href": "/files/report.pdf"}]
}
CaptchaChallenge
Detected when a CAPTCHA service is found in the HTML structure.
{
"action": "CaptchaChallenge",
"captcha_type": "ReCaptcha",
"sitekey": "6LcXxxAAAABBBCCC...",
"submit_id": 50
}
CAPTCHA detection
browsy identifies CAPTCHA services by scanning the HTML structure for known markers:
| Type | Detection |
|---|---|
ReCaptcha | g-recaptcha class, data-sitekey attr, reCAPTCHA script URLs |
HCaptcha | h-captcha class, hCaptcha script URLs |
Turnstile | cf-turnstile class, Turnstile script URLs |
CloudflareChallenge | Cloudflare "Just a moment..." challenge page pattern |
ImageGrid | Custom image-grid CAPTCHA (select matching images) |
TextCaptcha | Text-based CAPTCHA (type characters from an image) |
Unknown | CAPTCHA detected but service not identified |
CAPTCHA info is available at dom.captcha:
if let Some(captcha) = &dom.captcha {
    println!("Type: {:?}", captcha.captcha_type); // CaptchaType::ReCaptcha
    println!("Sitekey: {:?}", captcha.sitekey);   // Some("6Lc...")
}
How detection works
All detection is deterministic, heuristic-based, priority-ordered. No machine learning models, no token costs. The same HTML always produces the same page type and action set.
The detection pipeline:
- Parse HTML into the Spatial DOM (element list with bounding boxes and roles)
- Scan for CAPTCHA markers in the layout tree
- Run detect_page_type -- walks through page type checks in priority order, returns the first match
- Run detect_suggested_actions -- runs all action detectors independently, collecting all that match
Multiple actions can coexist. A login page might have both Login and CookieConsent actions. A search results page might have Search, SelectFromList, and Paginate.
Example flow
use browsy_core::fetch::Session;
use browsy_core::output::PageType;

let mut session = Session::new()?;
let dom = session.goto("https://example.com/login")?;

match dom.page_type {
    PageType::Login => {
        // Use the Login action recipe directly
        session.login("user@example.com", "hunter2")?;
    }
    PageType::TwoFactorAuth => {
        session.enter_code("847291")?;
    }
    PageType::Captcha => {
        let info = session.captcha_info();
        // Report to the caller -- browsy cannot solve CAPTCHAs
    }
    _ => {
        // Read the page content, follow links, etc.
    }
}
Session API
The Session API provides stateful web browsing with cookie persistence, form interaction, navigation history, and built-in web search. It is the primary interface for agents interacting with the web through browsy.
use browsy_core::fetch::Session;

let mut session = Session::new()?;
let dom = session.goto("https://example.com")?;
Requires the fetch feature (enabled by default).
Creating a session
Session::new()
Creates a session with default configuration (1920x1080 viewport, 30s timeout, CSS fetching enabled).
let mut session = Session::new()?;
Session::with_config(config)
Creates a session with custom configuration.
use browsy_core::fetch::{Session, SessionConfig};

let config = SessionConfig {
    viewport_width: 1366.0,
    viewport_height: 768.0,
    timeout_secs: 15,
    fetch_css: false, // Skip external CSS for speed
    ..Default::default()
};

let mut session = Session::with_config(config)?;
SessionConfig fields
| Field | Type | Default | Description |
|---|---|---|---|
viewport_width | f32 | 1920.0 | Viewport width in pixels. Affects layout computation and fold detection |
viewport_height | f32 | 1080.0 | Viewport height in pixels. Defines the fold line |
user_agent | String | Chrome-like UA | HTTP User-Agent header |
timeout_secs | u64 | 30 | HTTP request timeout |
fetch_css | bool | true | Whether to fetch external CSS stylesheets. Disabling speeds up parsing but reduces layout accuracy |
blocked_patterns | Vec<String> | Analytics/tracking URLs | URL patterns to block (analytics, ads, tracking pixels) |
max_response_bytes | usize | 5MB | Maximum HTML response size |
max_css_bytes_total | usize | 2MB | Maximum total CSS bytes across all stylesheets |
max_css_bytes_per_file | usize | 512KB | Maximum size per individual CSS file |
max_redirects | usize | 10 | Maximum HTTP redirect chain length |
allow_private_network | bool | false | Whether to allow requests to private/internal IPs |
allow_non_http | bool | false | Whether to allow non-HTTP(S) schemes |
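For example, hitting a service on localhost is blocked by default and requires opting in to private-network requests. A sketch using the fields above (the URL and timeout are placeholders):

use browsy_core::fetch::{Session, SessionConfig};

let config = SessionConfig {
    allow_private_network: true, // e.g. a local dev server in integration tests
    timeout_secs: 5,
    ..Default::default()
};

let mut session = Session::with_config(config)?;
let dom = session.goto("http://localhost:8080/login")?;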
Navigation
goto(url) -> Result<SpatialDom, FetchError>
Navigate to a URL. Fetches the page, parses HTML, optionally fetches external CSS, computes layout, and returns the Spatial DOM. Cookies are persisted automatically.
let dom = session.goto("https://news.ycombinator.com")?;
println!("Title: {}", dom.title);
println!("Elements: {}", dom.els.len());
back() -> Result<SpatialDom, FetchError>
Navigate to the previous page in history. Returns an error if there is no history.
session.goto("https://example.com")?;
session.goto("https://example.com/about")?;

let dom = session.back()?; // Back to example.com
url() -> Option<&str>
Returns the current page URL.
if let Some(url) = session.url() {
    println!("Currently at: {}", url);
}
Interaction
click(id) -> Result<SpatialDom, FetchError>
Click an element by ID. Behavior depends on the element type:
- Links (<a>) -- navigates to the href URL. Skips javascript:, mailto:, tel:, and anchor-only (#) links.
- Buttons / submit inputs -- submits the parent form with all current form values.
- Elements with JS behaviors -- simulated. onclick handlers with window.location trigger navigation. Toggle/show/hide behaviors modify the DOM.
let dom = session.goto("https://news.ycombinator.com")?;

// Click the first link
let dom = session.click(3)?;
type_text(id, text) -> Result<(), FetchError>
Type text into an input or textarea. The value is stored in the session and overlaid onto the DOM. When a form is submitted via click, these values are included in the form data.
session.type_text(19, "user@example.com")?;
session.type_text(21, "hunter2")?;
Returns an error if the element is not an input or textarea.
check(id) -> Result<(), FetchError>
Check a checkbox or radio button.
session.check(36)?; // Check "Remember me"
uncheck(id) -> Result<(), FetchError>
Uncheck a checkbox or radio button.
session.uncheck(36)?;
toggle(id) -> Result<(), FetchError>
Toggle a checkbox or radio button based on its current effective state (considering session overrides and HTML defaults).
session.toggle(36)?; // If checked, unchecks. If unchecked, checks.
select(id, value) -> Result<(), FetchError>
Select an option in a <select> element by value.
session.select(15, "california")?;
Reading page state
dom() -> Option<SpatialDom>
Returns the current Spatial DOM with form state overlaid. Typed values, checked/unchecked states from type_text, check, and uncheck are reflected in the returned DOM.
session.type_text(19, "hello")?;

let dom = session.dom().unwrap();
let el = dom.get(19).unwrap();
assert_eq!(el.val.as_deref(), Some("hello"));
dom_ref() -> Option<&SpatialDom>
Returns a reference to the raw Spatial DOM without form state overlay. Reflects the page as parsed, ignoring any type_text/check/uncheck calls.
let raw = session.dom_ref().unwrap();
delta() -> Option<DeltaDom>
Returns the diff between the current and previous page. Only available after at least two navigations.
session.goto("https://example.com")?;
session.goto("https://example.com/about")?;

if let Some(delta) = session.delta() {
    println!("Added/changed: {}", delta.changed.len());
    println!("Removed IDs: {:?}", delta.removed);
}
element(id) -> Option<&SpatialElement>
O(1) element lookup by ID.
if let Some(el) = session.element(42) {
    println!("{}: {}", el.tag, el.text.as_deref().unwrap_or(""));
}
Finding elements
find_by_text(text) -> Vec<&SpatialElement>
Exact substring match on element text (case-sensitive).
let results = session.find_by_text("Sign in");
find_by_text_fuzzy(text) -> Vec<&SpatialElement>
Case-insensitive substring match on element text.
let results = session.find_by_text_fuzzy("sign in");
// Matches "Sign In", "SIGN IN", "Please sign in", etc.
find_by_role(role) -> Vec<&SpatialElement>
Find all elements with a specific ARIA role.
let headings = session.find_by_role("heading");
let links = session.find_by_role("link");
let buttons = session.find_by_role("button");
find_input_by_purpose(purpose) -> Option<&SpatialElement>
Find an input element by its semantic purpose. Matches on input type, name, label, and placeholder.
use browsy_core::fetch::InputPurpose;

let password = session.find_input_by_purpose(InputPurpose::Password);
let email = session.find_input_by_purpose(InputPurpose::Email);
let username = session.find_input_by_purpose(InputPurpose::Username);
let code = session.find_input_by_purpose(InputPurpose::VerificationCode);
let search = session.find_input_by_purpose(InputPurpose::Search);
let phone = session.find_input_by_purpose(InputPurpose::Phone);
| Purpose | Matching logic |
|---|---|
Password | input[type="password"] |
Email | input[type="email"] or name/label contains email |
Username | Text/email input with name/label containing user or login |
VerificationCode | Text/number/tel input with name/label/placeholder containing code, otp, or verify |
Search | input[type="search"], role searchbox, or name containing search |
Phone | input[type="tel"] or name/label containing phone |
find_nearest_button(input_id) -> Option<&SpatialElement>
Find the nearest submit button to a given input element. Prefers buttons below the input, scored by Manhattan distance with Y weighted 2x.
if let Some(btn) = session.find_nearest_button(19) {
    println!("Submit button: {} (id: {})", btn.text.as_deref().unwrap_or(""), btn.id);
}
Compound actions
These methods combine multiple interactions into a single call, using the page intelligence action recipes.
login(username, password) -> Result<SpatialDom, FetchError>
Detects the login form from suggested_actions, fills in credentials, and submits. Returns the resulting page.
let dom = session.goto("https://github.com/login")?;
let result = session.login("user@example.com", "hunter2")?;
Returns an error if no Login action recipe was detected on the current page.
enter_code(code) -> Result<SpatialDom, FetchError>
Fills in a verification code and submits the form, using the EnterCode action recipe.
let result = session.enter_code("847291")?;
find_verification_code() -> Option<String>
Extracts a verification code from the current page text (4-8 digit sequences near code-related keywords).
// On a page that says "Your verification code is 847291"
if let Some(code) = session.find_verification_code() {
    session.enter_code(&code)?;
}
CAPTCHA detection
is_captcha() -> bool
Returns true if the current page is classified as a CAPTCHA challenge.
if session.is_captcha() {
    println!("CAPTCHA detected -- cannot proceed automatically");
}
captcha_info() -> Option<&CaptchaInfo>
Returns CAPTCHA details if detected: captcha_type (ReCaptcha, HCaptcha, Turnstile, CloudflareChallenge, ImageGrid, TextCaptcha, Unknown) and optional sitekey.
if let Some(info) = session.captcha_info() {
    match info.captcha_type {
        CaptchaType::ReCaptcha => {
            println!("reCAPTCHA sitekey: {:?}", info.sitekey);
        }
        CaptchaType::CloudflareChallenge => {
            println!("Cloudflare challenge -- wait and retry");
        }
        _ => {}
    }
}
Web search
search(query) -> Result<Vec<SearchResult>, FetchError>
Search the web using DuckDuckGo. Returns structured results with title, URL, and snippet.
let results = session.search("rust programming language")?;
for r in &results {
    println!("{}: {} -- {}", r.title, r.url, r.snippet);
}
search_with(query, engine) -> Result<Vec<SearchResult>, FetchError>
Search with a specific engine.
use browsy_core::fetch::SearchEngine;

let results = session.search_with("browsy", SearchEngine::Google)?;
Available engines: SearchEngine::DuckDuckGo (default, most reliable) and SearchEngine::Google (may return CAPTCHAs for automated requests).
search_and_read(query, n) -> Result<Vec<SearchPage>, FetchError>
Search and fetch the top N results, returning each page's Spatial DOM alongside the search result metadata.
let pages = session.search_and_read("rust web scraping", 3)?;
for page in &pages {
    println!("{}:", page.result.title);
    if let Some(ref dom) = page.dom {
        println!("  {} elements, page_type: {:?}", dom.els.len(), dom.page_type);
    }
}
Behaviors
behaviors() -> Vec<JsBehavior>
Detects JavaScript behaviors from HTML attributes (onclick, data-toggle, data-bs-toggle, etc.). Returns trigger element IDs and inferred actions.
let behaviors = session.behaviors();
for b in &behaviors {
    println!("Element {} triggers {:?}", b.trigger_id, b.action);
}
Error handling
All fallible methods return Result<_, FetchError>. Error variants:
| Variant | Cause |
|---|---|
FetchError::InvalidUrl(msg) | URL could not be parsed |
FetchError::BlockedUrl(url) | URL matched a blocked pattern or is a private network address |
FetchError::Network(msg) | HTTP request failed (timeout, DNS, connection refused) |
FetchError::HttpError(status) | Non-2xx HTTP status code |
FetchError::ResponseTooLarge(size, max) | Response exceeded max_response_bytes |
FetchError::ActionError(msg) | Invalid interaction (element not found, wrong element type, no page loaded) |
Output Formats
browsy supports three output formats for the Spatial DOM: JSON (full fidelity), compact (minimal tokens), and delta (changes only). The choice depends on your token budget and whether you need machine-readable structure or LLM-friendly brevity.
JSON format
The full SpatialDom serialized as JSON. Every field, every element, complete fidelity.
let json = serde_json::to_string_pretty(&dom)?;
{
"url": "https://example.com",
"title": "Example",
"vp": [1920.0, 1080.0],
"scroll": [0.0, 0.0],
"page_type": "Login",
"suggested_actions": [
{
"action": "Login",
"username_id": 19,
"password_id": 21,
"submit_id": 34
}
],
"els": [
{
"id": 1,
"tag": "nav",
"role": "navigation",
"b": [0, 0, 1920, 60]
},
{
"id": 19,
"tag": "input",
"role": "textbox",
"ph": "Username or email address",
"type": "text",
"name": "login",
"label": "Username or email address",
"b": [480, 320, 960, 40]
},
{
"id": 21,
"tag": "input",
"role": "textbox",
"ph": "Password",
"type": "password",
"name": "password",
"label": "Password",
"b": [480, 380, 960, 40]
},
{
"id": 34,
"tag": "button",
"role": "button",
"text": "Sign in",
"b": [480, 440, 960, 44]
}
]
}
Optional fields (text, href, ph, val, name, label, input_type, hidden, checked, disabled, expanded, selected, required, alert_type) are omitted when absent, keeping the JSON compact. The page_type field is omitted when it is Other. The captcha field is omitted when no CAPTCHA is detected.
Use JSON when you need programmatic access to the full DOM structure, or when feeding the output to code rather than an LLM.
Compact format
A one-line-per-element text format designed for minimal token usage. This is the default output format in the MCP server and CLI.
use browsy_core::output::to_compact_string;

let compact = to_compact_string(&dom);
Each element is rendered as a bracketed line:
[id:tag "text" ->href]
Full example output:
[1:nav]
[5:h1 "Welcome"]
[19:input [login] "Username or email address" wide]
[21:input:password [password] "Password" wide]
[!25:a "Forgot password?" ->/reset]
[34:button "Sign in" wide]
[40:a "Create an account" ->/signup @bot]
Compact format rules
Basic structure: [id:tag ...] where id is the numeric element ID and tag is the HTML tag.
Input types: Non-text input types are appended after the tag: [21:input:password ...], [30:input:checkbox ...], [35:input:email ...]. Plain text inputs omit the type suffix.
Text content: Quoted strings show the element's text or placeholder: "Sign in", "Enter your email".
Links: Destinations shown with ->: [12:a "About" ->/about].
Form field names: Shown in square brackets: [login], [password], [email].
Checked state: [v] indicates a checked checkbox or radio button.
Required state: [*] indicates a required field.
Current value: [=value] shows the current value of a form field.
Hidden elements: Prefixed with ! to distinguish from visible elements: [!25:a "Forgot password?"].
Size hints: Form elements (input, button, textarea, select) include a width classification relative to viewport:
| Hint | Meaning |
|---|---|
narrow | Width < 15% of viewport |
wide | Width > 50% of viewport |
full | Width > 90% of viewport |
No hint is shown for elements between 15-50% of viewport width.
Position disambiguation: When multiple elements share the same (tag, text) tuple, a position tag is appended to disambiguate: @top-L, @top, @top-R, @mid-L, @mid, @mid-R, @bot-L, @bot, @bot-R, or @below (below the fold). Position tags are only added when needed -- unique elements have no position suffix.
The viewport is divided into a 3x3 grid for classification:
+--------+--------+--------+
| top-L | top | top-R |
+--------+--------+--------+
| mid-L | mid | mid-R |
+--------+--------+--------+
| bot-L | bot | bot-R |
+--------+--------+--------+
Compact format header
When served through the MCP server or CLI, compact output includes a metadata header:
title: GitHub Login
url: https://github.com/login
els: 47
---
[1:nav]
[5:h1 "Sign in to GitHub"]
...
Delta format
After the first page load, subsequent navigations can use delta output -- only the elements that changed. This dramatically reduces token usage for multi-step workflows.
use browsy_core::output::{diff, delta_to_compact_string};

let delta = diff(&old_dom, &new_dom);
let compact_delta = delta_to_compact_string(&delta);
The DeltaDom struct contains:
pub struct DeltaDom {
    pub changed: Vec<SpatialElement>, // Added or modified elements
    pub removed: Vec<u32>,            // IDs of removed elements
    pub vp: [f32; 2],                 // Viewport for size hints
}
Compact delta format uses + for added/changed elements and - for removed IDs:
-[3,7,12,15]
[+19:input "Search" wide]
[+20:button "Go"]
[+21:h2 "Results"]
[+22:a "First result" ->https://example.com]
Matching between old and new elements is done by content similarity (tag + text + placeholder + href + input type + bounds), not by ID. IDs are assigned sequentially and may differ between page loads.
Using delta in the Session API
let mut session = Session::new()?;
session.goto("https://example.com")?;
session.goto("https://example.com/about")?;

if let Some(delta) = session.delta() {
    let output = delta_to_compact_string(&delta);
    println!("{}", output);
}
Token comparison
Compact format uses approximately 58 characters per element on average, compared to 96-157 characters for JSON and accessibility-tree-based competitors. On a typical page with 80 elements:
| Format | Approximate tokens |
|---|---|
| Compact | ~1,200 |
| JSON | ~2,500 |
| Raw accessibility tree | ~4,000+ |
Delta format reduces this further on subsequent pages -- a navigation that changes 15 elements and removes 10 produces roughly 200 tokens instead of re-sending the full 1,200.
Choosing a format
| Scenario | Format |
|---|---|
| Programmatic consumption (code, not LLM) | JSON |
| LLM agent with normal context | Compact |
| LLM agent with tight token budget | Compact + filter_above_fold() |
| Multi-step browsing workflow | Compact for first page, delta for subsequent |
| Debugging / inspection | JSON |
MCP Server (Claude Code)
browsy runs as a Model Context Protocol (MCP) server, exposing its browser engine as tools that Claude Code (or any MCP client) can call directly.
Starting the server
browsy mcp
This launches browsy as a stdio-based MCP server. It creates a single persistent Session with cookie jar, navigation history, and form state.
Claude Code configuration
Add browsy to your claude_desktop_config.json:
{
"mcpServers": {
"browsy": {
"command": "browsy",
"args": ["mcp"]
}
}
}
The server advertises itself as browsy-mcp and exposes 14 tools.
Available tools
browse
Navigate to a URL and return the page content.
| Parameter | Type | Required | Description |
|---|---|---|---|
url | string | yes | URL to navigate to |
format | string | no | "compact" (default) or "json" |
scope | string | no | "all" (default), "visible", "above_fold", or "visible_above_fold" |
Returns the full Spatial DOM. In compact format, the output begins with a header block:
title: Example Domain
url: https://example.com
els: 12
---
[1:h1 "Example Domain"]
[2:p "This domain is for use in illustrative examples..."]
[3:a "More information..." ->https://www.iana.org/domains/example]
If a CAPTCHA is detected, a warning is prepended to the output:
CAPTCHA detected (ReCaptcha) -- this page requires human verification to proceed.
click
Click an element by its ID. Links navigate to new pages, buttons submit forms.
| Parameter | Type | Required | Description |
|---|---|---|---|
id | u32 | yes | Element ID to click |
Returns the resulting page DOM. Link clicks trigger navigation (fetching the href). Button clicks submit the enclosing form with all typed values and checked states. If a CAPTCHA is detected on the resulting page, a warning is included.
type_text
Type text into an input field or textarea by element ID.
| Parameter | Type | Required | Description |
|---|---|---|---|
id | u32 | yes | Element ID of the text input |
text | string | yes | Text to type into the input |
This stores the value in session state. The value is included in form submissions and reflected in subsequent get_page calls. Only works on <input> and <textarea> elements.
check
Check a checkbox or radio button by element ID.
| Parameter | Type | Required | Description |
|---|---|---|---|
id | u32 | yes | Element ID of the checkbox or radio button |
uncheck
Uncheck a checkbox or radio button by element ID.
| Parameter | Type | Required | Description |
|---|---|---|---|
id | u32 | yes | Element ID of the checkbox or radio button |
select
Select an option in a dropdown/select element.
| Parameter | Type | Required | Description |
|---|---|---|---|
id | u32 | yes | Element ID of the select element |
value | string | yes | Value to select |
get_page
Get the current page DOM with form state overlaid. Use after type_text, check, select, or uncheck to see the updated form values without re-fetching.
| Parameter | Type | Required | Description |
|---|---|---|---|
format | string | no | "compact" (default) or "json" |
scope | string | no | "all" (default), "visible", "above_fold", or "visible_above_fold" |
search
Search the web and return structured results with title, URL, and snippet.
| Parameter | Type | Required | Description |
|---|---|---|---|
query | string | yes | Search query |
engine | string | no | "duckduckgo" (default) or "google" |
Returns a JSON array of search results, each with title, url, and snippet fields.
back
Go back to the previous page in browsing history. No parameters. Returns the previous page's DOM.
login
Fill in a detected login form and submit it. Requires a page with a Login suggested action.
| Parameter | Type | Required | Description |
|---|---|---|---|
username | string | yes | Username or email |
password | string | yes | Password |
This is a compound action: it types the username into the detected username field, types the password into the password field, and clicks the submit button. Returns the resulting page DOM.
enter_code
Enter a verification or 2FA code into the detected code input field. Requires a page with an EnterCode suggested action.
| Parameter | Type | Required | Description |
|---|---|---|---|
code | string | yes | Verification or 2FA code |
Types the code into the detected input and clicks submit. Returns the resulting page DOM.
find
Find elements on the current page by text content or ARIA role.
| Parameter | Type | Required | Description |
|---|---|---|---|
text | string | no | Find elements containing this text |
role | string | no | Find elements with this ARIA role |
At least one of text or role must be provided. Returns a JSON array of matching elements.
tables
Extract structured table data from the current page. No parameters. Returns a JSON array of tables, each with headers (string array) and rows (array of string arrays).
page_info
Get page metadata without the full element list. No parameters. Returns:
{
"title": "Sign In - Example",
"url": "https://example.com/login",
"page_type": "Login",
"suggested_actions": [
{
"action": "Login",
"username_id": 5,
"password_id": 8,
"submit_id": 12
}
],
"alerts": [],
"pagination": null
}
When a CAPTCHA is detected, the response includes a captcha field with captcha_type and optional sitekey.
Example conversation flow
A typical agent interaction with a login-protected site:
- browse https://app.example.com -- page_type is Login, suggested_actions includes Login with field IDs.
- login with username and password -- the agent calls login directly, which fills and submits the form.
- The result page might be TwoFactorAuth with an EnterCode action.
- enter_code with the 2FA code -- fills the code input and submits.
- The result page is now Dashboard -- the agent can proceed with its task.
For pages without compound actions, the lower-level tools work:
- browse the URL.
- type_text to fill form fields by ID.
- check or select for checkboxes and dropdowns.
- get_page to verify the form state looks correct.
- click the submit button to submit.
CAPTCHA warnings
When browse or click returns a page detected as Captcha, a warning line is prepended to the output:
CAPTCHA detected (HCaptcha) -- this page requires human verification to proceed.
The page_info tool also surfaces CAPTCHA details in a structured captcha field. browsy cannot solve CAPTCHAs -- it detects and classifies them so the agent can decide how to proceed (request human help, use a third-party solver, or try a different approach).
Output format
In compact mode (the default), elements are rendered as:
[id:tag "text"]
With additional annotations:
- !id:tag -- hidden element (display:none, visibility:hidden, aria-hidden, or hidden attribute)
- [name] -- HTML name attribute
- [v] -- checked checkbox/radio
- [*] -- required field
- [=value] -- current value
- ->url -- href target
- narrow / wide / full -- size hint for form elements
- @top-L / @mid / @bot-R -- position hint (only shown to disambiguate duplicate elements)
REST API
browsy includes a built-in HTTP server that exposes the full Session API as REST endpoints. This is the primary integration point for non-Rust, non-Python, and non-MCP clients.
Starting the server
browsy serve --port 3847
The server listens on http://localhost:3847 by default. See CLI Usage for all flags.
Session management
The server manages multiple concurrent browsing sessions. Each session has its own cookie jar, navigation history, and form state.
Sessions are identified by the X-Browsy-Session header:
| Scenario | Behavior |
|---|---|
No X-Browsy-Session header | Server creates a new session and returns the token in the response header |
| Valid token in header | Existing session is reused |
| Invalid or expired token | Server creates a new session and returns the new token |
| Session idle > 30 minutes | Session expires and is cleaned up |
| Server at capacity (default: 100 sessions) | Returns 503 Service Unavailable |
Every response includes the X-Browsy-Session header. Clients should capture it from the first response and include it in all subsequent requests.
# First request -- capture the session token
TOKEN=$(curl -s -D- -o /dev/null http://localhost:3847/api/browse \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}' | grep -i x-browsy-session | tr -d '\r' | cut -d' ' -f2)
# Subsequent requests -- reuse the session
curl http://localhost:3847/api/page-info -H "X-Browsy-Session: $TOKEN"
CORS
The server sends CORS headers on all responses:
- Access-Control-Allow-Origin: *
- Access-Control-Allow-Headers: Content-Type, X-Browsy-Session
- Access-Control-Expose-Headers: X-Browsy-Session
This allows browser-based clients to call the API directly.
Endpoint reference
| Method | Path | Description |
|---|---|---|
POST | /api/browse | Navigate to a URL |
POST | /api/click | Click an element by ID |
POST | /api/type | Type text into an input |
POST | /api/check | Check a checkbox or radio |
POST | /api/uncheck | Uncheck a checkbox or radio |
POST | /api/select | Select a dropdown option |
POST | /api/search | Web search |
POST | /api/login | Fill and submit a login form |
POST | /api/enter-code | Enter a verification code |
POST | /api/find | Find elements by text or role |
POST | /api/back | Go back in history |
GET | /api/page | Get current page DOM |
GET | /api/page-info | Get page metadata |
GET | /api/tables | Extract table data |
GET | /health | Health check |
All POST endpoints accept Content-Type: application/json.
Endpoints
POST /api/browse
Navigate to a URL and return the Spatial DOM.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
url | string | yes | URL to navigate to |
format | string | no | "compact" (default) or "json" |
scope | string | no | "all" (default), "visible", "above_fold", or "visible_above_fold" |
curl http://localhost:3847/api/browse \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
Response: The Spatial DOM in the requested format. Compact format returns plain text; JSON format returns the full structured DOM.
# JSON format with only visible elements
curl http://localhost:3847/api/browse \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "format": "json", "scope": "visible"}'
POST /api/click
Click an element by its ID. Links navigate to new pages; buttons submit forms.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
id | integer | yes | Element ID to click |
curl http://localhost:3847/api/click \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"id": 3}'
Response: The resulting page DOM (after navigation or form submission).
POST /api/type
Type text into an input field or textarea.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
id | integer | yes | Element ID of the text input |
text | string | yes | Text to type |
curl http://localhost:3847/api/type \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"id": 5, "text": "user@example.com"}'
Response: Confirmation. Use GET /api/page to see the updated form state.
POST /api/check
Check a checkbox or radio button.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
id | integer | yes | Element ID |
curl http://localhost:3847/api/check \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"id": 10}'
POST /api/uncheck
Uncheck a checkbox or radio button.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
id | integer | yes | Element ID |
curl http://localhost:3847/api/uncheck \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"id": 10}'
POST /api/select
Select an option in a dropdown.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
id | integer | yes | Element ID of the select element |
value | string | yes | Value to select |
curl http://localhost:3847/api/select \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"id": 12, "value": "en-US"}'
POST /api/search
Search the web and return structured results.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
query | string | yes | Search query |
engine | string | no | "duckduckgo" (default) or "google" |
curl http://localhost:3847/api/search \
-H "Content-Type: application/json" \
-d '{"query": "rust web framework"}'
Response:
[
{
"title": "Actix Web - Rust Web Framework",
"url": "https://actix.rs",
"snippet": "A powerful, pragmatic, and fast web framework for Rust."
}
]
POST /api/login
Fill and submit a detected login form. Requires a page with a Login suggested action loaded in the session.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
username | string | yes | Username or email |
password | string | yes | Password |
# First navigate to the login page
curl http://localhost:3847/api/browse \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"url": "https://app.example.com/login"}'
# Then submit credentials
curl http://localhost:3847/api/login \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"username": "user@example.com", "password": "secretpassword"}'
Response: The resulting page DOM after login submission.
POST /api/enter-code
Enter a verification or 2FA code. Requires a page with an EnterCode suggested action.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
code | string | yes | Verification or 2FA code |
curl http://localhost:3847/api/enter-code \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"code": "847291"}'
Response: The resulting page DOM after code submission.
POST /api/find
Find elements on the current page by text content or ARIA role.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
text | string | no | Find elements containing this text |
role | string | no | Find elements with this ARIA role |
At least one of text or role must be provided.
# Find by text
curl http://localhost:3847/api/find \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"text": "Sign In"}'
# Find by role
curl http://localhost:3847/api/find \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"role": "button"}'
Response: JSON array of matching elements.
POST /api/back
Go back to the previous page in browsing history. No request body required.
curl -X POST http://localhost:3847/api/back \
-H "X-Browsy-Session: $TOKEN"
Response: The previous page's DOM.
GET /api/page
Get the current page DOM with form state overlaid. Use after type, check, select, or uncheck to see updated form values without re-fetching.
Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
scope | string | no | "all" (default), "visible", "above_fold", or "visible_above_fold" |
format | string | no | "compact" (default) or "json" |
curl "http://localhost:3847/api/page?format=json&scope=visible" \
-H "X-Browsy-Session: $TOKEN"
GET /api/page-info
Get page metadata without the full element list. No parameters.
curl http://localhost:3847/api/page-info \
-H "X-Browsy-Session: $TOKEN"
Response:
{
"title": "Sign In - Example",
"url": "https://example.com/login",
"page_type": "Login",
"suggested_actions": [
{
"action": "Login",
"username_id": 5,
"password_id": 8,
"submit_id": 12
}
],
"alerts": [],
"pagination": null
}
GET /api/tables
Extract structured table data from the current page. No parameters.
curl http://localhost:3847/api/tables \
-H "X-Browsy-Session: $TOKEN"
Response:
[
{
"headers": ["Name", "Price", "Stock"],
"rows": [
["Widget A", "$9.99", "In stock"],
["Widget B", "$14.99", "Out of stock"]
]
}
]
GET /health
Health check endpoint. No session required.
curl http://localhost:3847/health
Response:
{
"status": "ok"
}
Scopes
The scope parameter controls which elements are included in the output:
| Scope | Description |
|---|---|
all | All elements including hidden ones (default) |
visible | Only non-hidden elements |
above_fold | Only elements with top edge within the viewport height |
visible_above_fold | Non-hidden elements above the fold |
Output formats
The format parameter controls the response format:
| Format | Content-Type | Description |
|---|---|---|
compact | text/plain | Minimal token-efficient text format (default) |
json | application/json | Full structured Spatial DOM |
See Output Formats for details on both formats.
Error responses
Errors return JSON with an error field:
{
"error": "Element 999 not found"
}
| Status | Cause |
|---|---|
400 | Invalid request body or parameters |
404 | Element not found, no page loaded, or no matching action |
503 | Server at session capacity |
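As a sketch, a client can branch on the status code and surface the error field; Python requests is assumed, and the element ID is deliberately bogus to trigger a 404:
import requests

token = "..."  # captured from an earlier X-Browsy-Session response header

resp = requests.post(
    "http://localhost:3847/api/click",
    json={"id": 999},  # nonexistent element -> 404 with an "error" field
    headers={"X-Browsy-Session": token},
)
if resp.status_code != 200:
    print(resp.status_code, resp.json().get("error"))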
Example: complete login flow
# Start the server
browsy serve --port 3847 &
# Browse to login page (captures session token)
TOKEN=$(curl -s -D- http://localhost:3847/api/browse \
-H "Content-Type: application/json" \
-d '{"url": "https://app.example.com/login"}' \
| grep -i x-browsy-session | tr -d '\r' | cut -d' ' -f2)
# Check page type
curl -s http://localhost:3847/api/page-info \
-H "X-Browsy-Session: $TOKEN" | jq .page_type
# "Login"
# Submit credentials
curl -s http://localhost:3847/api/login \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"username": "user@example.com", "password": "secret"}'
# Check if 2FA is needed
curl -s http://localhost:3847/api/page-info \
-H "X-Browsy-Session: $TOKEN" | jq .page_type
# "TwoFactorAuth"
# Enter 2FA code
curl -s http://localhost:3847/api/enter-code \
-H "Content-Type: application/json" \
-H "X-Browsy-Session: $TOKEN" \
-d '{"code": "847291"}'
# Now on the dashboard -- extract tables
curl -s http://localhost:3847/api/tables \
-H "X-Browsy-Session: $TOKEN" | jq .
A2A Protocol
browsy implements Google's Agent-to-Agent (A2A) protocol, enabling agent discovery and task delegation over HTTP. Any A2A-compatible agent can discover browsy's capabilities and delegate web browsing tasks to it.
Overview
A2A is a standard for agents to find and communicate with each other. browsy's A2A support consists of two parts:
- Agent card -- a JSON manifest at a well-known URL describing browsy's capabilities.
- Task execution -- an endpoint that accepts goals, executes them as browsing tasks, and streams status events back via SSE.
Both are served automatically by browsy serve.
browsy serve --port 3847
Agent card
The agent card is served at GET /.well-known/agent.json and describes browsy's identity and capabilities.
curl http://localhost:3847/.well-known/agent.json
Response:
{
"name": "browsy",
"description": "Zero-render browser engine for AI agents. Navigates, extracts, and interacts with web pages without rendering pixels.",
"url": "http://localhost:3847",
"version": "1.0",
"capabilities": {
"streaming": true,
"pushNotifications": false
},
"skills": [
{
"id": "web-browse",
"name": "Web Browsing",
"description": "Navigate to URLs, interact with pages, extract content, fill forms, and search the web.",
"tags": ["browse", "scrape", "extract", "search", "login", "forms"]
}
]
}
Agents discover browsy by fetching this card and inspecting the skills array. The streaming: true capability indicates that task responses are delivered as Server-Sent Events (SSE).
Task execution
POST /a2a/tasks
Submit a task for browsy to execute. The response is an SSE event stream with status updates.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
goal | string | yes | Natural language description of the task |
params | object | no | Structured parameters (see below) |
Params fields:
| Field | Type | Description |
|---|---|---|
url | string | Target URL to browse |
credentials | object | { "username": "...", "password": "..." } for login tasks |
search_query | string | Query string for search tasks |
extract | string | What to extract from the page (e.g., "tables", "links", "text") |
browsy infers the task intent from the goal text and params fields. Explicit params take priority over goal parsing.
Intent detection
browsy maps each task to one of these intents:
| Intent | Trigger | Behavior |
|---|---|---|
Search | search_query param, or goal contains "search" | Performs a web search, returns results |
Login | credentials param, or goal contains "login"/"sign in" | Navigates to URL, fills login form, submits |
Extract | extract param (not "tables"), or goal contains "extract"/"scrape" | Navigates to URL, returns page content |
ExtractTables | extract: "tables", or goal contains "table" | Navigates to URL, extracts structured table data |
FillForm | Goal contains "fill"/"form"/"submit" | Navigates to URL, interacts with form elements |
Browse | Default fallback | Navigates to URL, returns the Spatial DOM |
SSE event stream
The response uses Content-Type: text/event-stream. Each event is a JSON object with the following structure:
data: {"id":"task_abc123","status":"working","steps":[{"description":"Navigating to https://example.com"}]}
data: {"id":"task_abc123","status":"completed","steps":[{"description":"Navigating to https://example.com"},{"description":"Page loaded: Example Domain (3 elements)"}],"result":{"page_type":"Other","title":"Example Domain","elements":3}}
Event fields:
| Field | Type | Description |
|---|---|---|
id | string | Unique task identifier |
status | string | "working", "completed", or "failed" |
steps | array | List of { "description": "..." } objects showing progress |
result | object | Present when status is "completed". Contains extracted data |
error | string | Present when status is "failed". Describes what went wrong |
The stream always ends with a terminal event ("completed" or "failed").
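A minimal consumer sketch in Python (requests is an assumption; the event fields follow the table above):
import json
import requests

resp = requests.post(
    "http://localhost:3847/a2a/tasks",
    json={"goal": "Browse the Hacker News front page",
          "params": {"url": "https://news.ycombinator.com"}},
    stream=True,
)

for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data: "):
        continue  # skip blank separator lines between events
    event = json.loads(line[len("data: "):])
    if event["steps"]:
        print(event["status"], "-", event["steps"][-1]["description"])
    if event["status"] in ("completed", "failed"):
        print(event.get("result") or event.get("error"))
        break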
Examples
Browse a page
curl -N http://localhost:3847/a2a/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Browse the Hacker News front page",
"params": { "url": "https://news.ycombinator.com" }
}'
Event stream:
data: {"id":"task_1","status":"working","steps":[{"description":"Navigating to https://news.ycombinator.com"}]}
data: {"id":"task_1","status":"completed","steps":[{"description":"Navigating to https://news.ycombinator.com"},{"description":"Page loaded: Hacker News (120 elements)"}],"result":{"page_type":"List","title":"Hacker News","elements":120}}
Search the web
curl -N http://localhost:3847/a2a/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Search for Rust web frameworks",
"params": { "search_query": "rust web framework 2026" }
}'
Login to a site
curl -N http://localhost:3847/a2a/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Login to the application",
"params": {
"url": "https://app.example.com/login",
"credentials": { "username": "user@example.com", "password": "secret" }
}
}'
Event stream:
data: {"id":"task_3","status":"working","steps":[{"description":"Navigating to https://app.example.com/login"}]}
data: {"id":"task_3","status":"working","steps":[{"description":"Navigating to https://app.example.com/login"},{"description":"Login page detected, submitting credentials"}]}
data: {"id":"task_3","status":"completed","steps":[{"description":"Navigating to https://app.example.com/login"},{"description":"Login page detected, submitting credentials"},{"description":"Login successful, redirected to Dashboard"}],"result":{"page_type":"Dashboard","title":"Dashboard - App"}}
Extract table data
curl -N http://localhost:3847/a2a/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Extract the pricing table",
"params": {
"url": "https://example.com/pricing",
"extract": "tables"
}
}'
Extract page content
curl -N http://localhost:3847/a2a/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Extract the main article text",
"params": {
"url": "https://example.com/blog/post",
"extract": "text"
}
}'
Fill a form
curl -N http://localhost:3847/a2a/tasks \
-H "Content-Type: application/json" \
-d '{
"goal": "Fill out the contact form with name John and email john@example.com",
"params": { "url": "https://example.com/contact" }
}'
Task status polling
A stub endpoint exists for polling task status by ID:
GET /a2a/tasks/{task_id}
curl http://localhost:3847/a2a/tasks/task_abc123
This returns the last known state of the task. Since tasks execute synchronously over SSE, polling is primarily useful for checking whether a task completed after a disconnection.
Error handling
When a task fails, the final SSE event includes an error field:
data: {"id":"task_5","status":"failed","steps":[{"description":"Navigating to https://invalid.example"}],"error":"Network error: DNS resolution failed"}
Common failure causes:
| Error | Cause |
|---|---|
| Network error | DNS failure, connection refused, timeout |
| CAPTCHA detected | Target page requires human verification |
| No login form found | Login intent but page has no detected login action |
| Element not found | Form interaction referenced a nonexistent element |
Framework Integrations
browsy provides native integrations for popular AI/agent frameworks in both Python and JavaScript/TypeScript. Each integration wraps browsy as framework-compatible tools, so agents can browse the web using their native tool-calling patterns.
JavaScript / TypeScript
The browsy-ai npm package provides integrations for LangChain.js, OpenAI, and Vercel AI SDK. Install the core package and whichever framework you use:
npm install browsy-ai # Core SDK
npm install browsy-ai @langchain/core # + LangChain.js
npm install browsy-ai openai # + OpenAI
npm install browsy-ai ai # + Vercel AI SDK
LangChain.js
import { getTools } from "browsy-ai/langchain";
const tools = getTools(); // -> 14 LangChain tool instances
OpenAI function calling
import { getToolDefinitions, handleToolCall } from "browsy-ai/openai";
const tools = getToolDefinitions();
const result = await handleToolCall("browsy_browse", { url: "https://example.com" });
Vercel AI SDK
import { browsyTools } from "browsy-ai/vercel-ai";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
const result = await generateText({
model: openai("gpt-4o"),
tools: browsyTools(),
prompt: "Go to example.com and summarize it",
maxSteps: 10,
});
See the full JavaScript / TypeScript guide for complete examples and API reference.
Python
Install browsy with the extras for your framework:
pip install browsy-ai[langchain] # LangChain tools
pip install browsy-ai[crewai] # CrewAI tool
pip install browsy-ai[openai] # OpenAI function calling
pip install browsy-ai[autogen] # AutoGen integration
pip install browsy-ai[smolagents] # HuggingFace smolagents
pip install browsy-ai[all] # All integrations
All Python integrations share a lazily-initialized Browser instance. You can pass your own Browser for custom viewport configuration.
LangChain
The LangChain integration provides individual tools that plug directly into LangChain agents and chains.
from browsy.langchain import get_tools
Available tools
| Tool class | Description |
|---|---|
BrowsyBrowseTool | Navigate to a URL, returns Spatial DOM |
BrowsyClickTool | Click an element by ID |
BrowsyTypeTextTool | Type text into an input field |
BrowsySearchTool | Web search via DuckDuckGo or Google |
BrowsyLoginTool | Fill and submit a login form |
BrowsyPageInfoTool | Get page metadata and suggested actions |
Quick start
from browsy.langchain import get_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
llm = ChatOpenAI(model="gpt-4o")
tools = get_tools()
agent = create_react_agent(llm, tools)
result = agent.invoke({
"messages": [{"role": "user", "content": "Go to news.ycombinator.com and list the top 5 stories"}]
})
Custom browser
Pass a Browser instance to control viewport size or other settings:
from browsy import Browser
from browsy.langchain import get_tools
browser = Browser(viewport_width=375, viewport_height=812)
tools = get_tools(browser=browser)
Using individual tools
from browsy.langchain import BrowsyBrowseTool, BrowsyClickTool
browse = BrowsyBrowseTool()
page = browse.invoke({"url": "https://example.com"})
click = BrowsyClickTool()
result = click.invoke({"id": 3})
CrewAI
The CrewAI integration wraps all browsy actions into a single tool that CrewAI agents can call.
from browsy.crewai import BrowsyTool
Quick start
from browsy.crewai import BrowsyTool
from crewai import Agent, Task, Crew
browsy_tool = BrowsyTool()
researcher = Agent(
role="Web Researcher",
goal="Find and summarize information from web pages",
backstory="You are an expert at navigating websites and extracting key information.",
tools=[browsy_tool],
verbose=True,
)
task = Task(
description="Go to https://news.ycombinator.com and summarize the top 3 stories.",
expected_output="A summary of the top 3 Hacker News stories with titles and URLs.",
agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
print(result)
Tool actions
The BrowsyTool accepts a JSON string with an action field and action-specific parameters:
# Browse
browsy_tool.run('{"action": "browse", "url": "https://example.com"}')
# Click
browsy_tool.run('{"action": "click", "id": 3}')
# Type
browsy_tool.run('{"action": "type", "id": 5, "text": "hello"}')
# Search
browsy_tool.run('{"action": "search", "query": "rust web framework"}')
# Login
browsy_tool.run('{"action": "login", "username": "user@example.com", "password": "secret"}')
# Page info
browsy_tool.run('{"action": "page_info"}')
OpenAI function calling
The OpenAI integration provides tool definitions compatible with the OpenAI Chat Completions API and a dispatcher to handle tool calls.
from browsy.openai import get_tool_definitions, handle_tool_call
Tool definitions
get_tool_definitions() returns a list of OpenAI-compatible tool schemas:
from browsy.openai import get_tool_definitions
tools = get_tool_definitions()
# Returns list of {"type": "function", "function": {"name": ..., "parameters": ...}}
Handling tool calls
handle_tool_call(name, args) dispatches a tool call to browsy and returns the result as a string:
from browsy.openai import handle_tool_call
result = handle_tool_call("browsy_browse", {"url": "https://example.com"})
Complete example
import json
from openai import OpenAI
from browsy.openai import get_tool_definitions, handle_tool_call
client = OpenAI()
tools = get_tool_definitions()
messages = [
{"role": "user", "content": "Go to example.com and tell me what's on the page."}
]
# Initial request
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
)
# Tool call loop
while response.choices[0].message.tool_calls:
msg = response.choices[0].message
messages.append(msg)
for tool_call in msg.tool_calls:
args = json.loads(tool_call.function.arguments)
result = handle_tool_call(tool_call.function.name, args)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result,
})
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
)
print(response.choices[0].message.content)
Available functions
| Function name | Parameters | Description |
|---|---|---|
browsy_browse | url, format?, scope? | Navigate to a URL |
browsy_click | id | Click an element |
browsy_type_text | id, text | Type into an input |
browsy_search | query, engine? | Web search |
browsy_login | username, password | Login to a site |
browsy_page_info | (none) | Get page metadata |
AutoGen
The AutoGen integration provides a BrowsyBrowser class compatible with Microsoft AutoGen's ConversableAgent.
from browsy.autogen import BrowsyBrowser
Quick start
from browsy.autogen import BrowsyBrowser
from autogen import ConversableAgent, UserProxyAgent
browser = BrowsyBrowser()
assistant = ConversableAgent(
name="web_assistant",
system_message="You help users browse the web and extract information.",
llm_config={"config_list": [{"model": "gpt-4o"}]},
)
# Register browsy tools with the agent
browser.register(assistant)
user = UserProxyAgent(
name="user",
human_input_mode="NEVER",
code_execution_config=False,
)
browser.register(user)
user.initiate_chat(
assistant,
message="Go to https://example.com and describe what you see.",
)
Custom browser
from browsy import Browser
from browsy.autogen import BrowsyBrowser
custom = Browser(viewport_width=1366, viewport_height=768)
browser = BrowsyBrowser(browser=custom)
Smolagents
The smolagents integration provides a tool compatible with HuggingFace's smolagents framework.
from browsy.smolagents import BrowsyTool
Quick start
from browsy.smolagents import BrowsyTool
from smolagents import CodeAgent, HfApiModel
tool = BrowsyTool()
agent = CodeAgent(
tools=[tool],
model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
)
result = agent.run("Go to https://example.com and extract the main heading text.")
print(result)
Custom browser
from browsy import Browser
from browsy.smolagents import BrowsyTool
browser = Browser(viewport_width=1920, viewport_height=1080)
tool = BrowsyTool(browser=browser)
OpenClaw / SimpleClaw
The openclaw-browsy plugin integrates browsy as a first-class tool in OpenClaw and compatible frameworks like SimpleClaw. Unlike the Python integrations above, this is a TypeScript/Node.js plugin that manages its own browsy server process.
npm install openclaw-browsy
import { register } from "openclaw-browsy";
export default { register };
The plugin auto-starts a browsy serve process and injects 14 browsing tools into every agent. It can also intercept built-in Playwright browser tools for a transparent speed upgrade.
See the full OpenClaw / SimpleClaw integration guide for configuration, standalone usage, and custom orchestrator support.
Shared Browser instance
All integrations lazily initialize a Browser instance with default settings (1920x1080 viewport) if none is provided. The Browser instance is shared across all tool calls within the same integration, maintaining session state (cookies, history, form values) across interactions.
To share a single Browser across multiple integrations:
from browsy import Browser
from browsy.langchain import get_tools as get_langchain_tools
from browsy.openai import get_tool_definitions
browser = Browser(viewport_width=1920, viewport_height=1080)
# Both use the same session
langchain_tools = get_langchain_tools(browser=browser)
openai_tools = get_tool_definitions(browser=browser)
JavaScript / TypeScript
The browsy-ai npm package provides a TypeScript SDK for the browsy REST API, plus ready-made integrations for LangChain.js, OpenAI, and Vercel AI SDK.
Installation
npm install browsy-ai
The package uses ESM and requires Node.js 22+. Framework dependencies are optional peer dependencies — install only what you need.
Core SDK
The core SDK manages the browsy server process, HTTP communication, and per-agent session isolation.
import { BrowsyClient, BrowsyContext, ServerManager } from "browsy-ai";
BrowsyContext
The simplest way to use browsy. BrowsyContext is a facade that coordinates the client, server manager, and session manager.
import { BrowsyContext } from "browsy-ai";
const ctx = new BrowsyContext({ port: 3847 });
// Execute tool calls — server auto-starts, sessions auto-managed
const page = await ctx.executeToolCall("browse", { url: "https://example.com" });
console.log(page);
const info = await ctx.executeToolCall("pageInfo", {});
console.log(info);
BrowsyClient
Lower-level HTTP client for direct API calls. Use this when you manage the server and sessions yourself.
import { BrowsyClient } from "browsy-ai";
const client = new BrowsyClient(3847);
// Navigate
const res = await client.browse({ url: "https://example.com" });
console.log(res.body);
// Interact using the session from the response
await client.typeText({ id: 5, text: "hello" }, res.session);
await client.click({ id: 12 }, res.session);
// Extract data
const tables = await client.tables(res.session);
const info = await client.pageInfo(res.session);
Configuration
import { BrowsyContext } from "browsy-ai";
const ctx = new BrowsyContext({
port: 3847, // REST server port (default: 3847)
autoStart: true, // Auto-start browsy serve (default: true)
allowPrivateNetwork: false, // Allow private network URLs (default: false)
serverTimeout: 10_000, // Startup timeout in ms (default: 10000)
});
When autoStart is true, the SDK finds the browsy binary in your PATH (or via the BROWSY_BIN environment variable) and spawns browsy serve --port <port>.
Session isolation
Each agent gets its own isolated browsing session with independent cookies, history, and form state:
const ctx = new BrowsyContext();
// Different agents get different sessions
const page1 = await ctx.executeToolCall("browse", { url: "https://a.com" }, "agent-1");
const page2 = await ctx.executeToolCall("browse", { url: "https://b.com" }, "agent-2");
LangChain.js
npm install browsy-ai @langchain/core
import { getTools } from "browsy-ai/langchain";
Quick start
import { getTools } from "browsy-ai/langchain";
import { ChatOpenAI } from "@langchain/openai";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
const tools = getTools({ port: 3847 });
const llm = new ChatOpenAI({ model: "gpt-4o" });
const agent = createReactAgent({ llm, tools });
const result = await agent.invoke({
messages: [{ role: "user", content: "Go to news.ycombinator.com and list the top 5 stories" }],
});
Custom context
Pass a BrowsyContext for full control:
import { BrowsyContext } from "browsy-ai";
import { getTools } from "browsy-ai/langchain";
const ctx = new BrowsyContext({ port: 9000, autoStart: false });
const tools = getTools(ctx);
Available tools
getTools() returns 14 LangChain tool instances:
| Tool name | Parameters | Description |
|---|---|---|
browsy_browse | url, format?, scope? | Navigate to a URL |
browsy_click | id | Click an element by ID |
browsy_type_text | id, text | Type into an input field |
browsy_check | id | Check a checkbox/radio |
browsy_uncheck | id | Uncheck a checkbox/radio |
browsy_select | id, value | Select a dropdown option |
browsy_search | query, engine? | Web search |
browsy_login | username, password | Log in using detected form |
browsy_enter_code | code | Enter 2FA/verification code |
browsy_find | text?, role? | Find elements by text or role |
browsy_get_page | format?, scope? | Get current page with form state |
browsy_page_info | — | Page metadata and suggested actions |
browsy_tables | — | Extract structured table data |
browsy_back | — | Go back in history |
OpenAI
npm install browsy-ai openai
import { getToolDefinitions, handleToolCall } from "browsy-ai/openai";
Quick start
import OpenAI from "openai";
import { getToolDefinitions, handleToolCall, createToolCallHandler } from "browsy-ai/openai";
const client = new OpenAI();
const tools = getToolDefinitions();
const messages = [
{ role: "user" as const, content: "Go to example.com and tell me what's there." },
];
let response = await client.chat.completions.create({
model: "gpt-4o",
messages,
tools,
});
// Tool call loop
while (response.choices[0].message.tool_calls?.length) {
const msg = response.choices[0].message;
messages.push(msg);
for (const toolCall of msg.tool_calls!) {
const args = JSON.parse(toolCall.function.arguments);
const result = await handleToolCall(toolCall.function.name, args);
messages.push({
role: "tool" as const,
tool_call_id: toolCall.id,
content: result,
});
}
response = await client.chat.completions.create({
model: "gpt-4o",
messages,
tools,
});
}
console.log(response.choices[0].message.content);
Bound handler
Use createToolCallHandler() to get a pre-bound handler:
import { getToolDefinitions, createToolCallHandler } from "browsy-ai/openai";
const tools = getToolDefinitions();
const handle = createToolCallHandler({ port: 3847 });
// In your tool call loop:
const result = await handle(toolCall.function.name, args);
Vercel AI SDK
npm install browsy-ai ai
import { browsyTools } from "browsy-ai/vercel-ai";
Quick start
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { browsyTools } from "browsy-ai/vercel-ai";
const result = await generateText({
model: openai("gpt-4o"),
tools: browsyTools(),
prompt: "Go to news.ycombinator.com and list the top 5 stories",
maxSteps: 10,
});
console.log(result.text);
Custom context
import { BrowsyContext } from "browsy-ai";
import { browsyTools } from "browsy-ai/vercel-ai";
const ctx = new BrowsyContext({ port: 9000 });
const tools = browsyTools(ctx);
Zod schemas
All tool parameter schemas are exported as Zod objects for use in custom integrations:
import {
BrowseParams,
ClickParams,
TypeTextParams,
SearchParams,
TOOL_DESCRIPTIONS,
TOOL_SCHEMAS,
} from "browsy-ai";
// Use in your own tool definitions
const parsed = BrowseParams.parse({ url: "https://example.com" });
// Iterate over all tools
for (const { name, method, schema } of TOOL_SCHEMAS) {
console.log(name, TOOL_DESCRIPTIONS[name]);
}
Prerequisites
The SDK talks to a browsy REST server. You need the browsy CLI installed:
cargo install browsy
With autoStart: true (the default), the SDK starts the server automatically. With autoStart: false, start it manually:
browsy serve --port 3847
OpenClaw Integration
browsy integrates with OpenClaw as a first-class plugin, giving every agent fast, zero-render browsing capabilities without Playwright or Chromium.
Why use browsy in OpenClaw?
OpenClaw's built-in browser uses Playwright + CDP: ~300MB RAM, 2-5s per page. browsy handles 70%+ of agent browsing tasks at 10x speed and 60x less memory. The plugin auto-starts a browsy server and injects 14 browsing tools into every agent.
| Built-in Browser | browsy Plugin | |
|---|---|---|
| Engine | Chromium via Playwright | Zero-render Spatial DOM |
| Memory | ~300MB/page | ~5MB/page |
| Latency | 2-5s/page | <100ms/page |
| JS support | Full | Hidden content exposure |
| Setup | Bundled | npm install openclaw-browsy + browsy CLI |
Installation
# Install the OpenClaw plugin
npm install openclaw-browsy
# Install the browsy CLI (needed for the server)
cargo install browsy
Configuration
Add to your OpenClaw config:
{
"plugins": {
"openclaw-browsy": {
"port": 3847,
"autoStart": true,
"allowPrivateNetwork": false,
"preferBrowsy": true,
"serverTimeout": 10000
}
}
}
| Option | Default | Description |
|---|---|---|
port | 3847 | Port for the browsy REST server |
autoStart | true | Start browsy serve automatically on plugin init |
allowPrivateNetwork | false | Allow fetching private/internal network URLs |
preferBrowsy | true | Intercept built-in browser tool calls and redirect through browsy |
serverTimeout | 10000 | Timeout (ms) waiting for server startup |
Plugin registration
// openclaw.config.ts
import { register } from "openclaw-browsy";
export default { register };
The plugin registers four components following OpenClaw's standard pattern:
- preToolExecution hook -- intercepts built-in browser tools (browser, web_browser, playwright_browser, browse_web) and redirects them through browsy when preferBrowsy is enabled
- agent:bootstrap hook -- injects 14 browsy tools into every agent's toolset at startup
- browsy-server service -- manages the browsy serve process lifecycle (auto-start, health polling, shutdown)
- Gateway methods + CLI commands -- browsy.status, browsy.restart, /browsy-status, /browsy-sessions
Available tools
Every agent gets these 14 tools automatically:
| Tool | Parameters | Description |
|---|---|---|
browsy_browse | url, format?, scope? | Navigate to a URL |
browsy_click | id | Click an element by ID |
browsy_type_text | id, text | Type text into an input field |
browsy_check | id | Check a checkbox or radio button |
browsy_uncheck | id | Uncheck a checkbox or radio button |
browsy_select | id, value | Select a dropdown option |
browsy_search | query, engine? | Search the web (DuckDuckGo or Google) |
browsy_login | username, password | Log in using detected form fields |
browsy_enter_code | code | Enter a verification or 2FA code |
browsy_find | text?, role? | Find elements by text or ARIA role |
browsy_get_page | format?, scope? | Get current page DOM with form state |
browsy_page_info | — | Get page metadata and suggested actions |
browsy_tables | — | Extract structured table data |
browsy_back | — | Go back in browsing history |
How it works
The plugin is a pure proxy — it talks to browsy's REST API via fetch() and manages sessions:
Agent → browsy_browse("https://example.com")
→ Plugin ensures browsy server is running
→ Plugin gets/creates session for this agent
→ POST /api/browse with X-Browsy-Session header
→ browsy fetches, parses, and returns Spatial DOM
→ Plugin updates session token
→ Agent receives page content
Each agent gets its own isolated session with independent cookies, history, and form state.
SimpleClaw and other OpenClaw-compatible frameworks
The openclaw-browsy plugin works with any framework that implements the OpenClaw plugin API. This includes SimpleClaw and other lightweight agent orchestrators built on the OpenClaw standard.
SimpleClaw quick start
import { SimpleClaw } from "simpleclaw";
import { register } from "openclaw-browsy";
const claw = new SimpleClaw({
plugins: [{ register }],
config: {
"openclaw-browsy": {
port: 3847,
preferBrowsy: true,
},
},
});
// Agents automatically get browsy tools
const agent = claw.createAgent({
name: "researcher",
instructions: "You browse the web and extract information.",
});
const result = await agent.run("Search for 'Rust web frameworks' and summarize the top 3 results");
Standalone usage (no framework)
You can also use the browsy client directly without OpenClaw:
import { BrowsyContext } from "openclaw-browsy";
const ctx = new BrowsyContext({ port: 3847, autoStart: false });
// Assumes browsy serve is already running
const page = await ctx.executeToolCall("browse", { url: "https://example.com" });
console.log(page);
const info = await ctx.executeToolCall("pageInfo", {});
console.log(info);
Custom agent orchestrators
Any orchestrator that implements the four-method OpenClawPluginApi interface can use the plugin:
interface OpenClawPluginApi {
registerHook(name: string, handler: (...args: unknown[]) => unknown): void;
registerService(name: string, service: { start?: () => void | Promise<void>; stop?: () => void | Promise<void> }): void;
registerGatewayMethod(name: string, handler: (...args: unknown[]) => unknown): void;
registerCommand(name: string, handler: (...args: unknown[]) => unknown): void;
}
Call register(api) with your implementation and browsy tools become available to your agents.
preferBrowsy mode
When preferBrowsy is enabled (the default), the plugin intercepts calls to built-in browser tools and aborts them with a message directing the agent to use browsy_browse instead. This provides a transparent speed upgrade for agents that were previously using Playwright.
The intercepted tool names are:
- browser
- web_browser
- playwright_browser
- browse_web
To disable interception and run browsy alongside the built-in browser:
{
"openclaw-browsy": {
"preferBrowsy": false
}
}
When to fall back to a full browser
browsy handles server-rendered pages, forms, search, and data extraction. Fall back to Playwright for:
- JS-heavy SPAs — React, Vue, Angular apps that render entirely client-side
- Screenshots — when you need visual/pixel-level inspection
- Complex JS interactions — drag-and-drop, infinite scroll, WebSocket-driven UIs
- PDF generation — print-to-PDF workflows
With preferBrowsy: false, both browsy and the built-in browser are available. Agents can choose the right tool for each task.
Bundled skills
The plugin includes three runtime skills for common browsing patterns:
browse-and-extract
Navigate to a URL and extract data, automatically handling cookie consent and login walls.
web-research
Search the web, visit multiple pages, and compile a research summary with source attribution.
form-filler
Detect form fields using browsy's page intelligence, fill them with provided data, and submit.
Python Bindings
browsy provides Python bindings via PyO3. The API closely mirrors the Rust Session API.
Installation
pip install browsy-ai
The package ships a compiled native extension (_core.pyd / _core.so). No Rust toolchain required for installation from wheels.
Module contents
from browsy import Browser, Page, Element
| Class | Description |
|---|---|
Browser | A browsing session with cookie persistence and form state |
Page | A parsed page (the Spatial DOM) |
Element | A single element in the Spatial DOM |
Basic usage: parsing HTML
The Browser class can parse local HTML without network access:
from browsy import Browser
browser = Browser(viewport_width=1920, viewport_height=1080)
page = browser.load_html('<h1>Hello</h1><a href="/about">About</a>', 'https://example.com')
print(page.title) # ""
print(len(page)) # 2
for el in page.elements:
print(el.id, el.tag, el.text)
# 1 h1 Hello
# 2 a About
Browsing: navigating URLs
from browsy import Browser
browser = Browser()
page = browser.goto("https://example.com")
print(page.title) # "Example Domain"
print(page.url) # "https://example.com"
print(page.page_type()) # "Other"
Page properties and methods
page.title # str: page title
page.url # str: current URL
page.elements # list[Element]: all elements
page.visible() # list[Element]: non-hidden elements only
page.above_fold() # list[Element]: elements with top edge within viewport
page.get(id) # Element or None: lookup by ID
page.page_type() # str: "Login", "Search", "Article", "List", etc.
page.suggested_actions() # list[dict]: detected action recipes
page.alerts() # list[Element]: elements with alert_type set
page.tables() # list[dict]: extracted table data (headers + rows)
page.pagination() # dict or None: next/prev/pages links
page.to_json() # str: full JSON serialization
page.to_compact() # str: compact text format
len(page) # int: element count
Element properties
el.id # int: unique element ID
el.tag # str: HTML tag name
el.role # str or None: ARIA role (implicit or explicit)
el.text # str or None: visible text content
el.href # str or None: link target (resolved to absolute URL)
el.placeholder # str or None: placeholder text
el.value # str or None: current value
el.input_type # str or None: input type attribute
el.name # str or None: HTML name attribute
el.label # str or None: associated label text
el.alert_type # str or None: "alert", "error", "success", "warning"
el.disabled # bool or None
el.checked # bool or None
el.expanded # bool or None
el.selected # bool or None
el.required # bool or None
el.hidden # bool or None: True if element is hidden
el.bounds # tuple[int, int, int, int]: (x, y, width, height)
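These properties combine naturally with the Page filters above; a short sketch that lists the required, visible form fields on a page (the URL is illustrative):
from browsy import Browser

browser = Browser()
page = browser.goto("https://example.com/checkout")  # illustrative URL

# Required, visible data-entry fields with their labels and positions
for el in page.visible():
    if el.required and el.tag in ("input", "select", "textarea"):
        x, y, w, h = el.bounds
        print(el.id, el.label or el.placeholder or el.name, (x, y, w, h))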
Form interaction
browser = Browser()
page = browser.goto("https://example.com/login")
# Type into fields by element ID
browser.type_text(5, "user@example.com")
browser.type_text(8, "secretpassword")
# Check a "remember me" checkbox
browser.check(10)
# Select a dropdown option
browser.select(12, "en-US")
# Read the updated DOM with form state overlaid
page = browser.dom()
# Submit by clicking the submit button
page = browser.click(15)
Compound actions
For detected form patterns, compound actions handle the full workflow:
# Login (requires Login suggested action on current page)
page = browser.login("user@example.com", "password123")
# Enter verification code (requires EnterCode suggested action)
page = browser.enter_code("123456")
Search
# Search the web (DuckDuckGo by default)
results = browser.search("python web scraping")
for r in results:
print(r["title"], r["url"], r["snippet"])
Finding elements
# Find by text content (exact substring match)
elements = browser.find_by_text("Sign In")
# Find by text content (case-insensitive substring)
elements = browser.find_by_text_fuzzy("sign in")
# Find by ARIA role
buttons = browser.find_by_role("button")
headings = browser.find_by_role("heading")
links = browser.find_by_role("link")
# Find input by semantic purpose
password_input = browser.find_input_by_purpose("password")
email_input = browser.find_input_by_purpose("email")
search_input = browser.find_input_by_purpose("search")
# Supported purposes: "password", "email", "username", "code", "search", "phone"
# Find verification codes on the page
code = browser.find_verification_code() # str or None
Navigation
# Navigate to a URL
page = browser.goto("https://example.com")
# Click a link (navigates to its href)
page = browser.click(3)
# Go back
page = browser.back()
Suggested actions
page = browser.goto("https://example.com/login")
for action in page.suggested_actions():
print(action)
# {"action": "Login", "username_id": 5, "password_id": 8, "submit_id": 12}
Each action is a dictionary with an "action" key identifying the type and additional fields with element IDs. See the Action Recipes Reference for all variants.
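A minimal dispatch sketch over the page loaded above; the field names match the Action Recipes Reference, and the search query is illustrative:
for action in page.suggested_actions():
    kind = action["action"]
    if kind == "CookieConsent":
        page = browser.click(action["accept_id"])        # dismiss the banner first
    elif kind == "Login":
        page = browser.login("user@example.com", "secret")
    elif kind == "Search":
        browser.type_text(action["input_id"], "pricing")  # illustrative query
        page = browser.click(action["submit_id"])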
Viewport configuration
# Mobile viewport
browser = Browser(viewport_width=375, viewport_height=812)
# Desktop viewport (default)
browser = Browser(viewport_width=1920, viewport_height=1080)
The viewport dimensions affect CSS media query evaluation and layout computation, which in turn affects element positions and visibility.
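A quick way to see the effect is to parse the same HTML at both viewports and compare what is visible (the local file name is illustrative):
from browsy import Browser

html = open("page.html").read()  # illustrative local file

desktop = Browser(viewport_width=1920, viewport_height=1080)
mobile = Browser(viewport_width=375, viewport_height=812)

d = desktop.load_html(html, "https://example.com")
m = mobile.load_html(html, "https://example.com")

# Media queries and layout differ per viewport, so visibility can differ too
print("desktop visible:", len(d.visible()), "mobile visible:", len(m.visible()))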
CLI Usage
The browsy CLI provides three commands: fetch for URLs, parse for local HTML files, and serve for the REST API server.
Installation
cargo install browsy
Commands
fetch
Fetch a URL, compute the Spatial DOM, and print the result.
browsy fetch <URL> [OPTIONS]
| Flag | Description |
|---|---|
--json | Output as JSON instead of compact format |
--viewport <WxH> | Viewport size (default: 1920x1080) |
--no-css | Skip fetching external CSS stylesheets |
--visible-only | Only include visible (non-hidden) elements |
--above-fold | Only include elements above the viewport fold |
Examples:
# Compact output (default)
browsy fetch https://example.com
# JSON output
browsy fetch https://example.com --json
# Mobile viewport
browsy fetch https://example.com --viewport 375x812
# Skip external CSS for faster parsing
browsy fetch https://example.com --no-css
# Only visible above-fold elements
browsy fetch https://example.com --visible-only --above-fold
parse
Parse a local HTML file and print the Spatial DOM. No network requests are made (external stylesheets are not fetched).
browsy parse <FILE> [OPTIONS]
| Flag | Description |
|---|---|
--json | Output as JSON instead of compact format |
--viewport <WxH> | Viewport size (default: 1920x1080) |
Use - to read from stdin:
echo '<h1>Hello</h1>' | browsy parse -
curl -s https://example.com | browsy parse -
Examples:
# Parse a local file
browsy parse index.html
# Parse with JSON output
browsy parse index.html --json
# Parse from stdin
cat page.html | browsy parse -
serve
Start the REST API + A2A server.
browsy serve [OPTIONS]
| Flag | Description |
|---|---|
--port <PORT> | Port to listen on (default: 3847) |
--allow-private-network | Allow fetching private/LAN addresses |
Examples:
# Start on default port
browsy serve
# Custom port
browsy serve --port 8080
# Allow local development server access
browsy serve --allow-private-network
The server exposes a REST API and A2A protocol endpoints. See REST API and A2A Protocol.
Output formats
Compact format (default)
The compact format is designed for minimal token usage in LLM contexts:
title: Example Domain
url: https://example.com
vp: 1920x1080
els: 3
---
[1:h1 "Example Domain"]
[2:p "This domain is for use in illustrative examples in documents."]
[3:a "More information..." ->https://www.iana.org/domains/example]
The header shows the page title, URL, viewport dimensions, and element count. Each element line follows the pattern [id:tag "text"] with optional annotations:
- !id:tag -- hidden element
- id:input:password -- input type (when not "text")
- [name] -- HTML name attribute
- [v] -- checked
- [*] -- required
- [=value] -- current value
- ->url -- href
- narrow/wide/full -- width relative to viewport
- @region -- position (only when needed to disambiguate duplicates)
JSON format
The JSON format includes the full SpatialDom structure with all element properties. See the Architecture page for the complete schema.
MCP server mode
browsy also runs as an MCP server for use with Claude Code and other MCP clients. See MCP Server for details.
browsy mcp
Web Search
browsy includes built-in web search via DuckDuckGo and Google. No API keys or external services required -- it fetches search result pages directly and parses the HTML.
Search engines
| Engine | Endpoint | Reliability |
|---|---|---|
| DuckDuckGo | https://html.duckduckgo.com/html/ | High. Uses the HTML-only endpoint, no JavaScript needed. |
| Google | https://www.google.com/search | Variable. Google may return CAPTCHAs or block automated requests. |
DuckDuckGo is the default and recommended engine.
Rust API
Basic search
#![allow(unused)] fn main() { use browsy_core::fetch::{Session, SearchEngine}; let mut session = Session::new()?; let results = session.search("rust web scraping")?; for r in &results { println!("{}: {} -- {}", r.title, r.url, r.snippet); } }
Choosing a search engine
#![allow(unused)] fn main() { let results = session.search_with("rust web scraping", SearchEngine::Google)?; }
Search and read
Search and automatically fetch the top N result pages:
#![allow(unused)] fn main() { let pages = session.search_and_read("rust web scraping", 3)?; for page in &pages { println!("--- {} ---", page.result.title); if let Some(ref dom) = page.dom { println!(" Page type: {:?}", dom.page_type); println!(" Elements: {}", dom.els.len()); } else { println!(" (fetch failed)"); } } }
Each SearchPage contains the original SearchResult (title, URL, snippet) and an Option<SpatialDom> for the fetched page. Pages that fail to fetch have dom: None.
#![allow(unused)] fn main() { let pages = session.search_and_read_with( "rust web scraping", 5, SearchEngine::DuckDuckGo, )?; }
Python API
from browsy import Browser
browser = Browser()
# Basic search (DuckDuckGo)
results = browser.search("python asyncio tutorial")
for r in results:
print(r["title"], r["url"])
Search results are returned as a list of dictionaries, each with title, url, and snippet keys.
MCP API
The search tool accepts a query and optional engine:
{
"query": "browsy zero-render browser",
"engine": "duckduckgo"
}
Returns a JSON array of results:
[
{
"title": "browsy - Zero-render browser engine",
"url": "https://example.com/browsy",
"snippet": "A browser engine for AI agents..."
}
]
SearchResult struct
#![allow(unused)] fn main() { pub struct SearchResult { pub title: String, pub url: String, pub snippet: String, } }
How it works
DuckDuckGo
browsy fetches https://html.duckduckgo.com/html/?q=<query>, which returns a pure HTML page with no JavaScript. Results are extracted by finding <div class="result"> containers and parsing the title link (result__a), URL (result__url), and snippet (result__snippet). Redirect URLs are decoded from the uddg query parameter.
Google
browsy fetches https://www.google.com/search?q=<query>&num=10. Results are extracted using a structural pattern: anchor tags containing an <h3> descendant. The title comes from the h3 text, the URL from the anchor href (with /url?q= redirect decoding), and snippets from nearby div elements. The parser targets the #rso results container to skip ads and navigation.
Google results may be less reliable because Google actively detects and blocks automated requests. DuckDuckGo's HTML endpoint is specifically designed for non-JavaScript clients and is the recommended default.
Page Types Reference
browsy classifies every page into a PageType to help agents decide what to do next. The classification is based on structural heuristics applied to the Spatial DOM -- no machine learning, no external services.
Page types are evaluated in priority order. The first match wins.
PageType enum
#![allow(unused)] fn main() { pub enum PageType { Error, Captcha, Login, TwoFactorAuth, OAuthConsent, Inbox, EmailBody, Dashboard, Article, SearchResults, List, Search, Form, Other, // default } }
Detection criteria
| Page Type | Detection Criteria |
|---|---|
| Error | Title contains HTTP error codes (404, 500, 403, not found, error) OR page has elements with alert_type == "error". |
| Captcha | Title contains CAPTCHA keywords (captcha, verify you're human, robot, security check, just a moment, attention required) OR heading contains CAPTCHA phrases OR a CAPTCHA service (reCAPTCHA, hCaptcha, Turnstile, Cloudflare challenge) is detected in the HTML structure. |
| Login | Page has a visible <input type="password">. |
| TwoFactorAuth | Title or heading contains verification keywords (verification, enter code, security code, 2fa, two-factor, otp, one-time, passcode) AND page has a visible text/number/tel input. No password field present (that would be Login). |
| OAuthConsent | Title or heading contains OAuth keywords (authorize, allow access, grant permission, oauth, consent). |
| Inbox | Title contains inbox keywords (inbox, mail, messages) AND page has 10+ visible links. |
| EmailBody | Page text contains 3+ of the email markers: from:, to:, subject:, date:. |
| Dashboard | Title or heading contains dashboard keywords (dashboard, welcome back, overview) AND page has both a <nav> and <main> landmark. |
| Article | Page has 3+ headings AND enough long paragraphs (>100 chars). When the page has 20+ links, the threshold is 10 long paragraphs (vs 2 for low-link pages). Pages with 15+ headings must have a paragraph-to-heading ratio of at least 0.8 to distinguish articles (Wikipedia) from heading-heavy list pages (BBC News). |
| SearchResults | Page has a search input (visible or hidden) AND 8+ links AND search context: title/heading contains search-result keywords (search results, results for, search) OR URL contains search query parameters (?q=, ?query=, ?s=, ?search=, /search). |
| List | Page has 10+ visible links. Evaluated after Article and SearchResults. |
| Search | Page has a visible search input. Evaluated after List (many list pages have search bars in navigation). Also fires as a fallback when a page has fewer than 5 visible elements but has a hidden search input (common in JS-rendered search engines without JS execution). |
| Form | Page has 2+ visible data-entry inputs (excludes checkbox, radio, hidden, submit, button, and image inputs). |
| Other | Default when no heuristic matches. |
Evaluation order
The order matters. For example:
- A login page with a search bar in the nav is classified as Login (the password field check comes first), not Search.
- A search results page with many links is SearchResults, not List, because SearchResults is checked before List.
- An article with a search bar is Article, not Search, because Article is checked first.
- An error page with a login form is Error, because error checks come before Login.
Accessing page type
Rust
#![allow(unused)] fn main() { use browsy_core::output::PageType; let dom = browsy_core::parse(html, 1920.0, 1080.0); match dom.page_type { PageType::Login => println!("This is a login page"), PageType::Article => println!("This is an article"), _ => println!("Page type: {:?}", dom.page_type), } }
Python
page = browser.goto("https://example.com")
print(page.page_type()) # "Login", "Article", "Other", etc.
MCP
The page_info tool returns page_type as a string. The browse tool includes it in the JSON output format.
JSON serialization
PageType is serialized as a string. The field is omitted from JSON when the value is Other (via skip_serializing_if).
{
"page_type": "Login",
"title": "Sign In",
"url": "https://example.com/login"
}
Action Recipes Reference
browsy detects structured action patterns on each page and emits them as SuggestedAction variants. Each action provides element IDs that an agent can use directly with click, type_text, check, and select operations.
Actions are detected after page type classification. Multiple actions can coexist on a single page (a login page might also have a Search action for the nav bar and a CookieConsent action for a banner).
SuggestedAction enum
#![allow(unused)] fn main() { #[derive(Debug, Clone, Serialize, Deserialize)] #[serde(tag = "action")] pub enum SuggestedAction { Login { ... }, Register { ... }, Contact { ... }, FillForm { ... }, Search { ... }, EnterCode { ... }, Download { ... }, CaptchaChallenge { ... }, CookieConsent { ... }, Consent { ... }, SelectFromList { ... }, Paginate { ... }, } }
All actions are serialized with an "action" tag field for easy pattern matching.
Login
Detected when the page has a visible password input, a nearby text/email input, and a submit button.
{
"action": "Login",
"username_id": 5,
"password_id": 8,
"submit_id": 12,
"remember_me_id": 10
}
| Field | Type | Description |
|---|---|---|
username_id | u32 | Text or email input nearest to the password field (within 500px Y) |
password_id | u32 | The <input type="password"> element |
submit_id | u32 | Nearest submit button below the password field |
remember_me_id | Option<u32> | Checkbox with "remember" in its label or name |
When it fires: Page has a visible password input and a nearby username/email input. Does NOT fire if the page also has registration context (confirm password + registration keywords) -- Register takes priority in that case. When a page has both login and registration sections (like Hacker News), Login takes priority over Register.
Usage: The MCP login tool and Python browser.login() use this action internally. They type into username_id and password_id, then click submit_id.
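Driving the same recipe manually from Python looks roughly like this; the remember-me step is an optional extra, not something login() is documented to do:
# assumes: browser is a browsy.Browser with the login page loaded as `page`
login = next(a for a in page.suggested_actions() if a["action"] == "Login")

browser.type_text(login["username_id"], "user@example.com")
browser.type_text(login["password_id"], "secretpassword")
if login.get("remember_me_id") is not None:
    browser.check(login["remember_me_id"])
page = browser.click(login["submit_id"])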
Register
Detected on registration pages: password field plus either a confirm password field or registration keywords in the title/heading.
{
"action": "Register",
"email_id": 3,
"username_id": 4,
"password_id": 7,
"confirm_password_id": 9,
"name_id": 2,
"submit_id": 11
}
| Field | Type | Description |
|---|---|---|
email_id | Option<u32> | Email input |
username_id | Option<u32> | Username text input |
password_id | u32 | Primary password input |
confirm_password_id | Option<u32> | Second password input (confirm) |
name_id | Option<u32> | Full name text input |
submit_id | u32 | Submit button |
When it fires: Page has a visible password field AND either (a) two or more password fields (confirm password pattern) or (b) title/heading contains registration keywords (register, sign up, signup, create account, join, new account). Does not fire when login keywords are present alongside confirm password (dual login/register pages prefer Login).
Contact
Detected on contact forms: a textarea (message body) plus contact-related context in the title or headings.
{
"action": "Contact",
"name_id": 2,
"email_id": 4,
"message_id": 6,
"submit_id": 8
}
| Field | Type | Description |
|---|---|---|
name_id | Option<u32> | Name input |
email_id | Option<u32> | Email input |
message_id | u32 | Textarea element |
submit_id | u32 | Submit button |
When it fires: Page has a visible textarea AND title/heading contains contact keywords (contact us, contact form, get in touch, reach out, send us a message, inquiry).
FillForm
Generic form action for pages classified as Form that don't match a more specific pattern (Login, Register, Contact, Search).
{
"action": "FillForm",
"fields": [
{ "id": 3, "label": "First Name", "name": "first_name", "type": "text" },
{ "id": 5, "label": "Email Address", "name": "email", "type": "email" },
{ "id": 7, "label": "Phone", "name": "phone", "type": "tel" }
],
"submit_id": 10
}
| Field | Type | Description |
|---|---|---|
| fields | Vec<FormField> | Visible data-entry fields with labels |
| submit_id | u32 | Submit button |
Each FormField contains:
| Field | Type | Description |
|---|---|---|
| id | u32 | Element ID |
| label | Option<String> | Associated label text (from <label> or placeholder) |
| name | Option<String> | HTML name attribute |
| input_type | Option<String> | Input type attribute |
When it fires: Page type is Form (2+ data-entry inputs) AND no more specific form action (Login, Register, Contact, Search) was already detected.
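A sketch of driving a generic FillForm action, under the same assumed type_text/click signatures as the Login example above; value_for is a hypothetical helper that maps a field's label or name to the value the agent wants to enter.

if let Some(SuggestedAction::FillForm { fields, submit_id, .. }) = dom
    .suggested_actions
    .iter()
    .find(|a| matches!(a, SuggestedAction::FillForm { .. }))
{
    for field in fields {
        // value_for is hypothetical: decide what to type based on the label/name.
        if let Some(value) = value_for(field.label.as_deref(), field.name.as_deref()) {
            session.type_text(field.id, &value)?;
        }
    }
    session.click(*submit_id)?;
}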
Search
Detected when a search input is present on the page.
{
"action": "Search",
"input_id": 15,
"submit_id": 16
}
| Field | Type | Description |
|---|---|---|
| input_id | u32 | Search input element |
| submit_id | u32 | Submit button |
When it fires: Page has an input matching search criteria: type="search", role="searchbox", name="q", name contains "search", or placeholder contains "search". Prefers visible inputs but falls back to hidden ones (for JS-rendered search engines).
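Using the action from Rust follows the same pattern as Login: type into input_id, then click submit_id (same assumed session method signatures as above).

if let Some(SuggestedAction::Search { input_id, submit_id, .. }) = dom
    .suggested_actions
    .iter()
    .find(|a| matches!(a, SuggestedAction::Search { .. }))
{
    session.type_text(*input_id, "spatial dom")?;
    session.click(*submit_id)?;
}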
EnterCode
Detected on verification/2FA pages with code-related context.
{
"action": "EnterCode",
"input_id": 4,
"submit_id": 6,
"code_length": 6
}
| Field | Type | Description |
|---|---|---|
| input_id | u32 | Code input element (first input if multiple narrow digit inputs) |
| submit_id | u32 | Submit button |
| code_length | Option<usize> | Expected code length (set when 4-8 narrow inputs are detected) |
When it fires: Title or heading contains verification keywords AND the page has a visible text/number/tel input. Does not fire if a password field is present (that case is treated as Login). Detects separate-digit inputs (width < 60px, 4-8 inputs) and reports the code length.
Usage: The MCP enter_code tool and Python browser.enter_code() use this action internally.
Download
Detected when the page has links or buttons with download-related text or file extension hrefs.
{
"action": "Download",
"items": [
{ "id": 20, "text": "Download v2.1.0", "href": "https://example.com/release.zip" },
{ "id": 22, "text": "Download PDF", "href": "https://example.com/guide.pdf" }
]
}
| Field | Type | Description |
|---|---|---|
| items | Vec<DownloadItem> | Downloadable links/buttons |
Each DownloadItem contains:
| Field | Type | Description |
|---|---|---|
| id | u32 | Element ID |
| text | Option<String> | Link/button text |
| href | Option<String> | Download URL |
When it fires: Page has visible links or buttons where the text starts with "download" (and is short) or the href ends with a known file extension (.zip, .tar.gz, .dmg, .exe, .msi, .deb, .rpm, .pkg, .appimage, .pdf, .csv, .xlsx).
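An agent that only wants a particular file type can filter items by href. A minimal sketch using the fields documented above (assumes dom is the current SpatialDom):

if let Some(SuggestedAction::Download { items, .. }) = dom
    .suggested_actions
    .iter()
    .find(|a| matches!(a, SuggestedAction::Download { .. }))
{
    if let Some(pdf) = items
        .iter()
        .find(|item| item.href.as_deref().map_or(false, |h| h.ends_with(".pdf")))
    {
        println!("PDF download at element {}: {:?}", pdf.id, pdf.href);
    }
}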
CaptchaChallenge
Detected when a CAPTCHA service is found in the HTML structure or the page is classified as Captcha.
{
"action": "CaptchaChallenge",
"captcha_type": "ReCaptcha",
"sitekey": "6Le-wvkSAAAAABx7...",
"submit_id": 15
}
| Field | Type | Description |
|---|---|---|
| captcha_type | CaptchaType | Type of CAPTCHA detected |
| sitekey | Option<String> | Site key from the data-sitekey attribute |
| submit_id | Option<u32> | Submit/verify button |
When it fires: Page has a captcha field set (detected CAPTCHA service in HTML) OR page type is Captcha. See CAPTCHA Detection for details.
CookieConsent
Detected when the page has a cookie notice with accept/reject buttons.
{
"action": "CookieConsent",
"accept_id": 50,
"reject_id": 52
}
| Field | Type | Description |
|---|---|---|
| accept_id | u32 | Accept/agree button |
| reject_id | Option<u32> | Reject button (not always present) |
When it fires: Page has a substantial text block (>30 chars) mentioning cookies/GDPR AND a button with accept-related text (accept all, accept cookies, allow cookies, allow all, agree, got it, i understand, i agree).
Consent
Detected on OAuth/authorization consent pages with approve/deny buttons.
{
"action": "Consent",
"approve_ids": [30],
"deny_ids": [32]
}
| Field | Type | Description |
|---|---|---|
| approve_ids | Vec<u32> | Approve/allow/authorize buttons |
| deny_ids | Vec<u32> | Deny/cancel/decline buttons |
When it fires: Title or heading contains OAuth keywords (authorize, allow access, grant permission, oauth, consent) AND the page has buttons with approve or deny text.
SelectFromList
Detected on pages with many links arranged in a list-like pattern.
{
"action": "SelectFromList",
"items": [10, 14, 18, 22, 26]
}
| Field | Type | Description |
|---|---|---|
| items | Vec<u32> | One link ID per row (the first link in each row group) |
When it fires: Page has 5+ visible links that form 5+ distinct rows (links within 30px Y are grouped into the same row). The action provides the first link ID from each row as representative items.
Paginate
Detected when the page has next/previous navigation links or numbered page links.
{
"action": "Paginate",
"next_id": 100,
"prev_id": 98
}
| Field | Type | Description |
|---|---|---|
| next_id | Option<u32> | Next page link |
| prev_id | Option<u32> | Previous page link |
When it fires: Page has links with pagination text (next, prev, previous, >, >>, <, <<, and Unicode equivalents).
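Walking a paginated listing then becomes a simple loop: process the current page, follow next_id until it disappears. A sketch under the same assumed click(id) -> Result<SpatialDom> signature used earlier:

let mut dom = session.goto("https://example.com/results?page=1")?;
loop {
    // ... read dom.els for the current page here ...
    let next = dom.suggested_actions.iter().find_map(|a| match a {
        SuggestedAction::Paginate { next_id: Some(id), .. } => Some(*id),
        _ => None,
    });
    match next {
        Some(id) => dom = session.click(id)?, // assumed: click returns the new SpatialDom
        None => break,
    }
}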
Detection order
Actions are detected in this order:
- Register (or Login if no registration context)
- EnterCode
- Consent
- Contact
- Search
- SelectFromList
- CookieConsent
- Paginate
- FillForm (only if no more specific form action exists)
- Download
- CaptchaChallenge
Multiple actions can coexist. A login page with a cookie banner and nav search bar will have Login, CookieConsent, and Search actions simultaneously.
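Because actions coexist, an agent will often want to clear blocking UI before touching the main form. A sketch, again with the assumed click(id) -> Result<SpatialDom> signature:

// Dismiss a cookie banner first, if one was detected on this page.
if let Some(SuggestedAction::CookieConsent { accept_id, .. }) = dom
    .suggested_actions
    .iter()
    .find(|a| matches!(a, SuggestedAction::CookieConsent { .. }))
{
    dom = session.click(*accept_id)?;
}
// ...then look up Login or Search again in the refreshed dom.suggested_actions.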
CAPTCHA Detection
browsy detects CAPTCHAs from HTML structure alone -- no rendering, no image analysis, no JavaScript execution. Detection works by scanning the raw DOM tree for known CAPTCHA service indicators before the Spatial DOM is generated.
CaptchaType enum
pub enum CaptchaType {
    ReCaptcha,           // Google reCAPTCHA v2 or v3
    HCaptcha,            // hCaptcha
    Turnstile,           // Cloudflare Turnstile
    CloudflareChallenge, // Cloudflare JS challenge ("Just a moment...")
    ImageGrid,           // Custom image-grid CAPTCHA ("select all images containing...")
    TextCaptcha,         // Text-based CAPTCHA (type characters from an image)
    Unknown,             // CAPTCHA detected but type not identified
}
Detection signals
browsy scans the layout tree for these patterns:
Script sources
| Pattern | Detected as |
|---|---|
| src contains recaptcha or google.com/recaptcha | ReCaptcha |
| src contains hcaptcha.com | HCaptcha |
| src contains challenges.cloudflare.com/turnstile | Turnstile |
Iframe sources
| Pattern | Detected as |
|---|---|
| src contains recaptcha or google.com/recaptcha | ReCaptcha |
| src contains hcaptcha.com or newassets.hcaptcha.com | HCaptcha |
Div classes
| Pattern | Detected as |
|---|---|
| Class contains g-recaptcha | ReCaptcha |
| Class contains h-captcha | HCaptcha |
| Class contains cf-turnstile | Turnstile |
Div IDs
| Pattern | Detected as |
|---|---|
| ID contains challenge-running or cf-challenge | CloudflareChallenge |
Site key
Any element with a data-sitekey attribute has its value captured. This attribute is used by reCAPTCHA, hCaptcha, and Turnstile to embed the site key.
Title and heading keywords
Page type detection checks title and headings for CAPTCHA-related phrases. These trigger PageType::Captcha even without a known CAPTCHA service:
Title keywords: captcha, verify you're human, verify you are human, robot, security check, challenge, just a moment, attention required, are you human
Heading keywords: captcha, verify you're human, security check, are you human, complete the challenge, human verification
CaptchaInfo struct
pub struct CaptchaInfo {
    pub captcha_type: CaptchaType,
    pub sitekey: Option<String>,
}
The sitekey is populated when a data-sitekey attribute is found. It is the value needed by third-party CAPTCHA solving services.
CaptchaChallenge action
When a CAPTCHA is detected, the CaptchaChallenge suggested action is emitted:
SuggestedAction::CaptchaChallenge {
    captcha_type: CaptchaType,
    sitekey: Option<String>,
    submit_id: Option<u32>,
}
The submit_id is the nearest verify/submit/continue button, if one exists. When no known CAPTCHA service is detected but the page is classified as Captcha, browsy infers the type:
- 4+ image buttons on the page: ImageGrid
- Otherwise: Unknown
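The parse-level CaptchaInfo is also available without a Session: after a plain parse it sits on SpatialDom.captcha (see the detection pipeline below). A minimal check, assuming html holds the page source:

let dom = browsy_core::parse(html, 1920.0, 1080.0);

if let Some(info) = &dom.captcha {
    println!("CAPTCHA: {:?}", info.captcha_type);
    if let Some(key) = &info.sitekey {
        println!("sitekey: {key}");
    }
}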
Session methods
Rust
let mut session = Session::new()?;
let dom = session.goto("https://example.com")?;

// Check if the current page is a CAPTCHA
if session.is_captcha() {
    println!("CAPTCHA detected!");
}

// Get CAPTCHA details
if let Some(info) = session.captcha_info() {
    println!("Type: {:?}", info.captcha_type);
    if let Some(ref key) = info.sitekey {
        println!("Site key: {}", key);
    }
}
Python
browser = Browser()
page = browser.goto("https://example.com")

if page.page_type() == "Captcha":
    for action in page.suggested_actions():
        if action["action"] == "CaptchaChallenge":
            print(f"Type: {action['captcha_type']}")
            print(f"Site key: {action.get('sitekey')}")
MCP behavior
When the browse or click tools return a page detected as Captcha, the output is prefixed with a warning:
CAPTCHA detected (ReCaptcha) -- this page requires human verification to proceed.
The page_info tool includes the full CAPTCHA information:
{
"page_type": "Captcha",
"captcha": {
"captcha_type": "ReCaptcha",
"sitekey": "6Le-wvkSAAAA..."
},
"suggested_actions": [
{
"action": "CaptchaChallenge",
"captcha_type": "ReCaptcha",
"sitekey": "6Le-wvkSAAAA...",
"submit_id": 15
}
]
}
What browsy cannot do
browsy detects and classifies CAPTCHAs. It does not solve them. When a CAPTCHA is encountered, the agent has several options:
- Human-in-the-loop: Surface the CAPTCHA to a human operator.
- Third-party solver: Pass the captcha_type and sitekey to a CAPTCHA solving service (2captcha, Anti-Captcha, etc.), receive the solution token, and inject it.
- Alternative approach: Try a different URL, use an API instead of the web interface, or skip the blocked resource.
- Wait and retry: Some Cloudflare challenges resolve after a delay.
The sitekey in the CaptchaInfo is the value that third-party solving services typically require.
Detection pipeline
CAPTCHA detection happens at three stages:
- Tree scan (detect_captcha_from_tree): Before the Spatial DOM is generated, the layout tree is scanned for CAPTCHA service indicators (script/iframe sources, div classes/IDs, data-sitekey). This produces the CaptchaInfo stored on SpatialDom.captcha.
- Page type classification (detect_page_type): After the Spatial DOM is built, the page type heuristic checks for CAPTCHA signals: title keywords, heading keywords, and the presence of captcha on the SpatialDom. If any signal matches, the page is classified as PageType::Captcha.
- Action detection (detect_captcha_challenge_action): If captcha is set or the page type is Captcha, the CaptchaChallenge action is emitted with the type, sitekey, and submit button.
CSS Engine
browsy includes a CSS engine built from scratch in Rust. It handles selector matching, property parsing, variable resolution, calc() expressions, @media queries, and specificity ordering. The engine computes the subset of CSS properties needed for layout -- approximately 40 properties that affect bounding box computation.
Architecture
HTML ──> DomNode tree
│
├── <style> blocks ──> parse_stylesheet() ──> Vec<CssRule>
├── External <link> CSS ──> fetched + parse_stylesheet()
├── Inline style="" ──> parse_inline_style_with_vars()
│
└── compute_styles() ──> StyledNode tree (LayoutStyle per node)
│
└── Taffy layout ──> bounding boxes
Style computation walks the DOM tree, matching each element against all CSS rules by specificity. Inline styles override stylesheet rules. CSS custom properties (--var) inherit through the tree.
Selector matching
The selector engine supports these selector types:
| Selector | Example | Description |
|---|---|---|
| Tag | div, button | Matches element tag name |
| Class | .nav-item | Matches class attribute |
| ID | #header | Matches id attribute |
| Universal | * | Matches any element |
| Descendant | div p | Matches p inside any div ancestor |
| Child | div > p | Matches p that is a direct child of div |
| Pseudo-class | :hover, :first-child | Parsed but ignored for layout (no interaction state) |
| Attribute (exists) | [disabled] | Element has the attribute |
| Attribute (exact) | [type="submit"] | Attribute equals value |
| Attribute (word) | [class~="active"] | Whitespace-separated word match |
| Attribute (prefix) | [href^="/"] | Attribute starts with value |
| Attribute (suffix) | [src$=".png"] | Attribute ends with value |
| Attribute (contains) | [class*="btn"] | Attribute contains substring |
| Attribute (hyphen-prefix) | [lang|="en"] | Exact match or prefix with hyphen |
| Comma-separated | h1, h2, h3 | Union of selectors |
Specificity
Selectors are ordered by CSS specificity rules:
- ID selectors: weight 100
- Class selectors, attribute selectors, pseudo-classes: weight 10
- Tag selectors, universal: weight 1
Higher specificity rules override lower specificity rules. Equal specificity resolves by source order (later wins). Inline styles always win over stylesheet rules.
Property parsing
Supported properties
The engine parses approximately 40 layout-affecting CSS properties:
| Category | Properties |
|---|---|
| Box model | display, box-sizing, width, height, min-width, min-height, max-width, max-height |
| Spacing | margin (+ sides), padding (+ sides), border-width (+ sides) |
| Position | position, top, right, bottom, left |
| Flexbox | flex-direction, flex-wrap, flex-grow, flex-shrink, flex-basis, align-items, align-self, justify-content, gap |
| Grid | grid-template-columns, grid-template-rows, grid-column, grid-row |
| Typography | font-size, line-height |
| Visibility | visibility, overflow |
Shorthand properties are expanded: margin: 10px 20px expands to margin-top, margin-right, margin-bottom, margin-left. Similarly for padding, border-width, flex, and gap.
Dimension types
pub enum Dimension {
    Px(f32),        // Absolute pixels
    Percent(f32),   // Percentage of parent
    Calc(f32, f32), // calc() result: (px_component, percent_component)
    Auto,           // Auto sizing
}
The engine resolves em values against the element's computed font-size and rem values against the root font size (16px default).
var() resolution
CSS custom properties are collected during style computation and inherited through the DOM tree:
:root {
--primary-color: #333;
--spacing: 16px;
}
.container {
padding: var(--spacing);
color: var(--primary-color);
}
.card {
margin: var(--spacing-large, 24px); /* fallback value */
}
The var() resolver supports:
- Simple references: var(--name)
- Fallback values: var(--name, fallback)
- Nested var() references in fallbacks
calc() expressions
The calc() parser handles full arithmetic expressions with mixed units:
.element {
width: calc(100% - 32px);
margin: calc(16px + 1em);
padding: calc(2 * var(--spacing));
}
Supported operators: +, -, *, /. The parser respects operator precedence and handles parenthesized sub-expressions. Mixed px and % units are preserved as a Calc(px, percent) dimension and resolved during layout.
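Resolution of a mixed calc() value follows directly from the Calc(px, percent) representation above: take the percent component of the parent size, then add the pixel component. A minimal sketch of that arithmetic, not the engine's actual code:

// width: calc(100% - 32px) is stored as Calc(-32.0, 100.0) -- (px, percent).
fn resolve_calc(px: f32, percent: f32, parent: f32) -> f32 {
    parent * (percent / 100.0) + px
}

// With an 800px-wide parent: 800 * 1.0 + (-32) = 768px.
assert_eq!(resolve_calc(-32.0, 100.0, 800.0), 768.0);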
@media queries
The engine evaluates @media queries against the viewport dimensions provided at parse time:
@media (max-width: 768px) {
.sidebar { display: none; }
}
@media screen and (min-width: 1024px) {
.container { max-width: 1200px; }
}
Supported media features
| Feature | Example | Description |
|---|---|---|
| min-width | (min-width: 768px) | Viewport width >= value |
| max-width | (max-width: 1024px) | Viewport width <= value |
| min-height | (min-height: 600px) | Viewport height >= value |
| max-height | (max-height: 900px) | Viewport height <= value |
| width | (width: 1920px) | Exact viewport width |
| height | (height: 1080px) | Exact viewport height |
| orientation | (orientation: portrait) | Portrait or landscape |
| screen | screen | Always matches |
| print | print | Never matches |
| all | all | Always matches |
Multiple conditions joined with and are evaluated conjunctively. The screen and / all and prefix is stripped before evaluating conditions.
External stylesheets
When using the fetch feature (enabled by default), browsy automatically fetches external CSS linked via <link rel="stylesheet"> tags. Fetched CSS is parsed and merged with inline <style> blocks during style computation.
Resource limits prevent abuse:
- Maximum total CSS bytes (across all external stylesheets)
- Maximum bytes per individual stylesheet
- Blocked URL patterns (analytics, tracking, ad-related CSS)
- Private network and non-HTTP URL blocking
Layout engine
After style computation, browsy feeds the styled tree into Taffy (from the Dioxus project) for layout computation. Taffy handles:
- Flexbox: All flex container and flex item properties
- CSS Grid: Template columns/rows, explicit placement
- Block layout: Standard block flow with margins, padding, borders
Taffy returns bounding boxes (x, y, width, height) for every element, which browsy uses to build the Spatial DOM.
What is NOT supported
The CSS engine focuses on properties that affect element position and size. The following are intentionally not implemented:
- Visual properties: color, background, border-color, border-radius, box-shadow, opacity, z-index
- Transforms: transform, translate, rotate, scale
- Animations: animation, transition, @keyframes
- Pseudo-elements: ::before, ::after, ::placeholder (no content generation)
- Advanced selectors: :nth-child(), :not(), ~ (general sibling), + (adjacent sibling)
- Advanced grid: grid-auto-flow, grid-auto-rows, named grid areas, minmax() in some contexts
- Columns: column-count, column-width
- Table layout: table-layout, border-collapse
These omissions are by design. browsy computes where elements are and how large they are, not what they look like. The Spatial DOM output contains position and size data; color and visual styling are irrelevant for agent interaction.
Architecture
browsy is a zero-render browser engine. It converts raw HTML into a flat list of interactive and text elements with bounding boxes, page type classification, and suggested actions -- without rendering pixels or executing JavaScript.
Pipeline
HTML
│
├──────────────────────────────────────────────────────────────────┐
v │
DOM Parser (html5ever) │
│ │
v │
DomNode tree ──> External CSS fetch (reqwest) ──> merged CSS text │
│ │
v │
CSS Engine (browsy) │
├── Selector matching (tag, class, ID, attribute, combinators) │
├── Property parsing (var(), calc(), shorthands) │
├── @media query evaluation │
└── Specificity + cascade ordering │
│ │
v │
StyledNode tree (LayoutStyle per element) │
│ │
v │
Layout Engine (Taffy) │
├── Flexbox │
├── CSS Grid │
└── Block flow │
│ │
v │
LayoutNode tree (with bounding boxes) │
│ │
v │
Spatial DOM Generator (browsy) │
├── Element emission (interactive + text + landmark + img) │
├── CAPTCHA detection (from tree scan) │
├── Deduplication (wrapper skip) │
├── Hidden content preservation │
├── Text fallback chain (aria-label > title > img alt > svg title) │
├── Label association (<label for="id">) │
├── URL resolution (relative -> absolute) │
├── Page type classification │
└── Suggested action detection │
│ │
v │
SpatialDom │
├── els: Vec<SpatialElement> (flat list with IDs + bounds) │
├── page_type: PageType │
├── suggested_actions: Vec<SuggestedAction> │
├── captcha: Option<CaptchaInfo> │
└── title, url, viewport, scroll │
Entry point
The primary entry point is browsy_core::parse:
pub fn parse(html: &str, viewport_width: f32, viewport_height: f32) -> SpatialDom {
    let dom_tree = dom::parse_html(html);
    let styled = css::compute_styles_with_viewport(&dom_tree, viewport_width, viewport_height);
    let laid_out = layout::compute_layout(&styled, viewport_width, viewport_height);
    output::generate_spatial_dom(&laid_out, viewport_width, viewport_height)
}
For network-aware usage, Session::goto() fetches the HTML, resolves external CSS, and runs the full pipeline.
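A minimal end-to-end sketch of that network path, using the same Session API shown in the CAPTCHA section (error handling elided):

let mut session = Session::new()?;
let dom = session.goto("https://news.ycombinator.com")?;

println!("page type: {:?}", dom.page_type);
for el in dom.els.iter().take(10) {
    println!("[{}:{} {:?}]", el.id, el.tag, el.text);
}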
Project structure
crates/
core/ browsy-core library (the engine)
src/
lib.rs Entry point: parse(html, w, h) -> SpatialDom
dom/mod.rs HTML -> DomNode tree (thin wrapper around html5ever)
css/
mod.rs Style computation, CSS variable inheritance
selector.rs CSS selector matching engine
properties.rs CSS property parsing, var() resolution, calc()
layout/mod.rs Style tree -> Taffy -> bounding boxes
output/mod.rs SpatialDom generation, page type, actions, CAPTCHA
js/mod.rs Behavior detection from HTML attributes
fetch/
mod.rs HTTP fetching, form extraction, resource blocking
session.rs Session API, search, navigation, form interaction
tests/
css_layout.rs CSS + layout integration tests
output.rs Spatial DOM output tests
benchmark.rs Detection accuracy benchmark runner
corpus/ HTML snapshots with ground truth labels
cli/ browsy CLI binary
src/main.rs fetch and parse commands
mcp/ browsy MCP server
src/
lib.rs MCP tool definitions (14 tools)
main.rs stdio server entry point
python/ Python bindings (PyO3)
src/lib.rs Browser, Page, Element classes
browsy/__init__.py Python module
What is ours vs external
browsy depends on two external crates for foundational work:
| Crate | Role | What it does |
|---|---|---|
| html5ever (Mozilla/Servo) | HTML parsing | Converts raw HTML into a DOM tree. Handles malformed HTML, character encoding, and the full HTML5 parsing algorithm. |
| Taffy (Dioxus) | Layout computation | Computes bounding boxes from a style tree. Handles Flexbox, CSS Grid, and block layout. |
Everything else is built from scratch in browsy:
| Component | Description |
|---|---|
| CSS selector matching | Tag, class, ID, attribute selectors (7 operator types), descendant/child combinators, specificity ordering |
| CSS property parsing | ~40 layout properties, shorthand expansion, var() resolution with fallbacks, calc() with full expression parser |
| CSS variables | Custom property collection, inheritance through DOM tree |
| @media queries | min-width, max-width, min-height, max-height, orientation, screen/print |
| Spatial DOM output | Element emission, deduplication, landmark markers, text fallback chains, hidden content exposure, alert detection, table extraction |
| Page intelligence | Page type classification (14 types), suggested action detection (12 action types), CAPTCHA detection (7 CAPTCHA types), pagination detection, verification code extraction |
| Session API | Cookie persistence, navigation history, form state overlay, form submission, compound actions (login, enter_code) |
| Web search | DuckDuckGo and Google result parsing |
| Behavior detection | onclick/ARIA/Bootstrap pattern inference from HTML attributes |
Key design decisions
Hidden content exposure
Elements with display:none, visibility:hidden, aria-hidden="true", or the hidden attribute are NOT discarded. They appear in the Spatial DOM with hidden: true. This is intentional -- agents need to see dropdown menus, accordion panels, modal dialogs, tab content, and other JS-toggled content that is present in the HTML but not visible without JavaScript execution.
Landmark markers
HTML5 landmarks (<nav>, <header>, <footer>, <main>, <aside>, <section>, <form>) and elements with explicit landmark ARIA roles are emitted as structural markers with their role only -- no recursive text collection. Their children carry the actual content. This prevents a <nav> from emitting a giant concatenated string of all its link texts.
Text fallback chain
Interactive elements (links, buttons) that contain no text but only images or icons get their text from a fallback chain:
- aria-label attribute
- title attribute
- Child <img alt> text
- Child <svg><title> text
This ensures that icon-only buttons like a hamburger menu or close button have accessible text in the Spatial DOM.
SVG handling
SVG child elements are not emitted (they are visual, not semantic). However, <svg><title> text is extracted and stored as the SVG element's aria-label, making it available through the text fallback chain.
Deduplication
Wrapper elements that only wrap a single interactive child (like <li><a>..., <td><span>..., <p><a>...) are skipped. Only the meaningful child element is emitted. This prevents duplicate text in the output. When a wrapper has its own text that would not be captured by the child, it is emitted with only its own text.
Zero-size skip
Visible elements with zero width and height are skipped as layout artifacts. Hidden elements are always preserved regardless of size.
Element ID assignment
Element IDs are assigned sequentially (1, 2, 3, ...) during a single parse. IDs are NOT stable across page loads -- they are positional, not content-based. The delta diff system uses content keys (tag + text + href + bounds) rather than IDs to match elements across page transitions.
Testing
Integration tests
Tests live in crates/core/tests/ as integration tests:
cargo test -p browsy-core # all tests
cargo test -p browsy-core --test css_layout # CSS + layout
cargo test -p browsy-core --test output # Spatial DOM output
Detection benchmark
The crates/core/tests/corpus/ directory contains HTML snapshots of real websites with ground truth labels in manifest.json. The benchmark runner parses every snapshot and verifies:
- Correct page type classification
- Correct suggested action detection
- Valid element IDs in all actions (referencing real elements)
- Verification code extraction accuracy
cargo test -p browsy-core --test benchmark -- --nocapture
Adding a new test case:
- Harvest an HTML snapshot with the HARVEST_URL and HARVEST_NAME environment variables.
- Add the expected labels to corpus/manifest.json.
- Run the benchmark to confirm the failure.
- Fix the heuristics in output/mod.rs.
- Re-run the benchmark to confirm the fix with no regressions.
Output formats
JSON
Full structured output via serde_json. All optional fields use skip_serializing_if to keep the JSON compact.
Compact text
A minimal text format designed for LLM token efficiency:
[1:h1 "Page Title"]
[!2:div "Hidden content"]
[3:input:email [email] [*] "Enter email" wide]
[4:button "Submit" full]
[5:a "Link" ->https://example.com @top-R]
Each element is one line: [id:tag "text"] with annotations for type, name, state, size, href, and position.
Delta format
For page transitions, the delta format shows only what changed:
-[3,5,7]
[+8:h1 "New Heading"]
[+9:a "New Link" ->https://example.com]
Removed element IDs are prefixed with -, added/changed elements with +.