Page Types Reference

browsy classifies every page into a PageType to help agents decide what to do next. The classification is based on structural heuristics applied to the Spatial DOM -- no machine learning, no external services.

Page types are evaluated in priority order. The first match wins.

PageType enum

pub enum PageType {
    Error,
    Captcha,
    Login,
    TwoFactorAuth,
    OAuthConsent,
    Inbox,
    EmailBody,
    Dashboard,
    Article,
    SearchResults,
    List,
    Search,
    Form,
    Other,          // default
}

Detection criteria

| Page Type | Detection Criteria |
|---|---|
| Error | Title contains an HTTP error code or error phrase (404, 500, 403, not found, error) OR page has elements with alert_type == "error". |
| Captcha | Title contains CAPTCHA keywords (captcha, verify you're human, robot, security check, just a moment, attention required) OR heading contains CAPTCHA phrases OR a CAPTCHA service (reCAPTCHA, hCaptcha, Turnstile, Cloudflare challenge) is detected in the HTML structure. |
| Login | Page has a visible <input type="password">. |
| TwoFactorAuth | Title or heading contains verification keywords (verification, enter code, security code, 2fa, two-factor, otp, one-time, passcode) AND page has a visible text/number/tel input. No password field is present (that would be Login). |
| OAuthConsent | Title or heading contains OAuth keywords (authorize, allow access, grant permission, oauth, consent). |
| Inbox | Title contains inbox keywords (inbox, mail, messages) AND page has 10+ visible links. |
| EmailBody | Page text contains 3+ of the email markers: from:, to:, subject:, date:. |
| Dashboard | Title or heading contains dashboard keywords (dashboard, welcome back, overview) AND page has both a <nav> and a <main> landmark. |
| Article | Page has 3+ headings AND enough long paragraphs (>100 chars). When the page has 20+ links, the threshold is 10 long paragraphs (vs 2 for low-link pages). Pages with 15+ headings must have a paragraph-to-heading ratio of at least 0.8 to distinguish articles (Wikipedia) from heading-heavy list pages (BBC News). |
| SearchResults | Page has a search input (visible or hidden) AND 8+ links AND search context: title/heading contains search-result keywords (search results, results for, search) OR URL contains search query parameters (?q=, ?query=, ?s=, ?search=, /search). |
| List | Page has 10+ visible links. Evaluated after Article and SearchResults. |
| Search | Page has a visible search input. Evaluated after List (many list pages have search bars in navigation). Also fires as a fallback when a page has fewer than 5 visible elements but has a hidden search input (common in JS-rendered search engines without JS execution). |
| Form | Page has 2+ visible data-entry inputs (excludes checkbox, radio, hidden, submit, button, and image inputs). |
| Other | Default when no heuristic matches. |
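The Article thresholds are the most arithmetic-heavy of the criteria above, so a small sketch may help. This is not browsy's actual implementation: the `PageCounts` struct, its field names, and the `looks_like_article` function are hypothetical, and the sketch assumes the paragraph-to-heading ratio is computed from the long-paragraph count, which the criteria do not state explicitly.

```rust
/// Hypothetical summary of a page; not a browsy type.
pub struct PageCounts {
    /// Number of headings on the page.
    pub headings: usize,
    /// Number of paragraphs longer than 100 characters.
    pub long_paragraphs: usize,
    /// Number of links on the page.
    pub links: usize,
}

/// Sketch of the Article rule as described in the criteria.
pub fn looks_like_article(c: &PageCounts) -> bool {
    // An article needs at least 3 headings.
    if c.headings < 3 {
        return false;
    }

    // Link-heavy pages (20+ links) need 10 long paragraphs; others need 2.
    let required_paragraphs = if c.links >= 20 { 10 } else { 2 };
    if c.long_paragraphs < required_paragraphs {
        return false;
    }

    // Heading-heavy pages (15+ headings) must also reach a
    // paragraph-to-heading ratio of 0.8.
    // Assumption: the ratio uses the long-paragraph count.
    if c.headings >= 15 {
        let ratio = c.long_paragraphs as f64 / c.headings as f64;
        if ratio < 0.8 {
            return false;
        }
    }

    true
}
```

Under these assumptions, a page with 4 headings, 3 long paragraphs, and 5 links passes, while the same page with 25 links does not, because the long-paragraph threshold rises from 2 to 10.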

Evaluation order

The order matters. For example:

  • A login page with a search bar in the nav is classified as Login (password field check comes first), not Search.
  • A search results page with many links is SearchResults, not List, because SearchResults is checked before List.
  • An article with a search bar is Article, not Search, because Article is checked first.
  • An error page with a login form is Error, because error checks come before Login.
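The first-match-wins behavior behind these examples can be sketched as a chain of early returns. This is a simplified illustration rather than browsy's code: it covers only a subset of the page types, and the `Signals` struct, its fields, and the `classify` function are hypothetical stand-ins for checks that the real classifier runs against the Spatial DOM.

```rust
/// Reduced copy of the enum, limited to the types used in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageType {
    Error,
    Login,
    Article,
    SearchResults,
    List,
    Search,
    Other,
}

/// Hypothetical pre-computed signals; not a browsy type.
#[derive(Default)]
pub struct Signals {
    pub has_error_signal: bool,     // error title or alert_type == "error"
    pub has_visible_password: bool, // visible <input type="password">
    pub looks_like_article: bool,   // result of the Article thresholds
    pub has_search_input: bool,     // search input present
    pub has_search_context: bool,   // search keywords in title/heading or URL
    pub links: usize,               // link count (simplified to one counter)
}

/// Checks run in priority order; the first match wins.
pub fn classify(s: &Signals) -> PageType {
    if s.has_error_signal {
        return PageType::Error;
    }
    if s.has_visible_password {
        return PageType::Login;
    }
    if s.looks_like_article {
        return PageType::Article;
    }
    if s.has_search_input && s.links >= 8 && s.has_search_context {
        return PageType::SearchResults;
    }
    if s.links >= 10 {
        return PageType::List;
    }
    if s.has_search_input {
        return PageType::Search;
    }
    PageType::Other
}
```

With this ordering, a page that sets both `has_visible_password` and `has_search_input` comes back as `Login`, and a page that sets `has_error_signal` alongside a password field comes back as `Error`.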

Accessing page type

Rust

use browsy_core::output::PageType;

let dom = browsy_core::parse(html, 1920.0, 1080.0);
match dom.page_type {
    PageType::Login => println!("This is a login page"),
    PageType::Article => println!("This is an article"),
    _ => println!("Page type: {:?}", dom.page_type),
}

Python

page = browser.goto("https://example.com")
print(page.page_type())  # "Login", "Article", "Other", etc.

MCP

The page_info tool returns page_type as a string. The browse tool also includes it in its JSON output format.

JSON serialization

PageType is serialized as a string. The field is omitted from JSON when the value is Other (via skip_serializing_if).

{
  "page_type": "Login",
  "title": "Sign In",
  "url": "https://example.com/login"
}
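Because the field is omitted for Other, a consumer of this JSON has to treat a missing page_type as Other rather than as an error. The sketch below shows one way to do that once the field has been extracted as an optional string. The `page_type_from_field` helper is hypothetical, the enum is a local copy rather than the browsy_core type, and the sketch assumes each variant serializes under its variant name (the source confirms this only for "Login", "Article", and "Other"). Mapping unrecognized strings to Other is also an assumption.

```rust
/// Local copy of the enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PageType {
    Error,
    Captcha,
    Login,
    TwoFactorAuth,
    OAuthConsent,
    Inbox,
    EmailBody,
    Dashboard,
    Article,
    SearchResults,
    List,
    Search,
    Form,
    Other,
}

/// Maps the optional "page_type" JSON field to a PageType.
/// `None` means the field was absent, which the serializer uses for Other.
pub fn page_type_from_field(field: Option<&str>) -> PageType {
    match field {
        None => PageType::Other,
        Some("Error") => PageType::Error,
        Some("Captcha") => PageType::Captcha,
        Some("Login") => PageType::Login,
        Some("TwoFactorAuth") => PageType::TwoFactorAuth,
        Some("OAuthConsent") => PageType::OAuthConsent,
        Some("Inbox") => PageType::Inbox,
        Some("EmailBody") => PageType::EmailBody,
        Some("Dashboard") => PageType::Dashboard,
        Some("Article") => PageType::Article,
        Some("SearchResults") => PageType::SearchResults,
        Some("List") => PageType::List,
        Some("Search") => PageType::Search,
        Some("Form") => PageType::Form,
        // Assumption: unknown strings fall back to Other.
        Some(_) => PageType::Other,
    }
}
```

For the JSON shown above, `page_type_from_field(Some("Login"))` yields `PageType::Login`; for a document with no page_type key, `page_type_from_field(None)` yields `PageType::Other`.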