Data extraction

Point an agent at a site and get clean structured data back. No scrapers, no brittle selectors — the run is the spec.

How it’s different from a scraper

Traditional scrapers break the day the site changes a class name. An agent-driven extraction describes the *intent* (“get the product price, the SKU, and the in-stock status from each tile”) and lets the model adapt to whatever the page actually looks like today.
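One way to make that intent concrete is to write down the fields you expect and check whatever the agent returns against them. The sketch below is illustrative only: `ProductTile`, `parse_tile`, and the record keys (`sku`, `price`, `in_stock`) are hypothetical names chosen for this example, not part of any tool listed on this page, and the price cleanup assumes a `$` symbol with `.` as the decimal mark.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class ProductTile:
    """One extracted tile. Field names are illustrative, not a fixed schema."""
    sku: str
    price: Decimal
    in_stock: bool


def parse_tile(record: dict) -> ProductTile:
    """Validate one record returned by the agent and normalise its types."""
    sku = str(record.get("sku", "")).strip()
    if not sku:
        raise ValueError("missing SKU")

    # Strip the currency symbol and thousands separators before parsing.
    raw_price = str(record.get("price", "")).replace("$", "").replace(",", "").strip()
    try:
        price = Decimal(raw_price)
    except InvalidOperation as exc:
        raise ValueError(f"unparseable price: {record.get('price')!r}") from exc

    in_stock = record.get("in_stock")
    if not isinstance(in_stock, bool):
        raise ValueError("in_stock must be true or false")

    return ProductTile(sku=sku, price=price, in_stock=in_stock)
```

The schema describes what you want, not where it lives on the page, so it keeps working when the markup changes. A record that fails validation is a signal to review the run rather than to patch a selector.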

What it uses

  • `browser-automation` to drive the page.
  • `agent_write_doc` to dump structured output (JSON, markdown table, CSV) into your workspace.
  • `browser_snapshot` at each step so you can audit what the agent actually saw before it wrote anything.

A good shape

Tell the agent: _“Visit <url>, pull every row from the listings table into workspace/extracts/{date}-listings.json, and stop if the row count is below 10 — flag it for review instead.”_
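The stop condition in that prompt can also be enforced after the run, as a cheap second check. The following is a minimal sketch under stated assumptions: `save_or_flag`, the `needs_review` status, and the `.review.json` suffix are hypothetical names invented for this example, and the workspace is treated as an ordinary directory on disk.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional

MIN_ROWS = 10  # threshold taken from the example prompt


def save_or_flag(rows: list, workspace: Path, today: Optional[date] = None) -> Path:
    """Write rows to extracts/{date}-listings.json, or flag a short run for review."""
    today = today or date.today()
    out_dir = workspace / "extracts"
    out_dir.mkdir(parents=True, exist_ok=True)

    if len(rows) < MIN_ROWS:
        # Too few rows usually means the page changed or the run stopped early.
        # The ".review.json" suffix is a made-up convention for this sketch.
        path = out_dir / f"{today.isoformat()}-listings.review.json"
        payload = {
            "status": "needs_review",
            "reason": f"only {len(rows)} rows, expected at least {MIN_ROWS}",
            "rows": rows,
        }
    else:
        path = out_dir / f"{today.isoformat()}-listings.json"
        payload = rows

    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Writing the short run to a separate file keeps a suspicious extract out of the normal output path while preserving the rows for whoever reviews it.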

The output also lands in memory: once the JSON is in the workspace, the indexer chunks and embeds it, so future agents can ask “what did we see last Tuesday?” without re-running the extraction.

When to fall back

If you need millions of pages, you want a real scraper. Agent-driven extraction is best at moderate scale, irregular pages, and one-shot tasks where the cost of writing brittle selectors exceeds the cost of an LLM call.