Data extraction

Point an agent at a site and get clean structured data back. No scrapers, no brittle selectors — the run is the spec.

How it’s different from a scraper

Traditional scrapers break the day the site changes a class name. An agent-driven extraction describes the *intent* (“get the product price, the SKU, and the in-stock status from each tile”) and lets the model adapt to whatever the page actually looks like today.
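Stating the intent as a small schema can make it easier to check what comes back. The sketch below is one possible shape, not part of the product: `ProductTile` and `parse_tile` are hypothetical names, and the field names `sku`, `price`, and `in_stock` are assumed from the example prompt above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProductTile:
    """Hypothetical record for one product tile (fields assumed from the prompt)."""
    sku: str
    price: float
    in_stock: bool


def parse_tile(raw: dict[str, Any]) -> ProductTile:
    """Validate one record the agent returned; raise ValueError if the shape is off."""
    missing = {"sku", "price", "in_stock"} - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(raw["in_stock"], bool):
        raise ValueError("in_stock must be a boolean")
    return ProductTile(
        sku=str(raw["sku"]).strip(),
        price=float(raw["price"]),  # raises ValueError for strings like "$19.99"
        in_stock=raw["in_stock"],
    )
```

The schema describes what you want, not where it lives on the page, so it stays valid when the markup changes.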

What it uses

`browser-automation` to drive the page.
`agent_write_doc` to dump structured output (JSON, markdown table, CSV) into your workspace.
`browser_snapshot` at each step so you can audit what the agent actually saw before it wrote anything.

A good shape

Tell the agent: _“Visit <url>, pull every row from the listings table into workspace/extracts/{date}-listings.json, and stop if the row count is below 10 — flag it for review instead.”_
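The same guard rail can be mirrored in a post-run check. This is a minimal sketch under stated assumptions: `save_or_flag` is a hypothetical helper, `{date}` is assumed to mean an ISO date (YYYY-MM-DD), and the "flag for review" step is reduced to a returned status rather than any real notification.

```python
import json
from datetime import date
from pathlib import Path

MIN_ROWS = 10  # threshold taken from the example prompt


def save_or_flag(rows, workspace=Path("workspace"), today=None):
    """Write rows to extracts/{date}-listings.json, or flag the run if too few rows."""
    today = today or date.today()
    if len(rows) < MIN_ROWS:
        # Too few rows usually means the page changed or failed to load:
        # write nothing and hand the run to a human instead.
        return {"status": "needs_review", "row_count": len(rows), "path": None}
    out = workspace / "extracts" / f"{today.isoformat()}-listings.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return {"status": "written", "row_count": len(rows), "path": str(out)}
```

Refusing to write a short extract keeps a partial or broken run from being stored as if it were a complete one.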

The output lands in memory: once the JSON is in the workspace, the indexer chunks and embeds it, so future agents can ask “what did we see last Tuesday?” without re-running the extraction.

When to fall back

If you need millions of pages, you want a real scraper. Agent-driven extraction is best at moderate scale, irregular pages, and one-shot tasks where the cost of writing brittle selectors exceeds the cost of an LLM call.
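The trade-off can be written as a rough break-even calculation. The sketch below is illustrative only: `break_even_pages` is a hypothetical helper, and the figures in the usage lines are placeholders, not measured costs.

```python
def break_even_pages(
    selector_cost: float,
    agent_cost_per_page: float,
    scraper_cost_per_page: float = 0.0,
) -> float:
    """Page count above which a hand-written scraper is cheaper than the agent.

    selector_cost is the one-off cost of writing and maintaining selectors;
    the per-page costs are the marginal cost of each approach.
    """
    delta = agent_cost_per_page - scraper_cost_per_page
    if delta <= 0:
        # The agent is never more expensive per page, so the scraper never wins.
        return float("inf")
    return selector_cost / delta


# Placeholder numbers: a one-off selector cost of 2000 and 0.02 per agent page.
pages = break_even_pages(selector_cost=2000.0, agent_cost_per_page=0.02)
print(f"scraper pays off after roughly {pages:,.0f} pages")
```

Below the break-even point the agent is the cheaper option; well above it, the fixed cost of a scraper is spread thin enough to win.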
