Data extraction
Point an agent at a site and get clean structured data back. No scrapers, no brittle selectors — the run is the spec.
How it’s different from a scraper
Traditional scrapers break the day the site changes a class name. An agent-driven extraction describes the *intent* (“get the product price, the SKU, and the in-stock status from each tile”) and lets the model adapt to whatever the page actually looks like today.
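One way to make that intent concrete is to write down the record shape you expect back and check the agent's output against it. The sketch below assumes the agent returns a JSON array with one object per tile. The field names `sku`, `price`, and `in_stock`, and the helpers `validate_tile` and `parse_extract`, are hypothetical and not part of any tool named on this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductTile:
    """Hypothetical record shape for one product tile."""
    sku: str
    price: float
    in_stock: bool


def validate_tile(raw: object) -> ProductTile:
    """Check one extracted object against the expected shape."""
    if not isinstance(raw, dict):
        raise ValueError(f"expected an object per tile, got: {raw!r}")
    sku = raw.get("sku")
    price = raw.get("price")
    in_stock = raw.get("in_stock")
    if not isinstance(sku, str) or not sku:
        raise ValueError(f"missing or empty sku: {raw!r}")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        raise ValueError(f"price is not a number: {raw!r}")
    if not isinstance(in_stock, bool):
        raise ValueError(f"in_stock is not a boolean: {raw!r}")
    return ProductTile(sku=sku, price=float(price), in_stock=in_stock)


def parse_extract(text: str) -> list[ProductTile]:
    """Parse the agent's JSON output (assumed to be an array of objects)."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of tiles")
    return [validate_tile(item) for item in data]
```

The check says nothing about how the page is laid out, which is the point: the selectors can change freely as long as the records that come back still have this shape.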
What it uses
- `browser-automation` to drive the page.
- `agent_write_doc` to dump structured output (JSON, markdown table, CSV) into your workspace.
- `browser_snapshot` at each step so you can audit what the agent actually saw before it wrote anything.
A good shape
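Tell the agent: _“Visit <url>, pull every row from the listings table into workspace/extracts/{date}-listings.json, and if the row count is below 10, stop and flag it for review instead.”_ The prompt names the source, the output file, and a sanity check, so a partial extraction gets caught instead of being written silently.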
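The same row-count check can also be run on the agent's output after the fact. This is a minimal sketch, not how the agent itself does it: `save_or_flag` is a hypothetical helper, `{date}` is assumed to be an ISO date such as `2024-01-02`, and the `.review.json` suffix is an invented convention for marking short extracts.

```python
import json
from datetime import date
from pathlib import Path

MIN_ROWS = 10  # threshold taken from the prompt above


def save_or_flag(rows: list[dict], root: Path, today: date) -> Path:
    """Write rows to the dated extract file, or to a review file if too few."""
    out_dir = root / "extracts"
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = f"{today.isoformat()}-listings"
    if len(rows) < MIN_ROWS:
        # Hypothetical convention: park short extracts under a review name
        # so they are never mistaken for a complete run.
        target = out_dir / f"{stem}.review.json"
    else:
        target = out_dir / f"{stem}.json"
    target.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return target
```

Calling `save_or_flag(rows, Path("workspace"), date.today())` returns the path that was written, so a caller can tell from the file name whether the run needs a human look.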
When to fall back
If you need millions of pages, you want a real scraper. Agent-driven extraction is best at moderate scale, irregular pages, and one-shot tasks where the cost of writing brittle selectors exceeds the cost of an LLM call.
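The trade-off can be put as rough arithmetic: the agent costs something per page, while a scraper costs something up front to build and then very little per page. The sketch below is only a back-of-envelope comparison; `cheaper_option` and all of its parameters are hypothetical, and the figures in the usage note are illustrative placeholders, not measured costs.

```python
def cheaper_option(
    pages: int,
    llm_cost_per_page: float,
    scraper_build_cost: float,
    scraper_cost_per_page: float = 0.0,
) -> str:
    """Return "agent" or "scraper", whichever has the lower total cost."""
    agent_total = pages * llm_cost_per_page
    scraper_total = scraper_build_cost + pages * scraper_cost_per_page
    # Ties go to the agent, since it needs no selector maintenance.
    return "agent" if agent_total <= scraper_total else "scraper"
```

With placeholder inputs of 0.02 per page for the agent and 400 to build a scraper, `cheaper_option(500, 0.02, 400.0)` returns `"agent"`, while `cheaper_option(2_000_000, 0.02, 400.0)` returns `"scraper"`. Plug in your own numbers, and remember that scraper maintenance after site changes is not modeled here.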