Going local-first with Ollama
Run a real agent fully offline on a laptop. What works, what doesn't, and the trade-offs you actually feel.
I ran Froots fully offline for two weeks on a MacBook Pro M3 with Ollama as the only backend. No cloud calls, no API keys, no internet for the model layer. Here’s what worked, what didn’t, and the trade-offs you actually feel day to day.
The setup
Ollama running locally, serving Llama 3.1 70B for heavy tasks and Llama 3.1 8B for routing and triage. Froots configured to point at http://localhost:11434 instead of any cloud provider. Memory graph, skills, and routines all unchanged — just a different brain on the other end of the adapter.
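If you want to try the same swap, the mechanics are simple: Ollama also exposes an OpenAI-compatible API on the same port, so anything that speaks that protocol can be redirected with a base-URL change. Froots's actual adapter config isn't shown here; this is just a generic sketch of the pattern, and the heavy/light split is my own labeling.

```python
# Sketch: pointing an OpenAI-style client at the local Ollama endpoint
# instead of a cloud provider. Ollama serves an OpenAI-compatible API
# under /v1; the api_key is required by the client but ignored by Ollama.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

HEAVY_MODEL = "llama3.1:70b"   # drafting, coding, longer answers
LIGHT_MODEL = "llama3.1:8b"    # routing, triage, small structured calls

reply = client.chat.completions.create(
    model=LIGHT_MODEL,
    messages=[{"role": "user", "content": "Summarize: renew the domain by Friday."}],
)
print(reply.choices[0].message.content)
```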
Total cost: $0 in API spend, ~40W of extra power draw, and about 45GB of disk for the model weights. The 70B took roughly 90 seconds to load on first call and stayed warm after that.
What worked surprisingly well
Chat felt almost identical to GPT-4o for everyday tasks: drafting emails, summarizing notes, writing tweets, light coding. The 8B model was fast enough to feel instant for intent classification and routing, and Heartbeat's relevance checks (small prompts, structured output) ran beautifully on it.
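For context, here is roughly what one of those small structured calls looks like against Ollama's native API. The labels and prompt are illustrative, not Froots's actual Heartbeat prompt; the point is that a classification this small is well within the 8B's comfort zone.

```python
# Sketch of a small relevance/intent check on the local 8B, using Ollama's
# native /api/chat endpoint with format="json" to force structured output.
import json
import requests

def classify_intent(text: str) -> dict:
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3.1:8b",
            "stream": False,
            "format": "json",  # Ollama constrains the reply to valid JSON
            "messages": [
                {"role": "system",
                 "content": "Classify the user message. Reply as JSON: "
                            '{"intent": "chat|task|reminder", "relevant": true|false}'},
                {"role": "user", "content": text},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["message"]["content"])

print(classify_intent("remind me to renew the domain on Friday"))
```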
Memory commits — the small write-back step at the end of each turn — were also great on the 8B. They’re prompt-heavy but output-light, which is exactly Llama’s wheelhouse.
What didn’t
Long agent loops fell apart. The 70B is good for a few turns, but it's noticeably worse than Claude Opus at staying focused across 20+ tool calls. By turn 12 or so, the model started repeating itself or losing the plot. This is where the frontier models really earn their price.
Code generation was decent but not Cursor-level. The model got the structure right but introduced small bugs that a frontier model wouldn't.
The trade-off, in one sentence
Local-first is excellent for the 80% of tasks that are short, focused, and routine — and noticeably worse for the 20% that are long, exploratory, and complex. So the right setup is mixed: Ollama as the default brain, with a frontier model on standby for the hard runs. Froots lets you do exactly that with a per-conversation model toggle.
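A minimal sketch of what that kind of routing can look like. The decide_backend heuristic, the escalate flag, and the choice of gpt-4o as the standby model are my illustration; Froots's per-conversation toggle is a setting in the app, not this code.

```python
# Sketch of the hybrid setup: local models by default, a frontier model
# only when a run is flagged (or looks) hard.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
cloud = OpenAI()  # reads OPENAI_API_KEY; only touched when escalating

def decide_backend(prompt: str, escalate: bool = False):
    if escalate or len(prompt) > 4000:   # crude stand-in for "hard run"
        return cloud, "gpt-4o"
    if len(prompt) < 500:
        return local, "llama3.1:8b"      # short, routine: free and fast
    return local, "llama3.1:70b"         # heavier, but still offline

def ask(prompt: str, escalate: bool = False) -> str:
    client, model = decide_backend(prompt, escalate)
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return reply.choices[0].message.content
```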
Two weeks later I'm back on a hybrid setup. But I keep Ollama running. Far more one-shot tasks than I expected now go to the local 8B for free instead of to a paid frontier model.