Voice & audio agents

Transcribe meetings, summarize calls, generate voice notes — turn audio into text the rest of the system can act on.

The pipeline

Audio in Froots flows through two stages. Transcription runs through one of two skills, `openai-whisper` (local model) or `openai-whisper-api` (cloud); the `summarize` skill then compresses the transcript into something useful. Output is plain markdown, indexed like anything else in your workspace.
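The shape of that pipeline can be sketched in a few lines of Python. This is only an illustration: the page does not show how Froots invokes its skills, so `run_pipeline`, `Transcriber`, and `Summarizer` are hypothetical names, and the two toy functions stand in for the real transcription and summarization skills so the example runs without any model installed.

```python
from typing import Callable

# Hypothetical stand-ins: the real Froots skill interfaces are not shown here.
Transcriber = Callable[[str], str]  # audio path -> transcript text
Summarizer = Callable[[str], str]   # transcript -> shorter summary


def run_pipeline(audio_path: str, transcribe: Transcriber, summarize: Summarizer) -> str:
    """Transcribe, then summarize, and return one plain-markdown document."""
    transcript = transcribe(audio_path)
    summary = summarize(transcript)
    return (
        f"# {audio_path}\n\n"
        f"## Summary\n\n{summary}\n\n"
        f"## Transcript\n\n{transcript}\n"
    )


# Toy implementations so the sketch runs on its own.
def fake_transcribe(path: str) -> str:
    return "We agreed to ship on Friday. Dana will update the changelog."


def first_sentence(text: str) -> str:
    # Crude placeholder for a summarizer: keep only the first sentence.
    return text.split(". ")[0].rstrip(".") + "."


if __name__ == "__main__":
    print(run_pipeline("standup.wav", fake_transcribe, first_sentence))
```

Passing the transcriber in as a parameter reflects the local-or-cloud choice: either backend can be swapped in without touching the summarize step.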

What you can wire up

Meeting capture — point the agent at an audio file or a Zoom recording, get a transcript + summary + action items in workspace/meetings/{date}.md.
Voice notes — record a thought on your phone, drop it in a watched folder, the agent transcribes and files it.
Podcast research — transcribe an episode, surface the bits relevant to a topic you’re researching.
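As a rough sketch of the meeting-capture output, the snippet below renders a transcript, summary, and action items into one markdown file under `workspace/meetings/{date}.md`. The helper names, the ISO date format for `{date}`, and the section layout are all assumptions made for illustration; the page only specifies the path pattern and the three pieces of content.

```python
from datetime import date
from pathlib import Path


def meeting_note_path(workspace: Path, day: date) -> Path:
    # Follows the workspace/meetings/{date}.md pattern; ISO dates are an assumption.
    return workspace / "meetings" / f"{day.isoformat()}.md"


def render_meeting_note(transcript: str, summary: str, action_items: list[str]) -> str:
    # Hypothetical layout: summary first, then a checklist, then the full transcript.
    items = "\n".join(f"- [ ] {item}" for item in action_items)
    return (
        f"## Summary\n\n{summary}\n\n"
        f"## Action items\n\n{items}\n\n"
        f"## Transcript\n\n{transcript}\n"
    )


def save_meeting_note(workspace: Path, day: date, body: str) -> Path:
    path = meeting_note_path(workspace, day)
    path.parent.mkdir(parents=True, exist_ok=True)  # create meetings/ on first use
    path.write_text(body, encoding="utf-8")
    return path
```

A call such as `save_meeting_note(Path("workspace"), date.today(), body)` would then leave a plain markdown file where an indexer can find it.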

Local vs cloud

Path	Pros	Cons
`openai-whisper` (local)	Free, private, no internet	Slower, requires a model download
`openai-whisper-api`	Fast, accurate	Costs cents per file, audio leaves your machine

Both ship. Pick per-task. Sensitive meeting? Local. Quick lunchtime podcast? API.
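The per-task choice can be reduced to a small rule. The function below is a hypothetical helper, not part of Froots: it assumes the only inputs are whether the audio is sensitive and whether the machine is offline, and it returns the name of the skill to use.

```python
def pick_transcription_skill(sensitive: bool, offline: bool = False) -> str:
    """Return the skill name for one task (illustrative rule, not a Froots API)."""
    if sensitive or offline:
        return "openai-whisper"      # local: audio stays on the machine
    return "openai-whisper-api"      # cloud: faster, small per-file cost
```

Under this rule a sensitive meeting goes to the local model and a quick podcast goes to the API.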

Output is markdown — so the rest works

Once the transcript is in the workspace, every other Froots primitive applies. The KB indexer picks it up. Routines can summarize it weekly. Memory recall surfaces it next time you ask about the topic.
