Voice & audio agents
Transcribe meetings, summarize calls, generate voice notes — turn audio into text the rest of the system can act on.
The pipeline
Audio in Froots flows through two stages. A transcription skill, either openai-whisper (local model) or openai-whisper-api (cloud), turns the audio into text; then summarize compresses the transcript into something useful. Output is plain markdown, indexed like anything else in your workspace.
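The two stages can be sketched in Python. This is a minimal illustration, not Froots' actual skill interface, which this page does not show: `run_pipeline`, `transcribe_local`, and `first_sentence` are hypothetical names, and the local path assumes the skill wraps the open-source `openai-whisper` package (`pip install openai-whisper`).

```python
from pathlib import Path
from typing import Callable

# Each stage is a plain callable, so local and cloud transcription are interchangeable.
Transcriber = Callable[[Path], str]
Summarizer = Callable[[str], str]


def transcribe_local(audio: Path, model_name: str = "base") -> str:
    """Transcribe with the open-source whisper package (assumed backend)."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    model = whisper.load_model(model_name)  # downloads the model on first use
    return model.transcribe(str(audio))["text"].strip()


def first_sentence(text: str) -> str:
    """Placeholder summarizer: keeps only the first sentence."""
    head, sep, _ = text.partition(". ")
    return head + "." if sep else text


def run_pipeline(audio: Path, transcribe: Transcriber, summarize: Summarizer) -> str:
    """Stage 1: audio to transcript. Stage 2: transcript to summary. Returns markdown."""
    transcript = transcribe(audio)
    summary = summarize(transcript)
    return (
        f"# {audio.stem}\n\n"
        f"## Summary\n\n{summary}\n\n"
        f"## Transcript\n\n{transcript}\n"
    )
```

Calling `run_pipeline(Path("standup.wav"), transcribe_local, first_sentence)` would run the local model; passing a different transcriber swaps in the cloud path without touching the summarize stage.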
What you can wire up
- Meeting capture: point the agent at an audio file or a Zoom recording, get a transcript + summary + action items in workspace/meetings/{date}.md.
- Voice notes: record a thought on your phone, drop it in a watched folder, the agent transcribes and files it.
- Podcast research: transcribe an episode, surface the bits relevant to a topic you’re researching.
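As an illustration of the meeting-capture output, here is a rough sketch of filing a note at workspace/meetings/{date}.md. It assumes `{date}` means an ISO date such as 2024-05-17, which this page does not specify; the function names and the section layout are hypothetical too.

```python
import tempfile
from datetime import date
from pathlib import Path


def meeting_note_path(workspace: Path, day: date) -> Path:
    # Assumption: {date} is rendered as an ISO date (YYYY-MM-DD).
    return workspace / "meetings" / f"{day.isoformat()}.md"


def write_meeting_note(
    workspace: Path,
    day: date,
    transcript: str,
    summary: str,
    action_items: list[str],
) -> Path:
    """Write transcript, summary, and action items as one markdown file."""
    path = meeting_note_path(workspace, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Render action items as a markdown task list, or a placeholder if empty.
    items = "\n".join(f"- [ ] {item}" for item in action_items) or "- (none)"
    path.write_text(
        f"# Meeting {day.isoformat()}\n\n"
        f"## Summary\n\n{summary}\n\n"
        f"## Action items\n\n{items}\n\n"
        f"## Transcript\n\n{transcript}\n",
        encoding="utf-8",
    )
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        note = write_meeting_note(
            Path(tmp), date.today(), "Full transcript...", "Short summary.", ["Send notes"]
        )
        print(note.read_text(encoding="utf-8"))
```

Because the result is a plain markdown file, anything that indexes the workspace can pick it up without a special case for audio.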
Local vs cloud
| Path | Pros | Cons |
|---|---|---|
| openai-whisper (local) | Free, private, no internet | Slower, model download |
| openai-whisper-api | Fast, accurate | Costs cents per file, leaves your machine |
Both ship. Pick per-task. Sensitive meeting? Local. Quick lunchtime podcast? API.
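The per-task choice can be expressed as a small routing rule. This is only a sketch of the trade-off in the table: `AudioTask` and `pick_transcription_skill` are hypothetical names, and the only identifiers taken from this page are the two skill names.

```python
from dataclasses import dataclass

LOCAL_SKILL = "openai-whisper"      # free, private, works offline
CLOUD_SKILL = "openai-whisper-api"  # fast, but audio leaves your machine


@dataclass(frozen=True)
class AudioTask:
    name: str
    sensitive: bool = False  # e.g. a confidential meeting
    online: bool = True      # whether an internet connection is available


def pick_transcription_skill(task: AudioTask) -> str:
    """Keep sensitive or offline work local; send everything else to the API."""
    if task.sensitive or not task.online:
        return LOCAL_SKILL
    return CLOUD_SKILL
```

For example, `pick_transcription_skill(AudioTask("board meeting", sensitive=True))` selects the local skill, while a non-sensitive podcast with a connection available goes to the API.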
Output is markdown — so the rest works
Once the transcript is in the workspace, every other Froots primitive applies. The KB indexer picks it up. Routines can summarize it weekly. Memory recall surfaces it next time you ask about the topic.