Speech-to-Text App
Building a real-time speech-to-text application using Whisper AI, streamed live from first commit to production.
Tech Stack
This project is a real-time speech-to-text app built on OpenAI’s Whisper model. The goal is simple: take audio input from a phone or laptop mic and return an accurate transcription in near real time. Every line of code is written live on stream, so you can see the real process: the debugging, the dead ends, and the breakthroughs.
The backend runs on FastAPI, with Whisper doing the transcription. Right now the focus is on getting latency low enough for live conversations, not just batch processing of recorded audio. That means chunked audio streaming, smart buffering, and finding the point where Whisper’s accuracy starts to drop off.
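To make the chunking and buffering idea concrete, here is a rough sketch of one way it could work. It is not the project's actual code. It assumes the client sends raw 16-bit mono PCM at 16 kHz (the rate Whisper models work on), and the names ChunkBuffer, window_s, and overlap_s are made up for illustration. The buffer collects small incoming chunks and hands back fixed-length windows that overlap slightly, so a word cut off at the end of one window shows up whole in the next.

```python
from dataclasses import dataclass, field

SAMPLE_RATE = 16_000   # Hz; assumed input rate (Whisper works on 16 kHz mono)
BYTES_PER_SAMPLE = 2   # assumed wire format: 16-bit PCM


@dataclass
class ChunkBuffer:
    """Hypothetical helper: collects PCM bytes, emits overlapping windows."""

    window_s: float = 5.0    # illustrative window length, not a tuned value
    overlap_s: float = 1.0   # illustrative overlap carried into the next window
    _buf: bytearray = field(default_factory=bytearray, init=False, repr=False)

    def __post_init__(self) -> None:
        # A zero or negative step would never drain the buffer.
        if not 0 <= self.overlap_s < self.window_s:
            raise ValueError("overlap_s must be >= 0 and smaller than window_s")

    @property
    def window_bytes(self) -> int:
        return round(self.window_s * SAMPLE_RATE) * BYTES_PER_SAMPLE

    @property
    def step_bytes(self) -> int:
        step_s = self.window_s - self.overlap_s
        return round(step_s * SAMPLE_RATE) * BYTES_PER_SAMPLE

    def push(self, chunk: bytes) -> list[bytes]:
        """Add one incoming chunk and return every full window now available."""
        self._buf.extend(chunk)
        windows = []
        while len(self._buf) >= self.window_bytes:
            windows.append(bytes(self._buf[: self.window_bytes]))
            # Drop only the non-overlapping part; the tail stays for next time.
            del self._buf[: self.step_bytes]
        return windows
```

In a FastAPI WebSocket handler, each received binary message would go through push(), and every window that comes back would be sent to the transcription step. The trade-off is the one described above: shorter windows cut latency but give the model less context, and the overlap means the same words can be transcribed twice and need merging afterwards.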
If you want to see how an AI app actually gets built from scratch (not a polished tutorial, but the real, messy process), this is it. Catch the streams live, or check back here for progress updates as the project moves forward.