Ashish Nayak

AI Audio Transcriber

Voice transcription app built on Cloudflare’s serverless platform that processes audio with OpenAI’s Whisper and Meta’s Llama models.

Project Overview

A real-time voice transcription application that uses Cloudflare’s edge AI to convert speech into text and summarize it automatically. Users can record audio directly in their browser and receive a transcription of it, along with an AI-generated summary of the content.

Record & Upload

Users can record audio directly in the browser or upload existing files. Recordings are kept in Cloudflare Durable Objects for later playback and reuse.

Transcribe & Summarize

The app transcribes audio with OpenAI’s Whisper model and generates concise summaries with Meta’s Llama 3.1; a Cloudflare Worker runs both models in sequence on Cloudflare AI.

View the live demo here.

Technical Implementation

Built on Cloudflare’s serverless Workers platform, this web application performs real-time audio transcription and summarization entirely at the edge. The frontend uses the browser’s MediaRecorder API (part of the MediaStream Recording API) to capture microphone input, and it also accepts uploaded audio files.
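The 10-second auto-stop listed under Demo Features can be sketched as a small helper around the recorder. This is a minimal sketch, not the project’s code: `startWithAutoStop` and the `StartStop` interface are hypothetical names, and the interface only mirrors the `state`, `start()`, and `stop()` members of the browser’s `MediaRecorder` so the timing logic can be exercised without a microphone.

```typescript
// Hypothetical subset of the browser MediaRecorder interface used here.
// A real MediaRecorder has these members; tests can pass a stand-in object.
interface StartStop {
  readonly state: string; // "inactive" | "recording" | "paused" on a real MediaRecorder
  start(): void;
  stop(): void;
}

/**
 * Hypothetical helper: starts recording and stops automatically after maxMs
 * (10 seconds by default, matching the demo's auto-stop timer). Returns a
 * function that stops early, e.g. from a "Stop" button, and cancels the timer.
 */
function startWithAutoStop(recorder: StartStop, maxMs = 10_000): () => void {
  recorder.start();

  const timer = setTimeout(() => {
    // Only stop if the recording has not already been stopped by the user.
    if (recorder.state !== "inactive") recorder.stop();
  }, maxMs);

  return () => {
    clearTimeout(timer); // the timer must not fire after a manual stop
    if (recorder.state !== "inactive") recorder.stop();
  };
}
```

In the browser, the recorder would typically be `new MediaRecorder(stream)` with `stream` from `navigator.mediaDevices.getUserMedia({ audio: true })`; chunks delivered to its `dataavailable` event can be combined into a `Blob` in the `stop` handler before upload. Returning a cancel function lets a manual Stop button and the timer share one path without calling `stop()` twice.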

Once recorded, audio data is sent to a Cloudflare Worker, which orchestrates two AI models in sequence:

OpenAI Whisper for high-accuracy speech-to-text transcription
Meta Llama 3.1 for natural-language summarization of the transcript
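A Worker that chains the two models might look like the sketch below. It assumes a Workers AI binding named `AI`, the `@cf/openai/whisper` model, and the `@cf/meta/llama-3.1-8b-instruct` variant (the page does not say which Llama 3.1 size is used); `transcribeAndSummarize`, the `AiBinding` interface, and the system prompt are illustrative, not the project’s actual code. Whisper on Workers AI takes audio as an array of byte values and returns the transcript in `text`, while the Llama text-generation models return their output in `response`.

```typescript
// Minimal shape of the Workers AI binding (env.AI) used below.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Assumed model IDs from the Workers AI catalog; the Llama 3.1 size is a guess.
const WHISPER_MODEL = "@cf/openai/whisper";
const LLAMA_MODEL = "@cf/meta/llama-3.1-8b-instruct";

interface TranscriptionResult {
  transcript: string;
  summary: string;
}

// Runs the two models in sequence: Whisper for speech-to-text, then Llama to summarize.
async function transcribeAndSummarize(ai: AiBinding, audio: Uint8Array): Promise<TranscriptionResult> {
  // Whisper takes the audio as an array of byte values and returns { text, ... }.
  const stt = (await ai.run(WHISPER_MODEL, { audio: Array.from(audio) })) as { text?: string };
  const transcript = (stt.text ?? "").trim();
  if (transcript === "") return { transcript, summary: "" }; // nothing to summarize

  // Llama text-generation models accept chat messages and return { response, ... }.
  const llm = (await ai.run(LLAMA_MODEL, {
    messages: [
      { role: "system", content: "Summarize the following transcript in two or three sentences." },
      { role: "user", content: transcript },
    ],
  })) as { response?: string };

  return { transcript, summary: (llm.response ?? "").trim() };
}

// Worker entry point (hypothetical route): POST raw audio bytes, get { transcript, summary } as JSON.
const worker = {
  async fetch(request: Request, env: { AI: AiBinding }): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method Not Allowed", { status: 405 });
    }
    const audio = new Uint8Array(await request.arrayBuffer());
    const result = await transcribeAndSummarize(env.AI, audio);
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Skipping the Llama call when Whisper returns no text avoids spending an inference on silence. The audio is posted as the raw request body here for simplicity; the real app may send form data or JSON instead.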

Demo Features

Browser-based recording with a 10-second auto-stop timer
Upload and select stored recordings via Cloudflare Durable Objects
Edge-processed transcription and summarization with JSON responses
Save and reload past recordings from persistent storage
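Saving and reloading recordings could be handled by a class like the one below, meant to sit inside a Durable Object. It is a hypothetical sketch: `RecordingStore`, the `audio:`/`meta:` key scheme, and `RecordingMeta` are assumptions, and `StorageLike` mirrors only the `get`, `put`, and `list({ prefix })` methods of Durable Object storage. `MemStorage` is an in-memory stand-in for trying the logic outside Cloudflare.

```typescript
// Subset of the Durable Object storage API (get/put/list) used here.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
  list<T>(options: { prefix: string }): Promise<Map<string, T>>;
}

interface RecordingMeta {
  id: string;
  createdAt: number; // epoch milliseconds
  mimeType: string;
  byteLength: number;
}

// Hypothetical store: audio under "audio:<id>", metadata under "meta:<id>".
class RecordingStore {
  private readonly storage: StorageLike;

  constructor(storage: StorageLike) {
    this.storage = storage;
  }

  async save(id: string, mimeType: string, bytes: Uint8Array, now = Date.now()): Promise<RecordingMeta> {
    const meta: RecordingMeta = { id, createdAt: now, mimeType, byteLength: bytes.byteLength };
    await this.storage.put(`audio:${id}`, bytes);
    await this.storage.put(`meta:${id}`, meta);
    return meta;
  }

  async load(id: string): Promise<Uint8Array | undefined> {
    return this.storage.get<Uint8Array>(`audio:${id}`);
  }

  // Lists saved recordings, newest first, reading only the metadata keys.
  async listRecent(): Promise<RecordingMeta[]> {
    const entries = await this.storage.list<RecordingMeta>({ prefix: "meta:" });
    return Array.from(entries.values()).sort((a, b) => b.createdAt - a.createdAt);
  }
}

// In-memory stand-in for local testing (insertion order, unlike the real key-sorted list).
class MemStorage implements StorageLike {
  private readonly data = new Map<string, unknown>();

  async get<T>(key: string): Promise<T | undefined> {
    return this.data.get(key) as T | undefined;
  }

  async put<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }

  async list<T>(options: { prefix: string }): Promise<Map<string, T>> {
    const out = new Map<string, T>();
    this.data.forEach((value, key) => {
      if (key.startsWith(options.prefix)) out.set(key, value as T);
    });
    return out;
  }
}
```

Inside a Durable Object, the store would wrap `this.ctx.storage` (or `state.storage` in the older constructor style). Keeping metadata under its own prefix lets the recordings list be built without reading any audio bytes.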

Because the Worker and both models run on Cloudflare’s serverless infrastructure, there are no servers or GPUs to provision, capacity scales with request volume, and requests are handled inside Cloudflare’s network, which keeps latency and cost low and makes the system well suited to transcription and summarization at scale.

Stack: Cloudflare Workers, Cloudflare AI (Whisper + Llama 3.1), Cloudflare Durable Objects, TypeScript, MediaRecorder API, HTML/CSS/JS
View the Live Demo or check out the project repo here!

Completed October 2025.