Speech to Text Transcriber
Transcribe uploaded audio with Workers AI Whisper at the edge
Upload an audio file and receive a transcript using Workers AI Whisper (@cf/openai/whisper-large-v3-turbo). Validates file size and content type before calling the model.
Features
- POST /transcribe - Multipart upload with field audio
- Whisper model - @cf/openai/whisper-large-v3-turbo
- Validation - Max 2 MB, audio/* types only
- Timing - Returns durationMs for the transcription call
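The features above can be sketched as a single request handler. This is a minimal sketch under stated assumptions, not the project's actual source: `handleTranscribe`, `toBase64`, `json`, and the `{ error: ... }` body shape are hypothetical, "2 MB" is taken to mean 2 × 1024 × 1024 bytes, and the base64 `audio` input and `transcription_info.language` output field reflect a reading of the Workers AI model documentation that should be verified there.

```typescript
// Sketch only: helper names and the error body shape are assumptions.
const MODEL = "@cf/openai/whisper-large-v3-turbo";
const MAX_BYTES = 2 * 1024 * 1024; // assumes "2 MB" means 2 MiB

// Minimal structural type for the AI binding, so the sketch needs no extra packages.
interface Env {
  AI: { run(model: string, inputs: Record<string, unknown>): Promise<unknown> };
}

// Assumed subset of the model output that this sketch reads.
interface WhisperResult {
  text?: string;
  transcription_info?: { language?: string };
}

function json(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Encode in slices so large files stay under the argument limit of fromCharCode.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i += 0x8000) {
    binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
  }
  return btoa(binary);
}

async function handleTranscribe(request: Request, env: Env): Promise<Response> {
  let form: FormData;
  try {
    form = await request.formData();
  } catch {
    return json({ error: "INVALID_AUDIO" }, 400); // body was not valid form data
  }

  const audio = form.get("audio");
  if (audio === null || typeof audio === "string") {
    return json({ error: "INVALID_AUDIO" }, 400); // field missing or not a file
  }
  if (audio.size === 0 || audio.size > MAX_BYTES || !audio.type.startsWith("audio/")) {
    return json({ error: "INVALID_AUDIO" }, 400); // empty, too large, or not audio/*
  }

  const base64 = toBase64(new Uint8Array(await audio.arrayBuffer()));

  const started = Date.now(); // time only the model call
  let result: WhisperResult | null;
  try {
    result = (await env.AI.run(MODEL, { audio: base64 })) as WhisperResult | null;
  } catch {
    return json({ error: "TRANSCRIBE_ERROR" }, 502); // model call failed
  }
  const durationMs = Date.now() - started;

  const text = result?.text?.trim();
  if (!text) return json({ error: "TRANSCRIBE_ERROR" }, 502); // empty result

  const language = result?.transcription_info?.language ?? "unknown";
  return json({ text, language, durationMs }, 200);
}
```

To serve it, a Worker's default export could call `handleTranscribe` from its `fetch` handler when the request is `POST /transcribe`.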
API Reference
POST /transcribe
Transcribe an uploaded audio file.
audio: file (required, multipart)
Audio file (max 2 MB). Accepted types include audio/mpeg, audio/wav, audio/webm, audio/ogg, and other audio/* MIME types.
Example Request
curl -X POST "https://your-worker.workers.dev/transcribe" \
  -F "audio=@sample.mp3"
Success Response
{
"text": "Hello, this is a test recording.",
"language": "en",
"durationMs": 842
}
Error Codes
- 400 - Missing or invalid audio (INVALID_AUDIO)
- 502 - Transcription failed or empty result (TRANSCRIBE_ERROR)
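As an alternative to curl, the request and the two documented error codes could be handled from TypeScript roughly as follows. This is a sketch: `transcribe`, `Transcript`, and `FetchLike` are hypothetical names, and because the format of the error body is not specified here, the code relies only on the HTTP status.

```typescript
// Shape of the documented success response.
interface Transcript {
  text: string;
  language: string;
  durationMs: number;
}

// Narrow fetch signature so a stub can be injected in tests (hypothetical helper type).
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function transcribe(
  endpoint: string,
  audio: Blob,
  filename: string,
  fetchImpl: FetchLike = fetch,
): Promise<Transcript> {
  const form = new FormData();
  form.append("audio", audio, filename); // the field name must be "audio"

  // Content-Type is left unset so fetch can add the multipart boundary itself.
  const res = await fetchImpl(endpoint, { method: "POST", body: form });

  if (res.status === 400) throw new Error("INVALID_AUDIO: missing or invalid audio upload");
  if (res.status === 502) throw new Error("TRANSCRIBE_ERROR: transcription failed or was empty");
  if (!res.ok) throw new Error(`Unexpected status ${res.status}`);

  return (await res.json()) as Transcript;
}
```

A caller might pass a `File` from an `<input type="file">` element as `audio`, together with its `name`.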
Use Cases
- Add speech-to-text to edge apps without external API keys
- Prototype voice note or meeting transcription workflows
- Learn Workers AI audio model integration patterns
Limitations
- Max upload size 2 MB per request (Whisper model limits apply)
- No chunking for long recordings; split large files client-side
- Requires Workers AI enabled on your Cloudflare account
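Client-side splitting depends on the audio format. For uncompressed PCM WAV files, one possible approach is sketched below. It assumes the canonical 44-byte header (a 16-byte `fmt ` chunk followed directly by `data`), which not every WAV file uses, and `splitWav` is a hypothetical helper. Compressed formats such as MP3 or Ogg need a format-aware tool instead.

```typescript
const WAV_HEADER_BYTES = 44; // assumes the canonical PCM header layout

// Split a PCM WAV file into standalone WAV files of at most maxBytes each.
function splitWav(wav: Uint8Array, maxBytes: number): Uint8Array[] {
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength);
  const blockAlign = view.getUint16(32, true); // bytes per sample frame
  const pcm = wav.subarray(WAV_HEADER_BYTES);

  // Largest payload that fits under maxBytes and ends on a frame boundary.
  const payloadMax = Math.floor((maxBytes - WAV_HEADER_BYTES) / blockAlign) * blockAlign;
  if (!(payloadMax > 0)) throw new RangeError("maxBytes too small for this file");

  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += payloadMax) {
    const payload = pcm.subarray(offset, offset + payloadMax);
    const chunk = new Uint8Array(WAV_HEADER_BYTES + payload.length);
    chunk.set(wav.subarray(0, WAV_HEADER_BYTES), 0); // copy the original header
    chunk.set(payload, WAV_HEADER_BYTES);

    const out = new DataView(chunk.buffer);
    out.setUint32(4, 36 + payload.length, true); // RIFF chunk size
    out.setUint32(40, payload.length, true); // data chunk size
    chunks.push(chunk);
  }
  return chunks;
}
```

Calling `splitWav(bytes, 2 * 1024 * 1024)` would yield pieces that each fit the upload limit, if that limit is 2 × 1024 × 1024 bytes. Cuts made at fixed byte offsets can land mid-word, so transcripts may be less accurate near chunk boundaries.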
Deployment
Deploy
Enable Workers AI. The AI binding is declared in wrangler.json.
Test your deployment
curl -X POST "https://your-worker.workers.dev/transcribe" -F "audio=@sample.mp3"
Local Development
cd apps/experiments/speech-to-text-transcriber
npm install
npm run dev
Configuration
| Binding | Purpose |
|---|---|
| AI | Workers AI Whisper model |
Cloudflare Features Used
- Workers - Edge compute runtime
- Workers AI - Whisper speech-to-text