Speech to Text Transcriber

Upload an audio file and receive a transcript generated by the Workers AI Whisper model (@cf/openai/whisper-large-v3-turbo). The Worker validates file size and content type before calling the model.

Features

POST /transcribe - Multipart upload with the file in a form field named audio
Whisper model - @cf/openai/whisper-large-v3-turbo
Validation - Max 2 MB, audio/* types only
Timing - Returns durationMs for the transcription call
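
The model call and timing features above can be sketched roughly as follows. This is a minimal illustration, not the template's actual source: AiLike, toBase64, and transcribeWithTiming are hypothetical names, and the input field (audio as a base64 string) and output fields (text, transcription_info.language) are assumptions about the model schema that should be checked against the Workers AI model page.

```typescript
// Hypothetical structural type standing in for the Workers AI binding.
interface AiLike {
  run(model: string, input: { audio: string }): Promise<unknown>;
}

const MODEL = "@cf/openai/whisper-large-v3-turbo";

interface TimedTranscript {
  text: string;
  language?: string;
  durationMs: number;
}

// Encode raw bytes as base64 (assumed input format for this model).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Call the model and measure only the time spent in the transcription call.
async function transcribeWithTiming(
  ai: AiLike,
  audio: Uint8Array
): Promise<TimedTranscript> {
  const started = Date.now();
  const raw = await ai.run(MODEL, { audio: toBase64(audio) });
  const durationMs = Date.now() - started;

  // Assumed output shape; treat a missing or blank transcript as a failure.
  const result = raw as {
    text?: unknown;
    transcription_info?: { language?: unknown };
  } | null;
  if (!result || typeof result.text !== "string" || result.text.trim() === "") {
    throw new Error("TRANSCRIBE_ERROR");
  }
  const language = result.transcription_info?.language;
  return {
    text: result.text,
    language: typeof language === "string" ? language : undefined,
    durationMs,
  };
}
```

In a Worker, the ai argument would be the AI binding from the environment (env.AI); keeping it behind a small interface makes the function easy to exercise with a stub.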

API Reference

POST /transcribe

Transcribe an uploaded audio file.

audio - file (required, multipart/form-data)

Audio file (max 2 MB). Accepted types include audio/mpeg, audio/wav, audio/webm, audio/ogg, and other audio/* MIME types.
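
The size and type rules described here could be checked along these lines. This is a sketch under stated assumptions: validateAudio and its result type are hypothetical names, 2 MB is taken to mean 2 * 1024 * 1024 bytes, and rejecting empty files is an extra assumption not stated in this document.

```typescript
// Assumption: "2 MB" means 2 * 1024 * 1024 bytes.
const MAX_AUDIO_BYTES = 2 * 1024 * 1024;

// Only the two properties the check needs; a File or Blob satisfies this.
interface AudioCandidate {
  size: number;
  type: string;
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateAudio(file: AudioCandidate | null): ValidationResult {
  if (file === null) {
    return { ok: false, reason: "missing audio field" };
  }
  if (file.size === 0) {
    // Assumption: an empty upload is treated as invalid.
    return { ok: false, reason: "empty file" };
  }
  if (file.size > MAX_AUDIO_BYTES) {
    return { ok: false, reason: "file exceeds 2 MB" };
  }
  if (!file.type.toLowerCase().startsWith("audio/")) {
    return { ok: false, reason: "content type is not audio/*" };
  }
  return { ok: true };
}
```

Any failed check would map to the 400 INVALID_AUDIO response listed under Error Codes.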

Example Request

curl -X POST "https://your-worker.workers.dev/transcribe" \
  -F "audio=@sample.mp3"
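
The same request can be made from TypeScript with fetch and FormData. This is a minimal sketch: buildTranscribeForm and transcribe are hypothetical helper names, and the Worker URL is a placeholder as in the curl example.

```typescript
// Build the multipart body with the file in the "audio" field.
function buildTranscribeForm(audio: Blob, filename: string): FormData {
  const form = new FormData();
  form.append("audio", audio, filename);
  return form;
}

// POST the form to the Worker. The Content-Type header is left unset so the
// runtime can add the multipart boundary itself.
async function transcribe(
  workerUrl: string,
  audio: Blob,
  filename: string
): Promise<unknown> {
  const response = await fetch(`${workerUrl}/transcribe`, {
    method: "POST",
    body: buildTranscribeForm(audio, filename),
  });
  if (!response.ok) {
    throw new Error(`transcribe failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

In a browser, the Blob would typically come from a file input or a MediaRecorder; the blob's type should be an audio/* MIME type so it passes validation.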

Success Response

{
  "text": "Hello, this is a test recording.",
  "language": "en",
  "durationMs": 842
}

Error Codes

400 - Missing or invalid audio (INVALID_AUDIO)
502 - Transcription failed or empty result (TRANSCRIBE_ERROR)
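
A client could type the success payload and map the documented status codes as sketched below. The helper names are hypothetical. Because the JSON shape of error responses is not shown in this document, the sketch relies only on the HTTP status, and it assumes that language is always present in a successful response, as in the example.

```typescript
// Shape of the success response shown in the example above.
interface TranscriptResponse {
  text: string;
  language: string;
  durationMs: number;
}

type TranscribeErrorCode = "INVALID_AUDIO" | "TRANSCRIBE_ERROR" | "UNKNOWN";

// Map the documented HTTP statuses to their error codes.
function errorCodeForStatus(status: number): TranscribeErrorCode {
  switch (status) {
    case 400:
      return "INVALID_AUDIO";
    case 502:
      return "TRANSCRIBE_ERROR";
    default:
      return "UNKNOWN";
  }
}

// Runtime check that a parsed JSON body matches the success shape.
function isTranscriptResponse(value: unknown): value is TranscriptResponse {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    typeof v["text"] === "string" &&
    typeof v["language"] === "string" &&
    typeof v["durationMs"] === "number"
  );
}
```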

Use Cases

Add speech-to-text to edge apps without external API keys
Prototype voice note or meeting transcription workflows
Learn Workers AI audio model integration patterns

Limitations

Max upload size is 2 MB per request; the Whisper model's own input limits also apply
No chunking for long recordings; split large files client-side
Requires Workers AI enabled on your Cloudflare account
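
When splitting recordings client-side, it helps to estimate how much audio fits in one upload. The sketch below plans segment boundaries only; the actual cutting would be done with an audio tool of your choice. It assumes constant-bitrate audio, takes 2 MB as 2 * 1024 * 1024 bytes, and reserves a 5% margin for container overhead. All names and the margin value are illustrative.

```typescript
// Assumption: "2 MB" means 2 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 2 * 1024 * 1024;

interface Segment {
  startSec: number;
  endSec: number;
}

// Whole seconds of constant-bitrate audio that fit in one upload,
// leaving a safety margin for headers and container overhead.
function maxSecondsPerUpload(bitrateKbps: number, safetyMargin = 0.95): number {
  if (bitrateKbps <= 0) {
    throw new Error("bitrate must be positive");
  }
  const bitsAvailable = MAX_UPLOAD_BYTES * 8 * safetyMargin;
  return Math.floor(bitsAvailable / (bitrateKbps * 1000));
}

// Split a recording of totalSec seconds into upload-sized segments.
function planSegments(totalSec: number, bitrateKbps: number): Segment[] {
  const step = maxSecondsPerUpload(bitrateKbps);
  if (step <= 0) {
    throw new Error("bitrate too high for the upload limit");
  }
  const segments: Segment[] = [];
  for (let start = 0; start < totalSec; start += step) {
    segments.push({ startSec: start, endSec: Math.min(start + step, totalSec) });
  }
  return segments;
}
```

Under these assumptions, a 128 kbps file allows about 124 seconds per request, so a five-minute recording would need three uploads.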

Deployment

Click the deploy button

Enable Workers AI. The AI binding is declared in wrangler.json.

Test your deployment

curl -X POST "https://your-worker.workers.dev/transcribe" -F "audio=@sample.mp3"

Local Development

cd apps/experiments/speech-to-text-transcriber
npm install
npm run dev

Configuration

Binding	Purpose
`AI`	Workers AI Whisper model

Cloudflare Features Used

Workers - Edge compute runtime
Workers AI - Whisper speech-to-text
