Cloud AI Proxy
Call Workers AI with any model and prompt from a single public endpoint
This is an experimental Worker. Use it as a starting point for your own projects.
The Cloud AI Proxy experiment exposes Workers AI through a single public endpoint. You pass a model ID and prompt (or chat messages) and receive AI-generated text. Use it to try different models or integrate Workers AI from external tools without managing Wrangler or bindings yourself.
The endpoint is public by design. For production use, add rate limiting or API key authentication.
Features
- Call any Workers AI text-generation model via one endpoint
- Support for single prompt or chat-style messages
- Optional max_tokens control
- POST (JSON) and GET (query params) interfaces
- No external API keys; uses your Cloudflare account’s Workers AI
API Reference
POST /chat
Send a JSON body containing model and either prompt or messages.
| Field | Required | Description |
|---|---|---|
| model | Yes | Workers AI model ID (e.g. @cf/meta/llama-3.1-8b-instruct-fast) |
| prompt | No* | Single prompt string |
| messages | No* | Array of { role: string, content: string } for chat-style input |
| max_tokens | No | Maximum number of tokens to generate (1–4096) |
*At least one of prompt or messages is required.
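The field rules above map naturally onto a small validation function. The sketch below is illustrative, not the experiment's source: the names validateChatBody and ChatInput are hypothetical, and it returns the 400 error codes listed under Error Responses. Treating empty strings, empty or malformed messages arrays, and non-integer max_tokens values as invalid are assumptions of this sketch.

```typescript
// Hypothetical validation sketch for POST /chat bodies; not the experiment's source.
// Field rules follow the request table; error codes are the documented 400 codes.

type ChatMessage = { role: string; content: string };

type ChatInput = {
  model: string;
  prompt?: string;
  messages?: ChatMessage[];
  max_tokens?: number;
};

type ValidationError =
  | "INVALID_BODY"
  | "MISSING_MODEL"
  | "MISSING_PROMPT"
  | "INVALID_MAX_TOKENS";

type ValidationResult =
  | { ok: true; input: ChatInput }
  | { ok: false; error: ValidationError };

// A message is valid when both role and content are strings.
function isMessage(value: unknown): value is ChatMessage {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return typeof m.role === "string" && typeof m.content === "string";
}

function validateChatBody(body: unknown): ValidationResult {
  // The body must be a JSON object, not null, an array, or a primitive.
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, error: "INVALID_BODY" };
  }
  const b = body as Record<string, unknown>;

  const model = b.model;
  if (typeof model !== "string" || model.trim() === "") {
    return { ok: false, error: "MISSING_MODEL" };
  }

  // Assumption: empty strings, empty arrays, and malformed messages count as missing.
  const rawPrompt = b.prompt;
  const prompt =
    typeof rawPrompt === "string" && rawPrompt.trim() !== "" ? rawPrompt : undefined;
  const rawMessages = b.messages;
  const messages =
    Array.isArray(rawMessages) && rawMessages.length > 0 && rawMessages.every(isMessage)
      ? (rawMessages as ChatMessage[])
      : undefined;

  // At least one of prompt or messages is required.
  if (prompt === undefined && messages === undefined) {
    return { ok: false, error: "MISSING_PROMPT" };
  }

  // max_tokens is optional; when present it must be an integer from 1 to 4096.
  const maxTokens = b.max_tokens;
  if (maxTokens !== undefined) {
    if (
      typeof maxTokens !== "number" ||
      !Number.isInteger(maxTokens) ||
      maxTokens < 1 ||
      maxTokens > 4096
    ) {
      return { ok: false, error: "INVALID_MAX_TOKENS" };
    }
  }

  return { ok: true, input: { model, prompt, messages, max_tokens: maxTokens } };
}
```

Returning a discriminated union instead of throwing lets a handler turn each failure directly into a 400 response carrying the matching code.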
Response: a JSON object with a response field (string) containing the AI-generated text.
Example Request
curl -X POST https://your-worker.workers.dev/chat \
-H "Content-Type: application/json" \
-d '{"model":"@cf/meta/llama-3.1-8b-instruct-fast","prompt":"Say hello in one sentence."}'
Example Response
{
"response": "Hello! How can I help you today?"
}
GET /chat
Query parameters: model (required), prompt (required), and max_tokens (optional).
Example Request
curl "https://your-worker.workers.dev/chat?model=@cf/meta/llama-3.1-8b-instruct-fast&prompt=Say%20hello"
Error Responses
- 400 Bad Request: INVALID_BODY, MISSING_MODEL, MISSING_PROMPT, or INVALID_MAX_TOKENS
- 502 Bad Gateway: AI_ERROR (model run failed)
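A minimal TypeScript client for both interfaces might look like the following. It is a sketch, not an official SDK: the function names are hypothetical, and baseUrl is whatever your deployment uses (your workers.dev URL, or http://localhost:8787 during local development). Because the format of error bodies is not documented, the sketch relies only on the documented status codes and reports error bodies as raw text.

```typescript
// Minimal client sketch for /chat. Function names are hypothetical; pass your
// workers.dev URL or http://localhost:8787 as baseUrl.

type ChatMessage = { role: string; content: string };

type ChatRequest = {
  model: string;
  prompt?: string;
  messages?: ChatMessage[];
  max_tokens?: number;
};

// Arguments for fetch() when calling POST /chat with a JSON body.
function buildPostChat(baseUrl: string, req: ChatRequest) {
  return {
    url: new URL("/chat", baseUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  };
}

// URL for GET /chat. encodeURIComponent escapes "@" and "/" in the model ID
// and encodes spaces as %20.
function buildGetChatUrl(baseUrl: string, model: string, prompt: string, maxTokens?: number): string {
  let query = `model=${encodeURIComponent(model)}&prompt=${encodeURIComponent(prompt)}`;
  if (maxTokens !== undefined) query += `&max_tokens=${maxTokens}`;
  return `${new URL("/chat", baseUrl).toString()}?${query}`;
}

// Sends POST /chat and returns the generated text. Error bodies are read as
// raw text because their format is not documented.
async function chat(baseUrl: string, req: ChatRequest): Promise<string> {
  const { url, ...init } = buildPostChat(baseUrl, req);
  const res = await fetch(url, init);
  if (res.status === 400) {
    throw new Error(`Invalid request (400): ${await res.text()}`);
  }
  if (res.status === 502) {
    throw new Error(`Model run failed (502 AI_ERROR): ${await res.text()}`);
  }
  if (!res.ok) {
    throw new Error(`Unexpected status ${res.status}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}
```

Separating the request builders from chat() keeps URL and body construction testable without network access. A 400 signals a problem with the request itself, while a 502 means the model run failed.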
Use Cases
- Try Workers AI models from curl, Postman, or external tools without managing bindings
- Prototype chat APIs before adding authentication and rate limiting
- Compare model outputs by swapping the model parameter
- Integrate edge AI into apps via a simple HTTP proxy
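For the model-comparison use case, you can send the same prompt to several models concurrently and collect the outputs by model ID. In this sketch, send is a hypothetical injected function (in practice, a wrapper that POSTs { model, prompt } to /chat and returns the response field). Injecting it keeps the comparison logic testable offline. Apart from the documented @cf/meta/llama-3.1-8b-instruct-fast, any model IDs you pass are your own choice.

```typescript
// Sketch: send one prompt to several models and collect the outputs by model ID.
// `send` is a hypothetical injected function, e.g. a wrapper around POST /chat.

type Send = (model: string, prompt: string) => Promise<string>;

async function compareModels(
  models: string[],
  prompt: string,
  send: Send,
): Promise<Record<string, string>> {
  // All requests start at once; allSettled waits for every one of them, so a
  // failing model is recorded instead of rejecting the whole comparison.
  const settled = await Promise.allSettled(models.map((model) => send(model, prompt)));
  const results: Record<string, string> = {};
  settled.forEach((outcome, i) => {
    results[models[i]] =
      outcome.status === "fulfilled" ? outcome.value : `error: ${String(outcome.reason)}`;
  });
  return results;
}
```

Promise.allSettled is used instead of Promise.all so that one model returning AI_ERROR does not discard the outputs of the others.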
Limitations
- Public endpoint with no authentication or rate limiting by default
- Workers AI usage is subject to the limits of your Cloudflare plan
- max_tokens is capped at 4096
- Text generation only; no streaming responses or tool calling
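Because the endpoint ships without authentication, a common first step before production use is a shared API key. The sketch below is a hypothetical addition, not part of the experiment. It checks an Authorization: Bearer <key> header against a configured secret (the header scheme and the function names are assumptions) and rejects every request when no key is configured, so a missing secret fails closed.

```typescript
// Hypothetical API-key check for the proxy; not part of the experiment.
// Expects "Authorization: Bearer <key>" and compares it with a configured secret.

// Compares two strings without exiting early on the first differing character.
// (Lengths are compared up front, so the key length is not hidden.)
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function isAuthorized(authHeader: string | null, expectedKey: string | undefined): boolean {
  // Fail closed: with no key configured, every request is rejected.
  if (!expectedKey || !authHeader) return false;
  const prefix = "Bearer ";
  if (!authHeader.startsWith(prefix)) return false;
  return constantTimeEqual(authHeader.slice(prefix.length), expectedKey);
}
```

In the Worker's fetch handler, you would call isAuthorized(request.headers.get("Authorization"), env.PROXY_API_KEY) before touching the AI binding and return a 401 response when it fails. PROXY_API_KEY is a hypothetical secret name, which could be set with wrangler secret put. Rate limiting would still need to be added separately.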
Deployment
Enable the Workers AI binding (named AI) in your Worker settings; the deploy button configures it automatically via wrangler.json. Requires a Cloudflare account with Workers AI enabled.
Test your deployment
curl -X POST "https://your-worker.workers.dev/chat" \
-H "Content-Type: application/json" \
-d '{"model":"@cf/meta/llama-3.1-8b-instruct-fast","prompt":"Say hello in one sentence."}'
Local Development
cd apps/experiments/cloud-ai-proxy
npm install
npm run dev
Call the endpoint at http://localhost:8787/chat with POST (JSON body) or GET (query params). Requires a Cloudflare account with Workers AI enabled.
Cloudflare Features Used
- Workers - Edge runtime
- Workers AI - AI binding for text generation
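The Workers and Workers AI pieces fit together roughly as follows. This is a hypothetical sketch of a Worker that serves /chat through the AI binding, not the experiment's source. AiBinding is a local stand-in that declares only the run(model, inputs) call used here. The { error: CODE } response shape is an assumption, and so is forwarding messages when a request includes both prompt and messages. Text-generation models on Workers AI resolve to an object with a response field, which the sketch passes through.

```typescript
// Hypothetical sketch of a /chat Worker on top of the AI binding; not the
// experiment's source. AiBinding declares only the run(model, inputs) call used
// here; in a deployed Worker, env.AI is supplied by the "AI" binding.

interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding;
}

// JSON response helper. The { error: CODE } shape is an assumption.
function json(data: unknown, status = 200): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "Content-Type": "application/json" },
  });
}

// Normalizes GET query params and POST JSON bodies into one object,
// or returns null when the POST body is not a JSON object.
async function readInput(request: Request): Promise<Record<string, unknown> | null> {
  if (request.method === "GET") {
    const params = new URL(request.url).searchParams;
    const maxTokens = params.get("max_tokens");
    return {
      model: params.get("model") ?? undefined,
      prompt: params.get("prompt") ?? undefined,
      max_tokens: maxTokens === null ? undefined : Number(maxTokens),
    };
  }
  try {
    const body: unknown = await request.json();
    if (typeof body !== "object" || body === null || Array.isArray(body)) return null;
    return body as Record<string, unknown>;
  } catch {
    return null; // malformed JSON
  }
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { pathname } = new URL(request.url);
    if (pathname !== "/chat" || (request.method !== "GET" && request.method !== "POST")) {
      return new Response("Not found", { status: 404 });
    }

    const input = await readInput(request);
    if (input === null) return json({ error: "INVALID_BODY" }, 400);

    const { model, prompt, messages, max_tokens } = input;
    if (typeof model !== "string" || model === "") {
      return json({ error: "MISSING_MODEL" }, 400);
    }
    if (typeof prompt !== "string" && !Array.isArray(messages)) {
      return json({ error: "MISSING_PROMPT" }, 400);
    }
    if (
      max_tokens !== undefined &&
      (typeof max_tokens !== "number" || !Number.isInteger(max_tokens) || max_tokens < 1 || max_tokens > 4096)
    ) {
      return json({ error: "INVALID_MAX_TOKENS" }, 400);
    }

    // Assumption: when both are present, messages takes precedence over prompt.
    const inputs: Record<string, unknown> = Array.isArray(messages) ? { messages } : { prompt };
    if (max_tokens !== undefined) inputs.max_tokens = max_tokens;

    try {
      // Text-generation models resolve to an object with a response field.
      const result = (await env.AI.run(model, inputs)) as { response?: string };
      return json({ response: result.response ?? "" });
    } catch {
      return json({ error: "AI_ERROR" }, 502);
    }
  },
};

export default worker;
```

Passing env into fetch makes the handler testable with a fake binding. In a deployed Worker, the runtime supplies the real env.AI.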