Cloud AI Proxy

This is an experimental Worker. Use it as a starting point for your own projects.

The Cloud AI Proxy experiment exposes Workers AI through a single public endpoint. You pass a model ID and prompt (or chat messages) and receive AI-generated text. Use it to try different models or integrate Workers AI from external tools without managing Wrangler or bindings yourself.

The endpoint is public by design. For production use, add rate limiting or API key authentication.
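
As one way to approach the API key option, here is a minimal sketch of a bearer-token check that could run before the request is forwarded to Workers AI. Everything in it is an assumption rather than part of the experiment: the `Authorization: Bearer` scheme, the helper names `safeEqual` and `isAuthorized`, and the idea that the expected key comes from a Worker secret.

```typescript
// Hypothetical helpers: the header scheme and function names are assumptions,
// not part of the Cloud AI Proxy experiment.
interface HeaderReader {
  get(name: string): string | null;
}

// Compare two strings without returning early on the first differing character.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Returns true only when the headers carry "Authorization: Bearer <expectedKey>".
function isAuthorized(headers: HeaderReader, expectedKey: string): boolean {
  const header = headers.get("Authorization") || "";
  const prefix = "Bearer ";
  if (header.indexOf(prefix) !== 0) return false;
  // An empty expected key never authorizes, so a missing secret fails closed.
  return expectedKey.length > 0 && safeEqual(header.slice(prefix.length), expectedKey);
}
```

In a Worker, `request.headers` satisfies `HeaderReader`, so a handler could return a 401 response whenever `isAuthorized` is false. This does not replace rate limiting.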

Features

Call any Workers AI text-generation model via one endpoint
Support for single prompt or chat-style messages
Optional max_tokens control
POST (JSON) and GET (query params) interfaces
No external API keys; uses your Cloudflare account’s Workers AI

API Reference

POST /chat

Send a JSON body containing model and either prompt or messages.

Field	Required	Description
`model`	Yes	Workers AI model ID (e.g. `@cf/meta/llama-3.1-8b-instruct-fast`)
`prompt`	No*	Single prompt string
`messages`	No*	Array of `{ role: string, content: string }` for chat-style input
`max_tokens`	No	Max tokens to generate (1–4096)

*At least one of prompt or messages is required.
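
The field rules above can be checked on the client before sending a request. The following is a sketch under stated assumptions: the type names and `checkChatBody` are hypothetical, the order of checks is a guess, and treating empty strings as missing and requiring `max_tokens` to be an integer are assumptions about the Worker's behavior. The returned strings are the error codes listed under Error Responses.

```typescript
// Hypothetical client-side types mirroring the documented request fields.
type ChatMessage = { role: string; content: string };
type ChatBody = {
  model: string;
  prompt?: string;
  messages?: ChatMessage[];
  max_tokens?: number;
};

// Returns the documented error code the body would likely trigger,
// or null when the body satisfies the documented rules.
function checkChatBody(body: ChatBody): string | null {
  if (typeof body.model !== "string" || body.model.length === 0) {
    return "MISSING_MODEL";
  }
  const hasPrompt = typeof body.prompt === "string" && body.prompt.length > 0;
  const hasMessages = Array.isArray(body.messages) && body.messages.length > 0;
  if (!hasPrompt && !hasMessages) {
    return "MISSING_PROMPT";
  }
  if (body.max_tokens !== undefined) {
    const n = body.max_tokens;
    // Assumption: only whole numbers in the documented 1-4096 range are accepted.
    if (Math.floor(n) !== n || n < 1 || n > 4096) {
      return "INVALID_MAX_TOKENS";
    }
  }
  return null;
}
```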

Response

Field	Type	Description
`response`	string	The AI-generated text.

Example Request

curl -X POST https://your-worker.workers.dev/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"@cf/meta/llama-3.1-8b-instruct-fast","prompt":"Say hello in one sentence."}'

Example Response

{
  "response": "Hello! How can I help you today?"
}
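
The curl example uses a single prompt. For chat-style input, a client might build the request and read the response along these lines. This is a sketch, not the experiment's own client: `buildPostInit` and `readResponseText` are hypothetical names, and the `system` and `user` role values in the usage comment are assumptions, since the reference only says that `role` is a string.

```typescript
// Hypothetical client helpers for POST /chat with chat-style messages.
type Message = { role: string; content: string };
type PostInit = {
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
};

// Build the options object for fetch() from a model ID and a message list.
function buildPostInit(model: string, messages: Message[], maxTokens?: number): PostInit {
  const payload: { model: string; messages: Message[]; max_tokens?: number } = { model, messages };
  if (maxTokens !== undefined) payload.max_tokens = maxTokens;
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Extract the documented "response" string, failing loudly on any other shape.
function readResponseText(json: unknown): string {
  if (typeof json === "object" && json !== null) {
    const value = (json as { response?: unknown }).response;
    if (typeof value === "string") return value;
  }
  throw new Error("Unexpected response shape: missing string field 'response'");
}

// Usage (not executed here; the role values are assumptions):
// const init = buildPostInit("@cf/meta/llama-3.1-8b-instruct-fast", [
//   { role: "system", content: "Answer briefly." },
//   { role: "user", content: "Say hello in one sentence." },
// ]);
// const res = await fetch("https://your-worker.workers.dev/chat", init);
// const text = readResponseText(await res.json());
```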

GET /chat

Query params: model (required), prompt (required), optional max_tokens.

Example Request

curl "https://your-worker.workers.dev/chat?model=@cf/meta/llama-3.1-8b-instruct-fast&prompt=Say%20hello"
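
When the prompt comes from user input, building the query string by hand makes it easy to miss escaping. A small sketch using the standard `URL` and `URLSearchParams` APIs is shown below; `buildChatUrl` is a hypothetical helper name, and the base URL is the same placeholder used in the curl examples.

```typescript
// Hypothetical helper: build a GET /chat URL with properly encoded query params.
function buildChatUrl(baseUrl: string, model: string, prompt: string, maxTokens?: number): string {
  const url = new URL("/chat", baseUrl);
  url.searchParams.set("model", model);
  url.searchParams.set("prompt", prompt);
  if (maxTokens !== undefined) {
    url.searchParams.set("max_tokens", String(maxTokens));
  }
  return url.toString();
}

// Usage (not executed here):
// const url = buildChatUrl(
//   "https://your-worker.workers.dev",
//   "@cf/meta/llama-3.1-8b-instruct-fast",
//   "Say hello",
// );
// const res = await fetch(url);
```

`URLSearchParams` percent-encodes reserved characters such as `@`, `/`, and `&`, and writes spaces as `+`, so prompts containing those characters survive the round trip.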

Error Responses

400 Bad Request - INVALID_BODY, MISSING_MODEL, MISSING_PROMPT, or INVALID_MAX_TOKENS
502 Bad Gateway - AI_ERROR (model run failed)
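
A client can branch on these statuses and codes. The sketch below is illustrative only: `classifyError` and `shouldRetry` are hypothetical names, retrying on 502 is a design choice rather than documented behavior, and because the JSON shape of the error body is not specified here, the code is passed in as a plain string that the caller extracts.

```typescript
// The four codes documented for 400 responses.
const BAD_REQUEST_CODES = ["INVALID_BODY", "MISSING_MODEL", "MISSING_PROMPT", "INVALID_MAX_TOKENS"];

type ErrorKind = "bad_request" | "model_failure" | "other";

// Map an HTTP status and an error code string to a coarse category.
function classifyError(status: number, code: string): ErrorKind {
  if (status === 400 && BAD_REQUEST_CODES.indexOf(code) !== -1) return "bad_request";
  if (status === 502 && code === "AI_ERROR") return "model_failure";
  return "other";
}

// Assumption: a failed model run may succeed on a later attempt,
// while a malformed request will fail again unless it is changed.
function shouldRetry(kind: ErrorKind): boolean {
  return kind === "model_failure";
}
```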

Use Cases

Try Workers AI models from curl, Postman, or external tools without managing bindings
Prototype chat APIs before adding authentication and rate limiting
Compare model outputs by swapping the model parameter
Integrate edge AI into apps via a simple HTTP proxy

Limitations

Public endpoint with no authentication or rate limiting by default
Workers AI is subject to usage limits by plan
max_tokens is capped at 4096
Text generation only; no streaming responses or tool calling

Deployment

Click the deploy button

The deploy button configures the Workers AI binding (AI) automatically via wrangler.json. If the binding is missing after deployment, enable it in your Worker settings. Requires a Cloudflare account with Workers AI enabled.

Test your deployment

curl -X POST "https://your-worker.workers.dev/chat" \
  -H "Content-Type: application/json" \
  -d '{"model":"@cf/meta/llama-3.1-8b-instruct-fast","prompt":"Say hello in one sentence."}'

Local Development

cd apps/experiments/cloud-ai-proxy
npm install
npm run dev

Call the endpoint at http://localhost:8787/chat with POST (JSON body) or GET (query params). Requires a Cloudflare account with Workers AI enabled.

Cloudflare Features Used

Workers - Edge runtime
Workers AI - AI binding for text generation

Next Steps

Workers AI models

GitHub repository