This site is not affiliated with or endorsed by Cloudflare, Inc. It simply showcases experiments built using Cloudflare services.
Cloudflare Experiments

Cloud AI Proxy

Call Workers AI with any model and prompt from a single public endpoint

This is an experimental Worker. Use it as a starting point for your own projects.

The Cloud AI Proxy experiment exposes Workers AI through a single public endpoint. You pass a model ID and prompt (or chat messages) and receive AI-generated text. Use it to try different models or integrate Workers AI from external tools without managing Wrangler or bindings yourself.

The endpoint is public by design. For production use, add rate limiting or API key authentication.
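
As an illustration of the API key option mentioned above, here is a minimal sketch of a key check. The x-api-key header name and the isAuthorized helper are assumptions made for this example; the experiment itself does not define an authentication scheme.

```typescript
// Hypothetical header name; the experiment itself defines no auth scheme.
const API_KEY_HEADER = "x-api-key";

// Compare two strings without returning early on the first mismatch,
// which makes timing-based guessing of the key harder.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Returns true when the headers carry the expected key.
// `headers` is a plain record with lowercase keys so the sketch stays
// runtime-independent; in a Worker you would read request.headers instead.
function isAuthorized(
  headers: Record<string, string | undefined>,
  expectedKey: string
): boolean {
  const provided = headers[API_KEY_HEADER];
  return (
    typeof provided === "string" &&
    expectedKey.length > 0 &&
    safeEqual(provided, expectedKey)
  );
}
```

In a deployed Worker the expected key would typically come from a secret rather than from source code, and a request that fails the check would be rejected before any model call is made.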

Features

  • Call any Workers AI text-generation model via one endpoint
  • Support for single prompt or chat-style messages
  • Optional max_tokens control
  • POST (JSON) and GET (query params) interfaces
  • No external API keys; uses your Cloudflare account’s Workers AI

API Reference

POST /chat

Send a JSON body with model and prompt or messages.

Field        Required   Description
model        Yes        Workers AI model ID (e.g. @cf/meta/llama-3.1-8b-instruct-fast)
prompt       No*        Single prompt string
messages     No*        Array of { role: string, content: string } for chat-style input
max_tokens   No         Max tokens to generate (1–4096)

*At least one of prompt or messages is required.
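
The field rules above can be sketched as a small validator. This is not the experiment's actual implementation: the validateChatBody name, the mapping of each rule to an error code, and the requirement that max_tokens be an integer are assumptions inferred from the field table and the documented error codes.

```typescript
type ChatMessage = { role: string; content: string };
type ChatRequest = {
  model: string;
  prompt?: string;
  messages?: ChatMessage[];
  max_tokens?: number;
};
type ErrorCode =
  | "INVALID_BODY"
  | "MISSING_MODEL"
  | "MISSING_PROMPT"
  | "INVALID_MAX_TOKENS";
type Validation =
  | { ok: true; value: ChatRequest }
  | { ok: false; code: ErrorCode };

// Type guard for one { role, content } entry.
function isChatMessage(m: unknown): m is ChatMessage {
  if (typeof m !== "object" || m === null) return false;
  const r = m as Record<string, unknown>;
  return typeof r["role"] === "string" && typeof r["content"] === "string";
}

// Hypothetical validator: checks an already-parsed JSON body.
function validateChatBody(body: unknown): Validation {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, code: "INVALID_BODY" };
  }
  const b = body as Record<string, unknown>;

  const model = b["model"];
  if (typeof model !== "string" || model.length === 0) {
    return { ok: false, code: "MISSING_MODEL" };
  }

  const rawPrompt = b["prompt"];
  const prompt =
    typeof rawPrompt === "string" && rawPrompt.length > 0 ? rawPrompt : undefined;

  const rawMessages = b["messages"];
  const messages =
    Array.isArray(rawMessages) &&
    rawMessages.length > 0 &&
    rawMessages.every(isChatMessage)
      ? (rawMessages as ChatMessage[])
      : undefined;

  // At least one of prompt or messages is required.
  if (prompt === undefined && messages === undefined) {
    return { ok: false, code: "MISSING_PROMPT" };
  }

  // Assumption: max_tokens must be an integer in the documented 1–4096 range.
  const rawMax = b["max_tokens"];
  let maxTokens: number | undefined;
  if (rawMax !== undefined) {
    if (
      typeof rawMax !== "number" ||
      !Number.isInteger(rawMax) ||
      rawMax < 1 ||
      rawMax > 4096
    ) {
      return { ok: false, code: "INVALID_MAX_TOKENS" };
    }
    maxTokens = rawMax;
  }

  return { ok: true, value: { model, prompt, messages, max_tokens: maxTokens } };
}
```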

The response is a JSON object with a single field, response (string), which contains the AI-generated text.

Example Request

curl -X POST https://your-worker.workers.dev/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"@cf/meta/llama-3.1-8b-instruct-fast","prompt":"Say hello in one sentence."}'

Example Response

{
  "response": "Hello! How can I help you today?"
}
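
The request above uses prompt. For chat-style input, the same endpoint takes messages instead. The sketch below builds such a request from TypeScript; the buildChatRequest helper is hypothetical, and the system and user role names follow common chat-model convention rather than anything this page specifies.

```typescript
type ChatMessage = { role: string; content: string };

// Hypothetical helper: builds fetch() options for a chat-style POST /chat call.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  maxTokens?: number
): { method: string; headers: Record<string, string>; body: string } {
  const body: Record<string, unknown> = { model, messages };
  // max_tokens is optional, so it is only sent when provided.
  if (maxTokens !== undefined) body["max_tokens"] = maxTokens;
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

const init = buildChatRequest(
  "@cf/meta/llama-3.1-8b-instruct-fast",
  [
    { role: "system", content: "You are a concise assistant." },
    { role: "user", content: "Say hello in one sentence." },
  ],
  64
);
```

Passing init to fetch("https://your-worker.workers.dev/chat", init) would send the request; the generated text is then expected in the response field of the JSON reply.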

GET /chat

Query parameters: model (required), prompt (required), and max_tokens (optional).

Example Request

curl "https://your-worker.workers.dev/chat?model=@cf/meta/llama-3.1-8b-instruct-fast&prompt=Say%20hello"
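
When the GET URL is built in code rather than typed by hand, the query values need percent-encoding. A minimal sketch follows, assuming a hypothetical buildChatUrl helper. Note that encodeURIComponent also encodes the @ and / characters in the model ID, which decodes to the same value as the literal form in the curl example.

```typescript
// Hypothetical helper: builds a GET /chat URL with percent-encoded query values.
function buildChatUrl(
  baseUrl: string,
  model: string,
  prompt: string,
  maxTokens?: number
): string {
  const params: Array<[string, string]> = [
    ["model", model],
    ["prompt", prompt],
  ];
  // max_tokens is optional, so it is only appended when provided.
  if (maxTokens !== undefined) params.push(["max_tokens", String(maxTokens)]);

  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");

  // Strip trailing slashes from the base so the path is not doubled up.
  return `${baseUrl.replace(/\/+$/, "")}/chat?${query}`;
}

const url = buildChatUrl(
  "https://your-worker.workers.dev",
  "@cf/meta/llama-3.1-8b-instruct-fast",
  "Say hello"
);
// url is "https://your-worker.workers.dev/chat?model=%40cf%2Fmeta%2Fllama-3.1-8b-instruct-fast&prompt=Say%20hello"
```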

Error Responses

  • 400 Bad Request - INVALID_BODY, MISSING_MODEL, MISSING_PROMPT, or INVALID_MAX_TOKENS
  • 502 Bad Gateway - AI_ERROR (model run failed)
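
The mapping from error code to HTTP status listed above can be written down directly. In this sketch only the codes and status numbers come from this page; the errorPayload helper and the { error: { code, message } } body shape are assumptions for illustration, since the actual error body format is not documented here.

```typescript
type ErrorCode =
  | "INVALID_BODY"
  | "MISSING_MODEL"
  | "MISSING_PROMPT"
  | "INVALID_MAX_TOKENS"
  | "AI_ERROR";

// AI_ERROR means the model run failed (502); every other code is a
// problem with the request itself (400).
function statusFor(code: ErrorCode): 400 | 502 {
  return code === "AI_ERROR" ? 502 : 400;
}

// Hypothetical body shape; adjust to whatever the Worker actually returns.
function errorPayload(
  code: ErrorCode,
  message: string
): { status: 400 | 502; body: string } {
  return {
    status: statusFor(code),
    body: JSON.stringify({ error: { code, message } }),
  };
}
```

A client can use the same split to decide how to react: a 400 indicates the request needs fixing before it is sent again, while a 502 indicates the failure happened in the model run.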

Use Cases

  • Try Workers AI models from curl, Postman, or external tools without managing bindings
  • Prototype chat APIs before adding authentication and rate limiting
  • Compare model outputs by swapping the model parameter
  • Integrate edge AI into apps via a simple HTTP proxy

Limitations

  • Public endpoint with no authentication or rate limiting by default
  • Workers AI is subject to usage limits by plan
  • max_tokens is capped at 4096
  • Text generation only; no streaming responses or tool calling

Deployment

Enable the Workers AI binding (AI) in your Worker settings. The deploy button configures this automatically via wrangler.json. Requires a Cloudflare account with Workers AI enabled.
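
Once the binding is enabled, the Worker reaches Workers AI through env.AI. The sketch below shows roughly how a proxy like this one might call it. It is not the experiment's source: the generateText helper and the narrow AiBinding interface are assumptions, and only the run(model, inputs) call and the response field of the result are taken from how Workers AI text-generation models behave.

```typescript
// Minimal structural type for the Workers AI binding, covering only the
// run() call used here.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

type ChatInput = {
  prompt?: string;
  messages?: { role: string; content: string }[];
  max_tokens?: number;
};

// Hypothetical helper: runs a text-generation model and returns its text.
async function generateText(
  ai: AiBinding,
  model: string,
  input: ChatInput
): Promise<string> {
  const result = await ai.run(model, input);
  // Non-streaming text-generation models return an object with a
  // `response` string.
  if (typeof result === "object" && result !== null) {
    const response = (result as Record<string, unknown>)["response"];
    if (typeof response === "string") return response;
  }
  // A caller could translate this failure into the documented 502 AI_ERROR.
  throw new Error("AI_ERROR: model returned no text response");
}
```

Inside a Worker's fetch handler, env.AI would be passed as the ai argument. Keeping the binding behind a small interface also makes the function easy to exercise with a stub during local tests.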

Test your deployment

curl -X POST "https://your-worker.workers.dev/chat" \
  -H "Content-Type: application/json" \
  -d '{"model":"@cf/meta/llama-3.1-8b-instruct-fast","prompt":"Say hello in one sentence."}'

Local Development

cd apps/experiments/cloud-ai-proxy
npm install
npm run dev

Call the endpoint at http://localhost:8787/chat with POST (JSON body) or GET (query params). Requires a Cloudflare account with Workers AI enabled.
