# Website to llms.txt



Convert a webpage into the llms.txt format, a structured Markdown format designed for consumption by large language models (LLMs). The converter extracts the page's title, description, key links, and contact information into a standardized layout that LLMs can parse easily.

## API Reference [#api-reference]

### GET /llms.txt [#get-llmstxt]

Convert a webpage to llms.txt format by passing its URL in the `url` query parameter.

<TypeTable
  type="{
  url: {
    description: &#x22;The URL of the webpage to convert. Must be a valid HTTP or HTTPS URL.&#x22;,
    type: &#x22;string&#x22;,
    required: true,
  },
}"
/>

#### Example Request [#example-request]

```bash
curl "https://your-worker.workers.dev/llms.txt?url=https://www.cloudflare.com"
```
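The example passes the target URL unencoded. That works for simple URLs, but if the target has its own query string, its `&` is read as a separate parameter of the Worker request. Percent-encoding the target avoids this (with curl, `-G --data-urlencode "url=..."` does the same). The following is a minimal TypeScript sketch. `buildLlmsTxtUrl` is a hypothetical helper, not part of the project, and the hostname is the placeholder from the example above.

```typescript
// Hypothetical helper: builds a request URL for the /llms.txt endpoint.
// URLSearchParams percent-encodes the target, so characters such as "&" and "?"
// in the target URL cannot leak into the Worker's own query string.
function buildLlmsTxtUrl(workerBase: string, target: string): string {
  const endpoint = new URL("/llms.txt", workerBase);
  endpoint.searchParams.set("url", target);
  return endpoint.toString();
}

// Usage with the placeholder Worker hostname:
const requestUrl = buildLlmsTxtUrl(
  "https://your-worker.workers.dev",
  "https://example.com/search?q=llms&page=2",
);
console.log(requestUrl);
// https://your-worker.workers.dev/llms.txt?url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dllms%26page%3D2
```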

#### Response Format [#response-format]

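On success, the endpoint returns plain text in llms.txt format with `Content-Type: text/plain; charset=utf-8`. Errors are returned as JSON objects, described under [Error Responses](#error-responses).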

#### Example Response [#example-response]

```markdown
# Cloudflare - The Web Performance & Security Company

> Here at Cloudflare, we make the Internet work the way it should. Offering CDN, DNS, DDoS protection and security, find out how we can help your site.

## Key Information

- [Products](https://www.cloudflare.com/products/)
- [Solutions](https://www.cloudflare.com/solutions/)
- [Pricing](https://www.cloudflare.com/plans/)
- [Developers](https://developers.cloudflare.com/)
- [Learning Center](https://www.cloudflare.com/learning/)
- [Community](https://community.cloudflare.com/)
- [Support](https://www.cloudflare.com/support/)

## Contact

- [Contact Sales](mailto:sales@cloudflare.com)
- [Support Team](mailto:support@cloudflare.com)
```
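Because the output uses a fixed Markdown layout, a consumer such as a RAG pipeline can split it back into fields with simple line matching. The following is a minimal sketch. It assumes the exact layout shown above: one H1, one blockquote line, and `- [text](url)` list items under H2 headings. `parseLlmsTxt` and its types are illustrative names, not part of the Worker.

```typescript
// Illustrative shape of a parsed llms.txt document.
interface LlmsTxtLink {
  text: string;
  url: string;
}

interface LlmsTxtDoc {
  title: string;
  description: string;
  // Links grouped by H2 section name, e.g. "Key Information", "Contact".
  sections: Record<string, LlmsTxtLink[]>;
}

// Parses "# Title", "> description", "## Section" headings,
// and "- [text](url)" list items; other lines are ignored.
function parseLlmsTxt(text: string): LlmsTxtDoc {
  const doc: LlmsTxtDoc = { title: "", description: "", sections: {} };
  let current: string | null = null;

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      current = line.slice(3).trim();
      doc.sections[current] = [];
    } else if (line.startsWith("# ") && !doc.title) {
      doc.title = line.slice(2).trim();
    } else if (line.startsWith("> ") && !doc.description) {
      doc.description = line.slice(2).trim();
    } else if (current !== null) {
      // Match "- [anchor text](url)" inside the current section.
      const m = /^- \[(.*)\]\((\S+)\)$/.exec(line);
      if (m) doc.sections[current].push({ text: m[1], url: m[2] });
    }
  }
  return doc;
}
```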

## Format Specification [#format-specification]

The llms.txt output contains four sections, in this order:

1. **Title (H1)**: The page title, or the site hostname if the page has no title
2. **Description (Blockquote)**: The meta description, falling back to `og:description`
3. **Key Information (H2)**: Up to 100 links extracted from the page, each with its anchor text
4. **Contact (H2)**: `mailto:` links found on the page, or the website URL as a fallback
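The four sections map directly onto a Markdown template. The following sketch shows a renderer. It assumes the extracted fields are already available. `renderLlmsTxt` and `PageInfo` are hypothetical names, and the Worker's exact whitespace may differ.

```typescript
interface Link {
  text: string;
  url: string;
}

// Hypothetical input: the fields produced by the extraction steps.
interface PageInfo {
  title: string;
  description: string;
  keyLinks: Link[];
  contacts: Link[];
}

const MAX_KEY_LINKS = 100; // the Key Information section lists at most 100 links

function renderLlmsTxt(info: PageInfo): string {
  const bullet = (l: Link) => `- [${l.text}](${l.url})`;
  const lines: string[] = [
    `# ${info.title}`,         // 1. Title (H1)
    "",
    `> ${info.description}`,   // 2. Description (blockquote)
    "",
    "## Key Information",      // 3. Key links (H2)
    "",
    ...info.keyLinks.slice(0, MAX_KEY_LINKS).map(bullet),
    "",
    "## Contact",              // 4. Contact (H2)
    "",
    ...info.contacts.map(bullet),
  ];
  return lines.join("\n") + "\n";
}
```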

## Error Responses [#error-responses]

### Invalid URL [#invalid-url]

```json
{
  "success": false,
  "error": "Missing or invalid query parameter: url",
  "code": "INVALID_URL"
}
```
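The `url` parameter must be a valid HTTP or HTTPS URL. The following sketch shows one way to perform that check with the WHATWG `URL` parser. `validateTargetUrl` is a hypothetical name, and the error object mirrors the JSON shown above.

```typescript
// Error body shape mirroring the documented JSON responses.
interface ErrorBody {
  success: false;
  error: string;
  code: "INVALID_URL" | "FETCH_ERROR";
}

// Hypothetical validator: accepts only absolute http(s) URLs.
function validateTargetUrl(raw: string | null | undefined): URL | ErrorBody {
  const invalid: ErrorBody = {
    success: false,
    error: "Missing or invalid query parameter: url",
    code: "INVALID_URL",
  };
  if (!raw) return invalid;
  try {
    const parsed = new URL(raw); // throws on relative or malformed input
    return parsed.protocol === "http:" || parsed.protocol === "https:"
      ? parsed
      : invalid;
  } catch {
    return invalid;
  }
}
```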

### Fetch Error [#fetch-error]

```json
{
  "success": false,
  "error": "HTTP 404",
  "code": "FETCH_ERROR"
}
```
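Successful conversions are `text/plain`, and the errors above are JSON, so a client can branch on the `Content-Type` header. The following is a sketch. It assumes that error responses are sent with an `application/json` content type, which this page does not state explicitly. `interpretResponse` and `fetchLlmsTxt` are hypothetical names.

```typescript
type LlmsTxtResult =
  | { ok: true; text: string }
  | { ok: false; error: string; code: string };

// Assumption: errors use an application/json Content-Type,
// while successful conversions use text/plain.
function interpretResponse(contentType: string | null, body: string): LlmsTxtResult {
  if (contentType?.includes("application/json")) {
    const parsed = JSON.parse(body) as { success?: boolean; error?: string; code?: string };
    if (parsed.success === false) {
      return { ok: false, error: parsed.error ?? "Unknown error", code: parsed.code ?? "UNKNOWN" };
    }
  }
  return { ok: true, text: body };
}

// Network wrapper using the global fetch (Workers, Node 18+, browsers).
async function fetchLlmsTxt(workerBase: string, target: string): Promise<LlmsTxtResult> {
  const endpoint = new URL("/llms.txt", workerBase);
  endpoint.searchParams.set("url", target);
  const res = await fetch(endpoint);
  return interpretResponse(res.headers.get("content-type"), await res.text());
}
```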

## Technical Details [#technical-details]

* Built with the [Hono](https://hono.dev/) framework
* Runs on Cloudflare Workers
* Implements llms.txt specification v1.1.1
* Extracts up to 100 key links from the page
* Prioritizes metadata over HTML content for descriptions
* Returns plain text with UTF-8 encoding

## Extraction Logic [#extraction-logic]

### Title [#title]

1. Uses the text of the `<title>` tag
2. Falls back to the hostname if no title is found
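The following regex-based sketch illustrates this fallback chain. The Worker's actual HTML parser is not documented here, and this version does not decode HTML entities. `extractTitle` is a hypothetical name.

```typescript
// Returns the <title> text, or the page hostname if the tag is missing or empty.
function extractTitle(html: string, pageUrl: string): string {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  // Collapse internal whitespace, since titles often span several lines.
  const title = match ? match[1].replace(/\s+/g, " ").trim() : "";
  return title || new URL(pageUrl).hostname;
}
```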

### Description [#description]

1. Checks the `<meta name="description">` tag
2. Falls back to the `<meta property="og:description">` tag
3. Falls back to a generic description that includes the URL
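The following sketch expresses the same three-step order in TypeScript. It collects the attributes of every `<meta>` tag regardless of attribute order, then tries each source in turn. The wording of the generic fallback is hypothetical, and unquoted attribute values are not handled. `metaTags` and `extractDescription` are illustrative names.

```typescript
// Parses the quoted attributes of every <meta> tag into lowercase-keyed maps.
function metaTags(html: string): Array<Record<string, string>> {
  const tags: Array<Record<string, string>> = [];
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const attrs: Record<string, string> = {};
    for (const m of tag.matchAll(/([\w:-]+)\s*=\s*("([^"]*)"|'([^']*)')/g)) {
      attrs[m[1].toLowerCase()] = m[3] ?? m[4];
    }
    tags.push(attrs);
  }
  return tags;
}

function extractDescription(html: string, pageUrl: string): string {
  const metas = metaTags(html);
  // Content of the first <meta> whose `attr` equals `value`, if non-empty.
  const find = (attr: string, value: string) =>
    metas.find((a) => a[attr]?.toLowerCase() === value)?.content?.trim();
  return (
    find("name", "description") ||        // 1. <meta name="description">
    find("property", "og:description") || // 2. <meta property="og:description">
    `Content from ${pageUrl}`             // 3. generic fallback (wording hypothetical)
  );
}
```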

### Key Links [#key-links]

* Extracts links from `<a>` tags throughout the page
* Includes the anchor text with each link
* Keeps at most 100 links
* Excludes in-page anchor links (`#`), `javascript:` links, and `mailto:` links
* Deduplicates by URL
* Resolves relative URLs to absolute URLs
* Truncates anchor text longer than 200 characters
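The following regex-based sketch applies these rules in order. It illustrates the rules; it is not the Worker's code. `extractKeyLinks` is a hypothetical name. Falling back to the URL when a link has no anchor text is an assumption of this sketch.

```typescript
interface KeyLink {
  text: string;
  url: string;
}

const MAX_LINKS = 100;
const MAX_TEXT_LENGTH = 200;

function extractKeyLinks(html: string, pageUrl: string): KeyLink[] {
  const links: KeyLink[] = [];
  const seen = new Set<string>();
  // Capture the quoted href value and the inner HTML of each <a> element.
  const anchorRe = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')[^>]*>([\s\S]*?)<\/a>/gi;

  for (const m of html.matchAll(anchorRe)) {
    const href = (m[1] ?? m[2]).trim();
    // Skip empty hrefs, in-page anchors, javascript: and mailto: links.
    if (!href || href.startsWith("#") || /^(javascript|mailto):/i.test(href)) continue;

    let absolute: string;
    try {
      absolute = new URL(href, pageUrl).toString(); // resolve relative URLs
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (seen.has(absolute)) continue; // deduplicate by URL
    seen.add(absolute);

    // Strip nested tags, collapse whitespace, and truncate long anchor text.
    const text = m[3].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim().slice(0, MAX_TEXT_LENGTH);
    // Assumption: use the URL as the label when the anchor has no text.
    links.push({ text: text || absolute, url: absolute });
    if (links.length >= MAX_LINKS) break;
  }
  return links;
}
```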

### Contact Information [#contact-information]

* Extracts `mailto:` links, if any are present
* Falls back to the website URL if no contact links are found
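The following sketch shows the contact rule. It collects `mailto:` anchors case-insensitively and deduplicates them. If none are found, it returns the site origin. The "Website" label, the choice of origin rather than the full page URL, and the name `extractContacts` are all assumptions.

```typescript
interface ContactLink {
  text: string;
  url: string;
}

function extractContacts(html: string, pageUrl: string): ContactLink[] {
  const contacts: ContactLink[] = [];
  const seen = new Set<string>();
  // Groups 1/2: double- or single-quoted mailto: href; group 3: inner HTML.
  const mailtoRe = /<a\b[^>]*?\bhref\s*=\s*(?:"(mailto:[^"]*)"|'(mailto:[^']*)')[^>]*>([\s\S]*?)<\/a>/gi;

  for (const m of html.matchAll(mailtoRe)) {
    const href = (m[1] ?? m[2]).trim();
    const key = href.toLowerCase();
    if (seen.has(key)) continue; // deduplicate case-insensitively
    seen.add(key);
    const text = m[3].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    // Assumption: use the address itself when the link has no text.
    contacts.push({ text: text || href.slice("mailto:".length), url: href });
  }

  // No mailto: links: fall back to the site origin (label is hypothetical).
  if (contacts.length === 0) {
    contacts.push({ text: "Website", url: new URL(pageUrl).origin });
  }
  return contacts;
}
```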

## Use Cases [#use-cases]

* **LLM Context**: Provide structured website information to language models
* **AI Assistants**: Enable AI to understand website structure and navigation
* **Documentation Parsing**: Convert documentation sites into LLM-friendly format
* **Content Summarization**: Extract key information for AI-powered summaries
* **Chatbot Training**: Generate training data from website content
* **RAG Systems**: Prepare website data for retrieval-augmented generation

## Limitations [#limitations]

* Static HTML extraction only: JavaScript is not executed, so client-rendered content is not captured
* Produces a simplified llms.txt for one URL, not a full site crawl
* Contact section only finds `mailto:` links present in the HTML

## Deployment [#deployment]

<Steps>
  <Step>
    ### Click the deploy button [#click-the-deploy-button]

    [![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/shrinathsnayak/cloudflare-experiments/tree/main/apps/experiments/website-to-llms-txt)
  </Step>

  <Step>
    ### Deploy [#deploy]

    No additional configuration required.
  </Step>

  <Step>
    ### Test your deployment [#test-your-deployment]

    ```bash
    curl "https://your-worker.workers.dev/llms.txt?url=https://example.com"
    ```
  </Step>
</Steps>

## Local Development [#local-development]

```bash
cd apps/experiments/website-to-llms-txt
npm install
npm run dev
```

Test locally:

```bash
curl "http://localhost:8787/llms.txt?url=https://example.com"
```

## Cloudflare Features Used [#cloudflare-features-used]

* **[Workers](https://developers.cloudflare.com/workers/)** - Edge compute runtime
* **[Fetch API](https://developers.cloudflare.com/workers/runtime-apis/fetch/)** - Remote HTML retrieval for llms.txt conversion
