Website to llms.txt

Convert any webpage into llms.txt, a structured Markdown format optimized for Large Language Model consumption. The Worker extracts the page's title, description, key links, and contact information and returns them in a standardized layout that LLMs can easily parse.

API Reference

GET /llms.txt

Convert a webpage to llms.txt format by providing its URL.

Query Parameters

url (string, required): The URL of the webpage to convert.

Example Request

curl "https://your-worker.workers.dev/llms.txt?url=https://www.cloudflare.com"
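When the target URL carries its own query string, it should be percent-encoded so that its `&` and `?` characters are not read as part of the Worker's query. The sketch below shows one way to build the request URL in TypeScript; `buildLlmsTxtRequestUrl` is a hypothetical helper name and `your-worker.workers.dev` is the same placeholder host used in the curl example.

```typescript
// Hypothetical helper (not part of the Worker): builds the request URL
// for the /llms.txt endpoint from a Worker origin and a target page URL.
function buildLlmsTxtRequestUrl(workerOrigin: string, targetUrl: string): string {
  const endpoint = new URL("/llms.txt", workerOrigin);
  // searchParams.set percent-encodes the value, so "&" and "?" inside
  // targetUrl are not mistaken for the Worker's own query string.
  endpoint.searchParams.set("url", targetUrl);
  return endpoint.toString();
}

const requestUrl = buildLlmsTxtRequestUrl(
  "https://your-worker.workers.dev",
  "https://www.cloudflare.com/plans/?ref=docs&lang=en",
);
console.log(requestUrl);
// https://your-worker.workers.dev/llms.txt?url=https%3A%2F%2Fwww.cloudflare.com%2Fplans%2F%3Fref%3Ddocs%26lang%3Den
```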

Response Format

The endpoint returns plain text in llms.txt format with Content-Type: text/plain; charset=utf-8.

Example Response

# Cloudflare - The Web Performance & Security Company

> Here at Cloudflare, we make the Internet work the way it should. Offering CDN, DNS, DDoS protection and security, find out how we can help your site.

## Key Information

- [Products](https://www.cloudflare.com/products/)
- [Solutions](https://www.cloudflare.com/solutions/)
- [Pricing](https://www.cloudflare.com/plans/)
- [Developers](https://developers.cloudflare.com/)
- [Learning Center](https://www.cloudflare.com/learning/)
- [Community](https://community.cloudflare.com/)
- [Support](https://www.cloudflare.com/support/)

## Contact

- [Contact Sales](mailto:sales@cloudflare.com)
- [Support Team](mailto:support@cloudflare.com)

Format Specification

The llms.txt format follows this structure:

Title (H1): The page title or site name
Description (Blockquote): Meta description or og:description
Key Information (H2): Up to 100 important links from the page with anchor text
Contact (H2): Contact information (mailto links or fallback to website URL)
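The structure above can be sketched as a small renderer. This is an illustration of the layout only, not the Worker's actual source; the `LlmsTxtPage` and `LlmsTxtLink` types and the `renderLlmsTxt` function are hypothetical names introduced for the example.

```typescript
// Hypothetical types describing the four parts of the output.
interface LlmsTxtLink {
  text: string;
  url: string;
}

interface LlmsTxtPage {
  title: string;
  description: string;
  links: LlmsTxtLink[];
  contacts: LlmsTxtLink[];
}

// Renders the page as: H1 title, blockquote description,
// "Key Information" link list (capped at 100), then "Contact" list.
function renderLlmsTxt(page: LlmsTxtPage): string {
  const toItem = (link: LlmsTxtLink): string => `- [${link.text}](${link.url})`;
  const lines = [
    `# ${page.title}`,
    "",
    `> ${page.description}`,
    "",
    "## Key Information",
    "",
    ...page.links.slice(0, 100).map(toItem),
    "",
    "## Contact",
    "",
    ...page.contacts.map(toItem),
  ];
  return lines.join("\n") + "\n";
}

const renderedExample = renderLlmsTxt({
  title: "Example Domain",
  description: "An illustrative page.",
  links: [{ text: "Docs", url: "https://example.com/docs" }],
  contacts: [{ text: "Support", url: "mailto:support@example.com" }],
});
console.log(renderedExample);
```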

Error Responses

Invalid URL

{
  "success": false,
  "error": "Missing or invalid query parameter: url",
  "code": "INVALID_URL"
}

Fetch Error

{
  "success": false,
  "error": "HTTP 404",
  "code": "FETCH_ERROR"
}
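A client can tell the two outcomes apart by checking whether the body parses as the JSON error shape shown above. The sketch below assumes error bodies always carry the `success`, `error`, and `code` fields from these examples; `LlmsTxtErrorBody` and `parseErrorBody` are hypothetical names, and the document does not state which HTTP status codes accompany each error.

```typescript
// Shape of the error bodies shown in the examples above.
interface LlmsTxtErrorBody {
  success: false;
  error: string;
  code: string;
}

// Returns the parsed error, or null when the body is not an error object
// (for example, when it is a plain-text llms.txt document).
function parseErrorBody(body: string): LlmsTxtErrorBody | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null; // Not JSON, so not an error response.
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const candidate = parsed as Record<string, unknown>;
  if (
    candidate.success === false &&
    typeof candidate.error === "string" &&
    typeof candidate.code === "string"
  ) {
    return { success: false, error: candidate.error, code: candidate.code };
  }
  return null;
}

const fetchFailure = parseErrorBody(
  '{"success": false, "error": "HTTP 404", "code": "FETCH_ERROR"}',
);
if (fetchFailure !== null) {
  console.log(`${fetchFailure.code}: ${fetchFailure.error}`);
}
```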

Technical Details

Built with Hono framework
Runs on Cloudflare Workers
Implements llms.txt specification v1.1.1
Extracts up to 100 key links from the page
Prioritizes metadata over HTML content for descriptions
Returns plain text with UTF-8 encoding

Extraction Logic

Title

Extracts from <title> tag
Falls back to hostname if no title found
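The title rule can be sketched with a regular expression. This is a minimal illustration, not the Worker's actual code, which may use a different HTML parser; `extractTitle` is a hypothetical name, and both the whitespace collapsing and the treatment of an empty `<title>` as "no title found" are assumptions.

```typescript
// Hypothetical sketch of the title rule: <title> text, else the hostname.
function extractTitle(html: string, pageUrl: string): string {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  // Assumption: collapse runs of whitespace and treat an empty title as missing.
  const title = match ? match[1].replace(/\s+/g, " ").trim() : "";
  return title !== "" ? title : new URL(pageUrl).hostname;
}

console.log(extractTitle("<title>Example Domain</title>", "https://example.com/"));
// Example Domain
console.log(extractTitle("<p>No title here</p>", "https://example.com/about"));
// example.com
```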

Description

Checks <meta name="description"> tag
Falls back to <meta property="og:description"> tag
Falls back to generic description with the URL
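The fallback chain above might look like the following. It is a regex-based sketch under stated assumptions: the helper names (`getAttribute`, `findMetaContent`, `extractDescription`) are hypothetical, HTML entities are not decoded, and the wording of the generic fallback is a placeholder, since the document does not give the exact text.

```typescript
// Reads a quoted attribute value from a single tag string, or null if absent.
function getAttribute(tag: string, name: string): string | null {
  const pattern = new RegExp(`\\s${name}\\s*=\\s*(?:"([^"]*)"|'([^']*)')`, "i");
  const match = pattern.exec(tag);
  if (!match) return null;
  return match[1] !== undefined ? match[1] : match[2];
}

// Finds the content of the first <meta> tag whose `attr` equals `value`.
function findMetaContent(html: string, attr: string, value: string): string | null {
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of metaTags) {
    const key = getAttribute(tag, attr);
    if (key !== null && key.toLowerCase() === value) {
      const content = getAttribute(tag, "content");
      if (content !== null && content.trim() !== "") return content.trim();
    }
  }
  return null;
}

// Meta description first, then og:description, then a generic sentence.
function extractDescription(html: string, pageUrl: string): string {
  return (
    findMetaContent(html, "name", "description") ??
    findMetaContent(html, "property", "og:description") ??
    `Website content from ${pageUrl}` // Placeholder wording (assumption).
  );
}

const ogOnlyHtml = '<meta property="og:description" content="From Open Graph">';
console.log(extractDescription(ogOnlyHtml, "https://example.com/"));
// From Open Graph
```

Matching attributes by name rather than by position keeps the sketch independent of attribute order, so `content` may appear before or after `name`.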

Key Links

Extracts links from <a> tags throughout the page
Includes anchor text with each link
Limits to 100 links maximum
Excludes anchor links (#), javascript:, and mailto: links
Deduplicates by URL
Resolves relative URLs to absolute URLs
Truncates long anchor text to 200 characters
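The link rules above can be combined into a single pass over the page's anchors. The sketch below is regex-based and only illustrates the listed rules; the Worker's real implementation may parse HTML differently. `KeyLink` and `extractKeyLinks` are hypothetical names, and using the URL as the label when an anchor has no text is an assumption.

```typescript
interface KeyLink {
  text: string;
  url: string;
}

function extractKeyLinks(
  html: string,
  pageUrl: string,
  maxLinks = 100,
  maxTextLength = 200,
): KeyLink[] {
  const anchorPattern =
    /<a\b[^>]*?\shref\s*=\s*(?:"([^"]*)"|'([^']*)')[^>]*>([\s\S]*?)<\/a>/gi;
  const links: KeyLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = anchorPattern.exec(html)) !== null && links.length < maxLinks) {
    const href = (match[1] !== undefined ? match[1] : match[2]).trim();
    // Exclude empty hrefs, in-page anchors, javascript: and mailto: links.
    if (href === "" || /^(#|javascript:|mailto:)/i.test(href)) continue;
    let absolute: string;
    try {
      absolute = new URL(href, pageUrl).toString(); // Resolve relative URLs.
    } catch {
      continue; // Skip hrefs that cannot be parsed as URLs.
    }
    // Deduplicate by resolved URL.
    if (links.some((link) => link.url === absolute)) continue;
    // Strip nested tags and collapse whitespace in the anchor text.
    const text = match[3].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
    // Assumption: fall back to the URL when the anchor has no text.
    const label = text !== "" ? text : absolute;
    links.push({ text: label.slice(0, maxTextLength), url: absolute });
  }
  return links;
}

const sampleLinksHtml =
  '<a href="/products/">Products</a> <a href="#top">Top</a> ' +
  '<a href="mailto:sales@example.com">Mail</a> ' +
  '<a href="https://example.com/products/">Products again</a> ' +
  "<a href='docs/intro'><span>Docs</span> Intro</a>";
console.log(extractKeyLinks(sampleLinksHtml, "https://example.com/"));
```

For the sample input, only the `/products/` and `docs/intro` links should survive: the in-page anchor and the mailto link are excluded, and the second products link resolves to a URL already seen.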

Contact Information

Extracts mailto: links if available
Falls back to website URL if no contact links found
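A minimal sketch of the contact rule follows, again regex-based and not taken from the Worker's source. `ContactLink` and `extractContacts` are hypothetical names; the `Website` label for the fallback entry and the use of the email address as the label for text-less links are both assumptions.

```typescript
interface ContactLink {
  text: string;
  url: string;
}

function extractContacts(html: string, pageUrl: string): ContactLink[] {
  // Match only anchors whose href starts with "mailto:".
  const mailtoPattern =
    /<a\b[^>]*?\shref\s*=\s*(?:"(mailto:[^"]*)"|'(mailto:[^']*)')[^>]*>([\s\S]*?)<\/a>/gi;
  const contacts: ContactLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = mailtoPattern.exec(html)) !== null) {
    const href = (match[1] !== undefined ? match[1] : match[2]).trim();
    if (contacts.some((contact) => contact.url === href)) continue;
    // Address without the scheme or any "?subject=..." suffix.
    const address = href.replace(/^mailto:/i, "").split("?")[0];
    const text = match[3].replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
    contacts.push({ text: text !== "" ? text : address, url: href });
  }
  // Fallback: point at the website itself when no mailto links exist.
  return contacts.length > 0 ? contacts : [{ text: "Website", url: pageUrl }];
}

const sampleContactHtml =
  '<a href="mailto:sales@example.com">Contact Sales</a>' +
  '<a href="/about">About</a>' +
  "<a href='mailto:support@example.com'><b></b></a>";
console.log(extractContacts(sampleContactHtml, "https://example.com/"));
```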

Use Cases

LLM Context: Provide structured website information to language models
AI Assistants: Enable AI to understand website structure and navigation
Documentation Parsing: Convert documentation sites into LLM-friendly format
Content Summarization: Extract key information for AI-powered summaries
Chatbot Training: Generate training data from website content
RAG Systems: Prepare website data for retrieval-augmented generation

Limitations

Static HTML extraction; client-rendered content is not executed
Produces a simplified llms.txt for one URL, not a full site crawl
Contact section only finds mailto: links present in the HTML

Deployment

Click the deploy button


No additional configuration required.

Test your deployment

curl "https://your-worker.workers.dev/llms.txt?url=https://example.com"

Local Development

cd apps/experiments/website-to-llms-txt
npm install
npm run dev

Test locally:

curl "http://localhost:8787/llms.txt?url=https://example.com"

Cloudflare Features Used

Workers - Edge compute runtime
Fetch API - Remote HTML retrieval for llms.txt conversion
