Website to llms.txt
Convert any webpage into llms.txt format for LLM consumption
Convert any webpage into the llms.txt format, a structured Markdown format optimized for Large Language Model consumption. The tool extracts the page's title, description, key links, and contact information and returns them in a standardized layout that LLMs can easily parse.
API Reference
GET /llms.txt
Convert a webpage to llms.txt format by providing its URL.
Prop    Type
url     string (query parameter: the URL of the webpage to convert)
Example Request
curl "https://your-worker.workers.dev/llms.txt?url=https://www.cloudflare.com"
Response Format
The endpoint returns plain text in llms.txt format with Content-Type: text/plain; charset=utf-8.
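The same request can be made from code. The following is a minimal client sketch, not part of the project itself: the function names `buildLlmsTxtRequest` and `fetchLlmsTxt` are hypothetical, and the check on the HTTP status assumes that failures are reported with a non-2xx status, which this page does not state. It percent-encodes the target URL, which matters when the target has its own query string.

```typescript
// Build the request URL for the GET /llms.txt endpoint.
// `workerBase` is the origin of your deployed Worker (placeholder value in the usage below).
function buildLlmsTxtRequest(workerBase: string, targetUrl: string): string {
  const endpoint = new URL("/llms.txt", workerBase);
  // searchParams.set percent-encodes the value, so "&" or "?" in targetUrl stay intact.
  endpoint.searchParams.set("url", targetUrl);
  return endpoint.toString();
}

// Fetch the llms.txt rendering of a page and return it as a string.
// Assumption: a successful response is text/plain, as documented above;
// anything else is treated as a failure here.
async function fetchLlmsTxt(workerBase: string, targetUrl: string): Promise<string> {
  const res = await fetch(buildLlmsTxtRequest(workerBase, targetUrl));
  const contentType = res.headers.get("content-type") ?? "";
  if (!res.ok || !contentType.startsWith("text/plain")) {
    throw new Error(`Unexpected response: HTTP ${res.status} (${contentType})`);
  }
  return res.text();
}
```

For example, `buildLlmsTxtRequest("https://your-worker.workers.dev", "https://www.cloudflare.com")` produces the URL from the curl command with the `url` value percent-encoded.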
Example Response
# Cloudflare - The Web Performance & Security Company
> Here at Cloudflare, we make the Internet work the way it should. Offering CDN, DNS, DDoS protection and security, find out how we can help your site.
## Key Information
- [Products](https://www.cloudflare.com/products/)
- [Solutions](https://www.cloudflare.com/solutions/)
- [Pricing](https://www.cloudflare.com/plans/)
- [Developers](https://developers.cloudflare.com/)
- [Learning Center](https://www.cloudflare.com/learning/)
- [Community](https://community.cloudflare.com/)
- [Support](https://www.cloudflare.com/support/)
## Contact
- [Contact Sales](mailto:sales@cloudflare.com)
- [Support Team](mailto:support@cloudflare.com)
Format Specification
The llms.txt format follows this structure:
- Title (H1): The page title or site name
- Description (Blockquote): Meta description or og:description
- Key Information (H2): Up to 100 important links from the page with anchor text
- Contact (H2): Contact information (mailto links or fallback to website URL)
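To make the structure concrete, here is a small rendering sketch under stated assumptions. The `LlmsPage` and `LlmsLink` types and the `renderLlmsTxt` function are hypothetical names introduced for illustration, and the blank lines between blocks are an assumption, since this page does not show the exact whitespace the endpoint emits.

```typescript
// Hypothetical data shape for the four parts of the format.
interface LlmsLink {
  text: string;
  url: string;
}

interface LlmsPage {
  title: string;
  description: string;
  links: LlmsLink[];
  contacts: LlmsLink[];
}

// Render the four parts in order: H1 title, blockquote description,
// "Key Information" list, and "Contact" list.
function renderLlmsTxt(page: LlmsPage): string {
  const item = (link: LlmsLink): string => `- [${link.text}](${link.url})`;
  const lines = [
    `# ${page.title}`,
    "",
    `> ${page.description}`,
    "",
    "## Key Information",
    ...page.links.slice(0, 100).map(item), // the format allows up to 100 links
    "",
    "## Contact",
    ...page.contacts.map(item),
  ];
  return lines.join("\n") + "\n";
}
```

Keeping the renderer separate from extraction means the link cap and section order live in one place, which makes the output easy to compare against the example response.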
Error Responses
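The error payloads shown in this section are JSON objects with `success`, `error`, and `code` fields. A client might tell them apart from a plain-text result with a guard like the one below. This is a sketch only: `LlmsTxtError` and `parseLlmsTxtError` are hypothetical names, and only the two codes documented here (`INVALID_URL`, `FETCH_ERROR`) are known, so `code` is typed as a plain string.

```typescript
// Shape shared by the documented error payloads.
interface LlmsTxtError {
  success: false;
  error: string;
  code: string; // e.g. "INVALID_URL" or "FETCH_ERROR"; other codes may exist
}

// Return the parsed error if `body` is an error payload, otherwise null
// (for example when `body` is a normal llms.txt document).
function parseLlmsTxtError(body: string): LlmsTxtError | null {
  try {
    const data: unknown = JSON.parse(body);
    if (typeof data === "object" && data !== null) {
      const record = data as Record<string, unknown>;
      if (
        record.success === false &&
        typeof record.error === "string" &&
        typeof record.code === "string"
      ) {
        return { success: false, error: record.error, code: record.code };
      }
    }
  } catch {
    // Not JSON: treat the body as a regular plain-text response.
  }
  return null;
}
```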
Invalid URL
{
"success": false,
"error": "Missing or invalid query parameter: url",
"code": "INVALID_URL"
}
Fetch Error
{
"success": false,
"error": "HTTP 404",
"code": "FETCH_ERROR"
}
Technical Details
- Built with Hono framework
- Runs on Cloudflare Workers
- Implements llms.txt specification v1.1.1
- Extracts up to 100 key links from the page
- Prioritizes metadata over HTML content for descriptions
- Returns plain text with UTF-8 encoding
Extraction Logic
Title
- Extracts from the <title> tag
- Falls back to the hostname if no title is found
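The title rule can be sketched as follows. This is an illustration using a regular expression, not the Worker's actual code, which may well use a streaming HTML parser instead; `extractTitle` is a hypothetical name.

```typescript
// Return the text of the first <title> element, or the hostname of the
// page URL when the title is missing or empty.
function extractTitle(html: string, pageUrl: string): string {
  const match = /<title\b[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  // Collapse runs of whitespace so multi-line titles become one line.
  const title = (match?.[1] ?? "").replace(/\s+/g, " ").trim();
  return title !== "" ? title : new URL(pageUrl).hostname;
}
```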
Description
- Checks the <meta name="description"> tag
- Falls back to the <meta property="og:description"> tag
- Falls back to a generic description containing the URL
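The fallback chain for the description might look like the sketch below. The helper names `metaContent` and `extractDescription` are hypothetical, the regex-based attribute matching is a simplification, and the wording of the generic fallback (`Content from <url>`) is a placeholder because this page does not give the real text.

```typescript
// Find the content attribute of the first <meta> tag whose `attr`
// (name or property) equals `value`. Returns undefined if none is found.
function metaContent(html: string, attr: "name" | "property", value: string): string | undefined {
  const keyPattern = new RegExp(`\\b${attr}\\s*=\\s*(["'])(.*?)\\1`, "i");
  const contentPattern = /\bcontent\s*=\s*(["'])(.*?)\1/i;
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const key = keyPattern.exec(tag);
    if (key === null || (key[2] ?? "").toLowerCase() !== value) continue;
    const content = (contentPattern.exec(tag)?.[2] ?? "").trim();
    if (content !== "") return content;
  }
  return undefined;
}

// Apply the documented order: meta description, then og:description,
// then a generic description that mentions the URL.
function extractDescription(html: string, pageUrl: string): string {
  return (
    metaContent(html, "name", "description") ??
    metaContent(html, "property", "og:description") ??
    `Content from ${pageUrl}` // placeholder wording, not the tool's actual text
  );
}
```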
Key Links
- Extracts links from <a> tags throughout the page
- Includes anchor text with each link
- Limits to 100 links maximum
- Excludes anchor links (#), javascript:, and mailto: links
- Deduplicates by URL
- Resolves relative URLs to absolute URLs
- Truncates long anchor text to 200 characters
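The link rules above combine into a single pass over the anchors. The sketch below is one way to express them, assuming a regex scan over static HTML; `KeyLink` and `extractKeyLinks` are hypothetical names, HTML entities are not decoded, and using the URL as the label when the anchor text is empty is an assumption rather than documented behavior.

```typescript
interface KeyLink {
  text: string;
  url: string;
}

function extractKeyLinks(html: string, pageUrl: string, maxLinks = 100, maxText = 200): KeyLink[] {
  const links: KeyLink[] = [];
  const seen = new Set<string>();
  const anchor = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;
  for (let m = anchor.exec(html); m !== null; m = anchor.exec(html)) {
    if (links.length >= maxLinks) break; // limit to 100 links by default
    const href = (m[2] ?? "").trim();
    // Exclude in-page anchors, javascript: and mailto: links.
    if (href === "" || href.startsWith("#") || /^(javascript|mailto):/i.test(href)) continue;
    let url: string;
    try {
      url = new URL(href, pageUrl).href; // resolve relative URLs against the page
    } catch {
      continue; // skip hrefs that cannot be parsed
    }
    if (seen.has(url)) continue; // deduplicate by resolved URL
    seen.add(url);
    // Strip nested tags, collapse whitespace, and truncate the anchor text.
    const text = (m[3] ?? "").replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim().slice(0, maxText);
    links.push({ text: text !== "" ? text : url, url }); // empty-text fallback is an assumption
  }
  return links;
}
```

Deduplicating after resolution means `/products/` and `https://www.cloudflare.com/products/` on the same page count as one link.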
Contact Information
- Extracts mailto: links if available
- Falls back to website URL if no contact links found
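A sketch of the contact rule, again regex-based and for illustration only. `ContactLink` and `extractContacts` are hypothetical names, and the "Website" label used for the fallback entry is a placeholder, since this page does not say how the fallback is labeled.

```typescript
interface ContactLink {
  text: string;
  url: string;
}

// Collect mailto: anchors; if there are none, fall back to the page URL.
function extractContacts(html: string, pageUrl: string): ContactLink[] {
  const contacts: ContactLink[] = [];
  const anchor = /<a\b[^>]*?\bhref\s*=\s*(["'])(mailto:.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;
  for (let m = anchor.exec(html); m !== null; m = anchor.exec(html)) {
    const url = (m[2] ?? "").trim();
    const text = (m[3] ?? "").replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    // Use the address itself as the label when the anchor has no text (assumption).
    contacts.push({ text: text !== "" ? text : url.slice("mailto:".length), url });
  }
  return contacts.length > 0 ? contacts : [{ text: "Website", url: pageUrl }];
}
```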
Use Cases
- LLM Context: Provide structured website information to language models
- AI Assistants: Enable AI to understand website structure and navigation
- Documentation Parsing: Convert documentation sites into LLM-friendly format
- Content Summarization: Extract key information for AI-powered summaries
- Chatbot Training: Generate training data from website content
- RAG Systems: Prepare website data for retrieval-augmented generation
Limitations
- Static HTML extraction; client-rendered content is not executed
- Produces a simplified llms.txt for one URL, not a full site crawl
- Contact section only finds mailto: links present in the HTML
Deployment
Deploy
No additional configuration required.
Test your deployment
curl "https://your-worker.workers.dev/llms.txt?url=https://example.com"
Local Development
cd apps/experiments/website-to-llms-txt
npm install
npm run dev
Test locally:
curl "http://localhost:8787/llms.txt?url=https://example.com"