# Website to API (/docs/experiments/website-to-api)



Convert any webpage into clean, structured JSON data. Extract the title, all headings with their hierarchy, internal and external links, and images. Perfect for web scraping, content analysis, or building custom search indices.

## API Reference [#api-reference]

### GET /api [#get-api]

Transform a webpage into structured JSON by providing its URL.

<TypeTable
  type="{
  url: {
    description: &#x22;The URL of the webpage to convert. Must be a valid HTTP or HTTPS URL.&#x22;,
    type: &#x22;string&#x22;,
    required: true,
  },
}"
/>

#### Example Request [#example-request]

```bash
curl "https://your-worker.workers.dev/api?url=https://www.cloudflare.com"
```

#### Response Structure [#response-structure]

**`success`** `boolean`

Indicates if the request was successful

**`data`** `object`

The structured data extracted from the webpage

**`data.title`** `string | null`

The page title extracted from the `<title>` tag

**`data.headings`** `array`

Array of all headings (h1-h6) found on the page

**`data.headings[].level`** `number`

The heading level (1-6 corresponding to h1-h6)

**`data.headings[].text`** `string`

The text content of the heading (HTML tags stripped)

**`data.links`** `string[]`

Array of unique absolute URLs extracted from `<a>` tags. Excludes anchor links (#) and javascript: links. All relative URLs are resolved to absolute URLs.

**`data.images`** `string[]`

Array of unique absolute image URLs extracted from `<img>` tags. All relative URLs are resolved to absolute URLs.

#### Example Response [#example-response]

```json
{
  "success": true,
  "data": {
    "title": "Cloudflare - The Web Performance & Security Company",
    "headings": [
      {
        "level": 1,
        "text": "Welcome to Cloudflare"
      },
      {
        "level": 2,
        "text": "Performance"
      },
      {
        "level": 2,
        "text": "Security"
      },
      {
        "level": 3,
        "text": "DDoS Protection"
      }
    ],
    "links": [
      "https://www.cloudflare.com/products/",
      "https://www.cloudflare.com/plans/",
      "https://www.cloudflare.com/learning/",
      "https://developers.cloudflare.com/"
    ],
    "images": [
      "https://www.cloudflare.com/img/logo-web-badges/cf-logo-on-white-bg.svg",
      "https://www.cloudflare.com/img/products/workers.png"
    ]
  }
}
```

## Error Responses [#error-responses]

### Invalid URL [#invalid-url]

```json
{
  "success": false,
  "error": "Missing or invalid query parameter: url",
  "code": "INVALID_URL"
}
```

### Fetch Error [#fetch-error]

```json
{
  "success": false,
  "error": "HTTP 404",
  "code": "FETCH_ERROR"
}
```

## Technical Details [#technical-details]

* Built with [Hono](https://hono.dev/) framework
* Runs on Cloudflare Workers
* Regex-based HTML parsing for fast extraction
* Automatically resolves relative URLs to absolute URLs
* Deduplicates links and images
* Returns clean, structured JSON ready for further processing

## Processing Notes [#processing-notes]

* All HTML tags within headings are stripped, returning clean text
* Anchor links (starting with #) are excluded from the links array
* JavaScript URLs (javascript:) are excluded from the links array
* Duplicate links and images are automatically removed
* Relative URLs are resolved based on the requested page URL

## Use Cases [#use-cases]

* **Web Scraping**: Extract structured data from websites without parsing HTML
* **Content Analysis**: Analyze page structure and heading hierarchy
* **Link Extraction**: Build sitemaps or discover related content
* **Search Indexing**: Extract text and structure for custom search engines
* **Content Migration**: Extract content when migrating between platforms
* **SEO Audits**: Analyze heading structure and internal linking

## Limitations [#limitations]

* Static HTML parsing only; JavaScript-rendered content is not available
* Fetch timeout and HTML size limits apply
* Structure extraction is heuristic; not a full DOM or accessibility tree

## Deployment [#deployment]

<Steps>
  <Step>
    ### Click the deploy button [#click-the-deploy-button]

    [![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/shrinathsnayak/cloudflare-experiments/tree/main/apps/experiments/website-to-api)
  </Step>

  <Step>
    ### Deploy [#deploy]

    No additional configuration required.
  </Step>

  <Step>
    ### Test your deployment [#test-your-deployment]

    ```bash
    curl "https://your-worker.workers.dev/api?url=https://example.com"
    ```
  </Step>
</Steps>

## Local Development [#local-development]

```bash
cd apps/experiments/website-to-api
npm install
npm run dev
```

Test locally:

```bash
curl "http://localhost:8787/api?url=https://example.com"
```

## Cloudflare Features Used [#cloudflare-features-used]

* **[Workers](https://developers.cloudflare.com/workers/)** - Edge compute runtime
* **[Fetch API](https://developers.cloudflare.com/workers/runtime-apis/fetch/)** - Remote HTML retrieval and structured parsing
