url-to-markdown

Convert any publicly accessible URL into clean GitHub-Flavored Markdown with YAML frontmatter metadata. The page is rendered in a real browser, passed through a readability extractor to isolate the main article, then serialised to Markdown with normalised links, images, and fenced code blocks. Boilerplate (navigation, footer, aside, scripts, forms, buttons) is removed, and relative URLs are resolved to absolute URLs.

Useful for LLM ingestion pipelines, content archival, RSS-style feeds, and any workflow that needs plain-text article content without HTML noise.

Endpoint

POST /v1/convert/url-to-markdown

Content-Type: application/json

Output format: Markdown (.md, UTF-8) with a YAML frontmatter block at the top of the file containing page metadata. The output format is not configurable — Markdown with YAML frontmatter is always produced.

Authentication

This endpoint supports both private and public key authentication.

Private Key

Include your secret key in the X-API-Key header. Use this for server-to-server calls where the key is never exposed to the client.

X-API-Key: sk_live_your_private_key

Public Key with JWT

For client-side usage, first generate a JWT token using your public key, then pass it as a Bearer token.

Step 1 -- Get a token:

POST /v1/auth/token
X-API-Key: pk_live_your_public_key

Step 2 -- Use the token:

Authorization: Bearer eyJhbGciOiJIUzI1NiIs...

Note: Public key requests are restricted to a single URL, synchronous mode, and direct download. Async mode, batch processing, webhooks, and notification emails are not available with public keys.

Request Parameters

Top-Level Parameters

Parameter	Type	Required	Default	Description	Plan Gating
`url`	`string` or `string[]`	Yes	--	A single URL string or an array of URLs to convert. Multiple URLs require async mode.	--
`async_mode`	`boolean`	No	`false`	Run the conversion asynchronously. Returns a `batch_id` immediately for polling. Required for batch (multiple URLs).	Requires async access
`direct_download`	`boolean`	No	`false`	Return raw Markdown bytes in the response body instead of a JSON response with a presigned URL. Forced `true` for public keys. Incompatible with `async_mode` and multiple URLs.	--
`output_format`	`boolean`	No	`false`	When `true` with multiple URLs, bundles all output Markdown files into a single ZIP archive. Requires multiple URLs.	Requires ZIP output access
`output_filename`	`string`	No	Auto-generated	Custom filename for the output file. The `.md` extension is added automatically. Default format: `{domain}_{timestamp}.md`.	--
`job_id`	`string`	No	--	Client-provided job ID for timeout recovery. Public keys only. When a sync conversion exceeds reverse-proxy timeout limits, the client can poll `GET /v1/convert/status/{job_id}` to retrieve the result. Ignored for private keys.	--
`notification_email`	`string`	No	Project owner email	Email address to notify when an async job completes. Private keys only.	--
`callback_url`	`string`	No	--	Webhook URL to receive a POST request when the conversion completes. Private keys only.	Requires webhook access

Browser & Rendering Parameters

Parameter	Type	Required	Default	Description	Plan Gating
`viewport_width`	`integer`	No	`1920`	Browser viewport width in pixels. Affects responsive content and which layout variant is captured before extraction.	--
`viewport_height`	`integer`	No	`1080`	Browser viewport height in pixels. Used as a reference for rendering and viewport-unit calculation.	--
`load_media`	`boolean`	No	`true`	Wait for all images and videos to fully load before extraction. When `false`, extraction is faster but lazy-loaded images may have placeholder `src` values in the Markdown output.	--
`enable_scroll`	`boolean`	No	`true`	Scroll the page top-to-bottom to trigger lazy-loading content (IntersectionObserver-based loaders).	--
`handle_sticky_header`	`boolean`	No	`true`	Detect sticky/fixed headers and scroll to top before extraction so content ordering is preserved correctly.	--
`handle_cookies`	`boolean`	No	`true`	Auto-dismiss cookie consent banners (OneTrust, Cookiebot, Didomi, Usercentrics, and generic banners) before extraction.	--
`wait_for_images`	`boolean`	No	`true`	Wait for all `<img>` elements to finish loading (5-second timeout per image) so `alt` text and final `src` values are captured correctly.	--

Authentication & Custom Requests

Parameter	Type	Required	Default	Description	Plan Gating
`auth`	`object`	No	`null`	HTTP Basic Auth credentials for the target URL. Format: `{"username": "...", "password": "..."}`. Cannot be used together with an `Authorization` custom header.	Requires basic auth access
`cookies`	`array`	No	`null`	Array of cookie objects to inject before navigation. Maximum 50 cookies. Each cookie must have `name`, `value`, and either `domain` or `url`.	Requires basic auth access
`headers`	`object`	No	`null`	Dictionary of custom HTTP headers sent with every request to the target URL. Maximum 20 headers. Blocked headers: `host`, `content-length`, `transfer-encoding`, `connection`, `upgrade`, `te`, `trailer`.	Requires basic auth access

Not supported: The single_page and pdf_options parameters from the url-to-pdf endpoint are accepted for request-shape parity but have no effect on Markdown output. Markdown has no concept of pages, margins, or orientation.

Each item in the cookies array must follow this structure:

Field	Type	Required	Default	Description
`name`	`string`	Yes	--	Cookie name.
`value`	`string`	Yes	--	Cookie value.
`domain`	`string`	Conditional	--	Cookie domain. Either `domain` or `url` must be provided.
`url`	`string`	Conditional	--	URL to associate the cookie with. Either `domain` or `url` must be provided.
`path`	`string`	No	`"/"`	Cookie path. Defaults to `"/"` when `domain` is set.

Response

Synchronous with Direct Download (`direct_download=true`)

Private key -- returns raw Markdown bytes:

HTTP 200 OK
Content-Type: text/markdown; charset=utf-8
Content-Disposition: inline; filename="example_20260421_123456789.md"
X-Object-Key: env/files/{project_id}/url-to-markdown/example_20260421_123456789.md
X-File-Size: 8421
X-Conversion-Time: 6.3
X-Filename: example_20260421_123456789.md

(UTF-8 Markdown with YAML frontmatter)

Public key -- returns JSON with a presigned URL:

{
    "presigned_url": "https://spaces.example.com/...",
    "object_key": "env/files/{project_id}/url-to-markdown/example_20260421_123456789.md",
    "filename": "example_20260421_123456789.md",
    "file_size": 8421,
    "conversion_time_seconds": 6.3,
    "job_id": "client-provided-id"
}

Synchronous without Direct Download (`direct_download=false`)

Available with private keys only.

{
    "presigned_url": "https://spaces.example.com/...",
    "object_key": "env/files/{project_id}/url-to-markdown/example_20260421_123456789.md",
    "filename": "example_20260421_123456789.md",
    "file_size": 8421,
    "conversion_time_seconds": 6.3
}

Asynchronous Mode

Returns immediately with a batch_id for polling.

HTTP 202 Accepted

{
    "status": "processing",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "url_count": 5,
    "output_format": "individual"
}

When output_format=true (ZIP bundling):

{
    "status": "processing",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "url_count": 5,
    "output_format": "zip"
}

Job Status Polling (Public Keys Only)

For public key timeout recovery:

GET /v1/convert/status/{job_id}
Authorization: Bearer <jwt_token>

Status	Response
Processing	`{"status": "processing"}`
Success	`{"status": "success", "presigned_url": "...", "object_key": "..."}`
Failed	`{"status": "failed", "error": "..."}`

Batch Status Polling (Private Keys Only)

For async batch jobs, poll with the batch_id from the 202 response:

GET /v1/convert/batch/{batch_id}
X-API-Key: sk_live_your_private_key

Returns aggregate status, per-URL statuses, and presigned download URLs. See Batch Status Polling for the full response schema.

Webhook Callback Payload

When a callback_url is provided, Enconvert sends a POST request to that URL on completion.

Single URL job:

{
    "job_id": "activity_id",
    "status": "success",
    "batch_id": "...",
    "gcs_uri": "object_key",
    "filename": "example_20260421_123456789.md",
    "file_size": 8421
}

Batch job:

{
    "job_id": "activity_id",
    "status": "success",
    "batch_id": "...",
    "total_tasks": 10,
    "successful_tasks": 8,
    "failed_tasks": 2,
    "tasks": [
        {"url": "https://example.com/page1", "status": "success", "filename": "page1.md"},
        {"url": "https://example.com/page2", "status": "failed", "filename": null}
    ]
}

Output Format

Every Markdown file starts with a YAML frontmatter block containing page metadata, followed by the extracted article body.

---
url: https://example.com/articles/my-post
title: My Post Title
description: A short summary from the meta description or og:description tag.
links:
  - url: https://example.com/related
    text: Related article
  - url: https://example.com/about
    text: About the author
images:
  - url: https://example.com/hero.jpg
    alt: Hero image alt text
  - url: https://example.com/diagram.png
    alt: Architecture diagram
---

# My Post Title

Opening paragraph of the article body, converted to GitHub-Flavored Markdown...

## A Section Heading

- List item one
- List item two

\`\`\`python
def example():
    return "code blocks are fenced with language hints"
\`\`\`

[A link in the body](https://example.com/linked-page)

![An image in the body](https://example.com/inline-image.png)

Frontmatter Fields

Field	Type	Description
`url`	`string`	The final URL after redirects (not always the URL you sent).
`title`	`string`	The page title from `<title>`, falling back to the Readability-detected short title.
`description`	`string`	The `<meta name="description">` value, falling back to `<meta property="og:description">`.
`links`	`array`	Every `<a href>` found on the page, with absolute URLs and visible anchor text.
`images`	`array`	Every `<img src>` found on the page, with absolute URLs and `alt` text.

Markdown Conventions

Heading style: ATX (#, ##, ###)
List bullets: -
Emphasis: *bold*, *italic* with escaped * and _ in literal text
Soft line breaks: trailing two spaces (preserved in output)
Code blocks: fenced (```) with language hints detected from class="language-xxx", class="lang-xxx", class="highlight-source-xxx", data-lang, and data-language
Links: [text](url) when anchor text is present, autolink form <url> when the anchor is empty, anchor-only (#foo) and javascript: links are unwrapped to plain text
Images: ![alt](src), with title preserved when present, falling back to data-src when src is missing (lazy-loaded images)
Horizontal rules: ---

Features

Clean Content Extraction

Enconvert uses the Readability algorithm (the same library that powers Firefox Reader View) to isolate the main article content from the rest of the page, then applies a second pass of post-processing to produce clean Markdown.

Removed before conversion:

Navigation (<nav>), footers (<footer>), asides (<aside>)
Scripts (<script>, <noscript>), styles (<style>), iframes, forms, buttons
Inline SVG, canvas, and template elements
style, class, id, and all on* event handler attributes

Preserved:

Headings, paragraphs, lists, tables, blockquotes, code blocks
Links with their href and anchor text (absolute URLs)
Images with alt, title, and absolute src
Figures and figcaptions (inline images are kept inside these)

Clear Capture Mode

Before extraction, the page is rendered in a real browser and cleaned up the same way as url-to-pdf:

Cookie consent banners -- Auto-dismissed across the main page and iframes (OneTrust, Cookiebot, Didomi, Usercentrics, and generic banners).
Modal and popup dismissal -- Overlays closed via Escape key, ARIA close buttons, class-based close buttons, and role-based dialog buttons.
Scroll animation reveal -- Forces visibility on elements hidden by WOW.js, AOS, ScrollReveal, GSAP ScrollTrigger, and generic animation classes.
Sticky header handling -- Sticky/fixed headers detected and the page scrolled back to the top so content ordering is preserved.

Absolute URL Resolution

Every relative href and src in the extracted article is resolved against the final page URL (after redirects), so the Markdown output always contains absolute, clickable links — useful for LLM ingestion pipelines that would otherwise see broken relative paths.

Anchor-only links (#section), javascript: links, mailto:, and tel: links are not rewritten. Anchor-only and javascript: links are unwrapped to plain text because they have no meaning outside the original page.

Code Block Language Detection

Code blocks are fenced with a detected language hint where possible:

<pre><code class="language-python">...</code></pre>    →   ```python
<pre data-lang="js">...</pre>                          →   ```js
<pre><code class="highlight-source-shell">...</code></pre> →   ```shell

Classes matching language-*, lang-*, highlight-source-*, and brush:* are recognised, plus data-lang and data-language attributes on both the <pre> and its nested <code>. If no hint is found, the block is fenced without a language label.

HTTP Basic Auth

Pass auth with username and password to convert pages behind HTTP Basic Authentication.

{
    "url": "https://staging.example.com/docs/article",
    "auth": {
        "username": "admin",
        "password": "secret"
    }
}

Inject up to 50 cookies before the page loads. Useful for converting member-only or locale-specific article pages.

{
    "url": "https://example.com/members/post",
    "cookies": [
        {"name": "session_id", "value": "abc123", "domain": "example.com"},
        {"name": "locale", "value": "en-US", "domain": "example.com"}
    ]
}

Custom Headers

Send up to 20 custom HTTP headers with every request to the target page.

{
    "url": "https://example.com/api-docs",
    "headers": {
        "X-Custom-Token": "my-token-value",
        "Accept-Language": "en-US"
    }
}

Lazy Image Loading

When load_media and enable_scroll are enabled (both default to true), the converter scrolls the page slowly to trigger lazy loaders, then waits for all images to finish loading before capturing the final HTML. This ensures data-src values have been promoted to real src values and the frontmatter images list is complete.

Set load_media=false for faster extraction when you only need the text body — placeholder src values may remain in the output.

Additional Rendering Features

Viewport unit normalization -- CSS viewport units (vh, svh, lvh, dvh) are converted to fixed pixel values before extraction.
Stealth mode -- Browser fingerprint masking to avoid bot detection on protected pages.
Popup interception -- Automatically closes any new browser tabs or popups triggered by the page.
CSP bypass -- Handles Content Security Policy and Trusted Types restrictions that would otherwise block page manipulation.

Subscription Plan Gating

Feature	Free	Starter	Pro	Enterprise
Basic conversion (single URL, sync)	Yes	Yes	Yes	Yes
Viewport and rendering options	Yes	Yes	Yes	Yes
Async mode	No	Yes	Yes	Yes
Batch processing (multiple URLs)	No	Yes	Yes	Yes
ZIP output bundling	No	No	Yes	Yes
Webhook callbacks	No	No	Yes	Yes
HTTP Basic Auth	No	Yes	Yes	Yes
Cookie injection	No	Yes	Yes	Yes
Custom headers	No	Yes	Yes	Yes
Monthly conversions	100	Plan-based	Plan-based	Unlimited
Batch size limit	0	Plan-based	Plan-based	Unlimited
File retention	1 hour	Plan-based	Plan-based	Plan-based

Async Mode

Asynchronous mode is useful for long-running conversions or when converting multiple URLs.

How It Works

Send a request with async_mode=true (or pass multiple URLs, which enables async automatically).
The API returns HTTP 202 immediately with a batch_id and url_count.
Each URL is converted in the background, uploaded to storage, and tracked individually.
Monitor completion via batch status polling, email notification, or webhook callback.

Email Notification

By default, a completion email is sent to the project owner's email address. Override with notification_email:

{
    "url": ["https://example.com/page1", "https://example.com/page2"],
    "async_mode": true,
    "notification_email": "team@example.com"
}

Webhook Callback

Provide a callback_url to receive an automatic POST notification on completion:

{
    "url": ["https://example.com/page1", "https://example.com/page2"],
    "async_mode": true,
    "callback_url": "https://your-server.com/webhook/enconvert"
}

The webhook is sent with a 30-second timeout and considers HTTP 200, 201, 202, and 204 as successful delivery.

Batch and Bulk Processing

Convert multiple URLs in a single request. Requires async mode and a private key.

Individual Output (default)

Each URL produces a separate Markdown file:

{
    "url": [
        "https://example.com/post-1",
        "https://example.com/post-2",
        "https://example.com/post-3"
    ],
    "async_mode": true
}

ZIP Bundle Output

Bundle all Markdown files into a single ZIP archive:

{
    "url": [
        "https://example.com/post-1",
        "https://example.com/post-2",
        "https://example.com/post-3"
    ],
    "async_mode": true,
    "output_format": true,
    "output_filename": "blog-archive"
}

The ZIP file is named {output_filename}_{timestamp}.zip or batch_{timestamp}.zip if no custom name is provided.

Code Examples

Python (Private Key)

import requests

response = requests.post(
    "https://api.enconvert.com/v1/convert/url-to-markdown",
    headers={"X-API-Key": "sk_live_your_private_key"},
    json={
        "url": "https://example.com/articles/my-post",
        "direct_download": True
    }
)

response.raise_for_status()
markdown_text = response.content.decode("utf-8")
print(markdown_text)

PHP (Private Key)

$ch = curl_init("https://api.enconvert.com/v1/convert/url-to-markdown");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        "Content-Type: application/json",
        "X-API-Key: sk_live_your_private_key"
    ],
    CURLOPT_POSTFIELDS => json_encode([
        "url" => "https://example.com/articles/my-post"
    ])
]);

$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);
echo $data["presigned_url"];

Node.js (Private Key)

const response = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "X-API-Key": "sk_live_your_private_key"
    },
    body: JSON.stringify({
        url: "https://example.com/articles/my-post"
    })
});

const data = await response.json();
console.log(data.presigned_url);

Go (Private Key)

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "url": "https://example.com/articles/my-post",
    })

    req, _ := http.NewRequest("POST", "https://api.enconvert.com/v1/convert/url-to-markdown", bytes.NewBuffer(body))
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", "sk_live_your_private_key")

    resp, _ := http.DefaultClient.Do(req)
    defer resp.Body.Close()

    respBody, _ := io.ReadAll(resp.Body)
    fmt.Println(string(respBody))
}

JavaScript -- Browser (Public Key)

// Step 1: Get a JWT token
const tokenRes = await fetch("https://api.enconvert.com/v1/auth/token", {
    method: "POST",
    headers: { "X-API-Key": "pk_live_your_public_key" }
});
const { token } = await tokenRes.json();

// Step 2: Convert URL to Markdown (public keys receive raw bytes)
const convertRes = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${token}`
    },
    body: JSON.stringify({
        url: "https://example.com/articles/my-post"
    })
});

const markdown = await convertRes.text();
console.log(markdown);

React (Public Key)

import { useState } from "react";

function UrlToMarkdown() {
    const [loading, setLoading] = useState(false);
    const [markdown, setMarkdown] = useState("");

    async function convertUrl() {
        setLoading(true);
        try {
            // Get JWT token
            const tokenRes = await fetch("https://api.enconvert.com/v1/auth/token", {
                method: "POST",
                headers: { "X-API-Key": "pk_live_your_public_key" }
            });
            const { token } = await tokenRes.json();

            // Convert
            const convertRes = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json",
                    "Authorization": `Bearer ${token}`
                },
                body: JSON.stringify({ url: "https://example.com/articles/my-post" })
            });

            setMarkdown(await convertRes.text());
        } finally {
            setLoading(false);
        }
    }

    return (
        <div>
            <button onClick={convertUrl} disabled={loading}>
                {loading ? "Converting..." : "Convert to Markdown"}
            </button>
            {markdown && <pre>{markdown}</pre>}
        </div>
    );
}

export default UrlToMarkdown;

Error Responses

Status	Condition
`400 Bad Request`	Missing or empty `url` parameter
`400 Bad Request`	`output_format=true` with a single URL (requires multiple URLs)
`400 Bad Request`	`direct_download=true` with multiple URLs
`400 Bad Request`	`direct_download=true` with `async_mode=true`
`400 Bad Request`	Invalid `auth` object (missing `username` or `password`)
`400 Bad Request`	Invalid `cookies` (not an array, exceeds 50 entries, missing required fields)
`400 Bad Request`	Invalid `headers` (not an object, exceeds 20 entries, blocked header names, non-string values)
`400 Bad Request`	Conflicting `auth` and `Authorization` custom header
`400 Bad Request`	Public key attempting multiple URLs
`401 Unauthorized`	Missing or invalid API key / JWT token
`402 Payment Required`	Monthly conversion limit reached
`402 Payment Required`	Batch would exceed remaining monthly quota
`402 Payment Required`	Storage limit reached
`403 Forbidden`	Endpoint not in the API key's allowed endpoints
`403 Forbidden`	Feature not available on current plan (async, webhook, ZIP, basic auth)
`403 Forbidden`	Batch size exceeds plan's batch limit
`404 Not Found`	Job ID not found (when polling status)
`500 Internal Server Error`	Conversion failed (browser crash, navigation error, extraction failure)

Limits

Limit	Value
Page navigation timeout	60 seconds
Per-image load timeout	5 seconds
Cookie banner dismiss timeout	3 seconds
Maximum cookies per request	50
Maximum custom headers per request	20
Monthly conversions	Plan-dependent (Free: 100)
Batch size	Plan-dependent (Free: disabled)
File retention	Plan-dependent (Free: 1 hour)
Webhook delivery timeout	30 seconds

API Documentation Beta