url-to-markdown

Convert any publicly accessible URL into clean GitHub-Flavored Markdown with YAML frontmatter metadata. The page is rendered in a real browser, passed through a readability extractor to isolate the main article, then serialised to Markdown with normalised links, images, and fenced code blocks. Boilerplate (navigation, footer, aside, scripts, forms, buttons) is removed, and relative URLs are resolved to absolute URLs.

Useful for LLM ingestion pipelines, content archival, RSS-style feeds, and any workflow that needs plain-text article content without HTML noise.


Endpoint

POST /v1/convert/url-to-markdown

Content-Type: application/json

Output format: Markdown (.md, UTF-8) with a YAML frontmatter block at the top of the file containing page metadata. The output format is not configurable — Markdown with YAML frontmatter is always produced.


Authentication

This endpoint supports both private and public key authentication.

Private Key

Include your secret key in the X-API-Key header. Use this for server-to-server calls where the key is never exposed to the client.

X-API-Key: sk_live_your_private_key

Public Key with JWT

For client-side usage, first generate a JWT token using your public key, then pass it as a Bearer token.

Step 1 -- Get a token:

POST /v1/auth/token
X-API-Key: pk_live_your_public_key

Step 2 -- Use the token:

Authorization: Bearer eyJhbGciOiJIUzI1NiIs...

Note: Public key requests are restricted to a single URL, synchronous mode, and direct download. Async mode, batch processing, webhooks, and notification emails are not available with public keys.
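
The public key restrictions in the note above can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the Enconvert API: the function name public_key_violations is hypothetical, and it assumes that only a list with more than one URL counts as a batch.

```python
def public_key_violations(payload: dict) -> list:
    """Return the reasons a payload is unsuitable for a public key (client-side pre-check)."""
    problems = []
    url = payload.get("url")
    # Assumption: a list holding more than one URL is treated as a batch.
    if isinstance(url, list) and len(url) > 1:
        problems.append("public keys are restricted to a single URL")
    if payload.get("async_mode"):
        problems.append("async_mode is not available with public keys")
    for field in ("callback_url", "notification_email"):
        if payload.get(field):
            problems.append(f"{field} is not available with public keys")
    return problems


print(public_key_violations({"url": "https://example.com/articles/my-post"}))
```

An empty list means the payload stays within the documented public key limits; the server remains the authority on what is accepted.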

Request Parameters

Top-Level Parameters

| Parameter | Type | Required | Default | Description | Plan Gating |
|---|---|---|---|---|---|
| url | string or string[] | Yes | -- | A single URL string or an array of URLs to convert. Multiple URLs require async mode. | -- |
| async_mode | boolean | No | false | Run the conversion asynchronously. Returns a batch_id immediately for polling. Required for batch (multiple URLs). | Requires async access |
| direct_download | boolean | No | false | Return raw Markdown bytes in the response body instead of a JSON response with a presigned URL. Forced true for public keys. Incompatible with async_mode and multiple URLs. | -- |
| output_format | boolean | No | false | When true with multiple URLs, bundles all output Markdown files into a single ZIP archive. Requires multiple URLs. | Requires ZIP output access |
| output_filename | string | No | Auto-generated | Custom filename for the output file. The .md extension is added automatically. Default format: {domain}_{timestamp}.md. | -- |
| job_id | string | No | -- | Client-provided job ID for timeout recovery. Public keys only. When a sync conversion exceeds reverse-proxy timeout limits, the client can poll GET /v1/convert/status/{job_id} to retrieve the result. Ignored for private keys. | -- |
| notification_email | string | No | Project owner email | Email address to notify when an async job completes. Private keys only. | -- |
| callback_url | string | No | -- | Webhook URL to receive a POST request when the conversion completes. Private keys only. | Requires webhook access |
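
Several of these parameters constrain each other. As a rough illustration, the Python sketch below mirrors the combinations that the Error Responses list reports as 400 Bad Request. The function name validate_request and the error strings are hypothetical; the API itself remains the source of truth.

```python
def validate_request(payload: dict) -> list:
    """Return a list of problems with the top-level parameters (empty when none are found)."""
    errors = []
    url = payload.get("url")
    if isinstance(url, list):
        urls = url
    elif url:
        urls = [url]
    else:
        urls = []
    if not urls:
        errors.append("url is missing or empty")
    multiple = len(urls) > 1
    if payload.get("direct_download") and multiple:
        errors.append("direct_download cannot be used with multiple URLs")
    if payload.get("direct_download") and payload.get("async_mode"):
        errors.append("direct_download cannot be used with async_mode")
    if payload.get("output_format") and not multiple:
        errors.append("output_format requires multiple URLs")
    return errors


print(validate_request({"url": "https://example.com/articles/my-post", "direct_download": True}))
```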

Browser & Rendering Parameters

| Parameter | Type | Required | Default | Description | Plan Gating |
|---|---|---|---|---|---|
| viewport_width | integer | No | 1920 | Browser viewport width in pixels. Affects responsive content and which layout variant is captured before extraction. | -- |
| viewport_height | integer | No | 1080 | Browser viewport height in pixels. Used as a reference for rendering and viewport-unit calculation. | -- |
| load_media | boolean | No | true | Wait for all images and videos to fully load before extraction. When false, extraction is faster but lazy-loaded images may have placeholder src values in the Markdown output. | -- |
| enable_scroll | boolean | No | true | Scroll the page top-to-bottom to trigger lazy-loading content (IntersectionObserver-based loaders). | -- |
| handle_sticky_header | boolean | No | true | Detect sticky/fixed headers and scroll to top before extraction so content ordering is preserved correctly. | -- |
| handle_cookies | boolean | No | true | Auto-dismiss cookie consent banners (OneTrust, Cookiebot, Didomi, Usercentrics, and generic banners) before extraction. | -- |
| wait_for_images | boolean | No | true | Wait for all <img> elements to finish loading (5-second timeout per image) so alt text and final src values are captured correctly. | -- |

Authentication & Custom Requests

| Parameter | Type | Required | Default | Description | Plan Gating |
|---|---|---|---|---|---|
| auth | object | No | null | HTTP Basic Auth credentials for the target URL. Format: {"username": "...", "password": "..."}. Cannot be used together with an Authorization custom header. | Requires basic auth access |
| cookies | array | No | null | Array of cookie objects to inject before navigation. Maximum 50 cookies. Each cookie must have name, value, and either domain or url. | Requires basic auth access |
| headers | object | No | null | Dictionary of custom HTTP headers sent with every request to the target URL. Maximum 20 headers. Blocked headers: host, content-length, transfer-encoding, connection, upgrade, te, trailer. | Requires basic auth access |

Not supported: The single_page and pdf_options parameters from the url-to-pdf endpoint are accepted for request-shape parity but have no effect on Markdown output. Markdown has no concept of pages, margins, or orientation.

Each item in the cookies array must follow this structure:

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | -- | Cookie name. |
| value | string | Yes | -- | Cookie value. |
| domain | string | Conditional | -- | Cookie domain. Either domain or url must be provided. |
| url | string | Conditional | -- | URL to associate the cookie with. Either domain or url must be provided. |
| path | string | No | "/" | Cookie path. Defaults to "/" when domain is set. |
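
The cookie rules above (required name and value, one of domain or url, at most 50 entries) can be verified before sending a request. This is a minimal Python sketch under those stated rules; validate_cookies and its messages are hypothetical names, not part of the API.

```python
MAX_COOKIES = 50  # documented maximum per request


def validate_cookies(cookies) -> list:
    """Return a list of problems with a cookies array (empty when none are found)."""
    if not isinstance(cookies, list):
        return ["cookies must be an array"]
    errors = []
    if len(cookies) > MAX_COOKIES:
        errors.append(f"at most {MAX_COOKIES} cookies are allowed")
    for index, cookie in enumerate(cookies):
        if not isinstance(cookie, dict):
            errors.append(f"cookie {index}: must be an object")
            continue
        for field in ("name", "value"):
            if field not in cookie:
                errors.append(f"cookie {index}: {field} is required")
        if "domain" not in cookie and "url" not in cookie:
            errors.append(f"cookie {index}: either domain or url is required")
    return errors


print(validate_cookies([{"name": "session_id", "value": "abc123", "domain": "example.com"}]))
```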

Response

Synchronous with Direct Download (direct_download=true)

Private key -- returns raw Markdown bytes:

HTTP 200 OK
Content-Type: text/markdown; charset=utf-8
Content-Disposition: inline; filename="example_20260421_123456789.md"
X-Object-Key: env/files/{project_id}/url-to-markdown/example_20260421_123456789.md
X-File-Size: 8421
X-Conversion-Time: 6.3
X-Filename: example_20260421_123456789.md

(UTF-8 Markdown with YAML frontmatter)

Public key -- returns JSON with a presigned URL:

{
    "presigned_url": "https://spaces.example.com/...",
    "object_key": "env/files/{project_id}/url-to-markdown/example_20260421_123456789.md",
    "filename": "example_20260421_123456789.md",
    "file_size": 8421,
    "conversion_time_seconds": 6.3,
    "job_id": "client-provided-id"
}

Synchronous without Direct Download (direct_download=false)

Available with private keys only.

{
    "presigned_url": "https://spaces.example.com/...",
    "object_key": "env/files/{project_id}/url-to-markdown/example_20260421_123456789.md",
    "filename": "example_20260421_123456789.md",
    "file_size": 8421,
    "conversion_time_seconds": 6.3
}

Asynchronous Mode

Returns immediately with a batch_id for polling.

HTTP 202 Accepted
{
    "status": "processing",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "url_count": 5,
    "output_format": "individual"
}

When output_format=true (ZIP bundling):

{
    "status": "processing",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "url_count": 5,
    "output_format": "zip"
}

Job Status Polling (Public Keys Only)

For public key timeout recovery:

GET /v1/convert/status/{job_id}
Authorization: Bearer <jwt_token>

| Status | Response |
|---|---|
| Processing | {"status": "processing"} |
| Success | {"status": "success", "presigned_url": "...", "object_key": "..."} |
| Failed | {"status": "failed", "error": "..."} |
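
A client can loop over the three status responses until the job settles. The Python sketch below shows one way to do that. The poll_job helper, its interval, and its attempt limit are assumptions for illustration, and fetch_status stands in for whatever function performs the GET /v1/convert/status/{job_id} call and returns the decoded JSON.

```python
import time


def poll_job(fetch_status, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status() until the job succeeds or fails, then return the final response."""
    for _ in range(max_attempts):
        response = fetch_status()
        state = response.get("status")
        if state == "success":
            return response  # contains presigned_url and object_key
        if state == "failed":
            raise RuntimeError(response.get("error", "conversion failed"))
        sleep(interval)  # still processing: wait before the next poll
    raise TimeoutError("job did not finish within the polling budget")


# Simulated responses in place of real HTTP calls.
fake = iter([
    {"status": "processing"},
    {"status": "success", "presigned_url": "https://spaces.example.com/...", "object_key": "..."},
])
print(poll_job(lambda: next(fake), sleep=lambda seconds: None)["status"])
```

Passing the fetch and sleep functions in keeps the loop testable without network access.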

Batch Status Polling (Private Keys Only)

For async batch jobs, poll with the batch_id from the 202 response:

GET /v1/convert/batch/{batch_id}
X-API-Key: sk_live_your_private_key

Returns aggregate status, per-URL statuses, and presigned download URLs. See Batch Status Polling for the full response schema.

Webhook Callback Payload

When a callback_url is provided, Enconvert sends a POST request to that URL on completion.

Single URL job:

{
    "job_id": "activity_id",
    "status": "success",
    "batch_id": "...",
    "gcs_uri": "object_key",
    "filename": "example_20260421_123456789.md",
    "file_size": 8421
}

Batch job:

{
    "job_id": "activity_id",
    "status": "success",
    "batch_id": "...",
    "total_tasks": 10,
    "successful_tasks": 8,
    "failed_tasks": 2,
    "tasks": [
        {"url": "https://example.com/page1", "status": "success", "filename": "page1.md"},
        {"url": "https://example.com/page2", "status": "failed", "filename": null}
    ]
}
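
A webhook receiver usually needs to tell the two payload shapes apart. The Python sketch below is one possible approach, using only the fields shown in the examples above. It assumes that the presence of a tasks array marks a batch payload and that a batch counts as fully successful only when failed_tasks is 0; summarise_webhook is a hypothetical helper name.

```python
def summarise_webhook(payload: dict) -> dict:
    """Reduce a webhook payload to the produced files and the URLs that failed."""
    tasks = payload.get("tasks")
    if tasks is None:
        # Single URL job: one file on success, nothing otherwise.
        succeeded = payload.get("status") == "success"
        filename = payload.get("filename")
        return {
            "is_batch": False,
            "succeeded": succeeded,
            "files": [filename] if succeeded and filename else [],
            "failed_urls": [],
        }
    # Batch job: inspect each task individually.
    return {
        "is_batch": True,
        "succeeded": payload.get("failed_tasks", 0) == 0,
        "files": [t["filename"] for t in tasks if t.get("status") == "success"],
        "failed_urls": [t["url"] for t in tasks if t.get("status") != "success"],
    }


print(summarise_webhook({"status": "success", "filename": "example.md", "file_size": 8421}))
```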

Output Format

Every Markdown file starts with a YAML frontmatter block containing page metadata, followed by the extracted article body.

---
url: https://example.com/articles/my-post
title: My Post Title
description: A short summary from the meta description or og:description tag.
links:
  - url: https://example.com/related
    text: Related article
  - url: https://example.com/about
    text: About the author
images:
  - url: https://example.com/hero.jpg
    alt: Hero image alt text
  - url: https://example.com/diagram.png
    alt: Architecture diagram
---

# My Post Title

Opening paragraph of the article body, converted to GitHub-Flavored Markdown...

## A Section Heading

- List item one
- List item two

```python
def example():
    return "code blocks are fenced with language hints"
```

[A link in the body](https://example.com/linked-page)

![An image in the body](https://example.com/inline-image.png)

Frontmatter Fields

| Field | Type | Description |
|---|---|---|
| url | string | The final URL after redirects (not always the URL you sent). |
| title | string | The page title from <title>, falling back to the Readability-detected short title. |
| description | string | The <meta name="description"> value, falling back to <meta property="og:description">. |
| links | array | Every <a href> found on the page, with absolute URLs and visible anchor text. |
| images | array | Every <img src> found on the page, with absolute URLs and alt text. |
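
Consumers often want the metadata and the article body separately. The Python sketch below splits a converted file at the frontmatter delimiters and reads the top-level scalar fields (url, title, description) using only the standard library. It is a simplification: the helper names are hypothetical, quoted or multi-line YAML values are not handled, and a full YAML parser would be needed to read the links and images lists.

```python
def split_frontmatter(document: str) -> tuple:
    """Return (frontmatter, body); frontmatter is empty when no block is found."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", document
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":  # closing delimiter
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return "\n".join(lines[1:index]), body
    return "", document


def top_level_scalars(frontmatter: str) -> dict:
    """Collect unindented 'key: value' pairs, skipping nested list entries."""
    fields = {}
    for line in frontmatter.splitlines():
        if line[:1] in (" ", "-", ""):
            continue  # indented, list item, or blank line
        key, separator, value = line.partition(":")
        if separator and value.strip():
            fields[key.strip()] = value.strip()
    return fields


sample = "---\nurl: https://example.com/articles/my-post\ntitle: My Post Title\n---\n\n# My Post Title\n"
meta, body = split_frontmatter(sample)
print(top_level_scalars(meta), body)
```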

Markdown Conventions

  • Heading style: ATX (#, ##, ###)
  • List bullets: -
  • Emphasis: **bold**, *italic*, with * and _ escaped in literal text
  • Soft line breaks: trailing two spaces (preserved in output)
  • Code blocks: fenced (```) with language hints detected from class="language-xxx", class="lang-xxx", class="highlight-source-xxx", class="brush:xxx", data-lang, and data-language
  • Links: [text](url) when anchor text is present; autolink form <url> when the anchor is empty; anchor-only (#foo) and javascript: links are unwrapped to plain text
  • Images: ![alt](src), with title preserved when present, falling back to data-src when src is missing (lazy-loaded images)
  • Horizontal rules: ---

Features

Clean Content Extraction

Enconvert uses the Readability algorithm (the same library that powers Firefox Reader View) to isolate the main article content from the rest of the page, then applies a second pass of post-processing to produce clean Markdown.

Removed before conversion:

  • Navigation (<nav>), footers (<footer>), asides (<aside>)
  • Scripts (<script>, <noscript>), styles (<style>), iframes, forms, buttons
  • Inline SVG, canvas, and template elements
  • style, class, id, and all on* event handler attributes

Preserved:

  • Headings, paragraphs, lists, tables, blockquotes, code blocks
  • Links with their href and anchor text (absolute URLs)
  • Images with alt, title, and absolute src
  • Figures and figcaptions (inline images are kept inside these)

Clear Capture Mode

Before extraction, the page is rendered in a real browser and cleaned up the same way as url-to-pdf:

  • Cookie consent banners -- Auto-dismissed across the main page and iframes (OneTrust, Cookiebot, Didomi, Usercentrics, and generic banners).
  • Modal and popup dismissal -- Overlays closed via Escape key, ARIA close buttons, class-based close buttons, and role-based dialog buttons.
  • Scroll animation reveal -- Forces visibility on elements hidden by WOW.js, AOS, ScrollReveal, GSAP ScrollTrigger, and generic animation classes.
  • Sticky header handling -- Sticky/fixed headers detected and the page scrolled back to the top so content ordering is preserved.

Absolute URL Resolution

Every relative href and src in the extracted article is resolved against the final page URL (after redirects), so the Markdown output always contains absolute, clickable links — useful for LLM ingestion pipelines that would otherwise see broken relative paths.

Anchor-only links (#section), javascript: links, mailto:, and tel: links are not rewritten. Anchor-only and javascript: links are unwrapped to plain text because they have no meaning outside the original page.
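
The resolution rules can be illustrated with Python's standard urllib.parse.urljoin. This is a sketch of the behaviour described above, not the converter's actual code; resolve_link is a hypothetical helper that returns None for links that would be unwrapped to plain text.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_link(href: str, final_url: str) -> Optional[str]:
    """Resolve href against the final page URL, or return None when the link is unwrapped."""
    lowered = href.strip().lower()
    if lowered.startswith("#") or lowered.startswith("javascript:"):
        return None  # anchor-only and javascript: links become plain text
    if lowered.startswith(("mailto:", "tel:")):
        return href  # left as-is
    return urljoin(final_url, href)


base = "https://example.com/articles/my-post"
print(resolve_link("../about", base))      # https://example.com/about
print(resolve_link("img/hero.jpg", base))  # https://example.com/articles/img/hero.jpg
print(resolve_link("#section", base))      # None
```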

Code Block Language Detection

Code blocks are fenced with a detected language hint where possible:

<pre><code class="language-python">...</code></pre>    →   ```python
<pre data-lang="js">...</pre>                          →   ```js
<pre><code class="highlight-source-shell">...</code></pre> →   ```shell

Classes matching language-*, lang-*, highlight-source-*, and brush:* are recognised, plus data-lang and data-language attributes on both the <pre> and its nested <code>. If no hint is found, the block is fenced without a language label.
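
The detection rules can be approximated with a couple of regular expressions. The Python sketch below takes the attributes of a single <pre> or <code> element as a dictionary. The order in which hints are checked (data attributes first, then brush:, then class prefixes) is an assumption, as is the detect_language name; the documentation does not state a precedence.

```python
import re

# Class tokens such as language-python, lang-js, highlight-source-shell.
_PREFIXED = re.compile(r"^(?:language|lang|highlight-source)-([\w+#.-]+)$")
# SyntaxHighlighter-style hint such as "brush: js".
_BRUSH = re.compile(r"brush:\s*([\w+#.-]+)")


def detect_language(attrs: dict) -> str:
    """Return the language hint for one element, or an empty string when none is found."""
    for key in ("data-lang", "data-language"):
        if attrs.get(key):
            return attrs[key].strip()
    class_attr = attrs.get("class", "")
    brush = _BRUSH.search(class_attr)
    if brush:
        return brush.group(1)
    for token in class_attr.split():
        match = _PREFIXED.match(token)
        if match:
            return match.group(1)
    return ""


print(detect_language({"class": "language-python"}))  # python
print(detect_language({"data-lang": "js"}))           # js
```

An empty result corresponds to a fence without a language label.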

HTTP Basic Auth

Pass auth with username and password to convert pages behind HTTP Basic Authentication.

{
    "url": "https://staging.example.com/docs/article",
    "auth": {
        "username": "admin",
        "password": "secret"
    }
}

Cookie Injection

Inject up to 50 cookies before the page loads. Useful for converting member-only or locale-specific article pages.

{
    "url": "https://example.com/members/post",
    "cookies": [
        {"name": "session_id", "value": "abc123", "domain": "example.com"},
        {"name": "locale", "value": "en-US", "domain": "example.com"}
    ]
}

Custom Headers

Send up to 20 custom HTTP headers with every request to the target page.

{
    "url": "https://example.com/api-docs",
    "headers": {
        "X-Custom-Token": "my-token-value",
        "Accept-Language": "en-US"
    }
}
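
The header limits listed under Request Parameters (at most 20 headers, string values, a set of blocked names, and no Authorization header alongside auth) can be checked before a request is sent. In this Python sketch, validate_headers is a hypothetical helper, and matching blocked names case-insensitively is an assumption based on HTTP header names being case-insensitive.

```python
BLOCKED_HEADERS = {"host", "content-length", "transfer-encoding",
                   "connection", "upgrade", "te", "trailer"}
MAX_HEADERS = 20  # documented maximum per request


def validate_headers(headers, auth=None) -> list:
    """Return a list of problems with a headers object (empty when none are found)."""
    if not isinstance(headers, dict):
        return ["headers must be an object"]
    errors = []
    if len(headers) > MAX_HEADERS:
        errors.append(f"at most {MAX_HEADERS} headers are allowed")
    for name, value in headers.items():
        lowered = str(name).lower()
        if lowered in BLOCKED_HEADERS:
            errors.append(f"header {name!r} is blocked")
        if not isinstance(value, str):
            errors.append(f"header {name!r} must have a string value")
        if lowered == "authorization" and auth is not None:
            errors.append("auth cannot be combined with an Authorization header")
    return errors


print(validate_headers({"X-Custom-Token": "my-token-value", "Accept-Language": "en-US"}))
```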

Lazy Image Loading

When load_media and enable_scroll are enabled (both default to true), the converter scrolls the page slowly to trigger lazy loaders, then waits for all images to finish loading before capturing the final HTML. This ensures data-src values have been promoted to real src values and the frontmatter images list is complete.

Set load_media=false for faster extraction when you only need the text body — placeholder src values may remain in the output.

Additional Rendering Features

  • Viewport unit normalization -- CSS viewport units (vh, svh, lvh, dvh) are converted to fixed pixel values before extraction.
  • Stealth mode -- Browser fingerprint masking to avoid bot detection on protected pages.
  • Popup interception -- Automatically closes any new browser tabs or popups triggered by the page.
  • CSP bypass -- Handles Content Security Policy and Trusted Types restrictions that would otherwise block page manipulation.
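
To make the viewport unit normalization concrete: with the default viewport_height of 1080, a value of 50vh corresponds to 540 pixels. The Python sketch below performs that arithmetic on a CSS string. It only illustrates the idea; how the converter actually rewrites styles is not documented here, and treating svh, lvh, and dvh the same as vh is an assumption.

```python
import re

# A number followed by one of the viewport height units.
_VIEWPORT_UNIT = re.compile(r"(\d+(?:\.\d+)?)(svh|lvh|dvh|vh)\b")


def viewport_units_to_px(css: str, viewport_height: int = 1080) -> str:
    """Replace viewport height units in a CSS string with fixed pixel values."""
    def to_px(match):
        pixels = float(match.group(1)) / 100 * viewport_height
        return f"{pixels:g}px"
    return _VIEWPORT_UNIT.sub(to_px, css)


print(viewport_units_to_px("height: 50vh; min-height: 100dvh"))
# height: 540px; min-height: 1080px
```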

Subscription Plan Gating

| Feature | Free | Starter | Pro | Enterprise |
|---|---|---|---|---|
| Basic conversion (single URL, sync) | Yes | Yes | Yes | Yes |
| Viewport and rendering options | Yes | Yes | Yes | Yes |
| Async mode | No | Yes | Yes | Yes |
| Batch processing (multiple URLs) | No | Yes | Yes | Yes |
| ZIP output bundling | No | No | Yes | Yes |
| Webhook callbacks | No | No | Yes | Yes |
| HTTP Basic Auth | No | Yes | Yes | Yes |
| Cookie injection | No | Yes | Yes | Yes |
| Custom headers | No | Yes | Yes | Yes |
| Monthly conversions | 100 | Plan-based | Plan-based | Unlimited |
| Batch size limit | 0 | Plan-based | Plan-based | Unlimited |
| File retention | 1 hour | Plan-based | Plan-based | Plan-based |

Async Mode

Asynchronous mode is useful for long-running conversions or when converting multiple URLs.

How It Works

  1. Send a request with async_mode=true (or pass multiple URLs, which enables async automatically).
  2. The API returns HTTP 202 immediately with a batch_id and url_count.
  3. Each URL is converted in the background, uploaded to storage, and tracked individually.
  4. Monitor completion via batch status polling, email notification, or webhook callback.

Email Notification

By default, a completion email is sent to the project owner's email address. Override with notification_email:

{
    "url": ["https://example.com/page1", "https://example.com/page2"],
    "async_mode": true,
    "notification_email": "team@example.com"
}

Webhook Callback

Provide a callback_url to receive an automatic POST notification on completion:

{
    "url": ["https://example.com/page1", "https://example.com/page2"],
    "async_mode": true,
    "callback_url": "https://your-server.com/webhook/enconvert"
}

The webhook request has a 30-second timeout. HTTP 200, 201, 202, and 204 responses are treated as successful delivery.


Batch and Bulk Processing

Convert multiple URLs in a single request. Requires async mode and a private key.

Individual Output (default)

Each URL produces a separate Markdown file:

{
    "url": [
        "https://example.com/post-1",
        "https://example.com/post-2",
        "https://example.com/post-3"
    ],
    "async_mode": true
}

ZIP Bundle Output

Bundle all Markdown files into a single ZIP archive:

{
    "url": [
        "https://example.com/post-1",
        "https://example.com/post-2",
        "https://example.com/post-3"
    ],
    "async_mode": true,
    "output_format": true,
    "output_filename": "blog-archive"
}

The ZIP file is named {output_filename}_{timestamp}.zip or batch_{timestamp}.zip if no custom name is provided.


Code Examples

Python (Private Key)

import requests

response = requests.post(
    "https://api.enconvert.com/v1/convert/url-to-markdown",
    headers={"X-API-Key": "sk_live_your_private_key"},
    json={
        "url": "https://example.com/articles/my-post",
        "direct_download": True
    }
)

response.raise_for_status()
markdown_text = response.content.decode("utf-8")
print(markdown_text)

PHP (Private Key)

$ch = curl_init("https://api.enconvert.com/v1/convert/url-to-markdown");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        "Content-Type: application/json",
        "X-API-Key: sk_live_your_private_key"
    ],
    CURLOPT_POSTFIELDS => json_encode([
        "url" => "https://example.com/articles/my-post"
    ])
]);

$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);
echo $data["presigned_url"];

Node.js (Private Key)

const response = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "X-API-Key": "sk_live_your_private_key"
    },
    body: JSON.stringify({
        url: "https://example.com/articles/my-post"
    })
});

const data = await response.json();
console.log(data.presigned_url);

Go (Private Key)

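package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "log"
    "net/http"
)

func main() {
    // Build the JSON request body.
    body, err := json.Marshal(map[string]interface{}{
        "url": "https://example.com/articles/my-post",
    })
    if err != nil {
        log.Fatalf("encode request: %v", err)
    }

    req, err := http.NewRequest(http.MethodPost, "https://api.enconvert.com/v1/convert/url-to-markdown", bytes.NewReader(body))
    if err != nil {
        log.Fatalf("build request: %v", err)
    }
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", "sk_live_your_private_key")

    // Check the error before touching resp; resp is nil when the request fails.
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatalf("send request: %v", err)
    }
    defer resp.Body.Close()

    respBody, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatalf("read response: %v", err)
    }
    // Without direct_download, the body is JSON containing presigned_url.
    fmt.Println(string(respBody))
}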

JavaScript -- Browser (Public Key)

// Step 1: Get a JWT token
const tokenRes = await fetch("https://api.enconvert.com/v1/auth/token", {
    method: "POST",
    headers: { "X-API-Key": "pk_live_your_public_key" }
});
const { token } = await tokenRes.json();

// Step 2: Convert URL to Markdown (public keys receive raw bytes)
const convertRes = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${token}`
    },
    body: JSON.stringify({
        url: "https://example.com/articles/my-post"
    })
});

const markdown = await convertRes.text();
console.log(markdown);

React (Public Key)

import { useState } from "react";

function UrlToMarkdown() {
    const [loading, setLoading] = useState(false);
    const [markdown, setMarkdown] = useState("");

    async function convertUrl() {
        setLoading(true);
        try {
            // Get JWT token
            const tokenRes = await fetch("https://api.enconvert.com/v1/auth/token", {
                method: "POST",
                headers: { "X-API-Key": "pk_live_your_public_key" }
            });
            const { token } = await tokenRes.json();

            // Convert
            const convertRes = await fetch("https://api.enconvert.com/v1/convert/url-to-markdown", {
                method: "POST",
                headers: {
                    "Content-Type": "application/json",
                    "Authorization": `Bearer ${token}`
                },
                body: JSON.stringify({ url: "https://example.com/articles/my-post" })
            });

            setMarkdown(await convertRes.text());
        } finally {
            setLoading(false);
        }
    }

    return (
        <div>
            <button onClick={convertUrl} disabled={loading}>
                {loading ? "Converting..." : "Convert to Markdown"}
            </button>
            {markdown && <pre>{markdown}</pre>}
        </div>
    );
}

export default UrlToMarkdown;

Error Responses

| Status | Condition |
|---|---|
| 400 Bad Request | Missing or empty url parameter |
| 400 Bad Request | output_format=true with a single URL (requires multiple URLs) |
| 400 Bad Request | direct_download=true with multiple URLs |
| 400 Bad Request | direct_download=true with async_mode=true |
| 400 Bad Request | Invalid auth object (missing username or password) |
| 400 Bad Request | Invalid cookies (not an array, exceeds 50 entries, missing required fields) |
| 400 Bad Request | Invalid headers (not an object, exceeds 20 entries, blocked header names, non-string values) |
| 400 Bad Request | Conflicting auth and Authorization custom header |
| 400 Bad Request | Public key attempting multiple URLs |
| 401 Unauthorized | Missing or invalid API key / JWT token |
| 402 Payment Required | Monthly conversion limit reached |
| 402 Payment Required | Batch would exceed remaining monthly quota |
| 402 Payment Required | Storage limit reached |
| 403 Forbidden | Endpoint not in the API key's allowed endpoints |
| 403 Forbidden | Feature not available on current plan (async, webhook, ZIP, basic auth) |
| 403 Forbidden | Batch size exceeds plan's batch limit |
| 404 Not Found | Job ID not found (when polling status) |
| 500 Internal Server Error | Conversion failed (browser crash, navigation error, extraction failure) |
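
Client code can map these status codes to a next step. The Python sketch below is one possible grouping based on the conditions listed above. The category names and the choice to retry only on 5xx responses are assumptions, not documented API behaviour.

```python
def classify_status(status: int) -> str:
    """Map an HTTP status from the endpoint to a suggested client action."""
    if 200 <= status < 300:
        return "ok"
    actions = {
        400: "fix_request",         # invalid or conflicting parameters
        401: "check_credentials",   # missing or invalid API key / JWT
        402: "quota_or_storage",    # monthly, batch, or storage limit reached
        403: "plan_or_permission",  # endpoint or feature not allowed
        404: "unknown_job",         # job ID not found while polling
    }
    if status in actions:
        return actions[status]
    if status >= 500:
        return "retry_later"        # assumption: conversion failures may be transient
    return "unexpected"


for code in (200, 400, 402, 500):
    print(code, classify_status(code))
```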

Limits

| Limit | Value |
|---|---|
| Page navigation timeout | 60 seconds |
| Per-image load timeout | 5 seconds |
| Cookie banner dismiss timeout | 3 seconds |
| Maximum cookies per request | 50 |
| Maximum custom headers per request | 20 |
| Monthly conversions | Plan-dependent (Free: 100) |
| Batch size | Plan-dependent (Free: disabled) |
| File retention | Plan-dependent (Free: 1 hour) |
| Webhook delivery timeout | 30 seconds |