website-to-pdf

Discover all pages of a website via sitemap or full crawl, convert each page to a high-fidelity PDF, and bundle the results into a single ZIP archive. Always runs asynchronously. Requires a paid plan.


Endpoint

POST /v1/convert/website-to-pdf

Content-Type: application/json

Output format: ZIP archive containing one PDF per discovered page.

Mode: Always asynchronous. Returns HTTP 202 immediately.


Authentication

This endpoint requires a private API key. Public keys are not supported for website capture.

X-API-Key: sk_live_your_private_key

Request Parameters

Website Discovery Parameters

Parameter Type Required Default Description Plan Gating
url string Yes -- The website base URL (e.g. https://example.com). Used as the root for page discovery. --
crawl_mode string No "auto" URL discovery method. One of "auto", "sitemap", or "full". See Crawl Modes below. Sitemap requires Starter+, Full requires Pro+
include_patterns string[] No null Regex patterns that discovered URLs must match to be included. Only applied in full crawl mode. --
exclude_patterns string[] No System defaults Regex patterns for URLs to exclude. Only applied in full crawl mode. When omitted, built-in defaults exclude static assets, login/admin/cart pages, and deep pagination. --

Notification Parameters

Parameter Type Required Default Description Plan Gating
output_filename string No Auto-generated Custom base name for the output ZIP file. Timestamp is appended automatically. --
notification_email string No Project owner email Email address to notify when the job completes. --
callback_url string No -- Webhook URL to receive a POST request on completion. Requires webhook access

Browser & Rendering Parameters

These settings apply to the conversion of each individual page within the website.

Parameter Type Required Default Description Plan Gating
viewport_width integer No 1920 Browser viewport width in pixels. --
viewport_height integer No 1080 Browser viewport height in pixels. --
single_page boolean No true true renders each page as one continuous PDF page. false produces paginated output using the page size from pdf_options. --
load_media boolean No true Wait for all images and videos to fully load before conversion. --
enable_scroll boolean No true Scroll each page to trigger lazy-loading content. --
handle_sticky_header boolean No true Detect sticky/fixed headers and handle them before capture. --
handle_cookies boolean No true Auto-dismiss cookie consent banners. --
wait_for_images boolean No true Wait for all <img> elements to finish loading. --

Authentication & Custom Requests

Parameter Type Required Default Description Plan Gating
auth object No null HTTP Basic Auth credentials applied to every page. Format: {"username": "...", "password": "..."}. Requires basic auth access
cookies array No null Array of cookie objects injected before each page load. Maximum 50 cookies. Requires basic auth access
headers object No null Custom HTTP headers sent with every request. Maximum 20 headers. Requires basic auth access

PDF Options

Pass these inside a pdf_options object. They apply to every page in the website.

Parameter Type Default Description
page_size string "A4" Named page size. Ignored when both page_width and page_height are set.
page_width float null Custom page width in millimeters. Both page_width and page_height must be set together.
page_height float null Custom page height in millimeters.
orientation string "portrait" "portrait" or "landscape".
margins object {"top": 10, "bottom": 10, "left": 10, "right": 10} Page margins in millimeters.
scale float 1.0 Content scale factor. Range: 0.1 to 2.0. Paginated mode only.
grayscale boolean false Convert each PDF page to grayscale.
header object null Page header for paginated mode. Format: {"content": "<html>", "height": 15}. Supports template variables: {{page}}, {{total_pages}}, {{date}}, {{title}}, {{url}}.
footer object null Page footer for paginated mode. Same format as header.

Supported page sizes: A0, A1, A2, A3, A4, A5, A6, B0, B1, B2, B3, B4, B5, Letter, Legal, Tabloid, Ledger
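
As an illustration, a request fragment that produces paginated Letter output with a page-numbering footer could look like the following (the footer HTML is a made-up example; the template variables are the ones listed above):

```json
{
    "single_page": false,
    "pdf_options": {
        "page_size": "Letter",
        "orientation": "portrait",
        "margins": {"top": 15, "bottom": 20, "left": 10, "right": 10},
        "footer": {
            "content": "<div style='font-size:10px;text-align:center'>Page {{page}} of {{total_pages}}</div>",
            "height": 15
        }
    }
}
```

Because header and footer apply only in paginated mode, single_page is set to false here.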


Crawl Modes

"auto" (default)

Uses the highest crawl mode your plan allows. If your plan supports full crawling, it runs a full crawl. If your plan supports sitemap-only, it runs sitemap discovery.

"sitemap"

Discovers pages by parsing the website's sitemap.xml:

  1. Fetches {base_url}/sitemap.xml (30-second timeout)
  2. If the root element is <sitemapindex>, recursively fetches each child sitemap
  3. Extracts all <url><loc> entries from <urlset> elements
  4. Returns the full list of discovered URLs

Returns an error if the sitemap is missing, the fetch returns a non-200 status, the XML is invalid, or no URLs are found.
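
The discovery steps above can be sketched roughly as follows. This is a simplified illustration using Python's standard library, not the service's actual implementation; the real crawler also applies the 30-second timeout and the error handling described above. The fetch callable is a stand-in for an HTTP GET:

```python
import xml.etree.ElementTree as ET

# Sitemap XML files use this namespace on every element
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap_xml(xml_text, fetch):
    """Parse sitemap XML, recursing into <sitemapindex> children.

    fetch is a callable url -> xml string, injected so the logic
    can be exercised without network access.
    """
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == SITEMAP_NS + "sitemapindex":
        # Recursively fetch and parse each child sitemap in the index
        for loc in root.iter(SITEMAP_NS + "loc"):
            urls.extend(parse_sitemap_xml(fetch(loc.text.strip()), fetch))
    elif root.tag == SITEMAP_NS + "urlset":
        # Extract every <url><loc> entry
        for loc in root.iter(SITEMAP_NS + "loc"):
            urls.append(loc.text.strip())
    return urls
```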

"full"

Performs a comprehensive two-phase crawl:

Phase 1 -- Seed discovery:

  1. Parses robots.txt for sitemap directives and crawl rules
  2. Checks standard sitemap paths (/sitemap.xml, /wp-sitemap.xml, /sitemap_index.xml, etc.)
  3. Discovers RSS/Atom feeds from <link> tags and common feed paths
  4. Extracts seed URLs from all discovered sources

Phase 2 -- Breadth-first link crawl:

  1. Starts from the base URL plus all seed URLs
  2. Visits each page and enqueues same-domain links
  3. Applies include_patterns and exclude_patterns to filter links
  4. Respects robots.txt rules
  5. Detects and avoids infinite URL traps (calendar pages, faceted filters, etc.)
  6. Deduplicates URLs by normalizing scheme, host, query parameters, and stripping tracking parameters (utm_*, fbclid, gclid, etc.)
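
The deduplication step can be sketched as follows. This is a simplified illustration of the idea, not the crawler's internal normalization rules, which are more extensive:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters treated as tracking noise for dedup purposes
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}

def normalize_url(url):
    """Lowercase scheme/host, drop fragments and tracking parameters."""
    parts = urlsplit(url)
    query = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS and not k.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        urlencode(query),
        "",  # fragment dropped
    ))
```

Two URLs that normalize to the same string are treated as one page, so https://Example.com/blog?utm_source=x&id=7 and https://example.com/blog?id=7 would be crawled only once.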

Default exclude patterns (when exclude_patterns is not provided):

  • Static assets: *.pdf, *.zip, *.jpg, *.png, *.gif, *.svg, *.css, *.js, *.xml, *.json, *.mp4, *.webm, *.woff, *.woff2
  • Protected paths: /login, /admin, /cart, /checkout
  • Deep pagination: URLs whose page= parameter value exceeds 3 digits (page numbers above 999)

Response

202 Accepted (immediate)

{
    "status": "processing",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "url_count": 42,
    "total_discovered": 42,
    "discovery_method": "sitemap",
    "output_format": "zip"
}
Field Description
batch_id UUID for tracking the job via batch status polling or webhook.
url_count Number of pages that will be converted.
total_discovered Total pages discovered by the crawl.
discovery_method "sitemap" or "full_crawl" depending on the effective crawl mode.

Batch Status Polling

Poll with the batch_id from the 202 response:

GET /v1/convert/batch/{batch_id}
X-API-Key: sk_live_your_private_key

Returns aggregate status, per-URL statuses, and a presigned download URL for the ZIP when complete. See Batch Status Polling for the full response schema.

Webhook Callback Payload

When callback_url is provided, Enconvert sends a POST request on completion:

{
    "job_id": "batch-uuid",
    "status": "success",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "gcs_uri": "env/files/{project_id}/url-to-pdf/website_20260405_123456789.zip",
    "filename": "website_20260405_123456789.zip",
    "file_size": 12345678,
    "total_tasks": 42,
    "successful_tasks": 40,
    "failed_tasks": 2,
    "tasks": [
        {"url": "https://example.com/", "status": "success", "filename": "example_20260405_001.pdf"},
        {"url": "https://example.com/about", "status": "success", "filename": "example_20260405_002.pdf"},
        {"url": "https://example.com/broken", "status": "failed", "error": "Timeout"}
    ]
}
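
A webhook receiver will typically check the overall status and collect any failed URLs for follow-up. A minimal sketch of parsing the payload above (field names are taken from the example; what you do with the failed URLs is up to you):

```python
import json

def summarize_webhook(payload):
    """Return (all_ok, failed_urls) from a completion payload."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    failed = [t["url"] for t in data.get("tasks", []) if t["status"] == "failed"]
    # A "success" status can still include individual failed tasks,
    # as in the example payload above (40 of 42 succeeded)
    all_ok = data["status"] == "success" and not failed
    return all_ok, failed
```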

Email Notification

A completion email is sent to notification_email (or the project owner's email by default) when the job finishes, regardless of success or failure.


Subscription Plan Gating

Feature Free Starter Pro Enterprise
Website capture No Yes Yes Yes
Sitemap crawl mode No Yes Yes Yes
Full crawl mode No No Yes Yes
Webhook callbacks No No Yes Yes
HTTP Basic Auth No Yes Yes Yes
Cookie injection No Yes Yes Yes
Custom headers No Yes Yes Yes
Batch size limit 0 Plan-based Plan-based Unlimited
Monthly conversions 100 Plan-based Plan-based Unlimited
Free plan: Website capture is not available on the free plan. Attempting to use this endpoint returns 403 Forbidden.

Code Examples

Python (Private Key)

import requests
import time

# Start the website capture
response = requests.post(
    "https://api.enconvert.com/v1/convert/website-to-pdf",
    headers={"X-API-Key": "sk_live_your_private_key"},
    json={
        "url": "https://example.com",
        "crawl_mode": "sitemap",
        "output_filename": "example-website",
        "pdf_options": {
            "page_size": "A4",
            "orientation": "portrait"
        }
    }
)

data = response.json()
print(f"Batch ID: {data['batch_id']}")
print(f"Pages found: {data['url_count']}")

# Poll for completion
batch_id = data["batch_id"]
while True:
    status = requests.get(
        f"https://api.enconvert.com/v1/convert/batch/{batch_id}",
        headers={"X-API-Key": "sk_live_your_private_key"}
    ).json()

    print(f"Status: {status['status']} ({status['completed']}/{status['total']})")

    if status["status"] in ("completed", "partial", "failed"):
        if status.get("zip_download_url"):
            print(f"Download: {status['zip_download_url']}")
        break

    time.sleep(5)

PHP (Private Key)

$ch = curl_init("https://api.enconvert.com/v1/convert/website-to-pdf");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        "Content-Type: application/json",
        "X-API-Key: sk_live_your_private_key"
    ],
    CURLOPT_POSTFIELDS => json_encode([
        "url" => "https://example.com",
        "crawl_mode" => "sitemap",
        "output_filename" => "example-website",
        "pdf_options" => [
            "page_size" => "A4",
            "orientation" => "portrait"
        ]
    ])
]);

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

echo "Batch ID: " . $response["batch_id"] . "\n";
echo "Pages found: " . $response["url_count"] . "\n";

Node.js (Private Key)

const response = await fetch("https://api.enconvert.com/v1/convert/website-to-pdf", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "X-API-Key": "sk_live_your_private_key"
    },
    body: JSON.stringify({
        url: "https://example.com",
        crawl_mode: "sitemap",
        output_filename: "example-website",
        pdf_options: {
            page_size: "A4",
            orientation: "portrait"
        }
    })
});

const data = await response.json();
console.log(`Batch ID: ${data.batch_id}`);
console.log(`Pages found: ${data.url_count}`);

// Poll for completion
const poll = async () => {
    const status = await fetch(
        `https://api.enconvert.com/v1/convert/batch/${data.batch_id}`,
        { headers: { "X-API-Key": "sk_live_your_private_key" } }
    ).then(r => r.json());

    console.log(`Status: ${status.status} (${status.completed}/${status.total})`);

    if (["completed", "partial", "failed"].includes(status.status)) {
        if (status.zip_download_url) console.log(`Download: ${status.zip_download_url}`);
        return;
    }
    setTimeout(poll, 5000);
};
poll();

Go (Private Key)

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "url":             "https://example.com",
        "crawl_mode":      "sitemap",
        "output_filename": "example-website",
        "pdf_options": map[string]interface{}{
            "page_size":   "A4",
            "orientation": "portrait",
        },
    })

    req, _ := http.NewRequest("POST", "https://api.enconvert.com/v1/convert/website-to-pdf", bytes.NewBuffer(body))
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", "sk_live_your_private_key")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    respBody, _ := io.ReadAll(resp.Body)
    fmt.Println(string(respBody))
}

With Webhook Callback

{
    "url": "https://example.com",
    "crawl_mode": "full",
    "callback_url": "https://your-server.com/webhook/enconvert",
    "output_filename": "example-full-site",
    "include_patterns": [".*\\/blog\\/.*", ".*\\/docs\\/.*"],
    "pdf_options": {
        "page_size": "Letter",
        "margins": {"top": 20, "bottom": 20, "left": 15, "right": 15}
    }
}

With Authentication (Password-Protected Site)

{
    "url": "https://staging.example.com",
    "crawl_mode": "sitemap",
    "auth": {
        "username": "admin",
        "password": "staging-password"
    },
    "cookies": [
        {"name": "session_token", "value": "abc123", "domain": "staging.example.com"}
    ]
}

Error Responses

Status Condition
400 Bad Request Missing or empty url parameter
400 Bad Request No URLs found in sitemap
400 Bad Request Timeout fetching sitemap (30-second limit)
400 Bad Request Non-200 response from sitemap URL
400 Bad Request Invalid XML in sitemap
400 Bad Request Unrecognized sitemap format
400 Bad Request No pages discovered (full crawl found zero URLs)
400 Bad Request Invalid auth, cookies, or headers structure
402 Payment Required Monthly conversion limit exceeded by discovered page count
402 Payment Required Storage limit reached
403 Forbidden Website crawling not available on current plan (Free plan)
403 Forbidden Full crawl mode requires Pro plan or higher
403 Forbidden Discovered page count exceeds batch size limit
403 Forbidden Feature not available on plan (webhook, basic auth)
500 Internal Server Error Crawl or conversion failure
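
When submitting a job, clients may want to branch on these statuses. A minimal sketch, where the classification mirrors the table above and the returned action strings are illustrative:

```python
def classify_submit_status(status_code):
    """Map an HTTP status from the submit call to a suggested action."""
    if status_code == 202:
        return "accepted"      # job queued; poll with batch_id
    if status_code == 400:
        return "fix_request"   # bad url, sitemap problem, or invalid auth/cookies/headers
    if status_code == 402:
        return "check_quota"   # conversion or storage limit reached
    if status_code == 403:
        return "upgrade_plan"  # feature or crawl mode not available on current plan
    if status_code >= 500:
        return "retry_later"   # crawl or conversion failure
    return "unexpected"
```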

Limits

Limit Value
Sitemap fetch timeout 30 seconds
Global crawl timeout (full mode) 10 minutes
Max crawl depth (full mode) 10 levels
Per-page crawl timeout (full mode) 30 seconds
Crawler memory limit 512 MB
Infinite trap threshold 20 URLs per URL pattern
Robots.txt fetch timeout 10 seconds
Max pages per crawl Plan's batch size limit
Maximum cookies per request 50
Maximum custom headers per request 20
Webhook delivery timeout 30 seconds
Monthly conversions Plan-dependent
File retention Plan-dependent