website-to-pdf

Discover all pages of a website via sitemap or full crawl, convert each page to a high-fidelity PDF, and bundle the results into a single ZIP archive. Always runs asynchronously. Requires a paid plan.


Endpoint

POST /v1/convert/website-to-pdf

Content-Type: application/json

Output format: ZIP archive containing one PDF per discovered page.

Mode: Always asynchronous. Returns HTTP 202 immediately.


Authentication

This endpoint requires a private API key. Public keys are not supported for website capture.

X-API-Key: sk_live_your_private_key

Request Parameters

Website Discovery Parameters

Parameter Type Required Default Description Plan Gating
url string Yes -- The website base URL (e.g. https://example.com). Used as the root for page discovery. --
crawl_mode string No "auto" URL discovery method. One of "auto", "sitemap", or "full". See Crawl Modes below. Sitemap requires Starter+, Full requires Pro+
include_patterns string[] No null Regex patterns that discovered URLs must match to be included. Only applied in full crawl mode. --
exclude_patterns string[] No System defaults Regex patterns for URLs to exclude. Only applied in full crawl mode. When omitted, built-in defaults exclude static assets, login/admin/cart pages, and deep pagination. --

Notification Parameters

Parameter Type Required Default Description Plan Gating
output_filename string No Auto-generated Custom base name for the output ZIP file. Timestamp is appended automatically. --
notification_email string No Project owner email Email address to notify when the job completes. --
callback_url string No -- Webhook URL to receive a POST request on completion. Requires webhook access

Browser & Rendering Parameters

These settings apply to the conversion of each individual page within the website.

Parameter Type Required Default Description Plan Gating
viewport_width integer No 1920 Browser viewport width in pixels. --
viewport_height integer No 1080 Browser viewport height in pixels. --
single_page boolean No true true renders each page as one continuous PDF page. false produces paginated output using the page size from pdf_options. --
load_media boolean No true Wait for all images and videos to fully load before conversion. --
enable_scroll boolean No true Scroll each page to trigger lazy-loading content. --
handle_sticky_header boolean No true Detect sticky/fixed headers and handle them before capture. --
handle_cookies boolean No true Auto-dismiss cookie consent banners. --
wait_for_images boolean No true Wait for all <img> elements to finish loading. --

Authentication & Custom Requests

Parameter Type Required Default Description Plan Gating
auth object No null HTTP Basic Auth credentials applied to every page. Format: {"username": "...", "password": "..."}. Requires basic auth access
cookies array No null Array of cookie objects injected before each page load. Maximum 50 cookies. Requires basic auth access
headers object No null Custom HTTP headers sent with every request. Maximum 20 headers. Requires basic auth access

PDF Options

Pass these inside a pdf_options object. They apply to every page in the website.

Parameter Type Default Description
page_size string "A4" Named page size. Ignored when both page_width and page_height are set.
page_width float null Custom page width in millimeters. Both page_width and page_height must be set together.
page_height float null Custom page height in millimeters.
orientation string "portrait" "portrait" or "landscape".
margins object {"top": 10, "bottom": 10, "left": 10, "right": 10} Page margins in millimeters.
scale float 1.0 Content scale factor. Range: 0.1 to 2.0. Paginated mode only.
grayscale boolean false Convert each PDF page to grayscale.
header object null Page header for paginated mode. Format: {"content": "<html>", "height": 15}. Supports template variables: {{page}}, {{total_pages}}, {{date}}, {{title}}, {{url}}.
footer object null Page footer for paginated mode. Same format as header.

Supported page sizes: A0, A1, A2, A3, A4, A5, A6, B0, B1, B2, B3, B4, B5, Letter, Legal, Tabloid, Ledger
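
As an illustration, a request fragment that produces paginated Letter output with a page-numbering footer could look like the following (the footer HTML is a made-up example; the template variables are the ones listed above):

```json
{
    "single_page": false,
    "pdf_options": {
        "page_size": "Letter",
        "orientation": "portrait",
        "margins": {"top": 15, "bottom": 20, "left": 10, "right": 10},
        "footer": {
            "content": "<div style='font-size:10px;text-align:center'>Page {{page}} of {{total_pages}}</div>",
            "height": 15
        }
    }
}
```

Because header and footer apply only in paginated mode, single_page is set to false here.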


Crawl Modes

"auto" (default)

Uses the highest crawl mode your plan allows. If your plan supports full crawling, it runs a full crawl. If your plan supports sitemap-only, it runs sitemap discovery.

"sitemap"

Discovers pages by parsing the website's sitemap.xml:

  1. Fetches {base_url}/sitemap.xml (30-second timeout)
  2. If the root element is <sitemapindex>, recursively fetches each child sitemap
  3. Extracts all <url><loc> entries from <urlset> elements
  4. Returns the full list of discovered URLs

Returns an error if the sitemap is missing, the fetch returns a non-200 status, the XML is invalid, or no URLs are found.
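
The discovery steps above can be sketched roughly as follows. This is a simplified illustration using Python's standard library, not the service's actual implementation; the real crawler also applies the 30-second timeout and the error handling described above. The fetch callable is a stand-in for an HTTP GET:

```python
import xml.etree.ElementTree as ET

# Sitemap XML files use this namespace on every element
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap_xml(xml_text, fetch):
    """Parse sitemap XML, recursing into <sitemapindex> children.

    fetch is a callable url -> xml string, injected so the logic
    can be exercised without network access.
    """
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == SITEMAP_NS + "sitemapindex":
        # Recursively fetch and parse each child sitemap in the index
        for loc in root.iter(SITEMAP_NS + "loc"):
            urls.extend(parse_sitemap_xml(fetch(loc.text.strip()), fetch))
    elif root.tag == SITEMAP_NS + "urlset":
        # Extract every <url><loc> entry
        for loc in root.iter(SITEMAP_NS + "loc"):
            urls.append(loc.text.strip())
    return urls
```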

"full"

Performs a comprehensive two-phase crawl:

Phase 1 -- Seed discovery:

  1. Parses robots.txt for sitemap directives and crawl rules
  2. Checks standard sitemap paths (/sitemap.xml, /wp-sitemap.xml, /sitemap_index.xml, etc.)
  3. Discovers RSS/Atom feeds from <link> tags and common feed paths
  4. Extracts seed URLs from all discovered sources

Phase 2 -- Breadth-first link crawl:

  1. Starts from the base URL plus all seed URLs
  2. Visits each page and enqueues same-domain links
  3. Applies include_patterns and exclude_patterns to filter links
  4. Respects robots.txt rules
  5. Detects and avoids infinite URL traps (calendar pages, faceted filters, etc.)
  6. Deduplicates URLs by normalizing scheme, host, query parameters, and stripping tracking parameters (utm_*, fbclid, gclid, etc.)
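
The deduplication step can be sketched as follows. This is a simplified illustration of the idea, not the crawler's internal normalization rules, which are more extensive:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters treated as tracking noise for dedup purposes
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}

def normalize_url(url):
    """Lowercase scheme/host, drop fragments and tracking parameters."""
    parts = urlsplit(url)
    query = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS and not k.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        urlencode(query),
        "",  # fragment dropped
    ))
```

Two URLs that normalize to the same string are treated as one page, so https://Example.com/blog?utm_source=x&id=7 and https://example.com/blog?id=7 would be crawled only once.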

Default exclude patterns (when exclude_patterns is not provided):

  • Static assets: *.pdf, *.zip, *.jpg, *.png, *.gif, *.svg, *.css, *.js, *.xml, *.json, *.mp4, *.webm, *.woff, *.woff2
  • Protected paths: /login, /admin, /cart, /checkout
  • Deep pagination: URLs whose page= parameter value exceeds 3 digits (page numbers above 999)

Response

202 Accepted (immediate)

{
    "status": "processing",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "url_count": 42,
    "total_discovered": 42,
    "discovery_method": "sitemap",
    "output_format": "zip"
}
Field Description
batch_id UUID for tracking the job via batch status polling or webhook.
url_count Number of pages that will be converted.
total_discovered Total pages discovered by the crawl.
discovery_method "sitemap" or "full_crawl" depending on the effective crawl mode.

Batch Status Polling

Poll with the batch_id from the 202 response:

GET /v1/convert/batch/{batch_id}
X-API-Key: sk_live_your_private_key

Returns aggregate status, per-URL statuses, and a presigned download URL for the ZIP when complete. See Batch Status Polling for the full response schema.

Webhook Callback Payload

When callback_url is provided, Enconvert sends a POST request on completion:

{
    "job_id": "batch-uuid",
    "status": "success",
    "batch_id": "550e8400-e29b-41d4-a716-446655440000",
    "gcs_uri": "env/files/{project_id}/url-to-pdf/website_20260405_123456789.zip",
    "filename": "website_20260405_123456789.zip",
    "file_size": 12345678,
    "total_tasks": 42,
    "successful_tasks": 40,
    "failed_tasks": 2,
    "tasks": [
        {"url": "https://example.com/", "status": "success", "filename": "example_20260405_001.pdf"},
        {"url": "https://example.com/about", "status": "success", "filename": "example_20260405_002.pdf"},
        {"url": "https://example.com/broken", "status": "failed", "error": "Timeout"}
    ]
}
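
A webhook receiver will typically check the overall status and collect any failed URLs for follow-up. A minimal sketch of parsing the payload above (field names are taken from the example; what you do with the failed URLs is up to you):

```python
import json

def summarize_webhook(payload):
    """Return (all_ok, failed_urls) from a completion payload."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    failed = [t["url"] for t in data.get("tasks", []) if t["status"] == "failed"]
    # A "success" status can still include individual failed tasks,
    # as in the example payload above (40 of 42 succeeded)
    all_ok = data["status"] == "success" and not failed
    return all_ok, failed
```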

Email Notification

A completion email is sent to notification_email (or the project owner's email by default) when the job finishes, regardless of success or failure.


Subscription Plan Gating

Feature Free Starter Pro Enterprise
Website capture No Yes Yes Yes
Sitemap crawl mode No Yes Yes Yes
Full crawl mode No No Yes Yes
Webhook callbacks No No Yes Yes
HTTP Basic Auth No Yes Yes Yes
Cookie injection No Yes Yes Yes
Custom headers No Yes Yes Yes
Batch size limit 0 Plan-based Plan-based Unlimited
Monthly conversions 100 Plan-based Plan-based Unlimited
Free plan: Website capture is not available on the free plan. Attempting to use this endpoint returns 403 Forbidden.

Code Examples

Python (Private Key)

import requests
import time

# Start the website capture
response = requests.post(
    "https://api.enconvert.com/v1/convert/website-to-pdf",
    headers={"X-API-Key": "sk_live_your_private_key"},
    json={
        "url": "https://example.com",
        "crawl_mode": "sitemap",
        "output_filename": "example-website",
        "pdf_options": {
            "page_size": "A4",
            "orientation": "portrait"
        }
    }
)

data = response.json()
print(f"Batch ID: {data['batch_id']}")
print(f"Pages found: {data['url_count']}")

# Poll for completion
batch_id = data["batch_id"]
while True:
    status = requests.get(
        f"https://api.enconvert.com/v1/convert/batch/{batch_id}",
        headers={"X-API-Key": "sk_live_your_private_key"}
    ).json()

    print(f"Status: {status['status']} ({status['completed']}/{status['total']})")

    if status["status"] in ("completed", "partial", "failed"):
        if status.get("zip_download_url"):
            print(f"Download: {status['zip_download_url']}")
        break

    time.sleep(5)

PHP (Private Key)

$ch = curl_init("https://api.enconvert.com/v1/convert/website-to-pdf");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        "Content-Type: application/json",
        "X-API-Key: sk_live_your_private_key"
    ],
    CURLOPT_POSTFIELDS => json_encode([
        "url" => "https://example.com",
        "crawl_mode" => "sitemap",
        "output_filename" => "example-website",
        "pdf_options" => [
            "page_size" => "A4",
            "orientation" => "portrait"
        ]
    ])
]);

$response = json_decode(curl_exec($ch), true);
curl_close($ch);

echo "Batch ID: " . $response["batch_id"] . "\n";
echo "Pages found: " . $response["url_count"] . "\n";

Node.js (Private Key)

const response = await fetch("https://api.enconvert.com/v1/convert/website-to-pdf", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "X-API-Key": "sk_live_your_private_key"
    },
    body: JSON.stringify({
        url: "https://example.com",
        crawl_mode: "sitemap",
        output_filename: "example-website",
        pdf_options: {
            page_size: "A4",
            orientation: "portrait"
        }
    })
});

const data = await response.json();
console.log(`Batch ID: ${data.batch_id}`);
console.log(`Pages found: ${data.url_count}`);

// Poll for completion
const poll = async () => {
    const status = await fetch(
        `https://api.enconvert.com/v1/convert/batch/${data.batch_id}`,
        { headers: { "X-API-Key": "sk_live_your_private_key" } }
    ).then(r => r.json());

    console.log(`Status: ${status.status} (${status.completed}/${status.total})`);

    if (["completed", "partial", "failed"].includes(status.status)) {
        if (status.zip_download_url) console.log(`Download: ${status.zip_download_url}`);
        return;
    }
    setTimeout(poll, 5000);
};
poll();

Go (Private Key)

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
)

func main() {
    body, _ := json.Marshal(map[string]interface{}{
        "url":             "https://example.com",
        "crawl_mode":      "sitemap",
        "output_filename": "example-website",
        "pdf_options": map[string]interface{}{
            "page_size":   "A4",
            "orientation": "portrait",
        },
    })

    req, _ := http.NewRequest("POST", "https://api.enconvert.com/v1/convert/website-to-pdf", bytes.NewBuffer(body))
    req.Header.Set("Content-Type", "application/json")
    req.Header.Set("X-API-Key", "sk_live_your_private_key")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    respBody, _ := io.ReadAll(resp.Body)
    fmt.Println(string(respBody))
}

With Webhook Callback

{
    "url": "https://example.com",
    "crawl_mode": "full",
    "callback_url": "https://your-server.com/webhook/enconvert",
    "output_filename": "example-full-site",
    "include_patterns": [".*\\/blog\\/.*", ".*\\/docs\\/.*"],
    "pdf_options": {
        "page_size": "Letter",
        "margins": {"top": 20, "bottom": 20, "left": 15, "right": 15}
    }
}

With Authentication (Password-Protected Site)

{
    "url": "https://staging.example.com",
    "crawl_mode": "sitemap",
    "auth": {
        "username": "admin",
        "password": "staging-password"
    },
    "cookies": [
        {"name": "session_token", "value": "abc123", "domain": "staging.example.com"}
    ]
}

Error Responses

Status Condition
400 Bad Request Missing or empty url parameter
400 Bad Request No URLs found in sitemap
400 Bad Request Timeout fetching sitemap (30-second limit)
400 Bad Request Non-200 response from sitemap URL
400 Bad Request Invalid XML in sitemap
400 Bad Request Unrecognized sitemap format
400 Bad Request No pages discovered (full crawl found zero URLs)
400 Bad Request Invalid auth, cookies, or headers structure
402 Payment Required Monthly conversion limit exceeded by discovered page count
402 Payment Required Storage limit reached
403 Forbidden Website crawling not available on current plan (Free plan)
403 Forbidden Full crawl mode requires Pro plan or higher
403 Forbidden Discovered page count exceeds batch size limit
403 Forbidden Feature not available on plan (webhook, basic auth)
500 Internal Server Error Crawl or conversion failure
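
When submitting a job, clients may want to branch on these statuses. A minimal sketch, where the classification mirrors the table above and the returned action strings are illustrative:

```python
def classify_submit_status(status_code):
    """Map an HTTP status from the submit call to a suggested action."""
    if status_code == 202:
        return "accepted"      # job queued; poll with batch_id
    if status_code == 400:
        return "fix_request"   # bad url, sitemap problem, or invalid auth/cookies/headers
    if status_code == 402:
        return "check_quota"   # conversion or storage limit reached
    if status_code == 403:
        return "upgrade_plan"  # feature or crawl mode not available on current plan
    if status_code >= 500:
        return "retry_later"   # crawl or conversion failure
    return "unexpected"
```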

Limits

Limit Value
Sitemap fetch timeout 30 seconds
Global crawl timeout (full mode) 10 minutes
Max crawl depth (full mode) 10 levels
Per-page crawl timeout (full mode) 30 seconds
Crawler memory limit 512 MB
Infinite trap threshold 20 URLs per URL pattern
Robots.txt fetch timeout 10 seconds
Max pages per crawl Plan's batch size limit
Maximum cookies per request 50
Maximum custom headers per request 20
Webhook delivery timeout 30 seconds
Monthly conversions Plan-dependent
File retention Plan-dependent