Website Capture

Website capture endpoints convert all pages of a website into PDFs or screenshots in a single API call. The API automatically discovers pages using one of two methods — sitemap parsing or full algorithmic crawling — then converts all discovered URLs in a single batch job and delivers the results as a ZIP archive.

Paid Plans Only: Website capture requires a paid plan with crawl_mode set to sitemap or full. Free plans will receive a 403 Forbidden response.
Private Keys Only: Website capture is only available when authenticating with a private API key. Public keys and dashboard tokens do not support this feature.

Available Endpoints

| Endpoint | Output | Description |
| --- | --- | --- |
| POST /v1/convert/website-to-pdf | ZIP of PDFs | Convert every discovered page to a PDF |
| POST /v1/convert/website-to-screenshot | ZIP of PNGs | Screenshot every discovered page |

Discovery Modes

The API supports two methods for discovering pages on a website. The mode used depends on your plan and the optional crawl_mode parameter.

Sitemap Mode (crawl_mode: "sitemap")

Available on Starter, Pro, and Business plans.

  1. Fetches {url}/sitemap.xml and extracts all page URLs.
  2. If the sitemap is a sitemap index (pointing to child sitemaps), recursively fetches and parses each child sitemap.
  3. If no valid sitemap is found, the request fails with a 400 error.

Sitemap mode is simple and fast, but it requires the target site to have a valid sitemap.xml.
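The recursive handling of sitemap indexes described in steps 1–2 can be sketched as follows. This is an illustrative sketch, not the API's actual implementation; `parse_sitemap` and the injected `fetch` callback are hypothetical names.

```python
# Illustrative sketch of recursive sitemap parsing. Handles both flat
# <urlset> sitemaps and <sitemapindex> files by recursing into each
# child sitemap via the caller-supplied fetch(url) function.
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text, fetch):
    """Return all page URLs from a sitemap. `fetch(url)` returns XML text."""
    root = ET.fromstring(xml_text)
    urls = []
    if root.tag == f"{NS}sitemapindex":
        # Sitemap index: recurse into each child sitemap.
        for loc in root.iter(f"{NS}loc"):
            urls.extend(parse_sitemap(fetch(loc.text.strip()), fetch))
    else:
        # Standard <urlset>: collect every <url><loc> entry.
        for loc in root.findall(f"{NS}url/{NS}loc"):
            urls.append(loc.text.strip())
    return urls
```

If neither form parses or no `<loc>` entries are found, the API reports this as the 400 error described under Sitemap Requirements.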

Full Crawl Mode (crawl_mode: "full")

Available on Pro and Business plans only.

Full crawl uses a two-phase algorithmic discovery pipeline that can find pages even when a website has no sitemap or an incomplete one:

Phase 1 — Seed Discovery (fast, no browser):

  1. Fetches robots.txt to extract Sitemap: directives and disallow rules.
  2. Parses any sitemaps found (standard sitemap.xml + robots.txt sitemaps).
  3. Probes common sitemap paths (/wp-sitemap.xml, /sitemap_index.xml, etc.) if no sitemap was found.
  4. Discovers RSS/Atom feeds from the homepage or common paths (/feed, /rss) and extracts page URLs from feed entries.
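The robots.txt step above boils down to scanning for `Sitemap:` and `Disallow:` lines. A minimal local sketch, with hypothetical function and variable names (not the crawler's internals):

```python
# Extract Sitemap: directives and Disallow: rules from a robots.txt body,
# as in Phase 1 step 1. Splits each line on the first colon only, so
# URLs containing colons (https://...) stay intact.
def parse_robots(text):
    sitemaps, disallows = [], []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap" and value:
            sitemaps.append(value)
        elif field == "disallow" and value:
            disallows.append(value)
    return sitemaps, disallows
```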

Phase 2 — Algorithmic Link Crawl:

  1. Starting from the seed URLs, the crawler follows links on each page to discover additional pages using breadth-first search (BFS).
  2. The crawler automatically adapts per-page — using a fast HTTP parser for static pages and a full browser for JavaScript-rendered pages.
  3. Only same-domain HTML pages are followed. Binary files (images, PDFs, ZIPs), login pages, admin pages, and shopping cart URLs are automatically excluded.
  4. The crawl respects robots.txt disallow rules, detects infinite URL traps (calendars, paginated archives), and backs off on HTTP 429 rate-limit responses.

The full crawl is capped at your plan's batch limit (Pro: 100 pages, Business: 400 pages) and has a maximum duration of 10 minutes.
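The BFS core of Phase 2 can be sketched in a few lines. This simplified version shows only the queue discipline, the same-domain rule, and the binary-file exclusion from steps 1 and 3; the real crawler additionally renders JavaScript pages, honors robots.txt, detects URL traps, and rate-limits itself. `get_links(url)` is a hypothetical fetcher that returns the links found on a page.

```python
# Breadth-first link crawl with same-domain and binary-extension
# filtering, capped at max_pages (the plan's batch limit).
from collections import deque
from urllib.parse import urlparse

SKIP_EXT = (".png", ".jpg", ".gif", ".pdf", ".zip")

def crawl(start, get_links, max_pages=100):
    domain = urlparse(start).netloc
    seen, queue = {start}, deque([start])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages.append(url)
        for link in get_links(url):
            if urlparse(link).netloc != domain:   # same-domain only
                continue
            if link.lower().endswith(SKIP_EXT):   # skip binary files
                continue
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```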

Auto Mode (crawl_mode: "auto") — Default

When you don't specify a crawl_mode, the API automatically uses the best mode available for your plan:

  • Starter: Sitemap mode
  • Pro / Business: Full crawl mode

How It Works

  1. You provide the base URL of a website (e.g. https://example.com).
  2. The API discovers pages using the active discovery mode.
  3. Each discovered URL is converted using the same browser engine as url-to-pdf / url-to-screenshot, with full Clear Capture Mode support.
  4. All converted files are bundled into a ZIP archive and uploaded to storage.
  5. You receive an email notification (or webhook callback) when the job completes, with the results available on the Activity page or via a presigned URL.

Request Format

curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
  -H "Authorization: Bearer YOUR_PRIVATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | (required) | The base URL of the website to capture. |
| crawl_mode | string | "auto" | Page discovery method: "auto" (uses best available for your plan), "sitemap" (sitemap.xml only), or "full" (algorithmic crawl). See Discovery Modes. |
| include_patterns | array of strings | null | Regex patterns to whitelist URLs during full crawl. Only URLs matching at least one pattern will be followed. |
| exclude_patterns | array of strings | null | Regex patterns to blacklist URLs during full crawl. Matching URLs will not be crawled or included. |
| notification_email | string | null | Email address to notify when the job completes. If omitted, the project owner's email is used as a fallback. |
| callback_url | string | null | Webhook URL to receive a POST request when the job completes. See Job Notifications. |
| output_filename | string | null | Custom filename prefix for the output ZIP archive. |
| auth | object | null | HTTP Basic Auth credentials for password-protected pages. See Authenticated Pages. |
| cookies | array | null | Session cookies to inject before loading each page. See Authenticated Pages. |
| headers | object | null | Custom HTTP headers to send with every request. See Authenticated Pages. |
| load_media | boolean | true | Load images and media assets on each page before conversion. |
| enable_scroll | boolean | true | Scroll through each page to trigger lazy-loaded content. |
| handle_sticky_header | boolean | true | Neutralize sticky/fixed headers. |
| handle_cookies | boolean | true | Dismiss cookie consent banners. |
| wait_for_images | boolean | true | Wait for all images to finish loading. |
| single_page | boolean | false | Render each page as a single continuous page (PDF only). |
| viewport_width | integer | 1920 | Browser viewport width in pixels. |
| viewport_height | integer | 1080 | Browser viewport height in pixels. |
| pdf_options | object | null | PDF output configuration for website-to-pdf. Controls page size, orientation, margins, scale, grayscale, and headers/footers. Applied to each page in the batch. See URL Converters — PDF Options. |

Response

Website capture is always asynchronous. The API immediately returns an HTTP 202 Accepted response.

{
  "status": "processing",
  "batch_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "url_count": 42,
  "total_discovered": 42,
  "discovery_method": "full_crawl",
  "output_format": "zip"
}
| Field | Description |
| --- | --- |
| status | Always "processing" on success. |
| batch_id | Unique identifier for tracking this batch job. |
| url_count | Number of pages that will be converted. |
| total_discovered | Total number of pages discovered during the discovery phase. |
| discovery_method | The discovery method used: "sitemap" or "full_crawl". |
| output_format | Always "zip" — all pages are bundled into a single archive. |

When the job completes, you will receive a notification via email (and/or webhook if configured). The ZIP archive can be downloaded from the Activity page in the dashboard.
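Since the only handle on an in-flight job is the batch_id, a client should parse the 202 body and store it for matching against the later notification. A minimal sketch (`handle_accept` is an illustrative name):

```python
# Parse the 202 Accepted body shown above and keep the batch_id for
# correlating with the completion email or webhook callback.
import json

def handle_accept(body):
    data = json.loads(body)
    assert data["status"] == "processing"
    return data["batch_id"], data["url_count"]

batch_id, count = handle_accept(
    '{"status": "processing", "batch_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",'
    ' "url_count": 42, "total_discovered": 42,'
    ' "discovery_method": "full_crawl", "output_format": "zip"}'
)
```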


Sitemap Requirements (Sitemap Mode)

When using sitemap mode, the target website must have a valid sitemap.xml at its root URL. The API supports:

  • Standard sitemaps (<urlset>) — a flat list of <url><loc> entries.
  • Sitemap indexes (<sitemapindex>) — a list of child sitemaps that are recursively fetched and parsed.

If the sitemap cannot be fetched, is not valid XML, or contains no URLs, the API returns a 400 Bad Request error.

Tip: If the target website doesn't have a sitemap.xml, use crawl_mode: "full" (Pro/Business plans) to discover pages algorithmically.

Examples

Website to PDF (Auto Mode)

Uses the best discovery method available for your plan:

curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
  -H "Authorization: Bearer YOUR_PRIVATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "notification_email": "team@example.com",
    "output_filename": "docs-site-backup"
  }'

Full Crawl with URL Filtering

Crawl a website but only include blog pages:

curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
  -H "Authorization: Bearer YOUR_PRIVATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com",
    "crawl_mode": "full",
    "include_patterns": [".*/blog/.*"],
    "exclude_patterns": [".*/tag/.*", ".*/author/.*"]
  }'
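The pattern semantics in the request above (a URL must match at least one include pattern, and any exclude match rejects it) can be checked locally before submitting. This is a sketch under the assumption that patterns are matched against the full URL string; `url_allowed` is an illustrative name, not part of the API.

```python
# Local preview of include_patterns / exclude_patterns filtering.
import re

def url_allowed(url, include=None, exclude=None):
    if include and not any(re.fullmatch(p, url) for p in include):
        return False   # must match at least one include pattern
    if exclude and any(re.fullmatch(p, url) for p in exclude):
        return False   # any exclude match rejects the URL
    return True
```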

Sitemap Only

Force sitemap-only mode (faster, but requires sitemap.xml):

curl -X POST https://api.enconvert.com/v1/convert/website-to-screenshot \
  -H "Authorization: Bearer YOUR_PRIVATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.example.com",
    "crawl_mode": "sitemap",
    "viewport_width": 1440,
    "viewport_height": 900
  }'

Website to PDF with PDF Options

curl -X POST https://api.enconvert.com/v1/convert/website-to-pdf \
  -H "Authorization: Bearer YOUR_PRIVATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "notification_email": "team@example.com",
    "pdf_options": {
      "page_size": "A4",
      "margins": { "top": 20, "bottom": 20, "left": 15, "right": 15 },
      "footer": {
        "content": "<div style=\"font-size: 9px; width: 100%; text-align: center;\">Page {{page}} of {{total_pages}}</div>",
        "height": 12
      }
    }
  }'

Error Responses

No Sitemap Found (400) — Sitemap Mode

{
  "detail": "Could not fetch sitemap: https://example.com/sitemap.xml returned 404"
}

No Pages Discovered (400) — Full Crawl Mode

{
  "detail": "No pages discovered on https://example.com"
}

Empty Sitemap (400)

{
  "detail": "No URLs found in sitemap: https://example.com/sitemap.xml"
}

Free Plan (403)

{
  "detail": "Website crawling is not available on your current plan. Please upgrade to access this feature."
}

Full Crawl Not Available (403)

Returned when a Starter plan user requests crawl_mode: "full":

{
  "detail": "Full website crawling requires a Pro plan or higher. Your plan supports sitemap-based crawling only."
}

Batch Limit Exceeded (403)

Returned when the number of discovered URLs exceeds your plan's batch limit.

{
  "detail": "Batch size 150 exceeds your plan limit of 50 URLs per batch."
}
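If you already know roughly how many pages a site has, you can mirror this check client-side before submitting. The per-plan limits come from the Plan Comparison table; `check_batch` and `PLAN_LIMITS` are illustrative names, not part of the API.

```python
# Client-side guard mirroring the batch-limit 403 above.
PLAN_LIMITS = {"starter": 50, "pro": 100, "business": 400}

def check_batch(url_count, plan):
    limit = PLAN_LIMITS[plan]
    if url_count > limit:
        raise ValueError(
            f"Batch size {url_count} exceeds your plan limit of {limit} URLs per batch."
        )
```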

Plan Comparison

| Feature | Starter | Pro | Business |
| --- | --- | --- | --- |
| Sitemap mode | Yes | Yes | Yes |
| Full crawl mode | No | Yes | Yes |
| Max pages per batch | 50 | 100 | 400 |
| Discovery timeout | N/A | 10 min | 10 min |