Website Scraper

Web Scraping API

Structured JSON from any URL in one request. The same AI extraction as the app — no selectors, no maintenance — behind a plain REST API. Included on Pro plans and above; keys live in Settings → API keys.

Two details worth knowing before you benchmark: repeat scrapes of recently fetched pages come back near-instantly, and the fetching layer underneath is independently audited (SOC 2 Type II).

Authentication

Pass your key as a bearer token. Keys are shown once at creation and stored hashed.

curl https://websitescraper.io/api/v1/scrape \
  -H "Authorization: Bearer ws_live_..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://books.toscrape.com/"}'
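
For pipelines that call the API from code rather than the shell, the same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_scrape_request is hypothetical, and the key is passed in as an argument so it never has to be hard-coded.

```python
import json
import urllib.request

API_BASE = "https://websitescraper.io/api/v1"


def build_scrape_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /scrape.

    Hypothetical helper: it mirrors the curl example, nothing more.
    """
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=body,
        method="POST",
        headers={
            # The key travels as a bearer token, as in the curl example.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_scrape_request("ws_live_...", "https://books.toscrape.com/")
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Building the request separately from sending it keeps the authentication logic easy to inspect and test without touching the network.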

Scrape a page

POST /api/v1/scrape with a JSON body {url, prompt?, schema?}; only url is required. Add a prompt to steer extraction, a column schema to lock the output shape, or both. One credit per page; failed jobs are refunded automatically.

curl https://websitescraper.io/api/v1/scrape \
  -H "Authorization: Bearer ws_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://books.toscrape.com/",
    "prompt": "book name and price",
    "schema": {"columns": [{"name": "book_name"}, {"name": "price"}]}
  }'

# 200 OK
{
  "run_id": "d3f6…",
  "columns": ["book_name", "price"],
  "rows": [{"book_name": "A Light in the Attic", "price": "£51.77"}, …],
  "row_count": 20,
  "credits_charged": 1
}
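
As a rough sketch of how a caller might assemble the request body and consume the response shown above, the Python below builds the optional prompt and schema fields and flattens rows into lists ordered by columns. The function names are hypothetical, and the sample response is trimmed to the single row visible in the example, so its row_count is adjusted to match.

```python
import json
from typing import Any, Optional


def build_scrape_body(url: str, prompt: Optional[str] = None,
                      columns: Optional[list] = None) -> str:
    """Serialize a /scrape body; prompt and schema are only sent when given."""
    body: dict = {"url": url}
    if prompt is not None:
        body["prompt"] = prompt
    if columns is not None:
        # Same shape as the curl example: {"columns": [{"name": ...}, ...]}
        body["schema"] = {"columns": [{"name": name} for name in columns]}
    return json.dumps(body)


def rows_as_table(response: dict) -> list:
    """Return each row as a list of values in the order given by "columns"."""
    columns = response["columns"]
    return [[row.get(col) for col in columns] for row in response["rows"]]


# Sample trimmed to the one row shown in the documentation.
sample = {
    "columns": ["book_name", "price"],
    "rows": [{"book_name": "A Light in the Attic", "price": "£51.77"}],
    "row_count": 1,
    "credits_charged": 1,
}
table = rows_as_table(sample)
```

Ordering values by the columns array rather than by dictionary order keeps the output stable when a schema is used to lock the shape.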

Check a job

GET /api/v1/jobs/:id — status and results for any run, including multi-page crawls started in the app.

curl https://websitescraper.io/api/v1/jobs/RUN_ID \
  -H "Authorization: Bearer ws_live_..."
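
For long-running crawls, a caller will typically poll this endpoint until the run finishes. The sketch below shows one way to do that in Python. The exact status field and its values are not documented here, so the "finished" check is supplied by the caller; the helper names, the polling interval, and the poll limit are all assumptions.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

API_BASE = "https://websitescraper.io/api/v1"


def job_url(run_id: str) -> str:
    """URL for GET /jobs/:id, with the id percent-encoded."""
    return f"{API_BASE}/jobs/{urllib.parse.quote(run_id, safe='')}"


def fetch_job(api_key: str, run_id: str) -> dict:
    """Fetch one job snapshot over the network."""
    req = urllib.request.Request(
        job_url(run_id), headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(fetch: Callable[[], dict],
                 is_finished: Callable[[dict], bool],
                 interval: float = 2.0,
                 max_polls: int = 30,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until is_finished(job) is true or max_polls snapshots were seen."""
    for attempt in range(max_polls):
        job = fetch()
        if is_finished(job):
            return job
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

A call might look like wait_for_job(lambda: fetch_job(key, run_id), is_finished), where is_finished inspects whatever status field the response actually carries.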

Run a saved scraper

Configure a scraper once in the dashboard, then trigger it from your pipeline.

# trigger
curl -X POST https://websitescraper.io/api/v1/scrapers/SCRAPER_ID/run \
  -H "Authorization: Bearer ws_live_..."

# history
curl https://websitescraper.io/api/v1/scrapers/SCRAPER_ID/runs \
  -H "Authorization: Bearer ws_live_..."

Errors

Errors are typed and human-readable. Each error response carries a JSON body of the form {"error": {"code": "...", "message": "..."}}.

Code                  HTTP  Meaning
INVALID_URL           400   Malformed, private, or unreachable URL
INVALID_KEY           401   Missing or revoked API key
INSUFFICIENT_CREDITS  402   Balance too low; buy a pack or upgrade
FETCH_BLOCKED         403   Site is on our do-not-scrape list
EXTRACTION_EMPTY      422   Page had no extractable data (refunded)
RATE_LIMITED          429   Over 20 scrapes/min; check X-RateLimit-* headers
ENGINE_ERROR          502   Upstream failure (refunded)
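
Because every error shares one envelope, client code can turn a failed response into a single exception type. The Python below is a sketch under assumptions: ApiError and parse_error are hypothetical names, the "UNKNOWN" fallback code is invented here for bodies that are not in the documented shape, and the refunded flag only reflects the two codes marked "(refunded)" in the table.

```python
import json

# Codes the table explicitly marks as "(refunded)".
REFUNDED_CODES = {"EXTRACTION_EMPTY", "ENGINE_ERROR"}


class ApiError(Exception):
    """Hypothetical exception wrapping the {"error": {...}} envelope."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{code} (HTTP {status}): {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def refunded(self) -> bool:
        return self.code in REFUNDED_CODES


def parse_error(status: int, body: bytes) -> ApiError:
    """Map an error response body to ApiError, tolerating malformed bodies."""
    try:
        err = json.loads(body)["error"]
        return ApiError(status, err["code"], err["message"])
    except (ValueError, KeyError, TypeError):
        # "UNKNOWN" is an assumption of this sketch, not an API code.
        return ApiError(status, "UNKNOWN", body.decode("utf-8", "replace"))
```

Branching on the code string rather than the HTTP status keeps handlers readable and matches the typed design of the error table.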

Rate limit: 20 scrapes per minute per account, reported via the X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset headers, plus Retry-After on 429 responses. Failed jobs are always refunded automatically; the pricing page spells out the trust rules.
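
One way a client might respect this limit is to wait for the Retry-After interval before retrying a 429. The Python sketch below assumes Retry-After is given in seconds (HTTP also permits a date form, which this sketch does not parse) and falls back to 3 seconds, which is simply 60 s / 20 requests. The function names, the 60-second cap, and the attempt limit are assumptions.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def retry_delay(headers: Mapping[str, str],
                default: float = 3.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying, taken from Retry-After when usable.

    Note: a plain dict lookup is case-sensitive; real HTTP header objects
    usually are not.
    """
    try:
        delay = float(headers.get("Retry-After"))
    except (TypeError, ValueError):
        delay = default  # header missing or not a number of seconds
    return min(max(delay, 0.0), cap)


def call_with_retry(send: Callable[[], Response],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(); on 429, wait and retry up to max_attempts times in total."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_delay(headers))
    raise ValueError("max_attempts must be at least 1")
```

Passing send and sleep as callables keeps the retry policy independent of any particular HTTP library.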