Cloudflare Browser Rendering Introduces Website Crawling API
Key Points
- Single API call crawls entire websites with automatic page discovery
- Returns content in HTML, Markdown, and structured JSON formats
- Respects robots.txt and AI Crawl Control for compliant crawling
Summary
Cloudflare has launched a new /crawl endpoint for Browser Rendering in open beta, enabling developers to crawl entire websites with a single API call. The service automatically discovers pages, renders them in a headless browser, and returns content in multiple formats.
Key Points
- Asynchronous crawling: Submit a URL, receive a job ID, and poll for results as pages are processed
- Multiple output formats: Returns content as HTML, Markdown, and structured JSON (powered by Workers AI)
- Flexible crawl controls: Configure depth limits, page limits, and URL pattern inclusion/exclusion
- Automatic page discovery: Finds URLs through sitemaps, page links, or both
- Incremental crawling: Use the modifiedSince and maxAge parameters to skip unchanged or recently fetched pages
- Static mode: Set render: false for faster crawling of static sites without browser rendering
- Compliance-focused: Respects robots.txt and AI Crawl Control by default as a signed agent
- Availability: Works on both Workers Free and Paid plans
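The submit-then-poll flow described above can be sketched in Python. This is a minimal illustration, not the official client: the parameter names render, modifiedSince, and maxAge come from the announcement, but the value types, the "status" field, and its "completed" and "errored" values are assumptions made for the example. The HTTP call itself is left to a caller-supplied function, since the exact endpoint path and response shape are not given here.

```python
import time
from typing import Callable, Optional


def build_crawl_request(
    url: str,
    *,
    render: bool = True,
    modified_since: Optional[str] = None,
    max_age: Optional[int] = None,
) -> dict:
    """Build a JSON body for a crawl job.

    Parameter names follow the announcement; the value types
    (timestamp string, integer seconds) are assumptions.
    """
    body: dict = {"url": url, "render": render}
    if modified_since is not None:
        body["modifiedSince"] = modified_since
    if max_age is not None:
        body["maxAge"] = max_age
    return body


def poll_until_done(
    get_status: Callable[[], dict],
    *,
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() until the job reports a terminal state.

    get_status is a hypothetical callable that fetches the job's
    current state by job ID; the "status" key and its terminal
    values are assumed, not taken from the API reference.
    """
    for _ in range(max_polls):
        state = get_status()
        if state.get("status") in ("completed", "errored"):
            return state
        sleep(interval)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

In practice, get_status would wrap an authenticated GET for the job ID returned by the initial submission. Keeping the transport injectable makes the polling logic easy to test without network access.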
Limitations
- Cannot bypass Cloudflare bot detection or CAPTCHAs
- Self-identifies as a bot during crawling
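Because the crawler respects robots.txt, it can help to check in advance which paths a site's rules would exclude. The sketch below uses Python's standard urllib.robotparser for that check. The user agent name ExampleCrawler and the sample rules are placeholders, not the identifier Cloudflare's crawler actually sends, and the service's own matching logic may differ in detail.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: ExampleCrawler
Disallow: /private/

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for path in ("/docs", "/private/page"):
        url = "https://example.com" + path
        print(path, allowed_by_robots(SAMPLE_ROBOTS, "ExampleCrawler", url))
```

Pages that fail this kind of check would be expected to be missing from crawl results rather than returned with an error, though how the service reports skipped URLs is not stated here.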