Inside Googlebot: crawling, 2MB limits, and rendering implications
Key Points
- Googlebot is one client of a centralized crawling platform
- HTML fetch capped at 2MB (PDFs 64MB)
- WRS only renders bytes actually fetched
Summary
This post explains that "Googlebot" is the name of one client of a centralized crawling platform shared by many Google products. It highlights the platform's per-URL byte limits (notably a 2MB limit on HTML fetches) and describes how the Web Rendering Service (WRS) processes only the bytes the crawler actually retrieved. Bytes beyond the configured cutoff are ignored: they are never rendered or indexed.
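To make the cutoff concrete, here is a minimal Python sketch of a byte-limited fetch. It assumes "2MB" means 2 * 1024 * 1024 bytes and that header bytes are simply subtracted from the same per-URL budget. The helper names `bytes_kept` and `complete_json_ld` are hypothetical and are not part of any Google API. The sketch shows that a JSON-LD block cut off partway through never parses, which is one way content past the cutoff effectively disappears.

```python
import json
import re

# Assumption: "2MB" is read as 2 * 1024 * 1024 bytes.
HTML_LIMIT = 2 * 1024 * 1024

# Matches a JSON-LD block only if its closing </script> tag is present.
JSON_LD = re.compile(rb'<script type="application/ld\+json">(.*?)</script>', re.S)


def bytes_kept(header_bytes: int, body: bytes, limit: int = HTML_LIMIT) -> bytes:
    """Return the prefix of `body` that a per-URL byte limit leaves visible.

    HTTP headers share the budget, so they reduce the room left for the body.
    Anything past the cutoff is never seen by later stages.
    """
    room_for_body = max(0, limit - header_bytes)
    return body[:room_for_body]


def complete_json_ld(fetched: bytes) -> list:
    """Parse only the JSON-LD blocks that survived the cutoff intact."""
    return [json.loads(m.group(1)) for m in JSON_LD.finditer(fetched)]
```

Suppose a page of roughly 130 bytes is fetched with a deliberately tiny limit, such as `bytes_kept(40, page, limit=100)`. The closing `</script>` then falls past the cutoff, and `complete_json_ld` returns an empty list. The same page under a generous limit yields its single JSON-LD object.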
- Googlebot is one of many clients of a centralized crawling platform; many Google products route their fetches through the same infrastructure.
- Default per-URL fetch limits: HTML and most other resources, 2MB (HTTP headers count toward this); PDFs, 64MB; crawlers that don't specify their own limit, 15MB.
- If a resource exceeds the per-URL limit, the fetch stops at the cutoff and the remainder is ignored (not rendered or indexed).
- WRS executes client-side JavaScript and CSS using only the bytes the fetcher retrieved, and it runs statelessly, clearing local and session storage between requests.
- External resources (scripts, stylesheets) are fetched separately, each with its own per-URL byte counter, so they don't count toward the parent HTML's 2MB; images and videos are typically not rendered by WRS.
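The per-URL accounting in the bullets above can be sketched as follows. This is an illustrative model, not Google's implementation. It assumes "MB" means MiB and that a resource's content type alone picks its limit. The names `Resource`, `limit_for`, and `truncated_urls` are hypothetical, as is the `googlebot` flag, which stands in for "a crawler on the platform that sets no limit of its own". The key point is that sizes are checked one URL at a time and never summed.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: "MB" read as MiB

# Limits as summarized above; mapping content types to them is an assumption.
GOOGLEBOT_LIMITS = {"application/pdf": 64 * MB}
GOOGLEBOT_DEFAULT = 2 * MB            # HTML and most other resources
UNSPECIFIED_CRAWLER_LIMIT = 15 * MB   # crawlers that specify no limit


@dataclass
class Resource:
    url: str
    content_type: str
    size: int  # response size in bytes, headers included


def limit_for(resource: Resource, googlebot: bool = True) -> int:
    """Pick the per-URL byte limit for one resource (hypothetical model)."""
    if not googlebot:
        return UNSPECIFIED_CRAWLER_LIMIT
    return GOOGLEBOT_LIMITS.get(resource.content_type, GOOGLEBOT_DEFAULT)


def truncated_urls(resources: list[Resource], googlebot: bool = True) -> list[str]:
    """List URLs whose own size exceeds their own limit.

    Each URL has an independent counter, so a large external script is
    truncated on its own and never eats into the HTML document's budget.
    """
    return [r.url for r in resources if r.size > limit_for(r, googlebot)]
```

For example, a 500KB HTML page that references a 3MB script and a 10MB PDF would, under this model, see only the script truncated. The HTML itself stays fully within its own 2MB.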
Practical guidance for engineers
- Keep initial HTML lean: move large CSS/JS and base64 blobs to external files to avoid hitting the 2MB cutoff.
- Place critical metadata and structured data early in the document (title, meta tags, canonical link, essential JSON-LD) so it falls well within the first 2MB.
- Monitor server response times and logs; slow servers cause fetchers to back off and reduce crawl frequency.
- Test pages with realistic fetch-size constraints and ensure client-side rendering doesn’t rely on bytes that may be truncated.
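One way to act on this guidance is a small pre-deploy audit. The Python sketch below reports a page's size, whether a few critical markers appear within an assumed 2MB budget, and how many bytes go to inline base64 data URIs. The `audit` function and the `CRITICAL` marker patterns are hypothetical and should be adapted to your own templates. The check only locates where each marker appears; it does not verify that the whole element fits before the cutoff.

```python
import re

LIMIT = 2 * 1024 * 1024  # assumption: "2MB" read as 2 MiB

# Hypothetical markers for "critical" content; adjust to your own templates.
CRITICAL = {
    "title": rb"<title[\s>]",
    "canonical": rb"""rel=["']canonical["']""",
    "json_ld": rb"application/ld\+json",
}
# Inline base64 data URIs are a common source of HTML bloat.
DATA_URI = re.compile(rb"data:[\w/+.-]+;base64,[A-Za-z0-9+/=]+")


def audit(html: bytes, header_bytes: int = 0, limit: int = LIMIT) -> dict:
    """Summarize how a page fares against a per-URL byte limit.

    For each marker: True if it ends within the budget, False if it appears
    only past the cutoff, None if it is absent from the page entirely.
    """
    budget = max(0, limit - header_bytes)
    report = {"size": len(html), "over_limit": len(html) > budget}
    for name, pattern in CRITICAL.items():
        match = re.search(pattern, html, re.IGNORECASE)
        report[name] = None if match is None else match.end() <= budget
    report["base64_bytes"] = sum(len(m.group(0)) for m in DATA_URI.finditer(html))
    return report
```

Running `audit` with a deliberately small `limit` against a trimmed copy of a real page is a cheap way to see which markers would fall past the cutoff as the page grows.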
Takeaway
Treat crawling as a byte-limited exchange: prioritize and surface your most important content early and externally reference heavy assets so Google’s fetcher and renderer can reliably see and index your site.