    Exploring Large HTML Documents on the Web

    By TechAiVerse, December 3, 2025

    Most HTML documents are relatively small, providing a starting point for other resources on the page to load.

    But why do some websites load several megabytes of HTML code? Usually it’s not that there’s a lot of content on the page, but rather that other types of resources are embedded within the document.

    In this article, we’ll look at examples of large HTML documents around the web and peek into the code to see what’s making them so big.

    HTML on the web is full of surprises. In the process of writing this article I rebuilt most of the DebugBear HTML Size Analyzer. If your HTML contains scripts that contain JSON that contains HTML that contains CSS that contains images – that’s supported now!

    Embedded images

    Base64 encoding is a way to turn images into text, so that they can be embedded in a text file like HTML or CSS. Embedding images directly in the HTML has a big advantage: the browser no longer needs to make a separate request to display the image.
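To make the mechanism concrete, here is a minimal Python sketch of how a data URL is built. The helper name `to_data_url` and the 3,000-byte placeholder payload are made up for illustration; a real page would use the bytes of an actual image file. Base64 emits four characters for every three input bytes, so the embedded text is roughly a third larger than the original file.

```python
import base64


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a Base64 data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical stand-in for a real image: 3,000 zero bytes.
placeholder_image = bytes(3000)

data_url = to_data_url(placeholder_image, "image/png")
img_tag = f'<img src="{data_url}" alt="">'

# Ratio of embedded text length to the original byte count.
overhead = len(data_url) / len(placeholder_image)
print(f"{len(placeholder_image)} bytes become {len(data_url)} characters "
      f"({overhead:.2f}x)")
```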

    However, embedding large files is likely to cause problems. The image can no longer be cached independently, and it is downloaded with the same priority as the document content, even though images can usually load later.

    Here’s an example of PNG files that are embedded in HTML using data URLs.

    There are different variations of this pattern:

    • Sometimes it’s a single multi-megabyte image that was included accidentally; other times there are hundreds of small icons that added up over time.
    • I saw a site using responsive images together with data URLs. One goal of responsive images is only loading images at the minimum necessary resolution, but embedding all versions in the HTML has the opposite effect.
    • Indirectly embedded images:
      • Inline SVGs that are themselves a thin wrapper around PNG or JPEG
      • Background images from inlined CSS stylesheets
      • Images within JSON data (more on that later 😬)
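A rough way to check your own pages for these patterns is to scan the HTML text for Base64 data URLs and add up their length per MIME type. The sketch below is an assumption-laden approximation, not how the HTML Size Analyzer works: the regular expression, the function name and the sample markup are all illustrative, and it only catches Base64-encoded data URLs that appear unescaped in the document.

```python
import re
from collections import Counter

# Matches Base64 data URLs such as data:image/png;base64,iVBORw0...
DATA_URL_RE = re.compile(
    r"data:([a-z]+/[a-z0-9.+-]+);base64,([A-Za-z0-9+/=]+)",
    re.IGNORECASE,
)


def data_url_chars_by_type(html: str) -> Counter:
    """Sum the character length of all Base64 data URLs, grouped by MIME type."""
    totals = Counter()
    for match in DATA_URL_RE.finditer(html):
        totals[match.group(1).lower()] += len(match.group(0))
    return totals


# Hypothetical sample: one inline image and one CSS background image.
sample_html = (
    '<img src="data:image/png;base64,AAAA">'
    "<style>.a{background:url(data:image/svg+xml;base64,BBBBBBBB)}</style>"
)

totals = data_url_chars_by_type(sample_html)
for mime_type, chars in totals.most_common():
    print(f"{mime_type}: {chars} characters")
```

Because the scan works on the raw text, it also finds data URLs inside inline stylesheets and inside JSON in script tags, as long as the characters of the URL are not escaped.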

    Here’s an example of a style tag that contains 201 rules with embedded background images.

    Inline CSS

    Large inline CSS is usually due to images. However, long selectors from deeply nested CSS also contribute to CSS and HTML size.

    In the example below, the HTML contains 20 inline style tags with similar content (variations like “header”, “header-mobile” and “header-desktop”). Most selectors are over 200 characters long, and as a result 47% of the overall stylesheet content consists of selectors instead of style declarations.

    However, the HTML compresses well due to repetition within the selectors, and the size goes from 20.5 megabytes to only 2.3 megabytes after GZIP compression.
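The effect of repetition on compression is easy to reproduce. The following sketch builds a synthetic stylesheet with long, deeply nested selectors (the class names and rule count are invented, not taken from the site above), measures what share of the text is selectors, and compares the size before and after GZIP using Python's standard `gzip` module.

```python
import gzip

# Hypothetical deeply nested selector prefix, repeated in every rule.
nesting = " ".join(f".page-section-{i} > .wrapper" for i in range(8))
variants = ("header", "header-mobile", "header-desktop")

# (selector, declarations) pairs: 500 rules for each variant.
rules = [
    (f"{nesting} .{variant}-{n}", "{ margin: 0; padding: 0; }")
    for n in range(500)
    for variant in variants
]

css = "\n".join(f"{selector} {declarations}" for selector, declarations in rules)
css_bytes = css.encode("utf-8")

# Share of the stylesheet text that is selectors rather than declarations.
selector_share = sum(len(selector) for selector, _ in rules) / len(css)

compressed = gzip.compress(css_bytes)
print(f"selectors: {selector_share:.0%} of the stylesheet")
print(f"raw: {len(css_bytes)} bytes, gzip: {len(compressed)} bytes")
```

The synthetic stylesheet is far more uniform than real CSS, so it compresses better than a production page would. The point is only that repeated selector prefixes cost much less on the wire than in the uncompressed document.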

    Embedded fonts

    Like images, fonts are also sometimes encoded as Base64. For one or two small fonts this can actually work well, as text can render with the proper font right away.

    However, when many fonts are embedded, it means visitors have to wait for these fonts to finish downloading before page content can render.

    Client-side application state

    Many modern websites are built as JavaScript applications. It would be slow to only show content after all JavaScript and required data have loaded, so during the initial page load the HTML is also rendered on the server.

    Once the client-side application code has loaded, the static HTML is “hydrated”: the page content is made interactive with JavaScript, and client-side code takes control of future content updates.

    Normally client-side code makes fetch requests to API endpoints on the backend to load in required data. But, since the initial client-side render requires the same data as the server-side rendering process, servers embed the hydration state in the final HTML. Then, the client-side hydration can take place right after loading all JavaScript, without making any additional API requests.

    As you can guess, this hydration state can be big! You can identify it based on script tags that reference framework-specific keywords like this:

    • Next.js: self.__next_f.push or __NEXT_DATA__
    • Nuxt: __NUXT_DATA__
    • Redux: __PRELOADED_STATE__
    • Apollo: __APOLLO_STATE__
    • Angular: ng-state or similar
    • __INITIAL_STATE__ or __INITIAL_DATA__ in many custom setups
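The keywords in this list can be turned into a quick check. This is a minimal sketch that does plain substring matching on the HTML text. The marker strings come from the list above, while the function name, the "Custom setup" label and the sample document are assumptions for illustration. Substring matching can produce false positives, for example when a page merely mentions one of these names in its content.

```python
# Marker strings taken from the list above, grouped by framework.
HYDRATION_MARKERS = {
    "Next.js": ("self.__next_f.push", "__NEXT_DATA__"),
    "Nuxt": ("__NUXT_DATA__",),
    "Redux": ("__PRELOADED_STATE__",),
    "Apollo": ("__APOLLO_STATE__",),
    "Angular": ("ng-state",),
    "Custom setup": ("__INITIAL_STATE__", "__INITIAL_DATA__"),
}


def detect_hydration_state(html: str) -> list[str]:
    """Return the frameworks whose hydration markers appear in the HTML."""
    return [
        framework
        for framework, markers in HYDRATION_MARKERS.items()
        if any(marker in html for marker in markers)
    ]


# Hypothetical document with a Next.js style data script.
sample_html = (
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {}}</script>'
)
print(detect_hydration_state(sample_html))
```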

    In a local development environment with little data the size of the hydration state might not be noticeable. But as more data is added to the production database, the hydration state also grows. For example, a list of hotels references 3,561 different images (which, thankfully, are not embedded as Base64 😅).

    If you pass Base64 images into your front-end components, they will also end up in the hydration state.

    This website has 42 images embedded within the JSON data inside of the HTML document. The biggest image has a size of 2.5 megabytes.

    There’s a surprising amount of nesting going on. In the previous example we have images in JSON in a script in the HTML.

    But we can go deeper than that! Let’s dive into our next example:

    After digging into the hydration state, we find 52 products with a judgmeWidget property. The value of this property is itself an HTML fragment!

    Let’s put one of those values into the HTML Size Analyzer. Once again, most of the HTML is actually embedded JSON code, this time in the form of a data-json attribute on a div!

    And what’s the name of the biggest property in that JSON? body_html 😂😂😂
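If you want to do this kind of digging by hand, one approach is to parse the hydration JSON and list the longest string values together with their paths. The sketch below assumes you have already extracted the JSON text from the script tag. The function name and the sample state are invented, with the property names `judgmeWidget` and `body_html` borrowed from the example above. Lengths are counted in characters, which only approximates bytes.

```python
import json


def largest_strings(value, top=3):
    """Return (length, path) for the longest string values in parsed JSON."""
    found = []

    def walk(node, path):
        if isinstance(node, str):
            found.append((len(node), path))
        elif isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}")
        elif isinstance(node, list):
            for index, child in enumerate(node):
                walk(child, f"{path}[{index}]")

    walk(value, "$")
    return sorted(found, reverse=True)[:top]


# Hypothetical hydration state: one property holds an HTML fragment.
state_json = json.dumps({
    "products": [
        {
            "title": "Mug",
            "judgmeWidget": "<div data-json='{}'>" + "x" * 500 + "</div>",
        },
        {"title": "Plate", "body_html": "<p>" + "y" * 100 + "</p>"},
    ]
})

for length, path in largest_strings(json.loads(state_json)):
    print(f"{length:>6} characters at {path}")
```

A large value that starts with `<` or `data:` is a hint that another document or file is nested inside, and is worth inspecting on its own.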

    Other causes of large HTML

    A few more examples I’ve seen during my research:

    • A 4-megabyte inline script
    • Unexpected metadata from Figma
    • A megamenu with over 7,000 items and 1,300 inline SVGs
    • Responsive images with 180 supported sizes

    Some large websites still don’t apply GZIP or Brotli compression to their HTML. So even when there’s not a lot of code, you still get a large transfer size.

    Seeing a 53 kilobyte NREUM script is also always frustrating: many websites embed New Relic’s end user monitoring script directly into the document. If you measure user experience you really want to avoid that performance impact!

    How does HTML size impact page speed?

    HTML code needs to be downloaded and parsed as part of the page load process. The more time this takes, the longer visitors have to wait for content to show up.

    Browsers also assign a high priority to HTML content, assuming all of it is essential page content. That can mean that non-critical hydration state is downloaded before render-blocking stylesheets and JavaScript files are loaded.

    You can see an example of that in this request waterfall from the DebugBear website speed test. While the browser knows about the other files early on, all bandwidth is instead consumed by the document.

    Embedding images or fonts in the HTML also means that these files can’t be cached and re-used across pages. Instead they need to be redownloaded for every page load on the website.

    Is time spent parsing HTML also a concern? On my MacBook it takes about 6 milliseconds to parse one megabyte of HTML code. In contrast, the low-end phone I use for testing takes about 80 milliseconds per megabyte. So for very large documents, CPU processing starts becoming a factor worth thinking about.
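To get a feel for what these per-megabyte figures mean, here is a small back-of-the-envelope calculation. It assumes parse time scales linearly with document size, which is a simplification, and it uses the 20.5 megabyte document from the inline CSS example as the largest case. The other sizes are arbitrary.

```python
# Figures quoted above: milliseconds needed to parse one megabyte of HTML.
MS_PER_MB_LAPTOP = 6
MS_PER_MB_LOW_END_PHONE = 80


def parse_time_ms(html_megabytes: float, ms_per_megabyte: float) -> float:
    """Estimate parse time, assuming it grows linearly with size."""
    return html_megabytes * ms_per_megabyte


for size_mb in (0.1, 2.5, 20.5):
    laptop = parse_time_ms(size_mb, MS_PER_MB_LAPTOP)
    phone = parse_time_ms(size_mb, MS_PER_MB_LOW_END_PHONE)
    print(f"{size_mb:>5} MB: laptop ~{laptop:.1f} ms, phone ~{phone:.1f} ms")
```

Under that linear assumption, a 20.5 megabyte document costs about 123 milliseconds on the laptop but about 1,640 milliseconds on the low-end phone.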

    Websites with large HTML can still be fast

    As you can tell, I might have a bit of an obsession with HTML size. But is it really a problem for many real visitors?

    I don’t want to make large HTML files out to be a bigger issue than they really are. Most visitors coming to your website today probably have reasonably fast connections and devices. Other web performance problems tend to be more pressing. (Like actually running the JavaScript application code that’s using the hydration state.)

    Pages also don’t need to download the full HTML document before they can start rendering. Here you can see that the document and important stylesheets are loaded in parallel. As a result, the main content renders before the document is fully loaded.

    The real visitor data from Google’s Chrome User Experience Report (CrUX) shows that this website typically renders in under 2 seconds. And that’s on a mobile device!

    Still, the large document is definitely slowing the page down. One indicator of that is that the Largest Contentful Paint (LCP) image does not show up right away after loading. Instead, CrUX reports 584 milliseconds of render delay.

    This tells us that the render-blocking stylesheet, which competes with other resources on the main website server, is loading more slowly than images from a different server.

    It’s worth taking a quick look at your website’s HTML to check what it actually contains. Often there are quick high-impact fixes you can make.

    When images are inlined in HTML or CSS code it’s often intended to be a performance optimization. But a good setup can make it too easy to add more images later on without ever looking at the file being embedded. Consider adding guardrails to your CI build to catch unintended jumps in file size.
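One possible shape for such a guardrail is a small script that runs after the build and fails when any generated HTML file exceeds a size budget. Everything specific here is an assumption for illustration: the `dist` build directory, the 500,000 byte default budget and the function names are placeholders to adapt to your own setup.

```python
from pathlib import Path


def oversized_html_files(build_dir: Path, max_bytes: int) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for each .html file over budget."""
    offenders = []
    for html_file in sorted(build_dir.rglob("*.html")):
        size = html_file.stat().st_size
        if size > max_bytes:
            offenders.append((html_file.relative_to(build_dir).as_posix(), size))
    return offenders


def main(argv: list[str]) -> int:
    """Return exit code 1 if any HTML file exceeds the budget, else 0."""
    # Hypothetical defaults: build output in ./dist, budget of 500,000 bytes.
    build_dir = Path(argv[1]) if len(argv) > 1 else Path("dist")
    max_bytes = int(argv[2]) if len(argv) > 2 else 500_000

    offenders = oversized_html_files(build_dir, max_bytes)
    for name, size in offenders:
        print(f"{name}: {size} bytes exceeds budget of {max_bytes} bytes")
    return 1 if offenders else 0
```

To use it in a pipeline, call `raise SystemExit(main(sys.argv))` from a script entry point (after `import sys`) so that a non-zero exit code fails the build step. Checking uncompressed size is deliberate here, since that is what grows when an image is inlined by accident.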
