    I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours

    By TechAiVerse, December 17, 2025

    15th December 2025

    I wrote about JustHTML yesterday—Emil Stenström’s project to build a new standards-compliant HTML5 parser in pure Python code using coding agents running against the comprehensive html5lib-tests testing library. Last night, purely out of curiosity, I decided to try porting JustHTML from Python to JavaScript with the least amount of effort possible, using Codex CLI and GPT-5.2. It worked beyond my expectations.

    TL;DR

    I built simonw/justjshtml, a dependency-free HTML5 parsing library in JavaScript which passes 9,200 tests from the html5lib-tests suite and imitates the API design of Emil’s JustHTML library.

    It took two initial prompts and a few tiny follow-ups. GPT-5.2 running in Codex CLI ran uninterrupted for several hours, burned through 1,464,295 input tokens, 97,122,176 cached input tokens and 625,563 output tokens, and ended up producing 9,000 lines of fully tested JavaScript across 43 commits.

    Time elapsed from project idea to finished library: about 4 hours, during which I also bought and decorated a Christmas tree with family and watched the latest Knives Out movie.

    Some background

    One of the most important contributions of the HTML5 specification ten years ago was the way it precisely specified how invalid HTML should be parsed. The world is full of invalid documents, and having a specification that covers them means browsers can all treat them the same way; there’s no more “undefined behavior” to worry about when building parsing software.

    Unsurprisingly, those invalid parsing rules are pretty complex! The free online book Idiosyncrasies of the HTML parser by Simon Pieters is an excellent deep dive into this topic, in particular Chapter 3, “The HTML parser”.

    The Python html5lib project started the html5lib-tests repository with a set of implementation-independent tests. These have since become the gold standard for interoperability testing of HTML5 parsers, and are used by projects such as Servo, which used them to help build html5ever, a “high-performance browser-grade HTML5 parser” written in Rust.
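The html5lib-tests tree-construction files are plain text: each test starts with a `#data` section holding the input markup, followed by `#errors`, optional sections such as `#document-fragment` or `#script-on`, and a `#document` section with the expected tree, with a blank line between tests. Below is a minimal sketch of a reader for that layout. It is not code from JustHTML or justjshtml, it assumes `\n` line endings, and the names `parseDatFile`, `scriptOn` and `fragmentContext` are hypothetical.

```javascript
// Section headers that can appear in an html5lib-tests tree-construction
// .dat file. A test starts at "#data" and usually ends with "#document".
const SECTION_HEADERS = new Set([
  "#data",
  "#errors",
  "#new-errors",
  "#document-fragment",
  "#script-on",
  "#script-off",
  "#document",
]);

// Split the text of a .dat file into test-case objects (sketch only).
function parseDatFile(text) {
  const rawTests = [];
  let current = null;
  let section = null;

  for (const line of text.split("\n")) {
    if (line === "#data") {
      // "#data" always begins a new test case.
      current = {};
      rawTests.push(current);
    }
    if (current !== null && SECTION_HEADERS.has(line)) {
      section = line.slice(1); // drop the leading "#"
      current[section] = [];
    } else if (current !== null && section !== null) {
      current[section].push(line);
    }
  }

  return rawTests.map((t) => {
    // The blank line separating tests lands at the end of the "#document"
    // section, so trim trailing blank lines there.
    const doc = t.document ?? [];
    while (doc.length > 0 && doc[doc.length - 1] === "") doc.pop();
    return {
      data: (t.data ?? []).join("\n"),
      errors: t.errors ?? [],
      fragmentContext: t["document-fragment"]?.[0] ?? null,
      scriptOn: "script-on" in t, // "#script-on" is a bare flag section
      document: doc.join("\n"),
    };
  });
}
```

A runner for a parser without script support would typically skip any test whose `scriptOn` flag is set.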

    Emil Stenström’s JustHTML project is a pure-Python implementation of an HTML5 parser that passes the full html5lib-tests suite. Emil spent a couple of months working on this as a side project, deliberately picking a problem with a comprehensive existing test suite to see how far he could get with coding agents.

    At one point he had the agents rewrite it based on a close inspection of the Rust html5ever library. I don’t know how much of this was direct translation versus inspiration (here’s Emil’s commentary on that)—his project has 1,215 commits total so it appears to have included a huge amount of iteration, not just a straight port.

    My project is a straight port. I instructed Codex CLI to build a JavaScript version of Emil’s Python code.

    The process in detail

    I started with a bit of mise en place. I checked out two repos and created an empty third directory for the new project:

    cd ~/dev
    git clone https://github.com/EmilStenstrom/justhtml
    git clone https://github.com/html5lib/html5lib-tests
    mkdir justjshtml
    cd justjshtml

    Then I started Codex CLI for GPT-5.2 like this:

    codex --yolo -m gpt-5.2

    That --yolo flag is a shortcut for --dangerously-bypass-approvals-and-sandbox, which is every bit as dangerous as it sounds.

    My first prompt told Codex to inspect the existing code and use it to build a specification for the new JavaScript library:

    We are going to create a JavaScript port of ~/dev/justhtml - an HTML parsing library that passes the full ~/dev/html5lib-tests test suite. It is going to have a similar API to the Python library but in JavaScript. It will have no dependencies other than raw JavaScript, hence it will work great in the browser and node.js and other environments. Start by reading ~/dev/justhtml and designing the user-facing API for the new library - create a spec.md containing your plan.

    I reviewed the spec, which included a set of proposed milestones, and told it to add another:

    Add an early step to the roadmap that involves an initial version that parses a simple example document that is valid and returns the right results. Then add and commit the spec.md file.

    Here’s the resulting spec.md file. My request for that initial version became “Milestone 0.5” which looked like this:

    Milestone 0.5 — End-to-end smoke parse (single valid document)

    • Implement the smallest end-to-end slice so the public API is real early:
      • new JustHTML("…Hello…") returns a tree with the expected tag structure and text nodes.

      • doc.toText() returns "Hello" and doc.errors is empty for this valid input.
    • Add scripts/smoke.js (no deps) that runs the example and asserts the expected structure/output.
    • Gate: node scripts/smoke.js passes.
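A gate like `node scripts/smoke.js` only has to turn those expectations into an exit code. Here is a hedged sketch of the check itself, not the script Codex generated. It assumes `doc.errors` is an array, leaves out the tag-structure assertion, and takes the parsed document as a parameter so the check can run without the library; `smokeCheck` is a hypothetical name.

```javascript
// Hypothetical check in the spirit of Milestone 0.5. The real script would
// import the library (the path is an assumption) and pass in a parsed
// document; here any object with toText() and an errors array will do.
function smokeCheck(doc) {
  const problems = [];

  const text = doc.toText();
  if (text !== "Hello") {
    problems.push(`toText() returned ${JSON.stringify(text)}, expected "Hello"`);
  }

  // Valid input should produce no parse errors (assumes errors is an array).
  if (!Array.isArray(doc.errors) || doc.errors.length !== 0) {
    problems.push(`expected no parse errors, got ${JSON.stringify(doc.errors)}`);
  }

  return problems;
}

// Sketch of use inside a Node script:
//   const problems = smokeCheck(new JustHTML(...));
//   if (problems.length > 0) {
//     console.error(problems.join("\n"));
//     process.exit(1);
//   }
```

Returning a list of problems instead of throwing on the first one means a single run reports every failed expectation.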

    Then I told it:

    Implement Milestone 0.5

    And off it went. The resulting code appeared to work, so I said:

    Configure GitHub Actions test.yml to run that on every commit, then commit and push

    At this point I ran git init, created a private GitHub repository for the project, and set up the local directory to push to that remote.

    Here’s that initial push. Then I told it:

    OK do the rest, commit and push often

    And that was almost it!

    I set my laptop not to fall asleep and left it to its own devices while we went off to buy a Christmas tree. The “commit and push often” instruction meant I could monitor its progress on my phone by refreshing the commit log on GitHub.

    I was running this against my $20/month ChatGPT Plus account, which has a five-hour token allowance window for Codex CLI. That ran out at 6:35pm and Codex paused, so I waited until the reset point at 7:14pm and typed:

    Keep going

    At 9:30pm it declared itself done with the following summary message:

    • What’s Done
    
      - html5lib-tests now run in CI and pass: tokenizer 6810/6810, tree 1770/1782
        (12 skipped: #script-on), serializer 230/230, encoding 82/83 (1 skipped:
        scripted) (.github/workflows/test.yml:1).
      - Public API expanded: stream() (src/stream.js:1), CSS selectors query()/
        matches() (src/selector.js:1), and toMarkdown() (src/markdown.js:1), all
        exported via src/index.js:1.
      - Added no-deps unit scripts for selectors/stream/markdown (scripts/run-
        selector-tests.js:1, scripts/run-stream-tests.js:1, scripts/run-markdown-
        tests.js:1) and wired them into CI.
      - Fixed TreeBuilder so … no longer closes … (src/treebuilder.js:647).
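The tree-construction count in that summary comes from comparing each parsed document with the expected tree in the test file. html5lib-tests writes that tree one node per line, prefixed with `| ` and indented two spaces per level, with attributes sorted by name and listed one level below their element. A sketch of the serialization step follows. The plain-object node shape is an assumption for illustration, not justjshtml's real node API, and namespaced elements, doctypes and `<template>` contents are deliberately left out.

```javascript
// Serialize a tree into the "| "-prefixed layout of the #document section.
// Hypothetical node shapes:
//   { type: "document", children }
//   { type: "element", name, attrs, children }
//   { type: "text", value }   { type: "comment", value }
function toTestFormat(node, depth = 0, out = []) {
  const pad = "| " + "  ".repeat(depth);
  switch (node.type) {
    case "document":
      // The document's children sit at depth 0.
      for (const child of node.children) toTestFormat(child, depth, out);
      break;
    case "element": {
      out.push(`${pad}<${node.name}>`);
      // Attributes go one level deeper than their element, sorted by name.
      const attrPad = "| " + "  ".repeat(depth + 1);
      for (const key of Object.keys(node.attrs ?? {}).sort()) {
        out.push(`${attrPad}${key}="${node.attrs[key]}"`);
      }
      for (const child of node.children ?? []) {
        toTestFormat(child, depth + 1, out);
      }
      break;
    }
    case "text":
      out.push(`${pad}"${node.value}"`);
      break;
    case "comment":
      out.push(`${pad}<!-- ${node.value} -->`);
      break;
    default:
      throw new Error(`unsupported node type: ${node.type}`);
  }
  return out;
}
```

A test passes when `toTestFormat(root).join("\n")` exactly equals the test's expected `#document` text; a single misplaced node or stray whitespace counts as a failure, which is what makes the suite such a precise target.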

    As a finishing touch, I had it add a playground interface so I could try out the new library in my browser. I prompted:

    Add a playground.html in the top level folder which loads the necessary ES modules from ./src/... and implements the exact same functionality as seen on https://tools.simonwillison.net/justhtml but using the JavaScript library instead of Pyodide

    It fetched my existing JustHTML playground page (described here) using curl and built a new playground.html file that loaded the new JavaScript code instead. This worked perfectly.

    I enabled GitHub Pages for my still-private repo which meant I could access the new playground at this URL:

    https://simonw.github.io/justjshtml/playground.html

    Screenshot: the justjshtml playground in its “To Markdown” mode, converting an example page into Markdown as “Example Page” followed by “- [Home](/)”, “- [About](/about)” and “- [Contact](/contact)”.

      All it needed now was some documentation:

      Add a comprehensive README with full usage instructions including attribution plus how this was built plus how to use in in HTML plus how to use it in Node.js

      You can read the result here.

      We are now at eight prompts total and just over four hours of running time, during which I’ve decorated for Christmas and watched Wake Up Dead Man on Netflix.

      According to Codex CLI:

      Token usage: total=2,089,858 input=1,464,295 (+ 97,122,176 cached) output=625,563 (reasoning 437,010)

      My llm-prices.com calculator estimates that at $29.41 if I had been paying for those tokens at API prices, but they were included in my $20/month ChatGPT Plus subscription, so the actual extra cost to me was zero.
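The figures in that log line can be cross-checked: the reported total is the uncached input plus the output (1,464,295 + 625,563 = 2,089,858), with cached input listed separately and reasoning tokens apparently counted inside the output figure. A small sketch of that arithmetic follows. The per-million-token prices are left as parameters because the calculator's exact rates are not given here, and `totalTokens`/`estimateCost` are hypothetical helpers, not part of llm-prices.com.

```javascript
// Token counts as reported by Codex CLI.
const usage = {
  input: 1_464_295,
  cachedInput: 97_122_176,
  output: 625_563,
  reasoning: 437_010, // appears to be a subset of output, not an extra charge
};

// Codex's "total" matches uncached input plus output.
function totalTokens(u) {
  return u.input + u.output;
}

// Dollar cost given prices per million tokens (caller-supplied, not real
// GPT-5.2 API prices).
function estimateCost(u, rates) {
  const perToken = (pricePerMillion) => pricePerMillion / 1_000_000;
  return (
    u.input * perToken(rates.input) +
    u.cachedInput * perToken(rates.cachedInput) +
    u.output * perToken(rates.output)
  );
}
```

Cached input outnumbers uncached input roughly 66 to 1 here, so as long as cached tokens cost more than about 1/66 of the normal input price, they contribute more to the bill than uncached input does.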

      What can we learn from this?

      I’m sharing this project because I think it demonstrates a bunch of interesting things about the state of LLMs in December 2025.

      • Frontier LLMs really can perform complex, multi-hour tasks with hundreds of tool calls and minimal supervision. I used GPT-5.2 for this but I have no reason to believe that Claude Opus 4.5 or Gemini 3 Pro would not be able to achieve the same thing—the only reason I haven’t tried is that I don’t want to burn another 4 hours of time and several million tokens on more runs.
      • If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed. I called this designing the agentic loop a few months ago. I think it’s the key skill to unlocking the potential of LLMs for complex tasks.
      • Porting entire open source libraries from one language to another via a coding agent works extremely well.
      • Code is so cheap it’s practically free. Code that works continues to carry a cost, but that cost has plummeted now that coding agents can check their work as they go.
      • We haven’t even begun to unpack the etiquette and ethics around this style of development. Is it responsible and appropriate to churn out a direct port of a library like this in a few hours while watching a movie? What would it take for code built like this to be trusted in production?
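Reducing a problem to a test suite works for an agent because the suite gives a cheap, unambiguous signal after every change. Below is a hedged sketch of the core of such a runner, reporting in the `passed/total (N skipped)` style of the Codex summary: skipped cases still count toward the total, and a crash counts as a failure rather than stopping the run. The `runSuite` name and its callback parameters are hypothetical.

```javascript
// Run every case, skip the ones the implementation cannot support, and
// collect failures instead of stopping at the first one.
function runSuite(name, cases, runCase, shouldSkip = () => false) {
  let passed = 0;
  let skipped = 0;
  const failures = [];

  for (const item of cases) {
    if (shouldSkip(item)) {
      skipped += 1;
      continue;
    }
    try {
      if (runCase(item)) passed += 1;
      else failures.push({ item, error: null });
    } catch (error) {
      // A crash is recorded as a failure, not an aborted run.
      failures.push({ item, error });
    }
  }

  // Skipped cases stay in the denominator, as in "tree 1770/1782".
  const skipNote = skipped > 0 ? ` (${skipped} skipped)` : "";
  return {
    passed,
    skipped,
    failures,
    summary: `${name} ${passed}/${cases.length}${skipNote}`,
  };
}
```

In a Node script, setting `process.exitCode = 1` whenever `failures.length > 0` turns the result into the pass/fail signal that both CI and a coding agent can act on.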

      I’ll end with some open questions:

      • Does this library represent a legal violation of copyright of either the Rust library or the Python one?
      • Even if this is legal, is it ethical to build a library in this way?
      • Does this format of development hurt the open source ecosystem?
      • Can I even assert copyright over this, given how much of the work was produced by the LLM?
      • Is it responsible to publish software libraries built in this way?
      • How much better would this library be if an expert team hand crafted it over the course of several months?