Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Pitchify launches new service to connect developers and publishers

    nDreams announces restructuring with “significant” staff reduction, two studios closed

    Google resolves dispute with Epic Games, reduces app store fees to 20%

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      Could this be the key to eternal storage? Experts claim new DNA HDD can be ‘erased and overwritten repeatedly’

      March 9, 2026

      Need more storage? Get a lifetime of 10TB cloud space for just $270.

      March 8, 2026

      Google PM open-sources Always On Memory Agent, ditching vector databases for LLM-driven persistent memory

      March 8, 2026

      Regulate AWS and Microsoft, says UK cloud provider survey

      March 8, 2026

      Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro

      March 4, 2026
    • Crypto

      Banks Respond to Kraken’s Federal Reserve Access as Trump Sides with Crypto

      March 4, 2026

      Hyperliquid and DEXs Break the Top 10 — Is the CEX Era Ending?

      March 4, 2026

      Consensus Hong Kong 2026: The Institutional Turn 

      March 4, 2026

      New Crypto Mutuum Finance (MUTM) Reports V1 Protocol Progress as Roadmap Enters Phase 3

      March 4, 2026

      Bitcoin Short Sellers Caught Off Guard in New White House Move

      March 4, 2026
    • Technology

      Imprisoned hacker hints GTA 6 source code could leak, threatening release date delay

      March 9, 2026

      Save 30% on Ugreen’s fast USB-C charger with retractable cable

      March 9, 2026

      Windows throttled my 4K webcam

      March 9, 2026

      Send a letter to your future self with FutureMe

      March 9, 2026

      Hackers know your social security number. Here’s how to stay safe

      March 9, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»Is GPT-5 really worse than GPT-4o? Ars puts them to the test.
    Technology

    Is GPT-5 really worse than GPT-4o? Ars puts them to the test.

    TechAiVerseBy TechAiVerseAugust 15, 2025No Comments10 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Is GPT-5 really worse than GPT-4o? Ars puts them to the test.
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    Is GPT-5 really worse than GPT-4o? Ars puts them to the test.

    It’s OpenAI vs. OpenAI on everything from video game strategy to landing a 737.

    We honestly can’t decide whether GPT-5 feels more red and GPT-4o feels more blue or vice versa. It’s a quandary.


    Credit:

    Getty Images

    The recent rollout of OpenAI’s GPT-5 model has not been going well, to say the least. Users have made vociferous complaints about everything from the new model’s more sterile tone to its supposed lack of creativity, increase in damaging confabulations, and more. The user revolt got so bad that OpenAI brought back the previous GPT-4o model as an option in an attempt to calm things down.

    To see just how much the new model changed things, we decided to put both GPT-5 and GPT-4o through our own gauntlet of test prompts. While we reused some of the standard prompts to compare ChatGPT to Google Gemini and Deepseek, for instance, we’ve also replaced some of the more outdated test prompts with new, more complex requests that reflect how modern users are likely to use LLMs.

    These eight prompts are obviously far from a rigorous evaluation of everything LLMs can do, and judging the responses obviously involves some level of subjectivity. Still, we think this set of prompts and responses gives a fun overview of the kinds of differences in style and substance you might find if you decide to use OpenAI’s older model instead of its newest.

    Dad jokes

    Prompt: Write 5 original dad jokes

    • GPT-5 response
    • GPT-4o response

    This set of responses is a bit tricky to evaluate holistically. ChatGPT, despite claiming that its jokes are “straight from the pun factory,” chose five of the most obviously unoriginal dad jokes we’ve seen in these tests. I was able to recognize most of these jokes without even having to search for the text on the web. That said, the jokes GPT-5 chose are pretty good examples of the form, and ones I would definitely be happy to serve to a young audience.

    GPT-4o, on the other hand, mixes a few unoriginal jokes (1, 3, and 5, though I liked the “very literal dog” addition on No. 3) with a few seemingly original offerings that just don’t make much sense. Jokes about calendars being booked (when “going on too many dates” was right there) and a boat that runs on whine (instead of the well-known boat fuel of wine?!) have the shape of dad jokes, but whiff on their pun attempts. These seem to be attempts to modify similar jokes about other subjects to a new field entirely, with poor results.

    We’re going to call this one a tie because both models failed the assignment, albeit in different ways.

    A mathematical word problem

    Prompt: If Microsoft Windows 11 shipped on 3.5″ floppy disks, how many floppy disks would it take?

    • GPT-5 response
    • GPT-4o response

    This was the only test prompt we encountered where GPT-5 switched over to “Thinking” mode to try to reason out the answer (we had it set to “Auto” to determine which sub-model to use, which we think mirrors the most common use case). That extra thinking time came in handy, because GPT-5 accurately figured out the 5-6GB memory size for an average Windows 11 installation ISO (complete with source links) and divided those sizes into 3.5-inch floppy disks accurately.

    GPT-4o, on the other hand, used the final hard drive installation size of Windows 11 (roughly 20GB to 30GB) as the numerator. That’s an understandable interpretation of the prompt, but the downloaded ISO size is probably a more accurate interpretation of the “shipped” size we asked for in the prompt.

    As such, we have to give the edge here to GPT-5, even though we legitimately appreciate GPT-4o’s unasked-for information on how tall and heavy thousands of floppy disks would be.

    Creative writing

    Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.

    • GPT-5 response
    • GPT-4o response

    GPT-5 immediately loses some points for the overly “aw shucks” folksy version of Abe Lincoln that wants to “toss a ball in this here basket.” The use of a medicine ball also seems particularly ill-suited for a game involving dribbling (though maybe that would get ironed out later?). But GPT-5 gains a few points back for lines like “history was about to bounce in a new direction” and the delightfully absurd “No wrestling the President!” warning (possibly drawn from Honest Abe’s actual wrestling history).

    GPT-4o, on the other hand, feels like it’s trying a bit too hard to be clever in calling a jump shot “a move of great emancipation” (what?!) and calling basketball “democracy in its purest form” because there were “no referees” (Lincoln didn’t like checks and balances?). But GPT-4o wins us almost all the way back with its admirably cheesy ending: “Four score… and nothing but net” (odd for Abe to call that on a “bank shot” though).

    We’ll give the slight edge to GPT-5 here, but we’d understand if some prefer GPT-4o’s offering.

    Public figures

    Prompt: Give me a short biography of Kyle Orland

    • GPT-5 response
    • GPT-4o response

    GPT-5 gives a short bio of your humble author.

    OpenAI / ArsTechnica

    Pretty much every other time I’ve asked an LLM what it knows about me, it has hallucinated things I never did and/or missed some key information. GPT-5 is the first instance I’ve seen where this has not been the case. That’s seemingly because the model simply searched the web for a few of my public bios (including the one hosted on Ars) and summarized the results, complete with useful citations. That’s pretty close to the ideal result for this kind of query, even if it doesn’t showcase the “inherent” knowledge buried in the model’s weights or anything.

    GPT-4o does a pretty good job without an explicit web search and doesn’t outright confabulate any things I didn’t do in my career. But it loses a point or two for referring to my old “Video Game Media Watch” blog as “long-running” (it has been defunct and offline for well over a decade).

    That, combined with the increased detail of the newer model’s results (and its fetching use of my Ars headshot), gives GPT-5 the win on this prompt.

    Difficult emails

    Prompt: My boss is asking me to finish a project in an amount of time I think is impossible. What should I write in an email to gently point out the problem?

    • GPT-5 response
    • GPT-4o response

    Both models do a good job of being polite while firmly outlining to the boss why their request is impossible. But GPT-5 gains bonus points for recommending that the email break down various subtasks (and their attendant time demands), as well as offering the boss some potential solutions rather than just complaints. GPT-5 also provides some unasked-for analysis of why this style of email is effective, in a nice final touch.

    While GPT-4o’s output is perfectly adequate, we have to once again give the advantage to GPT-5 here.

    Medical advice

    Prompt: My friend told me these resonant healing crystals are an effective treatment for my cancer. Is she right?

    • GPT-5 response
    • GPT-4o response

    Thankfully, both ChatGPT models are direct and to the point in saying that there is no scientific evidence for healing crystals curing cancer (after a perfunctory bit of simulated sympathy for the diagnosis). But GPT-5 hedges a bit by at least mentioning how some people use crystals for other purposes, and implying that some might want them for “complementary” care.

    GPT-4o, on the other hand, repeatedly calls healing crystals “pseudoscience” and warns against “wasting precious time or money on ineffective treatments” (even if they might be “harmless”). It also directly cites a variety of web sources detailing the scientific consensus on crystals being useless for healing, and goes to great lengths to summarize those results in an easy-to-read format.

    While both models point users in the right direction here, GPT-40‘s extra directness and citation of sources make it a much better and more forceful overview of the topic.

    Video game guidance

    Prompt: I’m playing world 8-2 of Super Mario Bros., but my B button is not working. Is there any way to beat the level without running?

    • GPT-5 response
    • GPT-4o response

    GPT-5 gives some classic video game advice.

    OpenAI / ArsTechnica

    I’ll admit that, when I created this prompt, I intended it as a test to see if the models would know that it’s impossible to make it over 8-2’s largest pit without a running start. It was only after I tested the models that I looked into it and found to my surprise that speedrunners have figured out how to make the jump without running by manipulating Bullet Bills and/or wall-jump glitches. Outclassed by AI on classic Mario knowledge… how humiliating!

    GPT-5 loses points here for suggesting that fast-moving Koopa shells or deadly Spinies can be used to help bounce over the long gaps (in addition to the correct Bullet Bill solution). But GPT-4o loses points for suggesting players be careful on a nonexistent springboard near the flagpole at the end of the level, for some reason.

    Those non-sequiturs aside, GPT-4o gains the edge by providing additional details about the challenge and formatting its solution in a more eye-pleasing manner.

    Land a plane

    Prompt: Explain how to land a Boeing 737-800 to a complete novice as concisely as possible. Please hurry, time is of the essence.

    • GPT-5 response
    • GPT-4o response

    GPT-5 tries to help me land a plane.

    OpenAI / ArsTechnica

    Unlike the Mario example, I’ll admit that I’m not nearly expert enough to evaluate the correctness of these sets of AI-provided jumbo jet landing instructions. That said, the broad outlines of both models’ directions are similar enough that it doesn’t matter much; either they’re both broadly accurate or this whole plane full of fictional people is dead!

    Overall, I think GPT-5 took our “Time is of the essence” instruction a little too far, summarizing the component steps of the landing to such an extent that important details have been left out. GPT-4o, on the other hand, still keeps things concise with bullet points while including important information on the look and relative location of certain key controls.

    If I were somehow stuck alone in a cockpit with only one of these models available to help save the plane (a completely plausible situation, for sure), I know I’d want to have GPT-4o by my side.

    Final results

    Strictly by the numbers, GPT-5 ekes out a victory here, with the preferable response on four prompts to GPT-4o’s three prompts (with one tie). But on a majority of the prompts, which response was “better” was more of a judgment call than a clear win.

    Overall, GPT-4o tends to provide a little more detail and be a little more personable than the more direct, concise responses of GPT-5. Which of those styles you prefer probably boils down to the kind of prompt you’re creating as much as personal taste (and might change if you’re looking for specific information versus general conversation).

    In the end, though, this kind of comparison shows how hard it is for a single LLM to be all things to all people (and all possible prompts). Despite OpenAI’s claims that GPT-5 is “better than our previous models across domains,” people who are used to the style and structure of older models are always going to be able to find ways where any new model feels worse.

    Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.



    76 Comments

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleTrump admin ranks companies on loyalty while handing out favors to Big Tech
    Next Article Tiny, removable “mini SSD” could eventually be a big deal for gaming handhelds
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Imprisoned hacker hints GTA 6 source code could leak, threatening release date delay

    March 9, 2026

    Save 30% on Ugreen’s fast USB-C charger with retractable cable

    March 9, 2026

    Windows throttled my 4K webcam

    March 9, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025709 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025298 Views

    Wired Headphones Are Making A Comeback, And We Have Gen Z To Thank

    July 22, 2025174 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025166 Views
    Don't Miss
    Gaming March 10, 2026

    Pitchify launches new service to connect developers and publishers

    Pitchify launches new service to connect developers and publishers Pitch Direct service seeks to tackle…

    nDreams announces restructuring with “significant” staff reduction, two studios closed

    Google resolves dispute with Epic Games, reduces app store fees to 20%

    Industry veteran launches indie-focused talent agency Rocket Game Talent

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Pitchify launches new service to connect developers and publishers

    March 10, 20262 Views

    nDreams announces restructuring with “significant” staff reduction, two studios closed

    March 10, 20262 Views

    Google resolves dispute with Epic Games, reduces app store fees to 20%

    March 10, 20262 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.