    Developers Say GPT-5 Is a Mixed Bag

    By TechAiVerse | August 16, 2025 | 7 Mins Read

    Last week, when OpenAI launched GPT-5, it told software engineers the model was designed to be a “true coding collaborator” that excels at generating high-quality code and performing agentic, or automated, software tasks. While the company didn’t say so explicitly, OpenAI appeared to be taking direct aim at Anthropic’s Claude Code, which has quickly become many developers’ favored tool for AI-assisted coding.

    But developers tell WIRED that GPT-5 has been a mixed bag so far. It shines at technical reasoning and planning coding tasks, but some say that Anthropic’s newest Opus and Sonnet reasoning models still produce better code. Depending on which verbosity setting developers use (low, medium, or high), the model can elaborate more, which sometimes leads it to generate unnecessary or redundant lines of code.

    Some software engineers have also criticized how OpenAI evaluated GPT-5’s performance at coding, arguing that the benchmarks it used are misleading. One research firm called a graphic that OpenAI published boasting about GPT-5’s capabilities a “chart crime.”

    GPT-5 does stand out in at least one way: Several people noted that, in comparison to competing models, it is a much more cost-effective option. “GPT-5 is mostly outperformed by other AI models in our tests, but it’s really cheap,” says Sayash Kapoor, a computer science doctoral student and researcher at Princeton University who cowrote the book AI Snake Oil.

    Kapoor says he and his team have been running benchmark tests to evaluate GPT-5’s capabilities since the model was released to the public last week. He notes that the standard test his team uses—measuring how well a language model can write code that will reproduce the results of 45 scientific papers—costs $30 to run with GPT-5 set to medium, or mid-range verbosity. The same test using Anthropic’s Opus 4.1 costs $400. In total, Kapoor says his team has spent around $20,000 testing GPT-5 so far.

    Although GPT-5 is cheap, Kapoor’s tests indicate the model is also less accurate than some of its competitors. Claude’s premium model achieved a 51 percent accuracy rating, measured by how many of the scientific papers it accurately reproduced. The medium version of GPT-5 received a 27 percent accuracy rating. (Kapoor has not yet run the same test using GPT-5 high, so it’s an indirect comparison, given that Opus 4.1 is Anthropic’s most powerful model.)

    OpenAI spokesperson Lindsay McCallum referred WIRED to the company’s blog, which said that OpenAI trained GPT-5 on “real-world coding tasks in collaboration with early testers across startups and enterprises.” The company also highlighted some of its internal accuracy measurements for GPT-5, which showed that the GPT-5 “thinking” model, which does more deliberate reasoning, scored highest on accuracy among all of OpenAI’s models. GPT-5 “main,” however, still fell short of previously released models on OpenAI’s own accuracy scale.

    Anthropic spokesperson Amie Rotherham said in a statement that “performance claims and pricing models often look different once developers start using them in production environments. Since reasoning models can quickly use a lot of tokens while thinking, the industry is moving to a world where price per outcome matters more than price per token.”
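
One way to read the "price per outcome" framing is to divide the cost of a benchmark run by the number of tasks the model actually solved. The sketch below is illustrative only. It uses the figures Kapoor quoted (45 papers, $30 and 27 percent for GPT-5 medium, $400 and 51 percent for Opus 4.1). The function name `cost_per_success` and the cost-per-reproduced-paper metric are assumptions introduced here, not a method attributed to Kapoor or Anthropic.

```python
# Figures quoted in the article; nothing here is new measurement data.
PAPERS = 45  # papers in the reproduction benchmark

runs = {
    "GPT-5 (medium)": {"cost_usd": 30.0, "accuracy": 0.27},
    "Opus 4.1": {"cost_usd": 400.0, "accuracy": 0.51},
}


def cost_per_success(cost_usd: float, accuracy: float, tasks: int = PAPERS) -> float:
    """Hypothetical metric: cost of one full run divided by expected successes."""
    successes = accuracy * tasks  # approximate, since the percentages are rounded
    if successes <= 0:
        raise ValueError("no successful tasks; cost per success is undefined")
    return cost_usd / successes


results = {name: cost_per_success(**run) for name, run in runs.items()}
for name, value in results.items():
    print(f"{name}: ${value:.2f} per reproduced paper")
```

On these rounded figures, GPT-5 medium works out to about $2.47 per reproduced paper and Opus 4.1 to about $17.43, although GPT-5 medium reproduces roughly half as many papers. The comparison is also indirect, because GPT-5 high was not run on the same test.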

    Some developers say they’ve had largely positive experiences with GPT-5 so far. Jenny Wang, an engineer, investor, and creator of the personal styling agent Alta, told WIRED the model appears to be better at completing complex coding tasks in one shot than other models. She compared it to OpenAI’s o3 and 4o, which she uses frequently for code generation and straightforward fixes “like formatting, or if I want to create an API endpoint similar to what I already have,” Wang says.

    In her tests of GPT-5, Wang says she asked the model to generate code for a press page for her company’s website, including specific design elements that would match the rest of the site’s aesthetic. GPT-5 completed the task in one take, whereas in the past, Wang would have had to revise her prompts during the process. There was one significant error, though: “It hallucinated the URLs,” Wang says.

    Another developer, who spoke on the condition of anonymity because their employer didn’t authorize them to speak to the press, says GPT-5 excels at solving deep technical problems.

    The developer’s current hobby project is writing a programmatic network analysis tool, one that would require code isolation for security purposes. “I basically presented my project and some paths I was considering, and GPT-5 took it all in and gave back a few recommendations along with a realistic timeline,” the developer explains. “I’m impressed.”

    A handful of OpenAI’s enterprise partners and customers, including Cursor, Windsurf, and Notion, have publicly vouched for GPT-5’s coding and reasoning skills. (OpenAI included many of these remarks in its own blog post announcing the new model.) Notion also shared on X that it’s “fast, thorough, and handles complex work 15 percent better than other models we’ve tested.”

    But within days of GPT-5’s release, some developers were weighing in online with complaints. Many said that GPT-5’s coding abilities seemed behind the curve for what was supposed to be a state-of-the-art, ultra-capable model from the world’s buzziest AI company.

    “OpenAI’s GPT-5 is very good, but it seems like something that would have been released a year ago,” says Kieran Klassen, a developer who has been building an AI assistant for email inboxes. “Its coding capabilities remind me of Sonnet 3.5,” he adds, referring to an Anthropic model that launched in June 2024.

    Amir Salihefendić, founder of the startup company Doist, said in a social media post that he’s been using GPT-5 in Cursor and has found it “pretty underwhelming” and that “it’s especially bad at coding.” He said the release of GPT-5 felt like a “Llama 4 moment,” referring to Meta’s AI model, which had also disappointed some people in the AI community.

    On X, developer Mckay Wrigley wrote that GPT-5 is a “phenomenal everyday chat model,” but when it comes to coding, “I will still be using Claude Code + Opus.”

    Other developers describe GPT-5 as “exhaustive”—at times helpful, but often irritating in its long-windedness. Wang, who was pleased overall with the frontend coding project she assigned to GPT-5, says that she did notice that the model was “more redundant. It clearly could have come up with a cleaner or shorter solution.” (Kapoor points out that the verbosity of GPT-5 can be adjusted, so that users can ask it to be less chatty or even do less reasoning in exchange for better performance or cheaper pricing.)

    Itamar Friedman, the cofounder and CEO of the AI-coding platform Qodo, believes that some of the critiques of GPT-5 stem from evolving expectations around AI model releases. “I think a lot of people thought that GPT-5 would be another moment when everything about AI improved, because of this march towards AGI. When actually, the model improved on a few key sub-tasks,” he says.

    Friedman refers to before 2022 as “BCE”—Before ChatGPT Era—when AI models improved holistically. In the post-ChatGPT era, new AI models are often better at certain things. “Claude Sonnet 3.5, for example, was the one model to rule them all on coding. And Google Gemini got really good at code review, to check if code is high quality,” Friedman says.

    OpenAI has also gotten some heat for the methodology it used to run its benchmark tests and make performance claims about GPT-5—although benchmark tests vary considerably across the industry. SemiAnalysis, a research firm focused on the semiconductor and AI sector, noted that OpenAI only ran 477 out of the 500 tests that are typically included in SWE-bench, a relatively new AI industry framework for testing large language models. (This was for overall performance of the model, not just coding.)

    OpenAI says that it always tests its AI models on a fixed subset of 477 tasks rather than the full 500 in the SWE-bench test, because those 477 tests are the ones the company has validated on its internal infrastructure. McCallum also pointed to GPT-5’s system card, which noted that changes in the model’s verbosity setting can “lead to variation in eval performance.”
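
To see how much a 23-task gap can matter, one can bound what a score on the 477-task subset would imply for the full 500 tasks. The sketch below assumes the two extreme cases for the skipped tasks (all failed or all solved). The function name `full_set_bounds` and the 70 percent input are hypothetical and are not figures reported by OpenAI or SemiAnalysis.

```python
def full_set_bounds(
    subset_score: float, subset_size: int = 477, full_size: int = 500
) -> tuple[float, float]:
    """Return (lower, upper) bounds on the full-set score given a subset score."""
    solved = subset_score * subset_size   # tasks solved within the subset
    skipped = full_size - subset_size     # tasks that were never run
    lower = solved / full_size            # worst case: every skipped task fails
    upper = (solved + skipped) / full_size  # best case: every skipped task passes
    return lower, upper


# 0.70 is a made-up subset score used only to illustrate the arithmetic.
low, high = full_set_bounds(0.70)
print(f"full-set score between {low:.1%} and {high:.1%}")
```

For a hypothetical 70 percent subset score, the full-set score would land between about 66.8 and 71.4 percent, so the 23 omitted tasks can shift a headline number by a few points in either direction.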

    Kapoor says that frontier AI companies are ultimately facing difficult trade-offs. “When model developers train new models, they’re introducing new constraints, too, and have to consider many factors: how users expect the AI to behave and how it performs at certain tasks like agentic coding, all while managing the cost,” he says. “In some sense, I believe OpenAI knew it wouldn’t break all of those benchmarks, so it made something that would generally please a wide range of people.”
