Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Get a Samsung OLED gaming monitor for just $350

    Qualcomm Snapdragon X2 Elite tops the Apple M5 in new test video

    Tapo’s 1440p Wi-Fi security cam is 42% off! Grab it now for $70

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026

      To avoid accusations of AI cheating, college students are turning to AI

      January 29, 2026

      ChatGPT can embrace authoritarian ideas after just one prompt, researchers say

      January 24, 2026
    • Business

      New VoidLink malware framework targets Linux cloud servers

      January 14, 2026

      Nvidia Rubin’s rack-scale encryption signals a turning point for enterprise AI security

      January 13, 2026

      How KPMG is redefining the future of SAP consulting on a global scale

      January 10, 2026

      Top 10 cloud computing stories of 2025

      December 22, 2025

      Saudia Arabia’s STC commits to five-year network upgrade programme with Ericsson

      December 18, 2025
    • Crypto

      Bernstein Discusses Bitcoin’s Weakest Bear Market Yet – “Nothing Broke”

      February 9, 2026

      Ethereum Price Hits Breakdown Target — But Is a Bigger Drop to $1,000 Coming?

      February 9, 2026

      Damex Secures MiCA CASP Licence, Establishing Its Position as a Tier-1 Digital Asset Institution in Europe

      February 9, 2026

      Bitget and BlockSec Introduce the UEX Security Standard, Setting a New Benchmark for Universal Exchanges

      February 9, 2026

      3 Meme Coins To Watch In The Second Week Of February 2026

      February 9, 2026
    • Technology

      Get a Samsung OLED gaming monitor for just $350

      February 10, 2026

      Qualcomm Snapdragon X2 Elite tops the Apple M5 in new test video

      February 10, 2026

      Tapo’s 1440p Wi-Fi security cam is 42% off! Grab it now for $70

      February 10, 2026

      This 8BitDo Retro wireless ‘mecha’ keyboard is just $63 today

      February 10, 2026

      Star power, AI jabs and Free Bird: Digiday’s guide to what was in and out at the Super Bowl

      February 10, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI’s new reasoning AI models hallucinate more
    Technology

    OpenAI’s new reasoning AI models hallucinate more

    TechAiVerseBy TechAiVerseApril 20, 2025No Comments4 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI’s new reasoning AI models hallucinate more
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    OpenAI’s new reasoning AI models hallucinate more

    OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.

    Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.

    According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.

    Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.

    In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they’re often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.

    OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.

    Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.

    “Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email to TechCrunch.

    Sarah Schwettmann, co-founder of Transluce, added that o3’s hallucination rate may make it less useful than it otherwise would be.

    Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in their coding workflows, and that they’ve found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links. The model will supply a link that, when clicked, doesn’t work.

    Hallucinations may help models arrive at interesting ideas and be creative in their “thinking,” but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm likely wouldn’t be pleased with a model that inserts lots of factual errors into client contracts.

    One promising approach to boosting the accuracy of models is giving them web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, another one of OpenAI’s accuracy benchmarks. Potentially, search could improve reasoning models’ hallucination rates, as well — at least in cases where users are willing to expose prompts to a third-party search provider.

    If scaling up reasoning models indeed continues to worsen hallucinations, it’ll make the hunt for a solution all the more urgent.

    “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

    In the last year, the broader AI industry has pivoted to focus on reasoning models after techniques to improve traditional AI models started showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing and data during training. Yet it seems reasoning also may lead to more hallucinating — presenting a challenge.

    Maxwell Zeff is a senior reporter at TechCrunch specializing in AI and emerging technologies. Previously with Gizmodo, Bloomberg, and MSNBC, Zeff has covered the rise of AI and the Silicon Valley Bank crisis. He is based in San Francisco. When not reporting, he can be found hiking, biking, and exploring the Bay Area’s food scene.

    View Bio

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleWe just got a big hint that the Samsung Galaxy Z Fold 7 and Galaxy Z Flip 7 are on schedule
    Next Article Week in Review: Google loses a major antitrust case
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Get a Samsung OLED gaming monitor for just $350

    February 10, 2026

    Qualcomm Snapdragon X2 Elite tops the Apple M5 in new test video

    February 10, 2026

    Tapo’s 1440p Wi-Fi security cam is 42% off! Grab it now for $70

    February 10, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025660 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025250 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025148 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025111 Views
    Don't Miss
    Technology February 10, 2026

    Get a Samsung OLED gaming monitor for just $350

    Get a Samsung OLED gaming monitor for just $350 Image: Samsung To paraphrase a certain…

    Qualcomm Snapdragon X2 Elite tops the Apple M5 in new test video

    Tapo’s 1440p Wi-Fi security cam is 42% off! Grab it now for $70

    This 8BitDo Retro wireless ‘mecha’ keyboard is just $63 today

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Get a Samsung OLED gaming monitor for just $350

    February 10, 20263 Views

    Qualcomm Snapdragon X2 Elite tops the Apple M5 in new test video

    February 10, 20263 Views

    Tapo’s 1440p Wi-Fi security cam is 42% off! Grab it now for $70

    February 10, 20264 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    This new Roomba finally solves the big problem I have with robot vacuums

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.