Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Acer Expands Predator Lineup with AI Laptop, Gaming Desktops, Ultra-Fast Monitor and Keyboard

    IFA 2025: Acer unveils world’s lightest 16-inch laptop

    Samsung Malaysia introduces Galaxy A07 and Galaxy A17 5G

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Blue-collar jobs are gaining popularity as AI threatens office work

      August 17, 2025

      Man who asked ChatGPT about cutting out salt from his diet was hospitalized with hallucinations

      August 15, 2025

      What happens when chatbots shape your reality? Concerns are growing online

      August 14, 2025

      Scientists want to prevent AI from going rogue by teaching it to be bad first

      August 8, 2025

      AI models may be accidentally (and secretly) learning each other’s bad behaviors

      July 30, 2025
    • Business

      Cloudflare hit by data breach in Salesloft Drift supply chain attack

      September 2, 2025

      Cloudflare blocks largest recorded DDoS attack peaking at 11.5 Tbps

      September 2, 2025

      Why Certified VMware Pros Are Driving the Future of IT

      August 24, 2025

      Murky Panda hackers exploit cloud trust to hack downstream customers

      August 23, 2025

      The rise of sovereign clouds: no data portability, no party

      August 20, 2025
    • Crypto

      Trump Death Rumors Fueled $1.6 Million In Prediction Market Bets This Weekend

      September 3, 2025

      3 US Crypto Stocks to Watch This Week

      September 3, 2025

      The Shocking Cost Of Bitcoin Payments: One Transaction Can Power a UK Home For 3 Weeks

      September 3, 2025

      Analysts Increase IREN Price Target: Will The Stock Keep Rallying?

      September 3, 2025

      ​​Pi Network Gears Up for Version 23 Upgrade, But Market Demand Stays Flat

      September 3, 2025
    • Technology

      I love Windows PCs, but a $599 MacBook would be mighty tempting

      September 3, 2025

      Acer’s new OLED gaming monitor hits a blistering 720Hz, with a catch

      September 3, 2025

      Acer Chromebook Plus Spin 514 review: This 2-in-1 multitasks like a pro

      September 3, 2025

      Acer’s Nitro V 16S packs big RTX gaming power into a slim frame

      September 3, 2025

      Acer’s latest Chromebook Plus laptop joins Google’s AI wave

      September 3, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI’s new reasoning AI models hallucinate more
    Technology

    OpenAI’s new reasoning AI models hallucinate more

    TechAiVerseBy TechAiVerseApril 20, 2025No Comments4 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI’s new reasoning AI models hallucinate more
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    BMI Calculator – Check your Body Mass Index for free!

    OpenAI’s new reasoning AI models hallucinate more

    OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.

    Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.

    According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.

    Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.

    In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up reasoning models. O3 and o4-mini perform better in some areas, including tasks related to coding and math. But because they “make more claims overall,” they’re often led to make “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.

    OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.

    Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.

    “Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email to TechCrunch.

    Sarah Schwettmann, co-founder of Transluce, added that o3’s hallucination rate may make it less useful than it otherwise would be.

    Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, told TechCrunch that his team is already testing o3 in their coding workflows, and that they’ve found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links. The model will supply a link that, when clicked, doesn’t work.

    Hallucinations may help models arrive at interesting ideas and be creative in their “thinking,” but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm likely wouldn’t be pleased with a model that inserts lots of factual errors into client contracts.

    One promising approach to boosting the accuracy of models is giving them web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, another one of OpenAI’s accuracy benchmarks. Potentially, search could improve reasoning models’ hallucination rates, as well — at least in cases where users are willing to expose prompts to a third-party search provider.

    If scaling up reasoning models indeed continues to worsen hallucinations, it’ll make the hunt for a solution all the more urgent.

    “Addressing hallucinations across all our models is an ongoing area of research, and we’re continually working to improve their accuracy and reliability,” said OpenAI spokesperson Niko Felix in an email to TechCrunch.

    In the last year, the broader AI industry has pivoted to focus on reasoning models after techniques to improve traditional AI models started showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing and data during training. Yet it seems reasoning also may lead to more hallucinating — presenting a challenge.

    Maxwell Zeff is a senior reporter at TechCrunch specializing in AI and emerging technologies. Previously with Gizmodo, Bloomberg, and MSNBC, Zeff has covered the rise of AI and the Silicon Valley Bank crisis. He is based in San Francisco. When not reporting, he can be found hiking, biking, and exploring the Bay Area’s food scene.

    View Bio

    BMI Calculator – Check your Body Mass Index for free!

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleWe just got a big hint that the Samsung Galaxy Z Fold 7 and Galaxy Z Flip 7 are on schedule
    Next Article Week in Review: Google loses a major antitrust case
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    I love Windows PCs, but a $599 MacBook would be mighty tempting

    September 3, 2025

    Acer’s new OLED gaming monitor hits a blistering 720Hz, with a catch

    September 3, 2025

    Acer Chromebook Plus Spin 514 review: This 2-in-1 multitasks like a pro

    September 3, 2025
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025176 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202548 Views

    New Akira ransomware decryptor cracks encryptions keys using GPUs

    March 16, 202530 Views

    Is Libby Compatible With Kobo E-Readers?

    March 31, 202529 Views
    Don't Miss
    Gadgets September 3, 2025

    Acer Expands Predator Lineup with AI Laptop, Gaming Desktops, Ultra-Fast Monitor and Keyboard

    Acer Expands Predator Lineup with AI Laptop, Gaming Desktops, Ultra-Fast Monitor and Keyboard Acer has…

    IFA 2025: Acer unveils world’s lightest 16-inch laptop

    Samsung Malaysia introduces Galaxy A07 and Galaxy A17 5G

    I love Windows PCs, but a $599 MacBook would be mighty tempting

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Acer Expands Predator Lineup with AI Laptop, Gaming Desktops, Ultra-Fast Monitor and Keyboard

    September 3, 20252 Views

    IFA 2025: Acer unveils world’s lightest 16-inch laptop

    September 3, 20252 Views

    Samsung Malaysia introduces Galaxy A07 and Galaxy A17 5G

    September 3, 20252 Views
    Most Popular

    Xiaomi 15 Ultra Officially Launched in China, Malaysia launch to follow after global event

    March 12, 20250 Views

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views

    French Apex Legends voice cast refuses contracts over “unacceptable” AI clause

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.