Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Infinix launches NOTE 60 Series in Malaysia

    Razer’s laptop sleeve comes with MagSafe wireless charging for US$130

    Apple’s Rumoured Budget MacBook Could Arrive With Notable Feature Trade-Offs

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      FCC approves the merger of cable giants Cox and Charter

      February 28, 2026

      Finding value with AI and Industry 5.0 transformation

      February 28, 2026

      How Smarsh built an AI front door for regulated industries — and drove 59% self-service adoption

      February 24, 2026

      Where MENA CIOs draw the line on AI sovereignty

      February 24, 2026

      Ex-President’s shift away from Xbox consoles to cloud gaming reportedly caused friction

      February 24, 2026
    • Crypto

      Palladium Price Approaches a Critical Turning Point

      February 28, 2026

      Trump to Takeover Cuba, Iran War Tensions Rise, Bitcoin Crashes Again

      February 28, 2026

      A 40% XRP Crash Couldn’t Shake Its Strongest Holders — Is $1.70 Still Possible?

      February 28, 2026

      Why Is the US Stock Market Down Today?

      February 28, 2026

      SoFi Becomes First US Chartered Bank to Support Solana Deposits

      February 28, 2026
    • Technology

      The compact smartphone I actually want: Xiaomi 17 in a Galaxy S26 world

      February 28, 2026

      Exciting laptop concept turns palm rest into an E ink notepad

      February 28, 2026

      AYN Thor and Odin 3 new pricing revealed, to take effect in March

      February 28, 2026

      External desktop graphics card on a mini PC, laptop or tablet? The Minisforum DEG2 makes it possible

      February 28, 2026

      Linux-based Orange Pi Neo gaming handheld delayed due to rising RAM and storage costs

      February 28, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
    Technology

    OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

    TechAiVerseBy TechAiVerseMarch 20, 2025No Comments7 Mins Read3 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

    March 20, 2025 11:21 AM

    Credit: VentureBeat made with OpenAI ChatGPT

    Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


    OpenAI’s voice AI models have gotten it into trouble before with actor Scarlett Johansson, but that isn’t stopping the company from continuing to advance its offerings in this category.

    Today, the ChatGPT maker has unveiled three, all new proprietary voice models called gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts, available initially in its application programming interface (API) for third-party software developers to build their own apps atop, as well as on a custom demo site, OpenAI.fm, that individual users can access for limited testing and fun.

    Moreover, the gpt-4o-mini-tts model voices can be customized from several pre-sets via text prompt to change their accents, pitch, tone, and other vocal qualities — including conveying whatever emotions the user asks them to, which should go a long way to addressing any concerns OpenAI is deliberately imitating any particular user’s voice (the company previously denied that was the case with Johansson, but pulled down the ostensibly imitative voice option, anyway). Now it’s up to the user to decide how they want their AI voice to sound when speaking back.

    In a demo with VentureBeat delivered over video call, OpenAI technical staff member Jeff Harris showed how using text alone on the demo site, a user could get the same voice to sound like a cackling mad scientist or a zen, calm yoga teacher.

    Discovering and refining new capabilities within GPT-4o base

    The models are variants of the existing GPT-4o model OpenAI launched back in May 2024 and which currently powers the ChatGPT text and voice experience for many users, but the company took that base model and post-trained it with additional data to make it excel at transcription and speech. The company didn’t specify when the models might come to ChatGPT.

    “ChatGPT has slightly different requirements in terms of cost and performance trade-offs, so while I expect they will move to these models in time, for now, this launch is focused on API users,” Harris said.

    It is meant to supersede OpenAI’s two-year-old Whisper open source text-to-speech model, offering lower word error rates across industry benchmarks and improved performance in noisy environments, with diverse accents, and at varying speech speeds — across 100+ languages.

    The company posted a chart on its website showing just how much lower the gpt-4o-transcribe models’ error rates are at identifying words across 33 languages, compared to Whisper — with an impressively low 2.46% in English.

    “These models include noise cancellation and a semantic voice activity detector, which helps determine when a speaker has finished a thought, improving transcription accuracy,” said Harris.

    Harris told VentureBeat that the new gpt-4o-transcribe model family is not designed to offer “diarization,” or the capability to label and differentiate between different speakers. Instead, it is designed primarily to receive one (or possibly multiple voices) as a single input channel and respond to all inputs with a single output voice in that interaction, however long it takes.

    The company is further hosting a competition for the general public to find the most creative examples of using its demo voice site OpenAI.fm and share them online by tagging the @openAI account on X. The winner is set to receive a custom Teenage Engineering radio with OpenAI logo, which OpenAI Head of Product, Platform Olivier Godement said is one of only three in the world.

    An audio applications gold mine

    The enhancements make them particularly well-suited for applications such as customer call centers, meeting note transcription, and AI-powered assistants.

    Impressively, the company’s newly launched Agents SDK from last week also allows those developers who have already built apps atop its text-based large language models like the regular GPT-4o to add fluid voice interactions with only about “nine lines of code,” according to a presenter during an OpenAI YouTube livestream announcing the new models (embedded above).

    For example, an e-commerce app built atop GPT-4o could now respond to turn-based user questions like “tell me about my last orders” in speech with just seconds of tweaking the code by adding these new models.

    “For the first time, we’re introducing streaming speech-to-text, allowing developers to continuously input audio and receive a real-time text stream, making conversations feel more natural,” Harris said.

    Still, for those devs looking for low-latency, real-time AI voice experiences, OpenAI recommends using its speech-to-speech models in the Realtime API.

    Pricing and availability

    The new models are available immediately via OpenAI’s API, with pricing as follows:

    • gpt-4o-transcribe: $6.00 per 1M audio input tokens (~$0.006 per minute)

    • gpt-4o-mini-transcribe: $3.00 per 1M audio input tokens (~$0.003 per minute)

    • gpt-4o-mini-tts: $0.60 per 1M text input tokens, $12.00 per 1M audio output tokens (~$0.015 per minute)

    However, they arrive into a time of fiercer-than-ever competition in the AI transcription and speech space, with dedicated speech AI firms such as ElevenLabs offering its new Scribe model that supports diarization and boasts a similarly (but not as low) reduced error rate of 3.3% in English, and pricing of $0.40 per hour of input audio (or $0.006 per minute, roughly equivalent).

    Another startup, Hume AI offers a new model Octave TTS with sentence-level and even word-level customization of pronunciation and emotional inflection — based entirely on the user’s instructions, not any pre-set voices. The pricing of Octave TTS isn’t directly comparable, but there is a free tier offering 10 minutes of audio and costs increase from there between

    Meanwhile, more advanced audio and speech models are also coming to the open source community, including one called Orpheus 3B which is available with a permissive Apache 2.0 license, meaning developers don’t have to pay any costs to run it — provided they have the right hardware or cloud servers.

    Industry adoption and early results

    Several companies have already integrated OpenAI’s new audio models into their platforms, reporting significant improvements in voice AI performance, according to testimonials shared by OpenAI with VentureBeat.

    EliseAI, a company focused on property management automation, found that OpenAI’s text-to-speech model enabled more natural and emotionally rich interactions with tenants.

    The enhanced voices made AI-powered leasing, maintenance, and tour scheduling more engaging, leading to higher tenant satisfaction and improved call resolution rates.

    Decagon, which builds AI-powered voice experiences, saw a 30% improvement in transcription accuracy using OpenAI’s speech recognition model.

    This increase in accuracy has allowed Decagon’s AI agents to perform more reliably in real-world scenarios, even in noisy environments. The integration process was quick, with Decagon incorporating the new model into its system within a day.

    Not all reactions to OpenAI’s latest release have been warm. Dawn AI app analytics software co-founder Ben Hylak (@benhylak), a former Apple human interfaces designer, posted on X that while the models seem promising, the announcement “feels like a retreat from real-time voice,” suggesting a shift away from OpenAI’s previous focus on low-latency conversational AI via ChatGPT.

    Additionally, the launch was preceded by an early leak on X (formerly Twitter). TestingCatalog News (@testingcatalog) posted details on the new models several minutes before the official announcement, listing the names of gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. The leak was credited to @StivenTheDev, and the post quickly gained traction.

    But looking ahead, OpenAI plans to continue refining its audio models and is exploring custom voice capabilities while ensuring safety and responsible AI use. Beyond audio, OpenAI is also investing in multimodal AI, including video, to enable more dynamic and interactive agent-based experiences.

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleLightSpeed Studios will focus on original IP in future games
    Next Article NAS storage: TrueNAS aims to make it big in Europe
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    The compact smartphone I actually want: Xiaomi 17 in a Galaxy S26 world

    February 28, 2026

    Exciting laptop concept turns palm rest into an E ink notepad

    February 28, 2026

    AYN Thor and Odin 3 new pricing revealed, to take effect in March

    February 28, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025699 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025280 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025162 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025124 Views
    Don't Miss
    Gadgets March 1, 2026

    Infinix launches NOTE 60 Series in Malaysia

    Infinix launches NOTE 60 Series in Malaysia Infinix has unveiled the new Infinix NOTE 60…

    Razer’s laptop sleeve comes with MagSafe wireless charging for US$130

    Apple’s Rumoured Budget MacBook Could Arrive With Notable Feature Trade-Offs

    The compact smartphone I actually want: Xiaomi 17 in a Galaxy S26 world

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Infinix launches NOTE 60 Series in Malaysia

    March 1, 20262 Views

    Razer’s laptop sleeve comes with MagSafe wireless charging for US$130

    March 1, 20262 Views

    Apple’s Rumoured Budget MacBook Could Arrive With Notable Feature Trade-Offs

    March 1, 20262 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.