Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Security cameras are finally part of the Matter standard

    Black Friday power bank deals: What to expect and early sales

    Black Friday laptop deals: What to expect and early sales

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Insurance companies are trying to avoid big payouts by making AI safer

      November 19, 2025

      State and local opposition to new data centers is gaining steam, study shows

      November 15, 2025

      Amazon to lay off 14,000 corporate employees

      October 29, 2025

      Elon Musk launches Grokipedia as an alternative to ‘woke’ Wikipedia

      October 29, 2025

      Fears of an AI bubble are growing, but some on Wall Street aren’t worried just yet

      October 18, 2025
    • Business

      Windows 11 gets new Cloud Rebuild, Point-in-Time Restore tools

      November 18, 2025

      Government faces questions about why US AWS outage disrupted UK tax office and banking firms

      October 23, 2025

      Amazon’s AWS outage knocked services like Alexa, Snapchat, Fortnite, Venmo and more offline

      October 21, 2025

      SAP ECC customers bet on composable ERP to avoid upgrading

      October 18, 2025

      Revenue generated by neoclouds expected to exceed $23bn in 2025, predicts Synergy

      October 15, 2025
    • Crypto

      Nvidia Posts $57B Record Revenue with Bitcoin Rebounding Above $91K

      November 20, 2025

      3 Reasons Why A Cardano Price Rebound Looks Likely

      November 20, 2025

      BitMine (BMNR) Stock Bounces As Q4 Results Near — Is the Price Preparing Another Early Move?

      November 20, 2025

      Fed Minutes Reveal December Rate Cut on a Knife’s Edge, Bitcoin Slips Below $89,000

      November 20, 2025

      TRUMP Price Holds Above $7, Even As Epstein Files Release Approved

      November 20, 2025
    • Technology

      Security cameras are finally part of the Matter standard

      November 20, 2025

      Black Friday power bank deals: What to expect and early sales

      November 20, 2025

      Black Friday laptop deals: What to expect and early sales

      November 20, 2025

      Experts: Black Friday 2025 could be your last chance for cheap PC deals

      November 20, 2025

      I love my mini PC, but I still want to buy a tower PC. Here’s why

      November 20, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
    Technology

    OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

    TechAiVerseBy TechAiVerseMarch 20, 2025No Comments7 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    OpenAI’s new voice AI model gpt-4o-transcribe lets you add speech to your existing text apps in seconds

    March 20, 2025 11:21 AM

    Credit: VentureBeat made with OpenAI ChatGPT

    Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


    OpenAI’s voice AI models have gotten it into trouble before with actor Scarlett Johansson, but that isn’t stopping the company from continuing to advance its offerings in this category.

    Today, the ChatGPT maker has unveiled three, all new proprietary voice models called gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts, available initially in its application programming interface (API) for third-party software developers to build their own apps atop, as well as on a custom demo site, OpenAI.fm, that individual users can access for limited testing and fun.

    Moreover, the gpt-4o-mini-tts model voices can be customized from several pre-sets via text prompt to change their accents, pitch, tone, and other vocal qualities — including conveying whatever emotions the user asks them to, which should go a long way to addressing any concerns OpenAI is deliberately imitating any particular user’s voice (the company previously denied that was the case with Johansson, but pulled down the ostensibly imitative voice option, anyway). Now it’s up to the user to decide how they want their AI voice to sound when speaking back.

    In a demo with VentureBeat delivered over video call, OpenAI technical staff member Jeff Harris showed how using text alone on the demo site, a user could get the same voice to sound like a cackling mad scientist or a zen, calm yoga teacher.

    Discovering and refining new capabilities within GPT-4o base

    The models are variants of the existing GPT-4o model OpenAI launched back in May 2024 and which currently powers the ChatGPT text and voice experience for many users, but the company took that base model and post-trained it with additional data to make it excel at transcription and speech. The company didn’t specify when the models might come to ChatGPT.

    “ChatGPT has slightly different requirements in terms of cost and performance trade-offs, so while I expect they will move to these models in time, for now, this launch is focused on API users,” Harris said.

    It is meant to supersede OpenAI’s two-year-old Whisper open source text-to-speech model, offering lower word error rates across industry benchmarks and improved performance in noisy environments, with diverse accents, and at varying speech speeds — across 100+ languages.

    The company posted a chart on its website showing just how much lower the gpt-4o-transcribe models’ error rates are at identifying words across 33 languages, compared to Whisper — with an impressively low 2.46% in English.

    “These models include noise cancellation and a semantic voice activity detector, which helps determine when a speaker has finished a thought, improving transcription accuracy,” said Harris.

    Harris told VentureBeat that the new gpt-4o-transcribe model family is not designed to offer “diarization,” or the capability to label and differentiate between different speakers. Instead, it is designed primarily to receive one (or possibly multiple voices) as a single input channel and respond to all inputs with a single output voice in that interaction, however long it takes.

    The company is further hosting a competition for the general public to find the most creative examples of using its demo voice site OpenAI.fm and share them online by tagging the @openAI account on X. The winner is set to receive a custom Teenage Engineering radio with OpenAI logo, which OpenAI Head of Product, Platform Olivier Godement said is one of only three in the world.

    An audio applications gold mine

    The enhancements make them particularly well-suited for applications such as customer call centers, meeting note transcription, and AI-powered assistants.

    Impressively, the company’s newly launched Agents SDK from last week also allows those developers who have already built apps atop its text-based large language models like the regular GPT-4o to add fluid voice interactions with only about “nine lines of code,” according to a presenter during an OpenAI YouTube livestream announcing the new models (embedded above).

    For example, an e-commerce app built atop GPT-4o could now respond to turn-based user questions like “tell me about my last orders” in speech with just seconds of tweaking the code by adding these new models.

    “For the first time, we’re introducing streaming speech-to-text, allowing developers to continuously input audio and receive a real-time text stream, making conversations feel more natural,” Harris said.

    Still, for those devs looking for low-latency, real-time AI voice experiences, OpenAI recommends using its speech-to-speech models in the Realtime API.

    Pricing and availability

    The new models are available immediately via OpenAI’s API, with pricing as follows:

    • gpt-4o-transcribe: $6.00 per 1M audio input tokens (~$0.006 per minute)

    • gpt-4o-mini-transcribe: $3.00 per 1M audio input tokens (~$0.003 per minute)

    • gpt-4o-mini-tts: $0.60 per 1M text input tokens, $12.00 per 1M audio output tokens (~$0.015 per minute)

    However, they arrive into a time of fiercer-than-ever competition in the AI transcription and speech space, with dedicated speech AI firms such as ElevenLabs offering its new Scribe model that supports diarization and boasts a similarly (but not as low) reduced error rate of 3.3% in English, and pricing of $0.40 per hour of input audio (or $0.006 per minute, roughly equivalent).

    Another startup, Hume AI offers a new model Octave TTS with sentence-level and even word-level customization of pronunciation and emotional inflection — based entirely on the user’s instructions, not any pre-set voices. The pricing of Octave TTS isn’t directly comparable, but there is a free tier offering 10 minutes of audio and costs increase from there between

    Meanwhile, more advanced audio and speech models are also coming to the open source community, including one called Orpheus 3B which is available with a permissive Apache 2.0 license, meaning developers don’t have to pay any costs to run it — provided they have the right hardware or cloud servers.

    Industry adoption and early results

    Several companies have already integrated OpenAI’s new audio models into their platforms, reporting significant improvements in voice AI performance, according to testimonials shared by OpenAI with VentureBeat.

    EliseAI, a company focused on property management automation, found that OpenAI’s text-to-speech model enabled more natural and emotionally rich interactions with tenants.

    The enhanced voices made AI-powered leasing, maintenance, and tour scheduling more engaging, leading to higher tenant satisfaction and improved call resolution rates.

    Decagon, which builds AI-powered voice experiences, saw a 30% improvement in transcription accuracy using OpenAI’s speech recognition model.

    This increase in accuracy has allowed Decagon’s AI agents to perform more reliably in real-world scenarios, even in noisy environments. The integration process was quick, with Decagon incorporating the new model into its system within a day.

    Not all reactions to OpenAI’s latest release have been warm. Dawn AI app analytics software co-founder Ben Hylak (@benhylak), a former Apple human interfaces designer, posted on X that while the models seem promising, the announcement “feels like a retreat from real-time voice,” suggesting a shift away from OpenAI’s previous focus on low-latency conversational AI via ChatGPT.

    Additionally, the launch was preceded by an early leak on X (formerly Twitter). TestingCatalog News (@testingcatalog) posted details on the new models several minutes before the official announcement, listing the names of gpt-4o-mini-tts, gpt-4o-transcribe, and gpt-4o-mini-transcribe. The leak was credited to @StivenTheDev, and the post quickly gained traction.

    But looking ahead, OpenAI plans to continue refining its audio models and is exploring custom voice capabilities while ensuring safety and responsible AI use. Beyond audio, OpenAI is also investing in multimodal AI, including video, to enable more dynamic and interactive agent-based experiences.

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleLightSpeed Studios will focus on original IP in future games
    Next Article NAS storage: TrueNAS aims to make it big in Europe
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Security cameras are finally part of the Matter standard

    November 20, 2025

    Black Friday power bank deals: What to expect and early sales

    November 20, 2025

    Black Friday laptop deals: What to expect and early sales

    November 20, 2025
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025410 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025109 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202575 Views

    Is Libby Compatible With Kobo E-Readers?

    March 31, 202555 Views
    Don't Miss
    Technology November 20, 2025

    Security cameras are finally part of the Matter standard

    Security cameras are finally part of the Matter standard Image: Ben Patterson/Foundry Smart bulbs, robot…

    Black Friday power bank deals: What to expect and early sales

    Black Friday laptop deals: What to expect and early sales

    Experts: Black Friday 2025 could be your last chance for cheap PC deals

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Security cameras are finally part of the Matter standard

    November 20, 20250 Views

    Black Friday power bank deals: What to expect and early sales

    November 20, 20250 Views

    Black Friday laptop deals: What to expect and early sales

    November 20, 20250 Views
    Most Popular

    Xiaomi 15 Ultra Officially Launched in China, Malaysia launch to follow after global event

    March 12, 20250 Views

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views

    French Apex Legends voice cast refuses contracts over “unacceptable” AI clause

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.