    In crowded voice AI market, OpenAI bets on instruction-following and expressive speech to win enterprise adoption

    By TechAiVerse, August 29, 2025

    August 28, 2025 4:26 PM

    Credit: VentureBeat, generated with MidJourney


    OpenAI is entering an increasingly competitive enterprise voice AI market with a new model, gpt-realtime, which the company says follows complex instructions and offers voices “that sound more natural and expressive.”

    As voice AI continues to grow, and customers find use cases such as customer service calls or real-time translation, the market for realistic-sounding AI voices that also offer enterprise-grade security is heating up. OpenAI claims its new model provides a more human-like voice, but it still needs to compete against companies like ElevenLabs.

    The model will be available on the Realtime API, which the company also made generally available. Along with the gpt-realtime model, OpenAI also released new voices on the API, which it calls Cedar and Marin, and updated its other voices to work with the latest model.

    OpenAI said in a livestream that it worked with its customers who are building voice applications to train gpt-realtime and “carefully aligned the model to evals that are built on real-world scenarios like customer support and academic tutoring.”



    The company touted the model’s ability to create emotive, natural-sounding voices that also align with how developers build with the technology. 

    Speech-to-speech models

    The model operates within a speech-to-speech framework, enabling it to understand spoken prompts and respond vocally. Speech-to-speech models are ideally suited for real-time responses, where a person, typically a customer, interacts with an application. 

    For example, a customer wants to return some products and calls a customer service platform. They could be talking to an AI voice assistant that responds to questions and requests as if they were speaking with a human. 

    In a livestream, OpenAI customer T-Mobile showcased an AI voice-powered agent that helps people find new phones. Another customer, the real estate search platform Zillow, showcased an agent that helps someone narrow down a neighborhood to find the perfect place.

    OpenAI said gpt-realtime is its “most advanced, production-ready voice model.” Like its other voice models, it can switch languages mid-sentence. However, OpenAI researchers noted gpt-realtime can follow more complex instructions like “speak empathetically in a French accent.”

    But gpt-realtime faces competition from other models that many brands already use. ElevenLabs released Conversational AI 2.0 in May. SoundHound partners with fast food franchises for an AI voice drive-thru. Empathic AI startup Hume has launched its EVI 3 model, which allows users to generate AI versions of their own voice.

    As enterprises discover various use cases for voice AI, general-purpose model providers that offer multimodal LLMs are also making a case for themselves. Mistral released its new Voxtral model, stating it would work well with real-time translation. Google is enhancing its audio capabilities and gaining popularity with an audio feature on NotebookLM that converts research notes into a podcast.

    Better instruction following

    OpenAI said gpt-realtime is smarter and understands native audio better, including the ability to catch non-verbal cues like laughs or sighs. 

    Benchmarking using the Big Bench Audio eval showed the model scoring 82.8% in accuracy, compared to its previous model, which scored 65.6%. OpenAI did not provide numbers testing gpt-realtime against models from its competitors. 
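To put the two Big Bench Audio scores in perspective, the gap can be expressed both in percentage points and as a relative gain. The snippet below is only an illustrative calculation on the two figures quoted above; the function name `improvement` is a made-up helper, not something from OpenAI.

```python
def improvement(old_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = new_pct - old_pct
    relative = points / old_pct * 100
    return round(points, 1), round(relative, 1)


if __name__ == "__main__":
    # Scores reported in the article: previous model 65.6%, gpt-realtime 82.8%.
    points, relative = improvement(65.6, 82.8)
    print(f"{points} percentage points, about {relative}% relative gain")
```

Both numbers describe the same change; the relative figure is larger because it is measured against the older, lower score.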

    OpenAI focused on improving the model’s instruction-following capabilities, ensuring the model would adhere to directions more effectively. The new model achieves a score of 30.5% on the MultiChallenge audio benchmark. The engineers also beefed up function calling so gpt-realtime can access the correct tools. 

    Realtime API updates

    To support the new model and enhance how enterprises integrate real-time AI capabilities into their applications, OpenAI has added several new features to the Realtime API. 

    The API now supports the Model Context Protocol (MCP) and accepts image inputs, allowing the model to tell users what it sees in real time. This is a feature Google heavily emphasized during its Project Astra presentation last year.

    The Realtime API can also handle Session Initiation Protocol (SIP). SIP connects apps to telephony endpoints such as the public phone network or desk phones, opening up more contact center use cases. Users can also save and reuse prompts on the API.

    So far, people are impressed with the model, although these are still initial tests of a model that was recently released.  

    Tbh, the MCP and SIP features are the real story here, not just another model.

    The ability to connect to external tools and systems seamlessly is what will finally move these models from being impressive demos to being integrated into actual workflows.

    The real time aspect…

    — JK (@_junaidkhalid1) August 28, 2025

    Well, GPT-realtime got a livestream not because most users are interested, but for strategic business reasons

    Call centers are a major target for LLM providers and the first company to reach a real breakthrough will get massive revenue

    — AnKo (@anko_979) August 28, 2025

    Pros & Cons from @OpenAI real-time update from someone building in AI audio:

    Pro: Better function calling, more emotion, 20% cheaper, better control, image is cool but won’t use

    Con: no custom voices (creative experience MUST HAVE), still *expensive* vs TTS-LLM-STT pipelines

    — Gavin Purcell (@gavinpurcell) August 28, 2025

    OpenAI reduced prices for gpt-realtime by 20%, to $32 per million audio input tokens and $64 per million audio output tokens.
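As a rough illustration of what those rates mean for a budget, the sketch below estimates audio cost from token counts and backs out the earlier price implied by a 20% cut. The per-million prices come from the article; the token counts are invented for the example, and the helper names (`audio_cost`, `price_before_cut`) are hypothetical. The pre-cut price is derived arithmetically rather than quoted from OpenAI, and the sketch ignores any non-audio token charges.

```python
# Prices quoted in the article, in USD per one million audio tokens.
INPUT_PRICE_PER_M = 32.0
OUTPUT_PRICE_PER_M = 64.0
PRICE_CUT = 0.20  # the 20% reduction reported in the article


def audio_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate audio cost in USD for a given number of input/output tokens."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def price_before_cut(current_price: float, cut: float = PRICE_CUT) -> float:
    """Back out the earlier price implied by a percentage reduction."""
    return current_price / (1.0 - cut)


if __name__ == "__main__":
    # Hypothetical workload: these token counts are assumptions, not real usage data.
    print(f"Estimated cost: ${audio_cost(250_000, 100_000):.2f}")
    print(f"Implied earlier input price: ${price_before_cut(INPUT_PRICE_PER_M):.2f} per million")
    print(f"Implied earlier output price: ${price_before_cut(OUTPUT_PRICE_PER_M):.2f} per million")
```

Because output tokens cost twice as much as input tokens, workloads where the model does most of the talking are proportionally more expensive.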
