    Meta unleashes Llama API running 18x faster than OpenAI: Cerebras partnership delivers 2,600 tokens per second

By TechAiVerse | April 30, 2025 | 6 min read

    April 29, 2025 1:02 PM



    Meta announced today a partnership with Cerebras Systems to power its new Llama API, offering developers access to inference speeds up to 18 times faster than traditional GPU-based solutions.

    The announcement, made at Meta’s inaugural LlamaCon developer conference in Menlo Park, positions the company to compete directly with OpenAI, Anthropic, and Google in the rapidly growing AI inference service market, where developers purchase tokens by the billions to power their applications.

    “Meta has selected Cerebras to collaborate to deliver the ultra-fast inference that they need to serve developers through their new Llama API,” said Julie Shin Choi, chief marketing officer at Cerebras, during a press briefing. “We at Cerebras are really, really excited to announce our first CSP hyperscaler partnership to deliver ultra-fast inference to all developers.”

    The partnership marks Meta’s formal entry into the business of selling AI computation, transforming its popular open-source Llama models into a commercial service. While Meta’s Llama models have accumulated over one billion downloads, until now the company had not offered a first-party cloud infrastructure for developers to build applications with them.

    “This is very exciting, even without talking about Cerebras specifically,” said James Wang, a senior executive at Cerebras. “OpenAI, Anthropic, Google — they’ve built an entire new AI business from scratch, which is the AI inference business. Developers who are building AI apps will buy tokens by the millions, by the billions sometimes. And these are just like the new compute instructions that people need to build AI applications.”

    A benchmark chart shows Cerebras processing Llama 4 at 2,648 tokens per second, dramatically outpacing competitors SambaNova (747), Groq (600) and GPU-based services from Google and others — explaining Meta’s hardware choice for its new API. (Credit: Cerebras)

    Breaking the speed barrier: How Cerebras supercharges Llama models

    What sets Meta’s offering apart is the dramatic speed increase provided by Cerebras’ specialized AI chips. The Cerebras system delivers over 2,600 tokens per second for Llama 4 Scout, compared to approximately 130 tokens per second for ChatGPT and around 25 tokens per second for DeepSeek, according to benchmarks from Artificial Analysis.

    “If you just compare on API-to-API basis, Gemini and GPT, they’re all great models, but they all run at GPU speeds, which is roughly about 100 tokens per second,” Wang explained. “And 100 tokens per second is okay for chat, but it’s very slow for reasoning. It’s very slow for agents. And people are struggling with that today.”

    This speed advantage enables entirely new categories of applications that were previously impractical, including real-time agents, conversational low-latency voice systems, interactive code generation, and instant multi-step reasoning — all of which require chaining multiple large language model calls that can now be completed in seconds rather than minutes.
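The "seconds rather than minutes" claim follows from the arithmetic of sequential calls: an agent that chains model invocations pays the generation time of every step, so per-call throughput compounds. A sketch with hypothetical step counts and token budgets:

```python
# Why throughput matters for agents: sequential LLM calls compound.
# Throughput figures reuse the article's numbers (~100 tok/s for GPU
# serving, ~2,600 tok/s for Cerebras); the 10-step, 600-token-per-step
# workload is a hypothetical example.

def chain_seconds(steps: int, tokens_per_step: int, tokens_per_second: float) -> float:
    """Total generation time for `steps` sequential model calls."""
    return steps * tokens_per_step / tokens_per_second

gpu_time = chain_seconds(steps=10, tokens_per_step=600, tokens_per_second=100)
fast_time = chain_seconds(steps=10, tokens_per_step=600, tokens_per_second=2600)
# ~60s at GPU speed vs ~2.3s at 2,600 tok/s for the same 10-step chain
print(f"GPU-speed agent: {gpu_time:.0f}s, Cerebras-speed agent: {fast_time:.1f}s")
```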

    The Llama API represents a significant shift in Meta’s AI strategy, transitioning from primarily being a model provider to becoming a full-service AI infrastructure company. By offering an API service, Meta is creating a revenue stream from its AI investments while maintaining its commitment to open models.

    “Meta is now in the business of selling tokens, and it’s great for the American kind of AI ecosystem,” Wang noted during the press conference. “They bring a lot to the table.”

The API will offer tools for fine-tuning and evaluation, starting with the Llama 3.3 8B model, allowing developers to generate data, train on it, and test the quality of their custom models. Meta emphasizes that it won't use customer data to train its own models, and models built with the Llama API can be transferred to other hosts, a clear differentiation from some competitors' more closed approaches.

    Cerebras will power Meta’s new service through its network of data centers located throughout North America, including facilities in Dallas, Oklahoma, Minnesota, Montreal, and California.

    “All of our data centers that serve inference are in North America at this time,” Choi explained. “We will be serving Meta with the full capacity of Cerebras. The workload will be balanced across all of these different data centers.”

    The business arrangement follows what Choi described as “the classic compute provider to a hyperscaler” model, similar to how Nvidia provides hardware to major cloud providers. “They are reserving blocks of our compute that they can serve their developer population,” she said.

    Beyond Cerebras, Meta has also announced a partnership with Groq to provide fast inference options, giving developers multiple high-performance alternatives beyond traditional GPU-based inference.

    Meta’s entry into the inference API market with superior performance metrics could potentially disrupt the established order dominated by OpenAI, Google, and Anthropic. By combining the popularity of its open-source models with dramatically faster inference capabilities, Meta is positioning itself as a formidable competitor in the commercial AI space.

    “Meta is in a unique position with 3 billion users, hyper-scale datacenters, and a huge developer ecosystem,” according to Cerebras’ presentation materials. The integration of Cerebras technology “helps Meta leapfrog OpenAI and Google in performance by approximately 20x.”

    For Cerebras, this partnership represents a major milestone and validation of its specialized AI hardware approach. “We have been building this wafer-scale engine for years, and we always knew that the technology’s first rate, but ultimately it has to end up as part of someone else’s hyperscale cloud. That was the final target from a commercial strategy perspective, and we have finally reached that milestone,” Wang said.

    The Llama API is currently available as a limited preview, with Meta planning a broader rollout in the coming weeks and months. Developers interested in accessing the ultra-fast Llama 4 inference can request early access by selecting Cerebras from the model options within the Llama API.

    “If you imagine a developer who doesn’t know anything about Cerebras because we’re a relatively small company, they can just click two buttons on Meta’s standard software SDK, generate an API key, select the Cerebras flag, and then all of a sudden, their tokens are being processed on a giant wafer-scale engine,” Wang explained. “That kind of having us be on the back end of Meta’s whole developer ecosystem is just tremendous for us.”
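The flow Wang describes (generate a key, set a backend flag, send a request) can be sketched roughly as below. Everything in this snippet is hypothetical: the endpoint URL, client shape, model name, and `provider` parameter are invented for illustration and are not Meta's real interface; consult the official Llama API documentation for the actual SDK.

```python
# Hypothetical sketch of the developer flow described above. The endpoint,
# model name, and "provider" field are invented placeholders, NOT the real
# Llama API; only the overall shape (key + backend flag + request) follows
# the quote.
import json
import urllib.request

API_KEY = "YOUR_LLAMA_API_KEY"                 # issued via Meta's developer SDK
ENDPOINT = "https://example.invalid/v1/chat"   # placeholder, not a real URL

payload = {
    "model": "llama-4-scout",  # illustrative model identifier
    "provider": "cerebras",    # the "Cerebras flag" from the quote above
    "messages": [{"role": "user", "content": "Summarize wafer-scale inference."}],
}
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # not executed here: the endpoint is a placeholder
```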

Meta's choice of specialized silicon signals something profound: in the next phase of AI, it's not just what your models know, but how quickly they can think. In that future, speed isn't just a feature; it's the whole point.
