    Beyond GPT architecture: Why Google’s Diffusion approach could reshape LLM deployment

    By TechAiVerse | June 14, 2025 | 8 min read

    June 13, 2025 2:48 PM

    Created by VentureBeat using ChatGPT

    Last month, along with a comprehensive suite of new AI tools and innovations, Google DeepMind unveiled Gemini Diffusion. This experimental research model uses a diffusion-based approach to generate text. Traditionally, large language models (LLMs) like GPT and Gemini itself have relied on autoregression, a step-by-step approach where each word is generated based on the previous one. Diffusion language models (DLMs), also known as diffusion-based large language models (dLLMs), leverage a method more commonly seen in image generation, starting with random noise and gradually refining it into a coherent output. This approach dramatically increases generation speed and can improve coherence and consistency.

    Gemini Diffusion is currently available as an experimental demo; sign up for the waitlist here to get access. 

    (Editor’s note: We’ll be unpacking paradigm shifts like diffusion-based language models—and what it takes to run them in production—at VB Transform, June 24–25 in San Francisco, alongside Google DeepMind, LinkedIn and other enterprise AI leaders.)

    Understanding diffusion vs. autoregression

    Diffusion and autoregression are fundamentally different approaches. The autoregressive approach generates text sequentially, with tokens predicted one at a time. While this method ensures strong coherence and context tracking, it can be computationally intensive and slow, especially for long-form content.

    Diffusion models, by contrast, begin with random noise, which is gradually denoised into a coherent output. When applied to language, the technique has several advantages. Blocks of text can be processed in parallel, potentially producing entire segments or sentences at a much higher rate. 

    Gemini Diffusion can reportedly generate 1,000-2,000 tokens per second. In contrast, Gemini 2.5 Flash has an average output speed of 272.4 tokens per second. Additionally, mistakes in generation can be corrected during the refining process, improving accuracy and reducing the number of hallucinations. There may be trade-offs in terms of fine-grained accuracy and token-level control; however, the increase in speed will be a game-changer for numerous applications. 
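
    To make the structural difference concrete, the sketch below contrasts the two decoding loops in plain Python. It is purely illustrative and not Google’s implementation: ar_next_token and denoise_step are hypothetical stubs standing in for trained models. What matters is the shape of the loops: the autoregressive loop makes one model call per generated token, while the diffusion loop runs a fixed number of parallel refinement passes regardless of output length.

        import random

        VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]

        def ar_next_token(prefix):
            # Stub: a real autoregressive model would run a forward pass over the prefix.
            return random.choice(VOCAB)

        def denoise_step(tokens, step):
            # Stub: a real denoiser refines every position in parallel at each step.
            return [random.choice(VOCAB) for _ in tokens]

        def autoregressive_decode(length):
            out = []
            for _ in range(length):        # one model call per generated token
                out.append(ar_next_token(out))
            return out

        def diffusion_decode(length, steps=8):
            tokens = ["<noise>"] * length  # start from pure noise
            for step in range(steps):      # fixed number of passes, independent of length
                tokens = denoise_step(tokens, step)
            return tokens

        print(autoregressive_decode(6))
        print(diffusion_decode(6))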

    How does diffusion-based text generation work?

    During training, DLMs work by gradually corrupting a sentence with noise over many steps until the original sentence is rendered entirely unrecognizable. The model is then trained to reverse this process, step by step, reconstructing the original sentence from increasingly noisy versions. Through this iterative refinement, it learns to model the entire distribution of plausible sentences in the training data.

    While the specifics of Gemini Diffusion have not yet been disclosed, the typical training methodology for a diffusion model involves these key stages:

    Forward diffusion: For each sample in the training dataset, noise is added progressively over multiple cycles (often 500 to 1,000) until the sample becomes indistinguishable from random noise.

    Reverse diffusion: The model learns to reverse each step of the noising process, essentially learning how to “denoise” a corrupted sentence one stage at a time, eventually restoring the original structure.

    This process is repeated millions of times with diverse samples and noise levels, enabling the model to learn a reliable denoising function. 
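
    Since Gemini Diffusion’s exact recipe is undisclosed, the sketch below shows one common way these two stages can be implemented for text, using mask-based corruption in place of continuous noise. The functions forward_diffusion and model_predict, and the toy reconstruction loss, are illustrative placeholders rather than Gemini Diffusion’s actual components.

        import random

        MASK = "<mask>"

        def forward_diffusion(tokens, noise_level):
            # Corrupt the sentence: replace each token with noise (here, a mask)
            # with probability noise_level; at 1.0 nothing of the original remains.
            return [MASK if random.random() < noise_level else t for t in tokens]

        def model_predict(corrupted, noise_level):
            # Stub: a real denoiser would predict the original token at every position.
            return corrupted

        def train_step(sentence):
            noise_level = random.random()                 # sample a corruption level
            corrupted = forward_diffusion(sentence, noise_level)
            predicted = model_predict(corrupted, noise_level)
            loss = sum(p != t for p, t in zip(predicted, sentence))  # toy reconstruction loss
            # A real trainer would backpropagate the loss and update the weights here.
            return loss

        dataset = [["the", "cat", "sat", "on", "the", "mat"]]
        for sentence in dataset:
            print(train_step(sentence))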

    Once trained, the model is capable of generating entirely new sentences. DLMs generally require a condition or input, such as a prompt, class label, or embedding, to guide the generation towards desired outcomes. The condition is injected into each step of the denoising process, which shapes an initial blob of noise into structured and coherent text. 
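
    Under the same caveats, a generation-time sketch looks like this: the prompt acts as the condition, the response positions start as pure noise, and every denoising step refines all of them in parallel. The denoiser function is a hypothetical stand-in for a trained model and ignores the prompt it would normally condition on.

        import random

        MASK = "<mask>"
        VOCAB = ["hello", "there", "how", "can", "i", "help", "?"]

        def denoiser(prompt, response, step, total_steps):
            # Stub: a real denoiser would refine every response position in parallel,
            # conditioned on the prompt, committing to more tokens as steps advance.
            keep = step / total_steps
            return [t if t != MASK and random.random() < keep else random.choice(VOCAB)
                    for t in response]

        def generate(prompt, length=8, steps=10):
            response = [MASK] * length          # the initial "blob of noise"
            for step in range(1, steps + 1):
                response = denoiser(prompt, response, step, steps)
            return response

        print(generate(["write", "a", "greeting"]))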

    Advantages and disadvantages of diffusion-based models

    In an interview with VentureBeat, Brendan O’Donoghue, research scientist at Google DeepMind and one of the leads on the Gemini Diffusion project, elaborated on some of the advantages of diffusion-based techniques when compared to autoregression. According to O’Donoghue, the major advantages of diffusion techniques are the following:

    • Lower latencies: Diffusion models can produce a sequence of tokens in much less time than autoregressive models.
    • Adaptive computation: Diffusion models will converge to a sequence of tokens at different rates depending on the task’s difficulty. This allows the model to consume fewer resources (and have lower latencies) on easy tasks and more on harder ones.
    • Non-causal reasoning: Due to the bidirectional attention in the denoiser, tokens can attend to future tokens within the same generation block. This allows non-causal reasoning to take place and allows the model to make global edits within a block to produce more coherent text.
    • Iterative refinement / self-correction: The denoising process involves sampling, which can introduce errors just like in autoregressive models. However, unlike autoregressive models, the tokens are passed back into the denoiser, which then has an opportunity to correct the error.

    O’Donoghue also noted the main disadvantages: “higher cost of serving and slightly higher time-to-first-token (TTFT), since autoregressive models will produce the first token right away. For diffusion, the first token can only appear when the entire sequence of tokens is ready.”
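
    The adaptive computation advantage listed above can be pictured as an early-exit loop: keep refining until the sequence stops changing between steps, then stop spending compute. The sketch below is a generic illustration of that idea, not a description of Gemini Diffusion’s internals; denoise_step is again a hypothetical stub.

        import random

        VOCAB = ["2", "+", "2", "=", "4"]

        def denoise_step(tokens, step):
            # Stub: an easy prompt converges in a few steps, a hard one in many.
            return [random.choice(VOCAB) if step < 3 else t for t in tokens]

        def adaptive_decode(length=5, max_steps=64):
            tokens = ["<noise>"] * length
            for step in range(max_steps):
                refined = denoise_step(tokens, step)
                if refined == tokens:          # converged: spend no further compute
                    return refined, step
                tokens = refined
            return tokens, max_steps

        result, steps_used = adaptive_decode()
        print(result, "converged after", steps_used, "steps")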

    Performance benchmarks

    Google says Gemini Diffusion’s performance is comparable to Gemini 2.0 Flash-Lite.

    Benchmark            | Type         | Gemini Diffusion | Gemini 2.0 Flash-Lite
    LiveCodeBench (v6)   | Code         | 30.9%            | 28.5%
    BigCodeBench         | Code         | 45.4%            | 45.8%
    LBPP (v2)            | Code         | 56.8%            | 56.0%
    SWE-Bench Verified*  | Code         | 22.9%            | 28.5%
    HumanEval            | Code         | 89.6%            | 90.2%
    MBPP                 | Code         | 76.0%            | 75.8%
    GPQA Diamond         | Science      | 40.4%            | 56.5%
    AIME 2025            | Mathematics  | 23.3%            | 20.0%
    BIG-Bench Extra Hard | Reasoning    | 15.0%            | 21.0%
    Global MMLU (Lite)   | Multilingual | 69.1%            | 79.0%

    * Non-agentic evaluation (single turn edit only), max prompt length of 32K.

    The two models were compared using several benchmarks, with scores based on how many times the model produced the correct answer on the first try. Gemini Diffusion performed well in coding and mathematics tests, while Gemini 2.0 Flash-Lite had the edge on reasoning, scientific knowledge, and multilingual capabilities.

    As Gemini Diffusion evolves, there’s no reason to think that its performance won’t catch up with more established models. According to O’Donoghue, the gap between the two techniques is “essentially closed in terms of benchmark performance, at least at the relatively small sizes we have scaled up to. In fact, there may be some performance advantage for diffusion in some domains where non-local consistency is important, for example, coding and reasoning.”

    Testing Gemini Diffusion

    VentureBeat was granted access to the experimental demo. When putting Gemini Diffusion through its paces, the first thing we noticed was the speed. When running the suggested prompts provided by Google, including building interactive HTML apps like Xylophone and Planet Tac Toe, each request completed in under three seconds, with speeds ranging from 600 to 1,300 tokens per second.

    To test its performance with a real-world application, we asked Gemini Diffusion to build a video chat interface with the following prompt:

    Build an interface for a video chat application. It should have a preview window that accesses the camera on my device and displays its output. The interface should also have a sound level meter that measures the output from the device's microphone in real time.

    In less than two seconds, Gemini Diffusion created a working interface with a video preview and an audio meter. 

    Though this was not a complex implementation, it could be the start of an MVP that can be completed with a bit of further prompting. Note that Gemini 2.5 Flash also produced a working interface, albeit at a slightly slower pace (approximately seven seconds).

    Gemini Diffusion also features “Instant Edit,” a mode where text or code can be pasted in and edited in real-time with minimal prompting. Instant Edit is effective for many types of text editing, including correcting grammar, updating text to target different reader personas, or adding SEO keywords. It is also useful for tasks such as refactoring code, adding new features to applications, or converting an existing codebase to a different language. 

    Enterprise use cases for DLMs

    It’s safe to say that any application that requires a quick response time stands to benefit from DLM technology. This includes real-time and low-latency applications, such as conversational AI and chatbots, live transcription and translation, or IDE autocomplete and coding assistants.

    According to O’Donoghue, with applications that leverage “inline editing, for example, taking a piece of text and making some changes in-place, diffusion models are applicable in ways autoregressive models aren’t.” DLMs also have an advantage with reasoning, math, and coding problems, due to “the non-causal reasoning afforded by the bidirectional attention.”

    DLMs are still in their infancy; however, the technology can potentially transform how language models are built. Not only do they generate text at a much higher rate than autoregressive models, but their ability to go back and fix mistakes means that, eventually, they may also produce results with greater accuracy.

    Gemini Diffusion enters a growing ecosystem of DLMs, with two notable examples being Mercury, developed by Inception Labs, and LLaDa, an open-source model from GSAI. Together, these models reflect the broader momentum behind diffusion-based language generation and offer a scalable, parallelizable alternative to traditional autoregressive architectures.
