
    Beyond GPT architecture: Why Google’s Diffusion approach could reshape LLM deployment

By TechAiVerse | June 14, 2025 | 8 min read


    June 13, 2025 2:48 PM

Image credit: Created by VentureBeat using ChatGPT



Last month, along with a comprehensive suite of new AI tools and innovations, Google DeepMind unveiled Gemini Diffusion. This experimental research model uses a diffusion-based approach to generate text. Traditionally, large language models (LLMs) like GPT and Gemini itself have relied on autoregression, a step-by-step approach where each token is generated conditioned on all the tokens before it. Diffusion language models (DLMs), also known as diffusion-based large language models (dLLMs), leverage a method more commonly seen in image generation, starting with random noise and gradually refining it into a coherent output. This approach dramatically increases generation speed and can improve coherence and consistency.

    Gemini Diffusion is currently available as an experimental demo; sign up for the waitlist here to get access. 

    (Editor’s note: We’ll be unpacking paradigm shifts like diffusion-based language models—and what it takes to run them in production—at VB Transform, June 24–25 in San Francisco, alongside Google DeepMind, LinkedIn and other enterprise AI leaders.)

    Understanding diffusion vs. autoregression

    Diffusion and autoregression are fundamentally different approaches. The autoregressive approach generates text sequentially, with tokens predicted one at a time. While this method ensures strong coherence and context tracking, it can be computationally intensive and slow, especially for long-form content.
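The sequential bottleneck of autoregression can be sketched in a few lines. The `next_token` function below is a hypothetical stand-in for an LLM forward pass, not a real model call; the point is that every new token costs one full sequential step:

```python
def next_token(prefix):
    # Hypothetical stand-in for an LLM forward pass: returns a counter token.
    return f"tok{len(prefix)}"

def autoregressive_generate(prompt_tokens, n_new):
    seq = list(prompt_tokens)
    for _ in range(n_new):   # one full forward pass per generated token
        seq.append(next_token(seq))
    return seq

print(autoregressive_generate(["<bos>"], 3))
# -> ['<bos>', 'tok1', 'tok2', 'tok3'], produced in 3 strictly sequential steps
```

Because step N depends on the output of step N-1, the loop cannot be parallelized across positions, which is exactly the latency cost diffusion models aim to avoid.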

    Diffusion models, by contrast, begin with random noise, which is gradually denoised into a coherent output. When applied to language, the technique has several advantages. Blocks of text can be processed in parallel, potentially producing entire segments or sentences at a much higher rate. 

    Gemini Diffusion can reportedly generate 1,000-2,000 tokens per second. In contrast, Gemini 2.5 Flash has an average output speed of 272.4 tokens per second. Additionally, mistakes in generation can be corrected during the refining process, improving accuracy and reducing the number of hallucinations. There may be trade-offs in terms of fine-grained accuracy and token-level control; however, the increase in speed will be a game-changer for numerous applications. 
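Taking the quoted throughput figures at face value (a rough sketch, assuming both rates are sustained), the implied speedup works out as follows:

```python
# Throughputs as reported in the article (tokens per second).
diffusion_low, diffusion_high = 1000, 2000   # Gemini Diffusion (reported range)
flash = 272.4                                # Gemini 2.5 Flash (average)

print(f"{diffusion_low / flash:.1f}x to {diffusion_high / flash:.1f}x")
# roughly a 3.7x to 7.3x throughput advantage
```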

    How does diffusion-based text generation work?

During training, DLMs work by gradually corrupting a sentence with noise over many steps, until the original sentence is rendered entirely unrecognizable. The model is then trained to reverse this process, step by step, reconstructing the original sentence from increasingly noisy versions. Through this iterative refinement, the model learns to capture the entire distribution of plausible sentences in the training data.

    While the specifics of Gemini Diffusion have not yet been disclosed, the typical training methodology for a diffusion model involves these key stages:

    Forward diffusion: With each sample in the training dataset, noise is added progressively over multiple cycles (often 500 to 1,000) until it becomes indistinguishable from random noise. 

    Reverse diffusion: The model learns to reverse each step of the noising process, essentially learning how to “denoise” a corrupted sentence one stage at a time, eventually restoring the original structure.

    This process is repeated millions of times with diverse samples and noise levels, enabling the model to learn a reliable denoising function. 
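For text, the forward "noising" step is often implemented as progressive token masking rather than continuous noise (the approach used by open models such as LLaDA; Gemini Diffusion's exact recipe has not been disclosed, so this is an illustration of the general idea, not Google's implementation):

```python
import random

MASK = "<mask>"

def corrupt(tokens, noise_level, rng):
    # Forward diffusion sketch: each token is independently replaced
    # by a mask token with probability noise_level.
    return [MASK if rng.random() < noise_level else t for t in tokens]

rng = random.Random(0)
sentence = "the cat sat on the mat".split()
for level in (0.3, 0.7, 1.0):
    print(level, corrupt(sentence, level, rng))
# at noise_level 1.0 every token is masked -- the sentence is fully "noised"
```

Training pairs a corrupted sequence at a sampled noise level with its clean original, so the model learns the denoising direction at every stage.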

    Once trained, the model is capable of generating entirely new sentences. DLMs generally require a condition or input, such as a prompt, class label, or embedding, to guide the generation towards desired outcomes. The condition is injected into each step of the denoising process, which shapes an initial blob of noise into structured and coherent text. 
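At inference time, conditional generation can be sketched as repeatedly denoising an all-masked block under the prompt. The `toy_denoiser` below is a stand-in that copies from a fixed target so the loop is runnable; a real denoiser would predict tokens from the prompt plus the partially denoised block, unmasking by confidence:

```python
MASK = "<mask>"

def toy_denoiser(prompt, block, step, target):
    # Hypothetical stand-in: at each step, unmask positions up to `step`.
    # A real model would fill the highest-confidence positions instead.
    return [target[i] if i <= step else MASK for i in range(len(block))]

def diffusion_generate(prompt, length, steps, target):
    block = [MASK] * length           # start from pure "noise" (all masks)
    for step in range(steps):         # whole block is revisited each step,
        block = toy_denoiser(prompt, block, step, target)  # allowing edits
    return block

print(diffusion_generate("build an interface", 3, 3, "video chat ui".split()))
# -> ['video', 'chat', 'ui']
```

Because the full block is re-passed through the denoiser each step, earlier choices can be revised, which is the self-correction property described later in the article.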

    Advantages and disadvantages of diffusion-based models

    In an interview with VentureBeat, Brendan O’Donoghue, research scientist at Google DeepMind and one of the leads on the Gemini Diffusion project, elaborated on some of the advantages of diffusion-based techniques when compared to autoregression. According to O’Donoghue, the major advantages of diffusion techniques are the following:

    • Lower latencies: Diffusion models can produce a sequence of tokens in much less time than autoregressive models.
    • Adaptive computation: Diffusion models will converge to a sequence of tokens at different rates depending on the task’s difficulty. This allows the model to consume fewer resources (and have lower latencies) on easy tasks and more on harder ones.
    • Non-causal reasoning: Due to the bidirectional attention in the denoiser, tokens can attend to future tokens within the same generation block. This allows non-causal reasoning to take place and allows the model to make global edits within a block to produce more coherent text.
    • Iterative refinement / self-correction: The denoising process involves sampling, which can introduce errors just like in autoregressive models. However, unlike autoregressive models, the tokens are passed back into the denoiser, which then has an opportunity to correct the error.

    O’Donoghue also noted the main disadvantages: “higher cost of serving and slightly higher time-to-first-token (TTFT), since autoregressive models will produce the first token right away. For diffusion, the first token can only appear when the entire sequence of tokens is ready.”

    Performance benchmarks

    Google says Gemini Diffusion’s performance is comparable to Gemini 2.0 Flash-Lite.

    Benchmark               Type          Gemini Diffusion   Gemini 2.0 Flash-Lite
    LiveCodeBench (v6)      Code          30.9%              28.5%
    BigCodeBench            Code          45.4%              45.8%
    LBPP (v2)               Code          56.8%              56.0%
    SWE-Bench Verified*     Code          22.9%              28.5%
    HumanEval               Code          89.6%              90.2%
    MBPP                    Code          76.0%              75.8%
    GPQA Diamond            Science       40.4%              56.5%
    AIME 2025               Mathematics   23.3%              20.0%
    BIG-Bench Extra Hard    Reasoning     15.0%              21.0%
    Global MMLU (Lite)      Multilingual  69.1%              79.0%

    * Non-agentic evaluation (single turn edit only), max prompt length of 32K.

    The two models were compared across several benchmarks, scored by how often the model produced the correct answer on the first attempt. Gemini Diffusion performed well in coding and mathematics tests, while Gemini 2.0 Flash-Lite had the edge on reasoning, scientific knowledge, and multilingual capabilities. 
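The first-try scoring described above is commonly called pass@1: the fraction of tasks where the model's first attempt is correct. A toy version (the function name and data are illustrative):

```python
def pass_at_1(first_attempt_correct):
    # Fraction of tasks whose first attempt was judged correct.
    return sum(first_attempt_correct) / len(first_attempt_correct)

print(pass_at_1([True, False, True, True]))
# -> 0.75, i.e. a 75.0% score in the table's terms
```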

    As Gemini Diffusion evolves, there’s no reason to think that its performance won’t catch up with more established models. According to O’Donoghue, the gap between the two techniques is “essentially closed in terms of benchmark performance, at least at the relatively small sizes we have scaled up to. In fact, there may be some performance advantage for diffusion in some domains where non-local consistency is important, for example, coding and reasoning.”

    Testing Gemini Diffusion

    VentureBeat was granted access to the experimental demo. When putting Gemini Diffusion through its paces, the first thing we noticed was the speed. When running the suggested prompts provided by Google, including building interactive HTML apps like Xylophone and Planet Tac Toe, each request completed in under three seconds, with speeds ranging from 600 to 1,300 tokens per second.

    To test its performance with a real-world application, we asked Gemini Diffusion to build a video chat interface with the following prompt:

    Build an interface for a video chat application. It should have a preview window that accesses the camera on my device and displays its output. The interface should also have a sound level meter that measures the output from the device's microphone in real time.

    In less than two seconds, Gemini Diffusion created a working interface with a video preview and an audio meter. 

    Though this was not a complex implementation, it could be the start of an MVP that can be completed with a bit of further prompting. Note that Gemini 2.5 Flash also produced a working interface, albeit at a slightly slower pace (approximately seven seconds).

    Gemini Diffusion also features “Instant Edit,” a mode where text or code can be pasted in and edited in real-time with minimal prompting. Instant Edit is effective for many types of text editing, including correcting grammar, updating text to target different reader personas, or adding SEO keywords. It is also useful for tasks such as refactoring code, adding new features to applications, or converting an existing codebase to a different language. 

    Enterprise use cases for DLMs

    It’s safe to say that any application that requires a quick response time stands to benefit from DLM technology. This includes real-time and low-latency applications, such as conversational AI and chatbots, live transcription and translation, or IDE autocomplete and coding assistants.

    According to O’Donoghue, with applications that leverage “inline editing, for example, taking a piece of text and making some changes in-place, diffusion models are applicable in ways autoregressive models aren’t.” DLMs also have an advantage with reasoning, math, and coding problems, due to “the non-causal reasoning afforded by the bidirectional attention.”

    DLMs are still in their infancy; however, the technology can potentially transform how language models are built. Not only do they generate text at a much higher rate than autoregressive models, but their ability to go back and fix mistakes means that, eventually, they may also produce results with greater accuracy.

    Gemini Diffusion enters a growing ecosystem of DLMs, with two notable examples being Mercury, developed by Inception Labs, and LLaDA, an open-source model from GSAI. Together, these models reflect the broader momentum behind diffusion-based language generation and offer a scalable, parallelizable alternative to traditional autoregressive architectures.
