Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Build a Rocket Boy confirms more layoffs amid further claims of “organized espionage and corporate sabotage”

    Former Blizzard CCO and Bonfire CEO Rob Pardo to present keynote address at GDC Festival of Gaming

    Turkish mobile developer Vento Games secures $4m in seed round funding

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro

      March 4, 2026

      Huawei Watch GT Series

      March 4, 2026

      Weighing up the enterprise risks of neocloud providers

      March 3, 2026

      A stolen Gemini API key turned a $180 bill into $82,000 in two days

      March 3, 2026

      These ultra-budget laptops “include” 1.2TB storage, but most of it is OneDrive trial space

      March 1, 2026
    • Crypto

      Banks Respond to Kraken’s Federal Reserve Access as Trump Sides with Crypto

      March 4, 2026

      Hyperliquid and DEXs Break the Top 10 — Is the CEX Era Ending?

      March 4, 2026

      Consensus Hong Kong 2026: The Institutional Turn 

      March 4, 2026

      New Crypto Mutuum Finance (MUTM) Reports V1 Protocol Progress as Roadmap Enters Phase 3

      March 4, 2026

      Bitcoin Short Sellers Caught Off Guard in New White House Move

      March 4, 2026
    • Technology

      Big tech companies agree to not ruin your electric bill with AI data centers

      March 5, 2026

      Mark Zuckerberg downplays Meta’s own research in New Mexico child safety trial

      March 5, 2026

      Bill Gates-backed TerraPower begins nuclear reactor construction

      March 5, 2026

      Assassin’s Creed Unity is getting a free 60 fps patch tomorrow

      March 5, 2026

      LG reveals pricing for its 2026 OLED TVs

      March 5, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»Less is more: UC Berkeley and Google unlock LLM potential through simple sampling
    Technology

    Less is more: UC Berkeley and Google unlock LLM potential through simple sampling

    TechAiVerseBy TechAiVerseMarch 22, 2025No Comments6 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Less is more: UC Berkeley and Google unlock LLM potential through simple sampling
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    Less is more: UC Berkeley and Google unlock LLM potential through simple sampling

    March 21, 2025 3:39 PM

    Image credit: VentureBeat with Imagen 3

    Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


    A new paper by researchers from Google Research and the University of California, Berkeley, demonstrates that a surprisingly simple test-time scaling approach can boost the reasoning abilities of large language models (LLMs). The key? Scaling up sampling-based search, a technique that relies on generating multiple responses and using the model itself to verify them. 

    The core finding is that even a minimalist implementation of sampling-based search, using random sampling and self-verification, can elevate the reasoning performance of models like Gemini 1.5 Pro beyond that of o1-Preview on popular benchmarks. The findings can have important implications for enterprise applications and challenge the assumption that highly specialized training or complex architectures are always necessary for achieving top-tier performance.

    The limits of current test-time compute scaling

    The current popular method for test-time scaling in LLMs is to train the model through reinforcement learning to generate longer responses with chain-of-thought (CoT) traces. This approach is used in models such as OpenAI o1 and DeepSeek-R1. While beneficial, these methods usually require substantial investment in the training phase.

    Another test-time scaling method is “self-consistency,” where the model generates multiple responses to the query and chooses the answer that appears more often. Self-consistency reaches its limits when handling complex problems, as in these cases, the most repeated answer is not necessarily the correct one.

    Sampling-based search offers a simpler and highly scalable alternative to test-time scaling: Let the model generate multiple responses and select the best one through a verification mechanism. Sampling-based search can complement other test-time compute scaling strategies and, as the researchers write in their paper, “it also has the unique advantage of being embarrassingly parallel and allowing for arbitrarily scaling: simply sample more responses.”

    More importantly, sampling-based search can be applied to any LLM, including those that have not been explicitly trained for reasoning.

    How sampling-based search works

    The researchers focus on a minimalist implementation of sampling-based search, using a language model to both generate candidate responses and verify them. This is a “self-verification” process, where the model assesses its own outputs without relying on external ground-truth answers or symbolic verification systems.

    Search-based sampling Credit: VentureBeat

    The algorithm works in a few simple steps: 

    1—The algorithm begins by generating a set of candidate solutions to the given problem using a language model. This is done by giving the model the same prompt multiple times and using a non-zero temperature setting to create a diverse set of responses.

    2—Each candidate’s response undergoes a verification process in which the LLM is prompted multiple times to determine whether the response is correct. The verification outcomes are then averaged to create a final verification score for the response.

    3— The algorithm selects the highest-scored response as the final answer. If multiple candidates are within close range of each other, the LLM is prompted to compare them pairwise and choose the best one. The response that wins the most pairwise comparisons is chosen as the final answer.

    The researchers considered two key axes for test-time scaling:

    Sampling: The number of responses the model generates for each input problem.

    Verification: The number of verification scores computed for each generated solution

    How sampling-based search compares to other techniques

    The study revealed that reasoning performance continues to improve with sampling-based search, even when test-time compute is scaled far beyond the point where self-consistency saturates. 

    At a sufficient scale, this minimalist implementation significantly boosts reasoning accuracy on reasoning benchmarks like AIME and MATH. For example, Gemini 1.5 Pro’s performance surpassed that of o1-Preview, which has explicitly been trained on reasoning problems, and Gemini 1.5 Flash surpassed Gemini 1.5 Pro.

    “This not only highlights the importance of sampling-based search for scaling capability, but also suggests the utility of sampling-based search as a simple baseline on which to compare other test-time compute scaling strategies and measure genuine improvements in models’ search capabilities,” the researchers write.

    It is worth noting that while the results of search-based sampling are impressive, the costs can also become prohibitive. For example, with 200 samples and 50 verification steps per sample, a query from AIME will generate around 130 million tokens, which costs $650 with Gemini 1.5 Pro. However, this is a very minimalistic approach to sampling-based search, and it is compatible with optimization techniques proposed in other studies. With smarter sampling and verification methods, the inference costs can be reduced considerably by using smaller models and generating fewer tokens. For example, by using Gemini 1.5 Flash to perform the verification, the costs drop to $12 per question.

    Effective self-verification strategies

    There is an ongoing debate on whether LLMs can verify their own answers. The researchers identified two key strategies for improving self-verification using test-time compute:

    Directly comparing response candidates: Disagreements between candidate solutions strongly indicate potential errors. By providing the verifier with multiple responses to compare, the model can better identify mistakes and hallucinations, addressing a core weakness of LLMs. The researchers describe this as an instance of “implicit scaling.”

    Task-specific rewriting: The researchers propose that the optimal output style of an LLM depends on the task. Chain-of-thought is effective for solving reasoning tasks, but responses are easier to verify when written in a more formal, mathematically conventional style. Verifiers can rewrite candidate responses into a more structured format (e.g., theorem-lemma-proof) before evaluation.

    “We anticipate model self-verification capabilities to rapidly improve in the short term, as models learn to leverage the principles of implicit scaling and output style suitability, and drive improved scaling rates for sampling-based search,” the researchers write.

    Implications for real-world applications

    The study demonstrates that a relatively simple technique can achieve impressive results, potentially reducing the need for complex and costly model architectures or training regimes.

    This is also a scalable technique, enabling enterprises to increase performance by allocating more compute resources to sampling and verification. It also enables developers to push frontier language models beyond their limitations on complex tasks.

    “Given that it complements other test-time compute scaling strategies, is parallelizable and allows for arbitrarily scaling, and admits simple implementations that are demonstrably effective, we expect sampling-based search to play a crucial role as language models are tasked with solving increasingly complex problems with increasingly large compute budgets,” the researchers write. 

    Daily insights on business use cases with VB Daily

    If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

    Read our Privacy Policy

    Thanks for subscribing. Check out more VB newsletters here.

    An error occured.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleMonica Harrington was the hidden figure of Valve in its critical early years | The DeanBeat
    Next Article Largest ever cyber deal reflects Google’s CNAPP ambitions
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Big tech companies agree to not ruin your electric bill with AI data centers

    March 5, 2026

    Mark Zuckerberg downplays Meta’s own research in New Mexico child safety trial

    March 5, 2026

    Bill Gates-backed TerraPower begins nuclear reactor construction

    March 5, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025704 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025289 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025164 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025124 Views
    Don't Miss
    Gaming March 5, 2026

    Build a Rocket Boy confirms more layoffs amid further claims of “organized espionage and corporate sabotage”

    Build a Rocket Boy confirms more layoffs amid further claims of “organized espionage and corporate…

    Former Blizzard CCO and Bonfire CEO Rob Pardo to present keynote address at GDC Festival of Gaming

    Turkish mobile developer Vento Games secures $4m in seed round funding

    Good Games Group has bought the Humble and Firestoke back catalogues. Now, newly renamed as Balor Games, it wants to invest in triple-I

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Build a Rocket Boy confirms more layoffs amid further claims of “organized espionage and corporate sabotage”

    March 5, 20262 Views

    Former Blizzard CCO and Bonfire CEO Rob Pardo to present keynote address at GDC Festival of Gaming

    March 5, 20262 Views

    Turkish mobile developer Vento Games secures $4m in seed round funding

    March 5, 20262 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.