
    Model minimalism: The new AI strategy saving companies millions

By TechAiVerse, July 1, 2025

    June 27, 2025 1:00 PM

    This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

    The advent of large language models (LLMs) has made it easier for enterprises to envision the kinds of projects they can undertake, leading to a surge in pilot programs now transitioning to deployment. 

    However, as these projects gained momentum, enterprises realized that the earlier LLMs they had used were unwieldy and, worse, expensive. 

    Enter small language models and distillation. Models like Google’s Gemma family, Microsoft’s Phi and Mistral’s Small 3.1 allowed businesses to choose fast, accurate models that work for specific tasks. Enterprises can opt for a smaller model for particular use cases, allowing them to lower the cost of running their AI applications and potentially achieve a better return on investment. 

    LinkedIn distinguished engineer Karthik Ramgopal told VentureBeat that companies opt for smaller models for a few reasons. 

“Smaller models require less compute, memory and faster inference times, which translates directly into lower infrastructure OPEX (operational expenditures) and CAPEX (capital expenditures) given GPU costs, availability and power requirements,” Ramgopal said. “Task-specific models have a narrower scope, making their behavior more aligned and maintainable over time without complex prompt engineering.”

Model developers price their small models accordingly. OpenAI’s o4-mini costs $1.10 per million input tokens and $4.40 per million output tokens, compared with the full o3 at $10 per million input tokens and $40 per million output tokens.
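To make that pricing gap concrete, here is a back-of-the-envelope sketch. The per-million-token prices are the ones quoted above; the monthly token volumes are invented purely for illustration:

```python
# Back-of-the-envelope cost comparison using the article's quoted prices.
# The 500M input / 100M output monthly volumes are illustrative assumptions.

def monthly_cost(input_tokens: float, output_tokens: float,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost for one month's token volume at per-million-token prices."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

mini = monthly_cost(500e6, 100e6, 1.1, 4.4)    # o4-mini: $1.10 in / $4.40 out
full = monthly_cost(500e6, 100e6, 10.0, 40.0)  # o3: $10 in / $40 out

print(f"o4-mini: ${mini:,.0f}  o3: ${full:,.0f}  ratio: {full / mini:.1f}x")
# o4-mini: $990  o3: $9,000  ratio: 9.1x
```

At this hypothetical workload the small model comes out roughly 9x cheaper, before any quality considerations enter the picture.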

    Enterprises today have a larger pool of small models, task-specific models and distilled models to choose from. These days, most flagship models offer a range of sizes. For example, the Claude family of models from Anthropic comprises Claude Opus, the largest model, Claude Sonnet, the all-purpose model, and Claude Haiku, the smallest version. These models are compact enough to operate on portable devices, such as laptops or mobile phones. 

    The savings question

When discussing return on investment, though, the question is always: What does ROI look like? Should it be a return on the costs incurred, or the time savings that ultimately mean dollars saved down the line? Experts VentureBeat spoke to said ROI can be difficult to judge: some companies believe they’ve already reached it by cutting time spent on a task, while others wait for actual dollars saved, or new business brought in, before saying whether their AI investments have actually worked.

Normally, enterprises calculate ROI with a simple formula, as described by Cognizant chief technologist Ravi Naarla in a post: ROI = (Benefits − Costs) / Costs. But with AI programs, the benefits are not immediately apparent. He suggests enterprises identify the benefits they expect to achieve, estimate them based on historical data, be realistic about the overall cost of AI (including hiring, implementation and maintenance), and understand that you have to be in it for the long haul.
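Naarla's formula is simple enough to sketch directly. The benefit and cost figures below are illustrative placeholders, not numbers from the article:

```python
def roi(benefits: float, costs: float) -> float:
    """Naarla's formula: ROI = (Benefits - Costs) / Costs."""
    return (benefits - costs) / costs

# Illustrative: $450k of estimated benefit against $300k of total AI cost
# (hiring, implementation and maintenance combined).
print(f"{roi(450_000, 300_000):.0%}")  # 50%
```

The hard part, per Naarla, is not the arithmetic but estimating the benefits term honestly from historical data.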

Experts argue that small models reduce implementation and maintenance costs, especially when they are fine-tuned to carry more context about your enterprise.

    Arijit Sengupta, founder and CEO of Aible, said that how people bring context to the models dictates how much cost savings they can get. For individuals who require additional context for prompts, such as lengthy and complex instructions, this can result in higher token costs. 

    “You have to give models context one way or the other; there is no free lunch. But with large models, that is usually done by putting it in the prompt,” he said. “Think of fine-tuning and post-training as an alternative way of giving models context. I might incur $100 of post-training costs, but it’s not astronomical.”
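Sengupta's trade-off can be framed as a break-even calculation: context stuffed into every prompt costs input tokens on each call, while post-training is roughly a one-time expense. The $100 post-training figure is his; the per-call context size and token price are assumptions chosen for illustration:

```python
# Break-even between "context in every prompt" and a one-time post-train.
PROMPT_CONTEXT_TOKENS = 4_000   # assumed extra instructions sent per call
PRICE_PER_M_INPUT = 1.1         # assumed $ per million input tokens
POST_TRAIN_COST = 100.0         # one-time figure quoted by Sengupta

cost_per_call = PROMPT_CONTEXT_TOKENS / 1e6 * PRICE_PER_M_INPUT
break_even_calls = POST_TRAIN_COST / cost_per_call
print(f"${cost_per_call:.4f} per call; break-even after {break_even_calls:,.0f} calls")
# $0.0044 per call; break-even after 22,727 calls
```

Past the break-even point, every additional call makes post-training the cheaper way to deliver context, which is the "no free lunch" point Sengupta is making.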

    Sengupta said they’ve seen about 100X cost reductions just from post-training alone, often dropping model use cost “from single-digit millions to something like $30,000.” He did point out that this number includes software operating expenses and the ongoing cost of the model and vector databases. 

    “In terms of maintenance cost, if you do it manually with human experts, it can be expensive to maintain because small models need to be post-trained to produce results comparable to large models,” he said.

    Experiments Aible conducted showed that a task-specific, fine-tuned model performs well for some use cases, just like LLMs, making the case that deploying several use-case-specific models rather than large ones to do everything is more cost-effective. 

The company compared a post-trained version of Llama-3.3-70B-Instruct to a smaller 8B-parameter version of the same model. The 70B model, post-trained for $11.30, was 84% accurate in automated evaluations and 92% in manual evaluations. The 8B model, fine-tuned at a cost of $4.58, achieved 82% accuracy in manual assessment, which would be suitable for smaller, more targeted use cases.

    Cost factors fit for purpose

Right-sizing models does not have to come at the cost of performance. These days, organizations understand that model choice doesn’t just mean choosing between GPT-4o and Llama-3.1; it’s knowing that some use cases, like summarization or code generation, are better served by a small model.

    Daniel Hoske, chief technology officer at contact center AI products provider Cresta, said starting development with LLMs informs potential cost savings better. 

    “You should start with the biggest model to see if what you’re envisioning even works at all, because if it doesn’t work with the biggest model, it doesn’t mean it would with smaller models,” he said. 

    Ramgopal said LinkedIn follows a similar pattern because prototyping is the only way these issues can start to emerge.

“Our typical approach for agentic use cases begins with general-purpose LLMs as their broad generalization ability allows us to rapidly prototype, validate hypotheses and assess product-market fit,” LinkedIn’s Ramgopal said. “As the product matures and we encounter constraints around quality, cost or latency, we transition to more customized solutions.”

    In the experimentation phase, organizations can determine what they value most from their AI applications. Figuring this out enables developers to plan better what they want to save on and select the model size that best suits their purpose and budget. 

The experts cautioned that while it is important to build with the models that work best for what they’re developing, high-parameter LLMs will always be more expensive, because large models will always require significant computing power.

    However, overusing small and task-specific models also poses issues. Rahul Pathak, vice president of data and AI GTM at AWS, said in a blog post that cost optimization comes not just from using a model with low compute power needs, but rather from matching a model to tasks. Smaller models may not have a sufficiently large context window to understand more complex instructions, leading to increased workload for human employees and higher costs. 
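Pathak's point about matching models to tasks, rather than simply defaulting to the cheapest model, can be sketched as a simple router. The task categories, model names and context limit below are all hypothetical:

```python
# Minimal sketch of task-to-model matching: route a request to the small
# model only when the task is simple AND its context fits the small
# model's window; otherwise fall back to the large model.

SMALL_MODEL = "small-8b"    # hypothetical model names
LARGE_MODEL = "large-70b"

SIMPLE_TASKS = {"summarization", "classification", "extraction"}

def pick_model(task: str, context_tokens: int,
               small_context_limit: int = 8_192) -> str:
    """Return the model name best matched to the task and its context size."""
    if task in SIMPLE_TASKS and context_tokens <= small_context_limit:
        return SMALL_MODEL
    return LARGE_MODEL

print(pick_model("summarization", 2_000))         # small-8b
print(pick_model("multi_step_reasoning", 2_000))  # large-70b
print(pick_model("summarization", 20_000))        # large-70b (context too big)
```

The context check captures Pathak's warning: a small model asked to handle instructions beyond its window fails in ways that push work back onto human employees.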

    Sengupta also cautioned that some distilled models could be brittle, so long-term use may not result in savings. 

    Constantly evaluate

Regardless of the model size, industry players emphasized the need for flexibility to address potential issues or new use cases. So if they start with a large model and later find a smaller one with similar or better performance at lower cost, organizations cannot be precious about their chosen model.

    Tessa Burg, CTO and head of innovation at brand marketing company Mod Op, told VentureBeat that organizations must understand that whatever they build now will always be superseded by a better version. 

    “We started with the mindset that the tech underneath the workflows that we’re creating, the processes that we’re making more efficient, are going to change. We knew that whatever model we use will be the worst version of a model.”

Burg said that smaller models helped save her company and its clients time in researching and developing concepts. That time saved, she said, does lead to budget savings over time. She added that it’s a good idea to break out high-cost, high-frequency use cases for lightweight models.

    Sengupta noted that vendors are now making it easier to switch between models automatically, but cautioned users to find platforms that also facilitate fine-tuning, so they don’t incur additional costs. 

Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he delivers clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.
