Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase

    The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted

    $8.8 trillion protected: How one CISO went from ‘that’s BS’ to bulletproof in 90 days

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Apple sued by shareholders for allegedly overstating AI progress

      June 22, 2025

      How far will AI go to defend its own survival?

      June 2, 2025

      The internet thinks this video from Gaza is AI. Here’s how we proved it isn’t.

      May 30, 2025

      Nvidia CEO hails Trump’s plan to rescind some export curbs on AI chips to China

      May 22, 2025

      AI poses a bigger threat to women’s work, than men’s, report says

      May 21, 2025
    • Business

      Cloudflare open-sources Orange Meets with End-to-End encryption

      June 29, 2025

      Google links massive cloud outage to API management issue

      June 13, 2025

      The EU challenges Google and Cloudflare with its very own DNS resolver that can filter dangerous traffic

      June 11, 2025

      These two Ivanti bugs are allowing hackers to target cloud instances

      May 21, 2025

      How cloud and AI transform and improve customer experiences

      May 10, 2025
    • Crypto

      MoonPay Executives Might Have Fallen for $250,000 Trump-Themed Crypto Scam

      July 11, 2025

      Top 3 Altcoins Trending in Nigeria This Week

      July 11, 2025

      Tether is Removing USDT From These 5 Legacy Blockchains

      July 11, 2025

      HBAR Faces Final Hurdle After Explosive Rally; Are Bulls Tiring Out?

      July 11, 2025

      OKX Europe CEO Discusses Bitcoin’s Breakout Rally | US Crypto News

      July 11, 2025
    • Technology

      Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase

      July 11, 2025

      The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted

      July 11, 2025

      $8.8 trillion protected: How one CISO went from ‘that’s BS’ to bulletproof in 90 days

      July 11, 2025

      AWS doubles down on infrastructure as strategy in the AI race with SageMaker upgrades

      July 11, 2025

      The best Amazon Prime Day deals for the last day: Our top picks on headphones, TVs, robot vacuums and more

      July 11, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Shop Now
    Tech AI Verse
    You are at:Home»Technology»Model minimalism: The new AI strategy saving companies millions
    Technology

    Model minimalism: The new AI strategy saving companies millions

    TechAiVerseBy TechAiVerseJuly 1, 2025No Comments7 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Model minimalism: The new AI strategy saving companies millions
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    Model minimalism: The new AI strategy saving companies millions

    June 27, 2025 1:00 PM

    This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

    The advent of large language models (LLMs) has made it easier for enterprises to envision the kinds of projects they can undertake, leading to a surge in pilot programs now transitioning to deployment. 

    However, as these projects gained momentum, enterprises realized that the earlier LLMs they had used were unwieldy and, worse, expensive. 

    Enter small language models and distillation. Models like Google’s Gemma family, Microsoft’s Phi and Mistral’s Small 3.1 allowed businesses to choose fast, accurate models that work for specific tasks. Enterprises can opt for a smaller model for particular use cases, allowing them to lower the cost of running their AI applications and potentially achieve a better return on investment. 

    LinkedIn distinguished engineer Karthik Ramgopal told VentureBeat that companies opt for smaller models for a few reasons. 

    “Smaller models require less compute, memory and faster inference times, which translates directly into lower infrastructure OPEX (operational expenditures) and CAPEX (capital expenditures) given GPU costs, availability and power requirements,” Ramgoapl said. “Task-specific models have a narrower scope, making their behavior more aligned and maintainable over time without complex prompt engineering.”

    Model developers price their small models accordingly. OpenAI’s o4-mini costs $1.1 per million tokens for inputs and $4.4/million tokens for outputs, compared to the full o3 version at $10 for inputs and $40 for outputs. 

    Enterprises today have a larger pool of small models, task-specific models and distilled models to choose from. These days, most flagship models offer a range of sizes. For example, the Claude family of models from Anthropic comprises Claude Opus, the largest model, Claude Sonnet, the all-purpose model, and Claude Haiku, the smallest version. These models are compact enough to operate on portable devices, such as laptops or mobile phones. 

    The savings question

    When discussing return on investment, though, the question is always: What does ROI look like? Should it be a return on the costs incurred or the time savings that ultimately means dollars saved down the line? Experts VentureBeat spoke to said ROI can be difficult to judge because some companies believe they’ve already reached ROI by cutting time spent on a task while others are waiting for actual dollars saved or more business brought in to say if AI investments have actually worked.

    Normally, enterprises calculate ROI by a simple formula as described by Cognizant chief technologist Ravi Naarla in a post: ROI = (Benefits-Cost)/Costs. But with AI programs, the benefits are not immediately apparent. He suggests enterprises identify the benefits they expect to achieve, estimate these based on historical data, be realistic about the overall cost of AI, including hiring, implementation and maintenance, and understand you have to be in it for the long haul.

    With small models, experts argue that these reduce implementation and maintenance costs, especially when fine-tuning models to provide them with more context for your enterprise.

    Arijit Sengupta, founder and CEO of Aible, said that how people bring context to the models dictates how much cost savings they can get. For individuals who require additional context for prompts, such as lengthy and complex instructions, this can result in higher token costs. 

    “You have to give models context one way or the other; there is no free lunch. But with large models, that is usually done by putting it in the prompt,” he said. “Think of fine-tuning and post-training as an alternative way of giving models context. I might incur $100 of post-training costs, but it’s not astronomical.”

    Sengupta said they’ve seen about 100X cost reductions just from post-training alone, often dropping model use cost “from single-digit millions to something like $30,000.” He did point out that this number includes software operating expenses and the ongoing cost of the model and vector databases. 

    “In terms of maintenance cost, if you do it manually with human experts, it can be expensive to maintain because small models need to be post-trained to produce results comparable to large models,” he said.

    Experiments Aible conducted showed that a task-specific, fine-tuned model performs well for some use cases, just like LLMs, making the case that deploying several use-case-specific models rather than large ones to do everything is more cost-effective. 

    The company compared a post-trained version of Llama-3.3-70B-Instruct to a smaller 8B parameter option of the same model. The 70B model, post-trained for $11.30, was 84% accurate in automated evaluations and 92% in manual evaluations. Once fine-tuned to a cost of $4.58, the 8B model achieved 82% accuracy in manual assessment, which would be suitable for more minor, more targeted use cases. 

    Cost factors fit for purpose

    Right-sizing models does not have to come at the cost of performance. These days, organizations understand that model choice doesn’t just mean choosing between GPT-4o or Llama-3.1; it’s knowing that some use cases, like summarization or code generation, are better served by a small model.

    Daniel Hoske, chief technology officer at contact center AI products provider Cresta, said starting development with LLMs informs potential cost savings better. 

    “You should start with the biggest model to see if what you’re envisioning even works at all, because if it doesn’t work with the biggest model, it doesn’t mean it would with smaller models,” he said. 

    Ramgopal said LinkedIn follows a similar pattern because prototyping is the only way these issues can start to emerge.

    “Our typical approach for agentic use cases begins with general-purpose LLMs as their broad generalizationability allows us to rapidly prototype, validate hypotheses and assess product-market fit,” LinkedIn’s Ramgopal said. “As the product matures and we encounter constraints around quality, cost or latency, we transition to more customized solutions.”

    In the experimentation phase, organizations can determine what they value most from their AI applications. Figuring this out enables developers to plan better what they want to save on and select the model size that best suits their purpose and budget. 

    The experts cautioned that while it is important to build with models that work best with what they’re developing, high-parameter LLMs will always be more expensive. Large models will always require significant computing power. 

    However, overusing small and task-specific models also poses issues. Rahul Pathak, vice president of data and AI GTM at AWS, said in a blog post that cost optimization comes not just from using a model with low compute power needs, but rather from matching a model to tasks. Smaller models may not have a sufficiently large context window to understand more complex instructions, leading to increased workload for human employees and higher costs. 

    Sengupta also cautioned that some distilled models could be brittle, so long-term use may not result in savings. 

    Constantly evaluate

    Regardless of the model size, industry players emphasized the flexibility to address any potential issues or new use cases. So if they start with a large model and a smaller model with similar or better performance and lower cost, organizations cannot be precious about their chosen model. 

    Tessa Burg, CTO and head of innovation at brand marketing company Mod Op, told VentureBeat that organizations must understand that whatever they build now will always be superseded by a better version. 

    “We started with the mindset that the tech underneath the workflows that we’re creating, the processes that we’re making more efficient, are going to change. We knew that whatever model we use will be the worst version of a model.”

    Burg said that smaller models helped save her company and its clients time in researching and developing concepts. Time saved, she said, that does lead to budget savings over time. She added that it’s a good idea to break out high-cost, high-frequency use cases for light-weight models.

    Sengupta noted that vendors are now making it easier to switch between models automatically, but cautioned users to find platforms that also facilitate fine-tuning, so they don’t incur additional costs. 

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleBetter governance is required for AI agents
    Next Article The inference trap: How cloud providers are eating your AI margins
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase

    July 11, 2025

    The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted

    July 11, 2025

    $8.8 trillion protected: How one CISO went from ‘that’s BS’ to bulletproof in 90 days

    July 11, 2025
    Leave A Reply Cancel Reply

    Top Posts

    New Akira ransomware decryptor cracks encryptions keys using GPUs

    March 16, 202528 Views

    OpenAI details ChatGPT-o3, o4-mini, o4-mini-high usage limits

    April 19, 202522 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202519 Views

    Rsync replaced with openrsync on macOS Sequoia

    April 7, 202519 Views
    Don't Miss
    Technology July 11, 2025

    Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase

    Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase July 11,…

    The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted

    $8.8 trillion protected: How one CISO went from ‘that’s BS’ to bulletproof in 90 days

    AWS doubles down on infrastructure as strategy in the AI race with SageMaker upgrades

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Solo.io wins ‘most likely to succeed’ award at VB Transform 2025 innovation showcase

    July 11, 20251 Views

    The great AI agent acceleration: Why enterprise adoption is happening faster than anyone predicted

    July 11, 20252 Views

    $8.8 trillion protected: How one CISO went from ‘that’s BS’ to bulletproof in 90 days

    July 11, 20252 Views
    Most Popular

    Ethereum must hold $2,000 support or risk dropping to $1,850 – Here’s why

    March 12, 20250 Views

    Xiaomi 15 Ultra Officially Launched in China, Malaysia launch to follow after global event

    March 12, 20250 Views

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.