    The rise of prompt ops: Tackling hidden AI costs from bad inputs and context bloat

By TechAiVerse | June 30, 2025 | 8 Mins Read

    June 27, 2025 1:00 PM

    This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

    Model providers continue to roll out increasingly sophisticated large language models (LLMs) with longer context windows and enhanced reasoning capabilities. 

    This allows models to process and “think” more, but it also increases compute: The more a model takes in and puts out, the more energy it expends and the higher the costs. 

    Couple this with all the tinkering involved with prompting — it can take a few tries to get to the intended result, and sometimes the question at hand simply doesn’t need a model that can think like a PhD — and compute spend can get out of control. 

    This is giving rise to prompt ops, a whole new discipline in the dawning age of AI. 

    “Prompt engineering is kind of like writing, the actual creating, whereas prompt ops is like publishing, where you’re evolving the content,” Crawford Del Prete, IDC president, told VentureBeat. “The content is alive, the content is changing, and you want to make sure you’re refining that over time.”

    The challenge of compute use and cost

Compute use and cost are two “related but separate concepts” in the context of LLMs, explained David Emerson, applied scientist at the Vector Institute. Generally, the price users pay scales with both the number of input tokens (what the user prompts) and the number of output tokens (what the model delivers). However, users are not charged for behind-the-scenes actions like meta-prompts, steering instructions or retrieval-augmented generation (RAG).
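This pricing model can be sketched as a tiny cost function. The per-token rates below are hypothetical placeholders, not any provider's real prices:

```python
# Sketch of how per-request LLM spend scales with tokens.
# Rates are hypothetical placeholders, not any provider's actual pricing.
IN_RATE = 2.00 / 1_000_000   # $ per input token
OUT_RATE = 8.00 / 1_000_000  # $ per output token (output is typically priced higher)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Price scales with both what the user sends and what the model returns."""
    return input_tokens * IN_RATE + output_tokens * OUT_RATE

# A terse answer vs. a verbose one to the same 200-token prompt:
terse = request_cost(200, 20)
verbose = request_cost(200, 800)
print(f"terse: ${terse:.6f}, verbose: ${verbose:.6f}")
```

Even at identical input size, the verbose response costs roughly an order of magnitude more, which is why trimming unnecessary output tokens matters.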

    While longer context allows models to process much more text at once, it directly translates to significantly more FLOPS (a measurement of compute power), he explained. Some aspects of transformer models even scale quadratically with input length if not well managed. Unnecessarily long responses can also slow down processing time and require additional compute and cost to build and maintain algorithms to post-process responses into the answer users were hoping for.
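The quadratic scaling Emerson mentions can be illustrated with a back-of-the-envelope FLOP count for the attention score matrix. This is a simplified sketch; real architectures add projections, MLPs and optimizations such as KV caching:

```python
# Rough sketch of why self-attention cost grows quadratically with context.
# For sequence length n and model dimension d, computing the QK^T score
# matrix alone takes on the order of n^2 * d multiply-adds per layer.
def attention_flops(n_tokens: int, d_model: int = 4096) -> int:
    # 2x for the multiply-add pair in each dot-product term.
    return 2 * n_tokens * n_tokens * d_model

base = attention_flops(1_000)
long_ctx = attention_flops(10_000)
print(long_ctx / base)  # 10x more tokens -> ~100x more attention compute
```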

Longer context environments typically incentivize providers to deliver deliberately verbose responses, said Emerson. Many heavier reasoning models (OpenAI's o3 or o1, for instance) will often provide long responses to even simple questions, incurring heavy compute costs.

    Here’s an example:

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have?

    Output: If I eat 1, I only have 1 left. I would have 5 apples if I buy 4 more.

    The model not only generated more tokens than it needed to, it buried its answer. An engineer may then have to design a programmatic way to extract the final answer or ask follow-up questions like ‘What is your final answer?’ that incur even more API costs. 

    Alternatively, the prompt could be redesigned to guide the model to produce an immediate answer. For instance: 

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Start your response with “The answer is”…

    Or: 

Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Wrap your final answer in bold tags (<b></b>).
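With a fixed prefix like “The answer is”, the final answer can be pulled out with a trivial pattern instead of a follow-up API call. A minimal sketch, where the model output shown is illustrative rather than a real response:

```python
import re

# Sketch: the "The answer is" prefix requested in the prompt makes extraction
# a one-line regex instead of a second paid API round trip.
# The model output below is illustrative, not a real model response.
model_output = "The answer is 5. I started with 2 apples, ate 1, then bought 4 more."

match = re.search(r"The answer is\s+(\d+)", model_output)
answer = int(match.group(1)) if match else None
print(answer)  # 5
```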

    “The way the question is asked can reduce the effort or cost in getting to the desired answer,” said Emerson. He also pointed out that techniques like few-shot prompting (providing a few examples of what the user is looking for) can help produce quicker outputs. 

    One danger is not knowing when to use sophisticated techniques like chain-of-thought (CoT) prompting (generating answers in steps) or self-refinement, which directly encourage models to produce many tokens or go through several iterations when generating responses, Emerson pointed out. 

Not every query requires a model to analyze and re-analyze before providing an answer, he emphasized; models can be perfectly capable of answering correctly when instructed to respond directly. Additionally, incorrect API configurations (such as defaulting to a high reasoning effort on a model like OpenAI's o3) will incur higher costs when a lower-effort, cheaper request would suffice.
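One way to avoid over-paying for reasoning is to pick a cheaper effort setting before the request is sent. The heuristic below is a naive placeholder assumption, not a production classifier; a real router might use a small model or query metadata instead:

```python
# Sketch: route a request to a cheaper reasoning-effort setting when the
# query looks simple. The length/question-count heuristic is a placeholder
# assumption, not a recommended production classifier.
def pick_reasoning_effort(prompt: str) -> str:
    # Short prompts with at most one question rarely need deep reasoning.
    if len(prompt) < 200 and prompt.count("?") <= 1:
        return "low"
    return "high"

# The chosen value would then be passed along with the API request,
# e.g. as the reasoning-effort setting on a reasoning-capable model.
print(pick_reasoning_effort("What is 2 + 2?"))  # low
```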

    “With longer contexts, users can also be tempted to use an ‘everything but the kitchen sink’ approach, where you dump as much text as possible into a model context in the hope that doing so will help the model perform a task more accurately,” said Emerson. “While more context can help models perform tasks, it isn’t always the best or most efficient approach.”
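An alternative to the kitchen-sink approach is to keep only the chunks most relevant to the query before building the context. The word-overlap scoring below is a crude stand-in for a real retriever or embedding similarity, and the documents are invented examples:

```python
# Sketch: instead of dumping every document into the model context
# ("kitchen sink"), keep only the chunks most relevant to the query.
# Word-overlap scoring is a crude placeholder for a real retriever.
def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    q_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

docs = [
    "Refund policy: refunds are issued within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 5-7 business days.",
]
print(top_chunks("How do I get a refund for my purchase?", docs, k=1))
```

Sending one relevant chunk instead of three cuts input tokens while keeping the information the model actually needs.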

    Evolution to prompt ops

    It’s no big secret that AI-optimized infrastructure can be hard to come by these days; IDC’s Del Prete pointed out that enterprises must be able to minimize the amount of GPU idle time and fill more queries into idle cycles between GPU requests. 

“How do I squeeze more out of these very, very precious commodities?” he noted. “Because I’ve got to get my system utilization up, because I just don’t have the benefit of simply throwing more capacity at the problem.”

Prompt ops can go a long way towards addressing this challenge, as it ultimately manages the lifecycle of the prompt. While prompt engineering is about the quality of the prompt, prompt ops is where you repeat and refine it, Del Prete explained.

    “It’s more orchestration,” he said. “I think of it as the curation of questions and the curation of how you interact with AI to make sure you’re getting the most out of it.” 

Models tend to get “fatigued,” cycling in loops where the quality of outputs degrades, he said. Prompt ops helps manage, measure, monitor and tune prompts. “I think when we look back three or four years from now, it’s going to be a whole discipline. It’ll be a skill.”

While it’s still very much an emerging field, early providers include QueryPal, Promptable, Rebuff and TrueLens. As prompt ops evolves, these platforms will continue to iterate, improve and provide real-time feedback to give users more capacity to tune prompts over time, Del Prete noted.

    Eventually, he predicted, agents will be able to tune, write and structure prompts on their own. “The level of automation will increase, the level of human interaction will decrease, you’ll be able to have agents operating more autonomously in the prompts that they’re creating.”

    Common prompting mistakes

    Until prompt ops is fully realized, there is ultimately no perfect prompt. Some of the biggest mistakes people make, according to Emerson: 

    • Not being specific enough about the problem to be solved. This includes how the user wants the model to provide its answer, what should be considered when responding, constraints to take into account and other factors. “In many settings, models need a good amount of context to provide a response that meets users’ expectations,” said Emerson.
    • Not taking into account the ways a problem can be simplified to narrow the scope of the response. Should the answer be within a certain range (0 to 100)? Should the answer be phrased as a multiple choice problem rather than something open-ended? Can the user provide good examples to contextualize the query? Can the problem be broken into steps for separate and simpler queries?
    • Not taking advantage of structure. LLMs are very good at pattern recognition, and many can understand code. While using bullet points, itemized lists or bold indicators (****) may seem “a bit cluttered” to human eyes, Emerson noted, these callouts can be beneficial for an LLM. Asking for structured outputs (such as JSON or Markdown) can also help when users are looking to process responses automatically. 
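Emerson's last point, that structured outputs ease automatic processing, can be sketched as follows. The model reply shown is illustrative, not a real API response:

```python
import json

# Sketch: asking the model for JSON makes automatic post-processing trivial.
# The prompt and reply below are illustrative examples, not real API traffic.
prompt = (
    "Classify the sentiment of the review as positive, negative, or neutral. "
    'Respond only with JSON: {"sentiment": "...", "confidence": 0.0}'
)
model_reply = '{"sentiment": "positive", "confidence": 0.92}'

parsed = json.loads(model_reply)  # no regex or follow-up query needed
print(parsed["sentiment"], parsed["confidence"])
```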

    There are many other factors to consider in maintaining a production pipeline, based on engineering best practices, Emerson noted. These include: 

    • Making sure that the throughput of the pipeline remains consistent; 
    • Monitoring the performance of the prompts over time (potentially against a validation set);
    • Setting up tests and early warning detection to identify pipeline issues.
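The monitoring and early-warning items above can be sketched as a small regression check against a validation set. Here `run_model` is a stub standing in for a real LLM call, and the 0.9 threshold is an assumed value:

```python
# Sketch of an early-warning check: score a prompt against a small validation
# set and flag it when accuracy drifts below a threshold.
VALIDATION_SET = [
    ("If I have 2 apples and buy 4 after eating 1, how many?", "5"),
    ("What is 3 * 7?", "21"),
]

def run_model(prompt: str, question: str) -> str:
    # Stub: a real pipeline would call the model API here with the prompt
    # template applied to the question.
    answers = {
        "If I have 2 apples and buy 4 after eating 1, how many?": "5",
        "What is 3 * 7?": "21",
    }
    return answers[question]

def validate_prompt(prompt: str, threshold: float = 0.9) -> bool:
    correct = sum(run_model(prompt, q) == gold for q, gold in VALIDATION_SET)
    accuracy = correct / len(VALIDATION_SET)
    return accuracy >= threshold  # False -> raise an alert upstream

print(validate_prompt("Answer directly. Start with 'The answer is'."))
```

Running this check on every prompt change (or on a schedule) turns silent quality drift into a visible pipeline failure.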

    Users can also take advantage of tools designed to support the prompting process. For instance, the open-source DSPy can automatically configure and optimize prompts for downstream tasks based on a few labeled examples. While this may be a fairly sophisticated example, there are many other offerings (including some built into tools like ChatGPT, Google and others) that can assist in prompt design. 

    And ultimately, Emerson said, “I think one of the simplest things users can do is to try to stay up-to-date on effective prompting approaches, model developments and new ways to configure and interact with models.” 

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he delivers clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.
