    The rise of prompt ops: Tackling hidden AI costs from bad inputs and context bloat


    June 27, 2025 1:00 PM

    This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

    Model providers continue to roll out increasingly sophisticated large language models (LLMs) with longer context windows and enhanced reasoning capabilities. 

    This allows models to process and “think” more, but it also increases compute: The more a model takes in and puts out, the more energy it expends and the higher the costs. 

    Couple this with all the tinkering involved with prompting — it can take a few tries to get to the intended result, and sometimes the question at hand simply doesn’t need a model that can think like a PhD — and compute spend can get out of control. 

    This is giving rise to prompt ops, a whole new discipline in the dawning age of AI. 

    “Prompt engineering is kind of like writing, the actual creating, whereas prompt ops is like publishing, where you’re evolving the content,” Crawford Del Prete, IDC president, told VentureBeat. “The content is alive, the content is changing, and you want to make sure you’re refining that over time.”

    The challenge of compute use and cost

    Compute use and cost are two “related but separate concepts” in the context of LLMs, explained David Emerson, applied scientist at the Vector Institute. Generally, the price users pay scales with both the number of input tokens (what the user prompts) and the number of output tokens (what the model delivers). However, users are not charged for behind-the-scenes actions like meta-prompts, steering instructions or retrieval-augmented generation (RAG). 

    While longer context allows models to process much more text at once, it directly translates into significantly more FLOPS (a measurement of compute power), he explained. Some aspects of transformer models even scale quadratically with input length if not well managed. Unnecessarily long responses can also slow down processing time and add the compute and engineering cost of building and maintaining algorithms to post-process responses into the answer users were hoping for.
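
    To make the pricing point concrete, here is a minimal sketch of how spend scales with input and output token counts; the per-token prices are placeholders, not any provider’s actual rates.

```python
# Rough sketch of how LLM API spend scales with input and output tokens.
# The per-1K-token prices below are illustrative placeholders only.

def estimate_cost(input_tokens: int, output_tokens: int,
                  price_in_per_1k: float = 0.005,
                  price_out_per_1k: float = 0.015) -> float:
    """Return the estimated dollar cost of a single request."""
    return (input_tokens / 1000) * price_in_per_1k \
        + (output_tokens / 1000) * price_out_per_1k

# A terse prompt with a short answer vs. a context-stuffed prompt with a
# verbose answer: same question, very different bill at scale.
lean = estimate_cost(input_tokens=200, output_tokens=50)
bloated = estimate_cost(input_tokens=20_000, output_tokens=1_500)

print(f"lean request:    ${lean:.4f}")
print(f"bloated request: ${bloated:.4f}")
print(f"difference over 1M requests: ${(bloated - lean) * 1_000_000:,.0f}")
```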

    Typically, longer context environments incentivize providers to deliberately deliver verbose responses, said Emerson. For example, many heavier reasoning models (o3 or o1 from OpenAI, for example) will often provide long responses to even simple questions, incurring heavy computing costs. 

    Here’s an example:

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have?

    Output: If I eat 1, I only have 1 left. I would have 5 apples if I buy 4 more.

    The model not only generated more tokens than it needed to, it buried its answer. An engineer may then have to design a programmatic way to extract the final answer or ask follow-up questions like ‘What is your final answer?’ that incur even more API costs. 
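
    As a rough illustration of the glue code this forces, here is a minimal sketch that scrapes the final count out of the verbose apple answer above; the regex is deliberately simplistic and stands in for whatever extraction logic a team would actually have to maintain.

```python
import re

verbose_output = "If I eat 1, I only have 1 left. I would have 5 apples if I buy 4 more."

def extract_final_count(text: str) -> int | None:
    """Pull the last 'N apples' figure out of a rambling response.

    Brittle by design: this is exactly the kind of extra glue code (and
    extra failure mode) that a tighter prompt would make unnecessary.
    """
    matches = re.findall(r"(\d+)\s+apples", text)
    return int(matches[-1]) if matches else None

print(extract_final_count(verbose_output))  # 5
```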

    Alternatively, the prompt could be redesigned to guide the model to produce an immediate answer. For instance: 

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Start your response with “The answer is”…

    Or: 

    Input: Answer the following math problem. If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have? Wrap your final answer in bold tags.

    “The way the question is asked can reduce the effort or cost in getting to the desired answer,” said Emerson. He also pointed out that techniques like few-shot prompting (providing a few examples of what the user is looking for) can help produce quicker outputs. 
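
    As a sketch of what that few-shot framing might look like in practice (the worked examples below are illustrative, not drawn from Emerson):

```python
# Few-shot prompt: a couple of worked examples show the model the exact
# answer format expected, so it replies directly instead of rambling.
FEW_SHOT_PROMPT = """\
Answer the math problem. Reply with a single line: "The answer is N".

Q: I have 3 oranges and give away 2. How many oranges do I have?
A: The answer is 1

Q: I have 10 pens and buy 5 more. How many pens do I have?
A: The answer is 15

Q: If I have 2 apples and I buy 4 more at the store after eating 1, how many apples do I have?
A:"""

# Sent as-is to a completion endpoint, the examples both constrain the
# format and shorten the response, cutting output tokens per call.
print(FEW_SHOT_PROMPT)
```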

    One danger is not knowing when to use sophisticated techniques like chain-of-thought (CoT) prompting (generating answers in steps) or self-refinement, which directly encourage models to produce many tokens or go through several iterations when generating responses, Emerson pointed out. 

    Not every query requires a model to analyze and re-analyze before providing an answer, he emphasized; models can be perfectly capable of answering correctly when instructed to respond directly. Additionally, incorrect API configurations (such as calling OpenAI o3 with a high reasoning effort) will incur higher costs when a lower-effort, cheaper request would suffice.
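
    As a sketch of matching the configuration to the question, the snippet below assumes the OpenAI Python SDK and the reasoning_effort parameter exposed for o-series models; the model names are illustrative, and the exact parameters should be checked against the current API reference.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = ("If I have 2 apples and I buy 4 more at the store after eating 1, "
            "how many apples do I have? Start your response with \"The answer is\".")

# Simple question: a smaller model with low reasoning effort is usually enough.
cheap = client.chat.completions.create(
    model="o3-mini",            # illustrative model name
    reasoning_effort="low",
    messages=[{"role": "user", "content": question}],
)

# The same question routed to a heavier configuration costs far more for no gain.
expensive = client.chat.completions.create(
    model="o3",                 # illustrative model name
    reasoning_effort="high",
    messages=[{"role": "user", "content": question}],
)

print(cheap.choices[0].message.content)
print(expensive.choices[0].message.content)
```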

    “With longer contexts, users can also be tempted to use an ‘everything but the kitchen sink’ approach, where you dump as much text as possible into a model context in the hope that doing so will help the model perform a task more accurately,” said Emerson. “While more context can help models perform tasks, it isn’t always the best or most efficient approach.”
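
    One lightweight alternative to the kitchen-sink approach is to score candidate context chunks against the query and keep only the most relevant few before building the prompt. The sketch below uses crude keyword overlap as a stand-in for whatever retrieval or ranking method a team actually uses.

```python
import string

def keyword_overlap(query: str, chunk: str) -> int:
    """Crude relevance score: count query words that appear in the chunk."""
    strip = str.maketrans("", "", string.punctuation)
    q_words = set(query.lower().translate(strip).split())
    c_words = set(chunk.lower().translate(strip).split())
    return len(q_words & c_words)

def build_context(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Keep only the top_k most relevant chunks instead of dumping everything."""
    ranked = sorted(chunks, key=lambda c: keyword_overlap(query, c), reverse=True)
    return "\n\n".join(ranked[:top_k])

chunks = [
    "Invoices are processed within 30 days of receipt.",
    "Refund requests must include the original order number.",
    "The office cafeteria menu rotates weekly.",
]
query = "How do I request a refund for my order?"

# Fewer, better-targeted chunks mean fewer input tokens and less noise
# for the model to wade through.
print(build_context(query, chunks, top_k=1))
```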

    Evolution to prompt ops

    It’s no big secret that AI-optimized infrastructure can be hard to come by these days; IDC’s Del Prete pointed out that enterprises must be able to minimize the amount of GPU idle time and fill more queries into idle cycles between GPU requests. 

    “How do I squeeze more out of these very, very precious commodities?” he noted. “Because I’ve got to get my system utilization up, because I just don’t have the benefit of simply throwing more capacity at the problem.” 

    Prompt ops can go a long way towards addressing this challenge, as it ultimately manages the lifecycle of the prompt. While prompt engineering is about the quality of the prompt, prompt ops is where you iterate on it, Del Prete explained. 

    “It’s more orchestration,” he said. “I think of it as the curation of questions and the curation of how you interact with AI to make sure you’re getting the most out of it.” 

    Models can tend to get “fatigued,” cycling in loops where the quality of outputs degrades, he said. Prompt ops helps manage, measure, monitor and tune prompts. “I think when we look back three or four years from now, it’s going to be a whole discipline. It’ll be a skill.”

    While it’s still very much an emerging field, early providers include QueryPal, Promptable, Rebuff and TrueLens. As prompt ops evolves, these platforms will continue to iterate, improve and provide real-time feedback to give users more capacity to tune prompts over time, Del Prete noted.

    Eventually, he predicted, agents will be able to tune, write and structure prompts on their own. “The level of automation will increase, the level of human interaction will decrease, you’ll be able to have agents operating more autonomously in the prompts that they’re creating.”

    Common prompting mistakes

    Until prompt ops is fully realized, there is ultimately no perfect prompt. Some of the biggest mistakes people make, according to Emerson: 

    • Not being specific enough about the problem to be solved. This includes how the user wants the model to provide its answer, what should be considered when responding, constraints to take into account and other factors. “In many settings, models need a good amount of context to provide a response that meets users’ expectations,” said Emerson. 
    • Not taking into account the ways a problem can be simplified to narrow the scope of the response. Should the answer be within a certain range (0 to 100)? Should the answer be phrased as a multiple choice problem rather than something open-ended? Can the user provide good examples to contextualize the query? Can the problem be broken into steps for separate and simpler queries?
    • Not taking advantage of structure. LLMs are very good at pattern recognition, and many can understand code. While using bullet points, itemized lists or bold indicators (****) may seem “a bit cluttered” to human eyes, Emerson noted, these callouts can be beneficial for an LLM. Asking for structured outputs (such as JSON or Markdown) can also help when users are looking to process responses automatically, as in the sketch after this list. 
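
    As a brief sketch of the structured-output point above, asking explicitly for JSON turns downstream parsing into a one-liner; the prompt, fields and example response are illustrative.

```python
import json

# Prompt that leans on structure: explicit fields, explicit format.
STRUCTURED_PROMPT = """\
Classify the support ticket below.

Respond with only a JSON object containing these fields:
- "category": one of "billing", "technical", "other"
- "urgent": true or false

Ticket: "I was charged twice this month and need it fixed before payroll runs."
"""

# What a well-steered model might return (illustrative).
model_response = '{"category": "billing", "urgent": true}'

parsed = json.loads(model_response)
print(parsed["category"], parsed["urgent"])  # billing True
```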

    There are many other factors to consider in maintaining a production pipeline, based on engineering best practices, Emerson noted. These include: 

    • Making sure that the throughput of the pipeline remains consistent; 
    • Monitoring the performance of the prompts over time (potentially against a validation set);
    • Setting up tests and early warning detection to identify pipeline issues (a sketch of such a check follows this list).
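
    A minimal sketch of such a check, assuming a small labeled validation set; run_prompt is a hypothetical stand-in for however the pipeline actually calls the model.

```python
# Tiny regression check for a prompt: score it against a labeled validation
# set and warn if accuracy drops below a threshold.

VALIDATION_SET = [
    {"question": "2 apples, buy 4 more, eat 1. Total?", "expected": "5"},
    {"question": "10 pens, buy 5 more. Total?", "expected": "15"},
]

def run_prompt(question: str) -> str:
    """Hypothetical stand-in: in a real pipeline this would call the model."""
    return "5" if "apples" in question else "15"

def validate(threshold: float = 0.9) -> bool:
    correct = sum(
        1 for case in VALIDATION_SET
        if run_prompt(case["question"]).strip() == case["expected"]
    )
    accuracy = correct / len(VALIDATION_SET)
    print(f"prompt accuracy on validation set: {accuracy:.0%}")
    return accuracy >= threshold

if not validate():
    print("WARNING: prompt performance regressed; investigate before deploying.")
```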

    Users can also take advantage of tools designed to support the prompting process. For instance, the open-source DSPy can automatically configure and optimize prompts for downstream tasks based on a few labeled examples. While this may be a fairly sophisticated example, there are many other offerings (including some built into tools like ChatGPT, Google and others) that can assist in prompt design. 
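
    For a sense of what that looks like in practice, here is a minimal DSPy sketch; the model name and metric are illustrative, and optimizer class names may differ between DSPy versions.

```python
import dspy  # pip install dspy

# Configure a backing model (the name is illustrative).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A simple question-answering module.
qa = dspy.Predict("question -> answer")

# A few labeled examples for the optimizer to learn from.
trainset = [
    dspy.Example(question="2 apples, buy 4 more, eat 1. How many?", answer="5").with_inputs("question"),
    dspy.Example(question="10 pens, buy 5 more. How many?", answer="15").with_inputs("question"),
]

def exact_match(example, prediction, trace=None):
    return example.answer.strip() == prediction.answer.strip()

# The optimizer searches for demonstrations that improve the metric,
# effectively tuning the prompt on the user's behalf.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
optimized_qa = optimizer.compile(qa, trainset=trainset)

print(optimized_qa(question="3 oranges, give away 2. How many?").answer)
```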

    And ultimately, Emerson said, “I think one of the simplest things users can do is to try to stay up-to-date on effective prompting approaches, model developments and new ways to configure and interact with models.” 
