Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    This HP mini PC delivers big power for $350

    Upgrade to Windows 11 Pro for $13 and feel the difference immediately

    This slim 1440p portable laptop monitor is 30% off

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026

      To avoid accusations of AI cheating, college students are turning to AI

      January 29, 2026

      ChatGPT can embrace authoritarian ideas after just one prompt, researchers say

      January 24, 2026
    • Business

      New VoidLink malware framework targets Linux cloud servers

      January 14, 2026

      Nvidia Rubin’s rack-scale encryption signals a turning point for enterprise AI security

      January 13, 2026

      How KPMG is redefining the future of SAP consulting on a global scale

      January 10, 2026

      Top 10 cloud computing stories of 2025

      December 22, 2025

      Saudia Arabia’s STC commits to five-year network upgrade programme with Ericsson

      December 18, 2025
    • Crypto

      Arthur Hayes Attributes Bitcoin Crash to ETF-Linked Dealer Hedging

      February 8, 2026

      Monero XMR Attempts First Recovery in a Month, But Death Cross Risk Looms

      February 8, 2026

      HBAR Price Eyes a Potential 30% Rally – Here’s What the Charts are Signalling 

      February 8, 2026

      Bitcoin Mining Difficulty Hits Its Biggest Drop Since 2021 China Ban

      February 8, 2026

      How Severe Is This Bitcoin Bear Market and Where Is Price Headed Next?

      February 8, 2026
    • Technology

      This HP mini PC delivers big power for $350

      February 9, 2026

      Upgrade to Windows 11 Pro for $13 and feel the difference immediately

      February 9, 2026

      This slim 1440p portable laptop monitor is 30% off

      February 9, 2026

      If you buy Razer’s insane $1337 mouse, I will be very disappointed in you

      February 9, 2026

      Nvidia is reportedly skipping consumer GPUs in 2026. Thanks, AI

      February 9, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»Evaluating Agents
    Technology

    Evaluating Agents

    TechAiVerseBy TechAiVerseSeptember 4, 2025No Comments3 Mins Read8 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    Evaluating Agents

    “Models constantly change and improve but evals persist”

    Look at the data

    No amount of evals will replace the need to look at the data, once you have a evals good coverage you’ll be able to decrease the time but it’ll be always a must to just look at the agent traces to identify possible issues or things to improve.

    Starting, end to end evals

    You must create evals for your agents, stop relying solely on manual testing.

    Not sure where to start?

    Add e2e evals, define a success criteria (did the agent meet the user’s goal?) and make the evals output a simple yes/no value.

    This is much better than no evals.

    By performing simple end to end agent evaluations you can quickly manage to:
    – identify problematic edge cases
    – update, trim and refine the agent prompts
    – make sure you are not breaking the already working cases
    – compare the performance of the current llm model vs. cheaper ones

    N -1 evals

    Once you created the e2e evals you can move on with “N – 1” evals, that is, evals that need to “simulate” previous interactions between system and user.

    Suppose that either by looking at the data or by running a set of e2e evals you find that there is a problem when the user asks for the brand open stores in his area. Well, it’d be better to create an eval to directly improve this, but if you keep doing it with the e2e evals you won’t be able to always reproduce the error and your evals will take too much time and will cost too much money.

    It’d be much better to “simulate” the previous interactions and then get to the point.

    There’s one issue with this, you’ll have to be careful to keep the “N – 1” interactions updated whenever you make some changes because you will be “simulating” something that will never happen again in your agent.

    Checkpoints

    It’s really difficult and time intensive to evaluate agents outputs when you are trying to validate complex conversation patterns that you want the LLM’s to strictly follow.

    I usually put “checkpoints” inside the prompts, words that I ask the llm to output verbatim.

    This allows me to simply make some evals that check for exact strings. If at some point of the conversation the string is not present, I can pretty much know that the system is not working as expected.

    External tools

    Tools can help you by simplifying the setup/infra and maybe giving you a nice interface, but you still have to look at the data and build the specific evaluations for your use case.

    Don’t rely solely on standard evals, build your own.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleEvidence that AI is destroying jobs for young people
    Next Article ReMarkable Paper Pro Move
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    This HP mini PC delivers big power for $350

    February 9, 2026

    Upgrade to Windows 11 Pro for $13 and feel the difference immediately

    February 9, 2026

    This slim 1440p portable laptop monitor is 30% off

    February 9, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025660 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025248 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025148 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025111 Views
    Don't Miss
    Technology February 9, 2026

    This HP mini PC delivers big power for $350

    This HP mini PC delivers big power for $350 Image: StackCommerce TL;DR: A small but powerful HP…

    Upgrade to Windows 11 Pro for $13 and feel the difference immediately

    This slim 1440p portable laptop monitor is 30% off

    If you buy Razer’s insane $1337 mouse, I will be very disappointed in you

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    This HP mini PC delivers big power for $350

    February 9, 20263 Views

    Upgrade to Windows 11 Pro for $13 and feel the difference immediately

    February 9, 20264 Views

    This slim 1440p portable laptop monitor is 30% off

    February 9, 20264 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    This new Roomba finally solves the big problem I have with robot vacuums

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.