Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200

    vivo V70 – Top 7 Flagship Features You Will Love

    Apple iPad Air with M4 Officially Launches in Malaysia From RM2,799

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      Weighing up the enterprise risks of neocloud providers

      March 3, 2026

      A stolen Gemini API key turned a $180 bill into $82,000 in two days

      March 3, 2026

      These ultra-budget laptops “include” 1.2TB storage, but most of it is OneDrive trial space

      March 1, 2026

      FCC approves the merger of cable giants Cox and Charter

      February 28, 2026

      Finding value with AI and Industry 5.0 transformation

      February 28, 2026
    • Crypto

      Strait of Hormuz Shutdown Shakes Asian Energy Markets

      March 3, 2026

      Wall Street’s Inflation Alarm From Iran — What It Means for Crypto

      March 3, 2026

      Ethereum Price Prediction: What To Expect From ETH In March 2026

      March 3, 2026

      Was Bitcoin Hijacked? How Institutional Interests Shaped Its Narrative Since 2015

      March 3, 2026

      XRP Whales Now Hold 83.7% of All Supply – What’s Next For Price?

      March 3, 2026
    • Technology

      Spotify’s new feature makes it easier to find popular audiobooks

      March 3, 2026

      This portable JBL Grip Bluetooth speaker is so good at 20% off

      March 3, 2026

      ‘AI’ could dox your anonymous posts

      March 3, 2026

      Microsoft says new Teams location feature isn’t for ’employee tracking’

      March 3, 2026

      OpenAI got ‘sloppy’ about the wrong thing

      March 3, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»Confidence in agentic AI: Why eval infrastructure must come first
    Technology

    Confidence in agentic AI: Why eval infrastructure must come first

    TechAiVerseBy TechAiVerseJuly 3, 2025No Comments6 Mins Read4 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    Confidence in agentic AI: Why eval infrastructure must come first
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    Confidence in agentic AI: Why eval infrastructure must come first

    July 2, 2025 8:41 AM

    Thys Waanders, Shailesh Nalawadi, Shawn Malhotra, Joanne Chen,

    As AI agents enter real-world deployment, organizations are under pressure to define where they belong, how to build them effectively, and how to operationalize them at scale. At VentureBeat’s Transform 2025, tech leaders gathered to talk about how they’re transforming their business with agents: Joanne Chen, general partner at Foundation Capital; Shailesh Nalawadi, VP of project management with Sendbird; Thys Waanders, SVP of AI transformation at Cognigy; and Shawn Malhotra, CTO, Rocket Companies.

    A few top agentic AI use cases

    “The initial attraction of any of these deployments for AI agents tends to be around saving human capital — the math is pretty straightforward,” Nalawadi said. “However, that undersells the transformational capability you get with AI agents.”

    At Rocket, AI agents have proven to be powerful tools in increasing website conversion.

    “We’ve found that with our agent-based experience, the conversational experience on the website, clients are three times more likely to convert when they come through that channel,” Malhotra said.

    But that’s just scratching the surface. For instance, a Rocket engineer built an agent in just two days to automate a highly specialized task: calculating transfer taxes during mortgage underwriting.

    “That two days of effort saved us a million dollars a year in expense,” Malhotra said. “In 2024, we saved more than a million team member hours, mostly off the back of our AI solutions. That’s not just saving expense. It’s also allowing our team members to focus their time on people making what is often the largest financial transaction of their life.”

    Agents are essentially supercharging individual team members. That million hours saved isn’t the entirety of someone’s job replicated many times. It’s fractions of the job that are things employees don’t enjoy doing, or weren’t adding value to the client. And that million hours saved gives Rocket the capacity to handle more business.

    “Some of our team members were able to handle 50% more clients last year than they were the year before,” Malhotra added. “It means we can have higher throughput, drive more business, and again, we see higher conversion rates because they’re spending the time understanding the client’s needs versus doing a lot of more rote work that the AI can do now.”

    Tackling agent complexity

    “Part of the journey for our engineering teams is moving from the mindset of software engineering – write once and test it and it runs and gives the same answer 1,000 times – to the more probabilistic approach, where you ask the same thing of an LLM and it gives different answers through some probability,” Nalawadi said. “A lot of it has been bringing people along. Not just software engineers, but product managers and UX designers.”

    What’s helped is that LLMs have come a long way, Waanders said. If they built something 18 months or two years ago, they really had to pick the right model, or the agent would not perform as expected. Now, he says, we’re now at a stage where most of the mainstream models behave very well. They’re more predictable. But today the challenge is combining models, ensuring responsiveness, orchestrating the right models in the right sequence and weaving in the right data.

    “We have customers that push tens of millions of conversations per year,” Waanders said. “If you automate, say, 30 million conversations in a year, how does that scale in the LLM world? That’s all stuff that we had to discover, simple stuff, from even getting the model availability with the cloud providers. Having enough quota with a ChatGPT model, for example. Those are all learnings that we had to go through, and our customers as well. It’s a brand-new world.”

    A layer above orchestrating the LLM is orchestrating a network of agents, Malhotra said. A conversational experience has a network of agents under the hood, and the orchestrator is deciding which agent to farm the request out to from those available.

    “If you play that forward and think about having hundreds or thousands of agents who are capable of different things, you get some really interesting technical problems,” he said. “It’s becoming a bigger problem, because latency and time matter. That agent routing is going to be a very interesting problem to solve over the coming years.”

    Tapping into vendor relationships

    Up to this point, the first step for most companies launching agentic AI has been building in-house, because specialized tools didn’t yet exist. But you can’t differentiate and create value by building generic LLM infrastructure or AI infrastructure, and you need specialized expertise to go beyond the initial build, and debug, iterate, and improve on what’s been built, as well as maintain the infrastructure.

    “Often we find the most successful conversations we have with prospective customers tend to be someone who’s already built something in-house,” Nalawadi said. “They quickly realize that getting to a 1.0 is okay, but as the world evolves and as the infrastructure evolves and as they need to swap out technology for something new, they don’t have the ability to orchestrate all these things.”

    Preparing for agentic AI complexity

    Theoretically, agentic AI will only grow in complexity — the number of agents in an organization will rise, and they’ll start learning from each other, and the number of use cases will explode. How can organizations prepare for the challenge?

    “It means that the checks and balances in your system will get stressed more,” Malhotra said. “For something that has a regulatory process, you have a human in the loop to make sure that someone is signing off on this. For critical internal processes or data access, do you have observability? Do you have the right alerting and monitoring so that if something goes wrong, you know it’s going wrong? It’s doubling down on your detection, understanding where you need a human in the loop, and then trusting that those processes are going to catch if something does go wrong. But because of the power it unlocks, you have to do it.”

    So how can you have confidence that an AI agent will behave reliably as it evolves?

    “That part is really difficult if you haven’t thought about it at the beginning,” Nalawadi said. “The short answer is, before you even start building it, you should have an eval infrastructure in place. Make sure you have a rigorous environment in which you know what good looks like, from an AI agent, and that you have this test set. Keep referring back to it as you make improvements. A very simplistic way of thinking about eval is that it’s the unit tests for your agentic system.”

    The problem is, it’s non-deterministic, Waanders added. Unit testing is critical, but the biggest challenge is you don’t know what you don’t know — what incorrect behaviors an agent could possibly display, how it might react in any given situation.

    “You can only find that out by simulating conversations at scale, by pushing it under thousands of different scenarios, and then analyzing how it holds up and how it reacts,” Waanders said.

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleTransform 2025: Why observability is critical for AI agent ecosystems
    Next Article HOLY SMOKES! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Spotify’s new feature makes it easier to find popular audiobooks

    March 3, 2026

    This portable JBL Grip Bluetooth speaker is so good at 20% off

    March 3, 2026

    ‘AI’ could dox your anonymous posts

    March 3, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025703 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025286 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025164 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025124 Views
    Don't Miss
    Gadgets March 4, 2026

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200 Honda Malaysia has officially launched the…

    vivo V70 – Top 7 Flagship Features You Will Love

    Apple iPad Air with M4 Officially Launches in Malaysia From RM2,799

    Apple Launches iPhone 17e in Malaysia from RM2,999

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Honda CR-V Hybrid Lineup Expanded in Malaysia From RM178,200

    March 4, 20262 Views

    vivo V70 – Top 7 Flagship Features You Will Love

    March 4, 20262 Views

    Apple iPad Air with M4 Officially Launches in Malaysia From RM2,799

    March 4, 20262 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.