    From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

    By TechAiVerse, June 28, 2025

    June 28, 2025 12:05 PM

    VentureBeat/Midjourney


    Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: Build a model that could look at a photo of a laptop and identify any physical damage — things like cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.

    Along the way, we ran into issues with hallucinations, unreliable outputs and images that were not even laptops. To solve these, we ended up applying an agentic framework in an atypical way — not for task automation, but to improve the model’s performance.

    In this post, we will walk through what we tried, what didn’t work and how a combination of approaches eventually helped us build something reliable.

    Where we started: Monolithic prompting

    Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.
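As a rough illustration of what monolithic prompting can look like, the sketch below sends one prompt covering every damage type and parses a JSON reply. The prompt wording, the `DAMAGE_TYPES` list and the `call_model` callable are assumptions made for this example; the article does not say which model or prompt was used.

```python
import json
from typing import Callable

# Hypothetical damage categories; the article mentions these only as examples.
DAMAGE_TYPES = ["cracked screen", "missing keys", "broken hinge"]


def build_monolithic_prompt() -> str:
    """Compose one prompt that asks about every damage type at once."""
    listed = ", ".join(DAMAGE_TYPES)
    return (
        "You are inspecting a photo of a laptop. "
        f"Report any visible damage from this list: {listed}. "
        'Reply with JSON of the form {"damage": ["..."]}.'
    )


def detect_damage(image_bytes: bytes,
                  call_model: Callable[[str, bytes], str]) -> list[str]:
    """Send the image and prompt in a single call and parse the reply.

    call_model is a stand-in for whichever image-capable LLM client is used.
    """
    reply = call_model(build_monolithic_prompt(), image_bytes)
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return []  # Unparseable output is treated as "nothing reported".
    damage = parsed.get("damage", []) if isinstance(parsed, dict) else []
    if not isinstance(damage, list):
        return []
    # Keep only labels that were asked about; the model may invent others.
    return [d for d in damage if d in DAMAGE_TYPES]
```

Filtering the reply against a fixed label list catches invented labels, but it cannot catch a valid label applied to damage that is not there, and nothing here checks that the photo shows a laptop at all.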

    We ran into three major issues early on:

    • Hallucinations: The model would sometimes invent damage that did not exist or mislabel what it was seeing.
    • Junk image detection: It had no reliable way to flag images that were not even laptops. Pictures of desks, walls or people occasionally slipped through and received nonsensical damage reports.
    • Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

    This was the point when it became clear we would need to iterate.

    First fix: Mixing image resolutions

    One thing we noticed was how much image quality affected the model’s output. Users uploaded all kinds of images ranging from sharp and high-resolution to blurry. This led us to refer to research highlighting how image resolution impacts deep learning models.

    We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core issues of hallucination and junk image handling persisted.
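One way to build such a mixed-resolution set is to degrade a share of the sharp images. The article does not describe how its low-resolution images were obtained, so the box-filter downsampling, the `low_res_fraction` parameter and the nested-list image format below are all assumptions chosen to keep the sketch dependency-free.

```python
import random

Image = list[list[float]]  # grayscale pixels, rows of equal length


def downsample(image: Image, factor: int) -> Image:
    """Average each factor x factor block into one pixel (box filter)."""
    height = len(image) // factor
    width = len(image[0]) // factor
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def mix_resolutions(images: list[Image], low_res_fraction: float = 0.5,
                    factor: int = 2, seed: int = 0) -> list[Image]:
    """Return the dataset with roughly low_res_fraction of images degraded."""
    rng = random.Random(seed)  # Seeded so the split is reproducible.
    return [
        downsample(img, factor) if rng.random() < low_res_fraction else img
        for img in images
    ]
```

In a real pipeline an image library would handle resizing and blur; the point is only that the same mix should feed both training and evaluation.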

    The multimodal detour: Text-only LLM goes multimodal

    Encouraged by recent experiments in combining image captioning with text-only LLMs (like the technique covered in The Batch, where captions are generated from images and then interpreted by a language model), we decided to give it a try.

    Here’s how it works:

    • The LLM begins by generating multiple possible captions for an image. 
    • Another model, called a multimodal embedding model, checks how well each caption fits the image. In this case, we used SigLIP to score the similarity between the image and the text.
    • The system keeps the top few captions based on these scores.
    • The LLM uses those top captions to write new ones, trying to get closer to what the image actually shows.
    • It repeats this process until the captions stop improving, or it hits a set limit.
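The loop described above can be sketched as follows. Both callables are hypothetical stand-ins: `propose` plays the role of the LLM and `score` plays the role of an image-text similarity model such as SigLIP, already bound to one image. Stopping when the best score no longer rises is one possible reading of "until the captions stop improving".

```python
from typing import Callable


def refine_captions(
    propose: Callable[[list[str]], list[str]],
    score: Callable[[str], float],
    keep_top: int = 3,
    max_rounds: int = 5,
) -> list[str]:
    """Iteratively generate, score and keep the best captions.

    propose(best_so_far) returns new candidate captions; the list is
    empty on the first round. score(caption) returns how well a caption
    matches the image (higher is better).
    """
    best: list[str] = []
    best_score = float("-inf")
    for _ in range(max_rounds):
        # Pool the surviving captions with the newly proposed ones.
        candidates = set(best) | set(propose(best))
        ranked = sorted(candidates, key=score, reverse=True)[:keep_top]
        top_score = score(ranked[0]) if ranked else float("-inf")
        if top_score <= best_score:
            break  # Captions stopped improving.
        best, best_score = ranked, top_score
    return best
```

Note that the scorer only measures how well text matches the image. A caption that mentions plausible but nonexistent damage can still score well, which is consistent with the hallucination problem described next.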

    While clever in theory, this approach introduced new problems for our use case:

    • Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.
    • Incomplete coverage: Even with multiple captions, some issues were missed entirely.
    • Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

    It was an interesting experiment, but ultimately not a solution.

    A creative use of agentic frameworks

    This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered if breaking down the image interpretation task into smaller, specialized agents might help.

    We built an agentic framework structured like this:

    • Orchestrator agent: It checked the image and identified which laptop components were visible (screen, keyboard, chassis, ports).
    • Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.
    • Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.

    This modular, task-driven approach produced much more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent’s task was simple and focused enough to control quality well.
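A minimal sketch of this structure is shown below. Every name in it is hypothetical, each agent is reduced to a plain callable that would in practice wrap a narrowly scoped model prompt, and running the junk check before anything else is an assumption rather than something the article states.

```python
from dataclasses import dataclass, field
from typing import Callable

JunkAgent = Callable[[bytes], bool]            # True if the image shows a laptop
Orchestrator = Callable[[bytes], list[str]]    # names of visible components
ComponentAgent = Callable[[bytes], list[str]]  # damage found on one component


@dataclass
class Report:
    is_laptop: bool
    findings: dict[str, list[str]] = field(default_factory=dict)


def inspect_laptop(image: bytes, is_laptop: JunkAgent,
                   find_components: Orchestrator,
                   component_agents: dict[str, ComponentAgent]) -> Report:
    """Run junk detection, then one focused agent per visible component."""
    if not is_laptop(image):
        return Report(is_laptop=False)  # Skip damage checks for junk images.
    report = Report(is_laptop=True)
    for component in find_components(image):
        agent = component_agents.get(component)
        if agent is None:
            continue  # No agent covers this component.
        report.findings[component] = agent(image)
    return report
```

Because findings are keyed by component, each reported issue can be traced back to the single agent that produced it, which is where the explainability comes from.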

    The blind spots: Trade-offs of an agentic approach

    As effective as this was, it was not perfect. Two main limitations showed up:

    • Increased latency: Running multiple sequential agents added to the total inference time.
    • Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

    We needed a way to balance precision with coverage.

    The hybrid solution: Combining agentic and monolithic approaches

    To bridge the gaps, we created a hybrid system:

    1. The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the number of agents to the most essential ones to keep latency down.
    2. Then, a monolithic image LLM prompt scanned the image for anything else the agents might have missed.
    3. Finally, we fine-tuned the model using a curated set of images for high-priority use cases, like frequently reported damage scenarios, to further improve accuracy and reliability.

    This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.
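The first two steps might be wired together roughly like this. The two callables are hypothetical stand-ins for the agentic pipeline and the monolithic prompt, and the "other" bucket and de-duplication rule are illustrative choices; fine-tuning (step 3) happens offline and is not shown.

```python
from typing import Callable, Optional

Findings = dict[str, list[str]]


def hybrid_inspect(
    image: bytes,
    agentic_pass: Callable[[bytes], Optional[Findings]],
    monolithic_pass: Callable[[bytes], list[str]],
) -> Optional[Findings]:
    """Combine targeted agent findings with a broad catch-all scan.

    agentic_pass returns per-component findings, or None for a junk image.
    monolithic_pass returns free-form damage labels from a single prompt.
    """
    findings = agentic_pass(image)
    if findings is None:
        return None  # Junk image: nothing further to scan.
    already_reported = {
        label for labels in findings.values() for label in labels
    }
    # Keep only what the agents did not already report.
    extras = [d for d in monolithic_pass(image) if d not in already_reported]
    merged = {component: list(labels) for component, labels in findings.items()}
    if extras:
        merged["other"] = extras  # Issues no agent was tasked with finding.
    return merged
```

Keeping the catch-all results in their own bucket separates the less reliable monolithic output from the agent findings, so downstream consumers can treat the two with different levels of trust.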

    What we learned

    A few things became clear by the time we wrapped up this project:

    • Agentic frameworks are more versatile than they get credit for: While they are usually associated with workflow management, we found they could meaningfully boost model performance when applied in a structured, modular way.
    • Blending different approaches beats relying on just one: The combination of precise, agent-based detection alongside the broad coverage of LLMs, plus a bit of fine-tuning where it mattered most, gave us far more reliable outcomes than any single method on its own.
    • Visual models are prone to hallucinations: Even the more advanced setups can jump to conclusions or see things that are not there. It takes a thoughtful system design to keep those mistakes in check.
    • Image quality variety makes a difference: Training and testing with both clear, high-resolution images and everyday, lower-quality ones helped the model stay resilient when faced with unpredictable, real-world photos.
    • You need a way to catch junk images: A dedicated check for junk or unrelated pictures was one of the simplest changes we made, and it had an outsized impact on overall system reliability.

    Final thoughts

    What started as a simple idea, using an LLM prompt to detect physical damage in laptop images, quickly turned into a much deeper experiment in combining different AI techniques to tackle unpredictable, real-world problems. Along the way, we realized that some of the most useful tools were ones not originally designed for this type of work.

    Agentic frameworks, often seen as workflow utilities, proved surprisingly effective when repurposed for tasks like structured damage detection and image filtering. With a bit of creativity, they helped us build a system that was not just more accurate, but easier to understand and manage in practice.

    Shruti Tiwari is an AI product manager at Dell Technologies.

    Vadiraj Kulkarni is a data scientist at Dell Technologies.
