    New vision model from Cohere runs on two GPUs, beats top-tier VLMs on visual tasks

    By TechAiVerse, August 2, 2025

    August 1, 2025 3:05 PM

    Image credit: VentureBeat with DALL-E 3


    The growth of Deep Research features and other AI-powered analysis tools has given rise to more models and services that aim to simplify that process and read more of the documents businesses actually use.

    Canadian AI company Cohere is banking on its models, including a newly released visual model, to make the case that Deep Research features should also be optimized for enterprise use cases. 

    The company has released Command A Vision, a visual model specifically targeting enterprise use cases, built on the back of its Command A model. The 112 billion parameter model can “unlock valuable insights from visual data, and make highly accurate, data-driven decisions through document optical character recognition (OCR) and image analysis,” the company says.

    “Whether it’s interpreting product manuals with complex diagrams or analyzing photographs of real-world scenes for risk detection, Command A Vision excels at tackling the most demanding enterprise vision challenges,” the company said in a blog post. 



    This means Command A Vision can read and analyze the most common types of images enterprises need: graphs, charts, diagrams, scanned documents and PDFs. 

    @cohere just dropped Command A Vision on @huggingface

    Designed for enterprise multimodal use cases: interpreting product manuals, analyzing photos, asking about charts…

    A 112B dense vision-language model with SOTA performance – check out the benchmark metrics in… pic.twitter.com/ORMfM5f8cF

    — Jeff Boudier (@jeffboudier) July 31, 2025

    Since it’s built on Command A’s architecture, Command A Vision requires two or fewer GPUs, just like the text model. The vision model also retains the text capabilities of Command A to read words on images and understands at least 23 languages. Cohere said that, unlike other models, Command A Vision reduces the total cost of ownership for enterprises and is fully optimized for retrieval use cases for businesses. 

    How Cohere is architecting Command A

    Cohere said it followed a LLaVA architecture to build its Command A models, including the visual model. This architecture turns visual features into soft vision tokens, which can be divided into different tiles.

    These tiles are passed into the Command A text tower, “a dense, 111B parameters textual LLM,” the company said. “In this manner, a single image consumes up to 3,328 tokens.”
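The per-image figure above lends itself to a quick budgeting calculation. The following is a minimal sketch, not Cohere's tooling: the only number taken from the article is the 3,328-token ceiling per image, while the function names and the context-window and reserved-text values in the usage lines are hypothetical placeholders chosen for illustration.

```python
# Upper bound quoted by Cohere in the article: one image consumes
# at most 3,328 tokens once it reaches the text tower.
MAX_TOKENS_PER_IMAGE = 3328


def worst_case_image_tokens(num_images: int) -> int:
    """Worst-case token cost of sending `num_images` images in one request."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return num_images * MAX_TOKENS_PER_IMAGE


def max_images_in_context(context_window: int, reserved_text_tokens: int = 0) -> int:
    """How many worst-case images fit after reserving room for text.

    `context_window` and `reserved_text_tokens` are caller-supplied
    assumptions; the article does not state the model's context length.
    """
    available = max(context_window - reserved_text_tokens, 0)
    return available // MAX_TOKENS_PER_IMAGE


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration.
    print(worst_case_image_tokens(4))
    print(max_images_in_context(32_000, reserved_text_tokens=2_000))
```

Because 3,328 is a ceiling, smaller or simpler images may use fewer tokens, so this estimate is deliberately conservative.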

    Cohere said it trained the visual model in three stages: vision-language alignment, supervised fine-tuning (SFT) and post-training reinforcement learning from human feedback (RLHF).

    “This approach enables the mapping of image encoder features to the language model embedding space,” the company said of the alignment stage. “In contrast, during the SFT stage, we simultaneously trained the vision encoder, the vision adapter and the language model on a diverse set of instruction-following multimodal tasks.”
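The three stages described above can be written down as a small schedule of which components are updated in each. This is a sketch under stated assumptions rather than Cohere's training configuration: the SFT entry follows the quote directly, the alignment entry assumes the common LLaVA-style practice of training only the adapter (the article implies but does not state this), and the RLHF entry is left unspecified because the article gives no detail. All identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# The three components named in Cohere's quote.
COMPONENTS: FrozenSet[str] = frozenset(
    {"vision_encoder", "vision_adapter", "language_model"}
)


@dataclass(frozen=True)
class Stage:
    name: str
    # Components updated in this stage; None means the article does not say.
    trainable: Optional[FrozenSet[str]]


STAGES = (
    # ASSUMPTION: adapter-only training, as in typical LLaVA-style alignment.
    Stage("vision_language_alignment", frozenset({"vision_adapter"})),
    # Stated in the article: all three components are trained together.
    Stage("supervised_fine_tuning", COMPONENTS),
    # Not specified in the article.
    Stage("rlhf", None),
)


def frozen_components(stage: Stage) -> Optional[FrozenSet[str]]:
    """Components held fixed in a stage, or None when the stage is unspecified."""
    if stage.trainable is None:
        return None
    return COMPONENTS - stage.trainable


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, frozen_components(stage))
```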

    Visualizing enterprise AI 

    Benchmark tests showed Command A Vision outperforming other models with similar visual capabilities. 

    Cohere pitted Command A Vision against OpenAI’s GPT-4.1, Meta’s Llama 4 Maverick, Mistral’s Pixtral Large and Mistral Medium 3 in nine benchmark tests. The company did not say whether it tested the model against Mistral’s OCR-focused API, Mistral OCR.

    It enables agents to securely see inside your organization’s visual data, unlocking the automation of tedious tasks involving slides, diagrams, PDFs, and photos. pic.twitter.com/iHZnUWekrk

    — cohere (@cohere) July 31, 2025

    Command A Vision outscored the other models in tests such as ChartQA, OCRBench, AI2D and TextVQA. Overall, Command A Vision had an average score of 83.1% compared to GPT-4.1’s 78.6%, Llama 4 Maverick’s 80.5% and the 78.3% from Mistral Medium 3.
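To put the averages above side by side, the short sketch below computes Command A Vision's lead over each competitor in percentage points. The scores are the four averages reported in the article; Pixtral Large is omitted because the article gives no average for it. The function name is a hypothetical helper, not part of any benchmark suite.

```python
# Average benchmark scores (percent) as reported in the article.
AVERAGE_SCORES = {
    "Command A Vision": 83.1,
    "GPT-4.1": 78.6,
    "Llama 4 Maverick": 80.5,
    "Mistral Medium 3": 78.3,
}


def margins_over(baseline: str, scores: dict) -> dict:
    """Percentage-point lead of `baseline` over every other model."""
    base = scores[baseline]
    return {
        name: round(base - score, 1)  # round away floating-point noise
        for name, score in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for model, lead in margins_over("Command A Vision", AVERAGE_SCORES).items():
        print(f"{model}: +{lead} points")
```

These are differences in averaged scores only; they say nothing about how the models compare on any individual benchmark.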

    Most large language models (LLMs) these days are multimodal, meaning they can generate or understand visual media like photos or videos. However, enterprises more often work with graphical documents such as charts and PDFs, and extracting information from these unstructured data sources often proves difficult.

    With Deep Research on the rise, the importance of bringing in models capable of reading, analyzing and even downloading unstructured data has grown.

    Cohere also said it’s offering Command A Vision as an open-weights model, in hopes that enterprises looking to move away from closed or proprietary models will start using its products. So far, there is some interest from developers.

    Very impressed at its accuracy extracting hand handwritten notes from an image!

    — Adam Sardo (@sardo_adam) July 31, 2025

    Finally, an AI that won’t judge my terrible doodles.

    — Martha Wisener (@martwisener) August 1, 2025


    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he delivers clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.
