    SWE-Bench Pro

    By TechAiVerse, September 22, 2025
    Code and data for the following works:

    • SWE-bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

    👋 Overview

    SWE-Bench Pro is a challenging benchmark evaluating LLMs/Agents on long-horizon software engineering tasks.
    Given a codebase and an issue, a language model is tasked with generating a patch that resolves the described problem.

    The dataset is inspired by SWE-Bench: https://github.com/SWE-bench/SWE-bench

    To access SWE-bench Pro, copy and run the following code:

    from datasets import load_dataset
    swebench = load_dataset('ScaleAI/SWE-bench_Pro', split='test')

    🚀 Set Up

    SWE-bench Pro uses Docker for reproducible evaluations.
    In addition, the evaluation script requires Modal to scale the evaluation set.

    Follow the instructions in the Docker setup guide to install Docker on your machine.
    If you’re setting up on Linux, we recommend following the post-installation steps as well.

    Run the following commands to store your Modal credentials:

    pip install modal
    modal setup # and follow the prompts to generate your token and secret

    After running these steps, you should see a token ID and secret in ~/.modal.toml.

    We store prebuilt Docker images for each instance. They can be found in this directory:

    https://hub.docker.com/repository/docker/jefzda/sweap-images/general

    The format of the images is as follows.

    jefzda/sweap-images:{repo_base}.{repo_name}-{repo_base}__{repo_name}-{hash}

    For example:

    jefzda/sweap-images:gravitational.teleport-gravitational__teleport-82185f232ae8974258397e121b3bc2ed0c3729ed-v626ec2a48416b10a88641359a169d99e935ff03
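The tag template above can be filled in programmatically. The following is a minimal sketch: the helper name `image_ref` is hypothetical and does not come from the repository. In the example tag, the `{hash}` slot appears to carry a trailing `-v…` suffix after the commit hash, so the helper treats that whole string as one opaque value.

```python
IMAGE_REPO = "jefzda/sweap-images"


def image_ref(repo_base: str, repo_name: str, hash_part: str) -> str:
    """Fill in the documented tag template for one benchmark instance.

    `hash_part` is passed through unchanged; in the documented example it
    includes both the commit hash and a trailing `-v...` suffix.
    """
    tag = f"{repo_base}.{repo_name}-{repo_base}__{repo_name}-{hash_part}"
    return f"{IMAGE_REPO}:{tag}"


example = image_ref(
    "gravitational",
    "teleport",
    "82185f232ae8974258397e121b3bc2ed0c3729ed"
    "-v626ec2a48416b10a88641359a169d99e935ff03",
)
print(example)  # reproduces the example image reference shown above
```

The resulting string can be passed to `docker pull` to fetch the prebuilt image for that instance.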

    💽 Usage

    First generate patch predictions using your harness of choice.
    Evaluate patch predictions on SWE-bench Pro with the following command:

    python sweap_pro_eval_modal.py \
        --raw_sample_path=external_hf_v2.csv \
        --patch_path={OUTPUT}/gold_patches.json \
        --output_dir={OUTPUT}/ \
        --scripts_dir=run_scripts \
        --num_workers=100 \
        --dockerhub_username=your-username

    Replace gold_patches.json with your own patch JSON file, and point --raw_sample_path to the SWE-Bench Pro CSV.
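When running the evaluation from a script, it can help to assemble the command from variables instead of editing it by hand. The sketch below only builds and prints the argument list, using the flags shown in the command above. The helper name `build_eval_command` and the sample paths (`results/`, `my_patches.json`) are hypothetical placeholders.

```python
import shlex


def build_eval_command(
    raw_sample_path: str,
    patch_path: str,
    output_dir: str,
    scripts_dir: str = "run_scripts",
    num_workers: int = 100,
    dockerhub_username: str = "your-username",
) -> list[str]:
    """Assemble the evaluation command as an argument list.

    Only the flags documented above are used; nothing is executed here.
    """
    return [
        "python",
        "sweap_pro_eval_modal.py",
        f"--raw_sample_path={raw_sample_path}",
        f"--patch_path={patch_path}",
        f"--output_dir={output_dir}",
        f"--scripts_dir={scripts_dir}",
        f"--num_workers={num_workers}",
        f"--dockerhub_username={dockerhub_username}",
    ]


cmd = build_eval_command(
    raw_sample_path="external_hf_v2.csv",
    patch_path="results/my_patches.json",  # placeholder path
    output_dir="results/",                 # placeholder path
)
print(shlex.join(cmd))  # shell-safe rendering of the command
```

To launch the run, the list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains `sweap_pro_eval_modal.py`.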
