    OpenAI’s models ‘memorized’ copyrighted content, new study suggests

    By TechAiVerse, April 5, 2025

    A new study appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

    OpenAI is embroiled in suits brought by authors, programmers, and other rights-holders who accuse the company of using their works — books, codebases, and so on — to develop its models without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that there isn’t a carve-out in U.S. copyright law for training data.

    The study, which was co-authored by researchers at the University of Washington, the University of Copenhagen, and Stanford, proposes a new method for identifying training data “memorized” by models behind an API, like OpenAI’s.

    Models are prediction engines. Trained on a lot of data, they learn patterns — that’s how they’re able to generate essays, photos, and more. Most of the outputs aren’t verbatim copies of the training data, but owing to the way models “learn,” some inevitably are. Image models have been found to regurgitate screenshots from movies they were trained on, while language models have been observed effectively plagiarizing news articles.

    The study’s method relies on words that the co-authors call “high-surprisal” — that is, words that stand out as uncommon in the context of a larger body of work. For example, the word “radar” in the sentence “Jack and I sat perfectly still with the radar humming” would be considered high-surprisal because it’s statistically less likely than words such as “engine” or “radio” to appear before “humming.”
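
    One common way to put a number on "statistically less likely" is surprisal, the negative log-probability of a word given its context. The sketch below illustrates that idea only; the article does not describe how the study scores words, and the probabilities used here are made-up values chosen for illustration, not figures from the study.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Return the surprisal of an outcome in bits: -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical probabilities for the word that precedes "humming".
# These numbers are invented for illustration only.
candidates = {"engine": 0.30, "radio": 0.20, "radar": 0.01}

# Lower probability means higher surprisal.
surprisals = {word: surprisal_bits(p) for word, p in candidates.items()}
highest = max(surprisals, key=surprisals.get)

for word, bits in sorted(surprisals.items(), key=lambda kv: kv[1]):
    print(f"{word}: {bits:.2f} bits")
print("highest-surprisal candidate:", highest)
```

    Under these assumed probabilities, "radar" has the highest surprisal of the three candidates, which matches the intuition in the example sentence.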

    The co-authors probed several OpenAI models, including GPT-4 and GPT-3.5, for signs of memorization by removing high-surprisal words from snippets of fiction books and New York Times pieces and having the models try to “guess” which words had been masked. If the models managed to guess correctly, it’s likely they memorized the snippet during training, concluded the co-authors.
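
    The masking-and-guessing procedure can be sketched in a few lines. This is a rough illustration rather than the co-authors' implementation: the `[MASK]` placeholder, the function names, and the `toy_guess` stand-in (a lookup table used in place of a real model call) are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple

MASK = "[MASK]"  # hypothetical placeholder; the study's prompt format is not given


def mask_word(tokens: Sequence[str], index: int) -> str:
    """Replace the token at `index` with the mask and rejoin the snippet."""
    masked = list(tokens)
    masked[index] = MASK
    return " ".join(masked)


def guess_accuracy(
    snippets: Sequence[Tuple[Sequence[str], int]],
    guess: Callable[[str], str],
) -> float:
    """Fraction of masked words that `guess` recovers exactly (case-insensitive)."""
    if not snippets:
        return 0.0
    hits = 0
    for tokens, index in snippets:
        prompt = mask_word(tokens, index)
        if guess(prompt).strip().lower() == tokens[index].lower():
            hits += 1
    return hits / len(snippets)


# Stand-in for a model call: it "remembers" one snippet and otherwise
# falls back to a generic, low-surprisal word.
_memorized = {"Jack and I sat perfectly still with the [MASK] humming": "radar"}


def toy_guess(prompt: str) -> str:
    return _memorized.get(prompt, "engine")


snippets = [
    ("Jack and I sat perfectly still with the radar humming".split(), 8),
    ("The rain fell on the quiet harbor".split(), 5),
]
accuracy = guess_accuracy(snippets, toy_guess)
print(f"guess accuracy: {accuracy:.2f}")
```

    In a real probe, `toy_guess` would be replaced by a call to the model under test. A high hit rate on words that are hard to predict from context alone is what the co-authors read as a sign of memorization.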

    [Figure: An example of having a model “guess” a high-surprisal word. Image Credits: OpenAI]

    According to the results of the tests, GPT-4 showed signs of having memorized portions of popular fiction books, including books in a dataset containing samples of copyrighted ebooks called BookMIA. The results also suggested that the model memorized portions of New York Times articles, albeit at a comparatively lower rate.

    Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the “contentious data” models might have been trained on.

    “In order to have large language models that are trustworthy, we need to have models that we can probe and audit and examine scientifically,” Ravichander said. “Our work aims to provide a tool to probe large language models, but there is a real need for greater data transparency in the whole ecosystem.”

    OpenAI has long advocated for looser restrictions on developing models using copyrighted data. While the company has certain content licensing deals in place and offers opt-out mechanisms that allow copyright owners to flag content they’d prefer the company not use for training purposes, it has lobbied several governments to codify “fair use” rules around AI training approaches.

    Kyle Wiggers is TechCrunch’s AI Editor. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he delivers clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.
