Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    Japanese devs face font licensing dilemma as leading provider increases annual plan price from $380 to $20,000+

    Indie dev Chequered Ink puts together $10 10,000 game assets pack so developers “don’t feel the need to turn to AI”

    Valorant Mobile is China’s biggest mobile launch of 2025 | News-in-Brief

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      Apple’s AI chief abruptly steps down

      December 3, 2025

      The issue that’s scrambling both parties: From the Politics Desk

      December 3, 2025

      More of Silicon Valley is building on free Chinese AI

      December 1, 2025

      From Steve Bannon to Elizabeth Warren, backlash erupts over push to block states from regulating AI

      November 23, 2025

      Insurance companies are trying to avoid big payouts by making AI safer

      November 19, 2025
    • Business

      Public GitLab repositories exposed more than 17,000 secrets

      November 29, 2025

      ASUS warns of new critical auth bypass flaw in AiCloud routers

      November 28, 2025

      Windows 11 gets new Cloud Rebuild, Point-in-Time Restore tools

      November 18, 2025

      Government faces questions about why US AWS outage disrupted UK tax office and banking firms

      October 23, 2025

      Amazon’s AWS outage knocked services like Alexa, Snapchat, Fortnite, Venmo and more offline

      October 21, 2025
    • Crypto

      Five Cryptocurrencies That Often Rally Around Christmas

      December 3, 2025

      Why Trump-Backed Mining Company Struggles Despite Bitcoin’s Recovery

      December 3, 2025

      XRP ETFs Extend 11-Day Inflow Streak as $1 Billion Mark Nears

      December 3, 2025

      Why AI-Driven Crypto Exploits Are More Dangerous Than Ever Before

      December 3, 2025

      Bitcoin Is Recovering, But Can It Drop Below $80,000 Again?

      December 3, 2025
    • Technology

      Criteo CEO Michael Komasinski on agentic commerce, experiments with LLMs, and M&A rumors

      December 3, 2025

      Future of TV Briefing: The streaming ad upfront trends, programmatic priorities revealed in Q3 2025 earnings reports

      December 3, 2025

      Omnicom’s reshuffled leadership emerges as the ad industry’s new power players

      December 3, 2025

      OpenX redraws the SSP-agency relationship

      December 3, 2025

      TikTok Shop sheds bargain-bin reputation as average prices climb across categories

      December 3, 2025
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI desperate to avoid explaining why it deleted pirated book datasets
    Technology

    OpenAI desperate to avoid explaining why it deleted pirated book datasets

    TechAiVerseBy TechAiVerseDecember 2, 2025No Comments7 Mins Read0 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI desperate to avoid explaining why it deleted pirated book datasets
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    OpenAI desperate to avoid explaining why it deleted pirated book datasets





    Not for OpenAI to reason why?

    OpenAI risks increased fines after deleting pirated books datasets.

    OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher.

    At the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, OpenAI’s decision to delete the datasets could end up being a deciding factor that gives the authors the win.

    It’s undisputed that OpenAI deleted the datasets, known as “Books 1” and “Books 2,” prior to ChatGPT’s release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web and seizing the bulk of its data from a shadow library called Library Genesis (LibGen).

    As OpenAI tells it, the datasets fell out of use within that same year, prompting an internal decision to delete them.

    But the authors suspect there’s more to the story than that. They noted that OpenAI appeared to flip-flop by retracting its claim that the datasets’ “non-use” was a reason for deletion, then later claiming that all reasons for deletion, including “non-use,” should be shielded under attorney-client privilege.

    To the authors, it seemed like OpenAI was quickly backtracking after the court granted the authors’ discovery requests to review OpenAI’s internal messages on the firm’s “non-use.”

    In fact, OpenAI’s reversal only made authors more eager to see how OpenAI discussed “non-use,” and now they may get to find out all the reasons why OpenAI deleted the datasets.

    Last week, US district judge Ona Wang ordered OpenAI to share all communications with in-house lawyers about deleting the datasets, as well as “all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.”

    According to Wang, OpenAI slipped up by arguing that “non-use” was not a “reason” for deleting the datasets, while simultaneously claiming that it should also be deemed a “reason” considered privileged.

    Either way, the judge ruled that OpenAI couldn’t block discovery on “non-use” just by deleting a few words from prior filings that had been on the docket for more than a year.

    “OpenAI has gone back-and-forth on whether ‘non-use’ as a ‘reason’ for the deletion of Books1 and Books2 is privileged at all,” Wang wrote. “OpenAI cannot state a ‘reason’ (which implies it is not privileged) and then later assert that the ‘reason’ is privileged to avoid discovery.”

    Additionally, OpenAI’s claim that all reasons for deleting the datasets are privileged “strains credulity,” she concluded, ordering OpenAI to produce a wide range of potentially revealing internal messages by December 8. OpenAI must also make its in-house lawyers available for deposition by December 19.

    OpenAI has argued that it never flip-flopped or retracted anything. It simply used vague phrasing that led to confusion over whether any of the reasons for deleting the datasets were considered non-privileged. But Wang didn’t buy into that, concluding that “even if a ‘reason’ like ‘non-use’ could be privileged, OpenAI has waived privilege by making a moving target of its privilege assertions.”

    Asked for comment, OpenAI told Ars that “we disagree with the ruling and intend to appeal.”

    OpenAI’s “flip-flop” may cost it the win

    So far, OpenAI has avoided disclosing its rationale, claiming that all the reasons it had for deleting the datasets are privileged. In-house lawyers weighed in on the decision to delete and were even copied on a Slack channel initially called “excise-libgen.”

    But Wang reviewed those Slack messages and found that “the vast majority of these communications were not privileged because they were ‘plainly devoid of any request for legal advice and counsel [did] not once weigh in.’”

    In a particularly non-privileged batch of messages, one OpenAI lawyer, Jason Kwon, only weighed in once, the judge noted, to recommend the channel name be changed to “project-clear.” Wang reminded OpenAI that “the entirety of the Slack channel and all messages contained therein is not privileged simply because it was created at the direction of an attorney and/or the fact that a lawyer was copied on the communications.”

    The authors believe that exposing OpenAI’s rationale may help prove that the ChatGPT maker willfully infringed on copyrights when pirating the book data. As Wang explained, OpenAI’s retraction risked putting the AI firm’s “good faith and state of mind at issue,” which could increase fines in a loss.

    “In a copyright case, a court can increase the award of statutory damages up to $150,000 per infringed work if the infringement was willful, meaning the defendant ‘was actually aware of the infringing activity’ or the ‘defendant’s actions were the result of reckless disregard for, or willful blindness to, the copyright holder’s rights,’” Wang wrote.

    In a court transcript, a lawyer representing some of the authors suing OpenAI, Christopher Young, noted that OpenAI could be in trouble if evidence showed that it decided against using the datasets for later models due to legal risks. He also suggested that OpenAI could be using the datasets under different names to mask further infringement.

    Judge calls out OpenAI for twisting fair use ruling

    Wang also found it contradictory that OpenAI continued to argue in a recent filing that it acted in good faith, while “artfully” removing “its good faith affirmative defense and key words such as ‘innocent,’ ‘reasonably believed,’ and ‘good faith.’” These changes only strengthened discovery requests to explore authors’ willfulness theory, Wang wrote, noting the sought-after internal messages would now be critical for the court’s review.

    “A jury is entitled to know the basis for OpenAI’s purported good faith,” Wang wrote.

    The judge appeared particularly frustrated by OpenAI seemingly twisting the Anthropic ruling to defend against the authors’ request to learn more about the deletion of the datasets.

    In a footnote, Wang called out OpenAI for “bizarrely” citing an Anthropic ruling that “grossly” misrepresented Judge William Alsup’s decision by claiming that he found that “downloading pirated copies of books is lawful as long as they are subsequently used for training an LLM.”

    Instead, Alsup wrote that he doubted that “any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use.”

    If anything, Wang wrote, OpenAI’s decision to pirate book data—then delete it—seemed “to fall squarely into the category of activities proscribed by” Alsup. For emphasis, she quoted Alsup’s order, which said, “such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

    For the authors, getting hold of OpenAI’s privileged communications could tip the scales in their favor, the Hollywood Reporter suggested. Some authors believe the key to winning could be testimony from Anthropic CEO Dario Amodei, who is accused of creating the controversial datasets while he was still at OpenAI. The authors think Amodei also possesses information on the destruction of the datasets, court filings show.

    OpenAI tried to fight the authors’ motion to depose Amodei, but a judge sided with the authors in March, compelling Amodei to answer their biggest questions on his involvement.

    Whether Amodei’s testimony is a bombshell remains to be seen, but it’s clear that OpenAI may struggle to overcome claims of willful infringement. Wang noted there is a “fundamental conflict” in circumstances “where a party asserts a good faith defense based on advice of counsel but then blocks inquiry into their state of mind by asserting attorney-client privilege,” suggesting that OpenAI may have substantially weakened its defense.

    The outcome of the dispute over the deletions could influence OpenAI’s calculus on whether it should ultimately settle the lawsuit. Ahead of the Anthropic settlement—the largest publicly reported copyright class action settlement in history—authors suing pointed to evidence that Anthropic became “not so gung ho about” training on pirated books “for legal reasons.” That seems to be the type of smoking-gun evidence that authors hope will emerge from OpenAI’s withheld Slack messages.

    Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



    29 Comments

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleIn Myanmar, illicit rare-earth mining is taking a heavy toll
    Next Article Supreme Court hears case that could trigger big crackdown on Internet piracy
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Criteo CEO Michael Komasinski on agentic commerce, experiments with LLMs, and M&A rumors

    December 3, 2025

    Future of TV Briefing: The streaming ad upfront trends, programmatic priorities revealed in Q3 2025 earnings reports

    December 3, 2025

    Omnicom’s reshuffled leadership emerges as the ad industry’s new power players

    December 3, 2025
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025467 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025159 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 202584 Views

    Is Libby Compatible With Kobo E-Readers?

    March 31, 202563 Views
    Don't Miss
    Gaming December 3, 2025

    Japanese devs face font licensing dilemma as leading provider increases annual plan price from $380 to $20,000+

    Japanese devs face font licensing dilemma as leading provider increases annual plan price from $380…

    Indie dev Chequered Ink puts together $10 10,000 game assets pack so developers “don’t feel the need to turn to AI”

    Valorant Mobile is China’s biggest mobile launch of 2025 | News-in-Brief

    Epic Games Store decides “at the last minute” not to distribute Horses

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    Japanese devs face font licensing dilemma as leading provider increases annual plan price from $380 to $20,000+

    December 3, 20250 Views

    Indie dev Chequered Ink puts together $10 10,000 game assets pack so developers “don’t feel the need to turn to AI”

    December 3, 20250 Views

    Valorant Mobile is China’s biggest mobile launch of 2025 | News-in-Brief

    December 3, 20250 Views
    Most Popular

    Apple thinks people won’t use MagSafe on iPhone 16e

    March 12, 20250 Views

    Volkswagen’s cheapest EV ever is the first to use Rivian software

    March 12, 20250 Views

    Startup studio Hexa acquires majority stake in Veevart, a vertical SaaS platform for museums

    March 12, 20250 Views
    © 2025 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.