Close Menu

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    FCC approves the merger of cable giants Cox and Charter

    Anthropic vs. The Pentagon: what enterprises should do

    OpenAI strikes a deal with the Defense Department to deploy its AI models

    Facebook X (Twitter) Instagram
    • Artificial Intelligence
    • Business Technology
    • Cryptocurrency
    • Gadgets
    • Gaming
    • Health
    • Software and Apps
    • Technology
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Tech AI Verse
    • Home
    • Artificial Intelligence

      What the polls say about how Americans are using AI

      February 27, 2026

      Tensions between the Pentagon and AI giant Anthropic reach a boiling point

      February 21, 2026

      Read the extended transcript: President Donald Trump interviewed by ‘NBC Nightly News’ anchor Tom Llamas

      February 6, 2026

      Stocks and bitcoin sink as investors dump software company shares

      February 4, 2026

      AI, crypto and Trump super PACs stash millions to spend on the midterms

      February 2, 2026
    • Business

      FCC approves the merger of cable giants Cox and Charter

      February 28, 2026

      Finding value with AI and Industry 5.0 transformation

      February 28, 2026

      How Smarsh built an AI front door for regulated industries — and drove 59% self-service adoption

      February 24, 2026

      Where MENA CIOs draw the line on AI sovereignty

      February 24, 2026

      Ex-President’s shift away from Xbox consoles to cloud gaming reportedly caused friction

      February 24, 2026
    • Crypto

      Palladium Price Approaches a Critical Turning Point

      February 28, 2026

      Trump to Takeover Cuba, Iran War Tensions Rise, Bitcoin Crashes Again

      February 28, 2026

      A 40% XRP Crash Couldn’t Shake Its Strongest Holders — Is $1.70 Still Possible?

      February 28, 2026

      Why Is the US Stock Market Down Today?

      February 28, 2026

      SoFi Becomes First US Chartered Bank to Support Solana Deposits

      February 28, 2026
    • Technology

      Anthropic vs. The Pentagon: what enterprises should do

      February 28, 2026

      OpenAI strikes a deal with the Defense Department to deploy its AI models

      February 28, 2026

      Trump orders federal agencies to drop Anthropic services amid Pentagon feud

      February 28, 2026

      Google’s Opal just quietly showed enterprise teams the new blueprint for building AI agents

      February 28, 2026

      OpenAI’s big investment from Amazon comes with something else: new ‘stateful’ architecture for enterprise agents

      February 28, 2026
    • Others
      • Gadgets
      • Gaming
      • Health
      • Software and Apps
    Check BMI
    Tech AI Verse
    You are at:Home»Technology»OpenAI desperate to avoid explaining why it deleted pirated book datasets
    Technology

    OpenAI desperate to avoid explaining why it deleted pirated book datasets

    TechAiVerseBy TechAiVerseDecember 2, 2025No Comments7 Mins Read2 Views
    Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Reddit
    OpenAI desperate to avoid explaining why it deleted pirated book datasets
    Share
    Facebook Twitter LinkedIn Pinterest WhatsApp Email

    OpenAI desperate to avoid explaining why it deleted pirated book datasets





    Not for OpenAI to reason why?

    OpenAI risks increased fines after deleting pirated books datasets.

    OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher.

    At the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, OpenAI’s decision to delete the datasets could end up being a deciding factor that gives the authors the win.

    It’s undisputed that OpenAI deleted the datasets, known as “Books 1” and “Books 2,” prior to ChatGPT’s release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web and seizing the bulk of its data from a shadow library called Library Genesis (LibGen).

    As OpenAI tells it, the datasets fell out of use within that same year, prompting an internal decision to delete them.

    But the authors suspect there’s more to the story than that. They noted that OpenAI appeared to flip-flop by retracting its claim that the datasets’ “non-use” was a reason for deletion, then later claiming that all reasons for deletion, including “non-use,” should be shielded under attorney-client privilege.

    To the authors, it seemed like OpenAI was quickly backtracking after the court granted the authors’ discovery requests to review OpenAI’s internal messages on the firm’s “non-use.”

    In fact, OpenAI’s reversal only made authors more eager to see how OpenAI discussed “non-use,” and now they may get to find out all the reasons why OpenAI deleted the datasets.

    Last week, US district judge Ona Wang ordered OpenAI to share all communications with in-house lawyers about deleting the datasets, as well as “all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.”

    According to Wang, OpenAI slipped up by arguing that “non-use” was not a “reason” for deleting the datasets, while simultaneously claiming that it should also be deemed a “reason” considered privileged.

    Either way, the judge ruled that OpenAI couldn’t block discovery on “non-use” just by deleting a few words from prior filings that had been on the docket for more than a year.

    “OpenAI has gone back-and-forth on whether ‘non-use’ as a ‘reason’ for the deletion of Books1 and Books2 is privileged at all,” Wang wrote. “OpenAI cannot state a ‘reason’ (which implies it is not privileged) and then later assert that the ‘reason’ is privileged to avoid discovery.”

    Additionally, OpenAI’s claim that all reasons for deleting the datasets are privileged “strains credulity,” she concluded, ordering OpenAI to produce a wide range of potentially revealing internal messages by December 8. OpenAI must also make its in-house lawyers available for deposition by December 19.

    OpenAI has argued that it never flip-flopped or retracted anything. It simply used vague phrasing that led to confusion over whether any of the reasons for deleting the datasets were considered non-privileged. But Wang didn’t buy into that, concluding that “even if a ‘reason’ like ‘non-use’ could be privileged, OpenAI has waived privilege by making a moving target of its privilege assertions.”

    Asked for comment, OpenAI told Ars that “we disagree with the ruling and intend to appeal.”

    OpenAI’s “flip-flop” may cost it the win

    So far, OpenAI has avoided disclosing its rationale, claiming that all the reasons it had for deleting the datasets are privileged. In-house lawyers weighed in on the decision to delete and were even copied on a Slack channel initially called “excise-libgen.”

    But Wang reviewed those Slack messages and found that “the vast majority of these communications were not privileged because they were ‘plainly devoid of any request for legal advice and counsel [did] not once weigh in.’”

    In a particularly non-privileged batch of messages, one OpenAI lawyer, Jason Kwon, only weighed in once, the judge noted, to recommend the channel name be changed to “project-clear.” Wang reminded OpenAI that “the entirety of the Slack channel and all messages contained therein is not privileged simply because it was created at the direction of an attorney and/or the fact that a lawyer was copied on the communications.”

    The authors believe that exposing OpenAI’s rationale may help prove that the ChatGPT maker willfully infringed on copyrights when pirating the book data. As Wang explained, OpenAI’s retraction risked putting the AI firm’s “good faith and state of mind at issue,” which could increase fines in a loss.

    “In a copyright case, a court can increase the award of statutory damages up to $150,000 per infringed work if the infringement was willful, meaning the defendant ‘was actually aware of the infringing activity’ or the ‘defendant’s actions were the result of reckless disregard for, or willful blindness to, the copyright holder’s rights,’” Wang wrote.

    In a court transcript, a lawyer representing some of the authors suing OpenAI, Christopher Young, noted that OpenAI could be in trouble if evidence showed that it decided against using the datasets for later models due to legal risks. He also suggested that OpenAI could be using the datasets under different names to mask further infringement.

    Judge calls out OpenAI for twisting fair use ruling

    Wang also found it contradictory that OpenAI continued to argue in a recent filing that it acted in good faith, while “artfully” removing “its good faith affirmative defense and key words such as ‘innocent,’ ‘reasonably believed,’ and ‘good faith.’” These changes only strengthened discovery requests to explore authors’ willfulness theory, Wang wrote, noting the sought-after internal messages would now be critical for the court’s review.

    “A jury is entitled to know the basis for OpenAI’s purported good faith,” Wang wrote.

    The judge appeared particularly frustrated by OpenAI seemingly twisting the Anthropic ruling to defend against the authors’ request to learn more about the deletion of the datasets.

    In a footnote, Wang called out OpenAI for “bizarrely” citing an Anthropic ruling that “grossly” misrepresented Judge William Alsup’s decision by claiming that he found that “downloading pirated copies of books is lawful as long as they are subsequently used for training an LLM.”

    Instead, Alsup wrote that he doubted that “any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use.”

    If anything, Wang wrote, OpenAI’s decision to pirate book data—then delete it—seemed “to fall squarely into the category of activities proscribed by” Alsup. For emphasis, she quoted Alsup’s order, which said, “such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

    For the authors, getting hold of OpenAI’s privileged communications could tip the scales in their favor, the Hollywood Reporter suggested. Some authors believe the key to winning could be testimony from Anthropic CEO Dario Amodei, who is accused of creating the controversial datasets while he was still at OpenAI. The authors think Amodei also possesses information on the destruction of the datasets, court filings show.

    OpenAI tried to fight the authors’ motion to depose Amodei, but a judge sided with the authors in March, compelling Amodei to answer their biggest questions on his involvement.

    Whether Amodei’s testimony is a bombshell remains to be seen, but it’s clear that OpenAI may struggle to overcome claims of willful infringement. Wang noted there is a “fundamental conflict” in circumstances “where a party asserts a good faith defense based on advice of counsel but then blocks inquiry into their state of mind by asserting attorney-client privilege,” suggesting that OpenAI may have substantially weakened its defense.

    The outcome of the dispute over the deletions could influence OpenAI’s calculus on whether it should ultimately settle the lawsuit. Ahead of the Anthropic settlement—the largest publicly reported copyright class action settlement in history—authors suing pointed to evidence that Anthropic became “not so gung ho about” training on pirated books “for legal reasons.” That seems to be the type of smoking-gun evidence that authors hope will emerge from OpenAI’s withheld Slack messages.

    Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.



    29 Comments

    Share. Facebook Twitter Pinterest LinkedIn Reddit WhatsApp Telegram Email
    Previous ArticleIn Myanmar, illicit rare-earth mining is taking a heavy toll
    Next Article Supreme Court hears case that could trigger big crackdown on Internet piracy
    TechAiVerse
    • Website

    Jonathan is a tech enthusiast and the mind behind Tech AI Verse. With a passion for artificial intelligence, consumer tech, and emerging innovations, he deliver clear, insightful content to keep readers informed. From cutting-edge gadgets to AI advancements and cryptocurrency trends, Jonathan breaks down complex topics to make technology accessible to all.

    Related Posts

    Anthropic vs. The Pentagon: what enterprises should do

    February 28, 2026

    OpenAI strikes a deal with the Defense Department to deploy its AI models

    February 28, 2026

    Trump orders federal agencies to drop Anthropic services amid Pentagon feud

    February 28, 2026
    Leave A Reply Cancel Reply

    Top Posts

    Ping, You’ve Got Whale: AI detection system alerts ships of whales in their path

    April 22, 2025698 Views

    Lumo vs. Duck AI: Which AI is Better for Your Privacy?

    July 31, 2025280 Views

    6.7 Cummins Lifter Failure: What Years Are Affected (And Possible Fixes)

    April 14, 2025162 Views

    6 Best MagSafe Phone Grips (2025), Tested and Reviewed

    April 6, 2025122 Views
    Don't Miss
    Business Technology February 28, 2026

    FCC approves the merger of cable giants Cox and Charter

    FCC approves the merger of cable giants Cox and CharterThe Federal Communications Commission has given…

    Anthropic vs. The Pentagon: what enterprises should do

    OpenAI strikes a deal with the Defense Department to deploy its AI models

    Trump orders federal agencies to drop Anthropic services amid Pentagon feud

    Stay In Touch
    • Facebook
    • Twitter
    • Pinterest
    • Instagram
    • YouTube
    • Vimeo

    Subscribe to Updates

    Get the latest creative news from SmartMag about art & design.

    About Us
    About Us

    Welcome to Tech AI Verse, your go-to destination for everything technology! We bring you the latest news, trends, and insights from the ever-evolving world of tech. Our coverage spans across global technology industry updates, artificial intelligence advancements, machine learning ethics, and automation innovations. Stay connected with us as we explore the limitless possibilities of technology!

    Facebook X (Twitter) Pinterest YouTube WhatsApp
    Our Picks

    FCC approves the merger of cable giants Cox and Charter

    February 28, 20263 Views

    Anthropic vs. The Pentagon: what enterprises should do

    February 28, 20262 Views

    OpenAI strikes a deal with the Defense Department to deploy its AI models

    February 28, 20264 Views
    Most Popular

    7 Best Kids Bikes (2025): Mountain, Balance, Pedal, Coaster

    March 13, 20250 Views

    VTOMAN FlashSpeed 1500: Plenty Of Power For All Your Gear

    March 13, 20250 Views

    Best TV Antenna of 2025

    March 13, 20250 Views
    © 2026 TechAiVerse. Designed by Divya Tech.
    • Home
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms & Conditions

    Type above and press Enter to search. Press Esc to cancel.