OpenAI Desperate to Avoid Explaining Why it Deleted Pirated Datasets: Artificial Intelligence Trends

Apparently, OpenAI is desperate to avoid explaining why it deleted pirated book datasets, but a court order may force it to do so anyway.

According to Ars Technica (“OpenAI desperate to avoid explaining why it deleted pirated book datasets,” written by Ashley Belanger), OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher.

At the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, OpenAI’s decision to delete the datasets could end up being a deciding factor that gives the authors the win.

It’s undisputed that OpenAI deleted the datasets, known as “Books 1” and “Books 2,” prior to ChatGPT’s release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web and seizing the bulk of its data from a shadow library called Library Genesis (LibGen).

As OpenAI tells it, the datasets fell out of use within that same year, prompting an internal decision to delete them.

But the authors suspect there’s more to the story than that. They noted that OpenAI appeared to flip-flop by retracting its claim that the datasets’ “non-use” was a reason for deletion, then later claiming that all reasons for deletion, including “non-use,” should be shielded under attorney-client privilege.

To the authors, it seemed like OpenAI was quickly backtracking after the court granted the authors’ discovery requests to review OpenAI’s internal messages on the firm’s “non-use.”

In fact, OpenAI’s reversal only made authors more eager to see how OpenAI discussed “non-use,” and now they may get to find out all the reasons why OpenAI deleted the datasets.

Last week, US Magistrate Judge Ona Wang ordered OpenAI to share all communications with in-house lawyers about deleting the datasets, as well as “all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.”

According to Judge Wang, OpenAI slipped up by arguing that “non-use” was not a “reason” for deleting the datasets, while simultaneously claiming that it should also be deemed a “reason” considered privileged.

Either way, the judge ruled that OpenAI couldn’t block discovery on “non-use” just by deleting a few words from prior filings that had been on the docket for more than a year.

“OpenAI has gone back-and-forth on whether ‘non-use’ as a ‘reason’ for the deletion of Books1 and Books2 is privileged at all,” Judge Wang wrote. “OpenAI cannot state a ‘reason’ (which implies it is not privileged) and then later assert that the ‘reason’ is privileged to avoid discovery.”

Additionally, OpenAI’s claim that all reasons for deleting the datasets are privileged “strains credulity,” she concluded, ordering OpenAI to produce a wide range of potentially revealing internal messages by December 8. OpenAI must also make its in-house lawyers available for deposition by December 19.

Judge Wang goes through the entire timeline of OpenAI’s declaration and non-declaration of privilege on these datasets, stating: “Even if a ‘reason’ like ‘non-use’ could be privileged, OpenAI has waived privilege by making a moving target of its privilege assertions.”

OpenAI has stated, “we disagree with the ruling and intend to appeal.” In the meantime, the OpenAI-NYT litigation remains the gift that keeps on giving this holiday season! 😁

So, what do you think? Do you think that Judge Wang has a valid argument? Please share any comments you might have, or let me know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by my employer, my partners or my clients. eDiscovery Today is made available solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Today should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.
