DMR News


OpenAI Engineers Accidentally Delete Data in New York Times Lawsuit

By Hilary Ong

Nov 22, 2024


OpenAI inadvertently erased crucial evidence tied to a high-profile lawsuit brought by The New York Times and other news outlets, according to court documents filed Wednesday. The data loss complicates the ongoing legal battle over allegations that OpenAI improperly used copyrighted material to train its AI models.

The filing reveals that engineers at OpenAI accidentally deleted key search data from one of two virtual machines provided to the plaintiffs’ legal team. Lawyers for The New York Times and Daily News had spent over 150 hours combing through OpenAI’s AI training datasets to identify instances of their content being used without authorization. Although OpenAI managed to recover much of the deleted data, the folder structure and file names were irreparably lost, so the recovered files can no longer be used to determine where the publishers’ articles were used to build OpenAI’s models.

The plaintiffs, represented by Jennifer Maisel, emphasized that the deletion, while not believed to be intentional, forces their team to redo a significant amount of work. The filing also underscores the plaintiffs’ belief that OpenAI, with its internal tools and resources, is better equipped to search its datasets for potentially infringing content.

The lawsuit, originally filed in December 2023, accuses OpenAI and its partner Microsoft of unlawfully training their AI models on “millions” of copyrighted articles from The New York Times, among others. The plaintiffs claim that this alleged misuse gives OpenAI’s tools an unfair advantage, allowing them to compete directly with the publishers’ original content. The New York Times is seeking billions of dollars in damages, and the litigation itself is costly: the publication has already spent over $1 million on legal expenses.

Despite OpenAI’s stance that training models on publicly available data constitutes fair use, the company has proactively entered into licensing agreements with several publishers, including the Associated Press, Axel Springer, and Dotdash Meredith. While the exact terms of these deals remain undisclosed, reports suggest lucrative payments, such as a $16 million annual agreement with Dotdash Meredith.

It’s difficult to imagine the court overlooking the incident entirely, even if both parties agree the data erasure wasn’t intentional. Accidental or not, the timing of the error could still influence the court’s perception of OpenAI’s credibility in this case.

OpenAI spokesperson Jason Deutrom disputed the characterization of the data loss, framing it as a “glitch,” but declined to provide further details, promising a formal response in due course. The New York Times declined to comment on the matter.

As OpenAI continues to navigate legal and ethical questions surrounding AI training practices, this latest development underscores the challenges of balancing innovation with intellectual property rights.


Featured Image courtesy of Dado Ruvic/REUTERS


