DMR News

Adobe sued over claims its AI model was trained on pirated books

By Jolyen

Dec 18, 2025

Adobe is facing a proposed class-action lawsuit that alleges the company used pirated books to train one of its AI language models, adding to a growing list of legal challenges over how generative AI systems are built. The case claims Adobe relied on copyrighted works without permission while developing SlimLM, a small language model designed for document-related tasks.

The lawsuit was filed on behalf of Elizabeth Lyon, an author based in Oregon, and was first reported by Reuters. Lyon claims that Adobe used pirated versions of numerous books, including her own, during the training of SlimLM. Lyon has written several guidebooks focused on non-fiction writing and says her works appeared in datasets linked to Adobe’s AI development.

Adobe describes SlimLM as a series of small language models that can be optimized for document assistance on mobile devices. The company says SlimLM was pre-trained on SlimPajama-627B, which it describes as a deduplicated, multi-corpora, open-source dataset released by Cerebras in June 2023. According to the lawsuit, Lyon’s books were included in a pretraining dataset that Adobe used during this process.

The filing argues that SlimPajama is derived from another dataset known as RedPajama. The lawsuit states that SlimPajama was created by copying and manipulating RedPajama, which itself includes a dataset called Books3. It claims that because SlimPajama is a derivative copy, it contains Books3 and therefore includes copyrighted works belonging to Lyon and other authors represented in the proposed class.

Books3 is a collection of about 191,000 books that has been widely used to train generative AI systems. Its use has drawn repeated legal scrutiny across the technology sector. RedPajama, which incorporates Books3, has also appeared in multiple lawsuits involving AI training data.

In September, a lawsuit against Apple alleged the company used copyrighted materials to train its Apple Intelligence model, citing RedPajama and accusing Apple of copying protected works without consent, credit, or compensation. In October, Salesforce was named in a similar lawsuit that also claimed RedPajama had been used for AI training.

Legal action over AI training data has become increasingly common as companies rely on large datasets to build models. Some lawsuits argue that these datasets include pirated or otherwise unauthorized material. In September, Anthropic agreed to pay $1.5 billion to authors who accused the company of using pirated versions of their work to train its Claude chatbot. That settlement was widely viewed as a significant moment in ongoing disputes over copyright and AI development.

