In the realm of generative AI, a significant ethical debate has emerged surrounding the training of leading models by major companies like OpenAI and Meta. These models are primarily trained on data gathered from the internet, often without the explicit knowledge or permission of the original content creators. This practice has been controversially termed the “original sin” of generative AI.
Companies engaging in this practice, including OpenAI, defend it as legally permissible and fair. OpenAI, for instance, has argued in a blog post that using publicly available internet materials to train AI models qualifies as fair use, a stance it says is supported by established legal precedent. The company contends that this approach balances fairness to creators with the need for innovation, and is crucial for maintaining US competitiveness in the field.
This method of data scraping isn’t new and precedes the rise of generative AI. It has been a foundational technique for developing numerous research databases and commercial products, including search engines like Google. These engines, ironically, are used by content creators to attract traffic and audience to their projects.
Despite this historical precedent, the practice faces increasing opposition. Many renowned authors and artists have filed lawsuits against AI companies for allegedly violating copyright law by training models on their works without explicit consent. Notable companies facing such lawsuits include Midjourney and OpenAI, the latter of which is also used by VentureBeat to create article headers.
Amid this controversy, a new nonprofit named “Fairly Trained” has emerged, advocating for prior consent from content creators before their work is used in AI training. Co-founded and led by Ed Newton-Rex, a former employee and vocal critic of Stability AI, the organization promotes training generative AI only on data provided with creators’ consent. Its website states a belief that consumers and companies would prefer to work with AI firms that respect creators’ rights.
In a post on the social network X, Newton-Rex outlined a path for generative AI that respects creators, arguing that licensing training data is key. He encouraged generative AI companies that take this approach to get certified by Fairly Trained.
When questioned about the common defense that training AI on publicly available data is akin to a human passively observing and drawing inspiration from creative works, Newton-Rex rejected the comparison. He pointed to the scale at which AI operates, which no individual human can match, and to the difference in the social contract between human learning and AI training. Creators, he argued, never anticipated their work being used to train systems that can produce content at scale and potentially displace demand for original human-created work.
Regarding AI companies that have already trained models on publicly sourced data, Newton-Rex advises a shift towards a licensing model. He believes there’s still time to create a mutually beneficial ecosystem between human creators and AI companies. OpenAI, for instance, has recently adopted this approach with news outlets, paying millions annually for the privilege of using their data, while continuing to assert their right to train on publicly scraped data.
Fairly Trained offers a “Licensed Model (L) certification” for AI providers who adhere to its principles. The certification process, detailed on its website, involves a written submission and potential follow-up questions. Fees are scaled by the company’s annual revenue, ranging from a $150 submission fee plus $500 annually for the smallest companies up to a $500 submission fee plus $6,000 annually for the largest.
The initiative has already certified several companies, including Beatoven.AI, Boomy, BRIA AI, and others. However, Newton-Rex declined to comment on the fees paid by these companies or specific models not yet certified.
Advisors to Fairly Trained include notable figures like Tom Gruber, former chief technologist of Siri, and Maria Pallante, CEO of the Association of American Publishers. The nonprofit lists its supporters as various associations and groups, some of which are involved in lawsuits against AI companies for copyright infringement.
When asked about Fairly Trained’s involvement in AI lawsuits and funding, Newton-Rex clarified that the organization is not involved in legal actions and currently operates on the fees charged for certification, without external funding.
Fairly Trained represents a growing movement toward ethical sourcing of data for training generative AI. By advocating for consent and licensing agreements, it seeks to bridge the gap between the rights of creators and the ambitions of AI companies, aiming to foster a more respectful and mutually beneficial relationship between the two.