DMR News

Advancing Digital Conversations

NVIDIA’s AI Team in Hot Water Over Unapproved Video Scraping

By Yasmeeta Oon

Aug 6, 2024


NVIDIA, a $2.4 trillion company, reportedly scraped copyrighted content from YouTube, Netflix, and other sources to train AI models without permission. According to 404 Media’s Samantha Cole, the company asked employees to download videos for use in developing commercial AI projects. The move fits a broader industry trend in which tech companies adopt a “move fast and break things” approach to secure dominance in the competitive AI landscape.

The scraped content was reportedly used to train models for various products, including the Omniverse 3D world generator, self-driving car systems, and digital human initiatives.

In an email to Engadget, NVIDIA defended its actions, saying that its research complies with copyright law. The company’s spokesperson argued that intellectual property law protects specific expressions, not facts, ideas, data, or information, and compared the practice to an individual’s right to learn from information and use it to create their own expression.

YouTube disagrees with NVIDIA’s stance. YouTube spokesperson Jack Malon pointed to an April Bloomberg story in which CEO Neal Mohan said that using YouTube content to train AI would be a “clear violation” of the platform’s terms. Malon added that this position has not changed.

This issue has surfaced before. In April, reports indicated that OpenAI trained its Sora text-to-video generator on YouTube videos without permission. More recently, Runway AI was reported to have followed similar practices.

NVIDIA employees raised ethical and legal concerns about the scraping but were reportedly told by management that the practice had executive approval. Ming-Yu Liu, NVIDIA’s vice president of research, assured employees that there was an “umbrella approval for all of the data.” Some employees described the practice as an “open legal issue” to be resolved later.

The company’s tactics echo the former “move fast and break things” motto of Facebook (now Meta), an approach that led to significant privacy breaches.

NVIDIA also reportedly instructed employees to draw on other video sources, such as MovieNet, internal libraries of video game footage, and video datasets hosted on GitHub, including WebVid and InternVid-10M. Some of this data was licensed for academic or non-commercial use only. For instance, the HD-VG-130M dataset, which contains 130 million YouTube videos, is restricted to academic research. NVIDIA allegedly disregarded these terms, maintaining that using the data for commercial AI products was permissible.

To avoid detection by YouTube, NVIDIA reportedly used virtual machines (VMs) with rotating IP addresses. One NVIDIA employee suggested a third-party IP-rotation tool, but another noted that simply restarting a VM instance on Amazon Web Services assigned it a new public IP address, which could be used to get around potential bans.


Featured Image courtesy of Wikimedia Commons


