DMR News

Advancing Digital Conversations

Reddit CEO Demands Microsoft Pay for Data Usage

By Hilary Ong

Aug 2, 2024

Reddit’s CEO, Steve Huffman, has publicly demanded that Microsoft and other AI companies compensate Reddit for using its data, which he says they have been accessing without authorization.

In a detailed interview with The Verge, Huffman accused Microsoft’s Bing, along with AI companies Anthropic and Perplexity, of scraping Reddit’s content without permission. He emphasized the challenges and frustrations associated with blocking these companies, describing it as “a real pain in the a**,” but stressed that it was a necessary measure to safeguard Reddit’s data and ensure that the platform receives proper compensation.

Concerns Over Data Control

Reddit has already established deals with companies like Google and OpenAI, where these entities pay for the right to use Reddit’s data. However, Huffman pointed out that Microsoft and others have been resistant to such agreements. He noted that without these deals, Reddit lacks control over how its data is displayed or used, resulting in unauthorized uses, including training AI models and summarizing content on Bing without Reddit’s consent.

Huffman specifically mentioned that Microsoft has been using Reddit’s data to train its AI and has been selling access to this data through the Bing API to other search engines. He also referred to a recent statement by Microsoft AI CEO Mustafa Suleyman, who described public data on the internet as “freeware.”

Steps to Block Unauthorized Data Scraping

To address these issues, Reddit has been actively working to prevent unauthorized data scraping. In early July, Reddit updated its robots.txt file, the file that implements the Robots Exclusion Protocol, to block web crawlers from companies with which it does not have agreements. As a result, Reddit content is now accessible only through search engines like Google, which compensates Reddit for its data.
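For readers unfamiliar with how a robots.txt block works, the following is a minimal sketch in Python using the standard library's `urllib.robotparser`. The rules shown are hypothetical and are not Reddit's actual robots.txt; the crawler names `ExampleLicensedBot` and `OtherBot`, the `example.com` URL, and the helper `is_allowed` are assumptions made purely for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
# One named crawler is allowed everywhere; every other crawler is disallowed.
ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/r/some_page"  # placeholder URL
    print("ExampleLicensedBot:", is_allowed("ExampleLicensedBot", page))
    print("OtherBot:", is_allowed("OtherBot", page))
```

Under these assumed rules, the named crawler is permitted and any other crawler is refused. Note that robots.txt only states a site's wishes: a crawler has to check the file and choose to obey it, which is why disputes like this one turn on whether companies actually honor it.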

Microsoft’s head of search, Jordi Ribas, confirmed on X (formerly Twitter) that Bing had been blocked from accessing Reddit’s data, saying that Reddit was favoring another search engine and that this affected competition.

Protecting Data Rights and Future Agreements

Huffman pointed to OpenAI’s recent announcement of SearchGPT, which will include Reddit results thanks to a deal between the two companies, as an example of the kind of agreement Reddit seeks to replicate. Reddit spokesperson Tim Rathschmidt clarified that none of Reddit’s current content licensing deals include exclusive use cases for its data, indicating a willingness to work with multiple partners.

This situation reflects a broader trend among content creators and traditional media publishers, including The Verge’s parent company, Vox Media, who are increasingly seeking compensation for their content used by generative AI models. Huffman highlighted that the traditional value exchange from search engines—crawling content in exchange for traffic—is becoming more complex as the lines between search, summarization, and training blur.

After the story was published, Anthropic spokesperson Jennifer Martinez stated that Reddit has been on their block list for web crawling since mid-May and that they respect the robots.txt file as the industry standard for blocking web crawlers.

Microsoft declined to comment on the matter, and Perplexity did not respond to a request for comment.


Featured Image courtesy of Jakub Porzycki/NurPhoto via Getty Images

