DMR News

Reddit Blocks Internet Archive to Prevent Free User Data Scraping

By Hilary Ong

Aug 13, 2025

Reddit has restricted the Internet Archive’s access to its content after learning that AI companies were using the Wayback Machine to scrape user data without payment. The Internet Archive, a nonprofit digital library dedicated to preserving webpages and providing universal access to knowledge, has historically been allowed to collect Reddit’s public data for non-commercial purposes. However, a Reddit spokesperson said the platform recently discovered cases in which AI companies violated its policies by using the Wayback Machine to gather information for free.

New Restrictions on Archiving

Reddit did not name the AI companies involved, but it confirmed that it has introduced measures to prevent the Wayback Machine from being used in this way. Starting this week, the Internet Archive will no longer be able to crawl Reddit’s post detail pages, user comments, or profiles. Its archiving will be limited to Reddit’s homepage, meaning visitors will only be able to access a daily snapshot of top posts rather than full user-generated content. Reddit has told the Internet Archive that these limits will remain in place until it can ensure compliance with platform rules, including privacy protections and the removal of deleted content.

In recent years, Reddit has made clear that it is open to AI companies accessing its data, as long as they pay for the privilege. The platform currently licenses data to Google for $60 million a year and has a similar arrangement with OpenAI. In contrast, Reddit recently filed a lawsuit against Anthropic, alleging that its AI bots accessed the site over 100,000 times without permission.

The Internet Archive has expressed optimism about resolving the issue. Wayback Machine director Mark Graham said the two organizations have a long-standing relationship and are engaged in ongoing discussions.

What The Author Thinks

Reddit’s move shows that the fight over AI training data is no longer just about ethics — it’s about money and control. While protecting user privacy is important, the fact that Reddit is happy to share the same data with AI companies if they pay for it makes this as much a business decision as a moral one. For the Internet Archive, the challenge will be proving it can still serve the public interest without becoming a free backdoor for AI firms.




