Reddit Sues Perplexity AI for Data Scraping

Reddit Sues Perplexity AI and Others Over Alleged Industrial-Scale Data Scraping for AI Training

Reddit has filed a lawsuit against Perplexity AI and three other entities, accusing them of illegally scraping millions of Reddit user comments to train artificial intelligence systems for commercial gain. The complaint, filed in a New York federal court on October 22, 2025, alleges that this “industrial-scale, unlawful” data scraping exploits Reddit’s vast repository of user-generated content without permission, threatening the platform’s business model and users’ rights.

Background: Why Reddit Is Targeting AI Data Scrapers

Reddit, a social media platform with over 100 million daily active users, hosts one of the largest and most dynamic collections of human conversation on the internet. Its user comments and discussions are highly valuable for training AI language models, which learn patterns of human language from massive datasets.

In recent years, AI companies have increasingly relied on scraping publicly available online content to build and refine their systems. Reddit has previously reached licensing agreements with tech giants like Google and OpenAI, allowing those companies to use Reddit’s data legally in exchange for payment. These deals have provided Reddit with significant revenue, supporting its growth as a publicly traded company since its Wall Street debut in 2024.

However, Reddit alleges that Perplexity AI and others have bypassed such licensing to scrape data illicitly, undermining those agreements and Reddit’s control over its content.

The Defendants: Perplexity AI and Associated Entities

Perplexity AI, a San Francisco-based company known for its AI chatbot and “answer engine,” competes with Google and ChatGPT in online search and AI assistance. Reddit claims Perplexity built its product using the scraped data.
Oxylabs UAB, a Lithuanian data-scraping company, allegedly sold the scraped data to AI companies like Perplexity.
AWMProxy, a web domain described by Reddit as a “former Russian botnet,” is implicated as part of the data collection infrastructure.
SerpApi, a Texas-based startup, is also named for its role in scraping Google search results that include Reddit content.

This lawsuit is notable because it targets not only an AI company but also the lesser-known scraping services that supply data to AI developers, shedding light on the complex supply chains behind AI training datasets.

Legal and Industry Context

This is Reddit’s second legal action against AI companies in 2025. In June, Reddit sued Anthropic, another major AI developer, over similar allegations of unauthorized data use. These lawsuits highlight growing tensions between content platforms and AI firms over data rights, copyright, and fair compensation.

Ben Lee, Reddit’s chief legal officer, emphasized the company's stance: “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

Reddit’s efforts to enforce licensing and protect its data come amid intensified scrutiny of AI companies’ data sourcing practices. The AI industry’s rapid growth has fueled demand for enormous datasets, often acquired through aggressive web scraping, which raises legal and ethical questions about consent, copyright, and privacy.

Implications for the AI Industry and Content Platforms

For AI Companies: The lawsuit signals increased legal risks associated with scraping content from platforms without explicit authorization. Companies like Perplexity AI, which build competitive products using scraped data, may face financial liabilities and reputational damage.
For Content Platforms: Reddit’s aggressive legal stance illustrates how platforms can seek to monetize their user-generated data, either through licensing or litigation, as they balance openness with protecting intellectual property.
For Users: The suit raises concerns about how user content is repurposed by AI systems, often without direct consent or compensation, spotlighting broader debates about digital rights in the age of AI.
For the Legal Landscape: Courts will likely have to address novel questions about data scraping, copyright infringement, and the use of public internet content in AI training, a rapidly evolving area of law.

Reddit’s lawsuit against Perplexity AI and associated scraping services marks a critical moment in the ongoing battle over AI data sourcing, signaling that social media platforms are asserting their rights more forcefully as AI technologies advance rapidly. This legal action could set precedents impacting how AI companies acquire and use online content in the future.