Cloudflare has accused AI startup Perplexity of scraping content from websites that explicitly prohibited such activity via robots.txt files. In a blog post, Cloudflare alleges that Perplexity bypassed these blocks by disguising its identity—modifying user-agent strings and using different network identifiers—to avoid detection.
Cloudflare claims it observed this behavior across tens of thousands of domains, involving millions of requests daily, and identified the activity using machine learning and network signals.
Perplexity denied the claims. Spokesperson Jesse Dwyer dismissed the findings as a “sales pitch,” arguing that the cited screenshots didn’t show content access and that the bot mentioned wasn’t theirs.
This isn’t the first time Perplexity has faced scrutiny. In 2024, Wired and other media outlets accused it of content plagiarism. Perplexity's CEO struggled to define plagiarism when questioned during a TechCrunch event.
Cloudflare, which has ramped up its anti-AI scraping tools, has removed Perplexity from its list of verified bots and introduced new blocking measures. The company also recently launched a marketplace allowing site owners to charge AI crawlers for access.