The Era of AI Hijacking: How Generative Scourges are Plundering the Internet
In the age of advanced AI, the boundaries between fair play and thievery have become increasingly fuzzy. The latest company caught in this murky landscape is Perplexity AI, a startup that’s allegedly pillaging websites and passing off others’ work as its own.
This “genius” of an AI chatbot combines a search engine with a large language model to generate answers that can leave users none the wiser as to the original source. Unlike many of its competitors, Perplexity doesn’t train its own foundation models; instead, it relies on open or commercially available ones to pilfer information from the internet and regurgitate it back to users.
But don’t just take our word for it. This June, Forbes accused Perplexity of plagiarizing one of its articles verbatim, while Wired reported that an IP address tied to the startup was snatching content from various websites, flouting the Robots Exclusion Protocol, a voluntary standard through which site owners tell web crawlers which areas are off limits. Despite this, Perplexity claims it is operating within the bounds of fair use under copyright law.
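For readers wondering what honoring the Robots Exclusion Protocol looks like in practice, here is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt rules, the bot names (`ExampleBot`, `OtherBot`), and the `example.com` URLs are all hypothetical, chosen only for illustration; nothing here reflects how Perplexity or any other company actually fetches pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. It bans one named bot
# from the whole site and keeps every other bot out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleBot", "https://example.com/news/story.html"),
        ("OtherBot", "https://example.com/news/story.html"),
        ("OtherBot", "https://example.com/private/report.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

The catch is that the check is purely voluntary: a well-behaved crawler runs something like `is_allowed` before every request, but nothing in the protocol stops a scraper from skipping it, which is the behavior Wired’s reporting alleges.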
So, what does this mean? To its critics, unchecked web scraping is the digital equivalent of industrial-scale looting. Yet the practice itself is everywhere: search engines like Google do it to index the web, while other companies and researchers gather data for market analysis, academic research, and, as alleged in Perplexity’s case, feeding AI products. The question is, at what cost?
Perplexity maintains that summarizing a URL isn’t web scraping, but rather a helpful tool for users. They argue that web crawlers are meant to index websites, while their AI is simply processing user requests. But to others, this distinction is a thin veil masking a more sinister reality.
The tech world is abuzz with the implications. Wired reported that Amazon Web Services (AWS) is investigating Perplexity for ignoring the Robots Exclusion Protocol, while the startup’s CEO promised to cite sources more prominently in the future – a commitment that’s far from foolproof. After all, hallucinated links and AI-generated content have already been spotted in the wild.
Some argue that fair use might be on Perplexity’s side, citing the doctrine as the U.S. Copyright Office describes it: limited portions of a work may be used for purposes like commentary, criticism, news reporting, or scholarship. But this raises troubling questions about the future of original content and how AI companies will continue to operate in the gray areas of copyright law.
As the dust settles, one thing is clear: Perplexity has set a dangerous precedent. If their behavior goes unchecked, we can expect a wave of AI-powered data thieves to follow suit, and with them, a bleak future of piracy and intellectual property theft.
Now, the question is, what’s stopping Perplexity from taking an even bigger slice of the data pie?