Web-Scraping AI Bots Cause Disruption For Scientific Databases and Journals

Automated web-scraping bots seeking training data for AI models are flooding scientific databases and academic journals with traffic volumes that render many sites unusable. The online image repository DiscoverLife, which contains nearly 3 million species photographs, started receiving millions of daily hits in February this year that slowed the site to the point that it no longer loaded, Nature reported Monday. The surge has intensified since the release of DeepSeek, a Chinese large language model that demonstrated effective AI could be built with fewer computational resources than previously thought. This revelation triggered what industry observers describe as an "explosion of bots seeking to scrape the data needed to train this type of model." The Confederation of Open Access Repositories reported that more than 90% of 66 surveyed members experienced AI bot scraping, with roughly two-thirds suffering service disruptions. Medical journal publisher BMJ has seen bot traffic surpass legitimate user activity, overloading servers and interrupting customer services. Read more of this story at Slashdot.

Jun 3, 2025 - 01:00
 0
Web-Scraping AI Bots Cause Disruption For Scientific Databases and Journals
Automated web-scraping bots seeking training data for AI models are flooding scientific databases and academic journals with traffic volumes that render many sites unusable. The online image repository DiscoverLife, which contains nearly 3 million species photographs, started receiving millions of daily hits in February this year that slowed the site to the point that it no longer loaded, Nature reported Monday. The surge has intensified since the release of DeepSeek, a Chinese large language model that demonstrated effective AI could be built with fewer computational resources than previously thought. This revelation triggered what industry observers describe as an "explosion of bots seeking to scrape the data needed to train this type of model." The Confederation of Open Access Repositories reported that more than 90% of 66 surveyed members experienced AI bot scraping, with roughly two-thirds suffering service disruptions. Medical journal publisher BMJ has seen bot traffic surpass legitimate user activity, overloading servers and interrupting customer services.

Read more of this story at Slashdot.