Cloudflare – The What, Why, When and the How of it

CCLE Team
contact@icle.in

Introduction

Cloudflare recently broke the internet by introducing a marketplace for scraping website content, which has hitherto been available for free. The company has announced a feature that gives publishers and content creators technical control over their data. Publishers registered with Cloudflare can now explicitly allow anyone to scrape their website data, deny access altogether, or choose something in between, depending on the chosen settings. Cloudflare’s marketplace can also enable publishers to monetize the ‘crawling’ at a set rate, a form of compensation from AI companies. The move might allow publishers to diversify their business model, enhance their revenue streams and better negotiate terms and conditions with AI companies when it comes to the provisioning of data.

About the Company

As per its website, Cloudflare is a global cloud platform which delivers services such as Secure Access Service Edge (SASE), Security Service Edge (SSE), threat intelligence and full stack development. Over the last year, the company has launched tools such as a one-click solution to block AI bots from accessing content and a dashboard to view how AI crawlers are visiting a site.

Significance of the development

The move is expected to significantly change how publishers and content creators do business. The terms & conditions under which AI companies access their data, which are virtually non-existent today, are likely to change significantly with this service, as publishers can now block AI bots by default. The problem of unauthorised scraping has led publishers to file multiple suits against AI companies in the past. For instance, ANI recently filed a suit against OpenAI in India, accusing ChatGPT of scraping its content for training and generating responses. Similarly, Hungarian news publishers have filed a case against the same company in Europe, reflecting the tension.

Virtually all Indian print and online media agencies have filed a case against Google before the Competition Commission of India, where one of the primary allegations is that zero-click searches distort the level playing field for such publishers. The essence of this feature is that Google sources the relevant content from the publisher’s website and displays it upfront on the Search Engine Result Page (SERP), negating the need to visit the website. The publishers have alleged that Google does not pay for this sourcing and that it further denies them the much-needed user traffic for generating advertisement revenue on their websites. The scale of crawling is made explicit by a statement from Cloudflare’s Matthew Prince, which mentions that for every referral sent, Google’s crawler scrapes 14 times, OpenAI’s crawler scrapes 17,000 times and Anthropic’s scrapes 73,000 times.

Implications on tech governance

It might be too early to comment on the implications at this stage. The technology economy thrives on data, and the generation, access and processing of this data have led to novel law and policy concerns. The tussle between Big Tech and publishers, being one of them, has kept governments across the globe on their toes. While the development of AI requires significant investment across the supply chain by tech companies, one commodity, arguably the most important, comes for free: data. Cloudflare has refused to provide terms & conditions on the basis that it seeks to serve publishers, though if the purported move has any substance, it is likely to significantly change those terms. The question of pricing is equally important, as a balance has to be struck between Cloudflare’s services and the purported loss from scraping.