In a move that signals the beginning of a new era in protecting online content, web publishing platform Medium has announced that it will block OpenAI’s GPTBot, an agent that scrapes web pages for content used to train the company’s AI models. But this development is just the tip of the iceberg, as a group of platforms may soon form a unified front against what many consider an exploitation of their content.
A Growing Concern
Medium joins CNN, The New York Times, and numerous other media outlets in disallowing GPTBot in its robots.txt. This is a file found at the root of many sites that tells crawlers and indexers, the automated systems constantly scanning the web, which parts of the site they may and may not fetch.
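Blocking a crawler this way takes only two lines. A minimal robots.txt entry of the kind these outlets have added looks like this (`GPTBot` is the user-agent token OpenAI documents for its crawler; the `Disallow: /` rule covers the entire site):

```
User-agent: GPTBot
Disallow: /
```

Note that robots.txt is purely advisory: compliant crawlers read it before fetching pages and honor the rules, but nothing technically prevents a bot from ignoring the file altogether.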
The Reality Behind Web Scraping
While some may think that indexing is harmless, it’s essential to understand that AI makers do more than just index. They scrape the data to be used as source material for their models. Few are happy about this practice, least of all Medium’s CEO, Tony Stubblebine, who writes:
"I’m not a hater, but I also want to be plain-spoken that the current state of generative AI is not a net benefit to the Internet.
They are making money on your writing without asking for your consent, nor are they offering you compensation and credit…
AI companies have leached value from writers in order to spam Internet readers."
A Call to Action
In light of this, Medium has decided to default to telling OpenAI to take a hike when its scraper comes knocking. The move won’t deter spammers and others who will simply ignore the request, but robots.txt is not the only lever publishers have.
Stubblebine writes:
"Medium is not alone. We are actively recruiting for a coalition of other platforms to help figure out the future of fair use in the age of AI."
A Coalition of Support
What’s holding them back? Multi-industry partnerships are slow to form under the best of circumstances, and in this case the legal ground itself is still shifting.
Stubblebine continues:
"By the standards of publishing and copyright, AI is absolutely brand new and there are countless legal and ethical questions with no clear answers, let alone settled and widely accepted ones.
How can you agree to an IP protection partnership when the definition of IP and copyright is in flux? How can you move to ban AI use when your board is pushing to find ways to use it to the company’s advantage?"
A Call for Unity
It may take a 900-pound internet gorilla like Wikipedia to take a bold first step and break the ice. Some organizations are hamstrung by business concerns, but others are unencumbered by such things and can safely sally forth without fear of disappointing stockholders.
The Way Forward
However, until someone steps up, we will remain at the mercy of the crawlers, which respect or ignore our consent at their pleasure. It’s time for a change.
The Need for Action
Medium’s blocking of GPTBot is a significant step toward protecting its writers’ work, but it is only the beginning.
As Stubblebine so aptly puts it:
"A coalition of big organizations would be a powerful counterbalance to unscrupulous AI companies."
It’s time for other platforms to join forces and push back against nonconsensual AI scraping.
What Can You Do?
If you’re concerned about your content being scraped, here are some steps you can take:
- Review your robots.txt file: Make sure you’re disallowing unwanted agents such as GPTBot.
- Block at the server level: Since robots.txt is only advisory, ask your hosting provider or CDN to block the IP ranges of known scrapers for actual enforcement.
- Join a coalition: If you’re part of a platform or organization, join forces with others to take collective action.
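For the first step above, you can verify what your robots.txt actually permits using Python’s standard-library `urllib.robotparser`, which evaluates the rules the same way a compliant crawler would. This is a minimal sketch; the inline rules and URLs are illustrative, and a real check would load your site’s live robots.txt instead:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt contents; in practice, point the parser
# at https://yoursite.example/robots.txt with set_url() and read().
rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant GPTBot sees every path as off-limits under this rule...
print(parser.can_fetch("GPTBot", "https://yoursite.example/any-post"))

# ...while agents with no matching entry are unaffected.
print(parser.can_fetch("Googlebot", "https://yoursite.example/any-post"))
```

The first check prints `False` and the second `True`, which is exactly the asymmetry Medium is relying on: search indexing continues while the documented AI training crawler is told to stay out.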
The Future of Fair Use
As AI continues to evolve, it’s essential that we prioritize fair use and protect online content from exploitation.
Let’s work together to create a future where platforms can thrive without fear of being scraped.