Really Simple Licensing (RSL) emerged in 2024 as a technical response to AI companies training on web content without permission. Modeled after robots.txt, RSL allows website owners to declare machine-readable licensing terms for AI crawlers, specifying whether content may be used for training, whether compensation is required, and which AI agents are permitted.

The promise is straightforward: a text file that tells AI companies “these images require licensing.” But the visual industry has learned to be skeptical of simple fixes. Legislation couldn’t stop Pinterest from freely using images. Visible watermarks didn’t prevent scraping. IPTC metadata gets stripped.

The question isn’t whether RSL is technically sound. It’s whether it provides real value to photo agencies and independent photographers trying to protect their work.

What RSL Does

RSL is a declaration system, not a protection mechanism. Like robots.txt, it’s a text file on a web server (yourdomain.com/rsl.txt) that AI crawlers read before accessing content. The file contains machine-readable instructions about usage permissions.

For photo agencies, this means declaring:

  • Usage restrictions (training permitted/prohibited, generative use, commercial applications)
  • Licensing requirements (payment models, contact information)
  • Agent-specific rules (allowing search engines while blocking training crawlers)

Implementation is simple: create the file, host it, done. Infrastructure providers like Cloudflare have integrated RSL support, making enforcement easier.
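
To make the mechanism concrete, here is a minimal Python sketch of how a compliant crawler might check such a declaration before collecting images for training. The domain, the crawler name, and the simplified key-value parsing are assumptions modelled on the sample file in the sidebar, not the official RSL specification.

# Minimal sketch: a compliant crawler checking a site's RSL declaration
# before crawling for AI training. The rsl.txt location and the simple
# "Key: value" format follow the sidebar sample (an assumption, not the
# published RSL spec).
import urllib.request

CRAWLER_NAME = "ExampleTrainingBot"  # hypothetical user-agent

def training_permitted(domain: str, agent: str = CRAWLER_NAME) -> bool:
    url = f"https://{domain}/rsl.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # No declaration reachable: fall back to a conservative default.
        return False

    current_agents, permitted = [], False
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            current_agents.append(value.lower())
        elif key == "training":
            # Apply the rule if it targets this agent or all agents ("*").
            if agent.lower() in current_agents or "*" in current_agents:
                permitted = value.lower() == "permitted"
            current_agents = []  # a Training line closes the agent group
    return permitted

if __name__ == "__main__":
    print(training_permitted("agency.example.com"))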

The Hard Limitations

RSL only controls content on your domain. It governs what’s crawled from the publisher’s server at the moment of crawling. Nothing more.

Photo agencies typically distribute through syndication partners, license to publishers who host their own copies, and have decades of content already scattered across the web. RSL provides no protection for images already copied, mirrored, or redistributed. An AI company scraping a newspaper’s archive containing licensed agency photos bypasses the agency’s RSL file entirely.

Independent photographers face this more acutely. Most don’t host comprehensive archives on domains they control. Their work lives on Instagram, Flickr, portfolio platforms, and news sites, all places where an RSL file on the photographer’s own domain is irrelevant.

RSL depends entirely on voluntary compliance. It’s a declaration AI companies may respect or ignore. Major Western companies (OpenAI, Google, Microsoft, Anthropic, Meta) face regulatory scrutiny and reputational risks that incentivize some compliance. But the ecosystem also includes startups in jurisdictions with minimal enforcement, open-source developers creating “research datasets,” and actors who spoof user-agents.

RSL is a “No Trespassing” sign, not a wall.

Why Agencies Should Implement It Anyway

Even imperfect tools have strategic value.

Legal standing: RSL creates unambiguous, machine-readable documentation that licensing terms were clearly communicated. If an AI model trains on agency content despite explicit prohibition, the agency has documented evidence that the violation was deliberate rather than inadvertent. That distinction matters in litigation.

Search vs. training decoupling: Agencies need Google Image Search visibility but don’t want AI training use. RSL allows precise differentiation, maintaining search indexing while blocking training crawlers and attaching explicit licensing terms, something a plain robots.txt expresses only crudely.

Regulatory leverage: Article 53 of the EU AI Act requires foundation model providers to document training data sources and to comply with copyright, including machine-readable reservations of rights. RSL provides verifiable evidence of licensing terms. AI companies operating in the EU must demonstrate that data was obtained legally. RSL shifts the burden of proof: companies must proactively show compliance rather than photographers having to prove a violation.

The implementation cost is minimal, and the strategic benefits (legal standing, regulatory alignment, a basis for negotiation) justify adoption despite the limitations.

 

Screenshot of the Really Simple Licensing website. Could it be the best protection against AI scraping?

The Enforcement Stack: RSL + C2PA + Invisible Watermarking

No single technology solves content protection, but three technologies combined create a defensible architecture:

  • RSL declares usage rights at the domain level.
  • C2PA embeds provenance metadata that travels with individual files across platforms and can carry a “do not train” assertion.
  • Invisible watermarking persists even when that metadata is stripped.

If AI companies ignore RSL and scrape directly, the embedded C2PA metadata still documents that training was never authorised. If the metadata is stripped, watermarks remain detectable. If licensed images appear in generated outputs, watermark detection provides evidence of unlicensed training.

Agencies investing in protection need all three layers. RSL is the declaration. C2PA is the passport. Watermarking is the forensic trace.
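
To illustrate the watermarking layer, the sketch below embeds and later detects an invisible watermark using the open-source invisible-watermark package (a DWT-DCT approach). The payload and file names are illustrative, and commercial watermarking services will expose different APIs.

# Sketch: embedding and detecting an invisible watermark with the
# open-source "invisible-watermark" package
# (pip install invisible-watermark opencv-python).
# Payload and file names are illustrative.
import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

PAYLOAD = b"AGENCY-2025"  # hypothetical agency identifier (11 bytes = 88 bits)

# Embed: read the image, hide the payload, save a watermarked copy.
image = cv2.imread("original.jpg")
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", PAYLOAD)
watermarked = encoder.encode(image, "dwtDct")
cv2.imwrite("watermarked.jpg", watermarked)

# Detect: recover the payload from a copy found in the wild.
suspect = cv2.imread("watermarked.jpg")
decoder = WatermarkDecoder("bytes", len(PAYLOAD) * 8)
recovered = decoder.decode(suspect, "dwtDct")
print(recovered == PAYLOAD)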

Implementation

Technical deployment is straightforward: create an rsl.txt file with licensing terms and blocked agents, host it on your primary domain, configure CDN enforcement if using Cloudflare or similar services, and monitor compliance through server logs.
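
For the monitoring step, a simple log scan is often enough to see whether declared terms are being respected. The sketch below checks a combined-format access log for the AI crawler user-agents listed in the sidebar hitting restricted paths; the log location, format, and crawler list are assumptions to adapt to your own infrastructure.

# Sketch: scan a combined-format access log for AI crawlers that hit
# paths the RSL declaration disallows. Log location/format and the
# crawler list are assumptions to adapt to your own setup.
import re
from collections import Counter

AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot", "anthropic-ai", "Claude-Web"]
RESTRICTED = ("/images/", "/archive/", "/cdn/")  # from the sample rsl.txt

# Combined log format: ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

violations = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE_RE.search(line)
        if not m:
            continue
        path, ua = m.group("path"), m.group("ua")
        if path.startswith(RESTRICTED):
            for bot in AI_CRAWLERS:
                if bot.lower() in ua.lower():
                    violations[bot] += 1

for bot, hits in violations.most_common():
    print(f"{bot}: {hits} requests to restricted paths")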

Phase two integrates C2PA metadata into newly licensed images and evaluates watermarking solutions. Phase three involves ongoing monitoring, violation documentation, and industry coordination.
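
As a rough illustration of phase two, the sketch below builds the kind of “do not train” assertion a C2PA manifest can carry and writes it out as JSON for a signing tool such as c2patool. The field names follow the C2PA training-and-data-mining assertion, but they, and the signing command shown in the comment, should be verified against the current spec and your tooling before use.

# Sketch: a C2PA manifest fragment declaring that AI training and data
# mining are not allowed, written to JSON for use with a signing tool
# (e.g. c2patool). Field names follow the C2PA training-and-data-mining
# assertion; verify them against the current spec before relying on this.
import json

manifest = {
    "claim_generator": "AgencyPipeline/1.0",  # hypothetical generator name
    "assertions": [
        {
            "label": "c2pa.training-mining",
            "data": {
                "entries": {
                    "c2pa.ai_training":            {"use": "notAllowed"},
                    "c2pa.ai_generative_training": {"use": "notAllowed"},
                    "c2pa.data_mining":            {"use": "notAllowed"},
                }
            },
        }
    ],
}

with open("manifest.json", "w", encoding="utf-8") as f:
    json.dump(manifest, f, indent=2)

# The manifest is then embedded and signed into each image, for example:
#   c2patool original.jpg -m manifest.json -o signed.jpg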

The Collective Action Opportunity

Individual adoption creates modest protection. Coordinated industry adoption creates market leverage.

If major agencies implement aligned RSL standards, they establish industry norms that create leverage in negotiations with AI companies, prevent competitive undercutting on licensing terms, strengthen advocacy for regulatory protections, and distinguish professional licensed content from scraped social media imagery.

This parallels music industry collective licensing (ASCAP, BMI). RSL provides technical infrastructure for coordination without requiring complex legal structures: agencies simply adopt compatible terms.

Illustration: a robot eating images. No more photo-scraping AI crawlers?

What It Means

RSL won’t stop determined bad actors or protect widely distributed content. It will create clarity where ambiguity benefits AI companies, establish legal foundation for demonstrating deliberate violation, enable selective access that maintains search visibility while restricting training, position agencies for regulatory compliance, and provide negotiation basis for licensing deals.

For agencies with archives on their own domains, adoption is strategically sound. For independent photographers, RSL protects only self-hosted content, but costs nothing to implement.

The realistic framing: RSL is necessary but not sufficient. It works best as part of an integrated strategy combining technical tools (RSL + C2PA + watermarking), legal clarity (documented terms and violations), and commercial positioning (industry coordination and transparency requirements).

RSL provides the visual industry with a machine-readable way to assert that professional imagery is licensed content, not public-domain training data. That declaration won’t stop all scraping, but it creates a clear line between companies that respect licensing terms and those that don’t—a distinction that matters as regulation and legal liability shape the AI industry.

The question isn’t whether RSL is perfect. It’s whether agencies will use every available tool before the AI training economy solidifies into permanent structures.

 

SIDEBAR: Sample RSL Implementation for Photo Agencies

# RSL Declaration for [Agency Name]
# Location: https://agency.com/rsl.txt
# Last Updated: 2025-01-15

# Default Policy
User-agent: *
Disallow: /images/
Disallow: /archive/
Disallow: /cdn/
Use: none
Training: prohibited
Generative: prohibited

# Licensing Requirements
Require-license: training
Require-license: generative-ai
Require-license: model-development
Payment: subscription
Contact: licensing@agency.com
Terms: https://agency.com/ai-licensing-terms

# Search Engine Allowances
User-agent: Googlebot
User-agent: Bingbot
Allow: /images/
Use: search-indexing
Use: seo-display
Training: prohibited

# Specific AI Crawler Rules
User-agent: GPTBot
Use: none
Training: prohibited

User-agent: Google-Extended
Use: none
Training: prohibited

User-agent: CCBot
Use: none
Training: prohibited

User-agent: anthropic-ai
Use: none
Training: prohibited

User-agent: Claude-Web
Use: none
Training: prohibited

# Allowed Partners (if applicable)
User-agent: LicensedPartnerBot
Allow: /images/
Use: licensed-training
Training: permitted
License-ID: LP-2024-001

Implementation Notes:

  • Replace [Agency Name] with actual organization name
  • Update contact email and terms URL
  • Add specific licensed partner user-agents as agreements are established
  • Review and update quarterly as AI crawler landscape evolves
  • Monitor server logs to verify crawler compliance
  • Document any violations for potential legal action

Infrastructure Integration:

If using Cloudflare, enable “Bot Fight Mode” and configure custom rules to enforce RSL terms at the CDN level. This prevents bandwidth consumption from non-compliant crawlers even before requests reach origin servers.

If using other CDNs (Akamai, Fastly, AWS CloudFront), check for RSL-equivalent bot management features in recent platform updates.


This article represents analysis and strategic guidance. Implementation should be coordinated with legal counsel familiar with copyright, licensing, and AI regulatory developments in relevant jurisdictions.
