Your robots.txt file is a single text file at the root of your domain that tells crawlers what they can and can't access. It's also the most common reason a website is invisible to ChatGPT. Most sites either explicitly block AI crawlers (often left over from a 2024 privacy panic) or use a blanket Disallow: / rule that catches every bot, AI or not.

This guide gives you the exact configuration to use in 2026 — the AI bots that exist, what each one does, and how to set them up without accidentally blocking Google or breaking your SEO. It's tactical. You can copy and paste from this page directly into your robots.txt and be done in 10 minutes.

The AI crawlers that matter in 2026

There are eight crawler names you actually need to think about. They split into three categories: search crawlers, training crawlers, and on-demand crawlers. The difference matters because you might want to allow citation but block training, or vice versa. (A short script after these lists shows how to check which of them already hit your site.)

Search crawlers — these index your content so AI assistants can cite you in real-time responses:

  • OAI-SearchBot — OpenAI's search index crawler. Citation lifeline for ChatGPT browsing mode.
  • PerplexityBot — Perplexity's search index. Critical for Perplexity citations.
  • Claude-SearchBot — Anthropic's search crawler (newer, growing share).
  • Google-Extended — technically a robots.txt control token rather than a separate crawler (Googlebot does the fetching). It governs whether Google can use your content for Gemini training and grounding; blocking it doesn't affect your normal Google Search rankings.

Training crawlers — these crawl your content to train future model versions:

  • GPTBot — OpenAI's training crawler.
  • ClaudeBot — Anthropic's training crawler.

On-demand crawlers — these fire when a user explicitly enables web browsing inside the AI assistant:

  • ChatGPT-User — fires when a ChatGPT user asks the assistant to open or summarize a specific page during their session.
  • Claude-User — fires when a Claude user explicitly browses.
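
Not sure which of these bots already visit you? Your server access log will tell you. Here's a minimal sketch in Python that counts hits per crawler, assuming a standard access log at a hypothetical path (point it at your own log file):

from collections import Counter

# User-agent substrings for the crawlers listed above.
# Google-Extended is omitted: it's a robots.txt token, not a
# user agent, so it never shows up in access logs.
AI_BOTS = [
    "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot",
    "GPTBot", "ClaudeBot", "ChatGPT-User", "Claude-User",
]

hits = Counter()
# Hypothetical path; substitute your server's access log.
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in AI_BOTS:
            if bot.lower() in line.lower():
                hits[bot] += 1

for bot in AI_BOTS:
    print(f"{bot}: {hits[bot]} hits")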

The recommended configuration

For 95% of businesses — anyone who wants to be cited by AI assistants and doesn't have legal restrictions on training data — the right move is to allow all of these. Here's the exact robots.txt to drop in:

# AI Search Crawlers — needed for citation in ChatGPT/Claude/Perplexity browsing
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

# AI Training Crawlers — needed for future model knowledge of your brand
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

# On-Demand Crawlers — needed when users explicitly browse to your site
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

# Standard search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# Default fallback
User-agent: *
Allow: /

# Sitemap location
Sitemap: https://yourdomain.com/sitemap.xml

Replace yourdomain.com with your actual domain. Upload this to the root of your site as robots.txt (not in a subfolder — it must be at yourdomain.com/robots.txt).
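
Once the file is live, you can verify it programmatically instead of eyeballing it. Here's a minimal sketch using Python's standard-library urllib.robotparser (yourdomain.com is a placeholder, as above):

import urllib.robotparser

BOTS = [
    "OAI-SearchBot", "PerplexityBot", "Claude-SearchBot",
    "GPTBot", "ClaudeBot", "ChatGPT-User", "Claude-User",
    "Googlebot", "Bingbot",
]

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://yourdomain.com/robots.txt")  # your domain here
rp.read()  # fetches and parses the live file

for bot in BOTS:
    verdict = "allowed" if rp.can_fetch(bot, "https://yourdomain.com/") else "BLOCKED"
    print(f"{bot}: {verdict}")

With the recommended config above, every bot should come back allowed.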

If you want to allow citation but block training

Some businesses — content publishers, paid research firms, niche IP holders — want AI assistants to cite their work but don't want their content used to train future models. This is the "search yes, training no" configuration:

# Allow search crawlers (for citation)
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Google-Extended
Allow: /

# Block training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Allow on-demand browsing (user-initiated)
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

This is a trade-off. Blocking training crawlers means your brand won't appear in future model training data, which weakens long-term citation share. For most businesses, the trade-off isn't worth it. For research publishers and IP-sensitive industries, it can be.
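
You can also test the split policy on a draft before uploading anything, since urllib.robotparser parses raw lines just as happily as a live URL. A sketch, using an abridged version of the config above:

import urllib.robotparser

# Abridged draft of the "search yes, training no" policy.
draft = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Search crawler allowed, training crawler blocked.
assert rp.can_fetch("OAI-SearchBot", "https://yourdomain.com/page")
assert not rp.can_fetch("GPTBot", "https://yourdomain.com/page")
print("Draft behaves as intended.")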

Common mistakes that block AI crawlers without you knowing

Mistake 1: A blanket disallow at the top

If your robots.txt starts with User-agent: * followed by Disallow: /, every bot without a more specific User-agent group of its own is blocked from every page. Some default templates ship with this, and it slips through unnoticed. Fix it immediately.

Mistake 2: Blocking the wrong user-agent name

AI crawler names changed throughout 2024 and 2025. If your robots.txt has lines like User-agent: ChatGPT or User-agent: OpenAI, those don't match anything — the real names are GPTBot and OAI-SearchBot. The block does nothing, but it also does no harm. Cleaner to just remove the broken lines.
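
If you want to catch stale names systematically, lint every User-agent line against the current list. A minimal sketch (the "known" set below is just this article's list plus the classic search bots, not an official registry):

KNOWN = {
    "*", "oai-searchbot", "perplexitybot", "claude-searchbot",
    "google-extended", "gptbot", "claudebot",
    "chatgpt-user", "claude-user", "googlebot", "bingbot",
}

with open("robots.txt", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        stripped = line.strip()
        if stripped.lower().startswith("user-agent:"):
            name = stripped.split(":", 1)[1].strip().lower()
            if name not in KNOWN:
                print(f"line {lineno}: unknown user-agent '{name}'")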

Mistake 3: Wix or Shopify default robots.txt

Some platforms ship with restrictive defaults. Wix used to block GPTBot by default in 2024 — they updated this in 2025, but legacy sites may still carry the old config. Shopify lets you override the default via the robots.txt.liquid theme template. Check your platform's documentation for how to customize.

Mistake 4: Subdomain-level blocks

robots.txt is per-subdomain. yourdomain.com/robots.txt doesn't apply to blog.yourdomain.com — that subdomain needs its own robots.txt. Many sites optimize their main domain and forget the blog subdomain that hosts their best content.
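
A quick loop catches this. Here's a sketch that checks each subdomain's robots.txt; the hostnames are placeholders for your own:

import urllib.error
import urllib.robotparser

# Placeholder hostnames; list every subdomain that serves content.
HOSTS = ["yourdomain.com", "blog.yourdomain.com", "docs.yourdomain.com"]
BOTS = ["OAI-SearchBot", "GPTBot"]

for host in HOSTS:
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"https://{host}/robots.txt")
    try:
        rp.read()
    except urllib.error.URLError as exc:
        print(f"{host}: could not fetch robots.txt ({exc})")
        continue
    for bot in BOTS:
        verdict = "allowed" if rp.can_fetch(bot, f"https://{host}/") else "BLOCKED"
        print(f"{host} | {bot}: {verdict}")

Note that a missing robots.txt reads as allow-everything, both to this parser and to real crawlers; the actual danger is a subdomain carrying a restrictive file you forgot about.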

How to test your robots.txt

After uploading the new file, verify it's working:

  1. Visit yourdomain.com/robots.txt directly. It should display the new content.
  2. Use the robots.txt report in Google Search Console (the old standalone robots.txt Tester was retired) to verify that Google can fetch and parse the file.
  3. Run our free Reffed audit — it specifically checks for AI crawler blocks and tells you which bots are allowed or blocked.

Most CDNs cache robots.txt, and Google itself caches it for up to 24 hours. If you make a change and the old version still shows, wait a day or purge your CDN cache.
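
To check whether you're seeing a cached copy, look at the response headers; a nonzero Age header usually means a cache (often your CDN) served the file. A minimal sketch:

import urllib.request

# Placeholder URL; use your own domain.
with urllib.request.urlopen("https://yourdomain.com/robots.txt") as resp:
    for header in ("Age", "Cache-Control", "Last-Modified", "ETag"):
        print(f"{header}: {resp.headers.get(header, '(not set)')}")
    body = resp.read().decode("utf-8", errors="replace")

# Eyeball the first lines to confirm the new file is being served.
print(body[:300])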

One thing robots.txt doesn't do

Allowing AI crawlers in robots.txt doesn't guarantee citation. It removes the most common blocker, but you also need structured data, content depth, authority signals, and Bing indexation. Robots.txt is the floor, not the ceiling. See our full diagnostic checklist for the rest.

But if you do nothing else on the AI search optimization checklist, fix your robots.txt first. It's the single highest-leverage 10 minutes you can spend, and it's often the entire reason a site has zero AI citation share.