Robots.txt Generator

Create, customize, and validate your robots.txt with AI crawler management

What is Robots.txt?

A robots.txt file is a plain text file placed at your website’s root (e.g., https://example.com/robots.txt) that tells crawlers which pages they can or cannot access. It follows the Robots Exclusion Protocol, defined in RFC 9309.

Every major search engine respects robots.txt. When Googlebot, Bingbot, or any compliant crawler arrives, it checks /robots.txt first. Robots.txt controls crawling (fetching pages) — not indexing (appearing in search results). For indexing control, use noindex meta tags.
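
A minimal robots.txt might look like the sketch below (example.com and the /admin/ path are illustrative): it lets every crawler in, keeps them out of one directory, and advertises the sitemap.

    User-agent: *
    Disallow: /admin/

    Sitemap: https://example.com/sitemap.xml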

🕷️ Crawl Control
Specify which paths crawlers can and cannot access on your site.

🤖 AI Bot Management
Block AI training crawlers while allowing AI search bots for visibility.

🗺️ Sitemap Discovery
Point crawlers to your XML sitemap for better content discovery.

AI Crawlers: Training vs. Search

In 2026, the most critical robots.txt decision is managing AI crawlers. There are two distinct categories:

AI Training Crawlers

These bots scrape content to build datasets for large language models. Blocking them prevents your content from being used for model training but has no effect on search visibility. Nearly 21% of top websites now reference GPTBot in their robots.txt.

AI Search Crawlers

These bots fetch pages on-demand for AI-powered search results (ChatGPT browsing, Perplexity, Google AI Overviews). Allowing them means your content can appear as a cited source in AI search, driving traffic to your site.

| Bot | Owner | Type | Recommendation |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Training | Block if protecting content |
| Google-Extended | Google | Training | Block to opt out of Gemini training |
| ClaudeBot | Anthropic | Training | Block if protecting content |
| CCBot | Common Crawl | Training | Block to reduce AI dataset inclusion |
| ChatGPT-User | OpenAI | Search | Allow for AI search visibility |
| PerplexityBot | Perplexity | Search | Allow for AI search citations |
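
Translated into a file, the table’s recommendations might look like the sketch below. The bot names come from the table above; whether to block each one is your policy call, and compliance is voluntary on the crawler’s side.

    # AI training crawlers: opted out
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # AI search crawlers: explicitly allowed (allow is also the default)
    User-agent: ChatGPT-User
    Allow: /

    User-agent: PerplexityBot
    Allow: /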

Directives Reference

| Directive | Purpose | Example |
| --- | --- | --- |
| User-agent | Target a specific crawler | User-agent: Googlebot |
| Disallow | Block a path from crawling | Disallow: /admin/ |
| Allow | Override a broader Disallow | Allow: /admin/ajax.php |
| Sitemap | Point to the XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests | Crawl-delay: 10 |
| * (wildcard) | Match any string | Disallow: /*.pdf$ |
| $ (end) | Match the end of a URL | Disallow: /search$ |

Note: Google ignores Crawl-delay; Bing and Yandex support it.
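
As a combined illustration (the paths are made up), the rules below close off an admin area, re-open a single endpoint inside it, and use the * and $ operators to block PDF files site-wide:

    User-agent: *
    Disallow: /admin/
    Allow: /admin/ajax.php
    Disallow: /*.pdf$

Under RFC 9309, the longest (most specific) matching rule wins, which is why Allow: /admin/ajax.php takes precedence over the broader Disallow: /admin/.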

Best Practices

✓ Do
  • Test your robots.txt before deploying
  • Use absolute URLs for Sitemap directives
  • Include a Sitemap reference for better discovery
  • Review quarterly — new bots appear regularly
  • Use Allow to override broader Disallow rules
  • Block AI training bots if protecting content
✕ Don’t
  • Use robots.txt for security (it’s publicly readable)
  • Block CSS/JS files (prevents page rendering)
  • Expect robots.txt to remove already-indexed pages
  • Use Disallow: / unless you want to block everything
  • Forget the trailing slash on directory paths (see the example after this list)
  • Assume all bots respect robots.txt
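
The trailing-slash rule is worth spelling out (the /private path is illustrative):

    Disallow: /private     # blocks /private, /private/, and also /private-offers
    Disallow: /private/    # blocks only /private/ and paths beneath it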

Common Mistakes

| Mistake | Impact | Fix |
| --- | --- | --- |
| Blocking CSS/JS files | Search engines can’t render pages correctly | Allow /wp-content/ and /assets/ |
| Using robots.txt for noindex | Pages may still appear in SERPs via backlinks | Use <meta name="robots" content="noindex"> |
| Relative sitemap URLs | Crawlers can’t find your sitemap | Use the full URL: https://example.com/sitemap.xml |
| Accidentally blocking the entire site | Complete de-indexing over time | Use specific paths instead of Disallow: / |
| Not managing AI crawlers | Content used for AI training without consent | Explicitly block unwanted AI bots by user-agent |
| Forgetting case sensitivity | Rules may not match intended paths | Match the exact case of your URL paths |
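
For instance, with the (made-up) rule below, a lowercase /downloads/ directory stays crawlable, because path matching is case-sensitive:

    User-agent: *
    Disallow: /Downloads/    # does not match /downloads/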

Frequently Asked Questions

Does robots.txt affect my Google rankings?

Not directly. Robots.txt controls which pages Googlebot can crawl, but it doesn’t influence how pages rank once indexed. However, a misconfigured robots.txt can prevent important pages from being crawled and indexed at all, effectively removing them from search results.

Can I block individual AI companies’ crawlers?

Yes. Each AI company uses specific user-agent tokens: GPTBot for OpenAI’s training crawler, ClaudeBot for Anthropic, and Google-Extended for Google’s Gemini training. Block them individually by user-agent name. Note that blocking GPTBot blocks training only; the separate ChatGPT-User agent handles AI-powered search.

Where does the robots.txt file go?

Always at the root of your domain: https://example.com/robots.txt. It must be accessible at this exact URL. Subdomains need their own robots.txt files: https://blog.example.com/robots.txt is separate from the main domain’s file.

How quickly do robots.txt changes take effect?

Google caches robots.txt for up to 24 hours, so changes typically take effect within a day. You can request a re-crawl through Google Search Console for faster processing. Bing and other engines may take longer.

Should I block SEO tool crawlers like AhrefsBot or SemrushBot?

It depends. SEO tool crawlers (AhrefsBot, SemrushBot, etc.) crawl your site to build backlink and keyword databases. Blocking them hides your data from competitors but also prevents you from analyzing your own site in these tools. Most sites leave them at default settings.

How often should I review my robots.txt?

Review it at least quarterly. New AI crawlers emerge regularly, site structure may change, and outdated rules can harm SEO. After any major site restructure, migration, or launch, verify that your robots.txt still reflects current requirements.