Robots.txt Generator
Create, customize, and validate your robots.txt with AI crawler management
Note: Google ignores the Crawl-delay directive; Bing and Yandex support it.
What is Robots.txt?
A robots.txt file is a plain text file placed at your website’s root (e.g., https://example.com/robots.txt) that tells crawlers which pages they can or cannot access. It follows the Robots Exclusion Protocol, defined in RFC 9309.
Every major search engine respects robots.txt. When Googlebot, Bingbot, or any compliant crawler arrives, it checks /robots.txt first. Robots.txt controls crawling (fetching pages) — not indexing (appearing in search results). For indexing control, use noindex meta tags.
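As a quick illustration, a minimal robots.txt might look like the sketch below (the paths and domain are placeholders):

```txt
# Apply to all crawlers
User-agent: *
# Keep them out of the admin area
Disallow: /admin/

# Sitemap must be an absolute URL
Sitemap: https://example.com/sitemap.xml
```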
Crawl Control
Specify which paths crawlers can and cannot access on your site.
AI Bot Management
Block AI training crawlers while allowing AI search bots for visibility.
Sitemap Discovery
Point crawlers to your XML sitemap for better content discovery.
AI Crawlers: Training vs. Search
In 2026, the most critical robots.txt decision is managing AI crawlers. There are two distinct categories:
AI Training Crawlers
These bots scrape content to build datasets for large language models. Blocking them prevents your content from being used for model training but has no effect on search visibility. Nearly 21% of top websites now reference GPTBot in their robots.txt.
AI Search Crawlers
These bots fetch pages on-demand for AI-powered search results (ChatGPT browsing, Perplexity, Google AI Overviews). Allowing them means your content can appear as a cited source in AI search, driving traffic to your site.
| Bot | Owner | Type | Recommendation |
|---|---|---|---|
| GPTBot | OpenAI | Training | Block if protecting content |
| Google-Extended | Google | Training | Block to opt out of Gemini training |
| ClaudeBot | Anthropic | Training | Block if protecting content |
| CCBot | Common Crawl | Training | Block to reduce AI dataset inclusion |
| ChatGPT-User | OpenAI | Search | Allow for AI search visibility |
| PerplexityBot | Perplexity | Search | Allow for AI search citations |
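Putting the table above into practice, a robots.txt that opts out of AI training while staying visible in AI search might look like this sketch (the user-agent tokens come from the table; the default rules at the end are placeholders):

```txt
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

# Explicitly allow AI search crawlers
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Default rules for all other crawlers
User-agent: *
Disallow: /admin/
```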
Directives Reference
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Target a specific crawler | User-agent: Googlebot |
| Disallow | Block a path from crawling | Disallow: /admin/ |
| Allow | Override a broader Disallow | Allow: /admin/ajax.php |
| Sitemap | Point to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests | Crawl-delay: 10 |
| * (wildcard) | Match any sequence of characters | Disallow: /*.pdf$ |
| $ (end anchor) | Match the end of a URL | Disallow: /search$ |
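A sketch combining the pattern-matching directives (the paths are made-up examples):

```txt
User-agent: *
# Block every PDF anywhere on the site: * matches any string, $ anchors the end
Disallow: /*.pdf$
# Block exactly /search; without the $, /search-tips/ would also be blocked
Disallow: /search$
# Block any URL containing a session parameter
Disallow: /*?sessionid=
```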
Best Practices
Do:
- Test your robots.txt before deploying
- Use absolute URLs for Sitemap directives
- Include a Sitemap reference for better discovery
- Review quarterly — new bots appear regularly
- Use Allow to override broader Disallow rules
- Block AI training bots if protecting content

Don't:
- Use robots.txt for security (the file is publicly readable)
- Block CSS/JS files (search engines need them to render pages)
- Expect robots.txt to remove already-indexed pages (use noindex instead)
- Use Disallow: / unless you actually want to block the entire site
- Forget the trailing slash on directory paths
- Assume all bots respect robots.txt (malicious crawlers ignore it)
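The "test before deploying" practice can be scripted. Below is a minimal sketch using Python's standard-library urllib.robotparser; the rules and URLs are made-up examples. Note that Python's parser applies rules in file order rather than by longest match, so the more specific Allow line is listed first here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules to verify before deploying
rules = """\
User-agent: *
Allow: /admin/ajax.php
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Check representative URLs against the draft rules
for url in ("https://example.com/admin/settings",
            "https://example.com/admin/ajax.php",
            "https://example.com/blog/post"):
    verdict = "allowed" if rp.can_fetch("*", url) else "blocked"
    print(url, "->", verdict)
```

Running checks like this in CI catches an accidental site-wide Disallow before it reaches production.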
Common Mistakes
| Mistake | Impact | Fix |
|---|---|---|
| Blocking CSS/JS files | Search engines can’t render pages correctly | Allow /wp-content/, /assets/ |
| Using robots.txt for noindex | Pages may still appear in SERPs via backlinks | Use <meta name="robots" content="noindex"> |
| Relative sitemap URLs | Crawlers can’t find your sitemap | Use full URL: https://example.com/sitemap.xml |
| Blocking the entire site accidentally | Complete de-indexing over time | Use specific paths instead of Disallow: / |
| Not managing AI crawlers | Content used for AI training without consent | Explicitly block unwanted AI bots by user-agent |
| Forgetting case sensitivity | Rules may not match intended paths | Match the exact case of your URL paths |
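Expanding on the noindex fix above: for the directive to work, the page must stay crawlable so engines can actually see it. Two standard forms:

```txt
# Option 1: meta tag in the page's HTML <head>
<meta name="robots" content="noindex">

# Option 2: X-Robots-Tag HTTP response header
# (works for PDFs and other non-HTML files)
X-Robots-Tag: noindex
```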
Frequently Asked Questions
Does robots.txt affect search rankings?
Not directly. Robots.txt controls which pages Googlebot can crawl, but it doesn't influence how pages rank once indexed. However, a misconfigured robots.txt can prevent important pages from being crawled and indexed at all, which effectively removes them from search results.
Can I block AI crawlers individually?
Yes. Each AI company uses specific user-agent tokens: for example, GPTBot for OpenAI's training crawler, ClaudeBot for Anthropic, and Google-Extended for Google's Gemini training. Block them individually with their user-agent names. Note: blocking GPTBot blocks training only — the separate ChatGPT-User agent handles AI-powered search.
Where should the robots.txt file be placed?
Always at the root of your domain: https://example.com/robots.txt. It must be accessible at this exact URL. Subdomains need their own robots.txt files — https://blog.example.com/robots.txt is separate from the main domain's file.
How quickly do robots.txt changes take effect?
Google caches robots.txt for up to 24 hours, so changes typically take effect within a day of updating. You can request a re-crawl through Google Search Console for faster processing. Bing and other engines may take longer.
Should I block SEO tool crawlers like AhrefsBot and SemrushBot?
It depends. SEO tool crawlers (AhrefsBot, SemrushBot, etc.) index your backlinks and keywords. Blocking them hides your data from competitors but also prevents you from analyzing your own site in these tools. Most sites leave them at default settings.
How often should I review my robots.txt?
At least quarterly. New AI crawlers emerge regularly, site structure changes over time, and outdated rules can harm SEO. After any major site restructure, migration, or launch, verify that your robots.txt still reflects current requirements.