The robots.txt file is a plain text file placed at the root of your website (https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they should or should not access. It has been around since 1994 and is one of the oldest standards on the web. Every serious search engine respects it, though it is important to understand what it can and cannot do.
How robots.txt Syntax Works
The file uses a simple, line-based syntax. Each block starts with a User-agent line that specifies which crawler the rules apply to, followed by one or more Disallow or Allow directives. Here is a breakdown of the most widely supported directives:
- `User-agent`: Identifies the crawler. Use `*` to target all crawlers, or specify a particular bot like `Googlebot`, `Bingbot`, or `GPTBot`.
- `Disallow`: Tells the crawler not to access the specified path. `Disallow: /private/` blocks everything under the /private/ directory. An empty `Disallow:` means nothing is blocked.
- `Allow`: Overrides a Disallow rule for a specific path. Useful when you want to block a directory but allow access to certain files within it.
- `Sitemap`: Specifies the URL of your XML sitemap. This is not technically part of the original robots.txt standard, but all major search engines support it.
- `Crawl-delay`: Tells the crawler to wait a specified number of seconds between requests. Google ignores this directive (you can set crawl rate in Search Console instead), but Bing and some other crawlers respect it.
A Typical WordPress robots.txt Example
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /xmlrpc.php
Disallow: /?s=
Disallow: /search/
Sitemap: https://example.com/sitemap_index.xml

Let us walk through what each line does:
- `Disallow: /wp-admin/`: Prevents crawlers from accessing the WordPress admin area. There is no reason for search engines to crawl your dashboard.
- `Allow: /wp-admin/admin-ajax.php`: This exception is important because many themes and plugins use admin-ajax.php for frontend functionality. Blocking it can break features on your public-facing pages.
- `Disallow: /wp-includes/`: Blocks the WordPress core includes directory, which contains system files not meant for indexing.
- `Disallow: /readme.html`: Hides the WordPress readme file that reveals your WordPress version.
- `Disallow: /xmlrpc.php`: Blocks access to the XML-RPC endpoint, which is frequently targeted by brute-force attacks.
- `Disallow: /?s=` and `Disallow: /search/`: Prevent indexing of internal search result pages, which are low-value and can create duplicate content.
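You can sanity-check rules like these with Python's standard-library robots.txt parser. A quick sketch (note that `urllib.robotparser` applies rules in file order rather than Google's longest-match precedence, so treat it as a rough check, not a Googlebot simulator):

```python
from urllib import robotparser

# The example robots.txt from above (example.com is a placeholder domain).
RULES = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /readme.html
Disallow: /xmlrpc.php
Disallow: /?s=
Disallow: /search/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# The admin area and the readme file are blocked for all crawlers...
print(rp.can_fetch("*", "https://example.com/wp-admin/options.php"))  # False
print(rp.can_fetch("*", "https://example.com/readme.html"))           # False
# ...while normal content remains crawlable.
print(rp.can_fetch("*", "https://example.com/2024/hello-world/"))     # True
```

Running a check like this after editing the file catches typos before a crawler does.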
robots.txt vs. the noindex Meta Tag
This is one of the most commonly misunderstood distinctions in SEO. Many site owners think blocking a page in robots.txt prevents it from appearing in search results. That is not the case.
robots.txt controls crawling: it tells search engines not to visit a specific URL. But if other websites link to that URL, Google might still index it, showing the URL in search results with a note like "No information is available for this page."
The noindex meta tag controls indexing: it tells search engines "you can crawl this page, but do not include it in your search results." The crucial point is that Google needs to actually crawl the page to see the noindex directive. If you block a page in robots.txt AND add a noindex tag, Google cannot crawl the page to discover the noindex tag, so it might still index the URL based on external signals.
The rule of thumb: use robots.txt to manage crawl budget and keep crawlers out of server-side areas. Use noindex when you want a page removed from search results entirely.
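For reference, the tag itself is a single line in the page's `<head>`:

```html
<meta name="robots" content="noindex">
```

The same directive can also be sent as an HTTP response header, `X-Robots-Tag: noindex`, which works for non-HTML resources such as PDF files.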
How Googlebot Handles robots.txt
Google checks your robots.txt file regularly, typically caching it for up to 24 hours. If Google cannot fetch the file (for example, your server returns a 500 error), it will temporarily stop crawling your site to be safe. A 404 response, on the other hand, is interpreted as "no restrictions," meaning Google will crawl everything.
Google also supports pattern matching in robots.txt paths. You can use * as a wildcard and $ to indicate the end of a URL:
Disallow: /*.pdf$
Disallow: /category/*/page/

The first rule blocks all PDF files across the entire site. The second blocks pagination pages within category archives.
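Neither extension is supported by Python's standard robots.txt parser, but the matching logic is easy to sketch by translating a rule into a regular expression. This is an illustrative approximation, not Google's actual matcher, and `rule_matches` is a made-up helper name:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt path rule matches a URL path.

    Supports the two pattern extensions Google documents:
    '*' matches any sequence of characters, and a trailing '$'
    anchors the rule to the end of the URL.
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # re-enable the end-of-URL anchor
    # Rules match from the start of the path (prefix semantics).
    return re.match(pattern, path) is not None

print(rule_matches("/*.pdf$", "/downloads/report.pdf"))             # True
print(rule_matches("/*.pdf$", "/downloads/report.pdf?v=2"))         # False: '$' requires the URL to end in .pdf
print(rule_matches("/category/*/page/", "/category/shoes/page/2"))  # True
```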
Testing Your robots.txt with Google Search Console
Google Search Console provides a robots.txt report (which replaced the older standalone robots.txt Tester) showing which robots.txt files Google found for your site, when they were last crawled, and any warnings or errors hit while parsing them. To check whether a specific URL is blocked, use the URL Inspection tool, which reports whether crawling is allowed. This is valuable after making changes to your robots.txt, since a small typo can accidentally block important pages.
You should test your robots.txt after every change, especially after major site updates, theme changes, or migrations. It takes only a few seconds and can save you from accidentally deindexing parts of your site.
Common robots.txt Mistakes on WordPress Sites
A few mistakes show up repeatedly on WordPress sites:
- Blocking CSS and JavaScript files: Some older robots.txt templates block `/wp-content/` or `/wp-includes/` broadly. This prevents Google from accessing the CSS and JS files it needs to render your pages. If Googlebot cannot render your page properly, it cannot evaluate it correctly for ranking. Always allow access to CSS and JavaScript files.
- Blocking the entire site during development: Developers often add `Disallow: /` during staging and forget to remove it before launch. WordPress has a "Discourage search engines" setting that does something similar, and this gets left on more often than you would expect.
- Using robots.txt as a security measure: The file is publicly accessible. Anyone can read your robots.txt and see exactly which paths you are trying to hide. If you have sensitive content, use proper authentication or server-side access controls instead.
- Conflicting rules: When you have multiple User-agent blocks with overlapping rules, the behavior can be unpredictable. Google uses the most specific matching rule, but other crawlers might handle conflicts differently. Keep your robots.txt simple and avoid redundant blocks.
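One point about multiple User-agent blocks is worth demonstrating: a crawler follows only the single group that best matches its user agent, not the union of all groups. Python's `urllib.robotparser` models this group selection and can illustrate it (a sketch with made-up paths):

```python
from urllib import robotparser

# Two groups: a generic one and a Googlebot-specific one.
RULES = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# Googlebot follows only its own group and ignores the '*' rules entirely.
print(rp.can_fetch("Googlebot", "/private/"))  # True
print(rp.can_fetch("Googlebot", "/drafts/"))   # False
# Crawlers without a dedicated group fall back to the '*' group.
print(rp.can_fetch("Bingbot", "/private/"))    # False
```

If you want Googlebot to also respect `/private/`, you must repeat that rule inside the Googlebot group.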
WordPress Auto-Generated robots.txt and How to Customize It
If no physical robots.txt file exists in your WordPress root directory, WordPress generates a virtual one automatically. This default file is minimal, typically containing just the /wp-admin/ disallow rule with the admin-ajax.php exception.
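For reference, the auto-generated output on a recent WordPress install looks roughly like this (the Sitemap line comes from the core sitemaps feature introduced in WordPress 5.5, and example.com stands in for your own domain):

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml
```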
You have three options for customizing it:
- Create a physical file: Upload a `robots.txt` file to your WordPress root directory via FTP or your hosting file manager. This completely overrides the virtual version.
- Use an SEO plugin: Both Yoast SEO and Rank Math provide a robots.txt editor in the WordPress admin panel, so you can make changes without FTP access.
- Use a filter hook: Developers can modify the virtual robots.txt output using the `robots_txt` filter in WordPress. This approach keeps the customization in code, making it easier to track in version control.
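As a minimal sketch of the filter approach, the snippet below appends a Sitemap directive to the virtual file. The sitemap URL is a placeholder, and the code assumes no physical robots.txt exists, since a physical file bypasses this filter entirely:

```php
<?php
// Append a Sitemap directive to WordPress's virtual robots.txt.
// $output is the generated file content; $public is false when
// the "Discourage search engines" setting is enabled.
add_filter( 'robots_txt', function ( $output, $public ) {
    if ( $public ) {
        $output .= "\nSitemap: https://example.com/sitemap_index.xml\n";
    }
    return $output;
}, 10, 2 );
```

Placing this in a small must-use plugin rather than a theme's functions.php keeps it active across theme changes.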
What InspectWP Checks
InspectWP checks whether your WordPress site has a robots.txt file, analyzes the rules it contains, and verifies whether a sitemap reference is included. It flags common issues like missing sitemap directives, overly broad disallow rules that might block important content, and rules that could prevent Google from rendering your pages correctly.