What is Robots.txt?
Robots.txt is a plain text file in your website's root directory that tells search engine crawlers which pages or sections they're allowed to crawl. It's the first file well-behaved crawlers check when visiting your site.
You can use robots.txt to keep crawlers out of certain areas, like admin pages, duplicate content, or other sections you don't want crawled. It's a directive, not a guarantee: reputable bots follow it, but malicious ones can ignore it.
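To make the format concrete, here is a minimal sketch of a robots.txt file; the paths and sitemap URL are placeholders, not recommendations for any particular site:

    # Applies to all crawlers
    User-agent: *
    # Keep crawlers out of the admin area and cart pages (hypothetical paths)
    Disallow: /admin/
    Disallow: /cart/

    # Point crawlers at the XML sitemap
    Sitemap: https://www.example.com/sitemap.xml

Rules are grouped under a User-agent line, where * matches any crawler. A file with no Disallow rules (or an empty Disallow:) means the entire site may be crawled.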
Why Robots.txt Matters
Blocking the wrong pages in robots.txt is a common and costly mistake. If you accidentally block important pages, they won't be crawled, their content can't be evaluated, and they will usually drop out of search results entirely.
Many sites unintentionally carry over a staging-environment robots.txt that blocks everything. If your site suddenly disappears from Google, check your robots.txt file first.
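The classic staging leftover is a blanket block. If something like this shows up on a production site, every crawler is disallowed from every URL (a hypothetical illustration of the pattern):

    # Blocks ALL crawlers from the ENTIRE site - fine for staging, disastrous in production
    User-agent: *
    Disallow: /

Note that Disallow: with nothing after it allows everything, so a single stray / is the difference between a fully open site and a fully blocked one.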
What to Block
Block admin pages, login screens, and backend functionality that users don't need to find through search. Block duplicate pages or filtered/sorted versions of the same content.
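As a sketch of what that can look like, a store might keep crawlers out of backend paths and faceted-navigation URLs with rules along these lines (the paths and query parameters are hypothetical):

    User-agent: *
    # Backend and login areas
    Disallow: /admin/
    Disallow: /login/
    # Filtered and sorted duplicates of category pages
    Disallow: /*?sort=
    Disallow: /*?filter=

Wildcard patterns like * in paths are supported by major crawlers such as Googlebot and Bingbot but are not part of the original robots.txt standard, so test them before relying on them.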
Don't block pages just because they contain thin or low-quality content. If you don't want them indexed, use a noindex meta tag instead, and leave those pages crawlable so search engines can actually see the tag. Robots.txt controls crawling, not indexing: blocked pages can still appear in search results if other sites link to them.
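For reference, a typical noindex directive sits in the page itself rather than in robots.txt; it usually looks like this:

    <!-- In the <head> of the page you want kept out of the index -->
    <meta name="robots" content="noindex, follow">

Because crawlers must fetch the page to see this tag, make sure the same URL is not disallowed in robots.txt.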
Common Robots.txt Mistakes
Never block your CSS, JavaScript, or image folders. Google needs to crawl these files to render your pages the way visitors see them and to evaluate them under mobile-first indexing.
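A pattern still found on older sites is blocking asset directories outright; with rules like these (directory names are hypothetical), Google ends up rendering pages without their styles and scripts:

    # Risky: prevents crawlers from rendering pages properly
    User-agent: *
    Disallow: /assets/
    Disallow: /static/js/
    Disallow: /images/

Removing such rules, or adding explicit Allow lines for the asset paths, lets crawlers fetch everything they need to render the page.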
Don't use robots.txt as a security measure. The file is publicly accessible at yourdomain.com/robots.txt. Anyone can read it to see which areas you're trying to hide.
Check your robots.txt regularly, especially after site launches or migrations. A misconfigured robots.txt file can tank your organic traffic overnight without any other warning signs.
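One lightweight way to catch a bad deploy is a script that parses the live robots.txt and confirms your key URLs are still crawlable. The sketch below uses Python's standard urllib.robotparser; the URLs are placeholders you would swap for your own:

    from urllib.robotparser import RobotFileParser

    # URLs that must always remain crawlable (placeholders)
    MUST_BE_CRAWLABLE = [
        "https://www.example.com/",
        "https://www.example.com/blog/",
        "https://www.example.com/pricing/",
    ]

    def find_blocked(robots_url: str, urls: list[str]) -> list[str]:
        """Return the URLs that the live robots.txt blocks for a generic crawler."""
        parser = RobotFileParser()
        parser.set_url(robots_url)
        parser.read()  # fetches and parses the live file
        return [url for url in urls if not parser.can_fetch("*", url)]

    if __name__ == "__main__":
        blocked = find_blocked("https://www.example.com/robots.txt", MUST_BE_CRAWLABLE)
        if blocked:
            print("WARNING: robots.txt now blocks:", *blocked, sep="\n  ")
        else:
            print("All key URLs are crawlable.")

Running a check like this after every launch or migration turns an overnight traffic drop into a warning you see immediately.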
Put this knowledge into practice
PostGenius helps you write SEO-optimized blog posts with AI — applying concepts like this automatically.