Robots.txt Checker - SEO Crawler Analysis Tool

Check if your website has a robots.txt file configured and analyze its rules to manage search engine crawler access to your site.
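As a sketch of what such a check involves, the existence test can be done with Python's standard library alone (the function names and the `example.com` URLs below are illustrative, not part of this tool):

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen
from urllib.error import HTTPError, URLError

def robots_url(site: str) -> str:
    """Build the root-level robots.txt URL for a site (scheme defaults to https)."""
    parts = urlsplit(site if "//" in site else "https://" + site)
    # robots.txt is only honoured at the root of the host, never in a subdirectory
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def has_robots_txt(site: str, timeout: float = 5.0) -> bool:
    """True if the site serves robots.txt with a 2xx status."""
    try:
        with urlopen(robots_url(site), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (HTTPError, URLError):
        return False
```

Note that the URL is always rebuilt at the host root: a robots.txt placed anywhere else is ignored by crawlers.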


Understanding Robots.txt

  • Location: Must be placed at the root of your domain (e.g., example.com/robots.txt)
  • Purpose: Tells search engine crawlers which pages they can or cannot access
  • User-agent: Specifies which crawler the rules apply to (* means all crawlers)
  • Disallow: Blocks access to specific paths or pages
  • Allow: Explicitly permits access to specific paths (a more specific Allow overrides a matching Disallow)
  • Sitemap: Points crawlers to your XML sitemap location
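These directives can be exercised with Python's built-in `urllib.robotparser`; the rules and bot name below are made up for illustration. One caveat worth knowing: the stdlib parser applies rules in file order rather than by longest match, so the `Allow` line is listed first here.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rule set; Allow comes before Disallow because the stdlib
# parser uses the first matching rule, not the most specific one.
rules = """\
User-agent: *
Allow: /admin/help
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MyBot", "https://example.com/admin/"))      # blocked by Disallow
print(rp.can_fetch("MyBot", "https://example.com/admin/help"))  # permitted by Allow
print(rp.site_maps())  # sitemap URLs declared in the file
```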

Robots.txt Examples

✅ Good Robots.txt Example:

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Allow: /public/

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml

⚠️ Common Issues:

User-agent: *
Disallow: /        # This blocks ALL crawlers from your entire site!

User-agent: *
Disallow: *.pdf    # Incorrect syntax - should be /*.pdf
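A quick way to see why `Disallow: /` is so dangerous, using Python's stdlib parser as a stand-in for a real crawler (the URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" matches every path, so every URL on the site is blocked.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

for url in ("https://example.com/", "https://example.com/blog/post"):
    print(url, "allowed" if rp.can_fetch("AnyBot", url) else "blocked")
```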

🎯 Best Practices:

  • Always test your robots.txt before deploying
  • Use specific paths rather than wildcards when possible
  • Include your sitemap URL
  • Don't block CSS, JS, or image files needed for rendering
  • Be careful with the root disallow (Disallow: /)
  • Use comments (#) to document your rules
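The "test before deploying" step can be automated with a small pre-deploy check; this is a sketch, and the helper name and URL list are hypothetical:

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_lines, urls, agent="*"):
    """Return the URLs from `urls` that the given robots.txt lines would block."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return [u for u in urls if not rp.can_fetch(agent, u)]

# Hypothetical pre-deploy check: these URLs must stay crawlable.
must_crawl = [
    "https://example.com/",
    "https://example.com/static/site.css",
    "https://example.com/static/app.js",
]
rules = ["User-agent: *", "Disallow: /static/"]  # a deliberate mistake
print(blocked_urls(rules, must_crawl))  # the two /static/ assets come back blocked
```

Failing the deployment whenever this list is non-empty catches accidental blocks of rendering assets before they reach production.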

🚫 What NOT to Block:

  • CSS and JavaScript files (search engines render pages, so blocking these can hurt how your pages are assessed)
  • Images that are part of your content
  • Important pages you want indexed
  • Your entire website (unless intentional)