Robots.txt Checker
Check if your website has a robots.txt file configured and analyze its rules to manage search engine crawler access to your site.
Understanding Robots.txt
- Location: Must be placed at the root of your domain (e.g., example.com/robots.txt)
- Purpose: Tells search engine crawlers which pages they can or cannot access
- User-agent: Specifies which crawler the rules apply to (* means all crawlers)
- Disallow: Blocks access to specific paths or pages
- Allow: Explicitly permits access to specific paths (major crawlers apply the most specific matching rule, so an Allow can override a broader Disallow)
- Sitemap: Points crawlers to your XML sitemap location
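The directives above can be checked programmatically. Here is a minimal sketch using Python's standard-library `urllib.robotparser`; the rules and URLs are hypothetical examples, not fetched from a real site:

```python
from urllib import robotparser

# Hypothetical robots.txt content illustrating the directives above.
rules = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/admin/"))  # False (disallowed)
print(parser.can_fetch("*", "https://example.com/about"))   # True (no rule matches)
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

In a real checker you would call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` instead of feeding the rules in by hand. Note that `site_maps()` requires Python 3.8+.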
Robots.txt Examples
✅ Good Robots.txt Example:
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Allow: /public/

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml
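One subtlety in this example is worth verifying: a crawler follows the single most specific User-agent group that matches it, so Googlebot obeys only its own group here, not the `*` rules. A sketch with Python's `urllib.robotparser` (hypothetical URLs, simplified rules) shows this:

```python
from urllib import robotparser

# Simplified version of the example above: a * group and a Googlebot group.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Googlebot matches its own group ONLY, so the * rules do not apply to it.
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))     # True
print(rp.can_fetch("Googlebot", "https://example.com/no-google/")) # False
print(rp.can_fetch("OtherBot", "https://example.com/admin/"))      # False
```

If you want Googlebot to also stay out of /admin/, you must repeat that Disallow inside the Googlebot group.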
⚠️ Common Issues:
User-agent: *
Disallow: /
# This blocks ALL crawlers from your entire site!

User-agent: *
Disallow: *.pdf
# Incorrect syntax - paths must start with "/"; with Google's wildcard
# support, use Disallow: /*.pdf$ to match only URLs ending in .pdf
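The first pitfall is easy to demonstrate: a single `Disallow: /` denies every path on the site. A quick check with Python's `urllib.robotparser` (hypothetical URLs):

```python
from urllib import robotparser

# The "block everything" pitfall: one "Disallow: /" rule denies all paths.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

for url in ("https://example.com/",
            "https://example.com/about",
            "https://example.com/blog/post-1"):
    print(url, rp.can_fetch("*", url))  # False for every URL
```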
🎯 Best Practices:
- Always test your robots.txt before deploying
- Use specific paths rather than wildcards when possible
- Include your sitemap URL
- Don't block CSS, JS, or image files needed for rendering
- Be careful with the root disallow (Disallow: /)
- Use comments (#) to document your rules
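Several of these practices can be checked automatically before deploying. The sketch below is an illustrative linter, not a full validator; the checks and messages are this example's own, covering only the issues discussed above (root disallow, non-"/" paths, missing Sitemap):

```python
def lint_robots_txt(text: str) -> list[str]:
    """Flag a few common robots.txt problems (illustrative checks only)."""
    warnings = []
    saw_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "sitemap":
            saw_sitemap = True
        elif field == "disallow":
            if value == "/":
                warnings.append("'Disallow: /' blocks the entire site")
            elif value and not value.startswith("/"):
                warnings.append(f"Disallow path should start with '/': {value!r}")
    if not saw_sitemap:
        warnings.append("No Sitemap directive found")
    return warnings


for w in lint_robots_txt("User-agent: *\nDisallow: *.pdf\nDisallow: /\n"):
    print(w)
```

Running it on the "common issues" example above reports all three problems.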
🚫 What NOT to Block:
- CSS and JavaScript files (search engines render pages and need these assets to evaluate them)
- Images that are part of your content
- Important pages you want indexed
- Your entire website (unless intentional)