How to Block Specific Bots from Crawling Your Site Using Robots.txt
The robots.txt file is a text file placed in your website’s root directory. It instructs web crawlers which pages or directories they can or cannot access.
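For a site at https://example.com, crawlers look for the file at https://example.com/robots.txt. As a minimal illustration, the following file lets every crawler access everything, because an empty Disallow value blocks nothing:
User-agent: *
Disallow: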
Why Block Specific Bots?
Not all web crawlers are beneficial. Some bots consume server resources, scrape content, or skew analytics. Blocking unwanted bots improves site performance, security, and SEO accuracy.
Identifying Bots to Block
Check your server logs or analytics tools to identify bots. Common unwanted bots include:
- AhrefsBot
- SemrushBot
- MJ12bot
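Crawlers identify themselves in the User-Agent header of each request, so they appear in your access logs. The entry below is purely illustrative (Apache/Nginx combined log format; the IP address, timestamp, and version are made up), but it shows the token to look for:
203.0.113.7 - - [12/Mar/2024:10:15:32 +0000] "GET /blog/post HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
The bot name inside the User-Agent string (here AhrefsBot) is the name you target in robots.txt.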
Syntax for Blocking Bots in robots.txt
Use the User-agent directive to target specific bots and Disallow to restrict access.
Block a Single Bot
User-agent: BadBot
Disallow: /
Block Multiple Bots
User-agent: Bot1
Disallow: /

User-agent: Bot2
Disallow: /
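Applying this pattern to the crawlers listed earlier gives one group per bot:
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /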
Wildcards
User-agent: *
Disallow: /private/
The * wildcard applies the rules to all bots that are not matched by a more specific User-agent group.
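Wildcard and bot-specific groups can live in the same file. Under the robots.txt standard, a crawler follows only the most specific User-agent group that matches it, so in the sketch below BadBot is blocked from the entire site while every other bot is only kept out of /private/:
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /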
Common Mistakes to Avoid
- Typos: Ensure correct spelling of User-agent and bot names.
- Incorrect Paths: Double-check directories in Disallow.
- Conflicting Directives: Avoid mixing Allow and Disallow haphazardly.
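On the last point, Allow and Disallow can be combined deliberately. Crawlers that implement the current robots.txt standard resolve overlapping rules by the longest matching path, so the group below (with illustrative paths) blocks /downloads/ while still permitting /downloads/free/:
User-agent: *
Disallow: /downloads/
Allow: /downloads/free/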
Testing Your robots.txt
Use Google Search Console’s Robots Testing Tool to validate your rules. Check server logs to confirm bots are respecting the file.
Advanced Tips
- Crawl-Delay: Add Crawl-delay: 10 to slow down frequent crawlers (not all bots support this); a sketch follows this list.
- IP Blocking: Use .htaccess or firewall rules for aggressive bots; a sketch follows this list.
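A Crawl-delay group uses the same structure as the earlier examples. The sketch below asks SemrushBot to wait ten seconds between requests; support varies by crawler, and Googlebot in particular ignores the directive while Bing honors it:
User-agent: SemrushBot
Crawl-delay: 10

For bots that ignore robots.txt altogether, blocking at the server level is more reliable. The .htaccess sketch below assumes Apache 2.4 with mod_setenvif and mod_authz_core enabled, and uses MJ12bot purely as an example token to match in the User-Agent header; other servers and firewalls require different syntax:
# Flag any request whose User-Agent contains "MJ12bot" (case-insensitive)
BrowserMatchNoCase "MJ12bot" bad_bot
# Allow everyone except requests carrying the bad_bot flag
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>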
Conclusion
robots.txt is a simple yet powerful tool for controlling bot access. Regularly audit and test your file to confirm that crawlers are complying with it. Note that malicious bots may ignore these rules, so combine robots.txt with server-level security measures for full protection.