LART’s Blog


blocking nasty robots

Posted in Main by LART on December 14th, 2005

I was looking over the logs on one of my honeypots and saw something strange but not surprising: a robot using the MSIE 6 user agent. So I checked my logs to see whether it had downloaded robots.txt, and there was nothing. Now I’m getting a little pissed. Some more digging shows that the bot works for Cyveillance, and after reading the second result from Google I wanted to put on my tinfoil hat. This bot deserves to die, but it got me thinking: what’s the best and easiest way to block bots like this?

I fairly quickly came across “Stopping Spambots: A Spambot Trap”. On that page Neil talks about how he made a simple spambot trap that should catch any abusive robot that does not pay attention to robots.txt. So I thought of two things to do. The first is to keep track of who downloads robots.txt but simply ignores it (kind of hard without looking at logs or editing htaccess/httpd.conf). The second is to make a folder that robots.txt disallows for all robots, then put a page in it that warns users that if they click the link on that page they will be banned from this server. Whoever hits that second link gets logged (IP, user agent, etc.) in a DB.
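As a rough sketch of that second idea, the robots.txt entry could look something like this. The folder name /bot-trap/ is made up for illustration; any path that nothing else links to would do.

```
# Well-behaved robots will never request anything under this folder
User-agent: *
Disallow: /bot-trap/
```

Anything that still requests pages under that folder either ignored robots.txt or never fetched it.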

For now I think I’m going to simply make the latter of the two, then write a small PHP script that I can include on all PHP pages, with a simple SQL query that calls die(); if it finds the IP. Once I have that, I also want to add a second part to the PHP script that gets included on all pages. I want to look into using DNS blacklists for blocking open proxies, as they are nothing but trouble. The only downside is that the lookups can take some time and I don’t know of any way to multithread PHP. So my idea is to use an outside program, most likely in Java as it’s my language of choice, to do this step, and call it from PHP with a simple system call.
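Here is a minimal sketch of what that outside Java program might look like, not a finished tool. It assumes the usual DNS blacklist convention: reverse the IPv4 octets, append the blacklist's zone, and treat an answer in 127.0.0.0/8 as "listed". The class name DnsblCheck, the exit codes, and the zone dnsbl.example.org are placeholders I made up; swap in whichever real blacklist you end up using.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Sketch of a DNS blacklist check meant to be run from PHP via a system call. */
final class DnsblCheck {

    /** Builds the lookup name: IPv4 octets in reverse order, followed by the zone. */
    static String buildQuery(String ipv4, String zone) {
        // Limit of -1 keeps trailing empty strings, so "1.2.3.4." is rejected.
        String[] octets = ipv4.trim().split("\\.", -1);
        if (octets.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        for (String octet : octets) {
            if (!octet.matches("\\d{1,3}") || Integer.parseInt(octet) > 255) {
                throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
            }
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + "." + zone;
    }

    /** Returns true if the blacklist zone answers for this IP with a 127.x.x.x address. */
    static boolean isListed(String ipv4, String zone) {
        try {
            InetAddress answer = InetAddress.getByName(buildQuery(ipv4, zone));
            // Requiring 127.x.x.x guards against resolvers that answer for every name.
            return answer.getAddress()[0] == 127;
        } catch (UnknownHostException e) {
            // No record for this name means the IP is not on the list.
            return false;
        }
    }

    /** Usage: java DnsblCheck <ipv4> <zone>. Exit code 1 = listed, 0 = not listed, 2 = bad input. */
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java DnsblCheck <ipv4> <zone>");
            System.exit(2);
        }
        try {
            boolean listed = isListed(args[0], args[1]);
            System.out.println(listed ? "listed" : "not listed");
            System.exit(listed ? 1 : 0);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(2);
        }
    }
}
```

The PHP side would then only need to run something like `java DnsblCheck 192.0.2.1 dnsbl.example.org` and look at the exit status. Each lookup still blocks until the resolver answers or gives up, so this moves the wait out of PHP rather than removing it.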
