I was idly watching some Apache access logs scroll by (well, actually I was busy doing something else, but I like to keep an eye on things to spot any interesting or worrying trends early) and noticed a bunch of entries like this:
18.104.22.168 - - [06/Feb/2011:06:21:23 +0100] "GET /blog/?o=10 HTTP/1.1" 301 290 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
22.214.171.124 - - [06/Feb/2011:06:21:33 +0100] "GET /blog/?o=40 HTTP/1.1" 301 290 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"
Never noticed that UA before, and it never seemed to follow up on the (perfectly valid) 301 redirects. A scan through 3 months of logs shows it's been doing that all the time - what a dumb bot.
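For anyone wanting to run a similar scan over their own logs, here is a rough sketch in Python. It assumes the standard Apache "combined" log format seen in the entries above; the function name `summarize_agent` and the regex are my own illustration, not part of any Apache tooling, and the sample lines are just the two entries quoted earlier.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize_agent(lines, agent_substring):
    """Tally status codes and requested paths for one user agent.

    Lines that do not parse, or whose user agent does not contain
    agent_substring, are skipped.
    """
    statuses = Counter()
    paths = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        if agent_substring not in match.group("agent"):
            continue
        statuses[match.group("status")] += 1
        paths[match.group("path")] += 1
    return statuses, paths


# The two entries quoted above, used as sample input.
sample = [
    '18.104.22.168 - - [06/Feb/2011:06:21:23 +0100] "GET /blog/?o=10 HTTP/1.1" '
    '301 290 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"',
    '22.214.171.124 - - [06/Feb/2011:06:21:33 +0100] "GET /blog/?o=40 HTTP/1.1" '
    '301 290 "-" "magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)"',
]

statuses, paths = summarize_agent(sample, "magpie-crawler")
for status, count in sorted(statuses.items()):
    print(status, count)
```

To scan real logs, pass an open file object instead of `sample`. If the status tally shows only 301s and the redirect targets never turn up in `paths`, the bot is not following the redirects.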
Checking the URL provided, this appears to be the home page of a UK-based company providing "Social Media Monitoring Tools", who don't have the courtesy to provide any more information about their bot / crawler, which is evidently not popular in some quarters.
So, as "Brandwatch" provides neither me nor the sites I run with any conceivable benefit, onto the blocklist they go.
(I wonder if they monitor their own brand?)
Posted at 2011-02-09 06:31:00