Tuesday, January 5, 2010 3:45 PM
Here at Penguin Blogs, Inc. we get a fair bit of comment spam. Most of it is automatically blocked by a fairly ingenious filter mechanism, but from time to time something unfamiliar gets through, such as contentless comments like the following:
Very nice posting. I liked it.
thank you for your great posting.
Well that was a nice post
Purportedly these were written by a "Nick Matyas" of "Web Royalty" - what looks to be a legitimate SEO consultancy (look them up yourself, I'm not giving them the benefit of a link) but who are either using underhanded spamming methods or have made a bad choice in outsourcing their own SEO.
Whatever, their URL is now on the filter list for this and many other sites, so they won't be troubling us here again.
Wednesday, December 19, 2007 9:31 AM
Deja-vu all over again...
74.86.249.98 - - [19/Dec/2007:10:17:22 +0100] "GET /path/to/file HTTP/1.1" 200 10049 "-" "Mozilla/5.0 (compatible; Gigamega.bot/1.0; +http://www.gigamega.net/bot.html)"
Tuesday, November 27, 2007 12:30 PM
38.113.234.181 - - [27/Nov/2007:13:21:24 +0100] "GET /robots.txt HTTP/1.0" 200 612 "-" "voyager-hc/1.0"
38.113.234.181 - - [27/Nov/2007:13:21:35 +0100] "GET /path/to/some/file.html HTTP/1.0" 301 363 "-" "voyager-hc/1.0"
38.113.234.181 resolves to crawl1.cosmixcorp.com, and cosmixcorp.com redirects to kosmix.com - a California, USA-based outfit which appears to be legit in a "we're a cool California start-up" kind of way. Not quite sure what they're doing (hey - it's Web 2.0), but it evidently involves crawling without an identifiable bot UA.
Our secret sauce (all Web 2.0 companies need one) is our categorization engine that crawls billions of Web pages in a unique manner to create algo-generated home pages…more on this later.
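For anyone wanting to repeat the lookup on the crawler's address, here is a minimal Python sketch of a reverse DNS check. The names `reverse_lookup` and `resolver` are my own illustrative choices; the only real API used is `socket.gethostbyaddr` from the standard library.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if the lookup fails.

    `resolver` is a hypothetical hook so the lookup can be swapped out
    (for example in tests); by default it is socket.gethostbyaddr.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname
```

Calling `reverse_lookup("38.113.234.181")` performs a live DNS query, so the answer depends on what the PTR record says today. A PTR record is set by whoever controls the address block, so treat it as a hint rather than proof of identity.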
Wednesday, November 7, 2007 2:26 PM
A string of entries from a bot calling itself LiteFinder/1.0: never heard of it, though the URL provided (http://www.litefinder.net/about.html) does work and claims it's "a research project started by a group of Indian candidates from the cities of Bangalore, Patna and Jaipur."
Thursday, August 3, 2006 5:57 AM
This stupid bot doesn't understand UTF-8 encoded URLs...
220.208.55.xxx - - [03/Aug/2006:07:45:03 +0200] "GET /ã??ã?«ã??.html HTTP/1.1" 404 2422 "-" "Pockey-GetHTML/4.14.1 (Win32; GUI; ix86)"
220.208.55.xxx - - [03/Aug/2006:07:45:04 +0200] "GET /ã??ã?£ã?ªã??ã?³.html HTTP/1.1" 404 2422 "-" "Pockey-GetHTML/4.14.1 (Win32; GUI; ix86)"
220.208.55.xxx - - [03/Aug/2006:07:45:06 +0200] "GET /è??å?¤�治�.html HTTP/1.1" 404 2422 "-" "Pockey-GetHTML/4.14.1 (Win32; GUI; ix86)"
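A minimal Python sketch of what probably goes wrong here, assuming the bot decodes the percent-encoded UTF-8 bytes of a link with a single-byte encoding such as Latin-1 before requesting it. That is my guess at the cause, not something the logs prove, and the path below is a made-up example rather than one of the real URLs on the site.

```python
from urllib.parse import quote, unquote

# Hypothetical non-ASCII path (not one of the real URLs on the site).
path = "/日本語.html"

# A well-behaved client percent-encodes the UTF-8 bytes of the path.
encoded = quote(path)

# Decoding those bytes as UTF-8 restores the original path.
good = unquote(encoded, encoding="utf-8")

# Decoding the same bytes as Latin-1 (the assumed bot behaviour) turns
# every UTF-8 byte into a separate character, i.e. mojibake.
bad = unquote(encoded, encoding="latin-1")

print(encoded)
print(good == path)
print(len(path), len(bad))
```

Each three-byte character becomes three unrelated characters under the wrong decoding, so the requested path no longer matches any file and the server answers 404.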
Tuesday, May 16, 2006 11:20 PM
This morning I found one of my sites had been subjected to a deep crawl by a bot naming itself "Francis/2.0 (francis@neomo.de http://www.neomo.de/)". The site seems to be an experimental but legitimate German-language search engine. The first hits from the bot were to robots.txt, although the site's crawler information page doesn't indicate which entries it interprets, if any. Requests look like this:
85.10.204.13 - - [16/May/2006:19:19:09 +0200] "GET /robots.txt HTTP/1.1" 206 390 "-" "Francis/2.0 (francis@neomo.de http://www.neomo.de/)"
85.10.204.13 - - [16/May/2006:19:19:09 +0200] "GET /robots.txt HTTP/1.1" 206 390 "-" "Francis/2.0 (francis@neomo.de http://www.neomo.de/)"
85.10.204.13 - - [16/May/2006:19:19:24 +0200] "GET / HTTP/1.1" 206 1949 "-" "Francis/2.0 (francis@neomo.de http://www.neomo.de/)"
85.10.204.13 - - [16/May/2006:19:19:25 +0200] "GET / HTTP/1.1" 206 1949 "-" "Francis/2.0 (francis@neomo.de http://www.neomo.de/)"
Interestingly, all requests returned HTTP status 206 (Partial Content), which suggests the bot sends a Range header with every request.
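To check a claim like this across a whole log file rather than by eye, something along these lines would do. It is a rough Python sketch that assumes the Apache "combined" log format seen in the entries above; `LOG_RE` and `status_counts` are names I made up for the example.

```python
import re
from collections import Counter

# Pattern for the Apache "combined" log format (assumed from the entries above).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def status_counts(lines, agent_substring):
    """Count HTTP status codes for requests whose user agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match and agent_substring in match.group("agent"):
            counts[int(match.group("status"))] += 1
    return counts


# Two of the entries quoted above, used as sample input.
sample = [
    '85.10.204.13 - - [16/May/2006:19:19:09 +0200] "GET /robots.txt HTTP/1.1" '
    '206 390 "-" "Francis/2.0 (francis@neomo.de http://www.neomo.de/)"',
    '85.10.204.13 - - [16/May/2006:19:19:24 +0200] "GET / HTTP/1.1" '
    '206 1949 "-" "Francis/2.0 (francis@neomo.de http://www.neomo.de/)"',
]

print(status_counts(sample, "Francis/2.0"))
```

Feeding it the lines of a real access log (for example from an open file object) gives a per-status tally for the bot, which makes an all-206 crawl stand out immediately.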