How to authenticate Googlebot
Monday, 25 September 2006
Matt from Google has posted the 'official' way to check whether the spider hitting your website really is Googlebot.
The technique starts with a reverse DNS lookup on the visitor's IP address to check that the crawl host name ends in googlebot.com:
> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.
This is followed by a forward DNS lookup on that host name to verify that it resolves back to the same IP address:
> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
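For readers who would rather see the two lookups in code, here is a minimal Python sketch using the standard library's socket.gethostbyaddr (reverse lookup) and socket.gethostbyname_ex (forward lookup). The function name is_googlebot and its injectable reverse/forward resolver parameters are hypothetical, added only so the logic can be exercised without live DNS. It assumes IPv4 addresses, because gethostbyname_ex only returns IPv4 results. It also matches the host name by suffix rather than by a plain "contains" test, so a name such as googlebot.com.example.net is rejected.

```python
import socket
from typing import Callable, Iterable, Tuple

# Hypothetical resolver signatures, injectable so the check can be tested offline.
ReverseLookup = Callable[[str], str]
ForwardLookup = Callable[[str], Iterable[str]]


def _system_reverse(ip: str) -> str:
    # socket.gethostbyaddr returns (hostname, aliaslist, ipaddrlist).
    return socket.gethostbyaddr(ip)[0]


def _system_forward(host: str) -> Iterable[str]:
    # socket.gethostbyname_ex returns (hostname, aliaslist, ipaddrlist); IPv4 only.
    return socket.gethostbyname_ex(host)[2]


def is_googlebot(ip: str,
                 reverse: ReverseLookup = _system_reverse,
                 forward: ForwardLookup = _system_forward,
                 domains: Tuple[str, ...] = ("googlebot.com",)) -> bool:
    """Return True only if ip passes both the reverse and forward DNS checks."""
    try:
        # Step 1: reverse lookup (like `host 66.249.66.1`); drop any trailing dot.
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # The host must be the domain itself or a subdomain of it (suffix match).
    if not any(host == d or host.endswith("." + d) for d in domains):
        return False

    try:
        # Step 2: forward lookup (like `host crawl-...googlebot.com`).
        addresses = forward(host)
    except OSError:
        return False

    # The host name must resolve back to the original IP address.
    return ip in addresses
```

Called with no resolver arguments, is_googlebot("66.249.66.1") performs real DNS queries. Both steps matter: whoever controls an IP range's reverse DNS can make its PTR record claim any name, so only the forward lookup, answered by googlebot.com's own DNS, confirms the claim.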
This will be very handy for analytics vendors filtering fake spiders out of their stats, and for website owners controlling who can crawl their sites. Blog Zero has some great PHP and Perl code snippets if you'd like to see how it's done.