google and robots.txt

30 May 2006


I received an odd email today from Google about one of the semi-private sites I manage.


From: crawl-coverage@google.com
Subject: [#59761910] Google needs assistance to crawl #####.org
Date: May 30, 2006 12:38:02 PM EDT

Hello Brett,

Google is currently blocked from crawling your site by the robots.txt file that your server uses to control access by search engines. As a result, users who are looking for your site are not able to find it using Google. As you know, a large fraction of internet users use Google as their starting point and over 50% of search engines referrals come from Google. Google’s mission is to deliver the best search experience on the Internet by making the world’s information universally accessible and useful. We would like to include your site at www.###.org in Google’s index to make it easier for your users or customers to find your site.

To allow Google to crawl your site, add the following lines to the
robots.txt file:

User-agent: Googlebot
Disallow:

Once you have allowed Google to access your site it can take between 4-6 weeks for your site to appear in our index. For additional information on getting your site included in our index, please see: http://www.google.com/webmasters/.

To help Google to crawl your site more effectively, you can create a Sitemap. For more information, please visit our site at
http://www.google.com/webmasters/sitemaps/docs/en/about.html.

If you do not wish to be contacted regarding inclusion in the Google index again, please send us an email so that we can remove you from our mailing list.

Sincerely,

Rebecca
The Google Crawl Coverage Team

Here’s what I don’t get: I went to the trouble of putting up a robots.txt file telling crawlers I don’t want the site indexed, which should make it obvious that the block is deliberate. I even excluded Googlebot by name.
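For anyone wondering how small the difference is between blocking and inviting the crawler, here is a rough sketch using Python's standard urllib.robotparser. The two rule sets are illustrative, not a copy of my actual file, and the example.org URL and the can_fetch helper are made up for the demonstration. The only change between them is the "/" after Disallow.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rules that shut Googlebot out of the whole site.
BLOCKING = """\
User-agent: Googlebot
Disallow: /
"""

# The rules suggested in the email: an empty Disallow blocks nothing.
ALLOWING = """\
User-agent: Googlebot
Disallow:
"""

# Placeholder URL, not the real site.
URL = "http://www.example.org/index.html"

blocked = can_fetch(BLOCKING, "Googlebot", URL)
allowed = can_fetch(ALLOWING, "Googlebot", URL)

print("with 'Disallow: /':", blocked)
print("with 'Disallow:':  ", allowed)
```

In other words, the email is asking me to delete a single character's worth of intent.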

But they’re asking me to take it down. Why?

Don’t get me wrong. I’m pleased as punch that they continue to abide by the robots.txt standard, unlike so many other crawlers.

But where is the money in this?

