GoogleBot using Google Analytics data

The checkout process on one of my sites has multiple steps, and I use the same backend file (checkout.php) to process each of them. To track each of the pages in Google Analytics, I set a custom page name / event for each step of the checkout process.
For example, with the old urchin.js tracker:
urchinTracker("/checkout-S1");
Or with the new ga.js tracker:
pageTracker._trackPageview("/checkout-S1");

Example:

Page                      Url                            Page name / event sent to GA
Verify Items              usablelayout.com/checkout.php  /checkout-S1
Enter mailing address     usablelayout.com/checkout.php  /checkout-S2
Enter payment info        usablelayout.com/checkout.php  /checkout-S3
Thank you / confirmation  usablelayout.com/checkout.php  /checkout-S4
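
The step-to-page-name mapping above could be wrapped in a small helper so every step of checkout.php reports itself the same way. This is only a sketch: the function names `checkoutPageName` and `trackCheckoutStep` are hypothetical and not part of urchin.js or ga.js, and it assumes the tracker object (`pageTracker` in the ga.js example) has already been created on the page.

```javascript
// Hypothetical helper: maps a checkout step number (1-4) to the virtual
// page name that is reported to Google Analytics.
function checkoutPageName(step) {
  if (!Number.isInteger(step) || step < 1 || step > 4) {
    throw new RangeError("checkout step must be an integer from 1 to 4");
  }
  return "/checkout-S" + step;
}

// Hypothetical helper: reports the step with whichever tracker is available.
// `tracker` is assumed to be the ga.js page tracker object, if one exists.
function trackCheckoutStep(step, tracker) {
  var name = checkoutPageName(step);
  if (tracker && typeof tracker._trackPageview === "function") {
    tracker._trackPageview(name); // new ga.js tracker
  } else if (typeof urchinTracker === "function") {
    urchinTracker(name); // old urchin.js tracker
  }
  return name;
}
```

On the "Enter payment info" step, for instance, the page would call `trackCheckoutStep(3, pageTracker)`.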

I have noticed, both in my server logs and in my page not found (404) error report, requests showing that the GoogleBot had tried to access /checkout-S1, /checkout-S2, ... on my server. These paths do not exist as real pages; they appear only as the page names I send to Google Analytics. It appears that Google is using the data collected in Google Analytics to find new pages for the GoogleBot to crawl.

Has anyone else seen this sort of thing happen? We all knew Google would (at some point) analyze the data that they are collecting from our sites via Google Analytics and start using it for their main search algorithms, but this seems to be proof that they are doing it.

What are your thoughts? What other data that Google collects from our sites through Analytics is being used for their search algorithms?


Comments

Not to worry

Someone just pointed me to this. This link may be helpful: http://blogs.zdnet.com/Google/?p=39 . The most likely explanation is that Googlebot can scan JavaScript to discover new urls. But even then, we'd rather not discover Analytics/Urchin-related urls, so I passed this url on to some folks from the crawl team to check into.
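
To illustrate how a path such as /checkout-S1 could be picked up from a page's script without any Analytics data being involved, here is a deliberately naive sketch. It is not how Googlebot actually works; the function name `findPathLikeStrings` and the regular expression are assumptions made purely for illustration.

```javascript
// Naive illustration only: collects quoted string literals that look like
// site-relative paths (they start with "/"). A real crawler is far more
// sophisticated; this just shows why a tracker argument can look like a URL.
function findPathLikeStrings(scriptSource) {
  var pattern = /["'](\/[A-Za-z0-9_\-.\/]+)["']/g;
  var found = [];
  var match;
  while ((match = pattern.exec(scriptSource)) !== null) {
    if (found.indexOf(match[1]) === -1) {
      found.push(match[1]); // keep each candidate path once
    }
  }
  return found;
}

var sample = 'pageTracker._trackPageview("/checkout-S1");';
console.log(findPathLikeStrings(sample));
```

Run against the tracker call from the post, the scan picks out /checkout-S1 as a candidate path, which is consistent with the 404 requests described above.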
