My clients have been featured in Forbes, The New York Times, People magazine, Playboy Online, The Toronto Star, Upclose Magazine,
Home & Garden Television, Culinary Thymes, ABC News, The Salt Lake Tribune, Houston 11, The Washington Post,
The Honolulu Star & more.
Yahoos and Google Bot

Funny how things change. In 1726, Gulliver was a bold adventurer and a Yahoo was, by Gulliver's description, a naked, unkempt humanlike beast with a voracious sexual appetite. Gives a whole new perspective to working at home in your undies, doesn't it? (Do you Yahoo? I do!)

And Gulliver? Today, Gulliver is Northern Light's search engine bot, and he's more often found surfing your server logs than shipwrecked on the high seas.

On a regular basis, I get people asking me why their search engine listings are poor (because the bots don’t know or like you) or... how to keep the search engines out of their "private" pages (sorry, you can’t). Mostly, a lot of people have a lot of misconceptions... usually because of something they read... usually written by some Yahoo. (Gulliver’s kind, not Dave & Jerry’s.)

So, let’s fix up some of those misconceptions, okay?

If you’re perusing your logs (stats) and notice a whole lot of visits by Gulliver or Scrubby, Scooter or Mercator, those aren’t die-hard fans with cute screen names. They’re search engine bots figuring out how to, where to, or whether to index your site.
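If you'd rather not eyeball your logs for bot names, a few lines of Python can do the counting. This is only a rough sketch: the log lines below are made up, and matching on the bare bot name is an assumption, since real user-agent strings vary from bot to bot and host to host.

```python
from collections import Counter

# Bot names mentioned in this article. Matching on the bare name is an
# assumption: real user-agent strings vary, so check your own logs.
BOT_NAMES = ("Gulliver", "Scrubby", "Scooter", "Mercator", "Googlebot")


def count_bot_hits(log_lines, bot_names=BOT_NAMES):
    """Count log lines that mention each bot name (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name in bot_names:
            if name.lower() in lowered:
                hits[name] += 1
                break  # count each log line at most once
    return hits


# Made-up log lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2001:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 68 "-" "Gulliver/1.3"',
    '192.0.2.10 - - [01/Jan/2001:10:00:05 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Gulliver/1.3"',
    '192.0.2.44 - - [01/Jan/2001:10:02:11 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 5.0)"',
    '192.0.2.77 - - [01/Jan/2001:10:03:40 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Scooter/3.2"',
]

for name, count in count_bot_hits(SAMPLE_LOG).most_common():
    print(f"{name}: {count}")
```

To try it on your own stats, read your log file into a list of lines and pass that in place of `SAMPLE_LOG`.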
If you’re of the curious sort, you can check out the names of common search engine bots here:

Want to keep the search engine bots from indexing a particular file or folder? All you have to do is create a text file called robots.txt and upload it to your root directory. It should look like this:
User-agent: *

The asterisk, of course, is a wildcard symbol to address "all" bots. Below that line goes a Disallow: line for each file or folder you want the bots to skip. You can even speak to specific bots... just address them by name, like this:
User-agent: Googlebot
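Want to see how a bot would read those rules? Here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt contents and every path in it are hypothetical, made up just for this illustration, and real bots may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /cgi-bin/
Disallow: /private.html
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, bot_name, path):
    """Return True if the named bot may fetch the given path."""
    return parser.can_fetch(bot_name, path)


parser = build_parser(ROBOTS_TXT)
for bot, path in [
    ("Googlebot", "/drafts/notes.html"),
    ("Googlebot", "/cgi-bin/search"),
    ("Scooter", "/cgi-bin/search"),
    ("Scooter", "/index.html"),
]:
    print(bot, path, is_allowed(parser, bot, path))
```

One thing worth noticing: in this parser, a bot that is addressed by name follows only its own section, so the hypothetical Googlebot rules above do not inherit the wildcard's /cgi-bin/ rule. If you want a named bot kept out of the same places as everyone else, repeat those lines in its section.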
Did you know that you can read anyone’s robots.txt file? Try this one, for example:

Some people are not able to use a robots.txt file because their host doesn’t grant root access or they don’t know how to upload to the root directory. Luckily, you can achieve the same thing with meta tags. Like this:
<META NAME="ROBOTS" CONTENT="NOINDEX">
<META NAME="ROBOTS" CONTENT="NOFOLLOW">

The first tag tells the search engines not to index that page. The second tells them not to follow any links on that page. You can use one, or both, or combine them, like this:

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">

Another important thing to know is that the robots.txt file is not a good way to "hide" private member pages. Firstly, anyone can view the file at yourdomain.com/robots.txt. More importantly, when one of your members hops from your members’ area to another site, your "member" URL will be found in the "other" site’s referrer logs.

Fear not, though, that’s why you have passwords in your private members’ area. Even if a search engine lists your "private" pages, people still need a password to get into them... provided you use passwords, of course!

While we’re on the topic of referrer logs, when you are "in" your mailing list program, or in that secret vault uploading software... don’t just surf on elsewhere when you’re done. Close your browser first. Open a new one before you mosey on. You don’t want those URLs in someone’s referrer logs for trusty little Gulliver and the other bots to find, now do you?
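Going back to those robots meta tags for a moment: if you want to double-check what a page is actually telling the bots, here is a minimal sketch using Python's standard `html.parser` module. The class and function names are my own invention for this example, and it only reads the tag; whether a given bot honors it is up to the bot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head></html>'
found = robots_directives(page)
print("noindex" in found, "nofollow" in found)
```

A page with no robots meta tag simply gives back an empty set, which bots generally treat as permission to index and follow.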
You might also like to know that just because a bot "hit" your page does not mean that page will be listed in the search engines. For example, Googlebot is known for not indexing a page because it appears to be an exact duplicate of another page on the web. So much for replicators? See what Google says about it, here:

Last, but certainly not least, a bot will also "look and leave" if it doesn’t like what it sees, or doesn’t see what it’s looking for. But that’s getting into search engine optimization, and that’s a whole article in itself. Let’s save that grand adventure for another day, shall we? After all, even back in 1726, Gulliver had more than one adventure.

Feel welcome to reprint my articles as is. Please don't change them. All I ask in return is a credit link to my site. Thanks.