Directing Robots - robots.txt Robot Meta Tag No index No follow

Date: Sat, Mar 26, 2005, 4:56pm
From: Bert =zbox hosting

How do you keep bots/spiders out of your indexes so they won't harvest the URLs?

You can keep bots out of your directories by using a file named "robots.txt". Non-malicious bots, such as Googlebot and MSNbot, follow the instructions in this file.

In your root just create a new file and name it "robots.txt", then place the following code in the file, replacing:

/image1/, /image2/, /image3/, /picKLE-0.3/
with your own sub-directory names.

User-agent: *
Disallow: /image1/
Disallow: /image2/
Disallow: /image3/
Disallow: /picKLE-0.3/

That will keep friendly bots from spidering my image1, image2, image3, and picKLE-0.3 directories.
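
If you want to check the rules before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com, the sample file paths, and the helper name is_allowed are made up for illustration; swap in your own site and directories.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /image1/
Disallow: /image2/
Disallow: /image3/
Disallow: /picKLE-0.3/
"""


def is_allowed(path, agent="Googlebot"):
    """Return True if a well-behaved bot may fetch the given path.

    www.example.com is a placeholder host, not a real site setting.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/image1/photo.jpg", "/picKLE-0.3/index.php", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

This only tells you what a bot that obeys robots.txt would do. It does nothing to stop bots that ignore the file.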

1robot.txt

Rename this file from 1robot.txt to robots.txt, then open it in an editor and change the sub-directory names. The name has to be exactly robots.txt or the bots won't find it.

Some bots can really hammer your site and suck up a lot of bandwidth if you allow them to spider your images.

For instance, MSNbot sucked up over 150 megs of bandwidth last month while it hammered our forum. So I put in the robots.txt file, and the bandwidth is almost half for this month.

Keep in mind, though, that blocking friendly bots from spidering everything on your pages will hurt your search engine rankings.

Bert

The Robots META Tag:

Another way to block robot and spider indexing, which you can use in addition to robots.txt or on its own, is to place a Meta tag in the Head of an HTML page that directs the robots away from that page.

The Robots META tag allows HTML authors to indicate to visiting robots if a document may be indexed, or used to harvest more links. No server administrator action is required.

Note that currently only a few robots implement this.

In the Head of the page place the following meta tag:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

This directs robots to neither index this document nor analyse it for links.
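
To confirm that a page really carries the tag, something like the following sketch can help. It uses Python's standard html.parser module; the class name RobotsMetaParser, the function robots_directives, and the sample page are my own illustration, not anything from Robotstxt.org.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any robots META tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page for illustration.
PAGE = """<html><head>
<title>Private gallery</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body><a href="/image1/">images</a></body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

A page with no robots META tag gives back an empty set, which robots treat as permission to index the page and follow its links.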

This information is provided by Robotstxt.org (Robots Exclusion).