Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. It implements the Robots Exclusion Protocol, which allows you, as a web manager, to define which parts of your site are off-limits to search engine crawlers. For example, web managers can disallow access to CGI scripts, private or temporary directories, and other areas containing pages they do not want accessed or indexed.
A robots.txt record is made up of two parts: a User-agent line and one or more Disallow lines. The User-agent line names the robot the rules apply to (an asterisk means all robots), and each Disallow line gives a path prefix that robot should not crawl. Robots.txt is a gentleman's agreement, not an enforcement mechanism: reputable crawlers, such as Google's, honor it, but badly behaved crawlers may ignore it entirely. Note also that a search engine may still list a disallowed URL it discovers through links from other pages, even though it does not crawl the page's content.
The structure of a robots.txt is pretty simple. This example allows all robots to visit all files:
User-agent: *
Disallow:
Example of a recommended robots.txt file that blocks crawling of the scripts and images directories:
User-agent: *
Disallow: /scripts/
Disallow: /images/
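To see how a compliant robot might interpret these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the robot name ExampleBot, and the www.example.com URLs are assumptions made up for illustration; real crawlers may match rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above.
RULES = """\
User-agent: *
Disallow: /scripts/
Disallow: /images/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


# A page outside the blocked directories versus one inside /scripts/.
print(is_allowed(RULES, "ExampleBot", "http://www.example.com/index.html"))
print(is_allowed(RULES, "ExampleBot", "http://www.example.com/scripts/form.cgi"))
```

Passing the "allow everything" file shown earlier (User-agent: * followed by an empty Disallow:) to the same helper should report every URL as allowed.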
If you have a particular robot in mind, such as the Google image search robot, which collects images on your site for the Google Image search engine, you may include lines like the following:
User-agent: Googlebot-Image
Disallow: /
This means that the Google image search robot should not try to access any file in the root directory or any of its subdirectories. Robots not named in the file are unaffected by this rule.
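The effect of a robot-specific record can be checked the same way. The sketch below again uses Python's standard urllib.robotparser; ExampleBot and the www.example.com URL are hypothetical names chosen for illustration, and the matching logic is that of the Python library, not necessarily that of Google's own crawler.

```python
from urllib.robotparser import RobotFileParser

# The record from the example above: it applies to Googlebot-Image only.
RULES = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse rules from text, no network access

url = "http://www.example.com/images/logo.png"  # hypothetical URL

# The named robot is blocked from the whole site.
image_bot_allowed = parser.can_fetch("Googlebot-Image", url)

# A robot with no matching record (and no "*" record) is not restricted.
other_bot_allowed = parser.can_fetch("ExampleBot", url)

print(image_bot_allowed, other_bot_allowed)
```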
You can create the robots.txt file manually, using any text editor. It should be an ASCII-encoded text file, not an HTML file, and the filename should be all lowercase. Place the robots.txt file in your server's root directory; this is standard web management practice. It must be in the main directory because otherwise user agents (search engine robots) will not be able to find it: they do not search the whole site for a file named robots.txt. Instead, they look only in the main directory, and if they don't find it there, they simply assume that the site has no robots.txt file and index everything they find along the way.
All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders visit your web site. So, even if you currently do not need to exclude spiders from any part of your site, having a robots.txt file is still a good idea: it can act as a sort of invitation into your site.
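Because the file is only ever looked for in the main directory, the address a robot requests can be derived from any page address on the site. The following is a rough sketch of that derivation using Python's standard urllib.parse; the function name robots_txt_url and the www.example.com address are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page deep inside the site still maps to the file in the main directory.
print(robots_txt_url("http://www.example.com/shop/items/page.html?id=3"))
```

A robots.txt file saved anywhere else, for instance under /shop/, would never be requested under this scheme.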
About the Author
Online advertising solutions -
www.promoteclick.com