Using Robots.txt to Control Search Engines
Posted on Oct 20 2008 3:20 AM by Asif

Robots.txt is a text file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt implements the Robots Exclusion Protocol, which lets you, as a web manager, define which parts of your site are off-limits to search engine crawlers. For example, web managers can disallow access to CGI scripts, private and temporary directories, and other areas containing pages they do not want crawlers to visit.

A robots.txt file is made up of records built from two kinds of lines: User-agent and Disallow. The User-agent line names the robot the record applies to (or all robots, when the value is *), and each Disallow line gives a path that robot should not crawl. Robots.txt is only a gentleman's agreement: reputable crawlers such as Googlebot honor it, but compliance is voluntary, and badly behaved robots may ignore the file entirely.
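To make the record structure concrete, here is a deliberately simplified Python sketch of how a crawler might group Disallow lines under the User-agent they follow and then check a path against them. The function names parse_records and is_allowed are hypothetical, and the sketch ignores Allow lines, wildcards and the other refinements real crawlers support, so treat it as an illustration rather than a reference implementation.

```python
from typing import Dict, List


def parse_records(text: str) -> Dict[str, List[str]]:
    """Map each (lowercased) User-agent to the Disallow prefixes in its record.

    Simplified sketch: Allow lines and wildcards are ignored, and an empty
    Disallow value is treated as "nothing is blocked".
    """
    records: Dict[str, List[str]] = {}
    agents: List[str] = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a User-agent after rules starts a new record
                agents, seen_rule = [], False
            agents.append(value.lower())
            records.setdefault(value.lower(), [])
        elif field == "disallow" and agents:
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                for agent in agents:
                    records[agent].append(value)
    return records


def is_allowed(records: Dict[str, List[str]], robot: str, path: str) -> bool:
    """Use the robot's own record if present, else the * record; block on prefix match."""
    rules = records.get(robot.lower(), records.get("*", []))
    return not any(path.startswith(prefix) for prefix in rules)


sample = """User-agent: *
Disallow: /scripts/
Disallow: /images/

User-agent: Googlebot-Image
Disallow: /
"""
recs = parse_records(sample)
print(is_allowed(recs, "SomeBot", "/index.html"))          # True
print(is_allowed(recs, "SomeBot", "/images/logo.png"))     # False
print(is_allowed(recs, "Googlebot-Image", "/index.html"))  # False
```

A robot named in its own record follows only that record; every other robot falls back to the * record, and a robot with no matching record at all is not restricted.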

The structure of a robots.txt is pretty simple. This example allows all robots to visit all files:

User-agent: *
Disallow:
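One way to check how a compliant crawler reads the allow-all example above is Python's standard-library urllib.robotparser module. This sketch feeds it the two lines directly instead of downloading a file; the robot name AnyBot and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The allow-everything robots.txt from the example above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())  # parse() takes an iterable of lines

# An empty Disallow value blocks nothing, so every URL is fetchable.
for url in ("https://www.example.com/", "https://www.example.com/private/report.html"):
    print(url, parser.can_fetch("AnyBot", url))  # prints True for both URLs
```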

This example blocks all robots from crawling the scripts and images directories:

User-agent: *
Disallow: /scripts/
Disallow: /images/
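As a quick sanity check, the same standard-library parser can confirm which URLs these rules block. The robot name ExampleBot and the example.com paths are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

BLOCK_DIRS = """\
User-agent: *
Disallow: /scripts/
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(BLOCK_DIRS.splitlines())

checks = {
    "https://www.example.com/index.html": True,        # not under a blocked prefix
    "https://www.example.com/scripts/menu.js": False,  # under /scripts/
    "https://www.example.com/images/logo.png": False,  # under /images/
}
for url, expected in checks.items():
    assert parser.can_fetch("ExampleBot", url) == expected, url
```

Because each Disallow value is matched as a path prefix, the trailing slash matters: /images/ blocks everything inside that directory, while /images without the slash would also block a page such as /images.html.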

If you want to target a particular robot, such as the Google image search robot (Googlebot-Image), which collects images from your site for Google Image Search, you can include lines like the following:

User-agent: Googlebot-Image
Disallow: /

This tells the Google image search robot not to access any file in the root directory or any of its subdirectories; in other words, it is kept out of the entire site.
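The sketch below uses Python's urllib.robotparser to show that a record naming one robot affects only that robot. The photo URL is a placeholder, and the comparison with plain Googlebot is added here for illustration.

```python
from urllib.robotparser import RobotFileParser

NO_IMAGE_BOT = """\
User-agent: Googlebot-Image
Disallow: /
"""

parser = RobotFileParser()
parser.parse(NO_IMAGE_BOT.splitlines())

photo = "https://www.example.com/photos/2008/cat.jpg"
# "Disallow: /" is a prefix of every path, so the named robot is shut out everywhere.
print(parser.can_fetch("Googlebot-Image", photo))  # False
# Robots not named in any record (and with no "*" record) are unaffected.
print(parser.can_fetch("Googlebot", photo))        # True
```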

You can create the robots.txt file manually with any text editor. It should be an ASCII-encoded plain text file, not an HTML file, and the filename must be all lowercase. Place the file in your server's root directory so that it is served from the top level of your site (for example, http://www.example.com/robots.txt); this is standard web management practice. It must be in the root directory because user agents (search engine robots) do not search the whole site for a file named robots.txt. They look only in the root directory, and if they don't find it there, they assume the site has no robots.txt file and crawl everything they find along the way.
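Since crawlers only ever look at the root of the host, the robots.txt location can be derived from any page address. Here is a minimal Python sketch; robots_url is a hypothetical helper name, and the URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/blog/2008/10/post.html?ref=feed"))
# https://www.example.com/robots.txt
```

However deep the page sits, the path is always replaced with /robots.txt, which is why a copy uploaded to a subdirectory such as /blog/robots.txt is never consulted.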

All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders reach your website. So even if you do not currently need to keep spiders out of any part of your site, having a robots.txt file is still a good idea: it can act as a sort of invitation into your site.

About the Author

Online advertising solutions - www.promoteclick.com