Thursday, 1 March 2012

How to Create a robots.txt File

Controlling Web Robots with the robots.txt File

Robots Exclusion Standard (robots.txt)
The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that lets a website tell cooperating web crawlers and other web robots not to access all or part of a site that is otherwise publicly viewable. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.

Web Robot 
A web robot (also known as an Internet bot, WWW robot, or simply a bot) is a program that automatically and recursively traverses websites, retrieving content and information. The most common type of web robot is the search engine spider: these robots visit websites and follow links to add pages to the search engine's database. Robots are also used by search engines to categorize and archive websites, and by webmasters to proofread source code.

The largest use of bots is web spidering, in which an automated script fetches, analyzes, and files information from web servers. Search engines such as Google, Yahoo, and Bing use them to index web content, spammers use them to scan for email addresses, and they have many other uses. Allowing unknown robots to crawl freely can eat your website's bandwidth dramatically, so controlling which robots visit your website is an important aspect of optimization.

How to Create a robots.txt File

1. Create a plain text file named robots.txt. 
2. Add the directives that control the robots according to your requirements (a complete sample file is shown below). 
3. Upload the file to the root of your website, typically the public_html directory. 
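
As an illustration, here is what a complete robots.txt file might look like. It contains two records separated by a blank line; the Googlebot user-agent and the /temp/ path are example values only. This file would keep Google's crawler out of /temp/ while allowing every other robot everywhere; the individual directives are explained in the sections that follow.

User-agent: Googlebot 
Disallow: /temp/

User-agent: * 
Disallow: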

How to Allow All Web Robots

User-agent: * 
Disallow:
 
The example above allows all web robots to visit every file on the site: the wildcard "*" matches every robot, and an empty Disallow directive excludes nothing. 

How to Disallow All Web Robots

User-agent: * 
Disallow: /
 
The example above prevents all web robots from visiting any file on the site: the wildcard "*" matches every robot, and "Disallow: /" covers the entire site. 

How to Disallow Web Crawlers from Specific Directories of a Website

User-agent: * 
Disallow: /private/ 
Disallow: /temp/
 
The example above prevents all crawlers from entering the /private/ and /temp/ directories of the website. 

How to Disallow a Specific Crawler

User-agent: BadBot # replace the 'BadBot' with the actual user-agent of the bot 
Disallow: /

The example above tells that particular robot not to visit any page on the website. 

How to Disallow All Robots from Accessing Specific Files

User-agent: * 
Disallow: /file.html 
Disallow: /directory/file.html
 
The example above tells all crawlers not to visit those particular pages of the website.
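
If you want to check how compliant crawlers will interpret your rules before deploying them, you can test the file with Python's standard urllib.robotparser module. The sketch below is only an illustration: the example.com URLs, the /private/ path, and the BadBot user-agent are placeholder values.

from urllib.robotparser import RobotFileParser

# The rules under test; substitute the contents of your own robots.txt.
robots_txt = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant generic crawler may fetch the home page...
print(rp.can_fetch("*", "https://example.com/"))           # True
# ...but not anything under /private/.
print(rp.can_fetch("*", "https://example.com/private/a"))  # False
# BadBot is shut out of the entire site.
print(rp.can_fetch("BadBot", "https://example.com/"))      # False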
