Thursday, 1 March 2012

How to Create a robots.txt File

Controlling Web Robots with the robots.txt File

Robots Exclusion Standard (robots.txt)
The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that lets a website tell cooperating web crawlers and other web robots not to access all or part of a site that is otherwise publicly viewable. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.

Web Robot 
A web robot (also known as an Internet bot, WWW robot, or simply a bot) is a program that automatically and recursively traverses websites, retrieving content and information. The most common type of web robot is the search engine spider: these robots visit websites and follow links to add pages to the search engine's database. Robots are also used by search engines to categorize and archive websites, and by webmasters to proofread source code.

The largest use of bots is web spidering, in which an automated script fetches, analyzes, and files information from web servers. Search engines such as Google, Yahoo, and Bing use them to index web content, spammers use them to scan for email addresses, and they have many other uses. Allowing unknown robots to crawl freely can eat your website's bandwidth dramatically, so controlling which robots visit your website is an important aspect of optimization.

How to Create a robots.txt File

1. Create a plain text file named robots.txt. 
2. Add the directives that control the robots according to your requirements (a complete sample file is shown below). 
3. Upload the file to the root of your website, typically the public_html directory. 
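
As an illustration, here is what a complete robots.txt file might look like. It contains two records separated by a blank line; the Googlebot user-agent and the /temp/ path are example values only. This file would keep Google's crawler out of /temp/ while allowing every other robot everywhere; the individual directives are explained in the sections that follow.

User-agent: Googlebot 
Disallow: /temp/

User-agent: * 
Disallow: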

How to Allow All Web Robots

User-agent: * 
Disallow:
 
The example above allows all web robots to visit every file on the site: the wildcard "*" matches every robot, and an empty Disallow directive excludes nothing. 

How to Disallow All Web Robots

User-agent: * 
Disallow: /
 
The example above prevents all web robots from visiting any file on the site: the wildcard "*" matches every robot, and "Disallow: /" covers the entire site. 

How to Disallow Web Crawlers from Specific Directories of a Website

User-agent: * 
Disallow: /private/ 
Disallow: /temp/
 
The example above prevents all crawlers from entering the /private/ and /temp/ directories of the website. 

How to Disallow a Specific Crawler

User-agent: BadBot # replace the 'BadBot' with the actual user-agent of the bot 
Disallow: /

The example above tells that particular robot not to visit any page on the website. 

How to Disallow All Robots from Accessing Specific Files

User-agent: * 
Disallow: /file.html 
Disallow: /directory/file.html
 
The example above tells all crawlers not to visit those particular pages of the website.
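
If you want to check how compliant crawlers will interpret your rules before deploying them, you can test the file with Python's standard urllib.robotparser module. The sketch below is only an illustration: the example.com URLs, the /private/ path, and the BadBot user-agent are placeholder values.

from urllib.robotparser import RobotFileParser

# The rules under test; substitute the contents of your own robots.txt.
robots_txt = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A compliant generic crawler may fetch the home page...
print(rp.can_fetch("*", "https://example.com/"))           # True
# ...but not anything under /private/.
print(rp.can_fetch("*", "https://example.com/private/a"))  # False
# BadBot is shut out of the entire site.
print(rp.can_fetch("BadBot", "https://example.com/"))      # False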
