Controlling Web Robots with the robots.txt File
Robots Exclusion Standard

The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention used to prevent cooperating web crawlers and other web robots from accessing all or part of a website that is otherwise publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.
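As an illustration, a robots.txt file can point crawlers to a sitemap with the widely supported Sitemap directive; the URL below is a placeholder, not a real address:

Sitemap: https://www.example.com/sitemap.xml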
Web Robot

A Web robot, also known as an Internet bot, WWW robot, or simply a bot, is a program that automatically and recursively traverses web sites, retrieving content and information. The most common type of Web robot is the search engine spider. These robots visit Web sites and follow links to add more information to the search engine database.
The largest use of bots is in web spidering, in which an automated script fetches, analyzes, and files information from web servers. Search engines such as Google, Yahoo, Bing, and others use them to index web content; spammers use them to scan for email addresses; and they have many other uses. Allowing unknown robots to crawl your site can eat up your website's bandwidth dramatically, so controlling which robots visit your website is an important part of optimization.
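A minimal sketch of how a cooperating crawler consults robots.txt before fetching a page, using Python's standard urllib.robotparser module (the site URL and the "ExampleBot" user-agent name are placeholders):

from urllib.robotparser import RobotFileParser

# Download and parse the site's robots.txt file
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# A polite crawler asks before fetching each URL
if rp.can_fetch("ExampleBot", "https://www.example.com/private/page.html"):
    print("Allowed to fetch this page")
else:
    print("robots.txt disallows this page")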
How to Create a robots.txt File
1. Create a plain text file named robots.txt
2. Add the directives that control the robots according to your requirements
3. Upload the file to the root directory of your website (for example, public_html); a simple example file is shown below
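For instance, a basic robots.txt that lets every robot crawl the site except one directory could look like this (the directory name is only an example):

User-agent: *
Disallow: /temp/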
How to Allow Web Robots
User-agent: *
Disallow:
The example above allows all web robots to visit all files: the wildcard "*" matches every robot, and the empty Disallow value excludes nothing
How to Disallow Web Robots
User-agent: *
Disallow: /
The example above prevents all web robots from visiting any files on your website: the wildcard "*" matches every robot, and "Disallow: /" blocks the entire site
How to Disallow Web Crawlers from Specific Directories of a Website
User-agent: *
Disallow: /private/
Disallow: /temp/
The example above prevents all crawlers from entering the two listed directories of the website
How to Disallow a Specific Crawler
User-agent: BadBot # replace 'BadBot' with the actual user-agent of the bot
Disallow: /
The example above tells that particular robot not to visit any part of the website
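For example, to turn away only a crawler identifying itself as BadBot while leaving every other robot unrestricted, you can combine two records; "BadBot" is a placeholder name:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: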
How to Disallow All Robots from Accessing Specific Files
User-agent: *
Disallow: /file.html
Disallow: /directory/file.html
The example above tells all crawlers not to visit the listed pages of the website
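Many major crawlers also understand a nonstandard Allow directive, which can re-open a single file inside a blocked directory; support varies by robot, and the paths below are placeholders:

User-agent: *
Disallow: /directory/
Allow: /directory/public-file.html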