robots.txt

Tells search robots which directories they must not index. If the file is empty or does not exist, everything may be indexed.

Search engines always look for a file called "robots.txt" in the root directory of your domain (http://www.mydomain.com/robots.txt).
This file tells robots (spider indexers) which files they may index and which they may not.

robots.txt consists of two fields:

  1. User-agent - the name of the robot;
  2. Disallow - forbids indexing of a file or directory.

Comments start on a new line with #.
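As a quick sanity check, these fields can be exercised with Python's standard urllib.robotparser module (the robot name and paths below are invented for illustration):

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt: block every robot from the /private/ directory
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("AnyBot", "/private/page.html"))  # False
print(rp.can_fetch("AnyBot", "/index.html"))         # True
```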


Rules

Editors
robots.txt must be created as a plain text file.
You can use Notepad, an FTP client's built-in editor, or the text mode of an HTML editor.

Title
The file must be named robots.txt, not robot.txt or Robots.txt, otherwise it will not work.

Location
The robots.txt file should be located in the root directory.

Spaces
Spaces around the ":" separator do not matter.

Comments
Comments start on a new line with #. A space after # is optional.

Order
The first line is User-agent, which identifies the robot;
the following Disallow lines specify files or folders that must not be indexed.

If a ban applies to several robots, their User-agent lines are written one per line, followed by the ban or list of bans, for example:

User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news

# Disallow Rambler and Aport from indexing links
# that begin with /eng and /news

The same applies to Disallow: each ban goes on a new line.

If different robots need different bans, the records are separated by an empty line, for example:

User-agent: *
Disallow: /news
# disallow all robots from indexing links
# that begin with /news

User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news
# Disallow Rambler and Aport from indexing links
# that begin with /eng and /news

User-agent: Yandex
Disallow:

# Allow Yandex everything.
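A record set like the one above can be verified with Python's standard urllib.robotparser (a sketch; the bot name SomeOtherBot is invented):

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /news

User-agent: StackRambler
User-agent: Aport
Disallow: /eng
Disallow: /news

User-agent: Yandex
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("StackRambler", "/eng/page.html"))  # False: blocked by its record
print(rp.can_fetch("Yandex", "/news"))                 # True: empty Disallow allows all
print(rp.can_fetch("SomeOtherBot", "/news"))           # False: caught by the * record
```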

Prevent all robots from indexing files with .doc and .pdf extensions:

User-Agent: *
Disallow: /*.doc$
Disallow: /*.pdf$
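Note that * and $ are pattern extensions honored by major engines such as Google and Yandex rather than part of the original robots.txt standard (Python's urllib.robotparser, for example, treats rules as plain prefixes and ignores them). A simplified sketch of how such a rule can be matched:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Translate a Disallow rule using the '*' and '$' extensions into a
    regular expression and test a path against it. A simplified sketch,
    not a complete robots.txt matcher."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        # A trailing '$' anchors the rule at the end of the path
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

print(rule_matches("/*.doc$", "/files/report.doc"))   # True
print(rule_matches("/*.doc$", "/files/report.docx"))  # False
```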


Examples

Disallows the Roverdog robot from indexing the email.htm file:
User-agent: Roverdog
Disallow: email.htm

Allows all robots to index everything:
User-agent: *
Disallow:

Disallows all robots from indexing everything:
User-agent: *
Disallow: /

Disallows all robots from indexing the email.htm file, all files in the cgi-bin folder, and the images folder:
User-agent: *
Disallow: email.htm
Disallow: /cgi-bin/
Disallow: /images/

Disallows Roverdog from indexing any file on the server:
User-agent: Roverdog
Disallow: /
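This per-robot behavior can again be checked with urllib.robotparser (OtherBot is an invented name):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("User-agent: Roverdog\nDisallow: /".splitlines())

print(rp.can_fetch("Roverdog", "/any/page.html"))  # False: the whole server is banned
print(rp.can_fetch("OtherBot", "/any/page.html"))  # True: no record applies to it
```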

One more example:
User-agent: *
Disallow: /cgi-bin/moshkow
Disallow: /cgi-bin/html-KOI/AQUARIUM/songs
Disallow: /cgi-bin/html-KOI/AQUARIUM/history
Disallow: /cgi-bin/html-windows/AQUARIUM/songs
Disallow: /cgi-bin/html-windows/AQUARIUM/history


META tag ROBOTS

The META robots tag tells robots that come to a page whether to index it and whether to follow the links on it; it can likewise invite robots to traverse and index all pages of the site. This tag is becoming increasingly important.

<HTML>
<HEAD>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="DESCRIPTION" CONTENT="This page ….">
<TITLE>...</TITLE>
</HEAD>
<BODY>

NOINDEX - forbids indexing of the document;
NOFOLLOW - forbids following the links in the document;
INDEX - allows indexing of the document;
FOLLOW - allows following the links;
ALL - index everything, equivalent to INDEX, FOLLOW;
NONE - index nothing, equivalent to NOINDEX, NOFOLLOW.
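To see these directives from the crawler's side, a page's robots meta tag can be extracted with Python's standard html.parser (a simplified sketch; the sample page is invented):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the CONTENT values of <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        # Tag and attribute names arrive lowercased; values keep their case
        if tag == "meta" and d.get("name", "").lower() == "robots":
            self.directives.append(d.get("content", ""))

page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head><body></body></html>'
p = RobotsMetaParser()
p.feed(page)
print(p.directives)  # ['NOINDEX, NOFOLLOW']
```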

Robot meta tag examples:

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Robots.txt Checker - a free online check of your robots.txt file.