Robots.txt, or the Robots Exclusion Protocol, is a simple text file used by websites to communicate with web crawlers and other web robots. The file tells a web robot which areas of the website should not be processed or scanned. Robots are typically used by search engines to categorize websites. Not all robots follow the standard: email harvesters, spambots, malware, and robots that scan for security vulnerabilities may even start with the parts of the website where they have been told to stay out. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites.
When a website owner wants to give instructions to web robots, they place a text file called robots.txt in the root of the web server's top-level directory (e.g. https://www.laafy.com/robots.txt). This file contains the instructions in a specific format (examples are given below). Robots that choose to follow the instructions try to fetch this file and read the instructions before fetching any other file from the website. If this file does not exist, web robots assume that the website owner did not provide any specific instructions and crawl the entire website.
A robots.txt file on a website works as a request that specified robots ignore specified files or directories when crawling the site. This could be, for example, out of a preference for privacy from search engine results, a belief that the content of the chosen directories could be misleading or irrelevant to the categorization of the site as a whole, or a desire that an application operate only on certain data. Pages listed in robots.txt can still appear in search results if they are linked to from a page that is crawled.
A robots.txt file can be created easily with this free SEO tool. To generate a robots.txt file online, simply enter the file and directory paths in the input column, manage the allow and disallow rules for the robots, include the sitemap as a reference, and set parameters to allow or refuse different web robots.
Here is an explanation of what the different directives in a robots.txt file mean.
This part, the User-agent line, is used to specify which robot the directions apply to.
This part tells all robots to crawl the website, because the wildcard "*" stands for all robots.
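Such a line looks like this (the wildcard value applies the rules that follow to every robot):

```
User-agent: *
```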
This part tells crawlers that these directions apply only to Googlebot.
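For example, naming Google's crawler directly instead of using the wildcard:

```
User-agent: Googlebot
```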
The Disallow part tells the robot which pages on the site must be excluded from crawling. If the Disallow directive has no value, no pages are disallowed.
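A sketch of both cases (the /private/ path is a hypothetical placeholder, not a real directory on any particular site):

```
Disallow: /private/
Disallow:
```

The first line excludes the /private/ directory from crawling; the second, empty value disallows nothing.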
Here are some examples that will help you understand and use robots.txt.
This example allows all robots full access to a website:
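```
User-agent: *
Disallow:
```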
This example blocks all robots from accessing the entire website:
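```
User-agent: *
Disallow: /
```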
This example blocks all robots from one specific directory:
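Here the directory name is a hypothetical placeholder:

```
User-agent: *
Disallow: /private-directory/
```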
This example blocks all robots from one specific file inside a directory:
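Again, the directory and file names are hypothetical placeholders:

```
User-agent: *
Disallow: /directory/file.html
```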
Note that all other files in the specified directory will be processed.