What is a robots.txt file and how does it affect SEO?
A robots.txt file is a plain text file, following a web standard, that websites use to communicate with web crawlers and other web robots.
What is a web crawler?
A web crawler is an internet bot, also known as a web spider or web robot.
Web crawlers are responsible for finding information that is publicly available on the internet, extracting URLs as they scan through the web.
These extracted URLs are then placed in a queue so that they can be fetched and visited one after the other to understand the:
- structure of the page.
- type of content inside the page.
- meaning of the content.
- time of its creation or the time when it was updated.
- incoming and outgoing links.
Once all these insights are gathered, the information is updated and stored, i.e. indexed, so that the pages can be ranked further by the algorithm.
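The queue-and-visit loop described above can be sketched in a few lines of Python. The link graph here is hypothetical stand-in data, since a real crawler would fetch each page over HTTP and parse its links:

```python
from collections import deque

# Hypothetical link graph standing in for real fetched pages.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
}

def crawl(start):
    """Visit pages one after the other, queuing newly discovered URLs."""
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        if url in visited:
            continue
        visited.append(url)
        # A real crawler would fetch the page here and extract its
        # structure, content type, timestamps, and links.
        queue.extend(LINKS.get(url, []))
    return visited

print(crawl("/"))  # → ['/', '/about', '/blog', '/blog/post-1']
```

Each extracted URL joins the queue exactly once, which is how a crawler works through a site page by page.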
Why is a “robots.txt” file created?
A robots.txt file is created by website owners to instruct search engine bots, such as Googlebot, Bingbot, etc., about which pages they can crawl and index.
This also means it helps website owners block pages that are meant to stay private and should not be indexed in the search results.
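For instance, a minimal robots.txt that keeps every bot out of private areas while leaving the rest of the site crawlable might look like this (the paths are hypothetical examples):

```
User-agent: *
Disallow: /admin/
Disallow: /checkout/
```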
Where can we find or create a robots.txt file?
A robots.txt file is always found in the root directory of a website.
Can we name it as robot.txt rather than robots.txt?
No, you cannot name it robot.txt.
The file name is case sensitive and must be exactly "robots.txt".
How to see your robots.txt file?
You can enter the URL "https://yourwebsitename.com/robots.txt" in your browser to see your robots.txt file.
What are the most common predefined keywords used to create robots.txt file?
The most common predefined keywords used to create a robots.txt file are:
- User-agent: It is used to specify which search engine bots the rules apply to.
User-agent: * – Here "*" states that the instructions given inside the robots.txt file are meant for every search engine bot.
User-agent: Googlebot – It states that the instructions given inside the robots.txt file are meant only for Googlebot.
To see the full User-Agent list, click here.
- Disallow: It instructs the user agents, i.e. "*" or a specific bot, not to crawl a particular URL or file of the website.
Disallow: / – This instruction means that every file inside the root directory is blocked from crawling and indexing by the search engine bots.
Disallow: /file_name.html – This statement means that only this particular file is blocked from crawling and indexing.
- Allow: It explicitly instructs Googlebot about the pages or subfolders it can access.
One of its features is that it can grant access to a specific subfolder even when its parent directory is disallowed.
Allow: /images/dogs/ – Used together with Disallow: /images/, this means the parent directory "images" is disallowed but the "dogs" subfolder inside it is still allowed.
- Sitemap: It is used to specify the location of your XML Sitemap.
Example: Sitemap: https://yourwebsitename.com/page-sitemap.xml
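As a rough illustration of how a bot applies these keywords, Python's standard library ships a robots.txt parser. The rules and URLs below are hypothetical examples:

```python
import urllib.robotparser

# Hypothetical robots.txt rules, including an Allow that overrides
# a broader Disallow for one file.
rules = [
    "User-agent: *",
    "Allow: /private/public-note.html",
    "Disallow: /private/",
    "Sitemap: https://yourwebsitename.com/page-sitemap.xml",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# The homepage is not matched by any rule, so it is allowed.
print(rp.can_fetch("Googlebot", "https://yourwebsitename.com/"))                         # True
# Everything else under /private/ is disallowed.
print(rp.can_fetch("Googlebot", "https://yourwebsitename.com/private/secret.html"))      # False
# The explicitly allowed file stays crawlable despite the parent Disallow.
print(rp.can_fetch("Googlebot", "https://yourwebsitename.com/private/public-note.html")) # True
```

Note that Python's parser applies rules in file order (first match wins), which is why the Allow line comes before the broader Disallow here; Google's own matching is based on the most specific rule instead.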
How does robots.txt affect SEO?
Consider the fact that there are millions of live websites that need to be crawled and indexed every second to keep the Google index updated; the work involved is humongous. So how is it done?
Google provides every site with a crawl budget that decides how many times it is going to visit your site.
This budget depends upon factors such as:
- The health of your server. A web crawler moving through your website consumes server capacity, which can slow down the pages served to your real visitors.
- The popularity of a site. Popular sites, or sites with a higher amount of content, are visited more frequently than less popular sites or sites with less content.
As we know, visitors do not like to wait more than about 3 seconds for a web page to load; they skip your website and content, which eventually affects your SEO.
So you can understand that if you have a large website, crawlers may slow down your server more, thus affecting your real visitors.
To avoid such circumstances, a robots.txt file can be used to block the unimportant pages of your website so that the crawlers can focus mainly on the important pages, contributing to your SEO.
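Following that idea, a site owner might block low-value pages such as internal search results (the paths below are hypothetical examples), so the crawl budget is spent on the pages that matter:

```
User-agent: *
Disallow: /search/
Disallow: /tmp/
Sitemap: https://yourwebsitename.com/page-sitemap.xml
```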
- A robots.txt is a text file.
- It instructs/gives permission to search engine bots, such as Googlebot, regarding the pages they can crawl and index.
- "robots.txt" is case sensitive.
- It helps block the pages which are meant to stay private.
- The most common predefined keywords are User-agent, Disallow, Allow, and Sitemap.
Hope I was able to help you in understanding the basic concept of robots.txt. If yes, please let me know in the comment section. Have a great day.