How To Add NoIndex In .htaccess To Prevent robots.txt From Being Indexed
First, let’s cover some basic definitions so that I can explain how and why to send a “noindex” directive for your robots.txt. If you are reading this, you probably already have some basic knowledge of tweaking your website, so there’s no need for me to dig really deep. My main focus here is to show why and how to tell search engines not to index robots.txt. So how do I add a noindex directive in my .htaccess so that search engines like Google, Bing, and Yahoo don’t list my robots.txt on search engine result pages?
What is robots.txt?
I don’t want to sound too geeky, so I will define robots.txt in its simplest form. It is a small plain text file (you can open it in any text editor) that sits in your webserver’s root directory. It is a powerful file that can do wonders for you and your website, but if it is not used properly, you can compromise your website. The robots.txt file tells the crawlers of search engines like Google, Yahoo, Bing and many others which parts of your site they may crawl, which in turn affects what ends up in their index and on search engine result pages (also known as SERPs). Search engines crawl whatever they can reach on the Internet, and this includes your robots.txt.
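To make this concrete, here is a minimal Python sketch of how a well-behaved crawler reads robots.txt, using the standard library’s urllib.robotparser. The rules, the "Googlebot" user agent string, and the example.com URLs are made up for illustration. Notice that nothing in the rules stops the crawler from fetching robots.txt itself, which is how the file can end up in search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text roughly the way a polite crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# The disallowed directory is off limits to the crawler.
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False

# Everything else may be crawled...
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))  # True

# ...and that includes robots.txt itself.
print(parser.can_fetch("Googlebot", "https://example.com/robots.txt"))  # True
```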
Alright, then why do we need to hide robots.txt from search engines? Actually, we are not going to hide it. We will still let search engines see the file and crawl it; what we really want is to tell them not to list the file on SERPs. Why? When you search for something on Google, you want to see the most relevant results, right? If you somehow landed on someone’s robots.txt instead, wouldn’t it annoy you enough to close the page immediately? That makes for a bad experience on your website and adds to your bounce rate (a bounce rate that is too high is bad for your website). That is why we need to tell search engines to keep crawling robots.txt but leave it out of their results.
Now, in order to do that, we’ll need another powerful file on your webserver: .htaccess. So what is a .htaccess file? It is a per-directory configuration file for Apache web servers. Like robots.txt, this file is one of the essential parts of your webserver. It tells the webserver what to do, such as redirecting pages, password-protecting certain directories, and much more if you master how to use it. Because robots.txt is a plain text file, there is no place inside it for a noindex meta tag, so .htaccess is where we configure how search engines should treat our robots.txt file.
How to add noindex in your .htaccess to prevent search engines from indexing your robots.txt?
- Download your .htaccess from your webserver’s root directory (via FTP, or edit it directly on the server).
As a rule of thumb, always create a backup of the file before you make changes, just in case something gets messed up or you don’t follow my instructions right. The rule below relies on Apache’s mod_headers module; if that module is not enabled, your site will return a 500 error, which is one more reason to keep the backup. Otherwise, everything should be fine after you add the rule.
- Open the .htaccess file with any text editor.
- Go to the last part of the file (bottom) and add the following:
# BEGIN NOINDEX FOR ROBOTS.TXT
# Apply the header only to robots.txt, not to the whole site
<Files "robots.txt">
Header set X-Robots-Tag "noindex"
</Files>
# END NOINDEX FOR ROBOTS.TXT
- Save your file and upload it back to your webserver.
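- Check your website through a web browser to make sure it still loads, then confirm that robots.txt is served with the X-Robots-Tag header (your browser’s developer tools show the response headers).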
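If you prefer a script for that last check, here is a rough Python sketch. The function names are my own, example.com is a placeholder for your domain, and the parsing only handles the simple comma-separated form of the header (it does not handle values prefixed with a bot name).

```python
from urllib.request import Request, urlopen


def has_noindex(x_robots_tag):
    """Return True if an X-Robots-Tag header value contains 'noindex'."""
    if not x_robots_tag:
        return False
    directives = [part.strip().lower() for part in x_robots_tag.split(",")]
    return "noindex" in directives


def fetch_x_robots_tag(url):
    """Send a HEAD request and return all X-Robots-Tag values joined by commas."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        values = response.headers.get_all("X-Robots-Tag") or []
    return ", ".join(values)


# Example usage (needs network access; replace the placeholder domain):
#   print(has_noindex(fetch_x_robots_tag("https://example.com/robots.txt")))
#   print(has_noindex(fetch_x_robots_tag("https://example.com/")))
```

Ideally the first call reports True and the second reports False. If your home page also reports True, the header is being applied to the whole site, and you should restore your backup and check the rule.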
Basically that’s it!
Here’s a video from Matt Cutts explaining that even if you disallow a page in robots.txt, it may still appear in search engines like Google.