Documentation

Find articles, step-by-step instructions, and advice for getting the most out of ThemeXpert.com

Robots.txt

The robots.txt file and the robots meta tag are two similar but independent mechanisms, each with a slightly different purpose. Both are commonly used to tell search engine robots which parts of your site they may crawl and index and which parts should be blocked. In short, robots.txt tells search engines which pages they shouldn't crawl, while the meta tag tells them which pages they shouldn't index. Use both carefully: a small mistake can cause big problems, such as hiding your whole site from search engines.

The robots.txt file is a public file that implements the Robots Exclusion Protocol, the convention search engine robots follow to find out what they may crawl. Website owners use it to define which paths/folders search engines may crawl and which they may not. You can edit it manually. Afterwards, you can check whether your robots.txt file works as expected in the Blocked URLs section of Google Webmaster Tools.

Here is how it works. Before a robot crawls a specific URL, say http://something.com/about.html, it first checks http://something.com/robots.txt to see whether that path is restricted. Only if it is not does the robot crawl the page, so that the search engine can index it and show it in the results.
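Python's standard `urllib.robotparser` module performs the same check a well-behaved crawler does. The following is a minimal sketch that assumes a hypothetical robots.txt for something.com blocking a `/resource/` folder. A real crawler would download the file rather than hard-code it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for something.com; a real crawler would download
# http://something.com/robots.txt instead of using this string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /resource/
"""

def may_crawl(url, user_agent="Googlebot"):
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(may_crawl("http://something.com/about.html"))         # True
print(may_crawl("http://something.com/resource/logo.png"))  # False
```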

If the robot finds that robots.txt disallows a page or section, it stops crawling that page or section. However, this does not necessarily keep the page out of the index: a blocked URL can still be indexed, without its content, if other pages link to it. By default, search engines are allowed to crawl every section of your site.

Usually, robots.txt is used to keep robots out of system folders, whereas the robots meta tag controls how a specific page is indexed and displayed in search results.

Where to create the robots.txt file

For it to work, you must create the robots.txt file in the top-level (root) directory of your web server. Robots do not look for it anywhere else.

Thing to remember: the file name must be all lowercase, robots.txt, not Robots.TXT.
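Crawlers request exactly `/robots.txt` at the root of the host, which is why both the location and the lowercase name matter. The following Python sketch derives that URL from any page URL. The function name is ours, chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Return the only URL where crawlers look for robots.txt for page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://something.com/blog/about.html?x=1"))
# http://something.com/robots.txt
```

A file placed at http://something.com/blog/robots.txt would therefore never be read.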

What to write inside the robots.txt file

A robots.txt file usually looks like this:

User-agent: *
Disallow:
Allow:

Here, User-agent: * means the rules apply to all robots, Disallow: lists paths that robots must not crawl, and Allow: lists paths they may crawl. So, to stop search engines from crawling one of your site's system folders, just add its path after Disallow:. For example:

User-agent: *
Disallow: /resource/

This keeps search engines out of the resource folder. Be careful with the slash. Disallow: / (a single slash) blocks the whole site, while an empty Disallow: line, with the slash removed, allows every search robot to visit the whole site.
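The difference between a folder rule, `Disallow: /`, and an empty `Disallow:` can be checked with Python's `urllib.robotparser`. The following is a small sketch. The `allowed` helper and the file paths are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

def allowed(rules, path):
    """Check whether any robot ("*") may fetch path on something.com under rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", "http://something.com" + path)

block_folder = "User-agent: *\nDisallow: /resource/\n"
block_all = "User-agent: *\nDisallow: /\n"
allow_all = "User-agent: *\nDisallow:\n"

print(allowed(block_folder, "/resource/app.js"))  # False: inside /resource/
print(allowed(block_folder, "/about.html"))       # True: outside the folder
print(allowed(block_all, "/about.html"))          # False: "/" matches every path
print(allowed(allow_all, "/resource/app.js"))     # True: empty Disallow allows all
```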

Here are some common robots.txt patterns and what each one does.

Restrict all robots from your entire server

User-agent: *
Disallow: /

Allow all robots to access your entire server

User-agent: *
Disallow:

Restrict all robots from some parts of your server

User-agent: *
Disallow: /resource/
Disallow: /plugins/
Disallow: /location/
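If you keep the list of blocked folders elsewhere, the file can also be generated. The following is a minimal Python sketch using a hypothetical `build_robots_txt` helper. It re-checks the result with the standard-library parser.

```python
from urllib.robotparser import RobotFileParser

def build_robots_txt(blocked_folders):
    """Build a robots.txt that blocks the given folders for all robots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {folder}" for folder in blocked_folders]
    return "\n".join(lines) + "\n"

text = build_robots_txt(["/resource/", "/plugins/", "/location/"])
print(text)

# Sanity-check the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(text.splitlines())
print(parser.can_fetch("*", "http://something.com/plugins/x.js"))  # False
print(parser.can_fetch("*", "http://something.com/index.php"))     # True
```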

Restrict a single robot (here Yahoo's crawler, Slurp) from your entire server

User-agent: Slurp
Disallow: /

Allow only a single robot (here Google's crawler, Googlebot) to access your entire server

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
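The last pattern works because each robot obeys the group that names it and ignores the catch-all `*` group. The following Python sketch checks this with `urllib.robotparser`. Google's crawler is written here as Googlebot, and the list of other crawler names is an assumption for illustration.

```python
from urllib.robotparser import RobotFileParser

# Only Googlebot may crawl; every other robot is blocked.
ONLY_GOOGLEBOT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ONLY_GOOGLEBOT.splitlines())

for agent in ("Googlebot", "Bingbot", "Slurp"):
    print(agent, parser.can_fetch(agent, "http://something.com/about.html"))
# Googlebot True
# Bingbot False
# Slurp False
```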

Restrict indexing by Meta Tag

[Image: robots meta tag settings]

The robots meta tag is the most reliable way to keep a particular page, section or URL out of search engine indexes, whereas robots.txt only controls which folders robots crawl. Note that a robot can only see the meta tag on pages that robots.txt lets it fetch. The tag accepts four combinations of settings: Index, Follow; Noindex, Follow; Index, Nofollow; and Noindex, Nofollow. If you don't want to hide your site, keep the setting at Index, Follow in the meta tag settings.
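The four combinations come from pairing index/noindex with follow/nofollow in the content of `<meta name="robots">` in a page's head. The following small Python illustration generates the four tags. It is not Joomla code. Joomla writes the tag for you from the setting.

```python
from itertools import product

def robots_meta_tags():
    """Return the robots meta tag for each index/follow combination."""
    tags = []
    for index, follow in product(("index", "noindex"), ("follow", "nofollow")):
        tags.append(f'<meta name="robots" content="{index}, {follow}">')
    return tags

for tag in robots_meta_tags():
    print(tag)
```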

Resolve Media Indexing Problem

Joomla's standard robots.txt file works fine except for one major issue: it blocks the /images folder. Files your site needs may also be stored in the images, media, templates, components, modules and plugins folders. So, to make your site work properly, either comment out those lines (put # in front of them) or remove them from your robots.txt file, as shown below.

User-agent: *
Disallow: /administrator/
Disallow: /cache/
# Disallow: /images/ <-- Write # before or remove them
# Disallow: /media/ <-- Write # before or remove them
# Disallow: /templates/ <-- Write # before or remove them
# Disallow: /components/ <-- Write # before or remove them
# Disallow: /modules/ <-- Write # before or remove them
# Disallow: /plugins/ <-- Write # before or remove them
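Robots ignore every line that starts with #, so commenting a rule out has the same effect as deleting it. The following Python sketch uses a shortened version of the edited file and checks a few paths with `urllib.robotparser`. The file paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the edited Joomla file: lines starting with #
# are comments, so crawlers skip them.
JOOMLA_ROBOTS = """\
User-agent: *
Disallow: /administrator/
Disallow: /cache/
# Disallow: /images/
# Disallow: /media/
"""

parser = RobotFileParser()
parser.parse(JOOMLA_ROBOTS.splitlines())

for path in ("/administrator/index.php", "/images/logo.png", "/media/system/js/core.js"):
    print(path, parser.can_fetch("*", "http://something.com" + path))
# /administrator/index.php False
# /images/logo.png True
# /media/system/js/core.js True
```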

Then add two additional rules underneath the rest of the code, as shown in the image below.

[Image: the two additional rules added to robots.txt]

Fix Sitemap Pointing Problem

If your sitemap is generated by a Joomla extension such as Xmap, OSMap or JSitemap, it is not located in the root of your web directory. In that case, you can use robots.txt to tell search engines where your XML sitemap is. Look up the sitemap location in the extension's configuration, then add it at the bottom of your robots.txt file as a full URL, including your domain, like this:

Sitemap: http://something.com/index.php?option=com_osmap&view=xml&tmpl=component&id=1
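The following Python sketch reads the Sitemap lines back out of a robots.txt file and flags values that are not full URLs. It uses `RobotFileParser.site_maps()`, which is available from Python 3.8. The file contents and the `is_absolute` helper are assumptions for illustration.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def sitemap_urls(robots_txt):
    """Return the Sitemap: URLs listed in robots_txt (needs Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []

def is_absolute(url):
    """Sitemap: values should be full URLs with a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)

ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/

Sitemap: http://something.com/index.php?option=com_osmap&view=xml&tmpl=component&id=1
"""

for url in sitemap_urls(ROBOTS_TXT):
    print(url, is_absolute(url))
```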

Joomla Update Issue

Joomla updates often include small changes to the robots.txt file. However, newer versions of Joomla don't ship a new robots.txt file, because it could overwrite the customizations you have made. Instead, they provide a new file named robots.txt.dist. If you haven't customized your robots.txt, simply rename robots.txt.dist to robots.txt. If you have your own customized robots.txt file, check what changed in the new file and copy those changes into your customized robots.txt.
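To see what changed, you can diff the two files. The following Python sketch uses the standard `difflib` module. The file contents are hypothetical; in practice you would read both files from your Joomla root. Lines starting with + exist only in the new robots.txt.dist, so decide which of them to copy while keeping your own customizations.

```python
import difflib

def robots_changes(current_text, dist_text):
    """Return unified-diff lines from your robots.txt to robots.txt.dist."""
    return list(difflib.unified_diff(
        current_text.splitlines(keepends=True),
        dist_text.splitlines(keepends=True),
        fromfile="robots.txt",
        tofile="robots.txt.dist",
    ))

# Hypothetical contents: your file un-blocks /images/, the new one adds /tmp/.
current = "User-agent: *\nDisallow: /administrator/\n# Disallow: /images/\n"
dist = "User-agent: *\nDisallow: /administrator/\nDisallow: /images/\nDisallow: /tmp/\n"
print("".join(robots_changes(current, dist)), end="")
```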

Allowing CSS or JS files

If your site doesn't look the way it should when Google renders it, one of its resource files (CSS or JS) in a system folder is probably blocked. You can allow a blocked CSS or JS file by adding a rule to your robots.txt file like this:

User-agent: *
[Image: Allow rules for the CSS and JS files]
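The following Python sketch shows how an Allow rule re-opens one folder inside a blocked one. The template path is a hypothetical example. The Allow line comes first because Python's `urllib.robotparser` applies the first matching rule, while Google applies the most specific (longest) one, so this order gives the same result in both. Python's parser also does not understand the `*` and `$` wildcards that Google supports, so the sketch uses plain path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical template path; replace it with the folder your CSS/JS lives in.
RULES = """\
User-agent: *
Allow: /templates/mytemplate/css/
Disallow: /templates/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

base = "http://something.com"
print(parser.can_fetch("*", base + "/templates/mytemplate/css/template.css"))  # True
print(parser.can_fetch("*", base + "/templates/mytemplate/index.php"))         # False
```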

Make your site Mobile Friendly

Google keeps updating its search algorithms so it can find mobile-friendly sites easily, and web developers now pay close attention to how responsive their sites are across device sizes. To check whether your site is mobile friendly, go to https://www.google.com/webmasters/tools/mobile-friendly/ and confirm that Google agrees with you. A report will open, and you may find that some resource files are being blocked by your robots.txt file.

[Image: Mobile-Friendly Test report showing resources blocked by robots.txt]

You just need to remove a few rules from your robots.txt file. Remove the following lines:

[Image: the lines to remove from robots.txt]

Then add two additional rules underneath the rest of the code, as shown in the image below.

[Image: the two additional rules added to robots.txt]

You're done! Run the test again, and it should now report that your site is mobile friendly.
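Before and after editing, you can check some of the CSS and JS URLs your pages load against your rules. The following is a rough Python sketch. It assumes you collect the resource URLs yourself, for example from the page source, and the rules and URLs below are hypothetical. Unlike Google's test, it does not render anything.

```python
from urllib.robotparser import RobotFileParser

def blocked_resources(robots_txt, resource_urls, user_agent="Googlebot"):
    """Return the resource URLs that robots_txt stops user_agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(user_agent, url)]

# Hypothetical rules and resource list for something.com.
rules = "User-agent: *\nDisallow: /media/\nDisallow: /cache/\n"
resources = [
    "http://something.com/media/jui/js/jquery.min.js",
    "http://something.com/templates/mytemplate/css/template.css",
    "http://something.com/cache/site.css",
]
print(blocked_resources(rules, resources))
```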

Fix JCH Cache plugin Problem

CSS and JS files are essential for a site to render correctly. If a system folder that contains CSS or JS files is blocked in robots.txt, Googlebot cannot render the page properly and therefore cannot tell whether it is optimized for mobile. So, make sure that none of the necessary resources are blocked in robots.txt. JCH Optimize combines multiple CSS and JS files into a single file. If you use JCH Optimize, make sure the following two system folders are not blocked. If they are, allow them as shown in the screenshot below.

[Screenshot: Allow rules for the two JCH Optimize folders]