
Google Webmaster Tools

Problems, problems, problems…

Have you ever been mithered by nasty web bots which insist on trawling your web content, even though it's password protected?

Yes, us too. This was happening on our implementation of Liferay.

We knew that the answer lay within the ‘robots.txt’ file on the server; we just couldn’t get it to stop bots accessing the required directories. This issue has been bothering us in our use of Liferay for a while now, so you might be glad to hear that I’ve solved it. This is how!

To make customising the robots.txt file easier for your setup, I suggest that you create an account on Google Webmaster Tools and register the web pages which are affected.

How to create a robots.txt file:
1. On the Webmaster Tools Home page, click the site you want.
2. Under Site configuration, click Crawler access.
3. Click the Generate robots.txt tab.

The lines that you want in your robots.txt file to stop bots from trying to trawl Liferay on a Glassfish implementation are:

User-agent: *
Disallow: /glassfish/domains/domain1/applications/j2ee-modules/liferay-portal
Disallow: /web/guest/home
Disallow: /web/guest
Allow: /
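
If you want to sanity-check rules like these before uploading them, a quick sketch along the following lines might help. It uses Python's standard urllib.robotparser module; the host example.com and the test paths (/about, /web/guest/login) are made up for illustration and are not part of our setup. Bear in mind that Python's parser applies the first matching rule in file order, whereas Google picks the most specific (longest) matching rule, so results can differ for other rule sets. For the rules above, both approaches should give the same answer.

```python
from urllib.robotparser import RobotFileParser

# The rules from this post, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /glassfish/domains/domain1/applications/j2ee-modules/liferay-portal
Disallow: /web/guest/home
Disallow: /web/guest
Allow: /
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser, path, agent="*"):
    """Return True if the given user agent may fetch the path.

    example.com is only a placeholder host (an assumption for this sketch).
    """
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical paths, chosen only to exercise each rule.
    for path in ["/", "/about", "/web/guest/home", "/web/guest/login"]:
        verdict = "allowed" if is_allowed(parser, path) else "blocked"
        print(path, verdict)
```

Keep in mind that this only tells you how a well-behaved bot should read the file. Badly behaved bots can ignore robots.txt altogether.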

This might not be exactly what you're after, but hopefully it will point you in the right direction.