VirtueMart: remove duplicate content

VirtueMart is a powerful and free Joomla! component for creating eCommerce websites.

It has a lot of very interesting features and, in my humble opinion, it’s a complete solution for managing your online store.

From an SEO point of view, VirtueMart has some problems with duplicate content. This article will try to teach you how to edit your .htaccess file and use 301 permanent redirects to fix duplicate products in your online store.

I had to fix this problem on a customer's website. These are the duplicate URLs I found on their default VirtueMart installation:

  1. /index.php?option=com_virtuemart&page=shop.browse&category_id=29&lang=it&Itemid=233&vmcchk=1
  2. /index.php?option=com_virtuemart&page=shop.browse&category_id=29&lang=it&Itemid=233
  3. /index.php?option=com_virtuemart&page=shop.browse&category_id=29&lang=it
  4. /index.php?option=com_virtuemart&page=shop.browse&category_id=29
  5. /shop?page=shop.browse&category_id=29&lang=it&Itemid=233&vmcchk=1
  6. /shop?page=shop.browse&category_id=29&lang=it&Itemid=233
  7. /shop?page=shop.browse&category_id=29&lang=it
  8. /shop?page=shop.browse&category_id=29

As you can see, each product has 8 different URLs through which search engines can reach it. The store had 1,000 products, so Google and other search engines saw about 8,000 duplicate pages.

Analyzing the URL structure gave me a pattern for progressively redirecting every URL to the last one.

The query string shows where the problems are and suggests a workflow for fixing the duplicates:

  1. vmcchk parameter to be ignored
  2. Itemid parameter to be removed
  3. lang parameter to be removed
  4. index.php?option=com_virtuemart to be redirected to shop
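To make the target of this workflow concrete, here is a minimal Python sketch that computes the URL each of the eight duplicates should end up at. It is only an illustration and not part of the .htaccess solution; the helper name `canonical` and the `DROPPED` set are my own assumptions.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Parameters the workflow removes entirely (hypothetical helper constant).
DROPPED = {"Itemid", "lang"}

def canonical(url: str) -> str:
    """Return the URL the workflow aims for (illustrative only)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    path = parts.path
    # index.php?option=com_virtuemart is redirected to /shop.
    if path == "/index.php" and ("option", "com_virtuemart") in params:
        path = "/shop"
        params = [(k, v) for k, v in params if k != "option"]
    # Itemid and lang are removed; vmcchk is left in place (it is only
    # ignored on the search engine side, not removed from the URL).
    params = [(k, v) for k, v in params if k not in DROPPED]
    query = urlencode(params)
    return f"{path}?{query}" if query else path

base = "page=shop.browse&category_id=29"
urls = [
    f"/index.php?option=com_virtuemart&{base}&lang=it&Itemid=233&vmcchk=1",
    f"/index.php?option=com_virtuemart&{base}&lang=it&Itemid=233",
    f"/index.php?option=com_virtuemart&{base}&lang=it",
    f"/index.php?option=com_virtuemart&{base}",
    f"/shop?{base}&lang=it&Itemid=233&vmcchk=1",
    f"/shop?{base}&lang=it&Itemid=233",
    f"/shop?{base}&lang=it",
    f"/shop?{base}",
]

for url in urls:
    print(canonical(url))
```

In this sketch the eight URLs collapse to two: the clean `/shop` URL, and the same URL with `vmcchk=1`, which is the one left for the search engine to ignore.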

VirtueMart's Itemid parameter can be safely removed: it’s useless here and it creates lots of duplicates. The lang parameter was added by the Joomfish plugin, which is not in use, so it can be removed too. The vmcchk parameter shouldn’t be removed; instead, you can instruct Googlebot to ignore it in Google Webmaster Tools. Last, but not least, we need to redirect index.php?option=com_virtuemart to shop.

Even though there are plugins like sh404SEF that switch VirtueMart to a more SEF (Search Engine Friendly) URL structure, I prefer using Apache mod_rewrite and permanent redirects to tell Googlebot and other spiders how to crawl the shop.

This solution works for me but WILL NOT WORK FOR YOU without a deep understanding of the statements. It permanently redirects each duplicate URL, one parameter at a time, until search engines reach the last URL in the list.

  1. RewriteEngine On
  2. RewriteCond %{QUERY_STRING} (.*)(^Itemid=[a-zA-Z0-9]+&?|^&Itemid=[a-zA-Z0-9]+&|&Itemid=[a-zA-Z0-9]+)(&?.*)
  3. RewriteRule (.*) %{REQUEST_URI}?%1%3 [L,R=301]
  4. RewriteCond %{QUERY_STRING} (.*)(^lang=[a-zA-Z0-9]+&?|^&lang=[a-zA-Z0-9]+&|&lang=[a-zA-Z0-9]+)(&?.*)
  5. RewriteRule (.*) %{REQUEST_URI}?%1%3 [L,R=301]
  6. RewriteCond %{QUERY_STRING} ^(.+&)option=com_virtuemart(.+)?$ [NC]
  7. RewriteRule ^index\.php$ http://%{HTTP_HOST}/shop$1?%1%2 [R=301,L]
  8. RewriteCond %{QUERY_STRING} ^(.+&)?option=com_virtuemart&(.+)?$ [NC]
  9. RewriteRule ^index\.php$ http://%{HTTP_HOST}/shop$1?%1%2 [R=301,L]

I'll try to explain what each .htaccess statement means:

  1. RewriteEngine On
    Enables Apache mod_rewrite for URL rewriting
  2. RewriteCond %{QUERY_STRING} (.*)(^Itemid=[a-zA-Z0-9]+&?|^&Itemid=[a-zA-Z0-9]+&|&Itemid=[a-zA-Z0-9]+)(&?.*)
    Looks for Itemid= inside the query string
  3. RewriteRule (.*) %{REQUEST_URI}?%1%3 [L,R=301]
    Removes the Itemid= parameter from the URL by rebuilding the query string from %1 and %3, using a 301 permanent redirect
  4. RewriteCond %{QUERY_STRING} (.*)(^lang=[a-zA-Z0-9]+&?|^&lang=[a-zA-Z0-9]+&|&lang=[a-zA-Z0-9]+)(&?.*)
    Looks for lang= inside the query string
  5. RewriteRule (.*) %{REQUEST_URI}?%1%3 [L,R=301]
    Removes the lang= parameter from the URL in the same way, using a 301 permanent redirect
  6. RewriteCond %{QUERY_STRING} ^(.+&)option=com_virtuemart(.+)?$ [NC]
    Looks for the string option=com_virtuemart in the middle or at the end of the query string, i.e. preceded by at least one other parameter
  7. RewriteRule ^index\.php$ http://%{HTTP_HOST}/shop$1?%1%2 [R=301,L]
    Redirects /index.php to /shop and drops option=com_virtuemart from the query string, using a 301 permanent redirect ($1 is empty here, because the rule pattern has no capture group)
  8. RewriteCond %{QUERY_STRING} ^(.+&)?option=com_virtuemart&(.+)?$ [NC]
    Looks for option=com_virtuemart followed by other parameters, typically at the start of the query string, just after index.php?
  9. RewriteRule ^index\.php$ http://%{HTTP_HOST}/shop$1?%1%2 [R=301,L]
    Rewrites index.php?option=com_virtuemart to shop, using a 301 permanent redirect
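To see what the back-references %1 and %3 actually contain, the condition from statement 2 can be tried out in Python before touching a live server. This is a rough sketch that assumes Python's `re` behaves like Apache's regex engine for this simple pattern; the function name `strip_itemid` is hypothetical.

```python
import re

# Condition from statement 2, copied as-is. RewriteCond matches anywhere in
# the string, so re.search is the closest equivalent.
ITEMID = re.compile(
    r"(.*)(^Itemid=[a-zA-Z0-9]+&?|^&Itemid=[a-zA-Z0-9]+&|&Itemid=[a-zA-Z0-9]+)(&?.*)"
)

def strip_itemid(query: str) -> str:
    """Rebuild the query string as %1%3, i.e. without the matched Itemid pair."""
    match = ITEMID.search(query)
    if match is None:
        return query  # the condition fails, so the rule would not fire
    # group(1) is %1 (everything before Itemid), group(3) is %3 (everything after).
    return match.group(1) + match.group(3)

samples = [
    "page=shop.browse&category_id=29&lang=it&Itemid=233&vmcchk=1",  # Itemid in the middle
    "page=shop.browse&category_id=29&Itemid=233",                   # Itemid at the end
    "Itemid=233&page=shop.browse",                                  # Itemid at the start
    "page=shop.browse&category_id=29",                              # no Itemid at all
]

for query in samples:
    print(f"{query!r} -> {strip_itemid(query)!r}")
```

The three alternatives inside the second group exist to handle the `&` separator correctly depending on where Itemid sits; the same reasoning applies to the lang condition in statement 4.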

With those rewrites, we have removed all the duplicate products, moving from the URL /index.php?option=com_virtuemart&page=shop.browse&category_id=29&lang=it&Itemid=233 to the URL /shop?page=shop.browse&category_id=29.
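The chain of redirects behind this result can be approximated offline with a small Python simulation of the rules. Treat it as a sketch under assumptions: it ignores the host name, models only the .htaccess (per-directory) case where the rule pattern sees `index.php` without the leading slash, and the names `next_url` and `follow` are my own.

```python
import re

# Statements 2-5: each condition strips one parameter per redirect.
STRIP = [
    re.compile(r"(.*)(^Itemid=[a-zA-Z0-9]+&?|^&Itemid=[a-zA-Z0-9]+&|&Itemid=[a-zA-Z0-9]+)(&?.*)"),
    re.compile(r"(.*)(^lang=[a-zA-Z0-9]+&?|^&lang=[a-zA-Z0-9]+&|&lang=[a-zA-Z0-9]+)(&?.*)"),
]
# Statements 6-9: conditions that only apply to index.php; [NC] becomes IGNORECASE.
TO_SHOP = [
    re.compile(r"^(.+&)option=com_virtuemart(.+)?$", re.IGNORECASE),
    re.compile(r"^(.+&)?option=com_virtuemart&(.+)?$", re.IGNORECASE),
]

def next_url(url):
    """Return the redirect target for url, or None when no rule fires."""
    path, _, query = url.partition("?")
    for cond in STRIP:
        match = cond.search(query)
        if match:
            return f"{path}?{match.group(1)}{match.group(3)}"
    if path.lstrip("/") == "index.php":
        for cond in TO_SHOP:
            match = cond.search(query)
            if match:
                # Unmatched optional groups expand to an empty string.
                return f"/shop?{match.group(1) or ''}{match.group(2) or ''}"
    return None

def follow(url, limit=10):
    """Follow simulated redirects until no rule fires (or limit is reached)."""
    chain = [url]
    while (target := next_url(chain[-1])) is not None and len(chain) <= limit:
        chain.append(target)
    return chain

start = ("/index.php?option=com_virtuemart&page=shop.browse"
         "&category_id=29&lang=it&Itemid=233")
for step in follow(start):
    print(step)
```

In this simulation the example URL needs three redirects: one for Itemid, one for lang, one for index.php. It also suggests an edge case worth checking on a real store: a query such as `a=1&option=com_virtuemart&b=2` is caught by statement 6 before statement 8 and comes out as `/shop?a=1&&b=2`, with a doubled `&`.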

Since we have used permanent redirects, Google and other search engines will know the duplicate pages have moved permanently to the last one, and within a few weeks your VirtueMart website won’t have any duplicates left.

WARNING: this .htaccess needs to be deeply understood before you can use it on your website; it WILL NOT WORK in your VirtueMart store if you just copy and paste it!

The original post, in Italian, is located here: How to remove duplicate products in virtuemart.
