Friday, 3 May 2013

Clean Up your Dirty URLs in Drupal

After installing Drupal, the website-building Content Management System (CMS), on a cloud server, I had some trouble getting Clean URLs to work. This post describes how I got rid of the dirty URLs on my Drupal-hosted websites.

Dirty URLs


By default, URLs on a Drupal-hosted site include ?q=, e.g. www.yoursite.com/?q=node/366. This is because every page request goes through Drupal's index.php, and the value after ?q= tells Drupal which internal path (here node/366) to look up and display. Removing these characters, or at least hiding them from the URL, is generally considered friendlier to search engines and may improve your pages' rankings. A further step is to add aliases to pages so that they have a human-readable name instead of an ID, e.g. www.yoursite.com/barbie-girl instead of www.yoursite.com/?q=node/366.
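To make the difference concrete, here is a small Python sketch (my own illustration, not something Drupal provides) that turns a dirty URL into the equivalent clean one by moving the value of q into the path. The function name clean_url is hypothetical, and it assumes Drupal is served by index.php at the site root or in a subdirectory.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def clean_url(dirty: str) -> str:
    """Convert a dirty Drupal URL such as http://www.yoursite.com/?q=node/366
    into its clean form http://www.yoursite.com/node/366.

    Query parameters other than q are kept.
    """
    parts = urlsplit(dirty)
    params = parse_qs(parts.query, keep_blank_values=True)
    # The internal Drupal path lives in the q parameter.
    internal_path = params.pop("q", [""])[0]

    # Drop a trailing index.php so the path becomes the Drupal base directory.
    base = parts.path
    if base.endswith("index.php"):
        base = base[: -len("index.php")]
    if not base.endswith("/"):
        base += "/"

    # Re-encode whatever query parameters are left over.
    remaining = urlencode(params, doseq=True)
    return urlunsplit(
        (parts.scheme, parts.netloc, base + internal_path.lstrip("/"), remaining, parts.fragment)
    )


print(clean_url("http://www.yoursite.com/?q=node/366"))  # prints http://www.yoursite.com/node/366
```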

Apache mod_rewrite

Apache is the software that serves up web pages from my server, and it is a standard component of my CentOS 6.2 LAMP installation. You can add and configure a number of different modules for Apache, and an important one for Clean URLs is mod_rewrite, which translates URLs according to a set of rules (e.g. so that a request for /user/1 is handled internally as if it were ?q=user/1, without the ?q= ever appearing in the address bar).
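The rules in Drupal's own .htaccess file differ slightly between Drupal versions, but older releases used rules of the form quoted in the docstring below. This Python sketch roughly emulates what they do: a request for a real file or directory is served as-is, and anything else is handed to index.php with the path in q. The rewrite function and the default document root are assumptions for illustration only; Apache does this work itself, so there is nothing here you need to run on the server.

```python
import os
from urllib.parse import urlsplit

DOCUMENT_ROOT = "/var/www/html"  # assumption: the CentOS default DocumentRoot


def rewrite(request_uri: str, document_root: str = DOCUMENT_ROOT) -> str:
    """Roughly emulate Drupal-style rewrite rules of the form:

        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteCond %{REQUEST_URI} !=/favicon.ico
        RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
    """
    parts = urlsplit(request_uri)
    path = parts.path
    filename = os.path.join(document_root, path.lstrip("/"))

    # Real files (images, CSS, ...), directories and favicon.ico are left alone.
    if os.path.isfile(filename) or os.path.isdir(filename) or path == "/favicon.ico":
        return request_uri

    # Everything else goes to index.php with the path in q.
    target = "/index.php?q=" + path.lstrip("/")
    # QSA flag: append the original query string, if any.
    if parts.query:
        target += "&" + parts.query
    return target
```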

To check whether it is installed, log into your server with ssh and open the httpd.conf file, which in this instance was at /etc/httpd/conf/httpd.conf. Edit httpd.conf with a text editor, e.g. nano, and scroll down (line 190 on my OS) to check for LoadModule rewrite_module modules/mod_rewrite.so. If it is commented out with a #, remove the # and restart Apache with service httpd restart. Finally, to check it's working, run the command apachectl -M and look for rewrite_module under Loaded Modules.
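If you'd rather not scroll through httpd.conf by eye, both checks can be scripted. This is a minimal Python 3.7+ sketch, assuming the CentOS path /etc/httpd/conf/httpd.conf and that apachectl is on your PATH; the function names are my own.

```python
import re
import subprocess

HTTPD_CONF = "/etc/httpd/conf/httpd.conf"  # path on my CentOS 6.2 server

# Matches the LoadModule line, capturing any leading # that comments it out.
LOADMODULE_RE = re.compile(r"^\s*(#*)\s*LoadModule\s+rewrite_module\b")


def rewrite_module_status(conf_text: str) -> str:
    """Return 'enabled', 'commented out' or 'missing' for the
    LoadModule rewrite_module line in httpd.conf text."""
    found_commented = False
    for line in conf_text.splitlines():
        m = LOADMODULE_RE.match(line)
        if m:
            if m.group(1):
                found_commented = True
            else:
                return "enabled"
    return "commented out" if found_commented else "missing"


def rewrite_module_loaded(apachectl_output: str) -> bool:
    """True if the output of `apachectl -M` lists rewrite_module."""
    return any(line.split()[:1] == ["rewrite_module"]
               for line in apachectl_output.splitlines())


if __name__ == "__main__":
    with open(HTTPD_CONF) as f:
        print("httpd.conf:", rewrite_module_status(f.read()))
    result = subprocess.run(["apachectl", "-M"], capture_output=True, text=True)
    # Search stdout and stderr together, to be safe.
    print("rewrite_module loaded:", rewrite_module_loaded(result.stdout + result.stderr))
```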

I opened my browser and found that although http://www.mysite.com/?q=user/1 worked, http://www.mysite.com/user/1 didn't, so Clean URLs weren't working yet.
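The same browser test can be done from a script. This Python sketch (hypothetical helper names; it assumes the path you pass actually exists on your site) requests both the dirty and the clean form of a path and treats Clean URLs as working when both return the same HTTP status. Comparing statuses rather than insisting on 200 matters because an anonymous request for user/1 may well be refused by Drupal with a 403, whereas a missing rewrite gives Apache's own 404.

```python
import urllib.error
import urllib.request


def status_of(url: str) -> int:
    """Return the HTTP status code for url (e.g. 200, 403 or 404)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def clean_urls_working(base_url: str, path: str = "user/1", fetch=status_of) -> bool:
    """Compare the dirty form (/?q=path) with the clean form (/path)."""
    base = base_url.rstrip("/")
    dirty = fetch(f"{base}/?q={path}")
    clean = fetch(f"{base}/{path}")
    # Without working rewrites Apache answers the clean URL itself (404),
    # while Drupal still answers the dirty one.
    return dirty != 404 and clean == dirty
```

For example, `clean_urls_working("http://www.mysite.com")` would have returned False for me before the fix below.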

A little extra reading suggested another change to httpd.conf: change AllowOverride None to AllowOverride All within the <Directory "/var/www/html"> section. On my installation this was line 338, but in nano you can press Ctrl-W to search for a word such as override. Many web pages go into more detail about the rewrite rules themselves, but these already existed, along with plenty of other settings, in /var/www/html/.htaccess. With AllowOverride None, Apache ignores .htaccess files altogether; changing it to All lets the rules in that file take effect for the DocumentRoot, i.e. /var/www/html.
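To confirm the edit without hunting for line 338, here is a small Python sketch that reads httpd.conf text and reports the AllowOverride value inside a given <Directory> section. It is a simple line-based parser written for this post, not a full Apache config parser, and it assumes the section header looks like the stock <Directory "/var/www/html">.

```python
import re


def allow_override_for(conf_text: str, directory: str = "/var/www/html"):
    """Return the AllowOverride value inside <Directory "directory">, or None."""
    inside = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment lines, including commented-out directives
        if not inside:
            m = re.match(r'<Directory\s+"?([^">]+)"?\s*>', stripped, re.IGNORECASE)
            if m and m.group(1).rstrip("/") == directory.rstrip("/"):
                inside = True
        elif stripped.lower().startswith("</directory"):
            inside = False
        else:
            m = re.match(r"AllowOverride\s+(.+)", stripped, re.IGNORECASE)
            if m:
                return m.group(1).strip()
    return None


if __name__ == "__main__":
    with open("/etc/httpd/conf/httpd.conf") as f:
        print("AllowOverride:", allow_override_for(f.read()))
```

If this prints None rather than All, the edit hasn't been made in the right section yet.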

After making this change and restarting Apache again, the clean URL worked in the browser. So, back on my site (www.yoursite.com) and still logged in as user 1 (full admin privileges), I went to Configuration -> Clean URLs and hit the test button. This time it passed, and I could finally get to the Enable Clean URLs check box.

URL Alias

Now my homepage was simply www.mysite.com, an improvement. However, my About page was www.mysite.com/node/1 when I wanted it to be www.mysite.com/about.

This is where URL aliases come into effect. Adding a URL alias of about for the About page's system path, node/1, allowed me to use www.mysite.com/about.
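Conceptually, an alias is just a lookup between a friendly path and Drupal's internal system path, used in both directions: incoming requests for about are served from node/1, and links to node/1 are printed as about. Drupal keeps these in its database; the Python dictionary below is a hypothetical stand-in used only to illustrate the idea.

```python
# Hypothetical alias table: alias -> internal system path.
ALIASES = {
    "about": "node/1",
    "barbie-girl": "node/366",
}


def resolve(request_path: str, aliases: dict = ALIASES) -> str:
    """Inbound: map an incoming clean path to Drupal's internal path."""
    path = request_path.strip("/")
    return aliases.get(path, path)  # unaliased paths pass through unchanged


def alias_for(system_path: str, aliases: dict = ALIASES) -> str:
    """Outbound: find the alias to show in links for an internal path."""
    for alias, source in aliases.items():
        if source == system_path:
            return alias
    return system_path


print(resolve("/about"))      # prints node/1
print(alias_for("node/366"))  # prints barbie-girl
```

Note that node/1 keeps working after the alias is added; the alias is an extra name, not a replacement.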

Job done! Phew, not as hard as I'd imagined, or as hard as the posts I found made it look. Of course, my situation was quite vanilla, as it was a new and simple installation, so this solution is not a cure-all. However, I couldn't find a simple explanation like this one anywhere else.

Hope it helps someone!