Monday, December 7, 2009

Module Rewrite - URL Rewriting Guide

SkyHi @ Monday, December 07, 2009

Module Rewrite

Welcome to mod_rewrite, the voodoo of URL manipulation. This document describes how to use Apache's mod_rewrite to solve typical URL-based problems that webmasters face in practice. mod_rewrite is an Apache module that provides a powerful way to manipulate URLs. With it you can do nearly every type of URL manipulation you ever dreamed about. The price you pay is complexity: mod_rewrite is not easy for a beginner to understand and use.
NOTE: Depending on your server configuration, you may need to adapt the examples to your situation. Always try to understand what an example really does before you use it. Misuse can lead to endless rewrite loops that hang the server.
Most of the examples can be used in a .htaccess file, while others work only in the Apache httpd.conf file.

RewriteCond

The RewriteCond directive defines a rule condition. Precede a RewriteRule with one or more RewriteCond directives. The following rewriting rule is only used if its pattern matches the current state of the URI and if these additional conditions apply too.
You can set special flags for the condition pattern by appending a third argument to the RewriteCond directive. Flags is a comma-separated list of the following flags:
[NC] (No Case)
Makes the condition pattern case-insensitive, so there is no difference between 'A-Z' and 'a-z'.

[OR] (OR next condition)
Combines this condition with the next one using OR instead of the implicit AND.
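
As a minimal sketch of both flags together, the conditions below match either of two host names regardless of case. The host names and the /moved.html target are placeholders chosen for illustration, and the rule assumes a .htaccess file in the document root.

```apache
RewriteEngine On
# Either condition may match ([OR]); the host comparison ignores case ([NC]).
RewriteCond %{HTTP_HOST} ^old\.example\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^legacy\.example\.com$ [NC]
# Without [OR], the two conditions would be combined with an implicit AND.
RewriteRule ^index\.html$ /moved.html [L]
```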


RewriteRule

The RewriteRule directive does the actual rewriting.
You can set special flags for the substitution by appending a third argument to the RewriteRule directive. Flags is a comma-separated list of the following flags:
[R] (force Redirect)
Forces an external redirect. Sends the HTTP response 302 (MOVED TEMPORARILY) unless another status code is given.

[F] (force URL to be Forbidden)
Forces the current URL to be forbidden. Sends the HTTP response 403 (FORBIDDEN).

[G] (force URL to be Gone)
Forces the current URL to be gone. Sends the HTTP response 410 (GONE).

[L] (last rule)
Stops the rewriting process here and does not apply any more rewriting rules.

[P] (force proxy)
Forces the substitution to be handled as a proxy request and passed through the proxy module mod_proxy.
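
The flags above might be used as in the following sketch. All file names, paths and the backend host are hypothetical, the patterns assume a .htaccess file in the document root (so the matched path has no leading slash), and the [P] rule only works when mod_proxy is loaded.

```apache
RewriteEngine On
# [R]: external redirect; 302 is sent unless a status is given, e.g. [R=301].
RewriteRule ^old\.html$ /new.html [R,L]
# [F]: answer with 403 Forbidden; "-" means "no substitution".
RewriteRule ^private/ - [F]
# [G]: answer with 410 Gone.
RewriteRule ^retired\.html$ - [G]
# [P]: hand the request to mod_proxy; the backend host is a placeholder.
RewriteRule ^app/(.*)$ http://backend.example.com/$1 [P]
```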


Regular expressions

Some hints about the syntax of regular expressions:

Text:
  .            Any single character
  [chars]      One of chars
  [^chars]     None of chars
  text1|text2  text1 or text2
Quantifiers:
  ?            0 or 1 of the preceding text
  *            0 or N of the preceding text (N > 0)
  +            1 or N of the preceding text (N > 1)
Grouping:
  (text)       Grouping of text
Anchors:
  ^            Start of line anchor
  $            End of line anchor
Escaping:
  \char        Escape that particular char
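
To show how these pieces combine in a rule, here is a small sketch. The file names article.php, pics/ and images/ are invented for the example, and the patterns assume a .htaccess file in the document root.

```apache
RewriteEngine On
# ^ and $ anchor the whole path; [0-9]+ is one or more digits;
# (...) captures the digits as $1; \. is a literal dot.
RewriteRule ^article-([0-9]+)\.html$ /article.php?id=$1 [L]
# (gif|jpe?g) matches gif, jpg or jpeg; the "?" makes the "e" optional.
RewriteRule ^pics/(.*)\.(gif|jpe?g)$ /images/$1.$2 [L]
```

With these rules a request for article-42.html would be served by /article.php?id=42.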

Condition pattern

There are some special variants of CondPatterns. Instead of real regular expression strings you can also use one of the following. Here String is the first argument of the RewriteCond directive (the test string) and CondPattern is the second:
<CondPattern (is lexically lower)
Treats the CondPattern as a plain string and compares it lexically to String. True if String is lexically lower than CondPattern.

>CondPattern (is lexically greater)
Treats the CondPattern as a plain string and compares it lexically to String. True if String is lexically greater than CondPattern.

=CondPattern (is lexically equal)
Treats the CondPattern as a plain string and compares it to String. True if String is equal to CondPattern, character by character.

-d (is directory)
Treats the String as a pathname and tests if it exists and is a directory.

-f (is regular file)
Treats the String as a pathname and tests if it exists and is a regular file.

-s (is regular file with size)
Treats the String as a pathname and tests if it exists and is a regular file with size greater than zero.

-l (is symbolic link)
Treats the String as a pathname and tests if it exists and is a symbolic link.

-F (is existing file via sub request)
Checks if String is a valid file and accessible via all the server's currently configured access controls for that path. This uses an internal subrequest, so use it with care because it decreases your server's performance!

-U (is existing URL via sub request)
Checks if String is a valid URL and accessible via all the server's currently configured access controls for that path. This uses an internal subrequest, so use it with care because it decreases your server's performance!

NOTE: You can prefix the pattern string with a '!' character (exclamation mark) to specify a non-matching pattern.
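
A short sketch of these variants in use: the first rule passes every request that is neither an existing file nor an existing directory to a front controller, and the second uses the '=' string comparison. The names index.php and upload/ are hypothetical, and a .htaccess file in the document root is assumed.

```apache
RewriteEngine On
# Continue only if the request does not map to an existing file ...
RewriteCond %{REQUEST_FILENAME} !-f
# ... and not to an existing directory ("!" negates the test).
RewriteCond %{REQUEST_FILENAME} !-d
# Hypothetical front controller that receives everything else.
RewriteRule ^(.*)$ /index.php?path=$1 [L]

# "=" compares as a plain string, not as a regular expression.
RewriteCond %{REQUEST_METHOD} =POST
RewriteRule ^upload/ - [F]
```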

Protecting your images and files from linking

DESCRIPTION: Other webmasters sometimes link directly to your download files or embed images hosted on your server as inline images on their pages.
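RewriteEngine On
# Let requests without a Referer header through.
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by our own host names or IP address through.
RewriteCond %{HTTP_REFERER} !^http://domain\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://www\.domain\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://212\.204\.218\.80 [NC]
# Everything else is redirected to the home page.
RewriteRule ^.*$ http://www.domain.com/ [R,L]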
EXPLAIN: In this case visitors are redirected to http://www.domain.com/ if the request carries a Referer that does not start with http://domain.com, http://www.domain.com or http://212.204.218.80. Requests with an empty Referer are let through.
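
A possible variation, not part of the original guide: limit the check to image and download files and answer with 403 instead of redirecting, so that ordinary pages stay reachable from foreign links. The list of file extensions is an assumption and should be adapted to the files you actually want to protect.

```apache
RewriteEngine On
# Allow empty referers (direct requests, privacy proxies).
RewriteCond %{HTTP_REFERER} !^$
# Allow the site itself, with or without www, over http or https.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.com [NC]
# Refuse foreign requests for image and archive files only.
RewriteRule \.(gif|jpe?g|png|zip)$ - [F,NC]
```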

Redirect visitor by domain name

DESCRIPTION: Sometimes the same web site is reachable under several addresses, such as domain.com, www.domain.com and www.domain2.com, and we want to redirect all of them to one address.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R,L]
EXPLAIN: In this case the requested URL http://domain.com/foo.html would be redirected to the URL http://www.domain.com/foo.html.
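
If the move to a single host name is meant to be permanent, a sketch like the following may be preferable. It assumes a .htaccess file in the document root, where the matched path has no leading slash. The extra condition for an empty Host header is an addition to the example above.

```apache
RewriteEngine On
# Skip requests that already use the canonical host name.
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
# Skip requests without a Host header (HTTP/1.0 clients) to avoid a loop.
RewriteCond %{HTTP_HOST} !^$
# R=301 marks the move as permanent; a plain [R] sends 302.
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
```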

Redirect domains to other directory
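RewriteEngine On
# Only for requests addressed to www.domain.com ...
RewriteCond %{HTTP_HOST} ^www\.domain\.com$
# ... that are not already inside /HTML2/ (prevents a rewrite loop).
RewriteCond %{REQUEST_URI} !^/HTML2/
# Internally map the request into the /HTML2/ directory.
RewriteRule ^(.*)$ /HTML2/$1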

Redirect visitor by user agent

DESCRIPTION: For important top-level pages it is sometimes necessary to provide pages that depend on the browser. One has to provide a version for the latest Netscape, a version for the latest Internet Explorer, a version for Lynx or old browsers and an average-feature version for all others.
# MS Internet Explorer - Mozilla v4
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/4(.*)MSIE
RewriteRule ^index\.html$ /index.IE.html [L]

# Netscape v6.+ - Mozilla v5
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5(.*)Gecko
RewriteRule ^index\.html$ /index.NS5.html [L]

# Lynx or Mozilla v1/2
RewriteCond %{HTTP_USER_AGENT} ^Lynx/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[12]
RewriteRule ^index\.html$ /index.20.html [L]

# All other browsers
RewriteRule ^index\.html$ /index.32.html [L]
EXPLAIN: In this case we have to act on the HTTP header User-Agent. If the User-Agent begins with Mozilla/4 and contains MSIE (MS Internet Explorer), the page index.html is rewritten to index.IE.html and the rewriting stops. If the User-Agent begins with Mozilla/5 and contains Gecko (Netscape), the page index.html is rewritten to index.NS5.html. If the User-Agent begins with Lynx/, Mozilla/1 or Mozilla/2, the page index.html is rewritten to index.20.html. All other browsers receive the page index.32.html.


REFERENCES
http://www.widexl.com/tutorials/mod_rewrite.html