I have a problem with the mod_rewrite module with Lighty.

I am trying to make example.com/index.php?search=whatever appear as example.com/whatever or example.com/search/whatever (I have not decided yet, as I don't know much about SEO).

While I want it to act as described above, I also want to exclude all physical directories and all files (such as the directory /images/ and the files index.php, favicon.ico, style.css, etc.) from the rewrite, because otherwise it behaves strangely.

How would I achieve this? I've tried the following, which worked okay for what I wanted, but didn't really work with the exclusion of the directories and files:

url.rewrite-once = (
"^/([a-zA-Z0-9_-]+)" => "/index.php?search=$1",
"^/(images|js|wp-content)/(.*)" => "$0",
"(.*\.php|.*\.css|favicon.ico)" => "$0" )

By the way, what difference is there between this:

"^/([a-zA-Z0-9_-]+)" => "/index.php?search=$1",

And this:

"^/(.*)$" => "/index.php?search=$1"
AnonymousJ

1 Answer

To avoid having to add numerous RewriteCond directives checking that the visitor's request is an artificial path rather than an actual file, I recommend going with the /search/whatever pattern rather than the /whatever pattern. Then, so long as you never create an actual directory called "search", you'll never need to check whether a path beginning with /search is an actual file path. So your RewriteRule becomes this simple:

RewriteRule ^/search/([a-zA-Z0-9_-]+)$ /index.php?search=$1

(I'm not familiar with Lighty, so I'm not sure how to translate this into a url.rewrite-once instruction, but this is such a simple rewrite that it should be straightforward.)
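
For readers on lighttpd, a possible translation is sketched below. It reuses the url.rewrite-once syntax from the question; it has not been tested against a live server, and it assumes mod_rewrite is already loaded in server.modules.

```conf
# Sketch only: lighttpd counterpart of the RewriteRule above (untested assumption).
# url.rewrite-once applies the first matching rule and then stops rewriting.
url.rewrite-once = (
    # /search/whatever  ->  /index.php?search=whatever
    "^/search/([a-zA-Z0-9_-]+)$" => "/index.php?search=$1"
)
```

Because the pattern is anchored to /search/, requests for real files such as /style.css or /images/photo.jpg never match and are served as usual.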

However, the visitor's browser will now think that they are viewing a page which is in a sub-directory called "search", so if you have any image elements or CSS files specified with relative paths (paths not anchored to the root directory) such as src="images/photo.jpg" or href="stylesheets/clean.css" then the browser will think those paths are relative to the "search" directory and will ask your web server for /search/images/photo.jpg and /search/stylesheets/clean.css respectively.

There are two ways to fix this. The first is to change all page decoration (images, stylesheets, JavaScript) paths to absolute paths. That is, change each path so that it begins with a forward slash, which represents the root directory of the website. So your image path would need to be changed to src="/images/photo.jpg" and your stylesheet path to href="/stylesheets/clean.css". The forward slash at the start tells the web browser that the path starts at the site's root directory, so there is no ambiguity.

The second option is to create convoluted RewriteRules to redirect requests for images, stylesheets, script files, etc, to the correct directories. This tends to become ugly and fragile if you have a lot of media types in a lot of different directories and you need them to work from a lot of different sub-directories (virtual and/or otherwise).
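
As a rough illustration of this second option in lighttpd syntax, the sketch below maps asset requests made relative to the virtual /search/ directory back to their real locations. The directory names images and stylesheets are taken from the example paths above; everything else is an assumption and has not been tested.

```conf
# Sketch only (untested): undo the virtual "search" prefix for asset requests.
url.rewrite-once = (
    # /search/images/photo.jpg       ->  /images/photo.jpg
    # /search/stylesheets/clean.css  ->  /stylesheets/clean.css
    "^/search/(images|stylesheets)/(.*)$" => "/$1/$2",
    # The search rule itself; the slash is not in the character class,
    # so asset paths never reach it.
    "^/search/([a-zA-Z0-9_-]+)$" => "/index.php?search=$1"
)
```

Every new asset directory needs another entry in the first pattern, which is where the fragility mentioned above comes from.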

Which option you choose depends on your requirements and preferences.

Regarding your question about the difference between [a-zA-Z0-9_-]+ and .*: the first pattern only allows letters a to z (lowercase or uppercase), digits, underscores and hyphens. The second pattern allows any characters. For security and debugging reasons, it's usually better to use the pattern which limits characters to only those which should be allowed. So I'd go with the first of those patterns, adding additional permitted characters if necessary, rather than allow all characters.
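
If you do prefer the shorter /whatever form from the question, the choice of pattern interacts with rule order. The sketch below is untested and rests on two assumptions: that url.rewrite-once stops at the first matching rule, and that lighttpd matches each rule against the path plus any query string. Under those assumptions the exclusions have to come before the search rule, and each pattern should cover the whole request so that "$0" passes it through unchanged.

```conf
# Sketch only (untested): exclusions first, search rule last.
url.rewrite-once = (
    # Real directories named in the question: leave the request as it is.
    "^/(images|js|wp-content)/.*" => "$0",
    # Real files; (\?.*)? lets an optional query string through as well.
    "^/(favicon\.ico|.*\.(php|css))(\?.*)?$" => "$0",
    # Restrictive pattern: letters, digits, underscores and hyphens only.
    "^/([a-zA-Z0-9_-]+)$" => "/index.php?search=$1"
    # Permissive alternative, accepting any characters:
    # "^/(.*)$" => "/index.php?search=$1"
)
```

With the restrictive pattern, a request such as /style.css would not reach index.php even without the exclusions, because the dot is not in the character class. With the permissive pattern, the exclusions are the only thing keeping real files out of the search.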

Bobulous
  • Okay, that helped a lot! But now I've found another problem. Whenever I search with special letters, such as ö, ü, ï, etc., it won't work. I've tried adding `\p{L}` for all letters, like this: `"^/search/([0-9\p{L}_-]+)" => "/index.php?search=$1"`, but that doesn't help at all. By the way, whenever I search for a name such as "john example", it only searches for whatever is before the space. I've tried to combat this with "john+example", "john_example" and "john-example", but none of those work. Am I missing something in my RewriteRule? – AnonymousJ Apr 21 '13 at 14:05
  • While when I do allow all characters through the `.*` pattern, everything seems to work fine. – AnonymousJ Apr 21 '13 at 14:29
  • I'm not familiar with the `\p{L}` syntax, but regex does offer the `\w` word character class (which matches letters, digits and underscores). So you could try `([\w+-]+)` to allow word characters, plus symbols and hyphens. You can also type a space into the character class (but make sure the literal hyphen is the very last character in the square brackets), but spaces don't tend to be permitted in raw URLs, so I don't know whether spaces will ever match. – Bobulous Apr 21 '13 at 16:50
  • Well, I don't know what I'm doing wrong, but using `\w` and the one you provided with plus-symbol and hyphens solves the problem with "john+example". But, it does not work when searching for something with %F6 (the equivalent of ö in the address bar of my Chrome browser), while it works with the use of the `.*` syntax/pattern. – AnonymousJ Apr 21 '13 at 17:27
  • Possibly `\w` only matches basic a to z letters. Or possibly you need to add a `%` symbol just before the final hyphen in the square brackets. Either way, if `.*` works for you then it probably won't hurt to use it in this case. Just make sure your PHP script checks that the value provided does not contain code injection before it does anything sensitive with the value (such as adding it to a database query). – Bobulous Apr 21 '13 at 17:32
  • Well, I do sanitize the input that comes through the PHP script [like this](http://stackoverflow.com/questions/16134260/do-i-sanitize-escape-correctly). Hopefully I am sanitizing enough/correctly, so I can use the `.*` pattern, as the other one won't work. :-/ – AnonymousJ Apr 21 '13 at 18:22