setup
we have the following directory structure on our HTTP/web server:
/questions/who?/
/questions/what?/
/happy? part. 01/
/happy? yet?/
/happy? yet? again? really?!/
question
my question: is it possible to have the corresponding URIs/URLs with unescaped/unencoded question marks (?) resolve correctly? e.g. the URL http://test.org/happy? part. 01/ should resolve to /happy? part. 01/ on the server. due to ? signifying a query string, this has been a pickle of a problem for me.
background/research
as expected, by default Apache treats the first ? as the beginning of a query string. so out of the box a URL of http://test.org/happy? part. 01/ will be converted to a URI path /happy and query string part. 01/, resulting in a 404 since the path /happy does not exist.
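to make the split concrete, here is a minimal Python sketch of the behaviour described above. it only models "cut at the first ?" and is not Apache's actual parsing code; the helper name split_request_target is made up for illustration, and the paths are the directory names from the setup.

```python
def split_request_target(target):
    """Model of the default behaviour: everything after the first '?' is the query string."""
    # str.partition splits at the FIRST occurrence only, so later '?' stay in the query part
    path, _sep, query = target.partition("?")
    return path, query


# directory names taken from the setup section
for target in ["/happy? part. 01/", "/happy? yet? again? really?!/", "/questions/who?/"]:
    path, query = split_request_target(target)
    print(f"{target!r} -> path={path!r} query={query!r}")
```

in this model every directory from the setup loses its tail to the query string, which is consistent with the 404 on /happy.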
most of the other answers/tips i have found in my research mainly deal with rewriting the URL assuming that the ? indicates a query string, e.g.
- .htaccess Question mark allowing
- Apache .htaccess: How to remove question mark from URL if not `?id=(.*)`?
- htaccess rewrite rule, remove question mark
however, in this case we can assume that our HTTP server will not be receiving URLs with query strings.
i realize that normally browsers/etc. will encode the URI before sending it to the server (e.g. http://test.org/happy? part. 01/ will be sent to the server as http://test.org/happy%3F%20part.%2001/, though which characters are encoded depends on the app and which version of the URI standard it supports: RFC 2396 or RFC 3986). but in this scenario the server may receive unencoded URLs, and never any URLs with query strings.
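for reference, the encoded form mentioned above can be reproduced with Python's standard library. this is only a sketch of what a well-behaved client might send; real browsers and tools differ in which characters they escape.

```python
from urllib.parse import quote, unquote

raw_path = "/happy? part. 01/"

# quote() keeps "/" by default and percent-encodes "?" and the spaces
encoded = quote(raw_path)
print(encoded)  # /happy%3F%20part.%2001/

# decoding gives back the original directory name
print(unquote(encoded) == raw_path)  # True
```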
my attempts
at first i thought a simple rule like this would suffice:
RewriteRule ([^\?]*?)\?([^\?]*?) $1\?$2 [NE,N]
here i am trying to repeatedly find all the ?s and simply reinsert them into the URL unescaped. unfortunately the regular expression (and many variations) does not match URLs that contain ?, instead only matching the encoded value %3F. and even when it matches, the second capture group $2 seems to always be empty. finally, the \? in the substitution string seems to prevent anything after it from being written.
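the empty second group can be reproduced outside Apache. below is a small Python sketch, assuming Python's re module treats lazy quantifiers the same way as the PCRE engine Apache uses (as far as i know it does, but this illustrates the regex only and is not a test of mod_rewrite itself). the input string stands in for the decoded path that the RewriteRule pattern would see.

```python
import re

decoded_path = "happy? part. 01/"

# the pattern from the RewriteRule: nothing follows the second lazy group,
# so it is satisfied with zero characters
unanchored = re.compile(r"([^\?]*?)\?([^\?]*?)")
m = unanchored.search(decoded_path)
print(repr(m.group(1)), repr(m.group(2)))

# same pattern with "$" appended: the lazy group now has to reach the end of the string
anchored = re.compile(r"([^\?]*?)\?([^\?]*?)$")
m2 = anchored.search(decoded_path)
print(repr(m2.group(1)), repr(m2.group(2)))
```

so at the regex level, a trailing lazy group with nothing after it will always capture the empty string, which would explain the empty $2. whether anchoring alone is enough inside mod_rewrite is a separate matter that this sketch does not cover.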
the solutions linked above led me to the fact that to check for the ? i had to check the %{THE_REQUEST} variable, since Apache strips the query string for other server variables/RewriteRules. to that end i tried variations of this:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ \/([^\?]*?)\?([^\?]*?)\/?\ HTTP
RewriteRule ^(.*?)\?(.*?)$ $1\?$2 [NE,N]
while the regular expression of the RewriteCond is matching URIs with ?, the %2 in the RewriteRule causes an Internal Server Error, though without it i seem to have no way of accessing the part of the URL after the ?.
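to see what the RewriteCond pattern actually captures, here is a Python sketch that runs the same regular expression against two hand-written request lines. the request lines are hypothetical (a real client would normally encode the spaces), and Python's re is standing in for Apache's regex engine.

```python
import re

# the RewriteCond pattern, copied as-is
cond = re.compile(r"^[A-Z]+\ \/([^\?]*?)\?([^\?]*?)\/?\ HTTP")

request_lines = [
    "GET /happy? part. 01/ HTTP/1.1",  # one "?"
    "GET /happy? yet?/ HTTP/1.1",      # two "?"
]

for line in request_lines:
    m = cond.search(line)
    print(line, "->", m.groups() if m else None)
```

in this sketch the pattern captures both halves when the target contains exactly one ?, but fails to match at all once there is a second ?, because neither group is allowed to contain a ?.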
finally, i also tried various things with %{QUERY_STRING} and [QSA] but still no luck.
thanks for taking a look.