[pmwiki-users] lots of problems when redirecting or rewriting URLs

Joachim Durchholz jo at durchholz.org
Fri Jan 20 08:13:49 CST 2006


DaveG wrote:
> I quote a number of things from the apache docs below. Here is what I'm 
> referencing: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html

OK.

> Joachim Durchholz wrote:
> 
>> DaveG wrote:
>>> Pmwiki handles the conversion FROM complex format 
>>> INTO simple format that is shown on the browser address bar.
>>
>> Yes.
>>
>>>     b) rewrites the browser address bar URL to the simple format;
>>
>> No.
> 
> The .htaccess has converted to complex format, which includes a call to 
> pmwiki.php. PmWiki must be changing HTTP headers to get the simple URL 
> format in the browser address bar.

Definitely not the HTTP headers - they aren't involved in what's shown 
in the browser's address bar (unless there's a header I haven't heard 
about).

The rewrite rules do the transformation from simple to complex.
PmWiki generates simple URLs if $EnablePathInfo is set.
When the user clicks on the simple-URL link, that link is automatically 
moved to the address bar, then transmitted to the server.
The URL is fed to the rewrite rules, which closes the cycle.

PmWiki does not do the transformation. PmWiki generates the links from 
scratch and doesn't use the incoming URL in any way.
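
The cycle can be sketched in a few lines of Python. This is only a rough model under stated assumptions: rewrite() stands in for the second RewriteRule quoted further down, make_link() stands in for PmWiki's link generation, and both function names are made up for illustration (they are not Apache or PmWiki APIs).

```python
import re

# Stand-in for "RewriteRule ^([^/a-z].*) pmwiki.php?n=$1" (hypothetical model).
RULE = re.compile(r"^([^/a-z].*)")

def rewrite(simple_path):
    """Server side: map a simple path to the complex form, or leave it alone."""
    m = RULE.match(simple_path)
    return "pmwiki.php?n=" + m.group(1) if m else simple_path

def make_link(group, page, enable_path_info=True):
    """PmWiki side: build a link from the page name alone, never from the incoming URL."""
    if enable_path_info:
        return f"{group}/{page}"               # simple format
    return f"pmwiki.php?n={group}.{page}"      # complex format

link = make_link("Main", "HomePage")   # what the browser shows and sends back
internal = rewrite(link)               # what Apache hands to pmwiki.php
```

Note that the two halves never talk to each other: the simple URL in the address bar comes from the generated link, not from anything the rewrite rule produced.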

>> The result of the RewriteRule directive is never seen by the browser!
> 
> Because pmwiki rewrites the headers I suspect.

As said: there's no "header rewriting" in effect. All that PmWiki does 
(and needs to do) is to generate the right URLs in the links.

>>> #
>>>     RewriteBase /~nepherim/pmwiki/
>>> Strip out this part of the url, and leave us with whatever follows. 
>>> Thus, from "http://dom.com/~nepherim/pmwiki/Main/HomePage" we now 
>>> have "Main/HomePage".
>>
>> No.
> 
> Here I disagree. The Apache docs pretty clearly state that:
> "i.e., the local directory prefix is stripped at this stage of 
> processing and your rewriting rules act only on the remainder. At the 
> end it is automatically added back to the path."

Yes, but that's independent of whether you have RewriteBase set or not.
I'm not 100% sure what RewriteBase does, but it's definitely not needed 
for stripping the prefix. Initially, the CleanUrls page had no reference 
to RewriteBase, and indeed it would work without that in many 
configurations, including some where PmWiki was installed into a 
subdirectory.
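
For what it's worth, here is a small Python sketch of the strip-and-add-back behaviour, as far as I read the Apache documentation: in .htaccess context the directory prefix is stripped, the rule sees only the remainder, and a prefix is added back afterwards; RewriteBase, if set, is used as the prefix that gets added back. The function name and the /home/nepherim/public_html path are assumptions made up for the example.

```python
def per_dir_rewrite(path, dir_prefix, rule, rewrite_base=None):
    """Strip the directory prefix, apply `rule` to the remainder, add a prefix back."""
    if not path.startswith(dir_prefix):
        return path                        # not inside this directory
    remainder = path[len(dir_prefix):]     # the rules only ever see this part
    rewritten = rule(remainder)
    # Assumption: RewriteBase, when given, replaces the prefix that is added back.
    return (rewrite_base or dir_prefix) + rewritten

rule = lambda rest: "pmwiki.php?n=" + rest
physical = "/home/nepherim/public_html/pmwiki/Main/HomePage"
prefix = "/home/nepherim/public_html/pmwiki/"

without_base = per_dir_rewrite(physical, prefix, rule)
with_base = per_dir_rewrite(physical, prefix, rule, rewrite_base="/~nepherim/pmwiki/")
```

In this model the stripping happens either way; RewriteBase only matters when the directory's filesystem path and its URL path differ, which would explain why many configurations work without it.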

>>> #
>>>     RewriteCond %{QUERY_STRING} ^$
>>> Something to enable searching. What I'm not sure of is what condition 
>>> needs to be satisfied to execute the following rewrite.
>>
>> Nonono.
>>
>> QUERY_STRING is a part of the URL - anything that goes after ?.
>> I.e. ?action=edit would be a valid query string of a URL.
> 
> I understand. My point, and question was that RewriteCond is a 
> conditional directive. So what I'm not sure of here is *what* condition 
> it's putting.

It's not putting a condition, it's getting one (and acting upon it).

I'm not 100% sure, but I think it's testing whether the environment 
variable QUERY_STRING matches ^$, which amounts to testing whether it's 
empty. In that case, the following rule is either ignored or used (I'm 
not sure which).

>>> (RewriteCond is basically an IF statement -- if the condition 
>>> evaluates to true (AND RewriteCond's immediately following evaluate 
>>> to true) then execute the next RewriteRule directive.)

Which pretty much fills in my ignorance about RewriteCond :-)
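
Putting the two together, the condition behaves roughly like the Python sketch below: the pattern ^$ matches only an empty string, and the following rule runs only when the condition holds. The function names are hypothetical and only illustrate the IF-like behaviour described above.

```python
import re

def cond_query_string_empty(query_string):
    """Mimics `RewriteCond %{QUERY_STRING} ^$`: true only for an empty query string."""
    return re.search(r"^$", query_string) is not None

def apply_next_rule(path, query_string, rule):
    """Run `rule` only when the condition holds; otherwise leave the path alone."""
    return rule(path) if cond_query_string_empty(query_string) else path

to_home = lambda path: "Main/HomePage/"
plain = apply_next_rule("", "", to_home)                  # condition true, rule applied
with_query = apply_next_rule("", "action=edit", to_home)  # condition false, rule skipped
```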

>>> #
>>>     RewriteRule ^/?$ ~nepherim/pmwiki/Main/HomePage/ [R=permanent,QSA,L]
>>> Alter URLs with a trailing "/" or with no trailing "/" to the 
>>> HomePage. Thus, user entered URLs of "~nepherim/pmwiki" or 
>>> "~nepherim/pmwiki/" translate to "~nepherim/pmwiki/Main/HomePage/".
>>>
>>> R=permanent: tell the browser that this redirect is permanent. 
>>> Default is Temporary.
>>
>> No. Default is "internal", i.e. Apache immediately takes the newly 
>> generated URL and serves whatever is behind *that*.
> 
> It is internal, but again, this is what I got from the Apache docs:
> "If no code is given a HTTP response of 302 (MOVED TEMPORARILY) is used."

Ah, I was a bit unclear.

The default for [R] is indeed 302.
Without R in the [...] options, the default is an internal rewrite: 
Apache serves the new URL itself and the browser never sees it.

>>> QSA: Query String Append. If we have queries on the incoming URL, 
>>> like   "?action=edit" then append them to our new URL.
>>>
>>> L: Stop processing. In this case the next RewriteRule doesn't get 
>>> processed, and we're done.
>>> *** Question: I must be misunderstanding this parameter. Why do we 
>>> stop? If we stop here then we haven't converted to complex format.
>>
>> Because a permanent redirect directly returns to the browser, without 
>> giving it any HTML (but it does return the redirected-to URL).
> 
> Seems to be a little different to:
> "Stop the rewriting process here and don't apply any more rewriting 
> rules ... Use this flag to prevent the currently rewritten URL from 
> being rewritten further by following rules. For example, use it to 
> rewrite the root-path URL ('/') to a real one, e.g., '/e/www/'."
> 
> *However,* if L does stop processing, I don't see how the simple url 
> gets converted to complex, as that comes in the next statement.

Actually, there's no difference; the Apache manual is talking about a 
different phase of request processing :-)

L does indeed stop rewrite processing. At that stage, the request is in 
status "URL changed to yadda.yadda.yadda/Main/HomePage, request status 
is 301" (R=permanent means 301; a plain [R] would give 302).

Apache then goes on to request *serving*.
Now if the status is a redirect, Apache doesn't care about looking any 
further. It simply sends an HTTP response with the 301 status and the 
redirect URL in the Location header.

The browser is then expected to adjust its page cache, decide whether 
the redirected page is interesting at all, and if yes, to fetch the 
redirect page and update the URL bar.
However, that's another request, and the server side doesn't know 
whether a call for .../Main/HomePage was generated through a redirect or 
by typing it directly on the URL bar.
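
The two requests can be modelled like this in Python. It is a simplified sketch, not how Apache is implemented: handle() is a made-up function, the redirect target is left relative as written in the rule, and R=permanent is taken to mean status 301.

```python
import re

def handle(path, query=""):
    """Return (kind, value) for one request, seen from the server side."""
    # Rule 1: ^/?$ with [R=permanent,QSA,L], guarded by an empty query string.
    if query == "" and re.search(r"^/?$", path):
        return ("redirect 301", "~nepherim/pmwiki/Main/HomePage/")  # L: stop here
    # Rule 2: ^([^/a-z].*) with [QSA,L] and no R, so the rewrite stays internal.
    m = re.search(r"^([^/a-z].*)", path)
    if m:
        target = "pmwiki.php?n=" + m.group(1)
        if query:
            target += "&" + query          # QSA: append the original query string
        return ("internal", target)
    return ("internal", path)

first = handle("")                 # user typed .../pmwiki/
second = handle("Main/HomePage/")  # separate request after the browser follows the redirect
```

The second call knows nothing about the first, which is why the simple URL still gets converted to complex format even though L stopped processing during the first request.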

>>> #
>>>     RewriteRule ^([^/a-z].*) pmwiki.php?n=$1 [QSA,L]
>>> Here we're matching anything which is NOT a lowercase letter (a-z) 
>>> followed by anything else. What this means in practice is that we're 
>>> finding the pmwiki group and page name, since pmwiki groups always 
>>> start with an uppercase character.
>>
>> Actually with a non-lowercase letter.
> 
> Same as uppercase...?

No.

The pattern defines lowercase characters as a-z.
There are many languages that have lowercase characters outside of that 
range. In German, that would be äöü. The French have accented and 
cedilla'd characters. Etc. etc. etc.

Essentially, the [a-z] range above was used because a-z are the standard 
filename characters on a Unixoid operating system. I.e. the PmWiki 
namespace is then divided like this:
.../[a-z]* - files that live in the installation directory of PmWiki
Anything else - PmWiki pages.

So actually it's less of a lowercase vs. uppercase distinction, it's a 
file name vs. wiki page distinction. (Of course, this largely boils down 
to the same, but e.g. on a Windows operating system, the filesystem name 
conventions do allow umlauts, so [[:lower:]] might be a better choice than 
[a-z] - depending on whether the installed regex library that Apache 
uses does recognize [:lower:], that is...)
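
A quick way to see the file-vs-page split is to run the same pattern in Python. This assumes Python's re module treats [^/a-z] the same way as Apache's regex library for these inputs; is_wiki_page() is a hypothetical helper name.

```python
import re

WIKI_PAGE = re.compile(r"^([^/a-z].*)")

def is_wiki_page(path):
    """True if the path would be handed to pmwiki.php by the rule above."""
    return WIKI_PAGE.match(path) is not None

results = {
    "Main/HomePage": is_wiki_page("Main/HomePage"),      # uppercase first letter
    "pub/skins/a.css": is_wiki_page("pub/skins/a.css"),  # a-z first letter: a file
    "ärger/Seite": is_wiki_page("ärger/Seite"),          # lowercase, but outside a-z
    "2006/News": is_wiki_page("2006/News"),              # digit: not uppercase either
}
```

The last two entries are the reason "non-lowercase" is not the same as "uppercase": both are treated as wiki pages although neither starts with an uppercase letter.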

Regards,
Jo



