[pmwiki-users] lots of problems when redirecting or rewriting URLs

Joachim Durchholz jo at durchholz.org
Thu Jan 19 17:47:58 CST 2006


DaveG wrote:
> *Addendum:* As I was writing this, a *lot* became clear. The main thing 
> I realized is that the intent of the .htaccess script is to change URLs 
> FROM simple format INTO complex format (refer to definitions below). 
> Pmwiki handles the conversion FROM complex format INTO simple format 
> that is shown on the browser address bar.

Yes.

> --- *Questions*
> 1] What does $EnablePathInfo = 1; actually do? I think it:
>     a) rewrites URLs on wiki pages to the simple format;

Yes.

>     b) rewrites the browser address bar URL to the simple format;

No.

The full picture is:

The browser sends the simple format;
the rewrite rule transforms it into the complex format.

PmWiki doesn't care much what kind of URL led to it (it can't reliably 
infer that anyway). It simply looks first at complex-format info, and if 
that comes up empty, it tries the simple-format (pathinfo) one. (Or 
maybe it's the other way round. I don't really know or care.)
Once it has the group and page name, PmWiki doesn't even take another look
at the URL string (except for action=... and such).
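
The lookup order described above could be sketched in Python as follows. This is an illustration, not PmWiki's code: resolve_page_name is a hypothetical name, and checking n= before the path info is an assumption (the text itself is unsure of the order).

```python
from typing import Optional
from urllib.parse import parse_qs


def resolve_page_name(query_string: str, path_info: str) -> Optional[str]:
    """Hypothetical sketch: find the requested page as described above.

    Assumption: the complex-format parameter (n=...) is checked first and the
    path info is only a fallback; the real order inside PmWiki may differ.
    """
    params = parse_qs(query_string)
    if params.get("n"):
        name = params["n"][0]
    elif path_info.strip("/"):
        name = path_info.strip("/")
    else:
        return None  # nothing to go on; a real wiki would pick a default page
    # "Main/HomePage" and "Main.HomePage" are treated as the same page here.
    return name.replace("/", ".")


# Both URL styles end up naming the same page; other parameters such as
# action=edit are left for later processing.
print(resolve_page_name("n=Main.HomePage&action=edit", ""))  # → Main.HomePage
print(resolve_page_name("", "/Main/HomePage"))               # → Main.HomePage
```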

When PmWiki writes links, it doesn't care how the current request reached it.
It simply looks at $EnablePathInfo: if it's TRUE, it emits a simple-format
URL, and if it's FALSE, it emits a complex-format one.
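
A minimal Python sketch of that decision. page_url and base_url are hypothetical names, and the exact URL shapes PmWiki produces may differ in detail; the point is that only the $EnablePathInfo switch is consulted.

```python
def page_url(base_url: str, group: str, page: str, enable_path_info: bool) -> str:
    """Hypothetical sketch of how a link to Group.Page could be written.

    base_url stands in for the wiki's configured base URL; only the
    enable_path_info switch (i.e. $EnablePathInfo) decides the format.
    """
    if enable_path_info:
        return f"{base_url}/{group}/{page}"   # simple format
    return f"{base_url}?n={group}.{page}"     # complex format


print(page_url("/~nepherim/pmwiki", "Main", "HomePage", True))
# → /~nepherim/pmwiki/Main/HomePage
print(page_url("/~nepherim/pmwiki/pmwiki.php", "Main", "HomePage", False))
# → /~nepherim/pmwiki/pmwiki.php?n=Main.HomePage
```

The function never looks at the incoming request. The first call assumes the base URL is set to the wiki directory rather than to pmwiki.php, which is what leaves the rewrite rules to turn /Main/HomePage back into a complex URL.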

>     Thus, the rewrites only need to convert from incoming simple format 
> to the complex format so that PHP and thus Pmwiki can handle the request.

Yes.

> --- *Definitions*
> In the text below, I'll use:
>   - "complex URL": URLs in the format "/pmwiki.php?n=Main.HomePage".
> 
>   - "simple URL": URLs in the format "/Main/HomePage".
> 
> 
> --- *Background*
> PHP needs incoming URLs to be of the complex format in order to process 
> them correctly.

Yes.

> PHP (and thus pmwiki) cannot handle or process simple
> URLs, as there is no way to know which parts of the URL are parameters.

No.

Well, sort of. It turns out that there is no reliable way to extract the 
path info from the URL information that it has.

With Apache, you can (usually!) take the information directly from
$_SERVER['PATH_INFO']. You can even do without rewrite rules by inferring
group and page name from that in your config.php.
(Might be a good addition to the CleanUrls recipe.)
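
The parsing such a config.php would need is small. Here it is sketched in Python for illustration only (a real config.php would be PHP); the function name and the accepted shapes, Group/Page or Group.Page with optional leading and trailing slashes, are assumptions.

```python
import re
from typing import Optional, Tuple


def group_and_page(path_info: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: split path info such as '/Main/HomePage'.

    Returns (group, page), or None if the string doesn't look like a page name.
    """
    m = re.fullmatch(r"/?([^/.]+)[/.]([^/.]+)/?", path_info)
    if m is None:
        return None
    return m.group(1), m.group(2)


print(group_and_page("/Main/HomePage"))   # → ('Main', 'HomePage')
print(group_and_page("/Main"))            # → None
```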

With IIS, things get more complicated.

> Thus, the rules in .htaccess convert from simple format to complex format.

Yes.

> --- *Analysis*
> 2] Here's the line by line script analysis, assuming the user enters a 
> simple URL:
>     http://dom.com/~nepherim/pmwiki/Main/HomePage
> 
> and here's the complex URL we need to convert into so PHP can process:
>     http://dom.com/~nepherim/pmwiki/pmwiki.php?n=Main.HomePage

Yes.

> Pmwiki will handle the conversion of the "complex URL" back into the 
> simple format we will see in the browser address bar.

Well, sort of - PmWiki just writes simple URLs when it emits its pages.
After that, no further conversion is needed - the browser picks the 
simple URLs off the HTML pages and displays them.

The result of the RewriteRule directive is never seen by the browser!

> #
>     Options +FollowSymLinks
> Follow existing symbolic links. (Need more detail here.)

In a .htaccess file, the rewrite engine refuses to run (requests end in
403 Forbidden) if FollowSymLinks isn't set.
The rationale is that URL rewriting is strictly more powerful than 
following symbolic links, so if symbolic links are disallowed, rewriting 
should be even less allowed.

> #
>     RewriteEngine on
> Turn on the rewrite engine.

Yes.
Without that line, RewriteBase, RewriteCond and RewriteRule will be ignored.

> #
>     RewriteBase /~nepherim/pmwiki/
> Strip out this part of the url, and leave us with whatever follows. 
> Thus, from "http://dom.com/~nepherim/pmwiki/Main/HomePage" we now have 
> "Main/HomePage".

No.
I don't properly recall the details, so I have to refer everybody to the 
documentation on http://httpd.apache.org.
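Basically, the rewrite engine is an ugly hack, and this is the place
where that hackishness surfaces. IIRC, in a .htaccess file Apache strips
the directory's prefix from the URL path before the rules see it (with or
without RewriteBase), and RewriteBase tells the rewrite engine which URL
prefix to put back in front of a relative substitution such as
pmwiki.php?n=$1.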

> #
>     RewriteCond %{QUERY_STRING} ^$
> Something to enable searching. What I'm not sure of is what condition 
> needs to be satisfied to execute the following rewrite.

Nonono.

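QUERY_STRING is the part of the URL after the ?.
E.g. in .../pmwiki.php?action=edit the query string is action=edit.
The pattern ^$ matches only the empty string, so the condition holds only
when the URL has no query string at all, and only then does the
RewriteRule that follows it fire.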
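
A quick way to see which URLs pass that condition, using Python's re module as a stand-in for Apache's regex engine (they agree on ^$ here); query_condition_holds is a hypothetical helper name.

```python
import re
from urllib.parse import urlsplit


def query_condition_holds(url: str) -> bool:
    """Mimic RewriteCond %{QUERY_STRING} ^$ for a full URL (hypothetical helper)."""
    query = urlsplit(url).query          # the part after '?', without the '?'
    return re.search(r"^$", query) is not None


for url in ("http://dom.com/~nepherim/pmwiki/",
            "http://dom.com/~nepherim/pmwiki/?action=search"):
    print(url, query_condition_holds(url))
```

The first URL passes the condition and the second does not, so only the first can trigger the redirect rule that follows the RewriteCond.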

> (RewriteCond is basically an IF statement -- if the condition evaluates 
> to true (AND RewriteCond's immediately following evaluate to true) then 
> execute the next RewriteRule directive.)
> 
> #
>     RewriteRule ^/?$ ~nepherim/pmwiki/Main/HomePage/ [R=permanent,QSA,L]
> Alter URLs with a trailing "/" or with no trailing "/" to the HomePage. 
> Thus, user entered URLs of "~nepherim/pmwiki" or "~nepherim/pmwiki/" 
> translate to "~nepherim/pmwiki/Main/HomePage/".
> 
> R=permanent: tell the browser that this redirect is permanent. Default 
> is Temporary.

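No. Without any R flag, the default is "internal", i.e. Apache immediately
takes the newly generated URL and serves whatever is behind *that*, and the
browser never learns about it. (A bare R without a status code does send a
temporary 302 redirect.)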

> QSA: Query String Append. If we have queries on the incoming URL, like 
>   "?action=edit" then append them to our new URL.
> 
> L: Stop processing. In this case the next RewriteRule doesn't get 
> processed, and we're done.
> *** Question: I must be misunderstanding this parameter. Why do we stop? 
> If we stop here then we haven't converted to complex format.

Because a redirect goes straight back to the browser: instead of the page,
Apache returns the redirect status (301 for R=permanent) together with the
redirected-to URL, so there is nothing left for later rules to do.

The browser is then expected to display the new URL in the URL line and 
request it.

IOW, anybody who requests the directory itself is automatically
redirected to ~nepherim/pmwiki/Main/HomePage/.
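
A sketch of what the RewriteCond plus the first RewriteRule do to one request, in Python. Simplifying assumptions: the input path already has the wiki directory prefix removed (as in .htaccess context), and the redirect target is the absolute path the message describes. The names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed target: the page the message says the directory redirects to.
HOME_PATH = "/~nepherim/pmwiki/Main/HomePage/"


def directory_redirect(rel_path: str, query: str) -> Optional[Tuple[int, str]]:
    """Sketch of RewriteCond %{QUERY_STRING} ^$ + RewriteRule ^/?$ ... [R=permanent,QSA,L].

    rel_path is the URL path with the wiki directory prefix already removed,
    which is what a rule in .htaccess gets to see. Returns (status, new URL)
    for the redirect, or None when the rule does not fire.
    """
    if query != "":                        # the RewriteCond fails
        return None
    if re.fullmatch(r"/?", rel_path) is None:
        return None                        # not a request for the directory itself
    # R=permanent -> 301. QSA would append the old query string, but the
    # condition above already guarantees it is empty. L: no further rules run.
    return (301, HOME_PATH)


print(directory_redirect("", ""))   # → (301, '/~nepherim/pmwiki/Main/HomePage/')
```

Without the R flag the same match would be an internal rewrite: the browser would get the page directly and keep showing the URL it asked for. With R, it gets the 301 and the new URL, then makes a second request, which is why the new URL shows up in the address bar.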

> #
>     RewriteRule ^([^/a-z].*) pmwiki.php?n=$1 [QSA,L]
> Here we're matching anything which is NOT a lowercase letter (a-z) 
> followed by anything else. What this means in practice is that we're 
> finding the pmwiki group and page name, since pmwiki groups always start 
> with an uppercase character.

Actually with any character that is neither a lowercase ASCII letter nor
a slash.

That includes special characters like $&=, umlauts, digits, or whatever.
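
The character class is easy to try out. Below, Python's re stands in for Apache's regex engine (for this simple pattern they behave the same), and rewrite_page_path is a hypothetical name.

```python
import re
from typing import Optional

# The pattern from: RewriteRule ^([^/a-z].*) pmwiki.php?n=$1 [QSA,L]
PAGE_PATTERN = re.compile(r"^([^/a-z].*)")


def rewrite_page_path(rel_path: str) -> Optional[str]:
    """Return the internal rewrite target, or None if the rule leaves the path alone.

    rel_path is the path relative to the wiki directory (e.g. "Main/HomePage").
    """
    m = PAGE_PATTERN.match(rel_path)
    if m is None:
        return None
    return "pmwiki.php?n=" + m.group(1)    # $1 is the whole matched path


for p in ("Main/HomePage", "Äpfel/Seite", "2006/Notes", "pub/skins/x.css", "pmwiki.php"):
    print(p, "->", rewrite_page_path(p))
```

One side effect worth noting: the rewritten target pmwiki.php itself starts with a lowercase letter, so if Apache runs the .htaccess rules again on the rewritten request (as it generally does in per-directory context), this rule does not fire a second time.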

> (Differentiating between upper and lower case also prevents the 
> processing of any internal pmwiki paths that are being used to create 
> the page, like pub, upload, etc.)

Yes.

> The first non-lowercase character we find in "Main/HomePage" is the 
> first "M", so we take that and everything after: "Main/HomePage".

Yes.

> This string is referenced by "$1". So, the Rewrite Rule replaces our 
> "Main/HomePage" with "pmwiki.php?n=Main.HomePage".

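Yes, with one detail: $1 keeps the slash, so the substitution actually
yields pmwiki.php?n=Main/HomePage. PmWiki accepts a slash between group
and page name as well as a dot.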

> As we are done with the script, the Base part of the URL 
> ("/~nepherim/pmwiki/") is now added back to what we created,

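That's the step that RewriteBase controls: the substitution pmwiki.php?n=$1
is a relative path, so the RewriteBase value is put in front of it.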

> so we end up with the complex URL PHP and pmwiki needs:
>     /~nepherim/pmwiki/pmwiki.php?n=Main.HomePage

Yes.
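
Putting the pieces together, a rough end-to-end sketch of the rules for the example setup. It is a simplification under stated assumptions: the wiki directory is /~nepherim/pmwiki/, the redirect target is the one the message describes, and mod_dir's trailing-slash redirect and Apache's re-run of the rules are ignored. All names are hypothetical.

```python
import re
from typing import Tuple

REWRITE_BASE = "/~nepherim/pmwiki/"   # value of the RewriteBase line


def apply_htaccess_rules(url_path: str, query: str) -> Tuple[str, str]:
    """Return ("redirect", new URL), ("internal", new URL) or ("unchanged", url_path)."""
    if not url_path.startswith(REWRITE_BASE):
        raise ValueError("request is outside the wiki directory")
    # In .htaccess context the rules only see the path below the directory.
    rel = url_path[len(REWRITE_BASE):]

    # RewriteCond %{QUERY_STRING} ^$
    # RewriteRule ^/?$ .../Main/HomePage/ [R=permanent,QSA,L]
    # (assumed target: the absolute path the message describes)
    if query == "" and re.fullmatch(r"/?", rel):
        return ("redirect", REWRITE_BASE + "Main/HomePage/")

    # RewriteRule ^([^/a-z].*) pmwiki.php?n=$1 [QSA,L]
    m = re.match(r"([^/a-z].*)", rel)
    if m:
        new_query = "n=" + m.group(1)
        if query:                          # QSA: keep the original query string
            new_query += "&" + query
        # The substitution is relative, so RewriteBase goes back in front.
        return ("internal", REWRITE_BASE + "pmwiki.php?" + new_query)

    return ("unchanged", url_path)         # e.g. pub/ files are served as-is


print(apply_htaccess_rules("/~nepherim/pmwiki/Main/HomePage", "action=edit"))
# → ('internal', '/~nepherim/pmwiki/pmwiki.php?n=Main/HomePage&action=edit')
```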

HTH :-)

Regards,
Jo



