[pmwiki-users] Group and page name aren't recognized by pmwiki.php

Joachim Durchholz jo at durchholz.org
Tue Mar 22 15:59:15 CST 2005


Hi all,

I'm having serious trouble getting the "nice URL" recipes from the 
Cookbook to work. I'm not sure where exactly the problem is; I suspect 
a misunderstanding on my side, a PmWiki bug, or both.

The ultimate goal: all pages should be reachable under the URL 
http://www.maquaris.de/Group/Page. (This implies that a "pub" group 
cannot be created, because PmWiki accesses skins and other resources 
under http://www.maquaris.de/pub. That's not a serious problem since 
PmWiki uppercases the first letter of any group name, but if there were 
a name clash, I'd simply disallow the "pub" group.)

Since I need the short URLs to be displayed in the browser's location 
bar, Redirect is out. Alias doesn't work because it insists on finding 
.../Group/Pagename as a file or directory, and these don't exist, so 
Alias is out, too. Leaves me with RewriteRule.

The .htaccess file looks like this:

   RewriteEngine on
   RewriteCond %{REQUEST_URI} !^/*(pmwiki\.php|pub)/+.*$
   RewriteRule ^(.*)$ pmwiki.php/$1 [L,NS]

The first line is just to activate the other Rewrite* directives.

The RewriteCond directive is there to exempt all URLs that start with 
pmwiki.php or pub from rewriting. I was unsure about the actual number 
and position of slashes, and users might get adventurous and add 
extraneous slashes in the URL bar of the browser, so I made it recognize 
zero or more slashes before and one or more slashes after the 
pmwiki.php|pub item. The negation makes sure that the following rule is 
activated only if the pattern does *not* match, i.e. the following rule 
activates iff the URL doesn't already start with pmwiki.php or pub as 
the first path element.
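
As a quick sanity check of the condition described above, here is a small 
Python sketch that applies the same pattern to a few sample request URIs. 
It only models the regular expression, not Apache itself; the function 
name rule_fires and the sample URIs are made up for illustration.

```python
import re

# Same pattern as in the RewriteCond; Apache tests it against %{REQUEST_URI}.
COND = re.compile(r"^/*(pmwiki\.php|pub)/+.*$")

def rule_fires(request_uri: str) -> bool:
    """True when the negated condition holds, i.e. the pattern does NOT match."""
    return COND.search(request_uri) is None

# Hypothetical sample URIs, chosen to exercise the slash handling.
samples = [
    "/News/News",
    "//News/News",
    "/pub/skins/gemini/font-sans.css",
    "/pmwiki.php/News/News",
    "/pmwiki.php",
    "/publicity/Page",
]
for uri in samples:
    print(f"{uri!r:40} rewrite rule fires: {rule_fires(uri)}")
```

In this model, a bare /pmwiki.php (no trailing slash) does not match the 
pattern, because the pattern requires at least one slash after the 
pmwiki.php|pub item, so the rule would fire for it. That case may be 
worth testing on the real server.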

The RewriteRule simply captures the entire URL and prepends pmwiki.php/ 
to it. The [L] flag stops any further rewrite rules from being applied 
in the current pass, and the [NS] flag makes Apache skip the rule for 
internal sub-requests. (These flags are largely paranoia and "keeping 
the stuff compatible with future changes"; I think that neither case 
can actually happen in my current configuration.)
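
The capture-and-prepend step can be modelled in a few lines of Python. 
This is only a sketch under one assumption: in .htaccess context the 
rule sees the path without its leading slash (modelled here by dropping 
exactly one "/"). That assumption was chosen because it reproduces the 
mod_gzip URIs in the logs further down; the function name rewritten_uri 
is made up, and the RewriteCond is deliberately left out.

```python
import re

# Same pattern as in the RewriteRule: capture everything as $1.
RULE = re.compile(r"^(.*)$")

def rewritten_uri(request_path: str) -> str:
    """Model of 'RewriteRule ^(.*)$ pmwiki.php/$1' in .htaccess context."""
    # Assumption: the per-directory prefix (one leading "/") is stripped
    # before the pattern is applied.
    relative = request_path[1:] if request_path.startswith("/") else request_path
    match = RULE.match(relative)
    if match is None:
        return request_path  # rule does not apply, URI stays unchanged
    return "/pmwiki.php/" + match.group(1)

for path in ["/", "//News/News", "/favicon.ico"]:
    print(path, "->", rewritten_uri(path))
```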

(This RewriteCond - RewriteRule construction is necessary because the 
RewriteRule needs the matched pattern for its $1 variable, but a 
negative pattern doesn't give me a variable - it looks only for things 
that are not there. Oh, maybe I could have used $0 to recall the entire 
request URI, giving me
   RewriteRule !^/*(pmwiki\.php|pub)/+.*$ pmwiki.php/$0 [L,NS]
without the need for a RewriteCond - well, something to test tomorrow.)

The Apache log tells me that this all actually works (irrelevant 
information snipped to avoid line wrap):
   mod_gzip: r->uri=[/pmwiki.php/] OK
   "GET / HTTP/1.1" 200 2631
   "GET /pub/skins/gemini/layout-standard.css HTTP/1.1" 304 -
   "GET /pub/skins/gemini/rightbar-narrow.css HTTP/1.1" 304 -
   "GET /pub/skins/gemini/font-sans.css HTTP/1.1" 304 -
   "GET /pub/skins/gemini/blue-color.css HTTP/1.1" 304 -
   "GET /pub/skins/pmwiki/pmwiki-32.gif HTTP/1.1" 304 -
   "GET /pub/skins/gemini/blue70-top.jpg HTTP/1.1" 304 -
The mod_gzip output is interesting only in that it seems to give the 
script URI after rewriting, while the GET request is still logged with 
the script URL before rewriting.
The 304 response codes are "Not Modified" answers: the browser already 
has these files in its cache and the server just confirms that they are 
still current. The graphics are delivered just fine, so I think this is 
unrelated to the problem I'm having.

Now this all works fine, but when I call up any other page, I *still* 
get the wiki's home page!

Here's the Apache log:
   mod_gzip: r->uri=[/pmwiki.php//News/News] OK
   "GET //News/News HTTP/1.1" 200 2634
   Cannot add header information - [due to debug output]
   mod_gzip: r->uri=[/pmwiki.php/] OK
   "GET / HTTP/1.1" 200 2631
   "GET /pub/skins/gemini/layout-standard.css HTTP/1.1" 200 5904
   "GET /pub/skins/gemini/rightbar-narrow.css HTTP/1.1" 200 723
   "GET /pub/skins/gemini/font-sans.css HTTP/1.1" 200 1736
   "GET /pub/skins/gemini/blue-color.css HTTP/1.1" 200 2170
   "GET /pub/skins/gemini/blue70-top.jpg HTTP/1.1" 200 704
   "GET /pub/skins/pmwiki/pmwiki-32.gif HTTP/1.1" 200 1127
   Cannot add header information - [due to debug output]
   mod_gzip: r->uri=[/pmwiki.php/favicon.ico] OK
   "GET /favicon.ico HTTP/1.1" 200 2637

I have added debug output; the code that extracts the URI information 
now looks like this:

$pagename = $_REQUEST['n'];
     echo ("pagename from _REQUEST[n] = $pagename<br>");
if (!$pagename) $pagename = $_REQUEST['pagename'];
     echo ("pagename from _REQUEST[pagename] = $pagename<br>");
if (!$pagename &&
     preg_match('!^'.preg_quote($_SERVER['SCRIPT_NAME'],'!').'/?([^?]*)!',
       $_SERVER['REQUEST_URI'],$match))
   $pagename = urldecode($match[1]);
     $x1 = '!^'.preg_quote($_SERVER['SCRIPT_NAME'],'!').'/?([^?]*)!';
     $x2 = $_SERVER['REQUEST_URI'];
     echo ("pagename from matching $x2 to $x1 = $pagename<br>");
if (preg_match('/[\\x80-\\xbf]/',$pagename))
   $pagename=utf8_decode($pagename);
     echo ("pagename after utf8 decode = $pagename<br>");
$pagename = preg_replace('![^[:alnum:]\\x80-\\xff]+$!','',$pagename);
     echo ("pagename after stripping high bits = $pagename<br>");

(The indented lines, i.e. the "echo" statements and the $x1/$x2 
assignments, are the debug additions.)

Pm, could you take a look at this? I suspect the culprit is the regex 
that tries to find SCRIPT_NAME at the start of REQUEST_URI: the 
assumption underlying that regex seems to have been invalidated by the 
rewriting. However, I don't know enough about that assumption to 
propose a patch, let alone fix the problem. (Besides, I don't know what 
other considerations go into URL parsing - different HTTP servers 
behave differently, so any correction that I might think of would 
probably break on other servers.)
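
To illustrate the suspicion about the regex, here is a rough Python 
re-creation of the matching step from the code above. It is a sketch 
under two assumptions that the post does not confirm: that SCRIPT_NAME 
is "/pmwiki.php", and that REQUEST_URI still holds the URL as typed 
(before rewriting), as the GET lines in the access log suggest. The 
function name pagename_from_uri is made up, unquote_plus only 
approximates PHP's urldecode, and the utf8_decode step is omitted.

```python
import re
from urllib.parse import unquote_plus

def pagename_from_uri(script_name: str, request_uri: str) -> str:
    """Approximate the SCRIPT_NAME/REQUEST_URI step of the PHP code."""
    # PHP: '!^' . preg_quote(SCRIPT_NAME, '!') . '/?([^?]*)!'
    pattern = "^" + re.escape(script_name) + "/?([^?]*)"
    match = re.match(pattern, request_uri)
    pagename = unquote_plus(match.group(1)) if match else ""
    # PHP: strip trailing non-alphanumeric characters
    # ([:alnum:] is approximated by ASCII letters and digits here).
    return re.sub(r"[^0-9A-Za-z\x80-\xff]+$", "", pagename)

# Assumed values; the first mimics a rewritten request, the second a
# direct call of the script.
print(repr(pagename_from_uri("/pmwiki.php", "//News/News")))
print(repr(pagename_from_uri("/pmwiki.php", "/pmwiki.php/News/News")))
```

If those assumptions hold, the regex cannot match a REQUEST_URI that 
does not begin with the script name, the page name stays empty, and 
PmWiki would fall back to its default page, which would fit the 
behaviour described above.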

I have left the server at maquaris.de open, with diagnostics turned on.

Any help would be appreciated.

Regards,
Jo


