[Pmwiki-users] $UrlPathPattern defined/commented

Patrick R. Michaud pmichaud
Fri May 21 16:27:54 CDT 2004


On Fri, May 21, 2004 at 11:09:11PM +0200, Christian Ridderström wrote:
> What exactly do we want to allow in a URI? In the source I find
> 	$UrlPathPattern="[^\\s<>[\\]\"\'()`|^]*[^\\s<>[\\]\"\'()`|^,.?]";
> I think some comments next to this definition would be nice, or perhaps a
> reference to a wiki page where it's discussed. 

I'll write it here if someone can cut-n-paste to an appropriate place
on pmwiki.org:

RFC2396 and RFC2732 (on uri syntax) basically say that a proper uri 
must not contain control characters, spaces, or any of the characters  
   <   >   "   {   }   |   \   ^   `
All other characters can appear in a uri, although many have special
meanings depending on where they are used in the uri.
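
As a rough illustration of the rule just summarized, here is a minimal Python sketch that reports which characters in a string fall into the excluded set. The function name `rfc_disallowed` and the example URLs are made up for this note; the character set is only the summary given above, not a full RFC 2396 validator.

```python
# Characters listed above as excluded from a uri, plus the space character.
RFC_EXCLUDED = set('<>"{}|\\^` ')

def rfc_disallowed(uri):
    """Return the characters of `uri` that the summary above rules out."""
    return [
        ch for ch in uri
        # Control characters are U+0000..U+001F and U+007F (DEL).
        if ch in RFC_EXCLUDED or ord(ch) < 0x20 or ord(ch) == 0x7F
    ]

print(rfc_disallowed("http://example.com/a b"))   # contains a space
print(rfc_disallowed("http://example.com/{x}"))   # contains braces
print(rfc_disallowed("http://example.com/(x)"))   # parens are allowed by the RFCs
```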

PmWiki's $UrlPathPattern syntax largely follows the RFCs, but also
takes into consideration the contexts in which uris are likely to
appear in markup.  The pattern breaks into two parts: the first
part matches everything before the last character of the uri, and
the second part matches the last character of the uri:

      [^\\s<>[\\]\"\'()`|^]*         [^\\s<>[\\]\"\'()`|^,.?]

In both parts, space, "<", ">", <">, "`", "|", and "^" are
disallowed because of the RFC definition.  PmWiki incorrectly 
allows "{", "}", and "\", but this hasn't been an issue in 
practice and can be easily fixed if we want.
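
To make the "{", "}", and "\" point concrete, here is a small Python sketch. It is a transliteration of the PHP pattern into Python's `re` syntax (the "[" inside the class is escaped, which does not change what it matches). The `STRICT` variant is only one hypothetical way the fix could look, not something that exists in PmWiki, and the sample path is invented.

```python
import re

# Body of the character class shared by both parts of $UrlPathPattern.
EXCLUDED = r"""\s<>\[\]"'()`|^"""
CURRENT = re.compile("[^" + EXCLUDED + "]*[^" + EXCLUDED + ",.?]")

# Hypothetical stricter variant: additionally exclude "{", "}", and "\".
STRICT_EXCLUDED = EXCLUDED + r"{}\\"
STRICT = re.compile("[^" + STRICT_EXCLUDED + "]*[^" + STRICT_EXCLUDED + ",.?]")

sample = r"//example.com/a{b}\c"

# The current pattern accepts the whole string, braces and backslash included.
print(CURRENT.fullmatch(sample) is not None)

# The stricter variant stops at the first brace.
print(STRICT.fullmatch(sample) is not None)
print(STRICT.match(sample).group())
```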

Both parts also disallow things that the RFCs allow, such as 
parens, square brackets, and single quotes, under the theory
that these are more likely to be markup than part of a uri.

Finally, the second part of the pattern is used to prevent
a trailing period, comma, or question mark from being included
in the uri, since these will usually be the end of a sentence
or phrase rather than the last character of a uri.
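
Here is a short Python sketch of how the two parts interact on ordinary prose. The pattern is a transliteration of the PHP one into Python's `re` syntax. The `https?:` prefix and the `find_urls` helper are assumptions made for this example only; PmWiki combines $UrlPathPattern with its own list of schemes.

```python
import re

# Body of the character class shared by both parts of $UrlPathPattern.
EXCLUDED = r"""\s<>\[\]"'()`|^"""
# First part: any run of allowed characters.
# Second part: one allowed character that is not "," "." or "?".
URL_PATH_PATTERN = "[^" + EXCLUDED + "]*[^" + EXCLUDED + ",.?]"

# Assumed scheme prefix, for illustration only.
URL_RE = re.compile(r"https?:" + URL_PATH_PATTERN)

def find_urls(text):
    """Return every substring of `text` that the sketch treats as a uri."""
    return URL_RE.findall(text)

text = "See http://example.com/a?b=1. Then (http://example.com/x), ok?"
print(find_urls(text))
```

The "?" in the middle of the first uri is kept, since only the last character is restricted. The sentence-ending period and the closing paren with its comma are left out of the match.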

Pm


