[pmwiki-users] Adding markup for Wikiwords for technical documentation

Thu Jun 9 15:39:40 CDT 2005

Patrick R. Michaud wrote:

> On Thu, Jun 09, 2005 at 08:19:00PM +0200, Joachim Durchholz wrote:
> 
>>>Markup('myCustomWikiWord2','>myCustomWikiWord1',"/[A-Z][A-Z0-9]{1,}/e",
>>>"Keep(WikiLink(\$pagename,'$0'),'L')");
>>
>>OK, so you don't want to wikify a single upper-case letter.
>>
>>Um... maybe you should add a lookbehind and a lookahead assertion to 
>>prevent lowercase letters before and after the string. I.e. with the 
>>above, you'll have SD wikified in aSDf.
> 
> [...] Also, lookahead/lookbehind isn't needed here--just use \b to force the
> pattern onto a word boundary.  ($WikiWordPattern has the \b built-in.)
> 
> [...] And let's not forget about \w, which
> is a letter, digit, or underscore -- much easier to write.
> 
>   $WikiWordPattern =
>   "[[:upper:]]\\w*[[:upper:]0-9]\\w*|[[:lower:]]\\w*[[:upper:]]\\w*";

The definitions of \b, \w, [[:upper:]], and [[:lower:]] are 
locale-dependent: e.g. umlauts are "word characters" if the locale 
setting of your PHP environment says so.

Normally, this is exactly what one wants. If PmWiki is installed in an 
environment with a different locale, it automagically adapts to the 
local conditions - which, most likely, are exactly what the admin wanted.

In this case, though, we're talking about programming language symbols, 
where "lowercase letter" usually means [a-z] and nothing else, and 
"uppercase letter" likewise means [A-Z] and nothing else.

Note that my lookbehind and lookahead assertions didn't handle locale 
correctly either. Using a word boundary (\b) is probably better than 
testing for [^a-zA-Z], unless people really want a wikified symbol to 
start right after the ä in äasdf. The regular expressions that test the 
names themselves, however, should use explicit enumerations such as 
[A-Z], not [[:upper:]] etc.
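
A minimal standalone sketch of the two boundary options, outside of 
PmWiki. The constant and function names (SYMBOL_WORD_BOUNDARY, 
SYMBOL_ASCII_BOUNDARY, find_symbols) are made up for illustration and 
are not part of PmWiki. It assumes PHP 7 or later and uses plain 
preg_match_all() rather than Markup(). Both variants spell out the name 
itself as [A-Z][A-Z0-9]+; they differ only in how the boundary is 
tested.

```php
<?php
// Hypothetical names for illustration only; nothing here is PmWiki API.

// Variant 1: explicit ASCII enumeration for the name, \b for the boundary.
// What counts as a boundary follows the engine's idea of a word character.
const SYMBOL_WORD_BOUNDARY = '/\b[A-Z][A-Z0-9]+\b/';

// Variant 2: explicit ASCII enumeration for the boundary as well.
// Any byte outside [A-Za-z0-9_] counts as a boundary, including umlauts.
const SYMBOL_ASCII_BOUNDARY = '/(?<![A-Za-z0-9_])[A-Z][A-Z0-9]+(?![A-Za-z0-9_])/';

// Return every substring of $text that matches $pattern.
function find_symbols(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'aSDf uses SD and X11 but not X';

// On pure ASCII input both variants agree: SD inside aSDf is skipped,
// and a single upper-case letter is not wikified.
echo implode(',', find_symbols(SYMBOL_WORD_BOUNDARY, $text)), "\n";  // SD,X11
echo implode(',', find_symbols(SYMBOL_ASCII_BOUNDARY, $text)), "\n"; // SD,X11

// "\xC3\xA4" is the UTF-8 encoding of ä. Variant 2 does not treat it as
// part of a word, so the symbol starts right after the umlaut.
echo implode(',', find_symbols(SYMBOL_ASCII_BOUNDARY, "\xC3\xA4SD")), "\n"; // SD
```

What variant 1 does with the umlaut input depends on the locale and 
encoding in effect, which is exactly the trade-off described above.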

Regards,
Jo