[pmwiki-users] Search for terms with ss and ß

Petko Yotov 5ko at 5ko.fr
Wed Feb 8 05:01:04 PST 2023


On 08/02/2023 13:32, Dominique Faure wrote:
> In order to minimize the reference sources, what would be the best
> way to use that in ISO8859MakePageNamePatterns cookbook recipe instead
> of relying on another large set of regexp replacements?
> 
> Something like below?
> -----
>   function cb_unaccent($m) { return UnaccentUTF8($m[1]); }
> 
>   # standard patterns from pmwiki.php
>   SDV($PageNameChars, '-[:alnum:]');
>   SDV($MakePageNamePatterns, array(
>       "/'/" => '',                          # strip single-quotes
>       "/[^$PageNameChars]+/" => ' ',        # convert everything else to space
>       '/((^|[^-\\w])\\w)/' => 'cb_toupper', # CamelCase
>       '/ /' => '',                          # drop spaces
>       '/(.*)/' => 'cb_unaccent'));
> -----

So this is about having page names like "ChampsElysees" without 
diacritics, while links are written like "[[Champs Élysées]]".

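The above may fail to capitalize some words starting with an accented 
character, or may simply discard that character: cb_unaccent() runs last, 
so the earlier patterns still see the accented letters, which [:alnum:] 
and \w may not match without UTF-8 support.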
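To see why the order matters, here is a small sketch that runs the quoted patterns in plain PHP, outside PmWiki. It assumes PHP 8 with its default "C" locale, so without the /u modifier [:alnum:] and \w match only ASCII bytes. demo_unaccent() is a hypothetical stand-in for UnaccentUTF8() with a tiny character map, and apply_patterns() only approximates how PmWiki applies callback replacements.

```php
<?php
// Hypothetical stand-in for UnaccentUTF8(): a tiny map, for illustration only.
function demo_unaccent(string $s): string {
    return strtr($s, ['É' => 'E', 'é' => 'e', 'È' => 'E', 'è' => 'e']);
}
function cb_toupper(array $m): string { return strtoupper($m[1]); }
function cb_unaccent(array $m): string { return demo_unaccent($m[1]); }

// Apply the patterns in array order; callable replacements go through
// preg_replace_callback(). An approximation, not PmWiki's own code.
function apply_patterns(array $patterns, string $name): string {
    foreach ($patterns as $re => $rep) {
        $name = is_callable($rep)
            ? preg_replace_callback($re, $rep, $name)
            : preg_replace($re, $rep, $name);
    }
    return $name;
}

$PageNameChars = '-[:alnum:]';
$patterns = [
    "/'/" => '',                          // strip single-quotes
    "/[^$PageNameChars]+/" => ' ',        // convert everything else to space
    '/((^|[^-\\w])\\w)/' => 'cb_toupper', // CamelCase
    '/ /' => '',                          // drop spaces
    '/(.*)/' => 'cb_unaccent',            // unaccent last, as in the quoted recipe
];

echo apply_patterns($patterns, 'champs élysées'), "\n";
```

With the input "champs élysées" this prints "ChampsLysEs": the multibyte é sequences are turned into spaces by the [^-[:alnum:]]+ pattern before cb_unaccent() ever sees them.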

I see that in scripts/xlpage-utf-8.php, $MakePageNamePatterns removes any 
text after a question mark (?) or a hash (#). I have no idea when this may 
happen, but I suppose it was needed at some point.

I would probably do something like this:

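   include_once("scripts/xlpage-utf-8.php");
   # prepend the unaccenting step (cb_unaccent() as defined above)
   # so that it runs before the UTF-8 page name patterns
   $MakePageNamePatterns =
     array_merge(['/^(.*)$/'=>'cb_unaccent'], $MakePageNamePatterns);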

This should call cb_unaccent() first and remove all accents; from 
then on, it should be business as usual.
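The effect of prepending can be sketched in plain PHP as well. The patterns below are the standard ones quoted earlier, used as a stand-in for the real ones from scripts/xlpage-utf-8.php (not reproduced here). demo_unaccent() is a hypothetical placeholder for UnaccentUTF8(), and apply_patterns() only approximates how PmWiki applies callback replacements.

```php
<?php
// Hypothetical stand-in for UnaccentUTF8(): a tiny map, for illustration only.
function demo_unaccent(string $s): string {
    return strtr($s, ['É' => 'E', 'é' => 'e', 'È' => 'E', 'è' => 'e']);
}
function cb_toupper(array $m): string { return strtoupper($m[1]); }
function cb_unaccent(array $m): string { return demo_unaccent($m[1]); }

// Apply the patterns in array order (approximation of PmWiki's handling).
function apply_patterns(array $patterns, string $name): string {
    foreach ($patterns as $re => $rep) {
        $name = is_callable($rep)
            ? preg_replace_callback($re, $rep, $name)
            : preg_replace($re, $rep, $name);
    }
    return $name;
}

// Stand-in for the patterns set by scripts/xlpage-utf-8.php.
$MakePageNamePatterns = [
    "/'/" => '',                          // strip single-quotes
    "/[^-[:alnum:]]+/" => ' ',            // convert everything else to space
    '/((^|[^-\\w])\\w)/' => 'cb_toupper', // CamelCase
    '/ /' => '',                          // drop spaces
];
// Prepend the unaccenting step; the anchored pattern matches the whole name once.
$MakePageNamePatterns =
    array_merge(['/^(.*)$/' => 'cb_unaccent'], $MakePageNamePatterns);

echo apply_patterns($MakePageNamePatterns, 'Champs Élysées'), "\n";
```

This prints "ChampsElysees". array_merge() keeps string keys and places the keys of its first argument first, which is what makes the prepended pattern run before the others.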

Petko
