[pmwiki-users] Search for terms with ss and ß

Hans Bracker design at softflow.uk
Fri Feb 3 06:29:47 PST 2023


> utf8fold() uses the $StringFolding array which defines "ß" ("\xc3\x9f") as "ss".

> Normally you should use the global $StrFoldFunction(terms) to fold your search terms - this ensures you use the same function as the one that is used when storing the page index data.

> I recently wrote a function that replaces accented letters with plain ones, so you can search for "voilà" or "voila" and it will find both (also in Cyrillic).

Ah, I understand, thank you! Looking forward to seeing your function!

$StrFoldFunction($terms) will not do in my case, as I don't wish to have all terms converted to lower case.
Sometimes one wants a case-sensitive search, other times not. TextExtract has an option for that.
What I have come up with so far is this, aiming for a start just at that old German 'ß':

1. Look through the search terms and see if any characters need converting, then add the converted term to the search terms (so that we search for both):

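   if (isset($opt[''])) {
      # foreach works on the array by value, so terms appended here are not revisited
      foreach ($opt[''] as $v) {
         if (strpos($v, "\xc3\x9f") !== false)   # "\xc3\x9f" is 'ß' in UTF-8
            $opt[''][] = str_replace("\xc3\x9f", 'ss', $v);
      }
   }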

2. On top of that, change the derived regex search pattern so that it matches 'ß' and 'ss' interchangeably:

   $pat = preg_replace("/\xc3\x9f|ss/","(\xc3\x9f|ss)", $pat);

Between steps 1 and 2, the regex pattern variable $pat is set up according to the user options given.

I needed both steps to get reasonable results in TextExtract, with correct highlighting of terms containing either 'ß' or 'ss'.
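
For illustration, steps 1 and 2 above can be tried outside of PmWiki with a small standalone sketch. The function names expandEszettTerms() and eszettAlternation() are hypothetical and not part of PmWiki or TextExtract, and the real $pat is built from user options rather than from a single term as here. The sketch also uses a non-capturing group (?:...) instead of the capturing group shown in step 2, on the assumption that existing group numbers in $pat should stay unchanged.

```php
<?php
# Hypothetical helpers for illustration only; not PmWiki or TextExtract API.

# Step 1: for every term containing 'ß' ("\xc3\x9f" in UTF-8),
# also add a variant with 'ss' so both spellings are searched for.
function expandEszettTerms(array $terms): array {
    $out = $terms;
    foreach ($terms as $t) {
        if (strpos($t, "\xc3\x9f") !== false)
            $out[] = str_replace("\xc3\x9f", 'ss', $t);
    }
    return $out;
}

# Step 2: make 'ß' and 'ss' interchangeable inside a regex pattern body.
function eszettAlternation(string $pat): string {
    return preg_replace("/\xc3\x9f|ss/", "(?:\xc3\x9f|ss)", $pat);
}

$terms = expandEszettTerms(["Stra\xc3\x9fe", 'Haus']);
$pat   = '/' . eszettAlternation("Stra\xc3\x9fe") . '/';
```

With this, $pat matches both "Straße" and "Strasse", and the search stays case sensitive because no folding to lower case takes place.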

I imagine this could be expanded so that other characters get alternatives too, but the $StringFolding array won't do. I'll need something rather like the $ISO88591MakePageNamePatterns array, converted to UTF-8,
with entries like
      '/è/' => 'e',   '/é/' => 'e',   '/ê/' => 'e',   '/ë/' => 'e',
(see https://www.pmwiki.org/wiki/Cookbook/ISO8859MakePageNamePatterns for that array).
So I might just do that, or perhaps you have that already?
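
A rough sketch of how such a UTF-8 map might be used for searching without folding case, in case it helps the discussion. Everything here is an assumption for illustration: the $AccentVariants array and the accentInsensitivePattern() function are hypothetical names, the map is far from complete, and the file is assumed to be saved as UTF-8. Instead of replacing accented letters with plain ones, it turns each letter of a term into an alternation of its variants.

```php
<?php
# Hypothetical map: plain letter => accented variants (file saved as UTF-8).
$AccentVariants = [
    'a' => ['à', 'á', 'â', 'ä'],
    'e' => ['è', 'é', 'ê', 'ë'],
];

# Build a regex in which every mapped letter, plain or accented,
# matches any member of its group. Case is left untouched.
function accentInsensitivePattern(string $term, array $variants): string {
    $alt = [];
    foreach ($variants as $plain => $accented) {
        $members = array_merge([$plain], $accented);
        $group = '(?:' . implode('|', $members) . ')';
        foreach ($members as $ch) $alt[$ch] = $group;
    }
    $out = '';
    # Split into UTF-8 characters; assumes $term is valid UTF-8.
    foreach (preg_split('//u', $term, -1, PREG_SPLIT_NO_EMPTY) as $ch)
        $out .= $alt[$ch] ?? preg_quote($ch, '/');
    return '/' . $out . '/u';
}

$pat = accentInsensitivePattern('voila', $AccentVariants);
```

Here $pat matches both "voila" and "voilà", and a term typed as "voilà" yields the same pattern, while "Voila" is not matched because case is preserved.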

cheers,
Hans



