[pmwiki-users] Search for terms with ss and ß
Hans Bracker
design at softflow.uk
Fri Feb 3 06:29:47 PST 2023
> utf8fold() uses the $StringFolding array which defines "ß" ("\xc3\x9f") as "ss".
> Normally you should use the global $StrFoldFunction(terms) to fold your search terms - this ensures you use the same function as the one that is used when storing the page index data.
> I recently wrote a function that replaces accented letters with plain ones, so you can search for "voilà" or "voila" and it will find both (also in Cyrillic).
Ah, I understand, thank you! Looking forward to seeing your function!
$StrFoldFunction($terms) will not do in my case, as I don't wish to have all terms converted to lower case.
Sometimes one wants a case-sensitive search, other times not; TextExtract has an option for that.
What I have come up with so far is this, aimed, for a start, only at that old German 'ß':
1. Look through the search terms to see whether any characters need converting, and add each converted term to the search terms (so we look for both spellings):
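if (isset($opt[''])) {
  // foreach works on a copy of the array, so appending inside the loop is safe
  foreach ($opt[''] as $v) {
    // if the term contains 'ß' (UTF-8 bytes \xc3\x9f), also search for its 'ss' spelling
    if (preg_match("/\xc3\x9f/", $v))
      $opt[''][] = preg_replace("/\xc3\x9f/", 'ss', $v);
  }
}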
2. On top of that, change the derived regex search pattern itself, so that it matches 'ß' and 'ss' interchangeably:
// (?:...) is a non-capturing group, so it adds no extra capture to the pattern
$pat = preg_replace("/\xc3\x9f|ss/", "(?:\xc3\x9f|ss)", $pat);
Between steps 1 and 2, the regex pattern variable $pat is built according to the options the user has given.
I needed both steps to get reasonable results in TextExtract, with correct highlighting of terms containing either 'ß' or 'ss'.
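A minimal, self-contained PHP sketch of the two steps, assuming the terms arrive as an array of UTF-8 strings (as in $opt['']) and that the search pattern is simply the quoted terms joined with '|'. The names expand_sharp_s(), build_pattern() and highlight() are hypothetical, not PmWiki or TextExtract functions; the $caseSensitive flag only stands in for TextExtract's own option, and the <mark> markup is just one way to show highlighting.

```php
<?php
// Sketch only: expand_sharp_s(), build_pattern() and highlight() are
// hypothetical helpers, not PmWiki or TextExtract functions.

// Step 1: for every term containing 'ß', also search for its 'ss' spelling.
function expand_sharp_s(array $terms): array {
    foreach ($terms as $t) {  // iterates over a copy, so appending is safe
        if (strpos($t, "\u{00DF}") !== false) {
            $terms[] = str_replace("\u{00DF}", 'ss', $t);
        }
    }
    return array_values(array_unique($terms));
}

// Build the search regex from literal terms, then (step 2) make 'ß' and
// 'ss' interchangeable wherever either appears in the pattern.
function build_pattern(array $terms, bool $caseSensitive): string {
    $quoted = array_map(fn($t) => preg_quote($t, '/'), $terms);
    $pat = implode('|', $quoted);
    // (?:...) is a non-capturing group, so no capture numbers shift.
    $pat = preg_replace("/\u{00DF}|ss/u", "(?:\u{00DF}|ss)", $pat);
    return '/' . $pat . '/u' . ($caseSensitive ? '' : 'i');
}

// Wrap each match in <mark> tags; $0 is the whole match.
function highlight(string $text, string $pattern): string {
    return preg_replace($pattern, '<mark>$0</mark>', $text);
}

$pattern = build_pattern(expand_sharp_s(['Straße']), false);
echo highlight('Die Straße und die Strasse', $pattern), "\n";
// prints: Die <mark>Straße</mark> und die <mark>Strasse</mark>
```

Using a non-capturing group rather than a plain (...) keeps the rewrite from adding a capture, which could otherwise shift any numbered groups the highlighting code might rely on.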
I imagine this could be extended to other characters with alternative spellings, but the $StringFolding array won't do for that. I'll need something more like the $ISO88591MakePageNamePatterns array, converted to UTF-8,
with entries like
'/è/' => 'e', '/é/' => 'e', '/ê/' => 'e', '/ë/' => 'e',
(see https://www.pmwiki.org/wiki/Cookbook/ISO8859MakePageNamePatterns for that array).
So I might just do that, or perhaps you have that already?
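A sketch of how such a map could generalize the 'ß' handling, assuming a plain UTF-8 array (accented letter => plain form) rather than the '/è/' regex keys of the cookbook array. FOLD_MAP, equivalence_classes() and term_to_regex() are hypothetical names, and the map holds only a few lowercase entries for illustration. It works on each literal term before the terms are joined into the search pattern, so regex syntax such as escapes is never rewritten.

```php
<?php
// Hypothetical excerpt of a UTF-8 map in the spirit of the cookbook's
// $ISO88591MakePageNamePatterns array (plain keys, not '/è/' regexes);
// only a few entries are shown.
const FOLD_MAP = [
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'à' => 'a', 'â' => 'a',
    'ß' => 'ss',
];

// Group spellings by plain form, e.g. 'ss' => ['ss', 'ß'].
function equivalence_classes(array $map): array {
    $classes = [];
    foreach ($map as $accented => $plain) {
        $classes[$plain] ??= [$plain];
        $classes[$plain][] = $accented;
    }
    return $classes;
}

// Turn one literal search term into a regex fragment in which each spelling
// matches all others in its class; all other characters are quoted literally.
function term_to_regex(string $term, array $map): string {
    $group = [];  // spelling => "(?:a|b|...)"
    foreach (equivalence_classes($map) as $members) {
        $quoted = array_map(fn($m) => preg_quote($m, '/'), $members);
        $alt = '(?:' . implode('|', $quoted) . ')';
        foreach ($members as $m) {
            $group[$m] = $alt;
        }
    }
    // Try longer spellings first, so 'ss' wins over single characters.
    $keys = array_keys($group);
    usort($keys, fn($a, $b) => strlen($b) <=> strlen($a));
    $alts = implode('|', array_map(fn($k) => preg_quote($k, '/'), $keys));
    return preg_replace_callback(
        '/' . $alts . '|./su',
        fn($m) => $group[$m[0]] ?? preg_quote($m[0], '/'),
        $term
    );
}

echo '/' . term_to_regex('voilà', FOLD_MAP) . '/u', "\n";
// prints: /voil(?:a|à|â)/u
```

The same fragment matches both "voila" and "voilà"; an i modifier can be appended when a case-insensitive search is wanted, so folding and case handling stay independent.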
cheers,
Hans