[pmwiki-users] Search for terms with ss and ß

Hans Bracker design at softflow.uk
Fri Feb 3 23:51:30 PST 2023


Hello Petko,

Friday, February 3, 2023, 3:22:00 PM, you wrote:

>    https://www.pmwiki.org/wiki/Cookbook/UnaccentUTF8

> Not sure if it will be enough for you as it also folds to lowercase. But you can copy this function and adapt it. Maybe simply remove ":: Lower();" from the argument, or review the documentation for the Intl/Transliterator class at php.net.

Thanks, I tried it out as you described, and also as a customisation for TextExtract.
I think one needs to be very careful when using it.
For German, used as it is, it will give many false positives in search results.
Word pairs like Bär and Bar, Blüten and bluten, Fähre and fahre, möchte and mochte are treated as the same, but have totally different meanings. So I would not recommend this recipe for German-language sites. I can imagine that other languages using UTF-8 could have similar problems.
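
To illustrate the collision, here is a minimal sketch in Python rather than the recipe's PHP. It is not the UnaccentUTF8 code itself: the helper name strip_accents is made up for this example, and it assumes that blanket unaccenting amounts to decomposing each character and dropping the combining marks, followed by lowercasing.

```python
import unicodedata


def strip_accents(text):
    """Drop every combining mark after canonical decomposition (NFD).

    Hypothetical stand-in for a blanket "unaccent" step; not the recipe's code.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The German word pairs mentioned above: different words, different meanings.
pairs = [
    ("Bär", "Bar"),
    ("Blüten", "bluten"),
    ("Fähre", "fahre"),
    ("möchte", "mochte"),
]

for accented, plain in pairs:
    folded = strip_accents(accented).lower()
    # After folding, the umlaut word is indistinguishable from the plain one.
    print(accented, "->", folded, "| same as", plain, ":", folded == plain.lower())
```

Every pair folds to the same string, so a search for one word would also return pages that only contain the other.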

As for my TextExtract search for terms with ss and ß:
I think it may be better if I offer a customisation with a custom array of substitutes.
That could then also offer substitutes for accented characters, like those used in Romance languages, but not for ä, ö, ü and others, where substitution would lead to too many false positives.
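
A rough sketch of what such a substitute array could look like, written in Python for illustration only. This is not TextExtract code: the names SUBSTITUTES, normalize and matches are hypothetical, and the choice of characters in the map is just an example. The point is that ß and a few accented letters are folded, while ä, ö and ü are deliberately left out.

```python
import unicodedata

# Hypothetical substitute map: ß and some accented letters are folded,
# while ä, ö, ü are left alone on purpose to avoid false positives.
SUBSTITUTES = {
    "ß": "ss",
    "é": "e", "è": "e", "ê": "e",
    "à": "a", "â": "a",
    "ç": "c",
    "ñ": "n",
}


def normalize(text, substitutes=SUBSTITUTES):
    """Apply only the listed substitutions; everything else stays as it is."""
    # Compose first, so that "e" + combining accent is seen as the single "é".
    text = unicodedata.normalize("NFC", text)
    for source, target in substitutes.items():
        text = text.replace(source, target)
    return text


def matches(term, text):
    """Case-sensitive substring match on the normalized term and text."""
    return normalize(term) in normalize(text)


print(matches("Straße", "Die Strasse ist lang"))   # ß and ss are treated alike
print(matches("Strasse", "Die Straße ist lang"))   # and the other way round
print(matches("Cafe", "Das Café ist offen"))       # é is folded to e
print(matches("Bar", "Der Bär schläft"))           # ä is not folded, no match
```

The first three calls report a match and the last one does not, because ä has no entry in the map.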


cheers,
Hans



