[pmwiki-users] Search for terms with ss and ß

Petko Yotov 5ko at 5ko.fr
Thu Feb 2 06:32:08 PST 2023


On 02/02/2023 13:29, Hans wrote:
> I noticed when searching and the query contains a 'ss' or a 'ß' , that
> PmWiki will search for both and deliver the right results, seemingly
> treating 'ss' and 'ß' as equivalent. This is great, but I wonder how
> it is done, as it may be a useful behaviour for TextExtract too.

On pmwiki.org we have enabled UTF-8 and this conversion is done in the 
function utf8fold() in scripts/xlpage-utf-8.php.

This function "folds" letters, i.e. it converts every letter that has a 
lowercase form to that form. The folding is applied to the page terms 
before they are saved in wiki.d/.pageindex, and the search terms are 
folded the same way when someone searches.

utf8fold() uses the $StringFolding array which defines "ß" ("\xc3\x9f") 
as "ss".
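
To illustrate the idea only, here is a minimal sketch of table-based 
folding. It is not the code from scripts/xlpage-utf-8.php: the function 
name SketchFold and its three-entry table are made up for this example, 
and the real $StringFolding array covers far more letters.

```php
<?php
# Hypothetical sketch, NOT PmWiki's utf8fold(): fold a UTF-8 string
# using a small lookup table plus ASCII lowercasing.
function SketchFold($x) {
  # Illustrative table only; keys and values are UTF-8 byte sequences.
  static $table = array(
    "\xc3\x9f" => "ss",        # ß -> ss (the mapping mentioned above)
    "\xc3\x84" => "\xc3\xa4",  # Ä -> ä
    "\xc3\x96" => "\xc3\xb6",  # Ö -> ö
  );
  # strtr() with an array replaces every key found in the string.
  # strtolower() then handles A-Z; this assumes PHP 8.2+ or the
  # default "C" locale, where bytes above 0x7f are left untouched.
  return strtolower(strtr($x, $table));
}
```

With this sketch, SketchFold("Straße") and SketchFold("Strasse") give the 
same string, which is why either spelling matches the other in a search.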

Normally you should use the global $StrFoldFunction(terms) to fold your 
search terms - this ensures you use the same function as the one that is 
used when storing the page index data.
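
In a recipe this could look roughly like the following sketch. The helper 
name MyRecipeFoldTerms is hypothetical, and the fallback to strtolower is 
an assumption added so that the snippet also runs outside PmWiki.

```php
<?php
# Hypothetical recipe helper (not part of PmWiki): fold a list of
# search terms with whatever function the wiki is configured to use.
function MyRecipeFoldTerms($terms) {
  global $StrFoldFunction;
  # Assumption: fall back to strtolower when the variable is unset,
  # e.g. when this file is run on its own.
  $fold = empty($StrFoldFunction) ? 'strtolower' : $StrFoldFunction;
  $out = array();
  # Fold each term separately, so the folding function only ever
  # receives a plain string.
  foreach ((array)$terms as $t) $out[] = $fold($t);
  return $out;
}
```

Because the function name is read from $StrFoldFunction at call time, the 
helper automatically follows any later change to that variable.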

I recently wrote a function that replaces accented letters with plain 
ones, so you can search for "voilà" or "voila" and it will find both 
(also in Cyrillic).

When I find the time to publish it and document it as a cookbook recipe, 
it will work by redefining the $StrFoldFunction variable to the name of 
the custom function. So again you would use $StrFoldFunction(terms), as 
will PmWiki.
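
Until then, the mechanism of redefining the variable can be sketched as 
below. This is not the unpublished function: MyAccentFold and its tiny 
table are invented for illustration, it only handles lowercase accented 
letters, and where exactly such a line belongs in the configuration is 
left open here.

```php
<?php
# Hypothetical custom folding function; the name and the table are
# illustrative only and cover just a few letters.
function MyAccentFold($x) {
  static $plain = array(
    "\xc3\xa0" => "a",   # à -> a
    "\xc3\xa9" => "e",   # é -> e
    "\xc3\x9f" => "ss",  # ß -> ss
  );
  # Lowercase A-Z first (assumes PHP 8.2+ or the default "C" locale),
  # then replace the accented letters listed in the table.
  return strtr(strtolower($x), $plain);
}

# Point the folding variable at the custom function, so that code
# calling $StrFoldFunction(terms) picks it up.
$StrFoldFunction = 'MyAccentFold';
```

After this, $StrFoldFunction("voilà") and $StrFoldFunction("voila") both 
return "voila".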

Petko
