[pmwiki-users] Case insensitive search using UTF-8 and non-Latin chars

Patrick R. Michaud pmichaud at pobox.com
Thu Mar 2 13:07:20 CST 2006


On Thu, Mar 02, 2006 at 10:08:57AM +0000, Athan wrote:
> My intranet pmwiki installation works great with both English and Greek 
> content but case insensitive searching works only with English characters.
> 
> BTW I already use xlpage-utf-8.php in config.php and 'xlpage-i18n' => 'utf-8' 
> in XLPage.
> 
> Is there anything I can do to allow case-insensitive searching in both 
> languages?

Unfortunately, at the moment there's not really a good way for us to 
do this -- it's a limitation of PHP.  

The basic functions available in PHP to perform case-insensitive
substring searches work on bytes, so they aren't really aware of
uppercase and lowercase distinctions for non-ASCII characters in
utf-8 encoded strings.  
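
As a rough illustration of the limitation (a sketch only; the sample
strings are made up, and the file is assumed to be saved as utf-8),
stripos() finds an English word regardless of case but is not expected
to find a Greek word that differs from the page text only in case:

```php
<?php
// Assumes this file is saved as UTF-8; the sample strings are illustrative.
$page = "Welcome to ΑΘΗΝΑ";

// ASCII letters match regardless of case.
$ascii_hit = stripos($page, "WELCOME") !== false;

// The Greek needle differs from the page text only in case, but the
// byte-oriented comparison does not fold multi-byte characters.
$greek_hit = stripos($page, "αθηνα") !== false;

var_dump($ascii_hit, $greek_hit);
```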

One approach would be to convert all terms to lowercase when doing
the string search, but even here PHP's support is limited.  To
convert utf-8 to lowercase we'd have to use something like PHP's
mb_strtolower function, but a lot of PHP installations don't have
the mb_* functions available by default.  Also, we have to be careful that we
don't perform utf-8 lowercase conversions on sites that are using
iso-8859-1 or other character encodings.
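
A minimal sketch of that approach might look like the following.  The
function names and the $charset argument are hypothetical (they are not
part of PmWiki); the point is only that mb_strtolower is used when the
site is utf-8 and the mbstring extension is present, with a plain
strtolower fallback otherwise:

```php
<?php
// Hypothetical helper, not part of PmWiki: lowercase $s with mb_strtolower
// only when the site charset is UTF-8 and the mbstring extension is loaded.
function search_lower($s, $charset) {
    if (strcasecmp($charset, 'utf-8') == 0 && function_exists('mb_strtolower')) {
        return mb_strtolower($s, 'UTF-8');
    }
    // Fallback: byte-oriented lowercase, which only reliably handles ASCII.
    return strtolower($s);
}

// Hypothetical helper: case-insensitive substring test built on search_lower.
function search_contains($text, $term, $charset) {
    $text = search_lower($text, $charset);
    $term = search_lower($term, $charset);
    return strpos($text, $term) !== false;
}
```

On an installation without mbstring this quietly degrades to the
ASCII-only behaviour, which is exactly the case the original question
runs into.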

On the other hand, the xlpage-utf-8.php script is already defining
a table of case conversions, so maybe I can get the search script to
use that.  Enter a PITS (PmWiki Issue Tracking System) entry for this
issue and I'll see what I can do about it.  :-)
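
To give an idea of what a table-driven search could look like, here is
a sketch that needs no mb_* functions.  The $fold_table below is a tiny
made-up stand-in, not the actual table from xlpage-utf-8.php, and the
function names are hypothetical:

```php
<?php
// Hypothetical, deliberately tiny table mapping uppercase characters to
// lowercase ones as UTF-8 byte sequences; a real table would cover far more.
$fold_table = array_combine(range('A', 'Z'), range('a', 'z')) + array(
    'Α' => 'α', 'Η' => 'η', 'Θ' => 'θ', 'Ν' => 'ν',
);

// Fold case by plain byte-string substitution; strtr() with an array
// replaces each key it finds with the corresponding value.
function table_fold($s, $table) {
    return strtr($s, $table);
}

// Case-insensitive substring test: fold both sides, then compare bytes.
function table_search($text, $term, $table) {
    return strpos(table_fold($text, $table), table_fold($term, $table)) !== false;
}
```

Since this only substitutes the byte sequences listed in the table, it
would still have to be limited to sites that actually use utf-8.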

Pm



