[pmwiki-users] Case insensitive search using UTF-8 and non-Latin chars
Patrick R. Michaud
pmichaud at pobox.com
Thu Mar 2 13:07:20 CST 2006
On Thu, Mar 02, 2006 at 10:08:57AM +0000, Athan wrote:
> My intranet pmwiki installation works great with both English and Greek
> content but case insensitive searching works only with English characters.
>
> BTW I already use xlpage-utf-8.php in config.php and 'xlpage-i18n' => 'utf-8'
> in XLPage.
>
> Is there anything I can do to make searching case-insensitive in both
> languages?

Unfortunately, at the moment there's not really a good way for us to
do this -- it's a limitation of PHP.

The basic functions available in PHP for case-insensitive substring
searches don't recognize the uppercase/lowercase pairs in utf-8
encoded strings.

One approach would be to convert all terms to lowercase when doing
the string search, but even here PHP's support is limited. To
convert utf-8 to lowercase we'd have to use something like PHP's
mb_strtolower function, but a lot of PHP installations don't have
the mb_* functions available by default. Also, we have to be careful
that we don't perform utf-8 lowercase conversions on sites that are
using iso-8859-1 or other character encodings.
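
As a rough illustration of the lowercase-both-sides idea, here is a small
sketch. It is written in Python rather than PHP, purely as a stand-in, and
contains_ci and its encoding parameter are hypothetical names, not PmWiki
code. The fallback branch lowercases ASCII letters only, which is an
assumption about how a non-utf-8 site might be handled.

```python
def contains_ci(haystack: bytes, needle: bytes, encoding: str = "utf-8") -> bool:
    """Case-insensitive substring test on encoded text (hypothetical helper)."""
    if encoding.lower() in ("utf-8", "utf8"):
        # Decode first so lowercasing sees whole characters, not single bytes.
        return needle.decode("utf-8").lower() in haystack.decode("utf-8").lower()
    # Any other encoding: lowercase ASCII letters only and compare raw bytes,
    # so a utf-8 conversion is never applied to e.g. iso-8859-1 text.
    return needle.lower() in haystack.lower()
```

With this sketch, an uppercase Greek search term matches lowercase Greek
page text only when the site encoding is declared as utf-8; under any other
declared encoding the bytes are compared as they are, apart from ASCII
letters.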

On the other hand, the xlpage-utf-8.php script is already defining
a table of case conversions, so maybe I can get the search script to
use that. Enter a PITS entry for this issue and I'll see what I can do
about it. :-)
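
A table-driven conversion of that kind might look roughly like the sketch
below. Again this is Python used as a stand-in, not PHP; CASE_TABLE is a
hypothetical three-entry table made up for the example and is not the table
defined in xlpage-utf-8.php, and table_lower is likewise an invented name.

```python
# Hypothetical, deliberately tiny table: utf-8 bytes of an uppercase
# letter mapped to the utf-8 bytes of its lowercase form.
CASE_TABLE = {
    b"\xce\x9a": b"\xce\xba",  # Greek kappa, U+039A -> U+03BA
    b"\xce\x91": b"\xce\xb1",  # Greek alpha, U+0391 -> U+03B1
    b"\xce\x99": b"\xce\xb9",  # Greek iota,  U+0399 -> U+03B9
}

def table_lower(data: bytes, table=CASE_TABLE) -> bytes:
    """Lowercase utf-8 bytes using only ASCII rules plus a lookup table."""
    out = data.lower()  # bytes.lower() changes ASCII letters only
    for upper, lower in table.items():
        # Each key starts with a utf-8 lead byte, so a match can only
        # begin on a character boundary.
        out = out.replace(upper, lower)
    return out
```

Lowercasing both the search term and the page text with the same table
before comparing would then give a case-insensitive match without needing
any multibyte string extension, at the cost of maintaining the table.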
Pm