[pmwiki-users] Problem with utf & i18n (French accented characters)

Petko Yotov 5ko at free.fr
Sat Jun 9 14:10:51 CDT 2007


On Saturday 09 June 2007, Patrick R. Michaud wrote:
> On Fri, Jun 08, 2007 at 04:42:51PM -0400, Donald Z. Osborn wrote:
> >    I am setting up a wiki (farm) that needs to be in two working
> > languages (English & French) and accommodate texts in some West African
> > languages that use extended Latin scripts - so it should accommodate
> > UTF-8.
> >
> >    The early set-up is okay now except that I encountered an odd display
> >    issue with the accented French characters in the interface: Basically,
> >    although the output is in utf-8 and my browsers are set to utf-8 I am
> >    getting the black diamond in Firefox 2 and empty square in MSIE7 for
> > the accented characters. Switch to iso-8859-1 and everything appears
> > normal. This is not what I expected.

Donald, the easiest way is to open your UTF-8 page for editing alongside the
same page at PmWikiFr, then copy the text from the PmWikiFr page and paste it
into yours.

It is especially important for the XLPage page.

>
> For a variety of reasons, the PmWikiFr.* pages (including PmWikiFr.XLPage)
> have been built using iso-8859-1 encoding instead of utf-8.  So, they
> will tend to not display correctly inside of a utf-8 encoded page.
>
> Thus far PmWiki doesn't have the capability to automatically translate
> between character encodings, because many PHP installations don't
> provide the necessary translation functions.  In recent versions
> of PmWiki 2.2.0-beta I've started storing the character encoding
> identification as part of the page so that PmWiki can eventually
> do this sort of translation, but we're not quite there yet.
>
> Since my machine _does_ have the necessary translations, it might be
> possible for me to come up with utf-8 versions of the PmWikiFr.*
> and other pages, and publish them simultaneously with the
> iso-8859-1 versions.  But managing all of that -- separate sets of
> encodings for each language translations, and trying to explain
> to admins when to use each -- is likely to be a real headache.
>
> I'm very much open for suggestions on this topic.
>
> It would be very cool if we could find a good way to seamlessly
> convert existing iso-8859-1 and other sites to using utf-8
> (with the option to remain iso-8859-1 for those that want it).

The iconv program, available on most Unix platforms, can do this, as can the
PHP iconv() function. Users could copy the downloaded PmWikiFr/* pages, and
the new "import" feature of PmWiki might then translate them into utf-8.
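
As a rough illustration of what such a conversion does (this is only a sketch
in Python, not PmWiki code, and the sample string and the function name
latin1_to_utf8 are made up for the example): the same bytes that show up as
the replacement character when read as utf-8 come out fine once they are
decoded as iso-8859-1 and re-encoded.

```python
# Hypothetical sample text; any French string with accents would do.
text = "Éditer la page"

# The bytes as an iso-8859-1 encoded page would store them.
latin1_bytes = text.encode("iso-8859-1")

# Reading those bytes as utf-8 fails on the accented byte; with
# errors="replace" the decoder substitutes U+FFFD, the character that
# browsers draw as a black diamond or an empty square.
misread = latin1_bytes.decode("utf-8", errors="replace")


def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as utf-8 (the job iconv would do)."""
    return data.decode("iso-8859-1").encode("utf-8")


utf8_bytes = latin1_to_utf8(latin1_bytes)

print(repr(misread))
print(utf8_bytes.decode("utf-8"))
```

The command-line iconv and the PHP iconv() function take the source and
target encoding names in much the same way.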

For "seamless conversion", a function similar to the PageIndexUpdate function, 
or the various Backup recipes, or even the recipe that converts all pages to 
CompressedPageStore may do the trick.
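
A minimal sketch of that kind of pass over all pages, again in Python and
under loud assumptions: the directory layout, the function names is_utf8 and
convert_tree, and the "already valid utf-8 means leave it alone" test are all
mine, not anything PmWiki provides. It also treats each page as an opaque
file; the encoding identification that recent 2.2.0-beta versions store in the
page is not updated here.

```python
from pathlib import Path


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as utf-8 (ASCII included)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_tree(src_dir, dst_dir, source_encoding="iso-8859-1"):
    """Copy every file from src_dir to dst_dir, re-encoding to utf-8.

    Files that are already valid utf-8 are copied unchanged. Returns the
    names of the files that were actually re-encoded. The originals are
    never modified, so the iso-8859-1 set stays available.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if not is_utf8(data):
            data = data.decode(source_encoding).encode("utf-8")
            converted.append(path.name)
        (dst / path.name).write_bytes(data)
    return converted
```

Writing to a separate directory is deliberate: a site that wants to remain
iso-8859-1 keeps its files untouched. The utf-8 validity check is only a
heuristic, since a few iso-8859-1 byte sequences also happen to be valid
utf-8.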

I also feel (like jdd) that UTF-8 is a good thing: it is the best choice for
new installations, especially for multilanguage sites, and especially now that
PmWiki has much better support for it (AsSpaced, case-insensitive search).
MediaWiki and DokuWiki, the other most popular PHP wikis, ship with UTF-8 by
default.

Thanks,
Petko



