[pmwiki-users] Special chars in wikilib.d/files

Patrick R. Michaud pmichaud at pobox.com
Thu Mar 23 08:19:31 CST 2006


On Sun, Mar 19, 2006 at 09:44:10AM +0100, Dominique Faure wrote:
> On 3/18/06, Patrick R. Michaud <pmichaud at pobox.com> wrote:
> > On Sat, Mar 18, 2006 at 03:32:05AM +0100, Thomas Lederer wrote:
> > > i am having a night full of headaches. There are files in the i18n files
> > > that have special chars (mainly French and German files from what i see
> > > (accents and umlauts), all the Japanese seem to work).
> >
> > The problem is that Mac OS X seems to require that all filenames
> > use utf-8 encoding, while PmWiki encodes the filenames with whatever
> > encoding was in effect when the pages were created.  For the German
> > and French pages, this would be Latin-1/iso-8859-1 encoding.
> >
> > I don't have a fix for this short of completely redoing the way
> > that PmWiki maps pagenames to filenames.  Coming up with a new
> > way to do it is easy, coming up with a way to do it that
> > preserves compatibility with existing PmWiki installs is
> > much more difficult.
> 
> I also found it a bit annoying to have to handle these kinds of issues
> when migrating wikis between nodes on a heterogeneous network of web
> servers. Couldn't the XLPage facilities be used to automagically
> set the link titles in the appropriate language (thus helping to keep
> names "locale-oddities-free")?

XLPage just handles phrase translations -- it really doesn't know 
anything about character encodings.  (All it knows is that the
'Locale' phrase contains the name of a script to be run that can
switch the encodings.)

The real issue is that PHP doesn't have a way to convert strings
between character set encodings that we can count on being present.
Best would be to use the iconv() function, but a lot of PHP
installations (including mine) are compiled without iconv.  Next best
is the multibyte string (mbstring) functions, but a lot of PHP
installations are compiled without those, too.
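
To make the conversion problem concrete, here is a minimal sketch in
Python (chosen only for brevity; it is not PmWiki code, and the function
names are made up for illustration).  It shows that the Latin-1 bytes of
a name such as "Passwörter" are not valid UTF-8, and that the Latin-1
to UTF-8 direction can be done by hand because Latin-1 bytes map
directly onto the first 256 Unicode code points.  It handles
iso-8859-1 only, which is why a general function such as iconv()
would still be preferable.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(data: bytes) -> bytes:
    """Hand-rolled iso-8859-1 -> UTF-8 conversion (illustrative helper)."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            # ASCII bytes are identical in both encodings.
            out.append(b)
        else:
            # Code points U+0080..U+00FF become two bytes in UTF-8.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


legacy_name = "Passwörter".encode("iso-8859-1")
print(is_valid_utf8(legacy_name))                  # Latin-1 bytes as stored today
print(is_valid_utf8(latin1_to_utf8(legacy_name)))  # after conversion
```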

So, since there's not a universally available function in PHP to
convert strings to and from UTF-8, we're stuck with having to
perform a variety of workarounds.  At present I'm thinking that 
we'll url-encode page names, so that a page named
"PmWikiDe.AdministratorPasswörter" would be stored in a file
named "PmWikiDe.AdministratorPassw%f6rter".
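
A rough sketch of that mapping, in Python for illustration only (this
is not PmWiki's implementation).  The function names, the set of
characters left unescaped, and the lowercase hex digits are all
assumptions made for this example.  The page name is first turned into
bytes in the page's own charset, and every byte outside the safe set
is written as %xx, so the resulting filename is pure ASCII.

```python
from urllib.parse import unquote_to_bytes

# Assumption: letters, digits, '.', '-' and '_' are kept as-is.
SAFE_BYTES = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.-_"
)


def encode_pagename(pagename: str, charset: str = "iso-8859-1") -> str:
    """Map a page name to an ASCII-only filename by escaping unsafe bytes."""
    raw = pagename.encode(charset)
    return "".join(
        chr(b) if b in SAFE_BYTES else "%{:02x}".format(b) for b in raw
    )


def decode_filename(filename: str, charset: str = "iso-8859-1") -> str:
    """Reverse encode_pagename(), given the charset the page was created in."""
    return unquote_to_bytes(filename).decode(charset)


print(encode_pagename("PmWikiDe.AdministratorPasswörter"))
# PmWikiDe.AdministratorPassw%f6rter
```

Because "%" itself is escaped (as %25), the mapping can be reversed
without ambiguity.  The same page created under UTF-8 would come out
as "Passw%c3%b6rter", so the charset still has to be known in order to
decode a filename back to a page name.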

This in itself is no problem, but managing the transition for
existing sites is likely to be very tricky, since a lot of
sites will already have pagenames with non-url-encoded characters
in them.  So, we have to be able to still read and locate existing
page files, but also start moving things to the new naming format.
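
One way the lookup during such a transition could work, sketched in
Python under stated assumptions: the helper names are hypothetical,
the directory is modelled as a plain set of filenames, and the page
names used are made-up examples.  The new url-encoded name is tried
first, and the legacy unencoded name is used as a fallback so that
existing page files can still be found.

```python
from typing import Collection, Optional

# Assumption: letters, digits, '.', '-' and '_' are kept as-is.
SAFE_BYTES = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.-_"
)


def encode_pagename(pagename: str, charset: str = "iso-8859-1") -> str:
    """New-style filename: unsafe bytes written as %xx."""
    raw = pagename.encode(charset)
    return "".join(
        chr(b) if b in SAFE_BYTES else "%{:02x}".format(b) for b in raw
    )


def locate_pagefile(
    pagename: str,
    existing_files: Collection[str],
    charset: str = "iso-8859-1",
) -> Optional[str]:
    """Return the stored filename for a page, or None if it does not exist."""
    encoded = encode_pagename(pagename, charset)
    if encoded in existing_files:
        return encoded      # already stored in the new format
    if pagename in existing_files:
        return pagename     # legacy file written before the upgrade
    return None


files = {"PmWikiDe.AdministratorPasswörter", "PmWikiFr.Caract%e8res"}
print(locate_pagefile("PmWikiDe.AdministratorPasswörter", files))
print(locate_pagefile("PmWikiFr.Caractères", files))
```

Pages whose names contain only safe characters have the same filename
in both schemes, so only pages with special characters would ever need
to be renamed.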
And the version of PmWiki that implements this will be an
"irreversible upgrade" -- a site that switches to the newer 
format for storing pages can't simply downgrade to a previous
version of PmWiki and have everything work.

So, it needs a bit of planning and testing before I can get there.
Still, I'm thinking I need to fix the Mac situation sooner rather
than later, so something will likely happen soon.

Pm



