[pmwiki-users] Defaulting PmWiki to utf8

Arrigo Marchiori dido at quipo.it
Wed Nov 14 16:28:50 CST 2007


Hello everybody,

On Wed, Nov 14, 2007 at 10:36:32AM -0600, Patrick R. Michaud wrote:

[...]
> Personally, I'm very much in favor of switching PmWiki's default
> to utf8 -- it will bring us some huge benefits -- but the big 
> obstacle is that migrating existing sites from the old iso-8859-1
> default to a utf8 default may be somewhat complicated and/or
> problematic.  Thus I'm seeking comments and opinions.
[...]
> The big problem is that any existing pages of an iso-8859-1
> site will have been saved using an iso-8859-1 encoding, using
> iso-8859-1 encoded filenames.  Thus, it's not just a simple
> matter of changing a configuration option -- we also have to
> convert the various page files as well.

Italian uses several accented characters (à, è, é, ì, ...), so I think
a charset change would be a major step for every PmWiki-based Italian
site. The same goes for French, German...

As a regular UTF-8 user (as you can see from this e-mail :-) I
personally think that the whole Internet should switch to UTF-8. But
I also see that this charset is not well supported on some systems.
I'm afraid that some web servers or FTP clients may not accept
filenames encoded in UTF-8. I hope someone can contradict me! :-)

> So, what I'm seeing at the moment is that if we switch to using
> utf8 by default, admins of existing sites have to be notified 
> somehow that the default has changed and told how to configure
> the site to continue using iso-8859-1, or given a procedure to
> somehow convert the site's pages to utf8.  And once someone
> starts the utf8 conversion, it can get a bit messy to try to
> convert back.

Yes, I think the upgrade instructions should carry a big red label,
with a pointer to a recipe or something similar that explains how to
convert page text. I don't know about page names, though... :-/

> Any thoughts on the overall process, how much of an impact a move
> like this might have on existing sites, or how we might go about this?

As I said before, I think this would be a major issue for the
affected web sites. But I think that it's an important step that has
to be taken some day, the sooner the better. I think that this should
be strongly encouraged.

I suggest doing the first test with the PmWiki localized
documentation: that's a good ready-made example of foreign-language
text! :-)

As for how to implement a charset conversion, the only idea I have is
to use something like html_entity_decode(htmlentities(text)), encoding
the iso-8859-1 text to entities and decoding those entities as UTF-8.
I'm afraid that converting the filenames could only be left to each
site admin.
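
To make the conversion idea a bit more concrete, here is a minimal
sketch of the byte-level step. It is written in Python only for
illustration (PmWiki itself is PHP), and it assumes that the page text
and the page file names are available as raw iso-8859-1 bytes. The
function names are hypothetical and are not part of PmWiki. The check
for already-valid UTF-8 is only a heuristic meant to avoid converting
the same data twice on a half-converted site.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte value is a valid iso-8859-1 character, so decode() cannot fail.
    return data.decode("iso-8859-1").encode("utf-8")


def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_once(data: bytes) -> bytes:
    """Convert to UTF-8 unless the data already looks like UTF-8.

    Heuristic only: some iso-8859-1 byte sequences happen to be valid
    UTF-8 as well, and those would be left unconverted here.
    """
    return data if looks_like_utf8(data) else latin1_to_utf8(data)


if __name__ == "__main__":
    old_text = b"perch\xe9"           # "perché" as iso-8859-1 bytes
    new_text = convert_once(old_text)
    print(new_text.decode("utf-8"))
```

The same convert_once() call could in principle be applied to a file
name taken as bytes before renaming the file, but whether the server's
filesystem accepts the result is exactly the part I am unsure about.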

These were my two cents.
-- 
rigo

http://rigo.altervista.org


