Patrick R. Michaud pmichaud at pobox.com
Wed Nov 14 10:36:32 CST 2007

Since its inception, PmWiki has used iso-8859-1 (aka 'Latin-1')
as its default character set.  There are a variety of reasons for
this, but the primary reason has been that PHP and its associated
libraries have not had sufficient support for utf8 to work well.

However, that's no longer the case, and PmWiki now has very good
support for utf8.  So, there has been off-and-on discussion that
perhaps PmWiki should be configured for utf8 by default, with
iso-8859-1 as a language option for sites that want to use it.

Personally, I'm very much in favor of switching PmWiki's default
to utf8 -- it will bring us some huge benefits -- but the big 
obstacle is that migrating existing sites from the old iso-8859-1
default to a utf8 default may be somewhat complicated and/or
problematic.  Thus I'm seeking comments and opinions.

First of all, sites that are already running utf8 would not
be negatively affected by this.  Also, sites that contain only
ASCII text (such as most sites in the United States) would not
be affected either.  A change like this would primarily affect
European sites, or other sites written in Latin-based languages
that use accented characters.
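To see why ASCII-only sites are unaffected, here is a small Python sketch. Python is used purely for illustration, and the helper name same_bytes is hypothetical. ASCII characters have identical byte values in iso-8859-1 and utf8. An accented Latin-1 character such as 'é' takes one byte in iso-8859-1 but two in utf8, so reading one encoding as the other either fails or garbles the text.

```python
def same_bytes(text: str) -> bool:
    """Hypothetical helper: True if `text` encodes to identical bytes in
    ISO-8859-1 and UTF-8. Assumes `text` is representable in Latin-1."""
    return text.encode("iso-8859-1") == text.encode("utf-8")


print(same_bytes("HomePage"))  # True: ASCII bytes are the same in both
print(same_bytes("Café"))      # False: 'é' is b'\xe9' vs b'\xc3\xa9'

# Reading an existing Latin-1 page as UTF-8 fails outright...
latin1_page = "Café".encode("iso-8859-1")
try:
    latin1_page.decode("utf-8")  # a lone 0xE9 byte is not valid UTF-8
except UnicodeDecodeError as err:
    print("cannot read as utf-8:", err.reason)

# ...while reading UTF-8 bytes as Latin-1 silently produces mojibake.
print("Café".encode("utf-8").decode("iso-8859-1"))  # CafÃ©
```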

The big problem is that the existing pages of an iso-8859-1
site will have been saved with iso-8859-1 encoded text, under
iso-8859-1 encoded filenames.  Thus, it's not just a simple
matter of changing a configuration option -- we also have to
convert the page files themselves.
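Here is a minimal Python sketch of what converting the page files might involve. It rests on some hypothetical assumptions: each page is a single regular file whose name and contents are both iso-8859-1, and the filesystem is POSIX, where filenames are raw bytes. convert_pages is an illustrative name, not part of PmWiki. The sketch does not touch any charset or other metadata that the page format may record inside the files.

```python
import os


def convert_pages(src_dir: str, dst_dir: str) -> list:
    """Hypothetical helper: copy every regular file in src_dir to dst_dir,
    re-encoding both the filename and the contents from ISO-8859-1 to
    UTF-8. Assumes a POSIX filesystem. Returns the new filenames."""
    os.makedirs(dst_dir, exist_ok=True)
    src_b = os.fsencode(src_dir)
    dst_b = os.fsencode(dst_dir)
    converted = []
    # Listing with a bytes path yields the raw byte filenames, so Latin-1
    # names are read exactly as stored rather than guessed at.
    for raw_name in os.listdir(src_b):
        src_path = os.path.join(src_b, raw_name)
        if not os.path.isfile(src_path):
            continue
        # Latin-1 maps every byte to a character, so decoding never fails;
        # it also means already-UTF-8 input would be silently double-encoded.
        name = raw_name.decode("iso-8859-1")
        with open(src_path, "rb") as f:
            text = f.read().decode("iso-8859-1")
        # Write the new filename explicitly as UTF-8 bytes, independent
        # of the current locale's filesystem encoding.
        dst_path = os.path.join(dst_b, name.encode("utf-8"))
        with open(dst_path, "wb") as f:
            f.write(text.encode("utf-8"))
        converted.append(name)
    return sorted(converted)
```

Writing into a separate directory leaves the original iso-8859-1 pages in place, which keeps a way back if the conversion has to be abandoned. Every byte sequence is valid iso-8859-1, so the script cannot detect pages that are already utf8. Running it twice over the same pages would double-encode them.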

So, as I see it at the moment, if we switch to utf8 by default,
admins of existing sites will have to be notified that the default
has changed, and then either told how to configure their sites to
continue using iso-8859-1 or given a procedure for converting their
pages to utf8.  And once someone starts the utf8 conversion, it can
get a bit messy to try to convert back.
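One plausible reason that converting back is messy (an assumption, not something spelled out above) is that a site running on utf8 lets authors save characters iso-8859-1 cannot represent at all, such as the euro sign or non-Latin scripts. The Python sketch below uses a hypothetical can_revert helper to show the check a reverse conversion would need. It also shows what a lossy fallback looks like.

```python
def can_revert(text: str) -> bool:
    """Hypothetical check: can `text` be stored as ISO-8859-1 without loss?"""
    try:
        text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


print(can_revert("Café au lait"))  # True: 'é' exists in ISO-8859-1
print(can_revert("Price: 5 €"))    # False: the euro sign does not

# Forcing the conversion anyway silently replaces what cannot be stored.
print("Price: 5 €".encode("iso-8859-1", errors="replace"))  # b'Price: 5 ?'
```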

Any thoughts on the overall process, how much of an impact a move
like this might have on existing sites, or how we might go about this?



