[pmwiki-users] Converting international documentation to UTF-8

Petko Yotov 5ko at 5ko.fr
Sat Jul 30 19:08:05 CDT 2011

On Saturday 30 July 2011 06:36:58 Carlos AB wrote:
> I would like to have more information on how can we start converting
> PmWiki international documentation to utf-8?

I have started working on it, but I ran into a number of annoying bugs that 
slowed me down. In the next few days we will provide downloads of the 
translations in UTF-8 and, where needed, in the older encodings as well.

It would help those WikiGroups to reduce the number of pages whose "names" 
contain international (non-ASCII) characters.
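As a rough illustration only (Python, not PmWiki code), a translator could 
list the pages that would be candidates for renaming. The function name and 
the sample page names below are hypothetical, and "international" is assumed 
to mean "any character outside ASCII".

```python
from typing import Iterable, List


def pages_with_international_names(page_names: Iterable[str]) -> List[str]:
    """Return the page names that contain at least one non-ASCII character."""
    # str.isascii() is True only when every character is in the ASCII range.
    return [name for name in page_names if not name.isascii()]


if __name__ == "__main__":
    # Hypothetical page names, for illustration only.
    sample = ["PmWikiFr.Installation", "PmWikiFr.\u00c9dition"]
    for name in pages_with_international_names(sample):
        print(name)
```

The output is only a list of candidates; whether and how to rename each page 
remains a manual decision.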


P.S. I was thinking about having PmWiki convert these pages on-the-fly: the 
same file would work both in Latin-1 and in UTF-8 wikis. But this will be more 
complex than I thought.
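
To make the idea concrete, here is a minimal sketch of such an on-the-fly 
conversion, written in Python rather than PHP and not taken from PmWiki. The 
function name, its parameters and the Latin-1 fallback are assumptions. It 
guesses the stored encoding by trying UTF-8 first, then re-encodes for the 
wiki's own charset.

```python
def to_wiki_charset(raw: bytes, wiki_charset: str,
                    legacy_charset: str = "iso-8859-1") -> bytes:
    """Re-encode page bytes for a wiki running in `wiki_charset`.

    Assumption: bytes that are not valid UTF-8 were written in
    `legacy_charset`. This is a guess, not a reliable detection.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so fall back to the assumed legacy encoding.
        text = raw.decode(legacy_charset)
    # Characters missing from the target charset are replaced with "?".
    return text.encode(wiki_charset, errors="replace")


if __name__ == "__main__":
    latin1_page = b"caf\xe9"          # "café" stored as Latin-1
    utf8_page = b"caf\xc3\xa9"        # "café" stored as UTF-8
    print(to_wiki_charset(latin1_page, "utf-8"))
    print(to_wiki_charset(utf8_page, "iso-8859-1"))
```

A guess like this is fragile: pure ASCII text is identical in both encodings, 
and a Latin-1 byte sequence that happens to form valid UTF-8 would be misread.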

The Charset page attribute was added to PmWiki betas more than 4 years ago, 
but unfortunately some important things were omitted and it cannot be trusted.

1. Wikis using the encodings iso8859-2, -9 and -13 still save the wrong 
charset, ISO8859-1, in the attribute (this will be corrected in version 
2.2.30). We can fix those wikigroups on pmwiki.org, but for an existing page 
on some other wiki we cannot be sure whether a stored "ISO8859-1" really 
means ISO8859-1 or one of the other encodings. 
The Charset attribute can be trusted only when it is "UTF-8".

2. The charset page attribute is added by the function PostPage() when a page 
is edited and saved. Some recipes or functions modify pages with the function 
WritePage(), which doesn't save this attribute. RecentChanges pages, for 
example, will not normally contain the attribute, so an automatic script 
cannot tell whether such a page has already been converted to UTF-8.

3. Pages last saved with a PmWiki version <2.2.0-beta43 or <2.2.0-stable don't 
have the Charset attribute at all. There are many wikis still running 2.1.27. 
(Well, since the attribute can't be trusted anyway, this isn't a big deal.)
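
The three points above boil down to one rule for a conversion script: skip a 
page only when its stored charset is "UTF-8", and treat everything else, 
including a missing attribute, as unknown. The Python sketch below illustrates 
that rule. It is not PmWiki code, and it assumes that a page file stores its 
attributes as one "key=value" line each, with the charset on a line starting 
with "charset="; the function names are hypothetical.

```python
from typing import Optional


def read_charset(raw_page: bytes) -> Optional[str]:
    """Return the value of the charset attribute, or None if it is absent.

    Assumption: attributes are stored one per line as key=value.
    The page is handled as bytes because its encoding is not yet known.
    """
    for line in raw_page.split(b"\n"):
        key, sep, value = line.partition(b"=")
        if sep and key == b"charset":
            return value.strip().decode("ascii", errors="replace")
    return None


def already_utf8(raw_page: bytes) -> bool:
    """True only when the page explicitly declares UTF-8.

    Any other value (for example ISO8859-1) or a missing attribute is
    treated as "unknown", following the three caveats in the list above.
    """
    charset = read_charset(raw_page)
    return charset is not None and charset.upper() == "UTF-8"


if __name__ == "__main__":
    # Hypothetical page contents, for illustration only.
    print(already_utf8(b"charset=UTF-8\ntext=Bonjour"))      # declared UTF-8
    print(already_utf8(b"charset=ISO8859-1\ntext=Bonjour"))  # not trusted
    print(already_utf8(b"text=Bonjour"))                     # no attribute
```

Pages for which already_utf8() is false still need another way of 
determining their real encoding, such as knowing which wiki or group they 
came from.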
