[pmwiki-users] Re:Changing charset from ISO-8859-1 to UTF-8

Algis Kabaila akabaila at pcug.org.au
Thu Apr 7 18:31:02 CDT 2005


On Friday 08 April 2005 02:17, pmwiki-users-request at pmichaud.com wrote:
> Message: 5
> Date: Thu, 7 Apr 2005 18:13:16 +0200
> From: Laurent Meister <meister at apfelwiki.de>
> Subject: Re: [pmwiki-users] Changing charset from ISO-8859-1 to UTF-8
> To: "Patrick R. Michaud" <pmichaud at pobox.com>
> Cc: Pmwiki-users at pmichaud.com
> Message-ID: <c39a0775c8e4d1937999660f39ebe4d7 at apfelwiki.de>
> Content-Type: text/plain; charset="us-ascii"
> 
> 
> Am 07.04.2005 um 16:01 schrieb Patrick R. Michaud:
> 
> > On Thu, Apr 07, 2005 at 11:37:16AM +0200, Laurent Meister wrote:
> >> Hello,
> >>
> >> is there a nice way to convert existing pages from a pmwiki written
> >> with ISO-8859-1 to UTF-8?
> >
> > Depends on what you consider to be "nice".  :-)
> >
> > You can often enlist a browser's help in this -- edit a page that is
> > encoded in ISO-8859-1, copy the markup text to the clipboard, and
> > then paste it into the edit form for a site that is set up for utf-8.
> > The browser will often handle the encoding translation.
> >
> > Still, I suppose this is more work than one would want to do,
> > especially for a large site.
> 
> That is the reason why I posted my problem here. On ApfelWiki we have 
> nearly 1500 wikipages. And it would be a lot of work. :-(
> 
> >  I could see about coming up with a translation
> > module for PmWiki, that could do mass conversion of pages or perhaps
> > even automatically converting character sets on the fly.
> 
> That's what I consider to be nice ;-)
> 
> 
> 
> >
> >> Some browsers seem not to show chars like the "€" euro-symbol
> >> properly. Switching the page to UTF-8 would solve this problem.
> >
> > You might also treat the euro-symbol chars as markup to be converted to
> > &euro;, either when the page is rendered or when it is saved.  This
> > might eliminate the need to convert pages to utf-8.  Let me know if you
> > want to try this approach.  :-)
> 
> You mean to make a special markup script for this?
> 
> Laurent Meister (kt007)
> 

I have a similar problem with a much smaller volume of files: converting 
Lithuanian text from iso-8859-13 to utf-8 on my HAN (Home Area Network).  I 
plan to write a small Python script that maps the Lithuanian diacriticals in 
iso-8859-13 to utf-8.  The script would read a file and write the results to 
another file with a slightly modified name.

The idea is to have the "sheep safe and the wolf fed" - the old iso-8859-13 
files will remain intact and the new utf-8 with modified names will be ready 
for use.
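
A minimal sketch of what such a script might look like, not the finished 
thing.  It relies on Python's built-in iso8859_13 codec rather than a 
hand-written mapping table; the function name convert_to_utf8 and the 
".utf8" suffix are assumptions made for illustration only.

```python
from pathlib import Path


def convert_to_utf8(src, src_encoding="iso8859_13", suffix=".utf8"):
    """Write a UTF-8 copy of `src` next to it and return the new path.

    The original file is only read, never modified.  The function name
    and the default suffix are illustrative assumptions.
    """
    src = Path(src)
    # Append the suffix to the whole name, so names without an
    # extension (or with dots in them) are handled the same way.
    dst = src.with_name(src.name + suffix)
    if dst.exists():
        # Refuse to overwrite an earlier conversion by accident.
        raise FileExistsError(dst)
    # Work on bytes so that line endings pass through untouched.
    text = src.read_bytes().decode(src_encoding)
    dst.write_bytes(text.encode("utf-8"))
    return dst
```

For pages written in iso-8859-1 the same function should work with 
src_encoding="iso8859_1", e.g. convert_to_utf8("somepage", "iso8859_1").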

The mapping of iso-8859-13 to utf-8 is very similar to iso-8859-1 to utf-8.  I 
would be happy to share the script with you, though I do realise that a 
similar mapping can be done in PHP - alas, I don't program in PHP and I 
am fast becoming a slow learner.

Actually, in some respects my conversion problems are worse than yours - I 
just looked at a daily news summary from Lithuania - the email is in 
iso-8859-4, which was used for Lithuanian for a while, with iso-8859-13 being 
now a standard, or so it is claimed.
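
To illustrate why the mixed charsets are a nuisance: the two encodings put 
some Lithuanian letters at different byte positions, so text decoded with 
the wrong one comes out garbled.  The helper below is a hypothetical name 
chosen for this sketch; it uses only Python's built-in iso8859_4 and 
iso8859_13 codecs.

```python
def decode_with_wrong_charset(text, actual="iso8859_4", assumed="iso8859_13"):
    """Encode `text` in `actual`, then decode the bytes as `assumed`.

    Shows what a reader would see when the charset is guessed wrongly.
    """
    raw = text.encode(actual)
    return raw.decode(assumed)


sample = "ąžuolas"
garbled = decode_with_wrong_charset(sample)
# Plain ASCII survives either way; letters such as "ą" do not.
print(sample, "->", garbled)
```

So a conversion script has to be told, per file or per message, which of 
the two source charsets it is actually dealing with.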

Good luck in your move to utf-8!

OldAl.
-- 
Algis Kabaila
http://www.pcug.org.au/~akabaila



