[pmwiki-users] i18n and iso-8859-13

Algis Kabaila akabaila at pcug.org.au
Thu Mar 31 04:06:24 CST 2005


On Thursday 31 March 2005 04:00, pmwiki-users-request at pmichaud.com wrote:
> Message: 5
> Date: Wed, 30 Mar 2005 11:12:06 -0600
> From: "Patrick R. Michaud" <pmichaud at pobox.com>
> Subject: Re: [pmwiki-users] i18n and iso-8859-13
> To: Algis Kabaila <akabaila at pcug.org.au>
> Cc: pmwiki-users at pmichaud.com
> Message-ID: <20050330171206.GA25408 at pmichaud.com>
> Content-Type: text/plain; charset=us-ascii
> 
> The rest of the answers...
> 
> On Wed, Mar 30, 2005 at 05:59:20PM +1000, Algis Kabaila wrote:
> > 2. Implementation.
> > After downloading Pm's i18n.tgz and expanding it, I "doctored" the 
> > iso-8859-9 implementation, renaming it xlpage-iso-8859-16.php.  The 
> > contents of this "doctored" file are as follows: [...]
> >   $HTTPHeaders[] = "Content-type: text/html; charset=iso-8859-13;";
> > [...]
> > This enables me to write with Lithuanian diacriticals - and read it with 
> > the correct glyphs.
> > 
> > 3. Questions.
> > 3.1 Considering that my knowledge of php is very close to zilch, is the 
> > above procedure "safe"?
> 
> Yes, quite safe, until/unless you want to have Lithuanian diacriticals
> in pagenames.  At that point things may not work properly and we'll need
> extra stuff to handle pagename conversions.  And there may not be any
> good guarantees about the system being able to handle WikiWord links 
> with Lithuanian characters in them.  (See the xlpage-iso-8859-2.php
> page for some of the things we have to do to get iso-8859-2 names and
> links to work correctly.)
> 
> > 3.2 I notice that there are some pages waiting for translation into 
> > Lithuanian: PmWikiLt.PmWikiLt, PmWikiLt.XLPage.  If no one else is doing 
> > it, I would like to start the translation.  
> 
> I already mentioned that the easy way to do this is to edit the 
> PmWikiLt.XLPage on pmwiki.org -- however, a very important question at
> the outset is whether to use utf-8 or iso-8859-13 (or even windows-1257)
> for the character set, as controlled by the 'xlpage-i18n' phrase.
> Right now I have it set to use utf-8, because utf-8 is generally a 
> reasonable and forward-looking default, but someone more versed with
> Lithuanian browser standards would have to tell us the best choice.
> 
> For the time being I'm adding xlpage-iso-8859-13.php to the i18n.tgz
> distribution, in case it's a better choice than utf-8.  We can then add any 
> character-specific handling that may be needed to that file.
> 
> I've also looked at supporting windows-1257 encodings in the past, but
> since I don't know the languages it's hard for me to know if I'm doing
> things correctly or not.  If someone wants to be a tester/evaluator for
> that charset then I'll be glad to work on it as well.
> 
> Pm

First and foremost - a big thank you to Patrick for his two replies and to 
Joachim for his welcome contribution.

Whilst I know Lithuanian fluently, I am prepared to make an about-turn of 180 
degrees.  I tried the PmWiki site for Lithuanian (with a small translation).  As 
I understand it, it runs with utf-8.  It displayed all the Lithuanian 
diacriticals correctly.  I had been convinced that it would not work because, 
as far as I know, my keyboard issues a one-byte code for each keypress (KDE 
keyboard setup, with Lithuanian diacriticals in the top numeric row of the 
keyboard).  So a mapping has to be performed somewhere and, I would think, 
there is a flag that signals the type of mapping.  
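
As a rough illustration of that "flag": on the web side it is usually the charset label, such as the one in the Content-type header quoted above. The minimal Python sketch below decodes the same single byte under three labels. The byte value 0xE0 and the helper name describe are chosen for illustration only, and nothing here is taken from PmWiki itself.

```python
import unicodedata

def describe(raw_byte, charset):
    """Decode one byte under a charset label and describe the resulting character."""
    ch = bytes([raw_byte]).decode(charset)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Illustrative byte: the same value means a different letter under each label.
KEY_BYTE = 0xE0
for charset in ("iso8859_13", "iso8859_1", "iso8859_2"):
    print(charset, "->", describe(KEY_BYTE, charset))
```

The byte itself carries no hint of its meaning, so whoever reads it has to be told the charset separately.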

If that sounds confused, it is not surprising - I am confused.  I quite like 
the idea of utf-8 and I would give it a "fair go", but I am not at all 
clear how the mapping is done and I am not inclined to treat it as a "black 
box".

I would counsel ignoring the windows-1257 "standard".  For 99% of practical 
purposes it is the same as iso-8859-13 (also known as The Baltic Rim).  All 
the characters of the three Baltic languages - Lithuanian, Latvian and 
Estonian (and the latter is, as far as I can tell, the same as Finnish) - have 
the same location (numerical code in the 128..255 space) in iso-8859-13 and 
windows-1257.  If iso-8859-13 is supported, then, for practical purposes of 
reading text, windows-1257 is also supported.
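
One way to check this claim instead of taking it on trust is to compare the two code pages byte by byte. The sketch below assumes Python's bundled codecs named iso8859_13 and cp1257. The helper names and the Lithuanian letter list are made up for this example.

```python
def decode_or_none(byte_value, codec):
    """Return the character a byte maps to, or None if the codec leaves it undefined."""
    try:
        return bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None

def differing_bytes(codec_a="iso8859_13", codec_b="cp1257", start=0x80, stop=0x100):
    """List the byte values in [start, stop) that the two codecs decode differently."""
    return [b for b in range(start, stop)
            if decode_or_none(b, codec_a) != decode_or_none(b, codec_b)]

# Lower-case Lithuanian letters with diacriticals, written as escapes.
LITHUANIAN = "\u0105\u010d\u0119\u0117\u012f\u0161\u0173\u016b\u017e"

if __name__ == "__main__":
    print("bytes that differ:", [hex(b) for b in differing_bytes()])
    letters = LITHUANIAN + LITHUANIAN.upper()
    print("letters identical:",
          letters.encode("iso8859_13") == letters.encode("cp1257"))
```

Any byte values it reports are the places where the two encodings part ways. The last line checks only the Lithuanian letters, which are what matters for this wiki.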

I must apologise for my slowness in thanking you both, but we do have time 
differences that throw things out of gear.  Also, tomorrow my ISP is 
changing its "backbone" supplier, so we are likely to have some service 
interruptions.  Let me just repeat - a big thank you - and I am going back to 
the "drawing board".  

I am familiar with the basics of utf-8, but any clue as to how and where the 
mapping of characters from the 8-bit representation to the two-byte 
representation is made would be very interesting and helpful.  I would very 
much like to try utf-8 on a home installation of PmWiki.
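
For what it is worth, the mapping in question can be sketched in two steps: the charset table turns each 8-bit code into a Unicode code point, and utf-8 then spreads the bits of that code point over two bytes. The Python below is only a sketch under that assumption. The function names are invented here, and it does not show how PmWiki or a browser actually does the work.

```python
def utf8_two_byte(code_point):
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("outside the two-byte UTF-8 range")
    lead = 0b11000000 | (code_point >> 6)            # 110xxxxx: top five bits
    trail = 0b10000000 | (code_point & 0b00111111)   # 10xxxxxx: low six bits
    return bytes([lead, trail])

def latin7_to_utf8(raw):
    """Convert iso-8859-13 bytes to UTF-8, doing the two-byte case by hand."""
    out = bytearray()
    for ch in raw.decode("iso8859_13"):      # step 1: byte -> code point
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)                   # ASCII passes through unchanged
        elif cp <= 0x7FF:
            out += utf8_two_byte(cp)         # step 2: code point -> two bytes
        else:
            out += ch.encode("utf-8")        # e.g. curly quotes need three bytes
    return bytes(out)

if __name__ == "__main__":
    sample = "a\u010di\u016b".encode("iso8859_13")   # one byte per letter
    print(sample.hex(), "->", latin7_to_utf8(sample).hex())
```

So a single keypress byte is never "stretched" directly; it is first looked up in the iso-8859-13 table, and only the resulting code point is written out as utf-8.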

-- 
Algis Kabaila
http://www.pcug.org.au/~akabaila



