[pmwiki-users] Greek Utf-8

Seth Cherney sethcherney at yahoo.com
Wed Feb 14 03:03:23 CST 2007


--- "Patrick R. Michaud" <pmichaud at pobox.com> wrote:

> On Tue, Feb 13, 2007 at 08:08:09AM -0800, Seth Cherney wrote:
> > It just does not work.
> > 
> > The page, even if declared, is still *not truly* encoded in utf-8.  
> > saxon will still have an error (browsers could care less, they 
> > dont work on the same low level processing as far as I can tell).
> > 
> > Unless I have a header such as:
> > 
> > <?xml version="1.0" encoding="iso-8859-1"?>
> > <!DOCTYPE html 
> >     PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
> >     "xhtml1-transitional.dtd">  
> > ...
> > 
> > It is full of byte errors. (ie the server is outputing iso bytes, not utf-8).
> 
> Note that neither the webserver nor PmWiki do any form of
> automatic character encoding conversions.  So, if the text was
> originally entered and encoded as iso-8859-1, then that's
> what PmWiki will output, even if the header says otherwise.

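To illustrate the point above, that the declared encoding and the actual bytes are independent, here is a minimal Python sketch. It is not PmWiki code, and the function name `is_valid_utf8` and the sample strings are made up for illustration. It checks whether a byte string really decodes as UTF-8, which is roughly the kind of check that makes a strict XML parser report byte errors while a browser carries on.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The Greek word "logos", written with escapes so the sample is unambiguous.
word = "\u03bb\u03cc\u03b3\u03bf\u03c2"

utf8_bytes = word.encode("utf-8")                 # genuine UTF-8 bytes
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # a lone 0xE9 byte is not valid UTF-8

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

A page whose header says utf-8 but whose stored bytes look like the second sample would fail this check regardless of what the header declares.
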
There is something quirky on a Windows box.  The text is automatically converted from native UTF-8
to the &#nnn; format.  I posted an example at http://www.pmwiki.org/wiki/UTF8/GreekDiacritics.  The
same thing happens when I paste a native UTF-8 *text* file into PmWiki on my box, so I am fairly
certain of the behavior.  It did not always do this: when I first installed 2.1.something,
it saved in native UTF-8.  It does exactly the same thing on my playground at
http://www.xrisma.org/coop, which is on an OpenBSD box at aplus.net (uploaded from my Windows
box).  Even if I create a new page there, the result is the same, whether the text comes from
pmwiki.org or from a local native UTF-8 text file.  The install there is still 2.1.x.
My LOCAL config file has:

include_once('scripts/urlapprove.php');

...

include_once('$FarmD/scripts/xlpage-utf-8.php');
include_once("$FarmD/cookbook/spellchecker.php"); (My Java one that I can't figure out how to
explain yet, due to config weirdness with Apache and Tomcat.)
include_once("$FarmD/cookbook/StaticPages.php");
include_once("scripts/trails.php");
include_once("$FarmD/cookbook/zap.php");
include_once("$FarmD/cookbook/zapplus.php"); 
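
For illustration only, here is one way text can end up in the &#nnn; format: when Greek characters are encoded into a legacy charset such as iso-8859-1, which cannot represent them, they can be replaced by numeric character references. Whether this is what happens on the installs described above is an assumption, not something confirmed in this thread. The sketch is Python rather than PmWiki code, and `to_legacy_bytes` is a hypothetical helper name.

```python
def to_legacy_bytes(text: str, charset: str = "iso-8859-1") -> bytes:
    """Encode text into a legacy charset, replacing characters the
    charset cannot represent with &#nnn; numeric references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# The Greek word "logos", written with escapes so the sample is unambiguous.
word = "\u03bb\u03cc\u03b3\u03bf\u03c2"

print(to_legacy_bytes(word))   # b'&#955;&#972;&#947;&#959;&#962;'
print(word.encode("utf-8"))    # b'\xce\xbb\xcf\x8c\xce\xb3\xce\xbf\xcf\x82'
```

The first line is what the stored page would look like in the &#nnn; format; the second is the same word saved as native UTF-8.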
 
> 
> 
> > PS:  any tips on converting between true utf-8 and the #nnn; 
> > sequence in ROSpatterns and/or in markups?  I will write a 
> > verbose if necessary, unless someone knows any type of command/shorthand.
> 
> I'm not certain which way you're wanting the translation to
> go.  When saving a page, do you want utf-8 characters to be
> converted into the &#nnn; counterparts, or vice-versa?

The only drawback to the &#nnn; format is that Greek is hard to edit once it has been entered,
since the edit screen shows the text in this format.  I can live with it for now, but it would be
better to have everything saved in native UTF-8 so that it is readable on edit, with a markup
script to convert to &#nnn; on display so that it can be indexed properly.  It also makes me quite
uneasy, since I am posting 500,000 pages of text and don't want things to flip out on me once
people start editing.  Most likely I would be unable to recover.  I will be adding 9000
Greek books within the next 2 years, so I guess this is critical :).

It seems that, for safety's sake and future compatibility, I would probably need a ROSpattern for
the native UTF-8, and also a markup to convert back to &#nnn;.
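
As a rough sketch of the two conversions described above (native UTF-8 text to &#nnn; and back), here is a small Python example. It is not a ROSpattern or PmWiki markup rule, only the underlying string transformation; the function names `utf8_to_entities` and `entities_to_utf8` are hypothetical, and it assumes only decimal references of the form &#nnn; are in use.

```python
import re

# Matches decimal numeric character references such as &#955;
_ENTITY = re.compile(r"&#(\d+);")


def utf8_to_entities(text: str) -> str:
    """Replace every non-ASCII character with its &#nnn; reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def entities_to_utf8(text: str) -> str:
    """Replace every &#nnn; reference with the character it names.
    Assumes each number is a valid Unicode code point."""
    return _ENTITY.sub(lambda m: chr(int(m.group(1))), text)


# The Greek word "logos", written with escapes so the sample is unambiguous.
word = "\u03bb\u03cc\u03b3\u03bf\u03c2"

encoded = utf8_to_entities(word)
print(encoded)                             # &#955;&#972;&#947;&#959;&#962;
print(entities_to_utf8(encoded) == word)   # True
```

One caveat with this approach: a reference that someone typed on purpose (to show the literal text of an entity, say) is also converted on the way back, so the round trip is only lossless for text that started out as native UTF-8.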

Thanks for your time as always, Seth

> 
> Pm
> 



 


