[pmwiki-users] Greek Utf-8
Seth Cherney
sethcherney at yahoo.com
Wed Feb 14 03:03:23 CST 2007
--- "Patrick R. Michaud" <pmichaud at pobox.com> wrote:
> On Tue, Feb 13, 2007 at 08:08:09AM -0800, Seth Cherney wrote:
> > It just does not work.
> >
> > The page, even if declared, is still *not truly* encoded in utf-8.
> > Saxon will still report an error (browsers couldn't care less; they
> > don't work at the same low level of processing, as far as I can tell).
> >
> > Unless I have a header such as:
> >
> > <?xml version="1.0" encoding="iso-8859-1"?>
> > <!DOCTYPE html
> > PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> > "xhtml1-transitional.dtd">
> > ...
> >
> > It is full of byte errors (i.e. the server is outputting ISO bytes, not UTF-8).
>
> Note that neither the webserver nor PmWiki does any form of
> automatic character encoding conversions. So, if the text was
> originally entered and encoded as iso-8859-1, then that's
> what PmWiki will output, even if the header says otherwise.
There is something quirky on a Windows box: the text is automatically converted from native UTF-8
to the &#nnn; format. I posted an example at http://www.pmwiki.org/wiki/UTF8/GreekDiacritics. The
example there holds for a native UTF-8 *text* file pasted into PmWiki on my box, and I am fairly
certain of the behavior. It did not always do this: when I first installed 2.1.something,
it saved in native UTF-8... It does exactly the same thing on my playground at
http://www.xrisma.org/coop, which is on an OpenBSD box at aplus.net (uploaded from my Windows
box). Even if I create a new page there, the result is the same, so PmWiki is creating pages in a
format that has the same characteristics whether the text comes from pmwiki.org or from a local
native UTF-8 txt file. The install there is still 2.1.x. My LOCAL config file has:
include_once('scripts/urlapprove.php');
...
include_once('$FarmD/scripts/xlpage-utf-8.php');
include_once("$FarmD/cookbook/spellchecker.php"); (My Java one that I can't figure out how to
explain yet, due to config weirdness with Apache and Tomcat.)
include_once("$FarmD/cookbook/StaticPages.php");
include_once("scripts/trails.php");
include_once("$FarmD/cookbook/zap.php");
include_once("$FarmD/cookbook/zapplus.php");
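
To check what a saved page really contains, independent of what the browser shows, a small
script along these lines might help. It is only a sketch in Python (not PmWiki code), and the
function name describe_page_bytes is made up for illustration. It reports whether the raw bytes
decode as UTF-8 and how many decimal &#nnn; references they hold:

```python
import re

# Matches decimal numeric character references such as &#945;
NUMERIC_REF = re.compile(r"&#(\d+);")


def describe_page_bytes(raw: bytes) -> dict:
    """Report whether raw bytes decode as UTF-8 and count &#nnn; references.

    Hypothetical helper for illustration; it knows nothing about PmWiki's
    page file layout and simply inspects the bytes it is given.
    """
    try:
        text = raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails.
        text = raw.decode("iso-8859-1")
        valid_utf8 = False
    return {
        "valid_utf8": valid_utf8,
        # Pure ASCII is valid UTF-8 too, so report non-ASCII bytes separately.
        "has_non_ascii": any(b > 0x7F for b in raw),
        "numeric_refs": len(NUMERIC_REF.findall(text)),
    }
```

A page stored as native UTF-8 Greek should come back with valid_utf8 and has_non_ascii both
true and no numeric references; a page stored in the &#nnn; form should be pure ASCII with a
non-zero reference count.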
>
>
> > PS: any tips on converting between true UTF-8 and the &#nnn;
> > sequence in $ROSPatterns and/or in markups? I will write a
> > verbose one if necessary, unless someone knows any type of command/shorthand.
>
> I'm not certain which way you're wanting the translation to
> go. When saving a page, do you want utf-8 characters to be
> converted into the &#nnn; counterparts, or vice-versa?
The only drawback to the &#nnn; format is that Greek is hard to edit once it has been entered,
since the edit screen shows it in this format. I can live with it for now, but it would be better
to have everything saved in native UTF-8 so that it can be read on edit, with a markup script to
convert to &#nnn; on display so that it can be indexed properly. It also makes me quite uneasy,
since I am posting 500,000 pages of text and don't want things to flip out on me once people
start editing; most likely I would be unable to recover. I will be adding 9000
Greek books within the next 2 years, so I guess this is critical :).
It seems that, for safety's sake and future compatibility, I would probably need a $ROSPatterns
entry for the native UTF-8, and also a markup to convert back to &#nnn;.
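
To make the two directions concrete, here is a minimal sketch of the conversion itself, written
in Python rather than as a PmWiki pattern or markup rule. The function names to_numeric_refs and
from_numeric_refs are invented for illustration, and it assumes the references are decimal
(&#955; rather than &#x3bb;):

```python
import re

# Matches decimal numeric character references such as &#945;
NUMERIC_REF = re.compile(r"&#(\d+);")


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with its decimal &#nnn; reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def from_numeric_refs(text: str) -> str:
    """Replace decimal &#nnn; references for non-ASCII characters with the character."""

    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        # Leave ASCII references (e.g. &#38; for "&") untouched so that
        # markup-significant characters stay escaped; also skip values that
        # are out of range or fall in the surrogate block.
        if code < 128 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)

    return NUMERIC_REF.sub(replace, text)
```

For one direction Python already has a built-in equivalent:
text.encode("ascii", "xmlcharrefreplace").decode("ascii") gives the same result as
to_numeric_refs(text). Because ASCII references are left alone, the round trip is lossless only
for text that did not already contain literal &#nnn; sequences for non-ASCII characters.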
Thanks for your time as always, Seth
>
> Pm
>