[pmwiki-users] i18n and iso-8859-13
Algis Kabaila
akabaila at pcug.org.au
Sat Apr 2 21:59:27 CST 2005
On Sunday 03 April 2005 02:01, Patrick R. Michaud wrote:
> On Sat, Apr 02, 2005 at 08:01:17PM +1000, Algis Kabaila wrote:
> >
> > Thank you for the answers. I suspect that, at least in part, mapping is
> > accomplished by the web server, when it is invoked. I base this opinion
> > (and it is only an opinion, not knowledge) on the following observation:
> > [...]
>
> Mappings aren't performed directly by the webserver, but the webserver
> *is* supposed to inform the browser of the correct mapping to be used,
> via the Content-Type header. I suspect that what you're seeing is the
> result of a webserver not sending an appropriate header. Let's walk
> through your diagnosis and see if that explains things...
>
> > I have a sample of Lithuanian text in my home page
> > (http://www.pcug.org.au/~akabaila) on a separate HTML page lituanus.html.
> > I recently edited it at home, specifying iso-8859-13. I used SuSE9.2 (kde
> > 3.3), Konqueror and Kate for editing and testing. It all went fine - I
> > could see the correct glyphs in their correct places. It confirms your
> > suggestion that the browser does the mapping.
> >
> > Before uploading, I thought it would be worthwhile to look at the page on
> > my home "server" that runs Apache 2.0.49 as a web page. (At present my
> > Apache is still "out of the box", without any re-configuration at all.)
> > All glyphs were wrong - I think they were from the iso-8859-1 space. That
> > suggests that the web server does at least influence the mapping.
>
> Actually, an out-of-the-box Apache is likely to be specifying a charset.
> For example, on my FC3 server the default Apache configuration specifies
>
> AddDefaultCharset UTF-8
>
> which tells Apache to put "UTF-8" in the Content-Type header in the absence
> of any other information from the filename. And according to
> http://httpd.apache.org/docs-2.0/mod/core.html#adddefaultcharset,
> this will also override any charset specified in the body of the
> document via a <meta> element.
>
> Now then, a browser will tend to trust the charset given by the
> webserver's Content-Type: header in preference to a <meta>
> element, so the browser displays the document as though it were
> UTF-8 encoded.
>
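As a rough illustration of why the page comes out garbled when the wrong charset is announced, here is a minimal Python sketch (not anything taken from PmWiki or Apache; the sample string is an assumption chosen only for the demonstration). It decodes the same bytes under each charset a browser might be told to use:

```python
# Sample Lithuanian text (an assumption for illustration only).
text = "Ačiū, žąsų ūkis"

# The bytes as an editor would save them when told to use iso-8859-13.
raw = text.encode("iso-8859-13")

# Decoding with the charset the file was written in recovers the text.
as_latin7 = raw.decode("iso-8859-13")

# Decoding as iso-8859-1 never fails, but maps the same bytes to other glyphs.
as_latin1 = raw.decode("iso-8859-1")

# Decoding as UTF-8 hits invalid byte sequences; errors="replace" marks them
# with U+FFFD, roughly what a browser shows as replacement characters.
as_utf8 = raw.decode("utf-8", errors="replace")

for label, shown in [
    ("iso-8859-13", as_latin7),
    ("iso-8859-1", as_latin1),
    ("utf-8", as_utf8),
]:
    print(f"{label:12} {shown}")
```

The bytes never change in any of the three cases; only the table used to turn them into characters does, which is the sense in which the mapping belongs to the browser.
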
> To see this at work, I copied the lituanus.html document onto my
> server, at http://www.pmichaud.com/sandbox/lituanus.html. It displays
> incorrectly in my version of Firefox, because Firefox has been told
> by the webserver to display the document as UTF-8. (Tools => Page Info)
>
> However, if I create a sandbox2/ directory, and place the following
> in sandbox2/.htaccess:
>
> AddDefaultCharset Off
>
> then Apache won't put a charset parameter on the Content-Type header.
> Firefox then uses the encoding it finds in the <meta> tag, and everything
> displays correctly (http://www.pmichaud.com/sandbox2/lituanus.html).
>
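The precedence described above can be sketched in Python as a deliberately simplified model. This is not how any real browser is implemented, and the function names `header_charset`, `meta_charset` and `effective_charset` are hypothetical, made up for this example. The idea is only that a charset in the Content-Type header wins, and the <meta> element is consulted when the header names none:

```python
import re
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(body):
    """Return the charset named in a <meta> element of the body, or None."""
    match = re.search(rb"<meta[^>]+charset=[\"']?([\w-]+)", body, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body):
    """Prefer the HTTP header; fall back to <meta> only when it names none."""
    return header_charset(content_type) or meta_charset(body)


page = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=iso-8859-13"></head></html>'
)

# Header carries a charset (the AddDefaultCharset UTF-8 case): header wins.
print(effective_charset("text/html; charset=UTF-8", page))
# Header carries no charset (the AddDefaultCharset Off case): <meta> is used.
print(effective_charset("text/html", page))
```
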
> So, the mapping itself is still performed entirely by the browser; the
> webserver is just telling the browser to use the wrong mapping.
>
> > In desperation, regardless
> > of the bad glyphs, I decided to upload lituanus.html to my ISP (TIP),
> > which runs Apache 1.xx, probably expertly configured by the web gurus.
>
> I suspect they have AddDefaultCharset Off. This would likely be true
> if they built Apache from original sources (this is Apache's default),
> rather than using the version that came with their Linux distro.
>
> > [...]
> > It all sounds logical and reasonable so far. Now for the "unreasonable"
> > bit: I configured the home PmWiki for Lithuanian characters by including
> > the following line in the .../local/config.php file:
> > [...]
> > Well, in spite of my Apache not being able to display lituanus.html
> > correctly, the PmWiki, running on the same unconfigured Apache, displays
> > Lithuanian glyphs correctly. I am happy about it, but why is it so? That
> > is the real question that I cannot answer and that "blows out of the
> > water" my tentative conclusions.
>
>
> ... because PmWiki directly sets the Content-Type header that the
> webserver sends back, as opposed to using the <meta> tag to do it.
> The xlpage-iso-8859-13.php file does
>
> $HTTPHeaders[] = "Content-type: text/html; charset=iso-8859-13";
>
> which eventually becomes a PHP header() call that modifies the
> HTTP responses returned by the webserver. Since there's an explicit
> Content-Type header, Apache doesn't supply one (with the incorrect
> encoding), and everything works.
>
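For readers more at home in Python than PHP, the same idea (the application, rather than the server's default, declares the charset) can be sketched as a tiny WSGI application. This is only an analogy under stated assumptions: it is not PmWiki code, the name `app` and the sample text are made up, and it says nothing about how a particular server treats the header afterwards.

```python
def app(environ, start_response):
    """Tiny WSGI application that declares its own charset in the header."""
    body = "Ačiū".encode("iso-8859-13")
    headers = [
        ("Content-Type", "text/html; charset=iso-8859-13"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the application directly, without a real server, to inspect what it
# asks to have sent.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = app({"REQUEST_METHOD": "GET"}, start_response)
print(captured["headers"]["Content-Type"])
```
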
> > Also, I looked with Konqueror at the code that PmWiki produces on the
> > pmwiki.org site, but cannot see any "charset=xxxx" specification. Where
> > is it? Is it in CSS and if so how can I access it?
>
> It's in the HTTP response headers (where it's supposed to be according
> to the relevant standards). The <meta http-equiv='...'> tag that many
> HTML documents use is just a workaround that was developed for those
> cases where one didn't want to (or couldn't) reconfigure the webserver
> for a different character encoding.
>
> > Any pointers to:
> > 1. How to install utf-8 with another language (for me, Lithuanian) into
> > PmWiki;
>
> This already exists in PmWiki -- all one has to do is specify
>
> 'xlpage-i18n' => 'utf-8',
>
> in the PmWikiLt.XLPage file. This tells PmWiki to load
> the scripts/xlpage-utf-8.php file (which configures PmWiki for
> dealing with utf-8 encoded documents). Similarly, if you wanted
> to do things in iso-8859-13, you would do
>
> 'xlpage-i18n' => 'iso-8859-13',
>
> in the XLPage and this tells PmWiki to load scripts/xlpage-iso-8859-13.php.
> However, I strongly recommend going with utf-8 if at all possible,
> and would prefer the PmWikiLt.* pages on pmwiki.org be done in
> utf-8 instead of iso-8859-13.
>
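If existing iso-8859-13 pages need to be moved over to utf-8, the conversion itself is mechanical. A minimal Python sketch follows; the helper names `recode` and `recode_file` are hypothetical, and it assumes the source really is iso-8859-13 throughout:

```python
from pathlib import Path


def recode(data, source="iso-8859-13", target="utf-8"):
    """Re-encode bytes from one charset to another via Unicode text."""
    return data.decode(source).encode(target)


def recode_file(src, dst, source="iso-8859-13", target="utf-8"):
    """Read src as bytes, re-encode, and write the result to dst."""
    Path(dst).write_bytes(recode(Path(src).read_bytes(), source, target))


legacy = "Ačiū".encode("iso-8859-13")
converted = recode(legacy)
print(len(legacy), len(converted))
```

Something like `recode_file("lituanus.html", "utf8lituanus.html")` would then produce the utf-8 copy. Any charset named in a <meta> element inside the file is not touched by this and would need to be updated by hand.
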
> > 2. How to configure Apache 2.xx, and where and how to learn more
> > about the Apache web server and configure it with iso-8859-13 and
> > utf-8, preferably without having to read 1000 pages of docs that
> > cover other aspects of the web server.
>
> http://httpd.apache.org/docs/mod/core.html#adddefaultcharset
> is a good starting point, but I suspect that what you ultimately
> want is to set
>
> AddDefaultCharset off
>
> which tells Apache to leave any charset specification to the
> document itself.
>
> Hope this helps.
It certainly does help - tremendously! It also sets some problems to solve.
Where I could, I confirmed your findings and, not surprisingly, I find your
statements 100% correct.
I checked your copy of lituanus.html and yes, it does display Lithuanian
diacriticals correctly. I want to try to reproduce all your results on my
own HAN (Home Area Network) and then to switch over to utf-8.
In my home page at http://www.pcug.org.au/~akabaila, I also have a Lithuanian
text in utf-8 format in a file called utf8lituanus.html. I will see if I can
upload it to your sandbox to see if that displays correctly.
At my age, with noticeable signs of short term memory loss, I find that I make
more mistakes than "normal", so I need to document my tests more carefully
before reporting. And that means that I will need some time to be able to
report anything remotely useful.
So this is just a short note of appreciation,
Al.
>
> Pm
>
>
--
Algis Kabaila
http://www.pcug.org.au/~akabaila