[pmwiki-users] i18n and iso-8859-13

Algis Kabaila akabaila at pcug.org.au
Sat Apr 2 21:59:27 CST 2005


On Sunday 03 April 2005 02:01, Patrick R. Michaud wrote:
> On Sat, Apr 02, 2005 at 08:01:17PM +1000, Algis Kabaila wrote:
> > 
> > Thank you for the answers.  I suspect that, at least in part, mapping is
> > accomplished by the web server, when it is invoked.  I base this opinion
> > (and it is only an opinion, not knowledge) on the following observation:
> > [...]
> 
> Mappings aren't performed directly by the webserver, but the webserver
> *is* supposed to inform the browser of the correct mapping to be used,
> via the Content-Type header.  I suspect that what you're seeing is the 
> result of a webserver not sending an appropriate header.  Let's walk 
> through your diagnosis and see if that explains things...
> 
> > I have a sample of Lithuanian text in my home page
> > (http://www.pcug.org.au/~akabaila) on a separate HTML page lituanus.html.
> > I recently edited it at home, specifying iso-8859-13.  I used SuSE 9.2
> > (KDE 3.3), Konqueror and Kate for editing and testing.  It all went fine -
> > I could see the correct glyphs in their correct places.  It confirms your
> > suggestion that the browser does the mapping.
> > 
> > Before uploading, I thought it would be worthwhile to look at the page on
> > my home "server", which runs Apache 2.0.49, as a web page.  (At present my
> > Apache is still "out of the box", without any re-configuration at all.)
> > All glyphs were wrong - I think they were from the iso-8859-1 space.  That
> > suggests that the web server does at least influence the mapping.
> 
> Actually, an out-of-the-box Apache is likely to be specifying a charset.  
> For example, on my FC3 server the default Apache configuration specifies
> 
>     AddDefaultCharset UTF-8
> 
> which tells Apache to put "UTF-8" in the Content-Type header in the absence
> of any other information from the filename.  And according to 
> http://httpd.apache.org/docs-2.0/mod/core.html#adddefaultcharset,
> this will also override any charset specified in the body of the 
> document via a <meta> element.
> 
> Now then, a browser will tend to trust the charset given by the
> webserver's Content-Type: header in preference to a <meta>
> element, so the browser displays the document as though it were
> UTF-8 encoded.
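
A minimal sketch of the precedence described above, in Python: take the
charset from the Content-Type header when there is one, otherwise fall back
to the <meta> element.  The function names and the sample page are
assumptions for illustration, and real browsers apply more rules than this
(byte-order marks, user overrides).

```python
import re
from email.message import Message

# Hypothetical sample page that declares its charset only via <meta>.
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-13"></head></html>')

META_RE = re.compile(rb'<meta[^>]*charset=["\']?([\w.:-]+)', re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(body):
    """Return the charset named in a <meta> element, or None."""
    match = META_RE.search(body)
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, body):
    """Prefer the HTTP header; use <meta> only if the header has no charset."""
    return header_charset(content_type) or meta_charset(body)


# Server configured with AddDefaultCharset UTF-8: the header wins.
print(effective_charset("text/html; charset=UTF-8", page))
# Server configured with AddDefaultCharset Off: the <meta> element is used.
print(effective_charset("text/html", page))
```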
> 
> To see this at work, I copied the lituanus.html document onto my
> server, at http://www.pmichaud.com/sandbox/lituanus.html.  It displays
> incorrectly in my version of Firefox, because Firefox has been told
> by the webserver to display the document as UTF-8.  (Tools => Page Info)
> 
> However, if I create a sandbox2/ directory, and place the following
> in sandbox2/.htaccess:
> 
>     AddDefaultCharset Off
> 
> then Apache won't put a charset parameter on the Content-Type header.
> Firefox then uses the encoding it finds in the <meta> tag, and everything 
> displays correctly (http://www.pmichaud.com/sandbox2/lituanus.html).
> 
> So, the mapping itself is still performed entirely by the browser; the
> webserver is just telling the browser to use the wrong mapping.
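
To illustrate that the bytes never change and only the mapping applied to
them does, here is a small Python sketch.  The sample string is an assumption
(it is not the content of lituanus.html), and render() is a hypothetical
stand-in for what a browser does once it has picked a charset.

```python
# Lithuanian letters used purely as sample text (an assumption).
text = "ąčęėįšųūž"

# The bytes as they would sit in a file saved as iso-8859-13.
raw = text.encode("iso-8859-13")


def render(raw_bytes, charset):
    """Decode bytes under a given charset label, as a browser would."""
    return raw_bytes.decode(charset, errors="replace")


# Same bytes, three different mappings; only the first restores the text.
for charset in ("iso-8859-13", "iso-8859-1", "utf-8"):
    print(charset, "->", render(raw, charset))
```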
> 
> > In desperation, regardless of the bad glyphs, I decided to upload
> > lituanus.html to my ISP (TIP), which runs Apache 1.xx, probably expertly
> > configured by the web gurus.
> 
> I suspect they have AddDefaultCharset Off.  This would likely be true
> if they built Apache from original sources (this is Apache's default), 
> rather than using the version that came with their Linux distro.  
> 
> > [...]
> > It all sounds logical and reasonable so far.  Now for the "unreasonable"
> > bit: I configured the home PmWiki for Lithuanian characters by adding the
> > following line to the .../local/config.php file:
> > [...]
> > Well, in spite of my Apache not being able to display lituanus.html
> > correctly, the PmWiki running on the same unconfigured Apache displays
> > Lithuanian glyphs correctly.  I am happy about it, but why is it so?  That
> > is the real question that I cannot answer and that "blows out of the
> > water" my tentative conclusions.
> 
> 
> ... because PmWiki directly sets the Content-Type header that the
> webserver sends back, as opposed to using the <meta> tag to do it.
> The scripts/xlpage-iso-8859-13.php file does
> 
>     $HTTPHeaders[] = "Content-type: text/html; charset=iso-8859-13";
> 
> which eventually becomes a PHP header() call that modifies the
> HTTP responses returned by the webserver.  Since there's an explicit
> Content-Type header, Apache doesn't supply one (with the incorrect
> encoding), and everything works.
> 
> > Also, I looked with Konqueror at the code that PmWiki produces on the
> > pmwiki.org site, but cannot see any "charset=xxxx" specification.  Where
> > is it?  Is it in CSS, and if so how can I access it?
> 
> It's in the HTTP response headers (where it's supposed to be according 
> to the relevant standards).  The <meta http-equiv='...'> tag that many
> HTML documents use is just a workaround that was developed for those
> cases where one didn't want to (or couldn't) reconfigure the webserver
> for a different character encoding.
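
One way to see the charset that lives in the response headers rather than in
the page source is to look at the headers directly.  The Python sketch below
is only an illustration under stated assumptions: it starts a throwaway local
server (the Handler class and the sample body are hypothetical) that sets an
explicit Content-Type, much as an application-set header would, and then
reads that header back on the client side.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample body (an assumption), stored as iso-8859-13 bytes.
BODY = "ąčęėįšųūž".encode("iso-8859-13")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The charset is declared in the HTTP header, not in the body.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=iso-8859-13")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def fetch(host, port, path="/"):
    """Return (Content-Type header, body decoded with the header's charset)."""
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        charset = resp.headers.get_content_charset()
        return resp.getheader("Content-Type"), resp.read().decode(charset)
    finally:
        conn.close()


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    content_type, text = fetch("127.0.0.1", server.server_port)
finally:
    server.shutdown()
    server.server_close()

print(content_type)
print(text)
```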
> 
> > Any pointers to:
> > 1. How to install utf-8 with another language (for me, Lithuanian) into 
> > PmWiki;
> 
> This already exists in PmWiki -- all one has to do is specify
> 
>    'xlpage-i18n' => 'utf-8',
> 
> in the PmWikiLt.XLPage file.  This tells PmWiki to load 
> the scripts/xlpage-utf-8.php file (which configures PmWiki for
> dealing with utf-8 encoded documents).  Similarly, if you wanted
> to do things in iso-8859-13, you would do
> 
>    'xlpage-i18n' => 'iso-8859-13',
> 
> in the XLPage and this tells PmWiki to load scripts/xlpage-iso-8859-13.php.
> However, I strongly recommend going with utf-8 if at all possible,
> and would prefer the PmWikiLt.* pages on pmwiki.org be done in 
> utf-8 instead of iso-8859-13.  
> 
> > 2. How to configure Apache 2.xx and where and how learn more 
> > about Apache web server, and configure it with iso-8859-13 and 
> > utf-8, preferably not having to read 1000 pages of  docs that 
> > cover other aspects of the web server.
> 
> http://httpd.apache.org/docs/mod/core.html#adddefaultcharset
> is a good starting point, but I suspect that what you ultimately
> want is to set
> 
>     AddDefaultCharset off
> 
> which tells Apache to leave any charset specification to the
> document itself.
> 
> Hope this helps.

It certainly does help - tremendously!  It also sets some problems to solve.  
Where I could, I confirmed your findings and, not surprisingly, I find your 
statements 100% correct.  

I checked your copy of lituanus.html and yes, it does display Lithuanian 
diacriticals correctly.  I want to try to reproduce all your results on my 
own HAN (Home Area Network) and then to switch over to utf-8.

In my home page at http://www.pcug.org.au/~akabaila, I also have a Lithuanian 
text in utf-8 format in a file called utf8lituanus.html.  I will see if I can 
upload it to your sandbox to see if that displays correctly.

At my age, with noticeable signs of short term memory loss, I find that I make 
more mistakes than "normal", so I need to document my tests more carefully 
before reporting.  And that means that I will need some time to be able to 
report anything remotely useful.

So this is just a short note of appreciation,

Al.
> 
> Pm
> 
> 

-- 
Algis Kabaila
http://www.pcug.org.au/~akabaila


