[pmwiki-users] Re: Translation [was pmwiki-users] i18n and iso-8859-13

Patrick R. Michaud pmichaud at pobox.com
Thu Apr 7 23:25:26 CDT 2005


On Thu, Apr 07, 2005 at 06:53:12PM +1000, Algis Kabaila wrote:
> Actually, I would like to add (in 
> Lithuanian) simple instructions of making a site accept the Lithuanian 
> characters and operate with utf-8 encoding, if there are no objections.

If you're asking about this for the PmWikiLt.* group on pmwiki.org,
there aren't any objections.  My general rule is that whoever is doing
the work of building/maintaining the language translations gets to
decide the contents and structure of those translations.  :-)

> Translation of some words, viz. "by" is difficult because its meaning is 
> dependent on context.   I thought that the best was not to attempt 
> translation of "by" at all, in order to avoid the "hydraulic  hammer" 
> becoming "water sheep".  I see that a note on the XLPage tells to delete 
> items that are not translated.  Can we just comment it out with # or will 
> that not work?

It can be commented out with #, left empty, or even removed altogether.

> Also, would you mind if I ask a 'non-wiki' question - how to 
> correctly specify utf-8 encoding in a web page?  (I have used meta 
> tags under the wrong impression that this was a standard way 
> of doing it. Currently my **wrong** header looks like this:
>
>  <meta content="text/html; charset=utf-8" http-equiv="content-type">  ).  

Well, meta tags are just one way of doing it.  The HTML 4.01 standard
identifies three mechanisms, highest priority to lowest:

   1. An HTTP charset parameter in a Content-Type field
   2. A <meta> declaration with "http-equiv" set to "Content-Type"
   3. The charset attribute set on an element specifying an external resource

So, the best and recommended way to specify the encoding is in the 
webserver configuration itself, not in the individual web pages.  
Unfortunately in your case, Apache was (mis)configured to always
specify a charset parameter, which overrides any <meta> tag
the HTML document might have in it.
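
To make the precedence concrete, here is a minimal Python sketch of the
first two mechanisms.  It is only an illustration: the function and class
names are made up for this example, it ignores the third mechanism (the
charset attribute on a linking element), and it is not how any browser
actually implements the rule.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the parameter lowercased, or None.
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: record the charset of the first
    <meta http-equiv="Content-Type"> tag that declares one."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_from_content_type(attributes.get("content"))


def effective_charset(http_content_type, html_text):
    """The HTTP header wins; the <meta> tag is consulted only if the
    header carries no charset parameter."""
    from_header = charset_from_content_type(http_content_type)
    if from_header:
        return from_header
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = ('<html><head><meta content="text/html; charset=utf-8" '
        'http-equiv="content-type"></head><body></body></html>')

# Server sends its own charset: the <meta> tag is ignored.
print(effective_charset("text/html; charset=ISO-8859-1", page))
# Server sends no charset: the <meta> tag takes effect.
print(effective_charset("text/html", page))
```

With a header that carries a charset, the first call reports the server's
value even though the page itself declares utf-8, which is the situation
described above.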

Anyway, to answer your question, the standard, W3C-recommended way to 
specify utf-8 for documents is to configure Apache to do it.  There
are several ways to do this.  If all of the documents in a directory are
known to be utf-8, then the .htaccess file can contain

    AddDefaultCharset  UTF-8

If Apache has been configured with content negotiation and the standard
charsets, then one can also specify utf-8 for a document by adding a 
".utf8" extension (no hyphen) onto its filename, such as "mydoc.html.utf8".
Other encodings might use ".iso8859-1", ".cp-1251", etc.

However, if it's not feasible to configure Apache for the charset
encodings, then one can tell Apache not to send any charset 
parameter with

    AddDefaultCharset  Off

and then the browser will use any encoding specified by the document's
<meta http-equiv="Content-Type" ... /> tag.

Hope this helps,

Pm


