[pmwiki-users] transliteration -> unicode markup for Indian languages

Patrick R. Michaud pmichaud at pobox.com
Wed Aug 24 14:48:45 CDT 2005


On Wed, Aug 24, 2005 at 12:18:24PM -0700, Varadarajan Mani-A19487 wrote:
> [...] What I've tried is the following:
> 
> Markup("{T=",'<split','/{T=(.*?)=T}/se', "Tamilize('$1')");
> 
> which converts anything in between {T= and =T} into the Unicode
> characters for Tamil. For example:
> 
> {T= tivviya pirapan^tham =T} 
> 
> would become
> 
> திவ்விய பிரபந்தம்
> 
> It seems to work for the most part, but I'm not sure whether "<split"
> is correct for this type of markup, and whether the markup delimiters
> are advisable.

First, I think the idea and these markup delimiters are excellent.
Seems like a very handy mechanism for writing Tamil.

Where in the markup sequence the text gets Tamilized is largely a
matter of preference (and probably trial and error).  As you have it
above, with "<split" and the "/s" modifier on the pattern, the
{T=...=T} conversion runs before the text is split into lines, so it
will work across multiple lines of text, as in

    {T= tivviya
       pirapan^tham
    =T}

which may or may not be what you want.  
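
To see what the "/s" modifier changes, here is a minimal standalone
sketch outside PmWiki.  It uses preg_replace_callback() rather than the
"/e" modifier (which newer PHP versions no longer support), and
mark_span() is a hypothetical stand-in for Tamilize() that only brackets
the text it receives.

```php
<?php
// Hypothetical stand-in for Tamilize(): brackets whatever it is given.
function mark_span($text) {
    return '[' . trim($text) . ']';
}

// Apply a {T=...=T} pattern to $input, passing the captured text
// to the stand-in converter.
function convert_spans($pattern, $input) {
    return preg_replace_callback($pattern, function ($m) {
        return mark_span($m[1]);
    }, $input);
}

$multiline = "{T= tivviya\n   pirapan^tham\n=T}";

// With /s, "." also matches newlines, so the multi-line span is converted.
$with_s = convert_spans('/{T=(.*?)=T}/s', $multiline);

// Without /s, "." stops at a newline, so the span is left untouched.
$without_s = convert_spans('/{T=(.*?)=T}/', $multiline);
```

If multi-line spans are not wanted, dropping the "/s" is one way to
keep each conversion confined to a single line.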

You may also want/need to add PSS() into the Markup rule, as in

   Markup("{T=",'<split','/{T=(.*?)=T}/se', "Tamilize(PSS('$1'))");

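Otherwise, single and double quotes in the matched text may end up
with unwanted backslashes in front of them, because the "/e" modifier
escapes quotes in $1 before the replacement is evaluated.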
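
The effect can be sketched on its own, without PmWiki.  Here
addslashes() stands in for the escaping that "/e" applies to the
matched text, and strip_quote_slashes() is a hypothetical helper that
assumes PSS() does little more than remove the backslash in front of a
double quote; the real PSS() may differ in detail.

```php
<?php
// Text as the author typed it between {T= and =T}.
$matched = 'say "hello"';

// Stand-in for the escaping "/e" applies to the matched text:
// the double quotes gain a leading backslash.
$escaped = addslashes($matched);

// Hypothetical helper, assumed to behave roughly like PSS():
// remove the backslash in front of each double quote.
function strip_quote_slashes($x) {
    return str_replace('\\"', '"', $x);
}

// The text is back to what the author typed.
$restored = strip_quote_slashes($escaped);
```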

Other than those two thoughts, I think it's a terrific idea
and hope to see a Cookbook recipe from it!

Pm



