[pmwiki-users] transliteration -> unicode markup for Indian languages
    Patrick R. Michaud 
    pmichaud at pobox.com
       
    Wed Aug 24 14:48:45 CDT 2005
    
    
  
On Wed, Aug 24, 2005 at 12:18:24PM -0700, Varadarajan Mani-A19487 wrote:
> [...] What I've tried is the following:
> 
> Markup("{T=",'<split','/{T=(.*?)=T}/se', "Tamilize('$1')");
> 
> which converts anything in between {T= and =T} into the Unicode
> characters for Tamil. For example:
> 
> {T= tivviya pirapan^tham =T} 
> 
> would become
> 
> திவ்விய பிரபந்தம்
> 
> It seems to work for the most part, but I'm not sure whether "<split"
> is correct for this type of markup, and whether the markup delimiters
> are advisable.

First, I think the idea and these markup delimiters are excellent.
Seems like a very handy mechanism for writing Tamil.
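
Where in the markup sequence the text should be Tamilized is largely a
matter of preference (and probably trial and error).  As you have it
above, with "<split" (so the rule sees the page text before it is
broken into lines) and the "/s" on the pattern (so "." also matches
newlines), the {T=...=T} conversions will work across multiple lines
of text, as in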
    {T= tivviya
       pirapan^tham
    =T}
which may or may not be what you want.  
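
A quick plain-PHP check makes the effect of "/s" concrete. This is a
sketch, not PmWiki code: it calls preg_match() directly instead of going
through Markup(), escapes the braces in the pattern for readability, and
drops "/e", which only affects the replacement step.

```php
<?php
// The three-line example from above, as a single string.
$text = "{T= tivviya\n   pirapan^tham\n=T}";

// With /s, "." also matches newlines, so the lazy (.*?) can run across
// line breaks until it reaches the first "=T}".
$withS = preg_match('/\{T=(.*?)=T\}/s', $text, $m1);

// Without /s, "." stops at each newline, so the multi-line form never matches.
$withoutS = preg_match('/\{T=(.*?)=T\}/', $text);

echo "with /s:    $withS\n";    // with /s:    1
echo "without /s: $withoutS\n"; // without /s: 0
```

Without "/s", only {T=...=T} spans that sit on a single line would be
converted.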

You may also want (or need) to wrap $1 in PSS() in the Markup rule, as in
   Markup("{T=",'<split','/{T=(.*?)=T}/se', "Tamilize(PSS('$1'))");
Otherwise, single and double quotes may end up with unwanted
backslashes in front of them, since the "/e" modifier escapes quotes
in the matched text before the replacement is evaluated.
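
The backslashes that PSS() cleans up are a side effect of "/e", which
was deprecated in PHP 5.5 and removed in PHP 7.0. On a current PHP the
same conversion can be sketched with preg_replace_callback(), whose
callback receives the captured text without added escapes. This calls
PHP directly rather than PmWiki's Markup(), and Tamilize() and
TamilizeMatch() below are hypothetical stand-ins, since the real
transliteration function isn't shown in this thread.

```php
<?php
// Hypothetical stand-in for the real Tamilize(): it only wraps the
// trimmed text in brackets so the result is easy to check. A real
// version would map transliterated syllables to Tamil code points.
function Tamilize($text) {
    return '[' . trim($text) . ']';
}

// Hypothetical callback: $m[1] is the text between {T= and =T},
// passed exactly as written, so quotes arrive without backslashes.
function TamilizeMatch(array $m) {
    return Tamilize($m[1]);
}

$page = 'See {T= "tivviya" =T} here.';
$out  = preg_replace_callback('/\{T=(.*?)=T\}/s', 'TamilizeMatch', $page);

echo $out, "\n"; // See ["tivviya"] here.
```

Because nothing is escaped on the way in, no PSS()-style stripping step
is needed in this version.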

Other than those two thoughts, I think it's a terrific idea
and hope to see a Cookbook recipe from it!

Pm