[Pmwiki-users] Re: null characters or pattern breaking characters

Patrick R. Michaud pmichaud
Wed Jan 14 09:52:23 CST 2004


On Wed, Jan 14, 2004 at 04:48:02PM +0100, Christian Ridderström wrote:
> 
> But just to be on the safe side, you often refer to the URI pattern... 
> are you aware that I'd like to be able to use the null token to break 
> the matching of *any* pattern that looks for a specific range of 
> characters?

Well, kinda--depends on what you mean by "any pattern".  If you want
to break up patterns such as WikiWords and the like--no problem, but
if you want it to act more as a 'don't match any pattern' sequence
that would break up patterns containing "any character" (such as the
italics pattern ''(.*?)'' ), then we're probably out of luck.  More on
this below...
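
To make the distinction concrete, here is a minimal sketch in Python (not
PmWiki's PHP). The WIKIWORD pattern is a simplified stand-in rather than
PmWiki's real one, and the use of "\x00" as the inserted character is an
assumption for illustration only. It shows that a character-class pattern
is broken by an inserted character, while an "any character" pattern such
as ''(.*?)'' is not:

```python
import re

# Simplified, hypothetical WikiWord pattern (not PmWiki's actual one).
WIKIWORD = re.compile(r"[A-Z][a-z]+(?:[A-Z][a-z]+)+")
# The italics pattern quoted in the message: anything between '' and ''.
ITALICS = re.compile(r"''(.*?)''")
# Assumed stand-in for whatever the null token is turned into.
NULL = "\x00"


def still_matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# The inserted character is outside [A-Za-z], so the WikiWord is broken.
print(still_matches(WIKIWORD, "Wiki" + NULL + "Word"))
# "." accepts the inserted character, so the italics pattern still matches.
print(still_matches(ITALICS, "''some" + NULL + "text''"))
```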

At any rate, I've focused on the URI pattern because it's the most
permissive of the patterns; any character excluded by the URI pattern
is pretty much excluded from the others (e.g., WikiWords and PageNames
can only contain characters that appear in a URI, otherwise we wouldn't 
be able to specify page names in a URI).  So, things that can't appear
in a URI make for natural "stop/null" characters if they don't have any 
other semantics assigned to them.  ('''' fails in this last respect because 
it's an italics on/off sequence.)

> Hmm... I wonder if I'm "breaking myself" here... would the above 
> mean that this markup:
> 	''Here is some italic text [``[include:Main.HomePage]] bla.''
> is *not* rendered in italics?

Or, what happens with:
    ''Is this italic [``[WikiWord]] text '' or is this? ''

To me, the effects of the null token should be very localized--the
token should "break" the markup that it's in or next to and nothing more.  
In the examples immediately above, the `` would only break the italics 
sequence if it's in or next to the italics markup characters (the single 
quotes), and not break the italics markup just because it happens to occur 
in the middle of the text.


Here's a somewhat more radical proposal; I haven't thought it all the way 
through yet, but I'll offer it for discussion and see where it leads.  
What if we defined `` to mean "positively protect the characters
that follow up to the next space"?  This would allow things like:

   ``WikiWord                 Displays as "WikiWord" but is not a WikiWord
   ``[=hello there=]          Displays as "[=hello there=]"
   ``[[WikiWord hello]]       Displays as "[[WikiWord hello]]"
   [[WikiWord text w/``]] ]]  Displays as "text w/]] " linked to WikiWord
   
In particular, we could then define `` appearing at the end of a line
to mean "protect the line break", thus offering a replacement for [[<<]]:

    Yesterday it worked.``
    Today it is not working.``
    Windows is like that.``

(Although I must admit I really like the \\ proposal you've made for
protecting line breaks and I'm favoring it at the moment.)
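
One possible reading of the "protect the characters that follow up to the
next space" idea, sketched in Python rather than PmWiki's PHP. All names
here (render, PROTECT, the placeholder format) are hypothetical, the
WikiWord pattern is a simplified stand-in, and the end-of-line case is
deliberately left out. Protected runs are swapped for placeholders that no
markup pattern can match, then restored after the other markup has run:

```python
import re

# Simplified, hypothetical WikiWord pattern (not PmWiki's actual one).
WIKIWORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")
# `` followed by the run of non-space characters it protects.
PROTECT = re.compile(r"``(\S+)")
# Placeholder of the form \x00<index>\x00, assumed never to occur in input.
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def render(line):
    """Render one line, leaving ``-protected runs untouched."""
    saved = []

    def stash(match):
        # Remember the protected text and leave an inert placeholder.
        saved.append(match.group(1))
        return "\x00%d\x00" % (len(saved) - 1)

    line = PROTECT.sub(stash, line)
    # Stand-in for the ordinary markup rules: link every WikiWord.
    line = WIKIWORD.sub(
        lambda m: "<a href='%s'>%s</a>" % (m.group(0), m.group(0)), line
    )
    # Put the protected text back, minus the `` marker.
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], line)


print(render("``WikiWord and OtherWord"))
```

In this sketch only OtherWord becomes a link; the protected WikiWord is
emitted as plain text.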

Unfortunately, this definition for `` really isn't a "null token" and
doesn't provide a way to fix things like @@Meatball:PageName@@, so we'd
probably still need something to resolve that.    Note that 
|| Meatball:PageName|| is likely to be fixed by a change to the URI 
pattern so I'm not really worried about that case anymore.

> There you go with 'two substitutions' again... ok, I guess I'll have to 
> ask you to explain that...
> Or is this simply equivalent to what I wrote about first substituting 
> the null token into a null-token-character, that's then removed just 
> before output?

They're equivalent.
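
For reference, the two-substitution approach being discussed might look
roughly like this. It is a Python sketch, not PmWiki's code; the function
name, the choice of "\x00" as the null-token character, and the simplified
WikiWord pattern are all assumptions:

```python
import re

# Assumed internal character that no markup pattern accepts.
NULL_CHAR = "\x00"
# Simplified, hypothetical WikiWord pattern (not PmWiki's actual one).
WIKIWORD = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def render_with_null_token(text):
    """Apply markup with `` acting as a pattern-breaking null token."""
    # Substitution 1: turn the null token into the null-token character.
    text = text.replace("``", NULL_CHAR)
    # Stand-in for the ordinary markup rules: link every WikiWord.
    text = WIKIWORD.sub(
        lambda m: "<a href='%s'>%s</a>" % (m.group(0), m.group(0)), text
    )
    # Substitution 2: remove the null-token character just before output.
    return text.replace(NULL_CHAR, "")


print(render_with_null_token("Wiki``Word"))
print(render_with_null_token("WikiWord"))
```

The first call prints plain "WikiWord" and the second prints a link.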

> But if so, why is this bad from an implementation point of view... it 
> wouldn't be that slow would it?

It's not that it's particularly slow--it's just a bit messier than
I'd like--i.e., I feel like there's probably a more elegant/powerful 
solution.

Pm


