[pmwiki-users] Re: Modified (:markup:)

Mon Mar 21 11:23:38 CST 2005

Jonathan Scott Duff wrote:

> On Mon, Mar 21, 2005 at 09:14:54AM -0600, Patrick R. Michaud wrote:
> 
>>On Mon, Mar 21, 2005 at 08:44:40AM +0100, Joachim Durchholz wrote:
>>
>>>Patrick R. Michaud wrote:
>>>
>>>>I agree, but my implementation of [== ... ==]  currently breaks in 
>>>>the face of text containing multiple [==]'s, so I have to either come
>>>>up with a much better pattern or go back to the drawing board.
>>>
>>>Would a backreference work?
>>>
>>>  \[(=*)[^=](.*?)\1\]
>>
>>The matching code already uses a backreference and no it doesn't work.
> 
> So what doesn't work exactly?  You've got multiple [=...=] and the
> pattern is ...  ?
> 
> 	/\[(=+)(.*?)\1\]/			# ???

Oh right, there should be a + instead of a * after the = (we need at 
least one equals sign).

The catch here is that the (=+) will try to extend as far as possible 
(that's the "greedy" aspect of pattern matching). So if we have

   [==] some text [==]

the \[(=+) will match the leading '[==', the (.*?) will match '] some 
text [', and the \1\] will match the final '==]'.
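To make the failure concrete, here is a minimal sketch using Python's `re` module as a stand-in for PHP's PCRE-based preg functions. Greedy and lazy quantifiers and backreferences should behave the same way in both for these patterns. The helper `unescape_with` is hypothetical: it only splices the inner text back in and does not emulate PmWiki's markup processing. The sketch runs `\[(=+)(.*?)\1\]` on the example above and on Patrick's line quoted further down.

```python
import re

# The pattern under discussion: \[(=+)(.*?)\1\]
# re.DOTALL corresponds to PCRE's /s modifier.
GREEDY = re.compile(r"\[(=+)(.*?)\1\]", re.DOTALL)


def unescape_with(pattern, text):
    """Hypothetical helper: replace every match with its inner text (group 2)."""
    return pattern.sub(lambda m: m.group(2), text)


m = GREEDY.search("[==] some text [==]")
print(repr(m.group(0)))  # the whole string is one match, not two
print(repr(m.group(2)))  # the "content" swallowed both brackets

print(unescape_with(GREEDY, "[==]** Line beginning with ''asterisks'', avoid Wiki[==]Word."))
```

The last line reproduces the mangled `]** Line beginning with ''asterisks'', avoid Wiki[Word.` output quoted below.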

That's why I put the question mark in (=+?); it *should* try to select 
the shortest match.

... darn, even with "/\[(=+?)(.*?)\1\]/" it will probably end up matching 
'[=', '=] some text [=', and '=]', so the regex I posted just a few 
minutes ago won't work either.
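The lazy variant can be checked the same way, again with Python's `re` standing in for PCRE. On `[==] some text [==]` the search actually stops after the first `[==]`: `(=+?)` takes a single `=`, and `\1` consumes the other one. That same split means a doubled escape such as `[==text==]` keeps one `=` on each side of its content, so the variant really is no fix.

```python
import re

# Lazy variant: \[(=+?)(.*?)\1\]
LAZY = re.compile(r"\[(=+?)(.*?)\1\]", re.DOTALL)

first = LAZY.search("[==] some text [==]")
print(repr(first.group(0)))  # only the first bracket pair is matched

m = LAZY.search("[==text==]")
print(repr(m.group(1)))  # the lazy '=' run settles on a single '='
print(repr(m.group(2)))  # so the other '=' on each side leaks into the content
```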

So I'll punt and simply use an alternation:

   \[ (=+) ( \] | (.*?) \1 \] )
   1111111   22   33333333333   (these lines are just
                  aaaaa bb cc   for reference below)
I.e.:
1) first match a [ and as many ='s as are available,
2) then either finish off immediately with a ], or
3) eat
    3a) the shortest text that is followed by
    3b) the same number of ='s and
    3c) a closing ].
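Steps 1 to 3 can be sketched in Python as follows. This is only a sketch: `re.VERBOSE` plays the role of PCRE's `/x` modifier, so the spaces in the pattern are ignored. `unescape` is a hypothetical helper that splices the escaped text back in. It does not reproduce how PmWiki protects that text from later markup rules, so `Wiki[==]Word` comes out here as plain `WikiWord`. Group 1 is the run of `=`. Group 3 is the escaped text, and it stays unset when branch 2 closes a bare `[==]`.

```python
import re

# \[ (=+) ( \] | (.*?) \1 \] ) with re.VERBOSE (like PCRE's /x): spaces ignored.
ESCAPE = re.compile(r"""
    \[ (=+)          # 1) '[' and as many '=' as available (group 1)
    ( \]             # 2) either finish immediately with ']'
    | (.*?) \1 \]    # 3) or shortest text (3a, group 3), same '=' run (3b), ']' (3c)
    )
""", re.VERBOSE | re.DOTALL)


def unescape(text):
    """Hypothetical helper: replace each escape with its inner text.

    A bare [==] takes branch 2, leaving group 3 unset (None), so it becomes ''.
    """
    return ESCAPE.sub(lambda m: m.group(3) or "", text)


print(unescape("[==]** Line beginning with ''asterisks'', avoid Wiki[==]Word."))
print(unescape("[==text==] and [=more=]"))
```

Both bare `[==]` markers now match on their own, and `[==text==]` yields `text` with no stray `=`.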

The downside here is that we'll accept any number of ='s in the [=..=] 
construct, including an odd number. I don't think that's a serious problem 
though - having different semantics for [=..=] depending on whether the 
number of ='s between the brackets is even or odd sounds like a bad idea 
to me.
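To see that downside concretely, the alternation from the text, written compactly without `/x`-style spacing, accepts one, two or three `=` signs alike, as long as both sides match. Here is a small Python check, again only a sketch using `re` in place of PCRE:

```python
import re

# The alternation from the text, compact form: \[(=+)(\]|(.*?)\1\])
ESCAPE = re.compile(r"\[(=+)(\]|(.*?)\1\])", re.DOTALL)

for src in ["[=a=]", "[==a==]", "[===a===]"]:
    m = ESCAPE.fullmatch(src)
    # group 1 is the run of '=' signs, group 3 the escaped text
    print(f"{src}: {len(m.group(1))} '=' sign(s), content {m.group(3)!r}")
```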

> (Sorry, this is the first message in this thread that I've
> read so I'm probably missing obvious things)

You seem to be right on track; it's just that the proposed regex doesn't 
work - and non-working regexes are commonplace ;-)

>>>The (=*) captures the string of =s after the opening [ and stores them 
>>>in the $1 variable, the [^=] makes sure that a non-= is between the 
>>>brackets-and-equals-signs delimiters (else we'd match stuff like [==] 
>>>and we don't want that - might be a useful markup for other purposes), 
>>
>>Unfortunately, [==] is already valid markup and it's already being
>>used for a variety of purposes.  In particular, it can be used to
>>prevent Wiki[==]Words, and I've used it to avoid processing something
>>that would otherwise be a beginning of line markup.  Thus:
>>
>>   [==]** Line beginning with ''asterisks'', avoid Wiki[==]Word.
>>
>>Unfortunately, the regexp above (and various others I've tried)
>>sees this as one big escape instead of two small ones, resulting in
>>
>>   ]** Line beginning with ''asterisks'', avoid Wiki[Word.
> 
> Couldn't you special case the [==] markup? I.e. something like
> 
> 	Markup('[==]', '_begin', '/\\[==\\]/se', "Keep('')");

Yes, but a single regex is faster and more modular, so it's worth trying.
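For comparison, the special-casing suggested above amounts to two passes: remove every bare `[==]` first, then run the general pattern. The Python sketch below is hedged in the same ways as before. It simply deletes `[==]` instead of going through PmWiki's Keep(), and `two_pass` is a hypothetical name. It shows that this approach also handles Patrick's line, at the cost of a second rule that has to run before the general one.

```python
import re

# Pass 1: the special case suggested in the thread, a bare [==].
BARE = re.compile(r"\[==\]")
# Pass 2: the general escape \[(=+)(.*?)\1\]
GENERAL = re.compile(r"\[(=+)(.*?)\1\]", re.DOTALL)


def two_pass(text):
    """Hypothetical helper: strip bare [==] first, then unwrap [=..=] escapes."""
    text = BARE.sub("", text)
    return GENERAL.sub(lambda m: m.group(2), text)


print(two_pass("[==]** Line beginning with ''asterisks'', avoid Wiki[==]Word."))
print(two_pass("[==text==] and [=more=]"))
```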

Regards,
Jo