[pmwiki-users] Re: Modified (:markup:)
Joachim Durchholz
jo at durchholz.org
Mon Mar 21 11:23:38 CST 2005
Jonathan Scott Duff wrote:
> On Mon, Mar 21, 2005 at 09:14:54AM -0600, Patrick R. Michaud wrote:
>
>>On Mon, Mar 21, 2005 at 08:44:40AM +0100, Joachim Durchholz wrote:
>>
>>>Patrick R. Michaud wrote:
>>>
>>>>I agree, but my implementation of [== ... ==] currently breaks in
>>>>the face of text containing multiple [==]'s, so I have to either come
>>>>up with a much better pattern or go back to the drawing board.
>>>
>>>Would a backreference work?
>>>
>>> \[(=*)[^=](.*?)\1\]
>>
>>The matching code already uses a backreference and no it doesn't work.
>
> So what doesn't work exactly? You've got multiple [=...=] and the
> pattern is ... ?
>
> /\[(=+)(.*?)\1\]/ # ???
Oh right, there should be a + instead of a * after the = (we need at
least one equals sign).
The catch here is that the (=+) will try to extend as far as possible
(that's the "greedy" aspect of pattern matching). So if we have
[==] some text [==]
the \[(=+) will match the leading '[==', the (.*?) will match '] some
text [', and the \1\] will match the final '==]'.
That's why I put the question mark in (=+?); it *should* try to select
the shortest match.
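... darn, even "/\[(=+?)(.*?)\1\]/" won't do. The lazy (=+?) always ends
up capturing a single '=', so '[==' can no longer be told apart from '[=',
and a [== ... ==] escape would end at the first '=]' inside it. So the
regex I posted just a few minutes ago won't work either.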
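The greedy and lazy variants discussed above are easy to check mechanically. The following is only a sketch in Python's re module, on the assumption that it treats lazy quantifiers and backreferences the same way as the PCRE engine PmWiki uses; the names GREEDY, LAZY and spans, and the nested sample string, are made up for illustration.

```python
import re

GREEDY = re.compile(r"\[(=+)(.*?)\1\]")   # greedy run of ='s
LAZY = re.compile(r"\[(=+?)(.*?)\1\]")    # lazy run of ='s

def spans(pattern, text):
    """Return every whole match of pattern in text, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

two_escapes = "[==] some text [==]"
nested = "[==a [=b=] c==]"   # hypothetical sample, not from the thread

for name, pattern in (("greedy", GREEDY), ("lazy", LAZY)):
    print(name, spans(pattern, two_escapes), spans(pattern, nested))
```

In this sketch the greedy form swallows "[==] some text [==]" as one big escape. The lazy form does stop at each "[==]", because (.*?) is allowed to be empty, but it only ever captures one '=', so it cuts the nested sample short at "[==a [=b=]".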
So I'll punt and simply use an alternative:
    \[ (=+) ( \] | (.*?) \1 \] )
    1111111   22   33333333333     (these lines are just
                   aaaaa bb cc      for reference below)
I.e.:
1) first match a [ and as many ='s as available,
2) then either finish off immediately with a ], or
3) eat
3a) the shortest text that's preceding
3b) the same number of ='s and
3c) a closing ].
The downside here is that we'll take any number of ='s in the [=..=]
construct, including an odd number. I don't think that's a serious problem
though - having different semantics for [=..=] depending on whether the
number of ='s between the brackets is even or odd sounds like a bad idea
to me.
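For anyone who wants to try the alternation, here is a minimal sketch, again in Python's re rather than PmWiki's PHP, so treat it as an approximation of what PCRE would do. re.VERBOSE is used so the pattern can keep the spacing shown above; ESCAPE and escaped_spans are hypothetical names, and the first test line is Patrick's example from the quoted message.

```python
import re

# re.VERBOSE makes the engine ignore the spaces inside the pattern.
ESCAPE = re.compile(r"\[ (=+) ( \] | (.*?) \1 \] )", re.VERBOSE)

def escaped_spans(text):
    """Return each whole [=...=] match found in text, left to right."""
    return [m.group(0) for m in ESCAPE.finditer(text)]

line = "[==]** Line beginning with ''asterisks'', avoid Wiki[==]Word."

print(escaped_spans(line))               # two small escapes
print(escaped_spans("[=text=]"))         # ordinary escape
print(escaped_spans("[==a [=b=] c==]"))  # inner [=b=] stays inside
print(escaped_spans("[===]"))            # odd number of ='s is accepted
```

Under these assumptions the example line yields two separate "[==]" matches instead of one big escape, and "[===]" is accepted as well, which is the odd-count downside mentioned above.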
> (Sorry, this is the first message in this thread that I've
> read so I'm probably missing obvious things)
You seem to be right on track, just the proposed regex doesn't work -
and nonworking regexes are commonplace ;-)
>>>The (=*) captures the string of =s after the opening [ and stores them
>>>in the $1 variable, the [^=] makes sure that a non-= is between the
>>>bracket-and-equals-signs delimiters (else we'd match stuff like [==]
>>>and we don't want that - might be a useful markup for other purposes),
>>
>>Unfortunately, [==] is already valid markup and it's already being
>>used for a variety of purposes. In particular, it can be used to
>>prevent Wiki[==]Words, and I've used it to avoid processing something
>>that would otherwise be a beginning of line markup. Thus:
>>
>> [==]** Line beginning with ''asterisks'', avoid Wiki[==]Word.
>>
>>Unfortunately, the regexp above (and various others I've tried)
>>sees this as one big escape instead of two small ones, resulting in
>>
>> ]** Line beginning with ''asterisks'', avoid Wiki[Word.
>
> Couldn't you special case the [==] markup? I.e. something like
>
> Markup('[==]', '_begin', '/\\[==\\]/se', "Keep('')");
Yes, but a single regex is faster and more modular, so it's worth trying.
Regards,
Jo