[pmwiki-users] Keep() function documented

Wed Jul 6 08:19:28 CDT 2005

On Wed, Jul 06, 2005 at 11:00:00AM +0200, Joachim Durchholz wrote:
> (It would be best if all the rules were integrated into a single one, so 
> that there is no "order" in which markups are processed - but I don't 
> think that's an option for PmWiki. 

...because there has to be an "order".  It's vitally important that
'''text''' be found and processed before ''text''; it's equally 
important that url processing take place before wikiword links.  
Getting the order correct is why I went to the trouble of creating 
a generic rule-based ordering system -- it's fundamental to the
task of converting wiki markup to some other output form.
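
To make the ordering point concrete, here is a minimal sketch in Python
(not PmWiki's PHP). The two rules and the apply_rules helper are
hypothetical illustrations, not PmWiki's actual markup table. It only
shows that the same rules give different output when applied in a
different order.

```python
import re

# Hypothetical rules for illustration only; PmWiki's real rules differ.
STRONG = (re.compile(r"'''(.*?)'''"), r"<strong>\1</strong>")
EM = (re.compile(r"''(.*?)''"), r"<em>\1</em>")


def apply_rules(text, rules):
    """Apply each (pattern, replacement) pair in the order given."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


source = "'''bold''' and ''italic''"

# Triple quotes handled first: both constructs are recognized.
right = apply_rules(source, [STRONG, EM])

# Double quotes handled first: the '' rule consumes part of the '''
# delimiters, so the strong rule never gets a chance to match.
wrong = apply_rules(source, [EM, STRONG])

print(right)
print(wrong)
```

With the strong rule first, the text comes out as expected. With the
order reversed, the emphasis rule eats two of the three quotes and no
strong markup is ever produced.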

> >>... BTW why does PmWiki split the
> >>text into lines? Efficiency reasons, or other considerations?)
> >
> >Two reasons:  First, the line-by-line model is the mental model that 
> >most authors tend to understand when processing text; it makes sense 
> >to keep that particular model.
> 
> ...
> It's just that this forces constructs that may span several lines into a 
> *very* early stage of processing (whether these constructs are nestable 
> or not) 

False.  Notably, PmWiki's block markups (including nested lists,
tables, etc.) are all handled at the line-by-line level, which
currently comes fairly late in the processing cycle.  

> >Secondly, it's a huge efficiency boost -- my experiments have shown
> >me that the many pattern matches that get performed are *much* more
> >efficient on many small strings than they are on one very large one.
> 
> I suspect it's the replacement step that is more efficient - replacing 
> two characters with fifteen in a twenty-character string is bound to be 
> more efficient than doing the same in a 20K text (there are advanced 
> string packages that don't exhibit this behavior, but they have been 
> largely unknown and unused).

I think the match itself is sped up as well.  Many of PCRE's
match optimizations work by scanning from the end of the string to be
matched, so a long string means a lot of scanning.  Beyond that, every
rule that contains a '*' or '+' quantifier adds many more combinations
and interactions that PCRE has to try before it can definitively
decide that no match exists.
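
For anyone who wants to measure this, a rough sketch follows. It is in
Python and uses Python's re module rather than PCRE, so the numbers may
not carry over; the rules, the sample text, and the function names are
all assumptions made for illustration. It applies the same rules once to
the whole text and once line by line, and times both.

```python
import re
import timeit

# Hypothetical line-local rules; none of them can match across a newline.
RULES = [
    (re.compile(r"'''(.*?)'''"), r"<strong>\1</strong>"),
    (re.compile(r"''(.*?)''"), r"<em>\1</em>"),
    (re.compile(r"\[\[(.+?)\]\]"), r"<a>\1</a>"),
]


def convert(text):
    """Run every rule over the given string, in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def convert_by_line(text):
    """Split into lines, convert each line on its own, then rejoin."""
    return "\n".join(convert(line) for line in text.split("\n"))


# Synthetic page: 2000 identical lines of simple markup.
SAMPLE = "Some '''bold''' text, ''italic'' text and a [[link]].\n" * 2000

if __name__ == "__main__":
    whole = timeit.timeit(lambda: convert(SAMPLE), number=5)
    by_line = timeit.timeit(lambda: convert_by_line(SAMPLE), number=5)
    print("whole text  :", whole)
    print("line by line:", by_line)
```

Both functions produce the same output for rules like these, so any
difference in the timings comes from how the work is split up, not from
what is matched. Which one wins will depend on the engine, the rules,
and the text.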

Pm