[pmwiki-users] On the order of Markup

Petko Yotov 5ko at 5ko.fr
Wed May 3 16:13:30 CDT 2017


Sorry, this became too technical, but if more devs could participate, we 
may find a better solution.

On 2017-05-03 22:09, Peter Kay wrote:
> A lot of Markup_e involves needing $pagename; I would propose having
> MarkupToHTML call the function with both the match and $pagename:
> 
> Markup('include', '>if',
>  '/\\(:include\\s+(\\S.*?):\\)/i',
>   function ($m, $pagename) { return PRR(IncludeText($pagename, 
> $m[1]));});
> 
> the "use ()" syntax will probably allow us to rewrite Markup_e in a
> way that - while perhaps slower - is seamless.
...
> We should be able to do something similar with an eval() thrown in 
> there.
> 

If there is a way not to use eval(), I'll take it.

PHP appears to progressively deprecate and remove everything that uses 
evaluation (for a good reason - code injection vulnerabilities). 
Yesterday it was the preg_replace /e modifier, today create_function, 
tomorrow it will likely be the turn of eval. We need to get this right: 
I wouldn't like to have to review and rewrite all modules again in a 
couple of years.

There are 108 places in the core where evaluation is used, either via 
Markup() or Markup_e(), or via PPRE(): all these call PCCF() which 
creates a lambda function with evaluated code.

The current solution is not bad: the markup pattern and the replacement 
can be written very "compactly", with the pattern close to the 
replacement, and there is no need to define another 108 named functions. 
I'd like the new way to also be compact, if practical, so that the code 
stays easy to read, understand, maintain and debug.

We also need to decide if we really drop PHP < 5.3 as there are many 
servers still using older versions. BTW, pmwiki.org uses 5.2.

It is not hard to pass a number of variables to the replacement 
function, I have prototyped a way to pass $matches, $pagename and the 
search pattern, without changing anything else in the markup engine, and 
it works with PHP 4:

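class PmCallback {
   var $pn; var $pat; var $rep;
   # PHP 5+ constructor; no visibility keyword, PHP 4 cannot parse "public"
   function __construct($pn, $pat, $rep) {
     $this->PmCallback($pn, $pat, $rep);
   }
   # PHP 4 constructor (same name as the class)
   function PmCallback($pn, $pat, $rep) {
     $this->pn = $pn; $this->pat = $pat; $this->rep = $rep;
   }
   # handed to preg_replace_callback(); forwards the extra arguments
   function replace($m) {
     $func = $this->rep;
     return $func( $m, $this->pn, $this->pat ) ;
   }
}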

Then in MarkupToHTML:

# if (is_callable($r)) $x = preg_replace_callback($p,$r,$x); # now
   if (is_callable($r)) {
     $cb = new PmCallback($pagename, $p, $r);
     $x = preg_replace_callback($p, array($cb, 'replace') , $x);
   }
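
To make the idea concrete, here is a stand-alone sketch that can be run 
outside PmWiki. Everything prefixed with "Demo" or "demo_" is 
hypothetical and only stands in for the real engine; the wrapper class 
repeats the prototype in condensed form so the file runs on its own.

```php
<?php
// Stand-alone demo, not PmWiki code: all Demo*/demo_* names are made up.
class DemoCallback {
  var $pn; var $pat; var $rep;
  function __construct($pn, $pat, $rep) {
    $this->pn = $pn; $this->pat = $pat; $this->rep = $rep;
  }
  // Called by preg_replace_callback() with the match only;
  // adds the page name and the pattern before calling the real function.
  function replace($m) {
    $func = $this->rep;
    return $func($m, $this->pn, $this->pat);
  }
}

// Hypothetical replacement function receiving all three arguments.
function demo_include($m, $pagename, $pat) {
  return "[included {$m[1]} into $pagename]";
}

// Stand-in for the relevant lines of MarkupToHTML.
function demo_markup($pagename, $p, $r, $x) {
  if (is_callable($r)) {
    $cb = new DemoCallback($pagename, $p, $r);
    $x = preg_replace_callback($p, array($cb, 'replace'), $x);
  }
  return $x;
}

$pat = '/\\(:include\\s+(\\S.*?):\\)/i';
$out = demo_markup('Main.HomePage', $pat, 'demo_include',
  'A (:include Site.Notes:) B');
echo $out, "\n";
```

The replacement function stays an ordinary named function or any other 
callable; only the call site wraps it.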

It will need a minor change to be able to build the markup rules keyed 
on the name instead of on the pattern, and to also pass the name to the 
replacement function.
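
A rough sketch of why keying on the name matters, and of passing the 
name along. The arrays and functions below are hypothetical miniatures, 
not the real markup tables, and the closure needs PHP 5.3+:

```php
<?php
// Hypothetical miniature rule tables, not PmWiki's own structures.

// Keyed on the pattern: the second entry silently replaces the first.
$byPattern = array();
$byPattern['/XXX/'] = 'first';
$byPattern['/XXX/'] = 'second';

// Keyed on the rule name: both rules survive with the same pattern.
$byName = array(
  'xxx-a' => array('pat' => '/XXX/',  'rep' => 'demo_rule'),
  'xxx-b' => array('pat' => '/XXX/',  'rep' => 'demo_rule'),
);

// One function serving several rules; it branches on the rule name.
function demo_rule($m, $pagename, $name) {
  if ($name == 'xxx-a') return $m[0] . '!';
  return strtolower($m[0]);
}

// Apply the rules in array order, passing the name to the function.
function demo_apply($rules, $pagename, $x) {
  foreach ($rules as $name => $rule) {
    $rep = $rule['rep'];
    $x = preg_replace_callback($rule['pat'],
      function ($m) use ($rep, $pagename, $name) {
        return $rep($m, $pagename, $name);
      }, $x);
  }
  return $x;
}

$result = demo_apply($byName, 'Main.HomePage', 'aXXXb');
echo count($byPattern), ' ', count($byName), ' ', $result, "\n";
```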

OTOH, like I said, I want to be able to define these rules in a compact 
manner, and not to have, for example, to write the pattern twice, once 
when I define it, another time when it is processed.

For recipes this is not a huge problem: in your own file you simply have 
the Markup_e(...) line where you pass the pattern and the name of your 
function, like you can do now. And your function is just below: compact 
code, readable, small risk for errors.

For the core it is a different story, Markup_e() is called 67 times, and 
I'm not excited by the prospect of adding 67 more named functions to 
stdmarkup.php.

This is not all: there are also the various $*Patterns like $ROSPatterns 
and $MakePageNamePatterns. And PCCF() is called about 40 times directly 
or via PPRE(), see FmtPageName(). Again, we need to do it in a 
pragmatic, practical way, not too hard to read and understand. This 
may require the use of anonymous functions and dropping PHP < 5.3.
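
For illustration only, this is roughly what a compact pattern table 
could look like with anonymous functions (PHP 5.3+). $DemoPatterns and 
demo_replace_all() are hypothetical names, and the rules are made up; 
this is one possible shape, not a proposal for the actual core code.

```php
<?php
// Hypothetical table in the spirit of the $*Patterns arrays:
// pattern => plain replacement string, or pattern => closure.
$DemoPatterns = array(
  '/\\s+$/'        => '',                  // trim trailing whitespace
  '/\\bteh\\b/'    => 'the',               // plain string replacement
  '/\\{(\\w+)\\}/' => function ($m) {      // evaluated replacement
    return strtoupper($m[1]);
  },
);

// Apply each entry in order; closures go through preg_replace_callback(),
// everything else through plain preg_replace(). No eval() involved.
function demo_replace_all($patterns, $x) {
  foreach ($patterns as $pat => $rep) {
    if ($rep instanceof Closure)
      $x = preg_replace_callback($pat, $rep, $x);
    else
      $x = preg_replace($pat, $rep, $x);
  }
  return $x;
}

$clean = demo_replace_all($DemoPatterns, "teh {name} page   ");
echo $clean, "\n";
```

The pattern still sits right next to its replacement, which is the 
compactness I would like to keep.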


> I was thinking about that, trying to find ways to rewrite Markup_e so
> that it just works.  But there are other approaches:
> 
> Markup('include', '>if',
>   '/\\(:include\\s+(\\S.*?):\\)/i',
>   function ($m) use ($pagename) { return PRR(IncludeText($pagename, 
> $m[1]));});

When you do exactly this, $pagename is whatever it happens to be when 
stdmarkup.php is run, and it is copied at that time into your function. 
This means that when the skin processes the SideBar, your function will 
not have access to the actual $pagename argument of MarkupToHTML, but to 
the old $pagename from stdmarkup.php -- and the SideBar will not be 
output correctly, especially if it is in another group, e.g. Site. Same 
for headers/footers/etc.

You have to create this function with "use ($pagename)" inside 
MarkupToHTML to use the actual $pagename.
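
A small stand-alone demonstration of the difference (PHP 5.3+). The 
function demo_markup_to_html() is a hypothetical stand-in for 
MarkupToHTML, and the {word} markup is invented for the example:

```php
<?php
// Stand-alone illustration; all names here are hypothetical.
$pagename = 'Main.HomePage';   // value when the rule is defined

// Closure created once, early: "use" copies $pagename by value here.
$early = function ($m) use ($pagename) {
  return $pagename . ':' . $m[1];
};

// Stand-in for MarkupToHTML: the closure is created on every call,
// so it captures the $pagename argument of that call.
function demo_markup_to_html($pagename, $x) {
  $late = function ($m) use ($pagename) {
    return $pagename . ':' . $m[1];
  };
  return preg_replace_callback('/\\{(\\w+)\\}/', $late, $x);
}

// Later the SideBar is processed under a different page name.
$pagename = 'Site.SideBar';
$stale = preg_replace_callback('/\\{(\\w+)\\}/', $early, '{menu}');
$fresh = demo_markup_to_html('Site.SideBar', '{menu}');

echo $stale, "\n";   // still carries the old page name
echo $fresh, "\n";   // carries the page name passed to the function
```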

Petko


> On Wed, May 3, 2017 at 1:48 PM, Petko Yotov <5ko at 5ko.fr> wrote:
>> On 2017-05-03 19:19, Peter Kay wrote:
>>> 
>>> One other problem I recently stumbled on is that the MarkupRules only
>>> allow one instance of a given pattern.
>>> 
>>> So for example if you want to process '/XXX/' twice (for whatever
>>> reason), you can't - only one copy goes in.
>> 
>> 
>> 
>> If you want to process '/XXX/' twice (for whatever reason), there are 
>> dozens
>> of ways to write the same pattern:
>> 
>>   '/[X]XX/'
>>   '/X[X]X/'
>>   '/XX[X]/'
>>   '/(X)XX/'
>>   '/(XX)X/'
>>   '/(?:XX)X/'
>>   '/XXX{1}/'
>>   '/X{1}XX{1}/'
>> 
>> etc., etc., etc.
>> 
>>> I'd rather build the rules using the markup name, so as long
>>> as the names are different, they'll both run.
>> 
>> 
>> The next version of PHP, 7.2, will deprecate the function 
>> create_function().
>> We will have to rewrite the core markup engine, markup rules and the
>> processing of various $*Patterns (future PmWiki versions may have to 
>> drop
>> support for PHP 5.2 and older).
>> 
>> So indeed, it may be possible to find a way to build the rules using 
>> the
>> markup name, and even pass the name to the processing function so a 
>> single
>> function can do multiple rules.
>> 
>> As always, if implemented, such a feature will have to *not* break the
>> existing markup rules that work today for PHP 5.3-7.1.


