[pmwiki-devel] regex question

Fri Aug 28 06:04:59 CDT 2009

Thursday, August 27, 2009, 8:57:28 PM, Peter wrote:

> if (($bracket_end = strpos($line, "@]")) != false) {
>   if (($bracket_start = strpos($line, "[@")) == false ||
>       $bracket_start > $bracket_end)
>     $line = '[@'.$line;
> (and the same for [=...=])

> I believe that strpos() is MUCH faster than preg_match and
> additionally (perhaps esp in this case) would be much easier to
> understand and so easier to maintain in the future...

Thanks Peter!

I condensed your code a bit and added a check for
unclosed brackets, like this:

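  // fix orphaned @], [@, =], [=
  foreach (array("@", "=") as $x) {
    // strpos() returns false for "not found" but 0 for a match at the
    // start of the row, so compare strictly against false
    $a = strpos($row, '['.$x); $b = strpos($row, $x.']');
    if ($b !== false && ($a === false || $a > $b)) $row = '['.$x.$row;
    else if ($a !== false && ($b === false || $a > $b)) $row .= $x.']';
  }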

Still, it is more readable than the regex patterns:

 if (preg_match("/\\[([=@])(?!.*\\1\\])/", $row, $m)) $row .= $m[1]."]";
 if (preg_match("/^(?:\\[[^@]|[^[])*[@]\\]/", $row)) $row = "[@".$row;
 if (preg_match("/^(?:\\[[^=]|[^[])*[=]\\]/", $row)) $row = "[=".$row;
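To compare the two approaches side by side, here is a rough sketch that
wraps each one in a function and runs both on a few sample rows. The
function names fix_orphans_strpos and fix_orphans_regex and the sample
rows are made up for illustration; they are not part of TextExtract. The
strpos() variant compares strictly against false, on the assumption that
a row may start with an opening bracket (strpos() returns 0 in that case).

```php
<?php
// Hypothetical wrapper around the strpos() approach.
// Strict comparisons keep a match at offset 0 from looking like "not found".
function fix_orphans_strpos($row) {
    foreach (array("@", "=") as $x) {
        $a = strpos($row, '['.$x);   // first opening bracket, or false
        $b = strpos($row, $x.']');   // first closing bracket, or false
        if ($b !== false && ($a === false || $a > $b)) $row = '['.$x.$row;
        else if ($a !== false && $b === false) $row .= $x.']';
    }
    return $row;
}

// Hypothetical wrapper around the regex approach (patterns as quoted above).
function fix_orphans_regex($row) {
    if (preg_match("/\\[([=@])(?!.*\\1\\])/", $row, $m)) $row .= $m[1]."]";
    if (preg_match("/^(?:\\[[^@]|[^[])*[@]\\]/", $row)) $row = "[@".$row;
    if (preg_match("/^(?:\\[[^=]|[^[])*[=]\\]/", $row)) $row = "[=".$row;
    return $row;
}

// Made-up sample rows: orphaned close, orphaned open, already balanced.
$samples = array("some code@] tail", "head [=some code", "[@complete@]");
foreach ($samples as $row) {
    echo fix_orphans_strpos($row), " | ", fix_orphans_regex($row), "\n";
}
```

On these three rows both functions give the same result: the first row
gets "[@" prepended, the second gets "=]" appended, and the third is left
alone.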

I checked the speed and found no real difference between the two
approaches.
Searching all PmWiki.* pages for 'input|output' got me 215 results on
34 pages from 103 pages searched, and it took just a second.
Most of the time goes into opening and reading the pages, and into
MarkupToHTML formatting, not into processing the text lines.
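For anyone who wants to repeat the per-line comparison in isolation, a
minimal timing sketch could look like the following. It assumes PHP 5.3 or
later (for closures); time_calls() and the sample rows are hypothetical,
and it only times the string checks themselves, not page reading or
MarkupToHTML.

```php
<?php
// Hypothetical helper: run $fn over all rows $repeat times and
// return the elapsed wall-clock time in seconds.
function time_calls($fn, $rows, $repeat = 1000) {
    $start = microtime(true);
    for ($i = 0; $i < $repeat; $i++) {
        foreach ($rows as $row) $fn($row);
    }
    return microtime(true) - $start;
}

// Made-up sample rows, not real page content.
$rows = array("some code@] tail", "head [=some code", "[@complete@]");

// strpos() check for either bracket of the [@...@] pair
$t_strpos = time_calls(function ($row) {
    return strpos($row, '[@') !== false || strpos($row, '@]') !== false;
}, $rows);

// preg_match() check for an opening bracket without its closing partner
$t_regex = time_calls(function ($row) {
    return preg_match("/\\[([=@])(?!.*\\1\\])/", $row) === 1;
}, $rows);

printf("strpos: %.4fs  preg_match: %.4fs\n", $t_strpos, $t_regex);
```

The numbers will vary by machine and PHP version, so treat them only as a
relative indication.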

I should be finished soon with this overhaul of TextExtract.
Just need to run more random tests to find defects.

Hans