[pmwiki-users] Speedy search?

Patrick R. Michaud pmichaud at pobox.com
Fri Feb 17 10:21:29 CST 2006


Karl wrote:
> I'll send it to you via PM.

Okay, I have it set up on my server, here's an example timing
I'm seeing for the markup   "(:pagelist group=Techlib fmt=dictindex:)"

    00.00 00.00 MarkupToHTML begin
    00.08 00.01 MakePageList begin
    00.47 00.05 MakePageList scanning 1070 pages, readf=0
    00.53 00.06 MakePageList sort
    00.53 00.06 MakePageList end
    06.48 01.05 MarkupToHTML end
    06.61 01.06 MarkupToHTML begin
    06.67 01.07 MarkupToHTML end
    06.74 01.07 MarkupToHTML begin
    06.74 01.07 MarkupToHTML end
    07.01 01.08 now

The first column is wall-clock seconds, the second column
is CPU time used.  You can see that creating the list of pages
itself is fairly quick -- it's finished by the 0.53-second mark
(0.06 seconds of CPU time).  Not bad for having to scan and 
check access permissions on over 1000 pages.
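
For anyone who wants this kind of two-column timing outside of PmWiki,
here's a rough standalone sketch of the idea.  The names sw_mark() and
sw_report() are made up for illustration; this is not PmWiki's own
StopWatch() code, and it reports times relative to the first mark, which
is an assumption on my part.

```php
<?php
// Hypothetical stand-in for a stopwatch helper; NOT PmWiki's StopWatch().
$GLOBALS['sw_marks'] = array();

// Record the current wall-clock time, CPU time (user + system), and a label.
function sw_mark($label) {
    $ru = getrusage();
    $cpu = $ru['ru_utime.tv_sec'] + $ru['ru_utime.tv_usec'] / 1e6
         + $ru['ru_stime.tv_sec'] + $ru['ru_stime.tv_usec'] / 1e6;
    $GLOBALS['sw_marks'][] = array(microtime(true), $cpu, $label);
}

// Format every mark as "<wall> <cpu> <label>", relative to the first mark.
function sw_report() {
    $marks = $GLOBALS['sw_marks'];
    $lines = array();
    foreach ($marks as $m)
        $lines[] = sprintf('%05.2f %05.2f %s',
                           $m[0] - $marks[0][0], $m[1] - $marks[0][1], $m[2]);
    return $lines;
}

// Usage: bracket the code being measured with marks.
sw_mark('work begin');
$total = 0;
for ($i = 0; $i < 200000; $i++) $total += strlen(md5((string)$i));
sw_mark('work end');
$lines = sw_report();
echo implode("\n", $lines), "\n";
```

Keeping both columns matters because wall-clock time includes whatever
else the server happens to be doing, while the CPU column does not.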

What *is* taking forever is formatting the output -- the time between
"MakePageList end" and "MarkupToHTML end".  So, adding a few stopwatch 
points into dictindex.php and re-running the script, we get:

    00.00 00.00 MarkupToHTML begin
    00.12 00.00 FPLDictIndex start
    00.12 00.00 MakePageList begin
    00.52 00.03 MakePageList scanning 1070 pages, readf=0
    00.58 00.05 MakePageList sort
    00.58 00.05 MakePageList end
    00.58 00.05 FPLDictIndex generate names
    02.29 00.69 FPLDictIndex sort
    02.47 00.72 FPLDictIndex format output
    03.65 01.01 FPLDictIndex end
    03.72 01.03 MarkupToHTML end
    03.84 01.04 MarkupToHTML begin
    03.93 01.05 MarkupToHTML end
    03.94 01.05 MarkupToHTML begin
    03.95 01.05 MarkupToHTML end
    04.17 01.06 now

Because of varying system loads during testing, I generally look at the 
CPU seconds instead of the wall clock seconds for comparison.  Here we 
can see that the "generate names" section of FPLDictIndex is eating up 
the bulk of the time -- 0.64 seconds CPU time.  That's a lot.  A close 
second is the format output section, at 0.29 seconds CPU time.
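
Since the stopwatch columns are cumulative, the cost of a section is the
CPU value on the next line minus the CPU value on its own line.  Here's a
small sketch that does the subtraction for the FPLDictIndex lines above;
section_cpu_deltas() is a hypothetical helper written for this message,
not something that ships with PmWiki.

```php
<?php
// Hypothetical helper: turn cumulative stopwatch lines into per-section
// CPU costs.  Each input line is "<wall> <cpu> <label>".
function section_cpu_deltas(array $lines) {
    $marks = array();
    foreach ($lines as $line) {
        if (preg_match('/^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(.*\S)\s*$/', $line, $m))
            $marks[] = array('cpu' => (float)$m[2], 'label' => $m[3]);
    }
    $deltas = array();
    for ($i = 0; $i + 1 < count($marks); $i++) {
        // A section runs from its own mark to the next mark.
        $deltas[$marks[$i]['label']] =
            round($marks[$i + 1]['cpu'] - $marks[$i]['cpu'], 2);
    }
    return $deltas;
}

// The FPLDictIndex lines from the timing run above.
$deltas = section_cpu_deltas(array(
    '00.58 00.05 FPLDictIndex generate names',
    '02.29 00.69 FPLDictIndex sort',
    '02.47 00.72 FPLDictIndex format output',
    '03.65 01.01 FPLDictIndex end',
));
print_r($deltas);
```

The last mark has no following line, so it gets no entry of its own.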

The generate names section looks like:

    StopWatch("FPLDictIndex generate names");
    for($n=0;$n<count($matches);$n++)
      $matches[$n]['name'] = FmtPageName('$Name',$matches[$n]['pagename']);
    $cmp = create_function('$x,$y',
      "return strcasecmp(\$x['name'],\$y['name']);");

Calls to FmtPageName can be really expensive, so let's try PageVar
instead:

    for($n=0;$n<count($matches);$n++)
      $matches[$n]['name'] = PageVar($matches[$n]['pagename'], '$Name');

    00.00 00.00 MarkupToHTML begin
    00.01 00.00 FPLDictIndex start
    00.01 00.00 MakePageList begin
    00.09 00.04 MakePageList scanning 1070 pages, readf=0
    00.10 00.05 MakePageList sort
    00.10 00.05 MakePageList end
    00.10 00.05 FPLDictIndex generate names
    00.46 00.41 FPLDictIndex sort
    00.48 00.43 FPLDictIndex format output
    00.68 00.62 FPLDictIndex end
    00.69 00.62 MarkupToHTML end

That's some improvement, but we're still eating up 0.36 CPU seconds
generating names.
How about just computing the name directly?

    for($n=0;$n<count($matches);$n++)
      $matches[$n]['name'] =
        preg_replace('/^[^.]*\\./', '', $matches[$n]['pagename']);

    00.06 00.05 FPLDictIndex generate names
    00.54 00.42 FPLDictIndex sort
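
To make the direct computation concrete: the regex just drops everything
up to and including the first dot, so it assumes page names of the form
"Group.Name".  A standalone sketch, where name_from_pagename() and the
sample page names are made up for illustration:

```php
<?php
// Strip the "Group." prefix from a full page name, using the same regex
// as in the loop above.  Assumes names of the form "Group.Name".
function name_from_pagename($pagename) {
    return preg_replace('/^[^.]*\\./', '', $pagename);
}

// Sample page names, invented for this example.
$pagenames = array('Techlib.beta', 'Main.HomePage', 'Techlib.Alpha');
$names = array_map('name_from_pagename', $pagenames);

// Case-insensitive sort, as the dictindex comparison does with strcasecmp.
usort($names, 'strcasecmp');
print_r($names);
```

In the timing above, skipping the function calls this way made little
difference to the CPU column.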

Not much improvement.  Maybe if we try letting MakePageList do the
sort, and sort based on title instead of name...?

    StopWatch("FPLDictIndex start");
    $opt['order'] = 'title';
    $matches = MakePageList($pagename, $opt);

    00.00 00.00 MarkupToHTML begin
    00.01 00.01 FPLDictIndex start
    00.01 00.01 MakePageList begin
    00.06 00.04 MakePageList scanning 1070 pages, readf=1
    01.61 00.27 MakePageList sort
    01.72 00.36 MakePageList end
    01.73 00.36 FPLDictIndex format output
    01.94 00.58 FPLDictIndex end
    01.95 00.59 MarkupToHTML end
    01.97 00.60 MarkupToHTML begin
    01.98 00.62 MarkupToHTML end
    01.99 00.62 MarkupToHTML begin
    01.99 00.62 MarkupToHTML end
    02.38 00.63 now

Now MakePageList takes longer (0.36 seconds versus 0.05 seconds),
but FPLDictIndex, which includes the increased time of MakePageList, 
is a lot shorter (0.58 seconds versus 1.01 seconds).  We've cut
the time almost in half.

I'm going to play with the "format output" section a bit to
see if I can make any improvements for it.  But it looks to me
as though the slowness is in generating output results, and not
in the actual scanning/construction of the pagelist.

Pm



