[pmwiki-users] pagelist performance analysis

Martin Fick fick at fgm.com
Mon Apr 4 17:32:47 CDT 2005


Since I have an interest in getting categories to work
faster, I have done some analysis of (:pagelist:).  Pagelist
is the heart of categories.  Since a pagelist is inherently a
search, it needs to read every single wiki page within the
set defined by the pagelist parameters.  This means that as
the page count goes up, the time to compute pagelists goes
up.  It also means that as the pages grow larger, the search
time goes up.
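
To make the problem concrete, the work a single pagelist implies is
roughly this (a simplified sketch of the idea, not the actual
pagelist.php code; ReadPage is PmWiki's page-reading function):

   ## Simplified sketch: one file read plus one full-text scan per
   ## candidate page, so cost grows with both page count and page size.
   function NaivePagelistSearch($pagenames, $term) {
     $matches = array();
     foreach ($pagenames as $pagename) {
       $page = ReadPage($pagename);                 ## disk read
       if ($page && stristr($page['text'], $term))  ## text scan
         $matches[] = $pagename;
     }
     return $matches;
   }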


I have an old machine, a ~266MHz Pentium, so it should be a
good place to expose performance bottlenecks. :)

I have approximately 500 pages in my search criteria for
categories, but they are very small pages, ~10 lines each,
and the performance of pagelist is abysmal.  As a reference
point, the wc for my wiki.d dir is (lines, words, bytes):

7138  17987 282663 total



                          ------------

My research has led me to conclude that 2 things are really
slow here:


 1) The FmtPageName function, which the comments say is
    used to:

   ## FmtPageName handles $[internationalization] and $Variable
   ## substitutions in strings based on the $pagename argument.

   I have replaced it with this simple hack and get drastic
   speed improvements on my pages.  For example, a page with a
   pagelist went from 39s to 27s to render.

   This hack simply hard-codes the defaults that I seem to need
   on my site.

   Simple hack (pmwiki.php):

      function FmtPageName($fmt, $pagename) {
        global $FarmD;
        ## Short-circuit the two formats that dominate during a
        ## pagelist; everything else falls through to the original.
        if ($fmt == 'wiki.d/$FullName') return "wiki.d/$pagename";
        if ($fmt == '$FarmD/wikilib.d/$FullName')
          return "$FarmD/wikilib.d/$pagename";
        return FmtPageNameO($fmt, $pagename);
      }

      (Rename the old FmtPageName to FmtPageNameO.)

   Now obviously I am not suggesting running things this
   way, but it is a serious contender for optimization.




 2) Reading many files in PHP.  I made many hacks with page
    content caching.  On pages with multiple pagelists this
    is a great improvement.  The problem is, of course, that
    this only proves the reading is slow; it doesn't help
    speed up the simple (and probably most important) case
    of a single pagelist.

    To speed this up, I resorted to brute force: grep.

    This hack is interesting because it does not actually
    seem to speed things up unless used with hack #1
    (FmtPageName); in fact, it can slow things down.  But
    with hack #1, that same page now renders in ~10s!!!!

    This hack simply prefilters the pages with a grep.

    Add this line in pagelist.php to call my hack
    functions:

       foreach($incl as $t) $pagelist = StristrPages($pagelist, $t);

    It goes in the function FmtPageList, after this if statement:
    
      if (@$opt['trail']) {
	$t = ReadTrail($pagename,$opt['trail']);
	foreach($t as $pagefile) $pagelist[] = $pagefile['pagename'];
      } else $pagelist = ListPages($pats);



    The new hack functions:

       ## Case-insensitive, fixed-string prefilter (grep -iF).
       function StristrPages($pages, $str) {
         return GrepFilterPages($pages, $str, "-iF");
       }

       ## Return only the pages whose files contain $str, by handing
       ## the whole list of page files to a single grep -l call.
       function GrepFilterPages($pages, $str, $args) {
         global $WikiLibDirs;
         $names = array(); $files = ''; $out = array();
         foreach($pages as $pagename)
           foreach((array)$WikiLibDirs as $dir) {
             $pagefile = FmtPageName($dir->dirfmt, $pagename);
             if (!file_exists($pagefile)) continue;
             $names[$pagefile] = $pagename;
             $files .= " '$pagefile'";
             break;   ## use the first directory that has the page
           }
         $grep = shell_exec("grep -l $args " . escapeshellarg($str) . " $files");
         foreach(explode("\n", $grep) as $pagefile)
           if (isset($names[$pagefile])) $out[] = $names[$pagefile];
         return $out;
       }



                          ------------


So, what does this mean?  The grep hack shows that adding
some type of search engine would be a great benefit to
categories.  The FmtPageName hack is less obvious.  Can the
names be cached somehow, perhaps on disk?
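
Even a per-request, in-memory memoization might help; a minimal
sketch (untested, and it assumes the result for a given
($fmt, $pagename) pair doesn't change within one request) would be:

   function FmtPageNameCached($fmt, $pagename) {
     static $cache = array();
     ## Memoize on the ($fmt, $pagename) pair; misses fall through
     ## to the original (renamed) function.
     if (!isset($cache[$fmt][$pagename]))
       $cache[$fmt][$pagename] = FmtPageNameO($fmt, $pagename);
     return $cache[$fmt][$pagename];
   }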

Any feedback about these hacks is welcome: do they work for
you, are they blatantly incorrect?  If anyone wants to know
what kind of caching work I hacked together, I can post
that too (it's ugly).
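
The core idea, though, is just memoizing page reads within a request;
a minimal sketch of it (not the actual, much uglier code) is:

   ## Read each page from disk at most once per request.  This is
   ## what helps pages that contain several pagelists.
   function ReadPageCached($pagename) {
     static $cache = array();
     if (!isset($cache[$pagename]))
       $cache[$pagename] = ReadPage($pagename);
     return $cache[$pagename];
   }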


-Martin




