[pmwiki-devel] Pagelist Caching

Patrick R. Michaud pmichaud at pobox.com
Fri May 25 16:37:49 CDT 2007


On Fri, May 25, 2007 at 01:00:54PM -0700, Martin Fick wrote:
> You performed some tests with large pagelists to
> evaluate where most of the time is spent and concluded
> that a large portion is spent rendering the pagelist,
> but I am not sure that is a fair conclusion.  
> 
> Primarily, I think that you are considering A)
> pagelists that create large result sets but you may be
> ignoring B) the pagelists which must scan a large
> amount of pages but which only end up producing small
> pagelists!  

I haven't ignored (B) -- it's just that for the examples
that have been recently discussed on the mailing list, it
was the size of the output set that was eating up the
majority of time and not the time needed to compute the
pagelist.  But the pagelist caching algorithm in PmWiki
_definitely_ targets the (B) situation you describe.

Let's see an example where the pagelist cache is already
doing (B): http://www.pmwiki.org/wiki/Test/AuthList2 .
In this case, pagelist is looking for pages on the site
that have some sort of password associated with them.
Thus, in order to find these pages, it has to scan all
of the pages on the site (5731) to come up with the 80
pages that have a password of some sort on them.

We can see what is happening in the pagelist production via 
the stopwatch at the bottom of the page.

When computing the list from scratch, the system actively
scans all 5731 pages looking for those that have passwords.
Here's the relevant stopwatch trace -- I've added line numbers
to the stopwatch output to make it easier to reference them 
in the description:

   0: 00.00 00.00 config start
   ...
   4: 00.09 00.09 FPLTemplate begin
   5: 00.09 00.09 MakePageList pre
   6: 00.09 00.09 PageListSources begin
   7: 00.09 00.09 PageStore::ls begin wiki.d/{$FullName}
   8: 00.15 00.12 PageStore::ls merge wiki.d/{$FullName}
   9: 00.22 00.19 PageStore::ls end wiki.d/{$FullName}
  10: 00.23 00.21 PageStore::ls begin $FarmD/wikilib.d/{$FullName}
  11: 00.24 00.21 PageStore::ls merge $FarmD/wikilib.d/{$FullName}
  12: 00.24 00.21 PageStore::ls end $FarmD/wikilib.d/{$FullName}
  13: 00.27 00.23 PageListSources end count=5731
  14: 00.27 00.24 PageListSort pre ret=4 order=name
  15: 00.27 00.24 MakePageList items count=5731, filters=PageListPasswords
  16: 02.84 02.31 MakePageList post count=80, readc=5731
  17: 02.84 02.31 PageListCache begin save key=5a8da5720010ae125b59fa8e5c6022bc
  18: 02.84 02.31 PageListCache end save
  ...
  22: 02.85 02.32 MakePageList end
  23: 03.23 02.68 MarkupToHTML begin
  24: 04.05 03.48 MarkupToHTML end
  25: 04.05 03.48 FPLTemplate end

It takes 0.18 wall-clock seconds to scan the pagestores for a
list of the 5731 pages on the site (lines 6-13), and then 
an additional 2.57 seconds to read all 5731 of them and find the 80
that have passwords on them (line 16, readc=5731, count=80).

Pagelist then saves this list as key=5a8da5720010ae125b59fa8e5c6022bc,
so that it can be used later.  The total time to scan all 5731
pages and produce the list of 80 was 2.76 seconds (lines 5 and 22)
and it then takes an additional 1.20 seconds to render the output
(line 24).  Total time for the pagelist is 3.96 seconds (lines 4 and 25).
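The arithmetic above can be rechecked mechanically. The following is only a sketch in Python (not part of PmWiki): it assumes each stopwatch line has the form "index: wall-clock cpu label", as in the trace quoted above, and the helper names `parse_trace` and `elapsed` are made up for illustration.

```python
import re

# Excerpt of the uncached trace quoted above: index, wall-clock, CPU, label.
TRACE = """\
   4: 00.09 00.09 FPLTemplate begin
   5: 00.09 00.09 MakePageList pre
   6: 00.09 00.09 PageListSources begin
  13: 00.27 00.23 PageListSources end count=5731
  15: 00.27 00.24 MakePageList items count=5731, filters=PageListPasswords
  16: 02.84 02.31 MakePageList post count=80, readc=5731
  22: 02.85 02.32 MakePageList end
  24: 04.05 03.48 MarkupToHTML end
  25: 04.05 03.48 FPLTemplate end
"""

LINE_RE = re.compile(r"^\s*(\d+):\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(.*)$")


def parse_trace(text):
    """Map each trace line number to its wall-clock time in centiseconds."""
    wall = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            # Store integer centiseconds so subtraction is exact.
            wall[int(m.group(1))] = round(float(m.group(2)) * 100)
    return wall


def elapsed(wall, start, end):
    """Wall-clock seconds between two numbered trace lines."""
    return (wall[end] - wall[start]) / 100


wall = parse_trace(TRACE)
print(f"list pagestores : {elapsed(wall, 6, 13):.2f}s")   # lines 6-13
print(f"read and filter : {elapsed(wall, 15, 16):.2f}s")  # lines 15-16
print(f"compute list    : {elapsed(wall, 5, 22):.2f}s")   # lines 5-22
print(f"render output   : {elapsed(wall, 22, 24):.2f}s")  # lines 22-24
print(f"whole pagelist  : {elapsed(wall, 4, 25):.2f}s")   # lines 4-25
```

The first column of timestamps is the wall-clock time used in the figures above; the second column (CPU time) is parsed but not used here.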

If we then do a page reload, the pagelist is able to reload
the list from the cache instead of having to rescan the 5731
pages (assuming nothing has invalidated the cache).  Here's 
the stopwatch trace:

   0: 00.00 00.00 config start
   ...
   4: 00.10 00.09 FPLTemplate begin
   5: 00.10 00.09 MakePageList pre
   6: 00.10 00.09 PageListCache begin load key=5a8da5720010ae125b59fa8e5c6022bc
   7: 00.10 00.09 PageListCache end load
   8: 00.10 00.09 PageListSources begin
   9: 00.10 00.09 PageListSources end count=80
  10: 00.10 00.09 PageListSort pre ret=4 order=name
  11: 00.10 00.09 MakePageList items count=80, filters=
  12: 00.11 00.10 MakePageList post count=80, readc=0
  ...
  16: 00.11 00.10 MakePageList end
  17: 00.45 00.42 MarkupToHTML begin
  18: 01.22 01.18 MarkupToHTML end
  19: 01.22 01.18 FPLTemplate end

Here the pagelist function detected that it had a valid pagelist
cache entry, so instead of scanning the 5731 pages it simply
loaded the final list of 80 directly from the cache (0.01 seconds,
lines 5-12).  No page reads were involved (readc=0, line 12).

So, the cache reduced the time to compute the list of 80
pages from 2.76 seconds to 0.01 second.
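To make the idea concrete, here is a minimal sketch in Python of a result cache of this kind. It is not PmWiki's code: the class `PageListCache`, the function `make_page_list`, the use of an MD5 hash of the query parameters as the key, and the `site_version` counter used for invalidation are all assumptions chosen for illustration. The message above does not say how the real key is derived or what invalidates an entry.

```python
import hashlib


class PageListCache:
    """Toy in-memory cache mapping a query key to a list of page names."""

    def __init__(self):
        self._entries = {}  # key -> (site_version, page_names)

    @staticmethod
    def key_for(params):
        # Hypothetical key: hash of the sorted query parameters, so that
        # identical queries map to the same cache entry.
        canonical = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    def load(self, key, site_version):
        # An entry is valid only if the site has not changed since it was saved.
        entry = self._entries.get(key)
        if entry is not None and entry[0] == site_version:
            return entry[1]
        return None

    def save(self, key, site_version, names):
        self._entries[key] = (site_version, list(names))


def make_page_list(pages, params, matches, cache, site_version):
    """Return (matching page names, number of pages read)."""
    key = cache.key_for(params)
    cached = cache.load(key, site_version)
    if cached is not None:
        return cached, 0  # cache hit: no pages are read
    reads = 0
    result = []
    for name in sorted(pages):
        reads += 1  # every page must be read to test it
        if matches(pages[name]):
            result.append(name)
    cache.save(key, site_version, result)
    return result, reads


# Made-up site: 5731 pages, of which every 72nd (80 in total) has a password.
pages = {f"Test.Page{i:04d}": {"passwd": i % 72 == 0} for i in range(5731)}
cache = PageListCache()
params = {"passwd": "1", "order": "name"}
has_password = lambda page: page["passwd"]

cold, cold_reads = make_page_list(pages, params, has_password, cache, site_version=1)
warm, warm_reads = make_page_list(pages, params, has_password, cache, site_version=1)
print(len(cold), cold_reads)  # full scan
print(len(warm), warm_reads)  # served from the cache
```

In this sketch, bumping `site_version` after any page edit forces the next request to rescan, which is one simple (and deliberately coarse) way to model "assuming nothing has invalidated the cache".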

This list of 80 pages then goes through the pagelist template
formatting (1.11 seconds, line 18), for a total time of 1.12
seconds to produce the pagelist output from the cache.

So yes, the pagelist cache is addressing exactly the situation
that you say it should, by eliminating the time required to scan
a large number of pages to produce a relatively short list.  
It does not improve the speed of rendering the list, however.
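Putting the two runs side by side shows why the overall gain is smaller than the gain in computing the list. This is just a Python restatement of the figures already quoted above (held as integer centiseconds); the variable names are illustrative.

```python
# Wall-clock figures from the two traces, in centiseconds.
cold = {"compute": 276, "render": 120, "total": 396}  # list built from scratch
warm = {"compute": 1, "render": 111, "total": 112}    # list loaded from cache

compute_speedup = cold["compute"] / warm["compute"]
overall_speedup = cold["total"] / warm["total"]
render_share_cold = 100 * cold["render"] / cold["total"]
render_share_warm = 100 * warm["render"] / warm["total"]

print(f"list computation: {compute_speedup:.0f}x faster")
print(f"whole pagelist:   {overall_speedup:.2f}x faster")
print(f"render share:     {render_share_cold:.0f}% uncached, "
      f"{render_share_warm:.0f}% cached")
```

Once the list comes from the cache, nearly all of the remaining time is template rendering, which the cache does not touch.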

Hope this is satisfactory...?

Pm


