[pmwiki-users] Pagelists, stopwatches, and caches (was: Pagelists caching)

Patrick R. Michaud pmichaud at pobox.com
Wed May 16 20:03:57 CDT 2007


On Wed, May 16, 2007 at 12:47:03PM -0500, Patrick R. Michaud wrote:
> On Wed, May 16, 2007 at 07:32:14PM +0200, Christophe David wrote:
> >    I made some more tests with the following pagelist :
> >
> >    (:pagelist group=RCB name=RCB?* list=normal order=$:_city fmt=#RCB_city:)
> ...
> This will display various timings at the bottom of each page, similar
> to what is done for pages in the Test group on pmwiki.org.
> ...
> I'll post an example in a followup message to this one.

Here's the followup example.  For this example I'm using the
page at http://www.pmwiki.org/wiki/Test/PageListCache , which
contains the markup

  (:pagelist group=PITS name=[0-9]* list=normal order=$:Summary fmt=#default :)

This is very similar to the pagelist markup that Christophe is using
on his site.  The #default template is given as:

  [[#default]]
  (:if ! equal {=$Group} {<$Group}:)
  
  :[[{=$Group}/]] /:
  (:if:)
  : :[[{=$Group}/{=$Name}]] [-{=$:Summary}-]
  [[#defaultend]]


The "stopwatch" results appear at the bottom of the Test.PageListCache
page output, and the rest of this message will relate that output
to what's happening internally.  If you aren't interested in 
pagelist caching or the gory PmWiki internals, it's safe to skip
this message.  :-)

In the stopwatch output, each line represents an "event" of
some sort in PmWiki's overall processing.  For example, the
first three lines of the stopwatch output read:

  00.00 00.00 config start
  00.00 00.00 config end
  00.09 00.08 MarkupToHTML begin

The first number of each line is the cumulative 
"wall clock time" since the first event, and the
second number is the cumulative CPU seconds used.
I'll just use the wall clock timings in this example.

The "config start" and "config end" lines above are specific to
pmwiki.org, and indicate the amount of time needed to process
local customization files.  As you can see, it doesn't take
much (00.00 seconds).  The "MarkupToHTML begin" line marks
the point where PmWiki begins processing the markup for
Test.PageListCache, 0.09 seconds after we logged "config start".

  00.10 00.09 FPLTemplate begin
  00.11 00.10 MakePageList pre
  00.11 00.10 PageListSources begin
  00.11 00.10 PageStore::ls begin wiki.d/{$FullName}
  00.16 00.13 PageStore::ls merge wiki.d/{$FullName}
  00.27 00.23 PageStore::ls end wiki.d/{$FullName}
  00.27 00.23 PageStore::ls begin $FarmD/wikilib.d/{$FullName}
  00.27 00.23 PageStore::ls merge $FarmD/wikilib.d/{$FullName}
  00.28 00.23 PageStore::ls end $FarmD/wikilib.d/{$FullName}
  00.28 00.24 PageListSources end count=922

The "FPLTemplate begin" indicates the point where the (:pagelist:)
directive is being processed via a pagelist template.  So, we can
base the overall time spent on pagelist processing from here.
The first thing that happens is that PmWiki scans the wiki.d/
and wikilib.d/ directories (over 5,600 pages) for pagenames matching
the group=PITS and name=[0-9]* patterns.  When I ran the above
it took pmwiki.org 0.18 seconds to perform this scan, and we
end up with 922 matching pages (count=922).

  00.28 00.24 PageListSort pre ret=4 order=$:Summary
  00.28 00.24 MakePageList items count=922, filters=
  00.30 00.25 MakePageList post count=922, readc=0
  00.30 00.25 PageListCache begin save key=b1924c551f2a4a9203fbdd780ccb9339
  00.30 00.26 PageListCache end save

Next, we loop through the list of 922 pages, building the
internal pagelist structure as we go.  If we were doing any
search term filters, or if $EnablePageListProtect is enabled,
then we'd be checking search terms and page permissions here.
Here it doesn't appear that any additional checking was required
that would involve reading pages (readc=0), so we complete this 
step in 0.02 seconds.

This also shows where the resulting pagelist is saved to the
pagelist cache, so that it can be recalled later without having
to scan the directory and repeat any searches.  The pagelist
is stored in the cache under the key "b1924c551f2a4a9203fbdd780ccb9339".

  00.30 00.26 PageListSort begin
  02.53 02.33 PageListSort end
  02.53 02.33 MakePageList end

This is where we start to see some time usage.  Since we specified
order=$:Summary (a page text variable), PmWiki now has to read the 
text of all 922 pages to extract the corresponding $:Summary values.
It takes 2.23 seconds to read those pages and perform the corresponding
sort.  We've now computed our final pagelist, and we can return this
to the pagelist template formatter for converting the list to HTML.
The total time to required to do the search/sort was 2.42 seconds
(the time elapsed between "MakePageList begin" and "MakePageList end").

  02.89 02.69 MarkupToHTML begin
  04.02 03.80 ReadApprovedUrls Site.ApprovedUrls begin
  04.03 03.80 ReadApprovedUrls Site.ApprovedUrls end
  05.78 05.54 MarkupToHTML end
  05.78 05.54 FPLTemplate end

The FPLTemplate routine next applies the #default template to
all 922 pages, concatenating the results of each (0.36 seconds).
Then that markup is passed to the MarkupToHTML function to be
converted into HTML, requring another 2.89 seconds (time elapsed
from "MarkupToHTML begin" and "MarkupToHTML end").  We're
then done with the (:pagelist:) command and can return to
processing the rest of the page.

In other words, processing this pagelist required a total of 5.67
seconds, 2.42 was spent determining the pages that belong in the
list, and 3.25 was spent formatting that list into HTML.  

It's worth highlighting at this point that any pagelist caching
can only improve the first part of this total -- it doesn't
help us at all with the rendering.

The remainder of the stopwatch entries deal with the rest of the
page output...

  05.80 05.55 MarkupToHTML end

This finishes the rendering of the main page.

  05.82 05.57 MarkupToHTML begin
  05.86 05.61 MarkupToHTML end
  05.86 05.61 MarkupToHTML begin
  05.88 05.63 MarkupToHTML end
  06.17 05.64 now

The first pair of MarkupToHTML entries are for the SideBar (0.04 seconds) 
and the second pair are for Site.PageActions (0.02 seconds).  Finally
we send everything to the browser, and report the total time elapsed
up to the point of generating the stopwatch output.

Okay, let's see the timings when we manage to "hit" the pagelist 
cache:

  00.00 00.00 config start
  00.00 00.00 config end
  00.08 00.08 MarkupToHTML begin
  00.09 00.08 FPLTemplate begin
  00.09 00.09 MakePageList pre
  00.09 00.09 PageListCache begin load key=b1924c551f2a4a9203fbdd780ccb9339
  00.09 00.09 PageListCache end load
  00.09 00.09 PageListSources begin
  00.10 00.09 PageListSources end count=922
  00.10 00.09 PageListSort pre ret=4 order=$:Summary
  00.10 00.09 MakePageList items count=922, filters=
  00.11 00.10 MakePageList post count=922, readc=0

Because PmWiki determined it was safe to load pages from the cache,
we have our set 922 pages very quickly -- only 0.10 second from the
time we started PmWiki until we got the set of 922.  Thus we got
to avoid the 0.18 seconds required to scan the directories,
find matching pages, and check for search terms.

But from here on out, the steps are as slow as before.

  00.11 00.11 PageListSort begin
  02.42 02.22 PageListSort end
  02.42 02.22 MakePageList end

Because our sort criteria is $:Summary, we have to read all
of the 922 pages to get their $:Summary value and perform the sort.
As before, this takes a little over 2 seconds to achieve.

At this point I hear everyone yelling out...

   "WAIT!  We shouldn't have to do the sort over again... 
    the caching algorithm should be saving the pagelist
    AFTER the sort, not before!!"

Actually, in this case (and in many/most cases) caching
the sorted output turns out to not save us much time overall.
Yes, we could cache the sort and avoid reading the 922 pages
at this step, but our pagelist template is going to be looking
for {$:Summary} for each of the 922 pages anyway, so we end up
reading them at some point anyway.  We'd just be deferring
the reads until later in the processing, and not really 
speeding up the overall pagelist.

We also have to be concerned about things like order=random
and order=$SomePageVariable.  Some sort criteria aren't constant
for a given set of pages, so we can't always rely on caching a
sorted version of the list.  (There are some special cases where 
there _could_ be an advantage to caching a sorted version of the
list... but I think they're pretty rare overall, and possibly not 
worth the effort needed to detect and code for them.  
PmWikiPhilosophy #3 applies.)

So, even with pagelist caching enabled, our total time to
get the set of pages is 2.33 seconds (versus 2.42 seconds
when we weren't doing the cache.

The remainder of the stopwatch for the cached case is quite similar
to the earlier one:

  02.78 02.58 MarkupToHTML begin
  03.92 03.68 ReadApprovedUrls Site.ApprovedUrls begin
  03.92 03.69 ReadApprovedUrls Site.ApprovedUrls end
  05.70 05.44 MarkupToHTML end
  05.70 05.44 FPLTemplate end
  05.72 05.45 MarkupToHTML end
  05.74 05.47 MarkupToHTML begin
  05.78 05.51 MarkupToHTML end
  05.78 05.51 MarkupToHTML begin
  05.80 05.53 MarkupToHTML end
  06.07 05.54 now

We still end up using 3.28 seconds to apply the pagelist template
to the list of 922 pages, for a total pagelist processing time
of 5.61 seconds.  For this particular query, the cache provided
a 0.06 second increase.

Okay, at this point I can hear people saying "Well, what good
is page caching?"  And you'd be somewhat correct -- for *this*
particular query, pagelist caching isn't providing much
benefit at all.  But on sites where $EnablePageListProtect is
enabled, or where pagelist is filtering pages based on search
terms, page variables, page text variables, link=, if=, or other
criteria, the pagelist cache can provide a _huge_ speedup by
reducing the number of filter checks needed.

All of this also goes back to what I've been indicating all along,
which is that although pagelist templates are extremely powerful,
they can also be very slow, especially on large pagelists or
complex templates.  This is also why I've recently been focusing 
my attention on improvements to the efficiency of pagelist templates 
as opposed to pagelist generation itself -- that's where the bigger 
payoffs seem to be.

For those who managed to make it to the bottom of this message,
I hope it was worth it.  :-)

Pm



More information about the pmwiki-users mailing list