[pmwiki-users] pageindex (was: Category - Links not to include)

Patrick R. Michaud pmichaud at pobox.com
Tue May 8 12:43:41 CDT 2007


On Tue, May 08, 2007 at 06:00:20PM +0200, Petko Yotov wrote:
> > For the (five) pages that remain, we then read the pages to
> > check permissions, verify that {*$FullName} is really a link
> > target (as opposed to simply having the name in the page text
> > somewhere),
> 
> Because the links are separated in the .pageindex entries by ":", 
> I thought it was enough to determine the linked pages (as opposed 
> to text matches). 

It may be, but the overall philosophy I've taken thus far is
to treat .pageindex as advisory, not authoritative.  We can
revisit this again at some point if we really need to.
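To illustrate what "advisory, not authoritative" means in practice, here is a rough Python sketch. It is not PmWiki's PHP code. `pages_linking_to`, `index`, and `read_links` are hypothetical names, and treating unindexed pages as candidates is an assumption made for the illustration. The index only narrows the candidate set. Each candidate is then re-read, so a stale index entry cannot produce a false match.

```python
from typing import Callable, Dict, Iterable, List, Set


def pages_linking_to(
    target: str,
    all_pages: Iterable[str],
    index: Dict[str, Set[str]],
    read_links: Callable[[str], Set[str]],
) -> List[str]:
    """Return the pages that really link to `target`.

    `index` stands in for .pageindex (page name -> link targets). It is
    consulted only to discard pages cheaply; `read_links` reads the page
    itself and is the authoritative check.
    """
    candidates = [
        name for name in all_pages
        # Assumption: a page with no index entry cannot be ruled out, so keep it.
        if name not in index or target in index[name]
    ]
    # Verify every remaining candidate against the actual page contents.
    return sorted(name for name in candidates if target in read_links(name))


if __name__ == "__main__":
    pages = ["Main.A", "Main.B", "Main.C"]
    # Hypothetical stale index: it still says B links to the target, and it
    # has no entry at all for C.
    index = {"Main.A": {"Main.Target"}, "Main.B": {"Main.Target"}}
    actual = {"Main.A": {"Main.Target"}, "Main.B": set(), "Main.C": {"Main.Target"}}
    print(pages_linking_to("Main.Target", pages, index, lambda n: actual[n]))
```

With this design, a stale index can cost some extra page reads, but it cannot change which pages end up in the result.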

> But again, 
> * when $EnablePageListProtect is disabled, and
> * when the pageindex entry is up-to-date, and also,
> * the search is just on name, group, link and/or single words, and
> * there is no search on PageTextVars, nor on quoted expressions, and
> * the selected order is on name/group/mtime,

...sorting on mtime requires reading the page to get the
modification time.

> (actually, many restrictions, but I bet most searches almost never need more),
> 
> then it would have all data without opening every file in the list. If the 
> match contains 100s of pages, and the pagelist is limited by count=10..20 
> then pmwiki seems to open/read all 100s of pages, and not only page10 to 
> page20.

We can't know which pages will be 10..20 until after all sorting
and filtering is done.  
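A small Python sketch can make the ordering argument concrete. `pagelist_window` and `read_mtime` are hypothetical names. The sketch assumes that count=10..20 means the 10th through 20th items. It also assumes that reading a page's mtime requires opening the page, as noted above. Every matching page has to be filtered and sorted before the 10..20 slice can be taken.

```python
from typing import Callable, List


def pagelist_window(
    pages: List[str],
    matches: Callable[[str], bool],
    sort_key: Callable[[str], object],
    first: int,
    last: int,
) -> List[str]:
    """Return items first..last (1-based, inclusive) of the sorted matches."""
    # Filtering and sorting must touch every page; only afterwards do we
    # know which pages occupy positions first..last.
    selected = [p for p in pages if matches(p)]
    selected.sort(key=sort_key)  # sort_key is called once per selected page
    return selected[first - 1:last]


if __name__ == "__main__":
    reads: List[str] = []
    # Hypothetical modification times for 25 pages (all distinct).
    mtimes = {f"Main.P{i:02d}": (i * 7) % 25 for i in range(25)}

    def read_mtime(name: str) -> int:
        reads.append(name)  # stands in for opening the page file
        return mtimes[name]

    window = pagelist_window(list(mtimes), lambda p: True, read_mtime, 10, 20)
    print(len(window), len(reads))
```

Only 11 pages are displayed, but all 25 were read to find them. That is why a count= limit by itself does not reduce the number of page reads when sorting on mtime.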

Beyond that, it seems to me that the bulk of pagelist processing time
occurs in formatting the output (using pagelist templates)
and not in finding the set of pages to display.  Perhaps I'm wrong
here -- it might be useful to come up with a few "pathological"
queries that we can benchmark.

> May I ask about the PageListCache mechanism, does it also need to read all 
> pages once the list is cached? 

If PmWiki reads the pagelist results from a cache, then PmWiki reads
only those pages for which it needs to check permissions.
If $EnablePageListProtect is false (and we had a successful cache
hit), then none of the pages are read.

(In looking at the code I just discovered that PmWiki may still
read the pages if page variable filtering is being used... but
this can be easily corrected.)

> And what happens if a user has read 
> permissions on some pages, and another user has not, for the same cached 
> pagelist?

If caching is enabled, the pagelist cache contains all pages
that match the search criteria, regardless of the visitor's
read permissions.  A later step in the process performs the
checks for read permission based on the visitor's credentials.

But even here the pagelist cache optimizes things, because as it
builds the list of pages for the cache, it also keeps track of
which pages have some sort of read protection on them.  So, when
the pages are later loaded from the pagelist cache, pagelist
checks only those pages that have read protection instead of
checking them all.  
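The caching behavior described above can be modeled roughly as follows. This Python sketch is only an illustration, not PmWiki's implementation. `CachedPagelist`, `build_cache`, `pages_for_visitor`, `has_read_protection`, and `can_read` are all hypothetical names. The `enable_protect` flag plays the role of $EnablePageListProtect. The cache is built once for all visitors and records which matches carry read protection. On a cache hit, only those pages get a per-visitor check, and no page is checked when protection is off.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class CachedPagelist:
    pages: List[str]                                  # every page matching the search
    protected: Set[str] = field(default_factory=set)  # subset with read protection


def build_cache(
    matches: List[str], has_read_protection: Callable[[str], bool]
) -> CachedPagelist:
    # Built independently of the visitor: store all matches, and note
    # which of them have some sort of read protection.
    return CachedPagelist(list(matches), {p for p in matches if has_read_protection(p)})


def pages_for_visitor(
    cache: CachedPagelist, enable_protect: bool, can_read: Callable[[str], bool]
) -> List[str]:
    if not enable_protect:
        # Protection disabled: return the cached list without checking any page.
        return list(cache.pages)
    # Only protected pages need a check against this visitor's credentials.
    return [p for p in cache.pages if p not in cache.protected or can_read(p)]


if __name__ == "__main__":
    cache = build_cache(["Main.A", "Site.Secret", "Main.C"], lambda p: p.startswith("Site."))
    checked: List[str] = []

    def can_read(p: str) -> bool:
        checked.append(p)
        return False  # hypothetical visitor without read access

    print(pages_for_visitor(cache, True, can_read), checked)
    print(pages_for_visitor(cache, False, can_read))
```

In this model, the cost of a cache hit scales with the number of protected pages in the result rather than with the total number of matches.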

Pm



