[pmwiki-users] Pageindex and distributed documentation (Was Strange pagelist behaviour)

Patrick R. Michaud pmichaud at pobox.com
Thu Jun 15 10:46:20 CDT 2006


On Tue, Jun 13, 2006 at 09:43:56AM -0400, Pico wrote:
> In a thread called "Strange pagelist behavior" Marc Cooper wrote:
> > Anyway, the long and short of it is that I deleted the .pageindex file 
> > in the home wiki, and missing items reappeared.

Oops.  Yes, this is a "bug", but it's a bug allowed by
design for performance reasons.  I just haven't been very good
about pointing out the places where the bug manifests itself,
and you stumbled across one.  Mea culpa.

> (1) Are there situations when we should recommend that administrators 
> delete their pageindex file as part of an upgrade?

Probably.  Better still would be for PmWiki to automatically detect
when it's a good idea to rebuild the .pageindex file from scratch,
but until we get to that point I should probably be more careful
about .pageindex.

Background:  .pageindex is a file that helps improve the performance
of searches and pagelists by reducing the number of pages that have
to be scanned to produce results.  When PmWiki is first installed,
.pageindex doesn't exist, but PmWiki slowly builds the index in
response to search requests and page edits.  Eventually the entire
index is built and things run reasonably fast.

As long as pages change due to normal edits, everything is okay,
but if a page's contents change via some external mechanism, such
as performing an upgrade, then the .pageindex can be "out of sync"
with its contents.  That's what appears to have happened here --
the .pageindex file was based on the pre-upgrade contents of
the FAQ pages.  The .pageindex can also become out of date if
page files are renamed.

As you both discovered, the quick-and-easy solution is to
remove the .pageindex file and let PmWiki build it up again.
This is always a safe solution, and it just means that searches
and pagelists may run a bit slower until PmWiki rebuilds the
index.
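
As a minimal sketch of that manual fix (not part of PmWiki itself):
the helper below removes the index file so it can be rebuilt on
later searches.  It assumes the index is a file named .pageindex
inside the wiki's working directory (often wiki.d/); the function
name and the directory argument are hypothetical, so adjust them to
the actual installation.

```python
from pathlib import Path


def remove_pageindex(work_dir: str) -> bool:
    """Delete work_dir/.pageindex if it exists.

    Returns True if a file was removed, False if there was nothing to remove.
    The location of the index file is an assumption; check the site's
    configuration before relying on it.
    """
    index_file = Path(work_dir) / ".pageindex"
    if index_file.is_file():
        index_file.unlink()
        return True
    return False
```

Calling remove_pageindex("wiki.d") a second time simply returns
False, so it is harmless to run after every upgrade.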

But the better long-term solution would be to somehow automatically
detect when .pageindex entries are out-of-date with respect to the
pagefiles, and fix it appropriately.  There's already a halfway
solution in place -- if .pageindex says to include a page that
ought to be excluded, then the normal pagelist checks detect
this and force an update of .pageindex for that page.  But the
opposite situation, where .pageindex excludes a page that ought
to have been included (because it was changed by some external
process), is much more difficult to detect.
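
The asymmetry can be illustrated with a toy model.  This is only a
sketch under simplifying assumptions: the index is a dict mapping a
page name to the set of words recorded when the page was last
indexed, and pages is a dict of current page text.  Neither
structure reflects PmWiki's real .pageindex format, and pagelist()
here is a hypothetical stand-in for the real search code.

```python
def pagelist(term, index, pages):
    """Return names of pages containing term, using index as a prefilter.

    index: dict of page name -> set of words recorded at last indexing
    pages: dict of page name -> current page text
    """
    results = []
    for name, words in index.items():
        if term not in words:
            # Index says "exclude": the page text is never read, so a
            # stale entry here goes unnoticed.
            continue
        # Index says "include": the page is scanned to confirm the match.
        current_words = set(pages[name].lower().split())
        if term in current_words:
            results.append(name)
        else:
            # Stale "include" entry is caught and refreshed in place.
            index[name] = current_words
    return results
```

If a page gains the word "reindex" through an external change while
its index entry still lacks it, pagelist("reindex", ...) skips the
page without ever opening it.  A page that lost a word is scanned,
found not to match, and its entry is corrected.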

> (2) Are there other situations, besides pageindex, where upgrades to 
> wikifarms may involve issues that might not ordinarily be a concern when 
> upgrading to a non-farm installation?

I'm sure there are, but I can't think of any at the moment.
And this particular case wasn't unique to farms -- any upgrade
to a simple wiki site can cause the .pageindex to be out of
date for any distributed pages that may have changed.

> (3) Should these be dealt with through documentation, or programatically?

Programmatically is better if we can do it.

> (4) Should distributed documentation be treated differently from all 
> other (site and author generated) content, when it comes to pageindexing 
> and pagelists?

I'd prefer not to -- I don't like making too many distinctions
and special-cases whenever it can be avoided.

> Brainstorming here, we could be talking about:
> 
> (a) a separate pageindex for distributed documentation,

Nope, too difficult to maintain, especially since it's possible
for sites to choose to have locally modified versions of pages.

> (b) using farm-aware code, the way we used $FarmD in our 
> configuration.php files, that will look for the pageindex on the farm 
> when accessing distributed documentation in the PmWiki group,

Not really applicable, since the underlying problem isn't really
associated with farms.

A short-term solution might be to simply provide a "reindex" option
of some sort that makes it easy for someone to say "I think these
results might be wrong -- double-check the index".

But I'd still prefer to come up with some way for pageindex
to quickly detect when a page might have been changed outside
the normal wiki editing process (without having to actually
rescan the page on each check).
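
One possible approach, sketched here purely as an illustration and
not as anything PmWiki does: record each page file's modification
time and size when it is indexed, and later compare those values
with a cheap stat call instead of reading the page.  The names
fingerprint() and stale_pages(), and the idea of storing this data
alongside the index, are assumptions made for the example.

```python
import os


def fingerprint(path):
    """Return (mtime in ns, size in bytes) for a page file, without reading it."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def stale_pages(recorded, page_dir):
    """List page files whose fingerprint differs from the recorded one.

    recorded: dict of page file name -> fingerprint stored at index time
    page_dir: directory holding one file per page
    Files with no recorded fingerprint are reported as stale too.
    """
    stale = []
    for name in sorted(os.listdir(page_dir)):
        path = os.path.join(page_dir, name)
        if name.startswith(".") or not os.path.isfile(path):
            continue  # skip dotfiles such as the index itself
        if recorded.get(name) != fingerprint(path):
            stale.append(name)
    return stale
```

Only the pages returned by stale_pages() would need rescanning.  The
check is a heuristic: an external change that preserves both the
modification time and the size would slip through, and normal edits
would have to update the recorded fingerprint as well.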

Pm



