[Pmwiki-users] How big can PmWiki get?

Patrick R. Michaud pmichaud at sci.tamucc.edu
Thu Mar 6 15:44:42 CST 2003


On 7 Mar 2003, John Rankin wrote:

> What is the largest PmWiki site in number of pages?

Of course there's no theoretical limit to the number of pages, because
of the way that PmWiki organizes them.

The practical limits are on the number of files that can be organized in
a single directory (wiki.d), and the time it takes to search through the
pages.  As a couple of further reference points, the SciTechWiki
(http://www.sci.tamucc.edu/wiki) currently has 1216 pages and 
the TAMUCC Wiki (http://www.tamucc.edu/wiki) has 1335 pages and both of
them seem to run without any problem; even searches don't seem to take
unreasonable amounts of time.  Both of these wikis are running under 
Red Hat Linux.  I know that under RH Linux I have some directories with
4000+ files in them with no major difficulties (mailing list archive
directories), so I don't think the OS will pose a barrier for PmWiki,
at least up to that point.

However, I did think about these problems a bit when designing PmWiki
and here's what I've decided thus far.  First, as far as directory
limitations go, I designed the PageFileFmt, WikiDir,  and WikiLibDirs 
variables so that an admin can organize the files into an alternate 
structure other than just "wiki.d/Group.Pagename".  For example, 
PageFileFmt could be changed to '$Group/$Title_' or '$Group/$PageName' 
and files would be stored as "wiki.d/Group/Pagename" or 
"wiki.d/Group/Group.Pagename", which would reduce the overall number
of files in any single directory.  Something would still have to create
the directories for the groups, but this isn't a big problem.  But even if 
this approach doesn't break things up sufficiently (e.g., a single
group with thousands of pages) there are still other options--with a 
couple of very minor extensions the files can be organized into 
directories based on the first character(s) of the title, as in 
"wiki.d/P/Group.Pagename" and "wiki.d/W/Group.WikiWord".  
This would spread the load out among more directories as well.  
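
To make the layouts above concrete, here is a rough sketch of how a page
name could map to a file path under each scheme.  It is written in Python
purely for illustration and is not PmWiki's code; the function name
page_path and the layout labels ("flat", "group", "group_full",
"initial") are hypothetical and exist only for this example.

```python
import posixpath


def page_path(pagename, layout="flat", wiki_dir="wiki.d"):
    """Return an illustrative storage path for a page named 'Group.Title'.

    The layout labels are made up for this sketch; they mirror the
    directory structures described in the message above.
    """
    group, title = pagename.split(".", 1)
    if layout == "flat":            # wiki.d/Group.Title
        parts = [pagename]
    elif layout == "group":         # wiki.d/Group/Title
        parts = [group, title]
    elif layout == "group_full":    # wiki.d/Group/Group.Title
        parts = [group, pagename]
    elif layout == "initial":       # wiki.d/T/Group.Title (T = first char of title)
        parts = [title[0], pagename]
    else:
        raise ValueError(f"unknown layout: {layout}")
    return posixpath.join(wiki_dir, *parts)


print(page_path("Group.Pagename"))                # wiki.d/Group.Pagename
print(page_path("Group.Pagename", "group"))       # wiki.d/Group/Pagename
print(page_path("Group.WikiWord", "initial"))     # wiki.d/W/Group.WikiWord
```

In a layout with subdirectories, something still has to create the
directory before the first page is written; in this sketch that could be
a call such as os.makedirs(os.path.dirname(path), exist_ok=True).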

And, of course, I could always look at relational databases or
other indexing schemes if it became necessary to do so.  However,
I like simplicity, and this is definitely one of those areas where 
I've chosen to avoid gratuitous features; i.e., take the simple 
approach for now and build the complex implementation only 
when a real demonstrated need arises, at which point the real 
parameters of the problem are better known.  I'm also quite
comfortable that if we need to change the back-end storage model at 
some point it'll be easy to create a migration path from the existing
schema to a new one.

As far as searching goes, I should state at the outset that PmWiki's
built-in search capability has always been a "quick-and-dirty,"
simplistic approach to searching.  My perspective on this issue is
that there are already some excellent site indexing and retrieval 
systems available, and it's much more effective to make use of the
existing packages than to try to duplicate that functionality
within PmWiki.  Plus, an index/retrieval engine can index more than
just the PmWiki components if desired.  So, advanced searching of
wiki pages is definitely one of those capabilities that I think can
be better handled outside of the PmWiki software itself.

Anyway, that's my current experience and thinking on these topics.
Thanks for bringing the question up to the list.  What I've written
here probably deserves a page or mention in the documentation somewhere,
but I'm not yet certain where it "fits".  

Pm


On 7 Mar 2003, John Rankin wrote:

> What is the largest PmWiki site in number of pages?
> 
> The encyclopedia is looking at perhaps 5000 pages for the prototype. When does search start slowing down, for example? Are there any limits we should be aware of?
> 
> So far 600+ pages (many of them long) is fine, but we'd like to avoid unpleasant surprises.
> 




