[pmwiki-users] Hierarchical Groups Proposal.

Joachim Durchholz jo at durchholz.org
Wed Oct 18 11:29:47 CDT 2006


Patrick R. Michaud wrote:
> On Wed, Oct 18, 2006 at 04:35:33PM +0200, Joachim Durchholz wrote:
>>> The Editor wrote:
>>> Jo said:
>>>> Ideally, metadata would be pages, possibly with a special marker in the
>>>> page name so PmWiki knows that they don't get meta-metadata.
>>> Separate pages for groupheaders and footers (which should NOT be
>>> inheritable to subgroups). But no to passwords, etc, which should be
>>> inheritable (ie: store in the page).
>>
>> Say, if the current page is named foo.bar.baz. The relevant attribute 
>> pages might be named foo.bar.baz.%Header, foo.bar.baz.%Footer, and 
>> foo.bar.baz.%passwords. If these don't exist, PmWiki would have to look 
>> in foo.bar.%Header, foo.bar.%Footer, and foo.bar.%passwords, then 
>> foo.%Header, foo.%Footer, and foo.%passwords, and finally %Header, 
>> %Footer, and %passwords.
> 
> Actually, at least in the case of passwords, PmWiki would practically
> need to scan the entire path to the root and not just stop at the 
> first one that is found.  Some passwords can be blank or otherwise
> inherited from a parent page.

Scanning for foo.%* is roughly the same effort as scanning for 
foo.%passwords.
So if PmWiki needs to go further up the hierarchy, it should scan for 
foo.%* anyway, and cache the results in case any additional metadata 
at that level is needed.
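
Roughly what I have in mind (a sketch only -- MetaPagesFor() and 
$MetaCache are made up for illustration, and I'm borrowing $WorkDir as 
the flat-file page directory):

    # Sketch: one directory scan per hierarchy level, cached so later
    # lookups at that level cost nothing.  (glob() itself is PHP >= 4.3,
    # so a readdir() loop would be the portable variant.)
    $MetaCache = array();

    function MetaPagesFor($pagename) {
      global $MetaCache, $WorkDir;
      if (isset($MetaCache[$pagename])) return $MetaCache[$pagename];
      $meta = array();
      $parts = explode('.', $pagename);
      while (true) {
        $pat = $parts ? implode('.', $parts) . '.%*' : '%*';
        foreach ((array)glob("$WorkDir/$pat") as $f) {
          $segs = explode('.', basename($f));
          $key = end($segs);          # e.g. '%passwords'
          $meta[$key][] = $f;         # keep every level, for password inheritance
        }
        if (!$parts) break;
        array_pop($parts);
      }
      return $MetaCache[$pagename] = $meta;
    }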

>> E.g. PmWiki could store the actual page text in a file, and all the 
>> other data (history and whatnot) in separate metadata files; that would 
>> make the PmWiki files far easier to read and process using external 
>> tools. 
> 
> On the other hand, it makes administrative overhead much worse,
> because one has to worry about keeping the files in sync.  
> Or, if we're really speaking only of reading the pages (and not
> updating them), then I would vote to keep the markup text in
> the pagefile as it is now, for administrative reasons, but overload 
> WritePage() such that it writes the text= attribute to separate 
> .txt files for external tools to process.

I think we have a spectrum here.

1. Keep everything that relates to a page in a single file. GroupHeader 
etc. could then go as attributes into the parent page.
2. As 1, but place a few often-needed things into their own files. The 
current version of the page is a candidate. Others might be split out 
from the "main file" as needed.
3. Split every kind of metadata out into a separate file.

Whatever the trade-offs dictate...
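
For the middle ground (option 2), Pm's WritePage() idea above might look 
roughly like this -- a sketch only; I'm assuming the WritePage($pagename, 
$page) call and the text= attribute as described, and the wrapper name 
and the text/ subdirectory are made up:

    # Keep the pagefile exactly as it is, but also dump the raw markup
    # into a parallel .txt file for external tools to read.
    function WritePageWithText($pagename, $page) {
      global $WorkDir;
      WritePage($pagename, $page);              # the normal pagefile write
      $dir = "$WorkDir/text";                   # made-up export location
      if (!is_dir($dir)) mkdir($dir, 0755);
      $fp = fopen("$dir/$pagename.txt", 'w');
      if ($fp) {
        fwrite($fp, $page['text']);             # raw markup, real newlines
        fclose($fp);
      }
    }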

>> It would also help with page displays - the page read code would 
>> be reduced to a simple file_get_contents(), instead of the current code 
>> that has to identify the text, try to stop reading at end-of-text, and 
>> convert line ends back to newline codes; this would simplify and 
>> slightly speed up the page read code.

> [...] the file_get_contents() function is only available in
> PHP 4.3.0 or later (and thus far PmWiki still targets 4.1.x).
> Yes, this could be worked around by providing a custom
> file_get_contents() function for versions < 4.3.0, but it
> doesn't seem to offer enough benefit to be worth the trouble.

That's beside the point. Whether it's file_get_contents() or something 
taken from PEAR's Compat package, just reading a file is less work than 
reading and decoding it.
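
E.g. the fallback Pm mentions is only a handful of lines (a sketch, 
roughly what PEAR's Compat package does):

    # Fallback for PHP < 4.3.0.
    if (!function_exists('file_get_contents')) {
      function file_get_contents($filename) {
        $fp = @fopen($filename, 'rb');
        if (!$fp) return false;
        $data = '';
        while (!feof($fp)) $data .= fread($fp, 8192);
        fclose($fp);
        return $data;
      }
    }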

There's an additional point, of course: decoding the text is a small
overhead compared to interpreting all the markup.
OTOH every bit that makes PmWiki faster would help :-)

> [...] Displaying a page nearly always involves processing its
> metadata as well (if only to check permissions), so we aren't really
> avoiding the need to do some file parsing somewhere.

Right, I didn't see that.

> Also, for security reasons we have to encode/decode the contents of
> the markup text anyway,

Why is that?

> so the extra step of decoding the newlines isn't at all significant,
> as PmWiki is using PHP's urldecode() function for this (and it has to
> be done even if we don't decode the newlines).

Again: why is there a need to urldecode?

> And, on many systems, keeping the text in a separate file may
> actually slow things down a bit. 

Granted. Opening two files instead of one will generate some overhead.

It may be best to keep just the data that's needed for a page view in 
one file, and everything else in another one.
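
Something like this, perhaps (again made-up names -- neither the 
.view/.meta split nor ReadKeyedFile() exists today; I'm assuming the 
key=value format with urlencoded newlines that Pm describes carries over):

    # Hypothetical split: <page>.view holds only what a plain page view
    # needs (text, title, passwords); <page>.meta holds history and the
    # rest, opened only on demand.
    function ReadKeyedFile($filename) {
      $page = array();
      foreach ((array)@file($filename) as $line) {
        if (strpos($line, '=') === false) continue;
        list($k, $v) = explode('=', rtrim($line, "\r\n"), 2);
        $page[$k] = urldecode($v);               # same decode as today
      }
      return $page;
    }

    $view = ReadKeyedFile("$WorkDir/$pagename.view");   # every request
    $meta = ($action == 'diff')                         # only when history is needed
          ? ReadKeyedFile("$WorkDir/$pagename.meta") : array();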

> For example, the slowness that
> I observe on the pmwiki.org server seems to be due to filesystem
> latency issues -- i.e., delays in being able to open files
> from whatever mass storage system my provider is using.

Latencies can be due to unindexed directories, I'd say.
If that's the case, you should probably move to a server that uses a 
better file system ;-)

Regards,
Jo



