[pmwiki-users] New feature proposal - need help with markup selection

Wed Mar 16 06:42:56 CST 2005

Hi all,

Here's a large-ish feature proposal. I have done all the feasibility 
research and will implement it for a Wiki that I set up for a friend, 
but I'd like to ask here for improvements and whether it would be worth 
including in the official PmWiki distribution.

Here goes:

I propose two extensions with a very useful synergy: substitutions, and 
repeat groups.

Substitutions are supposed to work as follows:
* Writers insert markup that assigns a value to a symbol.
* Writers use the symbol in other places of the page. When rendering, 
PmWiki substitutes the values for the symbols.

This has the following advantages:
* Largeish texts can be defined once and for all, without need for 
typing the same thing over and over again. (Especially useful if the 
full text contains a lot of markup, e.g. some complicated
* The "print" skin could leave the symbols in, and print them as 
footnotes at the bottom of the page. (The Wiki installed at the Ubuntu 
site does this, and it's a nice idea.) I'm not sure whether this 
"outsourcing" should be done just in the "print" skin, or in general.
* It can be used to decouple table layout from table contents, like this:

|| &TH1  || &TH2  ||
|| &TD1a || &TD2a ||
|| &TD1b || &TD2b ||

&TH1 = foo
&TH2 = bar

&TD1a = This is a very lengthy text that goes into the very first table 
cell.
&TD2a = This is a very lengthy text that goes into the second cell of 
the first line.

&TD1b = This is a very lengthy text that goes into the first cell of the 
second line.
&TD2b = This is a very lengthy text that goes into the second cell of 
the second line.
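
To make the substitution idea concrete, here is a minimal sketch in Python (not PmWiki's PHP, and not an actual implementation of the proposal). It assumes a definition is a line of the form "&NAME = value" starting in column 1, that non-blank lines directly after a definition continue its value, and that symbol names consist of word characters; the function names are made up for illustration.

```python
import re

# Assumed syntax: "&NAME = value" in column 1 defines a symbol.
DEF_RE = re.compile(r"^&(\w+)\s*=\s*(.*)$")
# Assumed syntax: "&NAME" anywhere else is a use of the symbol.
USE_RE = re.compile(r"&(\w+)")

def split_definitions(page):
    """Separate definition lines from body lines; return (body, defs)."""
    defs, body, current = {}, [], None
    for line in page.splitlines():
        m = DEF_RE.match(line)
        if m:
            current = m.group(1)
            defs[current] = m.group(2).strip()
        elif current is not None and line.strip():
            # A non-blank line right after a definition continues it.
            defs[current] = (defs[current] + " " + line.strip()).strip()
        else:
            # Blank lines end a definition; everything else is body text.
            current = None
            body.append(line)
    return "\n".join(body), defs

def substitute(page):
    """Replace every known symbol in the body with its value."""
    body, defs = split_definitions(page)
    # Unknown symbols are left untouched rather than dropped.
    return USE_RE.sub(lambda m: defs.get(m.group(1), m.group(0)), body)
```

With these assumptions, a body line that directly follows a definition would be swallowed as a continuation, so definitions need a blank line after them. The simple "&NAME" pattern would also catch HTML entities such as "&amp;", which a real rule would have to exclude.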

Repeat groups are supposed to work as follows:
* There's new markup, say (:do:)...(:default:)...(:done:). It can be 
nested (not sure how to implement that best using regexps though... 
depends on the details of how rules are applied: if they are applied 
repeatedly, and from the beginning of the text, until they don't find an 
applicable substitution point, then it's no problem; otherwise I'll 
probably have to do some serious programming).
* The stuff between (:do:) and (:default:) is examined for the use of 
symbols. For each symbol, the appropriate expansion is substituted in. 
This "eats up" one definition for each symbol (it is no longer available 
afterwards). This substitute-the-symbols-and-eat-the-definition process 
is repeated until no more definitions are left (yes, I'm assuming there 
are multiple definitions for symbols). If one symbol has fewer 
definitions than others, then its last definition is reused until all 
symbols have run out of definitions. (The example below will clarify 
this.)
* If there is no definition for any symbol, then the text between 
(:default:) and (:done:) is used. Likewise if the text between (:do:) 
and (:default:) doesn't contain any symbols.

Here's an example:

--- snip ---
|| Term || Definition ||
(:do:)
|| &T   || &D         ||
(:default:)
|| &ND                ||
(:done:)

&ND = There are no definitions yet.

&T = foo
&D = This is the archetypical [[metasyntactic variable]] used in 
[[Computer Science]] lectures.

&T = bar
&D = This is used if the lecturer needs a second [[metasyntactic variable]].

&T = baz
&D = This is used if the lecturer needs a third one. Last seen in the 
form of "function foo (bar: baz)".

&T = fump
--- snip ---

The nice thing here is that it makes it easy to add additional rows to a 
table, with minimal syntactic fuss.
I purposely inserted an error: there's no &D for the last &T. The last 
line of the generated table will, hence, contain a repetition of the 
last &D definition - an obvious error, but easily corrected by adding a 
line that reads

--- snip ---
&D =
--- snip ---

leaving the last table line conveniently empty until the author finds 
something witty to write for "fump" :-)
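
The repeat-group rules can be sketched as follows, again in Python and only for a single, non-nested group. The sketch uses the (:do:)...(:default:)...(:done:) markers from the description and assumes each marker sits on its own line. It also makes two guesses where the rules above are open: a symbol without any definition is left as-is inside the loop body, and symbols in the default text get their first definition. All names are illustrative.

```python
import re

DEF_RE = re.compile(r"^&(\w+)\s*=\s*(.*)$")
USE_RE = re.compile(r"&(\w+)")
# Assumption: each marker sits on its own line; no nesting is handled here.
GROUP_RE = re.compile(
    r"\(:do:\)\n(.*?)\(:default:\)\n(.*?)\(:done:\)\n?", re.S)

def collect(page):
    """Return (body, defs); defs maps a symbol to its values in page order."""
    defs, body, current = {}, [], None
    for line in page.splitlines():
        m = DEF_RE.match(line)
        if m:
            current = defs.setdefault(m.group(1), [])
            current.append(m.group(2).strip())
        elif current is not None and line.strip():
            # Continuation of a wrapped definition.
            current[-1] = (current[-1] + " " + line.strip()).strip()
        else:
            current = None
            body.append(line)
    return "\n".join(body), defs

def expand_group(match, defs):
    """Expand one group: one copy of the template per round of definitions."""
    template, default = match.group(1), match.group(2)
    names = [n for n in USE_RE.findall(template) if n in defs]
    if not names:
        # No usable symbols: fall back to the (:default:) text.
        return USE_RE.sub(
            lambda m: defs.get(m.group(1), [m.group(0)])[0], default)
    rounds = max(len(defs[n]) for n in names)
    out = []
    for i in range(rounds):
        def pick(m, i=i):
            values = defs.get(m.group(1))
            if not values:
                return m.group(0)  # undefined symbol stays as written
            # A symbol that ran out of definitions reuses its last one.
            return values[min(i, len(values) - 1)]
        out.append(USE_RE.sub(pick, template))
    return "".join(out)

def render(page):
    body, defs = collect(page)
    return GROUP_RE.sub(lambda m: expand_group(m, defs), body)
```

Fed a page like the example above, this produces one table row per &T, and the row for "fump" repeats the last &D until an empty "&D =" line is added.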

There's an interesting extension, though I have no current plans to 
actually implement it: Definitions could refer to a database query, 
and PmWiki could iterate down the resulting rows and take the 
definitions from there. Of course, this extension won't help those with 
no databases, but those who do have one could benefit hugely from 
this kind of functionality.

Another interesting alternative would be taking definitions from text 
files. This should please the database-challenged among us :-)

Issues known to need input/clarifications/critique
--------------------------------------------------

* The &VARIABLE markup is a bit ugly. I would have preferred * over &, 
but I think some people will want to have the Mozilla mail&news markups 
with * for bold, _ for underline, and / for italics. There are plenty of 
other special characters that could be used, but I have no idea why one 
should be preferable over another.
* I have no idea what to allow or forbid in the variable names 
themselves. Just uppercase letters is probably too rigid.
* I don't know whether a markup that starts with a & sign in column 1 
collides with any other markups - it doesn't collide with those of 
PmWiki itself (I have checked), but I haven't looked into other markups.
* I've been thinking about "alternating symbols". E.g. define two 
symbols that are used alternately (say, they might contain the 
background color definition for table rows). This would eliminate the 
need for some other markup. I have no idea yet what the markup for the 
symbol definitions should look like, though. "&&VARIABLE =" as a "start 
of the repeat group" marker maybe?

Implementation
--------------

Easy to do as a cookbook recipe, since it can be done from the markup 
functions. (Hats off to PM who chose a rule-based markup system, and 
used partial ordering constraints to establish evaluation order - it's 
exactly the way such things should be done and are almost never done. 
I'm impressed, and that's indeed a rare thing).

Might wander into the core once PM decides to use the mechanism for 
other core functionality, but I'm happy to leave it as it is.

Nested loops are a bit hairy. My first idea was to use an inside-out 
strategy, finding loop constructs that don't contain other looping 
keywords (easy to do as a regex: search for (:do:)...(:done:) that 
doesn't contain a (:do:) inside, substitute, rinse and repeat until no 
(:do:) is left).
Unfortunately, that doesn't work: an inner loop may get replicated and 
filled with variables defined in the outer loop, so outer loops have to 
be done first. Which, in turn, means that the regex that finds the loop 
must count opening and closing loop keywords and declare "end of loop" 
as soon as the open-close keyword count returns to zero. Standard 
regexes can't count, so I'm going to have to rely on nonstandard tricks 
- and that could become hairy. If anybody already has done something 
like that, I'd love to hear about the best techniques...
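
One way to get the counting without regex tricks is a small scanner: use a plain regex only to find the keywords, and keep the nesting depth in ordinary code. The following is a sketch in Python under that assumption; the function name is made up, and the (:default:) keyword is ignored because it doesn't change the depth.

```python
import re

# Only the opening and closing keywords matter for the depth count.
TOKEN_RE = re.compile(r"\(:(do|done):\)")

def find_outer_group(text, start=0):
    """Return (open_start, close_end) of the first outermost group, or None."""
    depth, open_start = 0, None
    for m in TOKEN_RE.finditer(text, start):
        if m.group(1) == "do":
            if depth == 0:
                open_start = m.start()  # remember where the outer loop begins
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                # The count is back to zero: this closes the outer loop.
                return open_start, m.end()
        # A stray (:done:) at depth 0 is simply skipped in this sketch.
    return None
```

The outer loop found this way can be expanded first, and the same function can then be run again on the expanded text to pick up the replicated inner loops. PCRE also offers recursive patterns, which might do the same job inside a single regex, but I haven't tried that here.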

Comments? Critique? Encouragement? All are most welcome :-)

Regards,
Jo