[Pmwiki-users] Help with PHP regexp

John Feezell johnfeezell
Tue Jan 6 04:32:02 CST 2004


Thanks, Pm, for taking the time to walk through these. It helps a WHOLE BUNCH.

-JF

On Mon, 5 Jan 2004 20:04:56 -0700, Patrick R. Michaud <pmichaud at pobox.com> 
wrote:

> On Mon, Jan 05, 2004 at 11:39:01AM -0600, John Feezell wrote:
>> I recently began studying PHP regular expressions so that I could use 
>> them with PmWiki and FTS.  I have material from the PHP manual but would 
>> like to know how others on the list have gained knowledge of these - 
>> websites, books, etc.
>
> Practice, and just playing with them.
>
>> It would be helpful to see an analysis of one or two of them as they 
>> relate to PmWiki.
>
> Gladly!  PmWiki is largely based on regular expression matching.  In
> fact, I've often thought that I could potentially write PmWiki's text
> processing engine as a sequence of regular expression match/replacement
> actions, but decided that was a bad idea (feels too much like Sendmail's
> configuration...)
>
> I'll explain each of the patterns below as best I can...
>
>> For example I'm studying the following from PmWiki.php
>> $GroupNamePattern="[A-Z][A-Za-z0-9]+";
>
> A wiki group name starts with an uppercase letter and is followed by one
> or more letters or digits.
>
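A minimal sketch of how the group-name rule might be checked in PHP. The helper isGroupName() and the \A...\z anchors are assumptions added for illustration; they are not taken from PmWiki.

```php
<?php
// Pattern copied from the message above.
$GroupNamePattern = "[A-Z][A-Za-z0-9]+";

// Hypothetical helper: true only if the whole string is a group name.
// The \A and \z anchors are an addition for this demo.
function isGroupName(string $name, string $pattern): bool
{
    return preg_match('/\A(?:' . $pattern . ')\z/', $name) === 1;
}

var_dump(isGroupName("Main", $GroupNamePattern));    // bool(true)
var_dump(isGroupName("main", $GroupNamePattern));    // bool(false): no leading capital
var_dump(isGroupName("M", $GroupNamePattern));       // bool(false): needs a second character
var_dump(isGroupName("Pm_Wiki", $GroupNamePattern)); // bool(false): underscore not allowed
```
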
>> $WikiWordPattern="[A-Z][A-Za-z0-9]*(?:[A-Z][a-z0-9]|[a-z0-9][A-Z])[A-Za-z0-9]*";
>
> A bit more complex.  Essentially this pattern says that a WikiWord has to
> begin with an uppercase letter, and must have at least one more uppercase
> letter and one lowercase letter or digit (in any order).  The ?: after
> the opening parenthesis says that the parens are for grouping only and
> are not a capturing subpattern.  The part within the parens matches an
> uppercase letter followed by a lowercase letter or digit, or vice-versa.
>
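The WikiWord rule can be tried out with a short sketch like the one below. The helper isWikiWord() and the \A...\z anchors are assumptions for illustration only. The last call shows the effect of ?: in practice: the match array holds the whole match and nothing else.

```php
<?php
// Pattern copied from the message above, written on one line.
$WikiWordPattern = "[A-Z][A-Za-z0-9]*(?:[A-Z][a-z0-9]|[a-z0-9][A-Z])[A-Za-z0-9]*";

// Hypothetical helper; the \A...\z anchors are an addition for this demo.
function isWikiWord(string $word, string $pattern): bool
{
    return preg_match('/\A(?:' . $pattern . ')\z/', $word) === 1;
}

var_dump(isWikiWord("WikiWord", $WikiWordPattern)); // bool(true)
var_dump(isWikiWord("AB1", $WikiWordPattern));      // bool(true): "B1" satisfies the group
var_dump(isWikiWord("Wiki", $WikiWordPattern));     // bool(false): no second capital
var_dump(isWikiWord("WIKI", $WikiWordPattern));     // bool(false): no lowercase letter or digit

// Because the group is written (?:...), it captures nothing:
// $m holds only the whole match at index 0.
preg_match("/$WikiWordPattern/", "See PmWiki today", $m);
var_dump($m); // a one-element array containing "PmWiki"
```
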
>> $FreeLinkPattern="{{(?>([A-Za-z][A-Za-z0-9]*(?:(?:[\\s_]*|-)[A-Za-z0-9]+)*)(?:\\|((?:(?:[\\s_]*|-)[A-Za-z0-9])*))?)}}((?:-?[A-Za-z0-9]+)*)";
>
> This is probably the most difficult pattern in PmWiki--it took me a while
> to build this one.  I'll take out some of the optimizing paren constructs
> to explain it.  A freelink consists of
>
>   two curly braces,                        {{
>   followed by a word,                      [A-Za-z][A-Za-z0-9]*
>   followed by zero or more words delimited
>   by whitespace, underscores, or single
>   hyphens,                                 (([\\s_]*|-)[A-Za-z0-9]+)*
>   optionally followed by a vertical bar
>   and zero or more words delimited by
>   whitespace, underscores, or single
>   hyphens,                                 (\\|((([\\s_]*|-)[A-Za-z0-9])*))?
>   followed by two curly braces,            }}
>   followed by any sequence of letters or
>   digits, optionally joined by single
>   hyphens.                                 (-?[A-Za-z0-9]+)*
>
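> Again, the ?: after a paren indicates a non-capturing subpattern, and
> the ?> after the first parenthesis marks a once-only (atomic) subpattern:
> once it has matched, the regex engine won't backtrack into it, which
> helps to optimize the regex match.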
>
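To see what the three capturing groups pick up, the pattern can be run through preg_match() as in the sketch below. The pattern is written on one line with the mail wrapping's whitespace removed, which is an assumption about the original source. The helper parseFreeLink() and the labels target, text and suffix are hypothetical names chosen for this demo, not PmWiki's.

```php
<?php
// Pattern from the message, on one line (wrap whitespace removed).
$FreeLinkPattern = "{{(?>([A-Za-z][A-Za-z0-9]*(?:(?:[\\s_]*|-)[A-Za-z0-9]+)*)(?:\\|((?:(?:[\\s_]*|-)[A-Za-z0-9])*))?)}}((?:-?[A-Za-z0-9]+)*)";

// Hypothetical helper: returns the three captured parts, or null if
// the text contains no freelink. The array keys are demo labels only.
function parseFreeLink(string $text, string $pattern): ?array
{
    if (preg_match("/$pattern/", $text, $m) !== 1) {
        return null;
    }
    return ['target' => $m[1], 'text' => $m[2], 'suffix' => $m[3]];
}

print_r(parseFreeLink("{{free link}}s", $FreeLinkPattern));
// target "free link", text "" (no "|" part), suffix "s"

print_r(parseFreeLink("{{wiki sandbox|play area}}", $FreeLinkPattern));
// target "wiki sandbox", text "play area", suffix ""

var_dump(parseFreeLink("{{9lives}}", $FreeLinkPattern));
// NULL: the first word must start with a letter
```
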
>> $FragmentPattern="#[A-Za-z][-.:\\w]*";
>
> A simple one--a link fragment consists of a '#', followed by a letter,
> followed by any sequence of hyphens, dots, colons, or word characters
> (\w: letters, digits, and the underscore).
>
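A quick sketch for pulling a fragment out of a string. Since the pattern itself contains '#', the example uses '/' as the regex delimiter. The helper findFragment() is a hypothetical name used only here.

```php
<?php
// Pattern copied from the message above.
$FragmentPattern = "#[A-Za-z][-.:\\w]*";

// Hypothetical helper: returns the first fragment in $text, or null.
function findFragment(string $text, string $pattern): ?string
{
    return preg_match("/$pattern/", $text, $m) === 1 ? $m[0] : null;
}

var_dump(findFragment("Main.HomePage#sec-1.2 and more", $FragmentPattern)); // "#sec-1.2"
var_dump(findFragment("#a_b", $FragmentPattern)); // "#a_b": \w also accepts the underscore
var_dump(findFragment("#1abc", $FragmentPattern)); // NULL: a letter must follow the '#'
```
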
>> $PageTitlePattern="[A-Z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*";
>
> A page title starts with an uppercase letter followed by any letters or
> digits, and can continue with further alphanumeric words, each preceded
> by a single hyphen.
>
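The hyphen rule is easiest to see with a few test strings. In this sketch the helper isPageTitle() and the \A...\z anchors are assumptions added for the demo.

```php
<?php
// Pattern copied from the message above.
$PageTitlePattern = "[A-Z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*";

// Hypothetical helper: true only if the whole string is a page title.
function isPageTitle(string $title, string $pattern): bool
{
    return preg_match('/\A(?:' . $pattern . ')\z/', $title) === 1;
}

var_dump(isPageTitle("Home-Page2", $PageTitlePattern)); // bool(true)
var_dump(isPageTitle("A", $PageTitlePattern));          // bool(true): one capital is enough
var_dump(isPageTitle("Home--Page", $PageTitlePattern)); // bool(false): double hyphen
var_dump(isPageTitle("Home-", $PageTitlePattern));      // bool(false): trailing hyphen
var_dump(isPageTitle("homePage", $PageTitlePattern));   // bool(false): no leading capital
```
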
>> $UrlPathPattern="[^\\s<>[\\]\"\'()]*[^\\s<>[\\]\"\'(),.?]";
>
> The path component of a URL contains any character EXCEPT whitespace,
> angle brackets <>, square brackets [], quotation marks "', or
> parentheses.  In addition, a URL path doesn't end in a comma, period,
> or question mark.
>
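The effect of the second character class shows up when a path is followed by punctuation. A small sketch, with firstUrlPath() as a hypothetical helper and made-up sample paths:

```php
<?php
// Pattern copied from the message above.
$UrlPathPattern = "[^\\s<>[\\]\"\'()]*[^\\s<>[\\]\"\'(),.?]";

// Hypothetical helper: returns the first run of text matching the
// pattern, or null if there is none.
function firstUrlPath(string $text, string $pattern): ?string
{
    return preg_match("/$pattern/", $text, $m) === 1 ? $m[0] : null;
}

// The sentence-ending period is left out of the match.
var_dump(firstUrlPath("/wiki/Main.HomePage.", $UrlPathPattern)); // "/wiki/Main.HomePage"

// Surrounding parentheses are excluded as well.
var_dump(firstUrlPath("(/docs/intro.html)", $UrlPathPattern));   // "/docs/intro.html"

// An inner '?' is kept; only the trailing one is dropped.
var_dump(firstUrlPath("/cgi/search?q=regex?", $UrlPathPattern)); // "/cgi/search?q=regex"
```
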
> Questions and comments welcomed.
>
> Pm