[pmwiki-users] cookbook "ShellTools" (was: Include specific lines of text on a page)

Sun Jan 20 05:58:03 CST 2008

(NOTE: This was originally sent directly to Hans, and he reminded me to
continue the discussion on the list.  I've edited it slightly here to keep
up with later related posts...)

Sorry, I got a little carried away...

Basically grep gets its name from the ed editor command g/re/p ("globally
search for a regular expression and print" the matching lines) and it is a
text searching utility.  (I think in the Windows world "find" is the closest
utility available, although it's a comparison between a chainsaw and a
butter knife...)

The initial suggestion is this:

(:grep "regex" filename/pattern ... :)

In other words, search for a regular expression (this would be exactly like
your code except using preg_match instead of the strstr call ... maybe plus
a bit of work to let the regex include whitespace, although that's been done
in many places throughout pmwiki and I'd be happy to submit a piece of code
I did recently that was fairly robust).  Do this search not on just a single
file, but on a list of filenames and filename patterns.  So I could do this:

(:grep "Peter(?: L\.)? Bowers" Profiles.* Main.Sandbox MyGroup.*Suffix:)

And it would find all occurrences of "Peter Bowers" or "Peter L. Bowers" in
any of those pages represented by the 2 patterns and the 1 filename listed.
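
To make the idea concrete, here is a minimal sketch in Python (not PmWiki's
PHP, and not Hans' actual code) of a regex search over several pages picked
out by name patterns.  The PAGES dict, the grep_pages name, and the choice to
treat the page patterns as shell-style wildcards are all assumptions made
purely for illustration.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical in-memory stand-in for wiki pages: page name -> page text.
PAGES = {
    "Profiles.PeterBowers": "Peter L. Bowers joined in 2007\nLikes shell tools",
    "Profiles.Hans": "Hans wrote the first version\nThanks to Peter Bowers for ideas",
    "Main.Sandbox": "Nothing to see here",
    "MyGroup.NotesSuffix": "Peter Bowers again\nPeter Q. Bowers does not match",
    "Other.Page": "Peter Bowers is here too, but this page is not searched",
}

def grep_pages(regex, name_patterns, pages=PAGES):
    """Return (page, line) pairs for lines matching regex, looking only at
    pages whose name matches at least one shell-style name pattern."""
    compiled = re.compile(regex)
    hits = []
    for name in sorted(pages):
        if not any(fnmatchcase(name, pat) for pat in name_patterns):
            continue  # page not selected by any filename or pattern
        for line in pages[name].splitlines():
            if compiled.search(line):  # regex match, not a plain substring test
                hits.append((name, line))
    return hits

for page, line in grep_pages(r"Peter(?: L\.)? Bowers",
                             ["Profiles.*", "Main.Sandbox", "MyGroup.*Suffix"]):
    print(f"{page}: {line}")
```

The optional group is what a plain substring search cannot express: one call
picks up both spellings of the name, while "Peter Q. Bowers" and the page
outside the listed patterns are left alone.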

The (:pipe ...:) and (:cut ...:) and (:tail ...:) ideas are representative
of some of the shell tools available which are incredibly powerful in text
processing.  However, PHP is itself a very powerful (!!!) text processor
and perhaps these would be unnecessarily reinventing the wheel.  IF it was
determined that these ideas would be helpful, then consider something like
this in shell scripting:

grep -P "Peter(?: L\.)? Bowers" Profiles.* | grep -v "don't like" | tail -n 3

(1) Finds the expected lines in the files (-P asks GNU grep for Perl-style
regexes, which the (?: ...) group needs)
(2) from that list of lines it takes OUT any lines that have the text "don't
like" (because I don't want to see lines where someone says they don't like
me) (the -v option means reVerse the search semantics, i.e. invert the
match - search for lines which do NOT match)
(3) from that list of lines we only take the last (tail) 3 lines (-n means
the Number of lines and then we specify 3 -- like --lines=3)

The piping is represented (in shell) by that vertical bar -- it works in DOS
as well, in the same way -- it takes the stdout of one process and makes it
the stdin of the next process.

[Ed. here I deleted my idea of how pipe might be implemented, because Hans'
suggestions of using the already-existing potential for nesting are better.
So for my example it would be like this:

(:tail -n 3 (:grep -v "don't like" (:grep "Peter(?: L\.)? Bowers"
Profiles.*:):):)

It reverses the order of the "piping" and makes it a little more lisp-like
rather than shell-like, but it sounds like it's got the very significant
advantage of already being implemented...! ]
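
The equivalence between the shell-like pipe and the nested form can be
sketched in Python.  This is only an illustration, not how PmWiki evaluates
nested markup: the grep and tail functions, the LINES sample data, and the
reduce-based "pipe" are all hypothetical names made up for the example.

```python
import re
from functools import reduce

def grep(regex, lines, invert=False):
    """Keep lines matching regex; with invert=True keep the others (grep -v)."""
    compiled = re.compile(regex)
    return [ln for ln in lines if bool(compiled.search(ln)) != invert]

def tail(n, lines):
    """Keep only the last n lines (tail -n)."""
    return lines[-n:] if n > 0 else []

# Hypothetical sample input standing in for the lines of the Profiles.* pages.
LINES = [
    "Peter Bowers wrote a recipe",
    "I don't like Peter Bowers",
    "Someone else entirely",
    "Peter L. Bowers posted to the list",
    "Peter Bowers likes grep",
    "We don't like what Peter L. Bowers said",
    "Thanks, Peter Bowers",
]

NAME = r"Peter(?: L\.)? Bowers"

# Shell-like: each stage's output becomes the next stage's input, left to right.
stages = [
    lambda lines: grep(NAME, lines),
    lambda lines: grep(r"don't like", lines, invert=True),
    lambda lines: tail(3, lines),
]
piped = reduce(lambda data, stage: stage(data), stages, LINES)

# Nested, lisp-like: the same three stages written inside out.
nested = tail(3, grep(r"don't like", grep(NAME, LINES), invert=True))

print(piped == nested)
for line in nested:
    print(line)
```

Both forms run the same three steps in the same order; the nested version
simply reads from the innermost call outward, which is the "reversed" feel
mentioned above.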

What would probably be most helpful are the changes to the markup at hand.
"grep" might be a bad choice of name if most in the community are less
linux-oriented -- perhaps your name is the best after all.  But allowing a
regex instead of just a simple textual match, and allowing n files (with
patterns), are both nice additions.

I like coming up with ideas, but I give no guarantee of their usefulness or
practicality (just ask John Rankin about the "irresponsible wikiforms"), so
please feel free to ignore my thoughts.  I may play with the idea myself
(giving you credit for the initial implementation) if you decide not to take
it to the next level... [Ed. Sounds like you're already moving on it -- this
is what's so great about the pmwiki community!]

-Peter