[pmwiki-users] TextExtract (Search recipe) update

Hans design5 at softflow.co.uk
Tue Sep 1 05:51:07 CDT 2009

I've given TextExtract a complete overhaul:

I added a number of new options, fixed various bugs, and
improved output display.

Search terms are displayed either in their text line, or paragraph,
or whole page (unit=line, =para, =page).
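
To illustrate what the three units mean, here is a rough Python sketch.
It is not the recipe's PHP code: the function name extract_units and the
rule that paragraphs are separated by blank lines are assumptions made
only for this example.

```python
import re

def extract_units(text, term, unit="line"):
    """Return the lines, paragraphs, or whole page that contain `term`."""
    if unit == "line":
        chunks = text.splitlines()
    elif unit == "para":
        # Assumption: paragraphs are separated by one or more blank lines.
        chunks = re.split(r"\n\s*\n", text)
    elif unit == "page":
        chunks = [text]
    else:
        raise ValueError(f"unknown unit: {unit}")
    needle = term.lower()
    # Case-insensitive substring match, kept deliberately simple.
    return [chunk for chunk in chunks if needle in chunk.lower()]
```

With unit="line" only the matching lines come back; with unit="para" each
match brings its surrounding paragraph along, and with unit="page" the
whole page text is returned if it matches at all.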

Search terms are highlighted (default: yellow background); a
wiki style for the highlight can be given (e.g. highlight='bgcolor=green',
or =bold, =none).
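
The highlighting idea could be sketched roughly like this in Python. This
is an illustration only, not how the recipe is implemented: the function
name highlight is hypothetical, and the sketch handles only a wiki style
string or 'none', wrapping matches in PmWiki's %style%...%% markup.

```python
import re

def highlight(line, term, style="bgcolor=yellow"):
    """Wrap each case-insensitive match of `term` in a wiki style."""
    if style == "none":
        # No highlighting requested: return the line unchanged.
        return line
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    # Keep the matched text's original capitalisation inside the style.
    return pattern.sub(lambda m: f"%{style}%{m.group(0)}%%", line)
```

Escaping the term with re.escape keeps characters such as "." or "*" in a
search term from being read as regular expression syntax.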

Markup can be displayed escaped, hidden, shown as source, or active
(markup=code, =cut, =source, =on).

Result numbering can be added (number=1, number='color=red').
Page numbering prefixes can be added to the numbers (perpagenumbering=1).
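
A small Python sketch of the difference between plain result numbering and
per-page prefixes. The "page.result" label format, the function name
number_results and the input shape are all assumptions for illustration;
the recipe's actual output format may differ.

```python
def number_results(results_by_page, per_page=False):
    """Label matches; results_by_page is a list of (page, [lines])."""
    numbered = []
    count = 0
    for page_no, (page, lines) in enumerate(results_by_page, start=1):
        for line_no, line in enumerate(lines, start=1):
            count += 1
            # Assumed format: "page.result" when per-page prefixes are on,
            # otherwise a single running count across all pages.
            label = f"{page_no}.{line_no}" if per_page else str(count)
            numbered.append((label, page, line))
    return numbered
```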

Process time can be shown in the header (timer=1).

Automatic wrapping of preformatted code lines is enabled via a CSS style.

To exclude certain terms you can add them to the $TextExtractExclude

Plus loads of other possible customisations.

TextExtract can be used as a search box,
and I think the most useful combinations of parameters may be:

A: markup=code unit=line (the default)
Lines will be returned with the matches highlighted.
The search will include hidden markup directives,
and all hidden directives and code will be shown as escaped code.

B: markup=cut unit=para
Whole paragraphs will be returned with matches highlighted.
The search excludes hidden markup and directives.
This is a search on what you see in the pages, and showing the results
as part of the paragraph can give more context.

C: markup=source unit=para or unit=page
Displays the page source code (like action=source), but within the
search page. Matches are highlighted.

You can find TextExtract enabled on a number of search forms
on my site here:

Extracting information in context from the PmWiki documentation worked
well in my tests, and it is quite fast.

I wonder if TextExtract can be experimentally enabled on its Cookbook
page, and we can test how it performs searching the Cookbook and the
PmWiki group. The script opens the pages for reading, but nothing
gets written anywhere. The worst that can happen is a PHP timeout.

Please have a look at it and test it and come back with comments!

