[pmwiki-users] TextExtract (Search recipe) update

Wed Sep 16 03:51:59 CDT 2009

Thank you, Hans, for this updated version of your recipe (and for your recent
advice about markup=on). The new templated header works well.

Gilles.

2009/9/15 Hans <design5 at softflow.co.uk>

> Friday, September 11, 2009, 1:09:42 PM, Hans wrote:
>
> > For TextExtract I cannot simply use PmWiki's search engine,
> > because we need to extract text. But your suggestion inspired me
> > to look at the handling of search terms again, and I will
> > incorporate the way PmWiki's search handles search terms, so we can
> > accept input like
> >   'abc xyz' => output with 'abc' AND 'xyz' in the page;
> >   '"abc def" xyz' => output with 'abc def' AND 'xyz' in the page;
> >   'abc -xyz' => output with 'abc' but NOT 'xyz' in the page;
> >   'abc|xyz' => output with 'abc' OR 'xyz' in the page;
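The term syntax listed above can be illustrated with a minimal Python sketch. It is not TextExtract's own (PHP) code: the names parse_query and page_matches are hypothetical, and the sketch assumes case-insensitive substring matching. Negated quoted phrases such as -"abc def" are not handled.

```python
import re


def parse_query(query):
    """Split a search string into (include, exclude) lists.

    Each include entry is a list of OR-alternatives; every include entry
    must match (AND) and no exclude term may match (NOT).
    """
    include, exclude = [], []
    # A token is either a double-quoted phrase or a run of non-space characters.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        if phrase:
            include.append([phrase.lower()])        # "abc def": one literal term
        elif word.startswith('-') and len(word) > 1:
            exclude.append(word[1:].lower())        # -xyz: must NOT appear
        else:
            alternatives = [w.lower() for w in word.split('|') if w]
            if alternatives:
                include.append(alternatives)        # abc|xyz: either may appear
    return include, exclude


def page_matches(text, include, exclude):
    """True if text satisfies every include group and contains no exclude term."""
    text = text.lower()
    return (all(any(alt in text for alt in alts) for alts in include)
            and not any(term in text for term in exclude))
```

For example, parse_query('"abc def" xyz') yields one phrase term and one plain term, both of which must appear in a page for it to match.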
>
> Now available in the latest release.
> http://www.pmwiki.org/wiki/Cookbook/TextExtract
>
> I also added some template variables for use in the parameters
> header=, footer= and phead=; for instance, a header with a custom
> title and the search time:
>   header="%rfloat%{$$time}%%'''Listing'''"
>
> I separated regular expression search from standard search to make
> term input easier, and added a checkbox for regular expression search
> to the search form.
> I also added a 'Match whole words' checkbox for whole-word searches.
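A rough Python illustration of how the two checkboxes might change the way a single term is matched. term_pattern is a hypothetical helper, not part of TextExtract. It assumes that standard search treats regular expression metacharacters literally, and that 'Match whole words' means requiring word boundaries (\b) around the term.

```python
import re


def term_pattern(term, regex=False, whole_words=False):
    """Hypothetical helper: compile one search term into a case-insensitive pattern."""
    # Standard search: escape metacharacters so 'a.c' only matches a literal 'a.c'.
    body = term if regex else re.escape(term)
    if whole_words:
        # Require a word boundary on both sides of the term.
        body = r'\b(?:' + body + r')\b'
    return re.compile(body, re.IGNORECASE)
```

Because \b marks a boundary between word and non-word characters, terms that begin or end with punctuation may not behave as expected with whole-word matching in this sketch.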
>
> A note on efficiency:
> TextExtract, with its built-in pagelist function, runs faster than
> PmWiki's pagelist or the MakePageList() function. This is mainly
> because PmWiki's pagelist process opens every page to check whether
> the user is authorised to see it, since it is designed not to output
> any non-authorised pages, such as read-protected ones. This file
> opening can be quite time-consuming.
> TextExtract, by contrast, builds its pagelist without checking
> authorisation, so the list may include read-protected pages. Only
> later, when each page on the source list is opened, is authorisation
> checked, before text lines are extracted and processed. Far fewer
> page files need to be opened, which makes the process faster. That is
> the main reason I did not use MakePageList() to generate the source
> pagelist.
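The ordering described above (filter by page name first, check authorisation only when a page is opened for extraction) can be sketched in Python. This is an illustration, not TextExtract's PHP implementation: PageStore, source_list and extract_lines are hypothetical, and an in-memory dictionary stands in for PmWiki's page files so that page reads can be counted.

```python
from fnmatch import fnmatchcase


class PageStore:
    """Hypothetical in-memory stand-in for PmWiki's page files."""

    def __init__(self, pages):
        self.pages = pages   # name -> (text, read_protected)
        self.opens = 0       # number of page "file" reads performed

    def names(self):
        return sorted(self.pages)   # listing names needs no page reads

    def read(self, name):
        self.opens += 1
        return self.pages[name]


def source_list(store, pattern):
    # Cheap step: filter by name only. No page is read, so
    # read-protected pages are still on the list at this point.
    return [n for n in store.names() if fnmatchcase(n, pattern)]


def extract_lines(store, names, term, can_read):
    # Each listed page is read once; authorisation is checked at that
    # point, before any lines are extracted.
    out = []
    for name in names:
        text, protected = store.read(name)
        if protected and not can_read:
            continue
        out += [(name, line) for line in text.splitlines() if term in line]
    return out
```

With pages Main.HomePage, Main.Secret (read-protected) and Site.Notes, source_list(store, 'Main.*') reads no pages at all, and the extraction step reads only the two Main pages, skipping the protected one after opening it.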
>
> It is still possible to use the PmWiki searchbox with an fmt=#extract
> option, which combines PmWiki's pagelist functions with TextExtract's
> formatting functions. This is useful if you need to pass pagelist
> parameters that TextExtract does not understand.
>
>  ~Hans
>
>
> _______________________________________________
> pmwiki-users mailing list
> pmwiki-users at pmichaud.com
> http://www.pmichaud.com/mailman/listinfo/pmwiki-users
>

-- 
---------------------------------------
| A | de la langue française
| B | http://www.languefrancaise.net/
| C | languefrancaise at gmail.com
---------------------------------------