[pmwiki-users] yet another documentation suggestion ...
Joachim Durchholz
jo at durchholz.org
Thu Aug 4 10:20:40 CDT 2005
Patrick R. Michaud wrote:
> On Wed, Aug 03, 2005 at 11:22:23AM +1200, John Rankin wrote:
>
>> Yes! It seems to me that people use many techniques for finding
>> things in a large body of text, including a table of contents, an
>> index and a search. One issue I see is that the page index (list of
>> pages in alphabetical order) isn't very helpful in large page
>> collections, because the sort is not necessarily in a useful order.
>>
>
> This is a very good point. I wonder how hard it would be to add a
> "relevance" measurement to the search, so that it could order pages
> based on a predicted relevance instead of just alphabetically?
Search engines apply a lot of heuristics in that area. Since the
search algorithm cannot really measure relevance, it has to guess it.
I can think of the following heuristics:
* How many pages refer to the page in question
  (this rates the "overall quality" of the page)
* How often any of the search terms appear on the page
* Whether the search terms are clustered or distributed evenly
  ("clustered" gets the better rank; this rates whether the
  page accidentally mentions each of the search terms (even) or
  they are used together with a specific meaning (clustered))
* Whether the search terms appear near the beginning of the page
  (nearer to the beginning improves the rank)
How to weigh them against each other, I don't know. We could ask Google,
but I suspect that this is the #1 trade secret of the company :-)
Cf. http://en.wikipedia.org/wiki/Page_rank . I haven't checked the links
that lead away from that page, but they looked interesting, too.
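
To make the four heuristics above a bit more concrete, here is a rough
Python sketch of how they might be folded into a single score. Everything
in it is an assumption for illustration: the relevance() function, the
equal placeholder weights, and the inbound_links count (which would have
to come from some backlink index) are not PmWiki code, and each formula
is just one guess at the corresponding heuristic.

```python
import math
import re

# Placeholder weights: arbitrary guesses, not tuned values.
WEIGHTS = {"inbound": 1.0, "frequency": 1.0, "clustering": 1.0, "position": 1.0}


def relevance(text, terms, inbound_links, weights=WEIGHTS):
    """Return a heuristic relevance score for one page (higher is better)."""
    words = re.findall(r"\w+", text.lower())
    wanted = {t.lower() for t in terms}
    # Word positions at which any search term occurs.
    hits = [i for i, w in enumerate(words) if w in wanted]
    if not hits:
        return 0.0

    n = len(words)
    # 1. Pages referring to this page; log-damped so big counts do not dominate.
    inbound = math.log1p(inbound_links)
    # 2. How often the search terms appear, relative to page length.
    frequency = len(hits) / n
    # 3. Clustering: hits packed into a short span score close to 1.
    #    (Simplification: it does not check which term produced each hit.)
    span = hits[-1] - hits[0] + 1
    clustering = len(hits) / span
    # 4. Position: a first hit near the beginning scores close to 1.
    position = 1.0 - hits[0] / n

    return (weights["inbound"] * inbound
            + weights["frequency"] * frequency
            + weights["clustering"] * clustering
            + weights["position"] * position)
```

Sorting the matching pages by this score in descending order would
replace the alphabetical order. The weights would still have to be found
by trial and error.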
>> What if we had an (:index text:) directive?
We'd continually be battling out-of-date (:index...:) directives,
particularly since errors in them don't cause any obvious problems (at
worst, a page will not be found, which is exactly the kind of error that
never gets reported).
Regards,
Jo