[pmwiki-users] yet another documentation suggestion ...

George J. De Bruin gdb at soundchasers.com
Sun Aug 7 13:30:45 CDT 2005


> I can think of the following heuristics:
> * How many pages refer to the page in question
>    (this rates the "overall quality" of the page)
> * How often any of the search terms appear on the page
> * Whether the search terms are clustered or distributed evenly
>    ("clustered" gets the better rank; this rates whether the
>    page accidentally mentions each of the search terms (even) or
>    they are used together with a specific meaning (clustered))
> * Whether the search terms appear near the beginning of the page
>    (nearer to the beginning improves the rank)

One more thing I can think of: how many hits a page gets.
The count can be inflated by "false" hits, but a page that
is accessed often is usually a good indicator that it has
appropriate information, especially when balanced against
the other items in this list.
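
To make the combined list concrete, here is a rough sketch in Python of how the heuristics might be folded into one score. Everything in it is an assumption for illustration: the names score_page and smallest_window, the weights, the requirement that every search term appears on the page, and the idea that backlink and hit counts are supplied by the caller. It is not how PmWiki or any search engine actually ranks pages.

```python
import math
import re


def smallest_window(words, terms):
    """Length of the shortest run of words containing every term.

    Returns len(words) if no run contains all of the terms.
    """
    needed = len(terms)
    counts = {}
    best = len(words)
    left = 0
    for right, word in enumerate(words):
        if word not in terms:
            continue
        counts[word] = counts.get(word, 0) + 1
        # Shrink from the left while the window still covers all terms.
        while len(counts) == needed:
            best = min(best, right - left + 1)
            left_word = words[left]
            if left_word in counts:
                counts[left_word] -= 1
                if counts[left_word] == 0:
                    del counts[left_word]
            left += 1
    return best


def score_page(text, terms, backlinks=0, hits=0):
    """Combine the heuristics into one number (higher ranks first).

    The weights below are arbitrary placeholders, not tuned values.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    wanted = {t.lower() for t in terms}
    if not words or not wanted:
        return 0.0
    positions = [i for i, w in enumerate(words) if w in wanted]
    found = {words[i] for i in positions}
    if found != wanted:
        return 0.0  # assumption: a page must mention every term

    total = len(words)
    frequency = len(positions) / total            # how often terms appear
    clustering = len(wanted) / smallest_window(words, wanted)  # 1.0 = adjacent
    earliness = 1.0 - positions[0] / total        # 1.0 = first word
    referrals = math.log1p(backlinks)             # pages linking here
    traffic = math.log1p(hits)                    # damped so raw hits cannot dominate

    return (2.0 * frequency + 1.5 * clustering + 1.0 * earliness
            + 1.0 * referrals + 0.5 * traffic)
```

Calling score_page(page_text, ["search", "ranking"], backlinks=12, hits=340) returns a single float, so sorting the candidate pages by it in descending order gives a ranking. Taking the logarithm of the backlink and hit counts is one way to keep "false" hits from swamping the other measures.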

> How to weigh them against each other, I don't know. We could ask Google,
> but I suspect that this is the #1 trade secret of the company :-)

Actually, I think they perfected the referrer
heuristic, in combination with these other
measures.  Of course, their aggressive spidering of
the web also helped a lot, as it gave them a better
base of data to work with. :)




