search engine beware! Was: Re: [pmwiki-users] notice of current edit

Radu radu at monicsoft.net
Fri Apr 15 08:18:22 CDT 2005


Note that this notice of current edit was a recipe I was planning for the 
ProjectManagement bundle, not something for the core.

At 05:45 PM 4/14/2005, Patrick R. Michaud wrote:
>On Thu, Apr 14, 2005 at 11:31:23PM +0200, Joachim Durchholz wrote:
> > Patrick R. Michaud wrote:
> > >
> > >There's another important aspect to this:  in looking through pmwiki.org's
> > >server logs recently, I've noticed that a lot of edit links get triggered
> > >by search engines.

Yum, the search engine trouble. I was going to say something about this 
some time ago, but it got shuffled out as irrelevant. Now it's back, so 
here it is:

Sometimes people (especially in the ProjectManagement bundle) would like to 
hide their content from search engines. That could be accomplished for the 
well-behaved engines by adding the correct entries in the root robots.txt file.
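
For illustration only, here is a rough sketch of how a well-behaved crawler
would treat such entries, using Python's standard urllib.robotparser. The
/wiki/ProjectManagement/ path and the "ExampleBot" name are made up for the
example and are not an actual PmWiki layout or a real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is an assumption for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/ProjectManagement/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching and skips anything disallowed.
for path in ("/wiki/Main/HomePage", "/wiki/ProjectManagement/Tasks"):
    verdict = "fetch" if rp.can_fetch("ExampleBot", path) else "skip"
    print(path, "->", verdict)
```

Nothing forces a crawler to run a check like this, which is the weakness
described next.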

Me, I find that file more of a pain than a solution. Mischievous search 
engines or slurp machines (à la webzip) totally ignore these files, and 
some may even use them to get at content that's deemed a bit 'private'.

So how about this:
Since no sane individual can view two different pages in the same second, 
let alone edit them, there is a way to tell search engines apart from 
actual wiki authors: log the timestamp of the previous access from each 
IP. If the time elapsed since that access is shorter than a settable 
interval (default 2s), then do not honor requests for edit. For an even 
stronger deterrent, and to save processor time when the wiki is supposed 
to be hidden, we could also add an $Enable switch that refuses ANY request 
from fast-moving IPs.
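
The idea above could be sketched roughly as follows. This is a minimal
illustration in Python, not PmWiki code: the class name FastClientGate, the
"edit"/"browse" action strings and the block_all flag (standing in for the
proposed $Enable switch) are all assumptions made for the example.

```python
import time


class FastClientGate:
    """Remember the last access time per IP and refuse too-fast requests."""

    def __init__(self, min_interval=2.0, block_all=False, clock=time.monotonic):
        self.min_interval = min_interval  # settable interval, default 2s
        self.block_all = block_all        # hypothetical stand-in for the $Enable switch
        self.clock = clock                # injectable so the gate can be tested
        self.last_seen = {}               # ip -> time of previous access

    def allow(self, ip, action):
        """Return True if the request should be honored."""
        now = self.clock()
        previous = self.last_seen.get(ip)
        self.last_seen[ip] = now  # every request counts as an access

        too_fast = previous is not None and (now - previous) < self.min_interval
        if not too_fast:
            return True
        # Fast-moving IP: always refuse edits, refuse everything if block_all.
        return not (self.block_all or action == "edit")


if __name__ == "__main__":
    gate = FastClientGate()
    print(gate.allow("192.0.2.1", "browse"))  # first access from this IP
    print(gate.allow("192.0.2.1", "edit"))    # follows immediately, so refused
```

The table here lives in memory only. A real recipe would have to keep the
timestamps somewhere that survives between requests, and that part is left
open.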


Cheers,
Radu
(www.monicsoft.net) 



