search engine beware! Was: Re: [pmwiki-users] notice of current edit
Radu
radu at monicsoft.net
Fri Apr 15 08:18:22 CDT 2005
Note that this notice of current edit was a recipe I was planning for the
ProjectManagement bundle, not something for the core.
At 05:45 PM 4/14/2005, Patrick R. Michaud wrote:
>On Thu, Apr 14, 2005 at 11:31:23PM +0200, Joachim Durchholz wrote:
> > Patrick R. Michaud wrote:
> > >
> > >There's another important aspect to this: in looking through pmwiki.org's
> > >server logs recently, I've noticed that a lot of edit links get triggered
> > >by search engines.
Yum, the search engine trouble. I was going to say something about this
some time ago, but it got shuffled aside as irrelevant. Now it's back, so
here it is:
Sometimes people (especially users of the ProjectManagement bundle) would like to
hide their content from search engines. For the well-behaved engines, that can be
done by adding the right entries to the robots.txt file at the site root.
Me, I find that file more of a pain than a solution. Mischievous search
engines and slurp machines (à la WebZip) ignore it completely, and some may
even use it to find content that's deemed a bit 'private'.
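To illustrate why robots.txt only stops the well-behaved engines: compliance is entirely up to the crawler, which must fetch the file and check each URL itself. The sketch below uses Python's standard `urllib.robotparser` to play the part of a polite crawler. The `/wiki/ProjectManagement/` path and `example.com` host are hypothetical, assuming a clean-URL layout where each wiki group maps to a path prefix.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki served under clean URLs such as
# /wiki/Group/Page; the ProjectManagement path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/ProjectManagement/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before every fetch; nothing forces a rogue one to.
for url in ("http://example.com/wiki/Main/HomePage",
            "http://example.com/wiki/ProjectManagement/Budget"):
    print(url, parser.can_fetch("*", url))
```

A crawler that never calls anything like `can_fetch` simply fetches the disallowed URL anyway, and the `Disallow` line itself tells it exactly where the "private" pages live.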
So how about this:
Since no sane individual can view two different pages within the same second,
let alone edit them, there is a way to tell search engines apart from actual
wiki authors: log the timestamp of the previous access from each IP. If the
time since that access is shorter than a configurable interval (default 2 s),
do not honor edit requests. For an even stronger deterrent, and to save
processor time when the wiki is supposed to be hidden, we could also add an
$Enable switch that refuses ANY request from fast-moving IPs.
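A minimal sketch of the proposed check, written in Python only to show the logic (an actual PmWiki recipe would be PHP). The class name `FastClientFilter`, the `block_all_fast` flag (standing in for the suggested $Enable switch), and the action names `"edit"`/`"browse"` are all hypothetical; the 2-second default comes from the proposal above.

```python
import time
from typing import Callable, Dict


class FastClientFilter:
    """Hypothetical sketch of the per-IP timing check proposed above.

    Remembers when each IP last made a request. A request arriving less
    than `min_interval` seconds after the previous one from the same IP
    is treated as coming from a crawler rather than a human author.
    """

    def __init__(
        self,
        min_interval: float = 2.0,
        block_all_fast: bool = False,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval      # the configurable interval
        self.block_all_fast = block_all_fast  # stands in for the $Enable switch
        self._clock = clock                   # injectable so tests can fake time
        self._last_seen: Dict[str, float] = {}

    def allow(self, ip: str, action: str) -> bool:
        """Return True if the request should be honored."""
        now = self._clock()
        previous = self._last_seen.get(ip)
        # Log every access, so a client that keeps hammering stays flagged.
        self._last_seen[ip] = now
        too_fast = previous is not None and now - previous < self.min_interval
        if not too_fast:
            return True
        if self.block_all_fast:
            return False          # hidden wiki: refuse everything from fast IPs
        return action != "edit"   # default: refuse only edit requests


if __name__ == "__main__":
    # Simulated request times (in seconds) from a single IP.
    times = iter([0.0, 0.5, 3.0, 3.1])
    f = FastClientFilter(clock=lambda: next(times))
    for action in ("edit", "edit", "edit", "browse"):
        print(action, f.allow("192.0.2.1", action))
```

Here the second edit, 0.5 s after the first, is refused; the third, 2.5 s after the second, is honored; and a fast browse request still goes through unless `block_all_fast` is set. Because PHP requests don't share memory, a real recipe would have to keep the timestamps in a file or similar store. Keying on IP also means several people behind one proxy or NAT address could trip the check for each other.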
Cheers,
Radu
(www.monicsoft.net)