Wiki-Spammers (was: [Pmwiki-users] pmwiki.org - version 1 - vandalised)

Patrick R. Michaud pmichaud
Fri Jan 14 09:03:36 CST 2005


On Thu, Jan 13, 2005 at 11:23:11PM +0100, Steffen Glückselig wrote:
> On Thu, 13 Jan 2005 21:49:37 +0000, Ciaran <ciaranj at gmail.com> wrote:
> 
> >In principle I quite like the idea of a bayesian network for
> >classifying good wiki pages from bad, however the amount of data
> >required to train a bayesian network is generally quite large
> [...]
> One could bundle several wikis using a central facility to accumulate the  
> data from the allowed and disallowed texts. The network generated  
> centrally could later be shipped as default network with fresh  
> pmwiki-installations. These networks would then be customized at site.

Another problem with Bayesian networks is that they need to be
continually (re)trained as spammers find ways to get past the current
filters.  Aggregating several sites together can help in providing
content for training the Bayesian filters, but this doesn't work well
for sites that exist primarily to discuss specialized topics (and
therefore need additional local training of the Bayesian network).
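
To make the retraining issue concrete, here is a minimal sketch of a
word-count naive Bayes filter in Python.  It is only an illustration:
the class name NaiveBayesFilter, the "spam"/"ham" labels, and the sample
texts are assumptions made for this example, not part of PmWiki or of
any proposal in this thread.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy two-class (spam/ham) word-count classifier with add-one smoothing.

    Hypothetical illustration only; not part of PmWiki.
    """

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into simple word tokens.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # Training is incremental: each call just adds to the counts.
        self.words[label].update(self.tokenize(text))
        self.docs[label] += 1

    def spam_score(self, text):
        # Log-odds of spam versus ham; a positive value means "more spam-like".
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        total_docs = self.docs["spam"] + self.docs["ham"]
        logp = {}
        for label in ("spam", "ham"):
            n = sum(self.words[label].values())
            p = math.log((self.docs[label] + 1) / (total_docs + 2))
            for word in self.tokenize(text):
                p += math.log((self.words[label][word] + 1) / (n + vocab))
            logp[label] = p
        return logp["spam"] - logp["ham"]


# "Shared" training data aggregated from several general-purpose sites.
shared = NaiveBayesFilter()
shared.train("cheap pills casino poker", "spam")
shared.train("wiki markup page edit recipe", "ham")

# A specialized site (say, one about pharmacology) would see its own
# legitimate text scored as spam-like until it is trained locally.
before = shared.spam_score("pills dosage")
shared.train("pills dosage interactions pills pharmacology", "ham")
after = shared.spam_score("pills dosage")
```

With the shared data alone the specialized text scores as spam-like;
only after the local training call does the score drop, which is the
extra per-site work described above.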

> Yep. Well, actually one could allow every page per default and later  
> filter the bad pages out, I think.
> ...a lot of manual work to do.

And the amount of work required just to keep things running goes
directly against both PmWikiPhilosophy #1 (favor writers) and #5 
(low administrative maintenance).  So, it doesn't seem like the
right way to go...

> Since pmwiki is centrally developed it seems rather 'simple' to setup a  
> central facility which could accumulate and manage a blacklist and/or a  
> list of banned IPs. 

This was already suggested back in November, see
  http://pmichaud.com/pipermail/pmwiki-users_pmichaud.com/2004-November/007986.html
and I certainly don't see any issue with maintaining a shared blocklist
on pmwiki.org.  However, I probably won't be maintaining the blocklist
myself, because (1) I lack the time and (2) since instituting
$UnapprovedLinkCountMax filtering on pmwiki.org I haven't had any spam
there (except on those few pages which don't use that filter).
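
As a rough illustration of the idea behind a link-count limit (this is
not PmWiki's actual code), the Python sketch below counts the external
links in a post whose host is not on an approved list and rejects the
post when that count exceeds a maximum.  The function names, the
approved-host set, and the default limit of 5 are all assumptions made
for the example.

```python
import re
from urllib.parse import urlparse

# Matches http/https URLs up to the next whitespace, quote, or bracket.
URL_RE = re.compile(r"https?://[^\s\"'<>\[\]]+", re.IGNORECASE)


def count_unapproved_links(text, approved_hosts):
    """Count URLs in text whose host is not in approved_hosts."""
    count = 0
    for url in URL_RE.findall(text):
        # Drop trailing sentence punctuation that is not part of the URL.
        url = url.rstrip(".,;:!?)")
        host = (urlparse(url).hostname or "").lower()
        if host not in approved_hosts:
            count += 1
    return count


def allow_post(text, approved_hosts, max_unapproved=5):
    """Accept a post only if it has at most max_unapproved unapproved links."""
    return count_unapproved_links(text, approved_hosts) <= max_unapproved


approved = {"pmwiki.org"}  # hypothetical list of already-approved hosts
spam_post = " ".join(f"http://spam{i}.example/" for i in range(10))
ok_post = "See http://pmwiki.org/wiki/PmWiki, thanks"
```

Here allow_post(spam_post, approved) is False because the post carries
ten unapproved links, while allow_post(ok_post, approved) is True.  A
check like this needs no training data, only an occasional update to
the approved list.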

Pm


