Wiki-Spammers (was: [Pmwiki-users] pmwiki.org - version 1 - vandalised)

Steffen Glückselig steffen
Thu Jan 13 15:23:38 CST 2005


On Thu, 13 Jan 2005 21:49:37 +0000, Ciaran <ciaranj at gmail.com> wrote:

> In principle I quite like the idea of a Bayesian network for
> classifying good wiki pages from bad; however, the amount of data
> required to train a Bayesian network is generally quite large, i.e.
> the wiki site would need quite a large growth rate to generate
> enough meaningful training data.
You are right. It takes many edits, rather than a large growth rate,
to generate a large enough corpus.
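To make the training-data concern concrete, here is a minimal sketch of the kind of classifier under discussion. It is a naive Bayes text classifier (the usual simplification in spam filters), not a full Bayesian network, and every name in it is hypothetical; nothing here is part of pmwiki. Each call to `train` stands for one manually classified edit, which is exactly the labelling effort in question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; URLs are split into their word parts
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical naive Bayes filter for wiki edits (not part of pmwiki)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # One call per manually classified edit; label is "spam" or "ham"
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, so an untrained class never gives log(0)
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Laplace smoothing; the extra +1 leaves room for unseen words
                count = self.word_counts[label][word]
                log_p += math.log((count + 1) / (total_words + vocab_size + 1))
            log_scores[label] = log_p
        # Normalise in log space to avoid underflow on long edits
        top = max(log_scores.values())
        p_spam = math.exp(log_scores["spam"] - top)
        p_ham = math.exp(log_scores["ham"] - top)
        return p_spam / (p_spam + p_ham)


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.train("cheap pills casino online poker", "spam")
    f.train("buy cheap viagra casino", "spam")
    f.train("how to configure pmwiki skins", "ham")
    f.train("fixed typo in installation docs", "ham")
    print(f.spam_probability("online casino pills"))
```

With only four labelled edits the filter already rates "online casino pills" as more likely spam than ham, but a single unfamiliar word can swing such a tiny model, which is why the corpus needs many edits.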

One could pool several wikis through a central facility that accumulates
the data from allowed and disallowed texts. The network generated
centrally could then be shipped as the default network with fresh
pmwiki installations, and each site would customize it locally.
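Assuming the shipped network is represented as per-class word counts (as in a naive Bayes filter), customizing it on site could be as simple as adding local counts on top of the central ones. This is only a sketch: `merge_word_counts` and its `local_weight` factor, which lets a site's own edits outweigh the shared data, are hypothetical.

```python
from collections import Counter


def merge_word_counts(central, local, local_weight=2.0):
    """Combine centrally collected word counts with site-specific ones.

    central, local: dicts mapping a label ("spam"/"ham") to a Counter of word counts.
    local_weight: hypothetical multiplier giving the site's own data more influence.
    """
    merged = {}
    for label in ("spam", "ham"):
        combined = Counter()
        # Start from the default counts shipped with the installation
        for word, n in central.get(label, Counter()).items():
            combined[word] += n
        # Add the site's own classified edits, weighted more heavily
        for word, n in local.get(label, Counter()).items():
            combined[word] += n * local_weight
        merged[label] = combined
    return merged
```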

Still...
> *every* wiki-page post would need
> to be manually classified by (a) trusted individual(s), which doesn't seem
> a lot better than requiring every wiki page to be authorised by
> someone who is trusted...
Yes. Well, actually one could allow every page by default and filter out
the bad pages later, I think.
...but that is still a lot of manual work.


In my view, a word blacklist together with IP banning and some scoring
mechanism seems promising.
For now, the word blacklist is sufficient for me.
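A combined check along these lines might look like the sketch below. All names and values (`BANNED_IPS`, `BLACKLIST`, the per-hit score, the threshold) are hypothetical, and counting external links as an extra scoring signal is an assumption added for illustration; this is not pmwiki's actual mechanism.

```python
import re

# Hypothetical site configuration; none of these names come from pmwiki itself
BANNED_IPS = {"203.0.113.7"}
BLACKLIST = [r"casino", r"viagra", r"cheap\s+pills"]
SCORE_PER_HIT = 2      # points per blacklisted word found in the edit
LINK_PENALTY = 1       # assumption: spam edits tend to carry many external links
REJECT_THRESHOLD = 4   # edits scoring at or above this are refused


def spam_score(text, ip):
    if ip in BANNED_IPS:
        return float("inf")  # banned IPs are rejected outright
    score = 0
    for pattern in BLACKLIST:
        score += SCORE_PER_HIT * len(re.findall(pattern, text, re.IGNORECASE))
    score += LINK_PENALTY * len(re.findall(r"https?://", text, re.IGNORECASE))
    return score


def accept_edit(text, ip):
    return spam_score(text, ip) < REJECT_THRESHOLD
```

Scoring rather than rejecting on the first blacklist hit leaves some room for legitimate pages that mention a listed word once.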


Since pmwiki is centrally developed, it seems rather 'simple' to set up a
central facility that could accumulate and manage a blacklist and/or a
list of banned IPs. The advantage would be that once a spammer has been
banned from one wiki (by banning certain content of his texts or his IP),
he would not get through on any other pmwiki either.
Unfortunately, spammers operate in quite narrow time slots, so such a
shared list would only help if updates reached the other wikis quickly.

Spam Karma
(http://unknowngenius.com/blog/archives/2004/11/19/spam-karma-merciless-spam-killing-machine/),
a spam filter for blogs, uses such a central blacklist, with considerable
success...
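A pmwiki site using such a central list might keep a local copy and refresh it often, since a list that is hours old will miss a spammer working in a narrow time slot. The sketch below assumes a hypothetical plain-text format (one regular expression per line, lines starting with `#` are comments) and takes the download step as a `fetch` callable so it stays self-contained; neither the format nor the `SharedBlacklist` class reflects an existing pmwiki or Spam Karma API.

```python
import re
import time


def parse_blacklist(text):
    """Parse the hypothetical shared format: one regex per line, '#' lines are comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


class SharedBlacklist:
    """Local copy of a centrally maintained blacklist (hypothetical).

    fetch: callable returning the central list as text; in practice it would
    download the list from the central facility.
    max_age: seconds after which the local copy is considered stale.
    """

    def __init__(self, fetch, local_patterns=(), max_age=600):
        self.fetch = fetch
        self.local_patterns = list(local_patterns)
        self.max_age = max_age
        self.central_patterns = []
        self.fetched_at = None

    def patterns(self, now=None):
        now = time.time() if now is None else now
        if self.fetched_at is None or now - self.fetched_at > self.max_age:
            # Refresh when stale, so new bans from other wikis arrive quickly
            self.central_patterns = parse_blacklist(self.fetch())
            self.fetched_at = now
        return self.local_patterns + self.central_patterns

    def is_blocked(self, text, now=None):
        return any(re.search(p, text, re.IGNORECASE) for p in self.patterns(now))
```

A short `max_age` trades extra requests to the central facility for a smaller window in which a freshly reported spammer can still get through.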



best regards
Steffen


