[pmwiki-users] Turing Test for pmwiki

Mon Jul 14 00:00:44 CDT 2008

On Mon, Jul 14, 2008 at 3:38 AM, Patrick R. Michaud <pmichaud at pobox.com> wrote:

> the image-based captchas still seem to be the most prevalent -- I tend to take this as evidence that the image-based captchas still work better.

Language-based captchas pose a linguistic challenge, not a technical
one as image-based captchas do. Maybe people initially approached
them from the wrong angle, treating them as the same kind of
algorithmic problem -- i.e. something to automate.

I believe the whole point is that anything generated by an algorithm
can eventually be reproduced and beaten by another computer -- run by
spammers in this case. Hence, linguistic challenges need to be written
by hand, by humans, to be the most effective in Turing's sense. (Hmm,
isn't it like the android test in Blade Runner?)

> One of the difficulties of "real-world knowledge" tests is that
> much of what we know may be intimately tied to language and culture.

That depends entirely on which phrases you choose. This is the
difficult part, but also the most powerful. If you want everybody to
be able to answer, ask about things that are universal around the
world.

The sun rises in the morning, and elephants are larger than mice,
everywhere. However, whether July falls in summer or winter depends
on which hemisphere you live in.

Language is immensely diverse. It is possible to tune language-based
challenges to your audience. You can use language and cultural
specifics to your advantage when hosting for a specific audience.
Numbers are the same everywhere, but that universality cuts both
ways: everybody can read them, and so can a single attack tool.

If your site is, for example, written in Welsh, then you can make
assumptions about the world knowledge of Welsh speakers based on their
region and culture. This makes it even more difficult for spammers.
People from other cultures who do not speak Welsh will not be visiting
the site or taking the challenge, so that is no problem.

If you write your challenges in French for a French-speaking audience,
then the spammers would have to understand French and write systems to
work with French syntax and semantics.

For image captchas with numbers, spammers need only one OCR system
for the whole world. But with languages they have to create a
customised system for each language. Gotcha!

My point is that computer scientists may have been approaching this
from an algorithmic perspective, which is natural for them; hence the
image captchas. Instead, you have to approach it from a linguistic
perspective.

> I've set up an example at http://www.pmwiki.org/wiki/Test/TuringCaptcha.

Wow, thanks, that's great! :-)

On pmwiki-2.2.0-beta65 I always get "Captcha succeeded" no matter what
is entered. I also had to change the form input syntax; are those
changes documented somewhere? I'll upgrade and see what happens
anyway.

I copied and pasted your recipe below. Is that okay as-is in a new
file, or does it require changes to your image-based captcha recipe?

Many thanks again,

Marcus

> The local customization I used to do this is:
>
>  $CaptchasList = array(
>    'What day comes after Monday?' => 'Tuesday',
>    'What color is a green apple?' => 'green',
>    'What is larger, an elephant or an ant?' => 'elephant',
>    'What shape is a circle?  It is ro...' => 'round',
>  );
>
>  $key = array_rand($CaptchasList);
>
>  $FmtPV['$Captcha'] = "'$key'";
>  $CaptchaValue = $CaptchasList[$key];
>
>  include_once('cookbook/captcha.php');