Blocking comment spam without using captcha

Solution 1

In my experience, the most effective method at the moment is honeypot input fields that are hidden from users via CSS (ideally using several different techniques, such as visibility:hidden, a size of 0 pixels, and absolute positioning far outside the browser window). If those fields are filled in anyway, you can assume the submitter is a spambot.

This blog describes a rather complex method that I've tried out myself (with 100% success so far), but I suspect that you could get the same result by skipping all the stuff with hashed field names and just adding some simple honeypot fields.
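The server-side half of the simple honeypot idea could be sketched roughly as follows. The field names `website` and `phone2` and the function name `looks_like_spambot` are hypothetical, chosen only for illustration; they must match whatever hidden inputs the form actually contains.

```php
<?php
// Hypothetical honeypot field names; they must match the CSS-hidden inputs in the form.
const HONEYPOT_FIELDS = ['website', 'phone2'];

/**
 * Returns true when any honeypot field in the submitted data is non-empty.
 */
function looks_like_spambot(array $post, array $honeypots = HONEYPOT_FIELDS): bool
{
    foreach ($honeypots as $field) {
        $value = $post[$field] ?? '';
        // A human never sees these inputs, so any content at all is suspicious.
        if (is_array($value) || trim((string) $value) !== '') {
            return true;
        }
    }
    return false;
}
```

In a real handler this would be called as `looks_like_spambot($_POST)` before the comment is stored. Field names that resemble common autofill targets may be filled in by the browser, so neutral names are probably safer.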

Solution 2

1) Add session-related information to the form. Example:

<input type="hidden" name="sh" value="<?php echo dechex(crc32(session_id())); ?>" />

Then, at postback, check whether the submitted value matches the current session.
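A minimal sketch of that postback check, assuming the hidden field is named `sh` as in the example above. The function names `session_token` and `is_valid_session_token` are made up for this illustration.

```php
<?php
/**
 * Recomputes the value the form embedded in the hidden "sh" field.
 */
function session_token(string $sessionId): string
{
    return dechex(crc32($sessionId));
}

/**
 * Returns true only if the posted "sh" value matches the current session ID.
 */
function is_valid_session_token(array $post, string $sessionId): bool
{
    if ($sessionId === '' || !isset($post['sh']) || !is_string($post['sh'])) {
        return false; // no session, or the field is missing or malformed
    }
    // hash_equals compares the two strings in constant time.
    return hash_equals(session_token($sessionId), $post['sh']);
}
```

After `session_start()`, the handler would call `is_valid_session_token($_POST, session_id())` and reject the post when it returns false.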

2) JavaScript only: inject the value with JavaScript at submission. Example:

<input type="hidden" id="txtKey" name="key" value="" />
<input type="submit" value="Go" onclick="document.getElementById('txtKey').value = '<?php echo dechex(crc32(session_id())) ?>';" />

3) Set a time limit per IP, user, or session. This is quite straightforward.
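One way the per-session variant might look is sketched below. The key name `last_post_at`, the 30-second default, and the function name `allow_submission` are all illustrative assumptions, not part of the original answer.

```php
<?php
/**
 * Decides whether a new submission is allowed, given when the last accepted one happened.
 * $state is any per-session or per-IP array (for example $_SESSION).
 */
function allow_submission(array &$state, int $now, int $minInterval = 30): bool
{
    $last = $state['last_post_at'] ?? null;
    if ($last !== null && $now - $last < $minInterval) {
        return false; // too soon after the previous accepted submission
    }
    $state['last_post_at'] = $now; // remember this accepted submission
    return true;
}
```

With sessions, the call would be `allow_submission($_SESSION, time())`. A per-IP limit would need the same state kept in a shared store keyed by address instead.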

4) Randomizing field names:

<?php
   $fieldkey = dechex(crc32(mt_rand().dechex(crc32(time()))));
   $_SESSION['fieldkey'] = $fieldkey;
?>
<input type="text" name="name<?php echo $fieldkey; ?>" value="" />
<input type="text" name="address<?php echo $fieldkey; ?>" value="" />   

Then, on the server side, read the fields back using the key stored in the session.
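The server-side lookup for the randomized names could be sketched like this, assuming the key was stored in `$_SESSION['fieldkey']` as shown above. The helper name `read_randomized_fields` is hypothetical.

```php
<?php
/**
 * Maps randomized field names back to their logical names.
 * Returns null if any expected field is missing, for example when a bot
 * posted to stale or guessed field names.
 */
function read_randomized_fields(array $post, string $fieldKey, array $names): ?array
{
    $values = [];
    foreach ($names as $name) {
        $key = $name . $fieldKey; // e.g. "name" . "1a2b" => "name1a2b"
        if (!isset($post[$key]) || !is_string($post[$key])) {
            return null;
        }
        $values[$name] = $post[$key];
    }
    return $values;
}
```

A handler would call `read_randomized_fields($_POST, $_SESSION['fieldkey'], ['name', 'address'])` and treat a null result as a rejected submission.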

Solution 3

Akismet has an API. Someone wrote a wrapper class (BSD license) for it at: http://cesars.users.phpclasses.org/browse/package/4401.html

There's also a Bayesian filter class (BSD license as well): http://cesars.users.phpclasses.org/browse/package/4236.html

Solution 4

This is a simple trick to block spam bots or brute-force attacks without using a captcha.

Put this in your form:

<input type="hidden" name="hash" value="<?php echo md5($secret_key.time()).','.time(); ?>" />

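Put this in your PHP code:

// $secret_key must be the same private, server-side string used when rendering the form.
$human_typing_time = 5; /** page load (1s) + submit (1s) + typing time (3s) */
$vars = explode(',', $_POST['hash']);
// Reject if the field is malformed, the hash does not match, or the form came back too fast.
if (count($vars) != 2 || md5($secret_key.$vars[1]) != $vars[0] || time() < $vars[1] + $human_typing_time) {
    // bot?
    exit();
}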

Depending on the length of the form, you can increase or decrease $human_typing_time.
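For comparison, here is a variation on the same timestamp idea. It is not the code from this answer: it swaps the md5 concatenation for HMAC-SHA256, compares with `hash_equals`, and adds an upper age limit so old tokens cannot be replayed forever. The function names and the one-hour `$maxAge` default are assumptions made for this sketch.

```php
<?php
/** Builds the hidden-field value "mac,timestamp" when the form is rendered. */
function make_form_token(string $secretKey, int $issuedAt): string
{
    return hash_hmac('sha256', (string) $issuedAt, $secretKey) . ',' . $issuedAt;
}

/** Accepts the token only if it is authentic and neither too fresh nor too old. */
function check_form_token(string $token, string $secretKey, int $now, int $minAge = 5, int $maxAge = 3600): bool
{
    $parts = explode(',', $token);
    if (count($parts) !== 2 || !ctype_digit($parts[1])) {
        return false; // malformed field
    }
    $issuedAt = (int) $parts[1];
    $expected = hash_hmac('sha256', (string) $issuedAt, $secretKey);
    if (!hash_equals($expected, $parts[0])) {
        return false; // timestamp was tampered with, or the wrong key was used
    }
    $age = $now - $issuedAt;
    return $age >= $minAge && $age <= $maxAge;
}
```

The form would emit `make_form_token($secret_key, time())` in the hidden field, and the handler would call `check_form_token($_POST['hash'], $secret_key, time())`.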

Solution 5

Naive Bayesian filters, of course:

http://blog.liip.ch/archive/2005/03/30/php-naive-bayesian-filter.html
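To give a feel for the technique, here is a toy naive Bayes classifier. It is a sketch written for this page under simplifying assumptions (word counts only, add-one smoothing, two classes named `spam` and `ham`); it is not the implementation from the linked post, and all function names are invented.

```php
<?php
/** Splits text into lowercase alphanumeric word tokens. */
function tokenize(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return $m[0];
}

/** Counts word occurrences per class from [label, text] training pairs. */
function train(array $examples): array
{
    $model = [
        'docs'   => ['spam' => 0, 'ham' => 0],   // documents seen per class
        'words'  => ['spam' => [], 'ham' => []], // word counts per class
        'totals' => ['spam' => 0, 'ham' => 0],   // total words per class
        'vocab'  => [],                          // set of all words seen
    ];
    foreach ($examples as [$label, $text]) {
        $model['docs'][$label]++;
        foreach (tokenize($text) as $w) {
            $model['words'][$label][$w] = ($model['words'][$label][$w] ?? 0) + 1;
            $model['totals'][$label]++;
            $model['vocab'][$w] = true;
        }
    }
    return $model;
}

/** Returns 'spam' or 'ham' by comparing smoothed log-probabilities. */
function classify(array $model, string $text): string
{
    $v = count($model['vocab']) + 1; // one extra slot for unseen words
    $n = array_sum($model['docs']);
    $scores = [];
    foreach (['spam', 'ham'] as $label) {
        $score = log(($model['docs'][$label] + 1) / ($n + 2)); // smoothed class prior
        foreach (tokenize($text) as $w) {
            $count = $model['words'][$label][$w] ?? 0;
            $score += log(($count + 1) / ($model['totals'][$label] + $v));
        }
        $scores[$label] = $score;
    }
    return $scores['spam'] > $scores['ham'] ? 'spam' : 'ham';
}
```

A real filter needs far more training data than a handful of comments, and ties go to `ham` here so that uncertain posts are not discarded.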

Author by

ian

Updated on July 09, 2022

Comments

  • ian
    ian almost 2 years

    What are some non-captcha methods for blocking spam on my comments?

  • Michael Borgwardt
    Michael Borgwardt over 14 years
    In my experience, 1) and 3) are completely ineffective, since many bots behave like normal users as far as sessions are concerned, and requiring JavaScript for basic functionality is unacceptable.
  • Kip
    Kip over 14 years
    this doesn't work unless you just discard any posts containing links, which means sometimes valid users are going to lose their posts for no apparent reason. if you remove the links but allow the text you'll probably still get spam (at least that was my experience)
  • Kip
    Kip over 14 years
    +1 this worked on my blog too, mostly. every now and then there will be a spammer that submits to every form on the page though...
  • Programatt
    Programatt over 14 years
    -1, how would this even be effective?
  • Kip
    Kip over 14 years
    hiding the field with the aid of CSS would be better... then any non-JS users wouldn't see it either.
  • Kip
    Kip over 14 years
    in my experience, most spambots blindly submit to the first form on the page. a few submit to every form. i haven't noticed any picking one at random (though i'm sure there are some)
  • Steve Wortham
    Steve Wortham over 14 years
    +1. Very cool. I also like the time limit idea. If a "user" submits a comment 74 milliseconds after requesting the page then you know something's up.
  • Michael Borgwardt
    Michael Borgwardt over 14 years
    Links are nearly the only reason why automated comment spam exists. By disallowing posts that contain links, you'll catch close to 100% of all spam. You don't have to swallow them silently either; display an explicit error message "sorry, no links allowed in comments". But still, it does make comments less useful.
  • Michael Borgwardt
    Michael Borgwardt over 14 years
    @Steve - yes, but unfortunately it's too easy for spambots to defeat.
  • Arkh
    Arkh over 14 years
    Sure, at the moment they don't pick randomly because not a lot of people use the honey pot method. But give it 2 or 3 years and some will. Let's get a little ahead of bots while it's possible.
  • Jacco
    Jacco over 14 years
    also check out "detecting stealth web crawlers": stackoverflow.com/questions/233192/…
  • Programatt
    Programatt over 14 years
    bots will still post spam, even a text only link is better than nothing.
  • Programatt
    Programatt over 14 years
    Spam is useless, but the bot doesn't care.
  • Robert K
    Robert K over 14 years
    @michael: Your anti-javascript attitude is antiquated. Less than 5% of all users have their javascript deactivated. And if you've got it off, then it's your own fault. Besides, just put in a no-script warning to these folks that they can't submit.
  • Michael Borgwardt
    Michael Borgwardt over 14 years
    JavaScript is nowadays the #1 source of virus infection, botnets and thus, ultimately, spam. If you have it on indiscriminately, or require it for basic functionality, then you are part of the problem.
  • Michael Borgwardt
    Michael Borgwardt over 14 years
    I don't really understand what multiple forms have to do with honeypots, but the blog post I linked to describes how you can randomize field names without needing a session. However, I suspect that this extra effort is wasted: simple bots right now will go for honeypot fields with non-random names as well (or does anyone have data that contradicts this?), and if they're forced to become more sophisticated, it would not be all that hard to analyze the lexical page structure and use that to decide which fields to fill and which to skip.
  • Arkh
    Arkh over 14 years
    Not multiple forms, multiple hidden fields. With one bad field for 3 good ones, a bot could just bruteforce trying to get past with one field not completed at each try. So you have 25% of spam still going through. With 10 pots, it has to leave 10 fields blank out of 13.
  • Strae
    Strae over 14 years
    Even if the people with JS disabled are less than 5%, I don't like to bind a webpage's functionality to something that may not be enabled or, beware, can easily be fooled and manipulated by the user. However, in this case, I'll assume that 99.99% of the target audience uses JavaScript, and this use doesn't have any dangerous drawback.
  • Mark
    Mark over 13 years
    @Michael Borgwardt the internet is 100% of the source for virus infection, botnets and thus ultimately spam.
  • Alex from Jitbit
    Alex from Jitbit over 13 years
    The biggest flaw of the "honeypot" methods - browsers can autofill those fields anyway. For instance, Chrome will put your email into every <input name="email"> if autofilling is turned on.
  • Michael Borgwardt
    Michael Borgwardt over 13 years
    @jitbit: well, that's something to keep in mind, but can't happen with the method I linked to, which uses randomized field names that are different each time the form is displayed. The idea is to make it impossible for the spambot to know which fields must be filled and which must not be filled.
  • BlueRaja - Danny Pflughoeft
    BlueRaja - Danny Pflughoeft about 13 years
    Note that using this method, you may incorrectly block terminal users whose browsers do not apply that sort of CSS (namely, braille-readers)
  • RyanS
    RyanS about 11 years
    If they submit to every form, can't you flag that as spam due to the presence of text in the honeypot? No one but a bot will post there, so if it has any contents, it's spam. (One edge case: autofill.)
  • Kip
    Kip about 11 years
    @RyanS: I wrote this nearly four years ago so I'm not sure, but I think I meant that someone submits to every form visible on the screen, without populating the honeypot field.
  • nmsdvid
    nmsdvid over 10 years
    Can you explain what $secret_key is? Also, I think you have a typo at time() < $var[1]; it should be $vars.
  • Matthew Smith
    Matthew Smith over 8 years
    @BlueRaja-DannyPflughoeft, doesn't the same limitation apply to any non-pure-HTML CAPTCHA method?
  • Lothar
    Lothar about 7 years
    Any serious Braille Reader must use CSS.