Posted on December 12, 2010
By John Nunemaker
Spam sucks. Knowing this, we decided from the start to use an outside service (Defensio) for spam filtering. We also did not want users to ever have to think about it: no signing up for an account and pasting in an API key. Instead, we have one key for all of Harmony and take care of all that for you.
When a comment is created, it is stored as an unapproved comment and a job is queued up to ping Defensio. Defensio then pings us back with a spaminess percentage and whether or not it thinks the comment should be allowed.
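If you like seeing this kind of flow as code, here is a rough sketch of it in Python. It is not Harmony's actual code: the `Comment` fields, the in-memory `job_queue`, and the `create_comment` and `handle_callback` names are all made up for illustration, and the real Defensio API is not called anywhere.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: int
    body: str
    approved: bool = False             # every comment starts out unapproved
    spaminess: Optional[float] = None  # percentage; None until the filter reports back
    allow: Optional[bool] = None       # the filter's allow/deny verdict


# Hypothetical stand-ins for a database table and a background job queue.
comments = {}
job_queue = deque()


def create_comment(comment_id, body):
    """Store the comment as unapproved and queue a job to ask the spam filter."""
    comment = Comment(id=comment_id, body=body)
    comments[comment_id] = comment
    job_queue.append(comment_id)
    return comment


def handle_callback(comment_id, spaminess, allow):
    """Record what the spam filter reported when it pinged us back."""
    comment = comments[comment_id]
    comment.spaminess = spaminess
    comment.allow = allow
    return comment
```

The point of the sketch is that the comment is saved right away and the scoring happens later, so a slow or unreachable filter never blocks someone from commenting.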
There is a lot of spam out there and your moderation queue can fill up fast. The first thing we did was sort unapproved comments by spaminess, ascending. This placed the comments least likely to be spam at the beginning, which meant you could check those and then just delete all of the rest. We quickly noticed that comments with a spaminess above 60% were always spam.
Rather than force you all to deal with those comments, we purge any unapproved comment over 60% likely to be spam on an hourly basis. This means less crappy data in our system and less for you to think about.
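Here is a small Python sketch of the sorting and the purge, assuming comments are plain dicts with `approved` and `spaminess` keys. Those names, the `moderation_queue` and `purge_spam` functions, and the choice to list unscored comments last are assumptions for illustration, not how Harmony actually does it.

```python
SPAM_THRESHOLD = 60.0  # percent; unapproved comments above this get purged


def moderation_queue(comments):
    """Unapproved comments, least spammy first.

    Comments with no score yet are placed last (an assumption of this sketch).
    """
    pending = [c for c in comments if not c["approved"]]
    return sorted(
        pending,
        key=lambda c: (c["spaminess"] is None, c["spaminess"] or 0.0),
    )


def purge_spam(comments, threshold=SPAM_THRESHOLD):
    """Return the comments that survive a purge.

    Approved comments and comments without a score are always kept.
    """
    return [
        c for c in comments
        if c["approved"] or c["spaminess"] is None or c["spaminess"] <= threshold
    ]
```

Something like `purge_spam` would then be run once an hour by whatever scheduler you already have.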
The next thing we noticed is that occasionally, for whatever reason, the wires get crossed between Defensio and Harmony, leaving comments that are very obviously spam in your moderation queue. Finding this quite annoying, I deployed code today that automatically re-queues any unapproved comment that is over an hour old and still has no spaminess score.
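The re-queue check can be sketched in Python along these lines. The dict keys, the `requeue_stale` function, and the `enqueue` callback are hypothetical names for illustration; the real job would hand each comment back to whatever queue pings Defensio.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=1)


def stale_unscored(comments, now):
    """Unapproved comments over an hour old that still have no spaminess."""
    return [
        c for c in comments
        if not c["approved"]
        and c["spaminess"] is None
        and now - c["created_at"] > STALE_AFTER
    ]


def requeue_stale(comments, enqueue, now):
    """Hand each stale comment's id to `enqueue`; return the ids re-queued."""
    stale = stale_unscored(comments, now)
    for comment in stale:
        enqueue(comment["id"])
    return [comment["id"] for comment in stale]
```

Passing `now` in, rather than reading the clock inside the function, keeps the check easy to test.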
This addition in combination with the automatic purging should bring the amount of moderation you need to do down to almost zero. For example, before deploying this last tweak, RailsTips had about 150 comments in the moderation queue. I just checked and it was down to 2. One had a spaminess of 52% and the other will be purged within the hour as it was at 99%.
Hope you enjoy the zapping of more spam as much as I already am!