A solution for blog spam?

I am currently planning my own CMS geared toward blogging, since I either just don’t like the CMSes I’ve reviewed, or they have a license that is incompatible with my needs.


Other blogging tools also fail in one other respect: the comment system. I do not want to _require_ commenters to register, nor do I want non-registered commenters to post URLs as they please.
I also don’t want to make registration or comment posting inaccessible, so any “Captchas”:http://en.wikipedia.org/wiki/Captcha are out of the question.
Nor do I want to allow registered commenters to do anything they want, since some creep could hire a high-school kid to spam by hand.
Finally, I don’t want to keep maintaining an ever-growing blacklist, as I do today. My current mt-blacklist contains 3332 entries at the time of writing. Only _{Insert deity of choice here}_ knows how much this slows down comment processing.
I believe I have a solution to this:

h3. Automatic semi-moderation

My idea for a moderation system works as follows:
* Commenters are divided into four categories:
** Moderators
** Registered, human
** Registered, possible bot (typically newly registered users)
** Unregistered
* Comments from moderators and human registered users will not be automoderated. This means that the comments will be published as-is.
* Comments from unregistered users and newly registered users (the possible bots, remember) will have any URLs stripped. This includes URLs in links, in signatures, and in plain text within the entry.
* Upgrading a registered user from “possible bot” to “human” can take place in one of two ways:
** A moderator can upgrade the user.
** Other registered humans can vote for the user. If a user has _n_ votes (configurable, of course), he/she is upgraded to a “human” user, and can post comments uncensored.
* Moderators can also mark single comments as “human”, so that URLs are made available to the general public.
* Moderators can mark spam comments for deletion.
Using this simple method, just about any random visitor should still be able to leave a comment on a blog and have it published instantly. Spammers won’t get their Google boost, while legitimate pages _will_ get the Google boost they deserve. No more flaky redirection scripts.
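
The rules listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the planned CMS code: the names @Role@, @User@, @strip_urls@, @render_comment@ and @vote_human@, the @[link removed]@ placeholder, the vote threshold of 3, and the deliberately crude regular expressions are all made up for this example.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(Enum):
    """The four commenter categories from the list above."""
    MODERATOR = "moderator"
    HUMAN = "registered, human"
    POSSIBLE_BOT = "registered, possible bot"
    UNREGISTERED = "unregistered"


# Assumption: the post only says "n votes (configurable)"; 3 is arbitrary.
VOTES_NEEDED = 3

# Assumption: crude patterns for illustration, not a complete URL grammar.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


@dataclass
class User:
    name: str
    role: Role = Role.POSSIBLE_BOT            # new registrations start here
    voters: set = field(default_factory=set)  # names of humans who vouched


def strip_urls(text: str) -> str:
    """Keep link text but drop link targets, then blank out bare URLs."""
    text = ANCHOR_RE.sub(r"\1", text)
    return URL_RE.sub("[link removed]", text)


def render_comment(author: Optional[User], text: str,
                   approved: bool = False) -> str:
    """Return the text shown to the public.

    author is None for unregistered commenters. approved is True when a
    moderator has marked this single comment as "human".
    """
    trusted = author is not None and author.role in (Role.MODERATOR, Role.HUMAN)
    if trusted or approved:
        return text
    return strip_urls(text)


def vote_human(voter: User, candidate: User) -> None:
    """Record a vote; upgrade the candidate once enough humans agree."""
    if candidate.role is not Role.POSSIBLE_BOT:
        return
    if voter.role is Role.MODERATOR:
        candidate.role = Role.HUMAN           # moderators upgrade directly
        return
    if voter.role is not Role.HUMAN:
        return                                # possible bots cannot vouch
    candidate.voters.add(voter.name)          # a set ignores repeat votes
    if len(candidate.voters) >= VOTES_NEEDED:
        candidate.role = Role.HUMAN
```

Every comment is published immediately either way; the only thing that changes with the commenter's category is whether its URLs survive. Storing voters in a set means one registered human cannot upgrade a friend single-handedly by voting repeatedly.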

10 Comments

  1. It’s probably just MT. I have never had a single spam post on my weblog. Then again, I do require preview and well-formed XML. 🙂

  2. First off: For most users, requiring valid XHTML input is not an option. Requiring users to manually input markup is not terribly user-friendly, even for users who know markup, and it’s next to impossible for innocent bystanders who have no interest in markup at all.
    Requiring valid markup for this site would probably not be too much of a problem, but on my other blog, it would quickly become one.
    Secondly: Requiring preview is not too much of a hassle for the determined spammer.
    That you aren’t being spammed is, I presume, more because spammers haven’t started targeting the WordPress commenting system. Yet.

  3. Three things:
    1) oh yes, spammers _do_ target WP. As I have noticed, they especially seek out the standard “hello world” first post of fresh WP installations. I’m not entirely sure, but I believe fresh installations don’t have comment moderation turned on by default, so that seems to be a nice strategy.
    2) your approach to curbing blog spam looks promising to me, but have you considered coding this up as a plugin for an existing piece of blogware? Of course, if you simply want to have fun coding your own little CMS, just go ahead, but writing a plugin literally plugs you in to an existing community with all the benefits of more people working on the project and presumably also looking at your code. (Brian Meidell recently dropped his own blogware in favor of WP for the very same reasons.)
    3) with your current plans it seems to me that you need a fairly busy site. Many blogs only get a couple of comments a week or even a month and for them having a tiered system of registered and unregistered commenters probably won’t work quite as well (due to not enough registered commenters in the first place) as for a fairly busy site.

  4. bq. your approach to curbing blog spam looks promising to me, but have you considered coding this up as a plugin for an existing piece of blogware?
    Yes, I did consider plugging this into some other blogware, but since this moderation system is far from the only thing I have in mind (for comments, and almost every other part of the CMS), I’ll end up rolling my own.
    bq. with your current plans it seems to me that you need a fairly busy site
    Not really. If you have a site where there is just the occasional comment, I think it’s quite likely that the moderator will read and moderate comments all on his own, without drowning under the workload.

  5. Have you looked at the simple solution they’ve got over at http://www.samizdata.net? Very simple; in the comment section they throw in an image of a number and you type in the number given to prove you’re not a bot. That takes care of the bot problem altogether. Moderating after that becomes trivial.
    [Ed. note: typographical edit performed]

  6. Alexander: The method you are describing is what is called a “captcha”:http://en.wikipedia.org/wiki/Captcha.
    The main problem with Captchas is that you also make it impossible for blind users to comment. Plus: you’re making it impossible for those who choose to turn off images (on a GPRS connection, I frequently do so myself). Finally, you make the form impossible to use for those with aural user agents.
    So, captchas are a _really, really, really bad_ idea, I’m afraid.

  7. The main reason, I realize, that I like this approach is that the site owner doesn’t necessarily have to be involved in ‘approving’ people for making a post on the site. I’m not around 24/7 to moderate comments on my own site, and one thing I absolutely want to avoid is comments not going through to the site immediately, because I hate it so much myself when you comment on an article only for your comment to appear on the site days later or even never. However, for this to work well (or for the Slashdot per-comment voting alternative to work well) you need a fairly busy site with either a number of moderators to promote people or enough other readers to vote for newcomers.

  8. In addition to the reasons Arve mentions for why Captchas are a bad idea, they don’t stop hired high-school students from spamming. So they block out blind users and users with low bandwidth, but not human spammers. Yes, they block out spam bots, but that’s only part of the problem, and this solution just doesn’t cut it. It has too many downsides.

  9. And lest the “hired hand spammers” sound unlikely: I got 14 spams the other day, for a site in the US, but the spams were entered by hand by someone in Malaysia. Apparently offshoring has reached the spamming world, too.
    Your approach sounds good to me, Arve. Busy or not, most sites get most of their comments from a relatively small percentage of the readers, who you can quickly identify as human. Only moderating the links, not the entire comment, solves most of the problems with moderation interrupting the flow of comment conversation, particularly if you have a way of allowing people who really want to see the moderated links to do so (I’ve idly thought of having a cgi blocked off in robots.txt that displays the unexpurgated version). Per-blog registration is a little annoying, but if the only difference is whether links are visible to all, or only to humans who click, it ought to work.
    One thing, though: don’t be too quick to dismiss the value of forced preview. While people with setups otherwise similar to mine talk about hundreds of spam comments a week, I generally get one or two, entered by hand. I had one run of twenty comments by a robot, but I presume that was a friend doing it as a proof of concept, since it was just the one set. It’s trivial to work around, but if there are only a dozen of us doing it, and millions who don’t, it’s still not worth their trouble.

  10. bq. One thing, though: don’t be too quick to dismiss the value of forced preview. While people with setups otherwise similar to mine talk about hundreds of spam comments a week, I generally get one or two, entered by hand.
    By all means, forced preview may very well put off spam bots for now; I recognize that. What I’m saying is that I fear that if all blogs required preview, a spammer would build a tool to circumvent it. In that respect, spam is a lot like cryptography vs. cryptanalysis. It’s a race we really can’t expect anyone to win in the end (quantum cryptography aside).