Referer tarpit

Let it be no secret that referer spammers annoy me. Let it be no secret that I don’t think they are the brightest crowd around, either.
Which is why I’m actively fighting them. Meet Referer Tarpit.


h3. Rationale
I have watched quite a few lines of output from tail -f and I noticed the following:
# Most referer spammers obey HTTP redirects. They actually need to, since most referer display tools won’t output anything for 301 or 302 responses.
# Most (if not all) of the same referer spammers also fetch the body of the document they try to spam.
This gives us some ammunition.
h3. What does Referer Tarpit do?
This is very simple: Referer Tarpit is a PHP script that outputs random data _extremely slowly._ My current installation sends anything from 0.5 to 2 bytes/second, and does so for up to five minutes.
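To make the idea concrete, here is a minimal sketch of such a slow drip. It is written in Python for illustration and is not the actual PHP script; the function name @tarpit_chunks@ and its parameters are assumptions, and only the numbers (0.5 to 2 bytes/second, five minutes) come from the description above.

```python
import random
import time


def tarpit_chunks(max_seconds=300.0, min_rate=0.5, max_rate=2.0,
                  sleep=time.sleep, rng=None):
    """Yield one random byte at a time, pausing between bytes.

    Hypothetical sketch: names and structure are assumptions. The sleep
    function is injectable so the generator can be exercised without
    actually waiting five minutes.
    """
    rng = rng or random.Random()
    elapsed = 0.0
    while True:
        # Pick a rate in bytes/second; a single byte then costs 1/rate seconds.
        delay = 1.0 / rng.uniform(min_rate, max_rate)
        if elapsed + delay > max_seconds:
            break  # stop before exceeding the total time budget
        sleep(delay)
        elapsed += delay
        yield bytes([rng.randrange(256)])
```

In a real request handler, each yielded byte would be written to the client and flushed immediately, so that the spammer's connection stays open while receiving almost nothing.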
h3. Why does this work?
Many, if not most, of the referer spam runs are done from a single machine, and not from botnets or similar.
I have nothing but anecdotal evidence, but I suspect that most of these runs are single-threaded: the spam script spams one URL, and then moves on to the next. If some of them are multi-threaded on a single machine, they will still have a limited number of threads.
This means that every time the spammers hit a tarpit, the particular spamming thread will be delayed for up to five minutes. Which means that the tarpit is saving _others_ from being spammed – not yourself.
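A rough back-of-the-envelope version of this argument, as a Python sketch. The function name and the one-second figure for an ordinary spam request are assumptions made for illustration, not measurements.

```python
def requests_displaced(hold_seconds, normal_request_seconds):
    """Estimate how many ordinary spam requests one tarpit hit displaces
    on a single spamming thread (hypothetical helper)."""
    if normal_request_seconds <= 0:
        raise ValueError("normal_request_seconds must be positive")
    # Time the thread loses beyond what an ordinary request would have cost.
    lost = max(hold_seconds - normal_request_seconds, 0.0)
    return lost / normal_request_seconds


# Assumed figures: a full five-minute hold, one second per ordinary request.
print(requests_displaced(300, 1.0))  # 299.0
```

Under those assumptions, a single thread held for the full five minutes misses roughly 300 requests it would otherwise have sent to other sites.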
h3. How to use
In its current incarnation, Referer Tarpit requires active maintenance, "mod_rewrite":http://httpd.apache.org/docs/mod/mod_rewrite and "PHP":http://www.php.net/.
You start by creating a rewrite rule in your site-level .htaccess file:
bc.. RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://.*(doobu|highprofitclub)\.com/.*$ [NC]
RewriteRule !(installdirectory/index\.php)$ http://yoursite.example.com/installdirectory/index.php [R=302,L]
p. Ensure that the regular expression in RewriteCond matches the referer spammers you want to slow down. The ones in this sample are currently doing some spam runs.
You will need to replace yoursite.example.com and installdirectory with your server and the folder you have chosen to install Referer Tarpit to.
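Before deploying a pattern, it can help to try it against a few referers from your logs. The following Python sketch reuses the expression from the RewriteCond above; the helper name @is_tarpitted@ and the sample referers are made up for illustration, and Python's @re@ module is only an approximation of Apache's regex engine, although for a pattern this simple the two should agree.

```python
import re

# Same expression as in the RewriteCond; re.IGNORECASE stands in for [NC].
SPAM_REFERER = re.compile(
    r"^http://.*(doobu|highprofitclub)\.com/.*$", re.IGNORECASE
)


def is_tarpitted(referer):
    """Return True if the referer would be redirected to the tarpit."""
    return SPAM_REFERER.search(referer) is not None


# Hypothetical sample referers.
samples = [
    "http://www.doobu.com/",               # matches
    "http://HighProfitClub.com/page",      # matches, case is ignored
    "http://doobu.com",                    # no match: the pattern wants a slash after .com
    "https://www.doobu.com/",              # no match: the pattern only covers http://
    "http://example.org/?ref=doobu.com/",  # matches: the name appears in the query string
]

for referer in samples:
    print(referer, is_tarpitted(referer))
```

The last sample shows that the leading @.*@ lets the pattern match a spammer's domain anywhere in the URL, so an innocent referer that merely mentions the domain would be tarpitted as well.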
If you are using a statistics tool, such as "AWStats":http://awstats.sourceforge.net/, you might want to configure it to ignore requests for the tarpit script.
h3. Who should not use Referer Tarpit?
Unless you understand the purpose and implications of running a script like this, you should not use it.
* If you are being heavily referer spammed by more than a couple of sources, this script might cause load problems.
* Unless you plan on actively monitoring your referer spam, you should not use it.
* You should not use it on all your referer spammers. Don’t add the entire master MT-Blacklist (unless you want to risk killing your server, of course).
h3. Download
Download the latest "Referer Tarpit":http://virtuelvis.com/download/2005/03/tarpit/tarpit.zip (current version: 0.1).
Use freely, and if you have any improvements, feel free to post them back here.
h3. Planned features
* Limit number of concurrent running tarpits.
* Add different tarpitting methods.

3 Comments

  1. I’ve never really paid much attention to my referrer logs. I did have some simple statistics calculated for a few of my pages, including referrer, but that was over five years ago, and the statistics were never open to the public anyway, so I couldn’t really care less if they try to spam my logs (I _have_ seen such attempts being made, though).
    I do regularly check for access errors, though, and find a lot of accesses to things that aren’t there (people trying to hack using commonly known weaknesses in various software). Those “visits” almost exclusively come from Brazil, but I’ve never checked where the referrer spammers come from.
    Of course, it’s probably a lot of open proxies being used, but it’s a bit curious that I get so many weird accesses from one single country…

  2. The referer spam comes from “all over”. "Spam Huntress":http://www.spamhuntress.com/ is a good resource for those who want to follow their trails.

  3. Ouch.

    The flooding approach really doesn’t work. One request just used 143MB of (my) bandwidth before giving up. I’m going to switch t…