Ask-a-dev: What about referrer spam?

You keep asking the questions—we keep answering them—in our latest round of ‘Ask-a-dev’.

Michael W. asks:

I see all the work on comment spam, but how about referrer spam? The wonderful Referrer Karma tool for WordPress practically eliminates the problem, but we on Textpattern have nothing like this. There are only so many variations one can manually add to an htaccess file (mine is 230 lines so far) before it’s a losing battle of bandwidth. Can the spam plugin process work for these as well? It looked very tied to comment spam.

There are two answers to that question, a user-centered one and a technical one:

The user-centered answer is that log statistics are not a vital, central feature of Textpattern CMS, whereas comments often are. Comments are usually publicly viewable, so comment spam often requires immediate effort from the publisher to remove it. Spam referrers, on the other hand, are only shown on the private log page viewable by the Site Publisher, so the only action one must take is to ignore those referrers.

Also be aware that while it was always possible and very easy to write a plugin against referrer spam, it was more difficult to write plugins against comment spam, and their scope would have been limited. That’s why we worked hard for 4.0.3 and improved the infrastructure for writing anti-spam plugins that tie in well with the site and require little effort to develop and use.

The technical answer(s): Leaving a comment is an interactive process that is controlled by and specific to the application in use, i.e. Textpattern. So when comment spamming became a reality, it was natural to expect the application to be able to do something about it.

Referrers, however, are part of the standardized HTTP protocol. We can’t just go in there and change that to force a reload, ask a question, require a captcha or do other user- or application-centric things. With referrers it’s a simple take-it-or-leave-it situation: they are residue of normal web traffic. They are provided as is, and you can only choose to ignore, block or filter them. That limits the possibilities a lot.

The plugin you mention is in fact just that: a blacklist and a whitelist. So the effort shifts from maintaining webserver configuration files to maintaining application-specific files. I guess the harder part is deciding on the interface (what to allow the user to configure and what not). Anyway, it would be easy to develop such a plugin for Textpattern (and it always was), but apparently there is just not as much interest in it.
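
To give an idea of how little logic such a list check needs, here is a minimal sketch in PHP. The function name `should_log_referrer` and the list format (plain arrays of lowercase host names) are assumptions made for illustration; this is not Textpattern's plugin API, nor how Referrer Karma actually works.

```php
<?php
// Hypothetical helper, not part of Textpattern or any existing plugin.
// Returns true when the referrer should be kept in the log.
function should_log_referrer(string $referer, array $whitelist, array $blacklist): bool
{
    // The Referer header is optional and client-supplied; it may be empty or malformed.
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // nothing usable to log
    }
    $host = strtolower($host);

    // Whitelisted hosts always pass, even if a blacklist entry would also match.
    if (in_array($host, $whitelist, true)) {
        return true;
    }

    // A blacklist entry matches the host itself or any subdomain of it.
    foreach ($blacklist as $blocked) {
        $suffix = '.' . $blocked;
        if ($host === $blocked || substr($host, -strlen($suffix)) === $suffix) {
            return false;
        }
    }

    return true;
}
```

Matching on whole host names rather than on arbitrary substrings is a deliberate choice in this sketch: it keeps the lists longer, but makes it harder to block a legitimate site by accident.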

Also, because referrers are part of the HTTP protocol, referrer spam can be, and on some hosts already is, taken care of at other levels: for example with mod_security at the webserver level, or at the analysis level in log-statistics software. And if you follow the discussions around those blacklisting and rule-based approaches, you’ll see that while it’s possible to catch a lot of the referrer spam, it’s almost inevitable that they also generate a considerable rate of false positives. That is when real harm might be done: unlike ‘unwanted’ referrers in private logs, false positives can affect real users.
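
The false-positive problem is easy to reproduce. The following sketch, again in PHP, checks a few referrers against a small keyword blacklist using case-insensitive substring matching. The helper name `matched_keywords`, the keywords and the URLs are all made up for illustration; none of them come from a real rule set or log.

```php
<?php
// Hypothetical helper: return the blacklist keywords found in a referrer URL
// (case-insensitive substring match). An empty result means the referrer passes.
function matched_keywords(string $referer, array $keywords): array
{
    $hits = [];
    foreach ($keywords as $keyword) {
        if (stripos($referer, $keyword) !== false) {
            $hits[] = $keyword;
        }
    }
    return $hits;
}

// Illustrative keywords and made-up referrers.
$keywords = ['spam', 'host', 'free'];
$referers = [
    'http://blog.example.org/fighting-referrer-spam', // an article about spam
    'http://example.com/shared-hosting-review',       // a hosting review
    'http://example.net/weblog/',                     // an ordinary weblog
];

foreach ($referers as $referer) {
    $hits = matched_keywords($referer, $keywords);
    printf("%s => %s\n", $referer, $hits ? 'blocked (' . implode(', ', $hits) . ')' : 'allowed');
}
```

In this sketch the first two referrers are blocked although neither is spam: one is an article about spam, the other merely mentions hosting. A visitor following either link would be turned away if the rule were used to reject requests rather than just to clean up the log.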

So, if there is sufficient interest in something like this, I am sure a plugin will eventually be written. Textpattern CMS certainly makes it easy to do so.

Comments

  1. I’ve faced referrer spam before, and at one point it almost overwhelmed my monthly transfer quota. I was all set to write a Bayesian filter mechanism until I realized I was overly complicating things. Instead, I just added a snippet of code to my index.php file, and it takes care of it by blacklisting particular words.

    The result: referral spam has been reduced to a trickle that doesn’t overwhelm my connection.

    // The Referer header is optional, so fall back to an empty string.
    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';

    // Blacklisted words; each one is used as a case-insensitive pattern.
    $keywords = array(
        'taxes', 'slot-machine', 'slot-machines', 'skoob', 'lapozz.hu', 'alegra', 'protonix', 'skin-care',
        'betting', 'blackjack', 'progressive', 'buy-', 'top.com', 'cheap', 'viagra', 'generic',
        'alternative', 'online', 'carisoprodol', 'vinhas', 'pharmacy', 'canadian', 'alt.com', 'loans',
        '.mx', 'alt.com', 'adult', 'friend', 'female', 'porno', 'faxo', 'loan', 'xanax', 'stars',
        'naked', 'pornstar', 'porn', 'payday', 'pay-day', 'pay.day', 'inform', 'chat', 'personal',
        'date', 'laid', 'rates', 'xxx', 'warez', 'lowest', 'insurance', 'lend', 'consolid', 'host',
        'valium', 'credit', 'vegas', 'propecia', 'xenical', 'report', 'rate', 'enlarge', 'accutane',
        'financ', 'pr0n', 'cialis', 'ultima', 'windows', 'tax', 'mortage', 'casino', 'teen', 'pussy',
        'free', 'gamble', 'horny', 'biz', 'poker', 'info', 'biz', 'refinance', 'finance', 'titties',
        'rgasm', 'milf', 'anal', 'slut', 'roulette', 'diet', 'pills', 'prescription', 'learnhow',
        'cash', 'credit', 'texas', 'credit', 'cheats', 'freaky', 'phentermine', 'weight', 'dvd',
        'email', 'enhance', 'escort', 'uck', 'rate', 'fat', 'youth', 'price', 'stock', 'orny',
        'hormone', 'master', 'card', 'visa', 'masturbation', 'money', 'virus', 'nigeria', 'bank',
        'phone', 'cell', 'spam', 'psxtreme', 'mortage', 'crescentarian', 'party', 'mortgage', 'broke',
        'money', 'diploma', '6te', 'erospace', 'university'
    );

    foreach ($keywords as $keyword) {
        // eregi() was removed in PHP 7; preg_match() with the i flag does the
        // same case-insensitive pattern match.
        if (preg_match('~' . $keyword . '~i', $referer)) {
            header('HTTP/1.0 404 Not Found');
            exit('Bad Nasty Nasty Referrer found.');
        }
    }

  2. Here’s a quick tip. You can deal with referral spam by having all logging turned off, and instead use a stat tracker that is JavaScript based, as referrers won’t be logged unless it’s a real client browser. I use Mint.

  3. @Eric:
    The problem with that sort of method is that you’re most likely going to be catching a lot of false positives. For instance, clicking on your link from your comment, I get that “Bad Nasty Nasty Referrer found.” message. I do realize that I can simply refresh the page and it will come up as it should, but I have a feeling that many people out there on the interweb don’t realize that, and will just go somewhere else instead of spending time at your site.

    I suppose that really the best method would be taking Nathan’s suggestion of using a JS-based tracker, like Mint or Weed.

    Or of course, you could just not care about stats. But who out there just doesn’t care? I think everyone takes note of their stats, at least a little bit.
