Ken's Musings about Fighting Spam

Last updated: Monday, 28 March 2005 14:08 -0500

I receive more than one hundred (100) unsolicited commercial email messages per day. That can be a real time sink, so getting the computers to help me out is something I fervently want.

Update (2002-07-09): Running SpamAssassin for each and every incoming mail message ended up putting an immense load on my mail server. So I switched to using it in spamd mode with a line in my global procmailrc file, and spam identification continues just as effectively but at a much lower cost in resources. This also allowed me to enable it for the other people who use my mail server.
Update (2002-05-19): Since I started having SpamAssassin run interference on my mail on 2002-03-11, it has intercepted about 125MiB of bona fide spam dreckage. That's roughly sixty megabytes a month. Needless to say, this has saved me a lot of time!

Why do I get so much spam?

A hundred messages a day seems like a lot of spam, doesn't it? (It does to me, too!) What could cause me to get so much?

Well, for one thing, my email address appears on the Internet and the Web. Rather a lot, in fact; if you check with Google, you'll find around ten thousand (10,000) references to my various email addresses. Most of those are from mailing list archives (I participate in a lot of mailing lists). In addition, I periodically send announcement messages to lists with large distributions (in the tens of thousands); some of the mailboxes on those lists doubtless belong to spammers.

The end result is that my email addresses get harvested from numerous places -- mostly hundreds of Web pages and thousands of mail messages, all accessible and archived online. If you would avoid my fate, conceal your email address! :-)

Delete, Block, or Fight

Until March of 2002, I just waded through each day's mail and deleted the messages that were obviously spam. (Since I use a graphical mail reader [not Outlook!], just the act of finding out they were spam doubtless activated a bunch of Web bugs and got my address on still more spam lists.) In essence, I just hadn't made the time to automate the process -- at all, much less to the maximum degree possible.

When it comes to spam, there are at least three ways of dealing with it: delete it, block it, or fight it.

These reactions can obviously be mixed and matched as the situation warrants. For the most part I use the first two.

Blocking at the Network Level

The MTA (Mail Transfer Agent, sendmail and its ilk) typically runs as a daemon, and may or may not have support for the tcp_wrappers interface. (sendmail does, actually, as of version 8.8!) If it does, though, you can use the same mechanism to block access to your mail server as is used by numerous other utilities -- define offending origins in the /etc/hosts.deny file. (See the documentation for the tcp_wrappers package for details.) This can be a goodness, because it gives you a single list of undesirable elements on the network.

Arguably this is really 'restricting at the MTA', described next, but I put it here because the mechanism has a broader scope.

Restricting at the MTA

Using the check_rcpt and other rules inside the sendmail.cf file, I maintain a database of mail origins from which my mail server refuses to accept mail. I've divided it into three groups: users, domains, and networks.

At the time of this writing, I'm blocking 17 users, 29 domains, and 37 networks. Some of these exclusion lists I've developed from my own experience, and some I've gotten from friends. In any event, there are 83 sources from which my mailbox will never get cluttered, and the number keeps growing.
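
As a rough illustration of the three-group idea (this is not the sendmail configuration itself, just a sketch of the logic), the check might look something like the following. The list entries, the is_blocked name, and the choice to treat subdomains of a blocked domain as blocked are all assumptions made for the example.

```python
import ipaddress

# Hypothetical sample entries; the real lists live in the sendmail database.
BLOCKED_USERS = {"spammer@example.com"}
BLOCKED_DOMAINS = {"bulkmail.example"}
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_blocked(sender, client_ip):
    """Return the name of the group that matched, or None if nothing did."""
    sender = sender.strip().lower()
    if sender in BLOCKED_USERS:
        return "user"
    domain = sender.rpartition("@")[2]
    # Assumption: a blocked domain also covers its subdomains.
    if any(domain == d or domain.endswith("." + d) for d in BLOCKED_DOMAINS):
        return "domain"
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return "network"
    return None
```

Checking the narrowest group first (a single user) and the broadest last (a whole network) means the returned label names the most specific reason a message was refused.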

Handling what gets through

Assuming that spam actually manages to get past the front line of defence and is accepted by my server, I use a quite cool tool to check each message for spam-ness: SpamAssassin. It took me a few hours to get it configured properly, but the end result is this:

  1. Each message incoming to my mailboxes is checked by SpamAssassin, which modifies the header with its assessment of spam-ness.
  2. If SpamAssassin doesn't regard it as spam, it gets delivered normally. If SpamAssassin does think it's spam, a detailed report of why is added to the message's header, and then the message is delivered normally. In addition, the spam is appended to an mbox file so I can see how much I get.
  3. My main MUA has a filter which checks for the SpamAssassin information in the message header. If the message is marked as spam, it gets filed in a special folder; otherwise, it gets processed as usual.
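
The filter in step 3 lives in my mail reader, but its logic is simple enough to sketch. The following is only an approximation, assuming the X-Spam-Flag header shown in the example report further down; the choose_folder function and the folder names are made up for illustration.

```python
from email import message_from_string


def choose_folder(raw_message, spam_folder="Spam", default_folder="Inbox"):
    """Pick a folder name based on the X-Spam-Flag header, if present."""
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive; a missing header counts as not spam.
    flag = msg.get("X-Spam-Flag", "")
    return spam_folder if flag.strip().upper() == "YES" else default_folder


tagged = "Subject: Cash in!\nX-Spam-Flag: YES\n\nbody text\n"
clean = "Subject: Lunch?\n\nbody text\n"
print(choose_folder(tagged))  # Spam
print(choose_folder(clean))   # Inbox
```

Because the decision rests only on a header that SpamAssassin has already written, the mail reader never has to re-examine the message body.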

The end result of this is that obvious spam is filed separately, so I can check it for false positives and adjust my rules accordingly -- or just delete all of it in a single operation. Less obvious spam may get through to my inbox, but since I started using this arrangement that number is much reduced, and the workload is acceptable.

Here's an example of the report that SpamAssassin adds to the message header (edited slightly for readability and to conceal/obscure some details you don't need to know):

From root@localhost  Mon Mar 11 09:03:49 2002
Received: from MERIDIAN.meridianalliance.com
	(www.meridianalliance.com [216.201.149.133] (may be forged))
	by Mail.MeepZor.Com with ESMTP id JAA22570
	for <ken.coar@meepzor.com>; Mon, 11 Mar 2002 09:03:45 -0500
Received: from smtp-gw-4.msn.com ([198.142.81.246])
	by MERIDIAN.meridianalliance.com with Microsoft SMTPSVC(5.0.2195.4453);
	 Sun, 10 Mar 2002 07:50:26 -0600
Message-ID: <00004fbe0d6b$000005c1$00000276@smtp-gw-4.msn.com>
To: <Important.Announcement@Mail.MeepZor.Com>
From: "Nicola Lockett" <5t4reg@msn.com>
Subject: Cash in on the Dropping Interest Rates!                 KHCUFP
Date: Sun, 10 Mar 2002 21:54:59 -0400
MIME-Version: 1.0
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-OriginalArrivalTime: 10 Mar 2002 13:50:27.0671 (UTC) FILETIME=[8916B670:01C1C83A]
X-Spam-Status: Yes, hits=16.7 required=5.0 tests=FROM_HAS_MIXED_NUMS,
	REPLY_TO_EMPTY,SUBJ_HAS_SPACES,PLING,MAY_BE_FORGED,CLICK_BELOW,
	EXCUSE_3,HTML_WITH_BGCOLOR,NORMAL_HTTP_TO_IP,A_HREF_TO_REMOVE,
	FREQ_SPAM_PHRASE,SPAM_PHRASES_020,CTYPE_JUST_HTML,SUBJ_HAS_UNIQ_ID
	version=2.01
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.01
	(devel $Id: SpamAssassin.pm,v 1.61 2002/01/25 04:41:02 jmason Exp $)
X-Spam-Report: Detailed Report
  SPAM: -------------------- Start SpamAssassin results ----------------------
  SPAM: This mail is probably spam.  The original message has been altered
  SPAM: so you can recognise or block similar unwanted mail in future.
  SPAM: See http://spamassassin.org/tag/ for more details.
  SPAM: 
  SPAM: Content analysis details:   (16.67 hits, 5 required)
  SPAM: Hit! (1 point)     From: contains numbers mixed in with letters
  SPAM: Hit! (1.27 points) Reply-To: is empty
  SPAM: Hit! (1 point)     Subject contains lots of white space
  SPAM: Hit! (0.5 points)  Subject has an exclamation mark
  SPAM: Hit! (0.5 points)  'Received:' has 'may be forged' warning
  SPAM: Hit! (0.01 points) BODY: Asks you to click below
  SPAM: Hit! (1 point)     BODY: Claims you can be removed from the list
  SPAM: Hit! (1.2 points)  BODY: HTML mail with non-white background
  SPAM: Hit! (1 point)     BODY: Uses a dotted-decimal IP address in URL
  SPAM: Hit! (1.82 points) BODY: Link to a URL containing "remove"
  SPAM: Hit! (1.56 points) Contains phrases frequently found in spam
  SPAM:                    [score:  36, hits: click here, fill out, from]
  SPAM:                    [our, future mailings, here removed, list please,]
  SPAM:                    [mailing list, our mailing, please click, removed]
  SPAM:                    [from, you like, you need, you receive]
  SPAM: Hit! (1 point)     spam-phrase score is over 20
  SPAM: Hit! (3.33 points) HTML-only mail, with no text version
  SPAM: Hit! (1.48 points) Subject contains a unique ID number
  SPAM: 
  SPAM: -------------------- End of SpamAssassin results ---------------------
  

SpamAssassin has a lot of knobs and dials to allow you to tweak its operation.
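
If you want to see which tests fire most often before touching any of those knobs, the X-Spam-Status line in the example above is easy to pick apart. This is a minimal sketch that assumes the header layout shown in that example (from version 2.01; other versions may differ), and parse_spam_status is a name I made up for it.

```python
import re


def parse_spam_status(value):
    """Split an X-Spam-Status value into verdict, scores, and test names."""
    flat = " ".join(value.split())  # unfold the continuation lines
    verdict, _, rest = flat.partition(",")
    # Each field looks like key=value; a value runs until the next " key=".
    fields = dict(re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", rest))
    tests = [t.strip() for t in fields.get("tests", "").split(",") if t.strip()]
    return {
        "is_spam": verdict.strip().lower() == "yes",
        "hits": float(fields["hits"]),
        "required": float(fields["required"]),
        "tests": tests,
    }


status = (
    "Yes, hits=16.7 required=5.0 tests=FROM_HAS_MIXED_NUMS,\n"
    "\tREPLY_TO_EMPTY,SUBJ_HAS_SPACES,PLING,MAY_BE_FORGED,CLICK_BELOW,\n"
    "\tEXCUSE_3,HTML_WITH_BGCOLOR,NORMAL_HTTP_TO_IP,A_HREF_TO_REMOVE,\n"
    "\tFREQ_SPAM_PHRASE,SPAM_PHRASES_020,CTYPE_JUST_HTML,SUBJ_HAS_UNIQ_ID\n"
    "\tversion=2.01"
)
result = parse_spam_status(status)
print(result["hits"], result["required"], len(result["tests"]))  # 16.7 5.0 14
```

Counting the test names across a folder of caught spam would show which rules do most of the work, and so which scores are worth adjusting.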

Conclusion: Does It Work?

Since setting up the process described above, I am spending about a tenth of the time handling spam that I did previously. There's still some manual stuff to do, dealing with the spam that escapes the filter and the non-spam that gets caught, but I feel marvelously free of my spam-chains now.


coar