Articles November 15, 2006 at 7:22 am

Charting Spam

One of the big features of Tiger Server's mail server was the addition of spam filtering. It was a bit bumpy in the beginning, but once you fix the Bayesian filtering, things work quite nicely.

A question that always gets asked is, "How much spam am I blocking?" There are several ways to tell: you can use one of a number of OSS monitoring solutions, comb the logs, or forward spam to a spam account. Reader filipp has submitted a nice solution that quickly creates an HTML chart of the spam totals, so you can just glance at a web page and see how much you are blocking.


One way to train the spam filter that comes with OS X Server (10.4) is to set up two accounts, "junkmail" and "notjunkmail", and redirect all spam and false positives to them respectively. This is all documented on page 52 of the Mail Service manual. Since users' mail clients are usually quite well trained, I also instruct them to create a rule that does the same for any email their client considers spam but that the server hasn't tagged as such.

The manual also mentions that the redirected emails are analysed every night at 1 AM, after which they should be discarded. To automate that, all we have to do is add the correct ipurge command to the crontab (I use /etc/crontab here, but normally you would just edit cyrusimap's crontab).


MAILTO="[email protected]"
PATH=$PATH:/usr/bin/cyrus/bin

# min hour mday month wday who command
30 01 * * * cyrusimap ipurge -f -d 1 user/junkmail user/notjunkmail

I think these simple steps can go a long way in battling spam in a small business environment. One thing that's missing, though, is any kind of overview of how much junk mail we're actually processing, preferably with some sort of graphical representation. The MAILTO variable means that all the output of the ipurge command is sent to the given address, usually the "postmaster" alias. This means we have all the necessary data and can generate the statistics on a remote machine.

I've chosen (what I think is) the most straightforward approach: using AWK to generate a (partial) HTML file that displays the date of processing, the number of messages both numerically and graphically, and finally the total number of messages. Although crude, this technique is very easy to use and doesn't depend on any extra software except Mail.app, which is assumed to be the mail client.

To run the script, I pass it a name for the generated HTML file and the email files to read (note the escaped space in the mailbox path):

awk -f spamchart.awk of=test.html ~/Library/Mail/Mailboxes/Cron\ Jobs/mac.ee.mbox/Messages/*.emlx

The script itself is very simple, with most of the typing spent on CSS for the "bars". Please note that the total message count (per day) is assumed to be on line 32 of the email. This should be fine for default setups, but must be changed if your server adds additional headers (or doesn't add the spam headers, etc.).
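The original spamchart.awk is not reproduced here, so what follows is only a minimal sketch of the approach described above, not filipp's script. It wraps a small awk program in a hypothetical shell function called spamchart. Several details are assumptions: the date comes from the first "Date:" header in each file, the count is the last field on line 32 (the countline variable), and each bar is simply one pixel wide per message. The of= variable is passed the same way as in the command shown earlier.

```shell
#!/bin/sh
# Hypothetical sketch only; this is NOT filipp's original spamchart.awk.
# Usage: spamchart OUTPUT.html MESSAGE_FILE...
spamchart() {
    out=$1
    shift
    # "of=..." is passed as an awk variable assignment, as in the article.
    # countline=32 mirrors the article's assumption about where the count sits.
    awk '
        # Remember the first Date: header seen in each message file.
        /^Date:/ && !(FILENAME in date) {
            d = $0
            sub(/^Date:[ \t]*/, "", d)
            date[FILENAME] = d
            order[++n] = FILENAME
        }
        # Assumption: the last field on this line is the message count.
        FNR == countline { count[FILENAME] = $NF + 0 }
        END {
            for (i = 1; i <= n; i++) {
                f = order[i]
                total += count[f]
                # Double quotes in the HTML are escaped as \" inside awk strings.
                printf("<div style=\"background: #c33; width: %dpx\">%s: %d</div>\n", count[f], date[f], count[f]) > of
            }
            printf("<p>Total: %d</p>\n", total) > of
        }
    ' of="$out" countline=32 "$@"
}
```

With the function loaded in a shell, it could be called as spamchart test.html ~/Library/Mail/Mailboxes/Cron\ Jobs.mbox/Messages/*.emlx. If your count sits on a different line, change countline. Forgetting to escape the inner double quotes in the printf format is an easy way to get an awk syntax error.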

Here's a sample of the output. With a graphical view of our spam, I can immediately see that the numbers have been climbing steadily since August of this year. I guess I'd better get back to work then…

 

Comments

  • The “fix the bayesian spam filter” URL in this article goes into the admin area where us mere mortals cannot tread.

    Also, if you’ve got to go digging around anyway, I might suggest just installing ASSP instead. Although I’ll admit I haven’t seen SpamAssassin in a long time, so it is probably much improved since my last usage…

    At any rate, ASSP is about as cutting edge and effective as you can get, imho.

  • Has anyone else gotten this to work? I keep getting the following error:

     awk -f spamchart.awk of=spamchart.html ~/Library/Mail/Mailboxes/"Cron Jobs.mbox"/Messages/*.emlx
    awk: syntax error at source line 11 source file spamchart.awk
     context is
                            printf ("<div >>>  style="background: <<< 
    awk: illegal statement at source line 11 source file spamchart.awk
    awk: illegal statement at source line 11 source file spamchart.awk 

    I think it’s probably just a misplaced double quote, but I’m also not at all familiar with awk.

    • Yes, it seems it’s a quoting problem, the correct command should be:

      awk -f spamchart.awk of=spamchart.html ~/Library/Mail/Mailboxes/Cron\ Jobs.mbox/Messages/*.emlx

      You can then quickly open the generated file by:
      open spamchart.html

      Thanks for the feedback! I’ll fix the hint to clarify the quoting.


      -filipp
