One of the big features of Tiger Server's mail server was the addition of spam filtering. It was a bit bumpy in the beginning, but once you fix the Bayesian filtering, things work quite nicely.
A question that always gets asked is, "How much spam am I blocking?" and there are several ways to tell. You can use a number of OSS monitoring solutions, comb the logs, or forward spam to a spam account. Reader filipp has submitted a nice solution to the problem that quickly creates an HTML chart of the spam totals. This way you can just glance at a web page and see how much you are blocking.
One way to train the spam filter that comes with OS X Server (10.4) is to set up two accounts, "junkmail" and "notjunkmail", and redirect all spam and false positives to them respectively. This is all documented on page 52 of the Mail Service manual. Since users' Mail clients are usually quite well trained, I also instruct them to create a rule that does just that for all the email their client considers spam but that hasn't been tagged as such by the server.
The manual also mentions that the redirected emails are analysed every night at 1 AM after which they should be discarded. To automate that, all we have to do is add the correct ipurge command to the crontab (I use /etc/crontab here but normally you would just edit cyrusimap's crontab).
MAILTO="[email protected]"
PATH=$PATH:/usr/bin/cyrus/bin
# min hour mday month wday who command
30 01 * * * cyrusimap ipurge -f -d 1 user/junkmail user/notjunkmail
I think these simple steps can go a long way in battling spam in a small business environment. One thing that's missing, though, is any kind of overview of how much junk mail we're actually processing, preferably with some sort of graphical representation. The MAILTO variable means that all the output of the ipurge command will be sent to the given address, usually the "postmaster" alias. This means we have all the necessary data and can generate the statistics on a remote machine.
I've chosen (what I think is) the most straightforward approach: using AWK to generate a (partial) HTML file that displays the date of processing, the number of messages both numerically and graphically, and finally the total number of messages. Although crude, this technique is very easy to use and doesn't depend on any extra software except Mail.app, which is assumed to be the mail client.
To run the script, I have to provide it with a name for the generated HTML file (through the "of" variable) and the email files to process:
awk -f spamchart.awk of=test.html ~/Library/Mail/Mailboxes/Cron\ Jobs/mac.ee.mbox/Messages/*.emlx
The script itself is very simple, with most of the typing spent on CSS for the "bars". Please note that the total message count (per day) is assumed to be on line 32 of the email. This should be fine for default setups, but must be changed accordingly if your server adds additional headers (or doesn't add the spam headers, etc.).
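For readers who want a starting point, here is a minimal sketch of what such a script could look like. It is not the original spamchart.awk. The file name spamchart_sketch.awk, the 300-pixel bar scale and the CSS are hypothetical, and the sketch assumes that each input file is one cron email, that the first "Date:" header looks like "Date: Sun, 12 Nov 2006 01:30:01 +0200", and that line 32 of the file ends with the message count.

```shell
# Write a hypothetical AWK sketch to disk (not the original spamchart.awk).
cat > spamchart_sketch.awk <<'EOF'
# Assumptions: one cron email per input file; the first "Date:" header
# carries the processing date; line 32 ends with the message count.

FNR == 1 { date = "" }                 # new email: forget the previous date

/^Date:/ && date == "" {               # e.g. "Date: Sun, 12 Nov 2006 ..."
    date = $3 " " $4 " " $5            # keep "12 Nov 2006"
}

FNR == 32 {                            # the count line (adjust if needed)
    n++
    dates[n]  = date
    counts[n] = $NF + 0                # last field, forced to a number
    total    += counts[n]
    if (counts[n] > max) max = counts[n]
}

END {
    if (of == "") of = "/dev/stdout"   # "of" comes from the command line
    print "<style>.bar{background:#c33;height:1em}</style>" > of
    print "<table>" > of
    for (i = 1; i <= n; i++) {
        # Scale each bar so the busiest day is 300 pixels wide.
        width = (max > 0) ? int(counts[i] * 300 / max) : 0
        printf("<tr><td>%s</td><td>%d</td><td><div class=\"bar\" style=\"width:%dpx\"></div></td></tr>\n", dates[i], counts[i], width) > of
    }
    printf("<tr><td>Total</td><td>%d</td><td></td></tr>\n", total) > of
    print "</table>" > of
}
EOF
```

It is invoked the same way as shown above, with spamchart_sketch.awk in place of spamchart.awk. Note that the rows appear in the order the shell expands the file names, which is not necessarily chronological.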
Here's a sample of the output. Having a graphical view of our spam, I can immediately see that the numbers have been climbing steadily since August of this year. I guess I'd better get back to work then…