AFP548 - Topic: Training SpamAssassin

This topic has 7 replies, 5 voices, and was last updated 19 years, 4 months ago by maccanada.

Viewing 6 posts - 1 through 6 (of 6 total)

Author

Posts
April 7, 2003 at 3:15 pm #355464

Bill Eccles
Participant

Gentleones,

Anyone have a clue how to train SpamAssassin for the following server configuration?

Here’s the deal: my primary domain server (in Indiana) receives mail for me and redirects it to my secondary server (in Connecticut) where it is Exim’d, SpamAssassined, Exim’d and redirected back via AMS to the primary server in Indiana. (Long story short: the primary server uses WebSTAR 5 and it doesn’t do jack except rule-based spam filtering, so it redirects all of the mail to my OSX Server server where the full weight and might of SA can stun and awe the spam.)

Eventually, I POP into the primary server and retrieve it all, spam included. (I’m not really ready to bounce anything–I’ve been getting spam with scores as low as 2.5 which is on par with stuff real people send me–hence the desire to train it.) Entourage takes stuff with “**** SPAM…” and files it away for me on my non-server Mac. Lately, I’ve been getting LOTS of spam that barely rates a score…. Sad, really.

So, that presents me with three challenges/questions:

1) How do I get the batch of E-mails, presuming I can get them out as a bunch of text files or something, to the right place on the OSXS server? And where do they go? Or maybe I can forward them to some useful account on the server? And how do I tell SA to learn what’s good and bad, misidentified, etc., if it’s a file or AMS E-mail? (As best I can tell, AMS is incompatible with the SA training because it keeps a closed mail database.)

2) Do I have to clean out all the SpamAssassin-inserted lines detailing the rules report before training, or is the SA training function smart enough to ignore that stuff?

3) Is Bayesian filtering on–and present, come to think of it–automatically in SA? (Just making sure I’m not barking up the wrong tree.)

Thanks, all,
Bill
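For anyone finding this thread later: assuming the messages can be exported from the mail client as individual text files, training usually comes down to pointing sa-learn at them on the server. Here is a minimal sketch, not a tested recipe for AMS. The two directory paths and the `run` helper are made up for illustration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: SPAM_DIR and HAM_DIR are hypothetical locations for messages
# that were exported one-per-file and sorted by hand.
SPAM_DIR="${SPAM_DIR:-$HOME/sa-train/spam}"
HAM_DIR="${HAM_DIR:-$HOME/sa-train/ham}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

train_bayes() {
    run sa-learn --spam "$SPAM_DIR"   # learn every message here as spam
    run sa-learn --ham "$HAM_DIR"     # learn every message here as non-spam
    run sa-learn --dump magic         # show the database counters afterwards
}

train_bayes
```

It probably needs to run as the same user spamd runs as, so that the training lands in the database the daemon actually reads.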

April 7, 2003 at 6:58 pm #355466

Anonymous
Participant

I’ll have to check on the version number, but have a look at <http://spamassassin.org/doc/Mail_SpamAssassin_Conf.html>–maybe I’m misreading it, but it looks like it’s learning by default. (A “naive-Bayesian-style” filter…?)

Thanks,
Bill

April 8, 2003 at 11:53 pm #355471

Bill Eccles
Participant

I just discovered (much to my dismay) that I ain’t been running Bayesian filtering at all. Whoops!

I discovered this while upgrading to the latest SA and read this in the INSTALL blurb:

[quote:e34c53af9c]
Optional Additional Modules
—————————

In addition, the following modules will be used for some checks, if
available. If they are not available, SpamAssassin will still work, just
not as effectively — some of the spam-detection tests will have to be
skipped.

– DB_File (from CPAN)

Used to store data on-disk, for the Bayes-style logic and
auto-whitelist. *Much* more efficient than the other standard Perl
database packages. Strongly recommended.

perl -MCPAN -e shell
o conf prerequisites_policy ask
install DB_File
quit
[/quote:e34c53af9c]
Sorta’ says it all, doesn’t it?

So I’ll “turn it on” and see what happens.

Bill
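A quick way to check whether the module from that INSTALL excerpt is present is to ask Perl to load it. This is only a sketch; `have_perl_module` is a helper name invented here, not part of SpamAssassin.

```shell
#!/bin/sh
# Succeeds (exit 0) if the named Perl module can be loaded, fails otherwise.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module DB_File; then
    echo "DB_File is installed"
else
    echo "DB_File is missing; install it from CPAN"
fi
```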

April 11, 2003 at 11:10 am #355479

Bill Eccles
Participant

Joel,

Made a few discoveries in the past few days, some of which might bear mentioning in the article series.

First, I discovered that the -D mode for spamd is invaluable! You might want to mention it at some point. Though I can’t figure out the best way to restart the spamd daemon, daemonize it, and ensure that it’s running in the right userid and group after I’m done -Debugging. I usually end up rebooting to make sure I’ve done everything right.
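One possible way to get spamd back into normal daemon mode after a debugging session, sketched under a few assumptions: the daemon should run as user exim (the user mentioned further down), the pid file path is a made-up example, and the previous spamd was started with the same pid file. The script only prints what it would do unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumptions: SPAMD_USER is the account spamd should drop to, and PIDFILE
# is a hypothetical location; adjust both for the actual server.
SPAMD_USER="${SPAMD_USER:-exim}"
PIDFILE="${PIDFILE:-/var/run/spamd.pid}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_spamd() {
    # Stop the old daemon if a pid file from an earlier start exists.
    if [ -f "$PIDFILE" ]; then
        run kill "$(cat "$PIDFILE")"
    fi
    # -d: detach as a daemon, -u: run as this user, -r: write a pid file.
    run spamd -d -u "$SPAMD_USER" -r "$PIDFILE"
}

restart_spamd
```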

Second, I discovered that in order to make the Bayesian filtering “turn on,” I had to install the DB module mentioned above. That works as advertised, though I’m fairly certain that it fails some tests or something heinous like that. But then you have to edit the SA local.cf to point to a directory where it can throw its database. I threw that into a subdirectory next to the local.cf file.

Thought I had it… but didn’t. Turns out I didn’t have group and owner set to exim:wheel (remember, that’s what SA changes its UID to) for this directory or, for that matter, the whitelist database directory. I think that if I were to do it again, I would put the Bayesian data in the same directory as the whitelist database. Now both autowhitelists and Bayesian filtering are working. (Well, it reports that it’s keeping tabs on the Bayesian data and increments the counters, but hasn’t started using it yet. That won’t take long with all the spam I get….)
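The local.cf and ownership steps described here might look roughly like the sketch below. DB_DIR is a hypothetical directory, exim:wheel is the owner and group from this post, and the script prints the mkdir/chown commands rather than running them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumptions: DB_DIR is a made-up location shared by the Bayes and
# auto-whitelist databases; OWNER is the user:group spamd runs as.
DB_DIR="${DB_DIR:-/etc/mail/spamassassin/db}"
OWNER="${OWNER:-exim:wheel}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Lines to add to local.cf. bayes_path is a base name: SpamAssassin appends
# its own suffixes to it when it creates the database files.
print_local_cf() {
    cat <<EOF
use_bayes 1
bayes_path $DB_DIR/bayes
auto_whitelist_path $DB_DIR/auto-whitelist
EOF
}

# Create the directory and hand it to the user spamd switches to.
prepare_db_dir() {
    run mkdir -p "$DB_DIR"
    run chown -R "$OWNER" "$DB_DIR"
}

print_local_cf
prepare_db_dir
```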

Third, I made Razor2 work! This is the most useful test I have in my arsenal now. However, it didn’t install as advertised, though it still works. If you follow the instructions in the Razor2 INSTALL file, you can only get it to set up the client and the links to the client. That’s enough to make SA happy, but won’t allow you to run the razor-admin app, which is required to submit your own stuff to the database.

(An aside: Of the 17 spams I received in the past hour or so, all were listed in Razor2. It’s not the only check that generated points, but since I am suspecting that none of my legit E-mails will ever be listed in Razor2, I am tempted to up its score from 2 to 5 nearly guaranteeing that it will be identified as spam.)
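Raising a rule's score is normally a one-line override in local.cf. The sketch below just prints that line. It assumes the Razor2 rule is named RAZOR2_CHECK, which should be confirmed against the test names in your own message headers, and `score_override` is a helper invented for this example.

```shell
#!/bin/sh
# Assumption: RULE is the test name that appears in the headers of
# Razor2-listed mail; NEW_SCORE is the value proposed in the post.
RULE="${RULE:-RAZOR2_CHECK}"
NEW_SCORE="${NEW_SCORE:-5.0}"

# Emit the override line meant for local.cf.
score_override() {
    printf 'score %s %s\n' "$RULE" "$NEW_SCORE"
}

score_override
```

After adding the line to local.cf, running `spamassassin --lint` should flag any syntax mistakes in the configuration.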

I couldn’t get the dbb to work at all–bad news all the way ’round as it wouldn’t install. And I didn’t try Pyzor after discovering that Razor2 now uses fuzzy identifier comparison.

What I do want to do now is set up an Exim router that detects the presence of the Razor2 test and bounces mail with a suitable “Sorry. Bad address”-type message, keeping a copy in a local spam file for Bayesian training. Don’t have a clue how to do that. Can you lend a hand? (I’m using AMS, so that file will have to be Exim-generated or I’ll have to forward it to a local user but somehow still need to get the spam out of the database and into a file to train using sa-learn.)

I do have a little cleaning up to do at this point. Razor2 still wants a local configuration file, though it seems to work well without.

Thanks for your help,
Bill

December 12, 2005 at 4:56 pm #364392

gw1500se
Participant

We’ve just upgraded to Tiger and I am SpamAssassin illiterate. Can someone point me to a cookbook for setting up SpamAssassin learning? We are running OD and users get their mail via IMAP or POP. We are using the Server Admin GUI to configure SpamAssassin now, but that apparently has nothing in it for learning. I cannot reduce the hits any further without filtering ham, but we are still getting too much spam slipping through. TIA.

December 12, 2005 at 7:58 pm #364394

maccanada
Participant

Take a look at my article for doing your own custom rules.
Also take a look at thinbits’ post in this thread for getting the right database to be updated when the learn_junk_mail script is run.

Finally you’ll need to make sure the /private/etc/mail/spamassassin/learn_junk_mail script is getting run on a daily basis – best off using cron for that right now.
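A daily cron entry for that script could look like the line printed by this sketch. The script path is the one given above; the 03:15 schedule is an arbitrary choice, and `cron_line` is a helper name made up for the example.

```shell
#!/bin/sh
# Path taken from the post; the schedule below is an arbitrary example.
LEARN_SCRIPT="/private/etc/mail/spamassassin/learn_junk_mail"

# Emit a crontab entry: minute hour day-of-month month day-of-week command.
cron_line() {
    printf '15 3 * * * %s\n' "$LEARN_SCRIPT"
}

cron_line
```

The printed line would go into the crontab (via `crontab -e`) of a user allowed to run the script, most likely root.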

You should also turn up the spam/anti-virus log level in ServerAdmin – it will show exactly what is going on with each message.
Looking at the headers in your mail client will also show which rules were matched – you should see BAYES_XX when the Bayesian filtering is working.
We’re seeing *maybe* 2 messages a week get through the filter (level set at 3).
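If you would rather check from the command line than in the mail client, something like the sketch below pulls the BAYES_ test names out of a message saved with full headers. The sample header is invented for the demonstration and `bayes_hits` is a hypothetical helper; real headers will differ between SpamAssassin versions.

```shell
#!/bin/sh
# Print the BAYES_NN rule names found in a saved message, one per line.
# Assumes the message was saved as plain text with its full headers.
bayes_hits() {
    grep -o 'BAYES_[0-9][0-9]*' "$1" | sort -u
}

# Demonstration with a made-up header written to a temporary file.
msg=$(mktemp)
printf 'X-Spam-Status: Yes, hits=7.2 required=3.0 tests=BAYES_99,RAZOR2_CHECK\n\nbody\n' > "$msg"
bayes_hits "$msg"
rm -f "$msg"
```

No output at all for a message suggests the Bayesian tests did not fire on it.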