Optimize SpamAssassin Detection Rate

Email is a terrible way to communicate and should be avoided where possible. Unfortunately it is also the lowest common denominator on the web and will continue to be for the near future.

In the early days of the internet it was easy to run your own mailserver. The absurd quantity of spam has made this increasingly hard, and many tech-savvy people gave up and switched to Gmail or other services. This is a pity, because a decentralized email infrastructure is harder to surveil, subpoena or shut down. I encourage everyone to run their own mail service if possible.

In this guide I will summarize the steps needed to get an effective SpamAssassin (SA) setup.

Find missing perl modules

SA silently skips functionality whose dependencies are not installed. To find out what is missing, run

  spamassassin -D --lint 2>&1 | grep -i failed

and install whatever modules it reports.

Install a local Nameserver

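Many blacklists work via DNS lookups and operate on a “some for free” model, giving smaller servers a certain number of free requests. A shared nameserver, such as the one run by your hosting provider, pools the requests of all its users, so you will quickly run out of free requests and get blocked. A local caching nameserver sends the queries from your own IP address, so only your own traffic counts against the limit.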
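To see why the resolver matters, it helps to look at what a DNS blacklist query actually is: the client reverses the octets of the IP address, appends the blacklist's zone and asks for an A record. The Python sketch below shows the idea. The zone dnsbl.example.org is a placeholder, not a real list, and the function names are made up for this example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname a DNSBL client looks up for an IPv4 address."""
    octets = ipaddress.IPv4Address(ip).packed  # also validates the address
    reversed_ip = ".".join(str(octet) for octet in reversed(octets))
    return f"{reversed_ip}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers with an A record for the address.

    The lookup goes through whatever resolver the system is configured
    to use, which is why a local nameserver makes a difference.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: not listed, or the lookup itself failed
    return True


# "dnsbl.example.org" is a made-up zone used only for illustration.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

Every such lookup counts as one request from the resolver's IP address, so a busy shared resolver uses up the free quota for everyone behind it.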

Use the Checksum Blacklists

Partial checksums of emails seem to be a very effective way to detect spam. SA supports DCC, Razor and Pyzor as main systems. There is also one provided by the German IT magazine iX. You can compare the efficiency of each system here.

  • Pyzor has a package available for Debian, so it’s very easy to install.
  • DCC needs to be compiled, but the effort is worth it.
  • iX Hash is a custom Perl module. It just needs to be dropped into the main config folder, usually /etc/spamassassin.

Once they are installed, your logs should show either checksum-based hits or error messages that tell you what is still misconfigured.
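The principle behind these systems can be sketched in a few lines of Python. This is not the algorithm used by DCC, Razor, Pyzor or iX Hash, each of which has its own and far more robust method. It is only a toy illustration, with an invented function name, of how a checksum over a normalized part of the message lets two slightly different copies of the same spam produce the same value.

```python
import hashlib
import re


def fuzzy_digest(body: str) -> str:
    """Hash only the letters of a message, ignoring case, digits,
    whitespace and punctuation."""
    letters_only = re.sub(r"[^a-z]+", "", body.lower())
    return hashlib.sha1(letters_only.encode("ascii")).hexdigest()


# Invented sample messages: two variants of one spam run and one normal mail.
first = "Dear customer 4711, claim your PRIZE now!"
second = "dear customer 9923,\n  claim your prize NOW !!!"
other = "Minutes of Tuesday's meeting are attached."

print(fuzzy_digest(first) == fuzzy_digest(second))  # True
print(fuzzy_digest(first) == fuzzy_digest(other))   # False
```

A real service compares such checksums against a shared database, so a message that many other servers have already reported is recognized on arrival.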

Use the Bayesian Filter

This filter needs some time to train, but it is very effective against new spam that the blacklists do not cover yet. You can train it on existing email that has already been sorted into spam and non-spam, or let it learn over time.

For performance, use an SQL database as the backend.
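To illustrate what training means here, the following Python sketch implements a very small naive Bayes classifier. It is not SA's actual algorithm, which tokenizes and combines probabilities in a more elaborate way. The class name, the add-one smoothing and the sample messages are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy spam classifier: counts in how many spam and ham messages
    each token has been seen."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z']+", text.lower()))

    def learn(self, text, label):
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def spam_probability(self, text):
        # Start from the prior odds, then add the evidence of each token.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in self.tokenize(text):
            p_spam = (self.tokens["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.tokens["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


bayes = TinyBayes()
bayes.learn("cheap pills online", "spam")
bayes.learn("cheap watches online now", "spam")
bayes.learn("meeting moved to tomorrow", "ham")
bayes.learn("lunch tomorrow at noon", "ham")

print(round(bayes.spam_probability("cheap pills"), 3))       # 0.857
print(round(bayes.spam_probability("meeting tomorrow"), 3))  # 0.143
```

With only four training messages the numbers mean little, which is why the filter needs a decent amount of sorted mail before it becomes reliable.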

Use a moving average spam score

SA has a feature called “Auto Whitelist”. The name is a bit misleading: it does not whitelist anyone, but tracks the average score of each sender/IP combination and pulls the score of every new message toward that average. “Safe” senders are therefore less likely to be flagged as spam, even if they send one abnormal email.
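The adjustment itself is simple arithmetic. As far as I understand SA's documentation, the final score is the message's own score moved toward the sender's long-term mean by a configurable factor (auto_whitelist_factor, 0.5 by default). The Python sketch below follows that formula. The class and method names are invented, and both keying on the full IP address and storing the unadjusted score in the history are simplifying assumptions.

```python
from collections import defaultdict


class ScoreAverager:
    """Toy version of a per-sender moving average of spam scores."""

    def __init__(self, factor=0.5):
        self.factor = factor  # 0 = ignore history, 1 = use only the mean
        self.history = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]

    def adjust(self, sender, ip, score):
        entry = self.history[(sender, ip)]
        total, count = entry
        if count:
            mean = total / count
            final = score + (mean - score) * self.factor
        else:
            final = score  # first message: nothing to average against
        # Assumption: the history stores the unadjusted score.
        entry[0] += score
        entry[1] += 1
        return final


averager = ScoreAverager()
# A sender with two clean messages, then one that scores 8.0 on its own.
averager.adjust("alice@example.com", "192.0.2.1", 0.0)
averager.adjust("alice@example.com", "192.0.2.1", 0.0)
print(averager.adjust("alice@example.com", "192.0.2.1", 8.0))  # 4.0
```

The same mechanism works in the other direction: a sender with a history of high scores has an otherwise harmless message pushed up.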

Use additional Blacklists

SA comes with a number of pre-configured blacklists, but adding a few may help. Intra2Net has a nice comparison of blacklists by failure rate, along with SA snippets for quick installation. Just drop them into your local configuration files.

Use a Virus Scanner

Phishing, cryptolockers and other malware arriving by email are a big issue for users. Most of it will get detected by SA, but some attachments may slip through. You should use ClamAV as a bare minimum. Using an up-to-date commercial scanner will improve your detection rate. Linux virus scanners are mostly bad and unstable, so choose the least-bad one depending on your budget. It should also run in daemonized mode to avoid loading all the definitions for every single scan.
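With a daemonized scanner, the mail filter only hands the message to a process that already has the definitions in memory. For ClamAV that process is clamd, and the Python sketch below shows roughly how a client streams data to it with the INSTREAM command. The TCP port 3310 and the framing are taken from the clamd documentation as I remember it, so check them against your version. Many distributions enable only a local Unix socket by default, and the function names are invented.

```python
import socket
import struct


def instream_payload(data: bytes, chunk_size: int = 8192) -> bytes:
    """Frame data for clamd's INSTREAM command: each chunk is prefixed
    with its length as a 4-byte big-endian integer, and a zero-length
    chunk marks the end of the stream."""
    parts = [b"zINSTREAM\0"]
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        parts.append(struct.pack("!I", len(chunk)) + chunk)
    parts.append(struct.pack("!I", 0))
    return b"".join(parts)


def scan(data: bytes, host: str = "127.0.0.1", port: int = 3310) -> str:
    """Send data to a running clamd and return its one-line verdict.

    Assumes clamd listens on TCP at host:port, which has to be enabled
    in its configuration.
    """
    with socket.create_connection((host, port), timeout=30) as conn:
        conn.sendall(instream_payload(data))
        reply = b""
        while not reply.endswith(b"\0"):
            block = conn.recv(4096)
            if not block:
                break
            reply += block
    return reply.rstrip(b"\0").decode("utf-8", "replace")
```

In practice you would not write this yourself, since the usual SA and MTA integrations for ClamAV already speak this protocol. The point is only that each scan is a cheap socket conversation instead of a fresh scanner start.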

Conclusion

Implementing these steps should help you detect the vast majority of spam emails. As a last step you can fine-tune SA’s weights to your environment, if you notice a pattern of false negatives.

Alternatively, define your own checks for very specific spam.
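In SA itself such checks are written as rules in its configuration files, each with a pattern and a score. The Python sketch below only mirrors the logic, to show how weights and custom checks interact: every matching rule adds its score, and the message is flagged once the total reaches the threshold. The rule names, patterns and scores are invented, and the threshold of 5.0 is an assumption that corresponds to SA's required_score setting.

```python
import re

# Hypothetical local rules: name -> (pattern, score).
LOCAL_RULES = {
    "LOCAL_OVERDUE_INVOICE": (re.compile(r"overdue invoice", re.I), 2.5),
    "LOCAL_VERIFY_WALLET": (re.compile(r"verify your wallet", re.I), 3.0),
}

REQUIRED_SCORE = 5.0  # assumed threshold for flagging a message


def evaluate(body, rules=LOCAL_RULES, required=REQUIRED_SCORE):
    """Return (total score, names of matching rules, flagged as spam?)."""
    hits = [name for name, (pattern, _) in rules.items() if pattern.search(body)]
    total = sum(rules[name][1] for name in hits)
    return total, hits, total >= required


print(evaluate("Your OVERDUE invoice: please verify your wallet today"))
# (5.5, ['LOCAL_OVERDUE_INVOICE', 'LOCAL_VERIFY_WALLET'], True)
print(evaluate("Reminder: overdue invoice attached"))
# (2.5, ['LOCAL_OVERDUE_INVOICE'], False)
```

Raising a rule's score makes it more decisive on its own, while a low score lets it contribute only in combination with other hits, which is the safer choice for a pattern that might also appear in legitimate mail.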

I hope this guide is useful and makes running your own mailserver easier.
