Optimize Spamassassin Detection Rate

Jan 1, 2017 · 664 words · 4 minute read

Email is a terrible way to communicate and should be avoided where possible. Unfortunately it is also the lowest common denominator on the web and will continue to be for the near future.

In the early days of the internet it was easy to run your own mailserver. Due to the absurd quantity of spam, this task became increasingly hard, and many tech-savvy people gave up and switched to Gmail or other services. This is a pity because a decentralized email infrastructure is harder to surveil, subpoena or shut down. I encourage everyone to run their own mail service if possible.

In this guide I will summarize the steps needed to get an effective SpamAssassin (SA) setup.

Find missing Perl modules

SA skips any functionality whose dependencies are not installed. Run spamassassin -D --lint 2>&1 | grep -i failed and install whatever is reported as missing.

Install a local Nameserver

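Many blacklists work via DNS lookups and operate on a “some for free” model, giving smaller servers a certain number of free requests. These quotas are typically counted per querying resolver, so if you use a shared nameserver (your ISP's or a large public one), your lookups are lumped together with everyone else's and you will quickly run out of free requests and get blocked. Running a local caching resolver on the mail server avoids this.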
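To see why every check costs a DNS request, it helps to look at how a blacklist query is formed: the client reverses the octets of the sending IP, appends the blacklist zone, and resolves the resulting hostname. The following is a minimal Python sketch of that naming scheme only (it performs no network lookup); the function name and the zone dnsbl.example.org are made up for illustration.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the hostname a DNSBL client resolves for an IPv4 address.

    Example scheme: 192.0.2.1 checked against zone "dnsbl.example.org"
    becomes "1.2.0.192.dnsbl.example.org".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.strip('.')}"


if __name__ == "__main__":
    # Hypothetical zone name, used only to show the resulting query name.
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

Each incoming mail triggers one such lookup per configured blacklist, which is why the choice of resolver decides whose quota those requests count against.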

Use the Checksum Blacklists

Partial checksums of emails seem to be a very effective way to detect spam. SA supports DCC, Razor and Pyzor as main systems. There is also one provided by the German IT magazine iX. You can compare the efficiency of each system here.

Pyzor has a package available for Debian, so it’s very easy to install.
DCC needs to be compiled, but the effort is worth it.
iX Hash is a custom Perl module. It just needs to be dropped into the main config folder, usually /etc/spamassassin.

Once installed, you should see either checksum-based rule hits on incoming mail or error messages in your logs telling you what is still misconfigured.
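The general idea behind checksum blacklists can be illustrated with a toy example: normalize away the parts of a message that spammers vary per recipient, then hash what is left, so that near-identical mass mailings share one checksum. The Python sketch below is only that, an illustration. It is not the algorithm used by DCC, Razor, Pyzor or iX Hash, and the normalization steps are my own assumptions.

```python
import hashlib
import re


def fuzzy_checksum(body: str) -> str:
    """Toy fuzzy checksum: hash the body after removing trivial variations."""
    text = body.lower()                       # ignore case differences
    text = re.sub(r"\d+", "0", text)          # collapse varying numbers (IDs, amounts)
    text = re.sub(r"\s+", " ", text).strip()  # ignore whitespace and line wrapping
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = "Dear customer 12345,\n  your invoice  is ready"
    second = "dear customer 987,\nyour invoice is ready"
    # Both variants reduce to the same normalized text, so the checksums agree.
    print(fuzzy_checksum(first) == fuzzy_checksum(second))
```

A real system then reports such checksums to a shared service and asks how many other servers have seen the same one.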

Use Bayesian Filter

This filter needs some time to train, but it is very effective on new spam that is not yet covered by blacklists. You can train it on existing, sorted email or let it learn over time.

For performance, use an SQL database as the backend.

# Enable the Bayes system
use_bayes                       1
use_bayes_rules                 1
bayes_auto_learn                1
bayes_auto_expire               1

bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:spama_bayes:localhost
bayes_sql_username              spamassassin
bayes_sql_password              XXXXXX
bayes_sql_override_username     mail

Use a moving average spam score

SA has a feature called “Auto Whitelist”. The name is a bit misleading. It tracks the average score of each sender/IP combination over time and pulls the score of every new message toward that average. So “safe” senders will be less likely to be flagged as spam, even if they send one abnormal email.

# Use moving average of scores (Auto Whitelist)
use_auto_whitelist                  1
auto_whitelist_distinguish_signed   1
auto_whitelist_factory              Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                        dbi:mysql:spama_awl:localhost
user_awl_sql_username               spamassassin
user_awl_sql_password               XXXXXX
user_awl_sql_table                  awl
user_awl_sql_override_username      vmail
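To make the “moving average” concrete: the SA documentation for auto_whitelist_factor describes the adjustment as finalscore = score + (mean - score) * factor, with a default factor of 0.5. Below is a minimal Python sketch of that formula. The class and function names are made up for illustration, and the bookkeeping (recording the raw score after computing the adjustment) reflects my understanding of the plugin rather than its actual code.

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    """Running totals for one sender/IP combination (illustrative only)."""
    total_score: float = 0.0
    message_count: int = 0

    @property
    def mean(self) -> float:
        return self.total_score / self.message_count


def awl_adjust(history: SenderHistory, score: float, factor: float = 0.5) -> float:
    """Pull `score` toward the sender's historical mean, then record it."""
    if history.message_count == 0:
        adjusted = score  # no history yet, nothing to average against
    else:
        adjusted = score + (history.mean - score) * factor
    # Assumption: the unadjusted score is what enters the history.
    history.total_score += score
    history.message_count += 1
    return adjusted


if __name__ == "__main__":
    sender = SenderHistory()
    for raw in (0.0, 0.0, 8.0):  # two clean mails, then one abnormal one
        print(raw, "->", awl_adjust(sender, raw))
```

With the default factor, a sender with a clean history who suddenly scores 8.0 ends up halfway between that score and their mean of 0.0, which is why one odd message from a known sender is less likely to cross the spam threshold.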

Use additional Blacklists

SA comes with a number of pre-configured blacklists, but adding a few more may help. Intra2Net publishes a nice comparison of blacklists by failure rate, along with SA snippets for quick installation. Just drop them into your local configuration files. For example:

# https://www.intra2net.com/en/support/antispam/blacklist.php_dnsbl=RCVD_IN_GBUDB.html
header      RCVD_IN_GBUDB   eval:check_rbl('gbudb', 'truncate.gbudb.net.', '127.0.0.2')
describe    RCVD_IN_GBUDB   Listed in truncate.gbudb.net
tflags      RCVD_IN_GBUDB   net
score       RCVD_IN_GBUDB   6

Use a Virus Scanner

Phishing, cryptolockers and other malware arriving by email are a big issue for users. Most will get detected by SA, but some attachments may slip through. You should use ClamAV as a bare minimum. Using an updated commercial scanner will improve your detection rate. Linux virus scanners are mostly bad and unstable, so choose the least-bad one depending on your budget. It should also run in daemonized mode to avoid loading all the definitions for every single scan.

Conclusion

Implementing these steps should help you detect the vast majority of spam emails. As a last step, if you notice a pattern of false negatives, you can fine-tune SA’s rule scores for your environment:

score URIBL_RED 4
score SPF_FAIL 3
score SPF_SOFTFAIL 3
score RAZOR2_CF_RANGE_E8_51_100 2.5
score RAZOR2_CF_RANGE_51_100 2.5
score DCC_CHECK 3
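To get a feel for what such score changes do, recall that SA adds up the scores of all rules that hit and flags the message once the total reaches required_score (5.0 by default). The Python sketch below replays that arithmetic with the scores listed above. The function name is made up, and treating unknown rules as 0.0 is a simplification for the example.

```python
REQUIRED_SCORE = 5.0  # SA's default required_score

# The tuned scores from the snippet above.
SCORES = {
    "URIBL_RED": 4.0,
    "SPF_FAIL": 3.0,
    "SPF_SOFTFAIL": 3.0,
    "RAZOR2_CF_RANGE_E8_51_100": 2.5,
    "RAZOR2_CF_RANGE_51_100": 2.5,
    "DCC_CHECK": 3.0,
}


def classify(hits, scores=SCORES, required=REQUIRED_SCORE):
    """Return (total score, is_spam) for a list of rule names that hit."""
    # Simplification: rules missing from the table contribute nothing.
    total = sum(scores.get(rule, 0.0) for rule in hits)
    return total, total >= required


if __name__ == "__main__":
    print(classify(["SPF_SOFTFAIL"]))               # a single hit stays below the threshold
    print(classify(["SPF_SOFTFAIL", "DCC_CHECK"]))  # two hits together cross it
```

The point of the tuning is visible here: no single one of these rules flags a message on its own, but any two of them together do.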

Or alternatively define your own checks for very specific spam:

# Custom rules
header FAKE_INVOICE_2014_SUBJECT    Subject =~ /\bRechnungOnline Monat\b/i
score FAKE_INVOICE_2014_SUBJECT 6
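Since this rule carries a score of 6, a single hit is enough to flag a message, so it is worth trying the pattern against a few subjects before deploying it. The sketch below does this in Python. Python's re module and Perl agree on this simple pattern, though that does not hold for every regex, and the sample subjects are invented for the example.

```python
import re

# Same pattern as the FAKE_INVOICE_2014_SUBJECT rule; Perl's /i becomes re.IGNORECASE.
PATTERN = re.compile(r"\bRechnungOnline Monat\b", re.IGNORECASE)


def subject_matches(subject: str) -> bool:
    """Return True if the rule's pattern would hit this Subject header."""
    return PATTERN.search(subject) is not None


if __name__ == "__main__":
    samples = [
        "Ihre RechnungOnline Monat Mai 2014",  # made-up spam subject
        "RechnungOnline Monatsabrechnung",     # \b after "Monat" prevents a match
        "Meeting notes",                       # unrelated mail
    ]
    for subject in samples:
        print(subject_matches(subject), subject)
```

For the final check, run a saved sample message through SA itself and confirm the rule name shows up in the report.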

I hope this guide is useful and makes running your own mailserver easier.