Enabling the Bayesian filter with amavisd-new + SpamAssassin

spamassassin amavis

11,647

I recently upgraded the mail server to a new CentOS 7 machine (from CentOS 6) that does nothing but email and DNS. The previous CentOS 6 server also ran several websites on Apache.

I'm not sure exactly what I'm doing differently that now makes Bayes show up in the mail headers as a test that was run, but Bayes appears to be fully configured. Here's my setup:

As before, I'm running Postfix with amavisd-new as the primary virus and spam scanner; amavisd-new then hands off to SpamAssassin.

Postfix is version 2.10.1 from the CentOS Plus repository, amavisd-new is version 2.10.1 from the EPEL repository, and SpamAssassin is version 3.4.0 from the base repository.

SpamAssassin's config file is as follows:

```
[root@mail ~]# cat /etc/mail/spamassassin/local.cf
required_hits       5
report_safe         0
rewrite_header Subject [SPAM]
razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
use_bayes       1
bayes_path /var/spamassassin/bayes
bayes_file_mode     077
auto_learn      0
use_razor2      1
```

And now, my mail headers do indicate that the Bayes test is being run:
```
X-Virus-Scanned: amavisd-new at developcents.com
X-Spam-Flag: NO
X-Spam-Score: 5.129
X-Spam-Level: *****
X-Spam-Status: No, score=5.129 tagged_above=-999 required=6.2
    tests=[BAYES_99=3.5, BAYES_999=0.2, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
    DKIM_VALID_AU=-0.1, HTML_FONT_LOW_CONTRAST=0.001, HTML_MESSAGE=0.001,
    MIME_HTML_ONLY=0.723, MIME_QP_LONG_LINE=0.001, RDNS_NONE=0.793,
    SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01, URIBL_BLOCKED=0.001]
    autolearn=no autolearn_force=no
```

Unfortunately, I'm still trying to get a handle on the spam, since most of it still slips in under the radar (with a score of 6.1 or lower), but I'm making a lot of progress.
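
The X-Spam-Status header above can also be checked with a script. Below is a minimal Python sketch (the function names are hypothetical, not part of amavisd-new or SpamAssassin) that parses the header value, lists any BAYES_* rules that fired, and compares the score with the reported threshold. It assumes the amavisd-new layout shown above (`score=`, `required=`, `tests=[...]`). Note that the header reports required=6.2 even though local.cf sets `required_hits 5`; as far as I know, under amavisd-new that threshold comes from amavisd.conf (`$sa_tag2_level_deflt`) rather than from SpamAssassin's own setting.

```python
from __future__ import annotations

import re

# X-Spam-Status value copied from the message header above.
HEADER = """No, score=5.129 tagged_above=-999 required=6.2
    tests=[BAYES_99=3.5, BAYES_999=0.2, DKIM_SIGNED=0.1, DKIM_VALID=-0.1,
    DKIM_VALID_AU=-0.1, HTML_FONT_LOW_CONTRAST=0.001, HTML_MESSAGE=0.001,
    MIME_HTML_ONLY=0.723, MIME_QP_LONG_LINE=0.001, RDNS_NONE=0.793,
    SPF_PASS=-0.001, T_REMOTE_IMAGE=0.01, URIBL_BLOCKED=0.001]
    autolearn=no autolearn_force=no"""


def parse_spam_status(value: str) -> dict:
    """Parse an amavisd-new style X-Spam-Status value into score, threshold and tests."""
    text = " ".join(value.split())  # unfold the wrapped header lines
    score = float(re.search(r"score=(-?[\d.]+)", text).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", text).group(1))
    tests = {}
    tests_match = re.search(r"tests=\[(.*?)\]", text)
    if tests_match:
        for item in tests_match.group(1).split(","):
            item = item.strip()
            if item:  # skip empty entries, e.g. from "tests=[]"
                name, _, points = item.partition("=")
                tests[name] = float(points)
    return {"score": score, "required": required, "tests": tests}


def bayes_rules(status: dict) -> list[str]:
    """Names of the BAYES_* rules that fired; empty if Bayes contributed nothing."""
    return sorted(name for name in status["tests"] if name.startswith("BAYES_"))


def is_flagged(status: dict) -> bool:
    """Spam if the score reaches the threshold reported in the header."""
    return status["score"] >= status["required"]


status = parse_spam_status(HEADER)
print(bayes_rules(status))                      # ['BAYES_99', 'BAYES_999']
print(round(sum(status["tests"].values()), 3))  # 5.129
print(is_flagged(status))                       # False
```

Summing the per-test scores reproduces the reported 5.129, which is below 6.2, so this message was not flagged even though BAYES_99 and BAYES_999 fired.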

This is slightly off topic, but for what it's worth, I recommend adding the following RBLs to the smtpd_recipient_restrictions definition in Postfix's main.cf (note that a couple of these lists require registration before you can use them):
```
reject_rbl_client zen.spamhaus.org,
reject_rbl_client bl.spamcop.net,
reject_rbl_client b.barracudacentral.org,
reject_rbl_client dnsbl.sorbs.net,
reject_rbl_client cbl.abuseat.org,
reject_rbl_client dnsbl-1.uceprotect.net,
reject_rbl_client dnsbl-3.uceprotect.net,
```
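
Each reject_rbl_client entry makes Postfix look up the connecting client's address in that DNS zone. The minimal Python sketch below illustrates the lookup scheme for IPv4 only: reverse the octets, append the zone, and treat an A record answer as "listed". The function names are hypothetical and this is not Postfix's own code; 192.0.2.25 is a documentation address used purely for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name queried for an IPv4 address: octets reversed, zone appended."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # ValueError if not IPv4
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the DNSBL returns an A record for the address.

    Any A record is treated as a listing here; real lists answer with
    specific 127.x codes (some of them signal errors), which production
    code should inspect before rejecting mail.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN or lookup failure: treat as not listed
    return True


print(dnsbl_query_name("192.0.2.25", "zen.spamhaus.org"))  # 25.2.0.192.zen.spamhaus.org
```

For a quick live check, `is_listed("127.0.0.2", "zen.spamhaus.org")` is a reasonable smoke test, since DNSBLs conventionally list 127.0.0.2 as a test entry (RFC 5782).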

Hope this helps someone.


David W

Updated on September 18, 2022

Comments

David W almost 2 years
I run a Postfix mail server on CentOS and am trying to enable SpamAssassin's Bayes filter, but I seem to be missing something.

We're running amavisd-new 2.9.1:
```
Name        : amavisd-new
Arch        : noarch
Version     : 2.9.1
Release     : 2.el6
Size        : 3.0 M
Repo        : installed
From repo   : epel
```
...with SpamAssassin 3.3.1:
```
Installed Packages
Name        : spamassassin
Arch        : x86_64
Version     : 3.3.1
Release     : 3.el6
Size        : 3.1 M
Repo        : installed
From repo   : updates
```
From what I can tell, my only SpamAssassin config files are in /etc/mail/spamassassin.

The local.cf file in this directory contains the following:
```
# These values can be overridden by editing ~/.spamassassin/user_prefs.cf
# (see spamassassin(1) for details)

# These should be safe assumptions and allow for simple visual sifting
# without risking lost emails.

required_hits 5
report_safe 0
rewrite_header Subject [SPAM]
use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 0
bayes_path /var/amavis/var/.spamassassin/
```
amavisd.conf is located in /etc/amavisd/, and I think I've included all the settings needed to turn SpamAssassin "on", but I'm not positive.
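
On the bayes_path line in the local.cf above: SpamAssassin treats bayes_path as a base directory plus file-name prefix rather than a plain directory, appending suffixes such as `_toks` and `_seen` to form the database file names (the default `~/.spamassassin/bayes` gives `bayes_toks` and `bayes_seen`). With a trailing slash, the files would therefore be named `_toks` and `_seen` inside `/var/amavis/var/.spamassassin/`. The short Python sketch below (hypothetical helper names; it assumes the default DBM-based Bayes store, and other backends may name files differently) lists the expected files and reports which are missing.

```python
import os

# Suffixes appended to bayes_path by SpamAssassin's default (DBM) Bayes store.
BAYES_SUFFIXES = ("_toks", "_seen")


def bayes_db_files(bayes_path: str) -> list:
    """File names implied by a bayes_path value (a prefix, not a directory)."""
    return [bayes_path + suffix for suffix in BAYES_SUFFIXES]


def missing_bayes_files(bayes_path: str) -> list:
    """Expected Bayes files that do not exist (e.g. nothing learned, or a wrong path)."""
    return [path for path in bayes_db_files(bayes_path) if not os.path.exists(path)]


print(bayes_db_files("/var/amavis/var/.spamassassin/"))
# ['/var/amavis/var/.spamassassin/_toks', '/var/amavis/var/.spamassassin/_seen']
print(bayes_db_files("/var/spamassassin/bayes"))
# ['/var/spamassassin/bayes_toks', '/var/spamassassin/bayes_seen']
```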

Some websites I've read say the Bayesian filter needs to be trained with sa-learn on 100 messages (each of spam and non-spam), but at least one site says 200. Either way, I can confirm I've trained the filter on at least 100 spam messages.
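
On the 100-versus-200 question: as far as I know, SpamAssassin only applies Bayes scores once the database has learned at least `bayes_min_spam_num` spam and `bayes_min_ham_num` ham messages, and both default to 200. Training on spam alone is not enough, because the ham count must reach its minimum too; until then no BAYES_* rule appears in the header. The learned counts can be read from `sa-learn --dump magic` (ideally run as the user amavisd-new runs under, so it reads the same database). The Python sketch below parses the nspam/nham lines of that output; the column layout is assumed from typical output, the helper names are hypothetical, and the sample numbers are invented for illustration.

```python
import re

# SpamAssassin's defaults for bayes_min_spam_num / bayes_min_ham_num;
# change these if local.cf overrides either setting.
MIN_SPAM = 200
MIN_HAM = 200

# Assumed layout of the count lines: four whitespace-separated columns
# (the third holds the number), then "non-token data: nspam" or "... nham".
_COUNT_RE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (nspam|nham)\s*$")

# Illustrative output only; the numbers are invented, not from a real server.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0      41000          0  non-token data: ntokens
"""


def corpus_counts(dump_magic: str) -> dict:
    """Return the learned spam/ham message counts found in the dump."""
    counts = {"nspam": 0, "nham": 0}
    for line in dump_magic.splitlines():
        match = _COUNT_RE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_ready(counts: dict, min_spam: int = MIN_SPAM, min_ham: int = MIN_HAM) -> bool:
    """Bayes scores are only applied once BOTH corpora reach their minimums."""
    return counts["nspam"] >= min_spam and counts["nham"] >= min_ham


counts = corpus_counts(SAMPLE)
print(counts)               # {'nspam': 150, 'nham': 0}
print(bayes_ready(counts))  # False: no ham learned yet
```

If the check comes back False, learning a few hundred known-good messages with `sa-learn --ham` (alongside `sa-learn --spam` for spam) should bring the ham count up.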

Even after training the filter on these 100 spam messages, I still see no indication in the mail headers of incoming messages that the Bayesian filter is being used:
```
X-Virus-Scanned: amavisd-new at developcents.com
X-Spam-Flag: NO
X-Spam-Score: -0.525
X-Spam-Level:
X-Spam-Status: No, score=-0.525 tagged_above=-999 required=4
    tests=[HK_RANDOM_FROM=1, HTML_MESSAGE=0.001, RP_MATCHES_RCVD=-2.499,
    SPF_SOFTFAIL=0.972, URIBL_BLOCKED=0.001] autolearn=unavailable
```
Even if Bayes isn't fully trained and ready to be "used" yet, shouldn't I see a tag in the X-Spam-Status section indicating whether or not the Bayes filter is being used?

(For what it's worth, the email whose partial header I posted above was spam, and obviously didn't get marked as such.)

Is there something I'm missing?