Postfix service enabled but won't start on reboot

14,193

Solution 1

nss-lookup.target is one of the optional systemd special targets. It is essentially acting as a virtual milestone in the boot process. Specifically, any system services that affect DNS lookup (like lwresd probably does) should specify in their .service file that they need to start before nss-lookup.target:

[Unit]
Wants=nss-lookup.target
Before=nss-lookup.target

...and any services that need DNS resolution fully working before they can start can then specify that simply as:

[Unit]
After=nss-lookup.target

It might be that either the lwresd package did not quite follow this scheme, or that your custom distribution has an older version of systemd where there is some reason to use Requires= instead of Wants= which is the current recommended way.

When systemd detects an ordering conflict (an "ordering cycle") in its configuration, it resolves it in a very mechanical way, which is not always optimal in the grand scheme of things. In this case, the problem was in ordering between lwresd and nss-lookup.target, but Postfix got skipped in the process as collateral damage.

Since nss-lookup.target is optional, setting it as a Requires= dependency for Postfix is too strong and might cause problems... exactly like the one you had.

Solution 2

First of all, adding LogLevel=debug on /etc/systemd/system.conf provides useful log to understand what really happens on services at startup runtime. As mentioned in my updated question, there's was an order conflit with nss-lookup.target. By removing nss-lookup.target directly in /etc/systemd/system/postfix.service in both variables[Requires] and [After] the system was able to start postfix on boot process.

Share:
14,193

Related videos on Youtube

SAN ALexis
Author by

SAN ALexis

Updated on September 18, 2022

Comments

  • SAN ALexis
    SAN ALexis over 1 year

    The postfix service won't start after rebooting the system. We are using our own custom Linux OS based on OpenSuse 12.1 and recently verification departement has found that the /var repository is getting bigger and bigger due to unsend mail in the maildrop queue of postfix.

    Who send those mails ? We have some applications that share some logs between systems.

    I was looking over different kind of forum for my answer without any success.

    After rebbot :

    $systemctl status postfix.service
    postfix.service - Postfix Mail Transport Agent
              Loaded: loaded (/etc/systemd/system/postfix.service; enabled)
              Active: inactive (dead)
              CGroup: name=systemd:/system/postfix.service
    

    I found here Postfix doesn't start on reboot that the problem can be a possible conflict with sendmail. To make sure that sendmail program is well link with postfix :

    $ ldd /usr/sbin/sendmail
            linux-vdso.so.1 =>  (0x00007fffa25ff000)
            libpostfix-global.so.1 => /usr/lib64/libpostfix-global.so.1 (0x00007fdb174d2000)
            libpostfix-util.so.1 => /usr/lib64/libpostfix-util.so.1 (0x00007fdb1729a000)
            libc.so.6 => /lib64/libc.so.6 (0x00007fdb16f0a000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007fdb16d06000)
            libdb-4.8.so => /usr/lib64/libdb-4.8.so (0x00007fdb1698a000)
            libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fdb16772000)
            /lib64/ld-linux-x86-64.so.2 (0x00007fdb1770b000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdb16555000)
    

    I also check in /var/log/messages and there's not any information about sendmail or even postfix failure.

    Let's start the service :

    $ systemctl start postfix.service
    $ systemctl status postfix.service
    postfix.service - Postfix Mail Transport Agent
              Loaded: loaded (/etc/systemd/system/postfix.service; enabled)
              Active: active (running) since Sat, 28 Jan 2017 21:49:37 +0300; 41s ago
             Process: 3450 ExecStartPost=/etc/postfix/system/cond_slp register (code=exited, status=0/SUCCESS)
             Process: 3444 ExecStartPost=/etc/postfix/system/wait_qmgr 60 (code=exited, status=0/SUCCESS)
             Process: 3368 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
             Process: 3366 ExecStartPre=/etc/postfix/system/update_chroot (code=exited, status=0/SUCCESS)
             Process: 3363 ExecStartPre=/bin/echo Starting mail service (Postfix) (code=exited, status=0/SUCCESS)
            Main PID: 3443 (master)
              CGroup: name=systemd:/system/postfix.service
                      ├ 3443 /usr/lib/postfix/master
                      ├ 3445 pickup -l -t fifo -u
                      ├ 3446 qmgr -l -t fifo -u
                      ├ 3447 cleanup -z -t unix -u
                      ├ 3466 trivial-rewrite -n rewrite -t unix -u
                      ├ 3467 local -t unix
                      ├ 3468 local -t unix
                      └ 3469 local -t unix
    

    Good. Reboot the system and the service turn to inactive(dead). Returning to /var/log I found this file mail.warn

    $ cat mail.info
    Jan 28 03:13:55 msx postfix/postfix-script[2527]: warning: not owned by group maildrop: /usr/sbin/postqueue
    Jan 28 03:13:55 msx postfix/postfix-script[2528]: warning: not owned by group maildrop: /usr/sbin/postdrop
    Jan 28 03:13:55 msx postfix/postfix-script[2530]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postqueue
    Jan 28 03:13:55 msx postfix/postfix-script[2531]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postdrop
    Jan 28 21:49:37 msx postfix/postfix-script[3430]: warning: not owned by group maildrop: /usr/sbin/postqueue
    Jan 28 21:49:37 msx postfix/postfix-script[3431]: warning: not owned by group maildrop: /usr/sbin/postdrop
    Jan 28 21:49:37 msx postfix/postfix-script[3434]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postqueue
    Jan 28 21:49:37 msx postfix/postfix-script[3435]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postdrop
    

    I don't know if this can't help to resolve my problem.

    Extra informations

    Inside the postfix.service file :

    $ cat /etc/systemd/system/postfix.service
    [Unit]
    Description=Postfix Mail Transport Agent
    Requires=var-run.mount nss-lookup.target network.target remote-fs.target syslog.target time-sync.target
    After=var-run.mount nss-lookup.target network.target remote-fs.target syslog.target time-sync.target
    After=amavis.service mysql.service cyrus.service ldap.service openslp.service ypbind.service
    Before=mail-transfer-agent.target
    Conflicts=sendmail.service exim.service
    
    [Service]
    Type=forking
    PIDFile=/var/spool/postfix/pid/master.pid
    ExecStartPre=-/bin/echo 'Starting mail service (Postfix)'
    EnvironmentFile=-/etc/sysconfig/postfix
    ExecStartPre=/etc/postfix/system/update_chroot
    ExecStart=/usr/sbin/postfix start
    ExecStartPost=/etc/postfix/system/wait_qmgr 60
    ExecStartPost=/etc/postfix/system/cond_slp register
    ExecReload=/usr/sbin/postfix reload
    ExecReload=/usr/sbin/postfix flush
    ExecStop=/usr/sbin/postfix stop
    ExecStopPost=/etc/postfix/system/cond_slp deregister
    
    [Install]
    WantedBy=multi-user.target
    

    List all services after rebooting without starting the postfix service.

    $systemctl list-unit-files --type=service
    ...
    klog.service              disabled
    klogd.service             masked
    ldconfig.service          masked
    loadmodules.service       masked
    local.service             static
    localfs.service           static
    openhpid.service          enabled
    postfix.service           enabled
    postgresql.service        static
    poweroff.service          static
    proc.service              masked
    ...
    

    Since we are running on OpenSuse 12.1 based OS, journalctl was not introduced yet into systemd

    UPDATE

    After set LogLevel=debug in /etc/systemd/system.conf, I was able to have more data related to this issue. In /var/log/messages/ I found this :

    Jan 31 19:17:00 msx kernel:   10.111126] systemd[1]: -.mount changed dead -> mounted
    Jan 31 19:17:00 msx kernel:   10.111147] systemd[1]: Activating default unit: default.target
    Jan 31 19:17:00 msx kernel:   10.111153] systemd[1]: Trying to enqueue job multi-user.target/start/replace
    Jan 31 19:17:00 msx kernel:   10.111204] systemd[1]: Cannot add dependency job for unit hpiwdt.service, ignoring: Unit hpiwdt.service failed to load: No such file or directory. See system logs and 'systemctl status hpiwdt.service' for details.
    Jan 31 19:17:00 msx kernel:   10.111276] systemd[1]: Found ordering cycle on lwresd.service/start
    Jan 31 19:17:00 msx kernel:   10.111279] systemd[1]: Walked on cycle path to nss-lookup.target/start
    Jan 31 19:17:00 msx kernel:   10.111281] systemd[1]: Walked on cycle path to lwresd.service/start
    Jan 31 19:17:00 msx kernel:   10.111284] systemd[1]: Breaking ordering cycle by deleting job nss-lookup.target/start
    Jan 31 19:17:00 msx kernel:   10.111286] systemd[1]: Deleting job postfix.service/start as dependency of job nss-lookup.target/start
    

    I don't even know what is or does nss-lookup.

    • shcherbak
      shcherbak about 7 years
      Is AppArmor disabled?
    • sebix
      sebix about 7 years
    • SAN ALexis
      SAN ALexis about 7 years
      Hi @shcherbak AppArmor is not running. I will edit my post. I have more data related to this issue when a set LogLevel from systemd with debug flag.