cumulus / postfix in the right vrf
Cumulus Linux is a network operating system for switches. Because it is also a regular Linux OS, we can run our automation tools on the switch itself, and we use them to automate the configuration of our network. In that network we use VRFs (virtual routing and forwarding) to separate customer traffic. The presence of VRFs in the OS, however, means that we have to tell the daemons which VRF to run in. And sometimes that needs some tweaks, as in the case of postfix.
How to specify the VRF
You can ssh right into the Cumulus switch, and there you can use your regular tools, like ping and curl. But if you want to end up in the right network, you have to tell the tools which VRF to use.
These examples are from Cumulus Linux 3.7:
# net show vrf
VRF Table
---------------- -----
CUSTOMERX 1001
CUSTOMERY 1002
mgmt 1020
...
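In scripts it can help to check that a VRF is listed before using it. A minimal sketch, assuming the two-column output shown above; vrf_table is a hypothetical helper name, not something Cumulus ships:

```shell
# vrf_table NAME: read `net show vrf`-style output on stdin and print the
# routing table number listed for NAME. Exits non-zero if NAME is not listed.
# Assumes two whitespace-separated columns (VRF name, table number).
vrf_table() {
    awk -v name="$1" '
        $1 == name && $2 ~ /^[0-9]+$/ { print $2; found = 1 }
        END { exit !found }
    '
}
```

With the table above, net show vrf | vrf_table CUSTOMERX would print 1001.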
If you want to ping an IP in the CUSTOMERX network, you specify the VRF with -I:
# ping -I CUSTOMERX -c 1 -w 1 10.20.30.40
ping: Warning: source address might be selected on device other than mgmt.
PING 10.20.30.40 (10.20.30.40) from 10.5.83.22 CUSTOMERX: 56(84) bytes of data.
64 bytes from 10.20.30.40: icmp_seq=1 ttl=60 time=0.522 ms
If you specify no VRF or the wrong one, you’ll get no reply. If you want to run applications or services in the management VRF, you have to specify mgmt. This is likely the only place where you have direct access to the internet.
When you log in, your shell is attached to the VRF through which you connected:
# ip vrf identify $$
mgmt
This makes sense, as the ssh daemon you’re connected to is also in that VRF:
# ip vrf identify $(pidof sshd | tr ' ' '\n' | head -n1)
mgmt
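The same check works for any daemon. A small sketch, assuming pidof and ip vrf identify behave as in the commands above; vrf_of is a hypothetical helper name:

```shell
# vrf_of NAME: for every PID that pidof reports for NAME, print
# "PID VRF" on its own line. The VRF field is empty for a process
# that is not bound to any VRF.
vrf_of() {
    for vrf_of_pid in $(pidof "$1"); do
        printf '%s %s\n' "$vrf_of_pid" "$(ip vrf identify "$vrf_of_pid")"
    done
}
```

Running vrf_of sshd lists each sshd process next to the VRF it is bound to.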
Applications without native VRF support
You may have noticed that for ping you can specify a VRF using -I interface. Not all applications support that. For those applications, you can run the command prefixed by a call to ip vrf exec:
# nc 10.20.30.40 22 -v
10.20.30.40: inverse host lookup failed: Unknown host
^C
# ip vrf exec CUSTOMERX nc 10.20.30.40 22
SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
^C
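If you do this often, a tiny wrapper keeps the VRF name in one place. This is only a sketch: vrf_run and the VRF_DRY_RUN variable are hypothetical names introduced here, and the only real command involved is ip vrf exec as used above.

```shell
# vrf_run VRF CMD [ARGS...]: run CMD inside VRF through `ip vrf exec`.
# With VRF_DRY_RUN=1 (a hypothetical switch for this sketch) the command
# line is printed instead of executed, which is handy for checking scripts.
vrf_run() {
    vrf_run_name=$1
    shift
    if [ "${VRF_DRY_RUN:-0}" = 1 ]; then
        echo "ip vrf exec $vrf_run_name $*"
    else
        ip vrf exec "$vrf_run_name" "$@"
    fi
}
```

For example, vrf_run CUSTOMERX nc 10.20.30.40 22 is equivalent to the second nc call above.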
Starting daemons in the right VRF
Cumulus Linux 3, based on Debian Jessie, uses systemd as its init system. PID 1 will be spawning the daemons, and that means that they won’t start in the management VRF by default.
# ip vrf identify 1
(void)
Cumulus provides a nifty little systemd generator that fixes this, so you can run services in an appropriate VRF. For instance your ntp time daemon, which needs access to the internet:
cat /etc/vrf/systemd.conf
# Systemd-based services that are expected to be run in a VRF context.
#
# If changes are made to this file run systemctl daemon-reload
# to re-generate systemd files.
...
ntp
...
systemctl cat ntp@mgmt.service
# /etc/systemd/system/ntp@.service
# created by vrf generator
...
[Service]
Type=simple
ExecStart=/usr/sbin/ntpd -n -u ntp:ntp -g
Restart=on-failure
...
# /run/systemd/generator/ntp@.service.d/vrf.conf
# created by vrf generator
...
[Service]
ExecStart=
ExecStart=/bin/ip vrf exec %i /usr/sbin/ntpd -n -u ntp:ntp -g
As you can see, the ExecStart is prefixed with ip vrf exec %i, which becomes ip vrf exec mgmt for the mgmt instance. So, instead of starting/enabling ntp.service, you start/enable ntp@mgmt.service so that the time daemon runs in the expected VRF.
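Put together, moving a service into a VRF could look like the sketch below. The helper vrf_conf_add is a hypothetical name; it only makes sure a service is listed in the generator's config file. The systemctl steps are left commented out because they need root on the switch and change running services.

```shell
# vrf_conf_add FILE SERVICE: append SERVICE to FILE unless FILE already
# has a line that is exactly SERVICE. Creates FILE if it does not exist.
vrf_conf_add() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Intended use on the switch, as root (commented out in this sketch):
#   vrf_conf_add /etc/vrf/systemd.conf ntp
#   systemctl daemon-reload
#   systemctl stop ntp.service && systemctl disable ntp.service
#   systemctl enable ntp@mgmt.service && systemctl start ntp@mgmt.service
```

For ntp, which is already listed in the file shown above, the helper is a no-op and only the systemctl steps matter.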
Getting postfix in the right VRF
The postfix (mailer) init script in this particular distribution has an annoying quirk: it depends on itself.
cat /etc/init.d/postfix
#!/bin/sh -e
### BEGIN INIT INFO
# Provides: postfix mail-transport-agent
# Required-Start: $local_fs $remote_fs $syslog $named $network $time
# Required-Stop: $local_fs $remote_fs $syslog $named $network
# Should-Start: postgresql mysql clamav-daemon postgrey spamassassin saslauthd dovecot
# Should-Stop: postgresql mysql clamav-daemon postgrey spamassassin saslauthd dovecot
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Postfix Mail Transport Agent
# Description: postfix is a Mail Transport agent
### END INIT INFO
...
The systemd-sysv-generator parses this, and generates this:
systemctl cat postfix.service
# /run/systemd/generator.late/postfix.service
# Automatically generated by systemd-sysv-generator
[Unit]
...
Before=mail-transport-agent.target shutdown.target
After=local-fs.target remote-fs.target ...
Wants=mail-transport-agent.target network-online.target
...
[Service]
...
ExecStart=/etc/init.d/postfix start
ExecStop=/etc/init.d/postfix stop
ExecReload=/etc/init.d/postfix reload
# /run/systemd/generator/postfix.service.d/50-postfix-$mail-transport-agent.conf
# Automatically generated by systemd-insserv-generator
[Unit]
Wants=mail-transport-agent.target
Before=mail-transport-agent.target
systemctl cat mail-transport-agent.target
# /lib/systemd/system/mail-transport-agent.target
...
# /run/systemd/generator/mail-transport-agent.target.d/50-hard-dependency-postfix-$mail-transport-agent.conf
# Automatically generated by systemd-insserv-generator
[Unit]
SourcePath=/etc/insserv.conf.d/postfix
Requires=postfix.service
That is, postfix.service provides mail-transport-agent.target (the last snippet), but it also depends on it, through its Wants and Before options.
The Cumulus systemd VRF generator in turn generates this:
systemctl cat postfix@mgmt.service
# /etc/systemd/system/postfix@.service
# created by vrf generator
# Automatically generated by systemd-sysv-generator
[Unit]
...
Before=mail-transport-agent.target shutdown.target
After=local-fs.target remote-fs.target ...
Wants=mail-transport-agent.target network-online.target
...
[Service]
Environment=_SYSTEMCTL_SKIP_REDIRECT=true
...
ExecStart=/bin/ip vrf exec %i /etc/init.d/postfix start
ExecStop=/bin/ip vrf exec %i /etc/init.d/postfix stop
ExecReload=/bin/ip vrf exec %i /etc/init.d/postfix reload
...
Unfortunately, this means that postfix@mgmt.service now depends on postfix.service (the version with an unspecified VRF). It is then likely that starting postfix.service will cause postfix@mgmt.service to fail, because the competing postfix is already running:
# LC_ALL=C systemctl list-dependencies postfix@mgmt.service |
grep -E 'postfix|mail-transport'
postfix@mgmt.service
* |-system-postfix.slice
* |-mail-transport-agent.target
* | `-postfix.service
Depending on luck or your configuration, you might get the right process started, but also see the other process as failed:
# LC_ALL=C systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
* postfix.service loaded failed failed LSB: Postfix Mail Transport Agent
But we don’t want to fix this with luck. Fix it by ensuring that the non-VRF postfix startup never causes conflicts:
cat /etc/systemd/system/postfix.service.d/ignore.conf
[Service]
# Ensure dependencies on this do not conflict with the proper
# postfix@mgmt.service:
ExecStart=
ExecStop=
ExecReload=
ExecStart=/bin/true
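To roll this out by script, the drop-in can be written and activated along these lines. This is a sketch, not a tested deployment procedure: install_postfix_ignore is a hypothetical helper name, and the commented systemctl steps assume you run them as root on the switch.

```shell
# install_postfix_ignore DIR: create DIR if needed and write the
# drop-in shown above to DIR/ignore.conf.
install_postfix_ignore() {
    mkdir -p "$1" &&
    cat > "$1/ignore.conf" <<'EOF'
[Service]
# Ensure dependencies on this do not conflict with the proper
# postfix@mgmt.service:
ExecStart=
ExecStop=
ExecReload=
ExecStart=/bin/true
EOF
}

# Intended use on the switch, as root (commented out in this sketch):
#   install_postfix_ignore /etc/systemd/system/postfix.service.d
#   systemctl daemon-reload
#   systemctl reset-failed postfix.service   # clear an earlier failed state
#   systemctl restart postfix@mgmt.service
```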
With that in place, postfix now starts smoothly in the right VRF, and a test mail should arrive:
# FROM=test@example.com && TO=yourself@example.com &&
printf 'Subject: test\r\nDate: %s\r\nFrom: %s\r\nTo: %s\r\n\r\ntest\r\n' \
"$(date -R)" "$FROM" "$TO" | /usr/sbin/sendmail -f "$FROM" "$TO"