stale apparmor config / mysql refuses to start
So, recently we had an issue with a MariaDB server that refused to start. Or, actually, it would start, but before long, SystemD would kill it. But why?
# systemctl start mariadb.service
Job for mariadb.service failed because a timeout was exceeded.
See "systemctl status mariadb.service" and "journalctl -xe" for details.
After 90 seconds, it would be killed.
systemctl status mariadb.service
shows the immediate cause:
# systemctl status mariadb.service
...
systemd[1]: mariadb.service: Start operation timed out. Terminating.
systemd[1]: mariadb.service: Main process exited, code=killed, status=15/TERM
systemd[1]: mariadb.service: Failed with result 'timeout'.
Ok, a start operation timeout. That is caused by the notify
type:
apparently the mysqld
doesn't get a chance to tell SystemD that it
has succesfully completed startup.
First, a quickfix, so we can start at all:
# cat <<EOF >/etc/systemd/system/mariadb.service.d/override.conf
[Service]
Type=simple
EOF
That fixes so we can start — because now SystemD won't require for any "started" notification anymore — but it doesn't explain what is wrong.
Second, an attempt at debugging the cause:
# cat <<EOF >/etc/systemd/system/mariadb.service.d/override.conf
[Service]
NotifyAccess=all
ExecStart=
ExecStart=/usr/bin/strace -fesendmsg,sendto,connect,socket -s8192 \
/usr/sbin/mysqld $MYSQLD_OPTS
EOF
Okay, that one showed EACCESS
errors on the sendmsg()
call on the
/run/systemd/notify
unix socket:
strace[55081]: [pid 55084] socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 46
strace[55081]: [pid 55084] sendmsg(46, {msg_name={sa_family=AF_UNIX,
sun_path="/run/systemd/notify"}, msg_namelen=22,
msg_iov=[{iov_base="READY=1\nSTATUS=Taking your SQL requests now...\n", iov_len=47}],
msg_iovlen=1, msg_controllen=0, msg_flags=0},
MSG_NOSIGNAL) = -1 EACCES (Permission denied)
Permission denied? But why?
# cat <<EOF >/etc/systemd/system/mariadb.service.d/override.conf
[Service]
NotifyAccess=all
ExecStart=
ExecStart=/usr/bin/strace -fesendmsg,sendto,connect,socket -s8192 \
/bin/sh -c 'printf "READY=1\nSTATUS=Taking your SQL requests now...\n" | \
socat - UNIX-SENDTO:/run/systemd/notify; sleep 3600
EOF
This worked:
strace[54926]: [pid 54931] socket(AF_UNIX, SOCK_DGRAM, 0) = 5
strace[54926]: [pid 54931] sendto(5,
"READY=1\nSTATUS=Taking your SQL requests now...\n", 47, 0,
{sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, 21) = 47
(Unless someone is really trying to mess with you, you can regard
sendto()
and sendmsg()
as equivalent here. socat
simply uses the
other one.)
That means that there is nothing wrong with SystemD or
/run/systemd/notify
. So the problem must be related to
/usr/sbin/mysqld
.
After looking at journalctl -u mariadb.service
for the nth time, I
decided to peek at all of journalctl without any filters. And there
it was after all: audit logs.
# journalctl -t audit
audit[1428513]: AVC apparmor="DENIED" operation="sendmsg"
info="Failed name lookup - disconnected path" error=-13
profile="/usr/sbin/mysqld" name="run/systemd/notify" pid=1428513
comm="mysqld" requested_mask="w" denied_mask="w" fsuid=104 ouid=0
(Observe the -t
in the journalctl invocation above which looks for the
SYSLOG_IDENTIFIER=audit
key-value pair.)
Okay. And fixing it?
# aa-remove-unknown
Skipping profile in /etc/apparmor.d/disable: usr.sbin.mysqld
Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
Removing '/usr/sbin/mysqld'
A-ha! Stale cruft in /var/cache/apparmor
.
# /etc/init.d/apparmor restart
Restarting apparmor (via systemctl): apparmor.service.
Finally we could undo the override.conf
and everything started working
as expected.