I create software for fun and money.

Systemd hangs on shutdown

Sun, Jan 23 2022

Problem description

I've been experiencing seemingly random hangs during shutdown and reboot sequences. My computer would show a black screen and a blinking underscore cursor in the upper left corner of the screen.

These hangs would take approximately 90 seconds, which is the default waiting time between SIGTERM and SIGKILL signals sent by systemd to processes during shutdown. After those 90 seconds, the computer would proceed with the shutdown.

Pinpointing the problem

After inspecting the output of journalctl -r, I've found one and a half minutes gap between these two entries:

Jul 17 14:08:48 arch systemd[565]: Stopped target Main User Target.
Jul 17 14:07:20 arch systemd[565]: pulseaudio.service: Succeeded.

Interesting. But let's dig deeper.

I started by adding systemd debug options to the grub configuration file located at /etc/default/grub:

GRUB_CMDLINE_LINUX="systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on enforcing=0"

Then I had to update the grub config:

grub-mkconfig -o /boot/grub/grub.cfg

I needed a script to output the dmesg log to a file. I called it

/usr/lib/systemd/system-shutdown/debug.sh

and it contained these lines:

#!/bin/sh
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /

To make it executable, I ran

chmod +x /usr/lib/systemd/system-shutdown/debug.sh

I then rebooted my computer and inspected the ~/shutdown-log.txt file for entries containing "timeout". In my case, the offender was openvpn:

[11500.333742] systemd[1]: systemd-networkd.service: Got notification message from PID 291 (WATCHDOG=1)
[11524.641662] systemd[1]: session-1.scope: Stopping timed out. Killing.
[11524.641773] systemd[1]: session-1.scope: Killing process 18930 (sudo) with signal SIGKILL.
[11524.641815] systemd[1]: session-1.scope: Killing process 18956 (openvpn) with signal SIGKILL.
[11524.641900] systemd[1]: session-1.scope: Failed with result 'timeout'.
[11524.641912] systemd[1]: session-1.scope changed stop-sigterm -> failed

Solution

Since I don't care about why the openvpn process doesn't shutdown gracefully, I can just decrease the period between SIGTERM and SIGKILL signals sent by systemd.

I did that by editing the systemd configuration file located at /etc/systemd/system.conf.

I looked for the for the DefaultTimeoutStopSec entry in that file, un-commented it, and set the value to 3s. After rebooting the computer, the long hangs were fixed.