ubuntu bionic / crashing gdm / eglgetdisplay
After upgrading from Ubuntu 17.10 to Ubuntu 18.04, and rebooting, the GNOME Display Manager (gdm) went into a restart loop. No promised speed gains. Instead, I got an unusable desktop.
By being quick with CTRL+ALT+F3, I managed to enter my username and password on the text console after a couple of attempts (the gdm restart kept stealing console/tty focus), after which I could run sudo systemctl stop gdm. This left me with a shell and plenty of time to examine the situation.
The Xorg.0.log (and syslog) went a little like this:
/usr/lib/gdm3/gdm-x-session[1849]: X.Org X Server 1.19.6
/usr/lib/gdm3/gdm-x-session[1849]: Release Date: 2017-12-20
/usr/lib/gdm3/gdm-x-session[1849]: X Protocol Version 11, Revision 0
...
/usr/lib/gdm3/gdm-x-session[1849]: (II) Loading sub module "glamoregl"
/usr/lib/gdm3/gdm-x-session[1849]: (II) LoadModule: "glamoregl"
/usr/lib/gdm3/gdm-x-session[1849]: (II) Loading /usr/lib/xorg/modules/libglamoregl.so
/usr/lib/gdm3/gdm-x-session[1849]: (II) Module glamoregl: vendor="X.Org Foundation"
/usr/lib/gdm3/gdm-x-session[1849]: #011compiled for 1.19.6, module version = 1.0.0
/usr/lib/gdm3/gdm-x-session[1849]: #011ABI class: X.Org ANSI C Emulation, version 0.4
/usr/lib/gdm3/gdm-x-session[1849]: (II) glamor: OpenGL accelerated X.org driver based.
/usr/lib/gdm3/gdm-x-session[1849]: (EE) modeset(0): eglGetDisplay() failed
/usr/lib/gdm3/gdm-x-session[1849]: (EE) modeset(0): glamor initialization failed
...
gnome-session[1923]: X Error of failed request: BadValue (integer parameter out of range for operation)
gnome-session[1923]: Major opcode of failed request: 154 (GLX)
gnome-session[1923]: Minor opcode of failed request: 3 (X_GLXCreateContext)
gnome-session[1923]: Value in failed request: 0x0
gnome-session[1923]: Serial number of failed request: 19
gnome-session[1923]: Current serial number in output stream: 20
gnome-session[1923]: gnome-session-check-accelerated: GL Helper exited with code 256
gnome-session-c[1936]: eglGetDisplay() failed
gnome-session[1923]: gnome-session-check-accelerated: GLES Helper exited with code 256
...
gnome-session-c[1992]: eglGetDisplay() failed
gnome-session[1923]: gnome-session-check-accelerated: GLES Helper exited with code 256
gnome-session[1923]: gnome-session-binary[1923]: WARNING: software acceleration check failed: Child process exited with code 1
gnome-session[1923]: gnome-session-binary[1923]: CRITICAL: We failed, but the fail whale is dead. Sorry....
gnome-session-binary[1923]: WARNING: software acceleration check failed: Child process exited with code 1
gnome-session-binary[1923]: CRITICAL: We failed, but the fail whale is dead. Sorry....
at-spi-bus-launcher[1925]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
...
/usr/lib/gdm3/gdm-x-session[1849]: (II) Server terminated successfully (0). Closing log file.
gdm3: Child process -1849 was already dead.
gdm3: Child process 1833 was already dead.
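In a log this noisy, the lines that matter are the X server errors marked (EE) and the eglGetDisplay() failures. A minimal sketch for pulling just those out of a log read on stdin; the helper name filter_gl_errors is made up for this example, and the demo input is three lines copied from the log above:

```shell
# Keep only X server error lines "(EE)" and EGL display failures
# from a log read on stdin. Hypothetical helper, not part of any package.
filter_gl_errors() {
    grep -E '\(EE\)|eglGetDisplay\(\) failed' || true
}

# Demo on lines taken from the log above; only the last two are kept.
filter_gl_errors <<'LOG'
/usr/lib/gdm3/gdm-x-session[1849]: (II) LoadModule: "glamoregl"
/usr/lib/gdm3/gdm-x-session[1849]: (EE) modeset(0): eglGetDisplay() failed
gnome-session-c[1936]: eglGetDisplay() failed
LOG
```

On a real system the input would come from something like filter_gl_errors < /var/log/syslog, assuming that is where the session output ends up.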
Luckily I had another system with a browser, and quickly found “libglvnd0/libegl installed in Ubuntu 18.04 breaks graphics drivers and forces LLVMpipe driver on i915 systems” (Launchpad bug 1733136).
The discussion there involves the NVIDIA driver, but I was on a system without any NVIDIA graphics card, so that couldn’t be my problem, right?
$ sudo lshw -c video
*-display
description: VGA compatible controller
product: Xeon E3-1200 v3 Processor Integrated Graphics Controller
vendor: Intel Corporation
...
configuration: driver=i915 latency=0
So I tried to figure out what this eglGetDisplay() was and why it would fail. I tried this adapted code I found on the internet:
$ cat >eglinfo.c <<'EOF'
#include <X11/Xlib.h>
#include <stdio.h>
#include <EGL/egl.h>

int main(int argc, char *argv[])
{
    /* There is no DISPLAY= on the text console, so XOpenDisplay()
     * returns NULL here; we carry on regardless. */
    Display *xlib_dpy = XOpenDisplay(NULL);
    //if (!xlib_dpy) {
    //    return 1;
    //}
    int maj, min;
    EGLDisplay d = eglGetDisplay(xlib_dpy);
    if (!eglInitialize(d, &maj, &min)) {
        printf("eglinfo: eglInitialize failed\n");
        return 2;
    }
    return 0;
}
EOF
$ gcc -o eglinfo eglinfo.c -lX11 -lEGL
$ ./eglinfo
modprobe: ERROR: could not insert 'nvidia': No such device
eglinfo: eglInitialize failed
That eglInitialize failed was to be expected, since we passed NULL as display, but the “modprobe” was unexpected. Why on earth would it try to load (only) NVIDIA modules on this machine with Intel graphics?
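One way to see which EGL/GL libraries a binary such as eglinfo is linked against is to filter the output of ldd. This is only a sketch: the helper name resolved_gl_libs is made up, and it assumes the usual "name => path (address)" line format. A vendor library may also be loaded at run time, in which case it would not show up here at all.

```shell
# Print "name path" for every libEGL/libGL/libGLX entry in `ldd` output
# read on stdin. Hypothetical helper; assumes "name => path (address)" lines.
resolved_gl_libs() {
    awk '$1 ~ /^lib(EGL|GL|GLX)\.so/ && $2 == "=>" {
        if ($3 == "not") print $1, "not found"   # ldd prints "=> not found"
        else             print $1, $3
    }'
}

# Real usage (assumes the eglinfo binary built above):
#   ldd ./eglinfo | resolved_gl_libs
```

Running dpkg -S on a reported path then shows which package owns that file.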
I went back to the bug report above, and there it was: “It’s true that the first time you install the nvidia driver (and I mean via the PPA deb file), it tends to make /usr/lib/xorg/modules/extensions/libglx.so and /usr/lib/x86_64-linux-gnu/libGL.so point at the nvidia drivers, but this isn’t the case in my setup, which works fine in artful, just not in bionic.”
A-ha. So, the NVIDIA packages were possibly symlinking over other stuff. Since the NVIDIA card is gone, a purge of all things nvidia should be a good cleanup. And indeed, removing those packages showed things like:
$ dpkg -l | grep nvidia | awk '/^ii/{print$2}' | xargs sudo apt-get remove --purge
...
Removing 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340'
...
And now gdm started successfully and everything ran smoothly.
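To double-check that nothing of the NVIDIA packaging is left behind, the package list and the diversion list can both be filtered. A minimal sketch, with made-up helper names; it matches on the package name only, which is stricter than the grep used above, and it assumes diversion lines look like the "diversion of ... to ... by nvidia-340" line shown in the purge output:

```shell
# Names of installed ("ii") packages whose name contains "nvidia",
# taken from `dpkg -l` output on stdin. Hypothetical helper.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}

# Diversions registered by nvidia packages, taken from
# `dpkg-divert --list` output on stdin. Hypothetical helper.
nvidia_diversions() {
    grep -E ' by nvidia[^ ]*$' || true
}

# Real usage; both should print nothing once the purge is complete:
#   dpkg -l | installed_nvidia_packages
#   dpkg-divert --list | nvidia_diversions
```

Before purging, the first helper can also serve as a preview of what would be removed, and apt-get accepts --simulate for a dry run.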
Except for one snag: the sample code above no longer compiled. How could that be?
$ gcc -o eglinfo eglinfo.c -lX11 -lEGL
/usr/bin/ld: cannot find -lEGL
collect2: error: ld returned 1 exit status
$ grep ^Libs: /usr/lib/x86_64-linux-gnu/pkgconfig/egl.pc
Libs: -L${libdir} -lEGL
$ ls -l /usr/lib/x86_64-linux-gnu/libEGL.so
ls: cannot access '/usr/lib/x86_64-linux-gnu/libEGL.so': No such file or directory
You’d expect the libEGL.so symlink to be in the same package as that egl.pc, but no: it wasn’t in libegl1-mesa-dev. Instead, it was created by libglvnd-dev, but had since been clobbered by the nvidia packages.
$ sudo apt-get install libglvnd-dev --reinstall
...
$ ls -l /usr/lib/x86_64-linux-gnu/libEGL.so
lrwxrwxrwx 1 root root 15 mrt 5 10:45 /usr/lib/x86_64-linux-gnu/libEGL.so -> libEGL.so.1.0.0
Good, now we’re back in business.
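The linker only needs the unversioned development symlinks, so a quick look at their state tells whether -lEGL or -lGL can work. A small sketch with a made-up helper name, link_state, run against the two paths mentioned in this post:

```shell
# Report whether a path is a working symlink, a dangling symlink,
# some other existing file, or missing. Hypothetical helper.
link_state() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        printf '%s: dangling symlink -> %s\n' "$1" "$(readlink "$1")"
    elif [ -L "$1" ]; then
        printf '%s: symlink -> %s\n' "$1" "$(readlink "$1")"
    elif [ -e "$1" ]; then
        printf '%s: exists, not a symlink\n' "$1"
    else
        printf '%s: missing\n' "$1"
    fi
}

# The development symlinks that -lEGL and -lGL rely on.
for f in /usr/lib/x86_64-linux-gnu/libEGL.so /usr/lib/x86_64-linux-gnu/libGL.so; do
    link_state "$f"
done
```

For a path that does exist, dpkg -S reports the owning package, which is how a symlink like this can be traced back to libglvnd-dev.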