On systems with an NVIDIA GPU, a simple apt upgrade can leave you in a dreaded situation where the GPU still works, but NVIDIA tooling such as nvidia-smi does not. Instead, the tools just print an error message like

$ nvidia-smi
NVML: Driver/library version mismatch

This happens when apt upgrades your user-space tooling while the NVIDIA kernel modules are already loaded. The modules in memory are still the previous version, so the two no longer match. The usual fix is to reboot the machine. Sometimes this is not acceptable. In these cases, you can instead remove the now outdated kernel modules and load the updated version.
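
Before unloading anything, it can help to confirm that a stale module is really the cause. The following is a minimal sketch, assuming that the upgrade also replaced the kernel module on disk (typical when driver and libraries are upgraded together) and that the loaded module exposes its version under /sys/module/nvidia/version. The helper name versions_match is made up for this example.

```shell
#!/bin/sh
# Compare the NVIDIA kernel module in memory with the module file on disk.

# Hypothetical helper: succeeds only if both versions are non-empty and equal.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Version of the module currently loaded; empty if it is not loaded.
loaded=$(cat /sys/module/nvidia/version 2>/dev/null || true)
# Version of the module file that would be loaded next.
ondisk=$(modinfo -F version nvidia 2>/dev/null || true)

if versions_match "$loaded" "$ondisk"; then
    echo "loaded module matches on-disk module: $loaded"
else
    echo "mismatch or not loaded: loaded='${loaded:-none}' on-disk='${ondisk:-none}'"
fi
```

If the two versions differ, the module in memory predates the upgrade and reloading it should help.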

The solution

In most cases, running this sequence of commands is sufficient:

sudo rmmod nvidia_drm
sudo rmmod nvidia_modeset
sudo rmmod nvidia_uvm
sudo rmmod nvidia
nvidia-smi

The call to nvidia-smi reloads the required kernel modules automatically.
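
If you run this often, the sequence can be wrapped in a small script that stops at the first module that refuses to unload. This is only a sketch: the RMMOD and MODULE_DIR variables are hypothetical overrides added here for dry runs, and skipping modules that are not loaded is an assumption that seems sensible on headless machines where nvidia_drm may be absent.

```shell
#!/bin/sh
# Unload the NVIDIA modules in dependency order, stopping at the first failure.

# Hypothetical overrides for dry runs, e.g. RMMOD=echo ./unload.sh --run
RMMOD=${RMMOD:-sudo rmmod}
MODULE_DIR=${MODULE_DIR:-/sys/module}

unload_nvidia() {
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        # Skip modules that are not loaded; loaded ones have a sysfs directory.
        [ -d "$MODULE_DIR/$mod" ] || continue
        # $RMMOD is intentionally unquoted so "sudo rmmod" splits into two words.
        $RMMOD "$mod" || { echo "failed to unload $mod" >&2; return 1; }
    done
}

# Do nothing unless explicitly asked to, so the file is safe to source.
if [ "${1:-}" = "--run" ]; then
    unload_nvidia && nvidia-smi
fi
```

Running it with RMMOD=echo prints the modules it would remove without touching them.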

Troubleshooting

We cannot remove a kernel module while another module depends on it. We first need to remove all dependent modules. That’s why we remove nvidia_drm, nvidia_modeset, and nvidia_uvm before nvidia. If a module has additional dependents not considered in this article, its removal will fail. For example, if we tried to remove nvidia first, rmmod would print an error message. To find additional dependent modules of nvidia, check the “Used by” column in the output of

lsmod | grep nvidia
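
To read the dependents off that output without scanning it by eye, a small filter can help. This is a rough sketch that assumes the usual lsmod layout, where the fourth column is a comma-separated list of the modules using the one in the first column. The function name dependents_of is made up for this example.

```shell
#!/bin/sh
# Print the modules that depend on the given module, one per line.
# Reads lsmod-style output on stdin.
dependents_of() {
    awk -v mod="$1" '
        $1 == mod && NF >= 4 {
            n = split($4, deps, ",")
            for (i = 1; i <= n; i++) print deps[i]
        }'
}

# Only query the live system if lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | dependents_of nvidia
fi
```

Every module it prints has to be removed before nvidia itself can be removed.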

Removing a module can also fail if a process still has one of the NVIDIA device files open. In these cases, use lsof to get a list of these processes. For example, if the NVIDIA device plugin for Kubernetes prevents you from removing nvidia_uvm, you’ll find this out with

sudo lsof /dev/nvidia-uvm
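
The same check can be widened to every NVIDIA device file at once. Below is a minimal sketch, assuming the device nodes all match /dev/nvidia* and that lsof prints its default column layout (command first, PID second). The function name holders is made up for this example.

```shell
#!/bin/sh
# Reduce lsof output on stdin to unique "PID COMMAND" pairs.
holders() {
    # Skip the header line, then keep only the PID and command columns.
    awk 'NR > 1 { print $2, $1 }' | sort -u
}

# Expand the glob into the positional parameters; if nothing matches,
# $1 is the literal pattern and the -e test below fails.
set -- /dev/nvidia*
if [ -e "$1" ] && command -v lsof >/dev/null 2>&1; then
    sudo lsof "$@" 2>/dev/null | holders
fi
```

Each line names a process that has to be stopped before the modules can be removed.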