
drm.ko missing: CUDA 6.5 / Ubuntu 14.04 / AWS EC2 GPU instance g2.2xlarge


When I install CUDA 6.5 on Ubuntu 14.04.1 LTS on an AWS EC2 g2.2xlarge instance, whether via the .deb file or the .run file, for example

sudo ./cuda_6.5.14_linux_64.run --kernel-source-path=/usr/src/linux-headers-3.13.0-34-generic

I always get the same error about a missing drm.ko. The kernel module compilation seems to succeed. The log is below. (I rebooted before installing.)

Kernel module compilation complete.


Unable to determine if Secure Boot is enabled: No such file or directory


Kernel module load error: No such file or directory


Kernel messages:


[ 3.595939] type=1400 audit(1408809902.911:5): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=492 comm="apparmor_parser"

[ 3.595942] type=1400 audit(1408809902.911:6): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=492 comm="apparmor_parser"

[ 3.596140] type=1400 audit(1408809902.915:7): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=492 comm="apparmor_parser"

[ 4.696067] init: failsafe main process (833) killed by TERM signal

[ 4.793261] type=1400 audit(1408809904.107:8): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/sbin/dhclient" pid=952 comm="apparmor_parser"

[ 4.793267] type=1400 audit(1408809904.107:9): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=952 comm="apparmor_parser"

[ 5.036249] init: plymouth-upstart-bridge main process ended, respawning

[ 6.589233] init: udev-fallback-graphics main process (1203) terminated with status 1

[ 136.367014] nvidia: module license 'NVIDIA' taints kernel.

[ 136.367019] Disabling lock debugging due to kernel taint

[ 136.370281] nvidia: module verification failed: signature and/or required key missing - tainting kernel

[ 136.370383] nvidia: Unknown symbol drm_open (err 0)

[ 136.370393] nvidia: Unknown symbol drm_poll (err 0)

[ 136.370404] nvidia: Unknown symbol drm_pci_init (err 0)

[ 136.370449] nvidia: Unknown symbol drm_gem_prime_handle_to_fd (err 0)

[ 136.370462] nvidia: Unknown symbol drm_gem_private_object_init (err 0)

[ 136.370474] nvidia: Unknown symbol drm_gem_mmap (err 0)

[ 136.370478] nvidia: Unknown symbol drm_ioctl (err 0)

[ 136.370486] nvidia: Unknown symbol drm_gem_object_free (err 0)

[ 136.370496] nvidia: Unknown symbol drm_read (err 0)

[ 136.370509] nvidia: Unknown symbol drm_gem_handle_create (err 0)

[ 136.370515] nvidia: Unknown symbol drm_prime_pages_to_sg (err 0)

[ 136.370550] nvidia: Unknown symbol drm_pci_exit (err 0)

[ 136.370563] nvidia: Unknown symbol drm_release (err 0)

[ 136.370565] nvidia: Unknown symbol drm_gem_prime_export (err 0)

The driver installation is unable to locate the kernel source. Please make sure that the kernel source packages are installed and set up correctly.


2 solutions

#1


9  

The error is caused by the missing drm module, which the NVIDIA driver requires. By default, the Ubuntu AMI installs the minimal generic Linux kernel (linux-image-virtual), which doesn't include the drm module. To fix it, install the complete generic kernel, linux-image-generic. Installing linux-image-extra-virtual would also work, as it is merely a transitional package to linux-image-generic. I would suggest installing linux-generic to get both the headers and the image. To summarize:

sudo apt-get install linux-generic

A similar question was asked on the AWS forum.
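To confirm the diagnosis before or after installing the kernel package, you can check whether a drm.ko exists for the running kernel. This is only a minimal sketch: `has_drm_module` is a helper name made up for illustration, and it assumes the kernel modules live under /lib/modules/$(uname -r), which is the usual layout on Ubuntu.

```shell
#!/bin/sh
# has_drm_module MODULES_DIR
# Hypothetical helper: succeeds if a drm.ko (plain or compressed, e.g. drm.ko.xz)
# exists anywhere under MODULES_DIR, fails otherwise.
has_drm_module() {
    dir=$1
    [ -d "$dir" ] || return 1
    found=$(find "$dir" -type f \( -name 'drm.ko' -o -name 'drm.ko.*' \) 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Check the module tree of the kernel that is currently running.
# Assumption: modules are installed under /lib/modules/<kernel release>.
if has_drm_module "/lib/modules/$(uname -r)"; then
    echo "drm.ko found for kernel $(uname -r)"
else
    echo "drm.ko not found for kernel $(uname -r)"
fi
```

If the check still reports that drm.ko is missing after the install, the instance is probably still running the old kernel; reboot and run it again.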

#2


4  

Actually, right after a fresh launch of the GPU instance, apt-get upgrade kept back 4 packages, including linux-virtual and linux-image-virtual. I installed them anyway so that nothing at all was left to upgrade. (The fresh setup has no previous nvidia driver and no nouveau driver.)

The thing is that linux-image-virtual is a lean build with no drm.ko. Just do


sudo apt-get install linux-image-extra-virtual

which contains drm.ko.


Then go on installing CUDA with either the .deb or .run file.

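If the installer fails again, it may help to pull just the unresolved drm symbols out of the kernel log rather than reading the whole thing. The sketch below assumes the log lines look like the "nvidia: Unknown symbol drm_open (err 0)" entries in the question; `list_unresolved_drm_symbols` is a name invented here for illustration.

```shell
#!/bin/sh
# list_unresolved_drm_symbols
# Hypothetical helper: reads kernel log text on stdin and prints each distinct
# drm_* symbol that the nvidia module reported as unknown, one per line, sorted.
list_unresolved_drm_symbols() {
    sed -n 's/.*nvidia: Unknown symbol \(drm_[A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Run it against the live kernel log. Reading dmesg may need sudo on some
# systems; errors are discarded so the script still reports a result.
missing=$(dmesg 2>/dev/null | list_unresolved_drm_symbols)
if [ -n "$missing" ]; then
    echo "nvidia reported unresolved drm symbols:"
    echo "$missing"
else
    echo "no unresolved drm symbols reported in the kernel log"
fi
```

Note that the kernel log only covers the current boot, so entries from a failed attempt earlier in the same boot will still show up until you reboot.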

