Deploying a Deep Learning Environment on Docker (GPU, TensorRT, TensorFlow)
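I. System environment
Ubuntu 18.04, kernel 4.15.0-159-generic
II. Installing the GPU driver
1. Identify the GPU model and available drivers
Method 1: ubuntu-drivers. Some cards cannot be identified this way; in that case, use method 2.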
# Run
ubuntu-drivers devices
# Output
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001C20sv00001D05sd00001045bc03sc00i00
vendor : NVIDIA Corporation
model : GP106M [GeForce GTX 1060 Mobile]
manual_install: True
driver : nvidia-driver-460-server - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-470 - distro non-free recommended
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-460 - distro non-free
driver : nvidia-driver-495 - distro non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
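In the output above, ubuntu-drivers marks one package as "recommended" (here nvidia-driver-470, the same 470 series as the driver downloaded in step 2). A minimal sketch for extracting that name in a script. recommended_driver is a hypothetical helper, and it assumes the "driver : <package> - ... recommended" line format shown above.

```shell
# Hypothetical helper: print the package marked "recommended" in
# `ubuntu-drivers devices` output read from stdin.
# Assumes lines like "driver : <package> - distro non-free recommended".
recommended_driver() {
    sed -n 's/^driver *: *\([^ ]*\) .*recommended.*/\1/p'
}

# Usage on a real machine (the WARNING line goes to stderr):
#   ubuntu-drivers devices 2>/dev/null | recommended_driver
```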
Method 2: lspci
lspci | grep -i vga
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
af:00.0 VGA compatible controller: NVIDIA Corporation Device 1f02 (rev a1)   // note the device ID "1f02": it is a hexadecimal code, used to look up the card model online
d8:00.0 VGA compatible controller: NVIDIA Corporation Device 1f02 (rev a1)
Look up the hexadecimal device ID to find the corresponding card model: http://pci-ids.ucw.cz/mods/PC/10de?action=jump?origin=help
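With lspci -nn, the numeric [vendor:device] IDs are printed directly, so the lookup key does not have to be read off by hand. A minimal sketch that extracts NVIDIA device IDs (vendor ID 10de) from that output. nvidia_device_ids is a hypothetical helper, and the -nn flag is an addition to the command shown above.

```shell
# Hypothetical helper: list unique NVIDIA device IDs from `lspci -nn` output on stdin.
# `lspci -nn` appends tags such as [10de:1f02]; 10de is NVIDIA's vendor ID and the
# four hex digits after the colon are the device ID to look up.
nvidia_device_ids() {
    # Consumer cards appear as "VGA compatible controller", many data-center
    # cards as "3D controller"; keep both, extract the [10de:xxxx] tag,
    # then cut out characters 7-10 (the device ID) and de-duplicate.
    grep -iE 'VGA|3D controller' \
        | grep -o '\[10de:[0-9a-f]\{4\}]' \
        | cut -c7-10 \
        | sort -u
}

# Usage on a real machine:
#   lspci -nn | nvidia_device_ids
```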
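2. Download the driver from the NVIDIA website
Here I downloaded NVIDIA-Linux-x86_64-470.94.run from the Production Branch; the Production Branch is recommended.
3. Install the driver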
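# Before installing, disable the bundled open-source nouveau driver
# (sudo echo "..." >> file fails: the >> redirection is done by your own shell, without root)
echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf
# Regenerate the initramfs so the blacklist takes effect at boot
sudo update-initramfs -u
# Reboot the server
sudo reboot
# After the reboot, check: no output means nouveau is disabled
lsmod | grep nouveau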
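Because sudo does not apply to a shell redirection, the blacklist line has to be written through sudo tee -a. If the setup may be re-run, a small guard avoids appending the same line twice. A minimal sketch; append_once and the SUDO override variable are hypothetical names introduced here.

```shell
# Hypothetical helper: append line $1 to file $2 only if that exact line is absent.
# Writes through `sudo tee -a` so the write itself runs as root; set SUDO=
# (empty) to skip sudo, e.g. when already root or when testing on a temp file.
append_once() {
    line=$1
    file=$2
    # -x: match whole lines, -F: fixed string, -q: quiet, -s: ignore a missing file
    if ! grep -qsxF -- "$line" "$file"; then
        printf '%s\n' "$line" | ${SUDO-sudo} tee -a "$file" > /dev/null
    fi
}

# Usage:
#   append_once "blacklist nouveau" /etc/modprobe.d/blacklist.conf
#   sudo update-initramfs -u
```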
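# Stop the graphical display manager (lightdm here; adjust if yours is gdm3, the Ubuntu 18.04 desktop default)
sudo service lightdm stop
# Install build dependencies
sudo apt install gcc make dkms -y
# Change to the directory holding the downloaded installer (cd is a shell builtin, so it cannot be run through sudo)
cd /opt/download
sudo chmod 755 NVIDIA-Linux-x86_64-470.94.run
# The installer asks several questions during installation; follow the prompts
sudo ./NVIDIA-Linux-x86_64-470.94.run --no-x-check --no-nouveau-check --no-opengl-files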
# To uninstall the driver later, run
sudo /usr/bin/nvidia-uninstall
# Verify the installation
sudo nvidia-smi
# Output like the following means the driver is installed correctly
Tue Jan 11 11:06:57 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94       Driver Version: 470.94       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P0    24W /  N/A |      0MiB /  6078MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
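To check the versions from a script rather than by eye, the header line of the nvidia-smi table can be parsed. Note that the "CUDA Version" field is the highest CUDA version the driver supports, not an installed CUDA toolkit. A minimal sketch; smi_versions is a hypothetical helper that assumes the header format shown above (for the driver version alone, nvidia-smi --query-gpu=driver_version --format=csv,noheader is an alternative).

```shell
# Hypothetical helper: read nvidia-smi table output on stdin and print
# "<driver version> <max supported CUDA version>" from the header line.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p'
}

# Usage on a real machine:
#   nvidia-smi | smi_versions
```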
III. Installing Docker
1. Remove old versions
sudo apt-get remove docker docker-engine docker.io containerd runc
2. Install Docker
sudo apt-get update
# Install prerequisite packages
sudo apt-get install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Verify that you now have the key with fingerprint 0EBFCD88
sudo apt-key fingerprint 0EBFCD88
# Set up the stable repository
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Refresh the package index
sudo apt-get update
# Install the latest Docker CE
sudo apt-get install docker-ce
# Enable Docker at boot and start it
sudo systemctl enable docker
sudo systemctl start docker
# Check the Docker version: the --gpus flag used below requires Docker 19.03 or later
docker --version
Docker version 20.10.12, build e91ed57
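The --gpus option used in the next step needs Docker 19.03 or later. A minimal sketch that performs this check automatically; docker_version and version_ge are hypothetical helpers, and the comparison relies on sort -V from GNU coreutils.

```shell
# Hypothetical helper: extract "20.10.12" from "Docker version 20.10.12, build e91ed57".
docker_version() {
    sed -n 's/^Docker version \([0-9.]*\).*/\1/p'
}

# Hypothetical helper: succeed if version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a real machine:
#   v=$(docker --version | docker_version)
#   version_ge "$v" 19.03 || echo "Docker $v is too old for --gpus"
```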
3. Install the NVIDIA Container Runtime
# Create nvidia-container-runtime-script.sh with the following content
vim nvidia-container-runtime-script.sh
sudo curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
sudo curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.list | \
sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
sudo apt-get update
# Run the script (it adds the NVIDIA repository and refreshes the package index)
sh nvidia-container-runtime-script.sh
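The script picks the repository path from $ID$VERSION_ID in /etc/os-release, which on this Ubuntu 18.04 machine should presumably be ubuntu18.04. A minimal sketch for previewing that value before running the script; nvidia_repo_dist is a hypothetical helper that takes an optional os-release path (the path must contain a slash, since the . command otherwise searches PATH).

```shell
# Hypothetical helper: print the "$ID$VERSION_ID" string the script above uses,
# e.g. ubuntu18.04. Sources the file in a subshell so its variables do not leak.
nvidia_repo_dist() {
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Usage on a real machine:
#   nvidia_repo_dist
```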
# Install the runtime
sudo apt-get install nvidia-container-runtime
# Restart Docker
sudo systemctl restart docker
# Test GPU access from a container
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
# The output matches what nvidia-smi prints directly on the host
Tue Jan 11 03:12:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.94       Driver Version: 470.94       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P0    24W /  N/A |      0MiB /  6078MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
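Comparing the GPU list inside a container with the host's is a quick way to confirm that --gpus all exposes every card. A minimal sketch based on nvidia-smi -L, which prints one "GPU <index>: ..." line per device; gpu_count is a hypothetical helper.

```shell
# Hypothetical helper: count "GPU <n>: ..." lines from `nvidia-smi -L` on stdin.
# Prints 0 when no GPU lines are found.
gpu_count() {
    grep -c '^GPU [0-9]'
}

# Usage on a real machine (same image as the test above):
#   host=$(nvidia-smi -L | gpu_count)
#   ctr=$(sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi -L | gpu_count)
#   [ "$host" = "$ctr" ] && echo "container sees all $host GPU(s)"
```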
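4. Allow a regular user to run Docker
# Create the docker group (it usually already exists once Docker is installed)
sudo groupadd docker
# Add the login user to the docker group
sudo gpasswd -a <username> docker
# Apply the new group membership in the current shell (or log out and back in)
newgrp docker
# Restart Docker
sudo systemctl restart docker
# Test: as the regular user, run the following; if the nvidia-smi table is printed, the setup works
docker run --rm -u "$(id -u):$(id -g)" --gpus all nvidia/cuda:11.0-base nvidia-smi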
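Group changes only apply to new login sessions, which is why newgrp (or logging out and back in) is needed. A minimal sketch for checking membership before running the test; in_docker_group is a hypothetical helper that inspects a group list such as the output of id -nG.

```shell
# Hypothetical helper: succeed if the space-separated group list in $1
# (for example "$(id -nG)") contains exactly the group "docker".
in_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   in_docker_group "$(id -nG)" || echo "not in the docker group yet: run newgrp docker or log in again"
```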
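Once the environment is in place, you can use images that others have published (for example on GitHub) or build your own to meet your team's various development needs, without having to deploy this complex stack on the host itself.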
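As a final check for the TensorFlow part of the stack, a prebuilt TensorFlow GPU image can be asked which GPUs it sees. A minimal sketch; tf_gpu_check and the DRY_RUN switch are hypothetical, and the image tag tensorflow/tensorflow:2.7.0-gpu is an assumption (TensorFlow 2.7 was built against CUDA 11.2, which the 470 driver should support). tf.config.list_physical_devices is part of the TensorFlow 2.x API.

```shell
# Hypothetical helper: run TensorFlow in a throwaway GPU container and print the
# GPUs it detects. The default image tag is an assumption: pick a tag whose
# bundled CUDA version the host driver supports (up to CUDA 11.4 here).
# Set DRY_RUN=1 to print the docker command instead of running it.
tf_gpu_check() {
    tf_image=${1:-tensorflow/tensorflow:2.7.0-gpu}
    ${DRY_RUN:+echo} docker run --rm --gpus all "$tf_image" \
        python3 -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'
}

# Usage (as a user in the docker group):
#   tf_gpu_check                          # default image
#   DRY_RUN=1 tf_gpu_check my/image:tag   # only print the command
```

A non-empty list of PhysicalDevice entries with device_type 'GPU' means TensorFlow can use the card; an empty list usually points to a CUDA/driver mismatch or a missing --gpus flag.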
