Preface
GPU environment issues
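- Use the official guide as the single reference: https://www.tensorflow.org/install/docker?hl=zh-cn
- Always install with Docker on Linux; do not use Windows.
- Only the NVIDIA® GPU driver needs to be installed on the host; the NVIDIA® CUDA® Toolkit (i.e. the CUDA Toolkit and cuDNN) does not.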
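- Check the Docker version:
  1. Note that with the release of Docker 19.03, the nvidia-docker2 package is deprecated, because NVIDIA GPUs are now natively supported as devices in the Docker runtime.
  2. To enable GPU support on Linux, install NVIDIA Docker support.
  3. Run docker -v to see the Docker version. For versions earlier than 19.03, you need nvidia-docker2 and the --runtime=nvidia flag; **for 19.03 and later, you need the nvidia-container-toolkit package and the --gpus all flag.** Both options are documented on the page linked above.
- Install nvidia-docker following the official instructions: https://github.com/NVIDIA/nvidia-docker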
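As a rough illustration of the version rule above, here is a minimal Python sketch that picks the docker run GPU flags from the output of docker -v. The function name gpu_flags_for_docker is hypothetical, and it assumes the output looks like "Docker version 19.03.12, build ..." (only the first major.minor pair is inspected).

```python
from __future__ import annotations

import re


def gpu_flags_for_docker(version_output: str) -> list[str]:
    """Choose GPU flags for `docker run` from the text printed by `docker -v`.

    Hypothetical helper: Docker < 19.03 -> nvidia-docker2 style (--runtime=nvidia),
    Docker >= 19.03 -> nvidia-container-toolkit style (--gpus all).
    """
    # Grab the first "major.minor" pair, e.g. "19.03" from "Docker version 19.03.12, ..."
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"cannot parse a Docker version from {version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (19, 3):
        return ["--gpus", "all"]
    return ["--runtime=nvidia"]


if __name__ == "__main__":
    print(gpu_flags_for_docker("Docker version 19.03.12, build 48a66213fe"))  # ['--gpus', 'all']
    print(gpu_flags_for_docker("Docker version 18.09.7, build 2d0083d"))  # ['--runtime=nvidia']
```

In practice the input string could come from subprocess.run(["docker", "-v"], capture_output=True, text=True).stdout; comparing (major, minor) tuples avoids the pitfalls of comparing version strings lexically.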
Installing via nvidia-container-toolkit
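CentOS 7

```bash
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# Writing to /etc/yum.repos.d needs root, so pipe through sudo tee instead of a plain > redirect
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo yum install -y nvidia-container-toolkit
sudo systemctl restart docker
```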
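The distribution variable is just the ID and VERSION_ID fields of /etc/os-release glued together, and it selects which repository file is downloaded. A minimal Python sketch of the same idea follows; distribution_id and repo_url are hypothetical names, and the parser only handles plain KEY=value lines with optional quotes, not full shell syntax.

```python
from __future__ import annotations


def distribution_id(os_release_text: str) -> str:
    """Rough equivalent of `. /etc/os-release; echo $ID$VERSION_ID` (hypothetical helper)."""
    fields: dict[str, str] = {}
    for raw in os_release_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes
        fields[key] = value.strip().strip('"').strip("'")
    return fields.get("ID", "") + fields.get("VERSION_ID", "")


def repo_url(distribution: str, filename: str) -> str:
    """Build a repository URL with the same layout as the curl commands in these notes."""
    return f"https://nvidia.github.io/nvidia-docker/{distribution}/{filename}"


if __name__ == "__main__":
    centos = 'NAME="CentOS Linux"\nID="centos"\nVERSION_ID="7"\n'
    dist = distribution_id(centos)
    print(dist)  # centos7
    print(repo_url(dist, "nvidia-docker.repo"))
```

On CentOS 7 this yields centos7, and on Ubuntu 18.04 it yields ubuntu18.04, which is why the same one-liner works for both the .repo and the .list files.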
Starting with docker
To start TensorFlow with GPU support in Docker, add the --gpus all parameter.
```bash
# Add the package repositories (Ubuntu/Debian, apt-based)
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

docker run --gpus all -it tensorflow/tensorflow:latest-gpu \
    python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```
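The docker run ... nvidia-smi command prints the GPU information.
The last command, which runs TensorFlow in the latest-gpu image, prints a long log; I kept only its last line, which contains the word "gpu".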
Enter the container and check its environment with the following code, then inspect the output:

```python
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.client import device_lib

print("TensorFlow version:", tf.__version__)
print("Keras version:", keras.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("""\n ***** Environment check log follows; the GPU is in use only if device_type: "GPU" appears ***** \n""")
print(device_lib.list_local_devices())
print("""\n ***** End of environment check log; the GPU is in use only if device_type: "GPU" appears ***** \n""")
```
Starting with docker-compose
Reference: https://github.com/nvidia/nvidia-container-runtime#installation
nvidia-container-runtime must be installed.
Ubuntu distributions
- Install the repository for your distribution by following the repository instructions linked from the README above.
- Install the nvidia-container-runtime package:

```bash
sudo apt-get install nvidia-container-runtime
```

CentOS distributions
- Install the repository for your distribution by following the repository instructions linked from the README above.
- Install the nvidia-container-runtime package:

```bash
sudo yum install nvidia-container-runtime
```
Docker Engine setup
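If you have already installed the nvidia-docker2 package, it has already registered the runtime, so skip this section. To register the nvidia runtime, use whichever of the following methods best suits your environment. You may need to merge the new parameters with your existing configuration.

Systemd drop-in file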
```bash
sudo mkdir -p /etc/systemd/system/docker.service.d
# The empty ExecStart= line clears the command inherited from docker.service before redefining it
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
Daemon configuration file
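Register the nvidia runtime in /etc/docker/daemon.json:

```bash
# Note: tee overwrites any existing daemon.json; merge existing keys by hand
sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
```

You can optionally reconfigure the default runtime by adding the following to /etc/docker/daemon.json:

```json
"default-runtime": "nvidia"
```

The final contents of my /etc/docker/daemon.json: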
```json
{
    "registry-mirrors": [
        "https://1nj0zren.mirror.aliyuncs.com",
        "https://docker.mirrors.ustc.edu.cn",
        "http://f1361db2.m.daocloud.io",
        "https://registry.docker-cn.com"
    ],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```
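Because sudo tee /etc/docker/daemon.json replaces the whole file, existing settings such as registry-mirrors have to be merged in by hand, which is how a combined file like the one above comes about. A minimal Python sketch of that merge, assuming the current file already holds a valid JSON object; merge_nvidia_runtime is a hypothetical helper, and writing the result back (as root) and reloading dockerd are left out.

```python
from __future__ import annotations

import copy
import json

# Runtime entry taken from the daemon.json shown above
NVIDIA_RUNTIME = {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}


def merge_nvidia_runtime(existing: dict, make_default: bool = True) -> dict:
    """Return a copy of a daemon.json dict with the nvidia runtime registered.

    Hypothetical helper: other keys (registry-mirrors, other runtimes) are kept,
    and the input dict is not modified.
    """
    merged = copy.deepcopy(existing)
    merged.setdefault("runtimes", {})["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    if make_default:
        merged["default-runtime"] = "nvidia"
    return merged


if __name__ == "__main__":
    current = json.loads('{"registry-mirrors": ["https://docker.mirrors.ustc.edu.cn"]}')
    print(json.dumps(merge_nvidia_runtime(current), indent=4))
```

Working on a deep copy keeps the original configuration intact, so the old and new versions can be diffed before anything is written to disk.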
Version issues
Note that TensorFlow installed via Docker already includes Keras, because its version is recent enough. Import it with:
from tensorflow import keras
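From then on, usage is the same as with standalone Keras.
If necessary, you can first install TensorFlow with Docker, then enter the environment and install Keras: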
pip install keras==xxxx
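Setting up TensorFlow
Official guide for installing TensorFlow with Docker: https://www.tensorflow.org/install/docker?hl=zh-cn
Download a TensorFlow Docker image
The official TensorFlow Docker images are located in the tensorflow/tensorflow Docker Hub repository. Image releases are tagged using the following format: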
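| Tag | Description |
|---|---|
| latest | The latest release of the TensorFlow CPU binary image. (Default) |
| nightly | Nightly builds of the TensorFlow image. (Unstable) |
| version | Specify the version of the TensorFlow binary image, for example: 2.1.0 |
| devel | Nightly builds of a TensorFlow master development environment. Includes TensorFlow source code. |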
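Each base tag has variants that add or change functionality:

| Tag variant | Description |
|---|---|
| tag-gpu | The specified tag release with GPU support. (See below) |
| tag-py3 | The specified tag release with Python 3 support. |
| tag-jupyter | The specified tag release with Jupyter (includes TensorFlow tutorial notebooks) |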
You can use multiple variants at once. For example, the following commands download TensorFlow release images to your machine:
```bash
docker pull tensorflow/tensorflow                     # latest stable release
docker pull tensorflow/tensorflow:devel-gpu           # nightly dev release w/ GPU support
docker pull tensorflow/tensorflow:latest-gpu-jupyter  # latest release w/ GPU support and Jupyter
```
For example, in tensorflow/tensorflow:latest-gpu-jupyter, latest is a tag from the first table, while gpu and jupyter are variants from the second table; all three are used together.
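The tag structure just described (a base tag followed by variants joined with hyphens) can be sketched in Python. tf_image is a hypothetical helper; it only builds the string and does not check that the combination is actually published on Docker Hub. It assumes the variant order gpu, py3, jupyter, as seen in latest-gpu-jupyter.

```python
from __future__ import annotations

# Variant suffixes from the second table, in the order they appear in tags
VARIANT_ORDER = ("gpu", "py3", "jupyter")


def tf_image(base: str = "latest", *variants: str, repo: str = "tensorflow/tensorflow") -> str:
    """Compose an image reference such as tensorflow/tensorflow:latest-gpu-jupyter."""
    unknown = [v for v in variants if v not in VARIANT_ORDER]
    if unknown:
        raise ValueError(f"unknown variant(s): {unknown}")
    # De-duplicate and put variants in a fixed order (gpu before jupyter, etc.)
    ordered = sorted(set(variants), key=VARIANT_ORDER.index)
    return f"{repo}:" + "-".join([base, *ordered])


if __name__ == "__main__":
    print(tf_image())  # tensorflow/tensorflow:latest
    print(tf_image("devel", "gpu"))  # tensorflow/tensorflow:devel-gpu
    print(tf_image("latest", "jupyter", "gpu"))  # tensorflow/tensorflow:latest-gpu-jupyter
```

Sorting the variants means callers can pass them in any order and still get the hyphenated form used in the docker pull examples.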
Version issues: Keras and TensorFlow version correspondence
For the correspondence between Keras and TensorFlow versions, see the following table:
| Framework | Env name (--env parameter) | Description | Docker Image | Packages and Nvidia Settings |
|---|---|---|---|---|
| TensorFlow 1.14 | tensorflow-1.14 | TensorFlow 1.14.0 + Keras 2.2.5 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.14 |
| TensorFlow 1.13 | tensorflow-1.13 | TensorFlow 1.13.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.13 |
| TensorFlow 1.12 | tensorflow-1.12 | TensorFlow 1.12.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.12 |
| | tensorflow-1.12:py2 | TensorFlow 1.12.0 + Keras 2.2.4 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.11 | tensorflow-1.11 | TensorFlow 1.11.0 + Keras 2.2.4 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.11 |
| | tensorflow-1.11:py2 | TensorFlow 1.11.0 + Keras 2.2.4 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.10 | tensorflow-1.10 | TensorFlow 1.10.0 + Keras 2.2.0 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.10 |
| | tensorflow-1.10:py2 | TensorFlow 1.10.0 + Keras 2.2.0 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.9 | tensorflow-1.9 | TensorFlow 1.9.0 + Keras 2.2.0 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.9 |
| | tensorflow-1.9:py2 | TensorFlow 1.9.0 + Keras 2.2.0 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.8 | tensorflow-1.8 | TensorFlow 1.8.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.8 |
| | tensorflow-1.8:py2 | TensorFlow 1.8.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.7 | tensorflow-1.7 | TensorFlow 1.7.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.7 |
| | tensorflow-1.7:py2 | TensorFlow 1.7.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.5 | tensorflow-1.5 | TensorFlow 1.5.0 + Keras 2.1.6 on Python 3.6. | floydhub/tensorflow | TensorFlow-1.5 |
| | tensorflow-1.5:py2 | TensorFlow 1.5.0 + Keras 2.1.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.4 | tensorflow-1.4 | TensorFlow 1.4.0 + Keras 2.0.8 on Python 3.6. | floydhub/tensorflow | |
| | tensorflow-1.4:py2 | TensorFlow 1.4.0 + Keras 2.0.8 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.3 | tensorflow-1.3 | TensorFlow 1.3.0 + Keras 2.0.6 on Python 3.6. | floydhub/tensorflow | |
| | tensorflow-1.3:py2 | TensorFlow 1.3.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.2 | tensorflow-1.2 | TensorFlow 1.2.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow | |
| | tensorflow-1.2:py2 | TensorFlow 1.2.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.1 | tensorflow | TensorFlow 1.1.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow | |
| | tensorflow:py2 | TensorFlow 1.1.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 1.0 | tensorflow-1.0 | TensorFlow 1.0.0 + Keras 2.0.6 on Python 3.5. | floydhub/tensorflow | |
| | tensorflow-1.0:py2 | TensorFlow 1.0.0 + Keras 2.0.6 on Python 2. | floydhub/tensorflow | |
| TensorFlow 0.12 | tensorflow-0.12 | TensorFlow 0.12.1 + Keras 1.2.2 on Python 3.5. | floydhub/tensorflow | |
| | tensorflow-0.12:py2 | TensorFlow 0.12.1 + Keras 1.2.2 on Python 2. | floydhub/tensorflow | |
| PyTorch 1.1 | pytorch-1.1 | PyTorch 1.1.0 + fastai 1.0.57 on Python 3.6. | floydhub/pytorch | PyTorch-1.1 |
| PyTorch 1.0 | pytorch-1.0 | PyTorch 1.0.0 + fastai 1.0.51 on Python 3.6. | floydhub/pytorch | PyTorch-1.0 |
| | pytorch-1.0:py2 | PyTorch 1.0.0 on Python 2. | floydhub/pytorch | |
| PyTorch 0.4 | pytorch-0.4 | PyTorch 0.4.1 on Python 3.6. | floydhub/pytorch | PyTorch-0.4 |
| | pytorch-0.4:py2 | PyTorch 0.4.1 on Python 2. | floydhub/pytorch | |
| PyTorch 0.3 | pytorch-0.3 | PyTorch 0.3.1 on Python 3.6. | floydhub/pytorch | PyTorch-0.3 |
| | pytorch-0.3:py2 | PyTorch 0.3.1 on Python 2. | floydhub/pytorch | |
| PyTorch 0.2 | pytorch-0.2 | PyTorch 0.2.0 on Python 3.5. | floydhub/pytorch | |
| | pytorch-0.2:py2 | PyTorch 0.2.0 on Python 2. | floydhub/pytorch | |
| PyTorch 0.1 | pytorch-0.1 | PyTorch 0.1.12 on Python 3. | floydhub/pytorch | |
| | pytorch-0.1:py2 | PyTorch 0.1.12 on Python 2. | floydhub/pytorch | |
| Theano 0.9 | theano-0.9 | Theano rel-0.8.2 + Keras 2.0.3 on Python 3.5. | floydhub/theano | |
| | theano-0.9:py2 | Theano rel-0.8.2 + Keras 2.0.3 on Python 2. | floydhub/theano | |
| Caffe | caffe | Caffe rc4 on Python 3.5. | floydhub/caffe | |
| | caffe:py2 | Caffe rc4 on Python 2. | floydhub/caffe | |
| Torch | torch | Torch 7 with Python 3 env. | floydhub/torch | |
| | torch:py2 | Torch 7 with Python 2 env. | floydhub/torch | |
| Chainer 1.23 | chainer-1.23 | Chainer 1.23.0 on Python 3. | floydhub/chainer | |
| | chainer-1.23:py2 | Chainer 1.23.0 on Python 2. | floydhub/chainer | |
| Chainer 2.0 | chainer-2.0 | Chainer 1.23.0 on Python 3. | floydhub/chainer | |
| | chainer-2.0:py2 | Chainer 1.23.0 on Python 2. | floydhub/chainer | |
| MxNet 1.0 | mxnet | MxNet 1.0.0 on Python 3.6. | floydhub/mxnet | |
| | mxnet:py2 | MxNet 1.0.0 on Python 2. | floydhub/mxnet | |
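To fill in the xxxx in pip install keras==xxxx from the version notes above, the TensorFlow rows of this table can be turned into a lookup. A minimal sketch; keras_pin_for is a hypothetical helper, and the mapping only records the pairings FloydHub shipped in its images, not an official compatibility matrix. TensorFlow 2.x is not in the table; as noted above, newer images already include Keras as tf.keras.

```python
from __future__ import annotations

# TensorFlow major.minor -> Keras version, copied from the FloydHub table above
TF_TO_KERAS = {
    "1.14": "2.2.5",
    "1.13": "2.2.4",
    "1.12": "2.2.4",
    "1.11": "2.2.4",
    "1.10": "2.2.0",
    "1.9": "2.2.0",
    "1.8": "2.1.6",
    "1.7": "2.1.6",
    "1.5": "2.1.6",
    "1.4": "2.0.8",
    "1.3": "2.0.6",
    "1.2": "2.0.6",
    "1.1": "2.0.6",
    "1.0": "2.0.6",
    "0.12": "1.2.2",
}


def keras_pin_for(tf_version: str) -> str:
    """Return a pip requirement such as 'keras==2.2.4' for TensorFlow '1.12.0'."""
    # Only major.minor matters for the lookup: "1.12.0" -> "1.12"
    major_minor = ".".join(tf_version.split(".")[:2])
    if major_minor not in TF_TO_KERAS:
        raise KeyError(f"TensorFlow {tf_version} is not listed in the table")
    return f"keras=={TF_TO_KERAS[major_minor]}"


if __name__ == "__main__":
    print(keras_pin_for("1.12.0"))  # keras==2.2.4
```

Inside a container, the string to pass in is whatever tf.__version__ reports.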
