GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/guide/gpu.ipynb

Kernel: Python 3

Copyright 2018 The TensorFlow Authors.

In [ ]:

#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

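Use a GPU

TensorFlow code and tf.keras models run transparently on a single GPU with no code changes required.

Note: Use tf.config.list_physical_devices('GPU') to confirm that TensorFlow is using the GPU.

The simplest way to run on multiple GPUs, on one or many machines, is to use distribution strategies.

This guide is for users who have tried these approaches and found that they need fine-grained control over how TensorFlow uses the GPU. To learn how to debug performance issues in single-GPU and multi-GPU scenarios, see the Optimize TensorFlow GPU Performance guide.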

Setup

Ensure you have the latest TensorFlow GPU release installed.

In [ ]:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

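Overview

TensorFlow supports running computations on a variety of device types, including CPU and GPU. Devices are represented by string identifiers, for example:

"/device:CPU:0": The CPU of your machine.
"/GPU:0": Shorthand notation for the first GPU of your machine that is visible to TensorFlow.
"/job:localhost/replica:0/task:0/device:GPU:1": Fully qualified name of the second GPU of your machine that is visible to TensorFlow.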

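If a TensorFlow operation has both CPU and GPU implementations, the GPU device is prioritized by default when the operation is assigned. For example, tf.matmul has both CPU and GPU kernels, so on a system with devices CPU:0 and GPU:0, the GPU:0 device is selected to run tf.matmul unless you explicitly request another device.

If a TensorFlow operation has no corresponding GPU implementation, the operation falls back to the CPU device. For example, since tf.cast only has a CPU kernel, on a system with devices CPU:0 and GPU:0, the CPU:0 device is selected to run tf.cast, even if it is requested to run on the GPU:0 device.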
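
To make the relationship between the shorthand and fully qualified device names in this section concrete, here is a minimal sketch in plain Python. The helper expand_device_name is hypothetical and written only for illustration; it is not TensorFlow's own parser, and it assumes the default localhost job with replica 0 and task 0.

```python
def expand_device_name(short_name, job="localhost", replica=0, task=0):
  """Expand a shorthand such as "/GPU:0" into a fully qualified device name.

  Hypothetical helper for illustration only; the job, replica and task
  defaults are assumptions that match a single local machine.
  """
  name = short_name.strip("/")
  # Accept both "/GPU:0" and "/device:GPU:0".
  if name.startswith("device:"):
    name = name[len("device:"):]
  device_type, _, index = name.partition(":")
  if not device_type or not index.isdigit():
    raise ValueError(f"Unrecognized device name: {short_name!r}")
  return (f"/job:{job}/replica:{replica}/task:{task}"
          f"/device:{device_type.upper()}:{int(index)}")


print(expand_device_name("/GPU:1"))
print(expand_device_name("/device:CPU:0"))
```

In real programs you would normally pass the shorthand directly to TensorFlow, which resolves it for you.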

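Logging device placement

To find out which devices your operations and tensors are assigned to, put tf.debugging.set_log_device_placement(True) as the first statement of your program. Enabling device placement logging causes any tensor allocations or operations to be printed.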

In [ ]:

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

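On a machine with a GPU, the code above logs that the MatMul op was executed on GPU:0.

Manual device placement

If you would like a particular operation to run on a device of your choice instead of the automatically selected one, you can use with tf.device to create a device context. All operations within that context run on the same designated device.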

In [ ]:

tf.debugging.set_log_device_placement(True)

# Place tensors on the CPU
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

c = tf.matmul(a, b)
print(c)

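You will see that a and b are now assigned to CPU:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime chooses one based on the operation and the available devices (GPU:0 in this example) and automatically copies tensors between devices if required.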
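
Besides reading the placement log, you can inspect where each eager tensor ended up through its device attribute. The sketch below repeats the example above and prints the device of each tensor; the device reported for c depends on the hardware available, so no particular output is assumed for it.

```python
import tensorflow as tf

# Pin the inputs to the CPU, as in the example above.
with tf.device('/CPU:0'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# No device is requested here, so the runtime picks one.
c = tf.matmul(a, b)

for name, tensor in (("a", a), ("b", b), ("c", c)):
  # In eager mode, .device holds the fully qualified device name.
  print(name, "->", tensor.device)
```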

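Limiting GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation. To limit TensorFlow to a specific set of GPUs, use the tf.config.set_visible_devices method.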

In [ ]:

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
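
The CUDA_VISIBLE_DEVICES variable mentioned above can also be set from Python, as long as this happens before the GPU runtime is initialized (in practice, before TensorFlow is imported). The following is a minimal sketch; restrict_visible_gpus is a hypothetical helper name, not a TensorFlow API.

```python
import os


def restrict_visible_gpus(indices):
  """Expose only the given GPU indices to CUDA (hypothetical helper).

  Must run before TensorFlow (or any other CUDA library) initializes the GPU.
  """
  value = ",".join(str(i) for i in indices)
  os.environ["CUDA_VISIBLE_DEVICES"] = value
  return value


visible = restrict_visible_gpus([0])
print("CUDA_VISIBLE_DEVICES =", visible)

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

Unlike tf.config.set_visible_devices, this approach hides the other GPUs from the whole process, not only from TensorFlow.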

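In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two methods to control this.

The first option is to turn on memory growth by calling tf.config.experimental.set_memory_growth, which attempts to allocate only as much GPU memory as the runtime allocations need: it starts out allocating very little memory, and as the program runs and more GPU memory is needed, the GPU memory region of the TensorFlow process is extended. Memory is not released, since doing so can lead to memory fragmentation. To turn on memory growth for a specific GPU, use the following code before allocating any tensors or executing any ops.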

In [ ]:

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

Another way to enable this option is to set the environment variable TF_FORCE_GPU_ALLOW_GROWTH to true. This configuration is platform specific.
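
As a minimal sketch of the environment-variable route, the variable can be set at the top of the script, on the assumption that this runs before TensorFlow is imported and the GPUs are initialized. It can equally be exported in the shell that launches the program.

```python
import os

# Request on-demand GPU memory growth for every GPU in this process.
# This must happen before TensorFlow initializes the GPUs.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

growth_enabled = os.environ.get("TF_FORCE_GPU_ALLOW_GROWTH") == "true"
print("Memory growth requested via environment:", growth_enabled)

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```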

The second method is to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU.

In [ ]:

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

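This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process. It is common practice for local development when the GPU is shared with other applications, such as a workstation GUI.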
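
One way to check how much memory the process is actually using under either setting is to query the runtime. The sketch below assumes a TensorFlow version that provides tf.config.experimental.get_memory_info; report_gpu_memory is a hypothetical helper, and on a machine without GPUs it simply returns an empty dictionary.

```python
import tensorflow as tf


def report_gpu_memory():
  """Return {device name: memory info} for each logical GPU (hypothetical helper)."""
  usage = {}
  for index, _ in enumerate(tf.config.list_logical_devices('GPU')):
    device = f"GPU:{index}"
    # The returned dict reports current and peak usage in bytes.
    usage[device] = tf.config.experimental.get_memory_info(device)
  return usage


memory_report = report_gpu_memory()
for device, info in memory_report.items():
  print(device, "current bytes:", info["current"], "peak bytes:", info["peak"])
```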

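Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: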

In [ ]:

tf.debugging.set_log_device_placement(True)

try:
  # Specify an invalid GPU device
  with tf.device('/device:GPU:2'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
except RuntimeError as e:
  print(e)

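If the device you have specified does not exist, you will get a RuntimeError: .../device:GPU:2 unknown device.

If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can call tf.config.set_soft_device_placement(True).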

In [ ]:

tf.config.set_soft_device_placement(True)
tf.debugging.set_log_device_placement(True)

# Creates some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)
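
To see the fallback in action, soft placement can be combined with a request for a device that may not exist. This is a minimal sketch that assumes the machine has fewer than three GPUs, so /device:GPU:2 is unavailable; with soft placement enabled, the runtime is expected to pick another device instead of raising an error.

```python
import tensorflow as tf

tf.config.set_soft_device_placement(True)

# Request a GPU that is assumed not to exist on this machine.
with tf.device('/device:GPU:2'):
  a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
  b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
  c = tf.matmul(a, b)

# The reported device shows where the op actually ran.
print("MatMul ran on:", c.device)
print(c.numpy())
```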

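Using multiple GPUs

Developing for multiple GPUs allows a model to scale with the additional resources. If you are developing on a system with a single GPU, you can simulate multiple GPUs with virtual devices. This enables easy testing of multi-GPU setups without requiring additional resources.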

In [ ]:

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Create 2 virtual GPUs with 1GB memory each
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

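Once there are multiple logical GPUs available to the runtime, you can utilize them with tf.distribute.Strategy or with manual placement.

With tf.distribute.Strategy

The best practice for using multiple GPUs is to use tf.distribute.Strategy. Here is a simple example: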

In [ ]:

tf.debugging.set_log_device_placement(True)
gpus = tf.config.list_logical_devices('GPU')
strategy = tf.distribute.MirroredStrategy(gpus)
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

This program runs a copy of the model on each GPU and splits the input data between them, an approach known as "data parallelism".
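
Because each global batch is split across the replicas, the batch size is usually scaled with the number of replicas. The sketch below illustrates this under stated assumptions: the per-replica batch size of 8 and the synthetic data are made up for the example, and on a machine without GPUs the strategy falls back to a single replica.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Assumed per-replica batch size, chosen only for illustration.
per_replica_batch_size = 8
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Synthetic data: 64 examples with one feature each.
features = tf.random.uniform((64, 1))
labels = 2.0 * features
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.batch(global_batch_size)

# Each global batch is divided among the replicas of the strategy.
dist_dataset = strategy.experimental_distribute_dataset(dataset)

print("Replicas in sync:", strategy.num_replicas_in_sync)
print("Global batch size:", global_batch_size)
```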

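For more information about distribution strategies, see the distributed training guide.

Manual placement

tf.distribute.Strategy works under the hood by replicating computation across devices. You can implement replication manually by constructing your model on each GPU. For example: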

In [ ]:

tf.debugging.set_log_device_placement(True)

gpus = tf.config.list_logical_devices('GPU')
if gpus:
  # Replicate your computation on multiple GPUs
  c = []
  for gpu in gpus:
    with tf.device(gpu.name):
      a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
      b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
      c.append(tf.matmul(a, b))

  with tf.device('/CPU:0'):
    matmul_sum = tf.add_n(c)

  print(matmul_sum)
