tensorflow
GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/io/tutorials/azure.ipynb
Kernel: Python 3
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Azure Blob Storage with TensorFlow

Caution: In addition to Python packages, this notebook uses npm install --user to install packages. Be careful when running it locally.

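Overview

This tutorial shows how to read and write files on Azure Blob Storage with TensorFlow, through TensorFlow IO's Azure file system integration.

An Azure storage account is required to read and write files on Azure Blob Storage. The Azure Storage Key should be provided through an environment variable: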

os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'

The file name URI contains the storage account name and the container name:

azfs://<storage-account-name>/<container-name>/<path>
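The URI layout above can be sketched with two small helpers. This is only an illustration using the Python standard library: make_azure_uri and split_azure_uri are hypothetical names that do not exist in TensorFlow or tensorflow-io, and the account and container values are placeholders.

```python
from urllib.parse import urlsplit


def make_azure_uri(account, container, path, scheme="azfs"):
    """Build <scheme>://<account>/<container>/<path> (hypothetical helper)."""
    return "{}://{}/{}/{}".format(scheme, account, container, path.lstrip("/"))


def split_azure_uri(uri):
    """Split a URI of the form above into (account, container, path)."""
    parts = urlsplit(uri)
    # The first path segment is the container; the rest is the blob path.
    container, _, path = parts.path.lstrip("/").partition("/")
    return parts.netloc, container, path


uri = make_azure_uri("devstoreaccount1", "aztest", "hello.txt")
print(uri)                   # azfs://devstoreaccount1/aztest/hello.txt
print(split_azure_uri(uri))  # ('devstoreaccount1', 'aztest', 'hello.txt')
```

Keeping the account and container as separate arguments makes it harder to accidentally drop one of the two leading components of the URI.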

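In this tutorial, for demonstration purposes, you can optionally set up Azurite, an Azure Storage emulator. With the Azurite emulator, you can read and write files through the Azure Blob Storage interface with TensorFlow.

Setup and usage

Install the required packages, then restart the runtime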

try:
  %tensorflow_version 2.x
except Exception:
  pass

!pip install tensorflow-io
TensorFlow 2.x selected.
Collecting tensorflow-io
  Downloading https://files.pythonhosted.org/packages/c0/d0/c5d7adce72c6a6d7c9a59c062150f60b5404c706578a0922f7dc2835713c/tensorflow_io-0.12.0-cp36-cp36m-manylinux2010_x86_64.whl (20.1MB)
     |████████████████████████████████| 20.1MB 42.7MB/s
Requirement already satisfied: tensorflow<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow-io) (2.1.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)
Requirement already satisfied: tensorflow-estimator<2.2.0,>=2.1.0rc0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)
Requirement already satisfied: grpcio>=1.8.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.27.2)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.34.2)
Requirement already satisfied: google-pasta>=0.1.6 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.1.8)
Requirement already satisfied: tensorboard<2.2.0,>=2.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.1.0)
Requirement already satisfied: wrapt>=1.11.1 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.12.0)
Requirement already satisfied: scipy==1.4.1; python_version >= "3" in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.4.1)
Requirement already satisfied: protobuf>=3.8.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.11.3)
Requirement already satisfied: termcolor>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)
Requirement already satisfied: absl-py>=0.7.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.9.0)
Requirement already satisfied: keras-applications>=1.0.8 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.8)
Requirement already satisfied: astor>=0.6.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.8.1)
Requirement already satisfied: gast==0.2.2 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.2)
Requirement already satisfied: keras-preprocessing>=1.1.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.1.0)
Requirement already satisfied: numpy<2.0,>=1.16.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.18.1)
Requirement already satisfied: six>=1.12.0 in /tensorflow-2.1.0/python3.6 (from tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.14.0)
Requirement already satisfied: setuptools>=41.0.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (45.2.0)
Requirement already satisfied: markdown>=2.6.8 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.2.1)
Requirement already satisfied: google-auth<2,>=1.6.3 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.11.2)
Requirement already satisfied: requests<3,>=2.21.0 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.22.0)
Requirement already satisfied: werkzeug>=0.11.15 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.0.0)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /tensorflow-2.1.0/python3.6 (from tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.1)
Requirement already satisfied: h5py in /tensorflow-2.1.0/python3.6 (from keras-applications>=1.0.8->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.10.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0.0)
Requirement already satisfied: rsa<4.1,>=3.1.4 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (4.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /tensorflow-2.1.0/python3.6 (from google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.2.8)
Requirement already satisfied: idna<2.9,>=2.5 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (2019.11.28)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /tensorflow-2.1.0/python3.6 (from requests<3,>=2.21.0->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.25.8)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /tensorflow-2.1.0/python3.6 (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (1.3.0)
Requirement already satisfied: pyasn1>=0.1.3 in /tensorflow-2.1.0/python3.6 (from rsa<4.1,>=3.1.4->google-auth<2,>=1.6.3->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (0.4.8)
Requirement already satisfied: oauthlib>=3.0.0 in /tensorflow-2.1.0/python3.6 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.2.0,>=2.1.0->tensorflow<2.2.0,>=2.1.0->tensorflow-io) (3.1.0)
Installing collected packages: tensorflow-io
Successfully installed tensorflow-io-0.12.0

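Install and set up Azurite (optional)

If an Azure Storage Account is not available, run the following commands to install and set up Azurite, which emulates the Azure Storage interface: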

!npm install azurite@2.7.0
npm WARN deprecated request@<version>: request has been deprecated, see https://github.com/request/request/issues/3142
npm WARN saveError ENOENT: no such file or directory, open '/content/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/content/package.json'
npm WARN content No description
npm WARN content No repository field.
npm WARN content No README data
npm WARN content No license field.

+ azurite@2.7.0
added 116 packages from 141 contributors in 6.591s
# The path for npm might not be exposed in the PATH env;
# you can find it through the 'npm bin' command.
npm_bin_path = get_ipython().getoutput('npm bin')[0]
print('npm bin path: ', npm_bin_path)

# Run `azurite-blob -s` as a background process.
# IPython doesn't recognize `&` in inline bash cells.
get_ipython().system_raw(npm_bin_path + '/' + 'azurite-blob -s &')
npm bin path: /content/node_modules/.bin
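Because the emulator is started as a background process, it may not be ready to accept connections immediately. A minimal sketch of a readiness check follows, using only the standard library. It assumes Azurite's blob service listens on 127.0.0.1:10000, which is stated here as an assumption and is not confirmed by this notebook; wait_for_port is a hypothetical helper name.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Assumption: the Azurite blob service uses 127.0.0.1:10000.
if wait_for_port("127.0.0.1", 10000, timeout=2.0):
    print("Emulator port is open")
else:
    print("Emulator port is not open yet")
```

An open port only shows that some process is listening; it does not prove that the process is Azurite.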

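Read and write files on Azure Storage with TensorFlow

The following is an example of reading and writing files on Azure Storage with TensorFlow's API.

Once the tensorflow-io package is imported, Azure Storage behaves the same way as other file systems in TensorFlow (for example, POSIX or GCS), because tensorflow-io automatically registers the azfs scheme for use.

The Azure Storage Key should be provided through the TF_AZURE_STORAGE_KEY environment variable. Otherwise, TF_AZURE_USE_DEV_STORAGE can be set to 1 to use the Azurite emulator instead: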

import os
import tensorflow as tf
# Imported for its side effect: registers the Azure file system with TensorFlow.
import tensorflow_io as tfio

# Switch to False to use Azure Storage instead:
use_emulator = True

if use_emulator:
  os.environ['TF_AZURE_USE_DEV_STORAGE'] = '1'
  account_name = 'devstoreaccount1'
else:
  # Replace <key> with Azure Storage Key, and <account> with Azure Storage Account
  os.environ['TF_AZURE_STORAGE_KEY'] = '<key>'
  account_name = '<account>'

  # Alternatively, you can use a shared access signature (SAS) to authenticate
  # with the Azure Storage Account. Uncomment to use it instead of the key:
  # os.environ['TF_AZURE_STORAGE_SAS'] = '<your sas>'
  # account_name = '<account>'
pathname = 'az://{}/aztest'.format(account_name)
tf.io.gfile.mkdir(pathname)

filename = pathname + '/hello.txt'
with tf.io.gfile.GFile(filename, mode='w') as w:
  w.write("Hello, world!")

with tf.io.gfile.GFile(filename, mode='r') as r:
  print(r.read())
Hello, world!
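Other tf.io.gfile calls such as exists, listdir, and remove follow the same pattern as the write and read above. The sketch below wraps them in a hypothetical helper, write_list_remove, that takes the gfile module as a parameter so it can be tried without TensorFlow installed. Whether every call is supported by the Azure file system in a given tensorflow-io version is an assumption, not something this notebook demonstrates.

```python
def write_list_remove(gfile, directory, name="hello.txt", text="Hello, world!"):
    """Write a file, read it back, list the directory, then delete the file.

    `gfile` is expected to be `tf.io.gfile`, or any object that provides the
    same `GFile`, `exists`, `listdir`, and `remove` members.
    """
    filename = directory + "/" + name
    with gfile.GFile(filename, mode="w") as w:
        w.write(text)
    with gfile.GFile(filename, mode="r") as r:
        content = r.read()
    listing = list(gfile.listdir(directory))
    gfile.remove(filename)
    # The last element reports whether the file still exists after removal.
    return content, listing, gfile.exists(filename)


# With TensorFlow and tensorflow-io imported as in the cells above:
#   content, listing, still_there = write_list_remove(tf.io.gfile, pathname)
```

Passing the module in, rather than importing TensorFlow inside the helper, keeps the helper independent of which file system backs the path.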

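Configurations

Azure Blob Storage in TensorFlow is always configured through environment variables. Below is a list of the available configurations:

  • TF_AZURE_USE_DEV_STORAGE: Set to 1 to use the local development storage emulator for connections like "az://devstoreaccount1/container/file.txt". This takes precedence over all other settings, so leave it unset to use any other connection

  • TF_AZURE_STORAGE_KEY: Account key for the storage account in use

  • TF_AZURE_STORAGE_USE_HTTP: Set to any value if you don't want to use https transfer. Leave it unset to use the default, https

  • TF_AZURE_STORAGE_BLOB_ENDPOINT: Set to the endpoint of blob storage - the default is .core.windows.net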
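The precedence described in the list can be sketched as a small function that summarizes which settings would apply for a given environment. This is an illustration of the documented rules only, not how tensorflow-io reads these variables internally. describe_azure_config is a hypothetical helper, and two details are assumptions: that the emulator is selected only when the value is exactly 1, and that "set to any value" means the variable is present.

```python
import os

DEFAULT_ENDPOINT = ".core.windows.net"


def describe_azure_config(env=None):
    """Summarize the Azure settings implied by a mapping of environment variables."""
    env = os.environ if env is None else env
    # Assumption: the emulator is selected only by the exact value "1".
    if env.get("TF_AZURE_USE_DEV_STORAGE") == "1":
        # Dev storage takes precedence over every other setting.
        return {"mode": "emulator"}
    return {
        "mode": "azure",
        "has_key": "TF_AZURE_STORAGE_KEY" in env,
        # Assumption: the variable being present at all counts as "set".
        "protocol": "http" if "TF_AZURE_STORAGE_USE_HTTP" in env else "https",
        "endpoint": env.get("TF_AZURE_STORAGE_BLOB_ENDPOINT", DEFAULT_ENDPOINT),
    }


print(describe_azure_config({"TF_AZURE_USE_DEV_STORAGE": "1",
                             "TF_AZURE_STORAGE_KEY": "x"}))
# {'mode': 'emulator'}
print(describe_azure_config({"TF_AZURE_STORAGE_KEY": "x"}))
# {'mode': 'azure', 'has_key': True, 'protocol': 'https', 'endpoint': '.core.windows.net'}
```

Calling describe_azure_config() with no argument inspects the current process environment, which can help spot a leftover TF_AZURE_USE_DEV_STORAGE before switching to a real storage account.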