GitHub Repository: tensorflow/docs-l10n
Path: blob/master/site/zh-cn/hub/tutorials/cropnet_on_device.ipynb

Kernel: Python 3

Copyright 2021 The TensorFlow Hub Authors.

Licensed under the Apache License, Version 2.0 (the "License");

In [ ]:

#@title Copyright 2021 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

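Fine-tuning models for plant disease detection

This notebook shows how to fine-tune a CropNet model from TensorFlow Hub on a dataset from TFDS or on your own crop disease detection dataset.

You will:

Load the TFDS cassava dataset or your own data
Enrich the data with unknown (negative) examples to get a more robust model
Apply image augmentations to the data
Load and fine-tune a CropNet model from TF Hub
Export a TFLite model that can be deployed in your app with Task Library, MLKit, or TFLite directly

Imports and dependencies

Before you start, install the required dependencies, such as Model Maker and TensorFlow Datasets. The cell below pins several package versions for compatibility.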

In [ ]:

!sudo apt install -q libportaudio2
## image_classifier library requires numpy <= 1.23.5
!pip install "numpy<=1.23.5"
!pip install --use-deprecated=legacy-resolver tflite-model-maker-nightly
!pip install -U tensorflow-datasets
## scann library requires tensorflow < 2.9.0
!pip install "tensorflow<2.9.0"
!pip install "tensorflow-datasets~=4.8.0"  # protobuf>=3.12.2
!pip install tensorflow-metadata~=1.10.0  # protobuf>=3.13
## tensorflowjs requires packaging < 20.10
!pip install "packaging<20.10"

In [ ]:

import matplotlib.pyplot as plt
import os
import seaborn as sns

import tensorflow as tf
import tensorflow_datasets as tfds

from tensorflow_examples.lite.model_maker.core.export_format import ExportFormat
from tensorflow_examples.lite.model_maker.core.task import image_preprocessing

from tflite_model_maker import image_classifier
from tflite_model_maker import ImageClassifierDataLoader
from tflite_model_maker.image_classifier import ModelSpec

Load a TFDS dataset to fine-tune on

We use the publicly available Cassava Leaf Disease dataset from TFDS.

In [ ]:

tfds_name = 'cassava'
(ds_train, ds_validation, ds_test), ds_info = tfds.load(
    name=tfds_name,
    split=['train', 'validation', 'test'],
    with_info=True,
    as_supervised=True)
TFLITE_NAME_PREFIX = tfds_name

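Or alternatively load your own data to fine-tune on

Instead of using a TFDS dataset, you can also train on your own data. The following code snippet shows how to load your own custom dataset. For the supported data structure, see the linked documentation. The example provided here uses the publicly available Cassava Leaf Disease dataset.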

In [ ]:

# data_root_dir = tf.keras.utils.get_file(
#     'cassavaleafdata.zip',
#     'https://storage.googleapis.com/emcassavadata/cassavaleafdata.zip',
#     extract=True)
# data_root_dir = os.path.splitext(data_root_dir)[0]  # Remove the .zip extension

# builder = tfds.ImageFolder(data_root_dir)

# ds_info = builder.info
# ds_train = builder.as_dataset(split='train', as_supervised=True)
# ds_validation = builder.as_dataset(split='validation', as_supervised=True)
# ds_test = builder.as_dataset(split='test', as_supervised=True)
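
Before pointing tfds.ImageFolder at your own data, it can help to check that the directory follows the root/<split>/<label>/<image file> layout that the builder reads. The helper below is a minimal sketch using only the standard library; summarize_image_folder is a hypothetical name introduced for illustration and is not part of TFDS.

```python
from pathlib import Path


def summarize_image_folder(root_dir):
    """Counts files per split and label under root_dir/<split>/<label>/.

    Hypothetical helper, not part of TFDS. It only inspects the directory
    tree and does not check that the files are valid images.
    """
    summary = {}
    for split_dir in sorted(Path(root_dir).iterdir()):
        if not split_dir.is_dir():
            continue
        summary[split_dir.name] = {
            label_dir.name: sum(1 for f in label_dir.iterdir() if f.is_file())
            for label_dir in sorted(split_dir.iterdir())
            if label_dir.is_dir()
        }
    return summary


# Usage, assuming data_root_dir was set as in the commented cell above:
# print(summarize_image_folder(data_root_dir))
```

If a split such as validation is missing from the summary, the corresponding builder.as_dataset(split=...) call above has nothing to load.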

Visualize samples from the train split

Let's look at a few examples from the dataset, shown with the class id and class name of each image sample.

In [ ]:

_ = tfds.show_examples(ds_train, ds_info)

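Add images to be used as unknown examples from TFDS datasets

Add additional unknown (negative) examples to the training dataset and assign them a new unknown class label number. The goal is a model that, when used in practice (for example, in the field), has the option of predicting "Unknown" when it sees something unexpected.

Below is a list of datasets that will be used to sample the additional unknown images. It includes 3 completely different datasets to increase diversity. One of them is a bean leaf disease dataset, so that the model is exposed to diseased plants other than cassava.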

In [ ]:

UNKNOWN_TFDS_DATASETS = [{
    'tfds_name': 'imagenet_v2/matched-frequency',
    'train_split': 'test[:80%]',
    'test_split': 'test[80%:]',
    'num_examples_ratio_to_normal': 1.0,
}, {
    'tfds_name': 'oxford_flowers102',
    'train_split': 'train',
    'test_split': 'test',
    'num_examples_ratio_to_normal': 1.0,
}, {
    'tfds_name': 'beans',
    'train_split': 'train',
    'test_split': 'test',
    'num_examples_ratio_to_normal': 1.0,
}]

The UNKNOWN datasets are also loaded from TFDS.

In [ ]:

# Load unknown datasets.
weights = [
    spec['num_examples_ratio_to_normal'] for spec in UNKNOWN_TFDS_DATASETS
]
num_unknown_train_examples = sum(
    int(w * ds_train.cardinality().numpy()) for w in weights)
ds_unknown_train = tf.data.Dataset.sample_from_datasets([
    tfds.load(
        name=spec['tfds_name'], split=spec['train_split'],
        as_supervised=True).repeat(-1) for spec in UNKNOWN_TFDS_DATASETS
], weights).take(num_unknown_train_examples)
ds_unknown_train = ds_unknown_train.apply(
    tf.data.experimental.assert_cardinality(num_unknown_train_examples))
ds_unknown_tests = [
    tfds.load(
        name=spec['tfds_name'], split=spec['test_split'], as_supervised=True)
    for spec in UNKNOWN_TFDS_DATASETS
]
ds_unknown_test = ds_unknown_tests[0]
for ds in ds_unknown_tests[1:]:
  ds_unknown_test = ds_unknown_test.concatenate(ds)

# All examples from the unknown datasets will get a new class label number.
num_normal_classes = len(ds_info.features['label'].names)
unknown_label_value = tf.convert_to_tensor(num_normal_classes, tf.int64)
ds_unknown_train = ds_unknown_train.map(lambda image, _:
                                        (image, unknown_label_value))
ds_unknown_test = ds_unknown_test.map(lambda image, _:
                                      (image, unknown_label_value))

# Merge the normal train dataset with the unknown train dataset.
weights = [
    ds_train.cardinality().numpy(),
    ds_unknown_train.cardinality().numpy()
]
ds_train_with_unknown = tf.data.Dataset.sample_from_datasets(
    [ds_train, ds_unknown_train], [float(w) for w in weights])
ds_train_with_unknown = ds_train_with_unknown.apply(
    tf.data.experimental.assert_cardinality(sum(weights)))

print((f"Added {ds_unknown_train.cardinality().numpy()} negative examples. "
       f"Training dataset now has {ds_train_with_unknown.cardinality().numpy()}"
       ' examples in total.'))
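
The sampling arithmetic in the cell above can be traced with plain Python. The sketch below mirrors the sum(int(w * cardinality) ...) expression; the training-set size of 1000 is a made-up number chosen only to keep the arithmetic readable, and count_unknown_examples is a hypothetical helper name.

```python
def count_unknown_examples(num_normal_train, ratios):
    """Mirrors sum(int(w * cardinality) for w in weights) from the cell above."""
    return sum(int(ratio * num_normal_train) for ratio in ratios)


# Hypothetical training-set size, used only for illustration.
num_normal_train = 1000
# One num_examples_ratio_to_normal entry per unknown dataset in the list above.
ratios = [1.0, 1.0, 1.0]

num_unknown = count_unknown_examples(num_normal_train, ratios)
total = num_normal_train + num_unknown

print(num_unknown, total, num_unknown / total)  # prints: 3000 4000 0.75
```

With all three ratios set to 1.0, unknown examples make up three quarters of the merged training set. Lowering num_examples_ratio_to_normal is the knob to turn if that balance does not suit your data.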

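Apply augmentations

To make the images more diverse, you'll apply some augmentations, such as changes in:

Brightness
Contrast
Saturation
Hue
Crop

These types of augmentation help make the model more robust to variations in image inputs.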

In [ ]:

def random_crop_and_random_augmentations_fn(image):
  # preprocess_for_train does random crop and resize internally.
  image = image_preprocessing.preprocess_for_train(image)
  image = tf.image.random_brightness(image, 0.2)
  image = tf.image.random_contrast(image, 0.5, 2.0)
  image = tf.image.random_saturation(image, 0.75, 1.25)
  image = tf.image.random_hue(image, 0.1)
  return image


def random_crop_fn(image):
  # preprocess_for_train does random crop and resize internally.
  image = image_preprocessing.preprocess_for_train(image)
  return image


def resize_and_center_crop_fn(image):
  image = tf.image.resize(image, (256, 256))
  image = image[16:240, 16:240]
  return image


no_augment_fn = lambda image: image

train_augment_fn = lambda image, label: (
    random_crop_and_random_augmentations_fn(image), label)
eval_augment_fn = lambda image, label: (resize_and_center_crop_fn(image), label)
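
To make the numbers in the cell above concrete, here is a minimal pure-Python sketch of two of the operations. It assumes the documented behavior of tf.image.adjust_contrast, which rescales each value around the channel mean as (x - mean) * factor + mean, and it shows why the evaluation crop uses the indices 16:240. The function names are hypothetical and operate on plain lists rather than tensors.

```python
def center_crop_bounds(resized_size, target_size):
    """Returns (start, stop) indices for a centered crop along one axis."""
    offset = (resized_size - target_size) // 2
    return offset, offset + target_size


def adjust_contrast(channel_values, factor):
    """Scales values around their mean: (x - mean) * factor + mean.

    No clipping is applied here, so results can leave the input range.
    """
    mean = sum(channel_values) / len(channel_values)
    return [(x - mean) * factor + mean for x in channel_values]


def adjust_brightness(channel_values, delta):
    """Adds the same delta to every value."""
    return [x + delta for x in channel_values]


print(center_crop_bounds(256, 224))                # prints: (16, 240)
print(adjust_contrast([50.0, 100.0, 150.0], 2.0))  # prints: [0.0, 100.0, 200.0]
print(adjust_contrast([50.0, 100.0, 150.0], 0.5))  # prints: [75.0, 100.0, 125.0]
```

In the training function, tf.image.random_contrast(image, 0.5, 2.0) draws the factor at random between those two bounds, so each image is stretched or flattened by a different amount every time it is seen.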

The augmentations are applied with the map method of the Dataset class.

In [ ]:

ds_train_with_unknown = ds_train_with_unknown.map(train_augment_fn)
ds_validation = ds_validation.map(eval_augment_fn)
ds_test = ds_test.map(eval_augment_fn)
ds_unknown_test = ds_unknown_test.map(eval_augment_fn)

Wrap the data into a Model Maker friendly format

To use these datasets with Model Maker, they need to be wrapped in the ImageClassifierDataLoader class.

In [ ]:

label_names = ds_info.features['label'].names + ['UNKNOWN']

train_data = ImageClassifierDataLoader(ds_train_with_unknown,
                                       ds_train_with_unknown.cardinality(),
                                       label_names)
validation_data = ImageClassifierDataLoader(ds_validation,
                                            ds_validation.cardinality(),
                                            label_names)
test_data = ImageClassifierDataLoader(ds_test, ds_test.cardinality(),
                                      label_names)
unknown_test_data = ImageClassifierDataLoader(ds_unknown_test,
                                              ds_unknown_test.cardinality(),
                                              label_names)
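
The label list passed to the data loaders has to line up with the label number given to the unknown examples earlier. The sketch below shows that relationship with plain lists; the five class names are an assumption about the cassava dataset's labels, and append_unknown_label is a hypothetical helper.

```python
def append_unknown_label(normal_label_names, unknown_name='UNKNOWN'):
    """Returns the extended label list and the index of the unknown class."""
    label_names = list(normal_label_names) + [unknown_name]
    unknown_index = len(normal_label_names)
    return label_names, unknown_index


# Assumed class names, for illustration only. In the notebook they come from
# ds_info.features['label'].names.
normal_label_names = ['cbb', 'cbsd', 'cgm', 'cmd', 'healthy']

label_names, unknown_index = append_unknown_label(normal_label_names)

# Reverse lookup from class name to label number.
name_to_index = {name: i for i, name in enumerate(label_names)}

print(unknown_index, label_names[unknown_index])  # prints: 5 UNKNOWN
```

Because the unknown class is appended last, its index equals the number of normal classes, which is the value stored in unknown_label_value when the unknown datasets were relabeled.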

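Run training

TensorFlow Hub has multiple models available for transfer learning.

Here you can choose one, and you can keep experimenting with other ones to try to get better results.

If you want even more models to try, you can add them from this collection.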

In [ ]:

#@title Choose a base model

model_name = 'mobilenet_v3_large_100_224'  #@param ['cropnet_cassava', 'cropnet_concat', 'cropnet_imagenet', 'mobilenet_v3_large_100_224']

map_model_name = {
    'cropnet_cassava':
        'https://tfhub.dev/google/cropnet/feature_vector/cassava_disease_V1/1',
    'cropnet_concat':
        'https://tfhub.dev/google/cropnet/feature_vector/concat/1',
    'cropnet_imagenet':
        'https://tfhub.dev/google/cropnet/feature_vector/imagenet/1',
    'mobilenet_v3_large_100_224':
        'https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/feature_vector/5',
}

model_handle = map_model_name[model_name]

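To fine-tune the model, you will use Model Maker. This makes the overall solution easier, because after training the model it also converts it to TFLite.

Model Maker handles the conversion and attaches the information needed to deploy the model on-device later.

The model spec is how you tell Model Maker which base model you'd like to use.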

In [ ]:

image_model_spec = ModelSpec(uri=model_handle)

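One important detail here is setting train_whole_model, which causes the base model to be fine-tuned during training. This makes the process slower, but the final model is usually more accurate. Setting shuffle ensures that the model sees the data in a randomly shuffled order, which is a best practice for model training.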

In [ ]:

model = image_classifier.create(
    train_data,
    model_spec=image_model_spec,
    batch_size=128,
    learning_rate=0.03,
    epochs=5,
    shuffle=True,
    train_whole_model=True,
    validation_data=validation_data)

Evaluate the model on the test split

In [ ]:

model.evaluate(test_data)

To understand the fine-tuned model better, it helps to analyze the confusion matrix. It shows how often one class is predicted as another.

In [ ]:

def predict_class_label_number(dataset):
  """Runs inference and returns predictions as class label numbers."""
  rev_label_names = {l: i for i, l in enumerate(label_names)}
  return [
      rev_label_names[o[0][0]]
      for o in model.predict_top_k(dataset, batch_size=128)
  ]

def show_confusion_matrix(cm, labels):
  plt.figure(figsize=(10, 8))
  sns.heatmap(cm, xticklabels=labels, yticklabels=labels, 
              annot=True, fmt='g')
  plt.xlabel('Prediction')
  plt.ylabel('Label')
  plt.show()

In [ ]:

confusion_mtx = tf.math.confusion_matrix(
    list(ds_test.map(lambda x, y: y)),
    predict_class_label_number(test_data),
    num_classes=len(label_names))

show_confusion_matrix(confusion_mtx, label_names)
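
As a reading aid for the heatmap, the sketch below builds a confusion matrix from plain Python lists, following the same convention as tf.math.confusion_matrix: rows are true labels and columns are predictions. The labels and predictions are invented for illustration.

```python
def confusion_matrix(labels, predictions, num_classes):
    """Counts (true label, predicted label) pairs. Rows are true labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(labels, predictions):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up labels and predictions for a 3-class problem.
labels = [0, 0, 1, 2, 2, 2]
predictions = [0, 1, 1, 2, 2, 0]

for row in confusion_matrix(labels, predictions, num_classes=3):
    print(row)
# prints:
# [1, 1, 0]
# [0, 1, 0]
# [1, 0, 2]
```

Correct predictions fall on the diagonal. The off-diagonal entry in row 0, column 1 means one example of class 0 was predicted as class 1.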

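Evaluate the model on unknown test data

In this evaluation we expect the model to reach an accuracy of almost 1. None of the images the model is tested on here are related to the normal dataset, so we expect the model to predict the "Unknown" class label for all of them.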

In [ ]:

model.evaluate(unknown_test_data)

Print the confusion matrix.

In [ ]:

unknown_confusion_mtx = tf.math.confusion_matrix(
    list(ds_unknown_test.map(lambda x, y: y)),
    predict_class_label_number(unknown_test_data),
    num_classes=len(label_names))

show_confusion_matrix(unknown_confusion_mtx, label_names)
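
Because every example in this test set carries the unknown label, only the last row of the matrix can be non-zero, and the recall of that row equals the accuracy reported by model.evaluate. The sketch below computes per-class recall from a matrix given as nested lists; the counts are invented and per_class_recall is a hypothetical helper.

```python
def per_class_recall(matrix):
    """Returns diagonal / row sum per class, or None for classes with no examples."""
    recalls = []
    for class_index, row in enumerate(matrix):
        row_total = sum(row)
        recalls.append(row[class_index] / row_total if row_total else None)
    return recalls


# Made-up 3-class matrix where the last class plays the role of UNKNOWN.
# Rows are true labels, columns are predictions.
unknown_only_matrix = [
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 18],
]

print(per_class_recall(unknown_only_matrix))  # prints: [None, None, 0.9]
```

The non-diagonal entries of the last row show which real classes the unknown images are mistaken for, which is often more informative than the single accuracy number.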

Export the model as TFLite and SavedModel

Now we can export the trained model in TFLite and SavedModel formats, for deployment on-device and for inference in TensorFlow.

In [ ]:

tflite_filename = f'{TFLITE_NAME_PREFIX}_model_{model_name}.tflite'
model.export(export_dir='.', tflite_filename=tflite_filename)

In [ ]:

# Export saved model version.
model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)

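Next steps

The model that you've just trained can be used on mobile devices and even deployed in the field!

To download the model, click the folder icon for the Files menu on the left side of the colab, and choose the download option.

The same technique used here can be applied to other plant disease tasks that might be more suitable for your use case, or to any other type of image classification task. If you want to follow up and deploy the model in an Android app, you can continue with this Android quickstart guide.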