Path: blob/master/guides/ipynb/keras_hub/semantic_segmentation_deeplab_v3.ipynb
3283 views
Semantic Segmentation with KerasHub
Authors: Sachin Prasad, Divyashree Sreepathihalli, Ian Stenbit
Date created: 2024/10/11
Last modified: 2024/10/22
Description: DeepLabV3 training and inference with KerasHub.
Background
Semantic segmentation is a type of computer vision task that involves assigning a class label such as "person", "bike", or "background" to each individual pixel of an image, effectively dividing the image into regions that correspond to different object classes or categories.
KerasHub offers the DeepLabv3, DeepLabv3+, SegFormer, etc., models for semantic segmentation.
This guide demonstrates how to fine-tune and use the DeepLabv3+ model, developed by Google for image semantic segmentation with KerasHub. Its architecture combines Atrous convolutions, contextual information aggregation, and powerful backbones to achieve accurate and detailed semantic segmentation.
DeepLabv3+ extends DeepLabv3 by adding a simple yet effective decoder module to refine the segmentation results, especially along object boundaries. Both models have achieved state-of-the-art results on a variety of image segmentation benchmarks.
References
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation Rethinking Atrous Convolution for Semantic Image Segmentation
Setup and Imports
Let's install the dependencies and import the necessary modules.
To run this tutorial, you will need to install the following packages:
keras-hub
keras
After installing keras
and keras-hub
, set the backend for keras
. This guide can be run with any backend (Tensorflow, JAX, PyTorch).
Perform semantic segmentation with a pretrained DeepLabv3+ model
The highest level API in the KerasHub semantic segmentation API is the keras_hub.models
API. This API includes fully pretrained semantic segmentation models, such as keras_hub.models.DeepLabV3ImageSegmenter
.
Let's get started by constructing a DeepLabv3 pretrained on the Pascal VOC dataset. Also, define the preprocessing function for the model to preprocess images and labels. Note: By default from_preset()
method in KerasHub loads the pretrained task weights with all the classes, 21 classes in this case.
Let us visualize the results of this pretrained model
Train a custom semantic segmentation model
In this guide, we'll assemble a full training pipeline for a KerasHub DeepLabV3 semantic segmentation model. This includes data loading, augmentation, training, metric evaluation, and inference!
Download the data
We download Pascal VOC 2012 dataset with additional annotations provided here Semantic contours from inverse detectors and split them into train dataset train_ds
and eval_ds
.
Load the dataset
For training and evaluation, let's use "sbd_train" and "sbd_eval." You can also choose any of these datasets for the load
function: 'train', 'eval', 'trainval', 'sbd_train', or 'sbd_eval'. 'sbd_train' represents the training dataset for the SBD dataset, while 'train' represents the training dataset for the VOC2012 dataset.
Preprocess the data
The preprocess_inputs utility function preprocesses inputs, converting them into a dictionary containing images and segmentation_masks. Both images and segmentation masks are resized to 512x512. The resulting dataset is then batched into groups of four image and segmentation mask pairs.
A batch of this preprocessed input training data can be visualized using the plot_images_masks
function. This function takes a batch of images and segmentation masks and prediction masks as input and displays them in a grid.
The preprocessing is applied to the evaluation dataset eval_ds
.
Data Augmentation
Keras provides a variety of image augmentation options. In this example, we will use the RandomFlip
augmentation to augment the training dataset. The RandomFlip
augmentation randomly flips the images in the training dataset horizontally or vertically. This can help to improve the model's robustness to changes in the orientation of the objects in the images.
Model Configuration
Please feel free to modify the configurations for model training and note how the training results changes. This is an great exercise to get a better understanding of the training pipeline.
The learning rate schedule is used by the optimizer to calculate the learning rate for each epoch. The optimizer then uses the learning rate to update the weights of the model. In this case, the learning rate schedule uses a cosine decay function. A cosine decay function starts high and then decreases over time, eventually reaching zero. The cardinality of the VOC dataset is 2124 with a batch size of 4. The dataset cardinality is important for learning rate decay because it determines how many steps the model will train for. The initial learning rate is proportional to 0.007 and the decay steps are 2124. This means that the learning rate will start at INITIAL_LR
and then decrease to zero over 2124 steps.
Let's take the resnet_50_imagenet
pretrained weights as a image encoder for the model, this implementation can be used both as DeepLabV3 and DeepLabV3+ with additional decoder block. For DeepLabV3+, we instantiate a DeepLabV3Backbone model by providing low_level_feature_key
as P2
a pyramid level output to extract features from resnet_50_imagenet
which acts as a decoder block. To use this model as DeepLabV3 architecture, ignore the low_level_feature_key
which defaults to None
.
Then we create DeepLabV3ImageSegmenter instance. The num_classes
parameter specifies the number of classes that the model will be trained to segment. preprocessor
argument to apply preprocessing to image input and masks.
Compile the model
The model.compile() function sets up the training process for the model. It defines the
optimization algorithm - Stochastic Gradient Descent (SGD)
the loss function - categorical cross-entropy
the evaluation metrics - Mean IoU and categorical accuracy
Semantic segmentation evaluation metrics:
Mean Intersection over Union (MeanIoU): MeanIoU measures how well a semantic segmentation model accurately identifies and delineates different objects or regions in an image. It calculates the overlap between predicted and actual object boundaries, providing a score between 0 and 1, where 1 represents a perfect match.
Categorical Accuracy: Categorical Accuracy measures the proportion of correctly classified pixels in an image. It gives a simple percentage indicating how accurately the model predicts the categories of pixels in the entire image.
In essence, MeanIoU emphasizes the accuracy of identifying specific object boundaries, while Categorical Accuracy gives a broad overview of overall pixel-level correctness.
The utility function dict_to_tuple
effectively transforms the dictionaries of training and validation datasets into tuples of images and one-hot encoded segmentation masks, which is used during training and evaluation of the DeepLabv3+ model.
Predictions with trained model
Now that the model training of DeepLabv3+ has completed, let's test it by making predications on a few sample images. Note: For demonstration purpose the model has been trained on only 1 epoch, for better accuracy and result train with more number of epochs.
Here are some additional tips for using the KerasHub DeepLabv3 model:
The model can be trained on a variety of datasets, including the COCO dataset, the PASCAL VOC dataset, and the Cityscapes dataset.
The model can be fine-tuned on a custom dataset to improve its performance on a specific task.
The model can be used to perform real-time inference on images.
Also, check out KerasHub's other segmentation models.