suyashi29
GitHub Repository: suyashi29/python-su
Path: blob/master/Generative AI for Intelligent Data Handling/Day 6 Convolutional Neural Networks (CNNs).ipynb
Kernel: Python 3 (ipykernel)

Convolutional Neural Networks (CNNs) are a specialized neural network architecture that excels at processing grid-like data, such as images and videos. They are designed to automatically and adaptively learn spatial hierarchies of features from input data.

Applications of CNNs:

  • Image Classification: CNNs are extensively used for tasks like object recognition and image classification. They can identify and classify objects within images into predefined categories.

  • Object Detection: CNNs can not only identify objects but also locate and outline them within an image. This is crucial for tasks like pedestrian detection in autonomous vehicles or face detection in image processing applications.

  • Semantic Segmentation: In semantic segmentation, CNNs assign a class label to each pixel in an image. This is useful for tasks like medical image analysis, where precise delineation of structures (e.g., tumors) is critical.

  • Instance Segmentation: This is a more advanced form of segmentation where, in addition to labeling pixels by category, each distinct instance of an object is uniquely identified. It's used in scenarios like robotics, where a robot must identify individual objects in a scene.

  • Object Recognition in Videos: CNNs can be applied to individual frames of a video stream to perform object recognition. This is a critical component in applications like video surveillance and action recognition.

  • Face Recognition: CNNs have proven to be highly effective in the field of face recognition. They can learn features that are distinctive to individual faces, enabling tasks like biometric authentication.

  • Style Transfer: CNNs can be used to alter the style of an image while preserving its content. This is popular in artistic applications where one might want to apply the style of a famous painting to a photograph.

  • Super-Resolution: CNNs can enhance the resolution of images, which is useful in scenarios like upscaling low-resolution images or enhancing the quality of medical images.

  • Medical Image Analysis: CNNs are widely used in medical imaging for tasks like tumor detection, organ segmentation, and disease classification.

  • Natural Language Processing (NLP): Although NLP is more commonly associated with recurrent networks, CNNs have also been used for tasks like text classification and sentiment analysis, particularly where local context is important.

The key differences between Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs):

| Feature | Recurrent Neural Networks (RNNs) | Convolutional Neural Networks (CNNs) |
| --- | --- | --- |
| Data Type | Sequential data (e.g., time series, natural language) | Grid-like data (e.g., images, spatial information) |
| Architecture | Recurrent connections capturing temporal dependencies | Convolutional layers for hierarchical feature extraction and parameter sharing |
| Memory Handling | Maintains hidden state to capture sequential dependencies | Focuses on local patterns and spatial relationships, less emphasis on memory |
| Parameter Sharing | Shared weights across time steps | Convolutional operations with shared weights across input |
| Applications | Natural Language Processing (NLP), time series analysis, speech recognition | Image classification, object detection, image segmentation |
| Challenges | Vanishing gradient problem for long-term dependencies | May struggle with capturing sequential dependencies |
| Typical Use Cases | Text generation, machine translation, speech recognition | Image classification, object detection, image segmentation |
| Example Libraries | Keras, PyTorch, TensorFlow | Keras, PyTorch, TensorFlow |
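
The parameter-sharing row can be made concrete with a little arithmetic. The sketch below assumes a 28x28 single-channel input and 32 filters of size 3x3 (figures chosen here purely for illustration) and compares the number of weights in a convolutional layer with a hypothetical fully connected layer that produces the same number of outputs.

```python
# Illustrative assumption: 28x28 grayscale input, 32 filters of size 3x3.
in_h, in_w, in_ch = 28, 28, 1
k_h, k_w, n_filters = 3, 3, 32

# Convolution: each filter reuses the same k_h*k_w*in_ch weights at every position.
conv_params = (k_h * k_w * in_ch + 1) * n_filters  # +1 for each filter's bias

# Output size with no padding and a stride of 1.
out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
n_outputs = out_h * out_w * n_filters

# Hypothetical dense layer connecting every input pixel to every output unit.
dense_params = (in_h * in_w * in_ch + 1) * n_outputs

print("Conv parameters: ", conv_params)
print("Dense parameters:", dense_params)
```

Because the same small kernel is applied at every position, the convolutional layer needs only a few hundred parameters, while the dense equivalent needs millions.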

It's important to note that while RNNs and CNNs have their specific strengths and applications, in some cases, hybrid models that combine both architectures are used to address tasks that require handling both sequential and spatial information.

Key components of a CNN:

  • Convolutional Layers: The core building blocks of CNNs are convolutional layers. These layers apply filters or "kernels" to small, overlapping regions of the input data. This allows the network to learn spatial hierarchies of features.

  • Feature Learning: CNNs automatically learn hierarchies of features from the input data. For example, in image processing, initial layers might learn to recognize edges, while deeper layers learn to recognize complex shapes or patterns.

  • Pooling Layers: These layers reduce the spatial dimensions (width and height) of the data volume, while keeping the depth unchanged. Common pooling operations include max pooling and average pooling.

  • Activation Functions: Non-linear activation functions (like ReLU - Rectified Linear Unit) are applied after each convolutional layer to introduce non-linearity into the model. This enables the network to learn more complex patterns.

  • Fully Connected Layers: After several convolutional and pooling layers, the network typically ends with one or more fully connected layers. These layers perform the high-level reasoning on the learned features.

  • Loss Function and Optimization: The choice of loss function depends on the task (classification, regression, etc.). Common choices include categorical cross-entropy for classification tasks. Optimization techniques like stochastic gradient descent (SGD) or more advanced variants like Adam are used to minimize the loss.

  • Backpropagation: CNNs are trained using backpropagation, where the gradients of the loss function with respect to the parameters are computed and used to update the weights of the network.

  • Transfer Learning: CNNs trained on large datasets for tasks like image recognition (e.g., ImageNet) are often used as a starting point for other tasks. This is called transfer learning.
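
Two of the components listed above, the ReLU activation and the categorical cross-entropy loss, are small enough to sketch directly in NumPy. The function names and the sample values below are illustrative assumptions, not part of any library API.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become 0."""
    return np.maximum(0, x)

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def categorical_cross_entropy(probs, one_hot_target):
    """Loss for a single example with a one-hot target."""
    eps = 1e-12  # avoids log(0)
    return -np.sum(one_hot_target * np.log(probs + eps))

features = np.array([-2.0, 0.5, 3.0])  # made-up pre-activation values
activated = relu(features)
probs = softmax(activated)
target = np.array([0.0, 0.0, 1.0])     # the true class is index 2
loss = categorical_cross_entropy(probs, target)

print("After ReLU:   ", activated)
print("Probabilities:", probs)
print("Loss:         ", loss)
```

During training, an optimizer such as SGD or Adam would adjust the weights to reduce this loss, using gradients computed by backpropagation.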

image.png

Mathematical Interpretation for CNN

Below is a simple implementation of a 2D convolution operation using pure Python:

def convolution2D(input_image, kernel):
    input_height, input_width = len(input_image), len(input_image[0])
    kernel_height, kernel_width = len(kernel), len(kernel[0])
    output_height = input_height - kernel_height + 1
    output_width = input_width - kernel_width + 1

    # Initialize the output image
    output_image = [[0 for _ in range(output_width)] for _ in range(output_height)]

    # Perform the convolution
    for i in range(output_height):
        for j in range(output_width):
            # Compute the dot product between the kernel and the input region
            dot_product = 0
            for m in range(kernel_height):
                for n in range(kernel_width):
                    dot_product += input_image[i+m][j+n] * kernel[m][n]
            output_image[i][j] = dot_product

    return output_image

# Define a sample 2D image and a 3x3 kernel
input_image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1]
]

# Perform convolution
output_image = convolution2D(input_image, kernel)

# Print the result
for row in output_image:
    print(row)

[-6, -6]
[-6, -6]

Steps for above example:

  • convolution2D is a function that takes an input image (a 2D list) and a kernel (another 2D list) as input. It performs the convolution operation and returns the resulting image.

  • input_image is a sample 4x4 input image, and kernel is a 3x3 kernel.

  • The convolution operation is performed manually using nested loops. For each position in the output image, the dot product between the kernel and the corresponding region of the input image is computed.
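
In symbols, the nested loops compute the sum below. The symbol names are chosen here for illustration: I is the input of size H x W, K is the kernel of size k_h x k_w, and O is the output. Like the convolution layers in most deep learning libraries, the code does not flip the kernel, so strictly speaking this is a cross-correlation.

```latex
O[i, j] = \sum_{m=0}^{k_h - 1} \sum_{n=0}^{k_w - 1} I[i + m,\; j + n] \, K[m, n],
\qquad 0 \le i \le H - k_h, \quad 0 \le j \le W - k_w
```

The output therefore has size (H - k_h + 1) x (W - k_w + 1), which is 2 x 2 for the 4x4 image and 3x3 kernel above. For the top-left entry, the kernel adds the first column of the region and subtracts the third: (1 + 5 + 9) - (3 + 7 + 11) = -6.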

Implementation using NumPy:

  • A full CNN in NumPy would need convolution, pooling, and fully connected layers, plus a forward pass through all of them. The example below implements only the first two building blocks, convolution and max pooling, and applies them to a small sample image:

import numpy as np
a = np.ones((3, 4), dtype=int)
a
2/3

0.6666666666666666
import numpy as np

def convolution2D(input_image, kernel):
    input_height, input_width = input_image.shape
    kernel_height, kernel_width = kernel.shape
    output_height = input_height - kernel_height + 1
    output_width = input_width - kernel_width + 1

    # Initialize the output image
    output_image = np.zeros((output_height, output_width))

    # Perform the convolution
    for i in range(output_height):
        for j in range(output_width):
            output_image[i, j] = np.sum(input_image[i:i+kernel_height, j:j+kernel_width] * kernel)

    return output_image

def max_pooling2D(input_image, pool_size):
    input_height, input_width = input_image.shape
    pool_height, pool_width = pool_size
    output_height = input_height // pool_height
    output_width = input_width // pool_width

    # Initialize the output image
    output_image = np.zeros((output_height, output_width))

    # Perform max pooling
    for i in range(output_height):
        for j in range(output_width):
            output_image[i, j] = np.max(input_image[i*pool_height:(i+1)*pool_height, j*pool_width:(j+1)*pool_width])

    return output_image

# Sample input image
input_image = np.array([[1, 2, 1, 0],
                        [0, 1, 3, 2],
                        [2, 0, 1, 2],
                        [1, 2, 2, 1]])

# Sample kernel
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Sample max pooling size
pool_size = (2, 2)

# Perform convolution
conv_output = convolution2D(input_image, kernel)

# Perform max pooling
pool_output = max_pooling2D(conv_output, pool_size)

# Print the results
print("Convolution output:")
print(conv_output)
print("\nMax pooling output:")
print(pool_output)

Convolution output:
[[-2. -1.]
 [-3. -2.]]

Max pooling output:
[[-1.]]

Explanation:

  • convolution2D is a function that takes an input image (a 2D NumPy array) and a kernel (another 2D NumPy array) as input. It performs the convolution operation and returns the resulting image.

  • max_pooling2D is a function that takes an input image and a pool size as input and performs max pooling.

  • The sample input image and kernel are provided.

  • The convolution operation is performed, followed by max pooling.
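
The example stops after pooling. To reach a binary classification, a forward pass would go on to apply an activation, flatten the feature map, and feed it through a fully connected layer. The sketch below shows those remaining steps under stated assumptions: the feature-map values are made up, and the dense-layer weights are random and untrained, so the resulting probability carries no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for a pooled feature map (values are made up for illustration).
pooled = np.array([[0.5, -1.0],
                   [2.0,  0.0]])

# Apply the activation, then flatten the 2D map into a 1D feature vector.
features = relu(pooled).flatten()

# Hypothetical, untrained dense layer with one output unit.
weights = rng.normal(size=features.shape[0])
bias = 0.0

# Sigmoid squashes the weighted sum into a probability between 0 and 1.
probability = sigmoid(np.dot(features, weights) + bias)
print("Predicted probability of the positive class:", probability)
```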

Below is a simple example of a Convolutional Neural Network (CNN) implemented with Keras. It uses the MNIST dataset for handwritten digit classification.

import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.datasets import mnist
from keras.utils import to_categorical

# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
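
The to_categorical call converts integer class labels into one-hot vectors, the target format that the categorical_crossentropy loss expects. A rough NumPy equivalent, shown on made-up labels rather than the real MNIST data, looks like this:

```python
import numpy as np

labels = np.array([5, 0, 4])  # made-up digit labels for illustration
num_classes = 10

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot)
```

Each row contains a single 1 at the index of its class and 0 everywhere else.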

MaxPooling2D

# Build the CNN model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
#model.add(Conv2D(64, (3, 3), activation='relu'))
#model.add(MaxPooling2D((2, 2)))
#model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
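
To see how data flows through this model, the shapes and parameter counts of the active (uncommented) layers can be worked out by hand. The sketch below assumes the Keras defaults of no padding and a stride of 1 for Conv2D, and a pooling stride equal to the pool size for MaxPooling2D. The figures can be checked against model.summary().

```python
# Shapes and parameter counts for the active layers of the model.
h, w, c = 28, 28, 1

# Conv2D(32, (3, 3)) with no padding and a stride of 1
filters, k = 32, 3
conv_params = (k * k * c + 1) * filters  # +1 for each filter's bias
h, w, c = h - k + 1, w - k + 1, filters
print("After Conv2D:      ", (h, w, c), "params:", conv_params)

# MaxPooling2D((2, 2)) halves height and width and has no trainable parameters
h, w = h // 2, w // 2
print("After MaxPooling2D:", (h, w, c))

# Flatten turns the feature maps into one vector
flat = h * w * c
print("After Flatten:     ", flat)

# Dense(64) and Dense(10): one weight per input plus one bias per unit
dense1_params = (flat + 1) * 64
dense2_params = (64 + 1) * 10
total_params = conv_params + dense1_params + dense2_params
print("Total trainable parameters:", total_params)
```

Almost all of the parameters sit in the first dense layer, because it connects every flattened feature to every one of its 64 units.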
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)
Epoch 1/5
750/750 [==============================] - 12s 15ms/step - loss: 0.2364 - accuracy: 0.9323 - val_loss: 0.0927 - val_accuracy: 0.9735
Epoch 2/5
750/750 [==============================] - 11s 15ms/step - loss: 0.0748 - accuracy: 0.9777 - val_loss: 0.0759 - val_accuracy: 0.9770
Epoch 3/5
750/750 [==============================] - 13s 17ms/step - loss: 0.0511 - accuracy: 0.9847 - val_loss: 0.0561 - val_accuracy: 0.9827
Epoch 4/5
750/750 [==============================] - 13s 18ms/step - loss: 0.0385 - accuracy: 0.9886 - val_loss: 0.0641 - val_accuracy: 0.9807
Epoch 5/5
750/750 [==============================] - 13s 18ms/step - loss: 0.0301 - accuracy: 0.9909 - val_loss: 0.0559 - val_accuracy: 0.9838
<keras.callbacks.History at 0x1af803f1790>
# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

# Predict on a single test image
sample_index = 3
sample_image = test_images[sample_index].reshape((1, 28, 28, 1))
prediction = model.predict(sample_image)
predicted_label = np.argmax(prediction)

# Display the same image that was passed to the model, with its predicted label
plt.imshow(test_images[sample_index].reshape((28, 28)), cmap='gray')
plt.title(f"Predicted Label: {predicted_label}")
plt.show()
313/313 [==============================] - 1s 4ms/step - loss: 0.0501 - accuracy: 0.9836
Test accuracy: 0.9836000204086304
1/1 [==============================] - 0s 38ms/step
Image in a Jupyter notebook