Path: blob/master/Applied Generative AI with GANS/7 Convolutional Neural Networks (CNNs).ipynb
Convolutional Neural Networks (CNNs) are a specialized type of neural network architecture that excel at processing grid-like data, such as images and videos. They are designed to automatically and adaptively learn spatial hierarchies of features from input data.
Applications of CNNs:
Image Classification: CNNs assign a single class label to an entire image, e.g., deciding whether a photo shows a cat or a dog.
Object Detection: CNNs locate objects within an image and classify them, typically by predicting a bounding box and a class label for each object.
Semantic Segmentation: In semantic segmentation, CNNs assign a class label to each pixel in an image. This is useful for tasks like medical image analysis, where precise delineation of structures (e.g., tumors) is critical.
Instance Segmentation: This is a more advanced form of segmentation where, in addition to labeling pixels by category, each distinct instance of an object is uniquely identified. It's used in scenarios like robotics, where a robot must identify individual objects in a scene.
Object Recognition in Videos: CNNs can be applied to individual frames of a video stream to perform object recognition. This is a critical component in applications like video surveillance and action recognition.
Face Recognition: CNNs have proven to be highly effective in the field of face recognition. They can learn features that are distinctive to individual faces, enabling tasks like biometric authentication.
Style Transfer: CNNs can be used to alter the style of an image while preserving its content. This is popular in artistic applications where one might want to apply the style of a famous painting to a photograph.
Super-Resolution: CNNs can enhance the resolution of images, which is useful in scenarios like upscaling low-resolution images or enhancing the quality of medical images.
Medical Image Analysis: CNNs are widely used in medical imaging for tasks like tumor detection, organ segmentation, and disease classification.
Natural Language Processing (NLP): While not as commonly associated with CNNs as with recurrent networks, CNNs have been used in NLP tasks like text classification and sentiment analysis, particularly for tasks where local context is important.
Here are some key characteristics and concepts related to CNNs:
Convolutional Layers: The core building blocks of CNNs are convolutional layers. These layers apply filters or "kernels" to small, overlapping regions of the input data. This allows the network to learn spatial hierarchies of features.
Feature Learning: CNNs automatically learn hierarchies of features from the input data. For example, in image processing, initial layers might learn to recognize edges, while deeper layers learn to recognize complex shapes or patterns.
Pooling Layers: These layers reduce the spatial dimensions (width and height) of the data volume while keeping the depth unchanged. Common pooling operations include max pooling and average pooling (the effect on feature-map sizes is traced in the sketch after this list).
Activation Functions: Non-linear activation functions (like ReLU - Rectified Linear Unit) are applied after each convolutional layer to introduce non-linearity into the model. This enables the network to learn more complex patterns.
Fully Connected Layers: After several convolutional and pooling layers, the network typically ends with one or more fully connected layers. These layers perform the high-level reasoning on the learned features.
Loss Function and Optimization: The choice of loss function depends on the task (classification, regression, etc.). Common choices include categorical cross-entropy for classification tasks. Optimization techniques like stochastic gradient descent (SGD) or more advanced variants like Adam are used to minimize the loss.
Backpropagation: CNNs are trained using backpropagation, where the gradients of the loss function with respect to the parameters are computed and used to update the weights of the network.
Transfer Learning: CNNs trained on large datasets for tasks like image recognition (e.g., ImageNet) are often used as a starting point for other tasks. This is called transfer learning.
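To make the effect of convolution and pooling on feature-map sizes concrete, here is a small shape-tracing sketch; the layer sizes (a 3x3 convolution followed by 2x2 max pooling on a 28x28 input) and the helper name conv_output_size are illustrative assumptions, not code from this notebook.

```python
# Illustrative sketch: how spatial dimensions shrink through a conv -> pool stack.
# The helper and the example layer sizes are assumptions for illustration.

def conv_output_size(size, kernel, stride=1, padding=0):
    """Output spatial size of a convolution (or pooling) along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

h = w = 28  # e.g. an MNIST-sized grayscale image
print(f"input:          {h}x{w}x1")

# 3x3 convolution with 32 filters, no padding, stride 1
h = conv_output_size(h, kernel=3)
w = conv_output_size(w, kernel=3)
print(f"after 3x3 conv: {h}x{w}x32")

# 2x2 max pooling, stride 2: halves height and width, depth unchanged
h = conv_output_size(h, kernel=2, stride=2)
w = conv_output_size(w, kernel=2, stride=2)
print(f"after 2x2 pool: {h}x{w}x32")
```

Running this prints 28x28x1, then 26x26x32 after the convolution, then 13x13x32 after pooling, which is the shrinking spatial hierarchy the layers above describe.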

Mathematical Interpretation of CNNs
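Before the code, it helps to write the operation down. For an input image $I$ of size $H \times W$ and a kernel $K$ of size $k_h \times k_w$, a convolutional layer (stride 1, no padding, as in the examples below) computes each output value as the sum of element-wise products between the kernel and the image patch under it:

$$
S(i, j) = \sum_{m=0}^{k_h - 1} \sum_{n=0}^{k_w - 1} I(i + m,\, j + n)\, K(m, n),
\qquad 0 \le i \le H - k_h,\;\; 0 \le j \le W - k_w
$$

so the output feature map $S$ has size $(H - k_h + 1) \times (W - k_w + 1)$. Strictly speaking this is cross-correlation (the kernel is not flipped), but deep-learning libraries conventionally call it convolution; with a learned kernel the distinction does not matter.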
Below is a simple implementation of a 2D convolution operation using pure Python.
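What follows is a minimal sketch of such an implementation (stride 1, no padding), consistent with the steps described next; the concrete input_image and kernel values are illustrative.

```python
# Minimal pure-Python sketch of 2D convolution (stride 1, no padding):
# nested loops compute the dot product between the kernel and each
# overlapping region of the input image.

def convolution2D(input_image, kernel):
    image_h, image_w = len(input_image), len(input_image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    output_h = image_h - kernel_h + 1
    output_w = image_w - kernel_w + 1

    output = [[0] * output_w for _ in range(output_h)]
    for i in range(output_h):
        for j in range(output_w):
            # Dot product of the kernel with the overlapping image region
            total = 0
            for m in range(kernel_h):
                for n in range(kernel_w):
                    total += input_image[i + m][j + n] * kernel[m][n]
            output[i][j] = total
    return output

# Sample 4x4 input image and 3x3 kernel (illustrative values)
input_image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

print(convolution2D(input_image, kernel))  # 2x2 output feature map
```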
Steps for the above example:
convolution2D is a function that takes an input image (a 2D list) and a kernel (another 2D list) as input. It performs the convolution operation and returns the resulting image.
input_image is a sample 4x4 input image, and kernel is a 3x3 kernel.
The convolution operation is performed manually using nested loops. For each position in the output image, the dot product between the kernel and the corresponding region of the input image is computed.
Implementation using NumPy:
Implementing a Convolutional Neural Network (CNN) using NumPy involves creating the layers (convolution, pooling, fully connected) and implementing the forward pass. Below is an example of a simple CNN using NumPy for a binary image classification task:
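A minimal sketch of such a forward pass is shown below; convolution2D and max_pooling2D match the explanation that follows, while the 6x6 sample image, the kernel values, and the random weights of the final fully connected sigmoid layer are illustrative assumptions (no training is performed).

```python
import numpy as np

def convolution2D(image, kernel):
    """Valid 2D convolution (stride 1, no padding) of a 2D array with a 2D kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    output = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

def max_pooling2D(image, pool_size=2):
    """Non-overlapping max pooling with a square window."""
    oh, ow = image.shape[0] // pool_size, image.shape[1] // pool_size
    output = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i * pool_size:(i + 1) * pool_size,
                           j * pool_size:(j + 1) * pool_size]
            output[i, j] = np.max(window)
    return output

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sample 6x6 input image and 3x3 kernel (illustrative values)
rng = np.random.default_rng(0)
input_image = rng.random((6, 6))
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# Forward pass: convolution -> ReLU -> max pooling -> flatten -> fully connected
conv_out = relu(convolution2D(input_image, kernel))   # 4x4
pooled = max_pooling2D(conv_out, pool_size=2)         # 2x2
flat = pooled.flatten()                               # 4 values

# Fully connected layer with random (untrained) weights for a binary output
weights = rng.normal(size=flat.shape)
bias = 0.0
probability = sigmoid(np.dot(flat, weights) + bias)
print("Predicted probability of class 1:", probability)
```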
Explanation:
convolution2D is a function that takes an input image (a 2D NumPy array) and a kernel (another 2D NumPy array) as input. It performs the convolution operation and returns the resulting image.
max_pooling2D is a function that takes an input image and a pool size as input and performs max pooling.
The sample input image and kernel are provided.
The convolution operation is performed, followed by max pooling.
Below is a simple example code for a Convolutional Neural Network (CNN) implementation using Keras. This example uses the MNIST dataset for handwritten digit classification.
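A minimal sketch consistent with the training log below (5 epochs, batch size 64 with a 20% validation split giving 750 steps per epoch, and a single-sample prediction at the end) might look like this; the exact layer sizes and filter counts are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST, scale pixel values to [0, 1], and add a channel dimension
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32")[..., np.newaxis] / 255.0
x_test = x_test.astype("float32")[..., np.newaxis] / 255.0

# A small CNN: two conv/pool blocks followed by a dense classifier
# (layer sizes are illustrative assumptions)
model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# 48,000 training samples / batch size 64 = 750 steps per epoch, as in the log
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.2)

test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)

# Predict the class of a single test image (the final "1/1" step in the log)
prediction = model.predict(x_test[:1])
print("Predicted digit:", np.argmax(prediction))
```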
Epoch 1/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9326 - loss: 0.2326 - val_accuracy: 0.9744 - val_loss: 0.0928
Epoch 2/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9773 - loss: 0.0754 - val_accuracy: 0.9811 - val_loss: 0.0686
Epoch 3/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.9849 - loss: 0.0510 - val_accuracy: 0.9820 - val_loss: 0.0597
Epoch 4/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 4s 6ms/step - accuracy: 0.9879 - loss: 0.0386 - val_accuracy: 0.9845 - val_loss: 0.0522
Epoch 5/5
750/750 ━━━━━━━━━━━━━━━━━━━━ 5s 6ms/step - accuracy: 0.9914 - loss: 0.0288 - val_accuracy: 0.9847 - val_loss: 0.0557
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9850 - loss: 0.0481
Test accuracy: 0.9850000143051147
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 61ms/step