
Neural Networks

A neural network, also known as an artificial neural network (ANN), is a computational model inspired by the structure and functioning of the human brain. It is a fundamental concept in machine learning and deep learning. Learning in a neural network proceeds in three steps:

  • The neural network is stimulated by an environment.

  • The free parameters of the neural network are then changed as a result of this stimulation.

  • The neural network then responds in a new way to the environment because of the changes in its free parameters.

Single Neuron Unit

(Figure: a single neuron unit.)
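As a minimal sketch of what a single neuron computes (with made-up inputs, weights, and bias purely for illustration), the neuron forms a weighted sum of its inputs plus a bias and passes the result through an activation function:

import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Hypothetical inputs, weights, and bias for one neuron
x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.7, -0.2])   # one weight per input connection
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
output = sigmoid(z)              # neuron output after the activation function
print(output)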

Structure of a Neural Network:

A neural network is composed of interconnected nodes, or "neurons," organized into layers.

  • Neural networks learn identifying features from the data themselves rather than relying on pre-programmed rules. A network's components include neurons, connections, weights, biases, propagation functions, and a learning rule. Each neuron receives inputs and produces an output governed by its weights, threshold (bias), and activation function.

(Figure: layers of a neural network.)

The three main types of layers are:

Input Layer:

  • This layer contains nodes that represent the input features or variables of the problem. Each node corresponds to a feature, and the number of nodes in the input layer is equal to the number of features.

Hidden Layers:

  • These layers are between the input and output layers. Each hidden layer consists of multiple nodes (neurons). Deep learning, a subset of neural networks, is characterized by the presence of multiple hidden layers.

Output Layer:

  • This layer produces the final output of the neural network. The number of nodes in the output layer depends on the type of problem. For example, in binary classification, there might be one node (0 or 1), while in multi-class classification, there might be multiple nodes representing different classes.

Connections and Weights:

Every connection between nodes in adjacent layers is associated with a weight. These weights are the learnable parameters of the neural network and are adjusted during the training process.

Activation Functions:

Each node (except in the input layer) has an activation function that determines the output of the node based on the weighted sum of its inputs. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, and tanh.

Training Process:

  • Forward Propagation: The input data is fed forward through the network. The weights and biases are applied to the inputs, and the output of each node is computed.

  • Loss Calculation: The output of the network is compared to the true target values, and a loss (or error) is calculated. This quantifies how far off the predictions are from the actual targets.

  • Backpropagation: This is the heart of the training process. The gradients (derivatives of the loss with respect to the weights) are calculated by propagating the errors backward through the network. This allows for the adjustment of weights in a direction that minimizes the loss.

  • Gradient Descent: The weights are updated using optimization algorithms such as stochastic gradient descent (SGD). This process iteratively adjusts the weights to reduce the loss.

  • Learning and Generalization: Through this iterative cycle of forward propagation, loss calculation, backpropagation, and weight adjustment, the neural network learns to make better predictions on the training data. The goal is for the network to generalize, i.e., to make accurate predictions on new, unseen data. (A minimal end-to-end sketch of this loop appears right after this list.)
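A minimal end-to-end sketch of this loop, assuming a single sigmoid neuron trained on toy data with plain NumPy (not the Keras model used later in this notebook):

import numpy as np

# Toy data: label is 1 when the two features sum to more than 1
np.random.seed(0)
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(float)

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.5          # learning rate

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for epoch in range(100):
    # Forward propagation: weighted sum + activation
    p = sigmoid(X @ w + b)

    # Loss calculation: binary cross-entropy
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Backpropagation: gradients of the loss w.r.t. weights and bias
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)

    # Gradient descent: move the parameters in the direction that reduces the loss
    w -= lr * grad_w
    b -= lr * grad_b

print("final loss:", loss)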

Applications:

Neural networks have found applications in a wide range of fields, including computer vision (e.g., image recognition), natural language processing (e.g., language translation), speech recognition, game playing (e.g., AlphaGo), autonomous vehicles, healthcare, and many more.

Deep Learning:

When a neural network has multiple hidden layers (more than one), it's often referred to as a deep neural network, and the process is known as deep learning. Deep learning has demonstrated remarkable capabilities in handling complex tasks and processing large amounts of data.

Python Libraries for Neural Networks:

To work with neural networks in Python, you'll typically need a combination of the following libraries:

  • NumPy: NumPy is a fundamental package for numerical computations in Python. It provides support for working with arrays and matrices, which are essential for handling the mathematical operations involved in neural networks.

Keras or TensorFlow (or both):

  • TensorFlow: TensorFlow is a powerful open-source library for numerical computation and machine learning. It provides a comprehensive set of tools for building and training various types of neural networks.

  • Keras: Keras is an open-source neural network library that acts as an interface for various backends, including TensorFlow. It provides a high-level API for building and training neural networks, making it easy to get started.

  • PyTorch (Optional): PyTorch is another popular open-source deep learning library that provides dynamic computation graphs. It's known for its flexible and dynamic approach to building neural networks.

  • Scikit-learn (Optional): While primarily a library for traditional machine learning, scikit-learn includes various tools for preprocessing data and evaluating the performance of machine learning models, which can be useful in conjunction with neural networks.

  • Matplotlib or Seaborn (Optional): These libraries are used for data visualization, which can be helpful for understanding the performance of neural networks and visualizing results.

  • Pandas (Optional): Pandas is a versatile library for data manipulation and analysis. While not directly related to neural networks, it's often used for tasks like data preprocessing.

  • Jupyter Notebook (Optional): Jupyter notebooks are interactive environments that allow you to write and execute Python code in a document-style format. They are popular for experimenting with and visualizing neural network models.

  • CUDA and cuDNN (Optional): If you're working with deep learning models on GPU hardware, installing CUDA (NVIDIA's parallel computing platform) and cuDNN (a GPU-accelerated library for deep neural networks) can significantly speed up computations.
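As a quick sanity check (a minimal sketch, assuming both TensorFlow and PyTorch are installed), you can verify whether a GPU is visible before training:

# Check GPU availability from Python (assumes TensorFlow and PyTorch are installed)
import tensorflow as tf
import torch

print("TensorFlow GPUs:", tf.config.list_physical_devices('GPU'))
print("PyTorch CUDA available:", torch.cuda.is_available())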

Multilayer Neural Network

(Figure: a multilayer neural network.)

Basic feedforward neural network for binary classification.

# Step 1: Import necessary libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Step 2: Generate some random data for demonstration
np.random.seed(0)
X = np.random.rand(100, 2)               # Features (2 input neurons)
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # Binary target variable
y
array([1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0])

Create a Sequential model

# Step 3: Create a Sequential model
binary_model = Sequential()

# Step 4: Add layers to the model
binary_model.add(Dense(3, input_dim=2, activation='relu'))  # Hidden layer with 3 neurons
binary_model.add(Dense(1, activation='sigmoid'))            # Output layer with 1 neuron (binary classification)
# Step 5: Compile the model
binary_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 6: Train the model
binary_model.fit(X, y, epochs=25, batch_size=12)

# Step 7: Evaluate the model
loss, accuracy = binary_model.evaluate(X, y)
print(f'Loss: {loss:.4f}')
print(f'Accuracy: {accuracy*100:.2f}%')
Epoch 1/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6718 - accuracy: 0.5600 Epoch 2/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6703 - accuracy: 0.5700 Epoch 3/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6690 - accuracy: 0.5600 Epoch 4/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6677 - accuracy: 0.5600 Epoch 5/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6664 - accuracy: 0.5600 Epoch 6/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6651 - accuracy: 0.5600 Epoch 7/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6639 - accuracy: 0.5600 Epoch 8/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6626 - accuracy: 0.5600 Epoch 9/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6613 - accuracy: 0.5700 Epoch 10/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6599 - accuracy: 0.5700 Epoch 11/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6587 - accuracy: 0.5700 Epoch 12/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6574 - accuracy: 0.5700 Epoch 13/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6562 - accuracy: 0.5800 Epoch 14/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6549 - accuracy: 0.5800 Epoch 15/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6539 - accuracy: 0.5800 Epoch 16/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6526 - accuracy: 0.5800 Epoch 17/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6512 - accuracy: 0.5800 Epoch 18/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6499 - accuracy: 0.5800 Epoch 19/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6488 - accuracy: 0.5800 Epoch 20/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6474 - accuracy: 0.5800 Epoch 21/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6461 - accuracy: 0.5800 Epoch 22/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6448 - accuracy: 0.6000 Epoch 23/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6436 - accuracy: 0.6000 Epoch 24/25 9/9 [==============================] - 0s 1ms/step - loss: 0.6424 - accuracy: 0.6000 Epoch 25/25 9/9 [==============================] - 0s 2ms/step - loss: 0.6411 - accuracy: 0.6100 4/4 [==============================] - 0s 2ms/step - loss: 0.6402 - accuracy: 0.6100 Loss: 0.6402 Accuracy: 61.00%

Activation Functions:

  • Activation functions introduce non-linearity into the neural network. This allows the network to learn complex relationships between inputs and outputs.

Types of Activation functions:

1. Sigmoid Function:

  • Range: (0, 1)

  • Pros: Smooth, interpretable as probabilities, historically popular.

  • Cons: Suffers from the vanishing gradients problem (gradients become very small); not used much in hidden layers anymore.

  • Formula: f(x) = 1 / (1 + e^(-x))

2. ReLU (Rectified Linear Unit)

  • Range: [0, +∞)

  • Pros: Fast to compute, helps with the vanishing gradients problem, widely used in hidden layers

  • Cons: Can suffer from the dying ReLU problem (neurons can get stuck during training and stop learning).

  • Formula: f(x) = max(0, x)

3. Leaky ReLU

  • Range: (-∞, +∞)

  • Pros: Addresses the dying ReLU problem by allowing a small gradient for negative inputs.

  • Cons: Slightly more computationally expensive than ReLU.

  • Formula: f(x) = max(0.01x, x) (or, more generally, a small non-zero slope for negative values)

4. Softmax (for multi-class classification)

  • Range: (0, 1), and the sum of all outputs is 1.

  • Purpose: Scales the outputs so that they can be interpreted as probabilities, commonly used in the output layer for multi-class classification.

  • Formula: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
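The following sketch (plain NumPy, with an arbitrary vector of pre-activation values) computes each of the activation functions above so their ranges can be compared directly:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])   # arbitrary pre-activation values
print("sigmoid:   ", sigmoid(z))             # each value in (0, 1)
print("relu:      ", relu(z))                # negatives clipped to 0
print("leaky relu:", leaky_relu(z))          # small slope for negative inputs
print("softmax:   ", softmax(z), "sum =", softmax(z).sum())  # outputs sum to 1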

Choosing an Activation Function:

  • Generally, ReLU or its variants are preferred in hidden layers due to their computational efficiency and better performance in practice.

  • Sigmoid or softmax is commonly used in the output layer for binary or multi-class classification tasks, respectively.

Loss Functions

  • Loss functions quantify the error or discrepancy between the predicted output of the neural network and the actual target values during training. Common choices include mean squared error (MSE) for regression, binary cross-entropy for binary classification, and categorical cross-entropy for multi-class classification.

Sparse Categorical Cross Entropy:

  • Similar to categorical cross-entropy, but used when the target values are integer class labels instead of one-hot encoded vectors.

Choosing a Loss Function:

The choice depends on the nature of the problem (regression, binary classification, multi-class classification). Ensure that the chosen loss function aligns with the activation function used in the output layer. The selection of activation and loss functions can significantly impact the performance of your neural network, so it is important to experiment and choose the right combination for your specific task.
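To make the distinction between categorical and sparse categorical cross-entropy concrete, the following sketch (hypothetical 3-class predictions and labels, computed in plain NumPy rather than Keras) evaluates both forms and shows they give the same value:

import numpy as np

# Hypothetical 3-class example: two samples with integer labels and their one-hot equivalents
y_int = np.array([0, 2])                      # integer class labels (sparse format)
y_onehot = np.array([[1, 0, 0], [0, 0, 1]])   # the same labels, one-hot encoded
probs = np.array([[0.7, 0.2, 0.1],            # predicted class probabilities (softmax output)
                  [0.1, 0.3, 0.6]])

# Categorical cross-entropy uses the one-hot targets...
cce = -np.mean(np.sum(y_onehot * np.log(probs), axis=1))

# ...while sparse categorical cross-entropy indexes the predicted probability
# of the true class directly from the integer labels.
scce = -np.mean(np.log(probs[np.arange(len(y_int)), y_int]))

print(cce, scce)  # both give the same value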

Implementation of a basic regression model using a feedforward neural network with PyTorch

PyTorch is a popular deep learning library. Specifically, it offers two key features:

  • Tensor computation (like NumPy): PyTorch supports array-style mathematical operations with strong GPU acceleration.

  • Dynamic neural networks: PyTorch builds models on a tape-based autograd system, so computation graphs are defined on the fly as the code runs (see the short sketch after this list).
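A minimal sketch of both ideas: tensors support NumPy-like operations (optionally on a GPU), and autograd records operations as they run so gradients can be computed automatically:

import torch

# Tensor computation: NumPy-like operations, movable to a GPU if one is available
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.ones(2, 2)
print(a @ b)                       # matrix multiplication

# Dynamic autograd: gradients are recorded as the operations run
x = torch.tensor(2.0, requires_grad=True)
y = 3 * x ** 2 + 5 * x             # y = 3x^2 + 5x
y.backward()                       # compute dy/dx
print(x.grad)                      # dy/dx = 6x + 5 = 17 at x = 2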

# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

# Generate some sample data for regression (predict y from x)
np.random.seed(42)
x = np.linspace(-10, 10, 100)
y = 3.5 * x + np.random.normal(0, 5, size=x.shape)  # linear relationship with some noise
# Convert numpy arrays to PyTorch tensors
x_tensor = torch.tensor(x, dtype=torch.float32).view(-1, 1)
y_tensor = torch.tensor(y, dtype=torch.float32).view(-1, 1)
# x_tensor
# Define a simple feedforward neural network model
class FeedforwardNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(FeedforwardNN, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # First (hidden) layer
        self.relu = nn.ReLU()                           # Activation function
        self.fc2 = nn.Linear(hidden_size, output_size)  # Output layer

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
# Model instantiation
input_size = 1
hidden_size = 10
output_size = 1
model = FeedforwardNN(input_size, hidden_size, output_size)

# Loss and optimizer
criterion = nn.MSELoss()  # Mean Squared Error for regression tasks
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Training the model
epochs = 500
losses = []
for epoch in range(epochs):
    model.train()

    # Forward pass
    predictions = model(x_tensor)
    loss = criterion(predictions, y_tensor)

    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Store and print loss
    losses.append(loss.item())
    if (epoch + 1) % 50 == 0:
        print(f'Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}')
Epoch [50/500], Loss: 19.5552 Epoch [100/500], Loss: 19.5546 Epoch [150/500], Loss: 19.5540 Epoch [200/500], Loss: 19.5534 Epoch [250/500], Loss: 19.5528 Epoch [300/500], Loss: 19.5523 Epoch [350/500], Loss: 19.5517 Epoch [400/500], Loss: 19.5512 Epoch [450/500], Loss: 19.5506 Epoch [500/500], Loss: 19.5494
# Plot the loss over time
plt.plot(range(epochs), losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.show()

# Test the model by plotting predictions
model.eval()  # Set model to evaluation mode
with torch.no_grad():
    predicted = model(x_tensor).detach().numpy()

plt.plot(x, y, 'ro', label='Original data')        # Plot original data
plt.plot(x, predicted, 'b-', label='Fitted line')  # Plot predicted line
plt.legend()
plt.show()
(Output: a training-loss curve and a plot of the fitted line against the original data.)

Implementation using the Iris dataset (multi-class classification workflow)

# Step 1: Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
# Step 2: Load and preprocess the dataset
iris = load_iris()
X = iris.data
y = iris.target
y_binary = np.where(y == 0, 0, 1)  # Convert to a two-class problem (class 0 vs. the rest)

# One-hot encode the binary labels
y_binary = to_categorical(y_binary)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=42)
feature_names = iris.feature_names
print("Feature Names:", feature_names)

import pandas as pd
Features = pd.DataFrame(X)
Features
# Step 3: Create a Sequential model
model = Sequential()

# Step 4: Add layers to the model
model.add(Dense(8, input_dim=4, activation='relu'))  # Hidden layer with 8 neurons
model.add(Dense(2, activation='softmax'))            # Output layer with 2 neurons (one per class)

# Step 5: Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Step 6: Train the model
model.fit(X_train, y_train, epochs=50, batch_size=10)
# Step 7: Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Loss: {loss:.4f}')
print(f'Accuracy: {accuracy*100:.2f}%')
# Print the feature names
feature_names = iris.feature_names
print("Feature Names:", feature_names)
# Get the learned weights from the first layer (shape: (n_features, n_hidden) = (4, 8))
weights = model.layers[0].get_weights()[0]

# Estimate feature importance as the mean absolute weight per input feature
# (average over the hidden units, i.e. along axis 1)
feature_importance = np.mean(np.abs(weights), axis=1)
print("Feature Importance:", feature_importance)
print("Accuracy:", accuracy)
# Identify the index of the most important feature
most_important_feature_index = np.argmax(feature_importance)
print("Most Important Feature Index:", most_important_feature_index)