Backpropagation from Scratch
Introduction
Backpropagation is the cornerstone algorithm for training neural networks. It provides an efficient method to compute gradients of the loss function with respect to each weight in the network by applying the chain rule of calculus systematically from the output layer back to the input layer.
This notebook implements a complete neural network from scratch, demonstrating the mathematical foundations and computational mechanics of backpropagation.
Mathematical Foundation
Neural Network Architecture
Consider a feedforward neural network with $L$ layers. For each layer $l = 1, \dots, L$, we define:
$W^{[l]}$: Weight matrix connecting layer $l-1$ to layer $l$
$b^{[l]}$: Bias vector for layer $l$
$z^{[l]}$: Pre-activation (weighted input) at layer $l$
$a^{[l]}$: Activation (output) at layer $l$
Forward Propagation
The forward pass computes the network output through successive transformations:
$$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = g^{[l]}\!\left(z^{[l]}\right)$$
where $g^{[l]}$ is the activation function of layer $l$. For the input layer, $a^{[0]} = x$.
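To make the shapes concrete, here is a single layer's forward step in NumPy. This is only an illustration; the layer sizes (2 inputs, 3 units) and batch size are arbitrary.

```python
import numpy as np

# One forward step for a layer with 3 units receiving 2 inputs, on a batch of m = 4 examples
rng = np.random.default_rng(0)
a_prev = rng.normal(size=(2, 4))   # a^{[l-1]}: shape (n_{l-1}, m)
W = rng.normal(size=(3, 2))        # W^{[l]}:   shape (n_l, n_{l-1})
b = np.zeros((3, 1))               # b^{[l]}:   shape (n_l, 1), broadcast across the batch
z = W @ a_prev + b                 # z^{[l]}:   shape (n_l, m)
a = np.tanh(z)                     # a^{[l]} with g = tanh
```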
Activation Functions
Sigmoid: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
ReLU: $\mathrm{ReLU}(z) = \max(0, z)$
Tanh: $\tanh(z) = \dfrac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$
Loss Function
For binary classification, we use the binary cross-entropy loss:
$$\mathcal{L} = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y_i \log \hat{y}_i + (1 - y_i) \log\!\left(1 - \hat{y}_i\right) \right]$$
where $m$ is the number of training examples, $y_i$ is the true label, and $\hat{y}_i$ is the predicted probability.
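For reference, a minimal NumPy helper computing this loss might look like the sketch below; the clipping constant `eps` is an illustrative guard against taking $\log(0)$.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-8):
    # y_true and y_pred have shape (1, m); clip predictions away from 0 and 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
```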
Backpropagation Algorithm
The key insight of backpropagation is the recursive computation of error terms $\delta^{[l]}$ for each layer.
Output Layer Error ($l = L$):
$$\delta^{[L]} = \nabla_{a^{[L]}} \mathcal{L} \odot g'^{[L]}\!\left(z^{[L]}\right) = a^{[L]} - y \quad \text{(for a sigmoid output with cross-entropy loss)}$$
Hidden Layer Error ($1 \le l < L$):
$$\delta^{[l]} = \left( \left(W^{[l+1]}\right)^{T} \delta^{[l+1]} \right) \odot g'^{[l]}\!\left(z^{[l]}\right)$$
where $\odot$ denotes element-wise multiplication.
Gradient Computation:
$$\frac{\partial \mathcal{L}}{\partial W^{[l]}} = \frac{1}{m}\, \delta^{[l]} \left(a^{[l-1]}\right)^{T}, \qquad \frac{\partial \mathcal{L}}{\partial b^{[l]}} = \frac{1}{m} \sum_{i=1}^{m} \delta^{[l](i)}$$
Parameter Update (Gradient Descent)
$$W^{[l]} \leftarrow W^{[l]} - \alpha \frac{\partial \mathcal{L}}{\partial W^{[l]}}, \qquad b^{[l]} \leftarrow b^{[l]} - \alpha \frac{\partial \mathcal{L}}{\partial b^{[l]}}$$
where $\alpha$ is the learning rate.
Implementation
We now implement a fully-connected neural network with configurable architecture to solve a nonlinear classification problem.
Activation Functions and Their Derivatives
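A minimal NumPy sketch of these activations and their derivatives is given below; the function names are illustrative choices, not a fixed API.

```python
import numpy as np

def sigmoid(z):
    # 1 / (1 + exp(-z)), with clipping to avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(z.dtype)

def tanh(z):
    return np.tanh(z)

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2
```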
Neural Network Class
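Below is a sketch of a small fully-connected network class implementing the forward pass, the backpropagation recursion, and gradient-descent updates derived above. The column-vector data layout, tanh hidden activations, sigmoid output, and Xavier-style initialization are illustrative choices; the class reuses the `sigmoid` helper from the previous cell.

```python
import numpy as np

class NeuralNetwork:
    """Fully-connected network; layer_sizes e.g. [2, 16, 16, 1]."""

    def __init__(self, layer_sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.L = len(layer_sizes) - 1
        # Xavier-style scaling: std = sqrt(1 / fan_in) for each weight matrix
        self.W = {l: rng.normal(0.0, np.sqrt(1.0 / layer_sizes[l - 1]),
                                (layer_sizes[l], layer_sizes[l - 1]))
                  for l in range(1, self.L + 1)}
        self.b = {l: np.zeros((layer_sizes[l], 1)) for l in range(1, self.L + 1)}

    def forward(self, X):
        # X has shape (n_features, m); hidden layers use tanh, the output layer uses sigmoid
        cache = {'a0': X}
        a = X
        for l in range(1, self.L + 1):
            z = self.W[l] @ a + self.b[l]
            a = sigmoid(z) if l == self.L else np.tanh(z)
            cache[f'z{l}'], cache[f'a{l}'] = z, a
        return a, cache

    def backward(self, Y, cache):
        # Compute dW, db for every layer via the delta recursion
        m = Y.shape[1]
        grads = {}
        delta = cache[f'a{self.L}'] - Y  # sigmoid output + cross-entropy simplification
        for l in range(self.L, 0, -1):
            grads[f'dW{l}'] = (delta @ cache[f'a{l-1}'].T) / m
            grads[f'db{l}'] = delta.sum(axis=1, keepdims=True) / m
            if l > 1:
                # Propagate the error back through tanh: tanh'(z) = 1 - a^2
                delta = (self.W[l].T @ delta) * (1.0 - cache[f'a{l-1}'] ** 2)
        return grads

    def update(self, grads, lr):
        # Plain gradient-descent step on every parameter
        for l in range(1, self.L + 1):
            self.W[l] -= lr * grads[f'dW{l}']
            self.b[l] -= lr * grads[f'db{l}']
```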
Generate Nonlinear Dataset
We create a spiral dataset that requires nonlinear decision boundaries.
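A sketch of a two-class spiral generator is shown below; the number of points per class, the noise level, and the spiral parameterization are illustrative values.

```python
import numpy as np

def make_spiral(n_per_class=200, noise=0.2, seed=0):
    # Two interleaved spirals labelled 0 and 1
    rng = np.random.default_rng(seed)
    t = np.linspace(0.25, 3.0 * np.pi, n_per_class)
    X, y = [], []
    for label in (0, 1):
        x1 = t * np.cos(t + label * np.pi) + rng.normal(0, noise, n_per_class)
        x2 = t * np.sin(t + label * np.pi) + rng.normal(0, noise, n_per_class)
        X.append(np.column_stack([x1, x2]))
        y.append(np.full(n_per_class, label, dtype=float))
    X = np.vstack(X)
    y = np.concatenate(y)
    return X.T, y.reshape(1, -1)   # shapes (2, m) and (1, m), matching the network's layout
```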
Train the Network
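A sketch of a training loop using the pieces above. The architecture `[2, 16, 16, 1]`, learning rate, and epoch count are assumed values for illustration, not tuned hyperparameters.

```python
# Assumes the NeuralNetwork class and make_spiral from the cells above
X, Y = make_spiral(n_per_class=200, noise=0.2)
net = NeuralNetwork([2, 16, 16, 1])

losses = []
for epoch in range(5000):
    A, cache = net.forward(X)
    eps = 1e-8  # guard against log(0)
    loss = -np.mean(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps))
    grads = net.backward(Y, cache)
    net.update(grads, lr=0.5)
    losses.append(loss)
    if epoch % 500 == 0:
        acc = np.mean((A > 0.5) == Y)
        print(f"epoch {epoch:4d}  loss {loss:.4f}  accuracy {acc:.3f}")
```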
Visualization
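One way to visualize the result is to evaluate the trained network on a grid and plot the decision boundary over the data, together with the loss curve. This is a matplotlib sketch that assumes the training cell above has been run.

```python
import matplotlib.pyplot as plt

# Decision boundary: evaluate the network on a dense grid over the input space
xx, yy = np.meshgrid(np.linspace(X[0].min() - 1, X[0].max() + 1, 300),
                     np.linspace(X[1].min() - 1, X[1].max() + 1, 300))
grid = np.vstack([xx.ravel(), yy.ravel()])
probs, _ = net.forward(grid)
plt.contourf(xx, yy, probs.reshape(xx.shape), levels=25, cmap='RdBu', alpha=0.6)
plt.scatter(X[0], X[1], c=Y.ravel(), cmap='RdBu', edgecolors='k', s=15)
plt.title('Learned decision boundary')
plt.show()

# Training loss over epochs
plt.plot(losses)
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.show()
```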
Analysis of Gradient Flow
Let's examine the gradient magnitudes during training to understand the dynamics of backpropagation.
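One way to do this, sketched below, is to record the norm of each layer's weight gradient at every epoch during a fresh training run and plot the norms on a log scale; the layer sizes and learning rate are again illustrative.

```python
# Track per-layer gradient norms to inspect gradient flow (uses X, Y, plt from the cells above)
net = NeuralNetwork([2, 16, 16, 1])
grad_history = {l: [] for l in range(1, net.L + 1)}

for epoch in range(2000):
    A, cache = net.forward(X)
    grads = net.backward(Y, cache)
    for l in range(1, net.L + 1):
        grad_history[l].append(np.linalg.norm(grads[f'dW{l}']))
    net.update(grads, lr=0.5)

for l, norms in grad_history.items():
    plt.plot(norms, label=f'layer {l}')
plt.xlabel('epoch')
plt.ylabel('||dW|| (gradient norm)')
plt.yscale('log')
plt.legend()
plt.show()
```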
Summary
This notebook demonstrated:
Mathematical foundations of neural networks and backpropagation using the chain rule
Forward propagation: Computing $z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}$ and $a^{[l]} = g^{[l]}(z^{[l]})$
Backpropagation: Computing gradients via $\delta^{[l]} = \left(\left(W^{[l+1]}\right)^{T} \delta^{[l+1]}\right) \odot g'^{[l]}(z^{[l]})$
Gradient descent: Updating parameters with $W^{[l]} \leftarrow W^{[l]} - \alpha\, \partial \mathcal{L} / \partial W^{[l]}$
The network successfully learned to classify the nonlinear spiral dataset, demonstrating the power of multi-layer networks trained with backpropagation to learn complex decision boundaries.
Key Insights
Vanishing gradients: Deeper layers may receive smaller gradient signals, especially with sigmoid/tanh activations
Weight initialization: Proper scaling (He/Xavier) prevents saturation and ensures healthy gradient flow
Learning rate: Too high causes divergence; too low causes slow convergence
Architecture: More neurons/layers can model more complex boundaries but risk overfitting