Path: blob/master/Applied Generative AI with GANS/5 Keras_Binary_Classification_NN.ipynb
4766 views
Neural Network Binary Classification using Keras
Keras is used in neural networks (NN) because it makes building and training deep learning models simple, fast, and user‑friendly, while still being powerful enough for research and production.
High‑level API
You can define a model in just a few lines
Keras used to support multiple backends; now it’s tightly integrated with TensorFlow (tf.keras)
Comparison Keras and Pytorch:
| Feature | Keras (TensorFlow) | PyTorch |
|---|---|---|
| Ease of Use | Very easy, high‑level API | Moderate, more code needed |
| Flexibility | Good but less flexible than PyTorch | Very high, research‑friendly |
| Learning Curve | Beginner‑friendly | Steeper but intuitive for Python users |
| Computation Graph | Static + Dynamic (TF 2.x eager mode) | Fully dynamic |
| Customization | Moderate | Excellent (ideal for custom models) |
| Research Adoption | High | Very high (dominates research) |
| Industry Deployment | Excellent (TF Serving, TF Lite, TF.js) | Good (TorchServe, PyTorch Mobile) |
| TPU Support | Excellent | Limited |
| GPU Performance | Strong | Strong |
| Model Prototyping | Fast and simple | Fast for experts |
| Community | Large, production-focused | Large, research-focused |
| Best For | Production, beginners, fast prototyping | Research, custom architectures, control |
Adam Optimizer
Adam stands for Adaptive Moment Estimation. It is one of the most widely used optimizers in neural networks, especially for binary classification (e.g., spam/ham, churn/no-churn, fraud/not-fraud, etc.).
Adam combines the strengths of:
Momentum → smooths gradients
RMSProp → adapts learning rate per parameter
This makes training fast, stable, and less sensitive to noisy gradients.
Why Adam is good for binary classification
Binary classification uses the binary cross-entropy loss, which often produces noisy gradients, especially with:
small datasets
imbalanced classes
sigmoid output layer
Adam stabilizes this by controlling gradient updates intelligently.
**How Adam Works (Step-by-Step) **
Adam keeps two running averages at every training step.
1️. Compute Gradient
From your loss function, the network computes the gradient of error with respect to weights:
This gradient tells the optimizer how to adjust weights.
2️. First Moment (mₜ) — Momentum
Adam calculates an exponentially smoothed average of the gradient:
Think of it as:
“Where is the gradient generally pointing?”
This filters out noise and stabilizes updates.
Typically: β₁ = 0.9
3️. Second Moment (vₜ) — RMSProp Component
Adam also tracks the average squared gradient:
This tells:
“How large is the gradient typically?”
Used to adjust learning rate per parameter.
Usually: β₂ = 0.999
4️. Bias Correction
At the beginning of training, both averages are very small. So Adam corrects their bias:
This prevents extremely tiny updates early on.
5️. Weight Update Rule
Finally, Adam updates the weights using:
Where:
α = learning rate (default 0.001)
ε = a tiny value to avoid division by zero
Visualization: Adam’s Moment Estimates
The plot below shows:
Raw gradients (noisy)
First moment (smooth trend)
Second moment (smooth squared magnitude)
Model Parameters Summary (Keras) explanation:
| Metric | Value | Explanation |
|---|---|---|
| Total Parameters | 321 (1.25 KB) | Total number of weights and biases in the neural network |
| Trainable Parameters | 321 (1.25 KB) | Parameters updated during training via backpropagation |
| Non-Trainable Parameters | 0 (0.00 B) | Parameters frozen during training (none in this model) |
Layer-wise Parameter Breakdown:
| Layer | Input Units | Output Units | Weights Calculation | Biases | Total Params |
|---|---|---|---|---|---|
| Dense (Hidden 1) | 10 | 16 | 10 × 16 = 160 | 16 | 176 |
| Dense (Hidden 2) | 16 | 8 | 16 × 8 = 128 | 8 | 136 |
| Dense (Output) | 8 | 1 | 8 × 1 = 8 | 1 | 9 |
| Total | – | – | – | – | 321 |
Memory Footprint Calculation:
| Item | Value |
|---|---|
| Parameters | 321 |
| Data type | float32 (4 bytes) |
| Total Memory | 321 × 4 = 1,284 bytes (~1.25 KB) |
Epoch 1/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 24ms/step - accuracy: 0.5016 - loss: 0.8376 - val_accuracy: 0.4500 - val_loss: 0.7931
Epoch 2/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.5328 - loss: 0.7379 - val_accuracy: 0.4812 - val_loss: 0.7133
Epoch 3/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.5813 - loss: 0.6619 - val_accuracy: 0.5500 - val_loss: 0.6553
Epoch 4/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6359 - loss: 0.6051 - val_accuracy: 0.5938 - val_loss: 0.6135
Epoch 5/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6781 - loss: 0.5597 - val_accuracy: 0.6375 - val_loss: 0.5796
Epoch 6/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7469 - loss: 0.5213 - val_accuracy: 0.6938 - val_loss: 0.5508
Epoch 7/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7812 - loss: 0.4885 - val_accuracy: 0.7250 - val_loss: 0.5267
Epoch 8/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8250 - loss: 0.4587 - val_accuracy: 0.8062 - val_loss: 0.5057
Epoch 9/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8609 - loss: 0.4320 - val_accuracy: 0.8250 - val_loss: 0.4870
Epoch 10/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8687 - loss: 0.4079 - val_accuracy: 0.8125 - val_loss: 0.4722
Epoch 11/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8750 - loss: 0.3864 - val_accuracy: 0.8188 - val_loss: 0.4590
Epoch 12/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8750 - loss: 0.3686 - val_accuracy: 0.8125 - val_loss: 0.4476
Epoch 13/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8766 - loss: 0.3523 - val_accuracy: 0.8062 - val_loss: 0.4383
Epoch 14/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8766 - loss: 0.3390 - val_accuracy: 0.8062 - val_loss: 0.4310
Epoch 15/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8766 - loss: 0.3272 - val_accuracy: 0.8125 - val_loss: 0.4246
Epoch 16/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8766 - loss: 0.3171 - val_accuracy: 0.8125 - val_loss: 0.4198
Epoch 17/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8781 - loss: 0.3084 - val_accuracy: 0.8250 - val_loss: 0.4146
Epoch 18/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8797 - loss: 0.3004 - val_accuracy: 0.8313 - val_loss: 0.4103
Epoch 19/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8813 - loss: 0.2927 - val_accuracy: 0.8313 - val_loss: 0.4066
Epoch 20/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8797 - loss: 0.2856 - val_accuracy: 0.8313 - val_loss: 0.4017
Training Configuration Parameters (Keras)
Model Training Settings
| Parameter | Value | Definition | Impact |
|---|---|---|---|
| Epochs | 20 | One epoch represents a full pass of the entire training dataset through the neural network | Too few → underfitting; too many → overfitting and longer training time |
| Batch Size | 32 | Number of samples processed before the model updates its weights | Smaller batches improve generalization but slow training; larger batches speed up training but require more memory |
| Validation Split | 0.2 | Percentage of training data held out for validation during training | Helps detect overfitting and tune hyperparameters without touching test data |
| Verbose | 1 | Controls the level of logging output shown during training | Improves training visibility and debugging; no impact on model performance |
To note
| Parameter | Recommended Starting Point | When to Change |
|---|---|---|
| Epochs | 10–20 | Increase if validation loss is still decreasing |
| Batch Size | 32 | Increase if training is slow and memory allows |
| Validation Split | 0.2 | Increase for small datasets |
| Verbose | 1 | Use 0 for production runs |
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7900 - loss: 0.3841
Test Accuracy: 0.79
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step