GitHub Repository: suyashi29/python-su
Path: blob/master/Applied Generative AI with GANS/5 Keras_Binary_Classification_NN.ipynb
Kernel: Python 3 (ipykernel)

Neural Network Binary Classification using Keras

Keras is used in neural networks (NN) because it makes building and training deep learning models simple, fast, and user‑friendly, while still being powerful enough for research and production.

  • High‑level API

  • You can define a model in just a few lines (see the sketch after this list)

  • Keras originally supported multiple backends; it is now tightly integrated with TensorFlow as tf.keras (and Keras 3 reintroduces multi-backend support for TensorFlow, JAX, and PyTorch)
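
As a quick illustration of the "few lines" point, here is a minimal sketch of a tiny binary classifier. The layer sizes and the name tiny_model are arbitrary choices for illustration; it assumes TensorFlow 2.x is installed.

# Minimal sketch: a tiny binary classifier defined in a few lines of Keras.
from tensorflow import keras
from tensorflow.keras import layers

tiny_model = keras.Sequential([
    keras.Input(shape=(10,)),               # 10 input features
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # single probability output
])
tiny_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])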

Comparison: Keras vs. PyTorch

| Feature | Keras (TensorFlow) | PyTorch |
|---|---|---|
| Ease of Use | Very easy, high-level API | Moderate, more code needed |
| Flexibility | Good, but less flexible than PyTorch | Very high, research-friendly |
| Learning Curve | Beginner-friendly | Steeper, but intuitive for Python users |
| Computation Graph | Static + dynamic (TF 2.x eager mode) | Fully dynamic |
| Customization | Moderate | Excellent (ideal for custom models) |
| Research Adoption | High | Very high (dominates research) |
| Industry Deployment | Excellent (TF Serving, TF Lite, TF.js) | Good (TorchServe, PyTorch Mobile) |
| TPU Support | Excellent | Limited |
| GPU Performance | Strong | Strong |
| Model Prototyping | Fast and simple | Fast for experts |
| Community | Large, production-focused | Large, research-focused |
| Best For | Production, beginners, fast prototyping | Research, custom architectures, control |
!pip install --upgrade pip
!pip uninstall -y tensorflow tensorflow-gpu
!pip install tensorflow
# Note: older TensorFlow releases do not publish wheels for Python 3.11 / 3.12,
# so the install will fail there unless you use a recent TensorFlow version.
import sys
print(sys.version)
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=6,
    n_redundant=2,
    n_classes=2,
    random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Adam Optimizer

Adam stands for Adaptive Moment Estimation. It is one of the most widely used optimizers in neural networks, especially for binary classification (e.g., spam/ham, churn/no-churn, fraud/not-fraud).

Adam combines the strengths of:

  • Momentum → smooths gradients

  • RMSProp → adapts learning rate per parameter

This makes training fast, stable, and less sensitive to noisy gradients.


Why Adam is good for binary classification

Binary classification uses the binary cross-entropy loss, which often produces noisy gradients, especially with:

  • small datasets

  • imbalanced classes

  • sigmoid output layer

Adam stabilizes this by controlling gradient updates intelligently.
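
To make the loss itself concrete, here is a small NumPy sketch of binary cross-entropy; the labels y_true and predicted probabilities y_prob are made-up values for illustration.

import numpy as np

# Sketch: binary cross-entropy, the loss minimized in binary classification.
# y_true are 0/1 labels, y_prob are sigmoid outputs; values are illustrative.
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1])

eps = 1e-7  # avoid log(0)
bce = -np.mean(y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps))
print(f"Binary cross-entropy: {bce:.4f}")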


How Adam Works (Step-by-Step)

Adam keeps two running averages at every training step.


1. Compute Gradient

From your loss function, the network computes the gradient of error with respect to weights:

$$g_t = \nabla_\theta L_t$$

This gradient tells the optimizer how to adjust weights.


2. First Moment (mₜ): Momentum

Adam calculates an exponentially smoothed average of the gradient:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$

Think of it as:

“Where is the gradient generally pointing?”

This filters out noise and stabilizes updates.

Typically: β₁ = 0.9


3. Second Moment (vₜ): RMSProp Component

Adam also tracks the average squared gradient:

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

This tells the optimizer:

“How large is the gradient typically?”

It is used to adapt the learning rate for each parameter.

Usually: β₂ = 0.999


4. Bias Correction

At the beginning of training, both running averages are biased toward zero (they start at zero), so Adam corrects them:

$$\hat m_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat v_t = \frac{v_t}{1 - \beta_2^t}$$

This prevents extremely tiny updates early on. For example, at t = 1 with β₁ = 0.9, the first moment is m₁ = 0.1·g₁, and dividing by (1 − 0.9¹) = 0.1 rescales it back to g₁.


5. Weight Update Rule

Finally, Adam updates the weights using:

$$\theta_{t+1} = \theta_t - \alpha \cdot \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}$$

Where:

  • α = learning rate (default 0.001)

  • ε = a tiny value to avoid division by zero
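
Putting the five steps together, the following NumPy sketch performs a single Adam update for one parameter vector. The weights and gradient values are made up for illustration; this is not the optimized TensorFlow implementation.

import numpy as np

# Sketch of one Adam update step; hyperparameters follow the defaults above.
alpha, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

theta = np.array([0.5, -0.3])   # illustrative weights
m = np.zeros_like(theta)        # first moment
v = np.zeros_like(theta)        # second moment

def adam_step(theta, g, m, v, t):
    m = beta1 * m + (1 - beta1) * g          # momentum: smoothed gradient
    v = beta2 * v + (1 - beta2) * g**2       # RMSProp: smoothed squared gradient
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)   # weight update
    return theta, m, v

g = np.array([0.2, -0.1])       # made-up gradient from the loss
theta, m, v = adam_step(theta, g, m, v, t=1)
print(theta)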


Visualization: Adam’s Moment Estimates

The plot below shows:

  • Raw gradients (noisy)

  • First moment (smooth trend)

  • Second moment (smooth squared magnitude)

[Figure: raw gradients (noisy) compared with Adam's first-moment and second-moment estimates]
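
A plot of this kind can be reproduced with the sketch below, which simulates made-up noisy gradients and tracks Adam's running moment estimates.

import numpy as np
import matplotlib.pyplot as plt

# Sketch: simulate noisy gradients and track Adam's running moments.
rng = np.random.default_rng(0)
steps = 200
grads = 0.5 + 0.3 * rng.standard_normal(steps)   # noisy gradients around 0.5

beta1, beta2 = 0.9, 0.999
m, v = 0.0, 0.0
m_hist, v_hist = [], []
for g in grads:
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hist.append(m)
    v_hist.append(v)

plt.plot(grads, alpha=0.4, label='raw gradient (noisy)')
plt.plot(m_hist, label='first moment m_t (smoothed)')
plt.plot(v_hist, label='second moment v_t (squared magnitude)')
plt.xlabel('step')
plt.legend()
plt.show()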

model = Sequential([
    Dense(16, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.summary()
C:\Users\Suyashi144893\AppData\Local\anaconda3\Lib\site-packages\keras\src\layers\core\dense.py:106: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(activity_regularizer=activity_regularizer, **kwargs)
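
The warning is harmless. Following its suggestion, the same architecture can be defined with an explicit Input object as the first layer; this is a sketch, and the name model_alt is only for illustration.

from tensorflow.keras.layers import Input

# Same architecture, with an explicit Input layer as the warning recommends.
model_alt = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])
# (compile and fit exactly as before)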

Model Parameters Summary (Keras), explained:

| Metric | Value | Explanation |
|---|---|---|
| Total Parameters | 321 (1.25 KB) | Total number of weights and biases in the neural network |
| Trainable Parameters | 321 (1.25 KB) | Parameters updated during training via backpropagation |
| Non-Trainable Parameters | 0 (0.00 B) | Parameters frozen during training (none in this model) |

Layer-wise Parameter Breakdown:

| Layer | Input Units | Output Units | Weights Calculation | Biases | Total Params |
|---|---|---|---|---|---|
| Dense (Hidden 1) | 10 | 16 | 10 × 16 = 160 | 16 | 176 |
| Dense (Hidden 2) | 16 | 8 | 16 × 8 = 128 | 8 | 136 |
| Dense (Output) | 8 | 1 | 8 × 1 = 8 | 1 | 9 |
| Total | | | | | 321 |

Memory Footprint Calculation:

| Item | Value |
|---|---|
| Parameters | 321 |
| Data type | float32 (4 bytes) |
| Total Memory | 321 × 4 = 1,284 bytes (~1.25 KB) |
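
These figures can be double-checked by hand with a short sketch; the layer sizes are taken from the model above.

# Sanity check: recompute the parameter count and memory footprint by hand.
layer_sizes = [(10, 16), (16, 8), (8, 1)]   # (inputs, outputs) per Dense layer
total_params = sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)  # weights + biases
print(total_params)                          # 321
print(total_params * 4, "bytes (float32)")   # 1284 bytes ≈ 1.25 KB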
history = model.fit(
    X_train, y_train,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)
Epoch 1/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 24ms/step - accuracy: 0.5016 - loss: 0.8376 - val_accuracy: 0.4500 - val_loss: 0.7931
Epoch 2/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.5328 - loss: 0.7379 - val_accuracy: 0.4812 - val_loss: 0.7133
Epoch 3/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.5813 - loss: 0.6619 - val_accuracy: 0.5500 - val_loss: 0.6553
Epoch 4/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6359 - loss: 0.6051 - val_accuracy: 0.5938 - val_loss: 0.6135
Epoch 5/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6781 - loss: 0.5597 - val_accuracy: 0.6375 - val_loss: 0.5796
Epoch 6/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7469 - loss: 0.5213 - val_accuracy: 0.6938 - val_loss: 0.5508
Epoch 7/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7812 - loss: 0.4885 - val_accuracy: 0.7250 - val_loss: 0.5267
Epoch 8/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8250 - loss: 0.4587 - val_accuracy: 0.8062 - val_loss: 0.5057
Epoch 9/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8609 - loss: 0.4320 - val_accuracy: 0.8250 - val_loss: 0.4870
Epoch 10/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8687 - loss: 0.4079 - val_accuracy: 0.8125 - val_loss: 0.4722
Epoch 11/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8750 - loss: 0.3864 - val_accuracy: 0.8188 - val_loss: 0.4590
Epoch 12/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8750 - loss: 0.3686 - val_accuracy: 0.8125 - val_loss: 0.4476
Epoch 13/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8766 - loss: 0.3523 - val_accuracy: 0.8062 - val_loss: 0.4383
Epoch 14/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8766 - loss: 0.3390 - val_accuracy: 0.8062 - val_loss: 0.4310
Epoch 15/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.8766 - loss: 0.3272 - val_accuracy: 0.8125 - val_loss: 0.4246
Epoch 16/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8766 - loss: 0.3171 - val_accuracy: 0.8125 - val_loss: 0.4198
Epoch 17/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.8781 - loss: 0.3084 - val_accuracy: 0.8250 - val_loss: 0.4146
Epoch 18/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8797 - loss: 0.3004 - val_accuracy: 0.8313 - val_loss: 0.4103
Epoch 19/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8813 - loss: 0.2927 - val_accuracy: 0.8313 - val_loss: 0.4066
Epoch 20/20
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8797 - loss: 0.2856 - val_accuracy: 0.8313 - val_loss: 0.4017

Training Configuration Parameters (Keras)

Model Training Settings

| Parameter | Value | Definition | Impact |
|---|---|---|---|
| Epochs | 20 | One epoch is a full pass of the entire training dataset through the neural network | Too few → underfitting; too many → overfitting and longer training time |
| Batch Size | 32 | Number of samples processed before the model updates its weights | Smaller batches improve generalization but slow training; larger batches speed up training but require more memory |
| Validation Split | 0.2 | Fraction of training data held out for validation during training | Helps detect overfitting and tune hyperparameters without touching test data |
| Verbose | 1 | Controls the level of logging output shown during training | Improves training visibility and debugging; no impact on model performance |

To note

| Parameter | Recommended Starting Point | When to Change |
|---|---|---|
| Epochs | 10–20 | Increase if validation loss is still decreasing |
| Batch Size | 32 | Increase if training is slow and memory allows |
| Validation Split | 0.2 | Increase for small datasets |
| Verbose | 1 | Use 0 for production runs |
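
One simple way to judge whether the validation loss is still decreasing (and hence whether more epochs would help) is to plot the history object returned by model.fit. This is a sketch; it uses matplotlib, which is not imported elsewhere in this notebook.

import matplotlib.pyplot as plt

# Sketch: plot training vs. validation loss to see whether val_loss keeps falling.
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy')
plt.legend()
plt.show()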
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.2f}")
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.7900 - loss: 0.3841 Test Accuracy: 0.79
predictions = model.predict(X_test)
predicted_classes = (predictions > 0.5).astype(int)
predicted_classes[:10]
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step
array([[1], [0], [0], [1], [1], [0], [0], [0], [1], [1]])
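
As an optional extra check (a sketch using scikit-learn, which the notebook already depends on), the thresholded predictions can be compared against the true test labels:

from sklearn.metrics import confusion_matrix, classification_report

# Sketch: compare thresholded predictions with the true test labels.
print(confusion_matrix(y_test, predicted_classes.ravel()))
print(classification_report(y_test, predicted_classes.ravel()))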