"Guiding Future STEM Leaders through Innovative Research Training" ~ thinkingbeyond.education

Image: ubuntu2204
Kernel: Python 3
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score
import time
print("Fetching MNIST dataset...") mnist = fetch_openml('mnist_784', version=1) X, y = mnist.data, mnist.target y = y.astype(int)
Fetching MNIST dataset...
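A minor variant, not part of the original cell: recent scikit-learn versions return a pandas DataFrame from fetch_openml by default, which works fine with StandardScaler below, but plain NumPy arrays can be requested explicitly if preferred.

# Optional alternative: fetch the data as NumPy arrays instead of a DataFrame
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target.astype(int)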
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
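One methodological caveat: the scaler above is fit on the full dataset before splitting, so test-set statistics leak into the preprocessing. A minimal sketch of the leak-free ordering, reusing the same variable names as above:

# Split the raw features first, then fit the scaler on the training split only
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)   # fit + transform on training data
X_test = scaler.transform(X_test_raw)         # reuse training statistics on the test data

For MNIST the effect on the reported scores is usually small, but the second ordering is the one that carries over to a real deployment setting.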
def evaluate_classifier(name, clf):
    print(f"### {name} ###")
    start_time = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - start_time
    y_pred = clf.predict(X_test)

    # Calculate metrics
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    precision = precision_score(y_test, y_pred, average='weighted')

    print(f"Accuracy: {acc:.2f}")
    print(f"F1 Score: {f1:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"Precision: {precision:.2f}")
    print(f"Training Time: {train_time:.4f} seconds\n")

    # Return results as a dictionary
    return {
        "Classifier": name,
        "Accuracy": acc,
        "F1 Score": f1,
        "Recall": recall,
        "Precision": precision,
        "Training Time (s)": train_time
    }

# List of classifiers
classifiers = [
    ("Logistic Regression", LogisticRegression(max_iter=1000)),
    ("SVM with RBF Kernel", SVC(kernel="rbf", probability=True)),
    ("Decision Tree", DecisionTreeClassifier()),
    ("Random Forest", RandomForestClassifier()),
    ("Gradient Boosting", GradientBoostingClassifier()),
    ("Naive Bayes", GaussianNB())
]
results = []
for name, clf in classifiers:
    results.append(evaluate_classifier(name, clf))
### Logistic Regression ###
Accuracy: 0.92
F1 Score: 0.92
Recall: 0.92
Precision: 0.92
Training Time: 56.5794 seconds

### SVM with RBF Kernel ###
Accuracy: 0.96
F1 Score: 0.96
Recall: 0.96
Precision: 0.96
Training Time: 2258.5055 seconds

### Decision Tree ###
Accuracy: 0.87
F1 Score: 0.87
Recall: 0.87
Precision: 0.87
Training Time: 24.0900 seconds

### Random Forest ###
Accuracy: 0.97
F1 Score: 0.97
Recall: 0.97
Precision: 0.97
Training Time: 49.2545 seconds

### Gradient Boosting ###
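The timings above explain why the output ends part-way through: the RBF-kernel SVC on the 56,000 training samples already takes roughly 38 minutes, and GradientBoostingClassifier on 784 raw pixel features is typically slower still. A hedged sketch for faster iteration, using a hypothetical subsample size of 10,000 that is not part of the original run:

# Optional: benchmark on a random subset so every classifier stays tractable
rng = np.random.RandomState(42)
subset = rng.choice(len(X_train), size=10_000, replace=False)
X_train_small = X_train[subset]
y_train_small = np.asarray(y_train)[subset]
# Rebinding the globals used by evaluate_classifier leaves the rest of the notebook unchanged:
# X_train, y_train = X_train_small, y_train_small

Scores measured on a subset will be slightly lower than on the full training set, so this is best treated as a quick screening pass rather than the final comparison.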
df_results = pd.DataFrame(results)
print("\n### Comparison Table ###\n") print(df_results)
from IPython.display import display
display(df_results)
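If a ranked view is wanted, the same DataFrame can be sorted; this is an optional convenience, not part of the original notebook:

# Rank classifiers by F1 score, keeping training time visible as a practical tiebreaker
ranked = df_results.sort_values(by="F1 Score", ascending=False).reset_index(drop=True)
display(ranked[["Classifier", "F1 Score", "Accuracy", "Training Time (s)"]])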