Kernel: Python 3 (system-wide)

Introduction: The Power of GPU vs CPU

Graphics Processing Units (GPUs) were originally designed to accelerate graphics rendering. Their architecture, built around thousands of small, efficient cores, is also well suited to parallel numerical computation. Central Processing Units (CPUs), by contrast, are general-purpose and optimized for single-threaded performance. For highly parallel workloads, especially in machine learning, scientific computing, and simulation, GPUs often deliver orders-of-magnitude speedups.

Mathematical Background

Historically, high-performance computation for mathematics relied on CPUs. However, as problems in linear algebra and calculus grew in size—especially in the context of neural networks and simulation—the need for parallel computations increased. The shift to GPUs enabled many modern breakthroughs in deep learning and big data analytics.

  • Key concept: GPUs excel at performing the same operation on many numbers at once—Single Instruction, Multiple Data (SIMD).

  • Linear algebra operations form the backbone of machine learning and scientific computation, and they parallelize naturally; a short matrix-multiplication sketch after this list illustrates the point.
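
As a minimal illustration of how readily linear algebra parallelizes, the sketch below times the same large matrix multiplication on the CPU and, if one is available, on the GPU. The matrix size of 8000×8000 is an arbitrary choice for illustration; results depend on the hardware at hand.

import time
import torch

n = 8000  # arbitrary size chosen for illustration
A = torch.randn(n, n)
B = torch.randn(n, n)

# Time the multiplication on the CPU.
start = time.time()
C_cpu = A @ B
cpu_matmul_time = time.time() - start
print(f"CPU matmul: {cpu_matmul_time:.2f} s")

# Time the same multiplication on the GPU, if one is available.
if torch.cuda.is_available():
    A_gpu, B_gpu = A.cuda(), B.cuda()
    torch.cuda.synchronize()  # make sure the host-to-device transfer has finished
    start = time.time()
    C_gpu = A_gpu @ B_gpu
    torch.cuda.synchronize()  # wait for the kernel to complete before stopping the clock
    gpu_matmul_time = time.time() - start
    print(f"GPU matmul: {gpu_matmul_time:.2f} s")
else:
    print("No GPU detected.")

The explicit torch.cuda.synchronize() calls matter because GPU kernels launch asynchronously; without them the timer would stop before the multiplication has actually finished.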

Modern Applications Utilizing GPU Acceleration

  • Training deep neural networks (deep learning)

  • Molecular dynamics simulations

  • Large-scale matrix operations (finance, science)

  • Real-time video and image processing

  • Big data analytics

import numpy as np
import torch
import time

Example 1: Neural Network Training Speed (CPU vs GPU)

Train a simple neural network on random data, comparing timing on CPU and GPU.

import torch
import torch.nn as nn

input_size = 4000       # doubled input size
hidden_size1 = 2000     # extra hidden layer, larger
hidden_size2 = 1200     # new second hidden layer
output_size = 10
n_samples = 100000      # more data

X = torch.randn(n_samples, input_size)
y = torch.randint(0, output_size, (n_samples,))

model_cpu = nn.Sequential(
    nn.Linear(input_size, hidden_size1),
    nn.ReLU(),
    nn.Linear(hidden_size1, hidden_size2),
    nn.ReLU(),
    nn.Linear(hidden_size2, output_size)
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_cpu.parameters(), lr=0.01)

Train on CPU

def train(model, X, y, optimizer, criterion, epochs=3):
    for _ in range(epochs):
        optimizer.zero_grad()
        output = model(X)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()

start = time.time()
train(model_cpu, X, y, optimizer, criterion)
cpu_train_time = time.time() - start
print(f"Neural network training on CPU took {cpu_train_time:.2f} seconds.")
Neural network training on CPU took 16.73 seconds.

Train on GPU

if torch.cuda.is_available():
    model_gpu = nn.Sequential(
        nn.Linear(input_size, hidden_size1),
        nn.ReLU(),
        nn.Linear(hidden_size1, hidden_size2),
        nn.ReLU(),
        nn.Linear(hidden_size2, output_size)
    ).cuda()
    X_gpu = X.cuda()
    y_gpu = y.cuda()
    optimizer_gpu = torch.optim.SGD(model_gpu.parameters(), lr=0.01)

    # Warmup run so CUDA initialization is not counted in the timing
    train(model_gpu, X_gpu, y_gpu, optimizer_gpu, criterion)
    torch.cuda.synchronize()

    start = time.time()
    train(model_gpu, X_gpu, y_gpu, optimizer_gpu, criterion)
    torch.cuda.synchronize()  # wait for all GPU work to finish before stopping the clock
    gpu_train_time = time.time() - start
    print(f"Neural network training on GPU took {gpu_train_time:.2f} seconds.")
else:
    print("No GPU detected.")
Neural network training on GPU took 0.16 seconds.
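
With both timings recorded, a one-line follow-up can report the relative speedup, roughly a hundredfold for the run shown (16.73 s versus 0.16 s). This assumes the CPU and GPU cells above have both been executed so that cpu_train_time and gpu_train_time exist.

# Report how many times faster the GPU run was (assumes both timings above exist).
print(f"GPU speedup over CPU: {cpu_train_time / gpu_train_time:.1f}x")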

High-Level Summary

GPUs have transformed how large-scale mathematical, scientific, and engineering problems are solved by accelerating computations that would be impractically slow on CPUs alone. The examples above show that for parallelizable tasks such as matrix multiplication and neural network training, GPUs can deliver dramatic speedups over CPUs, enabling modern AI, data science, and simulation at scale.