Kernel: Python 3 (system-wide)

Introduction: The Power of GPU vs CPU

Graphics Processing Units (GPUs) were originally designed to accelerate graphics rendering. Their architecture, built around thousands of small, efficient cores, is also well suited to parallel numerical computation. Central Processing Units (CPUs), by contrast, are general-purpose and optimized for single-threaded performance. For highly parallel workloads, especially in machine learning, scientific computing, and simulation, GPUs often deliver orders-of-magnitude speedups.

Mathematical Background

Historically, high-performance computation for mathematics relied on CPUs. However, as problems in linear algebra and calculus grew in size—especially in the context of neural networks and simulation—the need for parallel computations increased. The shift to GPUs enabled many modern breakthroughs in deep learning and big data analytics.

  • Key concept: GPUs excel at performing the same operation on many numbers at once—Single Instruction, Multiple Data (SIMD).

  • Linear algebra operations form the backbone of machine learning and scientific computation, and they parallelize naturally; a short matrix-multiplication sketch after this list illustrates the point.
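
As a minimal illustration of how readily linear algebra parallelizes, the sketch below times the same large matrix multiplication on the CPU and, if one is available, on the GPU. The matrix size of 8000×8000 is an arbitrary choice for illustration; results depend on the hardware at hand.

import time
import torch

n = 8000  # arbitrary size chosen for illustration
A = torch.randn(n, n)
B = torch.randn(n, n)

# Time the multiplication on the CPU.
start = time.time()
C_cpu = A @ B
cpu_matmul_time = time.time() - start
print(f"CPU matmul: {cpu_matmul_time:.2f} s")

# Time the same multiplication on the GPU, if one is available.
if torch.cuda.is_available():
    A_gpu, B_gpu = A.cuda(), B.cuda()
    torch.cuda.synchronize()  # make sure the host-to-device transfer has finished
    start = time.time()
    C_gpu = A_gpu @ B_gpu
    torch.cuda.synchronize()  # wait for the kernel to complete before stopping the clock
    gpu_matmul_time = time.time() - start
    print(f"GPU matmul: {gpu_matmul_time:.2f} s")
else:
    print("No GPU detected.")

The explicit torch.cuda.synchronize() calls matter because GPU kernels launch asynchronously; without them the timer would stop before the multiplication has actually finished.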

Modern Applications Utilizing GPU Acceleration

  • Training deep neural networks (deep learning)

  • Molecular dynamics simulations

  • Large-scale matrix operations (finance, science)

  • Real-time video and image processing

  • Big data analytics

import numpy as np
import torch
import time

Example 1: Neural Network Training Speed (CPU vs GPU)

Train a simple neural network on random data, comparing timing on CPU and GPU.

import torch
import torch.nn as nn

input_size = 4000       # doubled input size
hidden_size1 = 2000     # extra hidden layer, larger
hidden_size2 = 1200     # new second hidden layer
output_size = 10
n_samples = 100000      # more data

X = torch.randn(n_samples, input_size)
y = torch.randint(0, output_size, (n_samples,))

model_cpu = nn.Sequential(
    nn.Linear(input_size, hidden_size1),
    nn.ReLU(),
    nn.Linear(hidden_size1, hidden_size2),
    nn.ReLU(),
    nn.Linear(hidden_size2, output_size)
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_cpu.parameters(), lr=0.01)

Train on CPU

def train(model, X, y, optimizer, criterion, epochs=3):
    for _ in range(epochs):
        optimizer.zero_grad()
        output = model(X)
        loss = criterion(output, y)
        loss.backward()
        optimizer.step()

start = time.time()
train(model_cpu, X, y, optimizer, criterion)
cpu_train_time = time.time() - start
print(f"Neural network training on CPU took {cpu_train_time:.2f} seconds.")
Neural network training on CPU took 16.73 seconds.

Train on GPU

if torch.cuda.is_available():
    model_gpu = nn.Sequential(
        nn.Linear(input_size, hidden_size1),
        nn.ReLU(),
        nn.Linear(hidden_size1, hidden_size2),
        nn.ReLU(),
        nn.Linear(hidden_size2, output_size)
    ).cuda()
    X_gpu = X.cuda()
    y_gpu = y.cuda()
    optimizer_gpu = torch.optim.SGD(model_gpu.parameters(), lr=0.01)

    # Warmup run so CUDA initialization is not counted in the timing
    train(model_gpu, X_gpu, y_gpu, optimizer_gpu, criterion)
    torch.cuda.synchronize()

    start = time.time()
    train(model_gpu, X_gpu, y_gpu, optimizer_gpu, criterion)
    torch.cuda.synchronize()  # wait for all GPU work to finish before stopping the clock
    gpu_train_time = time.time() - start
    print(f"Neural network training on GPU took {gpu_train_time:.2f} seconds.")
else:
    print("No GPU detected.")
Neural network training on GPU took 0.16 seconds.
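
With both timings recorded, a one-line follow-up can report the relative speedup, roughly a hundredfold for the run shown (16.73 s versus 0.16 s). This assumes the CPU and GPU cells above have both been executed so that cpu_train_time and gpu_train_time exist.

# Report how many times faster the GPU run was (assumes both timings above exist).
print(f"GPU speedup over CPU: {cpu_train_time / gpu_train_time:.1f}x")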

High-Level Summary

GPUs have transformed how large-scale mathematical, scientific, and engineering problems are solved by accelerating computations that would be impractically slow on CPUs alone. The examples above show that for parallelizable tasks such as matrix multiplication and neural network training, GPUs can deliver dramatic speedups over CPUs, enabling modern AI, data science, and simulation at scale.