"Guiding Future STEM Leaders through Innovative Research Training" ~ thinkingbeyond.education

Image: ubuntu2204
Kernel: Python 3

Optimizing MLPs for MNIST

by: Nayra Saadawy & Priyam Raul

mentor: Prof. Devendra Singh

!pip freeze > requirements.txt  # write the full list of installed packages to requirements.txt
!pip freeze | grep -E 'torch|torchvision|matplotlib' > requirements.txt  # overwrite it with a filtered version (torch / torchvision / matplotlib entries only)
!cat requirements.txt
absl-py==1.4.0 accelerate==1.2.1 aiohappyeyeballs==2.4.4 aiohttp==3.11.10 aiosignal==1.3.2 alabaster==1.0.0 albucore==0.0.19 albumentations==1.4.20 altair==5.5.0 annotated-types==0.7.0 anyio==3.7.1 argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 array_record==0.5.1 arviz==0.20.0 astropy==6.1.7 astropy-iers-data==0.2024.12.16.0.35.48 astunparse==1.6.3 async-timeout==4.0.3 atpublic==4.1.0 attrs==24.3.0 audioread==3.0.1 autograd==1.7.0 babel==2.16.0 backcall==0.2.0 beautifulsoup4==4.12.3 bigframes==1.29.0 bigquery-magics==0.4.0 bleach==6.2.0 blinker==1.9.0 blis==0.7.11 blosc2==2.7.1 bokeh==3.6.2 Bottleneck==1.4.2 bqplot==0.12.43 branca==0.8.1 CacheControl==0.14.1 cachetools==5.5.0 catalogue==2.0.10 certifi==2024.12.14 cffi==1.17.1 chardet==5.2.0 charset-normalizer==3.4.0 chex==0.1.88 clarabel==0.9.0 click==8.1.7 cloudpathlib==0.20.0 cloudpickle==3.1.0 cmake==3.31.2 cmdstanpy==1.2.5 colorcet==3.1.0 colorlover==0.3.0 colour==0.1.5 community==1.0.0b1 confection==0.1.5 cons==0.4.6 contourpy==1.3.1 cryptography==43.0.3 cuda-python==12.2.1 cudf-cu12 @ https://pypi.nvidia.com/cudf-cu12/cudf_cu12-24.10.1-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl cufflinks==0.17.3 cupy-cuda12x==12.2.0 cvxopt==1.3.2 cvxpy==1.6.0 cycler==0.12.1 cymem==2.0.10 Cython==3.0.11 dask==2024.10.0 datascience==0.17.6 db-dtypes==1.3.1 dbus-python==1.2.18 debugpy==1.8.0 decorator==4.4.2 defusedxml==0.7.1 Deprecated==1.2.15 diffusers==0.31.0 distro==1.9.0 dlib==19.24.2 dm-tree==0.1.8 docker-pycreds==0.4.0 docstring_parser==0.16 docutils==0.21.2 dopamine_rl==4.1.0 duckdb==1.1.3 earthengine-api==1.4.3 easydict==1.13 editdistance==0.8.1 eerepr==0.0.4 einops==0.8.0 en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl#sha256=86cc141f63942d4b2c5fcee06630fd6f904788d2f0ab005cce45aadb8fb73889 entrypoints==0.4 et_xmlfile==2.0.0 etils==1.11.0 etuples==0.3.9 eval_type_backport==0.2.0 exceptiongroup==1.2.2 fastai==2.7.18 fastcore==1.7.27 fastdownload==0.0.7 fastjsonschema==2.21.1 fastprogress==1.0.3 fastrlock==0.8.3 filelock==3.16.1 firebase-admin==6.6.0 Flask==3.1.0 flatbuffers==24.3.25 flax==0.8.5 folium==0.19.2 fonttools==4.55.3 frozendict==2.4.6 frozenlist==1.5.0 fsspec==2024.10.0 future==1.0.0 gast==0.6.0 gcsfs==2024.10.0 GDAL==3.6.4 gdown==5.2.0 geemap==0.35.1 gensim==4.3.3 geocoder==1.38.1 geographiclib==2.0 geopandas==1.0.1 geopy==2.4.1 gin-config==0.5.0 gitdb==4.0.11 GitPython==3.1.43 glob2==0.7 google==2.0.3 google-ai-generativelanguage==0.6.10 google-api-core==2.19.2 google-api-python-client==2.155.0 google-auth==2.27.0 google-auth-httplib2==0.2.0 google-auth-oauthlib==1.2.1 google-cloud-aiplatform==1.74.0 google-cloud-bigquery==3.25.0 google-cloud-bigquery-connection==1.17.0 google-cloud-bigquery-storage==2.27.0 google-cloud-bigtable==2.27.0 google-cloud-core==2.4.1 google-cloud-datastore==2.20.2 google-cloud-firestore==2.19.0 google-cloud-functions==1.19.0 google-cloud-iam==2.17.0 google-cloud-language==2.16.0 google-cloud-pubsub==2.27.1 google-cloud-resource-manager==1.14.0 google-cloud-storage==2.19.0 google-cloud-translate==3.19.0 google-colab @ file:///colabtools/dist/google_colab-1.0.0.tar.gz google-crc32c==1.6.0 google-genai==0.3.0 google-generativeai==0.8.3 google-pasta==0.2.0 google-resumable-media==2.7.2 googleapis-common-protos==1.66.0 googledrivedownloader==0.4 graphviz==0.20.3 greenlet==3.1.1 grpc-google-iam-v1==0.13.1 grpcio==1.68.1 grpcio-status==1.62.3 gspread==6.0.2 gspread-dataframe==3.3.1 gym==0.25.2 
gym-notices==0.0.8 h11==0.14.0 h5netcdf==1.4.1 h5py==3.12.1 holidays==0.63 holoviews==1.20.0 html5lib==1.1 httpcore==1.0.7 httpimport==1.4.0 httplib2==0.22.0 httpx==0.28.1 huggingface-hub==0.27.0 humanize==4.11.0 hyperopt==0.2.7 ibis-framework==9.2.0 idna==3.10 imageio==2.36.1 imageio-ffmpeg==0.5.1 imagesize==1.4.1 imbalanced-learn==0.12.4 imgaug==0.4.0 immutabledict==4.2.1 importlib_metadata==8.5.0 importlib_resources==6.4.5 imutils==0.5.4 inflect==7.4.0 iniconfig==2.0.0 intel-cmplr-lib-ur==2025.0.4 intel-openmp==2025.0.4 ipyevents==2.0.2 ipyfilechooser==0.6.0 ipykernel==5.5.6 ipyleaflet==0.19.2 ipyparallel==8.8.0 ipython==7.34.0 ipython-genutils==0.2.0 ipython-sql==0.5.0 ipytree==0.2.2 ipywidgets==7.7.1 itsdangerous==2.2.0 jax==0.4.33 jax-cuda12-pjrt==0.4.33 jax-cuda12-plugin==0.4.33 jaxlib==0.4.33 jeepney==0.7.1 jellyfish==1.1.0 jieba==0.42.1 Jinja2==3.1.4 jiter==0.8.2 joblib==1.4.2 jsonpatch==1.33 jsonpickle==4.0.1 jsonpointer==3.0.0 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 jupyter-client==6.1.12 jupyter-console==6.1.0 jupyter-leaflet==0.19.2 jupyter-server==1.24.0 jupyter_core==5.7.2 jupyterlab_pygments==0.3.0 jupyterlab_widgets==3.0.13 kaggle==1.6.17 kagglehub==0.3.5 keras==3.5.0 keyring==23.5.0 kiwisolver==1.4.7 langchain==0.3.12 langchain-core==0.3.25 langchain-text-splitters==0.3.3 langcodes==3.5.0 langsmith==0.2.3 language_data==1.3.0 launchpadlib==1.10.16 lazr.restfulclient==0.14.4 lazr.uri==1.0.6 lazy_loader==0.4 libclang==18.1.1 libcudf-cu12 @ https://pypi.nvidia.com/libcudf-cu12/libcudf_cu12-24.10.1-py3-none-manylinux_2_28_x86_64.whl librosa==0.10.2.post1 lightgbm==4.5.0 linkify-it-py==2.0.3 llvmlite==0.43.0 locket==1.0.0 logical-unification==0.4.6 lxml==5.3.0 marisa-trie==1.2.1 Markdown==3.7 markdown-it-py==3.0.0 MarkupSafe==3.0.2 matplotlib==3.8.0 matplotlib-inline==0.1.7 matplotlib-venn==1.1.1 mdit-py-plugins==0.4.2 mdurl==0.1.2 miniKanren==1.0.3 missingno==0.5.2 mistune==3.0.2 mizani==0.13.1 mkl==2025.0.1 ml-dtypes==0.4.1 mlxtend==0.23.3 more-itertools==10.5.0 moviepy==1.0.3 mpmath==1.3.0 msgpack==1.1.0 multidict==6.1.0 multipledispatch==1.0.0 multitasking==0.0.11 murmurhash==1.0.11 music21==9.3.0 namex==0.0.8 narwhals==1.18.4 natsort==8.4.0 nbclassic==1.1.0 nbclient==0.10.1 nbconvert==7.16.4 nbformat==5.10.4 ndindex==1.9.2 nest-asyncio==1.6.0 networkx==3.4.2 nibabel==5.3.2 nltk==3.9.1 notebook==6.5.5 notebook_shim==0.2.4 numba==0.60.0 numexpr==2.10.2 numpy==1.26.4 nvidia-cublas-cu12==12.6.4.1 nvidia-cuda-cupti-cu12==12.6.80 nvidia-cuda-nvcc-cu12==12.6.85 nvidia-cuda-runtime-cu12==12.6.77 nvidia-cudnn-cu12==9.6.0.74 nvidia-cufft-cu12==11.3.0.4 nvidia-curand-cu12==10.3.7.77 nvidia-cusolver-cu12==11.7.1.2 nvidia-cusparse-cu12==12.5.4.2 nvidia-nccl-cu12==2.23.4 nvidia-nvjitlink-cu12==12.6.85 nvtx==0.2.10 nx-cugraph-cu12 @ https://pypi.nvidia.com/nx-cugraph-cu12/nx_cugraph_cu12-24.10.0-py3-none-any.whl oauth2client==4.1.3 oauthlib==3.2.2 openai==1.57.4 opencv-contrib-python==4.10.0.84 opencv-python==4.10.0.84 opencv-python-headless==4.10.0.84 openpyxl==3.1.5 opentelemetry-api==1.29.0 opentelemetry-sdk==1.29.0 opentelemetry-semantic-conventions==0.50b0 opt_einsum==3.4.0 optax==0.2.4 optree==0.13.1 orbax-checkpoint==0.6.4 orjson==3.10.12 osqp==0.6.7.post3 packaging==24.2 pandas==2.2.2 pandas-datareader==0.10.0 pandas-gbq==0.25.0 pandas-stubs==2.2.2.240909 pandocfilters==1.5.1 panel==1.5.4 param==2.2.0 parso==0.8.4 parsy==2.1 partd==1.4.2 pathlib==1.0.1 patsy==1.0.1 peewee==3.17.8 peft==0.14.0 pexpect==4.9.0 pickleshare==0.7.5 pillow==11.0.0 platformdirs==4.3.6 
plotly==5.24.1 plotnine==0.14.4 pluggy==1.5.0 ply==3.11 polars==1.9.0 pooch==1.8.2 portpicker==1.5.2 preshed==3.0.9 prettytable==3.12.0 proglog==0.1.10 progressbar2==4.5.0 prometheus_client==0.21.1 promise==2.3 prompt_toolkit==3.0.48 propcache==0.2.1 prophet==1.1.6 proto-plus==1.25.0 protobuf==4.25.5 psutil==5.9.5 psycopg2==2.9.10 ptyprocess==0.7.0 py-cpuinfo==9.0.0 py4j==0.10.9.7 pyarrow==17.0.0 pyasn1==0.6.1 pyasn1_modules==0.4.1 pycocotools==2.0.8 pycparser==2.22 pydantic==2.10.3 pydantic_core==2.27.1 pydata-google-auth==1.9.0 pydot==3.0.3 pydotplus==2.0.2 PyDrive==1.3.1 PyDrive2==1.21.3 pyerfa==2.0.1.5 pygame==2.6.1 pygit2==1.16.0 Pygments==2.18.0 PyGObject==3.42.1 PyJWT==2.10.1 pylibcudf-cu12 @ https://pypi.nvidia.com/pylibcudf-cu12/pylibcudf_cu12-24.10.1-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl pylibcugraph-cu12==24.10.0 pylibraft-cu12==24.10.0 pymc==5.19.1 pymystem3==0.2.0 pynvjitlink-cu12==0.4.0 pyogrio==0.10.0 Pyomo==6.8.2 PyOpenGL==3.1.7 pyOpenSSL==24.2.1 pyparsing==3.2.0 pyperclip==1.9.0 pyproj==3.7.0 pyshp==2.3.1 PySocks==1.7.1 pyspark==3.5.3 pytensor==2.26.4 pytest==8.3.4 python-apt==0.0.0 python-box==7.3.0 python-dateutil==2.8.2 python-louvain==0.16 python-slugify==8.0.4 python-utils==3.9.1 pytz==2024.2 pyviz_comms==3.0.3 PyYAML==6.0.2 pyzmq==24.0.1 qdldl==0.1.7.post4 ratelim==0.1.6 referencing==0.35.1 regex==2024.11.6 requests==2.32.3 requests-oauthlib==1.3.1 requests-toolbelt==1.0.0 requirements-parser==0.9.0 rich==13.9.4 rmm-cu12==24.10.0 rpds-py==0.22.3 rpy2==3.4.2 rsa==4.9 safetensors==0.4.5 scikit-image==0.25.0 scikit-learn==1.6.0 scipy==1.13.1 scooby==0.10.0 scs==3.2.7 seaborn==0.13.2 SecretStorage==3.3.1 Send2Trash==1.8.3 sentence-transformers==3.3.1 sentencepiece==0.2.0 sentry-sdk==2.19.2 setproctitle==1.3.4 shap==0.46.0 shapely==2.0.6 shellingham==1.5.4 simple-parsing==0.1.6 six==1.17.0 sklearn-pandas==2.2.0 slicer==0.0.8 smart-open==7.1.0 smmap==5.0.1 sniffio==1.3.1 snowballstemmer==2.2.0 soundfile==0.12.1 soupsieve==2.6 soxr==0.5.0.post1 spacy==3.7.5 spacy-legacy==3.0.12 spacy-loggers==1.0.5 Sphinx==8.1.3 sphinxcontrib-applehelp==2.0.0 sphinxcontrib-devhelp==2.0.0 sphinxcontrib-htmlhelp==2.1.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==2.0.0 sphinxcontrib-serializinghtml==2.0.0 SQLAlchemy==2.0.36 sqlglot==25.1.0 sqlparse==0.5.3 srsly==2.5.0 stanio==0.5.1 statsmodels==0.14.4 StrEnum==0.4.15 stringzilla==3.11.1 sympy==1.13.1 tables==3.10.1 tabulate==0.9.0 tbb==2022.0.0 tcmlib==1.2.0 tenacity==9.0.0 tensorboard==2.17.1 tensorboard-data-server==0.7.2 tensorflow==2.17.1 tensorflow-datasets==4.9.7 tensorflow-hub==0.16.1 tensorflow-io-gcs-filesystem==0.37.1 tensorflow-metadata==1.13.1 tensorflow-probability==0.24.0 tensorstore==0.1.71 termcolor==2.5.0 terminado==0.18.1 text-unidecode==1.3 textblob==0.17.1 tf-slim==1.1.0 tf_keras==2.17.0 thinc==8.2.5 threadpoolctl==3.5.0 tifffile==2024.12.12 timm==1.0.12 tinycss2==1.4.0 tokenizers==0.21.0 toml==0.10.2 tomli==2.2.1 toolz==0.12.1 torch @ https://download.pytorch.org/whl/cu121_full/torch-2.5.1%2Bcu121-cp310-cp310-linux_x86_64.whl torchaudio @ https://download.pytorch.org/whl/cu121/torchaudio-2.5.1%2Bcu121-cp310-cp310-linux_x86_64.whl torchsummary==1.5.1 torchvision @ https://download.pytorch.org/whl/cu121/torchvision-0.20.1%2Bcu121-cp310-cp310-linux_x86_64.whl tornado==6.3.3 tqdm==4.67.1 traitlets==5.7.1 traittypes==0.2.1 transformers==4.47.1 tweepy==4.14.0 typeguard==4.4.1 typer==0.15.1 types-pytz==2024.2.0.20241003 types-setuptools==75.6.0.20241126 typing_extensions==4.12.2 tzdata==2024.2 
tzlocal==5.2 uc-micro-py==1.0.3 umf==0.9.1 uritemplate==4.1.1 urllib3==2.2.3 vega-datasets==0.9.0 wadllib==1.3.6 wandb==0.19.1 wasabi==1.1.3 wcwidth==0.2.13 weasel==0.4.1 webcolors==24.11.1 webencodings==0.5.1 websocket-client==1.8.0 websockets==14.1 Werkzeug==3.1.3 widgetsnbextension==3.6.10 wordcloud==1.9.4 wrapt==1.17.0 xarray==2024.11.0 xarray-einstats==0.8.0 xgboost==2.1.3 xlrd==2.0.1 xyzservices==2024.9.0 yarl==1.18.3 yellowbrick==1.5 yfinance==0.2.50 zipp==3.21.0
import torch  # ACTIVATION FUNCTION testing using ReLU, fixing all other aspects to default
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.ReLU(),            # ReLU Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.ReLU(),            # ReLU Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='b')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 26.4M/26.4M [00:01<00:00, 20.0MB/s]
Extracting ./data/FashionMNIST/raw/train-images-idx3-ubyte.gz to ./data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 29.5k/29.5k [00:00<00:00, 340kB/s]
Extracting ./data/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 4.42M/4.42M [00:00<00:00, 6.25MB/s]
Extracting ./data/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./data/FashionMNIST/raw Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 5.15k/5.15k [00:00<00:00, 15.0MB/s]
Extracting ./data/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/FashionMNIST/raw Epoch 1/50 Training Loss: 0.491073 Epoch 2/50 Training Loss: 0.372335 Epoch 3/50 Training Loss: 0.336426 Epoch 4/50 Training Loss: 0.311305 Epoch 5/50 Training Loss: 0.294551 Epoch 6/50 Training Loss: 0.276374 Epoch 7/50 Training Loss: 0.262887 Epoch 8/50 Training Loss: 0.254526 Epoch 9/50 Training Loss: 0.243179 Epoch 10/50 Training Loss: 0.233376 Epoch 11/50 Training Loss: 0.223815 Epoch 12/50 Training Loss: 0.214887 Epoch 13/50 Training Loss: 0.209000 Epoch 14/50 Training Loss: 0.198861 Epoch 15/50 Training Loss: 0.190729 Epoch 16/50 Training Loss: 0.187529 Epoch 17/50 Training Loss: 0.178979 Epoch 18/50 Training Loss: 0.173951 Epoch 19/50 Training Loss: 0.168973 Epoch 20/50 Training Loss: 0.161324 Epoch 21/50 Training Loss: 0.157319 Epoch 22/50 Training Loss: 0.152727 Epoch 23/50 Training Loss: 0.149666 Epoch 24/50 Training Loss: 0.141848 Epoch 25/50 Training Loss: 0.139228 Epoch 26/50 Training Loss: 0.130827 Epoch 27/50 Training Loss: 0.132600 Epoch 28/50 Training Loss: 0.126852 Epoch 29/50 Training Loss: 0.127855 Epoch 30/50 Training Loss: 0.118615 Epoch 31/50 Training Loss: 0.115961 Epoch 32/50 Training Loss: 0.115706 Epoch 33/50 Training Loss: 0.110119 Epoch 34/50 Training Loss: 0.105448 Epoch 35/50 Training Loss: 0.107012 Epoch 36/50 Training Loss: 0.100668 Epoch 37/50 Training Loss: 0.095950 Epoch 38/50 Training Loss: 0.095926 Epoch 39/50 Training Loss: 0.099695 Epoch 40/50 Training Loss: 0.090311 Epoch 41/50 Training Loss: 0.090154 Epoch 42/50 Training Loss: 0.086391 Epoch 43/50 Training Loss: 0.085310 Epoch 44/50 Training Loss: 0.083046 Epoch 45/50 Training Loss: 0.084120 Epoch 46/50 Training Loss: 0.079927 Epoch 47/50 Training Loss: 0.076103 Epoch 48/50 Training Loss: 0.074748 Epoch 49/50 Training Loss: 0.080083 Epoch 50/50 Training Loss: 0.070000 Epoch [50/50], Loss: 0.0700
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 908.36 seconds
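
Note that the cells in this notebook compare configurations by training loss only; the test_loader that each cell builds is never used. As a hedged sketch (not part of the original experiment), test accuracy for the model trained above could be measured like this, reusing the model and test_loader objects from that cell:

model.eval()
correct = 0
total = 0
with torch.no_grad():  # no gradients needed for evaluation
    for images, labels in test_loader:
        images = images.view(-1, 784)            # flatten the 28x28 images, as in training
        predicted = model(images).argmax(dim=1)  # class with the highest logit
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print(f"Test accuracy: {100.0 * correct / total:.2f}%")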
import torch  # ACTIVATION FUNCTION testing using the sigmoid function, fixing all other aspects to default
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.Sigmoid(),         # sigmoid Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.Sigmoid(),         # sigmoid Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='b')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.623159 Epoch 2/50 Training Loss: 0.396762 Epoch 3/50 Training Loss: 0.352551 Epoch 4/50 Training Loss: 0.322918 Epoch 5/50 Training Loss: 0.303704 Epoch 6/50 Training Loss: 0.287201 Epoch 7/50 Training Loss: 0.272715 Epoch 8/50 Training Loss: 0.258582 Epoch 9/50 Training Loss: 0.250206 Epoch 10/50 Training Loss: 0.237603 Epoch 11/50 Training Loss: 0.227341 Epoch 12/50 Training Loss: 0.218387 Epoch 13/50 Training Loss: 0.210624 Epoch 14/50 Training Loss: 0.199665 Epoch 15/50 Training Loss: 0.195412 Epoch 16/50 Training Loss: 0.186863 Epoch 17/50 Training Loss: 0.179738 Epoch 18/50 Training Loss: 0.170803 Epoch 19/50 Training Loss: 0.165805 Epoch 20/50 Training Loss: 0.160489 Epoch 21/50 Training Loss: 0.154385 Epoch 22/50 Training Loss: 0.147301 Epoch 23/50 Training Loss: 0.142705 Epoch 24/50 Training Loss: 0.137471 Epoch 25/50 Training Loss: 0.132235 Epoch 26/50 Training Loss: 0.128126 Epoch 27/50 Training Loss: 0.123801 Epoch 28/50 Training Loss: 0.118205 Epoch 29/50 Training Loss: 0.115828 Epoch 30/50 Training Loss: 0.108330 Epoch 31/50 Training Loss: 0.107677 Epoch 32/50 Training Loss: 0.104144 Epoch 33/50 Training Loss: 0.099967 Epoch 34/50 Training Loss: 0.093658 Epoch 35/50 Training Loss: 0.089677 Epoch 36/50 Training Loss: 0.090638 Epoch 37/50 Training Loss: 0.084376 Epoch 38/50 Training Loss: 0.081376 Epoch 39/50 Training Loss: 0.083408 Epoch 40/50 Training Loss: 0.075313 Epoch 41/50 Training Loss: 0.078010 Epoch 42/50 Training Loss: 0.070882 Epoch 43/50 Training Loss: 0.070870 Epoch 44/50 Training Loss: 0.069621 Epoch 45/50 Training Loss: 0.066462 Epoch 46/50 Training Loss: 0.063896 Epoch 47/50 Training Loss: 0.061813 Epoch 48/50 Training Loss: 0.059683 Epoch 49/50 Training Loss: 0.060118 Epoch 50/50 Training Loss: 0.055209 Epoch [50/50], Loss: 0.0552
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 886.70 seconds
import torch  # ACTIVATION FUNCTION testing using the Tanh function, fixing all other aspects to default
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.Tanh(),            # tanh Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.Tanh(),            # tanh Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='b')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.489344 Epoch 2/50 Training Loss: 0.374198 Epoch 3/50 Training Loss: 0.342553 Epoch 4/50 Training Loss: 0.327203 Epoch 5/50 Training Loss: 0.310987 Epoch 6/50 Training Loss: 0.303543 Epoch 7/50 Training Loss: 0.289979 Epoch 8/50 Training Loss: 0.279218 Epoch 9/50 Training Loss: 0.274207 Epoch 10/50 Training Loss: 0.268549 Epoch 11/50 Training Loss: 0.261003 Epoch 12/50 Training Loss: 0.255821 Epoch 13/50 Training Loss: 0.244163 Epoch 14/50 Training Loss: 0.241616 Epoch 15/50 Training Loss: 0.234600 Epoch 16/50 Training Loss: 0.229219 Epoch 17/50 Training Loss: 0.223256 Epoch 18/50 Training Loss: 0.221670 Epoch 19/50 Training Loss: 0.211989 Epoch 20/50 Training Loss: 0.209460 Epoch 21/50 Training Loss: 0.203153 Epoch 22/50 Training Loss: 0.201925 Epoch 23/50 Training Loss: 0.197228 Epoch 24/50 Training Loss: 0.194760 Epoch 25/50 Training Loss: 0.190140 Epoch 26/50 Training Loss: 0.186677 Epoch 27/50 Training Loss: 0.180780 Epoch 28/50 Training Loss: 0.176782 Epoch 29/50 Training Loss: 0.172914 Epoch 30/50 Training Loss: 0.176152 Epoch 31/50 Training Loss: 0.164171 Epoch 32/50 Training Loss: 0.166515 Epoch 33/50 Training Loss: 0.166046 Epoch 34/50 Training Loss: 0.160712 Epoch 35/50 Training Loss: 0.160786 Epoch 36/50 Training Loss: 0.151411 Epoch 37/50 Training Loss: 0.154511 Epoch 38/50 Training Loss: 0.147532 Epoch 39/50 Training Loss: 0.145198 Epoch 40/50 Training Loss: 0.143228 Epoch 41/50 Training Loss: 0.140657 Epoch 42/50 Training Loss: 0.139764 Epoch 43/50 Training Loss: 0.138779 Epoch 44/50 Training Loss: 0.134543 Epoch 45/50 Training Loss: 0.130844 Epoch 46/50 Training Loss: 0.131181 Epoch 47/50 Training Loss: 0.129612 Epoch 48/50 Training Loss: 0.124115 Epoch 49/50 Training Loss: 0.121498 Epoch 50/50 Training Loss: 0.124173 Epoch [50/50], Loss: 0.1242
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 885.05 seconds
import torch  # ACTIVATION FUNCTION testing using the leaky ReLU function, fixing all other aspects to default
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.LeakyReLU(),       # leaky relu Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.LeakyReLU(),       # leaky relu Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='b')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.486430 Epoch 2/50 Training Loss: 0.372871 Epoch 3/50 Training Loss: 0.334890 Epoch 4/50 Training Loss: 0.313318 Epoch 5/50 Training Loss: 0.296438 Epoch 6/50 Training Loss: 0.278528 Epoch 7/50 Training Loss: 0.267184 Epoch 8/50 Training Loss: 0.252480 Epoch 9/50 Training Loss: 0.243217 Epoch 10/50 Training Loss: 0.233004 Epoch 11/50 Training Loss: 0.226090 Epoch 12/50 Training Loss: 0.217958 Epoch 13/50 Training Loss: 0.210301 Epoch 14/50 Training Loss: 0.201226 Epoch 15/50 Training Loss: 0.193952 Epoch 16/50 Training Loss: 0.189236 Epoch 17/50 Training Loss: 0.178933 Epoch 18/50 Training Loss: 0.173359 Epoch 19/50 Training Loss: 0.169976 Epoch 20/50 Training Loss: 0.164831 Epoch 21/50 Training Loss: 0.157632 Epoch 22/50 Training Loss: 0.156691 Epoch 23/50 Training Loss: 0.145726 Epoch 24/50 Training Loss: 0.143349 Epoch 25/50 Training Loss: 0.137836 Epoch 26/50 Training Loss: 0.132540 Epoch 27/50 Training Loss: 0.126585 Epoch 28/50 Training Loss: 0.126027 Epoch 29/50 Training Loss: 0.122825 Epoch 30/50 Training Loss: 0.120335 Epoch 31/50 Training Loss: 0.115261 Epoch 32/50 Training Loss: 0.112070 Epoch 33/50 Training Loss: 0.109565 Epoch 34/50 Training Loss: 0.106597 Epoch 35/50 Training Loss: 0.103671 Epoch 36/50 Training Loss: 0.099289 Epoch 37/50 Training Loss: 0.098280 Epoch 38/50 Training Loss: 0.096868 Epoch 39/50 Training Loss: 0.092858 Epoch 40/50 Training Loss: 0.092839 Epoch 41/50 Training Loss: 0.087239 Epoch 42/50 Training Loss: 0.087515 Epoch 43/50 Training Loss: 0.083235 Epoch 44/50 Training Loss: 0.085039 Epoch 45/50 Training Loss: 0.080943 Epoch 46/50 Training Loss: 0.078993 Epoch 47/50 Training Loss: 0.079606 Epoch 48/50 Training Loss: 0.077269 Epoch 49/50 Training Loss: 0.075369 Epoch 50/50 Training Loss: 0.075574 Epoch [50/50], Loss: 0.0756
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 880.00 seconds
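
One caveat before ranking the runs: each cell re-initialises the weights and reshuffles the batches, so part of the gap between activations is run-to-run noise. A minimal sketch (an addition, not in the original cells) that would make the comparisons repeatable is to fix the random seeds once at the top of each cell:

import random
import numpy as np
import torch

def set_seed(seed: int = 0):
    # Fix the relevant random number generators so repeated runs start from the
    # same initial weights and see the batches in the same order.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

set_seed(0)  # call before building the model and the DataLoaders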

After testing the activation functions, we can rank them by final training loss, from lowest to highest (their elementwise forms are sketched just after this list):

  • sigmoid: 0.0552 (lowest loss)

  • ReLU: 0.0700

  • leaky ReLU: 0.0756

  • tanh: 0.1242 (highest loss)
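
For reference, here are the four activations compared above, written as elementwise tensor operations; this is only an illustrative sketch, the cells themselves use the nn.ReLU / nn.LeakyReLU / nn.Sigmoid / nn.Tanh modules:

import torch

x = torch.linspace(-3, 3, steps=7)
relu    = torch.clamp(x, min=0)            # ReLU(x) = max(0, x)
leaky   = torch.where(x > 0, x, 0.01 * x)  # LeakyReLU with the default negative_slope of 0.01
sigmoid = torch.sigmoid(x)                 # Sigmoid(x) = 1 / (1 + exp(-x)), range (0, 1)
tanh    = torch.tanh(x)                    # Tanh(x), zero-centred, range (-1, 1)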

The previous test showed that the best choice was sigmoid activation, with ReLU in second place. Before fixing one of them, we'll test both across different learning rates and watch their behaviour, then fix one of them for the next step. So the next parameter to be tested is the learning rate; the learning rates tested will be:

  • default (0.001 lr), which is already tested above

  • 0.0001 lr

  • 0.01 lr

First, sigmoid with learning rates of 0.0001 and 0.01 (a compact, parameterised version of this sweep is sketched right below).
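
Because every cell repeats the same training loop and only swaps the activation, learning rate, or scheduler step size, the sweep can be written more compactly. The following is a hedged sketch of such a helper; the name train_mlp and its signature are ours, not the notebook's, and unlike the notebook's cells it also calls scheduler.step() once per epoch:

import torch
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms

def train_mlp(activation=nn.ReLU, lr=0.001, step_size=30, num_epochs=50):
    """Train the 784-196-49-10 MLP used above and return the per-epoch training losses."""
    model = nn.Sequential(
        nn.Linear(784, 196), activation(),
        nn.Linear(196, 49), activation(),
        nn.Linear(49, 10),
    )
    for layer in model.modules():
        if isinstance(layer, nn.Linear):
            init.kaiming_uniform_(layer.weight, nonlinearity='relu')
            init.zeros_(layer.bias)

    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
    train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.1)
    loss_fn = nn.CrossEntropyLoss()

    loss_values = []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            images = images.view(-1, 784)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        scheduler.step()  # advance the learning-rate schedule once per epoch
        loss_values.append(running_loss / len(train_loader))
    return loss_values

# Example: sigmoid_0001_losses = train_mlp(activation=nn.Sigmoid, lr=0.0001)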

import torch  # LEARNING RATE testing: sigmoid activation with a learning rate of 0.0001
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.Sigmoid(),         # sigmoid Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.Sigmoid(),         # sigmoid Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)  # Adam optimization (0.0001 learning rate)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='b')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 1.357750 Epoch 2/50 Training Loss: 0.764527 Epoch 3/50 Training Loss: 0.603403 Epoch 4/50 Training Loss: 0.526284 Epoch 5/50 Training Loss: 0.482179 Epoch 6/50 Training Loss: 0.452608 Epoch 7/50 Training Loss: 0.431605 Epoch 8/50 Training Loss: 0.414913 Epoch 9/50 Training Loss: 0.401807 Epoch 10/50 Training Loss: 0.390342 Epoch 11/50 Training Loss: 0.380848 Epoch 12/50 Training Loss: 0.371898 Epoch 13/50 Training Loss: 0.364814 Epoch 14/50 Training Loss: 0.357774 Epoch 15/50 Training Loss: 0.351222 Epoch 16/50 Training Loss: 0.345146 Epoch 17/50 Training Loss: 0.339950 Epoch 18/50 Training Loss: 0.334707 Epoch 19/50 Training Loss: 0.329543 Epoch 20/50 Training Loss: 0.325263 Epoch 21/50 Training Loss: 0.320493 Epoch 22/50 Training Loss: 0.316247 Epoch 23/50 Training Loss: 0.312365 Epoch 24/50 Training Loss: 0.308048 Epoch 25/50 Training Loss: 0.304240 Epoch 26/50 Training Loss: 0.300261 Epoch 27/50 Training Loss: 0.297047 Epoch 28/50 Training Loss: 0.293359 Epoch 29/50 Training Loss: 0.290109 Epoch 30/50 Training Loss: 0.286644 Epoch 31/50 Training Loss: 0.283765 Epoch 32/50 Training Loss: 0.280418 Epoch 33/50 Training Loss: 0.277742 Epoch 34/50 Training Loss: 0.273979 Epoch 35/50 Training Loss: 0.271113 Epoch 36/50 Training Loss: 0.268754 Epoch 37/50 Training Loss: 0.265765 Epoch 38/50 Training Loss: 0.262827 Epoch 39/50 Training Loss: 0.259958 Epoch 40/50 Training Loss: 0.257680 Epoch 41/50 Training Loss: 0.254693 Epoch 42/50 Training Loss: 0.252112 Epoch 43/50 Training Loss: 0.249682 Epoch 44/50 Training Loss: 0.247091 Epoch 45/50 Training Loss: 0.244643 Epoch 46/50 Training Loss: 0.242264 Epoch 47/50 Training Loss: 0.239826 Epoch 48/50 Training Loss: 0.237778 Epoch 49/50 Training Loss: 0.235158 Epoch 50/50 Training Loss: 0.232688 Epoch [50/50], Loss: 0.2327
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 883.79 seconds
import torch  # LEARNING RATE testing: sigmoid activation with a learning rate of 0.01
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.Sigmoid(),         # sigmoid Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.Sigmoid(),         # sigmoid Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Adam optimization (0.01 learning rate)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='m')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.610542 Epoch 2/50 Training Loss: 0.518105 Epoch 3/50 Training Loss: 0.491139 Epoch 4/50 Training Loss: 0.481820 Epoch 5/50 Training Loss: 0.469529 Epoch 6/50 Training Loss: 0.461296 Epoch 7/50 Training Loss: 0.460112 Epoch 8/50 Training Loss: 0.445243 Epoch 9/50 Training Loss: 0.455644 Epoch 10/50 Training Loss: 0.446742 Epoch 11/50 Training Loss: 0.441687 Epoch 12/50 Training Loss: 0.444206 Epoch 13/50 Training Loss: 0.445777 Epoch 14/50 Training Loss: 0.448170 Epoch 15/50 Training Loss: 0.430399 Epoch 16/50 Training Loss: 0.427066 Epoch 17/50 Training Loss: 0.428164 Epoch 18/50 Training Loss: 0.429806 Epoch 19/50 Training Loss: 0.418735 Epoch 20/50 Training Loss: 0.420601 Epoch 21/50 Training Loss: 0.426985 Epoch 22/50 Training Loss: 0.427167 Epoch 23/50 Training Loss: 0.420748 Epoch 24/50 Training Loss: 0.419190 Epoch 25/50 Training Loss: 0.416064 Epoch 26/50 Training Loss: 0.412762 Epoch 27/50 Training Loss: 0.423388 Epoch 28/50 Training Loss: 0.421093 Epoch 29/50 Training Loss: 0.418390 Epoch 30/50 Training Loss: 0.409606 Epoch 31/50 Training Loss: 0.410489 Epoch 32/50 Training Loss: 0.420917 Epoch 33/50 Training Loss: 0.422161 Epoch 34/50 Training Loss: 0.413026 Epoch 35/50 Training Loss: 0.416692 Epoch 36/50 Training Loss: 0.410712 Epoch 37/50 Training Loss: 0.407973 Epoch 38/50 Training Loss: 0.410653 Epoch 39/50 Training Loss: 0.413227 Epoch 40/50 Training Loss: 0.411169 Epoch 41/50 Training Loss: 0.410736 Epoch 42/50 Training Loss: 0.407875 Epoch 43/50 Training Loss: 0.411081 Epoch 44/50 Training Loss: 0.406286 Epoch 45/50 Training Loss: 0.407157 Epoch 46/50 Training Loss: 0.413552 Epoch 47/50 Training Loss: 0.407060 Epoch 48/50 Training Loss: 0.407997 Epoch 49/50 Training Loss: 0.414667 Epoch 50/50 Training Loss: 0.411493 Epoch [50/50], Loss: 0.4115
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 987.29 seconds
  • Next, trying ReLU with 0.0001 and then 0.01 learning rate

import torch  # LEARNING RATE testing: ReLU activation with a learning rate of 0.0001
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.ReLU(),            # ReLU Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.ReLU(),            # ReLU Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='m')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.613651 Epoch 2/50 Training Loss: 0.420630 Epoch 3/50 Training Loss: 0.378545 Epoch 4/50 Training Loss: 0.352966 Epoch 5/50 Training Loss: 0.333582 Epoch 6/50 Training Loss: 0.318511 Epoch 7/50 Training Loss: 0.305931 Epoch 8/50 Training Loss: 0.295367 Epoch 9/50 Training Loss: 0.284473 Epoch 10/50 Training Loss: 0.275701 Epoch 11/50 Training Loss: 0.268150 Epoch 12/50 Training Loss: 0.260276 Epoch 13/50 Training Loss: 0.253808 Epoch 14/50 Training Loss: 0.247806 Epoch 15/50 Training Loss: 0.240589 Epoch 16/50 Training Loss: 0.235360 Epoch 17/50 Training Loss: 0.230745 Epoch 18/50 Training Loss: 0.225381 Epoch 19/50 Training Loss: 0.220176 Epoch 20/50 Training Loss: 0.215123 Epoch 21/50 Training Loss: 0.210319 Epoch 22/50 Training Loss: 0.204473 Epoch 23/50 Training Loss: 0.201269 Epoch 24/50 Training Loss: 0.197964 Epoch 25/50 Training Loss: 0.192583 Epoch 26/50 Training Loss: 0.189566 Epoch 27/50 Training Loss: 0.184053 Epoch 28/50 Training Loss: 0.180786 Epoch 29/50 Training Loss: 0.178173 Epoch 30/50 Training Loss: 0.175019 Epoch 31/50 Training Loss: 0.170429 Epoch 32/50 Training Loss: 0.167533 Epoch 33/50 Training Loss: 0.163002 Epoch 34/50 Training Loss: 0.160265 Epoch 35/50 Training Loss: 0.157373 Epoch 36/50 Training Loss: 0.153442 Epoch 37/50 Training Loss: 0.151394 Epoch 38/50 Training Loss: 0.147814 Epoch 39/50 Training Loss: 0.143295 Epoch 40/50 Training Loss: 0.140794 Epoch 41/50 Training Loss: 0.137969 Epoch 42/50 Training Loss: 0.135147 Epoch 43/50 Training Loss: 0.132733 Epoch 44/50 Training Loss: 0.131101 Epoch 45/50 Training Loss: 0.127653 Epoch 46/50 Training Loss: 0.124836 Epoch 47/50 Training Loss: 0.122102 Epoch 48/50 Training Loss: 0.119400 Epoch 49/50 Training Loss: 0.116196 Epoch 50/50 Training Loss: 0.115179 Epoch [50/50], Loss: 0.1152
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 880.78 seconds
import torch  # LEARNING RATE testing: ReLU activation with a learning rate of 0.01
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.ReLU(),            # ReLU Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.ReLU(),            # ReLU Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='m')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.565660 Epoch 2/50 Training Loss: 0.442644 Epoch 3/50 Training Loss: 0.411351 Epoch 4/50 Training Loss: 0.404930 Epoch 5/50 Training Loss: 0.391816 Epoch 6/50 Training Loss: 0.384859 Epoch 7/50 Training Loss: 0.375026 Epoch 8/50 Training Loss: 0.378583 Epoch 9/50 Training Loss: 0.366482 Epoch 10/50 Training Loss: 0.363885 Epoch 11/50 Training Loss: 0.353679 Epoch 12/50 Training Loss: 0.353171 Epoch 13/50 Training Loss: 0.344180 Epoch 14/50 Training Loss: 0.356926 Epoch 15/50 Training Loss: 0.343681 Epoch 16/50 Training Loss: 0.338881 Epoch 17/50 Training Loss: 0.335849 Epoch 18/50 Training Loss: 0.342563 Epoch 19/50 Training Loss: 0.336395 Epoch 20/50 Training Loss: 0.332542 Epoch 21/50 Training Loss: 0.334098 Epoch 22/50 Training Loss: 0.327308 Epoch 23/50 Training Loss: 0.333985 Epoch 24/50 Training Loss: 0.321507 Epoch 25/50 Training Loss: 0.318541 Epoch 26/50 Training Loss: 0.319549 Epoch 27/50 Training Loss: 0.323762 Epoch 28/50 Training Loss: 0.313547 Epoch 29/50 Training Loss: 0.320260 Epoch 30/50 Training Loss: 0.314008 Epoch 31/50 Training Loss: 0.307129 Epoch 32/50 Training Loss: 0.310373 Epoch 33/50 Training Loss: 0.309876 Epoch 34/50 Training Loss: 0.310303 Epoch 35/50 Training Loss: 0.300945 Epoch 36/50 Training Loss: 0.319638 Epoch 37/50 Training Loss: 0.304515 Epoch 38/50 Training Loss: 0.319913 Epoch 39/50 Training Loss: 0.294351 Epoch 40/50 Training Loss: 0.296698 Epoch 41/50 Training Loss: 0.300885 Epoch 42/50 Training Loss: 0.293036 Epoch 43/50 Training Loss: 0.304532 Epoch 44/50 Training Loss: 0.295627 Epoch 45/50 Training Loss: 0.296901 Epoch 46/50 Training Loss: 0.294613 Epoch 47/50 Training Loss: 0.302215 Epoch 48/50 Training Loss: 0.302604 Epoch 49/50 Training Loss: 0.302699 Epoch 50/50 Training Loss: 0.288628 Epoch [50/50], Loss: 0.2886
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 956.67 seconds

After trying different learning rates with sigmoid and ReLU activation, the results were:

  • for sigmoid activation:

    • 0.0001 learning rate: Loss: 0.2327

    • 0.001 learning rate: Loss: 0.0552

    • 0.01 learning rate: Loss: 0.4115

  • for ReLU activation:

    • 0.0001 learning rate: Loss: 0.1152

    • 0.001 learning rate: Loss: 0.0700

    • 0.01 learning rate: Loss: 0.2886

    In both cases, the results indicate that 0.001 is the best choice of learning rate. For the next step, the learning rate will therefore be fixed at 0.001, and the next variable will again be tested with both sigmoid and ReLU.

    The next parameter to test is the scheduler step size:

    • 20 steps

    • 30 steps (already tried)

    • 40 steps, with both ReLU and sigmoid (a short StepLR sketch follows below)
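
For context on what these step-size values do: StepLR multiplies the learning rate by gamma every step_size calls to scheduler.step(), and the scheduler is conventionally stepped once per epoch. A small self-contained sketch (with a hypothetical toy parameter, not part of the experiment) showing the schedule that step_size=20, gamma=0.1 would produce over 50 epochs:

import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # toy parameter, just to build an optimizer
optimizer = torch.optim.Adam(params, lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(50):
    if (epoch + 1) in (1, 20, 21, 40, 41, 50):
        print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:g}")
    # ... one epoch of training would go here ...
    optimizer.step()
    scheduler.step()  # without this call, step_size has no effect on the learning rate
# the learning rate is 0.001 for epochs 1-20, 0.0001 for epochs 21-40, and 1e-05 for epochs 41-50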

import torch  # STEP SIZE testing: sigmoid activation, learning rate 0.001, scheduler step size 20
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.Sigmoid(),         # sigmoid Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.Sigmoid(),         # sigmoid Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='b')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.624846 Epoch 2/50 Training Loss: 0.395035 Epoch 3/50 Training Loss: 0.354259 Epoch 4/50 Training Loss: 0.324547 Epoch 5/50 Training Loss: 0.305715 Epoch 6/50 Training Loss: 0.288480 Epoch 7/50 Training Loss: 0.274742 Epoch 8/50 Training Loss: 0.261527 Epoch 9/50 Training Loss: 0.250387 Epoch 10/50 Training Loss: 0.238394 Epoch 11/50 Training Loss: 0.228370 Epoch 12/50 Training Loss: 0.218225 Epoch 13/50 Training Loss: 0.211006 Epoch 14/50 Training Loss: 0.203128 Epoch 15/50 Training Loss: 0.194166 Epoch 16/50 Training Loss: 0.188384 Epoch 17/50 Training Loss: 0.180789 Epoch 18/50 Training Loss: 0.174063 Epoch 19/50 Training Loss: 0.167953 Epoch 20/50 Training Loss: 0.160659 Epoch 21/50 Training Loss: 0.155177 Epoch 22/50 Training Loss: 0.150414 Epoch 23/50 Training Loss: 0.146337 Epoch 24/50 Training Loss: 0.138728 Epoch 25/50 Training Loss: 0.135017 Epoch 26/50 Training Loss: 0.129096 Epoch 27/50 Training Loss: 0.123514 Epoch 28/50 Training Loss: 0.118592 Epoch 29/50 Training Loss: 0.116223 Epoch 30/50 Training Loss: 0.111136 Epoch 31/50 Training Loss: 0.109252 Epoch 32/50 Training Loss: 0.105280 Epoch 33/50 Training Loss: 0.102373 Epoch 34/50 Training Loss: 0.097798 Epoch 35/50 Training Loss: 0.094215 Epoch 36/50 Training Loss: 0.088690 Epoch 37/50 Training Loss: 0.087968 Epoch 38/50 Training Loss: 0.080736 Epoch 39/50 Training Loss: 0.081164 Epoch 40/50 Training Loss: 0.075552 Epoch 41/50 Training Loss: 0.076904 Epoch 42/50 Training Loss: 0.072252 Epoch 43/50 Training Loss: 0.071878 Epoch 44/50 Training Loss: 0.066531 Epoch 45/50 Training Loss: 0.067229 Epoch 46/50 Training Loss: 0.061686 Epoch 47/50 Training Loss: 0.063848 Epoch 48/50 Training Loss: 0.062729 Epoch 49/50 Training Loss: 0.056605 Epoch 50/50 Training Loss: 0.060693 Epoch [50/50], Loss: 0.0607
[Figure: Loss Graph, training loss vs. epoch]
Total execution time: 894.34 seconds
import torch  # STEP SIZE testing: sigmoid activation, learning rate 0.001, scheduler step size 40
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # Input layer starting with number of pixels
    nn.Sigmoid(),         # sigmoid Activation
    nn.Linear(196, 49),   # Hidden layer 1
    nn.Sigmoid(),         # sigmoid Activation again
    nn.Linear(49, 10),    # Hidden layer 2
)

for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    avg_train_loss = running_loss / len(train_loader)  # Calculate the average training loss
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='m')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.621799 Epoch 2/50 Training Loss: 0.398737 Epoch 3/50 Training Loss: 0.354447 Epoch 4/50 Training Loss: 0.324915 Epoch 5/50 Training Loss: 0.306455 Epoch 6/50 Training Loss: 0.288312 Epoch 7/50 Training Loss: 0.275030 Epoch 8/50 Training Loss: 0.260451 Epoch 9/50 Training Loss: 0.248187 Epoch 10/50 Training Loss: 0.238856 Epoch 11/50 Training Loss: 0.228516 Epoch 12/50 Training Loss: 0.218785 Epoch 13/50 Training Loss: 0.211756 Epoch 14/50 Training Loss: 0.199807 Epoch 15/50 Training Loss: 0.194293 Epoch 16/50 Training Loss: 0.185550 Epoch 17/50 Training Loss: 0.179705 Epoch 18/50 Training Loss: 0.172099 Epoch 19/50 Training Loss: 0.165836 Epoch 20/50 Training Loss: 0.158619 Epoch 21/50 Training Loss: 0.154102 Epoch 22/50 Training Loss: 0.151386 Epoch 23/50 Training Loss: 0.139317 Epoch 24/50 Training Loss: 0.137678 Epoch 25/50 Training Loss: 0.129708 Epoch 26/50 Training Loss: 0.126437 Epoch 27/50 Training Loss: 0.119818 Epoch 28/50 Training Loss: 0.116499 Epoch 29/50 Training Loss: 0.113296 Epoch 30/50 Training Loss: 0.107263 Epoch 31/50 Training Loss: 0.104923 Epoch 32/50 Training Loss: 0.102044 Epoch 33/50 Training Loss: 0.096885 Epoch 34/50 Training Loss: 0.093753 Epoch 35/50 Training Loss: 0.090598 Epoch 36/50 Training Loss: 0.084427 Epoch 37/50 Training Loss: 0.086881 Epoch 38/50 Training Loss: 0.081167 Epoch 39/50 Training Loss: 0.077887 Epoch 40/50 Training Loss: 0.073476 Epoch 41/50 Training Loss: 0.074834 Epoch 42/50 Training Loss: 0.071056 Epoch 43/50 Training Loss: 0.070614 Epoch 44/50 Training Loss: 0.068946 Epoch 45/50 Training Loss: 0.063420 Epoch 46/50 Training Loss: 0.060004 Epoch 47/50 Training Loss: 0.061000 Epoch 48/50 Training Loss: 0.059794 Epoch 49/50 Training Loss: 0.059669 Epoch 50/50 Training Loss: 0.054596 Epoch [50/50], Loss: 0.0546
Image in a Jupyter notebook
Total execution time: 887.69 seconds
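
For context, the "step size" varied in these experiments is the StepLR scheduler's step_size, measured in epochs: once scheduler.step() has been called step_size times, the learning rate is multiplied by gamma (0.1 here). The standalone sketch below only illustrates that schedule for lr 0.001 and step size 40; the dummy parameter exists solely so the optimizer has something to manage and is not part of the experiment above.

import torch

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter, for illustration only
optimizer = torch.optim.Adam([param], lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

for epoch in range(50):
    # ... one epoch of training would happen here ...
    scheduler.step()  # one call per epoch
    if (epoch + 1) % 10 == 0:
        # get_last_lr() returns the current learning rate of each parameter group
        print(f"after epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.6f}")

With step size 40 the rate stays at 0.001 for the first 40 epochs and drops to 0.0001 for the last 10; with step size 20 it drops at epochs 20 and 40.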

Now, the same experiment using ReLU activation:

import torch  # ACTIVATION FUNCTION testing using ReLU, learning rate of 0.001, step size: 20
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # hidden layer 1: 784 input pixels -> 196 units
    nn.ReLU(),            # ReLU activation
    nn.Linear(196, 49),   # hidden layer 2: 196 -> 49 units
    nn.ReLU(),            # ReLU activation again
    nn.Linear(49, 10),    # output layer: 49 -> 10 classes
)

# Kaiming initialisation, matched to the ReLU activations
for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # scale pixel values to [-1, 1]
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)  # flatten each 28x28 image to a 784-dim vector
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    scheduler.step()  # advance the LR schedule once per epoch so step_size takes effect

    avg_train_loss = running_loss / len(train_loader)  # average training loss for this epoch
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # final average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='m')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.502953 Epoch 2/50 Training Loss: 0.376967 Epoch 3/50 Training Loss: 0.338297 Epoch 4/50 Training Loss: 0.313946 Epoch 5/50 Training Loss: 0.295125 Epoch 6/50 Training Loss: 0.278507 Epoch 7/50 Training Loss: 0.267188 Epoch 8/50 Training Loss: 0.255041 Epoch 9/50 Training Loss: 0.243122 Epoch 10/50 Training Loss: 0.235292 Epoch 11/50 Training Loss: 0.225255 Epoch 12/50 Training Loss: 0.217930 Epoch 13/50 Training Loss: 0.210366 Epoch 14/50 Training Loss: 0.203205 Epoch 15/50 Training Loss: 0.195100 Epoch 16/50 Training Loss: 0.186156 Epoch 17/50 Training Loss: 0.183763 Epoch 18/50 Training Loss: 0.177529 Epoch 19/50 Training Loss: 0.171234 Epoch 20/50 Training Loss: 0.165097 Epoch 21/50 Training Loss: 0.157698 Epoch 22/50 Training Loss: 0.154955 Epoch 23/50 Training Loss: 0.150335 Epoch 24/50 Training Loss: 0.145079 Epoch 25/50 Training Loss: 0.141442 Epoch 26/50 Training Loss: 0.133808 Epoch 27/50 Training Loss: 0.132806 Epoch 28/50 Training Loss: 0.128929 Epoch 29/50 Training Loss: 0.123091 Epoch 30/50 Training Loss: 0.122270 Epoch 31/50 Training Loss: 0.117092 Epoch 32/50 Training Loss: 0.112772 Epoch 33/50 Training Loss: 0.111007 Epoch 34/50 Training Loss: 0.109203 Epoch 35/50 Training Loss: 0.106440 Epoch 36/50 Training Loss: 0.102480 Epoch 37/50 Training Loss: 0.100580 Epoch 38/50 Training Loss: 0.096628 Epoch 39/50 Training Loss: 0.095068 Epoch 40/50 Training Loss: 0.092266 Epoch 41/50 Training Loss: 0.087084 Epoch 42/50 Training Loss: 0.086622 Epoch 43/50 Training Loss: 0.089356 Epoch 44/50 Training Loss: 0.081335 Epoch 45/50 Training Loss: 0.081544 Epoch 46/50 Training Loss: 0.078564 Epoch 47/50 Training Loss: 0.076888 Epoch 48/50 Training Loss: 0.080203 Epoch 49/50 Training Loss: 0.074238 Epoch 50/50 Training Loss: 0.071796 Epoch [50/50], Loss: 0.0718
Image in a Jupyter notebook
Total execution time: 907.50 seconds
import torch  # ACTIVATION FUNCTION testing using ReLU, learning rate of 0.001, step size: 40
import torch.nn as nn
import torch.nn.init as init
from torch.utils.data import DataLoader
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import time

overall_start_time = time.time()

model = nn.Sequential(
    nn.Linear(784, 196),  # hidden layer 1: 784 input pixels -> 196 units
    nn.ReLU(),            # ReLU activation
    nn.Linear(196, 49),   # hidden layer 2: 196 -> 49 units
    nn.ReLU(),            # ReLU activation again
    nn.Linear(49, 10),    # output layer: 49 -> 10 classes
)

# Kaiming initialisation, matched to the ReLU activations
for layer in model.modules():
    if isinstance(layer, nn.Linear):
        init.kaiming_uniform_(layer.weight, nonlinearity='relu')
        if layer.bias is not None:
            init.zeros_(layer.bias)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))  # scale pixel values to [-1, 1]
])

train_dataset = torchvision.datasets.FashionMNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.FashionMNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam optimization
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)
loss_fn = nn.CrossEntropyLoss()  # loss function

num_epochs = 50
loss_values = []

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for batch_idx, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 784)  # flatten each 28x28 image to a 784-dim vector
        optimizer.zero_grad()
        outputs = model(images)
        loss = loss_fn(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    scheduler.step()  # advance the LR schedule once per epoch so step_size takes effect

    avg_train_loss = running_loss / len(train_loader)  # average training loss for this epoch
    loss_values.append(avg_train_loss)
    print(f'Epoch {epoch+1}/{num_epochs} \t\t Training Loss: {avg_train_loss:.6f}')

print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")  # final average loss

plt.figure(figsize=(8, 6))  # start plotting
plt.plot(range(1, num_epochs + 1), loss_values, marker='o', linestyle='-', color='m')
plt.title('Loss Graph')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)
plt.show()

end_time = time.time()
overall_time = end_time - overall_start_time
print(f"Total execution time: {overall_time:.2f} seconds")
Epoch 1/50 Training Loss: 0.490487 Epoch 2/50 Training Loss: 0.369046 Epoch 3/50 Training Loss: 0.335290 Epoch 4/50 Training Loss: 0.308926 Epoch 5/50 Training Loss: 0.291855 Epoch 6/50 Training Loss: 0.274969 Epoch 7/50 Training Loss: 0.261088 Epoch 8/50 Training Loss: 0.251757 Epoch 9/50 Training Loss: 0.240315 Epoch 10/50 Training Loss: 0.231177 Epoch 11/50 Training Loss: 0.222751 Epoch 12/50 Training Loss: 0.216312 Epoch 13/50 Training Loss: 0.207882 Epoch 14/50 Training Loss: 0.200015 Epoch 15/50 Training Loss: 0.189574 Epoch 16/50 Training Loss: 0.188588 Epoch 17/50 Training Loss: 0.181458 Epoch 18/50 Training Loss: 0.172322 Epoch 19/50 Training Loss: 0.167459 Epoch 20/50 Training Loss: 0.164912 Epoch 21/50 Training Loss: 0.159565 Epoch 22/50 Training Loss: 0.153575 Epoch 23/50 Training Loss: 0.147082 Epoch 24/50 Training Loss: 0.144515 Epoch 25/50 Training Loss: 0.139185 Epoch 26/50 Training Loss: 0.136907 Epoch 27/50 Training Loss: 0.129391 Epoch 28/50 Training Loss: 0.130377 Epoch 29/50 Training Loss: 0.121912 Epoch 30/50 Training Loss: 0.119934 Epoch 31/50 Training Loss: 0.116460 Epoch 32/50 Training Loss: 0.115620 Epoch 33/50 Training Loss: 0.109574 Epoch 34/50 Training Loss: 0.105539 Epoch 35/50 Training Loss: 0.106974 Epoch 36/50 Training Loss: 0.103045 Epoch 37/50 Training Loss: 0.098375 Epoch 38/50 Training Loss: 0.094156 Epoch 39/50 Training Loss: 0.097420 Epoch 40/50 Training Loss: 0.089655 Epoch 41/50 Training Loss: 0.090654 Epoch 42/50 Training Loss: 0.088371 Epoch 43/50 Training Loss: 0.085733 Epoch 44/50 Training Loss: 0.083294 Epoch 45/50 Training Loss: 0.082905 Epoch 46/50 Training Loss: 0.081175 Epoch 47/50 Training Loss: 0.081174 Epoch 48/50 Training Loss: 0.075905 Epoch 49/50 Training Loss: 0.078200 Epoch 50/50 Training Loss: 0.075266 Epoch [50/50], Loss: 0.0753
Image in a Jupyter notebook
Total execution time: 905.97 seconds
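
Since the cells above differ only in the activation function and the scheduler step size, one possible refactor (not from the notebook; the helper names build_model and run_experiment are our own) is to wrap the training loop in a function and call it once per configuration:

import torch
import torch.nn as nn

def build_model(activation_cls):
    # same 784 -> 196 -> 49 -> 10 architecture as the cells above
    return nn.Sequential(
        nn.Linear(784, 196), activation_cls(),
        nn.Linear(196, 49), activation_cls(),
        nn.Linear(49, 10),
    )

def run_experiment(activation_cls, step_size, train_loader, num_epochs=50, lr=0.001):
    model = build_model(activation_cls)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.1)
    loss_fn = nn.CrossEntropyLoss()
    loss_values = []
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images.view(-1, 784)), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        scheduler.step()
        loss_values.append(running_loss / len(train_loader))
    return loss_values

# e.g. relu_losses = run_experiment(nn.ReLU, step_size=40, train_loader=train_loader)

This would also make it easy to plot the loss curves of all configurations on the same axes for a direct comparison.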

The final training losses obtained (after 50 epochs, learning rate 0.001) were:

  • For Sigmoid activation:

    • step size 20: loss 0.0607

    • step size 30: loss 0.0552

    • step size 40: loss 0.0546

  • For ReLU activation:

    • step size 20: loss 0.0718

    • step size 30: loss 0.0700

    • step size 40: loss 0.0753

Based on all the results collected from start to finish, the configuration with the lowest final training loss used the following (a test-set check is sketched after this list):

  • Sigmoid activation

  • a learning rate of 0.001

  • a scheduler step size of 40
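
The comparison above ranks configurations by final training loss, which can favor models that overfit, and the test split loaded in every cell is never actually used. A minimal evaluation sketch, assuming the trained model, loss_fn, and test_loader from the cells above are still in scope:

import torch

model.eval()  # switch off training-specific behaviour
correct, total, test_loss = 0, 0, 0.0
with torch.no_grad():
    for images, labels in test_loader:
        images = images.view(-1, 784)
        outputs = model(images)
        test_loss += loss_fn(outputs, labels).item()
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)

print(f"Test loss: {test_loss / len(test_loader):.4f}")
print(f"Test accuracy: {100.0 * correct / total:.2f}%")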