Optimizing MLPs for MNIST
By: Nayra Saadawy & Priyam Raul
Mentor: Prof. Devendra Singh
After testing the activation functions, we can rank them by loss, from lowest to highest:
sigmoid: 0.0552
ReLU: 0.0700
leaky ReLU: 0.0756
tanh: 0.1242
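A minimal sketch of how this comparison could be reproduced. It assumes a PyTorch MLP with a single 128-unit hidden layer, the Adam optimizer, cross-entropy loss, a batch size of 128, and "steps" read as training epochs; all of these are illustrative assumptions, not the exact setup used in this notebook.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Load MNIST and flatten each 28x28 image into a 784-vector.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.view(-1)),
])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
loader = DataLoader(train_set, batch_size=128, shuffle=True)

def make_mlp(activation):
    # One hidden layer of 128 units (an assumed size, for illustration).
    return nn.Sequential(nn.Linear(784, 128), activation, nn.Linear(128, 10))

def train(model, lr=0.001, epochs=30):
    # "Steps" are interpreted here as training epochs (an assumption).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return loss.item()  # loss on the final training batch

activations = {"sigmoid": nn.Sigmoid(), "ReLU": nn.ReLU(),
               "leaky ReLU": nn.LeakyReLU(), "tanh": nn.Tanh()}
for name, act in activations.items():
    print(f"{name}: loss {train(make_mlp(act)):.4f}")
```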
The previous test showed that sigmoid was the best choice, with ReLU in second place. Before fixing one of them, we'll test both across several learning rates and watch their behaviour, then fix one for the next step. The learning rates to be tested are:
0.001 (the default, already tested)
0.0001
0.01
First, sigmoid with 0.0001 and 0.01.
Next, ReLU with learning rates 0.0001 and 0.01; the full sweep is sketched below.
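Reusing the hypothetical make_mlp and train helpers from the sketch above, the learning-rate sweep might look like this:

```python
# Sweep the learning rate for the two best activations,
# keeping the number of training epochs fixed.
for name, act in [("sigmoid", nn.Sigmoid()), ("ReLU", nn.ReLU())]:
    for lr in (0.0001, 0.001, 0.01):
        loss = train(make_mlp(act), lr=lr)
        print(f"{name}, lr={lr}: loss {loss:.4f}")
```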
After trying the different learning rates with sigmoid and ReLU activation, the results were:
For sigmoid activation:
0.0001 learning rate: loss 0.2327
0.001 learning rate: loss 0.0552
0.01 learning rate: loss 0.4115
For ReLU activation:
0.0001 learning rate: loss 0.1152
0.001 learning rate: loss 0.0700
0.01 learning rate: loss 0.2886
In both cases, the results indicate that 0.001 is the best learning rate. For the next step, the learning rate will therefore be fixed at 0.001, and the next parameter will again be tested with both sigmoid and ReLU.
The next parameter is the number of training steps:
20 steps
30 steps (already tested)
40 steps
each tried with both sigmoid and ReLU, as sketched below.
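Again reusing the hypothetical helpers from the first sketch, with the learning rate fixed at 0.001, the step-size sweep could be written as:

```python
# Sweep the number of training steps (interpreted as epochs)
# with the learning rate fixed at the best value found above.
for name, act in [("sigmoid", nn.Sigmoid()), ("ReLU", nn.ReLU())]:
    for steps in (20, 30, 40):
        loss = train(make_mlp(act), lr=0.001, epochs=steps)
        print(f"{name}, {steps} steps: loss {loss:.4f}")
```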
The results obtained were:
For sigmoid:
20 steps: loss 0.0607
30 steps: loss 0.0552
40 steps: loss 0.0546
For ReLU:
20 steps: loss 0.0718
30 steps: loss 0.0700
40 steps: loss 0.0753
Based on all the results collected from start to end, the model with the lowest loss used:
sigmoid activation
0.001 learning rate
40 training steps
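Putting it together, a final run with this best configuration (still using the hypothetical helpers from the first sketch) would be:

```python
# Final model: sigmoid activation, 0.001 learning rate, 40 steps.
best_model = make_mlp(nn.Sigmoid())
final_loss = train(best_model, lr=0.001, epochs=40)
print(f"final loss: {final_loss:.4f}")
```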