GitHub Repository: greyhatguy007/Machine-Learning-Specialization-Coursera
Path: blob/main/C2 - Advanced Learning Algorithms/week2/optional-labs/C2_W2_Relu.ipynb
Kernel: Python 3

Optional Lab - ReLU activation

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
plt.style.use('./deeplearning.mplstyle')
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU
from tensorflow.keras.activations import linear, relu, sigmoid
%matplotlib widget
from matplotlib.widgets import Slider
from lab_utils_common import dlc
from autils import plt_act_trio
from lab_utils_relu import *
import warnings
warnings.simplefilter(action='ignore', category=UserWarning)

2 - ReLU Activation

This week, a new activation was introduced, the Rectified Linear Unit (ReLU).

$$ a = \max(0, z) \quad\quad\text{ReLU function} $$
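As a quick aside, the ReLU is one line of NumPy. The helper name relu_np below is only an illustration, not one of this lab's utility functions.

import numpy as np

def relu_np(z):
    # elementwise max(0, z): negative inputs are zeroed, positive inputs pass through unchanged
    return np.maximum(0, z)

print(relu_np(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # -> [0, 0, 0, 1.5, 3]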

plt_act_trio()
[Figure: the linear, sigmoid, and ReLU activation functions plotted side by side]
The example from the lecture on the right shows an application of the ReLU. In this example, the derived "awareness" feature is not binary but has a continuous range of values. The sigmoid is best suited to on/off or binary situations. The ReLU provides a continuous linear relationship. Additionally, it has an 'off' range where the output is zero. The "off" feature makes the ReLU a non-linear activation. Why is this needed? Let's examine this below.

Why Non-Linear Activations?

The function shown is composed of linear pieces (piecewise linear). The slope is constant within each linear portion and then changes abruptly at transition points. At each transition point, a new linear function is added which, when summed with the existing function, produces the new slope. The new function is added at the transition point but does not contribute to the output prior to that point. The non-linear activation function is responsible for disabling the input prior to, and sometimes after, the transition points. The following exercise provides a more tangible example.
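As a small sketch of that idea (the slopes and the transition point below are made up, not taken from the lab): summing two purely linear units just produces another straight line, while passing the second unit through a ReLU keeps it "off" until x = 1, so the slope changes only after that point.

import numpy as np

x = np.linspace(0, 3, 7)

z0 = 1.0 * x          # unit 0: slope 1, active everywhere (made-up values for illustration)
z1 = 2.0 * (x - 1.0)  # unit 1: slope 2, intended to matter only for x > 1

linear_sum = z0 + z1                 # no activation: still one straight line, slope 3 everywhere
relu_sum   = z0 + np.maximum(0, z1)  # with ReLU: unit 1 is off until x = 1, slope changes 1 -> 3

print(linear_sum)   # -> [-2, -0.5, 1, 2.5, 4, 5.5, 7]
print(relu_sum)     # -> [ 0,  0.5, 1, 2.5, 4, 5.5, 7]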

The exercise will use the network below in a regression problem where you must model a piecewise linear target: The network has 3 units in the first layer. All three are required to form the target. Unit 0 is pre-programmed and fixed to map the first segment. You will modify the weights and biases of units 1 and 2 to model the 2nd and 3rd segments. The output unit is also fixed and simply sums the outputs of the first layer.
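The lab constructs this network for you inside plt_relu_ex, so there is nothing to define here; purely as an illustration, an equivalent Keras architecture might look like the sketch below (the layer names are arbitrary).

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# Illustrative only: three ReLU units form the segments; one linear unit sums their outputs.
model = Sequential([
    Input(shape=(1,)),
    Dense(3, activation='relu',   name='segments'),
    Dense(1, activation='linear', name='sum'),
])
model.summary()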

Using the sliders below, modify the weights and biases to match the target. Hints: Start with w1 and b1 and leave w2 and b2 at zero until you match the 2nd segment. Clicking rather than sliding is quicker. If you have trouble, don't worry, the text below will describe this in more detail.

_ = plt_relu_ex()
[Interactive figure: the piecewise linear target and network output, with sliders for the weights and biases of units 1 and 2]

The goal of this exercise is to appreciate how the ReLU's non-linear behavior provides the needed ability to turn functions off until they are needed. Let's see how this worked in this example. The plots on the right contain the output of the units in the first layer. Starting at the top, unit 0 is responsible for the first segment, marked with a 1. Both the linear function $z$ and the function following the ReLU, $a$, are shown. You can see that the ReLU cuts off the function after the interval [0,1]. This is important as it prevents unit 0 from interfering with the following segment.

Unit 1 is responsible for the 2nd segment. Here, the ReLU keeps this unit quiet until x exceeds 1. Since the first unit is not contributing, the slope of unit 1, $w^{[1]}_1$, is just the slope of the target line. The bias must be adjusted to keep the output negative until x has reached 1. Note how the contribution of unit 1 extends to the 3rd segment as well.

Unit 2 is responsible for the 3rd segment. The ReLU again zeros the output until x reaches the right value. The slope of the unit, $w^{[1]}_2$, must be set so that the sum of units 1 and 2 has the desired slope. The bias is again adjusted to keep the output negative until x has reached 2.
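Putting the three units together with some made-up numbers (the slopes, biases, and implied target below are hypothetical, not the lab's actual values) shows the mechanism: each bias keeps that unit's $z$ negative, and therefore zeroed by the ReLU, until x reaches the unit's transition point, and the fixed output unit simply sums the three activations.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hypothetical weights and biases, chosen only to illustrate the mechanism.
x  = np.linspace(0, 3, 7)
a0 = relu(-1.0 * x + 1.0)   # unit 0 (fixed): active on [0, 1], cut off by the ReLU afterwards
a1 = relu( 1.0 * x - 1.0)   # unit 1: off until x = 1, contributes slope 1 afterwards
a2 = relu( 2.0 * x - 4.0)   # unit 2: off until x = 2, adds slope 2 on top of unit 1

y = a0 + a1 + a2            # the fixed output unit sums the first layer
print(y)                    # -> [1, 0.5, 0, 0.5, 1, 2.5, 4]: slopes -1, 1, 3 on the three segments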

The "off" or disable feature of the ReLU activation enables models to stitch together linear segments to model complex non-linear functions.

Congratulations!

You are now more familiar with the ReLU and the importance of its non-linear behavior.