Path: blob/main/C1 - Supervised Machine Learning: Regression and Classification/week2/Optional Labs/C1_W2_Lab02_Multiple_Variable_Soln.ipynb
Optional Lab: Multiple Variable Linear Regression
In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated, which makes the lab appear lengthy, but the changes are only minor adjustments to the previous routines, so it is quick to review.
Outline
1.3 Notation
Here is a summary of some of the notation you will encounter, updated for multiple features.
| General Notation | Description | Python (if applicable) |
|:-----------------|:------------------------------------------------------------|:-----------------------|
| $a$ | scalar, non bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{A}$ | matrix, bold capital | |
| **Regression** | | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$ training example | `X[i]`, `y[i]` |
| m | number of training examples | `m` |
| n | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | The result of the model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w},b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b$ | `f_wb` |
2 Problem Statement
You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors, and age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!
Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) |
---|---|---|---|---|
2104 | 5 | 1 | 45 | 460 |
1416 | 3 | 2 | 40 | 232 |
852 | 2 | 1 | 35 | 178 |
You will build a linear regression model using these values so you can then predict the price for other houses, for example, a house with 1200 sqft, 3 bedrooms, 1 floor, and 40 years of age.
Please run the following code cell to create your `X_train` and `y_train` variables.
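A minimal sketch of that cell, using the values from the table above:

```python
import numpy as np

# One row per training example, one column per feature:
# size (sqft), number of bedrooms, number of floors, age of home
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852,  2, 1, 35]])
# Target values: price in 1000s of dollars
y_train = np.array([460, 232, 178])
```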
2.1 Matrix X containing our examples
Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ($m$ is three in our example), and there are $n$ features (four in our example), $\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).
notation:
- $\mathbf{x}^{(i)}$ is a vector containing example i: $\mathbf{x}^{(i)} = (x^{(i)}_0, x^{(i)}_1, \cdots, x^{(i)}_{n-1})$
- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element.
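Written out with this indexing, the training matrix has the following layout (a reconstruction consistent with the notation above):

$$\mathbf{X} =
\begin{pmatrix}
 x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\
 x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\
 \vdots \\
 x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1}
\end{pmatrix}$$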
Display the input data.
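A sketch of the display cell, printing the shapes and contents of the arrays created above:

```python
# data is stored in a NumPy array/matrix
print(f"X Shape: {X_train.shape}, X Type: {type(X_train)}")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type: {type(y_train)}")
print(y_train)
```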
For demonstration, $\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\mathbf{w}$ is a 1-D NumPy vector.
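A minimal sketch of that initialization; the specific numbers are illustrative values chosen to be close to the fit for this data set:

```python
# illustrative near-optimal parameter values (assumed, not derived here)
b_init = 785.1811367994083
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")
```

The model's prediction with multiple variables is the linear model given by equations (1) and (2), referenced in the sections below:

$$ f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 + \cdots + w_{n-1}x_{n-1} + b \tag{1}$$

or, in vector notation:

$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \tag{2}$$

where $\cdot$ is a vector dot product.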
3.1 Single Prediction element by element
Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using a loop over each element, performing the multiplication with its parameter and then adding the bias parameter at the end.
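A sketch of such an element-by-element routine (the name `predict_single_loop` is illustrative):

```python
def predict_single_loop(x, w, b):
    """
    Predict using linear regression, one feature at a time.
    Args:
      x (ndarray): Shape (n,) example with n features
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter
    Returns:
      p (scalar):  prediction
    """
    n = x.shape[0]
    p = 0
    for i in range(n):
        p = p + x[i] * w[i]   # multiply each feature by its parameter
    p = p + b                 # add the bias parameter at the end
    return p

# get a row from our training data and make a prediction
x_vec = X_train[0, :]
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"x_vec shape {x_vec.shape}, prediction: {f_wb}")
```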
Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb`, is a scalar.
3.2 Single Prediction, vector
Noting that equation (1) above can be implemented using the dot product as in (2) above, we can make use of vector operations to speed up predictions.
Recall from the Python/NumPy lab that NumPy `np.dot()` can be used to perform a vector dot product.
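A vectorized version along these lines (routine name illustrative):

```python
def predict(x, w, b):
    """
    Predict using linear regression via the dot product.
    Args:
      x (ndarray): Shape (n,) example with n features
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b    # single statement using the vector dot product
    return p

x_vec = X_train[0, :]
f_wb = predict(x_vec, w_init, b_init)
print(f"x_vec shape {x_vec.shape}, prediction: {f_wb}")
```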
The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine.
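4 Compute Cost With Multiple Variables
The cost with multiple variables, referenced below as equations (3) and (4), takes the standard squared-error form used in this course:

$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2 \tag{3}$$

where:

$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{4}$$

In contrast to previous labs, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars, supporting multiple features.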
Below is an implementation of equations (3) and (4). Note that this uses a standard pattern for this course where a for loop over all `m` examples is used.
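A sketch of such an implementation; with the near-optimal parameters above, the printed cost should be very small, in line with the expected result below:

```python
def compute_cost(X, y, w, b):
    """
    Compute cost over all examples.
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b        # prediction for example i, equation (4)
        cost = cost + (f_wb_i - y[i])**2    # accumulate squared error
    cost = cost / (2 * m)                   # equation (3)
    return cost

cost = compute_cost(X_train, y_train, w_init, b_init)
print(f"Cost at optimal w : {cost}")
```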
Expected Result: Cost at optimal w : 1.5578904045996674e-12
5.1 Compute Gradient with Multiple Variables
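The gradient of the cost with respect to the parameters, referenced below as equations (6) and (7), takes the standard form used in this course:

$$\frac{\partial J(\mathbf{w},b)}{\partial w_j} = \frac{1}{m} \sum\limits_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) x^{(i)}_j \tag{6}$$

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \sum\limits_{i=0}^{m-1} \left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right) \tag{7}$$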
An implementation for calculating equations (6) and (7) is below. There are many ways to implement this. In this version, there is an
- outer loop over all m examples.
  - $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}$ for the example can be computed directly and accumulated
  - in a second loop over all n features:
    - $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$ is computed for each $w_j$.
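A sketch of this two-loop pattern (names are illustrative):

```python
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression.
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_db (scalar):       gradient of the cost w.r.t. b
      dj_dw (ndarray (n,)): gradient of the cost w.r.t. w
    """
    m, n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                            # outer loop over examples
        err = (np.dot(X[i], w) + b) - y[i]        # error for example i
        for j in range(n):                        # inner loop over features
            dj_dw[j] = dj_dw[j] + err * X[i, j]   # accumulate equation (6)
        dj_db = dj_db + err                       # accumulate equation (7)
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_db, dj_dw

tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f"dj_db at initial w,b: {tmp_dj_db}")
print(f"dj_dw at initial w,b: {tmp_dj_dw}")
```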
Expected Result: dj_db at initial w,b: -1.6739251122999121e-06 dj_dw at initial w,b: [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
In the next cell you will test the implementation.
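A sketch of the gradient descent loop and the test cell; the learning rate `alpha = 5.0e-7` and `1000` iterations are assumed values, chosen to be consistent with the expected result below:

```python
import copy
import math

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b.
    """
    w = copy.deepcopy(w_in)   # avoid modifying the caller's parameters
    b = b_in
    for i in range(num_iters):
        dj_db, dj_dw = gradient_function(X, y, w, b)   # compute the gradient
        w = w - alpha * dj_dw                          # update the parameters
        b = b - alpha * dj_db
        if i % math.ceil(num_iters / 10) == 0:         # print cost at intervals
            print(f"Iteration {i:4d}: Cost {cost_function(X, y, w, b):8.2f}")
    return w, b

# initialize parameters to zero and run gradient descent
initial_w = np.zeros_like(w_init)
initial_b = 0.
iterations = 1000
alpha = 5.0e-7
w_final, b_final = gradient_descent(X_train, y_train, initial_w, initial_b,
                                    compute_cost, compute_gradient, alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final}")
m = X_train.shape[0]
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")
```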
Expected Result: b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] prediction: 426.19, target value: 460 prediction: 286.17, target value: 232 prediction: 171.47, target value: 178
These results are not inspiring! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this.