Old Errata
For books printed before Nov 16, 2022
Chapter 2
Page 23
It seems that the predicted label ŷ and the true label y got flipped.
Page 24
It is important to note that the convergence of the perceptron is only guaranteed if the two classes are linearly separable, which means that the two classes cannot be perfectly separated by a linear decision boundary.
Linearly separable means that the two classes can be perfectly separated (it mistakenly said "cannot"). [#10]
Page 37
For context:
The mean squared error loss is shown with an added 1/2 term which, in the text, is called out as a convenience term used to make deriving the gradient easier. It seems like use of this term is completely dropped / ignored in all later uses of this loss function.
I remember that I was thinking about whether I should use the convenience term or not. I thought I cleaned this up, but it now looks like a weird hybrid where I introduce the convenience term but then don't use it.
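For reference, a hedged sketch of the convention in question (generic notation, not necessarily the book's exact symbols): with the 1/2 term, the loss and its gradient are

$$
L(\mathbf{w}, b) = \frac{1}{2n} \sum_{i=1}^{n} \bigl( y^{(i)} - \hat{y}^{(i)} \bigr)^2,
\qquad
\frac{\partial L}{\partial w_j} = -\frac{1}{n} \sum_{i=1}^{n} \bigl( y^{(i)} - \hat{y}^{(i)} \bigr)\, x_j^{(i)},
$$

where the 1/2 simply cancels the factor of 2 produced by the chain rule; dropping the 1/2 leaves an extra factor of 2 in the gradient but changes nothing else.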
Page 38
In the MSE derivative, there is a spot where the summation is over i but it should be over j:
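For context, a hedged sketch of the consistent index convention (with i running over the n training examples and j over the m features); the net input of example i is

$$
z^{(i)} = \sum_{j=1}^{m} w_j x_j^{(i)} + b,
$$

so a summation over the weights inside the derivative with respect to a single weight has to use the feature index j, while the outer sum of the MSE itself runs over the example index i.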
Chapter 3
It should be logical_xor instead of logical_or in the text.
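For context, a hedged sketch of the XOR toy data this refers to (the book's exact construction may differ slightly):

```python
import numpy as np

# Toy dataset with an XOR-style class structure
np.random.seed(1)
X_xor = np.random.randn(200, 2)
# logical_xor (not logical_or): class 1 iff exactly one coordinate is positive
y_xor = np.logical_xor(X_xor[:, 0] > 0, X_xor[:, 1] > 0)
y_xor = np.where(y_xor, 1, 0)
```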
Page 60
The correct link is: https://www.youtube.com/watch?v=L0FU8NFpx4E instead of https://www.youtube.com/L0FU8NFpx4E.
Chapter 6
Page 185
The text says
In this case, the sweet spot appears to be between 0.01 and 0.1 of the C value.
but "between 0.1 and 1.0" would have been more accurate. [#27]
Page 188
The following import is missing:
Page 200
It says "[...] via the interp
function that we imported from SciPy" but we imported it from NumPy not SciPy. [#199]
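A hedged illustration of the corrected import (`scipy.interp` was only an alias and has been deprecated/removed in newer SciPy versions anyway):

```python
from numpy import interp    # correct: interp lives in NumPy
# from scipy import interp  # what the text implies; deprecated/removed in newer SciPy
```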
Chapter 7
Page 232
We have
but that looks exactly like the formula for `update_if_wrong_1`. The result is the same, but it would be clearer to change it to the following:
Chapter 9
Page 287
Here, the mean absolute deviation was computed:
But it makes more sense to compute the median absolute deviation:
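A hedged sketch of the two statistics side by side (toy data, not the book's exact code):

```python
import numpy as np

data = np.array([1.0, 2.0, 2.5, 3.0, 100.0])   # toy values with an outlier

# Mean absolute deviation: deviations from the mean, averaged with the mean
mad_mean = np.mean(np.abs(data - np.mean(data)))

# Median absolute deviation: deviations from the median, aggregated with the
# median (much more robust to the outlier)
mad_median = np.median(np.abs(data - np.median(data)))

print(mad_mean, mad_median)
```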
Chapter 7
Pages 228 & 236
The labels "Alcohol" and "OD280/OD315 of diluted wines" should be flipped in the code (and the resulting figure).
Page 242
The value -0.69 should be -0.67 as shown in the annotated screenshot below:
Chapter 8
Page 261
An improved version:
Change `text.lower()` to `text` in the emoticon-matching line, to catch emoticons like "😛".
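A hedged sketch of the kind of change meant here (this tokenizer is a simplified stand-in, not the book's exact code):

```python
import re

def tokenizer(text):
    # remove HTML markup
    text = re.sub('<[^>]*>', '', text)
    # find emoticons on the original text (not on text.lower()),
    # so that uppercase patterns such as ':P' or ':D' still match
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
    # lowercase the remaining text, strip non-word characters, re-append emoticons
    text = (re.sub(r'[\W]+', ' ', text.lower())
            + ' ' + ' '.join(emoticons).replace('-', ''))
    return text.split()

print(tokenizer('This movie was great :P <br />'))
```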
Chapter 9
Page 291
In the original text of the first paragraph on p. 291, the sentence
We can see that the MSE on the training dataset is larger than on the test set, which is an indicator that our model is slightly overfitting the training data in this case.
should be corrected as follows:
We can see that the MSE on the training dataset is less than on the test set, which is an indicator that our model is slightly overfitting the training data in this case.
Page 292
Not an error, but in the proof showing that R² is a rescaled version of the MSE,
it might be good to insert
after the first line to make it easier to follow.
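For reference, a hedged sketch of the relationship being proven (using the standard definitions; the intermediate step the note refers to is not reproduced here):

$$
R^2 = 1 - \frac{SSE}{SST}
    = 1 - \frac{\frac{1}{n}\sum_{i=1}^{n}\bigl(y^{(i)} - \hat{y}^{(i)}\bigr)^2}
               {\frac{1}{n}\sum_{i=1}^{n}\bigl(y^{(i)} - \mu_y\bigr)^2}
    = 1 - \frac{MSE}{\mathrm{Var}(y)}
$$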
Chapter 11
Page 338:
It would be better to use for the training example index for consistency later in this chapter.
Page 348:
The code comments for the `NeuralNetMLP`'s `z_h` and `z_out` computations are outdated [#23]. Originally, I implemented the following computation
via the equivalent
(Note that in both cases, `z_h` is exactly the same.)
The code comments reflect the second code line. For the first line, the code comment has to change and should be
Similarly, the code comments for `z_out` should be
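For illustration, a hedged sketch (array shapes and variable names are assumptions, not the book's exact `NeuralNetMLP` code) of two equivalent ways to compute the hidden-layer net input, each of which would need a different shape comment:

```python
import numpy as np

n_examples, n_features, n_hidden = 4, 3, 5
rng = np.random.RandomState(1)
x = rng.rand(n_examples, n_features)
weight_h = rng.rand(n_hidden, n_features)   # assumed layout: [n_hidden, n_features]
bias_h = np.zeros(n_hidden)

# Variant 1 -- comment should read:
# [n_examples, n_features] dot [n_features, n_hidden] -> [n_examples, n_hidden]
z_h_1 = np.dot(x, weight_h.T) + bias_h

# Variant 2 -- comment should read:
# ([n_hidden, n_features] dot [n_features, n_examples]).T -> [n_examples, n_hidden]
z_h_2 = np.dot(weight_h, x.T).T + bias_h

assert np.allclose(z_h_1, z_h_2)   # both variants produce exactly the same z_h
```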
Page 361
In the equation , the training example index is missing; it should be
Page 366
There are two duplication errors on the page as shown in the figure below:
Chapter 12
Page 380
We can also simply utilize the torch.utils.data.TensorDataset class, if the second dataset is a labeled dataset in the form of tensors. So, instead of using our self-defined Dataset class, JointDataset, we can create a joint dataset as follows:
>>> joint_dataset = JointDataset(t_x, t_y)
Here, we mistakenly used `JointDataset` again. It should have been
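Presumably something along these lines, using the `TensorDataset` class mentioned above (a sketch, since the corrected line itself is not reproduced here):

```python
>>> from torch.utils.data import TensorDataset
>>> joint_dataset = TensorDataset(t_x, t_y)
```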
Page 391
On some computers, it is necessary to cast the tensor to a float tensor explicitly, that is, changing
to
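A hedged example of the kind of explicit cast meant here (the array and tensor names are placeholders, not the book's exact line):

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]])   # placeholder data (float64)

# Without an explicit cast, the tensor inherits the NumPy dtype (float64 here):
x = torch.from_numpy(arr)

# Explicit cast to a 32-bit float tensor, which most PyTorch layers expect:
x = torch.from_numpy(arr).float()
print(x.dtype)   # torch.float32
```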
Page 392
To be more explicit and to improve compatibility with certain computing environments, the line
should be changed to
Page 396
I was 99% sure I fixed that during editing, but on page 396, the `Model` has an `x = nn.Softmax(dim=1)(x)` layer that shouldn't be there.
Page 397
In the line
it should be `is_correct.sum()` instead of `is_correct.mean()`. The resulting figures etc. are all correct, though.
Page 400
The word "multilabel" should be "multiclass". [#35]
Chapter 13
Page 422
In the lines
The first line is missing the `/batch_size`.
Page 429
On page 419, we defined the number of training examples as `n_train = 100`. On page 429, we then write
It is technically not a mistake, but some readers may wonder where the 100/ comes from, so it might be clearer to write it as follows:
Chapter 14
Page 458
The square brackets on this page are layout errors and should be floor symbols ⌊ ⌋.
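For context, the affected expressions are presumably the convolution output-size formulas, which rely on the floor operation; in generic notation (n = input size, p = padding, m = kernel size, s = stride):

$$
o = \left\lfloor \frac{n + 2p - m}{s} \right\rfloor + 1
$$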
Page 472
In the figure, the `y_pred` value for the `BCELoss` is 0.8, but it should be 0.69, because sigmoid(0.8) ≈ 0.69. You can find an updated figure here.
Also, in the lines
the phrases `w Probas` and `w Logits` should be flipped. [#34]
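A hedged sketch of the distinction the labels refer to (loosely based on how these losses are typically demonstrated, not necessarily the book's exact code):

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.8])
probas = torch.sigmoid(logits)        # sigmoid(0.8) ≈ 0.69
target = torch.tensor([1.0])

bce_probas = nn.BCELoss()(probas, target)            # "w Probas": expects probabilities
bce_logits = nn.BCEWithLogitsLoss()(logits, target)  # "w Logits": expects raw logits
print(bce_probas.item(), bce_logits.item())          # both give the same loss value
```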
Page 477
The square brackets on this page are layout errors and should be floor symbols ⌊ ⌋.
Chapter 15
Page 483
Not an erratum, but it would be good to clarify in the infobox that users have to unzip the `img_align_celeba.zip` file, which is inside the unzipped `celeba` folder.
Page 488
The correct smile attribute is not `attr[18]` but `attr[31]`. Consequently, the plot on pg. 495 will look a bit different. The test accuracy on pg. 496 will be around 90.21%. And the pictures on pg. 497 will look a bit different. The ch14_part2.ipynb Jupyter notebook in this GitHub repository was updated accordingly 😃.
Page 508
In the following line
the bias should be `b_xh` instead of `b_hh`. However, the resulting output is correct.
Also, the line at the bottom is missing the closing bracket
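For context, a hedged sketch of a manual RNN step with the two separate bias terms (the variable names mirror the b_xh/b_hh naming above; this is not the book's exact code):

```python
import torch
import torch.nn as nn

torch.manual_seed(1)
rnn_layer = nn.RNN(input_size=5, hidden_size=2, batch_first=True)

w_xh = rnn_layer.weight_ih_l0   # input-to-hidden weights
w_hh = rnn_layer.weight_hh_l0   # hidden-to-hidden weights
b_xh = rnn_layer.bias_ih_l0     # input-to-hidden bias
b_hh = rnn_layer.bias_hh_l0     # hidden-to-hidden bias

xt = torch.randn(1, 5)          # one input vector
h_prev = torch.zeros(1, 2)      # previous hidden state

# The input-to-hidden part uses b_xh ...
ht = torch.matmul(xt, torch.transpose(w_xh, 0, 1)) + b_xh
# ... while b_hh belongs to the hidden-to-hidden part
ot = torch.tanh(ht + torch.matmul(h_prev, torch.transpose(w_hh, 0, 1)) + b_hh)
```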
Page 519
There is a sentence that says "Therefore, the embedding matrix in this case has the size 10×6." However, as can be seen from the code, it should be "10×3" not "10×6". [#36]
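A hedged illustration (assuming the layer in question is an embedding with 10 entries of dimensionality 3, as the surrounding code suggests):

```python
>>> import torch.nn as nn
>>> nn.Embedding(num_embeddings=10, embedding_dim=3).weight.shape
torch.Size([10, 3])
```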
Page 530
It would not make any difference because of the newline characters at the end, but to be technically correct, we should add a `+1` to the `chunk_size` in
I.e.,
(Via [#38].)
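A hedged sketch of the off-by-one this refers to (the variable names are assumptions): for a sequence of length N, there are N - chunk_size + 1 complete chunks, so the upper bound of the range needs the extra +1.

```python
text_encoded = list(range(10))   # stand-in for the encoded text
chunk_size = 4

# without +1: the final complete chunk is skipped
chunks_short = [text_encoded[i:i + chunk_size]
                for i in range(len(text_encoded) - chunk_size)]

# with +1: every complete chunk is included
chunks_full = [text_encoded[i:i + chunk_size]
               for i in range(len(text_encoded) - chunk_size + 1)]

print(len(chunks_short), len(chunks_full))   # 6 7
```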
Page 532
`LogSoftmax(dim=1)` is not used when defining the model -- this is correct, because `nn.CrossEntropyLoss` is designed for logits, not log-probabilities. However, the output contains a false reference to `LogSoftmax(dim=1)` (this is a left-over from editing, and it can be ignored). [#37]
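A minimal sketch of why no `LogSoftmax` layer is needed in the model when training with `nn.CrossEntropyLoss` (generic shapes, not the book's code):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)              # raw model outputs (no LogSoftmax applied)
targets = torch.randint(0, 10, (4,))

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.isclose(ce, nll))            # True: CrossEntropyLoss = LogSoftmax + NLLLoss
```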
Page 532
The learning rate (`lr=0.001`) is too low here. If we change it to `lr=0.005`, we can get much better results.
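A hedged example of the change (the optimizer type and model are placeholders, not the book's exact line):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)   # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)   # instead of lr=0.001
```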