Path: blob/master/notebooks/book1/10/logreg_pytorch.ipynb
Logistic regression using PyTorch
We show how to fit a logistic regression model using PyTorch. The log likelihood for this model is convex, so we can compute the globally optimal MLE. This makes it easy to compare to sklearn (and other implementations).
Logistic regression using sklearn
We fit a binary logistic regression model to the Iris dataset.
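A minimal sketch of the sklearn baseline, assuming we keep only two Iris classes so the problem is binary (the exact features and regularization used in this notebook may differ).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target
keep = y < 2                      # keep classes 0 and 1 -> binary problem
X, y = X[keep], y[keep]

# Large C means weak regularization, so the fit approximates the MLE.
clf = LogisticRegression(C=1e10, solver="lbfgs", max_iter=1000).fit(X, y)
print(clf.coef_, clf.intercept_)
```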
Automatic differentiation
In this section, we illustrate how to use autograd to compute the gradient of the negative log likelihood for binary logistic regression. We first compute the gradient by hand, and then use PyTorch's autograd feature. (See also the JAX optimization colab.)
Computing gradients by hand
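As a reference point, here is a minimal NumPy sketch of the hand-derived quantities: the negative log likelihood and its gradient, $\nabla \mathrm{NLL}(w) = X^\top(\mu - y)$ with $\mu = \sigma(Xw)$. Shapes and function names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 0.5 * (np.tanh(a / 2) + 1)   # numerically stable sigmoid

def nll(w, X, y):
    # X is (N, D), y is (N,) with entries in {0, 1}, w is (D,)
    mu = sigmoid(X @ w)
    return -np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))

def nll_grad(w, X, y):
    mu = sigmoid(X @ w)
    return X.T @ (mu - y)               # closed-form gradient
```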
PyTorch code
To compute the gradient using torch, we proceed as follows.
declare all the variables that you want to take derivatives with respect to using the requires_grad=True argument
define the (scalar output) objective function you want to differentiate in terms of these variables, and evaluate it at a point. This will generate a computation graph and store all the tensors.
call objective.backward() to trigger backpropagation (chain rule) on this graph.
extract the gradients from each variable using the variable.grad field. (These will be torch tensors.)
See the example below.
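A small sketch of the four steps above, assuming X and y are float torch tensors holding the binary Iris data used elsewhere in this notebook.

```python
import torch

w = torch.zeros(X.shape[1], requires_grad=True)    # step 1: leaf variable

mu = torch.sigmoid(X @ w)                           # step 2: build the graph
nll = -torch.sum(y * torch.log(mu) + (1 - y) * torch.log(1 - mu))

nll.backward()                                      # step 3: backpropagate

print(w.grad)                                       # step 4: read the gradient
```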
Batch optimization using BFGS
We will use PyTorch's limited-memory BFGS optimizer (torch.optim.LBFGS) to fit the logistic regression model, and compare the result to sklearn.
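A sketch of full-batch fitting with torch.optim.LBFGS, assuming X, y are float tensors. LBFGS requires a closure that re-evaluates the loss; the number of iterations is chosen arbitrarily here.

```python
import torch
import torch.nn.functional as F

w = torch.zeros(X.shape[1], requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.LBFGS([w, b], max_iter=500)

def closure():
    optimizer.zero_grad()
    logits = X @ w + b
    loss = F.binary_cross_entropy_with_logits(logits, y)
    loss.backward()
    return loss

optimizer.step(closure)
print(w, b)   # should be close to the sklearn estimates
```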
Stochastic optimization using SGD
DataLoader
First we need a way to get minibatches of data.
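A sketch of minibatching with TensorDataset and DataLoader, assuming X and y are torch tensors with matching first dimension; the batch size is illustrative.

```python
from torch.utils.data import TensorDataset, DataLoader

dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for X_batch, y_batch in loader:
    print(X_batch.shape, y_batch.shape)
    break
```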
Vanilla SGD training loop
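A sketch of a hand-written SGD loop over the DataLoader above, with learning rate and epoch count chosen purely for illustration, and y assumed to be a float tensor.

```python
import torch
import torch.nn.functional as F

w = torch.zeros(X.shape[1], requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for epoch in range(100):
    for X_batch, y_batch in loader:
        logits = X_batch @ w + b
        loss = F.binary_cross_entropy_with_logits(logits, y_batch)
        loss.backward()
        with torch.no_grad():       # update in place, outside the graph
            w -= lr * w.grad
            b -= lr * b.grad
            w.grad.zero_()
            b.grad.zero_()
```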
Use Torch SGD optimizer
Instead of writing our own optimizer, we can use a torch optimizer. This should give identical results.
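The same loop, but delegating the parameter update to torch.optim.SGD; parameters are re-initialized so the comparison starts from the same point.

```python
import torch
import torch.nn.functional as F

w = torch.zeros(X.shape[1], requires_grad=True)
b = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=0.1)

for epoch in range(100):
    for X_batch, y_batch in loader:
        optimizer.zero_grad()
        logits = X_batch @ w + b
        loss = F.binary_cross_entropy_with_logits(logits, y_batch)
        loss.backward()
        optimizer.step()
```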
Use momentum optimizer
Adding momentum helps a lot, and gives results that are very similar to those from batch optimization.
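Only the optimizer construction needs to change; momentum=0.9 is a common default, used here for illustration.

```python
optimizer = torch.optim.SGD([w, b], lr=0.1, momentum=0.9)
```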
Modules
We can define logistic regression as a multilayer perceptron (MLP) with no hidden layers. This can be defined as a sequential neural network module. Modules hide the parameters inside each layer, which makes it easy to construct complex models, as we will see later on.
Sequential model
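A sketch of logistic regression as a one-layer sequential model; the input dimension of 4 assumes the Iris features used above.

```python
import torch
from torch import nn

# Logistic regression = linear layer followed by a sigmoid.
model = nn.Sequential(
    nn.Linear(4, 1),   # w^T x + b
    nn.Sigmoid(),      # map the logit to a probability
)
probs = model(X)       # shape (B, 1), entries in [0, 1]
```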
Subclass the Module class
For more complex models (e.g., non-sequential ones), we can create our own subclass. We just need to define a 'forward' method that maps inputs to outputs, as we show below.
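The same model written as a Module subclass; only the forward method is required. The class name and constructor argument are illustrative.

```python
import torch
from torch import nn

class LogReg(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.linear = nn.Linear(num_features, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

model = LogReg(4)   # 4 input features, as in the Iris data
```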
SGD on a module
We can optimize the parameters of a module by passing a reference to them into the optimizer, as we show below.
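A sketch of minibatch SGD on the module above: passing model.parameters() hands the module's weights to the optimizer. It assumes the loader from earlier and that y is stored as a float tensor (required by nn.BCELoss).

```python
import torch
from torch import nn

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.BCELoss()

for epoch in range(100):
    for X_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(X_batch).squeeze(-1), y_batch)
        loss.backward()
        optimizer.step()
```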
Batch optimization on a module
SGD does not match the results of sklearn. However, this is not because of the way we defined the model; it is simply because plain SGD is a poor optimizer for this problem. Here we show that L-BFGS gives essentially the same results as sklearn.
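A sketch of full-batch L-BFGS on the module, reusing the model, X, and y from above; the iteration budget is illustrative.

```python
import torch
from torch import nn

optimizer = torch.optim.LBFGS(model.parameters(), max_iter=500)
loss_fn = nn.BCELoss()

def closure():
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    return loss

optimizer.step(closure)
```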
Multi-class logistic regression
For binary classification problems, we can use a sigmoid as the final layer, to return probabilities. The corresponding loss is the binary cross entropy, nn.BCELoss(pred_prob, true_label), where pred_prob is of shape (B) with entries in [0,1], and true_label is of shape (B) with entries in {0,1}. (Here B = batch size.) Alternatively the model can return the logit score, and use nn.BCEWithLogitsLoss(pred_score, true_label).
For multiclass classification, the final layer can return the log probabilities using a LogSoftmax layer, combined with the negative log likelihood loss, nn.NLLLoss(pred_log_probs, true_label), where pred_log_probs is a matrix of shape (B, C), and true_label is of shape (B) with entries in {0, 1, ..., C-1}. (Note that the target labels are integers, not sparse one-hot vectors.) Alternatively, we can just return the vector of logit scores, and use nn.CrossEntropyLoss(logits, true_label). The above two methods should give the same results.
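A small sketch checking that LogSoftmax + NLLLoss and CrossEntropyLoss agree, using made-up shapes (B=5 examples, C=3 classes).

```python
import torch
from torch import nn

B, C = 5, 3
logits = torch.randn(B, C)
labels = torch.randint(0, C, (B,))     # integer class labels, not one-hot

log_probs = nn.LogSoftmax(dim=1)(logits)
loss1 = nn.NLLLoss()(log_probs, labels)
loss2 = nn.CrossEntropyLoss()(logits, labels)
print(torch.allclose(loss1, loss2))    # True
```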