Introduction to sklearn
Scikit-learn is a widely used Python machine learning library. There are several good tutorials on it, some of which we list below.
| Name | Notes |
| ---- | ---- |
| Python Data Science Handbook | by Jake VanderPlas. Covers many Python libraries. |
| Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow v2 | by Aurélien Géron. Covers sklearn and TF2. |
| Python Machine Learning v3 | by Sebastian Raschka. Covers sklearn and TF2. |
In the sections below, we just give a few examples of how to use it.
If you want to scale up sklearn to handle datasets that do not fit into memory, and/or you want to run slow jobs in parallel (e.g., for grid search over model hyper-parameters) on multiple cores of your laptop or in the cloud, you should use Dask-ML.
Install necessary libraries
Estimators
Most of sklearn is designed around the concept of "estimators", which are objects that can transform data. That is, we can think of an estimator as a function of the form $f(x, \theta)$, where $x$ is the input, and $\theta$ is the internal state (e.g., model parameters) of the object. Each estimator has two main methods: `fit` and `predict`. The `fit` method has the form `f = fit(f, data)`, and updates the internal state (e.g., by computing the maximum likelihood estimate of the parameters). The `predict` method has the form `y = predict(f, x)`. We can also have stateless estimators (with no internal parameters), which do things like preprocess the data. We give examples of all this below.
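As a minimal sketch of the fit/predict pattern described above, the snippet below fits a linear regression to a tiny made-up dataset, then contrasts a preprocessing object that has internal state (`StandardScaler`) with a stateless one (`FunctionTransformer`). The data and variable names are assumptions chosen for illustration. Note that in sklearn, preprocessing objects expose `transform` rather than `predict`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Toy data, made up for illustration: y = 2*x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

# fit updates the internal state (here, slope and intercept) and returns
# the estimator itself, mirroring f = fit(f, data).
model = LinearRegression()
model = model.fit(X, y)
print(model.coef_, model.intercept_)

# predict uses the fitted state on new inputs, mirroring y = predict(f, x).
y_new = model.predict(np.array([[4.0]]))

# A preprocessing object with internal state: fit stores the mean and
# standard deviation of each column, and transform applies them.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# A stateless preprocessing object: FunctionTransformer wraps a plain
# function, so fit has no parameters to learn.
log_tf = FunctionTransformer(np.log1p)
X_log = log_tf.fit_transform(X)
```

Because `fit` returns the estimator, calls can be chained, as in `StandardScaler().fit(X)` above.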
Logistic regression
We illustrate how to fit a logistic regression model using the Iris dataset.
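The following is one possible sketch of such a fit, not necessarily the exact code used in the notebook. It assumes the copy of Iris bundled with sklearn (`load_iris`), a 75/25 train/test split, and a raised `max_iter`; all three choices are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the Iris data: 150 flowers, 4 features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# Hold out a quarter of the data for evaluation (the split ratio and the
# seed are arbitrary choices for this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a multiclass logistic regression model; max_iter is raised so the
# default solver has room to converge on unscaled features.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Hard class predictions, class probabilities, and held-out accuracy.
y_pred = clf.predict(X_test)
proba = clf.predict_proba(X_test)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Each row of `proba` holds one probability per class and sums to one; `predict` returns the class with the highest probability.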
Feature crosses for Autompg
We will use the Patsy library, which provides R-like syntax for specifying feature interactions.
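To give a feel for the formula syntax, here is a small sketch that builds a design matrix with an interaction (cross) term. It does not load the real Autompg data: the rows below are made-up values with Autompg-style column names (`horsepower`, `weight`), used only to show the mechanics.

```python
import pandas as pd
from patsy import dmatrix

# Made-up rows with Autompg-style column names (illustrative values only).
df = pd.DataFrame(
    {
        "horsepower": [130.0, 165.0, 95.0, 88.0],
        "weight": [3504.0, 3693.0, 2372.0, 2130.0],
    }
)

# In a Patsy formula, "a:b" denotes the interaction (product) of a and b,
# and "a * b" is shorthand for "a + b + a:b". An intercept column is
# included by default.
design = dmatrix("horsepower * weight", df, return_type="dataframe")
print(list(design.columns))
```

The resulting `design` is a DataFrame that can be passed to an sklearn estimator's `fit` method as the feature matrix.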