GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_08/code/solution-code/MultiVariable_LogisticRegression-lab-solutions.ipynb
Kernel: Python 2

Multi-Variable Logistic Regression and Classification Matrix

_Authors: Sam Stack (DC)_

Exercise Objectives

  • Hands-on experience using Multi-Variable Logistic Regression

  • Review and Exploration of the Classification Matrix and its evaluation Metrics

  • Introduction to One vs. One and One vs. Rest Classifiers.

Let's get some data. One of the most popular classification datasets in machine learning is the Iris dataset, which can be loaded directly from sklearn.datasets.

  • Sklearn datasets are loaded as dictionary-like objects whose keys (also available as attributes) give access to specific parts of the data.

    • iris.data : actual matrix of observations

    • iris.target : target column for classification

    • iris.feature_names : column names

import seaborn as sns
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Examine the data
X.head()

y
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Breakdown of classes:

  • 0 : Setosa

  • 1 : Versicolour

  • 2 : Virginica


Modelling

This data is extremely neat and tidy, so no cleaning is necessary and we can get right into modelling.

# Model the data, using a train/test split as a simple validation technique
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

lr = LogisticRegression()
x_train, x_test, y_train, y_test = train_test_split(X, y)
lr.fit(x_train, y_train)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False)
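As a rough sketch of what the split above does, the following pure-Python helper shuffles the row indices and holds out a quarter of them. It assumes the default behaviour of `train_test_split` is a shuffled 25% holdout, rounded up; the helper name `holdout_split` and the seed are hypothetical and this is not sklearn's implementation.

```python
import math
import random

def holdout_split(n_rows, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) for a shuffled holdout split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)             # shuffle so the classes are mixed
    n_test = int(math.ceil(test_fraction * n_rows))  # assumed rounding: up
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(150)             # the iris data has 150 rows
print("train: %d, test: %d" % (len(train_idx), len(test_idx)))  # train: 112, test: 38
```

Because the split is random, the confusion matrices and scores in this lab change from run to run. Passing a fixed `random_state` to `train_test_split` makes the split reproducible.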
# Predict classes for the held-out test set
y_pred = lr.predict(x_test)

# Evaluate model performance with a confusion matrix
from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, y_pred)
array([[16,  0,  0],
       [ 0,  9,  2],
       [ 0,  1, 10]])
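To make the printed matrix concrete, here is a small sketch that reads overall accuracy off its diagonal. In sklearn's `confusion_matrix` output, rows are the true classes and columns are the predicted classes; the values are copied from the output above and the variable names are only illustrative.

```python
# Confusion matrix printed above: rows are true classes, columns are predictions.
cm = [[16, 0, 0],
      [0, 9, 2],
      [0, 1, 10]]

correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal = correctly classified
total = sum(sum(row) for row in cm)              # every observation in the test set
accuracy = correct / float(total)

print("%d of %d correct, accuracy = %.3f" % (correct, total, accuracy))
# 35 of 38 correct, accuracy = 0.921
```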

With a multi-class confusion matrix, some of our labels (True Pos., True Neg., False Pos., False Neg.) get a little warped. We are no longer separating one class from a null class; we are classifying into 3 distinct classes.

The True diagonal stays the same as these are properly classified observations.

              Class 0   Class 1   Class 2
Pred Class 0       15         0         0
Pred Class 1        0        11         0
Pred Class 2        0         1        11

With multi-class problems it is better to stick with True and False labels to avoid... confusion.

If you need to refer to a False Positive or True Negative, it is better to first select a specific class, such as Class 2, and refer to classification or misclassification relative to that chosen class instead of the set of all classes as a whole.

Example: True Negatives relative to Class 2 include the True Positives for Class 0 and Class 1, plus any Class 0 and Class 1 observations mistaken for each other, since none of these involve Class 2.
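A minimal sketch of counting relative to one chosen class, using the confusion matrix printed by `confusion_matrix` earlier (rows are true classes, columns are predictions). The helper name `one_class_counts` is hypothetical.

```python
cm = [[16, 0, 0],
      [0, 9, 2],
      [0, 1, 10]]

def one_class_counts(cm, k):
    """Return (TP, FP, FN, TN), treating class k as the positive class."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                       # truly k, predicted as something else
    fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly something else
    tn = total - tp - fn - fp                  # everything that never touches class k
    return tp, fp, fn, tn

print(one_class_counts(cm, 2))  # (10, 2, 1, 25)
```

The 25 True Negatives for Class 2 are the 16 correct Class 0 predictions plus the 9 correct Class 1 predictions; in this run no Class 0 and Class 1 observations were confused with each other.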

Speaking of our classes, how are probabilities calculated with multiple classes?

  • Are they Probability of Class 0 vs. Not Class 0?

  • Or Probability of Class 0 vs. Class 1 vs. Class 2 ?

# Use predict_proba to find out
lr.predict_proba(x_test)
array([[ 3.92912700e-04, 2.57921993e-01, 7.41685094e-01], [ 7.61238458e-01, 2.38687413e-01, 7.41281176e-05], [ 2.35498608e-02, 6.30350128e-01, 3.46100011e-01], [ 5.58954188e-02, 8.25684916e-01, 1.18419665e-01], [ 9.61410766e-04, 3.83514013e-01, 6.15524576e-01], [ 4.07248403e-02, 6.85467988e-01, 2.73807172e-01], [ 8.67166673e-01, 1.32810240e-01, 2.30865222e-05], [ 4.07231876e-02, 7.89206984e-01, 1.70069828e-01], [ 7.67315951e-01, 2.32614931e-01, 6.91179203e-05], [ 7.98392308e-01, 2.01564850e-01, 4.28423907e-05], [ 8.51532411e-01, 1.48434606e-01, 3.29837345e-05], [ 8.40555947e-01, 1.59409459e-01, 3.45939575e-05], [ 2.58789272e-04, 4.40496881e-01, 5.59244330e-01], [ 1.01745481e-03, 5.37887534e-01, 4.61095012e-01], [ 8.43147210e-01, 1.56804481e-01, 4.83097125e-05], [ 8.85690921e-03, 7.52553758e-01, 2.38589333e-01], [ 3.41391091e-02, 7.51725415e-01, 2.14135476e-01], [ 8.07630428e-01, 1.92216725e-01, 1.52847490e-04], [ 4.53245041e-02, 4.73028012e-01, 4.81647483e-01], [ 9.06339418e-04, 1.72571896e-01, 8.26521765e-01], [ 9.10812060e-01, 8.91752351e-02, 1.27044800e-05], [ 8.65251739e-01, 1.34705128e-01, 4.31329842e-05], [ 8.37647710e-01, 1.62338360e-01, 1.39299503e-05], [ 2.46369184e-02, 5.37995571e-01, 4.37367511e-01], [ 8.11616685e-01, 1.88293520e-01, 8.97957231e-05], [ 1.19654695e-03, 2.94117701e-01, 7.04685752e-01], [ 4.61254319e-03, 3.38450570e-01, 6.56936886e-01], [ 7.94105408e-01, 2.05765886e-01, 1.28705708e-04], [ 4.04438823e-02, 8.22627656e-01, 1.36928461e-01], [ 8.41171462e-01, 1.58648128e-01, 1.80409733e-04], [ 8.99455078e-01, 1.00490608e-01, 5.43138537e-05], [ 2.39845879e-03, 2.88916120e-01, 7.08685421e-01], [ 1.31074717e-02, 3.70508980e-01, 6.16383549e-01], [ 6.98866131e-04, 4.20002367e-01, 5.79298767e-01], [ 8.06002855e-01, 1.93857733e-01, 1.39412071e-04], [ 4.29525622e-03, 3.92564585e-01, 6.03140159e-01], [ 9.98536795e-03, 7.74464529e-01, 2.15550103e-01], [ 1.35975092e-03, 3.33654708e-01, 6.64985541e-01]])
for a, b, c in lr.predict_proba(x_test):
    print(sum([a, b, c]))
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0

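Each row of probabilities adds up to 1, so the output can be read as Class 0 vs. Class 1 vs. Class 2. Note, however, that the model printed above was fit with multi_class='ovr': it trains one Class k vs. Not Class k model per class, and predict_proba then rescales the three scores so that they sum to 1. Summing to 1 therefore does not by itself tell us how the model was fit.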
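Probabilities that sum to 1 can come out of either scheme. The sketch below contrasts the two normalizations for a single observation, following how sklearn's documentation describes `predict_proba`: with `multi_class='ovr'` each class score is passed through the logistic function and the results are normalized, while a multinomial model uses a softmax. The scores are hypothetical numbers chosen for illustration, not values from the fitted model.

```python
import math

scores = [-1.2, 0.4, 0.9]  # hypothetical per-class decision scores for one row

# One-vs-rest style: squash each score independently, then renormalize.
sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
ovr_probs = [p / sum(sigmoids) for p in sigmoids]

# Multinomial (softmax) style: exponentiate and normalize jointly.
exps = [math.exp(s) for s in scores]
softmax_probs = [e / sum(exps) for e in exps]

print(ovr_probs, sum(ovr_probs))
print(softmax_probs, sum(softmax_probs))
```

Both rows sum to 1 (up to floating point error) even though the individual probabilities differ.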

What if we wanted to create a logistic regression that has Class 0 vs. Class 1 & Class 2 or just Class 0 vs. Class 2? We will cover that in a bit, but first more evaluation metrics.


Classification Reports/Matrix

Classification reports are another means of evaluating classification models. They return a few metrics that are based on True Positives, False Positives and False Negatives.

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        16
          1       0.90      0.82      0.86        11
          2       0.83      0.91      0.87        11

avg / total       0.92      0.92      0.92        38

Precision

  • "How many of the items selected are relevant."

  • Of the items placed into a class, how many of them are True Positives.

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

Recall

  • "How many of the relevant items are selected."

  • Of the items that were supposed to be placed into a class, how many did we accurately place.

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$
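As a worked example, the per-class precision and recall in the classification report can be recomputed by hand from the confusion matrix printed earlier (rows are true classes, columns are predictions). The list names are illustrative only.

```python
cm = [[16, 0, 0],
      [0, 9, 2],
      [0, 1, 10]]

precisions, recalls = [], []
for k in range(len(cm)):
    tp = cm[k][k]
    predicted_k = sum(cm[i][k] for i in range(len(cm)))  # column sum: TP + FP
    actual_k = sum(cm[k])                                # row sum: TP + FN
    precisions.append(tp / float(predicted_k))
    recalls.append(tp / float(actual_k))
    print("class %d: precision=%.2f recall=%.2f" % (k, precisions[k], recalls[k]))
# class 0: precision=1.00 recall=1.00
# class 1: precision=0.90 recall=0.82
# class 2: precision=0.83 recall=0.91
```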

F1-Score

F1 ranges from 0 to 1, where 0 is just awful and 1 is perfection. F1 is the harmonic mean of Precision and Recall, so it is pulled toward the lower of the two. With classification models you often have to choose which kind of error you are willing to increase in order to reduce the other, and thus you may want to optimize Precision or Recall accordingly. If you are uncertain which you should optimize, F1 score may be the metric of choice.

$$F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
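A short sketch of the formula, checked against Class 1 in the report above (precision 9/10, recall 9/11). The function name `f1` is just a local helper, not sklearn's `f1_score`.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

class_1_f1 = f1(9 / 10.0, 9 / 11.0)
print("%.2f" % class_1_f1)    # 0.86, matching the report

# A lopsided pair is pulled toward the smaller value,
# whereas a plain average of 1.0 and 0.1 would be 0.55.
lopsided_f1 = f1(1.0, 0.1)
print("%.2f" % lopsided_f1)   # 0.18
```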

Support

The number of true observations in a given class, i.e. the count of observations that actually belong to that class.
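Support can be read off the confusion matrix as the row sums. The sketch below also reproduces the precision figure in the `avg / total` row, under the assumption that this row is a support-weighted average of the per-class values, which appears to be the case for the sklearn version used here.

```python
cm = [[16, 0, 0],
      [0, 9, 2],
      [0, 1, 10]]
n = len(cm)

support = [sum(row) for row in cm]  # true observations per class
precisions = [cm[k][k] / float(sum(cm[i][k] for i in range(n))) for k in range(n)]

# Assumed: "avg / total" weights each class by its support.
weighted_precision = sum(p * s for p, s in zip(precisions, support)) / float(sum(support))

print(support)                        # [16, 11, 11]
print("%.2f" % weighted_precision)    # 0.92
```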


Earlier we talked about building models relative to class combinations: distinguishing one class from all other classes, or just one specific class from another specific class. These goals are possible with Logistic Regression.

Up until this point we have used one model, but there are also Machine Learning methods that combine several models to come to a more refined conclusion, commonly referred to as Ensemble Methods.

One Vs. Rest Classification.

One vs. Rest Classification builds an individual model for each class that tries to distinguish that class from all the others. When the model focuses on Class 1, for example, it groups Class 2, Class 3 and Class 4 into a single class of Not Class 1. The same is done in turn for each of the remaining classes.

1 - Class1 vs. Class2, Class3, Class4
2 - Class2 vs. Class1, Class3, Class4
3 - Class3 vs. Class1, Class2, Class4
4 - Class4 vs. Class1, Class2, Class3
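The regrouping above amounts to relabeling the target once per class. Here is a minimal sketch with a hypothetical label list and a hypothetical helper name `one_vs_rest_targets`; it shows only the idea, not sklearn's implementation.

```python
y = [0, 0, 1, 1, 2, 2]  # hypothetical class labels, for illustration only

def one_vs_rest_targets(y):
    """Map each class k to a binary target: 1 for class k, 0 for the rest."""
    return {k: [1 if label == k else 0 for label in y] for k in sorted(set(y))}

targets = one_vs_rest_targets(y)
for k in sorted(targets):
    print("class %d vs. rest: %s" % (k, targets[k]))
# class 0 vs. rest: [1, 1, 0, 0, 0, 0]
# class 1 vs. rest: [0, 0, 1, 1, 0, 0]
# class 2 vs. rest: [0, 0, 0, 0, 1, 1]
```

One binary model is then fit per relabeled target, so a problem with K classes needs K models.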

One Vs. One Classification.

We train a model for every pair of classes. As more classes are added this becomes more computationally expensive: K classes require K(K-1)/2 models.

1 - Class1 vs. Class2
2 - Class1 vs. Class3
3 - Class1 vs. Class4
4 - Class2 vs. Class3
5 - Class2 vs. Class4
6 - Class3 vs. Class4
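The list of pairings can be generated with the standard library, which also makes the growth in model count easy to see. The class names are placeholders taken from the list above.

```python
from itertools import combinations

classes = ["Class1", "Class2", "Class3", "Class4"]
pairs = list(combinations(classes, 2))  # every unordered pair of classes

for i, (a, b) in enumerate(pairs, 1):
    print("%d - %s vs. %s" % (i, a, b))

k = len(classes)
n_models = k * (k - 1) // 2
print(n_models)  # 6 pairwise models for 4 classes
```

By the same formula, 10 classes would already need 45 pairwise models, compared with 10 models for One vs. Rest.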

One Vs. Rest Classifier

from sklearn.multiclass import OneVsRestClassifier

LR = LogisticRegression()
OVC = OneVsRestClassifier(LR)
OVC.fit(x_train, y_train)
OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False), n_jobs=1)
y_pred = OVC.predict(x_test)
confusion_matrix(y_test, y_pred)
array([[16,  0,  0],
       [ 0,  9,  2],
       [ 0,  1, 10]])

One Vs. One Classifier

from sklearn.multiclass import OneVsOneClassifier

LR = LogisticRegression()
OVO = OneVsOneClassifier(LR)
OVO.fit(x_train, y_train)
OneVsOneClassifier(estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1, penalty='l2', random_state=None, solver='liblinear', tol=0.0001, verbose=0, warm_start=False), n_jobs=1)
y_pred = OVO.predict(x_test)
confusion_matrix(y_test, y_pred)
array([[16,  0,  0],
       [ 0, 11,  0],
       [ 0,  0, 11]])

One Vs. One/Rest Classifiers are not restricted to Logistic Regression. With sklearn, any type of classification model can be placed into the One Vs. X classification ensemble.