Path: blob/master/Data Science Essentials for Data Analysts/Logistic Regression.ipynb
Logistic Regression
A model is simply a system for mapping inputs to outputs.
“Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function” (Wikipedia)
Essentially, it models the likelihood that an event occurs.
Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables. In logistic regression, the sigmoid function is used to map the model's output to a probability.
Types of questions that a binary logistic regression can examine:
How does the probability of getting lung cancer (yes vs. no) change for every additional pound a person is overweight and for every pack of cigarettes smoked per day?
Do body weight, calorie intake, fat intake, and age have an influence on the probability of having a heart attack (yes vs. no)?
Overfitting
When selecting the model for the logistic regression analysis, another important consideration is the model fit. Adding independent variables to a logistic regression model will always increase the amount of variance explained in the log odds (typically expressed as a pseudo-R²). However, adding more and more variables to the model can result in overfitting, which reduces the generalizability of the model beyond the data on which the model is fit.
The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the mean of the observed values. The Sum of Squared Errors (SSE) is the sum of the squared residuals, and the Total Sum of Squares (SST) equals SSR + SSE.
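The decomposition can be checked numerically. A minimal sketch with illustrative data (the SST = SSR + SSE identity holds exactly for a least-squares fit with an intercept, as below):

```python
import numpy as np

# Illustrative data, not from the notebook
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 6.0, 10.0])

slope, intercept = np.polyfit(x, y, 1)   # ordinary least-squares line
y_hat = slope * x + intercept

ssr = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
sse = np.sum((y - y_hat) ** 2)           # error (residual) sum of squares
sst = np.sum((y - y.mean()) ** 2)        # total sum of squares

print(np.isclose(ssr + sse, sst))        # SST = SSR + SSE for an OLS fit
```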
Thus, if the output is more than 0.5, we can classify the outcome as 1 (or positive) and if it is less than 0.5, we can classify it as 0 (or negative).
Sigmoid Function
In mathematical terms, the sigmoid function takes any real number as input and returns an output value in the range of 0 to 1. (By a related convention, some activation functions such as tanh return values in the range of -1 to 1 instead.)
The sigmoid function produces an S-shaped curve. Such curves are also used in statistics as cumulative distribution functions, whose output likewise ranges from 0 to 1.
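The sigmoid, σ(z) = 1 / (1 + e^(−z)), can be written in a few lines of NumPy. A minimal sketch (the function name is chosen here for illustration), showing the 0.5 threshold mentioned above:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued input into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5: the midpoint of the S-curve, the decision threshold
print(sigmoid(10))   # close to 1 -> classify as positive
print(sigmoid(-10))  # close to 0 -> classify as negative
```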
Cost function
The cost function is typically expressed as a difference or distance between the predicted value and the actual value. The cost function (you may also see this referred to as loss or error) can be estimated by iteratively running the model to compare estimated predictions against "ground truth", the known values of y.
Gradient descent is an efficient optimization algorithm that attempts to find a local or global minimum of a function; here, it is used to minimize the cost function.
The cost takes two inputs: hθ(x(i)), the hypothesis function, and y(i), the actual output. You can think of it as the cost the algorithm has to pay if it makes the prediction hθ(x(i)) while the actual label is y(i).
In other words, the cost function helps the learner correct or change its behaviour to minimize mistakes.
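The cross-entropy cost and a plain gradient-descent loop can be sketched from scratch in NumPy. This is a minimal illustration, not the notebook's original code; the tiny dataset at the bottom is made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # Cross-entropy: the price paid for predicting h(x) when the label is y
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, alpha=0.1, iters=2000):
    # Repeatedly step theta against the gradient of the cost
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= alpha * (X.T @ (h - y)) / len(y)
    return theta

# Tiny illustrative dataset: an intercept column plus one feature
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(cost(theta, X, y) < cost(np.zeros(2), X, y))  # True: cost decreased
```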
Example
Let's use a logistic regression model to predict whether a student gets admitted to a university. Suppose that you are the administrator of a university department and you want to determine each applicant's chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant's scores on the two exams and the admission decision.
Insights
It looks like there is a clear decision boundary between the two classes. Now we need to implement logistic regression so we can train a model to predict the outcome.
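The notebook's code cells are not reproduced here, but a comparable model can be sketched with scikit-learn. The synthetic exam scores below are illustrative, not the notebook's original dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the admissions data: two exam scores per applicant,
# with admission roughly determined by the total score plus some noise
rng = np.random.default_rng(0)
scores = rng.uniform(30, 100, size=(100, 2))
admitted = (scores.sum(axis=1) + rng.normal(0, 10, 100) > 130).astype(int)

clf = LogisticRegression(max_iter=1000).fit(scores, admitted)
print(f"training accuracy: {clf.score(scores, admitted):.0%}")
```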
logit(Y) = log(Y / (1 − Y))
Insights
Our logistic regression classifier correctly predicted whether a student was admitted 91% of the time.
The difference between binary classification and multi-classification
The name itself signals the key difference between binary and multi-classification. The examples below will give you a clear understanding of these two kinds of classification. Let's first look at a binary classification problem; later we will look at multi-classification problems.
Binary Classification:
The target being predicted has only 2 possible outcomes. For email spam prediction, the two possible outcomes for the target are: the email is spam, or it is not spam.
In short, binary classification is the task of predicting the target class from two possible outcomes. Some examples:
Given the subject and text of an email, predicting whether it is spam.
Predicting a sunny or rainy day, using weather information.
Based on a bank customer's history, predicting whether to grant a loan.
Multi-Classification:
In a multi-classification problem, the target can take more than two possible classes, and the idea is to use the training dataset to build a classifier that distinguishes among all of them. Some examples:
Given the dimensional information of an object, identifying its shape.
Identifying different kinds of vehicles.
Based on color intensities, predicting the color type.
Key Difference:
In logistic regression, the black-box function that takes the input features and calculates the probabilities of the two possible outcomes is the sigmoid function. The target class with the higher probability is the final predicted class from the logistic regression classifier.
When it comes to multinomial logistic regression, that function is the softmax function. I am not going into much detail about the properties of the sigmoid and softmax functions or how the multinomial logistic regression algorithm works.
Softmax: Used for the multi-classification task.
Sigmoid: Used for the binary classification task.
Softmax function:
The softmax function calculates the probability distribution of an event over 'n' different classes. The main advantage of using softmax is the range of the output probabilities: each lies between 0 and 1, and they sum to one. If the softmax function is used in a multi-classification model, it returns the probability of each class, and the target class is the one with the highest probability.
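Those properties are easy to verify in NumPy. A minimal sketch (the scores below are arbitrary examples):

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability; the result is
    # mathematically unchanged because it cancels in the ratio
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs.sum())     # the probabilities sum to 1
print(probs.argmax())  # the largest score gets the highest probability: class 0
```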