GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_08/code/Models - Logistic Regression, (Statsmodel), .ipynb
Kernel: Python 3
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import json

%matplotlib inline

# set max printout options for pandas:
pd.options.display.max_columns = 50
pd.options.display.max_colwidth = 300
df = pd.read_table('../data/evergreen_sites.tsv')
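# '?' marks a missing value; treat it as 0. regex=False makes the match literal,
# since '?' on its own is not a valid regular expression.
df['is_news'] = df['is_news'].str.replace('?', '0', regex=False).astype(int)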
pd.crosstab(df['is_news'], df['label'], margins=True)
Test the hypothesis with...

Logistic Regression using statsmodels.

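The logit function from statsmodels.formula.api (imported below as smf) fits a logistic regression from a formula string such as 'label ~ is_news', where the outcome is on the left of the ~ and the predictors are on the right.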
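As a rough illustration of what the formula 'label ~ is_news' asks for: the model has an intercept b0 and one slope b1, and it predicts P(label = 1) as the logistic function of b0 + b1 * is_news. The coefficient values and the helper names below are made up for illustration; they are not taken from the fitted model.

```python
import math

def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(is_news, b0, b1):
    """P(label = 1) under the model label ~ is_news."""
    return sigmoid(b0 + b1 * is_news)

# Hypothetical coefficients, chosen only to illustrate the calculation.
b0, b1 = 0.10, -0.20

p_not_news = predict_proba(0, b0, b1)  # is_news = 0: only the intercept applies
p_news = predict_proba(1, b0, b1)      # is_news = 1: intercept plus slope
print(round(p_not_news, 3), round(p_news, 3))
```

A negative slope lowers the predicted probability for news pages relative to non-news pages; a slope of zero would make the two probabilities identical.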

import statsmodels.formula.api as smf
from scipy import stats

# Workaround: older statsmodels releases call stats.chisqprob, which newer SciPy removed.
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)

model = smf.logit('label ~ is_news', data=df)  # specify the model from a formula
result = model.fit()                           # estimate by maximum likelihood
result.summary()
Optimization terminated successfully.
         Current function value: 0.692751
         Iterations 3
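With a single 0/1 predictor, an unregularized logistic regression reproduces the observed proportions in each group, so its coefficients can be read straight off a 2x2 table like the crosstab above. The sketch below shows that relationship with invented counts (not the evergreen data); the variable and function names are illustrative only.

```python
import math

# Hypothetical 2x2 counts laid out as counts[is_news][label].
# These numbers are invented, not the evergreen data.
counts = {0: {0: 40, 1: 60}, 1: {0: 50, 1: 50}}

def log_odds(n_neg, n_pos):
    """Log-odds of label = 1 given counts of label 0 and label 1."""
    return math.log(n_pos / n_neg)

# Intercept: log-odds of label = 1 when is_news = 0.
intercept = log_odds(counts[0][0], counts[0][1])

# Slope: change in log-odds when is_news goes from 0 to 1 (the log odds ratio).
slope = log_odds(counts[1][0], counts[1][1]) - intercept

odds_ratio = math.exp(slope)
print(intercept, slope, odds_ratio)
```

Exponentiating the slope gives the odds ratio, which is often easier to interpret than the raw coefficient: a value below 1 means news pages have lower odds of label = 1 than non-news pages.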

Logistic Regression using scikit-learn.

# Fit a logistic regression model and store the class predictions.
from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression()  # create the estimator (L2-regularized by default)

# feature_cols = []
X = df[['is_news']]  # double brackets keep X two-dimensional (a DataFrame), as scikit-learn expects
y = df['label']      # target vector

logreg.fit(X, y)          # fit the model
pred = logreg.predict(X)  # class predictions on the training data
logreg.score(X, y)        # mean accuracy on the training data
0.5133198106828939
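To put an accuracy figure like this in context, it helps to compare it with two reference points: always predicting the most common label, and the best any rule could do when it sees only a single 0/1 feature. The sketch below computes both from a 2x2 table; the counts are invented for illustration and are not the evergreen data.

```python
# Hypothetical counts[is_news][label]; invented for illustration only.
counts = {0: {0: 40, 1: 60}, 1: {0: 55, 1: 45}}

total = sum(sum(row.values()) for row in counts.values())

# Baseline: always predict the overall most common label.
label_totals = {lab: sum(row[lab] for row in counts.values()) for lab in (0, 1)}
baseline_acc = max(label_totals.values()) / total

# Ceiling for any rule based only on is_news: within each group,
# predict whichever label is more common in that group.
best_acc = sum(max(row.values()) for row in counts.values()) / total

print(baseline_acc, best_acc)
```

When the two numbers are close, the feature carries little information about the label, and an accuracy only slightly above 0.5 is about what one would expect.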