GitHub Repository: suyashi29/python-su
Path: blob/master/Key Python Libraries/Key Python Libraries - Day 3.ipynb
Kernel: Python 3 (ipykernel)
Recap: supervised vs. unsupervised learning

  • Regression (supervised): predict a numeric trend, e.g. profit = m1(sales) + m2(season) + m3(locality).

  • Classification (supervised): labelled data with classes A, B, C, e.g. iris colour, sepal and petal measurements; for new values of colour, sepal and petal, predict class A, B or C.

  • Unsupervised: unlabelled data, e.g. shoppers in a society or shopping mall described by income and spend (income 300 / spend 100 on apparel, income 300 / spend 40 on food items). Group similar customers into clusters G1, G2, …: high income / low spend on food items (20), medium income / high spend at malls (60), medium income / low spend on food items (20).
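The unsupervised grouping above is exactly what a clustering algorithm does. A minimal sketch with scikit-learn's KMeans; the income/spend values and the choice of three clusters are illustrative assumptions, not data from the notebook:

import numpy as np
from sklearn.cluster import KMeans

## Hypothetical (income, spend) pairs for a handful of shoppers
customers = np.array([
    [300, 100],  # high income, high spend (apparel)
    [300,  40],  # high income, low spend (food items)
    [150,  90],  # medium income, high spend (malls)
    [150,  30],  # medium income, low spend (food items)
    [310,  45],
    [160,  85],
])

## Group shoppers into three clusters of similar income/spend behaviour
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster index assigned to each shopper
print(kmeans.cluster_centers_)  # average income/spend per group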

Machine Learning Libraries

Simple linear regression

  • It is the most straightforward case, having a single scalar predictor variable x and a single scalar response variable y.

  • The equation for this regression is y = a + bx, where a is the intercept and b is the slope.

  • The extension to multiple and vector-valued predictor variables is known as multiple linear regression, also called multivariable linear regression.


Fit the line y = mx + c: from the training data X and y, estimate the slope m and intercept c, then compute y_pred for new values x_new.
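As a warm-up, here is a minimal sketch of estimating m and c with plain NumPy; the data points are made up for illustration:

import numpy as np

## Made-up (x, y) points that roughly follow a line
x = np.array([1, 2, 3, 4, 5])
y = np.array([10, 19, 31, 38, 52])

## Least-squares fit of y = m*x + c; for degree 1, polyfit returns [m, c]
m, c = np.polyfit(x, y, 1)
print(m, c)

## Predict y for a new x
x_new = 6
print(m * x_new + c)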
import pandas as pd

d = pd.read_excel("Exam_Score.xlsx")
d.head(3)
X = d.iloc[:, :-1].values  ## features (all columns except the last)
y = d.iloc[:, 1].values    ## target
## Linear regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = LinearRegression()
reg.fit(X_train, y_train)
print(reg.intercept_)
1.0569674549746466
print(reg.coef_)
[10.15275289]
The fitted model is Scores ≈ 1.06 + 10.15 × (study hours) + error. Check whether the fit gives a good score/accuracy, use cross-validation to guard against underfitting and overfitting, and predict for new values, e.g. study hours = 6 and 8 (see the sketch below).
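A minimal sketch of those follow-up steps, assuming the reg model and the single study-hours feature from the cells above:

## Predict exam scores for new study-hours values (6 and 8 hours)
print(reg.predict([[6], [8]]))

## Quick 5-fold cross-validation (R² per fold) as an over/underfitting check
from sklearn.model_selection import cross_val_score
print(cross_val_score(LinearRegression(), X, y, cv=5))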

Logistic regression

  • It is a supervised learning classification algorithm used to predict the probability of a target variable.

  • In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output stays between its two limiting values, 0 and 1.


For two classes A and B, the model outputs a probability for each class; with the default decision boundary at 50% probability, a point X1 with p(A) = 0.60 is classified as A.
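That "S" shape is the sigmoid σ(z) = 1 / (1 + e^(−z)), which squashes any real-valued score into the (0, 1) probability range. A minimal sketch with illustrative coefficients (a = −4 and b = 2 are assumptions, not the fitted values below):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    ## Map any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

x = np.linspace(-2, 6, 100)
p = sigmoid(-4 + 2 * x)           # probability of class 1 for score a + b*x

plt.plot(x, p)
plt.axhline(0.5, linestyle="--")  # the 50% decision boundary
plt.xlabel("x")
plt.ylabel("P(class = 1)")
plt.show()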
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

a_data = pd.ExcelFile(r"C:\Users\suyashi144893\Documents\data Sets\admission.xlsx").parse("Sheet2")
a_data.head(2)
## using sklearn
X = a_data.iloc[:, :-1]  # all columns except the last = features
y = a_data.Admitted      # target variable

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# import the class
from sklearn.linear_model import LogisticRegression

# instantiate the model (using the default parameters)
logreg = LogisticRegression()

# fit the model with data
logreg.fit(X_train, y_train)

# predict on the held-out test set
y_pred = logreg.predict(X_test)
logreg.intercept_
array([-20.88995327])
from sklearn import metrics

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
## rows = true class, columns = predicted class (positive class = 1):
## TN = 13, FP = 0, FN = 2, TP = 8
array([[13,  0],
       [ 2,  8]], dtype=int64)
### Plot Exam1 vs. Admitted and add the logistic fit
import seaborn as sns

sns.set()
sns.regplot(x="Exam1", y="Admitted", y_jitter=0.03, data=a_data, logistic=True, ci=None)

# Display the plot
plt.show()

# jitter: add uniform random noise of this size to the x or y variable.
# The noise is added to a copy of the data after fitting the regression and only
# influences the look of the scatter plot; helpful for variables that take discrete values.
Image in a Jupyter notebook: logistic fit of Admitted vs. Exam1
## Logistic regression on the iris dataset
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])
clf.predict_proba(X[:2, :])
clf.score(X, y)
## Regression metrics: R2, RMSE
## Classification metrics: confusion matrix [[TN, FP], [FN, TP]]
##   TP: correct positive classification      FP: incorrectly assigned to the class
##   FN: class member the model failed to identify      TN: correct rejection
## Accuracy  = (TP + TN) / (TP + TN + FP + FN)
## Precision = TP / (TP + FP)
## Recall    = TP / (TP + FN)
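A short sketch computing these metrics for the admission model, assuming y_test and y_pred from the cells above:

from sklearn import metrics

print(metrics.accuracy_score(y_test, y_pred))   # (TP + TN) / total
print(metrics.precision_score(y_test, y_pred))  # TP / (TP + FP)
print(metrics.recall_score(y_test, y_pred))     # TP / (TP + FN)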

K-Nearest Neighbors

  • It is one of the most basic yet essential classification algorithms in Machine Learning.

  • It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining and intrusion detection.


X = [[1], [2], [3.5], [4.5], [3], [5.5]]
y = [0, 0, 1, 1, 0, 1]

from sklearn.neighbors import KNeighborsClassifier

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[2.3]]))
print(neigh.predict_proba([[0.9]]))
print(neigh.predict([[2.3]]))        # majority class among the 3 nearest neighbours
print(neigh.predict_proba([[2.3]]))  # fraction of the 3 neighbours in each class
[0] [[0.66666667 0.33333333]]
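The probabilities above are simply vote fractions among the k = 3 nearest neighbours. A sketch of tuning k with cross-validation on the same toy data (the candidate k values and 2-fold CV are illustrative choices):

from sklearn.model_selection import cross_val_score

## Score a few candidate k values; the toy set has only 6 points, so use 2 folds
for k in (1, 3):
    knn = KNeighborsClassifier(n_neighbors=k)
    print(k, cross_val_score(knn, X, y, cv=2).mean())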

Decision Tree

  • Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.

  • The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

  • Decision trees perform classification without requiring much computation.


## Decision tree
from sklearn import tree

X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
## After being fitted, the model can then be used to predict the class of samples:
clf.predict([[2, 0]])
## As an alternative to outputting a specific class, the probability of each class
## can be predicted, which is the fraction of training samples of the class in a leaf:
clf.predict_proba([[2, 0]])
## DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1])
## and multiclass (where the labels are [0, ..., K-1]) classification.
## Using the iris dataset, we can construct a tree as follows:
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
X, y = load_iris(return_X_y=True)
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, y)
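To see the learned decision rules, the fitted iris tree from the previous cell can be printed as text; a small sketch using scikit-learn's export_text:

from sklearn.tree import export_text

## Print one line per split, using the iris feature names
print(export_text(clf, feature_names=iris.feature_names))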
## Decision trees can also be applied to regression problems, using the DecisionTreeRegressor class.
from sklearn import tree

X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = tree.DecisionTreeRegressor()
clf = clf.fit(X, y)
clf.predict([[1, 1]])
## Optional: install the development version of scikit-learn
pip install git+https://github.com/scikit-learn/scikit-learn.git

SVM

  • Support-vector machines (SVMs, also called support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression.

  • The algorithm tries to find a boundary that divides the data in such a way that the misclassification error is minimized.

  • It selects the hyperplane that segregates the classes best.

  • It chooses the decision boundary that maximizes the distance from the nearest data points of all the classes.

  • The optimal decision boundary is the one with maximum margin from the nearest points of all the classes (the maximum-margin classifier); see the sketch below.

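A minimal sketch of the maximum-margin idea on two toy points (the data are illustrative); after fitting, the points the margin rests on are exposed as support_vectors_:

from sklearn import svm

## Two toy points, one per class
X = [[0, 0], [1, 1]]
y = [0, 1]

## A linear SVM places the separating line midway between the classes,
## maximizing the margin to the nearest points
clf = svm.SVC(kernel="linear")
clf.fit(X, y)
print(clf.support_vectors_)  # the points that define the margin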

print(__doc__)

# Author: Gael Varoquaux <gael dot varoquaux at normalesup dot org>
# License: BSD 3 clause

# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
Automatically created module for IPython interactive environment
digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=5, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Training: %i' % label)
Image in a Jupyter notebook: the first five training digits with their labels
# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# Learn the digits on the train subset
clf.fit(X_train, y_train)

# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title(f'Prediction: {prediction}')
Image in a Jupyter notebook: four test digits with their predicted labels
print(f"Classification report for classifier {clf}:\n" f"{metrics.classification_report(y_test, predicted)}\n")
Classification report for classifier SVC(gamma=0.001):
              precision    recall  f1-score   support

           0       1.00      0.99      0.99        88
           1       0.99      0.97      0.98        91
           2       0.99      0.99      0.99        86
           3       0.98      0.87      0.92        91
           4       0.99      0.96      0.97        92
           5       0.95      0.97      0.96        91
           6       0.99      0.99      0.99        91
           7       0.96      0.99      0.97        89
           8       0.94      1.00      0.97        88
           9       0.93      0.98      0.95        92

    accuracy                           0.97       899
   macro avg       0.97      0.97      0.97       899
weighted avg       0.97      0.97      0.97       899
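As a follow-up, the same test-set predictions can be summarized as a confusion matrix; this sketch assumes a scikit-learn version that provides ConfusionMatrixDisplay.from_predictions (1.0 or later):

## Plot the 10x10 digits confusion matrix from the test-set predictions
disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)
disp.figure_.suptitle("Confusion matrix for the digits classifier")
plt.show()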