GitHub Repository: suyashi29/python-su
Path: blob/master/Key Python Libraries/Key Python Libraries - Day 3.ipynb
Kernel: Python 3 (ipykernel)
Recap: supervised vs. unsupervised learning

  • Regression (supervised): predict a numeric trend, e.g. profit = m1(sales) + m2(season) + m3(locality).

  • Classification (supervised): labelled data with classes A, B, C, e.g. iris colour, sepal and petal measurements; for new values of colour, sepal and petal, predict class A, B or C.

  • Unsupervised: unlabelled data, e.g. shoppers in a society or shopping mall described by income and spend (income 300 / spend 100 on apparel, income 300 / spend 40 on food items). Group similar customers into clusters G1, G2, …: high income / low spend on food items (20), medium income / high spend at malls (60), medium income / low spend on food items (20).
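The unsupervised grouping above is exactly what a clustering algorithm does. A minimal sketch with scikit-learn's KMeans; the income/spend values and the choice of three clusters are illustrative assumptions, not data from the notebook:

import numpy as np
from sklearn.cluster import KMeans

## Hypothetical (income, spend) pairs for a handful of shoppers
customers = np.array([
    [300, 100],  # high income, high spend (apparel)
    [300,  40],  # high income, low spend (food items)
    [150,  90],  # medium income, high spend (malls)
    [150,  30],  # medium income, low spend (food items)
    [310,  45],
    [160,  85],
])

## Group shoppers into three clusters of similar income/spend behaviour
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster index assigned to each shopper
print(kmeans.cluster_centers_)  # average income/spend per group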

Machine Learning Libraries

Simple linear regression

  • It is the most straightforward case, having a single scalar predictor variable x and a single scalar response variable y.

  • The equation for this regression is y = a + bx, where a is the intercept and b is the slope.

  • The extension to multiple and vector-valued predictor variables is known as multiple linear regression, also called multivariable linear regression.


Fit the line y = mx + c: from the training data X and y, estimate the slope m and intercept c, then compute y_pred for new values x_new.
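As a warm-up, here is a minimal sketch of estimating m and c with plain NumPy; the data points are made up for illustration:

import numpy as np

## Made-up (x, y) points that roughly follow a line
x = np.array([1, 2, 3, 4, 5])
y = np.array([10, 19, 31, 38, 52])

## Least-squares fit of y = m*x + c; for degree 1, polyfit returns [m, c]
m, c = np.polyfit(x, y, 1)
print(m, c)

## Predict y for a new x
x_new = 6
print(m * x_new + c)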
import pandas as pd

d = pd.read_excel("Exam_Score.xlsx")
d.head(3)
X = d.iloc[:, :-1].values  ## features (all columns except the last)
y = d.iloc[:, 1].values    ## target
## Linear regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

reg = LinearRegression()
reg.fit(X_train, y_train)
print(reg.intercept_)
1.0569674549746466
print(reg.coef_)
[10.15275289]
The fitted model is Scores ≈ 1.06 + 10.15 × (study hours) + error. Check whether the fit gives a good score/accuracy, use cross-validation to guard against underfitting and overfitting, and predict for new values, e.g. study hours = 6 and 8 (see the sketch below).
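A minimal sketch of those follow-up steps, assuming the reg model and the single study-hours feature from the cells above:

## Predict exam scores for new study-hours values (6 and 8 hours)
print(reg.predict([[6], [8]]))

## Quick 5-fold cross-validation (R² per fold) as an over/underfitting check
from sklearn.model_selection import cross_val_score
print(cross_val_score(LinearRegression(), X, y, cv=5))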

Logistic regression

  • It is a supervised learning classification algorithm used to predict the probability of a target variable.

  • In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output stays between its two limiting values, 0 and 1.


For two classes A and B, the model outputs a probability for each class; with the default decision boundary at 50% probability, a point X1 with p(A) = 0.60 is classified as A.
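That "S" shape is the sigmoid σ(z) = 1 / (1 + e^(−z)), which squashes any real-valued score into the (0, 1) probability range. A minimal sketch with illustrative coefficients (a = −4 and b = 2 are assumptions, not the fitted values below):

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    ## Map any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

x = np.linspace(-2, 6, 100)
p = sigmoid(-4 + 2 * x)           # probability of class 1 for score a + b*x

plt.plot(x, p)
plt.axhline(0.5, linestyle="--")  # the 50% decision boundary
plt.xlabel("x")
plt.ylabel("P(class = 1)")
plt.show()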
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

a_data = pd.ExcelFile(r"C:\Users\suyashi144893\Documents\data Sets\admission.xlsx").parse("Sheet2")
a_data.head(2)
## using sklearn
X = a_data.iloc[:, :-1]  # all columns except the last = features
y = a_data.Admitted      # target variable

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# import the class
from sklearn.linear_model import LogisticRegression

# instantiate the model (using the default parameters)
logreg = LogisticRegression()

# fit the model with data
logreg.fit(X_train, y_train)

# predict on the held-out test set
y_pred = logreg.predict(X_test)
logreg.intercept_
array([-20.88995327])
from sklearn import metrics

cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
## rows = true class, columns = predicted class (positive class = 1):
## TN = 13, FP = 0, FN = 2, TP = 8
array([[13,  0],
       [ 2,  8]], dtype=int64)
### Plot Exam1 vs. Admitted and add the logistic fit
import seaborn as sns

sns.set()
sns.regplot(x="Exam1", y="Admitted", y_jitter=0.03, data=a_data, logistic=True, ci=None)

# Display the plot
plt.show()

# jitter: add uniform random noise of this size to the x or y variable.
# The noise is added to a copy of the data after fitting the regression and only
# influences the look of the scatter plot; helpful for variables that take discrete values.
Image in a Jupyter notebook: logistic fit of Admitted vs. Exam1
## Logistic regression on the iris dataset
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(random_state=0).fit(X, y)
clf.predict(X[:2, :])
clf.predict_proba(X[:2, :])
clf.score(X, y)
## Regression metrics: R2, RMSE
## Classification metrics: confusion matrix [[TN, FP], [FN, TP]]
##   TP: correct positive classification      FP: incorrectly assigned to the class
##   FN: class member the model failed to identify      TN: correct rejection
## Accuracy  = (TP + TN) / (TP + TN + FP + FN)
## Precision = TP / (TP + FP)
## Recall    = TP / (TP + FN)
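A short sketch computing these metrics for the admission model, assuming y_test and y_pred from the cells above:

from sklearn import metrics

print(metrics.accuracy_score(y_test, y_pred))   # (TP + TN) / total
print(metrics.precision_score(y_test, y_pred))  # TP / (TP + FP)
print(metrics.recall_score(y_test, y_pred))     # TP / (TP + FN)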

K-Nearest Neighbors

  • It is one of the most basic yet essential classification algorithms in Machine Learning.

  • It belongs to the supervised learning domain and is widely applied in pattern recognition, data mining and intrusion detection.


X = [[1], [2], [3.5], [4.5], [3], [5.5]]
y = [0, 0, 1, 1, 0, 1]

from sklearn.neighbors import KNeighborsClassifier

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[2.3]]))
print(neigh.predict_proba([[0.9]]))
print(neigh.predict([[2.3]]))        # majority class among the 3 nearest neighbours
print(neigh.predict_proba([[2.3]]))  # fraction of the 3 neighbours in each class
[0] [[0.66666667 0.33333333]]
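The probabilities above are simply vote fractions among the k = 3 nearest neighbours. A sketch of tuning k with cross-validation on the same toy data (the candidate k values and 2-fold CV are illustrative choices):

from sklearn.model_selection import cross_val_score

## Score a few candidate k values; the toy set has only 6 points, so use 2 folds
for k in (1, 3):
    knn = KNeighborsClassifier(n_neighbors=k)
    print(k, cross_val_score(knn, X, y, cv=2).mean())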

Decision Tree

  • Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.

  • The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.

  • Decision trees perform classification without requiring much computation.


## Decision tree
from sklearn import tree

X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
## After being fitted, the model can then be used to predict the class of samples:
clf.predict([[2, 0]])
## As an alternative to outputting a specific class, the probability of each class
## can be predicted, which is the fraction of training samples of the class in a leaf:
clf.predict_proba([[2, 0]])
## DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1])
## and multiclass (where the labels are [0, ..., K-1]) classification.
## Using the iris dataset, we can construct a tree as follows:
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
X, y = load_iris(return_X_y=True)
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(X, y)
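To see the learned decision rules, the fitted iris tree from the previous cell can be printed as text; a small sketch using scikit-learn's export_text:

from sklearn.tree import export_text

## Print one line per split, using the iris feature names
print(export_text(clf, feature_names=iris.feature_names))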
## Decision trees can also be applied to regression problems, using the DecisionTreeRegressor class.
from sklearn import tree

X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = tree.DecisionTreeRegressor()
clf = clf.fit(X, y)
clf.predict([[1, 1]])
## Optional: install the development version of scikit-learn
pip install git+https://github.com/scikit-learn/scikit-learn.git

SVM

  • Support-vector machines (SVMs, also called support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression.

  • The algorithm tries to find a boundary that divides the data in such a way that the misclassification error is minimized.

  • It selects the hyperplane that segregates the classes best.

  • It chooses the decision boundary that maximizes the distance from the nearest data points of all the classes.

  • The optimal decision boundary is the one with maximum margin from the nearest points of all the classes (the maximum-margin classifier); see the sketch below.

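A minimal sketch of the maximum-margin idea on two toy points (the data are illustrative); after fitting, the points the margin rests on are exposed as support_vectors_:

from sklearn import svm

## Two toy points, one per class
X = [[0, 0], [1, 1]]
y = [0, 1]

## A linear SVM places the separating line midway between the classes,
## maximizing the margin to the nearest points
clf = svm.SVC(kernel="linear")
clf.fit(X, y)
print(clf.support_vectors_)  # the points that define the margin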

print(__doc__)

# Author: Gael Varoquaux <gael dot varoquaux at normalesup dot org>
# License: BSD 3 clause

# Standard scientific Python imports
import matplotlib.pyplot as plt

# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
Automatically created module for IPython interactive environment
digits = datasets.load_digits()

_, axes = plt.subplots(nrows=1, ncols=5, figsize=(10, 3))
for ax, image, label in zip(axes, digits.images, digits.target):
    ax.set_axis_off()
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title('Training: %i' % label)
Image in a Jupyter notebook: the first five training digits with their labels
# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier: a support vector classifier
clf = svm.SVC(gamma=0.001)

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# Learn the digits on the train subset
clf.fit(X_train, y_train)

# Predict the value of the digit on the test subset
predicted = clf.predict(X_test)
_, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
for ax, image, prediction in zip(axes, X_test, predicted):
    ax.set_axis_off()
    image = image.reshape(8, 8)
    ax.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    ax.set_title(f'Prediction: {prediction}')
Image in a Jupyter notebook: four test digits with their predicted labels
print(f"Classification report for classifier {clf}:\n" f"{metrics.classification_report(y_test, predicted)}\n")
Classification report for classifier SVC(gamma=0.001):
              precision    recall  f1-score   support

           0       1.00      0.99      0.99        88
           1       0.99      0.97      0.98        91
           2       0.99      0.99      0.99        86
           3       0.98      0.87      0.92        91
           4       0.99      0.96      0.97        92
           5       0.95      0.97      0.96        91
           6       0.99      0.99      0.99        91
           7       0.96      0.99      0.97        89
           8       0.94      1.00      0.97        88
           9       0.93      0.98      0.95        92

    accuracy                           0.97       899
   macro avg       0.97      0.97      0.97       899
weighted avg       0.97      0.97      0.97       899
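As a follow-up, the same test-set predictions can be summarized as a confusion matrix; this sketch assumes a scikit-learn version that provides ConfusionMatrixDisplay.from_predictions (1.0 or later):

## Plot the 10x10 digits confusion matrix from the test-set predictions
disp = metrics.ConfusionMatrixDisplay.from_predictions(y_test, predicted)
disp.figure_.suptitle("Confusion matrix for the digits classifier")
plt.show()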