Prostate cancer prediction model.ipynb
Kernel: Python [default]
In [30]:
In [3]:
In [4]:
Out[4]:
In [5]:
Out[5]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 1 to 100
Data columns (total 9 columns):
diagnosis_result 100 non-null object
radius 100 non-null int64
texture 100 non-null int64
perimeter 100 non-null int64
area 100 non-null int64
smoothness 100 non-null float64
compactness 100 non-null float64
symmetry 100 non-null float64
fractal_dimension 100 non-null float64
dtypes: float64(4), int64(4), object(1)
memory usage: 7.8+ KB
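The `info()` output above implies a 100-row DataFrame with one text column and eight numeric features. The stripped cells presumably imported the libraries and read the dataset with `pd.read_csv`; a minimal sketch that reproduces the same schema, using synthetic values so it is self-contained (the column ranges and the CSV filename in the comment are assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100

# In the notebook this was presumably something like:
#     df = pd.read_csv("Prostate_Cancer.csv", index_col=0)
# Here a synthetic frame with the same columns stands in for the file.
df = pd.DataFrame({
    "diagnosis_result": rng.choice(["M", "B"], size=n),  # object
    "radius": rng.integers(9, 26, size=n),               # int64
    "texture": rng.integers(10, 28, size=n),
    "perimeter": rng.integers(50, 180, size=n),
    "area": rng.integers(200, 1900, size=n),
    "smoothness": rng.uniform(0.05, 0.15, size=n),       # float64
    "compactness": rng.uniform(0.02, 0.35, size=n),
    "symmetry": rng.uniform(0.10, 0.30, size=n),
    "fractal_dimension": rng.uniform(0.05, 0.10, size=n),
}, index=range(1, n + 1))

df.info()  # 100 entries, 9 columns, dtypes float64(4), int64(4), object(1)
```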
Since the diagnosis_result column is text, we need to convert it to a binary numeric label so it can be fed to the algorithm.
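A sketch of that conversion, assuming the labels are "M" (malignant) and "B" (benign) as in the common prostate-cancer CSV; the later reports show classes 0.0 and 1.0, consistent with a float encoding:

```python
import pandas as pd

# Stand-in values; in the notebook this is df["diagnosis_result"].
diagnosis = pd.Series(["M", "B", "B", "M"], name="diagnosis_result")

# Map the text labels to 1/0 so the classifier can consume them.
encoded = diagnosis.map({"M": 1, "B": 0}).astype(float)
print(encoded.tolist())  # [1.0, 0.0, 0.0, 1.0]
```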
In [6]:
In [7]:
Out[7]:
In [8]:
In [9]:
Out[9]:
In [10]:
In [12]:
In [13]:
In [14]:
In [15]:
In [16]:
In [17]:
Out[17]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=1, p=2,
weights='uniform')
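The repr above shows a KNeighborsClassifier fitted with n_neighbors=1. The stripped cells in between presumably scaled the features and split the data; the 33-sample support in the report below suggests test_size=0.33. A sketch under those assumptions, with synthetic arrays standing in for the real features:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                  # stand-in for the 8 numeric features
y = rng.integers(0, 2, size=100).astype(float)

# 33 test samples out of 100 suggests test_size=0.33.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# KNN is distance-based, so features are usually standardized first.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train_s, y_train)
print(knn.n_neighbors)  # 1
```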
In [18]:
In [19]:
In [20]:
Out[20]:
precision recall f1-score support
0.0 0.79 0.69 0.73 16
1.0 0.74 0.82 0.78 17
avg / total 0.76 0.76 0.76 33
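That table is the output of sklearn's classification_report; a minimal sketch with toy label arrays (illustrative, not the notebook's actual predictions):

```python
from sklearn.metrics import classification_report, confusion_matrix

y_test = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
pred   = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0]

# Confusion matrix first, then per-class precision/recall/f1 as shown above.
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```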
In [21]:
In [25]:
Out[25]:
[<matplotlib.lines.Line2D at 0xdfe6400>]
The plot above shows that the error rate is lowest at k = 4.
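The Line2D in Out[25] is the tail of that error-rate-vs-k plot. A common pattern (a sketch; the k range and variable names are assumptions) loops over candidate k values, records the test error for each, and plots the curve:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))                  # synthetic stand-in features
y = rng.integers(0, 2, size=100).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

error_rate = []
for k in range(1, 21):                         # assumed search range
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    error_rate.append(np.mean(knn.predict(X_test) != y_test))

# The Out[25] Line2D comes from a plt.plot call like this one.
plt.plot(range(1, 21), error_rate, marker="o")
plt.xlabel("k")
plt.ylabel("error rate")
```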
In [26]:
In [27]:
Out[27]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=1, n_neighbors=4, p=2,
weights='uniform')
In [28]:
In [29]:
Out[29]:
precision recall f1-score support
0.0 0.92 0.75 0.83 16
1.0 0.80 0.94 0.86 17
avg / total 0.86 0.85 0.85 33
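The k = 4 refit and the improved report above would come from cells along these lines (a sketch with synthetic data; the real notebook reuses its earlier scaled train/test split):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100).astype(float)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Refit at the k chosen from the error-rate plot.
knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
print(classification_report(y_test, pred))
```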
With k = 4 the average precision rises to 0.86, compared with 0.76 for the initial, arbitrary choice of k = 1.
In [ ]: