Path: blob/master/april_18/lessons/lesson-08/code/solution-code/solution-code-8.ipynb
Kernel: Python 2
In [1]:
In [2]:
In [3]:
Out[3]:
Axes(0.125,0.125;0.775x0.775)
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)      target
count         150.000000        150.000000         150.000000        150.000000  150.000000
mean            5.843333          3.054000           3.758667          1.198667    1.000000
std             0.828066          0.433594           1.764420          0.763161    0.819232
min             4.300000          2.000000           1.000000          0.100000    0.000000
25%             5.100000          2.800000           1.600000          0.300000    0.000000
50%             5.800000          3.000000           4.350000          1.300000    1.000000
75%             6.400000          3.300000           5.100000          1.800000    2.000000
max             7.900000          4.400000           6.900000          2.500000    2.000000
In [4]:
Out[4]:
0.666666666667
More specific solution
For the class, this solution is as simple as it really needs to be in order to get a very good prediction score. But why, or when, does this fail? What attributes make this a great data set for learning classification algorithms? What makes it less great?
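To make the discussion concrete: iris is nearly separable on petal length alone, which is part of why very simple solutions score well here. A hypothetical single-feature threshold rule (not the notebook's code; the cut points below are illustrative) already classifies most samples correctly:

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # third column is petal length (cm)

def threshold_rule(x):
    # Illustrative cut points: setosa petals are short, virginica's are long.
    if x < 2.5:
        return 0  # setosa
    elif x < 4.9:
        return 1  # versicolor
    return 2      # virginica

preds = [threshold_rule(x) for x in petal_length]
accuracy = sum(p == t for p, t in zip(preds, iris.target)) / 150.0
print(accuracy)
```

A rule this crude fails as soon as the class boundaries stop lining up with one axis, which is exactly the question the prompt above is driving at.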
In [5]:
Out[5]:
0.946666666667
Using distance: KNN implementation
In [6]:
Out[6]:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 1 2 2 2 2
2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
0.96
Do we see a change in performance when using the distance weight?
In [7]:
Out[7]:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
0.993333333333
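A matching sketch for the distance-weighted run. With weights='distance', each training point is its own zero-distance neighbor, so scoring on the training data pushes accuracy toward 1.0; the notebook's score slightly below 1.0 suggests its setup differed somewhat (perhaps a train/test split), but that cell is not preserved:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Distance weighting: closer neighbors count more than distant ones.
knn = KNeighborsClassifier(n_neighbors=5, weights='distance')
knn.fit(X, y)

print(knn.predict(X))
print(y)
print(knn.score(X, y))  # scored on the training data here; the notebook's
                        # exact evaluation setup is an open question
```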
Solving for K
This is only one approach to the problem. Adding the 'distance' weighting (instead of 'uniform') would only be additive; note that the code would need some editing to handle it properly if it were included in the grid search. Alternatively, make the change directly in the estimator.
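The grid-search cell is also missing. The printed list below is the grid_scores_ attribute of an older scikit-learn; a sketch of an equivalent search in a current version, using cv_results_ instead (the cross-validation settings here are assumptions, so the exact means and stds may differ from the output below):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for scripting
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Search n_neighbors from 2 to 99, mirroring the printed results.
params = {'n_neighbors': list(range(2, 100))}
grid = GridSearchCV(KNeighborsClassifier(), params, cv=5)
grid.fit(X, y)

# cv_results_ replaces the old grid_scores_ list.
for k, mean, std in zip(params['n_neighbors'],
                        grid.cv_results_['mean_test_score'],
                        grid.cv_results_['std_test_score']):
    print('k=%d  mean=%.5f  std=%.5f' % (k, mean, std))

# Plot mean CV accuracy against k (the curve shown in the next cells).
plt.plot(params['n_neighbors'], grid.cv_results_['mean_test_score'])
plt.xlabel('n_neighbors')
plt.ylabel('mean CV accuracy')
```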
In [8]:
Out[8]:
[mean: 0.90667, std: 0.09752, params: {'n_neighbors': 2},
mean: 0.90667, std: 0.09286, params: {'n_neighbors': 3},
mean: 0.90667, std: 0.09286, params: {'n_neighbors': 4},
mean: 0.91333, std: 0.08327, params: {'n_neighbors': 5},
mean: 0.90667, std: 0.09286, params: {'n_neighbors': 6},
mean: 0.92000, std: 0.08589, params: {'n_neighbors': 7},
mean: 0.91333, std: 0.08844, params: {'n_neighbors': 8},
mean: 0.92000, std: 0.09092, params: {'n_neighbors': 9},
mean: 0.92000, std: 0.09092, params: {'n_neighbors': 10},
mean: 0.91333, std: 0.08589, params: {'n_neighbors': 11},
mean: 0.89333, std: 0.10625, params: {'n_neighbors': 12},
mean: 0.90667, std: 0.08273, params: {'n_neighbors': 13},
mean: 0.90000, std: 0.09428, params: {'n_neighbors': 14},
mean: 0.90000, std: 0.09428, params: {'n_neighbors': 15},
mean: 0.88667, std: 0.11851, params: {'n_neighbors': 16},
mean: 0.88000, std: 0.12754, params: {'n_neighbors': 17},
mean: 0.86667, std: 0.12111, params: {'n_neighbors': 18},
mean: 0.88667, std: 0.11662, params: {'n_neighbors': 19},
mean: 0.86667, std: 0.13499, params: {'n_neighbors': 20},
mean: 0.86667, std: 0.13499, params: {'n_neighbors': 21},
mean: 0.86667, std: 0.13499, params: {'n_neighbors': 22},
mean: 0.86667, std: 0.13499, params: {'n_neighbors': 23},
mean: 0.84667, std: 0.17075, params: {'n_neighbors': 24},
mean: 0.86000, std: 0.14667, params: {'n_neighbors': 25},
mean: 0.84667, std: 0.17075, params: {'n_neighbors': 26},
mean: 0.84667, std: 0.15434, params: {'n_neighbors': 27},
mean: 0.82000, std: 0.18809, params: {'n_neighbors': 28},
mean: 0.80000, std: 0.19437, params: {'n_neighbors': 29},
mean: 0.78667, std: 0.21250, params: {'n_neighbors': 30},
mean: 0.77333, std: 0.21848, params: {'n_neighbors': 31},
mean: 0.74000, std: 0.27520, params: {'n_neighbors': 32},
mean: 0.75333, std: 0.26043, params: {'n_neighbors': 33},
mean: 0.72000, std: 0.30155, params: {'n_neighbors': 34},
mean: 0.69333, std: 0.31510, params: {'n_neighbors': 35},
mean: 0.68000, std: 0.33506, params: {'n_neighbors': 36},
mean: 0.68667, std: 0.30955, params: {'n_neighbors': 37},
mean: 0.64000, std: 0.38320, params: {'n_neighbors': 38},
mean: 0.64000, std: 0.38320, params: {'n_neighbors': 39},
mean: 0.64000, std: 0.38320, params: {'n_neighbors': 40},
mean: 0.38667, std: 0.42248, params: {'n_neighbors': 41},
mean: 0.37333, std: 0.43123, params: {'n_neighbors': 42},
mean: 0.37333, std: 0.43123, params: {'n_neighbors': 43},
mean: 0.37333, std: 0.43123, params: {'n_neighbors': 44},
mean: 0.37333, std: 0.43123, params: {'n_neighbors': 45},
mean: 0.37333, std: 0.43123, params: {'n_neighbors': 46},
mean: 0.37333, std: 0.43123, params: {'n_neighbors': 47},
mean: 0.36000, std: 0.41655, params: {'n_neighbors': 48},
mean: 0.36000, std: 0.41655, params: {'n_neighbors': 49},
mean: 0.36000, std: 0.41655, params: {'n_neighbors': 50},
mean: 0.36000, std: 0.42864, params: {'n_neighbors': 51},
mean: 0.34667, std: 0.42667, params: {'n_neighbors': 52},
mean: 0.34667, std: 0.42667, params: {'n_neighbors': 53},
mean: 0.33333, std: 0.41312, params: {'n_neighbors': 54},
mean: 0.34667, std: 0.42667, params: {'n_neighbors': 55},
mean: 0.32000, std: 0.40089, params: {'n_neighbors': 56},
mean: 0.32667, std: 0.40683, params: {'n_neighbors': 57},
mean: 0.32000, std: 0.40089, params: {'n_neighbors': 58},
mean: 0.32667, std: 0.40683, params: {'n_neighbors': 59},
mean: 0.24667, std: 0.36246, params: {'n_neighbors': 60},
mean: 0.20667, std: 0.28783, params: {'n_neighbors': 61},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 62},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 63},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 64},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 65},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 66},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 67},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 68},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 69},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 70},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 71},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 72},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 73},
mean: 0.10667, std: 0.13233, params: {'n_neighbors': 74},
mean: 0.08667, std: 0.11851, params: {'n_neighbors': 75},
mean: 0.08667, std: 0.11851, params: {'n_neighbors': 76},
mean: 0.08667, std: 0.11851, params: {'n_neighbors': 77},
mean: 0.08667, std: 0.11851, params: {'n_neighbors': 78},
mean: 0.08667, std: 0.11851, params: {'n_neighbors': 79},
mean: 0.08667, std: 0.11851, params: {'n_neighbors': 80},
mean: 0.07333, std: 0.10414, params: {'n_neighbors': 81},
mean: 0.07333, std: 0.10414, params: {'n_neighbors': 82},
mean: 0.07333, std: 0.10414, params: {'n_neighbors': 83},
mean: 0.07333, std: 0.10414, params: {'n_neighbors': 84},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 85},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 86},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 87},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 88},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 89},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 90},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 91},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 92},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 93},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 94},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 95},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 96},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 97},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 98},
mean: 0.06667, std: 0.10328, params: {'n_neighbors': 99}]
In [9]:
Out[9]:
[<matplotlib.lines.Line2D at 0x110d8d6d0>]
Zoom in to look at the fit before the first big drop in accuracy, around k = 25:
In [10]:
Out[10]:
[<matplotlib.lines.Line2D at 0x110e95450>]
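The zoom cell is not preserved either; presumably it re-plots the same curve restricted to small k. A sketch using a few representative (k, mean score) pairs copied from the grid-search output above, since the full arrays are not reproduced here:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for scripting
import matplotlib.pyplot as plt

# A handful of (k, mean CV accuracy) pairs taken from the printed results.
ks = [2, 5, 7, 10, 15, 20, 25]
means = [0.90667, 0.91333, 0.92000, 0.92000, 0.90000, 0.86667, 0.86000]

plt.plot(ks, means)
plt.xlim(2, 25)  # stop before the first big drop in accuracy
plt.xlabel('n_neighbors')
plt.ylabel('mean CV accuracy')
```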