Path: blob/master/lessons/lesson_03/code/solution-code/code_3 (done).ipynb
Kernel: Python 3
Lesson 3 Code
Instructor: Amy Roberts, PhD
In [1]:
Part 1. Basic Stats
Read in the examples
In [2]:
Out[2]:
example1 example2 example3
0 18 75 55
1 24 87 47
2 17 49 38
3 21 68 66
4 24 75 56
5 16 84 64
6 29 98 44
7 18 92 39
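The cell that loads the data is not shown in this view. A minimal sketch that rebuilds the same table inline, with values copied from Out[2]; the variable name `df` is an assumption, and the original notebook presumably reads the table from a file instead:

```python
import pandas as pd

# Rebuild the examples table shown in Out[2].
# `df` is a hypothetical name; the original cell is not visible.
df = pd.DataFrame({
    "example1": [18, 24, 17, 21, 24, 16, 29, 18],
    "example2": [75, 87, 49, 68, 75, 84, 98, 92],
    "example3": [55, 47, 38, 66, 56, 64, 44, 39],
})

print(df)
```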
Instructor example: Calculate the mean for each column
In [3]:
Out[3]:
example1 20.875
example2 78.500
example3 51.125
dtype: float64
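The code for this cell is not shown. One way to get the same result, assuming the table is held in a DataFrame named `df` (a hypothetical name), is `DataFrame.mean()`, which averages down each column by default:

```python
import pandas as pd

# Same data as the examples table (hypothetical variable name `df`).
df = pd.DataFrame({
    "example1": [18, 24, 17, 21, 24, 16, 29, 18],
    "example2": [75, 87, 49, 68, 75, 84, 98, 92],
    "example3": [55, 47, 38, 66, 56, 64, 44, 39],
})

# mean() works column by column (axis=0) unless told otherwise.
column_means = df.mean()
print(column_means)
```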
Students: Calculate the median, mode, max, and min for each example
Note: All answers should match your hand calculations
In [4]:
Out[4]:
example1 29
example2 98
example3 66
dtype: int64
In [5]:
Out[5]:
example1 16
example2 49
example3 38
dtype: int64
In [6]:
Out[6]:
example1 19.5
example2 79.5
example3 51.0
dtype: float64
In [7]:
Out[7]:
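The four student cells are not shown. A sketch of calls that would produce the outputs above, again assuming a DataFrame named `df`:

```python
import pandas as pd

# Same data as the examples table (hypothetical variable name `df`).
df = pd.DataFrame({
    "example1": [18, 24, 17, 21, 24, 16, 29, 18],
    "example2": [75, 87, 49, 68, 75, 84, 98, 92],
    "example3": [55, 47, 38, 66, 56, 64, 44, 39],
})

col_max = df.max()        # largest value in each column
col_min = df.min()        # smallest value in each column
col_median = df.median()  # middle value (average of the two middle values for n = 8)
col_mode = df.mode()      # most frequent value(s); ties give several rows

print(col_max, col_min, col_median, col_mode, sep="\n\n")
```

`mode()` returns a DataFrame rather than a Series because a column can have more than one mode. Here example1 has two (18 and 24), and example3 has no repeated value, so every value is returned and the shorter columns are padded with NaN.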
Part 2. Box Plot
Instructor: Interquartile range
In [8]:
Out[8]:
50% Quartile:
example1 19.5
example2 79.5
example3 51.0
Name: 0.5, dtype: float64
Median (red line of the box)
example1 19.5
example2 79.5
example3 51.0
dtype: float64
In [9]:
Out[9]:
25% (bottom of the box)
example1 17.75
example2 73.25
example3 42.75
Name: 0.25, dtype: float64
75% (top of the box)
example1 24.00
example2 88.25
example3 58.00
Name: 0.75, dtype: float64
In [10]:
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0xb325c50>
Student: Create plots for examples 2 and 3 and check the quartiles
In [11]:
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0xb4f4470>
In [12]:
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0xb578048>
In [13]:
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0xb537080>
What does the circle in example 2 represent?
Answer:
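Assuming the plots were drawn with matplotlib's default box plot settings (the plotting cells are not shown), a circle marks a "flier": a point lying more than 1.5 times the interquartile range beyond the edge of the box. A sketch that checks this rule by hand for example 2, using only pandas:

```python
import pandas as pd

# example2 column from the examples table.
col = pd.Series([75, 87, 49, 68, 75, 84, 98, 92], name="example2")

q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1

# Default whisker rule: points beyond 1.5 * IQR from the box are drawn separately.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = col[(col < lower_fence) | (col > upper_fence)]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers.tolist())
```

Under this rule the minimum of example 2 (49) falls below the lower fence, so it is plotted as an individual point instead of being covered by the whisker.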
Part 3. Standard Deviation and Variance
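Variance: A measure of how spread out the values are around their mean; for a sample, the sum of squared deviations from the mean divided by n - 1. (In a modeling context, variance instead describes how much the predictions for a given point vary between different realizations of the model.)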
Standard Deviation: The square root of the variance
<img src='../../assets/images/biasVsVarianceImage.png' style="width: 30%; height: 30%">
In Pandas
Let's calculate variance by hand first.
<img src='../../assets/images/samplevarstd.png' style="width: 50%; height: 50%">
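The image itself is not reproduced in this view. Judging by its file name and by the n - 1 divisor used in the hand calculation, it presumably shows the standard sample formulas, which in LaTeX are:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```

Here $x_i$ are the observations, $n$ is the sample size, $s^2$ is the sample variance, and $s$ is the sample standard deviation.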
In [14]:
Out[14]:
0 18
1 24
2 17
3 21
4 24
5 16
6 29
7 18
Name: example1, dtype: int64
mean = 20.875
n = 8
In [15]:
Out[15]:
8.265625 9.765625 15.015625 0.015625 9.765625 23.765625 66.015625 8.265625
140.875
7
20.125
In [16]:
Out[16]:
Variance
20.125
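The hand-calculation cells are not shown. A sketch that reproduces the printed steps for example1 (squared deviations, their sum, n - 1, and the variance); the variable names are assumptions:

```python
import pandas as pd

# example1 column from the examples table.
x = pd.Series([18, 24, 17, 21, 24, 16, 29, 18], name="example1")

mean = x.mean()  # 20.875
n = len(x)       # 8

# Step 1: squared distance of each value from the mean.
squared_deviations = (x - mean) ** 2

# Step 2: add them up.
sum_of_squares = squared_deviations.sum()

# Step 3: divide by n - 1 (sample variance), not n.
variance = sum_of_squares / (n - 1)

print(squared_deviations.tolist())
print(sum_of_squares)
print(n - 1)
print(variance)
```

The result agrees with `x.var()`, because pandas divides by n - 1 by default.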
Students: Calculate the standard deviation by hand for each sample
Recall that the standard deviation is the square root of the variance.
In [28]:
Out[28]:
example1 20.125000
example2 238.571429
example3 116.125000
dtype: float64
In [29]:
Out[29]:
example 1 SD = 4.4860896112315904
example 2 SD = 15.445757637616873
example 3 SD = 10.776131031126154
In [30]:
Out[30]:
example1 4.486090
example2 15.445758
example3 10.776131
dtype: float64
In [17]:
Out[17]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 3 columns):
example1 8 non-null int64
example2 8 non-null int64
example3 8 non-null int64
dtypes: int64(3)
memory usage: 272.0 bytes
In [18]:
In [19]:
Out[19]:
4.4860896112315904 15.44575762374344 10.776131031126154
In [20]:
Out[20]:
example1 4.486090
example2 15.445758
example3 10.776131
dtype: float64
Short Cut!
In [21]:
Out[21]:
Student: Check understanding
Which value in the above table is the median?
Answer:
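The shortcut cell and its table are not shown in this view. Assuming the shortcut is `DataFrame.describe()`, which prints count, mean, std, min, the quartiles, and max in one table, the median is the row labelled `50%`:

```python
import pandas as pd

# Same data as the examples table (hypothetical variable name `df`).
df = pd.DataFrame({
    "example1": [18, 24, 17, 21, 24, 16, 29, 18],
    "example2": [75, 87, 49, 68, 75, 84, 98, 92],
    "example3": [55, 47, 38, 66, 56, 64, 44, 39],
})

summary = df.describe()          # one row per summary statistic
median_row = summary.loc["50%"]  # the 50% quantile is the median

print(summary)
print(median_row)
```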
Part 4: Correlation
In [22]:
Out[22]:
In [23]:
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0xb9701d0>
In [24]:
Out[24]:
<seaborn.axisgrid.PairGrid at 0xba0d828>
In [25]:
Out[25]:
<seaborn.axisgrid.PairGrid at 0xb6612e8>
In [26]:
Out[26]:
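The cells and tables in this part are not shown, so the data they use is unknown. As a sketch only, assuming the same examples table, `DataFrame.corr()` returns the pairwise Pearson correlation between columns; the manual calculation for one pair is included to show what the number means:

```python
import pandas as pd

# Assumption: the same examples table as in Part 1 (hypothetical name `df`).
df = pd.DataFrame({
    "example1": [18, 24, 17, 21, 24, 16, 29, 18],
    "example2": [75, 87, 49, 68, 75, 84, 98, 92],
    "example3": [55, 47, 38, 66, 56, 64, 44, 39],
})

# Pearson correlation matrix: values lie in [-1, 1], with 1 on the diagonal.
corr = df.corr()
print(corr)

# Manual Pearson r for one pair: sample covariance divided by the
# product of the two sample standard deviations.
x, y = df["example1"], df["example2"]
covariance = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - 1)
r_manual = covariance / (x.std() * y.std())
print(r_manual)
```

The `AxesSubplot` and `PairGrid` outputs above suggest the original cells also plotted these relationships (for example as a heatmap and pair plots), but those calls are not visible here.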