GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_02/code/solution-code/Code_2.ipynb

Kernel: Python 3

Solutions to Lesson 2

Lab 2 Solution

This quiz comes from Roger Peng's Coursera class, Computing for Data Analysis.

Sourced from Research Computing MeetUp's Python course.

In [1]:

import pandas as pd
import os

data = pd.read_csv(os.path.join('..', '..', 'assets', 'dataset', 'ozone.csv'))

In [2]:

print(data.head())

Out[2]:

   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2
2   12.0    149.0  12.6    74      5    3
3   18.0    313.0  11.5    62      5    4
4    NaN      NaN  14.3    56      5    5

Print the column names of the dataset to the screen, one column name per line.

In [ ]:

list(data.columns)

In [3]:

for x in data.columns.values:
    print(x)

Out[3]:

Ozone
Solar.R
Wind
Temp
Month
Day
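As an alternative to the explicit loop, the column labels can be joined into one newline-separated string. This is a minimal sketch on an empty stand-in frame (`sample` is a hypothetical name) that only reuses the column names shown above.

```python
import pandas as pd

# Empty stand-in for ozone.csv that carries only the column labels.
sample = pd.DataFrame(columns=['Ozone', 'Solar.R', 'Wind', 'Temp', 'Month', 'Day'])

# Iterating over a DataFrame yields its column labels.
names = [name for name in sample]

# One print call, one column name per line.
print('\n'.join(names))
```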

In [ ]:

data.iloc[:2]

In [ ]:

data.loc[:2]
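The two cells above look alike but do not return the same rows. The sketch below shows the difference on a small stand-in frame (`sample`, a hypothetical name) built from the first rows of `data.head()`; it assumes the default integer index that `read_csv` creates.

```python
import pandas as pd

# Stand-in frame using values from the data.head() output above.
sample = pd.DataFrame({
    'Ozone': [41.0, 36.0, 12.0, 18.0],
    'Temp': [67, 72, 74, 62],
})

by_position = sample.iloc[:2]  # positions 0 and 1; the stop is excluded
by_label = sample.loc[:2]      # labels 0, 1 and 2; the stop is included

print(len(by_position))
print(len(by_label))
```

So `data.iloc[:2]` answers "the first 2 rows", while `data.loc[:2]` returns three rows when the index is 0, 1, 2, ...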

Extract the first 2 rows of the data frame and print them to the console. What does the output look like?

In [4]:

tmp = data.loc[0:1]  # or data.head(2); .loc slices by label and includes both endpoints
print(tmp.head())

Out[4]:

   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2

Note: this cell originally used data.ix[0:1], which raised a DeprecationWarning. The .ix indexer was removed in pandas 1.0. Use .loc for label-based indexing or .iloc for positional indexing.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated

How many observations (i.e. rows) are in this data frame?

In [5]:

print(len(data))

Out[5]:

153
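`len(data)` is one of several ways to count rows. A short sketch, on a hypothetical three-row `sample` frame, of two others that are sometimes handier:

```python
import pandas as pd

# Hypothetical stand-in; the real frame has 153 rows.
sample = pd.DataFrame({'Ozone': [41.0, 36.0, None], 'Temp': [67, 72, 74]})

# .shape is a (rows, columns) tuple.
n_rows, n_cols = sample.shape

# The index has one entry per row, so its length is also the row count.
n_index = len(sample.index)

print(n_rows, n_cols, n_index)
```

All of these count rows with missing values too. Only `count()` skips them.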

Extract the last 2 rows of the data frame and print them to the console. What does the output look like?

In [ ]:

data.iloc[-2:]

In [ ]:

data.iloc[[-2,-1]]

In [6]:

tmp = data.tail(2)
print(tmp.head())

Out[6]:

     Ozone  Solar.R  Wind  Temp  Month  Day
151   18.0    131.0   8.0    76      9   29
152   20.0    223.0  11.5    68      9   30
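The three approaches above (`iloc[-2:]`, `iloc[[-2, -1]]` and `tail(2)`) should select the same rows. A quick check on a hypothetical four-row `sample` frame:

```python
import pandas as pd

# Stand-in frame; values taken from the data.head() output earlier.
sample = pd.DataFrame({
    'Ozone': [41.0, 36.0, 12.0, 18.0],
    'Temp': [67, 72, 74, 62],
})

with_tail = sample.tail(2)           # last two rows
with_slice = sample.iloc[-2:]        # negative positional slice
with_list = sample.iloc[[-2, -1]]    # explicit list of negative positions

# DataFrame.equals compares values, dtypes and index labels.
all_equal = with_tail.equals(with_slice) and with_slice.equals(with_list)
print(all_equal)
```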

What is the value of Ozone in the 47th row?

In [7]:

print(data.loc[46:48])  # .loc replaces the .ix indexer, which was removed in pandas 1.0

Out[7]:

    Ozone  Solar.R  Wind  Temp  Month  Day
46   21.0    191.0  14.9    77      6   16
47   37.0    284.0  20.7    72      6   17
48   20.0     37.0   9.2    65      6   18

In [ ]:

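print(data.loc[46:48, :])  # formerly data.ix[46:48]; .ix was removed in pandas 1.0

In [ ]:

print(data.iloc[46:48])  # positional: rows 46 and 47 only, the stop is excluded

In [ ]:

print(data.loc[46:48])  # label-based: rows 46, 47 and 48, the stop is included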

In [8]:

print(data.loc[46, 'Ozone'])  # the 47th row has label 46 because the index starts at 0

Out[8]:

21.0

In [ ]:

print(data.loc[46]['Ozone'])
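Because pandas counts from 0, "the 47th row" sits at position 46. The sketch below makes the off-by-one visible on a hypothetical 50-row frame whose only column, `row_number`, stores each row's 1-based position.

```python
import pandas as pd

# Hypothetical frame: the value in each row equals its 1-based row number.
sample = pd.DataFrame({'row_number': range(1, 51)})

forty_seventh = sample.iloc[46]['row_number']  # position 46 is the 47th row
label_47 = sample.loc[47, 'row_number']        # label 47 is the 48th row

print(forty_seventh, label_47)
```

With the default integer index, labels and positions coincide, so `data.loc[46, 'Ozone']` and `data.iloc[46]['Ozone']` both read the 47th row.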

How many missing values are in the Ozone column of this data frame?

In [9]:

print(data['Ozone'].isnull().sum())
print(len(data) - len(data['Ozone'].dropna()))

Out[9]:

37
37
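Both lines above count missing values indirectly. A small sketch of how the pieces relate, using a stand-in Series with the five Ozone values from `data.head()`: `isna()` is the same check as `isnull()`, and `count()` returns the number of non-missing entries.

```python
import numpy as np
import pandas as pd

# First five Ozone values from data.head(); the last one is missing.
ozone = pd.Series([41.0, 36.0, 12.0, 18.0, np.nan], name='Ozone')

n_missing = ozone.isna().sum()  # True counts as 1 when summed
n_present = ozone.count()       # count() ignores missing values

print(n_missing, n_present)
```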

What is the mean of the Ozone column in this dataset? Exclude missing values (coded as NA) from this calculation.

In [10]:

print(data['Ozone'].mean())

Out[10]:

42.12931034482759
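No explicit filtering is needed here because `Series.mean()` skips missing values by default (`skipna=True`). A minimal sketch on the five Ozone values from `data.head()` shows what changes when that default is turned off:

```python
import numpy as np
import pandas as pd

# First five Ozone values from data.head(); the last one is missing.
ozone = pd.Series([41.0, 36.0, 12.0, 18.0, np.nan])

mean_default = ozone.mean()             # averages the four present values
mean_strict = ozone.mean(skipna=False)  # any missing value makes the result NaN

print(mean_default, mean_strict)
```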

Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90. What is the mean of "Solar.R" in this subset?

In [ ]:

##### Note - if you want to iterate through a dataframe, set up a for loop that iterates over the length of the dataframe and then `iloc` into each row

In [ ]:

solar_l = []
for i in range(len(data)):
    row = data.iloc[i]
    if row['Ozone'] > 31 and row['Temp'] > 90:
        solar_l.append(row['Solar.R'])

In [ ]:

pd.Series(solar_l).mean()

In [ ]:

total = 0
for x in solar_l:
    total += x
total / len(solar_l)

In [ ]:

subset = data[(data['Ozone'] > 31) & (data['Temp'] > 90)]
print(subset)

In [ ]:

subset['Solar.R'].mean()

In [11]:

print(data[(data.Ozone > 31) & (data.Temp > 90)].head())

Out[11]:

     Ozone  Solar.R  Wind  Temp  Month  Day
68    97.0    267.0   6.3    92      7    8
69    97.0    272.0   5.7    92      7    9
119   76.0    203.0   9.7    97      8   28
120  118.0    225.0   2.3    94      8   29
121   84.0    237.0   6.3    96      8   30

In [12]:

print(data[(data.Ozone > 31) & (data.Temp > 90)]['Solar.R'].mean())

Out[12]:

212.8
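The filter and the column selection can also be done in one `.loc` call. The sketch below uses a hypothetical `sample` frame made of a few rows copied from the outputs in this notebook, so its mean differs from the 212.8 computed on the full dataset.

```python
import pandas as pd

# Stand-in rows copied from outputs shown in this notebook;
# this is not the full ozone.csv.
sample = pd.DataFrame({
    'Ozone':   [41.0, 36.0, 97.0, 97.0, 76.0, 118.0, 84.0],
    'Solar.R': [190.0, 118.0, 267.0, 272.0, 203.0, 225.0, 237.0],
    'Temp':    [67, 72, 92, 92, 97, 94, 96],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than >.
mask = (sample['Ozone'] > 31) & (sample['Temp'] > 90)

# .loc takes the row mask and the column label in one step.
solar_mean = sample.loc[mask, 'Solar.R'].mean()

print(mask.sum(), solar_mean)
```

Python's `and` does not work here because it asks for a single truth value, and a whole Series does not have one.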

What is the mean of "Temp" when "Month" is equal to 6?

In [13]:

print(data[data['Month']==6]['Temp'].mean())

print(data[data.Month==6].Temp.mean())

print(data[data.Month==6]['Temp'].mean())

Out[13]:

79.1
79.1
79.1
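When the same statistic is wanted for every month, `groupby` avoids repeating the filter. A sketch on a hypothetical `sample` frame with a handful of Temp values taken from the outputs above (so the numbers are not the full-dataset means):

```python
import pandas as pd

# Stand-in rows: five from May and three from June.
sample = pd.DataFrame({
    'Temp':  [67, 72, 74, 62, 56, 77, 72, 65],
    'Month': [5, 5, 5, 5, 5, 6, 6, 6],
})

# One mean per month, indexed by the Month value.
temp_by_month = sample.groupby('Month')['Temp'].mean()

# The June entry matches the filter-then-mean approach.
june_grouped = temp_by_month.loc[6]
june_filtered = sample.loc[sample['Month'] == 6, 'Temp'].mean()

print(temp_by_month)
```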

What was the maximum ozone value in the month of May (i.e. Month = 5)?

In [14]:

print(data[data['Month']==5]['Ozone'].max())

print(data[data.Month==5].Ozone.max())

Out[14]:

115.0
115.0
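To see which day produced the maximum, and not just its value, `idxmax()` returns the index label of the largest entry. A sketch on a hypothetical `sample` frame; the May rows come from `data.head()` and the July row from a later output.

```python
import numpy as np
import pandas as pd

# Stand-in rows: five from May (one Ozone value missing) and one from July.
sample = pd.DataFrame({
    'Ozone': [41.0, 36.0, 12.0, 18.0, np.nan, 97.0],
    'Month': [5, 5, 5, 5, 5, 7],
    'Day':   [1, 2, 3, 4, 5, 8],
})

may = sample[sample['Month'] == 5]

may_max = may['Ozone'].max()       # missing values are skipped
max_label = may['Ozone'].idxmax()  # index label of the row holding the maximum
max_day = may.loc[max_label, 'Day']

print(may_max, max_label, max_day)
```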

Next Steps

Recommended Resources

Name	Description
Official Pandas Tutorials	Wes & Company's selection of tutorials and lectures
Julia Evans Pandas Cookbook	Great resource with examples from weather, bikes and 311 calls
Learn Pandas Tutorials	A great series of Pandas tutorials from Dave Rojas
Research Computing Python Data PYNBs	A super awesome set of python notebooks from a meetup-based course exclusively devoted to pandas