YStrano
GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_02/code/solution-code/Code_2.ipynb
Kernel: Python 3

Solutions to Lesson 2

Lab 2 Solution

This is a quiz given in Roger Peng's Coursera class Computing for Data Analysis.

Sourced from Research Computing MeetUp's Python course.

import os
import pandas as pd

data = pd.read_csv(os.path.join('..', '..', 'assets', 'dataset', 'ozone.csv'))
print(data.head())
   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2
2   12.0    149.0  12.6    74      5    3
3   18.0    313.0  11.5    62      5    4
4    NaN      NaN  14.3    56      5    5
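If the file marks missing readings with the literal string `NA` (as the quiz wording suggests; this sketch does not inspect ozone.csv itself), `pd.read_csv` converts them to `NaN` by default, which would explain the `NaN` values in row 4 above. A minimal sketch on an in-memory CSV with made-up rows:

```python
import io

import pandas as pd

# Made-up rows; "NA" marks a missing reading, as the quiz describes
csv_text = """Ozone,Solar.R,Wind
41,190,7.4
NA,NA,14.3
"""

df = pd.read_csv(io.StringIO(csv_text))  # "NA" is among read_csv's default na_values
print(df)
print(df.isnull().sum().tolist())  # [1, 1, 0]
```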

Print the column names of the dataset to the screen, one column name per line.

list(data.columns)
for x in data.columns.values:
    print(x)
Ozone
Solar.R
Wind
Temp
Month
Day
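Iterating over a DataFrame directly yields its column labels, so the loop above can also be written without `.columns.values`. A small sketch on an empty frame that only borrows this dataset's column names:

```python
import pandas as pd

# Empty frame that only borrows the dataset's column names
df = pd.DataFrame(columns=["Ozone", "Solar.R", "Wind", "Temp", "Month", "Day"])

# Iterating over a DataFrame yields its column labels, one per loop step
for name in df:
    print(name)

# Same output as a single string joined with newlines
print("\n".join(df.columns))
```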
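data.iloc[:2]  # positional slice: rows 0 and 1 (stop position excluded)
data.loc[:2]   # label slice: rows 0, 1 and 2 (stop label included)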
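The two slices above differ by one row: `iloc` slices by position and excludes the stop, while `loc` slices by label and includes it. A sketch on a made-up five-row frame with the default 0-based index:

```python
import pandas as pd

# Made-up values; default RangeIndex 0..4, so labels and positions coincide
df = pd.DataFrame({"Ozone": [41.0, 36.0, 12.0, 18.0, None]})

by_position = df.iloc[:2]  # positions 0 and 1 (stop excluded)
by_label = df.loc[:2]      # labels 0, 1 and 2 (stop included)

print(len(by_position))  # 2
print(len(by_label))     # 3
```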

Extract the first 2 rows of the data frame and print them to the console. What does the output look like?

tmp = data.iloc[0:2]  # or data.head(2); .ix was deprecated and removed in pandas 1.0
print(tmp)
   Ozone  Solar.R  Wind  Temp  Month  Day
0   41.0    190.0   7.4    67      5    1
1   36.0    118.0   8.0    72      5    2

How many observations (i.e. rows) are in this data frame?

print(len(data))
153
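`len(data)` counts rows whether or not they contain missing values; `data.shape[0]` gives the same number, while `Series.count()` counts only the non-missing entries of one column. A sketch on a made-up three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Ozone": [41.0, np.nan, 12.0], "Temp": [67, 72, 74]})

n_rows = len(df)               # all rows, missing values or not
n_rows_shape = df.shape[0]     # shape is (rows, columns)
n_ozone = df["Ozone"].count()  # non-missing Ozone values only
print(n_rows, n_rows_shape, n_ozone)  # 3 3 2
```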

Extract the last 2 rows of the data frame and print them to the console. What does the output look like?

data.iloc[-2:]
data.iloc[[-2,-1]]
tmp = data.tail(2)
print(tmp)
     Ozone  Solar.R  Wind  Temp  Month  Day
151   18.0    131.0   8.0    76      9   29
152   20.0    223.0  11.5    68      9   30
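`tail(n)` returns the same rows as the negative positional slice `iloc[-n:]`. A quick check on a made-up ten-row frame:

```python
import pandas as pd

df = pd.DataFrame({"Day": range(1, 11)})  # ten made-up rows, Day 1..10

last_two_tail = df.tail(2)
last_two_iloc = df.iloc[-2:]  # negative positions count from the end

print(last_two_tail.equals(last_two_iloc))  # True
print(last_two_tail["Day"].tolist())        # [9, 10]
```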

What is the value of Ozone in the 47th row?

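print(data.loc[46:48])   # label slice, stop included: rows 46, 47 and 48
   Ozone  Solar.R  Wind  Temp  Month  Day
46  21.0    191.0  14.9    77      6   16
47  37.0    284.0  20.7    72      6   17
48  20.0     37.0   9.2    65      6   18
print(data.iloc[46:48])  # positional slice, stop excluded: rows 46 and 47
# Positions count from 0, so the 47th row is at position 46 (label 47 is the 48th row)
print(data.iloc[46]['Ozone'])
21.0
print(data.loc[46, 'Ozone'])  # same row by label, since the index runs 0..152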
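Quiz questions count rows from 1, while `iloc` counts from 0, so the n-th row lives at position `n - 1`. The helper `nth_row_value` below is hypothetical (not part of pandas); the made-up frame uses a non-default index so that position and label clearly differ:

```python
import pandas as pd


def nth_row_value(df: pd.DataFrame, n: int, column: str):
    """Value of `column` in the n-th row, counting from 1 as the quiz does.

    Hypothetical helper: pandas positions start at 0, so the n-th row is at n - 1.
    """
    if n < 1:
        raise ValueError("n counts from 1")
    return df[column].iloc[n - 1]


# Made-up values with labels 100, 200, 300 so labels differ from positions
df = pd.DataFrame({"Ozone": [10.0, 20.0, 30.0]}, index=[100, 200, 300])
print(nth_row_value(df, 2, "Ozone"))  # 20.0
```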

How many missing values are in the Ozone column of this data frame?

print(data['Ozone'].isnull().sum())
print(len(data) - len(data['Ozone'].dropna()))
37
37
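`isnull()` returns a boolean Series, and summing booleans counts the `True` values; `count()` counts the non-missing entries instead, so subtracting it from the length gives the same result. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([41.0, np.nan, 12.0, np.nan, 18.0])  # made-up readings

missing = s.isnull().sum()        # each True counts as 1 when summed
missing_alt = len(s) - s.count()  # count() skips missing values
print(missing, missing_alt)  # 2 2
```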

What is the mean of the Ozone column in this dataset? Exclude missing values (coded as NA, which pandas reads as NaN) from this calculation.

print(data['Ozone'].mean())
42.12931034482759
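pandas skips `NaN` in `mean()` by default (`skipna=True`), which is what "exclude missing values" asks for; NumPy's plain `mean` on the raw array does not. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, 20.0])  # made-up values

print(s.mean())                  # 15.0: NaN skipped (skipna=True is the default)
print(s.mean(skipna=False))      # nan: any NaN makes the result NaN
print(np.mean(s.to_numpy()))     # nan: NumPy's mean does not skip NaN
print(np.nanmean(s.to_numpy()))  # 15.0: NumPy's NaN-aware variant
```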

Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90. What is the mean of "Solar.R" in this subset?

##### Note: to iterate through a DataFrame row by row, loop over range(len(data)) and use iloc to pull out each row
solar_l = []
for i in range(len(data)):
    row = data.iloc[i]
    if row['Ozone'] > 31 and row['Temp'] > 90:
        solar_l.append(row['Solar.R'])
pd.Series(solar_l).mean()
total = 0
for x in solar_l:
    total += x
total / len(solar_l)
subset = data[(data['Ozone'] > 31) & (data['Temp'] > 90)]
print(subset)
subset['Solar.R'].mean()
print(data[(data.Ozone > 31) & (data.Temp > 90)].head())
     Ozone  Solar.R  Wind  Temp  Month  Day
68    97.0    267.0   6.3    92      7    8
69    97.0    272.0   5.7    92      7    9
119   76.0    203.0   9.7    97      8   28
120  118.0    225.0   2.3    94      8   29
121   84.0    237.0   6.3    96      8   30
print(data[(data.Ozone > 31) & (data.Temp > 90)]['Solar.R'].mean())
212.8
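Combining conditions needs `&` rather than `and` (which tries to reduce each Series to a single True/False and raises an error), plus parentheses around each comparison, because `&` binds more tightly than `>` in Python. Selecting with `.loc[mask, "Solar.R"]` does the row filter and the column pick in one step. A sketch on a made-up four-row frame:

```python
import pandas as pd

# Made-up rows, not from ozone.csv
df = pd.DataFrame({
    "Ozone": [50.0, 20.0, 80.0, 40.0],
    "Temp": [92, 95, 97, 80],
    "Solar.R": [100.0, 150.0, 200.0, 300.0],
})

# Parenthesize each comparison: & binds more tightly than > in Python
mask = (df["Ozone"] > 31) & (df["Temp"] > 90)

# One .loc call filters rows and picks the column (no chained [...][...])
result = df.loc[mask, "Solar.R"].mean()
print(result)  # 150.0
```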

What is the mean of "Temp" when "Month" is equal to 6?

print(data[data['Month']==6]['Temp'].mean())
print(data[data.Month==6].Temp.mean())
print(data[data.Month==6]['Temp'].mean())
79.1
79.1
79.1
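To get the mean for every month at once instead of filtering for one month, `groupby` is an alternative. A sketch on made-up temperatures:

```python
import pandas as pd

# Made-up temperatures for two months
df = pd.DataFrame({"Month": [5, 5, 6, 6, 6], "Temp": [60, 70, 78, 80, 82]})

monthly_mean = df.groupby("Month")["Temp"].mean()  # one mean per month
print(monthly_mean)
print(monthly_mean.loc[6])  # 80.0, same as df[df["Month"] == 6]["Temp"].mean()
```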

What was the maximum ozone value in the month of May (i.e. Month = 5)?

print(data[data['Month']==5]['Ozone'].max())
print(data[data.Month==5].Ozone.max())
115.0
115.0
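Beyond the maximum itself, `idxmax()` returns the index label of the row holding that maximum, which can then be used to look up the day it occurred. Like `max()`, it skips `NaN` by default. A sketch on made-up readings:

```python
import numpy as np
import pandas as pd

# Made-up readings; one May value is missing
df = pd.DataFrame({
    "Month": [5, 5, 5, 6],
    "Day": [1, 2, 3, 1],
    "Ozone": [30.0, np.nan, 55.0, 90.0],
})

may = df[df["Month"] == 5]
peak = may["Ozone"].max()                         # NaN is skipped, as with mean()
peak_day = may.loc[may["Ozone"].idxmax(), "Day"]  # idxmax gives the row label of the max
print(peak, peak_day)  # 55.0 3
```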

Next Steps

Recommended Resources

Name | Description
--- | ---
Official Pandas Tutorials | Wes & Company's selection of tutorials and lectures
Julia Evans Pandas Cookbook | Great resource with examples from weather, bikes and 311 calls
Learn Pandas Tutorials | A great series of Pandas tutorials from Dave Rojas
Research Computing Python Data PYNBs | A super awesome set of Python notebooks from a meetup-based course exclusively devoted to pandas