GitHub Repository: ycchen00/Introduction-to-Data-Science-in-Python
Path: blob/main/quiz/quiz3.ipynb
Kernel: Python 3

Q1

Consider the two DataFrames shown below, both of which have Name as the index. Which of the following expressions can be used to get the data of all students (from student_df) including their roles as staff, where nan denotes no role? MergingDataFrame_ed

import numpy as np
import pandas as pd

# First we create two DataFrames, staff and students.
staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR'},
                         {'Name': 'Sally', 'Role': 'Course liasion'},
                         {'Name': 'James', 'Role': 'Grader'}])
# And let's index these staff by name
staff_df = staff_df.set_index('Name')

# Now we'll create a student dataframe
student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business'},
                           {'Name': 'Mike', 'School': 'Law'},
                           {'Name': 'Sally', 'School': 'Engineering'}])
# And we'll index this by name too
student_df = student_df.set_index('Name')
staff_df
student_df
pd.merge(student_df, staff_df, how='right', left_index=True, right_index=True)
# Correct
pd.merge(student_df, staff_df, how='left', left_index=True, right_index=True)
pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
# pd.merge(staff_df, student_df, how='right', left_index=False, right_index=True)
print('Wrong! : Must pass left_on or left_index=True')
Wrong! : Must pass left_on or left_index=True
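To make the left merge concrete, here is a minimal sketch that rebuilds the two frames from this question and inspects the result. The comparison with `DataFrame.join` is added for illustration only and is not one of the quiz options.

```python
import pandas as pd

# Same data as in the question, indexed by Name.
staff_df = pd.DataFrame([
    {'Name': 'Kelly', 'Role': 'Director of HR'},
    {'Name': 'Sally', 'Role': 'Course liasion'},
    {'Name': 'James', 'Role': 'Grader'},
]).set_index('Name')
student_df = pd.DataFrame([
    {'Name': 'James', 'School': 'Business'},
    {'Name': 'Mike', 'School': 'Law'},
    {'Name': 'Sally', 'School': 'Engineering'},
]).set_index('Name')

# how='left' keeps every row of the left frame (the students);
# students with no matching staff entry get NaN in Role.
merged = pd.merge(student_df, staff_df, how='left',
                  left_index=True, right_index=True)

# join() aligns on the index and defaults to a left join (illustrative alternative).
joined = student_df.join(staff_df)

print(merged)
```

Mike is a student without a staff entry, so his Role is NaN, while Kelly (staff only) does not appear at all.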

Q2

Consider a DataFrame named df with columns named P2010, P2011, P2012, P2013, P2014 and P2015 containing float values. We want to use the apply method to get a new DataFrame named result_df with a new column AVG. The AVG column should average the float values across P2010 to P2015. The 6 original columns (P2010 to P2015) should also be removed from the result. What should the values of x and y be in the given code? PandasIdioms_ed

df = (
    pd.read_csv('../resources/week-3/datasets/census.csv')
    .rename(columns={
        'POPESTIMATE2010': 'P2010',
        'POPESTIMATE2011': 'P2011',
        'POPESTIMATE2012': 'P2012',
        'POPESTIMATE2013': 'P2013',
        'POPESTIMATE2014': 'P2014',
        'POPESTIMATE2015': 'P2015'})
    .dropna()
    [['P2010', 'P2011', 'P2012', 'P2013', 'P2014', 'P2015']]
)
# [['POPESTIMATE2010',
#   'POPESTIMATE2011',
#   'POPESTIMATE2012',
#   'POPESTIMATE2013',
#   'POPESTIMATE2014',
#   'POPESTIMATE2015']]
df.head()
# axis=1 is equivalent to axis='columns'
x = 1
y = 1
frames = ['P2010', 'P2011', 'P2012', 'P2013', 'P2014', 'P2015']
df['AVG'] = df[frames].apply(lambda z: np.mean(z), axis=x)
result_df = df.drop(frames, axis=y)
result_df.head()
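The role of the two axis arguments can be checked without the census file. The sketch below uses a small made-up DataFrame (the values and the reduced set of three columns are assumptions for illustration): axis=1 in apply passes each row to the function, and axis=1 in drop removes columns by label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the census columns.
toy = pd.DataFrame({
    'P2010': [1.0, 4.0],
    'P2011': [2.0, 5.0],
    'P2012': [3.0, 6.0],
})
cols = ['P2010', 'P2011', 'P2012']

# axis=1: the lambda receives one row at a time, so the mean runs across columns.
toy['AVG'] = toy[cols].apply(lambda row: np.mean(row), axis=1)

# axis=1: the labels in cols are looked up among the columns, not the index.
toy_result = toy.drop(cols, axis=1)

print(toy_result)
```

For a plain average, `toy[cols].mean(axis=1)` gives the same numbers without apply.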

Q3

Consider the DataFrame df below, instantiated with a list of grades, ordered from best grade to worst. Which of the following options can be used to substitute X in the code given below, if we want to get all the grades between 'A' and 'B' where 'A' is better than 'B'? Scales

import pandas as pd

df = pd.DataFrame(['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'],
                  index=['excellent', 'excellent', 'excellent', 'good', 'good', 'good',
                         'ok', 'ok', 'ok', 'poor', 'poor'],
                  columns=['Grades'])
df
# Correct
my_categories = pd.CategoricalDtype(categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'], ordered=True)
# my_categories = pd.CategoricalDtype(categories=['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'])
print('ERROR! Unordered Categoricals can only compare equality or not')
ERROR! Unordered Categoricals can only compare equality or not
# my_categories = pd.CategoricalDtype(categories=['D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A', 'A+'])
print('ERROR! Unordered Categoricals can only compare equality or not')
ERROR! Unordered Categoricals can only compare equality or not
# my_categories = (['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D'], ordered=True)
print('SyntaxError: invalid syntax')
SyntaxError: invalid syntax
grades = df['Grades'].astype(my_categories)
result = grades[(grades > 'B') & (grades < 'A')]
result
excellent    A-
good         B+
Name: Grades, dtype: category
Categories (11, object): ['D' < 'D+' < 'C-' < 'C' ... 'B+' < 'A-' < 'A' < 'A+']
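The difference between the ordered and unordered options can be reproduced on a smaller scale. This is a sketch with a made-up three-grade scale (an assumption for brevity): comparisons follow the order in which the categories are listed, and an unordered categorical refuses `>` and `<` with a TypeError.

```python
import pandas as pd

# Categories are listed from worst to best, so 'C' < 'B' < 'A'.
grade_order = pd.CategoricalDtype(categories=['C', 'B', 'A'], ordered=True)
toy_grades = pd.Series(['A', 'B', 'C', 'B']).astype(grade_order)

# Ordered categoricals support inequality comparisons against a category.
better_than_c = toy_grades[toy_grades > 'C']

# Without ordered=True only == and != are allowed.
unordered = pd.Series(['A', 'B', 'C']).astype('category')
try:
    unordered > 'B'
    comparison_failed = False
except TypeError as exc:
    comparison_failed = True
    print(exc)

print(better_than_c.tolist())
```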

Q4

Consider the DataFrame df shown in the image below. Which of the following can return the head of the pivot table as shown in the image below df? PivotTable_ed

df = pd.read_csv('../resources/week-3/datasets/cwurData.csv')  # [['world_rank','institution','country']]

def create_category(ranking):
    # Since the rank is just an integer, I'll just do a bunch of if/elif statements
    if (ranking >= 1) & (ranking <= 100):
        return "First Tier Top Unversity"
    elif (ranking >= 101) & (ranking <= 200):
        return "Second Tier Top Unversity"
    elif (ranking >= 201) & (ranking <= 300):
        return "Third Tier Top Unversity"
    return "Other Top Unversity"

# Now we can apply this to a single column of data to create a new series
df['Rank_Level'] = df['world_rank'].apply(lambda x: create_category(x))
df.head()
df.pivot_table(values='score', index='country', columns='Rank_Level', aggfunc=[np.median]).head()
df.pivot_table(values='score', index='Rank_Level', columns='country', aggfunc=[np.median]).head()
df.pivot_table(values='score', index='Rank_Level', columns='country', aggfunc=[np.median], margins=True).head()
# Correct
df.pivot_table(values='score', index='country', columns='Rank_Level', aggfunc=[np.median], margins=True).head()
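What distinguishes the four options is which column becomes the index and whether margins=True adds an "All" row and column. The sketch below shows both on a made-up four-row table (the countries, tier labels and scores are assumptions, and the aggregation is passed as the string 'median' rather than a list to keep the column labels flat).

```python
import pandas as pd

# Hypothetical stand-in for the rankings data.
toy = pd.DataFrame({
    'country': ['X', 'X', 'Y', 'Y'],
    'Rank_Level': ['First', 'Second', 'First', 'First'],
    'score': [90.0, 70.0, 80.0, 60.0],
})

# index= sets the row labels, columns= the column labels;
# margins=True appends an 'All' row and column computed with the same aggfunc.
table = toy.pivot_table(values='score', index='country',
                        columns='Rank_Level', aggfunc='median', margins=True)

print(table)
```

Country Y has no "Second" entry, so that cell is NaN, while the "All" cells hold medians over the corresponding full row, column, or table.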

Q5

Assume that the date '11/29/2019' (in MM/DD/YYYY format) is the 4th day of the week. What will be the result of the following? DateFunctionality_ed

import pandas as pd

(pd.Timestamp('11/29/2019') + pd.offsets.MonthEnd()).weekday()
5
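The result follows from two small steps, spelled out in this sketch: adding MonthEnd() rolls the timestamp forward to the last day of its month, and weekday() counts from Monday as 0.

```python
import pandas as pd

start = pd.Timestamp('11/29/2019')
# weekday() numbers days from Monday=0, so 4 is a Friday.
start_weekday = start.weekday()

# MonthEnd() moves forward to the end of the current month: 2019-11-30.
month_end = start + pd.offsets.MonthEnd()
end_weekday = month_end.weekday()

print(start_weekday, month_end.date(), end_weekday)
```

The month end falls one day after the starting date, so the weekday advances from 4 to 5.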

Q6

Consider a DataFrame df. We want to create groups based on the column group_key in the DataFrame and fill the nan values with group means using:

filling_mean = lambda g: g.fillna(g.mean())

Which of the following is correct for performing this task? GroupBy_ed

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings', 'kings',
                     'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, None, 2017],
            'Points': [876, 789, 863, None, 741, 812, None, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
df
filling_mean = lambda g: g.fillna(g.mean())
group_key = 'Team'
# df.groupby(group_key).aggregate(filling_mean)
print('ValueError: Shape of passed values is (4, 5), indices imply (3, 5)')
ValueError: Shape of passed values is (4, 5), indices imply (3, 5)
# df.groupby(group_key).filling_mean()
print("AttributeError: 'DataFrameGroupBy' object has no attribute 'filling_mean'")
AttributeError: 'DataFrameGroupBy' object has no attribute 'filling_mean'
df.groupby(group_key).transform(filling_mean)
# Correct
df.groupby(group_key).apply(filling_mean)
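The effect of filling_mean is easiest to see on a single numeric column. The following is a sketch under stated assumptions: the data is made up, the column name Points is borrowed from the example above, and selecting one column and passing group_keys=False are simplifications of mine so the result keeps the original row index regardless of how a given pandas version treats non-numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value per group.
toy = pd.DataFrame({
    'group_key': ['a', 'a', 'a', 'b', 'b'],
    'Points': [1.0, np.nan, 3.0, 10.0, np.nan],
})

filling_mean = lambda g: g.fillna(g.mean())

# apply calls filling_mean once per group; each NaN is replaced
# by the mean of the other values in the same group.
filled = (toy.groupby('group_key', group_keys=False)['Points']
             .apply(filling_mean)
             .sort_index())

# transform is shown for comparison; it returns a like-indexed result.
via_transform = toy.groupby('group_key')['Points'].transform(filling_mean)

print(filled.tolist())
```

Group a has mean 2.0 and group b has mean 10.0, so those are the values that replace the gaps.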

Q7

Consider the two DataFrames below, both of which have a standard integer-based index. Which of the following can be used to get the data of all students (from student_df) and merge it with their staff roles, where nan denotes no role? MergingDataFrame_ed

staff_df = pd.DataFrame([{'First Name': 'Kelly', 'Last Name': 'Desjardins', 'Role': 'Director of HR'},
                         {'First Name': 'Sally', 'Last Name': 'Brooks', 'Role': 'Course liasion'},
                         {'First Name': 'James', 'Last Name': 'Wilde', 'Role': 'Grader'}])
student_df = pd.DataFrame([{'First Name': 'James', 'Last Name': 'Hammond', 'School': 'Business'},
                           {'First Name': 'Mike', 'Last Name': 'Smith', 'School': 'Law'},
                           {'First Name': 'Sally', 'Last Name': 'Brooks', 'School': 'Engineering'}])
student_df
staff_df
pd.merge(staff_df, student_df, how='outer', on=['First Name','Last Name'])
pd.merge(student_df, staff_df, how='inner', on=['First Name','Last Name'])
# Correct
pd.merge(staff_df, student_df, how='right', on=['First Name','Last Name'])
pd.merge(student_df, staff_df, how='right', on=['First Name','Last Name'])
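With two key columns, a row matches only when both names agree. The sketch below rebuilds the frames from this question and runs the right merge with students as the right-hand frame, so every student is kept.

```python
import pandas as pd

staff_df = pd.DataFrame([
    {'First Name': 'Kelly', 'Last Name': 'Desjardins', 'Role': 'Director of HR'},
    {'First Name': 'Sally', 'Last Name': 'Brooks', 'Role': 'Course liasion'},
    {'First Name': 'James', 'Last Name': 'Wilde', 'Role': 'Grader'},
])
student_df = pd.DataFrame([
    {'First Name': 'James', 'Last Name': 'Hammond', 'School': 'Business'},
    {'First Name': 'Mike', 'Last Name': 'Smith', 'School': 'Law'},
    {'First Name': 'Sally', 'Last Name': 'Brooks', 'School': 'Engineering'},
])

# how='right' keeps all rows of the right frame (students);
# the on= list requires both First Name and Last Name to match.
merged = pd.merge(staff_df, student_df, how='right',
                  on=['First Name', 'Last Name'])

print(merged)
```

James Hammond (student) and James Wilde (staff) share only a first name, so they are not matched and the student's Role stays NaN. Only Sally Brooks appears in both frames.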

Q8

Consider a DataFrame df with columns name, reviews_per_month, and review_scores_value. This DataFrame also consists of several missing values. Which of the following can be used to: i) calculate the number of entries in the name column, and ii) calculate the mean and standard deviation of the reviews_per_month, grouping by different review_scores_value? GroupBy_ed

df = pd.read_csv("../resources/week-3/datasets/listings.csv")[['name', 'reviews_per_month', 'review_scores_value']]
df.head()
df.agg({'name':len,'reviews_per_month':(np.mean,np.std)})
df.agg({'name':len,'reviews_per_month':(np.nanmean,np.nanstd)})
df.groupby('review_scores_value').agg({'name':len,'reviews_per_month':(np.nanmean,np.nanstd)})
df.groupby('review_scores_value').agg({'name':len,'reviews_per_month':(np.mean,np.std)})
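The shape of the requested result can be previewed on made-up data. This sketch is not one of the quiz options: it uses pandas named aggregation, with hypothetical output names (n_names, mean_rpm, std_rpm) and a toy frame in place of listings.csv, to show per-group counts alongside a mean and standard deviation that skip missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the listings data, with missing values.
toy = pd.DataFrame({
    'name': ['a', 'b', None, 'd', 'e', 'f'],
    'reviews_per_month': [1.0, np.nan, 3.0, 2.0, 4.0, 5.0],
    'review_scores_value': [9.0, 9.0, 9.0, 10.0, 10.0, np.nan],
})

# One output column per (source column, aggregation) pair.
# len counts every entry in the group, including missing names;
# the 'mean' and 'std' reductions ignore NaN.
summary = toy.groupby('review_scores_value').agg(
    n_names=('name', len),
    mean_rpm=('reviews_per_month', 'mean'),
    std_rpm=('reviews_per_month', 'std'),
)

print(summary)
```

The row whose review_scores_value is NaN belongs to no group and is left out of the summary.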

Q9

What will be the result of the following code? DateFunctionality_ed

import pandas as pd

pd.Period('01/12/2019', 'M') + 5
Period('2019-06', 'M')
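The arithmetic here happens in units of the period's frequency. A short sketch: with frequency 'M' the day of the month is discarded, and adding an integer moves forward by that many months.

```python
import pandas as pd

# '01/12/2019' is read as 12 January 2019; with freq 'M' only the month is kept.
period = pd.Period('01/12/2019', 'M')

# Integer addition shifts by whole periods, here 5 months.
shifted = period + 5

print(period, shifted)
```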

Q10

Which of the following is not a valid expression to create a Pandas GroupBy object from the DataFrame shown below? GroupBy_ed

df = pd.DataFrame([{'class': 'fruit', 'avg calories per unit': '95'},
                   {'class': 'fruit', 'avg calories per unit': '202'},
                   {'class': 'vegetable', 'avg calories per unit': '164'},
                   {'class': 'vegetable', 'avg calories per unit': None},
                   {'class': 'vegetable', 'avg calories per unit': '207'}],
                  ['apple', 'mango', 'potato', 'onion', 'broccoli'])
df
grouped = df.groupby(['class','avg calories per unit'])
# print(grouped)
# grouped.head()
for group, frame in grouped:
    print(group)
('fruit', '202')
('fruit', '95')
('vegetable', '164')
('vegetable', '207')
grouped = df.groupby('class')
# grouped.head()
for group, frame in grouped:
    print(group)
fruit
vegetable
grouped = df.groupby('class',axis=0)
# grouped.head()
for group, frame in grouped:
    print(group)
fruit
vegetable
df.groupby('vegetable')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-46-a901e4cdd01b> in <module>()
----> 1 df.groupby('vegetable')

c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, dropna)
   6523             squeeze=squeeze,
   6524             observed=observed,
-> 6525             dropna=dropna,
   6526         )
   6527

c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\groupby\groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated, dropna)
    531                 observed=observed,
    532                 mutated=self.mutated,
--> 533                 dropna=self.dropna,
    534             )
    535

c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\groupby\grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate, dropna)
    784                 in_axis, name, level, gpr = False, None, gpr, None
    785             else:
--> 786                 raise KeyError(gpr)
    787         elif isinstance(gpr, Grouper) and gpr.key is not None:
    788             # Add key to exclusions

KeyError: 'vegetable'
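The KeyError arises because groupby expects a column label (or index level), and 'vegetable' is a value inside the class column rather than a label. The sketch below reproduces the error on the same data and shows one way to reach the vegetable rows, using get_group on a valid GroupBy object (this lookup is an illustration and not part of the quiz).

```python
import pandas as pd

df = pd.DataFrame([{'class': 'fruit', 'avg calories per unit': '95'},
                   {'class': 'fruit', 'avg calories per unit': '202'},
                   {'class': 'vegetable', 'avg calories per unit': '164'},
                   {'class': 'vegetable', 'avg calories per unit': None},
                   {'class': 'vegetable', 'avg calories per unit': '207'}],
                  ['apple', 'mango', 'potato', 'onion', 'broccoli'])

# 'vegetable' is not a column name, so this raises KeyError.
try:
    df.groupby('vegetable')
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Group by the column, then pick the group whose key is 'vegetable'.
vegetables = df.groupby('class').get_group('vegetable')

print(raised_key_error)
print(vegetables.index.tolist())
```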