GitHub Repository: ycchen00/Introduction-to-Data-Science-in-Python
Path: blob/main/quiz/FinalQuiz.ipynb

Kernel: Python 3

Final Quiz

Q1

Consider the given NumPy arrays a and b. What will be the value of c after the following code is executed?

In [2]:

import numpy as np

a = np.arange(8)
b = a[4:6]
b[:] = 40
c = a[4] + a[6]

c

Out[2]:

46
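
The result follows from NumPy's view semantics: a basic slice such as a[4:6] shares memory with a, so assigning through b also changes a[4] and a[5]. A minimal sketch contrasting a view with an explicit copy (the names view, copied, a2, c_view, and c_copy are illustrative and not part of the quiz):

```python
import numpy as np

a = np.arange(8)
view = a[4:6]            # basic slicing returns a view onto a's buffer
view[:] = 40             # writes through to a[4] and a[5]
c_view = a[4] + a[6]     # 40 + 6

a2 = np.arange(8)
copied = a2[4:6].copy()  # an explicit copy owns its own data
copied[:] = 40           # a2 is left untouched
c_copy = a2[4] + a2[6]   # 4 + 6

print(np.shares_memory(a, view), c_view)
print(np.shares_memory(a2, copied), c_copy)
```

With the copy, the sum would be 10 rather than 46, which is why the view behavior matters for this question.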

Q2

Given the string s as shown below, which of the following expressions will be True?

In [3]:

import re
s = 'ABCAC'

In [16]:

print(re.split('A', s))
len(re.split('A', s)) == 2

Out[16]:

['', 'BC', 'C']

False

In [8]:

bool(re.match('A', s)) == True

Out[8]:

True

In [15]:

print(re.match('A', s))
re.match('A', s) == True

Out[15]:

<_sre.SRE_Match object; span=(0, 1), match='A'>

False

In [13]:

# len(re.search('A', s)) == 2
print(re.search('A', s))
print("TypeError: object of type '_sre.SRE_Match' has no len()")

Out[13]:

<_sre.SRE_Match object; span=(0, 1), match='A'>
TypeError: object of type '_sre.SRE_Match' has no len()
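
The four cells above hinge on the difference between a match object being truthy and being equal to True. A small sketch that collects the cases in one place (the variable names are illustrative only):

```python
import re

s = 'ABCAC'
m = re.match('A', s)

is_truthy = bool(m)           # a match object is truthy
equals_true = (m == True)     # but it is not the value True
parts = re.split('A', s)      # s starts with 'A', so the first piece is ''
no_match = re.match('B', s)   # match() only tries the start of s, so this is None

print(is_truthy, equals_true, parts, no_match)
```

Since re.match returns None on failure, `if re.match(...)` or `is not None` is usually a safer test than comparing with `== True`.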

Q3

Consider a string s. We want to find all characters (other than A) that are followed by a triple A, i.e., have AAA immediately to their right. The output should contain only the character immediately preceding AAA, not the AAA itself. Complete the code below so that it outputs the required result.

In [18]:

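def result():
    s = 'ACAABAACAAABACDBADDDFSDDDFFSSSASDAFAAACBAAAFASD'

    result = []
    # complete the pattern below (raw string, so \w is not treated as a string escape)
    pattern = r"(\w)(?=[A]{3})"
    for item in re.finditer(pattern, s):
        # identify the group number below; the lookahead consumes nothing,
        # so group() (the whole match) is just the single character
        result.append(item.group())

    return result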

In [19]:

result()

Out[19]:

['C', 'F', 'B']
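
To see why the lookahead keeps AAA out of the result, it can help to compare it with a pattern that consumes the AAA. This is only a sketch: the character class [^A] is an assumption made here to exclude A explicitly, whereas the quiz pattern uses \w.

```python
import re

s = 'ACAABAACAAABACDBADDDFSDDDFFSSSASDAFAAACBAAAFASD'

# Lookahead: AAA must follow, but it is not part of the match.
lookahead = [m.group() for m in re.finditer(r'[^A](?=A{3})', s)]

# Consuming AAA instead: group(0) includes it, group(1) is the character alone.
consumed = [(m.group(0), m.group(1)) for m in re.finditer(r'([^A])A{3}', s)]

print(lookahead)
print(consumed)
```

With the consuming pattern, group() alone would return the four-character match, so the group number would have to be 1.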

Q4

Consider the following four expressions involving the pandas Series df defined below. All of them have the same value except one. Can you identify which one it is?

In [23]:

import pandas as pd
df=pd.Series({'d':4,'b':7,'a':-5,'c':3})
df

Out[23]:

d    4
b    7
a   -5
c    3
dtype: int64

In [24]:

df.iloc[0]

Out[24]:

4

In [25]:

df['d']

Out[25]:

4

In [26]:

df.index[0]

Out[26]:

'd'

In [27]:

df[0]

Out[27]:

4
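
The last cell, df[0], relies on pandas falling back to positional access when the index holds string labels. Recent pandas releases warn that this fallback is deprecated, so the explicit accessors are the safer way to express the same lookups. A minimal sketch (the name ser and the result variables are illustrative):

```python
import pandas as pd

ser = pd.Series({'d': 4, 'b': 7, 'a': -5, 'c': 3})

first_by_position = ser.iloc[0]   # position 0, regardless of labels
value_by_label = ser.loc['d']     # the value stored under label 'd'
first_label = ser.index[0]        # the label itself, not a value

print(first_by_position, value_by_label, first_label)
```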

Q5

Consider the two pandas Series objects shown below, representing the number of items of different yogurt flavors that were sold in a day at two different stores, s1 and s2. Which of the following statements is True regarding the Series s3 defined below?

In [28]:

s1=pd.Series({
    'Mango':20,
    'Strawberry':15,
    'Blueberry':18,
    'Vanilla':31
})

s2=pd.Series({
    'Mango':20,
    'Strawberry':20,
    'Vanilla':30,
    'Banana':15,
    'Plain':20
})

In [29]:

s1

Out[29]:

Mango         20
Strawberry    15
Blueberry     18
Vanilla       31
dtype: int64

In [30]:

s2

Out[30]:

Mango         20
Strawberry    20
Vanilla       30
Banana        15
Plain         20
dtype: int64

In [31]:

s3=s1.add(s2)

In [32]:

s3

Out[32]:

Banana         NaN
Blueberry      NaN
Mango         40.0
Plain          NaN
Strawberry    35.0
Vanilla       61.0
dtype: float64

In [33]:

s3['Blueberry']==s1['Blueberry']

Out[33]:

False

In [35]:

s3['Mango'] >= s1.add(s2,fill_value=0)['Mango']

Out[35]:

True

In [36]:

s3['Blueberry'] >= s1.add(s2,fill_value=0)['Blueberry']

Out[36]:

False

In [34]:

s3['Plain']>=s3['Mango']

Out[34]:

False
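
All four answers come down to two facts: add() aligns on labels and yields NaN where a label is missing from either Series, and any ordering comparison involving NaN is False. A short sketch that makes both visible (the names filled, missing, nan_cmp, and blueberry_total are illustrative):

```python
import pandas as pd

s1 = pd.Series({'Mango': 20, 'Strawberry': 15, 'Blueberry': 18, 'Vanilla': 31})
s2 = pd.Series({'Mango': 20, 'Strawberry': 20, 'Vanilla': 30,
                'Banana': 15, 'Plain': 20})

s3 = s1.add(s2)                    # labels missing on either side become NaN
filled = s1.add(s2, fill_value=0)  # a missing side is treated as 0 instead

missing = sorted(s3[s3.isna()].index)
nan_cmp = bool(s3['Plain'] >= s3['Mango'])  # NaN >= 40.0 evaluates to False
blueberry_total = filled['Blueberry']

print(missing, nan_cmp, blueberry_total)
```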

Q6

In the following list of statements regarding a DataFrame df, one or more statements are correct. Can you identify all the correct statements?

In [40]:

data = pd.DataFrame(data=[['bar','one','z','1'],
                          ['bar','two','v','2'],
                          ['foo','one','x','3'],
                          ['foo','two','w','4']],
                   columns=['a','b','c','d'])
                    
data

Out[40]:

In [51]:

indexed1 = data.set_index('c')
indexed1

Out[51]:

In [53]:

indexed2 = indexed1.set_index('a')
indexed2

Out[53]:

In [48]:

reindexed1 = data.set_index('c')
reindexed1

Out[48]:

In [50]:

reindexed2 = reindexed1.reset_index()
reindexed2

Out[50]:
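
Because the rendered tables are not shown here, the effect of each call can be checked by inspecting the column lists instead. A minimal sketch, assuming the same data as above (the cols_* names are illustrative):

```python
import pandas as pd

data = pd.DataFrame(
    data=[['bar', 'one', 'z', '1'],
          ['bar', 'two', 'v', '2'],
          ['foo', 'one', 'x', '3'],
          ['foo', 'two', 'w', '4']],
    columns=['a', 'b', 'c', 'd'])

indexed1 = data.set_index('c')        # 'c' moves from the columns into the index
indexed2 = indexed1.set_index('a')    # the old index 'c' is replaced and discarded
reindexed2 = indexed1.reset_index()   # 'c' comes back as the first column

cols_indexed1 = list(indexed1.columns)
cols_indexed2 = list(indexed2.columns)
cols_reindexed2 = list(reindexed2.columns)

print(cols_indexed1, cols_indexed2, cols_reindexed2)
```

Calling set_index a second time without append=True drops the previous index, whereas reset_index turns it back into an ordinary column.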

Q7

Consider the Series object S defined below. Which of the following is an incorrect way to slice S such that we obtain all data points corresponding to the indices 'b', 'c', and 'd'?

In [55]:

S = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
S

Out[55]:

a    0
b    1
c    2
d    3
e    4
dtype: int32

In [56]:

S['b':'e']

Out[56]:

b    1
c    2
d    3
e    4
dtype: int32

In [57]:

S[['b','c','d']]

Out[57]:

b    1
c    2
d    3
dtype: int32

In [58]:

S[S<=3][S>0]

Out[58]:

b    1
c    2
d    3
dtype: int32

In [59]:

S[1:4]

Out[59]:

b    1
c    2
d    3
dtype: int32
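
The odd one out is S['b':'e']: label-based slices include the end label, while positional slices exclude the stop position. A sketch of the same selections written with explicit accessors (the variable names are illustrative; the combined mask is an alternative to the chained S[S<=3][S>0], not the quiz's own form):

```python
import numpy as np
import pandas as pd

S = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

by_label = S.loc['b':'d']          # label slice: the end label 'd' is included
by_position = S.iloc[1:4]          # positional slice: position 4 is excluded
by_mask = S[(S > 0) & (S <= 3)]    # one combined boolean mask

print(list(by_label.index), list(by_position.index), list(by_mask.index))
```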

Q8

Consider the DataFrame df defined below, with index labels 'R1', 'R2', 'R3', and 'R4'. In the following code, a new object df_new is created by applying a function to df. What will be the value of df_new[1] after the code below is executed?

In [61]:

df = pd.DataFrame([
    {'a':5,'b':6,'c':20},
    {'a':5,'b':82,'c':28},
    {'a':71,'b':31,'c':92},
    {'a':67,'b':37,'c':49}], 
    index=['R1', 'R2', 'R3','R4'])
df

Out[61]:

In [66]:

f = lambda x: x.max() + x.min()
df_new = df.apply(f)

df_new[1]

Out[66]:

88
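
By default, apply passes each column to the function, so df_new is a Series indexed by 'a', 'b', 'c', and the second entry belongs to column 'b' (82 + 6). A sketch contrasting the default with axis=1 (the names per_column, per_row, b_value, and r2_value are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [{'a': 5, 'b': 6, 'c': 20},
     {'a': 5, 'b': 82, 'c': 28},
     {'a': 71, 'b': 31, 'c': 92},
     {'a': 67, 'b': 37, 'c': 49}],
    index=['R1', 'R2', 'R3', 'R4'])

f = lambda x: x.max() + x.min()

per_column = df.apply(f)        # axis=0 (default): f receives each column
per_row = df.apply(f, axis=1)   # axis=1: f receives each row

b_value = per_column.iloc[1]    # second entry by position, i.e. column 'b'
r2_value = per_row['R2']        # max + min of row R2

print(per_column.to_dict(), b_value, r2_value)
```

Using .iloc[1] or per_column['b'] states the intent more explicitly than an integer key on a string-labeled Series.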

Q9

Consider the DataFrame named new_df created below. Which of the following expressions will output the result (showing the head of a DataFrame) below?

In [69]:

import pandas as pd
import numpy as np
df = pd.read_csv('../resources/week-3/datasets/cwurData.csv')
df.head()

Out[69]:

In [71]:

def create_category(ranking):
    if (ranking >= 1) & (ranking <= 100):
        return "First Tier Top Unversity"
    elif (ranking >= 101) & (ranking <= 200):
        return "Second Tier Top Unversity"
    elif (ranking >= 201) & (ranking <= 300):
        return "Third Tier Top Unversity"
    return "Other Top Unversity"

df['Rank_Level'] = df['world_rank'].apply(lambda x: create_category(x))

new_df=df.pivot_table(values='score', index='country', columns='Rank_Level', aggfunc=[np.mean, np.max], 
               margins=True)

new_df.head()

Out[71]:

In [73]:

new_df.unstack()

Out[73]:

      Rank_Level                country             
mean  First Tier Top Unversity  Argentina                    NaN
                                Australia                47.9425
                                Austria                      NaN
                                Belgium                  51.8750
                                Brazil                       NaN
                                                          ...   
amax  All                       Uganda                   44.4000
                                United Arab Emirates     44.3600
                                United Kingdom           97.6400
                                Uruguay                  44.3500
                                All                     100.0000
Length: 600, dtype: float64

In [74]:

new_df.stack()

Out[74]:

In [75]:

new_df.stack().stack()

Out[75]:

country    Rank_Level                     
Argentina  Other Top Unversity        mean     44.672857
                                      amax     45.660000
           All                        mean     44.672857
                                      amax     45.660000
Australia  First Tier Top Unversity   mean     47.942500
                                                 ...    
All        Second Tier Top Unversity  amax     51.290000
           Third Tier Top Unversity   mean     46.843450
                                      amax     47.930000
           All                        mean     47.798395
                                      amax    100.000000
Length: 386, dtype: float64
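
The difference in index order between unstack() and stack().stack() is easier to follow on a tiny table. The sketch below uses made-up data (the frame toy, its column names, and its scores are assumptions for illustration and have nothing to do with cwurData.csv), and passes the aggregations as strings rather than NumPy functions:

```python
import pandas as pd

toy = pd.DataFrame({
    'country': ['X', 'X', 'Y', 'Y'],
    'tier': ['T1', 'T2', 'T1', 'T2'],
    'score': [90.0, 60.0, 80.0, 50.0],
})

wide = toy.pivot_table(values='score', index='country', columns='tier',
                       aggfunc=['mean', 'max'])

# unstack() moves the row index under the column levels:
# the result is a Series indexed by (aggregation, tier, country).
long_unstack = wide.unstack()

# stack().stack() moves both column levels under the row index:
# the result is a Series indexed by (country, tier, aggregation).
long_stack = wide.stack().stack()

print(long_unstack.index.names, long_stack.index.names)
```

Both results hold the same values; only the nesting order of the index levels differs, which matches the two printed outputs above. Depending on the installed pandas version, stack() may emit a warning about its newer implementation.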

In [76]:

new_df.unstack().unstack()

Out[76]:

Q10

Consider the DataFrame df defined below. What will be the output (rounded to the nearest integer) when the following code related to df is executed?

In [78]:

df = pd.DataFrame([
    {'Item':'item_1','Store':'A','Quantity sold':10},
    {'Item':'item_1','Store':'B','Quantity sold':20},
    {'Item':'item_1','Store':'C','Quantity sold':None},
    {'Item':'item_2','Store':'A','Quantity sold':5},
    {'Item':'item_2','Store':'B','Quantity sold':10},
    {'Item':'item_2','Store':'C','Quantity sold':15}])
df

Out[78]:

In [79]:

df.groupby('Item').sum().iloc[0]['Quantity sold']

Out[79]:

30.0
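
The answer is 30 rather than NaN because sum skips missing values by default, so item_1 adds up to 10 + 20. A sketch that also shows the min_count option for cases where an incomplete group should stay missing (the names qty, default_sum, and strict_sum are illustrative; selecting the column first is a choice made here to keep the string column Store out of the sum):

```python
import pandas as pd

df = pd.DataFrame([
    {'Item': 'item_1', 'Store': 'A', 'Quantity sold': 10},
    {'Item': 'item_1', 'Store': 'B', 'Quantity sold': 20},
    {'Item': 'item_1', 'Store': 'C', 'Quantity sold': None},
    {'Item': 'item_2', 'Store': 'A', 'Quantity sold': 5},
    {'Item': 'item_2', 'Store': 'B', 'Quantity sold': 10},
    {'Item': 'item_2', 'Store': 'C', 'Quantity sold': 15}])

qty = df.groupby('Item')['Quantity sold']

default_sum = qty.sum()             # missing values are skipped
strict_sum = qty.sum(min_count=3)   # fewer than 3 valid values gives NaN

print(default_sum.to_dict())
print(strict_sum.to_dict())
```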
