GitHub Repository: ycchen00/Introduction-to-Data-Science-in-Python
Path: blob/main/quiz/quiz2.ipynb

Kernel: Python 3

Q1

For the following code, which of the following statements will not return True?

In [1]:

import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj1 = pd.Series(sdata)
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj2 = pd.Series(sdata, index=states)
obj3 = pd.isnull(obj2)

In [3]:

import math
math.isnan(obj2['California'])

Out[3]:

True

In [4]:

obj3['California']

Out[4]:

True

In [5]:

x=obj2['California']
obj2['California']!=x

Out[5]:

True

In [6]:

obj2['California']==None

Out[6]:

False
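The four cells above hinge on how pandas represents a missing value. Because 'California' is not a key in sdata, obj2 stores NaN (a float), not None. The sketch below rebuilds obj2 from Q1 and contrasts the comparisons; the variable names are illustrative.

```python
import math

import numpy as np
import pandas as pd

# Rebuild obj2 from Q1: 'California' has no entry in sdata, so pandas fills NaN
sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
states = ['California', 'Ohio', 'Oregon', 'Texas']
obj2 = pd.Series(sdata, index=states)

value = obj2['California']

# NaN is a float, not None, so an equality test against None is False
is_none = (value == None)  # deliberately mirrors the quiz option; prefer pd.isna
# NaN never compares equal to anything, including itself, so != is True
not_equal_self = (value != value)
# Dedicated missing-value checks are the reliable way to test for NaN
checks = (math.isnan(value), pd.isnull(value), np.isnan(value))
```

In practice, pd.isna / pd.isnull is the idiomatic test because it also recognizes None and NaT.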

Q2

In the Python code below, the keys of the dictionary d represent student ranks and the value for each key is a student name. Which of the following can be used to extract rows with student ranks that are lower than or equal to 3?

In [7]:

import pandas as pd
d = {
    '1': 'Alice',
    '2': 'Bob',
    '3': 'Rita',
    '4': 'Molly',
    '5': 'Ryan'
}
S = pd.Series(d)

In [8]:

S.iloc[0:3]

Out[8]:

1    Alice
2      Bob
3     Rita
dtype: object
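S.iloc[0:3] works here because the rows happen to be stored in rank order. A minimal sketch of two label-based alternatives, assuming the same Series S (note that the ranks are stored as strings '1' to '5'):

```python
import pandas as pd

# Same Series as Q2: the index holds ranks stored as strings
d = {'1': 'Alice', '2': 'Bob', '3': 'Rita', '4': 'Molly', '5': 'Ryan'}
S = pd.Series(d)

# Positional slice: end position 3 is excluded, so this is rows 0, 1, 2
by_position = S.iloc[0:3]

# Label slice: with .loc both endpoints are included
by_label = S.loc['1':'3']

# Filter on the rank itself: convert the string labels to int first,
# which does not rely on the rows being sorted by rank
by_rank = S[S.index.astype(int) <= 3]
```

The last form is the most robust if the Series is ever reordered.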

Q3

Suppose we have a DataFrame named df. We want to modify df itself so that all of its column names are converted to upper case. Which of the following expressions would not accomplish this?

In [17]:

import pandas as pd
df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)

In [23]:

df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df=df.rename(mapper=lambda x: x.upper(), axis='columns')
df

Out[23]:

In [24]:

df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df=df.rename(mapper=lambda x: x.upper(), axis=1)
df

Out[24]:

In [26]:

df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df.rename(mapper=lambda x: x.upper(), axis=1, inplace=True)
df

Out[26]:

In [27]:

df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df.rename(mapper=lambda x: x.upper(), axis=1)  # returns a new DataFrame that is discarded; df is unchanged
df

Out[27]:
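The difference between the working and non-working options is whether the result of rename() is kept. A small sketch on a two-column stand-in DataFrame (illustrative values, not loaded from the CSV):

```python
import pandas as pd

# Hypothetical stand-in for the admissions data
df = pd.DataFrame({'GRE Score': [337, 324], 'TOEFL Score': [118, 107]})

# rename() returns a new DataFrame; without reassignment or inplace=True,
# df keeps its original column names
df.rename(mapper=lambda x: x.upper(), axis=1)
unchanged = list(df.columns)

# Reassigning the result does change df
df = df.rename(mapper=lambda x: x.upper(), axis=1)
renamed = list(df.columns)

# Another common idiom: transform the column Index directly
df2 = pd.DataFrame({'GRE Score': [337, 324], 'TOEFL Score': [118, 107]})
df2.columns = df2.columns.str.upper()
```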

Q4

For the given DataFrame df, we want to keep only the records with a TOEFL score greater than 105. Which of the following will not work?

In [32]:

df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df.columns = [x.lower().strip() for x in df.columns]
df=df[['gre score','toefl score']]

In [30]:

df

Out[30]:

In [33]:

df.where(df['toefl score']>105)

Out[33]:

In [34]:

df[df['toefl score']>105]

Out[34]:

In [35]:

df.where(df['toefl score']>105).dropna()

Out[35]:
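The key point in Q4 is that where() preserves the DataFrame's shape, replacing failing rows with NaN instead of removing them. A sketch on a small illustrative frame whose index stands in for 'Serial No.':

```python
import pandas as pd

# Illustrative data, not read from the CSV
df = pd.DataFrame(
    {'gre score': [337, 324, 316, 322], 'toefl score': [118, 107, 104, 110]},
    index=[1, 2, 3, 4],
)

mask = df['toefl score'] > 105

# where(): same shape as df, failing rows become all-NaN
masked = df.where(mask)

# Boolean indexing drops the failing rows
filtered = df[mask]

# where() + dropna() keeps the same rows, but the columns end up as float
# because NaN was introduced in between
via_dropna = df.where(mask).dropna()
```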

Q5

Which of the following can be used to create a DataFrame in Pandas?
√ (can be used): Python dict / Pandas Series object / 2D ndarray
× (cannot be used):
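A brief sketch of the three accepted inputs, with illustrative data:

```python
import numpy as np
import pandas as pd

# From a dict: keys become column names
from_dict = pd.DataFrame({'Name': ['Alice', 'Jack'], 'Age': [20, 22]})

# From a Series: it becomes a single column named after the Series
s = pd.Series([20, 22], index=['Alice', 'Jack'], name='Age')
from_series = pd.DataFrame(s)

# From a 2D ndarray: each inner row becomes a DataFrame row
arr = np.array([[1, 2], [3, 4]])
from_ndarray = pd.DataFrame(arr, columns=['a', 'b'])
```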

Q6

Which of the following is an incorrect way to drop entries from the Pandas DataFrame named df shown below?

In [36]:

import pandas as pd
df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df = df.head()

In [37]:

df

Out[37]:

In [41]:

df.drop('LOR ', axis=1)  # the column name in this CSV has a trailing space

Out[41]:

In [43]:

df.drop(1)

Out[43]:

In [44]:

df.drop([2,3])

Out[44]:

In [38]:

df.drop('SOP')

Out[38]:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-38-552270aa5b54> in <module>()
----> 1 df.drop('SOP')

c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4172             level=level,
   4173             inplace=inplace,
-> 4174             errors=errors,
   4175         )
   4176 
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3885         for axis, labels in axes.items():
   3886             if labels is not None:
-> 3887                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3888 
   3889         if inplace:
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   3919                 new_axis = axis.drop(labels, level=level, errors=errors)
   3920             else:
-> 3921                 new_axis = axis.drop(labels, errors=errors)
   3922             result = self.reindex(**{axis_name: new_axis})
   3923 
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   5282         if mask.any():
   5283             if errors != "ignore":
-> 5284                 raise KeyError(f"{labels[mask]} not found in axis")
   5285             indexer = indexer[~mask]
   5286         return self.delete(indexer)
KeyError: "['SOP'] not found in axis"
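The KeyError arises because drop() searches the row index (axis=0) by default, and 'SOP' is a column label. A sketch on a small illustrative frame whose integer index stands in for 'Serial No.':

```python
import pandas as pd

# Illustrative data, not read from the CSV
df = pd.DataFrame(
    {'SOP': [4.5, 4.0, 3.0], 'CGPA': [9.65, 8.87, 8.00]},
    index=[1, 2, 3],
)

# Dropping a column requires axis=1 (or the columns= keyword)
no_sop = df.drop('SOP', axis=1)
no_sop_kw = df.drop(columns='SOP')

# Integer labels refer to the row index, which is why df.drop(1) works
no_row_1 = df.drop(1)

# Without axis=1, 'SOP' is looked up among the row labels and not found
try:
    df.drop('SOP')
    raised = False
except KeyError:
    raised = True
```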

Q7

For the Series s1 and s2 defined below, which of the following statements will give an error?

In [45]:

import pandas as pd
s1 = pd.Series({1: 'Alice', 2: 'Jack', 3: 'Molly'})
s2 = pd.Series({'Alice': 1, 'Jack': 2, 'Molly': 3})

In [49]:

s1.loc[1]

Out[49]:

'Alice'

In [46]:

s2[1]

Out[46]:

2

In [47]:

s2.iloc[1]

Out[47]:

2

In [48]:

s2.loc[1]

Out[48]:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 1

The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
<ipython-input-48-42fe26c38f36> in <module>()
----> 1 s2.loc[1]

c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111 
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060 
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3489             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3490         else:
-> 3491             loc = self.index.get_loc(key)
   3492 
   3493             if isinstance(loc, np.ndarray):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:
KeyError: 1
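.loc always looks up labels and .iloc always looks up positions. s2[1] returned 2 only because plain [] fell back to positional lookup on a string index; recent pandas releases deprecate that fallback, so .iloc is the safer spelling. A sketch using the same s1 and s2:

```python
import pandas as pd

s1 = pd.Series({1: 'Alice', 2: 'Jack', 3: 'Molly'})
s2 = pd.Series({'Alice': 1, 'Jack': 2, 'Molly': 3})

s1_label = s1.loc[1]        # label 1 -> 'Alice'
s1_pos = s1.iloc[1]         # position 1 -> 'Jack'
s2_label = s2.loc['Jack']   # label 'Jack' -> 2
s2_pos = s2.iloc[1]         # position 1 -> 2

# s2 has only string labels, so the label 1 does not exist
try:
    s2.loc[1]
    loc_failed = False
except KeyError:
    loc_failed = True
```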

Q8

Which of the following statements is incorrect?

TIP: Keep in mind that iloc and loc are not methods; they are attributes (indexers), so they are used with square brackets rather than called with parentheses.
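A quick way to confirm the tip, sketched on an illustrative two-row frame: inspect.ismethod reports that df.loc and df.iloc are not bound methods, and both are indexed with square brackets.

```python
import inspect

import pandas as pd

df = pd.DataFrame({'Age': [20, 22]}, index=['Alice', 'Jack'])

# loc and iloc are indexer objects exposed as attributes, not bound methods
loc_is_method = inspect.ismethod(df.loc)
iloc_is_method = inspect.ismethod(df.iloc)

# They are therefore used with [], taking (row, column) selectors
by_label = df.loc['Jack', 'Age']   # row label 'Jack', column label 'Age'
by_position = df.iloc[1, 0]        # second row, first column
```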

Q9

For the DataFrame df shown below, we want to get all records with a TOEFL score greater than 105 but smaller than 115. Which of the following expressions would not do this?

In [52]:

df = pd.read_csv('../resources/week-2/datasets/Admission_Predict.csv', index_col=0)
df.columns = [x.lower().strip() for x in df.columns]
df=df[['gre score','toefl score']]
df=df.head()
df

Out[52]:

In [53]:

df[(df['toefl score'].isin(range(106,115)))]

Out[53]:

In [55]:

df[df['toefl score'].gt(105) & df['toefl score'].lt(115)]

Out[55]:

In [56]:

df[(df['toefl score']>105) & (df['toefl score']<115)]

Out[56]:

In [57]:

(df['toefl score']>105) & (df['toefl score']<115)

Out[57]:

Serial No.
1    False
2     True
3    False
4     True
5    False
Name: toefl score, dtype: bool
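The last expression returns only a boolean mask, one True/False per row, rather than the records themselves; it selects rows only when used to index df. A sketch with illustrative scores (index standing in for 'Serial No.'), including Series.between as an alternative, which assumes pandas 1.3 or later for the string form of inclusive:

```python
import pandas as pd

# Illustrative data, not read from the CSV
df = pd.DataFrame(
    {'gre score': [337, 324, 316, 322, 314],
     'toefl score': [118, 107, 104, 110, 103]},
    index=pd.Index([1, 2, 3, 4, 5], name='Serial No.'),
)

# On its own, the combined condition is a boolean Series
cond = (df['toefl score'] > 105) & (df['toefl score'] < 115)

# Indexing df with the mask returns the matching records
selected = df[cond]

# Same open interval expressed with between(); both endpoints excluded
selected_between = df[df['toefl score'].between(105, 115, inclusive='neither')]
```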

Q10

Which of the following is the correct way to extract all information related to the student named Alice from the DataFrame df given below?

In [59]:

students = [{'Name': 'Alice',
             'Age': 20,
             'Gender': 'F'},
            {'Name': 'Jack',
             'Age': 22,
             'Gender': 'M'}]

df = pd.DataFrame(students, index=['Mathematics','Sociology'])

df

Out[59]:

In [65]:

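# df['Mathematics']        # KeyError: 'Mathematics' is a row label, not a column
# df['Alice']              # KeyError: 'Alice' is a cell value, not a column label
# df.iloc['Mathematics']   # error: iloc accepts integer positions, not labels
df.T['Mathematics']        # transpose so the row labels become column labels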

Out[65]:

Name      Alice
Age          20
Gender        F
Name: Mathematics, dtype: object
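Transposing works, but the row can also be read directly by its label with .loc, or found by filtering on the Name column when only the student's name is known. A sketch using the same students data:

```python
import pandas as pd

students = [
    {'Name': 'Alice', 'Age': 20, 'Gender': 'F'},
    {'Name': 'Jack', 'Age': 22, 'Gender': 'M'},
]
df = pd.DataFrame(students, index=['Mathematics', 'Sociology'])

# 'Mathematics' is a row label, so .loc returns that row without transposing
row_by_label = df.loc['Mathematics']

# Filtering on the 'Name' column returns a one-row DataFrame
rows_named_alice = df[df['Name'] == 'Alice']
```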

