Path: blob/main/resources/week-2/DataFrameDataStructure_ed.ipynb
3223 views
The DataFrame data structure is the heart of the Panda's library. It's a primary object that you'll be working with in data analysis and cleaning tasks.
The DataFrame is conceptually a two-dimensional series object, where there's an index and multiple columns of content, with each column having a label. In fact, the distinction between a column and a row is really only a conceptual distinction. And you can think of the DataFrame itself as simply a two-axes labeled array.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._get_loc_duplicates()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._maybe_get_bool_indexer()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine._unpack_bool_indexer()
KeyError: 'Name'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-14-b44ae13e0b9f> in <module>()
1 # In practice, this works really well since you're often trying to add or drop new columns. However,
2 # this also means that you get a key error if you try and use .loc with a column name
----> 3 df.loc['Name']
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1108 # fall thru to straight lookup
1109 self._validate_key(key, axis)
-> 1110 return self._get_label(key, axis=axis)
1111
1112 def _get_slice_axis(self, slice_obj: slice, axis: int):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
1057 def _get_label(self, label, axis: int):
1058 # GH#5667 this will fail if the label is not present in the axis.
-> 1059 return self.obj.xs(label, axis=axis)
1060
1061 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
3489 loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
3490 else:
-> 3491 loc = self.index.get_loc(key)
3492
3493 if isinstance(loc, np.ndarray):
c:\users\syy\appdata\local\programs\python\python36-32\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: 'Name'
In this lecture you've learned about the data structure you'll use the most in pandas, the DataFrame. The dataframe is indexed both by row and column, and you can easily select individual rows and project the columns you're interested in using the familiar indexing methods from the Series class. You'll be gaining a lot of experience with the DataFrame in the content to come.