Path: blob/main/resources/week-3/GroupBy_ed.ipynb
3226 views
Sometimes we want to select data based on groups and understand aggregated data on a group level. We have seen that even though Pandas allows us to iterate over every row in a dataframe, it is geneally very slow to do so. Fortunately Pandas has a groupby() function to speed up such task. The idea behind the groupby() function is that it takes some dataframe, splits it into chunks based on some key values, applies computation on those chunks, then combines the results back together into another dataframe. In pandas this is refered to as the split-apply-combine pattern.
Splitting
Applying
Aggregation
Transformation
Filtering
Applying
Groupby is a powerful and commonly used tool for data cleaning and data analysis. Once you have grouped the data by some category you have a dataframe of just those values and you can conduct aggregated analsyis on the segments that you are interested. The groupby() function follows a split-apply-combine approach - first the data is split into subgroups, then you can apply some transformation, filtering, or aggregation, then the results are combined automatically by pandas for us.