GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/03_Grouping/Alcohol_Consumption/Exercise_with_solutions.ipynb
Kernel: Python 3

Ex - GroupBy

Check out the Alcohol Consumption Exercises Video Tutorial to watch a data scientist go through the exercises.

Introduction:

GroupBy can be summarized as Split-Apply-Combine.
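To make the Split-Apply-Combine idea concrete, here is a minimal sketch that performs the three steps by hand and then with a single groupby call. The `toy` frame and its values are hypothetical stand-ins, not rows from the drinks dataset.

```python
import pandas as pd

# Hypothetical toy data (not the real drinks dataset)
toy = pd.DataFrame({
    "continent": ["EU", "EU", "AS", "AS", "AF"],
    "beer_servings": [200, 100, 30, 50, 60],
})

# Split: one sub-frame per continent
pieces = {name: group for name, group in toy.groupby("continent")}

# Apply: compute the mean on each piece separately
means = {name: group["beer_servings"].mean() for name, group in pieces.items()}

# Combine: gather the per-group results into one Series
manual = pd.Series(means, name="beer_servings")

# groupby(...).mean() performs the same three steps in one expression
auto = toy.groupby("continent")["beer_servings"].mean()

print(manual)
print(auto)
```

Both printed Series should hold the same per-continent means; the groupby form is what the steps below use.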

Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Check out this Diagram

Step 1. Import the necessary libraries

import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called drinks.

drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv')
drinks.head()

Step 4. Which continent drinks the most beer on average?

drinks.groupby('continent').beer_servings.mean()
continent
AF     61.471698
AS     37.045455
EU    193.777778
OC     89.687500
SA    175.083333
Name: beer_servings, dtype: float64
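Note that the output lists only five continents. The dataset codes North America as the string NA, which read_csv treats as a missing value by default, and groupby then drops rows whose key is missing. The sketch below reproduces this on a hypothetical three-row CSV (the rows are illustrative, not copied from drinks.csv) and shows one possible workaround, `keep_default_na=False`.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for drinks.csv
csv_text = (
    "country,beer_servings,continent\n"
    "Canada,240,NA\n"
    "France,127,EU\n"
    "Japan,77,AS\n"
)

# Default parsing: the string "NA" becomes NaN, so groupby drops that row
default = pd.read_csv(io.StringIO(csv_text))
default_means = default.groupby("continent").beer_servings.mean()
print(default_means)

# keep_default_na=False keeps "NA" as an ordinary string
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
kept_means = kept.groupby("continent").beer_servings.mean()
print(kept_means)
```

With `keep_default_na=False`, genuinely empty cells in text columns are also read as empty strings rather than NaN, so this option is only a reasonable choice when the file has no real gaps.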

Step 5. For each continent print the statistics for wine consumption.

drinks.groupby('continent').wine_servings.describe()
continent
AF         count     53.000000
           mean      16.264151
           std       38.846419
           min        0.000000
           25%        1.000000
           50%        2.000000
           75%       13.000000
           max      233.000000
AS         count     44.000000
           mean       9.068182
           std       21.667034
           min        0.000000
           25%        0.000000
           50%        1.000000
           75%        8.000000
           max      123.000000
EU         count     45.000000
           mean     142.222222
           std       97.421738
           min        0.000000
           25%       59.000000
           50%      128.000000
           75%      195.000000
           max      370.000000
OC         count     16.000000
           mean      35.625000
           std       64.555790
           min        0.000000
           25%        1.000000
           50%        8.500000
           75%       23.250000
           max      212.000000
SA         count     12.000000
           mean      62.416667
           std       88.620189
           min        1.000000
           25%        3.000000
           50%       12.000000
           75%       98.500000
           max      221.000000
dtype: float64
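The long, stacked output above comes from an older pandas release. In recent versions, describe() on a grouped column returns a DataFrame with one row per group and one column per statistic. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data (not the real drinks dataset)
toy = pd.DataFrame({
    "continent": ["EU", "EU", "AS", "AS"],
    "wine_servings": [100, 300, 0, 10],
})

# One row per continent, one column per summary statistic
stats = toy.groupby("continent").wine_servings.describe()
print(stats)

# Individual statistics can be picked out by column name
print(stats[["mean", "max"]])
```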

Step 6. Print the mean alcohol consumption per continent for every column

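# numeric_only=True skips the text column 'country', which pandas 2.0+ would otherwise reject
drinks.groupby('continent').mean(numeric_only=True)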

Step 7. Print the median alcohol consumption per continent for every column

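# numeric_only=True skips the text column 'country', which pandas 2.0+ would otherwise reject
drinks.groupby('continent').median(numeric_only=True)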
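Because the table also holds a text column (country), another way to aggregate "every column" is to select the numeric columns explicitly before calling mean() or median(). This works the same way across pandas versions. A sketch on hypothetical toy data, with column names mirroring the drinks dataset:

```python
import pandas as pd

# Hypothetical toy data with the same kind of columns as drinks
toy = pd.DataFrame({
    "country": ["France", "Italy", "Japan"],
    "continent": ["EU", "EU", "AS"],
    "beer_servings": [127, 85, 77],
    "wine_servings": [370, 237, 16],
})

# Choose the numeric columns up front so the text column is never aggregated
cols = ["beer_servings", "wine_servings"]
means = toy.groupby("continent")[cols].mean()
medians = toy.groupby("continent")[cols].median()

print(means)
print(medians)
```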

Step 8. Print the mean, min and max values for spirit consumption.

This time output a DataFrame

drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])
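As a variation, named aggregation lets you choose the output column names while still returning a DataFrame. The sketch below uses hypothetical toy data, and the names `spirit_mean`, `spirit_min` and `spirit_max` are illustrative choices, not part of the exercise.

```python
import pandas as pd

# Hypothetical toy data (not the real drinks dataset)
toy = pd.DataFrame({
    "continent": ["EU", "EU", "AS", "AS"],
    "spirit_servings": [100, 200, 0, 50],
})

# Each keyword becomes an output column; each value names the aggregation to apply
summary = toy.groupby("continent").spirit_servings.agg(
    spirit_mean="mean",
    spirit_min="min",
    spirit_max="max",
)
print(summary)
```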