GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/03_Grouping/Alcohol_Consumption/Exercise_with_solutions.ipynb
Kernel: Python 3 (ipykernel)

Ex - GroupBy

Check out the Alcohol Consumption Exercises Video Tutorial to watch a data scientist go through the exercises.

Introduction:

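GroupBy can be summarized as Split-Apply-Combine: split the data into groups by a key, apply a function to each group, and combine the results into one object.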
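To make Split-Apply-Combine concrete, here is a minimal sketch that performs the three steps by hand and compares the result with the `groupby` one-liner. The toy DataFrame and the names `group`, `value`, `manual` and `shortcut` are invented for illustration and are not part of the drinks dataset.

```python
import pandas as pd

# Toy data, invented for illustration (not the drinks dataset).
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 3, 10, 20, 30],
})

# Split: one sub-DataFrame per distinct key.
pieces = {key: sub for key, sub in df.groupby("group")}

# Apply: compute a statistic on each piece independently.
means = {key: sub["value"].mean() for key, sub in pieces.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(means, name="value")
manual.index.name = "group"

# The one-liner performs all three steps at once.
shortcut = df.groupby("group")["value"].mean()

print(manual.to_dict() == shortcut.to_dict())  # True
```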

Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

Check out this Diagram

Step 1. Import the necessary libraries

import pandas as pd

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called drinks. (Watch the values of the column continent: NA stands for North America, and by default Pandas interprets it as a missing value!)

drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv', keep_default_na=False)
drinks.head()
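The effect of `keep_default_na=False` is easiest to see on a tiny in-memory CSV. The sketch below uses two made-up rows (they are not taken from the dataset) and reads them with and without the option.

```python
import io
import pandas as pd

# Tiny CSV invented for illustration; "NA" stands for North America here.
csv_text = "country,continent\nCanada,NA\nFrance,EU\n"

# Default parsing: the string "NA" is on pandas' default missing-value list,
# so it is read as NaN.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False disables that list, so "NA" stays a plain string.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["continent"].isna().tolist())  # [True, False]
print(kept["continent"].tolist())            # ['NA', 'EU']
```

One trade-off to keep in mind: when `keep_default_na=False` is used without `na_values`, no strings at all are parsed as NaN, so genuinely empty cells are no longer treated as missing either.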

Step 4. Which continent drinks more beer on average?

drinks.groupby('continent').beer_servings.mean()
continent
AF     61.471698
AS     37.045455
EU    193.777778
NA    145.434783
OC     89.687500
SA    175.083333
Name: beer_servings, dtype: float64
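The output above lists every continent, so the answer still has to be read off by eye. One possible way to extract it programmatically is `idxmax` or `sort_values`; the sketch below rebuilds the Series from the printed means rather than downloading the data, and the names `beer_mean`, `top` and `ranked` are chosen here for illustration.

```python
import pandas as pd

# Per-continent means copied from the output shown above.
beer_mean = pd.Series(
    {"AF": 61.471698, "AS": 37.045455, "EU": 193.777778,
     "NA": 145.434783, "OC": 89.687500, "SA": 175.083333},
    name="beer_servings",
)

# idxmax returns the index label of the largest value.
top = beer_mean.idxmax()

# sort_values gives the full ranking, highest first.
ranked = beer_mean.sort_values(ascending=False)

print(top)  # EU
```

On the real data the same calls can be chained directly onto `drinks.groupby('continent').beer_servings.mean()`.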

Step 5. For each continent print the statistics for wine consumption.

drinks.groupby('continent').wine_servings.describe()
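For a numeric column, `describe()` on a grouped Series returns one row per group and one column per statistic. The sketch below shows that shape on a toy frame; the continent labels `A` and `B` and the values are invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration (not the drinks dataset).
toy = pd.DataFrame({
    "continent": ["A", "A", "A", "B", "B"],
    "wine_servings": [0, 10, 20, 5, 5],
})

stats = toy.groupby("continent")["wine_servings"].describe()

# One column per summary statistic.
print(list(stats.columns))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Individual statistics can be picked out with .loc.
print(stats.loc["A", "mean"])  # 10.0
```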

Step 6. Print the mean alcohol consumption per continent for every column

drinks.groupby('continent').mean(numeric_only=True)
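The `numeric_only=True` argument matters because the frame also holds a text column (`country`). A minimal sketch on a toy frame, with invented values, shows that the text column is simply left out of the result. In recent pandas versions (2.0 and later, to the best of my knowledge) omitting the argument raises a `TypeError` instead of silently dropping the column.

```python
import pandas as pd

# Toy frame invented for illustration; it mimics the mix of text and numeric columns.
toy = pd.DataFrame({
    "country": ["X", "Y", "Z"],
    "continent": ["EU", "EU", "AS"],
    "beer_servings": [100, 200, 30],
})

# numeric_only=True restricts the aggregation to numeric columns,
# so the text column "country" does not appear in the result.
means = toy.groupby("continent").mean(numeric_only=True)

print(list(means.columns))  # ['beer_servings']
```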

Step 7. Print the median alcohol consumption per continent for every column

drinks.groupby('continent').median(numeric_only=True)
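Mean and median can diverge sharply when a group contains a few extreme values, which is one reason to compute both. The sketch below puts them side by side on toy numbers invented for illustration.

```python
import pandas as pd

# Toy data, invented for illustration: one outlier in the "EU" group.
toy = pd.DataFrame({
    "continent": ["EU", "EU", "EU", "AS", "AS", "AS"],
    "wine_servings": [0, 0, 300, 10, 10, 10],
})

# Compute both statistics per group in one call.
summary = toy.groupby("continent")["wine_servings"].agg(["mean", "median"])

print(summary.loc["EU", "mean"])    # 100.0
print(summary.loc["EU", "median"])  # 0.0
```

In the `AS` group, where all values are equal, the two statistics coincide at 10.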

Step 8. Print the mean, min and max values for spirit consumption for each continent.

This time output a DataFrame.

drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])
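Passing a list of function names to `agg` is what turns the result into a DataFrame, with one column per function. As an optional variation, named aggregation lets the output columns carry custom labels; the sketch below uses a toy frame and the hypothetical labels `average`, `lowest` and `highest`.

```python
import pandas as pd

# Toy data, invented for illustration (not the drinks dataset).
toy = pd.DataFrame({
    "continent": ["EU", "EU", "AS"],
    "spirit_servings": [10, 30, 5],
})

grouped = toy.groupby("continent")["spirit_servings"]

# A single function returns a Series; a list of functions returns a DataFrame.
as_series = grouped.mean()
as_frame = grouped.agg(["mean", "min", "max"])

# Named aggregation: keyword = output column name, value = function name.
named = grouped.agg(average="mean", lowest="min", highest="max")

print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
print(list(named.columns))  # ['average', 'lowest', 'highest']
```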