GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/03_Grouping/Regiment/Exercises_solutions.ipynb
Kernel: Python 3

Regiment

Check out the Regiment Exercises Video Tutorial to watch a data scientist work through the exercises.

Introduction:

Special thanks to: http://chrisalbon.com/ for sharing the dataset and materials.

Step 1. Import the necessary libraries

import pandas as pd

Step 2. Create the DataFrame with the following values:

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

Step 3. Assign it to a variable called regiment.

Don't forget to name each column.

# Pass the column names as a plain list, in the same order as the dict keys
regiment = pd.DataFrame(raw_data, columns=list(raw_data.keys()))
regiment
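Since Python 3.7, dictionaries keep insertion order, so `pd.DataFrame(raw_data)` already places the columns in key order and the `columns` argument is optional. A minimal check, with the same values rebuilt compactly via list repetition so the snippet runs on its own:

```python
import pandas as pd

# Same values as raw_data above, written with list repetition for brevity
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}

# No columns= argument: the column order follows the dict's key order
regiment = pd.DataFrame(raw_data)

print(list(regiment.columns))
print(regiment.shape)  # 12 soldiers (rows), 5 columns
```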

Step 4. What is the mean preTestScore from the regiment Nighthawks?

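# Select preTestScore before .mean(): averaging the text columns (company, name) raises a TypeError in pandas 2.x
regiment[regiment['regiment'] == 'Nighthawks'].groupby('regiment').preTestScore.mean()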
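A sketch of a more direct route for Step 4, assuming the same data: a boolean mask with `.loc` selects the Nighthawks rows and the `preTestScore` column in one step, so no `groupby` is needed when only one regiment is of interest.

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# Boolean mask picks the Nighthawks rows; the second .loc label picks the column
nighthawks_pre = regiment.loc[regiment['regiment'] == 'Nighthawks', 'preTestScore']
print(nighthawks_pre.mean())  # (4 + 24 + 31 + 2) / 4 = 15.25
```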

Step 5. Present general statistics by company

regiment.groupby('company').describe()
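`describe()` on the whole grouped frame summarizes every numeric column side by side, which gets wide. One option, sketched here with the same data, is to select a single column first so each company gets one row of statistics:

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# count, mean, std, min, quartiles and max of preTestScore for each company
summary = regiment.groupby('company')['preTestScore'].describe()
print(summary)
```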

Step 6. What is the mean of each company's preTestScore?

regiment.groupby('company').preTestScore.mean()
company
1st     6.666667
2nd    15.500000
Name: preTestScore, dtype: float64
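Each company mean is that company's total score divided by its six soldiers (1st: 40 / 6, 2nd: 93 / 6). Requesting several reductions from `agg` at once makes that arithmetic visible; a minimal sketch with the same data:

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# One row per company, one column per reduction: mean == sum / count
stats = regiment.groupby('company')['preTestScore'].agg(['sum', 'count', 'mean'])
print(stats)
```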

Step 7. Present the mean preTestScores grouped by regiment and company

regiment.groupby(['regiment', 'company']).preTestScore.mean()
regiment    company
Dragoons    1st         3.5
            2nd        27.5
Nighthawks  1st        14.0
            2nd        16.5
Scouts      1st         2.5
            2nd         2.5
Name: preTestScore, dtype: float64
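The result of Step 7 is a Series with a two-level `MultiIndex`. A sketch of reading values back out of it, assuming the same data: a tuple addresses one (regiment, company) pair, and `.xs` takes a cross-section along one index level.

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

means = regiment.groupby(['regiment', 'company'])['preTestScore'].mean()

# A (regiment, company) tuple selects a single cell of the MultiIndex
print(means.loc[('Dragoons', '2nd')])  # (24 + 31) / 2 = 27.5

# Cross-section on the 'company' level: the 1st-company mean of every regiment
first_company = means.xs('1st', level='company')
print(first_company)
```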

Step 8. Present the mean preTestScores grouped by regiment and company without hierarchical indexing

regiment.groupby(['regiment', 'company']).preTestScore.mean().unstack()
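`pivot_table` can build the same regiment-by-company grid in a single call. This sketch, using the same data, passes `aggfunc='mean'` explicitly for clarity and compares the result with the `unstack()` version:

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# Step 8 as written: group, average, then move 'company' into the columns
wide = regiment.groupby(['regiment', 'company'])['preTestScore'].mean().unstack()

# The same grid in one call: rows = regiment, columns = company
pivot = regiment.pivot_table(index='regiment', columns='company',
                             values='preTestScore', aggfunc='mean')
print(pivot)
print(pivot.equals(wide))
```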

Step 9. Group the entire dataframe by regiment and company

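# numeric_only=True skips the text column 'name'; without it pandas 2.x raises a TypeError
regiment.groupby(['regiment', 'company']).mean(numeric_only=True)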
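Instead of `numeric_only=True`, another option is to name the columns to average explicitly. This fixes the output column order and raises a `KeyError` if one of the names is misspelled. A sketch with the same data:

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# Average only the two score columns for every (regiment, company) pair
grouped = regiment.groupby(['regiment', 'company'])[['preTestScore', 'postTestScore']].mean()
print(grouped)
```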

Step 10. What is the number of observations in each regiment and company?

regiment.groupby(['company', 'regiment']).size()
company  regiment  
1st      Dragoons      2
         Nighthawks    2
         Scouts        2
2nd      Dragoons      2
         Nighthawks    2
         Scouts        2
dtype: int64
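`pd.crosstab` returns the same counts laid out as a regiment-by-company table, much like `size().unstack()`. Note that `size()` counts rows including any missing values, whereas `count()` reports non-null entries per column; with no missing data here, the two agree. A sketch with the same data:

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

# Frequency table: rows = regiment, columns = company, cells = number of soldiers
counts = pd.crosstab(regiment['regiment'], regiment['company'])
print(counts)
```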

Step 11. Iterate over the groups and print each regiment's name and all of its rows

# Group the dataframe by regiment and, for each regiment,
for name, group in regiment.groupby('regiment'):
    # print the name of the regiment
    print(name)
    # print the data of that regiment
    print(group)
Dragoons
   regiment company    name  preTestScore  postTestScore
4  Dragoons     1st   Cooze             3             70
5  Dragoons     1st   Jacon             4             25
6  Dragoons     2nd  Ryaner            24             94
7  Dragoons     2nd    Sone            31             57
Nighthawks
     regiment company      name  preTestScore  postTestScore
0  Nighthawks     1st    Miller             4             25
1  Nighthawks     1st  Jacobson            24             94
2  Nighthawks     2nd       Ali            31             57
3  Nighthawks     2nd    Milner             2             62
Scouts
   regiment company   name  preTestScore  postTestScore
8    Scouts     1st  Sloan             2             62
9    Scouts     1st  Piger             3             70
10   Scouts     2nd  Riani             2             62
11   Scouts     2nd    Ali             3             70
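Looping is not the only way to reach a group. A sketch, assuming the same data, that uses `get_group` to fetch one regiment directly, plus a dict comprehension over the same iterator to collect each regiment's soldier names (`scouts` and `names_by_regiment` are illustrative variable names):

```python
import pandas as pd

# Same values as raw_data above, written compactly
raw_data = {
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
}
regiment = pd.DataFrame(raw_data)

groups = regiment.groupby('regiment')

# Fetch a single regiment's rows without looping
scouts = groups.get_group('Scouts')
print(scouts[['name', 'preTestScore']])

# Iterating yields (group key, sub-DataFrame) pairs, sorted by key by default
names_by_regiment = {name: group['name'].tolist() for name, group in groups}
print(names_by_regiment)
```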