guipsamora
GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/06_Stats/US_Baby_Names/Exercises.ipynb

US - Baby Names

Introduction:

We are going to use a subset of the US Baby Names dataset from Kaggle. The file contains names from 2004 through 2014.

Step 1. Import the necessary libraries

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called baby_names.

Step 4. See the first 10 entries
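A minimal sketch of Steps 1 to 4, assuming pandas is the only library needed. Because the dataset address is not reproduced here, a small in-memory CSV stands in for the real file; its rows and the column names Name, Year, Gender, State and Count are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; swap io.StringIO(...) for the
# dataset address from Step 2 when working on the actual exercise.
csv_text = """,Id,Name,Year,Gender,State,Count
0,1,Emma,2004,F,AK,62
1,2,Madison,2004,F,AK,48
2,3,Jacob,2004,M,AK,55
"""

# A header that starts with an empty field is read as 'Unnamed: 0'.
baby_names = pd.read_csv(io.StringIO(csv_text))

# head(10) returns at most the first 10 rows.
first_rows = baby_names.head(10)
print(first_rows)
```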

Step 5. Delete the columns 'Unnamed: 0' and 'Id'
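One possible way to drop both columns is `DataFrame.drop` with the `columns` argument. The toy frame below is an illustrative stand-in, not the real data; only the two column names being removed come from the exercise.

```python
import pandas as pd

# Toy stand-in for baby_names; the values are made up.
baby_names = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Id": [1, 2, 3],
    "Name": ["Emma", "Jacob", "Emma"],
    "Gender": ["F", "M", "F"],
    "Count": [62, 55, 40],
})

# drop returns a new frame, so the result is assigned back.
baby_names = baby_names.drop(columns=["Unnamed: 0", "Id"])
print(baby_names.columns.tolist())
```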

Step 6. Are there more male or female names in the dataset?
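The question can be read in two ways, and the sketch below shows both: counting rows per gender, or summing the number of babies per gender. The column names Gender and Count and all values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for baby_names; the values are made up.
baby_names = pd.DataFrame({
    "Name": ["Emma", "Emma", "Olivia", "Jacob", "Liam"],
    "Gender": ["F", "F", "F", "M", "M"],
    "Count": [62, 40, 10, 55, 70],
})

# Reading 1: how many rows (name records) belong to each gender.
rows_per_gender = baby_names["Gender"].value_counts()

# Reading 2: how many babies each gender accounts for.
babies_per_gender = baby_names.groupby("Gender")["Count"].sum()

print(rows_per_gender)
print(babies_per_gender)
```

In this toy data the two readings disagree (more female rows, more male babies), so it is worth stating which one an answer uses.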

Step 7. Group the dataset by name and assign the result to names

Step 8. How many different names exist in the dataset?
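Steps 7 and 8 could be approached as in the sketch below, assuming the per-row total lives in a column called Count (an assumption, as are the toy values). Grouping by Name and summing leaves one row per distinct name, so the length of the result answers Step 8.

```python
import pandas as pd

# Toy stand-in for baby_names; the values are made up.
baby_names = pd.DataFrame({
    "Name": ["Emma", "Jacob", "Emma", "Liam", "Jacob"],
    "Count": [62, 55, 40, 70, 5],
})

# Selecting [["Count"]] keeps the sum restricted to the numeric column.
names = baby_names.groupby("Name")[["Count"]].sum()

# One row per distinct name after grouping.
n_names = len(names)

# Cross-check directly on the ungrouped column.
assert n_names == baby_names["Name"].nunique()
print(n_names)
```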

Step 9. What is the name with most occurrences?
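A sketch for Step 9, assuming `names` is the grouped frame with a summed Count column (both the column name and the values are illustrative). `idxmax` returns the index label of the largest value, which here is the name itself.

```python
import pandas as pd

# Toy stand-in for the grouped frame from Step 7.
baby_names = pd.DataFrame({
    "Name": ["Emma", "Jacob", "Emma", "Liam"],
    "Count": [62, 55, 40, 70],
})
names = baby_names.groupby("Name")[["Count"]].sum()

# Label of the row with the highest total.
top_name = names["Count"].idxmax()

# Alternative view: sort and inspect the head of the table.
ranked = names.sort_values("Count", ascending=False)
print(top_name)
print(ranked.head(1))
```

Note that `idxmax` reports only the first label if several names tie for the maximum.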

Step 10. How many different names have the least occurrences?
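For Step 10, one approach is to compare every total against the minimum and count the matches. The sketch assumes a grouped `names` frame with a Count column; the data is made up.

```python
import pandas as pd

# Toy stand-in for the grouped frame from Step 7.
names = pd.DataFrame(
    {"Count": [102, 5, 5, 9]},
    index=pd.Index(["Emma", "Jacob", "Liam", "Olivia"], name="Name"),
)

# Boolean mask of names whose total equals the smallest total.
is_least = names["Count"] == names["Count"].min()

# Summing a boolean Series counts the True entries.
n_least = int(is_least.sum())
print(n_least)
```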

Step 11. What is the median name occurrence?
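A sketch for Step 11, again assuming a grouped `names` frame with an illustrative Count column. It computes the median total and then lists the names whose total equals it; with an even number of names the median can fall between two values, in which case the filter may return no rows.

```python
import pandas as pd

# Toy stand-in for the grouped frame from Step 7.
names = pd.DataFrame(
    {"Count": [102, 5, 70]},
    index=pd.Index(["Emma", "Jacob", "Liam"], name="Name"),
)

# Median of the per-name totals.
median_count = names["Count"].median()

# Names whose total is exactly the median.
median_names = names[names["Count"] == median_count]
print(median_count)
print(median_names)
```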

Step 12. What is the standard deviation of name occurrences?

Step 13. Get a summary with the mean, min, max, std and quartiles.
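Steps 12 and 13 can be sketched with `Series.std` and `DataFrame.describe`. The grouped `names` frame and its Count values are assumptions for illustration. pandas computes the sample standard deviation by default (`ddof=1`).

```python
import pandas as pd

# Toy stand-in for the grouped frame from Step 7.
names = pd.DataFrame({"Count": [2, 4, 4, 4, 5, 5, 7, 9]})

# Sample standard deviation (ddof=1 is the pandas default).
std_count = names["Count"].std()

# describe() reports count, mean, std, min, quartiles and max.
summary = names.describe()
print(std_count)
print(summary)
```

Passing `ddof=0` to `std` gives the population standard deviation instead, if that is the definition the exercise intends.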