GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/06_Stats/US_Baby_Names/Exercises.ipynb

Kernel: Python [default]

US - Baby Names

Introduction:

We are going to use a subset of the US Baby Names dataset from Kaggle. The file contains names from 2004 through 2014.

Step 1. Import the necessary libraries

In [ ]:

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called baby_names.

In [ ]:
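As a rough sketch of Steps 1 to 3, the snippet below loads a CSV with `pd.read_csv`. The address is not reproduced here, so a small in-memory CSV stands in for it; the rows and the column names `Name`, `Year`, `Gender`, `State` and `Count` are assumptions for illustration only. The blank first header field shows where a column such as 'Unnamed: 0' typically comes from: a saved index with no header name.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file; the rows and most column names are assumed.
csv_text = """,Id,Name,Year,Gender,State,Count
0,1,Emma,2004,F,AK,62
1,2,Madison,2004,F,AK,48
2,3,Jacob,2004,M,AK,55
"""

# read_csv accepts a URL, a local path, or a file-like object such as StringIO.
baby_names = pd.read_csv(io.StringIO(csv_text))

# The header field with no name is labeled 'Unnamed: 0' by pandas.
print(baby_names.columns.tolist())
```

With the real dataset, the address string would be passed to `pd.read_csv` in place of the `StringIO` object.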

Step 4. See the first 10 entries

In [ ]:
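One possible way to look at the first 10 entries is `DataFrame.head`, which takes the number of rows as its argument (the default is 5). The 12-row frame below is made-up data, not the real dataset.

```python
import pandas as pd

# Made-up 12-row frame; the real baby_names would come from the dataset.
baby_names = pd.DataFrame({
    "Name": [f"name_{i}" for i in range(12)],
    "Count": list(range(12)),
})

# head(10) returns the first 10 rows as a new DataFrame.
first_ten = baby_names.head(10)
print(first_ten)
```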

Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [ ]:
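A minimal sketch of removing the two columns, assuming a toy frame with the same two column names plus a few invented ones. `DataFrame.drop` returns a new frame, so the result is assigned back to `baby_names`.

```python
import pandas as pd

# Toy frame; only 'Unnamed: 0' and 'Id' are taken from the exercise text.
baby_names = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Id": [1, 2, 3],
    "Name": ["Emma", "Jacob", "Emma"],
    "Gender": ["F", "M", "F"],
    "Count": [62, 55, 40],
})

# drop(columns=...) removes the listed columns and leaves the rest untouched.
baby_names = baby_names.drop(columns=["Unnamed: 0", "Id"])
print(baby_names.columns.tolist())
```

`del baby_names["Id"]` is an alternative that removes one column in place.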

Step 6. Are there more male or female names in the dataset?

In [ ]:
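The question can be read in two ways, and the sketch below shows both on invented data: counting rows per gender with `value_counts`, or summing a count column per gender. The column names `Gender` and `Count` are assumptions about the file's layout.

```python
import pandas as pd

# Invented rows; 'Gender' and 'Count' are assumed column names.
baby_names = pd.DataFrame({
    "Name": ["Emma", "Jacob", "Ava", "Mia", "Liam"],
    "Gender": ["F", "M", "F", "F", "M"],
    "Count": [62, 55, 40, 10, 70],
})

# Reading 1: how many rows (name entries) each gender has.
rows_per_gender = baby_names["Gender"].value_counts()

# Reading 2: how many recorded births each gender has.
births_per_gender = baby_names.groupby("Gender")["Count"].sum()

print(rows_per_gender)
print(births_per_gender)
```

In this toy data the two readings disagree: there are more female rows but more male births, so it is worth stating which reading an answer uses.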

Step 7. Group the dataset by name and assign the result to names

In [ ]:
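A sketch of the grouping step on invented data, assuming the name column is called `Name` and the occurrences are stored in `Count`. Selecting `Count` before summing keeps non-numeric columns out of the aggregation.

```python
import pandas as pd

# Invented rows; the same name can appear in several rows (e.g. different years or states).
baby_names = pd.DataFrame({
    "Name": ["Emma", "Jacob", "Emma", "Liam"],
    "Gender": ["F", "M", "F", "M"],
    "Count": [62, 55, 40, 7],
})

# One row per name, with the occurrences of that name added up.
names = baby_names.groupby("Name")[["Count"]].sum()
print(names)
```

The result is indexed by name, which is convenient for the lookups in the later steps.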

Step 8. How many different names exist in the dataset?

In [ ]:
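Two equivalent ways to count distinct names, sketched on invented data with an assumed `Name` column: `nunique` on the raw column, or the length of the grouped result.

```python
import pandas as pd

# Invented rows with repeated names.
baby_names = pd.DataFrame({
    "Name": ["Emma", "Jacob", "Emma", "Liam", "Jacob"],
    "Count": [62, 55, 40, 7, 70],
})

# Option 1: count distinct values directly.
n_unique = baby_names["Name"].nunique()

# Option 2: group by name and count the resulting rows.
names = baby_names.groupby("Name")[["Count"]].sum()
n_groups = len(names)

print(n_unique, n_groups)
```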

Step 9. What is the name with the most occurrences?

In [ ]:
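One way to find the most frequent name, assuming per-name totals live in a `Count` column indexed by name (invented values below). `idxmax` returns the index label of the largest value.

```python
import pandas as pd

# Invented per-name totals, indexed by name.
names = pd.DataFrame(
    {"Count": [102, 125, 7]},
    index=pd.Index(["Emma", "Jacob", "Liam"], name="Name"),
)

# Label of the row holding the largest Count.
top_name = names["Count"].idxmax()

# Alternative: sort descending and look at the first row.
top_row = names.sort_values("Count", ascending=False).head(1)

print(top_name)
print(top_row)
```

Note that `idxmax` reports only the first label if several names tie for the maximum.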

Step 10. How many different names have the least occurrences?

In [ ]:
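A sketch of counting how many names share the smallest total, using invented per-name totals. The comparison produces a boolean Series, and summing it counts the `True` entries.

```python
import pandas as pd

# Invented per-name totals.
totals = pd.Series(
    {"Emma": 102, "Jacob": 125, "Liam": 5, "Noah": 5, "Ava": 5},
    name="Count",
)

# Smallest total in the data.
least = totals.min()

# Number of names whose total equals that minimum.
n_least = int((totals == least).sum())

print(least, n_least)
```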

Step 11. What is the median name occurrence?

In [ ]:

Step 12. What is the standard deviation of name occurrences?

In [ ]:

Step 13. Get a summary with the mean, min, max, std and quartiles.

In [ ]:
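The last three steps can be sketched together on invented per-name totals. `median` and `std` give the individual statistics, and `describe` returns count, mean, std, min, quartiles and max in one call; its `50%` row is the median. pandas computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Invented per-name totals.
totals = pd.Series([5, 5, 10, 20, 60], name="Count")

median = totals.median()   # middle value of the sorted totals
std = totals.std()         # sample standard deviation (ddof=1)
summary = totals.describe()

print(median)
print(std)
print(summary)
```

On a grouped DataFrame such as `names`, the same methods can be called on the frame or on its count column.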