GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/06_Stats/US_Baby_Names/Solutions.ipynb

Kernel: Python [default]

US - Baby Names

Introduction:

We are going to use a subset of the US Baby Names dataset from Kaggle. The file contains names from 2004 through 2014.

Step 1. Import the necessary libraries

In [1]:
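The cell's code did not survive here. The steps that follow only need pandas, so a minimal import cell, assuming the conventional `pd` alias, might look like this:

```python
# pandas supplies the DataFrame type and the CSV reader used in later steps.
import pandas as pd
```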

Step 2. Import the dataset from this address.

Step 3. Assign it to a variable called baby_names.

In [2]:
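The dataset address is not reproduced in this copy, so the sketch below reads a small in-memory CSV instead. The sample rows are made up for illustration; only the seven column names come from the recorded output. With the real file, the URL would be passed to `pd.read_csv` in place of the `StringIO` object.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the real CSV.
# The empty first header field is what pandas labels "Unnamed: 0".
csv_text = """,Id,Name,Year,Gender,State,Count
0,1,Emma,2004,F,AK,62
1,2,Madison,2004,F,AK,48
2,3,Jacob,2004,M,AK,50
"""

baby_names = pd.read_csv(io.StringIO(csv_text))

# Print column names, non-null counts and dtypes.
baby_names.info()
```

On the full dataset, `info()` prints a summary of the kind recorded in the output that follows.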

Out[2]:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1016395 entries, 0 to 1016394
Data columns (total 7 columns):
Unnamed: 0    1016395 non-null int64
Id            1016395 non-null int64
Name          1016395 non-null object
Year          1016395 non-null int64
Gender        1016395 non-null object
State         1016395 non-null object
Count         1016395 non-null int64
dtypes: int64(4), object(3)
memory usage: 54.3+ MB

Step 4. See the first 10 entries

In [3]:
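One way to preview the first rows is `DataFrame.head`. The frame below is a hypothetical twelve-row stand-in for `baby_names`, used only so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical stand-in with twelve rows (not the real data).
baby_names = pd.DataFrame({
    "Name": [f"Name{i}" for i in range(12)],
    "Count": list(range(12)),
})

# head(10) returns the first ten rows; the default without an argument is five.
first_ten = baby_names.head(10)
print(first_ten)
```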

Out[3]:

Step 5. Delete the columns 'Unnamed: 0' and 'Id'

In [4]:
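A possible way to remove both columns in one call is `DataFrame.drop` with the `columns` argument. The two-row frame is an invented sample with the same column names as the exercise data:

```python
import pandas as pd

# Hypothetical two-row frame with the exercise's column names.
baby_names = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Id": [1, 2],
    "Name": ["Emma", "Jacob"],
    "Year": [2004, 2004],
    "Gender": ["F", "M"],
    "State": ["AK", "AK"],
    "Count": [62, 50],
})

# drop returns a new frame, so the result is assigned back.
baby_names = baby_names.drop(columns=["Unnamed: 0", "Id"])
print(baby_names.head())
```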

Out[4]:

Step 6. Are there more male or female names in the dataset?

In [5]:
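The recorded output has the shape of a `value_counts` result on the `Gender` column. A sketch on invented sample rows:

```python
import pandas as pd

# Hypothetical sample rows (not the real data).
baby_names = pd.DataFrame({
    "Name": ["Emma", "Madison", "Jacob", "Emma", "Jacob"],
    "Gender": ["F", "F", "M", "F", "M"],
})

# value_counts tallies rows per gender, not distinct names.
gender_counts = baby_names["Gender"].value_counts()
print(gender_counts)
```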

Out[5]:

F    558846
M    457549
Name: Gender, dtype: int64

Step 7. Group the dataset by name and assign to names

In [6]:
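The one-column shape in the recorded output suggests that only `Count` was summed per name. The sketch below assumes that reading and selects `Count` explicitly, so text columns such as `Gender` are left out of the sum. The rows are invented.

```python
import pandas as pd

# Hypothetical sample rows (not the real data).
baby_names = pd.DataFrame({
    "Name": ["Emma", "Madison", "Jacob", "Emma", "Jacob"],
    "Gender": ["F", "F", "M", "F", "M"],
    "Count": [62, 48, 50, 40, 55],
})

# One row per name, holding the total count across all of that name's rows.
names = baby_names.groupby("Name")[["Count"]].sum()

print(names.shape)
print(names.sort_values("Count", ascending=False))
```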

Out[6]:

(17632, 1)

Step 8. How many different names exist in the dataset?

In [7]:
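Since the grouped frame has one row per name, its length is the number of distinct names. A sketch with a hypothetical three-name `names` frame:

```python
import pandas as pd

# Hypothetical grouped frame: one row per name (invented totals).
names = pd.DataFrame(
    {"Count": [102, 105, 48]},
    index=pd.Index(["Emma", "Jacob", "Madison"], name="Name"),
)

# The row count equals the number of distinct names.
n_names = len(names)
print(n_names)
```

Calling `nunique()` on the ungrouped `Name` column would be an equivalent route.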

Out[7]:

17632

Step 9. Which name has the most occurrences?

In [8]:
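`Series.idxmax` returns the index label of the largest value, which here is a name. The totals below are invented for illustration:

```python
import pandas as pd

# Hypothetical grouped frame: one row per name (invented totals).
names = pd.DataFrame(
    {"Count": [102, 105, 48]},
    index=pd.Index(["Emma", "Jacob", "Madison"], name="Name"),
)

# idxmax gives the label (the name) of the row with the highest total.
top_name = names["Count"].idxmax()
print(top_name)
```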

Out[8]:

'Jacob'

Step 10. How many different names have the least occurrences?

In [9]:
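One reading of this step is to count how many names share the minimum total. A sketch under that assumption, with invented totals in which two names tie for the minimum:

```python
import pandas as pd

# Hypothetical grouped frame; two names tie at the minimum total of 5.
names = pd.DataFrame(
    {"Count": [5, 12, 5, 9]},
    index=pd.Index(["Ada", "Emma", "Ike", "Jacob"], name="Name"),
)

# Compare every total with the minimum, then count the matches.
least_common = (names["Count"] == names["Count"].min()).sum()
print(least_common)
```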

Out[9]:

2578

Step 11. What is the median name occurrence?

In [10]:
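The output for this step did not survive in this copy. A sketch that computes the median total and then selects the names sitting exactly at it, using invented totals:

```python
import pandas as pd

# Hypothetical grouped frame (invented totals).
names = pd.DataFrame(
    {"Count": [5, 7, 9]},
    index=pd.Index(["Ada", "Emma", "Jacob"], name="Name"),
)

median_count = names["Count"].median()

# Rows whose total equals the median. With an even number of names the
# median can fall between two totals, leaving this selection empty.
at_median = names[names["Count"] == median_count]
print(median_count)
print(at_median)
```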

Out[10]:

Step 12. What is the standard deviation of name occurrences?

In [11]:
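`Series.std` computes the sample standard deviation (`ddof=1`) by default. A small sketch with invented totals chosen so the result is easy to check by hand:

```python
import pandas as pd

# Hypothetical grouped frame (invented totals).
names = pd.DataFrame(
    {"Count": [2, 4, 6]},
    index=pd.Index(["Ada", "Emma", "Jacob"], name="Name"),
)

# Mean is 4; squared deviations sum to 8; 8 / (3 - 1) = 4; sqrt(4) = 2.
count_std = names["Count"].std()
print(count_std)
```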

Out[11]:

11006.069467891111

Step 13. Get a summary with the mean, min, max, std and quartiles.

In [12]:
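`DataFrame.describe` reports count, mean, std, min, the three quartiles and max for numeric columns in a single call. A sketch on invented totals:

```python
import pandas as pd

# Hypothetical grouped frame (invented totals).
names = pd.DataFrame(
    {"Count": [2, 4, 6, 8]},
    index=pd.Index(["Ada", "Emma", "Ike", "Jacob"], name="Name"),
)

# One summary row per statistic, one column per numeric column.
summary = names.describe()
print(summary)
```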

Out[12]:
