MPG Cars

Check out Cars Exercises Video Tutorial to watch a data scientist go through the exercises

Introduction:

The following exercise utilizes data from UC Irvine Machine Learning Repository

Step 1. Import the necessary libraries

In [24]:

import pandas as pd
import numpy as np

Step 2. Import the first dataset cars1 and cars2.

Step 3. Assign each to a to a variable called cars1 and cars2

In [2]:

cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv")
cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv")

print(cars1.head())
print(cars2.head())

Out[2]:

    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
18.0          8           307        130    3504          12.0     70   
15.0          8           350        165    3693          11.5     70   
18.0          8           318        150    3436          11.0     70   
16.0          8           304        150    3433          12.0     70   
17.0          8           302        140    3449          10.5     70   

   origin                        car  Unnamed: 9  Unnamed: 10  Unnamed: 11  \
     1  chevrolet chevelle malibu         NaN          NaN          NaN   
     1          buick skylark 320         NaN          NaN          NaN   
     1         plymouth satellite         NaN          NaN          NaN   
     1              amc rebel sst         NaN          NaN          NaN   
     1                ford torino         NaN          NaN          NaN   

   Unnamed: 12  Unnamed: 13  
        NaN          NaN  
        NaN          NaN  
        NaN          NaN  
        NaN          NaN  
        NaN          NaN  
    mpg  cylinders  displacement horsepower  weight  acceleration  model  \
33.0          4            91         53    1795          17.4     76   
20.0          6           225        100    3651          17.7     76   
18.0          6           250         78    3574          21.0     76   
18.5          6           250        110    3645          16.2     76   
17.5          6           258         95    3193          17.8     76   

   origin                 car  
     3         honda civic  
     1      dodge aspen se  
     1   ford granada ghia  
     1  pontiac ventura sj  
     1       amc pacer d/l  

Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

In [12]:

cars1 = cars1.loc[:, "mpg":"car"]
cars1.head()

Out[12]:

Step 5. What is the number of observations in each dataset?

In [14]:

print(cars1.shape)
print(cars2.shape)

Out[14]:

(198, 9)
(200, 9)

Step 6. Join cars1 and cars2 into a single DataFrame called cars

In [23]:

cars = cars1.append(cars2)
cars

Out[23]:

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

In [33]:

nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l')
nr_owners

Out[33]:

array([29487, 25680, 65268, 31827, 69215, 72602, 52693, 58440, 16183,
       45014, 32318, 72942, 62163, 35951, 57625, 59355, 36533, 67048,
       58159, 69743, 25146, 22755, 44966, 46792, 56553, 65013, 55908,
       69563, 22030, 59561, 15593, 52998, 54795, 16169, 24809, 35580,
       46590, 38792, 43099, 37166, 21390, 56496, 68606, 21110, 56334,
       45477, 51961, 27625, 51176, 30796, 61809, 65450, 67375, 23342,
       27499, 50585, 57302, 56191, 60281, 32865, 58605, 66374, 15315,
       31791, 28670, 38796, 69214, 41055, 32353, 31574, 65799, 42998,
       72785, 18415, 31977, 29812, 65439, 21161, 60871, 67151, 22179,
       32821, 55392, 34586, 67937, 31646, 66397, 35258, 63815, 71291,
       51130, 27684, 49648, 52691, 50681, 68185, 32635, 51553, 28970,
       19112, 26035, 67666, 55471, 51477, 62055, 53003, 41265, 18565,
       48851, 48673, 45832, 67891, 57638, 29240, 41236, 16950, 31449,
       50528, 22397, 15876, 26414, 16736, 23896, 46104, 17583, 65951,
       38538, 31443, 19299, 46095, 31239, 19290, 38051, 68575, 61755,
       22560, 34460, 35395, 34608, 56906, 44895, 48429, 20900, 49770,
       50513, 59402, 26893, 37233, 19036, 20523, 18765, 46333, 42831,
       53698, 25218, 63106, 16928, 34901, 43674, 65453, 54428, 68502,
       19043, 20325, 45039, 29466, 49672, 67972, 30547, 22522, 69354,
       40489, 72887, 15724, 51442, 65182, 64555, 42138, 72988, 20861,
       67898, 20768, 36415, 47480, 16820, 48739, 62610, 43473, 23002,
       43488, 62581, 37724, 63019, 44912, 35595, 59188, 51814, 65283,
       53479, 27660, 38237, 22957, 47870, 15533, 41944, 51830, 56676,
       57481, 48529, 72220, 66675, 50099, 30585, 25436, 49195, 26050,
       24899, 37213, 25870, 67447, 23808, 71275, 67572, 18545, 43553,
       54858, 23077, 33705, 31282, 26298, 23742, 36110, 51491, 18019,
       60655, 27453, 35563, 63627, 35315, 56717, 59281, 55634, 18415,
       59570, 47320, 20110, 18425, 19352, 18032, 31816, 28573, 66030,
       54723, 21592, 37160, 59518, 35629, 47619, 52359, 34566, 64932,
       24072, 39445, 31203, 63975, 62041, 70175, 51029, 32058, 19428,
       65553, 50799, 48190, 68061, 68201, 53389, 15901, 44585, 54723,
       30446, 63716, 57488, 67134, 22033, 53694, 40002, 24854, 59747,
       59827, 53378, 53196, 68686, 20784, 28181, 33044, 41694, 39857,
       57296, 69021, 17359, 29794, 22515, 55877, 22806, 50027, 56787,
       50844, 17420, 65259, 19141, 40204, 19530, 30116, 34973, 15641,
       53492, 59574, 59082, 64400, 70163, 43058, 69696, 67996, 26158,
       32936, 45461, 47390, 32368, 15400, 40895, 16572, 31776, 62121,
       56704, 39335, 27716, 52565, 50831, 45049, 25173, 25018, 18606,
       71177, 66288, 46754, 68175, 35829, 24959, 54792, 19059, 29092,
       58736, 62938, 44733, 17884, 33905, 33965, 24641, 52257, 28178,
       29515, 37703, 56036, 51556, 23590, 61888, 70224, 53730, 41328,
       16501, 30360, 54106, 29101, 35631, 56173, 30424, 46887, 23657,
       17723, 71709, 45270, 30380, 27779, 33774, 36379, 47127, 63625,
       16750, 65740, 53802, 40995, 37487, 42791, 21825, 69344, 63210,
       15982, 20259])

Step 8. Add the column owners to cars

In [34]:

cars['owners'] = nr_owners
cars.tail()

Out[34]:

MPG Cars

Introduction:

Step 1. Import the necessary libraries

Step 2. Import the first dataset cars1 and cars2.

Step 3. Assign each to a to a variable called cars1 and cars2

Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

Step 5. What is the number of observations in each dataset?

Step 6. Join cars1 and cars2 into a single DataFrame called cars

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

Step 8. Add the column owners to cars

Product

Resources

Company