Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
guipsamora
GitHub Repository: guipsamora/pandas_exercises
Path: blob/master/05_Merge/Auto_MPG/Exercises_with_solutions.ipynb
548 views
Kernel: Python [default]

MPG Cars

Check out Cars Exercises Video Tutorial to watch a data scientist go through the exercises

Introduction:

The following exercise utilizes data from UC Irvine Machine Learning Repository

Step 1. Import the necessary libraries

import pandas as pd import numpy as np

Step 2. Import the first dataset cars1 and cars2.

Step 3. Assign each to a to a variable called cars1 and cars2

cars1 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv") cars2 = pd.read_csv("https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv") print(cars1.head()) print(cars2.head())
mpg cylinders displacement horsepower weight acceleration model \ 0 18.0 8 307 130 3504 12.0 70 1 15.0 8 350 165 3693 11.5 70 2 18.0 8 318 150 3436 11.0 70 3 16.0 8 304 150 3433 12.0 70 4 17.0 8 302 140 3449 10.5 70 origin car Unnamed: 9 Unnamed: 10 Unnamed: 11 \ 0 1 chevrolet chevelle malibu NaN NaN NaN 1 1 buick skylark 320 NaN NaN NaN 2 1 plymouth satellite NaN NaN NaN 3 1 amc rebel sst NaN NaN NaN 4 1 ford torino NaN NaN NaN Unnamed: 12 Unnamed: 13 0 NaN NaN 1 NaN NaN 2 NaN NaN 3 NaN NaN 4 NaN NaN mpg cylinders displacement horsepower weight acceleration model \ 0 33.0 4 91 53 1795 17.4 76 1 20.0 6 225 100 3651 17.7 76 2 18.0 6 250 78 3574 21.0 76 3 18.5 6 250 110 3645 16.2 76 4 17.5 6 258 95 3193 17.8 76 origin car 0 3 honda civic 1 1 dodge aspen se 2 1 ford granada ghia 3 1 pontiac ventura sj 4 1 amc pacer d/l

Step 4. Oops, it seems our first dataset has some unnamed blank columns, fix cars1

cars1 = cars1.loc[:, "mpg":"car"] cars1.head()

Step 5. What is the number of observations in each dataset?

print(cars1.shape) print(cars2.shape)
(198, 9) (200, 9)

Step 6. Join cars1 and cars2 into a single DataFrame called cars

cars = cars1.append(cars2) cars

Step 7. Oops, there is a column missing, called owners. Create a random number Series from 15,000 to 73,000.

nr_owners = np.random.randint(15000, high=73001, size=398, dtype='l') nr_owners
array([29487, 25680, 65268, 31827, 69215, 72602, 52693, 58440, 16183, 45014, 32318, 72942, 62163, 35951, 57625, 59355, 36533, 67048, 58159, 69743, 25146, 22755, 44966, 46792, 56553, 65013, 55908, 69563, 22030, 59561, 15593, 52998, 54795, 16169, 24809, 35580, 46590, 38792, 43099, 37166, 21390, 56496, 68606, 21110, 56334, 45477, 51961, 27625, 51176, 30796, 61809, 65450, 67375, 23342, 27499, 50585, 57302, 56191, 60281, 32865, 58605, 66374, 15315, 31791, 28670, 38796, 69214, 41055, 32353, 31574, 65799, 42998, 72785, 18415, 31977, 29812, 65439, 21161, 60871, 67151, 22179, 32821, 55392, 34586, 67937, 31646, 66397, 35258, 63815, 71291, 51130, 27684, 49648, 52691, 50681, 68185, 32635, 51553, 28970, 19112, 26035, 67666, 55471, 51477, 62055, 53003, 41265, 18565, 48851, 48673, 45832, 67891, 57638, 29240, 41236, 16950, 31449, 50528, 22397, 15876, 26414, 16736, 23896, 46104, 17583, 65951, 38538, 31443, 19299, 46095, 31239, 19290, 38051, 68575, 61755, 22560, 34460, 35395, 34608, 56906, 44895, 48429, 20900, 49770, 50513, 59402, 26893, 37233, 19036, 20523, 18765, 46333, 42831, 53698, 25218, 63106, 16928, 34901, 43674, 65453, 54428, 68502, 19043, 20325, 45039, 29466, 49672, 67972, 30547, 22522, 69354, 40489, 72887, 15724, 51442, 65182, 64555, 42138, 72988, 20861, 67898, 20768, 36415, 47480, 16820, 48739, 62610, 43473, 23002, 43488, 62581, 37724, 63019, 44912, 35595, 59188, 51814, 65283, 53479, 27660, 38237, 22957, 47870, 15533, 41944, 51830, 56676, 57481, 48529, 72220, 66675, 50099, 30585, 25436, 49195, 26050, 24899, 37213, 25870, 67447, 23808, 71275, 67572, 18545, 43553, 54858, 23077, 33705, 31282, 26298, 23742, 36110, 51491, 18019, 60655, 27453, 35563, 63627, 35315, 56717, 59281, 55634, 18415, 59570, 47320, 20110, 18425, 19352, 18032, 31816, 28573, 66030, 54723, 21592, 37160, 59518, 35629, 47619, 52359, 34566, 64932, 24072, 39445, 31203, 63975, 62041, 70175, 51029, 32058, 19428, 65553, 50799, 48190, 68061, 68201, 53389, 15901, 44585, 54723, 30446, 63716, 57488, 67134, 22033, 53694, 40002, 24854, 59747, 59827, 53378, 53196, 68686, 20784, 28181, 33044, 41694, 39857, 57296, 69021, 17359, 29794, 22515, 55877, 22806, 50027, 56787, 50844, 17420, 65259, 19141, 40204, 19530, 30116, 34973, 15641, 53492, 59574, 59082, 64400, 70163, 43058, 69696, 67996, 26158, 32936, 45461, 47390, 32368, 15400, 40895, 16572, 31776, 62121, 56704, 39335, 27716, 52565, 50831, 45049, 25173, 25018, 18606, 71177, 66288, 46754, 68175, 35829, 24959, 54792, 19059, 29092, 58736, 62938, 44733, 17884, 33905, 33965, 24641, 52257, 28178, 29515, 37703, 56036, 51556, 23590, 61888, 70224, 53730, 41328, 16501, 30360, 54106, 29101, 35631, 56173, 30424, 46887, 23657, 17723, 71709, 45270, 30380, 27779, 33774, 36379, 47127, 63625, 16750, 65740, 53802, 40995, 37487, 42791, 21825, 69344, 63210, 15982, 20259])

Step 8. Add the column owners to cars

cars['owners'] = nr_owners cars.tail()