Wine
Introduction:
This exercise is an adaptation of the UCI Wine dataset. Its only purpose is to practice deleting data with pandas.
Step 1. Import the necessary libraries
In [2]:
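A minimal sketch of the imports the later steps appear to need. NumPy is an assumption here, based on the NaN values and random numbers used further down.

```python
import numpy as np   # NaN constant and random number generation
import pandas as pd  # DataFrame loading and manipulation
```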
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called wine
In [3]:
Out[3]:
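One way to load the data, sketched under assumptions: the in-memory buffer `raw` and its made-up rows are a hypothetical stand-in for the remote file, and `header=None` is a choice made for this sketch, not something the exercise states.

```python
import io

import pandas as pd

# Hypothetical stand-in for the remote file: made-up rows with
# 14 comma-separated fields and no header row
raw = io.StringIO(
    "1,14.1,1.7,2.4,15.5,120,2.8,3.0,0.3,2.2,5.6,1.0,3.9,1000\n"
    "1,13.2,1.8,2.1,11.0,100,2.6,2.7,0.2,1.3,4.4,1.1,3.4,1050\n"
    "1,13.1,2.4,2.7,18.5,101,2.8,3.2,0.3,2.8,5.7,1.0,3.2,1185\n"
)

# pd.read_csv also accepts a URL string in place of the buffer.
# header=None (an assumption) keeps every row as data and labels
# the columns 0..13; without it the first line becomes the column names.
wine = pd.read_csv(raw, header=None)
print(wine.head())
```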
Step 4. Delete the first, fourth, seventh, ninth, eleventh, thirteenth and fourteenth columns
In [4]:
Out[4]:
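A possible approach to deleting columns by position, shown on a hypothetical 14-column frame of placeholder integers. The ordinal positions in the step are translated to zero-based positions.

```python
import numpy as np
import pandas as pd

# Hypothetical 14-column frame standing in for the loaded data
wine = pd.DataFrame(np.arange(4 * 14).reshape(4, 14))

# Ordinals 1, 4, 7, 9, 11, 13, 14 become zero-based 0, 3, 6, 8, 10, 12, 13
to_drop = wine.columns[[0, 3, 6, 8, 10, 12, 13]]
wine = wine.drop(columns=to_drop)

print(wine.columns.tolist())  # [1, 2, 4, 5, 7, 9, 11]
```

Selecting the labels through `wine.columns[...]` first means the same code works whether the columns are numbered or named.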
Step 5. Assign the columns as below:
The attributes are (donated by Riccardo Leardi, riclea '@' anchem.unige.it):
alcohol
malic_acid
alcalinity_of_ash
magnesium
flavanoids
proanthocyanins
hue
In [5]:
Out[5]:
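Renaming can be sketched as a direct assignment to `columns`. The frame of zeros below is a hypothetical stand-in for the seven columns left after the deletion step.

```python
import numpy as np
import pandas as pd

# Hypothetical 7-column frame standing in for the result of the deletion step
wine = pd.DataFrame(np.zeros((3, 7)))

# The list must have exactly one name per existing column, in order
wine.columns = [
    "alcohol", "malic_acid", "alcalinity_of_ash", "magnesium",
    "flavanoids", "proanthocyanins", "hue",
]
print(wine.head())
```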
Step 6. Set the values of the first 3 rows of alcohol to NaN
In [6]:
Out[6]:
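A minimal sketch of positional assignment with `.iloc`, using a small hypothetical frame with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with made-up values
wine = pd.DataFrame({
    "alcohol": [14.1, 13.2, 13.1, 14.3, 13.2],
    "magnesium": [120.0, 100.0, 101.0, 113.0, 118.0],
})

# .iloc is positional, so 0:3 covers the first three rows
# regardless of what the index labels are
wine.iloc[0:3, wine.columns.get_loc("alcohol")] = np.nan
print(wine)
```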
Step 7. Now set the values of rows 3 and 4 of magnesium to NaN
In [7]:
Out[7]:
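The step does not say whether "rows 3 and 4" means the third and fourth rows or the index labels 3 and 4. The sketch below assumes the former (positions 2 and 3) and uses a hypothetical frame in which magnesium is already stored as floats, since NaN is a float value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; magnesium is held as float so it can store NaN
wine = pd.DataFrame({
    "alcohol": [14.1, 13.2, 13.1, 14.3, 13.2],
    "magnesium": [120.0, 100.0, 101.0, 113.0, 118.0],
})

# Assumption: "rows 3 and 4" = third and fourth rows = positions 2 and 3
wine.iloc[2:4, wine.columns.get_loc("magnesium")] = np.nan
print(wine)
```

If the labels 3 and 4 were meant instead, `wine.loc[3:4, "magnesium"] = np.nan` would be the label-based equivalent, since `.loc` slices include both endpoints.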
Step 8. Fill the NaN values with 10 in alcohol and 100 in magnesium
In [8]:
Out[8]:
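One way to fill each column with its own value is to pass a dictionary to `fillna`. The frame below is a hypothetical sample with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with missing values in both columns
wine = pd.DataFrame({
    "alcohol": [np.nan, np.nan, np.nan, 14.3, 13.2],
    "magnesium": [120.0, 100.0, np.nan, np.nan, 118.0],
})

# Keys are column names, values are the per-column replacements
wine = wine.fillna({"alcohol": 10, "magnesium": 100})
print(wine)
```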
Step 9. Count the number of missing values
In [9]:
Out[9]:
alcohol 0
malic_acid 0
alcalinity_of_ash 0
magnesium 0
flavanoids 0
proanthocyanins 0
hue 0
dtype: int64
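Counting missing values per column can be sketched with `isnull().sum()`. The hypothetical frame below deliberately keeps a few NaN values so the counts are non-zero.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one NaN in alcohol and two in magnesium
wine = pd.DataFrame({
    "alcohol": [10.0, 14.3, np.nan],
    "magnesium": [100.0, np.nan, np.nan],
})

# isnull() gives a boolean frame; summing counts the True values per column
missing = wine.isnull().sum()
print(missing)
```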
Step 10. Create an array of 10 random integers from 0 up to (but not including) 10
In [10]:
Out[10]:
array([2, 3, 0, 5, 0, 9, 4, 0, 7, 2])
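A minimal sketch using `np.random.randint`, whose upper bound is exclusive, which matches the 0 to 9 range of the values shown. The name `random_idx` is a hypothetical choice.

```python
import numpy as np

# Ten integers drawn from 0..9 (the upper bound 10 is excluded);
# values differ on every run unless a seed is set
random_idx = np.random.randint(10, size=10)
print(random_idx)
```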
Step 11. Use the random numbers you generated as an index and assign NaN to each of those cells.
In [11]:
Out[11]:
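A sketch of label-based assignment with `.loc`, reusing the array printed in Out[10] so the result is reproducible. Writing to alcohol is an assumption, based on it being the only column with missing values in Out[12]; the 12-row frame is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-row frame with a default 0..11 index
wine = pd.DataFrame({
    "alcohol": np.linspace(12.0, 14.2, 12),
    "hue": np.linspace(0.9, 1.2, 12),
})

# The array shown in Out[10]
random_idx = np.array([2, 3, 0, 5, 0, 9, 4, 0, 7, 2])

# Assumption: NaN goes into the alcohol column only
wine.loc[random_idx, "alcohol"] = np.nan
print(wine)
```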
Step 12. How many missing values do we have?
In [12]:
Out[12]:
alcohol 7
malic_acid 0
alcalinity_of_ash 0
magnesium 0
flavanoids 0
proanthocyanins 0
hue 0
dtype: int64
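The count is 7 and not 10 because the array from Out[10] contains repeated values, and assigning NaN to the same row twice still leaves a single missing cell. A short check of the distinct values:

```python
import numpy as np

# The array shown in Out[10]
random_idx = np.array([2, 3, 0, 5, 0, 9, 4, 0, 7, 2])

# np.unique returns the sorted distinct values
distinct = np.unique(random_idx)
print(distinct)       # [0 2 3 4 5 7 9]
print(len(distinct))  # 7
```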
Step 13. Delete the rows that contain missing values
In [13]:
Out[13]:
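Dropping incomplete rows can be sketched with `dropna`. The frame is a hypothetical sample with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows 0 and 2 have a missing alcohol value
wine = pd.DataFrame({
    "alcohol": [np.nan, 10.0, np.nan, 14.06],
    "hue": [1.04, 1.05, 1.03, 0.86],
})

# axis=0 drops rows; how="any" drops a row if any of its cells is NaN
wine = wine.dropna(axis=0, how="any")
print(wine.index.tolist())  # [1, 3]
```

The surviving rows keep their original index labels, which is why the index has gaps afterwards.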
Step 14. Print only the non-null values in alcohol
In [14]:
Out[14]:
1 True
6 True
8 True
10 True
11 True
12 True
13 True
14 True
15 True
16 True
17 True
18 True
19 True
20 True
21 True
22 True
23 True
24 True
25 True
26 True
27 True
28 True
29 True
30 True
31 True
32 True
33 True
34 True
35 True
36 True
...
147 True
148 True
149 True
150 True
151 True
152 True
153 True
154 True
155 True
156 True
157 True
158 True
159 True
160 True
161 True
162 True
163 True
164 True
165 True
166 True
167 True
168 True
169 True
170 True
171 True
172 True
173 True
174 True
175 True
176 True
Name: alcohol, dtype: bool
In [15]:
Out[15]:
1 10.00
6 14.06
8 13.86
10 14.12
11 13.75
12 14.75
13 14.38
14 13.63
15 14.30
16 13.83
17 14.19
18 13.64
19 14.06
20 12.93
21 13.71
22 12.85
23 13.50
24 13.05
25 13.39
26 13.30
27 13.87
28 14.02
29 13.73
30 13.58
31 13.68
32 13.76
33 13.51
34 13.48
35 13.28
36 13.05
...
147 13.32
148 13.08
149 13.50
150 12.79
151 13.11
152 13.23
153 12.58
154 13.17
155 13.84
156 12.45
157 14.34
158 13.48
159 12.36
160 13.69
161 12.85
162 12.96
163 13.78
164 13.73
165 13.45
166 12.82
167 13.58
168 13.40
169 12.20
170 12.77
171 14.16
172 13.71
173 13.40
174 13.27
175 13.17
176 14.13
Name: alcohol, dtype: float64
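The two outputs above correspond to a boolean mask and to the values selected by that mask. A minimal sketch on a hypothetical single-column frame; the names `mask` and `non_null` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two missing values
wine = pd.DataFrame({"alcohol": [np.nan, 10.0, np.nan, 14.06]})

# True where a value is present, False where it is NaN
mask = wine["alcohol"].notnull()

# Keep only the rows where the mask is True
non_null = wine.loc[mask, "alcohol"]
print(non_null)
```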
Step 15. Reset the index, so it starts with 0 again
In [16]:
Out[16]:
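Renumbering can be sketched with `reset_index`. The frame below is a hypothetical sample whose index has gaps like the ones left by deleting rows.

```python
import pandas as pd

# Hypothetical sample with a gapped index
wine = pd.DataFrame({"alcohol": [10.0, 14.06, 13.86]}, index=[1, 6, 8])

# drop=True discards the old labels instead of adding them as a column
wine = wine.reset_index(drop=True)
print(wine.index.tolist())  # [0, 1, 2]
```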
BONUS: Create your own question and answer it.
In [ ]: