Week 1 Project FutureLearn Learn to Code for Data Analysis

Project 1: Deaths by tuberculosis sample notebook

by Michel Wermelinger and Hal Snyder, 6 June 2018

This is the project notebook for the first part of The Open University's Learn to code for Data Analysis course.

In 2000, the United Nations set eight Millennium Development Goals (MDGs) to reduce poverty and diseases, improve gender equality and environmental sustainability, etc. Each goal is quantified and time-bound, to be achieved by the end of 2015. Goal 6 is to have halted and started reversing the spread of HIV, malaria and tuberculosis (TB). TB doesn't make headlines like Ebola, SARS (severe acute respiratory syndrome) and other epidemics, but is far deadlier. For more information, see the World Health Organization (WHO) page http://www.who.int/gho/tb/en/.

Given the population and number of deaths due to TB in some countries during one year, the following questions will be answered:

  • What are the total, maximum, minimum and average numbers of deaths in that year?
  • Which countries have the most and the least deaths?
  • What is the death rate (deaths per 100,000 inhabitants) for each country?
  • Which countries have the lowest and highest death rate?

The death rate allows for a better comparison of countries with widely different population sizes.

The data

The data consists of the total population and the total number of deaths due to TB (excluding HIV) in 2013 for each of the 194 countries in the data file.

The data was taken in July 2015 from http://apps.who.int/gho/data/node.main.POP107?lang=en (population) and http://apps.who.int/gho/data/node.main.1317?lang=en (deaths). The uncertainty bounds of the number of deaths were ignored.

The data was collected into an Excel file which should be in the same folder as this notebook.

import warnings
warnings.simplefilter('ignore', FutureWarning)

from pandas import *
data = read_excel('WHO POP TB all.xls')
data
Country Population (1000s) TB deaths
0 Afghanistan 30552 13000.00
1 Albania 3173 20.00
2 Algeria 39208 5100.00
3 Andorra 79 0.26
4 Angola 21472 6900.00
5 Antigua and Barbuda 90 1.20
6 Argentina 41446 570.00
7 Armenia 2977 170.00
8 Australia 23343 45.00
9 Austria 8495 29.00
10 Azerbaijan 9413 360.00
11 Bahamas 377 1.80
12 Bahrain 1332 9.60
13 Bangladesh 156595 80000.00
14 Barbados 285 2.00
15 Belarus 9357 850.00
16 Belgium 11104 18.00
17 Belize 332 20.00
18 Benin 10323 1300.00
19 Bhutan 754 88.00
20 Bolivia (Plurinational State of) 10671 430.00
21 Bosnia and Herzegovina 3829 190.00
22 Botswana 2021 440.00
23 Brazil 200362 4400.00
24 Brunei Darussalam 418 13.00
25 Bulgaria 7223 150.00
26 Burkina Faso 16935 1500.00
27 Burundi 10163 2300.00
28 Côte d'Ivoire 20316 4000.00
29 Cabo Verde 499 150.00
... ... ... ...
164 Suriname 539 12.00
165 Swaziland 1250 1100.00
166 Sweden 9571 13.00
167 Switzerland 8078 17.00
168 Syrian Arab Republic 21898 450.00
169 Tajikistan 8208 570.00
170 Thailand 67010 8100.00
171 The former Yugoslav republic of Macedonia 2107 33.00
172 Timor-Leste 1133 990.00
173 Togo 6817 810.00
174 Tonga 105 2.50
175 Trinidad and Tobago 1341 29.00
176 Tunisia 10997 230.00
177 Turkey 74933 310.00
178 Turkmenistan 5240 1300.00
179 Tuvalu 10 2.80
180 Uganda 37579 4100.00
181 Ukraine 45239 6600.00
182 United Arab Emirates 9346 64.00
183 United Kingdom of Great Britain and Northern I... 63136 340.00
184 United Republic of Tanzania 49253 6000.00
185 United States of America 320051 490.00
186 Uruguay 3407 40.00
187 Uzbekistan 28934 2200.00
188 Vanuatu 253 16.00
189 Venezuela (Bolivarian Republic of) 30405 480.00
190 Viet Nam 91680 17000.00
191 Yemen 24407 990.00
192 Zambia 14539 3600.00
193 Zimbabwe 14150 5700.00

194 rows × 3 columns
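Before computing anything, it can help to confirm that the table loaded as expected. The following is a minimal sketch, not part of the course code: it uses a hypothetical three-row stand-in called `sample` (rows copied from the table above) instead of the Excel file, and it imports pandas as `pd` only to keep the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for `data`: three rows copied from the table above.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Andorra'],
    'Population (1000s)': [30552, 3173, 79],
    'TB deaths': [13000.00, 20.00, 0.26],
})

# Number of rows and columns; the full table should report (194, 3).
print(sample.shape)

# Count of missing values per column; a non-zero count would need attention
# before summing or averaging.
missing = sample.isnull().sum()
print(missing)

# Column types: population and deaths should be numeric, not text.
print(sample.dtypes)
```

The same three checks can be run on `data` itself by replacing `sample` with `data`.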

The range of the problem

The column of interest is the last one.

tbColumn = data['TB deaths']

The total number of deaths in 2013 is:

tbColumn.sum()
1072677.97

The largest and smallest number of deaths in a single country are:

tbColumn.max()
240000.0
tbColumn.min()
0.0

From zero to almost a quarter of a million deaths is a huge range. The average number of deaths, over all countries in the data, can give a better idea of the seriousness of the problem in each country. The average can be computed as the mean or the median. Given the wide range of deaths, the median is probably a more sensible average measure.

tbColumn.mean()
5529.267886597938
tbColumn.median()
315.0

The median is far lower than the mean. This indicates that some of the countries had a very high number of TB deaths in 2013, pushing the value of the mean up.
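To illustrate how a single very large value pulls the mean up while leaving the median almost unchanged, here is a small sketch. It uses only seven death counts taken from the data in this notebook (San Marino, Albania, Tunisia, Turkey, Tajikistan, Uganda and India), so the numbers it prints are not those of the full table; the variable names are made up for the example.

```python
import pandas as pd

# Seven death counts from the 2013 data, from San Marino (0) to India (240000).
deaths = pd.Series([0.0, 20.0, 230.0, 310.0, 570.0, 4100.0, 240000.0])

# The same values without the largest one.
without_largest = deaths[deaths < deaths.max()]

# Mean and median with and without the largest value.
print(deaths.mean(), deaths.median())
print(without_largest.mean(), without_largest.median())
```

Dropping the largest value cuts the mean of this subset from about 35,000 to under 900, while the median only moves from 310 to 270.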

The most affected

To see the most affected countries, the table is sorted in ascending order by the last column, which puts those countries in the last rows.

data.sort_values('TB deaths')
Country Population (1000s) TB deaths
147 San Marino 31 0.00
125 Niue 1 0.01
111 Monaco 38 0.03
3 Andorra 79 0.26
129 Palau 21 0.36
40 Cook Islands 21 0.41
118 Nauru 10 0.67
76 Iceland 330 0.93
68 Grenada 106 1.10
5 Antigua and Barbuda 90 1.20
113 Montenegro 621 1.20
152 Seychelles 93 1.40
105 Malta 429 1.50
143 Saint Kitts and Nevis 54 1.60
11 Bahamas 377 1.80
14 Barbados 285 2.00
144 Saint Lucia 182 2.20
99 Luxembourg 530 2.20
44 Cyprus 1141 2.30
174 Tonga 105 2.50
50 Dominica 72 2.70
137 Qatar 2169 2.70
179 Tuvalu 10 2.80
145 Saint Vincent and the Grenadines 109 3.10
126 Norway 5043 4.40
146 Samoa 190 6.10
121 New Zealand 4506 6.30
103 Maldives 345 7.60
12 Bahrain 1332 9.60
164 Suriname 539 12.00
... ... ... ...
160 South Sudan 11296 4500.00
119 Nepal 27797 4600.00
2 Algeria 39208 5100.00
193 Zimbabwe 14150 5700.00
184 United Republic of Tanzania 49253 6000.00
181 Ukraine 45239 6600.00
46 Democratic People's Republic of Korea 24895 6700.00
4 Angola 21472 6900.00
158 Somalia 10496 7700.00
31 Cameroon 22254 7800.00
170 Thailand 67010 8100.00
88 Kenya 44354 9100.00
163 Sudan 37964 9700.00
30 Cambodia 15135 10000.00
100 Madagascar 22925 12000.00
0 Afghanistan 30552 13000.00
141 Russian Federation 142834 17000.00
190 Viet Nam 91680 17000.00
115 Mozambique 25834 18000.00
159 South Africa 52776 25000.00
116 Myanmar 53259 26000.00
134 Philippines 98394 27000.00
58 Ethiopia 94101 30000.00
36 China 1393337 41000.00
47 Democratic Republic of the Congo 67514 46000.00
128 Pakistan 182143 49000.00
78 Indonesia 249866 64000.00
13 Bangladesh 156595 80000.00
124 Nigeria 173615 160000.00
77 India 1252140 240000.00

194 rows × 3 columns
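When only the most affected countries are of interest, one alternative is to sort in descending order and keep just the first few rows. The sketch below assumes a hypothetical five-row stand-in called `sample`, with rows copied from the tables above; `sort_values` with `ascending=False` and `head` are standard pandas methods.

```python
import pandas as pd

# Hypothetical stand-in for `data`: five rows copied from the tables above.
sample = pd.DataFrame({
    'Country': ['Brazil', 'India', 'Nigeria', 'San Marino', 'Bangladesh'],
    'Population (1000s)': [200362, 1252140, 173615, 31, 156595],
    'TB deaths': [4400.00, 240000.00, 160000.00, 0.00, 80000.00],
})

# Sort from most to fewest deaths and keep only the first three rows.
top3 = sample.sort_values('TB deaths', ascending=False).head(3)
print(top3)
```

For this subset the three rows are India, Nigeria and Bangladesh, in that order, which matches the last three rows of the full sorted table.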

The table raises the possibility that a large number of deaths may be partly due to a large population. To compare the countries on an equal footing, the death rate per 100,000 inhabitants is computed.

populationColumn = data['Population (1000s)']
data['TB deaths (per 100,000)'] = tbColumn * 100 / populationColumn
data
Country Population (1000s) TB deaths TB deaths (per 100,000)
0 Afghanistan 30552 13000.00 42.550406
1 Albania 3173 20.00 0.630318
2 Algeria 39208 5100.00 13.007549
3 Andorra 79 0.26 0.329114
4 Angola 21472 6900.00 32.134873
5 Antigua and Barbuda 90 1.20 1.333333
6 Argentina 41446 570.00 1.375284
7 Armenia 2977 170.00 5.710447
8 Australia 23343 45.00 0.192777
9 Austria 8495 29.00 0.341377
10 Azerbaijan 9413 360.00 3.824498
11 Bahamas 377 1.80 0.477454
12 Bahrain 1332 9.60 0.720721
13 Bangladesh 156595 80000.00 51.087199
14 Barbados 285 2.00 0.701754
15 Belarus 9357 850.00 9.084108
16 Belgium 11104 18.00 0.162104
17 Belize 332 20.00 6.024096
18 Benin 10323 1300.00 12.593238
19 Bhutan 754 88.00 11.671088
20 Bolivia (Plurinational State of) 10671 430.00 4.029613
21 Bosnia and Herzegovina 3829 190.00 4.962131
22 Botswana 2021 440.00 21.771400
23 Brazil 200362 4400.00 2.196025
24 Brunei Darussalam 418 13.00 3.110048
25 Bulgaria 7223 150.00 2.076699
26 Burkina Faso 16935 1500.00 8.857396
27 Burundi 10163 2300.00 22.631113
28 Côte d'Ivoire 20316 4000.00 19.688915
29 Cabo Verde 499 150.00 30.060120
... ... ... ... ...
164 Suriname 539 12.00 2.226345
165 Swaziland 1250 1100.00 88.000000
166 Sweden 9571 13.00 0.135827
167 Switzerland 8078 17.00 0.210448
168 Syrian Arab Republic 21898 450.00 2.054982
169 Tajikistan 8208 570.00 6.944444
170 Thailand 67010 8100.00 12.087748
171 The former Yugoslav republic of Macedonia 2107 33.00 1.566208
172 Timor-Leste 1133 990.00 87.378641
173 Togo 6817 810.00 11.882060
174 Tonga 105 2.50 2.380952
175 Trinidad and Tobago 1341 29.00 2.162565
176 Tunisia 10997 230.00 2.091479
177 Turkey 74933 310.00 0.413703
178 Turkmenistan 5240 1300.00 24.809160
179 Tuvalu 10 2.80 28.000000
180 Uganda 37579 4100.00 10.910349
181 Ukraine 45239 6600.00 14.589182
182 United Arab Emirates 9346 64.00 0.684785
183 United Kingdom of Great Britain and Northern I... 63136 340.00 0.538520
184 United Republic of Tanzania 49253 6000.00 12.181999
185 United States of America 320051 490.00 0.153101
186 Uruguay 3407 40.00 1.174053
187 Uzbekistan 28934 2200.00 7.603511
188 Vanuatu 253 16.00 6.324111
189 Venezuela (Bolivarian Republic of) 30405 480.00 1.578688
190 Viet Nam 91680 17000.00 18.542757
191 Yemen 24407 990.00 4.056213
192 Zambia 14539 3600.00 24.760988
193 Zimbabwe 14150 5700.00 40.282686

194 rows × 4 columns
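The factor of 100 in the calculation above follows from the units: the population column is in thousands, so dividing deaths by the number of inhabitants and multiplying by 100,000 simplifies to multiplying deaths by 100 and dividing by the population in thousands. The sketch below checks this for one row of the table; the function name `death_rate_per_100k` is made up for this example.

```python
def death_rate_per_100k(deaths, population_thousands):
    """Deaths per 100,000 inhabitants, with population given in thousands."""
    inhabitants = population_thousands * 1000
    return deaths / inhabitants * 100000

# Swaziland in 2013: 1100 deaths, population 1250 thousand (from the table above).
long_form = death_rate_per_100k(1100, 1250)

# The shortcut used in the notebook: deaths * 100 / population in thousands.
shortcut = 1100 * 100 / 1250

print(long_form, shortcut)
```

Both expressions come to 88 deaths per 100,000 inhabitants (up to floating-point rounding), matching the value shown for Swaziland.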

data.sort_values('TB deaths (per 100,000)')
Country Population (1000s) TB deaths TB deaths (per 100,000)
147 San Marino 31 0.00 0.000000
111 Monaco 38 0.03 0.078947
126 Norway 5043 4.40 0.087250
120 Netherlands 16759 20.00 0.119339
137 Qatar 2169 2.70 0.124481
166 Sweden 9571 13.00 0.135827
121 New Zealand 4506 6.30 0.139814
185 United States of America 320051 490.00 0.153101
16 Belgium 11104 18.00 0.162104
32 Canada 35182 62.00 0.176226
8 Australia 23343 45.00 0.192777
113 Montenegro 621 1.20 0.193237
44 Cyprus 1141 2.30 0.201578
82 Israel 7733 16.00 0.206905
167 Switzerland 8078 17.00 0.210448
45 Czech Republic 10702 28.00 0.261633
76 Iceland 330 0.93 0.281818
60 Finland 5426 17.00 0.313306
43 Cuba 11266 37.00 0.328422
3 Andorra 79 0.26 0.329114
9 Austria 8495 29.00 0.341377
105 Malta 429 1.50 0.349650
65 Germany 82727 300.00 0.362639
81 Ireland 4627 18.00 0.389021
177 Turkey 74933 310.00 0.413703
99 Luxembourg 530 2.20 0.415094
48 Denmark 5619 24.00 0.427122
11 Bahamas 377 1.80 0.477454
86 Jordan 7274 35.00 0.481166
83 Italy 60990 310.00 0.508280
... ... ... ... ...
29 Cabo Verde 499 150.00 30.060120
58 Ethiopia 94101 30000.00 31.880639
4 Angola 21472 6900.00 32.134873
131 Papua New Guinea 7321 2400.00 32.782407
31 Cameroon 22254 7800.00 35.049879
106 Marshall Islands 53 21.00 39.622642
160 South Sudan 11296 4500.00 39.837110
193 Zimbabwe 14150 5700.00 40.282686
0 Afghanistan 30552 13000.00 42.550406
153 Sierra Leone 6092 2600.00 42.678923
39 Congo 4448 2000.00 44.964029
95 Lesotho 2074 960.00 46.287367
159 South Africa 52776 25000.00 47.370017
33 Central African Republic 4616 2200.00 47.660312
116 Myanmar 53259 26000.00 48.818040
96 Liberia 4294 2100.00 48.905449
13 Bangladesh 156595 80000.00 51.087199
100 Madagascar 22925 12000.00 52.344602
92 Lao People's Democratic Republic 6770 3600.00 53.175775
62 Gabon 1672 910.00 54.425837
117 Namibia 2303 1300.00 56.448111
30 Cambodia 15135 10000.00 66.072019
47 Democratic Republic of the Congo 67514 46000.00 68.134017
115 Mozambique 25834 18000.00 69.675621
71 Guinea-Bissau 1704 1200.00 70.422535
158 Somalia 10496 7700.00 73.361280
172 Timor-Leste 1133 990.00 87.378641
165 Swaziland 1250 1100.00 88.000000
124 Nigeria 173615 160000.00 92.157936
49 Djibouti 873 870.00 99.656357

194 rows × 4 columns
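The two sorted tables rank countries quite differently. As a rough sketch of how to pick out the extremes directly, the code below builds a hypothetical four-row stand-in called `sample` from rows in the tables above, indexes it by country, and uses the pandas `idxmax` method to find which country has the largest value in each column.

```python
import pandas as pd

# Hypothetical stand-in for `data`: four rows copied from the tables above.
sample = pd.DataFrame({
    'Country': ['India', 'Nigeria', 'Djibouti', 'San Marino'],
    'Population (1000s)': [1252140, 173615, 873, 31],
    'TB deaths': [240000.00, 160000.00, 870.00, 0.00],
}).set_index('Country')

# Same rate calculation as in the notebook.
sample['TB deaths (per 100,000)'] = (
    sample['TB deaths'] * 100 / sample['Population (1000s)']
)

# idxmax returns the index label (here, the country) of the largest value.
most_deaths = sample['TB deaths'].idxmax()
highest_rate = sample['TB deaths (per 100,000)'].idxmax()
print(most_deaths, highest_rate)
```

India has the most deaths in absolute terms, but Djibouti has the highest rate: India's roughly 19 deaths per 100,000 inhabitants is far below Djibouti's figure of almost 100.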

Conclusions

Together, the countries in the data had about 1,073 thousand deaths due to TB in 2013. The median shows that half of these countries had fewer than 315 deaths. The much higher mean (over 5,529) indicates that some countries had a very high number. The least affected were San Marino and Niue, with 0 and 0.01 deaths respectively, and the most affected were Nigeria and India, with 160 thousand and 240 thousand deaths in a single year. However, taking the population size into account, the least affected were San Marino and Monaco, with less than 0.08 deaths per 100,000 inhabitants, and the most affected were Nigeria and Djibouti, with over 92 deaths per 100,000 inhabitants.

One should not forget that most values are estimates. Nevertheless, they convey the message that TB is still a major cause of fatalities, and that there is a huge disparity between countries, with several being highly affected.