ubuntu2004
Loops II - lists
A Python list is a sequence of numbers or strings. Going through a list one item at a time and doing something with each item is a very common thing to do. In Python this is called looping through a list.
Looping through a list of strings
In this example the iterating variable is called dna_seq
.
In Notebook 9 we tested whether the start codon "ATG" was in a particular DNA sequence like so
We can do exactly the same but now with each DNA sequence in turn in the list.
On the first iteration of the loop the string "GCTACGCTGGC" is assigned to the variable dna_seq
.
We then test if the start codon "ATG" is in dna_seq
. Notice that the condition if 'ATG' in dna_seq:
is idented because it is within the loop. But also notice that the print()
statement is doubly-indented because it is inside the condition which is inside the loop.
Looping through a list of numbers
To help motivate your understanding of looping through lists, let's find the average diameter of types of white blood cells found in human blood. The diameters are given in this table.
Cell type | Diameter (m) |
---|---|
Neutrophil | 11 |
Eosinophil | 11 |
Basophil | 13.5 |
Small lymphocyte | 7.5 |
Large lymphocyte | 13.5 |
Monocyte | 22.5 |
Now let's find the average diameter and assign it to a variable called average_diameter
.
The following code shows one way to do this without using a loop.
The average is the sum of all the values, which we access using the each item's index within the list, divided by the number of items in the list, which is six. Remember that diameters[0]
refers to the first item in the list, in this case 11.
Notice that we report the average to 2 decimal places. A rule of thumb is to report descriptive statistics to 1 significant figure (sig. fig. for short) more than the data. The cell type diameters have a maximum of 3 significant figures (e.g. 13.5), so the average should have 4 significant figures. To do that use the format notation :.4g
in an f-string.
Calculating the average diameter like this is tedious and inefficient. It is also not re-useable. Which means that if we have a different set of data with a different number of values we would have to write the whole code over again.
What we really want is code that will take a list of numbers of any size and calculate the average of those numbers. That's where loops come in handy.
To calculate an average we need two things: 1) the number of values in the list and 2) the sum of all the values in the list.
Step 1 is simple. The number of values, or items, in a list is its length which is obtained with the len()
function.
Next let's see how to code Step 2, the sum of all values in the list.
Sum the values in a list
Let's plan how we might sum the diameters in the list diameters
.
We need a variable that keeps a running sum of the diameters. This variable will initially be set to zero.
Loop through the list one diameter at a time.
Add the current diameter to the running sum.
The following code shows how this is implemented.
The first thing we need is a variable to store the sum of the diameters as we loop through the list one item at a time. Let's call this variable sum_of_diameters
so that it's clear what this variable means.
We have to initialise its value to zero:
The loop starts at the line for d in diameters:
. The colon at the end of the line tells Python that anything indented that follows is within the loop.
The iterating variable d
, is assigned the value of each item in the loop one at a time. On the first iteration of the loop d
is assigned the value 11.
The running sum sum_of_diameters
is incremented by the value of d
. Notice that the line sum_of_diameters += d
is indented. This tells Python that it is within the loop.
We print out the value of d
and sum_of_diameters
to show what happens in the loop. Normally we wouldn't do this.
As there are six items in the list, the loop will execute six times, each time incrementing sum_of_diameters
by the current value of d
.
Once the loop has finished the code drops out of the bottom of the loop and prints the final sum in an f-string.
Average value of items in a list
We're almost there. We have the number of values in the list diameters
and we've summed them. We now need to put it all together into a small program to calculate the average.
Run the following code to see the output. The print()
statement within the loop has been removed as it is not needed.
This code produces exactly the same average as the code at the top of the Notebook. It looks more complicated - it is - but it has two important advantages: it is general and reusable.
It is general because we can calculate the average of any list of numbers however long it may be. It is reusable because we don't need to edit the code each time we want to calculate the average of a list of numbers.