
Loops II - lists

A Python list is an ordered sequence of items, such as numbers or strings. Going through a list one item at a time and doing something with each item is a very common task. In Python this is called looping through a list.

Looping through a list of strings

Run the following code to see how to loop through a list of strings.
# Assign a list of DNA sequences to a list called DNA_sequences.
DNA_sequences = ["GCTACGCTGGC", "ATGCACGACGT", "TAAGCCGGTAG", "AGTTGGAAATC"]

# Loop through the list one item at a time and print each DNA sequence.
for dna_seq in DNA_sequences:
    print( dna_seq )

GCTACGCTGGC
ATGCACGACGT
TAAGCCGGTAG
AGTTGGAAATC

In this example the iterating variable is called dna_seq.

In Notebook 9 we tested whether the start codon "ATG" was in a particular DNA sequence like so

if 'ATG' in dna_seq:
    print( 'The DNA sequence contains a start codon' )

We can do exactly the same but now with each DNA sequence in turn in the list.

Run the following code to see how this is done.
DNA_sequences = ["GCTACGCTGGC", "ATGCACGACGT", "TAAGCCGGTAG", "AGTTGGAAATC"]

# Loop through DNA_sequences one sequence at a time.
for dna_seq in DNA_sequences:
    # Test if the start codon ATG is in the current DNA sequence.
    if 'ATG' in dna_seq:
        print( f'The DNA sequence {dna_seq} contains a start codon' )
    else:
        print( f'The DNA sequence {dna_seq} does not contain a start codon' )

The DNA sequence GCTACGCTGGC does not contain a start codon
The DNA sequence ATGCACGACGT contains a start codon
The DNA sequence TAAGCCGGTAG does not contain a start codon
The DNA sequence AGTTGGAAATC does not contain a start codon

On the first iteration of the loop the string "GCTACGCTGGC" is assigned to the variable dna_seq.

We then test if the start codon "ATG" is in dna_seq. Notice that the condition if 'ATG' in dna_seq: is indented because it is within the loop. Notice also that the print() statements are doubly indented because they are inside the condition, which is inside the loop.
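To make the indentation levels easier to see, here is a minimal sketch that uses the same list with one print() at each level. The messages are made up for illustration and are not part of the original Notebook.

```python
DNA_sequences = ["GCTACGCTGGC", "ATGCACGACGT", "TAAGCCGGTAG", "AGTTGGAAATC"]

for dna_seq in DNA_sequences:
    # Indented once: inside the loop, so this runs for every sequence.
    print( f'Checking {dna_seq}' )
    if 'ATG' in dna_seq:
        # Indented twice: inside the condition, which is inside the loop,
        # so this runs only for sequences that contain ATG.
        print( '  start codon found' )

# Not indented: outside the loop, so this runs once after the loop ends.
print( 'Finished checking all sequences' )
```

Only the second sequence contains "ATG", so the doubly indented line should print once, while the singly indented line prints four times.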

Looping through a list of numbers

To help motivate your understanding of looping through lists, let's find the average diameter of types of white blood cells found in human blood. The diameters are given in this table.

| Cell type | Diameter (μm) |
|---|---|
| Neutrophil | 11 |
| Eosinophil | 11 |
| Basophil | 13.5 |
| Small lymphocyte | 7.5 |
| Large lymphocyte | 13.5 |
| Monocyte | 22.5 |

Now let's find the average diameter and assign it to a variable called average_diameter.

The following code shows one way to do this without using a loop.

The average is the sum of all the values, which we access using each item's index within the list, divided by the number of items in the list, which is six. Remember that diameters[0] refers to the first item in the list, in this case 11.

Run the following code to see the output and make sure you understand what the code is doing.
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Calculate the average diameter: sum the values and divide by the number of values.
average_diameter = (diameters[0] + diameters[1] + diameters[2] + diameters[3] + diameters[4] + diameters[5]) / 6

print( f'Average diameter = {average_diameter:.4g} micrometers' )

Notice that we report the average to 4 significant figures (sig. fig. for short), which for this value also happens to be 2 decimal places. A rule of thumb is to report descriptive statistics to 1 significant figure more than the data. The cell type diameters have a maximum of 3 significant figures (e.g. 13.5), so the average should have 4 significant figures. To do that, use the format notation :.4g in an f-string.
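The difference between significant figures and decimal places is easier to see with a second number. The short sketch below compares :.4g with :.2f; the variable names and the value 1234.5678 are arbitrary choices for illustration only.

```python
# The average from the example above: 79 / 6 = 13.1666...
average = 79 / 6

# :.4g keeps 4 significant figures.
print( f'{average:.4g}' )   # 13.17
# :.2f keeps 2 decimal places, which gives the same text for this value.
print( f'{average:.2f}' )   # 13.17

# With a larger (made-up) number the two formats differ.
larger = 1234.5678
print( f'{larger:.4g}' )    # 1235
print( f'{larger:.2f}' )    # 1234.57
```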

Calculating the average diameter like this is tedious and inefficient. It is also not reusable: if we have a different set of data with a different number of values, we would have to rewrite the code.

What we really want is code that will take a list of numbers of any size and calculate the average of those numbers. That's where loops come in handy.

To calculate an average we need two things: 1) the number of values in the list and 2) the sum of all the values in the list.

Step 1 is simple. The number of values, or items, in a list is its length which is obtained with the len() function.
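As a quick check of Step 1, the sketch below applies len() to the diameters list from the table. The variable name n is an assumption made here for illustration.

```python
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# len() returns the number of items in the list.
n = len( diameters )

print( f'There are {n} items in the list' )   # There are 6 items in the list
```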

Next let's see how to code Step 2, the sum of all values in the list.

Sum the values in a list

Let's plan how we might sum the diameters in the list diameters.

  1. We need a variable that keeps a running sum of the diameters. This variable will initially be set to zero.

  2. Loop through the list one diameter at a time.

  3. Add the current diameter to the running sum.

The following code shows how this is implemented.

Read the code to understand it then run it to see the results.
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Initialise the running sum of diameters to zero.
sum_of_diameters = 0

# Loop through the list one diameter at a time.
for d in diameters:
    # Add the current diameter to the running sum of diameters.
    sum_of_diameters += d

    # Print out the running sum to see how it grows.
    print( f'This diameter is {d} so the running sum of diameters = {sum_of_diameters}' )

print( f'The sum of the diameters is {sum_of_diameters} micrometers' )

This diameter is 11 so the running sum of diameters = 11
This diameter is 11 so the running sum of diameters = 22
This diameter is 13.5 so the running sum of diameters = 35.5
This diameter is 7.5 so the running sum of diameters = 43.0
This diameter is 13.5 so the running sum of diameters = 56.5
This diameter is 22.5 so the running sum of diameters = 79.0
The sum of the diameters is 79.0 micrometers

The first thing we need is a variable to store the sum of the diameters as we loop through the list one item at a time. Let's call this variable sum_of_diameters so that it's clear what this variable means.

We have to initialise its value to zero:

sum_of_diameters = 0

The loop starts at the line for d in diameters:. The colon at the end of the line tells Python that anything indented that follows is within the loop.

The iterating variable d is assigned the value of each item in the list, one at a time. On the first iteration of the loop d is assigned the value 11.

The running sum sum_of_diameters is incremented by the value of d. Notice that the line sum_of_diameters += d is indented. This tells Python that it is within the loop.

We print out the value of d and sum_of_diameters to show what happens in the loop. Normally we wouldn't do this.

As there are six items in the list, the loop will execute six times, each time incrementing sum_of_diameters by the current value of d.

Once the loop has finished the code drops out of the bottom of the loop and prints the final sum in an f-string.
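Python also has a built-in sum() function, which this Notebook does not use because the aim is to practise loops. As a sanity check only, the sketch below compares the loop's result with sum(); the name builtin_total is made up for illustration.

```python
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Running sum using a loop, as described above.
sum_of_diameters = 0
for d in diameters:
    sum_of_diameters += d

# The built-in sum() adds up the items of a list in one call.
builtin_total = sum( diameters )

# Both approaches should give the same total.
print( sum_of_diameters == builtin_total )   # True
```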

Average value of items in a list

We're almost there. We have the number of values in the list diameters and we've summed them. We now need to put it all together into a small program to calculate the average.

Run the following code to see the output. The print() statement within the loop has been removed as it is not needed.

diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# The number of items in the list diameters.
n = len( diameters )

# Initialise the sum of values to zero.
sum_of_diameters = 0

# Loop through the list summing the diameters one at a time.
for d in diameters:
    sum_of_diameters += d

print( f'There are {n} items in the list' )
print( f'The sum of diameters is {sum_of_diameters} micrometers' )
print( f'The average diameter of white blood cell types is {sum_of_diameters/n:.4g} micrometers' )

There are 6 items in the list
The sum of diameters is 79.0 micrometers
The average diameter of white blood cell types is 13.17 micrometers

This code produces exactly the same average as the code without a loop earlier in the Notebook. It looks more complicated, and it is, but it has two important advantages: it is general and reusable.

It is general because we can calculate the average of any list of numbers however long it may be. It is reusable because we don't need to edit the code each time we want to calculate the average of a list of numbers.
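One way to see the reuse in practice is to wrap the loop in a function and call it on lists of different lengths. This is only a sketch: it assumes you have met (or will meet) function definitions, and the function name average_of_list and the second list of numbers are invented here for illustration.

```python
def average_of_list( values ):
    """Return the average of a non-empty list of numbers (hypothetical helper)."""
    # The number of items in the list.
    n = len( values )

    # Running sum, built up one item at a time.
    total = 0
    for v in values:
        total += v

    return total / n

# The white blood cell diameters from the table above.
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]
print( f'{average_of_list( diameters ):.4g}' )      # 13.17

# An arbitrary, made-up list with a different number of items.
other_values = [2, 4, 6, 8]
print( f'{average_of_list( other_values ):.4g}' )   # 5
```

The same function handles both lists without any edits. Note that an empty list would cause a division by zero, so this sketch assumes the list has at least one item.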
