Workshop 3

Lists and loops

Task 3.1

We've seen that the string method find() returns the index of the first occurrence of a character within a string. Often we want to find the indices of all occurrences of a character in a string.

Write some code that loops through the following DNA sequence and outputs the indices of all occurrences of the base "T".

Hint: You should be able to modify your code from Exercise 10.3 for this task.

In [0]:

dna_seq = 'TTTATGTATCCTTA'

In [0]:

idx = 0

for base in dna_seq:
    if base == 'T':
        print(idx)
    idx += 1
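
As a cross-check on the loop above, here is a minimal sketch of two other ways to collect the same indices. The first uses enumerate() so that no manual counter is needed. The second calls find() repeatedly with its optional start argument. The names t_indices, found and pos are illustrative only.

```python
dna_seq = 'TTTATGTATCCTTA'

# Version 1: enumerate() yields (index, character) pairs
t_indices = []
for idx, base in enumerate(dna_seq):
    if base == 'T':
        t_indices.append(idx)

# Version 2: call find() repeatedly, each time starting the search
# just after the previous match; find() returns -1 when nothing is left
found = []
pos = dna_seq.find('T')
while pos != -1:
    found.append(pos)
    pos = dna_seq.find('T', pos + 1)

print(t_indices)
print(found)
```

Both versions should print the same list of indices as the counter-based loop.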

Task 3.2

Using code, find and print the minimum and maximum values in the following list.

Hint: Use indexing on the sorted list.

In [0]:

eggs_laid = [10,11,14,16,13,14,12,10,16,11,12,13,16,11,10,14,16,16,15,12,11,13,14,15,12,14,12,15,13,13,11]

In [0]:

sorted_eggs_laid = sorted(eggs_laid)
print(sorted_eggs_laid)
print(sorted_eggs_laid[0], sorted_eggs_laid[-1])
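
The indexing approach can be checked against Python's built-in min() and max() functions. This is only a sanity check, not a replacement for the indexing exercise.

```python
eggs_laid = [10,11,14,16,13,14,12,10,16,11,12,13,16,11,10,14,16,16,15,12,11,13,14,15,12,14,12,15,13,13,11]

sorted_eggs_laid = sorted(eggs_laid)

# The first and last items of the sorted list should agree with min() and max()
smallest = sorted_eggs_laid[0]
largest = sorted_eggs_laid[-1]

print(smallest == min(eggs_laid), largest == max(eggs_laid))
```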

Task 3.3

Find and print the middle (also called the median) value of the sorted list of eggs_laid. Do not find the middle value by hand: find it using code.

Hint 1: Use the list's length. For example, if the length is 5, the middle value is at the 3rd position or index 2.
Hint 2: If you get the error "list indices must be integers or slices, not float", remember that the result of the division operator / is a float even when the divisor and dividend are integers. That means you need integer division //.
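
To illustrate Hint 2, here is a small sketch of the difference between / and // when computing a middle index. The list values is a made-up example, not data from this workshop.

```python
values = [3, 1, 4, 1, 5]   # hypothetical example list of length 5
n = len(values)

print(n / 2)    # true division always gives a float
print(n // 2)   # integer division gives an int that can be used as an index

# Using the integer result as an index into the sorted list
middle = sorted(values)[n // 2]
print(middle)
```

Writing sorted(values)[n / 2] instead would raise the TypeError quoted in Hint 2.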

In [0]:

eggs_laid = [10,11,14,16,13,14,12,10,16,11,12,13,16,11,10,14,16,16,15,12,11,13,14,15,12,14,12,15,13,13,11]

In [0]:

sorted_eggs_laid = sorted(eggs_laid)
n = len(sorted_eggs_laid)
print(sorted_eggs_laid[n//2])

Task 3.4

There are three steps in calculating a median of a list of numbers:

1. Sort the values from lowest to highest.
2. Find the number of values.
3. Find the middle value.
   - If the number of values is odd, the median is the middle value.
   - If the number of values is even, the median is the average of the middle two values.
   - E.g., the median of the values [0.5, 0.6, 0.9] is 0.6. The median of the values [0.5, 0.6, 0.9, 1.1] is (0.6+0.9)/2 = 0.75.

Write code to calculate the median of a list of numbers of any length.
Apply your code to find the median of each of the following two lists.

Hint 1: You will need to sort the list, find its length, and decide which value(s) are in the middle.
Hint 2: For a list with an even number of values, be careful to select the middle elements with the correct indices.

In [0]:

wing_lengths = [2.9, 2.93, 2.87, 3.22, 3.1, 2.75, 2.81, 3.15, 3.21, 2.98, 2.99, 2.89, 2.78, 3.15, 2.92, 2.9, 3.12, 2.96, 3.22, 3.26, 2.79, 2.85, 2.94, 2.86, 3.21, 3.21, 2.73, 3.0, 2.94, 2.57, 3.06, 2.95, 3.33, 3.1, 3.19, 2.93, 2.89, 2.81, 3.04, 2.89, 2.81, 3.27, 2.58, 3.3, 3.1, 3.08, 2.89, 3.09, 2.91, 2.75, 3.13, 2.94, 3.35, 2.56, 3.46, 2.93, 2.81, 3.09, 3.25, 2.84, 2.62, 2.89, 3.22, 3.17, 3.13, 3.42, 2.69, 3.11, 3.44, 2.88, 2.46, 3.21, 3.03, 2.88, 2.82, 3.18, 3.11, 2.66, 2.97, 3.1, 2.94, 2.84, 2.7, 3.02, 2.76, 2.91, 3.26, 3.02, 2.91, 3.13, 3.15, 3.23, 2.62, 3.11, 3.19, 3.07, 2.87, 3.3, 3.04, 3.03, 3.04, 2.67]

haemoglobin_levels = [12.5, 15.1, 12.6, 10.4, 15.7, 9.2, 17.6, 12.9, 10.6, 12.3, 17.9, 14.0, 15.5, 12.5, 10.6]

In [0]:

# x = sorted(wing_lengths)
x = sorted(haemoglobin_levels)

n = len(x)
print(n)
if n % 2 == 1:
    # Odd number of items
    print( x[n//2] )
else:
    # Even number of items
    print( (x[n//2-1] + x[n//2])/2. )
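
One way to test the median code is to wrap the same logic in a function and compare it with statistics.median() from the standard library. The sketch below does this for the two small example lists given in the task. The function name median is our own choice, and functions may not have been covered yet at this point in the course.

```python
import statistics

def median(values):
    """Return the median of a list of numbers (illustrative helper)."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        # Odd number of items: take the single middle value
        return x[n // 2]
    # Even number of items: average the two middle values
    return (x[n // 2 - 1] + x[n // 2]) / 2

odd_example = [0.5, 0.6, 0.9]
even_example = [0.5, 0.6, 0.9, 1.1]

print(median(odd_example), statistics.median(odd_example))
print(median(even_example), statistics.median(even_example))
```

The same comparison can be run on wing_lengths and haemoglobin_levels.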

Task 3.5

In Task 2.12 you wrote some code to test if a word is a palindrome.

Now modify your code so that it tests and prints whether each word in the following list is a palindrome.

In [0]:

words = ['golf', 'level', 'spoon', 'reverser', 'noon', 'racecar', 'cell', 'rotator', 'tape', 'stats', 'bridge', 'lagoon', 'tenet']

In [0]:

for word in words:
    # Reverse the word with a slice that steps backwards
    rev_word = word[::-1]

    if word == rev_word:
        print( f'{word} is a palindrome' )
    else:
        print( f'{word} is not a palindrome' )

Task 3.6

In Exercise 4.4 you calculated the cumulative number of emperor penguins that joined an Antarctic breeding colony on the first three days of the season. The values for the first three weeks are given below.

Print out the cumulative number of penguins on each day of the first three weeks.

In [0]:

arriving_penguins = [10, 156, 73, 376, 786, 432, 1035, 901, 1102, 2567, 1571, 916, 1560, 632, 943, 246, 654, 1456, 504, 632, 185]

In [0]:

day = 1    # count days from 1, so the first entry in the list is day 1
total = 0

print('Day\tTotal')

for n in arriving_penguins:
    total += n
    print( f'{day}\t{total}' )
    day += 1
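
As a cross-check on the running total, the standard library function itertools.accumulate() produces the cumulative sums directly. This is a sketch of an alternative, not the loop-based approach the task asks for. Days are counted from 1 here, and the name cumulative is illustrative.

```python
from itertools import accumulate

arriving_penguins = [10, 156, 73, 376, 786, 432, 1035, 901, 1102, 2567, 1571, 916, 1560, 632, 943, 246, 654, 1456, 504, 632, 185]

# accumulate() yields the running total after each item
cumulative = list(accumulate(arriving_penguins))

print('Day\tTotal')
for day, total in enumerate(cumulative, start=1):
    print(f'{day}\t{total}')
```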

Task 3.7

How many days does it take the colony to just pass 10,000 penguins?

Hint: Modify the code in Task 3.6 to break out of the loop when the total passes 10,000.

In [0]:

day = 1    # count days from 1 so that the printed day is the number of days taken
total = 0

print('Day\tTotal')

for n in arriving_penguins:
    total += n
    if total > 10000:
        print( f'{day}\t{total}' )
        break

    day += 1

Task 3.8

Modify your code from Task 3.7 to use range() so that you do not need a separate variable to count the number of days until the colony passes 10,000 penguins.

In [0]:

total = 0

print('Day\tTotal')

for day in range(len(arriving_penguins)):
    total += arriving_penguins[day]

    if total > 10000:
        # range() counts from 0, so the number of days taken is day + 1
        print( f'{day + 1}\t{total}' )
        break
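
A further variation, shown here as a sketch, uses enumerate() with start=1. This gives both the day number (counted from 1) and the number of arrivals on each pass through the loop, so no indexing is needed. The name days_needed is illustrative.

```python
arriving_penguins = [10, 156, 73, 376, 786, 432, 1035, 901, 1102, 2567, 1571, 916, 1560, 632, 943, 246, 654, 1456, 504, 632, 185]

total = 0
days_needed = None   # stays None if the total never passes 10,000

for day, n in enumerate(arriving_penguins, start=1):
    total += n
    if total > 10000:
        days_needed = day
        break

print(days_needed, total)
```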

Task 3.9

A single nighttime survey of bats in the Forest of Dean produced a list of the species of all individual bats caught, measured and released.

Create a new list of unique bat species caught.
Print the list of unique species and the number of unique species.

Hint: Create an empty list and only append a bat species to this list if it is not already in the list as you loop through bat_list.

In [0]:

bat_list = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii', 'Nyctalus noctula', 'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Pipistrellus nathusii', 'Eptesicus serotinus', 'Myotis bechsteinii', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus pipistrellus', 'Plecotus austriacus', 'Myotis daubentonii', 'Nyctalus noctula', 'Myotis brandtii', 'Myotis mystacinus', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus nathusii', 'Rhinolophus hipposideros', 'Nyctalus leisleri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Nyctalus noctula', 'Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis nattereri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Plecotus auritus', 'Barbastella barbastellus', 'Pipistrellus nathusii', 'Myotis brandtii', 'Pipistrellus pipistrellus', 'Myotis nattereri']

In [0]:

species = []

for bat in bat_list:
    if bat not in species:
        species.append(bat)

print(species)
print(len(species))
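
The loop above can be checked against two built-in tools: dict.fromkeys(), which keeps the first occurrence of each item in order, and set(), which discards duplicates but does not preserve order. The sketch below uses only the first seven entries of bat_list to keep the example short. The names bat_sample and unique_ordered are illustrative.

```python
# Shortened sample: the first seven entries of bat_list
bat_sample = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii',
              'Nyctalus noctula', 'Pipistrellus pipistrellus',
              'Pipistrellus pipistrellus', 'Pipistrellus nathusii']

# Loop-based approach from the task
species = []
for bat in bat_sample:
    if bat not in species:
        species.append(bat)

# Dictionary keys are unique and keep insertion order (Python 3.7+)
unique_ordered = list(dict.fromkeys(bat_sample))

print(species == unique_ordered)
print(len(species), len(set(bat_sample)))
```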

Task 3.10

In the bat survey, each bat's wingspan was measured. These are given in centimetres, for each bat, in the list wingspans below.

Using the bat_list and wingspans lists, print the average wingspan of Pipistrellus nathusii to 2 decimal places.

Hint 1: Loop through the two lists simultaneously. Each time you encounter Pipistrellus nathusii in bat_list, append its wingspan to another list. Once you have finished looping through bat_list, calculate the average wingspan using the list of wingspans you have constructed.
Hint 2: Rather than summing the values in the list by looping over the list as we did in Notebook 12, you might want to use the built-in sum() function. Google it to find out how to use it.

In [0]:

wingspans = [20.4, 21.1, 17.1, 16.7, 24.4, 17.8, 20.1, 21.2, 20.8, 18.4, 20.0, 20.8, 19.4, 16.9, 18.0, 18.5, 18.0, 17.5, 18.7, 21.6, 21.6, 20.1, 20.6, 22.0, 20.0, 16.8, 24.2, 15.4, 21.2, 22.2, 26.1, 21.5, 18.9, 18.5, 19.9, 20.9, 18.4]

In [0]:

nathusii_wingspans = []

for i in range(len(bat_list)):
    if bat_list[i] == 'Pipistrellus nathusii':
        nathusii_wingspans.append(wingspans[i])

print(nathusii_wingspans)
print(f'Average wingspan of P. nathusii is {sum(nathusii_wingspans)/len(nathusii_wingspans):.2f} cm')
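
Another way to loop through two lists simultaneously is the built-in zip() function, which pairs up items at the same position. The sketch below uses only the first eight entries of bat_list and wingspans to keep it short. The names bat_sample, wingspan_sample, selected and mean are illustrative.

```python
# Shortened samples: the first eight entries of bat_list and wingspans
bat_sample = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii',
              'Nyctalus noctula', 'Pipistrellus pipistrellus',
              'Pipistrellus pipistrellus', 'Pipistrellus nathusii',
              'Pipistrellus nathusii']
wingspan_sample = [20.4, 21.1, 17.1, 16.7, 24.4, 17.8, 20.1, 21.2]

selected = []

# zip() yields one (species, wingspan) pair per bat
for bat, span in zip(bat_sample, wingspan_sample):
    if bat == 'Pipistrellus nathusii':
        selected.append(span)

# Guard against dividing by zero if the species was never caught
if selected:
    mean = sum(selected) / len(selected)
    print(f'Average wingspan of P. nathusii is {mean:.2f} cm')
else:
    print('No P. nathusii in the sample')
```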

Task 3.11

Repeated, short sequences are of interest to geneticists as they suggest the existence of transposable elements within genomic DNA.

Search for and print the first sequence in the following list that starts with "TATA" and has a second "TATA" repeat later in the sequence.
If no sequences are found then print that none was found.

Hint: Loop through the DNA sequences. If a sequence starts with "TATA", test whether the sequence contains a second "TATA" substring. If it does, break out of the loop; otherwise, move on to the next sequence.

In [0]:

dna_sequences = ['TATAGGTATTACGA', 'GATTAGGATGAA', 'TAGCCGGGTATA', 'TATAGGTAGGATATA', 'TATAGGGTTGAAGT']

In [0]:

found = ''

for seq in dna_sequences:
    # The sequence must start with TATA and contain another TATA after the first four bases
    if seq[:4] == 'TATA' and 'TATA' in seq[4:]:
        found = seq
        break

if found:
    print(found)
else:
    print('No sequence found')
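
The same search can be sketched with the string methods startswith() and find(), plus the else clause of a for loop, which runs only when the loop finishes without hitting break. This removes the need for a separate flag variable. The name result is illustrative.

```python
dna_sequences = ['TATAGGTATTACGA', 'GATTAGGATGAA', 'TAGCCGGGTATA', 'TATAGGTAGGATATA', 'TATAGGGTTGAAGT']

for seq in dna_sequences:
    # find('TATA', 4) searches from index 4 onwards, i.e. after the leading TATA
    if seq.startswith('TATA') and seq.find('TATA', 4) != -1:
        result = seq
        break
else:
    # Reached only if the loop never hit break
    result = None

if result is not None:
    print(result)
else:
    print('No sequence found')
```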