Workshop 4

Dictionaries

Task 4.1

The table below lists the average diameters of white blood cells in human blood.

Cell type	Diameter ($\mu$m)
Neutrophil	11
Eosinophil	11
Basophil	13.5
Small lymphocyte	7.5
Large lymphocyte	13.5
Monocyte	22.5

By looping through the following two lists simultaneously, construct a dictionary with cell types as keys and diameters as values.

Hint: Begin with an empty dictionary and, as you loop through the lists, add key:value pairs.

In [0]:

cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

In [0]:

white_cells = {}

for i in range(len(cell_types)):
    key = cell_types[i]
    value = diameters[i]
    white_cells[key] = value

print(white_cells)
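
As an alternative to indexing both lists with `range(len(...))`, the built-in `zip()` can pair the items up directly. The sketch below repeats the two lists so that it runs on its own; the names `white_cells_zip` and `white_cells_direct` are illustrative and not part of the task.

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil',
              'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# zip() pairs up the items of the two lists position by position.
white_cells_zip = {}
for cell_type, diameter in zip(cell_types, diameters):
    white_cells_zip[cell_type] = diameter

# The same pairs can also be passed straight to dict().
white_cells_direct = dict(zip(cell_types, diameters))

print(white_cells_zip)
print(white_cells_zip == white_cells_direct)
```

Note that `zip()` stops at the end of the shorter list, so it only gives a complete dictionary when both lists have the same length.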

Task 4.2

Loop through the dictionary you have just constructed and convert the diameters from micrometers to millimeters.

Hint: 1 $\mu$m equals 0.001 mm.

In [0]:

for cell, diameter in white_cells.items():
    white_cells[cell] = diameter/1000

print(white_cells)
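
One thing to watch with the in-place conversion: running the cell a second time divides the values by 1000 again. A possible way around this, sketched below, is to build a new dictionary with a dictionary comprehension and leave the original untouched. The names `white_cells_um`, `white_cells_mm` and `UM_PER_MM` are made up for this example.

```python
# Diameters in micrometers, copied from the table in Task 4.1.
white_cells_um = {'Neutrophil': 11, 'Eosinophil': 11, 'Basophil': 13.5,
                  'Small lymphocyte': 7.5, 'Large lymphocyte': 13.5,
                  'Monocyte': 22.5}

UM_PER_MM = 1000  # 1 mm = 1000 micrometers

# Build a new dictionary; white_cells_um keeps its original values.
white_cells_mm = {cell: diameter / UM_PER_MM
                  for cell, diameter in white_cells_um.items()}

print(white_cells_mm)
```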

Task 4.3

Below is the list of bats identified in the bat survey from the Forest of Dean.

Use dictionaries to answer these questions:

How many of each bat species were observed?
How many different bat species were observed?
- Hint: Remember, each key in a dictionary is unique.
How many species in the genus Pipistrellus were observed?
- Hint 1: Loop through the keys of the dictionary you constructed and count the number of keys starting with "Pipistrellus".
- Hint 2: You might like to google the string method startswith().

In [0]:

bat_list = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii', 'Nyctalus noctula', 'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Pipistrellus nathusii', 'Eptesicus serotinus', 'Myotis bechsteinii', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus pipistrellus', 'Plecotus austriacus', 'Myotis daubentonii', 'Nyctalus noctula', 'Myotis brandtii', 'Myotis mystacinus', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus nathusii', 'Rhinolophus hipposideros', 'Nyctalus leisleri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Nyctalus noctula', 'Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis nattereri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Plecotus auritus', 'Barbastella barbastellus', 'Pipistrellus nathusii', 'Myotis brandtii', 'Pipistrellus pipistrellus', 'Myotis nattereri']

In [0]:

species = {}
for bat in bat_list:
    if bat not in species:
        species[bat] = 1
    else:
        species[bat] += 1

for s, c in species.items():
    print(f'{s}\t\t{c}')

print(f'The number of different bat species observed is {len(species)}')

count = 0
for s in species:
    if s.startswith('Pipistrellus'):
        count += 1
print(f'The number of species in the genus Pipistrellus is {count}')
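
The counting pattern above is common enough that the standard library provides `collections.Counter`, a dictionary subclass that does the tallying for you. The sketch below uses a shortened, illustrative subset of `bat_list` (called `sample_bats` here) to keep the example compact; the same calls should work on the full list.

```python
from collections import Counter

# Illustrative subset of bat_list, shortened for this example.
sample_bats = ['Plecotus austriacus', 'Pipistrellus nathusii',
               'Myotis daubentonii', 'Pipistrellus pipistrellus',
               'Pipistrellus nathusii', 'Plecotus austriacus',
               'Pipistrellus pygmaeus']

# Counter maps each distinct item to the number of times it occurs.
species_counts = Counter(sample_bats)

# most_common() yields (species, count) pairs, highest count first.
for name, count in species_counts.most_common():
    print(f'{name}\t{count}')

# Keys are unique, so len() gives the number of different species.
n_species = len(species_counts)

# Count the keys (species) whose name starts with the genus name.
n_pipistrellus = sum(1 for name in species_counts
                     if name.startswith('Pipistrellus'))

print(n_species, n_pipistrellus)
```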

Task 4.4

Write code to translate the following DNA sequence.

In [0]:

# Assign the genetic code to a dictionary variable called "genetic_code".
# Keys are codons and values are amino acid letters.
# Stop codons are represented by the underscore character "_".

genetic_code = {
     'TTT': 'F', 'TCT': 'S', 'TAT': 'Y', 'TGT': 'C',
     'TTC': 'F', 'TCC': 'S', 'TAC': 'Y', 'TGC': 'C',
     'TTA': 'L', 'TCA': 'S', 'TAA': '_', 'TGA': '_',
     'TTG': 'L', 'TCG': 'S', 'TAG': '_', 'TGG': 'W',
     'CTT': 'L', 'CCT': 'P', 'CAT': 'H', 'CGT': 'R',
     'CTC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R',
     'CTA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R',
     'CTG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R',
     'ATT': 'I', 'ACT': 'T', 'AAT': 'N', 'AGT': 'S',
     'ATC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S',
     'ATA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R',
     'ATG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R',
     'GTT': 'V', 'GCT': 'A', 'GAT': 'D', 'GGT': 'G',
     'GTC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G',
     'GTA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G',
     'GTG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G'}

In [0]:

dna_seq = 'TTTATGTATCCTTATATCACAACTCGAAGATTCTTCTTCTGCACGAGAAGCGTGGGAATCATGGAATAA'

In [0]:

protein_seq = ''

for i in range(0, len(dna_seq), 3):
    codon = dna_seq[i:i+3]
    amino_acid = genetic_code[codon]
    protein_seq += amino_acid

print(protein_seq)
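
The loop above relies on the sequence length being a multiple of 3, which is true for `dna_seq` (69 bases). For other sequences the last slice may hold only 1 or 2 bases, and looking that up in `genetic_code` would raise a `KeyError`. A minimal sketch of one way to guard against this is shown below; `split_codons` is a hypothetical helper, not something the workshop asks for.

```python
def split_codons(seq):
    """Return the complete codons of seq, ignoring any trailing 1-2 bases."""
    usable = len(seq) - len(seq) % 3  # largest multiple of 3 that fits
    return [seq[i:i+3] for i in range(0, usable, 3)]

# 11 bases: three complete codons plus two leftover bases ('GC').
print(split_codons('ATGGAATAAGC'))
```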

Task 4.5

Protein synthesis begins at the start codon "ATG". In Task 4.4 we started translating the DNA sequence at its first base. Instead we should have started translation at the first "ATG" codon.

In Task 2.11 you wrote some code to find the index of "ATG" in a DNA sequence.

Incorporate that code into your DNA translation code to start translation at "ATG" and not before.
You should first check whether "ATG" is in the DNA sequence. If it isn't, print that the DNA sequence does not contain a start codon; otherwise, translate the sequence as normal.

In [0]:

protein_seq = ''

start_idx = dna_seq.find('ATG')

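if start_idx != -1:
    # Stop at len(dna_seq) - 2 so that a trailing partial codon (1 or 2 bases)
    # is skipped instead of raising a KeyError in the lookup below.
    for i in range(start_idx, len(dna_seq) - 2, 3):
        codon = dna_seq[i:i+3]
        amino_acid = genetic_code[codon]
        protein_seq += amino_acid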

    print(protein_seq)

else:
    print('Sequence has no start codon')
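
Going one step beyond what the task asks, translation in a cell also ends at the first stop codon. The sketch below wraps the logic in a hypothetical function, `translate_orf`, that starts at the first "ATG" and stops at the first stop codon. To keep the example self-contained it rebuilds the same codon table as `genetic_code` in a compact way; the names `bases`, `amino_acids` and `table` are assumptions of this example.

```python
# Rebuild the codon table from Task 4.4: codons are generated in the order
# TTT, TTC, TTA, TTG, TCT, ... and matched to the amino acid letters below.
bases = 'TCAG'
amino_acids = ('FFLLSSSSYY__CC_W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')
codons = [a + b + c for a in bases for b in bases for c in bases]
table = dict(zip(codons, amino_acids))

def translate_orf(seq, code):
    """Translate from the first ATG to the first stop codon.

    Returns None if seq contains no start codon.
    """
    start = seq.find('ATG')
    if start == -1:
        return None
    protein = ''
    # len(seq) - 2 ensures that only complete codons are looked up.
    for i in range(start, len(seq) - 2, 3):
        amino_acid = code[seq[i:i+3]]
        if amino_acid == '_':  # stop codon: end of the protein
            break
        protein += amino_acid
    return protein

dna_seq = 'TTTATGTATCCTTATATCACAACTCGAAGATTCTTCTTCTGCACGAGAAGCGTGGGAATCATGGAATAA'
print(translate_orf(dna_seq, table))
print(translate_orf('CCCTTT', table))
```

Returning `None` rather than printing a message lets the calling code decide how to report a missing start codon.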

Task 4.6

It is useful to be able to search long DNA sequences to find palindromic sequences.

Write a program to print all palindromic sequences between 4 and 8 base pairs long (inclusive) in the following sequence. A DNA sequence is palindromic when it is identical to its own reverse complement.

Hint 1: This is a difficult task so you should write an algorithm on paper first before attempting to code it. By writing an algorithm first you are able to spot potential problems early instead of staring blankly at a broken piece of code.
Hint 2: You will need three levels of loop: the outermost loop goes through the different palindrome sizes (4 to 8), the loop nested inside it goes through the DNA sequence from left to right, and the innermost loop constructs the reverse complement of each substring.

In [0]:

complement_table = {
    'A':'T',
    'C':'G',
    'G':'C',
    'T':'A'}

dna_sequence = 'ATGAGATAGAAGAGCGCATCGATCGATGGACCGATCGATCGATTCGCGAGCTCGCGATCGATCGGCCGATATCGCGCGATATGCGCTGCGTACGCACGATCGATCGATGGTAATCGTACGACTTCGAAGTCGCGC'

In [0]:

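for size in range(4, 9):
    for i in range(len(dna_sequence) - size + 1):
        # Use a new name here so that dna_seq from Task 4.4 is not overwritten.
        sub_seq = dna_sequence[i:i+size]
        complement_seq = ''

        for base in sub_seq:
            complement_base = complement_table[base]
            complement_seq += complement_base

        # A palindromic sequence equals its own reverse complement.
        if complement_seq[::-1] == sub_seq:
            print(sub_seq)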
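
The innermost loop can also be replaced by the string methods `str.maketrans()` and `str.translate()`, which swap every base for its complement in one call. The sketch below is one possible arrangement rather than the expected answer; `COMPLEMENT`, `reverse_complement` and `find_palindromes` are names invented for this example, and the short test sequence contains the EcoRI site GAATTC.

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans('ACGT', 'TGCA')

def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

def find_palindromes(seq, min_size=4, max_size=8):
    """Return (index, substring) pairs for every palindromic substring."""
    hits = []
    for size in range(min_size, max_size + 1):
        for i in range(len(seq) - size + 1):
            candidate = seq[i:i+size]
            if candidate == reverse_complement(candidate):
                hits.append((i, candidate))
    return hits

for index, palindrome in find_palindromes('GGAATTCA'):
    print(index, palindrome)
```

Substrings with an odd number of bases can never match, because the middle base would have to be its own complement, so only sizes 4, 6 and 8 produce results.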