Kernel: Python 3 (system-wide)

Workshop 4

Dictionaries

Task 4.1

The table below lists the average diameters of white blood cells in human blood.

| Cell type | Diameter ($\mu$m) |
|---|---|
| Neutrophil | 11 |
| Eosinophil | 11 |
| Basophil | 13.5 |
| Small lymphocyte | 7.5 |
| Large lymphocyte | 13.5 |
| Monocyte | 22.5 |

By looping through the following two lists simultaneously, construct a dictionary with cell types as keys and diameters as values.

  • Hint: Begin with an empty dictionary and, as you loop through the lists, add key:value pairs.

cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

white_cells = {}
for i in range(len(cell_types)):
    key = cell_types[i]
    value = diameters[i]
    white_cells[key] = value

print(white_cells)
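
The index-based loop above can also be written with the built-in `zip`, which pairs up the items of both lists position by position. A minimal sketch, reusing the two lists from the task (the name `white_cells_short` is only illustrative):

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil',
              'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# zip() yields (cell_type, diameter) pairs, one per position in the lists
white_cells = {}
for cell_type, diameter in zip(cell_types, diameters):
    white_cells[cell_type] = diameter

# Equivalent shortcut: dict() accepts an iterable of (key, value) pairs
white_cells_short = dict(zip(cell_types, diameters))

print(white_cells)
```

Both forms give the same dictionary; the explicit loop is closer to the hint, while `dict(zip(...))` is the more compact idiom.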

Task 4.2

Loop through the dictionary you have just constructed and convert the diameters from micrometers to millimeters.

  • Hint: 1 $\mu$m equals 0.001 mm.

for cell, diameter in white_cells.items():
    white_cells[cell] = diameter/1000

print(white_cells)
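
The loop above overwrites the micrometer values in place, so running the cell a second time would divide by 1000 again. As an alternative sketch, a dictionary comprehension builds a new dictionary and leaves the original untouched. The names `white_cells_um` and `white_cells_mm` are illustrative and not part of the workshop code.

```python
# Diameters in micrometers, as given in the table for Task 4.1
white_cells_um = {'Neutrophil': 11, 'Eosinophil': 11, 'Basophil': 13.5,
                  'Small lymphocyte': 7.5, 'Large lymphocyte': 13.5,
                  'Monocyte': 22.5}

# Build a new dictionary with the same keys and converted values
white_cells_mm = {cell: diameter / 1000
                  for cell, diameter in white_cells_um.items()}

for cell, diameter in white_cells_mm.items():
    print(f'{cell}: {diameter} mm')
```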

Task 4.3

Below is the list of bats identified in the bat survey from the Forest of Dean.

Use dictionaries to answer these questions:

  1. How many of each bat species were observed?

  2. How many different bat species were observed?

    • Hint: Remember, each key in a dictionary is unique.

  3. How many species in the genus Pipistrellus were observed?

    • Hint 1: Loop through the keys of the dictionary you constructed and count the number of keys starting with "Pipistrellus".

    • Hint 2: You might like to google the string method startswith().

bat_list = ['Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis daubentonii', 'Nyctalus noctula', 'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Pipistrellus nathusii', 'Eptesicus serotinus', 'Myotis bechsteinii', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus pipistrellus', 'Plecotus austriacus', 'Myotis daubentonii', 'Nyctalus noctula', 'Myotis brandtii', 'Myotis mystacinus', 'Pipistrellus nathusii', 'Pipistrellus pygmaeus', 'Pipistrellus nathusii', 'Rhinolophus hipposideros', 'Nyctalus leisleri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Nyctalus noctula', 'Plecotus austriacus', 'Pipistrellus nathusii', 'Myotis nattereri', 'Pipistrellus pipistrellus', 'Pipistrellus nathusii', 'Plecotus auritus', 'Barbastella barbastellus', 'Pipistrellus nathusii', 'Myotis brandtii', 'Pipistrellus pipistrellus', 'Myotis nattereri']
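# Count how many times each species appears in the survey
species = {}
for bat in bat_list:
    if bat not in species:
        species[bat] = 1
    else:
        species[bat] += 1

for s, c in species.items():
    print(f'{s}\t\t{c}')

# Each key is unique, so the number of keys is the number of species
print(f'The number of different bat species observed is {len(species)}')

# Count the species whose name starts with the genus Pipistrellus
count = 0
for s in species:
    if s.startswith('Pipistrellus'):
        count += 1

print(f'The number of species in the genus Pipistrellus is {count}')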
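
For comparison, the standard library's `collections.Counter` does the same tallying in one step. The sketch below is not the workshop's intended solution; it uses only the first eight entries of `bat_list` (named `bat_sample` here, an illustrative name) to keep the example short.

```python
from collections import Counter

# First eight entries of bat_list, used as a short sample
bat_sample = ['Plecotus austriacus', 'Pipistrellus nathusii',
              'Myotis daubentonii', 'Nyctalus noctula',
              'Pipistrellus pipistrellus', 'Pipistrellus pipistrellus',
              'Pipistrellus nathusii', 'Pipistrellus nathusii']

# Counter is a dict subclass mapping each item to its number of occurrences
species_counts = Counter(bat_sample)

# Number of distinct species = number of keys
n_species = len(species_counts)

# Number of species (keys) in the genus Pipistrellus
n_pipistrellus = sum(1 for name in species_counts
                     if name.startswith('Pipistrellus'))

for name, n in species_counts.items():
    print(f'{name}\t{n}')
print(n_species, n_pipistrellus)
```

Passing the full `bat_list` instead of `bat_sample` should answer the three questions for the whole survey.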

Task 4.4

Write code to translate the following DNA sequence.

# Assign the genetic code to a dictionary variable called "genetic_code".
# Keys are codons and values are amino acid letters.
# Stop codons are represented by the underscore character "_".
genetic_code = {
    'TTT': 'F', 'TCT': 'S', 'TAT': 'Y', 'TGT': 'C',
    'TTC': 'F', 'TCC': 'S', 'TAC': 'Y', 'TGC': 'C',
    'TTA': 'L', 'TCA': 'S', 'TAA': '_', 'TGA': '_',
    'TTG': 'L', 'TCG': 'S', 'TAG': '_', 'TGG': 'W',
    'CTT': 'L', 'CCT': 'P', 'CAT': 'H', 'CGT': 'R',
    'CTC': 'L', 'CCC': 'P', 'CAC': 'H', 'CGC': 'R',
    'CTA': 'L', 'CCA': 'P', 'CAA': 'Q', 'CGA': 'R',
    'CTG': 'L', 'CCG': 'P', 'CAG': 'Q', 'CGG': 'R',
    'ATT': 'I', 'ACT': 'T', 'AAT': 'N', 'AGT': 'S',
    'ATC': 'I', 'ACC': 'T', 'AAC': 'N', 'AGC': 'S',
    'ATA': 'I', 'ACA': 'T', 'AAA': 'K', 'AGA': 'R',
    'ATG': 'M', 'ACG': 'T', 'AAG': 'K', 'AGG': 'R',
    'GTT': 'V', 'GCT': 'A', 'GAT': 'D', 'GGT': 'G',
    'GTC': 'V', 'GCC': 'A', 'GAC': 'D', 'GGC': 'G',
    'GTA': 'V', 'GCA': 'A', 'GAA': 'E', 'GGA': 'G',
    'GTG': 'V', 'GCG': 'A', 'GAG': 'E', 'GGG': 'G'}
dna_seq = 'TTTATGTATCCTTATATCACAACTCGAAGATTCTTCTTCTGCACGAGAAGCGTGGGAATCATGGAATAA'
protein_seq = ''
for i in range(0, len(dna_seq), 3):
    codon = dna_seq[i:i+3]
    amino_acid = genetic_code[codon]
    protein_seq += amino_acid

print(protein_seq)

Task 4.5

Protein synthesis begins at the start codon "ATG". In Task 4.4 we started translating the DNA sequence at its first base. Instead we should have started translation at the first "ATG" codon.

In Task 2.11 you wrote some code to find the index of "ATG" in a DNA sequence.

  1. Incorporate that code into your DNA translation code to start translation at "ATG" and not before.

  2. You should first check whether "ATG" is in the DNA sequence. If it isn't, print that the DNA sequence does not contain a start codon; otherwise, translate the sequence as normal.

protein_seq = ''
start_idx = dna_seq.find('ATG')

if start_idx != -1:
    for i in range(start_idx, len(dna_seq), 3):
        codon = dna_seq[i:i+3]
        amino_acid = genetic_code[codon]
        protein_seq += amino_acid
    print(protein_seq)
else:
    print('Sequence has no start codon')
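
One thing the code above does not handle: if the number of bases after the start codon is not a multiple of three, the final slice is an incomplete codon and the dictionary lookup raises a `KeyError`. The sketch below wraps the same logic in a function that stops at the last complete codon. The function name `translate_from_start`, the name `codon_table`, and the compact way of building the table are assumptions made for this example; the table is intended to match the `genetic_code` dictionary from Task 4.4.

```python
# Build the codon table compactly: codons are enumerated with the bases in
# the order T, C, A, G, and the amino acid string follows the same order.
bases = 'TCAG'
amino_acids = ('FFLLSSSSYY__CC_W' 'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
codons = [a + b + c for a in bases for b in bases for c in bases]
codon_table = dict(zip(codons, amino_acids))


def translate_from_start(seq, table):
    """Translate seq from its first 'ATG'; return None if there is none."""
    start = seq.find('ATG')
    if start == -1:
        return None
    protein = ''
    # Stop at len(seq) - 2 so that every slice is a complete 3-base codon
    for i in range(start, len(seq) - 2, 3):
        protein += table[seq[i:i+3]]
    return protein


print(translate_from_start('TTTATGTATCC', codon_table))  # trailing 'CC' is ignored
print(translate_from_start('TTTCCC', codon_table))       # no start codon
```

Returning `None` rather than printing a message leaves it to the caller to decide how to report a missing start codon.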

Task 4.6

It is useful to be able to search long DNA sequences to find palindromic sequences. A DNA sequence is palindromic when it is identical to its own reverse complement.

Write a program to print all palindromic sequences between 4 and 8 base pairs long (inclusive) in the following sequence.

  • Hint 1: This is a difficult task so you should write an algorithm on paper first before attempting to code it. By writing an algorithm first you are able to spot potential problems early instead of staring blankly at a broken piece of code.

  • Hint 2: You will need an outer loop with two loops nested inside it. The outer loop goes through the different palindrome sizes (4 to 8), the first nested loop goes through the DNA sequence from left to right, and the second nested loop constructs the complement of each substring, which is then reversed for comparison.

complement_table = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
dna_sequence = 'ATGAGATAGAAGAGCGCATCGATCGATGGACCGATCGATCGATTCGCGAGCTCGCGATCGATCGGCCGATATCGCGCGATATGCGCTGCGTACGCACGATCGATCGATGGTAATCGTACGACTTCGAAGTCGCGC'

for size in range(4, 9):
    for i in range(len(dna_sequence)-size+1):
        dna_seq = dna_sequence[i:i+size]
        complement_seq = ''
        for base in dna_seq:
            complement_base = complement_table[base]
            complement_seq += complement_base
        if complement_seq[::-1] == dna_seq:
            print(dna_seq)
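
The innermost loop can be replaced by the string methods `str.maketrans` and `str.translate`, which map every base to its complement in one call. The sketch below is one possible restructuring, not the workshop's reference answer; the function names `reverse_complement` and `find_palindromes` and the short test sequence are made up for illustration.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_size=4, max_size=8):
    """Return (start index, substring) for every palindromic substring."""
    found = []
    for size in range(min_size, max_size + 1):
        for i in range(len(seq) - size + 1):
            candidate = seq[i:i + size]
            if candidate == reverse_complement(candidate):
                found.append((i, candidate))
    return found


# Short made-up sequence containing the palindrome GAATTC
print(find_palindromes('AAGAATTCGG'))
```

Returning the start index along with each substring makes it possible to locate the palindromes in the original sequence. Odd sizes never match, because the middle base would have to be its own complement.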