
Dictionaries III - counting things

Dictionaries can be used for many things. In this course we will cover two common uses:

  1. Counting things (covered in this Notebook)

  2. Lookup table (covered in the next Notebook)

Using dictionaries to keep counts

We've already seen an example of counting something. In Notebook 10 we counted the number of "f"s in the sentence "The embryo of Festuca fusca, a native plant species of Old-World countries, is almost half the full length of its grain."

But what if we had wanted to count the number of occurrences of each and every letter in the string, rather than just the "f"s? How can we code this?

In Notebook 10 we had a variable called count that kept a tally of the number of "f"s as we looped through the string. One way to count each and every letter would be to have a tally variable for each and every letter. For example, count_a, count_b, count_c, and so on. Writing the code this way would be very tedious as we would need 26 separate variables.

Instead, we can use a dictionary and take advantage of its property that each key must be unique.

Remember that a dictionary is essentially just a table of two columns. One column is the unique key and the other is its value. So for a tally of letters in the above sentence we would have a table that looks like this:

letter (key)    tally (value)
a               8
b               1
...             ...
z               0

Translating this into a dictionary, each key will be a letter of the alphabet and its value will be the number of times it occurs in the sentence, i.e.,

counts = { 'a':8, 'b':1, ..., 'z':0 }
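To see how such a dictionary behaves once it exists, here is a minimal sketch using a hand-written tally for the word "banana". The word and the name example_counts are made up for illustration and are not part of the sentence example.

```python
# A hand-written tally for the word "banana" (illustrative only).
example_counts = {'b': 1, 'a': 3, 'n': 2}

# Look up the tally for a single letter by using the letter as the key.
print(example_counts['a'])

# Keys are unique: assigning to an existing key updates its value
# rather than creating a second 'a' entry.
example_counts['a'] = example_counts['a'] + 1
print(example_counts)

# The dictionary still has exactly three keys.
print(len(example_counts))
```

Because each letter can appear as a key only once, one dictionary can stand in for all of the separate tally variables.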

Of course, we don't want to write this out manually - that's just as bad as having a count variable for each letter. What we want is for Python to construct the dictionary for us.

First, let's write down the steps involved if we were to do this task by hand. This is often a good way to solve a task: write first, code later. Here are the steps involved:

  1. Set up an empty table: the first column is "letter" and the second column is its "tally".

  2. Starting at the beginning, go through the sentence one letter at a time.

  3. If the current letter is already in our table, we have seen it earlier in the sentence; increment its tally by one.

  4. If the current letter is not in the table, this is the first time it has occurred in the sentence; add it to the table and set its tally to one.

This is what is called an algorithm: a set of simple instructions to perform a task. Now let's translate this into some Python code.

  1. Initialise an empty dictionary (each key:value pair will be a letter:tally pair).

  2. Loop through the sentence one letter at a time.

  3. If the current letter is in the dictionary, increment its tally by one.

  4. Otherwise add the letter to the dictionary and set its tally to one.

Finally let's write the Python code. Read it first to understand what it is doing before running it.
    sentence = 'The embryo of Festuca fusca, a native plant species of Old-World countries, is almost half the full length of its grain.'

    # Initialise an empty dictionary.
    # Keys will be letters (as these are unique) and values will be counts of each letter.
    # Items in the dictionary will be added as we loop through the sentence one character at a time.
    counts = {}

    # Loop through the sentence one character at a time. character is the iterating variable.
    for character in sentence:
        # Test if the dictionary already contains the current character as a key.
        if character in counts:
            # If the character is a key in the dictionary increment its value by 1.
            counts[character] += 1
        else:
            # The character is not a key in the dictionary so this is its first occurrence.
            # Add the character to the dictionary and set its value to 1.
            counts[character] = 1

    # Output the final counts.
    counts
{'T': 1, 'h': 4, 'e': 9, ' ': 20, 'm': 2, 'b': 1, 'r': 4, 'y': 1, 'o': 7, 'f': 6, 'F': 1, 's': 8, 't': 8, 'u': 4, 'c': 4, 'a': 8, ',': 2, 'n': 5, 'i': 6, 'v': 1, 'p': 2, 'l': 8, 'O': 1, 'd': 2, '-': 1, 'W': 1, 'g': 2, '.': 1}

First we initialise an empty dictionary called counts to store the tally of each character in the sentence.

Next, starting from the first character "T", we loop through the sentence one character at a time.

We test if "T" already exists as a key in the dictionary counts.

As the dictionary is empty, "T" is not a key in it, so we add it as a key and set its value to 1.

The loop moves on to the next character in the sentence, which is "h".

Whenever we come across a character we have seen before, its key will already be in the dictionary, so we increment its value by 1.
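The walk-through above can be watched in action with a short trace. This is a sketch only: it runs the same loop over just the first five characters of the sentence, and the names snippet and trace_counts are introduced here for illustration.

```python
# The first five characters of the sentence: "T", "h", "e", a space, "e".
snippet = 'The e'

trace_counts = {}

for character in snippet:
    if character in trace_counts:
        # Seen before: increment the existing tally.
        trace_counts[character] += 1
    else:
        # First occurrence: add the key with a tally of 1.
        trace_counts[character] = 1
    # Show the character just processed and the dictionary so far.
    print(repr(character), trace_counts)
```

The first four characters are each new, so each adds a key. The fifth character is a second "e", so no key is added and the value for "e" becomes 2.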

The output is not quite what we want. First, we've got both lowercase and uppercase letters, but we really only want lowercase. Second, we've got punctuation and spaces, which we don't want. Finally, it would be better if the output were ordered alphabetically.

The easiest way to use only lowercase letters is to convert the whole sentence to lowercase. We covered converting a string to lowercase in Notebook 7 on string methods. We could do it like so:

for character in sentence.lower():

This converts all the characters in sentence to lowercase before we loop through them, which means the first character will be "t" rather than "T".
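A quick sketch of what lower() does, using just the start of the sentence. The names sentence_start and lowered are introduced here for illustration.

```python
sentence_start = 'The embryo of Festuca fusca'

# lower() returns a new string with every letter converted to lowercase.
lowered = sentence_start.lower()
print(lowered)         # the embryo of festuca fusca

# The original string is not changed.
print(sentence_start)  # The embryo of Festuca fusca
```

Because lower() returns a new string and leaves the original alone, sentence itself still starts with "T" after the loop has finished.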

To ignore punctuation and spaces, we should test whether the iterating variable character is a letter. If you search for "python test character is a letter" you'll find the answer is to use the isalpha() string method. As this is a test, we use a conditional if statement. If character is a letter, then character.isalpha() is True and we go on to test whether it is in the dictionary as before. If character is punctuation or a space, then character.isalpha() is False, so we ignore it and move on to the next character in the sentence.
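To check how isalpha() treats the kinds of characters found in the sentence, here is a small sketch. The list samples and the dictionary is_letter are names introduced here for illustration.

```python
# A few characters taken from the sentence: letters, punctuation and a space.
samples = ['f', 'F', ',', ' ', '-', '.']

# Record whether each sample counts as a letter.
is_letter = {}
for character in samples:
    is_letter[character] = character.isalpha()
    print(repr(character), is_letter[character])
```

Both "f" and "F" give True, while the comma, space, hyphen and full stop all give False, so a single isalpha() test filters out everything we do not want to count.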

Finally, to output the counts alphabetically we want to loop through the dictionary sorted on the key. See Notebook 16 on how to do this.
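As a reminder of how sorting on the key works, here is a minimal sketch using a small hand-written dictionary. The name small_counts and its contents are made up for illustration.

```python
# Keys are deliberately not in alphabetical order.
small_counts = {'t': 1, 'h': 1, 'e': 2}

# items() gives (key, value) pairs; sorted() orders them by key.
for letter, count in sorted(small_counts.items()):
    print(letter, count)
# e 2
# h 1
# t 1
```

sorted() returns a new sorted list of pairs, so small_counts itself keeps its original order.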

The modified code is given below. Read it first so you understand what it is doing before running it.
    sentence = 'The embryo of Festuca fusca, a native plant species of Old-World countries, is almost half the full length of its grain.'

    # Initialise an empty dictionary.
    # Keys will be letters (as these are unique) and values will be counts of each letter.
    # Items in the dictionary will be added as we loop through the sentence one character at a time.
    counts = {}

    # Loop through the sentence one character at a time. character is the iterating variable.
    # All characters in "sentence" are converted to lowercase.
    for character in sentence.lower():
        # Test if character is a letter and not punctuation.
        # If character is a letter then character.isalpha() is True and we go on to test if it is in the dictionary.
        # If character is punctuation then character.isalpha() is False. We do nothing and move on to the next character.
        if character.isalpha():
            # Test if the dictionary already contains the current character.
            if character in counts:
                # If the character is in the dictionary increment its value by 1.
                counts[character] += 1
            else:
                # The character is not in the dictionary so this is its first occurrence.
                # Add the character to the dictionary and set its count to 1.
                counts[character] = 1

    # Output the final counts.
    # Loop through the dictionary one item at a time sorted alphabetically.
    for letter, count in sorted(counts.items()):
        print(letter, count)
    a 8
    b 1
    c 4
    d 2
    e 9
    f 7
    g 2
    h 4
    i 6
    l 8
    m 2
    n 5
    o 8
    p 2
    r 4
    s 8
    t 9
    u 4
    v 1
    w 1
    y 1
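As a cross-check, and not part of the approach taught in this Notebook, the same tally can be sketched with Counter from Python's standard collections module. Counter is a dictionary subclass that does the "increment or set to one" step for us. The name check is introduced here for illustration.

```python
from collections import Counter

sentence = 'The embryo of Festuca fusca, a native plant species of Old-World countries, is almost half the full length of its grain.'

# Keep only the letters of the lowercased sentence and let Counter tally them.
check = Counter(character for character in sentence.lower() if character.isalpha())

# Print the tallies sorted alphabetically, as before.
for letter, count in sorted(check.items()):
    print(letter, count)
```

Writing the loop by hand, as we did above, is still the better way to learn what is going on. Counter is simply a convenient way to confirm the result.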
