Python Coding

Lists

The following table lists the average diameters of the different types of white blood cell in human blood.

Cell type	Diameter (μm)
Neutrophil	11
Eosinophil	11
Basophil	13.5
Small lymphocyte	7.5
Large lymphocyte	13.5
Monocyte	22.5

Let's say we wanted to use Python code to calculate the average diameter of white blood cell types. One way to code it would be to have a variable for each diameter, e.g.,

d1 = 11
d2 = 11
d3 = 13.5
d4 = 7.5
d5 = 13.5
d6 = 22.5

and then calculate the average and assign it to a numerical variable called average_diameter, like so:

average_diameter = (d1 + d2 + d3 + d4 + d5 + d6) / 6

That's fine if we have only a few values. But if a data set contains hundreds or millions of values, coding them like this is tedious, maybe impossible, and not reusable.

Fortunately, Python has a data type called a list that can store multiple values in a single variable. Lists allow us to write simpler, more efficient and reusable code.

A list is a sequence of items

Using white blood cells as an example, the following code assigns a sequence of numbers to a list variable called diameters and a sequence of strings to another list variable called cell_types. Python recognises these as lists because of the pair of square brackets surrounding each sequence of comma-separated values.

The individual values in a list are called elements or items.

In [5]:

# Assign a sequence of numbers to a list variable called diameters.
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Assign a sequence of strings to a list variable called cell_types.
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

print( diameters )
print( cell_types )

Out[5]:

[11, 11, 13.5, 7.5, 13.5, 22.5]
['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
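As a quick check, Python's built-in type() function reports what kind of value a variable refers to. A minimal sketch, reusing the two lists above:

```python
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

# type() returns the class of the value each variable refers to.
print( type( diameters ) )    # <class 'list'>
print( type( cell_types ) )   # <class 'list'>
```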

Length of a list

The number of items in a list is called its length. As for strings, the length of a list is returned by the built-in function len(), as demonstrated in the following code.

In [6]:

print( f'The number of items in diameters is {len( diameters )}' )

Out[6]:

The number of items in diameters is 6
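Combining len() with Python's built-in sum() function gives the average diameter from the start of this notebook without needing a separate variable for each value. A minimal sketch; the names total_diameter and average_diameter are chosen for illustration:

```python
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# sum() adds every number in the list; len() counts how many numbers there are.
total_diameter = sum( diameters )
average_diameter = total_diameter / len( diameters )

print( f'Average diameter: {average_diameter:.2f} micrometres' )   # Average diameter: 13.17 micrometres
```

The same two lines work unchanged whether the list holds six values or six million, which is the reusability the introduction called for.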

Creating an empty list

When we created the lists diameters and cell_types above, they were initialised with sequences of numbers and strings, respectively.

Sometimes we want to initialise an empty list. To do this we simply use a pair of square brackets with nothing between them like so:

In [7]:

# Initialise an empty list and assign it to a variable called cell_types.
cell_types = []

print( cell_types )

Out[7]:

[]
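An empty list has length 0, and Python treats an empty list as false in an if statement. A minimal sketch of both checks:

```python
cell_types = []

print( len( cell_types ) )   # 0

# An empty list counts as False in a condition, so the else branch runs here.
if cell_types:
    print( 'cell_types has items' )
else:
    print( 'cell_types is empty' )
```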

Adding items to a list: append()

To add items to the end of a list we use its append() method. This is shown in the code below.

In [8]:

# Create an empty list.
cell_types = []
print( cell_types )

# Append "Neutrophil" to the end of the list.
cell_types.append("Neutrophil")
print( cell_types )

# Append "Eosinophil" to the end of the list.
cell_types.append("Eosinophil")
print( cell_types )

Out[8]:

[]
['Neutrophil']
['Neutrophil', 'Eosinophil']

Notice that the first time we print the list it is empty. Then we add "Neutrophil", so we have a list with one item. Then we add "Eosinophil" to the end of the list resulting in a list with two items.

We can add as many items as we want to a list.
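append() is most useful inside a loop. The sketch below assumes for loops are already familiar and uses an illustrative source list, all_cells, to build cell_types one item at a time:

```python
# all_cells is an illustrative source list for this example.
all_cells = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

cell_types = []
for cell in all_cells:
    cell_types.append( cell )   # each call adds one item to the end of the list

print( cell_types )
print( len( cell_types ) )      # 6
```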

Sorting a list

Lists can be sorted using either the sorted() function or the sort() method. The difference between the two options is that sorted() will create a new sorted list, leaving the original list intact, whereas sort() will sort a list in-place, i.e., the original list is modified.

In [9]:

diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Sort the diameters list in place using the .sort() method.
diameters.sort()

print( diameters )

Out[9]:

[7.5, 11, 11, 13.5, 13.5, 22.5]

In [10]:

diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Sort the diameters list and assign it to new variable called ascending_diameters using the sorted() function.
ascending_diameters = sorted( diameters )

print( ascending_diameters ) # the new, sorted list
print( diameters )           # the original list is not modified

Out[10]:

[7.5, 11, 11, 13.5, 13.5, 22.5]
[11, 11, 13.5, 7.5, 13.5, 22.5]
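A common slip is to assign the result of .sort() to a variable. Because sort() modifies the list in place, it returns None rather than a sorted list, as this short check shows:

```python
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

result = diameters.sort()   # sorts diameters in place

print( result )      # None: sort() does not return the list
print( diameters )   # [7.5, 11, 11, 13.5, 13.5, 22.5]
```

Use sorted() when you need the sorted list as a new value, and sort() when you only want to reorder the existing list.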

Items can be sorted in descending order by specifying reverse=True in sort() or sorted() like so:

In [11]:

diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

# Sort diameters in reverse and assign it to new variable called descending_diameters using the sorted() function.
descending_diameters = sorted( diameters, reverse=True )
print( descending_diameters )

# Sort diameters in reverse in-place using the .sort() method.
diameters.sort( reverse=True )
print( diameters )

Out[11]:

[22.5, 13.5, 13.5, 11, 11, 7.5]
[22.5, 13.5, 13.5, 11, 11, 7.5]
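The same functions work on lists of strings, which are sorted alphabetically. A minimal sketch using cell_types:

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

# Strings are compared character by character, giving alphabetical order here.
alphabetical = sorted( cell_types )
print( alphabetical )
# ['Basophil', 'Eosinophil', 'Large lymphocyte', 'Monocyte', 'Neutrophil', 'Small lymphocyte']
```

The comparison is case-sensitive and capital letters sort before lower-case ones, so a lower-case 'basophil' would be placed after 'Small lymphocyte'.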

Accessing values in a list

Accessing individual items

Lists are ordered: each item occurs at a position known as its index. (This is just like the indices of characters in a string.)

An item's value can be accessed by using the index for that item. As for strings, the first item of the list is at index 0, the second item is at index 1, and so on. The last item in the list has index -1, the penultimate item has index -2, and so on.

Run the following code to see how indexing works.

In [13]:

cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

# Access the first item of cell_types and assign it to a string variable called cell.
cell = cell_types[0]
print( cell )

# Directly print the third item of cell_types (index 2).
print( cell_types[2] )

# Access the last item of cell_types and assign it to a string variable called last_cell.
last_cell = cell_types[-1]
print( last_cell )

Out[13]:

Neutrophil
Basophil
Monocyte
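For a list of length n, the negative index -k refers to the same item as the positive index n - k. A minimal sketch (assuming range() loops are familiar) that prints both indices side by side:

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
n = len( cell_types )

for i in range( n ):
    # i - n is the negative index of the item at positive index i.
    print( i, i - n, cell_types[i], cell_types[i - n] )
```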

IndexError

If we use an index that falls outside the list (for a list of n items, any index of n or more, or less than -n), we get IndexError: list index out of range, as demonstrated in the following code.

In [14]:

print( cell_types[1000] )

Out[14]:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-14-19da7042051e> in <module>
----> 1 print( cell_types[1000] )

IndexError: list index out of range
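Valid indices for a list of length n run from -n to n - 1. The sketch below shows two ways to avoid the crash: check the index against len() first, or catch the IndexError with try/except. The variable index is illustrative:

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']
index = 1000   # an illustrative out-of-range index

# Option 1: check that the index is in range before using it.
if -len( cell_types ) <= index < len( cell_types ):
    print( cell_types[index] )
else:
    print( f'Index {index} is out of range for a list of length {len( cell_types )}' )

# Option 2: attempt the access and handle the error if it occurs.
try:
    print( cell_types[index] )
except IndexError as error:
    print( f'Caught IndexError: {error}' )
```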

Accessing multiple items: slicing

Multiple items can also be accessed at once by specifying a range of indices using slices, exactly as for strings. The slice notation is [start index : stop index], or [start index : stop index : step size] for extended slices. A slice includes the item at the start index but stops just before the item at the stop index.

Run the following code to see how this is done.

In [15]:

# Print the first to third elements in the list (indices 0, 1 and 2).
print( cell_types[0:3] )

# As for strings, the extended slice notation [::-1] reverses a list,
# i.e., it steps through the list from end to beginning.
print( cell_types[::-1] )

Out[15]:

['Neutrophil', 'Eosinophil', 'Basophil']
['Monocyte', 'Large lymphocyte', 'Small lymphocyte', 'Basophil', 'Eosinophil', 'Neutrophil']

Notice that each value printed is a list (it has square brackets). When we access an individual item of a list, we get back just that item, whether it is a string or a number. When we access a slice of a list, we get back a list.
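A few more slices, sketched below, show that either index can be omitted, that negative indices also work in slices, and that, unlike accessing a single item, a slice running past the end of the list returns whatever items exist rather than raising an IndexError. The type() calls confirm the difference between an item and a slice:

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

print( cell_types[:2] )      # start omitted: from index 0 up to (not including) index 2
print( cell_types[-2:] )     # stop omitted: the last two items
print( cell_types[::2] )     # every second item: indices 0, 2 and 4
print( cell_types[4:1000] )  # stop past the end: no error, just the items that exist

print( type( cell_types[0] ) )    # a single item: <class 'str'>
print( type( cell_types[0:1] ) )  # a slice, even of one item: <class 'list'>
```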

Testing membership of a list

You can also use in and not in to test if an item is in a list.

Try different words in the following code to see how it works.

In [16]:

cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

# Change the value of cell to try different words.
cell = 'Basophil'

if cell in cell_types:
    print( f'{cell} is a white blood cell' )
else:
    print( f'{cell} is not a white blood cell' )

Out[16]:

Basophil is a white blood cell
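The example above only uses in. A minimal sketch of not in, which is True when no item in the list equals the value tested; 'Leucocyte' and 'basophil' are illustrative values:

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

print( 'Leucocyte' not in cell_types )   # True: no item equals 'Leucocyte'
print( 'basophil' in cell_types )        # False: the comparison is case-sensitive
print( 'Basophil' not in cell_types )    # False: 'Basophil' is in the list
```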

Getting the index of an item

As for strings, we can find the index of the first occurrence of an item in a list. With strings we use the find() method; for lists we use the index() method, like so:

In [17]:

print( cell_types.index('Small lymphocyte') )

Out[17]:

3
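Because index() reports only the first occurrence, duplicate values matter. In diameters, 13.5 appears at indices 2 and 4. The sketch below also uses the list count() method, which is not covered above, to show how many times the value occurs:

```python
diameters = [11, 11, 13.5, 7.5, 13.5, 22.5]

print( diameters.index( 13.5 ) )   # 2: only the first occurrence is reported
print( diameters.count( 13.5 ) )   # 2: the number of times 13.5 occurs
```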

There is one difference from strings, though: if the item we are searching for isn't in the list, we get a ValueError, whereas a string's find() method returns -1.

Try running the following code to see the error.

In [18]:

print( cell_types.index('Leucocyte') )

Out[18]:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-244feef246a1> in <module>
----> 1 print( cell_types.index('Leucocyte') )

ValueError: 'Leucocyte' is not in list
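One way to avoid the ValueError is to test membership with in before calling index(). A minimal sketch; the two values looped over are illustrative:

```python
cell_types = ['Neutrophil', 'Eosinophil', 'Basophil', 'Small lymphocyte', 'Large lymphocyte', 'Monocyte']

for item in ['Small lymphocyte', 'Leucocyte']:
    # Only call index() when we know the item is present.
    if item in cell_types:
        print( f'{item} is at index {cell_types.index( item )}' )
    else:
        print( f'{item} is not in cell_types' )
```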

Exercise Notebook

Lists

Next Notebook

Loops II - lists
