CoCalc -- Python Refresher

GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Lessons/Lesson 01 - LP 1/extras/Python Refresher - Loops.ipynb

Kernel: Python 3 (system-wide)

Loops Introduction

Loops are a fundamental programming concept, and we use them extensively when hand-coding optimization problems. Understanding how loops work is crucial to completing your homework successfully. Generally, we use for loops and while loops. The two are similar but stop under different conditions: a for loop stops when it runs out of items, and a while loop stops when its condition becomes false.

The easiest way to understand what loops are doing is to practice using them with simple conditions, printing out your variables for each iteration of the loop. Flow diagram images in this tutorial are from https://medium.com/datadriveninvestor/how-to-understand-for-and-while-loop-visually-c11052479df5

For Loops

For loops are used when you know how many iterations you need. You can loop a fixed number of times, or you can loop over an iterable object (a list, a dictionary, a numpy array, a dataframe, etc.).
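As a minimal sketch of both styles (the variable names and sample data here are made up for illustration), the first loop runs a fixed number of times with range(), and the other two loop directly over an iterable:

```python
# Loop a fixed number of times: range(3) yields 0, 1, 2
squares = []
for n in range(3):
    squares.append(n * n)
print(squares)  # [0, 1, 4]

# Loop over a list (hypothetical sample data)
colors = ['red', 'green', 'blue']
for color in colors:
    print('This color is:', color)

# Looping over a dictionary yields its keys, in insertion order
prices = {'apple': 0.5, 'pear': 0.75}
fruits_seen = []
for fruit in prices:
    fruits_seen.append(fruit)
    print(fruit, 'costs', prices[fruit])
```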

Conceptually, a for loop looks like this:

[Figure: flow diagram of a for loop]

In other words, for each item in a sequence, we execute some code. If at the end of the code execution, there is another item in the sequence, we go back to the top of our code block and execute again, using the next item of the sequence. Let's see how it works with looping over a numpy array of random numbers. We'll manually sum up the random numbers using a for loop.

In [0]:

#import numpy
import numpy as np

#create a random seed so we get consistent results
np.random.seed(123)

# Create an iterable item, in this case a numpy array of 10 random integers from 1 to 99 (high is exclusive)
# Generally, iterables should have plural variable names, so you can track that they contain multiple values
items = np.random.randint(low=1, high=100, size=10)

#look at what is in items
print(items)

#This is the beginning of the loop. We will loop over each item in the array. Python makes that easy with iterables.
#We can just loop over the iterable itself, telling Python what variable name we want to give to the item in each loop
for i in items:
    print('This item is:', i)
    
#it doesn't matter what you call the variable, it can be anything. 
for item in items:
    print('This item is:', item)

So far we've looped through twice, but we haven't done anything to total our numbers. This time, we'll add to our total with each loop. We have to have a variable that gets set before we start our loop, so that we can add to it inside the loop. If we initialized the total variable inside the loop, we'd be resetting it each time.

In [0]:

#we need a variable that is set to zero BEFORE the loop starts. 
total = 0
for i in items:
    total += i #+= is Python's way of saying, add this to the existing total
    #watch our total increase
    print(total)
    
#we're now outside the loop. It's done looping. We can print the final total.        
print('Final total: ', total)
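To see why the total has to be initialized before the loop, here is a small sketch that compares both placements. The list sample_items is made-up data, not the items array from above.

```python
sample_items = [5, 10, 15]  # hypothetical sample data

# Wrong: the total is reset to zero at the top of every iteration
for i in sample_items:
    wrong_total = 0
    wrong_total += i
print('Reset inside the loop:', wrong_total)  # 15, only the last item survives

# Right: the total is initialized once, before the loop starts
right_total = 0
for i in sample_items:
    right_total += i
print('Initialized before the loop:', right_total)  # 30
```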

What if we only wanted to add to our total if our current item was an even number? We can evaluate our current item inside the loop and use a conditional if statement to decide whether or not to add to the total.

In [0]:

#we need a variable that is set to zero BEFORE the loop starts. 
#(If we didn't reset this here, we'd still be adding to the total from above)
total = 0

for i in items:
    #the % operator is modulus. It returns the remainder when the first number is divided by the second
    #we can use it to determine if a number is even, because the remainder is zero when an even number is divided by 2
    print('Modulus is', i%2)
    if i%2 == 0:
        total += i
        print('Adding to total equals: ', total)

#we're now outside the loop. It's done looping. We can print the final total.        
print('\nFinal total: ', total)
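If the modulus operator is new to you, a few standalone examples may help. This is just an illustration with arbitrary numbers; divmod() is a built-in that returns the quotient and the remainder together.

```python
print(7 % 2)   # 1, so 7 is odd
print(8 % 2)   # 0, so 8 is even
print(17 % 5)  # 2, because 17 = 3 * 5 + 2

# divmod returns (quotient, remainder) in one call
quotient, remainder = divmod(17, 5)
print(quotient, remainder)  # 3 2
```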

That's a lot of lines of code just to sum some numbers. Python has shortcut ways to do these kinds of for loops. One of those is called a list comprehension. If all we wanted to do was print the sum of all the even numbers, we could do it in a single line, like below.

In [0]:

sum([i for i in items if i%2==0])

Let's break down what's happening in that list comprehension, starting with the expression inside the brackets [].

The i for i in items part is our for loop. Notice the syntax is a little bit different. First we're telling Python to return i, and then we're telling it what i is (each item in items). The trailing if i%2==0 is a filter that keeps only the even items.

What this returns is a list with just the items that meet our condition.

In [0]:

[i for i in items if i%2==0]

Because sum accepts any iterable, you can wrap that whole list in sum to get the sum of all the items in the list. Because this expression is the last line in the code cell, the notebook displays its value automatically. But we could also save the result to another variable, or print it by wrapping the whole thing with print().
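One way to convince yourself that a comprehension is just a compact for loop is to write both side by side. The sketch below uses a short made-up list (sample_items) rather than the random items array.

```python
sample_items = [67, 18, 93, 84]  # hypothetical sample data

# Explicit loop: build the list of even items one append at a time
evens_loop = []
for i in sample_items:
    if i % 2 == 0:
        evens_loop.append(i)

# Equivalent list comprehension
evens_comp = [i for i in sample_items if i % 2 == 0]

print(evens_loop)       # [18, 84]
print(evens_comp)       # [18, 84]
print(sum(evens_comp))  # 102
```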

What if we needed a boolean vector instead, with True for even and False for odd? The syntax is slightly different. The value we want if our condition is met comes first (True), then our conditional logic, then our else value, and finally we tell Python what we are looping over.

In [0]:

[True if i%2==0 else False for i in items]
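The if/else form is most useful when the two values are something other than True and False. As a sketch with made-up data: since i % 2 == 0 already evaluates to a boolean, the True/False version can be shortened, while the same syntax can produce labels instead.

```python
sample_items = [67, 18, 93, 84]  # hypothetical sample data

# The value-if-true / condition / value-if-false form from above
flags_ternary = [True if i % 2 == 0 else False for i in sample_items]

# The comparison is already a boolean, so this gives the same result
flags_direct = [i % 2 == 0 for i in sample_items]

# The same syntax with non-boolean values
labels = ['even' if i % 2 == 0 else 'odd' for i in sample_items]

print(flags_ternary)  # [False, True, False, True]
print(flags_direct)   # [False, True, False, True]
print(labels)         # ['odd', 'even', 'odd', 'even']
```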

Now you try. Create a list comprehension that adds up the odd numbers in our items array. (Hint: the answer is 443.)

In [0]:

While Loops

A while loop keeps running as long as some condition is true.
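A while loop is a natural fit when you don't know in advance how many iterations you need. As a small sketch with an arbitrary starting value, this loop halves a number until it reaches 1 and counts the steps it took:

```python
value = 100  # hypothetical starting value
steps = 0

# Keep halving (integer division) while the value is still greater than 1
while value > 1:
    value //= 2
    steps += 1
    print('Step', steps, 'value is', value)

print('Number of halvings:', steps)  # 6
```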

In a flow diagram while loops look like this:

[Figure: flow diagram of a while loop]

We can do everything that we did above using a while loop with a few changes in syntax.

In [0]:

#first we'll set a counter
counter = 0

#now we'll figure out how long our numpy array is
max_length = len(items)

#check how long our items array is
print('Max length is:', max_length)

#now we'll loop "while" our counter is less than the max_length

while counter < max_length:
    #we have to increment the counter manually or our loop will run forever. 
    counter +=1  
    
    #Let's see what's in our counter variable each time
    print('Counter is', counter)

When you're using a while loop, you're not directly looping over an object, so you don't have the item. You have an index that you can use to fetch the item from the iterable. Let's see how that's done.

In [0]:

#remember to set the counter to zero again before we try to loop again
counter = 0
while counter < max_length:   
    #we have to increment the counter manually or our loop will run forever
    counter +=1     
    #fetch the item from the array
    print('Counter is', counter)
    print('Item is', items[counter])

Uh oh. What happened? We were going along just fine and then bam, an error. Can you see what the problem is?

Python indexes start from zero, so the valid indexes for our 10 items are 0 through 9. Since we incremented our counter at the start of the loop, we started counting from one, skipped the first item, and asked for items[10] on the last pass. We ran out of items in the array before we finished our loop.

We can easily fix this by moving our counter += 1 to the end of our loop.

In [0]:

#remember to set the counter to zero again before we try to loop again
counter = 0
while counter < max_length:   
     
    #fetch the item from the array
    print('Counter is', counter)
    print('Item is', items[counter])
    
    #we have to increment the counter manually or our loop will run forever
    counter +=1
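If you need both the index and the item, a for loop with the built-in enumerate() is one way to avoid managing a counter by hand. This sketch uses a short made-up list.

```python
sample_items = [67, 93, 99]  # hypothetical sample data

# Valid indexes run from 0 to len(sample_items) - 1
last_index = len(sample_items) - 1
print('Last valid index:', last_index)         # 2
print('Last item:', sample_items[last_index])  # 99

# enumerate() yields (index, item) pairs, so no manual counter is needed
pairs = []
for index, item in enumerate(sample_items):
    print('Index is', index, 'and item is', item)
    pairs.append((index, item))
```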

Again, if we only wanted to total all our numbers, it works the same way as it did in the for loop, but we have to fetch the item to add.

In [0]:

#we need two variables that are set to zero BEFORE the loop starts. 
total = 0
counter = 0
while counter < max_length:   
    total += items[counter]
    #watch our total increase
    print(total)
    counter +=1 
    
#we're now outside the loop. It's done looping. We can print the final total.        
print('Final total: ', total)

Conditionally adding to the total works similarly, too - we're just fetching what to evaluate and add. Since we'd have to fetch the item twice in this code, it makes sense to set it as a local variable. This local variable will get rewritten each time we loop through the code.

In [0]:

#we need two variables that are set to zero BEFORE the loop starts. 
total = 0
counter = 0
while counter < max_length:   
    #fetch the item into a local variable
    this_item = items[counter]
    
    if this_item%2 == 0:
        total += this_item
    #watch our total increase (or not)
    print(total)
    counter +=1 
    
#we're now outside the loop. It's done looping. We can print the final total.        
print('Final total: ', total)

More Complex Data

We can also loop over more complex data structures. In this course, we'll often loop over dictionaries with compound keys. Understanding how to access data in dictionaries of this type is important. It looks complex, but it's really exactly like what we've done before. The dictionary is an iterable, just like a simple list.

Let's start by creating a list of possible driving routes, where each route is a tuple of 2 cities. In the code below, the first city in each tuple is the "to" city and the second is the "from" city. Note: a tuple is a collection which is ordered and unchangeable. A tuple can work as a key in a dictionary because it cannot be changed after it is created, so the key stays stable. https://www.w3schools.com/python/python_tuples.asp

In [0]:

routes = [
    ('Madison', 'Chicago'),
    ('Milwaukee', 'Madison'),
    ('Minneapolis', 'Eau Claire')
    ]
print(routes)
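To illustrate why tuples work as dictionary keys and lists do not, here is a small sketch. The flag variables are hypothetical names used only to record what happened.

```python
route = ('Madison', 'Chicago')

# A tuple can be used as a dictionary key
sample_distances = {route: 166.3}
print(sample_distances[('Madison', 'Chicago')])  # 166.3

# A list is mutable, so Python refuses to use it as a key
list_key_rejected = False
try:
    bad = {['Madison', 'Chicago']: 166.3}
except TypeError:
    list_key_rejected = True
print('List rejected as key:', list_key_rejected)  # True

# A tuple cannot be modified in place
tuple_change_rejected = False
try:
    route[0] = 'Milwaukee'
except TypeError:
    tuple_change_rejected = True
print('Tuple change rejected:', tuple_change_rejected)  # True
```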

Our list of routes doesn't do us much good without some additional information about each route. What if we wanted to know the distance and traffic level (light, medium, heavy)? We could make 2 dictionaries by zipping our additional information together with our tuple keys.

In [0]:

route_distances = zip(routes, [166.3, 89.9, 92.5])
   
print(route_distances)

But, this is just a zip object, which isn't what we want. We really want a dictionary. We could do it in another step, or we could do it all in one step.

In [0]:

#2-step approach
route_distances = dict(route_distances)
print(route_distances)
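One thing to watch for with the 2-step approach: a zip object is an iterator that can only be consumed once. Printing it does not consume it, which is why the conversion above works, but looping over it or casting it to a list first would leave nothing for dict(). A sketch with made-up data:

```python
letters = ['a', 'b', 'c']  # hypothetical sample data
numbers = [1, 2]

pairs = zip(letters, numbers)

# zip stops at the shortest input, so 'c' has no partner and is dropped
first_pass = list(pairs)
print(first_pass)   # [('a', 1), ('b', 2)]

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(pairs)
print(second_pass)  # []
```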

Let's do it all in one step for our traffic conditions. We'll nest the zip function inside a dict function.

In [0]:

#1 step approach
route_traffic = dict(zip(routes, ['heavy', 'light', 'medium']))
print(route_traffic)

If we want to loop over our route_distances dictionary, we use a for loop like we have before. Looping over a dictionary gives us its keys, and because each key is a tuple, we can unpack it into two variables as we loop. That's what we're doing with the (t, f) bit: t is the "to" city and f is the "from" city. We can look up the value for each key by using bracket notation.

In [0]:

for (t, f) in route_distances:
    print('The distance from {0} to {1} is {2} miles.'.format(f, t, route_distances[(t,f)]))
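An alternative sketch, not required for this course: the dictionary's .items() method yields each key together with its value, so the key can be unpacked and the distance read in one step. The dictionary below is a made-up copy of two of the routes so the example stands on its own.

```python
sample_distances = {
    ('Madison', 'Chicago'): 166.3,
    ('Milwaukee', 'Madison'): 89.9,
}

lines = []
# Each iteration yields ((to, from), miles); the key tuple is unpacked in place
for (t, f), miles in sample_distances.items():
    lines.append('The distance from {0} to {1} is {2} miles.'.format(f, t, miles))

for line in lines:
    print(line)
```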

What if we knew our list of cities, but we weren't sure which cities were in our distances dictionary? We could loop over each possible combination of cities to find out.

In [0]:

cities = ['Madison', 'Chicago', 'Milwaukee', 'Minneapolis', 'Eau Claire']

#our keys are tuples, so let's use nested loops to build every (to, from) pair
for ct in cities: #this is the "to" city loop
    for cf in cities: #this is the "from" city loop
        if (ct, cf) in route_distances: #here we're making sure that we have this particular to-from city combination
            print('The distance from {0} to {1} is {2} miles.'.format(cf, ct, route_distances[(ct,cf)]))
        else:
            print('We have no information about {0} to {1}'.format(cf, ct))

We can also do the same thing by first determining all permutations of pairs of items in our city list, and then looping over the permutations.

In [0]:

#import the itertools permutations 
from itertools import permutations 

#if we want to print this, we need to cast it to a list
city_combos = list(permutations(cities, 2))

#see what this returns
print(city_combos)

#now we can loop over all possible pairs of cities
for (t,f) in city_combos:
    if (t,f) in route_distances: #we only want to print the ones that exist
        print('The distance from {0} to {1} is {2} miles.'.format(f, t, route_distances[(t,f)]))
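The nested loops and permutations do not produce exactly the same pairs: nested loops also pair each city with itself. The sketch below compares them on a shortened, made-up city list and also shows combinations, which ignores order.

```python
from itertools import combinations, permutations, product

sample_cities = ['Madison', 'Chicago', 'Milwaukee']  # hypothetical shortened list

# Nested loops give every ordered pair, including a city paired with itself
nested = [(ct, cf) for ct in sample_cities for cf in sample_cities]

# permutations gives every ordered pair of two different cities
perms = list(permutations(sample_cities, 2))

# combinations gives each unordered pair only once
combos = list(combinations(sample_cities, 2))

print(len(nested), len(perms), len(combos))  # 9 6 3

# product with repeat=2 matches the nested loops
print(nested == list(product(sample_cities, repeat=2)))  # True
```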

Just like with lists, you can use comprehensions with dictionaries. Let's get the average distance of routes that have Madison as either the to city or the from city. First we'll do it the long way:

In [0]:

#create an empty numpy array that will hold all the distances that we have for routes that include Madison
#we'll use a numpy array to get a shortcut to averaging
total_madison_route_distance = np.array([])

#loop over our route_distances
for (t,f) in route_distances:
    if t == 'Madison' or f == 'Madison':
        total_madison_route_distance = np.append(total_madison_route_distance, route_distances[(t,f)])


#see what we have
print(total_madison_route_distance)

#use numpy mean to get the average
print(np.mean(total_madison_route_distance))

Now let's do it with a comprehension. We'll use a regular list and roll our own function for determining the average of a list.

In [0]:

### this is a function that determines the average of a list
def Average(lst): 
    return sum(lst) / len(lst)

#we're generating the list of just those routes that contain Madison
total_madison_route_distance = [route_distances[(t,f)] for (t,f) in route_distances if t == 'Madison' or f == 'Madison']

#print the average
print(Average(total_madison_route_distance))
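Rolling your own average works, but it divides by zero if no routes match. As a sketch of two alternatives: the standard library's statistics.mean does the same calculation, and safe_average is a hypothetical helper (not part of the lesson) that returns None for an empty list. The two distances are the Madison routes from the dictionary above.

```python
from statistics import mean

madison_distances = [166.3, 89.9]  # the two routes above that include Madison

def safe_average(lst):
    """Return the mean of lst, or None when lst is empty (hypothetical helper)."""
    if len(lst) == 0:
        return None
    return sum(lst) / len(lst)

print(mean(madison_distances))
print(safe_average(madison_distances))
print(safe_average([]))  # None
```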

One final example. If what we want to do with our list comprehension is a little bit messy, we can also write a function that we use in the list comprehension. In this scenario, we want to create a new dictionary that holds the time to drive our routes. We know that, on average, people drive 70 mph on these roads (all our destinations are connected by interstates). But, if traffic is heavy, it takes approximately 10% longer to get there. We'll make a function that determines drive time and use it to generate a new dictionary of route drive times.

In [0]:

#our function takes in the to and from of our tuple keys
def calcDriveTime(t,f):
    #get the traffic for this route
    traffic = route_traffic[(t,f)]
    if traffic == 'heavy':
        return ((route_distances[(t,f)]/70) * 1.1)
    else:
        return (route_distances[(t,f)]/70)

#this is the dictionary comprehension - our keys are to the left of the : and our value to the right    
route_drive_times = {(t,f):calcDriveTime(t,f) for (t,f) in route_distances}   

#see what we got
route_drive_times
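A variation on the same idea, offered as a sketch rather than the course's method: the helper can take the distance and traffic level as arguments instead of reading the dictionaries directly, which makes it easier to test on its own. The names drive_time_hours and SPEED_MPH and the round-number distances are made up for illustration.

```python
SPEED_MPH = 70  # average speed assumed in the text above

# Hypothetical round-number data so the arithmetic is easy to check by hand
sample_distances = {('Madison', 'Chicago'): 140.0, ('Milwaukee', 'Madison'): 70.0}
sample_traffic = {('Madison', 'Chicago'): 'heavy', ('Milwaukee', 'Madison'): 'light'}

def drive_time_hours(miles, traffic):
    """Hours to drive the given miles; heavy traffic takes 10% longer."""
    multiplier = 1.1 if traffic == 'heavy' else 1.0
    return miles / SPEED_MPH * multiplier

# Dictionary comprehension over (key, value) pairs
sample_drive_times = {
    route: drive_time_hours(miles, sample_traffic[route])
    for route, miles in sample_distances.items()
}
print(sample_drive_times)
```

With these numbers, the heavy-traffic route takes 140 / 70 * 1.1 = 2.2 hours and the light-traffic route takes 70 / 70 = 1 hour.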
