Project: CSCI 195

Path: Class Samples / datascience / Homework-3-Solutions / hw3.ipynb

Kernel: Python 3 (system-wide)

Part 1 - Working with heights of the US presidents

Execute the cell below to load a NumPy array that contains the heights of the US presidents.

In [2]:

import pandas as pd
import numpy as np
heights = np.array(pd.read_csv('president_heights.csv')['height(cm)'])

Write one or more statements in the cell below that print out the heights of the first five presidents.

In [3]:

print(heights[0:5])

[189 170 189 163 183]

Write one or more statements in the cell below to print out:

the smallest height
the average height
the maximum height in the heights array.

In [4]:

print(np.min(heights))
print(np.mean(heights))
print(np.max(heights))

163
179.74418604651163
193

The data in the file doesn't include Presidents Trump (190 cm) and Biden (182 cm). Write one or more statements in the cell below that add these two values to the end of heights.

In [5]:

heights = np.append(heights, [190, 182])
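Note that np.append does not modify the array passed to it. It returns a new array, which is why the result is assigned back to heights. A minimal sketch with a few illustrative values (the names sample and extended are not part of the assignment):

```python
import numpy as np

# Illustrative values only, not the full data set.
sample = np.array([189, 170, 189])

# np.append builds and returns a new array; `sample` itself is unchanged.
extended = np.append(sample, [190, 182])

print(sample.tolist())    # [189, 170, 189]
print(extended.tolist())  # [189, 170, 189, 190, 182]
```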

Write one or more statements in the cell below that print out the heights of the last 10 presidents, which should now include Presidents Trump and Biden.

In [6]:

print(heights[-10:])

[182 183 177 185 188 188 182 185 190 182]

Write one or more statements in the cell below to create a numpy array named president_heights. The number of rows in president_heights should be the same as the number of rows in heights, and there should be 4 columns. Each value in president_heights should be initialized to the integer value zero.

In [20]:

president_heights = np.zeros((heights.shape[0], 4), dtype=np.int32)

Write one or more statements in the cell below to set the values in the first column of president_heights to be the same as the values in heights.

In [21]:

president_heights[:,0] = heights

Write one or more statements in the cell below to set the values in the second column of president_heights to be the height of each president in inches. Use the conversion factor 1 cm = 0.393701 in. Store the converted values as ints.

In [22]:

president_heights[:, 1] = (heights*0.393701).astype('int')

Write one or more statements in the cell below to set the values in the third column of president_heights to be the number of feet for each president, as an integer.

In [23]:

president_heights[:, 2] = (president_heights[:, 1] / 12).astype('int')

Finally, write one or more statements to set the values in the fourth column of president_heights to be the number of inches for each president. Use the % operator to get the remainder of dividing the number of inches by 12.

In [24]:

president_heights[:, 3] = (president_heights[:, 1] % 12)

Execute the cell below to print out all the values in the president_heights array.

In [25]:

print(president_heights)

[[189  74   6   2]
 [170  66   5   6]
 [189  74   6   2]
 [163  64   5   4]
 [183  72   6   0]
 [171  67   5   7]
 [185  72   6   0]
 [168  66   5   6]
 [173  68   5   8]
 [183  72   6   0]
 [173  68   5   8]
 [173  68   5   8]
 [175  68   5   8]
 [178  70   5  10]
 [183  72   6   0]
 [193  75   6   3]
 [178  70   5  10]
 [173  68   5   8]
 [174  68   5   8]
 [183  72   6   0]
 [183  72   6   0]
 [180  70   5  10]
 [168  66   5   6]
 [170  66   5   6]
 [178  70   5  10]
 [182  71   5  11]
 [180  70   5  10]
 [183  72   6   0]
 [178  70   5  10]
 [182  71   5  11]
 [188  74   6   2]
 [175  68   5   8]
 [179  70   5  10]
 [183  72   6   0]
 [193  75   6   3]
 [182  71   5  11]
 [183  72   6   0]
 [177  69   5   9]
 [185  72   6   0]
 [188  74   6   2]
 [188  74   6   2]
 [182  71   5  11]
 [185  72   6   0]
 [190  74   6   2]
 [182  71   5  11]]
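The conversion can be checked by hand on a few rows. The sketch below reuses three heights from the table and the conversion factor from the assignment; the variable names are illustrative. It also shows that astype(int) truncates rather than rounds, so rounding to the nearest inch (which the assignment does not ask for) would change some rows.

```python
import numpy as np

CM_TO_IN = 0.393701  # conversion factor given in the assignment

sample_cm = np.array([189, 170, 193])  # three heights taken from the table

total_in = (sample_cm * CM_TO_IN).astype(int)  # truncates the fractional part
feet = total_in // 12                          # whole feet
inches = total_in % 12                         # leftover inches

print(total_in.tolist())  # [74, 66, 75]
print(feet.tolist())      # [6, 5, 6]
print(inches.tolist())    # [2, 6, 3]

# Alternative (not required): round to the nearest inch before converting.
rounded_in = np.rint(sample_cm * CM_TO_IN).astype(int)
print(rounded_in.tolist())  # [74, 67, 76]
```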

Part 2 - Working with the 2015-2019 population data

Write one or more statements in the cell below that loads the contents of the JSON file fips_codes.json into a variable named fips_codes_original.

In [26]:

import json
with open("fips_codes.json") as fips_codes_file:
    fips_codes_original = json.load(fips_codes_file)

Write a statement in the cell below that uses a dictionary comprehension to create a dictionary named fips_codes from the values in the fips_codes_original dictionary.

The new dictionary's keys should be the result of converting the keys in fips_codes_original from strings to ints
The new dictionary's values should be the same as they were in fips_codes_original

For example, suppose fips_codes_original contains the keys '01' and '02' with values ['AL', 'Alabama'] and ['AK', 'Alaska']. Then fips_codes should contain the integer keys 1 and 2 with the same values.

In [27]:

fips_codes = {int(code): fips_codes_original[code] for code in fips_codes_original}

Write 2 statements in the cell below that print out the values in fips_codes for the keys 1 and 2.

In [28]:

print(fips_codes[1])
print(fips_codes[2])

['AL', 'Alabama']
['AK', 'Alaska']
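An equivalent way to write the comprehension is to iterate over .items(), which yields each key together with its value. This is a sketch on a hypothetical two-entry dictionary standing in for the contents of fips_codes.json:

```python
# Hypothetical two-entry stand-in for the contents of fips_codes.json.
sample_original = {"01": ["AL", "Alabama"], "02": ["AK", "Alaska"]}

# .items() yields (key, value) pairs, so the value does not have to be
# looked up again by key inside the comprehension.
sample_codes = {int(code): info for code, info in sample_original.items()}

print(sample_codes[1])  # ['AL', 'Alabama']
print(sample_codes[2])  # ['AK', 'Alaska']
```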

In the cell below, define a function named print_fips_info that takes 2 arguments:

A dictionary named codes containing the FIPS codes
An integer named code that is a FIPS code for a state

The function should print out the FIPS code, full state name, and abbreviation corresponding to the value of code. The output should look exactly like this:

FIPS code 26 is the state 'Michigan' ('MI')

Do not hard-code the values 26, Michigan and MI. Instead obtain them from the codes dictionary using the value of the code argument.

In [29]:

def print_fips_info(codes, code):
    state = codes[code]
    abbr = state[0]
    full_name = state[1]
    print(f"FIPS code {code} is the state '{full_name}' ('{abbr}')")

Write a statement in the cell below that calls the print_fips_info function, passing fips_codes and 26 as the values for the arguments.

In [30]:

print_fips_info(fips_codes, 26)

FIPS code 26 is the state 'Michigan' ('MI')
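A variant that returns the string instead of printing it is easier to compare against the required format. This is only a sketch: format_fips_info is a hypothetical helper, and the small dictionary stands in for fips_codes.

```python
def format_fips_info(codes, code):
    """Return the description line for one FIPS code (hypothetical helper)."""
    abbr, full_name = codes[code]  # each value is [abbreviation, full name]
    return f"FIPS code {code} is the state '{full_name}' ('{abbr}')"


# Stand-in for fips_codes with two entries.
sample_codes = {26: ["MI", "Michigan"], 39: ["OH", "Ohio"]}

line = format_fips_info(sample_codes, 26)
print(line)  # FIPS code 26 is the state 'Michigan' ('MI')
```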

Write a statement in the cell below that loads the saved NumPy array in the file state_pops_with_region.npy, storing it into an array named pops. You can reference page 117 (section 4.4) from the course textbook to find information on how to load a saved NumPy array.

In [32]:

pops = np.load('state_pops_with_region.npy')
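For context, a .npy file is what np.save writes, and np.load restores the array with its shape and dtype intact. The round trip below is a sketch that uses a temporary directory and a hypothetical file name; the two rows are borrowed from the expected output, assuming the (year, FIPS code, population, region) column layout.

```python
import os
import tempfile

import numpy as np

# Two rows in the assumed (year, FIPS code, population, region) layout.
sample = np.array([[2019, 39, 11689100, 1],
                   [2019, 26, 9986857, 1]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_pops.npy")  # hypothetical file name
    np.save(path, sample)     # writes the array in NumPy's binary .npy format
    restored = np.load(path)  # reads it back with the same shape and dtype

print(restored.shape)                    # (2, 4)
print(np.array_equal(sample, restored))  # True
```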

Write one or more statements in the cell below that print out the size of the pops array, in the format:

There are R rows and C columns in the pops array

where R and C are the actual number of rows and columns.

In [33]:

size = pops.shape
print(f"There are {size[0]} rows and {size[1]} columns in the pops array")

There are 250 rows and 4 columns in the pops array

Write one or more statements in the cell below that print out the contents of the row at index 50 in pops, like this:

Alabama's population in 2016 was 4,863,300.

Use the fips_codes dictionary to obtain the name of the state. Do not hard-code the year or population in the print statement.

In [34]:

(year, code, pop, region) = pops[50]

print(f"{fips_codes[code][1]}'s population in {year} was {pop:,}.")

Alabama's population in 2016 was 4,863,300.

Write one or more statements in the cell below that create an array named midwest_2019 containing the rows for all of the states in the Midwest region in the year 2019. The region number is contained in the 4th column of pops, and the value for states in the Midwest region is 1. Do not use a for loop.

Print the FIPS code and population columns of midwest_2019 after computing it. The output should look like this:

[[      39 11689100]
 [      55  5822434]
 [      17 12671821]
 [      18  6732219]
 [      19  3155070]
 [      26  9986857]]

In [35]:

twenty_nineteen = pops[:,0] == 2019
midwest = pops[:,3] == 1
midwest_2019 = pops[np.logical_and(twenty_nineteen, midwest)]
print(midwest_2019[:, [1,2]])

[[      39 11689100]
 [      55  5822434]
 [      17 12671821]
 [      18  6732219]
 [      19  3155070]
 [      26  9986857]]
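The two conditions are boolean masks with one entry per row, and combining them selects the rows where both are true. The sketch below uses a small made-up table (the populations and the region value 2 are placeholders) and the & operator, which acts element-wise on boolean arrays in the same way as np.logical_and.

```python
import numpy as np

# Made-up table: columns are year, FIPS code, population, region.
sample = np.array([[2018, 39, 100, 1],
                   [2019, 39, 110, 1],
                   [2019,  1,  50, 2],
                   [2019, 26,  90, 1]])

is_2019 = sample[:, 0] == 2019   # one True/False per row
is_midwest = sample[:, 3] == 1

# & combines the two masks element-wise, like np.logical_and.
selected = sample[is_2019 & is_midwest]

print(selected[:, [1, 2]].tolist())  # [[39, 110], [26, 90]]
```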

Write code in the cell below that creates and prints a NumPy array named midwest_codes_2019 containing just the FIPS codes for the data in midwest_2019.

In [36]:

midwest_codes_2019 = midwest_2019[:,1]
print(midwest_codes_2019)

[39 55 17 18 19 26]

Write one or more statements in the cell below that use a list comprehension to create a list named midwest_names_2019 containing the full names of the states contained in midwest_codes_2019. You can get the names from the fips_codes dictionary. Print the value of midwest_names_2019.

In [37]:

midwest_names_2019 = [fips_codes[code][1] for code in midwest_codes_2019]
print(midwest_names_2019)

['Ohio', 'Wisconsin', 'Illinois', 'Indiana', 'Iowa', 'Michigan']

Study the examples of using argsort found on pages 478-479 of the textbook carefully. Then write one or more statements in the cell below to print the FIPS code and population data in the midwest_2019 array, sorted in ascending order by population. To do so,

Apply argsort to the population column of midwest_2019 only to obtain an array of indexes named sort_indexes
Use sort_indexes as an indexer to "rearrange" the rows of midwest_2019, storing the result in sorted_by_population
Print just columns 1 and 2 of sorted_by_population.

The result should be

[[      19  3155070]
 [      55  5822434]
 [      18  6732219]
 [      26  9986857]
 [      39 11689100]
 [      17 12671821]]

In [38]:

sort_indexes = np.argsort(midwest_2019[:, 2])
sorted_by_population = midwest_2019[sort_indexes]
print(sorted_by_population[:, [1,2]])

[[      19  3155070]
 [      55  5822434]
 [      18  6732219]
 [      26  9986857]
 [      39 11689100]
 [      17 12671821]]
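It may help to see what argsort returns on a tiny example. It gives the positions that would put the values in ascending order, and those positions can then index any array that shares the same row order. The values and names below are made up for illustration.

```python
import numpy as np

populations = np.array([300, 100, 200])  # made-up values
codes = np.array([17, 19, 18])           # made-up labels in matching order

order = np.argsort(populations)  # positions that would sort the populations

print(order.tolist())               # [1, 2, 0]
print(populations[order].tolist())  # [100, 200, 300]
print(codes[order].tolist())        # [19, 18, 17]
```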

Write code in the cell below that prints out the state names and populations for the states in the Midwest region, sorted in descending order by population. Since NumPy arrays can only contain a single data type, I used a list comprehension to create a list of tuples containing (state name, population) pairs from the data in the midwest_2019 array. Then I iterated over that list of tuples to produce the output, which should look like this:

Illinois: 12,671,821
Ohio: 11,689,100
Michigan: 9,986,857
Indiana: 6,732,219
Wisconsin: 5,822,434
Iowa: 3,155,070

In [39]:

sorted_midwest_2019 = midwest_2019[np.argsort(midwest_2019[:,2])]
pops_with_state = [(fips_codes[row[1]][1], row[2]) for row in reversed(sorted_midwest_2019)]

for row in pops_with_state:
    print(f"{row[0]}: {row[1]:,}")

Illinois: 12,671,821
Ohio: 11,689,100
Michigan: 9,986,857
Indiana: 6,732,219
Wisconsin: 5,822,434
Iowa: 3,155,070
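Another way to get descending order is to reverse the argsort result with [::-1] before indexing, instead of calling reversed on the sorted rows. This is a sketch on made-up rows; the names dictionary is a hypothetical stand-in for the state-name lookup.

```python
import numpy as np

# Made-up rows: columns are year, code, population, region.
sample = np.array([[2019, 10, 300, 1],
                   [2019, 20, 100, 1],
                   [2019, 30, 200, 1]])
names = {10: "Alpha", 20: "Beta", 30: "Gamma"}  # hypothetical name lookup

# Reversing the ascending argsort result gives descending order.
descending = np.argsort(sample[:, 2])[::-1]

# Each row unpacks into its four columns; only code and population are used.
pairs = [(names[int(code)], int(pop)) for _, code, pop, _ in sample[descending]]

for name, pop in pairs:
    print(f"{name}: {pop:,}")
```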

Write one or more statements in the cell below to compute and print out the total population of the Midwest region in 2019 in millions. The output should look like this:

The total population of the midwest in 2019 was 50.06M

In [40]:

total_pop = np.sum(midwest_2019[:,2])
print(f"The total population of the midwest in 2019 was {total_pop / 1e6:.2f}M")

The total population of the midwest in 2019 was 50.06M

Part 3 - Image processing

As we saw in class, a digital photograph is a 3-dimensional array. The first two dimensions represent the rows and columns of the image, while the third dimension contains color information for each pixel in the image. In color images, the color information contains 3 values, corresponding to the intensities of red, green, and blue. In the last part of this assignment, you'll perform some image manipulation operations.

Get started by executing the cell below to import the PIL library and read in an image. PIL stands for Python Imaging Library.

In [41]:

from PIL import Image

img = Image.open('dutch.jpg')
display(img)

Now write a statement that uses the NumPy copy function to create a copy of the array named img, storing the result in an array named extra_blue.

In [42]:

extra_blue = np.copy(img)

Write statements in the cell below that:

Set the blue (3rd) component of all of the pixels in extra_blue to 255.
Use the Image.fromarray method to create a new Image named extra_blue_img from the extra_blue array.
Use the display function to display extra_blue_img.

In [43]:

extra_blue[:, :, 2] = 255
extra_blue_img = Image.fromarray(extra_blue)
display(extra_blue_img)

Now, in the cell below, do something similar to what you did for extra_blue to create a copy of img named blue_removed. blue_removed should have the blue component set to 0. Display the resulting image after creating it.

In [44]:

blue_removed = np.copy(img)
blue_removed[:, :, 2] = 0
display(Image.fromarray(blue_removed))
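The same indexing can be tried on a tiny array without loading an image file. The sketch below builds a made-up 2x2 RGB array and clears its blue channel; it also shows that working on a copy leaves the original pixel values untouched.

```python
import numpy as np

# Made-up 2x2 RGB "image": shape (rows, columns, 3), dtype uint8.
pixels = np.array([[[10, 20, 30], [40, 50, 60]],
                   [[70, 80, 90], [100, 110, 120]]], dtype=np.uint8)

no_blue = np.copy(pixels)  # copy so the original array is left untouched
no_blue[:, :, 2] = 0       # index 2 on the last axis is the blue channel

print(no_blue[0, 0].tolist())  # [10, 20, 0]
print(pixels[0, 0].tolist())   # [10, 20, 30]
```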

In the cell below:

Create a copy of img named gray
Use np.mean with an appropriate axis= argument to compute the average of the RGB values for each pixel, storing the result in an array named average (what will the shape of average be?) Then
- set the red values in gray to the values in average
- set the green values in gray to the values in average
- set the blue values in gray to the values in average
Use Image.fromarray to create an image named gray_image from gray
Call display to display the Image named gray_image.

In [45]:

gray = np.copy(img)
average = np.mean(gray, axis=2)
gray[:, :, 0] = average
gray[:, :, 1] = average
gray[:, :, 2] = average
gray_image = Image.fromarray(gray)
display(gray_image)
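To answer the question about the shape: averaging over axis=2 removes the color axis, so average has one value per pixel and its shape is (rows, columns). The sketch below checks this on a made-up 2x2 array, and also shows an alternative that assigns all three channels at once by broadcasting. Note that np.mean returns floats, which are truncated when stored into a uint8 array.

```python
import numpy as np

# Made-up 2x2 RGB "image" with dtype uint8.
pixels = np.array([[[10, 20, 30], [40, 50, 60]],
                   [[70, 80, 90], [100, 110, 121]]], dtype=np.uint8)

average = np.mean(pixels, axis=2)  # averages over the color axis
print(average.shape)               # (2, 2)
print(average.dtype)               # float64

gray = np.copy(pixels)
# Adding a trailing axis lets the (2, 2, 1) averages broadcast across all
# three channels; the float values are truncated on assignment to uint8.
gray[:, :, :] = average[:, :, np.newaxis]

print(gray[0, 0].tolist())  # [20, 20, 20]
print(gray[1, 1].tolist())  # [110, 110, 110]
```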
