Project: CSCI 195

Part 1 - Working with heights of the US presidents

Execute the cell below to load a NumPy array that contains the heights of the US presidents.

import pandas as pd
import numpy as np
heights = np.array(pd.read_csv('president_heights.csv')['height(cm)'])

Write one or more statements in the cell below that print out the heights of the first five presidents.

print(heights[0:5])
[189 170 189 163 183]

Write one or more statements in the cell below to print out:

  • the smallest height

  • the average height

  • the maximum height in the heights array.

print(np.min(heights))
print(np.mean(heights))
print(np.max(heights))
163
179.74418604651163
193
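As a quick check of the three reductions used above, here is a minimal sketch on a small sample array. The sample holds only the first five heights printed earlier, so its mean differs from the full-array value; the variable names are illustrative.

```python
import numpy as np

# Illustrative sample: the first five heights printed earlier.
sample = np.array([189, 170, 189, 163, 183])

# The array methods are equivalent to np.min, np.mean and np.max.
smallest = sample.min()
average = sample.mean()
largest = sample.max()

print(smallest, average, largest)
```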

The data in the file doesn't include Presidents Trump (190 cm) and Biden (182 cm). Write one or more statements in the cell below that add these two values to the end of heights.

heights = np.append(heights, [190, 182])

Write one or more statements in the cell below that print out the heights of the last 10 presidents, which should now include Presidents Trump and Biden.

print(heights[-10:])
[182 183 177 185 188 188 182 185 190 182]
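One detail worth keeping in mind is that np.append does not modify its input; it returns a new array, which is why the result is assigned back to heights. A minimal sketch, using a two-element stand-in array with illustrative names:

```python
import numpy as np

# Two-element stand-in for the heights array.
original = np.array([182, 185])

# np.append builds and returns a new array; `original` is left untouched.
extended = np.append(original, [190, 182])

print(original)        # still two elements
print(extended[-3:])   # a negative start index counts from the end
```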

Write one or more statements in the cell below to create a numpy array named president_heights. The number of rows in president_heights should be the same as the number of rows in heights, and there should be 4 columns. Each value in president_heights should be initialized to the integer value zero.

president_heights = np.zeros((heights.shape[0], 4), dtype=np.int32)

Write one or more statements in the cell below to set the values in the first column of president_heights to be the same as the values in heights.

president_heights[:,0] = heights

Write one or more statements in the cell below to set the values in the second column of president_heights to be the height of each president in inches. Use the conversion factor 1 cm = 0.393701 in. Store the converted values as ints.

president_heights[:, 1] = (heights*0.393701).astype('int')

Write one or more statements in the cell below to set the values in the third column of president_heights to be the number of feet for each president, as an integer.

president_heights[:, 2] = (president_heights[:, 1] / 12).astype('int')

Finally, write one or more statements to set the values in the fourth column of president_heights to be the number of inches for each president. Use the % operator to get the remainder of dividing the number of inches by 12.

president_heights[:, 3] = (president_heights[:, 1] % 12)

Execute the cell below to print out all the values in the president_heights array.

print(president_heights)
[[189  74   6   2]
 [170  66   5   6]
 [189  74   6   2]
 [163  64   5   4]
 [183  72   6   0]
 [171  67   5   7]
 [185  72   6   0]
 [168  66   5   6]
 [173  68   5   8]
 [183  72   6   0]
 [173  68   5   8]
 [173  68   5   8]
 [175  68   5   8]
 [178  70   5  10]
 [183  72   6   0]
 [193  75   6   3]
 [178  70   5  10]
 [173  68   5   8]
 [174  68   5   8]
 [183  72   6   0]
 [183  72   6   0]
 [180  70   5  10]
 [168  66   5   6]
 [170  66   5   6]
 [178  70   5  10]
 [182  71   5  11]
 [180  70   5  10]
 [183  72   6   0]
 [178  70   5  10]
 [182  71   5  11]
 [188  74   6   2]
 [175  68   5   8]
 [179  70   5  10]
 [183  72   6   0]
 [193  75   6   3]
 [182  71   5  11]
 [183  72   6   0]
 [177  69   5   9]
 [185  72   6   0]
 [188  74   6   2]
 [188  74   6   2]
 [182  71   5  11]
 [185  72   6   0]
 [190  74   6   2]
 [182  71   5  11]]
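The four-column conversion can be traced on a few rows. This is a sketch rather than the assignment's required approach: it uses floor division (//) in place of dividing and casting, and np.column_stack to assemble the table, and the variable names are illustrative. Note that casting to int truncates, so 74.41 inches becomes 74.

```python
import numpy as np

# Three heights taken from the table above.
cm = np.array([189, 170, 163])

# astype(int) truncates toward zero: 189 cm * 0.393701 = 74.41 -> 74 inches.
total_inches = (cm * 0.393701).astype(int)

feet = total_inches // 12    # whole feet
inches = total_inches % 12   # leftover inches

# Stack the four 1-D arrays as columns of one 2-D array.
table = np.column_stack((cm, total_inches, feet, inches))
print(table)
```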

Part 2 - Working with the 2015-2019 population data

Write one or more statements in the cell below that load the contents of the JSON file fips_codes.json into a variable named fips_codes_original.

import json
with open("fips_codes.json") as fips_codes_file:
    fips_codes_original = json.load(fips_codes_file)

Write a statement in the cell below that uses a dictionary comprehension to create a dictionary named fips_codes from the values in the fips_codes_original dictionary.

  • The new dictionary's keys should be the result of converting the keys in fips_codes_original from strings to ints

  • The new dictionary's values should be the same as they were in fips_codes_original

For example, suppose fips_codes_original contains the keys '01' and '02' with values ['AL', 'Alabama'] and ['AK', 'Alaska']. Then fips_codes should contain the integer keys 1 and 2 with the same values.

fips_codes = {int(code): fips_codes_original[code] for code in fips_codes_original}
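An equivalent way to write this comprehension iterates over key/value pairs with .items(), which avoids the second lookup. A minimal sketch on a two-entry stand-in dictionary, with illustrative names:

```python
# Two-entry stand-in for fips_codes_original.
original = {"01": ["AL", "Alabama"], "02": ["AK", "Alaska"]}

# int("01") evaluates to 1, so leading zeros disappear from the new keys.
converted = {int(key): value for key, value in original.items()}

print(converted)
```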

Write 2 statements in the cell below that print out the values in fips_codes for the keys 1 and 2.

print(fips_codes[1])
print(fips_codes[2])
['AL', 'Alabama']
['AK', 'Alaska']

In the cell below, define a function named print_fips_info that takes 2 arguments:

  • A dictionary named codes containing the FIPS codes

  • An integer named code that is a FIPS code for a state

The function should print out the FIPS code, full state name, and abbreviation corresponding to the value of code. The output should look exactly like this:

FIPS code 26 is the state 'Michigan' ('MI')

Do not hard-code the values 26, Michigan and MI. Instead obtain them from the codes dictionary using the value of the code argument.

def print_fips_info(codes, code):
    state = codes[code]
    abbr = state[0]
    full_name = state[1]
    print(f"FIPS code {code} is the state '{full_name}' ('{abbr}')")

Write a statement in the cell below that calls the print_fips_info function, passing fips_codes and 26 as the values for the arguments.

print_fips_info(fips_codes, 26)
FIPS code 26 is the state 'Michigan' ('MI')

Write a statement in the cell below that loads the saved NumPy array in the file state_pops_with_region.npy, storing it into an array named pops. You can reference page 117 (section 4.4) from the course textbook to find information on how to load a saved NumPy array.

pops = np.load('state_pops_with_region.npy')
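For context, np.load reads files written by np.save, and the round trip preserves both shape and dtype. The sketch below uses a made-up array and a temporary file named demo.npy (both hypothetical), not the course data file.

```python
import os
import tempfile

import numpy as np

# Made-up 3x4 integer array standing in for the real population data.
data = np.arange(12).reshape(3, 4)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.npy")   # hypothetical file name
    np.save(path, data)        # writes the array in NumPy's binary .npy format
    restored = np.load(path)   # reads it back with shape and dtype intact

print(restored.shape)
print(np.array_equal(data, restored))
```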

Write one or more statements in the cell below that print out the size of the pops array, in the format:

There are R rows and C columns in the pops array

where R and C are the actual number of rows and columns.

size = pops.shape
print(f"There are {size[0]} rows and {size[1]} columns in the pops array")
There are 250 rows and 4 columns in the pops array

Write one or more statements in the cell below that print out the contents of the row at index 50 in pops, like this:

Alabama's population in 2016 was 4,863,300.

Use the fips_codes dictionary to obtain the name of the state. Do not hard-code the year or population in the print statement.

(year, code, pop, region) = pops[50]
print(f"{fips_codes[code][1]}'s population in {year} was {pop:,}.")
Alabama's population in 2016 was 4,863,300.

Write one or more statements in the cell below that create an array named midwest_2019 containing the codes and populations for all of the states in the Midwest region in the year 2019. The region number is contained in the 4th column of pops, and the value for states in the Midwest region is 1. Do not use a for loop.

Print the value of midwest_2019 after computing it. The output should look like this:

[[      39 11689100]
 [      55  5822434]
 [      17 12671821]
 [      18  6732219]
 [      19  3155070]
 [      26  9986857]]

twenty_nineteen = pops[:,0] == 2019
midwest = pops[:,3] == 1
midwest_2019 = pops[np.logical_and(twenty_nineteen, midwest)]
print(midwest_2019[:, [1,2]])
[[      39 11689100]
 [      55  5822434]
 [      17 12671821]
 [      18  6732219]
 [      19  3155070]
 [      26  9986857]]
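The row selection above relies on boolean masks. Here is a minimal sketch on a four-row stand-in array that combines the masks with the & operator, which for boolean arrays behaves like np.logical_and. Two rows reuse values shown in this notebook; the other two are made up and marked as such in the comments.

```python
import numpy as np

# Columns: year, FIPS code, population, region.
demo = np.array([
    [2016,  1,  4863300, 3],   # region value 3 is a made-up placeholder
    [2019, 39, 11689100, 1],
    [2018, 39, 11000000, 1],   # made-up 2018 row
    [2019, 55,  5822434, 1],
])

# Each comparison yields one boolean per row; & requires the parentheses.
mask = (demo[:, 0] == 2019) & (demo[:, 3] == 1)

# Keep the matching rows, then only the code and population columns.
selected = demo[mask][:, [1, 2]]
print(selected)
```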

Write code in the cell below that creates and prints a NumPy array named midwest_codes_2019 containing just the FIPS codes for the data in midwest_2019.

midwest_codes_2019 = midwest_2019[:,1]
print(midwest_codes_2019)
[39 55 17 18 19 26]

Write one or more statements in the cell below that use a list comprehension to create a list named midwest_names_2019 containing the full names of the states contained in midwest_codes_2019. You can get the names from the fips_codes dictionary. Print the value of midwest_names_2019.

midwest_names_2019 = [fips_codes[code][1] for code in midwest_codes_2019]
print(midwest_names_2019)
['Ohio', 'Wisconsin', 'Illinois', 'Indiana', 'Iowa', 'Michigan']

Study the examples of using argsort found on pages 478-479 of the textbook carefully. Then write one or more statements in the cell below to print the FIPS code and population data in the midwest_2019 array, sorted in ascending order by population. To do so,

  • Apply argsort to the population column of midwest_2019 only to obtain an array of indexes named sort_indexes

  • Use sort_indexes as an indexer to "rearrange" the rows of midwest_2019, storing the result in sorted_by_population

  • Print just columns 1 and 2 of sorted_by_population.

The result should be

[[      19  3155070]
 [      55  5822434]
 [      18  6732219]
 [      26  9986857]
 [      39 11689100]
 [      17 12671821]]

sort_indexes = np.argsort(midwest_2019[:, 2])
sorted_by_population = midwest_2019[sort_indexes]
print(sorted_by_population[:, [1,2]])
[[      19  3155070]
 [      55  5822434]
 [      18  6732219]
 [      26  9986857]
 [      39 11689100]
 [      17 12671821]]
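To see what argsort returns, it can help to inspect the index array before using it. A minimal sketch on three of the code/population pairs shown above, with illustrative variable names:

```python
import numpy as np

# Three (FIPS code, population) rows taken from the output above.
codes_and_pops = np.array([
    [39, 11689100],
    [55,  5822434],
    [17, 12671821],
])

# argsort returns the positions that would sort the column, not the values.
order = np.argsort(codes_and_pops[:, 1])
print(order)

# Indexing with that array rearranges whole rows into ascending order.
ascending = codes_and_pops[order]
print(ascending)
```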

Write code in the cell below that prints out the state names and populations for the states in the Midwest region, sorted in descending order by population. Since NumPy arrays can only contain a single data type, I used a list comprehension to create a list of tuples containing (state name, population) pairs from the data in the midwest_2019 array. Then I iterated over that list of tuples to produce the output, which should look like this:

Illinois: 12,671,821
Ohio: 11,689,100
Michigan: 9,986,857
Indiana: 6,732,219
Wisconsin: 5,822,434
Iowa: 3,155,070

sorted_midwest_2019 = midwest_2019[np.argsort(midwest_2019[:,2])]
pops_with_state = [(fips_codes[row[1]][1], row[2]) for row in reversed(sorted_midwest_2019)]
for row in pops_with_state:
    print(f"{row[0]}: {row[1]:,}")
Illinois: 12,671,821
Ohio: 11,689,100
Michigan: 9,986,857
Indiana: 6,732,219
Wisconsin: 5,822,434
Iowa: 3,155,070
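An alternative to reversed() is to reverse the argsort result with a [::-1] slice. The sketch below does that on three rows and builds the (name, population) tuples described above; the names dictionary is a small stand-in for fips_codes, and the variable names are illustrative.

```python
import numpy as np

# Stand-in lookup table; the real notebook uses fips_codes instead.
names = {39: "Ohio", 55: "Wisconsin", 17: "Illinois"}

rows = np.array([
    [39, 11689100],
    [55,  5822434],
    [17, 12671821],
])

# Reversing the ascending index order gives descending order.
descending = rows[np.argsort(rows[:, 1])[::-1]]

# A list of tuples can mix strings and integers, unlike a NumPy array.
pairs = [(names[int(code)], int(pop)) for code, pop in descending]
lines = [f"{name}: {pop:,}" for name, pop in pairs]

for line in lines:
    print(line)
```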

Write one or more statements in the cell below to compute and print out the total population of the Midwest region in 2019 in millions. The output should look like this:

The total population of the midwest in 2019 was 50.06M
total_pop = np.sum(midwest_2019[:,2])
print(f"The total population of the midwest in 2019 was {total_pop / 1e6:.2f}M")
The total population of the midwest in 2019 was 50.06M
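The 50.06M figure can be checked by hand from the six populations printed earlier. This sketch uses plain Python rather than NumPy, and the variable names are illustrative.

```python
# The six 2019 Midwest populations shown in the midwest_2019 output.
populations = [11689100, 5822434, 12671821, 6732219, 3155070, 9986857]

total = sum(populations)   # 50,057,501

# Dividing by 1e6 converts to millions; :.2f rounds to two decimal places.
message = f"The total population of the midwest in 2019 was {total / 1e6:.2f}M"
print(message)
```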

Part 3 - Image processing

As we saw in class, a digital photograph is a 3-dimensional array. The first two dimensions represent the rows and columns of the image, while the third dimension contains color information for each pixel in the image. In color images, the color information contains 3 values, corresponding to the intensities of red, green, and blue. In the last part of this assignment, you'll perform some image manipulation operations.

Get started by executing the cell below to import the PIL library and read in an image. PIL stands for Python Imaging Library.

from PIL import Image
img = Image.open('dutch.jpg')
display(img)
Image in a Jupyter notebook

Now write a statement that uses the NumPy copy function to create a copy of the array named img, storing the result in an array named extra_blue.

extra_blue = np.copy(img)

Write statements in the cell below that:

  • Set the blue (3rd) component of all of the pixels in extra_blue to 255.

  • Use the Image.fromarray method to create a new Image named extra_blue_img from the extra_blue array.

  • Use the display function to display extra_blue_img.

extra_blue[:, :, 2] = 255
extra_blue_img = Image.fromarray(extra_blue)
display(extra_blue_img)
Image in a Jupyter notebook

Now, in the cell below, do something similar to what you did for extra_blue to create a copy of img named blue_removed. blue_removed should have the blue component set to 0. Display the resulting image after creating it.

blue_removed = np.copy(img)
blue_removed[:, :, 2] = 0
display(Image.fromarray(blue_removed))
Image in a Jupyter notebook
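The channel assignment can be tried without an image file. The sketch below uses a made-up 2x2 RGB array in place of the photograph and shows that editing the copy leaves the original pixels unchanged; all names and pixel values are illustrative.

```python
import numpy as np

# Made-up 2x2 "image": shape is (rows, columns, RGB channels).
pixels = np.array(
    [[[10, 20, 30], [40, 50, 60]],
     [[70, 80, 90], [100, 110, 120]]],
    dtype=np.uint8,
)

# np.copy allocates new memory, so the edit below does not touch `pixels`.
no_blue = np.copy(pixels)
no_blue[:, :, 2] = 0   # index 2 on the last axis is the blue channel

print(pixels[0, 0])    # original pixel keeps its blue value
print(no_blue[0, 0])   # copied pixel has blue set to 0
```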

In the cell below:

  • Create a copy of img named gray

  • Use np.mean with an appropriate axis= argument to compute the average of the RGB values for each pixel, storing the result in an array named average (what will the shape of average be?) Then

    • set the red values in gray to the values in average

    • set the green values in gray to the values in average

    • set the blue values in gray to the values in average

  • Use Image.fromarray to create an image named gray_image from gray

  • Call display to display the Image named gray_image.

gray = np.copy(img)
average = np.mean(gray, axis=2)
gray[:, :, 0] = average
gray[:, :, 1] = average
gray[:, :, 2] = average
gray_image = Image.fromarray(gray)
display(gray_image)
Image in a Jupyter notebook
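On the question of the shape of average: taking the mean over axis=2 collapses the color axis, leaving one value per pixel. The sketch below shows this on a made-up 1x2 image. As a variation on the three per-channel assignments, it fills all channels at once by broadcasting; the explicit cast to uint8 truncates the fractional part. All names and pixel values are illustrative.

```python
import numpy as np

# Made-up image with 1 row and 2 columns of RGB pixels.
pixels = np.array([[[10, 20, 30], [40, 50, 61]]], dtype=np.uint8)

# Averaging over axis 2 removes the channel axis: (1, 2, 3) -> (1, 2).
average = np.mean(pixels, axis=2)
print(average.shape, average.dtype)

# np.newaxis restores a length-1 channel axis so the values broadcast
# across red, green and blue; astype(np.uint8) drops the fraction.
gray = np.copy(pixels)
gray[:, :, :] = average[:, :, np.newaxis].astype(np.uint8)
print(gray)
```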