GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Homework/Lesson 10 HW - Simulation/Homework_10.ipynb

Kernel: Python 3 (system-wide)

Lesson 10 Homework - Simulation

When asking questions about homework in Piazza please use a tag in the subject line like HW1.3 to refer to Homework 1, Question 3. So the subject line might be HW1.3 question. Note there are no spaces in "HW1.3". This really helps keep Piazza easily searchable for everyone!

For full credit, all code in this notebook must be both executed in this notebook and copied to the Canvas quiz where indicated.

Question 1 - Textbook Problem 20.1-2 (Manually Graded) (5 points)

Hints for 1

It should be apparent that the answer is between 0.2 and 0.6, since the probability of rain on any given day is always one of those two values (0.2 after a clear day, 0.6 after a rainy day).

The weather can be considered a stochastic system, because it evolves in a probabilistic manner from one day to the next. Suppose for a certain location that this probabilistic series satisfies the following description:

The probability of rain tomorrow is 0.6 if it is raining today. The probability of its being clear (no rain) tomorrow is 0.8 if it is clear today.

We've modified the problem a bit from the textbook: use uniformly distributed random numbers to simulate a sequence of 1000 days starting from a clear day. Your code should output the approximate probability of a rainy day based on your simulation results.

Hints:

use a for loop
start with a clear day
inside the for loop, if today is clear and rand() < .8, then tomorrow is clear, otherwise tomorrow is rainy
you'll need to add a similar if statement to predict tomorrow's weather if today is rainy
count the number of rainy days/simulation size to estimate the probability of rain
round to 2 digits to compare your answer to the answer choices
even with a random seed, there could be some variance. Choose the closest answer to what you got.
You can increase the number of simulated days to be more certain of your answer.
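As a sanity check on the simulation, the long-run fraction of rainy days can also be computed directly. This is a minimal sketch, assuming the weather behaves as a two-state Markov chain with exactly the transition probabilities stated in the problem; the variable names are illustrative and not part of the assignment.

```python
# Transition probabilities taken from the problem statement.
p_rain_given_rain = 0.6    # P(rain tomorrow | rain today)
p_clear_given_clear = 0.8  # P(clear tomorrow | clear today)

p_rain_given_clear = 1 - p_clear_given_clear  # chance a clear day turns rainy
p_clear_given_rain = 1 - p_rain_given_rain    # chance a rainy day turns clear

# Balance condition for a two-state chain:
#   pi_rain * P(clear | rain) = pi_clear * P(rain | clear)
long_run_rain = p_rain_given_clear / (p_rain_given_clear + p_clear_given_rain)
print(round(long_run_rain, 2))
```

A 1000-day simulation should land near this value but will not match it exactly, which is why the hints suggest picking the closest answer.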

In [3]:

import numpy as np

# retain this seed
np.random.seed(10)

SimSize = 1000

weather = np.ones(SimSize) # ones for clear, zeros for rain

for i in range(1,SimSize):
    r = np.random.uniform(0,1)
    # weather[i-1] is today
    # weather[i] is tomorrow
    # look at weather[i-1] (today) to determine how to predict tomorrow
    if weather[i-1] == 1: # it is clear today
        if r < 0.8:
            weather[i] = 1
        else:
            weather[i] = 0
    else: # it is raining today
        ...

Question 2 (2 points)

What is the approximate probability of rain?

Question 3 - Textbook Problem 20.6-7 (Manually Graded) (5 points)

Hints for 3

Change your interest rates both to a constant 0.08 to see if you get the same result as the example code. Note, you'll need to put the rates in decimal form.

Now that Jennifer is in middle school, her parents have decided they must start saving for her college education. They have $10,000 to invest right now. Furthermore, they plan to save another $4,000 yearly until Jennifer starts college five years later. They plan to split their investment evenly between stock and bond funds. The stock fund has had an average annual return of 8 percent with a standard deviation of 6 percent. The bond fund has had an average annual return of 4 percent with a standard deviation of 3 percent. (Assume a normal distribution for both.)

Assume that the initial investment ($10,000) is made right now (at the beginning of year 1) and is split evenly between the two funds (i.e., $5,000 in each fund). The returns of each fund are allowed to accumulate (i.e., are reinvested) in the same fund and no redistribution will be done before Jennifer starts college. Furthermore, four additional investments of $4,000 will be made and split evenly between both funds ($2,000 each) at the end of each year plus another $4,000 of savings will be available at the end of year 5, just in time for Jennifer to begin college. Use a 1000-trial simulation to answer questions 4-7.

Hints:

Generate new interest rates randomly each year.
Store the total amount after each simulation (stocks + bonds).
To determine probability of the fund being at least 35k, count the number of times the fund reached 35k or more and divide by sim size.
Easiest implementation is a for loop.
Remember to round at the end, as per each question's instructions.
You can increase the number of simulation trials to be more certain of your answer.
To check your work, you could set the interest rate for the stocks to a constant 0.08 and see if you get $19,079.84 at the end of the 5th year as demonstrated by this code snippet:

A = 5000 # initial investment
r = .08 # annual interest rate
n = 5 # number of years
I = 2000 # additional investment at end of each year
print("end of year, amount")
for i in range(n):
    A = (1+r)*A + I
    print(i+1,A)
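The loop above can be cross-checked against the closed-form future value of a balance with a fixed end-of-year deposit. This is a small sketch that only applies when the rate is constant and nonzero; the function name `future_value` is made up for the illustration.

```python
def future_value(initial, rate, years, deposit):
    """Balance after `years` when `deposit` is added at the end of every year.

    Assumes a constant, nonzero annual rate (so it is only a check,
    not a replacement for the random-rate simulation).
    """
    growth = (1 + rate) ** years
    return initial * growth + deposit * (growth - 1) / rate

balance = future_value(initial=5000, rate=0.08, years=5, deposit=2000)
print(round(balance, 2))
```

With the values from the snippet this reproduces the $19,079.84 figure quoted in the hint.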

In [0]:

# here is some starter pseudocode

np.random.seed(777)  # Retain this seed

initialize storage array
for i in range(1000):
    
    stocks = 5000
    bonds = 5000
    for y in range(5):
        generate random interest rates
        stocks = stocks + ....
        bonds = 
    storage_array[i] = stocks + bonds
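Once the simulated totals are stored, the summary questions reduce to a mean, a standard deviation, and a counted fraction. The sketch below uses a tiny made-up list in place of the real storage array, and the helper name `summarize` is hypothetical. It uses the population standard deviation, which is what `np.std` computes by default.

```python
import statistics

def summarize(totals, threshold):
    """Return (mean, population std, fraction of totals >= threshold)."""
    mean = statistics.mean(totals)
    std = statistics.pstdev(totals)  # same convention as np.std's default
    hits = sum(1 for t in totals if t >= threshold)
    return mean, std, hits / len(totals)

# Made-up totals for illustration only; NOT simulation results.
toy_totals = [30000, 36000, 41000, 34000]
mean, std, prob = summarize(toy_totals, threshold=35000)

# Rounding to the nearest 1000 uses a negative number of digits.
rounded_example = round(24570, -3)
```

With a NumPy array the same fraction can be written as `(totals >= threshold).mean()`.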

Question 4 (2 points)

What will be the expected value (mean) of the college fund at the end of year 5, rounded to the nearest 1000 (e.g. 24570 becomes 25000)?

Question 5 (2 points)

What will be the standard deviation of the college fund at the end of year 5, rounded to the nearest 1000?

Question 6 (2 points)

What is the probability that the college fund at the end of year 5 will be at least $35,000?

Question 7 (2 points)

What is the probability that the college fund at the end of year 5 will be at least $40,000, rounded to two digits?

Question 8 - Textbook Problem 20.6-9 (Manually Graded) (5 points)

Road Pavers, Inc. (RPI) is considering bidding on a county road construction project. RPI has estimated that the cost of this particular project would be $5 million. In addition, the cost of putting together a bid is estimated to be $50,000. The county also will receive four other bids on the project from competitors of RPI. Past experience with these competitors suggests that each competitor’s bid is most likely to be 20 percent over the project cost of $5 million, but could be as low as 5 percent over or as much as 40 percent over this cost. Assume a triangular distribution for each of these bids.

Suppose that RPI bids $5.7 million on the project. Write a function that takes in RPI's bid and the simulation size and returns an array of simulated profits.

Run your function with a simulation size of 1,000 trials. (You should call your function once. It should simulate profits 1000 times and return an array of profits. You can use the array of profits to compute statistics after you call the function)

Hints:

RPI's bid is constant, but that constant changes for some of the questions
Generate 4 bids from triangular dist (numpy.random.triangular) with low = 5 * 1.05, mode = 5 * 1.2, high = 5 * 1.4.
Compare RPI's bid to the lowest (min) of the 4 competitor bids.
If RPI bid < smallest bid then RPI wins and their profit is RPI bid - (5 + .05).
If RPI loses their profit is -.05.
In each round of the simulation, track RPI's profit, or track wins and losses.
Compute statistics (mean, 5th percentile, 95th percentile).
You can increase the simulation size to be more certain of your answer.
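The hints above can be sketched end to end as follows. This is an illustration only: it uses the standard library's `random.triangular` rather than `numpy.random.triangular`, and the two take their arguments in different orders (`low, high, mode` here versus `left, mode, right` in NumPy). The function name, the constants, and the seed are assumptions for the example, not the homework's required seed or signature.

```python
import random

PROJECT_COST = 5.0    # $ millions, from the problem statement
BID_PREP_COST = 0.05  # $ millions, from the problem statement

def simulate_profits(rpi_bid, n_trials, seed=0):
    """Return a list of simulated profits in $ millions (illustrative)."""
    rng = random.Random(seed)  # arbitrary seed, not the homework's
    low, mode, high = 5 * 1.05, 5 * 1.2, 5 * 1.4
    profits = []
    for _ in range(n_trials):
        # Standard-library argument order is (low, high, mode).
        lowest_rival = min(rng.triangular(low, high, mode) for _ in range(4))
        if rpi_bid < lowest_rival:   # RPI wins the contract
            profits.append(rpi_bid - (PROJECT_COST + BID_PREP_COST))
        else:                        # RPI loses and only pays for the bid
            profits.append(-BID_PREP_COST)
    return profits

sure_win = simulate_profits(5.2, 50)   # below every possible rival bid
sure_loss = simulate_profits(7.1, 50)  # above every possible rival bid
```

Bids outside the competitors' range make handy test cases, because the outcome is known in advance: a bid below 5.25 always wins and a bid above 7.0 always loses.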

In [3]:

# here is some starter code, this won't run yet

def find_profits(bid, ssize=1000):
    '''
    Parameters:
    bid - the amount RPI is bidding
    ssize - the simulation size (1000 for your final solution - start smaller until you have a working function)
    
    Returns:
    An array of ssize simulated profits
    '''
    np.random.seed(88) #retain this seed
    #add your code from here 
    
    profits = np.zeros(ssize)
    
    for i in range(ssize):
        best_comp_bid = np.min(np.random.uniform(0,100,4)) # change to triangular distribution
        if bid < best_comp_bid: # RPI wins
            profits[i] = ...
        else:
            profits[i] = ...
            
    return profits

## After you use the function you can use the returned profits array to calculate statistics

Question 9 (2 points)

What is the probability that RPI will win the bid, rounded to the nearest single digit?

Question 10 (2 points)

Into which range would RPI's average profit fall?

150,000 - 199,999
200,000 - 229,999
230,000 - 269,999
270,000 - 315,999
316,000 - 329,999

Question 11 (2 points)

Generate a parameter analysis report to consider eight equally spaced bids between $5.3 million and $6 million in order to forecast RPI’s mean profit for each bid. Which of these bid ranges maximizes RPI’s mean profit?

Hints:

Repeat the steps from question 8, with each of the possible bids passed in to your function.
Find average profit for each possible bid.
For which of the bids is average profit maximized?
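One way to organize the parameter sweep is sketched below. The helper `sweep` and the stand-in objective `toy_mean_profit` are hypothetical. The toy function is deliberately not the real profit model, so its maximizer says nothing about the correct answer. In the homework, the callable would instead compute the mean of the profits returned by your own function.

```python
def sweep(bids, mean_profit_fn):
    """Evaluate mean_profit_fn at each bid; return (best_bid, results)."""
    results = [(bid, mean_profit_fn(bid)) for bid in bids]
    best_bid, _ = max(results, key=lambda pair: pair[1])
    return best_bid, results

# Eight equally spaced bids from 5.3 to 6.0 ($ millions).
bids = [round(5.3 + 0.1 * k, 1) for k in range(8)]

# Toy stand-in objective for illustration; NOT the simulated mean profit.
def toy_mean_profit(bid):
    return -(bid - 5.4) ** 2

best_bid, results = sweep(bids, toy_mean_profit)
for bid, value in results:
    print(bid, round(value, 3))
```

With NumPy, `np.linspace(5.3, 6.0, 8)` produces the same grid of bids.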

Answer Options:

5.3 or 5.4
5.5 or 5.6
5.7 or 5.8
5.9 or 6.0

In [6]:

np.random.seed(88) #retain this seed

Question 12 (Manually Graded) (2 points)

Generate a trend chart for the eight bids considered in Question 11.

Submit both your code and your trend chart.

Hints:

Plot the bid amount on the horizontal axis and the mean profit on the vertical axis.
Remember to add 5th and 95th percentiles to plot as in lesson
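The three series for the trend chart (5th percentile, mean, 95th percentile) can be computed per bid before any plotting. This sketch implements the percentile by linear interpolation between order statistics, which is the default behaviour of `np.percentile`. The helper names are illustrative, the data is a toy sequence, and the plotting step itself is not shown.

```python
def percentile(samples, q):
    """q-th percentile (0-100) via linear interpolation between sorted values."""
    ordered = sorted(samples)
    rank = q * (len(ordered) - 1) / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

def band(samples):
    """Return (5th percentile, mean, 95th percentile) for one bid's profits."""
    mean = sum(samples) / len(samples)
    return percentile(samples, 5), mean, percentile(samples, 95)

# Toy data 0..100 so the expected percentiles are easy to see.
p5, mean, p95 = band(list(range(101)))
```

Calling `band` once per bid gives three lists that can be drawn as three lines against the bid amounts.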

In [0]:

Question 13 (Manually Graded) (4 points)

There's an example of this in the lesson. We suggest you use minimize_scalar from scipy.optimize.

Perform an automated search to find the bid that maximizes RPI’s mean profit. You should consider bids in the range of 5.3 to 6 million. (Use one of the optimization tools we've studied.)
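Because `minimize_scalar` minimizes, a maximization is usually posed by minimizing the negative of the objective, for example `minimize_scalar(neg_mean_profit, bounds=(5.3, 6.0), method='bounded')`. To show the idea without SciPy, the sketch below swaps in a plain golden-section search. This is a stand-in technique, not the method used in the lesson. It assumes the objective is deterministic (for instance because the seed is reset inside the function) and has a single peak on the interval. The toy objective is hypothetical and unrelated to the real profit curve.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-5):
    """Locate a maximizer of a single-peaked f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:   # the peak lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:         # the peak lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Toy single-peaked objective for illustration; NOT the simulated mean profit.
def toy_objective(bid):
    return -(bid - 5.65) ** 2

best = golden_section_max(toy_objective, 5.3, 6.0)
```

A simulated mean profit is noisy if the random draws change between calls, so any automated search is only meaningful when each call sees the same random numbers or a large simulation size.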

In [0]:

Sausage Making - Simulation with Optimization

Make sure you start with our code from Lesson 1 or Lesson 2. To get the amount of full-priced pork you need to add the amounts of pork used for economy and premium sausages and subtract x.

We're going to revisit the Sausage Factory problem from Lessons 1 and 2, but this time, we're going to introduce uncertainty.

In Lesson 2, we optimized the cost of our sausage making, by altering the ingredients in each sausage type to meet minimum requirements and fulfill a demand of 350 economy sausages and 500 premium sausages a week.

In reality, our demand fluctuates week to week.

The basic set up of the problem is the same.

We're going to make sausages by blending pork, wheat, and starch. Our objective is to minimize the cost of making the sausages. The table below shows the ingredients available, the cost, and the amount of each ingredient available from our supplier:

Ingredient	Cost ($/kg)	Amount available (kg)
Pork	4.32	7
Wheat	2.46	20
Starch	1.86	17

We want to make 2 types of sausage:

Economy ( > 40% pork )
Premium ( > 60% pork )

Each sausage is 50 grams (0.05 kg).

According to government regulations, the most starch we can use in our sausages is 25% by weight.

New Information:

The price for pre-purchased pork is $3.10, for a discount of $1.22 per kg. We will be fairly conservative in our estimates of pre-purchase, and we are setting our minimum pork used to our pre-purchased amount, so we will only need to calculate the total discount in our objective, not a penalty for buying full-priced pork at $4.32 per kg.

While our demand fluctuates, we know that our demand for economy sausages is between 325 and 375 each week and our demand for premium sausage is between 450 and 550 each week. Demand appears to be uniformly distributed (use np.random.randint).
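One detail to watch when drawing the demands: `np.random.randint(low, high)` excludes `high`, whereas the standard library's `random.randint(a, b)` includes both endpoints. The sketch below uses the standard library and assumes the stated ranges are meant to include their endpoints. With NumPy the equivalent upper bounds would be 376 and 551. The function name and the seed are illustrative.

```python
import random

rng = random.Random(0)  # arbitrary seed for the illustration

def draw_weekly_demand(rng):
    """Return (economy, premium) sausage demand for one simulated week."""
    economy = rng.randint(325, 375)  # both endpoints included
    premium = rng.randint(450, 550)  # both endpoints included
    return economy, premium

demands = [draw_weekly_demand(rng) for _ in range(1000)]

# Each sausage weighs 0.05 kg, so demand converts to kilograms like this.
economy_kg = [0.05 * e for e, _ in demands]
```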

This is a prescriptive analytics problem! You are prescribing the amount of pork to buy under contract to minimize future cost in an uncertain future.

Question 14 (Manually Graded) (4 points)

Write a function that takes in the amount of discount pork we will pre-purchase as a variable (x).

Inside the function, write a 1000-iteration loop that solves the optimization problem with random demand values, using x as the minimum amount of pork used (the pre-purchased discount pork). Track the amount of full-price pork used and the cost in each iteration, and return both from your function.

You may base your Pyomo model on either the concrete or abstract formulations in Lesson 1 and Lesson 2 presentations, respectively.

After you've written your function, demonstrate it using 20 as the amount of discount pork purchased. Plot histograms of the Kilograms of Full-Price Pork purchased and the cost.

Submit both your code and the two histograms in Canvas.

Hints:

You need to write the LP in terms of this variable x
Put the whole LP inside a function
Account for the discount by subtracting 1.22*x from the objective
x is like the 23 kg in the original model
x + 7 is the max amount of pork you can use
Inside the function you have a loop around the LP that repeats the solution 1000 times, each time with diff. random demands.
Track the kg of full-priced pork, and the minimized cost.
Return the kg of full-priced pork and the minimized cost as numpy arrays.

In [0]:

# starter code

def get_pork_cost(x):
    '''
    Parameters:
    x - the amount of discount pork purchased
    
    Returns:
    finalCost: An array of the optimized costs
    fullPricePorkUsed: An array of the total full-price pork used
    '''
    np.random.seed(814) #retain this seed
    #start your code here
    
    initialize storage arrays with numpy
    for i in range(1000):
        solve pyomo model to get min cost
        cost[i] = model.cost()

Notice that the final cost and the amount of full-price pork purchased have distributions that fairly closely match each other. That's because full-price pork is the most expensive component of our sausages. The more full-price pork we use, the more expensive our product is. We'd like to optimize the amount of full-price pork we use.

Question 15 (2 points)

Which range does the average optimized cost fall into when pre-purchasing 20 kg of pork?

80-99
100-125
126-150
151-170

In [0]:

Question 16 (Manually Graded) (5 points)

Write a loop to call your function for all the values of pre-order quantities between and including 17 and 28.

Store the values of:

pre-order quantities
the mean, 5th percentile and 95th percentile of the cost.

After your loop, print the values in a dataframe and generate a trend chart showing the mean cost versus the pre-purchased pork amount. Your trend chart should include trend lines as in the lesson.

Note: This may take a while to run.

Hints:

You want to find the best amount to pre-purchase.
Use a loop to check x = 17, 18, 19, ... 28.
Make a table including x and the other quantities listed above.

In [0]:

Question 17 (Manually Graded) (2 points)

Based on your dataframe and trend chart, what is the optimal amount of pork to pre-purchase? Why?

In [0]:
