GitHub Repository: DataScienceUWL/DS775
Path: blob/main/Lessons/Lesson 10 - Simulation/Lesson_10.ipynb
⁸⁷¹ views

Kernel: Python 3 (system-wide)

In [5]:

# EXECUTE FIRST

# computational imports
import numpy as np
import pandas as pd
from scipy.optimize import minimize_scalar
import time

# plotting imports
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("darkgrid")

# for playing videos, customize height and width if desired
# for best results keep 16:9 ratio
def play_video(vid_name, w=640, h=360):
    from IPython.display import display, IFrame
    from IPython.core.display import HTML
    vid_path = "https://media.uwex.edu/content/ds/ds775_r19/" + vid_name + "/index.html"
    print(vid_path)
    hlink = '<a href = ' + vid_path + ' target = """_blank""">Open video in new tab</a>'
    display(IFrame(vid_path, width=w, height=h))
    display(HTML(hlink))

# helper function for nicely printing dollar amounts
def dollar_print(x):
    if x < 0:
        return(f"-${-x:,.2f}")
    else:
        return(f"${x:,.2f}")

Lesson 10: Simulation

Simulation Basics

Simulation Basics (video)

There is just one video for this lesson that gives an overview of the topic:

In [6]:

# fixme - remove nn = 1 and nn = nn + 1 from the code below, they don't do anything

# execute this cell for video
play_video("ds775_lesson8-simulation-highlights")

Out[6]:

Simulation: What is it?

Simulation is using a computer to imitate the operation of a process or system in order to estimate its actual performance.

Components of a simulation model:

a definition of the state of the system
identification of the possible states of the system
identification of possible events that could change the value of the system
methods for randomly generating events
a way to relate the state transitions to the events that brought them about

Motivation for Simulation

Simulation is used for

mathematical models that are not tractable
analyzing stochastic (random/probabilistic) systems that operate indefinitely to gain insight into the behavior of the system over time
analyzing and designing systems that would otherwise be very time consuming and/or expensive
experimenting with a system without actually implementing it

Limitations of Simulation

Simulations have inherent variability, so they provide only statistical estimates rather than exact results (analytical methods provide exact results when tractable)
Simulations compare various alternative without necessarily finding the optimal one
Even with today's computers, some complex simulations still require large amounts of computing time and expense in programming and analysis
Simulations provide only numerical measures about the performance of a system and cause-and-effect relationships are not always evident
Simulations results apply only to the conditions that were simulated
Sensitivity analysis can be unwieldy in large simulations

Self-Assessment: Simulation

True or False: Simulation is using a computer to imitate the operation of a process or system in order to estimate its actual performance.

Self-Assessment: Simulating a System

True or False: Simulation provides a way of experimenting with proposed systems or policies without actually implementing them.

Self Assessment: Simulation and Time

True or False: Simulation is used for analyzing stochastic systems that operate indefinitely to gain insight into the behavior of the system over time.

Self-Assessment: Generalizing Simulation Results

True or False: Simulations results should be generalized beyond the conditions that were simulated.

Self-Assessment: Type of Simulation

(Hint: Review the subsection titled "Discrete-Event versus Continuous Simulation" in Section 20.1 before answering this one.)

Which of the following would be modeled by discrete-even simulation? Select all that apply.

a. The number of products sold over time.

b. The air pressure in submarine during its time under the water.

c. The arrival of customers to a queue.

d. Whether is rains or not in a day over a 10-year period.

e. The temperature of an engine over a period of operation.

Formulating a Simulation Process

In order to construct a simulation, the following questions may help guide the formulation of the process that is being simulated.

What variables are involved?
Which variables are discrete and which are ones are continuous?
How do variables relate to each other?
What formulas or relationships are needed?
Are some variables dependent on the outcomes of others?
How do the various events and outcomes relate chronologically?
How will the passage of time be marked?
How will the outcome variables be captured and summarized?
How will the simulation performance be evaluated? Numerical summaries? Probabilities?

Simulations come in all shapes and sizes. It takes a lot of creativity, programming skills, and ability to model processes with random variables and functions and equations and put it all together so that the important statistics can be captured on each simulation run and summarize those in some way so that yo can gain insight into the process or system that you are simulating.

Basic Simulation Tools

Generating Discrete Events

Categorical Outcomes

Suppose there is a very large bowl of Skittles where 20% are purple, 16% yellow, 21% green, 18% orange, 13% red, and 12% blue. Also suppose we are to select one Skittle randomly from the bowl and make note of the color.

The cell below contains code to simulate this random outcome. You can re-run the cell as many times as you like to see different outcomes. In a later cell we will address how to capture a series of outputs.

In [7]:

rand = np.random.uniform(low=0.0, high=1.0, size=1)

if rand <= 0.20: 
    color = "purple"
elif  (rand > 0.20 and rand <= .36):
    color = "yellow"
elif  (rand > 0.36 and rand <=.57):
    color = "green"    
elif  (rand > 0.57 and rand <=.75):
    color = "orange"  
elif  (rand > 0.75 and rand <=.88):
    color = "red"
else:
    color = "blue"

print(f"The uniform random number is {rand[0]:0.4f}")  # this isn't necessary to print, but I'm displaying it so you can see how it works
print(f"The color of this randomly selected Skittle is {color}.") # this is a randomly generated discrete event

Out[7]:

The uniform random number is 0.9970
The color of this randomly selected Skittle is blue.

That entire if_else structure can be replaced by one command using numpy.random.choice. The code to do this is below:

In [8]:

# 20% are purple, 16% yellow, 21% green, 18% orange, 13% red, and 12% blue

colors = np.array(["purple", "yellow", "green", "orange", "red", "blue"])
prob = np.array([.20, .16, .21, .18, .13, .12])

color = np.random.choice(colors, size = 1, p = prob)

print(f"The color of this randomly selected Skittle is {color[0]}.") # this is a randomly generated discrete event

Out[8]:

The color of this randomly selected Skittle is yellow.

Random Integers

Suppose uniformly distributed random integers are needed, such as the order quantity or demand in the Freddie the Newsboy simulation (see Hillier p. 923).

The following code will generate a uniformly distributed random integer in the specified range. Again, the cell can be executed as many times as you like to see different outcomes, but they are not being stored. In a later cell we will address how to capture a series of outputs.

In [9]:

demand = np.random.randint(low=40.0, high=70.0, size=1)

print(demand[0])

Out[9]:

47

Generating Continuous Outcomes

There are many continuous distributions to choose from. The Hillier textbook sections 20.3-20.4 discuss some of what is going on "behind the scenes" when you use computer code to generate random numbers (also called pseudo-random numbers because of their reproducibility with random number seeds - more on that below).

Look under Distributions on the Numpy Manual for a list of options.

In this course, the focus will be on implementation rather than the mathematics behind random number generation. A few common options are in the cell that follows.

In [10]:

# 1 observation from a normal distribution with mean 20 and std dev 4
a = np.random.normal(loc=20,scale=4,size=1)
print(f"Normal rv:{a[0]:12.2f}")

# 1 observation from an exponential distribution with mean 3
b = np.random.exponential(scale=3,size=1)
print(f"Exponential rv: {b[0]:6.2f}")

# 1 observation from a 2-parameter Weibull distribution with shape 2 and scale 10
c = 10*np.random.weibull(a=2,size=1)
print(f"Weibull rv: {c[0]:10.2f}")

# 1 observation from a lognormal distribution with mean 100 and std dev 15
d = np.random.lognormal(mean=3,sigma=0.5,size=1)
print(f"Lognormal rv: {d[0]:8.2f}")

# 1 observation from a triangular distribution with min = 10, peak = 20, max = 50
e = np.random.triangular(left=10,mode=20,right=50,size=1)
print(f"Triangular rv: {e[0]:8.2f}")

Out[10]:

Normal rv:       19.06
Exponential rv:   3.64
Weibull rv:      13.73
Lognormal rv:    25.23
Triangular rv:    20.14

You may not have heard of some of these probability distributions and that's OK. If you find yourself in the situtation where you need to

Note: Be aware that many distributions, including the exponential, Weibull, and lognormal, can have different parameterizations so be sure to consult the documentation of the software you are using to be sure of what you are generating.

We've included a separate notebook called lognormal.ipynb that you can use to understand how to set parameters for the lognormal distribution.

Generating Replications

Using Conditions (for, while, if, elif, else) vs. Using Numpy Arrays

Let's generate 1000 replicates of the Skittle selection. We'll do this two ways - with a for loop and with a Numpy array. We've written a function that takes in the simulation size and a boolean (defaulted to true) to choose whether to run looping code or generate the data using Numpy arrays. We've added some timer code to demonstrate the difference between using loops and arrays in terms of speed.

Run the code in the next cell several times. Which method is faster?

In [11]:

#create a function that uses either a loop or an array to generate skittles data
def getSkittles(simSize, loop=True):
    # initialize the array to the desired size and data type (string in this case)
    color = np.full(shape=simSize, fill_value="colors") 

    if loop:
        # capture each simulation run using an index (i in this case)
        for i in range(simSize):
            rand = np.random.uniform(low=0.0, high=1.0, size=1)

            if rand <= 0.20: 
                color[i] = "purple"
            elif  (rand > 0.20 and rand <= .36):  
                color[i] = "yellow"
            elif  (rand > 0.36 and rand <=.57):
                color[i] = "green"    
            elif  (rand > 0.57 and rand <=.75):
                color[i] = "orange"  
            elif  (rand > 0.75 and rand <=.88):
                color[i] = "red"
            else:
                color[i] = "blue"
    else:
        # alternately you could replace the whole loop with this:
        colors = np.array(["purple", "yellow", "green", "orange", "red", "blue"])
        prob = np.array([.20, .16, .21, .18, .13, .12])
        color = np.random.choice( colors, size = simSize, p=prob)

    return color

tic = time.perf_counter()
skittles_loop = getSkittles(1000, 'loop')
toc = time.perf_counter()
print(f"The loop ran in {toc - tic:0.4f} seconds")

tic = time.perf_counter()
skittles_array = getSkittles(1000, 'array')
toc = time.perf_counter()
print(f"The array ran in {toc - tic:0.4f} seconds")

Out[11]:

The loop ran in 0.0212 seconds
The array ran in 0.0170 seconds

In interpreted languages like Python and R it is generally faster to avoid loops when possible.

In the next cell, we generate 10000 skittles and summarize the result with a frequency table and bar graph. Are the frequencies about what you'd expect, given our original probabilities (20% purple, 16% yellow, 21% green, 18% orange, 13% red, and 12% blue)?

In [12]:

#get the skittles
size = 10000
skittles = getSkittles(size, 'array')

# summarize the counts in a frequency table
unique_elements, counts_elements = np.unique(skittles, return_counts=True)
print("\nFrequency of each color:")
for color,freq in zip(unique_elements,counts_elements):
    print(f"  {color}: {freq} ({round((freq/size) * 100)}%) ")

# summarize the results in a par graph
plt.figure(figsize=[8,4])
plt.bar(unique_elements, counts_elements)
plt.xlabel('Skittles Color')
plt.ylabel('Frequency')
plt.xticks(unique_elements, unique_elements, rotation=30)
plt.show()

Out[12]:

Frequency of each color:
  blue: 1174 (12%) 
  green: 2116 (21%) 
  orange: 1846 (18%) 
  purple: 1999 (20%) 
  red: 1320 (13%) 
  yellow: 1545 (15%) 

Using Arrays for Continuous Distributions

Using arrays when generating data from continuous distributions is even easier, as most of them are built-in to Numpy. In the next cell, we generate 40 observations from a normal distribution with mean of 20 and standard deviation of 4.

You can change the simulation size to something larger - say 1000, and you should get a graph that shows a normal distribution (peaked at the mean, trailing off at both ends).

In [13]:

# set the size of the simulation
SimSize = 40

# generate multiple observations from a normal distribution with mean 20 and std dev 4
a = np.random.normal(loc=20,scale=4,size=SimSize)

# show beginning of observations

print(f"First 20 observations:")
print(', '.join(f'{x:.1f}' for x in a[:20]) )

# summarize with some descriptive statistics
print(f"\nThe average is {np.mean(a):2.1f}")
print(f"The standard deviation is {np.std(a):2.1f}")

plt.figure(figsize = (8,5))
# display Winnings in a histogram
plt.hist(a,color="olive")
plt.ylabel('Frequency')
plt.xlabel('Normal(20,4) Variable')

Out[13]:

First 20 observations:
18.1, 19.4, 18.3, 18.9, 20.5, 15.9, 18.8, 14.5, 22.4, 26.3, 13.4, 22.3, 15.1, 16.0, 18.1, 23.3, 17.3, 17.7, 24.2, 17.7

The average is 19.7
The standard deviation is 3.7

Text(0.5, 0, 'Normal(20,4) Variable')

Random Number Seeds

You should have noticed when running the cells above multiple times that the results vary each time the random numbers are generated. A simulation can be reproduced exactly by specifying a random number seed so that the (pseudo-)random numbers generated will have the same initial value to start the process of random number generation.

Run the following cell a few times to see if the results vary. Change the random number seed to a different value and run the cell again. Just pick any number you want for the seed. Did the results change when the random number seed changed?

Go back to the original seed of 5 and run the cell again. What happens? Do you recognize the result?

In [14]:

np.random.seed(seed=5)

# set the size of the simulation
SimSize = 40

# generate multiple observations from a normal distribution with mean 20 and std dev 4
a = np.random.normal(loc=20,scale=4,size=SimSize)

# comment this out for large simulation sizes
print("First 20 observations:")
print(', '.join(f'{x:.1f}' for x in a[:20]))

# summarize with some descriptive statistics
print("\nThe average is {:2.1f}".format(np.mean(a)))
print("The standard deviation is {:2.1f}".format(np.std(a)))

plt.figure(figsize=(8,5))
# display Winnings in a histogram
plt.hist(a,color="orange")
plt.ylabel('Frequency')
plt.xlabel('Normal(20,4) Variable')

Out[14]:

First 20 observations:
21.8, 18.7, 29.7, 19.0, 20.4, 26.3, 16.4, 17.6, 20.8, 18.7, 15.2, 19.2, 18.6, 22.4, 13.3, 17.2, 24.6, 27.4, 14.0, 22.6

The average is 20.2
The standard deviation is 3.6

Text(0.5, 0, 'Normal(20,4) Variable')

Self-Assessment: Discrete-Event Simulation

Textbook Problem 20.1

20.1-1. Use the uniform random numbers in cells C13:C18 of Fig. 20.1 to generate six random observations for each of the following situations. (hint: don't use the computer for this one, just use the random numbers printed in cells C13:C18 of Fig. 20.1 on p. 896)

(a) Throwing an unbiased coin.

(b) A baseball pitcher who throws a strike 60 percent of the time and a ball 40 percent of the time.

(c) The color of a traffic light found by a randomly arriving car when it is green 40 percent of the time, yellow 10 percent of the time, and red 50 percent of the time.

Self-Assessment: Discrete-Event Simulation 2

Textbook Problem 20.3 (a,b,e)

Jessica Williams, manager of Kitchen Appliances for the Midtown Department Store, feels that her inventory levels of stoves have been running higher than necessary. Before revising the inventory policy for stoves, she records the number sold each day over a period of 25 days, as summarized below.

$\begin{array}{l|ccccc} \hline \text{Number sold} & 2 & 3 & 4 & 5 & 6 \\ \hline \text{Number of days} & 4 & 7 & 8 & 5 & 1 \\ \hline \end{array}$

(a) Use these data to estimate the probability distribution of daily sales.

(b) Calculate the mean of the distribution obtained in part (a).

(e) Formulate a model in Python for performing a simulation of the daily sales. Perform 300 replications and obtain the average of the sales over the 300 simulated days. This can be done with a loop or with numpy.random.choice. (Use np.random.seed(seed=222) and let's see if we all get the same answer.)

Note: this example shows a simulation for a situation where the outcome of interest, the mean sales in this case, can be computed analytically and so there is really no need to simulate it. The analytical solution is a constant, whereas the simulation has inherent variability. Simulation is best employed in situation where the analytical solution is intractable or at least so difficult that simulation is worthwhile.

Examples

Coin Flip Simulation (from Textbook)

In the Coin-Flipping Game simulation example on pp. 894-899 in the Hillier textbook each play of the game involves repeatedly flipping an unbiased coin until the difference between the number of heads tossed and the number of tails is 3. If you decide to play the game, you are required to pay $1 for each flip of the coin. You are not allowed to quit during a play of the game. You receive $8 at the end of each play of the game.

This situation may have an analytical solution, but it would take considerable work to get through it. In this case, good insight into the behavior of this game can be gleaned from a fairly simple simulation. The textbook authors discuss the Excel implementation of this simulation in detail. The same simulation is constructed in Python in the cell below.

Run the simulation a few times to see the variability in the results. Notice the types of summaries that can be made of the simulation results: descriptive statistics like mean, standard deviation, minimum, and maximum, as well as graphical summaries like histograms or boxlots.

In this textbook, they commonly ask for 1000 replications in a given simulation, but with today's computing power you could easily increase that number to 10,000 or 100,000 or more depending on what you have to work with. In simulation, a bigger the simulation size means more precise simulation results (i.e. the results are closer to the actual underlying values).

In [15]:

# Coin-Flipping Game Simulation
# 1. Each play of the game involves repeatedly flipping an
#    unbiased coin until the difference between the number of
#    heads tossed and the number of tails is 3.
# 2. If you decide to play the game, you are required to pay
#    $1 for each flip of the coin. You are not allowed to quit
#    during a play of the game.
# 3. You receive $8 at the end of each play of the game.

# np.random.seed(seed=123) # optional - specify a seed to reproduce the simulation exactly
SimSize = 1000  # set number of trials in the simulation

RequiredDifference = 3
CashAtEnd = 8

Winnings = np.zeros(SimSize)
FlipTotal = np.zeros(SimSize)

for nn in range(SimSize):

    # Simulate one round of the coin-flipping game
    heads = 0  # number of heads observed
    tails = 0  # number of tails
    Difference = 0
    Flips = 0
    while Difference < RequiredDifference:
        if np.random.uniform(low=0, high=1, size=1) < 0.5:
            heads += 1  # add one to the total heads
        else:
            tails += 1  # add one to the total tails

        Difference = abs(heads - tails)
        Flips = heads + tails

    # Summarize results for each round of the coin flipping game
    Winnings[nn] = CashAtEnd - Flips

    # capture the number of flips
    FlipTotal[nn] = Flips

# Summarize results for the simulation
print("The average amount won is " + dollar_print(np.mean(Winnings)))
print("The standard deviation of the winnings is " + dollar_print(np.std(Winnings)))
print("The minimum amount won is " + dollar_print(np.min(Winnings)))
print("The maximum amount won is " + dollar_print(np.max(Winnings)))

p = np.sum(Winnings > 0) / SimSize
print(f"\nThe probability of winning money is {p:.3f}.")

print(f"\nThe average number of flips is {np.mean(FlipTotal):.1f}")
print(f"The minimum number of flips is {np.min(FlipTotal):.0f}")
print(f"The maximum number of flips is {np.max(FlipTotal):.0f}")

plt.figure(figsize=(10,8))

# display Winnings in a histogram
plt.subplot(211)
plt.hist(Winnings, color="g")
plt.ylabel('Frequency')

# display Winnings in a boxplot
plt.subplot(212)
plt.boxplot(Winnings, vert=False)
plt.xlabel('Winnings')

Out[15]:

The average amount won is -$0.51
The standard deviation of the winnings is $6.23
The minimum amount won is -$31.00
The maximum amount won is $5.00

The probability of winning money is 0.598.

The average number of flips is 8.5
The minimum number of flips is 3
The maximum number of flips is 39

Text(0.5, 0, 'Winnings')

Textbook Problem 20.6-3

The Avery Co. factory has been having a maintenance problem with the control panel for one of its production processes. This control panel contains four identical electromechanical relays that have been the cause of the trouble. The problem is that the relays fail fairly frequently, thereby forcing the control panel (and the production process it controls) to be shut down while a replacement is made. The current practice is to replace the relays only when they fail.

The average total cost of doing this has been $3.19 per hour. To attempt to reduce this cost, a proposal has been made to replace all four relays whenever any one of them fails to reduce the frequency with which the control panel must be shut down. Would this actually reduce the cost?

The pertinent data are the following. For each relay, the operating time until failure has approximately a uniform distribution from 1,000 to 2,000 hours. The control panel must be shut down for one hour to replace one relay or for two hours to replace all four relays. The total cost associated with shutting down the control panel and replacing relays is $1,000 per hour plus $200 for each new relay.

Use simulation on a spreadsheet to evaluate the cost of the proposal and compare it to the current practice. Perform 1,000 trials (where the end of each trial coincides with the end of a shutdown of the control panel) and determine the average cost per hour.

In [16]:

# Simulation for Problem 20.6-3

# set.seed(8) # optional - specify a seed to reproduce the simulation exactly
SimSize=1000 # set number of trials in the simulation

# Generate rondom observations for the 4 relays for the entire simultion
relay1 = np.random.uniform(low=1000, high=2000, size=SimSize)
relay2 = np.random.uniform(low=1000, high=2000, size=SimSize)
relay3 = np.random.uniform(low=1000, high=2000, size=SimSize)
relay4 = np.random.uniform(low=1000, high=2000, size=SimSize)

# create an array of failure times
failures = np.array([relay1,relay2,relay3,relay4])

# find the min of each column, which is the time to first failure (TTFF)
TTFF = np.amin(failures,axis=0) 

# time to replace all 4 relays is 2 hours, a constant here
RepairTime = 2

# compure total time, time to failure plus the repair time
TotalTime = TTFF + RepairTime

# compute the total cost $1000/hr for 2 hours, $200/relay for 4 relays, constant
TotalCost = 1000*2+200*4

# compute the cost per hour
CostPerHour = TotalCost/TotalTime

# Summarize results for the simulation
# we're using dollar_print() defined in the previous cel

print(f"The average cost is " + dollar_print(np.mean(CostPerHour)))
print(f"The standard deviation of the cost is " + dollar_print(np.std(CostPerHour)))

# compare CostPerHour to the current cost of $3.19
plt.figure(figsize=(8,5))

# display Winnings in a histogram
plt.hist(CostPerHour, color="sienna")
plt.ylabel('Frequency')
plt.xlabel('Cost Per Hour')

Out[16]:

The average cost is $2.35
The standard deviation of the cost is $0.30

Text(0.5, 0, 'Cost Per Hour')

The average cost per hour is near $2.37 per hour, which is well below the current average of $3.19 per hour. In fact, even the maximum cost per hour of the 1000 replications in the simulation is less than the $3.19 per hour, so clearly the policy of replacing all four relays when the any one of them fails is more cost-effective.

Further Analysis of Simulation Results

Parameter Analysis and Trend Charts

The Freddy the Newsboy simulation on pp. 923-939 of the Hillier textbook is constructed below in the next Python code cell.

Freddie the Newsboy Simulation

In [17]:

# Freddie the Newsboy Simulation - LOOP version

# np.random.seed(seed=123) # optional - specify a seed to reproduce the simulation exactly
sim_size = 10000  # set number of trials in the simulation

# establish the constants in the simulation
unit_sale_price = 2.50
unit_purchase_cost = 1.50
unit_salvage_value = 0.50

def profit_simulator(qty):
    
    # initialize array for storing profits (the value doesn't matter since we'll replace it later)
    profit_sim = np.zeros(sim_size)
    
    for i in range(sim_size):
        
        # generate random data for this iteration of the problem
        demand = np.random.randint( low = 40, high = 70 )
        
        # compute the profit based on the random data
        sales_revenue = unit_sale_price * min( qty, demand )
        purchasing_cost = unit_purchase_cost * qty
        salvage_value = unit_salvage_value * max( qty - demand, 0 )
        profit_sim[i] = sales_revenue - purchasing_cost + salvage_value

    return profit_sim

profit = profit_simulator(60)

# summarize results (using dollar_print defined two cells ago)
print(f"The average profit is " + dollar_print(np.mean(profit)) )
print(f"The standard deviation of the profit is " + dollar_print(np.std(profit)))

# display Winnings in a histogram
fig, ax = plt.subplots(figsize=(8, 5));
ax.hist(profit);
plt.ylabel('Frequency');
plt.xlabel('Profit');

Out[17]:

The average profit is $45.80
The standard deviation of the profit is $13.67

In [18]:

# Freddie the Newsboy Simulation - VECTORIZED version

# np.random.seed(seed=123) # optional - specify a seed to reproduce the simulation exactly
sim_size = 10000  # set number of trials in the simulation

# establish the constants in the simulation
unit_sale_price = 2.50
unit_purchase_cost = 1.50
unit_salvage_value = 0.50

def profit_simulator(qty):
    order_quantity = np.full(sim_size, qty)
    demand = np.random.randint(low=40, high=70, size=sim_size)
    sales_revenue = unit_sale_price * np.minimum(order_quantity, demand)
    purchasing_cost = unit_purchase_cost * order_quantity
    salvage_value = unit_salvage_value * np.maximum(order_quantity - demand,
                                                    np.zeros(sim_size))
    profit_sim = sales_revenue - purchasing_cost + salvage_value
    return (profit_sim)

profit = profit_simulator(60)

# summarize results (using dollar_print defined two cells ago)
print(f"The average profit is " + dollar_print(np.mean(profit)) )
print(f"The standard deviation of the profit is " + dollar_print(np.std(profit)))

# display Winnings in a histogram
fig, ax = plt.subplots(figsize=(8, 5));
ax.hist(profit);
plt.ylabel('Frequency');
plt.xlabel('Profit');

Out[18]:

The average profit is $46.05
The standard deviation of the profit is $13.61

Freddie the Newsboy Simulation with Parameter Analysis for Order Quantity

In [19]:

# Freddie the Newsboy Simulation with Parameter Analysis for Order Quantity
# simulation constants and sim_size defined in previous cell

def profit_stats(qty):
    # input: qty = order quantity
    # output: [mean profit, 5th percentile, 95th percentile]
    profit = profit_simulator(qty)
    mean_profit = np.mean(profit)
    [lower, upper] = np.quantile(profit, [.05, .95])
    return ((mean_profit, lower, upper))

order_quantity = np.arange(40, 71, 5)  # need to go over 70 to include 70
num_quantities = len(order_quantity)
mean_profit = np.zeros(num_quantities)
pctl_05 = np.zeros(num_quantities)
pctl_95 = np.zeros(num_quantities)

# collect simulation results for all order quantites
for i in range(num_quantities):
    mean_profit[i], pctl_05[i], pctl_95[i] = profit_stats(order_quantity[i])

# put results in dataframe for nice display and plotting in next cell
df = pd.DataFrame({
    'OrderQuantity': order_quantity,
    'MeanProfit': mean_profit,
    'Pctl_05': pctl_05,
    'Pctl_95': pctl_95
})
display(df)

Out[19]:

In [20]:

# plot results and trend chart showing middle 90% of simulated profits for each order quantity
fig, ax = plt.subplots(figsize=(6, 5))
ax.plot('OrderQuantity','MeanProfit',data=df, linestyle='-', marker='o');
plt.xlabel('Order Quantity', fontsize=12);
plt.ylabel('Mean Profit', fontsize=12);
plt.title("Trend Chart");
ax.fill_between('OrderQuantity','Pctl_05','Pctl_95',data=df,color="#b9cfe7", edgecolor="#b9cfe7");

Out[20]:

In [0]:

Optimization Within a Simulation

Freddie the Newsboy: find maximum average profit

In [21]:

# find maximum average profit
# results will vary slightly on each run

# to maximize we minimize the negative profit
def neg_mean_profit( qty ):
    return( -profit_stats(qty)[0] ) # simulate stats and return just mean

result = minimize_scalar(neg_mean_profit,bounds=[40,70],method='bounded')

# summarize results
print("The maximum average profit is approximately " + dollar_print(-result['fun']))
print(f"The optimal order quantity is approximately {result['x']:.0f}")

Out[21]:

The maximum average profit is approximately $47.16
The optimal order quantity is approximately 54

More Self-Assessment

Self-Assessment: Simulation Results

True or False: Simulations provide exact results just like analytical methods.

Self-Assessment: Simulating Outcomes

An algorithm that produces sequences of numbers that follow a specified probability distribution and possess the appearance of randomness is a

a. warm-up period.

b. simulation clock.

c. financial risk analysis.

d. random number generator.

e. continuous simulation.

In [22]:

import numpy as np
np.arange(3,12.1,.5)

Out[22]:

array([ 3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,  6.5,  7. ,  7.5,  8. ,
        8.5,  9. ,  9.5, 10. , 10.5, 11. , 11.5, 12. ])

In [23]:

np.linspace(3,12,10,endpoint=True)

Out[23]:

array([ 3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])

Self-Assessment: Random Variables in Simulation

True or False: If the distribution of a random variable in a simulation is unknown, then a normal distribution should always be used.

Self-Assessment: Simulation Reproducibility

Simulation results can be reproduced exactly by running the simulation again using the same

a. seed.

b. computer.

c. plant.

d. method of random number generation.

e. simulation clock.

In [0]:

In [0]:

In [0]: