Lesson 10: Simulation
Simulation Basics
Simulation Basics (video)
There is just one video for this lesson that gives an overview of the topic:
Simulation: What is it?
Simulation is using a computer to imitate the operation of a process or system in order to estimate its actual performance.
Components of a simulation model:
a definition of the state of the system
identification of the possible states of the system
identification of possible events that could change the state of the system
methods for randomly generating events
a way to relate the state transitions to the events that brought them about
Motivation for Simulation
Simulation is used for
mathematical models that are not tractable
analyzing stochastic (random/probabilistic) systems that operate indefinitely to gain insight into the behavior of the system over time
analyzing and designing systems that would otherwise be very time consuming and/or expensive
experimenting with a system without actually implementing it
Limitations of Simulation
Simulations have inherent variability, so they provide only statistical estimates rather than exact results (analytical methods provide exact results when tractable)
Simulations compare various alternatives without necessarily finding the optimal one
Even with today's computers, some complex simulations still require large amounts of computing time and expense in programming and analysis
Simulations provide only numerical measures about the performance of a system and cause-and-effect relationships are not always evident
Simulation results apply only to the conditions that were simulated
Sensitivity analysis can be unwieldy in large simulations
Self-Assessment: Simulation
True or False: Simulation is using a computer to imitate the operation of a process or system in order to estimate its actual performance.
Self-Assessment: Simulating a System
True or False: Simulation provides a way of experimenting with proposed systems or policies without actually implementing them.
Self-Assessment: Simulation and Time
True or False: Simulation is used for analyzing stochastic systems that operate indefinitely to gain insight into the behavior of the system over time.
Self-Assessment: Generalizing Simulation Results
True or False: Simulation results should be generalized beyond the conditions that were simulated.
Self-Assessment: Type of Simulation
(Hint: Review the subsection titled "Discrete-Event versus Continuous Simulation" in Section 20.1 before answering this one.)
Which of the following would be modeled by discrete-event simulation? Select all that apply.
a. The number of products sold over time.
b. The air pressure in a submarine during its time under the water.
c. The arrival of customers to a queue.
d. Whether it rains or not in a day over a 10-year period.
e. The temperature of an engine over a period of operation.
Formulating a Simulation Process
In order to construct a simulation, the following questions may help guide the formulation of the process that is being simulated.
What variables are involved?
Which variables are discrete and which ones are continuous?
How do variables relate to each other?
What formulas or relationships are needed?
Are some variables dependent on the outcomes of others?
How do the various events and outcomes relate chronologically?
How will the passage of time be marked?
How will the outcome variables be captured and summarized?
How will the simulation performance be evaluated? Numerical summaries? Probabilities?
Simulations come in all shapes and sizes. Building one takes creativity, programming skill, and the ability to model a process with random variables, functions, and equations. All of these must be put together so that the important statistics are captured on each simulation run and then summarized in a way that gives you insight into the process or system you are simulating.
Basic Simulation Tools
Generating Discrete Events
Categorical Outcomes
Suppose there is a very large bowl of Skittles where 20% are purple, 16% yellow, 21% green, 18% orange, 13% red, and 12% blue. Also suppose we are to select one Skittle randomly from the bowl and make note of the color.
The cell below contains code to simulate this random outcome. You can re-run the cell as many times as you like to see different outcomes. In a later cell we will address how to capture a series of outputs.
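A minimal sketch of such a cell, assuming a single uniform(0, 1) draw is compared against the cumulative probabilities of the six colors. The function name draw_skittle and its optional argument u are hypothetical, chosen only for illustration.

```python
import numpy as np

def draw_skittle(u=None):
    """Return one Skittle color (hypothetical helper).

    The thresholds are the cumulative probabilities of the bowl:
    0.20, 0.36, 0.57, 0.75, 0.88, 1.00.
    """
    if u is None:
        u = np.random.uniform()  # one uniform(0, 1) random number
    if u < 0.20:
        return "purple"
    elif u < 0.36:
        return "yellow"
    elif u < 0.57:
        return "green"
    elif u < 0.75:
        return "orange"
    elif u < 0.88:
        return "red"
    else:
        return "blue"

print(draw_skittle())
```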
That entire if/else structure can be replaced by one command using numpy.random.choice. The code to do this is below:
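A short sketch of the one-command version, using the color probabilities from the Skittles bowl. The variable names colors, probs, and skittle are illustrative.

```python
import numpy as np

colors = ["purple", "yellow", "green", "orange", "red", "blue"]
probs = [0.20, 0.16, 0.21, 0.18, 0.13, 0.12]  # must sum to 1

# Draw one color, with each color weighted by its probability.
skittle = np.random.choice(colors, p=probs)
print(skittle)
```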
Random Integers
Suppose uniformly distributed random integers are needed, such as the order quantity or demand in the Freddie the Newsboy simulation (see Hillier p. 923).
The following code will generate a uniformly distributed random integer in the specified range. Again, the cell can be executed as many times as you like to see different outcomes, but they are not being stored. In a later cell we will address how to capture a series of outputs.
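One way this might look with np.random.randint is sketched here. The bounds 40 and 70 are placeholder values for illustration only; substitute whatever range your model calls for.

```python
import numpy as np

# Placeholder bounds for illustration (an assumption, not from the lesson).
low, high = 40, 70

# np.random.randint excludes the upper bound, so add 1 to include `high`.
demand = np.random.randint(low, high + 1)
print(demand)
```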
Generating Continuous Outcomes
There are many continuous distributions to choose from. The Hillier textbook sections 20.3-20.4 discuss some of what is going on "behind the scenes" when you use computer code to generate random numbers (also called pseudo-random numbers because of their reproducibility with random number seeds - more on that below).
Look under Distributions on the Numpy Manual for a list of options.
In this course, the focus will be on implementation rather than the mathematics behind random number generation. A few common options are in the cell that follows.
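A sketch of a few such options follows. All parameter values here are arbitrary placeholders, and the choice of distributions is only an assumption about which ones are most commonly needed.

```python
import numpy as np

n = 5  # number of observations to draw (arbitrary)

# Uniform on [0, 1)
uniform_obs = np.random.uniform(low=0.0, high=1.0, size=n)

# Normal with mean 20 and standard deviation 4
normal_obs = np.random.normal(loc=20.0, scale=4.0, size=n)

# Exponential with mean 2 (NumPy's `scale` is the mean, not the rate)
exponential_obs = np.random.exponential(scale=2.0, size=n)

# Triangular with minimum 1, most likely value 3, and maximum 8
triangular_obs = np.random.triangular(left=1.0, mode=3.0, right=8.0, size=n)

print(uniform_obs, normal_obs, exponential_obs, triangular_obs, sep="\n")
```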
You may not have heard of some of these probability distributions, and that's OK. If you find yourself in the situation where you need to
Note: Be aware that many distributions, including the exponential, Weibull, and lognormal, can have different parameterizations so be sure to consult the documentation of the software you are using to be sure of what you are generating.
We've included a separate notebook called lognormal.ipynb that you can use to understand how to set parameters for the lognormal distribution.
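As a brief illustration of why parameterization matters, the sketch below shows two NumPy cases: the exponential takes a scale (the mean) rather than a rate, and the lognormal takes the mean and standard deviation of the underlying normal rather than of the generated values. The rate and target values are hypothetical, and this is not the content of lognormal.ipynb.

```python
import numpy as np

np.random.seed(1)

# Exponential: NumPy expects scale = 1 / rate (the mean), not the rate.
rate = 0.5  # hypothetical rate: 0.5 events per unit of time
exp_draws = np.random.exponential(scale=1 / rate, size=200_000)
print(exp_draws.mean())  # close to 1 / rate = 2.0

# Lognormal: NumPy's `mean` and `sigma` describe the underlying normal,
# so convert from the desired mean and standard deviation of the draws.
target_mean, target_sd = 10.0, 3.0  # hypothetical targets
sigma = np.sqrt(np.log(1 + (target_sd / target_mean) ** 2))
mu = np.log(target_mean) - sigma ** 2 / 2
lognormal_draws = np.random.lognormal(mean=mu, sigma=sigma, size=200_000)
print(lognormal_draws.mean(), lognormal_draws.std())  # close to 10 and 3
```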
Generating Replications
Using Conditions (for, while, if, elif, else) vs. Using Numpy Arrays
Let's generate 1000 replicates of the Skittle selection. We'll do this two ways - with a for loop and with a Numpy array. We've written a function that takes in the simulation size and a boolean (defaulted to true) to choose whether to run looping code or generate the data using Numpy arrays. We've added some timer code to demonstrate the difference between using loops and arrays in terms of speed.
Run the code in the next cell several times. Which method is faster?
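A sketch of what such a function and timer might look like. The name simulate_skittles and its use_loop argument are assumptions made for illustration; the timings printed will differ from machine to machine and from run to run.

```python
import time
import numpy as np

colors = ["purple", "yellow", "green", "orange", "red", "blue"]
probs = [0.20, 0.16, 0.21, 0.18, 0.13, 0.12]

def simulate_skittles(sim_size, use_loop=True):
    """Return sim_size simulated Skittle colors (hypothetical helper)."""
    if use_loop:
        # Looping version: one call to the random generator per Skittle.
        draws = []
        for _ in range(sim_size):
            draws.append(np.random.choice(colors, p=probs))
        return np.array(draws)
    # Array version: a single call generates all draws at once.
    return np.random.choice(colors, size=sim_size, p=probs)

for use_loop in (True, False):
    start = time.perf_counter()
    result = simulate_skittles(1000, use_loop=use_loop)
    elapsed = time.perf_counter() - start
    label = "loop" if use_loop else "array"
    print(f"{label}: {elapsed:.5f} seconds")
```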
In interpreted languages like Python and R it is generally faster to avoid loops when possible.
In the next cell, we generate 10000 skittles and summarize the result with a frequency table and bar graph. Are the frequencies about what you'd expect, given our original probabilities (20% purple, 16% yellow, 21% green, 18% orange, 13% red, and 12% blue)?
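A minimal sketch of the frequency table, assuming np.unique is used to count the colors. The seed value is arbitrary, and the bar graph is left out here to keep the example free of plotting dependencies.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the table is reproducible

colors = ["purple", "yellow", "green", "orange", "red", "blue"]
probs = [0.20, 0.16, 0.21, 0.18, 0.13, 0.12]

draws = np.random.choice(colors, size=10_000, p=probs)

# Count how often each color appears and convert to relative frequencies.
values, counts = np.unique(draws, return_counts=True)
freq = {str(v): float(c) / draws.size for v, c in zip(values, counts)}

for color in colors:
    print(f"{color:>7}: {freq[color]:.3f}")
```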
Using Arrays for Continuous Distributions
Using arrays when generating data from continuous distributions is even easier, as most of them are built into NumPy. In the next cell, we generate 40 observations from a normal distribution with a mean of 20 and a standard deviation of 4.
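A sketch of such a cell, with the histogram omitted; the variable names sim_size and observations are illustrative.

```python
import numpy as np

sim_size = 40  # number of observations to generate

# One call returns the whole array of normal observations.
observations = np.random.normal(loc=20, scale=4, size=sim_size)

print(observations.round(2))
print("sample mean:", observations.mean())
print("sample standard deviation:", observations.std(ddof=1))
```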
You can change the simulation size to something larger - say 1000, and you should get a graph that shows a normal distribution (peaked at the mean, trailing off at both ends).
Random Number Seeds
You should have noticed when running the cells above multiple times that the results vary each time the random numbers are generated. A simulation can be reproduced exactly by specifying a random number seed so that the (pseudo-)random numbers generated will have the same initial value to start the process of random number generation.
Run the following cell a few times to see if the results vary. Change the random number seed to a different value and run the cell again. Just pick any number you want for the seed. Did the results change when the random number seed changed?
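A small sketch of the idea, assuming the legacy np.random.seed function is used. Re-seeding with the same value restarts the sequence, so the same numbers are generated again; the variable names are illustrative.

```python
import numpy as np

np.random.seed(5)
first = np.random.normal(loc=20, scale=4, size=5)

np.random.seed(5)  # same seed: the sequence starts over
second = np.random.normal(loc=20, scale=4, size=5)

np.random.seed(6)  # different seed: a different sequence
third = np.random.normal(loc=20, scale=4, size=5)

print(first)
print(second)
print(third)
print(np.array_equal(first, second))  # True
```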
Go back to the original seed of 5 and run the cell again. What happens? Do you recognize the result?
Self-Assessment: Discrete-Event Simulation
Textbook Problem 20.1
20.1-1. Use the uniform random numbers in cells C13:C18 of Fig. 20.1 to generate six random observations for each of the following situations. (hint: don't use the computer for this one, just use the random numbers printed in cells C13:C18 of Fig. 20.1 on p. 896)
(a) Throwing an unbiased coin.
(b) A baseball pitcher who throws a strike 60 percent of the time and a ball 40 percent of the time.
(c) The color of a traffic light found by a randomly arriving car when it is green 40 percent of the time, yellow 10 percent of the time, and red 50 percent of the time.
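The general technique is to split the interval from 0 to 1 into pieces whose lengths equal the outcome probabilities and see which piece each uniform number falls in. The sketch below illustrates this with made-up uniform numbers, not the ones in Fig. 20.1, and with one possible assignment of intervals to outcomes; the textbook may assign them in a different order.

```python
def coin(u):
    # Each outcome gets half of the interval [0, 1).
    return "heads" if u < 0.5 else "tails"

def pitch(u):
    # Strike gets the first 60% of the interval, ball the remaining 40%.
    return "strike" if u < 0.6 else "ball"

def traffic_light(u):
    # Green: [0, 0.4), yellow: [0.4, 0.5), red: [0.5, 1).
    if u < 0.4:
        return "green"
    elif u < 0.5:
        return "yellow"
    return "red"

# Made-up uniform random numbers for illustration only.
uniforms = [0.12, 0.47, 0.55, 0.63, 0.81, 0.98]

for u in uniforms:
    print(u, coin(u), pitch(u), traffic_light(u))
```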
Self-Assessment: Discrete-Event Simulation 2
Textbook Problem 20.3 (a,b,e)
Jessica Williams, manager of Kitchen Appliances for the Midtown Department Store, feels that her inventory levels of stoves have been running higher than necessary. Before revising the inventory policy for stoves, she records the number sold each day over a period of 25 days, as summarized below.
(a) Use these data to estimate the probability distribution of daily sales.
(b) Calculate the mean of the distribution obtained in part (a).
(e) Formulate a model in Python for performing a simulation of the daily sales. Perform 300 replications and obtain the average of the sales over the 300 simulated days. This can be done with a loop or with numpy.random.choice. (Use np.random.seed(seed=222) and let's see if we all get the same answer.)
Note: this example shows a simulation for a situation where the outcome of interest, the mean sales in this case, can be computed analytically, so there is really no need to simulate it. The analytical solution is a constant, whereas the simulation has inherent variability. Simulation is best employed in situations where the analytical solution is intractable or at least so difficult that simulation is worthwhile.
Examples
Coin Flip Simulation (from Textbook)
In the Coin-Flipping Game simulation example on pp. 894-899 in the Hillier textbook each play of the game involves repeatedly flipping an unbiased coin until the difference between the number of heads tossed and the number of tails is 3. If you decide to play the game, you are required to pay $1 for each flip of the coin. You are not allowed to quit during a play of the game. You receive $8 at the end of each play of the game.
This situation may have an analytical solution, but it would take considerable work to get through it. In this case, good insight into the behavior of this game can be gleaned from a fairly simple simulation. The textbook authors discuss the Excel implementation of this simulation in detail. The same simulation is constructed in Python in the cell below.
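A sketch of how the game described above might be simulated. The function name play_game and the simulation size of 1000 are assumptions; the rules ($1 per flip, $8 payout, stop when the difference between heads and tails reaches 3) come from the description above.

```python
import numpy as np

def play_game(required_difference=3):
    """Flip a fair coin until |heads - tails| reaches required_difference.

    Returns the number of flips in this play of the game.
    """
    heads = tails = flips = 0
    while abs(heads - tails) < required_difference:
        if np.random.uniform() < 0.5:
            heads += 1
        else:
            tails += 1
        flips += 1
    return flips

sim_size = 1000
flips = np.array([play_game() for _ in range(sim_size)])
winnings = 8 - flips  # $8 payout minus $1 per flip

print(f"mean flips:        {flips.mean():.2f}")
print(f"mean winnings:     {winnings.mean():.2f}")
print(f"std dev winnings:  {winnings.std(ddof=1):.2f}")
print(f"min, max winnings: {winnings.min()}, {winnings.max()}")
```

For a fair coin the expected number of flips until the difference reaches 3 is 9, so the mean winnings should settle near -$1 as the simulation size grows.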
Run the simulation a few times to see the variability in the results. Notice the types of summaries that can be made of the simulation results: descriptive statistics like mean, standard deviation, minimum, and maximum, as well as graphical summaries like histograms or boxplots.
The textbook commonly asks for 1000 replications in a given simulation, but with today's computing power you could easily increase that number to 10,000 or 100,000 or more, depending on what you have to work with. In simulation, a bigger simulation size means more precise results (i.e., the estimates tend to be closer to the actual underlying values): the standard error of an estimated mean falls in proportion to one over the square root of the simulation size.
Textbook Problem 20.6-3
The Avery Co. factory has been having a maintenance problem with the control panel for one of its production processes. This control panel contains four identical electromechanical relays that have been the cause of the trouble. The problem is that the relays fail fairly frequently, thereby forcing the control panel (and the production process it controls) to be shut down while a replacement is made. The current practice is to replace the relays only when they fail.
The average total cost of doing this has been $3.19 per hour. To attempt to reduce this cost, a proposal has been made to replace all four relays whenever any one of them fails to reduce the frequency with which the control panel must be shut down. Would this actually reduce the cost?
The pertinent data are the following. For each relay, the operating time until failure has approximately a uniform distribution from 1,000 to 2,000 hours. The control panel must be shut down for one hour to replace one relay or for two hours to replace all four relays. The total cost associated with shutting down the control panel and replacing relays is $1,000 per hour plus $200 for each new relay.
Use simulation on a spreadsheet to evaluate the cost of the proposal and compare it to the current practice. Perform 1,000 trials (where the end of each trial coincides with the end of a shutdown of the control panel) and determine the average cost per hour.
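A sketch of the proposed policy in Python rather than a spreadsheet. It assumes each trial is one cycle that runs from the restart of the panel to the end of the next shutdown, and that cost per hour is the cycle cost divided by the cycle length including the two-hour shutdown. The variable names are illustrative.

```python
import numpy as np

sim_size = 1000
num_relays = 4
shutdown_hours = 2             # hours down when replacing all four relays
cost_per_shutdown_hour = 1000  # dollars per hour of shutdown
cost_per_relay = 200           # dollars per new relay

# Time to failure of each relay, uniform between 1,000 and 2,000 hours.
lifetimes = np.random.uniform(1000, 2000, size=(sim_size, num_relays))

# The panel runs until the first of the four relays fails.
time_to_first_failure = lifetimes.min(axis=1)

# Cost of one shutdown: 2 hours at $1,000 plus 4 relays at $200 each.
cycle_cost = shutdown_hours * cost_per_shutdown_hour + num_relays * cost_per_relay
cycle_hours = time_to_first_failure + shutdown_hours
cost_per_hour = cycle_cost / cycle_hours

print(f"average cost per hour: {cost_per_hour.mean():.2f}")
print(f"maximum cost per hour: {cost_per_hour.max():.2f}")
```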
The average cost per hour is near $2.37, which is well below the current average of $3.19 per hour. In fact, even the maximum cost per hour over the 1000 replications in the simulation is less than $3.19, so clearly the policy of replacing all four relays when any one of them fails is more cost-effective.
Further Analysis of Simulation Results
Parameter Analysis and Trend Charts
The Freddie the Newsboy simulation on pp. 923-939 of the Hillier textbook is constructed in the next Python code cell.
Freddie the Newsboy Simulation
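A sketch of a newsvendor-style simulation of this kind. The prices, demand range, and order quantity below are assumed values for illustration and are not stated in this lesson; check them against the textbook example before relying on the numbers.

```python
import numpy as np

# Assumed parameters for illustration only (not taken from this lesson).
unit_sale_price = 2.50
unit_purchase_cost = 1.50
unit_salvage_value = 0.50
demand_low, demand_high = 40, 70  # uniform integer demand, inclusive
order_quantity = 60
sim_size = 1000

# Daily demand: uniformly distributed random integers.
demand = np.random.randint(demand_low, demand_high + 1, size=sim_size)

# Papers sold cannot exceed the number ordered; the rest are salvaged.
sold = np.minimum(demand, order_quantity)
unsold = order_quantity - sold

profit = (unit_sale_price * sold
          - unit_purchase_cost * order_quantity
          + unit_salvage_value * unsold)

print(f"mean profit:    {profit.mean():.2f}")
print(f"std dev profit: {profit.std(ddof=1):.2f}")
print(f"min, max:       {profit.min():.2f}, {profit.max():.2f}")
```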
Freddie the Newsboy Simulation with Parameter Analysis for Order Quantity
Optimization Within a Simulation
Freddie the Newsboy: find maximum average profit
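One simple way to search for the best order quantity is to run the simulation for every candidate value and keep the one with the highest average profit. The sketch below does this under assumed prices and an assumed demand range (placeholders, not values from this lesson); the helper name mean_profit is hypothetical. The same demand draws are reused for every order quantity so that the differences between candidates reflect the policy rather than the random numbers.

```python
import numpy as np

def mean_profit(order_quantity, demand, price=2.50, cost=1.50, salvage=0.50):
    """Average profit for one order quantity (hypothetical helper).

    The default prices are assumed values for illustration.
    """
    sold = np.minimum(demand, order_quantity)
    unsold = order_quantity - sold
    profit = price * sold - cost * order_quantity + salvage * unsold
    return profit.mean()

np.random.seed(1)  # arbitrary seed for reproducibility

# Assumed demand: uniform integers from 40 to 70 inclusive.
demand = np.random.randint(40, 71, size=10_000)

order_quantities = np.arange(40, 71)
mean_profits = np.array([mean_profit(q, demand) for q in order_quantities])

best_index = mean_profits.argmax()
best_quantity = order_quantities[best_index]
print(f"best order quantity: {best_quantity}")
print(f"estimated mean profit: {mean_profits[best_index]:.2f}")
```

Because the result is a simulation estimate, neighboring order quantities can have nearly identical average profits, so the reported maximum may shift slightly with a different seed or simulation size.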
More Self-Assessment
Self-Assessment: Simulation Results
True or False: Simulations provide exact results just like analytical methods.
Self-Assessment: Simulating Outcomes
An algorithm that produces sequences of numbers that follow a specified probability distribution and possess the appearance of randomness is a
a. warm-up period.
b. simulation clock.
c. financial risk analysis.
d. random number generator.
e. continuous simulation.
Self-Assessment: Random Variables in Simulation
True or False: If the distribution of a random variable in a simulation is unknown, then a normal distribution should always be used.
Self-Assessment: Simulation Reproducibility
Simulation results can be reproduced exactly by running the simulation again using the same
a. seed.
b. computer.
c. plant.
d. method of random number generation.
e. simulation clock.