Lecture 13 – Simulation
DSC 10, Fall 2022
Announcements
Lab 4 is due tomorrow at 11:59PM.
Homework 4 is due on Tuesday 10/25 at 11:59PM.
The Midterm Project is due Tuesday 11/1 at 11:59PM. Use pair programming 👯. See this post for clarifications.
10+ more weekly office hours will be added starting next week!
Midterm Exam details
The Midterm Exam is in one week, on Friday, 10/28 during your assigned lecture.
It will be a 50-minute, on-paper, closed-notes exam. We will provide you with the first 2 pages of the reference sheet.
It will consist of multiple choice, fill-in-the-blank code, and short answer questions.
Bring a pen/pencil/eraser and a photo ID. No scantron or blue book needed.
No calculator, computers, notes, or other aids are allowed.
Seating assignments and alternate details are coming next week.
Today's material is on the midterm; next week's is not.
Agenda
Simulation.
Example: What's the probability of getting 60 or more heads if we flip 100 coins?
Example: The "Monty Hall" Problem.
Simulation
What is the probability of getting 60 or more heads if we flip 100 coins?
While we could calculate it by hand (and will learn how to in future courses), we can also approximate it using the computer:
Figure out how to do one experiment (i.e., flip 100 coins).
Run the experiment a bunch of times.
Find the proportion of experiments in which the number of heads was 60 or more.
This is how we'll use simulation – to approximate a probability through computation.
The techniques we will introduce in today's lecture will appear in almost every lecture for the remainder of the quarter!
Making a random choice
To simulate, we need a way to perform a random experiment on the computer (e.g. flipping a coin, rolling a die).
A helpful function is np.random.choice(options).
The input, options, is a list or array to choose from.
The output is a random element in options. By default, all elements are equally likely to be chosen.
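A minimal sketch of a single random choice, assuming NumPy is imported as np (as the lecture's calls suggest). The option lists here are illustrative: a fair coin and a fair six-sided die.

```python
import numpy as np

# One fair coin flip: each option is equally likely by default.
flip = np.random.choice(['Heads', 'Tails'])
print(flip)  # either 'Heads' or 'Tails'

# One roll of a fair six-sided die; options can also be an array.
roll = np.random.choice(np.arange(1, 7))
print(roll)  # an integer from 1 to 6
```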
Making multiple random choices
np.random.choice(options, n) will return an array of n randomly selected elements from options.
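For example, passing a second argument n = 10 flips a coin ten times in one call (a small sketch; the coin options are illustrative):

```python
import numpy as np

# Flip a fair coin 10 times; the result is an array of 10 strings.
flips = np.random.choice(['Heads', 'Tails'], 10)
print(flips)
print(len(flips))  # 10
```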
With replacement vs. without replacement
By default, np.random.choice selects with replacement. That is, after making a selection, that option is still available.
e.g. if every time you draw a marble from a bag, you put it back.
If an option can only be selected once, select without replacement by specifying replace=False.
e.g. if every time you draw a marble from a bag, you do not put it back.
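A sketch of the marble analogy, assuming a bag of four differently colored marbles (the colors are made up for illustration). Drawing all four without replacement returns each color exactly once, in some random order.

```python
import numpy as np

marbles = np.array(['red', 'blue', 'green', 'yellow'])

# With replacement (the default): the same marble can be drawn more than once,
# so we can even draw more marbles than are in the bag.
with_replacement = np.random.choice(marbles, 6)

# Without replacement: each marble can be drawn at most once.
without_replacement = np.random.choice(marbles, 4, replace=False)
print(without_replacement)  # some ordering of all four colors
```

Asking for more than four marbles with replace=False raises a ValueError, since the bag runs out.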
Example: What's the probability of getting 60 or more heads if we flip 100 coins?
Flipping coins
What is the probability of getting 60 or more heads if we flip 100 coins?
Strategy:
Figure out how to do one experiment (i.e., flip 100 coins).
Run the experiment a bunch of times.
Find the proportion of experiments in which the number of heads was 60 or more.
Step 1: Figure out how to do one experiment
Use np.random.choice to flip 100 coins.
Use np.count_nonzero to count the number of heads. np.count_nonzero(array) returns the number of non-zero entries in array; for a Boolean array, that is the number of True entries.
Q: Why is it called count_nonzero?
A: In Python, True == 1 and False == 0, so counting the non-zero elements counts the number of Trues.
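One experiment might look like the sketch below, assuming the coin is represented by the strings 'Heads' and 'Tails'. Comparing the array to 'Heads' produces a Boolean array, which np.count_nonzero then tallies.

```python
import numpy as np

# Flip 100 fair coins.
coins = np.random.choice(['Heads', 'Tails'], 100)

# coins == 'Heads' is a Boolean array; count_nonzero counts its True entries.
num_heads = np.count_nonzero(coins == 'Heads')
print(num_heads)

# Why "nonzero": True behaves like 1 and False like 0.
print(np.count_nonzero(np.array([True, False, True, True])))  # 3
```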
Aside: Putting the experiment in a function
It's a good idea to do this, as it makes it easier to run the experiment repeatedly.
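A sketch of the experiment wrapped in a function. The name coin_experiment is our own choice for illustration, not necessarily the one used in lecture.

```python
import numpy as np

def coin_experiment():
    """Flip 100 fair coins and return the number of heads."""
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

print(coin_experiment())  # a different count each time it is called
```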
Step 2: Repeat the experiment
How do we run the same code many times? Using a for-loop!
Each time we run the experiment, we'll need to store the results in an array.
To do this, we'll use np.append!
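A minimal sketch of Step 2, assuming 10,000 repetitions and the hypothetical coin_experiment helper (redefined so the block runs on its own). The key detail is that np.append returns a new array rather than changing the old one, so the result must be reassigned.

```python
import numpy as np

def coin_experiment():
    """Flip 100 fair coins and return the number of heads (hypothetical helper)."""
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

repetitions = 10000
head_counts = np.array([])  # start with an empty array

for i in np.arange(repetitions):
    # np.append returns a NEW array; it does not modify head_counts in place,
    # so we must reassign the result.
    head_counts = np.append(head_counts, coin_experiment())

print(len(head_counts))  # 10000
```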
Step 3: Find the proportion of experiments in which the number of heads was 60 or more
This is quite close to the true theoretical answer (about 0.028)!
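Step 3 can be sketched as follows, assuming 10,000 repetitions (the simulation is repeated inline so the block runs on its own). The estimate changes from run to run, which is expected for a simulation.

```python
import numpy as np

repetitions = 10000
head_counts = np.array([])
for i in np.arange(repetitions):
    coins = np.random.choice(['Heads', 'Tails'], 100)
    head_counts = np.append(head_counts, np.count_nonzero(coins == 'Heads'))

# head_counts >= 60 is a Boolean array; count its True entries and divide
# by the number of experiments to get a proportion.
estimate = np.count_nonzero(head_counts >= 60) / repetitions
print(estimate)  # varies from run to run; should land near 0.028
```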
Visualizing the distribution
This histogram describes the distribution of the number of heads in each experiment.
Now we see another reason to use density histograms.
Using density means that areas approximate probabilities.
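The lecture's histogram is not reproduced here; instead, this sketch uses np.histogram with density=True to compute the bar heights numerically, assuming one bin per integer number of heads. It checks that the total area of the bars from 60 onward equals the proportion of experiments with 60 or more heads.

```python
import numpy as np

repetitions = 10000
head_counts = np.array([])
for i in np.arange(repetitions):
    coins = np.random.choice(['Heads', 'Tails'], 100)
    head_counts = np.append(head_counts, np.count_nonzero(coins == 'Heads'))

# One bin per integer: [0, 1), [1, 2), ..., [99, 100), [100, 101].
bins = np.arange(0, 102)
heights, edges = np.histogram(head_counts, bins=bins, density=True)

# With density=True, bar area = height * width, and the areas sum to 1.
widths = np.diff(edges)
areas = heights * widths
print(areas.sum())  # 1.0 (up to floating-point rounding)

# Total area of the bars from 60 onward approximates P(60 or more heads),
# and matches the proportion computed directly.
area_60_plus = areas[60:].sum()
proportion_60_plus = np.count_nonzero(head_counts >= 60) / repetitions
print(area_60_plus, proportion_60_plus)
```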
Example: The "Monty Hall" Problem
The "Monty Hall" Problem
Suppose you’re on a game show, and you’re given the choice of three doors: behind one door is a car 🚗; behind the others, goats 🐐🐐.
You pick a door, say No. 2, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat.
He then says to you, “Do you want to pick door No. 1?”
Question: Is it to your advantage to switch your choice?
(The question was originally posed in Parade magazine’s "Ask Marilyn" column. It is called the "Monty Hall problem" because Monty Hall was the host of the game show in question, "Let's Make a Deal.")
Concept Check ✅ – Answer at cc.dsc10.com
You originally selected door #2. The host reveals door #3 to have a goat behind it. What should you do?
A. Might as well stick with door number #2; it has just as high a chance of winning as door #1. It doesn't matter whether you switch or not.
B. Switch to door number #1; it has a higher chance of winning than door #2.
Let's see 🤔
We'll use simulation to compute:
The probability of winning if we switch.
The probability of winning if we stay.
This is just 1 - (probability of winning if we switch), since in every game exactly one of the two strategies wins.
Whichever strategy has the higher probability of winning is better!
Time to simulate!
Let's simulate the Monty Hall problem many times to estimate the probability of winning.
Figure out how to simulate one game of Monty Hall.
Play the game many times.
Count the proportion of wins for each strategy (stay or switch).
Step 1: Simulate a single game
When a contestant picks their door, there are three equally-likely outcomes:
Car.
Goat #1.
Goat #2.
Suppose we can see what is behind their door (but the contestant can't).
If it is a car, they will win if they stay.
If it is a goat, they will win if they switch.
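A single game can be sketched in a few lines, assuming the three outcomes are labeled 'Car', 'Goat #1', and 'Goat #2' as above. We only need to know what is behind the contestant's door to decide which strategy wins.

```python
import numpy as np

# What is behind the contestant's chosen door: three equally likely outcomes.
behind_picked_door = np.random.choice(np.array(['Car', 'Goat #1', 'Goat #2']))

# If the picked door hides the car, staying wins; if it hides either goat,
# the host reveals the other goat, so the remaining door has the car and switching wins.
if behind_picked_door == 'Car':
    winning_strategy = 'Stay'
else:
    winning_strategy = 'Switch'

print(behind_picked_door, winning_strategy)
```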
Let's turn this into a function to make it easier to repeat:
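A sketch of that function; the name simulate_monty_hall is hypothetical. It returns the winning strategy for one game.

```python
import numpy as np

def simulate_monty_hall():
    """Play one game of Monty Hall; return the winning strategy, 'Stay' or 'Switch'."""
    behind_picked_door = np.random.choice(np.array(['Car', 'Goat #1', 'Goat #2']))
    if behind_picked_door == 'Car':
        return 'Stay'
    return 'Switch'

print(simulate_monty_hall())
```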
Step 2: Play the game many times
We should save the winning strategies. To do so, let's use np.append:
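A minimal sketch of Step 2, assuming 10,000 games and the hypothetical simulate_monty_hall helper. One design choice here is an assumption on our part: the empty array is created with dtype=str so that np.append always combines strings with strings, rather than relying on NumPy to promote an empty float array to strings.

```python
import numpy as np

def simulate_monty_hall():
    """Play one game; return the winning strategy (hypothetical helper)."""
    behind_picked_door = np.random.choice(np.array(['Car', 'Goat #1', 'Goat #2']))
    if behind_picked_door == 'Car':
        return 'Stay'
    return 'Switch'

repetitions = 10000
winning_strategies = np.array([], dtype=str)  # empty array of strings

for i in np.arange(repetitions):
    winning_strategies = np.append(winning_strategies, simulate_monty_hall())

print(winning_strategies[:5])
```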
Step 3: Count the proportion of wins for each strategy (stay or switch)
These are quite close to the true probabilities of winning per strategy (2/3 for switch, 1/3 for stay).
Conclusion: it is better to switch.
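Step 3 might look like the sketch below, again assuming 10,000 games and the hypothetical simulate_monty_hall helper (repeated so the block runs on its own). Each proportion is a count of matching entries divided by the number of games.

```python
import numpy as np

def simulate_monty_hall():
    """Play one game; return the winning strategy (hypothetical helper)."""
    behind_picked_door = np.random.choice(np.array(['Car', 'Goat #1', 'Goat #2']))
    return 'Stay' if behind_picked_door == 'Car' else 'Switch'

repetitions = 10000
winning_strategies = np.array([], dtype=str)
for i in np.arange(repetitions):
    winning_strategies = np.append(winning_strategies, simulate_monty_hall())

# Proportion of games won by each strategy.
switch_prop = np.count_nonzero(winning_strategies == 'Switch') / repetitions
stay_prop = np.count_nonzero(winning_strategies == 'Stay') / repetitions
print(switch_prop, stay_prop)  # varies by run; switch should be near 2/3
```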
Alternate implementation
Looking back at our implementation, we kept track of the winning strategy in each experiment.
However, all we really needed to keep track of was the number of experiments in which the winning strategy was 'Switch' (or 'Stay').
Idea: Keep a tally of the number of times the winning strategy was 'Switch'. That is, initialize switch_count to 0, and add 1 to it each time the winning strategy is 'Switch'.
No arrays needed! This strategy won't always work; it depends on the goal of the simulation.
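A sketch of the tally idea, assuming 10,000 games and the hypothetical simulate_monty_hall helper. Only an integer counter is updated inside the loop; no array of results is stored.

```python
import numpy as np

def simulate_monty_hall():
    """Play one game; return the winning strategy (hypothetical helper)."""
    behind_picked_door = np.random.choice(np.array(['Car', 'Goat #1', 'Goat #2']))
    return 'Stay' if behind_picked_door == 'Car' else 'Switch'

repetitions = 10000
switch_count = 0  # running tally of games where switching wins

for i in np.arange(repetitions):
    if simulate_monty_hall() == 'Switch':
        switch_count = switch_count + 1

switch_prop = switch_count / repetitions
stay_prop = 1 - switch_prop  # exactly one strategy wins each game
print(switch_prop, stay_prop)
```

This works here because the only quantity we need is a count. If we later wanted, say, a histogram of results, we would need the full array.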
Marilyn vos Savant's column
- vos Savant answered the question in her Parade magazine column.
- She stated the correct answer: switch.
- She received over 10,000 letters in disagreement, including over 1,000 letters from people with Ph.D.s.
Summary, next time
Simulation finds probabilities
Calculating probabilities is important, but can be hard!
You'll learn plenty of formulas in future DSC classes.
Simulation lets us find probabilities through computing instead of math.
Many real-world scenarios are complicated.
Simulation is much easier than math in many of these cases.
The simulation "recipe"
To estimate the probability of an event through simulation:
Make a function that runs the experiment once.
Run that function many, many times (usually 10,000) with a for-loop, and save the results in an array with np.append.
Compute the proportion of times the event occurs using np.count_nonzero.
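The recipe can be packaged as one reusable function. This is a sketch under our own assumptions: estimate_probability and coin_experiment are hypothetical names, and the event is passed as a function that turns the array of results into a Boolean array.

```python
import numpy as np

def estimate_probability(run_experiment, event_occurred, repetitions=10000):
    """Estimate the probability of an event by simulation (hypothetical helper).

    run_experiment: function that runs the experiment once and returns a number.
    event_occurred: function mapping the array of results to a Boolean array.
    """
    results = np.array([])
    for i in np.arange(repetitions):
        results = np.append(results, run_experiment())
    return np.count_nonzero(event_occurred(results)) / repetitions

def coin_experiment():
    """Flip 100 fair coins and return the number of heads."""
    coins = np.random.choice(['Heads', 'Tails'], 100)
    return np.count_nonzero(coins == 'Heads')

# The coin example from earlier: P(60 or more heads in 100 flips).
estimate = estimate_probability(coin_experiment, lambda counts: counts >= 60)
print(estimate)
```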
What's next?
In the next class, we will start talking about sampling.
Key idea: We want to learn something about a large population (e.g. all undergraduates at UCSD). However, it's far too difficult to survey everyone. If we collect a sample, what can we infer about the larger population?