CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In

CoCalc Runs Your Jupyter Notebooks and Linux Terminals using Powerful CPUs and GPUs!

| Download
Views: 39
Visibility: Unlisted (only visible to those who know the link)
Image: ubuntu2204
Kernel: Python 3 (system-wide)

Section One; Random Numbers

The use of random numbers in simulations has become an essential tool in various fields such as engineering, physics, economics, and computer science. Random numbers help generate random events, allowing simulations to model complex scenarios that cannot be replicated in real life. In this essay, I will discuss the applications of random numbers in simulations and their benefits.

Random numbers are a key component of Monte Carlo simulations, a technique used to model complex scenarios and estimate probabilities in a wide range of fields. Monte Carlo simulations use random numbers to generate a large number of possible outcomes, which can be used to estimate the likelihood of different scenarios and analyze risk.

A flip of a coin is a good first look at random, the result of the toss is either a head or tail. In a 'fair' coin the chance should be equal. For ease of programming we will assign 0 to a head and 1 to a tail.

Much of the computing in Python is done in computer programs that have been written and placed in packages. These generally make the computer code more readable and also allow powerful tools that don't need to be rewritten. So first we will import some useful packages

# In this cell we import computer code that we will use later # First we import the packages for plotting and for generating a random number # packages contain computer code to perform specific tasks import matplotlib.pyplot as plt # import the plotting routines <https://matplotlib.org/> import random # import the routine for generating random numbers <https://docs.python.org/3/library/random.html> import numpy as np # numpy is a Python library for scientific computing see <https://numpy.org> from scipy.stats import gaussian_kde # Gaussian Kernel Density Estimation (explained later) import seaborn as sns # another plot library
# The computer program introduces the "for" statement and the "if" statement # now 'toss'a coin ten times for i in range (0, 10): # the for statement executes the indented statements and increments 'i' from 0 to 9, so, ten times. coin = random.randint(0,1) # this creates, randomly, a 0 or 1. randint() is a function of the random library and stores it in the variable 'coin' if coin == 0: # Is the number in 'coin' a zero? print('head') # if so print'head' else: # if not then print('tail') # print 'tail'
head head tail head head tail tail head tail tail
# now let's toss 1000 times and see how often we get each # first define a function that will perform a single coin toss def toss(): # define a function called toss that contains the computer code from the previous cell coin = random.randint(0,1) # create a random number as in the previous cell if coin == 0: # test it, the statement 'coin equals zero, and return 0 # return a 0 if 'coin' contains a zero else: # or return 1 # return a 1 if 'coin' contains a one # because the value of coin is either 0 or 1, that value could be returned, that is, 'return coin' without using the 'if' statement.
# First, check the function from the previous cell to be sure it works. for i in range (0, 10): # once again we execute the test and print ten times result = toss() # but here we use the coin toss function defined in the previous cell print('result = ', result) if result: # Here we use the fact that a zero is treated as false and a non-zero is true. print('head') # if the result is one or true print 'head' else: # or if it is zero or false then print('tail') # print 'tail' # initialize counters to count the number of heads and tails heads = 0 tails = 0 # create a loop for 1000 tosses for i in range (0, 1000): result = toss() # call our function created earlier if result: # check for the result heads = heads + 1 # then if result is true then add 1 to the total for heads else: tails = tails + 1 # or add 1 to the total for tails print('Number of heads = ', heads) print('Number of tails = ', tails)
result = 0 tail result = 0 tail result = 1 head result = 1 head result = 0 tail result = 0 tail result = 1 head result = 0 tail result = 0 tail result = 0 tail Number of heads = 482 Number of tails = 518

In the previous cell the toss() function was executed ten times with print statements and then 1,000 times.

In the next example we will use the tossing of dice to explore the nature of random numbers. If you execute the previous cell, you will generally get different answers each time you run it. That will be shown in the following examples. The Python code may look more complicated but the only new item is the list denoted by brackets [ ] . Instead of just adding up the number of each, we can store the individual results in a list. That will be explained in the computer code.

Monte Carlo simulation is a computational technique that utilizes random numbers to simulate a wide range of processes and systems. The method derives its name from the renowned Monte Carlo Casino, known for its games of chance and unpredictability. The core idea behind Monte Carlo simulation is to repeatedly sample random values for uncertain variables within a model, allowing us to estimate the behavior and outcomes of the system.

# An easy first Monte Carlo simulation is that of throwing dice. # In this cell the variables for use later in the computer program are initialized. die_1 = 0 # This will be a number on one of the faces of the die 1 die_2 = 0 # same for die 2. the actual value will be computed in function roll_dice() num_simulations = 5 # the number of simulations num_rolls = 10 # the number of rolls in each simulation result = [] # the list which will contain the sums of die 1 and 2 for each simulation roll = range(1, num_rolls +1) # create list of roll numbers for plotting purposes, remember, Python starts counting at 0 therefore one must start and end one number more. # for this case that is, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] sumIt = [] # These are lists to contain the results of the rolls; sumIt for the sum of the two oneDie = [] # And oneDie for the results on each die # win_probability = [] # # Creating Roll Dice Function to create the result of the roll of two dice def roll_dice(): # create a function to roll two die die_1 = random.randint(1, 6) # create a number between 1 and 6 and store it in die_1 die_2 = random.randint(1, 6) # create a number between 1 and 6 and store it in die_2 return die_1, die_2 # return the results

In the next few examples we will look at the impact of using more sets of data in creating random number simulations. The first example uses 5 sets of the 10 rolls. The num_simulations = 5 and num_roll = 10 as defined in the previous cell

# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see # what is happening. Better plots folloow. for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls result = [] # create a list for the results of the rolls in each set for i in range(num_rolls): # perform the set of rolls die_1, die_2 = roll_dice() # roll the dice and as in the function created in the previous cell, there are two results that are returned in die_1 and die_2 result.append(die_1+die_2) # place the sum of the die into the "result" list sumIt.append(die_1+die_2) #create the list of the sum of the two dice oneDie.append(die_1) # add the number one one die to the oneDie list oneDie.append(die_2) # add the number on the second die to the oneDie list plt.plot(roll,result) # after the set of rolls, plot the results of the rolls. There will be 'num_simulations' plots showing. # print(*result) here for diagnostics # create the plot labels plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]") #look at the plot to see where each of these plt statements place a value. plt.xlabel("Roll Number") # label the x axis plt.ylabel("die value") # label the y axis plt.xlim([1, num_rolls]) # place the start and stop on the x axis # now display the result plots and the labels plt.show()
Image in a Jupyter notebook

There you see the results of the two dice, each roll for each set and then each of the five sets are plotted separately. The only real information here is a visualization of the randomness of each roll.

#Here we plot the results of individual die. That is, how often does each number show? # you see the probability of each number 1 through 6 is not equal. This is because of the low number of sets plt.figure(figsize=(12,5)) bin_edges = np.arange(.5, 6.5+1, 1) #define the edges so the integers are plotted separately sns.displot(oneDie, kde = True, bins = bin_edges) # a function to create histogram with overlayed curve, this is the oneDie data from the previous cell plt.xlabel('face value') # create the labels plt.ylabel('number ') plt.title('frequency of face values on die') plt.show()
<Figure size 864x360 with 0 Axes>
Image in a Jupyter notebook
#Now we will look at the frequency of occurance of the sums, that is, 2 through 12 # it is clear some number have higher frequency of occurance, or probability of occurance. plt.figure(figsize=(12,5)) bin_edges = np.arange(1.5, 12.5+1, 1) # create the bin edges so integers plot better sns.displot(sumIt, kde = True, bins = bin_edges) # a function to create histogram with overlayed curve plt.xlabel('sum of dice ') #plot the labels plt.ylabel('number ') plt.title('frequency of values on dice')
Text(0.5, 1.0, 'frequency of values on dice')
<Figure size 864x360 with 0 Axes>
Image in a Jupyter notebook

Here we would expect a Gaussian or Normal curve: this will look better as we increase the number of sets.

Next we will increase the number of sets from 5 to 30

die_1 = 0 # This will be a number on one of the faces of the die 1 die_2 = 0 # same for die 2. the actual value will be computed in function roll_dice() num_simulations = 30 # the number of simulations num_rolls = 10 # the number of rolls in each simulation result = [] # the list which will contain the sums of die 1 and 2 for each simulation roll = range(1, num_rolls +1) # create list of roll numbers for plotting purposes, remember, Python starts counting at 0 # therefore one must start and end one number more. sumIt = [] # These are lists to contain the results of the rolls; sumIt for the sum of the two oneDie = [] # And oneDie for the results on each die # dieList = []
# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see # what is happening. Better plots folloow. for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls result = [] # create a list for the results of the rolls in each set for i in range(num_rolls): # perform the set of rolls die_1, die_2 = roll_dice() # roll the dice result.append(die_1+die_2) # place the sum of the die into the "result"list sumIt.append(die_1+die_2) oneDie.append(die_1) oneDie.append(die_2) plt.plot(roll,result) # after the set of rolls, plot them. Do this for each set # print(*result) here for diagnostics # create the plot labels plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]") plt.xlabel("Roll Number") plt.ylabel("die value") plt.xlim([1, num_rolls]) # now display the result plots and the labels plt.show()
Image in a Jupyter notebook
#Here we plot the results of individual die. That is, how often does each number show? # you see the probability of each number 1 through 6 is about equal. fig, his = plt.subplots(figsize = (12,5)) #create the canvas for the plot bin_edges = np.arange(.5, 6.5+1, 1) #define the edges so the integers are plotted separately his.hist(oneDie, bins = bin_edges) # Create the histogram of the individual die his.set_xlabel('face value') his.set_ylabel('number ') his.set_title('frequency of face values on die')
Text(0.5, 1.0, 'frequency of face values on die')
Image in a Jupyter notebook
#Now we will look at the frequency of occurance of the sums, that is, 2 through 12 # it is clear some number have higher frequency of occurance, or probability of occurance. fig, his = plt.subplots(figsize = (12,5)) bin_edges = np.arange(1.5, 12.5+1, 1) his.hist(sumIt, bins = bin_edges) his.set_xlabel('sum of dice ') his.set_ylabel('number ') his.set_title('frequency of values on dice')
Text(0.5, 1.0, 'frequency of values on dice')
Image in a Jupyter notebook

While the frequency of occurrence for 30 sets looks closer to what we expect we still don't see the flat histogram for the single die or the Normal curve for the sum.

Next we will increase the number of sets from 30 to 100, we still do 10 rolls per set

die_1 = 0 # This will be a number on one of the faces of the die 1 die_2 = 0 # same for die 2. the actual value will be computed in function roll_dice() num_simulations = 100 # the number of simulations num_rolls = 10 # the number of rolls in each simulation result = [] # the list which will contain the sums of die 1 and 2 for each simulation roll = range(1, num_rolls +1) # create list of roll numbers for plotting purposes, remember, Python starts counting at 0 # therefore one must start and end one number more. sumIt = [] # These are lists to contain the results of the rolls; sumIt for the sum of the two oneDie = [] # And oneDie for the results on each die # dieList = []
# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see # what is happening. Better plots folloow. for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls result = [] # create a list for the results of the rolls in each set for i in range(num_rolls): # perform the set of rolls die_1, die_2 = roll_dice() # roll the dice result.append(die_1+die_2) # place the sum of the die into the "result"list sumIt.append(die_1+die_2) oneDie.append(die_1) oneDie.append(die_2) plt.plot(roll,result) # after the set of rolls, plot them. Do this for each set # create the plot labels plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]") plt.xlabel("Roll Number") plt.ylabel("die value") plt.xlim([1, num_rolls]) # now display the result plots and the labels plt.show()
Image in a Jupyter notebook
#Here we plot the results of individual die. That is, how often does each number show? # you see the probability of each number 1 through 6 is about equal. fig, his = plt.subplots(figsize = (12,5)) #create the canvas for the plot bin_edges = np.arange(.5, 6.5+1, 1) #define the edges so the integers are plotted separately his.hist(oneDie, bins = bin_edges) # Create the histogram of the individual die his.set_xlabel('face value') his.set_ylabel('number ') his.set_title('frequency of face values on die') plt.show
<function matplotlib.pyplot.show(close=None, block=None)>
Image in a Jupyter notebook
#Now we will look at the frequency of occurance of the sums, that is, 2 through 12 # it is clear some number have higher frequency of occurance, or probability of occurance. fig, his = plt.subplots(figsize = (12,5)) bin_edges = np.arange(1.5, 12.5+1, 1) his.hist(sumIt, bins = bin_edges) his.set_xlabel('sum of dice ') his.set_ylabel('number ') his.set_title('frequency of values on dice')
Text(0.5, 1.0, 'frequency of values on dice')
Image in a Jupyter notebook

Even at one hundred rolls the histograms are not quite what we desire. So, we will increase the number of rolls to 1000.

die_1 = 0 # This will be a number on one of the faces of the die 1 die_2 = 0 # same for die 2. the actual value will be computed in function roll_dice() num_simulations = 1000 # the number of simulations num_rolls = 10 # the number of rolls in each simulation result = [] # the list which will contain the sums of die 1 and 2 for each simulation roll = range(1, num_rolls +1) # create list of roll numbers for plotting purposes, remember, Python starts counting at 0 # therefore one must start and end one number more. sumIt = [] # These are lists to contain the results of the rolls; sumIt for the sum of the two oneDie = [] # And oneDie for the results on each die # dieList = []
# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see # what is happening. Better plots folloow. for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls result = [] # create a list for the results of the rolls in each set for i in range(num_rolls): # perform the set of rolls die_1, die_2 = roll_dice() # roll the dice result.append(die_1+die_2) # place the sum of the die into the "result"list sumIt.append(die_1+die_2) oneDie.append(die_1) oneDie.append(die_2) plt.plot(roll,result) # after the set of rolls, plot them. Do this for each set # create the plot labels plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]") plt.xlabel("Roll Number") plt.ylabel("die value") plt.xlim([1, num_rolls]) # now display the result plots and the labels plt.show()
Image in a Jupyter notebook
#Here we plot the results of individual die. That is, how often does each number show? # you see the probability of each number 1 through 6 is about equal. # import numpy as np #we imported numpy earlier fig, his = plt.subplots(figsize = (12,5)) #create the canvas for the plot bin_edges = np.arange(.5, 6.5+1, 1) #define the edges so the integers are plotted separately his.hist(oneDie, bins = bin_edges) # Create the histogram of the individual die his.set_xlabel('face value') his.set_ylabel('number ') his.set_title('frequency of face values on die')
Text(0.5, 1.0, 'frequency of face values on die')
Image in a Jupyter notebook

In the next cell, we use the gaussian_kde function to create the data set.

gaussian_kde stands for Gaussian Kernel Density Estimation. It is used for non-parametric estimation of the probability density function (PDF) of a random variable. This is useful in statistics when you want to estimate the underlying distribution of a data set without assuming any particular underlying distribution.

The method uses a Gaussian (also known as normal) distribution as the kernel. The kernel acts as a smooth, localized weighting function over the data points. By summing these functions, it creates a smooth approximation of the frequency distribution. The bandwidth of the kernel (which controls the smoothness of the resulting estimate) can be adjusted to fit the data appropriately.

gaussian_kde is particularly useful in data science and statistics for visualizing and analyzing data where you don't have a clear expectation of what the distribution should look like, or when the data doesn't fit traditional parametric distributions well.

#Now we will look at the frequency of occurance of the sums, that is, 2 through 12 # it is clear some number have higher frequency of occurance, or probability of occurance. kde = gaussian_kde(sumIt) #Now we create the data for the smooth curve on this histogram x = np.linspace(0,13,14) y = kde(x) * 8750 fig, his = plt.subplots(figsize = (12,5)) plt.plot(x,y) # plot the smooth curve bin_edges = np.arange(1.5, 12.5+1, 1) his.hist(sumIt, bins = bin_edges, color = 'blue') his.set_xlabel('sum of dice ') his.set_ylabel('number ') his.set_title('frequency of values on dice')
Text(0.5, 1.0, 'frequency of values on dice')
Image in a Jupyter notebook

Much closer to equal probability of obtaining a 1-6 on each die and much closer to a Normal Curve. Therefore it is seen that even in this simple model, a large number of sets provides a result that is closer to the analytic result.

#Here I was looking at how to generate a smooth curve # this is the first way I found, the seabourn library works better for this application import numpy as np from scipy.stats import gaussian_kde # Generate a random normal distribution with mean 0 and standard deviation 1 data = np.random.normal(0, 1, 10000) # Estimate the PDF using a Gaussian kernel kde = gaussian_kde(data) # Evaluate the PDF at a grid of points x = np.linspace(-4, 4, 1000) y = kde(x) # Plot the PDF estimate import matplotlib.pyplot as plt plt.plot(x, y) plt.show()
Image in a Jupyter notebook