CoCalc -- Section One.ipynb

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

| Download

Project: Computational Thinking

Path: Section One.ipynb

Views: ⁴²

Visibility: Unlisted (only visible to those who know the link)

Image: ubuntu2204

Kernel: Python 3 (system-wide)

Section One; Random Numbers

The use of random numbers in simulations has become an essential tool in various fields such as engineering, physics, economics, and computer science. Random numbers help generate random events, allowing simulations to model complex scenarios that cannot be replicated in real life. In this essay, I will discuss the applications of random numbers in simulations and their benefits.

Random numbers are a key component of Monte Carlo simulations, a technique used to model complex scenarios and estimate probabilities in a wide range of fields. Monte Carlo simulations use random numbers to generate a large number of possible outcomes, which can be used to estimate the likelihood of different scenarios and analyze risk.

A flip of a coin is a good first look at random, the result of the toss is either a head or tail. In a 'fair' coin the chance should be equal. For ease of programming we will assign 0 to a head and 1 to a tail.

Much of the computing in Python is done in computer programs that have been written and placed in packages. These generally make the computer code more readable and also allow powerful tools that don't need to be rewritten. So first we will import some useful packages

In [23]:

# In this cell we import computer code that we will use later
# First we import the packages for plotting and for generating a random number
# packages contain computer code to perform specific tasks
import matplotlib.pyplot as plt      # import the plotting routines <https://matplotlib.org/>
import random                        # import the routine for generating random numbers <https://docs.python.org/3/library/random.html>
import numpy as np                   # numpy is a Python library for scientific computing see <https://numpy.org>
from scipy.stats import gaussian_kde # Gaussian Kernel Density Estimation (explained later)
import seaborn as sns                # another plot library

In [4]:

# The computer program introduces the "for" statement and the "if" statement
# now 'toss'a coin ten times
for i in range (0, 10):        # the for statement executes the indented statements and increments 'i' from 0 to 9, so, ten times.
   coin = random.randint(0,1)  # this creates, randomly, a 0 or 1. randint() is a function of the random library and stores it in the variable 'coin'
   if coin == 0:               # Is the number in 'coin' a zero?
      print('head')            # if so print'head'
   else:                       # if not then
      print('tail')            # print 'tail'

head
head
tail
head
head
tail
tail
head
tail
tail

In [5]:

# now let's toss 1000 times and see how often we get each
# first define a function that will perform a single coin toss
def toss():                    # define a function called toss that contains the computer code from the previous cell
   coin = random.randint(0,1)  # create a random number as in the previous cell
   if coin == 0:               # test it, the statement 'coin equals zero, and
      return 0                 # return a 0 if 'coin' contains a zero
   else:                       # or
      return 1                 # return a 1 if 'coin'  contains a one
# because the value of coin is either 0 or 1, that value could be returned, that is, 'return coin' without using the 'if' statement.

In [6]:

# First, check the function from the previous cell to be sure it works.
for i in range (0, 10):  # once again we execute the test and print ten times
  result = toss()        # but here we use the coin toss function defined in the previous cell
  print('result = ', result)
  if result:             # Here we use the fact that a zero is treated as false and a non-zero is true.
    print('head')        # if the result is one or true print 'head'
  else:                  # or if it is zero or false then
      print('tail')      # print 'tail'
# initialize counters to count the number of heads and tails
heads = 0
tails = 0
# create a loop for 1000 tosses
for i in range (0, 1000):
  result = toss()  # call our function created earlier
  if result:  # check for the result
        heads = heads + 1  # then if result is true then add 1 to the total for heads
  else:
        tails = tails + 1  # or add 1 to the total for tails
print('Number of heads = ', heads)
print('Number of tails = ', tails)

result =  0
tail
result =  0
tail
result =  1
head
result =  1
head
result =  0
tail
result =  0
tail
result =  1
head
result =  0
tail
result =  0
tail
result =  0
tail
Number of heads =  482
Number of tails =  518

In the previous cell the toss() function was executed ten times with print statements and then 1,000 times.

In the next example we will use the tossing of dice to explore the nature of random numbers. If you execute the previous cell, you will generally get different answers each time you run it. That will be shown in the following examples. The Python code may look more complicated but the only new item is the list denoted by brackets [ ] . Instead of just adding up the number of each, we can store the individual results in a list. That will be explained in the computer code.

Monte Carlo simulation is a computational technique that utilizes random numbers to simulate a wide range of processes and systems. The method derives its name from the renowned Monte Carlo Casino, known for its games of chance and unpredictability. The core idea behind Monte Carlo simulation is to repeatedly sample random values for uncertain variables within a model, allowing us to estimate the behavior and outcomes of the system.

In [7]:

# An easy first Monte Carlo simulation is that of throwing dice.
# In this cell the variables for use later in the computer program are initialized.
die_1 = 0         # This will be a number on one of the faces of the die 1
die_2 = 0         # same for die 2. the actual value will be computed in function roll_dice()

num_simulations = 5            # the number of simulations
num_rolls = 10                 # the number of rolls in each simulation
result = []                    # the list which will contain the sums of die 1 and 2 for each simulation
roll = range(1, num_rolls +1)  # create list of roll numbers for plotting purposes, remember, Python starts counting at 0 therefore one must start and end one number more.
                               # for this case that is, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sumIt = []                    # These are lists to contain the results of the rolls; sumIt for the sum of the two
oneDie = []                   # And oneDie for the results on each die

# win_probability = []

# # Creating Roll Dice Function to create the result of the roll of two dice

def roll_dice():                  # create a function to roll two die
    die_1 = random.randint(1, 6)  # create a number between 1 and 6 and store it in die_1
    die_2 = random.randint(1, 6)  # create a number between 1 and 6 and store it in die_2
    return die_1, die_2           # return the results

In the next few examples we will look at the impact of using more sets of data in creating random number simulations. The first example uses 5 sets of the 10 rolls. The num_simulations = 5 and num_roll = 10 as defined in the previous cell

In [8]:

# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see
# what is happening. Better plots folloow.
for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls
  result =  []                   # create a list for the results of the rolls in each set
  for i in range(num_rolls):     # perform the set of rolls

    die_1, die_2 = roll_dice()   # roll the dice and as in the function created in the previous cell, there are two results that are returned in die_1 and die_2
    result.append(die_1+die_2)   # place the sum of the die into the "result" list

    sumIt.append(die_1+die_2)    #create the list of the sum of the two dice
    oneDie.append(die_1)         # add the number one one die to the oneDie list
    oneDie.append(die_2)         # add the number on the second die to the oneDie list
  plt.plot(roll,result)          # after the set of rolls, plot the results of the rolls.  There will be 'num_simulations' plots showing.
#  print(*result)                 here for diagnostics
# create the plot labels
plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]") #look at the plot to see where each of these plt statements place a value.
plt.xlabel("Roll Number")        # label the x axis
plt.ylabel("die value")          # label the y axis
plt.xlim([1, num_rolls])         # place the start and stop on the x axis
# now display the result plots and the labels
plt.show()

There you see the results of the two dice, each roll for each set and then each of the five sets are plotted separately. The only real information here is a visualization of the randomness of each roll.

In [9]:

#Here we plot the results of individual die. That is, how often does each number show?
# you see the probability of each number 1 through 6 is not equal. This is because of the low number of sets

plt.figure(figsize=(12,5))

bin_edges = np.arange(.5, 6.5+1, 1)                  #define the edges so the integers are plotted separately
sns.displot(oneDie, kde = True, bins = bin_edges)    #  a function to create histogram with overlayed curve, this is the oneDie data from the previous cell

plt.xlabel('face value')                             # create the labels
plt.ylabel('number ')
plt.title('frequency of face values on die')
plt.show()

<Figure size 864x360 with 0 Axes>

In [10]:

#Now we will look at the frequency of occurance of the sums, that is, 2 through 12
# it is clear some number have higher frequency of occurance, or probability of occurance.


plt.figure(figsize=(12,5))
bin_edges = np.arange(1.5, 12.5+1, 1)            # create the bin edges so integers plot better
sns.displot(sumIt, kde = True, bins = bin_edges)          #  a function to create histogram with overlayed curve

plt.xlabel('sum of dice ')                     #plot the labels
plt.ylabel('number ')
plt.title('frequency of values on dice')

Text(0.5, 1.0, 'frequency of values on dice')

<Figure size 864x360 with 0 Axes>

Here we would expect a Gaussian or Normal curve: this will look better as we increase the number of sets.

Next we will increase the number of sets from 5 to 30

In [11]:

die_1 = 0                         # This will be a number on one of the faces of the die 1
die_2 = 0                         # same for die 2. the actual value will be computed in function roll_dice()
num_simulations = 30           # the number of simulations
num_rolls = 10                    # the number of rolls in each simulation
result = []                       # the list which will contain the sums of die 1 and 2 for each simulation
roll = range(1, num_rolls +1)     # create list of roll numbers for plotting purposes, remember, Python starts counting at 0
#                                   therefore one must start and end one number more.
sumIt = []                        # These are lists to contain the results of the rolls; sumIt for the sum of the two
oneDie = []                       # And oneDie for the results on each die
# dieList = []

In [12]:

# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see
# what is happening. Better plots folloow.
for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls
  result =  []                   # create a list for the results of the rolls in each set
  for i in range(num_rolls):     # perform the set of rolls

    die_1, die_2 = roll_dice()   # roll the dice
    result.append(die_1+die_2)   # place the sum of the die into the "result"list

    sumIt.append(die_1+die_2)
    oneDie.append(die_1)
    oneDie.append(die_2)
  plt.plot(roll,result)          # after the set of rolls, plot them. Do this for each set
#  print(*result)                 here for diagnostics
# create the plot labels
plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]")
plt.xlabel("Roll Number")
plt.ylabel("die value")
plt.xlim([1, num_rolls])         
# now display the result plots and the labels
plt.show()

In [13]:

#Here we plot the results of individual die. That is, how often does each number show?
# you see the probability of each number 1 through 6 is about equal.

fig, his = plt.subplots(figsize = (12,5))  #create the canvas for the plot

bin_edges = np.arange(.5, 6.5+1, 1)        #define the edges so the integers are plotted separately

his.hist(oneDie, bins = bin_edges)        # Create the histogram of the individual die

his.set_xlabel('face value')
his.set_ylabel('number ')
his.set_title('frequency of face values on die')

Text(0.5, 1.0, 'frequency of face values on die')

In [14]:

#Now we will look at the frequency of occurance of the sums, that is, 2 through 12
# it is clear some number have higher frequency of occurance, or probability of occurance.

fig, his = plt.subplots(figsize = (12,5))
bin_edges = np.arange(1.5, 12.5+1, 1)
his.hist(sumIt, bins = bin_edges)
his.set_xlabel('sum of dice ')
his.set_ylabel('number ')
his.set_title('frequency of values on dice')

Text(0.5, 1.0, 'frequency of values on dice')

While the frequency of occurrence for 30 sets looks closer to what we expect we still don't see the flat histogram for the single die or the Normal curve for the sum.

Next we will increase the number of sets from 30 to 100, we still do 10 rolls per set

In [15]:

die_1 = 0                         # This will be a number on one of the faces of the die 1
die_2 = 0                         # same for die 2. the actual value will be computed in function roll_dice()
num_simulations = 100            # the number of simulations
num_rolls = 10                    # the number of rolls in each simulation
result = []                       # the list which will contain the sums of die 1 and 2 for each simulation
roll = range(1, num_rolls +1)     # create list of roll numbers for plotting purposes, remember, Python starts counting at 0
#                                   therefore one must start and end one number more.
sumIt = []                        # These are lists to contain the results of the rolls; sumIt for the sum of the two
oneDie = []                       # And oneDie for the results on each die
# dieList = []

In [16]:

# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see
# what is happening. Better plots folloow.
for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls
  result =  []                   # create a list for the results of the rolls in each set
  for i in range(num_rolls):     # perform the set of rolls

    die_1, die_2 = roll_dice()   # roll the dice
    result.append(die_1+die_2)   # place the sum of the die into the "result"list

    sumIt.append(die_1+die_2)
    oneDie.append(die_1)
    oneDie.append(die_2)
  plt.plot(roll,result)          # after the set of rolls, plot them. Do this for each set

# create the plot labels
plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]")
plt.xlabel("Roll Number")
plt.ylabel("die value")
plt.xlim([1, num_rolls])         
# now display the result plots and the labels
plt.show()

In [17]:

#Here we plot the results of individual die. That is, how often does each number show?
# you see the probability of each number 1 through 6 is about equal.


fig, his = plt.subplots(figsize = (12,5))  #create the canvas for the plot

bin_edges = np.arange(.5, 6.5+1, 1)        #define the edges so the integers are plotted separately

his.hist(oneDie, bins = bin_edges)        # Create the histogram of the individual die

his.set_xlabel('face value')
his.set_ylabel('number ')
his.set_title('frequency of face values on die')
plt.show

<function matplotlib.pyplot.show(close=None, block=None)>

In [18]:

#Now we will look at the frequency of occurance of the sums, that is, 2 through 12
# it is clear some number have higher frequency of occurance, or probability of occurance.

fig, his = plt.subplots(figsize = (12,5))
bin_edges = np.arange(1.5, 12.5+1, 1)
his.hist(sumIt, bins = bin_edges)
his.set_xlabel('sum of dice ')
his.set_ylabel('number ')
his.set_title('frequency of values on dice')

Text(0.5, 1.0, 'frequency of values on dice')

Even at one hundred rolls the histograms are not quite what we desire. So, we will increase the number of rolls to 1000.

In [19]:

die_1 = 0                         # This will be a number on one of the faces of the die 1
die_2 = 0                         # same for die 2. the actual value will be computed in function roll_dice()
num_simulations = 1000            # the number of simulations
num_rolls = 10                    # the number of rolls in each simulation
result = []                       # the list which will contain the sums of die 1 and 2 for each simulation
roll = range(1, num_rolls +1)     # create list of roll numbers for plotting purposes, remember, Python starts counting at 0
#                                   therefore one must start and end one number more.
sumIt = []                        # These are lists to contain the results of the rolls; sumIt for the sum of the two
oneDie = []                       # And oneDie for the results on each die
# dieList = []

In [20]:

# Here we simply plot the results of each roll and each simulation. While accurate, it is difficult to see
# what is happening. Better plots folloow.
for j in range(num_simulations): #perform a number of simulations of the set of a number of rolls
  result =  []                   # create a list for the results of the rolls in each set
  for i in range(num_rolls):     # perform the set of rolls

    die_1, die_2 = roll_dice()   # roll the dice
    result.append(die_1+die_2)   # place the sum of the die into the "result"list

    sumIt.append(die_1+die_2)
    oneDie.append(die_1)
    oneDie.append(die_2)
  plt.plot(roll,result)          # after the set of rolls, plot them. Do this for each set

# create the plot labels
plt.title("Monte Carlo Dice Game [" + str(num_simulations) + " simulations]")
plt.xlabel("Roll Number")
plt.ylabel("die value")
plt.xlim([1, num_rolls])         
# now display the result plots and the labels
plt.show()

In [21]:

#Here we plot the results of individual die. That is, how often does each number show?
# you see the probability of each number 1 through 6 is about equal.

# import numpy as np   #we imported numpy earlier

fig, his = plt.subplots(figsize = (12,5))  #create the canvas for the plot

bin_edges = np.arange(.5, 6.5+1, 1)        #define the edges so the integers are plotted separately

his.hist(oneDie, bins = bin_edges)        # Create the histogram of the individual die

his.set_xlabel('face value')
his.set_ylabel('number ')
his.set_title('frequency of face values on die')

Text(0.5, 1.0, 'frequency of face values on die')

In the next cell, we use the gaussian_kde function to create the data set.

gaussian_kde stands for Gaussian Kernel Density Estimation. It is used for non-parametric estimation of the probability density function (PDF) of a random variable. This is useful in statistics when you want to estimate the underlying distribution of a data set without assuming any particular underlying distribution.

The method uses a Gaussian (also known as normal) distribution as the kernel. The kernel acts as a smooth, localized weighting function over the data points. By summing these functions, it creates a smooth approximation of the frequency distribution. The bandwidth of the kernel (which controls the smoothness of the resulting estimate) can be adjusted to fit the data appropriately.

gaussian_kde is particularly useful in data science and statistics for visualizing and analyzing data where you don't have a clear expectation of what the distribution should look like, or when the data doesn't fit traditional parametric distributions well.

In [30]:

#Now we will look at the frequency of occurance of the sums, that is, 2 through 12
# it is clear some number have higher frequency of occurance, or probability of occurance.
kde = gaussian_kde(sumIt)                        #Now we create the data for the smooth curve on this histogram
x = np.linspace(0,13,14)
y = kde(x) * 8750
fig, his = plt.subplots(figsize = (12,5))
plt.plot(x,y)                                    # plot the smooth curve
bin_edges = np.arange(1.5, 12.5+1, 1)
his.hist(sumIt, bins = bin_edges, color = 'blue')
his.set_xlabel('sum of dice ')
his.set_ylabel('number ')
his.set_title('frequency of values on dice')

Text(0.5, 1.0, 'frequency of values on dice')

Much closer to equal probability of obtaining a 1-6 on each die and much closer to a Normal Curve. Therefore it is seen that even in this simple model, a large number of sets provides a result that is closer to the analytic result.

In [31]:

#Here I was looking at how to generate a smooth curve
# this is the first way I found, the seabourn library works better for this application

import numpy as np
from scipy.stats import gaussian_kde

# Generate a random normal distribution with mean 0 and standard deviation 1
data = np.random.normal(0, 1, 10000)

# Estimate the PDF using a Gaussian kernel
kde = gaussian_kde(data)

# Evaluate the PDF at a grid of points
x = np.linspace(-4, 4, 1000)
y = kde(x)

# Plot the PDF estimate
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()

In [0]:

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

Section One; Random Numbers

Product

Resources

Company

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more, all in one place. Commercial Alternative to JupyterHub.

Section One; Random Numbers

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.