Printed and electronic copies of Modeling and Simulation in Python are available from No Starch Press, Bookshop.org, and Amazon.
Nondimensionalization
Modeling and Simulation in Python
Copyright 2021 Allen Downey
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
In the previous chapter we swept the parameters of the Kermack-McKendrick (KM) model: the contact rate, beta, and the recovery rate, gamma. For each pair of parameters, we ran a simulation and computed the total fraction of the population infected.
In this chapter we'll investigate the relationship between the parameters and this metric, using both simulation and analysis.
The figures in the previous chapter suggest that there is a relationship between the parameters of the KM model, beta and gamma, and the fraction of the population that is infected. Let's think about what that relationship might be.
When beta exceeds gamma, there are more contacts than recoveries during each day. The difference between beta and gamma might be called the excess contact rate, in units of contacts per day.
As an alternative, we might consider the ratio beta/gamma, which is the number of contacts per recovery. Because the numerator and denominator are in the same units, this ratio is dimensionless, which means it has no units.
Describing physical systems using dimensionless parameters is often a useful move in the modeling and simulation game. In fact, it is so useful that it has a name: nondimensionalization (see http://modsimpy.com/nondim). So that's what we'll try first.
Exploring the Results
In the previous chapter, we wrote a function, sweep_parameters, that takes an array of values for beta and an array of values for gamma. It runs a simulation for each pair of parameters and returns a SweepFrame with the results.
I'll run it again with the following arrays of parameters.
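The code cells are not included in this extract; here is a minimal sketch of how the sweep might be set up. The particular ranges for beta_array and gamma_array are assumptions (the text only tells us the result has 11 rows and 4 columns), and sweep_parameters is the function from the previous chapter.

```python
import numpy as np

# Assumed parameter ranges: 11 values of beta and 4 values of gamma,
# matching the 11-row by 4-column SweepFrame described below.
beta_array = np.linspace(0.1, 1.1, 11)
gamma_array = np.linspace(0.2, 0.8, 4)

# sweep_parameters (from the previous chapter) runs one simulation
# per (beta, gamma) pair and returns a SweepFrame with the results.
frame = sweep_parameters(beta_array, gamma_array)
```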
Here's what the first few rows look like:
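Assuming frame is the SweepFrame returned above (a SweepFrame is a Pandas DataFrame), the first rows could be displayed like this:

```python
frame.head()
```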
The SweepFrame has one row for each value of beta and one column for each value of gamma. We can print the values in the SweepFrame like this:
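Here is a sketch of the nested loop the text describes; it assumes frame is the SweepFrame from above, with gamma values as columns and beta values as the index.

```python
for gamma in frame.columns:
    column = frame[gamma]          # the column for this value of gamma
    for beta in column.index:
        metric = column[beta]      # fraction infected for this (beta, gamma) pair
        print(beta, gamma, metric)
```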
This is the first example we've seen with one for loop inside another:
Each time the outer loop runs, it selects a value of gamma from the columns of the SweepFrame and extracts the corresponding column.
Each time the inner loop runs, it selects a value of beta from the index of the column and selects the corresponding element, which is the fraction of the population that got infected.
Since there are 11 rows and 4 columns, the total number of lines in the output is 44.
The following function uses the same loops to enumerate the elements of the SweepFrame, but instead of printing a line for each element, it plots a point.
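The function itself is not reproduced in this extract; here is a sketch of what it might look like, using Matplotlib directly (the book's own version may use helpers from the ModSim library instead).

```python
import matplotlib.pyplot as plt

def plot_sweep_frame(frame):
    """Plot fraction infected versus beta/gamma for every element of frame."""
    for gamma in frame.columns:
        column = frame[gamma]
        for beta in column.index:
            metric = column[beta]
            plt.plot(beta / gamma, metric, '.', color='C1')
```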
For each element of the SweepFrame, it plots a point with the ratio beta/gamma as the x coordinate and metric, which is the fraction of the population that's infected, as the y coordinate.
Here's what it looks like:
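A minimal sketch of the call that would produce the figure, assuming the frame and function defined above:

```python
plot_sweep_frame(frame)
plt.xlabel('Contact number (beta/gamma)')
plt.ylabel('Fraction infected')
plt.show()
```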
The results fall on a single curve, at least approximately. That means that we can predict the fraction of the population that will be infected based on a single parameter, the ratio beta/gamma. We don't need to know the values of beta and gamma separately.
Contact Number
From Chapter 11, recall that the number of new infections in a given day is $\beta s i$, and the number of recoveries is $\gamma i$. If we divide these quantities, the result is $\beta s / \gamma$, which is the number of new infections per recovery (as a fraction of the population).
When a new disease is introduced to a susceptible population, $s$ is approximately 1, so the number of people infected by each sick person is $\beta / \gamma$. This ratio is called the contact number or basic reproduction number (see http://modsimpy.com/contact). By convention it is usually denoted $R_0$, but in the context of an SIR model, that notation is confusing, so we'll use $c$ instead.
The results in the previous section suggest that there is a relationship between $c$ and the total number of infections. We can derive this relationship by analyzing the differential equations from Chapter 11:

$$\frac{ds}{dt} = -\beta s i$$

$$\frac{dr}{dt} = \gamma i$$
In the same way we divided the contact rate by the recovery rate to get the dimensionless quantity $c$, now we'll divide $ds/dt$ by $dr/dt$ to get a ratio of rates:

$$\frac{ds}{dr} = \frac{-\beta s i}{\gamma i}$$
Which we can simplify as

$$\frac{ds}{dr} = -\frac{\beta}{\gamma} s$$
Replacing $\beta / \gamma$ with $c$, we can write

$$\frac{ds}{dr} = -c s$$
Dividing one differential equation by another is not an obvious move, but in this case it is useful because it gives us a relationship between $s$, $r$, and $c$ that does not depend on time. From that relationship, we can derive an equation that relates $c$ to the final value of $r$. In theory, this equation makes it possible to infer $c$ by observing the course of an epidemic.
Here's how the derivation goes. We multiply both sides of the previous equation by $dr$:

$$ds = -c s \, dr$$
And then integrate both sides:

$$\log s = -c r + q$$
where $q$ is a constant of integration. Rearranging terms yields:

$$s = e^{-c r + q}$$
Now let's see if we can figure out what $q$ is. At the beginning of an epidemic, if the fraction infected is small and nearly everyone is susceptible, we can use the approximations $s(0) = 1$ and $r(0) = 0$ to compute $q$:

$$1 = e^{-c \cdot 0 + q}$$
Since $e^0 = 1$, we get $q = 0$.
Now, at the end of the epidemic, let's assume that $i(\infty) = 0$, and that $s(\infty)$ is an unknown quantity, $s_{\infty}$. Since the fractions add up to 1, that means $r(\infty) = 1 - s_{\infty}$, so now we have:

$$s_{\infty} = e^{-c (1 - s_{\infty})}$$
Solving for $c$, we get

$$c = \frac{\log s_{\infty}}{s_{\infty} - 1}$$
By relating $c$ and $s_{\infty}$, this equation makes it possible to estimate $c$ based on data, and possibly predict the behavior of future epidemics.
Analysis and Simulation
Let's compare this analytic result to the results from simulation. I'll create an array of values for $s_{\infty}$.
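A sketch of what that might look like; the particular range and number of values are assumptions, not taken from the text.

```python
import numpy as np

# Assumed range of final susceptible fractions, avoiding s_inf = 0
# (where log is undefined) and s_inf = 1 (where the denominator is zero).
s_inf_array = np.linspace(0.003, 0.99, 50)
```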
And compute the corresponding values of $c$:
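Applying the equation derived above, elementwise:

```python
# c = log(s_inf) / (s_inf - 1), from the analysis above
c_array = np.log(s_inf_array) / (s_inf_array - 1)
```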
To get the total infected, we compute the difference between $s(0)$ and $s_{\infty}$, then store the results in a Series. The ModSim library provides a function called make_series we can use to put c_array and frac_infected in a Pandas Series:
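Assuming the arrays above, the computation might look like this; make_series is the ModSim helper described in "Under the Hood" below.

```python
# Total fraction infected is the difference between s(0) = 1 and s_inf.
frac_infected = 1 - s_inf_array

# make_series(x, y) builds a Pandas Series with x as the index and y as the values.
frac_infected_series = make_series(c_array, frac_infected)
```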
Now we can plot the results:
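A sketch of the comparison plot, assuming the pieces defined above (the book's version may use ModSim helpers for the labels instead of calling Matplotlib directly):

```python
import matplotlib.pyplot as plt

frac_infected_series.plot(label='analysis')   # analytic curve
plot_sweep_frame(frame)                       # simulation results as points
plt.xlabel('Contact number (c)')
plt.ylabel('Fraction infected')
plt.legend()
plt.show()
```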
When the contact number exceeds 1, analysis and simulation agree. When the contact number is less than 1, they do not: analysis indicates there should be no infections; in the simulations there are a small number of infections.
The reason for the discrepancy is that the simulation divides time into a discrete series of days, whereas the analysis treats time as a continuous quantity. When the contact number is large, these two models agree; when it is small, they diverge.
Estimating Contact Number
The previous figure shows that if we know the contact number, we can estimate the fraction of the population that will be infected with just a few arithmetic operations. We don't have to run a simulation.
We can also read the figure the other way: if we know what fraction of the population was affected by a past outbreak, we can estimate the contact number. Then, if we know one of the parameters, like gamma, we can use the contact number to estimate the other parameter, like beta.
At least in theory, we can. In practice, it might not work very well, because of the shape of the curve.
When the contact number is low, the curve is quite steep, which means that small changes in $c$ yield big changes in the number of infections. If we observe that the total fraction infected is anywhere from 20% to 80%, we would conclude that $c$ is near 2.
And when the contact number is high, the curve is nearly flat, which means that it's hard to distinguish values of $c$ between 3 and 6.
With the uncertainty of real data, we might not be able to estimate $c$ with much precision. But in one of the exercises below, you'll have a chance to try.
Summary
In this chapter we used simulations to explore the relationship between beta, gamma, and the fraction infected. Then we used analysis to explain that relationship.
With that, we are done with the Kermack-McKendrick model. In the next chapter we'll move on to thermal systems and the notorious coffee cooling problem.
Exercises
This chapter is available as a Jupyter notebook where you can read the text, run the code, and work on the exercises. You can access the notebooks at https://allendowney.github.io/ModSimPy/.
Exercise 1
At the beginning of this chapter, I suggested two ways to relate beta and gamma: we could compute their difference or their ratio.
Because the ratio is dimensionless, I suggested we explore it first, and that led us to discover the contact number, which is beta/gamma. When we plotted the fraction infected as a function of the contact number, we found that this metric falls on a single curve, at least approximately. That indicates that the ratio is enough to predict the results; we don't have to know beta and gamma individually.
But that leaves a question open: what happens if we do the same thing using the difference instead of the ratio?
Write a version of plot_sweep_frame, called plot_sweep_frame_difference, that plots the fraction infected versus the difference beta - gamma.
What do the results look like, and what does that imply?
Exercise 2
Suppose you run a survey at the end of the semester and find that 26% of students had the Freshman Plague at some point. What is your best estimate of c?
Hint: if you display frac_infected_series, you can read off the answer.
Exercise 3
So far the only metric we have considered is the total fraction of the population that gets infected over the course of an epidemic. That is an important metric, but it is not the only one we care about.
For example, if we have limited resources to deal with infected people, we might also be concerned about the number of people who are sick at the peak of the epidemic, which is the maximum of I.
Write a version of sweep_beta that computes this metric, and use it to compute a SweepFrame for a range of values of beta and gamma. Make a contour plot that shows the value of this metric as a function of beta and gamma.
Then use plot_sweep_frame to plot the maximum of I as a function of the contact number, beta/gamma. Do the results fall on a single curve?
Under the Hood
ModSim provides make_series to make it easier to create a Pandas Series. In this chapter, we used it like this:
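The call from earlier in the chapter, repeated here (assuming c_array and frac_infected as defined above):

```python
frac_infected_series = make_series(c_array, frac_infected)
```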
If you import Series from Pandas, you can make a Series yourself, like this:
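A sketch of the equivalent call using the Pandas constructor directly:

```python
from pandas import Series

# Values first, then index -- the reverse of make_series.
frac_infected_series = Series(frac_infected, c_array)
```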
The difference is that the arguments are in reverse order: the first argument is stored as the values in the Series; the second argument is stored as the index.
I find that order counterintuitive, which is why I use make_series. make_series takes the same optional keyword arguments as Series, which you can read about at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html.