CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In
AllenDowney

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place.

GitHub Repository: AllenDowney/ModSimPy
Path: blob/master/chapters/chap14.ipynb
Views: 531
Kernel: Python 3 (ipykernel)

Printed and electronic copies of Modeling and Simulation in Python are available from No Starch Press and Bookshop.org and Amazon.

Nondimensionalization

Modeling and Simulation in Python

Copyright 2021 Allen Downey

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

# download modsim.py if necessary from os.path import basename, exists def download(url): filename = basename(url) if not exists(filename): from urllib.request import urlretrieve local, _ = urlretrieve(url, filename) print('Downloaded ' + local) download('https://raw.githubusercontent.com/AllenDowney/' + 'ModSimPy/master/modsim.py')
# import functions from modsim from modsim import *
download('https://github.com/AllenDowney/ModSimPy/raw/master/' + 'chap11.py')
download('https://github.com/AllenDowney/ModSimPy/raw/master/' + 'chap12.py')
download('https://github.com/AllenDowney/ModSimPy/raw/master/' + 'chap13.py')
# import code from previous notebooks from chap11 import make_system from chap11 import update_func from chap11 import run_simulation from chap11 import plot_results from chap12 import calc_total_infected from chap13 import sweep_beta from chap13 import sweep_parameters

In the previous chapter we swept the parameters of the Kermack-McKendrick (KM) model: the contact rate, beta, and the recovery rate, gamma. For each pair of parameters, we ran a simulation and computed the total fraction of the population infected.

In this chapter we'll investigate the relationship between the parameters and this metric, using both simulation and analysis.

The figures in the previous chapter suggest that there is a relationship between the parameters of the KM model, beta and gamma, and the fraction of the population that is infected. Let's think what that relationship might be.

  • When beta exceeds gamma, there are more contacts than recoveries during each day. The difference between beta and gamma might be called the excess contact rate, in units of contacts per day.

  • As an alternative, we might consider the ratio beta/gamma, which is the number of contacts per recovery. Because the numerator and denominator are in the same units, this ratio is dimensionless, which means it has no units.

Describing physical systems using dimensionless parameters is often a useful move in the modeling and simulation game. In fact, it is so useful that it has a name: nondimensionalization (see http://modsimpy.com/nondim). So that's what we'll try first.

Exploring the Results

In the previous chapter, we wrote a function, sweep_parameters, that takes an array of values for beta and an array of values for gamma. It runs a simulation for each pair of parameters and returns a SweepFrame with the results.

I'll run it again with the following arrays of parameters.

beta_array = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 , 1.1] gamma_array = [0.2, 0.4, 0.6, 0.8] frame = sweep_parameters(beta_array, gamma_array)

Here's what the first few rows look like:

frame.head()

The SweepFrame has one row for each value of beta and one column for each value of gamma. We can print the values in the SweepFrame like this:

for gamma in frame.columns: column = frame[gamma] for beta in column.index: metric = column[beta] print(beta, gamma, metric)

This is the first example we've seen with one for loop inside another:

  • Each time the outer loop runs, it selects a value of gamma from the columns of the SweepFrame and extracts the corresponding column.

  • Each time the inner loop runs, it selects a value of beta from the index of the column and selects the corresponding element, which is the fraction of the population that got infected.

Since there are 11 rows and 4 columns, the total number of lines in the output is 44.

The following function uses the same loops to enumerate the elements of the SweepFrame, but instead of printing a line for each element, it plots a point.

from matplotlib.pyplot import plot def plot_sweep_frame(frame): for gamma in frame.columns: column = frame[gamma] for beta in column.index: metric = column[beta] plot(beta/gamma, metric, '.', color='C1')

For each element of the SweepFrame it plots a point with the ratio beta/gamma as the xx coordinate and metric -- which is the fraction of the population that's infected -- as the yy coordinate.

Here's what it looks like:

plot_sweep_frame(frame) decorate(xlabel='Contact number (beta/gamma)', ylabel='Fraction infected')

The results fall on a single curve, at least approximately. That means that we can predict the fraction of the population that will be infected based on a single parameter, the ratio beta/gamma. We don't need to know the values of beta and gamma separately.

Contact Number

From Chapter 11, recall that the number of new infections in a given day is βsiN\beta s i N, and the number of recoveries is γiN\gamma i N. If we divide these quantities, the result is βs/γ\beta s / \gamma, which is the number of new infections per recovery (as a fraction of the population).

When a new disease is introduced to a susceptible population, ss is approximately 1, so the number of people infected by each sick person is β/γ\beta / \gamma. This ratio is called the contact number or basic reproduction number (see http://modsimpy.com/contact). By convention it is usually denoted R0R_0, but in the context of an SIR model, that notation is confusing, so we'll use cc instead.

The results in the previous section suggest that there is a relationship between cc and the total number of infections. We can derive this relationship by analyzing the differential equations from Chapter 11:

dsdt=βsididt=βsiγidrdt=γi\begin{aligned} \frac{ds}{dt} &= -\beta s i \\ \frac{di}{dt} &= \beta s i - \gamma i\\ \frac{dr}{dt} &= \gamma i\end{aligned}

In the same way we divided the contact rate by the infection rate to get the dimensionless quantity cc, now we'll divide di/dtdi/dt by ds/dtds/dt to get a ratio of rates:

dids=βsiγiβsi\frac{di}{ds} = \frac{\beta s i - \gamma i}{-\beta s i}

Which we can simplify as

dids=1+γβs\frac{di}{ds} = -1 + \frac{\gamma}{\beta s}

Replacing β/γ\beta/\gamma with cc, we can write

dids=1+1cs\frac{di}{ds} = -1 + \frac{1}{c s}

Dividing one differential equation by another is not an obvious move, but in this case it is useful because it gives us a relationship between ii, ss, and cc that does not depend on time. From that relationship, we can derive an equation that relates cc to the final value of ss. In theory, this equation makes it possible to infer cc by observing the course of an epidemic.

Here's how the derivation goes. We multiply both sides of the previous equation by dsds:

di=(1+1cs)dsdi = \left( -1 + \frac{1}{cs} \right) ds

And then integrate both sides:

i=s+1clogs+qi = -s + \frac{1}{c} \log s + q

where qq is a constant of integration. Rearranging terms yields:

q=i+s1clogsq = i + s - \frac{1}{c} \log s

Now let's see if we can figure out what qq is. At the beginning of an epidemic, if the fraction infected is small and nearly everyone is susceptible, we can use the approximations i(0)=0i(0) = 0 and s(0)=1s(0) = 1 to compute qq:

q=0+1+1clog1q = 0 + 1 + \frac{1}{c} \log 1

Since log1=0\log 1 = 0, we get q=1q = 1.

Now, at the end of the epidemic, let's assume that i()=0i(\infty) = 0, and s()s(\infty) is an unknown quantity, ss_{\infty}. Now we have:

q=1=0+s1clogsq = 1 = 0 + s_{\infty}- \frac{1}{c} \log s_{\infty}

Solving for cc, we get

c=logss1c = \frac{\log s_{\infty}}{s_{\infty}- 1}

By relating cc and ss_{\infty}, this equation makes it possible to estimate cc based on data, and possibly predict the behavior of future epidemics.

Analysis and Simulation

Let's compare this analytic result to the results from simulation. I'll create an array of values for ss_{\infty}.

s_inf_array = linspace(0.003, 0.99, 50)

And compute the corresponding values of cc:

from numpy import log c_array = log(s_inf_array) / (s_inf_array - 1)

To get the total infected, we compute the difference between s(0)s(0) and s()s(\infty), then store the results in a Series:

frac_infected = 1 - s_inf_array

The ModSim library provides a function called make_series we can use to put c_array and frac_infected in a Pandas Series.

frac_infected_series = make_series(c_array, frac_infected)

Now we can plot the results:

plot_sweep_frame(frame) frac_infected_series.plot(label='analysis') decorate(xlabel='Contact number (c)', ylabel='Fraction infected')

When the contact number exceeds 1, analysis and simulation agree. When the contact number is less than 1, they do not: analysis indicates there should be no infections; in the simulations there are a small number of infections.

The reason for the discrepancy is that the simulation divides time into a discrete series of days, whereas the analysis treats time as a continuous quantity. When the contact number is large, these two models agree; when it is small, they diverge.

Estimating Contact Number

The previous figure shows that if we know the contact number, we can estimate the fraction of the population that will be infected with just a few arithmetic operations. We don't have to run a simulation.

We can also read the figure the other way; if we know what fraction of the population was affected by a past outbreak, we can estimate the contact number. Then, if we know one of the parameters, like gamma, we can use the contact number to estimate the other parameter, like beta.

At least in theory, we can. In practice, it might not work very well, because of the shape of the curve.

  • When the contact number is low, the curve is quite steep, which means that small changes in cc yield big changes in the number of infections. If we observe that the total fraction infected is anywhere from 20% to 80%, we would conclude that cc is near 2.

  • And when the contact number is high, the curve is nearly flat, which means that it's hard to see the difference between values of cc between 3 and 6.

With the uncertainty of real data, we might not be able to estimate cc with much precision. But as one of the exercises below, you'll have a chance to try.

Summary

In this chapter we used simulations to explore the relationship between beta, gamma, and the fraction infected. Then we used analysis to explain that relationship.

With that, we are done with the Kermack-McKendrick model. In the next chapter we'll move on to thermal systems and the notorious coffee cooling problem.

Exercises

This chapter is available as a Jupyter notebook where you can read the text, run the code, and work on the exercises. You can access the notebooks at https://allendowney.github.io/ModSimPy/.

Exercise 1

At the beginning of this chapter, I suggested two ways to relate beta and gamma: we could compute their difference or their ratio.

Because the ratio is dimensionless, I suggested we explore it first, and that led us to discover the contact number, which is beta/gamma. When we plotted the fraction infected as a function of the contact number, we found that this metric falls on a single curve, at least approximately. That indicates that the ratio is enough to predict the results; we don't have to know beta and gamma individually.

But that leaves a question open: what happens if we do the same thing using the difference instead of the ratio?

Write a version of plot_sweep_frame, called plot_sweep_frame_difference, that plots the fraction infected versus the difference beta-gamma.

What do the results look like, and what does that imply?

# Solution goes here
# Solution goes here
# Solution goes here

Exercise 2

Suppose you run a survey at the end of the semester and find that 26% of students had the Freshman Plague at some point. What is your best estimate of c?

Hint: if you display frac_infected_series, you can read off the answer.

# Solution goes here
# Solution goes here

Exercise 3

So far the only metric we have considered is the total fraction of the population that gets infected over the course of an epidemic. That is an important metric, but it is not the only one we care about.

For example, if we have limited resources to deal with infected people, we might also be concerned about the number of people who are sick at the peak of the epidemic, which is the maximum of I.

Write a version of sweep_beta that computes this metric, and use it to compute a SweepFrame for a range of values of beta and gamma. Make a contour plot that shows the value of this metric as a function of beta and gamma.

Then use plot_sweep_frame to plot the maximum of I as a function of the contact number, beta/gamma. Do the results fall on a single curve?

# Solution goes here
# Solution goes here
# Solution goes here
# Solution goes here
# Solution goes here
# Solution goes here
# Solution goes here

Under the Hood

ModSim provides make_series to make it easier to create a Pandas Series. In this chapter, we used it like this:

frac_infected_series = make_series(c_array, frac_infected)

If you import Series from Pandas, you can make a Series yourself, like this:

from pandas import Series frac_infected_series = Series(frac_infected, c_array)

The difference is that the arguments are in reverse order: the first argument is stored as the values in the Series; the second argument is stored as the index.

I find that order counterintuitive, which is why I use make_series. make_series takes the same optional keyword arguments as Series, which you can read about at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html.