Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

GitHub Repository: AllenDowney/ModSimPy
Path: blob/master/chapters/chap10.ipynb
Kernel: Python 3 (ipykernel)

Printed and electronic copies of Modeling and Simulation in Python are available from No Starch Press, Bookshop.org, and Amazon.

Case Studies Part 1

Modeling and Simulation in Python

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

In [1]:

# download modsim.py if necessary

from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://raw.githubusercontent.com/AllenDowney/' +
         'ModSimPy/master/modsim.py')

In [2]:

# import functions from modsim

from modsim import *

This chapter presents case studies where you can apply the tools we have learned so far to problems involving population growth, queueing systems, and tree growth.

This chapter is available as a Jupyter notebook where you can read the text, run the code, and work on the exercises. The notebooks are available at https://allendowney.github.io/ModSimPy/.

Historical World Population

The Wikipedia page about world population growth includes estimates for world population from 12,000 years ago to the present (see https://en.wikipedia.org/wiki/World_population_estimates).

The following cells download an archived version of this page and read the data into a Pandas DataFrame.

In [3]:

download('https://raw.githubusercontent.com/AllenDowney/' +
         'ModSimPy/master/data/World_population_estimates.html')

In [4]:

from pandas import read_html

filename = 'World_population_estimates.html'
tables = read_html(filename, header=0, index_col=0, decimal='M')
len(tables)

In [5]:

table1 = tables[1]
table1.head()

Some of the values are null because not all researchers provide estimates for the same dates.

Again, we'll replace the long column names with more convenient abbreviations.

In [6]:

table1.columns = ['PRB', 'UN', 'Maddison', 'HYDE', 'Tanton', 
                  'Biraben', 'McEvedy & Jones', 'Thomlinson', 'Durand', 'Clark']

Some of the estimates are in a form Pandas doesn't recognize as numbers, but we can coerce them to be numeric.

In [7]:

for col in table1.columns:
    table1[col] = pd.to_numeric(table1[col], errors='coerce')
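To see what coercion does in isolation, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the Wikipedia table; the only library call used is `pd.to_numeric` with `errors='coerce'`, the same call as in the loop above.

```python
import pandas as pd

# Hypothetical raw values (not from the table): two clean numbers,
# a range, and an inequality that cannot be parsed as a single number
raw = pd.Series(['170', '200-300', '< 5', '1000'])

# errors='coerce' replaces anything that cannot be parsed with NaN
clean = pd.to_numeric(raw, errors='coerce')
print(clean)
```

Entries that cannot be parsed become NaN rather than raising an error, so each column ends up numeric and can be plotted; the cost is that the unparseable estimates are dropped.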

Here are the results. Notice that we are working in millions now, not billions.

In [8]:

table1.plot()
decorate(xlim=[-10000, 2000], xlabel='Year', 
         ylabel='World population (millions)',
         title='Prehistoric population estimates')
plt.legend(fontsize='small');

We can use xlim to zoom in on everything after Year 0.

The following figure shows the estimates of several research groups from 1 CE to the near present.

In [9]:

table1.plot()
decorate(xlim=[0, 2000], xlabel='Year', 
         ylabel='World population (millions)',
         title='CE population estimates')

See if you can find a model that fits these estimates. How well does your best model predict actual population growth from 1940 to the present?

In [10]:

tables = read_html(filename, header=0, index_col=0, decimal='M')
table2 = tables[2]
table2.columns = ['census', 'prb', 'un', 'maddison', 
                  'hyde', 'tanton', 'biraben', 'mj', 
                  'thomlinson', 'durand', 'clark']

In [11]:

un = table2.un / 1e9
census = table2.census / 1e9

In [12]:

# Solution goes here

In [13]:

# Solution goes here

One Queue Or Two?

This case study is related to queueing theory, which is the study of systems that involve waiting in lines, also known as "queues".

Suppose you are designing the checkout area for a new store. There is enough room in the store for two checkout counters and a waiting area for customers. You can make two lines, one for each counter, or one line that feeds both counters.

In theory, you might expect a single line to be better, but it has some practical drawbacks: in order to maintain a single line, you might have to install barriers, and customers might be put off by what seems to be a longer line, even if it moves faster.

So you'd like to check whether the single line is really better and by how much. Simulation can help answer this question.

This figure shows the three scenarios we'll consider:

One queue, one server (left), one queue, two servers (middle), two queues, two servers (right).

As we did in the bike share model, we'll divide time into discrete time steps of one minute, and we'll assume that a customer is equally likely to arrive during any time step. I'll denote this probability using the Greek letter lambda, $\lambda$, or the variable name lam. The value of $\lambda$ probably varies from day to day, so we'll have to consider a range of possibilities.
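As a rough sketch of this arrival process, the following code draws one random number per minute and counts an arrival whenever it falls below lam. The value `lam = 1/8` and the 10-hour day are assumptions chosen only for illustration, and the code uses NumPy directly rather than the helper functions in modsim.

```python
import numpy as np

rng = np.random.default_rng(seed=17)

lam = 1 / 8          # assumed arrival probability per one-minute step
num_steps = 10 * 60  # assumed length of the day, in minutes

# One independent trial per minute: True means a customer arrived in that step
arrivals = rng.random(num_steps) < lam

print(arrivals.sum(), 'arrivals; expected about', lam * num_steps)
```

Because each minute is an independent trial, the number of arrivals varies from run to run around `lam * num_steps`.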

Based on data from other stores, you know that it takes 5 minutes for a customer to check out, on average. But checkout times are variable: most customers take less than 5 minutes, but some take substantially more. A simple way to model this variability is to assume that when a customer is checking out, they always have the same probability of finishing during the next time step, regardless of how long they have been checking out. I'll denote this probability using the Greek letter mu, $\mu$, or the variable name mu.

If we choose $\mu=1/5$ per minute, the average time for each checkout will be 5 minutes, which is consistent with the data. Most people take less than 5 minutes, but a few take substantially longer, which is probably not a bad model of the distribution in real stores.
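One way to check this claim is to simulate many checkouts. Under the assumption above, the number of one-minute steps until a customer finishes follows a geometric distribution with mean $1/\mu$. The sketch below is not part of queue.ipynb; the function name `checkout_time` and the sample size are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
mu = 1 / 5  # probability of finishing during any one-minute step

def checkout_time(mu, rng):
    """Count one-minute steps until the customer finishes checking out."""
    minutes = 1
    while rng.random() >= mu:  # with probability 1 - mu, keep checking out
        minutes += 1
    return minutes

# Simulate 10,000 customers (an arbitrary sample size)
times = np.array([checkout_time(mu, rng) for _ in range(10_000)])

print('mean checkout time:', times.mean())
print('fraction under 5 minutes:', (times < 5).mean())
```

The simulated mean should be close to 5 minutes. The fraction of customers who finish in fewer than 5 minutes should be close to $1 - 0.8^4 \approx 0.59$, so a majority are faster than average while a long tail takes much longer.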

Now we're ready to implement the model. In the repository for this book, you'll find a notebook called queue.ipynb that contains some code to get you started and instructions. You can download it from https://github.com/AllenDowney/ModSimPy/raw/master/examples/queue.ipynb or run it on Colab at https://colab.research.google.com/github/AllenDowney/ModSimPy/blob/master/examples/queue.ipynb.

As always, you should practice incremental development: write no more than one or two lines of code at a time, and test as you go!

Predicting Salmon Populations

Each year the U.S. Atlantic Salmon Assessment Committee reports estimates of salmon populations in oceans and rivers in the northeastern United States. The reports are useful for monitoring changes in these populations, but they generally do not include predictions.

The goal of this case study is to model year-to-year changes in population, evaluate how predictable these changes are, and estimate the probability that a particular population will increase or decrease in the next 10 years.

As an example, I use data from the 2017 report, which provides population estimates for the Narraguagus and Sheepscot Rivers in Maine.

In the repository for this book, you'll find a notebook called salmon.ipynb that contains this data and some code to get you started. You can download it from https://github.com/AllenDowney/ModSimPy/raw/master/examples/salmon.ipynb or run it on Colab at https://colab.research.google.com/github/AllenDowney/ModSimPy/blob/master/examples/salmon.ipynb.

You should take my instructions as suggestions; if you want to try something different, please do!
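If you want a starting point, here is one possible approach, sketched under loud assumptions: compute the observed year-to-year rates of change, then simulate the next 10 years by applying a randomly chosen past rate each year. The population counts below are placeholders invented for illustration, not values from the 2017 report, and the function name `simulate` is hypothetical; salmon.ipynb may do things differently.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Placeholder counts, invented for illustration; NOT data from the report
pops = np.array([120, 95, 110, 130, 90, 105, 80, 100], dtype=float)

# Year-to-year relative changes: (next - current) / current
rates = np.diff(pops) / pops[:-1]

def simulate(p0, rates, num_years, rng):
    """Project a population by applying a randomly chosen past rate each year."""
    p = p0
    for _ in range(num_years):
        p *= 1 + rng.choice(rates)
    return p

# Run many 10-year projections starting from the most recent count
finals = np.array([simulate(pops[-1], rates, 10, rng) for _ in range(1000)])

# Fraction of projections that end above the starting population
prob_increase = (finals > pops[-1]).mean()
print('estimated probability of increase after 10 years:', prob_increase)
```

This treats past rates as equally likely to recur and independent from year to year, which is a strong simplification; checking how well it matches the real data is part of evaluating how predictable the changes are.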

Tree Growth

This case study is based on "Height-Age Curves for Planted Stands of Douglas Fir, with Adjustments for Density", a working paper by Flewelling et al. It provides site index curves, which are curves that show the expected height of the tallest tree in a stand of Douglas fir as a function of age, for a stand where the trees are the same age. Depending on the quality of the site, the trees might grow more quickly or slowly. So each curve is identified by a site index that indicates the quality of the site.

The goal of this case study is to explain the shape of these curves, that is, why trees grow the way they do. The answer I propose involves fractal dimensions, so you might find it interesting.

In the repository for this book, you'll find a notebook called trees.ipynb that incrementally develops a model of tree growth and uses it to fit the data. You can download it from https://github.com/AllenDowney/ModSimPy/raw/master/examples/trees.ipynb or run it on Colab at https://colab.research.google.com/github/AllenDowney/ModSimPy/blob/master/examples/trees.ipynb.

There are no exercises in this case study, but it is an example of what you can do with the tools we have so far and a preview of what you will be able to do with the tools in the next few chapters.
