Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place.

GitHub Repository: AllenDowney/ModSimPy
Path: blob/master/examples/salmon.ipynb

Kernel: Python 3 (ipykernel)

Salmon

Modeling and Simulation in Python

License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International

In [1]:

# install Pint if necessary

try:
    import pint
except ImportError:
    !pip install pint

In [2]:

# download modsim.py if necessary

from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)
    
download('https://github.com/AllenDowney/ModSimPy/raw/master/modsim.py')

In [3]:

# import functions from modsim

from modsim import *

Can we predict salmon populations?

Each year the U.S. Atlantic Salmon Assessment Committee reports estimates of salmon populations in oceans and rivers in the northeastern United States. The reports are useful for monitoring changes in these populations, but they generally do not include predictions.

The goal of this case study is to model year-to-year changes in population, evaluate how predictable these changes are, and estimate the probability that a particular population will increase or decrease in the next 10 years.

As an example, I'll use data from page 18 of the 2017 report, which provides population estimates for the Narraguagus and Sheepscot Rivers in Maine.

[Figure: USASAC Report 2017, page 18]

There are tools for extracting data from a PDF document automatically, but for this example I will keep it simple and type it in.

Here are the population estimates for the Narraguagus River:

In [4]:

pops = [2749, 2845, 4247, 1843, 2562, 1774, 1201, 1284, 1287, 
        2339, 1177, 962, 1176, 2149, 1404, 969, 1237, 1615, 1201]

To get this data into a Pandas Series, I'll also make a range of years to use as an index.

In [5]:

years = linrange(1997, 2015)
years

And here's the series.

In [6]:

pop_series = TimeSeries(pops, index=years)
pop_series

Here's what it looks like:

In [7]:

def plot_population(series):
    series.plot(label='Estimated population')
    decorate(xlabel='Year', 
             ylabel='Population estimate', 
             title='Narraguagus River',
             ylim=[0, 5000])
    
plot_population(pop_series)

Modeling changes

To see how the population changes from year to year, I'll use diff to compute the absolute difference between each year and the next, and shift to align each change with the year in which it starts.

In [8]:

abs_diffs = pop_series.diff().shift(-1)
abs_diffs

We can compute relative differences by dividing by the original series elementwise.

In [9]:

rel_diffs = abs_diffs / pop_series
rel_diffs

These relative differences are observed annual net growth rates. The last year has no following year to compare with, so its value is NaN. Let's drop the NaN and save the rest.

In [10]:

rates = rel_diffs.dropna()
rates
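To make the alignment concrete, here is a small sketch that uses only the first four Narraguagus estimates. It assumes a plain pandas Series in place of the modsim TimeSeries so that it runs on its own; the names ending in `_small` are introduced here for illustration.

```python
import pandas as pd

# First four estimates from the data above; a plain Series stands in for TimeSeries.
small = pd.Series([2749, 2845, 4247, 1843], index=[1997, 1998, 1999, 2000])

# diff() labels each change with the later year;
# shift(-1) moves it back so it is labeled with the year where the change starts.
abs_diffs_small = small.diff().shift(-1)

# Dividing by the starting population gives the relative change for each year.
rel_diffs_small = abs_diffs_small / small

# The last year has no "next year", so its entry is NaN and dropna removes it.
rates_small = rel_diffs_small.dropna()
print(rates_small)
```

For example, the entry for 1997 is (2845 - 2749) / 2749, the growth from 1997 to 1998 relative to the 1997 population.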

A simple way to model this system is to draw a random value from this series of observed rates each year. We can use the NumPy function choice to make a random choice from a series.

In [11]:

np.random.choice(rates)

Simulation

Now we can simulate the system by drawing random growth rates from the series of observed rates.

I'll start the simulation in 2015.

In [12]:

t_0 = 2015
p_0 = pop_series[t_0]

I'll create a System object with variables t_0, p_0, rates, and duration=10 years.

The series of observed rates is one big parameter of the model.

In [13]:

system = System(t_0=t_0,
                p_0=p_0,
                duration=10,
                rates=rates)

Write an update function called update_func1 that takes as parameters pop, t, and system. It should choose a random growth rate, compute the change in population, and return the new population.

In [14]:

# Solution goes here
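One possible solution is sketched below; it is not necessarily the author's. It assumes only that system has a rates attribute, and it uses types.SimpleNamespace as a hypothetical stand-in for the modsim System object so the example runs without modsim. The three demo rates are illustrative, not the observed series.

```python
from types import SimpleNamespace

import numpy as np


def update_func1(pop, t, system):
    """Advance the population by one year.

    pop: current population
    t: current year (unused here, kept so all update functions share a signature)
    system: object with a `rates` attribute holding observed growth rates

    returns: population for the following year
    """
    rate = np.random.choice(system.rates)  # pick one observed rate at random
    return pop + rate * pop                # apply it as a relative change


# Hypothetical stand-in for the System object, with illustrative rates.
demo_system = SimpleNamespace(rates=np.array([-0.5, 0.0, 0.5]))

new_pop = update_func1(1000, 2015, demo_system)
print(new_pop)
```

With these demo rates the result is always 500, 1000, or 1500, which makes it easy to check that the rate is applied as a relative change.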

Test your update function by running it a few times.

In [15]:

update_func1(p_0, t_0, system)

Here's a version of run_simulation that stores the results in a TimeSeries and returns it.

In [16]:

def run_simulation(system, update_func):
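    """Simulate the population over the duration of the system.
    
    system: System object with t_0, p_0, and duration
    update_func: function that computes the population for the next year
    
    returns: TimeSeries of population estimates indexed by year
    """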
    t_0 = system.t_0
    t_end = t_0 + system.duration
    
    results = TimeSeries()
    results[t_0] = system.p_0
    
    for t in linrange(t_0, t_end):
        results[t+1] = update_func(results[t], t, system)

    return results

Use run_simulation to generate a prediction for the next 10 years.

Then plot your prediction along with the original data. Your prediction should pick up where the data leave off.

In [17]:

# Solution goes here
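Here is a minimal, self-contained sketch of what one prediction run might look like. It does not use modsim: simulate is a hypothetical mirror of run_simulation built on a plain pandas Series, demo_system is a SimpleNamespace stand-in for the System object, and the three rates are illustrative rather than the observed series.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def update_func1(pop, t, system):
    """Apply one randomly chosen growth rate to the population."""
    rate = np.random.choice(system.rates)
    return pop + rate * pop


def simulate(system, update_func):
    """Mirror of the notebook's run_simulation, returning a plain pandas Series."""
    years = [system.t_0]
    pops = [system.p_0]
    for t in range(system.t_0, system.t_0 + system.duration):
        pops.append(update_func(pops[-1], t, system))  # population for year t+1
        years.append(t + 1)
    return pd.Series(pops, index=years)


# Stand-in for the System object; the rates are illustrative only.
demo_system = SimpleNamespace(t_0=2015, p_0=1201, duration=10,
                              rates=np.array([-0.3, 0.0, 0.4]))

prediction = simulate(demo_system, update_func1)
print(prediction)
```

The first entry of the result is the 2015 estimate, so the prediction picks up exactly where the data leave off. In the notebook itself, the analogous steps would presumably be to call run_simulation(system, update_func1), then plot_population(pop_series), and then plot the returned series on the same axes.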

To get a sense of how much the results vary, we can run the model several times and plot all of the results.

In [18]:

def plot_many_simulations(system, update_func, iters):
    """Runs simulations and plots the results.
    
    system: System object
    update_func: function object
    iters: number of simulations to run
    """
    for i in range(iters):
        results = run_simulation(system, update_func)
        results.plot(color='gray', label='', linewidth=1, alpha=0.3)

The plot option alpha=0.3 makes the lines semi-transparent, so they are darker where they overlap.

Run plot_many_simulations with your update function and iters=30. Also plot the original data.

In [19]:

# Solution goes here

The results are highly variable: according to this model, the population might continue to decline over the next 10 years, or it might recover and grow rapidly!

It's hard to say how seriously we should take this model. Many factors that influence salmon populations are not included in it. For example, if the population starts to grow quickly, it might be held back by limited resources, predators, or fishing. If the population starts to fall, humans might restrict fishing and stock the river with farmed fish.

So these results should probably not be considered useful predictions. However, there might be something useful we can do, which is to estimate the probability that the population will increase or decrease in the next 10 years.

Distribution of net changes

To describe the distribution of net changes, write a function called run_many_simulations that runs many simulations, saves the final populations in a SweepSeries, and returns the SweepSeries.

In [20]:

def run_many_simulations(system, update_func, iters):
    """Runs simulations and reports final populations.
    
    system: System object
    update_func: function object
    iters: number of simulations to run
    
    returns: series of final populations
    """
    # FILL THIS IN

In [21]:

# Solution goes here
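One way to approach this exercise is sketched below, under some loud assumptions: a plain pandas Series stands in for SweepSeries, a SimpleNamespace stands in for the System object, and the helper final_population is a name introduced here, not part of modsim. The demo rates are illustrative.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def update_func1(pop, t, system):
    """Apply one randomly chosen growth rate to the population."""
    rate = np.random.choice(system.rates)
    return pop + rate * pop


def final_population(system, update_func):
    """Run one simulation and return only the population in the last year."""
    pop = system.p_0
    for t in range(system.t_0, system.t_0 + system.duration):
        pop = update_func(pop, t, system)
    return pop


def run_many_simulations(system, update_func, iters):
    """Run `iters` simulations and collect the final populations.

    returns: pandas Series with one final population per run
    """
    finals = [final_population(system, update_func) for _ in range(iters)]
    return pd.Series(finals)


# Stand-in for the System object; the rates are illustrative only.
demo_system = SimpleNamespace(t_0=2015, p_0=1201, duration=10,
                              rates=np.array([-0.3, 0.0, 0.4]))

last_pops_demo = run_many_simulations(demo_system, update_func1, 5)
print(last_pops_demo)
```

In the notebook, the same idea could be expressed by calling run_simulation inside the loop and keeping only the last element of each result, for example with results.iloc[-1].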

Test your function by running it with iters=5.

In [22]:

run_many_simulations(system, update_func1, 5)

Now we can run 1000 simulations and describe the distribution of the results.

In [23]:

last_pops = run_many_simulations(system, update_func1, 1000)
last_pops.describe()

If we subtract the initial population, we get the distribution of net changes.

In [24]:

net_changes = last_pops - p_0
net_changes.describe()

The median is negative, which indicates that the population decreases more often than it increases.

We can be more specific by counting the number of runs where net_changes is positive.

In [25]:

np.sum(net_changes > 0)

Or we can use mean to compute the fraction of runs where net_changes is positive.

In [26]:

np.mean(net_changes > 0)

And here's the fraction where it's negative.

In [27]:

np.mean(net_changes < 0)

So, based on observed past changes, this model predicts that the population is more likely to decrease than increase over the next 10 years, by about 2:1.
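The counting trick used above relies on NumPy treating True as 1 and False as 0. The sketch below shows it on six made-up net changes; these numbers are hypothetical and chosen only so the arithmetic is easy to follow, and they are not results from the model.

```python
import numpy as np

# Hypothetical net changes from six runs, for illustration only.
net_changes_demo = np.array([-300, 150, -20, -450, 80, -10])

increases = net_changes_demo > 0           # Boolean array, True where the population grew

num_up = np.sum(increases)                 # True counts as 1, so this counts the increases: 2
frac_up = np.mean(increases)               # fraction of runs that increased: 2/6
frac_down = np.mean(net_changes_demo < 0)  # fraction of runs that decreased: 4/6

odds_of_decline = frac_down / frac_up      # 2.0, that is, odds of 2:1
print(num_up, frac_up, frac_down, odds_of_decline)
```

Dividing the fraction of decreases by the fraction of increases is how a statement like "about 2:1" can be read off the two means.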

A refined model

There are a few ways we could improve the model.

1. It looks like there might be cyclic behavior in the past data, with a period of 4-5 years. We could extend the model to include this effect.
2. Older data might not be as relevant for prediction as newer data, so we could give more weight to newer data.

The second option is easier to implement, so let's try it.

I'll use linspace to create an array of "weights" for the observed rates. The probability that I choose each rate will be proportional to these weights.

The weights have to add up to 1, so I divide through by the total.

In [28]:

weights = linspace(0, 1, len(rates))
weights /= sum(weights)
weights

I'll add the weights to the System object, since they are parameters of the model.

In [29]:

system.weights = weights

We can pass these weights to np.random.choice as the parameter p (see the NumPy documentation).

In [30]:

np.random.choice(system.rates, p=system.weights)
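To see what these weights imply, here is a small sketch that uses NumPy directly. It assumes 18 observed rates (19 yearly estimates give 18 year-to-year changes) and draws positions rather than actual rates, so it does not depend on the data; the names ending in `_demo` are introduced here for illustration.

```python
import numpy as np

n = 18  # assumed number of observed rates: 19 estimates give 18 changes

# Linearly increasing weights, normalized so they sum to 1.
weights_demo = np.linspace(0, 1, n)
weights_demo /= weights_demo.sum()

print(weights_demo[0])    # the oldest rate gets weight 0
print(weights_demo[-1])   # the newest rate gets the largest weight, 1/9

# Draw positions 0..n-1 with these probabilities to see the effect.
draws_demo = np.random.choice(np.arange(n), size=10_000, p=weights_demo)
counts_demo = np.bincount(draws_demo, minlength=n)
print(counts_demo)
```

One consequence worth noticing is that starting the weights at 0 means the oldest observed rate is never chosen. If that is not intended, a different starting value for linspace would keep it in play.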

Write an update function that takes the weights into account.

In [31]:

# Solution goes here
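A possible shape for the weighted update function is sketched below. The name update_func2 is an assumption, since the exercise does not name the function, and SimpleNamespace again stands in for the System object. The demo rates and weights are chosen so the outcome is predictable.

```python
from types import SimpleNamespace

import numpy as np


def update_func2(pop, t, system):
    """Advance the population by one year, favoring more heavily weighted rates.

    pop: current population
    t: current year (unused, kept for a consistent signature)
    system: object with `rates` and `weights` attributes of equal length

    returns: population for the following year
    """
    rate = np.random.choice(system.rates, p=system.weights)
    return pop + rate * pop


# Stand-in system: all the weight is on the second rate, so it is always chosen.
demo_system = SimpleNamespace(rates=np.array([-0.5, 0.5]),
                              weights=np.array([0.0, 1.0]))

new_pop = update_func2(1000, 2015, demo_system)
print(new_pop)  # 1500.0, because only the rate 0.5 can be drawn
```

The only change from the unweighted version is the p argument, so the two functions can be passed to the same simulation and plotting code.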

Use plot_many_simulations to plot the results.

In [32]:

# Solution goes here

Use run_many_simulations to collect the results and describe to summarize the distribution of net changes.

In [33]:

# Solution goes here

Does the refined model have much effect on the probability of population decline?

In [34]:

# Solution goes here
