Modeling and Simulation in Python
Chapter 10
Copyright 2017 Allen Downey
Under the hood
To get a `DataFrame` and a `Series`, I'll read the world population data and select a column.
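As a minimal sketch of that step, the cell below builds a small table by hand instead of reading the real data. The column names `census` and `un` and every value are placeholders chosen for illustration, not actual estimates.

```python
import pandas as pd

# Placeholder data standing in for the world population table;
# the numbers are illustrative only, not real estimates.
table = pd.DataFrame(
    {"census": [2.5, 3.0, 3.7], "un": [2.5, 3.0, 3.7]},
    index=[1950, 1960, 1970],
)
table.index.name = "Year"

# Selecting a single column of a DataFrame yields a Series.
census = table["census"]

print(type(table).__name__)
print(type(census).__name__)
```

Selecting with `table["census"]` keeps the row labels, so the resulting `Series` shares its index with the `DataFrame` it came from.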
Both `DataFrame` and `Series` have an attribute called `shape` that gives the number of rows and, for a `DataFrame`, the number of columns.
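A short sketch of what `shape` looks like in each case, using a made-up table with three rows and two columns (the names `a` and `b` are arbitrary):

```python
import pandas as pd

# A tiny made-up table: 3 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
series = df["a"]

print(df.shape)      # (3, 2): rows, then columns
print(series.shape)  # (3,): a Series has only one dimension
```

In both cases `shape` is a tuple, which is why the `Series` result has a trailing comma.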
A `DataFrame` contains `index`, which labels the rows. It is an `Int64Index`, which is similar to a NumPy array.
And `columns`, which labels the columns.
And `values`, which is an array of values.
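The three attributes can be inspected side by side. This sketch uses a hand-made table with placeholder values. One caveat about what you will see printed: in pandas 2.0 and later the `Int64Index` class no longer exists, and integer row labels are held in a plain `Index` with dtype `int64`.

```python
import numpy as np
import pandas as pd

# Placeholder values, for illustration only.
df = pd.DataFrame(
    {"census": [2.5, 3.0], "un": [2.6, 3.1]},
    index=[1950, 1960],
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.values)   # the underlying data as a NumPy array
```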
A `Series` does not have `columns`, but it does have `name`.
It contains `values`, which is an array.
And it contains `index`:
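A sketch of the same inspection for a `Series`, again with placeholder data. When a `Series` is selected from a `DataFrame`, its `name` is the label of the column it came from.

```python
import numpy as np
import pandas as pd

# Placeholder values, for illustration only.
df = pd.DataFrame(
    {"census": [2.5, 3.0], "un": [2.6, 3.1]},
    index=[1950, 1960],
)
census = df["census"]

print(census.name)                 # label of the selected column
print(census.values)               # NumPy array of the data
print(census.index)                # row labels, shared with df
print(hasattr(census, "columns"))  # a Series has no `columns` attribute
```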
If you ever wonder what kind of object a variable refers to, you can use the `type` function. The result indicates what type the object is, and the module where that type is defined.
`DataFrame`, `Int64Index`, `Index`, and `Series` are defined by Pandas.
`ndarray` is defined by NumPy.
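To see this for yourself, the sketch below applies `type` to each object and prints the type name along with the module that defines it. The table is made up, and the exact module paths printed depend on the installed pandas and NumPy versions.

```python
import numpy as np
import pandas as pd

# Placeholder values, for illustration only.
df = pd.DataFrame({"census": [2.5, 3.0]}, index=[1950, 1960])

objects = [df, df["census"], df.index, df.columns, df.values]
for obj in objects:
    t = type(obj)
    # __name__ is the type's name; __module__ is where it is defined.
    print(t.__name__, "is defined in", t.__module__)
```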
Optional exercise
The following exercise provides a chance to practice what you have learned so far, and maybe develop a different growth model. If you feel comfortable with what we have done so far, you might want to give it a try.
Optional Exercise: On the Wikipedia page about world population estimates, the first table contains estimates for prehistoric populations. The following cells process this table and plot some of the results.
Select `tables[1]`, which is the second table on the page.
Not all agencies and researchers provided estimates for the same dates. Again, `NaN` is the special value that indicates missing data.
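A small sketch of how missing estimates show up and how to count them. The sources `source_a` and `source_b` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical estimates (in millions) from two made-up sources;
# NaN marks years for which a source gave no estimate.
estimates = pd.DataFrame(
    {
        "source_a": [170.0, np.nan, 310.0],
        "source_b": [np.nan, 190.0, 295.0],
    },
    index=[1, 500, 1000],
)

print(estimates.isna())   # True wherever a value is missing
print(estimates.count())  # number of non-missing values per column
```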
Again, we'll replace the long column names with more convenient abbreviations.
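One way to do the renaming is to assign a new list of labels to `columns`. The long labels and the abbreviations `prb` and `un` below are invented for this sketch and may not match the real table.

```python
import pandas as pd

# Hypothetical long column labels, as they might arrive from a scraped table.
table = pd.DataFrame(
    {
        "Population Reference Bureau (hypothetical label)": [170, 190],
        "United Nations Department (hypothetical label)": [300, 310],
    },
    index=[1, 500],
)

# Assigning a list to `columns` replaces every label at once;
# the list needs one entry per column, in the existing order.
table.columns = ["prb", "un"]

print(table)
```

Because the assignment is positional, it is worth printing the original labels first to confirm the order before overwriting them.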
Some of the estimates are in a form Pandas doesn't recognize as numbers, but we can coerce them to be numeric.
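A common way to do this coercion is `pd.to_numeric` with `errors="coerce"`, which turns anything it cannot parse into `NaN`. The raw strings below are invented examples of the kind of entries a scraped table might contain.

```python
import pandas as pd

# Hypothetical raw entries: two clean numbers, a range, and an inequality.
raw = pd.Series(["170", "200-300", "< 0.5", "226"], index=[-1000, 1, 500, 1000])

# errors="coerce" converts unparseable entries to NaN instead of raising.
numeric = pd.to_numeric(raw, errors="coerce")

print(numeric)
```

To apply the same conversion to every column of a `DataFrame`, you could loop over `columns` and reassign each one.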
Here are the results. Notice that we are working in millions now, not billions.
We can use `xlim` to zoom in on everything after Year 0.
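The sketch below shows the idea with plain Matplotlib, assuming `xlim` here behaves like Matplotlib's axis limits. The series is placeholder data, and the upper limit of 2000 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values in millions, for illustration only.
estimates = pd.Series(
    [5.0, 50.0, 200.0, 300.0, 2500.0],
    index=[-10000, -1000, 1, 1000, 1950],
)

fig, ax = plt.subplots()
ax.plot(estimates.index, estimates.values, "o-")
ax.set_xlim(0, 2000)  # restrict the x-axis to years 0 through 2000
ax.set_xlabel("Year")
ax.set_ylabel("World population (millions)")
```

Setting the limits changes only what is displayed; the earlier rows are still in the data.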
See if you can find a model that fits these data well from Year 0 to 1950.
How well does your best model predict actual population growth from 1950 to the present?