Data extraction - NASA climate plaintext data
A set of examples on how to extract machine-readable data from the raw, official sources. No pandas needed: just requests, regular expressions, and xlrd (for Excel spreadsheets).
(in progress)
File system setup
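A minimal sketch of what the setup step could look like, using only the standard library. The function name `make_data_dirs` and the `raw`/`wrangled` subdirectory names are my own placeholders, not something the original sources prescribe, and the demo writes into a throwaway temp directory rather than a real project folder.

```python
import tempfile
from pathlib import Path


def make_data_dirs(base_dir):
    """Create raw/ and wrangled/ subdirectories under base_dir and return both paths.

    The directory names are hypothetical; pick whatever layout suits the project.
    """
    base = Path(base_dir)
    raw_dir = base / "raw"            # downloaded files, untouched
    wrangled_dir = base / "wrangled"  # cleaned-up output files
    for d in (raw_dir, wrangled_dir):
        # exist_ok=True makes the cell safe to re-run
        d.mkdir(parents=True, exist_ok=True)
    return raw_dir, wrangled_dir


# Demo in a temporary directory; a real notebook would pass a project path instead.
raw_dir, wrangled_dir = make_data_dirs(tempfile.mkdtemp())
```

Keeping the downloaded originals separate from the wrangled output means the extraction steps can be re-run without fetching the source files again.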
Global Surface Air Temperature Anomaly, via NASA
Source: NASA Goddard Institute for Space Studies (GISS) Surface Temperature Analysis.
Our traditional analysis using only meteorological station data is a line plot of global annual-mean surface air temperature change, with the base period 1951-1980, derived from the meteorological station network [This is an update of Plate 6(b) in Hansen et al. (2001).] Uncertainty bars (95% confidence limits) are shown for both the annual and five-year means, account only for incomplete spatial sampling of data.
The data file
Direct link to the source data file: http://data.giss.nasa.gov/gistemp/graphs_v3/Fig.A.txt
The contents: From 1880 to 2015, the change in global average surface air temperature, compared to the average global temperature measured in the period 1951 to 1980.
An excerpt of the file:
The years 1885 and 2015 are listed with global average temperature anomalies of -0.51 and +1.01 degrees Celsius, respectively, relative to the average temperature measured in the period 1951-1980.
The 5-year mean for 1882 (-0.48) is the rolling average of the annual means for 1880 through 1884, that is, a 5-year window centered on 1882.
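To make the centered-window arithmetic concrete, here is a small sketch. The helper `centered_mean` is a hypothetical name, and the annual values are made-up placeholders, NOT figures from the NASA file. This is only my reading of how the 5-year column relates to the annual column.

```python
def centered_mean(values_by_year, year, window=5):
    """Average the `window` years centered on `year`.

    Returns None when any year in the window is missing, which is why
    the first and last years of a series have no 5-year mean.
    """
    half = window // 2
    years = range(year - half, year + half + 1)
    if any(y not in values_by_year for y in years):
        return None
    return sum(values_by_year[y] for y in years) / window


# Placeholder anomalies for illustration only (not real NASA data)
annual = {1880: -0.1, 1881: -0.2, 1882: -0.3, 1883: -0.4, 1884: -0.5}

mean_1882 = centered_mean(annual, 1882)  # uses 1880 through 1884
mean_1880 = centered_mean(annual, 1880)  # None: 1878 and 1879 are not in the data
```

The `None` result for the edge years mirrors the gaps in the 5-year column of the data file.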
Parsing and wrangling the temperature text file
This can be done using regular expressions and re.findall(). In the snippet below, I write two files, since there aren't 5-year mean values for every annual mean value:
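A hedged sketch of that approach. The sample text is a stand-in I made up to resemble a three-column layout (year, annual mean, 5-year mean, with `*` marking a missing 5-year value); both the layout and the numbers are assumptions, so check them against the real Fig.A.txt before relying on the pattern. The output file names are placeholders too, and the files go to a temp directory.

```python
import csv
import re
import tempfile
from pathlib import Path

# Made-up sample in an ASSUMED layout; not real NASA values
SAMPLE_TEXT = """\
Global Surface Air Temperature Anomaly (C) (Base: 1951-1980)
------------------------------------
 Year  Annual_Mean 5-year_Mean
------------------------------------
 1880     -0.10         *
 1881     -0.20         *
 1882     -0.30     -0.30
 1883     -0.40     -0.40
 1884     -0.50         *
------------------------------------
"""

# A data row: 4-digit year, a decimal number, then a decimal number or "*"
ROW_RE = re.compile(
    r"^[ \t]*(\d{4})[ \t]+(-?\d+\.\d+)[ \t]+(-?\d+\.\d+|\*)[ \t]*$",
    re.MULTILINE,
)


def write_csv(path, header, records):
    """Write a header row plus records to path as CSV."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(records)


# findall() returns one (year, annual, five_year) tuple of strings per data row;
# header and separator lines simply fail to match
rows = ROW_RE.findall(SAMPLE_TEXT)
annual_rows = [(year, annual) for year, annual, _ in rows]
five_year_rows = [(year, five) for year, _, five in rows if five != "*"]

out_dir = Path(tempfile.mkdtemp())
annual_path = out_dir / "annual-means.csv"        # hypothetical file name
five_year_path = out_dir / "five-year-means.csv"  # hypothetical file name
write_csv(annual_path, ["year", "annual_mean"], annual_rows)
write_csv(five_year_path, ["year", "five_year_mean"], five_year_rows)
```

Splitting into two files keeps each one rectangular: every row in the 5-year file has a value, instead of one file with blank cells at the edges.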
NASA CO2 gases
Source...?: NASA GISS: Forcings in GISS Climate Model.
I'm not really sure what the landing page for the following data set is. It seems to consist of data from:
NOAA's World Data Center for Paleoclimatology's ice core research.
NASA measurements from Mauna Loa
NOAA/ESRL measurements
The data file
Direct link to the source file: http://data.giss.nasa.gov/modelforce/ghgases/Fig1A.ext.txt
The contents: The observed global average concentration of carbon dioxide gas, in parts per million.
An excerpt of the file:
Parsing and wrangling the global gases file
As in the temperatures-file example, this just takes adroit use of regular expressions. However, there's one wrinkle: the data file contains two sections: observations, as excerpted above, and "Future Scenarios":
We want to wrangle only the data before the future scenarios section: