Math 480: Open Source Mathematical Software
2016-05-09
William Stein
Lecture 19: Pandas (part 1 of 3)
Notes:
Homework is assigned (along with grading, which is due this Friday at 6pm)
Screencast...
We will talk about Pandas this week, then statsmodels and numpy/scipy starting next week (rather than waiting until the end).
Pandas - overview
Pandas foundations ("in 10 minutes")
Start on your homework
Pandas Overview
"pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."
Problem pandas solves: data analysis and modeling. pandas enables you to carry out your entire data analysis workflow in Python without having to switch to a more domain-specific language like R.
Pandas does not implement significant modeling functionality outside of linear and panel regression. Instead one uses statsmodels ("estimate statistical models, and perform statistical tests") and scikit-learn ("Machine Learning in Python"), which we will look at next week.
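To make "data analysis workflow in Python" a bit more concrete, here is a minimal sketch of the kind of thing pandas is used for: load a small table, summarize a column, group, and filter. The table and its column names (student, section, score) are made up for illustration and are not course data.

```python
import pandas as pd

# Hypothetical toy data, invented only for this illustration.
df = pd.DataFrame({
    "student": ["ann", "bob", "cat", "dan"],
    "section": ["A", "A", "B", "B"],
    "score": [90, 80, 70, 100],
})

# Summary statistic for a numeric column.
mean_score = df["score"].mean()

# Split-apply-combine: average score within each section.
by_section = df.groupby("section")["score"].mean()

# Filter rows with a boolean mask.
top = df[df["score"] >= 90]

print(mean_score)
print(by_section)
print(top)
```

Fitting an actual statistical model to such a table is where statsmodels or scikit-learn would take over.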
Look at the overview of functionality at the bottom here: http://pandas.pydata.org/#library-highlights
Next, let's see some very basic foundations, before you try it out...
Look at the very beginning of ten-minutes-to-pandas.sagews in the same directory.
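If you do not have the worksheet open, the following is a rough sketch of the two basic objects this kind of introduction usually starts with, a Series and a DataFrame, plus a few ways to select data. It is not copied from the worksheet; the values and the column names A and B are arbitrary.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labeled array; NaN marks a missing value.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# A DataFrame is a two-dimensional table; here the rows are indexed by dates.
dates = pd.date_range("2016-05-09", periods=4)
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=dates,
)

first_two = df.head(2)       # first two rows
col_a = df["A"]              # select one column (returns a Series)
row = df.loc[dates[1]]       # select one row by its index label
big = df[df["A"] > 2]        # boolean indexing: rows where A > 2
n_missing = s.isna().sum()   # number of missing values in s

print(df)
print(df.describe())         # quick per-column summary statistics
```

Note that aggregations such as s.mean() skip missing values by default.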
Look at the beginning of plotting.sagews in the same directory.
Wednesday: pandas101 data example