GitHub Repository: YStrano/DataScience_GA
Path: blob/master/lessons/lesson_16/01_time_series (done).ipynb
Kernel: Python 3

Working With Time Series Data


Learning Objectives

After this lesson, you will be able to:

  • Identify time series data.

  • Explain the challenges of working with time series data.

  • Use the datetime library to represent dates as objects.

  • Preprocess time series data with Pandas.


A time series is a series of data points that's indexed (or listed, or graphed) in time order. Most commonly, a time series is a sequence that's taken at successive equally spaced points in time. Time series are often represented as a set of observations that have a time-bound relation, which is represented as an index.
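As a minimal sketch of this definition, the snippet below builds a small, equally spaced daily series with Pandas. The values and the names `index`, `series`, and `step` are made up for illustration and are not part of the lesson's data.

```python
import pandas as pd

# Hypothetical data: five daily observations starting on an arbitrary date.
index = pd.date_range(start="2017-01-02", periods=5, freq="D")
series = pd.Series([10.0, 10.5, 10.2, 10.8, 11.1], index=index)

# The index carries the time order; consecutive points are one day apart.
step = series.index[1] - series.index[0]
print(series)
print("Spacing:", step)
```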

Time series are commonly found in sales analysis, stock market trends, economic phenomena, and social science problems.

These data sets are often investigated to evaluate the long-term trends, forecast the future, or perform some other form of analysis.

Check for Understanding: List some examples of real-world time series data.

Let's take a look at some Apple stock data to get a feel for what time series data look like.

import pandas as pd
from datetime import timedelta
%matplotlib inline
aapl = pd.read_csv("data/aapl.csv")

Take a high-level look at the data. What are we looking at?

aapl.head()
aapl.describe()

Because time is central to time series data, we need to represent it the way people actually read it: as years, months, days, weekdays, hours, and so on.

Python's datetime module is great for dealing with time-related data, and Pandas builds on it with its own datetime series and Timestamp objects.

In this lesson, we'll review these data types and learn a little more about each of them:

  • datetime objects.

  • datetime series.

  • Timestamps.

  • timedelta().

datetime Objects

Below, we'll import the datetime class from Python's datetime module. We can create a datetime object by passing the components of the date as arguments.

# The datetime library ships with Python, so you already have it from Anaconda.
from datetime import datetime
# Let's just set a random datetime (not the end of the world or anything).
lesson_date = datetime(2012, 12, 21, 12, 21, 12, 844089)

The components of the date are accessible via the object's attributes.

print("Micro-Second", lesson_date.microsecond)
print("Second", lesson_date.second)
print("Minute", lesson_date.minute)
print("Hour", lesson_date.hour)
print("Day", lesson_date.day)
print("Month", lesson_date.month)
print("Year", lesson_date.year)
Micro-Second 844089
Second 12
Minute 21
Hour 12
Day 21
Month 12
Year 2012

timedelta()

Suppose we want to add time to or subtract time from a date. Maybe we're using time as an index and want to get everything that happened a week before a specific observation.

We can use a timedelta object to shift a datetime object. Here's an example:

# Import timedelta from the datetime module.
from datetime import timedelta
# Timedeltas represent time as an amount rather than as a fixed position.
offset = timedelta(days=1, seconds=20)
# A timedelta has attributes that allow us to extract values from it.
print('offset days', offset.days)
print('offset seconds', offset.seconds)
print('offset microseconds', offset.microseconds)
offset days 1
offset seconds 20
offset microseconds 0

The datetime.now() method returns a datetime object for the current moment.

now = datetime.now()
print("Like Right Now: ", now)
Like Right Now: 2018-06-11 18:45:48.670805

The current time is particularly useful when using timedelta().

print("Future: ", now + offset)
print("Past: ", now - offset)
Future: 2018-06-12 18:46:08.670805
Past: 2018-06-10 18:45:28.670805
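The earlier idea of getting everything that happened in the week before a specific observation can be sketched with the same arithmetic. The observation times below are hypothetical, as are the names `observations`, `reference`, `window_start`, and `recent`.

```python
from datetime import datetime, timedelta

# Hypothetical observation times (not taken from the Apple data set).
observations = [
    datetime(2018, 6, 1, 9, 30),
    datetime(2018, 6, 6, 14, 0),
    datetime(2018, 6, 10, 16, 45),
    datetime(2018, 6, 11, 18, 45),
]

reference = observations[-1]
one_week = timedelta(weeks=1)  # stored internally as 7 days

# Keep the observations in the seven days leading up to the reference point.
window_start = reference - one_week
recent = [t for t in observations if window_start <= t < reference]
print(recent)
```

The comparison is half-open, so the reference observation itself is left out of the result.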

Note: The largest unit a timedelta stores is days (the constructor also accepts weeks, which it converts to days). For instance, you can't ask for an offset of two years, 44 days, and 12 hours directly; you have to convert those years to days first.
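The conversion described in this note can be sketched as follows. Treating a year as 365 days is an assumption made only for this example (it ignores leap years), and `DAYS_PER_YEAR` is a hypothetical helper constant.

```python
from datetime import timedelta

# timedelta has no `years` argument, so this call raises a TypeError.
try:
    timedelta(years=2)
except TypeError as err:
    print("Not allowed:", err)

# Assumption for illustration: one year is 365 days (leap years ignored).
DAYS_PER_YEAR = 365
offset = timedelta(days=2 * DAYS_PER_YEAR + 44, hours=12)

print(offset.days)     # whole days in the offset
print(offset.seconds)  # leftover seconds, here the 12 hours
```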

You can read more about the timedelta class in the Python datetime documentation.

Guided Practice: Apple Stock Data

We can practice using datetime functions and objects using Apple stock data.

aapl.head()

The Date column starts off with the object dtype, meaning the dates are stored as plain strings.

aapl.dtypes
Date       object
Open      float64
High      float64
Low       float64
Close     float64
Volume      int64
dtype: object

Convert time data to a datetime object.

Overwrite the original Date column with one that's been converted to a datetime series.

aapl['Date'] = pd.to_datetime(aapl.Date)

We can see these changes reflected in the Date column structure.

aapl.head()

We can also see that the dtype of the Date column has changed.

aapl.dtypes
Date      datetime64[ns]
Open             float64
High             float64
Low              float64
Close            float64
Volume             int64
dtype: object
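The same conversion can be tried without the CSV file. The sketch below uses a small made-up frame named `prices` (hypothetical values, not real Apple quotes) and passes an explicit `format` to `pd.to_datetime()`, which is optional but removes any ambiguity between day-first and month-first dates.

```python
import pandas as pd

# Hypothetical stand-in for the Apple data; the lesson reads data/aapl.csv.
prices = pd.DataFrame({
    "Date": ["2017-01-13", "2017-01-12", "2017-01-11"],
    "Close": [100.0, 101.0, 102.0],
})
print(prices["Date"].dtype)  # before: a column of strings

# Parse the strings into datetimes using an explicit format.
prices["Date"] = pd.to_datetime(prices["Date"], format="%Y-%m-%d")
print(prices["Date"].dtype)  # after: a datetime64 column
```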

The .dt Attribute

Pandas' datetime columns have a .dt attribute that allows you to access attributes that are specific to dates. For example:

aapl.Date.dt.day
aapl.Date.dt.month
aapl.Date.dt.year
aapl.Date.dt.day_name()  # replaces .dt.weekday_name, which was removed in pandas 1.0

And, there are many more!

aapl.Date.dt.day_name().head()
0       Friday
1     Thursday
2    Wednesday
3      Tuesday
4       Monday
Name: Date, dtype: object
aapl.Date.dt.dayofyear.head()
0    13
1    12
2    11
3    10
4     9
Name: Date, dtype: int64

Check out the Pandas .dt documentation for more information.
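Here is a small, self-contained sketch of the .dt accessor on made-up dates. In current Pandas releases the weekday name comes from the `.dt.day_name()` method. The series name `dates` is hypothetical.

```python
import pandas as pd

# Hypothetical dates, chosen only to illustrate the accessor.
dates = pd.Series(pd.to_datetime(["2017-01-13", "2017-01-12", "2017-01-11"]))

print(dates.dt.day_name())  # weekday name for each date
print(dates.dt.dayofyear)   # day number within the year
print(dates.dt.month)       # month as an integer
```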

Timestamps

Timestamps are useful objects for comparisons. You can create a timestamp object using the pd.to_datetime() function and a string specifying the date. These objects are especially helpful when you need to perform logical filtering with dates.

ts = pd.to_datetime('1/1/2017')
ts
Timestamp('2017-01-01 00:00:00')

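A Pandas Timestamp is a subclass of Python's datetime, so both can be used in comparisons. The Timestamp is the type Pandas itself uses for the values in a datetime column, and it supports nanosecond precision.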

Let's use the timestamp ts as a comparison with our Apple stock data.

aapl.loc[aapl.Date >= ts].head()

We can even compute the span between the first and last dates in a time series.

aapl.Date.max() - aapl.Date.min()
Timedelta('360 days 00:00:00')
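The filtering and subtraction steps can be reproduced on a few made-up dates. This is a minimal sketch: `dates`, `same_day`, `mask_ts`, `mask_dt`, and `span` are hypothetical names, and the plain datetime is included only to show that it filters the same way as the Timestamp.

```python
from datetime import datetime

import pandas as pd

# Hypothetical dates standing in for the Date column.
dates = pd.Series(
    pd.to_datetime(["2016-12-29", "2016-12-30", "2017-01-03", "2017-01-04"])
)

ts = pd.to_datetime("1/1/2017")   # a Pandas Timestamp
same_day = datetime(2017, 1, 1)   # a plain Python datetime

# Both objects produce the same boolean mask against a datetime column.
mask_ts = dates >= ts
mask_dt = dates >= same_day
print(mask_ts.equals(mask_dt))

# Subtracting two Timestamps gives a Timedelta.
span = dates.max() - dates.min()
print(span)
```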

Check for Understanding: Why do we convert the DataFrame column containing the time information into a datetime object?

Set datetime to Index the DataFrame

After converting the column containing time data from object to datetime, it is also useful to make that column the index of the DataFrame, which gives the DataFrame a DatetimeIndex.

aapl.head()

Let's set the Date column as the index.

aapl.set_index('Date', inplace=True)
aapl.head()

Filtering by Date with Pandas

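It is easy to filter by date using Pandas. Let's create a subset of data containing only the stock prices from 2017. We can pass a partial date string, such as the year, to .loc. (Older Pandas versions also accepted aapl['2017'], but row selection with plain brackets was removed in pandas 2.0.)

aapl.loc['2017']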

There are a few things to note about indexing with time series. Unlike positional slicing, slicing by date label includes the end point. If you want to index with a range, the time index must be sorted first.
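Both points can be seen in a short sketch on made-up data. The frame `frame` and its values are hypothetical; `sort_index()` is called first so that the range slice is valid.

```python
import pandas as pd

# Hypothetical prices indexed by date; sort_index() puts them in time order.
frame = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=pd.to_datetime(
        ["2017-02-01", "2016-12-30", "2017-01-03", "2017-01-31", "2016-12-29"]
    ),
).sort_index()

# Partial string indexing: every row whose date falls in 2017.
rows_2017 = frame.loc["2017"]

# Label slices include the end point, so the January 31 row is returned.
january = frame.loc["2017-01-01":"2017-01-31"]

print(len(rows_2017), len(january))
```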

Recap: The steps for preprocessing time series data are to:

  • Convert time data to a datetime object.

  • Set datetime to index the DataFrame.

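# Resample to month-end ('M'), year-end ('Y'), and quarter-end ('Q') frequency.
# Pandas 2.2 and later prefer the spellings 'ME', 'YE', and 'QE' for these aliases.
aapl.resample('M').mean()
aapl.resample('M').first()
aapl.resample('M').last()
aapl.resample('Y').first()
aapl.resample('Q').first()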
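The resample() calls above group rows into calendar periods and then aggregate each group. A minimal sketch on made-up daily data follows. It swaps in the 'MS' (month start) alias rather than the 'M' alias used above, because 'MS' is spelled the same way across Pandas versions. The names `daily` and `monthly` are hypothetical.

```python
import pandas as pd

# Hypothetical daily values covering January and February 2017.
index = pd.date_range("2017-01-01", "2017-02-28", freq="D")
daily = pd.DataFrame({"Close": range(len(index))}, index=index)

# "MS" groups rows by calendar month, labelled with the month's first day.
monthly = daily.resample("MS")

print(monthly.mean())   # average of each month
print(monthly.first())  # first observation in each month
print(monthly.last())   # last observation in each month
```

This requires a datetime index, which is why the preprocessing steps come first.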

Recap

  • We use time series analysis to identify changes in values over time.

  • The datetime library makes working with time data more convenient.

  • To preprocess time series data with Pandas, you:

    1. Convert the time column to a datetime object.

    2. Set the time column as the index of the DataFrame.

Instructor Note: These are optional and can be assigned as student practice questions outside of class.

1) Create a datetime object representing today's date.

today = datetime.now().date()
datetime.today()
datetime.datetime(2018, 6, 11, 19, 10, 13, 394596)
datetime.now()
datetime.datetime(2018, 6, 11, 19, 10, 28, 321598)
datetime.today() == datetime.now()
False

2) Load the UFO data set from the internet.

import pandas as pd
from datetime import timedelta
%matplotlib inline
ufo = pd.read_csv('http://bit.ly/uforeports')
ufo.head()

3) Convert the Time column to a datetime object.

ufo['Time'] = pd.to_datetime(ufo['Time'])
ufo.head()

4) Set the Time column to the index of the dataframe.

ufo.set_index('Time', inplace=True)

5) Create a timestamp object for the date January 1, 1999.

ts = pd.to_datetime('1/1/1999')
jan_1 = datetime(1999, 1, 1)

6) Use the timestamp object to perform logical filtering on the DataFrame and create a subset of entries with a date above or equal to January 1, 1999.

ufo.loc[ufo.index >= ts]