Predicting Price Movements With sklearn

An example of building and training an sklearn model, saving it to the ObjectStore, and loading it back.

Import Libraries

Let's start by importing the functionality we'll need to build the model and to split our data into training/testing sets. We also import pickle so we can store our model in ObjectStore later.

In [3]:

# numpy and pandas are used below as np and pd
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
import pickle

Gather & Prepare Data

Let's retrieve daily data for SPY by making a History request.

In [ ]:

qb = QuantBook()
spy = qb.AddEquity('SPY')
history = qb.History(qb.Securities.Keys, 360, Resolution.Daily)
spy_hist = history.loc['SPY']
spy_hist

We create a function that prepares our data for training and testing our model. We use 5 time steps of OHLCV data, expressed as percentage changes, to predict the percentage change in the closing price of the next bar. Wrapping this logic in a function improves clarity and reusability, especially if we later copy it into a class in a .py file.

In [ ]:

# function to prepare our data for training our ML model
def prep_data(data, n_tsteps=5):
    # n_tsteps is the number of time steps at and before time t we want to use
    #   to predict the percentage change in the close price at time t + 1

    # percentage changes help normalize the data
    df = data.pct_change()

    # drop the NaNs and infinities
    #   (the 'mode.use_inf_as_na' option is deprecated in recent pandas versions)
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    
    features = []
    labels = []

    for i in range(len(df)-n_tsteps):
        input_data = df.iloc[i:i+n_tsteps].values.flatten()
        features.append(input_data)
        label = df['close'].iloc[i+n_tsteps]
        labels.append(label)

    return np.array(features), np.array(labels)
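
To make the windowing concrete, the sketch below applies the same logic to a small synthetic OHLCV table. The make_windows name and the synthetic prices are assumptions for illustration only; they are not QuantConnect data or part of the notebook's API.

```python
import numpy as np
import pandas as pd

def make_windows(data, n_tsteps=5):
    # hypothetical stand-in mirroring prep_data: percentage changes,
    #   then sliding windows of n_tsteps rows flattened into one feature vector
    df = data.pct_change().replace([np.inf, -np.inf], np.nan).dropna()
    features, labels = [], []
    for i in range(len(df) - n_tsteps):
        features.append(df.iloc[i:i + n_tsteps].values.flatten())
        labels.append(df['close'].iloc[i + n_tsteps])
    return np.array(features), np.array(labels)

# synthetic, strictly positive OHLCV values (assumed data, 10 bars)
base = np.linspace(100, 109, 10)
toy = pd.DataFrame({
    'open': base,
    'high': base + 1,
    'low': base - 1,
    'close': base + 0.5,
    'volume': np.linspace(1000, 1900, 10),
})

X_toy, y_toy = make_windows(toy)
print(X_toy.shape, y_toy.shape)
```

With 10 bars, pct_change leaves 9 usable rows, so there are 9 - 5 = 4 samples, each holding 5 time steps of 5 columns (25 features).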

Build the Model (SVR with GridSearch Hyperparameter Optimization)

Let's build the model using sklearn. We use a Support Vector Regressor (SVR), which, with its default RBF kernel, can capture non-linear relationships. Furthermore, we optimize the hyperparameters of this model using GridSearchCV. We encourage users to experiment with different optimizable hyperparameters (e.g. kernel type) and models (e.g. Random Forests).

In [ ]:

def build_model(X, y):
    # note: grid parameters are typically unique to the model
    param_grid = {
        'C': [.05, .1, .5, 1, 5, 10],
        'epsilon': [0.001, 0.005, 0.01, 0.05, 0.1],
        'gamma': ['auto', 'scale'],
    }
    gsc = GridSearchCV(SVR(), param_grid, scoring='neg_mean_squared_error', cv=5)
    model = gsc.fit(X, y).best_estimator_
    return model

def build_model_simple(X, y):
    # similar to above, but without hyperparameter optimization
    model = SVR()
    model.fit(X, y)
    return model
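
As one way to experiment with the kernel type, the following sketch adds 'kernel' to the search grid. The build_model_with_kernels name, the reduced grid, and the synthetic data are assumptions chosen to keep the example small and fast; they are not tuned values.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def build_model_with_kernels(X, y):
    # hypothetical variant of build_model that also searches over the kernel
    param_grid = {
        'kernel': ['linear', 'rbf'],
        'C': [0.1, 1, 10],
        'epsilon': [0.01, 0.1],
    }
    gsc = GridSearchCV(SVR(), param_grid, scoring='neg_mean_squared_error', cv=3)
    gsc.fit(X, y)
    # return both the refitted best model and the winning parameter combination
    return gsc.best_estimator_, gsc.best_params_

# synthetic regression data (assumed, for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 4))
y_demo = X_demo[:, 0] * 0.5 + rng.normal(scale=0.1, size=60)

demo_model, demo_params = build_model_with_kernels(X_demo, y_demo)
print(demo_params)
```

Returning best_params_ alongside the estimator makes it easy to see which kernel the search selected.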

Let's build and train our model by feeding in data prepared using the prep_data function.

In [ ]:

X, y = prep_data(spy_hist)

# split the data for training and testing
#   we need testing data to evaluate how well our model performs on new data 
X_train, X_test, y_train, y_test = train_test_split(X, y)

model = build_model(X_train, y_train)
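
Note that train_test_split shuffles the samples by default. For time-ordered data, one option is to keep the chronological order so that the test set contains only the most recent samples. A minimal sketch with assumed toy arrays (the variable names are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# ten samples in time order (assumed toy data)
X_seq = np.arange(20).reshape(10, 2)
y_seq = np.arange(10)

# shuffle=False keeps the original order, so the test set is the final 20%
X_tr, X_te, y_tr, y_te = train_test_split(
    X_seq, y_seq, test_size=0.2, shuffle=False
)
print(y_tr, y_te)
```

Whether a chronological split is preferable depends on the goal; it avoids testing on windows that overlap in time with training windows.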

Analyze Performance

We then make predictions on the testing data set and plot the predicted values against the actual values to see whether our model has predictive power.

In [ ]:

y_hat = model.predict(X_test)
df = pd.DataFrame({'y': y_test.flatten(), 'y_hat': y_hat.flatten()})
df.plot(title='Model Performance: predicted vs actual %change in closing price')
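
A plot gives only a qualitative impression, so it can help to compute a number or two as well. The sketch below uses assumed toy values in place of y_test and y_hat, and reports the mean squared error (the same quantity the grid search scores on) plus the fraction of correctly predicted signs. The directional accuracy measure is one possible choice, not something the notebook prescribes.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# assumed toy values standing in for y_test and y_hat
y_true = np.array([0.010, -0.020, 0.005, -0.010])
y_pred = np.array([0.008, -0.010, -0.002, -0.015])

mse = mean_squared_error(y_true, y_pred)

# fraction of samples where the predicted sign matches the actual sign
directional_accuracy = np.mean(np.sign(y_true) == np.sign(y_pred))

print(mse, directional_accuracy)
```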

Save the Model to ObjectStore

We dump the model using the pickle module and save the resulting bytes to the ObjectStore.

In [ ]:

model_key = 'spy_model'

pickled_model = pickle.dumps(model)
qb.ObjectStore.SaveBytes(model_key, pickled_model)

Load Model from the ObjectStore

Let's first retrieve the bytes of the model from the ObjectStore. We then convert them with bytearray() into a form usable by pickle.

In [ ]:

if qb.ObjectStore.ContainsKey(model_key):
    model_bytes = qb.ObjectStore.ReadBytes(model_key)
    model_bytes = bytearray(model_bytes)
    loaded_model = pickle.loads(model_bytes)
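
The same save and load round trip can be tried outside the research environment. In the sketch below, a plain dict named fake_store stands in for the ObjectStore; it is hypothetical and not part of the QuantConnect API, and the training data is synthetic.

```python
import pickle
import numpy as np
from sklearn.svm import SVR

# a plain dict standing in for ObjectStore (hypothetical, not the QuantConnect API)
fake_store = {}

# fit a small model on synthetic data (assumed, for illustration only)
rng = np.random.default_rng(0)
X_small = rng.normal(size=(30, 3))
y_small = X_small[:, 0] + rng.normal(scale=0.1, size=30)
original = SVR().fit(X_small, y_small)

# save: serialize the fitted model to bytes
fake_store['spy_model'] = pickle.dumps(original)

# load: pickle.loads accepts bytes-like objects, including bytearray
restored = pickle.loads(bytearray(fake_store['spy_model']))

# the restored model should reproduce the original predictions
same = np.allclose(original.predict(X_small), restored.predict(X_small))
print(same)
```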

To ensure the model was successfully loaded, let's see if the model is able to make predictions.

In [ ]:

y_hat = loaded_model.predict(X_test)
df = pd.DataFrame({'y': y_test.flatten(), 'y_hat': y_hat.flatten()})
df.plot(title='Model Performance: predicted vs actual %change in closing price')

Appendix

Below are some helper methods to manage the ObjectStore keys. We can use these to validate that saving and loading succeeded.

In [ ]:

def get_ObjectStore_keys():
    # each entry prints as '[key, value]'; keep the text before the first comma
    #   and drop the leading '['
    return [str(kvp).split(',')[0][1:] for kvp in qb.ObjectStore.GetEnumerator()]

def clear_ObjectStore():
    for key in get_ObjectStore_keys():
        qb.ObjectStore.Delete(key)

In [ ]:

clear_ObjectStore()