uob-COMS30035
GitHub Repository: uob-COMS30035/lab_sheets_public
Path: blob/main/activity_recognition.ipynb
Kernel: ml_labs

Activity Recognition with Healthy Older People Using a Batteryless Wearable Sensor

This notebook provides code for loading the activity recognition dataset into a suitable format for classification and sequence labelling.

The code is divided into three sections:

  • Loading the data

  • Sequence labelling: processing the data into a suitable format for sequence labelling.

  • Classification: processing the data into a suitable format for use with an IID classifier.

We recommend running all three sections, then using the variables produced by the code as required for sequence labelling and IID classification.

Loading the Data

Run the following cells to load the data from disk. You will need to run code from the subsequent sections (or your own code) to get the data into a suitable format for classification and sequence labelling.

import os

data_path = "activity_recognition_dataset/S1_Dataset"
files = os.listdir(data_path)
print(files)

import pandas as pd  # use pandas to load data from CSV files

combined_data = []
for file in files:
    if file == "README.txt":
        continue
    try:
        # load a single sequence from the file
        seq_dataframe = pd.read_csv(
            os.path.join(data_path, file),
            header=None,
            names=["time", "frontal acc", "vertical acc", "lateral acc",
                   "antenna ID", "RSSI", "phase", "frequency", "label"],
        )
        # put the ID of the sequence into the dataframe as an extra column
        seq_dataframe['seqID'] = file  # use filename as ID
        combined_data.append(seq_dataframe)  # put the data frame into a list
    except Exception as err:
        # report the reason so that unreadable files are easier to diagnose
        print(f"Could not load file {file} ({err}). Skipping.")

Sequence Labelling

The following cell creates two variables that you can use for sequence labelling:

  • X_by_seq is a list of 2-D numpy arrays. Each numpy array in this list corresponds to one data sequence and contains the input feature values for that sequence, with one row per time step.

  • y_by_seq is a list of 1-D numpy arrays, where each array contains the target class labels for one sequence, in the same order as X_by_seq.

import numpy as np

# the pandas dataframe stores data in a table with headers
input_cols = ["frontal acc", "vertical acc", "lateral acc", "RSSI", "phase", "frequency"]  # column headers for the input features
output_col = "label"  # column header for the output label

# get the relevant columns from the pandas dataframes and convert to numpy arrays
X_by_seq = []  # store a list of numpy arrays containing the input features for each sequence
y_by_seq = []  # store a list of 1-D numpy arrays containing the target activity labels for each sequence
for seq_table in combined_data:
    X_by_seq.append(seq_table[input_cols].values)
    y_by_seq.append(seq_table[output_col].values - 1)  # subtract one from the label so that labels start from 0

n_states = np.unique(np.concatenate(y_by_seq)).size  # how many classes/states are there?
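Before modelling, it can help to confirm that the two lists line up. The following is a minimal sketch using small synthetic stand-ins (not the real dataset) with the same structure as X_by_seq and y_by_seq. The helper name `summarise_sequences` and the demo variables are hypothetical and only serve as illustration.

```python
import numpy as np

# Synthetic stand-ins with the same structure as X_by_seq and y_by_seq
# (assumption: 6 input features, labels already shifted to start from 0).
rng = np.random.default_rng(0)
seq_lengths = [5, 8, 3]
X_demo = [rng.normal(size=(n, 6)) for n in seq_lengths]
y_demo = [rng.integers(0, 4, size=n) for n in seq_lengths]


def summarise_sequences(X_seqs, y_seqs):
    """Check that features and labels align, then count time steps per class."""
    assert len(X_seqs) == len(y_seqs)
    for X_seq, y_seq in zip(X_seqs, y_seqs):
        # each sequence needs exactly one label per time step
        assert X_seq.ndim == 2 and y_seq.ndim == 1
        assert X_seq.shape[0] == y_seq.shape[0]
    lengths = np.array([X_seq.shape[0] for X_seq in X_seqs])
    label_counts = np.bincount(np.concatenate(y_seqs))
    return lengths, label_counts


lengths, label_counts = summarise_sequences(X_demo, y_demo)
print("sequence lengths:", lengths)
print("time steps per class:", label_counts)
```

With the real data, the same function could be called as `summarise_sequences(X_by_seq, y_by_seq)`. Very uneven class counts would suggest looking beyond plain accuracy when evaluating.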

Variables for sequence labelling

The cell below produces the data you need for sequence labelling. You should be able to work with these variables directly.

from sklearn.model_selection import train_test_split

# create train/test split. Sequences are kept complete.
X_by_seq_tr, X_by_seq_test, y_by_seq_tr, y_by_seq_test = train_test_split(X_by_seq, y_by_seq, test_size=0.2, random_state=21)

# You may wish to make further splits of the data or to modify this split.
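If you want a validation set for model selection, one option is to split the training sequences again while still keeping each sequence whole. The following is a minimal sketch of that idea using only numpy and synthetic stand-ins. The function `split_sequences`, its `val_fraction` default and the demo data are assumptions made for illustration, not part of the lab code.

```python
import numpy as np


def split_sequences(X_seqs, y_seqs, val_fraction=0.25, seed=0):
    """Split whole sequences into two groups without breaking any sequence up."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_seqs))  # shuffle the sequence indices
    n_val = max(1, int(round(val_fraction * len(X_seqs))))
    val_idx, fit_idx = order[:n_val], order[n_val:]
    X_fit = [X_seqs[i] for i in fit_idx]
    y_fit = [y_seqs[i] for i in fit_idx]
    X_val = [X_seqs[i] for i in val_idx]
    y_val = [y_seqs[i] for i in val_idx]
    return X_fit, X_val, y_fit, y_val


# Synthetic stand-ins for X_by_seq_tr and y_by_seq_tr (not the real data)
rng = np.random.default_rng(1)
demo_lengths = (4, 7, 5, 6)
X_demo = [rng.normal(size=(n, 6)) for n in demo_lengths]
y_demo = [rng.integers(0, 4, size=n) for n in demo_lengths]

X_fit, X_val, y_fit, y_val = split_sequences(X_demo, y_demo)
print(len(X_fit), "fitting sequences,", len(X_val), "validation sequence(s)")
```

Splitting by sequence rather than by time step keeps neighbouring, highly correlated readings from the same recording out of both sets at once.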

You may find the following code useful for creating a sequence labelling model:

# Record which observations occur given each state (activity label)
X_given_y = [[] for _ in range(n_states)]  # empty list where we will record the observations that occur given each activity label
for s, X_seq in enumerate(X_by_seq_tr):
    for i in range(X_seq.shape[0]):
        state_i = y_by_seq_tr[s][i]
        X_given_y[state_i].append(X_seq[i, :][None, :])

from hmmlearn import hmm  # We recommend using this class if building an HMM

# Record the mean feature values for observations in each state
means = np.zeros((n_states, len(input_cols)))

# Record the variance of feature values for observations in each state
diagonal_covars = np.zeros((n_states, len(input_cols)))

for state in range(n_states):
    means[state] = np.mean(X_given_y[state], axis=0)
    diagonal_covars[state, :] = np.var(X_given_y[state], axis=0)
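The per-state means and variances describe the emission distributions. An HMM also needs initial-state and transition probabilities, and because the training labels are observed, these can be estimated by counting. The following is a minimal sketch under stated assumptions: the function name `estimate_chain_parameters` is hypothetical, and the `pseudo_count` smoothing is a choice made for this example, not something the lab prescribes.

```python
import numpy as np


def estimate_chain_parameters(y_seqs, n_states, pseudo_count=1.0):
    """Estimate HMM start and transition probabilities from labelled sequences."""
    # pseudo_count is additive smoothing (an assumption of this sketch), so that
    # transitions never seen in training do not get probability zero
    start_counts = np.full(n_states, pseudo_count)
    trans_counts = np.full((n_states, n_states), pseudo_count)
    for y_seq in y_seqs:
        start_counts[y_seq[0]] += 1
        # count each consecutive pair (previous state -> next state)
        np.add.at(trans_counts, (y_seq[:-1], y_seq[1:]), 1)
    startprob = start_counts / start_counts.sum()
    transmat = trans_counts / trans_counts.sum(axis=1, keepdims=True)
    return startprob, transmat


# Toy label sequences standing in for y_by_seq_tr (not the real data)
y_demo = [np.array([0, 0, 1, 1, 2]), np.array([1, 1, 1, 0]), np.array([0, 2, 2, 2])]
startprob, transmat = estimate_chain_parameters(y_demo, n_states=3)
print(startprob)
print(transmat.round(2))
```

With the real data you would pass `y_by_seq_tr` and `n_states`. If you use hmmlearn's `hmm.GaussianHMM` with `covariance_type="diag"`, these estimates, together with `means` and `diagonal_covars`, are the kind of values its `startprob_`, `transmat_`, `means_` and `covars_` attributes hold. Check the hmmlearn documentation for the exact shapes it expects.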

IID Classification

In this code, we take the X and y lists produced for sequence labelling, and concatenate the data points for all sequences. This produces a single set of training data and a single set of test data, which are not divided into separate sequences.

X_tr = np.concatenate(X_by_seq_tr, axis=0)  # combine features into one matrix -- use this as input features for training a classifier
y_tr = np.concatenate(y_by_seq_tr)  # combine target labels into one list -- use this as target labels for training a classifier

X_test = np.concatenate(X_by_seq_test, axis=0)  # combine features into one matrix -- use this as input features for testing a classifier
y_test = np.concatenate(y_by_seq_test)  # combine target labels into one list -- use this as target labels for evaluating a classifier
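When evaluating an IID classifier on these arrays, a simple reference point is the accuracy obtained by always predicting the most frequent training label. The following is a minimal sketch with toy labels. The function `majority_class_baseline` and the demo arrays are hypothetical and stand in for `y_tr` and `y_test`.

```python
import numpy as np


def majority_class_baseline(y_train, y_eval):
    """Predict the most frequent training label for every evaluation point."""
    majority_label = np.bincount(y_train).argmax()
    accuracy = np.mean(y_eval == majority_label)
    return majority_label, accuracy


# Toy labels standing in for y_tr and y_test (not the real data)
y_tr_demo = np.array([0, 2, 2, 1, 2, 2, 0, 2])
y_test_demo = np.array([2, 2, 0, 1])

label, acc = majority_class_baseline(y_tr_demo, y_test_demo)
print(f"majority label: {label}, baseline accuracy: {acc:.2f}")
```

A trained classifier that does not clearly beat this baseline on the test set is probably not making much use of the sensor features.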