Path: blob/master/ab_tests/quantile_regression/quantile_regression.ipynb
Quantile Regression
When working with real-world regression models, knowing the uncertainty behind each point estimate can often make our predictions more actionable in a business setting. One method of going from a single point estimate to a range estimate, or so-called prediction interval, is known as Quantile Regression.
For example, suppose the historical sales of an item under a certain circumstance are (10000, 10, 50, 100). The standard least squares method would give us an estimate of 2540. If we were to restock based on that prediction, we would likely overstock significantly 75% of the time. But if we estimate the quantiles of the data distribution, the estimated 5th, 50th, and 95th percentiles are 16, 75, and 8515, which are far more informative than the single 2540 estimate. After producing these range estimates, we can also bring in business context that might be hard to incorporate into the data analysis itself and use it to determine the final business decision.
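As a quick sanity check on those numbers, numpy's default linear interpolation reproduces them (the snippet below is illustrative and not part of the original notebook):

```python
import numpy as np

sales = np.array([10000, 10, 50, 100])

# the least squares point estimate of a constant model is just the mean
print(sales.mean())                        # 2540.0

# estimated 5th / 50th / 95th percentiles
print(np.percentile(sales, [5, 50, 95]))   # [  16.   75. 8515.]
```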
Objective Function
As we might recall, for linear regression, or so-called ordinary least squares (OLS), we assume the relationship between our input variables and our output label can be modeled by a linear function:

$$\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$$
And the most common objective function is the squared error:

$$L = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
With quantile regression, we have an additional parameter $\alpha$, which specifies the quantile of our target variable that we're interested in modeling, where $\alpha \in (0, 1)$, and our objective function becomes:

$$L_{\alpha} = \sum_{i:\, y_i \ge \hat{y}_i} \alpha \, |y_i - \hat{y}_i| \;+\; \sum_{i:\, y_i < \hat{y}_i} (1 - \alpha) \, |y_i - \hat{y}_i|$$
Let's try to get some intuition for what this objective function is telling us. The quantile loss differs depending on the quantile being evaluated: more negative errors are penalized more heavily when we specify a higher quantile, and more positive errors are penalized more heavily for lower quantiles. To confirm that this is actually the case, the code chunk below simulates the quantile loss at different quantile values.
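The original code cell isn't reproduced in this excerpt, but a minimal sketch of such a simulation might look like the following, where the error is defined as prediction minus actual (matching the description in the bullet points below) and the function name `quantile_loss` is an assumption:

```python
import numpy as np
import matplotlib.pyplot as plt

def quantile_loss(error, quantile):
    """Quantile (pinball) loss as a function of error = prediction - actual.

    Overestimates (positive error) are weighted by (1 - quantile),
    underestimates (negative error) by quantile.
    """
    return np.where(error >= 0, (1 - quantile) * error, -quantile * error)

errors = np.linspace(-1, 1, 200)
for q in [0.1, 0.5, 0.9]:
    plt.plot(errors, quantile_loss(errors, q), label=f'quantile = {q}')

plt.xlabel('error (prediction - actual)')
plt.ylabel('quantile loss')
plt.legend()
plt.show()
```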
Let's look at each line separately:
The orange line shows the median, which is symmetric around zero. The median aims to bisect the set of predictions, so we want to weigh underestimates equally to overestimates. As it turns out, choosing a quantile of 0.5 is equivalent to modeling the absolute value of the residuals, $|y_i - \hat{y}_i|$ (a short derivation follows this list).
The blue line shows the 10th percentile, which assigns a lower loss to negative errors and a higher loss to positive errors. The 10th percentile means we think there's a 10 percent chance that the true value is below the predicted value, so it makes sense to assign less weight to underestimates than to overestimates.
The green line shows the 90th percentile, which is the opposite of the 10th percentile: it assigns a higher loss to negative errors and a lower loss to positive errors.
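To make the median claim concrete: plugging $\alpha = 0.5$ into the objective above weights both branches equally, so the loss reduces (up to a constant factor of one half) to the sum of absolute residuals:

$$\sum_{i:\, y_i \ge \hat{y}_i} 0.5 \, |y_i - \hat{y}_i| \;+\; \sum_{i:\, y_i < \hat{y}_i} 0.5 \, |y_i - \hat{y}_i| = \frac{1}{2} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$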
Quantile Regression With LightGBM
In the following section, we generate a sinusoidal function plus random gaussian noise, with 80% of the data points being our training samples (blue points) and the rest being our test samples (red points). Generating 1-dimensional fake data allows us to easily visualize it and gain intuition on what sort of black magic our algorithm is doing.
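A minimal sketch of this setup, assuming a particular signal frequency, noise scale, and random seed (none of which are the notebook's actual values):

```python
import numpy as np

np.random.seed(42)

n_samples = 1000
x = np.random.uniform(0, 10, n_samples)
# sinusoidal signal plus random gaussian noise
y = np.sin(x) + np.random.normal(scale=0.3, size=n_samples)

# 80/20 train/test split
mask = np.random.rand(n_samples) < 0.8
X_train, y_train = x[mask].reshape(-1, 1), y[mask]
X_test, y_test = x[~mask].reshape(-1, 1), y[~mask]
```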
We first use the squared error loss as our objective function to train our tree model and visualize the predicted value versus the ground truth.
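A sketch of this baseline using LightGBM's scikit-learn interface, where `regression` is LightGBM's alias for the squared error objective and the remaining hyperparameters are assumptions rather than the notebook's actual settings:

```python
import lightgbm as lgb

# baseline tree model trained with the standard squared error (L2) objective
lgb_l2 = lgb.LGBMRegressor(objective='regression', n_estimators=100)
lgb_l2.fit(X_train, y_train)
y_pred_l2 = lgb_l2.predict(X_test)
```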
Tree models can't predict more than one value per model, so we'll train a separate model for each quantile we want to estimate. Then, just as before, we plot the predictions of the various models, each optimized for a different quantile, against the ground truth.
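A minimal sketch of this step, assuming LightGBM's built-in `quantile` objective with its `alpha` parameter controlling the target quantile (hyperparameters are again assumptions):

```python
# one LightGBM model per target quantile
quantile_models = {}
for alpha in [0.1, 0.5, 0.9]:
    model = lgb.LGBMRegressor(objective='quantile', alpha=alpha, n_estimators=100)
    model.fit(X_train, y_train)
    quantile_models[alpha] = model

# the 0.1 and 0.9 models together form a prediction interval
lower = quantile_models[0.1].predict(X_test)
upper = quantile_models[0.9].predict(X_test)
```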
A quick inspection of the plot above shows that by training one model on the 10% quantile and another on the 90% quantile, we were able to generate a prediction interval for our predictions.
Like all other machine learning tasks, we can check how our model performed by looking at some evaluation metrics. Since the quantile loss differs depending on the quantile we've specified, we'll look at the quantile loss at several levels for each of our models.
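A hedged sketch of this evaluation, where the helper `quantile_loss_metric` is an illustrative function rather than code from the notebook:

```python
def quantile_loss_metric(y_true, y_pred, quantile):
    """Average pinball loss of a set of predictions at a given quantile."""
    residual = y_true - y_pred
    return np.mean(np.maximum(quantile * residual, (quantile - 1) * residual))

# evaluate each quantile model against each quantile loss level
for model_alpha, model in quantile_models.items():
    preds = model.predict(X_test)
    losses = {q: round(quantile_loss_metric(y_test, preds, q), 4) for q in [0.1, 0.5, 0.9]}
    print(f'model trained on quantile {model_alpha}: {losses}')
```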
Not surprisingly, the models that were trained to optimize the 10%/50%/90% quantile performed best on the 10%/50%/90% quantile loss, respectively.
We can also apply quantile regression's objective function to train other sorts of models, such as deep learning models, but we won't touch upon that here. The reference section includes links that show how to achieve this with various modern deep learning frameworks.
Mentions of quantile regression applied in practice:
Blog: How Instacart delivers on time (using quantile regression)
Instacart has a model that predicts delivery times, but they needed a way to account for how large the prediction error can be so that the order is actually delivered on time. Even if the predictive model is unbiased, it is only correct on average. This prediction error, or what they termed the buffer time, needs to be high enough to cover the risk of lateness in most cases. But if the buffer is too high, they might lose efficiency by shrinking the feasible space of the optimization problem, as fewer shoppers might be considered for a given order. A quantile regression at the 0.9 quantile gives an upper bound on the delivery time, which can be used to make sure the delivery is not late 90% of the time.