Maximum Inner Product
Matrix factorization is a potent technique for solving the collaborative filtering problem. It mainly involves building up the user-item interaction matrix, then decomposing it into a user latent factor (a.k.a. embedding) matrix and an item latent factor matrix, each with some user specified dimension (a hyperparameter that we get to tweak).

To generate the items recommended for each user, we would perform a dot product between the two matrices and retrieve the top-k items that have the highest "scores". This process, however, can often become a major bottleneck for these types of algorithms when the number of users and items grows large, as exhaustive computation of the dot product is extremely expensive. This document's focus is to demonstrate an order preserving transformation that converts the maximum inner product problem into a nearest neighbor search problem, significantly speeding up the process of generating the top-k recommendations.
Order Preserving Transformations
We'll first describe the notation we'll be using. Lower case is for scalars, $x$, bold lower case for vectors, $\mathbf{x}$, and bold upper case for matrices, $\mathbf{X}$.
Given a vector $\mathbf{x}$, its norm is denoted by $\|\mathbf{x}\|$. The inner product between two vectors is represented as $\mathbf{x}^\top \mathbf{y}$. Last but not least, $[x ; \mathbf{y}]$ denotes the concatenation of a scalar $x$ with a vector $\mathbf{y}$.
On one hand, we have a matrix of vectors $\mathbf{Y} = \left[\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_n\right]$, such that $\mathbf{y}_i \in \mathbb{R}^d$, where $d$ is the number of dimensions we set for the latent factor. Whereas our query vector is $\mathbf{x} \in \mathbb{R}^d$.
Our objective is to retrieve an index according to the maximum inner product:

$$i^* = \underset{i}{\text{argmax}} \ \mathbf{x}^\top \mathbf{y}_i$$
The idea behind speeding up the workload for maximum inner product operations is to transform the problem into a distance minimization problem, i.e. a nearest neighbor search.
Once we transform the problem into a Euclidean distance problem, there is a plethora of algorithms/packages available for fast similarity search. To do so, we are going to apply a transformation function on our matrix, $\mathbf{Y}$, and our query vector, $\mathbf{x}$. Note that the idea here is only to perform a transformation on top of the existing $\mathbf{Y}$ and $\mathbf{x}$, not to design a whole new algorithm that directly learns embeddings/latent factors via distance minimization to generate the prediction, as that would prevent us from using the existing matrix factorization algorithms.
The order preserving transformation adds an additional dimension to each of the index factors:

$$\mathbf{y}_i^* = \left[ \sqrt{\phi^2 - \|\mathbf{y}_i\|^2} \ ; \ \mathbf{y}_i \right], \quad \text{where } \phi = \max_i \|\mathbf{y}_i\|$$

As for the query vector, we prepend a zero:

$$\mathbf{x}^* = [0 \ ; \ \mathbf{x}]$$

To link the maximum inner product to the distance minimization problem, we would then have:

$$\|\mathbf{x}^* - \mathbf{y}_i^*\|^2 = \|\mathbf{x}^*\|^2 + \|\mathbf{y}_i^*\|^2 - 2\,\mathbf{x}^{*\top}\mathbf{y}_i^* = \|\mathbf{x}\|^2 + \phi^2 - 2\,\mathbf{x}^\top\mathbf{y}_i$$

Since both $\|\mathbf{x}\|^2$ and $\phi^2$ are independent of the index term $i$, minimizing this Euclidean distance is equivalent to maximizing the original inner product, and that concludes our order preserving transformation.
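As a quick sanity check, the sketch below uses hypothetical random factors to verify numerically that the transformation is order preserving, i.e. the item with the largest inner product is also the item with the smallest transformed Euclidean distance:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, dim = 1000, 32
items = rng.normal(size=(n_items, dim))   # stand-ins for item latent factors
query = rng.normal(size=dim)              # stand-in for a user latent factor

# order preserving transformation
norms = np.linalg.norm(items, axis=1)
phi = norms.max()
items_star = np.hstack([np.sqrt(phi ** 2 - norms ** 2).reshape(-1, 1), items])
query_star = np.concatenate(([0.0], query))

# maximum inner product item == minimum euclidean distance item
assert np.argmax(items @ query) == np.argmin(
    np.linalg.norm(items_star - query_star, axis=1))
```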
After applying the transformation, our original matrices will have one extra dimension. The next step is to pick our favorite nearest neighbor search algorithm and use it to generate the predictions. Popular options at the time of writing include faiss, nmslib, and hnswlib. The ann-benchmarks project also compares different open-source nearest neighbor search algorithms/packages.
Let's now take a look at these concepts in practice.
Matrix Factorization
We'll be using the movielens data to illustrate the concept.
For the train/test split, the process is to split each user's behavior based on chronological order. e.g. If a user interacted with 10 items and we specify a test set size of 0.2, then the first 8 items that the user interacted with fall into the training set, and the last 2 items belong to the test set.
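A minimal sketch of this style of split, assuming the interaction log is a pandas DataFrame with hypothetical user_id, item_id and timestamp columns (the actual column names used in the notebook may differ):

```python
import pandas as pd

def chrono_split(df: pd.DataFrame, test_size: float = 0.2):
    """Per-user chronological split: earliest interactions go to train,
    the most recent `test_size` fraction goes to test."""
    df = df.sort_values(['user_id', 'timestamp'])
    # 1-based chronological rank of each interaction within its user
    rank = df.groupby('user_id').cumcount() + 1
    n_interactions = df.groupby('user_id')['item_id'].transform('size')
    is_test = rank > (1 - test_size) * n_interactions
    return df[~is_test], df[is_test]
```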
The model we'll be using is Bayesian Personalized Ranking from the implicit library.
The model object also provides a .recommend method that generates recommendations for a user.
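A rough sketch of fitting the model and pulling recommendations for a single user is shown below. The exact constructor arguments and the matrix orientation expected by fit/recommend (user-item versus item-user CSR, and whether recommend returns parallel arrays or a list of tuples) differ across implicit versions, so treat this as an assumption to verify against the installed release; user_items here is a hypothetical scipy.sparse CSR matrix with one row per user and one column per item.

```python
from implicit.bpr import BayesianPersonalizedRanking

# hyperparameters are illustrative, not tuned
bpr = BayesianPersonalizedRanking(factors=64, learning_rate=0.01,
                                  regularization=0.01, iterations=50)
bpr.fit(user_items)  # user_items: scipy.sparse CSR, rows = users, columns = items

# top-10 items for one user; recent implicit releases return two parallel
# arrays (item ids and their scores)
user_id = 0
ids, scores = bpr.recommend(user_id, user_items[user_id], N=10)
```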
We can also generate the recommendations ourselves. We'll first confirm that the recommend function we've implemented matches the one provided by the library, then implement a recommend_all function that generates recommendations for all users; this will later be used to compare against the nearest neighbor search on the order transformed matrix.
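A minimal numpy sketch of such a recommend_all (hypothetical name, and ignoring the filtering of already-seen items that the library's recommend performs) could look like this:

```python
import numpy as np

def recommend_all(query_factors, index_factors, topn=10):
    """Brute force maximum inner product: score every item for every user,
    then keep the top-n items per user, sorted by descending score."""
    scores = query_factors @ index_factors.T          # (n_users, n_items)
    # argpartition finds the top-n set cheaply, then we sort only those
    top_unsorted = np.argpartition(scores, -topn, axis=1)[:, -topn:]
    row_scores = np.take_along_axis(scores, top_unsorted, axis=1)
    order = np.argsort(-row_scores, axis=1)
    return np.take_along_axis(top_unsorted, order, axis=1)
```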
Different models/libraries have different ways of extracting the item and user factors/embeddings; we assign them to index_factors and query_factors to make all downstream code agnostic of the libraries' implementations.
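With implicit's BPR, the extraction boils down to two attribute reads; other libraries expose the factors differently.

```python
# item embeddings form the "index" we search over,
# user embeddings are the incoming "queries"
index_factors = bpr.item_factors
query_factors = bpr.user_factors
```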
Implementation
To implement our order preserving transformation, we first apply the transformation on our index factors. Recall that the formula is: let $\phi = \max_i \|\mathbf{y}_i\|$, then $\mathbf{y}_i^* = \left[ \sqrt{\phi^2 - \|\mathbf{y}_i\|^2} \ ; \ \mathbf{y}_i \right]$.
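In numpy, the item-side transformation is only a few lines (a sketch, reusing the index_factors from above):

```python
import numpy as np

norms = np.linalg.norm(index_factors, axis=1)
phi = norms.max()
# prepend the extra dimension sqrt(phi^2 - ||y_i||^2) to every item factor
extra_dim = np.sqrt(phi ** 2 - norms ** 2).reshape(-1, 1)
index_factors_transformed = np.hstack([extra_dim, index_factors])
```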
Our next step is to use our favorite nearest neighbor search algorithm/library to conduct the search. We'll be leveraging hnswlib in this example; explaining the details behind this nearest neighbor search algorithm is beyond the scope of this document.
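A minimal sketch of building the index with hnswlib follows; the ef_construction, M and ef values are illustrative defaults rather than tuned settings.

```python
import hnswlib

n_items, dim = index_factors_transformed.shape

# an L2 (euclidean) index: after the order preserving transformation,
# the smallest distance corresponds to the largest inner product
index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=n_items, ef_construction=200, M=16)
index.add_items(index_factors_transformed)  # ids default to 0..n_items-1
index.set_ef(50)  # query-time accuracy/speed trade-off, should be >= top-k
```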
To generate the predictions, we first transform the incoming "queries", $\mathbf{x}^* = [0 \ ; \ \mathbf{x}]$.
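The query side only needs a zero prepended, after which a single knn_query call returns the approximate top-k item ids for every user (a sketch building on the pieces above):

```python
import numpy as np

# x* = [0; x] for every query vector
zeros = np.zeros((query_factors.shape[0], 1), dtype=query_factors.dtype)
query_factors_transformed = np.hstack([zeros, query_factors])

# approximate top-10 items per user; labels has shape (n_users, 10)
labels, distances = index.knn_query(query_factors_transformed, k=10)
```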
Benchmark
We can time the original recommend method using maximum inner product versus the new method of using the order preserving transformed matrices with nearest neighbor search.
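A rough way to time the two paths, reusing the hypothetical recommend_all and hnswlib index sketched earlier (absolute numbers will of course depend on hardware and dataset):

```python
import time

start = time.perf_counter()
exact_topn = recommend_all(query_factors, index_factors, topn=10)
print('exhaustive maximum inner product:', time.perf_counter() - start)

start = time.perf_counter()
approx_topn, _ = index.knn_query(query_factors_transformed, k=10)
print('order transformed nearest neighbor search:', time.perf_counter() - start)
```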
Note that the timing is highly dependent on the dataset. We'll observe a much larger speedup if the number of items/labels in the output/index factor is larger. In the movielens dataset, we only had to rank the top items for each user among roughly 1.6K items; in a much larger dataset, the number of items could easily go up to 100K or even millions, and that's when we'll see the real potential of this method.
Another thing worth checking is the quality of the predictions using the new method. Here we're using the hnswlib library to generate the nearest neighbor items, and hnswlib is technically an approximate nearest neighbor algorithm. We can measure how much the approximate top recommendations overlap with the original top recommendations to make sure we are using the right parameters for the nearest neighbor search algorithm. Notation-wise:

$$\text{overlap@k} = \frac{|L_{approx} \cap L_{optimal}|}{k}$$

Where $L_{approx}$ and $L_{optimal}$ are the lists of top-k approximate recommendations and top-k optimal/original recommendations respectively.
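One way to compute this overlap and average it across users, assuming both inputs are arrays of shape (n_users, k) such as the exact_topn and approx_topn produced by the sketches above:

```python
import numpy as np

def topn_overlap(approx_reco, optimal_reco):
    """Average fraction of the optimal top-k recommendations that also
    appear in the approximate top-k, computed per user."""
    k = optimal_reco.shape[1]
    overlaps = [
        len(np.intersect1d(approx, optimal)) / k
        for approx, optimal in zip(approx_reco, optimal_reco)
    ]
    return np.mean(overlaps)

# values close to 1.0 indicate the approximate search barely hurts quality
print(topn_overlap(approx_topn, exact_topn))
```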