- Latest! Must-read 2019 papers from five top conferences on deep recommender systems and CTR prediction - an article by 深度传送门 on Zhihu
- https://github.com/jeanigarcia/recsys2018-evaluation-tutorial
- CSE 258: Web Mining and Recommender Systems
- CSE 291: Trends in Recommender Systems and Human Behavioral Modeling
- THE AAAI-19 WORKSHOP ON RECOMMENDER SYSTEMS AND NATURAL LANGUAGE PROCESSING (RECNLP)
- Information Recommendation for Online Scientific Communities, Purdue University, Luo Si, Gerhard Klimeck and Michael McLennan
- Recommendations for all : solving thousands of recommendation problems a day
- http://staff.ustc.edu.cn/~hexn/
- Learning Item-Interaction Embeddings for User Recommendations
- Summary of RecSys
- How Netflix’s Recommendations System Works
- Five key research topics in personalized recommender systems
- How Does Spotify Know You So Well?
- https://blog.statsbot.co/recommendation-system-algorithms-ba67f39ac9a3
- https://buildingrecommenders.wordpress.com/
Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user.
RSs are primarily directed towards individuals who lack sufficient personal experience or competence to evaluate the potentially overwhelming number of alternative items that a Web site, for example, may offer.
Xavier Amatriain discusses the traditional definition and its data mining core.
Traditional definition: a recommender system estimates a utility function that automatically predicts how much a user will like an item.
User interest is implicitly reflected in interaction history, demographics, and contexts, so recommendation can be regarded as a typical data mining problem. A recommender system should match a context to a collection of information objects; the methods designed for this are sometimes called deep matching models for recommendation.
It is an application of machine learning in the representation + evaluation + optimization form, and we will focus on representation and evaluation.
- https://github.com/hongleizhang/RSPapers
- https://github.com/benfred/implicit
- https://github.com/YuyangZhangFTD/awesome-RecSys-papers
- https://github.com/daicoolb/RecommenderSystem-Paper
- https://github.com/grahamjenson/list_of_recommender_systems
- https://www.zhihu.com/question/20465266/answer/142867207
- http://www.mbmlbook.com/Recommender.html
- Recommendation algorithms that directly optimize item ranking
- When recommender systems meet deep learning
- Large-Scale Recommender Systems@UTexas
- Alan Said's publication
- MyMediaLite Recommender System Library
- Recommender System Algorithms @ deitel.com
- Workshop on Recommender Systems: Algorithms and Evaluation
- Semantic Recommender Systems. Analysis of the state of the topic
- Recommender Systems (2019/1)
- Recommender systems & ranking
Evolution of the Recommender Problem |
---|
Rating |
Ranking |
Page Optimization |
Context-aware Recommendations |
Evaluation of Recommendation System
The evaluation of machine learning algorithms depends on the task.
The evaluation of a recommendation system can follow that of familiar machine learning tasks such as regression and classification.
We take only mathematical convenience into consideration in the following methods; the Gini index, coverage rate, and other more realistic factors are not discussed in what follows.
Recommendation Strategies |
---|
Collaborative Filtering (CF) |
Content-Based Filtering (CBF) |
Demographic Filtering (DF) |
Knowledge-Based Filtering (KBF) |
Hybrid Recommendation Systems |
There are 3 kinds of collaborative filtering: user-based, item-based and model-based collaborative filtering.
The user-based methods are based on the similarities of users: if two users have rated items similarly in the past, items liked by one can be recommended to the other.
The item-based methods are based on the similarity of items. If one person added a brush to a shopping list, it is reasonable to recommend some toothpaste to him or her, and we can explain the recommendation as "you bought item A, so you may also like item B".
Matrix completion is to complete the matrix ${X}$ by filling in its missing entries, typically under the assumption that the completed matrix is of low rank.
Note that the rank of a matrix is not easy or robust to compute, so the nuclear norm ${\|Z\|}_{\ast}$ is commonly used as its convex surrogate.
We can apply the customized PPA to the matrix completion problem
$$ \min {\|Z\|}_{\ast}\quad s.t. \quad Z_{\Omega} = X_{\Omega}. $$
We let $L(Z, Y)$ be the Lagrangian of this constrained problem; the customized PPA then iterates:

- Producing $Y^{k+1}$ by
  $$Y^{k+1}=\arg\max_{Y} \left\{L([2Z^k-Z^{k-1}],Y)-\frac{s}{2}{\|Y-Y^k\|}^2\right\};$$
- Producing $Z^{k+1}$ by
  $$Z^{k+1}=\arg\min_{Z} \left\{L(Z,Y^{k+1}) + \frac{r}{2}{\|Z-Z^k\|}^2\right\}.$$
Rahul Mazumder, Trevor Hastie and Robert Tibshirani reformulate it as the following:
$$ \min f_{\lambda}(Z)=\frac{1}{2}{\|P_{\Omega}(Z-X)\|}_F^2 + \lambda {\|Z\|}_{\ast} $$
where $P_{\Omega}$ is the projection onto the observed entries $\Omega$ and $\lambda \geq 0$ is a regularization parameter controlling the nuclear norm of the minimizer.
- A SINGULAR VALUE THRESHOLDING ALGORITHM FOR MATRIX COMPLETION
- Matrix and Tensor Decomposition in Recommender Systems
- Low-Rank Matrix Recovery
- ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization
- Matrix Completion/Sensing as NonConvex Optimization Problem
- Exact Matrix Completion via Convex Optimization
- A SINGULAR VALUE THRESHOLDING ALGORITHM FOR MATRIX COMPLETION
- Customized PPA for convex optimization
- Matrix Completion.m
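To make the nuclear-norm formulation above concrete, here is a minimal NumPy sketch of the Soft-Impute iteration of Mazumder, Hastie and Tibshirani: repeatedly fill the missing entries with the current estimate and soft-threshold the singular values. The function name, toy matrix and iteration count are illustrative assumptions, not taken from the references.

```python
import numpy as np

def soft_impute(X, mask, lam=1.0, n_iters=100):
    """Soft-Impute sketch: alternate between filling missing entries
    and soft-thresholding the singular values of the filled matrix."""
    Z = np.zeros_like(X)
    for _ in range(n_iters):
        filled = np.where(mask, X, Z)         # observed entries stay fixed
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)          # soft-threshold the spectrum
        Z = (U * s) @ Vt
    return Z

# Toy usage: a 4x4 rating matrix where 0 marks a missing entry.
X = np.array([[5, 4, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [0, 1, 5, 4]], float)
print(np.round(soft_impute(X, mask=X > 0, lam=0.5), 2))
```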
A novel approach to collaborative prediction is presented, using low-norm instead of low-rank factorizations. The approach is inspired by, and has strong connections to, large-margin linear discrimination. We show how to learn low-norm factorizations by solving a semi-definite program, and present generalization error bounds based on analyzing the Rademacher complexity of low-norm factorizations.
Consider the soft-margin learning, where we minimize a trade-off between the trace norm of ${X}$ and its hinge loss relative to the observed binary entries $Z_{ui}$.
And it can be rewritten as a semi-definite optimization problem (SDP):
$$
\min_{A, B} \frac{1}{2}(\operatorname{tr}(A)+\operatorname{tr}(B))+c\sum_{(ui)\in O}\xi_{ui}, \\
s.t. \, \begin{bmatrix} A & X \\ X^T & B \end{bmatrix} \succeq 0, \quad Z_{ui}X_{ui}\geq 1- \xi_{ui},
\quad \xi_{ui}>0\, \forall (ui)\in O
$$
where ${O}$ is the set of observed entries and $\xi_{ui}$ are slack variables.
- Maximum Margin Matrix Factorization
- Fast Maximum Margin Matrix Factorization for Collaborative Prediction
- Maximum Margin Matrix Factorization by Nathan Srebro
This technique is also called nonnegative matrix factorization.
$\color{red}{\text{Note}}$: the data sets we encounter more frequently in collaborative prediction problems consist of ordinal ratings rather than binary labels.
If we have collected the users' explicit ratings of items in a matrix $R=(r_{u,i})$, we can factorize it into user factors ${P}$ and item factors ${Q}$.
And we can predict the score of an unrated item by $\hat{r}_{u,i} = \left< P_u, Q_i \right> = \sum_f p_{u,f} q_{i,f}$,
where the factors are fitted by minimizing the squared error on the observed entries:
$$ C(P,Q) = \sum_{(u,i):\text{Observed}}(r_{u,i}-\hat{r}_{u,i})^{2}=\sum_{(u,i):\text{Observed}}(r_{u,i}-\sum_f p_{u,f}q_{i,f})^{2},\quad \arg\min_{P_u, Q_i} C(P, Q) $$
where $P_u$ is the latent factor vector of user ${u}$ and $Q_i$ is the latent factor vector of item ${i}$.
Additionally, we can add a regularization term to the cost function to avoid over-fitting, i.e., $C(P,Q) + \lambda(\|P_u\|^2 + \|Q_i\|^2)$.
It is called the regularized singular value decomposition or regularized SVD.
Funk-SVD considers the user's preferences or bias. It predicts the scores by
$$ \hat{r}_{u,i} = \mu + b_u + b_i + \left< P_u, Q_i \right> $$
where $\mu$, $b_u$ and $b_i$ are the global mean, the user bias and the item bias, respectively. And the cost function is defined as
$$ \min\sum_{(u,i): \text{Observed}}(r_{u,i} - \hat{r}_{u,i})^2 + \lambda (\|P_u\|^2+\|Q_i\|^2+\|b_i\|^2+\|b_u\|^2). $$
SVD++ predicts the scores by
$$
\hat{r}_{u,i} = \mu + b_u + b_i + \left(P_u + |N(u)|^{-0.5}\sum_{j\in N(u)} y_j\right) Q_i^{T}
$$
where

- $\mu + b_u + b_i$ is the baseline prediction;
- $\left<P_u, Q_i\right>$ is the SVD part of the rating matrix;
- $\left<|N(u)|^{-0.5}\sum_{j\in N(u)} y_j, Q_i\right>$ is the implicit feedback, where $N(u)$ is user ${u}$'s item set and $y_j$ is the implicit feedback of item ${j}$.
We learn the values of involved parameters by minimizing the regularized squared error function.
- Biased Regularized Incremental Simultaneous Matrix Factorization@orange3-recommender
- SVD++@orange3-recommender
- Matrix factorization: SVD and SVD++
- SVD++: improving matrix-factorization-based collaborative filtering for recommender systems
- https://zhuanlan.zhihu.com/p/42269534
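To make the biased model above concrete, here is a minimal SGD sketch of the Funk-SVD update rules derived from the regularized cost function; the function signature and hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, k=10, lr=0.01, reg=0.1, epochs=20):
    """Biased matrix factorization trained with SGD (Funk-SVD sketch).
    ratings: list of (u, i, r) triples with integer user/item ids."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)
    mu = np.mean([r for _, _, r in ratings])
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])   # prediction error
            b_u[u] += lr * (e - reg * b_u[u])
            b_i[i] += lr * (e - reg * b_i[i])
            # tuple assignment so both updates use the pre-update factors
            P[u], Q[i] = (P[u] + lr * (e * Q[i] - reg * P[u]),
                          Q[i] + lr * (e * P[u] - reg * Q[i]))
    return mu, b_u, b_i, P, Q
```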
One possible improvement of this cost function is to design a more appropriate loss function than the squared error.
Inductive Matrix Completion (IMC) is an algorithm for recommender systems with side-information of users and items. The IMC formulation incorporates features associated with rows (users) and columns (items) in matrix completion, so that it enables predictions for users or items that were not seen during training, and for which only features are known but no dyadic information (such as ratings or linkages).
IMC assumes that the associations matrix is generated by applying feature vectors associated with its rows as well as columns to a low-rank parameter matrix ${Z = W H^T}$.
The inputs $x_i$ and $y_j$ are the feature vectors of row ${i}$ and column ${j}$, and the learning problem is
$$
\min \sum_{(i,j)\in \Omega}\ell(P_{(i,j)}, x_i^T W H^T y_j) + \frac{\lambda}{2}({\| W \|}^2+{\| H \|}^2)
$$
The loss function $\ell$ measures the discrepancy between the observed entry $P_{(i,j)}$ and its prediction $x_i^T W H^T y_j$; the squared loss is a common choice.
- Inductive Matrix Completion for Recommender Systems with Side-Information
- Inductive Matrix Completion for Predicting Gene-Disease Associations
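A one-line sketch of the IMC scoring rule $P_{(i,j)}\approx x_i^T W H^T y_j$ (learning ${W}$ and ${H}$ is omitted; all names are mine):

```python
import numpy as np

def imc_score(x, y, W, H):
    """IMC prediction for a (row, column) pair from side features.
    x: (d1,) row features, y: (d2,) column features,
    W: (d1, k) and H: (d2, k) are the learned low-rank parameters."""
    return x @ W @ H.T @ y
```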
In linear regression, the least squares method is equivalent to maximum likelihood estimation under standard normal (Gaussian) errors.

Regularized SVD |
---|
Probabilistic model |
---|

And, likewise, the squared-error cost with $\ell_2$ regularization corresponds to Gaussian noise on the ratings with Gaussian priors on the latent factors.
So that we can reformulate the optimization problem as maximum likelihood estimation.
- Latent Factor Models for Web Recommender Systems
- Regression-based Latent Factor Models@CS 732 - Spring 2018 - Advanced Machine Learning by Zhi Wei
- Probabilistic Matrix Factorization
Sometimes, the information about users that we can collect is implicit, such as clicks on items.
In CLiMF, the model parameters are learned by directly maximizing the Mean Reciprocal Rank (MRR).
Its objective function is
$$ F(U,V)=\sum_{i=1}^{M}\sum_{j=1}^{N} Y_{ij} \left[\ln g(U_{i}^{T}V_{j})+\sum_{k=1}^{N}\ln \left(1 - Y_{ik}\, g(U_{i}^{T}V_{k}-U_{i}^{T}V_{j})\right)\right] -\frac{\lambda}{2}({\|U\|}^2 + {\|V\|}^2) $$
where the notation is as follows:

Numbers | Factors | Others
---|---|---
$M$: the number of users | $U_i$: latent factor vector for user ${i}$ | $Y_{ij}$: binary relevance score of user ${i}$ for item ${j}$
$N$: the number of items | $V_j$: latent factor vector for item ${j}$ | $g(\cdot)$: the logistic function
We use stochastic gradient ascent to maximize the objective function.
- Collaborative Less-is-More Filtering@orange3-recommendation
- https://dl.acm.org/citation.cfm?id=2540581
- Collaborative Less-is-More Filtering python Implementation
- CLiMF: Collaborative Less-Is-More Filtering
Until now, we have considered the recommendation task as a regression prediction process, which is really common in machine learning. Boosting or stacking methods may help us enhance these models.
A key to achieving highly competitive results on the Netflix data is usage of sophisticated blending schemes, which combine the multiple individual predictors into a single final solution. This significant component was managed by our colleagues at the Big Chaos team. Still, we were producing a few blended solutions, which were later incorporated as individual predictors in the final blend. Our blending techniques were applied to three distinct sets of predictors. First is a set of 454 predictors, which represent all predictors of the BellKor’s Pragmatic Chaos team for which we have matching Probe and Qualifying results. Second, is a set of 75 predictors, which the BigChaos team picked out of the 454 predictors by forward selection. Finally, a set of 24 BellKor predictors for which we had matching Probe and Qualifying results. from Netflix Prize.
- https://www.netflixprize.com/community/topic_1537.html
- https://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
- https://www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf
Another advantage of collaborative filtering or matrix completion is that it still applies when the elements of the matrix are binary or implicit information such as clicks or play counts.
- BPR: Bayesian Personalized Ranking from Implicit Feedback,
- Applications of the conjugate gradient method for implicit feedback collaborative filtering,
- Intro to Implicit Matrix Factorization
- a curated list in github.com.
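For intuition on the BPR objective just cited, here is a minimal sketch of one SGD step that maximizes $\ln\sigma(\hat{x}_{uij})$ with $\ell_2$ regularization; the array layout and learning rates are illustrative assumptions.

```python
import numpy as np

def bpr_sgd_step(P, Q, u, i, j, lr=0.01, reg=0.01):
    """One BPR step: user u should rank observed item i above sampled item j.
    P: (n_users, k) user factors, Q: (n_items, k) item factors."""
    pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
    x_uij = pu @ (qi - qj)
    g = 1.0 / (1.0 + np.exp(x_uij))    # gradient weight of -ln sigmoid(x_uij)
    P[u] += lr * (g * (qi - qj) - reg * pu)
    Q[i] += lr * (g * pu - reg * qi)
    Q[j] += lr * (-g * pu - reg * qj)
```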
Explicit and implicit feedback |
---|
WRMF is simply a modification of this loss function:
$$ C(P,Q)_{WRMF} = \sum_{(u,i):\text{Observed}}c_{u,i}(I_{u,i} - \sum_f p_{u,f}q_{i,f})^{2} + \lambda_u{\|P_u\|}^2 + \lambda_i{\|Q_i\|}^2. $$
We make the assumption that if a user has interacted at all with an item, then $I_{u,i} = 1$, and the confidence $c_{u,i}$ grows with the strength of the interaction (e.g., $c_{u,i} = 1 + \alpha r_{u,i}$).
WRMF does not claim with certainty that a user who has not interacted with an item dislikes it; it assumes a negative preference toward that item, but we can choose how much we trust that assumption through the confidence hyperparameter.
Alternating least squares (ALS) gives an analytic solution to this optimization problem by alternately fixing one factor matrix and setting the gradient with respect to the other equal to zero, as sketched below.
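A minimal NumPy sketch of that closed-form update for the user factors (one half of an ALS sweep), following the algebra of Hu, Koren and Volinsky; the dense per-user loop is written for clarity rather than efficiency, and all names are mine.

```python
import numpy as np

def als_user_step(C, I, Y, reg=0.1):
    """Solve for user factors X given item factors Y under WRMF.
    C: (n_users, n_items) confidence matrix, I: binary preference matrix,
    Y: (n_items, k) item factors. Returns (n_users, k) user factors."""
    n_users, n_items = C.shape
    k = Y.shape[1]
    X = np.zeros((n_users, k))
    YtY = Y.T @ Y                           # shared across all users
    for u in range(n_users):
        Cu = np.diag(C[u])                  # per-item confidences of user u
        A = YtY + Y.T @ (Cu - np.eye(n_items)) @ Y + reg * np.eye(k)
        X[u] = np.linalg.solve(A, Y.T @ Cu @ I[u])
    return X
```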
- http://nicolas-hug.com/blog/matrix_facto_1
- http://nicolas-hug.com/blog/matrix_facto_2
- http://nicolas-hug.com/blog/matrix_facto_3
- Collaborative Filtering for Implicit Feedback Datasets
- Alternating Least Squares Method for Collaborative Filtering
- Implicit Feedback and Collaborative Filtering
- Faster Implicit Matrix Factorization
- CUDA Tutorial: Implicit Matrix Factorization on the GPU
- Fast Python Collaborative Filtering for Implicit Feedback Datasets
- Intro to Implicit Matrix Factorization: Classic ALS with Sketchfab Models
- https://www.cnblogs.com/Xnice/p/4522671.html
- https://blog.csdn.net/turing365/article/details/80544594
- Matrix factorization for recommender system@Wikiwand
- http://www.cnblogs.com/DjangoBlog/archive/2014/06/05/3770374.html
- Learning to Rank Sketchfab Models with LightFM
- Finding Similar Music using Matrix Factorization
- Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data
More on Matrix Factorization
- The Advanced Matrix Factorization Jungle
- Non-negative Matrix Factorizations
- http://people.eecs.berkeley.edu/~yima/
- New tools for recovering low-rank matrices from incomplete or corrupted observations by Yi Ma@UCB
- DiFacto — Distributed Factorization Machines
- Learning with Nonnegative Matrix Factorizations
- Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
- Taming Nonconvexity in Information Science, tutorial at ITW 2018.
- Nonnegative Matrix Factorization by Optimization on the Stiefel Manifold with SVD Initialization
Many well-established recommender systems are based on representation learning in Euclidean space. In these models, matching functions such as the Euclidean distance or inner product are typically used for computing similarity scores between user and item embeddings. This paper investigates the notion of learning user and item representations in hyperbolic space.
Given a user and an item, HyperBPR learns their embeddings in hyperbolic space and scores the pair through a function of the hyperbolic distance between the embeddings.
Hyperbolic Bayesian Personalized Ranking (HyperBPR) leverages BPR pairwise learning to minimize the pairwise ranking loss between the positive and negative items.
Given a user ${u}$ and a pair of items $(i, j)$ in which ${i}$ is observed and ${j}$ is not, the loss encourages the positive item to score higher than the negative one, where the embeddings live in the Poincaré ball model of hyperbolic space.
The parameters of the model are learned by Riemannian stochastic gradient descent (RSGD).
- Stochastic gradient descent on Riemannian manifolds
- Hyperbolic Recommender Systems
- Scalable Hyperbolic Recommender Systems
The matrix completion methods used in recommender systems, such as regularized SVD, are linear combinations of some features, and they take only the user-item interactions and similarities into account.
Factorization Machines (FM) are inspired by previous factorization models.
FM represents each feature by an embedding vector, and models the second-order feature interactions:
$$
\hat{y}
= w_0 + \sum_{i=1}^{n} w_i x_i+\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left<v_i, v_j\right> x_i x_j \\
= \underbrace{w_0 + \left<w, x\right>}_{\text{First-order: Linear Regression}} + \underbrace{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left<v_i, v_j\right> x_i x_j}_{\text{Second-order: pair-wise interactions between features}}
$$
where the model parameters that have to be estimated are $$ w_0 \in \mathbb{R}, w\in\mathbb{R}^n, V\in\mathbb{R}^{n\times k}. $$
And the rows $v_i$ of ${V}$ are the $k$-dimensional embedding vectors of the features.
The first-order part $w_0 + \left<w, x\right>$ is a linear regression; the second-order part $\sum_{i<j}\left<v_i, v_j\right> x_i x_j$ models the pair-wise interactions between features.
However, why do we call it a factorization machine? Where is the factorization?
If ${[W]}_{ij}=w_{ij}= \left<v_i, v_j\right>$, then the interaction weight matrix is factorized as $W = V V^T$, a low-rank factorization.
In order to reduce the computational complexity, the second-order part can be rewritten as
$$\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left<v_i, v_j\right> x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f} x_i\right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2\right],$$
which can be computed in $O(kn)$ time.
- https://blog.csdn.net/g11d111/article/details/77430095
- Principles of the Factorization Machines recommendation algorithm, by 刘建平Pinard
- Factorization Machines for Recommendation Systems
- Chapter 09 of 'ML explained simply': the Factorization family
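The $O(kn)$ identity above is easy to verify in code; here is a minimal NumPy sketch of an FM forward pass (function and variable names are mine):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM prediction using the O(kn) pairwise-interaction identity.
    x: (n,) features, w0: global bias, w: (n,) linear weights,
    V: (n, k) feature embedding matrix."""
    xv = x @ V                                     # (k,): sum_i v_{i,f} x_i
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return w0 + w @ x + pairwise
```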
In FMs, every feature has only one latent vector to learn the latent effect with any other features.
In FFMs, each feature has several latent vectors. Depending on the field of other features, one of them is used to do the inner product.
Mathematically, the pairwise term becomes
$$
\hat{y}=\sum_{j_1=1}^{n}\sum_{j_2=j_1+1}^{n}\left<v_{j_1,f_2}, v_{j_2,f_1}\right> x_{j_1} x_{j_2}
$$
where $f_1$ and $f_2$ are the fields of features $j_1$ and $j_2$, respectively, and $v_{j,f}$ is the latent vector of feature ${j}$ specific to field ${f}$.
- Yuchin Juan at ACEMAP
- Field-aware Factorization Machines for CTR Prediction
- https://blog.csdn.net/mmc2015/article/details/51760681
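A direct quadratic-time sketch of the field-aware pairwise term, to make the indexing $\left<v_{j_1,f_2}, v_{j_2,f_1}\right>$ concrete; the array layout is an illustrative assumption.

```python
import numpy as np

def ffm_pairwise(x, fields, V):
    """FFM pairwise term: each feature keeps one latent vector per field.
    x: (n,) feature values, fields: (n,) field index of each feature,
    V: (n, n_fields, k) field-aware embeddings."""
    total = 0.0
    for j1 in range(len(x)):
        for j2 in range(j1 + 1, len(x)):
            f1, f2 = fields[j1], fields[j2]
            # feature j1 uses its vector for j2's field, and vice versa
            total += (V[j1, f2] @ V[j2, f1]) * x[j1] * x[j2]
    return total
```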
Deep learning is powerful in processing visual and text information so that it helps to find the interests of users such as Deep Interest Network, xDeepFM and more.
Deep learning models for recommender systems can be traced back to the restricted Boltzmann machine. Deep learning models are powerful information extractors, and deep learning is really popular in recommender systems, as in spotlight.
Let ${V}$ be a $K\times m$ observed binary indicator matrix with $v_i^k = 1$ if the user rated item ${i}$ as ${k}$ and 0 otherwise.
We use a conditional multinomial distribution (a "softmax") for modeling each column of the observed "visible" binary rating matrix ${V}$, and a conditional Bernoulli distribution for modeling the hidden user features ${h}$.
The marginal distribution over the visible ratings is obtained by summing out the hidden features, with the energy function given by
$$ E(V,h) = -\sum_{i=1}^{m}\sum_{j=1}^{F}\sum_{k=1}^{K}W_{i,j}^{k} h_j v_i^k - \sum_{i=1}^{m}\sum_{k=1}^{K} v_i^k b_i^k -\sum_{j=1}^{F} h_j b_j. $$
The items with missing ratings do not make any contribution to the energy function.
The parameter updates required to perform gradient ascent in the log-likelihood over the visible ratings are intractable to compute exactly, so we use Contrastive Divergence to approximate the gradient.
We can also model the "hidden" user features ${h}$ as Gaussian latent variables:
$$
p(v_i^k = 1 \mid h) = \frac{\exp(b_i^k+\sum_{j=1}^{F}h_j W_{i,j}^{k})}{\sum_{l=1}^{K}\exp(b_i^l+\sum_{j=1}^{F}h_j W_{i,j}^{l})} \\
p(h_j = h \mid V) = \frac{1}{\sqrt{2\pi}\sigma_j} \exp\left(-\frac{(h - b_j -\sigma_j \sum_{i=1}^{m}\sum_{k=1}^{K} v_i^k W_{i,j}^k)^2}{2\sigma_j^2}\right)
$$
where $\sigma_j^2$ is the variance of the Gaussian hidden unit ${j}$.
- https://www.cnblogs.com/pinard/p/6530523.html
- https://www.cnblogs.com/kemaswill/p/3269138.html
- Restricted Boltzmann Machines for Collaborative Filtering
- Building a Book Recommender System using Restricted Boltzmann Machines
- On Contrastive Divergence Learning
- http://deeplearning.net/tutorial/rbm.html
- RBM notebook from Microsoft
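To illustrate Contrastive Divergence, here is a minimal CD-1 update for a plain binary RBM, simpler than the conditional multinomial RBM of the paper; all names and the learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.01, rng=np.random.default_rng(0)):
    """One Contrastive Divergence (CD-1) step for a plain binary RBM.
    v0: (n_visible,) data vector, W: (n_visible, n_hidden) weights."""
    ph0 = sigmoid(v0 @ W + b_h)                    # positive phase
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b_v)                  # reconstruction
    ph1 = sigmoid(pv1 @ W + b_h)                   # negative phase
    # approximate log-likelihood gradient: data stats minus model stats
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b_v += lr * (v0 - pv1)
    b_h += lr * (ph0 - ph1)
    return W, b_v, b_h
```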
AutoRec is a novel autoencoder framework for collaborative filtering (CF). Empirically, AutoRec's compact and efficiently trainable model outperforms state-of-the-art CF techniques (biased matrix factorization, RBM-CF and LLORMA) on the MovieLens and Netflix datasets.
Formally, the objective function for the Item-based AutoRec (I-AutoRec) model is, for regularization strength $\lambda > 0$,
$$\min_{\theta}\sum_{i=1}^{n} {\|r^{(i)} - h(r^{(i)};\theta)\|}_{\mathcal{O}}^2 + \frac{\lambda}{2}({\|W\|}_F^2 + {\|V\|}_F^2)$$
where $h(r;\theta) = f(W\cdot g(Vr+\mu)+b)$ is the reconstruction of the partially observed item rating vector ${r}$, ${\|\cdot\|}_{\mathcal{O}}$ means that the loss is computed only over observed ratings,
for activation functions $f(\cdot)$ and $g(\cdot)$.
- Reading notes on "AutoRec: Autoencoders Meet Collaborative Filtering" (WWW 2015)
- AutoRec: Autoencoders Meet Collaborative Filtering
The output of this model is
$$
P(Y=1|x) = \sigma(W_{wide}^T[x,\phi(x)] + W_{deep}^T \alpha^{(lf)}+b)
$$
where the wide part deals with categorical features such as user demographics (together with their cross-product transformations $\phi(x)$) and the deep part deals with continuous and embedded features.
- https://arxiv.org/pdf/1606.07792.pdf
- Wide & Deep Learning: Better Together with TensorFlow, Wednesday, June 29, 2016
- Wide & Deep
- https://www.sohu.com/a/190148302_115128
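A minimal Keras sketch of the wide & deep pattern described above: a linear ("wide") path over sparse cross features concatenated with an MLP ("deep") path, trained jointly. The layer sizes and feature split are illustrative assumptions, not the paper's configuration.

```python
import tensorflow as tf

def build_wide_and_deep(n_wide, n_deep):
    """Wide & deep sketch: linear path over sparse cross features,
    MLP path over dense features, joined by a sigmoid output."""
    wide_in = tf.keras.Input(shape=(n_wide,), name="wide")   # one-hot crosses
    deep_in = tf.keras.Input(shape=(n_deep,), name="deep")   # dense features
    hidden = tf.keras.layers.Dense(128, activation="relu")(deep_in)
    hidden = tf.keras.layers.Dense(64, activation="relu")(hidden)
    merged = tf.keras.layers.concatenate([wide_in, hidden])
    out = tf.keras.layers.Dense(1, activation="sigmoid")(merged)
    model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```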
DeepFM ensembles FM and DNN to learn both second-order and higher-order feature interactions.
The FM component is a factorization machine, and the output of FM is the summation of an `Addition` unit and a number of `Inner Product` units.
The deep component is a feed-forward neural network, which is used to learn high-order feature interactions.
We would like to point out the two interesting features of this network structure:

- while the lengths of different input field vectors can be different, their embeddings are of the same size ${k}$;
- the latent feature vectors ${V}$ in FM now serve as network weights which are learned and used to compress the input field vectors to the embedding vectors.
It is worth pointing out that FM component and deep component share the same feature embedding, which brings two important benefits:
- it learns both low- and high-order feature interactions from raw features;
- there is no need for expertise feature engineering of the input.
- https://zhuanlan.zhihu.com/p/27999355
- https://zhuanlan.zhihu.com/p/25343518
- https://zhuanlan.zhihu.com/p/32127194
- https://arxiv.org/pdf/1703.04247.pdf
- CTR prediction algorithms: FM, FFM, DeepFM and practice
$$
\hat{y} = w_0 + \left<w, x\right> + f(x)
$$
where the first and second terms are the linear regression part similar to that of FM, which models the global bias of the data and the weights of the features. The third term $f(x)$ is a multi-layered feed-forward neural network.
The `Bi-Interaction` layer, with its `Bi-Interaction` pooling operation, is the architectural innovation of this network: it pools the pairwise element-wise products of the feature embeddings into a single vector before the MLP.
- http://staff.ustc.edu.cn/~hexn/
- https://github.com/hexiangnan/neural_factorization_machine
- LibRec weekly algorithm: NFM (SIGIR'17)
Attentional Factorization Machine (AFM) learns the importance of each feature interaction from data via a neural attention network.
We employ the attention mechanism on feature interactions by performing a weighted sum over the interacted vectors:
$$\hat{y} = w_0 + \left<w, x\right> + p^T \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} a_{ij}\, (v_i \odot v_j)\, x_i x_j$$
where the attention score $a_{ij}$ of each feature interaction is produced by a small MLP (the attention network) and ${p}$ projects the pooled interaction vector to the prediction.
- https://www.comp.nus.edu.sg/~xiangnan/papers/ijcai17-afm.pdf
- http://blog.leanote.com/post/ryan_fan/Attention-FM%EF%BC%88AFM%EF%BC%89
It mainly consists of 3 parts: an `Embedding Layer`, a `Compressed Interaction Network (CIN)` and a `DNN`.
- KDD 2018 | New progress in feature construction for recommender systems: the xDeepFM model
- xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems
- https://arxiv.org/abs/1803.05170
- xDeepFM, said to combine ideas from RNNs and CNNs
- When recommender systems meet deep learning (22): xDeepFM, the upgraded DeepFM, arrives
- Deep Knowledge-aware Network for News Recommendation
- https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
- https://www.cnblogs.com/pinard/p/6370127.html
- https://www.jianshu.com/p/6f1c2643d31b
- https://blog.csdn.net/John_xyz/article/details/78933253
- https://zhuanlan.zhihu.com/p/38613747
- Recommender Systems with Deep Learning
- Applications of deep learning in sequential recommendation
- The 'Factorization Machine explained simply' series
- Quick paper read: Deep Neural Networks for YouTube Recommendations
- Deep Matrix Factorization Models for Recommender Systems
- Deep Matrix Factorization for Recommender Systems with Missing Data not at Random
It's easy to observe how better matrix completions can be achieved by considering the sparse matrix as defined over two different graphs:
a user graph and an item graph. From a signal processing point of view, the matrix ${X}$ can then be regarded as a signal defined on these two graphs.
Given the aforementioned multi-graph convolutional layers, the last step that remains concerns the choice of the architecture to use for reconstructing the missing information. Every (user, item) pair in the multi-graph approach and every user/item in the separable one present in this case an independent state, which is updated (at every step) by means of the features produced by the selected GCN.
- What are good application tasks for graph convolutional networks? - answer by superbrother on Zhihu
- https://arxiv.org/abs/1704.06803
- Deep Geometric Matrix Completion: a Geometric Deep Learning approach to Recommender Systems
- Talk: Deep Geometric Matrix Completion
Given part of the ratings in ${R}$ and the content information of the items, the problem is to predict the remaining ratings in ${R}$.
Stacked denoising autoencoders (SDAE) are feed-forward neural networks for learning representations (encodings) of the input data, trained to predict the clean input itself in the output.
Using the Bayesian SDAE as a component, the generative process of CDL is defined as follows:

- For each layer ${l}$ of the SDAE network,
  - For each column ${n}$ of the weight matrix $W_l$, draw $W_{l,\ast n} \sim \mathcal{N}(0,\lambda_w^{-1} I_{K_l})$.
  - Draw the bias vector $b_l \sim \mathcal{N}(0,\lambda_w^{-1} I_{K_l})$.
  - For each row ${j}$ of $X_l$, draw $X_{l,j\ast}\sim \mathcal{N}(\sigma(X_{l-1,j\ast}W_l + b_l), \lambda_s^{-1} I_{K_l})$.
- For each item ${j}$,
  - Draw a clean input $X_{c,j\ast}\sim \mathcal{N}(X_{L, j\ast}, \lambda_n^{-1} I_{K_l})$.
  - Draw a latent item offset vector $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_{K_l})$ and then set the latent item vector to be $v_j=\epsilon_j+X^T_{\frac{L}{2}, j\ast}$.
- Draw a latent user vector for each user ${i}$: $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_{K_l})$.
- Draw a rating $R_{ij}$ for each user-item pair $(i, j)$: $R_{ij}\sim \mathcal{N}(u_i^T v_j, C_{ij}^{-1})$.
Here $\lambda_w, \lambda_s, \lambda_n, \lambda_u$ and $\lambda_v$ are hyperparameters, and $C_{ij}$ is a confidence parameter similar to that in WRMF.
And the joint log-likelihood of these parameters is
$$
\mathcal{L}=-\frac{\lambda_u}{2}\sum_{i} {\|u_i\|}_2^2
-\frac{\lambda_w}{2}\sum_{l} \left[{\|W_l\|}_F^2+{\|b_l\|}_2^2\right]
-\frac{\lambda_v}{2}\sum_{j} {\|v_j - X^T_{\frac{L}{2},j\ast}\|}_2^2
-\frac{\lambda_n}{2}\sum_{j} {\|X_{c,j\ast}-X_{L,j\ast}\|}_2^2
-\frac{\lambda_s}{2}\sum_{l}\sum_{j} {\|\sigma(X_{l-1,j\ast}W_l + b_l)-X_{l,j\ast}\|}_2^2
-\sum_{i,j} \frac{C_{ij}}{2}{(R_{ij}-u_i^Tv_j)}^2
$$
It is not easy to prove that it converges.
- http://www.winsty.net/
- http://www.wanghao.in/
- https://www.cse.ust.hk/~dyyeung/
- Collaborative Deep Learning for Recommender Systems
- Deep Learning for Recommender Systems
- https://github.com/robi56/Deep-Learning-for-Recommendation-Systems
- A hybrid collaborative filtering model based on deep learning for recommender systems
- Deep Learning Meets Recommendation Systems
- Using Keras' Pretrained Neural Networks for Visual Similarity Recommendations
- Recommending music on Spotify with deep learning
It is essential for the recommender system to find items that match the users' demands. Unlike web search, a recommender system must provide item information even when the user's demand or interest is not explicitly stated. It sounds like a modern crystal ball that reads your mind.
In A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems the authors propose to extract rich features from user’s browsing and search histories to model user’s interests. The underlying assumption is that, users’ historical online activities reflect a lot about user’s background and preference, and therefore provide a precise insight of what items and topics users might be interested in.
Its training data set and test data are $\{(\mathrm{X}_i, y_i, r_i)\mid i =1, 2, \cdots, n\}$ and $(\mathrm{X}_{n+1}, y_{n+1})$, respectively.
The matching model is trained using the training data set: a class of matching functions $\mathcal{F} = \{f(x, y)\}$ is learned, from which the best function is selected.
The data is assumed to be generated i.i.d. according to unknown distributions over the object pairs $(x, y)$ and the responses ${r}$ given $(x, y)$.
In fact, the inputs x and y can be instances (IDs), feature vectors, or structured objects, and thus the task can be carried out at the instance level, feature level, or structure level.
And the learned matching function is then used to score new $(x, y)$ pairs at test time.
Framework of Matching |
---|
Output: MLP |
Aggregation: Pooling, Concatenation |
Interaction: Matrix, Tensor |
Representation: MLP, CNN, LSTM |
Input: ID Vectors |
Sometimes, the matching model and the ranking model are combined and trained together with a pairwise loss. Deep matching models take the ID vectors and features together as the input to a deep neural network to train the matching scores; examples include Deep Matrix Factorization, AutoRec, Collaborative Denoising Auto-Encoder, Deep User and Image Feature, Attentive Collaborative Filtering, and Collaborative Knowledge Base Embedding. A generic sketch of the embedding-based input pattern follows.
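This is a minimal two-tower matching sketch in Keras, illustrating only the ID-vector input pattern of the framework above and not any specific model from the list; the embedding size and layer choices are assumptions.

```python
import tensorflow as tf

def two_tower(n_users, n_items, k=32):
    """Embedding-based matching: score(u, i) = sigmoid(<user_vec, item_vec>)."""
    u_in = tf.keras.Input(shape=(), dtype=tf.int32, name="user_id")
    i_in = tf.keras.Input(shape=(), dtype=tf.int32, name="item_id")
    u_vec = tf.keras.layers.Embedding(n_users, k)(u_in)   # (batch, k)
    i_vec = tf.keras.layers.Embedding(n_items, k)(i_in)   # (batch, k)
    score = tf.keras.layers.Dot(axes=-1)([u_vec, i_vec])  # inner product
    out = tf.keras.layers.Activation("sigmoid")(score)
    return tf.keras.Model([u_in, i_in], out)
```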
- https://sites.google.com/site/nkxujun/
- http://sonyis.me/dnn.html
- https://akmenon.github.io/
- https://sigir.org/sigir2018/program/tutorials/
- Learning to Match
- Deep Learning for Matching in Search and Recommendation
- Facilitating the design, comparison and sharing of deep text matching models.
- Framework and Principles of Matching Technologies
- A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems
- Learning to Match using Local and Distributed Representations of Text for Web Search
- https://github.com/super-zhangchao/learning-to-match
The RecSys task can be considered as a regression or classification task, so we can apply ensemble methods to these models, as BellKor's Pragmatic Chaos used a blended solution to win the Netflix Prize.
In fact, the essence of blending is close to that of bagging: a parallel ensemble strategy intended to avoid over-fitting and reduce the variance.
In this section, boosting is the focus: reducing the error and boosting the performance obtained from weaker learners.
There are two common methods to construct a stronger learner from a weaker learner: (1) reweight the samples and learn from the error: AdaBoosting; (2) retrain another learner and learn to approximate the error: Gradient Boosting.
BoostFM integrates boosting into factorization models during the process of item ranking.
Specifically, BoostFM is an adaptive boosting framework that linearly combines multiple homogeneous component recommenders,
which are repeatedly constructed on the basis of the individual FM model by a re-weighting scheme.
BoostFM

- Input: the observed context-item interactions (training data) $S =\{(\mathbf{x}_i, y_i)\}$ and parameters ${E}$ and ${T}$.
- Output: the strong recommender $g^{T}$.
- Initialize $Q_{ci}^{(t)}=1/|S|$, $g^{(0)}=0$, $\forall (c, i)\in S$.
- for $t = 1 \to T$ do
  - Create a component recommender $\hat{y}^{(t)}$ with $\mathbf{Q}^{(t)}$ on $\mathbf{S}$, $\forall (c,i) \in \mathbf{S}$, i.e., the `Component Recommender Learning Algorithm`;
  - Compute the ranking accuracy $E[\hat{r}(c, i, y^{(t)})], \forall (c,i) \in \mathbf{S}$;
  - Compute the coefficient $\beta_t$,
    $$ \beta_t = \ln \left(\frac{\sum_{(c,i) \in \mathbf{S}} \mathbf{Q}^{(t)}_{ci}\{1 + E[\hat{r}(c, i, y^{(t)})]\}}{\sum_{(c,i) \in \mathbf{S}} \mathbf{Q}^{(t)}_{ci}\{1- E[\hat{r}(c, i, y^{(t)})]\}}\right)^{\frac{1}{2}}; $$
  - Create the strong recommender $g^{(t)}$,
    $$ g^{(t)} = \sum_{h=1}^{t} \beta_h \hat{y}^{(h)}; $$
  - Update the weight distribution $\mathbf{Q}^{(t+1)}$,
    $$ \mathbf{Q}^{(t+1)}_{ci} = \frac{\exp(E[\hat{r}(c, i, y^{(t)})])}{\sum_{(c,i)\in \mathbf{S}} \exp(E[\hat{r}(c, i, y^{(t)})])}; $$
- end for
Component Recommender
Naturally, it is feasible to exploit the L2R techniques to optimize Factorization Machines (FM). There are two major approaches in the field of L2R, namely, pairwise and listwise approaches. In the following, we demonstrate ranking factorization machines with both pairwise and listwise optimization.
Weighted Pairwise FM (WPFM)
Weighted ‘Listwise’ FM (WLFM)
- BoostFM: Boosted Factorization Machines for Top-N Feature-based Recommendation
- http://wnzhang.net/
- https://fajieyuan.github.io/
- https://www.librec.net/luckymoon.me/
- The author’s final accepted version.
The Gradient Boosting Factorization Machine (GBFM) model incorporates a feature selection algorithm with Factorization Machines into a unified framework.
Gradient Boosting Factorization Machine Model

- Input: training data $S =\{(\mathbf{x}_i, y_i)\}$.
- Output: $\hat{y}_S = y_0(x) + {\sum}_{s=1}^{S}\left<v_{si}, v_{sj}\right>$.
- Initialize the rating prediction function as $\hat{y}_0(x)$;
- for $s = 1 \to S$ do
  - Select the interaction features $C_p$ and $C_q$ with the Greedy Feature Selection Algorithm;
  - Estimate the latent feature matrices $V_p$ and $V_q$;
  - Update $\hat{y}_s(\mathrm{x}) = \hat{y}_{s-1}(\mathrm{x}) + {\sum}_{i\in C_p}{\sum}_{j\in C_q} \mathbb{I}[i,j\in \mathrm{x}]\left<V_{p}^{i}, V_{q}^{j}\right>$;
- end for
where ${s}$ is the iteration step of the learning algorithm. At step ${s}$, we greedily select two interaction features $C_p$ and $C_q$.
Greedy Feature Selection Algorithm
From the view of the gradient boosting machine, at each step ${s}$ we would like to search for a function $f_s$ in the function space that minimizes the objective,
where $\hat{y}_s(\mathrm{x}) = \hat{y}_{s-1}(\mathrm{x}) + \alpha_s f_s(\mathrm{x})$.
We heuristically restrict the function $f_s$ to the pairwise-interaction form used in the update rule above.
For a general convex loss function, this search is hard to carry out exactly.
The most common way is to approximate it by least-squares minimization; as in XGBoost, one takes a second-order Taylor expansion of the loss function, which leads to
$$\arg\min_{i(t)\in \{0, \dots, m\}} \sum_{i=1}^{n} h_i\left(\frac{g_i}{h_i}-f_{t-1}(\mathrm{x}_i)\, q_{C_{i}(t)}(\mathrm{x}_i)\right)^2 + {\|\theta\|}_2^2 $$
where $g_i$ and $h_i$ are the negative first derivative and the second derivative of the loss at instance ${i}$, respectively.
Gradient Boosted Categorical Embedding and Numerical Trees (GB-CSENT) combines tree-based models and matrix-based embedding models in order to handle numerical features and large-cardinality categorical features.
A prediction is based on:
- Bias terms from each categorical feature.
- Dot-product of embedding features of two categorical features,e.g., user-side v.s. item-side.
- Per-categorical decision trees based on numerical features ensemble of numerical decision trees where each tree is based on one categorical feature.
In detail, it is as follows:
$$ \hat{y}(x) = \underbrace{\underbrace{\sum_{i=0}^{k} w_{a_i}}_{\text{bias}} + \underbrace{\left(\sum_{a_i\in U(a)} Q_{a_i}\right)^{T}\left(\sum_{a_i\in I(a)} Q_{a_i}\right)}_{\text{factors}}}_{\text{CAT-E}} + \underbrace{\sum_{i=0}^{k} T_{a_i}(b)}_{\text{CAT-NT}}. $$
And it is decomposed as in the following table.
Ingredients | Formulae | Features
---|---|---
Factorization Machines | $\underbrace{\underbrace{\sum_{i=0}^{k} w_{a_i}}_{\text{bias}} + \underbrace{(\sum_{a_i\in U(a)} Q_{a_i})^{T}(\sum_{a_i\in I(a)} Q_{a_i})}_{\text{factors}}}_{\text{CAT-E}}$ | Categorical Features
GBDT | $\underbrace{\sum_{i=0}^{k} T_{a_i}(b)}_{\text{CAT-NT}}$ | Numerical Features
- http://www.hongliangjie.com/talks/GB-CENT_SD_2017-02-22.pdf
- http://www.hongliangjie.com/talks/GB-CENT_SantaClara_2017-03-28.pdf
- http://www.hongliangjie.com/talks/GB-CENT_Lehigh_2017-04-12.pdf
- http://www.hongliangjie.com/talks/GB-CENT_PopUp_2017-06-14.pdf
- http://www.hongliangjie.com/talks/GB-CENT_CAS_2017-06-23.pdf
- http://www.hongliangjie.com/talks/GB-CENT_Boston_2017-09-07.pdf
- Talk: Gradient Boosted Categorical Embedding and Numerical Trees
- Paper: Gradient Boosted Categorical Embedding and Numerical Trees
- https://qzhao2018.github.io/
AdaBPR (Adaptive Boosting Personalized Ranking) is a boosting algorithm for top-N item recommendation using users' implicit feedback.
In this framework, multiple homogeneous component recommenders are linearly combined to achieve more accurate recommendation.
The component recommenders are learned based on a re-weighting strategy that assigns a dynamic weight to each observed user-item interaction.
Here explicit feedback refers to users' ratings of items, while implicit feedback is derived from users' interactions with items, e.g., the number of times a user plays a song.
The primary idea of applying boosting to item recommendation is to learn a set of homogeneous component recommenders and then create an ensemble of them to predict users' preferences.
Here, we use a linear combination of component recommenders as the final recommendation model $$f=\sum_{t=1}^{T}{\alpha}_t f_{t}.$$
In the training process, AdaBPR runs for ${T}$ rounds; in each round, a component recommender is trained on the re-weighted observations and added to the ensemble,
where the notations are listed as follows:

- $\mathbb{H}$ is the set of possible component recommenders, such as collaborative ranking algorithms;
- $E(\pi(u,i,f))$ denotes the ranking accuracy associated with each observed interaction pair;
- $\pi(u,i,f)$ is the rank position of item ${i}$ in the ranked item list of ${u}$, produced by a learned ranking model ${f}$;
- $\mathbb{O}$ is the set of all observed user-item interactions;
- ${\beta}_{u}$ is the reciprocal of the number of user ${u}$'s historical items, ${\beta}_{u}=\frac{1}{|V_{u}^{+}|}$, where $V_{u}^{+}$ is the set of historical items of ${u}$.
Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also intuitive explanations of the results for users or system designers, which can help improve the system transparency, persuasiveness, trustworthiness, and effectiveness, etc.
- Explainable Recommendation and Search @ rutgers
- Explainable Recommendation: A Survey and New Perspectives
- Explainable Entity-based Recommendations with Knowledge Graphs
- 2018 Workshop on ExplainAble Recommendation and Search (EARS 2018)
- EARS 2019
- ExplainAble Recommendation and Search (EARS)
- TEM: Tree-enhanced Embedding Model for Explainable Recommendation
- https://ears2019.github.io/
- Explainable Recommendation for Self-Regulated Learning
- Dynamic Explainable Recommendation based on Neural Attentive Models
- https://github.com/fridsamt/Explainable-Recommendation
- Explainable Recommendation for Event Sequences: A Visual Analytics Approach by Fan Du
- https://wise.cs.rutgers.edu/code/
- http://www.cs.cmu.edu/~rkanjira/thesis/rose_proposal.pdf
- http://jamesmc.com/publications
- FIRST INTERNATIONAL WORKSHOP ON DEEP MATCHING IN PRACTICAL APPLICATIONS
Social Recommendation
User-item and user-user interactions are usually in the form of a graph/network structure. What is more, the graph is dynamic, and we need to handle new nodes without retraining the model.
- Accurate and scalable social recommendation using mixed-membership stochastic block models
- Do Social Explanations Work? Studying and Modeling the Effects of Social Explanations in Recommender Systems
- Existing Methods for Including Social Networks until 2015
- Social Recommendation With Evolutionary Opinion Dynamics
- Workshop on Responsible Recommendation
- https://recsys.acm.org/recsys18/fatrec/
- A Probabilistic Model for Using Social Networks in Personalized Item Recommendation
- Product Recommendation and Rating Prediction based on Multi-modal Social Networks
- Graph Neural Networks for Social Recommendation
- Studying Recommendation Algorithms by Graph Analysis
- Low-rank Linear Cold-Start Recommendation from Social Data
Knowledge Graph and Recommender System
- Recommendation not precise enough? Let knowledge graphs solve it
- How to apply knowledge graph representation learning to recommender systems?
- Explainable recommender systems: hitting the user's psychology in one move
- Practical applications of deep learning and knowledge graphs in Meituan's search ads ranking
- Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences
- Explainable Reasoning over Knowledge Graphs for Recommendation
Reinforcement Learning and Recommender System
- Deep Reinforcement Learning for Page-wise Recommendations
- A Reinforcement Learning Framework for Explainable Recommendation
- Generative Adversarial User Model for Reinforcement Learning Based Recommendation System
- Adversarial Personalized Ranking for Recommendation
- Adversarial Training Towards Robust Multimedia Recommender System
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits
- Learning from logged bandit feedback
- Improving the Quality of Top-N Recommendation
Traditional Approaches | Beyond Traditional Methods |
---|---|
Collaborative Filtering | Tensor Factorization & Factorization Machines |
Content-Based Recommendation | Social Recommendations |
Item-based Recommendation | Learning to rank |
Hybrid Approaches | MAB Explore/Exploit |
- https://github.com/wzhe06/Reco-papers
- https://github.com/hongleizhang/RSPapers
- https://github.com/hongleizhang/RSAlgorithms
- https://zhuanlan.zhihu.com/p/26977788
- https://zhuanlan.zhihu.com/p/45097523
- https://www.zhihu.com/question/20830906
- https://www.zhihu.com/question/56806755/answer/150755503
- DLRS 2018 : 3rd Workshop on Deep Learning for Recommender Systems
- Deep Learning based Recommender System: A Survey and New Perspectives
- $5^{th}$ International Workshop on Machine Learning Methods for Recommender Systems
- MoST-Rec 2019: Workshop on Model Selection and Parameter Tuning in Recommender Systems
- 2018 Personalization, Recommendation and Search (PRS) Workshop
- WIDE & DEEP RECOMMENDER SYSTEMS AT PAPI
- Interdisciplinary Workshop on Recommender Systems
- 2nd FATREC Workshop: Responsible Recommendation
- https://github.com/gasevi/pyreclab
- https://github.com/cheungdaven/DeepRec
- https://github.com/cyhong549/DeepFM-Keras
- https://github.com/grahamjenson/list_of_recommender_systems
- https://github.com/maciejkula/spotlight
- https://github.com/Microsoft/Recommenders
- https://github.com/alibaba/euler
- https://github.com/alibaba/x-deeplearning/wiki/
- https://github.com/lyst/lightfm
- Surprise: a Python scikit for building and analyzing recommender systems
- Orange3-Recommendation: a Python library that extends Orange3 to include support for recommender systems.
- MyMediaLite: a recommender system library for the Common Language Runtime
- http://www.mymediaproject.org/
- Workshop: Building Recommender Systems w/ Apache Spark 2.x
- A Leading Java Library for Recommender Systems
- lenskit: Python Tools for Recommender Experiments
- Samantha - A generic recommender and predictor server
Online advertising has grown over the past decade to over $26 billion in recorded revenue in 2010. The revenues generated are based on different pricing models that can be fundamentally grouped into two types: cost per (thousand) impressions (CPM) and cost per action (CPA), where an action can be a click, signing up with the advertiser, a sale, or any other measurable outcome. A web publisher generating revenues by selling advertising space on its site can offer either a CPM or CPA contract. We analyze the conditions under which the two parties agree on each contract type, accounting for the relative risk experienced by each party.
The information technology industry relies heavily on online advertising, as at Google, Facebook, or Alibaba. Advertising is nothing but information, and it is not always gladly accepted. In fact, it is more difficult than recommendation because less is known about the context in which the advertisement is placed.
Hongliang Jie shares 3 challenges of computational advertising at Etsy, which will be the titles of the following subsections.
- Why advertising needs to be computational
- A collection of computational advertising resources
- ONLINE VIDEO ADVERTISING: All you need to know in 2019
- CAP 6807: Computational Advertising and Real-Time Data Analytics
- Tutorial: Information Retrieval Challenges in Computational Advertising
- Computational Advertising
- Computational advertising and machine learning
- https://headerbidding.co/category/adops/
- Deep Learning Based Modeling in Computational Advertising: A Winning Formula
- Computational Marketing
- Data Science and Analytics in Computational Advertising
GBRT+LR

When the feature vector is high-dimensional and sparse, tree ensembles can transform it into a compact set of categorical features: the index of the leaf that each tree assigns to a sample.
Practical Lessons from Predicting Clicks on Ads at Facebook, and the blog posts it inspired, use GBRT to select proper features and LR to map these features into the interval $[0,1]$ as predicted probabilities, as sketched below.
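A minimal scikit-learn sketch of that GBDT-then-LR pipeline on toy data: leaf indices from the boosted trees become one-hot features for the logistic regression. The dataset, sizes, and hyperparameters are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

# Toy data: the label depends on a feature interaction plain LR would miss.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)

# Stage 1: GBDT learns nonlinear features; each sample maps to one leaf per tree.
gbdt = GradientBoostingClassifier(n_estimators=50, max_depth=3).fit(X, y)
leaves = gbdt.apply(X)[:, :, 0]               # (n_samples, n_trees) leaf ids

# Stage 2: one-hot encode leaf ids and fit a logistic regression on top.
enc = OneHotEncoder()
lr = LogisticRegression(max_iter=1000).fit(enc.fit_transform(leaves), y)
print(lr.predict_proba(enc.transform(leaves))[:3, 1])  # scores in [0, 1]
```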
- A chat about deep learning in CTR prediction
- Deep Models at DeepCTR
- CTR prediction algorithms: FM, FFM, DeepFM and practice
- Turning Clicks into Purchases
- https://github.com/shenweichen/DeepCTR
- https://github.com/wzhe06/CTRmodel
- https://github.com/cnkuangshi/LightCTR
- https://github.com/evah/CTR_Prediction
- http://2016.qconshanghai.com/track/3025/
- https://blog.csdn.net/u011747443/article/details/68928447
- Post-Click Conversion Modeling and Analysis for Non-Guaranteed Delivery Display Advertising
- Estimating Conversion Rate in Display Advertising from Past Performance Data
- https://www.optimizesmart.com/
- Papers on Computational Advertising
- CAP 6807: Computational Advertising and Real-Time Data Analytics
- Computational Advertising Contract Preferences for Display Advertising
- Machine Learning for Computational Advertising, UC Santa Cruz, April 22, 2009, Alex Smola, Yahoo Labs, Santa Clara, CA
- Computational Advertising and Recommendation
- Practical Lessons from Predicting Clicks on Ads at Facebook
- http://yelp.github.io/MOE/
- http://www.hongliangjie.com/talks/AICon2018.pdf
- https://sites.google.com/view/tsmo2018/invited-talks
- https://matinathomaidou.github.io/research/
- https://www.usermind.com/
- WHAT IS USER ENGAGEMENT?
- What is Customer Engagement, and Why is it Important?
- What is user engagement? A conceptual framework for defining user engagement with technology
- How to apply AI for customer engagement
- The future of customer engagement
- Second Uber Science Symposium: Exploring Advances in Behavioral Science
- Measuring User Engagement
- https://uberbehavioralsciencesymposium.splashthat.com/
- https://inlabdigital.com/
- https://www.futurelab.net/
- The User Engagement Optimization Workshop2
- The User Engagement Optimization Workshop1
- EVALUATION OF USER EXPERIENCE IN MOBILE ADVERTISING
- WWW 2019 Tutorial on Online User Engagement
- Recommender Systems
- https://libraries.io/github/computational-class
- http://www.52caml.com/
- Dr. Hongliang Jie (洪亮劼), Engineering Director at Etsy
- Data Mining Machine Learning @The University of Texas at Austin
- Center for Big Data Analytics @The University of Texas at Austin
- Multimedia Computing Group@tudelft.nl
- knowledge Lab@Uchicago
- DIGITAL TECHNOLOGY CENTER@UMN
- The Innovation Center for Artificial Intelligence (ICAI)
- Data Mining and Machine Learning lab (DMML)@ASU
- Next Generation Personalization Technologies
- Recommender systems & ranking
- Secure Personalization: Building Trustworthy Recommender Systems
- Similar grants of Next Generation Personalization Technologies
- Big Data and Social Computing Lab @UIC
- Web Intelligence and Social Computing
- Welcome to the family, Zalando AdTech Lab Hamburg!
- Data and Marketing Association
- Web search and data mining(WSDM) 2019
- Web Intelligent Systems and Economics(WISE) lab @Rutgers
- Ishizuka Lab. was closed. (2013.3)
- Online Marketing Congress 2017
- course-materials of Sys for ML/AI
- https://sigopt.com/blog/
- Web Understanding, Modeling, and Evaluation Lab
- https://knightlab.northwestern.edu/