250310_JH_labseminar[CASER : Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding].pptx


About This Presentation

CASER: Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding


Slide Content

CASER: Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding
Ju-Hee Shim, Network Science Lab, Dept. of AI, The Catholic University of Korea
E-mail: [email protected]
Paper by Jiaxi Tang and Ke Wang, WSDM 2018

INTRODUCTION: Motivation
CASER: Architecture, Method
Evaluation: Datasets, State-of-the-art methods, Experimental setup, Results
CONCLUSION
Q/A

INTRODUCTION: Motivation

Problems of existing top-N recommendation models: recommendations are based on the user's general preferences, which reflect only static behavioral information (e.g., a person who likes Samsung products is only recommended Samsung products, and a person who likes Apple products is only recommended Apple products). Such models have a clear limitation: they simply recommend items related to the iPhone itself, losing the opportunity to recommend phone accessories.

INTRODUCTION: Motivation

Limitations of traditional Markov chain-based models:
a) Point-level modeling: the probability of purchasing a specific item often increases when multiple past items are combined, but point-level models fail to capture this effect (e.g., a user who buys milk and butter is likely to purchase flour, but this is not reflected).
b) Skip behaviors: these models cannot account for skipped behaviors. They assume continuous influence between consecutive actions, but in real-world data, "skips" occur frequently.

CASER: Architecture

Transforming the user sequence into a matrix "image" for a CNN: convert the traditional 1D item sequence into an L x d matrix, where L is the number of most recent items and d is the embedding dimension.
Horizontal filters: learn union-level sequential patterns, capturing cases where combinations of multiple items influence behavior.
Vertical filters: learn point-level sequential patterns, similar to traditional Markov chain approaches.
User embedding: added to model long-term user preferences effectively.

CASER: Method

Transformer layer: consists of L bidirectional Transformer layers. Each layer refines the user behavior sequence received from the previous layer to enhance representation power, and within each layer all item representations influence and update each other. Unlike RNN-based models, which pass information only from past to future, self-attention enables global interaction across all items in the sequence.

CASER: Method

Embedding look-up: create embedding tables of d-dimensional latent factors, Q for items and P for users. To build the input, locate the L most recent item embeddings of user u in the latent space and stack them to construct the embedding matrix E used for training, as sketched below.
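A minimal PyTorch sketch of this look-up step; the library choice, the sizes, and the example ids are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper grid-searches d and L (see the setup slide).
num_items, num_users, d, L = 1000, 100, 50, 5

item_emb = nn.Embedding(num_items, d)   # Q: item latent factors
user_emb = nn.Embedding(num_users, d)   # P: user latent factors

# The L most recent item ids of one user (batch of 1), oldest first.
seq = torch.tensor([[3, 17, 42, 7, 99]])   # shape (1, L)
E = item_emb(seq)                          # shape (1, L, d): the L x d "image"
p_u = user_emb(torch.tensor([0]))          # shape (1, d): user 0's general preference
```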

CASER: Method

Convolutional layers: treat the embedding matrix E as an "image" and apply convolutional filters to capture sequential patterns in user behavior, viewing those patterns as local features of the image. Two types of filters are used:
1) Vertical convolutional layer: captures point-level sequential patterns by computing a weighted sum over the latent representations of the past L items (see the sketch below).
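Continuing the sketch above, the vertical filters can be written as a Conv2d with an L x 1 kernel, so each filter learns one weight per past position; the filter count n_v is an assumed value:

```python
n_v = 4                                   # number of vertical filters (assumed)
vertical_conv = nn.Conv2d(in_channels=1, out_channels=n_v, kernel_size=(L, 1))

x = E.unsqueeze(1)                        # (1, 1, L, d): one-channel "image"
o_v = vertical_conv(x)                    # (1, n_v, 1, d): weighted sums over L items
o_v = o_v.view(o_v.size(0), -1)           # (1, n_v * d): point-level features
```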

CASER: Method

Convolutional layers (continued):
2) Horizontal convolutional layer: captures union-level patterns. The filter height h is varied to extract diverse sequential features, and max-pooling keeps the most significant feature produced by each filter (see the sketch below).
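A sketch of the horizontal filters, continuing the variables above; the set of heights and the per-height filter count n_h are illustrative:

```python
import torch.nn.functional as F

n_h = 8                                   # filters per height (assumed)
heights = [1, 2, 3, 4, 5]                 # filter heights h, up to L
h_convs = nn.ModuleList(
    [nn.Conv2d(1, n_h, kernel_size=(h, d)) for h in heights]
)

outs = []
for conv in h_convs:
    c = F.relu(conv(x)).squeeze(3)                 # (1, n_h, L - h + 1)
    p = F.max_pool1d(c, c.size(2)).squeeze(2)      # (1, n_h): strongest signal per filter
    outs.append(p)
o_h = torch.cat(outs, dim=1)                       # (1, n_h * len(heights)): union-level features
```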

CASER: Method

Fully-connected layers: concatenate the outputs of the horizontal and vertical filters and feed them into a fully-connected layer to extract high-level abstract features. The user embedding is then concatenated with these features to capture general user preferences, and the final representation is passed to the output layer for prediction (see the sketch below).
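Continuing the sketch, the fully-connected stage might look as follows; projecting the convolutional features down to d dimensions is an assumption that matches the concatenation with the d-dimensional user embedding:

```python
fc = nn.Linear(n_v * d + n_h * len(heights), d)   # hidden fully-connected layer
out = nn.Linear(2 * d, num_items)                 # output layer over all items

z = F.relu(fc(torch.cat([o_v, o_h], dim=1)))      # (1, d): high-level conv features
final = torch.cat([z, p_u], dim=1)                # (1, 2d): plus general preference
y = out(final)                                    # (1, num_items): one score per item
```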

CASER: Method

Network training & recommendation: apply the sigmoid activation to the output layer to transform the output value y into a probability, and compute the likelihood across all sequences in the dataset for training. At recommendation time, use the user's last L item embeddings to compute y-values for all items and select the top-N items with the highest values.
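A hedged sketch of one training step and the top-N step, continuing the variables above; the loss aggregation and negative sampling are simplified relative to the paper:

```python
# Binary cross-entropy on the true next item plus 3 random negatives.
target = torch.tensor([123])                      # the item actually consumed next
negatives = torch.randint(0, num_items, (3,))     # 3 random negative samples

loss = F.binary_cross_entropy_with_logits(
    y[0, target], torch.ones(1)
) + F.binary_cross_entropy_with_logits(
    y[0, negatives], torch.zeros(3)
)
loss.backward()                                   # step with Adam in practice

# Recommendation: sigmoid turns scores into probabilities; take the top N.
probs = torch.sigmoid(y)                          # (1, num_items)
top_n = torch.topk(probs, k=10, dim=1).indices    # ids of the 10 best items
```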

Evaluation: Datasets

MovieLens, Gowalla, Foursquare, Tmall

Evaluation: State-of-the-art methods

Compared methods: POP, BPR, FMC, FPMC, Fossil, GRU4Rec

Evaluation: Experimental setup

Evaluation metrics: Precision@N, Recall@N, MAP
Optimizer: Adam
Learning rate: grid search over {1, 10^-1, ..., 10^-4}
Batch size: 100
Regularization: L2, with 50% dropout
Latent dimension d: {5, 10, 20, 30, 50, 100}
Markov order L: {1, 2, 3, ..., 9}
Target number T: {1, 2, 3}
Activation functions: {identity, sigmoid, tanh, ReLU}
Number of horizontal filters: {4, 8, 16, 32, 64}
Number of vertical filters: {1, 2, 4, 8, 16}
Loss: BCE loss
Negative sampling: 3 random items
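The same search space written as a plain Python dict for reference; the structure is mine, the values are taken from the slide:

```python
search_space = {
    "learning_rate": [1, 1e-1, 1e-2, 1e-3, 1e-4],
    "latent_dim_d": [5, 10, 20, 30, 50, 100],
    "markov_order_L": list(range(1, 10)),
    "target_number_T": [1, 2, 3],
    "activation": ["identity", "sigmoid", "tanh", "relu"],
    "n_horizontal_filters": [4, 8, 16, 32, 64],
    "n_vertical_filters": [1, 2, 4, 8, 16],
    "batch_size": 100,
    "dropout": 0.5,
    "negative_samples": 3,
}
```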

Evaluation: Results

Evaluation: Results

Ablation study results: the Caser model outperforms Fossil and GRU4Rec in terms of MAP, with the best performance observed at T = 2, 3. As the Markov order L increases, performance improves and then plateaus; on sparse datasets, an excessively large L can degrade performance. Increasing the number of targets T also improves performance, i.e., predicting multiple future items simultaneously is more effective than predicting just one.

Evaluation: Results

Ablation study results: performance based on the use of each component, where p denotes personalization (the user embedding), h the horizontal convolutional layer, and v the vertical convolutional layer. The best performance is achieved when all three components are used together.

Conclusion

The authors propose Caser, a novel approach to top-N sequential recommendation. Caser captures information from point-level and union-level sequential patterns, skip behaviors, and long-term user preferences. A unique aspect of Caser is its attempt to interpret a user's 1D item sequence as a 2D image representation. This approach could be particularly meaningful in industries where the sequential dependency of user behavior is weak.

Q & A