Collaborative filtering

15koolneha 5,157 views 32 slides Mar 11, 2017

About This Presentation

Types of recommender systems in information retrieval. Collaborative filtering is a very widely used method in recommendation systems. Content-based filtering and collaborative filtering are the two major approaches. Hybrid systems are now being employed to get better recommendations. One such method is...


Slide Content

IR Presentation on Collaborative Filtering
By: Neha Kulkarni (5202), ME Computer, Pune Institute of Computer Technology

Overview
Recommender systems; types of recommender systems; content-based filtering; collaborative filtering; hybrid systems; content-boosted collaborative filtering; evaluation of the CBCF; advantages; conclusion.

Recommender system
A recommender system predicts the "rating" or "preference" that a user would give to an item. Recommendations are made in two ways: content-based filtering and collaborative filtering.

Content-based filtering
Content-based filtering selects items based on the correlation between the content of the items and the user's preferences. Keywords are used to describe both the items and the user profile.
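
As a rough illustration of keyword-based matching, the sketch below scores items against a user profile with cosine similarity over keyword counts. The choice of cosine similarity, the function names, and the data are all assumptions made for this example; the slides do not specify how the correlation is computed.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_items(profile: Counter, catalog: dict) -> list:
    """Return item names ordered from best to worst keyword match."""
    scores = {name: cosine(profile, keywords) for name, keywords in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical user profile and item descriptions (keywords only).
user_profile = Counter(["space", "robot", "space", "adventure"])
items = {
    "Movie A": Counter(["space", "adventure"]),
    "Movie B": Counter(["romance", "drama"]),
    "Movie C": Counter(["robot", "drama"]),
}

ranking = rank_items(user_profile, items)
print(ranking)
```

Note that no other user's ratings are involved here: only the item descriptions and this one user's profile.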

Collaborative filtering

Collaborative filtering collects and analyzes a large amount of information on users' behavior, activities, or preferences, and predicts what a user will like based on their similarity to other users. Algorithms commonly used for this include k-nearest neighbors and Pearson correlation.
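
A minimal sketch of user-to-user similarity with Pearson correlation follows. It assumes ratings are stored as one dictionary per user and that means are taken over the co-rated items only, which is one common variant. The user names and ratings are made up.

```python
import math

def pearson(ratings_a: dict, ratings_u: dict) -> float:
    """Pearson correlation between two users over the items both have rated."""
    common = ratings_a.keys() & ratings_u.keys()
    if len(common) < 2:
        return 0.0  # not enough overlap to measure a correlation
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = math.sqrt(
        sum((ratings_a[i] - mean_a) ** 2 for i in common)
        * sum((ratings_u[i] - mean_u) ** 2 for i in common)
    )
    return num / den if den else 0.0

# Hypothetical ratings on a 1-5 scale.
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}    # same taste as alice, shifted down by one
carol = {"m1": 1, "m2": 5, "m3": 3}           # opposite taste

sim_bob = pearson(alice, bob)
sim_carol = pearson(alice, carol)
print(sim_bob, sim_carol)
```

A k-nearest-neighbor recommender would then keep the k users with the highest similarity to the active user.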

Difference
Collaborative filtering recommends items that are relevant to the user, judged from the behavior of similar users. Content-based recommendation relies on the content of the user profile. Because of this, collaborative filtering is the more widely used of the two.

Disadvantages of collaborative filtering
Cold start: there must be enough data in the system to find a match.
Sparsity: most users do not rate most items, so the user-item rating matrix is "sparse" and the probability of finding a set of users with significantly similar ratings is usually low.
First rater: an item that has not previously been rated cannot be recommended.
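
To make sparsity and the first-rater problem concrete, the sketch below measures both on a tiny, made-up user-item matrix in which `None` marks a missing rating. The helper names are illustrative only.

```python
# Rows are users, columns are items; None means "not rated". Hypothetical data.
ratings = [
    [5,    None, None, 1,    None],
    [None, None, 4,    None, None],
    [None, 3,    None, None, None],
]

def sparsity(matrix) -> float:
    """Fraction of user-item cells that have no rating."""
    cells = [value for row in matrix for value in row]
    return sum(value is None for value in cells) / len(cells)

def unrated_items(matrix) -> list:
    """Column indices of items nobody has rated (the first-rater problem)."""
    n_items = len(matrix[0])
    return [j for j in range(n_items) if all(row[j] is None for row in matrix)]

print(sparsity(ratings))       # share of empty cells
print(unrated_items(ratings))  # items pure collaborative filtering cannot recommend
```

In this toy matrix no two users share a rated item, so no user-to-user similarity can be computed at all.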

Hybrid approach
The hybrid approach uses content-based prediction to convert a sparse user rating matrix into a full user rating matrix, and then uses collaborative filtering to provide recommendations. Example: the hybrid approach has been applied in the domain of movie recommendation.

Neighborhood-based algorithm
In neighborhood-based algorithms, a subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for the active user. Steps:
1. Weight all users with respect to their similarity with the active user.
2. Select the n users that have the highest similarity with the active user.
3. Compute a prediction from a weighted combination of the selected neighbors' ratings.
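
The three steps can be sketched as follows. This is one possible reading, not the presenter's implementation: it assumes Pearson correlation as the similarity weight and a mean-centered weighted average as the "weighted combination". All names and ratings are hypothetical.

```python
import math

def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)

def pearson(a: dict, u: dict) -> float:
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    common = a.keys() & u.keys()
    if len(common) < 2:
        return 0.0
    ma, mu = mean(a[i] for i in common), mean(u[i] for i in common)
    num = sum((a[i] - ma) * (u[i] - mu) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)
                    * sum((u[i] - mu) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active: str, item: str, ratings: dict, n: int = 2) -> float:
    # Step 1: weight every other user who rated the item by similarity.
    weights = {u: pearson(ratings[active], r)
               for u, r in ratings.items() if u != active and item in r}
    # Step 2: keep the n most similar users.
    neighbors = sorted(weights, key=weights.get, reverse=True)[:n]
    # Step 3: mean-centered weighted combination of the neighbors' ratings.
    mean_a = mean(ratings[active].values())
    num = sum(weights[u] * (ratings[u][item] - mean(ratings[u].values()))
              for u in neighbors)
    den = sum(abs(weights[u]) for u in neighbors)
    return mean_a + num / den if den else mean_a

ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m3": 3, "m4": 1},
    "dave":  {"m1": 5, "m2": 3, "m3": 4, "m4": 4},
}
prediction = predict("alice", "m4", ratings, n=2)
print(prediction)
```

Mean-centering compensates for users who rate systematically higher or lower than others; a plain weighted average of raw ratings would be a simpler alternative.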

Hybrid Models
1. Implementing collaborative and content-based methods separately and combining their predictions.
2. Incorporating some content-based characteristics into a collaborative approach.
3. Incorporating some collaborative characteristics into a content-based approach.
4. Constructing a general unifying model that incorporates both content-based and collaborative characteristics.

Netflix Example
Netflix is a good example of a hybrid system that combines collaborative and content-based filtering. Recommendations are made by comparing the watching and searching habits of similar users (collaborative filtering) and by offering movies that share characteristics with films the user has rated highly (content-based filtering).

Amazon Example
Amazon is another good example of a hybrid recommendation system. It stores the click stream and usage patterns of the user and of other users with similar preferences (collaborative filtering), and also offers products that share characteristics with products the user has rated highly (content-based filtering).

Content-Boosted Collaborative Filtering
Use a content-based predictor to enhance the existing user data, and then provide personalized predictions using collaborative filtering.
[Figure: block diagram with the labels Input, Input, Content-based recommender, CF-based recommender, Combiner, Recommendations]

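Content-Boosted Collaborative Filtering
Create a pseudo user-ratings vector $v_u$ for each user u in the database:
$$v_{u,i} = \begin{cases} r_{u,i} & \text{if user } u \text{ rated item } i \\ c_{u,i} & \text{otherwise} \end{cases}$$
where $r_{u,i}$ is the actual rating of user u for item i and $c_{u,i}$ is the rating predicted by the pure content-based system. The pseudo user-ratings vectors of all users, put together, give the dense pseudo-ratings matrix V.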
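
A small sketch of building the pseudo-ratings matrix is shown below: actual ratings are kept where they exist and content-based predictions fill the gaps. The matrices are made-up, `None` marks a missing rating, and the content-based predictor itself is assumed to exist already.

```python
def pseudo_ratings(actual, content_pred):
    """Dense matrix V: the actual rating where present, else the content-based prediction."""
    return [
        [r if r is not None else c for r, c in zip(row_r, row_c)]
        for row_r, row_c in zip(actual, content_pred)
    ]

# Hypothetical sparse ratings (rows: users, columns: items).
actual = [
    [5,    None, 3],
    [None, 2,    None],
]
# Hypothetical output of a pure content-based predictor for every user-item pair.
content_pred = [
    [4.5, 3.0, 2.5],
    [4.0, 2.5, 1.0],
]

V = pseudo_ratings(actual, content_pred)
print(V)
```

Every cell of `V` is filled, so any two users can be compared on every item.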

The similarity between the active user a and another user u is computed using Pearson's correlation coefficient. Instead of the original user votes, the values from the pseudo user-ratings vectors $v_a$ and $v_u$ are substituted.
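
For reference, the standard Pearson correlation with pseudo-ratings substituted for the raw votes can be written as below. The symbols $m$ (total number of items) and $\bar{v}_a$, $\bar{v}_u$ (each user's mean pseudo-rating) are notational assumptions, not taken from the slides.

```latex
P_{a,u} = \frac{\sum_{i=1}^{m} (v_{a,i} - \bar{v}_a)(v_{u,i} - \bar{v}_u)}
               {\sqrt{\sum_{i=1}^{m} (v_{a,i} - \bar{v}_a)^2 \; \sum_{i=1}^{m} (v_{u,i} - \bar{v}_u)^2}}
```

Because the pseudo-ratings vectors are dense, the sums can run over all items rather than only the co-rated ones.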

Harmonic Mean Weighting
Inaccuracies in the pseudo user-ratings vectors often yield misleadingly high correlations between the active user and other users. To incorporate confidence (or the lack thereof) in the correlations, they are weighted using the harmonic mean weighting factor (HM weighting).

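$$hm_{i,j} = \frac{2 m_i m_j}{m_i + m_j}, \qquad m_i = \begin{cases} n_i / 50 & \text{if } n_i < 50 \\ 1 & \text{otherwise} \end{cases}$$
where $n_i$ is the number of items rated by user i. The harmonic mean tends to bias the weight towards the lower of the two values. The threshold of 50 ratings was chosen based on 10-fold cross-validation.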

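The significance weighting factor is added to the harmonic mean weight to obtain the hybrid correlation weight:
$$hw_{a,u} = hm_{a,u} + sg_{a,u}, \qquad sg_{a,u} = \begin{cases} n / 50 & \text{if } n < 50 \\ 1 & \text{otherwise} \end{cases}$$
where n is the number of items co-rated by the two users. With fewer than 50 co-rated items the significance weighting factor is n/50; otherwise it is 1.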
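
The weighting scheme can be sketched in a few lines. The formulas follow the description in the Melville, Mooney and Nagarajan paper listed in the references; reading the counts as "items rated" and "items co-rated" is an assumption, as are the function names.

```python
THRESHOLD = 50  # number of ratings above which a user is fully trusted

def confidence(n_rated: int) -> float:
    """m_i: n_i / 50 below the threshold, 1 otherwise."""
    return min(n_rated / THRESHOLD, 1.0)

def harmonic_mean_weight(n_i: int, n_j: int) -> float:
    """hm_{i,j}: harmonic mean of the two users' confidence values."""
    m_i, m_j = confidence(n_i), confidence(n_j)
    if m_i + m_j == 0:
        return 0.0
    return 2 * m_i * m_j / (m_i + m_j)

def significance_weight(n_corated: int) -> float:
    """sg_{a,u}: n / 50 for fewer than 50 co-rated items, 1 otherwise."""
    return min(n_corated / THRESHOLD, 1.0)

def hybrid_weight(n_a: int, n_u: int, n_corated: int) -> float:
    """hw_{a,u} = hm_{a,u} + sg_{a,u}."""
    return harmonic_mean_weight(n_a, n_u) + significance_weight(n_corated)

# A user with 25 ratings compared with a user with 100 ratings, 10 items in common.
hm = harmonic_mean_weight(25, 100)
hw = hybrid_weight(25, 100, 10)
print(hm, hw)
```

The harmonic mean of 0.5 and 1.0 is about 0.67, closer to the lower value than the arithmetic mean of 0.75 would be, which is the bias the slide refers to.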

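Self-Weighting Factor
To give the pseudo-active user more importance than the neighbours (that is, to increase confidence in the pure-content predictions for the active user), a self-weighting factor is incorporated in the final prediction:
$$sw_a = \begin{cases} \frac{n_a}{50} \times max & \text{if } n_a < 50 \\ max & \text{otherwise} \end{cases}$$
where $n_a$ is the number of items rated by the active user and max is the overall confidence in the content-based predictor.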

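Producing predictions
$$p_{a,i} = \bar{v}_a + \frac{sw_a (c_{a,i} - \bar{v}_a) + \sum_{u=1, u \neq a}^{n} hw_{a,u} P_{a,u} (v_{u,i} - \bar{v}_u)}{sw_a + \sum_{u=1, u \neq a}^{n} hw_{a,u} P_{a,u}}$$
where $p_{a,i}$ is the final CBCF prediction for user a and item i, $c_{a,i}$ is the pure content-based prediction for user a and item i, $\bar{v}_a$ and $\bar{v}_u$ are the users' mean pseudo-ratings, $P_{a,u}$ is the Pearson correlation between users a and u, and n is the size of the neighbourhood. The denominator is a normalizing factor that ensures all weights sum to 1.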
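
The final prediction step might be sketched as follows, using the prediction formula from the Melville, Mooney and Nagarajan paper listed in the references. The function names, the tuple layout for neighbours, the default `max_weight=2.0`, and all numbers are illustrative assumptions; the slides do not state a value for max.

```python
def self_weight(n_a: int, max_weight: float = 2.0, threshold: int = 50) -> float:
    """sw_a: (n_a / 50) * max below the threshold, max otherwise."""
    return min(n_a / threshold, 1.0) * max_weight

def cbcf_predict(mean_a: float, c_ai: float, sw_a: float, neighbors: list) -> float:
    """Combine the active user's content-based prediction with the neighbours.

    neighbors holds one tuple per neighbour u:
    (hw_au, pearson_au, v_ui, mean_u)
    """
    num = sw_a * (c_ai - mean_a)
    den = sw_a
    for hw, p, v_ui, mean_u in neighbors:
        num += hw * p * (v_ui - mean_u)
        den += hw * p
    if den == 0:
        return mean_a  # no usable evidence: fall back to the user's mean
    return mean_a + num / den

# Hypothetical active user: mean pseudo-rating 3.5, content prediction 4.0, 25 ratings.
sw_a = self_weight(25)  # 0.5 * 2.0 = 1.0
neighbors = [
    (1.5, 0.8, 5.0, 4.0),  # similar user who liked the item
    (1.0, 0.5, 2.0, 3.0),  # mildly similar user who disliked it
]
prediction = cbcf_predict(3.5, 4.0, sw_a, neighbors)
print(prediction)
```

With an empty neighbourhood the function returns the content-based prediction itself, which is how the method can still produce a prediction when collaborative evidence is missing.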

Evaluation
Mean Absolute Error (statistical accuracy): the average absolute difference between predicted ratings and actual ratings.
ROC curve (decision support):
Sensitivity: the probability that a good item is accepted by the filter.
Specificity: the probability that a bad item is rejected by the filter.
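
Both kinds of metric can be computed in a few lines, as sketched below. Treating a rating of 4 or more as "good" is an assumption for this example, and the predicted and actual ratings are made up.

```python
def mean_absolute_error(predicted, actual) -> float:
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def sensitivity_specificity(predicted, actual, good_threshold: float = 4.0):
    """Sensitivity: share of good items accepted. Specificity: share of bad items rejected."""
    tp = fn = tn = fp = 0
    for p, a in zip(predicted, actual):
        is_good = a >= good_threshold
        accepted = p >= good_threshold
        if is_good and accepted:
            tp += 1
        elif is_good:
            fn += 1
        elif accepted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

predicted = [4.5, 3.0, 4.2, 2.0, 3.8]
actual = [5, 3, 3, 1, 4]

mae = mean_absolute_error(predicted, actual)
sens, spec = sensitivity_specificity(predicted, actual)
print(mae, sens, spec)
```

Sweeping `good_threshold` for the predictions while holding the definition of a good item fixed would trace out the points of an ROC curve.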

Why is this system better?
It overcomes the first-rater problem, tackles sparsity, finds better neighbours, and overcomes the cold-start problem.

Conclusion
CBCF elegantly exploits content within a collaborative framework. It overcomes problems faced by pure content-based or pure collaborative systems. Incorporating content information into a collaborative framework can improve recommender systems.

References
Data Mining: Concepts and Techniques, 3rd edition
Mining the Web, Chakrabarti
Web Data Mining, Springer
Prem Melville, Raymond J. Mooney, Ramadass Nagarajan, "Content-Boosted Collaborative Filtering for Improved Recommendations", AAAI-02 Proceedings, 2002

Thank you!!