Types of recommender systems in information retrieval. Collaborative filtering is a very widely used method in recommender systems. Content-based filtering and collaborative filtering are the two major approaches. Hybrid systems are now being employed to produce better recommendations. One such method is content-boosted collaborative filtering.
Added: Mar 11, 2017
Slides: 32 pages
Slide Content
IR Presentation on Collaborative Filtering
By Neha Kulkarni (5202), ME Computer, Pune Institute of Computer Technology
Overview
Recommender systems, types of recommender systems, content-based filtering, collaborative filtering, hybrid systems, content-boosted collaborative filtering, evaluation of CBCF, advantages, conclusion.
Recommender systems
A recommender system predicts the “rating” or “preference” that a user would give to an item. Recommendations are made in two ways: content-based filtering and collaborative filtering.
Content-based filtering
Content-based filtering selects items based on the correlation between the content of the items and the user’s preferences. Keywords are used to describe both the items and the user profile.
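A minimal sketch of keyword-based content filtering, assuming items and the user profile are bags of keywords and that "correlation" is measured with cosine similarity (the slides do not name a measure). The helper names `keyword_vector`, `cosine` and `rank_items` and the movie data are hypothetical.

```python
import math
from collections import Counter

# Assumption: items and user profiles are bags of keywords compared with
# cosine similarity; the slides only say "keywords" and "correlation".

def keyword_vector(keywords):
    """Map a list of keywords to a keyword -> count vector (case-insensitive)."""
    return Counter(k.lower() for k in keywords)

def cosine(a, b):
    """Cosine similarity between two sparse keyword vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_items(profile_keywords, items):
    """Return item names sorted by similarity to the user profile, best first."""
    profile = keyword_vector(profile_keywords)
    scores = {name: cosine(profile, keyword_vector(kws)) for name, kws in items.items()}
    return sorted(scores, key=scores.get, reverse=True)

items = {
    "Alien": ["sci-fi", "horror", "space"],
    "Notting Hill": ["romance", "comedy"],
    "Interstellar": ["sci-fi", "space"],
}
print(rank_items(["Sci-Fi", "space"], items))  # ['Interstellar', 'Alien', 'Notting Hill']
```

Note that nothing here depends on other users: a content-based filter can rank an item nobody has rated yet, which is the property CBCF later borrows.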
Collaborative filtering
Collaborative filtering is based on collecting and analyzing a large amount of information on users’ behavior, activities or preferences, and predicting what users will like based on their similarity to other users. Common techniques include the Pearson correlation, to measure how similar two users are, and k-nearest neighbors, to select the most similar users.
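A minimal sketch of Pearson correlation between two users, assuming ratings are stored as dicts of item to rating and that the means are taken over the co-rated items only (one common convention; others use each user's mean over all their ratings). The function name and the toy users are hypothetical.

```python
import math

def pearson(ratings_a, ratings_u):
    """Pearson correlation between two users over the items both have rated.

    ratings_a, ratings_u: dicts mapping item -> rating; unrated items are absent.
    Returns 0.0 if there are fewer than two co-rated items or no variance.
    """
    common = ratings_a.keys() & ratings_u.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_u = sum((ratings_u[i] - mean_u) ** 2 for i in common)
    if var_a == 0 or var_u == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_u)

alice = {"A": 5, "B": 3, "C": 4}
bob = {"A": 4, "B": 2, "C": 3, "D": 1}    # same taste as alice on A, B, C
carol = {"A": 1, "B": 5, "C": 3}          # opposite taste
print(pearson(alice, bob), pearson(alice, carol))  # 1.0 -1.0
```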
Difference
Collaborative filtering recommends items that are relevant to the user based on the preferences of similar users, whereas content-based recommendation is limited to items that match the content of the user’s profile. Because of this, collaborative filtering is more widely used.
Disadvantages of collaborative filtering
Cold start: the system must have enough data to find matches.
Sparsity: most users do not rate most items, so the user-item rating matrix is “sparse”; therefore the probability of finding a set of users with significantly similar ratings is usually low.
First rater: an item that has not been previously rated cannot be recommended.
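To make the sparsity problem concrete, here is a small sketch on hypothetical toy data (3 users, a catalogue of 10 items, 5 observed ratings). It measures the fraction of empty cells and how many items pairs of users have rated in common; with zero or one co-rated item, a Pearson correlation between two users cannot be computed meaningfully.

```python
def sparsity(ratings, n_items):
    """Fraction of empty cells in the user-item rating matrix.

    ratings: dict user -> dict item -> rating, storing only observed ratings.
    """
    observed = sum(len(r) for r in ratings.values())
    return 1.0 - observed / (len(ratings) * n_items)

def co_rated(ratings, a, u):
    """Number of items rated by both user a and user u."""
    return len(ratings[a].keys() & ratings[u].keys())

# Hypothetical toy data: 3 users, 10 items in the catalogue, 5 observed ratings.
ratings = {
    "alice": {"i1": 5, "i2": 3},
    "bob": {"i3": 4},
    "carol": {"i2": 2, "i4": 1},
}
print(sparsity(ratings, n_items=10))        # about 0.83
print(co_rated(ratings, "alice", "bob"))    # 0
print(co_rated(ratings, "alice", "carol"))  # 1
```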
Hybrid approach
The hybrid approach uses content-based predictions to convert a sparse user-rating matrix into a full user-rating matrix, and then uses collaborative filtering to provide recommendations. Example: this hybrid approach has been applied in the domain of movie recommendation.
Neighborhood-based algorithm
In neighborhood-based algorithms, a subset of users is chosen based on their similarity to the active user, and a weighted combination of their ratings is used to produce predictions for the active user. Steps:
1. Weight all users with respect to their similarity to the active user.
2. Select the n users that have the highest similarity to the active user.
3. Compute a prediction from a weighted combination of the selected neighbors’ ratings.
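The three steps above can be sketched as follows. This is an illustration under stated assumptions: Pearson correlation over co-rated items as the weight, and a similarity-weighted average of mean-centred neighbour ratings (a common choice) as the "weighted combination". The names `pearson`, `mean`, `predict` and the toy ratings are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    common = x.keys() & y.keys()
    if len(common) < 2:
        return 0.0
    mx = sum(x[i] for i in common) / len(common)
    my = sum(y[i] for i in common) / len(common)
    cov = sum((x[i] - mx) * (y[i] - my) for i in common)
    sx = math.sqrt(sum((x[i] - mx) ** 2 for i in common))
    sy = math.sqrt(sum((y[i] - my) ** 2 for i in common))
    return cov / (sx * sy) if sx and sy else 0.0

def mean(r):
    return sum(r.values()) / len(r)

def predict(ratings, active, item, n=2):
    """Neighborhood-based prediction of the active user's rating for `item`."""
    # Step 1: weight all other users by similarity to the active user.
    weights = {u: pearson(ratings[active], ratings[u]) for u in ratings if u != active}
    # Step 2: keep the n most similar users among those who rated the item.
    rated = [u for u in weights if item in ratings[u]]
    neighbors = sorted(rated, key=weights.get, reverse=True)[:n]
    # Step 3: similarity-weighted combination of mean-centred neighbour ratings.
    num = sum(weights[u] * (ratings[u][item] - mean(ratings[u])) for u in neighbors)
    den = sum(abs(weights[u]) for u in neighbors)
    base = mean(ratings[active])
    return base + num / den if den else base

ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 3, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 3, "D": 2},
    "dave": {"A": 5, "B": 3, "C": 5, "D": 4},
}
print(predict(ratings, "alice", "D"))  # about 4.69 (neighbours: bob, dave)
```

If nobody has rated the item, the sketch falls back to the active user's mean, which is exactly the first-rater limitation listed earlier.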
Hybrid Models
1. Implementing collaborative and content-based methods separately and combining their predictions.
2. Incorporating some content-based characteristics into a collaborative approach.
3. Incorporating some collaborative characteristics into a content-based approach.
4. Constructing a general unifying model that incorporates both content-based and collaborative characteristics.
Netflix Example
Netflix is a good example of a hybrid system that combines collaborative and content-based filtering. Recommendations are made by comparing the watching and searching habits of similar users (CF) and by offering movies that share characteristics with films the user has rated highly (content-based).
Amazon Example
Amazon is another good example of a hybrid recommendation system. It stores the click stream and usage patterns of the user and of other users with similar preferences (CF), and also offers products that share characteristics with products the user has rated highly (content-based).
Content-Boosted Collaborative Filtering
Use a content-based predictor to enhance the existing user data, and then provide personalized predictions using collaborative filtering.
[Figure: block diagram with Input, Content-based recommender, CF-based recommender, Combiner and Recommendations]
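Content-Boosted Collaborative Filtering
Create a pseudo-user-ratings vector $v_u$ for each user $u$ in the database, where $r_{u,i}$ is the actual rating of user $u$ for item $i$ and $c_{u,i}$ is the rating predicted by the pure content-based system: $v_{u,i} = r_{u,i}$ if user $u$ has rated item $i$, and $v_{u,i} = c_{u,i}$ otherwise. The pseudo-user-ratings vectors of all users together give the dense pseudo-ratings matrix $V$.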
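Building the pseudo-ratings matrix is a simple fill-in rule. The sketch below assumes sparse ratings stored as dicts and a content-based predictor passed in as a function; `pseudo_ratings`, `content_predict` and the lookup table standing in for a real content-based system are all hypothetical.

```python
def pseudo_ratings(actual, content_predict, users, items):
    """Build the dense pseudo-ratings matrix V as a dict of dicts.

    v[u][i] = r_{u,i} if user u rated item i, else c_{u,i} = content_predict(u, i).
    """
    return {
        u: {
            i: actual[u][i] if i in actual.get(u, {}) else content_predict(u, i)
            for i in items
        }
        for u in users
    }

# Hypothetical sparse ratings and a stand-in for the content-based predictor.
actual = {"alice": {"A": 5}, "bob": {"B": 2}}
content_table = {("alice", "A"): 4.1, ("alice", "B"): 3.2,
                 ("bob", "A"): 3.9, ("bob", "B"): 2.5}
V = pseudo_ratings(actual, lambda u, i: content_table[(u, i)],
                   users=["alice", "bob"], items=["A", "B"])
print(V)  # {'alice': {'A': 5, 'B': 3.2}, 'bob': {'A': 3.9, 'B': 2}}
```

Actual ratings always win over content predictions, so the content-based system only fills the gaps.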
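Similarity between the active user $a$ and another user $u$ is computed with Pearson’s correlation coefficient. Instead of the original user votes, we substitute the values from the pseudo-user-ratings vectors $v_a$ and $v_u$:
$$P_{a,u} = \frac{\sum_i (v_{a,i} - \bar{v}_a)(v_{u,i} - \bar{v}_u)}{\sqrt{\sum_i (v_{a,i} - \bar{v}_a)^2}\,\sqrt{\sum_i (v_{u,i} - \bar{v}_u)^2}}$$
where $\bar{v}_a$ and $\bar{v}_u$ are the mean pseudo-ratings of the two users.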
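Because pseudo-ratings vectors are dense, every item counts as co-rated, so the correlation runs over the whole vector instead of a (possibly tiny) overlap. A minimal sketch, assuming both vectors list items in the same order; `pearson_dense` and the example rows are hypothetical.

```python
import math

def pearson_dense(v_a, v_u):
    """Pearson correlation between two pseudo-ratings vectors.

    Both vectors are dense and list items in the same order, so every item
    counts as co-rated; returns 0.0 if either vector is constant.
    """
    n = len(v_a)
    mean_a, mean_u = sum(v_a) / n, sum(v_u) / n
    cov = sum((a - mean_a) * (u - mean_u) for a, u in zip(v_a, v_u))
    s_a = math.sqrt(sum((a - mean_a) ** 2 for a in v_a))
    s_u = math.sqrt(sum((u - mean_u) ** 2 for u in v_u))
    return cov / (s_a * s_u) if s_a and s_u else 0.0

# Rows of a hypothetical pseudo-ratings matrix (actual ratings mixed with content predictions).
v_alice = [5.0, 3.2, 4.4]
v_bob = [3.9, 2.0, 3.5]
print(pearson_dense(v_alice, v_bob))
```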
Harmonic Mean Weighting
Inaccuracies in the pseudo-user-ratings vectors (which, for users with few actual ratings, consist mostly of content-based predictions) often yielded misleadingly high correlations between the active user and other users. Hence, to incorporate confidence (or the lack thereof) in our correlations, we weight them using the harmonic mean weighting factor (HM weighting).
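The harmonic mean weight between users $i$ and $j$ is $hm_{i,j} = \frac{2 m_i m_j}{m_i + m_j}$, where $m_i = n_i/50$ if $n_i < 50$ and $m_i = 1$ otherwise, and $n_i$ is the number of items rated by user $i$. The harmonic mean tends to bias the weight towards the lower of the two values. The threshold of 50 ratings was chosen using 10-fold cross-validation.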
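A sketch of HM weighting as we read it from Melville et al. (2002), cited in the references: each user's confidence is $m_i = n_i/50$ below 50 ratings and 1 otherwise, and the pair weight is the harmonic mean of the two confidences. Function names are hypothetical; the threshold of 50 is the one quoted in the slides.

```python
THRESHOLD = 50  # ratings threshold quoted in the slides

def confidence(n_rated, threshold=THRESHOLD):
    """m_i = n_i / threshold if the user rated fewer than `threshold` items, else 1."""
    return n_rated / threshold if n_rated < threshold else 1.0

def harmonic_mean_weight(n_i, n_j):
    """hm_{i,j} = 2 * m_i * m_j / (m_i + m_j), pulled toward the smaller of m_i, m_j."""
    m_i, m_j = confidence(n_i), confidence(n_j)
    return 2 * m_i * m_j / (m_i + m_j) if m_i + m_j else 0.0

# A sparse user (10 ratings) paired with a heavy user (200 ratings):
print(harmonic_mean_weight(10, 200))  # about 0.33; the arithmetic mean would be 0.6
```

The example shows why the harmonic mean was chosen: one unreliable pseudo-ratings vector is enough to pull the weight down.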
To the harmonic mean weight we add the significance weighting factor $sg_{a,u}$ to obtain the hybrid correlation weight $hw_{a,u} = hm_{a,u} + sg_{a,u}$. If the two users have rated fewer than 50 items in common ($n < 50$), the significance weighting factor is $sg_{a,u} = n/50$; otherwise it is 1.
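A minimal sketch of the significance weight and the hybrid correlation weight, assuming $n$ is the number of items both users have actually rated (our reading of the cited Melville et al. paper). The harmonic mean weight is passed in as a number so the block stands alone; function names are hypothetical.

```python
def significance_weight(n_common, threshold=50):
    """sg_{a,u} = n / threshold if the users co-rated fewer than `threshold` items, else 1."""
    return n_common / threshold if n_common < threshold else 1.0

def hybrid_correlation_weight(hm_au, n_common):
    """hw_{a,u} = hm_{a,u} + sg_{a,u}."""
    return hm_au + significance_weight(n_common)

print(hybrid_correlation_weight(1 / 3, 25))  # about 0.83 (0.33 + 0.5)
```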
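Self-Weighting Factor
To give the pseudo-active user more importance than the neighbours (that is, to increase confidence in the pure content-based predictions for the active user), a self-weighting factor is incorporated in the final prediction: $sw_a = \frac{n_a}{50} \times \mathrm{max}$ if $n_a < 50$, and $sw_a = \mathrm{max}$ otherwise, where $n_a$ is the number of items rated by the active user and $\mathrm{max}$ is the overall confidence in the content-based predictor.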
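The self-weighting factor scales the confidence `max` by how many items the active user has rated, capped at the same threshold of 50. In this sketch `max_weight=2.0` is an illustrative assumption, not a value taken from the slides; treat it as a tunable parameter.

```python
def self_weight(n_active, max_weight=2.0, threshold=50):
    """sw_a = (n_a / threshold) * max if n_a < threshold, else max.

    max_weight (the slides' "max") is assumed to be 2.0 here for illustration.
    """
    if n_active < threshold:
        return n_active / threshold * max_weight
    return max_weight

print(self_weight(10), self_weight(100))  # about 0.4, then 2.0
```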
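Producing predictions
The final CBCF prediction is
$$p_{a,i} = \bar{v}_a + \frac{sw_a\,(c_{a,i} - \bar{v}_a) + \sum_{u=1,\,u \neq a}^{n} hw_{a,u}\,P_{a,u}\,(v_{u,i} - \bar{v}_u)}{sw_a + \sum_{u=1,\,u \neq a}^{n} hw_{a,u}\,P_{a,u}}$$
where $p_{a,i}$ is the final CBCF prediction for user $a$ and item $i$; $c_{a,i}$ is the pure content-based prediction for user $a$ and item $i$; $sw_a$ is the self-weighting factor; $hw_{a,u}$ is the hybrid correlation weight; $P_{a,u}$ is the Pearson correlation between the pseudo-ratings vectors of $a$ and $u$; $v_{u,i}$ is neighbour $u$’s pseudo-rating for item $i$; $\bar{v}_a$ and $\bar{v}_u$ are mean pseudo-ratings; and $n$ is the size of the neighbourhood. The denominator is a normalizing factor that ensures all weights sum to 1.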
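The prediction formula can be sketched directly, assuming the per-neighbour quantities have already been computed and are passed in as hypothetical `(hw_au, P_au, v_ui, mean_v_u)` tuples. The sketch assumes the selected neighbours have positive correlations; with negative ones the denominator could approach zero.

```python
def cbcf_predict(v_a, c_ai, sw_a, neighbors):
    """Final CBCF prediction p_{a,i} (sketch).

    v_a:       active user's pseudo-ratings (list), used for the mean of v_a
    c_ai:      pure content-based prediction for active user a and item i
    sw_a:      self-weighting factor of the active user
    neighbors: list of (hw_au, P_au, v_ui, mean_v_u), one tuple per neighbour
    """
    mean_a = sum(v_a) / len(v_a)
    # The pseudo-active user contributes its content prediction with weight sw_a.
    num = sw_a * (c_ai - mean_a)
    den = sw_a
    # Each neighbour contributes its mean-centred pseudo-rating, weighted by hw * P.
    for hw, p, v_ui, mean_u in neighbors:
        num += hw * p * (v_ui - mean_u)
        den += hw * p
    return mean_a + num / den if den else mean_a

neighbors = [(1.5, 0.8, 5, 4), (1.2, 0.5, 2, 3)]
print(cbcf_predict([4, 3, 5], 4.5, 0.4, neighbors))  # about 4.36
```

With no neighbours at all, the formula reduces to the content-based prediction $c_{a,i}$, which is how CBCF still produces an answer when collaborative evidence is missing.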
Evaluation
Mean Absolute Error (statistical accuracy): the average absolute difference between predicted ratings and actual ratings.
ROC curve (decision support):
sensitivity: the probability that a good item is accepted by the filter;
specificity: the probability that a bad item is rejected by the filter.
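Both metrics are short to compute. The sketch assumes an item is "good" when its actual rating is at least 4 and that the filter "accepts" an item when its predicted rating reaches a threshold; both thresholds are hypothetical choices, and the ratings are toy data. Sweeping the acceptance threshold yields the points of an ROC curve (usually plotted as sensitivity against 1 - specificity).

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def sensitivity_specificity(predicted, actual, good=4, accept=4):
    """good: actual rating >= good marks a good item (assumed threshold).
    accept: the filter accepts an item when predicted >= accept (assumed threshold)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if a >= good and p >= accept)
    fn = sum(1 for p, a in pairs if a >= good and p < accept)
    tn = sum(1 for p, a in pairs if a < good and p < accept)
    fp = sum(1 for p, a in pairs if a < good and p >= accept)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

predicted = [4.5, 3.0, 2.0, 4.2]
actual = [5, 4, 1, 3]
print(mean_absolute_error(predicted, actual))  # about 0.925
# Sweeping the acceptance threshold traces out points of an ROC curve.
roc = [sensitivity_specificity(predicted, actual, accept=t) for t in (2, 3, 4, 5)]
print(roc)  # [(1.0, 0.0), (1.0, 0.5), (0.5, 0.5), (0.0, 1.0)]
```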
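Why is this system better?
Overcomes the first-rater problem: a new item receives content-based predictions before anyone has rated it.
Tackles sparsity: the pseudo-ratings matrix is dense.
Finds better neighbours: correlations are computed over all items rather than only the few co-rated ones.
Overcomes the cold-start problem.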
Conclusion
CBCF elegantly exploits content within a collaborative framework. It overcomes problems faced by pure content-based or pure collaborative systems. Incorporating content information into a collaborative framework can improve recommender systems.
References
Data Mining: Concepts and Techniques, 3rd edition.
Mining the Web, Chakrabarti.
Web Data Mining, Springer.
Prem Melville, Raymond J. Mooney, Ramadass Nagarajan, “Content-Boosted Collaborative Filtering for Improved Recommendations”, AAAI-02 Proceedings, 2002.