Let $r_x$ be the vector of user x’s ratings
Jaccard similarity measure
▪Problem: Ignores the value of the rating
Cosine similarity measure
▪$\mathrm{sim}(x, y) = \cos(r_x, r_y) = \frac{r_x \cdot r_y}{\|r_x\| \cdot \|r_y\|}$
▪Problem: Treats some missing ratings as “negative”
Pearson correlation coefficient
▪$S_{xy}$ = items rated by both users x and y
1/25/22 Jure Leskovec, Stanford CS246: Mining Massive Datasets 24
$r_x$ = [1, _, _, 1, 3]
$r_y$ = [1, _, 2, 2, _]
$r_x$, $r_y$ as sets:
$r_x$ = {1, 4, 5}
$r_y$ = {1, 3, 4}
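
As a rough illustration of the set view, the sketch below computes the Jaccard similarity for the two example users. The helper names (`rated_items`, `jaccard_similarity`) and the use of `None` for a missing rating are assumptions made for this example, not part of the slides.

```python
def rated_items(ratings):
    """Return the 1-based positions of the items that have a rating."""
    return {i for i, r in enumerate(ratings, start=1) if r is not None}


def jaccard_similarity(items_a, items_b):
    """Size of the intersection divided by size of the union."""
    union = items_a | items_b
    if not union:
        return 0.0  # assumption: two empty profiles count as dissimilar
    return len(items_a & items_b) / len(union)


# Example ratings from the slide; None marks a missing rating ("_").
r_x = [1, None, None, 1, 3]
r_y = [1, None, 2, 2, None]

items_x = rated_items(r_x)  # {1, 4, 5}
items_y = rated_items(r_y)  # {1, 3, 4}
jaccard_xy = jaccard_similarity(items_x, items_y)
print(jaccard_xy)  # 0.5: 2 shared items out of 4 distinct items
```

Only the item positions enter the computation, which is why the measure ignores the rating values themselves.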
$r_x$, $r_y$ as points:
$r_x$ = {1, 0, 0, 1, 3}
$r_y$ = {1, 0, 2, 2, 0}
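
A minimal sketch of the cosine measure on the point view, with missing ratings already filled in as 0. The function name `cosine_similarity` and the zero-norm fallback are assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a user with no ratings is similar to no one
    return dot / (norm_u * norm_v)


# Example users from the slide, with missing ratings encoded as 0.
r_x = [1, 0, 0, 1, 3]
r_y = [1, 0, 2, 2, 0]

cosine_xy = cosine_similarity(r_x, r_y)
print(round(cosine_xy, 4))  # 0.3015, i.e. 3 / (sqrt(11) * 3)
```

Because a missing rating is stored as 0, it lowers the dot product in the same way a very low rating would, which is the problem noted for this measure.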
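$\mathrm{sim}(x, y) = \frac{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)(r_{ys} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{xs} - \bar{r}_x)^2} \, \sqrt{\sum_{s \in S_{xy}} (r_{ys} - \bar{r}_y)^2}}$
where $\bar{r}_x$, $\bar{r}_y$ are the average ratings of x and y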
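
The Pearson measure can be sketched on the same two example users as follows. The slide does not say whether the average is taken over all of a user's ratings or only over the co-rated items. This sketch assumes the former. The function name `pearson_similarity`, the `None` encoding for missing ratings, and the 0.0 fallbacks are likewise assumptions for illustration.

```python
import math


def pearson_similarity(r_x, r_y):
    """Pearson correlation over the items rated by both users."""
    rated_x = [r for r in r_x if r is not None]
    rated_y = [r for r in r_y if r is not None]
    if not rated_x or not rated_y:
        return 0.0  # assumption: no ratings means no similarity

    # Assumption: each mean is taken over all items that user rated.
    mean_x = sum(rated_x) / len(rated_x)
    mean_y = sum(rated_y) / len(rated_y)

    # Mean-centered rating pairs for the co-rated items (the set S_xy).
    centered = [(a - mean_x, b - mean_y)
                for a, b in zip(r_x, r_y)
                if a is not None and b is not None]

    numerator = sum(dx * dy for dx, dy in centered)
    denominator = (math.sqrt(sum(dx * dx for dx, _ in centered))
                   * math.sqrt(sum(dy * dy for _, dy in centered)))
    if denominator == 0:
        return 0.0  # assumption: undefined correlation is reported as 0
    return numerator / denominator


# Example ratings from the slide; None marks a missing rating ("_").
r_x = [1, None, None, 1, 3]
r_y = [1, None, 2, 2, None]

pearson_xy = pearson_similarity(r_x, r_y)
print(round(pearson_xy, 4))  # 0.3162, i.e. 1 / sqrt(10)
```

Here the co-rated items are items 1 and 4, and both users have a mean rating of 5/3, so missing ratings never enter the sums.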