Probabilistic retrieval model

2,729 views 15 slides Oct 23, 2013


Slide Content

Probabilistic Retrieval Model. Baradhidasan P, 2nd Year, Pondicherry University

INTRODUCTION Probability theory has been used as a principal means for modeling the retrieval process in mathematical terms. In conventional retrieval situations, a document is retrieved whenever the keyword set attached to the document appears similar in some sense to the query keywords. In this case the document is considered relevant to the query.

Cont.. Since the relevance of a document with respect to a query is a matter of degree, it can be postulated that when the document and query vectors are sufficiently similar, the corresponding probability of relevance is large enough to make it reasonable to retrieve the document in response to the query. The approach thus applies the theory of probability to retrieval.

Why use Probabilities? Information retrieval deals with uncertain information, and probability is a measure of uncertainty. The Probabilistic Ranking Principle offers a provable minimization of risk. Probabilistic inference serves to justify your decision.

Approach The basic underlying tenet of the probabilistic approach to retrieval is that, for optimal performance, documents should be ranked in order of decreasing probability of relevance. Several models based on probabilistic approaches have been advocated; here we shall briefly look into three such models.
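
The ranking principle stated on this slide can be illustrated with a minimal sketch. It assumes that a probability of relevance has already been estimated for each document; the function name `rank_by_relevance`, the document identifiers, and the probability values are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple


def rank_by_relevance(prob_of_relevance: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (doc_id, probability) pairs sorted by decreasing probability."""
    return sorted(prob_of_relevance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical estimates; how they are obtained depends on the model used.
estimates = {"d1": 0.20, "d2": 0.85, "d3": 0.55}
ranking = rank_by_relevance(estimates)

for doc_id, probability in ranking:
    print(doc_id, probability)
```

The sketch only orders documents; estimating the probabilities themselves is the hard part and is what the individual models address.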

Objectives Highlight influential work on probabilistic models for IR. Provide a working understanding of the probabilistic techniques through a set of common implementation tricks. Establish relationships between the popular approaches: stress common ideas, explain differences. Outline issues in extending the models to interactive, cross-language, and multimedia IR.

Maron and Kuhns Maron and Kuhns proposed a model for probabilistic retrieval as early as 1960. They advocated that the probability that a given document would be relevant to a user can be assessed by calculating, for each document in the collection, the probability that a user submitting a particular query would judge that document relevant.

Cont.. For a query consisting of only one term (B), the probability that a particular document (Dm) will be judged relevant is the ratio of the number of users who submit query term (B) and consider the document (Dm) to be relevant to the number of users who submitted the query term (B). Adopting this approach, one has to employ historical information to calculate the probability of relevance.

Cont.. That is, the probability is calculated from the number of times users who submitted a particular query term (B) judged a document (Dm) relevant, compared with the total number of users who submitted that particular query term (B).
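
The ratio described for the Maron and Kuhns model can be sketched as follows. This is a minimal illustration, not their formulation: it assumes the historical information is available as a list of user sessions, each holding one query term and the set of documents that user judged relevant. The `Session` type, the function name `maron_kuhns_estimate`, and the sample history are all hypothetical.

```python
from typing import List, Set, Tuple

# Assumed record layout: (query term submitted, documents the user judged relevant)
Session = Tuple[str, Set[str]]


def maron_kuhns_estimate(sessions: List[Session], term: str, doc_id: str) -> float:
    """Fraction of users submitting `term` who judged `doc_id` relevant."""
    judged = [relevant for query_term, relevant in sessions if query_term == term]
    if not judged:
        # No history for this term; returning 0.0 is an arbitrary choice here.
        return 0.0
    hits = sum(1 for relevant in judged if doc_id in relevant)
    return hits / len(judged)


# Hypothetical history: four users submitted term "B", one submitted "C".
history = [
    ("B", {"Dm", "Dk"}),
    ("B", {"Dm"}),
    ("B", {"Dk"}),
    ("B", set()),
    ("C", {"Dm"}),
]

p = maron_kuhns_estimate(history, "B", "Dm")
print(p)  # 2 of the 4 users who submitted "B" judged "Dm" relevant
```

The session for term "C" is ignored when estimating for term "B", which reflects the conditioning on the submitted query term.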

Salton approach The model suggested by Salton and McGill takes a different approach. The essence of this model is that if estimates for the probability of occurrence of various terms in relevant documents can be calculated, then the probability that a document will be retrieved, given that it is relevant, can be estimated. Several experiments have shown that the probabilistic model can yield good results.

Two basic parameters The probability of relevance, $\Pr(\mathrm{rel})$, and the probability of non-relevance, $\Pr(\text{non-rel})$. If relevance is considered a binary property, then $\Pr(\text{non-rel}) = 1 - \Pr(\mathrm{rel})$. However, there are also two cost parameters associated with the process of retrieval. $a_1$: the loss associated with the retrieval of a non-relevant record.

Cont… $a_2$: the loss associated with the non-retrieval of a relevant record. Because the retrieval of a non-relevant record carries a loss of $a_1[1 - \Pr(\mathrm{rel})]$, and the rejection of a relevant item has an associated loss of $a_2\Pr(\mathrm{rel})$, the total loss for a given retrieval process will be minimized if an item is retrieved whenever $a_2\Pr(\mathrm{rel}) \ge a_1[1 - \Pr(\mathrm{rel})]$.

Cont… Equivalently, a discriminant function $g$ (or DISC) may be defined, and an item may be retrieved whenever the value of $g$ is greater than or equal to zero, where $g = \mathrm{DISC} = \frac{\Pr(\mathrm{rel})}{1 - \Pr(\mathrm{rel})} - \frac{a_1}{a_2}$. The relevance properties of a record must be related to the relevance properties of the various terms attached to the record. The probabilities that a document is relevant and not relevant, given that it has been selected, are defined by $\Pr(\mathrm{rel} \mid \text{selected})$ and $\Pr(\text{non-rel} \mid \text{selected})$ respectively.
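
The loss-based retrieval rule from the preceding slides can be sketched in a few lines. The sketch assumes the rule is "retrieve when a2 · Pr(rel) ≥ a1 · [1 − Pr(rel)]", which is the same as requiring the discriminant g = Pr(rel) / [1 − Pr(rel)] − a1/a2 to be non-negative. The function names and the cost values below are hypothetical.

```python
def discriminant(p_rel: float, a1: float, a2: float) -> float:
    """g = p / (1 - p) - a1 / a2; an item is retrieved when g >= 0."""
    if not 0.0 <= p_rel < 1.0:
        raise ValueError("p_rel must lie in [0, 1)")
    if a2 <= 0.0:
        raise ValueError("a2 must be positive")
    return p_rel / (1.0 - p_rel) - a1 / a2


def should_retrieve(p_rel: float, a1: float, a2: float) -> bool:
    """Retrieve when the loss of rejecting a relevant item, a2 * p,
    is at least the loss of retrieving a non-relevant one, a1 * (1 - p)."""
    return a2 * p_rel >= a1 * (1.0 - p_rel)


# Hypothetical costs: missing a relevant record is twice as costly (a2 = 2)
# as retrieving a non-relevant one (a1 = 1).
a1, a2 = 1.0, 2.0
for p in (0.2, 0.5):
    print(p, should_retrieve(p, a1, a2), discriminant(p, a1, a2))
```

With these costs the rule amounts to retrieving any item whose probability of relevance is at least a1 / (a1 + a2) = 1/3, so raising a2 relative to a1 lowers the bar for retrieval.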

Historical Background The first attempts to develop a probabilistic theory of retrieval were made over 30 years ago [Maron and Kuhns 1960; Miller 1971], and since then there has been a steady development of the approach. There are already several operational IR systems based upon probabilistic or semiprobabilistic models. One major obstacle in probabilistic or semiprobabilistic IR models is finding methods for estimating the probabilities used to evaluate the probability of relevance that are both theoretically sound and computationally efficient.

Conclusion