Information Retrieval Models

4,340 views 24 slides Mar 05, 2020

About This Presentation

Describes various information retrieval models

Slide Content

Chapter 2: Modeling
Computer Science and Information Engineering, Class 4B, 86075800
陳建勳

Introduction
Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
An index term is a keyword (or group of related words) which has some meaning of its own (usually a noun).

The advantages of using index terms
Simple
The semantics of the documents and of the user information need can be naturally expressed through sets of index terms.
Ranking algorithms are at the core of information retrieval systems (predicting which documents are relevant and which are not).

A taxonomy of information retrieval models
User task
  Retrieval: Ad hoc, Filtering
  Browsing
Retrieval models
  Classic Models: Boolean, Vector, Probabilistic
    Set Theoretic: Fuzzy, Extended Boolean
    Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Networks
    Probabilistic: Inference Network, Belief Network
  Structured Models: Non-overlapping Lists, Proximal Nodes
Browsing models
  Flat, Structure Guided, Hypertext

User task: Retrieval
  Index Terms: Classic, Set Theoretic, Algebraic, Probabilistic
  Full Text: Classic, Set Theoretic, Algebraic, Probabilistic
  Full Text + Structure: Structured
User task: Browsing
  Index Terms: Flat
  Full Text: Flat, Hypertext
  Full Text + Structure: Structure Guided, Hypertext
Figure 2.2: Retrieval models most frequently associated with distinct combinations of a document logical view and a user task.

Retrieval: Ad hoc and Filtering
Ad hoc: The documents in the collection remain relatively static while new queries are submitted to the system.
Filtering: The queries remain relatively static while new documents come into the system.

Filtering
Typically, the filtering task simply indicates to the user the documents which might be of interest to them.
Routing: Rank the filtered documents and show this ranking to the user.
User profiles can be constructed in two ways.
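
The difference between filtering and routing can be sketched in a few lines. This is a minimal sketch, assuming the user profile is simply a set of index terms and that a document is scored by how many profile terms it contains. The names `route`, `profile`, and `stream`, and the overlap score itself, are hypothetical choices for illustration and do not come from the slides.

```python
def route(profile, incoming_docs):
    """Rank incoming documents against a static user profile.

    profile: set of index terms (assumed representation of the user's interests).
    incoming_docs: dict mapping a document id to its text.
    Returns (doc_id, score) pairs with score > 0, best first.
    """
    ranked = []
    for doc_id, text in incoming_docs.items():
        terms = set(text.lower().split())
        score = len(profile & terms)   # illustrative score: number of shared terms
        if score > 0:                  # filtering: drop documents with no match
            ranked.append((doc_id, score))
    # Routing: order the filtered documents by score (ties broken by id).
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


profile = {"retrieval", "ranking", "index"}
stream = {
    "d1": "ranking algorithms for information retrieval",
    "d2": "a guide to cooking",
    "d3": "building an inverted index",
}
print(route(profile, stream))  # [('d1', 2), ('d3', 1)]
```

Plain filtering would stop at the `score > 0` check and return the matching documents unordered. Routing adds the final sort.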

A formal characterization of IR models
D: A set composed of logical views (or representations) for the documents in the collection.
Q: A set composed of logical views (or representations) for the user information needs (queries).
F: A framework for modeling document representations, queries, and their relationships.
R(q_i, d_j): A ranking function which defines an ordering among the documents with regard to the query q_i.

Classic information retrieval model
Basic concepts: Each document is described by a set of representative keywords called index terms.
Numerical weights are assigned to the index terms of a document, because distinct index terms have varying relevance when describing its contents.

Define
k_i: A generic index term
K: The set of all index terms {k_1, …, k_t}
w_{i,j}: A weight associated with index term k_i of a document d_j
g_i: A function that returns the weight associated with k_i in any t-dimensional vector (g_i(d_j) = w_{i,j})
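
The definitions above can be made concrete with a toy example. This is a minimal sketch, assuming a three-term vocabulary and made-up weights. The vocabulary, the weights, and the 1-based indexing in `g` are illustrative assumptions only.

```python
# Index terms k_1 .. k_t (toy vocabulary, so t = 3 here).
K = ["information", "retrieval", "model"]

# A document d_j is represented as a t-dimensional vector of weights w_{i,j}.
d_1 = [0.5, 0.0, 1.2]   # hypothetical weights, not taken from the slides


def g(i, d_j):
    """Return w_{i,j}, the weight of index term k_i in document vector d_j.

    i is 1-based to match the k_1 .. k_t notation.
    """
    return d_j[i - 1]


print(g(3, d_1))  # weight of k_3 ("model") in d_1
```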

Boolean model
Based on a binary decision criterion without any notion of a grading scale.
Boolean expressions have precise semantics, but it is not simple to translate an information need into a Boolean expression.
A query can be represented as a disjunction of conjunctive vectors (in disjunctive normal form, DNF).
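
To illustrate the DNF representation, here is a minimal sketch using a hypothetical query q = ka AND (kb OR NOT kc). Over the term order (ka, kb, kc) its disjunctive normal form has the conjunctive components (1,1,1), (1,1,0), and (1,0,0). The query, the term names, and the function name `boolean_sim` are assumptions made for this example.

```python
# q = ka AND (kb OR NOT kc), in disjunctive normal form over (ka, kb, kc).
# Each tuple is one conjunctive component: 1 = term present, 0 = term absent.
QUERY_TERMS = ("ka", "kb", "kc")
Q_DNF = {(1, 1, 1), (1, 1, 0), (1, 0, 0)}


def boolean_sim(doc_terms, q_dnf=Q_DNF, query_terms=QUERY_TERMS):
    """Return 1 if the document matches any conjunctive component, else 0."""
    # Binary view of the document restricted to the query's terms.
    doc_vector = tuple(1 if term in doc_terms else 0 for term in query_terms)
    return 1 if doc_vector in q_dnf else 0


print(boolean_sim({"ka", "kb"}))   # 1: matches component (1, 1, 0)
print(boolean_sim({"ka", "kc"}))   # 0: (1, 0, 1) is not a component
```

The result is strictly 0 or 1, which is the "binary decision criterion" of the model: a document is either relevant or not, with no partial match.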

Vector model
Assign non-binary weights to index terms in queries and in documents.
Compute the degree of similarity between each document and the query.
More precise than the Boolean model.
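
The slides do not name a similarity measure. A common choice in the vector model is the cosine of the angle between the document and query vectors, and the sketch below assumes that choice. The weights and document names are made up for illustration.

```python
import math


def cosine_sim(d, q):
    """Cosine of the angle between document vector d and query vector q."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0   # an all-zero vector shares nothing with anything
    return dot / (norm_d * norm_q)


# Hypothetical non-binary weights over a three-term vocabulary.
query = [1.0, 0.0, 1.0]
docs = {
    "d1": [2.0, 0.0, 2.0],
    "d2": [0.0, 3.0, 0.0],
    "d3": [1.0, 1.0, 0.0],
}

# Rank documents by decreasing similarity to the query.
ranking = sorted(docs, key=lambda name: cosine_sim(docs[name], query), reverse=True)
print(ranking)  # ['d1', 'd3', 'd2']
```

Unlike the Boolean model, `d3` is retrieved with a partial score even though it matches only one of the two query terms.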

Idea
We think of the documents as a collection C of objects and think of the user query as a specification of a set A of objects. In this scenario, the IR problem can be reduced to the problem of determining which documents are in the set A and which ones are not (i.e., the IR problem can be viewed as a clustering problem).

Intra-cluster: One needs to determine which features better describe the objects in the set A.
Inter-cluster: One needs to determine which features better distinguish the objects in the set A from the remaining objects in the collection C.

tf: Intra-cluster similarity is quantified by measuring the raw frequency of a term k_i inside a document d_j. Such term frequency is usually referred to as the tf factor and provides one measure of how well that term describes the document contents.
idf: Inter-cluster dissimilarity is quantified by measuring the inverse of the frequency of a term k_i among the documents in the collection. This factor is often referred to as the inverse document frequency.
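
The two factors are usually multiplied into a single term weight. The sketch below assumes one common formulation: tf is the raw frequency divided by the largest raw frequency in the document, and idf is log(N / n_i), where N is the number of documents and n_i the number of documents containing k_i. The slides do not give the exact formula, so both the normalization and the function name `tf_idf` are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # n_i: number of documents in which term k_i appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:
            weights.append({})
            continue
        freq = Counter(doc)              # raw frequency of each term in d_j
        max_freq = max(freq.values())
        weights.append({
            # Normalized tf (assumed) times idf = log(N / n_i).
            term: (count / max_freq) * math.log(n_docs / df[term])
            for term, count in freq.items()
        })
    return weights


docs = [
    ["retrieval", "model", "model"],
    ["retrieval", "index"],
    ["retrieval", "browsing"],
]
w = tf_idf(docs)
print(w[0])
```

In this toy collection "retrieval" occurs in every document, so its idf is log(1) = 0 and it gets weight 0 everywhere: it cannot distinguish one document from another. "model" occurs only in the first document and receives the highest weight there.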

Vector model is simple and fast. It’s a
popular retrieval model.
Disadvantage : Index terms are
assumed to be mutually independent. It
doesn’t account for index term
dependencies.

Probabilistic model
We can think of the querying process as a process of specifying the properties of an ideal answer set. The problem is that we do not know exactly what these properties are.

Structured text retrieval model
Retrieval models which combine information on text content with information on the document structure are called structured text retrieval models.
Match point: refers to the position in the text of a sequence of words which matches the user query.
Region: refers to a contiguous portion of the text.
Node: refers to a structural component of the document, such as a chapter, a section, or a subsection.

Model based on non-overlapping lists
Divide the whole text of each document into non-overlapping text regions, which are collected in a list.
Text regions in the same list do not overlap, but text regions from distinct lists might overlap.
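
The non-overlap property can be checked mechanically. This is a minimal sketch, assuming a region is a (start, end) pair of character offsets with an exclusive end. The representation, the sample offsets, and the name `is_non_overlapping` are assumptions for illustration.

```python
def is_non_overlapping(regions):
    """True if no two (start, end) regions overlap (end is exclusive)."""
    ordered = sorted(regions)
    # After sorting by start, it is enough to compare each region with the next.
    return all(
        prev_end <= next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )


# Two separate lists over the same document text (hypothetical offsets).
chapters = [(0, 100), (100, 250)]
sections = [(0, 40), (40, 100), (100, 180), (180, 250)]

print(is_non_overlapping(chapters))             # True
print(is_non_overlapping(sections))             # True
print(is_non_overlapping(chapters + sections))  # False
```

Each list is valid on its own, while mixing the two lists fails the check: a section lies inside a chapter, which is exactly the overlap the model allows between distinct lists.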

Model based on proximal nodes
A model which allows the definition of independent hierarchical indexing structures over the same document text.
Each of these index structures is a strict hierarchy composed of chapters, sections, paragraphs, pages, and lines, which are called nodes.

Models for browsing
Flat browsing
Structure guided browsing
The hypertext model

Flat browsing
The documents might be represented as dots in a plane or as elements in a list.
Relevance feedback
Disadvantage: In a given page or screen there may not be any indication of the context the user is in.

Structure guided browsing
Documents are organized in a directory structure, which groups documents covering related topics.
The same idea can be applied to a single document.
A history map can be used.

The hypertext model
Written text is usually conceived to be
read sequentially.
The reader should not expect to fully
understand the message conveyed by
the writer by randomly reading pieces
of text here and there.
