Differentiate between parallel IR and distributed IR.ppt

MARasheed3 4 views 24 slides Mar 05, 2025

Parallel and Distributed IR
Eric Brown

Parallel Computing
SISD: single instruction stream, single data stream.
SIMD: single instruction stream, multiple data streams.
MISD: multiple instruction streams, single data stream.
MIMD: multiple instruction streams, multiple data streams.

Performance Measures
Speedup:
$$S = \frac{\text{running time of best available sequential algorithm}}{\text{running time of parallel algorithm}}$$

Amdahl's law:
$$S \le \frac{1}{f + (1 - f)/N} \le \frac{1}{f}$$

Efficiency:
$$\frac{S}{N}$$
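
The measures on this slide can be checked numerically. The sketch below is only an illustration: it reads f as the fraction of the work that must run sequentially and N as the number of processors (the usual reading of this bound, assumed here because the slide does not define them), and the function names are hypothetical.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """S: best sequential running time divided by parallel running time."""
    return sequential_time / parallel_time


def amdahl_bound(f: float, n: int) -> float:
    """Upper bound on S, assuming f is the sequential fraction of the work
    and n is the number of processors (both are assumptions of this sketch)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (f + (1.0 - f) / n)


def efficiency(s: float, n: int) -> float:
    """Speedup per processor, S / N."""
    return s / n


if __name__ == "__main__":
    # With 10% sequential work the bound stays below 1 / f = 10
    # no matter how many processors are added.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_bound(0.1, n), 3))
```

The loop shows the bound flattening as N grows, which is why the sequential fraction rather than the processor count tends to limit the achievable speedup.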

Parallel IR
Introduction:

Develop new retrieval strategies that directly
lend themselves to parallel implementation.

Adapt existing, well-studied information retrieval
algorithms to parallel processing.

MIMD Architecture
Inverted Files

Logical Document Partitioning
Uses essentially the same underlying inverted file
index as the original sequential algorithm.

Physical Document Partitioning
Each subcollection has its own inverted file, and the
search processes share nothing during query evaluation.
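
A minimal sketch of physical document partitioning, under stated assumptions: documents are dealt round-robin into subcollections, each subcollection gets its own inverted file, and a broker merges the per-partition results. The scoring (summed term frequency) and all function names are illustrative choices, not taken from the slides.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted file: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, query):
    """Score documents by summed term frequency (illustrative scoring only)."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return scores


def partition(docs, n):
    """Deal documents round-robin into n disjoint subcollections."""
    parts = [dict() for _ in range(n)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        parts[i % n][doc_id] = text
    return parts


def partitioned_search(indexes, query, k=10):
    """Evaluate the query on every partition independently, then merge."""
    merged = Counter()
    for index in indexes:  # each iteration could run on its own processor
        merged.update(search(index, query))
    return sorted(merged.items(), key=lambda p: (-p[1], p[0]))[:k]


if __name__ == "__main__":
    docs = {
        "d1": "parallel retrieval",
        "d2": "distributed retrieval retrieval",
        "d3": "signature files",
    }
    indexes = [build_index(part) for part in partition(docs, 2)]
    print(partitioned_search(indexes, "retrieval"))
```

Because the partitions hold disjoint documents, the only shared step is the final merge, which matches the share-nothing evaluation described above.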

MIMD Architecture
Logical document partitioning requires less
communication than physical document partitioning
with similar parallelization, and so is likely to
provide better overall performance.
Physical document partitioning, on the other hand,
offers more flexibility, and converting an existing
IR system into a parallel IR system is simpler using
physical document partitioning.

MIMD Architectures
Term partitioning

When term partitioning is used, a single inverted file is
created for the document collection and the inverted lists
are spread across the processors.
Assuming each processor has its own I/O channel
and disks: when the term distributions in the documents
and the queries are more skewed, document partitioning
performs better; when terms are uniformly
distributed in user queries, term partitioning performs
better.
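
A minimal sketch of term partitioning, assuming each term's inverted list is assigned to a processor by hashing the term. The hash-based assignment, the scoring, and the function names are assumptions made for illustration; the slides do not say how lists are assigned.

```python
import zlib
from collections import Counter, defaultdict


def owner(term, n_procs):
    """Pick the processor holding a term's inverted list.
    CRC32 is used so the assignment is stable across runs."""
    return zlib.crc32(term.encode("utf-8")) % n_procs


def build_term_partitions(docs, n_procs):
    """One logical inverted file, with its lists spread across processors."""
    shards = [defaultdict(dict) for _ in range(n_procs)]
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            shards[owner(term, n_procs)][term][doc_id] = tf
    return shards


def processors_for_query(query, n_procs):
    """Processors that must take part in evaluating this query."""
    return {owner(term, n_procs) for term in query.lower().split()}


def search(shards, query):
    """Fetch each query term's list from its owning shard and sum the scores."""
    scores = Counter()
    for term in query.lower().split():
        postings = shards[owner(term, len(shards))].get(term, {})
        for doc_id, tf in postings.items():
            scores[doc_id] += tf
    return scores


if __name__ == "__main__":
    docs = {"d1": "parallel retrieval", "d2": "distributed retrieval retrieval"}
    shards = build_term_partitions(docs, 3)
    print(search(shards, "retrieval"))
    print(processors_for_query("parallel distributed retrieval", 3))
```

A query touches only the processors that own its terms, so a few very frequent query terms can overload their owners, which is consistent with the skew sensitivity noted above.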

SIMD Architecture
Signature Files

SIMD Architectures
Inverted Files

Distributed IR
Introduction

A distributed computing system can be viewed
as a MIMD parallel processor with relatively
slow inter-processor communication channels and
the freedom to employ a heterogeneous
collection of processors in the system.

Distributed IR
Introduction

The distributed model is very similar to the MIMD
parallel processing model.

The main difference is that subtasks run on
different computers and the communication
between the subtasks is performed using a network
protocol such as TCP/IP.

Collection Partitioning
The procedure used to add documents to
search servers in a distributed IR system
depends on a number of factors.

Consider whether or not the system is centrally
administered.

Collection Partitioning

When the distributed system is centrally
administered, more options are available.
The first option is simple replication of the collection
across all of the search servers.
The second option is random distribution of the
documents.
The final option is explicit semantic partitioning of the
documents.
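
The three options can be sketched side by side. This is an illustration under assumptions: documents are a dict from id to text, and semantic partitioning relies on a hypothetical `topic_of` mapping supplied by an administrator, since the slides do not say how topics are determined.

```python
import random
from collections import defaultdict


def replicate(docs, n_servers):
    """Option 1: every search server holds a full copy of the collection."""
    return [dict(docs) for _ in range(n_servers)]


def distribute_randomly(docs, n_servers, seed=0):
    """Option 2: each document goes to one randomly chosen server.
    The seed is fixed only to make this sketch reproducible."""
    rng = random.Random(seed)
    servers = [dict() for _ in range(n_servers)]
    for doc_id in sorted(docs):
        servers[rng.randrange(n_servers)][doc_id] = docs[doc_id]
    return servers


def partition_by_topic(docs, topic_of):
    """Option 3: explicit semantic partitioning, one server per topic.
    topic_of (doc_id -> topic label) is a hypothetical input."""
    servers = defaultdict(dict)
    for doc_id, text in docs.items():
        servers[topic_of[doc_id]][doc_id] = text
    return dict(servers)


if __name__ == "__main__":
    docs = {"a": "inverted files", "b": "gene expression", "c": "signature files"}
    print(len(replicate(docs, 3)))
    print([sorted(s) for s in distribute_randomly(docs, 2)])
    print(partition_by_topic(docs, {"a": "ir", "b": "bio", "c": "ir"}))
```

Replication lets any server answer any query, random distribution spreads the load evenly, and semantic partitioning allows a query to be sent only to the servers likely to hold relevant documents.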

Source Selection
Source selection is the process of determining which of the
distributed document collections are most likely to contain
relevant documents for the current query, and therefore
should receive the query for processing.
The basic technique is to treat each collection as if it were a
single large document, index the collections, and evaluate
the query against the collections to produce a ranked listing
of collections.
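
A minimal sketch of the basic technique, treating each collection as one large document and ranking the collections for a query. The score used here (query-term occurrences divided by collection length) is an assumed stand-in; the slides do not give a ranking formula, and the function names are hypothetical.

```python
from collections import Counter


def collection_profile(docs):
    """Merge all documents of a collection into one bag of words,
    that is, treat the collection as a single large document."""
    profile = Counter()
    for text in docs.values():
        profile.update(text.lower().split())
    return profile


def rank_collections(collections, query):
    """Rank collections by relative frequency of the query terms
    (an illustrative score, not the source's formula)."""
    terms = query.lower().split()
    ranked = []
    for name, docs in collections.items():
        profile = collection_profile(docs)
        length = sum(profile.values()) or 1  # avoid dividing by zero
        score = sum(profile[t] for t in terms) / length
        ranked.append((name, score))
    ranked.sort(key=lambda p: (-p[1], p[0]))
    return ranked


if __name__ == "__main__":
    collections = {
        "cs": {"a": "parallel retrieval index", "b": "inverted index"},
        "bio": {"c": "protein folding", "d": "gene index"},
    }
    print(rank_collections(collections, "index"))
```

The query would then be sent only to the top-ranked collections; how many to keep is a tuning decision left open here.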

Query Processing
Query processing in a distributed IR system proceeds
as follows:
Select the collections to search.
Distribute the query to the selected collections.
Evaluate the query at the distributed collections in parallel.
Combine the results from the distributed collections into a final result.
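
The four steps can be sketched end to end. In this illustration threads stand in for remote search servers, selection simply keeps collections containing at least one query term, and results are merged by raw score; all three are simplifying assumptions, and a real system would need scores that are comparable across collections.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(docs, query):
    """Evaluate the query at one collection (term-count scoring, illustrative)."""
    terms = query.lower().split()
    results = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            results.append((doc_id, score))
    return results


def select_collections(collections, query):
    """Step 1: keep collections that contain at least one query term."""
    terms = set(query.lower().split())
    return [
        name
        for name, docs in collections.items()
        if any(terms & set(text.lower().split()) for text in docs.values())
    ]


def process_query(collections, query, k=10):
    selected = select_collections(collections, query)
    if not selected:
        return []
    # Steps 2 and 3: send the query to each selected collection and
    # evaluate in parallel (threads stand in for network calls here).
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        partials = list(
            pool.map(lambda name: evaluate(collections[name], query), selected)
        )
    # Step 4: combine the partial results into one ranked list.
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda h: (-h[1], h[0]))
    return merged[:k]


if __name__ == "__main__":
    collections = {
        "cs": {"a": "parallel retrieval index", "b": "inverted index"},
        "bio": {"c": "protein folding", "d": "gene index"},
    }
    print(process_query(collections, "index"))
```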

Web Issues
The parallel and distributed techniques
described above can be used directly, as
if the Web were any other large document
collection. This is the approach currently
taken by most of the popular Web search
services.

Trends and Research Issues
The trend in parallel hardware is the development of
general MIMD machines.
Many challenges remain in the area of parallel and
distributed text retrieval.

The first challenge is measuring retrieval effectiveness
on large text collections.

The second significant challenge is interoperability, or
building distributed IR systems from heterogeneous
components.