Differentiate between parallel IR and distributed IR.ppt

MARasheed3 4 views 24 slides Mar 05, 2025

About This Presentation

Size: 191.08 KB

Language: en

Added: Mar 05, 2025

Slides: 24 pages

Slide Content

Parallel and Distributed IR
Eric Brown

Parallel Computing
SISD: single instruction stream, single data stream.
SIMD: single instruction stream, multiple data streams.
MISD: multiple instruction streams, single data stream.
MIMD: multiple instruction streams, multiple data streams.

Performance Measures
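Speedup:
$$S = \frac{\text{Running time of best available sequential algorithm}}{\text{Running time of parallel algorithm}}$$
Amdahl's law, where f is the fraction of the computation that must run sequentially and N is the number of processors:
$$S \le \frac{1}{f + (1 - f)/N} \le \frac{1}{f}$$
Efficiency:
$$\text{efficiency} = \frac{S}{N}$$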
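
As a small worked illustration of these measures, the sketch below computes the speedup S, the Amdahl's-law upper bound for a sequential fraction f on N processors, and the efficiency S/N. The function names and the sample timings are assumptions chosen for illustration; they do not come from the slides.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """S = running time of best sequential algorithm / running time of parallel algorithm."""
    return sequential_time / parallel_time


def amdahl_bound(f: float, n: int) -> float:
    """Upper bound on S when a fraction f of the work must run sequentially on n processors."""
    return 1.0 / (f + (1.0 - f) / n)


def efficiency(s: float, n: int) -> float:
    """Speedup per processor, S / N."""
    return s / n


if __name__ == "__main__":
    n = 8
    f = 0.1  # hypothetical sequential fraction
    s = speedup(sequential_time=120.0, parallel_time=30.0)  # hypothetical timings
    print(f"speedup S            = {s:.2f}")
    print(f"Amdahl bound (N={n})  = {amdahl_bound(f, n):.2f}")
    print(f"limit 1/f as N grows = {1.0 / f:.2f}")
    print(f"efficiency S/N       = {efficiency(s, n):.2f}")
```

Note that the bound never exceeds 1/f no matter how many processors are added, which is why the sequential fraction dominates at large N.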

Parallel IR
Introduction:

Develop new retrieval strategies that directly
lend themselves to parallel implementation.

Adapt existing, well-studied information retrieval
algorithms to parallel processing.

MIMD Architecture
Inverted Files

Logical Document Partitioning
Uses essentially the same basic underlying inverted file
index as in the original sequential algorithm.

Physical Document Partitioning
Each subcollection has its own inverted file, and the
search processes share nothing during query evaluation.
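
A minimal sketch of the two document-partitioning schemes, assuming a toy in-memory collection and Boolean AND queries. The names `build_index`, `search_logical`, and `search_physical` are hypothetical, and the workers are simulated one after another rather than run on separate processors.

```python
from collections import defaultdict

# Toy collection (assumed for illustration): doc ID -> text.
DOCS = {
    0: "parallel retrieval on mimd machines",
    1: "distributed retrieval over a network",
    2: "inverted file index for retrieval",
    3: "signature files on simd machines",
}


def build_index(docs):
    """Build an inverted file: term -> sorted list of document IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            index[term].append(doc_id)
    return index


def search(index, terms, doc_range=None):
    """AND query, optionally restricted to doc IDs in [lo, hi)."""
    result = None
    for term in terms:
        postings = index.get(term, [])
        if doc_range is not None:
            lo, hi = doc_range
            postings = [d for d in postings if lo <= d < hi]
        result = set(postings) if result is None else result & set(postings)
    return sorted(result or [])


def search_logical(docs, terms, n_workers):
    """Logical partitioning: one shared index; each worker scans its own doc-ID range."""
    index = build_index(docs)
    ids = sorted(docs)
    step = -(-len(ids) // n_workers)  # ceiling division
    hits = []
    for w in range(n_workers):
        chunk = ids[w * step:(w + 1) * step]
        if chunk:
            hits += search(index, terms, (chunk[0], chunk[-1] + 1))
    return sorted(hits)


def search_physical(docs, terms, n_workers):
    """Physical partitioning: each worker owns a subcollection with its own index."""
    ids = sorted(docs)
    hits = []
    for w in range(n_workers):
        sub = {d: docs[d] for d in ids[w::n_workers]}
        hits += search(build_index(sub), terms)  # nothing is shared between workers
    return sorted(hits)


if __name__ == "__main__":
    print(search_logical(DOCS, ["retrieval"], 2))
    print(search_physical(DOCS, ["retrieval", "machines"], 2))
```

Both functions return the same documents; they differ only in whether the workers read one shared inverted file or each build and read their own.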

MIMD Architecture
Logical document partitioning requires less
communication than physical document partitioning with
similar parallelization, and so is likely to provide
better overall performance.
Physical document partitioning, on the other hand,
offers more flexibility, and converting an existing
IR system into a parallel IR system is simpler using
physical document partitioning.

MIMD Architectures
Term partitioning

When term partitioning is used with an inverted file, an
inverted file is created for the document collection and the
inverted lists are spread across the processors.
Assuming each processor has its own I/O channel
and disks: when term distributions in the documents
and the queries are more skewed, document partitioning
performs better. When terms are uniformly
distributed in user queries, term partitioning performs
better.
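
The idea of spreading inverted lists across processors can be sketched as follows. This is a minimal illustration, assuming a toy collection, OR queries, and a round-robin assignment of sorted terms to processors; the assignment policy and all names are hypothetical. The returned `load` list counts how many query terms each processor had to serve, which is the quantity that becomes unbalanced when query terms are skewed.

```python
from collections import defaultdict

# Toy collection (assumed for illustration): doc ID -> text.
DOCS = {
    0: "parallel retrieval on mimd machines",
    1: "distributed retrieval over a network",
    2: "inverted file index for retrieval",
}


def build_index(docs):
    """Build one inverted file for the whole collection: term -> set of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def partition_terms(index, n_processors):
    """Spread the inverted lists across processors, round-robin over sorted terms."""
    shards = [dict() for _ in range(n_processors)]
    owner = {}
    for i, term in enumerate(sorted(index)):
        p = i % n_processors
        shards[p][term] = index[term]
        owner[term] = p
    return shards, owner


def search(shards, owner, terms):
    """OR query: each term is sent to the processor owning its list; results are merged."""
    load = [0] * len(shards)
    hits = set()
    for term in terms:
        p = owner.get(term)
        if p is None:
            continue  # term does not occur in the collection
        load[p] += 1
        hits |= shards[p][term]
    return sorted(hits), load


if __name__ == "__main__":
    shards, owner = partition_terms(build_index(DOCS), n_processors=2)
    print(search(shards, owner, ["parallel", "retrieval"]))
    # A skewed workload: the same term over and over hits a single processor.
    print(search(shards, owner, ["retrieval"] * 4))
```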

SIMD Architecture
Signature Files

SIMD Architectures
Inverted Files

Distributed IR
Introduction

A distributed computing system can be viewed
as a MIMD parallel processor with relatively
slow inter-processor communication channels and
the freedom to employ a heterogeneous
collection of processors in the system.

Distributed IR
Introduction

The distributed model is very similar to the MIMD
parallel processing model.

The main difference here is that subtasks run on
different computers and the communication
between the subtasks is performed using a network
protocol such as TCP/IP.

Collection Partitioning
The procedure used to assign documents to
search servers in a distributed IR system
depends on a number of factors.

Consider whether or not the system is centrally
administered.

Collection Partitioning

When the distributed system is centrally
administered, more options are available.
The first option is simple replication of the collection
across all of the search servers.
The second option is random distribution of the
documents.
The final option is explicit semantic partitioning of the
documents.
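
The three options above can be sketched as three placement functions. This is an illustrative sketch only: the documents, the server names, the fixed random seed, and the use of a pre-assigned topic label as the "semantic" key are all assumptions, not details from the slides.

```python
import random

# Toy collection (assumed): doc ID -> (topic label, text).
DOCS = {
    "d1": ("sports", "match report"),
    "d2": ("sports", "league table"),
    "d3": ("science", "protein folding"),
    "d4": ("science", "galaxy survey"),
}
SERVERS = ["s0", "s1"]


def replicate(docs, servers):
    """Option 1: every search server holds the whole collection."""
    return {s: sorted(docs) for s in servers}


def distribute_randomly(docs, servers, seed=0):
    """Option 2: each document goes to one server chosen at random."""
    rng = random.Random(seed)  # seeded so the placement is repeatable
    placement = {s: [] for s in servers}
    for doc_id in sorted(docs):
        placement[rng.choice(servers)].append(doc_id)
    return placement


def partition_by_topic(docs, topic_to_server):
    """Option 3: documents on the same topic are placed on the same server."""
    placement = {s: [] for s in set(topic_to_server.values())}
    for doc_id in sorted(docs):
        topic = docs[doc_id][0]
        placement[topic_to_server[topic]].append(doc_id)
    return placement


if __name__ == "__main__":
    print(replicate(DOCS, SERVERS))
    print(distribute_randomly(DOCS, SERVERS))
    print(partition_by_topic(DOCS, {"sports": "s0", "science": "s1"}))
```

With replication any server can answer any query; with random distribution every server must be queried; with semantic partitioning a query can be routed to the servers whose topics match it.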

Source Selection
Source selection is the process of determining which of the
distributed document collections are most likely to contain
relevant documents for the current query, and therefore
should receive the query for processing.
The basic technique is to treat each collection as if it were a
single large document, index the collections, and evaluate
the query against the collections to produce a ranked listing
of collections.
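
The basic technique can be sketched as below: each collection is collapsed into one bag of terms, and the query is scored against those bags to rank the collections. The scoring formula (term frequency weighted by the logarithm of an inverse collection frequency) is a simple stand-in chosen for illustration, not a formula given in the slides, and the collections are made up.

```python
import math
from collections import Counter

# Toy collections (assumed): collection name -> list of document texts.
COLLECTIONS = {
    "news": ["election results announced", "football match results"],
    "medicine": ["clinical trial results", "protein drug trial"],
    "astronomy": ["galaxy survey telescope", "telescope mirror design"],
}


def collection_profiles(collections):
    """Treat each collection as a single large document: a bag of all its terms."""
    return {name: Counter(" ".join(docs).split()) for name, docs in collections.items()}


def rank_collections(query, profiles):
    """Rank collections by sum over query terms of tf * log(1 + N / cf).

    tf = occurrences of the term in the collection,
    cf = number of collections containing the term, N = number of collections.
    """
    n = len(profiles)
    scores = {}
    for name, profile in profiles.items():
        score = 0.0
        for term in query.split():
            cf = sum(1 for p in profiles.values() if term in p)
            if cf:
                score += profile[term] * math.log(1 + n / cf)
        scores[name] = score
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    profiles = collection_profiles(COLLECTIONS)
    for name, score in rank_collections("drug trial", profiles):
        print(f"{name:10s} {score:.3f}")
```

The query would then be sent only to the top-ranked collections, for example those with a non-zero score.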

Query Processing
Query processing in a distributed IR system proceeds
as follows:
Select the collections to search.
Distribute the query to the selected collections.
Evaluate the query at the selected collections in parallel.
Combine the results from the selected collections into the final result.
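
The four steps can be sketched end to end as follows. This is a minimal sketch under several assumptions: the collections live in one process, threads stand in for remote search servers, the selector and the scoring function are deliberately naive, and all names are hypothetical. Because every collection here uses the same scoring function, the scores can be merged directly; in a real system scores from different servers may not be comparable.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy collections (assumed): collection name -> {doc ID: text}.
COLLECTIONS = {
    "news": {"n1": "election results", "n2": "football results"},
    "medicine": {"m1": "clinical trial results", "m2": "drug trial"},
    "astronomy": {"a1": "galaxy survey"},
}


def select_collections(query, collections):
    """Step 1: keep collections containing at least one query term (naive selector)."""
    terms = set(query.split())
    return [name for name, docs in collections.items()
            if any(terms & set(text.split()) for text in docs.values())]


def evaluate(query, docs):
    """Step 3: score each document by its number of matching query terms."""
    terms = set(query.split())
    scored = ((len(terms & set(text.split())), doc_id) for doc_id, text in docs.items())
    return [(score, doc_id) for score, doc_id in scored if score > 0]


def process_query(query, collections, k=3):
    selected = select_collections(query, collections)
    # Steps 2 and 3: send the query to each selected collection, evaluate in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(selected))) as pool:
        partials = list(pool.map(lambda name: evaluate(query, collections[name]), selected))
    # Step 4: merge the partial result lists into one ranked list of k hits.
    merged = [hit for part in partials for hit in part]
    return heapq.nsmallest(k, merged, key=lambda h: (-h[0], h[1]))


if __name__ == "__main__":
    print(process_query("trial results", COLLECTIONS))
```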

Web Issues
The parallel and distributed techniques
described above can be used directly, as
if the Web were any other large document
collection. This is the approach currently
taken by most of the popular Web search
services.

Trends and Research Issues
The trend in parallel hardware is the development of
general MIMD machines.
Many challenges remain in the area of parallel and
distributed text retrieval.

The first challenge is measuring retrieval effectiveness
on large text collections.

The second significant challenge is interoperability, or
building distributed IR systems from heterogeneous
components.
