Query processing in Distributed Database System

7,547 views 21 slides Apr 08, 2020

About This Presentation

This is a PPT on DBMS. It covers the topic "Query Processing in Distributed Database Systems".


Slide Content

QUERY PROCESSING IN DISTRIBUTED DATABASE SYSTEMS
Presented by: Muskaan MCA/25020/18

OUTLINE
What is a Query?
What is a Query Processor?
Main Problems of Query Processing
Characteristics of Query Processor
Main Layers of Query Processing

What is a Query?
A query is a statement requesting the retrieval of information. A database query can be either a select query or an action query. A select query is a data retrieval query, while an action query asks for additional operations on the data, such as insertion, updating or deletion.

What is a Query Processor?
The query processor in a DBMS receives a query as input, parses it, generates an execution plan, and completes the processing by executing the plan and returning the results to the client. In a relational database, users carry out data processing and data manipulation with a high-level non-procedural language (e.g. SQL).
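
The parse, plan and execute steps can be observed on a single-site engine. The sketch below uses Python's built-in sqlite3 module as a stand-in for "the DBMS"; the emp table and its rows are made up for illustration and are not from the slides.

```python
import sqlite3

# Hypothetical single-site example: SQLite stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "J. Doe", "Elect. Eng."), (2, "M. Smith", "Analyst"), (3, "A. Lee", "Analyst")],
)

query = "SELECT ename FROM emp WHERE title = ? ORDER BY eno"

# Parsing and planning: ask the engine to describe the plan it generated.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, ("Analyst",)).fetchall()
plan_text = [row[-1] for row in plan_rows]  # last column is the plan description

# Execution: run the plan and return the results to the client.
result = [row[0] for row in conn.execute(query, ("Analyst",))]
print(plan_text)
print(result)
```

The exact wording of the plan description depends on the SQLite version, so it is printed rather than compared against a fixed string.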

The main function of a query processor is to transform a high-level query (also called a calculus query) into an equivalent lower-level query (also called an algebraic query). The high-level query hides the low-level details of the physical organization of the data from the user, so that even complex queries can be expressed in an easy, concise and simple fashion.
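
To make the calculus-to-algebra transformation concrete, here is a minimal sketch. The EMP relation, its attribute names and the helper functions select and project are illustrative assumptions, not part of the slides.

```python
# Made-up toy relation; attribute names are illustrative assumptions.
EMP = [
    {"eno": 1, "ename": "J. Doe", "title": "Elect. Eng."},
    {"eno": 2, "ename": "M. Smith", "title": "Analyst"},
    {"eno": 3, "ename": "A. Lee", "title": "Analyst"},
]

def select(relation, predicate):
    """Selection operator: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection operator: keep only the named attributes (duplicates kept)."""
    return [{a: t[a] for a in attributes} for t in relation]

# High-level (calculus-style) query, stating WHAT is wanted:
#   SELECT ename FROM emp WHERE title = 'Analyst'
# An equivalent lower-level (algebraic) query, stating HOW to compute it:
#   project[ename]( select[title = 'Analyst']( EMP ) )
analysts = project(select(EMP, lambda t: t["title"] == "Analyst"), ["ename"])
print(analysts)
```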

Main Problems of Query Processing
The main problem of query processing is query optimization. It is a time-consuming task, because many candidate execution strategies must be compared to find the one that minimizes computer resource consumption. The time and space required to process the query are also important factors in the performance of query processing.

Important Characteristics of Query Processor
Language
Types of Optimization
Optimization Timing
Statistics

Language: The input language of the query processor can be based on relational calculus or relational algebra.

Types of Optimization: Among all possible strategies for executing a query, the best one is the strategy that requires the least time and space.

Optimization Timing: The time spent optimizing a query is itself an important factor. The less time the optimizer needs to find a good execution strategy, the better for query processing as a whole.

Statistics: The effectiveness of query optimization relies on statistical information about the database, which helps determine how many fragments the query will need and which operation should be performed first.
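
As a rough illustration of how statistics can decide which operation goes first, the sketch below estimates join result sizes with the common formula card(R join S) = selectivity * card(R) * card(S). The relation names, cardinalities and selectivity factors are all made-up assumptions; a real optimizer would read them from its catalog.

```python
# Made-up statistics; a real optimizer would read these from the catalog.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50}
JOIN_SELECTIVITY = {("EMP", "ASG"): 0.0025, ("ASG", "PROJ"): 0.004}

def estimated_join_size(r, s):
    """Estimate card(r JOIN s) as selectivity * card(r) * card(s)."""
    return JOIN_SELECTIVITY[(r, s)] * CARD[r] * CARD[s]

# Estimate the intermediate result of each candidate first join.
candidates = {pair: estimated_join_size(*pair) for pair in JOIN_SELECTIVITY}

# Perform first the join expected to produce the smallest intermediate result.
first_join = min(candidates, key=candidates.get)
print(first_join)
```

With these numbers the ASG and PROJ join is estimated at about 200 tuples against about 1000 for EMP and ASG, so it is chosen first.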

Main Layers of Query Processing
Query processing involves 4 main layers:
1) Query Decomposition
2) Data Localization
3) Global Query Optimization
4) Distributed Execution

Fig. Generic Layering Scheme for Distributed Query Processing
Input: Calculus Query on Global Relations
Control site:
Query Decomposition (uses the Global Schema), output: Algebraic Query on Global Relations
Data Localization (uses the Fragment Schema), output: Algebraic Query on Fragments
Global Optimization (uses the Allocation Schema), output: Distributed Query Execution Plan
Local sites:
Distributed Execution

Query Decomposition
The first layer decomposes the calculus query into an algebraic query on global relations. Query decomposition can be viewed as four successive steps: 1) Normalization, 2) Analysis, 3) Elimination of redundancy, and 4) Rewriting.

Normalization: First, the calculus query is rewritten in a normalized form that is suitable for manipulation. For a query, this mainly means transforming its selection predicate into a normal form, such as conjunctive or disjunctive normal form.
Analysis: Second, the normalized query is analysed so that incorrect queries are detected and rejected as early as possible.
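
One common way to normalize a query predicate is to put it into conjunctive normal form (CNF). The sketch below is a minimal illustration of that idea, assuming predicates are nested tuples of the form (op, left, right), atoms are plain strings, and negations have already been pushed down into the atoms. The function name to_cnf and the atom names are hypothetical.

```python
def to_cnf(expr):
    """Return a list of clauses (frozensets of atoms) equivalent to expr.

    The list is read as an AND of clauses; each clause is an OR of atoms.
    """
    if isinstance(expr, str):  # a single atom, e.g. "dur = 12"
        return [frozenset([expr])]
    op, left, right = expr
    left_clauses, right_clauses = to_cnf(left), to_cnf(right)
    if op == "and":
        # AND of two CNF formulas: concatenate their clauses.
        return left_clauses + right_clauses
    if op == "or":
        # Distribute OR over AND: (A and B) or C == (A or C) and (B or C).
        return [a | b for a in left_clauses for b in right_clauses]
    raise ValueError(f"unknown operator: {op}")

# p1 OR (p2 AND p3) becomes (p1 OR p2) AND (p1 OR p3).
normalized = to_cnf(("or", "p1", ("and", "p2", "p3")))
print(normalized)
```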

Elimination of Redundancy: Third, the correct query is simplified. One way to simplify a query is to eliminate redundancy.
Rewriting: Fourth, the calculus query is restructured as an algebraic query. Several algebraic queries can be derived from the same calculus query, and some algebraic queries are “better” than others.
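
Redundancy elimination can be sketched on a predicate written as an AND of OR-clauses. The example below assumes each clause is a set of atom names (hypothetical p1, p2, p3) and applies only two well-known logical rules, idempotency (p AND p = p) and absorption (p AND (p OR q) = p). It is an illustration, not a complete simplifier.

```python
def eliminate_redundancy(clauses):
    """Simplify an AND of OR-clauses using idempotency and absorption."""
    # Idempotency: drop repeated clauses while preserving their order.
    unique = list(dict.fromkeys(frozenset(c) for c in clauses))
    # Absorption: a clause is redundant if another clause is a proper subset of it.
    return [c for c in unique if not any(other < c for other in unique)]

# p1 AND (p1 OR p2) AND p1 AND p3 simplifies to p1 AND p3.
simplified = eliminate_redundancy([{"p1"}, {"p1", "p2"}, {"p1"}, {"p3"}])
print(simplified)
```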

Localization of Distributed Data
The output of the first layer, an algebraic query on global relations, is the input to the second layer. The main role of this layer is to localize the query's data using data distribution information. Relations are fragmented into disjoint subsets, called fragments, and each fragment is stored at a different site.
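
A minimal sketch of localization, assuming a relation that is horizontally fragmented by ranges of a key: the query's range predicate is compared with each fragment's range, and only overlapping fragments are kept. The fragment names, sites and ranges are made up, and Fragment and localize are hypothetical helpers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    low: int   # smallest key stored in the fragment (inclusive)
    high: int  # largest key stored in the fragment (inclusive)

# Assumed fragmentation of a global relation EMP by employee number.
FRAGMENTS = [
    Fragment("EMP1", "site_A", 1, 100),
    Fragment("EMP2", "site_B", 101, 200),
    Fragment("EMP3", "site_C", 201, 300),
]

def localize(query_low, query_high, fragments):
    """Return the fragments whose key range overlaps the query's range."""
    return [f for f in fragments if f.low <= query_high and query_low <= f.high]

# A query on EMP with 150 <= eno <= 250 only needs EMP2 and EMP3.
needed = localize(150, 250, FRAGMENTS)
print([(f.name, f.site) for f in needed])
```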

Global Query Optimization
The input to the third layer is an algebraic query on fragments. The goal of this layer is to find an execution strategy for this fragment query that is close to optimal. The previous layers have already optimized the query to some extent, for example by eliminating redundancies.

Query optimization consists of: i) finding the best ordering of operations in the query, and ii) finding the communication operations which minimize a cost function.
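
To illustrate choosing communication operations by a cost function, the sketch below compares three ways of joining a relation R and a relation S stored at different sites when the result is needed at a third site. The cost function here is simply the number of bytes transferred, and all sizes are made-up assumptions.

```python
def transfer_costs(size_r, size_s, size_result):
    """Bytes shipped over the network under each candidate strategy."""
    return {
        "ship R and S to the result site": size_r + size_s,
        "ship R to S's site, then ship the result": size_r + size_result,
        "ship S to R's site, then ship the result": size_s + size_result,
    }

# Assumed sizes in bytes: R is large, S and the join result are small.
costs = transfer_costs(size_r=1_000_000, size_s=3_500, size_result=4_000)

# Pick the strategy that minimizes the cost function.
best_strategy = min(costs, key=costs.get)
print(best_strategy, costs[best_strategy])
```

A fuller cost function would usually also account for local processing (I/O and CPU), which this sketch leaves out.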

Distributed Execution
The last layer is performed by all the sites having fragments involved in the query. Each subquery that runs at one site is called a local query. It is optimized using the local schema of that site and then executed.
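
The last layer can be sketched with one in-memory SQLite database per site: each site runs its local query on its own fragment, and the control site merges the partial results. The site names, the table layout and the rows are made-up assumptions, and the sites are visited one after another here, whereas a real system could run them in parallel.

```python
import sqlite3

def make_site(rows):
    """Create a hypothetical local site holding one fragment of emp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (eno INTEGER, ename TEXT, title TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    return conn

sites = {
    "site_A": make_site([(1, "J. Doe", "Analyst"), (2, "M. Smith", "Programmer")]),
    "site_B": make_site([(101, "A. Lee", "Analyst")]),
}

local_query = "SELECT eno, ename FROM emp WHERE title = 'Analyst'"

def execute_distributed(sites, local_query):
    """Run the local query at every site and merge the partial results."""
    partial_results = []
    for conn in sites.values():
        # Each site's own engine plans and executes the query locally.
        partial_results.extend(conn.execute(local_query).fetchall())
    return sorted(partial_results)  # union of the fragments' answers

result = execute_distributed(sites, local_query)
print(result)
```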

THANK YOU