itm661-lecture0VBBBBBBBBBBBBBBM3-part2-2015.pdf

Lecture 3/2 Query Processing
•T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 5th edition,
Addison-Wesley, 2009. ISBN: 0-321-60110-6, ISBN-13: 978-0-321-60110-0 (International Edition).
•T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 4th edition,
Addison-Wesley, 2004. ISBN: 0-321-21025-5.
•R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th ed., Pearson, 2007, ISBN: 0-321-41506-X.
ITM661 – Database Systems

Query Processing 2
Objectives
Objectives of query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and semantically analyzed.
How to create a R.A.T. to represent a query.
Rules of equivalence for RA operations.
How to apply heuristic transformation rules to improve
efficiency of a query.
How pipelining can be used to improve efficiency of
queries.
Difference between materialization and pipelining.
Advantages of left-deep trees.
(R.A.T = Relational Algebra Tree)

Query Processing 3
An Example (Branch and Staff Relations)
•A relation is a table with
columns and rows.
Only applies to logical structure of
the database, not the physical
structure.
•An attribute is a named
column of a relation.
•A domain is a set of
allowable values for one or
more attributes.
•A tuple is a row of a
relation.
•A degree is a number of
attributes in a relation.
•A cardinality is a number of
tuples in a relation.
•A relational database is a
collection of normalized
relations.

Query Processing 4
Relational Algebra and Calculus
Relational algebra and relational calculus are formal
languages associated with the relational model.
Informally,
Relational algebra is a (high-level) procedural
language and
Relational calculus a non-procedural language.
However, formally both are equivalent to one another.
A language that produces a relation that can be derived
using relational calculus is relationally complete.

Query Processing 5
Relational Algebra
There are 5 basic operations, in relational algebra, that performs
most of the data retrieval operations needed.
Selection
Projection
Cartesian Product
Union
Set Difference
Also operations that can be expressed by 5 basic operations.
Join
Intersection
Division

Query Processing 6
An Example (Home Rental Database)

Query Processing 7
An Example (Home Rental Database)

Query Processing 8
An Example (Home Rental Database)

Query Processing 9
Query Processing
Activities involved in retrieving data from the
database.
Aims of QP:
transform query written in high-level language (e.g.
SQL), into correct and efficient execution strategy
expressed in low-level language (implementing
Relational Algebra);
execute the strategy to retrieve required data.

Query Processing 10
Query Optimization
Activity of choosing an efficient execution
strategy for processing query.
As there are many equivalent transformations of
same high-level query, aim of QO is to choose
one that minimizes resource usage.
Generally, reduce total execution time of query.
May also reduce response time of query.
Problem computationally intractable with large
number of relations, so strategy adopted is
reduced to finding near optimum solution.

Query Processing 11
Different Strategies
Find all Managers that work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
Three equivalent RA queries are:
(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff |><|
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff)) |><|
staff.bno=branch.bno (s
city='London' (Branch))

(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff))
staff.bno=branch.bno (s
city='London' (Branch))
Staff Branch

s s
(position='Manager’) (city='London')  (staff.bno=branch.bno)
1000 50
50000
epsilon
12 Query Processing

(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff))
staff.bno=branch.bno (s
city='London' (Branch))
Staff Branch
s s
(position='Manager’) (city='London')
1000 50
epsilon
1000
13 Query Processing

(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff))
staff.bno=branch.bno (s
city='London' (Branch))
Staff Branch
s
1000 50
epsilon
1000
s
50
position='Manager’ city='London'
50 5
14 Query Processing

Query Processing 15
Cost Comparison in Different Strategies
Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
No indexes or sort keys;
Results of any intermediate operations stored on disk;
Cost of the final write is ignored;
Tuples are accessed one at a time.
(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff |><|
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff)) |><|
staff.bno=branch.bno (s
city='London' (Branch))

(1) (1000 + 50) + 2*(1000 * 50) = 101,050
(2) 2*1000 + (1000 + 50) = 3,050
(3) 1000 + 2*50 + 50 + 2*(5) = 1,160
Cartesian product and
join operations are
much more expensive
than selection

Query Processing 16
4 Phases of Query Processing
(1) decomposition
(parsing and
validation)
(2) optimization
(3) code generation
(4) execution.

Query Processing 17
Dynamic versus Static Optimization
Two choices when first three phases of QP can be
carried out:
1.dynamically every time query is run.
Advantages if dynamic QO arise from fact that
information is up-to-date.
Disadvantages are that performance of query is
affected, time may limit finding optimum strategy.
2.Statically when query is first submitted.
Advantages of static QO are removal of runtime
overhead  more time to find optimum strategy.
Disadvantages arise from fact that chosen execution
strategy may no longer be optimal when query is
run.
Could use a hybrid approach to overcome this.

Query Processing 18
Query Decomposition
Aims are to transform high-level query into
RA query and check that query is syntactically
and semantically correct.
Typical stages are:
analysis
normalization
semantic analysis
simplification
query restructuring

Query Processing 19
Analysis (I)
Analyze query lexically and syntactically using compiler
techniques.
Verify relations and attributes exist.
Verify operations are appropriate for object type.
EX: SELECT staff_no
FROM Staff
WHERE position > 10;

This query would be rejected on two grounds:
Staff_no is not defined for Staff relation (should be
StaffNo).
Comparison ‘>10’ is incompatible with type ‘position’,
which is variable character string.

Query Processing 20
Analysis (II)
Finally, query transformed into some internal
representation more suitable for processing.
Some kind of query tree is typically chosen,
constructed as follows:
Leaf node created for each base relation.
Non-leaf node created for each intermediate
relation produced by RA operation.
Root of tree represents query result.
Sequence is directed from leaves to root.

Query Processing 21
Relational Algebra Tree

Query Processing 22
Normalization
Converts query into a normalized form for easier
manipulation.
Predicate can be converted into one of two forms:
Conjunctive normal form:

(position = 'Manager'  salary > 20000)  (bno = 'B3')

Disjunctive normal form:

(position = 'Manager'  bno = 'B3' ) 
(salary > 20000  bno = 'B3')

Query Processing 23
Semantic Analysis
Rejects normalized queries that are incorrectly
formulated or contradictory.
Query is incorrectly formulated if components do not
contribute to generation of result.
Query is contradictory if its predicate cannot be
satisfied by any tuple.
Algorithms to determine correctness exist only for
queries that do not contain disjunction and negation.

Query Processing 24
24
Semantic Analysis
For these queries, could construct:
A relation connection graph.
Normalized attribute connection graph.
Relation connection graph
Create node for each relation and node for result.
Create edges between two nodes that represent a
join, and edges between nodes that represent
projection.
If not connected, query is incorrectly formulated.

Query Processing 25
25
Normalized Attribute Connection Graph
Create node for each reference to an attribute, or
constant 0.
Create directed edge between nodes that represent a
join, and directed edge between attribute node and 0
node that represents selection.
Weight edges a  b with value c, if it represents
inequality condition (a  b + c); weight edges 0  a
with -c, if it represents inequality condition (a  c).
If graph has cycle for which valuation sum is
negative, query is contradictory.

Query Processing 26
Checking Semantic Correctness
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.clientNo = v.clientNo AND
c.maxRent >= 500 AND
c.prefType = 'Flat' AND p.ownerNo = 'CO93';
Relation connection graph not fully connected, so query is not
correctly formulated. Omiting the join (v.propertyNo =
p.propertyNo).
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent > 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.prefType = 'Flat' AND c.maxRent < 200;
Normalized attribute connection graph has cycle between
nodes R.maxRent and 0 with negative valuation sum, so query
is contradictory

Query Processing 27
Checking Semantic Correctness
Relation
Connection graph
Normalized
attribute
connection graph

Query Processing 28
Simplification
Simplification strategy:
Detects redundant qualifications,
Eliminates common sub-expressions,
Transforms query to semantically equivalent but more
easily and efficiently computed form.
Typically, access restrictions, view definitions, and integrity
constraints are considered.
Assuming user has appropriate access privileges, first apply
well-known idempotency rules of Boolean algebra.
Ex.: p  p equivalent to p
p  true equivalent to true
Next step is Query restructuring

Query Processing 29
Transformation Rules for RA Operation (I)
Conjunctive selection operations can cascade into individual
selection operations (vice versa).
s
p  q  r(R) = s
p(s
q(s
r(R)))
Sometimes referred to as cascade of selection.
s
branchNo='B003‘  salary>15000(Staff) = s
branchNo='B003'(s
salary>15000(Staff))

Commutativity of selection.

s
p(s
q(R)) = s
q(s
p(R))

For example:
s
branchNo='B003'(s
salary>15000(Staff)) = s
salary>15000(s
branchNo='B003'(Staff))

Query Processing 30
Transformation Rules for RA Operation (II)
In a sequence of projection operations, only the last in the
sequence is required.

P
LP
M … P
N(R) = P
L (R)

For example:

P
lNameP
lName, branchNo(Staff) = P
lName (Staff)

Commutativity of selection and projection.

P
Ai, …, Am(s
p(R)) = s
p(P
Ai, …, Am(R)) where p {A
1, A
2, …, A
m}
For example:
P
fName,lName(s
lName='Beech'(Staff)) = s
lName='Beech'(P
fName,lName(Staff))

Query Processing 31
Transformation Rules for RA Operation (III)
Commutativity of q-join (and Cartesian product).
R
p S = S
p R
R  S = S  R
Rule also applies to equi-join and natural join. For example:
Staff
staff.branchNo=branch.branchNo Branch =
Branch
staff.branchNo=branch.branchNo Staff

Query Processing 32
Transformation Rules for RA Operation (IV)
Commutativity of selection and theta-join (or Cartesian
product).
If selection predicate involves only attributes of one of join
relations, selection and join (or Cartesian product) operations
commute:
s
p(R
r S) = (s
p(R))
r S
s
p(R  S) = (s
p(R))  S
where p is a constraint on attributes of R

Query Processing 33
Transformation Rules for RA Operation (V)
If selection predicate is conjunctive predicate having form (p
ู q), where p only involves attributes of R, and q only
attributes of S, selection and theta-join operations commute as:

s
p  q(R
r S) = (s
p(R))
r (s
q(S))
s
p  q(R  S) = (s
p(R))  (s
q(S))
For example:

s
position='Manager’  city='London'(Staff
staff.branchNo=branch.branchNo Branch) =

(s
position='Manager'(Staff))
staff.branchNo=branch.branchNo (s
city='London' (Branch))

Query Processing 34
Transformation Rules for RA Operation (VI)
Commutativity of projection and theta-join
(or Cartesian product).
If projection list is of form L = L
1  L
2, where L
1 only
involves attributes of R, and L
2 only involves attributes of S,
provided join condition only contains attributes of L,
projection and theta-join operations commute as:

P
L1 L2(R
r S) = (P
L1(R))
r (P
L2(S))

Query Processing 35
Transformation Rules for RA Operation (VII)
If join condition contains additional attributes not in L, say
attributes M = M
1  M
2 where M
1 only involves attributes of
R, and M
2 only involves attributes of S, a final projection op.
is required:

P
L1  L2(R
r S) = P
L1  L2( (P
L1  M1(R))
r (P
L2  M2(S)))
For example:

P
position, city, branchNo(Staff
staff.branchNo=branch.branchNo Branch) =
(P
position, branchNo(Staff))
staff.branchNo=branch.branchNo (P
city, branchNo (Branch))

and using the latter rule:
P
position, city(Staff
staff.branchNo=branch.branchNo Branch) =
P
position, city ((P
position, branchNo(Staff))
staff.branchNo=branch.branchNo ( P
city, branchNo (Branch)))

Query Processing 36
Transformation Rules for RA Operation (VIII)
Commutativity of union and intersection (but not set
difference).

R  S = S  R
R  S = S  R
Commutativity of selection and set operations (union,
intersection, and set difference).

s
p(R  S) = s
p(S)  s
p(R)
s
p(R  S) = s
p(S)  s
p(R)
s
p(R - S) = s
p(S) - s
p(R)
Commutativity of projection and union.

P
L(R  S) = P
L(S)  P
L(R)

Associativity of union and intersection (but not set difference).

(R  S)  T = S  (R  T)
(R  S)  T = S  (R  T)

Query Processing 37
Transformation Rules for RA Operation (IX)
Associativity of q-join (and Cartesian product).
Cartesian product and natural join are always associative:
(R S) T = R (S T)
(R  S)  T = R  (S  T)
If join condition q involves attributes only from S and T, then
theta-join is associative as follows:
(R
p S)
q  r T = R
p  r (S
q T)
For example:
((Staff
Staff.staffNo=PropertyForRent.staffNo PropertyForRent)

PropertyForRent.ownerNo=owner.ownerNo  staff.lName=owner.lName Owner)

=

((Staff
Staff.staffNo=PropertyForRent.staffNo  Staff.lName=Owner.lName (PropertyForRent
PropertyForRent.ownerNo = Owner.ownerNo Owner)

Query Processing 38
Use of Transformation Rules (I)
For prospective renters of flats, find properties that match
requirements and owned by CO93.

SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND
p.ownerNo = ‘CO93’;

Query Processing 39
Use of Transformation Rules (II)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;

Query Processing 40
Use of Transformation Rules (III)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;

Query Processing 41
Use of Transformation Rules (IV)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;

Query Processing 42
Heuristical Processing Strategies
Perform selection operations as early as possible.
Keep predicates on same relation together.
Combine Cartesian product with subsequent selection whose
predicate represents join condition into a join operation.
Use associativity of binary operations to rearrange leaf nodes
so leaf nodes with most restrictive selection operations
executed first.
Perform projection as early as possible.
Keep projection attributes on same relation together.
Compute common expressions once.
If common expression appears more than once, and result
not too large, store result and reuse it when required.
It is useful when querying views, as same expression is used
to construct view each time.

Query Processing 43
Pipelining
Materialization - output of one operation is stored in
temporary relation for processing by next.
Could also pipeline results of one operation to
another without creating temporary relation.
Known as pipelining or on-the-fly processing.
Pipelining can save on cost of creating temporary
relations and reading results back in again.
Generally, pipeline is implemented as separate
process or thread.

Query Processing 44
Types of Trees

Query Processing 45
Pipelining
With linear trees, relation on one side of each
operator is always a base relation.
However, as need to examine entire inner
relation for each tuple of outer relation, inner
relations must always be materialized.
This makes left-deep trees appealing as inner
relations are always base relations.
Reduces search space for optimum strategy,
and allows QO to use dynamic processing.
Not all execution strategies are considered.

Query Processing 46
Exercise I
Assume that there are 1000 tuples in Student, 40 tuples in
Course, 5000 tuples in TakeCourse, 40 tuples in Lecturer,
25 internal lecturers, 30 three-credit courses, 8 IT lecturers
and 200 IT students.
Assume that the following SQL statement is executed
SELECT *
FROM Lecturer L, Course C
WHERE C.LecturerID = L.LecturerID AND
L.InOrEx = ‘I’ AND C.Credit = 3
Compare the cost of disk accesses of three following equivalent relational
algebra queries corresponding to the above SQL statement.
(a) s
(C.LecturerID = L.LecturerID)  (C.Credit = 3)  ( L.InOrEx = ‘I’ )
(Course X Lecturer)
(b) s
(C.Credit = 3)

 ( L.InOrEx = ‘I’ )
( Course |X|
C.LecturerID = L.LecturerID
Lecturer)
(c) (s
(C.Credit = 3)
(Course)) |X|
C.LecturerID = L.LecturerID
(s
( L.InOrEx = ‘I’ )
(Lecturer))

Query Processing 47
Exercise II
Given the following SQL query, illustrate relational algebra trees based
on heuristical processing strategies.

SELECT S.SFName, S.SLName, L.LFName
FROM Student S, Course C, Lecturer L, TakeCourse T
WHERE C.LecturerID = L.LecturerID AND
C.CourseID = T.CourseID AND
S.StudentID = T.StudentID AND
C.Credit = 3 AND S.DeptID = ‘IT’ AND
L.LDeptID = S.DeptID;
Hint: The heuristics are
(1) Perform selection operations as early as possible
(2) Combine the Cartesian product with a subsequent selection operation whose predicate represents
a join condition into a join operation.
(3) Use associativity of binary operations to rearrange leaf nodes so that the leaf nodes with the most
restrictive selection operations are executed first.
(4) Perform projection as early as possible
(5) Compute common expressions once.

itm661-lecture0VBBBBBBBBBBBBBBM3-part2-2015.pdf

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

itm661-lecture0VBBBBBBBBBBBBBBM3-part2-2015.pdf

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Pray For The Peace Of Jerusalem and You Will Prosper

Don_t_Waste_Your_Life_God.....powerpoint

VILLASUR_FACTORS_TO_CONSIDER_IN_PLATING_SALAD_10-13.pdf

Fertility awareness methods for women in the society

Chapter 5 Arithmetic Functions Computer Organisation and Architecture

syakira bhasa inggris (1) (1).pptx.......