itm661-lecture0VBBBBBBBBBBBBBBM3-part2-2015.pdf

beshahashenafe20 20 views 47 slides May 08, 2024
Slide 1
Slide 1 of 47
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47

About This Presentation

HUSADDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD


Slide Content

Lecture 3/2 Query Processing
•T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 5th edition,
Addison-Wesley, 2009. ISBN: 0-321-60110-6, ISBN-13: 978-0-321-60110-0 (International Edition).
•T. Connolly, and C. Begg, “Database Systems: A Practical Approach to Design, Implementation, and Management”, 4th edition,
Addison-Wesley, 2004. ISBN: 0-321-21025-5.
•R. Elmasri and S. B. Navathe, “Fundamentals of Database Systems”, 5th ed., Pearson, 2007, ISBN: 0-321-41506-X.
ITM661 – Database Systems

Query Processing 2
Objectives
Objectives of query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and semantically analyzed.
How to create a R.A.T. to represent a query.
Rules of equivalence for RA operations.
How to apply heuristic transformation rules to improve
efficiency of a query.
How pipelining can be used to improve efficiency of
queries.
Difference between materialization and pipelining.
Advantages of left-deep trees.
(R.A.T = Relational Algebra Tree)

Query Processing 3
An Example (Branch and Staff Relations)
•A relation is a table with
columns and rows.
Only applies to logical structure of
the database, not the physical
structure.
•An attribute is a named
column of a relation.
•A domain is a set of
allowable values for one or
more attributes.
•A tuple is a row of a
relation.
•A degree is a number of
attributes in a relation.
•A cardinality is a number of
tuples in a relation.
•A relational database is a
collection of normalized
relations.

Query Processing 4
Relational Algebra and Calculus
Relational algebra and relational calculus are formal
languages associated with the relational model.
Informally,
Relational algebra is a (high-level) procedural
language and
Relational calculus a non-procedural language.
However, formally both are equivalent to one another.
A language that produces a relation that can be derived
using relational calculus is relationally complete.

Query Processing 5
Relational Algebra
There are 5 basic operations, in relational algebra, that performs
most of the data retrieval operations needed.
Selection
Projection
Cartesian Product
Union
Set Difference
Also operations that can be expressed by 5 basic operations.
Join
Intersection
Division

Query Processing 6
An Example (Home Rental Database)

Query Processing 7
An Example (Home Rental Database)

Query Processing 8
An Example (Home Rental Database)

Query Processing 9
Query Processing
Activities involved in retrieving data from the
database.
Aims of QP:
transform query written in high-level language (e.g.
SQL), into correct and efficient execution strategy
expressed in low-level language (implementing
Relational Algebra);
execute the strategy to retrieve required data.

Query Processing 10
Query Optimization
Activity of choosing an efficient execution
strategy for processing query.
As there are many equivalent transformations of
same high-level query, aim of QO is to choose
one that minimizes resource usage.
Generally, reduce total execution time of query.
May also reduce response time of query.
Problem computationally intractable with large
number of relations, so strategy adopted is
reduced to finding near optimum solution.

Query Processing 11
Different Strategies
Find all Managers that work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = ‘Manager’ AND b.city = ‘London’);
Three equivalent RA queries are:
(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff |><|
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff)) |><|
staff.bno=branch.bno (s
city='London' (Branch))

(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff))
staff.bno=branch.bno (s
city='London' (Branch))
Staff Branch

s s
(position='Manager’) (city='London')  (staff.bno=branch.bno)
1000 50
50000
epsilon
12 Query Processing

(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff))
staff.bno=branch.bno (s
city='London' (Branch))
Staff Branch
s s
(position='Manager’) (city='London')
1000 50
epsilon
1000
13 Query Processing

(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff))
staff.bno=branch.bno (s
city='London' (Branch))
Staff Branch
s
1000 50
epsilon
1000
s
50
position='Manager’ city='London'
50 5
14 Query Processing

Query Processing 15
Cost Comparison in Different Strategies
Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
No indexes or sort keys;
Results of any intermediate operations stored on disk;
Cost of the final write is ignored;
Tuples are accessed one at a time.
(1) s
(position='Manager’) (city='London')  (staff.bno=branch.bno) (Staff  Branch)
(2) s
(position='Manager')  (city='London') ( Staff |><|
staff.bno=branch.bno Branch)
(3) (s
position='Manager’(Staff)) |><|
staff.bno=branch.bno (s
city='London' (Branch))

(1) (1000 + 50) + 2*(1000 * 50) = 101,050
(2) 2*1000 + (1000 + 50) = 3,050
(3) 1000 + 2*50 + 50 + 2*(5) = 1,160
Cartesian product and
join operations are
much more expensive
than selection

Query Processing 16
4 Phases of Query Processing
(1) decomposition
(parsing and
validation)
(2) optimization
(3) code generation
(4) execution.

Query Processing 17
Dynamic versus Static Optimization
Two choices when first three phases of QP can be
carried out:
1.dynamically every time query is run.
Advantages if dynamic QO arise from fact that
information is up-to-date.
Disadvantages are that performance of query is
affected, time may limit finding optimum strategy.
2.Statically when query is first submitted.
Advantages of static QO are removal of runtime
overhead  more time to find optimum strategy.
Disadvantages arise from fact that chosen execution
strategy may no longer be optimal when query is
run.
Could use a hybrid approach to overcome this.

Query Processing 18
Query Decomposition
Aims are to transform high-level query into
RA query and check that query is syntactically
and semantically correct.
Typical stages are:
analysis
normalization
semantic analysis
simplification
query restructuring

Query Processing 19
Analysis (I)
Analyze query lexically and syntactically using compiler
techniques.
Verify relations and attributes exist.
Verify operations are appropriate for object type.
EX: SELECT staff_no
FROM Staff
WHERE position > 10;

This query would be rejected on two grounds:
Staff_no is not defined for Staff relation (should be
StaffNo).
Comparison ‘>10’ is incompatible with type ‘position’,
which is variable character string.

Query Processing 20
Analysis (II)
Finally, query transformed into some internal
representation more suitable for processing.
Some kind of query tree is typically chosen,
constructed as follows:
Leaf node created for each base relation.
Non-leaf node created for each intermediate
relation produced by RA operation.
Root of tree represents query result.
Sequence is directed from leaves to root.

Query Processing 21
Relational Algebra Tree

Query Processing 22
Normalization
Converts query into a normalized form for easier
manipulation.
Predicate can be converted into one of two forms:
Conjunctive normal form:

(position = 'Manager'  salary > 20000)  (bno = 'B3')

Disjunctive normal form:

(position = 'Manager'  bno = 'B3' ) 
(salary > 20000  bno = 'B3')

Query Processing 23
Semantic Analysis
Rejects normalized queries that are incorrectly
formulated or contradictory.
Query is incorrectly formulated if components do not
contribute to generation of result.
Query is contradictory if its predicate cannot be
satisfied by any tuple.
Algorithms to determine correctness exist only for
queries that do not contain disjunction and negation.

Query Processing 24
24
Semantic Analysis
For these queries, could construct:
A relation connection graph.
Normalized attribute connection graph.
Relation connection graph
Create node for each relation and node for result.
Create edges between two nodes that represent a
join, and edges between nodes that represent
projection.
If not connected, query is incorrectly formulated.

Query Processing 25
25
Normalized Attribute Connection Graph
Create node for each reference to an attribute, or
constant 0.
Create directed edge between nodes that represent a
join, and directed edge between attribute node and 0
node that represents selection.
Weight edges a  b with value c, if it represents
inequality condition (a  b + c); weight edges 0  a
with -c, if it represents inequality condition (a  c).
If graph has cycle for which valuation sum is
negative, query is contradictory.

Query Processing 26
Checking Semantic Correctness
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.clientNo = v.clientNo AND
c.maxRent >= 500 AND
c.prefType = 'Flat' AND p.ownerNo = 'CO93';
Relation connection graph not fully connected, so query is not
correctly formulated. Omiting the join (v.propertyNo =
p.propertyNo).
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent > 500 AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.prefType = 'Flat' AND c.maxRent < 200;
Normalized attribute connection graph has cycle between
nodes R.maxRent and 0 with negative valuation sum, so query
is contradictory

Query Processing 27
Checking Semantic Correctness
Relation
Connection graph
Normalized
attribute
connection graph

Query Processing 28
Simplification
Simplification strategy:
Detects redundant qualifications,
Eliminates common sub-expressions,
Transforms query to semantically equivalent but more
easily and efficiently computed form.
Typically, access restrictions, view definitions, and integrity
constraints are considered.
Assuming user has appropriate access privileges, first apply
well-known idempotency rules of Boolean algebra.
Ex.: p  p equivalent to p
p  true equivalent to true
Next step is Query restructuring

Query Processing 29
Transformation Rules for RA Operation (I)
Conjunctive selection operations can cascade into individual
selection operations (vice versa).
s
p  q  r(R) = s
p(s
q(s
r(R)))
Sometimes referred to as cascade of selection.
s
branchNo='B003‘  salary>15000(Staff) = s
branchNo='B003'(s
salary>15000(Staff))

Commutativity of selection.

s
p(s
q(R)) = s
q(s
p(R))

For example:
s
branchNo='B003'(s
salary>15000(Staff)) = s
salary>15000(s
branchNo='B003'(Staff))

Query Processing 30
Transformation Rules for RA Operation (II)
In a sequence of projection operations, only the last in the
sequence is required.

P
LP
M … P
N(R) = P
L (R)

For example:

P
lNameP
lName, branchNo(Staff) = P
lName (Staff)

Commutativity of selection and projection.

P
Ai, …, Am(s
p(R)) = s
p(P
Ai, …, Am(R)) where p {A
1, A
2, …, A
m}
For example:
P
fName,lName(s
lName='Beech'(Staff)) = s
lName='Beech'(P
fName,lName(Staff))

Query Processing 31
Transformation Rules for RA Operation (III)
Commutativity of q-join (and Cartesian product).
R
p S = S
p R
R  S = S  R
Rule also applies to equi-join and natural join. For example:
Staff
staff.branchNo=branch.branchNo Branch =
Branch
staff.branchNo=branch.branchNo Staff

Query Processing 32
Transformation Rules for RA Operation (IV)
Commutativity of selection and theta-join (or Cartesian
product).
If selection predicate involves only attributes of one of join
relations, selection and join (or Cartesian product) operations
commute:
s
p(R
r S) = (s
p(R))
r S
s
p(R  S) = (s
p(R))  S
where p is a constraint on attributes of R

Query Processing 33
Transformation Rules for RA Operation (V)
If selection predicate is conjunctive predicate having form (p
ู q), where p only involves attributes of R, and q only
attributes of S, selection and theta-join operations commute as:

s
p  q(R
r S) = (s
p(R))
r (s
q(S))
s
p  q(R  S) = (s
p(R))  (s
q(S))
For example:

s
position='Manager’  city='London'(Staff
staff.branchNo=branch.branchNo Branch) =

(s
position='Manager'(Staff))
staff.branchNo=branch.branchNo (s
city='London' (Branch))

Query Processing 34
Transformation Rules for RA Operation (VI)
Commutativity of projection and theta-join
(or Cartesian product).
If projection list is of form L = L
1  L
2, where L
1 only
involves attributes of R, and L
2 only involves attributes of S,
provided join condition only contains attributes of L,
projection and theta-join operations commute as:

P
L1 L2(R
r S) = (P
L1(R))
r (P
L2(S))

Query Processing 35
Transformation Rules for RA Operation (VII)
If join condition contains additional attributes not in L, say
attributes M = M
1  M
2 where M
1 only involves attributes of
R, and M
2 only involves attributes of S, a final projection op.
is required:

P
L1  L2(R
r S) = P
L1  L2( (P
L1  M1(R))
r (P
L2  M2(S)))
For example:

P
position, city, branchNo(Staff
staff.branchNo=branch.branchNo Branch) =
(P
position, branchNo(Staff))
staff.branchNo=branch.branchNo (P
city, branchNo (Branch))

and using the latter rule:
P
position, city(Staff
staff.branchNo=branch.branchNo Branch) =
P
position, city ((P
position, branchNo(Staff))
staff.branchNo=branch.branchNo ( P
city, branchNo (Branch)))

Query Processing 36
Transformation Rules for RA Operation (VIII)
Commutativity of union and intersection (but not set
difference).

R  S = S  R
R  S = S  R
Commutativity of selection and set operations (union,
intersection, and set difference).

s
p(R  S) = s
p(S)  s
p(R)
s
p(R  S) = s
p(S)  s
p(R)
s
p(R - S) = s
p(S) - s
p(R)
Commutativity of projection and union.

P
L(R  S) = P
L(S)  P
L(R)

Associativity of union and intersection (but not set difference).

(R  S)  T = S  (R  T)
(R  S)  T = S  (R  T)

Query Processing 37
Transformation Rules for RA Operation (IX)
Associativity of q-join (and Cartesian product).
Cartesian product and natural join are always associative:
(R S) T = R (S T)
(R  S)  T = R  (S  T)
If join condition q involves attributes only from S and T, then
theta-join is associative as follows:
(R
p S)
q  r T = R
p  r (S
q T)
For example:
((Staff
Staff.staffNo=PropertyForRent.staffNo PropertyForRent)

PropertyForRent.ownerNo=owner.ownerNo  staff.lName=owner.lName Owner)

=

((Staff
Staff.staffNo=PropertyForRent.staffNo  Staff.lName=Owner.lName (PropertyForRent
PropertyForRent.ownerNo = Owner.ownerNo Owner)

Query Processing 38
Use of Transformation Rules (I)
For prospective renters of flats, find properties that match
requirements and owned by CO93.

SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND
p.ownerNo = ‘CO93’;

Query Processing 39
Use of Transformation Rules (II)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;

Query Processing 40
Use of Transformation Rules (III)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;

Query Processing 41
Use of Transformation Rules (IV)
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = ‘Flat’ AND
c.clientNo = v.clientNo AND
v.propertyNo = p.propertyNo AND
c.maxRent >= p.rent AND
c.prefType = p.type AND p.ownerNo = ‘CO93’;

Query Processing 42
Heuristical Processing Strategies
Perform selection operations as early as possible.
Keep predicates on same relation together.
Combine Cartesian product with subsequent selection whose
predicate represents join condition into a join operation.
Use associativity of binary operations to rearrange leaf nodes
so leaf nodes with most restrictive selection operations
executed first.
Perform projection as early as possible.
Keep projection attributes on same relation together.
Compute common expressions once.
If common expression appears more than once, and result
not too large, store result and reuse it when required.
It is useful when querying views, as same expression is used
to construct view each time.

Query Processing 43
Pipelining
Materialization - output of one operation is stored in
temporary relation for processing by next.
Could also pipeline results of one operation to
another without creating temporary relation.
Known as pipelining or on-the-fly processing.
Pipelining can save on cost of creating temporary
relations and reading results back in again.
Generally, pipeline is implemented as separate
process or thread.

Query Processing 44
Types of Trees

Query Processing 45
Pipelining
With linear trees, relation on one side of each
operator is always a base relation.
However, as need to examine entire inner
relation for each tuple of outer relation, inner
relations must always be materialized.
This makes left-deep trees appealing as inner
relations are always base relations.
Reduces search space for optimum strategy,
and allows QO to use dynamic processing.
Not all execution strategies are considered.

Query Processing 46
Exercise I
Assume that there are 1000 tuples in Student, 40 tuples in
Course, 5000 tuples in TakeCourse, 40 tuples in Lecturer,
25 internal lecturers, 30 three-credit courses, 8 IT lecturers
and 200 IT students.
Assume that the following SQL statement is executed
SELECT *
FROM Lecturer L, Course C
WHERE C.LecturerID = L.LecturerID AND
L.InOrEx = ‘I’ AND C.Credit = 3
Compare the cost of disk accesses of three following equivalent relational
algebra queries corresponding to the above SQL statement.
(a) s
(C.LecturerID = L.LecturerID)  (C.Credit = 3)  ( L.InOrEx = ‘I’ )
(Course X Lecturer)
(b) s
(C.Credit = 3)

 ( L.InOrEx = ‘I’ )
( Course |X|
C.LecturerID = L.LecturerID
Lecturer)
(c) (s
(C.Credit = 3)
(Course)) |X|
C.LecturerID = L.LecturerID
(s
( L.InOrEx = ‘I’ )
(Lecturer))

Query Processing 47
Exercise II
Given the following SQL query, illustrate relational algebra trees based
on heuristical processing strategies.

SELECT S.SFName, S.SLName, L.LFName
FROM Student S, Course C, Lecturer L, TakeCourse T
WHERE C.LecturerID = L.LecturerID AND
C.CourseID = T.CourseID AND
S.StudentID = T.StudentID AND
C.Credit = 3 AND S.DeptID = ‘IT’ AND
L.LDeptID = S.DeptID;
Hint: The heuristics are
(1) Perform selection operations as early as possible
(2) Combine the Cartesian product with a subsequent selection operation whose predicate represents
a join condition into a join operation.
(3) Use associativity of binary operations to rearrange leaf nodes so that the leaf nodes with the most
restrictive selection operations are executed first.
(4) Perform projection as early as possible
(5) Compute common expressions once.