Advanced DBb Chap1.pptxAdvanced Data base Transaction processing
Antenehsolomon2
9 views
82 slides
Sep 16, 2025
Slide 1 of 82
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
About This Presentation
Advanced Data base chapter 1 object oriented DB
Size: 6.45 MB
Language: en
Added: Sep 16, 2025
Slides: 82 pages
Slide Content
Course Title: Advanced Database Systems Course Number: CoSc2042 By :-Inst Anteneh.S
Organize yourselves into groups of four and submit your database project title by the deadline next week. Once your title is approved, promptly begin working on your project. Ensure that your project includes both comprehensive documentation and full implementation. . Here are some simple project ideas to consider: Library Management System (LMS) Student Information System (SIS) Hospital Management System (Hos/MS) Restaurant Management System (Restaurant MS) Inventory Management System ( InvMS ) Course Registration System (Course Reg Sy ) Hotel Reservation System ( HotResSy ) Real Estate Management System (Real Estate MS) Voting System Project work 30%
CHAPTER 1 CONCEPTS FOR OBJECT-ORIENTED DATABASES 1.1. Overview of Object-Oriented Concepts Object-oriented databases (OODB) integrate object-oriented programming principles with database technology . They provide an approach for managing complex and interrelated data more effectively than traditional relational databases . An Object-Oriented Database (OODB) combines the features of object-oriented programming and database management systems. Instead of organizing data into tables as in relational databases , OODB stores data as objects, similar to object-oriented programming concept s.
Discussion (10min) 1. Give at least 3 OOP language ?& 2. List and Explain OOP principles with Example ?
Example of OOP LANGUAGE
Key Concepts: Objects : The fundamental building blocks in object-oriented systems , representing real-world entities (e.g., a student, an invoice ). Attributes: Properties or characteristics of an object (e.g., a student object may have attributes like name, age, and GPA ). Classes: Templates or blueprints for creating objects . For example, a Car class may define attributes such as make, model, and year, and behaviors such as start() and stop(). Messages and Methods: Objects communicate by sending messages to invoke methods . For instance, a Car object might receive a start() message to trigger its start method .
Classes: Classes: Templates or blueprints for creating objects. For example, a Car class may define attributes such as make, model, and year, and behaviors such as start() and stop (). The collection of objects is said to be a class. Classes don’t consume any space in the memory . Class is also called a template of an object
Objects Objects can correspond to real-world objects or an abstract entity. When class is defined initially, the description is the only object that is defined . Example
More example
Real-Life Example Scenario : Consider a database for a library system. In this system: Books are objects with properties such as Title, Author, and ISBN. Users are objects with properties such as Name, Membership_ID , and BorrowedBooks . Relationships exist between users and books, like borrowing a book.
Attributes: Properties or characteristics of an object (e.g., a student object may have attributes like name, age, and GPA ). Attributes are defined in the class template and represent the state of an object . Objects will have data stored in the attrib utes field. Class attributes belong to the class itself Attributes
Messages and Methods: Objects communicate by sending messages to invoke methods. For instance, a Car object might receive a start() message to trigger its start method. Methods are functions that are defined inside a class that describe the behaviors of an object .
More Example Real-life Example: Imagine a library management system where each book is represented as an object . The book object has attributes like title, author, and ISBN , and methods like borrow() and return(). When a user borrows a book, the borrow() method is invoked.
What are the main principles of OOP? Discuss and Explain with clear Example
Key Concepts of OODB Objects : Objects represent entities and include attributes (data) and methods (behavior). Classes : Objects with similar properties and behaviors are grouped into classes. Inheritance : Classes can inherit properties and methods from parent classes. Encapsulation : Data and methods are bundled together, restricting access to internal details. Polymorphism : Objects can take multiple forms, allowing flexibility in data representation. Relationships : Objects can be linked to one another to form associations or aggregations.
Inheritance Inheritance :- Classes can reuse code from other classes . Relationships and subclasses between objects can be assigned, enabling developers to reuse common logic while still maintaining a unique hierarchy . More precisely, parent classes can extend their attributes and behaviors to child classes , which also means that this principle supports reusability.
Encapsulation:- Encapsulation:- This principle states that all important information is contained inside an object and only select information is exposed. Encapsulation is one of the concepts in OOPs concepts; it is the process that binds together the data and code into a single unit and keeps both from being safe from outside interference and misuse . Data and methods are bundled together, restricting access to internal details
Finally Encapsulation : Encapsulation binds data and operations (methods) together, ensuring that object internals are hidden from external systems. Objects expose only what is necessary through an interface, providing data security and abstraction. Example: In an e-commerce system, a Product object may have a method calculateDiscount (). Users interact with the method, but the logic behind it remains hidden.
Class Book: Attributes: BookID , title, isbn , isBorrowed Methods: borrowBook (): if isBorrowed == false: isBorrowed = true return "Borrowed" else: return "Unavailable" returnBook (): isBorrowed = false borrowBook () ensures only available books are borrowed.
Polymorphism Polymorphism refers to many forms, or it is a process that performs a single action in different ways. It occurs when we have many classes related to each other by inheritance
In the above figure, you can see, Man is only one, but he takes multiple roles like – he is a dad to his child, he is an employee, a salesperson and many more. This is known as Polymorphism .
Polymorphism is the process of using same method name by multiple classes and redefines methods for the derived classes. In the above example, we can see common action sound () but there are different ways to do the same action. This is one of the examples which shows polymorphism
1.2 Object Identity, Object Structure, and Type Constructors Object Identity (OID): 1. Object Identity (OID) Definition : Object Identity (OID) is a unique identifier assigned to each object in an object-oriented database. This identifier is immutable and independent of the object's data , ensuring that even if the object’s attributes change, its identity remains constant. Real-World Example : In a library management system , each book is represented as an object. The OID for the book (e.g., BookID ) uniquely identifies it in the system, regardless of changes to attributes like title or author. For example: OID: B1234 Title: "The Great Gatsby" Author: "F. Scott Fitzgerald" If the title or author is updated, the OID B1234 remains unchanged, ensuring that the book object is still uniquely identifiable.
2. Object Structure Definition : Object structure refers to how an object is composed of attributes (data fields) and relationships with other objects. Objects can also contain nested objects, forming a hierarchical structure. Real-World Example : In the library system , a user object can have: Attributes: Name, MembershipID A nested object: Address (containing Street, City, ZipCode ) Relationships: Borrowed books linked to the user. For example:
3. Type Constructors Definition : Type constructors define how complex types are built from simpler ones. Common type constructors include: Atomic : Basic types like integers, strings. Collection : Sets, lists, arrays. Tuple : Composite types grouping multiple attributes. Reference : Points to another object (e.g., relationships). Real-World Example : In the library system , the following type constructors could be used: Atomic : Title (String), BookID (Integer). Tuple : Address = (Street, City, ZipCode ). Collection : BorrowedBooks = [B1234, B5678]. Reference : Borrowed_By in the Books table refers to the UserID in the Users table.
1.3 Encapsulation of Operations, Methods, and Persistence Persistence: Object persistence allows objects to outlive the application that created them. Persistent objects are stored in the database and can be retrieved later . Example: A customer in a banking system has an account object that is stored in the database. The object retains its state (e.g., balance) even after the application is closed.
Data organization: In a relational database, data is organized and stored in tables , with each table containing rows and columns of data . In an object-oriented database, data is organized and stored as objects, which are self-contained units that contain both data and the operations or methods that can be performed on that data. WHAT IS THE DIFFERENCE BETWEEN A RELATIONAL DATABASE AND OBJECT-ORIENTED DATABASE
Data relationships: In a relational database, data relationships are defined and managed using keys and foreign keys, which link tables and records together . In an object-oriented database, data relationships are defined and managed using inheritance and polymorphism , which allow objects to take on different forms or behaviors depending on the context in which they are used.
Query language: In a relational database, data is queried and manipulated using a structured query language (SQL), which is a standardized and widely used language for managing and querying data . In an object-oriented database, data is queried and manipulated using object-oriented query languages , which are specialized languages that are designed for managing and querying object-oriented data.
Data manipulation: In a relational database, data manipulation is typically performed using SQL queries , which can be used to insert, update, delete, and retrieve data from the database. In an object-oriented database, data manipulation is performed using the operations or methods defined on the objects themselves, which allows for the efficient manipulation of complex data structures and relationships.
SUMMARY Summary
END OF CHAPTER 1
CHAPTER 2 QUERY PROCESSING AND OPTIMIZATION (4 HOURS) What is Query Processing? In a database management system (DBMS), query processing is the series of actions that transform a user's query (usually written in SQL) into a form that the database can understand and execute to retrieve the desired data . What is Query Optimization? Query optimization is the process of selecting the most efficient execution plan for a query from the many possible ways to process it. The goal is to minimize the resources (time, CPU, memory, I/O) required to get the results .
Steps in Query Processing Parsing and Translation: The DBMS checks the query for syntax errors and converts it into an internal representation (like a relational algebra expression or a query tree ). Optimization: The query optimizer analyzes the different ways to execute the query and selects the most efficient one . It considers factors like indexes, data distribution, and the cost of various operations . Evaluation: The DBMS executes the chosen plan, retrieving the data from the database and returning the results to the user.
Example Let's say we have a database with two tables: Customers and Orders. Customers Table
Query Processing Steps: Parsing and Translation: The DBMS checks the query's syntax and converts it into an internal representation . 2. Optimization : The query optimizer considers different execution plans: The optimizer estimates the cost of each plan (based on factors like the size of the tables and the presence of indexes ) and chooses the most efficient one ( likely Plan 2 in this case).
3. Evaluation : The DBMS executes the chosen plan, retrieving the data and returning the result : Why is Query Optimization Important? Performance: Efficient query execution is crucial for fast response times, especially in large databases. Resource Usage : Optimization minimizes the consumption of CPU, memory, and I/O resources, making the database system more efficient overall.
2.1. Translating SQL Queries into Relational Algebra What is Relational Algebra ?(recall From FDB Ch6) Relational algebra is a procedural query language . This means that it describes how to get the data you want , step by step, unlike SQL which is declarative ( you describe what you want). Relational algebra operations take relations (tables) as inpu t and produce relations as output .
Key Relational Algebra Operations (with SQL equivalents): Selection (σ ): Filters rows based on a condition. (SQL: WHERE ) Projection (π): Selects specific columns . (SQL : SELECT) Union (∪): Combines two relations with the same schema (columns). ( SQL: UNION) Intersection (∩): Returns rows present in both relations. (SQL: INTERSECT ) Difference (-): Returns rows present in the first relation but not the second . (SQL : EXCEPT or MINUS ) Cartesian Product (×): Combines every row of the first relation with every row of the second . (SQL: Implicit in joins without a WHERE clause) Rename (ρ): Renames a relation or attribute. (SQL: AS) Join (⋈): Combines rows from two relations based on a related column. (SQL: JOIN, INNER JOIN, LEFT JOIN, RIGHT JOIN, etc.)
Example 1: Simple Selection and Projection
Example 2: Join
Example 3: Selection, Projection, and Join Relational Algebra: π c.Name , o.Product ( σ c.City = 'New York' AND o.Product = 'Laptop' (Customers ⋈ c.CustomerID = o.CustomerID Orders))
The SQL parser translates the SQL query into a relational algebra expression. This expression represents the steps needed to retrieve the data. The query optimizer then analyzes this relational algebra expression to find the most efficient way to execute it. It might rewrite the expression into an equivalent one that is faster to evaluate. Key Points: Relational algebra is procedural, SQL is declarative. The translation process is crucial for query optimization. Understanding relational algebra helps in understanding how database systems work internally .
2.2 Basic Algorithms for Executing Query Operations Query execution relies on algorithms for operations like selection, join, and projection. Key Algorithms: Selection: Linear Search: Scan every record (costly for large datasets). Index Search: Use indexes for faster retrieval. Join: Nested Loop Join: Compare each record of one table with all records of another (inefficient for large tables). Sort-Merge Join: Sort both tables and merge them. Hash Join: Use hash tables to match records. Projection: Remove unwanted columns and duplicate rows.
More Example : σ(c>3)R Write(draw) the output ? For the above relation, σ(c>3)R will select the tuples which have c more than 3.
1. Write selection fetch data for students with age more than 17. 2. Write selection that return tuples(rows) from table Student with information of male students, of age more than 17.(Consider the Student table has an attribute Gender too.) 2. σ age > 17 and gender = 'Male' (Student) 1 . σ age > 17 (Student) 3. σsubject = “information” and cost = “150” or year > “2015”(Novels) The output would be − Selecting tuples from the novels wherever the subject happens to be ‘information’ and the ‘price’ is 150 or those novels have been published after 2015.
What are Heuristics? Heuristics are rules of thumb or shortcuts that help in problem-solving. They don't guarantee the absolute best solution , but they often provide a good solution quickly . In query optimization, heuristics are used to guide the search for an efficient execution plan . Finding the absolute best plan is often computationally very expensive , so heuristics help find a good enough plan in a reasonable time. 2.3 Using Heuristics in Query Optimization
Example: Joining Two Tables imagine we have two tables: Customers and Orders . Customers: ( CustomerID , Name, City) Orders: ( OrderID , CustomerID , OrderDate , Amount) We want to find all customers in London who have placed orders over $1000 . The SQL query might look like this:
Query Optimizer's Choices: The query optimizer has several choices about how to execute this query . Here are a couple: Nested Loop Join: For each row in the Customers table, it loops through all rows in the Orders table to find matching CustomerIDs . Hash Join: It creates a hash table on the CustomerID column of one table (e.g., Orders) and then probes this hash table with the CustomerID values from the other table (Customers).
Heuristics in Action: Here are some examples of heuristics the query optimizer might use: Small Table First: If one table is significantly smaller than the other, the optimizer might choose to iterate over the smaller table in a nested loop join. This reduces the number of outer loop iterations . In our example, if Customers is smaller than Orders, it's often more efficient to loop through Customers . Use Indexes: If there's an index on CustomerID in the Orders table, the optimizer will likely use it for the join. Indexes are like the index of a book; they speed up lookups. Similarly, an index on City in the Customers table would speed up the WHERE c.City = 'London' filter . Filter Early: It's generally more efficient to apply filters (the WHERE clause) as early as possible. In our example, filtering the Customers table for City = 'London' before the join would reduce the number of rows involved in the join operation. Choose Appropriate Join Algorithm: The optimizer might choose a hash join if the tables are large enough and there's enough memory available. For smaller tables, a nested loop join might be faster due to lower overhead.
Common Heuristics: Push Selections Down: Apply selection operations as early as possible to reduce the dataset size. Combine Operations: Merge consecutive projections or selections. Reorder Joins: Arrange join operations to minimize intermediate results. SELECT name FROM Student WHERE age > 20 AND department = 'CS'; Optimized Plan: 1. Apply department = 'CS' first (fewer results). 2. Then apply age > 20. 3. Finally, project the name.
SQL queries are translated into relational algebra expressions to guide query execution. Practical SQL Command: sql – Create the Student table CREATE TABLE Student ( id INT PRIMARY KEY, name VARCHAR(50), age INT, department VARCHAR(50 )); Insert sample data INSERT INTO Student (id, name, age, department) VALUES (1, 'Alice', 22, 'CS'),(2, 'Bob', 19, 'Math'),(3, 'Charlie', 25, 'CS'),(4, 'Diana', 20, 'Biology'); Query : Retrieve names of students older than 20 SELECT name FROM Student WHERE age > 20; Relational Algebra: π_ name( σ_ age>20(Student))
Exercise: 15Min 1. Write SQL queries to: Retrieve all students from the CS department. Find students older than 21 and project their names and departments. 2. Convert the queries into relational algebra expressions.
Possible answer
2.4. Using Selectivity and Cost Estimates in Query Optimization Query optimization aims to find the most efficient way to execute a SQL query. Two key factors in this process are selectivity and cost estimation . 1. Selectivity Definition: Selectivity refers to the fraction of tuples in a relation that satisfy a given condition . It's a value between 0 and 1 . A selectivity of 0 means no tuples satisfy the condition, while a selectivity of 1 means all tuples satisfy it . Importance: Selectivity helps estimate the size of intermediate results produced by query operations ( selections, joins, etc .). This size estimation is crucial for cost estimation.
Example: Consider a customer table with 10,000 tuples( a record or row) . If the condition city = 'London' is true for 1,000 customers , the selectivity of this condition is 1,000 / 10,000 = 0.1 . How it's Used: The optimizer uses selectivity to estimate how many tuples will remain after a selection operation . For example, if we have SELECT * FROM customer WHERE city = 'London', the optimizer expects the result to have approximately 10,000 * 0.1 = 1,000 tuples.
2 . Cost Estimation Definition: Cost estimation involves predicting the resources (e.g., I/O operations, CPU time, memory) required to execute a query using a particular plan. Importance: The optimizer compares the estimated costs of different execution plans and chooses the plan with the lowest cost. Factors Affecting Cost: I/O operations: Reading data from disk is a major cost factor. CPU time: Processing tuples , performing comparisons, etc. Memory: Buffering data in memory. Network communication (for distributed databases): Transferring data between nodes .
Statistics: The DBMS maintains statistics about the data to estimate costs. These statistics include: Number of tuples in each relation. Distribution of attribute values (e.g., histograms ) . Existence and type of indexes . Cost Models: DBMSs use cost models to calculate the cost of different operations based on these statistics . The models are often simplified to make the optimization process efficient.
3. Example: Putting it Together Let's say we have two tables: customer (10,000 tuples) and orders (50,000 tuples). We want to execute the following query: The optimizer might consider these plans: Plan 1: Nested Loop Join: Iterate through each customer, and for each customer, scan the orders table to find matching orders. Plan 2: Index Join: If there's an index on customer.customer_id and orders.customer_id , use the indexes to speed up the join . Plan 3: Hash Join: Build a hash table on one table and probe it with the other.
To choose the best plan, the optimizer needs to estimate the cost of each plan. This involves: Estimating the size of intermediate results: Selectivity of c.city = 'London': Let's say it's 0.1 (1,000 customers in London). Selectivity of o.order_date = '2023-10-26': Let's say it's 0.01 (500 orders on that date) . Using these selectivities to estimate the number of tuples after the selections: Approximately 1,000 customers after the city = 'London' selection. Approximately 500 orders after the order_date = '2023-10-26' selection. Estimating the cost of each join algorithm based on the estimated sizes of the inputs. For example, the nested loop join cost is highly dependent on the size of the outer loop (which is affected by the selectivity of the selection on customer). Index joins and hash joins have different cost characteristics.
Exercises with Answers: Exercise: A table product has 100,000 tuples . The condition category = 'Electronics' has a selectivity of 0.05 . How many tuples are expected to satisfy this condition? Answer: 100,000 * 0.05 = 5,000 tuples Exercise: Two tables, authors (1,000 tuples) and books (5,000 tuples), are joined on author_id . The join condition is authors.author_id = books.author_id . Assume each author has written, on average, 5 books. What's a reasonable estimate for the size of the joined table ? Answer: Since each author has 5 books, the joined table would likely have around 5 * 1,000 = 5,000 tuples (one tuple per book).
2.5. Semantic Query Optimization Semantic query optimization (SQO) is a technique that uses knowledge about the database schema , integrity constraints, and other semantic information to transform or simplify queries, leading to more efficient execution. Unlike traditional query optimization, which focuses on the procedural aspects of query execution, SQO works at the logical level, leveraging the meaning of the data and the relationships between data elements. Key Concepts: Integrity Constraints: Rules that define valid data in the database. Examples include: Domain constraints: Restricting the values an attribute can hold (e.g., age must be non-negative). Referential integrity constraints: Ensuring relationships between tables are consistent (e.g., a foreign key must reference a valid primary key). Check constraints: User-defined rules (e.g., salary must be less than 100,000).
Semantic Information: Knowledge about the data and its meaning, which may not be explicitly captured by constraints. Query Transformation: Rewriting a query into an equivalent form that is more efficient to execute . How SQO Works: SQO uses logical reasoning and inference based on the available semantic information to : Simplify Queries: Remove redundant or unnecessary conditions. Transform Queries: Replace a query with an equivalent one that is faster to execute. Generate More Efficient Execution Plans : Guide the query optimizer towards better plans .
Examples: Simplifying a Query using a Domain Constraint: Schema: employee ( emp_id INT, age INT, salary DECIMAL) Domain Constraint: age >= 0 Query: SELECT * FROM employee WHERE age > -1; SQO: The optimizer recognizes that age > -1 is always true due to the domain constraint. The query can be simplified to SELECT * FROM employee; Using Referential Integrity: Schema: customer ( cust_id INT PRIMARY KEY, name VARCHAR), orders ( order_id INT PRIMARY KEY, cust_id INT FOREIGN KEY referencing customer) Query: SELECT c.name FROM customer c WHERE c.cust_id IN (SELECT o.cust_id FROM orders o); SQO: The optimizer knows that any cust_id found in the orders table must also exist in the customer table due to referential integrity. The subquery can be removed: SELECT c.name FROM customer c; Using Check Constraints: Schema: product ( prod_id INT, price DECIMAL, discount DECIMAL) Check Constraint: price >= 0 AND discount >= 0 AND discount <= price Query: SELECT * FROM product WHERE price < 0 OR discount > price; SQO: The query condition will never be true due to the check constraint. The query can be replaced with an empty result set. Combining Conditions: Schema: student ( stud_id INT, major VARCHAR, gpa DECIMAL) Query: SELECT * FROM student WHERE major = 'Computer Science' AND major = 'Engineering'; SQO: The query condition is logically impossible. The optimizer can return an empty result set without accessing the table.
Benefits of SQO: Improved Query Performance: By simplifying and transforming queries, SQO reduces the amount of work the DBMS needs to do. Reduced I/O: Fewer tuples need to be accessed and processed. Faster Response Times: Queries execute more quickly. Challenges of SQO: Complexity: Implementing SQO can be complex, as it requires reasoning about the semantics of the data and the relationships between data elements. Maintaining Semantic Information: The DBMS needs to maintain accurate and up-to-date semantic information (constraints, etc.) for SQO to be effective.
1. Create Db called AdvCs 3rd
SELECT c.CustomerName , o.OrderID , o.OrderDate FROM Orders o LEFT JOIN Customers c ON o.CustomerID = c.CustomerID ; -- Equivalent to RIGHT JOIN in many databases if RIGHT JOIN is not available .