Deductive databases

7,492 views 30 slides May 15, 2018
Slide 1
Slide 1 of 30
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30

About This Presentation

this is the content for advance database management system.


Slide Content

Deductive Databases ADBMS By: Dabbal S. Mahara 2017

Deduction and Induction Deductive: towards the consequences All swans are white. Tessa is a swan.  Tessa is white. Inductive : towards a generalisation of observations Joe and Lisa and Tex and Wili and ... (all observed swans) are swans. Joe and Lisa and Tex and Wili and ... (all observed swans) are white.  All swans are white. 2

Deduction Human beings have the interesting ability to derive facts from a set of data, even though these facts are not explicitly represented in the data. That is, given appropriate information, humans can deduce new information by applying rules . For example, given a list of people and their parents, we could deduce their grandparents, great-grandparents, and so on, despite the fact that these facts were not explicitly represented in the original data. What we have done is to use a rule: “the parent of a parent is a grandparent (and so on ).” 3

Conventional Databases In very general terms, a conventional database consists of a collection of facts. Using some form of query language, these facts may be accessed and manipulated as required. However , facts must be explicitly stored in the database for them to be of any use. If a fact is not explicitly represented in the database, then as far as the database is concerned, that fact effectively does not exist. 4

Conventional Database To clarify this idea of implicit data representation, consider the following example. Suppose we have a conventional database that contains the two facts: “ John is Bill’s father .” “Bill is Derek’s father.” Now most people will quickly realise that John is also Derek’s grandfather (assuming, of course, that the two Bills are the same person). In other words, this fact is implicit in the two existing facts. However, the fact “John is Derek’s grandfather” is not stored in the database, so as far as the database is concerned, there is no relationship whatsoever between John and Derek. 5

Deductive Database A deductive database contains not only facts, but also general rules. These rules can be used to deduce new facts that are not explicitly represented in the database; that is, data can be stored implicitly . Now consider the following deductive database: “John is Bill’s father.” “Bill is Derek’s father.” “IF x is y’s father AND y is z’s father, THEN x is z’s grandfather.” In addition to the two facts, this database also contains a rule that specifies the grandfather relationship between a pair of entities. The database can prove the statement “John is Derek’s grandfather” by making the following substitution into the rule: x = “John”, y = “Bill” and z = “Derek”. 6

Deductive Database The effect of rules of this form is to define new relations that are not explicitly represented in the database. These are known as implicit or virtual relations. Explicit relations, like the father relation above, are known as base relations. The set of virtual relations is called the intensional database ( IDB ), while the set of base relations is called the extensional database ( EDB ). Deductive database systems are based on first-order logic. First-order logic allows us to express both facts and rules about the facts using the same syntax. 7

First Order Logic The first-order predicate calculus is one particular language for expressing logical statements. For example, the representation of the database above in first-order logic would be : → father(John, Bill) → father(Bill, Derek) ( ∀x ∀y ∀z)(father(x, y) ∧ father(y, z) → grandfather(x, z)) where ∧, → and ∀ represent conjunction, implication and the quantifier “for all” respectively. 8

Deductive Database A deductive database can be defined as an advanced database augmented with an inference system. Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism. The immediate advantage of this is that it can potentially reduce the space needed to store data Database + Inference Deductive database 9

Deductive Database In a deductive database system we typically specify rules through a declarative language —a language in which we specify what to achieve rather than how to achieve it . An inference engine (or deduction mechanism ) within the system can deduce new facts from the database by interpreting these rules. Deductive databases have not found widespread adoptions outside academia, but some of their concepts are used in todays relational databases to support the advanced features of more recent SQL standards. 10

Why Deductive DBs? This has the advantage that some data can be stored implicitly using rules, rather than explicitly . This reduces the amount of storage the database occupies. The use of rules also allows us to store new kinds of data, such as recursive data. I t incorporates the features and power of a logic programming language (e.g. Prolog). 11

Deductive Database Expert systems work in a similar way to deductive databases, using rules to deduce information. A deductive database is oriented towards storing and manipulating large amounts of data, whereas an expert system stores and manipulates large numbers of rules . These rules are usually expressed in the database query language Datalog because of its simplicity and readability . 12

The deductive database work based on logic has used Prolog as a starting point. A variation of Prolog called Datalog is used to define rules declaratively in conjunction with an existing set of relations , which are themselves treated as literals in the language. A deductive database uses two main types of specifications: facts and rules . Facts can be considered as the data stored as relations in a relational database. Facts are specified in a manner similar to the way relations are specified, except that it is not necessary to include the attribute names. Rules are somewhat similar to relational views . They specify virtual relations that are not actually stored but that can be formed from the facts by applying inference mechanisms based on the rule specifications. 13 Specification of DDB

14 Prolog/ Datalog Notation The notation used in Prolog/ Datalog is based on providing predicates with unique names A predicate has an implicit meaning, which is suggested by the predicate name, and a fixed number of arguments If the arguments are all constant values, the predicate simply states that a certain fact is true If the predicate has variables as arguments, it is either considered as a query or as part of a rule or constraint All constant values in a predicate are either numeric or character strings starting with lowercase letters, whereas variable names always start with an uppercase letter

Datalog Notation A program is built from basic objects called atomic formulas Atomic formulas are literals of the form p( a 1 , a 2 , …, a n ), where p is a predicate name and n is the number of arguments for predicate p Different predicate symbols can have different number of arguments, and the number of arguments n of predicate p is sometimes called the arity or degree of p The arguments can be either constant values or variable names; Constant values are either numeric or character strings starting with lowercase letters, whereas variable names always start with an uppercase letter

16 Prolog/ Datalog Notation Fig: (a) Prolog notation (b) The supervisory tree

Basic inference mechanism for logic programs 17 I nterpretation of programs (rules + facts ) There are two main alternatives for interpreting the theoretical meaning of rules : proof theoretic , and model theoretic interpretation proof theoretic interpretation: In the proof-theoretic interpretation of rules, we consider the facts and rules to be true statements, or axioms . Ground axioms contain no variables. The facts are ground axioms that are considered to be true statements . Rules are deductive axioms since they are used to construct proofs that derive new facts from existing facts . The example given next shows how to prove the fact SUPERIOR( james , ahmad )

18 Example : 1. superior(X, Y)  supervise(X, Y). (rule 1) 2. superior (X, Y)  supervise(X, Z), superior (Z, Y) . (rule 2) 3. supervise( jennifer , ahmad ). (ground axiom, given) 4. supervise( james , jennifer ). (ground axiom, given) 5. superior( jennifer , ahmad ). (apply rule 1 on 3) 6. superior( james , ahmad ). (apply rule 2 on 4 and 5)

Given a finite or an infinite domain of constant values, assign to each predicate in the program every possible combination of values as arguments. We must then determine whether the predicate is true or false. In general, it is sufficient to specify the combinations of arguments that make the predicate true, and to state that all other combinations make the predicate false. If this is done for every predicate, it is called an interpretation of the set of predicates . For example, consider the interpretation shown in Figure given next for the predicates SUPERVISE and SUPERIOR. This interpretation assigns a truth value (true or false) to every possible combination of argument values (from a finite domain) for the two predicates. 19 Model theoretic interpretation

Model theoretic interpretation 20 An interpretation is called a model for a specific set of rules if those rules are always true under that interpretation ; that is, for any values assigned to the variables in the rules , the head of the rules is true when we substitute the truth values assigned to the predicates in the body of the rule by that interpretation.

21 Example: 1. superior(X, Y)  supervise(X, Y). (rule 1) 2. superior (X, Y)  supervise(X, Z), superior (Z, Y) . (rule 2) known facts: supervise(franklin, john), supervise(franklin, ramesh ), supervise(franklin, joyce ), supervise( james , franklin), supervise( jennifer , alicia ), supervise( jennifer , ahmad ), supervise( james , jennifer ). For all other possible (X, Y) combinations supervise(X, Y) is false . domain = { james , franklin, john, ramesh , joyce , jennifer , alicia , ahmad }

22 The above interpretation is also a model for the rules (1) and (2) since each of them evaluates always to true under the interpretation. For example, superior(X, Y)  supervise( X, Y ) superior(franklin, john)  supervise( franklin, john ) is true . superior(franklin, ramesh )  supervise( franklin, ramesh ) is true . ... … superior (X, Y)  supervise(X, Z), superior (Z, Y) superior ( james , ramesh )  supervise( james , franklin ), superior (franklin, ramesh ) is true . superior ( james , alicia )  supervise( james , jennifer ), superior ( jennifer , alicia ) is true .

Inference mechanism 23 In general, there are two approaches to evaluating logical programs: bottom-up and top-down . Bottom-up mechanism ( also called forward chaining and bottom-up resolution ) The inference engine starts with the facts and applies the rules to generate new facts. That is, the inference moves forward from the facts toward the goal . As facts are generated, they are checked against the query predicate goal for a match.

24 - Example query goal: superior( james , Y)? rules and facts are given as above. Check whether any of the existing facts directly matches the query . Apply the first rule to the existing facts to generate new facts. Apply the second rule to the existing facts to generate new facts . As each fact is generated , it is checked for a match of the the query goal. Repeat step 1 - 4 until no more new facts can be found. All the facts of the form: superior( james , a) are the answers.

25 Example : superior(X , Y)  supervise(X, Y). (rule 1) superior (X , Y)  supervise(X, Z), superior (Z, Y) . (rule 2) known facts: supervise(franklin, john), supervise(franklin, ramesh ), supervise(franklin, joyce ), supervise( james , franklin), supervise( jennifer , alicia ), supervise( jennifer , ahmad ), supervise( james , jennifer ). For all other possible (X, Y) combinations supervise(X, Y) is false . domain = { james , franklin, john, ramesh , joyce , jennifer , alicia , ahmad } superior( james , Y)? applying the first rule: superior( james , franklin), superior( james , jennifer ) Y = {franklin, jennifer } applying the second rule: Y = {John, Joyce, Ramesh, alicia , ahmad }

Top-down mechanism 26 ( also called back chaining and top-down resolution ) The inference engine starts with the query goal and attempts to find matches to the variables that lead to valid facts in the database. That is, the inference moves backward from the intended goal to determine facts that would satisfy the goal . During the course, the rules are used to generate subgoals . The matching of these subgoals will lead to the match of the intended goal.

27 -Example query goal: ?-superior( james , Y) rules and facts are given as above. Query: ?- superior( james , Y) Rule1: superior( james , Y)  supervise( james , Y) Rule2: superior( james , Y)  supervise( james , Z), superior(Z, Y) supervise( james , Z) superior( franklin , Y) superior( jennifer , Y) Y=franklin, jennifer Z=frankiln Z=jennifer Rule1: superior(franklin, Y)  supervise(franklin, Y) Rule1: superior( jennifer , Y)  supervise( jennifer , Y) Y= john, ramesh, joyce Y= alicia, ahmad

28

29

Thank You ! 30