INFERENCE IN FIRST-ORDER LOGIC Artificial Intelligence, A Modern Approach, CH.9 2017.04.28(Fri) Junya Tanaka (M1)
Definitions
Sound: an inference procedure derives only true conclusions given true premises.
Complete: an inference procedure derives all true conclusions from a set of premises.
Substitution: SUBST({var_name/value}, sentence) applies a binding to a sentence.
Example: P = Animal(x); SUBST({x/Panda}, P) = Animal(Panda).
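To make SUBST concrete, here is a minimal Python sketch. The representation is my own, not from the slides: variables are lowercase strings, compound terms are tuples.

def is_variable(t):
    return isinstance(t, str) and t[0].islower()

def subst(theta, sentence):
    # Apply substitution theta (a dict of variable -> term) to a sentence.
    if is_variable(sentence):
        return theta.get(sentence, sentence)
    if isinstance(sentence, tuple):        # compound term, e.g. ('Animal', 'x')
        return tuple(subst(theta, part) for part in sentence)
    return sentence                        # constant symbol, e.g. 'Panda'

# SUBST({x/Panda}, Animal(x)) = Animal(Panda)
print(subst({"x": "Panda"}, ("Animal", "x")))   # ('Animal', 'Panda')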
Introduction
Chapter 7 showed how a knowledge-based agent could represent the world in which it operates and deduce what actions to take.
Chapter 8 examined first-order logic, which is sufficiently expressive to represent a good deal of our common-sense knowledge.
The goal of this chapter is to develop effective procedures for answering questions posed in first-order logic.
Agenda
9.1 Propositional vs. First-Order Inference
9.2 Unification and Lifting
9.3 Forward Chaining
9.4 Backward Chaining
9.5 Resolution
9.6 Summary
9.1 Propositional vs. First-Order Inference
This section and the next introduce the ideas underlying modern logical inference systems: simple inference rules that can be applied to sentences with quantifiers to obtain sentences without quantifiers.
Inference rules for quantifiers:
Universal quantifier: a logical constant interpreted as "given any" or "for all".
Existential quantifier: a logical constant interpreted as "there exists", "there is at least one", or "for some".
Universal Instantiation (UI)
We can infer any sentence obtained by substituting a ground term (a term without variables) for a universally quantified variable.
Example: if ∀x Person(x) (every x is a person), then Person(John), Person(Jack), and so on are all true, so substituting any ground term for x is legal.
Let SUBST(θ, α) denote the result of applying the substitution θ to the sentence α. Then the rule is written
∀v α ⊢ SUBST({v/g}, α)
for any variable v and ground term g.
Example
Suppose our knowledge base contains the standard folkloric axiom stating that all greedy kings are evil:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
Then it seems quite permissible to infer any of the following sentences:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John))
Existential Instantiation (EI)
An existential sentence says there is some object satisfying a condition, and applying the existential instantiation rule just gives a name to that object.
For any sentence α, variable v, and constant symbol k that does not appear elsewhere in the knowledge base,
∃v α ⊢ SUBST({v/k}, α)
Intuition: there is "some" object satisfying the condition, so we cannot substitute the variable arbitrarily. Instead, we use a new constant C and assume that C satisfies the condition.
Example
From the sentence ∃x Crown(x) ∧ OnHead(x, John) we can infer the sentence Crown(C1) ∧ OnHead(C1, John), as long as C1 does not appear elsewhere in the knowledge base.
Such a new constant symbol C1 is called a Skolem constant.
Existential Instantiation is a special case of a more general process called Skolemization.
UI and EI
UI can be applied several times to add new sentences; the new KB is logically equivalent to the old. Given that everything satisfies the sentence, instantiating it with anything is valid.
EI can be applied only once, to replace the existential sentence; the new KB is not equivalent to the old, but it is satisfiable iff the old KB was satisfiable.
E.g. from ∃x Killer(x, Victim) we get Killer(Murderer, Victim), and the original sentence with ∃ is discarded.
The new KB is not logically equivalent to the old one, since the original sentence was replaced, but the two KBs are inferentially equivalent.
Reduction to propositional inference
Once we have rules for inferring nonquantified sentences from quantified sentences, it becomes possible to reduce first-order inference to propositional inference.
The first idea is that, just as an existentially quantified sentence can be replaced by one instantiation, a universally quantified sentence can be replaced by the set of all its possible instantiations.
Example
Suppose the KB contains just the following:
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(John)
Brother(Richard, John)
Instantiating the universal sentence in all possible ways, we have:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(John)
Greedy(John)
Brother(Richard, John)
The new KB is propositionalized: the proposition symbols are King(John), Greedy(John), Evil(John), King(Richard), etc.
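As an illustration, a small Python sketch of this propositionalization step. Encoding the rule as a format string is a simplification of my own, not the book's representation.

from itertools import product

def propositionalize(variables, rule_template, constants):
    # Instantiate a universally quantified rule with every combination
    # of known constants substituted for its variables.
    for combo in product(constants, repeat=len(variables)):
        yield rule_template.format(**dict(zip(variables, combo)))

rule = "King({x}) & Greedy({x}) => Evil({x})"
for ground_rule in propositionalize(["x"], rule, ["John", "Richard"]):
    print(ground_rule)
# King(John) & Greedy(John) => Evil(John)
# King(Richard) & Greedy(Richard) => Evil(Richard)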
Problems
Problem 1: with function symbols there are infinitely many ground terms, e.g. Father(Father(Father(John))).
Problem 2: we can check entailment of a sentence (by Modus Ponens, resolution, etc.), but we cannot in general check non-entailment: infinitely nested terms can send the procedure into an infinite loop.
9.2 Unification and Lifting
The propositionalization approach is rather inefficient.
Example: inferring Evil(John) from the sentences
∀x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(John)
seems completely obvious to a human being, yet propositionalization must first generate every instantiation.
First-order inference rule: Generalized Modus Ponens
It can be summarized as "p implies q, and p is asserted to be true, therefore q must be true", lifted to first-order sentences.
For atomic sentences pi, pi′, and q, where there is a substitution θ such that SUBST(θ, pi′) = SUBST(θ, pi) for all i:
p1′, p2′, …, pn′, (p1 ∧ p2 ∧ … ∧ pn ⇒ q) ⊢ SUBST(θ, q)
Example of Generalized Modus Ponens
Use the rule that greedy kings are evil: find some x such that x is a king and x is greedy, and then infer that this x is evil.
The substitution θ = {x/John, y/John} applied to the implication premises King(x) and Greedy(x) and to the knowledge-base sentences King(John) and Greedy(y) makes them identical:
p1′ is King(John), p2′ is Greedy(y)
p1 is King(x), p2 is Greedy(x)
θ is {x/John, y/John}, q is Evil(x)
SUBST(θ, q) is Evil(John)
Unification
We can get the inference immediately if we can find a substitution θ such that King(x) and Greedy(x) match King(John) and Greedy(y); θ = {x/John, y/John} works.
Unification finds substitutions that make different logical expressions look identical.
UNIFY takes two sentences and returns a unifier for them, if one exists:
UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q)
Basically, find a θ that makes the two clauses look alike.
Example of Unification
UNIFY(Knows(John, x), Knows(John, Jane)) = {x/Jane}
UNIFY(Knows(John, x), Knows(y, Bill)) = {x/Bill, y/John}
UNIFY(Knows(John, x), Knows(y, Mother(y))) = {y/John, x/Mother(John)}
UNIFY(Knows(John, x), Knows(x, Elizabeth)) = fail
The last example fails because x would have to be both John and Elizabeth.
We can avoid this problem by standardizing apart: renaming the variables of one sentence so that the two share none. The two statements now read
UNIFY(Knows(John, x), Knows(z, Elizabeth))
This is solvable: UNIFY(Knows(John, x), Knows(z, Elizabeth)) = {x/Elizabeth, z/John}
Unification
To unify Knows(John, x) and Knows(y, z) we can use θ = {y/John, x/z} or θ = {y/John, x/John, z/John}.
The first unifier is more general than the second: it places fewer restrictions on its variables.
For every unifiable pair of expressions there is a single most general unifier (MGU), unique up to renaming of variables: here MGU = {y/John, x/z}.
Unification Algorithm (the UNIFY pseudocode figure from AIMA; not reproduced here)
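Below is a rough Python transcription of that algorithm, reusing is_variable() and the term representation from the SUBST sketch above. It is a sketch in the spirit of the book's pseudocode, not the exact code.

def unify(x, y, theta):
    # Return a substitution (dict) unifying x and y, extending theta,
    # or None on failure. Terms: constants ('John'), variables ('x'),
    # compound terms as tuples, e.g. ('Knows', 'John', 'x').
    if theta is None:
        return None
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
        return theta
    return None

def unify_var(var, t, theta):
    if var in theta:
        return unify(theta[var], t, theta)
    if is_variable(t) and t in theta:
        return unify(var, theta[t], theta)
    if occurs(var, t, theta):              # occur check
        return None
    return {**theta, var: t}

def occurs(var, t, theta):
    # Does var occur in t, after following existing bindings in theta?
    if var == t:
        return True
    if is_variable(t) and t in theta:
        return occurs(var, theta[t], theta)
    return isinstance(t, tuple) and any(occurs(var, ti, theta) for ti in t)

print(unify(("Knows", "John", "x"), ("Knows", "y", ("Mother", "y")), {}))
# {'y': 'John', 'x': ('Mother', 'y')} -- composing the bindings gives x/Mother(John)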
Storage and Retrieval
We use TELL and ASK to interact with the inference engine; they are implemented with STORE and FETCH.
STORE(s) stores sentence s in the knowledge base.
FETCH(q) returns all unifiers such that the query q unifies with some sentence in the knowledge base.
Example: q = Knows(John, x), and the KB contains Knows(John, Jane), Knows(y, Bill), Knows(y, Mother(y)).
Result: θ1 = {x/Jane}, θ2 = {y/John, x/Bill}, θ3 = {y/John, x/Mother(John)}
Storage and Retrieval
First approach: create one long list of all facts in the knowledge base and attempt unification with every one of them. This works, but is inefficient.
We need to restrict unification attempts to sentences that have some chance of unifying, so we index the facts in the knowledge base.
Predicate indexing: put all "Knows" sentences in one bucket, all "Loves" sentences in another, and so on; a sketch follows. This can be refined with a subsumption lattice (see the next slide).
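A minimal sketch of predicate indexing; the STORE/FETCH names follow the slide, while the dict-of-buckets layout is my own. It reuses unify() from the sketch above.

from collections import defaultdict

kb_index = defaultdict(list)

def store(fact):
    # File each fact under its predicate symbol, e.g. 'Knows'.
    kb_index[fact[0]].append(fact)

def fetch(query):
    # Only attempt unification against the bucket for the query's predicate.
    return [theta for fact in kb_index[query[0]]
            if (theta := unify(query, fact, {})) is not None]

store(("Knows", "John", "Jane"))
store(("Loves", "John", "Mary"))
print(fetch(("Knows", "John", "x")))   # [{'x': 'Jane'}]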
Storage and Retrieval
Subsumption lattice
A child is obtained from its parent through a single substitution, and the lattice contains all queries that can be unified with the stored fact.
This works well for small lattices, but a predicate with n arguments has a lattice of size O(2^n), and the structure of the lattice depends on whether the fact contains repeated variables.
9.3 Forward Chaining
Forward chaining is one of the two main methods of reasoning with an inference engine; it can be described logically as repeated rule application.
Idea:
Start with the atomic sentences in the KB.
Apply Generalized Modus Ponens and add the new atomic sentences, until no further inferences can be made.
This works well for a KB consisting of situation ⇒ response clauses when processing newly arrived data.
First-order definite clauses
Definite clauses are disjunctions of literals of which exactly one is positive.
Examples: King(x) ∧ Greedy(x) ⇒ Evil(x), King(John), Greedy(y).
First-order definite clauses can include variables, which are assumed to be universally quantified: Greedy(y) means ∀y Greedy(y).
Not every KB can be converted into a set of first-order definite clauses.
Example (1/2)
Consider the following problem:
The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American.
Prove that West is a criminal.
Example (2/2)
It is a crime for an American to sell weapons to hostile nations:
American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)
Nono has some missiles, i.e. ∃x Owns(Nono, x) ∧ Missile(x); by EI:
Owns(Nono, M1) and Missile(M1)
All of its missiles were sold to it by Colonel West:
Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)
Missiles are weapons:
Missile(x) ⇒ Weapon(x)
An enemy of America counts as "hostile":
Enemy(x, America) ⇒ Hostile(x)
West, who is American:
American(West)
The country Nono, an enemy of America:
Enemy(Nono, America)
Simple forward-chaining algorithm (the FOL-FC-ASK pseudocode figure from AIMA; not reproduced here)
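Below is a naive Python sketch in the spirit of FOL-FC-ASK, reusing unify() and subst() from the earlier sketches. The brute-force matching of premises against all fact tuples is a deliberate simplification of my own, not the book's algorithm.

from itertools import product

def forward_chain(rules, facts):
    # rules: list of (premises, conclusion); facts: set of ground tuples.
    facts = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            # Try every assignment of known facts to the premises.
            for chosen in product(facts, repeat=len(premises)):
                theta = {}
                for p, f in zip(premises, chosen):
                    theta = unify(p, f, theta)
                    if theta is None:
                        break
                if theta is not None:
                    q = subst(theta, conclusion)
                    if q not in facts and q not in new:
                        new.add(q)
        if not new:                  # fixed point: no further inferences
            return facts
        facts |= new

rules = [((("Missile", "x"),), ("Weapon", "x")),
         ((("Enemy", "x", "America"),), ("Hostile", "x"))]
facts = {("Missile", "M1"), ("Enemy", "Nono", "America")}
# forward_chain(rules, facts) also contains ('Weapon', 'M1') and ('Hostile', 'Nono')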
Proof of forward chaining (the three proof-tree figures are omitted): starting from the given facts, the first iteration adds Weapon(M1), Sells(West, M1, Nono), and Hostile(Nono); the second iteration matches the crime rule and adds Criminal(West), completing the proof.
Properties of forward chaining
Sound and complete for first-order definite clauses.
Datalog = first-order definite clauses with no function symbols; FC terminates for Datalog in a finite number of iterations.
FC may not terminate in general if α is not entailed. This is unavoidable: entailment with definite clauses is semidecidable.
Efficiency of forward chaining
Incremental forward chaining: there is no need to match a rule on iteration k unless one of its premises was newly added on iteration k-1, so we match each rule only against newly added positive literals.
Matching itself can be expensive: database indexing allows O(1) retrieval of known facts, e.g. the query Missile(x) retrieves Missile(M1).
Forward chaining is widely used in deductive databases.
The two major sources of complexity, detailed below, are pattern matching and repeated matching across iterations.
Pattern matching
Finding all possible unifiers can consume a large amount of time. Consider
Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono)
and assume that Nono owns 1000 objects but only 3 of them are missiles. If we look for substitutions for x by scanning the objects owned by Nono, it takes a long time to find a missile; if we instead start from the x's that are missiles, we check only 3 candidates, which saves a great amount of computation.
This is the "conjunct ordering" problem: find a good order in which to solve the conjuncts of the rule premise. It is NP-hard, but good heuristics are available, as in the sketch below.
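A minimal sketch of one such heuristic (most-constrained conjunct first), reusing unify(); the counting strategy is a common greedy choice, not a method named by the slides.

def order_conjuncts(premises, facts):
    # Count how many stored facts each premise unifies with, and try the
    # most selective premise (fewest matches) first.
    return sorted(premises, key=lambda p: sum(
        1 for f in facts if unify(p, f, {}) is not None))

facts = {("Missile", "M1")} | {("Owns", "Nono", f"O{i}") for i in range(1000)}
print(order_conjuncts([("Owns", "Nono", "x"), ("Missile", "x")], facts))
# [('Missile', 'x'), ('Owns', 'Nono', 'x')] -- check the one missile first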
Example: map-coloring problem
Given a map, find a way to paint each region (node) so that adjacent regions do not share the same color. Colorable() means the constraints are satisfiable:
Diff(wa, nt) ∧ Diff(wa, sa) ∧ Diff(nt, q) ∧ Diff(nt, sa) ∧ Diff(q, nsw) ∧ Diff(q, sa) ∧ Diff(nsw, v) ∧ Diff(nsw, sa) ∧ Diff(v, sa) ⇒ Colorable()
Diff(Red, Blue), Diff(Red, Green), Diff(Green, Red), Diff(Green, Blue), Diff(Blue, Red), Diff(Blue, Green)
Matching the premise of this rule against the Diff facts amounts to solving the coloring problem itself, which illustrates why pattern matching can be NP-hard.
Incremental forward chaining
On each iteration, plain forward chaining matches every rule in the knowledge base (not shown in the earlier slides). E.g. Missile(x) ⇒ Weapon(x) is matched with Missile(M1) on every iteration, re-deriving Weapon(M1), which is already in the KB.
Observation: every fact inferred on iteration t must use at least one fact derived on iteration t-1; otherwise it could have been inferred sooner.
Solution: at iteration t, consider a rule only if its premise contains a conjunct that unifies with a fact newly derived at iteration t-1. The premise just needs to contain such a conjunct; not all of it has to be new.
Example of incremental forward chaining
Suppose we derived the fact P(John) at the 5th iteration, and the KB contains the following rules:
(1) P(x) ∧ Q(Jane) ⇒ AA
(2) K(x) ∧ M(x, y) ∧ N(x) ∧ O(x, z) ⇒ BB
(3) P(Jack) ∧ N(Jake) ⇒ CC
At the 6th iteration there is no point in checking rules (2) and (3): (2) does not mention P at all, and (3) cannot unify with P(John) since its P is applied to the constant Jack. And at the 7th iteration we no longer match P(John) with (1), because it is no longer new and we would keep re-deriving AA. (A sketch of this trigger check follows.)
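A minimal sketch of the trigger check, reusing unify(); the function name is illustrative.

def triggered(premises, new_facts):
    # A rule is worth re-matching only if some premise unifies with a
    # fact newly derived on the previous iteration.
    return any(unify(p, f, {}) is not None
               for p in premises for f in new_facts)

new_facts = {("P", "John")}
print(triggered([("P", "x"), ("Q", "Jane")], new_facts))    # True  -- rule (1)
print(triggered([("K", "x"), ("M", "x", "y")], new_facts))  # False -- rule (2)
print(triggered([("P", "Jack"), ("N", "Jake")], new_facts)) # False -- rule (3)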
9.4 Backward Chaining
Backward chaining is a "generator": it generates all possible substitutions that prove the goal. It is depth-first.
Idea:
Start from the goal.
Match each literal in a rule's premise, from left to right, against facts in the knowledge base.
Create a new recursive call for each subgoal; if the call succeeds, the proof of that branch is complete.
If the current goal is completed, continue with the next sentence that can lead to the goal.
Keep in mind that the goal may be reachable through more than one rule. (A Python sketch follows the proof-tree slides.)
Backward chaining (the step-by-step proof-tree figures are omitted): starting from the goal Criminal(West), the crime rule generates the subgoals American(x), Weapon(y), Sells(x, y, z), and Hostile(z); solving American(x) binds x to West, and the remaining subgoals are expanded depth-first, left to right, until every leaf is matched by a known fact.
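A compact Python sketch of this generator-style backward chaining (FOL-BC-ASK flavour), reusing unify() and subst(). Note that a real implementation must standardize each rule's variables apart on every use; this sketch omits that for brevity.

def back_chain(goals, theta):
    # goals: list of literals still to prove; theta: bindings so far.
    if not goals:
        yield theta
        return
    first, rest = subst(theta, goals[0]), goals[1:]
    for fact in facts:                     # try known facts
        t2 = unify(first, fact, dict(theta))
        if t2 is not None:
            yield from back_chain(rest, t2)
    for premises, conclusion in rules:     # try rules whose conclusion matches
        t2 = unify(first, conclusion, dict(theta))
        if t2 is not None:
            # Prepend the rule's premises as new subgoals (depth-first).
            yield from back_chain(list(premises) + rest, t2)

facts = {("Missile", "M1"), ("Owns", "Nono", "M1")}
rules = [((("Missile", "x"),), ("Weapon", "x"))]
print(list(back_chain([("Weapon", "w")], {})))
# [{'w': 'x', 'x': 'M1'}] -- composing the bindings gives w = M1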
Properties of backward chaining
Depth-first recursive proof search.
Incomplete due to infinite loops; fixed by checking the current goal against every goal already on the stack.
Inefficient due to repeated subgoals; fixed by caching previous results.
Nevertheless, backward chaining is widely used even without these improvements.
Logic programming
Algorithm = Logic + Control (Robert Kowalski's equation).
Declare the problem in logical form, then query the desired statement; the program finds the solution to the query based on the knowledge base.
Prolog is a logic programming language that uses backward chaining as its inference algorithm and makes the closed-world assumption.
Prolog: redundant inference and infinite loops
Prolog is trapped in infinite loops easily because of its depth-first search.
Let path(X, Z) mean "a path exists between points X and Z" and link(X, Z) mean "X is directly connected to Z" (adjacent nodes). To see whether a path between two distant nodes exists, we query path(n1, n2).
We model the rules recursively in Prolog syntax (Prolog variables are capitalized):
path(X, Z) :- link(X, Z).
path(X, Z) :- path(X, Y), link(Y, Z).
Prolog: redundant inference and infinite loops
path(X, Z) :- link(X, Z).
Case 1: the two nodes are directly connected.
path(X, Z) :- path(X, Y), link(Y, Z).
Case 2: the two nodes are connected by some path from X to a node Y that is adjacent to Z.
In FOL style this reads:
[link(x, z)] ∨ [path(x, y) ∧ link(y, z)] ⇒ path(x, z)
Prolog
If implemented badly, e.g. with the recursive clause listed before the base case, Prolog keeps calling path on itself deeper and deeper in search of a path, as the picture showed, and never terminates.
Constraint logic programming (CLP)
Normally a logic program (e.g. Prolog) can return only a finite number of answers at a time. What if there are infinitely many answers?
Example: x ≥ 0 ∧ y ≥ 0 ∧ z ≥ 0 ∧ x+y ≥ z ∧ y+z ≥ x ∧ x+z ≥ y ⇒ Triangle(x, y, z)
For Prolog, Triangle(3, 4, 5) is true, but Triangle(3, 4, z) fails because Prolog cannot handle the unbound constraint z ≥ 0. CLP can handle this: Triangle(3, 4, z) yields the answer 1 ≤ z ≤ 7.
CLP was invented for constraint problem solving; an answer is correct as long as it satisfies the constraints.
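A real CLP engine propagates these constraints symbolically; the following Python sketch only reproduces the interval arithmetic behind the answer for Triangle(3, 4, z). The function name is illustrative.

def triangle_bounds(x, y):
    # z must satisfy z >= 0, x + y >= z, x + z >= y, y + z >= x.
    lo = max(0, x - y, y - x)
    hi = x + y
    return lo, hi

print(triangle_bounds(3, 4))   # (1, 7), i.e. 1 <= z <= 7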
9.5 Resolution
Conjunctive normal form for first-order logic
As in propositional logic, the resolution algorithm works on sentences in conjunctive normal form (CNF).
Every first-order logic sentence can be converted into an inferentially equivalent CNF sentence.
Conversion to conjunctive normal form
Everyone who loves all animals is loved by someone:
∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]
1. Eliminate implications and biconditionals (if any):
∀x [¬∀y (¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
2. Move ¬ inwards (¬∀x p ≡ ∃x ¬p, ¬∃x p ≡ ∀x ¬p):
∀x [∃y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
∀x [∃y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
3. Standardize variables apart:
Notice that there are two y's in the sentence, but they refer to different objects: the first is an animal, the second is the person who loves x. To avoid confusion, we rename the second y:
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃z Loves(z, x)]
4. Skolemize
Skolemization removes the existential quantifiers. However, we cannot simply use Skolem constants as before:
∀x [Animal(A) ∧ ¬Loves(x, A)] ∨ [Loves(B, x)]
would say that everyone either fails to love one "particular" animal A or is loved by one "particular" person B, which changes the meaning of the sentence. Instead, we must use a Skolem function whose value depends on x:
∀x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ [Loves(G(x), x)]
The arguments of a Skolem function are all the universally quantified variables in whose scope the existential quantifier appears, so the dependence on x is preserved.
5. Drop universal quantifiers
Since every remaining variable is universally quantified, dropping the quantifiers is legal:
[Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ [Loves(G(x), x)]
6. Distribute ∨ over ∧ to reach conjunctive form (L1 ∨ … ∨ Ln) ∧ (M1 ∨ … ∨ Mm) ∧ … ∧ (Z1 ∨ … ∨ Zz):
[Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)]
Properties of resolution
Resolution is refutation-complete for FOL: if a set of sentences is unsatisfiable, resolution will always be able to derive a contradiction.
However, resolution can produce nonconstructive proofs: proofs that show an object with some property exists without providing a specific example, or any way to produce one.
To avoid this, an answer literal is added to the negated goal: CNF(KB ∧ ¬(α ∧ AnswerLiteral)).
Resolution strategies: unit preference
Make use of "unit clauses": sentences consisting of exactly one literal.
Idea: resolving with a unit clause produces a shorter sentence. Since we want to reach the empty clause as soon as possible, a heuristic that reduces clause length is worth trying.
Resolution strategies: set of support (SoS)
A designated set of clauses such that every resolution step uses at least one member of the set (the resolvent is then added to the set).
Idea: we are given a problem with a set of conditions, and we expect to use the conditions stated in the problem; if we could solve it without using them, something must be off. That set of conditions is an analogy for the set of support.
Drawback: if we choose a bad SoS, the procedure becomes incomplete and the problem unsolvable.
Summary
A first approach uses inference rules for instantiating quantifiers to propositionalize the inference problem; however, it is very slow.
A lifted version of Modus Ponens, Generalized Modus Ponens, applies directly to FOL; forward chaining and backward chaining both use this rule.
FC is a bottom-up process: starting from scattered facts, it assembles them until the desired conclusion is reached.
BC is a top-down process: starting from the desired goal, it tries to find facts that prove the goal true.
Resolution also has a lifted version, which works in the same manner as the propositional version: CNF plus proof by contradiction.