Abductive commonsense reasoning

SanKim13 116 views 29 slides Jul 10, 2020

About This Presentation

A paper review. This presentation introduces Abductive Commonsense Reasoning, a paper published at ICLR 2020. In this paper, the authors use commonsense knowledge to generate plausible hypotheses. They build a new dataset, 'ART', and propose new models for 'aNLI' and 'aNLG' usin...


Slide Content

Abductive Commonsense Reasoning
San Kim, 2020.07.03

Chandra Bhagavatula (1), Ronan Le Bras (1), Chaitanya Malaviya (1), Keisuke Sakaguchi (1), Ari Holtzman (1), Hannah Rashkin (1), Doug Downey (1), Scott Wen-tau Yih (2), Yejin Choi (1,3)
1. Allen Institute for AI
2. Facebook AI
3. Paul G. Allen School of Computer Science & Engineering

Contributions
Abductive Commonsense Reasoning Challenges
- Abductive NLI (αNLI)
- Abductive NLG (αNLG)
ART: Dataset for Abductive Reasoning in Narrative Text
- 20K Narrative Contexts
- 200K Hypotheses
Experiments & Key Insights

Abductive Commonsense Reasoning

Abductive Natural Language Inference (αNLI) is formulated as a multiple-choice problem consisting of a pair of observations as context and a pair of hypothesis choices.
$O_1$: The observation at time $t_1$.
$O_2$: The observation at time $t_2 > t_1$.
$h^+$: A plausible hypothesis that explains the two observations $O_1$ and $O_2$.
$h^-$: An implausible (or less plausible) hypothesis for observations $O_1$ and $O_2$.

Abductive Natural Language Generation (αNLG) is the task of generating a valid hypothesis $h^+$ given the two observations $O_1$ and $O_2$. Formally, the task requires maximizing $P(h^+ \mid O_1, O_2)$.
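To make the multiple-choice format concrete, here is a minimal Python sketch of one αNLI instance and a selection rule. The class name `ANLIInstance`, the field names, the gold label, and the word-overlap scorer are all assumptions made for illustration; the scorer is a deliberately naive stand-in, not a model from the paper. The example sentences are the tortilla example used elsewhere in this deck.

```python
import re
from dataclasses import dataclass


@dataclass
class ANLIInstance:
    """One multiple-choice item: two observations and two hypothesis choices."""
    o1: str
    o2: str
    hyp1: str
    hyp2: str
    label: int  # 1 or 2; assumed gold label, for illustration only


def tokens(text):
    """Lowercased set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(o1, o2, hyp):
    """Naive stand-in scorer: number of word types shared with the observations."""
    return len(tokens(hyp) & (tokens(o1) | tokens(o2)))


def predict(inst, score=overlap_score):
    """Return 1 or 2, whichever hypothesis the scorer rates higher (ties go to 1)."""
    s1 = score(inst.o1, inst.o2, inst.hyp1)
    s2 = score(inst.o1, inst.o2, inst.hyp2)
    return 1 if s1 >= s2 else 2


example = ANLIInstance(
    o1="Carl went to the store desperately searching for flour tortillas for a recipe.",
    o2="Carl left the store very frustrated.",
    hyp1="The cashier was rude.",
    hyp2="The store had corn tortillas, but not flour ones.",
    label=2,
)
prediction = predict(example)
```

Any function with the same `(o1, o2, hyp)` signature can be passed as `score`, so a learned model could be swapped in for the overlap heuristic.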

Abductive Reasoning Adopted from [1]

A Probabilistic Framework for αNLI
The task is to select the hypothesis $h^*$ that is most probable given the observations:
$h^* = \arg\max_{h^i} P(H = h^i \mid O_1, O_2)$
Rewriting the objective using Bayes' rule conditioned on $O_1$, we have:
$P(h^i \mid O_1, O_2) \propto P(O_2 \mid h^i, O_1)\, P(h^i \mid O_1)$
Illustration of the graphical models described in the probabilistic framework. Adopted from [1]

A Probabilistic Framework for αNLI
(a) Hypothesis Only
Strong assumption: the hypothesis is entirely independent of both observations, i.e. $(H \perp O_1, O_2)$. Maximize the marginal $P(H)$.
(b, c) First (or Second) Observation Only
Weaker assumption: the hypothesis depends on only one of the first or second observation. Maximize the conditional probability $P(H \mid O_1)$ or $P(O_2 \mid H)$.
Adopted from [1]

A Probabilistic Framework for αNLI
(d) Linear Chain
Uses both observations, but considers each observation's influence on the hypothesis independently. The model assumes that the three variables form a linear Markov chain: the second observation is conditionally independent of the first, given the hypothesis, i.e. $(O_1 \perp O_2 \mid H)$.
$h^* = \arg\max_{h^i} P(O_2 \mid h^i)\, P(h^i \mid O_1)$
(e) Fully Connected
Jointly models all three random variables and combines information across both observations to choose the correct hypothesis.
$h^* = \arg\max_{h^i} P(O_2 \mid h^i, O_1)\, P(h^i \mid O_1)$
Adopted from [1]
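The five scoring rules (a) to (e) differ only in which factors they multiply. The sketch below shows this with hand-picked toy probabilities for two hypotheses; the numbers are invented for illustration and do not come from the paper. In the paper's setting these quantities would be estimated by a trained model.

```python
# Toy probabilities for two candidate hypotheses (illustrative values only).
p_h = {"h1": 0.5, "h2": 0.5}              # P(H)
p_h_given_o1 = {"h1": 0.3, "h2": 0.6}     # P(H | O1)
p_o2_given_h = {"h1": 0.7, "h2": 0.3}     # P(O2 | H)
p_o2_given_h_o1 = {"h1": 0.5, "h2": 0.8}  # P(O2 | H, O1)


def argmax(scores):
    """Return the key with the highest score."""
    return max(scores, key=scores.get)


def hypothesis_only():
    return dict(p_h)  # (a) marginal P(H)


def first_observation_only():
    return dict(p_h_given_o1)  # (b) P(H | O1)


def second_observation_only():
    return dict(p_o2_given_h)  # (c) P(O2 | H)


def linear_chain():
    # (d) P(O2 | H) * P(H | O1): O2 is assumed independent of O1 given H
    return {h: p_o2_given_h[h] * p_h_given_o1[h] for h in p_h}


def fully_connected():
    # (e) P(O2 | H, O1) * P(H | O1): the second factor sees both H and O1
    return {h: p_o2_given_h_o1[h] * p_h_given_o1[h] for h in p_h}


linear_choice = argmax(linear_chain())
full_choice = argmax(fully_connected())
```

With these toy values the linear chain prefers `h1` while the fully connected rule prefers `h2`, because only the latter lets the likelihood of the second observation depend on the first.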

Difference between the Linear Chain and Fully Connected models
$O_1$: Carl went to the store desperately searching for flour tortillas for a recipe.
$O_2$: Carl left the store very frustrated.
$h^1$: The cashier was rude.
$h^2$: The store had corn tortillas, but not flour ones.
Linear Chain: Plausible! / Plausible! / Plausible! / Plausible…
Fully Connected: Plausible… / Plausible! / Plausible! / Plausible!

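αNLG model
Given $h^+ = \{w^h_1, \dots, w^h_l\}$, $O_1 = \{w^{o1}_1, \dots, w^{o1}_m\}$, and $O_2 = \{w^{o2}_1, \dots, w^{o2}_n\}$ as sequences of tokens, the task can be modeled as
$P(h^+ \mid O_1, O_2) = \prod_{i=1}^{l} P(w^h_i \mid w^h_{<i}, w^{o1}_1, \dots, w^{o1}_m, w^{o2}_1, \dots, w^{o2}_n)$
Optionally, the model can also be conditioned on background knowledge $\mathcal{K}$.
Adopted from [1]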
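As a small worked illustration of the factorization above: once a language model supplies the per-token conditionals $P(w^h_i \mid w^h_{<i}, O_1, O_2)$, the sequence probability is their product, usually accumulated in log space. The function names and the probabilities below are assumptions for illustration; no real model is called.

```python
import math


def hypothesis_log_prob(token_probs):
    """Sum of log P(w_i | w_<i, O1, O2) over the hypothesis tokens."""
    return sum(math.log(p) for p in token_probs)


def negative_log_likelihood(token_probs):
    """Quantity a generator would minimize for one training hypothesis."""
    return -hypothesis_log_prob(token_probs)


# Hypothetical per-token conditionals for a three-token hypothesis.
toy_token_probs = [0.5, 0.5, 0.5]
log_p = hypothesis_log_prob(toy_token_probs)
sequence_prob = math.exp(log_p)  # equals the product of the per-token probabilities
```

Working in log space avoids numerical underflow when hypotheses are long.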

ConceptNet [5]
ConceptNet is a multilingual knowledge base. It is constructed of:
- Nodes (a word or a short phrase)
- Edges (relation and weight)
Example: a bird is not capable of … {bark, chew their food, breathe water}
Related works: WordNet, Microsoft Concept Net, Google knowledge base
Adopted from [6]
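The node-and-edge structure described above can be sketched as a list of weighted triples. This is an in-memory illustration only: the `Edge` class, the helper `objects_of`, and the weights are assumptions, and the code does not call the ConceptNet API. The three example edges encode the bird example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A weighted, labeled edge between two nodes (words or short phrases)."""
    start: str
    relation: str
    end: str
    weight: float  # illustrative values, not real ConceptNet weights


edges = [
    Edge("bird", "NotCapableOf", "bark", 1.0),
    Edge("bird", "NotCapableOf", "chew their food", 1.0),
    Edge("bird", "NotCapableOf", "breathe water", 1.0),
]


def objects_of(graph, start, relation):
    """All end nodes reachable from `start` through `relation`, in insertion order."""
    return [e.end for e in graph if e.start == start and e.relation == relation]


bird_cannot = objects_of(edges, "bird", "NotCapableOf")
```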

ATOMIC [2]
Inference dimensions:
- xIntent: Why does X cause an event?
- xNeed: What does X need to do before the event?
- xAttr: How would X be described?
- xEffect: What effects does the event have on X?
- xWant: What would X likely want to do after the event?
- xReact: How does X feel after the event?
- oReact: How do others feel after the event?
- oWant: What would others likely want to do after the event?
- oEffect: What effects does the event have on others?
If-Then relation types:
- If-Event-Then-Mental-State
- If-Event-Then-Event
- If-Event-Then-Persona
Adopted from [2]

ATOMIC
If-Event-Then-Mental-State: three relations relating to the mental pre- and post-conditions of an event.
- xIntent: likely intents of the event
- xReact: likely (emotional) reactions of the event's subject
- oReact: likely (emotional) reactions of others
Example event: X compliments Y
- xIntent: X wants to be nice
- xReact: X feels good
- oReact: Y feels flattered
Adopted from [2]

ATOMIC
If-Event-Then-Event: five relations relating to events that constitute probable pre- and post-conditions of a given event.
Example event: X calls the police
- xNeed: X needs to dial 911
- xEffect: X starts to panic
- xWant: X wants to explain everything to the police
- oWant: Others want to dispatch some officers
Example event: X pays Y a compliment
- oEffect: Y will smile
If-Event-Then-Persona: a stative relation that describes how the subject of an event is described or perceived.
Example event: X pays Y a compliment
- xAttr: X is flattering
- xAttr: X is caring

ATOMIC Adopted from [2]

COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction [3] Adopted from [3]

COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction Adopted from [3]

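COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction
Loss function
$\mathcal{L} = -\sum_{t=|s|+|r|}^{|s|+|r|+|o|} \log P(x_t \mid x_{<t})$
Input Template
Notation: natural language tuples in the form {s: subject, r: relation, o: object}
Subject tokens: $X^s = \{x^s_0, \dots, x^s_{|s|}\}$
Relation tokens: $X^r = \{x^r_0, \dots, x^r_{|r|}\}$
Object tokens: $X^o = \{x^o_0, \dots, x^o_{|o|}\}$
Adopted from [3]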
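The loss above sums log-probabilities only over the object positions, so the subject and relation tokens act as a prompt that is not itself penalized. A minimal sketch of that masking follows, assuming zero-based positions and a list of per-token log-probabilities already produced by some language model; the function name `comet_loss` and the toy values are hypothetical.

```python
import math


def comet_loss(token_log_probs, len_s, len_r, len_o):
    """Negative log-likelihood over object tokens only.

    token_log_probs[t] is log P(x_t | x_<t) for the concatenated
    sequence [subject tokens, relation tokens, object tokens].
    """
    start = len_s + len_r          # first object position (zero-based)
    end = start + len_o            # one past the last object position
    return -sum(token_log_probs[start:end])


# Toy sequence: 2 subject tokens, 1 relation token, 2 object tokens.
toy_log_probs = [math.log(p) for p in (0.9, 0.8, 0.7, 0.5, 0.25)]
loss = comet_loss(toy_log_probs, len_s=2, len_r=1, len_o=2)
```

Only the last two entries contribute here, giving a loss of log 8.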

COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction

Adversarial Filtering Algorithm
A significant challenge in creating datasets is avoiding annotation artifacts.
Annotation artifacts: unintentional patterns in the data that leak information about the target label.
In each iteration $i$:
- $M_i$: adversarial model
- $T_i$: a random subset for training
- $V_i$: validation set
- $h^+_k, h^-_k$: plausible and implausible hypotheses for an instance $k$
- $\delta = \Delta_{M_i}(h^+_k, h^-_k)$: the difference in the model evaluation of $h^+_k$ and $h^-_k$
With probability $t_i$, update each instance $k$ that $M_i$ gets correct with a pair $(h^+, h^-) \in \mathcal{H}^+_k \times \mathcal{H}^-_k$ of hypotheses that reduces the value of $\delta$, where $\mathcal{H}^+_k$ (resp. $\mathcal{H}^-_k$) is the pool of plausible (resp. implausible) hypotheses for instance $k$.
Adopted from [1]
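The update rule can be sketched as a single filtering pass. This is a simplified illustration under stated assumptions, not the authors' implementation: training the adversary on a random subset is omitted and replaced by a given `score` function, the pass picks the pool pair with the smallest $\delta$ (one way of reducing it), and all names (`adversarial_filter_step`, `toy_scores`, the hypothesis ids) are hypothetical.

```python
import random
from itertools import product


def adversarial_filter_step(instances, pos_pools, neg_pools, score, t, rng):
    """One filtering pass over (h_plus, h_minus) pairs.

    score(h) stands in for the trained adversary's evaluation of a hypothesis.
    An instance the adversary gets right (delta > 0) is replaced, with
    probability t, by the pool pair with the smallest delta.
    """
    result = []
    for k, (h_pos, h_neg) in enumerate(instances):
        delta = score(h_pos) - score(h_neg)
        if delta > 0 and rng.random() < t:
            cand_pos, cand_neg = min(
                product(pos_pools[k], neg_pools[k]),
                key=lambda pair: score(pair[0]) - score(pair[1]),
            )
            # Swap only if the candidate pair is harder for the adversary.
            if score(cand_pos) - score(cand_neg) < delta:
                h_pos, h_neg = cand_pos, cand_neg
        result.append((h_pos, h_neg))
    return result


# Toy adversary scores for hypothesis ids (illustrative values only).
toy_scores = {"p1": 0.9, "p2": 0.6, "n1": 0.1, "n2": 0.5, "n3": 0.7}
instances = [("p1", "n1"), ("p2", "n3")]
pos_pools = [["p1", "p2"], ["p1", "p2"]]
neg_pools = [["n1", "n2"], ["n2", "n3"]]

filtered = adversarial_filter_step(
    instances, pos_pools, neg_pools, toy_scores.get, t=1.0, rng=random.Random(0)
)
```

The first instance is swapped for the harder pair `("p2", "n2")`; the second is left alone because the adversary already gets it wrong.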

Performance of baselines Adopted from [1]

Performance of baselines Adopted from [1]

Performance of baselines Adopted from [1]

BERT-Score
Evaluating Text Generation with BERT (plays a role similar to Inception Score, Fréchet Inception Distance, and Kernel Inception Distance in image generation)
Adopted from [4]

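BERT-Score
For a reference $x$ and a candidate $\hat{x}$ with normalized contextual token embeddings $\mathbf{x}_i$ and $\hat{\mathbf{x}}_j$:
$R_{BERT} = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} \mathbf{x}_i^\top \hat{\mathbf{x}}_j$
$P_{BERT} = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} \mathbf{x}_i^\top \hat{\mathbf{x}}_j$
$F_{BERT} = 2\, \frac{P_{BERT} \cdot R_{BERT}}{P_{BERT} + R_{BERT}}$
Importance weighting using inverse document frequency (idf):
$\mathrm{idf}(w) = -\log \frac{1}{M} \sum_{i=1}^{M} \mathbb{I}[w \in x^{(i)}]$, where $\mathbb{I}[\cdot]$ is an indicator function.
$R_{BERT} = \frac{\sum_{x_i \in x} \mathrm{idf}(x_i) \max_{\hat{x}_j \in \hat{x}} \mathbf{x}_i^\top \hat{\mathbf{x}}_j}{\sum_{x_i \in x} \mathrm{idf}(x_i)}$
Baseline rescaling: compute baseline $b$ using Common Crawl monolingual datasets.
$\hat{R}_{BERT} = \frac{R_{BERT} - b}{1 - b}$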
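The greedy matching behind these formulas can be sketched in plain Python. The sketch assumes token embeddings are already available as lists of floats (in practice they would come from BERT, which is not called here); the function names and the toy two-dimensional vectors are illustrative only. Passing no idf weights gives the unweighted scores.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(a * a for a in vec))
    return [a / norm for a in vec]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bert_score(ref_vecs, cand_vecs, ref_idf=None, cand_idf=None):
    """Return (precision, recall, f1) from greedy token matching."""
    ref = [_normalize(v) for v in ref_vecs]
    cand = [_normalize(v) for v in cand_vecs]
    ref_idf = ref_idf or [1.0] * len(ref)      # uniform weights by default
    cand_idf = cand_idf or [1.0] * len(cand)

    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        w * max(_dot(x, y) for y in cand) for x, w in zip(ref, ref_idf)
    ) / sum(ref_idf)
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(
        w * max(_dot(x, y) for x in ref) for y, w in zip(cand, cand_idf)
    ) / sum(cand_idf)

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rescale(score, baseline):
    """Baseline rescaling: (score - b) / (1 - b)."""
    return (score - baseline) / (1 - baseline)


# Toy embeddings: two reference tokens, one candidate token.
p, r, f = bert_score(ref_vecs=[[1.0, 0.0], [0.0, 1.0]], cand_vecs=[[2.0, 0.0]])
```

Here the single candidate token matches the first reference token exactly and is orthogonal to the second, so precision is 1.0 and recall is 0.5.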

BERT-Score Adopted from [4]

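Abductive Commonsense Reasoning
Categories of implausible hypotheses $h^-$:
1. Unlikely after $O_1$: $h^-$ is unlikely to follow after the first observation $O_1$.
2. Unlikely before $O_2$: $h^-$ is plausible after $O_1$ but unlikely to precede the second observation $O_2$.
3. Plausible: $(O_1, h^-, O_2)$ is a coherent narrative and forms a plausible alternative, but it is less plausible than $(O_1, h^+, O_2)$.
BERT performance
Adopted from [1]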

Abductive Commonsense Reasoning Adopted from [1]

Abductive Commonsense Reasoning Adopted from [1]

Reference
[1] Chandra Bhagavatula et al., Abductive Commonsense Reasoning, ICLR 2020.
[2] Maarten Sap et al., ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning, arXiv:1811.00146.
[3] Antoine Bosselut et al., COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, ACL 2019.
[4] Tianyi Zhang et al., BERTScore: Evaluating Text Generation with BERT, ICLR 2020.
[5] Robyn Speer et al., ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, arXiv:1612.03975.
[6] MIT Media Lab, URL: https://conceptnet.io/