A paper review. This presentation introduces "Abductive Commonsense Reasoning", a paper published at ICLR 2020. In this paper, the authors use commonsense knowledge to generate plausible hypotheses. They build a new dataset, ART, and propose new models for αNLI and αNLG using BERT and GPT.
Slide Content
Abductive Commonsense Reasoning San Kim 2020.07.03 Chandra Bhagavatula (1), Ronan Le Bras (1), Chaitanya Malaviya (1), Keisuke Sakaguchi (1), Ari Holtzman (1), Hannah Rashkin (1), Doug Downey (1), Scott Wen-tau Yih (2), Yejin Choi (1,3) 1. Allen Institute for AI 2. Facebook AI 3. Paul G. Allen School of Computer Science & Engineering
Contributions: abductive commonsense reasoning challenges, Abductive NLI (αNLI) and Abductive NLG (αNLG); ART, a dataset for Abductive Reasoning in narrative Text, with 20K narrative contexts and 200K hypotheses; experiments & key insights.
Abductive Natural Language Inference (αNLI) is formulated as a multiple-choice problem consisting of a pair of observations as context and a pair of hypothesis choices. O1: the observation at time t1. O2: the observation at time t2 > t1. h+: a plausible hypothesis that explains the two observations O1 and O2. h−: an implausible (or less plausible) hypothesis for observations O1 and O2. Abductive Natural Language Generation (αNLG) is the task of generating a valid hypothesis h given the two observations O1 and O2. Formally, the task requires maximizing P(h | O1, O2).
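To make the multiple-choice formulation concrete, here is a minimal sketch (not from the paper) of how an ART-style instance can be represented and scored; the field names and the `score` callable are illustrative assumptions standing in for any model that estimates P(h | O1, O2).

```python
from dataclasses import dataclass

@dataclass
class ARTInstance:
    """One aNLI instance: two observations and two candidate hypotheses."""
    obs1: str   # O1, observation at time t1
    obs2: str   # O2, observation at time t2 > t1
    hyp1: str   # first candidate hypothesis
    hyp2: str   # second candidate hypothesis
    label: int  # 1 or 2: which hypothesis is the more plausible one

def predict(instance, score):
    """Pick the hypothesis with the higher model score for P(h | O1, O2).

    `score` is any callable (obs1, hyp, obs2) -> float; the paper's baselines
    implement something like it with BERT over the concatenated text.
    """
    s1 = score(instance.obs1, instance.hyp1, instance.obs2)
    s2 = score(instance.obs1, instance.hyp2, instance.obs2)
    return 1 if s1 >= s2 else 2

example = ARTInstance(
    obs1="Carl went to the store desperately searching for flour tortillas for a recipe.",
    obs2="Carl left the store very frustrated.",
    hyp1="The cashier was rude.",
    hyp2="The store had corn tortillas, but not flour ones.",
    label=2,
)
```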
Abductive Reasoning Adopted from [1]
A Probabilistic Framework for αNLI. The task is to select the hypothesis h* that is most probable given the observations: h* = arg max_h P(h | O1, O2). Rewriting the objective using Bayes Rule conditioned on O1, we have: h* = arg max_h P(O2 | h, O1) P(h | O1). Illustration of the graphical models described in the probabilistic framework. Adopted from [1]
A Probabilistic Framework for αNLI. (a) Hypothesis Only: the strong assumption that the hypothesis is entirely independent of both observations, i.e. P(h | O1, O2) = P(h). Maximize the marginal P(h). (b, c) First (or Second) Observation Only: the weaker assumption that the hypothesis depends on only one of the first or second observation. Maximize the conditional probability P(h | O1) or P(h | O2). Adopted from [1]
A Probabilistic Framework for αNLI. (d) Linear Chain: uses both observations, but considers each observation's influence on the hypothesis independently. The model assumes that the three variables form a linear Markov chain: the second observation is conditionally independent of the first, given the hypothesis (i.e. P(O2 | h, O1) = P(O2 | h)). (e) Fully Connected: jointly models all three random variables, combining information across both observations to choose the correct hypothesis. Adopted from [1]
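The five variants differ only in which probabilities enter the hypothesis score; the chosen hypothesis is the argmax of the corresponding score. A minimal sketch (variable names are my own, not the authors' code):

```python
import math

def score_hypothesis_only(p_h):
    # (a) ignore both observations: score = log P(h)
    return math.log(p_h)

def score_single_obs(p_h_given_obs):
    # (b, c) condition on only one observation: log P(h | O1) or log P(h | O2)
    return math.log(p_h_given_obs)

def score_linear_chain(p_h_given_o1, p_o2_given_h):
    # (d) Markov chain O1 -> h -> O2: log P(h | O1) + log P(O2 | h)
    return math.log(p_h_given_o1) + math.log(p_o2_given_h)

def score_fully_connected(p_h_given_o1, p_o2_given_h_and_o1):
    # (e) full joint: log P(h | O1) + log P(O2 | h, O1)
    return math.log(p_h_given_o1) + math.log(p_o2_given_h_and_o1)
```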
Difference between the Linear Chain and Fully Connected models. O1: Carl went to the store desperately searching for flour tortillas for a recipe. O2: Carl left the store very frustrated. h1: The cashier was rude. h2: The store had corn tortillas, but not flour ones. The Linear Chain model judges each hypothesis against each observation independently, so both h1 and h2 look plausible. The Fully Connected model reasons over O1, O2, and h jointly, so it can recognize that h2 explains both observations better than h1.
αNLG model. Given the observations O1 and O2 as sequences of tokens, the task can be modeled as generating the hypothesis h = (w1, ..., wn) token by token, maximizing P(w_i | O1, O2, w_<i). Optionally, the model can also be conditioned on background knowledge K, i.e. P(w_i | K, O1, O2, w_<i). Adopted from [1]
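A rough sketch of the idea behind generating a hypothesis with a language model conditioned on the observations and, optionally, background knowledge K. The prompt format, the `k` string, and the zero-shot prompting below are illustrative assumptions only; the paper fine-tunes GPT-2 with dedicated field separators rather than prompting it like this.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

o1 = "Carl went to the store desperately searching for flour tortillas for a recipe."
o2 = "Carl left the store very frustrated."
k = "store has corn tortillas"  # a COMET-style inference, purely illustrative

# Plain-text concatenation of K, O1, O2 as conditioning context; the model
# then continues the text, and the continuation plays the role of h.
prompt = f"Background: {k} Observation 1: {o1} Observation 2: {o2} Explanation:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9,
                        pad_token_id=tokenizer.eos_token_id)
hypothesis = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
print(hypothesis)
```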
ConceptNet [5]. ConceptNet is a multilingual knowledge base constructed of nodes (a word or a short phrase) and edges (a relation and a weight). Example: a bird is not capable of { bark, chew their food, breathe water }. Related work: WordNet, Microsoft Concept Graph, Google Knowledge Graph. Adopted from [6]
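The node/edge structure can be inspected directly through ConceptNet's public web API. The sketch below assumes the ConceptNet 5.5 JSON response format (fields such as `rel`, `start`, `end`, `weight`); consult the ConceptNet API documentation for the authoritative schema.

```python
import requests

# Query the public ConceptNet API for edges about "bird".
resp = requests.get("https://api.conceptnet.io/c/en/bird", params={"limit": 20})
data = resp.json()

for edge in data.get("edges", []):
    rel = edge["rel"]["label"]      # e.g. "CapableOf", "NotCapableOf", "IsA"
    start = edge["start"]["label"]  # node: a word or short phrase
    end = edge["end"]["label"]
    weight = edge["weight"]         # confidence weight of the assertion
    print(f"{start} --{rel}--> {end}  (weight={weight:.2f})")
```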
ATOMIC [2]. Nine inference dimensions: xIntent: Why does X cause an event? xNeed: What does X need to do before the event? xAttr: How would X be described? xEffect: What effects does the event have on X? xWant: What would X likely want to do after the event? xReaction: How does X feel after the event? oReact: How do others feel after the event? oWant: What would others likely want to do after the event? oEffect: What effects does the event have on others? The dimensions are grouped into three if-then relation types: If-Event-Then-Mental-State, If-Event-Then-Event, and If-Event-Then-Persona. Adopted from [2]
ATOMIC. If-Event-Then-Mental-State: three relations relating to the mental pre- and post-conditions of an event. xIntent: likely intents of the event; xReaction: likely (emotional) reactions of the event's subject; oReaction: likely (emotional) reactions of others. Example: "X compliments Y" → X wants to be nice (xIntent), X feels good (xReaction), Y feels flattered (oReaction). Adopted from [2]
ATOMIC. If-Event-Then-Event: five relations relating to events that constitute probable pre- and post-conditions of a given event (xNeed, xEffect, xWant, oWant, oEffect). Example: "X calls the police" → X needs to dial 911 (xNeed), X starts to panic (xEffect), X wants to explain everything to the police (xWant), others want to dispatch some officers (oWant); "X pays Y a compliment" → Y will smile (oEffect). If-Event-Then-Persona: a stative relation that describes how the subject of an event is described or perceived (xAttr). Example: "X pays Y a compliment" → X is flattering, X is caring (xAttr).
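For clarity, here is a tiny sketch of how ATOMIC-style if-then knowledge can be held as (event, relation, inference) triples; the representation is illustrative and does not mirror ATOMIC's released file format.

```python
from collections import defaultdict

# (head event, relation, tail inference) triples, taken from the examples above.
atomic_examples = [
    ("X calls the police", "xNeed", "to dial 911"),
    ("X calls the police", "xWant", "to explain everything to the police"),
    ("X calls the police", "oWant", "to dispatch some officers"),
    ("X pays Y a compliment", "oEffect", "Y will smile"),
    ("X pays Y a compliment", "xAttr", "flattering"),
]

# Group tails by (head, relation) the way a knowledge-graph lookup would.
kb = defaultdict(list)
for head, rel, tail in atomic_examples:
    kb[(head, rel)].append(tail)

print(kb[("X calls the police", "xWant")])
```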
ATOMIC Adopted from [2]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction [3]. Adopted from [3]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction. Adopted from [3]
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction. Notation: natural language tuples {s: subject, r: relation, o: object}; subject tokens X^s, relation tokens X^r, object tokens X^o. Input template: the concatenation [X^s; X^r; X^o], with the model trained to generate the object tokens. Loss function: L = −Σ_{t=|s|+|r|}^{|s|+|r|+|o|} log P(x_t | x_{<t}). Adopted from [3]
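A hedged sketch of the loss above: only the object tokens contribute to the cross-entropy, while the subject and relation tokens serve purely as conditioning context. Tensor shapes and the helper name are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def comet_loss(logits, input_ids, subject_len, relation_len):
    """Language-modeling loss computed only over the object tokens.

    logits:    (seq_len, vocab_size) next-token predictions from the LM
    input_ids: (seq_len,) token ids of the concatenated [subject; relation; object]
    """
    # Next-token prediction: position t predicts token t+1.
    shift_logits = logits[:-1]
    shift_labels = input_ids[1:].clone()

    # Ignore positions whose *target* token belongs to the subject or relation,
    # so only the object tokens contribute to the loss.
    prefix = subject_len + relation_len
    shift_labels[: prefix - 1] = -100  # -100 is ignored by cross_entropy

    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
```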
COMET: COMmonsEnse Transformers for Automatic Knowledge Graph Construction.
Adversarial Filtering Algorithm. A significant challenge in creating datasets is avoiding annotation artifacts: unintentional patterns in the data that leak information about the target label. In each iteration i: M_i is an adversarial model, T_i a random subset for training, and V_i the validation set; (h+_k, h−_k) are the plausible and implausible hypotheses for an instance k, and Δ_k is the difference in the model's evaluation of h+_k and h−_k. With probability p, each instance that M_i gets correct is updated with a pair of hypotheses that reduces the value of Δ_k, drawn from H+_k (resp. H−_k), the pool of plausible (resp. implausible) hypotheses for instance k. Adopted from [1]
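A sketch of an adversarial-filtering loop in the spirit of the algorithm above; the field names, the train/validation split, and the exact hypothesis-swap rule are simplifying assumptions, not the authors' released implementation.

```python
import random

def adversarial_filtering(instances, train_model, score, iterations=10, p=0.5):
    """Each instance is a dict with 'o1', 'o2', 'h_plus', 'h_minus',
    plus candidate pools 'plus_pool' and 'minus_pool'.
    score(model, o1, h, o2) -> float, higher = judged more plausible by the model.
    """
    for _ in range(iterations):
        random.shuffle(instances)
        split = int(0.8 * len(instances))
        train, valid = instances[:split], instances[split:]
        model = train_model(train)  # adversarial model M_i

        for inst in valid:
            s = lambda h: score(model, inst["o1"], h, inst["o2"])
            delta = s(inst["h_plus"]) - s(inst["h_minus"])
            # If the model gets this instance right, replace the pair with a
            # harder one (smaller margin delta) with probability p.
            if delta > 0 and random.random() < p:
                inst["h_plus"] = min(inst["plus_pool"], key=s)    # least obvious plausible hypothesis
                inst["h_minus"] = max(inst["minus_pool"], key=s)  # most confusing implausible hypothesis
    return instances
```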
Performance of baselines. Adopted from [1]
Performance of baselines. Adopted from [1]
Performance of baselines. Adopted from [1]
BERT-Score: Evaluating Text Generation with BERT, an automatic evaluation metric for text generation (analogous to Inception Score, Fréchet Inception Distance, and Kernel Inception Distance in image generation). Adopted from [4]
BERT-Score. Importance weighting using inverse document frequency (idf): given M reference sentences {x^(i)}, idf(w) = −log( (1/M) Σ_{i=1}^{M} I[w ∈ x^(i)] ), where I[·] is an indicator function. Baseline rescaling: compute a baseline b using Common Crawl monolingual datasets and rescale the score as (score − b) / (1 − b).
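To make the metric concrete, here is a small sketch of the idf-weighted recall R_BERT computed from precomputed token embeddings; precision swaps the roles of reference and candidate, and F1 combines the two. The function and argument names are mine, not the bert_score package's API.

```python
import numpy as np

def bertscore_recall(ref_emb, cand_emb, ref_idf):
    """Idf-weighted BERTScore recall from contextual token embeddings.

    ref_emb:  (n_ref, d)  embeddings of the reference tokens
    cand_emb: (n_cand, d) embeddings of the candidate tokens
    ref_idf:  (n_ref,)    idf weight of each reference token
    """
    # Cosine similarity between every reference and candidate token.
    ref_norm = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    cand_norm = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = ref_norm @ cand_norm.T            # (n_ref, n_cand)

    # Greedy matching: each reference token takes its best-matching candidate token.
    best = sim.max(axis=1)                  # (n_ref,)

    # Idf-weighted average over the reference tokens.
    return float((ref_idf * best).sum() / ref_idf.sum())
```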
BERT-Score Adopted from [4]
Abductive Commonsense Reasoning: analysis of BERT's performance. Categories of incorrectly chosen hypotheses: 1. h− is unlikely to follow after the first observation O1. 2. h− is plausible after O1 but unlikely to precede the second observation O2. 3. Plausible: h− is a coherent narrative and forms a plausible alternative, but it is less plausible than h+. Adopted from [1]
Abductive Commonsense Reasoning Adopted from [1]
Abductive Commonsense Reasoning Adopted from [1]
Reference
[1] Chandra Bhagavatula et al., Abductive Commonsense Reasoning, ICLR 2020.
[2] Maarten Sap et al., ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning, arXiv:1811.00146.
[3] Antoine Bosselut et al., COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, ACL 2019.
[4] Tianyi Zhang et al., BERTScore: Evaluating Text Generation with BERT, ICLR 2020.
[5] Robyn Speer et al., ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, arXiv:1612.03975.
[6] MIT Media Lab, ConceptNet, https://conceptnet.io/