Counterfactual reasoning and data augmentation for NLP and multimodal.
Knowledge Intelligence Lab.
Hanbat National University
KCC 2024
Explainable AI (XAI) Workshop
Research Trends in Natural Language Interpretation Using Multimodal Counterfactual Reasoning
Cheoneum Park
Assistant Professor
Dept. of Computer Engineering
Hanbat National University, Korea
2024.06.27
Speaker Introduction
Cheoneum Park
•Senior Researcher (HMG, '20–'23)
§Car Agent R&D
§Automotive-domain-specific QA
•Manager (SKT, '23–'24)
§R&D on RAG, Retrieval, QA, and Agents
•Assistant Professor (HBNU, '24–)
Interested in:
•Natural Language Processing
•Question Answering
•LLM and RAG
•Multimodal Learning
•Explainable AI
Outline
•Counterfactual Reasoning
•Counterfactual for NLP
•Counterfactual for Multimodal
Counterfactual Reasoning
•Causal inference
•The process of assessing how a specific factor affects an outcome
•Counterfactual reasoning
•A framework or standard for causal inference
•Predicting how alternative events, contrary to what actually happened, might have resulted in different outcomes
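The definition above can be made concrete with a toy example: a counterfactual query asks what a model would have predicted under an alternative event. The classifier and texts here are purely illustrative, not from any paper in this deck.

```python
def toy_sentiment(text):
    # Toy classifier: "positive" if the review mentions "great", else "negative".
    return "positive" if "great" in text else "negative"

# Factual observation, and an alternative event contrary to what happened.
factual = "the movie was great"
counterfactual = factual.replace("great", "boring")

print(toy_sentiment(factual))         # factual outcome
print(toy_sentiment(counterfactual))  # outcome under the alternative event
```

The contrast between the two predictions is the counterfactual effect of the edited word on the model's decision.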
Chapter 1
1. Counterfactual for NLP
 1) Counterfactual Story Reasoning and Generation
 2) Retrieval-guided Counterfactual Generation for QA
 3) DISCO: Distilling Counterfactuals with Large Language Models
 4) DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
 5) Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations
 6) CEval: A Benchmark for Evaluating Counterfactual Text Generation
Counterfactual Story Reasoning and Generation
Proposes Counterfactual Story Rewriting
•Given the original story and a counterfactual event
 -> minimally revise the ending while keeping it coherent
•Builds the TimeTravel dataset
•Based on the ROCStories corpus
•Revised with counterfactual sentences
•Zero-shot evaluation with GPT and GPT-2
•When the counterfactual is included
•Hard to maintain coherence
•Models write a different ending
Counterfactual Story Reasoning and Generation, acl19
- ZS: simply continues writing from the counterfactual content
- FT+CF: keeps the overall flow
- Supervised: the content changes to involve the guitar, while keeping a nearly consistent ending
Retrieval-guided Counterfactual Generation for QA
•Deep learning-based NLP
•Performs poorly under a multitude of distributional shifts
•Over-relies on spurious correlations and dataset artifacts
•Counterfactual data augmentation (CDA)
•CDA improves out-of-domain generalization and robustness against spurious correlations
Retrieval-guided Counterfactual Generation for QA, acl22
Retrieval-guided Counterfactual Generation for QA
•Retrieve-Generate-Filter (RGF)
•Generates CDA samples for the QA task
•Retrieves contexts with REALM
•Generates a question for each context using T5
•Filter
•Noise Filtering
•Round-trip consistency
•An ensemble of six T5-based reading-comprehension (q, c) → a models
•Keep any generated (q′, c′, a′) triple where at least 5 of the 6 models agree on the answer
•Removes about 5% of incorrect data
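The round-trip consistency filter above can be sketched as a simple majority vote; the `rc_models` here are toy callables standing in for the six T5 reading-comprehension models, so this is an illustration of the filtering logic, not the paper's implementation.

```python
from collections import Counter

def round_trip_filter(triples, rc_models, min_agree=5):
    """Keep generated (q', c', a') triples where at least `min_agree`
    of the ensemble's reading-comprehension models reproduce the
    generated target answer a'."""
    kept = []
    for q, c, a in triples:
        preds = [model(q, c) for model in rc_models]
        votes = Counter(preds)
        if votes.get(a, 0) >= min_agree:
            kept.append((q, c, a))
    return kept
```

Triples whose answer the ensemble cannot reproduce are treated as noise and dropped, matching the "remove incorrect data" step on the slide.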
Retrieval-guided Counterfactual Generation for QA
•Filter
•Filtering for Minimality
•Word-level edit (Levenshtein) distance between q and q′
•Keep the triple with a ≠ a′ and the smallest non-zero word-edit distance between q and q′
•Semantic Filtering
•Decompose questions into predicates and phrases using QED (Question-meaning Decomposition) types, then evaluate
•Evaluation and results
•Fluency
•96% of generated questions are grammatically correct and semantically valid
•Correctness
•About 75% of generated questions are consistent with their context
•Semantic Diversity
•Includes diverse semantic changes such as reference changes, predicate changes, and question expansion
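The minimality filter described above can be sketched with a word-level Levenshtein distance; the candidate format and the answer-difference condition are assumptions made for illustration, not the paper's exact code.

```python
def word_edit_distance(q, q_prime):
    """Word-level Levenshtein distance between two questions."""
    a, b = q.split(), q_prime.split()
    dp = list(range(len(b) + 1))  # dp[j]: distance between a[:i] and b[:j]
    for i, wa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, wb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # delete
                                     dp[j - 1] + 1,    # insert
                                     prev + (wa != wb))  # substitute
    return dp[-1]

def minimal_counterfactual(q, a, candidates):
    """Among (q', a') candidates whose answer differs from the original,
    pick the q' with the smallest non-zero word-edit distance from q."""
    scored = [(word_edit_distance(q, qp), qp, ap)
              for qp, ap in candidates
              if ap != a and word_edit_distance(q, qp) > 0]
    return min(scored)[1:] if scored else None
```

Requiring a non-zero distance excludes exact copies of the original question, while minimizing the distance keeps the counterfactual edit as small as possible.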
DISCO: Distilling Counterfactuals with Large Language Models
•DIStilled COunterfactual Data (DISCO)
•Crowd-sourced CDA can be inefficient, costly, and difficult to scale
•Generation methods rely on a fixed inventory of perturbation types
•Method
•Task instances are decomposed into spans
•Prompt engineering and in-context learning are applied with a general LLM
•Over-generations are filtered fully automatically with a large teacher NLI model
•A student model is trained on the CDA
DISCO: Distilling Counterfactuals with Large Language Models
[Figure labels: from data cartography; masked prompt; insertion mode. Format: <Prefix> [insert] <Suffix>. It is <l′> that <H>]
DISCO: Distilling Counterfactuals with Large Language Models
•Filtering
•The label of generated data may drift -> filtering prevents this
•Filtering criteria
•Does the perturbation include part of the instruction or prompt?
•Does the perturbation copy the in-context examples?
•Does the perturbation repeat part of the premise or hypothesis?
•Duplicate checks
•Compare the generated sentence with the hypothesis
•Lexical overlap rate
•A pre-defined set of common negation words
•Comparison by label distribution
•Check the difference between the counterfactual and the original premise
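Two of the duplicate checks above (lexical overlap, and a pre-defined negation-word set) can be sketched as follows; the word set and the threshold are illustrative assumptions, not DISCO's actual values.

```python
NEGATION_WORDS = {"not", "no", "never", "none", "n't"}  # illustrative set

def lexical_overlap(generated, reference):
    """Fraction of the generated sentence's words also in the reference."""
    g = set(generated.lower().split())
    r = set(reference.lower().split())
    return len(g & r) / max(len(g), 1)

def keep_perturbation(generated, hypothesis, max_overlap=0.9):
    """Reject near-copies of the hypothesis, and perturbations whose only
    new words are common negation words (a trivial label flip)."""
    if lexical_overlap(generated, hypothesis) > max_overlap:
        return False
    new_words = set(generated.lower().split()) - set(hypothesis.lower().split())
    if new_words <= NEGATION_WORDS:
        return False
    return True
```

The negation check captures the idea that simply inserting "not" changes the label without producing a genuinely new counterfactual.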
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering, acl23
•A QA model depends on two types of knowledge
•Parametric knowledge: knowledge encoded in the model parameters
•Contextual knowledge: external knowledge received as input at inference time
•Can a QA model answer from both parametric and contextual knowledge?
•Robustness to knowledge conflicts
•Proposes training the model with data augmentation
•Counterfactual data augmentation
•Automatically alters the factuality of existing QA data to augment it
 -> reduces dependence on parametric knowledge
•Answerability augmentation
•When the contextual knowledge contains no answer -> train the model to abstain
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
•Why is DisentQA needed?
•Factual example
•Cont. answer: answers correctly from the context
•Param. answer: answers from the learned distribution
•Counterfactual example
•Cont. answer: reflects the contextual information
•Param. answer: answers according to the learned model distribution
 (Kanye West never appeared on Keeping Up)
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
•Augmentation example types
•Augmentation approach
•Counterfactual -> use corpus substitution
•(1) identifying named-entity answers
•(2) replacing the answer with another answer of the same entity type from the same corpus
•Answerability
•Empty context
•Random context
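The corpus-substitution step above can be sketched as a simple entity swap; the function name, the data, and the string-replacement shortcut are illustrative assumptions, not the paper's pipeline.

```python
import random

def corpus_substitute(context, answer, same_type_answers, seed=0):
    """Build a counterfactual context by swapping the named-entity answer
    for another answer of the same entity type drawn from the corpus."""
    rng = random.Random(seed)
    candidates = [a for a in same_type_answers if a != answer]
    new_answer = rng.choice(candidates)
    return context.replace(answer, new_answer), new_answer
```

The resulting context disagrees with the model's parametric knowledge, so training on it pushes the model to trust the given context rather than memorized facts.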
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering
Fine-tune T5 models (Large – 770M parameters, XXL – 11B parameters)
DO MODELS EXPLAIN THEMSELVES?
DO MODELS EXPLAIN THEMSELVES? COUNTERFACTUAL SIMULATABILITY OF NATURAL LANGUAGE EXPLANATIONS, arxiv23
•LLMs are trained to explain human decisions.
•Can an LLM explain itself?
•Can we build a good mental model of an LLM that handles diverse inputs?
•Counterfactual simulatability of natural language explanations -> two metrics
•1) Simulation generality
•Prefers more general counterfactual explanations
•Humans do not consume meat
•Muslims do not consume pork
•2) Simulation precision
•The fraction of counterfactuals on which the human's inference matches the model's output
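Simulation precision above is an agreement rate; in this sketch, `human_inference` and `model_output` are illustrative stand-ins for the human judgments (made from the explanation) and the model's actual answers.

```python
def simulation_precision(counterfactuals, human_inference, model_output):
    """Share of simulatable counterfactuals on which the answer a human
    infers from the explanation matches what the model actually outputs."""
    if not counterfactuals:
        return 0.0
    agree = sum(human_inference(c) == model_output(c) for c in counterfactuals)
    return agree / len(counterfactuals)
```

A low precision means the explanation builds a misleading mental model: humans predict one behavior, the model does another.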
DO MODELS EXPLAIN THEMSELVES?
•Generate questions based on the input question and explanation
•Constrain answers to yes / no
•Simulation generality
•Simulation precision
[Figure: model M's explanation, the generated counterfactuals, the simulatable subset, and a similarity score computed with human evaluation]
CEval: A Benchmark for Evaluating Counterfactual Text Generation
CEval: A Benchmark for Evaluating Counterfactual Text Generation, arxiv24
•A benchmark for comparing counterfactual text generation methods
•Tackles
•The goal of minimally editing text to obtain a different classification result
•Inconsistent use of datasets and metrics makes it hard to assess methodological progress
•Approach
•Unifies counterfactual metrics and text quality metrics
•A counterfactual dataset with human annotations
•Compares baselines such as MICE, GDBA, and CREST against open LLMs including LLaMA-2
•Evaluates diverse counterfactual generation methods
Benchmark Design
•Minimally edit the existing text -> generate new text that raises the output probability of a trained black-box model
•Criteria for a valid counterfactual
•Predictive Probability
•The counterfactual x′ must yield the pre-defined label y′
•Textual Similarity
•The counterfactual x′ must remain similar to the original data x
•Likelihood in Feature Space
•The counterfactual x′ must show feature values similar to the original x
•x′ should be plausible, realistic, and follow common language patterns
•Diversity
•Provide diverse ways of changing a text instance
[Figure: counterfactual data generation (CDG) — the original prediction, N samples, an example, a CDG-produced example, and the CDG prediction]
Counterfactual Metrics
•Flip Rate (FR)
•Whether the predicted labels of the original data and the counterfactual differ
•Probability Change (∆P)
•The difference between the predicted probability of the original label y and the counterfactual label y′
•Token Distance (TD)
•The token-level distance d(x, x′)
•Perplexity (PPL)
•Perplexity from GPT-2
•Diversity (Div)
•The similarity between two counterfactuals (x′₁, x′₂) for the same x
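Three of the metrics above can be sketched directly; the token distance here uses `difflib` matching as a simple approximation of d(x, x′), not necessarily CEval's exact implementation.

```python
import difflib

def flip_rate(orig_labels, cf_labels):
    """FR: fraction of examples whose predicted label flipped."""
    return sum(o != c for o, c in zip(orig_labels, cf_labels)) / len(orig_labels)

def probability_change(orig_probs, cf_probs):
    """Delta-P: mean change in the predicted probability."""
    return sum(c - o for o, c in zip(orig_probs, cf_probs)) / len(orig_probs)

def token_distance(x, x_cf):
    """TD: a simple token-level distance between x and x'."""
    a, b = x.split(), x_cf.split()
    matched = sum(m.size for m in
                  difflib.SequenceMatcher(a=a, b=b).get_matching_blocks())
    return max(len(a), len(b)) - matched
```

A good counterfactual generator maximizes the flip rate and probability change while keeping the token distance small.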
Text Quality Metrics (Human evaluation)
•Fluency
•Ensures coherence and readability
•Cohesiveness
•Ensures consistent ideas and logical flow
•Likability
•Considers tone, style, and the overall user experience
•Grammar
•Evaluates syntactic correctness and grammatical accuracy
Chapter 2
1. Counterfactual for NLP
2. Counterfactual for Multimodal
 1) What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
 2) COCO-Counterfactuals
What If the TV Was Off?
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models, iccv23
•Counterfactual reasoning considers alternatives to already-observed events
•Core to human intelligence
•Goal: evaluate the counterfactual reasoning ability of multimodal LLMs (MLLMs)
•Builds the C-VQA benchmark
•2,217 question-answer pairs
•Uses VQAv2 q-a pair data
•Adds a counterfactual presupposition to each question
•Revises the answer accordingly
•Generates counterfactual question-answer pairs with ChatGPT
•Annotates the generated question-answer pairs
C-VQA Dataset
•Data Selection
•Run VQA with ViperGPT -> use only samples answered correctly
•Assumes both the visual and language modules can handle these correctly
•Annotation
•Counterfactual presupposition types
•Direct group: adds a counterfactual presupposition requiring one or two arithmetic operations
•Indirect group: the counterfactual presupposition indirectly changes the original answer (True/False)
•Examples: How many X would there be if two more X were added?
 Would this animal have claws if the animals were cats?
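The two presupposition groups above reduce to two simple answer transformations; this is only an illustration of the distinction, not C-VQA's annotation code.

```python
def direct_counterfactual(original_count, delta):
    """Direct group: the presupposition applies an arithmetic change,
    e.g. 'How many X would there be if two more X were added?'"""
    return original_count + delta

def indirect_counterfactual(original_answer):
    """Indirect group: the presupposition indirectly flips a yes/no answer."""
    return {"yes": "no", "no": "yes"}[original_answer]
```

The direct group tests counting under a hypothetical change; the indirect group tests whether the model revises a judgment when the premise changes.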
•Annotation (cont'd)
•Question and answer annotation
•Human annotation of 200 examples per group
•Using those 200 as seeds, auto-generate counterfactual data with GPT via CoT
•Direct group
•First generate a counterfactual presupposition with ChatGPT
•From the generated presupposition, 1) generate a new question and 2) compute the new answer
•Indirect group
•First invert the original answer
•Then generate the corresponding counterfactual presupposition
•Performance drops overall on counterfactual data
•MLLMs struggle with counterfactual reasoning
•The proposed benchmark points to future research directions for MLLM counterfactual reasoning
COCO-Counterfactuals
•A multimodal dataset with counterfactual examples based on MS-COCO
•Data generation pipeline
•Text-to-image diffusion model
•Stable Diffusion
•Based on an image-text dataset
•MS-COCO
•COCO-Counterfactuals evaluation
•Human evaluation
•Evaluates model robustness and out-of-domain (OOD) generalization
COCO-Counterfactuals: Automatically Constructed Counterfactual Examples for Image-Text Pairs, neurips24
COCO-Counterfactuals
Dataset construction
Original caption: c, counterfactual caption: c′
•Creating Counterfactual Captions
•Identify nouns in caption c with NLTK
•Predict the i-th [MASK] token with a PLM
•Extract the top-10 candidates -> produce 10 counterfactual captions
•Keep only candidates where the substituted word is a noun
•Compute similarity to c with a sentence-similarity model
•Extract the c′ whose similarity lies in (0.8, 0.91)
•Compute GPT-2 perplexity for each c′
•Use the caption with the lowest perplexity
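The similarity-band and perplexity selection steps above can be sketched with injected scoring functions; `similarity` and `perplexity` stand in for the sentence-similarity model and GPT-2, so this shows the selection logic only, under those assumptions.

```python
def select_counterfactual_caption(original, candidates, similarity, perplexity,
                                  lo=0.8, hi=0.91):
    """Keep mask-filled candidate captions whose similarity to the original
    lies in (lo, hi), then return the lowest-perplexity survivor."""
    in_band = [c for c in candidates if lo < similarity(original, c) < hi]
    return min(in_band, key=perplexity) if in_band else None
```

The similarity band rejects both near-duplicates (too similar) and off-topic rewrites (too dissimilar), while the perplexity criterion prefers the most natural-sounding caption.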
COCO-Counterfactuals
•Generating Counterfactual Images
•(Synthetic) original image: x, counterfactual image: x′
•Small text edits strongly affect image generation
•Apply Prompt-to-Prompt
•Inject cross-attention maps: controls the attention between pixels and prompt tokens during diffusion
•Tune Prompt-to-Prompt's parameter: controls the number of denoising steps so the edits vary
•Filter at cosine similarity ≥ 0.2, using CLIP encodings of each caption and generated image
•Among the remaining images, select the best image pair (x, x′)
•E_T, E_I are CLIP's text and image encoders
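The cosine-similarity filter above can be sketched on raw embedding vectors; the vectors here stand in for the outputs of CLIP's E_T / E_I encoders, and the threshold mirrors the 0.2 cutoff on the slide.

```python
def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: sum(x * x for x in w) ** 0.5
    return dot / (norm(u) * norm(v))

def pick_best_image(caption_emb, image_embs, threshold=0.2):
    """Drop generated images whose caption-image cosine similarity falls
    below the threshold, then return the index of the best survivor."""
    scored = [(cosine(caption_emb, e), i) for i, e in enumerate(image_embs)]
    kept = [s for s in scored if s[0] >= threshold]
    return max(kept)[1] if kept else None
```

Applying this to both the original and counterfactual captions yields the best-matching image pair (x, x′) for each caption pair.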
COCO-Counterfactuals
•Generating COCO-Counterfactuals from MS-COCO
•Generate MS-COCO-based data with the counterfactual generation pipeline
•25,014 original COCO captions -> counterfactual generation
•Filtering -> extract the best candidates
•24,508 pairs constructed
•2.45 million candidate image pairs generated
•Apply the CLIP metric -> extract 34,820 image-caption pairs
Thank you
Cheoneum Park [email protected]
https://kilab.hanbat.ac.kr/