EnhancingLLMsWithInteractiveFeedback(TAI-AAI).pdf

MENGSAYLOEM1 54 views 19 slides Aug 08, 2024

About This Presentation

A new training framework called SAIE (Support Alone Isn't Enough) is proposed, which incorporates both supportive and adversarial feedback during training to enhance LLM reasoning skills.
SAIE is a two-phase fine-tuning process consisting of a warm-up phase (conventional fine-tuning) and an inte...


Slide Content

Enhancing LLMs with Interactive Feedback:
Advancing Learning and Reasoning
Mengsay Loem
TAI AAI #03 - Natural Language Processing (NLP)
August 7, 2024

Mengsay Loem
●2024/04 ~ : Researcher at a Japanese DX company
■Machine Learning, Natural Language Processing
■loem-ms.github.io

-Master of Engineering in Artificial Intelligence
-Tokyo Institute of Technology, Okazaki Laboratory
-Previously a member of Swallow LLM project

2

Are you satisfied when interacting with LLMs?
●Good at writing summaries?
●Good at (general knowledge) question answering?
●Good at story writing?
●Good at coding?

How about REASONING?

3

Improving LLM Reasoning with Interactions







Entailment? Contradiction? Neutral?
1. The doctor is in his new desk chair.
2. The man is sitting.
Entailment!
The person is sitting in a chair.
But gender is not clear here.
If not specified, is it fine to assume that it could be both?
Yes!
Both have a person sitting but they are neutral in terms of gender.

The road is wet. What was the cause of this?
The wet road might be caused by a big rain.
I guess someone poured water on the road, which made the road wet.
I stand by myself. I think rain is a more common cause of the wet road.
Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach (Kaneko et al., EACL Findings 2024)
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., EMNLP Findings 2023)
4

Memorizing Risk/Compromise Interaction
The ship wrecked. What happened as a result?
I think the crew would drown due to the wreck.
I suppose they are likely to meet some pirates.
Sorry, you are right, pirates are very common at sea.
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., EMNLP Findings 2023)
5

Memorizing Risk/Compromise Interaction
Josh runs a car shop and services 3 cars a day. He is open every day of the week except Sunday and Wednesday. He gets paid $4 per car. How much does he make in 2 weeks?

He works 3 cars a day and he gets paid $4 per car so that means he makes $4 * 3 = $18 per day. He works 4 days a week and he has 4 days off so that leaves 4 * 4 = $8 per day. He works 3 cars a day and he gets paid $18 per day so that means he makes $8 * 4 = $132 a week. That means he makes $132 * 2 = $232 a month. The answer: 2232.

I think there might be a mistake in your calculation. If Josh runs a car shop and services 3 cars a day, and he works 5 days a week, then he would service a total of 3 * 5 = 15 cars in a week. If he gets paid $4 per car, then he would earn 15 * $4 = $60 per week. In two weeks, he would earn $60 * 2 = $120. So, the correct answer should be $120, not $2232.

If Josh runs a car shop and services 3 cars a day, and he works five days a week, then he would service a total of 3 * 5 = 15 cars in a week. If he gets paid $4 per car, then he would earn 15 * $4 = $60 per week. If he works two weeks, he would earn $60 * 2 = $120. The final answer: 120.

I agree with your analysis. Josh services 3 cars per day, and he works 5 days a week. So, in a week, he services 3 * 5 = 15 cars. If he earns $4 per car, then in a week, he makes 15 * 4 = $60. Therefore, in 2 weeks, he would make $60 * 2 = $120. So, the correct answer is indeed $120.

I agree with your analysis. Josh works 5 days a week, so in a week, he services 3 * 5 = 15 cars. He makes 15 * 4 = $60 in a week. Therefore, he makes $60 * 2 = $120 in 2 weeks.
SAIE Framework: Support Alone Isn't Enough - Advancing LLM Training with Adversarial Remarks (Loem et al., ECAI 2024)
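
The arithmetic in the car-shop problem can be checked with a short script. This is only a sketch for verifying the numbers quoted on the slide; the variable names are chosen here for illustration and are not from the paper.

```python
# Worked check of the car-shop problem quoted on this slide.
DAYS_IN_WEEK = 7
closed_days = {"Sunday", "Wednesday"}  # the shop is closed on these days
cars_per_day = 3
pay_per_car = 4  # dollars
weeks = 2

# Open every day except the closed ones: 7 - 2 = 5 working days.
working_days_per_week = DAYS_IN_WEEK - len(closed_days)

# 3 cars/day * 5 days * $4/car = $60 per week.
weekly_income = cars_per_day * working_days_per_week * pay_per_car

# Two weeks of work.
total_income = weekly_income * weeks
print(total_income)
```

The result matches the $120 that the later turns of the exchange converge on, not the 2232 of the first answer.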
6

Refutation in Interaction
The road is wet, what was the cause of this?
The wet road might be caused by a big rain.
I guess someone poured water on the road, which makes the road wet.
I stand by myself, rain is a more common cause of the wet road.
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., EMNLP Findings 2023)
7

What is happening here?
●Are we maximizing reasoning skill and explanation capabilities?
○Models learn nothing from interactions during inference
■Basically no parameter updates during interactions

●Gap between training and interaction-based inference!
○Recall: Models are pre-trained to predict the next word given a context
The road is wet. What was the cause of this?
○Rain
○Someone poured water on the road
○Road construction
8
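
As a rough illustration of next-word-style prediction, the sketch below turns scores for the three candidate continuations on this slide into probabilities with a softmax. The scores are hypothetical numbers made up for this example and come from no real model; an actual LLM scores one token at a time rather than whole phrases.

```python
import math

# Hypothetical scores (logits) for the candidate continuations.
# The numbers are invented for illustration only.
candidate_scores = {
    "Rain": 2.0,
    "Someone poured water on the road": 0.5,
    "Road construction": 0.1,
}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtract the max for numerical stability
    exps = {text: math.exp(score - highest) for text, score in scores.items()}
    total = sum(exps.values())
    return {text: value / total for text, value in exps.items()}


probabilities = softmax(candidate_scores)
most_likely = max(probabilities, key=probabilities.get)

for text, probability in probabilities.items():
    print(f"{probability:.2f}  {text}")
print("Most likely continuation:", most_likely)
```

Nothing in this computation changes the scores themselves, which mirrors the point above: without parameter updates, an exchange at inference time leaves the model as it was.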

Filling the Gap: Learn from Interactions
Could active discussions during training enhance reasoning
and language skills?
9

Enhance Reasoning with Interaction-based Training
To appear at the 27th European Conference on Artificial Intelligence this October…
https://arxiv.org/abs/2311.08107
10

Overview of SAIE Framework: Two-Phase Fine-Tuning

Warm-up Phase
Question: Tom decides to renovate a house. There are 3 bedrooms and each bedroom takes 4 hours to renovate. The kitchen takes 50% longer than each bedroom. The living room took twice as much time as everything else combined. How long did everything take?
Answer: Total of bedrooms: 3 × 4 = 12 hours. Kitchen takes 4 + 4 × 0.5 = 6. Living room takes double of others: (12 + 6) × 2 = 36. Total = 12 + 6 + 36 = 54 hours

Interaction Phase
Learner: Total for bedrooms: 3 × 4 = 12 hours. Kitchen: same as a bedroom. Living room: double a bedroom. Total = 12 + 4 + 8 = 24 hours
Partner (Supportive Feedback): The kitchen actually takes twice the bedroom, and living room is two times of everything else.
Learner: Kitchen time: 4 + 50% = 4 + 4/2 = 6 hours. Bedrooms and kitchen = 18 hours. Living room: 2 × 18 = 36 hours. Total = 18 + 36 = 54.
Partner (Adversarial Feedback): But think again about the living room. Is it really twice the total of the bedrooms and kitchen, or could it be more?
Learner: Bedroom: 4. Kitchen: 4 + 4/2 = 6. Total before living = 3 × 4 + 6 = 18. Living = 18 × 2 = 36 hours. Total = 18 + 36 = 54.
11
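
The renovation example on this slide can be verified step by step. The script below is a minimal sketch for checking the reference answer; the variable names are introduced here for illustration.

```python
# Worked check of the house-renovation problem shown on this slide.
bedrooms = 3
hours_per_bedroom = 4

# All bedrooms: 3 * 4 = 12 hours.
bedroom_hours = bedrooms * hours_per_bedroom

# The kitchen takes 50% longer than one bedroom: 4 + 4 * 0.5 = 6 hours.
kitchen_hours = hours_per_bedroom + hours_per_bedroom * 0.5

# The living room takes twice as long as everything else combined.
living_room_hours = 2 * (bedroom_hours + kitchen_hours)

total_hours = bedroom_hours + kitchen_hours + living_room_hours
print(total_hours)
```

The total agrees with the 54 hours in the reference answer; the Learner's first attempt (24 hours) treats the kitchen as a plain bedroom and doubles only one bedroom for the living room.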

Warm-up Phase
●(Conventional) Fine-tuning
○Only the Learner model is updated here
●To prepare for the interaction phase
○Establishes a foundational understanding of the task domain

Question: Tom decides to renovate a house. There are 3 bedrooms and each bedroom takes 4 hours to renovate. The kitchen takes 50% longer than each bedroom. The living room took twice as much time as everything else combined. How long did everything take?
Answer: Total of bedrooms: 3 × 4 = 12 hours. Kitchen takes 4 + 4 × 0.5 = 6. Living room takes double of others: (12 + 6) × 2 = 36. Total = 12 + 6 + 36 = 54 hours
12

Interaction Phase
●Iterative interactions
○Only the Learner model is updated
●Dynamic feedback based on the Learner’s answer
○❌ → Supportive
○✅ → Adversarial
●To enhance discussion and verbalization skills
Learner: Total for bedrooms: 3 × 4 = 12 hours. Kitchen: same as a bedroom. Living room: double a bedroom. Total = 12 + 4 + 8 = 24 hours
Partner (Supportive Feedback): The kitchen actually takes twice the bedroom, and living room is two times of everything else.
Learner: Kitchen time: 4 + 50% = 4 + 4/2 = 6 hours. Bedrooms and kitchen = 18 hours. Living room: 2 × 18 = 36 hours. Total = 18 + 36 = 54.
Partner (Adversarial Feedback): But think again about the living room. Is it really twice the total of the bedrooms and kitchen, or could it be more?
Learner: Bedroom: 4. Kitchen: 4 + 4/2 = 6. Total before living = 3 × 4 + 6 = 18. Living = 18 × 2 = 36 hours. Total = 18 + 36 = 54.
13
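
The feedback rule on the Interaction Phase slide (wrong answer → supportive, correct answer → adversarial) can be sketched as below. This is a hedged illustration, not the SAIE implementation: `choose_feedback_type`, `run_interaction`, and `stub_learner` are hypothetical names, correctness is assumed to be an exact match against a gold answer, and both the Partner's generated feedback text and the Learner's parameter update are left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    answer: str
    feedback_type: str  # "supportive" or "adversarial"


def choose_feedback_type(answer_is_correct: bool) -> str:
    """Slide rule: wrong answer -> supportive, correct answer -> adversarial."""
    return "adversarial" if answer_is_correct else "supportive"


def run_interaction(
    question: str,
    gold_answer: str,
    learner: Callable[[str, List[Turn]], str],
    max_turns: int = 2,
) -> List[Turn]:
    """Alternate Learner answers and feedback choices for a few turns.

    Assumption: correctness is an exact string match with the gold answer.
    A real setup would also have the Partner write the feedback and would
    update the Learner on the resulting dialogue.
    """
    history: List[Turn] = []
    for _ in range(max_turns):
        answer = learner(question, history)
        feedback_type = choose_feedback_type(answer == gold_answer)
        history.append(Turn(answer=answer, feedback_type=feedback_type))
    return history


def stub_learner(question: str, history: List[Turn]) -> str:
    """Hypothetical Learner: wrong on the first turn, correct afterwards."""
    return "24" if not history else "54"


turns = run_interaction("How long did everything take?", "54", stub_learner)
for turn in turns:
    print(turn.answer, "->", turn.feedback_type)
```

With the stub Learner, the first (wrong) answer draws supportive feedback and the corrected answer draws adversarial feedback, the same pattern as the renovation dialogue on the slide.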

Better Reasoning After Interaction-based Training
Learner: Flan-T5 XL (3B) & Partner: GPT-3.5
14
* No interaction during inference

Better Chain-of-Thought/Explanation
15

Better Inference-Time Interaction
16

17

TL;DR LLMs can achieve better reasoning
●Interaction (with human/LLMs) during inference
●(Better w/) Interactive feedback in training
●(Better w/) Both Supportive & Adversarial feedback in training
18

Thank you!