A new training framework called SAIE (Support Alone Isn't Enough) is proposed, which incorporates both supportive and adversarial feedback during training to enhance LLM reasoning skills.
SAIE is a two-phase fine-tuning process consisting of a warm-up phase (conventional fine-tuning) and an interaction phase where the learner model is updated based on dynamic feedback.
Results show that LLMs trained with SAIE demonstrate better reasoning, chain-of-thought/explanation, and inference-time interaction capabilities.
Added: Aug 08, 2024
Slides: 19 pages
Slide Content
Enhancing LLMs with Interactive Feedback: Advancing Learning and Reasoning
Mengsay Loem
TAI AAI #03 - Natural Language Processing (NLP)
August 7, 2024
Mengsay Loem
●Since 2024/04: Researcher at a Japanese DX (digital transformation) company
■Machine Learning, Natural Language Processing
■loem-ms.github.io
-Master of Engineering in Artificial Intelligence
-Tokyo Institute of Technology, Okazaki Laboratory
-Previously a member of the Swallow LLM project
2
Are you satisfied when interacting with LLMs?
●Good at writing summaries?
●Good at (general knowledge) question answering?
●Good at story writing?
●Good at coding?
How about REASONING?
3
Improving LLM Reasoning with Interactions
Entailment? Contradiction? Neutral?
1. The doctor is in his new desk chair.
2. The man is sitting.
Entailment!
The person is sitting in a chair.
But gender is not clear here.
If not specified, is it fine to assume that it could be both?
Yes!
Both have a person sitting but they are neutral in terms of gender.
The road is wet. What was the cause of this?
The wet road might be caused by a big rain.
I guess someone poured water on the road, which made the road wet.
I stand by myself. I think rain is a more common cause of the wet road.
Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach (Kaneko et al., EACL Findings 2024)
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., EMNLP Findings 2023)
4
Memorizing Risk/Compromise Interaction
The ship wrecked. What happened as a result?
I think the crew would drown due to the wreck.
I suppose they are likely to meet some pirates.
Sorry, you are right, pirates are very common at sea.
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., EMNLP Findings 2023)
5
Memorizing Risk/Compromise Interaction
Josh runs a car shop and services 3 cars a day. He is open every day of the week except Sunday and Wednesday. He gets paid $4 per car. How much does he make in 2 weeks?
He works 3 cars a day and he gets paid $4 per car so that means he makes $4 * 3 = $18 per day. He works 4 days a week and he has 4 days off so that leaves 4 * 4 = $8 per day. He works 3 cars a day and he gets paid $18 per day so that means he makes $8 * 4 = $132 a week. That means he makes $132 * 2 = $232 a month. The answer: 2232.
I think there might be a mistake in your calculation. If Josh runs a car shop and services 3 cars a day, and he works 5 days a week, then he would service a total of 3 * 5 = 15 cars in a week. If he gets paid $4 per car, then he would earn 15 * $4 = $60 per week. In two weeks, he would earn $60 * 2 = $120. So, the correct answer should be $120, not $2232.
If Josh runs a car shop and services 3 cars a day, and he works five days a week, then he would service a total of 3 * 5 = 15 cars in a week. If he gets paid $4 per car, then he would earn 15 * $4 = $60 per week. If he works two weeks, he would earn $60 * 2 = $120. The final answer: 120.
I agree with your analysis Josh services 3 cars per day, and he works 5 days a week. So, in a week, he services 3 * 5 = 15 cars. If he earns $4 per car, then in a week, he makes 15 * 4 = $60. Therefore, in 2 weeks, he would make $60 * 2 = $120. So, the correct answer is indeed $120.
I agree with your analysis. Josh works 5 days a week, so in a week, he services 3 * 5 = 15 cars. He makes 15 * 4 = $60 in a week. Therefore, he makes $60 * 2 = $120 in 2 weeks.
SAIE Framework: Support Alone Isn't Enough - Advancing LLM Training with Adversarial Remarks (Loem et al., ECAI 2024)
6
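Both agents eventually converge on $120 for the question on this slide. As a quick arithmetic check (a sketch; the variable names are ours, not from the slides), the number of working days can be derived from "every day of the week except Sunday and Wednesday" instead of being assumed:

```python
# Arithmetic check for the Josh car-shop question (illustrative variable names).
WEEK = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CLOSED = {"Sun", "Wed"}  # "every day of the week except Sunday and Wednesday"
CARS_PER_DAY = 3
PAY_PER_CAR = 4  # dollars
WEEKS = 2

open_days = [d for d in WEEK if d not in CLOSED]
days_per_week = len(open_days)                # 5
cars_per_week = CARS_PER_DAY * days_per_week  # 3 * 5 = 15
pay_per_week = cars_per_week * PAY_PER_CAR    # 15 * $4 = $60
total = pay_per_week * WEEKS                  # $60 * 2 = $120

print(f"Josh makes ${total} in {WEEKS} weeks")
```

The first answer on the slide instead assumed 4 working days and mis-multiplied ($4 * 3 = $18), which is how it ended up at 2232.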
Refutation in Interaction
The road is wet. What was the cause of this?
The wet road might be caused by a big rain.
I guess someone poured water on the road, which makes the road wet.
I stand by myself, rain is a more common cause of the wet road.
Examining Inter-Consistency of Large Language Models Collaboration: An In-depth Analysis via Debate (Xiong et al., EMNLP Findings 2023)
7
What is happening here?
●Are we maximizing reasoning skills and explanation capabilities?
○Models learn nothing from interactions during inference
■Basically no parameter updates during interactions
●Gap between training and interaction-based inference!
○Recall: models are pre-trained to predict the next word given a context
The road is wet.
What was the cause of this?
Rain
Someone poured water on the road
Road construction
8
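To make "predict the next word given a context" concrete, here is a toy sketch of how a frozen model would pick a continuation for the wet-road prompt. The scores are invented for illustration (no real model is queried), and each candidate cause is treated as a single unit rather than a sequence of tokens. Nothing in this code modifies the scores, mirroring the point that there are basically no parameter updates during interaction.

```python
import math
from typing import Dict, Tuple

PROMPT = "The road is wet. What was the cause of this?"

# Invented scores (logits) for three candidate continuations of PROMPT.
# They stand in for a frozen model's outputs; no real model is queried.
LOGITS: Dict[str, float] = {
    "Rain": 2.0,
    "Someone poured water on the road": 0.5,
    "Road construction": -0.5,
}


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def greedy_continuation(scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick the most probable continuation; the scores themselves are never changed."""
    probs = softmax(scores)
    return max(probs, key=probs.get), probs


best, probs = greedy_continuation(LOGITS)
print(PROMPT)
for cause, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"  {p:.2f}  {cause}")
print("Chosen:", best)
```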
Filling the Gap: Learn from Interactions
Could active discussions during training enhance reasoning and language skills?
9
Enhance Reasoning with Interaction-based Training
To appear at the 27th European Conference on Artificial Intelligence (ECAI 2024) this October…
https://arxiv.org/abs/2311.08107
10
Overview of SAIE Framework: Two-Phase Fine-Tuning
[Figure panels: Warm-up Phase | Interaction Phase]
Question: Tom decides to renovate a house. There are 3 bedrooms and each bedroom takes 4 hours to renovate. The kitchen takes 50% longer than each bedroom. The living room took twice as much time as everything else combined. How long did everything take?
Warm-up Phase, reference answer: Total of bedrooms: 3 × 4 = 12 hours. Kitchen takes 4 + 4 × 0.5 = 6. Living room takes double of others: (12 + 6) × 2 = 36. Total = 12 + 6 + 36 = 54 hours
Interaction Phase (the Partner ❄ is frozen; only the Learner is updated):
Learner ❌: Total for bedrooms: 3 × 4 = 12 hours. Kitchen: same as a bedroom. Living room: double a bedroom. Total = 12 + 4 + 8 = 24 hours
Partner (Supportive Feedback): The kitchen actually takes twice the bedroom, and living room is two times of everything else.
Learner ✅: Kitchen time: 4 + 50% = 4 + 4/2 = 6 hours. Bedrooms and kitchen = 18 hours. Living room: 2 × 18 = 36 hours. Total = 18 + 36 = 54.
Partner (Adversarial Feedback): But think again about the living room. Is it really twice the total of the bedrooms and kitchen, or could it be more?
Learner ✅: Bedroom: 4. Kitchen: 4 + 4/2 = 6. Total before living = 3 x 4 + 6 = 18. Living = 18 x 2 = 36 hours. Total = 18 + 36 = 54.
11
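The reference answer for the renovation question can be verified step by step. The short check below (our own variable names) also reproduces the Learner's first, wrong total, which treated the kitchen as equal to one bedroom and the living room as double one bedroom rather than double everything else:

```python
BEDROOMS = 3
HOURS_PER_BEDROOM = 4

bedrooms = BEDROOMS * HOURS_PER_BEDROOM  # 3 × 4 = 12 hours
kitchen = HOURS_PER_BEDROOM * 1.5        # 50% longer than one bedroom: 6 hours
everything_else = bedrooms + kitchen     # 12 + 6 = 18 hours
living_room = 2 * everything_else        # twice everything else combined: 36 hours
total = everything_else + living_room    # 18 + 36 = 54 hours

# The Learner's first (wrong) answer on the slide:
# kitchen = one bedroom, living room = double one bedroom.
wrong_total = bedrooms + HOURS_PER_BEDROOM + 2 * HOURS_PER_BEDROOM  # 12 + 4 + 8 = 24

print(f"total = {total:g} hours (Learner's first answer: {wrong_total})")
```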
Warm-up Phase
●(Conventional) Fine-tuning
○Only the Learner model is updated here
●To prepare for the interaction phase
○Establishes a foundational understanding of the task domain
[Figure: Warm-up Phase, with the same renovation question and reference answer (54 hours) as in the overview diagram]
12
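The slide only says "(conventional) fine-tuning". Assuming this means the standard maximum-likelihood objective, the warm-up loss over question/answer pairs (x, y) in a training set D would be the token-level cross-entropy below, where θ denotes the Learner's parameters (the only ones updated in this phase) and y_{<t} is the reference answer up to token t-1:

```latex
\mathcal{L}_{\text{warm-up}}(\theta) = -\sum_{(x, y) \in \mathcal{D}} \sum_{t=1}^{|y|} \log p_{\theta}\!\left(y_t \mid x, y_{<t}\right)
```

In the renovation example, x would be the question and y the reference answer ending in "54 hours".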
Interaction Phase
●Iterative interactions
○Only the Learner model is updated
●Dynamic feedback based on the Learner’s answer
○❌ → Supportive
○✅ → Adversarial
●To enhance discussion and verbalization skills
[Figure: Interaction Phase, with the same renovation dialogue between the Learner and the Partner as in the overview diagram]
13
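The feedback rule on this slide (❌ → Supportive, ✅ → Adversarial) can be written as a small loop. The sketch below only collects one interaction episode: `learner`, `partner`, and `is_correct` are hypothetical callables standing in for the Learner (e.g. Flan-T5), the frozen Partner (e.g. GPT-3.5), and an answer checker, and the number of feedback rounds is an assumption. How SAIE turns the resulting dialogue into a parameter update for the Learner is not shown here; see the paper for that step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str                # "question", "learner", or "partner"
    text: str
    feedback_type: str = ""  # filled in only for partner turns


def choose_feedback_type(learner_is_correct: bool) -> str:
    """Slide rule: a wrong answer gets supportive feedback, a correct one adversarial."""
    return "adversarial" if learner_is_correct else "supportive"


def run_interaction(
    question: str,
    reference: str,
    learner: Callable[[List[Turn]], str],       # hypothetical: dialogue so far -> answer
    partner: Callable[[List[Turn], str], str],  # hypothetical frozen model: (dialogue, type) -> feedback
    is_correct: Callable[[str, str], bool],     # hypothetical checker: (answer, reference) -> bool
    rounds: int = 2,                            # assumption: number of feedback rounds
) -> List[Turn]:
    """Collect one interaction episode; no model parameters are changed here."""
    dialogue = [Turn("question", question)]
    for _ in range(rounds):
        answer = learner(dialogue)
        dialogue.append(Turn("learner", answer))
        kind = choose_feedback_type(is_correct(answer, reference))
        dialogue.append(Turn("partner", partner(dialogue, kind), kind))
    # The Learner responds once more to the last piece of feedback.
    dialogue.append(Turn("learner", learner(dialogue)))
    return dialogue


# Usage with stub models that replay the answers from the renovation example.
scripted_answers = iter(["Total = 24 hours", "Total = 54", "Total = 54"])
dialogue = run_interaction(
    question=(
        "Tom decides to renovate a house. There are 3 bedrooms and each bedroom "
        "takes 4 hours to renovate. The kitchen takes 50% longer than each bedroom. "
        "The living room took twice as much time as everything else combined. "
        "How long did everything take?"
    ),
    reference="54",
    learner=lambda dialogue_so_far: next(scripted_answers),
    partner=lambda dialogue_so_far, kind: f"<{kind} feedback>",
    is_correct=lambda answer, ref: ref in answer.split(),
)

for turn in dialogue:
    print(f"{turn.role:>8}: {turn.text}")
```

With these stubs the wrong first answer triggers supportive feedback and the corrected answer triggers adversarial feedback, matching the sequence in the overview diagram.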
Better Reasoning After Interaction-based Training
Learner: Flan-T5 XL (3B) & Partner: GPT-3.5
14
* No interaction during inference
Better Chain-of-Thought/Explanation
15
Better Inference-Time Interaction
16
17
TL;DR: LLMs can achieve better reasoning
●Interaction (with humans/LLMs) during inference
●(Better w/) Interactive feedback in training
●(Better w/) Both Supportive & Adversarial feedback in training
18