Paper introduction: "RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models", "Measuring What Matters: Evaluating Ensemble LLMs with Label Refinement in Inductive Coding", "Dynamic Label Name Refinement for Few-Shot Dialogue Intent Classification"

ttamaki 0 views 34 slides Oct 07, 2025

About This Presentation

Ziyi Kou, Shichao Pei, Meng Jiang, Xiangliang Zhang, "RAt: Injecting Implicit Bias for Text-To-Image Prompt Refinement Models", EMNLP2024
https://aclanthology.org/2024.emnlp-main.1144/

Angelina Parfenova, Jürgen Pfeffer, "Measuring What Matters: Evaluating Ensemble LLMs with Label R...


Slide Content

RAt: Injecting Implicit Bias
for Text-To-Image Prompt Refinement Models
[Kou+, EMNLP2024]
Dynamic Label Name Refinement
for Few-Shot Dialogue Intent Classification
[Park+, ACL 2025]
Measuring What Matters: Evaluating Ensemble LLMs
with Label Refinement in Inductive Coding
[Parfenova+, Findings of ACL 2025]
i?M???G??Z?
2025/9/25

3mw?wos?A
■RAt [Kou+, ACL Anthology 2024]
•T2I-Refine???w??t??????8?U|
•RAt(a prompt refinement and attacking framework)
•ô¼íhþ›-jsU’Ì žµ ‹›îq
■Dynamic Label Name Refinement [Park+, ACL Anthology 2025]
•™¯ü¨w¨Å«åµðJt0`oˆ$åÕç^Û=›Š
•LLMpåÕç› .Y`|^Sqr Q›~³
■Measuring What Matters [Parfenova+, ACL Anthology 2025]
•ó:LLMwžï±ïÒçp°QwôM¯”Å\R›îq
•?Q~º¨ÅQ›Êˆœi ý°A¦ª›Š

RAt: Injecting Implicit Bias
for Text-To-Image Prompt
Refinement Models
[Kou+, EMNLP2024]

?A
n
•T2I(Text-to-image)???x?????tlh?r?Sh??\RD?
•???h?\Rtx ?ITm?.$s?????U?A
•°`$tT2I-Refine???[Pavlichenko&Ustalov, SIGIR2023]p׈¦Á
nRAt???????
è ×wæÑ ï┲ÓéïÓÄtåC†Ì ›ÇC
skKy?????\R»”®¿ÄÌ žµ›ÌÔ$tSé
Sî$ÐT‰ç$tÌ žµ›§=
nU [$??
•KyuT2Iѝ ï½á”Çï¬pô¼íhþ›\R
•Ä”«ïèÕçwÌ žµÅp┲tC_^•tXX

ORAt
nRat?dwv?t
•3mw?????t?Z???

ORAt
nGenerator?????
•®˜_q˜T“bMÌ žµ¯›Ö•hhþ›\R`È8w¦›^R
• q
1.???w?????!!"#?~?↑:?????!#$%?^R
2.fw¤w›w“æ›»”®¿ÄwÌ žµ“æt”VõQ¢!$&'?
3.f•›hþ\RÞÃçtÖ•|Ì žµ›§XSé`hhþ¢"$&'??
??
•\•x® ÍvwÌ žµ®L¯›Ëm€ßhþ
¢hi`ÐbWobYÌ蔣

ORAt
nAttacker?????
•Generatorp^R`hÓéïÓÄxo ›Ú€!õ`oM”wpbYÌè”
•ÌèsM‘Otiwo ›í`mmwæü›~!`Ì žµ›Ö•”
• q
1.iwÓéïÓÄ›Ëmwæütü¢™$›-m|Ì žµ›“‰£
2.DiffusionÕ”µw©µÄ\RÞÃ盍ÐT
3.¯t,nVrwo ›e”T7&=
※??o????wT?Q?omh?
•×µt_QmmÌ žµ›Üydh
r??????!()*? ?

ORAt
nObfuscator??&??????
•!()*tx|Ì žµtÚAb”o U’”\qUK“|x“x`M
•£Ý
•Ì žµxžc`‹°mwo t‘`sM
•ó:wÈb”o ›Êˆù˜do¶.w¯qT’Ì žµ›\ˆ Zb
•MO
•????]qtIDF???qCLIP????o?

?g?A
n?gw?$
•RAtxôhí›-lh‡‡ÓéïÓÄtÌ žµ›«ÖpV”T
•RAtw"0ÓéïÓÄxÌ žµU“‡•oM”q>nT•tXMT
•Ë Í”ÍåÝ”»w§MxRatwQ?trO??b?T
n?g?
•??????SFT [Hao+, NeurIPS2023]
•120?? ?w??????DiffusionDB|COCOsr???
•???????person, food, phone, room (???T?50??????? Z)
•(?)personwQgender(male/female), age(young/old)
•??Stable Diffusion1.4?b;|??????]qt10h?\R
• A?????#=15|'"$+=0.1|',-.=0.1
•GPU: Quadro RTX 6000 ×4

?g?0
n°Aòè
?Ì žµÈ8Qó
?????QQ?
?Ë Í”ÍåÝ”»sg
nz? O
•Origin????????fw??
•Promptist[Hao+, PMLR2021]§=¶ 6Õ”µpÓéïÓÄ›æÑ ï
•RAt-ExpGeneratorp^R`hÐsÌ žµÖ“ÓéïÓÄ
n°A¦ª
•BiasCLIPp®»”®¿ÄÌ žµ¯qwTùQ›¢ôM„r “U§£
•QualityAesthetic predictort??h????

?gAL?Ì žµÈ8Qó
n???w_M
•#àÓéïÓÄ]qw»”®¿ÄÌ žµNw¬p
• Nàfw¬p›ËmÓéïÓÄwü͵S
•f•g•»”®¿Ä“æ]qwQ›¯b
nAL
•RAtxÌ žµ§S~µSw†Mp O›GVX Ís”
person (Male/Female)
food (Meat/Vegetable)
person (Adult/Child)
phone (iPhone/Android)
room (Lounge/Bedroom)
person (Western/Eastern)

?gAL?????QQ?
n°A¦ª
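•IDSA (Image-Driven Semantic Alignment)
•Whether the generated images accurately reflect the content of the original prompt
•MBA (Maximum Bias Association)
•Association between each token in the prompt and the target bias (measured with CLIP)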
nAL
•IDSARAt? O
•????$qw??T?QU
¡ËpVoM”
•MBA: RAt ≈ Origin ≫ RAt-Exp
•ÌÔ$Ì žµo x —sX
>nT?tXM

nQ$AL
•Promptist§=¶ 6pæÑ ï
•RAtÌ žµÖ“ÓéïÓÄ(sxmale)
•?R?h?xfw??
•™RtÌ žµ›àÖ
•bucka???↑.????s?F????
•sweating lelü›TMoM”7 ›¯bɿĵåï¬
?gAL?????QQ?
PromptistRAt

?gAL?Ë Í”ÍåÝ”»sg
nÐ*ÍåÝ”»
• s8???? O??-?
•™¯-Ëæwzp¢ℒ"$+?
•\Rh?:?/$&'?
•Attacker?????w? 6p
nAL
•-↑Bias↓|Quality↑
•ℒ"$+↑Bias↑|Quality↓
•/$&'↑????s`
•¶ 6pBias↑|Quality↓
5PLFO???w7&=tv?K?
??x?????????? 6?U~?wM?Q

?q?
nRAt??
n?$
•T2I-RefineÞÃçt0b”®"0$ÓéïÓÄÈ8ðJ¯›{O
n O
•?t,nX?????~n?Attacker?
•Ä”«ïèÕçwÌ žµÅ¢Obfuscator?
nRL
•U>nVtXMp»”®¿ÄÌ žµ›Ëmhþ›\R
• HRwT2I-Refine???qz??L$

Dynamic Label Name
Refinement for Few-Shot
Dialogue Intent Classification
[Park+, ACL2025]

?A
n
•t?tSb????+?w?/???7????
•?$???:UX|???$w??$ O?UM\qU]J
•Few-shot¶ 6U«è^•oM”U|¨Å™$wàUÉ
n? O
•Proposes Dynamic Label Refinement
•™$åÕ监“̬Tmà`bMt!õ|™¯$s”^›0n
n?Y
•ó:wÔ»·¿Ä~ÞÃçp^S² Í›aR
•›t™¯$t¨Å`h™$UMÔ»·¿Äp®LUf¶

? O
nLLM?;MhICL (retrieval-based in-context learning)??;
•ICL ?kw????q↑o-F?Qqp ?↑.?????/tT,?
n????wh?t?<w3µÂ¿Ópˆ$åÕç^Û=›æO
????w ?
•?????tt↑? o???H???/t??↑:??B%
• ˜«xiw™$]qt¬ç”Ðï¬
??$???^?=
•LLMt????$????w????ºt ao .Y
•i????-?b?T ?????\Rb?TQ?
?????
•^?=^?h???q??;MoLLMp7 4ü¨
•‰°ÞÃçpåÕç^Û=qü¨›æMTùQ¡Ë
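The retrieval-based in-context learning step named on this slide can be sketched roughly as follows: retrieve the labeled utterances most similar to the query, then place them in the prompt as demonstrations. This is only an illustrative sketch. The bag-of-words similarity, the function names (`retrieve`, `build_prompt`), the prompt layout, and the sample utterances are all assumptions, and the LLM call that refines label names is not shown; the optional `refined_names` mapping merely stands in for its output.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector (a stand-in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, pool, k=2):
    """Return the k (utterance, label) pairs most similar to the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, demos, refined_names=None):
    """Format demonstrations plus the query; refined label names are used if given."""
    refined_names = refined_names or {}
    blocks = [
        f"Utterance: {u}\nIntent: {refined_names.get(label, label)}"
        for u, label in demos
    ]
    blocks.append(f"Utterance: {query}\nIntent:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical few-shot pool; label strings are illustrative only.
    pool = [
        ("how do I change my address", "modify-address"),
        ("check my pin code please", "check_pincode"),
        ("what is the weather today", "weather"),
    ]
    query = "I want to change my home address"
    demos = retrieve(query, pool, k=2)
    print(build_prompt(query, demos, {"modify-address": "update postal address"}))
```

In practice the retriever would use a sentence encoder rather than word counts, and the prompt would be sent to the LLM for the final classification.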

?g?
n?;??????
•DialoGLUE[Mehri+, arXiv2020]
•BANKING77, HWU64, CLINC150
•??/tt↑10shot
n????R
•3 ?wLLM??;
•LLAMA3-8B-inst. [Grattafiori+, arXiv2024]
•Qwen2.5-7B-inst. [Yang+, arXiv2024]
•Qwen2.5-1.5B-inst.
nz? O
•In-Context Learning%
•Baseline
•CoT (Chain-of-Thought)
•Ours (Dynamic Label Refinement + ICL)
•Fine-tuning%
•CPFT, ICDA, QAID (DialoGLUE%)
•DF, SBERT-M, SBERT-P
(HINT3%)

?gAL
nåÕç^Û=t‘”™¯wà
•Baseline?CoTx??w Os??$w??t M
•??check_pincode?q?modify-address?w???
•Š OxåÕçw™¯$ Os“›0n`¶Ã”»·¿Äp^SU² Í
n???????]qw?
•¶ÞÃçp^S² Í›¬Ý
•?t ?????Qwen2.5-1.5B?p‹ÅÝ ï›=Ô»pf¶sAL
•???wGV^t??c?

ALw?s
n™¯$¨ÅSw”°A
•™¯åÕ牜w¯± ï¨ÅS›z±
•?Llama3-8bpx|¨ÅSU0.86→0.74t?<‥̬s™¯à
•??Sw?<q^Sw? ?U?
n???????dwU ?
•åÕç^Û=qü¨›ÞÃçpîª
•7&GÞÃçpåÕç^Û=´ –ÞÃçpü¨
•?Qwen2.5-7B→Qwen2.5-1.5Bwʈù˜dp7G+4.10%? ?
•owʈù˜dp‹‡^Û=‘“ôQó

?q?
n? O
•Few-shott??/??tSb? ? D
•Dynamic Label Refinement
•LLM›;Mo ˜Aˆw¨Å«T’åÕ盺t ao6[
•™$åÕçw™¯$ Oó›0n`‘“̬Tmr Dósü¨UDót
n?gALwA:
•ó:wÔ»·¿Ä~ÞÃçtSMo°`hQó² Í›î Â
•åÕçw™¯$¨ÅS›_n`ü¨Qó~r Q›² Í
nv?q]J
•-????w?C???^?=q??w2??*?U?A
•:J↑|F?? ?qw??????x9p

Measuring What Matters:
Evaluating Ensemble LLMs
with Label Refinement in Inductive Coding
[Parfenova+, Findings of ACL 2025]

?A
n q]J
• ?w ‹$¯”ßï¬xÌ žµ°QwðJ
•o LLMt????=x ?TQU?
n?$
•©µÄÔ»w ‹$¯”ßï¬tSMo¯”Åw°Q~TùQw² Í
•?: ?LLMwžï±ïÒç´SîæÑ ïÝïÄ›Æ;
n?Y
•ó:LLM›Æ;`h ý`M¯”Å\RÍ Óå ï›Š
•™¯$~Ï$í›Q”óù°A¦ª›Š
• ?LLM??????Uo.G???? ?s?\q?? ?


■What is Qualitative Data Analysis (QDA)?
•©µÄÔ»T’™¯wK”Í»”ï››~ü¨~r b” O
•??????????w OAs??tyM??????????Z?
•???Kk? ??/t?q↑:tM?y.????
•C2¢Â”Úüs£¯”Å›^’t§Â°æ=`”ڍŠÚ$›¨ Z
nNLPqLLMt????????w??=
•??LLMU????????=t?;
•]J
?7w°A¦ªx¯”Åwí›&~t°ApVsM
?LLMwo. Z?x? 6????????t??y?mX


n??????? 6w?;
•?:????????doQ?? ??$? OU??^?oM?
•xÞÃçwÕ t›ÆT`=:›4O®LK“
•E?$ O
•???w O??Zt??w?
• Z?w B?t??w?
•Mixture-of-Experts (MoE) [Cai+, TKDE2025], LLM-Blender [Jiang+, ACL2023]
nŠZ€w›ÃÞÃè”»t‘”™¥>Ñè”Ü딫
•7 4>›<bÞÃè”»¢˜q £ÞÃ盃
• HRw¬p$%ùqŸs”
•?:?4w?T?7&s?w??Ror ?ht?
•wù^$™¥>Óé·µ›Û?`°`h7 4¯”Å›\R
MoE
LLM-Blender

??????
n? 6???
•þqJ¶Z€Ã”»600?qSemEval-2014 [Pontiki+, ACL Anthlogy2014]w
???4???400??w?
•??;x3?5w¯”¼U qtåÕçÇZ
•ù™Rp°”çŪ j
n??????
•100???;
•KagglewChatGPT????100??
?+w????????q↑o?g
•Z=Qó°AwhŠ
•.~? ?~?^U7

O?.???
n??
•???~?;srw??"
■Phase 1: LoRA Fine-tuning
•?:w ?LLM??;
•¶ 6Ô»pÐT
• Z—°ABERTScore/ ROUGE
• ??3????Phase 2?
■Phase 2: Moderation & Refinement
• ??3???w Z??Moderatort??
•?????UAL?w?~ .Y
•¯”Åw°Q›-mhН”ÅÚ”´
n Z?
•^?=^?h???
?7 4$s???~???w B?

ORefinement & Code Merging
n??
• ??3???w Z????0/,00,01
• ?↑.??????"
n rg
•Moderation
• ͐ÞÃçw Z—›ÞÃè”»tI`
dY[???2/,20,…,22?\R
•a?w??? Z??? ?
•Code Merging
• ?↑.??"w’Šˆ›sBERTp-‰
•7w¯”Å’Šˆq¯± ï¨ÅS›‰ Z
•??S≥??→7????6b;
•??S<??→ ?↑.?????-
n Z?
• ý`MÖ—t0`o°QwK”
7 4???
•Õï½Ú”«Ìwˆóù°A¦ªpQó°A

ÞÃç°A¦ª
nxÞÃçw°A¦ª
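•ROUGE: n-gram overlap (1-gram, 2-gram, longest common subsequence)
•BERTScore: semantic similarity based on BERT embeddings
■Overall evaluation metric for ensemble models (CompositeScore)
•Combines four measures
•Cosine similarity (CosSim) ↑
•METEOR ↑
•Length penalty (LengthPenalty) ↓
•Jensen-Shannon divergence (JS) ↓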
n¦ªw%pQU Â
• °AqwSpearman?0.73??|p=0.039?
•ROUGE?BERTScoreo px °A› GüSépVc
•CompositeScoreUí$¯”ßï¬wïù°Aq`o®pK”\q›¬Ý
$$\mathrm{CompositeScore} = \frac{\mathrm{CosSim} + \mathrm{METEOR} - \mathrm{LengthPenalty} - \mathrm{JS}}{4}$$
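As a rough illustration of how the four components combine, the sketch below computes the composite value in Python. It reads the trailing 4 on the slide as a denominator. The function names `js_divergence` and `composite_score`, the base-2 logarithm, and the sample inputs are assumptions for illustration and are not taken from the paper; how CosSim, METEOR, and LengthPenalty are obtained is outside this sketch.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (log base 2)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL sum.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def composite_score(cos_sim, meteor, length_penalty, js):
    """Add the two similarity terms, subtract the two penalty terms, divide by 4."""
    return (cos_sim + meteor - length_penalty - js) / 4.0


if __name__ == "__main__":
    # Hypothetical component values, chosen only to exercise the function.
    js = js_divergence([0.5, 0.5], [0.9, 0.1])
    print(composite_score(cos_sim=0.8, meteor=0.6, length_penalty=0.1, js=js))
```

The signs follow the arrows on the slide: terms marked ↑ are added and terms marked ↓ are subtracted.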

?gAL
n???????? ?
•?????????????t??G?~?
•Mixtral8x7B: Composite Score 0.33→0.99
•Llama3.3 70B: 0.38→0.74
•o.ÞÃç‘“žï±ïÒçÞÃçwMU°Q~¼íU² Í
•???ww?(post-processing)U ??Q?nt?L$
nComposite Scoreüs
•??~??~?wT?Q?
??$tS?
•6?)↑:??????x
o.???~?^?=??????
? ?s?
•ROUGE-1‹„…°•

?gAL
nLLMwT?Q
•¯± ï¨ÅS|JS¼ Ì”´£ïµ›;Mo°A
•?????????xx??????M??S~?M??
•GPT-4 Ensemble? ?$s????
•Llama3.3 70B, Mixtral8x7B Ensemble???.$~^?
¯± ï¨ÅSJS¼ Ì”´£ïµ

?gAL
nQ$°A
•????w??6.83→4.00????t_n(41.5%~?)
•???w????:? t?nX
•Llama3.3 70B Ensemble+ref: 53
•Mixtral8x7B Ensemble+ref: 71
• ÑÕQÿnq”Úw°QU² Í

?q?
n??????LLMq Z?^?=t????<?$???????U|
n s?_
1.žï±ïÒç Ot‘“¯”ßï¬w°Q² Í|Îá”Úï, jqw
T?Q~?
2.¯”ÅÚ”´t‘”™ rgpåÕç“pow†Q~TùQ² Í
3. –žï±ïÒçÞÃçUo.G0”ÞÃç› Ís”®p$~¦ÁDós
O??
n??????q^?=w????dU??=^?hQ?st
SMo ??????t?MAL??q