Paper introduction: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

ttamaki 176 views 34 slides Jul 03, 2024

About This Presentation

Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr "A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models" arXiv2023

https://arxiv.org/abs/2307.12980


Slide Content

A Systematic Survey of Prompt
Engineering on Vision-Language
Foundation Models
Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang,
Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr
arXiv2023
2024/6/19


•Vision-Language Prompt Engineering
◼Prompt
•AI
◼Prompt Engineering
•Prompt
ChatGPT
LLM 5
AI

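◼Prompt Engineering for multimodal-to-text generation models
◼Prompt Engineering for image-text matching models
◼Prompt Engineering for text-to-image generation models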

◼Prompt Engineering for multimodal-to-text generation models

/
◼LLM (

◼Prompt Engineering

LLM models
◼BERT [Devlin+, arXiv2018]
◼T5 [Raffel+, arXiv2019]
◼GPTs [Brown+, arXiv2020]
◼ViLBERT [Lu+, arXiv2019]
◼OFA [Wang+, arXiv2022]
◼Flamingo [Alayrac+, NeurIPS2022]
◼SimVLM [Wang+, arXiv2021]
◼PaLI [Chen+, arXiv2022]
◼MAGMA [Eichenberg+, arXiv2021]
◼BLIP2 [Li+, arXiv2023]
◼VL-T5 [Cho+, arXiv2021]
◼Frozen [Tsimpoukelli+, arXiv2021]
◼PICa [Yang+, arXiv2021]
◼FlanT5 [Chung+, arXiv2022]

Prompt
◼Hard Prompt
•Task instruction prompting
•In-context learning
•Retrieval-based prompting
•Chain-of-thought prompting
◼Soft Prompt
•Prompt tuning
•Prefix Token Tuning

Hard/Soft Prompt
◼Hard Prompt
• Prompt
CLIP
•‘A photo of a’ text
◼Soft Prompt

Vision Transformer
• class token

Hard Prompt
◼Task instruction prompting
•[Efrat&Levy, arXiv2020]
◼Retrieval-based prompting
•PICa [Yang+, arXiv2021]
•[Rubin+, arXiv2021]
•[Li+, arXiv2023]
•[Ye+, arXiv2023]
◼In-context learning
•[Dong+, arXiv2022]
•GPTs [Brown+, arXiv2020]
◼Chain-of-thought prompting
•[Wei+, arXiv2022]
•[Zhang+, arXiv2022]
•[Qiao+, arXiv2022]

Hard Prompt
◼Task Instruction Prompting
• Prompt
Prompt: Translate English to French
Prompt: Translate French to English
◼Retrieval-based prompting

[Rubin+, arXiv2021][Efrat+, arXiv2020]
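The two hard-prompt styles on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any cited paper: the function names, the `=>` separator, the demonstration pool, and the word-overlap similarity (a stand-in for a learned retriever) are all assumptions made for the example.

```python
def instruction_prompt(instruction: str, query: str) -> str:
    """Task instruction prompting: prepend a natural-language instruction."""
    return f"{instruction}\n{query} =>"


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (hypothetical similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_demos(query, pool, k=2):
    """Retrieval-based prompting: pick the k demonstrations most similar to the query."""
    return sorted(pool, key=lambda pair: token_overlap(query, pair[0]), reverse=True)[:k]


def retrieval_prompt(instruction, query, pool, k=2):
    """Instruction, then retrieved demonstrations, then the query to complete."""
    demos = retrieve_demos(query, pool, k)
    lines = [instruction] + [f"{src} => {tgt}" for src, tgt in demos] + [f"{query} =>"]
    return "\n".join(lines)


# Hypothetical demonstration pool of (input, output) pairs.
POOL = [
    ("good morning", "bonjour"),
    ("thank you very much", "merci beaucoup"),
    ("good night", "bonne nuit"),
]

if __name__ == "__main__":
    print(instruction_prompt("Translate English to French", "good evening"))
    print(retrieval_prompt("Translate English to French", "good evening", POOL))
```

In practice the retriever would typically compare embeddings rather than word sets; only the prompt assembly step is shown here.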

Hard Prompt
◼In-context learning

◼Chain-of-thought prompting
• (=Prompt)
[Dong+, arXiv2022]
Positive / Negative
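As a rough sketch of how these two prompt styles are assembled: in-context learning places labeled demonstrations before the query, and chain-of-thought prompting adds a written rationale to each demonstration. The `Review:`/`Sentiment:` and `Q:`/`A:` layouts and the example data are assumptions chosen for illustration.

```python
def few_shot_prompt(demos, query):
    """In-context learning: labeled demonstrations followed by the unlabeled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


def chain_of_thought_prompt(demos, question):
    """Chain-of-thought: each demonstration spells out its reasoning before the answer."""
    blocks = [
        f"Q: {q}\nA: {rationale} The answer is {answer}."
        for q, rationale, answer in demos
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical demonstrations.
    sentiment_demos = [("great movie", "Positive"), ("boring plot", "Negative")]
    print(few_shot_prompt(sentiment_demos, "loved every minute"))

    math_demos = [("What is 2 + 3?", "2 plus 3 is 5.", "5")]
    print(chain_of_thought_prompt(math_demos, "What is 4 + 1?"))
```

No model weights are updated in either case; the demonstrations live only in the prompt text.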

Soft Prompt
◼Prompt Tuning
• Prompt

◼Prefix Token Tuning
• Prompt

[Lester+, arXiv2021]
[Li+, arXiv2021]
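A simplified sketch of the difference between the two soft-prompt methods, using plain Python lists in place of tensors. It assumes the common reading that prompt tuning prepends learnable vectors to the input embeddings only, while prefix tuning gives every layer its own learnable prefix. The names, dimensions, and random initialization are hypothetical, and no training loop is shown.

```python
import random


def make_soft_prompt(num_tokens, dim, seed=0):
    """Create num_tokens continuous vectors; these would be the trainable parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_tokens)]


def prompt_tuning_input(soft_prompt, token_embeddings):
    """Prompt tuning: prepend the soft prompt to the input embedding sequence."""
    return soft_prompt + token_embeddings


def prefix_tuning_inputs(layer_prefixes, layer_hidden_states):
    """Prefix tuning (simplified): prepend a separate prefix at every layer."""
    return [prefix + hidden for prefix, hidden in zip(layer_prefixes, layer_hidden_states)]


if __name__ == "__main__":
    dim = 4
    tokens = [[1.0] * dim for _ in range(3)]   # 3 frozen token embeddings
    soft = make_soft_prompt(2, dim)            # 2 learnable prompt vectors
    print(len(prompt_tuning_input(soft, tokens)))

    prefixes = [make_soft_prompt(2, dim, seed=i) for i in range(2)]  # one per layer
    per_layer = prefix_tuning_inputs(prefixes, [tokens, tokens])
    print([len(h) for h in per_layer])
```

In both cases the backbone model stays frozen and only the prompt vectors would receive gradients.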

Application
◼Visual Question Answering
•Flamingo [Alayrac+, NeurIPS2022]
•[Yang+, AAAI2022]
•[Tsimpoukelli+, NeurIPS2021]
◼Zero-shot Image Classification
•Kosmos [Huang+, arXiv2023]
◼Image Captioning
•PaLI [Chen+, arXiv2022]
•MAGMA [Eichenberg+, arXiv2021]
•SimVLM [Wang+, arXiv2021]
•OFA [Wang+, arXiv2022]
◼Chatbot
•Visual ChatGPT [Wu+, arXiv2023]
•ChatGPT
•GPT4 [Achiam+, arXiv2023]
•BiomedGPT [Zhang+, arXiv2023]

◼Prompt Engineering for image-text matching models

◼CLIP [Radford+, arXiv2021]
•Image-text matching
•Image and text encoders: Transformer

Prompt
◼Text Prompting
•Text prompt → text encoder
◼Visual Prompting
•Image prompt → image encoder
◼Unified Prompting
•Image, text prompts → encoders

Text Prompting
◼Hard Prompt
•“a photo of a” [Radford+, arXiv2021]
◼Soft Prompt
•Global Soft Prompt [Gao+, arXiv2020], [Shu+, arXiv2022], [Zhou+, arXiv2021]
• Prompt
•Group-specific Prompt [Ju+, arXiv2021], [Shen+, arXiv2022]
• Prompt
•Instance-specific Prompt [Zhou+, CVPR2022]
• Prompt

Text Prompting: Soft Prompt
◼Global Soft Prompt
◼Group-specific Prompt
◼Instance-specific Prompt

Image Prompting
◼Patch-wise Prompts


•[Jia+, ECCV2022]
•[Bahng+, arXiv2022]
•[Shen+, arXiv2022]
•[Wu+, arXiv2022]
•[Huang+, CVPR2023]
◼Annotation Prompts


•CPT [Yao+, arXiv2021]
•[Shtedritski+, ICCV2023]
•[Bar+, arXiv2022]

Image Prompting
◼Patch-wise Prompts
•[Jia+, ECCV2022]
•[Bahng+, arXiv2022]
◼Annotation Prompts
•[Shtedritski+, ICCV2023]

Unified Prompting
◼Coupled Unified Prompting
•Text, image Prompt

•[Zang+, arXiv2022]
◼Decoupled Unified Prompting
•Text, image Prompt

•[Shen+, arXiv2022]
•CPT [Yao+, arXiv2021]
•MaPLe [Khattak+, CVPR2023]

Unified Prompting
◼Coupled Unified Prompting
•[Zang+, arXiv2022]
◼Decoupled Unified Prompting
•[Khattak+, CVPR2023]

Application
◼Image Classification
•CLIP [Radford+, arXiv2021]
•TPT [Shu+, arXiv2022]
◼Object Detection
•[Gu+, arXiv2021]
•[Guo+, arXiv2022]
•DualCoOp [Sun+, arXiv2022]
•[Du+, CVPR2022]
•PromptDet [Feng+, ECCV2022]
◼Semantic Segmentation
•DenseCLIP [Rao+, CVPR2022]
•[Kirillov+, arXiv2023]
◼Domain Adaptation
•[Ge+, arXiv2022]
•[Gao+, arXiv2022]
◼Continual Learning
•[Wang+, CVPR2022]
•DualPrompt [Wang+, ECCV2022]

◼Prompt Engineering for text-to-image generation models


•DRAW [Gregor+, arXiv2015]
•GAN [Goodfellow+, arXiv2014]
•VAE [Kingma+, arXiv2019]
•DALL-E [Ramesh+, arXiv2021]
•Parti [Yu+, arXiv2022]
◼Prompt Engineering

Prompt
◼ Prompt
•[Witteveen+, arXiv2022]
◼AI Prompt
•DiffuMask [Wu+, ICCV2023]
•ImaginaryNet [Ni+, ICLR2022]
•[He+, ICLR2022]
•[Avrahami+, arXiv2022]
•GLIDE [Nichol+, arXiv2021]
•OneWord [Gal+, arXiv2022]
•DreamBooth [Ruiz+, arXiv2022]
•Custom Diffusion [Kumari+, CVPR2023]
•[Feng+, arXiv2022]
•[Epstein+, arXiv2023]
•Imagic [Kawar+, CVPR2023]
•[Zhang+, ICCV2023]
•[Hertz+, arXiv2022]

Prompt
◼Investigating Prompt Engineering in Diffusion Models [Witteveen+, arXiv2022]
• (Prompt)




AI Prompt
◼DiffuMask [Wu+, ICCV2023]
•Prompt Engineering
‘photo of bird’
1.[sub class] Wiki
2.CLIP retrieval
Photo of bird → Photo of Duck bird, Photo of Swan bird
Photo of Duck bird → Photo of a big Duck bird, Photo of a pretty Duck bird…
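The prompt expansion shown on this slide (class name, then sub-class names, then added adjectives) can be sketched as simple string generation. The sub-class and adjective lists are passed in by hand here; how DiffuMask actually obtains them is not reproduced, and the function names are hypothetical.

```python
from itertools import product


def subclass_prompts(class_name, subclasses):
    """Expand 'Photo of bird' into one prompt per sub-class."""
    return [f"Photo of {sub} {class_name}" for sub in subclasses]


def attribute_prompts(class_name, subclasses, adjectives):
    """Further expand each sub-class prompt with descriptive adjectives."""
    return [
        f"Photo of a {adj} {sub} {class_name}"
        for sub, adj in product(subclasses, adjectives)
    ]


if __name__ == "__main__":
    print(subclass_prompts("bird", ["Duck", "Swan"]))
    print(attribute_prompts("bird", ["Duck"], ["big", "pretty"]))
```

Each generated string would then be used as a text prompt for the image generator to diversify the synthetic training data.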

AI Prompt
◼ImaginaryNet [Ni+, ICLR2022]

•Prompt Engineering
• 1 “A photo of a [class]”

• Prompt

Application

•DiffuMask [Wu+, ICCV2023]
•ImaginaryNet [Ni+, ICLR2022]
•[He+, arXiv2022]

•Text to video
•Make-A-Video [Singer+, arXiv2022]
•Imagen Video [Ho+, arXiv2022]
•FateZero [Qi+, arXiv2023]
•Tune-A-Video [Wu+, arXiv2023]
•[Ruan+, CVPR2023]
•MovieFactory [Zhu+, arXiv2023]
•Text to 3D
•DiffRF [Muller+, CVPR2023]
•DreamFusion [Poole+, arXiv2022]
•NeRF [Mildenhall+, ACM2021]
•Magic3D [Lin+, CVPR2023]
•Dream3D [Xu+, CVPR2023]
•[Lee+, arXiv2022]
•[Khalid+, arXiv2022]

◼Prompt Engineering







Prompting
◼Prompt Engineering for multimodal-to-text generation models
•Text prompting
1.Hard Prompt
1.Task instruction prompting
2.In-context learning
3.Retrieval-based prompting
4.Chain-of-thought prompting
2.Soft Prompt
1.Prompt tuning
2.Prefix Token Tuning

Prompting
◼Prompt Engineering for image-text matching models
•Text Prompting
1.Hard Prompt
2.Soft Prompt
1.Global Soft Prompt
2.Group-specific Prompt
3.Instance-specific Prompt
•Visual Prompting
1.Patch-wise Prompts
2.Annotation Prompt
•Unified Prompting
1.Coupled Unified Prompting
2.Decoupled Unified Prompting

Prompting
◼Prompt Engineering for text-to-image generation models
• Prompt
•AI Prompt