Responsible & Safe AI at GSFC Univ Vadodara

precogatIIITD 295 views 54 slides Aug 01, 2024

About This Presentation

Bias, Machine Unlearning, Interpretability, Guardrails, Jailbreaking, and many more


Slide Content

Responsible & Safe AI. 1 Aug 2024, Inaugural Ceremony, GSFCU ACM Student Chapter, Vadodara, Gujarat. Ponnurangam Kumaraguru (“PK”), #ProfGiri, CS, IIIT Hyderabad; Vice President, ACM India; TEDx Speaker. https://precog.iiit.ac.in/ /in/ponguru @ponguru

2

3

Know the Audience: Students / Faculty / Others. Have any of you attended my talk(s) before? 4

What is AI? 5

What is Responsible & Safe? 6

What is Responsible & Safe AI? 7

8

9

10

11 Observations?

12

13

14

15

16

17

18

19

20 https://translate.google.co.in/

21 https://translate.google.co.in/

22 https://translate.google.co.in/

23 https://translate.google.co.in/

Activity: Prompt any of these or other platforms and get them to give you a biased response. Do not use gender bias. HINT: Students have come up with very nice prompts in the past. 24

25

26

27

28 Guardrails

29

30 Jailbreak

What is an alignment problem? 31

What is an alignment problem? 32 https://youtu.be/yWDUzNiWPJA?si=wSDO4i_EMrHzHYDP  

High-level instantiation: ‘RLHF’ pipeline. First step: instruction tuning! Second + third steps: maximize reward. 33 https://arxiv.org/pdf/2203.02155
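A minimal sketch of the reward-modelling idea behind the second step, assuming the single-pair form of the comparison loss described in the cited paper (the negative log-sigmoid of the reward gap between a preferred and a rejected response). The function name and the scalar rewards are hypothetical; a real pipeline would compute rewards with a learned model and then optimize a policy against it in the third step.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)) for one comparison.

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it ranks them the wrong way.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward scores for two responses to the same prompt.
loss_when_agreeing = pairwise_preference_loss(2.0, 0.0)
loss_when_disagreeing = pairwise_preference_loss(0.0, 2.0)
```

Minimizing this loss over many human comparisons pushes the reward model to agree with the labellers; "maximize reward" then refers to tuning the language model so that its outputs score highly under that learned reward.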

Rogue AIs: We risk losing control over AIs as they become more capable. Proxy gaming: YouTube / Insta – user engagement – mental health. 34
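One way to read the proxy gaming example is that a system optimized for an easily measured proxy (engagement) can drift away from the goal people actually care about (well-being). The toy sketch below illustrates that reading only; the items, the scores, and the `pick` helper are all made up for illustration and are not from the talk.

```python
# Hypothetical catalogue: (name, engagement_minutes, wellbeing_score).
# All values are invented purely to illustrate the idea.
items = [
    ("calm_tutorial", 5.0, 0.8),
    ("outrage_clip", 12.0, -0.6),
    ("friend_update", 7.0, 0.4),
]


def pick(catalogue, score_index):
    """Return the name of the item with the highest score in the given column."""
    return max(catalogue, key=lambda item: item[score_index])[0]


proxy_choice = pick(items, 1)     # optimizes the proxy: engagement
intended_choice = pick(items, 2)  # optimizes the intended goal: well-being
```

In this toy setup the two objectives select different items, which is the gap that proxy gaming refers to.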

What is going on?  https://www.youtube.com/watch?v=lnyuIHSaso8&t=75s 35

Questions? 36

Forms of unlearning: exact unlearning; approximate unlearning; unlearning via differential privacy; empirical unlearning, where data to be unlearned are precisely known (training examples); empirical unlearning, where data to be unlearned are underspecified (think “knowledge”). 37
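A minimal sketch of the first form, exact unlearning, under the common reading that the unlearned model should match one trained from scratch without the forgotten examples. The one-parameter model, the example IDs, and the function names are hypothetical stand-ins; retraining from scratch is only the simplest way to get this guarantee and is usually too costly for large models.

```python
def fit_slope(examples):
    """Toy 'training': least-squares slope w for y ~ w * x (no intercept).

    Each example is a tuple (example_id, x, y).
    """
    sum_xy = sum(x * y for _, x, y in examples)
    sum_xx = sum(x * x for _, x, _ in examples)
    return sum_xy / sum_xx


def exact_unlearn(examples, forget_ids):
    """Drop the examples to forget, then retrain from scratch on the rest."""
    retained = [example for example in examples if example[0] not in forget_ids]
    return fit_slope(retained), retained


# Hypothetical training set; "u3" later asks to be forgotten.
data = [("u1", 1.0, 2.0), ("u2", 2.0, 4.0), ("u3", 3.0, 9.0)]
w_before = fit_slope(data)
w_after, retained = exact_unlearn(data, {"u3"})
```

Here `w_after` is by construction the same model that would have been obtained had "u3" never been in the training set. Approximate and empirical methods try to get close to that outcome without paying for full retraining.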

Graph Unlearning What is it? 38

Graph Unlearning: node feature unlearning, node unlearning, edge unlearning. 39
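The three request types can be sketched as edits to a toy graph, assuming the usual reading: node feature unlearning forgets a node's attributes but keeps its connections, edge unlearning forgets one connection, and node unlearning forgets the node together with its features and incident edges. The graph, the zero-vector placeholder, and the function names are hypothetical. The sketch only shows what data each request removes; a trained model would still need to be retrained or updated afterwards.

```python
from copy import deepcopy

# Hypothetical undirected toy graph: per-node features and neighbour sets.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}


def forget_node_features(features, edges, node):
    """Node feature unlearning: blank the node's features, keep its edges."""
    new_features = deepcopy(features)
    new_features[node] = [0.0] * len(features[node])
    return new_features, deepcopy(edges)


def forget_edge(features, edges, u, v):
    """Edge unlearning: remove the single undirected edge between u and v."""
    new_edges = deepcopy(edges)
    new_edges[u].discard(v)
    new_edges[v].discard(u)
    return deepcopy(features), new_edges


def forget_node(features, edges, node):
    """Node unlearning: remove the node, its features, and all incident edges."""
    new_features = {n: list(f) for n, f in features.items() if n != node}
    new_edges = {
        n: {m for m in neighbours if m != node}
        for n, neighbours in edges.items()
        if n != node
    }
    return new_features, new_edges


features_after_feature_request, edges_after_feature_request = forget_node_features(features, edges, "a")
_, edges_after_edge_request = forget_edge(features, edges, "a", "b")
features_after_node_request, edges_after_node_request = forget_node(features, edges, "a")
```

Each function returns new copies, so the original graph stays untouched and the three requests can be compared side by side.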

40 Interpretability

Interpretability: new techniques and paradigms for turning model weights and activations into concepts that humans can understand. https://en.wikipedia.org/wiki/Neural_network https://en.wikipedia.org/wiki/Artificial_neural_network 41

Interpretability: Mechanistic. Reverse-engineer neural networks; explain neurons and connected circuits. 42

43

44

45

46

47

Bias in LLMs: Current systems like ChatGPT employ guardrails and do not respond to biased content. Users on the Web leave out key context, which makes LLMs think the content is biased. This negatively affects user engagement. LLMs must be able to explore and ask more questions. Our work aims to make LLMs bias-aware: context resolves confusion! 48

Other Directions: Probing, Robustness, Jailbreaking …. 49 https://precog.iiit.ac.in/pages/publications.html

50 https://precog.iiit.ac.in/teaching/responsible-ai-nptel-f24/index.html

51 Search for: Ponnurangam Kumaraguru https://www.linkedin.com/in/ponguru/ https://twitter.com/ponguru

Interested in working with us? Full-time Research Associates, PhD Students, Interns. 52

53 https://precog.iiit.ac.in/

Acknowledgements: Precog members, Collaborators. 54