[DSC DACH 25] Brinnae Bent - Hacking the Blackbox.pptx
DataScienceConferenc1
84 slides
Oct 24, 2025
About This Presentation
From computer vision to large language models, blackbox AI systems share a critical vulnerability: they're surprisingly easy to hack. So easy that students can successfully do it in the first week of class. Why? Because these models lack explainability and human-understandable reasoning processes. But what if we could turn this problem into an opportunity? In this talk, Dr. Bent introduces "Adversarial Alignment" – a novel approach that leverages our ability to fool AI systems to actually understand them better and improve their alignment with human goals and values. Dr. Bent will demonstrate real-world examples of adversarial attacks against modern AI systems, present cutting-edge research in using these vulnerabilities to enhance model understanding, and share practical educational initiatives that teach responsible AI development through the lens of adversarial techniques. Attendees will gain actionable insights for implementing adversarial alignment practices in their organizations, strengthening their approach to responsible AI development, and building more robust and trustworthy AI systems.