DREAMS 2021 presentation: Evaluation metrics in human-in-the-loop for autonomous systems
Prajit T Rajendran
Slide Content
Evaluation of Human-in-the-Loop Learning Based Autonomous Systems
Review of evaluation metrics used in the literature + proposed idea
DREAMS, EDCC 2021, 13 September 2021
Prajit Thazhurazhikath Rajendran, Huascar Espinoza, Chokri Mraidha (CEA, DILS-LSEA), Agnes Delaborde (LNE)
Safety challenges of DL/AI components
The use of DL/AI components in autonomous systems comes with various challenges:
- Vulnerability to out-of-distribution data
- Adversarial inputs
- Anomalies
- Lack of transparency
- Stochastic nature
- Unknown unknowns
- Uncertainty
Safety is an emergent property: it is not a property of any particular component in isolation. Regulation, qualification and certification of such DL/AI components is ongoing work in the community. Traditional approaches do not facilitate safe learning; humans can guide the system to safe behaviour with their knowledge, experience and adaptability.
[Figure: normal vs. anomalous samples, illustrating out-of-distribution inputs]
Categories of human-in-the-loop learning methods: Active learning
- Semi-supervised ML in which only a subset of the training data is labelled
- The human is queried interactively to label data points of interest from the unlabelled set
- PROS: reduces the data-labelling requirement
- CONS: selecting the right points to query is important (see the query-selection sketch below)
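As a concrete illustration of the query-selection problem noted above, here is a minimal uncertainty-sampling sketch. It assumes a scikit-learn-style classifier exposing predict_proba; the function name and parameters are illustrative, not from the presentation.

```python
# Minimal uncertainty-sampling sketch for active learning (illustrative).
import numpy as np

def select_queries(model, unlabelled_pool, n_queries=10):
    """Return indices of the pool points the model is least confident about."""
    probs = model.predict_proba(unlabelled_pool)  # shape (N, n_classes)
    confidence = probs.max(axis=1)                # top-class probability per point
    return np.argsort(confidence)[:n_queries]     # least confident first
```

Points with a low top-class probability are exactly those where a human label is most informative, which is why query selection drives the method's labelling cost.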
Categories of human-in-the-loop learning methods: Demonstration
- The human is in full control and provides demonstrations to train the agent
- The agent can mimic the human data and use it as a safe starting point
- PROS: leads to safer policies
- CONS: more human effort needed, may be subjective, train-test distribution shift (a behavioural-cloning sketch follows below)
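A minimal behavioural-cloning sketch of "mimic human data as a safe starting point", assuming a PyTorch policy network over a discrete action space; all names are illustrative.

```python
# Behavioural cloning: supervised learning on human (state, action) pairs.
import torch
import torch.nn as nn

def behaviour_clone(policy, demo_batches, epochs=10, lr=1e-3):
    """demo_batches: iterable of (states, expert_actions) tensor batches."""
    optimiser = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # discrete actions assumed
    for _ in range(epochs):
        for states, expert_actions in demo_batches:
            loss = loss_fn(policy(states), expert_actions)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return policy
```

The CONS item above shows up here directly: the policy is only trained on states the human visited, so its behaviour away from that distribution is unconstrained.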
Categories of human-in-the-loop learning methods: Intervention
- The human and the agent share control, and the human intervenes when necessary
- The human takes over control to avoid catastrophic states, and the agent learns from these interventions
- PROS: leads to safer policies
- CONS: the human must be kept in the loop for a long time; slow response time (see the shared-control sketch below)
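A sketch of one shared-control episode under stated assumptions: `env` follows the classic Gym step API, and `is_dangerous`, `agent` and `human` are placeholders.

```python
# Shared control: the human overrides the agent in dangerous states,
# and every override is logged as a training signal for the agent.
def run_episode(env, agent, human, is_dangerous):
    state, interventions, done = env.reset(), [], False
    while not done:
        action = agent.act(state)
        if is_dangerous(state, action):            # human takes over
            action = human.act(state)
            interventions.append((state, action))  # data the agent learns from
        state, reward, done, info = env.step(action)
    return interventions
```

The length of `interventions` across training is itself one of the safety metrics discussed later.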
Categories of human-in-the-loop learning methods: Evaluation
- The agent is in full control and the human provides feedback on its behaviour
- The human gives feedback based on a known objective or preference, which the agent learns from
- PROS: leads to safer policies
- CONS: the human must be kept in the loop for a long time; credit-assignment problem; subjective feedback (a reward-model sketch follows below)
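One common realisation of learning from evaluative feedback is to regress a reward model on scalar human ratings (in the spirit of TAMER-style methods) and then optimise the policy against it; this minimal PyTorch sketch uses illustrative names throughout.

```python
# Fit a reward model to scalar human ratings of (state, action) pairs.
import torch
import torch.nn as nn

def fit_feedback_model(reward_net, feedback_batches, epochs=10, lr=1e-3):
    """feedback_batches: iterable of (state_action, human_rating) tensors."""
    optimiser = torch.optim.Adam(reward_net.parameters(), lr=lr)
    for _ in range(epochs):
        for state_action, rating in feedback_batches:
            predicted = reward_net(state_action).squeeze(-1)
            loss = nn.functional.mse_loss(predicted, rating)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return reward_net
```

The credit-assignment problem mentioned above arises because a rating given now may really concern actions taken several steps earlier.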
Common metrics in HITL learning methods
- Safety: rate of catastrophes, number of interventions, deviation from thresholds
- Performance: rate of task completion, average reward
- Time: response time, training time
- Data requirement: number of queries, type of interactions
- User trust and user satisfaction: subjective measures such as Likert scales and binary feedback
A simple tracker for these quantities is sketched below.
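A hypothetical tracker that gathers several of the listed quantities in one place; the field names are illustrative, not from the presentation.

```python
# Illustrative container for common HITL evaluation metrics.
from dataclasses import dataclass, field

@dataclass
class HITLMetrics:
    rewards: list = field(default_factory=list)
    catastrophes: int = 0
    human_queries: int = 0
    interventions: int = 0
    training_steps: int = 0

    @property
    def average_reward(self) -> float:
        return sum(self.rewards) / max(len(self.rewards), 1)

    @property
    def catastrophe_rate(self) -> float:
        return self.catastrophes / max(self.training_steps, 1)
```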
Common metrics in HITL learning methods: Safety
- Learning from intervention is used: the human intervenes to avoid undesirable events or catastrophes
- The policy is constrained to safer regions
- Evaluated based on the number of occurrences of catastrophes
(Trial without Error: Towards Safe Reinforcement Learning via Human Intervention, William Saunders et al.)
Common metrics in HITL learning methods: Performance
- Learning from demonstration + meta-learning is used
- Networks are trained that are not specific to one task and can adapt to new tasks
- Evaluated based on the average rate of success/task completion
(One-Shot Imitation Learning, Yan Duan et al.)
Common metrics in HITL learning methods: Time
- DAgger approaches: learning from demonstration + intervention
- Start by imitating the expert policy and collecting data
- Train the next policy on the aggregate of all collected datasets
- Hand control over to the expert when necessary, based on rulesets
- Evaluated based on the number of training iterations needed to reach a significant level of performance (a DAgger sketch follows below)
(A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Stéphane Ross et al.; DropoutDAgger: A Bayesian Approach to Safe Imitation Learning, Kunal Menda et al.)
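A minimal DAgger sketch corresponding to the loop described above; `rollout`, `train_supervised` and `expert` are placeholders, and real DAgger variants add the rulesets or safety gates mentioned above for when the expert takes over.

```python
# DAgger: iteratively aggregate expert-relabelled data and retrain.
def dagger(env, expert, train_supervised, rollout, iterations=10):
    dataset, policy = [], expert  # iteration 0 simply imitates the expert
    for _ in range(iterations):
        states = rollout(env, policy)                    # run current policy
        dataset += [(s, expert.act(s)) for s in states]  # expert relabels states
        policy = train_supervised(dataset)               # fit on the aggregate
    return policy
```

The time metric is then simply the smallest number of iterations at which the learned policy reaches the target performance.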
Common metrics in HITL learning methods: Data requirement
- Learning from demonstration + intervention is used
- Both the agent and the human are considered to have blind spots
- The actor (human vs. agent) is chosen based on the blindspot activation level (see the sketch below)
- Evaluated based on the number of human queries needed vs. the average reward
(Overcoming Blind Spots in the Real World: Leveraging Complementary Abilities for Joint Execution, Ramya Ramakrishnan et al.)
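A sketch of blindspot-based actor selection under stated assumptions: the two blindspot estimators and both actors are placeholders, and the query counter feeds the data-requirement metric.

```python
# Delegate each action to whichever actor has the lower estimated
# blindspot activation in the current state; count human queries.
def choose_action(state, agent, human, agent_blindspot, human_blindspot, log):
    if agent_blindspot(state) <= human_blindspot(state):
        return agent.act(state)                       # agent is competent here
    log["human_queries"] = log.get("human_queries", 0) + 1
    return human.act(state)                           # fall back on the human
```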
Common metrics in HITL learning methods: User trust and user satisfaction
User trust:
- Trust ~ the extent to which the human agrees with the AI
- Assessed via questionnaires about use of the system, biological data, number of interventions, "humanness", etc.
- Users can be quick to distrust an AI that produces easily identifiable incorrect results
- Interpretability improves trust
User satisfaction:
- Satisfaction w.r.t. interaction, performance and design
- Can be subjective; measured via questionnaires and evaluative feedback
- Necessary for successful adoption and widespread use in society
Limitations of prior approaches
- Assumptions that humans (even experts) are always correct
- Interactions between the human and the AI may not always be flawless: cognitive overload, slow responses, incorrect responses, lack of attention; errors in perception, planning and execution
- Uncertainty of DL components is not considered
- Presence of errors in the data, with no existing measure for data quality
- Data quality may be defined in terms of completeness, accuracy and efficiency
Proposed approach
Hypothesis: bad demonstration samples affect safety, while full self-exploration by the system is also infeasible.
Premise: it is infeasible to start training afresh due to the large training time and unsafe exploration.
[Architecture figure: historical data and demonstrations populate a data store; a feature extractor feeds an unsupervised anomaly detector, whose candidate samples a human-in-the-loop classifies into correct and erroneous samples; correct samples go to the policy learning module, while erroneous samples combine with an environment-dynamics model to train an anomaly predictor; a non-exploratory training phase is followed by an exploratory phase in which the system interacts with the environment.]
Proposed approach: Non-exploratory training phase
- Data from the data store is used to train the anomaly predictor and the policy learning module
- A human-in-the-loop can classify outliers as correct or erroneous
- Correct samples can be used directly for policy training
- Erroneous samples, combined with a model of the environment dynamics, can be used to predict future anomalies/faults (a triage sketch follows below)
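A minimal sketch of this triage step, assuming an IsolationForest as the unsupervised anomaly detector and a human labelling function; both are illustrative assumptions, not the presentation's specific implementation.

```python
# Flag outliers in the stored samples, let a human sort the flagged
# ones, and split the data for the two downstream modules.
from sklearn.ensemble import IsolationForest

def triage_samples(features, samples, ask_human_is_correct):
    detector = IsolationForest().fit(features)
    flags = detector.predict(features)        # +1 = inlier, -1 = outlier
    correct, erroneous = [], []
    for sample, flag in zip(samples, flags):
        if flag == 1 or ask_human_is_correct(sample):
            correct.append(sample)            # -> policy learning module
        else:
            erroneous.append(sample)          # -> anomaly predictor training
    return correct, erroneous
```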
Proposed approach: Exploratory training phase
- The system interacts with the environment but chooses actions based on the predicted anomaly score
- Facilitates safe exploration by taking previous human feedback into consideration (see the sketch below)
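A sketch of anomaly-aware action selection in this phase, under the assumption of a small discrete candidate-action set; the policy and anomaly-predictor interfaces are illustrative.

```python
# Veto candidate actions whose predicted anomaly score is too high,
# then pick the policy's preferred action among the survivors.
import numpy as np

def safe_act(state, policy_score, anomaly_score, actions, threshold=0.5):
    scores = np.array([anomaly_score(state, a) for a in actions])
    prefs = np.array([policy_score(state, a) for a in actions])
    prefs[scores > threshold] = -np.inf    # mask risky actions
    return actions[int(np.argmax(prefs))]  # best remaining action

# Note: if every action is masked, argmax falls back to index 0; a real
# implementation would defer to the human instead.
```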
Future work
- Evaluation of suitable datasets used in autonomous-systems policy/control development
- Development of an experimental procedure for the design and testing of the proposed model
- Implementation of the human-in-the-loop sample classifier and the anomaly predictor
- Evaluation of the system on pre-decided metrics in the target domain
Conclusions
- Identified the necessity of human-in-the-loop learning and discussed its categories
- Explored the various evaluation metrics for human-in-the-loop approaches presented in the literature
- Defined the requirements for "quality data" with characteristics such as accuracy, completeness and efficiency
- Proposed a method to measure and improve data quality in human-in-the-loop approaches