Multi-Objective Deep Reinforcement Learning with Priority-based Socially Aware Mobile Robot Navigation Frameworks
About This Presentation
Main contributions:
Introducing a multi-objective framework designed to enhance existing single-objective navigation models through the following three contributions:
(1) The development of a multi-objective robot navigation framework.
(2) A reward prediction model.
(3) Experiments that showcase the effectiveness of our framework within a crowded simulation environment.
Slide Content
Multi-Objective Deep Reinforcement Learning with Priority-based Socially Aware Mobile Robot Navigation Frameworks. Hanoi, Nov 2023. Institute of Information Technology, Caugiay, Hanoi, Vietnam; Le Quy Don Technical University, Caugiay, Hanoi, Vietnam.
Outline: 01 Introduction and Related Work; 02 Methodology; 03 Experiment; 04 Conclusion and Future Work
Introduction and Related work 1
Socially aware robot navigation problem. Social environment: dynamic, dense with moving obstacles (humans or other objects), and non-communicating. Socially aware robot navigation asks how to control the robot so that it reaches its goal without colliding with obstacles, in a time-efficient manner, and in a socially compliant way.
Deep Reinforcement Learning (DRL) approaches to socially aware robot navigation: CADRL - Collision Avoidance in Pedestrian-Rich Environments With Deep Reinforcement Learning; SARL - Crowd-aware robot navigation with attention-based deep reinforcement learning.
Socially aware robot navigation is a multi-objective decision-making problem. The robot must not only reach its destination but also adhere to social rules. Each of these social rules can be considered an objective within the training process, and their importance may vary depending on the context. Some recent research has attempted to extend robot navigation into a multi-objective problem; however, it has focused on relatively simple navigation spaces (grid worlds, no pedestrians, ...).
Main contributions of our work. Introducing a multi-objective framework designed to enhance existing single-objective navigation models through the following three contributions: (1) the development of a multi-objective robot navigation framework; (2) a reward prediction model; (3) experiments that showcase the effectiveness of our framework within a crowded simulation environment.
Methodology 2
Typical Multi-Objective Reinforcement Learning. Similar to the single-objective RL framework, it relies on a Markov Decision Process (MDP) (Eq. 1). The only difference: the environment issues a vector reward instead of a scalar reward. A utility function u maps the multi-objective reward to a scalar value in alignment with user-defined preferences.
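To make the scalarization step concrete, here is a minimal sketch (our illustration, not from the slides) of a utility function u mapping a vector reward to a scalar; the linear weighting and the objective names are assumptions standing in for user-defined preferences.

    import numpy as np

    def linear_utility(reward_vec: np.ndarray, weights: np.ndarray) -> float:
        # Map a multi-objective reward vector to a scalar via a weighted sum.
        return float(np.dot(weights, reward_vec))

    # Hypothetical example: objectives = (goal progress, comfort, time penalty).
    r = np.array([0.8, -0.2, -0.05])
    w = np.array([1.0, 0.5, 0.1])   # assumed user preference weights
    print(linear_utility(r, w))     # scalar reward seen by the agent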
User Preferences Modeling. We propose an Objective Order o = (o_1, ..., o_n), which defines the priority of each objective: the first objective in o has the highest priority, and so on. One reward vector is preferable to another under order o (Eq. 2), i.e., it is better on the highest-priority objective on which the two differ.
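As an illustration of what a priority-based preference can look like, the sketch below assumes the standard lexicographic reading of the Objective Order (the first objective in o dominates all later ones); the function and its arguments are hypothetical, not the paper's notation.

    from typing import Sequence

    def prefer_lexicographic(r_a: Sequence[float], r_b: Sequence[float],
                             order: Sequence[int], eps: float = 1e-6) -> bool:
        # True if reward vector r_a is preferred to r_b under the objective
        # order `order`; objectives earlier in `order` outrank later ones.
        for idx in order:
            if abs(r_a[idx] - r_b[idx]) > eps:
                return r_a[idx] > r_b[idx]
        return False  # equal on every objective: no strict preference

    # Example: objective 2 (say, safety) outranks objective 0 (say, speed).
    print(prefer_lexicographic([0.5, 0.0, 0.2], [1.0, 0.0, -0.1], order=[2, 0, 1]))  # True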
Proposed Reward Predictor with Objective Order. Given an Objective Order o and a utility function u, the preference ≻_o is defined in terms of state rewards and trajectory rewards: for every pair of state reward vectors (Eq. 3) and for every pair of trajectory reward vectors (Eq. 4). As the utility function is unknown in most cases, we propose a Reward Predictor (denoted f) to approximate u. The Reward Predictor predicts scalar rewards from the state space S instead of the reward vector space R.
Proposed Reward Predictor with Objective Order (cont.). Given an Objective Order o and a Reward Predictor f, the preference ≻_o is expressed in terms of state rewards and trajectory rewards: for every pair of state reward vectors (Eq. 5) and for every pair of trajectory reward vectors (Eq. 6). In fitting f, we deploy a state loss and a trajectory loss to enforce these constraints; both losses utilize the cross-entropy loss.
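The slides state that both losses use cross-entropy but do not give their exact form; the sketch below assumes the Bradley-Terry-style preference loss common in preference-based reward learning, applied here to trajectory returns, with f taken as any model mapping a batch of states to per-state scalar rewards.

    import torch
    import torch.nn.functional as F

    def trajectory_preference_loss(f, traj_a: torch.Tensor, traj_b: torch.Tensor,
                                   a_preferred: bool) -> torch.Tensor:
        # Cross-entropy loss pushing the reward predictor f to assign a higher
        # cumulative predicted reward to the preferred trajectory.
        ret_a = f(traj_a).sum()                  # predicted return of trajectory a
        ret_b = f(traj_b).sum()                  # predicted return of trajectory b
        logits = torch.stack([ret_a, ret_b]).unsqueeze(0)   # shape (1, 2)
        target = torch.tensor([0 if a_preferred else 1])    # index of preferred one
        return F.cross_entropy(logits, target)

A state-level loss can be formed the same way over pairs of individual states instead of whole trajectories.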
Proposed Reward Predictor. The Embedding Module transforms the state of agents into high-dimensional vectors, facilitating the extraction of dynamic features. The Attention Module, which models human interactions, is tasked with generating a context vector associated with each individual's observation. The Prediction Module forecasts the subsequent scalar reward based on the observed state, in conjunction with the provided context vector.
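Since the slide only names the three modules, the following PyTorch layout is an assumed minimal realization; the layer sizes, the pooling choices, and the single-layer attention are placeholders rather than the authors' architecture.

    import torch
    import torch.nn as nn

    class RewardPredictor(nn.Module):
        # Embedding -> attention over humans -> scalar reward prediction.
        def __init__(self, state_dim: int, embed_dim: int = 64):
            super().__init__()
            self.embedding = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU())
            self.attention = nn.Linear(embed_dim, 1)   # per-human attention score
            self.prediction = nn.Sequential(
                nn.Linear(2 * embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, joint_state: torch.Tensor) -> torch.Tensor:
            # joint_state: (num_humans, state_dim), one row per robot-human pair.
            e = self.embedding(joint_state)                     # dynamic features
            w = torch.softmax(self.attention(e), dim=0)         # attention weights
            context = (w * e).sum(dim=0)                        # crowd context vector
            own = e.mean(dim=0)                                 # pooled observation features
            return self.prediction(torch.cat([own, context]))   # scalar reward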
Proposed Reward Predictor with Objective Order (cont)
Our Proposed Multi-Objective Robot Navigation Framework. With a Reward Predictor f, which predicts scalar rewards from observed states while satisfying the predefined Objective Order, we can convert a multi-objective RL framework into a single-objective one.
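One way to picture this conversion is as a thin wrapper around the multi-objective environment, so that an unmodified single-objective algorithm only ever sees the scalar rewards produced by f. The gym-style step signature and the class name below are illustrative assumptions, not the paper's interface.

    class ScalarRewardWrapper:
        # Replaces the environment's vector reward with the scalar reward
        # predicted by f from the observed state.
        def __init__(self, env, reward_predictor):
            self.env = env
            self.f = reward_predictor

        def reset(self):
            return self.env.reset()

        def step(self, action):
            state, _vector_reward, done, info = self.env.step(action)
            scalar_reward = float(self.f(state))   # f: state -> scalar reward
            return state, scalar_reward, done, info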
Proposed Framework
Experiment 3
Experiment Setup. Simulation environment (adopted from SARL): invisible robot, holonomic kinematics, number of humans: 5, 10, 15, 20. Baseline: SARL [1]. SARL within our framework: SARL_f (predefined Objective Order, reward predictor f, RL framework SARL). Training episodes: 20,000. [1] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, "Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning," in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 6015-6022.
Quantitative Evaluation on 100 testing episodes. SARL_f shows a significant improvement in minimizing the discomfort experienced by humans. SARL_f exhibits better generalization than SARL when facing unforeseen situations.
Reward predictor. Over the course of the training process, we evaluate the predicted rewards of 4 fixed, randomly selected states, each representing one of 4 state types (Success, Discomfort, Collision, and Other). The reward predictor effectively assigns distinct rewards to each type of state.
Qualitative Evaluation. SARL_f intentionally chose a longer path to ensure the safety of humans.
Qualitative Evaluation (cont.). SARL_f exhibited a tendency to halt its motion and wait for humans to move before resuming its path, thereby reducing human discomfort.
Qualitative Evaluation (cont.). SARL_f successfully navigates to the goal in the 20-human setup, while SARL does not.
Conclusion and Future work 4
Conclusion and Future Work. Conclusion: Our framework leverages a reward prediction model to convert reward vectors into scalar rewards that align with user preferences, eliminating the need for hand-crafted reward functions that rely on empirical experience. It is fully compatible with existing RL frameworks. Future work: exploring the impact of different objective prioritizations in greater depth; enhancing the training process of our framework in terms of both training duration and sample efficiency.