Multi-Objective Deep Reinforcement Learning with Priority-based Socially Aware Mobile Robot Navigation Frameworks

About This Presentation

Main contributions:
Introducing a multi-objective framework designed to enhance existing single-objective navigation models, through the following three contributions:

(1) The development of a multi-objective robot navigation framework.

(2) A reward prediction model.

(3) Conducting experiments that showcase the effectiveness of our framework within a crowded simulation environment.


Slide Content

Multi-Objective Deep Reinforcement Learning with Priority-based Socially Aware Mobile Robot Navigation Frameworks. Hanoi, Nov-2023. Institute of Information Technology, Caugiay, Hanoi, Vietnam; Le Quy Don Technical University, Caugiay, Hanoi, Vietnam.

Outline: 01 Introduction and Related work, 02 Methodology, 03 Experiment, 04 Conclusion and Future work.

Introduction and Related work 1

Socially aware robot navigation problem. Social environment: dynamic, dense with moving obstacles (humans or other objects), non-communicating situation. Socially aware robot navigation is how to control the robot to reach the goal: without colliding with obstacles, in a time-efficient manner, and in compliance with social norms.

Deep Reinforcement Learning (DRL) approaches to socially aware robot navigation: CADRL - Collision Avoidance in Pedestrian-Rich Environments With Deep Reinforcement Learning; SARL - Crowd-aware robot navigation with attention-based deep reinforcement learning.

Socially aware robot navigation is a multi-objective decision-making problem. The robot must not only reach its destination but also adhere to social rules. Each of these social rules can be considered an objective within the training process, and their importance may vary depending on the context. Some recent research has attempted to extend robot navigation into a multi-objective problem; however, it has focused on relatively simple navigation spaces (grid worlds, no pedestrians, …).

Main contributions of our work. Introducing a multi-objective framework designed to enhance existing single-objective navigation models, through the following three contributions: (1) The development of a multi-objective robot navigation framework. (2) A reward prediction model. (3) Conducting experiments that showcase the effectiveness of our framework within a crowded simulation environment.

Methodology 2

Typical Multi-Objective Reinforcement Learning (MORL). Similar to the single-objective RL framework, it relies on a Markov Decision Process (MDP):

$\langle S, A, T, \mathbf{R}, \gamma \rangle$ (1)

The only difference: the environment issues a vector reward $\mathbf{R}: S \times A \to \mathbb{R}^n$ instead of a scalar reward. The utility function $u: \mathbb{R}^n \to \mathbb{R}$ maps the multi-objective reward to a scalar value in alignment with user-defined preferences.
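To make the vector reward concrete, here is a minimal sketch (plain Python, with illustrative objective names and weights that are not from the slides) of a vector reward being collapsed to a scalar by a utility function u. A linear weighting is just one possible u; the framework later replaces any hand-chosen u with the learned Reward Predictor.

```python
import numpy as np

# One environment step returns a vector reward with one entry per objective,
# e.g. (goal progress, collision penalty, social-discomfort penalty).
vector_reward = np.array([0.8, 0.0, -0.25])

def utility(r, weights=np.array([1.0, 1.0, 0.5])):
    """Example utility u: R^n -> R. The linear weighting is purely
    illustrative; in the paper u is unknown and approximated by f."""
    return float(weights @ r)

scalar_reward = utility(vector_reward)  # what a single-objective agent would observe
```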

User Preferences Modeling. We propose an Objective Order $o = (o_1, o_2, \dots, o_n)$, which defines the priority of each objective. The first objective in $o$ ($o_1$) has the highest priority, and vice versa. A reward vector $\mathbf{r}$ is preferable to $\mathbf{r}'$ following order $o$ ($\mathbf{r} \succ_o \mathbf{r}'$) if

$\exists k : \; r_{o_i} = r'_{o_i} \;\; \forall i < k \;\; \text{and} \;\; r_{o_k} > r'_{o_k}$ (2)
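The following is a minimal sketch of that priority comparison, assuming the lexicographic reading of the Objective Order given above (higher-priority objectives are compared first, ties fall through to lower priorities); the function name and tolerance are illustrative only.

```python
def prefers(r_a, r_b, order, eps=1e-6):
    """Return True if reward vector r_a is preferred to r_b under `order`,
    a list of objective indices sorted from highest to lowest priority.
    The lexicographic resolution is an assumption about eq. (2)."""
    for idx in order:
        if abs(r_a[idx] - r_b[idx]) > eps:
            return r_a[idx] > r_b[idx]
    return False  # equal on every objective: neither is strictly preferred

# Example: objective 2 (say, collision avoidance) outranks objective 0 (speed),
# so a large gain on objective 0 cannot compensate for the collision penalty.
print(prefers([0.9, 0.1, -1.0], [0.2, 0.1, 0.0], order=[2, 1, 0]))  # False
```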

Proposed Reward Predictor with Objective Order. Given an Objective Order $o$ and a utility function $u$, the preference $\succ_o$ is defined in terms of state-rewards and trajectory-rewards:

For every pair of vector state-rewards $\mathbf{r}, \mathbf{r}'$: $\mathbf{r} \succ_o \mathbf{r}' \Rightarrow u(\mathbf{r}) > u(\mathbf{r}')$ (3)

For every pair of trajectory-rewards $(\mathbf{r}_1, \dots, \mathbf{r}_T)$ and $(\mathbf{r}'_1, \dots, \mathbf{r}'_T)$: $(\mathbf{r}_1, \dots, \mathbf{r}_T) \succ_o (\mathbf{r}'_1, \dots, \mathbf{r}'_T) \Rightarrow \sum_t u(\mathbf{r}_t) > \sum_t u(\mathbf{r}'_t)$ (4)

As the utility function is unknown in most cases, we propose a Reward Predictor (denoted as $f$) to approximate $u$. The Reward Predictor predicts scalar rewards from the state space $S$ instead of from the reward vector $\mathbf{R}$.

Proposed Reward Predictor with Objective Order (cont). Given an Objective Order $o$ and a Reward Predictor $f$, the preference $\succ_o$ is defined in terms of state-rewards and trajectory-rewards:

For every pair of vector state-rewards $\mathbf{r}(s), \mathbf{r}(s')$: $\mathbf{r}(s) \succ_o \mathbf{r}(s') \Rightarrow f(s) > f(s')$ (5)

For every pair of trajectory-rewards: $(\mathbf{r}_1, \dots, \mathbf{r}_T) \succ_o (\mathbf{r}'_1, \dots, \mathbf{r}'_T) \Rightarrow \sum_t f(s_t) > \sum_t f(s'_t)$ (6)

In fitting $f$, we deploy a state loss and a trajectory loss to enforce these constraints. Both losses utilize the Cross-Entropy loss.
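The slides state that both losses use cross-entropy but do not give the exact form; the sketch below assumes a Bradley-Terry-style pairwise formulation (as commonly used in preference-based RL), where the predictor's scores for the preferred and non-preferred item act as logits.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(score_preferred, score_other):
    """Cross-entropy over a batch of pairs of predicted scalar rewards,
    each of shape (B,); for the trajectory loss, pass per-trajectory sums.
    The target class is always index 0, the item ranked higher by the
    Objective Order."""
    logits = torch.stack([score_preferred, score_other], dim=-1)  # (B, 2)
    target = torch.zeros(logits.shape[0], dtype=torch.long)       # preferred item "wins"
    return F.cross_entropy(logits, target)

# State loss:      pairwise_preference_loss(f(s_pref), f(s_other))
# Trajectory loss: pairwise_preference_loss(f_traj_pref.sum(dim=1), f_traj_other.sum(dim=1))
```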

Proposed Reward Predictor. The Embedding Module transforms the state of agents into high-dimensional vectors, facilitating the extraction of dynamic features. The Attention Module, which models human interactions, is tasked with generating a context vector associated with each individual's observation. The Prediction Module forecasts the subsequent scalar reward based on the observed state, in conjunction with the provided context vector.
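The slide names the three modules but not their layer sizes or exact attention form; the PyTorch sketch below fills those in with assumptions (simple MLPs, SARL-style softmax attention over humans, hypothetical dimensions) purely to illustrate how the pieces fit together.

```python
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Illustrative three-module reward predictor; all sizes are assumptions."""

    def __init__(self, human_state_dim=7, robot_state_dim=6, embed_dim=64):
        super().__init__()
        # Embedding Module: lifts each human state (paired with the robot state)
        # into a high-dimensional feature vector.
        self.embedding = nn.Sequential(
            nn.Linear(human_state_dim + robot_state_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
        )
        # Attention Module: scores each human's embedding and forms a crowd
        # context vector as the attention-weighted sum.
        self.attention = nn.Linear(embed_dim, 1)
        # Prediction Module: maps robot state + context vector to a scalar reward.
        self.prediction = nn.Sequential(
            nn.Linear(robot_state_dim + embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, 1),
        )

    def forward(self, robot_state, human_states):
        # robot_state: (B, robot_state_dim); human_states: (B, N, human_state_dim)
        B, N, _ = human_states.shape
        robot_rep = robot_state.unsqueeze(1).expand(B, N, robot_state.shape[-1])
        feats = self.embedding(torch.cat([human_states, robot_rep], dim=-1))
        weights = torch.softmax(self.attention(feats), dim=1)  # attention over humans
        context = (weights * feats).sum(dim=1)                 # (B, embed_dim)
        return self.prediction(torch.cat([robot_state, context], dim=-1)).squeeze(-1)
```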

Proposed Reward Predictor with Objective Order (cont)

Our proposed Multi-Objective Robot Navigation framework. With the Reward Predictor $f: S \to \mathbb{R}$, which predicts scalar rewards from observed states while satisfying the predefined Objective Order, we can convert a multi-objective RL framework into a single-objective one.
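One way to picture that conversion is an environment wrapper that swaps the vector reward for the predictor's scalar output, so an unmodified single-objective agent such as SARL can train on it unchanged; the reset/step interface below is a generic assumption, not the authors' exact code.

```python
class ScalarizedEnv:
    """Wraps a multi-objective environment so single-objective RL code only
    ever sees the scalar reward produced by the Reward Predictor f."""

    def __init__(self, env, reward_predictor):
        self.env = env
        self.f = reward_predictor  # maps an observed state to a scalar reward

    def reset(self):
        return self.env.reset()

    def step(self, action):
        next_state, vector_reward, done, info = self.env.step(action)
        # Discard the vector reward; the learned scalar reward drives training.
        scalar_reward = float(self.f(next_state))
        return next_state, scalar_reward, done, info
```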

Proposed Framework

Experiment 3

Experiment Setup. Simulation environment (adopted from SARL): invisible robot, holonomic kinematics, number of humans: 5, 10, 15, 20. Baseline: SARL [1]. SARL within our framework (SARL_f): predefined Objective Order, reward predictor f, RL framework SARL. Training episodes: 20,000. [1] C. Chen, Y. Liu, S. Kreiss, and A. Alahi, “Crowd-robot interaction: Crowd-aware robot navigation with attention-based deep reinforcement learning,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 6015–6022.

Quantitative Evaluation on 100 testing episodes. SARL_f shows a significant improvement in minimizing the discomfort experienced by humans. SARL_f exhibits better generalization than SARL when facing unforeseen situations.

Reward Predictor evaluation. Over the course of the training process, we evaluate the predicted rewards of 4 fixed, randomly selected states, each representing one of 4 types (Success, Discomfort, Collision, and Other). The Reward Predictor has effectively assigned distinct rewards to each type of state.

Qualitative Evaluation SARL_f intentionally chose a longer path to ensure the safety of humans.

Qualitative Evaluation (cont) SARL_f exhibited a tendency to halt its motion and wait for humans to move before resuming its path, thereby reducing human discomfort.

Qualitative Evaluation (cont) SARL_f successfully navigates to the goal in the 20-human setup, while SARL does not.

Conclusion and Future work 4

Conclusion and Future work. Conclusion: Our framework leverages a reward prediction model to convert reward vectors into scalar rewards that align with user preferences, eliminating the need for hand-crafted reward functions that rely on empirical experience. It is fully compatible with existing RL frameworks. Future work: exploring more deeply the impact of different objective prioritizations; enhancing the training process of our framework in terms of both training duration and sample efficiency.
