Autonomous Navigation of an Unmanned Aerial Vehicle

AdeebaAli14, Aug 23, 2024

About This Presentation

This is a presentation on autonomous UAV navigation.


Slide Content

Monocular Visual SLAM-based Mapping and Autonomous Navigation in GPS-denied Environments. By: Adeeba Ali. Under the supervision of Prof. Rashid Ali (Dept. of Computer Engineering) and Prof. Mirza Faisal S. Baig (Dept. of Mechanical Engineering).

Outline: Introduction; Objective; Challenges and Issues; Literature Survey; Research Gap; Contributions; Implementation Details; Results; Conclusions; References.

Introduction: An unmanned aerial vehicle (UAV) is an aircraft that can fly without a human pilot aboard. UAV navigation can be seen as the process of planning how to safely and quickly reach a target location. In order to successfully complete the scheduled mission, a UAV must be fully aware of its state, including location, navigation speed, and heading direction, as well as the starting point and target location. The problem of UAV navigation includes three steps: localization and mapping, obstacle avoidance, and path planning.

Objective: To develop a real-time autonomous UAV that can efficiently navigate in GPS-denied environments with a minimal number of collisions with obstacles in the surrounding environment.

Challenges: Autonomous UAV navigation aided by monocular vision is an emerging but challenging task. Several factors, such as illumination effects and texture-less or highly textured regions, affect the proper deployment of vision-based methods. To overcome these limitations, several methodologies have been proposed, such as optical flow methods, stereo vision-based approaches, and the vanishing line technique. After reviewing the existing techniques and the challenges associated with them, we propose an ORB SLAM-based autonomous navigation strategy for UAVs in GPS-denied environments. The proposed approach exploits only the visual sensor, that is, the front camera of the UAV, for navigation, obstacle detection, and collision avoidance.

Overview of the proposed approach: estimation of depth using deep convolutional neural networks; obtaining a map of the surrounding environment using ORB SLAM; integration of ORB SLAM with the depth-estimating CNN in order to densify the sparse point cloud map obtained from ORB SLAM; extraction of key features from the environment map for UAV control; incorporation of an efficient path planning algorithm capable of avoiding obstacles in the depth map obtained through ORB SLAM and the CNN; and evaluation of the proposed approach through simulation and real-time experiments.

Literature Survey: Celik et al. (2009) presented a visual SLAM-based system for indoor navigation, where the layout is unknown and without the aid of GPS. Ranftl et al. (2016) produced a dense depth map from two consecutive frames using a segmented optical flow field. Esrafilian and Taghirad (2016) put forward a method based on oriented FAST and rotated BRIEF SLAM (ORB-SLAM). Seyed and Eugenio (2018) proposed a method that exploits feature-based ORB SLAM for sparse map generation and an Extended Kalman Filter (EKF) for fusing the obtained pose with Inertial Measurement Unit (IMU) readings.

Literature Survey: Yathirajan et al. (2020) proposed a real-time chain-based path planning approach with built-in obstacle avoidance in conjunction with ORB SLAM. Ram Prasad et al. (2018) proposed a model based on vanishing point estimation for autonomous navigation and collision avoidance in GPS-denied environments. Chakravarty et al. (2017) proposed an obstacle avoidance technique that utilizes Convolutional Neural Networks (CNNs) for depth estimation from a single image.

Literature Survey: Taha and Endrey (2021) proposed a hybrid approach for autonomous collision-free UAV navigation in 3D partially unknown dynamic environments. In their work they combine a global path planning algorithm (RRT-Connect) with a relative control law based on sliding mode control. Zhang et al. (2019) proposed a monocular vision-based method for obstacle avoidance; they used a CNN to generate depth images, which are then fed to the control algorithm that steers the quadrotor away from obstacles. Ram Prasad et al. (2019) proposed a method that uses ORB SLAM for keypoint extraction from video frames and the pinhole camera principle for depth calculation. Table 1 enumerates the methods in autonomous UAV navigation.

Table 1: Methods in autonomous UAV navigation
Type | Authors | Methods
SLAM-based / map building | Celik et al. (2009) | Indirect; sparse map-based
SLAM-based / map building | Ranftl et al. (2016) | Optical flow-based
SLAM-based / map building | Esrafilian and Taghirad (2016) | ORB SLAM; potential field
SLAM-based / map building | Seyed and Eugenio (2018) | ORB SLAM; EKF (sensor fusion)
SLAM-based / map building | Yathirajan et al. (2020) | ORB SLAM; chain-based path planning
SLAM-based / map building | Ram Prasad et al. (2019) | ORB SLAM; pinhole camera principle-based depth estimation
Mapless | Ram Prasad et al. (2018) | Vanishing point estimation
Mapless | Chakravarty et al. (2017) | CNN-based depth estimation
Mapless | Zhang et al. (2019) | CNN-based depth estimation
Hybrid | Taha and Endrey (2021) | RRT-based path planning & sliding mode control

Observations: With the ability of autonomous navigation and collision avoidance, UAVs can go to places where humans cannot and collect video data for further understanding, thereby helping mankind deal with disastrous situations. In scenarios where the GPS precision is too low or the GPS signal is not present at all, such as indoor environments, proximity sensors are utilized to perform autonomous navigation. However, many proximity sensors, such as LIDARs, have high power consumption and are heavy, and hence are not suitable for aerial vehicles. Therefore, the proposed scheme works with monocular vision-based UAVs. In this work, depth-estimating CNNs are utilized for obtaining the depth map from RGB images only, whereas earlier approaches required two or more RGB images or stereo vision to obtain a depth map of the environment.

Research gap: The existing approaches for vision-based UAV navigation do not include any efficient technique to densify the sparse depth map of the surrounding environment obtained from visual SLAM. Previous research works on monocular depth estimation leverage transfer learning to use pre-trained CNN models such as ResNet-50, AlexNet, and VGG-16 as the backbone networks of their depth estimation models; however, the pre-trained DenseNet-161 model has not yet been utilized for extracting dense depth features from RGB images. Existing studies also lack the use of optimization algorithms in the proposed solutions to the monocular depth estimation problem for further minimizing the depth regression error.

Contributions: Deep learning-based integration of the RGB image and the sparse depth map from SLAM to get a dense 3-D map of the surroundings. High-quality depth estimation results compared to the existing state of the art after using pre-trained DenseNet-161 as the depth feature extractor in the depth estimation model. Further reduction of the regression error on depth predictions by ensembling four pre-trained CNN models and combining their predictions via three approaches: a simple average, and weighted combinations whose weights are optimized through a Genetic Algorithm and through Particle Swarm Optimization.

Publications (Accepted):
Dense Monocular Depth Estimation with Densely Connected Convolutional Networks. In: International Joint Conference on Advances in Computational Intelligence (IJCACI 2022), Springer.
An Overview of Vision-based Methods for Autonomous UAV Navigation. In: Journal of Emerging Technologies and Innovative Research (JETIR), vol. 9, issue 11, 2022.
Design of an AI-powered Flying Cobot for Providing Assistance in Industrial Applications under the 5.0 Framework (abstract only). In: Intelligent Robots and Cobots (Industry 5.0 Transformation Applications), Wiley-Scrivener.

Publications (Submitted):
Improved Quality Single Image Based Depth Prediction Using Ensemble of ConvNets. In: International Journal of Information Technology Management, University of Tehran.
Ensemble Learning Approach for Good Quality Monocular Depth Estimation. In: 6th International Conference on Information Systems and Computer Networks (ISCON), IEEE.

Implementation details: The implementation of the proposed approach includes the following tasks: estimation of depth using deep convolutional networks and testing of the developed models on the NYU Depth v2 dataset; creation of the sparse point cloud map of the surrounding environment using ORB SLAM; densification of the sparse map by fusing the RGB and sparse depth images; selection of key features from the depth image that can help in UAV navigation; implementation of the A-star (Hart et al., 1968) path planning algorithm; development of a virtual indoor environment using AirSim; simulation of UAV navigation in the virtual AirSim environment using Unreal Engine; and real-time evaluation of the proposed approach on a DJI Tello drone.

Figure 1: Proposed design framework. The RGB video feed from the UAV is processed by ORB SLAM (localization and mapping) and a deep learning model (scaling and depth estimation); the pose, sparse map, and scaled depth map feed key-point identification and an occupancy grid map (path planning and control), which returns control feedback to the UAV.

Depth Estimation using CNN: The objective of estimating depth, i.e., distance relative to the camera, is to obtain a geometrical representation of the scene and to recover the appearance and 3-dimensional shape of the objects present in the RGB image. Estimating the depth of objects in the surrounding environment is one of the fundamental requirements of many emerging technologies. In this work, the depth estimation problem is solved by training CNNs with an encoder-decoder structure on the NYU Depth v2 dataset, where the encoder extracts features from the RGB image and the decoder, made up of up-sampling blocks, generates the depth image by increasing the resolution of the encoded feature stack. The pre-trained CNN models form the backbone (encoder) networks of the depth estimation model and are responsible for encoding the depth features present in RGB images.

Depth Estimation using CNN: In this work we compare five depth estimation models that differ only in their encoder, as a different pre-trained CNN is used for each. The five pre-trained CNN encoders used are ResNet-50, DenseNet-161, DenseNet-169, DenseNet-201, and MobileNet V2. In order to further increase performance, an ensemble of the above-mentioned models is leveraged for depth prediction. The ensemble framework combines the depth predictions in three different ways: computing a simple average, and optimizing the weights given to the different models using a genetic algorithm or particle swarm optimization. Three kinds of loss functions are used in training: REL, MSE, and berHu. The evaluation metrics used for testing the performance of the models are RMSE, REL, δ1, δ2, and δ3.
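
As an illustration, a minimal sketch of such an encoder-decoder model is given below, assuming PyTorch with a recent torchvision; the decoder design (bilinear up-sampling blocks) and layer sizes are simplifications for illustration, not the exact architecture used in this work.

import torch
import torch.nn as nn
from torchvision import models

class DepthEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: ImageNet pre-trained DenseNet-161 feature extractor
        # (2208 channels at 1/32 of the input resolution).
        self.encoder = models.densenet161(weights=models.DenseNet161_Weights.DEFAULT).features

        # Decoder: up-sampling blocks that progressively restore spatial resolution.
        def up_block(cin, cout):
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        self.decoder = nn.Sequential(
            up_block(2208, 512),
            up_block(512, 256),
            up_block(256, 128),
            up_block(128, 64),
            nn.Conv2d(64, 1, kernel_size=3, padding=1),  # single-channel depth map
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

# Example: a 480x640 RGB batch yields a one-channel depth map at half resolution.
if __name__ == "__main__":
    model = DepthEstimator().eval()
    with torch.no_grad():
        depth = model(torch.randn(1, 3, 480, 640))
    print(depth.shape)  # torch.Size([1, 1, 240, 320])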

Figure 2: The encoder-decoder architecture used for estimating a depth image from an RGB image.

Dataset Used: For training the CNN, the NYU Depth v2 (Silberman et al., 2012) dataset is used. This dataset contains 48,521 indoor images for training and 654 for validation. The images are all of indoor scenes and were recorded with a Kinect camera at a resolution of 640 x 480 pixels. The dataset includes different types of rooms, such as basements, bedrooms, offices, and dining rooms. It also includes labeled depth images, which allow training with semantic information. The Kinect uses a structured light sensor to estimate depth.

Figure 3: An RGB image from the NYU Depth v2 dataset and the corresponding ground-truth depth image, taken from an indoor scene.

Loss functions: Loss functions are used to determine the error (the "loss") between the predicted value and the given target value. They evaluate how well the given algorithm models the data, with the goal of minimizing the loss. They can be categorized into two groups: classification loss functions and regression loss functions. In regression, loss functions are used to find a line of best fit by minimizing the overall loss of all points with respect to the prediction from the line. Commonly used loss functions for regression problems are MSE, MAE, and the reversed Huber loss (berHu).

Loss functions for Regression Networks
1) Mean absolute error (L1): the mean of the absolute differences between the pixel values of the ground truth and the estimated depth image: L1 = (1/N) Σ_i |y_i − y_i'|, where y_i and y_i' are respectively the ground truth and the prediction.
2) Mean squared error (L2): the mean of the squared differences between the ground truth and the prediction: L2 = (1/N) Σ_i (y_i − y_i')².

Loss functions for Regression Networks
3) Reversed Huber loss (berHu): B(e) = |e| if |e| ≤ c, and (e² + c²) / (2c) otherwise, where e = |y_i − y_i'| and c is a batch-dependent parameter, computed as 20% of the maximum absolute error over all pixels in a batch.
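
A minimal PyTorch sketch of the berHu loss as defined above (the small clamp on c is an added safeguard against division by zero, not part of the original definition):

import torch

def berhu_loss(pred, target):
    # Per-pixel absolute error e = |y - y'|
    err = torch.abs(target - pred)
    # c = 20% of the maximum absolute error over the batch
    c = (0.2 * err.max()).detach().clamp(min=1e-6)
    l1 = err                                # used where |e| <= c
    l2 = (err ** 2 + c ** 2) / (2 * c)      # used where |e| >  c
    return torch.where(err <= c, l1, l2).mean()  # averaged over all pixels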

Evaluation metrics
Root mean squared error (RMSE): the square root of the mean of the squared differences between the ground truth and the prediction: RMSE = sqrt( (1/N) Σ_i (y_i − y_i')² ).
Mean absolute relative error (REL): the mean ratio of the absolute error to the ground-truth value: REL = (1/N) Σ_i |y_i − y_i'| / y_i.
δi: the percentage of predicted pixels whose relative error is within a threshold: δi = card({ y_j' : max(y_j'/y_j, y_j/y_j') < 1.25^i }) / card({ y_j }), where card is the cardinality of a set; a higher δi indicates a better prediction.
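
For illustration, a short NumPy sketch of these evaluation metrics, assuming positive-valued ground-truth and predicted depth maps:

import numpy as np

def depth_metrics(pred, gt):
    # Flatten so the metrics are computed over all valid pixels.
    pred, gt = pred.flatten(), gt.flatten()
    rmse = np.sqrt(np.mean((gt - pred) ** 2))          # root mean squared error
    rel = np.mean(np.abs(gt - pred) / gt)              # mean absolute relative error
    ratio = np.maximum(pred / gt, gt / pred)           # max(y'/y, y/y') per pixel
    deltas = [np.mean(ratio < 1.25 ** i) for i in (1, 2, 3)]  # delta1, delta2, delta3
    return rmse, rel, deltas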

Figure 4: Screenshot of the code written for encoder-decoder CNN architecture

Ensemble of CNNs: Ensembling is the process of combining different models to obtain a robust and reliable model for making predictions. A convolutional neural network is an efficient deep learning model applied in various areas, and an ensemble of such models is more robust and reliable. Furthermore, this technique reduces the variance of predictions and the generalization error. Ensemble models provide more accurate predictions than the individual deep learning models. The performance of an ensemble of deep learning models depends on how the predictions from individual models are combined. In this work, the ensemble prediction of four different models is used: ResNet-50, DenseNet-161, DenseNet-169, and DenseNet-201.

Ensemble 1: Combining models by computing the average of their predictions
Figure 5: Ensemble model whose output is the average of the depth predictions obtained from depth estimation models trained with four different pre-trained encoder networks.
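
A one-line sketch of this averaging strategy, assuming the four model outputs are stacked in a NumPy array of shape (4, H, W):

import numpy as np

def average_ensemble(preds):
    # Fused depth map = simple average of the four per-model depth maps.
    return np.mean(preds, axis=0)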

Ensemble 2: Assignment of weights using GA. In this approach, the final depth prediction of the ensemble is obtained by calculating the weighted sum of the depth predictions from four different depth estimation models with encoder networks ResNet-50, DenseNet-161, DenseNet-169, and DenseNet-201. An evolutionary optimization algorithm (the Genetic Algorithm) is used to find optimized weights in the range [-0.1, 0.1] for the individual depth estimation models. The fitness (objective) function that the algorithm has to minimize is defined as Func(λ_i) = Σ_{i=1..4} (λ_i · pred_i) − target, where the λ_i are the weights assigned to the predictions from the four depth estimation models based on the ResNet-50, DenseNet-161, DenseNet-169, and DenseNet-201 encoder networks.

Initial population: a population of 100 genes is generated, each gene consisting of 4 chromosomes holding the values of the λi's.
Fitness function: the fitness value is computed for each gene in the population.
Selection: the two parents with the minimum fitness value are selected for generating the next generation.
Crossover: a crossover point is randomly selected, up to which the exchange of chromosomes between the selected genes takes place.
Mutation: the bits in the selected genes are flipped with a small probability.
Termination: the loop repeats until the terminating condition is met.
Figure 6: Workflow diagram of the Genetic Algorithm-based optimization.
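
A minimal sketch of this optimization loop (not the authors' code) is given below; the use of mean absolute error as the fitness value, the mutation scale, and the generation count are assumptions. `preds` holds the four model outputs stacked as (4, H, W) and `target` is the ground-truth depth map.

import numpy as np

def fitness(weights, preds, target):
    fused = np.tensordot(weights, preds, axes=1)   # weighted sum of the 4 depth maps
    return np.mean(np.abs(fused - target))         # depth regression error to minimize

def genetic_algorithm(preds, target, pop_size=100, generations=50,
                      bounds=(-0.1, 0.1), mutation_rate=0.1):
    rng = np.random.default_rng(0)
    pop = rng.uniform(*bounds, size=(pop_size, 4))  # each gene holds 4 weights (lambda_i)
    for _ in range(generations):
        scores = np.array([fitness(g, preds, target) for g in pop])
        parents = pop[np.argsort(scores)[:2]]       # select the two fittest genes
        children = []
        for _ in range(pop_size):
            cut = rng.integers(1, 4)                # single-point crossover
            child = np.concatenate([parents[0][:cut], parents[1][cut:]])
            mask = rng.random(4) < mutation_rate    # mutate with small probability
            child[mask] += rng.normal(0, 0.01, mask.sum())
            children.append(np.clip(child, *bounds))
        pop = np.array(children)
    scores = np.array([fitness(g, preds, target) for g in pop])
    return pop[np.argmin(scores)]                   # best weight vector found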

Figure 7: Screenshot of the code written for Genetic Algorithm

Ensemble 3: Assignment of weights using PSO. PSO was proposed by Kennedy and Eberhart in 1995. The algorithm is based on the belief of sociobiologists that a school of fish or a flock of birds can benefit from the experience of all its members: when a bird is searching randomly for food, all the birds in the flock can share their discovery. This behaviour is simulated in the PSO algorithm in order to obtain optimized solutions for complex real-world problems. In this work, the 4-dimensional objective function that needs to be minimized for achieving the least depth regression error is defined as Func(λ_d) = Σ_{d=1..4} (λ_d · pred_d) − target, where λ_1, λ_2, λ_3, λ_4 are the weights assigned to the predictions from the four depth estimation models of the ensemble, and a particle refers to the 4-dimensional vector of these λ's.

Ensemble 3: Assignment of weights using PSO. A position vector and a velocity vector are associated with each 4-dimensional particle of the initially generated population. After each iteration, the position and velocity of each particle are updated until the termination condition is satisfied. The update equations for the new position vector X_i(t+1) and velocity vector V_i(t+1) are: X_i(t+1) = X_i(t) + V_i(t+1) and V_i(t+1) = w·V_i(t) + c_1·r_1·(pbest_i − X_i(t)) + c_2·r_2·(gbest − X_i(t)), where w is the inertia weight that encourages the particle to keep moving in the same direction as in the previous generation; c_1 and c_2 are constant acceleration coefficients; r_1 and r_2 are random numbers drawn at each iteration that introduce random deviations into the particles' motion; pbest_i is the best value of the fitness function obtained by particle i; and gbest is the best value of the function explored by the entire swarm.

Initialization: the population of 100 four-dimensional particles and all hyper-parameters are initialized.
Fitness value calculation: the fitness value corresponding to each particle in the population is calculated.
Finding the personal best: the fitness of the current position x_i is compared with the fitness of the personal best p_i; if the current value is better (lower, for minimization), x_i becomes the new p_i.
Identifying the global best: the best particle of the swarm is regarded as the global best particle.
Updating positions: the velocity of each particle is calculated for the next iteration and the positions are updated accordingly.
Termination: the loop repeats until the terminating condition is satisfied.
Figure 8: Workflow diagram of Particle Swarm Optimization.
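
A corresponding PSO sketch following the update equations above; the inertia weight, acceleration coefficients, and iteration count are assumed values, and the objective can be the same weighted-ensemble error used in the GA sketch (e.g. lambda lam: fitness(lam, preds, target)):

import numpy as np

def pso(objective, dim=4, n_particles=100, iters=50,
        w=0.7, c1=1.5, c2=1.5, bounds=(-0.1, 0.1)):
    rng = np.random.default_rng(0)
    x = rng.uniform(*bounds, size=(n_particles, dim))  # positions (candidate weight vectors)
    v = np.zeros_like(x)                               # velocities
    pbest = x.copy()                                   # personal best positions
    pbest_val = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()         # global best position
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))     # random deviations per particle
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, *bounds)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val                    # keep personal bests (minimization)
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest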

Figure 9: Screenshot of the code written for PSO

Map generation using ORB SLAM: In order to make robots capable of navigating in a 3D environment, a map is required for obstacle detection and avoidance. ORB SLAM (Oriented FAST and Rotated BRIEF SLAM) generates a sparse point cloud representation of the environment. It was proposed and developed by Mur-Artal et al. (2015). Three versions of ORB SLAM have been released so far, and their source code has been made public by the authors on GitHub. For our work we have used ORB SLAM 2.

Tools used
Table 2: Specifications of the tools used
S. No. | Name | Type
1 | Robot Operating System (ROS) | Software
2 | PyTorch | Deep learning framework for Python
3 | OpenCV | Open-source software library
4 | ORB SLAM 2 | Software
5 | Python / C++ | Programming languages

UAV specifications (Figure 10: DJI Tello-Ryze)
Dimensions: 99 x 92.5 x 41 mm
Weight: 80 g
Propeller: 3"
Sensors: altitude sensor, barometer, IMU, 720p camera
Communication: 2.4 GHz WiFi
Photo: 5 MP (2592 x 1936)
FOV: 82.6°
Video: HD 720p, 30 FPS
Format: JPEG (photo); MP4 (video)
Image stabilization: Electronic Image Stabilization (EIS)
Max flight distance: 100 m
Max speed: 8 m/s
Max flight time: 13 min
Max flight height: 50 m

Path planning: Path planning is the task of finding a safe, that is, collision-free path between two pre-determined locations, the source and the destination, by optimizing certain performance objectives. The performance objectives can be energy consumption, computing time, distance, path smoothness, number of turns, etc., depending on the type of mission and operating environment. Based on the degree of information about the environment, path planning approaches are usually classified into two categories: local path planning and global path planning. In local path planning, the environment is not known, and the UAV uses sensors or other devices to acquire information about the underlying environment, whereas in global path planning all information about the environment is known in advance. In this work we assume that the map of the environment is given in advance and hence use a global path planning algorithm (A-star) for UAV navigation.

Comparison of state-of-the-art path planning algorithms
Table 3: Comparison of four classic and state-of-the-art path planning algorithms on different evaluation measures.
Algorithm | Average path deviation | Success rate (%) | Average time (s) | Average steps
A-star | 0.0 | 87 | 0.5038 | 70.31
Dijkstra | 0.0 | 87 | 1.1195 | 70.31
RRT | 61.09 | 100 | 2.2004 | 107.4
Wave-front | 4.78 | 87 | 1.12 | 70.31

Comparison of state-of-the-art path planning algorithms: The evaluation results of the different path planning algorithms reported in Table 3 are obtained by running the four algorithms on a 2-D occupancy grid map of size 20 x 20. Each algorithm is run 20 times on this map and the average values of the evaluation parameters are calculated.

A-star path planning algorithm: The A-star path planning algorithm was first proposed and described by Hart et al. (1968), and it is one of the best-known path planning algorithms. A-star is a heuristic search algorithm which aims to find a path from the start node to the goal node with the smallest cost by searching among all possible paths. Heuristic information related to the characteristics of the problem is utilized to guide the search, so it performs better than uninformed search algorithms.

A-star path planning algorithm: The A-star algorithm is a best-first algorithm because the cost of each cell in the configuration space is computed as f(n) = g(n) + h(n), where g(n) is the cost of the path from the start node to the current node n, and h(n) is the heuristic function of the A-star algorithm, an estimate of the cost of the cheapest path from node n to the goal node. f(n) is the evaluation function of node n, and the node with the minimum value of f(n) is chosen as the next node in the sequence.
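
For illustration, a minimal A-star sketch on a 2-D occupancy grid (0 = free, 1 = obstacle); the 4-connected neighbourhood and Manhattan-distance heuristic are assumptions, not necessarily the exact configuration used in this work.

import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])   # heuristic h(n)
    open_set = [(h(start), 0, start, None)]                   # (f, g, node, parent)
    came_from, g_cost = {}, {start: 0}
    while open_set:
        f, g, node, parent = heapq.heappop(open_set)
        if node in came_from:              # already expanded with a better f
            continue
        came_from[node] = parent
        if node == goal:                   # reconstruct the path back to the start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # 4-connected moves
            nb = (node[0] + dr, node[1] + dc)
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols and grid[nb[0]][nb[1]] == 0:
                ng = g + 1                                     # unit move cost g(n)
                if ng < g_cost.get(nb, float("inf")):
                    g_cost[nb] = ng
                    heapq.heappush(open_set, (ng + h(nb), ng, nb, node))
    return None                            # no collision-free path exists

For example, astar(grid, (0, 0), (19, 19)) on a 20 x 20 grid returns the list of cells from start to goal, or None if the goal is unreachable.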

Comparison of evaluation results of individual DE models
Table 4: Results obtained after validation of the depth estimation models trained with the MAE loss function on the NYU Depth v2 dataset.
Metric | DenseNet-161 | DenseNet-169 | DenseNet-201 | ResNet-50 | MobileNet V2
RMSE | 0.452 | 0.464 | 0.462 | 0.481 | 0.590
δ1 | 0.863 | 0.858 | 0.857 | 0.847 | 0.758
δ2 | 0.970 | 0.969 | 0.969 | 0.965 | 0.939
δ3 | 0.993 | 0.992 | 0.993 | 0.990 | 0.984
REL | 0.116 | 0.119 | 0.119 | 0.124 | 0.168

Figure 11: From top to bottom: RGB images (first row); ground truth depth images (second row); depth images estimated by ResNet-50 (third row); depth images estimated by DenseNet-161 (fourth row); depth images estimated by DenseNet-169 (fifth row); depth images estimated by DenseNet-201 (sixth row); depth images estimated by MobileNet-V2 (seventh row).

Comparison of results obtained using different ensemble strategies
Table 5: Results obtained with different ensemble learning approaches.
Metric | Average of predictions | GA-based optimization | PSO-based optimization
RMSE | 0.434 | 0.422 | 0.424
δ1 | 0.874 | 0.869 | 0.867
δ2 | 0.974 | 0.974 | 0.971
δ3 | 0.994 | 0.994 | 0.993
REL | 0.111 | 0.116 | 0.119

Figure 12: From top to bottom: RGB images (first row); ground truth depth images (second row); depth images estimated by Average-Ensemble (third row); depth images obtained after GA based optimization (fourth row); depth images obtained after PSO (fifth row)

Comparison with the state of the art
Table 6: Comparison of our optimized results with the state-of-the-art.
Metric | Average of predictions | GA-based optimization | PSO-based optimization | Huang (2020) | Alhashim (2019) | Karaman (2018) | Laina (2016) | Eigen (2015)
RMSE | 0.434 | 0.422 | 0.424 | 0.459 | 0.465 | 0.514 | 0.573 | 0.641
δ1 | 0.874 | 0.869 | 0.867 | – | – | 0.810 | 0.811 | 0.769
δ2 | 0.974 | 0.974 | 0.971 | – | – | 0.959 | 0.953 | 0.950
δ3 | 0.994 | 0.994 | 0.993 | – | – | 0.989 | 0.988 | 0.988
REL | 0.111 | 0.116 | 0.119 | – | – | 0.143 | 0.127 | 0.158

Results obtained after using ORB SLAM 2: 1) TUM dataset. Figure 13: (a) a gray image from a TUM dataset sequence; (b) the captured image with the point cloud; (c) the sparse point cloud map.

Results obtained after using ORB SLAM 2: 2) KITTI dataset. Figure 14: (a) a gray image from a KITTI dataset sequence; (b) the captured image with the point cloud; (c) the sparse point cloud map.

Path planning results. Figure 15: Examples of paths identified by the A-star path planning algorithm in (a) a simple environment and (b) a complex environment, represented by 2-D occupancy grid maps, where the red arrow is the starting point and the yellow star is the target location.

UAV navigation in a simulated indoor environment: Unreal Engine (UE) is a 3D computer graphics game engine developed by Epic Games; it has been used in a variety of game genres, in the television industry, for robot simulations, etc. AirSim is an open-source plugin for Unreal Engine, developed by Microsoft, that allows realistic simulation of cars and drones. The great advantage of AirSim is that it has been developed with the aim of simulating drones, and more specifically of controlling them via artificial intelligence systems. With the high level of realism of Unreal Engine, it is possible to obtain simulations close to the real-world behaviour of a drone.

UAV navigation in a simulated indoor environment: The simulated indoor environment that we created using the AirSim plugin is composed of 5 corridors, 4 turns, and 2 crossroads. The most important step of the navigation algorithm is the detection of key points such as turns, crossroads, and staircases based on the images captured by the drone. Such key points can be identified through the depth images estimated by the deep learning model. Thus, depth estimation is used essentially to detect key points in the environment and other obstacles based on the relative depth of each zone of pixels.
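
As a purely hypothetical illustration of this idea, the sketch below splits an estimated depth image into left, centre, and right zones and compares their median depth against a threshold; the zone layout, threshold value, and decision rule are assumptions, not the logic used in this work.

import numpy as np

def classify_view(depth, far_threshold=3.0):
    # depth: estimated depth image in metres, shape (H, W)
    w = depth.shape[1]
    left = np.median(depth[:, : w // 3])
    centre = np.median(depth[:, w // 3: 2 * w // 3])
    right = np.median(depth[:, 2 * w // 3:])
    side_open = [d > far_threshold for d in (left, right)]
    if centre > far_threshold and not any(side_open):
        return "go straight"            # not a key point
    if centre > far_threshold and all(side_open):
        return "key point: crossroad"   # open in several directions
    if centre <= far_threshold and any(side_open):
        return "key point: turn"        # blocked ahead, side corridor open
    return "obstacle ahead"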

Key points in the simulated environment. Figure 16: (a) not a key point (go straight); (b) key point (crossroad); (c) key point (turn).

Real-Time Experiments with DJI Tello: The DJI Tello is made to navigate the 'L'-shaped corridor of the Main Building, Zakir Husain College of Engineering and Technology, Aligarh Muslim University. In this corridor, the UAV flies 16 m forward and then 5 m to the right after taking a clockwise turn.
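
A hedged sketch of this corridor flight using the third-party djitellopy library (the presentation does not name the control software actually used); distances are sent in centimetres and split into segments within the library's allowed range:

from djitellopy import Tello
import time

tello = Tello()
tello.connect()
tello.takeoff()
for _ in range(4):              # ~16 m forward, flown in 4 m segments
    tello.move_forward(400)
    time.sleep(1)
tello.rotate_clockwise(90)      # clockwise turn at the corner of the 'L'
for _ in range(2):              # ~5 m along the second leg
    tello.move_forward(250)
    time.sleep(1)
tello.land()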

Figure 17: UAV actions during navigation in the indoor corridor environment: (a) going forward; (b) rotating 90° clockwise; (c) right turn.

Conclusion: The problem of UAV navigation is divided into the tasks of localization and mapping, obstacle avoidance, and path planning. In order to obtain a map of the surrounding environment and to localize the UAV in it, ORB SLAM is used to generate a point cloud map of the surroundings. However, the map obtained through ORB SLAM is sparse, which may result in more collisions; hence, to solve this problem, a depth-estimating CNN is integrated with the ORB SLAM 2 software. The deep CNN model used for depth estimation from RGB images consists of an encoder-decoder architecture, where the encoder is responsible for feature extraction and stacking, whereas the decoder generates the depth image from the encoded features. The encoder is built from a deep learning model such as DenseNet-161, DenseNet-169, DenseNet-201, ResNet-50, or MobileNet V2, and the decoder is made up of upsampling blocks and transposed convolutional layers.

Conclusion: To further improve the depth estimation results, an ensemble learning approach is used, where the depth estimates of individual depth estimation models are combined and the weights assigned to them are optimized using the Genetic Algorithm and Particle Swarm Optimization. The ORB SLAM 2 software is tested on two datasets. The software used for simulating autonomous UAV navigation is Unreal Engine 5.2, in which the indoor corridor environment is created using the AirSim plugin. The A-star path planning algorithm is used to enable the UAV to navigate from the starting location to the goal position with minimum collisions and time. In order to evaluate the proposed approach, real-time experiments are performed with a DJI Tello Ryze drone.

References
Celik, K., Chung, S. J., Clausman, M., & Somani, A. K. (2009). Monocular Vision SLAM for Indoor Aerial Vehicles. In IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, USA, October 10–15.
Chakravarty, P., Kelchtermans, K., Roussel, T., Wellens, S., Tuytelaars, T., & Van Eycken, L. (2017). CNN-based Single Image Obstacle Avoidance on a Quadrotor. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2298–2304.
Esrafilian, O., & Taghirad, H. D. (2016). Autonomous Flight and Obstacle Avoidance of a Quadrotor by Monocular SLAM. In 4th IEEE International Conference on Robotics and Mechatronics, Tehran, Iran, October 26–28.
Ma, F., & Karaman, S. (2018). Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image. In 2018 IEEE International Conference on Robotics and Automation (ICRA).
Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Transactions on Robotics, 31(5), 1147–1163. https://doi.org/10.1109/tro.2015.2463671

Padhy, R. P., Xia, F., Choudhury, S. K., Sa, P. K., & Bakshi, S. (2018). Monocular Vision Aided Autonomous UAV Navigation in Indoor Corridor Environments. IEEE Transactions on Sustainable Computing.
Padhy, R. P., Choudhury, S. K., Sa, P. K., & Bakshi, S. (2019). Obstacle Avoidance for Unmanned Aerial Vehicles. IEEE Consumer Electronics Magazine.
Ranftl, R., Vineet, V., Chen, Q., & Koltun, V. (2016). Dense Monocular Depth Estimation in Complex Dynamic Scenes. In IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, June 27–30.
Haddadi, S. J., & Castelan, E. B. (2018). Visual-Inertial Fusion for Indoor Autonomous Navigation of a Quadrotor Using ORB-SLAM. In Latin American Robotic Symposium.
Silberman, N., Hoiem, D., et al. (2012). Indoor Segmentation and Support Inference from RGBD Images. In Computer Vision – ECCV 2012, pp. 746–760.

Elmokadem, T., & Savkin, A. V. (2021). A Hybrid Approach for Autonomous Collision-Free UAV Navigation in 3D Partially Unknown Dynamic Environments. Drones, 5, 57.
Yathirajam, B., Vaitheeswaran, S. M., & Ananda, C. M. (2020). Obstacle Avoidance for Unmanned Air Vehicles Using Monocular-SLAM with Chain-Based Path Planning in GPS Denied Environments. Journal of Aerospace System Engineering, 14(2), 1–11.
Zhang, Z., Xiong, M., & Xiong, H. (2019). Monocular Depth Estimation for UAV Obstacle Avoidance. In IEEE 4th International Conference on Cloud Computing and Internet of Things (CCIOT).
Eigen, D., & Fergus, R. (2015). Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658.
Laina, I., Rupprecht, C., et al. (2016). Deeper Depth Prediction with Fully Convolutional Residual Networks. In 2016 Fourth International Conference on 3D Vision (3DV), pp. 239–248.
Hart, P. E., Nilsson, N. J., & Raphael, B. (1968). A Formal Basis for the Heuristic Determination of Minimum Cost Paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107.

THANK YOU