ppt of a project (will help you on your college projects)
vikaspandey0702
44 slides, May 07, 2024
Major Project Presentation on REAL TIME OBJECT RECOGNITION FOR VISUALLY IMPAIRED PEOPLE. Mahatma Gandhi Mission's College of Engineering & Technology, A-09, Sector 62, Noida, Uttar Pradesh 201301. Submitted by: Vikas Kumar Pandey (Roll No. 1900950310011), Akshay Kumar (Roll No. 1900950310002), Hariom (Roll No. 190950310006)
Contents: Introduction · Problems faced by blind people · Literature review · Objective · Block diagram · YOLO algorithm · Block diagram of YOLO algorithm · Object detection · Database used · Methodology · Flow chart · Hardware used · Advantages of YOLO algorithm · Survey · Advantages · Conclusion · Future work · References
Introduction The World Health Organization (WHO) carried out a survey covering around 7889 million people. The statistics showed that, of the population under consideration, 253 million were visually impaired.[4] Many visually impaired people face problems in our society. The device developed can detect the objects in the user's surroundings. A model has been proposed that helps visually impaired people detect objects in their surroundings. The output of the system is in audio form, which is easily understandable for a blind user.
Problems faced by blind people Visually impaired people confront many problems in recognizing objects. Blind people are not able to recognize the objects next to them. This system is developed to detect the objects in the user's surroundings. It will also remove the need to carry a walking stick.
Literature review "The authors in (Seema et al., 2016) suggested a smart system that guides a blind person.[1]" The system detects obstacles that cannot be detected by his/her cane. However, the proposed system was designed to protect the blind only in the area near his/her head. Problem statement: a buzzer and a vibrator were employed as the output modes to the user. This is useful for obstacle detection only at head level, without recognizing the type of obstacle.
Contd. 2. "A modification of several systems used in visual recognition was proposed in 2014.[2]" The authors used fast feature pyramids and provided findings on general object detection systems. The results showed that the proposed scheme can be used strictly for wide-spectrum images. Problem statement: it does not succeed for narrow-spectrum images; hence, their work cannot serve as an efficient general object detector.
Contd. "In (Nazli Mohajeri et al., 2011) the authors suggested a two-camera system to capture photos."[3] However, the proposed system was tested only under three conditions and for three objects. Specific obstacles at distances of about 70 cm from the cameras were detected. Problem statement: the results showed some range of error. Blind-assistance systems need to cover more cases with efficient and satisfactory results.
Objective This project aims to relieve some of these problems using assistive technology. Simply put, it is a technique for real-time stationary object recognition. To make visually impaired people independent. To provide a device for the detection of objects. Our main aim is an object recognition function: the device should be able to detect certain items through the camera and return an audio output announcing what each item is. In order to recognize objects, machine learning has to be involved.
Block diagram Capturing the video → Object detection using YOLO algorithm (on Raspberry Pi 3B+) → Text to speech → Speaker
Object detection Object detection is a task in computer vision that involves detecting various objects in digital images or videos. Some of the objects detected include people, cars, chairs, stones, buildings, and animals. It identifies the objects in a specific image and establishes the exact location of each object within the image.
EXISTING ALGORITHMS
1. ResNet — Advantage: solves the degradation problem with shortcut (skip) connections. Disadvantage: for a deeper network, the detection of errors becomes difficult.
2. R-CNN — Advantage: very accurate at image recognition and classification. Disadvantage: fails to encode the position and orientation of objects.
3. Fast R-CNN — Advantage: saves time compared to traditional approaches like Selective Search. Disadvantage: it still uses the Selective Search algorithm, which is a slow and time-consuming process.
4. SSD — Advantage: makes more predictions, with better coverage of location, scale, and aspect ratios. Disadvantage: shallow layers in a neural network may not generate enough high-level features to predict small objects.
5. YOLO — Advantage: allows real-time object detection; the system trains in a single go; more efficient and fast. Disadvantage: struggles to detect close objects because each grid cell can propose only 2 bounding boxes.
ALGORITHM SELECTION The YOLOv4 performance was evaluated using previous YOLO versions (YOLOv3 and YOLOv2) as baselines. YOLOv4 shows the best speed-to-accuracy balance compared to state-of-the-art object detectors. In general, YOLOv4 surpasses all previous object detectors in terms of both speed and accuracy, ranging from 5 FPS to as much as 160 FPS. The YOLOv4 algorithm achieves the highest accuracy among real-time object detection models while achieving 30 FPS or higher on a GPU.
YOLO algorithm YOLO is an abbreviation of 'You Only Look Once'. It was created by Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi. The YOLO algorithm detects and recognizes various objects in a picture. Object detection in YOLO is framed as a regression problem, and the algorithm provides the class probabilities of the detected objects. Prediction over the entire image is done in a single algorithmic run. YOLO has various variants, including Tiny YOLO and YOLOv1, v2, v3, and v4. It is popular because of its speed and accuracy.
YOLO evolution The original YOLO was the first object detection network to combine the problems of drawing bounding boxes and identifying class labels in one end-to-end differentiable network. YOLOv2 made a number of iterative improvements on top of YOLO, including BatchNorm, higher resolution, and anchor boxes. YOLOv3 built upon previous models by adding an objectness score to bounding box prediction, adding connections to the backbone network layers, and making predictions at three separate levels of granularity to improve performance on smaller objects. YOLOv4 is a one-stage detector with several components; it detects objects in real time, and its speed and accuracy are better than those of the other algorithms.
CSPDARKNET53 CSPDarknet53 is a convolutional neural network used as the backbone for object detection. It partitions the feature map of the image into two parts and then merges them through a cross-stage hierarchy. This split-and-merge strategy allows more gradient flow through the network.
SPATIAL PYRAMID POOLING A CNN consists of some convolutional (Conv) layers followed by some fully-connected (FC) layers. Conv layers do not require fixed-size input, but FC layers do. The solution to this problem is the Spatial Pyramid Pooling (SPP) layer. It is placed between the last Conv layer and the first FC layer and removes the fixed-size constraint of the network. The goal of the SPP layer is to pool the variable-size features coming from the Conv layer and generate fixed-length outputs that are then fed to the first FC layer of the network.
BAG OF FREEBIES AND SPECIALS 'Bag of Freebies' (BoF) is a general framework of training strategies for improving the overall accuracy of an object detection model: the set of techniques or methods that change the training strategy or training cost to improve model accuracy. 'Bag of Specials' (BoS) can be considered an add-on for any existing object detector that makes it more accurate.
METHODOLOGY The steps of the object recognition system based on image processing are as follows: Image capturing, Image acquisition, Object detection, YOLO algorithm, Prediction.
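The steps above can be sketched end to end. This is a minimal illustrative sketch, not the project's actual code: the helper `announce()` and the 0.5 confidence threshold are assumptions, and the real pipeline would feed camera frames to a YOLO model instead of the hard-coded detections used here.

```python
# Hypothetical sketch of the capture -> detect -> speak pipeline.
# announce() and the 0.5 confidence threshold are illustrative assumptions.

def announce(detections):
    """Turn (label, confidence) detections into the sentence that
    would be handed to the text-to-speech stage."""
    labels = [label for label, conf in detections if conf >= 0.5]
    if not labels:
        return "No object detected"
    return "Detected: " + ", ".join(labels)

# Simulated YOLO output for one frame:
frame_detections = [("person", 0.92), ("chair", 0.40), ("car", 0.81)]
print(announce(frame_detections))  # prints "Detected: person, car"
```

In the real system the returned sentence would be passed to the text-to-speech engine and played through the speaker.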
Block diagram Start → YOLO algorithm (residual blocks → bounding box → target label y → intersection over union → non-max suppression) → localization → prediction → output as audio
Capturing image Image capture is done by the camera module; objects are captured both in real time and while stationary.
Image Acquisition The image is captured by a digital camera as an RGB image and converted to its grayscale version using intensity equation 1: I = (R + G + B) / 3.
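Equation 1 can be applied per pixel. A minimal sketch in plain Python (the real system would operate on a full camera frame, e.g. with OpenCV; the tiny nested-list "image" here is just for illustration):

```python
# Grayscale conversion using equation 1: I = (R + G + B) / 3,
# applied pixel-by-pixel to a tiny 2x2 RGB "image" (nested lists).
def to_gray(rgb_image):
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb_image]

image = [[(90, 120, 150), (0, 0, 0)],
         [(255, 255, 255), (30, 60, 90)]]
print(to_gray(image))  # [[120.0, 0.0], [255.0, 60.0]]
```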
RESIDUAL BLOCKS The image is divided into grids of dimension S × S; dimensions of 3 × 3, 13 × 13 and 19 × 19 are used. There are many grid cells of equal dimension, and every grid cell detects the objects that appear within it.
LOCALIZATION The term 'localization' refers to where in the image the object is present. In YOLO object detection we perform classification with localization, i.e., a supervised learning algorithm is trained to predict not only the class but also the bounding box around the object in the image. Classification + localization = object detection.
BOUNDING BOXES A bounding box is an outline that highlights an object in an image. Every bounding box in the image has the following attributes: center (bx, by), height (bh), width (bw), and class (for example person, car, or traffic light), represented by the letter c.
BOUNDING BOXES - CONT... Each cell of the 13×13 grid detects objects in the input image via its specified number of bounding boxes. In YOLOv4, each cell has 3 bounding boxes, so the total number of bounding boxes for the 13×13 feature map is (13×13)×3 = 507. Bounding boxes that do not localize any object in the picture are discarded.
TARGET LABEL Y The target label y for this supervised learning task is a vector [Pc, bx, by, bh, bw, c1, ..., cn]. Pc is the probability that an object of a particular class is present in the grid cell, with 0 <= Pc <= 1: Pc = 0 means no object is found, and Pc = 1 means 100% probability that an object is present. (bx, by) defines the mid-point of the object and (bh, bw) define the height and width of the bounding box. If Pc > 0, the n values c1 ... cn represent the classes of objects present in the image.
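As an illustration, for a hypothetical 3-class problem (c1 = person, c2 = car, c3 = traffic light) the target vector could look like this; the class names and numeric values are assumptions, not taken from the project:

```python
# Illustrative target vectors y = [Pc, bx, by, bh, bw, c1, c2, c3]
# for 3 assumed classes: c1=person, c2=car, c3=traffic light.
y_car  = [1, 0.5, 0.5, 0.3, 0.2, 0, 1, 0]  # Pc=1: a car centred in the cell
y_none = [0, 0, 0, 0, 0, 0, 0, 0]          # Pc=0: no object in this cell

assert y_car[0] == 1 and y_car[6] == 1     # object present, class c2 (car)
assert y_none[0] == 0                      # cell contains no object
```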
Intersection over union (IoU) IoU is a term used to describe the extent of overlap between two boxes: the greater the region of overlap, the greater the IoU. IoU is mainly used in applications related to object detection, where we train a model to output a box that fits perfectly around an object. IoU is also used in the non-max suppression algorithm. IoU = area of overlap / area of union.
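The overlap ratio can be computed directly from box corners. A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)       # overlap / union

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> ~0.143
```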
NMS - NON-MAX SUPPRESSION To select the best bounding box from the multiple predicted bounding boxes, an algorithm called non-max suppression is used to "suppress" the less likely bounding boxes and keep only the best one.
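Greedy NMS keeps the highest-scoring box and drops any remaining box that overlaps it too much. A self-contained sketch (the 0.5 IoU threshold is an assumed default, not taken from the slides):

```python
def iou(a, b):
    """IoU of boxes given as (x1, y1, x2, y2)."""
    inter = max(0, min(a[2], b[2]) - max(a[0], b[0])) * \
            max(0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression; returns indices of the kept boxes."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    keep = []
    while order:
        best = order.pop(0)            # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order      # drop boxes overlapping it too much
                 if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 2, 2), (0, 0, 2, 2), (10, 10, 12, 12)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the duplicate of box 0 is suppressed
```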
Prediction YOLOv4 makes detections at 3 different points, i.e., layers 82, 94 and 106, where the network has down-sampled the input image by strides of 32, 16 and 8 respectively. After reaching a stride of 32, the network produces a 13×13 feature map for an input image of size 416×416; at the detection layer with stride 16 it produces a 26×26 feature map, and at the detection layer with stride 8 a 52×52 feature map. Thus, the total number of bounding boxes produced by YOLOv4 for a 416×416 input image is ((13×13) + (26×26) + (52×52)) × 3 = 10647 bounding boxes per image.
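The 10647 figure follows directly from the three scales. A quick check, assuming a 416×416 input and 3 boxes per grid cell as stated above:

```python
# Three detection scales with strides 32, 16 and 8 on a 416x416 input,
# 3 bounding boxes per grid cell.
input_size = 416
strides = (32, 16, 8)
cells = [(input_size // s) ** 2 for s in strides]  # [169, 676, 2704]
total_boxes = sum(cells) * 3
print(total_boxes)  # 10647
```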
Database used COCO dataset – the COCO dataset, meaning "Common Objects in Context", is a large-scale image dataset containing 328,000 images of everyday objects and humans. The dataset contains annotations that machine learning models use to recognize, label, and describe objects. COCO provides the following types of annotations: object detection, captioning, key points, and dense pose.
Contd.: Object detection includes various approaches such as Fast R-CNN, RetinaNet, and sliding-window detection, but none of these methods can detect objects in one single run. Hence the need for a more efficient and faster algorithm: YOLO.
Flow Chart Start → Capture image → Image captured correctly? (No → send error message → capture again; Yes → continue) → Processing → Deep learning algorithm → Predicted? (No → process again; Yes → continue) → Object recognition → Output in audio format
Hardware Raspberry Pi 3B+, Camera Module v2, jumper wires, speaker, button
Raspberry Pi 3B+ The Raspberry Pi 3 Model B+ is the latest product in the Raspberry Pi 3 range, boasting a 64-bit quad-core processor running at 1.4 GHz, dual-band 2.4 GHz and 5 GHz wireless LAN, and Bluetooth 4.2/BLE.
Camera Module v2 The Raspberry Pi Camera v2 is a high-quality 8-megapixel add-on board for the Raspberry Pi, custom designed around the Sony IMX219 image sensor and featuring a fixed-focus lens.
Advantages of the YOLO algorithm The YOLO algorithm is important for the following reasons. Speed: this algorithm improves the speed of detection because it can predict objects in real time. High accuracy: YOLO is a predictive technique that provides accurate results. It uses a convolutional implementation, which means that for a 3×3 grid (i.e., the image divided into 9 grid cells) you do not need to run the algorithm 9 times to check for an object in each grid cell; it is a single convolutional pass. Learning capabilities: the algorithm has excellent learning capabilities that enable it to learn representations of objects and apply them in object detection.
Surveys According to the National Federation of the Blind, blind people can use all such devices easily, so they can also use our object recognition system.[5]
Advantages This work is implemented using PTTS. It is easy to set up. Open-source tools were used for this project. It is cheap and cost-efficient. The project runs on the device alone; there is no need to buy anything extra.
Conclusion A simple object recognition system based on the YOLO algorithm has been proposed. The system has been written using OpenCV.
Future Work Enhancing the accuracy by building a model of features for each object class. Working on using local features instead of template matching. Enhancing the selection of the best frame to process for the runtime application. Adding more objects to the database.