YOLOv3

SHREYMOHAN1 · 16 slides · Oct 11, 2018

About This Presentation

A comprehensible presentation on You Only Look Once (YOLO) algorithm (version 3).


Slide Content

YOLO. Wednesday, October 10, 2018. Shrey Mohan.

Points to be discussed: Overview, Architecture, Training, Predictions, Performance, Current work.

Overview: YOLO is an end-to-end convolutional network for object localization and classification. It looks at the image only once to make its predictions, hence the name You Only Look Once (YOLO).

Overview: YOLO is one of the most accurate detection algorithms and, at this point in time, the fastest. Hence, it can be used effectively for real-time detection.

Architecture

Architecture: YOLOv3 uses Darknet-53, a backbone network with 53 convolutional layers. This deeper backbone gives a better mAP, and extending detection to 3 scales improves mAP further.

Training YOLO: various techniques were used. Multi-scale training with different input resolutions (320, 352, ..., 608). Batch normalization after each convolution. Unified datasets for training (next slide).
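As an illustration of the "batch normalization after convolving" pattern mentioned above, here is a minimal sketch assuming PyTorch; the helper name conv_bn_leaky and the example layer sizes are my own illustrative choices, not the actual Darknet-53 configuration.

```python
# Minimal sketch of the Darknet-style conv + batch-norm + leaky-ReLU block
# (illustrative; not the actual Darknet-53 layer list).
import torch.nn as nn

def conv_bn_leaky(in_ch, out_ch, kernel_size, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride,
                  padding=kernel_size // 2, bias=False),  # bias is folded into BN
        nn.BatchNorm2d(out_ch),           # batch normalization right after the convolution
        nn.LeakyReLU(0.1, inplace=True),  # Darknet-style leaky ReLU activation
    )

# Example: a stem layer mapping a 3-channel image to 32 feature maps.
stem = conv_bn_leaky(3, 32, kernel_size=3)
```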

Training YOLO: the ImageNet and COCO datasets are merged. This helps the model detect more specialized objects. The model may also detect objects it has never seen before.

Predictions: the model predicts 3 output tensors at different scales, with dimensions 13x13x(N*(80+5)), 26x26x(N*(80+5)) and 52x52x(N*(80+5)), where 80 is the number of class probabilities, 5 is the number of bounding-box attributes, and N is the number of anchors per scale for the COCO dataset (next slide).
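To make those dimensions concrete, here is a small sketch (my own illustration, assuming the standard 416x416 input and N = 3 anchors per scale) that computes the channel depth N*(80+5) and prints the three output shapes.

```python
# Sketch: YOLOv3 output tensor shapes on COCO, assuming a 416x416 input,
# 3 anchors per scale and 80 classes (values taken from the slide).
num_classes = 80              # COCO class probabilities
box_attrs = 5                 # t_x, t_y, t_w, t_h and the objectness score
num_anchors = 3               # N: anchors per scale

depth = num_anchors * (num_classes + box_attrs)   # 3 * (80 + 5) = 255

for grid in (13, 26, 52):     # strides 32, 16 and 8 on a 416x416 input
    print(f"{grid}x{grid}x{depth}")               # 13x13x255, 26x26x255, 52x52x255
```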

Predictions: pre-defined boxes called anchor boxes are chosen for YOLO; they serve as priors that help predict the bounding boxes. Their dimensions are decided by running k-means clustering on the bounding boxes of the training set.
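A minimal sketch of that clustering step is below, using the 1 - IoU distance described in the YOLO papers; the function names iou_wh and kmeans_anchors are my own illustrative choices.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, with all boxes treated as centred at the origin."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes_wh, k=9, iters=100, seed=0):
    """k-means on training-set box sizes with d = 1 - IoU as the distance."""
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes_wh[rng.choice(len(boxes_wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes_wh, anchors), axis=1)  # nearest = highest IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = boxes_wh[assign == j].mean(axis=0)
    return anchors
```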

Predictions: 3 anchor boxes are defined per grid cell in YOLO (v3). For each anchor box, the model outputs t_x, t_y, t_w, t_h, p_o and 80 class probabilities. These parameters are used to predict the locations of the bounding boxes (next slide).

Predictions: as seen in the equations, t_x, t_y, t_w and t_h are offsets relative to the anchor boxes. This is done for each of the defined anchor boxes (3 per cell here).
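The equations this slide refers to appear only as an image; for reference, the box-decoding equations as given in the YOLOv2/YOLOv3 papers are reproduced below, where (c_x, c_y) is the offset of the grid cell and (p_w, p_h) are the anchor's width and height.

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h}
\end{aligned}
```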

Predictions: every grid cell has 3 anchor boxes associated with it. So how do we map anchor boxes to ground truths? We compute the IoU of each anchor with each ground-truth box; the anchor with the highest IoU is assigned to (is responsible for) that ground truth.
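A small sketch of that assignment step follows (my own illustration; the helper name assign_anchors is not from the slides), reusing a width/height IoU like the one above.

```python
import numpy as np

def assign_anchors(gt_wh, anchors_wh):
    """Return, for each ground-truth (w, h), the index of the anchor with the
    highest width/height IoU (both boxes treated as centred at the origin)."""
    gt_wh = np.asarray(gt_wh, dtype=float)
    anchors_wh = np.asarray(anchors_wh, dtype=float)
    inter = (np.minimum(gt_wh[:, None, 0], anchors_wh[None, :, 0]) *
             np.minimum(gt_wh[:, None, 1], anchors_wh[None, :, 1]))
    union = (gt_wh[:, 0] * gt_wh[:, 1])[:, None] + \
            anchors_wh[:, 0] * anchors_wh[:, 1] - inter
    return np.argmax(inter / union, axis=1)   # one best-anchor index per ground truth
```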

Predictions: λ_coord is set to 5 and λ_noobj is set to 0.5. The square roots of the widths and heights are used so that errors on small and large boxes are weighted more equally. In this loss function, the last three terms are replaced with cross-entropy terms.
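The loss function this slide discusses is shown only as an image. For reference, the classic YOLO loss that uses these λ weights (from the original YOLO paper) is reproduced below; per the slide, the last three sums (object confidence, no-object confidence and classification) become cross-entropy terms in later versions.

```latex
\begin{aligned}
\mathcal{L} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
  &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
  &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2
   + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \\
  &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}
```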

Performance: there is always a trade-off between accuracy and speed. While YOLOv2 ran at about 45 fps on a Titan X, YOLOv3 runs at about 30 fps.

Results: this slide shows a video of detections by YOLO on the MOT-16 benchmark.