“Stream loss”: ConvNet learning for face verification using unlabeled videos in the wild
Elaheh Rashedi
About This Presentation
Face recognition tasks have seen significantly improved performance due to ConvNets. However, less attention has been given to face verification from videos. This paper makes two contributions along these lines. First, we propose a method, called stream loss, for learning ConvNets using unlabeled videos in the wild. Second, we present an approach for generating a face verification dataset from videos in which labeled streams can be created automatically, without human annotation. Using this approach, we have assembled a widely scalable dataset, FaceSequence, which includes 1.5M streams capturing ∼500K individuals. Using this dataset, we trained our network to minimize the stream loss. The network achieves accuracy comparable to the state of the art on the LFW and YTF datasets with much smaller model complexity. We also fine-tuned the network using the IJB-A dataset. The validation results show competitive accuracy compared with the best previous video face verification results.
Size: 4.1 MB
Language: en
Added: Nov 27, 2018
Slides: 45 pages
Slide Content
Learning Convolutional Neural Network for Face Verification Presented By: Elaheh Rashedi PhD in Computer Science Wayne State University 2018 Advisor: Professor Xue-wen Chen
Contents Introduction Long-Term Face Tracking using ConvNet “FaceSequence”: Video dataset for Face Recognition “Stream-Loss”: ConvNet Learning for Face Verification Conclusion & Future Work
Introduction Background Convolutional Neural Network (ConvNet) ConvNet-based Face Verification ConvNet-based Face Recognition Models Two-step verification model Single-step verification model Train and Test dataset Challenges Our Contributions
Convolutional Neural Network A kind of neural network whose input is an image Uses sparser connectivity between neurons than a fully connected network Fig 1. General ConvNet structure in face recognition problems
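To make the sparse-connectivity point concrete, here is a minimal 1-D sketch (illustrative pure Python, not from the thesis): every output value reuses the same small kernel, so a convolutional layer needs far fewer weights than a fully connected one.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in most deep
    learning libraries): the same small kernel slides over the input."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# An edge-detecting kernel responds where the signal changes.
out = conv1d([0, 0, 1, 1, 1, 0], [1, -1])   # -> [0, -1, 0, 0, 1]
```

The same idea extends to 2-D kernels sliding over face images, which is what the ConvNet layers in Fig 1 compute.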
ConvNet-based Face Verification Common steps: Face detection Viola-Jones, Cascade CNN, … Pre-processing Geometric & lighting normalization ConvNet training Supervised vs. unsupervised Face identification Classification task Metric learning Joint-Bayesian, Cosine similarity, Triplet Similarity, Energy-based similarity, … Face Verification
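Among the metric-learning options listed above, cosine similarity is the simplest to illustrate. A minimal sketch in pure Python (the toy feature vectors and the 0.75 decision threshold, which this talk uses later for tracking, are illustrative assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify(feat_a, feat_b, threshold=0.75):
    """Declare 'same identity' when the similarity exceeds the threshold."""
    return cosine_similarity(feat_a, feat_b) >= threshold

# Toy vectors; real ones would come from the ConvNet's embedding layer.
same = verify([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])   # nearly parallel -> True
diff = verify([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # orthogonal -> False
```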
ConvNet-based Face Verification Models Two-step verification models Two frameworks for identification and verification DeepFace model Web-scaled DeepFace model DeepID model series Single-step verification models Same framework for identification and verification FaceNet model (GoogleNet) VGG model
Train and Test dataset Table 1. The common face recognition datasets
Challenges Few trainable video-based Convolutional Neural Network models have been proposed Lack of an available public video training dataset Existing long-term face tracking algorithms have low accuracy Face tracking algorithms can be utilized to collect the training video dataset
Our Contributions Designing a video-based face verification model using ConvNet
Contents Introduction Long-Term Face Tracking using ConvNet “FaceSequence”: Video dataset for Face Recognition “Stream-Loss”: ConvNet Learning for Face Verification Conclusion & Future Work
Long-Term Face Tracking using ConvNet Common Long Term Tracking Algorithms Tracking Challenges Proposed Model Detection-Verification-Tracking model (DVT) Deep-Learning-based Face Detection ConvNet-based Face Verification Multi-patch based Face Tracking DVT System Framework Demonstration Results
Common Long Term Tracking Algorithms Common Tracking Steps Select a video Employ a bounding box around the target Distinguish the object from the background Track the object around the same region in next frame Fig 2. Tracking schema using bounding box
Tracking Challenges Can be challenging on real world noisy videos Not robust against Appearance changes Occlusion Fast motion Illumination changes Background clutter Sensitive to the initialization of target Not able to handle all situations Long term tracking challenge: Not reliable in cases where the object leaves the view
Detection-Verification-Tracking model (DVT) Model Detection-Verification-Tracking Goal Long term face tracking Wild video target (unconstrained environment) Includes 3 components: Deep learning based face detection ConvNet-based face verification Multi-patch based tracking
Deep-Learning-based Face Detection Model Cascade-CNN ( ConvNet -based detection model) ConvNet structure: 3 ConvNets for faces vs. non-faces (binary classification) 3 ConvNets for bounding box calibration (Multiclass classification) Fig 3. Cascade-CNN face detection for binary classification
ConvNet-based Face Verification Pre-trained network based on VGG MatConvNet Convolutional Neural Network: 37 layers Feature vector dimension: 4096 Fig 4. Proposed Verification steps
Multi-patch based Face Tracking Employs Multiple patches around the target Categorize patches to reliable/non-reliable categories Track reliable patches Ignore non-reliable patches Result is the average of reliable patches Fig 5. Multi-Patch tracking
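The averaging step above can be sketched as follows (pure Python; the patch representation and reliability scores are illustrative assumptions, not the thesis implementation):

```python
def track_estimate(patches, reliability_threshold=0.5):
    """Estimate the target position from multiple patches.

    Each patch is (x, y, reliability). Unreliable patches are ignored,
    and the result is the average position of the reliable ones.
    """
    reliable = [(x, y) for x, y, r in patches if r >= reliability_threshold]
    if not reliable:
        return None  # target lost: hand control back to the detector
    n = len(reliable)
    return (sum(x for x, _ in reliable) / n,
            sum(y for _, y in reliable) / n)

# Three reliable patches and one low-reliability outlier that is discarded.
pos = track_estimate([(10, 10, 0.9), (12, 10, 0.8), (11, 12, 0.7), (40, 5, 0.1)])
# pos is approximately (11.0, 10.67)
```

Discarding unreliable patches before averaging is what keeps the estimate stable under partial occlusion of the face.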
DVT System Framework Fig 6. Flowchart of the proposed Long-term face tracking method, DVT
Demonstration Fig 7. Demonstration of the DVT system for pausing the video and selecting the target face to be tracked
Fig 13. Demonstration of the DVT tracking results
Fig 8. An example of DVT output sequence.
Results Implemented in Matlab R2015b with MatConvNet GUI implemented in Java Similarity threshold: 0.75 Skip time: 3 s Running time: ~2x the video duration Table 2. Comparison between TLD, Face-TLD, and the proposed DVT method in terms of precision and recall on the sitcom IT-Crowd (first series, first episode).
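As a reminder of what Table 2 reports, precision and recall can be computed from per-frame tracking outcomes like this (the counts below are illustrative, not the thesis numbers):

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    For tracking: TP = frames where the target is tracked correctly,
    FP = frames where a wrong region is reported as the target,
    FN = frames where the visible target is missed.
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# e.g. 90 correctly tracked frames, 10 spurious tracks, 30 missed frames
p, r = precision_recall(90, 10, 30)   # p = 0.9, r = 0.75
```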
Contents Introduction Long-Term Face Tracking using ConvNet “FaceSequence”: Video dataset for Face Recognition “Stream-Loss”: ConvNet Learning for Face Verification Conclusion & Future Work
“FaceSequence”: Video dataset for Face Recognition Stream Collection & Labeling A highly automated strategy Based on long-term face tracking Using noisy videos collected from web FaceSequence Statistics Stream Samples FaceSequence Advantages
Stream Collection & Labeling Steps: Video Collection Face videos are collected from the web Videos are curated to control biases in: ethnicity, gender, age Original target selection Employing a face detection algorithm Negative sample selection Detecting other faces from the same frame as the target Positive sample stream selection Deploying the face tracking algorithm Tracking for a specific time period
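The selection steps above can be sketched as a loop over per-frame detections (pure Python; the face identifiers stand in for the Cascade-CNN detections and the DVT tracker, which are not reproduced here):

```python
def collect_streams(frames, stream_len=25):
    """Build labeled samples from per-frame face detections.

    `frames` is a list of detection lists; each detection is a face id
    (a stand-in for a bounding box + crop). The first face in the first
    frame is the original target, the other faces in that frame are the
    negatives, and the target tracked over up to `stream_len` frames
    forms the positive stream -- all labeled without human annotation.
    """
    first = frames[0]
    target = first[0]                    # original target selection
    negatives = list(first[1:])          # other faces in the same frame
    stream = []
    for frame in frames[:stream_len]:    # track target for a fixed period
        if target in frame:
            stream.append(target)
        else:
            break                        # target left the view
    return target, negatives, stream

frames = [["alice", "bob"], ["alice"], ["alice", "carol"], ["dave"]]
t, negs, stream = collect_streams(frames)
# t == "alice", negs == ["bob"], stream spans the 3 frames containing "alice"
```

In FaceSequence the same roles are played by tracked face crops, which is why no human enters the labeling loop.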
FaceSequence Statistics Table 3. Characteristics of FaceSequence dataset, including total number of collected videos, number of streams extracted from videos, and number of frames per each stream.
Stream Samples Fig 9. A sample of streams of frames available in FaceSequence dataset for 5 identities.
FaceSequence Advantages Contains streams of frames Extracted from noisy videos Retains higher similarity per subject Streams are stills from 1 second of video Images in a stream are more similar in terms of: background, lighting, resolution Widely scalable: In public face datasets, labeled celebrity photos are crawled from the web, making it challenging to assemble millions of individuals In private face datasets, human annotators are involved to expand the data, which is costly and time consuming In FaceSequence, streams are labeled automatically, with no human in the labeling loop, so the dataset is expandable
Contents Introduction Long-Term Face Tracking using ConvNet “FaceSequence”: Video dataset for Face Recognition “Stream-Loss”: ConvNet Learning for Face Verification Conclusion & Future Work
“Stream-Loss”: ConvNet Learning for Face Verification ConvNet-based Face Recognition Methods Loss Learning Approaches Video-based ConvNet Models Stream-based ConvNet Learning Method Proposed Architecture Design Stream Loss Learning Experimental Results LFW and YTF Datasets IJB-A Video Dataset
Loss Learning Approaches Contrastive loss Based on the distance between two objects Triplet loss Based on the distance between three objects Multiple loss Based on the distance between multiple objects Fig 10. Triplet loss Fig 11. Multiple loss
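The triplet case above is worth spelling out, since the stream loss later generalizes it. A minimal sketch in pure Python (the margin value is an illustrative assumption):

```python
def l2(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero when the positive is closer to the anchor than the
    negative by at least `margin`; otherwise a hinge penalty."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)

# Positive already far closer than the negative -> loss is 0.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0])
# Positive farther than the negative -> positive loss drives learning.
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
```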
Video-based ConvNet Models Adapting an image-based ConvNet to videos Mapping each face image into a single feature vector Using a ConvNet Aggregating all feature vectors into a single one Using an aggregation function (average, max, …) Examples: Neural Aggregation Network (NAN) Input aggregated Network Pros: Simple architecture Cons: Each frame is treated like a still image The temporal relation between frames is ignored Fig 12. NAN architecture for video face recognition.
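The simplest aggregation function mentioned above, average pooling, looks like this (pure Python sketch; NAN instead learns attention weights for the average, which is not shown here):

```python
def aggregate(features):
    """Average-pool per-frame feature vectors into one video descriptor.

    Each frame's vector contributes equally -- this is exactly where the
    temporal ordering of the frames is thrown away.
    """
    n = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]

video_descriptor = aggregate([[1.0, 2.0], [3.0, 4.0]])   # [2.0, 3.0]
```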
Stream-based ConvNet Learning Method We proposed: A novel video-based ConvNet architecture Training inputs are streams of frames (rather than single images) Designed for face verification A video-based loss learning approach Named stream-loss
Proposed Architecture Design Fig 13. The architecture of the stream-based ConvNet for video face recognition.
Proposed Architecture Design (cont…) Fig 14. Proposed flowchart for face verification.
Stream Loss Learning Set of input images; set of output feature vectors; L2-norm distance between pairs of y_o, y_p, y_n (the original target, positive, and negative features)
Stream Loss Learning (cont…) smooth-max and smooth-min function
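The formulas on this slide were images and are lost here. A plausible reconstruction, assuming the common log-sum-exp (softmax-weighted) form of smooth max/min with a sharpness parameter $\beta > 0$ over distances $d_1,\dots,d_k$:

```latex
\operatorname{smoothmax}_\beta(d_1,\dots,d_k) = \frac{1}{\beta}\log\sum_{i=1}^{k} e^{\beta d_i},
\qquad
\operatorname{smoothmin}_\beta(d_1,\dots,d_k) = -\frac{1}{\beta}\log\sum_{i=1}^{k} e^{-\beta d_i}
```

Both are differentiable everywhere, unlike hard max/min, and approach them as $\beta \to \infty$; this is what makes them usable inside a ConvNet loss.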
Stream Loss Learning (cont…) Stream-loss function
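The stream-loss formula itself was also an image. The sketch below is a reconstruction under stated assumptions, not the thesis equation: a triplet-style hinge where the single positive distance is replaced by a smooth-max over the distances to the whole positive stream (smooth-min is included because the slides name it; it would apply symmetrically to a stream of negatives, which the Future Work slide proposes). The margin and β values are illustrative.

```python
import math

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smooth_max(vals, beta=10.0):
    """Differentiable upper bound approximating max(vals)."""
    return math.log(sum(math.exp(beta * v) for v in vals)) / beta

def smooth_min(vals, beta=10.0):
    """Differentiable lower bound approximating min(vals)."""
    return -math.log(sum(math.exp(-beta * v) for v in vals)) / beta

def stream_loss(y_o, y_p_stream, y_n, margin=0.2, beta=10.0):
    """Hinge on the worst (smooth-max) positive-stream distance vs. the
    distance to the negative: every frame of the positive stream must
    end up closer to the target than the negative, by `margin`."""
    worst_pos = smooth_max([l2(y_o, y_p) for y_p in y_p_stream], beta)
    return max(0.0, worst_pos - l2(y_o, y_n) + margin)

# Tight positive stream, distant negative -> the hinge is inactive.
loss = stream_loss([0.0, 0.0], [[0.1, 0.0], [0.0, 0.2]], [1.0, 1.0])
```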
Experimental Results Train the network on the FaceSequence dataset Test on the LFW and YTF datasets: LFW includes 13,233 images from 5,749 different identities; YTF includes 3,425 videos from 1,595 different identities Test on the IJB-A video dataset: 5,397 images and 2,042 videos from 500 identities
Experiments on LFW and YTF Datasets Table 4. Comparison of Verification Performance of Different Methods on the LFW and YTF Datasets.
Experiments on IJB-A Video Dataset Table 5. Performance comparison on the IJB-A dataset. TAR/FAR: True/False Acceptance Rate for verification. The TAR of our method at FAR=0.01 reduces the error of VGG by 67%, which demonstrates a significant improvement.
Contents Introduction Long-Term Face Tracking using ConvNet “FaceSequence”: Video dataset for Face Recognition “Stream-Loss”: ConvNet Learning for Face Verification Conclusion & Future Work
Conclusion
Future Work Feeding streams of negative examples into the ConvNet (instead of only one negative example): Improve the loss learning procedure Design a new stream-loss function Introducing a new noise layer into the proposed ConvNet: Incorporating a modification signal into the stream-loss function to calculate the statistics of label noise Adapting the network to the noisy nature of the generated dataset Steps: Train the ConvNet to clean noisy annotations in the large dataset (e.g. FaceSequence) using clean labels from the same domain Fine-tune the network using both the clean labels and the full dataset with reduced noise.