PR12-151 The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

Taesu Kim · 13 slides · Mar 24, 2019

About This Presentation

Paper review: "The Unreasonable Effectiveness of Deep Features as a Perceptual Metric"
Presented at Tensorflow-KR paper review forum (#PR12) by Taesu Kim
Paper link: https://arxiv.org/abs/1801.03924
Video link: https://youtu.be/VDeJFb5jt5M (in Korean)


Slide Content

The Unreasonable Effectiveness of
Deep Features as a Perceptual Metric
R. Zhang et al, UC Berkeley, OpenAI, Adobe
Presented by Taesu Kim
Mar.24,2019

Perceptual Similarity
›Which patch (left or right) is closer to the middle patch?
Peak Signal-to-Noise Ratio (PSNR) · Structural Similarity (SSIM)
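As a quick reference for the traditional metrics named on this slide, here is a minimal PSNR sketch in plain NumPy (SSIM needs windowed local statistics and is omitted); this is an illustration, not the evaluation code used in the paper:

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    # Peak Signal-to-Noise Ratio in dB: higher means closer to the reference.
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

PSNR is a pure per-pixel error measure, which is exactly why it can prefer a blurry patch over a perceptually similar one.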

Contributions
›A large-scale, highly varied, perceptual similarity dataset,
containing 484k human judgments
–Both parameterized distortion and real algorithm distortion
›Show that all types of deep features outperform traditional metrics
›Show that network architecture alone doesn’t account for the
performance
›Train new metric (LPIPS) on perceptual judgments
›Improve performance by calibrating feature responses from a pre-trained network

Psychophysical Similarity Measurements
›Two Alternative Forced Choice (2AFC)
–Goal: collect a large-scale set of human perceptual judgments on distortions
–Procedure: sample a patch, distort it twice, and ask a human which distorted patch is closer to the original
–2 judgments per train patch, 5 judgments per val patch
›Just Noticeable Differences (JND)
–Goal: validate 2AFC with less “cognitively penetrable” test
–Procedure: ask a human whether 2 patches are identical or not
–Provide a short training period of 10 pairs: 4 “same” pairs, 1 obviously different pair, and 5 pairs differing by distortions
–3 judgments per patch
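The 2AFC protocol above yields, for each triplet, the fraction of humans who preferred each candidate patch; a metric is then scored by how often it agrees with the humans. A rough illustration (not the paper's released evaluation code) of that agreement score:

```python
import numpy as np

def two_afc_score(d0, d1, human_frac):
    # d0, d1: metric distances from candidate patch 0 / patch 1 to the reference.
    # human_frac: fraction of humans who judged patch 1 as closer.
    # The metric earns credit equal to the fraction of humans agreeing with it.
    pred1 = (d1 < d0).astype(np.float64)            # 1 if the metric picks patch 1
    agree = pred1 * human_frac + (1 - pred1) * (1 - human_frac)
    return float(agree.mean())
```

A score of 1.0 means the metric always sides with the unanimous human choice; 0.5 is chance when humans are split evenly.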

Database
[Figure: example patches with traditional distortions vs. CNN-based distortions]
[Table: comparison with existing perceptual similarity databases]

Database

Deep Networks as a Perceptual Metric
›Lin: keep pre-trained network weights and learn linear weights w
›Tune: initialize from a pre-trained classification model and allow all network weights to be fine-tuned
›Scratch: initialize the network with random Gaussian weights and train it on the perceptual judgments
Learned Perceptual Image Patch Similarity (LPIPS) metric
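A minimal sketch of an LPIPS-style distance, assuming feature maps have already been extracted from some backbone; `lpips_like` is a hypothetical name, and the real metric uses specific VGG/AlexNet layers with learned per-channel weights w:

```python
import numpy as np

def lpips_like(feats_x, feats_y, weights):
    # feats_x, feats_y: lists of feature maps, one per layer, each shaped (C, H, W).
    # weights: list of per-channel linear weights w_l, each shaped (C,).
    total = 0.0
    for fx, fy, w in zip(feats_x, feats_y, weights):
        # Unit-normalize each spatial position's feature vector across channels.
        fx = fx / (np.linalg.norm(fx, axis=0, keepdims=True) + 1e-10)
        fy = fy / (np.linalg.norm(fy, axis=0, keepdims=True) + 1e-10)
        # Weight channel-wise squared differences, then average over space.
        diff = (w[:, None, None] * (fx - fy)) ** 2
        total += diff.sum(axis=0).mean()
    return total
```

In the "Lin" configuration only the weights w are learned from the 2AFC judgments, while the backbone features stay frozen.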

How well do low-level metrics and classification networks perform?
Does the network have to be trained on classification?
Can we train a metric on traditional and CNN-based distortions?
Does training on traditional and CNN-based distortions transfer to real-world scenarios?

Do metrics correlate across different perceptual tasks?

Where do deep metrics and low-level metrics disagree?
Primary difference is that deep embeddings appear to be more sensitive to blur

Conclusions
›Introduce a new dataset of human perceptual similarity judgments
›Evaluate deep features across different architectures and tasks
›Deep features outperform all previous metrics
›The result is not restricted to VGG, but holds across different deep architectures and
levels of supervision
›Perceptual similarity is an emergent property shared across deep visual
representations
›Networks trained to solve challenging visual prediction and modeling tasks end
up learning a representation of the world that correlates well with perceptual
judgments
›The stronger a feature set is at classification and detection, the stronger it is as a
model of perceptual similarity judgments
›Features that are good at semantic tasks are also good at self-supervised and
unsupervised tasks, and also provide good models of human perceptual behavior

Follow us:
Contact us:
[email protected]
For more information:
http://www.neosapience.com