Multimedia Information Retrieval

Stephane Marchand-Maillet · 138 slides · Sep 04, 2015

About This Presentation

Introduction to Multimedia Information Retrieval with an intuitive approach. Tutorial for the non-specialist


Slide Content

Slide 1
Multimedia Information Retrieval
Stephane Marchand-Maillet
Viper group
University of Geneva
Switzerland
http://viper.unige.ch

Slide 2
•What is your background?
–Computer Science
–Scientific
–Humanities

•This lecture introduces intuition from
examples
–More technical details in recommended reading

Quick get-to-know

Slide 3
•To introduce (recall, rehearse) some key
principles and techniques of IR

•To show how these are adapted to Multimedia

•To look at the Multimedia context

•To see how to best face it
Objectives of the lecture

Slide 4
Outline
•Motivation and context

•Preliminary material on IR

•Audio-visual IR

•Complements

•Evaluation





Note: several illustrations in these slides have been borrowed from the Web, including Wikipedia and teaching material. Please do not reproduce without permission from the respective authors. When in doubt, don't.

Slide 5
Data Production
•Growth of Data
–1,250 billion GB (≈1.2 ZB) of data generated in 2010.
–Data generation growing at a rate of 58% per year
•Baraniuk, R., "More is Less: Signal Processing and the Data Deluge", Science, V331, 2011.

1 exabyte (EB) = 1,073,741,824 gigabytes
[Chart: Data Size (EB) per year, 2010-2014, on a scale of 0-10,000 EB]

Data Generation Growth
http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html
http://www.ritholtz.com/blog/2011/12/financial-industry-interconnectedness/
[Figures: Internet, Scientific and Industry data; by Sverre Jarp and Felix Schürmann, © copyright attached]

Slide 6
A digital world
[From http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox]

Slide 7
[Picture from: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html]
Data communication
© Copyright attached

Slide 8
User “productivity”
[Picture from: http://www.go-gulf.com/wp-content/themes/go-gulf/blog/online-time.jpg - Feb 2012]
© Copyright attached

Slide 9
Motivation
•Decision making requires informed
choices
•Data production ever easier
•The information is often
overwhelming
–« Big Data » trend
•The information is often not easy to
manage and access


We need to bring structure to the raw data
•Document (data) representation
•Similarity measurements
•Further analysis: data mining, information retrieval, learning
© Copyright attached

Slide 10
Components of Data Quality
•Validity
•Integrity
•Completeness
•Consistency
•Relevance
•Accuracy
and…
•Timeliness
•Interpretability
•Reliability
•…
→ Applies to all media

Slide 11
•Searching for multimedia
–Is it like searching text?

•Information retrieval model

•Representing documents
–BoW and TF*IDF

•Indexing model
–Inverted files
–Hashing
Preliminary

Slide 12
Searching for multimedia
Basic solution:
•Attach text to multimedia and search for text
–Tags, keywords
–Description

"Multimedia Annotation"
–Domain of "Knowledge Management"
–Requires the definition of ontologies, taxonomies,…
→ Human in the loop
–Not scalable, not feasible

Slide 13
[Picture from: http://www.intel.com/content/www/us/en/communications/internet-minute-infographic.html]
Per minute rate (some years ago!)
© Copyright attached

Slide 14
•From the sensing device
–Device type, model
–Sensing conditions (flash/no flash)
–Author (device owner?)
–…
•From the environment
–Time, date, location
–Environment conditions (inside/outside, weather)
–Spoken languages
–…
•More? From the content?
–Later in this lecture

Note: Can never be exhaustive

Auto-annotation

Slide 15
•“Wasp on a yellow flower”



•Specific (latin) name of the flower? Of the insect?
•What is the insect exactly doing?

•Weather looks nice: “countryside scene”?
•The picture may be arty:
"2015 1st prize college photo competition winner"

Inference cannot resolve all
•“The wasp who scared my daughter during our holiday in
Gruyere”


Annotation

Slide 16
•Exif: standard picture annotation container
•ID3-tag: standard audio information
•NTFS: Windows-based metadata exchange
•JPSearch, MPEG 7: Search-oriented description
standards

•Every format will have a text header to insert
“comments” (PNG, GIF, JPG, TIFF,…)

→ Complementarity of text-based / content-based
•Other category is recommender-based
–“social” annotation
Annotating multimedia

Slide 17
MPEG-7 Still Image Description

Slide 18
IBM MPEG-7 Annotation Tool (video/visual stream)
[http://www.research.ibm.com/VideoAnnEx/]

Slide 19
•Given a query (information need)
•Given a repository (information source)
→ Infer the most relevant documents from the repository as an answer to the query

The most used query type is Query-by-Example
“I want something like this”
•Descriptive query
•Free text (Google-like) query
•Example document(s)


•Relevance is similarity
•Similarity is transferred to “closeness”
More like this…
Information Retrieval paradigm

Slide 20
•IR tells us that we do not (really) need a complete, absolute understanding of documents to respond to queries; we (just) need to be able to compare two documents

•We merely need appropriate distance
measurements
–To infer similarity
–Based on document features (space)
–and a notion of vicinity (distance)

→ Everything is about inspecting neighborhoods
–Provided the distance is semantically relevant

Information Retrieval

Slide 21
Information management process
[Diagram] Raw documents → Feature extraction → Document features → "Appropriate" mapping → Representation space (visualisation) → "Decision" process, driven by the Query and User interaction

Slide 22
Distance-based search
[Diagram] The query is mapped into the representation space (visualisation); the result is a relevance list, sorted by distance to the query

Slide 23
Example: text
[Diagram] Text documents → Feature extraction ("word" occurrences) → "Appropriate" mapping → "Decision" process, driven by the Query and User interaction

Slide 24
Also...
•Any type of media: webpage, audio, video,
data,...
•Objects, based on their characteristics
•People in social networks
•Concepts: processes, states, etc.

Anything for which “characteristics” may be
measured
This is what this lecture is about

Slide 25
•Raw data (the documents) carries information
+ Computers essentially perform additions
→ We need to represent the data somehow, to provide the computer with information that is as faithful as possible

•The representation is an opportunity for us to
transfer some prior (domain) knowledge as
design assumptions

If this (data modelling) step is flawed, the computer
will work with random information
Representation spaces (intuition)

Slide 26
From documents to features
•Features help characterize one document (summary)
•Features help compare two documents (similarity)
•Features help structure a collection (visualization)

Slide 27
Comparing two documents (occurrences)
Each row is a "document" (Repository), each column a "term" (Vocabulary):
D1: 2 1 1 2 1 0 0
D2: 1 2 0 1 1 1 2
D3: 1 2 0 1 1 1 2
D4: 1 1 0 1 1 0 0

Slide 28
Comparing two documents (presence)
D1: Y Y Y Y Y N N
D2: Y Y N N Y Y Y
D3: Y Y N N Y Y Y
D4: Y Y N Y Y N N
Similarity: each (Y,Y) pair counts 1. Distance: each (Y,N) mismatch counts 1.

Slide 29
Comparing two documents (presence)
D: Y Y Y Y Y N N
Q: Y Y N Y Y Y Y
   1 1 0 1 1 0 0  ← (Y,Y) matches
Similarity = 4
(Distance = 3)
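A minimal sketch of this presence-based comparison (the two vectors are the D and Q rows above, with 1 for Y and 0 for N):

```python
# Presence-based comparison of two documents (Y/N per vocabulary term).
D = [1, 1, 1, 1, 1, 0, 0]  # Y Y Y Y Y N N
Q = [1, 1, 0, 1, 1, 1, 1]  # Y Y N Y Y Y Y

# Similarity: number of positions where both are Y.
similarity = sum(d & q for d, q in zip(D, Q))
# Distance: number of positions where exactly one is Y.
distance = sum(d ^ q for d, q in zip(D, Q))

print(similarity, distance)  # 4 3
```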

Slide 30
Comparing two documents
[Diagram: pairwise similarity (distance) scores S(D) between the documents: 6 (0), 4 (1), 3 (2), 3 (2), 3 (4), 3 (4)]

Slide 31
Comparing two documents
                       S(D)
D1: Y Y Y Y Y N N      1 (6)
D2: Y Y N N Y Y Y      2 (4)
D3: Y Y N N Y Y Y      2 (4)
D4: Y Y N Y Y N N      0 (7)
Q:  N N Y N N Y Y
Note: the query is seen as a document (Query-by-Example)

Slide 32
Query-based ranking
•Since similarity is computed only on (Y,Y) pairs, only terms occurring in the query count for computing the similarity

→ Every document having no common term with the query scores 0
(i.e. terms not in the query are ignored)

Ranking w.r.t. Q: D2 2 (4), D3 2 (4), D1 1 (6), D4 0 (7)

Slide 33
Computing the ranks
Naïve solution (fill the array):
•For every document, for every term, score 1 if
the term is both in the query and the
document
→ N (documents) × M (terms) comparisons

Smarter solution
•For every term of the query (few), find
documents having this term and work from
there (the rest will score 0)

Slide 34
Information indexing
Goal:
to organize the data so as to
facilitate its access (read or write)

•Fast access for computation
•Fast access with selection criteria
•…

Baseline: exhaustive search, sequential scan (unordered list)
Strategy: enable “direct access”

Simple example:
•Address book: sorted list, with first-letter shortcut
–Smith → "S" → search the 19th sublist (ignore the 25 other sublists)

Slide 35
Indexing structures
M-tree
Tries
Suffix array
Suffix tree
Inverted files
LSH
…
Illustration: Wikipedia

Slide 36
Inverted file
"find documents having this term" → arrange documents by the terms they contain

Term → Documents
(term) → D1, D2, D3, D4
(term) → D1, D2, D3, D4
(term) → D1
(term) → D1, D4
(term) → D1, D2, D3, D4
(term) → D2, D3
(term) → D2, D3

For the query's terms, the retrieved posting lists are (D1), (D2,D3), (D2,D3), i.e. the candidates D1, D2, D2, D3, D3
D4 is never considered! (we know already that its score is 0)
→ Number of comparisons ~ number of query terms << N*M
Slide 37
Principles and lessons
•We need to be able to compare 2 documents
•Comparison is based on document features
•Similarity may be based on feature occurrence
•Only occurring features count in building
similarity
→ Feature occurrence allows for the use of inverted files, to become independent of the size of the repository
→ But feature occurrence loses a lot of the structure in the document

Slide 38
In practice (text IR)
•Document features are normalised word
(stem) occurrences (Bag of Words model)
–Vocabulary ~ 10’000 stems
–Billions of documents

•Term weighting scheme (TF*IDF) accounts for
feature statistics (better than Y/N occurrence)
•Similarity is based on cosine distance
•Inverted files include term statistics

Slide 39
BoW: Assumptions
•A text is reduced to the base vocabulary it uses
•Example:

"Signals from a particle collider near Geneva suggested in September that scientists might have sighted a long-sought particle thought to be the source of mass itself."

•Becomes:

a (x2), be, collide, from, geneva, have, in, itself, long-sought, mass, may, near, of, particle (x2), scientist, see, september, signal, source, suggest, that, the, think, to

The ultimate goal is to define similarity as:

Two documents are similar if they share the same vocabulary

Slide 40
BoW: Limitations
•The structure is lost
–Title, sections, paragraphs, sentences, negations
•The representation is incomplete…
–If all words appear once only and there is no extra information such as
•Grammatical type
•Priors on terms (recent news, unused terms,…)
–And writing well is often about using a rich vocabulary
•Use of synonyms
•…or ambiguous
–If a word appears many times, but with different senses (polysemy)
–E.g.: mouse
•A part on mouse as a pet
•A part on computer mouse
•A part on Mickey mouse
–Does not make the text about “mouse” but maybe just about “kid
amusements”

… still works very well in practice

Slide 41
A document is represented by the occurrences of the
terms of the vocabulary
•Earlier example: Y/N occurrence
→ Number of occurrences does not matter

•Counting the term occurrences
→ Favors longer documents (more occurrences)
→ Document length normalization (frequency)

•Not all frequent terms are relevant
→ Balance between representation (TF) and discriminative power (IDF):
how good is my term, in my query, for this document?
A bit more details

Slide 42
•Term frequency (TF)
–Percentage of space taken by each term in the document:
"how much this term represents the document"
→ High TF → frequent term in the document → potential representative term (keyword)

•Inverse document frequency (IDF)
–Inverse of the percentage of space taken by each term in the collection:
"how much this term discriminates documents"
→ High DF → low IDF → term frequent in all documents → not discriminant (all documents are relevant)
E.g.: the, at, and, she, he, her… + thematic terms (patient, disease,…)

→ Good terms are terms with high TF and high IDF
TF*IDF model: intuition

Slide 43
TF*IDF term weight
•High TF (frequent in document), high IDF (infrequent in collection)
→ "Keyword": good representative (verb, noun,…)
•High TF, low IDF (frequent in collection)
→ "Stop word": does not carry meaning, "structural" (the, at, and, she,…); removed from the vocabulary / ignored
•Low TF (infrequent in document), high IDF
→ Rare term (automatically ignored)
•Low TF, low IDF
→ Low representation (automatically ignored)

When computing the difference between documents, terms are weighted by their TF*IDF factor
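A compact sketch of the TF*IDF weighting just described (one common variant; the exact normalisation and logarithm base vary between systems, and the toy documents are made up):

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "particle physics at the collider".split(),
]
N = len(docs)

def tf_idf(doc):
    counts = Counter(doc)
    weights = {}
    for term, count in counts.items():
        tf = count / len(doc)              # share of the document
        df = sum(term in d for d in docs)  # documents containing the term
        idf = math.log(N / df)             # rare term -> high IDF
        weights[term] = tf * idf           # "the" gets weight 0 here
    return weights

print(tf_idf(docs[0]))
```

Note how "the", present in every document, receives IDF = log(1) = 0 and is automatically ignored, exactly as in the stop-word quadrant above.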

Slide 44
Inverted file construction
[Diagram] Docs → extraction → sorting → compaction → factorisation
•Extraction: scan the documents and emit (term, doc ID) pairs in order of appearance, e.g. ("the", 1), ("new", 1), ("tool", 1), …, ("blah", N)
•Sorting: sort the pairs by term (then by doc ID), e.g. ("blah", 1), ("blah", N), ("new", 1), …, ("tool", 1), ("tool", N-1)
•Compaction: merge duplicate (term, doc ID) pairs into (term, doc ID, freq) entries
•Factorisation: split into the Dictionary (term, total frequency: collection information, "idf") and the Postings (doc ID, frequency: document information, "tf")
•Misc operations, e.g. posting file compression
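The same pipeline as a runnable sketch (the toy documents and variable names are illustrative, not from the slides):

```python
from collections import Counter
from itertools import groupby

docs = {1: "the new tool blah", 2: "blah tool", 3: "blah"}

# Extraction: (term, doc_id) pairs in order of appearance.
pairs = [(term, doc_id) for doc_id, text in docs.items() for term in text.split()]

# Sorting: by term, then by doc_id.
pairs.sort()

# Compaction: merge duplicate (term, doc_id) pairs into (term, doc_id, freq).
compacted = [(term, doc_id, len(list(group)))
             for (term, doc_id), group in groupby(pairs)]

# Factorisation: dictionary (term -> collection stats, for idf)
# and postings (per-document frequencies, for tf).
dictionary = Counter()
postings = {}
for term, doc_id, freq in compacted:
    dictionary[term] += freq
    postings.setdefault(term, []).append((doc_id, freq))

print(dict(dictionary))
print(postings["blah"])  # [(1, 1), (2, 1), (3, 1)]
```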

© [email protected] – University of Geneva – Multimedia Retrieval – September 2015 - 45
Value Key
1
2

20

33

Hashing / fingerprinting (intuition only)
=1
=2
=3
=4
=5
=6
=7
=1+1+2+3+4+4+5 = 20
=1+2+2+4+5+6+7+7 = 33
Hash Table
Query
 Immediate access
=1+2+2+4+5+6+7+7 = 33
•Associate a key (hash, fingerprint) to each value
•Insert the (key, value) into a hash table
Immediate retrieval of the query
Difficult constraint (Perceptual hashing) : The key does
not vary if the value changes a bit (eg due to noise)


Hash function
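A toy version of this key/value scheme, reusing the slide's sum-of-symbol-codes idea (the letter codes and example values are made up; a real system would use a far more robust hash):

```python
# Each symbol of a value gets a small code; the key is the sum of codes.
codes = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6, "g": 7}

def fingerprint(value):
    return sum(codes[symbol] for symbol in value)

table = {}
for value in ("aabcdde", "abbdeffg"):
    table[fingerprint(value)] = value   # insert (key, value): keys 20 and 33

query = "abbdeffg"
print(table[fingerprint(query)])        # immediate access: 'abbdeffg'
```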

Slide 46
What is multimedia?
Basically our digital life

•Text
–Plain text (any language)
–Structured text (XML-like, code,…)
•Visual
–Images (Photo)
–Sketches (drawing, map,..)
•Audio
–Music
–Speech
–Sound
•Misc
–3D object
–Video
–… (software, playlist,…)

+ Any combination
–PDF, Web pages and alike

Slide 47
Multimedia IR: Use Cases
•Image
–Find look-alike pictures, recognize landmark
•Video
–Find specific actions
•Music
–Find inspirational music, recognize music extract
•Medical
–Find similar cases
•Patent
–Find similar proposals, plagiarism

→ Not always doable with text

Slide 48
Volume of multimedia data
•2 billion people connected to the internet
–Mostly via mobile devices (phones)
•1.8 billion active mobile phones
•180 million active web servers

Sources:
R. Baeza-Yates – RussIR 2010
http://news.netcraft.com/

Slide 49
Volume of multimedia data
•Web scale:
–Google:
•Early 2004: 4.3 billion indexed pages
•Early 2005: 8 billion indexed pages
•Today: XX billion indexed pages?
–http://www.archive.org
•Growth: 20 TB/month

•Multimedia collections:
–Facebook: 140 billion photos
→ 180 years at 25 fps
–YouTube (2008): 83.4 million videos

•Curated collections:
–AOL Video library
•410 million views in October 2011
–Institut National de l'Audiovisuel (INA, France):
•700'000 h of audio (radio)
•400'000 h of video (television)
•2 million documents

Slide 50
The new wave
[From http://1000memories.com/blog/94-number-of-photos-ever-taken-digital-and-analog-in-shoebox]

Slide 51
Example: Information « loss »
•Annotation
–" Wasp on a yellow flower "

•Text search:
–"insect feeding" → this document will be missed

•The user must know the limitations of the annotation

•The goal is to represent the document
–Exhaustively: no specific interpretation context
–Automatically: no human intervention

Slide 52
•Image formation model

•Image features

•Indexing visual content


Visual Information Retrieval

Slide 53
Visual content has specific types
•Photo
•Medical image, satellite image
•Document: fax, scanned text…
•Drawing: sketch, blueprint, cartoon,…
•3D, shape


Visual content

Slide 54
Example: Images
[Diagram] Images → Feature extraction (e.g. color histogram) → "Appropriate" mapping → "Decision" process (search, photo collage, filtering), driven by the Query and User interaction

Slide 55
Pixels
RGB Sensor
RGB Color model
Any color = combination of RGB values
Image = 3 Arrays (RGB) of values between 0 and 255
Pixel = RGB values (color) at a position in the image

Slide 56
Image processing / Analysis
[Figures: RGB histograms; gray scale (intensity); edge image]

Slide 57
Low-level image representation
Global Color Histogram data in color spaces:




Global Texture data from Gabor filter banks:
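A minimal sketch of such a global color histogram as a feature vector (numpy; the bin count and the per-channel normalisation are design choices, not prescribed by the slides):

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram of an RGB image (H x W x 3, values 0-255).

    Each channel is quantised into `bins` bins; the concatenated,
    normalised counts form a fixed-size feature vector.
    """
    features = []
    for channel in range(3):
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        features.append(hist / hist.sum())   # normalise per channel
    return np.concatenate(features)

# Two random "images" compared by histogram (L1) distance.
a = np.random.randint(0, 256, (64, 64, 3))
b = np.random.randint(0, 256, (64, 64, 3))
print(np.abs(color_histogram(a) - color_histogram(b)).sum())
```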

Slide 58
Corner detection / keypoints

Slide 59
Local features: SIFT [Lowe 1999] and SURF [Bay et al. 2006]

Slide 60
From data to information
Interpreting the content
•Data is physically measured
•Information is interpreted semantically
→ Semantic gap

•Fusion is a way to reduce the semantic gap
•Examples
–Looking at a movie w/o audio
–Listening to a story w/o visual
–Voice conversation vs video conversation
–Disambiguation (eg „jaguar“)

Slide 61
From data to information
Semantic gap:
"Discrepancy between the level of analysis of a computer and the level of perception of the same data by a human user."

Level (high → low)    | Text      | Visual
Interpretation        | Story     | "Action"
Semantic segmentation | Paragraph | Scene
Group                 | Sentence  | Object
Physical segmentation | Word      | Homogeneous region
Physical inspection   | Character | Color, texture

Relevance for indexing increases with the level of interpretation; computer precision increases towards physical inspection; the semantic gap lies between the two.

Slide 62
Semantic gap
•Exploited by Captchas

Slide 63
Region-based image understanding
Images are compared w.r.t. the regions (hopefully objects/concepts) they contain

Object detection, e.g. face detection

Slide 64
Object detection / recognition
•Train the machine to recognize specific groups
of patterns
Faces
[Viola, Jones, 2001]
Objects
→ Automated image tagging → text search

Slide 65
Naming faces : Image to text
Specific characteristics of a face are extracted (e.g., eigenfaces)

Slide 66
Deep Learning
[http://www.slideshare.net/NVIDIA/visual-computing-the-road-ahead-an-nvidia-ces-2015-presentation-deck]

Slide 67
Bag of Visual Words
•Once basic visual features are extracted,
–E.g.: the description of the surroundings of each corner point (e.g. SIFT)
•Basic features are grouped
–E.g. corners of an object
•And "quantized"
–Many examples are gathered into one instance
•To create a "visual dictionary" of visual words

•An image is a composition (occurrences) of these visual words ("terms")
→ We can reuse the machinery developed for text
–Inverted files, etc.
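A sketch of this bag-of-visual-words construction, where a tiny k-means plays the role of the quantisation step and random vectors stand in for real SIFT descriptors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for SIFT descriptors collected over a training corpus (128-D).
training_descriptors = rng.normal(size=(1000, 128))

def kmeans(data, k=50, iters=10):
    """Tiny k-means: the centroids become the 'visual words'."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids

vocabulary = kmeans(training_descriptors)   # the "visual dictionary"

def bovw_histogram(descriptors, vocabulary):
    """Map each local descriptor to its nearest visual word and count."""
    words = np.argmin(
        ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1), axis=1)
    return np.bincount(words, minlength=len(vocabulary))

image_descriptors = rng.normal(size=(200, 128))   # one image's local features
print(bovw_histogram(image_descriptors, vocabulary))
```

The resulting histogram is exactly a term-occurrence vector, which is why the text machinery (inverted files, TF*IDF) applies directly.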

Slide 68
Comparing two documents (occurrences)
Each row is a "document" (Repository), each column a "term" (Vocabulary), now visual words:
D1: 2 1 1 2 1 0 0
D2: 1 2 0 1 1 1 2
D3: 1 2 0 1 1 1 2
D4: 1 1 0 1 1 0 0

Slide 69
Hyperbrowser [S. Craver, 1999]

Slide 70
Hyperbrowser [S. Craver, 1999]

Slide 71
Digital shoebox

Slide 72
Digital shoebox

Slide 73
Collection mining

Slide 74
•Modeling sound

•Audio information
–Speech, music, sound

•Searching for audio

Audio IR

Slide 75
Example: Audio
[Diagram] Audio → Feature extraction → "Appropriate" mapping → "Decision" process (playlist, filtering), driven by the Query and User interaction

Slide 76
Sound characteristics
Sound corresponds to waves perturbing the environment

Wave characteristics

–Frequency/Period
1 Hertz (Hz) = 1 vibration/second
→ Related notion of pitch

–Intensity
I = Power / Area = Energy / (Time × Area), in W/m^2

•The human ear has a logarithmic scale
→ Decibel (dB) scale
0 dB = 10^-12 W/m^2 (threshold of hearing)
10 dB = 10^-11 W/m^2
…
x·10 dB = 10^-(12-x) W/m^2

Slide 77
Sound perception
[Diagram] Airwave collection → mapping to mechanical waves → mapping to compressional waves → mapping to electrical signal
0 dB = 10^-12 W/m^2: threshold of hearing
160 dB = 10^-4 W/m^2: instant perforation of the eardrum

Slide 78
Sound perception
[Chart: hearing ranges over frequency f (Hz), spanning infrasound, audible sound and ultrasound]
•Human: 20 Hz - 20 kHz
•Dog: 50 Hz - 45 kHz
•Cat: 50 Hz - 85 kHz
•Bats: 50 Hz - 85 kHz (up to 120 kHz)
•Dolphins: up to 200 kHz
•Elephant: from 5 Hz

Slide 79
Audio and Sound
[Chart: intensity I (dB) vs frequency f (Hz); audible sounds span roughly 20 Hz - 20 kHz and 0 - 120 dB, with speech in the middle; below is non-audible, above is painful. Partly adapted from [Schäuble, 1997].]
Example sources, from quiet to painful: quiet library, soft whisper; quiet office, living room; light traffic at a distance, gentle breeze; conversation, sewing machine; busy traffic, noisy restaurant; subway, heavy city traffic, alarm clock at 2 feet, factory noise; truck traffic, noisy home appliances, shop tools, lawnmower; chain saw, pneumatic drill; rock concert in front of speakers, thunderclap; (140) gunshot blast, jet plane; (180) rocket launching pad

Slide 80
Audio information
•Speech
–Male, female speech
→ Signal-based speech characterisation
–Automatic speech recognition (ASR)
→ ASR (text) retrieval

•Music
→ Genre recognition (jazz, classical,…)
→ Excerpt retrieval
→ Query by humming
…

•Sound
–Car, water, people, animals,…
–Often useful in relation to visual (video)

Slide 81
Audio information
[Chart: information content (low → high) vs origin (natural → man-made)]
•High information content: speech (natural), music (man-made)
•Low information content: animal sounds, wind, water (natural); urban noise, machines, engines (man-made)

Slide 82
Audio features
•Low-level-audio representation
–Spectrogram
–MFCC


•Mid-level representations
–MIDI strings
–LPC

•High-level representation
–Automatic speech recognition (ASR)
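As an illustration of the low-level representations above, a minimal magnitude spectrogram (short-time Fourier transform; the window and hop sizes are typical but arbitrary choices):

```python
import numpy as np

def spectrogram(signal, window=512, hop=256):
    """Magnitude STFT: one FFT per (Hann-windowed) overlapping frame."""
    frames = [signal[i:i + window] * np.hanning(window)
              for i in range(0, len(signal) - window, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))   # frames x frequency bins

sr = 8000                                        # sample rate (Hz)
t = np.arange(sr) / sr                           # 1 second of samples
tone = np.sin(2 * np.pi * 440 * t)               # 440 Hz tone
S = spectrogram(tone)
print(S.shape, S[0].argmax() * sr / 512)         # peak bin ≈ 440 Hz
```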

Slide 83
Audio information indexing
[Diagram] Audio splits into Sound, Music and Speech:
•Speech → ASR, wordspotting → transcripts, speaker info → translation
•Music → piece/genre recognition → score, MIDI
•Sound → signal-based classification
•+ Visual → A/V processing

Slide 84
Speech
Speech production model ↔ speech perception model
Note: in vision, there are mostly only perception models

Slide 85
Speech production
Human apparatus
Speech production scenarios
1. Voiced sound (vocal tract)
2. Fricative sound ('S')
3. Plosive sound ('P')

Slide 86
Speech production modeling
Mechanical model
Human apparatus
Multistage process
1. Build up pressure
2. Release
3. Return to first state

Slide 87
Speech signal modeling
Given w(t) as a speech signal,

|S(w)|^2 = |H(w)|^2 · |E(w)|^2

where S(w) is the STFT of w. The model assumes H(w) is the vocal tract transfer function and E(w) the excitation.

Slide 88
Note: MP3 coding
Advanced coding model
MP3 = MPEG 1 Audio layer 3
See previous slides

Slide 89
Sony music browser

Slide 90
Islands of music [http://www.oefai.at/~elias/music]
Organising music archives

•Based on Kohonen self-organising maps (SOM)

Slide 91
ThemeFinder [http://www.themefinder.org/]

Slide 92
•Design a nice perceptual hash function for
audio (based on perceptual features)
•Slice an audio document (music piece) into
small chunks (large enough to be “unique”)
•For each chunk compute a key through the
hash function and store as value the music
piece ID (title)

Music query by example
Retrieval by noisy chunk

Audio fingerprinting
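A sketch of this chunk-hash-lookup scheme; the coarse quantisation below merely stands in for a real perceptual hash, and all data is synthetic:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def chunk_key(chunk):
    """Toy 'perceptual' hash: a coarsely quantised energy profile.

    Coarse quantisation makes the key somewhat robust to small noise.
    """
    bands = chunk.reshape(8, -1)
    profile = (np.abs(bands).mean(axis=1) * 4).astype(int)
    return tuple(profile)

# Offline: hash every chunk of every known piece into a table.
library = {"piece_a": rng.normal(size=4000), "piece_b": rng.normal(size=4000)}
table = {}
for title, signal in library.items():
    for i in range(0, len(signal) - 400, 400):
        table.setdefault(chunk_key(signal[i:i + 400]), []).append(title)

# Query: a (slightly noisy, chunk-aligned) excerpt votes for the pieces
# whose stored chunk keys it matches.
excerpt = library["piece_a"][800:1200] + rng.normal(scale=0.01, size=400)
votes = Counter(table.get(chunk_key(excerpt), []))
print(votes)   # piece_a should dominate
```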

Slide 93
•Video type
–Movies
–Amateur souvenir film
–Lectures
–YouTube (short, static)

•Combining individual media (fusion)
–each brings some information…

•Going further: social multimedia…
Multimedia IR

Slide 94
From data to information
Interpreting the content
•Data is physically measured
•Information is interpreted semantically
→ Semantic gap

•Fusion is a way to reduce the semantic gap
•Examples
–Looking at a movie w/o audio
–Listening to a story w/o visual
–Voice conversation vs video conversation
–Disambiguation (eg „jaguar“)

Slide 95
Multimodal video features
Over temporal modalities:
•Segmentation (boundary) cues
•Categorization (region) cues

[Diagram] Audio, visual and text streams are combined: a word relatedness measure (text) and visual syntactic homogeneity feed multimodal boundary-based cues, and semantic region-based cues are derived across the modalities

Slide 96
Adding context to interpretation
Video Story Segmentation:
Using fusion between the visual and audio
modalities, high-level segmentation may be
achieved

"shot" → keyframe
Audio transcript

Slide 97
Main goals of fusion
Multimodal information fusion aims to jointly interpret multiple sources of information representing the same underlying "concept"
The main goal is the extraction of information

By fusing information, one aims at:
•Being more accurate in the discovery of the "concept"
–Each individual stream may be incomplete
•Being more robust in the discovery of the "concept"
–Each individual stream may be distorted (e.g., noisy)

Slide 98
Multimodal fusion
•From many sources of information and context, how to do our best to "interpret" the data

[Diagram] data → representation (color, texture, shape, face, local features, vector space model) → interpretation
Example annotation: Port;;Naval and River;;For Transportation;;Structure;;Architecture;;Vegetable Garden;;Landscape (Farmland - Countryside);;Landscape;;Landscape (Seascape);;Landscape;;Panorama;;View of the City;;View

Slide 99
Levels of fusion
How to organise fusion, starting from classical "unimodal" interpretation?

[Diagram] Unique source: raw data → feature extraction → recognizer (+ "knowledge") → output

Slide 100
Levels of fusion
[Diagram] Multiple sources: several raw data streams → recognizer → decision → one unique output

Slide 101
Early fusion strategy
•Acts over features
•All modalities are "concatenated into one"
•Only one decision is taken over the concatenated input

[Diagram] Multiple sources: raw data streams → fusion (feature concatenation) → recognizer → decision → one unique output

Example: multimodal medical image analysis
•Sources: modalities (X-ray, PET,…)
Slide 102
Intermediate fusion strategy
•Each source acts as an input for a specific decision
•All recognizers are coupled in some way to form a meta-recognizer
•One decision is taken

[Diagram] Multiple sources: raw data streams → recognizers → meta-recognizer (fusion) → decision → one unique output

Example: video analysis
•Source: A/V streams
Slide 103
Late fusion strategy
•Each source is processed individually by a specific recognizer
•Multiple independent initial decisions are taken, possibly associated with confidence scores
•A final decision is taken based on this output

[Diagram] Multiple sources: raw data streams → recognizers → decisions → fusion → final decision → one unique output

Example: object recognition
•Source: image regions
Slide 104
Prototype-based fusion
[Diagram: a query and a document x are each described by a prototype feature vector p(x) combining heterogeneous components (e.g. SIFT features among others); fusion operates on these vectors]

Slide 105
Concept “Mona Lisa”
Multimedia is complex data
1503-1506 by L. Da Vinci
EN: Mona Lisa
IT: La Gioconda
FR: La Joconde
Visible in Le Louvre, Paris
48° 51′ 34″ N 2° 19′ 54″ E
Type: Oil on poplar
Dimensions: 77 cm × 53 cm (30 in × 21 in)
Photo by Mr. X
Date, context
Reason of trip
Variations from artists
Embedding into prints
Chocolate piece
URLs about Mona Lisa
Books about Mona Lisa
Videos about Mona Lisa
Audio about Mona Lisa
“official picture”
Texts about Mona Lisa
“Factual” data
“Faithful” data
“Related” data

Slide 106
Networked multimodal data
Data
Tags
Users

Slide 107


Features
Networked multimodal data
Data
Tags
Users

Slide 108
•Curse of dimensionality
–Sparsity
–Dimension reduction/visualisation

•Big data
–Indexing
–Visual Analytics

Complements

Slide 109
Statistics tell us that the more data we have, the better we get

[Simulation: mean and standard deviation estimates of an exponential distribution for n = 10, 100, 1'000, 3'000, 10'000]

Slide 110
Imagine I want to know the age spread of a population
•I create 10 intervals (0-10, 10-20,…) and I want to know the proportion of people in each interval
•To have a reliable estimate I want 100 persons per interval on average
→ With 10*100 = 1000 persons, I get my estimates

Now, imagine I want to estimate height against age (I add one dimension to my measure)
•I create 10 height intervals and ask: "how many people with height X are aged Y?"
•I have 10*10 = 100 such (X,Y) pairs
•To get 100 persons per pair on average I need 100*100 = 10'000 persons

And so on…
→ The size of the data needed for a reliable estimate grows exponentially with the dimension (huge!!!)
→ In practice our data size is limited
→ Our only choice to preserve reliability is to reduce the dimension

High dimensionality
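The growth in the example above, spelled out (100 persons per cell, 10 intervals per dimension):

```python
# Persons needed for ~100 per cell, with 10 intervals per dimension.
for d in range(1, 6):
    print(d, "dimensions:", 100 * 10 ** d, "persons")
# 1 -> 1000, 2 -> 10000, ..., 5 -> 10000000
```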

Slide 111
Dimension reduction: principle
•Given a set of data in an M-dimensional space, we seek an equivalent representation of lower dimension
–For better statistics
–For visualization (2D, 3D,..)
•Dimension reduction induces a loss. What to
sacrifice? What to preserve?
–Preserve local: neighbourhood, distances
–Preserve global: distribution of data, variance
–Sacrifice local: noise
–Sacrifice global: large distances
–Map linearly
–Unfold data

Slide 112
Some example techniques:
•SFC: preserve neighbourhoods
•PCA: preserve global linear structures
•MDS: preserve linear neighbourhoods
•IsoMAP: Unfold neighbourhoods
•SNE family: unfold statistically

Not studied here (but also valid):
•SOM (visualisation), LLE, Random projections
(hashing)
Dimension reduction
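For instance, a minimal PCA sketch (projection onto the top directions of variance via SVD of the centred data; this is the standard construction, not code from the lecture):

```python
import numpy as np

def pca(X, m=2):
    """Project X (n samples x M dims) onto its m main directions of variance."""
    Xc = X - X.mean(axis=0)                  # centre the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T                     # coordinates in the top-m subspace

X = np.random.default_rng(3).normal(size=(100, 10))
Y = pca(X, m=2)
print(Y.shape)   # (100, 2), e.g. for visualisation
```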

Slide 113
Non-recursive Space Filling Curves
Z-Scan Curve Snake Scan Curve
[Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]

Slide 114
Recursive Space Filling Curves
Hilbert Curve
Gray Code Curve
Peano Curve
[Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]

Slide 115
3D SFC
Z Curve Peano Curve
Hilbert Curve
N-dimensional algorithm:

A.R. Butz (April 1971). "Alternative algorithm for Hilbert's space filling curve". IEEE Trans. on Computers, 20: 424-426.
[Illustrations from the lecture “SFC in Info Viz”, Jiwen Huo, Uni Waterloo, CA]

Slide 116

PCA
[Illustration Wikipedia]

Slide 117
•A local neighbourhood graph (e.g. a 5-NN graph) is built to create a topology and ensure continuity
•Distances are replaced by geodesics (paths on the neighbourhood graph)
•MDS is applied on this interdistance matrix (e.g. with m = 2)
IsoMap (non-Euclidean)
[Illustration from http://isomap.stanford.edu]

Slide 118
•MNIST dataset
t-SNE example
[Illustration from L. van der Maaten’s website]

Slide 119
Traces of our everyday activities can be:
•Captured, exchanged (production,
communication)
•Aggregated, Stored
•Filtered, Mined (Processing)
The “V”’s of Big Data:
•Volume, Variety, Velocity (technical)
•and hopefully... Value
Raw data is almost worthless, the added value is in
juicing the data into information (and knowledge)
Big Data

Slide 120

Slide 121
Large-scale: massive data volume, too large for a sequential scan, even of a small portion

Aim:
From a query, "filter" (by fast approximate search) a tiny (fixed-size) portion of the data as candidate answers, and then perform (slow) exact search in that tiny-scale sample

Method:
Assuming the features group the documents coherently, approximately identify the vicinity of the query and search within that neighborhood
Tools for Large Scale indexing

Slide 122
Approximate search
[Diagram: the fast approximate filtering border encloses the exact search zone; the actual (slow) border delimits the final results; everything outside is never considered]

Slide 123
Idea: reference-based indexing
“If I am close to you, I see the same thing as you”

I am close to Geneva, Montreux and Evian, anything also close
to these places is close to me

•Fix a few reference documents (landmarks)
•For each document, measure the distance to each
reference document
–Eg: define the set of the 5 closest landmarks

•For a query
–Find its 5 closest landmarks
–Find all documents whose 5 closest landmarks are the same
–Actually (slow distance) compare to these
Fast approximate filtering

Slide 124
•Fix a few reference documents (landmarks)
→ Can be done offline
•For each document, measure the distance to each reference document
–Eg: define the set of the 5 closest landmarks
→ Also done offline

•For a query
–Find its 5 closest landmarks
→ Fast, since there are few landmarks
–Find all documents whose 5 closest landmarks are the same
→ Can be made fast (eg with an Inverted File)
–Actually (slow distance) compare to these
→ Fast, since the sample is tiny
Fast approximate filtering

Slide 125
Permutation-based Indexing
D = {x_1, …, x_N}: N objects
R = {r_1, …, r_n} ⊂ D: n references

Each x_i is identified by an ordered list:
L(x_i, R) = (r_{i1}, …, r_{in}) such that d(x_i, r_{ij}) ≤ d(x_i, r_{ij+1}) ∀ j = 1, …, n-1

Example with n = 5:
L(x_1, R) = (r_1, r_2, r_3, r_4, r_5)
L(x_2, R) = (r_1, r_2, r_3, r_4, r_5)
L(x_3, R) = (r_5, r_3, r_2, r_4, r_1)

The distance d(q, x) is approximated by the Spearman Footrule Distance between the permutations:
d(q, x) ≈ SFD(L(q, R), L(x, R)) = Σ_j |rank(L(q, R), r_j) - rank(L(x, R), r_j)|
Slide 126
•Assess system performance on standard challenges
–Data collection
–Annotation / ground-truth collection
–Query formulation
–Performance measures (precision, recall,…)
→ Yearly event, related to a scientific conference

In general, participants are academic teams


Evaluation

Slide 127
ImageCLEF
http://www.imageclef.org/
CLEF = Cross-Language Evaluation Forum

Slide 128
ImageCLEF Wikipedia
Images + text (overall: 237,434)
•English only: 70,127
•German only: 50,291
•French only: 28,461
•English and German: 26,880
•English and French: 20,747
•German and French: 9,646
•English, German and French: 22,899
•Language undetermined: 8,144
•No textual annotation: 239

Slide 129
TRECVid
The goal of the conference series is to encourage research in
information retrieval by providing a large test collection, uniform
scoring procedures, and a forum for organizations interested in
comparing their results.
http://trecvid.nist.gov/

Slide 130
MIREX

http://www.music-ir.org


•The Music Information Retrieval Evaluation
eXchange (MIREX) is an annual evaluation
campaign for Music Information Retrieval
(MIR) algorithms, coupled to the International
Society (and Conference) for Music
Information Retrieval (ISMIR)

Slide 131
Other
•3D Retrieval
–SHREC: Shape Retrieval Contest
–http://www.aimatshape.net/event/SHREC
•XML retrieval
–INitiative for the Evaluation of XML retrieval
–https://inex.mmci.uni-saarland.de/
•Many dedicated corpora for specific tasks
–Generic (Image-Net)
–Medical (ImageCLEF)
–Satellite imagery
–…

Slide 132
Conclusions
•Multimedia Information Retrieval extends classical IR
–Word occurrence model is preserved

•Text IR is not always sufficient / feasible
–Completeness
–Scalability

•Multimedia IR requires accurate interpretation of multimodal
data

•Fusion is required for reaching such an accurate level

•Multimedia IR still faces many challenges
–Accuracy, interactivity, scalability

Slide 133
Challenges
•Speed and accuracy of response are the major
challenges
•IR tends to be mobile
–Low computational power, connection, memory,…
•Large scale
–Big data
•Partial search
–Searching for a part of a document
•Semantic search
–Searching with high-level description

Slide 134
•Baeza-Yates R. and Ribeiro-Neto B. (1999): Modern
Information Retrieval. Addison-Wesley.
http://people.ischool.berkeley.edu/~hearst/irbook

•Baeza-Yates R. and Ribeiro-Neto B. (2011): Modern
Information Retrieval. Addison-Wesley, Second
Edition
http://www.mir2ed.org/


Recommended reading

© [email protected] – University of Geneva – Multimedia Retrieval – September 2015 - 135
Thank you!







Questions?

Slide 136
Big Data and Large-scale data

–Mohammed, H., & Marchand-Maillet, S. (2015). Scalable Indexing for
Big Data Processing. Chapman & Hall.
–Marchand-Maillet, S., & Hofreiter, B. (2014). Big Data Management
and Analysis for Business Informatics. Enterprise Modelling and
Information Systems Architectures (EMISA), 9.
–M. von Wyl, H. Mohamed, E. Bruno, S. Marchand-Maillet, “A parallel
cross-modal search engine over large-scale multimedia collections
with interactive relevance feedback” in ICMR 2011 - ACM International
Conference on Multimedia Retrieval.
–H. Mohamed, M. von Wyl, E. Bruno and S. Marchand-Maillet,
“Learning-based interactive retrieval in large-scale multimedia
collections” in AMR 2011 - 9th International Workshop on Adaptive
Multimedia Retrieval.
–von Wyl, M., Hofreiter, B., & Marchand-Maillet, S. (2012).
Serendipitous Exploration of Large-scale Product Catalogs. In 14th IEEE
International Conference on Commerce and Enterprise Computing
(CEC 2012), Hangzhou, CN.

More at http://viper.unige.ch/publications
References

Slide 137
References
Large-scale Indexing

–Mohamed, H., & Marchand-Maillet, S. (2015). Quantized Ranking for Permutation-
Based Indexing. Information Systems.
–Mohamed, H., Osipyan, H., & Marchand-Maillet, S. (2014). Multi-Core (CPU and GPU) For Permutation-Based Indexing. In Proceedings of the 7th International Conference on Similarity Search and Applications (SISAP 2014), Los Cabos, Mexico.
–H. Mohamed and S. Marchand-Maillet “Parallel Approaches to Permutation-Based
Indexing using Inverted Files” in SISAP 2012 - 5th International Conference on
Similarity Search and Applications .
–H. Mohamed and S. Marchand-Maillet “Distributed Media indexing based on MPI
and MapReduce” in CBMI 2012 - 10th Workshop on Content-Based Multimedia
Indexing.
–H. Mohamed and S. Marchand-Maillet “Enhancing MapReduce using MPI and an
optimized data exchange policy”, P2S2 2012 - Fifth International Workshop
onParallel Programming Models and Systems Software for High-End Computing.
–Mohamed, H., & Marchand-Maillet, S. (2014). Distributed media indexing based on
MPI and MapReduce. Multimedia Tools and Applications, 69(2).
–Mohamed, H., & Marchand-Maillet, S. (2013). Permutation-Based Pruning for
Approximate K-NN Search. In DEXA, Prague, CZ.

More at http://viper.unige.ch/publications

Slide 138
References
Large data analysis – Manifold learning

–Sun, K., Morrison, D., Bruno, E., & Marchand-Maillet, S. (2013).
Learning Representative Nodes in Social Networks. In 17th Pacific-Asia
Conference on Knowledge Discovery and Data Mining, Gold Coast, AU.
–Sun, K., Bruno, E., & Marchand-Maillet, S. (2012). Unsupervised
Skeleton Learning for Manifold Denoising and Outlier Detection. In
International Conference on Pattern Recognition (ICPR'2012), Tsukuba,
JP.
–Sun, K., & Marchand-Maillet, S. (2014). An Information Geometry of
Statistical Manifold Learning. In Proceedings of the International
Conference on Machine Learning (ICML 2014), Beijing, China.
–Wang, J., Sun, K., Sha, F., Marchand-Maillet, S., & Kalousis, A. (2014).
Two-Stage Metric Learning. In Proceedings of the International
Conference on Machine Learning (ICML 2014), Beijing, China.
–Sun, K., Bruno, E., & Marchand-Maillet, S. (2012). Stochastic Unfolding.
In IEEE Machine Learning for Signal Processing Workshop (MLSP'2012),
Santander, Spain.

More at http://viper.unige.ch/publications