EarGram: an Application for Interactive Exploration of Large Databases of Audio Snippets for Creative Purposes

Gilberto Bernardes, 41 slides, Nov 10, 2012

About This Presentation

Presented at CMMR 2012, London, UK.


Slide Content

earGram
concatenative sound
synthesis in Pure Data
Gilberto Bernardes FEUP INESC
Carlos Guedes FEUP INESC
Bruce Pennycook UT Austin

•brief introduction to concatenative sound
synthesis (CSS)
•description of the main modules of earGram
•details of the recombination strategies for
automatic music generation implemented in
earGram
•demo
•future work
[outline]

[concatenative sound synthesis]
[figure: target and corpus, each segmented into units described by feature vectors such as (2, 4, 100, ...) and (5, 1, 80, ...), leading to synthesis]
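The core step of concatenative synthesis is unit selection: each target unit is matched to the corpus unit whose feature vector lies closest to it, and the chosen units are then concatenated. A minimal sketch of that step, assuming plain Euclidean distance over the feature vectors (earGram's actual distance measure and feature weighting are not given here); `select_units` is a hypothetical name.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def select_units(target: list[Vector], corpus: list[Vector]) -> list[int]:
    """For every target unit, return the index of the nearest corpus unit."""
    return [min(range(len(corpus)), key=lambda i: math.dist(t, corpus[i]))
            for t in target]


if __name__ == "__main__":
    corpus = [(2, 4, 100), (5, 1, 80), (3, 3, 90)]
    target = [(5, 1, 82), (2, 5, 99)]
    print(select_units(target, corpus))  # [1, 0]
```

With raw values the third dimension dominates the distance, which is why features are normally normalised before selection.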

[earGram: design scheme]

•offline and online segmentation
•segmentation modes: beat (Dixon, 2001),
onset (Brossier, 2006; Puckette, 2005;
Brent, 2010), uniform size
•auto-mode automatically selects between
beat and onset segmentation modes,
according to the audio characteristics
[analysis: segmentation]
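Uniform-size segmentation cuts the signal every N samples, while onset segmentation cuts wherever a new event starts. The sketch below shows both, with a deliberately crude energy-rise onset detector standing in for the cited beat and onset trackers (Dixon, Brossier, Puckette, Brent); the frame size, the `ratio` threshold and all function names are illustrative assumptions, and the auto-mode decision rule is not reproduced.

```python
def uniform_segments(n_samples: int, size: int) -> list[tuple[int, int]]:
    """Split [0, n_samples) into consecutive units of `size` samples (the last may be shorter)."""
    return [(start, min(start + size, n_samples)) for start in range(0, n_samples, size)]


def frame_energies(signal: list[float], frame: int) -> list[float]:
    """Sum of squared samples for each non-overlapping frame."""
    return [sum(x * x for x in signal[i:i + frame]) for i in range(0, len(signal), frame)]


def onset_segments(signal: list[float], frame: int = 512,
                   ratio: float = 4.0, floor: float = 1e-6) -> list[tuple[int, int]]:
    """Start a new unit whenever a frame's energy exceeds `ratio` times the previous frame's."""
    energies = frame_energies(signal, frame)
    cuts = [0]
    for k in range(1, len(energies)):
        if energies[k] > floor and energies[k] > ratio * energies[k - 1]:
            cuts.append(k * frame)
    cuts.append(len(signal))
    return list(zip(cuts[:-1], cuts[1:]))
```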

•an n-dimensional vector of numerical
features represents each unit
•the feature values correspond to low- and
mid-level descriptions of the audio data
•feature vectors can be static or dynamic
[analysis: feature vector]
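One common reading of static versus dynamic features, assumed here rather than taken from earGram: a dynamic feature keeps one value per analysis frame of the unit, and a static feature summarises it with a single number. This sketch reduces each dynamic trajectory to static statistics (mean and standard deviation) so that every unit ends up with a fixed-length vector; the helper names are hypothetical.

```python
import statistics


def static_summary(trajectory: list[float]) -> tuple[float, float]:
    """Collapse a per-frame (dynamic) descriptor into (mean, population standard deviation)."""
    return statistics.fmean(trajectory), statistics.pstdev(trajectory)


def unit_vector(dynamic_features: dict[str, list[float]]) -> list[float]:
    """Concatenate the static summaries of each descriptor in a fixed (sorted) order."""
    vector: list[float] = []
    for name in sorted(dynamic_features):
        vector.extend(static_summary(dynamic_features[name]))
    return vector
```

Sorting the descriptor names keeps the vector layout identical across units, so distances compare like with like.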


[analysis: low-level features]
time-domain: amplitude, zero-crossing rate, duration
frequency-domain: centroid, flux, flatness, rolloff, spread, kurtosis, skewness, irregularity
perceptual: virtual pitch, pitch salience, roughness, multiplicity, bfcc, mfcc
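The three time-domain features are simple enough to compute directly from a unit's samples. A minimal sketch, assuming amplitude is measured as RMS and that a zero sample counts as positive when detecting sign changes (earGram's exact definitions may differ); function names are hypothetical.

```python
import math


def rms(unit: list[float]) -> float:
    """Root-mean-square amplitude of the unit."""
    return math.sqrt(sum(x * x for x in unit) / len(unit))


def zero_crossing_rate(unit: list[float]) -> float:
    """Fraction of consecutive sample pairs whose signs differ (needs at least 2 samples)."""
    crossings = sum(1 for a, b in zip(unit, unit[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(unit) - 1)


def duration(unit: list[float], sample_rate: int) -> float:
    """Unit length in seconds."""
    return len(unit) / sample_rate


def time_domain_vector(unit: list[float], sample_rate: int = 44100) -> list[float]:
    """[amplitude, zero-crossing rate, duration] for one unit."""
    return [rms(unit), zero_crossing_rate(unit), duration(unit, sample_rate)]
```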


[analysis: mid-level features]
harmonic: chroma, fundamental bass, key
rhythmic: onsets, onset rate, ioi (inter-onset interval)
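The rhythmic features follow directly from a list of onset times, and harmonic labels such as the fundamental bass are usually expressed as pitch classes. A minimal sketch, assuming onsets in seconds and a fundamental frequency in Hz rounded to the nearest equal-tempered pitch (A4 = 440 Hz); the function names are hypothetical and earGram's own chroma and key estimation is not reproduced.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def inter_onset_intervals(onsets: list[float]) -> list[float]:
    """Differences between consecutive onset times (seconds)."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def onset_rate(onsets: list[float], span: float) -> float:
    """Number of onsets per second over a span of `span` seconds."""
    return len(onsets) / span


def pitch_class(frequency_hz: float) -> str:
    """Label a frequency with the pitch class of the nearest MIDI note."""
    midi = round(69 + 12 * math.log2(frequency_hz / 440.0))
    return PITCH_CLASSES[midi % 12]
```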

•transition probability table between
consecutive units for harmony
(fundamental bass) and timbre
•meter induction
•label and group units according to their
position within the meter
•representation of the noisiness of the signal
on a meter basis
[analysis: representing the temporal evolution]
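A transition probability table can be estimated by counting which label follows each context of consecutive labels and normalising the counts. A minimal sketch, assuming units are already labelled (e.g. by fundamental-bass pitch class or timbre cluster); `order` is the context length, matching the 1st to 7th orders mentioned in the slides, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def transition_table(labels: list, order: int = 1) -> dict[tuple, dict]:
    """Map each context (tuple of `order` consecutive labels) to next-label probabilities."""
    counts: dict[tuple, Counter] = defaultdict(Counter)
    for i in range(len(labels) - order):
        context = tuple(labels[i:i + order])
        counts[context][labels[i + order]] += 1
    table = {}
    for context, nexts in counts.items():
        total = sum(nexts.values())
        table[context] = {label: n / total for label, n in nexts.items()}
    return table
```

For the sequence C, D, C, E, C, D, the first-order row for C gives D a probability of 2/3 and E a probability of 1/3.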

[analysis: representing the temporal evolution]
[figure: 1st-order and 2nd-order transition probability tables (orders 1 to 7 are supported) for the fundamental bass; states are pitch classes (C, C#, D, D#, E, ...) and the rows of the 2nd-order table are pairs of pitch classes (C C, C C#, ...); labels: spec, audibility, pitch weight, pitch salience]

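[analysis: representing the temporal evolution]
[figure: values between 0 and 1 plotted over indices 1 to 24]
states 5, 14 and 15 (binary 101, 1110 and 1111) packed into one integer, 5 bits per state:
101 << 0 = 101 (5)
1110 << 5 = 111000000 (448)
1111 << 10 = 11110000000000 (15360)
5 + 448 + 15360 = 15813
transition probability table, 1st-7th order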
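The slide's arithmetic (5, 14 and 15 shifted by 0, 5 and 10 bits and summed to 15813) packs a sequence of states into a single integer that can serve as a table key. A minimal sketch of that packing and its inverse, assuming 5 bits per state (so states 0 to 31) and that the i-th state in the list is shifted by i × 5 bits; the function names are hypothetical.

```python
BITS_PER_STATE = 5  # 5 bits allow state labels 0..31


def pack_context(states: list[int], bits: int = BITS_PER_STATE) -> int:
    """Pack states into one integer; states[i] is shifted left by i * bits."""
    key = 0
    for i, state in enumerate(states):
        if not 0 <= state < (1 << bits):
            raise ValueError(f"state {state} does not fit in {bits} bits")
        key |= state << (i * bits)
    return key


def unpack_context(key: int, length: int, bits: int = BITS_PER_STATE) -> list[int]:
    """Recover `length` states from a packed key."""
    mask = (1 << bits) - 1
    return [(key >> (i * bits)) & mask for i in range(length)]


if __name__ == "__main__":
    print(pack_context([5, 14, 15]))  # 15813
```

Because the bit fields do not overlap, OR-ing the shifted values equals summing them, which is what the slide does.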

[analysis: representing noisiness]
histogram presenting the normalized
distribution of the noisiness of the units
over the length of a bar (e.g. 4/4, beats 1 2 3 4)
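A bar-level noisiness histogram can be built by grouping units by their metrical position and averaging their noisiness. A minimal sketch, assuming each unit already carries a 1-based position in the bar and a scalar noisiness value, and assuming "normalized" means the bins sum to 1 (the slide does not say which normalisation is used); the function name is hypothetical.

```python
def noisiness_histogram(units: list[tuple[int, float]], positions: int = 4) -> list[float]:
    """units: (position in bar, 1-based; noisiness) pairs. Returns the mean noisiness per
    position, rescaled so the bins sum to 1 (all zeros if no unit has any noisiness)."""
    sums = [0.0] * positions
    counts = [0] * positions
    for position, noisiness in units:
        sums[position - 1] += noisiness
        counts[position - 1] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    return [m / total for m in means] if total else means
```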

[visualizations]

[dimensionality reduction]
[figure: star coordinates with eight axes C1 to C8 and axis vectors d1 to d8; the 8-dimensional point P = (24, 26, 20, 50, 15, 60, 38, 20) is plotted as a single 2-D point]
•star coordinates (Kandogan,
2001)
•requires interactive and visual
exploration of the corpus to
extract meaningful
information (notably clusters)
•scaling an axis changes its contribution
to the resulting visualization
•rotating axes induces correlation
between data columns
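In star coordinates (Kandogan, 2001) each dimension j gets an axis vector d_j in the plane, and a point is drawn at the sum of its normalised values times those vectors; scaling lengthens or shortens an axis, and rotation changes its angle. A minimal sketch, assuming min-max normalisation per dimension and axes initially spread evenly around the circle; the function and parameter names are hypothetical.

```python
import math


def star_coordinates(points, scales=None, rotations=None):
    """Project n-dimensional points to 2-D. Axis j has length scales[j] and angle
    2*pi*j/n + rotations[j] (radians); each point maps to the sum of its min-max
    normalised values times the axis vectors."""
    n = len(points[0])
    scales = scales or [1.0] * n
    rotations = rotations or [0.0] * n
    axes = [(scales[j] * math.cos(2 * math.pi * j / n + rotations[j]),
             scales[j] * math.sin(2 * math.pi * j / n + rotations[j])) for j in range(n)]
    lows = [min(p[j] for p in points) for j in range(n)]
    highs = [max(p[j] for p in points) for j in range(n)]
    projected = []
    for p in points:
        x = y = 0.0
        for j, (ax, ay) in enumerate(axes):
            span = highs[j] - lows[j]
            v = (p[j] - lows[j]) / span if span else 0.0
            x += v * ax
            y += v * ay
        projected.append((x, y))
    return projected
```

A point whose normalised values are equal on every axis lands at the origin, because evenly spaced axis vectors cancel out; this is one reason interactive scaling and rotation are needed to pull clusters apart.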

[clustering]
•k-means: number of clusters
•QT-clustering: cluster diameter, minimum number of units per cluster
•DBSCAN: neighborhood proximity threshold, minimum number of units per cluster
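DBSCAN needs exactly the two parameters listed for it: a neighbourhood proximity threshold (`eps`) and a minimum number of units per cluster (`min_pts`). A minimal sketch of the standard algorithm over unit feature vectors, assuming Euclidean distance and counting a unit as its own neighbour; it is a textbook version, not earGram's Pd implementation.

```python
import math


def dbscan(points, eps: float, min_pts: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    unvisited, noise = None, -1
    labels = [unvisited] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not unvisited:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not unvisited:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # core point: its neighbours join too
                queue.extend(more)
        cluster += 1
    return labels
```

Unlike k-means, the number of clusters is not fixed in advance, and isolated units are reported as noise.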

[database]
•collection of arrays (one array per feature)
•can be saved into a text file (.txt) and
loaded later, so that the time-consuming
database construction does not have to be
repeated
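Storing the database as one array per feature makes a plain-text dump straightforward. A minimal sketch, assuming a hypothetical one-line-per-feature layout (feature name followed by its values, one value per unit); earGram's actual .txt format is not described in the slides.

```python
from pathlib import Path


def dump_database(features: dict[str, list[float]]) -> str:
    """One line per feature: the feature name followed by its values."""
    return "".join(name + " " + " ".join(repr(v) for v in values) + "\n"
                   for name, values in features.items())


def parse_database(text: str) -> dict[str, list[float]]:
    """Inverse of dump_database; blank lines are ignored."""
    features: dict[str, list[float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, *values = line.split()
        features[name] = [float(v) for v in values]
    return features


def save_database(path: str, features: dict[str, list[float]]) -> None:
    Path(path).write_text(dump_database(features), encoding="utf-8")


def load_database(path: str) -> dict[str, list[float]]:
    return parse_database(Path(path).read_text(encoding="utf-8"))
```

`repr` is used for the values because it round-trips Python floats exactly; feature names are assumed to contain no whitespace.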

[performance]
spaceMap
shuffMeter
infiniteMode
soundscapeMode
stepSeq

•allows intuitive and
interactive exploration of
the corpus
•extended granular
synthesizer with refined
control
•dynamic parameters: gain,
density of events, pitch
deviations, and panning
[performance: spaceMap]
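The four dynamic parameters map naturally onto per-grain values. A minimal sketch of how a spaceMap-like player might turn a cursor position on the 2-D map into grain events, assuming (hypothetically) that the unit nearest the cursor is played, that pitch deviation is a random offset in semitones within ±`pitch_dev`, and that panning spreads randomly around the centre; none of these names or rules come from earGram.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Grain:
    onset: float  # seconds from the start of the window
    unit: int     # index of the corpus unit to play
    gain: float
    rate: float   # playback-rate multiplier derived from the pitch deviation
    pan: float    # 0 = left, 1 = right


def grains(cursor, positions, window=1.0, gain=0.8, density=10.0,
           pitch_dev=0.0, pan_spread=0.0, rng=None) -> list[Grain]:
    """Schedule `density` grains per second of the unit nearest to `cursor`."""
    rng = rng or random.Random()
    nearest = min(range(len(positions)), key=lambda i: math.dist(cursor, positions[i]))
    events = []
    for k in range(int(density * window)):
        semitones = rng.uniform(-pitch_dev, pitch_dev)
        pan = min(1.0, max(0.0, 0.5 + rng.uniform(-pan_spread, pan_spread) / 2))
        events.append(Grain(onset=k / density, unit=nearest, gain=gain,
                            rate=2 ** (semitones / 12), pan=pan))
    return events
```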

[performance: infiniteMode]
•extends the audio source
infinitely while
retaining its structural
qualities
•the user must select and rank
the characteristics he/she
wants to evolve over time
•is suitable for both
soundscapes and
polyphonic “concert”
music

[performance: infiniteMode]
1 meter, 2 harmony, 3 timbre, 4 noisiness

     C    C#   D    D#   ...
C    0.1  0    0    0
C#   0    0    0    0
D    0.3  0    0.1  0
D#   0    0    0    0
E    0.4  0    0.4  0
...

     123  124  125  126  ...
123  0    0.3  0    0
124  0    0    0    0
125  0    0    0.1  0
126  0    0.1  0    0
127  0.5  0    0    0
...

[performance: infiniteMode]
meter ∩ harmony ∩ noisiness
previously selected unit: 3
[animation: candidate units after successive filtering steps]
4, 8, 2, 12, 13, 15, 7, 0, 14, 3, 9, 10, 11, 5, 1, 6, 16
4, 8, 2, 13, 15, 7, 0, 3, 9, 10, 11, 5, 1
4, 8, 2, 13, 0, 3, 9, 10, 11, 1
4, 8, 2, 13, 0, 3, 9, 11, 1
4, 8, 2, 0
from the remaining units, the algorithm selects the one
closest to the previously played unit
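The selection narrows the candidate set by intersecting the units allowed by each ranked characteristic, then picks the remaining unit closest to the previous one. A minimal sketch under stated assumptions: each characteristic supplies a set of allowed units (e.g. those with non-zero transition probability from the previous unit), sets are applied in rank order, a characteristic that would leave no candidates is skipped, and the previous unit itself is excluded; these rules and the name `next_unit` are hypothetical, not earGram's documented behaviour.

```python
import math


def next_unit(previous: int, candidate_sets: list[set[int]],
              features: list[tuple[float, ...]]) -> int:
    """candidate_sets are ordered by the user's ranking (most important first)."""
    remaining = set(range(len(features))) - {previous}
    for allowed in candidate_sets:
        narrowed = remaining & allowed
        if narrowed:  # assumption: skip a criterion that would empty the set
            remaining = narrowed
    # Closest in feature space; ties broken by the lower unit index.
    return min(remaining, key=lambda u: (math.dist(features[u], features[previous]), u))
```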

[performance: shuffMeter]
•creates patterns stochastically
according to a preassigned time
signature and metrical level
•Clarence Barlow’s Indispensability
algorithm defines a template that
is used as a target phrase
•the template is assigned to spectral
flux and loudness
•adaptation during performance
according to two parameters:
roughness and loudness
•roughness scales the template
towards its mean value and
loudness scales the template up to
a maximum value
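The two performance parameters can be read as interpolations applied to the template. A minimal sketch, assuming roughness in [0, 1] linearly pulls every value towards the template's mean and loudness in [0, 1] linearly pushes every value towards a maximum; the indispensability template itself is taken as input (the example values are hypothetical), and `adapt_template` is a hypothetical name.

```python
def adapt_template(template: list[float], roughness: float = 0.0,
                   loudness: float = 0.0, maximum: float = 1.0) -> list[float]:
    """Flatten the template towards its mean (roughness), then raise it towards
    `maximum` (loudness). Both amounts are interpolation factors in [0, 1]."""
    mean = sum(template) / len(template)
    flattened = [t + roughness * (mean - t) for t in template]
    return [t + loudness * (maximum - t) for t in flattened]


if __name__ == "__main__":
    template = [1.0, 0.0, 0.5, 0.5]  # hypothetical normalised template for four pulses
    print(adapt_template(template, roughness=0.5))  # [0.75, 0.25, 0.5, 0.5]
```

At roughness 1 every pulse gets the same target, so the metrical accent pattern disappears; at loudness 1 every pulse reaches the maximum.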

[performance: soundscapeMode]
•dynamic control over
soundscape generation
•density (vertical axis) sets the
number of units played
simultaneously, from 1 to 5
•sharpness (horizontal axis)
defines the target to be
synthesized according to the
diversity and stability of the
units, which is translated into a
value in the spectral flux space

[performance: soundscapeMode]
density = 3
given a group of units that satisfies the query for
the next unit, we select the one that minimizes the
distance in Bark space to the previously
selected unit
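The query combines the two axes: sharpness fixes a target spectral-flux value, and among the units near that value the one closest in Bark space to the previous unit is chosen. A minimal sketch, assuming a linear mapping of sharpness onto the corpus' flux range, a hypothetical relative `tolerance` window, a Bark-band vector per unit (e.g. BFCCs), and that each of the `density` simultaneous voices would run this selection independently; all names are hypothetical.

```python
import math


def target_flux(sharpness: float, flux: list[float]) -> float:
    """Map sharpness in [0, 1] linearly onto the corpus' spectral-flux range (assumption)."""
    lo, hi = min(flux), max(flux)
    return lo + sharpness * (hi - lo)


def next_soundscape_unit(previous: int, sharpness: float, flux: list[float],
                         bark: list[tuple[float, ...]], tolerance: float = 0.1) -> int:
    """Pick the unit nearest (in Bark space) to `previous` among units whose flux lies
    within `tolerance` (fraction of the flux range) of the sharpness target."""
    target = target_flux(sharpness, flux)
    span = (max(flux) - min(flux)) or 1.0
    others = [u for u in range(len(flux)) if u != previous]
    group = [u for u in others if abs(flux[u] - target) <= tolerance * span]
    if not group:  # fall back to the unit whose flux is closest to the target
        group = [min(others, key=lambda u: abs(flux[u] - target))]
    return min(group, key=lambda u: math.dist(bark[u], bark[previous]))
```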

•concatenating units with a short overlap or
through a phase vocoder
•remaining discontinuities between
concatenated units are reduced by a spectral
compander (+spectralcompand~ from the
SoundHack Plugins Bundle)
[synthesis]
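Concatenation with a short overlap amounts to crossfading the tail of one unit into the head of the next. A minimal sketch, assuming an equal-power (cosine/sine) crossfade over `overlap` samples; the phase-vocoder path and the spectral compander are not reproduced, and `concatenate` is a hypothetical name.

```python
import math


def concatenate(units: list[list[float]], overlap: int) -> list[float]:
    """Join units, overlapping each junction by up to `overlap` samples with an
    equal-power crossfade (fade-out gain cos, fade-in gain sin)."""
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        for k in range(n):
            theta = (k + 0.5) / n * math.pi / 2
            idx = len(out) - n + k
            out[idx] = out[idx] * math.cos(theta) + unit[k] * math.sin(theta)
        out.extend(unit[n:])
    return out
```

An equal-power curve keeps perceived loudness roughly constant when the overlapping units are uncorrelated; for very similar units a linear crossfade avoids the slight level bump that equal-power fades produce.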

•studio experimentation and live performances
ranging from installations to polyphonic concert
music
•computer-assisted composition
•computer-assisted improvisation
•interactive music systems
•collecting statistics from a corpus for
musicological purposes
[applications]

[eargram.modules]
•modularity
•PD-friendly
•expert users
•multiple corpora
at once

•meaningful representations of the corpus (P. Schaeffer’s
Typo-morphology and D. Smalley’s Spectromorphology)
•more robust rhythmic descriptions
•new performance modes: stepSeq
•bridging the playing modes to interactive music systems
[future work]

•https://sites.google.com/site/eargram/
•code (released under the GNU GPL)
•examples
•documentation
[website]


Brent, W.: A Timbre Analysis and Classification Toolkit for Pure Data. In: Proceedings
of the ICMC, New York, USA (2010)

Brossier, P.: Automatic Annotation of Musical Audio for Interactive Applications.
Ph.D. thesis, Queen Mary, University of London (2006)

Dixon, S.: An interactive beat tracking and visualization system. In: Proceedings of
the ICMC (2001)

Inselberg, A.: Parallel Coordinates: Visual Multidimensional Geometry and Its
Applications. Springer (2009)

Kandogan, E.: Visualizing Multi-dimensional Clusters, Trends, and Outliers using
Star Coordinates. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2001)

Pure Data, http://www.puredata.info

SoundHack Plugins Bundle, http://soundhack.henfast.com/
[references]

[questions]
[thank you]
[comments]