Data Science on Music Data
Claus Weihs
TU Dortmund, Germany
Data Science in Context, TU Dortmund, January 2025
Reference: Claus Weihs, Dietmar Jannach, Igor Vatolkin and Günter Rudolph (Eds.) (2017):
Music Data Analysis: Foundations and Applications; CRC Press, Taylor & Francis
Contents
Overview
- Domain: applications; music and mathematics
- Data: understanding music data
- Methods: classification and evaluation
- Analyses A-E: pitch, instruments, onsets, automatic transcription, genres
- Conclusion
Domain: Applications: Genres and Transcription
- ...which specify genres set by listeners.
- What discriminates different genres?
- Can classical music be distinguished from modern entertaining music?
- Can we transcribe the score from audio only?
- What can Data Science contribute here?
Domain: Music and Mathematics: History (Bethge, 2003) I
500 BC: Far ahead of his time, the philosopher Pythagoras described an astonishing relationship between mathematics and music:
Striking a string (on a 'monochord') shortened to 3/4 leads to a fourth, shortened to 2/3 to a fifth, and shortened to 1/2 to an octave higher than for the full length.
Figure 1: Monochord with string-length ratios 1:1, 3:4, 2:3, and 1:2 (string length l).
17th century: The French monk and mathematician Mersenne found a physical explanation of Pythagoras' findings.
Generating sound with strings up to 40 m long, he counted their oscillations.
Actually, an octave oscillated twice as fast as the corresponding fundamental tone.
Domain: Music and Mathematics: History (Bethge, 2003) II
- Indeed, the basic musical signals, the tones, are nothing but uniform vibrations of air molecules.
- This differentiates music from noise.
1822: The mathematician Fourier presented his famous theorem that periodic functions can be represented by sums of sines and cosines.
1843: Since the physicist Ohm applied this property to acoustics, frequencies of sine waves have been used as the representation of pitches (tone heights) (cp. Fig. 2).
Domain: Music and Mathematics: History (Bethge, 2003) III
Figure 2: Particle compression and rarefaction alternate regularly.
Domain: Music and Mathematics: Tones
- Overtones: Moreover, most 'real' tones comprise superimposed vibrations of several frequencies, i.e. the fundamental, whose pitch we perceive, and its multiples (partial tones, overtones, Obertöne (Helmholtz, 1863)), which establish the timbre of the tone (cp. the sketch below).
- Separating such superimposed vibrations of, e.g., one instrument, is the capacity of human hearing.
- 4 dimensions: Tones have the (not quite independent) dimensions tone height (pitch), loudness, duration, and timbre.
- The frequency of the fundamental determines the pitch.
- Tones with the same pitch are identified as the same in musical notation.
- We perceive the different intensities of the overtones as different timbres.
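To make this concrete, here is a minimal base-R sketch (not from the slides) that synthesizes a tone as the sum of a fundamental and five overtones with decaying intensities and reads the strongest frequencies off its spectrum; the 440 Hz fundamental, the number of partials, and the chosen intensities are illustrative assumptions only.

```r
## Sketch: a 'real' tone as superimposed vibrations of a fundamental and its
## integer multiples. All parameter values are illustrative assumptions.
sr   <- 44100                                    # sampling rate in Hz
N    <- sr / 2                                   # 0.5 seconds of samples
t    <- (0:(N - 1)) / sr
f0   <- 440                                      # fundamental frequency (perceived pitch)
amps <- 1 / (1:6)                                # decaying intensities of the partials (timbre)
tone <- rowSums(sapply(1:6, function(k) amps[k] * sin(2 * pi * k * f0 * t)))

## spectrum of the tone: the strongest frequencies are f0 and its multiples
power <- Mod(fft(tone))[1:(N / 2)]^2
freqs <- (0:(N / 2 - 1)) * sr / N                # frequency of each bin in Hz
head(freqs[order(power, decreasing = TRUE)], 6)  # 440, 880, 1320, 1760, 2200, 2640
```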
Domain: Music and Mathematics: Tones
- Instruments can generally be distinguished by their overtone distribution.
- As an example, compare the spectra of ... and an E-guitar tone (Figure 3).
Figure 3: Spectra of the two example tones.
Data: Understanding Music Data
- We want to derive the tone dimensions pitch, loudness, duration, and timbre from audio.
- The audio signal is analyzed in small time windows in order to identify changes quickly.
- The audio characteristics are based on the so-called spectrum, the power of the different frequencies in the time window.
- The spectrum is the basis for the determination of pitch, rhythm, harmonies, and timbre.
- The windowed periodogram (spectrum, magnitudes of the Fourier transform) is used for pitch estimation.
- Spectral Flux is a rhythm feature, defined as the magnitude of the spectral changes in neighboring time windows. In this way, onsets can often be identified very well (cp. Analysis C: Onsets; a small sketch of both features follows below).
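As a rough illustration of these two features, the following base-R sketch computes windowed magnitude spectra (a simple windowed periodogram) and a Spectral-Flux-like function as the summed positive spectral change between neighboring windows; the window length, hop size, and half-wave rectification are assumptions, not the exact definitions used in the book.

```r
## Sketch (base R, assumed window/hop sizes): windowed spectrum and Spectral Flux.
spectrogram <- function(x, win = 1024, hop = 512) {
  starts <- seq(1, length(x) - win + 1, by = hop)
  hann   <- 0.5 * (1 - cos(2 * pi * (0:(win - 1)) / (win - 1)))  # Hann window
  sapply(starts, function(s)
    Mod(fft(x[s:(s + win - 1)] * hann))[1:(win / 2)])  # one magnitude spectrum per column
}

spectral_flux <- function(S) {
  d <- diff(t(S))            # spectral change between neighboring time windows
  d[d < 0] <- 0              # keep increases in energy only (a common choice)
  rowSums(d)                 # one flux value per pair of neighboring windows
}

## usage on the synthesized 'tone' from the sketch above (or any audio vector):
## odf <- spectral_flux(spectrogram(tone))
```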
Data: Understanding Music Data
- The chromagram is a harmony feature representing the spectral energy of all 12 halftones in a time window.
- Mel Frequency Cepstral Coefficients (MFCCs) are often used in speech recognition, but are also helpful for timbre recognition in music.
The essential idea is to transform the acoustic frequencies to perceived pitches on the so-called mel scale (mel from melody).
This scale shows that for a frequency higher than 1000 Hz, more than a doubling of the acoustic frequency is necessary to double the perceived pitch.
For lower frequencies the perceived pitches are proportional to the acoustic frequencies (see the sketch below).
- MFCCs are used, e.g., for instrument recognition (cp. Analysis B: Instruments), but also for more complex tasks like genre classification (cp. Analysis E: Genres).
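A common formula for the mel scale (one of several conventions, and an assumption here since the slide does not give it) is mel = 2595 · log10(1 + f/700); the short R sketch below shows the behavior described above: roughly proportional below 1000 Hz, strongly compressed above.

```r
## Hedged sketch of the mel scale; the 2595 * log10(1 + f/700) formula is one
## common convention, not taken from the slides.
hz_to_mel <- function(f) 2595 * log10(1 + f / 700)

hz_to_mel(c(500, 1000, 2000, 4000))
## approx. 607, 1000, 1521, 2146 mel: doubling 1000 Hz -> 2000 Hz raises the
## perceived pitch by much less than a factor of 2
```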
Analysis: Overview
Here is an overview of the exemplary results of our music data analyses shown in the following:
- Preparations
- Analyses:
  A: pitch estimation,
  B: instrument recognition,
  C: onset detection,
  D: automatic transcription, and
  E: genre determination.
Methods I: Classification
Let us start by mentioning the main type of statistical methods / models used in our analyses: classification rules.
- Classes like instruments (piano / guitar), onset time (yes / no), and genres (classic / non-classic) should be predicted by so-called influencing features.
- Such classification rules are used for the prediction of an unknown class from known new values of the influencing features.
- For rule construction, data have to be observed for the true classes together with the corresponding influencing features for n subjects / objects (learning sample); see the sketch below.
- Examples with 2 classes are the distinction of 2 music instruments like piano and guitar, onset yes / no in some time window, or the genres classic and non-classic.
- In what follows we concentrate on 2 classes (cp., however, the analysis below).
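As a generic illustration of such a rule (not one of the rules from the analyses below), the following R sketch learns a 2-class classification rule from a simulated learning sample with a decision tree (rpart) and predicts the class for new feature values; the data and feature names are made up.

```r
## Sketch: learning a 2-class classification rule from a learning sample
## (simulated data, hypothetical feature names).
library(rpart)
set.seed(1)
n     <- 200
learn <- data.frame(
  class = factor(rep(c("piano", "guitar"), each = n / 2)),
  feat1 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 1.5)),
  feat2 = c(rnorm(n / 2, mean = 1), rnorm(n / 2, mean = 0))
)

## learn a classification rule (decision tree) from the learning sample
fit <- rpart(class ~ ., data = learn, method = "class")

## predict the unknown class from known new values of the influencing features
predict(fit, newdata = data.frame(feat1 = 1.4, feat2 = 0.1), type = "class")
```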
Methods II: Evaluation
Classification rules should be evaluated concerning their prediction / generalization ability.
- To assess the predictive power of classification rules, we have to test them on new data, i.e. data not used for rule construction.
- For this, we split the learning data set into training data used for rule learning and so-called hold-out data, often also called the test set.
- Repeated splitting of this kind is called resampling.
- In cross-validation, e.g., the learning data are split into more than 2 parts, and all parts but one are used for learning and only one for testing.
This way, the class of each observation is predicted once and the error rate can be calculated.
An example is 10-fold cross-validation, where the learning data are split into 10 (if possible equally sized) parts (folds); a sketch follows below.
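A minimal R sketch of 10-fold cross-validation for such a rule, again on simulated data: each observation is predicted exactly once by a tree that did not see it, and the error rate is the share of wrong predictions.

```r
## Sketch: 10-fold cross-validation of a decision-tree rule (simulated data).
library(rpart)
set.seed(1)
n     <- 200
learn <- data.frame(class = factor(rep(c("classic", "non-classic"), each = n / 2)),
                    feat1 = c(rnorm(n / 2, 0), rnorm(n / 2, 1.5)),
                    feat2 = c(rnorm(n / 2, 1), rnorm(n / 2, 0)))

folds <- sample(rep(1:10, length.out = n))       # random assignment to 10 folds
pred  <- character(n)
for (k in 1:10) {
  fit_k <- rpart(class ~ ., data = learn[folds != k, ], method = "class")
  pred[folds == k] <- as.character(predict(fit_k, learn[folds == k, ], type = "class"))
}
mean(pred != learn$class)                        # 10-fold cross-validated error rate
```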
Analysis A: Pitch
- Pitch estimation is based on a so-called auditory model, approximating the properties of the human ear by relatively simple mathematical / statistical models.
- The usage of only 30 channels (instead of the approx. 3000 hair sensory cells in the human ear) appears to be justifiable.
- The channels cover frequencies between ... and 3000 Hz, defining which frequency is stimulated most.
- Pitch experiment
As learning data we use 100 tones with uniformly random fundamental for each of the 30 channels, i.e. 3000 stratified random fundamentals.
Variation of the number (1, 3, 6, 10) and strength of overtones (probability of 0 dB = 0, 0.2, 0.5) and the duration of tones (0.2, 0.05 secs) leads to overall 24 · 3000 = 72,000 different tones.
Analysis A: Pitch
For each channel and each tone property we derive an individual classification model by means of a decision tree, to decide whether this channel contributes to the frequency spectrum of the perceived tone (2 classes!).
Evaluation of each of the 720 = 24 · 30 decision trees is done via 10-fold cross-validation.
A classification rule has, e.g., the following form:
If the strongest frequency in a channel has a very high spectral energy and the spectral energy of the next-strongest higher frequency is low, then identify the strongest frequency as the frequency of (an overtone of) the perceived tone.
From the identified (overtone) frequencies, the frequency of the fundamental is identified (a sketch of such a rule follows below).
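A hedged illustration of the shape of such a per-channel rule, not the trees actually learned: the two energy thresholds and the simplified lookup of the next-strongest higher frequency are assumptions.

```r
## Sketch of the rule shape (hypothetical thresholds, simplified logic).
channel_overtone <- function(freq, energy, high = 0.8, low = 0.2) {
  i_max  <- which.max(energy)                    # strongest frequency in the channel
  higher <- which(freq > freq[i_max])            # candidates above the strongest frequency
  next_e <- if (length(higher) > 0) max(energy[higher]) else 0
  if (energy[i_max] > high && next_e < low)
    freq[i_max]                                  # frequency of (an overtone of) the perceived tone
  else
    NA                                           # channel does not contribute
}

## usage with a toy channel spectrum:
channel_overtone(freq = c(430, 440, 450, 460), energy = c(0.05, 0.95, 0.10, 0.08))
```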
Analysis A: Pitch
- Table 1 shows the error rates for the identification of the fundamental frequency over the 24 models for each of the 30 channels (Model Evaluation).
Table 1: Error rates per channel (Weihs et al. (2012))
channel         1    2    3    4    5    6    7    8    9   10
error rate (%) 0.2  0.4  0.5  0.5  0.3  0.3  0.3  0.4  0.6  0.7
channel        11   12   13   14   15   16   17   18   19   20
error rate (%) 0.6  0.5  0.7  0.7  0.6  0.9  1.0  1.2  1.4  1.3
channel        21   22   23   24   25   26   27   28   29   30
error rate (%) 1.4  1.3  1.4  1.4  1.3  1.4  1.3  1.1  1.0  1.1
Analysis B: Instruments
- Tones are not fully characterized by the dimensions pitch, loudness, and duration.
- For example, a piano sounds clearly different from a trumpet.
- This difference is captured by the 'dimension' of timbre.
- Tones with the same pitches and loudnesses can have different timbre.
- Timbre is characteristic for the tones of an instrument or instrument type.
- For the analysis of a music piece it is often important which instruments are played.
Analysis B: Instruments
- Instruments experiment
Which tone properties are decisive to decide whether a piano is playing or a guitar?
A corresponding classification rule should assign an instrument to certain timbre properties.
For this, we need features characterizing timbre.
We use, inter alia, the first 13 MFCCs of the 5 different parts ('blocks') of a temporal partition of a tone.
We determined a classification rule by Linear Discriminant Analysis from 4309 guitar and 1345 piano tones, with only 5.5% 5-fold cross-validated prediction error (Model Evaluation) (Weihs et al. (2014)); a sketch of the LDA step follows below.
Model / Variable Selection: In the rule, the beginning and the end of a tone appeared to be most important (blocks 1 and 5) (cp. Figure 4).
This is because, e.g., the tone generation (striking of a guitar string and hammer stroke on a piano string) is most important for the distinction of these instruments.
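The classification step itself can be sketched with MASS::lda; the feature matrix below is simulated (not the 4309 guitar and 1345 piano tones), and the resubstitution error at the end merely stands in for the 5-fold cross-validation actually used.

```r
## Sketch: Linear Discriminant Analysis on (simulated) MFCC-like features.
library(MASS)
set.seed(1)
n_g <- 300; n_p <- 100; p <- 13                  # simulated tone counts, 13 features
mfcc  <- rbind(matrix(rnorm(n_g * p, mean = 0), nrow = n_g),   # 'guitar' tones
               matrix(rnorm(n_p * p, mean = 1), nrow = n_p))   # 'piano' tones
instr <- factor(rep(c("guitar", "piano"), c(n_g, n_p)))

fit  <- lda(mfcc, grouping = instr)              # learn the discriminant rule
pred <- predict(fit, mfcc)$class                 # predicted instruments
mean(pred != instr)                              # error rate (here: resubstitution only)
```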
Analysis C: Onsets
- Onset detection is based on a so-called onset detection function (ODF). Finding an adequate ODF is the "Art" part of the analysis!
- For each time window, the ODF is calculated.
- The ODF is then compared to a threshold.
- If the ODF exceeds the threshold, this indicates the beginning of a new tone (onset); see the sketch below.
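A base-R sketch of this thresholding step on a simulated ODF (the Spectral Flux sketch above could supply a real one); the threshold value and the simulated onset positions are assumptions.

```r
## Sketch: onsets as rising edges of an ODF above a threshold (simulated ODF).
set.seed(1)
odf <- pmax(rnorm(80, mean = 0.05, sd = 0.03), 0)   # mostly small flux values
odf[c(10, 30, 55)] <- c(0.9, 0.7, 0.8)              # three simulated tone onsets

threshold <- 0.4
above  <- odf > threshold
onsets <- which(above & !c(FALSE, head(above, -1))) # windows where the ODF first exceeds it
onsets                                              # estimated onsets: windows 10, 30, 55
```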
Analysis C: Onsets
- Onsets experiment
We compare onset detection for piano and flute.
We use the normalized Spectral Flux as ODF.
The threshold is chosen in two ways (dot-dash line, dashed line).
Figure 5 illustrates the effect of the threshold. True onsets are indicated by vertical lines.
The figure demonstrates the importance of the correct choice of the threshold.
With the dashed-line threshold some onsets of the flute are not detected.
With the dot-dash line some wrong onsets are found.
Figure 5 also shows that the above simple onset detection with the ODF Spectral Flux works very well for instruments with well-defined tone onsets (like the piano).
- The Spectral Flux feature achieves remarkable results for almost all musical instruments.
Analysis C: Onsets
Figure 5: Normalized Spectral Flux ODF with the two thresholds, applied to piano (left) and flute (right); true onsets are marked by vertical lines.
Analysis D: Automatic Transcription
- Automatic transcription combines
onset and tone duration detection (Analysis C) and
pitch estimation between the identified onsets (Analysis A), as well as
identification / assumption of the time signature (like 3/4 or 4/4) and
(so-called) quantization of the identified tones into measures.
- For polyphonic music, the individual tones additionally have to be identified and assigned to the different instruments.
- Instrument recognition (Analysis B) is helpful for understanding whether an instrument is currently playing or not.
Analysis D: Automatic Transcription
- Transcription of monophonic music is already quite well understood.
- Transcription experiment
We aim to transcribe the first part of Händel's Tochter Zion sung by a professional female singer.
Figure 6 shows two transcriptions, one by our R software tuneR (b) and one by the commercial software Melodyne (c); a sketch of a tuneR-based pipeline follows below.
Actually, both transcriptions are not optimal compared to the original score (a).
However, you cannot be sure that the singer exactly reproduced the score!
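For orientation, here is a hedged sketch of a monophonic pipeline with the tuneR package mentioned on the slide (readWave, mono, periodogram, FF, noteFromFF); the file name and window sizes are hypothetical, and argument details may differ between tuneR versions, so treat this as an assumption-laden outline rather than the authors' actual script.

```r
## Hedged sketch of a monophonic transcription pipeline with tuneR; file name,
## window sizes, and argument details are assumptions.
library(tuneR)

wav  <- readWave("tochter_zion.wav")             # hypothetical recording of the vocal part
wav  <- mono(wav, which = "left")                # work on a single channel
spec <- periodogram(wav, width = 2048, overlap = 1024)  # windowed periodograms
ff   <- FF(spec)                                 # fundamental frequency per time window
note <- noteFromFF(ff)                           # halftones relative to a' = 440 Hz
table(note)                                      # distribution of the estimated notes
```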
Analysis D: Automatic Transcription
Figure 6: Original score (a), tuneR transcription (b), and Melodyne transcription (c) of the recorded vocal performance.
Analysis E: Genre Determination
Genre experiment
We consider a 2-class example for genre determination: the differentiation of classic and non-classic, the latter comprising Pop, Rock, Jazz, and other genres.
We train with 26 MFCC features (13 means and 13 standard deviations) on 10 classic and 10 non-classic pieces (small training sets are not unusual).
The features were calculated on time windows which are 4 seconds long with 2 seconds overlap, leading to 2361 training observations over all 20 music pieces.
The resulting classification rule is tested on an additional 15 classic and 105 non-classic music pieces with no overlap with the training pieces, not even regarding the artists.
The typical error rate of the classification methods used is 17% of the 15,387 time windows of the test set, i.e. more than 2,500 windows.
Analysis E: Interpretation & Deployment: Genres
However, we are actually interested in longer sections of the 120 music pieces of the test set.
Some classification rules were also prepared to give (estimates of) class probabilities for each time window.
If the probability of the true class is lower than 0.5, the window was misclassified.
For the identification of longer connected (strongly) misclassified parts of a music piece (longer than 32 seconds), we calculated the mean of the probabilities of the true class over the involved small time windows.
If this mean is very small, then this part of the music piece is assumed to be very much misclassified; a sketch of this aggregation follows below.
The most extreme examples were found in 18 different music pieces.
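A small R sketch of this aggregation on simulated per-window probabilities; the assumption that a 32-second section corresponds to roughly 16 consecutive windows (4 s windows, 2 s hop) is ours, and the probabilities are made up.

```r
## Sketch: mean probability of the true class over runs of consecutive windows
## (simulated probabilities; run length of 16 windows ~ 32 seconds is an assumption).
set.seed(1)
p_true <- runif(500)                             # per-window probability of the true class
run    <- 16                                     # windows covering roughly 32 seconds

section_mean <- sapply(1:(length(p_true) - run + 1),
                       function(i) mean(p_true[i:(i + run - 1)]))
which.min(section_mean)                          # start window of the worst section
min(section_mean)                                # its mean true-class probability
```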
Analysis E: Interpretation & Deployment: Genres
Strikingly many such parts stem from 8 pieces of European Jazz, which was not represented in training (where only older American Jazz appeared).
Looking at the 5 most extremely low probabilities of the true class (< 3.5%), only 2 are not in this group, namely Fake Empire by the indie-rock band The National and Trilogy by Emerson, Lake, and Palmer.
For Trilogy, classical parts might have been expected; for Fake Empire, the beginning was misclassified, where only piano and vocals were involved.
Overall, classification led to interpretable results, which, however, make clear that a balanced training set is very important and that some parts of pop pieces should be expected to be at least near to classic.
Obviously, deployment of our results, as they are, cannot be recommended.
Conclusion
- We applied the Data Science analysis structure to Music Data Analysis, mainly concentrating on the statistical approach.
- Domain knowledge is very important.
- Was this already Data Science or still only Statistics?
- Discussion very welcome!
(Additional) References
Bethge, P. (2003): Musik, die Mathematik der Gefühle; DER SPIEGEL Nr. 31, 28.07.2003.
Weihs, C., Friedrichs, K., Bischl, B. (2012): Statistics for hearing aids: Auralization. In: J. Pociecha, R. Decker (eds), Data Analysis Methods and its Applications, 183-196.
Weihs, C., Mersmann, O., Ligges, U. (2014): Foundations of Statistical Algorithms; CRC Press, Taylor & Francis, 386-394.