Bayesian Methods for Historical Linguistics

robinryder 1,092 views 90 slides Apr 19, 2016

Bayesian Methods for Historical Linguistics
Robin J. Ryder
Centre de Recherche en Mathématiques de la Décision, Université Paris-Dauphine, PSL
and Département de Mathématiques et Applications, ENS, PSL
7 April 2016
LSCP seminar, Ecole normale supérieure
Robin Ryder (Dauphine & ÉNS) Bayesian Methods for Historical Linguistics LSCP 07/04/2016 1 / 81

Introduction
A large number of recent papers describe computationally-intensive
statistical methods for Historical Linguistics
Increased computational power
Advances in statistical methodology
New datasets
Complex linguistic questions which cannot be answered with
traditional methods

Caveats
I am not a linguist
I am a statistician
These papers were not written by me; most figures were created
by the papers' authors
I use the word "evolution" in a broad sense
"All models are wrong, but some are useful"

Aims of this talk
Review of several recent papers on statistical models for Historical
Linguistics
Statisticians won't replace linguists
When done correctly, collaborations between statisticians and
linguists can provide useful results

Advantages of statistical methods
Analyse (very) large datasets
Test multiple hypotheses
Cross-validation
Estimate uncertainty

Languages diversify
Languages “evolve” similarly to biological species
Similarities between languages indicate they may be cousins
Most standard model: tree

Questions of interest
Which languages are related?
Given a set of related languages, can we reconstruct their history
and the age of the most recent common ancestor (MRCA)?
What mechanisms drive language change?
How do the various parts of language change? Vocabulary,
syntax, phonetics...

Why be Bayesian?
In the settings described in this talk, it usually makes sense to use
Bayesian inference, because:
The models are complex
Estimating uncertainty is paramount
The output of one model is used as the input of another
We are interested in complex functions of our parameters

Bayesian statistics
Statistical inference deals with estimating an unknown parameter θ
given some data D.
In the Bayesian framework, the parameter θ is seen as inherently
random: it has a distribution.
Before I see any data, I have a prior distribution π(θ), usually
uninformative.
Once I take the data into account (through the likelihood function
L), I get a posterior distribution, which is hopefully more
informative.
$\pi(\theta \mid D) \propto \pi(\theta) \, L(\theta \mid D)$
Different people have different priors, hence different posteriors.
But with enough data, the choice of prior matters little.
We are allowed to make probability statements about θ, such as
"there is a 95% probability that θ belongs to the interval
[78; 119]" (credible interval)
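The relation "posterior ∝ prior × likelihood" can be illustrated numerically. The sketch below is a toy example that is not taken from the talk: it assumes a binomial likelihood (k successes in n trials), a flat prior on a grid of θ values, and hypothetical data. The function names are illustrative only.

```python
from math import comb


def grid_posterior(k, n, grid_size=1001):
    """Posterior over theta on a regular grid: flat prior, binomial likelihood."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat (uninformative) prior, an assumption
    likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in thetas]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]  # prior x likelihood
    total = sum(unnormalised)
    return thetas, [u / total for u in unnormalised]


def credible_interval(thetas, posterior, level=0.95):
    """Equal-tailed credible interval read off the cumulative posterior."""
    lower_tail = (1 - level) / 2
    upper_tail = 1 - lower_tail
    cumulative, lower, upper = 0.0, None, None
    for t, p in zip(thetas, posterior):
        cumulative += p
        if lower is None and cumulative >= lower_tail:
            lower = t
        if upper is None and cumulative >= upper_tail:
            upper = t
    return lower, upper


# Hypothetical data: 7 successes out of 20 trials.
thetas, posterior = grid_posterior(k=7, n=20)
low, high = credible_interval(thetas, posterior)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
print(f"posterior mean = {posterior_mean:.3f}, "
      f"95% credible interval = [{low:.3f}, {high:.3f}]")
```

A grid is only practical for one or two parameters; for models as complex as the ones discussed in this talk, the posterior is explored by sampling (MCMC) instead.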

Advantages and drawbacks of Bayesian statistics
More intuitive interpretation of the results
Easier to think about uncertainty
In a hierarchical setting, it becomes easier to take into account all
the sources of variability
Prior specification: need to check that changing your prior does
not change your result
Computationally intensive

Statistical method in a nutshell
1. Collect data
2. Design model
3. Perform inference (MCMC, ...)
4. Check convergence
5. In-model validation (is our inference method able to answer
questions from our model?)
6. Model misspecification analysis (do we need a more complex
model?)
7. Conclude
In general, it is more difficult to perform inference for a more complex
model.
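Steps 3 and 4 can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the samplers used in the papers reviewed here: it runs a random-walk Metropolis sampler on a hypothetical one-parameter posterior (a normal distribution with mean 2), starts two chains far apart, and compares their means as a crude convergence check. The target, step size, chain length and burn-in are all arbitrary choices.

```python
import math
import random


def log_target(theta):
    """Log-density (up to a constant) of a toy posterior: Normal(mean=2, sd=1)."""
    return -0.5 * (theta - 2.0) ** 2


def metropolis(start, n_steps, step_size, rng):
    """Random-walk Metropolis: propose a move, accept with probability min(1, ratio)."""
    chain = [start]
    current = start
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        log_ratio = log_target(proposal) - log_target(current)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        chain.append(current)
    return chain


# Two chains started far apart, with different seeds.
chain_a = metropolis(start=-10.0, n_steps=20000, step_size=1.0, rng=random.Random(1))
chain_b = metropolis(start=10.0, n_steps=20000, step_size=1.0, rng=random.Random(2))

burn_in = 1000  # discard the initial transient
kept_a, kept_b = chain_a[burn_in:], chain_b[burn_in:]
mean_a = sum(kept_a) / len(kept_a)
mean_b = sum(kept_b) / len(kept_b)

# Crude convergence check: both chains should agree on the posterior mean.
print(f"chain A mean = {mean_a:.2f}, chain B mean = {mean_b:.2f}")
```

Agreement between chains is a necessary sign of convergence, not a proof of it; real analyses typically combine several diagnostics.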

Outline
1. Swadesh: Glottochronology
2. Gray & Atkinson: Language phylogenies
3. Pagel et al.: Frequency of use
4. Ryder & Nicholls: Dating Proto-Indo-European
5. Language universals
6. Re-examining Bergsland and Vogt
7. Conclusions

Swadesh (1952)
LEXICO-STATISTIC DATING OF PREHISTORIC ETHNIC CONTACTS
With Special Reference to North American Indians and Eskimos
MORRIS SWADESH
Proceedings of the American Philosophical Society, Vol. 96, No. 4, August 1952, p. 452
PREHISTORY refers to the long period of early human society before writing was available for the recording of events. In a few places it gives way to the modern epoch of recorded history as much as six or eight thousand years ago; in many areas this happened only in the last few centuries. Everywhere prehistory represents a great obscure depth which science seeks to penetrate. And indeed powerful means have been found for illuminating the unrecorded past, including the evidence of archeological finds and that of the geographic distribution of cultural facts in the earliest known periods. Much depends on the painstaking analysis and comparison of data, and on the effective reading of their implications. Very important is the combined use of all the evidence, linguistic and ethnographic as well as archeological, biological, and geological. And it is essential constantly to seek new means of expanding and rendering more accurate our deductions about prehistory.
One of the most significant recent trends in the field of prehistory has been the development of objective methods for measuring elapsed time. Where vague estimates and subjective judgments formerly had to serve, today we are often able to determine prehistoric time within a relatively narrow margin of accuracy. This development is important especially because it adds greatly to the possibility of interrelating the separate reconstructions.
Unquestionably of the highest value has been the development of radiocarbon dating [1]. This technique is based on W. F. Libby's discovery that all living substances contain a certain percentage of radioactive carbon, an unstable substance which tends to change into nitrogen. During the life of a plant or animal, new radiocarbon is continually taken in from the atmosphere and the percentage remains at a constant level. After death the percentage of radiocarbon is gradually dissipated at an essentially constant statistical rate. The rate of "decay" being constant, it is possible to determine the time since death of any piece of carbon by measuring the amount of radioactivity still going on. Consequently, it is possible to determine within certain limits of accuracy the time depth of any archeological site which contains a suitable bit of bone, wood, grass, or any other organic substance.
Lexicostatistic dating makes use of very different material from carbon dating, but the broad theoretical principle is similar. Researches by the present author and several other scholars within the last few years have revealed that the fundamental everyday vocabulary of any language (as against the specialized or "cultural" vocabulary) changes at a relatively constant rate. The percentage of retained elements in a suitable test vocabulary therefore indicates the elapsed time. Wherever a speech community comes to be divided into two or more parts so that linguistic change goes separate ways in each of the new speech communities, the percentage of common retained vocabulary gives an index of the amount of time that has elapsed since the separation. Consequently, wherever we find two languages which can be shown by comparative linguistics to be the end products of such a divergence in the prehistoric past, we are able to determine when the first separation took place. Before taking up the details of the method, let us examine a concrete illustrative instance.
The Eskimo and Aleut languages are by no means the same. An Eskimo cannot understand Aleut unless he learns the language like any other foreign tongue, except that structural similarities and occasional vocabulary agreements make the learning a little easier than it might otherwise be. The situation is roughly comparable to that of an English-speaking person learning Gaelic or Lithuanian. It has been shown that Eskimo and Aleut are modern divergent forms of an earlier single language [2]. In other words, the similarities be-
[1] See Radiocarbon dating, assembled by Frederick Johnson, Mem. Soc. Amer. Archaeol. 8, 1951.
[2] Concrete proof of this relationship has recently been presented in two independent studies: Knut Bergsland, Kleinschmidt Centennial IV: Aleut demonstratives and the Aleut-Eskimo relationship, Internatl. Jour. Amer. Ling. 17: 167-179, 1951; Gordon Marsh and Morris

First attempt: Swadesh (1952)
Aim: dating the MRCA (Most Recent Common Ancestor) of a pair of
languages.
Data: "core vocabulary" (Swadesh lists). 215 or 100 words.

Core vocabulary

Assumptions
Swadesh assumed that core vocabulary evolves at a constant rate
(through time, space and meanings). Given a pair of languages with
percentage C of shared cognates, and a constant retention rate r, the
age t of the MRCA is
$t = \frac{\log C}{2 \log r}$
The constant r was estimated using a pair of languages for which the
age of the MRCA is known.
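The formula follows from the constant-rate assumption: if each of the two languages independently retains a fraction r of its core vocabulary per unit of time, the shared fraction after time t is C = r^(2t), which gives t = log C / (2 log r). A minimal sketch of the computation follows; the function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def glottochronology_age(shared_fraction, retention_rate):
    """Age t of the MRCA from t = log C / (2 log r).

    t is expressed in the time unit over which r is defined
    (for instance, millennia if r is a per-millennium retention rate).
    """
    if not (0 < shared_fraction <= 1 and 0 < retention_rate < 1):
        raise ValueError("need 0 < C <= 1 and 0 < r < 1")
    return math.log(shared_fraction) / (2 * math.log(retention_rate))


# Hypothetical inputs, for illustration only.
r = 0.8  # assumed retention rate per unit of time
C = 0.4  # assumed fraction of shared cognates
age = glottochronology_age(C, r)
print(f"Estimated age of the MRCA: {age:.2f} time units")
```

The function returns a single number with no measure of uncertainty, which is one of the shortcomings discussed on the next slide.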

Issues with glottochronology
Many statistical shortcomings. Mainly:
1. Simplistic model
2. No evaluation of uncertainty of estimates
3. Only small amounts of data are used
Bergsland and Vogt (1962) debunked glottochronology, showing on 3
pairs of languages with known history that the assumption of constant
rates does not hold.

What has changed?
More elaborate models + model misspecification analyses
We can estimate the uncertainty (easier to answer "I don't
know")
Large amounts of data

Gray & Atkinson (2003)
Language-tree divergence times support the Anatolian theory of Indo-European origin
Russell D. Gray & Quentin D. Atkinson
Department of Psychology, University of Auckland, Private Bag 92019, Auckland 1020, New Zealand
Nature, Vol. 426, 27 November 2003, p. 435
Languages, like genes, provide vital clues about human history [1,2]. The origin of the Indo-European language family is “the most intensively studied, yet still most recalcitrant, problem of historical linguistics” [3]. Numerous genetic studies of Indo-European origins have also produced inconclusive results [4,5,6]. Here we analyse linguistic data using computational methods derived from evolutionary biology. We test two theories of Indo-European origin: the ‘Kurgan expansion’ and the ‘Anatolian farming’ hypotheses. The Kurgan theory centres on possible archaeological evidence for an expansion into Europe and the Near East by Kurgan horsemen beginning in the sixth millennium BP [7,8]. In contrast, the Anatolian theory claims that Indo-European languages expanded with the spread of agriculture from Anatolia around 8,000–9,500 years BP [9]. In striking agreement with the Anatolian hypothesis, our analysis of a matrix of 87 languages with 2,449 lexical items produced an estimated age range for the initial Indo-European divergence of between 7,800 and 9,800 years BP. These results were robust to changes in coding procedures, calibration points, rooting of the trees and priors in the bayesian analysis.

Swadesh lists, better analysed
Use Swadesh lists for 87 Indo-European languages, and a
phylogenetic model from Genetics
Assume a tree-like model of evolution with constant rate of change
Bayesian inference via MCMC (Markov Chain Monte Carlo)
Reconstruct trees and dates
Main parameter of interest: age of the root (Proto-Indo-European,
PIE)

Lexical trees

Correcting the issues with glottochronology
Returning to the issues with Swadesh's glottochronology:
1. Simplistic model → Slightly better, but the model of evolution is
rudimentary
2. No evaluation of uncertainty of estimates → Bayesian inference
3. Only small amounts of data are used → Large number of
languages reduces variability of estimates

Bayesian inference for lexical trees
The tree parameter is seen as random: it has a distribution
Via MCMC, G & A get a sample of possible trees, with associated
probabilities, rather than a single tree
The uncertainty in trees is thus made explicit
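One common way to make the uncertainty in a sample of trees explicit is to report, for each group of languages, the fraction of sampled trees in which that group forms a clade. The sketch below assumes trees are stored as nested tuples of language names and uses a hypothetical four-tree sample; it is an illustration, not Gray & Atkinson's software.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set of this subtree, set of all clades in this subtree)."""
    if isinstance(tree, str):  # a leaf is a language name
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def clade_support(tree_sample):
    """Fraction of sampled trees in which each clade appears."""
    counts = Counter()
    for tree in tree_sample:
        _, found = clades(tree)
        counts.update(found)
    return {clade: n / len(tree_sample) for clade, n in counts.items()}


# Hypothetical posterior sample of four trees over four languages.
sample = [
    (("French", "Spanish"), ("English", "German")),
    (("French", "Spanish"), ("English", "German")),
    ((("French", "Spanish"), "English"), "German"),
    (("French", "Spanish"), ("English", "German")),
]
support = clade_support(sample)
print(support[frozenset({"French", "Spanish"})])  # present in all four trees
print(support[frozenset({"English", "German"})])  # present in three of four
```

In a real analysis the sample would contain thousands of trees, each with branch lengths, and the same counting idea would apply.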

G & A: conclusions
Age of PIE: 7800-9800 BP (Before Present)
Large error bars, but this is a good thing
Reconstruct many known features of the tree of Indo-European
languages
Little validation of the model, no model misspecification analysis
We shall return to these data, with more models.
These trees can also be used as a building block to answer other
questions.

Pagel et al. (2007)
Frequency of word-use predicts rates of lexical evolution throughout Indo-European history
Mark Pagel (1,2), Quentin D. Atkinson (1) & Andrew Meade (1)
(1) School of Biological Sciences, University of Reading, Whiteknights, Reading, Berkshire, RG6 6AS, UK.
(2) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, New Mexico 87501, USA.
Nature, Vol 449, 11 October 2007, p. 717, doi:10.1038/nature06176
Greek speakers say “ourá”, Germans “schwanz” and the French “queue” to describe what English speakers call a ‘tail’, but all of these languages use a related form of ‘two’ to describe the number after one. Among more than 100 Indo-European languages and dialects, the words for some meanings (such as ‘tail’) evolve rapidly, being expressed across languages by dozens of unrelated words, while others evolve much more slowly, such as the number ‘two’, for which all Indo-European language speakers use the same related word-form [1]. No general linguistic mechanism has been advanced to explain this striking variation in rates of lexical replacement among meanings. Here we use four large and divergent language corpora (English [2], Spanish [3], Russian [4] and Greek [5]) and a comparative database of 200 fundamental vocabulary meanings in 87 Indo-European languages [6] to show that the frequency with which these words are used in modern language predicts their rate of replacement over thousands of years of Indo-European language evolution. Across all 200 meanings, frequently used words evolve at slower rates and infrequently used words evolve more rapidly. This relationship holds separately and identically across parts of speech for each of the four language corpora, and accounts for approximately 50% of the variation in historical rates of lexical replacement. We propose that the frequency with which specific words are used in everyday language exerts a general and law-like influence on their rates of evolution. Our findings are consistent with social models of word change that emphasize the role of selection, and suggest that owing to the ways that humans use language, some words will evolve slowly and others rapidly across all languages.
Languages, like species, evolve by way of a process of descent with modification (Supplementary Table 1). The remarkable diversity of languages (there are about 7,000 known living languages [7]) is a product of this process acting over thousands of years. Ancestral languages split to form daughter languages that slowly diverge as shared lexical, phonological and grammatical features are replaced by novel forms. In the study of lexical change, the basic unit of analysis is the cognate. Cognates are words of similar meaning with systematic sound correspondences indicating they are related by common ancestry. For example, cognates meaning ‘water’ exist in English (water), German (wasser), Swedish (vatten) and Gothic (wato), reflecting descent from proto-Germanic (*water).
Early lexicostatistical [8] studies of Malayo-Polynesian and Indo-European language families revealed that the rate at which new cognates arise varies across meaning categories [1,9]. More recently we have obtained direct estimates of rates of cognate replacement on linguistic phylogenies (family trees) of Indo-European and Bantu languages, using a statistical model of word evolution in a bayesian Markov chain Monte Carlo (MCMC) framework [10]. We found that rates of cognate replacement varied among meanings, and that rates for different meanings in Indo-European were correlated with their paired meanings in the Bantu languages. This indicates that variation in the rates of lexical replacement among meanings is not merely an historical accident, but rather is linked to some general process of language evolution.
Social and demographic factors proposed to affect rates of language change within populations of speakers include social status [11], the strength of social ties [12], the size of the population [13] and levels of outside contact [14]. These forces may influence rates of evolution on a local and temporally specific scale, but they do not make general predictions across language families about differences in the rate of lexical replacement among meanings. Drawing on concepts from theories of molecular [15] and cultural evolution [16–18], we suggest that the frequency with which different meanings are used in everyday language may affect the rate at which new words arise and become adopted in populations of speakers. If frequency of meaning-use is a shared and stable feature of human languages, then this could provide a general mechanism to explain the large differences across meanings in observed rates of lexical replacement. Here we test this idea by examining the relationship between the rates at which Indo-European language speakers adopt new words for a given meaning and the frequency with which those meanings are used in everyday language.
We estimated the rates of lexical evolution for 200 fundamental vocabulary meanings [8] in 87 Indo-European languages [6]. Rates were estimated using a statistical likelihood model of word evolution [10] applied to phylogenetic trees of the 87 languages (Supplementary Fig. 1). The number of cognates observed per meaning varied from one to forty-six. For each of the 200 meanings, we calculated the mean of the posterior distribution of rates as derived from a bayesian MCMC model that simultaneously accounts for uncertainty in the parameters of the model of cognate replacement and in the phylogenetic tree of the languages (Methods). Rate estimates were scaled to represent the expected number of cognate replacements per 10,000 years, assuming a 8,700-year age for the Indo-European language family [6]. Opinions on the age of Indo-European vary between approximately 6,000 and 10,000 years before present [19,20]. Using a different calibration would change the absolute values of the rates but not their relative values.
Figure 1a shows the inferred distribution of rate estimates, where we observe a roughly 100-fold variation in rates of lexical evolution among the meanings. At the slow end of the distribution, the rates predict zero to one cognate replacements per 10,000 years for words such as ‘two’, ‘who’, ‘tongue’, ‘night’, ‘one’ and ‘to die’. By comparison, for the faster evolving words such as ‘dirty’, ‘to turn’, ‘to stab’ and ‘guts’, we predict up to nine cognate replacements in the same time period. In the historical context of the Indo-European language family, this range yields an expectation of between 0–1 and 43 lexical replacements throughout the ~130,000 language-years of evolution the linguistic tree represents, very close to the observed range in the

Question at hand
Check link between frequency of use and rate of change for
vocabulary.
Hypothesis: when a meaning is used more often, the
corresponding word is less likely to change.
Problem: since this rate is expected to be very slow, we need to
look at the deep history. But then the evolutionary history is
unknown.

Workaround
Use Indo-European core vocabulary data, and frequencies from
English, Greek, Russian and Spanish
Get a sample from the distribution on trees and ancestral ages
using G&A's method
For each tree in the sample, estimate the rate of change for each
meaning.
Average across all trees.
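The last two steps amount to a Monte Carlo average over the posterior sample of trees. The sketch below shows the averaging only. As a loudly simplified stand-in for the likelihood model that Pagel et al. fit on each tree, it assumes the per-tree rate is the number of inferred cognate replacements divided by the total branch length; all numbers and names are hypothetical.

```python
from statistics import mean


def rate_on_tree(n_changes, total_branch_length):
    """Crude per-tree rate: replacements per year of total branch length.

    This is a placeholder for a proper likelihood-based estimate.
    """
    return n_changes / total_branch_length


def posterior_mean_rates(tree_sample):
    """Average each meaning's rate across all sampled trees."""
    meanings = tree_sample[0]["changes"].keys()
    return {
        m: mean(rate_on_tree(t["changes"][m], t["length"]) for t in tree_sample)
        for m in meanings
    }


# Hypothetical sample of three trees: total branch length in years, and the
# number of replacements inferred on that tree for two meanings.
sample = [
    {"length": 100_000.0, "changes": {"two": 1, "tail": 40}},
    {"length": 125_000.0, "changes": {"two": 1, "tail": 45}},
    {"length": 80_000.0, "changes": {"two": 2, "tail": 36}},
]
rates = posterior_mean_rates(sample)
for meaning, rate in rates.items():
    print(f"{meaning}: {rate * 10_000:.2f} replacements per 10,000 years")
```

Because every tree in the sample contributes to the average, the final rates account for uncertainty in the phylogeny rather than conditioning on a single tree.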

Results
(The different colours correspond to different classes of words:
numerals, body parts, adjectives...)

Comments
There is a significant (negative) correlation between frequency of
use and rate of change.
Even if there is high uncertainty in the phylogenies, we can still
answer other questions (integrating out the tree)
Similar results for Bantu (Pagel & Meade 2006)
It would have been much harder to evaluate this hypothesis
without the Bayesian paradigm.

Outline
1. Swadesh: Glottochronology
2. Gray & Atkinson: Language phylogenies
3. Pagel et al.: Frequency of use
4. Ryder & Nicholls: Dating Proto-Indo-European
5. Language universals
6. Re-examining Bergsland and Vogt
7. Conclusions

Ryder & Nicholls (2011)
©2011 Royal Statistical Society 0035–9254/11/60071
Appl. Statist. (2011) 60, Part 1, pp. 71–92
Missing data in a stochastic Dollo model for binary
trait data, and its application to the dating of
Proto-Indo-European
Robin J. Ryder and Geoff K. Nicholls
University of Oxford, UK
[Received June 2009. Revised April 2010]
Summary. Nicholls and Gray have described a phylogenetic model for trait data. They used
their model to estimate branching times on Indo-European language trees from lexical data.
Alekseyenko and co-workers extended the model and gave applications in genetics. We extend
the inference to handle data missing at random. When trait data are gathered, traits are thinned
in a way that depends on both the trait and the missing data content. Nicholls and Gray treated
missing records as absent traits. Hittite has 12% missing trait records. Its age is poorly predicted
in their cross-validation. Our prediction is consistent with the historical record. Nicholls and Gray
dropped seven languages with too much missing data. We fit all 24 languages in the lexical data
of Ringe and co-workers. To model spatiotemporal rate heterogeneity we add a catastrophe
process to the model. When a language passes through a catastrophe, many traits change at
the same time. We fit the full model in a Bayesian setting, via Markov chain Monte Carlo
sampling. We validate our fit by using Bayes factors to test known age constraints. We reject three
of 30 historically attested constraints. Our main result is a unimodal posterior distribution for the
age of Proto-Indo-European centred at 8400 years before Present with 95% highest posterior
density interval equal to 7100–9800 years before Present.
Keywords: Bayesian inference; Dating methods; Markov chain Monte Carlo methods; Missing
data; Phylogenetics; Proto-Indo-European; Rate heterogeneity
1. Introduction
The Indo-European languages descend from a common ancestor called Proto-Indo-European.
Lexical data show the patterns of relatedness among Indo-European languages. These data are
‘cognacy classes’: a pair of words in the same class descend, through a process of change in
sound, from a common ancestor. For example, English sea and German See are cognate to
one another, but not to the French mer. Gray and Atkinson (2003) coded data of this kind in
a matrix in which rows correspond to languages and columns to distinct cognacy classes, and
entries are 1 or 0 as the language possesses or lacks a term in the column class. They analysed
these data by using phylogenetic algorithms that are similar to those used for genetic data. Our
analysis has the same objectives, but we fit a model that was designed for lexical trait data.
We work with data that were compiled by Ringe et al. (2002), recording the distribution of
872 distinct cognacy classes in 24 modern and ancient Indo-European languages. In Section 8,
we give estimates for the unknown topology and branching times of the phylogeny of the core
vocabulary of these languages.
Address for correspondence: Robin J. Ryder, Department of Statistics, University of Oxford, 1 South Parks
Road, Oxford, OX1 3TG, UK.
E-mail: [email protected]

Example of a tree

Questions to answer
Topology of the tree
Age of ancestor nodes
Age of root: 6000–6500 BP or 8000–9500 BP (Before Present)?
6000 BP: Kurgan horsemen; 8000 BP: Anatolian farmers

Core vocabulary
100 or 200 words, present in almost all languages: bird, hand, to
eat, red...
Borrowing can occur (evolution not along a tree), but:
“Easy” to detect
Rare
Does not bias the results

Binary data: he dies, three, all
                      he dies            three     all
Old English           stierfþ            þrīe      ealle
Old High German       stirbit, touwit    drī       alle
Avestan               miriiete           þrāiiō    vispe
Old Church Slavonic   umretu             trje      vsi
Latin                 moritur            trēs      omnēs
Oscan                 ?                  trís      súllus

Cognacy classes (traits) for the meaning he dies:
1. {stierfþ, stirbit}
2. {touwit}
3. {miriiete, umretu, moritur}
Cognacy classes for the meaning three:
1. {þrīe, drī, þrāiiō, trje, trēs, trís}
Cognacy classes for all:
1. {ealle, alle}
2. {vispe, vsi}
3. {omnēs}
4. {súllus}

Binary coding (one column per cognacy class; ? marks a missing record):
                      he dies   three   all
                      1 2 3     1       1 2 3 4
Old English           1 0 0     1       1 0 0 0
Old High German       1 1 0     1       1 0 0 0
Avestan               0 0 1     1       0 1 0 0
Old Church Slavonic   0 0 1     1       0 1 0 0
Latin                 0 0 1     1       0 0 1 0
Oscan                 ? ? ?     1       0 0 0 1
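The coding step shown on this slide can be sketched in a few lines. The data structure below (for each meaning, a list of cognacy classes given as sets of languages, plus the set of languages with no record) is an assumption chosen for illustration; it is not the format used by Ringe et al. or by TraitLab.

```python
LANGS = [
    "Old English", "Old High German", "Avestan",
    "Old Church Slavonic", "Latin", "Oscan",
]

# Hypothetical layout: cognacy classes as sets of languages, per meaning.
COGNACY = {
    "he dies": {
        "classes": [
            {"Old English", "Old High German"},           # stierfth, stirbit
            {"Old High German"},                          # touwit
            {"Avestan", "Old Church Slavonic", "Latin"},  # miriiete, umretu, moritur
        ],
        "missing": {"Oscan"},
    },
    "three": {"classes": [set(LANGS)], "missing": set()},
    "all": {
        "classes": [
            {"Old English", "Old High German"},  # ealle, alle
            {"Avestan", "Old Church Slavonic"},  # vispe, vsi
            {"Latin"},                           # omnes
            {"Oscan"},                           # sullus
        ],
        "missing": set(),
    },
}


def binary_matrix(langs, cognacy):
    """One row per language, one column per cognacy class; '?' if unrecorded."""
    rows = {lang: [] for lang in langs}
    for entry in cognacy.values():  # dicts preserve insertion order
        for members in entry["classes"]:
            for lang in langs:
                if lang in entry["missing"]:
                    rows[lang].append("?")
                else:
                    rows[lang].append(1 if lang in members else 0)
    return rows


matrix = binary_matrix(LANGS, COGNACY)
for lang in LANGS:
    print(f"{lang:<20}", *matrix[lang])
```

Note that Old High German has two words for "he dies", so its row has two 1s among the first three columns (polymorphism).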

Observation process
Old English           1 0 0 1 1 0 0 0
Old High German       1 1 0 1 1 0 0 0
Avestan               0 0 1 1 0 1 0 0
Old Church Slavonic   0 0 1 1 0 1 0 0
Latin                 0 0 1 1 0 0 1 0
Oscan                 ? ? ? 1 0 0 0 1

Columns 2, 7 and 8, each present in a single language, are removed:
Old English           1 0 1 1 0
Old High German       1 0 1 1 0
Avestan               0 1 1 0 1
Old Church Slavonic   0 1 1 0 1
Latin                 0 1 1 0 0
Oscan                 ? ? 1 0 0
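Comparing the first and last matrices on this slide, the columns that disappear are exactly those recorded as present in a single language. A minimal sketch of that thinning step follows; the rule "keep a trait only if it is marked 1 in at least two languages" is read off the example and is an assumption, the registration rule actually used in the analysis may treat missing entries differently.

```python
def drop_singleton_traits(rows):
    """Keep only the traits (columns) marked 1 in at least two languages."""
    langs = list(rows)
    n_traits = len(rows[langs[0]])
    keep = [
        a for a in range(n_traits)
        if sum(1 for lang in langs if rows[lang][a] == 1) >= 2
    ]
    return {lang: [rows[lang][a] for a in keep] for lang in langs}


# The 6 x 8 matrix from the slide ('?' = missing record).
full = {
    "Old English":         [1, 0, 0, 1, 1, 0, 0, 0],
    "Old High German":     [1, 1, 0, 1, 1, 0, 0, 0],
    "Avestan":             [0, 0, 1, 1, 0, 1, 0, 0],
    "Old Church Slavonic": [0, 0, 1, 1, 0, 1, 0, 0],
    "Latin":               [0, 0, 1, 1, 0, 0, 1, 0],
    "Oscan":               ["?", "?", "?", 1, 0, 0, 0, 1],
}
thinned = drop_singleton_traits(full)
for lang, values in thinned.items():
    print(f"{lang:<20}", *values)
```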

Constraints
Constraints on the tree topology
30 constraints on the age of some nodes or ancient languages
These constraints are used to estimate the evolution rates and the
age.
Also provide one way of validating the model and inference
procedure.

Constraints

Model (1): birth-death process
Traits (= cognacy classes) are born at rate λ.
Traits die at rate μ.
λ and μ are constant.
Example data at the eight leaves of the tree (leaf: traits):
1: 1 0 0 0 0 0 0 0
2: 1 0 1 0 0 0 0 0
3: 1 0 0 0 0 0 0 1
4: 0 0 0 0 1 0 0 0
5: 0 0 0 0 1 0 0 0
6: 1 1 0 0 0 1 1 0
7: 1 1 0 0 0 1 0 0
8: 1 0 0 0 0 0 0 0
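To make the birth-death mechanism concrete, here is a small simulation sketch of the number of traits along a single branch. It assumes a constant birth rate (called `lam` here) and a constant per-trait death rate (`mu`), and uses the standard facts that each existing trait survives a time t with probability exp(-mu t) and that the traits born during the interval and still alive at its end are Poisson with mean (lam/mu)(1 - exp(-mu t)). Function names and parameter values are made up; this is not the TraitLab implementation.

```python
import math
import random


def poisson(rng, mean):
    """Draw from a Poisson distribution (Knuth's method, fine for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def evolve_branch(rng, n_start, lam, mu, t):
    """Return (survivors, newcomers) after time t, starting with n_start traits."""
    survive = math.exp(-mu * t)
    survivors = sum(1 for _ in range(n_start) if rng.random() < survive)
    newcomers = poisson(rng, (lam / mu) * (1.0 - survive))
    return survivors, newcomers


rng = random.Random(1)
lam, mu, t = 0.01, 0.001, 500.0  # made-up values, for illustration only
totals = []
for _ in range(4000):
    n0 = poisson(rng, lam / mu)  # start from the stationary distribution
    kept, born = evolve_branch(rng, n0, lam, mu, t)
    totals.append(kept + born)
mean_total = sum(totals) / len(totals)
print(round(mean_total, 2), "traits on average; lam / mu =", lam / mu)
```

With constant rates the average number of traits stays close to lam / mu at every point of the tree, which is why a sudden burst of change needs an extra ingredient in the model.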

Statistical method in a nutshell
1. Collect data
2. Design model
3. Perform inference (MCMC, ...)
4. Check convergence
5. In-model validation (is our inference method able to answer questions from our model?)
6. Model mis-specification analysis (do we need a more complex model?)
7. Conclude
In general, it is more difficult to perform inference for a more complex
model.

Limitations of this model
1. Constant rates across time and space
2. No handling of missing data
3. No handling of borrowing
4. Treats all traits in the same fashion
5. Binary coding loses part of the structure
6. ...
Do any of these limitations introduce systematic bias? (Answer: YES, some do.)
Check each mis-specification in turn, and adapt the model if necessary.

Model (2): catastrophic rate heterogeneity
Catastrophes occur at rate ρ.
At a catastrophe, each trait dies with probability κ and Poiss(ν) traits are born.
λ/μ = ν/κ: the number of traits is constant on average.
Example data at the eight leaves of the tree (leaf: traits):
1: 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 1 0 1 0 0 0 0 0 0 0 0 0 0 1
3: 0 0 0 0 0 0 0 0 0 1 1 0 0 0
4: 0 0 0 0 1 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 1 0 0 0 0 0 0 0 0 0
6: 1 0 0 0 0 1 1 0 0 0 0 0 1 0
7: 1 0 0 0 0 1 0 0 0 0 0 0 1 0
8: 1 0 0 0 0 0 0 0 0 0 0 0 1 0
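A short check of why such a constraint keeps the number of traits constant on average. The notation is an assumption made for this sketch: λ and μ are the birth and death rates of model (1), κ is the probability that a trait dies at a catastrophe, ν is the mean of the Poisson number of traits born at a catastrophe, and N is the number of traits just before or after the catastrophe. It uses the standard result that the stationary mean number of traits under constant birth and death rates is λ/μ.

```latex
\begin{align*}
\mathbb{E}[N_{\text{before}}] &= \frac{\lambda}{\mu}, \\
\mathbb{E}[N_{\text{after}}]  &= (1-\kappa)\,\frac{\lambda}{\mu} + \nu, \\
\mathbb{E}[N_{\text{after}}] = \mathbb{E}[N_{\text{before}}]
  &\iff \nu = \kappa\,\frac{\lambda}{\mu}
   \iff \frac{\lambda}{\mu} = \frac{\nu}{\kappa}.
\end{align*}
```

Under this constraint a catastrophe replaces vocabulary without inflating or shrinking it, so it acts like a sudden burst of ordinary evolution.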

Model (3): missing data
Observation process: each point goes missing with a probability
that depends on the language i.
Some traits are not observed and are thinned out of the data.
Example data at the eight leaves of the tree (leaf: traits; ? = missing):
1: 1 0 0 0 ? 0 0 0 0 0 ? 0 0 0
2: ? 0 1 0 0 0 ? 0 0 0 0 0 0 ?
3: 0 ? 0 0 ? 0 0 0 0 1 1 0 0 0
4: 0 0 0 0 ? 0 ? 0 0 0 0 ? 0 0
5: 0 0 ? 0 1 ? 0 0 0 0 0 0 0 0
6: 1 0 0 0 0 ? ? 0 ? 0 0 0 ? 0
7: ? 0 0 0 0 ? 0 ? 0 0 0 0 1 0
8: 1 0 0 0 0 0 0 0 0 0 0 0 1 0
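The observation process can be mimicked on synthetic data as follows. This is a sketch under the assumption that entries go missing independently, with one probability per language; the function name and the `p_missing` argument are hypothetical.

```python
import random


def mask_observations(rng, rows, p_missing):
    """Replace each entry by '?' independently, with probability p_missing[lang]."""
    return {
        lang: ["?" if rng.random() < p_missing[lang] else x for x in values]
        for lang, values in rows.items()
    }


rng = random.Random(0)
rows = {"A": [1, 0, 0, 1], "B": [0, 1, 0, 1]}  # toy data, made up

unchanged = mask_observations(rng, rows, {"A": 0.0, "B": 0.0})
all_missing = mask_observations(rng, rows, {"A": 1.0, "B": 1.0})
partly = mask_observations(rng, rows, {"A": 0.1, "B": 0.5})
print(partly)
```

An ancient language with few surviving texts would get a high missing probability, a well-documented modern one a low probability.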

Inference
TraitLab software
Bayesian inference
Markov Chain Monte Carlo
(Almost) uniform prior over the age of the root
Extensive validation (in-model and out-of-model; real data and
synthetic data)

Mis-specifications
Heterogeneity between traits: analyse subset of data + simulated data
Heterogeneity in time/space (non-catastrophic): simulated data analysis with edge rates drawn from a distribution
Borrowing: simulated data analysis + check level of borrowing
Data missing in blocks: simulated data analysis
Non-empty meaning categories: simulated data analysis

Posterior distribution
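\[
\begin{aligned}
p(g, \mu, \lambda, \kappa, \rho, \xi \mid \mathcal{D} = D) \propto{}
 & \frac{1}{N!} \left(\frac{\lambda}{\mu}\right)^{N}
   \exp\left( -\frac{\lambda}{\mu} \sum_{\langle i,j \rangle \in E}
   P[E_Z \mid Z = (t_i, i), g, \mu, \kappa, \xi]
   \left(1 - e^{-\mu (t_j - t_i + k_i T_C)}\right) \right) \\
 & \times \prod_{a=1}^{N} \left( \sum_{\langle i,j \rangle \in E_a} \sum_{\omega \in \Omega_a}
   P[M = \omega \mid Z = (t_i, i), g, \mu]
   \left(1 - e^{-\mu (t_j - t_i + k_i T_C)}\right) \right) \\
 & \times \frac{1}{\mu \lambda} \, p(\rho) \, f_G(g \mid T) \,
   \frac{e^{-\rho |g|} (\rho |g|)^{k_T}}{k_T!}
   \prod_{i=1}^{L} (1 - \xi_i)^{Q_i} \, \xi_i^{N - Q_i}
\end{aligned}
\]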

Likelihood calculation
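\[
\sum_{\omega \in \Omega_a^{(c)}} P[M = \omega \mid Z = (t_i, c), g, \mu] =
\begin{cases}
\delta_{i,c} \sum_{\omega \in \Omega_a^{(c)}} P[M = \omega \mid Z = (t_c, c), g, \mu]
  & \text{if } Y(\Omega_a^{(c)}) \ge 1 \\
(1 - \delta_{i,c}) + \delta_{i,c} \, v_c^{(0)}
  & \text{if } Y(\Omega_a^{(c)}) + Q(\Omega_a^{(c)}) = 0
    \text{ (i.e. } \Omega_a^{(c)} = \{\emptyset\}\text{)} \\
1 - \delta_{i,c} \left(1 - \sum_{\omega \in \Omega_a^{(c)}} P[M = \omega \mid Z = (t_c, c), g, \mu]\right)
  & \text{if } Y(\Omega_a^{(c)}) = 0 \text{ and } Q(\Omega_a^{(c)}) \ge 1
\end{cases}
\]
\[
\sum_{\omega \in \Omega_a^{(c)}} P[M = \omega \mid Z = (t_c, c), g, \mu] =
\begin{cases}
1 & \text{if } \Omega_a^{(c)} = \{\{c\}, \emptyset\} \text{ or } \{\{c\}\}
    \text{ (i.e. } D_{c,a} \in \{?, 1\}\text{)} \\
0 & \text{if } \Omega_a^{(c)} = \{\emptyset\} \text{ (i.e. } D_{c,a} = 0\text{)}
\end{cases}
\]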

Tests on synthetic data
[Two figures; the only caption text recovered is "words/language".]
With in-model synthetic data, the tree is well reconstructed.

Tests on synthetic data (2)
[Figure; caption not recovered.]
(Not shown: other parameters are also well reconstructed.)

Influence of borrowing (1)
[Two figures; recovered caption text: "words/language, 10% borrowing".]
With out-of-model synthetic data with borrowing, the tree is well
reconstructed.

Influence of borrowing (2)
[Two figures; recovered caption text: "words/language, 50% borrowing".]

Influence of borrowing (3)
The topology is well reconstructed
Dates are under-estimated if borrowing levels are high
[Two figures; captions not recovered.]

Is there (much) borrowing?
[Figure. Horizontal axis ticks from 2 to 24, vertical axis ticks from 0.4 to 1. Legend: Ringe 100, b=0, b=0.1, b=0.5, b=1.]
Figure: Number of languages per trait. Blue: observed data. Green and black:
synthetic data, low borrowing. Red and pink: synthetic data, high borrowing.

Data
Indo-European languages
Core vocabulary (Swadesh 100 or 207)
Two (almost) independent data sets
Dyen et al. (1997): 87 languages, mostly modern
Ringe et al. (2002): 24 languages, mostly ancient

Cross-validation by Bayes' factors
1. Predict age of nodes for which we have a calibration constraint: would we reject the truth?
2. $\Gamma$: space of trees following all constraints
3. $\Gamma^{-c}$: remove constraint $c = 1, \dots, 30$
4. $M_0: g \in \Gamma$, $M_1: g \in \Gamma^{-c}$. Bayes' factor:
\[ B^{(c)} = \frac{P[g \in \Gamma \mid D, g \in \Gamma^{-c}]}{P[g \in \Gamma \mid g \in \Gamma^{-c}]} \]
5. Constraint $c$ conflicts with the model if $2 \log B^{(c)} < -5$.
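One simple way to approximate such a Bayes factor from simulation output is to compare how often the removed constraint is satisfied under the posterior and under the prior. The sketch below assumes that the constraint is an age interval for one node and that samples of that node's age are available from both distributions once the constraint has been removed; the function names, the inputs and the toy numbers are hypothetical, and this is not necessarily how the published analysis computed it.

```python
import math


def bayes_factor_for_constraint(posterior_ages, prior_ages, lower, upper):
    """Ratio of posterior to prior probability that lower <= age <= upper."""
    def fraction_inside(ages):
        return sum(lower <= a <= upper for a in ages) / len(ages)

    prior = fraction_inside(prior_ages)
    if prior == 0.0:
        raise ValueError("no prior sample satisfies the constraint")
    return fraction_inside(posterior_ages) / prior


def conflicts(bayes_factor, threshold=-5.0):
    """Flag a constraint when 2 log B falls below the threshold."""
    if bayes_factor == 0.0:
        return True
    return 2.0 * math.log(bayes_factor) < threshold


# Toy samples, made up for illustration.
prior_ages = list(range(0, 1000, 10))   # spread over 0..990
supported = [650] * 9 + [900]           # posterior mostly inside [600, 790]
rejected = [900] * 999 + [650]          # posterior almost never inside

b_ok = bayes_factor_for_constraint(supported, prior_ages, 600, 790)
b_bad = bayes_factor_for_constraint(rejected, prior_ages, 600, 790)
print(b_ok, conflicts(b_ok))
print(b_bad, conflicts(b_bad))
```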

Cross validation
[Figure: one column per constraint, with labels read here as two-letter codes: HI, TA, TB, LU, LY, OI, UM, OS, LA, GK, AR, GO, ON, OE, OG, OS, PR, AV, PE, VE, CE, IT, GE, WG, NW, BS, BA, IR, II, TG. Axis ticks: 8000, 6000, 4000, 2000, 0 and -100, -10, -5, -2, 0, 2, 5, 10, 100.]

Consensus tree: modern languages (Dyen data)

Consensus tree: ancient languages (Ringe data)
[Figure: consensus tree. Leaves, in the order listed: armenian, albanian, oldirish, welsh, luvian, oldnorse, oldenglish, oldhighgerman, gothic, lycian, oldcslavonic, latvian, lithuanian, oldprussian, tocharian_a, tocharian_b, hittite, greek, vedic, avestan, oldpersian, latin, umbrian, oscan. Numbers shown on the tree: 62, 78, 66, 85, 58. Time axis from 0 to 8000.]

Root age

Conclusions
Strong support for Anatolian farming hypothesis: root around 8000
BP
Statistics reconstruct known linguistic facts and answer
unresolved questions.
Importance of being Bayesian: uncertainty measured and
integrated out; complex model.
Note that Chang et al. (2015) get strong support for the Kurgan
steppe hypothesis, with 95% HPD on the root age [4870 – 7190].

Language universals
Do the following traits evolve independently?
1. Adjective-Noun order
2. Adposition-Noun phrase order
3. Demonstrative-Noun order
4. Genitive-Noun order
5. Numeral-Noun order
6. Object-Verb order
7. Relative clause-Noun order
8. Subject-Verb order
Greenberg (1966) suggests some pairs of traits co-evolve, and uses
the term "language universal" for such co-evolution.
(Work in this section is joint with Vincent Divol, Dominique Sportiche, Hilda Koopman
and Isabelle Charnavel, building on an idea of Michael Dunn, Simon Greenhill,
Stephen Levinson and Russell Gray.)

Idea
Difficult to get an independent sample of languages
Instead, look at languages known to be related and check for
co-evolution

Two models
Figure from Dunn et al. (2011).

Trees from the posterior
Sample trees from the posterior. Left: Austronesian languages; right:
Indo-European languages. The symbols represent the traits for
Subject-Verb and Verb-Object orders. First symbol: SV, VS or
both; second symbol: VO, OV or both.

Model
We model the value of property $i$ by a latent variable $Z^i$, which
follows a Brownian motion along the tree, with the following
transformation at each leaf $l$:
If $Z^i_l > 1$ then only order AB occurs in language $l$
If $Z^i_l < -1$ then only order BA occurs in language $l$
If $Z^i_l \in [-1, 1]$ then both orders occur in language $l$
For each property $i$, there are in fact $F$ different latent Brownian
motions, one for each family of languages. $Z^i_1, Z^i_2, \dots$ share
common parameters and are independent.
Then compute the Bayes' factor between the two models
$M_1: Z^i \perp\!\!\!\perp Z^j$ and $M_2: \mathrm{cor}(Z^i, Z^j) = \rho$, with a $U([-1, 1])$ prior on $\rho$.
The Bayesian setting allows us to integrate out the tree and other
parameters.
Model validation: in progress.
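The leaf transformation and the idea of two correlated latent processes can be illustrated with a toy sketch. It assumes thresholds at -1 and 1, Brownian motions with unit variance per unit time started at 0, a single branch rather than a full tree, and a correlation parameter called `rho`; all function names are made up.

```python
import math
import random


def word_order(z):
    """Map a latent leaf value to the observed word-order category."""
    if z > 1.0:
        return "AB"
    if z < -1.0:
        return "BA"
    return "both"


def correlated_leaf_values(rng, rho, branch_length):
    """End points of two Brownian motions with correlation rho on one branch."""
    sd = math.sqrt(branch_length)
    e1 = rng.gauss(0.0, 1.0)
    e2 = rng.gauss(0.0, 1.0)
    zi = sd * e1
    zj = sd * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
    return zi, zj


rng = random.Random(3)
for rho in (0.0, 0.9):
    pairs = [correlated_leaf_values(rng, rho, 4.0) for _ in range(2000)]
    agree = sum(word_order(a) == word_order(b) for a, b in pairs) / len(pairs)
    print(f"rho = {rho}: same category in {agree:.0%} of simulated leaves")
```

Under independence the two traits fall in the same category only by chance; a strong correlation makes matching categories much more frequent, which is the signal the Bayes factor picks up.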

Preliminary results
An edge appears between two traits iff the Bayes factor for
co-evolution satisfies BF > 3 ("substantial" evidence on Jeffreys' scale).

Back to Bergsland and Vogt
Norse family, 8 languages
Selection bias
B&V claim that the rate of change is significantly different for these
data.
B&V included words used only in literary Icelandic, which we
exclude.
We can handle polymorphism.
Do not include catastrophes

Known history
[Figure: labels Icelandic, Riksmal, Sandnes, Gjestal; axis marked X, XI, XII, XIII.]

Tests
Two possible ways to test whether the same model parameters apply
to this example and to Indo-European:
1. Assume parameters are the same as for the general Indo-European tree, and estimate ancestral ages.
2. Use Norse constraints to estimate parameters, and compare to parameter estimates from the general Indo-European tree.

Results
If we use parameter values from another analysis, we can try to
estimate the age of 13th century Norse.
True constraint: 660–760 BP. Our HPD: 615 – 872 BP.
If we analyse the Norse data on its own, we estimate parameters.
Value of μ for Norse: $2.47 \pm 0.41 \cdot 10^{-4}$
Value of μ for IE: $1.86 \pm 0.39 \cdot 10^{-4}$ (Dyen), $2.37 \pm 0.21 \cdot 10^{-4}$ (Ringe)
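The HPD intervals quoted on this slide summarise posterior samples of an age. A common way to compute one from MCMC output is to take the shortest interval that contains the required fraction of the samples; the sketch below does this for a one-dimensional, unimodal sample. It is a generic recipe with a made-up function name, not a description of what TraitLab does internally.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least a fraction `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of points inside the interval
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Toy sample, made up: 29 tightly packed values and one far outlier.
toy = list(range(1, 30)) + [1000]
print(hpd_interval(toy))
```

Unlike an equal-tailed interval, the shortest interval is free to leave all of the excluded mass in one tail, which matters for skewed posteriors.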

But...
We can also try to estimate the age of Icelandic (which is 0 BP)
Find 439–560 BP, far from the true value
B&V were right: there was significantly less change on the branch
leading to Icelandic than average
However, we are still able to estimate internal node ages.

Georgian
Second data set: Georgian and Mingrelian
Age of ancestor: last millennium BC
Code data given by B&V, discarding borrowed items
Use rate estimate from Ringe et al. analysis
95% HPD: 2065 – 3170 BP

B&V: conclusions
Third data set (Armenian) not clear enough to be recoded.
There is variation in the number of changes on an edge.
Nonetheless, we are still able to estimate ancestral language age.
Variation in borrowing rates
B&V: "we cannot estimate dates, and it follows that we cannot
estimate the topology either".
We can estimate dates, and even if we couldn't, we might still be
able to estimate the topology.

Overall conclusions
When done right, statistical methods can provide new insight into
linguistic history
Importance of collaboration in building the model and in checking
for mis-specification.
Bayesian statistics play a big role, for estimating uncertainty,
handling complex models and using analyses as building blocks
Major avenues for future research. Challenges in finding relevant
data, building models, and statistical inference:
Models for morphosyntactic traits
Putting together lexical, phonemic and morphosyntactic traits
Escape the tree-like model (borrowing, networks, diffusion
processes...)
Incorporate geography

References
Swadesh, Morris. "Lexico-statistic dating of prehistoric ethnic contacts:
with special reference to North American Indians and Eskimos."
Proceedings of the American Philosophical Society (1952): 452-463.
Gray, Russell D., and Quentin D. Atkinson. "Language-tree divergence
times support the Anatolian theory of Indo-European origin." Nature
426.6965 (2003): 435-439.
Pagel, Mark, Quentin D. Atkinson, and Andrew Meade. "Frequency of
word-use predicts rates of lexical evolution throughout Indo-European
history." Nature 449.7163 (2007): 717-720.
Ryder, Robin J., and Geoff K. Nicholls. "Missing data in a stochastic
Dollo model for binary trait data, and its application to the dating of
Proto-Indo-European." Journal of the Royal Statistical Society: Series C
(Applied Statistics) 60.1 (2011): 71-92.
Ryder, Robin J. "Phylogenetic Models of Language Diversification".
DPhil Diss. University of Oxford, UK, 2010.

Questions
otázky kesses
spørgsmåler cwestiwnau
pytania preguntes
preguntas vrae
kláusimai Fragen
вопросы quaestiones
întrebari questions
vragen !&
запитанні spurningar
domande spørsmåler
questões frågor
vprašanja