Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks

129 views 33 slides Sep 15, 2021

Introduction | Model Formalisation | Numerical Results

Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks
Salvatore Scognamiglio
Department of Management and Quantitative Studies, University of Naples "Parthenope"
XLV Annual Meeting of the AMASES (2021)
S. Scognamiglio 13 September 2021 1 / 33

Introduction
Mortality modelling: Lee and Carter (JASA 1992), Brouhns, Denuit and Vermunt (IME 2002), Renshaw and Haberman (IME 2006):
- can be applied to a single population.
Multi-population mortality modelling: Li and Lee (Demography 2005), Kleinow (IME 2015):
- generally applied to smaller subsets of data;
- usually intended for forecasting the mortality of similar populations;
- hard to fit (complex optimisation schemes, less well-known statistical techniques).
Large-scale mortality modelling: Richman and Wüthrich (AAS 2020), Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021):
- allows more accurate forecasting than the traditional models for a large set of populations;
- provides only point forecasts.

Large-Scale Mortality Modelling via neural networks
We develop a neural network model which describes the mortality dynamics of many different and potentially unrelated populations:
- individual stochastic mortality models are combined into a neural network environment which encourages information sharing among populations;
- the model parameters are jointly optimised in a single stage using all available information, instead of using population-specific subsets of data as in the traditional fitting schemes;
- the proposed model has very few, easy-to-interpret parameters and allows the uncertainty in the predictions to be measured;
- the parameter estimates appear more robust and the forecasting performance improves.
The full paper is available on SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3868303

The Lee-Carter Model
Let $\mathcal{X} = \{x_0, x_1, \ldots, x_\omega\}$ be the set of ages and $\mathcal{T} = \{t_0, t_1, \ldots, t_n\}$ the set of calendar years considered.
The Lee-Carter (LC) model defines the logarithm of the central death rate $\log(m_{x,t}) \in \mathbb{R}$ at age $x \in \mathcal{X}$ in calendar year $t \in \mathcal{T}$ as
$$\log(m_{x,t}) = a_x + b_x k_t + \epsilon_{x,t},$$
where:
- $a_x$ is the average (log) force of mortality at age $x$;
- $k_t$ is the overall mortality trend in calendar year $t$;
- $b_x$ is the age-specific rate of change of the force of mortality.
To avoid identifiability problems, the following constraints are imposed:
$$\sum_{x \in \mathcal{X}} b_x = 1, \qquad \sum_{t \in \mathcal{T}} \frac{k_t}{|\mathcal{T}|} = 0.$$
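The need for the two constraints can be seen from a short calculation. The constants $c \in \mathbb{R}$ and $d \neq 0$ below are introduced only for this illustration and do not appear in the slides; the LC predictor is unchanged under the following reparameterisation:

```latex
\begin{align*}
\tilde a_x = a_x - c\,b_x, \quad
\tilde b_x = \frac{b_x}{d}, \quad
\tilde k_t = d\,(k_t + c)
\;\Longrightarrow\;
\tilde a_x + \tilde b_x \tilde k_t
  &= a_x - c\,b_x + b_x\,(k_t + c) \\
  &= a_x + b_x k_t .
\end{align*}
```

Imposing $\sum_{x} b_x = 1$ fixes $d$, and imposing a zero mean on $(k_t)_t$ fixes $c$, so the parameters become uniquely determined.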

The Lee-Carter Model: Ordinary Least Squares (OLS) estimation
The Ordinary Least Squares (OLS) estimates of the parameters can be obtained by solving
$$\underset{(a_x)_x,\,(b_x)_x,\,(k_t)_t}{\arg\min} \sum_{x \in \mathcal{X}} \sum_{t \in \mathcal{T}} \left( \log(m_{x,t}) - a_x - b_x k_t \right)^2.$$
The $(a_x)_x$ are estimated as
$$\hat a_x = \log\left( \prod_{t \in \mathcal{T}} (m_{x,t})^{1/|\mathcal{T}|} \right),$$
while $(k_t)_t$ and $(b_x)_x$ are estimated as the first right and first left singular vectors in the Singular Value Decomposition (SVD) of the centred log-mortality matrix
$$M = \left( \log(m_{x,t}) - \hat a_x \right)_{x \in \mathcal{X},\, t \in \mathcal{T}} \in \mathbb{R}^{|\mathcal{X}| \times |\mathcal{T}|}.$$
In order to forecast, $(a_x)_x$ and $(b_x)_x$ are assumed to be constant over time and the time index $k_t$ is modelled as an ARIMA(0,1,0) process
$$k_t = k_{t-1} + \gamma + e_t, \quad \text{with i.i.d. } e_t \sim N(0, \sigma^2),$$
where $\gamma \in \mathbb{R}$.
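The SVD estimation and the random-walk forecast described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's code: the function names and the synthetic, noise-free input are assumptions made for this example, and the drift is estimated as the mean of the first differences of $k_t$.

```python
import numpy as np


def fit_lee_carter_svd(log_m):
    """Fit the LC model by SVD.

    log_m: array of shape (n_ages, n_years) holding log(m_{x,t}).
    Returns (a, b, k) rescaled so that sum(b) == 1; mean(k) == 0 follows
    from the row-centring of the matrix.
    """
    a = log_m.mean(axis=1)                 # a_x: time average of log(m_{x,t})
    centred = log_m - a[:, None]           # centred log-mortality matrix M
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    b = u[:, 0]                            # first left singular vector
    k = s[0] * vt[0]                       # first right singular vector, scaled
    scale = b.sum()                        # assumes sum(b) != 0
    return a, b / scale, k * scale


def forecast_random_walk_with_drift(k, horizon):
    """Point forecast of k_t under ARIMA(0,1,0) with drift."""
    drift = (k[-1] - k[0]) / (len(k) - 1)  # mean of the first differences
    return k[-1] + drift * np.arange(1, horizon + 1)


# Synthetic, noise-free example (all numbers are made up for illustration).
ages, years = 5, 20
a_true = np.linspace(-6.0, -2.0, ages)
b_true = np.full(ages, 1.0 / ages)
k_true = np.linspace(10.0, -10.0, years)
log_m = a_true[:, None] + np.outer(b_true, k_true)

a_hat, b_hat, k_hat = fit_lee_carter_svd(log_m)
k_future = forecast_random_walk_with_drift(k_hat, horizon=3)
```

On noise-free input of this form the fit recovers the generating parameters; with real data only the leading rank-one structure of $M$ is captured.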

Multi-population mortality modelling: the Individual Lee-Carter Approach
A simple way of modelling the mortality of a set of different populations $\mathcal{I}$ is to describe each population separately with its own LC model
$$\log(m^{(i)}_{x,t}) = a^{(i)}_x + b^{(i)}_x k^{(i)}_t + \epsilon^{(i)}_{x,t}, \quad \forall i \in \mathcal{I}.$$
This approach is sometimes called the Individual Lee-Carter (ILC) approach. In this case, the model is fitted individually for each $i \in \mathcal{I}$, and the population- and time-specific terms $k^{(i)}_t$ are projected with independent ARIMA(0,1,0) processes.

The Poisson Lee-Carter Model
The main drawback of the SVD approach is the assumption of homoskedastic errors (see Alho (NAAJ 2000)).
In Brouhns, Denuit and Vermunt (IME 2002), a maximum likelihood estimation based on a Poisson death count $D^{(i)}_{x,t}$ is proposed to allow for heteroskedasticity:
$$D^{(i)}_{x,t} \sim \text{Poisson}\left( E^{(i)}_{x,t}\, m^{(i)}_{x,t} \right) \quad \text{with} \quad m^{(i)}_{x,t} = e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t},$$
where $E^{(i)}_{x,t}$ is the exposure to risk at age $x$ at time $t$ in population $i$, and the classical LC constraints still hold for all $i \in \mathcal{I}$.
The model parameters can be estimated by solving
$$\underset{(a^{(i)}_x)_x,\,(b^{(i)}_x)_x,\,(k^{(i)}_t)_t}{\arg\max} \sum_{x \in \mathcal{X}} \sum_{t \in \mathcal{T}} \left( D^{(i)}_{x,t} \left( a^{(i)}_x + b^{(i)}_x k^{(i)}_t \right) - E^{(i)}_{x,t}\, e^{a^{(i)}_x + b^{(i)}_x k^{(i)}_t} \right) + c_i, \quad \forall i \in \mathcal{I},$$
where $c_i \in \mathbb{R}$.
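The objective above is straightforward to evaluate numerically. The following sketch computes the Poisson LC log-likelihood up to the additive constant $c_i$ for one population; the function name and all input values are made up for illustration, and the deaths are set equal to their expected value so that the generating parameters maximise the objective.

```python
import numpy as np


def poisson_lc_loglik(deaths, exposure, a, b, k):
    """Poisson LC log-likelihood for one population, constant dropped.

    deaths, exposure: arrays of shape (n_ages, n_years).
    a, b: arrays of shape (n_ages,); k: array of shape (n_years,).
    """
    eta = a[:, None] + np.outer(b, k)      # log(m_{x,t}) = a_x + b_x k_t
    return np.sum(deaths * eta - exposure * np.exp(eta))


# Made-up inputs: deaths set equal to their expected value E * m.
a = np.array([-5.0, -4.0, -3.0])
b = np.array([0.2, 0.3, 0.5])
k = np.array([1.0, 0.0, -1.0])
exposure = np.full((3, 3), 1000.0)
deaths = exposure * np.exp(a[:, None] + np.outer(b, k))

ll_true = poisson_lc_loglik(deaths, exposure, a, b, k)
ll_shifted_up = poisson_lc_loglik(deaths, exposure, a + 0.1, b, k)
ll_shifted_down = poisson_lc_loglik(deaths, exposure, a - 0.1, b, k)
```

Because each cell contributes $D\eta - E e^{\eta}$, cells with larger exposure carry more weight, which is how the Poisson formulation accommodates heteroskedasticity.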

Lee-Carter Model forecasting performance vs population size (Perla, Richman, Scognamiglio and Wüthrich (SAJ 2021))
[Figure: two panels (Male, Female); x-axis: Country; y-axis: log(MSE); models: LC_Poisson, LC_SVD.]
Figure: Forecasting Mean Squared Error (MSE) in log-scale of the LC model (LC_SVD) and the Poisson LC model (LC_Poisson) on the populations of the Human Mortality Database (HMD); fitting period 1950-1999; forecasting period 2000-2019; countries are sorted by population size in 2000.

USA mortality data (high-population country)
[Figure: two panels (Male, Female); x-axis: Age; y-axis: log(mx); colour scale: Year.]
Figure: Log mortality rates for different ages in the USA from 1950 to 2018. Source: Human Mortality Database (HMD).

LC model estimations using USA data (high-population country)
[Figure: three panels (ax, bx, kt); y-axis: value; models: LC_poisson, LC_SVD; gender: Female, Male.]
Figure: LC and Poisson LC model parameter estimates for USA mortality data.

Luxembourg Mortality Data (low-population country)
[Figure: two panels (Male, Female); x-axis: Age; y-axis: log(mx); colour scale: Year.]
Figure: Log mortality rates for different ages in Luxembourg from 1960 to 2018. Source: Human Mortality Database (HMD).

LC model estimations using Luxembourg data (low-population country)
[Figure: three panels (ax, bx, kt); y-axis: value; models: LC_poisson, LC_SVD; gender: Female, Male.]
Figure: LC and Poisson LC model parameter estimates for Luxembourg mortality data.

The Model formalisation
We simultaneously model the mortality of a set of populations $\mathcal{I}$ which differ from each other by region and gender, $i = (r, g) \in \mathcal{I} = \mathcal{R} \times \{\text{male}, \text{female}\}$.
The network model comprises three subnets that approximate the parameters of the LC model. Each of these subnets combines several kinds of neural network layers:
- the $a^{(i)}$-subnet uses embedding and fully-connected layers;
- the $b^{(i)}$-subnet uses embedding and fully-connected layers;
- the $k^{(i)}_t$-subnet uses fully-connected layers and/or other feed-forward layers.

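The $a^{(i)}$-subnet
The two embedding layers map $r \in \mathcal{R}$ and $g \in \mathcal{G}$ into real-valued vectors:
$$z^{(a)}_{\mathcal{R}} : \mathcal{R} \to \mathbb{R}^{q^{(a)}_{\mathcal{R}}}, \quad r \mapsto z^{(a)}_{\mathcal{R}}(r) = \left( z^{(a)}_{\mathcal{R},1}(r), z^{(a)}_{\mathcal{R},2}(r), \ldots, z^{(a)}_{\mathcal{R},q^{(a)}_{\mathcal{R}}}(r) \right)^\top,$$
$$z^{(a)}_{\mathcal{G}} : \mathcal{G} \to \mathbb{R}^{q^{(a)}_{\mathcal{G}}}, \quad g \mapsto z^{(a)}_{\mathcal{G}}(g) = \left( z^{(a)}_{\mathcal{G},1}(g), z^{(a)}_{\mathcal{G},2}(g), \ldots, z^{(a)}_{\mathcal{G},q^{(a)}_{\mathcal{G}}}(g) \right)^\top.$$
The vector $z^{(a)}_{\mathcal{I}} = z^{(a)}_{\mathcal{I}}(r, g) = \left( \left( z^{(a)}_{\mathcal{R}}(r) \right)^\top, \left( z^{(a)}_{\mathcal{G}}(g) \right)^\top \right)^\top \in \mathbb{R}^{q^{(a)}_{\mathcal{I}}}$ (with $q^{(a)}_{\mathcal{I}} = q^{(a)}_{\mathcal{R}} + q^{(a)}_{\mathcal{G}}$) is a learned representation of the population $i = (r, g)$.
It is further processed by a fully-connected network (FCN) layer which maps $z^{(a)}_{\mathcal{I}}$ into a new $|\mathcal{X}|$-dimensional real-valued space:
$$f^{(a)} : \mathbb{R}^{q^{(a)}_{\mathcal{I}}} \to \mathbb{R}^{|\mathcal{X}|}, \quad z^{(a)}_{\mathcal{I}} \mapsto f^{(a)}(z^{(a)}_{\mathcal{I}}) = \left( f^{(a)}_{x_0}(z^{(a)}_{\mathcal{I}}), f^{(a)}_{x_1}(z^{(a)}_{\mathcal{I}}), \ldots, f^{(a)}_{x_\omega}(z^{(a)}_{\mathcal{I}}) \right)^\top.$$
Each new feature $f^{(a)}_x(z^{(a)}_{\mathcal{I}})$ is an age-specific function of the vector $z^{(a)}_{\mathcal{I}}$:
$$z^{(a)}_{\mathcal{I}} \mapsto f^{(a)}_x(z^{(a)}_{\mathcal{I}}) = \phi^{(a)}\left( w^{(a)}_{x,0} + \sum_{l=1}^{q^{(a)}_{\mathcal{I}}} w^{(a)}_{x,l}\, z^{(a)}_{\mathcal{I},l} \right) = \phi^{(a)}\left( w^{(a)}_{x,0} + \left\langle w^{(a)}_x, z^{(a)}_{\mathcal{I}} \right\rangle \right), \quad x \in \mathcal{X},$$
where $\phi^{(a)} : \mathbb{R} \to \mathbb{R}$ is a (non-linear) activation function and $w^{(a)}_{x,l} \in \mathbb{R}$ are the network parameters.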

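The $b^{(i)}$-subnet
Similarly to the first subnet, the second one provides two embedding layers of size $q^{(b)}_{\mathcal{R}}, q^{(b)}_{\mathcal{G}} \in \mathbb{N}$,
$$z^{(b)}_{\mathcal{R}} : \mathcal{R} \to \mathbb{R}^{q^{(b)}_{\mathcal{R}}}, \quad r \mapsto z^{(b)}_{\mathcal{R}}(r) = \left( z^{(b)}_{\mathcal{R},1}(r), z^{(b)}_{\mathcal{R},2}(r), \ldots, z^{(b)}_{\mathcal{R},q^{(b)}_{\mathcal{R}}}(r) \right)^\top,$$
$$z^{(b)}_{\mathcal{G}} : \mathcal{G} \to \mathbb{R}^{q^{(b)}_{\mathcal{G}}}, \quad g \mapsto z^{(b)}_{\mathcal{G}}(g) = \left( z^{(b)}_{\mathcal{G},1}(g), z^{(b)}_{\mathcal{G},2}(g), \ldots, z^{(b)}_{\mathcal{G},q^{(b)}_{\mathcal{G}}}(g) \right)^\top,$$
and a $|\mathcal{X}|$-dimensional FCN layer which maps the population-specific vector $z^{(b)}_{\mathcal{I}} = z^{(b)}_{\mathcal{I}}(r, g) = \left( \left( z^{(b)}_{\mathcal{R}}(r) \right)^\top, \left( z^{(b)}_{\mathcal{G}}(g) \right)^\top \right)^\top \in \mathbb{R}^{q^{(b)}_{\mathcal{I}}}$ (with $q^{(b)}_{\mathcal{I}} = q^{(b)}_{\mathcal{R}} + q^{(b)}_{\mathcal{G}}$) into a $|\mathcal{X}|$-dimensional real-valued space:
$$f^{(b)} : \mathbb{R}^{q^{(b)}_{\mathcal{I}}} \to \mathbb{R}^{|\mathcal{X}|}, \quad z^{(b)}_{\mathcal{I}} \mapsto f^{(b)}(z^{(b)}_{\mathcal{I}}) = \left( f^{(b)}_{x_0}(z^{(b)}_{\mathcal{I}}), f^{(b)}_{x_1}(z^{(b)}_{\mathcal{I}}), \ldots, f^{(b)}_{x_\omega}(z^{(b)}_{\mathcal{I}}) \right)^\top.$$
Also in this case, each new component $f^{(b)}_x(z^{(b)}_{\mathcal{I}})$ is an age-specific function of $z^{(b)}_{\mathcal{I}}$:
$$z^{(b)}_{\mathcal{I}} \mapsto f^{(b)}_x(z^{(b)}_{\mathcal{I}}) = \phi^{(b)}\left( w^{(b)}_{x,0} + \sum_{l=1}^{q^{(b)}_{\mathcal{I}}} w^{(b)}_{x,l}\, z^{(b)}_{\mathcal{I},l} \right) = \phi^{(b)}\left( w^{(b)}_{x,0} + \left\langle w^{(b)}_x, z^{(b)}_{\mathcal{I}} \right\rangle \right), \quad x \in \mathcal{X},$$
with $\phi^{(b)} : \mathbb{R} \to \mathbb{R}$ and $w^{(b)}_{x,l} \in \mathbb{R}$.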

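The first two subnets in compact form
Denoting by $w^{(j)}_0 = (w^{(j)}_{x,0})_{x \in \mathcal{X}} \in \mathbb{R}^{|\mathcal{X}|}$ and $W^{(j)} = (w^{(j)}_{x,\mathcal{I}})^\top_{x \in \mathcal{X}} \in \mathbb{R}^{|\mathcal{X}| \times q^{(j)}_{\mathcal{I}}}$, $\forall j \in \{a, b\}$, the output of the first two subnets can be written in compact form:
$$f^{(a)}\left( z^{(a)}_{\mathcal{I}} \right) = \phi^{(a)}\left( w^{(a)}_0 + \left\langle W^{(a)}, z^{(a)}_{\mathcal{I}} \right\rangle \right) = \phi^{(a)}\left( w^{(a)}_0 + \left\langle W^{(a)}_{\mathcal{R}}, z^{(a)}_{\mathcal{R}}(r) \right\rangle + \left\langle W^{(a)}_{\mathcal{G}}, z^{(a)}_{\mathcal{G}}(g) \right\rangle \right),$$
$$f^{(b)}\left( z^{(b)}_{\mathcal{I}} \right) = \phi^{(b)}\left( w^{(b)}_0 + \left\langle W^{(b)}, z^{(b)}_{\mathcal{I}} \right\rangle \right) = \phi^{(b)}\left( w^{(b)}_0 + \left\langle W^{(b)}_{\mathcal{R}}, z^{(b)}_{\mathcal{R}}(r) \right\rangle + \left\langle W^{(b)}_{\mathcal{G}}, z^{(b)}_{\mathcal{G}}(g) \right\rangle \right),$$
where one could carry out the decomposition $W^{(j)} = \left( W^{(j)}_{\mathcal{R}}, W^{(j)}_{\mathcal{G}} \right)$ of the matrices of the FCN layers to distinguish the weights which refer to the region-specific and the gender-specific features.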
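The embedding-plus-FCN structure shared by the first two subnets can be sketched as a plain NumPy forward pass. All sizes, the random initial weights and the choice of `tanh` as activation are assumptions for this illustration only; the slides do not specify them, and in the actual model these arrays are learned by back-propagation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_regions, n_genders, n_ages = 4, 2, 6     # hypothetical sizes
q_r, q_g = 3, 2                            # hypothetical embedding sizes

# Embedding layers: one learnable row per region / per gender.
emb_region = rng.normal(size=(n_regions, q_r))
emb_gender = rng.normal(size=(n_genders, q_g))

# FCN layer: intercepts w_0 in R^{|X|} and weights W in R^{|X| x (q_r + q_g)}.
w0 = rng.normal(size=n_ages)
W = rng.normal(size=(n_ages, q_r + q_g))


def subnet(r, g, activation=np.tanh):
    """Return f(z_I) in R^{|X|} for the population i = (r, g)."""
    z = np.concatenate([emb_region[r], emb_gender[g]])   # z_I(r, g)
    return activation(w0 + W @ z)


out = subnet(r=1, g=0)                     # one value per age x
```

An embedding layer is simply a lookup table, so two populations that share a region reuse the same region row, which is one way the architecture shares information across populations.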

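The $k^{(i)}_t$-subnet
The first FCN layer maps $\log(m^{(i)}_t)$ into a $q_{z_1}$-dimensional real-valued space:
$$f^{(k_1)} : \mathbb{R}^{|\mathcal{X}|} \to \mathbb{R}^{q_{z_1}}, \quad \log(m^{(i)}_t) \mapsto f^{(k_1)}\left( \log(m^{(i)}_t) \right) = \left( f^{(k_1)}_1\left( \log(m^{(i)}_t) \right), \ldots, f^{(k_1)}_{q_{z_1}}\left( \log(m^{(i)}_t) \right) \right)^\top,$$
where each new feature component $f^{(k_1)}_s(\log(m^{(i)}_t))$ is a function of the mortality rates of all ages:
$$\log(m^{(i)}_t) \mapsto f^{(k_1)}_s\left( \log(m^{(i)}_t) \right) = \phi^{(k_1)}\left( w^{(k_1)}_{s,0} + \left\langle w^{(k_1)}_s, \log(m^{(i)}_t) \right\rangle \right), \quad s = 1, \ldots, q_{z_1},$$
where $w^{(k_1)}_{s,0} \in \mathbb{R}$ and $w^{(k_1)}_s \in \mathbb{R}^{|\mathcal{X}|}$ are parameters.
The second FCN layer of size $q_{z_2} = 1$ is a mapping
$$f^{(k_2)} : \mathbb{R}^{q_{z_1}} \to \mathbb{R}, \quad f^{(k_1)}\left( \log(m^{(i)}_t) \right) \mapsto f^{(k_2)}\left( f^{(k_1)}\left( \log(m^{(i)}_t) \right) \right) = \left( f^{(k_2)} \circ f^{(k_1)} \right)\left( \log(m^{(i)}_t) \right).$$
It extracts a single new feature
$$\left( f^{(k_2)} \circ f^{(k_1)} \right)\left( \log(m^{(i)}_t) \right) = \phi^{(k_2)}\left( w^{(k_2)}_0 + \left\langle w^{(k_2)}, \phi^{(k_1)}\left( w^{(k_1)}_0 + \left\langle W^{(k_1)}, \log(m^{(i)}_t) \right\rangle \right) \right\rangle \right),$$
where $w^{(k_2)}_0 \in \mathbb{R}$, $w^{(k_1)}_0 = (w^{(k_1)}_{s,0})_{1 \le s \le q_{z_1}} \in \mathbb{R}^{q_{z_1}}$, $w^{(k_2)} \in \mathbb{R}^{q_{z_1}}$ and $W^{(k_1)} = (w^{(k_1)}_s)^\top_{1 \le s \le q_{z_1}} \in \mathbb{R}^{q_{z_1} \times |\mathcal{X}|}$ are network parameters, and $\phi^{(j)}(\cdot) : \mathbb{R} \to \mathbb{R}$ for $j \in \{k_1, k_2\}$ are activation functions.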
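The two-layer mapping from a log-mortality curve to a single time index can be sketched as follows. The sizes, the random weights, the `tanh` activation in the first layer and the identity in the second are all assumptions chosen for this illustration; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

n_ages, q_z1 = 6, 4                        # hypothetical sizes

W_k1 = rng.normal(size=(q_z1, n_ages))     # W^{(k_1)}
w0_k1 = rng.normal(size=q_z1)              # w_0^{(k_1)}
w_k2 = rng.normal(size=q_z1)               # w^{(k_2)}
w0_k2 = rng.normal()                       # w_0^{(k_2)}


def k_subnet(log_m_t, act1=np.tanh, act2=lambda v: v):
    """Map one log-mortality curve (length |X|) to a scalar k_t."""
    hidden = act1(w0_k1 + W_k1 @ log_m_t)  # first FCN layer: q_z1 features
    return act2(w0_k2 + w_k2 @ hidden)     # second FCN layer: one feature


log_m_t = np.linspace(-7.0, -1.0, n_ages)  # made-up log-mortality curve
k_t = k_subnet(log_m_t)
```

Unlike the other two subnets, this one takes observed data as input rather than a population label, so the same weights produce a different $k^{(i)}_t$ for every population and year.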

Graphical summary of the model
[Figure legend: fully-connected layer; embedding layer; concatenation.]
Figure: Graphical representation of the neural network architecture for fitting ILC models.

Model Interpretation
Finally, an approximation of the log-mortality curve at time $t$ in population $i$ can be obtained as
$$\widehat{\log(m^{(i)}_t)} = f^{(a)}\left( z^{(a)}_{\mathcal{I}} \right) + f^{(b)}\left( z^{(b)}_{\mathcal{I}} \right) \cdot \left( f^{(k_2)} \circ f^{(k_1)} \right)\left( \log(m^{(i)}_t) \right),$$
where each age component is given by
$$\widehat{\log(m^{(i)}_{x,t})} = \underbrace{f^{(a)}_x\left( z^{(a)}_{\mathcal{I}} \right)}_{a^{(i)}_x} + \underbrace{f^{(b)}_x\left( z^{(b)}_{\mathcal{I}} \right)}_{b^{(i)}_x} \cdot \underbrace{\left( f^{(k_2)} \circ f^{(k_1)} \right)\left( \log(m^{(i)}_t) \right)}_{k^{(i)}_t}.$$
A simple interpretation of all the terms can be provided:
- $f^{(a)}_x\left( z^{(a)}_{\mathcal{I}} \right) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $a^{(i)}_x$ in the LC model;
- $f^{(b)}_x\left( z^{(b)}_{\mathcal{I}} \right) \in \mathbb{R}$ is a population- and age-specific term that plays the same role as $b^{(i)}_x$ in the LC model;
- $\left( f^{(k_2)} \circ f^{(k_1)} \right)\left( \log(m^{(i)}_t) \right) \in \mathbb{R}$ is a population- and time-specific term that plays the same role as $k^{(i)}_t$ in the LC model.

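Model Interpretation
Setting a linear activation $\phi^{(j)}(x) = x$, $\forall j \in \{a, b\}$, and expanding all the terms in the previous equation, some further interpretations can be given:
$$\widehat{\log(m^{(i)}_{x,t})} = \underbrace{\Big( \overbrace{w^{(a)}_{x,0}}^{\text{global } a_x} + \overbrace{\left\langle w^{(a)}_x, z^{(a)}_{\mathcal{I}} \right\rangle}^{\text{population effect}} \Big)}_{a^{(i)}_x} + \underbrace{\Big( \overbrace{w^{(b)}_{x,0}}^{\text{global } b_x} + \overbrace{\left\langle w^{(b)}_x, z^{(b)}_{\mathcal{I}} \right\rangle}^{\text{population effect}} \Big)}_{b^{(i)}_x} \cdot \underbrace{\left( f^{(k_2)} \circ f^{(k_1)} \right)\left( \log(m^{(i)}_t) \right)}_{k^{(i)}_t}.$$
- $w^{(a)}_{x,0}$ can be seen as a population-independent $a_x$ parameter;
- $\left\langle w^{(a)}_x, z^{(a)}_{\mathcal{I}} \right\rangle$ can be seen as a population-specific $a_x$ correction, which can be decomposed as
$$\underbrace{\left\langle w^{(a)}_x, z^{(a)}_{\mathcal{I}} \right\rangle}_{\text{population effect}} = \underbrace{\left\langle w^{(a)}_{x,\mathcal{R}}, z^{(a)}_{\mathcal{R}}(r) \right\rangle}_{\text{regional effect}} + \underbrace{\left\langle w^{(a)}_{x,\mathcal{G}}, z^{(a)}_{\mathcal{G}}(g) \right\rangle}_{\text{gender effect}};$$
- $w^{(b)}_{x,0}$ can be seen as a population-independent $b_x$ parameter;
- $\left\langle w^{(b)}_x, z^{(b)}_{\mathcal{I}} \right\rangle$ can be seen as a population-specific $b_x$ correction, which can be decomposed as
$$\underbrace{\left\langle w^{(b)}_x, z^{(b)}_{\mathcal{I}} \right\rangle}_{\text{population effect}} = \underbrace{\left\langle w^{(b)}_{x,\mathcal{R}}, z^{(b)}_{\mathcal{R}}(r) \right\rangle}_{\text{regional effect}} + \underbrace{\left\langle w^{(b)}_{x,\mathcal{G}}, z^{(b)}_{\mathcal{G}}(g) \right\rangle}_{\text{gender effect}}.$$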
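The additive decomposition into a global term, a regional effect and a gender effect follows from splitting the weight matrix by columns, and can be checked numerically. The sizes and random values below are assumptions for this illustration; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n_ages, q_r, q_g = 5, 3, 2                 # hypothetical sizes
z_region = rng.normal(size=q_r)            # z_R(r) for one region r
z_gender = rng.normal(size=q_g)            # z_G(g) for one gender g
w0 = rng.normal(size=n_ages)               # global a_x, one value per age
W = rng.normal(size=(n_ages, q_r + q_g))   # row x is the weight vector w_x

W_R, W_G = W[:, :q_r], W[:, q_r:]          # region / gender column blocks
regional_effect = W_R @ z_region
gender_effect = W_G @ z_gender
population_effect = W @ np.concatenate([z_region, z_gender])

a_population = w0 + population_effect      # a_x^{(i)} under linear activation
```

With a non-linear activation the three contributions are still added before the activation is applied, but the output is no longer a plain sum of separately interpretable effects.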

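Model fitting and forecasting
Denoting by $\psi$ the full set of the network model's parameters, it can be split into two groups:
- the population-specific parameters $z^{(a)}_{\mathcal{R}}(r), z^{(b)}_{\mathcal{R}}(r)$, $\forall r \in \mathcal{R}$, and $z^{(a)}_{\mathcal{G}}(g), z^{(b)}_{\mathcal{G}}(g)$, $\forall g \in \mathcal{G}$;
- the cross-population parameters $w^{(j)}_0, W^{(j)}$, $\forall j \in \{a, b, k_1\}$, and $w^{(k_2)}, w^{(k_2)}_0$.
These parameters are iteratively adjusted via the back-propagation algorithm to minimise a given loss function.
The resulting estimates $\hat\psi$ can be used to compute the Neural Network (NN) estimates of the LC parameters:
$$\hat a^{(i)}_{x,NN} = \phi^{(a)}\left( \hat w^{(a)}_{x,0} + \left\langle \hat w^{(a)}_{x,\mathcal{R}}, \hat z^{(a)}_{\mathcal{R}}(r) \right\rangle + \left\langle \hat w^{(a)}_{x,\mathcal{G}}, \hat z^{(a)}_{\mathcal{G}}(g) \right\rangle \right), \quad \forall x \in \mathcal{X}, \forall i \in \mathcal{I},$$
$$\hat b^{(i)}_{x,NN} = \phi^{(b)}\left( \hat w^{(b)}_{x,0} + \left\langle \hat w^{(b)}_{x,\mathcal{R}}, \hat z^{(b)}_{\mathcal{R}}(r) \right\rangle + \left\langle \hat w^{(b)}_{x,\mathcal{G}}, \hat z^{(b)}_{\mathcal{G}}(g) \right\rangle \right), \quad \forall x \in \mathcal{X}, \forall i \in \mathcal{I},$$
$$\hat k^{(i)}_{t,NN} = \phi^{(k_2)}\left( \hat w^{(k_2)}_0 + \left\langle \hat w^{(k_2)}, \phi^{(k_1)}\left( \hat w^{(k_1)}_0 + \left\langle \hat W^{(k_1)}, \log(m^{(i)}_t) \right\rangle \right) \right\rangle \right), \quad \forall t \in \mathcal{T}, \forall i \in \mathcal{I}.$$
Forecasting is performed assuming that $\hat a^{(i)}_{x,NN}$ and $\hat b^{(i)}_{x,NN}$ are constant over time, while $\hat k^{(i)}_{t,NN}$ is projected with a random walk with drift, $\forall i \in \mathcal{I}$.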
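Projecting the estimated time index with a random walk with drift also yields a simple measure of forecast uncertainty. The sketch below uses the textbook result that the h-step forecast variance is h times the innovation variance; it ignores the estimation error of the drift and is not necessarily how the paper builds its intervals. The function name and the input series are made up for illustration.

```python
import numpy as np


def rw_drift_forecast(k, horizon, z=1.96):
    """Forecast k_t with a random walk with drift.

    Returns (mean, lower, upper). The band ignores the estimation error
    of the drift and uses a normal quantile z (1.96 for roughly 95%).
    """
    diffs = np.diff(k)
    drift = diffs.mean()                   # estimate of the drift
    sigma = diffs.std(ddof=1)              # std of the innovations e_t
    steps = np.arange(1, horizon + 1)
    mean = k[-1] + drift * steps
    half_width = z * sigma * np.sqrt(steps)
    return mean, mean - half_width, mean + half_width


k_hat = np.array([4.0, 3.1, 1.9, 1.2, -0.1, -0.9, -2.2])  # made-up index
mean, lower, upper = rw_drift_forecast(k_hat, horizon=5)
```

Since $\hat a$ and $\hat b$ are held fixed, a band for $k_t$ translates into a band for each log-mortality rate by scaling its half-width by $|\hat b^{(i)}_x|$.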

Experiment Design: Human Mortality Database
Data description:
Human Mortality Database (HMD): we simultaneously consider all populations $i = (r, g) \in \mathcal{I}$ with $|\mathcal{I}| = 80$ (male and female populations of 40 countries) for calendar years in $\mathcal{T} = \{t \in \mathbb{N} : 1950 \le t \le 2018\}$.
Data partitioning:
- training data $\mathcal{T}_{train} = \{t \in \mathbb{N} : 1950 \le t \le 1999\}$;
- test data $\mathcal{T}_{test} = \{t \in \mathbb{N} : 2000 \le t \le 2018\}$.
We consider 3 different networks which differ from each other in the design of the $k^{(i)}_t$-subnet that processes the log-mortality curves:
1. LC_FCN employs a fully-connected layer;
2. LC_LCN employs a 1D locally-connected layer (local connectivity);
3. LC_CONV employs a 1D convolutional layer (local connectivity and parameter sharing).
See Chapter 9 of Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.
Table: number of parameters of each network.
Model     # parameters
LC_FCN    5,171
LC_LCN    2,771
LC_CONV   2,651

Model fitting: MSE minimisation
In the first stage, all the network models are fitted by minimising the Mean Squared Error (MSE).
The network training involves the minimisation of the following loss function:
$$L(\psi) = \sum_{x \in \mathcal{X}} \sum_{i \in \mathcal{I}} \sum_{t \in \mathcal{T}} \left( \log(m^{(i)}_{x,t}) - \left[ \phi^{(a)}\left( w^{(a)}_{x,0} + \left\langle w^{(a)}_x, z^{(a)}_{\mathcal{I}} \right\rangle \right) + \phi^{(b)}\left( w^{(b)}_{x,0} + \left\langle w^{(b)}_x, z^{(b)}_{\mathcal{I}} \right\rangle \right) \phi^{(k_2)}\left( w^{(k_2)}_0 + \left\langle w^{(k_2)}, \phi^{(k_1)}\left( w^{(k_1)}_0 + \left\langle W^{(k_1)}, \log(m^{(i)}_t) \right\rangle \right) \right\rangle \right) \right] \right)^2.$$
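Once the three subnet outputs are available, the loss above reduces to a sum of squared residuals over ages, populations and years. The sketch below evaluates it with NumPy broadcasting; the array shapes, the function name and the random inputs are assumptions for this illustration, and `a`, `b`, `k` stand in for the subnet outputs rather than being computed by a network.

```python
import numpy as np


def lc_sse_loss(log_m, a, b, k):
    """Sum of squared errors between log_m and a + b * k.

    log_m: shape (n_pops, n_ages, n_years); a, b: (n_pops, n_ages);
    k: (n_pops, n_years). a, b, k stand for the three subnet outputs.
    """
    fitted = a[:, :, None] + b[:, :, None] * k[:, None, :]
    return np.sum((log_m - fitted) ** 2)


rng = np.random.default_rng(3)
n_pops, n_ages, n_years = 2, 4, 5          # hypothetical sizes
a = rng.normal(size=(n_pops, n_ages))
b = rng.normal(size=(n_pops, n_ages))
k = rng.normal(size=(n_pops, n_years))
log_m = a[:, :, None] + b[:, :, None] * k[:, None, :]

loss_exact = lc_sse_loss(log_m, a, b, k)
loss_off = lc_sse_loss(log_m + 0.1, a, b, k)
```

Every cell enters with the same weight, which mirrors the homoskedasticity assumption of the SVD approach.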

Forecasting performance
Table: number of populations and ages in which each network beats the LC_SVD model; forecasting period 2000-2019; MSE values are in $10^{-4}$.
Model         MSE    # Populations   # Ages
LC_CONV_mse   3.41   52/80           83/100
LC_FCN_mse    3.25   59/80           84/100
LC_LCN_mse    3.22   60/80           84/100
LC_SVD        6.12

Estimates comparison
[Figure: one panel per country (40 panels, from USA to ISL); x-axis: Age; y-axis: value; models: LC_LCN_mse, LC_SVD; gender: Female, Male.]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(a^{(i)}_x)_{x \in \mathcal{X}}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.

Estimates comparison
[Figure: two panels (LC_LCN_mse, LC_SVD); x-axis: Age; y-axis: value; gender: Female, Male.]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(a^{(i)}_x)_{x \in \mathcal{X}}$, distinguishing by model; fitting period 1950-1999.

Estimates comparison
[Figure: one panel per country (40 panels, from USA to ISL); x-axis: Age; y-axis: value; models: LC_LCN_mse, LC_SVD; gender: Female, Male.]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(b^{(i)}_x)_{x \in \mathcal{X}}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.

Estimates comparison
[Figure: one panel per country (40 panels, from USA to ISL); x-axis: Year; y-axis: value; models: LC_LCN_mse, LC_SVD; gender: Female, Male.]
Figure: Comparison of the LC_LCN_mse and LC_SVD estimates of $(k^{(i)}_t)_{t \in \mathcal{T}}$ for all the populations considered; fitting period 1950-1999; countries are sorted by population size in 2000.

Model fitting: Poisson loss minimisation
Assuming a Poisson number of deaths $D^{(i)}_{x,t}$, we explore the use of the Poisson loss function to train the neural network models:
$$D^{(i)}_{x,t} \sim \text{Poisson}\left( E^{(i)}_{x,t}\, e^{m^{(i)}_{x,t}} \right),$$
where $m^{(i)}_{x,t}$ denotes, on this slide, the network output on the log scale:
$$m^{(i)}_{x,t} = \left( w^{(a)}_{x,0} + \left\langle w^{(a)}_x, z^{(a)}_{\mathcal{I}} \right\rangle \right) + \left( w^{(b)}_{x,0} + \left\langle w^{(b)}_x, z^{(b)}_{\mathcal{I}} \right\rangle \right) \left( w^{(k_2)}_0 + \left\langle w^{(k_2)}, \phi^{(k_1)}\left( w^{(k_1)}_0 + \left\langle W^{(k_1)}, \log(m^{(i)}_t) \right\rangle \right) \right\rangle \right).$$
In this setting, fitting the neural network models involves the minimisation of
$$L(\psi) = \sum_{x \in \mathcal{X}} \sum_{i \in \mathcal{I}} \sum_{t \in \mathcal{T}} \left( E^{(i)}_{x,t}\, e^{m^{(i)}_{x,t}} - D^{(i)}_{x,t}\, m^{(i)}_{x,t} \right) + c,$$
which corresponds to maximising the log-likelihood function under the Poisson assumption, with $c \in \mathbb{R}$.
We use the same HMD data; however, this time we exclude the Canadian populations since the data present several missing values in the exposure-to-risk time series. Here, we have $|\mathcal{I}| = 78$.
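The Poisson loss and its gradient with respect to the log-scale network output have a simple closed form, sketched below with the constant $c$ dropped. The function names and input values are made up for illustration; in practice a deep learning framework would compute the gradient automatically during back-propagation.

```python
import numpy as np


def poisson_loss(deaths, exposure, log_rate):
    """Negative Poisson log-likelihood, constant dropped: sum(E e^m - D m)."""
    return np.sum(exposure * np.exp(log_rate) - deaths * log_rate)


def poisson_loss_grad(deaths, exposure, log_rate):
    """Gradient w.r.t. each log-rate entry: expected minus observed deaths."""
    return exposure * np.exp(log_rate) - deaths


# Made-up inputs: deaths equal to their expected value at log_rate_true.
log_rate_true = np.array([[-5.0, -5.2], [-3.0, -3.1]])
exposure = np.array([[2000.0, 2100.0], [800.0, 780.0]])
deaths = exposure * np.exp(log_rate_true)

loss_at_true = poisson_loss(deaths, exposure, log_rate_true)
loss_perturbed = poisson_loss(deaths, exposure, log_rate_true + 0.2)
grad_at_true = poisson_loss_grad(deaths, exposure, log_rate_true)
```

The gradient is the difference between expected and observed deaths in each cell, so cells with large exposures dominate the parameter updates.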

Forecasting performance
Table: number of populations and ages in which each network beats the LC_Poisson and the LC_SVD models; forecasting period 2000-2019; MSE values are in $10^{-4}$.
                          LC_Poisson                LC_SVD
Model             MSE    # Populations   # Ages    # Populations   # Ages
LC_CONV_Poisson   3.02   57/78           83/100    64/78           83/100
LC_FCN_Poisson    3.07   55/78           83/100    63/78           83/100
LC_LCN_Poisson    2.89   61/78           83/100    67/78           83/100
LC_SVD            6.12
LC_Poisson        5.19

Forecasting MSEs of the LC_LCN_Poisson and the LC_Poisson models on different populations
                 Male                           Female
     Country     LC_LCN_Poisson   LC_Poisson    LC_LCN_Poisson   LC_Poisson
 1   USA          1.15             1.42          0.27             0.50
 2   RUS          2.19             8.35          2.23             5.89
 3   JPN          0.91             0.45          2.30             0.40
 4   DEUTW        0.63             0.80          0.23             0.35
 5   FRATNP       0.77             0.52          0.64             0.34
 6   ITA          0.49             0.58          0.94             0.24
 7   GBRTENW      0.74             1.11          0.66             0.38
 8   UKR          2.05             7.19          3.40             3.72
 9   ESP          0.80             1.72          0.63             1.27
10   POL          2.61             4.69          0.85             3.29
11   TWN          4.82            10.49          1.42             0.95
12   AUS          0.89             1.14          0.32             0.41
13   NLD          1.11             1.76          0.43             0.35
14   DEUTE        1.82             2.71          0.70             1.45
15   GRC          1.73             3.16          0.55             1.97
16   HUN          3.45             6.01          1.22             1.38
17   PRT          1.33             2.42          0.99             2.01
18   BLR          3.34            12.76          3.47            10.24
19   CZE          2.97             4.68          1.03             2.27
20   BEL          1.56             2.31          0.47             0.51
21   SWE          1.10             1.13          0.25             0.38
22   AUT          1.51             2.57          0.40             0.61
23   BGR          5.83            11.30          2.95             6.14
24   CHE          1.41             1.81          0.32             0.32
25   ISR          2.38             1.85          2.03             1.81
26   SVK          7.20            13.27          3.20             2.54
27   DNK          2.01             2.27          0.53             0.42
28   FIN          3.74             3.73          0.82             1.10
29   GBR_SCO      1.69             1.97          0.41             0.67
30   NOR          2.11             3.50          0.71             0.51
31   IRL          3.40             7.82          1.52             2.23
32   LTU          6.59             9.37          9.54             7.60
33   NZL_NM       2.50             4.19          0.70             1.19
34   LVA         10.38            11.37          3.00             3.57
35   SVN         10.18            69.32          2.01             4.77
36   GBR_NIR      5.75             8.21          1.62             1.80
37   EST         16.05            18.88          3.49             6.88
38   LUX         15.90            43.12          5.42             6.74
39   ISL         19.17            19.98          7.56             7.40

Projected log-mortality surface for the Luxembourg male population
Figure: Log-mortality surfaces projected by the LC_Poisson and LC_LCN_Poisson models for the Luxembourg male population.

Future works
Future research intends to
1. analyse the performance of the proposed model on other available data sources, such as the United States Mortality Database (USMDB) and insurance portfolio data;
2. investigate the use of neural networks for fitting other stochastic mortality models:
   - the single-population models belonging to the family of Generalized Age-Period-Cohort (GAPC) models;
   - the multi-population extensions of the LC model: Li and Lee (Demography 2005), Kleinow (IME 2015);
3. explore the potential of the proposed large-scale mortality model in actuarial evaluations and longevity risk management.
For advice or comments:
[email protected]