How is Data Science going to Improve Insurance?

jonsedar 479 views 107 slides Apr 19, 2016

About This Presentation

An overview of the applications for data science in insurance: (1) Intelligent use of external data, (2) Advanced but interpretable statistical modelling, (3) Careful use of exotic machine learning. As presented at QCon London 2016 Conference.


Slide Content


HOW IS DATA SCIENCE GOING TO
IMPROVE INSURANCE?

MICHAEL CRAWFORD &
JONATHAN SEDAR
APPLIED AI LTD

HOW ARE WE ALREADY USING
DATA SCIENCE TO IMPROVE
INSURANCE?

THREE STORIES FROM OUR
EXPERIENCES OVER THE PAST
TWO YEARS.

1. CURATING EXTERNAL DATA TO BETTER
UNDERSTAND CUSTOMERS

2. BRINGING STATISTICAL MODELING INTO THE
BOARDROOM

3. APPLYING MORE EXOTIC MACHINE LEARNING TO THE
BUSINESS

APPLIED AI IS A DATA SCIENCE
CONSULTANCY

WE HELP LEADING ... A COMPETITIVE
ADVANTAGE ... APPLIED ARTIFICIAL
INTELLIGENCE

WHO ARE WE?

A SMALL BUT EXPERIENCED TEAM WITH EXPERTISE IN

STATISTICS, MACHINE LEARNING, ACTUARIAL SCIENCE,

SOFTWARE DEVELOPMENT, FINANCIAL SYSTEMS AND
CONSULTANCY.

Michael Crawford

Actuarial Science | Financial Systems | Software
Dev

Jonathan Sedar

Machine Learning | Physical Sciences |
Consulting

A Team of Expert Practitioners

Quantitative Finance | Statistics | Software Development |
Insurance

FOUND

WE MET IN THE PUB
... NOT A MILLION MILES ...
WE BONDED OVER BEER & GENERAL GE...

AND THOUGHT THAT MACHINE LEARNING &
INSURANCE WERE A NATURAL FIT

LET'S ASK INSURANCE COMPANIES

SO WHAT HAPPENED
NEXT?

PART ONE

CURATING EXTERNAL DATA TO BETTER
UNDERSTAND CUSTOMERS

OR WHAT HAPPENED WHEN WE CUT EXPERIAN OUT
OF THE LOOP

CLUSTERING
INTROSPECTION
VISUALISATION

BACKGROUND
WE WORK MAINLY WITH INSURANCE COMPANIES
THEY DON'T HAVE A REPUTATION FOR BEING EXCITING
BUT FROM A DATA SCIENCE POINT OF VIEW...


THE PROBLEM

[Image: policy document of The Great Eastern Life Assurance Co., Ltd (incorporated in the Straits Settlements), Head Office: Singapore]

TERM INSURANCES

WHEN AN INSURANCE COMPANY SETS A POLICY UP:

+ IT PAYS A COMMISSION TO THE BROKER WHO SOLD
THE POLICY

+ IT MAY HAVE TO SEND YOU FOR A MEDICAL - AND
PAY

+ IT INCURS ADMINISTRATION & REGULATORY
EXPENSES

IT'S 2 YEARS BEFORE THE INSURER IS IN THE BLACK
SO THEY REALLY WANT YOU TO STICK AROUND

THE THING IS

PEOPLE DON'T WANT TO STICK AROUND!
+ IN THE RECESSION THEY WERE DROPPING LIKE FLIES
+ WE WERE ASKED IF WE COULD FIGURE OUT WHY
+ AND TRY TO FIND WAYS TO REDUCE IT
+ We used survival analysis (of which more later)
+ After a few weeks we had a good model

ALONG THE WAY WE NOTICED SOMETHING

... WAS SKY-HIGH IN
... NEW ESTATES

GEOGRAPHIC
EFFECTS

HOW CAN WE USE THESE
EFFECTS?

+ TO ENCOURAGE CUSTOMERS TO STAY

+ TO HELP PRICE RISK

+ IDENTIFY NEW MARKETS

WE CALLED EXPERIAN...

IT WAS A SHORT CONVERSATION

I'D LIKE SOCIOECONOMIC INFORMATION FOR 250K
ADDRESSES

HOW MUCH!?

... CHEAPER & BETTER
OURSELVES

GEOCODING

FIRST LET'S GEOCODE OUR ADDRESSES
WE HAD TWO CHOICES:
+ USE GOOGLE - WHICH YOU PAY FOR
+ USE NOMINATIM - FOSS / ROLL YOUR OWN
WE TRIED BOTH:
+ FOR IRELAND, GOOGLE IS BETTER
+ MAINLY BECAUSE ...

... ADDRESSES ARE
PATHOLOGICAL!

BUT WE NOW HAVE A LAT / LONG FOR EACH CLIENT


SHOPPING FOR DATA

IT IS ASTOUNDING
... 700 ... FEATURES

[Table: census themes and subjects, including social class, commuting, health, occupation and industries]

SMALL AREA MAPS

THE SMALLEST OUTPUT AREA FOR CENSUS DATA
+ ~20,000 SMALL AREAS COVERING IRELAND
+ EACH COVERS APPROX. 200 PEOPLE
+ EACH CENSUS FEATURE AVAILABLE AT THIS LEVEL

THIS IS THE POWER WE WERE
LOOKING FOR!

BUT ALSO:

+ WE WOULD HAVE THE ...
+ WE COULD INTEGRATE ... PROJECT
+ WE COULD TUNE IT TO FIT OUR PARTICULAR NEEDS

NOT A TRIVIAL TASK

IT'S HARD TO MAKE SENSE OF THIS MUCH DATA:
+ THERE ARE 18,488 SMALL AREA MAPS
+ EACH SMALL AREA MAP IS REPRESENTED BY A ROW
+ EACH ROW HAS 767 ENTRIES, ONE FOR EACH
FEATURE

WHAT WE HAVE IS A
REALLY BIG MATRIX

WITH 18,488 ROWS & 767 COLUMNS

LINEAR ALGEBRA
TO THE RESCUE

+ ... SO WE CAN DESCRIBE BOTH USING A SINGLE COLUMN
(WITH SOME MINOR LOSS OF INFO)

+ SVD LETS US SHRINK DOWN TO
100 COLUMNS
+ AND RETAIN 80% OF THE FULL INFORMATION
+ THIS IS STILL PRETTY HARD TO VISUALISE
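
The column-shrinking step can be sketched as follows. This is a minimal illustration, assuming a truncated singular value decomposition on a centred matrix; the synthetic data, the function name `reduce_with_svd` and the 80% target are stand-ins, not the census matrix or the code used in the project.

```python
import numpy as np

def reduce_with_svd(X, target_info=0.80):
    """Keep the fewest singular directions that retain `target_info` of the variance."""
    Xc = X - X.mean(axis=0)                      # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)   # cumulative share of variance
    k = int(np.searchsorted(explained, target_info) + 1)
    return Xc @ Vt[:k].T, k, explained[k - 1]

# Synthetic stand-in: 500 rows, 40 correlated columns driven by 5 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 40))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 40))

Z, k, kept = reduce_with_svd(X)
print(f"{X.shape[1]} columns reduced to {k}, retaining {kept:.0%} of the variance")
```

Correlated columns collapse into shared directions, which is why a wide matrix can often be described with far fewer columns.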

VISUALISATION AND
DATA EXPLORATION

HUMAN INTERPRETATION IS
OFTEN VITAL

WE WANT TO SEE STRUCTURE IN THE DATA, BUT
VIEWING 100 DIMS IS STILL TRICKY

+ USE UNSUPERVISED LEARNING

+ T-DISTRIBUTED STOCHASTIC NEIGHBOR EMBEDDING
(T-SNE)

+ CREATE A 2D REPRESENTATION OF ND SPACE
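
A 2D embedding of this kind might look like the sketch below. It assumes scikit-learn's `TSNE`; the three synthetic groups and the perplexity value are illustrative choices, not the settings used on the census data.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in: 150 points in 20 dimensions, drawn around 3 hidden centres.
rng = np.random.default_rng(0)
centres = rng.normal(scale=10.0, size=(3, 20))
labels = np.repeat(np.arange(3), 50)   # known classes, NOT given to t-SNE
X = centres[labels] + rng.normal(size=(150, 20))

# Fit on the features only; the labels are kept aside for colouring a scatter plot.
embedding = TSNE(n_components=2, perplexity=30.0, init="pca",
                 random_state=0).fit_transform(X)
print(embedding.shape)
```

Colouring the 2D points by `labels` afterwards is one way to overlay known classes on an embedding that never saw them.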

OVERLAY KNOWN CLASSES

+ Instantly see
grouping in the
regions

+ The tSNE was
fitted using full
census data, but
not the region id

+ Yet we see the
regions have
similarity

AGGLOMERATIVE HIERARCHICAL
CLUSTERING

+ Group nearby
datapoints into
progressively
larger clusters

+ Get a nested
hierarchy of
clusters

+ Choose your
level
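
The idea of building one hierarchy and then choosing a level can be sketched with SciPy. The Ward linkage and the toy blobs are assumptions made for illustration; the slides do not say which linkage was used.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in: three tight blobs of 20 points each.
rng = np.random.default_rng(1)
blobs = [rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in ((0, 0), (5, 5), (0, 5))]
X = np.vstack(blobs)

# Build the full merge tree once, then cut it at different levels.
tree = linkage(X, method="ward")
coarse = fcluster(tree, t=2, criterion="maxclust")   # 2 clusters
fine = fcluster(tree, t=3, criterion="maxclust")     # 3 clusters, nested in the 2

print(len(set(coarse)), len(set(fine)))
```

Because both cuts come from the same tree, every fine cluster sits entirely inside one coarse cluster.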

INTERESTING STRUCTURE!

+ Clustering was
entirely
unsupervised

+ i.e. determined
only by the data
itself

+ Now we need to
understand what
the clusters mean

FIRST TRICK: MA

IF YOU LIVE IN ... YOU CAN MAKE A GOOD
GUESS

INTERPRETING THE CLUSTERS

+ Custom
visualisation of
raw feature
proportions per
cluster

+ Requires lots of
careful evaluation

+ Incorporate
expert opinion

RESULT
VALUABLE GEO-
SOCIOECONOMIC

INDICATORS FOR USE

IN OTHER PROJECTS

"WHAT'S OUR CHURN RATE?"

SEVERAL DIFFERENT WAYS TO MEASURE CHURN
PARTICULARLY IMPORTANT IN LIFE INSURANCE

[Image: Eureka Life Insurance Company of Baltimore, Md., Infantile Bonus Additional Policy, Table A: amount payable if death occurs after the policy has been in force for the stated periods, by weekly premium]

SURVIVAL ANALYSIS

A WELL-STUDIED PART OF MEDICAL STATISTICS
BUT NOT USUALLY DISCUSSED IN MACHINE LEARNING
SEEMS TO HAVE BEEN OVERLOOKED

THE BASIC IDEA:

+ Take a cohort at
some start date

+ Count the proportion
remaining at
subsequent dates

+ Simply draw or fit
the line

Time until first lapse for protection policies

SIMPLEST VERSION: KAPLAN-
MEIER

+ Simple count-based
description of
events

+ Draw the line
per group

+ Prediction not
possible
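
The count-based estimate can be written in a few lines of plain Python. This is a sketch of the standard Kaplan-Meier product-limit estimator; the six policies and their durations are made-up values, not client data.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time where at least one lapse was observed."""
    events = sorted(zip(durations, observed))
    at_risk = len(events)      # policies still in force just before time t
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        lapses = leaving = 0
        while i < len(events) and events[i][0] == t:
            lapses += 1 if events[i][1] else 0   # censored policies are not lapses
            leaving += 1
            i += 1
        if lapses:
            survival *= 1.0 - lapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical cohort: months until lapse; False means still in force (censored).
durations = [3, 5, 5, 8, 12, 12]
observed = [True, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"month {t:>2}: {s:.3f} of the cohort remaining")
```

Running the same function once per group gives one line per group, but the result only describes the observed cohort; it takes no features as input, which is why it cannot predict for new customers.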

A MODEL-BASED VERSION: COX
PROPORTIONAL HAZARDS

+ Fit semi-parametric
linear model

+ Learn effect of
feature values

+ Make predictions
on new data

OF SURVIVAL OVER

INCREASE THE COMPLEXITY TO
BETTER FIT THE REAL-WORLD

Survival
Regression

+ Allow hazards to
vary with time:
Aalen Additive

+ Bayesian
approach:
Gaussian
Processes

THE BEST MODELS
PRESERVE
UNCERTAINTY

BAYESI...

A NATURAL ALTERNATIVE TO TRADITIONAL
(FREQUENTIST) STATISTICS

MADE POSSIBLE THROUGH COMPUTATIONAL POWER

FAR TOO MUCH DETAIL FOR HERE


BIGGEST WIN: MODEL
INTROSPECTION

Design structure of
model to reflect real
processes &
relationships

Inform the model
using prior
knowledge

Use qualified results
and predictions
Maintain uncertainty

STATE OF THE ART ACADEMIC
TOOLS & TECHNIQUES OPEN TO
ALL

+ Basic non-sampling
frameworks: scikit-learn,
arm, nlme etc.

+ Advanced model
frameworks and
samplers: STAN,
PyMC3, emcee

+ Probabilistic graphical
models: gRain, pgmpy

...IVE POLICY CLAIMS
... THE PAST 10 YEARS,
...IDE FOR CLAIMS
... COHORT, AND TO ...

...IS FOR LIFE INSURANCE
... WHICH PROSPECTIVE
... RE-SCREENING, AND
... ON NEW EVIDENCE?

"OUR COMPANY HAS BEEN USING
DATA FOR YEARS"
SQL & SPREADSHEETS
BI DASHBOARDS
EXPERT RULE-BASED SYSTEMS

WE'RE TALKING ABOUT
"INTELLIGENTLY LEARNING FROM DATA"

TRAINING MODELS ...

DISCOVERING CORRELATIONS, ... OF BEHAVIOR

... BUSINESS MORE EFFECTIVELY

SELLING INSIGHTS AND DATA PRODUCTS BACK INTO
THE MARKET

EXTRACTING INFORMATION FROM ... BIG DATA
YOU'RE COLLECTING

NLP UPON UNCONVENTIONAL
DATA SOURCES

Succession Plan

kenneth [email protected] No

It is my great pleasure to announce that the Board has accepted my recommendation to appoint Jeff Skilling as chief executive officer, effective at the time of their next board meeting on February 12, 2001. Jeff will also retain his duties as president and chief operating officer. I will continue as chairman of the Board and will remain at Enron, working with Jeff on the strategic direction of the company and our day-to-day global operations.

Jeff will assume the role at a time when the company is hitting on all cylinders and is positioned for phenomenal growth. He is clearly ready for the job, and after 15 years as CEO of this great company, I'm ready for a somewhat different role.

Our succession plan has been clear for some time. This has afforded Jeff and me the freedom to combine our strengths toward building a company that continues to exceed everyone's expectations, including our own. We look forward to furthering that relationship as Jeff expands his role.

There are no plans for any other changes in our management team nor broad shifts in strategy and direction.

Please join me in congratulating Jeff. I look forward to a great 2001.

AFTER CAREFUL CLEANING AND
PREPROCESSING

[Image: scanned "Office Memorandum, United States Government" document]

MODEL THE TOPICS OF
CONVERSATION WITHIN A
CORPUS

+ Topic modelling (Latent
Dirichlet Allocation)

+ Cluster words into
topics by their
co-occurrence

+ Gain a new way to
define the contents of
documents


USE TOPICS AS A METHOD TO
INTROSPECT THE HEARTBEAT OF
COMMUNICATIONS

+ View trends of topics
by time

+ Correlate topics to
internal / external
events

+ Use topics in other
models e.g. predict
customer satisfaction
& longevity according
to topics

USE THE FROM: AND TO: TO
CREATE A NETWORK / GRAPH

+ Nodes are email
accounts / people

+ Edges are the
communications
between them

+ Overall counts /
frequency / time of day /
season
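
The graph construction can be sketched with the standard library alone. The message records, field names and addresses below are hypothetical; real mail headers would need parsing first.

```python
from collections import Counter

# Hypothetical parsed headers: sender, recipients and hour of day sent.
messages = [
    {"from": "alice@example.com", "to": ["bob@example.com"], "hour": 9},
    {"from": "alice@example.com", "to": ["bob@example.com", "carol@example.com"], "hour": 14},
    {"from": "bob@example.com", "to": ["alice@example.com"], "hour": 9},
]

def build_edges(messages):
    """Count messages per directed (sender, recipient) pair."""
    edges = Counter()
    for msg in messages:
        for recipient in msg["to"]:
            edges[(msg["from"], recipient)] += 1
    return edges

edges = build_edges(messages)
nodes = {address for pair in edges for address in pair}   # accounts / people
by_hour = Counter(msg["hour"] for msg in messages)         # time-of-day profile

print(sorted(nodes))
print(edges.most_common())
```

The edge counts can be passed to a graph library for layout or centrality measures, with per-edge attributes such as hour or season added in the same way.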

ATTACH THOSE TOPICS FROM
BEFORE TO GAIN A RICH PICTURE
OF COMMUNICATIONS

+ Non-invasive
introspection of the
state of the
organisation

+ Relevant to networks
of brokers,
employees,
customers, reinsurers
etc.

OUTLIER / FRAUD DETECTION

+ Describing data in a
vector space and
assigning outlier flags to
unusual points

+ (or use graphs / trees)

+ Triage these first in
fraud or operations
investigations
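
One simple way to flag unusual points in a vector space is the Mahalanobis distance from the centre of the data, sketched here with NumPy. The technique, the threshold of 4.0 and the planted outlier are illustrative assumptions; the slides do not name a specific method.

```python
import numpy as np

def mahalanobis_flags(X, threshold):
    """Distance of each row from the data centre, scaled by the covariance."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))
    distance = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
    return distance, distance > threshold

# Synthetic stand-in: 200 ordinary records plus one planted anomaly in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[0] = [8.0, -8.0, 8.0]

distance, flags = mahalanobis_flags(X, threshold=4.0)
triage_order = np.argsort(-distance)   # most unusual records first
print(triage_order[:5], int(flags.sum()))
```

Sorting by distance gives investigators a ranked queue, so the most unusual cases are looked at first.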

SUPERVISED CLASSIFICATION

+ Good old-fashioned
prediction

+ Product-market fit

+ Operational
improvements

+ Meeting compliance
etc.

INTERACTIVE DASHBOARDING

+ Surface descriptive &
predictive insights to
the business to enable
better decision making

+ Lightweight Javascript
frameworks are
powerful and easy to
use


IN SUMMARY

DATA SCIENCE IS APPLICABLE
THROUGHOUT THE INSURANCE BUSINESS

WE ARE PROVING THIS EVERY DAY THROUGH OUR
PROJECTS

1. CURATING EXTERNAL DATA TO
BETTER UNDERSTAND
CUSTOMERS

2. BRINGING STATISTICAL
MODELING INTO THE
BOARDROOM

3. APPLYING MORE EXOTIC
MACHINE LEARNING ACROSS
THE BUSINESS
