2023 RS_3 - Data Analysis Methodologies.pdf

emanaserges55 8 views 140 slides Feb 27, 2025

About This Presentation

This document helps us understand the concept of road safety


Slide Content

Data Collection and Analysis
Methodologies
Road Safety
A.A. 2023-2024
Module 3
Stephen Kome

Mod. 3: Data Analysis Methodologies
•3-0 Road Safety Data Collection
•3-1 Basic statistical concepts
•3-2 Contingency Tables
•3-3 Safety Performance Functions
Slide 2Mod. 3

3-0 - ROAD SAFETY DATA COLLECTION
Slide 3

•Reliable and harmonised road accident data are crucial
for defining evidence-based road safety policies,
monitoring performance and assessing results
•Data on infrastructures, traffic (exposure), accident
costs, Safety Performance Indicators are also needed
•Also road users must be involved in the information
collection and planning process
•European Union invested a lot of resources in improving
quality and availability of accident data, mainly through
dedicated research projects
•Observatories of different levels (Continental, National,
Regional, Urban) are fundamental tools
Page 4
The problem of accident data

Why collect data?
•Define problems
•Identify risk factors and priorities
•Formulate strategy
•Set targets
•Monitor performance
Slide 5

Which data to collect?
Only 22% of countries are able to provide information on road traffic
fatalities, non-fatal injuries, economic impact and selected SPIs (WHO)
Slide 6

SOURCE AND TYPES OF
DATA
Slide 7

Who are the main sources?
•Police
•Health authorities
•Transport bodies
•Other stakeholders may include:
–national statistics office,
–the insurance industry,
–non-governmental organizations working for road
safety,
–academic institutions…
Slide 8

Sources and types of data (1)
WHO, 2004
Slide 9Mod. 2

Sources and types of data (2)
WHO, 2004
Slide 10Mod. 2

QUALITY OF ACCIDENT DATA
Slide 11

What affects crash data quality?
1.Definitions
2.Reporting/under-reporting of crashes or
injuries
3.Missing data
4.Errors
Slide 12

The definition of road accident fatality
•The classification of the severity of injuries
and crashes varies among countries.
•The range of injury severity categories that
may be used by health professionals or police
officers includes slight/minor, moderate,
serious/severe, and fatal
•The recommended definition of a road traffic
fatality is (WHO 2009):
–“any person killed immediately or dying within 30 days
as a result of a road traffic injury accident, excluding
suicides”
Slide 13

How many countries use ‘died within
30 days’ definition?
Less than half of the 178 countries monitored by WHO use the
recommended definition of a road traffic fatality (WHO, 2009)
Slide 14
When a road fatality is not defined in this way, the reported number
of fatalities can be made more accurate by multiplying it by an
appropriate adjustment factor, which depends on the definition used
(as recommended by the European Conference of Ministers of Transport).

Under-reporting of road accidents
•Not all crashes and injuries that occur are
documented in the data system (particularly
slight-injury and property-damage-only (PDO) crashes).
•Reasons for under-reporting are:
–Police may not be informed when a crash occurs
–Police do not always go to the scene
–Police may go to the crash scene, but not formally
register the crash
–Some data are missing
–Data are not transmitted to Statistical Offices
–Follow-up of health consequences is not monitored
–Victims die more than 30 days after the crash
Slide 15

Methods for assessing under-reporting
•Compare the number of police reports filed on
certain events to those captured in the database.
•Compare the number of road traffic fatalities
and/or injuries counted by one data source,
usually the police database, to those counted in a
survey.
•Compare the number of road traffic fatalities
and/or injuries counted in the police database to
the number counted in other databases.
•Use linkage or capture-recapture methods to
match records from different databases
Slide 16
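As an illustration of the capture-recapture idea, here is a minimal sketch using the two-source Lincoln-Petersen estimator. The choice of this particular estimator, the function name and the counts are assumptions made for illustration; the slides do not specify a method or give data. The estimator also assumes that the two sources are independent and that records are matched without error.

```python
def lincoln_petersen(n_police: int, n_hospital: int, n_matched: int) -> float:
    """Estimate the true number of casualties from two overlapping sources.

    n_police   -- casualties found in the police database
    n_hospital -- casualties found in the hospital database
    n_matched  -- casualties found in both (linked records)
    """
    if n_matched <= 0:
        raise ValueError("at least one matched record is required")
    return n_police * n_hospital / n_matched


# Hypothetical counts, for illustration only (not from the slides).
police, hospital, matched = 400, 500, 250

estimated_total = lincoln_petersen(police, hospital, matched)
police_reporting_level = police / estimated_total

print(f"Estimated total casualties: {estimated_total:.0f}")
print(f"Police reporting level: {police_reporting_level:.0%}")
```

The reporting level is simply the share of the estimated total that the police database captured.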

Mean level of accident reporting by
injury severity
Elvik & Mysen, 1999
Slide 17

Reporting of road accident injuries in
different countries
[Bar chart: official crash statistics as a percentage of hospital data (axis 0 to 100). Countries: Australia, Canada, Denmark, Germany, Netherlands, Norway, USA; bar values in extraction order: 64, 88, 21, 39, 43, 37, 49. Source: Elvik, 2004]
Slide 18

Page 19
In African Countries
Source: WHO, SAFERAFRICA Project

Page 20
In African Countries
[Bar chart: reported vs. estimated fatality rates per million population (2013); axis 0 to 400.
Countries: Congo, Dem. Rep.; Central African Republic; Gabon; Congo, Rep.; Cameroon; Chad; Sao Tome and Principe; Angola; Central Africa.
Reported rates, in extraction order: 17, 12, 27, 47, 48, 116, 181, 238, 85.
Estimated rates, in extraction order: 332, 324, 229, 264, 276, 241, 311, 269, 281.]
Source: WHO, SAFERAFRICA Project

Actions On Data Loss
Slide 21
Lost Information | Direct/Indirect Actions
Accidents not detected by the police | Collect health and insurance statistics
Accidents involving only property damage | Collect data on all accidents by police authorities
Accidents not reported | Local control of reporting procedures
Data not collected in the field | Use of innovative IT tools; redefine collected data
Health consequences monitoring not controlled | Local control of information exchange procedures with AUSL (local health services)
Deaths occurring 30 days after the accident | Collect health statistics data
Field collection errors; transcription errors on ISTAT forms | Use of innovative IT tools
Data difficult to collect | Use of innovative IT tools

ITALY CASE STUDY
Slide 22

Example: Accident data collection in
Italy
•Road accidents are recorded by 3 police
bodies:
–National Traffic Police
–Local Police
–Carabinieri
•The data related to injury accidents
collected from each body are sent to ISTAT
•During the process, some data are lost
Slide 23

Police bodies in Italy
Accident type | Urban area | Outside urban area
Property Damage Only (PDO) accidents | Local Police (on request: Traffic Police and Carabinieri) | Traffic Police, Carabinieri (exception: Local Police)
Injury accidents | Local Police, Traffic Police, Carabinieri | Traffic Police, Carabinieri
Slide 24

Accident data collection in Italy
Slide 25
[Flow diagram. Nodes: National Traffic Police; Carabinieri; Local Police; Police Data Center; Carabinieri Headquarters; Local Statistics Office; Provincial Monitoring Centers; ISTAT Regional offices; National Institute of Statistics - ISTAT Headquarters]

The ISTAT Ctt/Inc accident form
•Location
•Vehicles involved
•Crash conditions
•Injuries
•Drivers
•Occupants
Up to 200 «variables» per accident
Slide 26

ITS for Data Collection
Page 27

•Creation and implementation of traffic
accident databases and of an information
system for road safety at national level
•Creation of the National Centre for Analysis of
Traffic Accidents
Coordinated by CTL
Main partners: IBSR,
IT, SWOV
Page 28
Example of good practices: Cameroon

Page 29
The actors involved

Page 30
The network architecture

Page 31
The modules
Sfinge©
•Statistical analysis
•Collection and management data module
•Documentation module
•Authentication, roles and security module
•Plan Module
•Online help
•Integration and validation data
•Hospital services Module
•Geographic Images module

Page 32
Screenshots

Page 33
Automatic location

Page 34
Mapping

Page 35
Charts

Page 36
Hospital Data Collection screenshot

Slide 37
Road Safety Databases
World level: https://extranet.who.int/roadsafety/death-on-the-roads/#deaths//all
European level: https://ec.europa.eu/transport/road_safety/specialist/statistics/map-viewer/
Italian level: http://istat.maps.arcgis.com/apps/MapSeries/index.html?appid=b34ba84168da4147b810f0d04f59881d

3-1 BASIC STATISTICAL
CONCEPTS
Slide 38Mod. 3

Main contents
•Safety, units, traits and populations
•Recorded and Expected number of accidents
•Random and Systematic variation in accident
counts
•Regression-to-the-mean
Slide 39Mod. 3

What is SAFETY?
[Chart: recorded number of accidents at an intersection in Perugia, 2001-2009; vertical axis 0 to 4]
Here is a count of injury accidents for an
intersection in Perugia.
What is its SAFETY?
Slide 40

What is a UNIT?
•… “what is its safety?” implies that SAFETY
is a property of UNITS
•A Unit can be:
–a road segment
–an intersection
–Mr. Mario Rossi
–a car
–etc.
Slide 41Mod. 3

What is the safety of a Unit?
•…The number of accidents that has been reported at
a certain location during a certain period?
[Chart: recorded number of accidents at an intersection in Perugia, 2001-2009; vertical axis 0 to 4]
•The intersection shows a different number of accidents each year, with visible fluctuations
•If we use the recorded number of accidents, that would mean that safety improved from 2002 to 2003, deteriorated from 2003 to 2004, etc.
•The probability that the intersection is chosen for interventions depends on the year taken for reference
Slide 42

Observed values: the recorded number
of accidents
•The number of accidents that has been
reported at a certain location during a certain
period
•The recorded number ≠ the “true” number
Slide 43Mod. 3

What if we calculate the average value?
[Chart: recorded number of accidents at an intersection in Perugia, 2001-2009, with the annual mean; vertical axis 0 to 4]
There are 3 elements in the graph:
1.Observed values ●
2.The invisible (unknown) safety property μ
3.Our estimate of the unknown property ○
Number of years | Average value
1 | 3.0
2 | 2.5
3 | 2.7
4 | 2.3
5 | 2.0
6 | 2.0
Slide 44

Variation in short-term accidents
frequency
Slide 45Mod. 3

The Recorded Number of accidents is
“not useful” for safety management…
•… because it changes even when there is no
change in safety-relevant traits (exposure,
traffic control, physical features, user
demography, etc.).
•Accidents are (thankfully!) rare events and
their pattern exhibits random fluctuations
•We need a definition of the safety of a unit
such that, as long as the ‘safety-relevant’
traits of the unit do not change, its ‘safety’
does not change.
Slide 46Mod. 3

What is the safety of a Unit?
The safety property of a unit is “the number of
accidents by type and severity, expected to
occur on it in a specified period of time.”
(Hauer, 1997)
It will always be denoted by μ and its estimate
by $\hat{\mu}$
Slide 47

The ‘safety’ of a unit depends on its
‘traits’
•Mass
•Height
•Engine capacity
•Stiffness
•Colour
•…
Slide 48Mod. 3

The ‘safety’ of a unit depends on its
‘traits’
Slide 49
•Number of approaches
•Type of traffic control
•AADT
•Number of lanes
•Visibility
•Roadside conditions
•Road surface condition
•…
Mod. 3

What is the link between safety and
traits?
•A trait is ‘safety-related’ (s-r) if, when it
changes, μ changes.
•Consequence: Units with the same s-r traits
have the same μ (and of course, units that
differ in some s-r traits differ in their μ’s).
Slide 50Mod. 3

Populations
•Units that share some traits form a
population of units.
•Example: (1) rural, (2) two-lane road
segments in (3) flat terrain
•Because only some traits are common, the
units differ in many safety-related traits and
therefore differ in their μ
Slide 51Mod. 3

Parameters of populations
We will describe the safety of a
population by:
Mean of μ’s, E{μ}, and
Standard deviation of μ’s, σ{μ}
Slide 52Mod. 3

Notational conventions to remember
μ - the expected number of accidents for a unit
$\hat{\mu}$ - estimate of μ. A caret above always means: estimate of ...
E{μ} - mean of the μ’s in a population of units.
σ{μ} - standard deviation of the μ’s in a population of units.
Slide 53Mod. 3

Variation in accident counts
•Random variation = variation in the recorded number of accidents around a given expected number of accidents
•Systematic variation = variation in the expected number of accidents in time or space between given units of observation (drivers, road sections, modes of travel, etc)
Slide 54Mod. 3

Why variation is important
Variation must be considered at two critical
points in safety analyses:
1.Identifying the best entities for investment;
2.Evaluating effectiveness of the action.
Slide 55Mod. 3

RANDOM VARIATION
Slide 56Mod. 3

Modeling accidents with the Binomial
distribution
Slide 57
Parameters:
n ∈ {0,1,2,…} - number of trials (number of opportunities for an accident, i.e. exposure)
p ∈ [0,1] - success probability for each trial (i.e. probability of an accident, i.e. accident risk)
The probability of observing k accidents:
$$P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k}$$
“n choose k”, $\binom{n}{k}$, is the number of combinations of k items selected from a set of n distinct items

From the Binomial distribution to the
Poisson distribution
Consider a set of binomial trials:
1.Each trial has 2 possible outcomes: success or failure (accident or no accident)
2.The probability of success (or failure) is the same at each trial
3.The outcome of each trial is independent of the outcome of other trials
•When the probability of success (risk of accident) goes toward zero, and
•the number of trials (exposure) goes toward infinity, with the expected number n·p held constant,
•the binomial distribution approaches the Poisson distribution
Slide 58Mod. 3
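The convergence of the binomial to the Poisson distribution can be checked numerically. The sketch below is illustrative only: the expected number of accidents (2.0) and the values of n are assumed, and the function names are not from the slides. It keeps n·p fixed while n grows and p shrinks, and reports the largest difference between the two probability functions.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k accidents in n independent trials with risk p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of k accidents when the expected number is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 2.0  # assumed expected number of accidents (n * p is held at this value)

for n in (10, 100, 10_000):
    p = lam / n  # risk per trial shrinks as exposure grows
    max_gap = max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11)
    )
    print(f"n={n:>6}  p={p:.4f}  largest difference for k=0..10: {max_gap:.6f}")
```

The printed difference shrinks as n increases, which is why the Poisson model is a convenient stand-in when exposure is large and risk per trial is small.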

Pure random variation: The Poisson
probability model
$$P(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
x = accident counts
•The variance of the accident counts equals the mean (E):
$$\mathrm{Var}(x) = \lambda = E$$
Slide 59

Exercise
•A city’s traffic department reports that at a
particularly busy intersection, accidents occur
at an average rate of 2 per week. Let’s
assume the number of accidents follows a
Poisson distribution.
–What is the probability that in a given week, there
will be exactly 3 accidents?
–What is the probability that in a given week, there
will be 2 or fewer accidents?
–What is the probability that in a given week, there
will be more than 4 accidents?
Slide 60
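One possible way to work through the exercise is sketched below, using only the Poisson formula with a rate of 2 accidents per week. The helper names are assumptions made for this illustration.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): sum of the probabilities of 0, 1, ..., x accidents."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 2.0  # average number of accidents per week, from the exercise

p_exactly_3 = poisson_pmf(3, lam)
p_at_most_2 = poisson_cdf(2, lam)
p_more_than_4 = 1.0 - poisson_cdf(4, lam)  # complement of "4 or fewer"

print(f"P(X = 3)  = {p_exactly_3:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 4)  = {p_more_than_4:.4f}")
```

The third question is answered through the complement because the event "more than 4" has no upper bound.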

San Francisco Data (1974-1975)
Accidents counted at 1,142 four-leg, stop-sign-controlled intersections in San Francisco
Number of intersections | Number of accidents per intersection in 1974 | Average number of accidents per intersection in 1975
553 | 0 | 0.54
296 | 1 | 0.97
144 | 2 | 1.53
65 | 3 | 1.97
31 | 4 | 2.10
21 | 5 | 3.24
9 | 6 | 5.67
13 | 7 | 4.69
5 | 8 | 3.80
2 | 9 | 6.50
Average over the 1,142 intersections: 1.09
(2 intersections had 13 accidents, one had 16)
Slide 61

San Francisco Data (1975-1976)
Source: Hauer, E., 1986
Number of intersections | Number of accidents per intersection in 1975 | Average number of accidents per intersection in 1976
559 | 0 | 0.55
286 | 1 | 0.98
144 | 2 | 1.41
73 | 3 | 1.82
35 | 4 | 1.97
18 | 5 | 2.50
11 | 6 | 3.91
9 | 7 | 4.22
3 | 8 | 2.00
1 | 9 | 3.00
2 | 10 | 2.50
1 | 11 | 5.00
Slide 62

San Francisco Data (1976-1977)
Source: Hauer, E., 1986
Number of intersections | Number of accidents per intersection in 1976 | Average number of accidents per intersection in 1977
562 | 0 | 0.53
287 | 1 | 0.94
155 | 2 | 1.37
74 | 3 | 1.72
33 | 4 | 2.61
13 | 5 | 3.00
11 | 6 | 2.64
4 | 7 | 2.25
1 | 8 | 1.00
2 | 9 | 3.50
Slide 63

The evolution of the first groups
Slide 64Mod. 3

Regression-to-the-mean (RTM)
•If, in part or in whole as a result of random
variation, an abnormally high or low number
of accidents has been recorded in a specific
period, the number of accidents in the next
period will tend to return to (regress towards) the
long-term expected value
•High numbers go down, low numbers go up
Slide 65Mod. 3

Regression-to-the-mean (RTM) and
RTM Bias
Slide 66Mod. 3

Another example
•We have 100 intersections in the same city
with the same characteristics (traffic control,
traffic flow, geometry)
•The expected (true) number of accidents is 3
accidents per year for each intersection
•In practice, the counts fluctuate randomly,
and we can assume a Poisson distribution:
$$P(X = x; \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
Slide 67

Expected figures
Number of accidents X | Probability that an intersection has X accidents | Expected number of intersections with X accidents
0 | 0.0498 | 5
1 | 0.1494 | 15
2 | 0.2240 | 22
3 | 0.2240 | 22
4 | 0.1680 | 17
5 | 0.1008 | 10
6 | 0.0504 | 5
7 | 0.0216 | 2
8 | 0.0081 | 1
9 | 0.0040 | 1
Slide 68

What about treating some intersections?
Number of accidents X | Probability that an intersection has X accidents | Expected number of intersections with X accidents
0 | 0.0498 | 5
1 | 0.1494 | 15
2 | 0.2240 | 22
3 | 0.2240 | 22
4 | 0.1680 | 17
5 | 0.1008 | 10
6 | 0.0504 | 5
7 | 0.0216 | 2
8 | 0.0081 | 1
9 | 0.0040 | 1
Slide 69

Treating some intersections...
•We replace STOP-sign control with a traffic
signal at the intersections where the number
of accidents is ≥ 5 (19 cases)
•Suppose that traffic signals reduce
accidents by 10%.
•How many accidents will be avoided if a traffic
signal is installed at the intersections
with x ≥ 5?
Slide 70

Results of the intervention (1)
•For the 19 signalised intersections, the
expected mean for the following year (after
the intervention) is 2.7 accidents/year
•The total number of accidents during the
first year at the 19 signalised intersections
was 111
•The total number expected at the same 19
intersections after the introduction of the
traffic signal is 2.7 x 19 = 51 accidents
Slide 71

Results of the intervention (2)
•The reduction appears to have been 54% (from
111 to 51), whereas the real effect is
10%.
•Why?
•→If the treatment had no effect and
nothing else changed, how many accidents
would be expected in the period after
treatment?
Slide 72

Results of the intervention
•Before: 5*10+6*5+7*2+8*1+9*1 = 111
accidents.
•If the treatment is ineffective, the expected number of accidents
after the intervention = 19*3 = 57 accidents.
•Expected reduction = 19*3*(1-0.9) = 5.7
accidents.
•111-57 = 54: regression to the mean!
Slide 73
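The arithmetic of this example can be reproduced with a short sketch. It uses the rounded intersection counts from the table of expected figures, the true mean of 3 accidents per year and the assumed 10% treatment effect; the variable names are illustrative only.

```python
TRUE_MEAN = 3.0          # true expected accidents per intersection per year
TREATMENT_EFFECT = 0.10  # assumed reduction due to traffic signals
THRESHOLD = 5            # intersections with >= 5 accidents are treated

# Expected number of intersections (out of 100) by recorded accident count,
# rounded as in the table of expected figures.
sites_by_count = {0: 5, 1: 15, 2: 22, 3: 22, 4: 17, 5: 10, 6: 5, 7: 2, 8: 1, 9: 1}

treated = {x: n for x, n in sites_by_count.items() if x >= THRESHOLD}
n_treated = sum(treated.values())
accidents_before = sum(x * n for x, n in treated.items())

# With no treatment, each treated site still has a true mean of 3.
expected_after_no_effect = n_treated * TRUE_MEAN
expected_after_treatment = expected_after_no_effect * (1 - TREATMENT_EFFECT)

apparent_reduction = 1 - expected_after_treatment / accidents_before
regression_to_mean = accidents_before - expected_after_no_effect
true_accidents_saved = expected_after_no_effect - expected_after_treatment

print(f"Treated intersections: {n_treated}")
print(f"Accidents before: {accidents_before}")
print(f"Expected after, no effect: {expected_after_no_effect:.1f}")
print(f"Expected after, with treatment: {expected_after_treatment:.1f}")
print(f"Apparent reduction: {apparent_reduction:.0%}")
print(f"Drop due to regression to the mean: {regression_to_mean:.0f}")
print(f"Accidents actually saved: {true_accidents_saved:.1f}")
```

Separating the drop that would occur anyway from the drop caused by the treatment is what removes the regression-to-the-mean bias from a naive before-after comparison.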

Statistical modelling in accidents
•PURE RANDOM VARIATION is usually
modelled by the Poisson probability law.
•SYSTEMATIC VARIATION is modelled by
Multivariate statistical models (also known
as Safety Performance Functions) used to
analyse factors that explain systematic
variation of the number of accidents
Slide 74Mod. 3

3-2 CONTINGENCY TABLES
Slide 75Mod. 3

Investigating accident causation
•Case-by-case approach: Accident causes
identified through expert judgement based on
accident reconstruction and causation
analysis
•Statistical approach: the causal relation
between a risk factor and accident
occurrence is not investigated directly, but
inferred from the association between these
two
Slide 76Mod. 3

Measures of association
•Chi-square
•Risk ratio or Relative risk (RR)
•Odds Ratio (OR)
Slide 77Mod. 3

Contingency Tables
•They allow for:
•Analysis of accident frequency (deaths and
injuries) in relation to two or more variables
•Assessment of the association between the
variables examined
•They can include both categorical variables and
quantitative discrete or continuous variables
(divided into classes)
Slide 78Mod. 3

Example of Contingency Tables
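Slide 79
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
The columns “Fatal” and “Non fatal” give the accident consequence. Cell values are absolute frequencies; row and column totals are marginal frequencies.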

Association between the variables
•Contingency tables make it possible to assess whether
there is an association between the variables
considered
•E.g., in the previous case: is the use of
seatbelt associated with the accident
consequence?
Slide 80Mod. 3

Conditions for association (1)
Slide 81
Having a generic contingency table
A \ B | B1 | … | Bj | … | Bc | Total
A1 | n11 | … | n1j | … | n1c | n10
… | … | … | … | … | … | …
Ai | ni1 | … | nij | … | nic | ni0
… | … | … | … | … | … | …
Ar | nr1 | … | nrj | … | nrc | nr0
Total | n01 | … | n0j | … | n0c | n

Conditions for association (2)
•B is not associated with A if, for each fixed j, the ratio nij/ni0
does not vary with i
•Thus, in symbols (shown for j = 1; the same holds for every column j):
$$\frac{n_{11}}{n_{10}} = \dots = \frac{n_{i1}}{n_{i0}} = \dots = \frac{n_{r1}}{n_{r0}}$$
Slide 82

In the previous example
Slide 83
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
We should compare the same accident consequence (fatal accident) under the two possible
conditions of seatbelt use (No / Yes)

The conditions are
•The variable B (Accident consequence) is
associated with the variable A (Use of
seatbelt) if the possible conditions of A (Yes
or No) provide information on the possible
conditions of B (Fatal or Non fatal)
•In numbers:
1,600 / 164,100 = 0.0097 ≠ 500 / 412,500 = 0.0012
Slide 84

Degree of association
•It can be measured through statistical tests
calculated from the difference between
the observed frequencies and the expected
(theoretical) frequencies:
•Observed Frequencies: the actual number of
observations (accidents)
•Theoretical Frequencies: the number of
observations (accidents) expected under the
hypothesis of complete independence between
variables
Slide 85Mod. 3

Theoretical frequencies
•If the variables are not associated, the following relationship
holds for the given frequencies:
$$\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n}$$
•This formula is used to estimate the theoretical
frequencies
•Expected = (row total × column total) / grand total
Slide 86

In the previous example
Slide 87
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
For the cell (No, Non fatal):
$$\hat{n}_{ij} = \frac{n_{i0} \cdot n_{0j}}{n} = \frac{164{,}100 \cdot 574{,}500}{576{,}600} = 163{,}502$$
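The theoretical frequencies for every cell of the seatbelt table can be computed in the same way. This is a minimal sketch; the function name and the list-of-rows layout are assumptions made for illustration.

```python
def expected_frequencies(table):
    """Theoretical frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: seatbelt No / Yes. Columns: Fatal / Non fatal.
observed = [[1600, 162500],
            [500, 412000]]

expected = expected_frequencies(observed)

for label, row in zip(("No", "Yes"), expected):
    print(label, [round(value, 1) for value in row])
```

Each row of the result sums to the observed row total, because the theoretical table keeps the same marginal frequencies.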

The Pearson χ²
•This statistic is based on the differences
between observed and theoretical
frequencies
•The higher the chi-square, the stronger the
association between the variables
Slide 88
[Scale from 0 to ∞: χ² = 0, variables independent; increasing χ², variables associated]

How to calculate χ²
$$\chi^{2} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{n}_{ij})^{2}}{\hat{n}_{ij}}$$
•$n_{ij}$ = observed frequency for the cell (i,j)
•$\hat{n}_{ij}$ = theoretical frequency for the cell (i,j)
•r = number of rows
•c = number of columns
Slide 89

In the previous example
Slide 90
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600
Exercise: calculate the Chi-square
Result:
Chi-Square = 2,358
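The result can be verified with a few lines of code. The sketch below applies the Pearson chi-square formula cell by cell to the seatbelt table; the function name is an assumption made for illustration.

```python
def pearson_chi_square(table):
    """Sum over all cells of (observed - theoretical)^2 / theoretical."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            theoretical = row_totals[i] * col_totals[j] / n
            chi2 += (observed - theoretical) ** 2 / theoretical
    return chi2


# Rows: seatbelt No / Yes. Columns: Fatal / Non fatal.
seatbelt = [[1600, 162500],
            [500, 412000]]

chi2 = pearson_chi_square(seatbelt)
print(f"Chi-square = {chi2:,.0f}")
```

Most of the total comes from the two "Fatal" cells, where the theoretical frequencies are small and the deviations are relatively large.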

How to interpret χ²
•To assess the degree of association, the calculated
chi-square has to be compared with a critical chi-square
taken from tables, based on the «degrees of freedom» (df)
and on the significance level (p = 0.001-0.05):
df = (r-1)*(c-1)
•r = number of rows
•c = number of columns
In the previous example: df = (2-1)*(2-1) = 1
There is an association if the calculated chi-square
is greater than the critical value
Slide 91

Example
Table 2*2 ➔ df = 1
χ² critical = 10.83 (p = 0.001) / χ² calculated = 2,358
The calculated value exceeds the critical value, thus the variables are
associated
Slide 92
This suggests that not using a
seatbelt is associated with a higher
likelihood of fatal injuries.
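For a 2×2 table (df = 1) the comparison with the critical value can also be expressed as a p-value without statistical tables. The sketch below relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(χ²/2)). The function name is an assumption, and the shortcut is valid for df = 1 only.

```python
import math


def chi2_p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


alpha = 0.001          # significance level matching the critical value 10.83
chi2_calculated = 2358  # value from the seatbelt example

p_at_critical = chi2_p_value_df1(10.83)
p_value = chi2_p_value_df1(chi2_calculated)

print(f"p-value at the critical value 10.83: {p_at_critical:.4f}")
print("Association detected:", p_value < alpha)
```

For tables with more rows or columns a statistics library or a printed chi-square table is needed, since this closed form does not apply.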

Risk
•At the road user level, accident involvement
risk is the ratio of two counts, namely the
number N* of accident-involved road users
and the total number N of all road users
exposed to accident involvement risk during
the study period of one year:
$$\text{Risk} = \frac{N^{*}}{N}$$
Slide 93

The Relative Risk (RR)
•RR measures the risk of an event occurring
as a result of exposure to one or more causal
factors (e.g. not using seatbelt)
Slide 94
[Scale from 0 to ∞: RR < 1, negative association; RR = 1, no association; RR > 1, positive association]

The Relative Risk formula
 | Accident | No Accident | Total
Exposed | a | b | r1
Not exposed | c | d | r2
Total | c1 | c2 | T
$$RR = \frac{a / r_{1}}{c / r_{2}}$$
If more than two groups are distinguished (risk factor measured at
several levels), one group (e.g. group 1) may be considered as the
reference group (also termed base group) and the analyst may relate
the risk of the other groups to that of the reference group.
Slide 95

Let’s calculate the relative risk!
Slide 96
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600

In the example…
•For a driver not wearing a seatbelt, the risk
of death is approximately 8 times higher than
for a driver wearing one.
$$RR = \frac{1{,}600 / 164{,}100}{500 / 412{,}500} = 8.04$$
Slide 97
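The relative risk for the seatbelt table can be checked with a small helper. This is a sketch; the function name and the a, b, c, d argument order (exposed with and without the outcome, then not exposed with and without the outcome) are assumptions that follow the generic 2×2 layout.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = risk in the exposed group / risk in the non-exposed group.

    a, b -- exposed group: with outcome, without outcome
    c, d -- non-exposed group: with outcome, without outcome
    """
    risk_exposed = a / (a + b)
    risk_not_exposed = c / (c + d)
    return risk_exposed / risk_not_exposed


# "Exposed" here means not wearing a seatbelt; the outcome is a fatal accident.
rr = relative_risk(a=1600, b=162500, c=500, d=412000)
print(f"RR = {rr:.2f}")
```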

Odds
•At the road user level, Odds are the ratio
between the number N* of accident-involved
road users and the number of road users not
involved in accidents:
$$\text{Odds} = \frac{N^{*}}{N - N^{*}}$$
Slide 98

The Odds Ratio (OR)
•OR represents the odds that an outcome will
occur given a particular exposure, compared
to the odds of the outcome occurring in the
absence of that exposure
Slide 99
[Scale from 0 to ∞: OR < 1, negative association; OR = 1, no association; OR > 1, positive association]

The Odds Ratio formula
 | Accident | No Accident | Total
Exposed | a | b | r1
Not exposed | c | d | r2
Total | c1 | c2 | T
$$OR = \frac{a / b}{c / d}$$
Slide 100

Let’s calculate the Odds Ratio!
Slide 101
Use of seatbelt | Fatal | Non fatal | Total
No | 1,600 | 162,500 | 164,100
Yes | 500 | 412,000 | 412,500
Total | 2,100 | 574,500 | 576,600

Odds Ratio…
•For a driver not wearing a seatbelt, the
odds of death are approximately 8 times
the odds of death for a driver
wearing one.
$$OR = \frac{1{,}600 / 162{,}500}{500 / 412{,}000} = 8.11$$
Slide 102
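The odds ratio can be computed in the same way as the relative risk, and the two can be compared. The sketch below uses assumed function names; it also illustrates a general property that is not stated on the slides: when the outcome is rare in both groups, the odds ratio is close to the relative risk.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = odds in the exposed group / odds in the non-exposed group."""
    return (a / b) / (c / d)


def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = risk in the exposed group / risk in the non-exposed group."""
    return (a / (a + b)) / (c / (c + d))


# Exposed = no seatbelt; outcome = fatal accident.
a, b, c, d = 1600, 162500, 500, 412000

or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)

print(f"OR = {or_value:.2f}, RR = {rr_value:.2f}")
print(f"Relative gap between OR and RR: {(or_value - rr_value) / rr_value:.1%}")
```

Fatal outcomes are below 1% in both seatbelt groups, which is why the two measures almost coincide here.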

Levels of association
Slide 103Mod. 3

Exercise 1
Slide 104
Road Geometry | Fatal Injury | Non-Fatal Injury | Total
Straight | 8 | 20 | 28
Curved | 18 | 12 | 30
Total | 26 | 32 | 58
1. Calculate the value of Chi-Square χ² and determine the association between
road geometry and crash severity
2. Calculate the risk ratio and odds ratio

Exercise 2
Slide 105
Pavement Condition | Fatal Injury | Non-Fatal Injury | Total
Wet | 15 | 25 | 40
Dry | 5 | 45 | 50
Frozen | 10 | 10 | 20
Total | 30 | 80 | 110
1. Calculate the value of Chi-Square χ² and
determine the association between road pavement
conditions and crash severity
2. Calculate the odds of wet vs dry pavement
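One possible way to set up this exercise in code is sketched below. It generalises the chi-square calculation to a table with any number of rows and columns and returns the degrees of freedom alongside the statistic. The function name is an assumption, and question 2 is read here as asking for the ratio of the odds of a fatal injury on wet versus dry pavement.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: Wet, Dry, Frozen. Columns: Fatal injury, Non-fatal injury.
pavement = [[15, 25],
            [5, 45],
            [10, 10]]

chi2, df = chi_square(pavement)

odds_wet = 15 / 25  # fatal vs non-fatal on wet pavement
odds_dry = 5 / 45   # fatal vs non-fatal on dry pavement
odds_ratio_wet_vs_dry = odds_wet / odds_dry

print(f"Chi-square = {chi2:.2f} with df = {df}")
print(f"Odds ratio, wet vs dry = {odds_ratio_wet_vs_dry:.2f}")
```

The statistic still has to be compared with the critical chi-square for the reported degrees of freedom and the chosen significance level.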

3-3 SAFETY PERFORMANCE
FUNCTIONS
Slide 106Mod. 3

Summary
•Distribution of accidents
–Poisson distribution
•Statistical modelling of systematic variation
–Estimating the E{μ} and the σ{μ} of a population
–Accident prediction models, Safety Performance
Functions (SPF)
•Statistical modelling of random variation
–The Empirical Bayes method
Slide 107Mod. 3

Variation in accident counts
•Random variation =
variation in the
recorded number of
accidents around a
given expected number
of accidents
•Systematic variation =
variation in the
expected number of
accidents in time or
space between given
units of observation
(drivers, road sections,
modes of travel, etc)
Slide 108Mod. 3

Statistical modelling of systematic
variation
Slide 109Mod. 3

Statistical road safety modelling
(SRSM)
•It is the fitting of a statistical model to data:
Accidents prediction models
•Data are about past accidents and traits for a
set of road elements
•Two uses of SRSM:
–To estimate the expected number of accidents over a
given time period on an infrastructure based on its
traits
–To estimate the change in the expected number of
accidents over a given time period on an
infrastructure caused by a change in its traits
Slide 110Mod. 3

What is a Safety Performance Function
An SPF is “a device which for a multitude of
populations provides estimates of two elements:
1.E{μ}, the mean of the μ’s in populations;
2.σ{μ}, the standard deviation of the μ’s in these
populations.”
Hauer, 2014
Slide 111Mod. 3

Accident prediction models
Regression models
•Use historic accident data collected at sites
with similar roadway characteristics
•Answer ‘What Is the Relationship Between
the Variables?’
•Equation Used
–1 Numerical Dependent (Response) Variable
–1 or More Numerical or Categorical Independent
(Explanatory) Variables
Slide 112Mod. 3

Model equations
$$Y = f(X_{1}, X_{2}, X_{3}, \dots, X_{n}, \beta_{1}, \beta_{2}, \beta_{3}, \dots, \beta_{n})$$
•Y: dependent (response) variable
•X: independent (explanatory) variables
•β: parameters
•The SRSM model is that of curve fitting
Slide 113

Variables
•Numerical (or continuous)
•Categorical (or discrete)
–Count data (e.g. 0, 1, 2, 3, …)
–Binomial data (e.g. 0 or 1)
–Censored data (e.g. >0)
•The type of dependent variable largely
determines the type of model
Slide 114Mod. 3

Explanatory variables (X_i)
•Variables commonly included in accident models:
–A measure of traffic volume, usually AADT
–Variables describing cross-section (lane width, number of
lanes)
–Variables describing traffic control (speed limit is most
common)
–Variables describing type of land use (rural, urban)
•Variables often missing from accident models:
–Traffic volume of pedestrians and cyclists
–Variables describing road user behaviour (speeding, etc)
–Variables describing safety measures on the road
Slide 115Mod. 3

Dependent variables (Y)
•Total number of accidents (mixing all levels of severity)
•Groups of accidents formed according to (for example):
–Accident severity (property damage, injury, fatal)
–Type of accident (pedestrian, cyclist, motor vehicle only)
–Type and severity combined
•Accident rate (accidents per million vehicle kilometres)
Accident rates are rarely used as dependent variable in
recent models, as any rate relies on an assumption of
linearity that may not be correct
Slide 116Mod. 3

Example of Variables
Slide 117Mod. 3
Inputs: explanatory variables
Traffic data, flow data (data for night and day):
- AADT
- Hourly and daily traffic
- Pedestrian flows along and across
- Vehicle kilometers travelled
- Traffic flows for all road users
Traffic data, speed (data for night and day):
- Free flow speed (headway > 5 s)
- Average speed for each road user
Infrastructure data, key parameters:
- Segment length
- Median type and median width
- Number of lanes and lane width
- Shoulder width (right shoulder and left shoulder)
- Vertical grades (%) / degree of hilliness
- Access density
- Junction density
- Degree of horizontal and vertical curvature
- Bend density
- Land use type
- Bus stops
Infrastructure data, road quality parameters (presence of):
- Safety barriers
- Roadside hazards
- Road markings and signs
- Pedestrian crossing facilities
- Sidewalks
Infrastructure data, pavement:
- International roughness coefficient
- Surface type or pavement type
- Surface condition or pavement condition index
Environment data:
- Rainfall data: average rainy days per year; monthly rainfall events
- Wind data: average number of windy days per year (if available)
A proxy on post-crash care advancement would be included
Response variable: crash data
- Total crashes
- Number of fatal crashes
- Number of injury crashes
- Number of casualties (fatal, serious injury) and type (road user)
- Injury casualties
- Vehicle type for fatalities
- Collision type
- Accident location

Variable types: examples
•Numerical, e.g. speed, volume, length
•Categorical (or discrete)
–Count data: number of intersections, number of
parking areas
–Binomial data, e.g. presence of junctions, bus stops,
safety barriers
–Censored data, e.g. a proxy for post-crash care
advancement
Slide 118Mod. 3

Common measures of exposure
•AADT
•Entering vehicles (major road), entering vehicles (minor road)
•Annual kilometres of driving
•Often mixes very different types of road users and may not
include all of them (pedestrians and cyclists are rarely
counted)
•Averages over conditions representing different levels of risk
•Relationship to the number of accidents is often highly non-
linear
•Different composite measures of exposure can be developed
Mod. 3

How to build an SPF
Data (Hauer, 2014). Period: 1994-1998; segment length: 0.5 to
1.0 miles; N = 2,228 segments.
Bins and computations of Ê{μ}:
AADT bin        No. of I&F accidents   No. of 0.5-1.5 mile segments   Ê{μ}
0-1,000         376                    975                            0.39
1,000-2,000     445                    466                            0.95
...             ...                    ...                            ...
9,000-10,000    102                    19                             5.37
10,000-11,000   81                     18                             4.50
...             ...
An average segment in the 9,000-10,000 bin had
102/19 = 5.37 I&F crashes in 5 years.
18/12/2024 Slide 120 Mod. 3
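The bin computation can be sketched in a few lines of Python. The four rows are the ones legible on the slide; the function name `bin_average` and the dictionary layout are illustrative choices, not part of the source.

```python
# (AADT bin label, I&F accidents in the bin, number of segments in the bin),
# as transcribed from the slide; the elided rows are not reproduced.
bins = [
    ("0-1,000", 376, 975),
    ("1,000-2,000", 445, 466),
    ("9,000-10,000", 102, 19),
    ("10,000-11,000", 81, 18),
]


def bin_average(accidents, segments):
    """Average number of accidents per segment within one AADT bin."""
    return accidents / segments


# Estimate of the mean accident count per segment, rounded as on the slide.
estimates = {label: round(bin_average(acc, seg), 2) for label, acc, seg in bins}

for label, value in estimates.items():
    print(f"{label:>14}: {value:.2f}")
```

Each value is simply the bin's accident total divided by its number of segments, which is why a sparsely populated bin (18 or 19 segments) gives a noisier estimate than the first bin (975 segments).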

[Figure: Ê{μ} plotted against AADT bin]
AADT bin        Ê{μ}
0-1,000         0.39
1,000-2,000     0.95
...             ...
9,000-10,000    5.37
10,000-11,000   4.50
The ordinate, Ê{μ}, is the estimate of the average number
of crashes per segment in the bin.
Slide 121Mod. 3

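SPM development
•Model equation selection
•Data for selected variables
•Parameter estimation
[Equations: a general form expressing the dependent variable as a function f(X_1, X_2, X_3, ..., β_1, β_2, β_3, ...) of explanatory variables X and parameters β, followed by three candidate forms for N: linear (β_1 X_1 + β_2 X_2), power (X_1 raised to β_1) and exponential (e^{β_0} e^{β_1 X_1}); the remaining symbols are not recoverable]
Slide 122 Mod. 3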

The estimate of {}
AADT Bins
I&F
acc.
Segments S
2
... ... ... ......
9K-10K102 19 5.37...35.18 ±5.46
... ... ...μEˆ μσˆ  countsaccidentofmeanSamplecountsaccidentofvarianceSampleμσˆ −=
Slide 123Mod. 3
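The 9K-10K row can be checked numerically. The sketch below assumes the usual reasoning behind this estimator: if each segment's count is Poisson-distributed around its own mean μ, the variance of the counts equals E{μ} plus the variance of μ across segments. The function name `sigma_mu_hat` and the clamp to zero for under-dispersed data are illustrative choices, not taken from the slides.

```python
import math


def sigma_mu_hat(sample_variance, sample_mean):
    """Method-of-moments estimate of the standard deviation of mu in a bin.

    Uses Var(counts) = E{mu} + Var{mu}, so Var{mu} is estimated by
    (sample variance - sample mean). Returns 0.0 if that difference is
    not positive (an assumption made here to avoid a negative variance).
    """
    difference = sample_variance - sample_mean
    return math.sqrt(difference) if difference > 0 else 0.0


# Values for the 9K-10K AADT bin as shown on the slide.
sigma = sigma_mu_hat(sample_variance=35.18, sample_mean=5.37)
print(round(sigma, 2))
```

With the slide's values the difference is 35.18 − 5.37 = 29.81, whose square root rounds to the ±5.46 shown in the table.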

Safety Performance Function and
AADT
•The most common formulation is
$N = a \times \mathrm{AADT}^{b}$
–The dependent variable N is the predicted crash
frequency over a given time period
–The explanatory variable AADT is the annual
average daily traffic volume
–a and b are regression coefficients
Slide 124Mod. 3
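As a rough illustration of how a and b might be estimated, the sketch below fits the power function to the four bin averages legible on the "How to build a SPF" slide, using ordinary least squares on the logarithms. Two assumptions are mine and not from the slides: each bin is represented by its midpoint AADT, and a log-linear fit is used for simplicity. In practice such models are usually fitted by count regression (for example negative binomial) on segment-level data, so the numbers produced here are purely illustrative.

```python
import math

# (assumed bin midpoint AADT, average I&F accidents per segment in 5 years)
points = [(500, 0.39), (1500, 0.95), (9500, 5.37), (10500, 4.50)]


def fit_power_function(points):
    """Least-squares fit of ln N = ln a + b * ln AADT; returns (a, b)."""
    xs = [math.log(aadt) for aadt, _ in points]
    ys = [math.log(n) for _, n in points]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


def predict(a, b, aadt):
    """Predicted crash frequency N = a * AADT**b."""
    return a * aadt ** b


a, b = fit_power_function(points)
print(f"a = {a:.5f}, b = {b:.3f}")
print(f"N at AADT 5000 = {predict(a, b, 5000):.2f}")
```

An exponent b below 1 from such a fit would mean that crashes grow less than proportionally with traffic, which is the pattern discussed on the following slides.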

Examples of functions
Effect of flow: elasticity b (Fridstrom, 1999 - Norway)
Accidents with injuries          0.911
Car occupants injured            0.962
Injured motorcyclists            0.749
Cyclists injured                 1.079
Pedestrians injured              1.109
Multi-vehicle injury accidents   1.032
Single-vehicle accidents         0.804
Slide 125 Mod. 3
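To read these elasticities in practical terms, the sketch below converts each one into the percentage change in accidents implied by a 10% increase in traffic, assuming the power-function form N = a × AADT^b. The function name `percent_change` and the 10% scenario are illustrative choices.

```python
# Elasticities b as listed on the slide (Fridstrom, 1999 - Norway).
elasticities = {
    "Accidents with injuries": 0.911,
    "Car occupants injured": 0.962,
    "Injured motorcyclists": 0.749,
    "Cyclists injured": 1.079,
    "Pedestrians injured": 1.109,
    "Multi-vehicle injury accidents": 1.032,
    "Single-vehicle accidents": 0.804,
}


def percent_change(elasticity, traffic_change=0.10):
    """Percentage change in N when traffic changes by `traffic_change`,
    under N = a * AADT**b (the constant a cancels out)."""
    return ((1 + traffic_change) ** elasticity - 1) * 100


for name, b in elasticities.items():
    print(f"{name:<32} b = {b:.3f} -> {percent_change(b):5.2f} %")
```

Elasticities below 1 give a less than proportional increase, while those above 1 (cyclists, pedestrians, multi-vehicle accidents) give a more than proportional one.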

The contribution of traffic volume to
explaining systematic variation in the
number of accidents
Slide 126 Mod. 3

Graphically
[Figure (Elvik, 2004): relative number of accidents (axis 0 to 100) against relative traffic volume (axis 1 to 99), with curves for injury accidents and fatal accidents of the form $N = \mathrm{AADT}^{b}$; labelled values 79.4 and 25.9]
Slide 127 Mod. 3

Schematically ...
•By varying traffic volume we move along the curve
•Varying other factors will change the slope and / or
the shape of the curve
[Figure: accidents plotted against traffic volume]
Slide 128Mod. 3

It is useful to note that:
Generally, a significant increase in traffic volume
corresponds to an increase in the number of accidents but a decrease
in the accident rate (the slope of the line from the origin to the curve in the figure)
[Figure: accidents plotted against traffic volume]
Slide 129Mod. 3
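A small numerical sketch of this statement, assuming the power-function form N = a × AADT^b with an exponent below 1. The value b = 0.911 is the injury-accident elasticity quoted from Fridstrom earlier in this module; a = 1.0 is an arbitrary scale chosen only for illustration, so the absolute numbers carry no meaning.

```python
def accidents(aadt, a=1.0, b=0.911):
    """Expected number of accidents under N = a * AADT**b."""
    return a * aadt ** b


def accident_rate(aadt, a=1.0, b=0.911):
    """Accidents per unit of traffic volume (slope of the line from the origin)."""
    return accidents(aadt, a, b) / aadt


# Doubling the traffic volume step by step.
for aadt in (1_000, 2_000, 4_000, 8_000):
    print(f"AADT {aadt:>5}: accidents = {accidents(aadt):8.1f}, "
          f"rate = {accident_rate(aadt):.4f}")
```

Each doubling of traffic multiplies the number of accidents by 2^0.911 (less than 2), so the count rises while the rate falls.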

18/12/2024 Mod. 1 Slide 130
Let us test our understanding

Slide 131Mod. 3
Let us test our understanding
$A = k \times Q^{a} \times L^{b} \times V^{c} \times G$
Where A = accident frequency, Q = AADT = traffic volume, L = length of road segment in kilometres, V = mean speed of traffic (miles per hour), G = group of road, equal to
1.000 for group 1, 0.539 for group 2, 0.364 for group 3 and 0.253 for group
(k, a, b and c stand for the fitted constant and exponents, whose numerical values are not recoverable from this copy)
This equation was derived by Taylor et al. (2002) for European roads to estimate the effects of speed and other variables on accidents
1. What is the general name given to this type of equation?

Slide 132Mod. 3
Ans: Safety performance function; Crash/accident prediction model

Slide 133Mod. 3
Let us test our understanding
2. What is the general name given to the variables Q, L, V or G?
a) Dependent variables
b) Explanatory variables
c) Response variable
d) Categorical variables

Slide 134Mod. 3
Ans: b) Explanatory variables (or predictor variables)

Slide 135Mod. 3
Let us test our understanding
3. The variable G is best described as a:
a) Numerical variable
b) Categorical variable
c) Continuous variable

Slide 136Mod. 3
Ans: b) Categorical variable

Slide 137Mod. 3
Let us test our understanding
4. Which of these variables has the greatest effect on
accidents?
a) Volume (Q)
b) Speed (V)
c) Length (L)

Slide 138Mod. 3
Ans: b) Speed (V)

Slide 139Mod. 3
Let us test our understanding
5. Keeping all other variables constant, a 10% change in
one of these variables leads to the same 10% change in
crashes. Which variable is this?
a) Volume (Q)
b) Speed (V)
c) Length (L)

Slide 140Mod. 3
Ans: c) Length (L)
a) Volume (Q): a 10% change leads to a 7.2% change
b) Speed (V): a 10% change leads to a 26.6% change
c) Length (L): a 10% change leads to a 10% change
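The percentages in the answer can be turned back into the exponents they imply. In a multiplicative model, a 10% change in a variable with exponent b changes crashes by a factor of 1.1^b, so b = ln(1 + change in crashes) / ln(1.1). The sketch below applies this to the three stated percentages; the function name `implied_exponent` is illustrative, and because the percentages are rounded the recovered exponents are only approximate.

```python
import math


def implied_exponent(change_in_crashes, change_in_variable=0.10):
    """Exponent b such that (1 + change_in_variable)**b = 1 + change_in_crashes."""
    return math.log(1 + change_in_crashes) / math.log(1 + change_in_variable)


# Changes in crashes for a 10% change in each variable, as stated in the answer.
stated_changes = {"Q": 0.072, "V": 0.266, "L": 0.10}

exponents = {name: implied_exponent(change) for name, change in stated_changes.items()}

for name, b in exponents.items():
    print(f"{name}: implied exponent is about {b:.2f}")

# The variable with the largest exponent has the strongest effect on crashes.
print("Largest effect:", max(exponents, key=exponents.get))
```

An exponent of exactly 1 is what makes length the answer to question 5, and the largest exponent is what makes speed the answer to question 4.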