St Lucia Road Transport Policy and Strategic Roadmap - Final Transportation Model Report
mhwcrrpdigital
35 slides
Sep 27, 2025
About This Presentation
The "Final Transportation Model Report" presents a data-driven analysis to inform the development of an integrated and sustainable road transport policy for Saint Lucia, with a specific focus on the feasibility of a Park and Ride (P&R) facility. The report is based on a survey of 489 commuters conducted from May to October 2023, which collected data on demographics, commute habits, and willingness to use the proposed P&R system.
Using analytical methods like logistic regression and decision tree models, the study identified key factors influencing commuter behavior.
Key findings include:
• P&R User Choice: The likelihood of commuters using a P&R facility is positively correlated with high personal fuel costs but also linked to low associated public transport costs, suggesting the service is viewed as a complement to, not a substitute for, existing options. Other significant predictors for adoption include longer commute times, county of residence (origin), and shorter wait times at the facility. Affirmative responses for P&R use were strongly associated with choosing Castries as a destination.
• Mode Choice: The decision to use public transport is primarily influenced by lower fuel spending, moderate public transport costs (EC$10-EC$20), and the number of mode changes required during a journey.
• Trip Generation: The Castries-Gros Islet corridor is the dominant traffic artery, with Castries identified as the single highest-gravity destination with a net inflow of traffic. A commuter's point of origin was the most critical factor in predicting their destination.
The report recommends prioritizing P&R facilities along the main routes into Castries, ensuring they are secure, integrated with efficient shuttle services, and priced to be an attractive alternative to driving personal vehicles.
Department of Infrastructure, Ports and
Transport of Saint Lucia
Millennium Highway and West Coast
Road Reconstruction Project
Formulation of an Integrated,
Sustainable Road Transport
Policy and Strategic Roadmap
for Implementation, 73599-C-4
Final Transportation
Model Report
By
LF Systems Ltd
June 4, 2024
L F Systems Ltd (i)
TABLE OF CONTENTS
1 INTRODUCTION
1.1 BACKGROUND
1.2 OBJECTIVES OF THE SURVEY CONTENT
1.3 METHODOLOGY
1.3.1 Survey Design
1.3.2 Questionnaire Development
1.3.3 Dissemination Methods
1.3.4 Rationale for Dissemination Approach
1.4 RESULTS AND OUTCOMES
1.4.1 Principal Components Analysis
1.5 SYNTHESIS OF RESPONDENT DATA
2 METHODOLOGY AND RESULTS OF SURVEY ANALYSIS
2.1 INTRODUCTION
2.2 USER CHOICE
2.3 AGE AND GENDER CONSIDERATIONS
2.4 MODE CHOICE
2.4.1 Logit Regression
2.4.2 Decision Tree Analysis
2.5 TRANSPORT DEMAND AND TRIP GENERATION
2.6 SUMMARY AND DISCUSSION

LIST OF FIGURES
Figure 1.1 Component Eigenvalues for all extracted factors
Figure 1.2 Cumulative variance of Principal Components for Eigenvalues greater than 1
Figure 1.3 Highest |CL| for component and its respective response parameter

LIST OF CHARTS
Chart 2.1 Logit Estimate of Use Choice
Chart 2.2 Fuel Cost vs. Predicted Use Choice
Chart 2.3 Public Transport Costs vs. Predicted Use Choice
Chart 2.4 Variable Importance
Chart 2.5 SVM Classification of Age and Gender vs. Use Choice
Chart 2.6 Fuel Spending vs. Public Transport Use
Chart 2.7 Public Transport Spending vs. Public Transport Use
Chart 2.8 Classification Tree
Chart 2.9 Random Forest Variable Importance
Chart 2.10 Trimmed Tree
Chart 2.11 Regional Net Flow Indicator
Chart 2.12 Variable Importance

LIST OF TABLES
Table 1.1 Frequency of Responses by Combined Origin/Destination of Geographical Locations
Table 1.2 Key Responses of Participant Data with Either an Origin or Destination of Castries or Gros Islet
Table 2.1 Sensitivity and Specificity
Table 2.2 Decision Tree Model of Use Choice
Table 2.3 Summary Alphanumeric Decision Tree Results
Table 2.4 Gender and Age Summary Statistics
Table 2.5 Classification Table
Table 2.6 Critical Tree Paths
Table 2.7 Regional Destination and Origin Count
Table 2.8 Decision Rules for Destination Choice
1. Introduction
1.1 Background
The development of an efficient and sustainable transportation system is crucial for the well-being
and economic growth of any region. In the case of Saint Lucia, the need to comprehensively
understand the transportation requirements and preferences of its residents provided the impetus
for creating a survey instrument. It was designed specifically to gather insights from the public
that will inform the formulation of an effective transportation policy.
Rationale
The primary motivation for the survey was the data-scarce environment relating to transportation
in Saint Lucia and to how it is perceived by the citizenry. To remedy this, a carefully constructed
survey was developed to bridge the gap by collecting relevant information directly from the public.
Accurate and current data captured through this mechanism were deemed crucial for informed,
data-driven decisions, which are foundational to creating evidence-based policies.
1.2 Objectives of the Survey Content
The survey conducted by the Ministry of Infrastructure, Ports, Transport, Physical Development and
Urban Renewal of St. Lucia (MI henceforward) contained 31 questions. The survey data take various
alphanumeric forms, and 489 unique responses were collected. The content of the survey was
designed to support the estimation of classification problems related to the use choice, mode
choice, and trip generation characteristics of respondents. The questionnaire emphasizes estimating
demand for the proposed structural intervention, i.e., a Park and Ride (P&R) facility.
The responses collected via the survey are intended to provide analytical input for estimation, so
the questions are guided by the variables expected to be used in the estimation models. The
questionnaire captures three categories of data: Participant Characteristics, Commute
Characteristics, and Characteristics of the proposed intervention (e.g., the P&R). The questions
covering these categories provide the data necessary for estimation.
For the estimation of use choice, the survey includes a specific question identifying whether a
respondent is amenable to using the P&R facility. This is converted to a binary variable for
estimation via the classification methods used. Regarding modal split, the survey includes specific
questions about the use of public or private transport options, which can be used for estimation in
the same way as for use choice. Trip generation analysis will relate respondents' origins and
destinations to the factors driving their choice of destination. Other important modelling features
captured by the survey include travel times, willingness to pay for time saved, spending on fuel,
and spending on public transport. Additionally, machine learning algorithms will be applied to
categorise and estimate explicitly non-numeric data such as the named stops on a journey. Finally,
the survey responses also provide data on age and gender.
1.3 Methodology
1.3.1 Survey Design
The survey design follows a scientific approach, ensuring that the data captured are both dependable
and relevant to the goals of the project. Key considerations in its development include:
• Literature Review: A comprehensive review was conducted of existing literature on the island’s
transportation systems, urban planning, and the survey methodologies best suited to gathering
data from the public on transportation issues. This ensured that the survey instrument was
aligned with established best practices in the field.
• Clarity of Survey Objectives: The key guiding objectives of the survey were established: to
understand the transportation habits, challenges, and preferences of the citizenry, and to
acquire additional insights to inform the policy.
• Question Structure: To enable numerical analysis and modelling, closed-ended quantitative
and qualitative questions were used, incorporating logical branching and multiple-response
designs to provide deeper context.
• Questionnaire Piloting: Before deployment, a limited internal testing phase was conducted
to identify ambiguities and other survey issues. This phase also led to additions to the survey
to better probe perceptions of the future development of the island’s transportation
infrastructure.
• Ethical Considerations: Ethical principles, including informed consent, confidentiality, and
data protection, were strictly adhered to throughout the survey process. Participants were
provided with clear information about the purpose of the survey, their rights as respondents,
and how their data would be used.
1.3.2 Questionnaire Development
The survey was designed to cover several thematic areas relating to the present transportation
infrastructure: its needs and weaknesses, along with perceptions of future plans and of beneficial
mechanisms that could be integrated into the current transportation system.
• Demographics: Basic information on age, sex, and physical disabilities was collected to
better understand the transportation needs of the citizenry.
• Current Transportation Habits: These were captured across several survey sections covering
origin-destination pathways and travelling distance, modes of transportation, frequency of
use, time spent commuting, and its economic cost.
• Challenges and Concerns: Questions were designed to probe these matters directly, with
particular emphasis on the safety, accessibility, convenience, and affordability of existing
public transport systems.
• Desired Improvements: A key objective of the survey was to identify solutions to the issues
found and to envision enhanced transportation systems that would encourage their use. This
included presenting potential transit system enhancements and their supporting facilities.
1.3.3 Dissemination Methods
To maximize participation and reduce the cost of administering the survey, a digital online form
was adopted. The solution implemented incorporated the key elements of the survey design and
question structure. Access to the form was communicated via:
• Promotional Flyers: Designed to be visually appealing, flyers were posted at key high-traffic
public locations to encourage survey participation. Two editions with distinct designs were
created: the first to attract potential participants, and the second to re-engage them when it
was posted. Both editions illustrated several ways of accessing the survey form and further
information on the project and its objectives.
• Quick Response (QR) Codes: The flyer was the primary means of displaying a QR code that
opened the survey form directly when scanned with a mobile smart device. Branding and a
clear code design were applied to ensure ease of capture and compatibility.
• Informational Webpage: Since the survey was administered online, providing sufficient
project information was necessary to encourage citizen participation. An informational
webpage was built outlining the survey’s objectives and giving a brief overview of the overall
project.
• Autoresponder Systems: The promotional flyer was also the primary means of communicating
a local phone number that provided access to the survey. Custom applications were
configured to listen for incoming messages via Short Message Service (SMS) and commonly
used instant messaging applications such as WhatsApp, Telegram, Viber and Signal. On
receipt of a message, a reply containing links to both the survey form and the information
page was sent to the participant as shortened, custom-branded URLs.
• Researchers on the Ground: The advertised QR codes and instant messaging pathways
achieved a wide demographic reach. To expand on this, researchers were hired to administer
the digital survey to people who were initially unwilling to participate, or who were not
inclined to access or use an online survey.
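The autoresponder flow described above can be sketched as a single reply function. This is a minimal illustration under stated assumptions, not the system that was deployed: the `IncomingMessage` type, the channel keys, and the placeholder URLs are hypothetical, and the platform integrations (SMS gateway, WhatsApp, Telegram, Viber, Signal) that would deliver messages to and from this function are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholders: the real shortened, custom-branded URLs are not given in the report.
SURVEY_URL = "https://example.org/survey"
INFO_URL = "https://example.org/info"

# Channels named in the report; each would need its own platform integration (not shown).
SUPPORTED_CHANNELS = {"sms", "whatsapp", "telegram", "viber", "signal"}


@dataclass
class IncomingMessage:
    """Hypothetical representation of a message delivered by a channel integration."""
    channel: str  # e.g. "sms" or "whatsapp"
    sender: str   # phone number or messaging handle
    text: str     # message body; this sketch ignores it


def build_reply(msg: IncomingMessage) -> Optional[str]:
    """Return the auto-reply text for a supported channel, or None otherwise."""
    if msg.channel.lower() not in SUPPORTED_CHANNELS:
        return None
    return (
        "Thank you for your interest in the transport survey.\n"
        f"Survey form: {SURVEY_URL}\n"
        f"Project information: {INFO_URL}"
    )


# Example: a WhatsApp message to the advertised number (hypothetical sender).
reply = build_reply(IncomingMessage("WhatsApp", "+17580000000", "Hi"))
print(reply)
```

The sketch assumes that any message triggers the reply; the report does not say whether a keyword was required.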
1.3.4 Rationale for Dissemination Approach
• Reliability: Both the online survey form and the informational webpage had 100% uptime
during deployment. The autoresponder systems closely matched this at 99.6% uptime,
allowing for scheduled maintenance downtime.
• Wider Reach: The hybrid dissemination methods above maximized participation across a
wide and varied demographic of the citizenry. This included people across the entire island,
both those adept at using technology to access the online form and, through researchers on
the ground, those who are not.
• Cost Effectiveness: Using an online dissemination and education platform significantly
reduced the costs of traditional survey administration and of associated data entry, along
with its inherent errors. Deploying autoresponders on SMS and other platforms realized
further savings by avoiding messaging/notification service providers, as well as the negative
perceptions commonly associated with unsolicited messaging services.
1.4 Results and Outcomes
The survey aimed to achieve the following outcomes:
• Derive a comprehensive understanding of the current state of the transportation system in
Saint Lucia, including existing challenges, preferences, and priorities of the residents.
• Identify key areas for improvement and intervention, informed by the insights and
feedback gathered from the survey respondents.
• Formulate data-driven recommendations for the development of a transportation policy
that is responsive to the needs and aspirations of the Saint Lucian population.
1.4.1 Principal Components Analysis
In total, four hundred and eighty-three (483) responses were collected over a six-month period
from May to October 2023. For each completed survey, the data were then transformed into 167
binary response indicators, each recording whether or not the participant selected a given
response.
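The transformation into binary response indicators can be illustrated with a short sketch. The question keys and options below are hypothetical placeholders; in the study, the full questionnaire produced 167 such indicators. Each (question, option) pair becomes one 0/1 column, so a multiple-response question can set several columns to 1, and an unanswered question leaves all of its columns at 0.

```python
from typing import Dict, List, Tuple


def binary_indicators(
    responses: List[Dict[str, List[str]]],
    catalogue: Dict[str, List[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Return column names and a 0/1 matrix with one row per respondent.

    `catalogue` lists every possible option for each question; a column is 1
    if the respondent selected that option and 0 otherwise.
    """
    columns = [f"{q}={opt}" for q, opts in catalogue.items() for opt in opts]
    rows = []
    for answer in responses:
        row = []
        for q, opts in catalogue.items():
            chosen = set(answer.get(q, []))  # an unanswered question gives all zeros
            row.extend(1 if opt in chosen else 0 for opt in opts)
        rows.append(row)
    return columns, rows


# Hypothetical questions: a yes/no item and a multiple-response item.
catalogue = {
    "consider_pr": ["Yes", "No"],
    "stops": ["Grocery", "School", "Work"],
}
responses = [
    {"consider_pr": ["Yes"], "stops": ["Grocery", "Work"]},
    {"consider_pr": ["No"]},
]
cols, X = binary_indicators(responses, catalogue)
print(cols)
print(X)
```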
Given the size of this data array, the analysis then focused on factor analysis using Principal
Components Analysis (PCA), with the following outcomes in view:
• Dimension Reduction: PCA condensed the original survey data into a smaller set of
uncorrelated variables, i.e., components, simplifying the response database while retaining
a high degree of its statistical variability.
• Pattern Identification: Underlying patterns and relationships among participant responses
were identified, uncovering latent factors that drive respondents’ preferences, behaviours,
and challenges related to transportation.
• Enhanced Interpretability: The components are easier to interpret than the entire response
database, which allowed policy formulation to focus on a limited set of key data dimensions.
• Multidimensional Insights: PCA allowed multiple data dimensions to be explored
simultaneously, capturing nuances in responses across the various aspects of transportation
covered by the survey.
To avoid merely rediscovering statistically apparent correlations, Pearson correlation coefficients
(PCC) between the response indicators were computed first, so that such correlations would not
bias the identification of latent factors. Data suitability was further assessed using the
Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity. Both
indicated that the response database was suitable for factor analysis.
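Both suitability checks can be computed directly from the correlation matrix R. The sketch below is an illustration on synthetic data, not the study's code; it assumes the indicators contain no constant columns and that R is non-singular. Bartlett's statistic is $-(n - 1 - (2p+5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom, and the KMO measure compares squared correlations with squared partial correlations obtained from $R^{-1}$.

```python
import numpy as np


def bartlett_sphericity(data: np.ndarray) -> tuple:
    """Bartlett's test of sphericity: returns (chi-square statistic, degrees of freedom).

    H0: the population correlation matrix is the identity (variables uncorrelated).
    Assumes no constant columns and a non-singular correlation matrix.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log|R|, more stable than log(det(R))
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return float(stat), dof


def kmo(data: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + a2))


# Synthetic example: six variables driven by one common factor, plus pure noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
X = factor + 0.5 * rng.normal(size=(500, 6))
noise = rng.normal(size=(500, 6))

stat, dof = bartlett_sphericity(X)
print(f"Bartlett chi2 = {stat:.1f} on {dof} df; KMO = {kmo(X):.3f}")
print(f"Noise only: Bartlett chi2 = {bartlett_sphericity(noise)[0]:.1f}")
```

KMO values closer to 1 indicate better sampling adequacy (minimums of 0.5 to 0.6 are commonly cited), and a large Bartlett statistic relative to its chi-square distribution rejects the hypothesis that the variables are uncorrelated.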
The PCA, followed by Varimax rotation, extracted 52 principal components with an eigenvalue of
one or greater; the corresponding scree plot is shown in Figure 1.1. The loadings of each
component were then assessed against their respective responses to identify those that are
statistically uncorrelated and that together represent a high share (72.8%) of the total response
variability. The cumulative variance of the extracted components and their component loadings
(CL) on survey responses are shown in Figures 1.2 and 1.3 respectively.
Figure 1.1: Component Eigenvalues for all extracted factors
Figure 1.2: Cumulative variance of Principal Components for Eigenvalues greater than 1
Figure 1.3: Highest |CL| for component and its respective response parameter
The survey response parameters identified through the PCA streamlined the survey database and
allowed underlying data patterns to be determined for use in evidence-based policy formulation.
Focusing on these extracted parameters allowed the key data dimensions to be prioritized, giving
a targeted approach founded on the perceptions of Saint Lucia’s citizenry.
1.5 Synthesis of Respondent Data
Table 1.1 shows the frequency of responses by combined origin/destination, identifying the
geographical locations that correspond most closely to the survey results so that the analysis
could be tailored accordingly.
Based on these response frequencies, participant data with either an origin or a destination of
Castries or Gros Islet were analysed, focusing on responses that total more than 1% of the
expressed survey contribution (for both communities). Table 1.2 summarises the key responses
used to investigate data trends further and to construct policy recommendations:
Table 1.2: Key Responses of Participant Data with Either an Origin or Destination of Castries or Gros Islet

| Response | Castries (origin) | Gros Islet (origin) | Castries (dest.) | Gros Islet (dest.) | Total |
|---|---|---|---|---|---|
| No specific type of special support needed | 301 | 102 | 127 | 141 | 671 |
| No bike transport used | 300 | 103 | 127 | 141 | 671 |
| Not physically challenged | 299 | 102 | 127 | 139 | 667 |
| Special support not needed | 299 | 102 | 127 | 139 | 667 |
| EC$5 - EC$10 (willing to pay for a park and ride) | 274 | 89 | 113 | 124 | 600 |
| Work (most frequent reason for travel) | 252 | 87 | 111 | 115 | 565 |
| If my level of stress commuting by shuttle were lower than during my regular commute by car | 192 | 78 | 101 | 110 | 481 |
| Personal vehicle (primary mode of transport) | 185 | 84 | 92 | 116 | 477 |
| Castries (travel to most often) | 301 | 0 | 88 | 75 | 464 |
| Yes to consider a park and ride | 191 | 67 | 95 | 105 | 458 |
| 7 am - 9 am start to morning travel | 173 | 65 | 92 | 86 | 416 |
| If the station is safe | 154 | 58 | 75 | 92 | 379 |
| Public transport not used (bus stop) | 142 | 61 | 69 | 86 | 358 |
| Public transport not used (no cost) | 142 | 59 | 69 | 80 | 350 |
| I would worry about my vehicle's security | 169 | 45 | 64 | 70 | 348 |
| Female | 159 | 50 | 67 | 68 | 344 |
| Grocery (stop on journey) | 130 | 63 | 66 | 82 | 341 |
| If there are enough shuttles to guarantee I don’t have to wait long in the pickup or drop-off stations | 142 | 51 | 68 | 80 | 341 |
| 0 times per week (park and take public transport) | 127 | 57 | 58 | 90 | 332 |
| EC$100 - EC$250 (spent on fuel - private transport) | 127 | 53 | 67 | 78 | 325 |
| Public transport not used (mode change) | 134 | 51 | 63 | 73 | 321 |
| 5 pm - 7 pm (start of return journey) | 126 | 59 | 69 | 62 | 316 |
| Male | 136 | 50 | 56 | 73 | 315 |
| I want to be able to use my car during the workday | 117 | 59 | 62 | 71 | 309 |
| If the shuttle is Cheaper than my current transportation | 129 | 41 | 66 | 66 | 302 |
| 31 - 45 (age demographic) | 123 | 50 | 57 | 56 | 286 |
| Public transport not used (affordability) | 120 | 46 | 59 | 60 | 285 |
| If the station is well located, on the way of my commute | 126 | 37 | 61 | 61 | 285 |
| 3 pm - 5 pm (start of return journey) | 145 | 28 | 46 | 54 | 273 |
| Gros Islet (lives in) | 75 | 56 | 0 | 141 | 272 |
| 4 - 5 days per week | 142 | 32 | 48 | 48 | 270 |
| 15 - 30 minutes (length of morning travel) | 106 | 38 | 69 | 55 | 268 |
| 46 - 60 (age demographic) | 122 | 38 | 44 | 62 | 266 |
| Park and Ride not used (mode of arrival) | 133 | 37 | 48 | 42 | 260 |
| Castries (travel from most often) | 88 | 30 | 128 | 0 | 246 |
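The Total column can be checked against the four location columns with a few lines of code. The sketch below uses a handful of rows copied from Table 1.2, and the respondent count of 489 is taken from Section 1.2. In these rows the Total equals the plain sum of the four columns, and some totals (for example 600 and 671) exceed the number of unique respondents. This suggests that a respondent can be counted in both an origin column and a destination column, so the totals cited in the recommendations that follow are best read as counts of column entries rather than of distinct respondents.

```python
# Rows copied from Table 1.2: (Castries origin, Gros Islet origin,
# Castries dest., Gros Islet dest., Total).
TABLE_1_2 = {
    "No specific type of special support needed": (301, 102, 127, 141, 671),
    "EC$5 - EC$10 (willing to pay for a park and ride)": (274, 89, 113, 124, 600),
    "Yes to consider a park and ride": (191, 67, 95, 105, 458),
    "I would worry about my vehicle's security": (169, 45, 64, 70, 348),
    "Castries (travel from most often)": (88, 30, 128, 0, 246),
}
UNIQUE_RESPONDENTS = 489  # from Section 1.2


def mismatched_totals(table):
    """Labels whose Total differs from the sum of the four location columns."""
    return [label for label, (*cols, total) in table.items() if sum(cols) != total]


def totals_exceeding(table, respondents):
    """Labels whose Total is larger than the number of unique respondents."""
    return [label for label, (*_, total) in table.items() if total > respondents]


print(mismatched_totals(TABLE_1_2))  # an empty list means every Total is a row sum
print(totals_exceeding(TABLE_1_2, UNIQUE_RESPONDENTS))
```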
Data trends in the response frequencies, disaggregated by origin and destination for the
communities of Castries and Gros Islet, were then analysed to determine the following policy
summaries and recommendations:
1. Park and Ride (P&R):
o The frequency of respondents willing to pay for a Park and Ride service (EC$5 -
EC$10) is substantial (600 total responses).
o Recommendation: Prioritize the establishment of Park and Ride facilities near
major commuting routes, with services concentrated on peak hours. Promote P&R as an
alternative to personal vehicles.
2. Shuttle Services:
o Respondents express interest in shuttle services, as shown by the number who would
consider a shuttle if it made commuting less stressful than driving (481 responses).
o Recommendation: Investigate shuttle availability, reliability, and comfort. Address
concerns related to stress reduction and convenience.
3. Public Transport:
o Some respondents avoid public transport due to perceived safety issues (142
responses) or affordability concerns (120 responses).
o Recommendation: Enhance safety measures at bus stops and stations. Explore
subsidies or discounts to improve affordability.
4. Vehicle Security:
o A significant number of respondents (348) worry about their vehicle’s security.
o Recommendation: Implement security measures at transportation hubs and
communicate them effectively to alleviate concerns.
5. Work Commute:
o Work-related travel is prominent (565 responses). Policies should focus on
optimizing transportation options during peak work hours.
o Recommendation: Align transportation services with work schedules and office
locations.
6. Age Demographics:
o Age groups 31-45 (286 responses) and 46-60 (266 responses) actively participate.
Tailor policies to address their specific needs.
o Recommendation: Consider age-specific incentives or services.
7. Travel Times:
o The time of day matters. Morning travel (7 am - 9 am) and return journeys (5 pm -
7 pm) are crucial.
o Recommendation: Optimize shuttle schedules and frequency during these peak
hours.
8. Geographical Locations:
o Castries (464 responses) and Gros Islet (272 responses) are key areas. Understand
travel patterns between these towns.
o Recommendation: Develop targeted transportation solutions for each region.
9. Gender Considerations:
o Gender-specific preferences emerge (e.g., females express more concern about
vehicle security).
o Recommendation: Ensure gender-inclusive policies and address specific concerns.
10. Mode of Arrival:
o Respondents rarely use Park and Ride as their mode of arrival (260 responses).
o Recommendation: Promote Park and Ride as a convenient arrival option.
11. Mode Change (adjustments to transport type):
o Some respondents avoid public transport due to mode changes (321 responses).
o Recommendation: Simplify transfers and improve information dissemination.
12. Specific Time Windows:
o Respondents mention specific time windows (e.g., 3 pm - 5 pm for return journeys).
o Recommendation: Tailor services to match these time frames.
13. Location-Specific Insights:
o Gros Islet residents (272 responses) have unique preferences. Investigate further.
o Recommendation: Customize solutions for each town.
14. Travel Frequency:
o Respondents who travel 4-5 days per week (270 responses) are significant.
o Recommendation: Consistently available transportation options are crucial.
15. Length of Morning Travel:
o Respondents value shorter morning travel times (268 responses).
o Recommendation: Optimize routes to minimize travel duration.
16. Origin and Destination Preferences:
o Castries (travel to most often) and Gros Islet (lives in) are key locations.
o Recommendation: Develop targeted services for these specific routes.
2. Methodology and Results of Survey Analysis
2.1 Introduction
This study aims to determine the technical and economic feasibility of a Park-and-Ride (P&R)
initiative in Saint Lucia. To assess the business case for the P&R facility, a survey of commuters
was undertaken for empirical evaluation. First, the questionnaire inquired about the typical
demographic characteristics of potential users. The survey also covered the origins and
destinations of users, the financial costs of public and private transport, commute times, and
respondents' preferences, including their likelihood of using, and willingness to pay for, the
proposed P&R facility. From the 31-question survey, 489 unique responses were obtained.
Determining the likelihood of use versus non-use, modal choice, and some aspects of trip
generation are inherently classification problems; logistic regressions and several variants of
decision trees were therefore used in the empirical evaluation of the survey data.
The empirical exercise focuses on evaluating user choice related to the P&R facility, mode choice,
and the trip-generating characteristics of commuters. Commuters face the economic problem of
balancing various costs and benefits in choosing between private and public transport options. The
commuter's main aim is to minimize the disutility of traveling, subject to the costs they face. A
public transport intervention such as a P&R facility becomes a viable choice for the commuter if it
helps reduce this disutility. The viability of such a system can therefore be estimated by relating
its effect on the disutility (i.e., costs) of commuting to the likelihood of a commuter using the P&R
facility. In this case, the aim is to assess how various cost measures influence a commuter's
decision to use the P&R intervention.
Assessing the determinants of mode choice also matters to the analysis, particularly with respect
to the different motivations for using public and private transport. Mode choice analysis mainly
assesses how commuters weigh the financial and logistical costs of public transport against the
flexibility of private transport, and can provide information about perceptions of the social
benefit generated by a public transport intervention such as the P&R facility. More structural
determinants of travel demand related to economic geography can also be assessed via a
trip-generation model. Such a model, measured either over time or between geographic
administrative districts such as counties, can indicate which geographic corridors ought to be
emphasized in a transport policy intervention.
The next sections will assess models of user choice, modal choice, and trip-generating
characteristics of survey respondents. The modeling techniques will briefly be described, and the
output of the models will be discussed. Finally, a discussion about their general implications will
be conducted.
2.2 User Choice
An estimation exercise is undertaken to analyse the data contained in the 489 responses generated
from the Ministry² survey. Because the survey data take various alphanumeric forms, continuity
transforms were applied to alphabetical and categorical numeric regressors via bounded
randomization to allow regular estimation.
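The report does not specify the exact form of the continuity transform. One common technique that fits the description of bounded randomization is jittering, in which each integer category code c is replaced by c plus a uniform random draw from [0, 1); the sketch below assumes that reading. The answer labels and their ordering are hypothetical.

```python
import numpy as np


def jitter_categories(codes, rng=None, width: float = 1.0) -> np.ndarray:
    """Map integer category codes to continuous values with bounded uniform noise.

    Each code c becomes c + U(0, width). With width <= 1 the categories keep
    their order and do not overlap. This is an assumed reading of the report's
    continuity transform, not its documented method.
    """
    rng = np.random.default_rng() if rng is None else rng
    codes = np.asarray(codes, dtype=float)
    return codes + rng.uniform(0.0, width, size=codes.shape)


# Hypothetical answer labels, first mapped to ordered integer codes.
labels = ["EC$0 - EC$5", "EC$5 - EC$10", "EC$10 - EC$20"]
answers = ["EC$5 - EC$10", "EC$0 - EC$5", "EC$10 - EC$20", "EC$5 - EC$10"]
codes = [labels.index(a) for a in answers]        # [1, 0, 2, 1]
x = jitter_categories(codes, rng=np.random.default_rng(42))
print(x)
```

Because the noise is bounded below one unit, the integer part of each transformed value still recovers the original category, so no category information is lost.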
Categorical regression via the logit, decision trees, and gradient boosting was used to generate
results. Linear regression models operate by minimizing the residual sum of squares (RSS) loss
function. For example, given a set of n observations of variables X and Y, indexed by
i = 1, 2, …, n, a linear regression would take the form:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n \qquad \text{(eq. 1)}$$
Here $\beta_0$ and $\beta_1$ are the intercept and slope coefficients respectively, selected to
minimize the RSS, $\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$. If the dependent variable is
binary, minimizing the RSS is limited in its ability to estimate the relationship between X and Y.
Thus, if Y is assumed to be binomially distributed, $Y_i \sim B(n_i, \pi_i)$, where $n_i$ is the
binomial denominator and $\pi_i$ is its governing probability, the logit of $\pi_i$ is modelled as a
linear function, $\operatorname{logit}(\pi_i) = \ln\big(\pi_i / (1 - \pi_i)\big) = X_i'\beta$, where
$X_i$ is the regressor and $\beta$ the regression coefficient. The logistic regression thus fits a
sigmoidal rather than a linear function of the form:
π(x) = exp(α + βx)/(1 + exp(α + βx) eq. 2
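The relationship between eq. 2 and the linear logit can be checked numerically. In this sketch the coefficients are hypothetical, not survey estimates, and the function names are illustrative; the logistic function is written in the algebraically equivalent form 1 / (1 + exp(-z)). It shows that the probability is S-shaped in x while its logit is linear, and that a one-unit increase in x multiplies the odds by exp(beta).

```python
import math


def logistic(alpha: float, beta: float, x: float) -> float:
    """pi(x) from eq. 2, written in the algebraically equivalent form 1 / (1 + exp(-z))."""
    z = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


def odds(p: float) -> float:
    """Odds p / (1 - p) corresponding to a probability p."""
    return p / (1.0 - p)


alpha, beta = -1.0, 0.5  # hypothetical coefficients, not estimates from the survey
for x in (0.0, 2.0, 4.0):
    p = logistic(alpha, beta, x)
    # p is S-shaped in x, while logit(p) recovers the linear index alpha + beta * x
    print(x, round(p, 3), round(logit(p), 3))

# A one-unit increase in x multiplies the odds by exp(beta)
odds_ratio = odds(logistic(alpha, beta, 3.0)) / odds(logistic(alpha, beta, 2.0))
print(round(odds_ratio, 4), round(math.exp(beta), 4))
```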
The model essentially estimates the probability of a particular binary outcome for Y as values of X increase linearly (or categorically in some cases). The estimation problem can also be set up for decision tree analysis. Decision trees depend on a recursive partitioning algorithm driven by a squared loss function like the RSS used in traditional ordinary least squares regression. The features (or variables) are partitioned to generate decision rules about the targeted outcome, and the threshold values used to partition features are selected by the algorithm so that they minimise squared loss. Briefly, for each feature $j = 1, \ldots, d$ and each real value $v$ taken by that feature, the data can be split into

$$I_< = \{i : x_{ij} < v\}, \qquad I_> = \{i : x_{ij} \geq v\}.$$

Associating parameters $\beta_<$ and $\beta_>$ with $I_<$ and $I_>$ respectively, the quality of the split is measured by the squared loss

$$\sum_{i \in I_<} (y_i - \beta_<)^2 + \sum_{i \in I_>} (y_i - \beta_>)^2,$$

which, for a given split, is minimised by setting $\beta_<$ and $\beta_>$ to the mean of $y_i$ within each subset. The split with the minimal loss is chosen, and the process is recursed on the resulting child nodes until the terminal condition is achieved.
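One partitioning step can be sketched in plain Python: every feature j and every observed value v is tried as a threshold, each side of the split is fitted by its mean, and the split with the smallest squared loss is kept. The data and function names are hypothetical, and this illustrates the criterion above rather than reproducing the rpart code used in the report.

```python
from typing import List, Optional, Tuple


def split_loss(y_left: List[float], y_right: List[float]) -> float:
    """Squared loss of a split when each side is fitted by its own mean (the optimal beta)."""
    loss = 0.0
    for side in (y_left, y_right):
        mean = sum(side) / len(side)
        loss += sum((yi - mean) ** 2 for yi in side)
    return loss


def best_split(X: List[List[float]], y: List[float]) -> Optional[Tuple[int, float, float]]:
    """Exhaustive search returning (feature j, threshold v, loss) with minimal squared loss.

    I_< holds rows with x_ij < v and I_> rows with x_ij >= v, as in the text.
    """
    best = None
    for j in range(len(X[0])):
        for v in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] < v]
            right = [yi for row, yi in zip(X, y) if row[j] >= v]
            if not left or not right:
                continue  # a split must leave observations on both sides
            loss = split_loss(left, right)
            if best is None or loss < best[2]:
                best = (j, v, loss)
    return best


# Hypothetical rows: [weekly fuel spend, 0/1 indicator]; y = 1 for "would use", 0 otherwise
X = [[20, 1], [35, 0], [50, 1], [80, 0], [95, 1], [120, 0]]
y = [0, 0, 0, 1, 1, 1]
print(best_split(X, y))  # prints: (0, 80, 0.0)
```

A full tree simply repeats this search inside each resulting subset until a stopping rule (minimum node size, depth, or cross-validated complexity) is met.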
The logistic regression model regresses use choice, which remains binary (i.e., a yes or no answer to whether the facility will be used), on total daily minutes traveled (travelmins), willingness to pay to save 15 minutes of commuting (pay15), fuel costs (fuelc) and spending on public transport (pubtran). The results of the logistic regression are shown in Chart 2.1:
[2] Ministry of Infrastructure, Ports, Transport, Physical Development and Urban Renewal of St. Lucia (MI henceforward).
Chart 2.1: Logit Estimate of Use Choice
Only fuel costs and public transport costs are significant at conventional levels[3]. These variables are analysed in more detail below. Chart 2.2 shows the scatterplot of the logistic regression’s predicted use of the park and ride facility versus fuel costs:
Chart 2.2: Fuel Cost vs. Predicted Use Choice
[3] That is, where p < 0.1.
Both the LOESS[4] (red) and OLS[5] (green) fit lines of the predicted choice scores slope upward, suggesting that as fuel costs rise, predicted use of the facility also increases. This corroborates the positive coefficient reported in the model.
Chart 2.3 shows the scatterplot of the logistic regression’s predicted use of the park and ride facility
versus public transport costs:
Chart 2.3: Public Transport Costs vs. Predicted Use Choice
Both the LOESS (red) and OLS (green) fit lines of the predicted choice scores slope downward, correctly reflecting the negative coefficient reported in Chart 2.1; the sign of this relationship, however, warrants discussion. The a priori expectation was that higher individual spending on public transport would drive demand for a cost-saving option. The inverse relationship is more likely reflective of ‘ex ante’ demand for the service, suggesting that, to encourage use of the facility, related public transport costs ought to be kept down. This suggests the park and ride service is viewed as a complement to, rather than a substitute for, current public transport options.
Table 2.1 summarises data concerning sensitivity and specificity of the two models:
[4] Locally Estimated Scatterplot Smoothing
[5] Ordinary Least Squares
Table 2.1: Sensitivity and Specificity

| | Fuel Costs | Public Transport Costs |
|---|---|---|
| Sensitivity | 1.000 | 0.813 |
| Specificity | 0.471 | 0.763 |
| AUC | 0.504 | 0.812 |
| DeLong AUC CI (95%) | 0.3995-0.6078 | 0.6822-0.9422 |

Sensitivity, i.e., the rate of correctly predicted true positives, is high for both classifiers, and fuel costs nominally identify positive use choice perfectly, with a sensitivity of 1. However, fuel costs predict true negatives poorly (specificity of 0.471), and public transport cost is the superior classifier in this respect (0.763). The larger area under the Receiver Operating Characteristic curve (AUC) is thus associated with public transport costs (0.812 vs. 0.504), suggesting they are better able to predict both use and non-use of the Park and Ride facility. AUC measures should approach 1; values near 0.5 suggest the model has no classifying power superior to random chance. The fuel-cost AUC falls in this range, and its 95% DeLong confidence interval (0.3995-0.6078) includes 0.5.

A cross-validated decision tree[6] provided the results in Table 2.2:

Table 2.2: Decision Tree Model of Use Choice

The tree model uses the full numeric dataset with the aim of outlining the combinations of variable values that best predict use choice. Fuel cost is the most consistently identified determinant, corroborating its high sensitivity in the logit model. Characteristics that predict choosing the Park and Ride facility with above 90% probability are an age greater than 52 years (the sample average is around 47), high fuel costs, and low public transport costs. However, the combination of conditions generating a 90% probability of choice occurs only 22% of the time. Non-use is predicted by low fuel costs and short total travel time.

[6] Estimated in rpart in R.

Tree models, however, can also handle non-numeric data. An expanded sample including non-numeric data, such as responses given in natural language, was used to predict use choice (which remained a binary numeric variable). The decision rule table is large, so Table 2.3 summarises the main findings of the model:
Table 2.3: Summary Alphanumeric Decision Tree Results

| Probability of Use = 0% | Probability of Use = 100% |
|---|---|
| Morning travel time of 75-90 minutes | Morning travel time of 45-120 minutes |
| Transportation method: Bus | Max wait time between 10 and 15 minutes |
| Max wait time between 20 and 30 minutes | Public transport costs between EC$10 and EC$20 |
| Changes: between 2 and 4 | Travel of 10 miles or more, 2 to 5 days per week |

Longer travel times, lower wait times, frequent travel over 10 miles and higher public transport spending predicted use of the facility. On the other hand, transport by bus and a high number of changes on a journey predicted non-use. A broad array of destination types was identified as determining use, though the distribution of these destinations is not identifiable from the data.

The importance of each variable in determining use choice was assessed by resampling the alphanumeric decision tree 500 times[7], as shown in Chart 2.4:

[7] This procedure is known as a ‘random forest’ estimate.
Chart 2.4: Variable Importance
Importance is measured as the percentage of variance in the target variable (use_choice) accounted for by explanatory features from the MI survey. Residence indicates the county of residence, and stop type is the trip’s destination as identified by the respondent. These features, along with maximum wait time (maxwait), morning commute time (morning_time) and number of people sharing a commute (people_on_commute), constituted the five most important factors determining use in the ‘random forest’ resampling. Together, these features explained 22.8 percent of the variation in the choice to use the park and ride facility. Notably, fuel costs and spending on public transport, the significant variables identified in the logit, were not estimated to be as important as these features. While further traditional analysis can be undertaken immediately on numerical variables such as maximum wait time and morning commute time, some consideration would be required to assign numerical meaning to features such as residence and stop type.
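The resampling behind Chart 2.4 can be approximated, in much simplified form, by bagging: fitting one tree per bootstrap sample of respondents and averaging their predictions. The sketch below uses one-level trees (stumps) and measures importance crudely as the share of stumps that split on each feature. A full random forest also grows deeper trees and samples a random subset of features at each split, and the importance in Chart 2.4 is variance-based, so this only illustrates the idea under those simplifying assumptions; the data and function names are hypothetical.

```python
import random
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature j, threshold v, left mean, right mean)


def fit_stump(X: List[List[float]], y: List[float]) -> Stump:
    """One-level tree: the split minimising squared loss, each side predicting its mean."""
    best = None
    for j in range(len(X[0])):
        for v in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] < v]
            right = [yi for row, yi in zip(X, y) if row[j] >= v]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            loss = sum((a - ml) ** 2 for a in left) + sum((a - mr) ** 2 for a in right)
            if best is None or loss < best[0]:
                best = (loss, j, v, ml, mr)
    if best is None:  # no valid split: predict the overall mean everywhere
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1], best[2], best[3], best[4]


def bag_stumps(X: List[List[float]], y: List[float], n_trees: int = 500, seed: int = 0) -> List[Stump]:
    """Fit one stump per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps: List[Stump], row: List[float]) -> float:
    """Average of the stump predictions, read as an estimated probability of use."""
    preds = [ml if row[j] < v else mr for j, v, ml, mr in stumps]
    return sum(preds) / len(preds)


def split_share(stumps: List[Stump], d: int) -> List[float]:
    """Share of stumps splitting on each feature: a crude importance proxy."""
    counts = [0] * d
    for j, _, _, _ in stumps:
        counts[j] += 1
    return [c / len(stumps) for c in counts]


# Hypothetical toy data: x0 (e.g. commute minutes) drives use; x1 is pure noise
X = [[10, 3], [20, 1], [30, 4], [40, 1], [50, 5], [60, 9], [70, 2], [80, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = bag_stumps(X, y)
print(split_share(stumps, 2), predict(stumps, [80, 3]), predict(stumps, [10, 9]))
```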
2.3 Age and Gender Considerations
While neither the logit nor decision tree models identified any convincing relationship between
age, gender and use of the proposed facility, summarizing the data may provide some insight into
any potential relationship. Table 2.4 summarizes some simple count and proportional information
about the survey data related to age, gender and potential use of the facility:
Table 2.4: Gender and Age Summary Statistics

Gender

| | Count | Male | Percent male |
|---|---|---|---|
| Total | 489 | 220 | 45.0 |
| Non-Use | 172 | 68 | 39.5 |
| Use | 317 | 152 | 47.9 |

Age Range

| Age range | Count | Use P&R | Percent |
|---|---|---|---|
| 18-30 | 54 | 33 | 61.1 |
| 31-45 | 206 | 130 | 63.1 |
| 46-60 | 189 | 131 | 69.3 |
| 61+ | 40 | 23 | 57.5 |

Survey responses were separated into categories describing whether a respondent would or would not use the proposed facility. Of the total respondents, 45.0 percent were male and 55.0 percent were female. Among respondents who said they would not use the facility, 39.5 percent were male, indicating that female respondents were overrepresented among non-users relative to their proportional presence in the sample. Conversely, males were overrepresented among respondents who said they would use the facility (47.9 percent) relative to their proportional presence in the sample.

The 46-60 age range exhibited the highest likelihood of a positive response regarding use of the facility (69.3 percent). The use-determining age threshold of 52 years previously estimated by the decision tree model falls within this category. The age category least likely to say they would use the facility was respondents aged 61 and over (57.5 percent).

Data on age and gender and their relationship to use can be summarized in a support vector machine (SVM[8]) model. Chart 2.5[9] plots the combinations of age and gender that correspond to responses denoting use or non-use of the facility:

[8] See Kowalczyk (2017), “Support Vector Machines Succinctly”, for analytical details.
[9] Produced from a Gaussian Radial Basis Function (RBF) kernel in the kernlab package in R.
Chart 2.5: SVM Classification of Age and Gender vs. Use Choice
Regions in blue denote combinations of age and gender that generally predict use of the facility; regions in red denote combinations that predict non-use. Although relatively small, the set of combinations that robustly predict non-use clusters around female respondents above the age of 60. In summary, while females made up the majority of survey respondents (55.0 percent), they were also more likely than males to say they would not use the facility. Females still formed the majority of respondents who said they would use the proposed facility (165 of 317), but were underrepresented among them relative to their share of the sample. Finally, females over the age of 60 were the category of respondents least likely to utilize the proposed facility.
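The percentages in Table 2.4 can be reproduced from its counts, and the female figures, which the table does not print, follow by subtraction. Doing so gives use rates of roughly 69.1 percent for men and 61.3 percent for women. The sketch below is simple arithmetic on the transcribed counts; the variable names are illustrative.

```python
# Counts transcribed from Table 2.4
TOTAL, MALE_TOTAL = 489, 220
USE_TOTAL, MALE_USE = 317, 152
NON_USE_TOTAL, MALE_NON_USE = 172, 68
AGE_BANDS = {"18-30": (54, 33), "31-45": (206, 130), "46-60": (189, 131), "61+": (40, 23)}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the table."""
    return round(100.0 * part / whole, 1)


# Male share of each response group (the "Percent male" column)
male_share = {
    "Total": pct(MALE_TOTAL, TOTAL),
    "Non-Use": pct(MALE_NON_USE, NON_USE_TOTAL),
    "Use": pct(MALE_USE, USE_TOTAL),
}

# Female counts are not printed in Table 2.4; they follow by subtraction
female_total = TOTAL - MALE_TOTAL
female_use = USE_TOTAL - MALE_USE
use_rate_by_gender = {"Male": pct(MALE_USE, MALE_TOTAL), "Female": pct(female_use, female_total)}

# Share of each age band saying it would use the P&R facility
use_rate_by_age = {band: pct(users, n) for band, (n, users) in AGE_BANDS.items()}

print(male_share, use_rate_by_gender, use_rate_by_age)
```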
2.4 Mode Choice
This section analyses the determinants of mode choice. Transport mode choice in this case is summarized as a choice of public vs. private transportation. Choosing public transport means choosing a ‘Bus’, while private transport means choosing one of the alternatives, i.e., Personal Vehicle, Carpool or Walking[10]. Two classification models are used to assess mode choice: logit regression and a decision tree.

[10] Of n=489 responses, 158 chose public transport (Bus), 9 selected carpooling and 3 selected walking. The remainder selected a private vehicle.

2.4.1 Logit Regression
Initially, a logistic regression model is used to determine which survey responses best explain commuters’ use of a particular type of transport. The coefficients tested against mode choice are spending on fuel (fuel_c), spending on public transport (pubtran), travel minutes (traveltime) and a logarithmic integration of fuel and transport costs (cost). After repeated trials, travel minutes proved insignificant[11], and the model that used fuel and public transport spending separately, rather than the integrated cost measure, had a lower AIC[12] and was thus selected for further analysis. The resulting test equation was:
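$$\operatorname{logit}\big(\Pr(\text{Mode} = 1)\big) = -1.611 - 0.018\,\text{fuel\_c} + 0.338\,\text{pubtran} \qquad \text{(eq. 3)}$$

where Mode = 1 denotes public transport (Bus), and p < 0.001 for all coefficients.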
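Reading eq. 3 as a logit index in the form of eq. 2 (the log-odds of choosing public transport), its coefficients can be turned into predicted probabilities and classified at the success cutoff C = 0.5 used in Table 2.5. The sketch assumes the spending variables enter in the units recorded in the survey (EC$); the respondent values are hypothetical. Since exp(0.338) is about 1.40, each additional unit of public transport spending multiplies the odds of choosing public transport by roughly 1.4, holding fuel spending fixed.

```python
import math

# Coefficients as reported in eq. 3 (logit index for choosing public transport)
INTERCEPT = -1.611
B_FUEL = -0.018    # per unit of fuel spending (fuel_c)
B_PUBTRAN = 0.338  # per unit of public transport spending (pubtran)
CUTOFF = 0.5       # success cutoff C used in Table 2.5


def p_public(fuel_c: float, pubtran: float) -> float:
    """Predicted probability of choosing public transport implied by eq. 3."""
    z = INTERCEPT + B_FUEL * fuel_c + B_PUBTRAN * pubtran
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical respondents (spending values are illustrative, not survey records)
for fuel_c, pubtran in [(0, 15), (50, 5), (150, 0)]:
    p = p_public(fuel_c, pubtran)
    print(fuel_c, pubtran, round(p, 3), "public" if p > CUTOFF else "private")

# Odds multipliers for a one-unit increase in each spending variable
print(round(math.exp(B_FUEL), 3), round(math.exp(B_PUBTRAN), 3))
```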
Assessing the determinants against the predicted values generated by the test equation can demonstrate the robustness of the classifier. The model suggests that higher spending on public transport predicts higher usage of public transport, while higher spending on fuel predicts lower use of public transport. While these results are intuitive, they are not very informative. In Charts 2.6 and 2.7, the red fit lines are derived from the LOESS procedure, as the relationships are clearly non-linear. Regarding fuel spending versus predicted mode choice (Chart 2.6), the LOESS fit suggests that spending less than EC$100 on fuel per week dramatically increases a respondent’s chances of using public transport, which is intuitive, as private transport users would be expected to spend more on fuel. The LOESS fit line of the relationship between public transport spending and predicted use of public transport (Chart 2.7) also slopes upward, corroborating this idea.
Chart 2.6: Fuel Spending vs. Public Transport Use
[11] Coefficient value of -0.004 and a p-value of 0.62.
[12] Akaike information criterion of 0.502 vs. 0.505, respectively.
Chart 2.7: Public Transport Spending vs. Public Transport Use
The main classification table (Table 2.5) shows that the model performs generally well in terms of both sensitivity (i.e., correctly identified true positives) and specificity (i.e., correctly identified true negatives), with a positive total gain over the constant-probability baseline in both the prediction and the expected-value evaluations.
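Table 2.5: Classification Table

Expectation-Prediction Evaluation for Binary Specification
Equation: MODE1
Date: 02/13/24 Time: 17:15
Success cutoff: C = 0.5

| | Estimated Equation: Dep=0 | Estimated Equation: Dep=1 | Estimated Equation: Total | Constant Probability: Dep=0 | Constant Probability: Dep=1 | Constant Probability: Total |
|---|---|---|---|---|---|---|
| P(Dep=1)<=C | 310 | 21 | 331 | 331 | 157 | 488 |
| P(Dep=1)>C | 21 | 136 | 157 | 0 | 0 | 0 |
| Total | 331 | 157 | 488 | 331 | 157 | 488 |
| Correct | 310 | 136 | 446 | 331 | 0 | 331 |
| % Correct | 93.66 | 86.62 | 91.39 | 100.00 | 0.00 | 67.83 |
| % Incorrect | 6.34 | 13.38 | 8.61 | 0.00 | 100.00 | 32.17 |
| Total Gain* | -6.34 | 86.62 | 23.57 | | | |
| Percent Ga... | NA | 86.62 | 73.25 | | | |

| | Estimated Equation: Dep=0 | Estimated Equation: Dep=1 | Estimated Equation: Total | Constant Probability: Dep=0 | Constant Probability: Dep=1 | Constant Probability: Total |
|---|---|---|---|---|---|---|
| E(# of Dep=0) | 294.99 | 36.01 | 331.00 | 224.51 | 106.49 | 331.00 |
| E(# of Dep=1) | 36.01 | 120.99 | 157.00 | 106.49 | 50.51 | 157.00 |
| Total | 331.00 | 157.00 | 488.00 | 331.00 | 157.00 | 488.00 |
| Correct | 294.99 | 120.99 | 415.99 | 224.51 | 50.51 | 275.02 |
| % Correct | 89.12 | 77.07 | 85.24 | 67.83 | 32.17 | 56.36 |
| % Incorrect | 10.88 | 22.93 | 14.76 | 32.17 | 67.83 | 43.64 |
| Total Gain* | 21.29 | 44.89 | 28.89 | | | |
| Percent Ga... | 66.19 | 66.19 | 66.19 | | | |

\* Change in "% Correct" from default (constant probability) specification
\** Percent of incorrect (default) prediction corrected by equation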
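The headline figures of the first block of Table 2.5 can be recomputed from its four counts. Rows are predictions and columns are actual outcomes, and Dep=1 is assumed here to denote public transport use. Sensitivity then corresponds to the Dep=1 "% Correct" entry and specificity to the Dep=0 entry, while the gain rows compare the estimated equation with a constant-probability model that assigns every respondent to the larger group (Dep=0). Variable names are illustrative.

```python
# Counts from the "Estimated Equation" block of Table 2.5 (cutoff C = 0.5).
# Rows are predictions, columns are actual outcomes (Dep=1 assumed to be public transport).
tn, fn = 310, 21   # predicted Dep=0: actually Dep=0 / actually Dep=1
fp, tp = 21, 136   # predicted Dep=1: actually Dep=0 / actually Dep=1
n = tn + fn + fp + tp

sensitivity = 100 * tp / (tp + fn)  # % of actual Dep=1 classified correctly
specificity = 100 * tn / (tn + fp)  # % of actual Dep=0 classified correctly
pct_correct = 100 * (tn + tp) / n

# The constant-probability baseline predicts the majority class (Dep=0) for everyone
baseline_correct = 100 * (tn + fp) / n
total_gain = pct_correct - baseline_correct                 # the "Total Gain*" row
percent_gain = 100 * total_gain / (100 - baseline_correct)  # the row marked ** in the table

print(round(sensitivity, 2), round(specificity, 2), round(pct_correct, 2),
      round(total_gain, 2), round(percent_gain, 2))
```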
However, splitting the classification performance by variable shows that spending on public
transport is a superior classifier to spending on fuel (Panel 2.1):
Panel 2.1: Individual Classifier Performance (ROC curves: Fuel; Public Transport)
While public transport spending performed acceptably on both sensitivity and specificity, fuel spending had low specificity. Its sensitivity of 1 would make it a perfect classifier of true positives, but because this is paired with a specificity of 0.427, it captures every true positive only by misclassifying more than half of the true negatives, so the result warrants further clarification. In any event, the shapes of the Receiver Operating Characteristic (ROC) curves[13] demonstrate that spending on public transport performs more robustly as a classifier. The Area Under the Curve (AUC) should be greater than 0.5 and should approach 1, and by this measure the best-performing classifier of mode choice is individual public transport spending (AUC of 0.857 vs. 0.566 for fuel spending).
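The two measures discussed above can be related with a short sketch on hypothetical scores: the AUC is computed in its rank (Mann-Whitney) form, as the probability that a randomly chosen positive outscores a randomly chosen negative, and a low threshold shows how a classifier can reach a sensitivity of 1 while its specificity and AUC stay modest. The DeLong confidence intervals reported in Table 2.1 are not reproduced, and the function names are illustrative.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive scores above a random negative.

    This is the Mann-Whitney form of the area under the ROC curve; ties count as one half.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting 1 for scores above the threshold."""
    tp = sum(1 for s, t in zip(scores, labels) if s > threshold and t == 1)
    tn = sum(1 for s, t in zip(scores, labels) if s <= threshold and t == 0)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)


# Hypothetical scores for three positives followed by four negatives
labels = [1, 1, 1, 0, 0, 0, 0]
informative = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
weak = [0.6, 0.5, 0.4, 0.6, 0.5, 0.4, 0.3]

print(roc_auc(informative, labels), roc_auc(weak, labels))
# A low threshold flags every positive (sensitivity 1) but few true negatives
print(sens_spec(weak, labels, 0.35))
```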
[13] A well-performing classifier’s ROC curve bows toward the upper-left corner of the plot, ideally approaching the shape of a hyperbola.

Classifier Performance (Panel 2.1)

| | Sensitivity | Specificity | AUC |
|---|---|---|---|
| Fuel Spending | 1.000 | 0.427 | 0.566 |
| Public Transport Spending | 0.813 | 0.843 | 0.857 |

2.4.2 Decision Tree Analysis
While the logit regression delivered some insight into the nature of modal choice, a decision tree was used to broaden the analysis of its contributing factors, mainly to look beyond the relationship between spending and mode choice. The tree model was estimated via classification and included data on respondent origin, destination, reason for travel and the number of mode changes, in addition to spending on fuel, spending on public transport and travel time, as determinants of mode choice. Chart 2.8 shows the critical paths of the classification tree:
Chart 2.8: Classification Tree
The paths can be summarized in the sequences shown in Table 2.6:
Fuel spending, public transport spending and respondent origin are selected as determinants in the tree. However, the model does not identify a particular range of fuel spending as determining mode choice, which corroborates the logit model’s finding that fuel spending is a weak classifier. Higher probabilities of public transport use are related to public transport spending of between EC$10 and EC$20. The tree model associates this higher probability with points of origin in Canaries, Cap Estate, Dennery, Laborie, Micoud, Soufriere and Vieux Fort.
For further insight, the tree model was resampled 500 times as a random forest procedure (Chart
2.9), revealing that the number of mode changes was also an important factor in mode selection.
Travel time, respondent destination and reason for travel were less important factors.
Chart 2.9: Random Forest Variable Importance
When the tree is re-estimated with superfluous variables dropped[14], public transport use is predicted with a probability of between 68 and 88 percent where public transport spending is between EC$0 and EC$20 and the number of stops is between 2 and 4, with points of origin now including Anse La Raye, Babonneau and Choiseul in addition to those mentioned previously. The resulting critical path is shown in Chart 2.10:
[14] Model formula (estimated in rpart): mode ~ pubtran_c+Origin+traveltime+mode_changes
Chart 2.10: Trimmed Tree
2.5 Transport Demand and Trip Generation
Establishing travel demand is important in determining whether consumers would choose a given type of travel service. A ‘trip-generation’ type analysis could be conducted based on socioeconomic factors, as well as factors rooted in economic geography, to benchmark the sentiments estimated in the consumer choice analysis. These and other pertinent facts will be used to estimate the flow of trips by region, as identifiable from the data in the MI survey. Raw data from the survey will be further examined using extensions of decision tree models, which can handle the non-numeric aspects of some of the survey data. To assess the size of the trip-generating effects rooted in economic geography, the origin and destination responses in the survey are counted. The responses are summarised in Table 2.7:
Table 2.7: Regional Destination and Origin Count[15]
The region with the largest number of origin responses is Gros Islet, and the region with the largest number of destination responses is Castries. Castries is also the second-largest point of origin, and Gros Islet the second-largest destination, suggesting that transit between the two regions generates most total trip activity. The ratio[16] of destination responses to origin responses, however, can provide a measure of the net flow of traffic among the regions and so identify the region generating the most gravity. These results are shown in Chart 2.11:

The chart shows that, based on the survey responses, only Castries has a positive net flow of traffic. Despite its sizeable origin and destination counts, Gros Islet still has a negative flow indicator, meaning that more surveyed trips start in Gros Islet than end there; given the dominance of the Castries-Gros Islet corridor, this suggests that more traffic leaves Gros Islet for Castries than vice versa. Castries is thus clearly the highest-gravity destination amongst the regions.
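The destination-to-origin ratio described above can be sketched with hypothetical counts (the actual Table 2.7 figures are not reproduced here). The report does not state whether Chart 2.11 plots the raw ratio, its logarithm or the ratio minus one; the logarithm is assumed here so that the sign matches the "positive" and "negative" flow language in the text.

```python
import math

# Hypothetical response counts for illustration only (not the Table 2.7 figures)
origin = {"Castries": 120, "Gros Islet": 150, "Soufriere": 30}
destination = {"Castries": 210, "Gros Islet": 60, "Soufriere": 30}


def flow_ratio(region: str) -> float:
    """Destination responses per origin response; above 1 means a net inflow."""
    return destination[region] / origin[region]


def net_flow(region: str) -> float:
    """Log of the ratio (an assumption), so inflows are positive and outflows negative."""
    return math.log(flow_ratio(region))


for region in origin:
    print(region, round(flow_ratio(region), 2), round(net_flow(region), 3))

# Every trip has one origin and one destination, so the totals must match
assert sum(origin.values()) == sum(destination.values())
```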
To analyse the factors driving trips into Castries, a decision tree model is employed. The choice of Castries as a destination is modelled as a binary variable against several features drawn from the survey questions[17]. Table 2.8 relates some findings about destination choice evident from the tree model.

[17] Including age, gender, travel time, reason for travel, frequency of traveling more than 10 miles, travel mode, fuel spending, public transport spending, number of stops, number of mode changes, willingness to pay for a P&R service to save 15 minutes of travel time, and preferred P&R wait times. These data were not transformed, hence the emphasis on the tree model rather than a logit model. The relevant model rules are presented in Table 2.8, as the actual tree is too large to be presented.
Table 2.8: Decision Rules for Destination Choice

| Destination choice | Origin | P&R use | Start time |
|---|---|---|---|
| Castries | Anse-La-Raye, Babonneau, Cap Estate, Castries, Choiseul, Dennery, Gros Islet, Micoud, Union | Yes | |
| Not Castries | Canaries, Laborie, Soufriere, Vieux Fort | No | 9-11 AM or Variable |

The results can be classified into rules that characterise the selection of Castries as a destination and rules that do not. Selection of Castries as a destination stems from origins such as Anse-La-Raye, Babonneau, Cap Estate, Castries itself, Choiseul, Dennery, Gros Islet, Micoud and Union. Not selecting Castries as a destination is related to journey origins in Canaries, Laborie, Soufriere and Vieux Fort. Interestingly, there is no overlap between these two groups. Another important result is that an affirmative response regarding use of the proposed Park and Ride facility characterised choosing Castries as a destination. The converse was also true: choosing a destination other than Castries was related to a rejection[18] of use of the Park and Ride facility.

Destination choices outside of Castries were also characterized by journey start times between 9 and 11 AM or start times that could vary. Further context for destination choice was provided by a random forest (Chart 2.12), resampled 500 times, which measured how important each variable was to destination choice. Point of origin remained the most important feature in the model, though other features, such as travel time, frequency of traveling more than 10 miles (16 kilometres), spending on fuel, number of people sharing a commute and mode changes, also emerge as important features in destination choice.

[18] Specifically, the choice of the answer ‘Nothing can make me use the service’ on the survey.
Chart 2.12: Variable Importance
2.6 Summary and Discussion
Regarding use choice of the P&R facility, only fuel costs and public transport costs are significant
at conventional levels when assessed via a logit model. However, fuel costs do not predict true
negatives efficiently, and public transport costs emerged as a superior classifier in this aspect.
The decision tree showed that characteristics predicting P&R use with above 90% probability are an age greater than 52 years, high fuel costs, and low public transport costs. However, the combination of conditions generating a 90% probability of choice occurs only 22% of the time. Non-use of the P&R facility is predicted by low fuel costs and short total travel time. In the random forest model, county of residence (origin), stop type, maximum wait time, morning commute time and number of people sharing a commute constituted the five most important factors determining use choice. These features explained 22.8 percent of the variation
of the choice to use the park and ride facility. Notably, fuel costs and spending on public transport,
the significant variables identified in the logit, were not estimated to be as important as these
features. An SVM analysis showed that female respondents over the age of 60 were the category
least likely to indicate they would utilize the proposed facility.
While spending on public transport and fuel intuitively predicts mode choice, the relationship is not very informative beyond what can be expected. Increasing the number of variables and allowing resampling via decision tree and random forest methods revealed that points of origin and the number of mode changes on a journey mattered most to mode choice. While fuel spending was selected as a determinant in the tree, the model does not identify a particular range of fuel spending as determining mode choice, corroborating the finding of the logit model regarding classification ability. Higher probabilities of public transport use are related to public transport spending of between EC$10 and EC$20. The tree model associates this higher probability with origins in Canaries, Cap Estate, Dennery, Laborie, Micoud, Soufriere and Vieux Fort. Re-estimating the tree with superfluous variables dropped shows that public transport use is predicted with a probability of between 68 and 88 percent where public transport spending is between EC$0 and EC$20 and the number of stops is between 2 and 4, with points of origin now including Anse La Raye, Babonneau and Choiseul in addition to those mentioned previously.
Only Castries generated a net inflow of traffic according to the survey responses, and selection of Castries as a destination stems from origins such as Anse-La-Raye, Babonneau, Cap Estate, Castries itself, Choiseul, Dennery, Gros Islet, Micoud and Union. Not selecting Castries as a destination is related to journey origins in Canaries, Laborie, Soufriere and Vieux Fort. Interestingly, there is no overlap between these two groups. Affirmative responses regarding use of the proposed Park and Ride facility characterised the choice of Castries as a destination. The converse was also true: choosing a destination other than Castries was related to a rejection of the use of the Park and Ride facility. Destination choices outside of Castries were also characterized by journey start times between 9 and 11 AM or start times that could vary. Point of origin remained the most important feature in the random forest model, though other features, such as travel time, frequency of traveling more than 10 miles (16 kilometres), spending on fuel, number of people sharing a commute and mode changes, also emerge as important features in destination choice.
Regarding use of the proposed facility, keeping public transport costs low relative to fuel costs matters for use choice. Targeting people with longer travel times and those who share their commute with many people, and ensuring low wait times, will make the policy intervention more likely to be used. To encourage public transport use generally, the focus would again be on keeping public transport costs low for individuals, especially those living outside of Castries. A clear division was evident between regions of origin that selected Castries as a destination and those that did not. Finally, respondents more likely to use the P&R facility were also more likely to select Castries as a destination.