04 ME Dissertation Presentation project on bus transportation.pptx


About This Presentation

project on bus transportation


Slide Content

Department of Information Technology. A Dissertation Seminar on “Bus Arrival Time Prediction Using Ensemble Technique”. Under the guidance of Dr. Satishkumar L. Varma (Supervisor, Information Technology). By Ninad Gaikwad (ME Information Technology).

Outline: Introduction, Literature Survey, System Architecture, System Interface, Results and Analysis, Applications, Conclusion & Future Work, References & Acknowledgement.

Introduction: Importance of transportation. Modes of transportation: road, rail, waterways, airways. Importance of road transportation. Increase in standard of living.

Introduction (Contd.): Vehicular pollution in cities. Vital role of bus transportation. Hesitation in opting for bus transportation. Uncertainty of bus arrival time. This dissertation uses machine learning techniques.

Machine Learning: A fast-growing branch of Artificial Intelligence. Programs that learn from the given data. Allows computers to learn automatically. Predicts outputs based on input factors.

Machine Learning (Contd.): Supervised algorithms apply learned experience to new data; they learn from a known (labelled) dataset and produce outputs for new inputs after training. Semi-supervised algorithms use both labelled and unlabelled data for training, with more unlabelled data, and require skilled intervention. Unsupervised algorithms learn from unlabelled data alone.

Motivation: Need to make the transport system more efficient. Provide an arrival-time facility for commuters. The expected arrival time is the output of the prediction.

Beneficiaries: Passengers who travel daily by bus can make informed decisions, and service failures are notified.

Literature Survey
Sr. No. | Authors | Observations and remarks
1 | Lovell D. J., 2001 [1] | Calculates the current speed of the bus; classifies buses as high, medium and low speed.
2 | S. I.-J. Chien et al., 2002 [2] | Uses an ANN to predict bus arrival time; uses the microscopic simulation model CORSIM.
3 | Dihua Sun et al., 2007 [3] | The route direction of the bus is predicted.
4 | S. Gaonkar et al., 2008 [4] | Involves people’s participation; developed an application called Micro-Blog.
5 | Amir Saffari et al., 2009 [5] | Covers an online implementation of the random forest model; proposes an online decision-tree growing procedure.
6 | Simon Bernard et al., 2009 [6] | Makes the random forest more accurate by selecting the optimal number of decision trees built by the forest.

Literature Survey (Contd.)
7 | Huan Xu et al., 2010 [7] | Discusses the robustness of the lasso regression technique; lasso does not give protection from noise.
8 | G. Agamennoni et al., 2011 [8] | Digital maps help in finding the direction of the bus; extracts the principal paths used by the buses.
9 | Feng Li et al., 2011 [9] | Considers detailed information relating to a particular traffic route; the system is too complicated to be implemented.
10 | Biagioni, James et al., 2011 [10] | The proprietary software EasyTracker is developed.
11 | M. A. Hannan et al., 2012 [11] | GPS, GPRS and RFID are combined to track the vehicle location; high detection accuracy; costly implementation.
12 | Paola Arce et al., 2012 [12] | An online facility to use ridge regression is applied.
13 | Mohammed S. Alam et al., 2013 [13] | The random forest method is used to detect Android malware.

Literature Survey (Contd.)
14 | Yidan Fan et al., 2014 [14] | Cell-tower positioning is used to detect the location of the bus; lower power consumption; requires a tie-up with network providers.
15 | Pengfei Zhou et al., 2014 [15] | Participatory sensing is used; cheaper implementation at the server side; users have to invest their own data to contribute.
16 | Lei Wang et al., 2014 [16] | Uses an ANN to detect the location of the bus; higher accuracy; costly implementation.
17 | Jinrong He et al., 2014 [17] | Extends the ridge regression classification method to a kernel version.
18 | Luis G. Jaimes et al., 2015 [18] | A cheaper solution to the bus arrival prediction problem; suffers heavily if users decide not to cooperate.
19 | B. Dhivyabharathi et al., 2016 [19] | Detailed survey of Indian traffic conditions; no application developed.

Literature Survey (Contd.)
20 | Tianqi Chen et al., 2016 [20] | Introduces the XGBoost algorithm; compares its performance with ridge, random forest, etc.
21 | Ferran Diego et al., 2016 [21] | Performs poorly as compared to other algorithms.
22 | Muthukrishnan R. et al., 2016 [22] | Lasso regression does not perform well on its own; performance is not enhanced.
23 | Gabriel B. Kalejaiye et al., 2017 [23] | Cheaper implementation, but suffers if users decide not to participate; GPS data is not used to track the bus in real time.
24 | Xiaobo Liu et al., 2017 [24] | The stacking algorithm can perform predictions on its own; the results obtained are better if other algorithms are used.

Technologies Used By Authors
Parameters considered: Participatory Sensing Used, GPS Used, App Developed, Cost Effective
Feng Li et al., 2011 [9]: ✓ ✓
James Biagioni et al., 2011 [10]: ✓ ✓
M. A. Hannan et al., 2012 [11]: ✓ ✓
Yidan Fan et al., 2014 [14]: ✓
Pengfei Zhou et al., 2014 [15]: ✓ ✓ ✓
Luis G. Jaimes et al., 2015 [18]: ✓ ✓
Gabriel B. Kalejaiye et al., 2017 [23]: ✓ ✓ ✓

List Of Parameters Used
Paper | Technique | Dataset | Metrics | Parameters
Steven I-Jy Chien et al., 2002 [2] | ANN | New Jersey Transit Corporation | RMSE | Stop-to-stop distance, number of intersections, simulated travel time
Dihua Sun et al., 2007 [3] | Different algorithms | Chongqing, China | MAPE | GPS coordinates of the bus
Feng Li et al., 2011 [9] | Statistical approach | Hong Kong City | MAE | Departure time, work day, bus location, number of links, number of intersections, passenger demand
Yidan Fan et al., 2014 [14] | Cell of origin (COO) | Beijing, China | MAPE | Cell tower location
Pengfei Zhou et al., 2014 [15] | Participatory sensing | Singapore public buses | Median absolute error | Cell tower signals, movement statuses, audio recordings

Inference of the Literature Review: RFID: most papers introduced here use RFID; the financial investment it requires should be properly considered by the reader. WSN: these systems are also highly costly. Participatory sensing: passengers are expected to share their location details and may decide not to participate in the process.

Inference of the Literature Review (Contd.): Machine learning: several papers discuss the advantages of machine learning. Ensemble: combining two or more machine learning algorithms. A negligible amount of work has been done on bus arrival time prediction using an ensemble technique.

System Architecture: The system consists of two modules, a passenger module and a server module. The server module contains the processing part; the passenger module is responsible for sending data to and receiving data from the server.

Ensemble Learning: The process of combining two or more regression models. In this dissertation the ensemble method is used to combine Lasso, Ridge, Random Forest, XGBoost and Gradient Boosting regression. Predictions made by lasso and ridge regression are fed into the meta-regressor; random forest is used as the meta-regressor, and the prediction obtained from it is the final output.
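
The stacking arrangement described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a scikit-learn style implementation; the placeholder data, alpha values and tree count are not the dissertation's actual settings.

```python
# Minimal sketch of the stacking ensemble described above (assumed
# scikit-learn implementation; hyperparameters are illustrative only).
from sklearn.linear_model import Lasso, Ridge
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

# Placeholder data standing in for the cleaned bus dataset (11 numeric fields).
X, y = make_regression(n_samples=1000, n_features=11, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base regressors whose predictions are fed to the meta-regressor.
base_models = [("lasso", Lasso(alpha=1.0)), ("ridge", Ridge(alpha=1.0))]

# Random forest acts as the meta-regressor that combines the base predictions.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=RandomForestRegressor(n_estimators=100, random_state=0),
)
stack.fit(X_train, y_train)
print("R2 on held-out data:", stack.score(X_test, y_test))
```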

Ensemble Learning (Contd.)

Data Processing Cycle: The data that is used has to go through a data processing cycle. Collection provides the data for input. Preparation includes filtering and grouping of the data.

Data Processing Cycle (Contd.): Input: feeding of data for processing. Processing: converting raw data into a form usable for predictions. Output and interpretation: presentation of data in a form interpretable by the user. Storage: data is stored into files after being processed.

Machine Learning Terminology: Entropy: a measure of randomness or unpredictability in a dataset. Information gain: the decrease in entropy after the dataset is split. Leaf node: an end node of the decision tree. Decision node: a node which decides whether data goes into one branch or the other.

Machine Learning Terminology (Contd.): Root node: the starting node of the decision tree. Decision tree: a data structure with conditions at its decision nodes and subsets of the dataset at its leaf nodes.

Process Of Formation Of Decision Tree
Sample Set S
Sr. No | Direction | Source | Destination | Vehicle Type | Vehicle Location | Arrival
1 | UP | Mumbai | Chennai | AC | Thane | On Time
2 | DOWN | Delhi | Bangalore | Express | Kalyan | On Time
3 | UP | Mumbai | Bangalore | Express | Nerul | On Time
4 | UP | Delhi | Hyderabad | NonAC | Thane | Late
5 | DOWN | Mumbai | Chennai | AC | Kalyan | Late
6 | UP | Kolkata | Madgaon | Ordinary | Nerul | Late

Process Of Formation Of Decision Tree (Contd.)
Step 1: We have an example set named “S”. The given set has 6 records, with 3 “On Time” and 3 “Late”.
Entropy(S) = -(3/6) log2 (3/6) - (3/6) log2 (3/6) = -(0.5) log2 (0.5) - (0.5) log2 (0.5) = -(0.5 * (-1)) - (0.5 * (-1)) = 0.5 + 0.5 = 1

Process Of Formation Of Decision Tree (Contd.)
Step 2: The attribute taken is “Direction”, which has two values, viz. UP and DOWN.
Occurrence of UP = 4/6 = 0.666; Occurrence of DOWN = 2/6 = 0.333
P(On Time | UP) = 2/4 = 0.5; P(Late | UP) = 2/4 = 0.5
P(On Time | DOWN) = 1/2 = 0.5; P(Late | DOWN) = 1/2 = 0.5

Process Of Formation Of Decision Tree (Contd.)
Entropy(S_UP) = -(0.5) log2 (0.5) - (0.5) log2 (0.5) = -(0.5 * (-1)) - (0.5 * (-1)) = 1
Entropy(S_DOWN) = -(0.5) log2 (0.5) - (0.5) log2 (0.5) = -(0.5 * (-1)) - (0.5 * (-1)) = 1
Therefore, Information Gain(S, Direction) = Entropy(S) - (4/6) * Entropy(S_UP) - (2/6) * Entropy(S_DOWN) = 1 - (4/6) * 1 - (2/6) * 1 = 1 - 0.666 - 0.333 = 0.001 (approximately 0)

Process Of Formation Of Decision Tree (Contd.)
Results of Gain in S
Entropy(S) = 1
Gain(S, Direction) = 0.001
Gain(S, Source) = 0.21
Gain(S, Destination) = 0.667
Gain(S, Vehicle Type) = 0.667
Gain(S, Vehicle Location) = 0.001
Since Gain(S, Vehicle Type) ties with Gain(S, Destination) for the highest value, Vehicle Type is chosen as the attribute for the first split.
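
The entropy and information-gain arithmetic of this worked example can be reproduced with a short script. The rows mirror sample set S above; the helper function names (entropy, information_gain) are illustrative, not part of the dissertation.

```python
# Reproduces the entropy / information-gain arithmetic of the worked example.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels, e.g. ['On Time', 'Late', ...]."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v)."""
    total = len(labels)
    gain = entropy(labels)
    for v in set(row[attribute_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute_index] == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Sample set S: (Direction, Source, Destination, Vehicle Type, Vehicle Location)
rows = [("UP", "Mumbai", "Chennai", "AC", "Thane"),
        ("DOWN", "Delhi", "Bangalore", "Express", "Kalyan"),
        ("UP", "Mumbai", "Bangalore", "Express", "Nerul"),
        ("UP", "Delhi", "Hyderabad", "NonAC", "Thane"),
        ("DOWN", "Mumbai", "Chennai", "AC", "Kalyan"),
        ("UP", "Kolkata", "Madgaon", "Ordinary", "Nerul")]
labels = ["On Time", "On Time", "On Time", "Late", "Late", "Late"]

print("Entropy(S) =", entropy(labels))                               # 1.0
print("Gain(S, Direction) =", information_gain(rows, labels, 0))     # ~0.0 (slide: 0.001)
print("Gain(S, Vehicle Type) =", information_gain(rows, labels, 3))  # ~0.667
```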

Process Of Formation Of Decision Tree (Contd.)
Step 3: Now we have an example set “S1” for Vehicle Type “AC”.
Sr. No | Direction | Source | Destination | Vehicle Location | Arrival
1 | UP | Mumbai | Chennai | Thane | On Time
2 | DOWN | Mumbai | Chennai | Kalyan | Late
Results of Gain in S1
Entropy(S1) = 1
Gain(S1, Direction) = 1
Gain(S1, Source) = 0
Gain(S1, Destination) = 0
Gain(S1, Vehicle Location) = 1

Process Of Formation Of Decision Tree (Contd.)
Set “S2” for Vehicle Type “Express”
Sr. No | Direction | Source | Destination | Vehicle Location | Arrival
1 | DOWN | Delhi | Bangalore | Kalyan | On Time
2 | UP | Mumbai | Bangalore | Nerul | On Time

Process Of Formation Of Decision Tree (Contd.)
Set “S3” for Vehicle Type “NonAC”
Sr. No | Direction | Source | Destination | Vehicle Location | Arrival
1 | UP | Delhi | Hyderabad | Thane | Late
Set “S4” for Vehicle Type “Ordinary”
Sr. No | Direction | Source | Destination | Vehicle Location | Arrival
1 | UP | Kolkata | Madgaon | Nerul | Late
It is observed that when Vehicle Type = “Express”, “NonAC” or “Ordinary”, the outcome is always On Time, Late and Late respectively. Therefore, there is no need to compute the information gain, since the entropy of each of these subsets is zero.

Process Of Formation Of Decision Tree (Contd.): The Final Decision Tree

Linear Regression: A technique in which a linear model is used to codify the relationship between the independent and dependent variables. For two-dimensional data the relationship is of the form y = m * x + c, where y is the prediction parameter, x is the independent parameter, m is the slope of the line and c is the coefficient (intercept).

Linear Regression (Contd.): Points are scattered in the space; linear regression finds a line passing through them. Example: rainfall plotted on the x-axis, crop yield plotted on the y-axis.

Working Of Linear Regression Models
Equation for prediction: p(i) = B0 + B1 * x(i), where
p(i) = prediction for the i-th instance
x = the input parameter
B0 = the Y intercept
B1 = the slope of the line
Each training instance is taken one at a time; the model makes a prediction for each instance and the error is calculated for each prediction. The next predictions take the previous errors into account.

Working Of Linear Regression Models (Contd.)
Weights are updated as follows: w(t+1) = w(t) - α * δ * x, where
w = coefficient or weight
α = learning rate
δ = error
The error for a prediction is δ = p(i) - y(i), where
p(i) = prediction for the i-th instance
y(i) = i-th output variable

Working Of Linear Regression Models (Contd.)
Update for the coefficient B0: B0(t+1) = B0(t) - α * δ, where
B0(t+1) = updated coefficient
B0(t) = current coefficient
Update for the coefficient B1: B1(t+1) = B1(t) - α * δ * x, where
B1(t+1) = updated coefficient
B1(t) = current coefficient

Working Of Linear Regression Models (Contd.)
Consider the following dataset, where x is the input parameter and y is the prediction value:
x | y
1 | 1
2 | 3
4 | 3
3 | 2
5 | 5

Working Of Linear Regression Models (Contd.): A line can pass through the centre of the points, representing the relation between them.

Working Of Linear Regression Models (Contd.)
Assuming values of 0.0 for both coefficients and taking x = 1, y = 1:
Prediction: p(i) = B0 + B1 * x = 0.0 + 0.0 * 1 = 0
Error: δ = 0 - 1 = -1
Assuming a learning rate α = 0.01:
B0(t+1) = 0.0 - 0.01 * (-1.0) = 0.01
B1(t+1) = 0.0 - 0.01 * (-1) * 1 = 0.01

Working Of Linear Regression Models (Contd.)
Iteration | Epoch | x | y | B0(t+1) | B1(t+1) | P(i) | δ
1 | 1 | 1 | 1 | 0.01 | 0.01 | 0.02 | -1
2 | 1 | 2 | 3 | 0.02 | 0.02 | 0.06 | -2.9
3 | 1 | 4 | 3 | 0.049 | 0.078 | 0.36 | -2.6
4 | 1 | 3 | 2 | 0.075 | 0.182 | 0.62 | -1.4
5 | 1 | 5 | 5 | 0.089 | 0.224 | 1.21 | -3.8
6 | 2 | 1 | 1 | 0.127 | 0.414 | 0.54 | -0.5
7 | 2 | 2 | 3 | 0.132 | 0.419 | 0.97 | -2
8 | 2 | 4 | 3 | 0.152 | 0.459 | 1.99 | -1
9 | 2 | 3 | 2 | 0.162 | 0.499 | 1.66 | -0.3
10 | 2 | 5 | 5 | 0.165 | 0.508 | 2.71 | -2.3

Working Of Linear Regression Models (Contd.)
Iteration | Epoch | x | y | B0(t+1) | B1(t+1) | P(i) | δ
11 | 3 | 1 | 1 | 0.188 | 0.623 | 0.81 | -0.2
12 | 3 | 2 | 3 | 0.19 | 0.625 | 1.44 | -1.6
13 | 3 | 4 | 3 | 0.206 | 0.657 | 2.83 | -0.2
14 | 3 | 3 | 2 | 0.208 | 0.665 | 2.2 | 0.2
15 | 3 | 5 | 5 | 0.206 | 0.659 | 3.5 | -1.5
16 | 4 | 1 | 1 | 0.221 | 0.734 | 0.96 |
17 | 4 | 2 | 3 | 0.221 | 0.734 | 1.69 | -1.3
18 | 4 | 4 | 3 | 0.234 | 0.76 | 3.27 | 0.3
19 | 4 | 3 | 2 | 0.231 | 0.748 | 2.48 | 0.5
20 | 4 | 5 | 5 | 0.226 | 0.733 | 3.89 | -1.1
The final coefficients have the values B0 = 0.226 and B1 = 0.733.
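
The updates tabulated above apply the rule B(t+1) = B(t) - α * δ * x one instance at a time. A small script reproducing them is sketched below as an illustration of the update rule only; its exact numbers differ slightly from the table because the hand-worked slide rounds at every step and reports the coefficients just before the last update.

```python
# Stochastic gradient descent for simple linear regression, following the
# update rule of the worked example (alpha = 0.01, 4 epochs over 5 points).
data = [(1, 1), (2, 3), (4, 3), (3, 2), (5, 5)]  # (x, y) pairs from the slide
b0, b1, alpha = 0.0, 0.0, 0.01

for epoch in range(4):
    for x, y in data:
        p = b0 + b1 * x          # prediction for this instance
        delta = p - y            # prediction error
        b0 -= alpha * delta      # B0(t+1) = B0(t) - alpha * delta
        b1 -= alpha * delta * x  # B1(t+1) = B1(t) - alpha * delta * x

# Prints about 0.231 and 0.790; the slide reports 0.226 and 0.733, i.e. the
# state just before the final update, computed with per-step rounding.
print(round(b0, 3), round(b1, 3))
```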

Working Of Linear Regression Models (Contd.)

Working Of Linear Regression Models (Contd.)
Using the values B0 = 0.226 and B1 = 0.733 in our predictions we get:
x | y | Prediction
1 | 1 | 0.959
2 | 3 | 1.692
4 | 3 | 3.158
3 | 2 | 2.425
5 | 5 | 3.891

Working Of Linear Regression Models (Contd.): Plotting the predicted and actual values

Multiple Linear Regression: A model where more than one input parameter is used. Example: rainfall along the x-axis and fertilizer along the y-axis; the predicted value will lie on a plane contained in the three-dimensional space. When the features exceed three, the prediction lies on a hyperplane with n dimensions.

Normalization: To avoid the over-fitting problem, regularization (normalization) assigns weights to the features so that no single feature overrides the importance of the others. Zero weights are assigned to features that are not important, and every feature gets a fair share of the prediction opportunity. The loss function should be minimized; two types of loss functions are used, L1 and L2.

Lasso Regression: Uses the L1 type of loss function to minimize the error. The L1 loss function is given by L1 = Σ |Y - f(x)|, i.e. it takes the absolute difference between Y and f(x). It can give more than one solution. Regression that uses the L1 loss function is called L1 regression or lasso regression.

Ridge Regression: Uses the L2 type of loss function to minimize the error. The L2 loss function is given by L2 = Σ (Y - f(x))^2, i.e. it takes the square of the difference between Y and f(x). It gives a single solution. Regression that uses the L2 loss function is called L2 regression or ridge regression.
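
A minimal sketch contrasting the two penalties, assuming scikit-learn's Lasso and Ridge estimators; the synthetic data and alpha values are illustrative, not the dissertation's settings.

```python
# L1 (Lasso) vs L2 (Ridge) regularised regression on placeholder data.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features actually matter for the target.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives unhelpful weights to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all weights but keeps them non-zero

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
```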

Random Forest Regression: Many decision trees are used, and each decision tree sees some of the attributes from the dataset. Consider the dataset mta_1706, which has 17 features and about thirty lakh usable data points. What the random forest does: it takes, say, five features and twenty thousand samples (random samples drawn with replacement), constructs a decision tree on this data, and repeats until it has sufficient trees covering the data points in the dataset.
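
The sampling idea described above can be sketched with scikit-learn's RandomForestRegressor. The synthetic data, feature count and sample fraction below are illustrative stand-ins for the mta_1706 figures mentioned in the slide, not the dissertation's configuration.

```python
# Sketch of the random-forest idea: many trees, each grown on a bootstrap
# sample and considering a random subset of features at every split.
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=20000, n_features=17, noise=5.0, random_state=0)

forest = RandomForestRegressor(
    n_estimators=100,   # number of decision trees in the forest
    max_features=5,     # each split considers a random subset of 5 of the 17 features
    bootstrap=True,     # each tree sees a bootstrap sample (drawn with replacement)...
    max_samples=0.5,    # ...of half of the training rows
    random_state=0,
)
forest.fit(X, y)
# The forest prediction is the average of the individual tree predictions.
print("Averaged prediction for the first row:", forest.predict(X[:1]))
```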

Random Forest Regression (Contd.)

Gradient Boosting: An ensemble of weak prediction models is prepared; the models are decision trees. A weak model of prediction is prepared first and continuous improvements are applied to it; new and improved models replace the existing ones, and weaker models are discarded at each stage. Accuracy improves in an incremental fashion.

Gradient Boosting (Contd.): We can specify the number of models that will be produced; a larger number of models generally means higher accuracy.

XG Boosting: XGBoost stands for Extreme Gradient Boosting, an improvement over the gradient boosting algorithm that pushes its performance boundaries. New features added by the XGBoost algorithm: parallelization, distributed computing, out-of-core computing and cache optimization.
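
A rough sketch comparing plain gradient boosting with XGBoost, assuming scikit-learn and the xgboost package are available; the estimator counts, learning rates and synthetic data are illustrative only.

```python
# Gradient boosting (scikit-learn) vs XGBoost on placeholder data.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from xgboost import XGBRegressor  # assumes the xgboost package is installed

X, y = make_regression(n_samples=5000, n_features=11, noise=5.0, random_state=0)

gb = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=0)
xgb = XGBRegressor(n_estimators=200, learning_rate=0.1, n_jobs=-1)  # n_jobs: parallel tree building

gb.fit(X, y)
xgb.fit(X, y)
print("GB prediction:", gb.predict(X[:1]), "XGB prediction:", xgb.predict(X[:1]))
```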

System Interface: Main Menu

Operations On Fields: Remove More Than 17 Fields

Drop Rows With Empty Data Values

Convert Scheduled Arrival Time To Valid Format

Convert Recorded Time To Seconds

Convert Expected Time To Seconds

Convert Scheduled Time To Seconds

Deleting Extra Columns

Fit Data Into Models

Save Model To Disk

Start Server Using Selected Model

Working Of Client: Configure Server Details

Data Input by User

Prediction Made by Server

Dataset Description: The data set is from the website kaggle.com and is 1.3 GB in size. A CSV file is downloaded; the name of the file used is mta_1706. There are 17 fields and 67,30,856 records. Latitude and longitude values are of type float, distance is of type integer, the time fields are of type datetime, and the rest are of type text.

Dataset Description (Contd.): After cleaning, the dataset is written to the file mta_1706_clean. It has 58,04,362 records and 11 fields. All the values in the cleaned data set are either integer or float.
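
A pandas sketch of the cleaning steps listed in the System Interface slides. The column names (RecordedAtTime, ExpectedArrivalTime, ScheduledArrivalTime) and the .csv extension are assumptions about the Kaggle file, not confirmed by the slides.

```python
# Sketch of the cleaning pipeline: drop empty rows, convert time fields to
# seconds, and keep only numeric columns for the regression models.
import pandas as pd

df = pd.read_csv("mta_1706.csv")

# Drop rows with empty data values.
df = df.dropna()

def to_seconds(series):
    """Convert a timestamp column to seconds since midnight."""
    t = pd.to_datetime(series, errors="coerce")
    return t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second

# Hypothetical time columns; the actual field names may differ.
for col in ["RecordedAtTime", "ExpectedArrivalTime", "ScheduledArrivalTime"]:
    df[col] = to_seconds(df[col])

# Drop rows whose timestamps could not be parsed, then keep only numeric fields.
df = df.dropna().select_dtypes(include="number")

df.to_csv("mta_1706_clean.csv", index=False)
```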

Performance Measures: Bus arrival time prediction is a regression task, so we use the most common regression metrics: variance, mean absolute error, mean squared error, mean squared log error, median absolute error and R2 score.

Variance: A measure of the deviation of a random variable from its mean; it measures how far the samples are spread from their average value. Zero indicates that there is no variability in the values. Formula: Variance = (1/N) * Σ (Xi - Xmean)^2, where Xmean is the average value of the sample, Xi are the individual values and N is the total number of samples. The variance among the samples is high.

Mean Absolute Error: A measure of the difference between two continuous variables; MAE is the average of the absolute differences between the observed and predicted values. Formula: MAE = (1/n) * Σ |yi - pi|, where yi is the observed value, pi is the predicted value and n is the number of samples.

Mean Absolute Error (Contd.)

Mean Squared Error: MSE is the average of the squared differences between the observed and predicted values. Formula: MSE = (1/n) * Σ (yi - pi)^2, where yi is the observed value, pi is the predicted value and n is the number of samples.

Mean Squared Error (Contd.)

Mean Squared Log Error: Based on the difference between the logarithms of the observed and predicted values. Formula: MSLE = (1/N) * Σ (log(1 + yi) - log(1 + pi))^2, where N is the total number of samples, yi are the observed values and pi are the predicted values.

Mean Squared Log Error (Contd.)

Median Absolute Error: The median of the absolute differences between the observed and predicted values; it has the advantage of being less affected by noise. Formula: MedAE = median(|yi - pi|), where yi is the observed value and pi the predicted value.

Median Absolute Error (Contd.)

R2 Score: Known as the coefficient of determination; the proportion of the dependent variable that can be determined from the independent variables. It provides a measure of how well the observed outcomes are reflected in the predicted outcomes. The total sum of squares is given by SS_tot = Σ (yi - ymean)^2, where ymean is the mean of the observed data and yi are the individual observed values.

R2 Score (Contd.): The residual sum of squares is given by SS_res = Σ (yi - pi)^2, where yi are the observed values and pi the predicted values. The coefficient of determination is then R2 = 1 - SS_res / SS_tot. The R2 value is 1 for all the algorithms, so it can be inferred that the independent factors are represented very well in the predicted values.
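
All of the measures above are available in scikit-learn (and variance in numpy); a small sketch with placeholder numbers is given below. The arrays are illustrative only, not the dissertation's results.

```python
# Computing the listed performance measures with numpy and scikit-learn.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, median_absolute_error,
                             r2_score)

y_true = np.array([120.0, 300.0, 240.0, 600.0, 180.0])  # observed arrival times (s), placeholder
y_pred = np.array([130.0, 280.0, 250.0, 590.0, 200.0])  # predicted arrival times (s), placeholder

print("Variance of observations:", np.var(y_true))
print("MAE  :", mean_absolute_error(y_true, y_pred))
print("MSE  :", mean_squared_error(y_true, y_pred))
print("MSLE :", mean_squared_log_error(y_true, y_pred))
print("MedAE:", median_absolute_error(y_true, y_pred))
print("R2   :", r2_score(y_true, y_pred))
```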

Applications: The software has the following applications. Bus arrival time: the arrival time of the bus is otherwise unknown, which creates anxiety among daily commuters; this application helps reduce that anxiety. Promote public transportation: with bus arrival times no longer a mystery, the number of people travelling per unit of fuel increases and the air becomes cleaner.

Applications (Contd.): Deciding on important appointments: a passenger can find out whether he/she will be on time or late for a meeting, make desirable changes to the schedule, or opt for a faster means of transport. Sharing information: when a teenager is travelling by bus, the parents can keep track of where the bus is at a particular time; this can also be used to enhance the security of women and children.

Applications (Contd.): Other modes of transport: the application developed here can also be used by other means of transport, such as logistics companies, to keep track of where shipments are at a particular time and to inform recipients of the whereabouts of their packages.

Contributions: The standard MTA dataset was cleaned and more than 50 lakh records were used in the experiments for result analysis. Five machine learning algorithms were implemented: Random Forest, Lasso, Ridge, Gradient Boosting and XGBoost. Stacking (ensemble) was used, and 31 combinations of algorithms were experimented with and analysed. Six performance metrics were used for the analysis. An Android application was developed along with a web server.

Conclusion: Prediction: the MSE of Random Forest is the minimum among all 31 combinations; the ensemble of RF, Lasso and GB gives the second-best result; Lasso, Ridge and their combination give poor performance; XGBoost gives average predictions. Time: Lasso and Ridge take the minimum time; the boosting methods and their combinations take the maximum time.

Future Scope: A full-scale system with live data feeds to the server can be developed in the future. These feeds can be stored in a database for learning purposes. This will increase the usage of the bus transportation system.

References
[1] Lovell D. J., "Accuracy of speed measurements from cellular phone vehicle location systems", Journal of Intelligent Transportation Systems, 6(4):303-325, 2001.
[2] S. I.-J. Chien, Y. Ding, and C. Wei, "Dynamic bus arrival time prediction with artificial neural networks", Journal of Transportation Engineering, 128(5):429-438, 2002.
[3] Dihua Sun, Hong Luo, Liping Fu, Weining Liu, Xiaoyong Liao, and Min Zhao, "Predicting Bus Arrival Time on the Basis of Global Positioning System Data", Transportation Research Record: Journal of the Transportation Research Board, No. 2034, Transportation Research Board of the National Academies, Washington, D.C., pp. 62-72, 2007.
[4] S. Gaonkar, J. Li, R. R. Choudhury, L. Cox, and A. Schmidt, "Micro-Blog: sharing and querying content through mobile phones and social participation", in Proceedings of ACM MobiSys, pp. 174-186, 2008.
[5] Amir Saffari, Christian Leistner, Jakob Santner, Martin Godec, Horst Bischof, "On-line Random Forests", 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pp. 1393-1400, 2009.
[6] Simon Bernard, Laurent Heutte and Sebastien Adam, "On the Selection of Decision Trees in Random Forests", Proceedings of the International Joint Conference on Neural Networks, Atlanta, Georgia, USA, June 14-19, pp. 302-307, 2009.

References (Contd.)
[7] Huan Xu, Constantine Caramanis, Shie Mannor, "Robust Regression and Lasso", IEEE Transactions on Information Theory, Vol. 56, No. 7, pp. 3561-3574, July 2010.
[8] G. Agamennoni, J. Nieto, and E. Nebot, "Robust inference of principal road paths for intelligent transportation systems", IEEE Transactions on Intelligent Transportation Systems, 12(1):298-308, March 2011.
[9] Feng Li, Yuan Yu, HongBin Lin, WanLi Min, "Public bus arrival time prediction based on traffic information management system", Proceedings of the 2011 IEEE International Conference on Service Operations, Logistics and Informatics, pp. 336-341, 2011.
[10] Biagioni, James, Gerlich, Tomas, Merrifield, Timothy, Eriksson, Jakob, "EasyTracker: Automatic Transit Tracking, Mapping, and Arrival Time Prediction Using Smartphones", SenSys 2011 - Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems, pp. 68-81, 2011.
[11] M. A. Hannan, A. M. Mustapha, A. Hussain and H. Basri, "Intelligent Bus Monitoring and Management", Proceedings of the World Congress on Engineering and Computer Science 2012, Vol. II, pp. 1049-1054, 2012.
[12] Paola Arce, Luis Salinas, "Online Ridge Regression method using sliding windows", 2012 31st International Conference of the Chilean Computer Science Society, pp. 87-90, 2012.

References (Contd.)
[13] Mohammed S. Alam, Son T. Vuong, "Random Forest Classification for Detecting Android Malware", 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing, pp. 663-669, 2013.
[14] Yidan Fan, Kun Niu, Nanjie Deng, "A real-time bus arrival prediction method based on energy-efficient cell-tower positioning", 2014 IEEE 3rd International Conference on Cloud Computing and Intelligence Systems, pp. 717-721, 2014.
[15] Pengfei Zhou, Yuanqing Zheng, Mo Li, "How Long to Wait? Predicting Bus Arrival Time With Mobile Phone Based Participatory Sensing", IEEE Transactions on Mobile Computing, Vol. 13, Issue 6, pp. 1228-1241, 2014.
[16] Lei Wang, Zhongyi Zuo, and Junhao Fu, "Bus Arrival Time Prediction Using RBF Neural Networks Adjusted by Online Data", Procedia - Social and Behavioral Sciences, 138, pp. 67-75, 2014.
[17] Jinrong He, Lixin Ding, Lei Jiang and Ling Ma, "Kernel Ridge Regression Classification", 2014 International Joint Conference on Neural Networks (IJCNN), pp. 2263-2267, 2014.
[18] Luis G. Jaimes, Idalides J. Vergara-Laurens, Andrew Raij, "A Survey of Incentive Techniques for Mobile Crowd Sensing", IEEE Internet of Things Journal, Vol. 2, Issue 5, pp. 370-380, 2015.

References (Contd.)
[19] B. Dhivyabharathi, B. Anil Kumar, Lelitha Vanajakshi, "Real time bus arrival time prediction system under Indian traffic condition", 2016 IEEE International Conference on Intelligent Transportation Engineering (ICITE), pp. 18-22, 2016.
[20] Tianqi Chen, Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", KDD '16, August 13-17, 2016, San Francisco, CA, USA, pp. 785-794, 2016.
[21] Ferran Diego, Fred A. Hamprecht, "Structured Regression Gradient Boosting", 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1459-1467, 2016.
[22] Muthukrishnan R, Rohini R, "LASSO: A Feature Selection Technique In Predictive Modeling For Machine Learning", 2016 IEEE International Conference on Advances in Computer Applications (ICACA), pp. 18-20, 2016.
[23] Gabriel B. Kalejaiye, Henrique R. Orefice, Teogenes A. Moura, "Poster Abstract: Frugal Crowd Sensing for Bus Arrival Time Prediction in Developing Regions", 2017 IEEE Second International Conference on Internet-of-Things Design and Implementation (IoTDI), pp. 355-356, 2017.
[24] Xiaobo Liu, Zhentao Liu, Guangjun Wang, Zhihua Cai, Harry Zhang, "Ensemble Transfer Learning Algorithm", Special Section on Advanced Data Analytics for Large-Scale Complex Data Environments, pp. 2389-2396, 2017.
[25] Michael Stone, "New York City Bus Data", live data recorded from NYC buses (version 4), available online: https://www.kaggle.com/stoney71/new-york-city-transport-statistics, accessed on 5 May 2018.

List of Publications
[1] Ninad Gaikwad and Satishkumar Varma, "Analysis of Bus Tracking and Arrival Time Prediction Techniques", 2nd International Conference on Advanced Trends in Engineering (ICATE-2K18), 6 April 2018.
[2] Ninad Gaikwad and Satishkumar Varma, "Performance Analysis of Bus Arrival Time Prediction Using Machine Learning Based Ensemble Technique", Conference on Technologies for Future Cities, 8-9 January 2019 [Communicated].

Acknowledgements
Dr. Satishkumar L. Varma (Supervisor, Department of Information Technology, PCE New Panvel)
Dr. Sharvari Govilkar (HOD, Department of Information Technology, PCE New Panvel)
Dr. Madhumita Chatterjee (HOD, Department of Computer Engg., PCE New Panvel)
Dr. Sandeep M. Joshi (Principal, PCE New Panvel)
All faculty of PCE, family, guardians and friends

Thank You