Remaining Useful Life Prediction for Experimental Filtration System: A Data Challenge
About This Presentation
Abstract: Maintenance costs of industrial systems often exceed the initial investment cost. Predictive maintenance, which analyzes the health of the system and suggests maintenance planning, is one of the strategies implemented to reduce maintenance costs. Health status and life estimation of the machinery are the most researched topics in this context. In this paper, we present our analysis for Fifth European Conference of the Prognostics and Health Management Society 2020 Data Challenge, which introduces an experimental filtration system for different experiment setups, and asks for remaining useful life predictions. We compared random forest, gradient boosting, and Gaussian process regression algorithms to predict the useful life of the experimental system. With the help of a new fault-based piecewise linear RUL assignment strategy, our gradient boosting based solution has been ranked 3rd in the data challenge.
https://doi.org/10.36001/phme.2020.v5i1.1317
Slide Content
1/36
Remaining Useful Life Prediction for
Experimental Filtration
System: A Data Challenge
Team GTU
Kürşat İnce & Engin Sirkeci
{kince, esirkeci}@gtu.edu.tr
2/36
Outline
•Data Challenge Info
•Our approach to Data Challenge
•Results & Discussion
3/36
Prognostics and Health Management
•Prognostics is the process of predicting the future reliability of
a system by assessing degradation of the system from its
normal operating conditions.
•Health Management is the process of measuring, recording,
and monitoring of the system status and operating conditions
in real time.
•Prognostics and Health Management (PHM) methodologies
enable us to monitor the health state of the systems and
dynamically update the health state based on these
measurements.
4/36
PHME20 Data Challenge
•System Under Investigation: An experimental filtration
system which is subject to clogging
•Objective: Create prognostic models to estimate Remaining
Useful Life (RUL) of the filtration system.
5/36
Experimental Setup
6/36
Definitions
•Remaining Useful Life (RUL) is the time left from a given time
to the end of the system’s useful life.
•RUL Prediction is forecasting the time left before the system loses its ability to operate, based on the condition monitoring data.
•RUL Prediction (in context) is predicting the time until
clogging occurs in the experimental filtration system using the
operating conditions and ‘Run-to-Failure’ condition
monitoring data.
7/36
Dataset Info
•Readings from the sensor are recorded at 10 Hz.
•Features:
•Time, Flow_Rate, Upstream_Pressure, Downstream_Pressure
•Operating conditions:
•Particle_Size, Ratio
•Others:
•ExperimentID, ReadingID, Profile
•Calculated:
•Pressure_Drop = Upstream_Pressure − Downstream_Pressure
•The filter is assumed to be clogged and inoperable when:
isClogged = (Pressure_Drop >= 20)
•Note: Target variable RUL is not given with the dataset.
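The two derived quantities translate directly into pandas; a minimal sketch with hypothetical sample values (the real dataset is loaded from the challenge files):

```python
import pandas as pd

# Hypothetical sample of the sensor readings described above.
df = pd.DataFrame({
    "Upstream_Pressure":   [10.2, 15.8, 31.0],
    "Downstream_Pressure": [ 9.9, 10.1, 10.4],
})

# Pressure drop across the filter, as defined on this slide.
df["Pressure_Drop"] = df["Upstream_Pressure"] - df["Downstream_Pressure"]

# The filter is assumed clogged (inoperable) once the drop reaches 20.
df["isClogged"] = df["Pressure_Drop"] >= 20
print(df)
```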
8/36
Dataset Info –Continued
9/36
Submission Rules
•Submit a Jupyter Notebook that creates four individual models.
•Individual models will use 25%, 50%, 75%, and 100% of the
training data.
•Models will be saved and delivered as Python pickle files.
•RULs should be evaluated every 10 seconds after the experiment
starts.
•The total submission file size cannot exceed 6 MB.
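A minimal sketch of the required delivery format under these rules, assuming four already-trained estimators; the model type and file names here are illustrative, and fitting is omitted:

```python
import pickle
from sklearn.ensemble import RandomForestRegressor

# One model per training-data fraction, as the rules require.
models = {frac: RandomForestRegressor() for frac in (25, 50, 75, 100)}

# ... each model would be fit on its fraction of the training data ...

for frac, model in models.items():
    with open(f"model_{frac}.pkl", "wb") as f:  # hypothetical file names
        pickle.dump(model, f)
```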
10/36
Our Approach
11/36
Data Merging
•Merged individual data files/folders into training and validation
“Data Frames”.
•Added operating conditions for each data sample.
•Added ExperimentID and ReadingID
•Features after data merging:
ExperimentID, ReadingID, Time, Particle_Size, Ratio, Profile, Flow_Rate, Upstream_Pressure, Downstream_Pressure, Pressure_Drop
12/36
RUL Assignment Strategies
•How to assign RUL labels to the samples in the dataset.
•Linear RUL Assignment (Linear):
•Maximum RUL is the number of samples in the experiment.
•The RUL value drops by one with every reading, finally reaching zero at the end.
•Piecewise RUL Assignment (PwL):
•A fixed degradation point is assumed and used for all the experiments.
•RUL is constant up to this degradation point.
•After the degradation point, RUL decreases linearly.
•Experimented with initial RUL values of 100, 125, and 150.
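A minimal sketch of these two strategies, assuming the piecewise variant is the usual clipped linear countdown (the fixed degradation point is then the reading where the countdown crosses the initial RUL); the function names are ours:

```python
import numpy as np

def linear_rul(n_samples: int) -> np.ndarray:
    """Linear: RUL counts down by one per reading, reaching zero at the end."""
    return np.arange(n_samples - 1, -1, -1, dtype=float)

def piecewise_rul(n_samples: int, initial_rul: float = 125.0) -> np.ndarray:
    """PwL: constant at initial_rul up to the degradation point, then
    decreasing linearly to zero, i.e. the linear countdown clipped."""
    return np.minimum(linear_rul(n_samples), initial_rul)
```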
13/36
RUL Assignment Strategies –Continued
•Fault-Based RUL Assignment (PwL_Fault):
•Estimate the degradation point for each experiment heuristically.
•Heuristic: Flow_Rate is almost linear; find the timestamp where the linearity changes and use it as the degradation point.
•Assume a linear RUL from the initial RUL down to the degradation point.
•After the degradation point, RUL decreases linearly.
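The slide states the heuristic but not its implementation; one possible sketch under our own assumptions (a fitted-line deviation test for the linearity change; `fit_frac` and `tol` are hypothetical tuning knobs), together with the two-segment RUL it implies:

```python
import numpy as np

def degradation_point(flow_rate: np.ndarray, fit_frac: float = 0.5,
                      tol: float = 3.0) -> int:
    """Fit a line to the early (assumed-linear) part of Flow_Rate; return the
    first index deviating from it by more than tol residual deviations."""
    t = np.arange(len(flow_rate), dtype=float)
    n_fit = int(len(flow_rate) * fit_frac)
    slope, intercept = np.polyfit(t[:n_fit], flow_rate[:n_fit], deg=1)
    residuals = flow_rate - (slope * t + intercept)
    breaks = np.flatnonzero(np.abs(residuals) > tol * residuals[:n_fit].std())
    return int(breaks[0]) if breaks.size else len(flow_rate) - 1

def pwl_fault_rul(n_samples: int, deg_idx: int, initial_rul: float) -> np.ndarray:
    """Linear from initial_rul down to the RUL left at the degradation point,
    then a one-per-reading countdown to zero."""
    remaining = float(n_samples - 1 - deg_idx)
    head = np.linspace(initial_rul, remaining, deg_idx, endpoint=False)
    tail = np.arange(remaining, -1.0, -1.0)
    return np.concatenate([head, tail])
```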
16/36
Data Preprocessing –Continued
•Operating Conditions Assignment
•Unseen profiles in the validation dataset (and test dataset) crash the scaling process.
•"Profile" enumerates the experimental conditions, namely "Particle_Size" and "Ratio".
•Applied the K-Means clustering algorithm to "Particle_Size" and "Ratio" and assigned the cluster number as "KMeansProfile".
•The K-Means model is stored with the final model for later use.
•Scaling
•For each "KMeansProfile", scaling is performed separately using StandardScaler() from the Scikit-Learn library.
•The scaling models are stored with the final model for later use.
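A sketch of this step, assuming the merged training DataFrame from the data-merging slide; the cluster count and the function name are hypothetical:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def fit_profile_scalers(train_df: pd.DataFrame, feature_cols, n_clusters=4):
    """Cluster the operating conditions so unseen Profile values map to a
    known cluster, then fit one scaler per cluster (n_clusters is a guess)."""
    kmeans = KMeans(n_clusters=n_clusters, random_state=0, n_init=10)
    train_df["KMeansProfile"] = kmeans.fit_predict(
        train_df[["Particle_Size", "Ratio"]])
    scalers = {
        c: StandardScaler().fit(
            train_df.loc[train_df["KMeansProfile"] == c, feature_cols])
        for c in range(n_clusters)
    }
    return kmeans, scalers  # both are stored with the final model for reuse
```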
17/36
Data Preprocessing –Continued
•Feature Selection
ExperimentID, ReadingID, Time, Particle Size, Ratio, Profile, Flow Rate, Upstream Pressure, Downstream Pressure, Pressure Drop, KMeansProfile
18/36
Data Preprocessing –Continued
•Resampling
•To speed up the Gaussian Process model.
•The original sampling rate is 10 Hz.
•Resampled to 1, 0.5, 0.33, 0.25, and 0.2 Hz using window sizes of 10, 20, 30, 40, and 50, advancing by half a window at each step.
•Windowing
•Used window sizes of 5, 10, 15, 20, 25, 30, 40, and 50 to create a context for the time-series data.
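A sketch of both operations on a single channel, assuming windows are aggregated by their mean (the slide does not say which statistic was used):

```python
import numpy as np

def resample(signal: np.ndarray, window: int) -> np.ndarray:
    """Mean over `window`-sample windows, advancing half a window per step
    (50% overlap); aggregation by mean is our assumption."""
    step = max(window // 2, 1)
    starts = range(0, len(signal) - window + 1, step)
    return np.array([signal[s:s + window].mean() for s in starts])

def make_windows(signal: np.ndarray, window: int) -> np.ndarray:
    """Stack `window` consecutive samples per row so each prediction sees a
    short history of the time series."""
    return np.stack([signal[s:s + window]
                     for s in range(len(signal) - window + 1)])
```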
19/36
Data Modeling
•We used the following Machine Learning methods:
•Random Forest
•Gradient Boosting
•Gaussian Process
•We used 25%, 50%, 75%, and 100% of the training data
for model training.
•Validation data is used for model evaluation.
20/36
Random Forest
•An ensemble learning method for classification and regression.
•Based on decision trees.
•Randomly selects features at each tree node.
•A nonlinear modelling tool that overcomes the low accuracy and overfitting of a single decision tree.
•Tuning the hyperparameters can often increase generalization performance.
•Used the implementation in the Scikit-Learn library.
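A minimal sketch of that setup; the hyperparameter values are illustrative, not the tuned values from the submission, and `X_train`/`y_train` stand for the windowed features and assigned RULs from the preprocessing above:

```python
from sklearn.ensemble import RandomForestRegressor

# Illustrative hyperparameters only, not the tuned challenge values.
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0)
# rf.fit(X_train, y_train)      # windowed features, assigned RUL labels
# rul_pred = rf.predict(X_val)  # RUL predictions on the validation data
```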
21/36
Gradient Boosting
•An ensemble learning method for classification and regression.
•Based on weak prediction models that fit pseudo-residuals.
•Used the CatBoost library.
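A corresponding sketch with CatBoost; the hyperparameters and the MAE loss are our assumptions, not the team's settings:

```python
from catboost import CatBoostRegressor

# CatBoost builds an ensemble of trees that successively fit the
# pseudo-residuals of the previous rounds.
cb = CatBoostRegressor(iterations=500, depth=6, learning_rate=0.1,
                       loss_function="MAE",  # assumption: matches the metric
                       verbose=False)
# cb.fit(X_train, y_train)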
22/36
Gaussian Process
•GP models the underlying distribution of the training data as a
multivariate normal distribution.
•Used for regression and classification problems.
•Learning a distribution enables the model to output a
prediction and an uncertainty associated with the prediction.
•Non-parametric and expressive.
•Used the implementation in the Scikit-Learn library.
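A minimal sketch of that implementation; the kernel choice here is an assumption, not the one used by the team:

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# RBF + white noise is a common default kernel, assumed here.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
# gp.fit(X_train, y_train)
# mean, std = gp.predict(X_val, return_std=True)  # prediction + uncertainty
```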
23/36
Model Training
•Hyperparameters:
24/36
Model Evaluations
•RUL is predicted every 10 seconds, and then the evaluations are performed.
•Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| r_i - \hat{r}_i \right|$$

where:
•i: sample index,
•n: number of samples,
•$r_i$: RUL assigned to sample i,
•$\hat{r}_i$: RUL predicted for sample i.
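In code, the metric is a one-liner:

```python
import numpy as np

def mae(rul_true: np.ndarray, rul_pred: np.ndarray) -> float:
    """Mean absolute error between assigned and predicted RULs."""
    return float(np.mean(np.abs(rul_true - rul_pred)))
```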
25/36
Model Evaluations
•Data Challenge Penalty Score:
where
•$M_i$: the model generated with i% of the training data, i ∈ {25, 50, 75, 100}
•TV: Training + Validation datasets
•TE: Test dataset
26/36
Evaluation Results
•Evaluation results for each model are given in the following slides.
27/36
Results for 25% of Training Data
28/36
Results for 50% of Training Data
29/36
Results for 75% of Training Data
30/36
Results for 100% of Training Data
31/36
Discussion – Model Results
•The best results for RF and CB were achieved with PwL_Fault.
•The window size yielding the best results varied considerably for RF and CB across the different training-data sizes.
•GP achieves its best results with the PwL RUL assignment and a window size of 50.
•We were unable to submit the GP results.
32/36
Discussion – Model Sizes
•Random Forest:
•Model size grows with the number of trees in the ensemble and with the depth of those trees.
•The four RF models together are as large as 28 MB.
•Smaller models perform up to 3x worse than the results reported here.
•Gaussian Process:
•The Scikit-Learn implementation of GP stores the training samples and the covariance matrix in the model.
•GP model size grows with the number of samples in the dataset.
•The four GP models total 14.3 MB.
•CatBoost:
•CB models are about 1 MB each.
•We were able to submit our CB models.
33/36
Data Challenge Results
•Submissions to the data challenge are evaluated using public
training and validation datasets, and a private test dataset.
•Our CB models scored 86.74 using the challenge's penalty score.
•Team GTU ranked 3rd in the data challenge.
34/36
Conclusion
•A new heuristic-based RUL assignment strategy (PwL_Fault) is introduced.
•PwL_Fault performs better than linear RUL assignment (Linear) in all cases.
•PwL_Fault performs better than the piecewise-linear version (PwL) for RF and CatBoost, but not for GP.
•The Gaussian process regressor predicted better than the other algorithms in all experiments.
•While performing best, GP does not yield an acceptable model size, which prevented us from submitting our best model.
•The CatBoost model carried us to 3rd place in the challenge with a score of 86.74.
35/36
Future Work
•Develop deep learning models for RUL prediction.
•Make Gaussian process work for smaller model sizes ☺