Anomaly_Detection_Insurance_PPT 12fad.pptx

cumasandiri 2 views 11 slides Sep 17, 2025

About This Presentation

Anomaly data


Slide Content

A Comparative Study of Isolation Forest, LOF, OCSVM and Elliptic Envelope for Anomaly Detection in Insurance Dataset
Wilsen G. Mokodaser, Green A. Sandag, Semmy W. Taju, Raissa C. Maringka, Rolly J. Lontaan
Faculty of Computer Science, Universitas Klabat, Indonesia

Abstract Outliers are a major challenge in insurance data. This study compares four algorithms: Isolation Forest (IF), Local Outlier Factor (LOF), One-Class SVM (OCSVM), and Elliptic Envelope (EE). Dataset: 766 insurance records of monthly income (Jan–Dec). Results: IF, LOF, and EE each flagged 39 outliers; OCSVM flagged 38. 17 records were flagged by all 4 algorithms and are treated as high-confidence anomalies.

Introduction Data quality is critical in analytics. Outliers distort statistics, ML models, and business decisions. In insurance: impacts premium pricing, fraud detection, risk evaluation. Need systematic anomaly detection for reliable results.

Research Method Steps: (1) Data preprocessing: cleaning, filling missing values with the median, scaling with StandardScaler. (2) Algorithm implementation: IF, LOF, OCSVM, EE. (3) Outlier visualization: heatmaps, PCA scatter plots. (4) Interpretation and comparison.

Dataset Source: Insurance company income records. Size: 766 records. Features: Monthly revenue (Jan–Dec). Preprocessing: Missing values filled with median, data standardized.
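The preprocessing described on this slide (median imputation followed by standardization) can be sketched as below. This is a minimal sketch, not the authors' code: the synthetic 766 x 12 array stands in for the real income table, which is not available here, and the `preprocess` helper name is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def preprocess(X):
    """Fill missing values with each column's median, then standardize each column."""
    pipeline = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    return pipeline.fit_transform(X)


# Synthetic stand-in for the 766 x 12 monthly table (assumption: NOT the real data).
rng = np.random.default_rng(0)
X_raw = rng.lognormal(mean=10.0, sigma=0.5, size=(766, 12))
X_raw[rng.random(X_raw.shape) < 0.02] = np.nan  # inject a few missing values

X_scaled = preprocess(X_raw)
print(X_scaled.shape)
```

After this step every month column has zero mean and unit variance, which matters for the distance-based detectors (LOF, OCSVM) that would otherwise be dominated by the largest-valued months.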

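Algorithms Isolation Forest (IF): Tree-based; anomalies are isolated in fewer random splits than normal points. Local Outlier Factor (LOF): Density-based; flags points whose local density is much lower than that of their neighbors. Elliptic Envelope (EE): Assumes Gaussian-distributed data and flags points far from the fitted ellipse. One-Class SVM (OCSVM): Learns a boundary (a hyperplane in kernel feature space) that separates normal data from anomalies.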
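One way the four detectors could be run side by side with scikit-learn is sketched below. The slides do not state any hyperparameters, so `contamination=0.05`, `n_neighbors=20`, the RBF kernel, and `nu` are all assumptions; the reported counts (39 of 766, about 5.1%) are consistent with a 5% setting but do not confirm it. The data is synthetic and the `detect_outliers` helper name is hypothetical.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def detect_outliers(X, contamination=0.05, seed=0):
    """Return one boolean mask per detector (True = flagged as outlier)."""
    detectors = {
        "IF": IsolationForest(contamination=contamination, random_state=seed),
        "LOF": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
        "OCSVM": OneClassSVM(kernel="rbf", gamma="scale", nu=contamination),
        "EE": EllipticEnvelope(contamination=contamination, random_state=seed),
    }
    # fit_predict returns -1 for outliers and +1 for inliers in all four estimators.
    return {name: det.fit_predict(X) == -1 for name, det in detectors.items()}


# Synthetic, already-standardized stand-in data (assumption: NOT the real records).
rng = np.random.default_rng(0)
X = rng.normal(size=(766, 12))
X[:15] *= 6.0  # exaggerate a few rows so there is something to find

flags = detect_outliers(X)
for name, mask in flags.items():
    print(name, int(mask.sum()))
```

For IF, LOF, and EE the `contamination` parameter sets the flagged fraction directly, while for OCSVM `nu` only bounds it approximately, which may be one reason the counts differ slightly between methods.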

Results Number of outliers detected: IF: 39; LOF: 39; EE: 39; OCSVM: 38. Consensus (records flagged by at least k of the 4 algorithms): k ≥ 2: 36; k ≥ 3: 22; all 4: 17.
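The consensus figures can be read as vote counts: each record gets one vote per algorithm that flags it, and the consensus at level k is the number of records with at least k votes. A minimal sketch of that counting follows, using a made-up 6-record example rather than the study's data; the `consensus_counts` helper name is hypothetical.

```python
import numpy as np


def consensus_counts(flags):
    """Count, for each k, how many records are flagged by at least k detectors."""
    votes = np.sum(np.stack(list(flags.values())), axis=0)  # votes per record
    n_detectors = len(flags)
    counts = {k: int(np.sum(votes >= k)) for k in range(1, n_detectors + 1)}
    return votes, counts


# Toy masks for 6 records (illustrative only, not results from the study).
toy_flags = {
    "IF":    np.array([1, 1, 1, 0, 0, 0], dtype=bool),
    "LOF":   np.array([1, 1, 0, 1, 0, 0], dtype=bool),
    "OCSVM": np.array([1, 0, 0, 0, 0, 0], dtype=bool),
    "EE":    np.array([1, 1, 1, 0, 0, 0], dtype=bool),
}

votes, counts = consensus_counts(toy_flags)
print(votes.tolist())  # [4, 3, 2, 1, 0, 0]
print(counts)          # {1: 4, 2: 3, 3: 2, 4: 1}
```

The counts can only shrink as k grows, which matches the reported pattern of 36, 22, and 17.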

Visualization Heatmap (IF): Shows anomaly distribution across months.

Visualization Scatter (PCA): Clear separation of normal vs anomalies. LOF & IF: Strong agreement. OCSVM: More conservative. EE: Distinct, Gaussian-based detection.
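The PCA scatter plot relies on projecting the 12 monthly features onto two principal components. The projection step might look like the sketch below; the data is synthetic, the `project_2d` helper name is hypothetical, and plotting is left out so the example stays independent of any particular plotting library.

```python
import numpy as np
from sklearn.decomposition import PCA


def project_2d(X_scaled):
    """Project standardized features onto the first two principal components."""
    pca = PCA(n_components=2)
    coords = pca.fit_transform(X_scaled)
    return coords, pca.explained_variance_ratio_


# Synthetic standardized stand-in data (assumption: NOT the real records).
rng = np.random.default_rng(0)
X_scaled = rng.normal(size=(766, 12))

coords, variance_ratio = project_2d(X_scaled)
print(coords.shape, variance_ratio)
```

Each row of `coords` gives the x and y position of one record; coloring the points by a detector's outlier mask would give a scatter of the kind described on the slide.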

Discussion IF & LOF: Consistent, best balance between sensitivity & reliability. OCSVM: Conservative, fewer false positives. EE: Works best if data is Gaussian, less flexible otherwise. Ensemble approach (consensus) improves reliability.

Conclusion No algorithm is universally superior. IF & LOF most effective for financial anomaly detection. Combining methods strengthens confidence. Useful for fraud detection, premium accuracy, and risk management in insurance.