A Comparative Study of Isolation Forest, LOF, OCSVM and Elliptic Envelope for Anomaly Detection in Insurance Dataset
Wilsen G. Mokodaser, Green A. Sandag, Semmy W. Taju, Raissa C. Maringka, Rolly J. Lontaan
Faculty of Computer Science, Universitas Klabat, Indonesia
Abstract Outliers are a major challenge in insurance data. This study compares four algorithms: Isolation Forest (IF), Local Outlier Factor (LOF), One-Class SVM (OCSVM) and Elliptic Envelope (EE). Dataset: 766 insurance records of monthly income (Jan–Dec). Results: IF, LOF and EE each flag 39 outliers; OCSVM flags 38. 17 records are flagged by all four algorithms and are treated as high-confidence anomalies.
Introduction Data quality is critical in analytics. Outliers distort statistics, ML models, and business decisions. In insurance: impacts premium pricing, fraud detection, risk evaluation. Need systematic anomaly detection for reliable results.
Research Method Steps: (1) Data preprocessing: cleaning, median imputation of missing values, scaling with StandardScaler. (2) Algorithm implementation (IF, LOF, OCSVM, EE). (3) Outlier visualization (heatmaps, PCA scatter plots). (4) Interpretation and comparison.
Dataset Source: Insurance company income records. Size: 766 records. Features: Monthly revenue (Jan–Dec). Preprocessing: Missing values filled with median, data standardized.
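The preprocessing described above (median imputation, then standardization) could be sketched as follows. This is a minimal sketch, not the authors' code: the real dataset is not available here, so `X_raw` is a synthetic stand-in with the same shape (766 records, 12 monthly columns), and the 2% missing rate is an assumption made only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: 766 records x 12 monthly columns.
X_raw = rng.lognormal(mean=10.0, sigma=0.5, size=(766, 12))

# Blank out roughly 2% of the entries to mimic missing values (assumed rate).
missing = rng.random(X_raw.shape) < 0.02
X_raw[missing] = np.nan

# Fill each column's gaps with that column's median, then scale every
# column to zero mean and unit variance.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X = preprocess.fit_transform(X_raw)
```

Keeping both steps in one pipeline means the medians and scaling statistics learned here can be reapplied unchanged to new records with `preprocess.transform`.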
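Algorithms Isolation Forest (IF): Tree-based; anomalies are isolated in fewer random splits than normal points. Local Outlier Factor (LOF): Density-based; flags points whose local density is much lower than that of their neighbors. Elliptic Envelope (EE): Assumes Gaussian-distributed data and flags points far from the fitted ellipse. One-Class SVM (OCSVM): Learns a boundary around the normal data (a hyperplane in kernel feature space) and flags points outside it.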
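A hedged sketch of how the four detectors might be run side by side with scikit-learn. The source does not report hyperparameters, so everything here is an assumption: the expected outlier rate of 0.05 is chosen only because 39 of 766 records is roughly 5%, the remaining settings are library defaults or common choices, and `X` is synthetic data with a few injected anomalies rather than the insurance records.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for the standardized data (766 x 12), with the first
# 20 rows shifted so that some obvious anomalies exist.
X = rng.normal(size=(766, 12))
X[:20] += 6.0

rate = 0.05  # assumed expected share of outliers, not taken from the source

detectors = {
    "IF": IsolationForest(contamination=rate, random_state=0),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=rate),
    "OCSVM": OneClassSVM(kernel="rbf", gamma="scale", nu=rate),
    "EE": EllipticEnvelope(contamination=rate, random_state=0),
}

# fit_predict returns -1 for outliers and +1 for inliers in all four cases.
labels = {name: det.fit_predict(X) for name, det in detectors.items()}
counts = {name: int((y == -1).sum()) for name, y in labels.items()}
```

For IF, LOF and EE the `contamination` setting fixes the flagged share directly, while OCSVM's `nu` only bounds it approximately, which is one plausible reason for OCSVM's count to differ slightly from the others.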
Visualization Heatmap (IF): Shows anomaly distribution across months.
Visualization Scatter (PCA): Clear separation of normal vs anomalies. LOF & IF: Strong agreement. OCSVM: More conservative. EE: Distinct, Gaussian-based detection.
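The 2D coordinates behind a PCA scatter plot like the one described above might be computed as in the sketch below. The data are synthetic, the Isolation Forest settings are assumptions, and plotting is left out so the example depends only on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Synthetic stand-in for the standardized 766 x 12 data, with 20 shifted rows.
X = rng.normal(size=(766, 12))
X[:20] += 6.0

# Boolean mask of records flagged by one detector (assumed 5% outlier rate).
is_outlier = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == -1

# Project the 12 monthly features onto the first two principal components.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)

# Share of total variance that the 2D view retains.
explained = pca.explained_variance_ratio_.sum()
```

Plotting `coords[:, 0]` against `coords[:, 1]` and coloring by `is_outlier` would give the scatter view. Checking `explained` is worthwhile, since a low value means the apparent separation in 2D may not reflect the full 12-dimensional picture.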
Discussion IF & LOF: Consistent, best balance between sensitivity & reliability. OCSVM: More conservative (38 flagged vs. 39), which may mean fewer false positives. EE: Works best if data is Gaussian, less flexible otherwise. Consensus across methods (e.g., the 17 records flagged by all four) improves reliability.
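The consensus idea can be illustrated with a small voting sketch. The label arrays below are hypothetical toy values for eight records (using the -1 = outlier, +1 = inlier convention), not results from the study, and the majority threshold of three votes is an assumed alternative to strict unanimity.

```python
import numpy as np

# Hypothetical -1/+1 labels from the four detectors for eight records.
labels = {
    "IF":    np.array([-1,  1, -1,  1,  1, -1,  1,  1]),
    "LOF":   np.array([-1,  1, -1,  1, -1,  1,  1,  1]),
    "OCSVM": np.array([-1,  1,  1,  1,  1, -1,  1,  1]),
    "EE":    np.array([-1,  1, -1,  1,  1, -1,  1, -1]),
}

# One boolean column per detector: True where that detector flags the record.
flags = np.column_stack([y == -1 for y in labels.values()])

# Number of detectors that flag each record.
votes = flags.sum(axis=1)

# Records flagged by every detector (the high-confidence set).
unanimous = np.flatnonzero(votes == flags.shape[1])

# Looser rule: records flagged by at least three of the four detectors.
majority = np.flatnonzero(votes >= 3)

print(votes.tolist())      # [4, 0, 3, 0, 1, 3, 0, 1]
print(unanimous.tolist())  # [0]
print(majority.tolist())   # [0, 2, 5]
```

Unanimity gives the smallest, most trustworthy set, in the spirit of the 17 records reported in the study, while a majority rule trades some confidence for coverage.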
Conclusion No algorithm is universally superior. IF & LOF most effective for financial anomaly detection. Combining methods strengthens confidence. Useful for fraud detection, premium accuracy, and risk management in insurance.