Telecom Customer Churn Analysis Using Gaussian Mixture Model

ShreyasBadkar1 8 views 30 slides Sep 01, 2025

About This Presentation

Collected secondary data from a telecom company which includes features such as the customer's call history, data usage, billing information, etc. Train the Gaussian Mixture Model (GMM) for the data. This model will learn the
parameters of the two distributions in the mixture model and the prob...


Slide Content

Telecom Customer Churn Analysis using Gaussian Mixture Model Shreya Acharya A056 Harshal Mandawade A057 Harsh Gidwani A059 Shreyas Badkar A060 Mentored by: Prof. Prashant Dhamale

Introduction One way to predict customer churn is to use a machine learning model. This project investigates the use of Gaussian mixture models (GMMs) to predict customer churn in the telecom industry. A GMM is a type of mixture model that can be used to model data generated from a mixture of different distributions. In the case of customer churn data, the data can be modelled as a mixture of two distributions: the distribution of customers who are likely to churn and the distribution of customers who are not likely to churn. It is estimated that it costs five times more to acquire a new customer than to retain an existing one. Clearly, customer churn is a major challenge for companies.

What is Churn? Customer churn, in the telecommunications industry, refers to the phenomenon where subscribers or customers discontinue their services with a telecom provider and switch to a different service provider or terminate their services altogether.

Rationale The telecommunications industry is highly competitive, with numerous service providers vying for market share. High churn rates are prevalent in the telecom industry, affecting customer loyalty and revenue. Customer churn is not solely dependent on pricing or network quality; it's often influenced by a complex interplay of factors. Identifying and understanding churn patterns is vital for the development of effective retention strategies. In today's data-driven era, telecom companies have access to vast amounts of customer data. Leveraging this data for data-driven decision-making is imperative.

Problem Statement The telecommunications industry faces a pressing challenge - customer churn. As subscribers switch between providers, telecom companies experience revenue loss, eroded market share, and increased competition. Understanding the intricate dynamics of customer churn and implementing effective retention strategies is paramount.

Objectives
1. To develop and implement a Gaussian Mixture Model (GMM) for accurate churn prediction, identifying diverse customer segments based on their likelihood of churn.
2. To propose retention strategies tailored to different customer segments, based on GMM probabilities, thereby aiding telecom companies in reducing churn.

Research Paper A Novel Classification Approach for Credit Scoring based on Gaussian Mixture Model. Authors: Hamidreza Arian, Seyed Mohammad Sina Seyfi, Azin Sharifi.

Data Our analysis is based on a telecom churn dataset (source). This rich dataset comprises a wide range of customer attributes, including call activity, messaging, complaints, and other relevant features.

Exploratory Data Analysis

Data Cleaning To ensure the reliability and accuracy of our analysis, we conducted a thorough data cleaning process. This involved handling missing values, identifying and addressing outliers, and standardizing data for consistency. We removed all rows with null values in Customer ID. We also removed the unwanted columns and the columns with categorical data such as Area Code, Gender, Pin Code, and State.
Data Splitting We partitioned the dataset into a training set (80%) and a test set (20%). This division allowed us to train and evaluate our predictive model effectively while ensuring independent validation.
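The cleaning and 80/20 splitting steps described above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual pipeline: the column names, the toy records, and the helper names `clean` and `split` are all hypothetical.

```python
import random

# Hypothetical names for the categorical columns that get dropped.
CATEGORICAL = {"state", "gender", "area_code", "pin_code"}

def clean(rows):
    """Drop rows without a customer ID and remove categorical columns."""
    return [
        {key: value for key, value in row.items() if key not in CATEGORICAL}
        for row in rows
        if row.get("customer_id") is not None
    ]

def split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy records: every tenth customer ID is missing (None).
raw = [
    {"customer_id": i if i % 10 else None, "minutes": 100 + 3 * i,
     "state": "XX", "churn": int(i % 4 == 0)}
    for i in range(50)
]
cleaned = clean(raw)
train, test = split(cleaned)
print(len(cleaned), len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the test set stays independent of the training set across runs.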

Analysis The analysis has four stages: making clusters/segmentation; churn prediction and modelling; the model evaluation matrix; and accuracy, precision, and recall.

Model Selection Our choice of the Gaussian Mixture Model (GMM) is grounded in its ability to capture complex customer behaviors and patterns, making it well-suited for the telecom churn analysis. Understanding GMM: A Gaussian Mixture Model represents the data as a probabilistic mixture of Gaussian distributions. It can be mathematically expressed as follows: Multivariate Gaussian Mixture Model
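For reference, the standard density of a multivariate Gaussian mixture can be written as below. The number of components K and the symbols π_k, μ_k, Σ_k are assumed notation, chosen to match the parameters (μ, Σ, π) named in the EM slides; the slide's own formula may use different symbols.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,

\mathcal{N}(x \mid \mu_k, \Sigma_k)
  = \frac{1}{(2\pi)^{d/2} \, |\Sigma_k|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right).
```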

Assume the data set consists of N samples x_1, x_2, ..., x_N, where each x_i = (x_i^1, x_i^2, ..., x_i^d), for i = 1, 2, ..., N, is a d-dimensional vector. Each data point x_i carries a label y_i drawn from a discrete set of m churn classes. Here we have two possible churn classes, namely y_i ∈ {0, 1}, where 0 is non-churn and 1 is churn.

Assumptions of GMM Normal Distribution: GMM assumes that the data within each component (cluster) follow a Gaussian distribution. This implies that customer behavior within each cluster is approximately normally distributed with mean vector μ and covariance matrix Σ. Independence of Features: the features (variables) are statistically independent within a component only when that component's covariance matrix is restricted to be diagonal; with a full variance-covariance matrix, features within a cluster may be correlated. Mixing Proportions: each data point is assumed to come from exactly one component, chosen with a prior probability equal to that component's mixing proportion; these priors need not be equal and are estimated from the data.

EM Algorithm We used maximum likelihood estimation (MLE) to estimate the mixture weights, using the EM algorithm. It is a simple but powerful method for solving MLE problems and has been widely adopted for estimating mixture models.

EM Algorithm The Expectation-Maximization (EM) algorithm plays a pivotal role in fitting the GMM to our data. It alternates between two steps until the parameters stop changing. Expectation (E-Step): we estimate the probability that each data point belongs to each component (the responsibilities). Maximization (M-Step): we update the model parameters (μ, Σ, π) based on the responsibilities estimated in the E-Step.
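The two EM steps can be illustrated on a one-dimensional mixture. This is a simplified sketch under stated assumptions and not the project's code: the project fits a multivariate model, whereas this uses a single hypothetical feature, hand-picked starting values, and a fixed number of iterations instead of a convergence test.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Run EM for a 1-D Gaussian mixture from the given starting parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    for _ in range(n_iter):
        # E-step: responsibility of component j for point x,
        # proportional to weight_j * N(x | mean_j, var_j).
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    return means, variances, weights

# Hypothetical 1-D feature with two well-separated groups of customers.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, weights)
```

On this toy data the two estimated means settle near the centres of the two groups and the weights near one half each.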

To choose the number of components, we can use the Akaike Information Criterion (AIC), a model selection method formally defined as AIC = 2k − 2 ln L(θ | x), where θ represents the model parameter set, x is the observed data, L(θ | x) is the maximized likelihood, and k is the number of parameters estimated by the model. The Bayesian Information Criterion (BIC), defined as BIC = k ln(N) − 2 ln L(θ | x), penalizes model complexity more heavily than AIC whenever ln(N) > 2, that is, for N ≥ 8. Here N is the number of data points. The model selected will have the lowest BIC score.
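As a small worked illustration of the two criteria, the sketch below uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln(N) − 2 ln L. The parameter count assumes full covariance matrices, and the candidate log-likelihood values are hypothetical numbers chosen only to show the selection step.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(N) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

def gmm_param_count(n_components, d):
    """Free parameters of a full-covariance GMM in d dimensions."""
    per_component = d + d * (d + 1) // 2  # mean vector + symmetric covariance
    return n_components * per_component + (n_components - 1)  # + mixing weights

# Hypothetical maximized log-likelihoods for 1 to 4 components, d = 2, N = 500.
candidates = {1: -2100.0, 2: -1950.0, 3: -1940.0, 4: -1936.0}
scores = {
    m: bic(ll, gmm_param_count(m, 2), 500) for m, ll in candidates.items()
}
best = min(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Because the penalty grows with the parameter count, the small likelihood gains from the third and fourth components in this toy example do not pay for the extra parameters.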

Mixing Proportion of each cluster (ω_i), five clusters: 0.10523, 0.37509, 0.34996, 0.10207, 0.06763

Estimated Parameters for Cluster I Variance Covariance Matrix Mean Vector

Cluster-wise Churn-Non churn Probability

Binary Classification Using GMM Associating individual data to clusters: By utilizing the GMM probabilities, we segmented customers into distinct clusters, facilitating the formulation of tailored retention strategies that address the specific needs of each group.

Calculating the probability of churn for each customer. Setting the decision boundary: After calculating the probabilities, we define a decision boundary 0 < D < 1 by which an arbitrary data point x_i can be labeled either 0 or 1.
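One plausible way to turn cluster memberships into a churn label is sketched below. It rests on an assumption that is not confirmed by the slides: that a customer's churn probability is the sum, over clusters, of the cluster responsibility times that cluster's churn rate. The function names and all numbers are hypothetical.

```python
def churn_probability(responsibilities, cluster_churn_rates):
    """Assumed combination rule: sum_k P(cluster k | x) * P(churn | cluster k)."""
    return sum(r * c for r, c in zip(responsibilities, cluster_churn_rates))

def classify(probability, boundary=0.5):
    """Label 1 (churn) if the probability reaches the decision boundary D, else 0."""
    if not 0.0 < boundary < 1.0:
        raise ValueError("decision boundary D must satisfy 0 < D < 1")
    return 1 if probability >= boundary else 0

# Hypothetical customer: responsibilities over three clusters,
# and hypothetical churn rates observed within those clusters.
resp = [0.7, 0.2, 0.1]
rates = [0.1, 0.5, 0.9]
p = churn_probability(resp, rates)
print(round(p, 2), classify(p, boundary=0.5), classify(p, boundary=0.2))
```

Lowering D labels more customers as churners, which raises sensitivity at the cost of specificity.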

Results Accuracy: 70.61%. Sensitivity: 76.57%. Specificity: 34.736%. (Figure: confusion matrix.)
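For readers who want to see how these three metrics follow from a confusion matrix, here is a minimal sketch using the standard definitions. The counts in the example are hypothetical and are not the project's confusion matrix.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity (recall on churners), and specificity from counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

# Hypothetical counts, with churn as the positive class.
metrics = confusion_metrics(tp=80, fn=20, fp=30, tn=70)
print(metrics)
```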

Receiver Operating Characteristic (ROC) The ROC curve is a graphical plot used to show the diagnostic ability of binary classifiers. It plots sensitivity (TPR) against the false positive rate (FPR = 1 − specificity), and so shows the trade-off between sensitivity and specificity. Classifiers whose curves lie closer to the top-left corner perform better.
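The construction of an ROC curve can be sketched by sweeping the decision boundary over the predicted probabilities and recording one (FPR, TPR) point per threshold. The labels and scores below are hypothetical, and the helper assumes both classes are present in the data.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold downward."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical true labels (1 = churn) and predicted churn probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points[-1], round(auc(points), 4))
```

An area of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the area gives a single-number summary of how close the curve gets to the top-left corner.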

Conclusion The Gaussian Mixture Model enables us to estimate the churn probability of a customer based on their profile. We can also develop churn prevention strategies based on the churn risk of the clusters formed by the GMM. Our analysis provides actionable insights for the telecom company, with the potential to enhance customer retention strategies and, in turn, profitability. We've illustrated the power of data-driven decision-making in the telecom industry.

Limitations Limited Feature Set: Our analysis used a limited set of customer features; additional relevant attributes may exist and could enhance the model's accuracy. Model Complexity: More sophisticated models might provide better insights. Our analysis is based on historical data and may not fully capture future changes in customer behavior, so continuous monitoring and model retraining are needed. A larger dataset may lead to more robust findings.

Future Scope Exploration of the applicability of similar models and strategies in other industries facing customer churn challenges, such as e-commerce, finance, and subscription-based services. Expansion of the project to predict customer lifetime value (CLV), allowing telecom companies not only to retain customers but also to maximize their long-term value. Development of dynamic GMMs that account for time-dependent changes in customer behavior and churn patterns, which could involve time series analysis.

References We've relied on a range of research papers, books, and online resources to inform and guide our project. These sources underpin the foundation of our work.
Xie T, Li X, Ngai EWT, Ying W. (2009). Customer churn prediction using improved balanced random forests. Expert Systems with Applications, 36(3), 5445-9.
Dahiya K, Bhatia S. (2015). Customer churn analysis in telecom industry. 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions). doi:10.1109/ICRITO.2015.7359318
https://towardsdatascience.com/customer-churn-in-telecom-segment-5e49356f39e5
https://www.kaggle.com/datasets/blastchar/telco-customer-churn
https://www.analyticsvidhya.com/blog/2019/10/gaussian-mixture-models-clustering/

Acknowledgement We would like to convey our heartfelt gratitude to Prof. Prashant Dhamale and Dr. Leena Kulkarni, our mentors, for their invaluable advice and assistance in completing our project. They assisted us at every step of the way, and their motivation enabled us to accomplish our task effectively.

THANK YOU