Credit card fraud detection

15 slides, Dec 24, 2020

About This Presentation

Credit card fraud detection using R


Slide Content

CREDIT CARD FRAUD DETECTION Submitted by: Vineeta and Shubham Chandel

What is credit card fraud? Credit card fraud is when someone uses your credit card or credit account to make a purchase you didn't authorize. This can happen in different ways: if you lose your credit card or have it stolen, it can be used to make purchases or other transactions, either in person or online.

What does credit card fraud detection do? The credit card fraud detection problem involves modeling past credit card transactions with knowledge of which ones turned out to be fraud. The model is then used to identify whether a new transaction is fraudulent or not. Our aim here is to detect as many fraudulent transactions as possible while minimizing incorrect fraud classifications.

Some important terms:
True Positive: the fraud cases that the model predicted as 'fraud.'
False Positive: the non-fraud cases that the model predicted as 'fraud.'
True Negative: the non-fraud cases that the model predicted as 'non-fraud.'
False Negative: the fraud cases that the model predicted as 'non-fraud.'
Accuracy: the proportion of correct predictions made by the model, i.e. the ratio of fraud transactions classified as fraud plus non-fraud transactions classified as non-fraud to the total transactions in the test data.
Sensitivity: also called True Positive Rate or Recall, the ratio of correctly identified fraud cases to total fraud cases.
Specificity: also called True Negative Rate, the ratio of correctly identified non-fraud cases to total non-fraud cases.
Precision: the ratio of correctly predicted fraud cases to total predicted fraud cases.
Confusion matrix: a table often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. It allows the performance of an algorithm to be visualized.
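The definitions above can be made concrete with a short sketch. The deck's implementation is in R; the version below is an illustrative Python stand-in, and all names in it (`confusion_counts`, `metrics`, the toy label vectors) are made up for the example.

```python
# Toy illustration of the metrics defined above, computed from raw
# true/predicted labels (1 = fraud, 0 = non-fraud).

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall / true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision":   tp / (tp + fp),
    }

# 10 transactions, 3 of them fraud; the model catches 2 and raises 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(metrics(y_true, y_pred))
```

Note that on imbalanced data accuracy alone is misleading (predicting "non-fraud" everywhere here already scores 0.7), which is why the deck also tracks sensitivity, specificity, and precision.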

ABOUT THE DATASET We gathered the data from Kaggle (https://www.kaggle.com/mlg-ulb/creditcardfraud). It has 31 variables and 284,807 observations. Class is used as the factor variable.

Let's take a look at the dataset.

V1 to V28 are the (anonymized, PCA-transformed) transaction features
Amount denotes the amount of the transaction
Class is the factor variable (0 denotes a legitimate transaction and 1 denotes a fraudulent one)
Time is the time of the transaction
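A quick sanity check on Class shows why the balancing functions on the next slide are needed: frauds are a tiny minority of the dataset (roughly 492 of the 284,807 rows, under 0.2%). The Python sketch below mimics that imbalance on a small toy vector; the numbers and names here are illustrative, not taken from the deck's R code.

```python
# Hypothetical sketch: checking the class balance of the `Class` column.
# We mimic the dataset's severe imbalance with a small toy vector.
from collections import Counter

toy_class = [0] * 998 + [1] * 2      # 0 = legit, 1 = fraud

counts = Counter(toy_class)
fraud_rate = counts[1] / len(toy_class)
print(counts, fraud_rate)            # fraud is a tiny fraction of all rows
```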

Functions used for balancing the dataset:
Random oversampling: duplicates examples from the minority class in the training dataset, and can result in overfitting for some models.
Random undersampling: deletes examples from the majority class, and can result in losing information invaluable to a model.
Hybrid sampling: a combination of random undersampling and random oversampling.
SMOTE: synthesizes new examples for the minority class.
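The four strategies can be sketched in a few lines each on a toy one-dimensional dataset. These Python snippets are illustrative stand-ins for the R balancing functions the deck refers to, not the authors' code; `legit`, `fraud`, and the 50-row hybrid target are all made-up example values.

```python
import random

random.seed(0)

# Toy 1-D "Amount" features: 95 legit rows vs 5 fraud rows.
legit = [random.uniform(10, 100) for _ in range(95)]
fraud = [random.uniform(500, 900) for _ in range(5)]

# Random oversampling: duplicate minority examples until classes match.
fraud_over = fraud + random.choices(fraud, k=len(legit) - len(fraud))

# Random undersampling: drop majority examples until classes match.
legit_under = random.sample(legit, k=len(fraud))

# Hybrid sampling: undersample the majority and oversample the minority,
# meeting at an intermediate size (here, 50 rows per class).
legit_hybrid = random.sample(legit, k=50)
fraud_hybrid = fraud + random.choices(fraud, k=50 - len(fraud))

# SMOTE (sketch): instead of duplicating, interpolate between two real
# minority points to synthesize a *new* minority example.
def smote_sample(minority_points):
    a, b = random.sample(minority_points, 2)
    return a + random.random() * (b - a)

fraud_smote = fraud + [smote_sample(fraud) for _ in range(len(legit) - len(fraud))]
```

Real SMOTE interpolates toward one of the point's k nearest minority neighbours rather than an arbitrary minority point, but the interpolation idea is the same.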

Let's take a look at the model and try to understand it…

BEGINNING WITH THE UI

XGBOOST: the method used in the UI for fraud prediction. XGBoost (eXtreme Gradient Boosting) is an optimized distributed gradient boosting library. It provides parallel computing, regularization, built-in cross-validation, handling of missing values, flexibility, availability, save and reload of models, and tree pruning.

How does XGBoost work? XGBoost belongs to a family of boosting algorithms that convert weak learners into strong learners. A weak learner is one that performs only slightly better than random guessing. Boosting is a sequential process: trees are grown one after another, each using information from the previously grown trees. The process gradually learns from the data and improves its predictions in subsequent iterations.
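The sequential-correction idea can be sketched in a few lines: each round fits a one-split "stump" to the residuals of the ensemble so far, so later trees correct the mistakes of earlier ones. This is a didactic Python sketch under that description, not XGBoost itself (no regularization, no second-order gradients); all names are invented for the example.

```python
# Toy gradient boosting with one-split stumps on 1-D data, squared error.

def fit_stump(x, residual):
    """Find the best single threshold split under squared error."""
    best = None
    for thr in sorted(set(x)):
        left = [r for xi, r in zip(x, residual) if xi <= thr]
        right = [r for xi, r in zip(x, residual) if xi > thr]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi: lmean if xi <= thr else rmean

def boost(x, y, nrounds=20, eta=0.3):
    """Grow stumps sequentially; each fits the current residuals."""
    pred = [0.0] * len(x)
    for _ in range(nrounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        pred = [pi + eta * stump(xi) for pi, xi in zip(pred, x)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]      # 1 = fraud
pred = boost(x, y)
```

Because each round only adds `eta` times the stump's correction, the predictions approach the targets gradually over iterations, which is exactly the "slowly learns from data" behaviour described above.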

Parameters of XGBoost:
nrounds [default = 100]: controls the maximum number of iterations (trees grown).
eta [default = 0.3] [range: (0,1)]: controls the learning rate.
gamma [default = 0] [range: (0,Inf)]: controls regularization and prevents overfitting.
max_depth [default = 6] [range: (0,Inf)]: controls the depth of the tree.
min_child_weight [default = 1] [range: (0,Inf)]: if a leaf node has a sum of instance weights lower than min_child_weight, tree splitting stops.
subsample [default = 1] [range: (0,1)]: controls the fraction of samples supplied to a tree.
colsample_bytree [default = 1] [range: (0,1)]: controls the fraction of variables supplied to a tree.
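For illustration, the defaults quoted above can be collected into a parameter dictionary as they might be passed to the Python xgboost API. This mapping is an assumption on my part (the deck's UI is built in R), and the values are just the slide's defaults, not tuned choices.

```python
# The slide's default hyperparameters, arranged for xgboost's train() API.
params = {
    "eta": 0.3,              # learning rate
    "gamma": 0,              # regularization: minimum loss reduction to split
    "max_depth": 6,          # maximum tree depth
    "min_child_weight": 1,   # stop splitting below this sum of instance weights
    "subsample": 1,          # fraction of rows sampled per tree
    "colsample_bytree": 1,   # fraction of columns sampled per tree
}
nrounds = 100                # number of boosting iterations

# With the xgboost package installed and a DMatrix `dtrain` built from the
# data, training would look roughly like:
#   bst = xgboost.train(params, dtrain, num_boost_round=nrounds)
```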

LET'S LOOK AT THE UI

THANK YOU