Title: Fraud Detection Experiments
1. Fraud Detection Experiments
- Chase Credit Card
  - 500,000 records spanning one year
  - Evenly distributed
  - 20% fraud, 80% non-fraud
- First Union Credit Card
  - 500,000 records spanning one year
  - Unevenly distributed
  - 15% fraud, 85% non-fraud
2. Intra-bank experiments
- Classifier selection algorithm: coverage / TP-FP (sketch after this slide)
  - Let V be the validation set
  - Until no more examples in V can be covered:
    - Select the classifier with the highest TP-FP rate on V
    - Remove covered examples from V
- Setting
  - 12 subsets
  - 5 algorithms (Bayes, C4.5, CART, ID3, Ripper)
  - 6-fold cross-validation
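
Below is a minimal sketch of this selection loop, assuming hypothetical classifier objects with a predict method returning 1 for fraud and 0 otherwise, and a validation set V of (features, label) pairs; these names are illustrative, not from the slides:

    def tp_minus_fp(clf, examples):
        """TP rate minus FP rate of clf on (x, y) examples (y: 1 = fraud)."""
        preds = [clf.predict(x) for x, _ in examples]
        pos = [p for p, (_, y) in zip(preds, examples) if y == 1]
        neg = [p for p, (_, y) in zip(preds, examples) if y == 0]
        tp = sum(pos) / len(pos) if pos else 0.0
        fp = sum(neg) / len(neg) if neg else 0.0
        return tp - fp

    def select_by_coverage(classifiers, V):
        """Greedy coverage: repeatedly pick the classifier with the highest
        TP-FP rate on the remaining validation examples, then drop the
        examples it classifies correctly (covers)."""
        selected, remaining, pool = [], list(V), list(classifiers)
        while remaining and pool:
            best = max(pool, key=lambda c: tp_minus_fp(c, remaining))
            covered = [(x, y) for x, y in remaining if best.predict(x) == y]
            if not covered:  # no classifier covers anything new: stop
                break
            selected.append(best)
            pool.remove(best)
            remaining = [(x, y) for x, y in remaining if best.predict(x) != y]
        return selected

The early break matches the "until no more examples can be covered" condition: once no remaining classifier covers a new example, adding classifiers cannot help.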
3. TP-FP vs. number of classifiers
- Input base classifiers: Chase
- Test data set: Chase
- Best meta-classifier: Naïve Bayes with 25-32 base classifiers
4. TP-FP vs. number of classifiers
- Input base classifiers: First Union
- Test data set: First Union
- Best meta-classifier: Naïve Bayes with 10-17 base classifiers
5. Accuracy vs. number of classifiers
- Input base classifiers: Chase
- Test data set: Chase
- Best meta-classifier: Ripper with 50 base classifiers. Comparable performance is attained with 25-30 classifiers.
6. Accuracy vs. number of classifiers
- Input base classifiers: First Union
- Test data set: First Union
- Best meta-classifier: Ripper with 13 base classifiers
7. Intra-bank experiments
- Coverage / cost-model combined metric algorithm (sketch after this slide)
  - Let V be the validation set
  - Until no more examples in V can be covered:
    - Select the classifier Cj that achieves the highest savings on V
    - Remove covered examples from V
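
A sketch of the savings-ranked variant of the same greedy loop, under the assumption that each example carries its transaction amount in an "amt" field and that challenging a transaction costs a fixed overhead (the $75 Chase figure from slide 20 is used for illustration):

    OVERHEAD = 75.0  # dollars per challenged transaction (Chase figure)

    def savings(clf, examples):
        """Dollars saved by acting on clf's fraud alarms: catching a fraud
        recovers its amount minus the overhead; a false alarm costs the
        overhead. The 'amt' field is an assumed name."""
        total = 0.0
        for x, y in examples:
            if clf.predict(x) == 1:  # raise an alarm, pay to challenge
                total += (x["amt"] - OVERHEAD) if y == 1 else -OVERHEAD
        return total

    def select_by_savings(classifiers, V):
        """Same greedy coverage loop as before, ranked by savings."""
        selected, remaining, pool = [], list(V), list(classifiers)
        while remaining and pool:
            best = max(pool, key=lambda c: savings(c, remaining))
            covered = [(x, y) for x, y in remaining if best.predict(x) == y]
            if not covered:
                break
            selected.append(best)
            pool.remove(best)
            remaining = [(x, y) for x, y in remaining if best.predict(x) != y]
        return selected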
8. Savings vs. number of classifiers
- Input base classifiers: Chase
- Test data set: Chase
- Best meta-classifier: a single naïve Bayes base classifier ($820K)
9. Savings of base classifiers
- Input base classifiers: Chase
- Test data set: Chase
- Conclusion: learning algorithms focus on the binary classification problem. If base classifiers fail to detect expensive fraud, meta-learning cannot improve savings.
10. Savings vs. number of classifiers
- Input base classifiers: First Union
- Test data set: First Union
- Best meta-classifier: naïve Bayes with 22 base classifiers ($945K)
11. Savings of base classifiers
- Input base classifiers: First Union
- Test data set: First Union
- Conclusion: the majority of base classifiers are able to detect transactions that are both fraudulent and expensive. Meta-learning saves an additional $100K.
12. Different-distributions experiments
- Number of data sites: 6
- Training sets: 50% fraud / 50% non-fraud (resampling sketch after this slide)
- Testing sets: 20% fraud / 80% non-fraud
- Base classifiers: ID3, CART
- Meta-classifiers: ID3, CART, Bayes, Ripper
- Base classifiers: 81% TP, 29% FP
- Meta-classifiers: 86% TP, 25% FP
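
A sketch of how the 50/50 training distribution could be produced from the natural 20/80 data by undersampling legitimate transactions. The field names and the per-site sampling scheme are assumptions, since the slide does not say how the 6 data sites were populated:

    import random

    def resample_50_50(transactions, n_sites=6, seed=0):
        """Build n_sites training sets with a 50/50 fraud distribution by
        keeping every fraud and undersampling the majority (legitimate)
        class to match. Test data keeps the natural 20/80 mix."""
        rng = random.Random(seed)
        frauds = [t for t in transactions if t["label"] == 1]
        legit = [t for t in transactions if t["label"] == 0]
        sites = []
        for _ in range(n_sites):
            sample = frauds + rng.sample(legit, len(frauds))
            rng.shuffle(sample)
            sites.append(sample)
        return sites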
13. Inter-bank experiments
- Chase includes 2 attributes not present in First Union data
  - Add two fictitious fields
  - Classifier agents support unknown values
- Chase and First Union define an attribute with different semantics
  - Project Chase values onto First Union semantics (see the bridging sketch below)
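
A sketch of this schema bridging with pandas; the column names and value map are illustrative placeholders, not the actual Chase/First Union fields:

    import numpy as np
    import pandas as pd

    def bridge_schema(df: pd.DataFrame, missing_cols, remap: dict) -> pd.DataFrame:
        """Make one bank's data usable by the other bank's classifiers:
        - add the attributes absent from this data set as fictitious
          fields filled with 'unknown' markers, relying on the classifier
          agents' support for unknown values;
        - project attributes whose semantics differ via an explicit
          value map (values with no mapping also become unknown)."""
        out = df.copy()
        for col in missing_cols:  # e.g. the 2 Chase-only attributes
            out[col] = np.nan     # NaN stands in for 'unknown'
        for col, mapping in remap.items():
            out[col] = out[col].map(mapping)
        return out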
14. Inter-bank experiments
- Input base classifiers: Chase
- Test data set: Chase and First Union
- Task: compare the TP and FP rates of a classifier on different test sets.
- Conclusion: Chase classifiers CAN be applied to First Union data, but not without penalty.
15. Inter-bank experiments
- Input base classifiers: First Union
- Test data set: First Union and Chase
- Task: compare the TP and FP rates of a classifier on different test sets.
- Conclusion: First Union classifiers CAN be applied to Chase data, but not without penalty.
16. TP-FP vs. number of classifiers
- Input base classifiers: First Union and Chase
- Test data set: Chase
- Result
  - Ripper, CART comparable
  - Naïve Bayes slightly superior
  - C4.5, ID3 inferior
17. Accuracy vs. number of classifiers
- Input base classifiers: First Union and Chase
- Test data set: Chase
- Result
  - CART, Ripper comparable
  - Naïve Bayes, C4.5, ID3 inferior
18. TP-FP vs. number of classifiers
- Input base classifiers: First Union and Chase
- Test data set: First Union
- Result
  - Naïve Bayes, C4.5, CART comparable only when using all classifiers
  - Ripper superior only when using all classifiers
  - ID3 inferior
19. Accuracy vs. number of classifiers
- Input base classifiers: First Union and Chase
- Test data set: First Union
- Result
  - Naïve Bayes, C4.5, CART, Ripper comparable only when using all classifiers
  - ID3 inferior
20. Chase: maximum fraud loss $1,470K (overhead = $75)
21. First Union: maximum fraud loss $1,085K (overhead = $75)
22. Aggregate Cost Model
- $X: the overhead to challenge a suspected fraud (a sketch of the model follows)
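
The slide gives only the overhead $X; the per-outcome case analysis below is an assumed reading, consistent with the savings figures elsewhere in these slides (a missed fraud loses the full amount, a challenge costs the overhead, and transactions cheaper than the overhead are not worth challenging):

    def aggregate_cost(pred, actual_fraud, amt, overhead):
        """Per-transaction cost under an aggregate cost model with a fixed
        overhead X for challenging a suspected fraud:
          missed fraud        -> lose the full amount
          challenged fraud    -> pay the overhead (if the amount warrants it)
          false alarm         -> pay the overhead
          correct non-fraud   -> free
        """
        if pred == 1:             # we would challenge the transaction
            if amt <= overhead:   # not worth challenging; let it go
                return amt if actual_fraud else 0.0
            return overhead
        return amt if actual_fraud else 0.0

    # Total cost of a classifier is the sum over all test transactions;
    # savings = (loss with no detection at all) - (aggregate cost).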
23. Experiment set-up
- Training data set: 10/1995 - 7/1996
- Testing data set: 9/1996
- Each data point is the average of the 10 classifiers (Oct. 1995 to July 1996)
- Training set size: 6,400 transactions (to allow up to 90% fraud)
24. Average Aggregate Cost (C4.5)
25. Accuracy (C4.5)
26. Average Aggregate Cost (CART)
27. Accuracy (CART)
28. Average Aggregate Cost (RIPPER)
29. Accuracy (RIPPER)
30. Average Aggregate Cost (BAYES)
31. Accuracy (BAYES)
32. Amount Saved (overhead = $100)
- Fraud in training data: 30.00
- Fraud in training data: 23.14
- Maximum saving: $1,337K
- Losses/transaction if no detection: $40.81
33. Do patterns change over time?
- Entire Chase credit card data set
- Original fraud rate (20% fraud, 80% non-fraud)
- Due to billing-cycle and fraud-investigation delays, training data are 2 months older than testing data
- Two experiments were conducted with different training data sets
- Test data set: 9/1996 (the last month)
34. Training data sets (the incremental windows are sketched after this slide)
- Back-in-time experiment
  - July 1996
  - June - July 1996
  - ...
  - October 1995 ... July 1996
- Forward-in-time experiment
  - October 1995
  - October 1995 - November 1995
  - ...
  - October 1995 ... July 1996
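
The two window schedules can be written down directly; the month labels are taken from the slide:

    MONTHS = ["1995-10", "1995-11", "1995-12", "1996-01", "1996-02",
              "1996-03", "1996-04", "1996-05", "1996-06", "1996-07"]

    # Back in time: grow the window from July 1996 back toward October 1995.
    back_in_time = [MONTHS[-k:] for k in range(1, len(MONTHS) + 1)]

    # Forward in time: grow the window from October 1995 toward July 1996.
    forward_in_time = [MONTHS[:k] for k in range(1, len(MONTHS) + 1)]

    # Both experiments test on 1996-09, two months after the newest
    # training month (the billing/investigation delay from slide 33).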
35. Patterns don't change: Accuracy
36. Patterns don't change: Savings
37. Divide and Conquer Conflict Resolving
- Conflicts: base-level examples with different class labels yet the same predicted classifications (a detection sketch follows)
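
A sketch of how such conflicts can be found in class-combiner meta-level data, where each example is the tuple of base predictions plus the true label (the three-classifier pattern follows slide 39):

    from collections import defaultdict

    def find_conflicts(meta_data):
        """Group meta-level examples by their base-prediction pattern and
        report patterns that occur with both class labels, i.e. conflicts.
        meta_data: list of ((p_id3, p_cart, p_ripper), true_label)."""
        by_pattern = defaultdict(set)
        for pattern, label in meta_data:
            by_pattern[pattern].add(label)
        return [p for p, labels in by_pattern.items() if len(labels) > 1]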
38. Class-combiner meta-level training data
39. Prevalence of Conflicts in Meta-level Training Data
- Note: table columns are True Label, ID3, CART, RIPPER
- 1 = fraud, 0 = non-fraud
40. Divide and Conquer Conflict Resolving (cont'd)
- We divide the training set into subsets of training data according to each conflict pattern
- For each subset, recursively apply divide-and-conquer until a stopping criterion is met
- We use a rote table to learn the meta-level training data (sketch below)
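
A simplified sketch of the rote table and the recursive split. The stopping criterion (pure subset or a maximum depth) and the retraining of base classifiers on each subset are assumptions, since the slide does not spell them out:

    from collections import Counter, defaultdict

    def learn_rote_table(meta_data):
        """Rote table: map each base-prediction pattern to the majority
        true label among meta-level examples showing that pattern."""
        votes = defaultdict(Counter)
        for pattern, label in meta_data:
            votes[pattern][label] += 1
        return {p: c.most_common(1)[0][0] for p, c in votes.items()}

    def divide_and_conquer(records, learn_base, depth=0, max_depth=3):
        """records: base-level (x, y) pairs. learn_base trains the base
        classifiers on records and returns a function x -> prediction
        pattern. Pure subsets stop the recursion; conflicting subsets
        are re-learned on their own data."""
        majority = Counter(y for _, y in records).most_common(1)[0][0]
        if len({y for _, y in records}) == 1 or depth == max_depth:
            return lambda x: majority
        pattern_of = learn_base(records)
        subsets = defaultdict(list)
        for x, y in records:
            subsets[pattern_of(x)].append((x, y))
        sub_models = {p: divide_and_conquer(rs, learn_base, depth + 1, max_depth)
                      for p, rs in subsets.items()}
        return lambda x: sub_models.get(pattern_of(x), lambda _: majority)(x)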
41. Experiment set-up
- A full year's Chase credit card data, with the natural fraud percentage (20%). Fields not available at authorization time were removed.
- Each month from Oct. 1995 to July 1996 was used as a training set.
- Each testing set was chosen from the month 2 months later; in the real world, it takes 2 months for billing and fraud investigation.
- Results are averages over 10 runs.
42. Results
- Without the conflict-resolving technique, using only a rote table to learn the meta-level data:
  - Overall accuracy: 88.8%
  - True positive: 59.8%
  - False positive: 3.81%
- With the conflict-resolving technique:
  - Overall accuracy: 89.1% (an increase of 0.3%)
  - True positive: 61.2% (an increase of 1.4%)
  - False positive: 3.88% (an increase of 0.07%)
43. Achievable Maximum Accuracy
- A nearest-neighbor approach is used to estimate a loose upper bound on the maximum accuracy we can achieve (see the sketch below)
- The algorithm calculates the percentage of noise in the training data
- The bound is approximately 91.0%, so we are within 1.9% of the maximum accuracy
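
A sketch of one such nearest-neighbor noise estimate: an example whose nearest other example carries a different label is counted as noise, and the bound is one minus the noise fraction. The exact estimator behind the 91.0% figure is not given in the slides:

    import numpy as np

    def max_accuracy_upper_bound(X, y):
        """Loose upper bound on achievable accuracy: the fraction of
        examples whose nearest (other) neighbor agrees with their label.
        Brute-force O(n^2) distance computation for clarity."""
        X, y = np.asarray(X, float), np.asarray(y)
        n, noisy = len(X), 0
        for i in range(n):
            d = np.linalg.norm(X - X[i], axis=1)
            d[i] = np.inf  # exclude the point itself
            if y[np.argmin(d)] != y[i]:
                noisy += 1
        return 1.0 - noisy / n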
44. Accuracy Result