Title: A Combinatorial Fusion Method for Feature Mining
1. A Combinatorial Fusion Method for Feature Mining
- Ye Tian, Gary Weiss, D. Frank Hsu, Qiang Ma
- Fordham University
- Presented by Gary Weiss
2. Introduction
- Feature construction/engineering is often a critical step in the data mining process
  - It can be very time-consuming and may require a lot of manual effort
- Our approach is to use a combinatorial method to automatically construct new features
  - We refer to this as feature fusion
- The method is geared toward helping to predict rare classes
- For now it is restricted to numerical features, but it can be extended to other feature types
3. How does this relate to MMIS?
- One MMIS category is local pattern analysis
  - How to efficiently identify quality knowledge from a single data source
  - It lists data preparation and selection as subtopics and also mentions fusion
- We acknowledge that this work is probably not what most people think of as MMIS
4. How can we view this work as MMIS?
- Think of each feature as a piece of information
  - Our fusion approach integrates these pieces
- Fusion itself is a proper topic for MMIS, since it can also be used with multiple information sources
  - The fusion method we employ does not really care whether the information (i.e., the features) comes from a single source
- As the complexity of the constructed features increases, each can be viewed as a classifier
  - Each fused feature is then an information source
  - This view is bolstered by other work on data fusion that uses ensembles to combine the fused features
5. Description of the Method
- A data set is a collection of records where each feature has a score
  - We assume numerical features
- We then replace the scores with ranks (see the sketch after this list)
  - The ordering of the ranks is determined by whether larger or smaller scores better predict the class
- Compute the performance of each feature
- Compute the performance of feature combinations
- Decide which combinations to evaluate/use
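A minimal sketch (mine, not from the slides) of the first two steps in Python: a toy table of feature scores, with each feature's scores replaced by ranks. The feature names, values, and helper name are hypothetical.

```python
import pandas as pd

# Step 1: a toy data set; each record has a score per feature and a class label.
df = pd.DataFrame({
    "F1": [5.2, 3.1, 8.7, 1.4, 6.0],
    "F2": [0.9, 2.2, 0.3, 1.8, 1.1],
    "class": [1, 0, 1, 0, 0],  # 1 = minority class
})

# Step 2: replace scores with ranks (1 = best). Whether larger or smaller
# scores are "better" is chosen per feature by how well it predicts the class.
def scores_to_ranks(scores, higher_is_better=True):
    return scores.rank(ascending=not higher_is_better, method="first").astype(int)

ranks = df.assign(F1=scores_to_ranks(df["F1"], higher_is_better=True),
                  F2=scores_to_ranks(df["F2"], higher_is_better=False))
print(ranks)
```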
6. Step 1: A data set
7. Step 2: Scores replaced by ranks
8. Step 3: Compute Feature Performance
- Performance measures how well a feature predicts the minority class
- We sort the rows by feature rank and measure performance on the top n rows, where n is the number of records that belong to the minority class
- In this case we evaluate on the top 3 rows; since 2 of the 3 are minority (class = 1), performance = .66
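A sketch of this measure, assuming (as in the slide's example) that performance is the fraction of minority-class records among the top-n ranked rows; the helper name and sample values are made up.

```python
def feature_performance(ranks, labels, minority=1):
    """Fraction of minority-class records among the top-n ranked rows,
    where n = number of minority-class records in the data."""
    n = sum(1 for y in labels if y == minority)
    top = sorted(range(len(ranks)), key=lambda i: ranks[i])[:n]
    return sum(1 for i in top if labels[i] == minority) / n

# Hypothetical ranks for one feature over 6 records, 3 of them minority:
ranks  = [1, 4, 2, 6, 3, 5]
labels = [1, 0, 1, 0, 0, 1]
print(feature_performance(ranks, labels))  # 2 of the top 3 -> 0.66...
```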
9. Step 3 (continued)
10. Step 4: Compute the Performance of Feature Combinations
- Let F6 be the fusion of F1, F2, F3, F4, and F5
- The rank combination function is the average of the ranks
- Compute the rank of F6 for each record
- Compute the performance of F6 as in Step 3
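A small sketch of the rank combination function: average the component ranks per record, then convert the averages back into ranks. The helper name and the first-come tie-breaking rule are my own choices.

```python
def fuse_ranks(rank_columns):
    """Fuse features by averaging their ranks per record, then re-ranking.
    rank_columns: one list of ranks per component feature."""
    n_records = len(rank_columns[0])
    avg = [sum(col[i] for col in rank_columns) / len(rank_columns)
           for i in range(n_records)]
    # Turn the average ranks back into ranks (1 = best; ties kept in order).
    order = sorted(range(n_records), key=lambda i: avg[i])
    fused = [0] * n_records
    for rank, i in enumerate(order, start=1):
        fused[i] = rank
    return fused

# Hypothetical ranks for F1 and F2 over 5 records; F6 = fusion of the two.
f1 = [1, 3, 2, 5, 4]
f2 = [2, 1, 4, 3, 5]
print(fuse_ranks([f1, f2]))  # -> [1, 2, 3, 4, 5]
```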
11. Step 5: What Combinations to Use?
- Given n features there are 2^n - 1 possible combinations
  - C(n,1) + C(n,2) + ... + C(n,n)
- This fully exhaustive fusion strategy is practical for many values of n
- We try other strategies in case it is not feasible (see the sketch after this list)
  - The k-exhaustive strategy selects the k best features and tries all of their combinations
  - The k-fusion strategy uses all n features but fuses at most k features at a time
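The three generation strategies can be sketched as follows; the function names are mine, and perf stands for any per-feature performance function.

```python
from itertools import combinations

def exhaustive(features):
    """All 2^n - 1 non-empty feature combinations."""
    for r in range(1, len(features) + 1):
        yield from combinations(features, r)

def k_exhaustive(features, perf, k):
    """All combinations of the k individually best features."""
    best = sorted(features, key=perf, reverse=True)[:k]
    yield from exhaustive(best)

def k_fusion(features, k):
    """All combinations of at most k of the n features."""
    for r in range(1, k + 1):
        yield from combinations(features, r)

feats = ["F1", "F2", "F3", "F4", "F5", "F6"]
print(sum(1 for _ in exhaustive(feats)))   # 63 = 2^6 - 1
print(sum(1 for _ in k_fusion(feats, 3)))  # 41 = C(6,1) + C(6,2) + C(6,3)
```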
12. Combinatorial Fusion Table
13. Combinatorial Fusion Algorithm
- The combinatorial strategy generates candidate features
- The performance metric determines which features are best
  - It is used to determine which k features to use for k-fusion
  - It is also used to determine the order in which features are added
- We add a feature only if it leads to a statistically significant improvement (p < .10), as sketched below
  - As measured on validation data
  - This limits the number of features, but requires a lot of computation
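A skeleton of the selection step, under my assumptions: a greedy loop over candidates ordered by performance, with the classifier building and significance test abstracted behind a significantly_improves callback.

```python
def greedy_select(candidates, significantly_improves):
    """candidates: (feature_name, performance) pairs, e.g. fused features.
    significantly_improves(selected, feature): stand-in for training a
    classifier with and without the feature and checking, on validation
    data, whether the improvement is statistically significant."""
    selected = []
    for feat, _ in sorted(candidates, key=lambda c: c[1], reverse=True):
        if significantly_improves(selected, feat):
            selected.append(feat)
    return selected

# Toy run with a stand-in acceptance test (accept at most 2 features).
cands = [("F1+F2", 0.72), ("F1+F4+F5", 0.61), ("F3", 0.55)]
print(greedy_select(cands, lambda sel, f: len(sel) < 2))
```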
14. Example Run of the Algorithm
15. Description of Experiments
- We use Weka's DT, 1-NN, and Naïve Bayes methods
- We analyze performance on 10 data sets, with and without the fused features
- We focus on AUC as the main metric
  - It is more appropriate than accuracy, especially with skewed data
- We use 3 combinatorial fusion strategies: 2-fusion, 3-fusion, and 6-exhaustive
16. Results
Summary Results over All 10 Data Sets
Results over the 4 Most Skewed Data Sets (< 10% Minority)
17. Discussion of Results
- No one of the 3 fusion schemes is clearly best
- The methods seem to help, but the biggest improvement is clearly with the DT method
  - This may be explained by traditional DT methods having limited expressive power: they can only consider 1 feature at a time
  - They can never perfectly learn simple concepts like F1 + F2 > 10, but they can with feature fusion (see the sketch below)
- The improvement is bigger for highly skewed data sets
  - Identifying rare cases is difficult and may require looking at many features in parallel
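A tiny illustration (not from the paper, and using a raw sum rather than rank averaging, purely to make the expressiveness point): no single axis-parallel split on F1 or F2 separates this hypothetical sample, but one threshold on the fused value F1 + F2 does.

```python
# Hypothetical 2-feature records labeled by the concept F1 + F2 > 10.
points = [(2, 9), (9, 2), (3, 3), (8, 1), (1, 8), (6, 6)]
labels = [True, True, False, False, False, True]

# Sorted by F1, the labels interleave, so no single axis-parallel split
# (the only kind a traditional decision-tree node makes) is perfect:
print(sorted((f1, lab) for (f1, _), lab in zip(points, labels)))
# -> [(1, False), (2, True), (3, False), (6, True), (8, False), (9, True)]

# A single threshold on the fused feature F1 + F2 classifies everything:
print(all((f1 + f2 > 10) == lab for (f1, f2), lab in zip(points, labels)))
```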
18. Future Work
- More comprehensive experiments
  - More data sets, more skewed data sets, more combinatorial fusion strategies
- Use heuristics to choose fused features more intelligently
  - The performance measure is currently used only to order the candidates
  - Use of diversity measures
- Avoid building a classifier to determine which fused features to add
- Handle non-numerical features
19. Conclusion
- We showed how a method from information fusion can be applied to feature construction
- The results are encouraging, but more study is needed
- Extending the method should lead to further improvements
20. Questions?
21. Detailed Results: Accuracy
22. Detailed Results: AUC