Title: A Combinatorial Fusion Method for Feature Mining
1. A Combinatorial Fusion Method for Feature Mining
- Ye Tian, Gary Weiss, D. Frank Hsu, Qiang Ma
- Fordham University
- Presented by Gary Weiss
2. Introduction
- Feature construction/engineering is often a critical step in the data mining process
  - It can be very time-consuming and may require a lot of manual effort
- Our approach is to use a combinatorial method to automatically construct new features
  - We refer to this as feature fusion
- The method is geared toward helping to predict rare classes
- For now it is restricted to numerical features, but it can be extended to other feature types
3. How does this relate to MMIS?
- One MMIS category is local pattern analysis
  - How to efficiently identify quality knowledge from a single data source
  - It lists data preparation and selection as subtopics and also mentions fusion
- We acknowledge that this work is probably not what most people think of as MMIS
4. How can we view this work as MMIS?
- Think of each feature as a piece of information
  - Our fusion approach integrates these pieces
- Fusion itself is a proper topic for MMIS, since it can also be used with multiple information sources
  - The fusion method we employ does not really care whether the information (i.e., the features) comes from a single source
- As the complexity of the constructed features increases, each can be viewed as a classifier
  - Each fused feature is then an information source
  - This view is bolstered by other work on data fusion that uses ensembles to combine the fused features
5. Description of the Method
- A data set is a collection of records where each feature has a score
  - We assume numerical features
- We then replace the scores with ranks (see the sketch after this list)
  - The ordering of the ranks is determined by whether larger or smaller scores better predict the class
- Compute the performance of each feature
- Compute the performance of feature combinations
- Decide which combinations to evaluate/use
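A minimal sketch (mine, not from the slides) of the first two steps in Python: a toy table of feature scores, with each feature's scores replaced by ranks. The feature names, values, and helper name are hypothetical.

```python
import pandas as pd

# Step 1: a toy data set; each record has a score per feature and a class label.
df = pd.DataFrame({
    "F1": [5.2, 3.1, 8.7, 1.4, 6.0],
    "F2": [0.9, 2.2, 0.3, 1.8, 1.1],
    "class": [1, 0, 1, 0, 0],  # 1 = minority class
})

# Step 2: replace scores with ranks (1 = best). Whether larger or smaller
# scores are "better" is chosen per feature by how well it predicts the class.
def scores_to_ranks(scores, higher_is_better=True):
    return scores.rank(ascending=not higher_is_better, method="first").astype(int)

ranks = df.assign(F1=scores_to_ranks(df["F1"], higher_is_better=True),
                  F2=scores_to_ranks(df["F2"], higher_is_better=False))
print(ranks)
```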
6. Step 1: A data set
7. Step 2: Scores replaced by ranks
8. Step 3: Compute Feature Performance
- Performance measures how well a feature predicts the minority class
- We sort the rows by feature rank and measure performance on the top n rows, where n is the number of records that belong to the minority class
- In this case we evaluate on the top 3 rows; since 2 of the 3 are minority (class = 1), performance = .66
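A sketch of this measure, assuming (as in the slide's example) that performance is the fraction of minority-class records among the top-n ranked rows; the helper name and sample values are made up.

```python
def feature_performance(ranks, labels, minority=1):
    """Fraction of minority-class records among the top-n ranked rows,
    where n = number of minority-class records in the data."""
    n = sum(1 for y in labels if y == minority)
    top = sorted(range(len(ranks)), key=lambda i: ranks[i])[:n]
    return sum(1 for i in top if labels[i] == minority) / n

# Hypothetical ranks for one feature over 6 records, 3 of them minority:
ranks  = [1, 4, 2, 6, 3, 5]
labels = [1, 0, 1, 0, 0, 1]
print(feature_performance(ranks, labels))  # 2 of the top 3 -> 0.66...
```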
9. Step 3 (continued)
10. Step 4: Compute the Performance of Feature Combinations
- Let F6 be the fusion of F1, F2, F3, F4, and F5
- The rank combination function is the average of the ranks
- Compute the rank of F6 for each record
- Compute the performance of F6 as in Step 3
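A small sketch of the rank combination function: average the component ranks per record, then convert the averages back into ranks. The helper name and the first-come tie-breaking rule are my own choices.

```python
def fuse_ranks(rank_columns):
    """Fuse features by averaging their ranks per record, then re-ranking.
    rank_columns: one list of ranks per component feature."""
    n_records = len(rank_columns[0])
    avg = [sum(col[i] for col in rank_columns) / len(rank_columns)
           for i in range(n_records)]
    # Turn the average ranks back into ranks (1 = best; ties kept in order).
    order = sorted(range(n_records), key=lambda i: avg[i])
    fused = [0] * n_records
    for rank, i in enumerate(order, start=1):
        fused[i] = rank
    return fused

# Hypothetical ranks for F1 and F2 over 5 records; F6 = fusion of the two.
f1 = [1, 3, 2, 5, 4]
f2 = [2, 1, 4, 3, 5]
print(fuse_ranks([f1, f2]))  # -> [1, 2, 3, 4, 5]
```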
11. Step 5: What Combinations to Use?
- Given n features there are 2^n - 1 possible combinations
  - C(n,1) + C(n,2) + ... + C(n,n)
- This fully exhaustive fusion strategy is practical for many values of n
- We try other strategies in case it is not feasible (see the sketch after this list)
  - The k-exhaustive strategy selects the k best features and tries all of their combinations
  - The k-fusion strategy uses all n features but fuses at most k features at a time
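The three generation strategies can be sketched as follows; the function names are mine, and perf stands for any per-feature performance function.

```python
from itertools import combinations

def exhaustive(features):
    """All 2^n - 1 non-empty feature combinations."""
    for r in range(1, len(features) + 1):
        yield from combinations(features, r)

def k_exhaustive(features, perf, k):
    """All combinations of the k individually best features."""
    best = sorted(features, key=perf, reverse=True)[:k]
    yield from exhaustive(best)

def k_fusion(features, k):
    """All combinations of at most k of the n features."""
    for r in range(1, k + 1):
        yield from combinations(features, r)

feats = ["F1", "F2", "F3", "F4", "F5", "F6"]
print(sum(1 for _ in exhaustive(feats)))   # 63 = 2^6 - 1
print(sum(1 for _ in k_fusion(feats, 3)))  # 41 = C(6,1) + C(6,2) + C(6,3)
```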
12. Combinatorial Fusion Table
13. Combinatorial Fusion Algorithm
- The combinatorial strategy generates candidate features
- The performance metric determines which features are best
  - It is used to determine which k features to use for k-fusion
  - It is also used to determine the order in which features are added
- We add a feature only if it leads to a statistically significant improvement (p < .10), as sketched below
  - As measured on validation data
  - This limits the number of features, but requires a lot of computation
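A skeleton of the selection step, under my assumptions: a greedy loop over candidates ordered by performance, with the classifier building and significance test abstracted behind a significantly_improves callback.

```python
def greedy_select(candidates, significantly_improves):
    """candidates: (feature_name, performance) pairs, e.g. fused features.
    significantly_improves(selected, feature): stand-in for training a
    classifier with and without the feature and checking, on validation
    data, whether the improvement is statistically significant."""
    selected = []
    for feat, _ in sorted(candidates, key=lambda c: c[1], reverse=True):
        if significantly_improves(selected, feat):
            selected.append(feat)
    return selected

# Toy run with a stand-in acceptance test (accept at most 2 features).
cands = [("F1+F2", 0.72), ("F1+F4+F5", 0.61), ("F3", 0.55)]
print(greedy_select(cands, lambda sel, f: len(sel) < 2))
```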
14. Example Run of the Algorithm
15. Description of Experiments
- We use Weka's DT, 1-NN, and Naïve Bayes methods
- We analyze performance on 10 data sets, with and without the fused features
- We focus on AUC as the main metric
  - It is more appropriate than accuracy, especially with skewed data
- We use 3 combinatorial fusion strategies: 2-fusion, 3-fusion, and 6-exhaustive
16. Results
Summary Results over All 10 Data Sets
Results over the 4 Most Skewed Data Sets (< 10% Minority)
17. Discussion of Results
- No one of the 3 fusion schemes is clearly best
- The methods seem to help, but the biggest improvement is clearly with the DT method
  - This may be explained by traditional DT methods having limited expressive power: they can only consider 1 feature at a time
  - They can never perfectly learn simple concepts like F1 + F2 > 10, but they can with feature fusion (see the sketch below)
- The improvement is bigger for highly skewed data sets
  - Identifying rare cases is difficult and may require looking at many features in parallel
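A tiny illustration (not from the paper, and using a raw sum rather than rank averaging, purely to make the expressiveness point): no single axis-parallel split on F1 or F2 separates this hypothetical sample, but one threshold on the fused value F1 + F2 does.

```python
# Hypothetical 2-feature records labeled by the concept F1 + F2 > 10.
points = [(2, 9), (9, 2), (3, 3), (8, 1), (1, 8), (6, 6)]
labels = [True, True, False, False, False, True]

# Sorted by F1, the labels interleave, so no single axis-parallel split
# (the only kind a traditional decision-tree node makes) is perfect:
print(sorted((f1, lab) for (f1, _), lab in zip(points, labels)))
# -> [(1, False), (2, True), (3, False), (6, True), (8, False), (9, True)]

# A single threshold on the fused feature F1 + F2 classifies everything:
print(all((f1 + f2 > 10) == lab for (f1, f2), lab in zip(points, labels)))
```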
18. Future Work
- More comprehensive experiments
  - More data sets, more skewed data sets, more combinatorial fusion strategies
- Use heuristics to choose fused features more intelligently
  - The performance measure is currently used only to order the candidates
  - Use of diversity measures
- Avoid building a classifier to determine which fused features to add
- Handle non-numerical features
19. Conclusion
- We showed how a method from information fusion can be applied to feature construction
- The results are encouraging, but more study is needed
- Extending the method should lead to further improvements
20. Questions?
21. Detailed Results: Accuracy
22. Detailed Results: AUC