Title: Feature Selection and Causal Discovery
1. Feature Selection and Causal Discovery
- Isabelle Guyon, Clopinet
- André Elisseeff, IBM Zürich
- Constantin Aliferis, Vanderbilt University
2. Road Map
Feature selection
- What is feature selection?
- Why is it hard?
- What works best in practice?
- How to make progress using causality?
Causal discovery
- Can causal discovery benefit from feature selection?
3. Introduction
4. Causal discovery
- What affects your health?
- What affects the economy?
- What affects climate change?
- ... and which actions will have beneficial effects?
5. Feature Selection
(Diagram: features X and target Y)
Remove features Xi to improve (or at least not degrade) the prediction of Y.
6. Uncovering Dependencies
(Diagram: factors of variability, classified as actual vs. artifactual, known vs. unknown, observable vs. unobservable, controllable vs. uncontrollable)
7. Predictions and Actions
(Diagram: features X and target Y)
See e.g. Judea Pearl, Causality, 2000.
8. Predictive power of causes and effects
(Diagram: Smoking, Lung disease, Coughing)
Smoking is a better predictor of lung disease than coughing.
9. Causal feature selection
- Abandon the usual motto of predictive modeling: "we don't care about causality."
- Feature selection may benefit from introducing a notion of causality:
  - To be able to predict the consequences of given actions.
  - To add robustness to the predictions if the input distribution changes.
  - To get more compact and robust feature sets.
10. FS-enabled causal discovery
- Isn't causal discovery solved with experiments?
- No! Randomized Controlled Trials (RCTs) may be
  - Unethical (e.g. an RCT about the effects of smoking)
  - Costly and time consuming
  - Impossible (e.g. astronomy)
- Observational data may be available to help plan future experiments ⇒ causal discovery may benefit from feature selection.
11. Feature selection basics
12. Individual Feature Irrelevance
- P(Xi, Y) = P(Xi) P(Y)
- Equivalently: P(Xi | Y) = P(Xi)
(Figure: class-conditional densities of xi; an irrelevant feature has the same density for both classes. A numeric sketch follows.)
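A minimal numeric sketch of this criterion, using synthetic data and scikit-learn's mutual information estimator (the variable names and the data-generating process are illustrative assumptions, not part of the tutorial): a feature independent of Y gets a score near zero, a dependent one does not.

```python
# Illustrative sketch (assumed synthetic data): score each feature with an
# estimate of its mutual information with Y; independence, P(Xi, Y) = P(Xi)P(Y),
# corresponds to a score near zero.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)              # binary target
x_rel = y + rng.normal(0, 0.5, size=1000)      # shifted by y: individually relevant
x_irr = rng.normal(0, 1, size=1000)            # pure noise: individually irrelevant

X = np.column_stack([x_rel, x_irr])
print(mutual_info_classif(X, y, random_state=0))  # first score >> second (~0)
```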
13. Individual Feature Relevance
(Figure: class-conditional densities of xi for classes -1 and +1, with means m-, m+ and standard deviations s-, s+; a relevant feature separates the two classes.)
14. Univariate selection may fail
(Guyon & Elisseeff, JMLR 2004; Springer 2006)
15. Multivariate FS is complex
(Kohavi & John, 1997)
n features, 2^n possible feature subsets!
16. FS strategies
- Wrappers
  - Use the target risk functional to evaluate feature subsets.
  - Train one learning machine for each feature subset investigated.
- Filters
  - Use an evaluation function other than the target risk functional.
  - Often no learning machine is involved in the feature selection process.
(A sketch contrasting the two strategies follows.)
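A hedged sketch of the two strategies on synthetic data (the dataset, the models, and the subset size are illustrative assumptions): the filter ranks features with a univariate statistic and involves no learning machine, while the wrapper evaluates each candidate subset by the cross-validated accuracy of a learning machine trained on it.

```python
# Illustrative sketch (assumed dataset and models): filter vs. wrapper.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)

# Filter: rank features by a univariate statistic, no learning machine involved.
f_scores, _ = f_classif(X, y)
print("filter ranking:", np.argsort(f_scores)[::-1])

# Wrapper: evaluate each candidate subset with the cross-validated risk of the
# learning machine itself (restricted here to subsets of size 3 for tractability).
best_subset, best_acc = None, -np.inf
for subset in combinations(range(X.shape[1]), 3):
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, list(subset)], y, cv=5).mean()
    if acc > best_acc:
        best_subset, best_acc = subset, acc
print("wrapper choice:", best_subset, round(best_acc, 3))
```

The wrapper loop makes the 2^n cost of subset search explicit: it is affordable here only because n = 6 and the search is restricted to subsets of size 3.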
17. Reducing complexity
- For wrappers
  - Use forward or backward selection: O(n^2) steps.
  - Mix forward and backward search, e.g. floating search.
- For filters
  - Use a cheap evaluation function (no learning machine).
  - Make independence assumptions: n evaluations.
- Embedded methods
  - Do not retrain the LM at every step, e.g. RFE: n steps.
  - Search the FS space and LM parameter space simultaneously, e.g. 1-norm/Lasso approaches (see the sketch below).
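A sketch of the two embedded methods named above, using scikit-learn as a stand-in (the data and hyperparameters are illustrative assumptions): RFE repeatedly drops the lowest-weight feature and re-fits, while a 1-norm (Lasso-style) penalty drives some weights to exactly zero during a single fit.

```python
# Illustrative sketch (assumed data and hyperparameters) of two embedded methods.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE: repeatedly drop the feature with the smallest weight and re-fit,
# roughly n steps instead of searching 2^n subsets.
rfe = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=3).fit(X, y)
print("RFE keeps:", rfe.support_)

# 1-norm / Lasso-style approach: the L1 penalty drives some weights to exactly
# zero, selecting features and fitting the model in one optimization.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("non-zero weights:", (l1_model.coef_ != 0).ravel())
```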
18. In practice
- Univariate feature selection often yields better accuracy results than multivariate feature selection.
- NO feature selection at all sometimes gives the best accuracy results, even in the presence of known distracters.
- Multivariate methods usually claim only better parsimony.
- How can we make multivariate FS work better?
NIPS 2003 and WCCI 2006 challenges: http://clopinet.com/challenges
19. Definition of irrelevance
- We want to determine whether a variable Xi is relevant to the target Y.
- Surely irrelevant feature:
  - P(Xi, Y | S\i) = P(Xi | S\i) P(Y | S\i)
  - for all S\i ⊆ X\i
  - for all assignments of values to S\i
- Are all non-irrelevant features relevant? (A toy illustration of this test follows.)
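A toy illustration of the "surely irrelevant" test, under the assumption of discrete variables so that conditional mutual information can be estimated from counts (the data and the cmi helper are illustrative, not a standard library routine): the independence must hold for every subset S\i of the remaining features.

```python
# Illustrative sketch (assumed discrete synthetic data; cmi is a hand-rolled
# helper, not a library routine): Xi is "surely irrelevant" only if
# I(Xi; Y | S) is (approximately) zero for every subset S of the other features.
import numpy as np
from collections import Counter
from itertools import combinations

def cmi(x, y, S):
    """Empirical conditional mutual information I(X; Y | S) for discrete data.
    S is an (n, k) array of conditioning columns; k may be 0."""
    n = len(x)
    s_keys = [tuple(row) for row in S]            # () when there is no conditioning
    p_xys, p_xs = Counter(zip(x, y, s_keys)), Counter(zip(x, s_keys))
    p_ys, p_s = Counter(zip(y, s_keys)), Counter(s_keys)
    return sum((c / n) * np.log(c * p_s[s] / (p_xs[(xi, s)] * p_ys[(yi, s)]))
               for (xi, yi, s), c in p_xys.items())

rng = np.random.default_rng(0)
n = 2000
x_rel = rng.integers(0, 2, n)
y = np.where(rng.random(n) < 0.2, 1 - x_rel, x_rel)   # y is a noisy copy of x_rel
x_noise = rng.integers(0, 2, n)                        # independent of everything
X = np.column_stack([x_rel, x_noise])

for i, name in [(1, "x_noise"), (0, "x_rel")]:
    others = [j for j in range(X.shape[1]) if j != i]
    vals = [cmi(X[:, i], y, X[:, list(s)])
            for r in range(len(others) + 1) for s in combinations(others, r)]
    print(name, [round(v, 4) for v in vals])
# x_noise: both values near 0 -> passes the irrelevance test for every subset
# x_rel:   clearly positive   -> relevant
```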
20. Causality enters the picture
21. Causal Bayesian networks
- Bayesian network:
  - Graph with random variables X1, X2, ..., Xn as nodes.
  - Dependencies represented by edges.
  - Allows us to compute P(X1, X2, ..., Xn) as ∏i P(Xi | Parents(Xi)) (see the sketch below).
  - Edge directions have no meaning.
- Causal Bayesian network: edge directions indicate causality.
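A minimal sketch of this factorization on a three-node toy network; the structure (Smoking → Lung disease → Coughing) and all probability values are assumptions made up for illustration.

```python
# Illustrative sketch: joint probability as a product of local conditional
# tables, P(S, D, C) = P(S) P(D | S) P(C | D). The network structure and the
# numbers are made up for this example.
p_smoking = {1: 0.3, 0: 0.7}                                 # P(Smoking)
p_disease = {1: {1: 0.20, 0: 0.80}, 0: {1: 0.02, 0: 0.98}}   # P(Disease | Smoking)
p_cough   = {1: {1: 0.70, 0: 0.30}, 0: {1: 0.10, 0: 0.90}}   # P(Cough | Disease)

def joint(s, d, c):
    """P(Smoking=s, Disease=d, Cough=c) from the factorization."""
    return p_smoking[s] * p_disease[s][d] * p_cough[d][c]

# Sanity check: the factorized joint sums to 1 over all 2^3 assignments.
print(sum(joint(s, d, c) for s in (0, 1) for d in (0, 1) for c in (0, 1)))
```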
22. Markov blanket
(Diagram: a causal network over Smoking, Lung disease, Allergy, and Coughing)
A node is conditionally independent of all other nodes given its Markov blanket.
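A hedged sketch of how the Markov blanket is read off a causal Bayesian network: the node's parents, its children, and its children's other parents ("spouses"). The DAG below is an assumed reconstruction of the slide's diagram, not taken verbatim from it.

```python
# Illustrative sketch (assumed reconstruction of the slide's DAG): the Markov
# blanket of a node is the union of its parents, its children, and its
# children's other parents.
parents = {
    "Smoking": [],
    "Allergy": [],
    "Lung disease": ["Smoking"],
    "Coughing": ["Lung disease", "Allergy"],
}

def markov_blanket(node):
    children = [v for v, ps in parents.items() if node in ps]
    spouses = {p for c in children for p in parents[c] if p != node}
    return set(parents[node]) | set(children) | spouses

# Conditioned on {Smoking, Coughing, Allergy}, Lung disease is independent
# of every other node in the network.
print(markov_blanket("Lung disease"))   # {'Smoking', 'Coughing', 'Allergy'}
```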
23. Relevance revisited
- In terms of Bayesian networks, in faithful distributions:
  - Strongly relevant features: members of the Markov blanket.
  - Weakly relevant features: variables with a path to the Markov blanket, but not in the Markov blanket.
  - Irrelevant features: variables with no path to the Markov blanket.
(Koller & Sahami, 1996; Kohavi & John, 1997; Aliferis et al., 2002)
24. Is X2 relevant?
(Example 1)
P(X1, X2, Y) = P(X1 | X2, Y) P(X2) P(Y)
25. Are X1 and X2 relevant?
(Example 2; figure: Y labels the classes "disease" and "normal")
P(X1, X2, Y) = P(X1 | X2, Y) P(X2) P(Y)
26. XOR and unfaithfulness
Y = X1 ⊕ X2
Example: X1 and X2 are two fair coins tossed at random; Y = win if both coins end on the same side. (A numeric check follows.)
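A numeric check of the XOR example with scikit-learn's mutual information estimator (the sample size and the 0/1 coin encoding are illustrative assumptions): each coin alone scores zero, so univariate selection discards both, yet together they determine Y exactly.

```python
# Illustrative numeric check (assumed sample size and 0/1 coin encoding):
# univariate mutual information is ~0 for each coin, but the pair (here
# summarized by X1 XOR X2) carries the full ~log 2 nats about Y.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 5000)
x2 = rng.integers(0, 2, 5000)
y = (x1 == x2).astype(int)        # "win if both coins end on the same side"

print(mutual_info_classif(np.column_stack([x1, x2]), y,
                          discrete_features=True, random_state=0))  # both ~ 0
print(mutual_info_classif((x1 ^ x2).reshape(-1, 1), y,
                          discrete_features=True, random_state=0))  # ~ 0.69 (log 2)
```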
27. Adding a variable
(Example 3)
28. X1 Y X2
(Example 3; figure: life expectancy vs. chocolate intake)
Is chocolate good for your health?
29. Really?
(Example 3; figure: life expectancy vs. chocolate intake)
Is chocolate good for your health?
30. Same independence relations, different causal relations
P(X1, X2, Y) = P(X1 | X2) P(Y | X2) P(X2)
P(X1, X2, Y) = P(Y | X2) P(X2 | X1) P(X1)
P(X1, X2, Y) = P(X1 | X2) P(X2 | Y) P(Y)
31. Is X1 relevant?
(Example 3)
32. Non-causal features may be predictive yet not relevant
(Examples 1, 2, and 3)
33. Causal feature discovery
P(X, Y) = P(X | Y) P(Y)
P(X, Y) = P(Y | X) P(X)
(Sun, Janzing & Schoelkopf, 2005)
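A small sketch of the point made by the two factorizations above: estimated from data, both reconstruct the same joint table, so the joint distribution alone cannot orient the edge; Sun, Janzing and Schoelkopf propose comparing properties of the two conditionals instead (not implemented here). The data-generating process below is an illustrative assumption.

```python
# Illustrative sketch (assumed data-generating process): both factorizations
# reconstruct exactly the same joint table, so the joint distribution alone
# cannot tell cause from effect.
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10000)
y = np.where(rng.random(10000) < 0.8, x, 1 - x)    # Y is a noisy copy of X

joint = np.zeros((2, 2))
for xi, yi in zip(x, y):
    joint[xi, yi] += 1
joint /= joint.sum()

p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)
p_y_given_x = joint / p_x[:, None]
p_x_given_y = joint / p_y[None, :]

print(np.allclose(p_y_given_x * p_x[:, None], joint))   # P(Y|X) P(X) == P(X,Y)
print(np.allclose(p_x_given_y * p_y[None, :], joint))   # P(X|Y) P(Y) == P(X,Y)
```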
34. Conclusion
- Feature selection focuses on uncovering subsets of variables X1, X2, ... predictive of the target Y.
- Taking a closer look at the type of dependencies may help refine the notion of variable relevance.
- Uncovering causal relationships may yield better feature selection, robust under distribution changes.
- These causal features may be better targets of action.