Correlation implies Causation ? - PowerPoint PPT Presentation

1
Correlation implies Causation ?
  • Saad Saleh
  • Team Lead, Wisnet Lab, SEECS
  • saad.saleh_at_seecs.edu.pk

2
Contents
  • Correlation
  • Causality
  • Examples
  • Causal Research
  • Causality Techniques
  • Granger Causality
  • Zhang Causality
  • Peter Causality
  • LINGAM Causality
  • Practical Applications
  • Conclusion

3
Correlation
  • Correlation means how closely related two sets of
    data are
  • In statistics, Dependence refers to any
    statistical relationship between two random
    variables or two sets of data. Correlation refers
    to any of a broad class of statistical
    relationships involving dependence.
  • Wikipedia: http://en.wikipedia.org/wiki/Correlation_and_dependence
  • Relates to closeness, implying a relationship
    between objects, people, events, etc.
  • For example, people often believe there are
    more bizarre behaviors exhibited when the moon is
    full.
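
One common way to put a number on "how closely related two sets of data are" is the Pearson correlation coefficient. The slides do not say which measure they have in mind, so reading r as Pearson's coefficient is an assumption; x_i and y_i are paired observations and \bar{x}, \bar{y} their sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one; r by itself says nothing about which variable, if any, drives the other.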

4
Causality
  • Causality (also referred to as causation) is the
    relation between an event (the cause) and a
    second event (the effect), where the second event
    is understood as a consequence of the first.
  • Random House Unabridged Dictionary

5
Correlation Examples: Driver's Age vs Sign Legibility Distance
Driver's age is negatively correlated with sign
legibility distance
6
Speed vs Fuel Consumption
7
Speed vs Fuel Consumption
Speed is correlated with the fuel consumption by
the vehicle
8
Incentive vs Percentage Returned
Incentive is positively correlated with the
Percentage Returned
9
  • Gun ownership vs Crime rate
  • Gun ownership and crime
  • r = 0.71

Gun Ownership correlates positively with crime
rate
10
In a Gallup poll, surveyors asked, "Do you
believe correlation implies causation?"
  • 64% of Americans answered "Yes".
  • 28% replied "No".
  • The other 8% were undecided.

11
  • See 10 simple questions to check
  • the influence of correlation over causality

12
Does ice cream consumption lead to drowning?
Ice cream consumption is positively correlated
with the number of drownings
13
Do pirates stop global warming?
The number of pirates is negatively correlated with
global temperature
14
Does shoe size increase reading ability?
Shoe size is positively correlated with reading
ability
15
Do firemen cause large fire damage?
The number of firemen is positively correlated with
the amount of damage
16
Does nationality affect SAT score?
SAT scores are correlated with nationality
17
Is cholesterol level affected by Facebook?
Cholesterol level is correlated with the invention
of Facebook
18
Are bad movies made because of low newspaper
sales?
Production of bad Shyamalan movies is correlated
with newspaper sales
19
Can Internet Explorer affect the murder rate?
Use of Internet Explorer is correlated with the
murder rate
20
Can Mexican lemon imports affect highway
deaths?
Mexican lemon imports are correlated with highway
deaths
21
The number of Nobel prizes won by a country
(adjusting for population) correlates well with
per capita chocolate consumption.
Are Nobel prizes won by chocolate consumption?
(New England Journal of Medicine)
22
Reality: Correlation vs. Causation
  • The correlation between workers' education
    levels and wages is strongly positive
  • Does this mean education causes higher wages?
  • We don't know for sure!
  • Correlation tells us two variables are related
    BUT does not tell us why

23
Reality: Correlation vs. Causation
  • Possibility 1
  • Education improves skills, and skilled workers
    get better-paying jobs
  • Education causes wages to rise
  • Possibility 2
  • Individuals are born with quality A, which is
    relevant for success in education and on the job
  • Quality A (NOT education) causes wages to rise

24
Correlation vs Causation
25
  • Without proper interpretation, causation should
    not be assumed, or even implied.

26
Causal Research
  • If the objective is to determine which variable
    might be causing a certain behavior (whether
    there is a cause and effect relationship between
    variables) causal research must be undertaken.

27
Causal discovery
What affects
  • Which actions will have beneficial effects?

28
Available data
  • A lot of observational data.
  • Correlation ≠ Causality!
  • Experiments are often needed, but they can be
  • Costly
  • Unethical
  • Infeasible

29
Establishing Causality
  • To establish whether two variables are causally
    related, that is, whether a change in the
    independent variable X results in a change in the
    dependent variable Y, you must establish:
  • Time order: The cause must have occurred before
    the effect
  • Co-variation (statistical association): Changes
    in the value of the independent variable must be
    accompanied by changes in the value of the
    dependent variable
  • Rationale: There must be a logical and compelling
    explanation for why these two variables are
    related
  • Non-spuriousness: It must be established that the
    independent variable X, and only X, was the cause
    of changes in the dependent variable Y; rival
    explanations must be ruled out.

30
Establishing Causality
  • Note that it is never possible to prove
    causality, but only to show to what degree it is
    probable.

31
Causation Possibilities
  • A causes B.
  • B causes A.
  • A and B both partly cause each other.
  • A and B are both caused by a third factor, C.
  • The observed correlation was due purely to
    chance.

32
  • Third or Missing Variable Problem
  • A relationship other than causal might exist
    between the two variables.
  • It is possible that there is some other variable
    or factor that is causing the outcome.
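
The third-variable case can be made concrete with a small simulation. The sketch below is illustrative only: the variable names, the unit-variance Gaussian noise and the sample size are assumptions, not taken from the slides. A hidden factor C drives both A and B, which never influence each other, yet A and B come out clearly correlated; once the linear effect of C is removed from both, the correlation all but disappears.

```python
import random

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

def residuals(target, predictor):
    """Residuals of a simple least-squares line target ~ predictor."""
    n = len(target)
    mt, mp = sum(target) / n, sum(predictor) / n
    slope = (sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
             / sum((p - mp) ** 2 for p in predictor))
    return [(t - mt) - slope * (p - mp) for t, p in zip(target, predictor)]

rng = random.Random(0)                       # fixed seed so the run is repeatable
n = 5000
c = [rng.gauss(0, 1) for _ in range(n)]      # hidden common cause C (hypothetical)
a = [ci + rng.gauss(0, 1) for ci in c]       # A depends on C only
b = [ci + rng.gauss(0, 1) for ci in c]       # B depends on C only

r_ab = pearson(a, b)                                        # clearly positive
r_ab_given_c = pearson(residuals(a, c), residuals(b, c))    # close to zero

print(f"corr(A, B)               = {r_ab:.3f}")
print(f"corr(A, B) controlling C = {r_ab_given_c:.3f}")
```

Regressing out C only works here because C is observed and its effect is linear; with an unobserved factor the same spurious correlation would remain and could easily be mistaken for a causal link.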

33
Causal graph example
34
A ? B
[Figure: plot with axes A = log(Altitude) and B = Temperature, annotated A -> B]
35
Best fit: A -> B
[Figure: panels labeled A -> B and A <- B]
36
Linear case?
[Figure: panels labeled A <- B and A -> B]
  • Linear function
  • Gaussian input
  • Gaussian noise

37
Google Trends Google Correlate
41
Approach 1: Granger Causality
Prof. Clive W.J. Granger, recipient of the 2003
Nobel Prize in Economics
42
History
  • In the early 1960's, I was considering a pair
    of related stochastic processes which were
    clearly inter-related and I wanted to know if
    this relationship could be broken down into a
    pair of one way relationships. It was suggested
    to me to look at a definition of causality
    proposed by a very famous mathematician, Norbert
    Wiener, so I adapted this definition (Wiener
    1956) into a practical form and discussed it.
  • Applied economists found the definition
    understandable and useable and applications of it
    started to appear. However, several writers
    stated that "of course, this is not real
    causality, it is only Granger causality."
  • Clive W. J. Granger 

43
Granger's Idea
  • If variables are cointegrated, the relationship
    among them can be expressed as Error Correction
    Mechanism (ECM).

44
Granger Causality
  • Suppose that we have three terms, $X_t$, $Y_t$, and
    $W_t$, and that we first attempt to
    forecast $X_{t+1}$ using past terms of $X_t$ and $W_t$
    (without $Y_t$).
  • We then try to forecast $X_{t+1}$ using past terms
    of $X_t$, $W_t$, and $Y_t$ (with $Y_t$).
  • If the second forecast is found to be more
    successful, according to standard cost functions,
    then the past of Y appears to contain information
    helping in forecasting $X_{t+1}$ that is not in
    past $X_t$ or $W_t$.
  • In short, $Y_t$ would "Granger cause" $X_{t+1}$ if
  • $Y_t$ occurs before $X_{t+1}$
  • it contains information useful in
    forecasting $X_{t+1}$ that is not found in a group of
    other appropriate variables.

45
Vector Autoregression (VAR)
Mathematical Definition
$Y_t = A Y_{t-1} + \dots + A' Y_{t-k} + e_t$
where
  • p = the number of variables considered in the system
  • k = the number of lags considered in the system
  • $Y_t, Y_{t-1}, \dots, Y_{t-k}$ = the 1 × p vectors of variables
  • $A, \dots, A'$ = the p × p matrices of coefficients to be
    estimated
  • $e_t$ = a 1 × p vector of innovations that may be
    contemporaneously correlated but are uncorrelated with
    their own lagged values and uncorrelated with all of the
    right-hand side variables.
46
Vector Autoregression (VAR)
Example
Consider a case in which the number of variables
p is 2, the number of lags k is 1, and the
constant term is suppressed. For concreteness,
let the two variables be called money, $m_t$, and
output, $y_t$.
The structural equation will be
47
Vector Autoregression (VAR)
Example
  • Then, the reduced form is

48
Vector Autoregression (VAR)
Example
Among the statistics computed from VARs, the
following are helpful for assessing Granger causality.
  • Granger causality tests, which have been
    interpreted as testing, for example, the validity
    of the monetarist proposition that autonomous
    variations in the money supply have been a cause
    of output fluctuations.

49
Vector Autoregression (VAR)
Granger Causality
  • In a regression analysis, we deal with the
    dependence of one variable on other variables,
    but it does not necessarily imply causation.
  • In our GDP and M example, the often asked
    question is whether GDP → M or M → GDP. Since we
    have two variables, we are dealing with bilateral
    causality.
  • Given the previous GDP and M VAR equations

50
Vector Autoregression (VAR)
Granger Causality
  • We can distinguish four cases:
  • Unidirectional causality from M to GDP
  • Unidirectional causality from GDP to M
  • Feedback or bilateral causality
  • Independence
  • Assumptions:
  • Stationary variables for GDP and M
  • Number of lag terms
  • Error terms are uncorrelated; if they are correlated,
    an appropriate transformation is necessary

51
Vector Autoregression (VAR)
Granger Causality Estimation (t-test)
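A variable, say $m_t$, is said to fail to Granger
cause another variable, say $y_t$, relative to an
information set consisting of past m's and y's
if
$E[y_t \mid y_{t-1}, m_{t-1}, y_{t-2}, m_{t-2}, \dots] = E[y_t \mid y_{t-1}, y_{t-2}, \dots]$.
$m_t$ does not Granger cause $y_t$ relative to an
information set consisting of past m's and y's
iff the coefficient with subscript 21 (on lagged m in the $y_t$ equation) is 0.
$y_t$ does not Granger cause $m_t$ relative to an
information set consisting of past m's and y's
iff the coefficient with subscript 12 (on lagged y in the $m_t$ equation) is 0.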
  • In a bivariate case, as in our example, a t-test
    can be used to test the null hypothesis that one
    variable does not Granger cause another variable.
    In higher order systems, an F-test is used.

52
Vector Autoregression (VAR)
Granger Causality Estimation (F-test)
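1. Regress current GDP on all lagged GDP terms
but do not include the lagged M variables
(restricted regression). From this, obtain the
restricted residual sum of squares, $RSS_R$.
2. Run the regression including the lagged M
terms (unrestricted regression). Also get the
residual sum of squares, $RSS_{UR}$.
3. The null hypothesis is $H_0$: the coefficients on the
lagged M terms are all zero, that is,
the lagged M terms do not belong in the
regression.
4. Compute the F statistic
$F = \frac{(RSS_R - RSS_{UR})/m}{RSS_{UR}/(n - k)}$,
where m is the number of lagged M terms, n is the number
of observations, and k is the number of parameters
estimated in the unrestricted regression.
5. If the computed F exceeds the critical F value at a
chosen level of significance, we reject the null,
in which case the lagged M terms belong in the
regression. This is another way of saying that M
Granger-causes GDP.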
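
The restricted versus unrestricted comparison can be tried on synthetic data. The following is a minimal sketch using only the Python standard library; the simulated series, the coefficients 0.5 and 0.8, the single lag and the helper names (`solve`, `rss`, `granger_f`) are all assumptions made for illustration, not part of the original presentation. It fits both regressions by ordinary least squares and forms the usual F ratio from the two residual sums of squares.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def rss(rows, target):
    """Residual sum of squares of an OLS fit of target on the regressor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(bi * ri for bi, ri in zip(beta, r))) ** 2
               for r, t in zip(rows, target))

def granger_f(effect, cause, lags=1):
    """F statistic for H0: lagged `cause` terms do not help predict `effect`."""
    times = range(lags, len(effect))
    target = [effect[t] for t in times]
    # Restricted regression: intercept plus the effect's own lags.
    restricted = [[1.0] + [effect[t - lag] for lag in range(1, lags + 1)]
                  for t in times]
    # Unrestricted regression: additionally include lags of the candidate cause.
    unrestricted = [row + [cause[t - lag] for lag in range(1, lags + 1)]
                    for row, t in zip(restricted, times)]
    rss_r, rss_ur = rss(restricted, target), rss(unrestricted, target)
    n_obs, n_params = len(target), len(unrestricted[0])
    return ((rss_r - rss_ur) / lags) / (rss_ur / (n_obs - n_params))

# Synthetic data (assumed): past M feeds into Y, but Y never feeds into M.
rng = random.Random(1)
T = 500
m_series = [rng.gauss(0, 1) for _ in range(T)]
y_series = [0.0]
for t in range(1, T):
    y_series.append(0.5 * y_series[t - 1] + 0.8 * m_series[t - 1] + rng.gauss(0, 1))

f_m_to_y = granger_f(y_series, m_series)   # large: lagged M improves the forecast
f_y_to_m = granger_f(m_series, y_series)   # compare with the critical F value
print(f"F (M -> Y) = {f_m_to_y:.1f}")
print(f"F (Y -> M) = {f_y_to_m:.1f}")
```

Each statistic would then be compared with the critical F value for 1 and roughly 500 degrees of freedom (about 3.9 at the 5% level). A real analysis would also check stationarity and choose the lag length with care, which this sketch leaves out.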
53
Criticisms of Causality Tests
  • Granger causality tests, although much used in VAR
    modelling, do not explain some aspects of
    the VAR:
  • They do not give the sign of the effect; we do
    not know if it is positive or negative.
  • They do not show how long the effect lasts.
  • They do not provide evidence of whether the
    effect is direct or indirect.

55
Max Planck at centre, 1931
Prof. Dr. Bernhard Schölkopf 
Kun Zhang
56
Approach 2
  • Distinguishing Causes from Effects using
  • Nonlinear Acyclic Causal Models

Kun Zhang, Aapo Hyvarinen
57
Background
  • Model-based causal discovery assumes a generative
    model to explain the data generating process.
  • If the assumed model is close to the true one,
    such methods could not only detect the causal
    relations, but also discover the form in which
    each variable is influenced by others.
  • For example,
  • Granger causality assumes that effects must
    follow causes and that the causal effects are
    linear (Granger, 1980).
  • If the data are generated by a linear acyclic
    causal model and at most one of the disturbances
    is Gaussian, independent component analysis (ICA)
    (Hyvarinen et al., 2001) can be exploited to
    discover the causal relations in a convenient way
    (Shimizu et al., 2006).

58
Shortcomings
  • Previous models were too restrictive for
    real-life problems.
  • If the assumed model is wrong, model-based
    causal discovery may give misleading results.

59
Zhang Approach
  • In a large class of real-life problems, the
    following three effects usually exist.
  • 1. The effect of the causes is usually
    nonlinear.
  • 2. The final effect received by the target
    variable from all its causes contains some
    noise which is independent from the causes.
  • 3. Sensors or measurements may introduce
    nonlinear distortions into the observed
    values of the variables.
  • Assumption: The nonlinearities involved are
    invertible.

60
  • Proposed Solution
  • Each observed variable is a non-linear function
    of its parents with additive noise, followed by
    a non-linear distortion
  • If all non-linearities are invertible,
    conditions are given for identifying the causal
    relationship
  • Two-step method: constrained nonlinear ICA
    followed by statistical independence tests, to
    distinguish the cause from the effect in the
    two-variable case

61
  • Proposed Causal Model

$x_i = f_{i,2}(f_{i,1}(pa_i) + e_i)$
  • $f_{i,1}$: non-linear transformation (not necessarily
    invertible)
  • $f_{i,2}$: non-linear distortion (continuous and invertible)
  • $e_i$: noise effect in transmission from $pa_i$ to $x_i$
First stage: a nonlinear transformation of its
parents $pa_i$, denoted by $f_{i,1}(pa_i)$, plus some
noise (or disturbance) $e_i$ (which is independent
from $pa_i$). Second stage: a nonlinear distortion
$f_{i,2}$ is applied to the output of the first stage
to produce $x_i$.
62
Zhang Approach
  • Suppose the causal relation under examination is
    $x_1 \to x_2$. If this causal relation holds, there
    exist nonlinear functions $f_{2,2}$ and $f_{2,1}$ such that
  • $e_2 = f_{2,2}^{-1}(x_2) - f_{2,1}(x_1)$ is independent from
    $x_1$.
  • $y_1 = x_1$, $y_2 = g_2(x_2) - g_1(x_1)$.
  • Use multi-layer perceptrons (MLPs) to model the
    nonlinearities $g_1$ and $g_2$.
  • Parameters in $g_1$ and $g_2$ are learned by making $y_1$
    and $y_2$ as independent as possible.

63
Multilayer Perceptron (MLP)
  • A multilayer perceptron (MLP) is
    a feedforward artificial neural network model
    that maps sets of input data onto a set of
    appropriate outputs.

64
Zhang Analysis
  • $y_1$ and $y_2$ produced by the first step are the
    assumed cause and the estimated corresponding
    disturbance, respectively.
  • In the second step, one needs to verify if they
    are independent.
  • If $y_1$ and $y_2$ are independent, it implies $x_1$
    causes $x_2$, and that $g_1$ and $g_2$ provide an estimate
    of $f_{2,1}$ and $f_{2,2}^{-1}$, respectively.

65
Success !!
  • The Zhang approach solved the "CauseEffectPairs"
    problem in the Pot-luck challenge, and
    successfully distinguished causes from effects
  • Earned Reward 200

66
Approach 3
  • Nonlinear causal discovery
  • with additive noise models

Patrik O. Hoyer, Dominik Janzing, Joris Mooij,
Jonas Peters, Bernhard Scholkopf
67
  • Claim
  • "Non-linearities are a blessing rather than a
    curse" (Hoyer)

Idea: In reality, many causal relationships are
non-linear. How about generalizing the basic linear
framework to non-linear models?
68
Hoyer Approach
  • When causal relationships are nonlinear it
    typically helps break the symmetry between the
    observed variables and allows the identification
    of causal directions.
  • As Friedman and Nachman have pointed out,
    non-invertible functional relationships between
    the observed variables can provide clues to the
    generating causal model.
  • We show that the phenomenon is much more
    general: for nonlinear models with additive noise,
    almost any nonlinearities (invertible or not)
    will typically yield identifiable models.

69
Hoyer Approach
  • Model
  • $x_i = f_i(x_{pa(i)}) + n_i$
  • where
  • $f_i$ is an arbitrary function (possibly different
    for each i),
  • $x_{pa(i)}$ is a vector containing the elements $x_j$
    such that there is an edge from j to i in the DAG
    G,
  • the noise variables $n_i$ may have arbitrary
    probability densities $p_{n_i}(n_i)$.

70
Hoyer Model Estimation
  • Test whether x and y are statistically
    independent.
  • If not: test whether a model
  • $y = f(x) + n$
  • is consistent with the data, simply by doing a
    nonlinear regression of y on x (to get an
    estimate $\hat{f}$ of f), calculating the corresponding
    residuals $\hat{n} = y - \hat{f}(x)$,
  • and testing whether $\hat{n}$ is independent of x. If
    so, accept the model
  • $y = f(x) + n$;
  • if not, reject it.
  • Similarly, test whether the reverse model
    $x = g(y) + n$ fits the data.
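
The regress-then-test-independence recipe can be sketched on toy data. Everything specific below is an assumption chosen for illustration: the data (y = x² plus uniform noise), the cubic polynomial used as the "nonlinear regression", and the use of a raw HSIC score (Hilbert-Schmidt Independence Criterion, with unit-bandwidth Gaussian kernels) as the dependence measure in place of a proper independence test with a p-value. The sketch fits both directions and prefers the one whose residuals look less dependent on the input.

```python
import math
import random

def standardize(v):
    """Shift and scale a sequence to zero mean and unit variance."""
    n = len(v)
    mean = sum(v) / n
    sd = (sum((a - mean) ** 2 for a in v) / n) ** 0.5
    return [(a - mean) / sd for a in v]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def cubic_residuals(target, predictor):
    """Residuals of a least-squares cubic polynomial fit target ~ predictor."""
    z = standardize(predictor)
    rows = [[1.0, zi, zi ** 2, zi ** 3] for zi in z]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(4)]
    beta = solve(xtx, xty)
    return [t - sum(bi * ri for bi, ri in zip(beta, r))
            for r, t in zip(rows, target)]

def hsic(u, v):
    """Biased HSIC estimate; values near 0 suggest independence."""
    u, v = standardize(u), standardize(v)
    n = len(u)
    k = [[math.exp(-0.5 * (a - b) ** 2) for b in u] for a in u]
    l = [[math.exp(-0.5 * (a - b) ** 2) for b in v] for a in v]
    row_mean = [sum(row) / n for row in k]
    grand_mean = sum(row_mean) / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            centered = k[i][j] - row_mean[i] - row_mean[j] + grand_mean
            total += centered * l[i][j]
    return total / n ** 2

# Toy data (assumed): x causes y through a nonlinear function plus noise.
rng = random.Random(2)
n = 300
x = [rng.uniform(-1, 1) for _ in range(n)]
y = [xi ** 2 + rng.uniform(-0.2, 0.2) for xi in x]

dep_forward = hsic(x, cubic_residuals(y, x))    # residuals of y = f(x) + n vs x
dep_backward = hsic(y, cubic_residuals(x, y))   # residuals of x = g(y) + n vs y
direction = "x -> y" if dep_forward < dep_backward else "y -> x"
print(f"forward {dep_forward:.5f}  backward {dep_backward:.5f}  {direction}")
```

In the forward direction the residuals are essentially the injected noise and carry no trace of x, while in the backward direction no function of y can remove the dependence. Comparing two raw scores is a simplification; a p-value for each direction would allow both models to be rejected when neither fits.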

71
Hoyer Test Results
  • The Old Faithful dataset
  • Obtains a p-value of 0.5 for the (forward) model
    "current duration causes next interval length"
    and
  • a p-value of $4.4 \times 10^{-9}$ for the (backward) model
    "next interval length causes current duration"

72
Hoyer Test Results
  • The Abalone dataset from the UCI ML repository
  • The correct model "age causes length" leads to a
    p-value of 0.19,
  • The reverse model "length causes age" comes with
    $p < 10^{-15}$

73
Hoyer Test Results
  • Temperature-Altitude Statistics
  • The correct model "altitude causes temperature"
    leads to p = 0.017,
  • "Temperature causes altitude" can clearly be
    rejected ($p = 8 \times 10^{-15}$)

74
Approach 4
  • A Linear Non-Gaussian Acyclic
  • Model for Causal Discovery (LINGAM)

Shohei Shimizu, Patrik O. Hoyer, Aapo
Hyvarinen, Antti Kerminen
75
Approach: Use of Independent Component Analysis
(ICA), called the Linear Non-Gaussian Acyclic
Model (LINGAM)
Analysis: When working with
continuous-valued data, a significant
advantage can be achieved by departing from the
Gaussianity assumption
  • Assumptions
  • Data generating process is linear
  • No unobserved confounders
  • Disturbance variables have non-Gaussian
    distributions of non-zero variances

76
LINGAM Model
  • Linear Non-Gaussian Acyclic Model
  • Data Generating process

77
LINGAM Idea
  • Key to Solution
  • Observed variables are linear functions of the
    disturbance variables, and the disturbance
    variables are mutually independent and
    non-Gaussian.
  • $x = Bx + e$,
  • $x = Ae$,
  • where $A = (I - B)^{-1}$.
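
A two-variable worked instance may help make the relation between B and A concrete. The ordering x_1 -> x_2 and the coefficient b_{21} are assumptions chosen for illustration, not values from the slides.

```latex
B = \begin{pmatrix} 0 & 0 \\ b_{21} & 0 \end{pmatrix},
\qquad
A = (I - B)^{-1}
  = \begin{pmatrix} 1 & 0 \\ -b_{21} & 1 \end{pmatrix}^{-1}
  = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \end{pmatrix}
\quad\Longrightarrow\quad
x_1 = e_1, \qquad x_2 = b_{21}\, e_1 + e_2 .
```

Each observed variable is thus a linear mixture of independent disturbances, which is the setting ICA is designed for; the zero pattern of B (here strictly lower triangular) encodes the causal order.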

78
LINGAM Algorithm
  • LINGAM can be briefly summarized as follows:
  • First, use a standard ICA algorithm (e.g., the
    FastICA algorithm) to obtain an estimate of the
    mixing matrix A (or equivalently of W),
  • subsequently permute it and normalize it
    appropriately before using it to compute B
    containing the sought connection strengths $b_{ij}$.

79
LINGAM Algorithm
  • (1) Given an m × n data matrix X (m << n), where each
    column contains one sample vector x:
  • (a) Subtract the mean from each row of X
  • (b) Apply ICA to get X = AS, where S contains
    the independent components in its rows
  • (c) Note that $W = A^{-1}$
  • (2) Find $\tilde{W}$, which contains no zeros on the main
    diagonal and is obtained by permuting the rows of W.
  • (3) Divide each row of $\tilde{W}$ by its corresponding
    diagonal element to get $\tilde{W}'$ with all 1s on the main
    diagonal
80
LINGAM Algorithm
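(4) Find B such that $B = I - \tilde{W}'$, where $\tilde{W}'$ is
the row-normalized matrix from step (3)
(5) To find the causal order, find the permutation matrix P of B
which yields $\tilde{B} = P B P^T$ as close as possible to
strictly lower triangular; closeness can be measured using
$\sum_{i<j} \tilde{B}_{ij}^2$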
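
Steps (2) to (5) are mechanical once an unmixing matrix W is available, so they can be sketched without running ICA. In the sketch below the 3 × 3 matrix `w_ica` is hand-made to mimic what ICA would return (the true W with rows shuffled and rescaled), and the function name `lingam_order`, the brute-force search over permutations and the cost sum 1/|W_ii| for choosing the row permutation are assumptions that do not come from the slides. Brute force is only practical for a handful of variables.

```python
from itertools import permutations

def lingam_order(w):
    """Apply steps (2)-(5) to a small unmixing matrix w (list of rows)."""
    n = len(w)

    # Step 2: row permutation that avoids zeros on the main diagonal,
    # chosen by minimising sum_i 1/|W_ii| (infinite if an entry is zero).
    def diag_cost(rows):
        return sum(float("inf") if w[r][i] == 0 else 1 / abs(w[r][i])
                   for i, r in enumerate(rows))
    rows = min(permutations(range(n)), key=diag_cost)
    w1 = [w[r] for r in rows]

    # Step 3: divide each row by its diagonal element.
    w1 = [[v / row[i] for v in row] for i, row in enumerate(w1)]

    # Step 4: B = I - W'.
    b = [[(1.0 if i == j else 0.0) - w1[i][j] for j in range(n)]
         for i in range(n)]

    # Step 5: ordering of the variables that makes the permuted B closest
    # to strictly lower triangular (sum of squares above the diagonal).
    def upper_mass(order):
        return sum(b[order[i]][order[j]] ** 2
                   for i in range(n) for j in range(i + 1, n))
    order = min(permutations(range(n)), key=upper_mass)
    return b, list(order)

# Assumed ground truth: x2 = e2, x0 = 2*x2 + e0, x1 = -1.5*x0 + 0.5*x2 + e1,
# so W = I - B has rows [1, 0, -2], [1.5, 1, -0.5], [0, 0, 1].
# ICA recovers W only up to row order and scale; mimic that by hand.
w_ica = [
    [0.0, 0.0, 3.0],       # 3.0  * row 2 of W
    [-2.0, 0.0, 4.0],      # -2.0 * row 0 of W
    [0.75, 0.5, -0.25],    # 0.5  * row 1 of W
]

b_est, order = lingam_order(w_ica)
print("estimated B:", b_est)
print("causal order:", order)   # variables listed from first cause to last effect
```

For this input the connection strengths 2, -1.5 and 0.5 are recovered in their original positions and the order comes out as x2, x0, x1. With W estimated from real data no entry is exactly zero, which is why the permutation is chosen by a cost rather than by an exact zero test.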
81
Practical Experiments
  • Project
  • Detecting Covert Links in Instant Messaging (IM)
    Networks Using Flow Level Log Data

82
Introduction
  • Users sending Instant Messages (IM) to relay
    server
  • Relay server forwards messages to corresponding
    users
  • All packets contain only the user's and the server's IP
    addresses as source and destination

Scenario 1
83
Introduction
  • Users may be communicating behind a proxy server
  • Users behind proxy servers are visible in
    scenario 2.

Scenario 2
84
Data Set
  • Yahoo! Messenger IM network.
  • Data Set Details
  • Area: New York City area
  • Time: 12am to 12am
  • Data Set Files
  • Input data file: user-to-server traffic traces
  • Ground truth data file: record of the actual
    user-to-user connections

85
Data Set Statistics
Time           Duration  Users    Messages  Sessions
8:00-8:10 am   10 mins    3,420    15,370    1,968
8:00-8:20 am   20 mins    5,405    33,192    3,265
8:00-8:30 am   30 mins    7,438    53,649    4,661
8:00-8:40 am   40 mins    9,513    75,810    6,179
8:00-8:50 am   50 mins   11,684    99,721    7,669
8:00-9:00 am   60 mins   13,953   126,694    9,264
86
Granger Causality
F-test statistics for Granger Causality test
87
Zhang Approach Results
Zhang results for talking and non-talking pairs
for IM networks in Yahoo!
88
Just for Knowledge
  • Classifier Tool
  • WEKA (Waikato Environment for Knowledge
    Analysis): a popular suite of machine learning
    software written in Java, developed at the
    University of Waikato, New Zealand

Weka: a bird found in New Zealand; a vulnerable
species.
89
WEKA
90
Conclusion
91
Given: A causes B. To prove: Must A and B
be correlated? Result: YES or NO, and why?
Can you show it?