Title: Correlation implies Causation?
1Correlation implies Causation?
- Saad Saleh
- Team Lead, Wisnet Lab, SEECS
- saad.saleh_at_seecs.edu.pk
2Contents
- Correlation
- Causality
- Examples
- Causal Research
- Causality Techniques
- Granger Causality
- Zhang Causality
- Peter Causality
- LINGAM Causality
- Practical Applications
- Conclusion
3Correlation
- Correlation means how closely related two sets of data are.
- "In statistics, dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence." (Wikipedia, http://en.wikipedia.org/wiki/Correlation_and_dependence)
- Relates to closeness, implying a relationship between objects, people, events, etc.
- For example, people often believe there are more bizarre behaviors exhibited when the moon is full.
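As a minimal sketch of how "how closely related" is usually quantified, the Pearson coefficient r can be computed from deviations about the means. The function name `pearson_r` and the two data lists are made up for illustration and are not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data (assumed values, not measurements).
x_values = [20, 30, 40, 50, 60, 70]
y_values = [600, 560, 510, 480, 430, 400]
r = pearson_r(x_values, y_values)
print(f"r = {r:.3f}")  # close to -1: y falls almost linearly as x rises
```

A value of r near +1 or -1 indicates a strong linear relationship; it says nothing about which variable, if either, drives the other.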
4Causality
- "Causality (also referred to as causation) is the relation between an event (the cause) and a second event (the effect), where the second event is understood as a consequence of the first." (Random House Unabridged Dictionary)
5Correlation Examples: Driver's Age vs Sign Legibility Distance
Driver's age is negatively correlated with sign legibility distance
6Speed vs Fuel Consumption
7Speed vs Fuel Consumption
Speed is correlated with the fuel consumption by
the vehicle
8Incentive vs Percentage Returned
Incentive is positively correlated with the
Percentage Returned
9- Gun ownership vs Crime rate
- Gun ownership and crime
- r = 0.71
Gun Ownership correlates positively with crime
rate
10In a Gallup poll, surveyors asked, "Do you believe correlation implies causation?"
- 64% of Americans answered "Yes".
- 28% replied "No".
- The other 8% were undecided.
11- See 10 simple questions to check the influence of correlation over causality
12Does ice cream consumption lead to drowning??
Ice cream consumption is positively correlated with the number of drownings
13Do pirates stop global warming??
The number of pirates is negatively correlated with global temperature
14Does shoe size increase reading ability??
Shoe size is positively correlated with reading ability
15Do firemen cause large fire damage??
The number of firemen is positively correlated with the amount of damage
16Does nationality affect SAT score??
SAT scores are positively correlated with nationality
17Is cholesterol level affected by Facebook??
Cholesterol level is correlated with the invention of Facebook
18Are bad movies made because of low newspaper sales??
Production of bad Shyamalan movies is correlated with newspaper sales
19Can Internet Explorer affect the murder rate??
Use of Internet Explorer is correlated with the murder rate
20Can Mexican lemon imports affect highway deaths??
Mexican lemon imports are correlated with highway deaths
21The number of Nobel prizes won by a country (adjusting for population) correlates well with per capita chocolate consumption.
Are Nobel prizes won by chocolate consumption??
(New England Journal of Medicine)
22Reality: Correlation vs. Causation
- The correlation between workers' education levels and wages is strongly positive
- Does this mean education causes higher wages?
- We don't know for sure!
- Correlation tells us two variables are related BUT does not tell us why
23Reality: Correlation vs. Causation
- Possibility 1
- Education improves skills and skilled workers get better paying jobs
- Education causes wages to rise
- Possibility 2
- Individuals are born with quality A, which is relevant for success in education and on the job
- Quality A (NOT education) causes wages to rise
24Correlation vs Causation
25- Without proper interpretation, causation should
not be assumed, or even implied.
26Causal Research
- If the objective is to determine which variable
might be causing a certain behavior (whether
there is a cause and effect relationship between
variables) causal research must be undertaken.
27 Causal discovery
What affects
- Which actions will have beneficial effects?
28Available data
- A lot of observational data.
- Correlation ≠ Causality!
- Experiments are often needed, but
- Costly
- Unethical
- Infeasible
29Establishing Causality
- To establish whether two variables are causally related, that is, whether a change in the independent variable X results in a change in the dependent variable Y, you must establish:
- Time order: The cause must have occurred before the effect
- Co-variation (statistical association): Changes in the value of the independent variable must be accompanied by changes in the value of the dependent variable
- Rationale: There must be a logical and compelling explanation for why these two variables are related
- Non-spuriousness: It must be established that the independent variable X, and only X, was the cause of changes in the dependent variable Y; rival explanations must be ruled out.
30Establishing Causality
- Note that it is never possible to prove
causality, but only to show to what degree it is
probable.
31 Causation Possibilities
- A causes B.
- B causes A.
- A and B both partly cause each other.
- A and B are both caused by a third factor, C.
- The observed correlation was due purely to
chance.
32- Third or Missing Variable Problem
- A relationship other than causal might exist between the two variables.
- It is possible that there is some other variable or factor that is causing the outcome.
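The third-variable case can be illustrated with a small simulation. This is only a sketch under assumed numbers: the variable names (`temperature`, `ice_cream`, `drownings`) and all coefficients are hypothetical, chosen to echo the ice cream example from the earlier slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical third factor C (temperature) drives both A and B;
# A and B never influence each other in this simulation.
temperature = rng.normal(size=n)
ice_cream = 2.0 * temperature + rng.normal(size=n)
drownings = 1.5 * temperature + rng.normal(size=n)

def residual(y, x):
    """Part of y left over after a least-squares line on x is removed."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_raw = np.corrcoef(ice_cream, drownings)[0, 1]
# Partial correlation: correlate what remains once C is accounted for.
r_given_c = np.corrcoef(residual(ice_cream, temperature),
                        residual(drownings, temperature))[0, 1]

print(f"corr(A, B)     = {r_raw:.2f}")
print(f"corr(A, B | C) = {r_given_c:.2f}")
```

The raw correlation is strong, while the correlation that remains after removing the common factor is close to zero, which is the signature of a relationship explained by a third variable.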
33Causal graph example
34A ? B
A -> B
[Figure axes: A = log(Altitude), B = Temperature]
35Best fit: A -> B
A -> B
A <- B
36Linear case?
A <- B
A -> B
- Linear function
- Gaussian input
- Gaussian noise
37Google Trends Google Correlate
41Approach 1: Granger Causality
Prof. Clive W.J. Granger, recipient of the 2003
Nobel Prize in Economics
42History
- "In the early 1960's, I was considering a pair of related stochastic processes which were clearly inter-related and I wanted to know if this relationship could be broken down into a pair of one way relationships. It was suggested to me to look at a definition of causality proposed by a very famous mathematician, Norbert Wiener, so I adapted this definition (Wiener 1956) into a practical form and discussed it.
- Applied economists found the definition understandable and useable and applications of it started to appear. However, several writers stated that 'of course, this is not real causality, it is only Granger causality.'" (Clive W. J. Granger)
43Granger's Idea
- If variables are cointegrated, the relationship among them can be expressed as an Error Correction Mechanism (ECM).
44Granger Causality
- Suppose that we have three terms, $X_t$, $Y_t$, and $W_t$, and that we first attempt to forecast $X_{t+1}$ using past terms of $X_t$ and $W_t$ (without $Y_t$).
- We then try to forecast $X_{t+1}$ using past terms of $X_t$, $W_t$, and $Y_t$ (with $Y_t$).
- If the second forecast is found to be more successful, according to standard cost functions, then the past of $Y$ appears to contain information helping in forecasting $X_{t+1}$ that is not in past $X_t$ or $W_t$.
- In short, $Y_t$ would "Granger cause" $X_{t+1}$ if
- $Y_t$ occurs before $X_{t+1}$
- it contains information useful in forecasting $X_{t+1}$ that is not found in a group of other appropriate variables.
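The two-forecast comparison described above can be sketched as follows. Everything here is an illustrative assumption: the simulated series, the one-lag linear forecaster, the helper name `forecast_mse`, and the use of out-of-sample mean squared error as the cost function.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x, y, w = np.zeros(n), np.zeros(n), np.zeros(n)
for t in range(1, n):
    w[t] = 0.5 * w[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + rng.normal()
    # By construction, the past of Y feeds into X.
    x[t] = 0.3 * x[t - 1] + 0.2 * w[t - 1] + 0.8 * y[t - 1] + rng.normal()

def forecast_mse(target, predictors, split):
    """Fit a one-lag linear forecaster on the first `split` rows,
    then return its mean squared error on the remaining rows."""
    design = np.column_stack(
        [p[:-1] for p in predictors] + [np.ones(len(target) - 1)]
    )
    next_value = target[1:]
    coef, *_ = np.linalg.lstsq(design[:split], next_value[:split], rcond=None)
    errors = next_value[split:] - design[split:] @ coef
    return float(np.mean(errors ** 2))

mse_without_y = forecast_mse(x, [x, w], split=n // 2)
mse_with_y = forecast_mse(x, [x, w, y], split=n // 2)
print(f"MSE without Y: {mse_without_y:.2f}")
print(f"MSE with Y:    {mse_with_y:.2f}")
```

In this simulation the forecast that includes past Y has the lower error, so Y would be said to "Granger cause" X in the sense of the slide.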
45Vector Autoregression (VAR)
Mathematical Definition
$$Y_t = A_1 Y_{t-1} + \dots + A_k Y_{t-k} + e_t$$
where
- $p$ = the number of variables considered in the system
- $k$ = the number of lags considered in the system
- $Y_t, Y_{t-1}, \dots, Y_{t-k}$ = the $p \times 1$ vectors of variables
- $A_1, \dots, A_k$ = the $p \times p$ matrices of coefficients to be estimated
- $e_t$ = a $p \times 1$ vector of innovations that may be contemporaneously correlated but are uncorrelated with their own lagged values and uncorrelated with all of the right-hand side variables.
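A minimal sketch of the definition, assuming p = 2 variables and k = 1 lag: simulate a VAR(1) with a hypothetical coefficient matrix `A1` and recover it by ordinary least squares. The matrix values and sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 2, 5000                      # p variables, n time steps, one lag
A1 = np.array([[0.5, 0.0],
               [0.4, 0.3]])         # hypothetical coefficient matrix

Y = np.zeros((n, p))
for t in range(1, n):
    # Y_t = A1 Y_{t-1} + e_t with independent standard-normal innovations.
    Y[t] = A1 @ Y[t - 1] + rng.normal(size=p)

# Least squares: stack lagged rows and solve present ≈ past @ A1.T.
past, present = Y[:-1], Y[1:]
coef, *_ = np.linalg.lstsq(past, present, rcond=None)
A1_hat = coef.T
print(np.round(A1_hat, 2))
```

The estimate should be close to `A1`; the zero in the top-right position reflects that the second variable does not help predict the first in this simulated system.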
46Vector Autoregression (VAR)
Example
Consider a case in which the number of variables n is 2, the number of lags p is 1 and the constant term is suppressed. For concreteness, let the two variables be called money, $m_t$, and output, $y_t$.
The structural equation will be
47Vector Autoregression (VAR)
Example
- Then, the reduced form is
48Vector Autoregression (VAR)
Example
Among the statistics computed from VARs that are helpful in assessing Granger causality:
- Granger causality tests, which have been interpreted as testing, for example, the validity of the monetarist proposition that autonomous variations in the money supply have been a cause of output fluctuations.
49Vector Autoregression (VAR)
Granger Causality
- In a regression analysis, we deal with the
dependence of one variable on other variables,
but it does not necessarily imply causation.
- In our GDP and M example, the often asked question is whether GDP → M or M → GDP. Since we have two variables, we are dealing with bilateral causality.
- Given the previous GDP and M VAR equations
50Vector Autoregression (VAR)
Granger Causality
- We can distinguish four cases
- Unidirectional causality from M to GDP
- Unidirectional causality from GDP to M
- Feedback or bilateral causality
- Independence (no causality in either direction)
- Assumptions
- Stationary variables for GDP and M
- Error terms are uncorrelated; if they are correlated, an appropriate transformation is necessary
51Vector Autoregression (VAR)
Granger Causality Estimation (t-test)
A variable, say $m_t$, is said to fail to Granger cause another variable, say $y_t$, relative to an information set consisting of past $m$'s and $y$'s if
$$E[y_t \mid y_{t-1}, m_{t-1}, y_{t-2}, m_{t-2}, \dots] = E[y_t \mid y_{t-1}, y_{t-2}, \dots].$$
$m_t$ does not Granger cause $y_t$ relative to an information set consisting of past $m$'s and $y$'s iff the reduced-form coefficient with subscript 21 equals 0.
$y_t$ does not Granger cause $m_t$ relative to an information set consisting of past $m$'s and $y$'s iff the reduced-form coefficient with subscript 12 equals 0.
- In a bivariate case, as in our example, a t-test can be used to test the null hypothesis that one variable does not Granger cause another variable. In higher order systems, an F-test is used.
52Vector Autoregression (VAR)
Granger Causality Estimation (F-test)
1. Regress current GDP on all lagged GDP terms but do not include the lagged M variables (restricted regression). From this, obtain the restricted residual sum of squares, $RSS_R$.
2. Run the regression including the lagged M terms (unrestricted regression). Also get the residual sum of squares, $RSS_{UR}$.
3. The null hypothesis is $H_0$: the coefficients on the lagged M terms are all 0, that is, the lagged M terms do not belong in the regression.
4. Compute the F statistic $F = \dfrac{(RSS_R - RSS_{UR})/q}{RSS_{UR}/df}$, where $q$ is the number of lagged M terms and $df$ is the number of observations minus the number of parameters in the unrestricted regression.
5. If the computed F > critical F value at a chosen level of significance, we reject the null, in which case the lagged M terms belong in the regression. This is another way of saying that M Granger-causes GDP.
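The restricted versus unrestricted comparison can be sketched in a few lines. The helper `granger_f`, the simulated `m` and `gdp` series, and the textbook form of the statistic, F = ((RSS_R - RSS_UR) / q) / (RSS_UR / df), are assumptions made for illustration; the critical value would still have to be looked up in an F table for (q, df) degrees of freedom.

```python
import numpy as np

def granger_f(effect, cause, lags=1):
    """F statistic for H0: lagged `cause` terms do not belong in the
    regression of `effect` on its own lags."""
    n = len(effect)

    def lagged(series):
        # One column per lag j = 1..lags, aligned with effect[lags:].
        return np.column_stack(
            [series[lags - j: n - j] for j in range(1, lags + 1)]
        )

    target = effect[lags:]
    const = np.ones((len(target), 1))

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    restricted = np.hstack([const, lagged(effect)])                   # step 1
    unrestricted = np.hstack([const, lagged(effect), lagged(cause)])  # step 2
    rss_r, rss_ur = rss(restricted), rss(unrestricted)
    df = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_ur) / lags) / (rss_ur / df)

# Hypothetical data in which M drives GDP but not the other way round.
rng = np.random.default_rng(4)
n = 500
m, gdp = np.zeros(n), np.zeros(n)
for t in range(1, n):
    m[t] = 0.5 * m[t - 1] + rng.normal()
    gdp[t] = 0.5 * gdp[t - 1] + 0.8 * m[t - 1] + rng.normal()

f_m_to_gdp = granger_f(gdp, m)
f_gdp_to_m = granger_f(m, gdp)
print(f"F (M -> GDP): {f_m_to_gdp:.1f}")
print(f"F (GDP -> M): {f_gdp_to_m:.1f}")
```

A large F in one direction and a small F in the other corresponds to the unidirectional case; large values in both directions would indicate feedback.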
53Criticisms of Causality Tests
- The Granger causality test, much used in VAR modelling, does not explain some aspects of the VAR
- It does not give the sign of the effect; we do not know if it is positive or negative
- It does not show how long the effect lasts.
- It does not provide evidence of whether this effect is direct or indirect.
55Max Planck at centre, 1931
Prof. Dr. Bernhard Schölkopf
Kun Zhang
56Approach 2
- Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models
Kun Zhang, Aapo Hyvärinen
57Background
- Model-based causal discovery assumes a generative model to explain the data generating process.
- If the assumed model is close to the true one, such methods could not only detect the causal relations, but also discover the form in which each variable is influenced by others.
- For example,
- Granger causality assumes that effects must follow causes and that the causal effects are linear (Granger, 1980).
- If the data are generated by a linear acyclic causal model and at most one of the disturbances is Gaussian, independent component analysis (ICA) (Hyvärinen et al., 2001) can be exploited to discover the causal relations in a convenient way (Shimizu et al., 2006).
58Shortcomings
- Previous models were too restrictive for real-life problems.
- If the assumed model is wrong, model-based causal discovery may give misleading results.
59Zhang Approach
- In a large class of real-life problems, the following three effects usually exist.
- 1. The effect of the causes is usually nonlinear.
- 2. The final effect received by the target variable from all its causes contains some noise which is independent from the causes.
- 3. Sensors or measurements may introduce nonlinear distortions into the observed values of the variables.
- Assumption: the nonlinearities involved are invertible.
60- Proposed Solution
- Each observed variable is a nonlinear function of its parents with additive noise, followed by a nonlinear distortion
- If all nonlinearities are invertible, conditions are given under which the causal relation can be identified
- Two-step method: constrained nonlinear ICA followed by statistical independence tests, to distinguish the cause from the effect in the two-variable case
61Noise Effect in transmission from $pa_i$ to $x_i$
$$x_i = f_{i,2}\big(f_{i,1}(pa_i) + e_i\big)$$
- $f_{i,1}$: nonlinear transformation (not necessarily invertible)
- $f_{i,2}$: nonlinear distortion (continuous and invertible)
First stage: a nonlinear transformation of its parents $pa_i$, denoted by $f_{i,1}(pa_i)$, plus some noise (or disturbance) $e_i$ (which is independent from $pa_i$). Second stage: a nonlinear distortion $f_{i,2}$ is applied to the output of the first stage to produce $x_i$.
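The two-stage generating process can be sketched by simulation. The specific functions are assumptions chosen for illustration only: a square for the first stage (not invertible) and an exponential for the second stage (continuous and invertible, with the logarithm as its inverse).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

def f_1(parent):
    """First stage: nonlinear effect of the parent (need not be invertible)."""
    return parent ** 2

def f_2(z):
    """Second stage: continuous, invertible distortion."""
    return np.exp(z)

def f_2_inverse(x):
    """Inverse of the second-stage distortion."""
    return np.log(x)

pa = rng.uniform(-2, 2, size=n)        # parent variable
e = rng.laplace(scale=0.3, size=n)     # disturbance, independent of the parent
x = f_2(f_1(pa) + e)                   # observed effect

# Undoing the distortion and subtracting the first stage recovers the noise.
e_recovered = f_2_inverse(x) - f_1(pa)
print(np.allclose(e_recovered, e))
```

With the true functions known, the disturbance is recovered exactly; the estimation problem is to learn such functions from data so that the recovered disturbance is independent of the parent.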
62Zhang Approach
- Suppose the causal relation under examination is $x_1 \to x_2$. If this causal relation holds, there exist nonlinear functions $f_{2,2}$ and $f_{2,1}$ such that $e_2 = f_{2,2}^{-1}(x_2) - f_{2,1}(x_1)$ is independent from $x_1$.
- $y_1 = x_1$, $y_2 = g_2(x_2) - g_1(x_1)$.
- Use multi-layer perceptrons (MLPs) to model the nonlinearities $g_1$ and $g_2$.
- Parameters in $g_1$ and $g_2$ are learned by making $y_1$ and $y_2$ as independent as possible.
63Multilayer Perceptron (MLP)
- A multilayer perceptron (MLP) is
a feedforward artificial neural network model
that maps sets of input data onto a set of
appropriate outputs.
64Zhang Analysis
- $y_1$ and $y_2$ produced by the first step are the assumed cause and the estimated corresponding disturbance, respectively.
- In the second step, one needs to verify if they are independent.
- If $y_1$ and $y_2$ are independent, it implies $x_1$ causes $x_2$, and that $g_1$ and $g_2$ provide an estimate of $f_{2,1}$ and $f_{2,2}^{-1}$, respectively.
65Success !!
- The Zhang approach solved the "CauseEffectPairs" problem in the Pot-luck challenge, and successfully distinguished causes from effects
- Earned reward: 200
66Approach 3
- Nonlinear causal discovery with additive noise models
Patrik O. Hoyer, Dominik Janzing, Joris Mooij, Jonas Peters, Bernhard Schölkopf
67- Claim
- "Non-linearities are a blessing rather than a curse" (Hoyer)
Idea: In reality, many causal relationships are non-linear. How about generalizing the basic linear framework to non-linear models??
68Hoyer Approach
- When causal relationships are nonlinear, this typically helps break the symmetry between the observed variables and allows the identification of causal directions.
- As Friedman and Nachman have pointed out, non-invertible functional relationships between the observed variables can provide clues to the generating causal model.
- We show that the phenomenon is much more general: for nonlinear models with additive noise, almost any nonlinearities (invertible or not) will typically yield identifiable models.
69Hoyer Approach
- Model
- $x_i = f_i(x_{pa(i)}) + n_i$
- where
- $f_i$ is an arbitrary function (possibly different for each $i$),
- $x_{pa(i)}$ is a vector containing the elements $x_j$ such that there is an edge from $j$ to $i$ in the DAG $G$,
- the noise variables $n_i$ may have arbitrary probability densities $p_{n_i}(n_i)$.
70Hoyer Model Estimation
- Test whether $x$ and $y$ are statistically independent.
- If not: test whether a model $y = f(x) + n$ is consistent with the data, simply by doing a nonlinear regression of $y$ on $x$ (to get an estimate $\hat{f}$ of $f$), calculating the corresponding residuals $\hat{n} = y - \hat{f}(x)$, and testing whether $\hat{n}$ is independent of $x$.
- If so, accept the model $y = f(x) + n$; if not, reject it.
- Similarly, test whether the reverse model $x = g(y) + n$ fits the data.
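The regress-then-test procedure above can be sketched as follows. Two substitutions are made here for brevity and should be read as assumptions, not as the authors' setup: a degree-5 polynomial fit stands in for the nonlinear regression, and a raw HSIC dependence score (biased estimate, Gaussian kernels, no p-value) stands in for the statistical independence test. The simulated data and function names are hypothetical.

```python
import numpy as np

def hsic(u, v):
    """Biased HSIC dependence score with Gaussian kernels on standardized
    inputs; larger values indicate stronger dependence."""
    def gram(z):
        z = (z - z.mean()) / z.std()
        return np.exp(-((z[:, None] - z[None, :]) ** 2) / 2.0)
    n = len(u)
    centering = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(gram(u) @ centering @ gram(v) @ centering)) / n ** 2

def anm_dependence(cause, effect, degree=5):
    """Regress effect on cause, then score how strongly the residuals
    still depend on the assumed cause."""
    c = (cause - cause.mean()) / cause.std()
    coeffs = np.polyfit(c, effect, degree)
    residuals = effect - np.polyval(coeffs, c)
    return hsic(residuals, c)

# Hypothetical data generated as y = f(x) + n with f(x) = x^3.
rng = np.random.default_rng(6)
x = rng.uniform(-2, 2, size=500)
y = x ** 3 + rng.normal(size=500)

forward = anm_dependence(x, y)    # model "x causes y"
backward = anm_dependence(y, x)   # model "y causes x"
print(f"residual dependence, x -> y: {forward:.5f}")
print(f"residual dependence, y -> x: {backward:.5f}")
```

The direction whose residuals look independent of the assumed cause (the smaller score) is the one an additive noise model would accept; a real analysis would turn the score into a p-value before accepting or rejecting either model.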
71Hoyer Test Results
- Obtains a p-value of 0.5 for the (forward) model "current duration causes next interval length" and
- a p-value of $4.4 \times 10^{-9}$ for the (backward) model "next interval length causes current duration"
72Hoyer Test Results
- The Abalone dataset from the UCI ML repository
- The correct model "age causes length" leads to a p-value of 0.19,
- The reverse model "length causes age" comes with $p < 10^{-15}$
73Hoyer Test Results
- Temperature-Altitude statistics
- The correct model "altitude causes temperature" leads to $p = 0.017$,
- "Temperature causes altitude" can clearly be rejected ($p = 8 \times 10^{-15}$)
74Approach 4
- A Linear Non-Gaussian Acyclic Model for Causal Discovery (LINGAM)
Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, Antti Kerminen
75Approach: Use of Independent Component Analysis (ICA), called Linear Non-Gaussian Acyclic Model (LINGAM) analysis. When working with continuous-valued data, a significant advantage can be achieved by departing from the Gaussianity assumption.
- Assumptions
- Data generating process is linear
- No unobserved confounders
- Disturbance variables have non-Gaussian distributions of non-zero variances
76LINGAM Model
- Linear Non-Gaussian Acyclic Model
- Data Generating process
77LINGAM Idea
- Key to Solution
- Observed variables are linear functions of the disturbance variables, and the disturbance variables are mutually independent and non-Gaussian.
- $x = Bx + e$,
- $x = Ae$,
- where $A = (I - B)^{-1}$.
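The equivalence of the two forms can be checked numerically. This is a small sketch with a hypothetical strictly lower-triangular connection matrix `B` and uniform (hence non-Gaussian) disturbances; none of the numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical connection strengths: x0 -> x1, x0 -> x2, x1 -> x2.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [-0.5, 1.2, 0.0]])
A = np.linalg.inv(np.eye(3) - B)          # A = (I - B)^{-1}

e = rng.uniform(-1, 1, size=(3, 1000))    # independent, non-Gaussian disturbances
x = A @ e                                 # mixing form: x = A e

# The same samples satisfy the structural form x = B x + e.
print(np.allclose(x, B @ x + e))
```

Because x = Ae is exactly the ICA mixing model, an ICA estimate of A (or of W = A^{-1}) carries the information needed to recover B.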
78LINGAM Algorithm
- LINGAM can be briefly summarized as follows
- First, use a standard ICA algorithm (e.g., the FastICA algorithm) to obtain an estimate of the mixing matrix $A$ (or equivalently of $W$),
- subsequently permute it and normalize it appropriately before using it to compute $B$ containing the sought connection strengths $b_{ij}$.
79LINGAM Algorithm
- (1) Given an $m \times n$ data matrix $X$ ($m \ll n$), where each column contains one sample vector $x$:
- (a) Subtract the mean from each row of $X$
- (b) Apply ICA to get $X = AS$, where $S$ contains the independent components in its rows
- (c) Note $W = A^{-1}$
- (2) Find $W_1$, where $W_1$ contains no zeros on the main diagonal and is obtained by permuting the rows of $W$.
- (3) Divide each row of $W_1$ by the corresponding diagonal element to get $W_1'$ with all 1s on the main diagonal
80LINGAM Algorithm
- (4) Find $B$ such that $B = I - W_1'$
- (5) To find the causal order, find the permutation matrix $P$ of $B$ which yields $\tilde{B} = P B P^T$ as close as possible to strictly lower triangular; closeness can be measured using $\sum_{i<j} \tilde{B}_{ij}^2$
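The permutation and normalization steps can be sketched on a toy example. To keep the sketch self-contained, the ICA step is skipped: `W_ica` is constructed by hand as the true $I - B$ with its rows shuffled and rescaled, which is the ambiguity an ICA estimate would carry. The matrix values, the brute-force search over permutations, and the cost used to avoid zeros on the diagonal (sum of 1/|W_ii|) are assumptions made for illustration and only scale to a handful of variables.

```python
import itertools
import numpy as np

# Hypothetical ground truth: x2 -> x0, x0 -> x1, x2 -> x1.
B_true = np.array([[0.0, 0.0, 0.8],
                   [1.2, 0.0, -0.5],
                   [0.0, 0.0, 0.0]])
W_true = np.eye(3) - B_true
# Stand-in for the ICA output: rows in unknown order and unknown scale.
W_ica = np.diag([2.0, -0.5, 3.0]) @ W_true[[1, 2, 0]]

p = W_ica.shape[0]
perms = list(itertools.permutations(range(p)))

def diagonal_cost(rows):
    """Large when the row ordering puts a (near-)zero on the diagonal."""
    return sum(1.0 / max(abs(W_ica[r, i]), 1e-12) for i, r in enumerate(rows))

best_rows = min(perms, key=diagonal_cost)          # step 2
W1 = W_ica[list(best_rows)]
W1_scaled = W1 / np.diag(W1)[:, None]              # step 3
B_hat = np.eye(p) - W1_scaled                      # step 4

def upper_mass(order):
    """Sum of squared entries above the diagonal after reordering variables."""
    permuted = B_hat[np.ix_(order, order)]
    return float(np.sum(np.triu(permuted, k=1) ** 2))

causal_order = min(perms, key=upper_mass)          # step 5
print(causal_order)  # variables listed from first cause to last effect
```

Here the recovered order places x2 first, then x0, then x1, matching the hypothetical ground truth used to build the example.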
81Practical Experiments
- Project
- Detecting Covert Links in Instant Messaging (IM)
Networks Using Flow Level Log Data
82Introduction
- Users send Instant Messages (IM) to a relay server
- The relay server forwards messages to the corresponding users
- All packets contain only the user and server IP addresses as source and destination
Scenario 1
83Introduction
- Users may be communicating behind a proxy server
- Users behind proxy servers are visible in Scenario 2.
Scenario 2
84Data Set
- Yahoo! Messenger IM network.
- Data Set Details
- Area: New York City area.
- Time: 12am to 12am
- Data Set Files
- Input Data File
- User-to-server traffic traces.
- Ground Truth Data File
- Record of the actual user-to-user connections.
85Data Set Statistics
Time          Duration  Users   Messages  Sessions
8:00-8:10 am  10 mins   3,420   15,370    1,968
8:00-8:20 am  20 mins   5,405   33,192    3,265
8:00-8:30 am  30 mins   7,438   53,649    4,661
8:00-8:40 am  40 mins   9,513   75,810    6,179
8:00-8:50 am  50 mins   11,684  99,721    7,669
8:00-9:00 am  60 mins   13,953  126,694   9,264
86Granger Causality
F-test statistics for Granger Causality test
87Zhang Approach Results
Zhang results for talking and non-talking pairs
for IM networks in Yahoo!
88Just for Knowledge
- Classifier Tool
- WEKA (Waikato Environment for Knowledge Analysis): a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand
Weka: a bird found in New Zealand, a vulnerable species.
89WEKA
90Conclusion
91Given: A causes B. To prove: must A and B be correlated?? Result: YES or NO, and why?? Can you show it??
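One way to probe the question is a tiny numerical sketch. The values are hypothetical: B is computed directly from A, so A fully determines B, yet the linear association between them can still be checked.

```python
# Hypothetical values: A is symmetric around zero and B = A^2,
# so B is completely determined by A.
a = [-2, -1, 0, 1, 2]
b = [value ** 2 for value in a]

mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)
covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)
print(covariance)  # 0.0
```

With zero covariance the Pearson correlation is zero as well, so in this constructed case a deterministic dependence leaves no linear correlation; correlation measures only the linear part of a relationship.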