Title: Causal Data Mining
1Causal Data Mining
Richard Scheines Dept. of Philosophy, Machine
Learning, Human-Computer Interaction
Carnegie Mellon
21. Predictive Data Mining
- Finding predictive relationships in data
- What feature of student behavior predicts learning
- Who will default on credit cards
- Who will get an A in your course
- Which HS students will do well at CMU
- Do students cluster by learning style
3Causal Data Mining
- Finding causal relationships in data
- What feature of student behavior causes learning
- What will happen when we make everyone take a reading quiz before each class
- What will happen when we program our tutor to intervene to give hints after an error
4Predictive Data Mining
Row  X1   X2   X3   ...  Xk   Y
1    1.7  28   M    ...  2.4  1
2    2.0  11   F    ...  1.1  0
3    1.9  17   F    ...  1.1  1
...  ...  ...  ...  ...  ...  ...
N    2.8  12   M    ...  1.8  0
Data Mining Search
Predictive Model: Y = f(X1, X2, ..., Xk)
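The step from data table to predictive model can be sketched as a small search. This is a minimal illustration, not the method used in the talk: it takes only the four rows visible in the table above, drops the non-numeric X3 column, and exhaustively searches one-feature threshold rules for the one that best predicts Y. The names `ROWS`, `accuracy`, and `search_stumps` are hypothetical.

```python
# Rows copied from the visible part of the slide's table: (X1, X2, Xk, Y).
# X3 (M/F) is left out to keep this sketch numeric-only.
ROWS = [
    (1.7, 28, 2.4, 1),
    (2.0, 11, 1.1, 0),
    (1.9, 17, 1.1, 1),
    (2.8, 12, 1.8, 0),
]
FEATURES = ["X1", "X2", "Xk"]


def accuracy(rows, col, threshold, predict_one_if_above):
    """Fraction of rows where the threshold rule on column `col` matches Y."""
    hits = 0
    for row in rows:
        above = row[col] > threshold
        predicted = 1 if above == predict_one_if_above else 0
        hits += predicted == row[-1]
    return hits / len(rows)


def search_stumps(rows, features):
    """Exhaustively search one-feature threshold rules; return the best one."""
    best = None  # (accuracy, feature name, threshold, predict_one_if_above)
    for col, name in enumerate(features):
        for threshold in sorted({row[col] for row in rows}):
            for direction in (True, False):
                score = accuracy(rows, col, threshold, direction)
                # Keep the first rule found with a strictly higher score.
                if best is None or score > best[0]:
                    best = (score, name, threshold, direction)
    return best


best_rule = search_stumps(ROWS, FEATURES)
print(best_rule)
```

On these four rows the search settles on "predict Y = 1 when X1 <= 1.9", which fits every row. The rule "predict Y = 1 when X2 > 12" fits equally well, so a sample this small cannot distinguish between them.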
5Predictive Data Mining
- Model Classes
- Simple Regression
- Locally Weighted Regression
- Logistic Regression
- Neural Nets
- Support Vector Machines
- Decision Trees
- Bayes Net
- Naïve Bayes Classifier
- Independent Components
- Clustering
- Etc.
Data Mining Search
Predictive Model: Y = f(X1, X2, ..., Xk)
6Predictive Data Mining
Data Mining Search
Predictive Model under Constraints: Y = f(X1, X2, ..., Xk), e.g., f ∈ Additive functions
7Predictive Data Mining
Data Mining Search
Predictive Model under Constraints: Y = f(X1, X2, ..., Xk)
Or Probability Model under Constraints: P(Y | X1, X2, ..., Xk), where P ∈ Gaussian, with mean 0
8Predictive Data Mining
Decision Tree Search
9Predictive Data Mining ≠ Causal Data Mining
Conditioning is not the same as intervening
- P(Y | X1, X2, ..., Xk)
- ≠
- P(Y | X1_set, X2, ..., Xk)
Teeth Slides
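The gap between conditioning and intervening can be seen in a small simulation. The sketch below assumes a hypothetical model in which smoking causes both yellow teeth and lung cancer, while teeth color has no effect on cancer. The causal structure and every probability in it are assumptions chosen for illustration, not figures from the slides. Under these assumed numbers the exact values are P(cancer | teeth yellow) = 0.0494 / 0.31 ≈ 0.159 and P(cancer | teeth set to yellow) = 0.074.

```python
import random


def sample(rng, set_teeth=None):
    """Draw one person from a hypothetical smoking -> {teeth, cancer} model.

    If set_teeth is not None, yellow teeth is forced to that value
    (an intervention), which cuts its dependence on smoking.
    """
    smoker = rng.random() < 0.3                              # assumed rate
    if set_teeth is None:
        teeth = rng.random() < (0.8 if smoker else 0.1)      # assumed rates
    else:
        teeth = set_teeth
    cancer = rng.random() < (0.2 if smoker else 0.02)        # teeth has no effect
    return teeth, cancer


def rate(pairs):
    """Share of cancer cases among the given (teeth, cancer) draws."""
    pairs = list(pairs)
    return sum(c for _, c in pairs) / len(pairs)


rng = random.Random(0)
n = 200_000
observed = [sample(rng) for _ in range(n)]
intervened = [sample(rng, set_teeth=True) for _ in range(n)]

p_cancer = rate(observed)
p_cancer_given_teeth = rate(p for p in observed if p[0])   # conditioning
p_cancer_set_teeth = rate(intervened)                      # intervening

print(f"P(cancer)                 ~ {p_cancer:.3f}")
print(f"P(cancer | teeth yellow)  ~ {p_cancer_given_teeth:.3f}")
print(f"P(cancer | teeth set)     ~ {p_cancer_set_teeth:.3f}")
```

Observing yellow teeth raises the estimated probability of cancer because it is evidence of smoking. Setting teeth to yellow leaves the probability at its baseline, because the intervention does not change who smokes.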
10Causal Discovery: Statistical Data → Causal Structure
11Causal Discovery Software TETRAD IV
www.phil.cmu.edu/projects/tetrad
12Full Semester Online Course in Causal and Statistical Reasoning
13Full Semester Online Course in Causal and Statistical Reasoning
- Course is tooled to record certain events
- Logins, page requests, print requests, quiz attempts, quiz scores, voluntary exercises attempted, etc.
- Each event was associated with attributes
- Time
- student-id
- Session-id
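One way to picture such an event log is as a list of records carrying the three attributes listed above, which can then be rolled up into per-student variables for analysis. The sketch below is an assumption about a convenient representation, not the course's actual logging schema. The `Event` class, the `kind` labels, the time unit, and the sample entries are all made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str        # hypothetical labels, e.g. "login", "print_request"
    time: float      # seconds from an arbitrary reference point (assumed unit)
    student_id: str
    session_id: str


def count_by_student(events, kind):
    """Count events of one kind per student."""
    return Counter(e.student_id for e in events if e.kind == kind)


# Made-up log entries, for illustration only.
log = [
    Event("login", 0.0, "s01", "a"),
    Event("print_request", 12.5, "s01", "a"),
    Event("quiz_attempt", 40.0, "s01", "a"),
    Event("login", 5.0, "s02", "b"),
    Event("print_request", 9.0, "s02", "b"),
    Event("print_request", 30.0, "s02", "b"),
]

prints = count_by_student(log, "print_request")
print(dict(prints))
```

Counts like these turn a raw stream of events into one row per student, which is the table shape that both predictive and causal search procedures expect.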
14Printing and Voluntary Comprehension Checks: 2002 --> 2003