Title: K. P. Unnikrishnan
1Data Mining Methods for Electronic Medical Records
Collaborators
- Indian Institute of Science: P. S. Sastry, (Srivatsan Laxman)
- Univ. Michigan: Vijay Nair, Casey Diekman, Kohinoor Dasgupta
- Virginia Tech: Naren Ramakrishnan, Debprakash Patnaik
- UC Davis: Anne Smith
- UCSF: Loren Frank
- RIKEN: Kazuo Okanoya
- Wayne State: Sorin Draghici
2Applications
3Discovering episodes with temporal constraints
- In neuroscience, delays arise from synaptic and axonal transmission
- Automatic discovery of inter-event intervals and episodes
- Inter-event times of event occurrences carry valuable information
- Goal: unearthing network connectivity patterns
4Graph Edges Patterns in data
[Figure: graph with nodes U, X, Y, Z, G, M and edge delays tUX, tXY, tYZ, tGM, shown against a time axis t]
A. Efficient level-wise mining
   Counting all episodes above a threshold
B. Discovering inter-event intervals
   Discovering the best-fit interval from a supplied set
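Step B can be illustrated with a minimal sketch. It assumes that "best fit" means the candidate interval under which the most A events are followed by a B event, and that interval bounds are inclusive; both are assumptions made for illustration, and the names `count_pairs` and `best_fit_interval` are hypothetical, not part of any tool named in these slides.

```python
from bisect import bisect_left, bisect_right


def count_pairs(times_a, times_b, low, high):
    """Count A events followed by at least one B with low <= delay <= high.

    Inclusive bounds are an assumption of this sketch.
    """
    times_b = sorted(times_b)
    count = 0
    for t in times_a:
        # Indices delimiting the B times that fall inside [t + low, t + high].
        first = bisect_left(times_b, t + low)
        last = bisect_right(times_b, t + high)
        if last > first:
            count += 1
    return count


def best_fit_interval(times_a, times_b, candidates):
    """Pick the candidate (low, high) interval with the highest pair count."""
    counts = {iv: count_pairs(times_a, times_b, *iv) for iv in candidates}
    best = max(candidates, key=lambda iv: counts[iv])
    return best, counts


# Hypothetical event times: B tends to follow A after about 5 time units.
times_a = [0, 10, 20, 30]
times_b = [5, 15, 25, 36]
candidates = [(0, 2), (2, 4), (4, 6)]
best, counts = best_fit_interval(times_a, times_b, candidates)
```

With these made-up times, only the (4, 6) candidate collects any pairs, so it is the one selected.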
5Tracking an Evolving Network
6Discovering Rare Patterns
- 10,000 spikes from 26 neurons
- 11 spikes (0.1% of the total) are in a pattern: G-M-B-E-A-T-S-F-O-R-D
- A single occurrence is statistically significant
7Mining EMR using GMiner
- Example EMR
  Patient ID_0
  Recorded medical event "DIAG_1" on Day 0
  Recorded medical event "DIAG_3" on Day 1
  Recorded medical event "PRES_1" on Day 5
  Recorded medical event "PRES_3" on Day 6
  Recorded medical event "EVT_L" on Day 7
  Recorded medical event "TEST_4" on Day 7
  ...
- Embedded patterns
  - TEST_1 -> TEST_2 -> DIAG_1 -> PRES_1
  - TEST_3 -> TEST_4 -> DIAG_2 -> PRES_2
  - TEST_5 -> DIAG_3 -> PRES_3
- GMiner Results
  - No. of 3-node frequent episodes: 5
    - TEST_5 -[4-6]-> DIAG_3 -[4-6]-> PRES_3 (0.78141) 242
  - No. of 4-node frequent episodes: 2
    - TEST_1 -[4-6]-> TEST_2 -[4-6]-> DIAG_1 -[4-6]-> PRES_1 (0.81822) 187
    - TEST_3 -[4-6]-> TEST_4 -[4-6]-> DIAG_2 -[4-6]-> PRES_2 (0.80452) 175
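Before records like the example EMR can be mined, they have to be turned into a time-stamped event sequence. The sketch below parses the record lines exactly as they are worded on this slide into (event, day) pairs. The actual input format expected by GMiner is not shown in the slides, so the regular expression and the `parse_emr` helper are assumptions made purely for illustration.

```python
import re

# Matches lines worded like the slide's example, e.g.
#   Recorded medical event "DIAG_1" on Day 0
LINE_RE = re.compile(r'Recorded medical event "([^"]+)" on Day (\d+)')


def parse_emr(text):
    """Return a list of (event_name, day) tuples in order of appearance."""
    return [(m.group(1), int(m.group(2))) for m in LINE_RE.finditer(text)]


# The record lines listed for Patient ID_0 on the slide.
SAMPLE = '''
Patient ID_0
Recorded medical event "DIAG_1" on Day 0
Recorded medical event "DIAG_3" on Day 1
Recorded medical event "PRES_1" on Day 5
Recorded medical event "PRES_3" on Day 6
Recorded medical event "EVT_L" on Day 7
Recorded medical event "TEST_4" on Day 7
'''

events = parse_emr(SAMPLE)
```

The resulting list of pairs is the kind of event sequence over which episodes with inter-event delays (such as 4 to 6 days) can then be counted.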
8Imaginary Situation 1
- Patients arriving in Emergency Department (ED)
- Events: diagnostic tests, EMR (historical data), represented here as letters of the alphabet
- Event patterns can be discovered
- Patients can be flagged as high-priority (based on partial patterns)
A -[15-30 min]-> B -[15-30 min]-> C -[15-30 hours]-> Y
[Figure: event sequences along a time axis, labeled Historical Data, Patient 1, Patient 2 and Patient 3, with the annotation "Raise flag at current time". Sequences shown: MAZXYCQBGMQPTARYCDJBSPASWCJDGMDYZXHGDH; ZXHADHOTCBFAKVPCLVIRXY; SARYCDJBSPASWCJDGMDYKVPQLVIRX]
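Flagging on a partial pattern can be sketched as follows. The code looks for the longest prefix of the slide's pattern (A, then B, then C, then Y) that a patient's events have matched so far, and raises a flag when A, B and C have occurred within their delays but Y has not yet been seen. Times are in minutes, interval bounds are treated as inclusive, and the flagging rule (three matched events) is an assumption of this sketch. The names `longest_prefix` and `should_flag` are hypothetical.

```python
SYMBOLS = ["A", "B", "C", "Y"]
# Delays between consecutive pattern events, in minutes (from the slide:
# 15-30 min, 15-30 min, 15-30 hours). Inclusive bounds are an assumption.
INTERVALS = [(15, 30), (15, 30), (15 * 60, 30 * 60)]


def longest_prefix(events, symbols=SYMBOLS, intervals=INTERVALS):
    """Length of the longest pattern prefix matched by time-ordered events."""
    # ends[k] holds the times at which a match of symbols[:k + 1] can end.
    ends = [[] for _ in symbols]
    for sym, t in events:
        # Go from the last position down so one event extends only
        # matches built from strictly earlier events.
        for k in range(len(symbols) - 1, -1, -1):
            if sym != symbols[k]:
                continue
            if k == 0:
                ends[0].append(t)
            else:
                low, high = intervals[k - 1]
                if any(low <= t - prev <= high for prev in ends[k - 1]):
                    ends[k].append(t)
    matched = 0
    for k, times in enumerate(ends):
        if times:
            matched = k + 1
    return matched


def should_flag(events, min_prefix=3):
    """Flag when a partial (but not yet complete) pattern has been seen."""
    matched = longest_prefix(events)
    return min_prefix <= matched < len(SYMBOLS)


# Hypothetical patient timelines as (event, minutes since arrival).
on_pattern = [("A", 0), ("Q", 5), ("B", 20), ("C", 45)]
off_pattern = [("A", 0), ("B", 50), ("C", 70)]
```

Here `should_flag(on_pattern)` is true because A, B and C arrived within their windows, while `off_pattern` is not flagged because B came 50 minutes after A.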
9GMiner Graph Visualization
10Imaginary Situation 2
- 1,000 patients come through the hospital
- Most of these events occur independently of one another or with weak dependence
- 2 of these patients have the same condition and show the sequence of events we looked at before
  - A -[4 to 6 hours]-> B -[1 to 3 hours]-> C -[5 to 7 hours]-> Y
- However, another 2 of the patients also have the same condition, but a different pattern of events occurs with different time delays
  - A -[9 to 11 hours]-> B -[3 to 5 hours]-> C -[11 to 13 hours]-> N
- Imagine that event Y represents a positive outcome, while event N represents a negative outcome.
11GMiner Results (Simulation Example 2)
12Backup
13Complex dynamical system
- What is the problem we are trying to solve?
- Large graph with many nodes and edges
- Activity from many of the nodes is available
- How do we get the graph (strength, direction, delay) out?
14With inter-event time constraints
- Inter-event times in serial episodes
- Inter-event expiry constraint (0 < Δt_i < T_X)
- Inter-event interval constraint (T_low < Δt_i < T_high)
15Counting Episodes with inter-event constraints
- Complex state transitions required for counting with inter-event constraints
- Space complexity O(mnC) and time complexity O(mnC)
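[Figure: counting automaton with states Accept_A(), Accept_B(), Accept_C(), Accept_D(); a data read head scans the event sequence A1, B4, A5, C10, B12, C13, D17, A2 (event type followed by time); the labels 5 and 10 also appear in the figure]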
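The counting idea can be sketched in a few lines. The code below counts non-overlapped occurrences of a serial episode whose consecutive events must satisfy an inter-event interval constraint, using the strict inequalities stated on the constraints slide. It keeps, for each episode position, the list of times at which a partial match can end, and resets once a full occurrence is found. This is a simplified illustration under those assumptions, not the exact state machine from the figure, and the interval values passed in the example are made up.

```python
def count_non_overlapped(events, symbols, intervals):
    """Count non-overlapped occurrences of a serial episode.

    events:    iterable of (event_type, time) pairs
    symbols:   episode event types in order, e.g. ["A", "B", "C", "D"]
    intervals: one (low, high) pair per consecutive symbol pair; a delay dt
               is accepted when low < dt < high (strict, as on the slide)
    """
    n = len(symbols)
    # ends[k] holds times at which a match of symbols[:k + 1] can end.
    ends = [[] for _ in range(n)]
    count = 0
    for sym, t in sorted(events, key=lambda e: e[1]):
        for k in range(n - 1, -1, -1):
            if sym != symbols[k]:
                continue
            if k == 0:
                ends[0].append(t)
                reached = True
            else:
                low, high = intervals[k - 1]
                reached = any(low < t - prev < high for prev in ends[k - 1])
                if reached:
                    ends[k].append(t)
            if reached and k == n - 1:
                # Full occurrence found: count it and start afresh so that
                # counted occurrences do not overlap.
                count += 1
                ends = [[] for _ in range(n)]
                break
    return count


# Event/time pairs as listed in the figure.
figure_events = [("A", 1), ("B", 4), ("A", 5), ("C", 10),
                 ("B", 12), ("C", 13), ("D", 17), ("A", 2)]
episode = ["A", "B", "C", "D"]
```

For example, with a hypothetical interval of (0, 10) on every edge, the figure's events contain one occurrence (A at 1, B at 4, C at 10, D at 17); with (5, 10) on every edge there is none.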
16Parallelizing counting
- Run several parallel automata at different start states for the same episode (Map step)
- Merge count and state info from each automaton (Concatenate step)
- Implemented on Nvidia GTX280 GPU (1.3 GHz clock, 1 GB device memory)
- 200X speed-up w.r.t. CPU
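The map and concatenate steps can be illustrated on the CPU with a deliberately simplified automaton: a serial episode without inter-event constraints, counted in non-overlapped fashion. Each data segment is processed once per possible start state, and the per-segment results are then stitched together by following the end state of one segment into the next. This is a sketch of the idea only, written in plain Python rather than GPU code, and all function names are hypothetical.

```python
def run_segment(segment, symbols, start_state):
    """Run the episode automaton over one segment from a given start state.

    Returns (occurrences completed in this segment, state at segment end).
    """
    state, count = start_state, 0
    for sym in segment:
        if sym == symbols[state]:
            state += 1
            if state == len(symbols):
                count += 1
                state = 0
    return count, state


def map_step(segment, symbols):
    """One automaton per start state; each run is independent of the others."""
    return [run_segment(segment, symbols, s) for s in range(len(symbols))]


def concatenate_step(tables):
    """Chain segments: the end state of one selects the row of the next."""
    total, state = 0, 0
    for table in tables:
        count, state = table[state]
        total += count
    return total


def parallel_count(sequence, symbols, n_segments):
    size = max(1, -(-len(sequence) // n_segments))  # ceiling division
    segments = [sequence[i:i + size] for i in range(0, len(sequence), size)]
    # The map_step calls share no state, so they could run concurrently.
    tables = [map_step(seg, symbols) for seg in segments]
    return concatenate_step(tables)


sequence = "ABCABXCAB"
serial = run_segment(sequence, "ABC", 0)[0]
split = parallel_count(sequence, "ABC", 3)
```

Because every segment is evaluated for every start state, the merged count matches the single-pass count regardless of where the segment boundaries fall; the price is one extra automaton run per state.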
17Cortical cultures on micro-electrode arrays
18Relative spike counts
[Figure: relative spike counts; axis label: Days]
20Data mining Bayesian GLM
21TDMiner Finds Fault Correlations
22Finding Relevant Fault Correlations
[Figure: statistically significant correlations, with annotations "Problem begins" and "Problem fixed"]
By using TDMiner, the root cause could have been identified 2.5 weeks earlier
23LOMA Robot Problems