1
Physiological Data Modeling
ICML 2004, Banff, Alberta, July 8, 2004
Jack Mott and Matt Pipke, SmartSignal Corporation
2
SmartSignal Corporation
  • Incubator of Similarity-Based Modeling technology
  • Universally applicable
  • Data driven, empirical
  • Scalable, deployable
  • Commercially proven in our eCM software
      • Delta Airlines: all engines, all flights
      • Power plants: Entergy, Dynegy, APS
      • Transportation: GM-EMD, Caterpillar

3
Similarity-Based Modeling
  • Snapshots at instants of time
  • Needs only historical data
  • Removal of normal variations
  • Anomaly detection and isolation
  • One technology for all applications

4
Physiological Data Modeling Method
  • A historical H matrix of reference data is first
    chosen comprising refXi vectors
  • A local D matrix is chosen comprising a small
    number of refXi vectors with the highest
    similarities to a newX vector
  • Identical vectors have similarity 1
  • Non-identical vectors have 0 < similarity < 1
  • The newY model vector is given by
      $\mathrm{newY} = D\,(D^{T} \otimes D)^{-1}\,(D^{T} \otimes \mathrm{newX})$
  • where the similarity operation ($\otimes$) applies only
    to independent variables
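
The estimate above can be sketched in Python with NumPy. The slides do not define the similarity operator, so the `similarity` function below (one over one plus Euclidean distance) is an assumed stand-in. The function names, the small ridge term, and the row-wise storage of vectors are also assumptions. The sketch takes one plausible reading of "applies only to independent variables": similarities are computed on the independent columns, and the resulting weights combine the full reference vectors, so the dependent variables are inferred.

```python
import numpy as np

def similarity(a, b):
    """Assumed similarity: 1 for identical vectors, between 0 and 1 otherwise."""
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def sbm_estimate(H, new_x, indep, k=10, ridge=1e-8):
    """Estimate a full vector newY from the k reference rows most similar to new_x.

    H     : (n, m) array, one reference vector per row (all variables)
    new_x : array holding the independent variable values of the new vector
    indep : column indices of the independent variables in H
    """
    Hx = H[:, indep]
    # Similarity of new_x to every reference vector, independent variables only.
    s_all = np.array([similarity(row, new_x) for row in Hx])
    # Local D matrix: the k reference vectors with the highest similarities.
    top = np.argsort(s_all)[-k:]
    D, Dx = H[top], Hx[top]
    # Pairwise similarities among the chosen vectors (the D^T, D term).
    G = np.array([[similarity(a, b) for b in Dx] for a in Dx])
    # Similarities of the chosen vectors to new_x (the D^T, newX term).
    s = s_all[top]
    # Weights solve G w = s; the ridge term (an assumption) guards against a singular G.
    w = np.linalg.solve(G + ridge * np.eye(len(top)), s)
    # newY is the weighted combination of the chosen reference vectors.
    return w @ D
```

With this choice of similarity, a new vector that matches a reference vector exactly receives a weight of about 1 on that vector and about 0 on the others, so the estimate reproduces the stored reference vector.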

5
Physiological Data
  • 11 independent variables
      • User characteristics (2)
      • Armband sensor values (9)
  • 2 dependent variables
      • Gender number
      • Annotation class

6
Training Data Setup
  • Select 2,500 to 3,000 records for each H matrix
      • One H matrix for gender
      • One H matrix for annotation 3004
      • One H matrix for annotation 5102
  • Each H matrix
      • Includes about equal populations for each user
      • Includes positive and negative examples
      • Contains no vectors too similar to each other
      • Contains only filtered data (99% of total)
  • User 17 excluded
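
The slides do not say how near-duplicate vectors were kept out of an H matrix. One way to do it is a greedy pass, sketched below. The similarity function, the `max_similarity` cap of 0.9, and the function name are all hypothetical. Balancing records across users and across positive and negative examples is left out.

```python
import numpy as np

def select_h_matrix(candidates, max_similarity=0.9, max_records=3000):
    """Greedily keep vectors whose similarity to every kept vector is below a cap."""
    kept = []
    for v in candidates:
        # Assumed similarity: 1 / (1 + Euclidean distance), which is 1 for identical vectors.
        if all(1.0 / (1.0 + np.linalg.norm(v - u)) < max_similarity for u in kept):
            kept.append(v)
        if len(kept) == max_records:
            break
    return np.array(kept)
```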

7
Gender H Matrix
8
Annotation 5102 H Matrix
9
Annotation 3004 H Matrix
10
Training Data Modeling
  • If any vector to be modeled was in an H matrix it
    was removed from the H matrix before the D matrix
    was formed
  • Leave-one-out cross-validation of each H matrix
  • Chose 10 as number of vectors for the D matrices
  • Reduced the number of independent variables to 8
    or 9
  • Modeled all 580,264 unfiltered training vectors
  • Inferred gender with gender H matrix
  • Inferred class with annotation 5102 H matrix
  • Positive examples of annotation 5102 have actual
    class 1
  • Negative examples of annotation 5102 have actual
    class 0
  • Inferred class with annotation 3004 H matrix
  • Positive examples of annotation 3004 have actual
    class 1
  • Negative examples of annotation 3004 have actual
    class 0
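
Removing a vector from the H matrix before forming D matters because an identical reference vector has similarity 1 and would dominate the estimate. Below is a minimal sketch of that exclusion and of the leave-one-out loop. The helper names and the `model` callable are hypothetical.

```python
import numpy as np

def h_without(H, vector):
    """Return a copy of H with every row equal to `vector` removed."""
    keep = ~np.all(np.isclose(H, vector), axis=1)
    return H[keep]

def leave_one_out(H, model):
    """Model each row of H against the remaining rows.

    `model(reference, vector)` is a hypothetical callable that returns the
    estimate for `vector` given a reference matrix.
    """
    return [model(h_without(H, row), row) for row in H]
```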

11
Gender Windows and Thresholds
  • Chose gender windows to contain all vectors in a
    session
  • If the inferred gender was > T for more than ½ the
    vectors in a window, then all vectors in the window
    were assigned predicted gender 1, otherwise
    predicted gender 0
  • T = 0.5 produced sensitivity = 1 and specificity = 1
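
The session-level vote can be sketched as follows. The function name is hypothetical, and `inferred` stands for the inferred gender values of all vectors in one session.

```python
def predict_gender_for_session(inferred, threshold=0.5):
    """Assign one predicted gender (0 or 1) to every vector in a session window."""
    above = sum(1 for g in inferred if g > threshold)
    # Majority rule: more than half of the vectors must exceed the threshold.
    label = 1 if above > len(inferred) / 2 else 0
    return [label] * len(inferred)
```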

12
Annotation 5102 Windows and Thresholds
  • Chose annotation 5102 windows to contain 80
    vectors
  • If the inferred class was > T for more than ½ the vectors
    in a window, then only the vectors in the window from
    the first to the last instance where the inferred
    class was > T were assigned predicted class 1,
    otherwise predicted class 0
  • Sensitivity and specificity varied as T varied to
    produce an ROC curve
  • T = 0.58, where the slope = 1 on the ROC curve
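
A minimal sketch of the windowed rule follows. It assumes the windows are consecutive, non-overlapping blocks, which the slides do not state. The function name is hypothetical.

```python
def predict_classes(inferred, threshold, window_size):
    """Apply the windowed rule to a sequence of inferred class values."""
    predicted = [0] * len(inferred)
    for start in range(0, len(inferred), window_size):
        window = inferred[start:start + window_size]
        hits = [i for i, value in enumerate(window) if value > threshold]
        if len(hits) > len(window) / 2:
            # Mark only the span from the first to the last vector above the threshold.
            for i in range(hits[0], hits[-1] + 1):
                predicted[start + i] = 1
    return predicted
```

For annotation 5102 the slides use a window of 80 vectors and T = 0.58, which corresponds to `predict_classes(inferred, 0.58, 80)`.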

13
Window Sizes for Annotation 5102
14
ROC curve for Annotation 5102
15
Annotation 3004 Windows and Thresholds
  • Chose annotation 3004 windows to contain 30
    vectors
  • If the inferred class was > T for more than ½ the vectors
    in a window, then only the vectors in the window from
    the first to the last instance where the inferred
    class was > T were assigned predicted class 1,
    otherwise predicted class 0
  • Sensitivity and specificity varied as T varied to
    produce an ROC curve
  • T = 0.48, where the slope = 1 on the ROC curve
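
The slides do not show how the slope-1 point was located. For a concave ROC curve, the tangent of slope 1 touches the curve at the point that maximizes sensitivity + specificity - 1 (Youden's index). The sketch below uses that equivalent criterion as a stand-in. For brevity it thresholds each vector directly instead of applying the windowed rule, and it assumes both classes are present. All names are hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for 0/1 predictions against 0/1 labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    positives = sum(actual)
    negatives = len(actual) - positives
    return tp / positives, tn / negatives

def pick_threshold(scores, actual, thresholds):
    """Sweep T and return the value that maximizes sensitivity + specificity - 1."""
    best_t, best_j = None, float("-inf")
    for t in thresholds:
        predicted = [1 if s > t else 0 for s in scores]
        sens, spec = sensitivity_specificity(predicted, actual)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t
```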

16
Window Sizes for Annotation 3004
17
ROC curve for Annotation 3004
18
Training Data Overall Results
  • Gender predictions
      • 23929 (4%) gender 1
      • Sensitivity = 23929 / 23929 = 1
      • 556335 (96%) gender 0
      • Specificity = 556335 / 556335 = 1
  • Annotation 5102 predictions
      • 173759 (30%) class 1
      • Sensitivity = 96288 / 98172 = 0.98
      • 406505 (70%) class 0
      • Specificity = 72251 / 73668 = 0.98
  • Annotation 3004 predictions
      • 80511 (14%) class 1
      • Sensitivity = 4129 / 4413 = 0.94
      • 499753 (86%) class 0
      • Specificity = 157993 / 167368 = 0.94
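
The ratios on this slide can be rechecked with a few lines of Python. The counts are copied from the slide as given, and the dictionary keys are just labels chosen for this sketch.

```python
# (correct, total) pairs copied from the slide for each measure.
counts = {
    "gender sensitivity": (23929, 23929),
    "gender specificity": (556335, 556335),
    "5102 sensitivity": (96288, 98172),
    "5102 specificity": (72251, 73668),
    "3004 sensitivity": (4129, 4413),
    "3004 specificity": (157993, 167368),
}
rates = {name: round(correct / total, 2) for name, (correct, total) in counts.items()}

# Share of the 580,264 training vectors predicted as gender 1 or class 1.
total_vectors = 580264
predicted_positive = {"gender": 23929, "5102": 173759, "3004": 80511}
shares = {name: round(100 * n / total_vectors) for name, n in predicted_positive.items()}

for name, value in rates.items():
    print(f"{name}: {value:.2f}")
for name, value in shares.items():
    print(f"{name}: {value}% predicted positive")
```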

19
Test Data Modeling
  • Modeled all 720,792 unfiltered test vectors
  • Assumed that characteristic 2 was an extremely
    important independent variable in modeling gender
  • Used the appropriate H matrices, D matrix size,
    independent variables, thresholds and window
    sizes developed from the training data
  • Predicted gender
  • Predicted class for annotation 5102
  • Predicted class for annotation 3004

20
Test Data Overall Results
  • Gender predictions
      • 84426 (12%) gender 1 (4% for training data)
      • 636366 (88%) gender 0 (96% for training data)
  • Annotation 5102 predictions
      • 232823 (32%) class 1 (30% for training data)
      • 487969 (68%) class 0 (70% for training data)
  • Annotation 3004 predictions
      • 80511 (11%) class 1 (14% for training data)
      • 640281 (89%) class 0 (86% for training data)

21
Conclusions
  • SBM is easy to apply to real people with real
    armbands
  • Modeling choices (the size of the D matrix and the
    independent variables) are determined from only a
    small fraction of the training records: the H matrix
  • SBM accommodates anomalies in new data
      • Can be applied to raw, unfiltered data
  • SBM is automatically user-specific
      • Presence or absence of a user in new data can be
        detected
  • SBM might be made user-general
      • Transform data into t-scores with zero mean and
        unit standard deviation for each activity
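
The proposed transformation can be sketched for a single variable as below. The slides give no details, so grouping by an activity label alone, using the population standard deviation, and the function name are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def standardize_by_activity(values, activities):
    """Rescale values to zero mean and unit standard deviation within each activity."""
    groups = defaultdict(list)
    for v, a in zip(values, activities):
        groups[a].append(v)
    stats = {a: (mean(vs), pstdev(vs)) for a, vs in groups.items()}
    scaled = []
    for v, a in zip(values, activities):
        m, s = stats[a]
        # A constant group has zero spread; map it to 0 rather than divide by zero.
        scaled.append((v - m) / s if s > 0 else 0.0)
    return scaled
```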