Keystroke Biometric Identification Studies on Long-Text Input - PowerPoint PPT Presentation

Slides: 68
Provided by: Justin304
Learn more at: http://www.csis.pace.edu

1
Keystroke Biometric Identification Studies on
Long-Text Input
  • Mary Villani
  • DPS 2006
  • Fall 2007

2
Objective
  • For long-text input of 600 keystrokes
  • Determine the viability of the keystroke
    biometric across two independent variables:
    • Different entry modes: copy and free text
    • Different keyboards: desktop and laptop

3
Secondary and Tertiary Goals
  • Compare subjects who are aware they are being
    observed with those who are unaware
  • Identify patterns of recognition based on subject
    demographics (e.g., handedness, gender, age,
    language)

4
Biometrics / Biometric Technologies
  • Biometrics
    • Identifying an individual based on his or her
      distinguishing characteristics, or the science of
      identifying, or verifying the identity of, a
      person based on physiological or behavioral
      characteristics (Bolle et al.)
  • Biometric Technologies
    • Automated methods of verifying or recognizing the
      identity of a living person based on a
      physiological or behavioral characteristic
      (Miller, B.)

5
Keystroke Biometrics
  • Keystroke biometrics: identification of a person by
    their personal typing style or keystroke pattern
  • Each individual has a characteristic typing
    ability that is unique (Bolle)
  • Typing biometrics is the analysis of a user's
    keystroke patterns (Conn et al.)

6
Advantages of Keystroke Biometric
  • Keyboards commonly used
  • Not intrusive
  • Inexpensive
  • Can Frequently Re-authenticate the User

7
Literature Search
  • Copy Task
    • Long Text
      • Early studies
      • About 5 major studies (Gaines, Umphress, Leggett)
    • Short Entry / Password Hardening
      • At least 16 studies, the biggest focus
      • Even noted that the longer the word, the higher
        the accuracy
      • Product: BioPassword
  • Free Text
    • Song (continuous monitoring)
    • Gunetti and Picardi (August 2005)

8
Literature Search
  • Features Extracted
    • Mostly means and standard deviations of key press
      and transition (digraph) times
    • Some trigraphs, 4-graphs, 6-graphs
    • Most pre-process to remove errors or outliers
    • Some applied a difficulty factor or zones for the
      keyboard; some factored in length and overall
      percentages

9
Literature Search
  • Classification Approaches
    • KNN / Euclidean Distance
      • Most popular, simplest, highest accuracy;
        complaint: long processing time
    • Fuzzy logic
    • Neural networks / genetic algorithms
    • Bayesian classification
    • Combinations thereof

10
Contribution
  • Copy Under Non-Ideal Conditions
  • Copy Compared to Free-Text
  • Desktop Compared to Laptop
  • Features Fallback
  • Length for Free-Text
  • Impact of Outlier Removal and Outlier Distance
  • Optimized Performance Parameters for Free-Text

11
Gleaned From Initial Experiments and the Literature
  • More features extracted from raw data yielded
    better results
  • When a keystroke did not occur, falling back to the
    mean and standard deviation of all keystrokes
    degraded performance
  • Increasing the number of participants degraded
    performance but made the experiment more robust

12
These Experiments
  • Increasing feature set to 239 from 58
  • Staying with long passage
  • Must use a larger participant pool for validity
    of study
  • Testing entry in non-ideal conditions
  • Different input type (free text vs. copy)
  • Different keyboard type (desktop vs. laptop)
  • User Awareness Level (aware vs. unaware)

13
Updated Feature Set
14
Keystroke Biometric System Components
  • Data Capture Applet
  • Feature Extractor
  • Pattern Classifier

15
Data Capture
16
Login Screen
17
First Part of Demographic Questionnaire
18
Second Part of Demographic Questionnaire
Note: the subject is asked to sign off for IRB
approval.
19
Subjects Could Choose Keyboard and Task
The feature files were named based on their
choices and entry
20
Copy Task Entry Mode
21
Free-Text Entry Mode
22
Sample Raw Feature Data
Sample raw feature data file: "Hello World"
23
239 Feature Measurements
  • 78 Key Press Duration Measures
  • (39 means and 39 standard deviations)
  • 70 Key Transition Type 1 Measures
  • (35 means and 35 standard deviations)
  • 70 Key Transition Type 2 Measures
  • (35 means and 35 standard deviations)
  • 21 Other Measures (percentages and rates)

24
Type 1 and 2 Transition Measures
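The slide itself is a diagram, so the exact definitions are not in this transcript. As a minimal sketch, assuming that a type 1 transition runs from the release of one key to the press of the next, and a type 2 transition runs from the press of one key to the press of the next, the three timing families could be pulled from raw events as follows. The event tuple layout and the function names are hypothetical, not the author's applet format.

```python
from statistics import mean, pstdev

def extract_timings(events):
    """Collect raw timing samples from keystroke events.

    events: list of (key, press_ms, release_ms) tuples in typing order.
    This layout is an assumption made for illustration only.
    """
    durations = {}  # key -> list of press durations
    type1 = {}      # key pair -> release-to-press times (assumed definition)
    type2 = {}      # key pair -> press-to-press times (assumed definition)
    for i, (key, press, release) in enumerate(events):
        durations.setdefault(key, []).append(release - press)
        if i + 1 < len(events):
            next_key, next_press, _ = events[i + 1]
            pair = key + next_key
            type1.setdefault(pair, []).append(next_press - release)
            type2.setdefault(pair, []).append(next_press - press)
    return durations, type1, type2

def summarize(samples):
    """Reduce each sample list to a (mean, standard deviation) pair."""
    return {name: (mean(vals), pstdev(vals)) for name, vals in samples.items()}

# Hypothetical events for the keys h, e, l, l
events = [("h", 0, 80), ("e", 100, 170), ("l", 250, 330), ("l", 400, 470)]
durations, type1, type2 = extract_timings(events)
duration_features = summarize(durations)
```

Under these assumptions a type 1 value can be negative when the next key is pressed before the previous one is released, while a type 2 value is never negative.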
25
Key Press Duration Features and Fallback Hierarchy
What to do when a key is not used often
Hierarchy tree for the 39 duration features (each
oval), each represented by a mean and a standard
deviation.
26
Key Transition Features and Fallback Hierarchy
Hierarchy tree for the 35 transition features
(each oval), each represented by a mean and a
standard deviation for each of the type 1 and
type 2 transitions.
27
Fallback for Few Samples
  • Mean and Standard Deviation Computation when
    number of samples n(i) is less than
    kfallback-threshold
  • Similar to NLP backoff statistics for n-grams
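The formula on this slide did not survive in the transcript. One common way to realize such a fallback, in the spirit of n-gram backoff, is a weighted average of a feature's own mean and the mean of its parent node in the hierarchy. The sketch below assumes that form; the parameter names `k_threshold` and `k_weight` and their default values are hypothetical, not the author's settings.

```python
def fallback_mean(n_i, mu_i, mu_fallback, k_threshold=5, k_weight=1):
    """Blend a feature mean with its fallback (parent) mean when samples are few.

    n_i:         number of samples observed for feature i
    mu_i:        mean of those samples (ignored when n_i == 0)
    mu_fallback: mean of the parent node in the fallback hierarchy
    k_threshold: below this sample count the fallback is applied (assumed name)
    k_weight:    weight given to the fallback mean (assumed name, must be > 0)
    """
    if k_weight <= 0:
        raise ValueError("k_weight must be positive")
    if n_i >= k_threshold:
        return mu_i  # enough samples: use the feature's own mean
    if n_i == 0:
        return mu_fallback  # no samples at all: rely entirely on the parent
    return (n_i * mu_i + k_weight * mu_fallback) / (n_i + k_weight)

# A rarely typed key with 2 samples is pulled toward its parent's mean
blended = fallback_mean(n_i=2, mu_i=100.0, mu_fallback=130.0)
```

The same blend could be applied to the standard deviation, and the more samples a key has, the less the parent node influences the result.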

28
Two Preprocessing Steps
  • Outlier removal
    • Remove samples > 2σ from µ
    • Prevents feature skewing from pauses
  • Standardization
    • Scales to range 0-1 to give roughly equal weight
      to each measure
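The two preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's extractor: it makes a single outlier pass using the population standard deviation, and it treats standardization as min-max scaling of one measure across all samples. The function names are hypothetical.

```python
from statistics import mean, pstdev

def remove_outliers(samples, max_sigmas=2.0):
    """Drop samples more than max_sigmas standard deviations from the mean."""
    if len(samples) < 2:
        return list(samples)
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return list(samples)  # all samples identical, nothing to remove
    return [x for x in samples if abs(x - mu) <= max_sigmas * sigma]

def standardize(values):
    """Min-max scale one measure into the range 0-1 (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant measure carries no information
    return [(v - lo) / (hi - lo) for v in values]

# A 2000 ms pause among ~100 ms key presses is discarded
cleaned = remove_outliers([100, 105, 95, 110, 90, 100, 2000])
scaled = standardize([10, 20, 30])
```

With only a handful of samples a single pause inflates the standard deviation and may escape the cut, which is one reason an extractor might repeat the pass or tune the distance.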

29
Pattern Classifier
  • Nearest Neighbor Classifier using Euclidean
    Distance
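A nearest neighbor classifier with Euclidean distance assigns an unknown sample to the subject whose stored feature vector is closest. The sketch below is a generic version of that technique; the two-element vectors and subject labels are made up for illustration and stand in for the standardized feature vectors.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(train, query):
    """Return the label of the training vector closest to the query.

    train: list of (label, feature_vector) pairs
    """
    best_label, _ = min(train, key=lambda item: euclidean(item[1], query))
    return best_label

# Hypothetical standardized feature vectors for two subjects
train = [("alice", [0.1, 0.2]), ("bob", [0.8, 0.9])]
predicted = nearest_neighbor(train, [0.2, 0.25])
```

Because every query is compared against every stored sample, the cost grows with the size of the training set.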

30
Experimental Design: Six Main Experiments per Six
Arrows
31
Experimental Design: Keyboards (independent
variable 1)
  • Desktop keyboards: mostly (100) Dell desktops
    in a classroom environment
  • Laptop keyboards: about 90 Dell laptops, some
    IBM, HP, Apple (greater variety of laptop
    than desktop keyboards)

32
Experimental Design: Input Modes (independent
variable 2)
  • Copy-task input: specified text of about 600
    keystrokes plus corrections
  • Free-text input: creation of arbitrary emails
    (at least 600 keystrokes)

33
Data Collection
34
Subject Participation
35
Participation By Experiment
Each subject entered 5 texts in at least two quadrants. A total of 36
participated in all four quadrants.
[Figure: four quadrants (Copy and Free Text rows by Desktop and Laptop
columns) joined by six numbered arrows (1-6), annotated with subject
counts of 52, 40, 47, 93, 41, and 40.]
36
Five Sub-Experiments for Each of the Six Arrows
[Figure: an arrow between two quadrants, labeled with sub-experiments a-e]
  • a. Training and testing on data in the quadrant at
    the first end of the arrow (leave-one-out procedure)
  • b. Training and testing on data in the quadrant at
    the second end of the arrow (leave-one-out procedure)
  • c. Combining the data at both arrow ends
    (leave-one-out procedure)
  • d. Training on the first end, testing on the second
  • e. Training on the second end, testing on the first
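The two evaluation styles in the sub-experiments above can be sketched as follows: leave-one-out within one pool of samples (a, b, c) and train-on-one, test-on-the-other across the two ends of an arrow (d, e). The nearest neighbor rule, the sample layout, and the toy data are assumptions for illustration.

```python
import math

def _nearest_label(train, query):
    """Label of the training vector closest to the query (Euclidean)."""
    return min(train, key=lambda item: math.dist(item[1], query))[0]

def leave_one_out_accuracy(samples):
    """Classify each sample against all the others; return fraction correct."""
    correct = 0
    for i, (label, vector) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        if _nearest_label(rest, vector) == label:
            correct += 1
    return correct / len(samples)

def train_test_accuracy(train, test):
    """Train on one quadrant, test on another; return fraction correct."""
    correct = sum(1 for label, vector in test
                  if _nearest_label(train, vector) == label)
    return correct / len(test)

# Hypothetical samples from the two ends of one arrow
first_end = [("a", [0.0, 0.0]), ("a", [0.1, 0.0]),
             ("b", [1.0, 1.0]), ("b", [0.9, 1.0])]
second_end = [("a", [0.2, 0.1]), ("b", [0.8, 0.9])]

acc_a = leave_one_out_accuracy(first_end)               # sub-experiment a
acc_c = leave_one_out_accuracy(first_end + second_end)  # sub-experiment c
acc_d = train_test_accuracy(first_end, second_end)      # sub-experiment d
```

Leave-one-out needs at least two samples per subject in the pool, since a subject's held-out sample can only match if another sample from the same subject remains.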

37
Results, Experiment 1: 36 subjects participated in
all quadrants
38
Results, Experiment 2: 36 subjects participated in
all quadrants
39
Results, Experiment 3: 36 subjects participated in
all quadrants
40
Results, Experiment 4: 36 subjects participated in
all quadrants
41
Results, Experiment 5: 36 subjects participated in
all quadrants
42
Results, Experiment 6: 36 subjects participated in
all quadrants
43
36 Subject Summary
44
All Subject Summary: Supports 36 Subject Results
45
Conclusions
  • Best accuracies for same keyboard and same input
    mode
  • Accuracy dropped significantly for different
    keyboards or for different input modes
  • Accuracy for different input modes better than
    accuracy for different keyboards
  • Accuracy for copy mode somewhat better than
    accuracy for free-text mode
  • Accuracy decreased as the number of subjects
    increased

46
Long-Text Input Applications
  • Identify the author of inappropriate email and
    possibly even IM
  • Authenticate the student taking online exams

47
Future Work
  • Authentication
  • Master's students currently collecting more data
  • Try more sophisticated classifiers
    • Neural networks
    • Support vector machines
  • Explore the data with data mining
    • Identify patterns by cross-referencing demographics
  • Aware/Unaware with better management

48
Future Research Continued
  • Observe keystroke patterns over time (e.g., the
    same person at ages 15, 20, 25, 30)
  • Compare those who learned computer usage at a
    very young age with those who learned in their
    adult life, or those at different typing levels
  • Identify letters and letter pairs that provide
    more value to the accuracy level
  • More work in free-text and how it differs from
    copy

49
Unaware / Aware
50
Demographic Studies
  • Compared the 1st half of the 93-participant copy
    text to the 2nd half
  • Train on one half, test on the other, matching on:
    • Gender
    • Age
    • Language
    • Handedness
  • Ultimately wacky results
  • Code was changed at the very end to accommodate
    this; don't trust that the programs are processing
    correctly
  • Moved to Future Work

51
Right Number of Entries Per Subject?
52
Optimized Feature Extractor Parameters from
Previous Assignment: Copy Task, 30 Subjects
Below are the optimized parameters from copy-task
experiments.
[Table: parameters Outlier (R, ...), Outlier Dist. (real), K1, K2,
Fallback (1 = yes, 0 = no); values listed: R, 2, 1, 4, 1]
53
Fallback
54
Outlier Removal Significance
55
Outlier Removal Significance
56
Optimizing Outlier Distance: Free Text, 93
Subjects
57
Optimizing K1: Free Text, 93 Subjects
58
Optimizing K2: Free Text, 93 Subjects
59
Optimized Feature Extractor Parameters from
Previous Assignment: Copy Task, 36 Subjects
Below are the optimized parameters from copy-task
experiments.
[Table: parameters Outlier (R, ...), Outlier Dist. (real), K1, K2,
Fallback (1 = yes, 0 = no); values listed, first row: R, 2, 1, 4, 1;
second row: R, 1.75, 6, 8, 1]
60
Rerun of 93 Subjects
Conclusion: optimal settings are different for
copy and free-text entry modes.
Note: this result is not yet in the manuscript; it will be added.
61
Significance of Input Text Length
62
What Features Contribute Most?
63
Feature Contribution Curves
64
Accuracy as a Function of Number of Features
65
Contributions
  • Input Mode Analysis
  • Keyboard Type Analysis
  • Feature Set
  • Fallback
  • Optimizing Feature Extraction Parameters for
    Free-Text
  • Minimum Input Length for Free-Text

66
Contributions - Cont
  • Minimum Number of Samples
  • Features that contribute most
  • Feasibility on Aware / Unaware
  • Feasibility on Demographic

67
Questions?
  • Thank you