Title: Machine Learning Methods for Human-Computer Interaction
1. Machine Learning Methods for Human-Computer Interaction
- Kerem Altun
- Postdoctoral Fellow
- Department of Computer Science
- University of British Columbia
IEEE Haptics Symposium, March 4, 2012, Vancouver, B.C., Canada
2. Machine learning
Machine learning
Pattern recognition
Regression
Template matching
Statistical pattern recognition
Structural pattern recognition
Neural networks
Supervised methods
Unsupervised methods
3. What is pattern recognition?
- the title even appears in the International Association for Pattern Recognition (IAPR) newsletter
- many definitions exist
- simply put, it is the process of labeling observations (x) with predefined categories (ω)
4. Various applications of PR
Jain et al., 2000
5. Supervised learning
Can you identify other tufas here?
lifted from lecture notes by Josh Tenenbaum
6. Unsupervised learning
How many categories are there? Which image
belongs to which category?
lifted from lecture notes by Josh Tenenbaum
7. Pattern recognition in haptics/HCI
- Altun et al., 2010a
- human activity recognition
- body-worn inertial sensors
- accelerometers and gyroscopes
- daily activities
- sitting, standing, walking, stairs, etc.
- sports activities
- walking/running, cycling, rowing, basketball, etc.
8. Pattern recognition in haptics/HCI
Altun et al., 2010a
[Figure: right-arm and left-arm accelerometer signals during walking and basketball]
9. Pattern recognition in haptics/HCI
- Flagg et al., 2012
- touch gesture recognition on a conductive fur patch
10. Pattern recognition in haptics/HCI
Flagg et al., 2012
[Figure: example sensor signals for the light touch, stroke, and scratch gestures]
11. Other haptics/HCI applications?
12. Pattern recognition example
Duda et al., 2000
- an excellent example by Duda et al.
- classifying incoming fish on a conveyor belt using a camera image
- sea bass
- salmon
13. Pattern recognition example
- how to classify? what kind of information can distinguish these two species?
- length, width, weight, etc.
- suppose a fisherman tells us that salmon are usually shorter
- so, let's use length as a feature
- what to do to classify?
- capture an image, find the fish in the image, measure its length, make a decision
- how to make the decision?
- how to find the threshold?
14. Pattern recognition example
Duda et al., 2000
15. Pattern recognition example
- on average, salmon are usually shorter, but is this a good feature?
- let's try classifying according to the lightness of the fish scales
16. Pattern recognition example
Duda et al., 2000
17. Pattern recognition example
- how to choose the threshold?
18. Pattern recognition example
- how to choose the threshold?
- minimize the probability of error
- sometimes we should consider the costs of different errors
- salmon is more expensive
- customers who order salmon but get sea bass instead will be angry
- customers who order sea bass but occasionally get salmon instead will not be unhappy
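To make the threshold idea concrete, here is a minimal Python sketch (synthetic lightness values, not the data from Duda et al.; which species is lighter on average is assumed only for illustration): scan candidate thresholds and keep the one with the smallest empirical error.

    import numpy as np

    rng = np.random.default_rng(0)
    # synthetic "lightness" values; salmon are simply assumed lighter on average here
    sea_bass = rng.normal(loc=4.0, scale=1.0, size=100)
    salmon = rng.normal(loc=6.5, scale=1.0, size=100)

    x = np.concatenate([sea_bass, salmon])
    y = np.concatenate([np.zeros(100), np.ones(100)])  # 0 = sea bass, 1 = salmon

    # scan candidate thresholds; classify as salmon whenever lightness > threshold
    candidates = np.sort(x)
    errors = [(np.mean((x > t) != y), t) for t in candidates]
    best_error, best_t = min(errors)
    print(f"best threshold = {best_t:.2f}, training error = {best_error:.1%}")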
19. Pattern recognition example
- we don't have to use just one feature
- let's use lightness and width
each point is a feature vector
the 2-D plane is the feature space
Duda et al., 2000
20. Pattern recognition example
- we don't have to use just one feature
- let's use lightness and width
each point is a feature vector
the 2-D plane is the feature space
decision boundary
Duda et al., 2000
21. Pattern recognition example
- should we add as many features as we can?
- do not use redundant features
22. Pattern recognition example
- should we add as many features as we can?
- do not use redundant features
- consider noise in the measurements
23. Pattern recognition example
- should we add as many features as we can?
- do not use redundant features
- consider noise in the measurements
- moreover,
- avoid adding too many features
- more features means higher-dimensional feature vectors
- it is difficult to work in high-dimensional spaces
- this is called the curse of dimensionality
- more on this later
24. Pattern recognition example
- how to choose the decision boundary?
is this one better?
Duda et al., 2000
25. Pattern recognition example
- how to choose the decision boundary?
is this one better?
Duda et al., 2000
26. Probability theory review
- a chance experiment, e.g., tossing a 6-sided die
- 1, 2, 3, 4, 5, 6 are the possible outcomes
- the set of all outcomes Ω = {1, 2, 3, 4, 5, 6} is the sample space
- any subset of the sample space is an event
- the event that the outcome is odd: A = {1, 3, 5}
- each event is assigned a number called the probability of the event, P(A)
- the assigned probabilities can be selected freely, as long as the Kolmogorov axioms are not violated
27. Probability axioms
- for any event, P(A) ≥ 0
- for the sample space, P(Ω) = 1
- for disjoint events, P(A ∪ B) = P(A) + P(B)
- the third axiom also includes the case of countably many disjoint events
- die tossing: if all outcomes are equally likely, then for all i = 1, ..., 6, the probability of getting outcome i is 1/6
28. Conditional probability
- sometimes events occur and change the probabilities of other events
- example: ten coins in a bag
- nine of them are fair coins: heads (H) and tails (T)
- one of them is fake: both sides are heads (H)
- I randomly draw one coin from the bag, but I don't show it to you
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- which of these events would you bet on?
29. Conditional probability
- suppose I flip the coin five times, obtaining the outcome HHHHH (five heads in a row)
- call this event F
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- which of these events would you bet on now?
30. Conditional probability
- definition: the conditional probability of event A given that event B has occurred is P(A|B) = P(A ∩ B) / P(B)
- P(A ∩ B) is the probability of events A and B occurring together
- Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
- P(A|B) is read as "probability of A given B"
31. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- we know that F occurred
- we want to find P(H0|F) and P(H1|F)
- difficult to compute directly: use Bayes' theorem
32. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
33. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- Bayes' theorem: P(H0|F) = P(F|H0) P(H0) / P(F)
- P(F|H0): probability of observing F if H0 were true
- P(H0): prior probability (before the observation F)
- P(H0|F): posterior probability
- P(F): total probability of observing F
34. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- total probability of observing F: P(F) = P(F|H0) P(H0) + P(F|H1) P(H1)
35. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(F|H0) = 1
36. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(F|H0) = 1, P(H0) = 1/10
37. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(F|H0) = 1, P(H0) = 1/10, P(F|H1) = 1/32
38. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(F|H0) = 1, P(H0) = 1/10, P(F|H1) = 1/32, P(H1) = 9/10
39. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(H0|F) = P(F|H0) P(H0) / [P(F|H0) P(H0) + P(F|H1) P(H1)] = (1 × 1/10) / (1 × 1/10 + 1/32 × 9/10) = 32/41
- which event would you bet on?
40. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(H0|F) = 32/41
- this is very similar to a pattern recognition problem!
41. Conditional probability
- H0: the coin is fake, both sides H
- H1: the coin is fair, one side H, the other side T
- F: obtaining five heads in a row (HHHHH)
- P(H0|F) = 32/41
- we can put a label on the coin as "fake" based on our observations!
42. Bayesian inference
- ω0: the coin belongs to the "fake" class
- ω1: the coin belongs to the "fair" class
- x: observation
- decide ωi if its posterior probability P(ωi|x) is higher than the others
- this is called the MAP (maximum a posteriori) decision rule
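The coin example can be checked numerically in a few lines of Python; the priors and likelihoods below are exactly the values from the slides.

    # priors and likelihoods straight from the slides
    priors = {"fake": 1 / 10, "fair": 9 / 10}           # P(H0), P(H1)
    likelihoods = {"fake": 1.0, "fair": (1 / 2) ** 5}   # P(F|H0), P(F|H1) for five heads

    # total probability of the observation F
    p_f = sum(likelihoods[c] * priors[c] for c in priors)

    # posteriors via Bayes' theorem, then the MAP decision
    posteriors = {c: likelihoods[c] * priors[c] / p_f for c in priors}
    print(posteriors)                           # fake: 0.7805 (= 32/41), fair: 0.2195 (= 9/41)
    print(max(posteriors, key=posteriors.get))  # 'fake'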
43. Random variables
- we model the observations with random variables
- a random variable is a real number whose value depends on a chance experiment
- discrete random variable
- the possible values form a discrete set
- continuous random variable
- the possible values form a continuous set
44. Random variables
- a discrete random variable X is characterized by a probability mass function (pmf), p(x) = P(X = x)
- a pmf has two properties: p(x) ≥ 0, and Σ p(x) = 1 (the sum over all possible values)
45. Random variables
- a continuous random variable X is characterized by a probability density function (pdf), denoted by p(x), defined for all possible values
- probabilities are calculated for intervals: P(a ≤ X ≤ b) = ∫[a,b] p(x) dx
46. Random variables
- a pdf also has two properties: p(x) ≥ 0, and ∫ p(x) dx = 1 (the integral over all values)
47. Expectation
- definition: E[X] = Σ x p(x) for a discrete random variable, E[X] = ∫ x p(x) dx for a continuous one
- the average of the possible values of X, weighted by their probabilities
- also called the expected value, or the mean
48. Variance and standard deviation
- variance is the expected value of the deviation from the mean: Var(X) = E[(X - E[X])^2]
- variance is always positive
- or zero, which means X is not random
- standard deviation is the square root of the variance
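As a small worked example, for the fair die from the probability review these definitions give E[X] = 3.5 and Var(X) = 35/12 ≈ 2.92; a quick numerical check:

    import numpy as np

    values = np.arange(1, 7)        # outcomes of a fair six-sided die
    probs = np.full(6, 1 / 6)       # each outcome has probability 1/6

    mean = np.sum(values * probs)               # E[X] = 3.5
    var = np.sum((values - mean) ** 2 * probs)  # E[(X - E[X])^2] = 35/12
    print(mean, var, np.sqrt(var))              # 3.5  2.9166...  1.7078...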
49. Gaussian (normal) distribution
- possibly the most "natural" distribution
- encountered frequently in nature
- central limit theorem: the sum of many i.i.d. random variables is asymptotically Gaussian
- definition: the random variable with pdf p(x) = (1 / √(2πσ^2)) exp(-(x - μ)^2 / (2σ^2))
- two parameters: the mean μ and the variance σ^2
50. Gaussian distribution
it can be proved that
figure lifted from http://assets.allbusiness.com
51. Random vectors
- extension of the scalar case
- pdf: p(x)
- mean: μ = E[x]
- covariance matrix: Σ = E[(x - μ)(x - μ)^T]
- the covariance matrix is always symmetric and positive semidefinite
52. Multivariate Gaussian distribution
- probability density function: p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(-(1/2)(x - μ)^T Σ^(-1) (x - μ))
- two parameters: the mean vector μ and the covariance matrix Σ
- compare with the univariate case
53. Bivariate Gaussian exercise
The scatter plots show 100 independent samples drawn from zero-mean Gaussian distributions with different covariance matrices. Match the covariance matrices with the scatter plots (a, b, c) by inspection only.
54. Bivariate Gaussian exercise
The scatter plots show 100 independent samples drawn from zero-mean Gaussian distributions with different covariance matrices. Match the covariance matrices with the scatter plots (a, b, c) by inspection only.
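The covariance matrices of the exercise are not reproduced in this text, but the following sketch (with made-up covariance matrices) shows how such scatter plots can be generated and how the covariance shapes the cloud of samples:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    mean = np.zeros(2)

    # three illustrative covariance matrices (not the ones from the exercise)
    covs = {
        "isotropic": np.array([[1.0, 0.0], [0.0, 1.0]]),
        "elongated along x": np.array([[4.0, 0.0], [0.0, 0.5]]),
        "positively correlated": np.array([[1.0, 0.8], [0.8, 1.0]]),
    }

    fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
    for ax, (name, cov) in zip(axes, covs.items()):
        samples = rng.multivariate_normal(mean, cov, size=100)
        ax.scatter(samples[:, 0], samples[:, 1], s=10)
        ax.set_title(name)
        ax.axis("equal")
    plt.show()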
55. Bayesian decision theory
- Bayesian decision theory falls under the subjective interpretation of probability
- in the pattern recognition context, some prior belief about the class (category) of an observation is updated using Bayes' rule
56. Bayesian decision theory
- back to the fish example
- say we have two classes (states of nature), ω1 and ω2
- let P(ω1) be the prior probability that the fish is a sea bass
- P(ω2) is the prior probability that the fish is a salmon
57. Bayesian decision theory
- prior probabilities reflect our belief about which kind of fish to expect, before we observe it
- we can choose them according to the fishing location, time of year, etc.
- if we don't have any prior knowledge, we can choose equal priors (or uniform priors)
58. Bayesian decision theory
- let x be the feature vector obtained from our observations
- x can include features like lightness, weight, length, etc.
- calculate the posterior probabilities P(ω1|x) and P(ω2|x)
- how to calculate? use Bayes' theorem: P(ωi|x) = p(x|ωi) P(ωi) / p(x)
59. Bayesian decision theory
- p(x|ωi) is called the class-conditional probability density function (CCPDF)
- it is the pdf of the observation x if the true class were ωi
- the CCPDF is usually not known
- e.g., it is impossible to know the pdf of the length of all sea bass in the world
- but it can be estimated; more on this later
- for now, assume that the CCPDF is known
- just substitute the observation x in p(x|ωi)
60. Bayesian decision theory
- MAP rule (also called the minimum-error rule)
- decide ω1 if P(ω1|x) > P(ω2|x)
- decide ω2 otherwise
- do we really have to calculate p(x)?
61. Bayesian decision theory
- multiclass problems: decide ωi if P(ωi|x) ≥ P(ωj|x) for all j
- this is the maximum a posteriori (MAP) decision rule
- the MAP rule minimizes the error probability, and gives the best performance that can be achieved (of course, if the CCPDFs are known)
- if the prior probabilities are equal, the rule becomes: decide ωi if p(x|ωi) ≥ p(x|ωj) for all j
- this is the maximum likelihood (ML) decision rule
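A sketch of the difference between the ML and MAP rules for a single feature, assuming hypothetical Gaussian CCPDFs with known parameters and unequal priors (all numbers invented for illustration):

    from scipy.stats import norm

    # hypothetical known CCPDFs for two classes (e.g., a lightness-like feature)
    ccpdfs = [norm(loc=4.0, scale=1.0), norm(loc=6.5, scale=1.2)]
    priors = [0.7, 0.3]   # unequal priors, chosen arbitrarily

    def ml_decision(x):
        # maximum likelihood: pick the class with the largest p(x|w_i)
        likes = [p.pdf(x) for p in ccpdfs]
        return likes.index(max(likes))

    def map_decision(x):
        # maximum a posteriori: pick the class with the largest p(x|w_i) P(w_i)
        posts = [p.pdf(x) * pr for p, pr in zip(ccpdfs, priors)]
        return posts.index(max(posts))

    for x in (4.5, 5.3, 6.0):
        print(x, "ML:", ml_decision(x), "MAP:", map_decision(x))
        # at x = 5.3 the two rules disagree: ML picks class 1, MAP picks class 0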
62. Exercise (single feature)
- find the maximum likelihood decision rule
Duda et al., 2000
63. Exercise (single feature)
- find the maximum likelihood decision rule
Duda et al., 2000
64. Exercise (single feature)
- find the MAP decision rule
- for different settings of the priors P(ω1) and P(ω2)
Duda et al., 2000
65. Exercise (single feature)
- find the MAP decision rule
- for different settings of the priors P(ω1) and P(ω2)
Duda et al., 2000
66. Discriminant functions
- we can generalize this
- let gi(x) be the discriminant function for the ith class
- decision rule: assign x to class i if gi(x) > gj(x) for all j ≠ i
- for the MAP rule, gi(x) = P(ωi|x)
67. Discriminant functions
- the discriminant functions divide the feature space into decision regions that are separated by decision boundaries
68. Discriminant functions for Gaussian densities
- consider a multiclass problem (c classes)
- discriminant functions: gi(x) = ln p(x|ωi) + ln P(ωi)
- it is easy to show analytically that the decision boundaries are hyperquadrics
- if the feature space is 2-D, these are conic sections
- they become hyperplanes (or lines in 2-D) if the covariance matrices are the same for all classes (a degenerate case)
69. Examples
[Figure: 2-D and 3-D cases with equal and spherical covariance matrices, and with equal covariance matrices; Duda et al., 2000]
70. Examples
Duda et al., 2000
71. Examples
Duda et al., 2000
72. 2-D example
Jain et al., 2000
73. Density estimation
- but CCPDFs are usually unknown
- that's why we need training data
- density estimation can be:
- parametric: assume a class of densities (e.g., Gaussian), find the parameters
- non-parametric: estimate the pdf directly (and numerically) from the training data
74. Density estimation
- assume we have n samples of training vectors for a class
- we assume that these samples are independent and drawn from a certain probability distribution
- this is called the generative approach
75. Parametric methods
- we will consider only the Gaussian case
- underlying assumption: the samples are actually noise-corrupted versions of a single feature vector
- why Gaussian? three important properties
- completely specified by its mean and variance
- linear transformations remain Gaussian
- central limit theorem: many phenomena encountered in reality are asymptotically Gaussian
76. Gaussian case
- assume the samples x1, ..., xn are drawn from a Gaussian distribution
- how to find the pdf?
77. Gaussian case
- assume the samples x1, ..., xn are drawn from a Gaussian distribution
- how to find the pdf?
- finding the mean and covariance is sufficient
- sample mean: μ = (1/n) Σk xk
- sample covariance: Σ = (1/(n-1)) Σk (xk - μ)(xk - μ)^T
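A sketch of the whole parametric pipeline under the Gaussian assumption: estimate each class's sample mean and covariance from training vectors, then classify a test vector with the MAP rule. The data and names below are synthetic stand-ins, not the slide's data.

    import numpy as np
    from scipy.stats import multivariate_normal

    rng = np.random.default_rng(0)

    # synthetic 2-D training data for two classes (stand-ins for real features)
    train = {
        0: rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=50),
        1: rng.multivariate_normal([2.5, 2.0], [[1.5, -0.4], [-0.4, 0.8]], size=50),
    }
    priors = {0: 0.5, 1: 0.5}

    # parametric density estimation: sample mean and sample covariance per class
    params = {c: (X.mean(axis=0), np.cov(X, rowvar=False)) for c, X in train.items()}

    def classify(x):
        # MAP rule with the estimated Gaussian CCPDFs
        posts = {c: multivariate_normal(mean, cov).pdf(x) * priors[c]
                 for c, (mean, cov) in params.items()}
        return max(posts, key=posts.get)

    print(classify([0.2, -0.1]))  # expected: 0
    print(classify([2.4, 1.8]))   # expected: 1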
78. 2-D example
calculate the estimated densities and apply the MAP rule
79. 2-D example
80. 2-D example
decision boundary with true pdf
decision boundary with estimated pdf
81. Haptics example
Flagg et al., 2012
[Figure: sensor signals for the light touch, stroke, and scratch gestures]
which feature to use for discrimination?
82. Haptics example
- Flagg et al., 2012
- 7 participants performed each gesture 10 times
- 210 samples in total
- we should find distinguishing features
- let's use one feature at a time
- we assume the feature value is normally distributed, and find its mean and variance
83. Haptics example
assume equal priors, apply the ML rule
84. Haptics example
apply the ML rule; what are the decision boundaries? (decision thresholds for 1-D)
85. Haptics example
- let's plot the 2-D distribution
- clearly this isn't a "good" classifier for this problem
- the Gaussian assumption is not valid
86. Activity recognition example
- Altun et al., 2010a
- 4 participants (2 male, 2 female)
- activities: standing, ascending stairs, walking
- 720 samples in total
- sensor: an accelerometer on the right leg
- let's use the same features
- minimum and maximum values
87. Activity recognition example
[Figure: scatter plot of feature 1 vs. feature 2]
88. Activity recognition example
- the Gaussian assumption looks valid
- this is a "good" classifier for this problem
89. Activity recognition example
90. Haptics example
- how to solve the problem?
91. Haptics example
- how to solve the problem?
- either change the classifier, or change the features
92. Non-parametric methods
- let's estimate the CCPDF directly from the samples
- the simplest method is the histogram
- partition the feature space into (equally-sized) bins
- count the number of samples in each bin
- the density estimate is p(x) ≈ k / (nV), where k is the number of samples in the bin that includes x, n is the total number of samples, and V is the volume of the bin
93. Non-parametric methods
- how to choose the bin size?
- the number of bins increases exponentially with the dimension of the feature space
- we can do better than that!
94. Non-parametric methods
- compare the following density estimates
- pdf estimates with six samples
image from http://en.wikipedia.org/wiki/Parzen_Windows
95. Kernel density estimation
- a density estimate can be obtained as p(x) = (1/n) Σk φ(x - xk)
- where the functions φ are Gaussians centered at the samples xk; more precisely, φ(x - xk) = (1/hn) K((x - xk)/hn)
- K: Gaussian kernel, hn: width of the Gaussian
96. Kernel density estimation
- three different density estimates with different widths
- if the width is large, the pdf will be too smooth
- if the width is small, the pdf will be too spiked
- as the width approaches zero, the pdf converges to a sum of Dirac delta functions
Duda et al., 2000
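A minimal 1-D Gaussian kernel density estimate written directly from the definition above; the samples and widths are arbitrary, chosen only to show the too-spiky / reasonable / too-smooth behaviour:

    import numpy as np

    def kde_gaussian_1d(samples, h):
        # return a function p(x) = (1/n) * sum_k N(x; x_k, h^2)
        samples = np.asarray(samples, dtype=float)
        n = len(samples)
        def p(x):
            z = (x - samples) / h
            return np.sum(np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))) / n
        return p

    samples = [1.0, 1.3, 2.1, 4.0, 4.2, 4.4]   # six arbitrary samples
    for h in (0.1, 0.5, 2.0):                  # too spiky, reasonable, too smooth
        p = kde_gaussian_1d(samples, h)
        print(h, [round(p(x), 3) for x in (1.0, 3.0, 4.2)])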
97. KDE for activity recognition data
98. KDE for activity recognition data
99. KDE for gesture recognition data
100. Other density estimation methods
- Gaussian mixture models
- parametric
- model the distribution as a sum of M Gaussians
- optimization algorithm: expectation-maximization (EM)
- k-nearest neighbor estimation
- non-parametric
- variable width
- fixed k
101. Another example
Aksoy, 2011
102. Measuring classifier performance
- how do we know our classifiers will work?
- how do we measure the performance, i.e., decide that one classifier is better than another?
- correct recognition rate
- confusion matrix
- ideally, we should have more data independent from the training set and test the classifiers on it
103. Confusion matrix
confusion matrix for an 8-class problem (Tunçel et al., 2009)
104. Measuring classifier performance
- use the training samples to test the classifiers
- this is possible, but not good practice
100% correct classification rate for this example! because the classifier "memorized" the training samples instead of "learning" them
Duda et al., 2000
105. Cross validation
- having a separate test data set might not be possible in some cases
- we can use cross validation
- use some of the data for training, and the remaining data for testing
- how to divide the data?
106. Cross validation methods
- repeated random sub-sampling
- divide the data into two groups randomly (usually the training set is the larger one)
- train and test, record the correct classification rate
- do this repeatedly, take the average
107. Cross validation methods
- K-fold cross validation
- randomly divide the data into K sets
- use K-1 sets for training, 1 set for testing
- repeat K times, at each fold using a different set for testing
- leave-one-out cross validation
- use one sample for testing, and all the remaining samples for training
- same as K-fold cross validation, with K equal to the total number of samples
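A sketch of the K-fold bookkeeping around any train/predict pair of functions; the classifier used here is a deliberately trivial stand-in (always predict the majority training class), so only the cross-validation logic matters:

    import numpy as np

    def k_fold_indices(n, k, rng):
        # randomly split indices 0..n-1 into k roughly equal folds
        idx = rng.permutation(n)
        return np.array_split(idx, k)

    def cross_validate(X, y, k, train_fn, predict_fn, seed=0):
        rng = np.random.default_rng(seed)
        rates = []
        for fold in k_fold_indices(len(y), k, rng):
            test = np.zeros(len(y), dtype=bool)
            test[fold] = True
            model = train_fn(X[~test], y[~test])
            pred = predict_fn(model, X[test])
            rates.append(np.mean(pred == y[test]))
        return np.mean(rates)  # average correct classification rate

    # trivial stand-in classifier: always predict the most frequent training class
    def train_majority(X, y):
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]

    def predict_majority(model, X):
        return np.full(len(X), model)

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(3, 1, (60, 2))])
    y = np.array([0] * 60 + [1] * 60)
    print(cross_validate(X, y, k=10, train_fn=train_majority, predict_fn=predict_majority))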
108. Haptics example
assume equal priors, apply the ML rule
correct classification rate: 60.0%
the decision region for light touch is too small!
109. Haptics example
apply the ML rule
correct classification rate: 58.5%
110. Haptics example
correct classification rates: 58.8% and 62.4%
111. Activity recognition example
correct classification rates: 75.8% and 71.9%
112. Activity recognition example
correct classification rate: 87.8%
113. Another cross-validation method
- used in HCI studies with multiple human subjects
- subject-based leave-one-out cross validation
- number of subjects: S
- leave one subject's data out, train with the remaining data
- repeat S times, each time testing with a different subject, then average
- gives an estimate of the expected correct recognition rate when a new user is encountered
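Subject-based leave-one-out differs from ordinary K-fold only in how the split is made: by subject label rather than by random fold. A sketch, with hypothetical arrays X, y, and subject:

    import numpy as np

    def subject_loo(X, y, subject, train_fn, predict_fn):
        # leave one subject's data out at a time, average the correct classification rate
        rates = []
        for s in np.unique(subject):
            test = (subject == s)
            model = train_fn(X[~test], y[~test])
            pred = predict_fn(model, X[test])
            rates.append(np.mean(pred == y[test]))
        return np.mean(rates)

    # tiny synthetic usage with 3 "subjects" and a trivial stand-in classifier
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))
    y = rng.integers(0, 2, 30)
    subject = np.repeat([0, 1, 2], 10)
    print(subject_loo(X, y, subject,
                      train_fn=lambda Xt, yt: np.bincount(yt).argmax(),
                      predict_fn=lambda model, Xs: np.full(len(Xs), model)))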
114. Activity recognition example
correct classification rates (%):
                                 minimum value   maximum value
    K-fold                       75.8            71.9
    subject-based leave-one-out  60.8            61.6
115. Activity recognition example
correct classification rates (%):
    K-fold                       87.8
    subject-based leave-one-out  81.8
116. Dimensionality reduction
Duda et al., 2000
- for most problems, a few features are not enough
- adding features sometimes helps
117. Dimensionality reduction
Jain et al., 2000
- should we add as many features as we can?
- what does this figure say?
118. Dimensionality reduction
- we should add features only up to a certain point
- the more training samples we have, the farther away this point is
- more features means higher-dimensional feature spaces
- in higher dimensions, we need more samples to estimate the parameters and the densities accurately
- the number of necessary training samples grows exponentially with the dimension of the feature space
- this is called the curse of dimensionality
119. Dimensionality reduction
- how many features to use?
- rule of thumb: use at least ten times as many training samples as the number of features
- which features to use?
- difficult to know beforehand
- one approach: consider many features and select among them
120. Pen input recognition
Willems, 2010
121. Touch gesture recognition
Flagg et al., 2012
122. Feature reduction and selection
- form a set of many features
- some of them might be redundant
- feature reduction (sometimes called feature extraction)
- form linear or nonlinear combinations of the features
- features in the reduced set usually don't have a physical meaning
- feature selection
- select the most discriminative features from the set
123. Feature reduction
- we will only consider Principal Component Analysis (PCA)
- an unsupervised method
- we don't care about the class labels
- consider the distribution of all the feature vectors in the d-dimensional feature space
- PCA is the projection onto a lower-dimensional space that best represents the data
- get rid of unnecessary dimensions
124. Principal component analysis
- how to best represent the data?
125. Principal component analysis
- how to best represent the data?
find the direction(s) in which the variance of the data is the largest
126. Principal component analysis
- find the covariance matrix Σ
- spectral decomposition: Σ = V Λ V^T
- eigenvalues: on the diagonal of Λ
- eigenvectors: columns of V
- the covariance matrix is symmetric and positive semidefinite, so the eigenvalues are nonnegative and the eigenvectors are orthogonal
127. Principal component analysis
- put the eigenvalues in decreasing order
- the corresponding eigenvectors show the principal directions, in which the variance of the data is largest
- say we want to have m features only
- project onto the space spanned by the first m eigenvectors
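A sketch of PCA exactly as described: eigendecomposition of the covariance matrix, eigenvalues sorted in decreasing order, projection onto the first m eigenvectors (synthetic data):

    import numpy as np

    def pca_projection(X, m):
        # project the rows of X onto the m principal directions of largest variance
        mu = X.mean(axis=0)
        cov = np.cov(X - mu, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric PSD matrix: use eigh
        order = np.argsort(eigvals)[::-1]        # eigenvalues in decreasing order
        W = eigvecs[:, order[:m]]                # first m eigenvectors as columns
        return (X - mu) @ W, W, eigvals[order]

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal([0, 0, 0], [[5, 2, 0], [2, 2, 0], [0, 0, 0.1]], size=200)
    Z, W, ev = pca_projection(X, m=2)
    print(ev)        # variance along each principal direction, largest first
    print(Z.shape)   # (200, 2): reduced feature vectors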
128. Activity recognition example
Altun et al., 2010a
- five sensor units (wrists, legs, chest)
- each unit has three accelerometers, three gyroscopes, and three magnetometers
- 45 sensors in total
- computed 26 features from each sensor signal
- mean, variance, min, max, Fourier transform, etc.
- 45 × 26 = 1170 features
129. Activity recognition example
- compute the covariance matrix
- find its eigenvalues and eigenvectors
- plot the first 100 eigenvalues
- reduced the number of features to 30
130. Activity recognition example
131. Activity recognition example
what does the Bayesian decision making (BDM) result suggest?
132. Feature reduction
- ideally, this should be done for the training set only
- estimate the covariance from the training set, find the eigenvalues, eigenvectors, and the projection
- apply the projection to the test vector
- for K-fold cross validation, for example, this should be done K times
- computationally expensive
133. Feature selection
- alternatively, we can select from our large feature set
- say we have d features and want to reduce the number to m
- optimal way: evaluate all possible subsets and choose the best one
- not feasible except for small values of m and d
- suboptimal methods: greedy search
134. Feature selection
- best individual features
- evaluate all d features individually, select the best m features
135. Feature selection
- sequential forward selection
- start with the empty set
- evaluate all features one by one, select the best one, add it to the set
- form pairs of features with this one and each of the remaining features; add the best one to the set
- form triplets of features with these two and each of the remaining features; add the best one to the set, and so on
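A sketch of sequential forward selection; the score function here is a crude two-class separation measure invented for this example, whereas in practice one would plug in, e.g., cross-validated classification accuracy:

    import numpy as np

    def sequential_forward_selection(X, y, m, score_fn):
        # greedily add, one at a time, the feature that most improves score_fn
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < m:
            scores = [(score_fn(X[:, selected + [j]], y), j) for j in remaining]
            best_score, best_j = max(scores)
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

    def class_separation(Xs, y):
        # crude filter criterion: distance between the two class means over the pooled spread
        classes = np.unique(y)
        means = np.array([Xs[y == c].mean(axis=0) for c in classes])
        spread = Xs.std(axis=0).sum() + 1e-12
        return np.linalg.norm(means[0] - means[1]) / spread

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    X[:, 2] += np.where(np.arange(100) < 50, 0.0, 3.0)   # make feature 2 informative
    y = np.array([0] * 50 + [1] * 50)
    print(sequential_forward_selection(X, y, m=2, score_fn=class_separation))  # picks feature 2 first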
136. Feature selection
- sequential backward selection
- start with the full feature set
- evaluate the set by removing one feature at a time, then remove the worst feature
- continue step 2 with the current feature set
137. Feature selection
- plus p take away r selection
- first enlarge the feature set by adding p features using sequential forward selection
- then remove r features using sequential backward selection
138. Activity recognition example
first 5 features selected by sequential forward selection
first 5 features selected by PCA
SFS performs better than PCA for a few features. If 10-15 features are used, their performances become closer. Time-domain features and leg features are more discriminative.
Altun et al., 2010b
139. Activity recognition example
Altun et al., 2010b
140. Discriminative methods
- we talked about discriminant functions
- for the MAP rule we used gi(x) = P(ωi|x)
- discriminative methods try to find gi(x) directly from the data
141. Linear discriminant functions
- consider a discriminant function that is a linear combination of the components of x: g(x) = w^T x + w0
- for the two-class case, there is a single decision boundary: g(x) = 0
142. Linear discriminant functions
- for the multiclass case, there are options
- c two-class problems, separating each class from all the others
- considering the classes pairwise
143. Linear discriminant functions
distinguish one class from the others
consider the classes pairwise
Duda et al., 2000
144. Linear discriminant functions
- or, use the original definition
- assign x to class i if gi(x) > gj(x) for all j ≠ i
Duda et al., 2000
145. Nearest mean classifier
- find the means of the training vectors of each class
- assign a test vector y the class of the nearest mean
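A sketch of the nearest mean classifier on synthetic 2-D data:

    import numpy as np

    def train_nearest_mean(X, y):
        # store the mean training vector of each class
        return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

    def predict_nearest_mean(means, Y):
        # assign each test vector the class of the nearest class mean (Euclidean distance)
        classes = list(means)
        d = np.stack([np.linalg.norm(Y - means[c], axis=1) for c in classes])
        return np.array(classes)[np.argmin(d, axis=0)]

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    model = train_nearest_mean(X, y)
    print(predict_nearest_mean(model, np.array([[0.1, 0.2], [2.8, 3.1]])))  # -> [0 1]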
146. 2-D example
147. 2-D example
decision boundary with true pdf
decision boundary with nearest mean classifier
148. Activity recognition example
149. k-nearest neighbor method
- for a test vector y
- find the k closest training vectors
- let ki be the number of training vectors belonging to class i among these k vectors; assign the class with the largest ki
- simplest case: k = 1
- just find the closest training vector and assign its class
- decision boundaries: a Voronoi tessellation of the space
150. 1-nearest neighbor
this is called a Voronoi tessellation
Duda et al., 2000
151. k-nearest neighbor
[Figure: a test sample (circle) among training samples of two classes (squares and triangles); note how the decision is different for k = 3 and k = 5]
http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
152. k-nearest neighbor
- no training is needed
- the computation time for testing is high
- many techniques exist to reduce the computational load
- other alternatives exist for computing the distance
- Manhattan distance (L1 norm)
- chessboard distance (L∞ norm)
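A sketch of k-nearest neighbor classification with a choice of distance (Euclidean, Manhattan, or chessboard), on synthetic data:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3, order=2):
        # classify x by majority vote among its k nearest training vectors;
        # order=2: Euclidean, order=1: Manhattan, order=np.inf: chessboard distance
        d = np.linalg.norm(X_train - x, ord=order, axis=1)
        nearest = np.argsort(d)[:k]
        return Counter(y_train[nearest]).most_common(1)[0][0]

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))           # -> 0
    print(knn_predict(X, y, np.array([2.5, 2.5]), k=5, order=1))  # Manhattan distance -> 1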
153. Haptics example
correct classification rates: K-fold 63.3%, subject-based leave-one-out 59.0%
154. Activity recognition example
correct classification rates: K-fold 90.0%, subject-based leave-one-out 89.2%
155. Activity recognition example
decision boundaries for k = 3
156. Feature normalization
- especially when computing distances, the scales of the feature axes are important
- features with large ranges may be weighted more heavily
- feature normalization can be applied so that the ranges are similar
157. Feature normalization
- linear scaling: x' = (x - l) / (u - l), where l is the lowest value and u is the largest value of the feature x
- normalization to zero mean and unit variance: x' = (x - m) / s, where m is the mean value and s is the standard deviation of the feature x
- other methods exist
158. Feature normalization
- ideally, the parameters l, u, m, and s should be estimated from the training set only, and then used on the test vectors
- for K-fold cross validation, for example, this should be done K times
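A sketch of both normalizations with the parameters estimated from the training set only, as the slide recommends (synthetic data):

    import numpy as np

    def fit_zscore(X_train):
        # estimate m and s from the training set only
        return X_train.mean(axis=0), X_train.std(axis=0) + 1e-12

    def apply_zscore(X, m, s):
        return (X - m) / s

    def fit_minmax(X_train):
        # estimate l (lowest) and u (largest) from the training set only
        return X_train.min(axis=0), X_train.max(axis=0)

    def apply_minmax(X, l, u):
        return (X - l) / (u - l + 1e-12)

    rng = np.random.default_rng(0)
    X_train, X_test = rng.normal(5, 2, (80, 3)), rng.normal(5, 2, (20, 3))
    m, s = fit_zscore(X_train)
    print(apply_zscore(X_test, m, s).mean(axis=0))  # roughly 0, not exactly: m and s come from the training data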
159. Discriminative methods
- another popular method is the binary decision tree
- start from the root node
- proceed through the tree by setting thresholds on the feature values
- proceed by sequentially answering questions like
- "is feature j less than threshold value Tk?"
160. Activity recognition example
161. Discriminative methods
Aksoy, 2011
- one very popular method is the support vector machine classifier
- a linear classifier applicable to linearly separable data
- if the data is not linearly separable, it maps the data to a higher-dimensional space
- usually a Hilbert space
162. Comparison for activity recognition
- 1170 features reduced to 30 by PCA
- 19 activities
- 8 participants
163. References
- S. Aksoy, Pattern Recognition lecture notes, Bilkent University, Ankara, Turkey, 2011.
- A. Moore, Statistical Data Mining tutorials (http://www.autonlab.org/tutorials).
- J. Tenenbaum, The Cognitive Science of Intuitive Theories lecture notes, Massachusetts Institute of Technology, MA, USA, 2006. (accessed online: http://www.mit.edu/jbt/9.iap/9.94.Tenenbaum.ppt)
- R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000.
- A. K. Jain, R. P. W. Duin, J. Mao, Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1):4-37, January 2000.
- A. R. Webb, Statistical Pattern Recognition, 2nd ed., John Wiley & Sons, West Sussex, England, 2002.
- V. N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., Springer-Verlag New York, Inc., 2000.
- K. Altun, B. Barshan, O. Tuncel, (2010a) Comparative study on classifying human activities with miniature inertial/magnetic sensors, Pattern Recognition, 43(10):3605-3620, October 2010.
- K. Altun, B. Barshan, (2010b) "Human activity recognition using inertial/magnetic sensor units," in Human Behavior Understanding, Lecture Notes in Computer Science, A. A. Salah et al. (eds.), vol. 6219, pp. 38-51, Springer, Berlin, Heidelberg, August 2010.
- A. Flagg, D. Tam, K. MacLean, R. Flagg, Conductive fur sensing for a gesture-aware furry robot, Proceedings of the IEEE 2012 Haptics Symposium, March 4-7, 2012, Vancouver, B.C., Canada.
- O. Tuncel, K. Altun, B. Barshan, Classifying human leg motions with uniaxial piezoelectric gyroscopes, Sensors, 9(11):8508-8546, November 2009.
- D. Willems, Interactive Maps: using the pen in human-computer interaction, PhD Thesis, Radboud University Nijmegen, Netherlands, 2010. (accessed online: http://www.donwillems.net/waaaa/InteractiveMaps_PhDThesis_DWillems.pdf)