Title: On Consistent Fusion of Multimodal Biometrics
1. On Consistent Fusion of Multimodal Biometrics
- S. Y. Kung¹ and M. W. Mak²
- ¹Dept. of Electrical Engineering, Princeton University
- ²Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
2. Outline
- Why Fusion for Audio-Visual Biometrics
- Consistent (vs. Catastrophic) Fusion
- Mixture-of-Expert Fusion Architecture
- Consistent fusion
- Linear fusion
- Nonlinear fusion
- Conclusion
3. Why Fusion for Audio-Visual Biometrics
- Voice biometrics can suffer severe performance degradation in noisy environments, whereas facial images are unaffected.
- Facial image quality can be severely degraded under poor lighting conditions, whereas lighting has no effect on voice quality.
- Speech and faces therefore provide complementary information sources that are ideal candidates for fusion, as verified by the ROC (DET) curves.
- Results are based on 295 subjects from XM2VTSDB.
4. Mixture-of-Expert Fusion Architecture
- The lower layer contains local experts, each of which produces a local score based on a single modality.
- The upper layer contains a gating network that combines the local scores into a fused score.
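The two-layer structure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the local experts are hypothetical placeholders that map one modality's input to a score, only two of the gating options named in this deck (hard-switch and linear combiner) are shown, and all function names are hypothetical.

```python
from typing import Callable, Tuple

# (audio score, visual score) produced by the two local experts
LocalScores = Tuple[float, float]
Gate = Callable[[LocalScores], float]


def hard_switch(use_audio: bool) -> Gate:
    """Gate that passes one expert's score through and ignores the other."""
    return lambda s: s[0] if use_audio else s[1]


def linear_combiner(w: float) -> Gate:
    """Gate that returns w * s_audio + (1 - w) * s_visual."""
    return lambda s: w * s[0] + (1.0 - w) * s[1]


def fuse(audio_expert: Callable[[object], float],
         visual_expert: Callable[[object], float],
         gate: Gate,
         audio_input: object,
         visual_input: object) -> float:
    """Lower layer: each expert scores its own modality.
    Upper layer: the gating network turns the two local scores into one fused score."""
    local = (audio_expert(audio_input), visual_expert(visual_input))
    return gate(local)
```

For example, with identity functions standing in for the experts, `fuse(lambda a: a, lambda v: v, linear_combiner(0.5), 0.8, 0.2)` averages the two scores. Note that the hard switch is the special case of the linear combiner with w fixed to 1 or 0; an adaptive gate would choose w (or the switch position) per operating condition.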
5. ROC (DET)
- We may consider the audio and visual sources separately, i.e., with two decision thresholds and two decision boundaries.
- By shifting the decision boundaries independently, we obtain two DET curves, one for each modality.
[Figure: DET curves of the audio and visual modalities; axes: False Rejection Rate vs. False Acceptance Rate]
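The per-modality sweep can be sketched as below. This is a minimal sketch assuming that higher scores favour the true client, that a claim is accepted iff its score is at least the threshold θ, and that the client and imposter score lists (hypothetical names) come from a labelled test set.

```python
from typing import List, Sequence, Tuple


def far_frr(client: Sequence[float], imposter: Sequence[float],
            theta: float) -> Tuple[float, float]:
    """Error rates when a claim is accepted iff its score >= theta."""
    far = sum(s >= theta for s in imposter) / len(imposter)  # imposters wrongly accepted
    frr = sum(s < theta for s in client) / len(client)       # clients wrongly rejected
    return far, frr


def det_points(client: Sequence[float],
               imposter: Sequence[float]) -> List[Tuple[float, float]]:
    """Sweep theta over every observed score (plus +inf) to trace one DET curve
    as a list of (FAR, FRR) operating points, from high FAR to high FRR."""
    thresholds = sorted(set(client) | set(imposter)) + [float("inf")]
    return [far_frr(client, imposter, t) for t in thresholds]
```

Calling `det_points` once on the voice scores and once on the face scores sweeps each modality's threshold on its own, which yields the two independent DET curves described on this slide.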
6. Regions of Consistent and Catastrophic Fusion
[Figure: DET plane divided into a Catastrophic Region and a Consistent Region]
7. Consistent Fusion
- Yields a lower bound on the performance of consistent fusion (fusion whose performance is equal to or better than that of every individual modality)
[Figure: Voice and Face DET curves with numbered operating points 1 to 9, labelled users and imposters; axes: False Rejection Rate vs. False Acceptance Rate]
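One way to make the lower bound concrete, assuming (our reading, not stated on the slide) that it is the lower envelope of the two single-modality DET curves: at each false acceptance rate, a hard switch that picks whichever modality has the lower FRR attains that envelope, and a fused operating point is treated as consistent if it lies on or below it. Function names are hypothetical; curves are lists of (FAR, FRR) points.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (FAR, FRR) operating point on a DET curve


def frr_at(curve: Sequence[Point], far: float) -> float:
    """Lowest FRR the curve reaches without exceeding the given FAR."""
    return min((r for f, r in curve if f <= far), default=1.0)


def hard_switch_bound(voice: Sequence[Point], face: Sequence[Point],
                      far: float) -> Tuple[float, str]:
    """FRR of the better modality at this FAR (lower envelope), and which one it is."""
    rv, rf = frr_at(voice, far), frr_at(face, far)
    return (rv, "voice") if rv <= rf else (rf, "face")


def is_consistent(point: Point, voice: Sequence[Point],
                  face: Sequence[Point]) -> bool:
    """True if a fused operating point is at least as good as the better single modality."""
    far, frr = point
    return frr <= hard_switch_bound(voice, face, far)[0]
```

When the two curves cross, the hard switch selects a different modality on each side of the crossing, which is why the envelope can beat either single curve taken alone.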
8. Linear Fusion
[Figure: DET curves for linear fusion; axes: False Rejection Rate vs. False Acceptance Rate]
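A minimal sketch of linear fusion follows, assuming the voice and face scores are already normalized to comparable ranges and trial-aligned (the i-th client score of each modality belongs to the same claim). The grid search over the weight w and the "lowest FRR at a target FAR" criterion are illustrative assumptions, not necessarily how the weights were chosen in the reported experiments.

```python
from typing import List, Sequence, Tuple


def frr_at_far(client: Sequence[float], imposter: Sequence[float],
               target_far: float) -> float:
    """Lowest FRR over thresholds whose FAR does not exceed target_far
    (a claim is accepted iff its score >= theta)."""
    best = 1.0
    for theta in sorted(set(client) | set(imposter)) + [float("inf")]:
        far = sum(s >= theta for s in imposter) / len(imposter)
        if far <= target_far:
            best = min(best, sum(s < theta for s in client) / len(client))
    return best


def linear_fuse(voice: Sequence[float], face: Sequence[float], w: float) -> List[float]:
    """Fused score w * s_voice + (1 - w) * s_face for each trial."""
    return [w * v + (1.0 - w) * f for v, f in zip(voice, face)]


def best_weight(voice_c: Sequence[float], face_c: Sequence[float],
                voice_i: Sequence[float], face_i: Sequence[float],
                target_far: float, steps: int = 10) -> Tuple[float, float]:
    """Grid-search w in [0, 1]; return the first weight with the lowest FRR at target_far."""
    best_w, best_frr = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        frr = frr_at_far(linear_fuse(voice_c, face_c, w),
                         linear_fuse(voice_i, face_i, w), target_far)
        if frr < best_frr:
            best_w, best_frr = w, frr
    return best_w, best_frr
```

The endpoints w = 0 and w = 1 reproduce the single-modality scores, so the search always includes both individual modalities as candidates. In practice w would be tuned on a held-out set rather than on the test set.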
9. Nonlinear Fusion
- Score distribution of multiple modalities
10. Nonlinear Fusion
[Figure: DET curves for nonlinear fusion; axes: False Rejection Rate vs. False Acceptance Rate]
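The nonlinear scheme in this deck is an SVM operating on the two-dimensional (voice, face) score space. To keep the sketch dependency-free, it swaps in a different technique, a kernel perceptron with an RBF kernel. Like an RBF-kernel SVM, it learns a curved decision boundary over the joint score distribution, but it does not maximize the margin, so treat it as an illustration of nonlinear score fusion rather than as the SVM itself. The labels (+1 client, -1 imposter), `gamma`, and `epochs` are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]  # (voice score, face score) for one trial


def rbf(a: Vec, b: Vec, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two score vectors."""
    return math.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2))


def train_kernel_perceptron(x: Sequence[Vec], y: Sequence[int],
                            epochs: int = 50, gamma: float = 1.0) -> List[int]:
    """y[i] is +1 for a client trial and -1 for an imposter trial.
    Returns alpha, the number of times each training trial was misclassified."""
    alpha = [0] * len(x)
    for _ in range(epochs):
        mistakes = 0
        for i, xi in enumerate(x):
            f = sum(a * yj * rbf(xj, xi, gamma) for a, yj, xj in zip(alpha, y, x) if a)
            if y[i] * f <= 0:  # wrong side of (or on) the boundary: update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training trial classified correctly
            break
    return alpha


def fused_score(z: Vec, x: Sequence[Vec], y: Sequence[int],
                alpha: Sequence[int], gamma: float = 1.0) -> float:
    """Nonlinear fused score; a claim is accepted iff this is >= a chosen threshold."""
    return sum(a * yj * rbf(xj, z, gamma) for a, yj, xj in zip(alpha, y, x) if a)
```

Sweeping a threshold over `fused_score` outputs on test trials gives the DET curve of the nonlinear fusion, in the same way as for a single modality.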
11. Linear vs. Nonlinear Fusion
[Figure: DET curves for Voice, Face, FaceVoice (Linear), and FaceVoice (Nonlinear), with numbered points 1 to 9]
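To compare the four curves with a single number, one common summary (our addition, not a figure from the slide) is the equal error rate (EER), the operating point where FAR and FRR meet. A minimal sketch, assuming a claim is accepted iff its score is at least θ:

```python
from typing import Sequence


def eer(client: Sequence[float], imposter: Sequence[float]) -> float:
    """Approximate equal error rate: mean of FAR and FRR at the threshold
    where the two are closest."""
    best_gap, best = float("inf"), 1.0
    for theta in sorted(set(client) | set(imposter)) + [float("inf")]:
        far = sum(s >= theta for s in imposter) / len(imposter)
        frr = sum(s < theta for s in client) / len(client)
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), (far + frr) / 2
    return best
```

Applied to the voice, face, linearly fused, and nonlinearly fused scores, a lower EER indicates a curve closer to the origin at its balanced point. The full DET curve still carries more information, since two curves can cross away from the EER.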
12. What if there are N (N > 2) modalities?
- Which pair of modalities would be the best choice?
- Answer: the DET (ROC) curves can give a good indication of (1) how good each modality is and (2) how complementary the modalities are.
- What guaranteed advantage is there in adopting N (N > 2) modalities?
[Figure: DET curves; axes: False Rejection Rate vs. False Acceptance Rate]
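A minimal sketch of pair selection, under hypothetical assumptions: every modality's scores are trial-aligned and comparably scaled, each pair is fused by an equal-weight sum, and pairs are ranked by the EER of the fused scores. The modality names in the usage example are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Scores = Tuple[List[float], List[float]]  # (client scores, imposter scores)


def eer(client: Sequence[float], imposter: Sequence[float]) -> float:
    """Approximate equal error rate (accept iff score >= theta)."""
    best_gap, best = float("inf"), 1.0
    for theta in sorted(set(client) | set(imposter)) + [float("inf")]:
        far = sum(s >= theta for s in imposter) / len(imposter)
        frr = sum(s < theta for s in client) / len(client)
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), (far + frr) / 2
    return best


def best_pair(mods: Dict[str, Scores]) -> Tuple[Tuple[str, str], float]:
    """Fuse every pair by summing scores; return the pair with the lowest EER."""
    results = []
    for a, b in combinations(sorted(mods), 2):
        client = [p + q for p, q in zip(mods[a][0], mods[b][0])]
        imposter = [p + q for p, q in zip(mods[a][1], mods[b][1])]
        results.append((eer(client, imposter), (a, b)))
    err, pair = min(results)
    return pair, err
```

With toy scores `{"voice": ([2, 1], [1.5, 0]), "face": ([1, 2], [0, 1.5]), "other": ([1, 1], [1, 1])}`, voice and face each have an EER of 0.5 on their own, yet their sum separates clients from imposters perfectly because they err on different trials. That is the "how complementary" aspect of the answer above.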
13. But there is a catch: statistical significance!
- This can be upheld only if the training set, the held-out set, and the test set are assumed to have statistically the same distribution and are available in large volume.
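One way to see why volume matters is the uncertainty on a measured error rate. The sketch below uses the normal-approximation (Wald) binomial interval. This is an illustrative choice that assumes independent trials, and the interval is unreliable when the error count is near zero or the number of trials is small.

```python
import math
from typing import Tuple


def error_rate_ci(errors: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% (for z = 1.96) confidence interval for an error rate
    such as FAR or FRR, clipped to [0, 1]."""
    p = errors / trials
    half_width = z * math.sqrt(p * (1.0 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

For the same 10% error rate, `error_rate_ci(10, 100)` gives roughly ±5.9 percentage points, while `error_rate_ci(1000, 10000)` gives roughly ±0.6. Differences between fusion schemes smaller than these margins are not reliably resolved.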
14. Thank you
15. Conclusions
- The notion of consistent fusion is proposed for multimodal fusion.
- The consistent fusion framework leads to several adaptive fusion schemes, such as hard-switching, linear combination, and adaptive nonlinear SVM fusion.
- Results suggest that consistent fusion provides a valuable framework for selecting modalities in multimodal biometric authentication.
16. Score Distributions of a Single Modality
- For a single modality, a test sequence from a claimant is classified as coming from the true client if its score is above the decision threshold θ.
[Figure: score distributions with the decision threshold marked]
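Written out, the single-modality rule and the two error rates it induces are as follows. This assumes a higher score indicates the true client and that a score exactly equal to θ is accepted. S_cli and S_imp (our notation) denote the client and imposter test scores.

```latex
\text{accept claim} \iff s \ge \theta, \qquad
\mathrm{FAR}(\theta) = \frac{\lvert\{ s \in S_{\mathrm{imp}} : s \ge \theta \}\rvert}{\lvert S_{\mathrm{imp}} \rvert}, \qquad
\mathrm{FRR}(\theta) = \frac{\lvert\{ s \in S_{\mathrm{cli}} : s < \theta \}\rvert}{\lvert S_{\mathrm{cli}} \rvert}
```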
17. DET Based on a Single Modality
- By changing the threshold θ from small to large values, we obtain an ROC or DET curve.
[Figure: DET curve; large θ lies toward high False Rejection Rate, small θ toward high False Acceptance Rate]
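The direction of the trade-off follows directly from the error-rate definitions, assuming a claim is accepted iff s ≥ θ, FAR(θ) is the fraction of imposter scores at or above θ, and FRR(θ) is the fraction of client scores below θ. Raising θ can only remove imposters from the accepted set and move clients into the rejected set:

```latex
\theta_1 < \theta_2 \;\Longrightarrow\;
\mathrm{FAR}(\theta_1) \ge \mathrm{FAR}(\theta_2), \quad
\mathrm{FRR}(\theta_1) \le \mathrm{FRR}(\theta_2),
\qquad
\theta \to -\infty:\ (\mathrm{FAR}, \mathrm{FRR}) \to (1, 0), \qquad
\theta \to +\infty:\ (\mathrm{FAR}, \mathrm{FRR}) \to (0, 1).
```

Small θ therefore sits at the high-FAR end of the DET curve and large θ at the high-FRR end, matching the labels in the figure.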
18. Is Linear Fusion a good idea?
19. Why Fusion for Audio-Visual Biometrics
[Figure: fusion architecture in which a Classifier for Audio Channel and a Classifier for Visual Channel feed an Adaptive Gating Network (e.g., hard-switch, linear combiner, or SVM) that outputs the Fused Score]