Title: Chapter 9: Creating and Maintaining Databases
- Presented by Zhiming Liu
- Instructor: Dr. Bebis
Outline
- Introduction
- Enrollment Policies
- The Zoo
- Biometric Sample Quality Control
- Training
- Enrollment Is System Training
Introduction
- Biometric enrollment asks an individual to give out private information.
- Enrollment is a process directed by some enrollment policy, which needs to be acceptable to the public.
- Positive enrollment: under enrollment policy E_M, select trusted individuals and store machine representations of these m enrolled members in a verification database M.
- Negative enrollment: for criminal identification systems, under enrollment policy E_N, determine the undesirable individuals and store machine representations of the n selected individuals in the screening database N.
- Because of errors and fraud, legacy databases contain fake and duplicate identities.
- A fake identity can be one of two kinds, a created identity or a stolen identity:
  - 1. Created identity: some subject d enrolls in M as d_k using documents for a nonexistent identity, either fake documents or a fake ID.
  - 2. Stolen identity: a fake identity can also be a subject d falsely enrolled as subject d_k, the stolen identity.
- A duplicate identity: one subject (A) is enrolled under two identities, I_A and I_B.
Enrollment policies
- Positive enrollment: the registration of m trusted subjects d_1, ..., d_m in database M. The enrollment could be based on some already enrolled population W.
- Negative enrollment: the registration of n questionable subjects d_1, ..., d_n by storing machine descriptions of these subjects in database N, which contains much more specific and detailed descriptions.
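To make the difference between the two databases concrete, here is a minimal Python sketch, assuming templates are plain feature vectors and using a toy match score (negative Euclidean distance). The names `verify` and `screen`, the threshold, and the sample data are all hypothetical. The verification database M is used for a 1:1 check of a claimed identity, while the screening database N is searched 1:N for any match.

```python
import math

Template = tuple[float, ...]  # hypothetical machine representation of a biometric


def match_score(a: Template, b: Template) -> float:
    """Toy match score: negative Euclidean distance, so higher means more similar."""
    return -math.dist(a, b)


def verify(database_m: dict[str, Template], claimed_id: str,
           sample: Template, threshold: float) -> bool:
    """Verification database M: 1:1 check of a sample against the claimed identity."""
    template = database_m.get(claimed_id)
    if template is None:
        return False  # the claimed identity was never enrolled in M
    return match_score(template, sample) >= threshold


def screen(database_n: dict[str, Template], sample: Template,
           threshold: float) -> list[str]:
    """Screening database N: 1:N search returning every listed subject that matches."""
    return [subject_id for subject_id, template in database_n.items()
            if match_score(template, sample) >= threshold]


# Hypothetical enrolled data: trusted members in M, questionable subjects in N.
database_m = {"member_1": (0.1, 0.9), "member_2": (0.8, 0.2)}
database_n = {"listed_1": (0.5, 0.5)}
```

For example, `verify(database_m, "member_1", (0.12, 0.88), threshold=-0.1)` accepts the claim, while `screen(database_n, (0.12, 0.88), threshold=-0.1)` finds no listed subject.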
- Social issues:
  - How can biometric authentication be made to work without creating additional security loopholes and without damaging civil liberties?
  - Who will administer and maintain the databases of authorized subjects?
  - How will the data integrity of these databases be protected?
The zoo
- Subjects are assigned to animal categories depending on how easy or difficult they are to authenticate.
  - Sheep: the group of subjects that dominates the population. They are easy to authenticate because their real-world biometric is very distinctive and stable.
  - Goats: subjects who are particularly difficult to authenticate because their real-world biometric is poor and not distinctive, perhaps due to physical damage to body parts or to large spurious variability in the biometric measurements over time. This is the portion of the population that generates the majority of False Rejects.
  - Lambs: enrolled subjects who are easy to imitate. Lambs are the cause of most False Accepts because they are imitated by wolves.
  - Wolves: subjects who are particularly good at imitating, impersonating, or forging a particular biometric.
  - Chameleons: subjects who are both easy to imitate and good at imitating others. They are a source of passive False Accepts when enrolled and of active False Accepts when being authenticated.
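Because the zoo categories describe per-subject error behavior, they can be estimated from score data. Below is a minimal sketch, assuming higher scores mean better matches, a single decision threshold, and a hypothetical cut-off `limit` on the per-subject error rates; `SubjectScores` and `zoo_labels` are illustrative names only. Goats show a high per-subject False Reject rate, lambs a high False Accept rate as targets, wolves a high False Accept rate as impostors, and chameleons are both lambs and wolves.

```python
from dataclasses import dataclass


@dataclass
class SubjectScores:
    genuine: list[float]      # this subject's own attempts against their own template
    as_target: list[float]    # other subjects' attempts against this subject's template
    as_attacker: list[float]  # this subject's attempts against other subjects' templates


def fraction(flags: list[bool]) -> float:
    """Share of True values; 0.0 when there is no data."""
    return sum(flags) / len(flags) if flags else 0.0


def zoo_labels(s: SubjectScores, threshold: float, limit: float = 0.2) -> list[str]:
    """Label one subject from per-subject error rates (hypothetical cut-off `limit`)."""
    frr = fraction([x < threshold for x in s.genuine])                # own attempts rejected
    far_target = fraction([x >= threshold for x in s.as_target])      # imitated by others
    far_attacker = fraction([x >= threshold for x in s.as_attacker])  # imitates others
    labels = []
    if frr > limit:
        labels.append("goat")
    is_lamb, is_wolf = far_target > limit, far_attacker > limit
    if is_lamb and is_wolf:
        labels.append("chameleon")
    elif is_lamb:
        labels.append("lamb")
    elif is_wolf:
        labels.append("wolf")
    return labels or ["sheep"]
```

A subject with high genuine scores and low scores in both impostor roles comes out as `["sheep"]`; the labels are only as reliable as the number of attempts behind each rate.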
Biometric sample quality control
- Many random False Rejects and False Accepts occur because of adverse signal acquisition conditions. There are two solutions:
  - For example, apply image enhancement to the sample, or ask subjects to present the biometric in a different, better way.
- Failure to Enroll (FTE):
  - Strict input quality control leads to higher FTE rates.
  - Accepting low-quality samples leads to lower FTE rates.
- Relationship with the ROC: a lower FTE comes with higher FAR and FRR.
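The FTE side of this trade-off can be illustrated by counting how many enrollment attempts a quality gate rejects. A minimal sketch, assuming each enrollment sample already carries a quality score in [0, 1] (how that score is computed is outside this sketch); `failure_to_enroll_rate` and the scores are hypothetical.

```python
def failure_to_enroll_rate(quality_scores: list[float], min_quality: float) -> float:
    """Fraction of enrollment attempts rejected because the sample quality is too low."""
    if not quality_scores:
        raise ValueError("need at least one enrollment attempt")
    rejected = sum(q < min_quality for q in quality_scores)
    return rejected / len(quality_scores)


# Hypothetical quality scores of ten enrollment attempts.
qualities = [0.2, 0.35, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.99]

# A stricter quality gate rejects more attempts, i.e. raises the FTE rate.
fte_by_gate = {gate: failure_to_enroll_rate(qualities, gate) for gate in (0.3, 0.5, 0.7)}
```

Here the FTE rate grows from 0.1 to 0.4 as the gate rises from 0.3 to 0.7. Lowering the gate enrolls everyone but lets the poorer samples into the database, which is where the higher FAR and FRR come from.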
Training
- Why does a biometric system need to be trained?
  - Compute match scores s(B, B') between pairs of biometric samples B and B'.
  - The goal is to make the average difference between match scores (same biometric) and mismatch scores (different biometrics) as large as possible.
- There are two aspects to training:
  - Enrollment policies and authentication protocols
- 1. Enrollment of subjects: during enrollment, one or more samples B of a subject's biometric β are acquired, and the samples, or templates derived from them, are stored in some database M.
- 2. Protocols: the biometric authentication system itself needs to be trained, by refining and enhancing the signal or image to match the characteristics of the user population and by incrementally improving the match engine.
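The training goal of making the average difference between match and mismatch scores as large as possible can be sketched as a search over one system parameter. A minimal Python sketch, assuming 2-D feature vectors, a weighted L1 distance as the (negated) match score, and a single hypothetical weight `w2` on the second feature. The data are made up so that the second feature carries only within-subject variability.

```python
from itertools import combinations

Sample = tuple[float, float]


def match_score(a: Sample, b: Sample, w2: float) -> float:
    """Negated weighted L1 distance; the two feature weights sum to 1."""
    w1 = 1.0 - w2
    return -(w1 * abs(a[0] - b[0]) + w2 * abs(a[1] - b[1]))


def separation(samples_by_subject: dict[str, list[Sample]], w2: float) -> float:
    """Average match score (same subject) minus average mismatch score (different subjects)."""
    labeled = [(sid, s) for sid, samples in samples_by_subject.items() for s in samples]
    match_scores, mismatch_scores = [], []
    for (id_a, a), (id_b, b) in combinations(labeled, 2):
        target = match_scores if id_a == id_b else mismatch_scores
        target.append(match_score(a, b, w2))
    return (sum(match_scores) / len(match_scores)
            - sum(mismatch_scores) / len(mismatch_scores))


def train_weight(samples_by_subject: dict[str, list[Sample]],
                 candidates: list[float]) -> float:
    """Pick the candidate weight that maximizes the match/mismatch separation."""
    return max(candidates, key=lambda w2: separation(samples_by_subject, w2))


# Hypothetical enrollment data: feature 1 separates A from B, feature 2 is noise.
data = {"A": [(0.0, 0.0), (0.1, 1.0)], "B": [(1.0, 0.0), (0.9, 1.0)]}
best_w2 = train_weight(data, [0.0, 0.25, 0.5, 0.75, 1.0])
```

With this data the separation falls linearly as `w2` grows, so training puts all weight on the first feature (`best_w2 == 0.0`). A real system would tune many more parameters, and the weights are constrained to sum to 1 here so that the score scale stays comparable across candidates.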
Enrollment is system training
- Build database M by selecting subjects d from the
world population W and assigning an identifier ID
to each subject.
- When subject d_k is enrolled with identifier ID_k, there are three possibilities:
  - 1. Correctly linked: ID_k really belongs to subject d_k.
  - 2. Subject d_k is in reality a subject d_j, with j < k, i.e., d_k is a duplicate of subject d_j. As a result, ID_j and ID_k are duplicates, representing the same individual.
  - 3. Subject d_k is in reality a subject d_j, with j > k, i.e., d_k is faking the unenrolled subject d_j. As a result, ID_k corresponds to a fake identity.
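The three cases amount to a rule on the pair (record index k, true subject index j). The true index j is never available to an operational system, so this minimal sketch assumes a simulation or audit setting where it is known; the function name and the audit data are hypothetical.

```python
from collections import Counter


def link_status(k: int, j: int) -> str:
    """Classify record ID_k whose subject is in reality d_j, following the three cases."""
    if j == k:
        return "correct"    # case 1: correctly linked
    if j < k:
        return "duplicate"  # case 2: d_j was enrolled earlier, so ID_j and ID_k coincide
    return "fake"           # case 3: d_k fakes the unenrolled subject d_j


# Hypothetical audit: true_subject[k - 1] is the real subject behind ID_k.
true_subject = [1, 2, 1, 4, 7]
statuses = [link_status(k, j) for k, j in enumerate(true_subject, start=1)]
counts = Counter(statuses)
```

Counting the statuses over a database gives empirical estimates of how many records are duplicates or fakes, which is what the probabilities P_D and P_F describe.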
- We have non-zero probabilities:
  - P_D, the probability that some subject d ∈ M is also enrolled under a different ID number
  - P_F, the probability that subject d ∈ M is a fake identity
- Database integrity:
  - Integrity is how well the database reflects the truth data of the seed documents (birth certificates, proofs of citizenship, and passports) used for enrollment.
- The database integrity with respect to duplicates is determined by P_D, the probability of duplicates:
  - $P_D = P_{DEA} \times \mathrm{FNMR}_E$
  - P_DEA (Double Enroll Attack) is the probability that an already enrolled subject d_j tries to re-enroll in the database under a different identity d_k.
  - FNMR_E is the probability that, during enrollment, a match between two samples of the same biometric is not detected, i.e., is missed.
  - The number of duplicates in M is P_D m, with m the number of entities in M.
- The enrollment integrity is further determined by P_F, the probability of a fake enrollment as d_k:
  - FMR_E is the probability that a match between two different biometric samples is falsely declared during enrollment.
  - P_IA is the probability of an impersonation attack.
  - The number of fake identities in M equals P_F m.
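A worked example of these integrity figures. This minimal sketch assumes that a duplicate arises only when a double-enroll attack is attempted and the enrollment match misses it, with the two events independent, so that P_D = P_DEA × FNMR_E. P_F is taken as a given input rather than derived. All rates are hypothetical.

```python
def duplicate_probability(p_dea: float, fnmr_e: float) -> float:
    """P_D, assuming: attack attempted AND enrollment match missed (independent events)."""
    return p_dea * fnmr_e


def expected_bad_records(m: int, p_d: float, p_f: float) -> tuple[float, float]:
    """Expected number of duplicates (P_D * m) and fake identities (P_F * m) in M."""
    return p_d * m, p_f * m


# Hypothetical figures for a database of one million entities.
m = 1_000_000
p_d = duplicate_probability(p_dea=0.001, fnmr_e=0.05)  # about 5e-5
duplicates, fakes = expected_bad_records(m, p_d, p_f=1e-6)
```

With these inputs the database is expected to hold about 50 duplicates and about 1 fake identity. Because P_D scales linearly with FNMR_E, halving the enrollment miss rate halves the expected number of duplicates.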
- Probabilistic enrollment:
  - Build an access control list of subjects d_i, i = 1, ..., m, of some database M.
  - Each d_i is associated with the corresponding biometric β_i.
  - Compute the likelihood Prob(d_i | B_i), which expresses how well a subject's biometric β_i matches his template B_i.
  - This probability can only be computed if there exists some machine representation of the real-world biometrics β_i. Let these representations be another set of templates, here denoted B'_i, and write
    $\mathrm{Prob}(d_i \mid B_i) \approx s(B_i, B'_i) = s_i$
  - where, for simplicity, we assume that the match score s_i is the likelihood that d_i is the true subject, given B_i.
- Modeling the world:
  - Prob(d_i | B_i) can be approximated by the match score s_i only under very unrealistic circumstances.
  - More realistic approximations have to involve modeling the other subjects d_k enrolled in M. More generally, compute Prob(d_i | O), the likelihood of subject d_i given the biometric data O collected at enrollment time:
    $\mathrm{Prob}(d_i \mid O) = \frac{\mathrm{Prob}(O \mid d_i)\,\mathrm{Prob}(d_i)}{\mathrm{Prob}(O)}$
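  - Prob(O) is the prior probability that this particular observation will occur (which cannot be computed exactly).
  - Assume Prob(d_i) = P_d is constant.
  - Evaluating Prob(O | d_i) is a matter of fitting model d_i to the data O and determining how well this can be done.
  - Writing Prob(O) as a sum over the m enrolled subjects, the constant P_d cancels:
    $\mathrm{Prob}(d_i \mid O) = \frac{\mathrm{Prob}(O \mid d_i)}{\sum_{k=1}^{m} \mathrm{Prob}(O \mid d_k)}$
  - Evaluating the rest of this expression, Prob(O | d_k) for k ≠ i, k = 1, ..., m, is impossible, because these subjects are not available when d_i enrolls.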
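With equal priors, Bayes' rule reduces Prob(d_i | O) to each subject's likelihood divided by the sum of all enrolled subjects' likelihoods. The minimal sketch below assumes those likelihoods Prob(O | d_k) are simply given as numbers, which is exactly what is not available at enrollment time; the function name and values are hypothetical.

```python
from typing import Optional


def posteriors(likelihoods: list[float],
               priors: Optional[list[float]] = None) -> list[float]:
    """Prob(d_k | O) for every enrolled subject d_k via Bayes' rule.

    With priors=None every subject gets the same prior 1/m; any constant prior cancels.
    """
    m = len(likelihoods)
    if priors is None:
        priors = [1.0 / m] * m
    joint = [lik * p for lik, p in zip(likelihoods, priors)]  # Prob(O | d_k) Prob(d_k)
    evidence = sum(joint)                                      # Prob(O)
    if evidence == 0:
        raise ValueError("observation has zero probability under every model")
    return [x / evidence for x in joint]


# Hypothetical likelihoods Prob(O | d_k) for three enrolled subjects.
post = posteriors([2.0, 1.0, 1.0])
```

Here `post` is approximately `[0.5, 0.25, 0.25]`. Every term of the denominator needs a model of some other subject, which is why the world and cohort approximations are used in practice.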
- Modeling the rest of the world: cohorts
  - The most difficult issue in training a biometric authentication system is modeling data from unknown people.
  - Voice verification methods use not only a model describing the speaker's biometric machine representation, but also a model describing all other speakers.
  - Two techniques approximate the denominator of (9.7):
1. World modeling
  - Reduce the set M to one fictitious model subject D, trained on a pool of data from many different speakers who represent the world W of possible speakers.
  - Scale by a factor α, so that the denominator reflects the whole population D ≠ d_i.
2. Cohort modeling
  - Approximate the set M by a subset M_i of subjects that resemble subject d_i: for each subject d_i, a set of approximate forgeries is computed and stored. We denote this set by D_i; it is called the set of cohorts of speaker i.
  - Factor α_i = c_i, the number of cohorts for d_i.
- Updating the probabilities
  - Denote Prob(d_i | O) by P_i.
  - During operation of the authentication system, data from subjects is collected, and the likelihood P_i could be updated.
  - Upon authentication of subject d_i, a biometric sample is acquired that we denote here as ΔO.
  - Compute Prob(d_i | O, ΔO).
  - What needs to be evaluated is the denominator Prob(ΔO).
  - Set Prob(d_i) = P_i, i.e., the current estimate serves as the prior for the update.
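The update step can be sketched as one round of Bayes' rule in which the current estimates P_i serve as priors. This minimal sketch assumes the new sample (written ΔO here) is conditionally independent of the enrollment data O given the subject, so that Prob(ΔO | d_i, O) = Prob(ΔO | d_i). It also assumes these likelihoods are available for every enrolled subject. Names and values are hypothetical.

```python
def update_posteriors(current: list[float], new_likelihoods: list[float]) -> list[float]:
    """One Bayesian update: Prob(d_i | O, ΔO) from P_i = Prob(d_i | O) and Prob(ΔO | d_i)."""
    joint = [p * lik for p, lik in zip(current, new_likelihoods)]
    evidence = sum(joint)  # the denominator Prob(ΔO), summed over enrolled subjects
    if evidence == 0:
        raise ValueError("new sample has zero probability under every model")
    return [x / evidence for x in joint]


# Hypothetical: current estimates P_i, then a new sample acquired while subject 0
# authenticates, which fits subject 0's model much better than the others.
p = [0.5, 0.25, 0.25]
p = update_posteriors(p, [0.8, 0.1, 0.1])
```

After the update, subject 0's estimate rises from 0.5 to about 0.89. Repeating the call with each new authentication sample keeps refining P_i, which is the sense in which enrollment continues during operation.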