Title: Ballistics DNA
2. Ballistics DNA
Alain Beauchamp, Ph.D.
3. The path to a ballistic probability model
- Part I: Correlation score and probability
- Part II: Ballistic probability model
- Part III: How could we implement a probability model in a ballistic system?
- Conclusion and future work
4. Part I: Correlation scores and probability
- Strengths and limitations of the current correlation score
- Why are correlation scores hard to interpret?
- Benefits of a probability score
5. Strengths and limitations of the correlation score
- For the last 15 years, the correlation score has been at the core of FT's ballistic systems
- Strengths of a correlation score
- Useful as a ranking tool
- Score values computed with the same reference A (and the same type of mark) can be compared
- Score(A against B) > Score(A against C) means that B looks more similar to A than C does
6. Strengths and limitations of the correlation score (Cont'd)
- Limitations of a correlation score
- The correlation score is hard to interpret
- It is not useful as an intrinsic similarity measure
- Examples
- Score values computed with different references (same type of mark) cannot be compared
- Score(A-B) > Score(C-D) DOES NOT mean that the A-B pair looks more similar than the C-D pair
- Score values computed from different marks cannot be compared
- Score(A-B) for the firing pin > Score(A-B) for the breech face DOES NOT mean that B looks more similar to A on the firing pin than on the breech face
7. The score is hard to interpret. Why?
- 5 reasons
- 1. Different algorithms for different marks
- The characteristics of the correlatable features and the geometry are very different
- FP/BF: circular contour and a wide variety of features
- Ejector/Rimfire: polygonal contour
- Bullets: striae only
- 2. Algorithms change over time
8. The score is hard to interpret. Why? (Cont'd)
- 3. No unique cartridge or bullet score
- More than 1 score per exhibit
- Cartridge cases: BF/FP/Ejector scores
- Bullets (Land): MaxPhase2D, PeakPhase2D, PeakScore2D and 3DScore
- The number of scores per exhibit is expected to increase in the future
- Cartridge cases: 3D scores
- Bullets: added 3D Land score
- GEA scores?
9. The score is hard to interpret. Why? (Cont'd)
- 4. Effect of the database size
- As the database size increases, the probability of finding non matches that look similar to a given reference increases
- The probability of finding a known match in the Top 10 decreases even if its score does not change
- The score value alone is not sufficient; the database size is an important factor as well
- This is a universal law, not specific to ballistic systems
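The database size effect can be illustrated with a small simulation. This is a minimal sketch and not the model presented in these slides: it assumes that non match scores follow a standard normal distribution, and the function name `top10_probability` and the match score of 2.5 are hypothetical choices made only for illustration.

```python
import random

def top10_probability(match_score, db_size, trials=300, seed=0):
    """Estimate how often a fixed known-match score ranks in the Top 10.

    Non match scores are drawn from a standard normal distribution,
    which is an illustrative assumption, not the real score law.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count the non matches that outscore the known match.
        better = sum(
            1 for _ in range(db_size) if rng.gauss(0.0, 1.0) > match_score
        )
        if better < 10:
            hits += 1
    return hits / trials

# The match score stays the same; only the database size grows.
for n in (100, 1000, 5000):
    print(n, top10_probability(match_score=2.5, db_size=n))
```

With the score held fixed, the estimated Top 10 probability drops as the database grows, because more non matches get a chance to score higher by luck.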
10. The score is hard to interpret. Individual score response
- 5. Each reference has its own score response
- Example
- If two cartridges A and B are correlated against the same large database (with no match in it), the two lists of scores are sometimes very different
- For example, the scores associated with A could be greater than the scores associated with B
11. The score is hard to interpret. Individual score response (Cont'd)
- Experiment: correlate 9LG bullets against the same large database (800 non matches) with BulletTRAX-3D
- Compare their non match score distributions
- Significant differences in the high score region and in the position of the peak
- Each bullet has its own statistical distribution of non match scores
- There is no universal score response common to all bullets
[Figure: non match score distributions for 9LG Bullet A and 9LG Bullet B]
12. Solution: convert scores into probabilities
- Each of the previous problems can be solved using probabilities (in principle)
- Different algorithms: probability is a common concept for all score types
- Algorithms change over time: the probability value may still change, but only slightly
- Distinct score response for each bullet/cartridge: probability is a common concept for all exhibits
- Effect of database size: statistical models based on relevant data could quantify this effect
- More than 1 score per bullet/cartridge: compute a probability for each score and combine them into a unique probability for the bullet/cartridge
13. How could we combine probabilities? Cartridge case
- Assume that we have a BF and an FP score for a pair of cartridge cases AND that the 2 following probabilities are known
- P(FP): probability of a confirmed match according to FP
- P(BF): probability of a confirmed match according to BF
- 4 possible scenarios
- Confirmed match according to BOTH FP and BF
- Confirmed match according to FP ONLY
- Confirmed match according to BF ONLY
- Not a confirmed match
14. How could we combine probabilities? Cartridge case (Cont'd)
- FP/BF marks provide independent information
- A combined probability is computed by assuming independent information
- $P_{\mathrm{Combined}} = 1 - (1 - P_{\mathrm{BF}})(1 - P_{\mathrm{FP}})$
- Results
- A mark with a low probability has little effect on the combined probability and never lowers it
- As we add marks, the combined probability improves
- Easy to generalize to 3 independent marks
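The combination rule for independent marks can be written as a short function. This is a minimal sketch: the name `combine_independent` and the probability values are hypothetical, and the rule only holds under the independence assumption stated on the slide.

```python
def combine_independent(probabilities):
    """Return 1 - prod(1 - p): the chance that at least one mark confirms a match."""
    miss = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        miss *= 1.0 - p  # chance that this mark does not confirm the match
    return 1.0 - miss

print(combine_independent([0.9, 0.0]))       # a zero-probability mark changes nothing
print(combine_independent([0.9, 0.5]))       # BF and FP
print(combine_independent([0.9, 0.5, 0.5]))  # BF, FP and a third independent mark
```

Because the function accepts any number of values, the generalization to 3 independent marks needs no extra code.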
15. How could we combine probabilities? Bullets
- The 4 bullet scores are not computed from independent information
- They are computed from the same areas on the bullet
- A combined probability cannot be computed by assuming independent information
- Keep the highest probability only (conservative)
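The difference between the two combination rules can be seen on a small example. This is a sketch with hypothetical probability values and hypothetical function names; it only illustrates why the independence formula would overstate the result when the 4 bullet scores come from the same areas.

```python
def combine_independent(probabilities):
    """Independence rule: 1 - prod(1 - p). Not valid for correlated scores."""
    miss = 1.0
    for p in probabilities:
        miss *= 1.0 - p
    return 1.0 - miss

def combine_conservative(probabilities):
    """Conservative rule for dependent scores: keep the highest probability."""
    return max(probabilities)

# Hypothetical probabilities for MaxPhase2D, PeakPhase2D, PeakScore2D, 3DScore.
bullet_probabilities = [0.6, 0.6, 0.6, 0.6]

print(combine_independent(bullet_probabilities))   # counts the same evidence 4 times
print(combine_conservative(bullet_probabilities))  # stays at the single best value
```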
16. Conclusion: Part I
- The probability of being a match is a more meaningful concept than the correlation score
- Using probabilities addresses, in principle, all the problems found with the interpretation of correlation scores
- Probabilities of individual marks can be combined in a simple way
- Challenge: compute the probability of being a match for individual marks
- Two main unknowns
- How to deal with the individual score response of each cartridge/bullet
- How to predict the effect of database size
17. Part II: Ballistic probability model
- Goal and constraints of the model
- Hypothesis
- Tests and results
18. Statistical model of scores: goal and constraints
- Project started in 2003
- Goal: develop a model which converts the correlation score of a mark into a probability of being a match
- Current constraints
- We only have a database of sister pairs
- Tests with BulletTRAX-3D scores
- The model should reproduce the performance observed in the large database study
- As the database size increases, the probability of finding a known match in the first position should decrease
19. Ballistic statistical model: hypothesis
- Any mathematical or physical model starts with a small number of hypotheses/laws/axioms
- We need hypotheses for the (3D bullet) ballistic model
- We need to find something common to all bullet score distributions
- However, each bullet has its own score response
20. Hypothesis (Cont'd)
Non match statistical distribution
- Experiment already discussed
- Correlate 9LG bullets against the same large database (800 non matches)
- Compare their non match score distributions (3D)
- Differences: in the high score region and in the position of the peak
- Similarity: the distributions have a similar shape
[Figure: non match score distributions for 9LG Bullet 1 and 9LG Bullet 2]
21. Hypothesis (Cont'd)
- Core hypothesis
- The non match score distribution of all bullets has the same universal shape (up to a shift and a stretch factor)
- This shape is independent of caliber, material and quality of the marks
- It can be broken into two hypotheses
- Hypothesis I
- The non match score distribution of each bullet is fully characterized by only two parameters: its mean (position of the peak) and its width
- Hypothesis II
- If we remove the effect of these 2 parameters, the non match score distributions of all bullets are strictly identical
- The effect of the 2 parameters is removed as follows
- Shift the overall distribution to the same peak position for every bullet
- Shrink or expand the overall distribution to get the same width for every bullet
22. Hypothesis (Cont'd)
- The effect of the 2 parameters is removed as follows
- Shift the mean to 0
- Shrink or expand to unit width
- The resulting distributions are very similar
- Small variations remain due to limited data
[Figure: non match score distributions for 9LG Bullet 1 and 9LG Bullet 2]
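The shift and rescale operation can be sketched as follows. The slides do not define "width"; this sketch assumes it is the population standard deviation. The function name `normalize_scores` and the synthetic score values are hypothetical.

```python
import statistics

def normalize_scores(non_match_scores):
    """Shift scores to mean 0 and rescale them to unit width.

    "Width" is taken here to be the population standard deviation,
    which is an assumption; the slides do not define it.
    """
    mean = statistics.fmean(non_match_scores)
    width = statistics.pstdev(non_match_scores)
    if width == 0:
        raise ValueError("all scores are identical; width is zero")
    normalized = [(s - mean) / width for s in non_match_scores]
    return normalized, mean, width

# Two synthetic bullets sharing one shape, with different shift and stretch.
shape = [-1.2, -0.5, 0.0, 0.3, 1.4]
bullet_1 = [10.0 + 2.0 * x for x in shape]
bullet_2 = [25.0 + 5.0 * x for x in shape]

norm_1, mean_1, width_1 = normalize_scores(bullet_1)
norm_2, mean_2, width_2 = normalize_scores(bullet_2)
print(mean_1, width_1)
print(mean_2, width_2)
# Largest difference between the two normalized lists (close to 0).
print(max(abs(a - b) for a, b in zip(norm_1, norm_2)))
```

In this synthetic case the two normalized lists coincide up to rounding, which is what Hypothesis II states for real bullets up to small variations.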
23. Ballistic statistical model: testing the model
- 4 steps
- 1. Compute 3D correlation scores from a large database study with BulletTRAX-3D (4 calibers, 2 materials/compositions)
- 2. Compute the individual parameters for each bullet (Hypothesis I): mean and width of its non match score distribution
- 3. Determine a universal non match score distribution (Hypothesis II)
- 4. By simulations, predict the performance of the correlation algorithm as a function of database size
24. Testing the model: database general information
Pittsburgh bullets database (Allegheny County Coroner's Office, Forensic Laboratory Division)
25. Testing the model: compute individual parameters
- For each bullet, get an approximation of the universal distribution (Hypothesis II)
- The scores are normalized by this process
- For each bullet
- The mean and width are computed
- The distribution is shifted so that its mean is 0
- The distribution is rescaled to unit width
26. Testing the model: define a universal non match distribution
- Add up the approximated universal distributions found for all bullets
- The result has a smooth shape, even in the high score region
- This gives the universal normalized distribution for non match scores
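Adding up the per-bullet distributions amounts to pooling the normalized scores of all bullets. The sketch below uses synthetic Gaussian scores as stand-in data (the real shape is whatever the database study produces), assumes "width" means the population standard deviation, and uses hypothetical function names.

```python
import bisect
import random
import statistics

def pool_normalized_scores(score_lists):
    """Normalize each bullet's non match scores, then pool them in sorted order."""
    pooled = []
    for scores in score_lists:
        mean = statistics.fmean(scores)
        width = statistics.pstdev(scores)  # assumed definition of "width"
        pooled.extend((s - mean) / width for s in scores)
    pooled.sort()
    return pooled

def fraction_below(pooled, s):
    """Empirical fraction of pooled normalized scores that are <= s."""
    return bisect.bisect_right(pooled, s) / len(pooled)

rng = random.Random(0)
# Synthetic stand-in data: 20 bullets with 800 non match scores each,
# every bullet having its own mean and width.
score_lists = [
    [rng.gauss(20.0 + 2.0 * b, 3.0 + 0.2 * b) for _ in range(800)]
    for b in range(20)
]

universal = pool_normalized_scores(score_lists)
print(len(universal))
print(fraction_below(universal, 0.0))
print(fraction_below(universal, 3.0))
```

Pooling gives 16,000 values here instead of 800, which is why the pooled distribution is smoother in the sparsely populated high score region.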
27. Testing the model: simulations
- The simulation reproduces the operations done in a real large database study
- Real study (with sister pairs)
- For each reference bullet
- Introduce its known match in the database of size N
- Compute all correlation scores between the reference and the (N+1) bullets in the database
- Find the rank of the known match
- Compute the performance of the correlation algorithm (number of known matches in the first position)
28. Testing the model: simulations (Cont'd)
- Simulation
- For each reference bullet
- Randomly select N non match (normalized) correlation scores from the universal score distribution
- Normalize the (known) score of its known match by using the reference's individual parameters (mean and width of its non match score distribution)
- Introduce the normalized score of its known match in the (generated) non match score list
- Find the rank of the known match
- Compute the simulated performance of the correlation algorithm
- Repeat the same process for several database sizes N
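The simulation steps above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the universal pool is replaced by synthetic standard normal values, the reference parameters are hypothetical, and performance is reported as the fraction of trials in which the known match ranks first.

```python
import random

def simulate_first_position_rate(references, universal_scores, db_size,
                                 trials=200, seed=0):
    """Fraction of simulated searches in which the known match ranks first.

    references: (known_match_score, mean, width) tuples, one per reference
    bullet, where mean and width describe its non match score distribution.
    universal_scores: normalized non match scores pooled over all bullets.
    """
    rng = random.Random(seed)
    first = 0
    for match_score, mean, width in references:
        # Normalize the known match score with the reference's own parameters.
        normalized_match = (match_score - mean) / width
        for _ in range(trials):
            # Draw N normalized non match scores from the universal pool.
            non_matches = rng.choices(universal_scores, k=db_size)
            # The known match ranks first if it beats every non match.
            if normalized_match > max(non_matches):
                first += 1
    return first / (len(references) * trials)

# Synthetic stand-in for the universal distribution (assumption).
pool_rng = random.Random(1)
universal = [pool_rng.gauss(0.0, 1.0) for _ in range(5000)]

# Hypothetical references: (known match score, mean, width).
references = [(54.0, 30.0, 8.0), (50.0, 35.0, 5.0), (43.0, 28.0, 6.0)]

for n in (50, 500, 2000):
    print(n, simulate_first_position_rate(references, universal, n))
```

Sampling with replacement from the pooled scores is one simple way to "select randomly from the universal score distribution"; fitting a smooth curve to the pool and sampling from that curve would be an alternative.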
29. Testing the model: simulations (Cont'd)
[Figure: probability that the sister is in the first position as a function of its normalized score S]
- Dark circles: experimental data
- Dark curve: result from the model
- Gray curves: predictions for other database sizes
30. Testing the model: simulations (Cont'd)
- Summary of the figure
- If the sister has a normalized score of 8, the probability that it is in the first position is
- 90% for N = 500
- 70% for N = 2K
- 20% for N = 10K
- If we want the sister to be in the first position with a 95% probability, its score must be
- 9 for N = 500
- 10 for N = 2K
- 12 for N = 10K
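One way to read these numbers is through a simplified relation. It assumes that the N non match scores are independent draws from the universal normalized distribution, and that the sister ranks first exactly when every non match scores below it. The symbols F (cumulative distribution function of the universal distribution) and P_first are introduced here for illustration and do not appear in the slides.

```latex
P_{\text{first}}(S, N) = F(S)^{N}
\quad\Longleftrightarrow\quad
1 - F(S) = 1 - P_{\text{first}}(S, N)^{1/N}

% Applied to the three values quoted for S = 8:
N = 500,\;   P_{\text{first}} = 0.90:\quad 1 - F(8) = 1 - 0.90^{1/500}   \approx 2.1 \times 10^{-4}
N = 2000,\;  P_{\text{first}} = 0.70:\quad 1 - F(8) = 1 - 0.70^{1/2000}  \approx 1.8 \times 10^{-4}
N = 10000,\; P_{\text{first}} = 0.20:\quad 1 - F(8) = 1 - 0.20^{1/10000} \approx 1.6 \times 10^{-4}
```

Under this assumption the three rounded percentages imply tail probabilities of the same order, around 2 × 10^-4, which is consistent with a single distribution shared by all database sizes.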
31. Part II: Summary
- A statistical model of non match scores was built
- Based on a database of 2000 bullets, 4 calibers, 2 compositions/materials
- 3D correlation on BulletTRAX-3D
- Hypothesis
- The non match score distribution has the same shape for all bullets (except for a shift and a stretch factor)
- The model computes the probability that a sister with a given score is in the first position
- The prediction agrees with the actual performance in the large database study
- Performance decreases as the database size increases
32. Part III
- How could we implement a probabilistic model in a ballistic system?
33. How could we implement a probabilistic model in a ballistic system?
- Correlate a given bullet against a large database
- From the (large) list of scores, compute the two characteristic parameters of the reference bullet: the mean and width of its non match score distribution
- Compute the probability that the bullet in the first position is a match by using
- The universal non match score distribution
- The two characteristic parameters computed previously
- The actual score of the bullet in the first position
- Information about match score distributions (not yet known)
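The first steps of this procedure can be sketched on a single list of scores. Because the match score distribution is not yet known, the sketch stops at the normalized score of the top candidate and does not compute a match probability. It assumes that "width" is the population standard deviation and that the top candidate should be left out when estimating the non match parameters; both choices, the function name and the score values are hypothetical.

```python
import statistics

def normalized_top_score(scores):
    """Return (normalized top score, mean, width) for one correlation list.

    The mean and width are estimated from every score except the top one,
    on the assumption that the rest of the list contains only non matches.
    """
    ordered = sorted(scores)
    top, rest = ordered[-1], ordered[:-1]
    mean = statistics.fmean(rest)
    width = statistics.pstdev(rest)  # assumed definition of "width"
    return (top - mean) / width, mean, width

# Hypothetical scores from correlating one bullet against a small database.
scores = [31.0, 28.5, 30.2, 29.1, 33.4, 27.8, 30.9, 29.6, 52.0, 30.5]
s, mean, width = normalized_top_score(scores)
print(round(mean, 2), round(width, 2), round(s, 2))
```

The normalized value `s` is the quantity that would then be compared with the universal non match distribution and the database size.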
34. How could we implement a probabilistic model in a ballistic system? (Cont'd)
- Repeat the same process for all score types
- MaxPhase2D, PeakPhase2D, PeakScore2D
- 3DScore
- Combine the 4 probabilities into a unique probability for the bullet
35. Future work
- Improve the model with new large database studies (new calibers)
- Test on cartridges
- Get a better knowledge of sister score distributions
- The current study was done with sister pairs only
- Use the model to improve correlation algorithms