Title: Torture tests
1Torture tests - A quantitative analysis for the robustness of Knowledge-Based Systems
Perry Groot, Frank van Harmelen, and Annette Ten Teije
2Motivation
Belief
The ability of KBSs to deal with
missing or invalid data is an essential dimension
of KBS validation.
Our claim
A quantitative analysis of
the robustness of KBSs is both possible and
useful.
3Two informal definitions
Robustness
The degree to which a system or component can
function correctly in the presence of invalid
inputs or stressful environmental conditions
(IEEE, 1990).
Degradation study
4Two informal definitions
Robustness
Degradation study
In a degradation study we gradually decrease the
quality of the KBS input and measure how the KBS
output quality decreases as a result.
5Two quality measures
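$\mathrm{Recall}(I) = \dfrac{|\mathrm{correct}(I) \cap \mathrm{output}(I)|}{|\mathrm{correct}(I)|}$
$\mathrm{Precision}(I) = \dfrac{|\mathrm{correct}(I) \cap \mathrm{output}(I)|}{|\mathrm{output}(I)|}$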
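As a minimal sketch, both measures can be computed on Python sets, assuming that correct(I) and output(I) are finite sets of answers and that each measure divides the size of their overlap by the size of one of the two sets. The function names, the plant names, and the handling of empty sets are illustrative assumptions, not part of the original slides.

```python
def recall(correct: set, output: set) -> float:
    """Fraction of the correct answers that the system actually computes."""
    if not correct:
        return 1.0  # assumption: with nothing to find, recall counts as full
    return len(correct & output) / len(correct)


def precision(correct: set, output: set) -> float:
    """Fraction of the computed answers that are actually correct."""
    if not output:
        return 1.0  # assumption: an empty output contains no wrong answers
    return len(correct & output) / len(output)


# Hypothetical example: one correct plant, three candidates returned.
correct = {"daisy"}
output = {"daisy", "dandelion", "chamomile"}
print(recall(correct, output))     # 1.0
print(precision(correct, output))  # 0.3333333333333333
```

In this example the system finds the correct answer (recall 1) but also returns two wrong candidates (precision 1/3).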
6Recall: Fraction of correct answers that the
system actually computes
[Figure: the sets Output(I) and Correct(I) and their intersection, labelled Recall]
7Precision: The fraction of computed answers that
are actually correct
[Figure: the sets Output(I) and Correct(I) and their intersection, labelled Recall and Precision]
8Recall corresponds to completeness, precision
to soundness.
[Figure: the sets Output(I) and Correct(I) and their intersection, labelled Recall and Precision]
9Two quality measures
- Well known in information retrieval.
- No commitment to task or domain.
- Geared to KBSs with discrete answers.
- Correct answer has to be known beforehand.
- Answer set has to be finite.
10Case study
- System classifies plants from a part of Germany.
- Input: Observations (flower, leaves, stem)
- Output: Plant
- Internals: ??? (feature of methodology!)
- Our degradation study uses the number of
  observations as the gradual input measure.
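The degradation study described above can be sketched as a loop that feeds the system fewer and fewer observations and records the average precision and recall at each step. Everything in the block is a hypothetical stand-in: the toy knowledge base `PLANTS`, the classifier `classify` (the real system is treated as a black box), the test cases in `CASES`, and the choice to keep the first k observations of each case are assumptions made only for illustration.

```python
from statistics import mean

# Hypothetical toy knowledge base: each plant is described by a set of features.
PLANTS = {
    "daisy": {"white flower", "yellow centre", "rosette leaves"},
    "dandelion": {"yellow flower", "rosette leaves", "hollow stem"},
    "buttercup": {"yellow flower", "lobed leaves", "branched stem"},
}


def classify(observations):
    """Toy stand-in for the KBS: all plants consistent with the observations."""
    seen = set(observations)
    return {name for name, features in PLANTS.items() if seen <= features}


def precision(correct, output):
    return len(correct & output) / len(output) if output else 1.0


def recall(correct, output):
    return len(correct & output) / len(correct) if correct else 1.0


def degradation_curve(cases, max_obs):
    """Average precision and recall when only the first k observations are used."""
    curve = []
    for k in range(max_obs + 1):
        precisions, recalls = [], []
        for observations, correct in cases:
            output = classify(observations[:k])  # degraded input: k observations
            precisions.append(precision(correct, output))
            recalls.append(recall(correct, output))
        curve.append((k, mean(precisions), mean(recalls)))
    return curve


# Hypothetical test cases: (ordered observations, known correct answers).
CASES = [
    (["rosette leaves", "yellow flower", "hollow stem"], {"dandelion"}),
    (["yellow flower", "lobed leaves", "branched stem"], {"buttercup"}),
]

for k, avg_precision, avg_recall in degradation_curve(CASES, max_obs=3):
    print(k, round(avg_precision, 2), round(avg_recall, 2))
```

The loop only calls `classify` and compares its output with the known answers, so it does not depend on how the system works internally.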
11Some observations
Both average precision and average recall grow
almost monotonically when adding observations.
(Only 58 of the individual cases have a
monotonically increasing output set.)
Surprise 1.
Surprise 2.
Surprise 3.
Surprise 4.
Surprise 5.
Surprise 6.
12Some observations
After about 12 observations, adding more
observations does not increase the
precision. (Most cases contain 19-30
observations)
13Some observations
The region in which additional observations
actually contribute to an increase in precision
is surprisingly small, namely between 6 and
12 observations.
14Some observations
When aiming for the maximum precision of 1,
there is no need to use any more than 12
observations (out of a maximum of 30!).
15Some observations
No increase in precision can be gained from the
first 6 observations.
16Some observations
Whatever the final precision that is ultimately
obtained by the system, this level of precision
is already obtained after at most 20
observations. (98 of the cases contained more
than 20 observations.)
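A saturation point of this kind can be read off a precision curve mechanically. The following is a minimal sketch, assuming the curve is a non-empty list whose entry at index k is the precision obtained with k observations; the function name `saturation_point`, the tolerance `tol`, and the sample values are hypothetical and are not data from the case study.

```python
def saturation_point(curve, tol=1e-9):
    """Smallest number of observations from which the curve stays at its final value."""
    final = curve[-1]
    k = len(curve) - 1
    # Walk backwards while the preceding value already equals the final one.
    while k > 0 and abs(curve[k - 1] - final) <= tol:
        k -= 1
    return k


# Hypothetical precision values for 0, 1, 2, ... observations.
precision_by_k = [0.1, 0.1, 0.4, 0.8, 1.0, 1.0, 1.0]
print(saturation_point(precision_by_k))  # 4
```

Applied per case, the largest value returned over all cases would be the number of observations after which no case gains any further precision.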
17The conclusion
- Need for quantitative analysis of KBSs.
- Degradation studies are a good approach.
- Recall and precision are appropriate measures.
- Independence of the underlying problem-solving method (PSM).
- Approach shown with case study.