Title: Application of Metamorphic Testing to Supervised Classifiers
1 Application of Metamorphic Testing to Supervised Classifiers
Xiaoyuan Xie, Tsong Yueh Chen (Swinburne University of Technology)
Christian Murphy, Gail Kaiser (Columbia University)
Joshua Ho (University of Sydney)
Baowen Xu (Nanjing University)
2 Background
- Many applications in the field of scientific computing depend on machine learning (ML) algorithms
- ML applications often do not have test oracles that indicate whether the output is correct for arbitrary input
- Applications without test oracles are called non-testable programs
3 Problem Statement
- Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques
- However, it is difficult to detect subtle (computational) errors for arbitrary inputs
4 Testing ML Applications
- There has been much research into applying ML techniques to software testing, but not the other way around
- Reusable real-world data sets and frameworks are available for checking that an ML algorithm predicts well, but not for checking that an implementation works correctly
5 Observation
- If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output
- However, it may be possible to know relationships between a set of inputs and the corresponding set of outputs
- Metamorphic Testing [Chen et al. 98] is such an approach
6 Metamorphic Testing
- An approach for creating follow-on test cases based on previous test cases
- If input x produces output f(x), then the function's metamorphic properties are used to guide a transformation function t, which is applied to produce a new test case input, t(x)
- We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
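As a minimal sketch of the scheme above, the snippet below takes the sine function as a stand-in for the program f and uses the identity sin(pi - x) = sin(x) as the metamorphic property. The choice of sine, the helper name `check_relation`, and the tolerance are assumptions made for illustration; they do not come from the slides.

```python
import math

def f(x: float) -> float:
    """Program under test (assumed here to be the library sine function)."""
    return math.sin(x)

def t(x: float) -> float:
    """Transformation guided by the property sin(pi - x) = sin(x)."""
    return math.pi - x

def check_relation(x: float, tol: float = 1e-9) -> bool:
    """Run the original and the follow-on test case and compare the outputs."""
    original = f(x)       # output of the original test case
    follow_on = f(t(x))   # output of the follow-on test case
    # The property predicts f(t(x)) == f(x), up to floating-point error.
    return math.isclose(original, follow_on, abs_tol=tol)

# One follow-on test case per original input; no oracle for sin(x) is needed.
results = [check_relation(x) for x in (0.0, 0.5, 1.0, 2.5, 10.0)]
```

Note that the check never needs the true value of sin(x); it only compares two executions of the same program.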
7 Metamorphic Testing without an Oracle
- When a test oracle exists, we can know whether f(t(x)) is correct
  - Because we have an oracle for f(x)
  - So if f(t(x)) is as expected, then it is correct
- When there is no test oracle, f(x) acts as a pseudo-oracle for f(t(x))
  - If f(t(x)) is as expected, it is not necessarily correct
  - However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong
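The asymmetry described above can be illustrated with a deliberately faulty function. In this sketch, `buggy_mean` is a hypothetical implementation with an off-by-one divisor, and the two relations (permutation and duplication of the input) are chosen for illustration only. The faulty function satisfies the permutation relation, so a passing relation does not establish correctness, but it violates the duplication relation, which reveals that at least one of the two outputs is wrong.

```python
import random

def correct_mean(values):
    return sum(values) / len(values)

def buggy_mean(values):
    # Hypothetical fault: divides by n + 1 instead of n.
    return sum(values) / (len(values) + 1)

def permutation_holds(func, values, seed=0):
    """Relation 1: shuffling the input must not change the output."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return func(shuffled) == func(values)

def duplication_holds(func, values):
    """Relation 2: repeating every value once must not change the mean."""
    return func(list(values) + list(values)) == func(values)

scores = [70, 85, 90, 60]  # integers, so the sums are exact

# The faulty function passes relation 1 even though its output is wrong.
passes_permutation = permutation_holds(buggy_mean, scores)
# Relation 2 is violated, so f(x) or f(t(x)) (or both) must be wrong.
passes_duplication = duplication_holds(buggy_mean, scores)
```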
8 Metamorphic Testing Example
- Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages
- If we permute the values in the text file, the results should stay the same
- If we multiply each score by 10, the final results should all be multiplied by 10 as well
- These metamorphic properties can be used to create a pseudo-oracle for the application
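The two properties in this example can be checked mechanically. The sketch below assumes the scores are already in memory as one list per student rather than read from a text file, and it uses the population standard deviation; the function name `summarize` and the sample scores are made up for illustration.

```python
import math
import random
import statistics

def summarize(rows):
    """Return the per-student averages and the standard deviation of those averages."""
    averages = [statistics.mean(scores) for scores in rows]
    return averages, statistics.pstdev(averages)

scores = [[90, 80, 70], [60, 75, 88], [100, 95, 91], [55, 65, 72]]
averages, spread = summarize(scores)

# Property 1: permuting the rows leaves the results unchanged (up to ordering).
shuffled = list(scores)
random.Random(1).shuffle(shuffled)
averages_p, spread_p = summarize(shuffled)
permutation_ok = (sorted(averages) == sorted(averages_p)
                  and math.isclose(spread, spread_p))

# Property 2: multiplying every score by 10 multiplies all results by 10.
scaled = [[10 * s for s in row] for row in scores]
averages_s, spread_s = summarize(scaled)
scaling_ok = (all(math.isclose(10 * a, b) for a, b in zip(averages, averages_s))
              and math.isclose(10 * spread, spread_s))
```

Floating-point results are compared with `math.isclose` rather than `==`, since the scaled computation may round differently.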
9 Approach
- To apply Metamorphic Testing to such ML applications, we first enumerate the metamorphic relations based on the expected behaviors of a given machine learning algorithm
- We then utilize these relations to conduct metamorphic testing on the implementation
10 Verification and Validation
- Which metamorphic properties are necessary may differ between problems in the domain
- Properties that are necessary can be used for verification: Is the implementation of the algorithm correct?
- Other properties can be used for validation: Is the algorithm appropriate for solving this problem?
11 Research Questions
- What are the metamorphic properties of supervised ML classification algorithms?
  - Which can be used for verification?
  - Which can be used for validation?
- Can metamorphic testing detect defects in real-world ML applications?
12 Machine Learning Fundamentals
- Data sets consist of a number of samples, each of which has attributes and a label
- In the first phase (training), a model is generated that attempts to generalize how attributes relate to the label
- In the second phase, the model is applied to a previously-unseen data set (testing data) with unknown labels to produce a classification of each sample
13 Algorithms Investigated
- k-Nearest Neighbors (kNN)
  - Samples in the testing data are classified by using Euclidean distance to find the k nearest samples in the training data
  - Classification is then done by majority rule
- Naïve Bayes Classifier (NBC)
  - For a given sample in the testing data, computes the probability of that sample belonging to each class, assuming conditional independence between the attributes
  - Chooses the class that is most likely
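To make the kNN description concrete, here is a minimal sketch of a classifier that finds the k nearest training samples by Euclidean distance and takes a majority vote. It is not the Weka implementation; the function name `knn_classify`, the data layout (a list of `(attributes, label)` pairs), and the sample data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (attributes, label) pairs; query: tuple of attributes.
    """
    # Sort the training samples by Euclidean distance to the query.
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    # Majority rule over the labels of the k nearest samples.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.5, 4.5), "b"),
]

near_a = knn_classify(train, (1.1, 1.0), k=3)
near_b = knn_classify(train, (5.2, 4.9), k=1)
```

How ties between labels are broken is left to `Counter.most_common` in this sketch; a real implementation may make a different choice.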
14 Metamorphic Relations
- We identified 11 properties that we would expect all classification algorithms to have
  - Affine transformation of attributes
  - Permutation of labels or attributes
  - Addition of informative or uninformative attributes
  - Addition of classes by duplicating or re-labeling samples
  - Removal of classes or samples
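Two of the listed relations can be sketched against a toy kNN classifier. The code below assumes the affine transformation applies the same scale and shift to every attribute of both the training and the testing data, and that attribute permutation reorders the columns identically in both sets; under those assumptions the predicted labels should not change. The classifier, helper names, and data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Toy kNN: Euclidean distance, majority vote (not the Weka implementation)."""
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def affine(attrs, scale=2, shift=3):
    """Apply the same affine map to every attribute (an assumption of this sketch)."""
    return tuple(scale * v + shift for v in attrs)

def permute(attrs, order):
    """Reorder the attributes (columns) according to `order`."""
    return tuple(attrs[i] for i in order)

train = [((1, 2, 0, 5), "a"), ((2, 1, 1, 4), "a"), ((0, 3, 1, 6), "a"),
         ((9, 8, 7, 1), "b"), ((8, 9, 6, 0), "b"), ((7, 7, 8, 2), "b")]
queries = [(1, 1, 1, 5), (8, 8, 7, 1), (3, 3, 2, 4)]

# Original test case: classify the untransformed data.
baseline = [knn_classify(train, q) for q in queries]

# Follow-on test case 1: affine transformation of all attributes.
affine_train = [(affine(a), label) for a, label in train]
affine_out = [knn_classify(affine_train, affine(q)) for q in queries]

# Follow-on test case 2: permutation of the attributes.
order = (2, 0, 3, 1)
perm_train = [(permute(a, order), label) for a, label in train]
perm_out = [knn_classify(perm_train, permute(q, order)) for q in queries]

# Any relation whose follow-on output differs from the baseline is a violation.
violations = [name for name, out in (("affine", affine_out), ("permute", perm_out))
              if out != baseline]
```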
15 Experimental Setup
- Applied the approach to implementations in the Weka 3.5.7 toolkit
- Initial test cases
  - Randomly generated values
  - Four attributes (columns)
  - 20-50 samples (rows)
- Metamorphic relations were applied to create 20-300 follow-on test cases
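A generator for initial test cases of this shape might look like the sketch below. Only the four attributes and the 20-50 rows come from the setup above; the value range, the set of labels, the seed, and the function name `random_dataset` are assumptions made for illustration.

```python
import random

def random_dataset(rng, n_attributes=4, min_rows=20, max_rows=50,
                   labels=("a", "b", "c")):
    """Generate a random data set as a list of (attributes, label) pairs.

    The value range [0, 100) and the label set are assumptions of this sketch.
    """
    n_rows = rng.randint(min_rows, max_rows)  # inclusive on both ends
    return [
        (tuple(rng.uniform(0.0, 100.0) for _ in range(n_attributes)),
         rng.choice(labels))
        for _ in range(n_rows)
    ]

rng = random.Random(42)  # fixed seed so a failing test case can be reproduced
dataset = random_dataset(rng)
```

Seeding the generator matters in practice: when a follow-on test case exposes a violation, the original data set must be reproducible to investigate it.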
16 Results
[Tables: results for k-Nearest Neighbors and for Naïve Bayes Classifier]
17 Analysis: kNN
- No necessary properties were violated
- Issues related to validation
  - Labels that are non-existent in the training data have a non-zero chance of being selected in classification
  - If two labels are equally likely, the first one that is listed is chosen
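The tie-breaking behavior can be illustrated with a small sketch. It does not reproduce Weka's code; `majority_label` is a hypothetical helper that resolves ties in favor of the label listed first, which makes the outcome depend on the order in which the labels are declared.

```python
def majority_label(votes, label_order):
    """Return the label with the most votes; ties go to the label listed first.

    `max` returns the first maximal item it encounters, so the order of
    `label_order` decides the winner when two labels have equal votes.
    """
    return max(label_order, key=lambda label: votes.get(label, 0))

votes = {"a": 1, "b": 1}  # two neighbours, one vote each

first_listing = majority_label(votes, ["a", "b"])
second_listing = majority_label(votes, ["b", "a"])
```

A metamorphic relation that permutes the labels would flag this difference, which is a question of whether the behavior is appropriate rather than of whether the implementation is wrong.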
18 Analysis: Naïve Bayes
- Four necessary properties were violated, indicating defects in the implementation
  - Loss of precision related to use of the double datatype in Java
  - Laplace Accuracy is used to determine probabilities; thus, labels that did not appear in the training data have non-zero probability
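The second point can be illustrated with a sketch, assuming that "Laplace Accuracy" here refers to add-one (Laplace) smoothing of the class counts. That reading is an assumption, as are the function name `class_priors` and the sample labels; the code is not taken from Weka.

```python
from collections import Counter
from fractions import Fraction

def class_priors(train_labels, all_labels, smoothing=1):
    """Estimate P(label) with additive smoothing (assumed meaning of the slide's term).

    Each label's count is increased by `smoothing`, so a label that never
    occurs in the training data still receives a non-zero probability.
    """
    counts = Counter(train_labels)
    total = len(train_labels) + smoothing * len(all_labels)
    return {label: Fraction(counts[label] + smoothing, total)
            for label in all_labels}

train_labels = ["a", "a", "b"]
all_labels = ["a", "b", "c"]   # "c" is declared but absent from training

smoothed = class_priors(train_labels, all_labels)                 # add-one smoothing
unsmoothed = class_priors(train_labels, all_labels, smoothing=0)  # plain frequencies
```

With smoothing, the unseen label "c" gets probability 1/6; without it, the probability is 0.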
19 Suggestions
- We suggest using the BigDecimal class instead of the double datatype
- Laplace Accuracy is appropriate for the attributes but not for the labels
- Use of Laplace Accuracy should be set as an option
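The first suggestion concerns Java, but the underlying effect can be shown in Python, whose `float` is an IEEE 754 double and whose `decimal.Decimal` plays a role similar to `BigDecimal`. The sketch below is only an analogy under that assumption; it does not reproduce the defect found in the study. It shows that binary floating-point addition can depend on the order of the operands, which is the kind of discrepancy a permutation relation would expose.

```python
from decimal import Decimal

# Binary doubles: the same three values summed in two different orders.
float_forward = 0.1 + 0.2 + 0.3
float_reverse = 0.3 + 0.2 + 0.1
float_order_matters = float_forward != float_reverse

# Decimal arithmetic: the same sums are exact and order-independent.
dec_forward = Decimal("0.1") + Decimal("0.2") + Decimal("0.3")
dec_reverse = Decimal("0.3") + Decimal("0.2") + Decimal("0.1")
decimal_order_matters = dec_forward != dec_reverse
```

Exact decimal arithmetic is slower than hardware floating point, so in practice this is a trade-off between speed and reproducibility of results.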
20 Future Work
- Apply the testing approach to other domains that depend on ML, such as scientific computing
- Further investigation of testing non-testable programs
- Measure the effectiveness of the approach in empirical studies
21 Summary
- Metamorphic testing is easy to implement and automate
- We were able to devise fault-revealing properties even with just a basic understanding of the ML algorithms
- Metamorphic testing can be used for both verification and validation
23 Related Work
- Applying MT to non-testable programs in other domains
- General properties for use in MT