Title: Test Validity: What it is, and why we care.
1Test Validity: What it is, and why we care.
2Validity
- What is validity?
- What is a construct? Meehl's nomological net
- Types of validity
- Content validity
- Criterion-related validity
- Construct Validity
- Incremental Validity
- The multi-trait multi-method matrix
3What is validity?
- The validity of a test is the extent to which it
measures the construct that it is designed to
measure (roughly, how accurate it is)
- As we shall see, there are many ways for a test
to be inaccurate, and therefore validity is not
a single measure
4Meehl: Construct Validity in Psychology Tests
- Meehl's paper is, in large part, an attempt to
define the notion of a psychological construct
- A construct is a notion that psychology helps
itself to freely, without always taking full
care to explain it
- Meehl's approach is to try to define it in a way
that is simultaneously philosophically coherent
and empirically useful
- He builds on positivist philosophy, which
attempts to combine empiricism with a formalized
rationalism
5Paul Meehl: What is a construct?
- Meehl's definition of a construct has 6 main
elements, as follows:
- 1.) To say what a construct is means to say what
laws it is subject to.
- - This is a definition: you can refuse to work
with it or say why you think it is bad, but you
can't disprove it
- - The sum of all laws is called a construct's
nomological network.
6What does nomological mean?
- I had always believed it came from
- ad. L. nomin-, meaning "name"
- I was wrong. In fact it comes from
- ad. Gr. nomo-, combining form of a word meaning
"law"
- So psychonomics is the study of the laws of the
psyche, and "nomological network" refers to a
network of psychological components whose
relations can be described by laws or rules
7Adapted from http://trochim.human.cornell.edu/kb/nomonet.htm
The nomological network consists of:
i.) Representations of the concepts of interest
(constructs)
ii.) Their observable manifestations
iii.) The relationships within and between i.) and ii.)
[Figure: a network of CONSTRUCT nodes and OBS
(observable) nodes]
8Adapted from http://trochim.human.cornell.edu/kb/nomonet.htm
[Figure: the network of CONSTRUCT nodes and OBS
nodes, with these labels: Theoretical propositions;
Operationalized theoretical constructs;
Correspondence rules; Empirical observations]
9Paul Meehl: What is a construct?
- 2.) Laws may relate observable and theoretical
elements
- - The relations must be lawful, but they may
be either causal or statistical (what's the
relation between causal and statistical?)
- - What are the theoretical elements?
Constructs!
10Paul Meehl: What is a construct?
- - To escape from circularity and pure speculation
about the properties of constructs (the
theoretical elements), we need to anchor the
nomological net concretely in some objective
reality, hence:
- 3.) A construct is only admissible if at least
some of the laws to which it is subject involve
observables
- If not, we could define a self-consistent network
of ideas that had no relevance to the real world
(and many such networks have been defined! Such
as?)
- You should be able to relate this idea of
observables to our earlier discussion of
information: what counts as observable is what
counts as information (a detectable difference
that makes a difference)
11Paul Meehl: What is a construct?
- 4.) Elaboration of a construct's nomological net
means learning more about that construct
- - We elaborate a construct by drawing new
relations, either between elements already in the
network, or between those elements and new
elements outside of the network
- - This elaboration is precisely the work of
psychometrics, as well as the work of science in
general
12Paul Meehl: What is a construct?
- 5.) Ockham's razor, with Einstein's addendum
- - That is: make things as simple as possible,
but no simpler
- 6.) Identity means playing the same role in the
same network
- - If it looks like a duck, walks like a duck,
and quacks like a duck, then it is a duck!
- - Or (in the spirit of Gregory Bateson): if it
makes no difference, then it makes no difference
- ...at least, pending further investigation
13How to measure validity
- Analyze the content of the test
- Relate test scores to specific criteria
- Examine the psychological constructs measured by
the test
14Construct validity
- Construct validity: the extent to which a test
measures the construct it claims to measure
- Does an intelligence test measure intelligence?
Does a neuroticism test measure neuroticism? What
is latent hostility, since it is latent?
- As Meehl notes, construct validity is very
general and often very difficult to determine in
a definitive manner
- If it looks like a measure of the skill or
knowledge it is supposed to measure, we say it
has face validity
- How can we determine construct validity? (How
will you know if you are given a good exam in
this class?)
15Construct validity
- There are two kinds of construct validity:
convergent validity and discriminant validity
- Convergent validity (sometimes called empirical
validity) means that the measure under
consideration agrees with other measures that are
alleged (or theoretically supposed) to measure
the same things
- Discriminant (divergent) validity means that the
measure under consideration is distinct from
other measures that are alleged (or theoretically
supposed) to measure different things
16Content validity
- Content validity: the extent to which the test
elicits a range of responses over the range of
skills, understanding, or behavior the test
measures; that is, the extent to which it
reflects the specific intended domain of content
- In abstract and/or complex domains, it may be
quite difficult to ensure content validity
- Could a test have construct validity but not
content validity?
17Criterion-related validity
- Criterion-related validity depends upon relating
test scores to performance on some relevant
criterion or set of criteria
- e.g., validate tests against school marks,
supervisor ratings, or dollar value of productive
work
- There are two kinds of criterion-related
validity: concurrent and predictive
18Concurrent validity
- Concurrent validity: the validity criteria are
available at the time of testing
- e.g., give the test to subjects who have been
selected for their economic background or
diagnostic group
- The validity of the MMPI was determined in this
manner
19Predictive validity
- Predictive validity: the criteria are not
available at the time of testing
- It is concerned with how well test scores predict
future performance
- For example, IQ tests should correlate with
academic ratings, grades, problem-solving skills,
etc.
- A good r-value for most psychological questions
would be .60
20What affects validity?
- i.) Moderator variables: those characteristics
that define groups, such as sex, age, personality
type, etc.
- - A test that is well-validated on one group
may be less good with another
- - Validity is usually better with more
heterogeneous groups, because the range of
behaviors and test scores is larger
- And therefore
- ii.) Base rates: tests are less effective when
base rates are very high or very low (that is,
whenever they are skewed from 50/50)
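The base-rate point can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: it supposes a hypothetical test that correctly classifies 80% of true cases and 80% of non-cases (the `sensitivity` and `specificity` parameters are not from these slides), and it compares the test's overall accuracy with simply guessing the more common outcome every time.

```python
def accuracy_gain(base_rate, sensitivity=0.8, specificity=0.8):
    """Accuracy of a test minus the accuracy of always guessing the majority outcome.

    sensitivity and specificity are hypothetical values chosen for illustration.
    """
    # Proportion of all cases that the test classifies correctly.
    test_accuracy = base_rate * sensitivity + (1 - base_rate) * specificity
    # Accuracy with no test at all: always predict the more common outcome.
    guess_accuracy = max(base_rate, 1 - base_rate)
    return test_accuracy - guess_accuracy


for base_rate in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(f"base rate {base_rate:.2f}: gain over guessing {accuracy_gain(base_rate):+.2f}")
```

Under these assumptions the gain is largest at a 50/50 base rate and becomes negative at extreme base rates, where always guessing the common outcome does better than the test.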
21What affects validity?
- iii.) Test length
- - For similar reasons (the size of the domain
sampled: think of the binomial rabbits, or of
trying to decide how biased a coin is), longer
tests tend to be more reliably related to the
criterion than shorter tests
22Test length
- Informally, we can see that changes of the same
size (such as being 1 flip away from fair) make
more difference to the size of the area under the
curve when N is low
- Next class we consider how to think about this
for other values in a more formal manner
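The informal point about N can be checked numerically. The sketch below assumes a fair coin (p = 0.5) and reads "area under the curve" as the upper-tail probability of a binomial distribution; the function names are illustrative and not part of the course material. It measures how much that area changes when the cutoff moves by a single flip, for short and long runs of flips.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the upper-tail area under the curve."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def one_flip_shift(n):
    """Change in upper-tail area when the cutoff moves from n/2 to n/2 + 1 heads (n even)."""
    half = n // 2
    return prob_at_least(half, n) - prob_at_least(half + 1, n)


for n in (10, 100, 1000):
    print(f"N = {n}: a one-flip shift changes the area by {one_flip_shift(n):.3f}")
```

The same one-flip change moves the tail area much more when N is small, which is the sense in which a short test (few flips) gives a less stable reading than a long one.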
23What affects validity?
- iii.) Test length (continued)
- - Note that the advantage of longer tests
depends on the questions being independent (that
is, on every question adding information)
- - When they are not, longer tests are not more
reliable
- - e.g., short forms of the WAIS
- - However, note that independence need only be
partial (r < 1, but not necessarily r = 0)
24What affects validity?
- iv.) The nature of the validity criterion
- - The criterion can be contaminated, especially
if the interpretation of test responses is not
well-specified, allowing results to feed back to
the criterion
- - In such cases, there is confusion between the
validation criteria and the test results: the
circularity of self-fulfilling prophecy (a
dormitive principle)
- - In essence we are then stuck at the
theoretical level of the nomological net, with no
way for empirical study (that is, no information)
to tell us we are wrong
25How to measure construct validity
- i.) Get expert judgments of the content
- ii.) Analyze the internal consistency of the test
(Tune in next class for how to do this, and why
it is not strictly validity, though it informs
validity)
- iii.) Study the relationships between test scores
and other non-test variables which are
known/presumed to relate to the same construct
- - e.g., Meehl mentions Binet's vindication by
teachers
- iv.) Question your subjects about their responses
in order to elicit underlying reasons for their
responses.
- v.) Demonstrate expected changes over time
26How to measure construct validity
- vi.) Study the relationships between test scores
and other test scores which are known/presumed to
relate to (or depart from) the construct
(convergent versus discriminant validity)
- - Multitrait-multimethod approach:
correlations of the same trait measured by the
same and different measures > correlations of a
different trait measured by the same and
different measures. We will look at this in more
detail in a minute.
- What if correlations of measures of different
traits using the same method > correlations of
measures of the same trait using different
methods?
27Incremental validity
- Incremental validity refers to the amount of gain
in predictive value obtained by using a
particular test (or test subset)
- If we give N tests and are 90% sure of the
diagnosis after that, and the (N+1)th test will
make us 91% sure, is it worth buying that gain
in validity?
- Cost/benefit analysis is required.
28Validity coefficient
- Validity coefficient: the correlation (r) between
test score and a criterion
- There is no general answer to the questions "How
high should a validity coefficient be?" or "What
shall we use for a criterion?"
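As a concrete sketch of a validity coefficient, the Python below computes Pearson's r between a set of test scores and a criterion. The data are invented purely for illustration (hypothetical test scores and hypothetical supervisor ratings), and `pearson_r` is a hand-written helper, not a function from any particular package.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: eight people's test scores and supervisor ratings.
test_scores = [52, 61, 58, 70, 66, 75, 80, 68]
supervisor_ratings = [3.1, 3.4, 2.9, 4.0, 3.6, 4.2, 4.5, 3.3]

validity = pearson_r(test_scores, supervisor_ratings)
print(f"validity coefficient r = {validity:.2f}")
```

Swapping in a different criterion (say, school marks instead of ratings) would generally give a different r for the same test, which is one reason there is no single "right" value.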
29Measuring validation error
- Coefficient of determination: r^2
- the percent of variation explained
- Coefficient of alienation: k = sqrt(1 - r^2)
- k is the inverse to correlation: a measure of
nonassociation between two variables
- If k = 1.0, you have 100% of the error you'd have
had if you just guessed (since this means your r
was 0)
- If k = 0, you have achieved perfection: your r
was 1, and there was no error at all (N.B. This
never happens.)
- If k = 0.6, you have 60% of the error you'd have
had if you guessed
30Example
- The correlation between SAT scores and college
performance is 0.40. How much of the variation in
college performance is explained by SAT scores?
- r^2 = 0.16, so 16% of the variance is explained
(and so 84% is not explained).
- What is the coefficient of alienation?
- sqrt(1 - 0.16) = sqrt(0.84) ≈ 0.92
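The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch using the definitions given in these slides (r squared for the coefficient of determination, the square root of 1 minus r squared for the coefficient of alienation); the function names are illustrative.

```python
from math import sqrt


def coefficient_of_determination(r):
    """Proportion of variance in the criterion explained by the test: r squared."""
    return r**2


def coefficient_of_alienation(r):
    """Proportion of 'guessing' error that remains: k = sqrt(1 - r squared)."""
    return sqrt(1 - r**2)


# The SAT example uses r = 0.40; the other values show how slowly k falls as r rises.
for r in (0.0, 0.40, 0.60, 0.90, 1.0):
    r2 = coefficient_of_determination(r)
    k = coefficient_of_alienation(r)
    print(f"r = {r:.2f}: r^2 = {r2:.2f}, k = {k:.2f}")
```

Note that even a fairly high correlation leaves k well above zero, so a respectable validity coefficient still leaves most of the guessing error in place.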
31Why should we care?
- k is useful in reporting the accuracy of a test
in a way which is unit-free, BUT notice that it
tells you nothing you didn't already know from
being told r
- It has some other uses in statistics beyond the
scope of this class
32Multitrait-multimethod matrix
- The multi-trait, multi-method matrix is a way of
representing the relations between several traits
(constructs) and several methods for measuring
those constructs, in a systematic and organized
fashion
- The organization allows one to display and
understand a great deal of information about both
reliability (which we will discuss in detail next
class) and validity in a compact form
33Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
34Multitrait-multimethod matrix
- Validity diagonals: tell you how well you can
measure the same construct using different
methods (monotrait-heteromethod diagonals)
- Each entry shows the correlation between two
different methods used to measure the same
construct
- We hope these will be highly correlated
(convergent validity)
35Multitrait-multimethod matrix
- Heterotrait-monomethod triangles: these show
different constructs measured by the same method
- Correlations of the same trait measured by the
same and different measures (validity diagonals)
should be greater than correlations of a
different trait measured by the same and
different measures (heterotrait-monomethod
triangles)
- If not, what is going on?
36Multitrait-multimethod matrix
- Heterotrait-heteromethod triangles: these show
different constructs measured by different
methods
- Because they share neither trait nor method,
these correlations should be expected to be low
37Multitrait-multimethod matrix
- Reliability diagonal: test-retest or internal
consistency reliabilities
- These tell you how reliably you can measure each
construct (A, B, C) with each method (that is,
mono-trait, mono-method correlations)
- Next class we discuss reliability in detail
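To make the parts of the matrix concrete, the sketch below sorts the off-diagonal correlations of a small multitrait-multimethod matrix into the blocks described above and then applies a simplified check. Everything in it is an assumption made for illustration: the correlations are invented, measures are named by trait letter plus method number (so "A1" is trait A measured by method 1), and the check (the weakest validity-diagonal value must exceed every heterotrait value) is a stricter, global simplification rather than a standard published procedure. The reliability diagonal is left out because it pairs each measure with itself.

```python
def classify(measure_1, measure_2):
    """Name the MTMM block for a pair of distinct measures such as 'A1' and 'B2'."""
    trait_1, method_1 = measure_1[0], measure_1[1]
    trait_2, method_2 = measure_2[0], measure_2[1]
    if trait_1 == trait_2:
        # Same trait, different method: an entry on a validity diagonal.
        return "monotrait-heteromethod"
    if method_1 == method_2:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"


def group_by_block(correlations):
    """Collect the correlations belonging to each block of the matrix."""
    blocks = {
        "monotrait-heteromethod": [],
        "heterotrait-monomethod": [],
        "heterotrait-heteromethod": [],
    }
    for (measure_1, measure_2), r in correlations.items():
        blocks[classify(measure_1, measure_2)].append(r)
    return blocks


def passes_mtmm_check(correlations):
    """Simplified check: every validity value exceeds every heterotrait value."""
    blocks = group_by_block(correlations)
    weakest_validity = min(blocks["monotrait-heteromethod"])
    strongest_other = max(
        blocks["heterotrait-monomethod"] + blocks["heterotrait-heteromethod"]
    )
    return weakest_validity > strongest_other


# Hypothetical correlations for traits A, B, C measured by methods 1 and 2.
correlations = {
    ("A1", "A2"): 0.60, ("B1", "B2"): 0.58, ("C1", "C2"): 0.55,
    ("A1", "B1"): 0.35, ("A1", "C1"): 0.30, ("B1", "C1"): 0.32,
    ("A2", "B2"): 0.38, ("A2", "C2"): 0.33, ("B2", "C2"): 0.36,
    ("A1", "B2"): 0.20, ("A1", "C2"): 0.15, ("B1", "A2"): 0.22,
    ("B1", "C2"): 0.18, ("C1", "A2"): 0.12, ("C1", "B2"): 0.16,
}

for block, values in group_by_block(correlations).items():
    print(f"{block}: {sorted(values)}")
print("validity diagonals dominate:", passes_mtmm_check(correlations))
```

If a heterotrait-monomethod correlation were raised above the validity values (for example, setting the A1/B1 entry to 0.70), the check would fail, which is the pattern one would expect when a shared method, rather than a shared trait, is driving the correlations.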