Test Validity: What it is, and why we care. - PowerPoint PPT Presentation

Provided by: ualb2

Transcript and Presenter's Notes

1
Test Validity: What it is, and why we care.
2
Validity

What is validity?
What is a construct? Meehl's nomological net
Types of validity
Content validity
Criterion-related validity
Construct Validity
Incremental Validity
The multi-trait multi-method matrix

3
What is validity?

The validity of a test is the extent to which it
measures the construct that it is designed to
measure (roughly, how accurate it is)
As we shall see, there are many ways for a test
to be inaccurate, and therefore validity is not
a single measure

4
Meehl: Construct Validity in Psychological Tests

Meehl's paper is, in large part, an attempt to
define the notion of a psychological construct
A construct is a notion that psychology helps
itself to freely, without always taking full
care to explain it
Meehl's approach is to try to define it in a way
that is simultaneously philosophically coherent
and empirically useful
He builds on positivist philosophy, which
attempts to combine empiricism with a formalized
rationalism

5
Paul Meehl: What is a construct?

Meehl's definition of a construct has 6 main
elements, as follows:
1.) To say what a construct is means to say what
laws it is subject to.
- This is a definition: you can refuse to work
with it or say why you think it is bad, but you
can't disprove it
- The sum of all laws is called a construct's
nomological network.

6
What does nomological mean?

I had always believed it came from
ad. L. nomin-, meaning "name"
I was wrong. In fact it comes from
ad. Gr. nomo-, combining form of a word meaning
"law"
So psychonomics is the study of the laws of the
psyche, and "nomological network" refers to a
network of psychological components whose
relations can be described by laws or rules

7
Adapted from http://trochim.human.cornell.edu/kb/nomonet.htm
The nomological network consists of
i.) Representations of the concepts of interest
(constructs)
ii.) Their observable manifestations
iii.) The relationships within and between i.)
and ii.)
[Diagram: five CONSTRUCT nodes and ten OBS
(observable) nodes]
8
Adapted from http://trochim.human.cornell.edu/kb/nomonet.htm
[Diagram: five CONSTRUCT nodes and ten OBS
(observable) nodes, with four labels:
Theoretical propositions, Operationalized
theoretical constructs, Correspondence rules,
Empirical observations]
9
Paul Meehl: What is a construct?

2.) Laws may relate observable and theoretical
elements
- The relations must be lawful, but they may
be either causal or statistical (what's the
relation between causal and statistical?)
- What are the theoretical elements?
Constructs!

10
Paul Meehl: What is a construct?

- To escape from circularity and pure speculation
about the properties of constructs, we need to
anchor the nomological net concretely in some
objective reality, hence:
3.) A construct is only admissible if at least
some of the laws to which it is subject involve
observables
If not, we could define a self-consistent network
of ideas that had no relevance to the real world
(and many such networks have been defined! Such
as?)
You should be able to relate this idea of
observables to our earlier discussion of
information: what counts as observable is what
counts as information (a detectable difference
that makes a difference)
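
One way to picture rule 3.) is to treat the nomological net as a graph and ask whether a construct can be traced, through some chain of laws, to at least one observable. The sketch below is only an illustration under that assumption: the function name is_admissible, the toy laws, and the choice to allow indirect links through other constructs are all hypothetical, not Meehl's own formalism.

```python
from collections import deque

def is_admissible(construct, laws, observables):
    """Return True if `construct` is linked, directly or through other
    constructs, to at least one observable by the laws in the network."""
    # Build an undirected adjacency map from the list of lawful relations.
    neighbours = {}
    for a, b in laws:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search outward from the construct.
    seen = {construct}
    queue = deque([construct])
    while queue:
        node = queue.popleft()
        if node in observables:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical toy network, for illustration only.
laws = [
    ("anxiety", "heart_rate"),    # construct linked to an observable
    ("anxiety", "neuroticism"),   # construct linked to another construct
    ("aura", "karma"),            # a closed loop of pure theory
]
observables = {"heart_rate"}

print(is_admissible("neuroticism", laws, observables))
print(is_admissible("aura", laws, observables))
```

In this toy net, "neuroticism" is anchored because it reaches an observable through "anxiety", while the "aura"/"karma" pair is self-consistent but touches nothing observable.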

11
Paul Meehl: What is a construct?

4.) Elaboration of a construct's nomological net
= learning more about that construct
- We elaborate a construct by drawing new
relations, either between elements already in the
network, or between those elements and new
elements outside of the network
- This elaboration is precisely the work of
psychometrics, as well as the work of science in
general

12
Paul Meehl: What is a construct?

5.) Ockham's razor + Einstein's addendum
- That is: make things as simple as possible,
but no simpler
6.) Identity means playing the same role in the
same network
- If it looks like a duck, walks like a duck,
and quacks like a duck, then it is a duck!
- Or (in the spirit of Gregory Bateson): if it
makes no difference, then it makes no difference
(at least pending further investigation)

13
How to measure validity

Analyze the content of the test
Relate test scores to specific criteria
Examine the psychological constructs measured by
the test

14
Construct validity

Construct validity: the extent to which a test
measures the construct it claims to measure
Does an intelligence test measure intelligence?
Does a neuroticism test measure neuroticism? What
is latent hostility, since it is latent?
As Meehl notes, construct validity is very
general and often very difficult to determine in
a definitive manner
If a test looks like a measure of the skill or
knowledge it is supposed to measure, we say it
has face validity
How can we determine construct validity? (How
will you know if you are given a good exam in
this class?)

15
Construct validity

There are two kinds of construct validity:
convergent validity and discriminant validity
Convergent validity (sometimes called empirical
validity) means that the measure under
consideration agrees with other measures that are
alleged (or theoretically supposed) to measure
the same things
Discriminant (or divergent) validity means that
the measure under consideration is distinct from
other measures that are alleged (or theoretically
supposed) to measure different things

16
Content validity

Content validity: the extent to which the test
elicits a range of responses over the range of
skills, understanding, or behavior the test
measures, i.e. the extent to which it reflects
the specific intended domain of content
In abstract and/or complex domains, it may be
quite difficult to ensure content validity
Could a test have construct validity but not
content validity?

17
Criterion-related validity

Criterion-related validity depends upon relating
test scores to performance on some relevant
criterion or set of criteria
e.g. validate tests against school marks,
supervisor ratings, or dollar value of productive
work
There are two kinds of criterion-related
validity: concurrent and predictive

18
Concurrent validity

Concurrent validity: the validity criteria are
available at the time of testing
e.g. give the test to subjects who have been
selected for their economic background or
diagnostic group
The validity of the MMPI was determined in this
manner

19
Predictive validity

Predictive validity: the criteria are not
available at the time of testing
It is concerned with how well test scores predict
future performance
For example, IQ tests should correlate with
academic ratings, grades, problem-solving skills
etc.
A good r-value for most psychological questions
would be 0.60

20
What affects validity?

i.) Moderator variables: those characteristics
that define groups, such as sex, age, personality
type etc.
- A test that is well-validated on one group
may be less valid with another
- Validity is usually better with more
heterogeneous groups, because the range of
behaviors and test scores is larger
And therefore:
ii.) Base rates: tests are less effective when
base rates are very high or very low (that is,
whenever they are skewed from 50/50)
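
The base-rate point can be sketched numerically. Assume, purely for illustration, a test that correctly flags 80% of true cases and correctly clears 80% of non-cases, and compare its overall accuracy with simply guessing the more common outcome for everyone. The function name and the 80% figures are hypothetical.

```python
def gain_over_guessing(base_rate, sensitivity, specificity):
    """Accuracy of the test minus the accuracy of always guessing the
    more common outcome, for a given base rate of the condition."""
    test_accuracy = base_rate * sensitivity + (1 - base_rate) * specificity
    guess_accuracy = max(base_rate, 1 - base_rate)
    return test_accuracy - guess_accuracy

# Same hypothetical test, different base rates.
for base_rate in (0.50, 0.80, 0.95):
    gain = gain_over_guessing(base_rate, sensitivity=0.8, specificity=0.8)
    print(f"base rate {base_rate:.2f}: gain over guessing = {gain:+.2f}")
```

Under these assumptions the test helps most at a 50/50 base rate and does worse than blind guessing once the base rate is extreme.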

21
What affects validity?

iii.) Test length
- For similar reasons concerning the size of the
domain sampled (think of the binomial rabbits, or
of trying to decide how biased a coin is), longer
tests tend to be more reliably related to the
criterion than shorter tests

22
Test length

Informally, we can see that changes of the same
size (such as being 1 flip away from fair) make
more difference to the area under the curve
when N is low
Next class we consider how to think about this
for other values in a more formal manner
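
The informal claim can be checked with a few lines of arithmetic. The sketch below computes, for a fair coin, how much of the binomial distribution lies within 1 flip of exactly half heads, for several values of N. The function names and the chosen values of N are illustrative assumptions, not taken from the slide's figure.

```python
from math import comb

def prob_heads(k, n, p=0.5):
    """Binomial probability of exactly k heads in n flips of a coin
    whose probability of heads is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def area_within_one_flip(n):
    """Share of the fair-coin distribution within 1 flip of n/2 heads
    (n is assumed to be even)."""
    half = n // 2
    return sum(prob_heads(k, n) for k in (half - 1, half, half + 1))

for n in (10, 100, 1000):
    print(n, round(area_within_one_flip(n), 3))
```

The band of "1 flip away from fair" covers a large share of the area when N is small and a much smaller share when N is large, so a 1-flip difference tells us less and less about fairness as N grows.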

23
What affects validity?

iii.) Test length
- For similar reasons concerning the size of the
domain sampled (think of the binomial rabbits, or
of trying to decide how biased a coin is), longer
tests tend to be more reliably related to the
criterion than shorter tests
- Note that this depends on the questions being
independent (= every question increasing
information)
- When they are not, longer tests are not more
reliable
- e.g. short forms of the WAIS
- However, note that independence need only be
partial (r < 1, but not necessarily r = 0)

24
What affects validity?

iv.) The nature of the validity criterion
- Criteria can be contaminated, especially if
the interpretation of test responses is not
well-specified, allowing results to feed
back into the criterion
- In such cases, there is confusion between the
validation criteria and the test results: the
circularity of self-fulfilling prophecy (a
"dormitive principle")
- In essence we are then stuck at the
theoretical level of the nomological net, with no
way for empirical study (= no information) to
tell us we are wrong

25
How to measure construct validity

i.) Get expert judgments of the content
ii.) Analyze the internal consistency of the test
(Tune in next class for how to do this, and why
it is not strictly validity, though it informs
validity)
iii.) Study the relationships between test scores
and other non-test variables which are
known/presumed to relate to the same construct
- e.g. Meehl mentions Binet's vindication by
teachers
iv.) Question your subjects about their responses
in order to elicit underlying reasons for their
responses.
v.) Demonstrate expected changes over time

26
How to measure construct validity

vi.) Study the relationships between test scores
and other test scores which are known/presumed to
relate to (or depart from) the construct
(convergent versus discriminant validity)
- Multitrait-multimethod approach:
correlations of the same trait measured by the
same and different measures > correlations of a
different trait measured by the same and
different measures. We will look at this in more
detail in a minute.
What if correlations of measures of different
traits using the same method > correlations of
measures of the same trait using different
methods?

27
Incremental validity

Incremental validity refers to the amount of gain
in predictive value obtained by using a
particular test (or test subset)
If we give N tests and are 90% sure of the
diagnosis after that, and the (N+1)th test will
make us 91% sure, is it worth buying that gain
in validity?
Cost/benefit analysis is required.
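
One hedged way to put a number on incremental validity is to ask how much extra criterion variance a second test explains beyond the first. The sketch below uses the standard two-predictor formula for the squared multiple correlation; framing the gain as explained variance (rather than diagnostic certainty, as in the 90% versus 91% example) is an assumption of this sketch, and the correlations are invented.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of a criterion with two tests,
    computed from the three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def incremental_validity(r_y1, r_y2, r_12):
    """Gain in explained criterion variance from adding test 2 to test 1."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1**2

# Hypothetical validities: test 1 correlates .50 with the criterion,
# test 2 correlates .40. Only the overlap between the tests changes.
print(round(incremental_validity(0.5, 0.4, r_12=0.0), 3))  # tests unrelated
print(round(incremental_validity(0.5, 0.4, r_12=0.8), 3))  # tests overlap heavily
```

When the second test overlaps heavily with the first, it can add almost nothing even though its own validity coefficient looks respectable, which is exactly when the cost/benefit question bites.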

28
Validity coefficient

Validity coefficient: the correlation (r) between
test score and a criterion
There is no general answer to the questions "How
high should a validity coefficient be?" or "What
shall we use for a criterion?"

29
Measuring validation error

Coefficient of determination: $r^2$,
the percent of variation explained
Coefficient of alienation: $k = \sqrt{1 - r^2}$
k is the inverse of correlation: a measure of
nonassociation between two variables
If k = 1.0, you have 100% of the error you'd have
had if you just guessed (since this means your r
was 0)
If k = 0, you have achieved perfection: your r
was 1, and there was no error at all
(N.B. This never happens.)
If k = 0.6, you have 60% of the error you'd have
had if you guessed

30
Example

The correlation between SAT scores and college
performance is 0.40. How much of the variation in
college performance is explained by SAT scores?
$r^2 = 0.16$, so 16% of the variance is explained
(and so 84% is not explained).
What is the coefficient of alienation?
$k = \sqrt{1 - 0.16} = \sqrt{0.84} \approx 0.92$
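
The same arithmetic can be reproduced in a few lines, which makes it easy to try other values of r. The function names are illustrative; the formulas are the ones defined on the previous slide.

```python
from math import sqrt

def coefficient_of_determination(r):
    """Proportion of criterion variance explained by the test."""
    return r**2

def coefficient_of_alienation(r):
    """Proportion of guessing-level error that remains: k = sqrt(1 - r^2)."""
    return sqrt(1 - r**2)

r = 0.40  # SAT vs. college performance, from the example
print(round(coefficient_of_determination(r), 2))  # 0.16
print(round(coefficient_of_alienation(r), 2))     # 0.92
```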

31
Why should we care?

k is useful in reporting the accuracy of a test in
a way which is unit-free, BUT notice that it tells
you nothing you didn't already know from being
told r
It has some other uses in statistics beyond the
scope of this class

32
Multitrait-multimethod matrix

The multi-trait, multi-method matrix is a way of
representing the relations between several traits
(constructs) and several methods for measuring
those constructs, in a systematic and organized
fashion
The organization allows one to display and
understand a great deal of information about both
reliability (which we will discuss in detail next
class) and validity in a compact form

33
Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
34
Multitrait-multimethod matrix

Validity diagonals: these tell you how well you
can measure the same construct using different
methods (monotrait-heteromethod diagonals)
Each entry shows the correlation between two
different methods used to measure the same
construct
We hope these will be highly correlated
(convergent validity)

35
Multitrait-multimethod matrix

Heterotrait-monomethod triangles: these show
different constructs measured by the same method
Correlations of the same trait measured by the
same and different measures (validity diagonals)
should be greater than correlations of a
different trait measured by the same and
different measures (heterotrait-monomethod
triangles)
If not, what is going on?

36
Multitrait-multimethod matrix

Heterotrait-heteromethod triangles: these show
different constructs measured by different
methods
Because they share neither trait nor method,
these correlations should be expected to be low

37
Multitrait-multimethod matrix

Reliability diagonal: test-retest or internal
consistency reliabilities
These tell you how reliably you can measure each
construct (A, B, C) with each method
(= monotrait-monomethod correlations)
Next class we discuss reliability in detail
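
To tie the four regions of the matrix together, here is a minimal sketch that labels each correlation by the region it falls in and then runs one simplified check: the weakest validity-diagonal entry should exceed every heterotrait correlation. The correlations, the two-trait by two-method layout, and the function names are all invented for illustration, and the check is a simplification rather than a full set of multitrait-multimethod criteria.

```python
def classify(a, b):
    """Name the MTMM region holding the correlation between measures
    a and b, where each measure is a (trait, method) pair."""
    same_trait = a[0] == b[0]
    same_method = a[1] == b[1]
    if same_trait and same_method:
        return "reliability diagonal"
    if same_trait:
        return "validity diagonal"
    if same_method:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

def summarize(corr):
    """Group correlations by region. `corr` maps (measure_a, measure_b) to r."""
    regions = {}
    for (a, b), r in corr.items():
        regions.setdefault(classify(a, b), []).append(r)
    return regions

def validity_exceeds_heterotrait(corr):
    """Simplified check: every same-trait, different-method correlation
    is larger than every different-trait correlation."""
    regions = summarize(corr)
    others = (regions.get("heterotrait-monomethod", [])
              + regions.get("heterotrait-heteromethod", []))
    return min(regions["validity diagonal"]) > max(others, default=float("-inf"))

# Hypothetical matrix: traits A and B, methods m1 and m2 (made-up values).
A1, B1, A2, B2 = ("A", "m1"), ("B", "m1"), ("A", "m2"), ("B", "m2")
corr = {
    (A1, A1): 0.89, (B1, B1): 0.85, (A2, A2): 0.90, (B2, B2): 0.88,
    (A1, A2): 0.60, (B1, B2): 0.58,   # same trait, different method
    (A1, B1): 0.40, (A2, B2): 0.35,   # different trait, same method
    (A1, B2): 0.15, (B1, A2): 0.12,   # different trait, different method
}

for region, values in summarize(corr).items():
    print(region, values)
print(validity_exceeds_heterotrait(corr))
```

If the check fails because same-method correlations between different traits are the larger ones, that points toward method variance rather than trait variance driving the scores.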
