1
Test Validity: What it is, and why we care.
2
Validity
  • What is validity?
  • What is a construct? Meehl's nomological net
  • Types of validity
  • Content validity
  • Criterion-related validity
  • Construct validity
  • Incremental validity
  • The multitrait-multimethod matrix

3
What is validity?
  • The validity of a test is the extent to which it
    measures the construct that it is designed to
    measure (roughly, how accurate it is)
  • As we shall see, there are many ways for a test
    to be inaccurate, and therefore validity is not
    a single measure

4
Meehl: Construct Validity in Psychological Tests
  • Meehl's paper is, in large part, an attempt to
    define the notion of a psychological construct
  • A construct is a notion that psychology helps
    itself to freely, without always giving full
    concern to explaining it
  • Meehl's approach is to try to define it in a way
    that is simultaneously philosophically coherent
    and empirically useful
  • He builds on positivist philosophy, which
    attempts to combine empiricism with a formalized
    rationalism

5
Paul Meehl: What is a construct?
  • Meehl's definition of a construct has 6 main
    elements, as follows:
  • 1.) To say what a construct is means to say what
    laws it is subject to.
  • - This is a definition: you can refuse to work
    with it or say why you think it is bad, but you
    can't disprove it
  • - The sum of all laws is called a construct's
    nomological network.

6
What does nomological mean?
  • I had always believed it came from
  • the Latin nomin-, meaning "name"
  • I was wrong. In fact it comes from
  • the Greek nomo-, the combining form of a word
    meaning "law"
  • So psychonomics is the study of the laws of the
    psyche, and a nomological network refers to a
    network of psychological components whose
    relations can be described by laws or rules

7
Adapted from http://trochim.human.cornell.edu/kb/nomonet.htm
The nomological network consists of:
  i.) Representations of the concepts of interest (constructs)
  ii.) Their observable manifestations
  iii.) The relationships within and between i.) and ii.)
[Diagram: constructs linked to one another and to their observable manifestations (OBS)]
8
Adapted from http://trochim.human.cornell.edu/kb/nomonet.htm
[Diagram: theoretical propositions relate the operationalized theoretical constructs to one another; correspondence rules link the constructs to empirical observations]
9
Paul Meehl: What is a construct?
  • 2.) Laws may relate observable and theoretical
    elements
  • - The relations must be lawful, but they may
    be either causal or statistical (what's the
    relation between causal and statistical?)
  • - What are the theoretical elements?
    Constructs!

10
Paul Meehl: What is a construct?
  • - What are the theoretical elements?
    Constructs!
  • - To escape from circularity and pure speculation
    about the properties of constructs, we need to
    anchor the nomological net concretely in some
    objective reality, hence:
  • 3.) A construct is only admissible if at least
    some of the laws to which it is subject involve
    observables
  • If not, we could define a self-consistent network
    of ideas that had no relevance to the real world
    (and many such networks have been defined! Such
    as?)
  • You should be able to relate this idea of
    observables to our earlier discussion of
    information: what counts as observable is what
    counts as information (a detectable difference
    that makes a difference)

11
Paul Meehl: What is a construct?
  • 4.) Elaboration of a construct's nomological net
    = learning more about that construct
  • - We elaborate a construct by drawing new
    relations, either between elements already in the
    network, or between those elements and new
    elements outside of the network
  • - This elaboration is precisely the work of
    psychometrics, as well as the work of science in
    general

12
Paul Meehl: What is a construct?
  • 5.) Ockham's razor + Einstein's addendum
  • - That is: make things as simple as possible,
    but no simpler
  • 6.) Identity means playing the same role in the
    same network
  • - If it looks like a duck, walks like a duck,
    and quacks like a duck, then it is a duck!
  • - Or (in the spirit of Gregory Bateson): If it
    makes no difference, then it makes no difference.
  • ...at least pending further investigation
13
How to measure validity
  • Analyze the content of the test
  • Relate test scores to specific criteria
  • Examine the psychological constructs measured by
    the test

14
Construct validity
  • Construct validity: the extent to which a test
    measures the construct it claims to measure
  • Does an intelligence test measure intelligence?
    Does a neuroticism test measure neuroticism? What
    is "latent hostility", since it is latent?
  • As Meehl notes, construct validity is very
    general and often very difficult to determine in
    a definitive manner
  • If a test looks like a measure of the skill or
    knowledge it is supposed to measure, we say it
    has face validity
  • How can we determine construct validity? (How
    will you know if you are given a good exam in
    this class?)

15
Construct validity
  • There are two kinds of construct validity:
    convergent validity and discriminant validity
  • Convergent validity (sometimes called empirical
    validity) means that the measure under
    consideration agrees with other measures that are
    alleged (or theoretically supposed) to measure
    the same things
  • Discriminant (divergent) validity means that the
    measure under consideration is distinct from
    other measures that are alleged (or theoretically
    supposed) to measure different things

16
Content validity
  • Content validity: the extent to which the test
    elicits a range of responses over the range of
    skills, understanding, or behavior the test
    measures; the extent to which it reflects the
    specific intended domain of content
  • In abstract and/or complex domains, it may be
    quite difficult to ensure content validity
  • Could a test have construct validity but not
    content validity?

17
Criterion-related validity
  • Criterion-related validity depends upon relating
    test scores to performance on some relevant
    criterion or set of criteria
  • e.g. validate tests against school marks,
    supervisor ratings, or dollar value of productive
    work
  • There are two kinds of criterion-related
    validity: concurrent and predictive

18
Concurrent validity
  • Concurrent validity: the criterion measures are
    available at the time of testing
  • e.g. give the test to subjects who have been
    selected for their economic background or
    diagnostic group
  • The validity of the MMPI was determined in this
    manner

19
Predictive validity
  • Predictive validity: the criterion measures are
    not available at the time of testing
  • Concerned with how well test scores predict
    future performance
  • For example, IQ tests should correlate with
    academic ratings, grades, problem-solving skills,
    etc.
  • A good r-value for most psychological questions
    would be .60

20
What affects validity?
  • i.) Moderator variables: those characteristics
    that define groups, such as sex, age, personality
    type, etc.
  • - A test that is well-validated on one group
    may be less good with another
  • - Validity is usually better with more
    heterogeneous groups, because the range of
    behaviors and test scores is larger
  • And therefore...
  • ii.) Base rates: tests are less effective when
    base rates are very high or very low (that is,
    whenever they are skewed from 50/50), as the
    sketch below illustrates
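To make the base-rate point concrete, here is a minimal sketch of my own (not from the slides; the 0.80 sensitivity and specificity figures are arbitrary assumptions) showing how the same test yields a much lower positive predictive value when the base rate is far from 50/50:

# Minimal sketch: a skewed base rate lowers a test's positive predictive
# value even when sensitivity and specificity stay fixed (values are assumed).

def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(condition present | positive test result), via Bayes' rule."""
    true_positives = sensitivity * base_rate
    false_positives = (1 - specificity) * (1 - base_rate)
    return true_positives / (true_positives + false_positives)

for base_rate in (0.50, 0.05):
    ppv = positive_predictive_value(0.80, 0.80, base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.2f}")
# base rate 0.50: PPV = 0.80
# base rate 0.05: PPV = 0.17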

21
What affects validity?
  • iii.) Test length
  • - For reasons related to the size of the domain
    sampled (think of the binomial rabbits, or trying
    to decide how biased a coin is), longer tests
    tend to be more reliably related to the criterion
    than shorter tests

22
Test length
  • Informally, we can see that the same-size changes
    (such as being 1 flip away from fair) make more
    difference to the area under the curve when N is
    low, as the sketch below illustrates
  • Next class we consider how to think about this
    for other values in a more formal manner
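A minimal numerical sketch of this point (my own illustration, assuming scipy is available): for a fair coin, the probability mass within one flip of "exactly fair" is much larger when the number of flips is small.

# Minimal sketch: the share of fair-coin outcomes within 1 flip of "exactly
# fair" shrinks as N grows, so the same 1-flip deviation means more at low N.
from scipy.stats import binom

for n in (10, 100, 1000):
    k_fair = n // 2
    # P(k_fair - 1 <= X <= k_fair + 1) under a fair coin
    mass = binom.cdf(k_fair + 1, n, 0.5) - binom.cdf(k_fair - 2, n, 0.5)
    print(f"N = {n:4d}: P(within 1 flip of fair) = {mass:.3f}")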

23
What affects validity?
  • iii.) Test length
  • - For reasons related to the size of the domain
    sampled (think of the binomial rabbits, or trying
    to decide how biased a coin is), longer tests
    tend to be more reliably related to the criterion
    than shorter tests
  • - Note that this depends on the questions being
    independent (= every question increasing
    information)
  • - When they are not, longer tests are not more
    reliable
  • - e.g. short forms of the WAIS
  • - However, note that independence need only be
    partial (r < 1, but not necessarily r = 0); the
    simulation sketch below illustrates the point
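The following simulation is a sketch of my own (the latent-trait setup, variable names, and parameter values are illustrative assumptions, not from the slides). It contrasts a test whose items carry independent error with a test whose extra items are fully redundant, and shows that only the former gains criterion validity as it gets longer.

# Minimal simulation sketch: lengthening a test improves its correlation with
# a criterion only when the added items contribute (at least partly)
# independent information. All values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_people = 5000
theta = rng.normal(size=n_people)                          # latent trait
criterion = 0.6 * theta + rng.normal(scale=0.8, size=n_people)

def test_score(n_items, redundant):
    """Sum score of n_items; redundant items all share one error term."""
    if redundant:
        shared_error = rng.normal(size=n_people)
        items = [theta + shared_error for _ in range(n_items)]
    else:
        items = [theta + rng.normal(size=n_people) for _ in range(n_items)]
    return np.sum(items, axis=0)

for n in (2, 10, 40):
    r_ind = np.corrcoef(test_score(n, redundant=False), criterion)[0, 1]
    r_red = np.corrcoef(test_score(n, redundant=True), criterion)[0, 1]
    print(f"{n:2d} items: r with criterion = {r_ind:.2f} (independent errors), "
          f"{r_red:.2f} (redundant items)")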

24
What affects validity?
  • iv.) The nature of the validity criterion
  • - Criteria can be contaminated, especially if
    the interpretation of test responses is not
    well-specified, allowing results to feed
    back to the criterion
  • - In such cases, there is confusion between the
    validation criteria and the test results: the
    circularity of a self-fulfilling prophecy (a
    dormitive principle)
  • - In essence we are then stuck at the
    theoretical level of the nomological net, with no
    way for empirical study (= no information) to
    tell us we are wrong

25
How to measure construct validity
  • i.) Get expert judgments of the content
  • ii.) Analyze the internal consistency of the test
    (Tune in next class for how to do this, and why
    it is not strictly validity, though it informs
    validity)
  • iii.) Study the relationships between test scores
    and other non-test variables which are
    known/presumed to relate to the same construct
  • - e.g. Meehl mentions Binet's vindication by
    teachers
  • iv.) Question your subjects about their responses
    in order to elicit the underlying reasons for
    their responses.
  • v.) Demonstrate expected changes over time

26
How to measure construct validity
  • vi.) Study the relationships between test scores
    and other test scores which are known/presumed to
    relate to (or depart from) the construct
    (convergent versus discriminant validity)
  • - Multitrait-multimethod approach:
    correlations of the same trait measured by the
    same and different measures > correlations of a
    different trait measured by the same and
    different measures. We will look at this in more
    detail in a minute.
  • What if correlations of measures of different
    traits using the same method > correlations of
    measures of the same trait using different
    methods?

27
Incremental validity
  • Incremental validity refers to the amount of gain
    in predictive value obtained by using a
    particular test (or test subset)
  • If we give N tests and are 90% sure of the
    diagnosis after that, and the (N+1)th test will
    make us 91% sure, is it worth buying that gain
    in validity?
  • Cost/benefit analysis is required (a sketch of
    incremental validity as a gain in explained
    variance follows below).
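This sketch is my own construction (the simulated battery, regression weights, and variable names are illustrative assumptions): it reads incremental validity off as the gain in R² when the (N+1)th test is added to the prediction of the criterion.

# Minimal sketch: incremental validity as the gain in R^2 from adding one
# more test to an existing battery. Data and coefficients are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
battery = rng.normal(size=(n, 3))          # scores on the N existing tests
new_test = rng.normal(size=n)              # the (N+1)th test
criterion = (battery @ np.array([0.5, 0.3, 0.2])
             + 0.1 * new_test + rng.normal(size=n))

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - (y - X @ beta).var() / y.var()

r2_without = r_squared(battery, criterion)
r2_with = r_squared(np.column_stack([battery, new_test]), criterion)
print(f"R^2 without the new test: {r2_without:.3f}")
print(f"R^2 with the new test:    {r2_with:.3f}")
print(f"incremental validity (gain in R^2): {r2_with - r2_without:.3f}")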

28
Validity coefficient
  • Validity coefficient: the correlation (r) between
    test score and a criterion
  • There is no general answer to the questions "How
    high should a validity coefficient be?" or "What
    shall we use for a criterion?"

29
Measuring validation error
  • Coefficient of determination = r², the percent of
    variation explained
  • Coefficient of alienation: k = (1 - r²)^0.5
  • k is the inverse of correlation: a measure of
    nonassociation between two variables
  • If k = 1.0, you have 100% of the error you'd have
    had if you just guessed (since this means your r
    was 0)
  • If k = 0, you have achieved perfection: your r
    was 1, and there was no error at all
  • If k = 0.6, you have 60% of the error you'd have
    had if you guessed
N.B. This never happens.
30
Example
  • The correlation between SAT scores and college
    performance is 0.40. How much of the variation in
    college performance is explained by SAT scores?
  • r² = 0.16, so 16% of the variance is explained
    (and so 84% is not explained).
  • What is the coefficient of alienation?
  • sqrt(1 - 0.16) = sqrt(0.84) ≈ 0.92
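A minimal sketch (the helper names are my own) that implements the two coefficients from the previous slide and reproduces the arithmetic of this example:

# Minimal sketch: coefficient of determination (r^2) and coefficient of
# alienation k = sqrt(1 - r^2), applied to the SAT example (r = 0.40).
import math

def coefficient_of_determination(r):
    """Proportion of variance in the criterion explained by the test."""
    return r ** 2

def coefficient_of_alienation(r):
    """k = sqrt(1 - r^2): the share of 'guessing error' that remains."""
    return math.sqrt(1 - r ** 2)

r = 0.40                                                 # SAT vs college performance
print(f"r^2 = {coefficient_of_determination(r):.2f}")    # 0.16 -> 16% explained
print(f"k   = {coefficient_of_alienation(r):.2f}")       # ~0.92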

31
Why should we care?
  • k is useful in reporting the accuracy of a test
    in a way which is unit-free, BUT notice that it
    tells you nothing you didn't already know from
    being told r
  • It has some other uses in statistics beyond the
    scope of this class

32
Multitrait-multimethod matrix
  • The multi-trait, multi-method matrix is a way of
    representing the relations between several traits
    (constructs) and several methods for measuring
    those constructs, in a systematic and organized
    fashion
  • The organization allows one to display and
    understand a great deal of information about both
    reliability (which we will discuss in detail next
    class) and validity in a compact form; a small
    simulated example follows below
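Before turning to the actual matrix from Trochim's page, here is a small simulation sketch of my own (three traits, two methods; all parameters and names are illustrative assumptions) showing how an MTMM correlation matrix is assembled and why the monotrait-heteromethod entries should come out highest:

# Minimal simulation sketch: build an MTMM correlation matrix for 3 traits
# (A, B, C) measured by 2 methods. Shared "method variance" inflates the
# heterotrait-monomethod entries; the monotrait-heteromethod (validity
# diagonal) entries stay the largest. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 1000
traits = rng.normal(size=(n, 3))                   # latent traits A, B, C

measures = {}
for method, error_sd in (("m1", 0.5), ("m2", 0.8)):
    method_bias = rng.normal(scale=0.3, size=n)    # variance shared by a method
    for t, trait_name in enumerate("ABC"):
        measures[f"{trait_name}-{method}"] = (
            traits[:, t] + method_bias + rng.normal(scale=error_sd, size=n)
        )

labels = list(measures)
mtmm = np.corrcoef(np.column_stack([measures[l] for l in labels]), rowvar=False)

print(" " * 7 + " ".join(f"{l:>6}" for l in labels))
for label, row in zip(labels, mtmm):
    print(f"{label:>6} " + " ".join(f"{v:6.2f}" for v in row))
# e.g. corr(A-m1, A-m2) (validity diagonal) > corr(A-m1, B-m1) (same method,
# different traits) > corr(A-m1, B-m2) (different trait and method).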

33
Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
34
Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
  • Validity diagonals: tell you how well you can
    measure the same construct using different
    methods (monotrait-heteromethod diagonals)
  • Each entry shows the correlation between two
    different methods used to measure the same
    construct
  • We hope these will be highly correlated:
    convergent validity

35
Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
  • Heterotrait-monomethod triangles: these show
    different constructs measured by the same method
  • Correlations of the same trait measured by the
    same and different measures (validity diagonals)
    should be greater than correlations of a
    different trait measured by the same and
    different measures (heterotrait-monomethod
    triangles)
  • If not, what is going on?

36
Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
  • Heterotrait-heteromethod triangles: these show
    different constructs measured by different
    methods
  • Because they share neither trait nor method,
    these correlations should be expected to be low

37
Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.htm
  • Reliability diagonal: test-retest or internal
    consistency reliabilities
  • These tell you how reliably you can measure each
    construct (A, B, C) with each method
    (= mono-trait, mono-method correlations)
  • Next class we discuss reliability in detail