Test Validity: What it is, and why we care. - PowerPoint PPT Presentation


Slides: 38
Provided by: ualb2

Test Validity: What it is, and why we care.
  • What is validity?
  • What is a construct? Meehl's nomological net
  • Types of validity
  • Content validity
  • Criterion-related validity
  • Construct Validity
  • Incremental Validity
  • The multi-trait multi-method matrix

What is validity?
  • The validity of a test is the extent to which it
    measures the construct that it is designed to
    measure (roughly, how accurate it is)
  • As we shall see, there are many ways for a test
    to be inaccurate, and therefore validity is not
    a single measure

Meehl: Construct Validity in Psychological Tests
  • Meehl's paper is, in large part, an attempt to
    define the notion of a psychological construct
  • A construct is a notion that psychology helps
    itself to freely, without always fully
    explaining it
  • Meehl's approach is to try to define it in a way
    that is simultaneously philosophically coherent
    and empirically useful
  • He builds on positivist philosophy, which
    attempts to combine empiricism with a formalized
    logic

Paul Meehl: What is a construct?
  • Meehl's definition of a construct has six main
    elements, as follows:
  • 1.) To say what a construct is means to say what
    laws it is subject to.
  • - This is a definition: you can refuse to work
    with it, or say why you think it is bad, but you
    can't disprove it
  • - The sum of all these laws is called a construct's
    nomological network.

What does nomological mean?
  • I had always believed it came from
  • ad. L. nomin-, meaning "name"
  • I was wrong. In fact it comes from
  • ad. Gr. nomo-, combining form of nomos, a word meaning "law"
  • So psychonomics is the study of the laws of the
    psyche, and nomological network refers to a
    network of psychological components whose
    relations can be described by laws or rules

Adapted from http://trochim.human.cornell.edu/kb/
  • The nomological network consists of
    i.) Representations of the concepts of interest
    ii.) Their observable manifestations
    iii.) The relationships within and between i.) and ii.)
[Figure, adapted from the same source: diagram labeled "Theoretical propositions", "Operationalized theoretical constructs", "Correspondence rules", and "Empirical observations"]
Paul Meehl: What is a construct?
  • 2.) Laws may relate observable and theoretical
    elements
  • - The relations must be lawful, but they may
    be either causal or statistical (what's the
    relation between causal and statistical?)
  • - What are the theoretical elements?

Paul Meehl: What is a construct?
  • - What are the theoretical elements?
  • - To escape from circularity and pure speculation
    about the properties of constructs, we need to
    anchor the nomological net concretely in some
    objective reality, hence
  • 3.) A construct is only admissible if at least
    some of the laws to which it is subject involve
    observables
  • If not, we could define a self-consistent network
    of ideas that had no relevance to the real world
    (and many such networks have been defined! Such …)
  • You should be able to relate this idea of
    observables to our earlier discussion of
    information: what counts as observable is what
    counts as information (a detectable difference
    that makes a difference)
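Meehl's third condition can be pictured as a property of a graph. The sketch below is a toy illustration, not Meehl's own formalism. It assumes a network stored as a list of laws, where each law links two or more named elements and each element is tagged as theoretical or observable. A construct is treated as admissible if the laws reachable from it include at least one that touches an observable. Every element name (`anxiety`, `phlogiston`, and so on) and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical toy network: each element is tagged theoretical or observable.
ELEMENT_KIND = {
    "anxiety": "theoretical",
    "arousal": "theoretical",
    "heart_rate": "observable",
    "self_report_score": "observable",
    "phlogiston": "theoretical",
    "vital_spark": "theoretical",
}

# Each law links two or more elements (causal or statistical, per rule 2).
LAWS = [
    {"anxiety", "arousal"},
    {"arousal", "heart_rate"},
    {"anxiety", "self_report_score"},
    {"phlogiston", "vital_spark"},  # self-consistent, but never touches an observable
]

def component_laws(construct, laws):
    """Return every law reachable from `construct` by chaining shared elements."""
    seen_elements = {construct}
    reached = []
    queue = deque([construct])
    while queue:
        element = queue.popleft()
        for law in laws:
            if element in law and law not in reached:
                reached.append(law)
                for other in law - seen_elements:
                    seen_elements.add(other)
                    queue.append(other)
    return reached

def is_admissible(construct, laws=LAWS, kinds=ELEMENT_KIND):
    """Toy rule 3: some law in the construct's net must involve an observable."""
    return any(
        kinds[e] == "observable" for law in component_laws(construct, laws) for e in law
    )

print(is_admissible("anxiety"))     # True
print(is_admissible("phlogiston"))  # False
```

The `phlogiston` cluster is the "self-consistent network of ideas" from the slide. It hangs together internally, but no chain of laws ever reaches anything observable.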

Paul Meehl: What is a construct?
  • 4.) Elaboration of a construct's nomological net
    means learning more about that construct
  • - We elaborate a construct by drawing new
    relations, either between elements already in the
    network, or between those elements and new
    elements outside of the network
  • - This elaboration is precisely the work of
    psychometrics, as well as the work of science in
    general

Paul Meehl: What is a construct?
  • 5.) Ockham's razor and Einstein's addendum
  • - That is: make things as simple as possible,
    but no simpler
  • 6.) Identity means playing the same role in the
    same network
  • - If it looks like a duck, walks like a duck,
    and quacks like a duck, then it is a duck!
  • - Or (in the spirit of Gregory Bateson): if it
    makes no difference, then it makes no difference
  • …at least pending further investigation

How to measure validity
  • Analyze the content of the test
  • Relate test scores to specific criteria
  • Examine the psychological constructs measured by
    the test

Construct validity
  • Construct validity: the extent to which a test
    measures the construct it claims to measure
  • Does an intelligence test measure intelligence?
    Does a neuroticism test measure neuroticism? What
    is "latent hostility", given that it is latent?
  • As Meehl notes, construct validity is very
    general and often very difficult to determine in
    a definitive manner
  • If it looks like a measure of the skill or
    knowledge it is supposed to measure, we say it
    has face validity
  • How can we determine construct validity? (How
    will you know if you get given a good exam in
    this class?)

Construct validity
  • There are two kinds of construct validity:
    convergent validity and discriminant validity
  • Convergent validity (sometimes called empirical
    validity) means that the measure under
    consideration agrees with other measures that are
    alleged (or theoretically supposed) to measure
    the same thing
  • Discriminant (or divergent) validity means that
    the measure under consideration is distinct from
    other measures that are alleged (or theoretically
    supposed) to measure different things

Content validity
  • Content validity: the extent to which the test
    elicits a range of responses over the range of
    skills, understanding, or behavior the test
    measures; that is, the extent to which it reflects
    the specific intended domain of content
  • In abstract and/or complex domains, it may be
    quite difficult to ensure content validity
  • Could a test have construct validity but not
    content validity?

Criterion-related validity
  • Criterion-related validity depends upon relating
    test scores to performance on some relevant
    criterion or set of criteria
  • e.g. validate tests against school marks,
    supervisor ratings, or dollar value of productive
    work
  • There are two kinds of criterion-related
    validity: concurrent and predictive

Concurrent validity
  • Concurrent validity: the validity criteria are
    available at the time of testing
  • e.g. give the test to subjects who have been
    selected for their economic background or
    diagnostic group
  • The validity of the MMPI was determined in this
    way

Predictive validity
  • Predictive validity: the criteria are not
    available at the time of testing
  • Concerned with how well test scores predict
    future performance
  • For example, IQ tests should correlate with
    academic ratings, grades, problem-solving skills
  • A good r-value for most psychological questions
    would be .60

What affects validity?
  • i.) Moderator variables: those characteristics
    that define groups, such as sex, age, personality
    type, etc.
  • - A test that is well-validated on one group
    may be less good with another
  • - Validity is usually better with more
    heterogeneous groups, because the range of
    behaviors and test scores is larger
  • And therefore
  • ii.) Base rates: tests are less effective when
    base rates are very high or very low (that is,
    whenever they are skewed from 50/50)

What affects validity?
  • iii.) Test length
  • - For similar reasons of the size of the domain
    sampled (think of the binomial rabbits or trying
    to decide how biased a coin is), longer tests
    tend to be more reliably related to the criterion
    than shorter tests

Test length
  • Informally, we can see that a change of the same
    size (such as being 1 flip away from fair) makes
    more difference to the area under the curve
    when N is low
  • Next class we consider how to think about this
    for other values in a more formal manner
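The coin-flip version of this argument can be computed exactly with the binomial distribution. The sketch asks how often a biased coin shows a majority of heads, that is, how often N "items" reveal a bias of fixed size. The 60%-heads coin, the flip counts, and the function name are illustrative assumptions.

```python
from math import comb

def prob_majority_heads(n_flips, p_heads):
    """Exact binomial probability that more than half of n_flips land heads."""
    if n_flips % 2 == 0:
        raise ValueError("use an odd number of flips so a majority always exists")
    need = n_flips // 2 + 1
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(need, n_flips + 1)
    )

for n in (5, 15, 51, 101):
    print(f"N = {n:3d}: P(majority heads | p = 0.6) = {prob_majority_heads(n, 0.6):.3f}")
```

With 5 flips the biased coin shows a majority of heads only about 68% of the time (0.68256 exactly). As N grows, the same bias is revealed far more reliably, which is the sense in which a longer test samples the domain better.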

What affects validity?
  • iii.) Test length
  • - Note that this depends on the questions being
    independent (every question increasing …)
  • - When they are not, longer tests are not more …
  • - e.g. short forms of the WAIS
  • - However, note that independence need only be
    partial ($r < 1$, but not necessarily $r = 0$)
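The remark about partial independence can be made concrete with a standard result for composite scores. This goes beyond what the slide states, so treat it as a supplement. Suppose a test is the sum of $k$ standardized items, each correlating $r_{iy}$ with the criterion and $\bar r$ with every other item. Its validity is then $r_k = \sqrt{k}\, r_{iy} / \sqrt{1 + (k - 1)\bar r}$. The item values (0.3 validity per item, $\bar r$ of 1.0 or 0.3) and the function name are illustrative assumptions.

```python
from math import sqrt

def composite_validity(k, item_validity, mean_item_intercorrelation):
    """Validity of a sum of k standardized, equally intercorrelated items.

    item_validity: correlation of each item with the criterion (r_iy).
    mean_item_intercorrelation: correlation between any two items (r_bar).
    """
    return sqrt(k) * item_validity / sqrt(1 + (k - 1) * mean_item_intercorrelation)

for r_bar in (1.0, 0.3):
    values = [composite_validity(k, 0.3, r_bar) for k in (1, 10, 40)]
    print(f"r_bar = {r_bar}: " + ", ".join(f"{v:.2f}" for v in values))
```

With perfectly redundant items ($\bar r = 1$), validity stays at 0.30 however many items are added. With partially independent items ($\bar r = 0.3$), it climbs from 0.30 to about 0.49 at 10 items and 0.53 at 40, approaching a ceiling of $r_{iy}/\sqrt{\bar r} \approx 0.55$.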

What affects validity?
  • iv.) The nature of the validity criterion
  • - The criterion can be contaminated, especially if
    the interpretation of test responses is not
    well-specified, allowing results to feed
    back into the criterion
  • - In such cases, there is confusion between the
    validation criteria and the test results: the
    circularity of a self-fulfilling prophecy (a
    "dormitive principle")
  • - In essence we are then stuck at the
    theoretical level of the nomological net, with no
    way for empirical study (that is, no information)
    to tell us we are wrong

How to measure construct validity
  • i.) Get expert judgments of the content
  • ii.) Analyze the internal consistency of the test
    (Tune in next class for how to do this, and why
    it is not strictly validity, though it informs …)
  • iii.) Study the relationships between test scores
    and other non-test variables which are
    known/presumed to relate to the same construct
  • - e.g. Meehl mentions Binet's vindication by …
  • iv.) Question your subjects about their responses
    in order to elicit underlying reasons for their …
  • v.) Demonstrate expected changes over time

How to measure construct validity
  • vi.) Study the relationships between test scores
    and other test scores which are known/presumed to
    relate to (or depart from) the construct
    (Convergent versus discriminant validity)
  • - Multitrait-multimethod approach: correlations
    of the same trait measured by the same and
    different measures > correlations of a
    different trait measured by the same and
    different measures. We will look at this in more
    detail in a minute.
  • What if correlations of measures of different
    traits using the same method > correlations of
    measures of the same trait using different
    methods?

Incremental validity
  • Incremental validity refers to the amount of gain
    in predictive value obtained by using a
    particular test (or test subset)
  • If we give N tests and are 90% sure of the
    diagnosis after that, and the $(N+1)$th test will
    make us 91% sure, is it worth buying that gain
    in validity?
  • Cost/benefit analysis is required.
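One common way to put a number on incremental validity is the gain in explained criterion variance, $\Delta R^2$, when a second test is added to a first. This choice is an assumption; the slide does not specify a method. For two predictors, $R^2 = (r_{1y}^2 + r_{2y}^2 - 2 r_{1y} r_{2y} r_{12}) / (1 - r_{12}^2)$, so it can be computed from the three pairwise correlations. The correlations below are hypothetical.

```python
def r_squared_two_predictors(r1y, r2y, r12):
    """Multiple R^2 for predicting a criterion from two tests, from their correlations."""
    return (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)

def incremental_validity(r1y, r2y, r12):
    """Gain in explained criterion variance from adding test 2 to test 1."""
    return r_squared_two_predictors(r1y, r2y, r12) - r1y**2

# Hypothetical correlations: test 2 predicts the criterion at r = .40 on its own.
print(f"{incremental_validity(0.5, 0.4, 0.6):.3f}")  # test 2 overlaps heavily with test 1
print(f"{incremental_validity(0.5, 0.4, 0.0):.3f}")  # test 2 uncorrelated with test 1
```

Here test 2 adds about 0.016 to $R^2$ when it correlates .60 with test 1, but 0.16 when the two tests are uncorrelated. Whether either gain is worth the cost of giving the extra test is the cost/benefit question the slide raises.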

Validity coefficient
  • Validity coefficient: the correlation ($r$) between
    test score and a criterion
  • There is no general answer to the questions "How
    high should a validity coefficient be?" or "What
    shall we use for a criterion?"

Measuring validation error
  • Coefficient of determination: $r^2$
  • The proportion of variation explained (multiply
    by 100 for a percentage)
  • Coefficient of alienation: $k = \sqrt{1 - r^2}$
  • $k$ is the counterpart of correlation: a measure of
    nonassociation between two variables
  • If $k = 1.0$, you have 100% of the error you'd have
    had if you just guessed (since this means your $r$
    was 0)
  • If $k = 0$, you have achieved perfection: your $r$
    was 1 (or $-1$), and there was no error at all
  • If $k = 0.6$, you have 60% of the error you'd have
    had if you guessed

N.B. This never happens.
  • The correlation between SAT scores and college
    performance is 0.40. How much of the variation in
    college performance is explained by SAT Scores?
  • $r^2 = 0.16$, so 16% of the variance is explained
    (and so 84% is not explained).
  • What is the coefficient of alienation?
  • $k = \sqrt{1 - 0.16} = \sqrt{0.84} \approx 0.92$

Why should we care?
  • $k$ is useful for reporting the accuracy of a test
    in a way that is unit-free, BUT notice that it
    tells you nothing you didn't already know from
    being told $r$
  • It has some other uses in statistics beyond the
    scope of this class

Multitrait-multimethod matrix
  • The multi-trait, multi-method matrix is a way of
    representing the relations between several traits
    (constructs) and several methods for measuring
    those constructs, in a systematic and organized
    way
  • The organization allows one to display and
    understand a great deal of information about both
    reliability (which we will discuss in detail next
    class) and validity in a compact form

Multitrait-multimethod matrix
Image from http://trochim.cornell.edu/kb/mtmmmat.
Multitrait-multimethod matrix
  • Validity diagonals: tell you how well you can
    measure the same construct using different
    methods (monotrait-heteromethod diagonals)
  • Each entry shows the correlation between two
    different methods used to measure the same
    construct
  • We hope these will be highly correlated
    (convergent validity)

Multitrait-multimethod matrix
  • Heterotrait-monomethod triangles: these show
    different constructs measured by the same method
  • Correlations of the same trait measured by the
    same and different measures (validity diagonals)
    should be greater than correlations of a
    different trait measured by the same and
    different measures (heterotrait-monomethod
    triangles)
  • If not, what is going on?

Multitrait-multimethod matrix
  • Heterotrait-heteromethod triangles: these show
    different constructs measured by different
    methods
  • Because they share neither trait nor method,
    these correlations should be expected to be low

Multitrait-multimethod matrix
  • Reliability diagonal: test-retest or internal
    consistency reliabilities
  • These tell you how reliably you can measure each
    construct (A, B, C) with each method
    (monotrait-monomethod correlations)
  • Next class we discuss reliability in detail
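The region-by-region reading of the matrix can be expressed as a short routine. In this sketch the traits (A, B, C), the two methods, and every correlation are hypothetical. The final check is a simplified version of the rule on the heterotrait-monomethod slide: each validity-diagonal value should exceed every heterotrait correlation, whether monomethod or heteromethod.

```python
from itertools import combinations

# Hypothetical MTMM data: traits A, B, C each measured by methods 1 and 2.
MEASURES = [("A", 1), ("B", 1), ("C", 1), ("A", 2), ("B", 2), ("C", 2)]
# Lower triangle of a made-up correlation matrix; diagonal entries are reliabilities.
LOWER = [
    [0.90],
    [0.45, 0.88],
    [0.35, 0.40, 0.80],
    [0.62, 0.20, 0.15, 0.92],
    [0.18, 0.58, 0.22, 0.40, 0.90],
    [0.12, 0.17, 0.55, 0.33, 0.38, 0.85],
]

def block_of(m1, m2):
    """Name the MTMM region that the correlation between two measures falls in."""
    same_trait, same_method = m1[0] == m2[0], m1[1] == m2[1]
    if same_trait and same_method:
        return "reliability diagonal"
    if same_trait:
        return "validity diagonal"
    if same_method:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

def regions(measures=MEASURES, lower=LOWER):
    """Group every correlation in the matrix by MTMM region."""
    out = {"reliability diagonal": [lower[i][i] for i in range(len(measures))]}
    for i, j in combinations(range(len(measures)), 2):
        # combinations yields i < j, so the entry lives in row j, column i
        out.setdefault(block_of(measures[i], measures[j]), []).append(lower[j][i])
    return out

r = regions()
for name, values in r.items():
    print(f"{name:26s} {values}")

# Simplified check: every validity-diagonal value should exceed every
# heterotrait correlation, whether monomethod or heteromethod.
convergent_ok = min(r["validity diagonal"]) > max(
    r["heterotrait-monomethod"] + r["heterotrait-heteromethod"]
)
print("validity diagonal beats heterotrait correlations:", convergent_ok)
```

If the check fails because heterotrait-monomethod values are high, the shared method may be driving the correlations more than the traits are ("If not, what is going on?").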