MOS 384a - Reliability and Validity - PowerPoint PPT Presentation


1
MOS 384a - Reliability and Validity
  • Overview
  • Intro and some basic terms
  • Basics of psychometric theory
  • Reliability
  • Validity
  • Excursus: validity generalization
  • Applicants' perspective

2
MOS 384a - Reliability and Validity
  • Readings
  • Textbook (CWHM) Chapter 2
  • Present slides and your notes

3
MOS 384a - Reliability and Validity
  • How do you know how well your selection system
    works?
  • Inappropriate ways to evaluate the system
  • Trust in test publishers' glossy brochures
  • Review of test items/content based on "common
    sense"
  • Anecdotal evidence
  • Doing things the way they have always been done

4
MOS 384a - Reliability and Validity
  • OK, but how do you know it then?
  • Recruitment and Selection as a System (see CWHM,
    Fig. 2.1, p.25)
  • Constructing and evaluating a selection system
    that works is a scientific process.
  • You develop hypotheses on what may work (based on
    prior experience and theory) and
  • ...test them empirically.
  • The empirical evaluation is called validation.

5
MOS 384a - Reliability and Validity
  • Some Basic Terms
  • (Psychological) construct: an unobservable
    quality that needs to be inferred from observable
    measures
  • KSAO constructs:
  • Knowledge: can be declarative (facts) or
    procedural ("how to")
  • Skills: being able to perform a certain task
    (manually and/or mentally)
  • Abilities: more general/abstract constructs that
    facilitate acquisition of knowledge/skills
  • Other characteristics: personality traits,
    attitudes, etc., that are not directly related to
    cognitive or physical abilities but are also
    important for performance on the job

6
MOS 384a - Reliability and Validity
  • Science- vs. Practice-Based Selection (see CWHM,
    Table 2.1, p.28)

7
MOS 384a - Reliability and Validity
  • The Basics of Psychometrics
  • If you observe the same thing in a group of
    people, you get a distribution
  • Distributions are most easily described by a
    central tendency (e.g., the arithmetic mean), and
    the variation around it (e.g., the standard
    deviation)

8
MOS 384a - Reliability and Validity
  • The Basics of Psychometrics
  • If you observe more than one thing each of which
    varies, there will be covariation (e.g., a
    correlation r) between the variables
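To make the last two slides concrete, here is a minimal Python sketch that computes a mean, a standard deviation, and a Pearson correlation by hand. The scores and ratings are invented for illustration, and the helper names are not from the course material.

```python
import math

def mean(values):
    """Arithmetic mean: the central tendency of a distribution."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation: the variation around the mean."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r(xs, ys):
    """Pearson correlation: the covariation between two variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: test scores and job ratings for five people.
test_scores = [10, 12, 14, 16, 18]
job_ratings = [2, 3, 3, 5, 5]

print("mean:", mean(test_scores))
print("SD:  ", round(sample_sd(test_scores), 3))
print("r:   ", round(pearson_r(test_scores, job_ratings), 3))
```

With only five people the correlation is very unstable; the point is merely to show what each statistic summarizes.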

9
MOS 384a - Reliability and Validity
  • The Basics of Psychometrics
  • KSAOs are often measured by psychological tests.
  • Psychological tests are based on psychometric
    theory.
  • (Classic) psychometric theory involves a number
    of assumptions or axioms:
  • An observed test score ($x$) is composed of a true
    score ($t$) and an error ($e$): $x = t + e$
  • There is nothing systematic in the error
    component, which means that we expect errors to
    cancel each other out the more often we measure:
    $\mu(e) = 0$
  • The error component is NOT correlated with the
    true score: $r(t, e) = 0$
  • nor with the true score on other variables:
    $r(t_2, e_1) = 0$
  • nor with the error in a repeated measure:
    $r(e_1, e_2) = 0$
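The axioms above can be illustrated with a small simulation. This is only a sketch under assumed values: true scores and errors are drawn from normal distributions whose parameters (mean 100, SD 15; error SD 5) are invented, and the seed is fixed so the run is reproducible.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

rng = random.Random(42)   # fixed seed (arbitrary choice)
n_people = 20000          # hypothetical sample size

# Assumed distributions: true scores and purely random errors.
true_scores = [rng.gauss(100, 15) for _ in range(n_people)]
errors = [rng.gauss(0, 5) for _ in range(n_people)]

# Observed score = true score + error.
observed = [t + e for t, e in zip(true_scores, errors)]

mean_error = sum(errors) / n_people            # should be close to 0
r_true_error = pearson_r(true_scores, errors)  # should be close to 0

print("mean error:", round(mean_error, 3))
print("r(t, e):   ", round(r_true_error, 3))
```

Because the errors are generated independently of the true scores, both printed values should come out near zero, which is what the axioms require.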

10
MOS 384a - Reliability and Validity
  • The Basics of Psychometrics
  • However, in reality, things are often slightly
    more complicated. For example:
  • Observed test scores ($x$) often reflect other
    systematic components ($t'$) in addition to the
    true score ($t$) and random error ($e$):
    $x = t + t' + e$
  • These additional systematic components can have
    undesirable properties, such as being correlated
    with the true score, not being cancelled out in
    repeated measurements, etc.

11
MOS 384a - Reliability and Validity
  • Reliability: the first concept of psychometric
    quality
  • Definition: reliability is the degree to which a
    test is free of random measurement error:
    $\sigma_t^2 / (\sigma_t^2 + \sigma_e^2)$
  • If we know the reliability, we know the extent to
    which a test measures something
  • We still don't know the degree to which the test
    measures what it should measure
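The definition of reliability as the share of true-score variance in total variance can be written as a one-line function. The sketch below assumes the two variance components are known, which in practice they are not; the numbers are invented.

```python
def reliability(true_variance, error_variance):
    """Share of observed-score variance that is true-score variance."""
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    total = true_variance + error_variance
    if total == 0:
        raise ValueError("total variance must be positive")
    return true_variance / total

# Hypothetical variance components.
precise_test = reliability(true_variance=225, error_variance=25)
noisy_test = reliability(true_variance=225, error_variance=225)

print("precise test:", precise_test)
print("noisy test:  ", noisy_test)
```

The same true-score variance yields a much lower reliability once the error variance grows, which is why the factors on the next slide matter.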

12
MOS 384a - Reliability and Validity
  • Factors Affecting Reliability
  • Temporary Individual Characteristics (e.g., mood,
    physical or psychological well-being)
  • Lack of standardization (e.g., differing
    conditions under which a test is administered,
    differences between questions asked in an
    interview)
  • Chance (e.g., guessing on a knowledge or
    intelligence test, differences in prior
    experience with a test)
  • Lack of comprehensiveness (e.g., too few items in
    a test, inadequate scale format)

13
MOS 384a - Reliability and Validity
  • Methods of Estimating Reliability
  • Test-retest reliability: administer the same test
    twice and correlate the two measurements
  • Internal consistency: take different parts
    (items, halves) of the test and correlate them
    with each other
  • Alternate forms: construct two equivalent
    versions of the same test and correlate them
  • Inter-rater reliability: let two or more persons
    assess the same ratee on the same variables and
    correlate the raters' evaluations
  • In essence, all forms of reliability estimation
    yield a test's correlation with itself, $r_{tt}$
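Two of these estimation methods can be sketched in a few lines of Python. The test-retest estimate is simply a Pearson correlation between two administrations. For internal consistency the sketch uses Cronbach's alpha, a common index that the slides do not name, so treat that choice as an assumption. All data are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(item_scores):
    """Internal consistency; item_scores holds one row of item scores per person."""
    k = len(item_scores[0])
    item_vars = [sample_variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five people answering a three-item scale.
items = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5], [1, 2, 2]]
time1 = [sum(row) for row in items]   # total scores, first administration
time2 = [9, 12, 11, 14, 6]            # total scores, second administration

alpha = cronbach_alpha(items)
retest_r = pearson_r(time1, time2)

print("Cronbach's alpha:", round(alpha, 3))
print("test-retest r:   ", round(retest_r, 3))
```

Alternate-forms and inter-rater estimates would reuse `pearson_r` in the same way, with the two forms or the two raters taking the place of the two administrations.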

14
MOS 384a - Reliability and Validity
  • Validity: the core concept of psychometric
    quality
  • Definition: validity is the degree to which
    inferences or interpretations based on a test
    score for a specific purpose are justified.
  • Validity is NOT a property of the test but of the
    inferences based on the test
  • Reliability is a necessary but NOT a sufficient
    precondition of validity.

15
MOS 384a - Reliability and Validity
  • Approaching Validity from Different Angles
  • Validity is a unitary concept. There are no
    multiple validities.
  • However, there are many different ways to
    approach validity. Not all of them are equally
    suitable in every instance. It all depends on the
    inferences you wish to make

16
MOS 384a - Reliability and Validity
  • The Classic Distinction between 3 Validity
    Concepts
  • Content validation: drawing inferences from a
    test score to a larger domain of similar content
  • Emphasis on representativeness for the domain
  • Usually established through expert ratings
  • Example: work sample test
17
MOS 384a - Reliability and Validity
  • The Classic Distinction between 3 Validity
    Concepts
  • Construct validation: drawing inferences from a
    test score to a psychological construct
  • Emphasis on relations between empirical
    measurement and theoretical constructs
  • Established through a wide range of means. For
    example, a test should correlate highly with
    other measures of the same or similar constructs
    (convergent validity) and weakly with measures of
    conceptually distinct constructs (discriminant
    validity)
  • Examples: cognitive ability test, personality
    test
  • NOTE: CWHM are misleading on this issue. Evidence
    of construct validity is often based on relations
    to other variables.

18
MOS 384a - Reliability and Validity
  • The Classic Distinction between 3 Validity
    Concepts
  • Criterion-related validation: drawing inferences
    from a test score to outside variables
  • Emphasis on prediction of outside variables, in
    personnel selection typically job performance
  • Established through criterion-related validation
    studies. The criteria can be measured at the same
    time as the predictor (concurrent validation) or
    at a later point in time (predictive validation).
  • Examples: important for any kind of selection
    device. However, some instruments (e.g., some
    kinds of biodata questionnaires) rely almost
    exclusively on evidence of criterion-related
    validity.

19
MOS 384a - Reliability and Validity
  • Factors Affecting Validity Coefficients
  • Measurement error: reliability places an upper
    limit on validity ($r_{pt} \le \sqrt{r_{tt}}$)
  • Range restriction: if we employ job incumbents to
    validate a selection procedure, the current
    employees are likely to be more similar to each
    other than the members of the original applicant
    pool, and this reduced variation lowers the
    observed validity coefficient.
  • Sampling error: sample sizes in validation
    studies are often so small that the empirical
    coefficient becomes an imprecise estimate of the
    actual population coefficient
  • Differences between the situation in the
    validation study and the actual selection
    situation
  • Flaws in criterion measurement

20
MOS 384a - Excursus Validity Generalization
  • VG Overview
  • 1. Why Do VG and Other Metaanalyses?
  • 1.1 Narrative Review vs. Metaanalysis
  • 1.2 What does it mean to us?
  • 2. Conducting a VG Study
  • 2.1 Research Question
  • 2.2 Literature Search
  • 2.3 Coding Individual Studies
  • 2.4 Computations
  • 3. Some Critical Remarks

21
MOS 384a - Excursus Validity Generalization
  • Why do metaanalyses?
  • From narrative review to metaanalysis
  • The idea of replication: two studies are better
    than one (and three or more are even better).
  • There are often many studies on a topic. If so,
    how can we make sense of them overall?
  • Two possible ways:
  • Combine them intuitively
  • Combine them statistically

22
MOS 384a - Excursus Validity Generalization
  • Major Differences between Narrative and
    Metaanalytic Review
  • Narrative Review Approach
  • Intuitive and implicit weighting of study
    outcomes or count of significant/nonsignificant
    results
  • → subjective summary
  • Problem: real effects are often underestimated
    because statistical artefacts are not taken into
    account
  • Metaanalytic Approach
  • Objective and quantitative weighting of study
    outcomes
  • → quantitative summary of mean and variation of
    effect sizes
  • Statistical artefacts are systematically
    investigated

23
MOS 384a - Excursus Validity Generalization
  • Major Difference between VG and Single Validation
    Study
  • VG based on (much) more comprehensive data
  • VG delivers additional useful information
  • Estimate of the true (population) value of
    validity coefficients ($\rho$)
  • Estimate of the true size of the variation
    around $\rho$ after correcting for statistical
    artefacts (sampling error, and often also
    measurement error, range restriction, etc.)
  • Helps to identify systematic sources of variation
    across study findings (called moderators or
    subgroups, often followed by moderator analyses)

24
MOS 384a - Excursus Validity Generalization
  • How a VG Study is Conducted
  • Research Question
  • Often not hypothesis testing but rather
    exploratory ("What is the validity of method X in
    predicting job performance?")
  • Requires demarcating and structuring the field of
    research exactly, in order to end up with
    generalizable results
25
MOS 384a - Excursus Validity Generalization
  • Literature Search
  • Goal: find all studies that fit your research
    question
  • Where to search:
  • Reference sections of prior narrative and
    metaanalytic reviews
  • Keyword search (vary search terms!) in electronic
    databases (e.g., PsycINFO, PSYNDEX, ABI/INFORM,
    Sociofile, Dissertation Abstracts; not Google)
  • Systematic manual search in relevant journals
  • Contacting researchers, institutes, and business
    organizations with requests for unpublished
    studies
  • Which studies to drop:
  • Studies with missing information (but don't
    forget to ask the authors!)
  • Studies excluded for conceptual reasons (lack of
    quality or relevance for the research question;
    this needs to be substantiated!)
26
MOS 384a - Excursus Validity Generalization
  • Coding of Individual Studies
  • Standard: source, sample size, statistical
    artefacts (if reported)
  • Assignment to subgroups according to planned
    moderator analyses, for example:
  • Sample characteristics (e.g., students vs.
    managers)
  • Predictor characteristics (e.g., interview
    structure)
  • Criterion characteristics (e.g., performance
    ratings vs. objective indicators)
  • Design characteristics (e.g., predictive vs.
    concurrent validation)
  • Source characteristics (e.g., published vs.
    unpublished)
  • ...

27
MOS 384a - Excursus Validity Generalization
  • Computations
  • (according to the VG method by Hunter & Schmidt,
    1990, 2005)
  • There are K correlation coefficients and N
    persons entered into computations (important:
    correlations must be from independent samples;
    otherwise, compute a mean across correlations
    within a single study)
  • It generally applies (all else being equal) that
    the larger K and N, the more meaningful are the
    results of a VG study

28
MOS 384a - Excursus Validity Generalization
  • Computational Steps
  • (1) Compute the mean uncorrected correlation:
    $\bar{r}_o = \sum N_i r_{oi} / \sum N_i$
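Step (1) is a sample-size-weighted average of the observed correlations. A minimal sketch with three invented studies follows; the function name is illustrative only.

```python
def weighted_mean_r(correlations, sample_sizes):
    """Mean observed correlation, each study weighted by its sample size."""
    if len(correlations) != len(sample_sizes):
        raise ValueError("need one sample size per correlation")
    total_n = sum(sample_sizes)
    return sum(n * r for r, n in zip(correlations, sample_sizes)) / total_n

# Hypothetical studies: K = 3 correlations based on N = 500 persons in total.
observed_rs = [0.20, 0.30, 0.40]
sample_sizes = [50, 100, 350]

mean_ro = weighted_mean_r(observed_rs, sample_sizes)
unweighted = sum(observed_rs) / len(observed_rs)

print("weighted mean r:  ", round(mean_ro, 3))
print("unweighted mean r:", round(unweighted, 3))
```

The weighted mean is pulled toward the largest study, which is intended: larger samples give more precise estimates.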

29
MOS 384a - Excursus Validity Generalization
  • Computational Steps
  • (2) Correcting for artefacts (each coefficient
    individually)
  • Measurement error: attenuation correction
    $r_{xy} / (r_{xx} r_{yy})^{1/2}$. Note: correcting for
    attenuation in the criterion only leads to an
    estimate of operational (practical) validity;
    correcting both predictor and criterion for
    attenuation estimates the true relationship
    between constructs. If the reliability is not
    reported in individual studies, it has to be
    estimated from known information.
  • Range restriction or enhancement: divide the
    study's standard deviation (SD) by the population
    SD. Note: it's often hard to estimate the
    population SD.
  • Compute the product of the individual corrections
    per study
  • Divide each observed correlation by its
    respective product of corrections (which is
    usually $< 1$, so $r_c > r_o$).
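Step (2) can be sketched as follows. The sketch takes the slide at its word: the correction factors are the square roots of the two reliabilities and, for range restriction, the simple ratio of study SD to population SD. Full VG procedures use more elaborate range-restriction formulas, so treat the SD ratio as a simplification. All values are invented.

```python
import math

def correction_product(r_xx=1.0, r_yy=1.0, sd_ratio=1.0):
    """Product of the individual correction factors for one study.

    r_xx, r_yy: predictor and criterion reliability (1.0 = not corrected).
    sd_ratio:   study SD divided by population SD (simplified factor).
    """
    return math.sqrt(r_xx) * math.sqrt(r_yy) * sd_ratio

def corrected_r(r_observed, r_xx=1.0, r_yy=1.0, sd_ratio=1.0):
    """Observed correlation divided by its product of corrections."""
    return r_observed / correction_product(r_xx, r_yy, sd_ratio)

r_o = 0.25  # hypothetical observed validity

# Criterion unreliability only: operational (practical) validity.
operational = corrected_r(r_o, r_yy=0.64)
# Predictor and criterion unreliability: relationship between constructs.
true_score = corrected_r(r_o, r_xx=0.81, r_yy=0.64)
# Additionally assuming the study SD is 80% of the population SD.
with_range = corrected_r(r_o, r_xx=0.81, r_yy=0.64, sd_ratio=0.8)

print(round(operational, 4), round(true_score, 4), round(with_range, 4))
```

Each additional correction shrinks the product further below 1 and therefore raises the corrected coefficient.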

30
MOS 384a - Excursus Validity Generalization
  • Computational Steps
  • (3) Computing the mean corrected correlation
    (true score correlation)
  • Each individually corrected correlation is
    weighted with the product of its sample size
    times the squared product of corrections (see
    above; i.e., larger samples and less flawed
    studies receive larger weights)
  • Compute the true score correlation (estimate of
    the population correlation $\rho$):
    $\rho = \sum w_i r_{ci} / \sum w_i$
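Step (3) in code: a minimal sketch in which each study is described by its observed correlation, its sample size, and its product of corrections. The two studies are invented, and the tuple layout is just a convenience for this example.

```python
def mean_corrected_r(studies):
    """Weighted mean of individually corrected correlations.

    studies: list of (r_observed, n, correction_product) tuples.
    Weight of each study: n * correction_product ** 2.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for r_observed, n, product in studies:
        r_corrected = r_observed / product
        weight = n * product ** 2
        weighted_sum += weight * r_corrected
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical studies: (observed r, sample size, product of corrections).
studies = [
    (0.20, 100, 0.8),   # corrected r = 0.25, weight = 64
    (0.30, 200, 0.6),   # corrected r = 0.50, weight = 72
]

rho_hat = mean_corrected_r(studies)
print("estimated rho:", round(rho_hat, 4))
```

The second study has twice the sample size but only slightly more weight, because its heavier corrections make its corrected coefficient less precise.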

31
MOS 384a - Excursus Validity Generalization
  • Computational Steps
  • (4) Estimate the variance that remains after
    correcting for artefacts
  • Compute the variance of the corrected
    coefficients:
    $\mathrm{Var}(r) = \sum w_i (r_{ci} - \rho)^2 / \sum w_i$
  • Compute the variance accounted for by artefacts,
    $\mathrm{Var}(e)$. It becomes larger (a) the smaller
    individual samples are, (b) the larger the
    artefacts are, (c) the smaller the observed
    correlations are
  • Compute the variance not accounted for by
    artefacts: $\mathrm{Var}(r) - \mathrm{Var}(e)$ (Note: this
    difference can become negative; it is then
    assumed to be zero)
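Step (4) can be sketched as below. The slide does not spell out how Var(e) is computed, so the formula used here is an assumption: each study's sampling error variance is taken as (1 - mean observed r squared) squared, divided by (N - 1), then divided by the squared product of corrections, and averaged with the same weights as before. The three studies are invented.

```python
def variance_components(studies):
    """Return (rho, var_r, var_e, residual) for individually corrected studies.

    studies: list of (r_observed, n, correction_product) tuples.
    """
    weights = [n * a ** 2 for _, n, a in studies]
    corrected = [r / a for r, _, a in studies]
    w_total = sum(weights)

    rho = sum(w * rc for w, rc in zip(weights, corrected)) / w_total
    var_r = sum(w * (rc - rho) ** 2 for w, rc in zip(weights, corrected)) / w_total

    # Assumed sampling-error formula (not given on the slide).
    total_n = sum(n for _, n, _ in studies)
    mean_ro = sum(r * n for r, n, _ in studies) / total_n
    error_vars = [(1 - mean_ro ** 2) ** 2 / (n - 1) / a ** 2 for _, n, a in studies]
    var_e = sum(w * ve for w, ve in zip(weights, error_vars)) / w_total

    residual = max(var_r - var_e, 0.0)  # negative differences are set to zero
    return rho, var_r, var_e, residual

# Hypothetical studies: (observed r, sample size, product of corrections).
studies = [(0.20, 100, 0.8), (0.30, 200, 0.6), (0.24, 150, 0.8)]

rho, var_r, var_e, residual = variance_components(studies)
print("rho:     ", round(rho, 4))
print("Var(r):  ", round(var_r, 5))
print("Var(e):  ", round(var_e, 5))
print("residual:", residual)
```

In this invented example the artefact variance slightly exceeds the variance of the corrected coefficients, so the residual is set to zero, which is the case the note on the slide describes.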

32
MOS 384a - Excursus Validity Generalization
  • Computational Steps
  • (5) Examine the generalizability of the mean
    validity
  • (a) 75% rule: if at least 75% of the variance of
    the corrected coefficients is accounted for by
    artefacts ($\mathrm{Var}(e) / \mathrm{Var}(r) \ge 0.75$), the
    validity is said to be generalizable. That is,
    there are no substantial differences between the
    situations in which the single validity
    coefficients were observed
  • (b) Credibility interval (CV): if the 90% CV
    ($\rho \pm 1.64 \, (\mathrm{Var}(r) - \mathrm{Var}(e))^{1/2}$) does not
    include zero or if it is relatively small, the
    validity is said to be generalizable (Note: the CV
    is not a confidence interval, which could also be
    computed and tells you about the accuracy of the
    estimate for $\bar{r}_o$)
  • → If there is still substantial variance after
    correcting for artefacts, this could be taken as
    evidence of the existence of moderator variables
    (subgroups with differing population means)
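Both checks in step (5) can be sketched in one small function. The inputs are invented, the multiplier 1.64 is the one given on the slide, and the dictionary layout is only a convenience for this example.

```python
import math

def generalizability(rho, var_r, var_e, z=1.64):
    """Apply the 75% rule and compute the 90% credibility interval."""
    share = var_e / var_r if var_r > 0 else 1.0   # share explained by artefacts
    residual = max(var_r - var_e, 0.0)            # negative residual counts as zero
    half_width = z * math.sqrt(residual)
    lower, upper = rho - half_width, rho + half_width
    return {
        "artefact_share": share,
        "meets_75_rule": share >= 0.75,
        "interval": (lower, upper),
        "excludes_zero": lower > 0 or upper < 0,
    }

# Hypothetical result with substantial residual variance.
heterogeneous = generalizability(rho=0.35, var_r=0.020, var_e=0.008)
# Hypothetical result where artefacts explain all observed variance.
homogeneous = generalizability(rho=0.35, var_r=0.010, var_e=0.011)

print(heterogeneous)
print(homogeneous)
```

In the first case the 75% rule fails although the interval excludes zero, so the mean validity is positive across situations but moderators may still be worth examining.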

33
MOS 384a - Excursus Validity Generalization
  • Computational Steps
  • (6) If applicable: moderator analyses
  • Create subgroups of studies according to
    previously specified criteria (participant
    groups, predictor variants, criterion measures,
    etc.)
  • Compute a new VG study for each group
    individually
  • Decisive factor: does the variance not accounted
    for by artefacts decrease substantially in
    subgroup analyses? If so (cf. 75% rule, CV), the
    mean validity in each group can be interpreted as
    this group's population value.
  • Problem: second-order sampling error. Every
    single moderator analysis contains fewer data
    than the overall VG study. Therefore, values
    found in moderator analyses are more prone to be
    affected by atypical studies and less reliable.
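The logic of a moderator analysis can be sketched with a simplified "bare-bones" computation that corrects for sampling error only; the full procedure would also apply the artefact corrections from step (2). The moderator (interview structure), the six studies, and the sampling-error formula based on the average sample size are assumptions made for illustration.

```python
from collections import defaultdict

def bare_bones(studies):
    """Return (mean r, observed variance, residual variance) for (r, n) pairs."""
    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    mean_n = total_n / len(studies)
    var_e = (1 - mean_r ** 2) ** 2 / (mean_n - 1)   # assumed sampling-error formula
    return mean_r, var_obs, max(var_obs - var_e, 0.0)

# Hypothetical studies: (moderator level, observed r, sample size).
coded_studies = [
    ("structured", 0.45, 100), ("structured", 0.50, 100), ("structured", 0.55, 100),
    ("unstructured", 0.15, 100), ("unstructured", 0.20, 100), ("unstructured", 0.25, 100),
]

overall = bare_bones([(r, n) for _, r, n in coded_studies])

# Group the studies by moderator level and repeat the analysis per subgroup.
groups = defaultdict(list)
for level, r, n in coded_studies:
    groups[level].append((r, n))
subgroup_results = {level: bare_bones(rows) for level, rows in groups.items()}

print("overall:", overall)
for level, result in subgroup_results.items():
    print(level, result)
```

In these invented data the residual variance is clearly positive overall but drops to zero within each subgroup, which is the pattern that would support interview structure as a moderator. With only three studies per subgroup, the second-order sampling error mentioned above would be a serious concern.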

34
MOS 384a - Excursus Validity Generalization
  • An Example (Hülsheger et al., in press)

35
MOS 384a - Excursus Validity Generalization
  • Some Critical Remarks on Metaanalysis
  • (1) Publication Bias
  • Effect sizes overstate true effects, because null
    findings have lower chances of getting published
  • Plausible, but: empirical comparisons between
    published and unpublished studies have often
    shown negligible differences; null findings can
    also be due to poor quality of the research
  • (2) Apples-and-oranges problem
  • Metaanalysts tend to lump together studies that
    are hardly comparable
  • Maybe, but: metaanalysis provides you with the
    means to uncover substantial differences between
    studies and quantify them in moderator analyses

36
MOS 384a - Excursus Validity Generalization
  • (3) Lawnmower method
  • Metaanalyses obscure the particularities of
    individual studies by summarizing them all in a
    single statistical value
  • Yes, but: metaanalysis is an alternative to the
    narrative review, not to primary empirical
    studies; if you want to learn about the details
    of a particular primary study, you have to go
    back to the original source
  • (4) Over-interpretation
  • Metaanalyses are often considered to be the
    final word on a subject, which can terminate
    research interest in this issue
  • Can be true, but: if so, it can be a blessing or
    a curse: the former if resources would otherwise
    be wasted on matters that can be closed, the
    latter if conclusions based on metaanalyses turn
    out to be wrong or deficient

37
MOS 384a - Excursus Validity Generalization
  • Conclusion
  • VG and other methods of metaanalysis are powerful
    tools for making sense of the often confusing
    volume of apparently contradictory findings in
    heavily researched fields. They are not to be
    seen as machines that automatically generate the
    truth about empirical questions.

38
MOS 384a - Reliability and Validity
  • Considering the Applicants' Perspective: Bias,
    Fairness, and Acceptability
  • Bias: systematic errors in measurement related to
    identifiable group membership characteristics
    (e.g., sex, age, and many more)
  • Fairness: the principle that every applicant
    should be assessed in an equitable manner.
    Fairness is based on judgment and often involves
    processes of negotiation in a society/group.
  • Acceptability: an applicant's individual
    perception of a selection procedure as being
    fair, valid, useful, etc. Includes attitudinal
    and behavioral reactions to being exposed to the
    procedure (e.g., likelihood of accepting a job
    offer).

39
MOS 384a - Reliability and Validity
  • According to organizational justice theory
    (Gilliland, 1993), applicants accept selection if
    it satisfies
  • Distributive justice: selection decisions based
    on accepted standards (merit, need, ...)
  • Procedural justice: adherence to rules of
    structural (e.g., perceived validity, equal
    administration), informational (communication
    during and after process), and interpersonal
    (respect, privacy) justice.
  • Discuss implications for practice!