Personality/Psychopathology Measurement and IRT: promising opportunities

1
Personality/Psychopathology Measurement and IRT
promising opportunities
  • Rob Meijer

2
Personality Assessment
  • Diagnosis of personality and personality
    disorders requires an evaluation of the
    individual
  • Self-report and peer-report questionnaires are
    often used to determine personality traits
  • Contexts: health care, clinical psychology,
    personnel selection and development

3
Topic
  • How can item response theory (IRT) improve
    understanding of personality questionnaires?
  • Discuss several applications, what I have learned
    from my cooperation with clinical, personality,
    and I/O psychologists
  • There are not enough research projects that
    convincingly communicate the relative superiority
    of the IRT approach in the personality domain

4
Topic
  • IRT applied in educational and cognitive
    measurement
  • The purpose of cognitive assessment: precise and
    valid scaling of individual differences
  • In (applied) personality assessment: test score
    interpretation and prediction of wide-ranging
    behavior
  • In cognitive assessment there is a large domain
    (e.g., spelling) from which the items are
    sampled; in personality many domains are
    restricted. There are only a limited number of
    indicators of, e.g., social introversion,
    friendliness, or narcissism.

5
Topic
  • When IRT is transported from cognitive abilities
    into typical performance assessment, special
    issues and problems arise
  • E.g., limited indicators (items), underlying
    distribution not normal

6
Personality Assessment
  • In 2002 and 2003, 20 of 39 research articles in
    JEM and 32 of 52 in APM involved IRT
  • 2 out of 122 in Journal of Personality Assessment
    and 6 of 106 articles in Psychological Assessment
    included IRT
  • Partly due to different psychometrics prevalent
    in the two fields

7
CTT
  • CTT scale construction: item difficulty, item
    discrimination, and reliability
  • Drawback: reliability and the SEM are constant
    for all respondents
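
For reference, the standard error of measurement (SEM) in CTT is usually written as below. The symbols are not from the slides: s_X is assumed to denote the standard deviation of the observed scores and r_XX' the test reliability. Both are sample-level quantities, which is why the resulting SEM is the same for every respondent.

```latex
\mathrm{SEM} = s_X \sqrt{1 - r_{XX'}}
```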

8
IRT
  • IRT assumes that a person has a true location on
    a continuous latent dimension (theta). Theta is
    assumed to probabilistically cause how a person
    responds to an item
  • The equation that relates theta to the
    probability of endorsing an item is the item
    response function (IRF) (dichotomous item scores)

9
IRT
  • Item difficulty (b) is the point on the latent
    variable scale where the probability of
    endorsement equals .50
  • Item discrimination (a) is proportional to the
    slope of the IRF
  • Important feature: IRT estimates the joint
    relation between person properties and item
    properties
  • a usually between .5 and 2.5
  • b usually between -2.5 (easy) and 2.5
    (difficult)
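
A minimal sketch of an IRF for dichotomous items, assuming the two-parameter logistic (2PL) form. The transcript does not state which model was used, so the 2PL, the function name `irf_2pl`, and the item parameter values are illustrative assumptions.

```python
import math


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item under the (assumed) 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination a = 1.5, difficulty b = 0.5
for theta in (-2.0, 0.5, 2.0):
    p = irf_2pl(theta, a=1.5, b=0.5)
    print(f"theta={theta:+.1f}  P(endorse)={p:.3f}")
```

Under this form the probability is exactly .50 at theta = b, and the slope of the curve at that point is a/4, which is the sense in which a is proportional to the slope of the IRF.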

10
Item Response Functions (IRF)
11
IRT
  • Assumptions
  • Unidimensionality
  • Monotonic relation between trait level and
    probability of endorsing an item
  • Statistical evaluation of model-to-data goodness
    of fit

12
Item and scale analysis
  • CTT: item discrimination, item difficulty,
    reliability
  • IRT: item analysis is done in a similar way, but
    item discrimination, difficulty, and reliability
    are examined in a more powerful way
  • Instead of test reliability, item information
    plays an important role

13
Item and Test information
  • Information indicates how well an item
    discriminates among respondents who are at
    different levels of the latent variable
  • Items provide different amounts of information at
    different ranges of the latent variable
  • (1) Item information is additive across items,
    which gives the test information function
  • (2) Information is inversely related to the SEM
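
The two properties above can be sketched as follows, assuming 2PL items (the transcript does not name the model). The function names and the three (a, b) pairs are hypothetical. Under this assumption item information is a² · P · (1 − P), test information is the sum over items, and the conditional SEM is 1 / sqrt(test information).

```python
import math


def irf_2pl(theta, a, b):
    """Endorsement probability under the (assumed) 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Test information: item information summed over all items."""
    return sum(item_information(theta, a, b) for a, b in items)


def conditional_sem(theta, items):
    """SEM at a given theta: inverse square root of test information."""
    return 1.0 / math.sqrt(scale_information(theta, items))


# Hypothetical scale of three items given as (a, b) pairs
items = [(1.8, -1.0), (1.2, 0.0), (0.7, 1.5)]
for theta in (-2.0, 0.0, 2.0):
    info = scale_information(theta, items)
    print(f"theta={theta:+.1f}  info={info:.3f}  SEM={conditional_sem(theta, items):.3f}")
```

Unlike the single CTT value, the SEM here changes with theta: it is smallest where the items are most informative.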

14
Item and scale analysis
15
Item and Test information
  • The amount of information an item provides is
    determined by the item discrimination
  • The location on the latent trait where
    information is maximized is determined by the
    item difficulty
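
As a worked illustration, assume the two-parameter logistic model (an assumption; the transcript does not name the model), with P_i(θ) the IRF of item i. The item information function and its maximum are then:

```latex
I_i(\theta) = a_i^{2}\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr),
\qquad
\max_{\theta} I_i(\theta) = \frac{a_i^{2}}{4}
\quad \text{at } \theta = b_i .
```

The product P(1 − P) peaks where P = .50, which is at θ = b_i, so the difficulty fixes the location of the peak and the discrimination fixes its height.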

16
Item Information
17
Item Information
18
Polytomous scores
  • Graded response model (GRM) for Likert data
  • The magnitude of the a-parameter reflects the
    degree to which the item is related to the trait
  • Two or more location parameters, b1, b2, ...
  • (equal to the number of response categories
    minus one)
  • These reflect the spacing of the response
    categories along the trait scale
  • Thus for m = 5 answer categories there are b1,
    b2, b3, and b4 location parameters
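
A minimal sketch of how the GRM turns one a-parameter and m − 1 location parameters into m category probabilities, assuming the usual logistic form of Samejima's model. The function name and the parameter values are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    P(X >= k) is a logistic curve located at threshold b_k; the probability
    of each category is the difference between adjacent cumulative curves.
    """
    # Cumulative probabilities, padded with 1 (lowest category or higher)
    # and 0 (above the highest category).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical 5-category Likert item: four ordered location parameters
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -0.5, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

The location parameters must be in increasing order for all category probabilities to be non-negative.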

19
Depression items
Item 2: I have recently considered killing myself
Item 3: I am sometimes down in the dumps
20
Option response curves
21
Example 1: Construct validity of clinical scales
  • Can we use scales as a diagnostic instrument to
    classify persons in different categories?
  • In clinical psychology/psychiatry many rating
    scales are constructed so that they cover
    DSM-IV(TR) categories. On the basis of a scale a
    person is classified into different categories
    such as no, versus mild, versus severe mental
    illness states
  • Because diagnostic criteria influence how
    psychiatric disorders are recognized, researched
    and treated it is very important to ensure their
    empirical validity

22
Practical Features
  • Clinical change (degree of change within the
    individual): to measure this, the scale should
    discriminate in the area of interest
  • Scale should be discriminating around cut-off
    scores
  • Diagnostic Interview-Expanded Substance Scale
  • IRT analysis to investigate the quality of the
    scale
  • Can the scale be used as a diagnostic
    instrument?

23
Alcohol use disorder (Langenbucher et al., 2004)
24
Cocaine use disorder
25
Conclusion
  • Dense clustering of symptom item response
    functions implies that a number of criteria
    (items) of substance abuse carry the same
    information
  • Measurement precision in only a narrow trait
    range
  • Trichotomous diagnostic scheme of the DSM-IV
    (undiagnosed, dependence, abuse) is not
    supported, only impaired/less impaired can be
    distinguished

26
Conclusion
  • Additional severe criteria (items) are needed to
    reliably and broadly identify serious degrees of
    addictive pathology
  • Additional mild criteria are needed for
    screening and prevention and for establishing
    base rates (epidemiology)
  • But are constructs fully continuous? And can we
    find measures (items) across the entire range?

27
Quasi-traits
  • Researchers often assume that all constructs are
    fully continuous, defined at both ends of the
    construct
  • IRT modeling shows that many personality
    constructs used in clinical scales
    (psychopathology) are highly skewed or
    quasi-traits
  • For example, self-esteem

28
Quasi traits
  • One explanation is that this is not due to poor
    items or options but to the nature of the
    self-esteem construct: items only differentiate
    between people with low self-esteem because this
    is the only end of the construct that is
    meaningful
  • Future research should clarify whether we can
    write items that also discriminate at the medium
    levels of the latent trait

29
Example 2: Type D personality
  • What is the effect of narrow band constructs
    combined with limited item pools on the construct
    validity of our scales?
  • When only a few items have high slopes and the
    remainder have low slopes care should be taken in
    interpreting the latent trait.

30
Context
  • Influence of psychological factors on health,
    illness, and death
  • Psychosomatic research on cardiac disease needs
    to include personality
  • Distress as a risk factor
  • High levels of distress are linked to anxiety,
    stress, and anger → vital exhaustion
  • DS-14: 7 items Negative Affect, 7 items Social
    Inhibition
  • Type D: score above the median on both scales;
    increased risk

31
Example 2: Type D
  • Negative Affect (NA): tendency to experience
    aversive emotional states with feelings of
    dysphoria, tension, and worry (α = .88, factor
    loadings .6-.8)
  • Social Inhibition (SI): tendency to inhibit
    self-expression in social interactions in order
    to avoid disapproval by others (α = .86, factor
    loadings .6-.8) (Emons, Meijer, & Denollet, 2006)

32
(No Transcript)
33
(No Transcript)
34
Example 2: Type D
  • Variable pattern of slopes may be problematic
  • (1) The dysphoria items NA7, NA4, and NA2
    dominate the construct; the remaining items are
    less important
  • (2) A practitioner should be very careful in
    interpreting the underlying construct: NA is
    mainly dysphoria, and in particular the item
    "I am often down in the dumps"
  • (3) The latent trait does not reflect variance on
    a common latent variable shared by other items,
    but reflects individual differences on the items
    with the highest slopes

35
(No Transcript)
36
Example 3: Validity of test scores
  • Test score validity: validity scales, e.g., the
    F-scale in the MMPI, consist of items that are
    endorsed infrequently in the normal population;
    high scores invalidate the interpretation of the
    MMPI
  • Can we identify and interpret invalid test scores
    by studying the configuration of individual
    item scores by means of fit statistics that are
    proposed in the context of item response theory
    (IRT)? (Meijer, Egberink, Emons, & Sijtsma, 2008)

37
Context
  • On the basis of an IRT model observed and
    expected item scores can be compared and many
    unexpected item scores alert the researcher that
    the total score may not adequately reflect the
    trait being measured.
  • Gap between psychometric characteristics of
    several statistical tests and measures on the one
    hand and the articles that describe the practical
    usefulness of these measures on the other hand.

38
Context
  • Try to integrate psychometric analysis with
    information from qualitative sources to make
    judgments about the validity of an individual's
    test score. And replication!
  • Explore the usefulness of person-fit statistics
    to identify invalid test scores using real data,
    and
  • Validate information obtained from IRT using
    personality theory and qualitative data obtained
    from observation and interviews

39
Rationale of the method
  • When measuring, e.g., depressed suicidal
    ideation, every person who endorses the statement
    "I have recently considered killing myself" is
    expected to also endorse the statement "I don't
    seem to care what happens to me" (relative to the
    previous item, this item is less extreme, or more
    popular)
  • However, in practice, when analyzing personality
    data, errors are found against this perfect
    pattern
  • Many errors may point at invalid person scaling

40
Fit statistics
  • 0100100000000001001011001010010100001100
    X = 12
  • 1001000010110010111111000000000000000000
    X = 12
  • 0101110111001010001011110001011111000000
    X = 20
  • 1111110111111111101101000100000000000000
    X = 20
  • Many statistics, we used several statistical
    tests, normed Guttman errors (ZGE)
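
A minimal sketch of counting Guttman errors for a dichotomous item score pattern. It assumes that the items are ordered from most to least popular, so that a 0 followed later by a 1 is an error. The transcript does not define the ZGE statistic, so only the raw count and a simple normed version (count divided by its maximum X(n − X) for total score X on n items) are shown; the function names and the three short patterns are hypothetical.

```python
def guttman_errors(pattern: str) -> int:
    """Count item pairs where the more popular item is scored 0
    and the less popular item is scored 1.

    Assumes the characters of `pattern` are ordered from the most
    popular item to the least popular item.
    """
    errors = 0
    zeros_seen = 0
    for score in pattern:
        if score == "0":
            zeros_seen += 1
        else:
            # Each earlier 0 forms one Guttman error with this 1.
            errors += zeros_seen
    return errors


def normed_guttman_errors(pattern: str) -> float:
    """Guttman errors divided by the maximum possible for this total score."""
    n = len(pattern)
    x = pattern.count("1")
    max_errors = x * (n - x)
    return guttman_errors(pattern) / max_errors if max_errors else 0.0


# Hypothetical patterns with the same total score X = 3 on six items
for pattern in ("111000", "101010", "000111"):
    print(pattern, guttman_errors(pattern), round(normed_guttman_errors(pattern), 3))
```

Patterns with the same total score can therefore differ widely in consistency, which is the information a person-fit statistic adds to the total score.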

41
Data
  • Harter's Self-Perception Profile for Children
    (SPPC), polytomous item scores (4-point scale)
  • Intended to determine how children between 8 and
    12 years of age judge their own functioning in
    several specific domains and how they judge their
    global self-worth
  • 6 subscales each consisting of 6 items
  • Scholastic Competence (SC), Social Acceptance
    (SA), Athletic Competence (AC), Physical
    Appearance (PA), Behavioral Conduct (BC), Global
    Self-worth (GS)

42
Procedure
  • 611 children between 8 and 12 years of age
  • Inspection of model fit
  • Calculation of person-fit statistics
  • Interviewing teachers, and children, and
    observation of children
  • Re-administration of the SPPC

43
Results
  • In general, young children (8/9 years of age)
    scored less consistently than older children
  • Asking children to select personality statements
    that better describe them may be relatively
    complex, especially for young children.
  • They should understand the meaning of these
    statements and they should also have a frame of
    reference similar to that of older children. We
    observed that the meaning of some items was
    problematic, and that inconsistent answering
    behavior was often due to learning disability

44
Results
  • Older children more often than younger children
    choose the categories 2 and 3.
  • Older girls more often than older boys choose the
    2 and 3 options. We speculate that these shifts
    point at a more differentiated self-concept for
    older children as compared to young children and
    at a more differentiated self-concept for girls
    than boys

45
Profiles
  • Similar Profiles with different item score
    patterns

46
Profiles
  • Child 275: very inconsistent item score pattern
    (SC422124, SA444414, AC411444, PA313414,
    BC124443, GS344143)
  • Child 94: consistent
    (SC112422, SA443423, AC444322, PA222242,
    BC333333, GS424233)
  • Child 242: consistent
    (SC222232, SA443333, AC333343, PA322233,
    BC433223, GS343333)

47
Re-administration
  • As expected, the ZGE scores collected at the
    second administration were lower than the ZGE
    scores collected at the first administration.
  • 8 out of the 27 children again produced irregular
    item score patterns
  • For 4 children this was due to cognitive
    problems: learning disability, problems with
    reading comprehension skills, and/or lexical
    processing speed.
  • For 2 other children this may be due to the home
    situation: they come from troubled homes, have
    difficult relations with their parents and,
    perhaps as a result of this, are very insecure.

48
Conclusions
  • In clinical practice and applied research, the
    fundamental question often is not whether
    unexpected item score patterns exist but whether
    the patterns have any theoretical or applied
    validity
  • Because nothing in a (statistical) fit procedure
    guarantees that identified patterns have
    associations with external criteria or diagnostic
    categories it is important to use information
    from other sources. Thus, one may combine
    information from fit statistics with information
    obtained from other subtest scores (score
    profiles), interviews, and/or observation.