1
Upcoming Topics
  • Secular Increases in IQ
  • Interventions to increase IQ
  • Test bias
  • Item Response Theory
  • Computer Adaptive Testing

2
Flynn Effect
  • Secular increase in IQ
  • Over the last 60 years, a steady increase in raw
    scores on IQ tests
  • The effect has been documented for 25 different
    tests in 15 countries (1930-1980)
  • Seems to correspond to industrialization

5
Flynn Effect
  • Mean increase of 5 IQ points
  • Somewhat larger increase for non-verbal and
    culture-reduced tests (e.g., Raven's Progressive
    Matrices)
  • Tests of scholastic content show the least gain
  • Biggest effect is at the low end of the IQ
    distribution

6
Flynn Effect
  • Increase in IQ scores seems to be an increase in
    g
  • Paradoxically, achievement scores have been
    declining (e.g., SAT)
  • Why has there been an increase in IQ?
  • No one knows for sure, but there are many conjectures

7
Causes of the Flynn Effect
  • Schooling
  • More people receive standardized educations
    extending over longer periods of time
  • Greater emphasis on decontextualizing problems or
    identifying the general principle
  • Nutrition and Health Care
  • Improves biological and intellectual
    development
  • Similar increases in stature also detected

9
Implications of Rise in IQ
  • Are we really getting smarter?
  • Probably, but can't extrapolate too many
    generations
  • People do more complex tasks and some evidence
    suggests people learn these skills faster
  • No one really knows; it's a true puzzle at this
    point

10
Test Bias
  • Bias
  • Differences between two (or more) groups (usually
    between a majority and minority group) in
    some function of item performance (difficulty or
    discrimination)
  • Bias is not the same as group differences
  • Groups can differ on a test; that doesn't make
    the test biased

11
Test Bias
  • Test Bias is not a matter of opinion
  • Quantitative methods exist to establish whether
    or not a test is biased
  • Bias is present only if the item or test
    properties differ across groups after
    controlling for overall ability

12
Can g be increased?
  • To truly increase g, some conditions must be met
  • Treatment (T) group scores higher than the
    Control (C) group on a measure of g
  • Broad generalizability: the treatment effect
    should be evident in many different tests of g
  • Practical utility: how did the increase in g
    affect outcomes in the person's life?

13
Interventions to increase g
  • Most attempts to increase g focus on early
    development and low-IQ samples
  • Certain educational-psychological interventions
    increase IQ score, however,
  • Increases typically diminish to zero within 1-2
    years
  • Do not generalize to multiple tests (probably do
    not increase g)
  • Can instruct skills, but not g

14
Hierarchical Structure
[Diagram: G-factor at the top; group factors k:m and v:ed; Verbal,
Number, and Spatial Visualization factors below them; specific
factors S1-S8 at the bottom]
15
Abecedarian Project
  • Intensive intervention to increase IQ
  • Began in 1972
  • 111 African-American families in Chapel Hill, NC
  • Medically healthy but demographically at risk for
    school failure
  • Low parental education/occupation, low family
    income, low mother's IQ, welfare status, or
    recipient of special services
  • Mean maternal IQ = 84

16
Abecedarian Project
  • Children attended a specialized day care from
    infancy to age 5
  • Low adult-child ratio, initially 1:3, eventually
    1:7
  • Stable, professional staff
  • Played simple games that focused on language
    development and exposure to what might be
    intellectually stimulating
  • Control group received nutritional supplements
    and some social services

17
Abecedarian Findings
  • Average difference between T and C groups
  • Infancy through age 5: 7.8 IQ points
  • Biggest difference at age 3: 17 IQ points
  • Ages 8 and 12: 5 IQ points
  • Age 21: 4.6 IQ points

18
Achievement Gains
  • Age 12, IQ below 85: T = 12.8%, C = 44.2%
  • Age 15
  • ½ as many of the T group in special ed classes
  • Repeated a grade: T = 28%, C = 55%
  • Age 21
  • Gone to 4-year college: T = 35%, C = 14%
  • Good job or college: T = 65%, C = 40%
  • Fewer in T group with children, and older age when
    having children

19
Interventions to Increase g
  • Abecedarian intervention certainly had a big
    effect for the participants
  • However, the increase in g required prolonged and
    intensive effort and resulted in only a modest
    effect
  • What are the key elements?
  • Probably a multitude of many small effects that
    over the course of years add up
  • Very early intervention is important

20
Group Differences in IQ scores
  • Since the early 1900s there has been a 1 SD
    difference in mean IQ scores between Whites and
    African-Americans
  • Means for racial groups
  • African-American: 85
  • Hispanic: 90
  • White: 100
  • East Asian, Jewish: 105-110 (?)

21
Group differences in IQ
  • Intensively researched
  • Basic conclusion
  • Mean differences are NOT due to the tests
  • Evidenced across different types of tests,
    including non-verbal and culture-reduced
  • Mean differences are also not due to differences
    in SES

22
Test Bias in CTT
  • Based on Linear Regression
  • Slope Bias
  • When the regression coefficient of a test
    predicting a criterion is different for the two
    groups
  • Differential Validity
  • The test is more valid in one group
  • May not be measuring the same construct across
    groups

23
[Figure: identical regressions. Majority (B) and Minority (A) fall on
a single regression line of criterion score Y on test score X; no bias]
24
Test Bias in CTT
  • Intercept Bias
  • When a test systematically underpredicts or
    overpredicts criterion performance for a
    particular group
  • Same slope, different intercept
  • Same regression (validity) coefficient
  • Using the same regression line for both groups
    will result in bias

25
[Figure: slope bias. The regression lines of criterion score Y on test
score X have different slopes for Majority (B) and Minority (A)]
26
[Figure: intercept bias, case 1. Same slope, with the Minority (A) line
above the Majority (B) line; a single regression line underpredicts for
the Minority]
27
[Figure: intercept bias, case 2. Same slope, with the Majority (B) line
above the Minority (A) line; a single regression line overpredicts for
the Minority]
28
Intercept Bias
  • Intercept Bias is the most common form of test
    bias
  • However, there is not rampant test bias
  • Case 1 sometimes found between men and women
  • Case 2 sometimes found between ethnic and racial
    groups
  • If the two groups differ in additional variables
    correlated with both the test and criterion
  • Including additional predictors will reduce the
    bias
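The intercept/slope distinction above can be checked with a moderated regression. A minimal sketch in Python on synthetic data (the data, coefficient values, and function name are all illustrative, not from the source): the group main effect tests intercept bias, and the test-by-group interaction tests slope bias.

```python
import numpy as np

def bias_regression(x, y, group):
    """Fit y = b0 + b1*x + b2*group + b3*(x*group) by least squares.
    A nonzero b2 indicates intercept bias (same slope, shifted line);
    a nonzero b3 indicates slope bias (differential validity)."""
    X = np.column_stack([np.ones_like(x), x, group, x * group])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs  # b0, b1, b2, b3

# Synthetic data: standardized test scores, equal slopes in both
# groups, intercepts differing by 5 criterion points
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 400)
group = np.repeat([0.0, 1.0], 200)
y = 10.0 + 3.0 * x + 5.0 * group + rng.normal(0.0, 1.0, 400)
b0, b1, b2, b3 = bias_regression(x, y, group)
```

In practice one would add a significance test on b2 and b3; here the point is only that b2 recovers the intercept shift while b3 stays near zero.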

29
Item Response Theory
  • Also called Latent Trait Theory
  • Resolves most of the limitations of CTT
  • First developed in educational measurement, but
    applicable to all types of psychological
    measurement
  • Puts items and people on the same scale

30
Item Response Function (IRF)
  • Also called Item Characteristic Curve (ICC)
  • In the alcohol paper, called a Symptom Response
    Function
  • Plot item responses as a function of total score
  • Total score is the estimate of the trait level

31
IRF
  • IRF is the actual behavior of the item
  • This is what we need to understand
  • Develop statistical models that explain the
    behavior of the items
  • Result Item Response Theory

32
IRF
  • The IRF for items with high discrimination is not
    linear (i.e., not the same slope along the x-axis)
  • Need to employ non-linear models that allow the
    slope to change
  • Ogive
  • Logistic

35
IRT Parameters
  • θ (theta)
  • Replaces the total score with a hypothetical
    trait continuum expressed in a z-score metric
  • a = item discrimination
  • b = item difficulty or severity
  • c = (pseudo-)guessing parameter

36
Item Discrimination
  • Discrimination (a): the slope of the IRF
  • The steeper the slope, the greater the ability of
    the item to differentiate between people
  • Assessed at the difficulty of the item
  • What does that mean?

38
Item Difficulty
  • Difficulty (b): point on the theta continuum
    (x-axis) that corresponds to a 50% probability of
    endorsing the item
  • A more difficult item is located further to the
    right than an easier item

39
Item Difficulty
  • Values are interpreted almost the reverse of CTT
  • Difficulty is in a z-score metric
  • Usually range from -3 to +3
  • Outside of ability/education measurement, often
    called the location parameter

41
Guessing Parameter
  • (Pseudo-)Guessing (c): the y-intercept of the IRF
  • Included because even people with very low
    ability might answer correctly due to chance
  • Probability of correct response never reaches
    zero
  • Usually not included for non-ability traits

42
IRT Models
  • 1-parameter (Rasch) model: only concerned
    with difficulty varying across items
  • 2-parameter model: concerned with difficulty and
    discrimination
  • 3-parameter model: concerned with difficulty,
    discrimination, and guessing
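The three parameters combine in the logistic IRF. A minimal sketch (the parameter values are illustrative; the 1.7 scaling constant used in some texts is omitted):

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """3PL item response function: probability of endorsing or
    answering an item correctly at trait level theta, with
    discrimination a, difficulty b, and pseudo-guessing c.
    Setting c = 0 gives the 2PL model; additionally fixing a
    across items gives the 1PL (Rasch) model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is halfway between c and 1:
p = irf_3pl(theta=1.0, a=1.5, b=1.0, c=0.2)  # 0.2 + 0.8 * 0.5 = 0.6
```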

43
Measuring Adolescent Sexual Behavior
  • Interested in how adolescent sexual behavior
    related to substance abuse and antisocial
    behavior
  • Needed to come up with a way to measure sexual
    behavior

44
Adolescent Sexual Behavior Items
  • Wanted to include both normative and
    non-normative behaviors
  • Started Dating
  • Broken up
  • Sexual Intercourse
  • Intercourse before age 15
  • Afraid Pregnant
  • Pregnant

48
IRT Item and Test Functions
  • IRFs are the backbone of IRT, but give rise to
    other response functions
  • Item Information Function (IIF)
  • Standard Error of Measurement (SEM) function
  • Item functions give rise to TEST functions, which
    are the sum of all item functions that compose
    the test
  • Test Response Function (TRF)
  • Test Information Function (TIF)
  • Test SEM function

49
Item Information Function (IIF)
  • Looks like a hill
  • The higher the hill the more information
  • The peak of the hill is located at the item
    difficulty
  • The steepness of the hill is a function of the
    item discrimination
  • More discriminating items provide more information
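The "hill" has a simple closed form under the 2PL model: I(θ) = a²P(θ)Q(θ). A small sketch with illustrative parameters:

```python
import math

def iif_2pl(theta, a, b):
    """2PL item information: I(theta) = a^2 * P * (1 - P).
    The 'hill' peaks at theta == b (where P = .5, so I = a^2 / 4)
    and is taller for more discriminating items."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# The peak sits at the item difficulty; its height grows with a^2
peak = iif_2pl(0.5, a=1.2, b=0.5)       # maximum for this item
shoulder = iif_2pl(1.5, a=1.2, b=0.5)   # less information off-peak
```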

50
IIFs for Sexual Behavior Items
51
Standard Error of Measurement (SEM) Function
  • Estimate of measurement precision at a given
    theta value
  • SEM = inverse of the square root of the item
    information
  • SEM is smallest at the item difficulty
  • Items with greater discrimination have smaller
    SEM, greater measurement precision

52
Test Information Function (TIF)
  • Sum of all the item information functions
  • Index of how much information a test provides at
    a given trait level
  • The more items at a given trait level the more
    information

53
Test Standard Error of Measurement (SEM) function
  • Inverse of the square root of the test
    information function
  • Index of how well, i.e., how precisely, a test
    measures the trait at a given trait level

56
Target Information Functions
  • Can take any form desired but typically 2 kinds
  • Rectangular
  • Measure all levels of the trait equally well
  • Peaked
  • Want to measure a particular level of the trait
    extremely well and don't care about other ability
    levels
  • Mastery testing: want to be very certain the
    person is above or below a specified cutoff score

58
Invariance of IRT Parameters
  • Difficulty and Discrimination parameters for an
    item are invariant across populations
  • Within a linear transformation
  • That is no matter who you administer the test to,
    you should get the same item parameters
  • However, precision of estimates will differ
  • If there is little variance on an item in a
    sample, parameter estimates will be unstable

59
Computer Adaptive Testing (CAT)
  • In IRT, a person's estimate of true score is not
    a function of the number of items correct
  • Therefore, can use different items to measure
    different people and tailor a test to the
    individual
  • Provides greater
  • Efficiency (fewer items)
  • Control of precision - given adequate items,
    every person can be measured with the same degree
    of precision

60
Components of a CAT system
  • A pre-calibrated bank of test items
  • Need to administer a large group of items to a
    large sample and estimate item parameters
  • An entry point into the item bank
  • i.e., a rule for selecting the first item to be
    administered
  • Item difficulty, e.g., b = 0, -3, or +3
  • Use prior information about examinee

61
Components of a CAT system
  • An item selection or branching rule(s)
  • E.g., if the first item is answered correctly, go
    to a more difficult item
  • If incorrect, go to a less difficult item
  • Always select the most informative item at the
    current estimate of the trait level
  • As responses accumulate, more information is
    gained about the examinee's trait level

63
Components of a CAT system
  • A termination rule
  • Fixed number of items
  • Equiprecision
  • End when the SEM around the examinee's trait
    score has reached a certain level of precision;
    the precision of the test then varies across
    individuals
  • Examinees whose responses are consistent with
    the model will be easier to measure, i.e.,
    require fewer items
  • Equiclassification
  • End when the SEM band around the trait estimate
    falls entirely above or below a cutoff level
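The components above (pre-calibrated bank, entry point, branching rule, termination rule) can be sketched as a toy loop. Everything here is an illustrative simplification, not an operational CAT algorithm: a 2PL bank, a grid-search maximum-likelihood theta update, and an SEM-cutoff termination rule.

```python
import math

def irf(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = irf(theta, a, b)
    return a * a * p * (1.0 - p)

def cat_session(bank, answer, sem_cutoff=0.6, max_items=20):
    """Toy CAT loop over a pre-calibrated 2PL bank.
    bank: list of (a, b) item parameters.
    answer(i): returns the examinee's 0/1 response to item i.
    Entry rule: theta = 0. Branching rule: administer the most
    informative unused item at the current theta estimate.
    Termination: SEM below cutoff, or item budget/bank exhausted."""
    grid = [g / 10.0 for g in range(-40, 41)]  # theta grid, -4..+4
    used, responses = [], []
    theta, sem = 0.0, float("inf")
    while len(used) < min(max_items, len(bank)):
        # branching rule: unused item with maximum information at theta
        i = max((j for j in range(len(bank)) if j not in used),
                key=lambda j: info(theta, *bank[j]))
        used.append(i)
        responses.append(answer(i))
        # crude maximum-likelihood update of theta on the grid
        def loglik(t):
            ll = 0.0
            for j, u in zip(used, responses):
                p = min(max(irf(t, *bank[j]), 1e-9), 1.0 - 1e-9)
                ll += u * math.log(p) + (1 - u) * math.log(1.0 - p)
            return ll
        theta = max(grid, key=loglik)
        sem = 1.0 / math.sqrt(sum(info(theta, *bank[j]) for j in used))
        if sem < sem_cutoff:
            break
    return theta, sem, used
```

With a deterministic examinee who passes exactly the items easier than some true trait level, the loop homes in on that neighborhood with a handful of items.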

66
Assumptions of IRT
  • A single common factor accounts for all the item
    covariances
  • Unidimensionality: there is a single latent trait
  • Local independence: once the common factor is
    removed, the items are uncorrelated
  • Relations between the latent trait and observed
    item responses have a certain form, i.e., the IRF

67
Advantages of IRT over CTT
  • Persons and items are placed on the same scale,
    making it possible to scale persons relative to
    items and vice-versa
  • Item parameters estimated in one sample are
    within a linear transformation of those estimated
    in a different sample
  • Can create large pools of items that have been
    linked, i.e., put onto a common scale
  • Can place person estimates (thetas) from one
    group onto the scale of another group
  • Makes it possible to compare persons measured in
    different groups and with different items

68
Advantages of IRT over CTT
  • IRT trait estimate for an individual is
    independent of the group in which the person was
    measured
  • Also, the observed SEM for the trait estimate is
    independent of the group
  • Can use CAT to design more efficient and
    effective tests

69
Advantages of IRT over CTT
  • Can use test information functions to design
    tests with a specific purpose by selecting items
    with a target information function in mind
  • SEMs vary at different levels of the trait
  • Selecting items that fit the model will result in
    unidimensional measurement
  • The higher the discrimination parameters, the
    more unidimensional the measurement

70
Test Bias in IRT Differential Item Functioning
(DIF)
  • Differences between groups in the probability of
    a correct response to an item for examinees of
    the same trait level

71
DIF
  • First, need to put the two groups on the same
    scale
  • Need a core group of anchor items that function
    the same across the groups
  • Use the anchor items to equate the two groups on
    the trait
  • Compare item parameters across the two groups
    after controlling for the latent trait
  • If item parameters are significantly different,
    the difference is due to group status

72
DIF
  • Uniform DIF
  • Only Difficulty parameters differ across groups
  • Items are still measuring the same construct
    across groups
  • Non-uniform DIF
  • Discrimination parameters differ across groups
  • Item does not measure the same construct (or as
    well) across groups
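The uniform/non-uniform distinction can be illustrated with 2PL IRFs under made-up parameters: equal discriminations with shifted difficulty give group curves that never cross (uniform DIF), while unequal discriminations give curves that cross (non-uniform DIF).

```python
import math

def irf(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Uniform DIF: same discrimination, item harder for the minority
# group; the two curves are shifted but never cross
def maj_uniform(t): return irf(t, a=1.5, b=0.0)
def min_uniform(t): return irf(t, a=1.5, b=0.5)

# Non-uniform DIF: discriminations differ, so the curves cross
# and the item measures the construct differently in the two groups
def maj_nonuni(t): return irf(t, a=2.0, b=0.0)
def min_nonuni(t): return irf(t, a=0.8, b=0.0)
```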

73
[Figure: uniform DIF. IRFs (probability 0 to 1.0 vs. latent trait -3 to
3) for the Majority and Minority groups have equal slopes but shifted
locations; the curves never cross]
74
[Figure: non-uniform DIF. IRFs (probability 0 to 1.0 vs. latent trait
-3 to 3) for the Majority and Minority groups differ in slope, so the
curves cross]
75
Test DIF
  • Can extend item analysis to tests
  • Differential Test Functioning
  • Hard to get every single item to function the
    same across groups
  • Easier to get a test as a whole to function the
    same across groups
  • Some items will be harder while others easier for
    the minority group
  • Examine the Test Response Function
  • If the lines overlap, there is no test bias

76
DIF
  • DIF is an extremely useful and rigorous method
    for studying group differences
  • Sex Differences
  • Race/Ethnic Differences
  • Cross-cultural and Cross-national studies
  • Clinical and non-clinical populations
  • Determine whether differences are an artifact of
    measurement or something different about the
    construct and population

77
DIF Example 2
  • Alcohol Problems from Krueger et al. (2003)
  • Men and Women
  • Test a uniform DIF model
  • 105 total items
  • 43 items showed a significant difference in
    difficulty
  • 21 items more difficult for women, i.e., men more
    likely to endorse
  • 22 items more difficult for men, i.e., women more
    likely to endorse

78
Uniform DIF for Alcohol Problems
  • Which problems are men more likely to exhibit?
  • 7 drinks on 1 occasion; 7 drinks 1x week; 20
    drinks; several times drank to avoid hangover or
    shakes; went on benders; neglected
    responsibilities; 3 binges over 3 days at 1/5
    liquor or 24 beers or 3 bottles of wine on 1
    occasion; stayed drunk through entire day; family
    objected to drinking; arrested b/c of drinking;
    trouble driving b/c of drinking; trouble driving
    several times; fights or physical violence; drank
    after realized had problems; drank in dangerous
    situations; drank before breakfast; tolerance 1
    month; age at first drink

79
Uniform DIF for Alcohol Problems
  • Which problems are women more likely to exhibit?
  • Ever used alcohol; ever been intoxicated;
    depressed; grandiose; calm; relaxed; thought you
    were an excessive drinker; couldn't work when
    intended; needed/depended on alcohol; felt guilty
    about drinking; job or school trouble; rode with
    someone who was high; rode with someone who was
    high or drank in dangerous situation 2x;
    nervous or uptight; couldn't keep from drinking;
    drank when decided not to; wanted to stop but
    couldn't; rules regarding drinking for 1 month
    or several times; emotional problems from
    drinking; emotional problems 1 month; drank
    despite emotional problems; stop drinking for 3
    months; stopped drinking and gone back more than
    1x

80
DIF Alcohol Problems
  • Overall, given the same trait level, women are
    more likely to exhibit an alcohol problem
  • Female alcohol problems are more
    emotional/internalizing
  • Male alcohol problems involve more consumption
    and externalizing behaviors

81
DIF Alcohol Problems
  • Extend to other groups
  • Racial groups
  • Alcoholism different for African-Americans?
    Hispanics? Asians?
  • Countries
  • Alcohol problems in the U.S. different from
    Russia? Ireland? France? Japan?

82
Other Extensions of DIF
  • Can use DIF for any kind of behavior
  • For example, would my adolescent sexual behavior
    items exhibit different properties in different
    countries or cultures?
  • Provides a powerful tool to study any kind of
    group difference (gender, race, culture,
    nationality)
  • WHY? Because DIF is a quantitative technique
    based on testable theory that utilizes empirical
    data
