Title: Upcoming Topics
1Upcoming Topics
- Secular Increases in IQ
- Interventions to increase IQ
- Test bias
- Item Response Theory
- Computer Adaptive Testing
2Flynn Effect
- Secular increase in IQ
- Over the last 60 years, a steady increase in raw scores on IQ tests
- The effect has been documented for 25 different tests in 15 countries (1930-1980)
- Seems to correspond to industrialization
5Flynn Effect
- Mean increase of 5 IQ points
- Somewhat larger increase for non-verbal and culture-reduced tests (e.g., Raven's Progressive Matrices)
- Tests of scholastic content show the least gain
- Biggest effect is at the low end of the IQ distribution
6Flynn Effect
- The increase in IQ scores seems to be an increase in g
- Paradoxically, achievement scores have been declining (e.g., SAT)
- Why has there been an increase in IQ?
- No one knows for sure, but there are many conjectures
7Causes of the Flynn Effect
- Schooling
- More people receive standardized educations extending over longer periods of time
- Greater emphasis on decontextualizing problems or identifying the general principle
- Nutrition and Health Care
- Improves biological and intellectual development
- Similar increases in stature also detected
9Implications of Rise in IQ
- Are we really getting smarter?
- Probably, but can't extrapolate back too many generations
- People do more complex tasks, and some evidence suggests people learn these skills faster
- No one really knows; it's a true puzzle at this point
10Test Bias
- Bias
- Differences between two (or more) groups (usually between a majority and a minority group) in some function of item performance (difficulty or discrimination)
- Bias is not the same as group differences
- Groups can differ on a test; that doesn't make the test biased
11Test Bias
- Test bias is not a matter of opinion
- Quantitative methods exist to establish whether or not a test is biased
- Bias is present only if the item or test properties differ across groups after controlling for overall ability
12Can g be increased?
- To truly increase g, some conditions must be met
- Treatment (T) group scores higher than the Control (C) group on a measure of g
- Broad generalizability: the treatment effect should be evident in many different tests of g
- Practical utility: how did the increase in g affect outcomes in the person's life?
13Interventions to increase g
- Most attempts to increase g focus on early development and low-IQ samples
- Certain educational-psychological interventions increase IQ scores; however:
- Increases typically diminish to zero within 1-2 years
- They do not generalize to multiple tests (so they probably do not increase g)
- Can instruct skills, but not g
14Hierarchical Structure
(figure: hierarchical factor model with the G-factor at the top, broad group factors (Ved, Km) beneath it, then the group factors Verbal, Number, Spatial, and Visualization, and specific factors S1-S8 at the bottom)
15Abecedarian Project
- Intensive intervention to increase IQ
- Began in 1972
- 111 African-American families in Chapel Hill, NC
- Medically healthy but demographically at risk for school failure
- Low parental education/occupation, low family income, low mother's IQ, welfare status, or receipt of special services
- Mean maternal IQ: 84
16Abecedarian Project
- Children attended a specialized day care from infancy to age 5
- Low adult-child ratio: initially 1:3, eventually 1:7
- Stable, professional staff
- Played simple games that focused on language development and exposure to what might be intellectually stimulating
- Control group received nutritional supplements and some social services
17Abecedarian Findings
- Average difference between T and C groups:
- Infancy through age 5: 7.8 IQ points
- Biggest difference at age 3: 17 IQ points
- Ages 8 and 12: 5 IQ points
- Age 21: 4.6 IQ points
18Achievement Gains
- Age 12, IQ below 85: T 12.8%, C 44.2%
- Age 15
- ½ as many of the T group in special-ed classes
- Repeated a grade: T 28%, C 55%
- Age 21
- Gone to a 4-year college: T 35%, C 14%
- Good job or in college: T 65%, C 40%
- Fewer in the T group had children, and they had them at an older age
19Interventions to Increase g
- The Abecedarian intervention certainly had a big effect for the participants
- However, the increase in g required prolonged and intensive effort and resulted in only a modest effect
- What are the key elements?
- Probably a multitude of many small effects that add up over the course of years
- Very early intervention is important
20Group Differences in IQ scores
- Since the early 1900s there has been a 1 SD difference in mean IQ scores between Whites and African-Americans
- Means for racial groups:
- African-American: 85
- Hispanic: 90
- White: 100
- East Asian, Jewish: 105-110 (?)
21Group differences in IQ
- Intensively researched
- Basic conclusion
- Mean differences are NOT due to the tests
- Evidenced across different types of tests, including non-verbal and culture-reduced tests
- Mean differences are also not due to differences in SES
22Test Bias in CTT
- Based on Linear Regression
- Slope Bias
- When the regression coefficient of a test
predicting a criterion is different for the two
groups - Differential Validity
- The test is more valid in one group
- May not be measuring the same construct across
groups
23IDENTICAL REGRESSIONS
(figure: criterion score Y plotted against test score X; Majority (B) and Minority (A) fall on a single shared regression line - no bias)
24Test Bias in CTT
- Intercept Bias
- When a test systematically underpredicts or overpredicts criterion performance for a particular group
- Same slope, different intercept
- Same regression (validity) coefficient
- Using the same regression line for both groups will result in bias
25SLOPE BIAS
(figure: criterion score Y vs. test score X; the Majority (B) and Minority (A) regression lines have different slopes)
26INTERCEPT BIAS CASE 1
(figure: criterion score Y vs. test score X; Majority (B) and Minority (A) lines have the same slope but different intercepts; a single regression line underpredicts the criterion for the Minority group)
27INTERCEPT BIAS CASE 2
(figure: criterion score Y vs. test score X; Majority (B) and Minority (A) lines have the same slope but different intercepts; a single regression line overpredicts the criterion for the Minority group)
28Intercept Bias
- Intercept bias is the most common form of test bias
- However, test bias is not rampant
- Case 1 is sometimes found between men and women
- Case 2 is sometimes found between ethnic and racial groups
- Bias can appear if the two groups differ on additional variables correlated with both the test and the criterion
- Including those additional variables as predictors will reduce the bias (see the regression sketch below)
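A minimal sketch (not from the slides) of how slope and intercept bias can be checked with one moderated regression: the criterion is regressed on the test score, a group indicator, and their interaction. The variable names and simulated data below are hypothetical, purely to illustrate the idea.

import numpy as np
import statsmodels.api as sm

def check_bias(test, criterion, group):
    # Moderated regression: criterion ~ test + group + test*group.
    # A significant group coefficient (after controlling for the test score)
    # suggests intercept bias; a significant test-by-group interaction
    # suggests slope bias / differential validity.
    X = sm.add_constant(np.column_stack([test, group, test * group]))
    return sm.OLS(criterion, X).fit()

# Hypothetical simulated data with built-in intercept bias
rng = np.random.default_rng(0)
n = 300
group = rng.integers(0, 2, n)                      # 0 = majority, 1 = minority
test = rng.normal(100, 15, n)
criterion = 0.5 * test + 5 * group + rng.normal(0, 5, n)
print(check_bias(test, criterion, group).summary())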
29Item Response Theory
- Also called Latent Trait Theory
- Resolves most of the limitations of CTT
- First developed in educational measurement, but
applicable to all types of psychological
measurement - Puts items and people on the same scale
30Item Response Function (IRF)
- Also called Item Characteristic Curve (ICC)
- In the alcohol paper it is called a Symptom Response Function
- Plot item responses as a function of total score
- Total score is the estimate of the trait level
31IRF
- IRF is the actual behavior of the item
- This is what we need to understand
- Develop statistical models that explain the behavior of the items
- Result: Item Response Theory
32IRF
- The IRFs for items with high discrimination are not linear (i.e., they do not have the same slope along the x-axis)
- Need to employ non-linear models that allow the slope to change
slope to change - Ogive
- Logistic
35IRT Parameters
- θ (theta)
- Replaces total score with the hypothetical trait continuum, expressed in a z-score metric
- a: item discrimination
- b: item difficulty or severity
- c: (pseudo-) guessing parameter
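The slides list the parameters but not the model itself. For reference, the standard three-parameter logistic IRF that these parameters belong to is

P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}

so the curve rises from a lower asymptote of c_i toward 1, is centered at \theta = b_i, and has its steepness there governed by a_i.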
36Item Discrimination
- Discrimination (a): the slope of the IRF
- The steeper the slope, the greater the ability of the item to differentiate between people
- Assessed at the difficulty of the item
- What does that mean?
38Item Difficulty
- Difficulty (b): the point on the theta continuum (x-axis) that corresponds to a 50% probability of endorsing the item
- A more difficult item is located further to the right than an easier item
39Item Difficulty
- Values are interpreted almost in reverse of CTT
- Difficulty is in a z-score metric
- Values usually range from -3 to 3
- Outside of ability/education measurement, b is often called the location parameter
41Guessing Parameter
- (Pseudo-) Guessing (c): the y-intercept (lower asymptote) of the IRF
- Included because even people with very low
ability might answer correctly due to chance - Probability of correct response never reaches
zero - Usually not included for non-ability traits
42IRT Models
- 1-parameter (Rasch) model: only difficulty varies across items
- 2-parameter model: difficulty and discrimination
- 3-parameter model: difficulty, discrimination, and guessing
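A minimal sketch (assuming NumPy; the parameter values are made up) showing how the three models are nested versions of one response function:

import numpy as np

def irf(theta, a=1.0, b=0.0, c=0.0):
    # 3PL item response function: P(theta) = c + (1 - c) / (1 + exp(-a(theta - b))).
    # c = 0 gives the 2PL; c = 0 with a common a for all items gives the 1PL/Rasch model.
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)
print(irf(theta, b=0.5))                 # Rasch-style item (a fixed at 1, no guessing)
print(irf(theta, a=1.8, b=0.5))          # 2PL item with higher discrimination
print(irf(theta, a=1.8, b=0.5, c=0.2))   # 3PL item with a guessing floor of .20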
43Measuring Adolescent Sexual Behavior
- Interested in how adolescent sexual behavior relates to substance abuse and antisocial behavior
- Needed to come up with a way to measure sexual behavior
44Adolescent Sexual Behavior Items
- Wanted to include both normative and
non-normative behaviors - Started Dating
- Broken up
- Sexual Intercourse
- Intercourse before age 15
- Afraid Pregnant
- Pregnant
48IRT Item and Test Functions
- IRFs are the backbone of IRT, but give rise to
other response functions - Item Information Function (IIF)
- Standard Error of Measurement (SEM) function
- Item functions give rise to TEST functions, which
are the sum of all item functions that compose
the test - Test Response Function (TRF)
- Test Information Function (TIF)
- Test SEM function
49Item Information Function (IIF)
- Looks like a hill
- The higher the hill, the more information
- The peak of the hill is located at the item difficulty
- The steepness of the hill is a function of the item discrimination
- More discriminating items provide more information
50IIFs for Sexual Behavior Items
51Standard Error of Measurement (SEM) Function
- Estimate of measurement precision at a given theta value
- SEM = inverse of the square root of the item information
- SEM is smallest at the item difficulty
- Items with greater discrimination have smaller SEM, i.e., greater measurement precision
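A short sketch of these relationships for a 2PL item, using the standard result I(θ) = a²P(θ)[1 − P(θ)] (the 3PL version is slightly more involved); the parameter values are made up for illustration.

import numpy as np

def item_information(theta, a, b):
    # 2PL item information: I(theta) = a^2 * P(theta) * (1 - P(theta))
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def item_sem(theta, a, b):
    # SEM contributed by a single item: 1 / sqrt(information)
    return 1.0 / np.sqrt(item_information(theta, a, b))

theta = np.linspace(-3, 3, 121)
info = item_information(theta, a=2.0, b=0.5)
print(theta[np.argmax(info)])                                     # the "hill" peaks at b = 0.5
print(item_sem(0.5, a=2.0, b=0.5), item_sem(0.5, a=1.0, b=0.5))   # higher a -> smaller SEM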
52Test Information Function (TIF)
- Sum of all the item information functions
- Index of how much information a test is providing at a given trait level
- The more items at a given trait level, the more information
53Test Standard Error of Measurement (SEM) function
- Inverse of the square root of the test information function
- Index of how well (i.e., how precisely) a test measures the trait at a given trait level
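Written out, the two test-level quantities the last two slides describe are

I_{\text{test}}(\theta) = \sum_{i} I_i(\theta), \qquad \mathrm{SEM}_{\text{test}}(\theta) = \frac{1}{\sqrt{I_{\text{test}}(\theta)}}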
56Target Information Functions
- Can take any form desired, but typically 2 kinds
- Rectangular
- Measure all levels of the trait equally well
- Peaked
- Want to measure a particular level of the trait extremely well and don't care about other ability levels
- Mastery testing: want to be very certain the person is above or below a specified cutoff score
58Invariance of IRT Parameters
- Difficulty and discrimination parameters for an item are invariant across populations
- Within a linear transformation
- That is, no matter who you administer the test to, you should get the same item parameters
- However, the precision of the estimates will differ
- If there is little variance on an item in a sample, the parameter estimates will be unstable
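Concretely, "within a linear transformation" means that if the trait metric is rescaled as \theta^* = A\theta + B, the item parameters move with it while the response curves themselves are unchanged:

a_i^* = a_i / A, \qquad b_i^* = A\,b_i + B, \qquad c_i^* = c_i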
59Computer Adaptive Testing (CAT)
- In IRT, a person's estimated trait score is not a function of the number of items correct
- Therefore, different items can be used to measure different people, tailoring the test to the individual
- Provides greater:
- Efficiency (fewer items)
- Control of precision: given adequate items, every person can be measured with the same degree of precision
60Components of a CAT system
- A pre-calibrated bank of test items
- Need to administer a large group of items to a large sample and estimate the item parameters
- An entry point into the item bank
- i.e., a rule for selecting the first item to be administered
- Item difficulty, e.g., b = 0, -3, or 3
- Use prior information about the examinee
61Components of a CAT system
- An item selection or branching rule(s)
- E.g., if the first item is answered correctly, go to a more difficult item
- If incorrect, go to a less difficult item
- Always select the most informative item at the current estimate of the trait level
- As responses accumulate, more information is gained about the examinee's trait level
63Components of a CAT system
- A termination rule
- Fixed number of items
- Equiprecision
- End when the SEM around the examinee's trait score has reached a certain level of precision (with a fixed-length test, precision varies across individuals)
- Examinees whose responses are consistent with the model will be easier to measure, i.e., require fewer items
- Equiclassification
- End when the SEM band around the trait estimate is entirely above or below a cutoff level
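A minimal simulation sketch (assuming NumPy, 2PL items, and a hypothetical item bank) stringing the pieces together: an entry point at θ = 0, maximum-information item selection, a simple grid-search maximum-likelihood scoring step, and an equiprecision stopping rule. Real CAT systems typically use better estimators (e.g., EAP) and item-exposure controls.

import numpy as np

def p2pl(theta, a, b):
    # 2PL probability of a correct/endorsed response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def cat_simulation(a, b, true_theta, sem_target=0.30, seed=0):
    # a, b: arrays of item parameters for a pre-calibrated bank.
    # Entry rule: start at theta = 0.  Selection rule: administer the unused
    # item with maximum information at the current theta estimate.
    # Termination: stop when the SEM falls below sem_target (equiprecision)
    # or the bank is exhausted.
    rng = np.random.default_rng(seed)
    grid = np.linspace(-4, 4, 161)
    used, responses, theta_hat = [], [], 0.0
    while True:
        p = p2pl(theta_hat, a, b)
        info = a ** 2 * p * (1.0 - p)
        info[used] = -np.inf                       # never re-administer an item
        item = int(np.argmax(info))
        responses.append(rng.random() < p2pl(true_theta, a[item], b[item]))
        used.append(item)
        # grid-search ML estimate of theta from the responses so far
        loglik = np.zeros_like(grid)
        for i, r in zip(used, responses):
            pg = p2pl(grid, a[i], b[i])
            loglik += np.log(pg) if r else np.log(1.0 - pg)
        theta_hat = grid[np.argmax(loglik)]
        pu = p2pl(theta_hat, a[used], b[used])
        sem = 1.0 / np.sqrt(np.sum(a[used] ** 2 * pu * (1.0 - pu)))
        if sem < sem_target or len(used) == len(a):
            return theta_hat, sem, len(used)

# Hypothetical 50-item bank
rng = np.random.default_rng(1)
bank_a = rng.uniform(0.8, 2.5, 50)
bank_b = rng.uniform(-2.5, 2.5, 50)
print(cat_simulation(bank_a, bank_b, true_theta=1.0))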
66Assumptions of IRT
- A single common factor accounts for all the item covariances
- Unidimensionality: there is a single latent trait
- Local independence: if the common factor is removed, the items are uncorrelated
- Relations between the latent trait and observed item responses have a certain form, i.e., the IRF
67Advantages of IRT over CTT
- Persons and items are placed on the same scale,
making it possible to scale persons relative to
items and vice-versa - Item parameters estimated in one sample are
within a linear transformation of those estimated
in a different sample - Can create large pools of items that have been
linked, i.e., put onto a common scale - Can place person estimates (thetas) from one
group onto the scale of another group - Makes it possible to compare persons measured in
different groups and with different items
68Advantages of IRT over CTT
- IRT trait estimate for an individual is
independent of the group in which the person was
measured - Also, the observed SEM for the trait estimate is
independent of the group - Can use CAT to design more efficient and
effective tests
69Advantages of IRT over CTT
- Can use test information functions to design
tests with a specific purpose by selecting items
with a target information function in mind - SEMs vary at different levels of the trait
- Selecting items that fit the model will result in unidimensional measurement
- The higher the discrimination parameters, the more unidimensional the measurement
70Test Bias in IRT: Differential Item Functioning (DIF)
- Differences between groups in the probability of
a correct response to an item for examinees of
the same trait level
71DIF
- First, need to put the two groups on the same scale
- Need a core set of anchor items that function the same across the groups
- Use the anchor items to equate the two groups on the trait
- Compare item parameters across the two groups after controlling for the latent trait
- If the item parameters differ significantly, the item functions differently by group status (DIF)
72DIF
- Uniform DIF
- Only Difficulty parameters differ across groups
- Items are still measuring the same construct
across groups - Non-uniform DIF
- Discrimination parameters differ across groups
- Item does not measure the same construct (or as
well) across groups
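The slides describe comparing IRT item parameters across groups; a common, simpler screen for the same distinction (not necessarily the method used here) is logistic-regression DIF, where a group main effect after matching on the trait flags uniform DIF and a trait-by-group interaction flags non-uniform DIF. A sketch, assuming statsmodels and hypothetical variable names:

import numpy as np
import statsmodels.api as sm

def logistic_dif(item, matching, group):
    # Logistic-regression DIF screen (Swaminathan & Rogers style):
    # item     : 0/1 responses to the studied item
    # matching : matching variable (rest score or trait estimate)
    # group    : 0 = reference/majority, 1 = focal/minority
    # After matching, a significant group effect flags uniform DIF;
    # a significant matching-by-group interaction flags non-uniform DIF.
    X = sm.add_constant(np.column_stack([matching, group, matching * group]))
    fit = sm.Logit(item, X).fit(disp=0)
    return fit.params, fit.pvalues   # order: const, matching, group, interaction

# Hypothetical simulated item with uniform DIF (group shifts difficulty only)
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)
theta = rng.normal(0.0, 1.0, n)
p = 1.0 / (1.0 + np.exp(-(1.2 * theta - 0.6 * group)))
item = (rng.random(n) < p).astype(int)
print(logistic_dif(item, theta, group))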
73Uniform DIF
(figure: item response curves, probability (0 to 1.0) vs. latent trait (-3 to 3), for the Majority and Minority groups; under uniform DIF the curves have the same slope but are shifted along the trait axis)
74Non-Uniform DIF
(figure: item response curves, probability (0 to 1.0) vs. latent trait (-3 to 3), for the Majority and Minority groups; under non-uniform DIF the curves differ in slope)
75Test DIF
- Can extend the item analysis to whole tests
- Differential Test Functioning
- Hard to get every single item to function the same across groups
- Easier to get the test as a whole to function the same across groups
- Some items will be harder, while others will be easier, for the minority group
- Examine the Test Response Functions
- If the lines overlap, there is no test bias
76DIF
- DIF is an extremely useful and rigorous method for studying group differences
- Sex Differences
- Race/Ethnic Differences
- Cross-cultural and Cross-national studies
- Clinical and non-clinical populations
- Determine whether differences are an artifact of
measurement or something different about the
construct and population
77DIF Example 2
- Alcohol Problems from Krueger et al. (2003)
- Men and Women
- Test a uniform DIF model
- 105 total items
- 43 items showed a significant difference in difficulty
- 21 items were more difficult for women, i.e., men more likely to endorse
- 22 items were more difficult for men, i.e., women more likely to endorse
78Uniform DIF for Alcohol Problems
- Which problems are men more likely to exhibit?
- 7 drinks on 1 occasion; 7 drinks 1x week; 20 drinks several times; drank to avoid hangover or shakes; went on benders; neglected responsibilities; 3 binges over 3 days; 1/5 liquor or 24 beers or 3 bottles of wine on 1 occasion; stayed drunk through entire day; family objected to drinking; arrested b/c of drinking; trouble driving b/c of drinking; trouble driving several times; fights or physical violence; drank after realized had problems; drank in dangerous situations; drank before breakfast; tolerance 1 month; age at first drink
79Uniform DIF for Alcohol Problems
- Which problems are women more likely to exhibit?
- Ever used alcohol; ever been intoxicated; depressed; grandiose; calm; relaxed; thought you were an excessive drinker; couldn't work when intended; needed/depended on alcohol; felt guilty about drinking; job or school trouble; rode with someone who was high; rode with someone who was high or drank in dangerous situation 2x; nervous or uptight; couldn't keep from drinking; drank when decided not to; wanted to stop but couldn't; rules regarding drinking for 1 month or several times; emotional problems from drinking; emotional problems 1 month; drank despite emotional problems; stop drinking for 3 months; stopped drinking and gone back more than 1x
80DIF Alcohol Problems
- Overall, given the same trait level, women are more likely to exhibit an alcohol problem
- Female alcohol problems are more emotional/internalizing
- Male alcohol problems involve more consumption and externalizing behaviors
81DIF Alcohol Problems
- Extend to other groups
- Racial groups
- Alcoholism different for African-Americans?
Hispanics? Asians? - Countries
- Alcohol problems in the U.S. different from
Russia? Ireland? France? Japan?
82Other Extensions of DIF
- Can use DIF for any kind of behavior
- For example, would my adolescent sexual behavior
items exhibit different properties in different
countries or cultures? - Provides a powerful tool to study any kind of
group difference (gender, race, culture,
nationality) - WHY? Because DIF is an Quantitative technique
based on testable theory that utilizes empirical
data