Title: Unit 2 Review
1. Unit 2 Review
2. Theories of Intelligence
- Two classical theories of intelligence
  - Spearman's g, or two-factor, theory
  - Thurstone's 9 Primary Mental Abilities
- Guilford's Structure of Intellect is another multi-factor theory of intelligence
- Also known as the psychometric theories of intelligence because of their reliance on data relationships
3. Chapter 7
4. Spearman
- Developed the first formal theory about human mental ability
- One general factor, g, accounted for correlations among tests of simple sensory functions
- Each test also had a specific component, s, unique to that test, plus error
5. Thurstone's Primary Mental Abilities
- S: Spatial
- P: Perceptual (esp. speed of visual perception)
- N: Numerical (speed and accuracy of computation)
- V: Verbal
- M: Memory
- W: Words (word fluency or disarranged words)
- I: Induction (finding a rule)
- R: Reasoning (arithmetic)
- D: Deduction (application of a rule)
6. Hierarchical Models
- Are a compromise in the one vs. many argument
- Acknowledge there are many separate abilities, but these can be arranged so that only a few dominant factors are at the top of the hierarchy.
- Include Cattell (crystallized and fluid), Vernon (verbal-educational and spatial-mechanical), and Carroll (three-stratum theory with g as the top stratum)
7. Other Theories
- Developmental (e.g., Piaget)
- Information processing: based on elementary cognitive tasks (ECTs), such as reaction time (e.g., Jensen, Sternberg)
- Biological theories (e.g., Gardner's theory of multiple intelligences)
8. Differences by Sex
- Differences minimal on total scores
- Males outperform females on tests of spatial ability (effect size of .5-.7)
- Females outperform males on verbal tests during childhood and much of adolescence
- Greater variability in intelligence for males
9. Differences by Racial/Ethnic Group
- Compared to whites:
  - Blacks are about 1 SD below
  - Hispanics and Native Americans are .5-1 SD below on verbal and at the mean on performance tests
  - Asians have a similar verbal mean and are about 1 SD above on non-verbal tests
10. Heredity and Environment
- Intelligence results from an interaction of heredity and environment
- Estimates of heritability range from .4 to .8 (median of about .5 or .6)
- Heritability increases with age
- g has a higher heritability index than specific abilities
11. Correlation of IQ scores, from Bouchard & McGue (1981)
- Identical twins reared together: .86
- Identical twins reared apart: .72
- Same-sex fraternal twins reared together: .62
- Opposite-sex fraternal twins reared together: .57
- Non-twin siblings reared together: .47
- Unrelated (adopted) siblings reared together: .30
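The twin correlations above can be tied back to the heritability range on the previous slide with a rough calculation. The sketch below uses Falconer's formula, h2 = 2(r_MZ - r_DZ), which is an assumption brought in for illustration and is not named in the slides. The function name is hypothetical, and the formula is only a crude estimator.

```python
def falconer_heritability(r_identical, r_fraternal):
    """Rough heritability estimate from twin correlations (Falconer's formula).

    Assumption: this estimator is not stated in the slides; it is a common
    textbook approximation and ignores many complications.
    """
    return 2 * (r_identical - r_fraternal)


# Correlations for twins reared together, taken from the slide.
same_sex = falconer_heritability(0.86, 0.62)
opposite_sex = falconer_heritability(0.86, 0.57)
print(round(same_sex, 2), round(opposite_sex, 2))
```

Both results fall inside the .4 to .8 range quoted for heritability estimates.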
12. Individual Tests of Intelligence
13. Common Characteristics of Individual Intelligence Tests
- individually administered
- administration requires advanced training
- tests cover wide range of age and ability
- examiner must establish rapport
- immediate scoring of items
- usually requires about one hour
- allows opportunity for observation
14. Two Main Individually Administered Intelligence Tests
- Stanford-Binet
  - Binet wanted to create a process for identifying intellectually limited children so they could be removed from the regular classroom and placed in special education.
- Wechsler scales
  - Developed in response to the perceived shortcomings of the Stanford-Binet
15. Early Binet Scales
- 1905: 30 items ordered by difficulty. Test lacked:
  - adequate measuring units to express results (only used idiot, imbecile, and moron)
  - adequate normative data (only used 50 subjects)
  - evidence of validity
- 1908: Grouped items according to age level rather than simply increasing difficulty. Introduced the concept of mental age.
  - Increased norm group to 203.
  - Criticized because it produced only one score, almost exclusively related to verbal, language, and reading ability
16. Modern Binet scale
- Totally revised in 1986 by Thorndike et al.
- Used Thurstone's multidimensional model (1938)
- g made up of crystallized abilities (verbal and quantitative reasoning), fluid-analytic abilities (abstract-visual reasoning), and short-term memory.
17. Structure of the SB-IV
- Verbal reasoning included vocabulary test, comprehension test, absurdities test, and verbal relations test.
- Abstract-visual reasoning included pattern analysis test, copying test, matrices test, and paper-folding and cutting test.
- Quantitative reasoning included quantitative test, number series test, and equation-building test.
- Short-term memory included bead memory, memory for sentences, memory for digits, and memory for objects.
- Composite included all areas combined.
18. Psychometric properties of SB-IV
- Standardization sample stratified based on the 1980 census: geographic region, community size, ethnic group, age, and gender.
- Internal consistency reliability is .98 for the composite and .93-.97 for area scores. Some individual test scores are lower; .73 for memory for objects is the lowest.
- Test-retest reliabilities for the composite score were .91 and .90 for 5- and 8-year-olds.
- Factor analysis supports the structure of the test.
- Correlations with other IQ tests are generally in the .70s and .80s.
19. Wechsler Scales
- David Wechsler worked at NY's Bellevue Hospital. He wasn't happy with the Stanford-Binet, with its focus on children and its production of a single score.
- In 1939, he created the Wechsler-Bellevue, later called the WAIS.
- In 1949, he created the children's version, the WISC.
- In 1967, he added the WPPSI for children ages 2.5-7.
20. Structure of the WAIS
- The WAIS yields separate verbal and performance IQs.
- The WAIS-III has four index scores: verbal comprehension, working memory, perceptual organization, and processing speed.
21. Scales and Norms for the WAIS
- Determine raw score for each subtest.
- Convert raw scores to standard scores, called scaled scores (M = 10, SD = 3)
- There are conversions for 13 age groups. This method of conversion obscures any differences in performance by age.
- Subtest scaled scores are added, then converted to WAIS-III composite scores.
- Three composite scores: verbal, performance, full scale, each with M = 100, SD = 15
- Four index scores: verbal comprehension, perceptual organization, working memory, processing speed
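The two score metrics on this slide can be illustrated with a small sketch. It assumes a simple linear rescaling from a z-score, whereas the actual WAIS conversions come from age-group norm tables. The function names and the norm-group mean and SD in the usage lines are hypothetical.

```python
def to_standard_score(raw, norm_mean, norm_sd, scale_mean, scale_sd):
    """Linearly rescale a raw score onto a metric with the given mean and SD.

    Assumption: a linear z-score transformation stands in for the
    published norm tables.
    """
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


def subtest_scaled_score(raw, norm_mean, norm_sd):
    """Scaled-score metric from the slide: M = 10, SD = 3."""
    return round(to_standard_score(raw, norm_mean, norm_sd, 10, 3))


def composite_score(z):
    """Composite metric from the slide: M = 100, SD = 15."""
    return round(100 + 15 * z)


# Hypothetical age-group norms: raw-score mean 24, SD 6.
print(subtest_scaled_score(30, 24, 6))  # one SD above the age-group mean
print(composite_score(-1.0))            # one SD below the mean
```

Because each raw score is compared only with its own age group's norms, two examinees of different ages can earn the same scaled score from different raw scores, which is why the slide notes that age differences are obscured.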
22. Reliability of the WAIS
- Internal consistency and test-retest reliabilities are about .95 or higher for full scale and verbal scores.
- They're about .90 for performance and the three other index scores: perceptual organization, working memory, and processing speed.
- Internal consistency reliabilities for the subtests range from the upper .70s to the low .90s. Test-retest is about .83.
- Generally, performance reliabilities are lower than verbal reliabilities on the subtests.
23. Validity of the WAIS
- Great deal of information on criterion-related and construct validity.
- Factor analyses support use of the 4 index scores.
- Comparison studies show the pattern of WAIS-III scores for many special groups, e.g., Alzheimer's disease, Parkinson's, learning disabled, brain injury.
- Is the most widely used test today
24. Group Differences in IQ
- Test scores that demonstrate differences among people may suggest that people are not created with the same basic abilities.
- Biggest problem: some ethnic groups obtain lower average scores on some psychological tests. On average, African Americans score 15 points lower than whites on IQ tests.
- Dispute is not whether differences occur but why they occur: environment vs. biology
25. Problems with Biology Argument
- IQ scores are improving (called the Flynn effect), more so for African Americans than whites.
- Victimization by stereotyping could affect test performance and grades.
- Construct of race has no biological meaning, based on evidence from studies in population genetics, the human genome, and physical anthropology.
26. Criticisms related to Content Validity
- Looking at specific items, it was thought that they might be biased because some children wouldn't have the opportunity to learn the material
- Members of ethnic groups might answer some items differently but still correctly
- Scores affected by language skills inculcated as part of a white, middle-class upbringing foreign to inner-city children
27. Responses to Content Validity Criticisms
- Some evidence suggests that the linguistic bias in standardized tests does not cause the observed differences (Scheuneman, 1987).
- Elimination of biased items from a test didn't change the test scores (Bianchini, 1976).
- Can't find classes of items most likely to be missed by minority group members (Wild et al., 1989)
28. Group Tests of Mental Ability
29. Characteristics of Group Mental Ability Tests
- Administered to a large group
- Composed of multiple-choice items that can be machine-scored
- Content similar to individual tests
- Fixed time limit and number of items
- Usually yield a total score and some subscores
- Principal purpose is prediction
30. Advantages of individual tests
- Provide information beyond the test score
- Allow the examiner to observe behavior in a standard setting
- Allow individualized interpretation of test scores
31. Advantages of group tests
- Are cost-efficient
- Minimize professional time for administration and scoring
- Require less examiner skill and training
- Have more objective and more reliable scoring procedures
- Have a very broad application
- Group tests far outnumber individual tests, and group tests vary widely among themselves
32. Scoring Information for the OLSAT7
- Yields Verbal, Nonverbal, and Total scores
- Converted to School Ability Index (SAI) with M = 100 and SD = 16
- SAIs determined separately for age groups at 3-month intervals from ages 5-19
- Score reports also include anticipated achievement comparisons (AAC) to predict performance on the Stanford tests
33. Psychometric Properties
- About half a million cases are part of the research base for the OLSAT7
- High internal consistency, with nothing lower than .87 for total score (higher at higher grades)
- KR-20 for Verbal and Nonverbal in the high .80s for upper grades and low .80s for lower grades
- No test-retest reliability data
- High correlations between the OLSAT7 and the Stanford tests, but other validity evidence is weak
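For reference, the KR-20 coefficient mentioned above is an internal consistency index for right/wrong items. The sketch below shows the standard Kuder-Richardson formula 20 on a tiny made-up response matrix. The data are hypothetical, and the use of the population variance (dividing by n) is an assumption, since textbooks differ on that detail.

```python
def kr20(responses):
    """KR-20 for dichotomous (0/1) items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Formula: (k / (k - 1)) * (1 - sum(p * q) / variance of total scores).
    Assumption: total-score variance is computed with n in the denominator.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 examinees, 3 items.
data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(kr20(data), 2))
```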
34. College Admissions: The SAT
- The College Board oversees the development of the test called the SAT
- ETS (Educational Testing Service) actually develops the SAT
- The SAT is a cluster of tests
  - SAT I includes the well-known Verbal and Math tests
  - SAT II has tests in 12 subject fields
35. SAT I Structure
- Includes verbal (SAT-V) and math (SAT-M), summed to get total score
- Uses correction for guessing
- Range for each subtest is 200-800 with M = 500 and SD = 100. Total M = 1000 and SD = 200
- Norms based on test users, not any well-defined, predetermined population
- Scaled score norms last determined in 1994. Percentile norms adjusted on an annual basis.
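The correction for guessing mentioned above is usually expressed as a formula score: rights minus a fraction of wrongs, with omitted items ignored. The sketch below assumes the standard form R - W/(k - 1), where k is the number of answer options per item. The function name is hypothetical, and the five-option value in the usage lines is an assumption not stated in the slides.

```python
def formula_score(num_right, num_wrong, options_per_item):
    """Correction for guessing: rights minus wrongs / (options - 1).

    Omitted items count as neither right nor wrong.
    """
    return num_right - num_wrong / (options_per_item - 1)


# Assumed five-option items: each wrong answer costs a quarter point.
print(formula_score(40, 8, 5))

# Blind guessing on 20 five-option items is expected to yield about
# 4 right and 16 wrong, which the correction brings back to zero.
print(formula_score(4, 16, 5))
```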
36. Reliability and validity of the SAT
- Internal consistencies of .91-.93
- SEMs of about 30 points
- Poor predictive power regarding grades of students scoring in middle ranges
- Number of English or math units doesn't correlate significantly (maybe due to coaching)
- Validity coefficients are about .40 with 1st-year grades
- On the old SAT, African-American and Latino students scored lower, sometimes by as much as 80 points. New test MAY have reduced that.
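The reliability and SEM figures on this slide are linked by the usual formula SEM = SD * sqrt(1 - r). The sketch below applies it to the SAT subtest SD of 100 and the internal consistencies of .91 and .93 given in these slides. The function name is hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


sat_subtest_sd = 100  # from the SAT I Structure slide
for r in (0.91, 0.93):
    sem = standard_error_of_measurement(sat_subtest_sd, r)
    print(r, round(sem, 1))
```

The results land at roughly 26 to 30 points, in line with the "about 30 points" on the slide.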
37. The ACT
- ACT provides more emphasis on school-based skills
- Has scores for English, Math, Reading, Science Reasoning, and a Composite, which is an average of the 4 tests
- Does NOT use a correction for guessing
- Score range is 1-36 with M = 20 and SD = 5
38. Psychometric properties of the ACT
- Norms based on users, usually about a million annually
- Reliabilities range from .84-.91, with the Composite score reliability being .96
- SEMs are 1.5-2 points for each subtest and 1 point for the Composite score
- Correlates about .80 with SAT
- As with the SAT, high school GPA is generally as good a predictor as test scores
39. Graduate and Professional School Selection: The GRE
- Includes General Test, Subject Test, and Writing Assessment
- General Test includes Verbal, Quantitative, and Analytical
- General Test intended to measure developed abilities that have been acquired over a long period of time
- Gradually moving to computer adaptive testing
40. GRE Scores and Norms
- Scaled score set at M = 500 and SD = 100, with a range of 200-800
- Like the SAT, average scores have gradually drifted downward
- Norms are user norms
- Internal consistency is in the low .90s
- SEMs are about 30-40 points
41. Validity of the GRE
- GRE tests correlate with first-year GPA in the range of the mid-.20s to low .30s.
- Tests correlate with each other. Lowest correlation is V-Q at .45; highest is Q-A at .66
- Undergraduate GPA is a better predictor than any of the tests and about equal to total test score in predictive validity.
42. Raven's Progressive Matrices
- Designed to measure the fluid dimension of intelligence; may be the best measure of g
- Not used more widely because:
  - too many manuals and norm groups
  - conflicting evidence about what the test is measuring
  - hasn't really eliminated differences between majority and minority group examinees
43. Achievement Tests
44. Achievement vs. Aptitude tests
- Achievement tests
  - Evaluate the effects of a known or controlled set of experiences
  - Evaluate the product of a course of training
  - Rely heavily on content validation procedures
- Aptitude tests
  - Evaluate the effect of an unknown, uncontrolled set of experiences
  - Evaluate the potential to profit from a course of training
  - Rely heavily on predictive criterion validation procedures
45. Classification of Achievement Tests
- Achievement battery
- Single area achievement tests
- Certification, licensing exams
- State, national, international tests
- Psycho-educational batteries
- Teacher-made tests
46. Establishing cutscores
- Norm-referenced approach
  - select a percentile and everyone above that point is in
- Criterion-referenced approach
  - many approaches
  - Most popular approach is Angoff's, where a judge looks at each item and assesses the probability of a minimally competent person getting it right. Probabilities are summed to get the total score, which serves as the cutscore.
  - Original judgments often changed after results are seen
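The Angoff procedure described above reduces to simple arithmetic. The sketch below sums one judge's item probabilities and, as an added assumption not stated in the slide, averages those sums across judges when there is more than one. The function name and the ratings are hypothetical.

```python
def angoff_cutscore(ratings_by_judge):
    """Angoff cutscore from judges' item-level probability ratings.

    ratings_by_judge: one list per judge, holding the judged probability
    that a minimally competent examinee answers each item correctly.
    Each judge's probabilities are summed; averaging across judges is an
    assumption of this sketch.
    """
    per_judge_totals = [sum(ratings) for ratings in ratings_by_judge]
    return sum(per_judge_totals) / len(per_judge_totals)


# Hypothetical ratings from two judges on a four-item test.
ratings = [
    [0.5, 0.75, 0.25, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
print(angoff_cutscore(ratings))
```

With these made-up ratings the judges' sums are 2.5 and 2.0, so the cutscore is a little over two of the four items.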
47. TIMSS design
- TIMSS 1999 used a matrix sampling technique to achieve broad coverage: the total of 308 items was systematically distributed across 8 test booklets, and the booklets were distributed randomly to students
- Each student completed one 90-minute test booklet.
- Approximately one-third of the items were constructed-response format, and the remaining items were multiple-choice
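The idea behind matrix sampling can be sketched in a few lines: the item pool is split across booklets, and each student gets one booklet at random. This is a simplified illustration only. It assumes a plain round-robin split with no items shared between booklets, which is not necessarily how the TIMSS booklets were assembled, and the function names are hypothetical.

```python
import random


def build_booklets(item_ids, n_booklets):
    """Distribute items across booklets in round-robin order.

    Assumption: each item appears in exactly one booklet.
    """
    booklets = [[] for _ in range(n_booklets)]
    for position, item in enumerate(item_ids):
        booklets[position % n_booklets].append(item)
    return booklets


def assign_booklets(student_ids, n_booklets, seed=0):
    """Randomly assign each student the index of one booklet."""
    rng = random.Random(seed)
    return {student: rng.randrange(n_booklets) for student in student_ids}


# 308 items and 8 booklets, as on the slide; the student IDs are made up.
booklets = build_booklets(list(range(308)), 8)
assignment = assign_booklets(["s1", "s2", "s3", "s4"], 8)
print([len(b) for b in booklets])
print(assignment)
```

No student sees all 308 items, but the full pool is still covered across the sample, which is the broad coverage the slide refers to.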
48. Nagging questions about achievement tests
- Is there some other way to measure content validity?
- Is there really a difference between achievement and ability?
- How motivated were examinees to perform well?
- How can we get diagnostic information from a short test?
- What is the difference between constructed response and selected response? Is it important?