Overview


About This Presentation


Slides: 41

Provided by: chriswe


Transcript and Presenter's Notes

Title: Overview

1
Predictive Tests
2
Overview

Introduction
Some theoretical issues
The failings of human intuitions in prediction
Issues in formal prediction
Inference from class membership: The individual
versus group problem (and its only solution)
Some well-known predictive tests
Prediction in science and psychometrics

3
Predictive Tests

Many tests are used to make predictions of
levels of achievement or success, of
likelihood of recidivism, or of diagnostic category
Two kinds of predictions:
Categorical: Predict which category this subject
will fall into (diagnosis, occupation)
Numerical: Predict the value of a relevant
numerical variable (GPA, economic return to company)

4
The failings of human intuition

We have already seen many ways in which humans
succumb to errors in numerical reasoning
Kahneman & Tversky asked subjects for three
judgments about areas of graduate specialization:
base rate estimates, estimates (from a description)
of similarity to other students in each field, and
predictive estimates (also from a description)

5
Results

Similarity and prediction correlate at 0.97
Similarity and base rates correlate at -0.65
What does this result remind you of?
What do these subjects need to be taught?

6
6 Errors discussed by Kahneman & Tversky

Representativeness error: Assumes predictions are
not different from assessments of similarity
Insufficient regression error: People fail to
take into account that when predictive validity
is less than perfect, correlations between
predictors and performance should be < 1
Central tendency error: Subjects making judgments
tend to avoid extremes, and compress their
judgments into a smaller range than the
phenomenon being judged

7
6 Errors discussed by Kahneman & Tversky

Discounting of prior probabilities: Human
predictors will throw out base rate information
for almost any reason
Overweighting of coherence: There is greater
confidence in predictions based on consistent
input than inconsistent input with the same
average (i.e. two B's are judged better than a
B & C for predicting a B average)
Overweighting of extremes: Confidence in judgment
is over-weighted at extremes, especially positive
extremes (J-shaped confidence function)
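
The insufficient regression error can be made concrete with a small sketch. Assuming standardized (z) scores and a simple linear predictor, the least-squares prediction of the criterion is r times the predictor's z-score, so predictions should shrink toward the mean as validity drops. The function name and the example values are illustrative and do not come from the slides.

```python
def regressed_prediction(z_predictor: float, r: float) -> float:
    """Least-squares prediction of a criterion z-score from a predictor z-score."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return r * z_predictor


# A candidate 2 SDs above the mean on the predictor (illustrative values).
for r in (1.0, 0.5, 0.0):
    z_hat = regressed_prediction(2.0, r)
    print(f"r = {r:.1f}: predicted criterion z = {z_hat:.2f}")
```

With perfect validity the prediction matches the predictor; with zero validity the best prediction is the mean, whatever the predictor says.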

8
What do we need to make good predictions?

We need three pieces of information:
1.) Base rates
2.) Relevant predictors in the individual case
3.) Bounds on accuracy (cutting scores)
Kahneman & Tversky's experimental evidence
(previous slides) shows that subjects usually fail
to weight any of these three properly
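
One way to see why base rates matter is Bayes' rule. The sketch below is an illustration only: the hit rate and false-alarm rate are hypothetical numbers chosen for the example, not values from the slides.

```python
def posterior(base_rate: float, hit_rate: float, false_alarm_rate: float) -> float:
    """P(case | test flags the person), computed with Bayes' rule."""
    true_pos = base_rate * hit_rate
    false_pos = (1.0 - base_rate) * false_alarm_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical test: flags 80% of true cases and 10% of non-cases.
for base_rate in (0.5, 0.1, 0.01):
    p = posterior(base_rate, hit_rate=0.8, false_alarm_rate=0.1)
    print(f"base rate {base_rate:.2f} -> P(case | flagged) = {p:.2f}")
```

The same test result supports very different conclusions depending on the base rate, which is exactly the information subjects tend to discard.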

9
Review: Measuring validation error

Coefficient of alienation (or coefficient of
non-determination): k = 1 - r^2, where r is the
correlation of test score with some predicted
performance
k = the proportion of the error inherent in
guessing that your estimate has (percent of
variance not accounted for)
If k = 1.0, you have 100% of the error you'd have
had if you just guessed (since this means your r
was 0)
If k = 0, you have achieved perfection: your r
was 1, and there was no error at all
(N.B. This never happens.)
If k = 0.6, you have 60% of the error you'd have
had if you guessed

10
Why should we care?

We care because r and k are useful in interpreting
the accuracy of an individual's scores
r = 0.6 (good), k = 0.64 (not good)
r = 0.7 (great), k = 0.51 (not so great)
r = 0.9 (fantastic!), k = 0.19 (so so)
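
The r and k pairs above follow from taking k as one minus r squared, which is the reading consistent with the listed values. A minimal sketch (the function name is an illustrative choice):

```python
def unexplained_variance(r: float) -> float:
    """Proportion of criterion variance not accounted for by the predictor."""
    return 1.0 - r ** 2


for r in (0.6, 0.7, 0.9):
    print(f"r = {r:.1f} -> k = {unexplained_variance(r):.2f}")
```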

11
Why should we care?

Since even high values of r (0.9) leave a fairly
large proportion of variance unaccounted for, the
prediction of any individual's criterion score is
always accompanied by a wide margin of error
Recall: Smr = S (1 - r)^0.5 --> Individual error
margins are a function of how good our
correlation is
The moral: Predicting individual performance is
really hard to do!
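
The error formula recalled above can be checked numerically. The sketch below plugs in the SAT figures quoted later in these slides (SD of 100, internal reliability of 0.90); the function name is an illustrative choice.

```python
import math


def standard_error(sd: float, r: float) -> float:
    """Error margin S * (1 - r) ** 0.5 for a test with standard deviation S."""
    return sd * math.sqrt(1.0 - r)


print(round(standard_error(100, 0.90), 1))  # close to the "about 30 points" quoted for the SAT
```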

12
What can we infer from class membership?

Some commentators have suggested that inference
from class membership is inherently fallacious
e.g. 25% of first-degree relatives of those
diagnosed with malignant melanoma (skin cancer)
will also develop melanoma
I am a first-degree relative of two persons
diagnosed with melanoma, so I take my odds of
developing the disease to be > 25%
Critics of the inference say: No, it is either 0%
(I don't develop the disease) or 100% (I do),
i.e. group probabilities don't apply to
individuals

13
Do group probabilities apply to individuals?

Meehl's response: "If nothing is rationally
inferable from membership in a class, no
empirical prediction is ever possible"
The argument is a re-statement of the necessity
of inference: even in the case of predicting
individual behavior from that individual's data,
we need to consider the pattern over past data
Moreover, the claim of 'certainty' is philosophical,
not real: in the absence of knowing which group
you are in, there is only probability, not
certain knowledge

14
"One incident that occurred while future Nobel
Laureate Kenneth Arrow was forecasting the
weather illustrates both uncertainty and the
human unwillingness to accept it. Some officers
had been assigned the task of forecasting the
weather a month ahead, but Arrow and his
statisticians found that their long-range
forecasts were no better than numbers pulled out
of a hat. The forecasters agreed and asked their
superiors to be relieved of this duty. The reply
was 'The Commanding General is well aware that
the forecasts are no good. However, he needs them
for planning purposes'." Peter Bernstein,
Against the Gods: The Remarkable Story of Risk
15
Some Predictive Tests: Standardized admission
tests

The Scholastic Aptitude Tests (SAT, GREs) are
highly reliable tests developed to painstaking
psychometric standards
The reference norm group changes every year: the
reference group for 2003 scores was based on
examinees from 1998-2001, and the reference group
for 2004 scores was based on examinees from
1999-2002.
For this reason, the same score may have a
(slightly) different percentile rank in one year
than in another
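
A small sketch of why a shifting norm group moves percentile ranks. It assumes normally distributed scores, and the two norm-group means (500 and 510) are hypothetical values picked for illustration, not figures from the slides.

```python
from statistics import NormalDist


def percentile_rank(score: float, norm_mean: float, norm_sd: float) -> float:
    """Percentile rank of a score within a normally distributed norm group."""
    return 100.0 * NormalDist(norm_mean, norm_sd).cdf(score)


# Same score, two hypothetical reference groups.
for norm_mean in (500, 510):
    rank = percentile_rank(600, norm_mean, 100)
    print(f"norm mean {norm_mean}: percentile rank = {rank:.1f}")
```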

16
The Graduate Record Exam: General

The GRE is a computerized standardized test taken
by individuals applying to graduate school.
Its purpose is to measure the acquired skills of
the test taker, and to predict performance in
graduate school.
The general GRE has four sections:
Verbal Section: 30 questions, 30 minutes
Quantitative Section: 28 questions, 45 minutes
Analytical Writing Section: 2 Analytical Writing
Tasks
45-minute "Present Your Perspective on an Issue"
task
30-minute "Analyze an Argument" task
Research sections
The test is timed, and corrected for guessing
It is also computer adaptive: questions depend
on answers

17
The Graduate Record Exam: Writing

Score on a 6 point scale (mean = 4.18, SD = 0.97)
6: Insightful analyses of complex ideas,
logically compelling, well organized, skillful
sentence variety, few or no usage errors
5: Generally thoughtful analysis of ideas,
logically sound reasons, generally well
organized, sentence variety conveys meaning,
minor usage errors
4: Competent analysis of ideas, relevant
reasons, adequately organized, satisfactory
control of sentence structure, some usage errors
3: Some competence but flawed by at least one
of: limited analysis or development, weak
organization or control of sentence structure,
usage errors that result in vagueness
2: Serious weakness in at least one of: lack
of analysis, development, or organization,
serious problems in sentence structure, usage
errors that obscure meaning
1: Fundamental deficiencies: content that is
confusing or irrelevant, little or no
development, pervasive errors that result in
incoherence

18
Sample Verbal Questions

Analogies
ETERNAL : END
a. precursory : beginning
b. grammatical : sentence
c. implausible : credibility
d. invaluable : worth
e. frenetic : movement

19
Sample Verbal Questions

Sentence Completions
Museums, which house many paintings and
sculptures, are good places for students of
_____.
a. art
b. science
c. religion
d. dichotomy
e. democracy

20
Sample Verbal Questions

Antonyms
MALADROIT
a. ill-willed
b. dexterous
c. cowardly
d. enduring
e. sluggish

21
Sample Quantitative Questions

Quantitative Comparison
Column A: y - 6     Column B: -3
If y > 2
a. the quantity in column A is always greater
b. the quantity in column B is always greater
c. the quantities are always equal
d. It cannot be determined from the information
given
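
A quick way to reason about this item is to try a few values that satisfy the condition y > 2. The sketch below uses three illustrative test values; the helper name is hypothetical.

```python
def compare(y: float) -> str:
    """Compare Column A (y - 6) with Column B (-3) for a given y."""
    a, b = y - 6, -3
    if a > b:
        return "A"
    if a < b:
        return "B"
    return "equal"


# All three values satisfy y > 2.
outcomes = {y: compare(y) for y in (2.5, 3, 4)}
print(outcomes)
```

Since different admissible values of y give different outcomes, the relationship cannot be determined from the information given (option d).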

22
Sample Quantitative Questions

Problem Solving
The sum of x distinct integers greater than zero
is less than 75. What is the greatest possible
value of x ?
a. 8
b. 9
c. 10
d. 11
e. 12
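
The item can be solved by summing the smallest distinct positive integers (1, 2, 3, ...) until the next one would reach the limit, since any other choice of distinct integers has a larger sum. A minimal sketch, with an illustrative function name:

```python
def max_distinct_count(limit: int) -> int:
    """Largest x such that the x smallest distinct positive integers sum to less than limit."""
    count, total = 0, 0
    # Adding the next integer (count + 1) must keep the sum strictly below the limit.
    while total + (count + 1) < limit:
        count += 1
        total += count
    return count


print(max_distinct_count(75))  # 1 + 2 + ... + 11 = 66; adding 12 gives 78
```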

23
Sample Analytical Questions

A pastry shop will feature 5 desserts (V, W, X, Y,
and Z) to be served Monday through Friday, one
dessert a day, in an order that conforms to the
following restrictions:
Y must be served before V.
X and Y must be served on consecutive days.
Z may not be the second dessert to be served.
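
Analytical items like this one can be explored by brute force: generate every serving order and keep those that satisfy the three restrictions. The sketch below reads "before" as "on an earlier day" and "consecutive" as "adjacent in either order", which is an interpretation of the slide's wording.

```python
from itertools import permutations


def valid(order: tuple) -> bool:
    """Check one Monday-to-Friday serving order against the three restrictions."""
    pos = {dessert: day for day, dessert in enumerate(order)}  # 0 = Monday
    return (
        pos["Y"] < pos["V"]                   # Y is served before V
        and abs(pos["X"] - pos["Y"]) == 1     # X and Y on consecutive days
        and pos["Z"] != 1                     # Z is not the second dessert
    )


schedules = ["".join(p) for p in permutations("VWXYZ") if valid(p)]
print(len(schedules), "valid schedules, for example", schedules[0])
```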

24
The Graduate Record Exam: Subject

The subject test has 220 5-choice multiple choice
questions
Subject tests are currently offered in:
Biochemistry, Cell and Molecular Biology; Biology;
Chemistry; Computer Science; Literature in English;
Mathematics; Physics; Psychology
In psychology:
43% experimental/natural science
43% social science
14% general

25
Reliability

Within-test reliability: 0.9
Test re-test reliability is not so good: Repeat
test takers for both tests show an average score
gain of 20-30 points
This may move a student by a large amount: more
than 10 percentiles
Standard error of measurement of about 35 points
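
To get a feel for what a 35-point standard error of measurement means for one examinee, the sketch below builds an approximate 95% band around an observed score. It assumes normally distributed measurement error, and the observed score of 600 is a hypothetical example.

```python
def score_band(observed: float, sem: float, z: float = 1.96) -> tuple:
    """Approximate band observed +/- z * SEM (z = 1.96 gives roughly 95%)."""
    return observed - z * sem, observed + z * sem


low, high = score_band(600, 35)
print(f"observed 600 -> plausible range {low:.1f} to {high:.1f}")
```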

26
Validity

In one meta-analysis, Sternberg and Williams
point out that empirical validities of the
GRE vary somewhat by field
Tests correlate with each other:
Verbal and quantitative: 0.45
Quantitative and analytical: 0.66
Correlations between various combinations of
GRE scores and grad school performance are only
between 0.25 and 0.35, and only marginally better
(0.4) if you include undergraduate grades

27
Validation: Correlations of GRE Scores
28
Correlations of GRE Scores

You can estimate your IQ from GRE/SAT scores at
http://members.shaw.ca/delajara/GREIQ.html
GRE V+Q of 1240: IQ of 130
N.B. I have no idea how valid this site's
claims are.

29
Subject Test Validity

Kuncel, N. R., Hezlett, S. A., & Ones, D. S.
(2001). A comprehensive meta-analysis of the
predictive validity of the Graduate Record
Examinations: Implications for graduate student
selection and performance. Psychological
Bulletin, 127(1), 162-181.
N = 1,753 studies, together covering 82,659
graduate students
Subject Tests tended to be better predictors than
the Verbal, Quantitative, and Analytical tests.
GRE correlations with degree attainment and
research productivity were consistently positive;
however, some lower 90% credibility intervals
included 0.

30
Construct Validity

Does the GRE get at anything related to graduate
school?
What about motivation, creativity, devotion,
conscientiousness, and other aspects that make a
successful graduate student?
Some complaints:
Graduate assignments require that students
develop research skills, but the GRE does not test
this
The GRE is timed, but real life is rarely timed
The GRE is individualised, but real work usually
involves collaboration

31
Why is the GRE so popular?

Because it is in the public eye
Since average scores for admissions on tests such
as the GRE are published, there is pressure on
schools to keep the average scores of the
students that they accept high so that they can
remain competitive with other institutions in
the public eye
One strength of the GRE is that they have specific
regression equations by college, i.e. they can
predict future performance at a particular
college independently
Because there is relatively little variation in
applicants' reference letters and undergraduate GPA,
GRE scores are one of the main sources of the
variation that is needed to rank applicants
P.S. A new GRE is scheduled to come out in 2006

32
The Scholastic Aptitude Test

The SAT is a set of tests
SAT I includes the Verbal and Math tests, whose
scores are summed to get the total score
SAT II has tests in 12 subject fields
Like the GRE, the SAT test is timed and corrected
for guessing
Range for each subtest (Verbal/Math) is 200-800
(mean = 500, SD = 100)

33
The Scholastic Aptitude Test

First normed in 1941, re-normed in 1995 on a more
carefully-chosen group
There was an 80 point increase in verbal at most
score ranges (e.g., a 1941 score of 500 would
now be 580)
Math scores were up by about 40 points at lower
ranges only

34
Some Predictive Tests: The SAT

Internal reliability: 0.90
Standard error of about 30 points
SAT: r = 0.4 with university GPA
By comparison, high school grades: r = 0.48
Together, r = 0.55
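
How two predictors combine depends on how strongly they correlate with each other. The sketch below uses the standard two-predictor formula for the multiple correlation. The slides do not give the SAT and high school grade intercorrelation, so the value 0.30 is purely an assumption, chosen because it reproduces a combined r near 0.55.

```python
import math


def multiple_r(r1: float, r2: float, r12: float) -> float:
    """Multiple correlation of a criterion with two predictors.

    r1, r2: each predictor's correlation with the criterion.
    r12: correlation between the two predictors.
    """
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# 0.40 and 0.48 are from the slide; 0.30 is an assumed intercorrelation.
print(round(multiple_r(0.40, 0.48, 0.30), 2))
```

The more the two predictors overlap, the less the second one adds beyond the first.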

35
Can you beat the standards?

Notwithstanding the huge industry waiting to take
money from anxious high school students, studying
for the SAT doesn't help much
SAT coaching increases scores by about 15 points,
which is 0.15 SDs
Repeat testing increases it a little less, about
12 points or 0.12 SDs
How much should we pay for 0.1 SDs?
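
To put a gain of 0.15 SDs in perspective, the sketch below converts it to a percentile shift. It assumes normally distributed scores and a student who starts at the 50th percentile; both are assumptions made for the illustration.

```python
from statistics import NormalDist


def percentile_after_gain(start_percentile: float, gain_sd: float) -> float:
    """Percentile reached after a gain of gain_sd standard deviations."""
    std = NormalDist()  # standard normal: mean 0, SD 1
    z = std.inv_cdf(start_percentile / 100.0)
    return 100.0 * std.cdf(z + gain_sd)


# 0.15 SDs for coaching and 0.12 SDs for repeat testing, as quoted above.
for gain in (0.15, 0.12):
    print(f"gain of {gain} SDs: 50th -> {percentile_after_gain(50, gain):.1f}th percentile")
```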

36
Some Predictive Tests: Professional tests

Professional school tests (MCAT, LSAT)
MCAT: r in the low .80s
LSAT: r > 0.9
There is relatively little evidence of validity
They predict performance about as well as
undergraduate GPA alone: r = 0.25 - 0.3

37
Some Predictive Tests: The Strong Interest
Inventory

The Strong (1927) Interest Inventory
(Strong-Campbell, 1981): widely used test of
interests as predictors of professional aptitude
Empirically constructed with concurrent validity,
comparing each vocational group to the overall
average
Has 325 items, 162 scales covering 85 occupations
Reliability is high:
0.9 test/retest over weeks; 0.6-0.7 over years,
unless they were old (25 years) at first test,
then 0.8 even after 20 years
Does not predict success or satisfaction in a
profession
Does predict likelihood of entering and remaining
in a profession: chances of 50% that a person
will end up in a profession most strongly
predicted (A score), and only 12% that he will
end in one least predicted (C score)

38
Prediction in scientific psychology

Prediction & scientific explanation are related
We admire Newton's laws precisely because they
are accurate in predicting real phenomena
Many cognitive models in psychology are weak
because they are purely descriptive: they fail to
make an effort to predict how a person will
perform on unseen stimuli
There are many ways to do so, if you have
sufficient variation in predictors: multiple
regression, neural networks, 'cheap' methods
(e.g. best single predictor)

39
Some lessons about scientific prediction

Models can 'cheat' by using variance in the input
data set that does not transfer to unseen data:
you must test your predictions on unseen data
(cross-validation)
Some models that are very good may be very good
precisely because they are very good at using
this 'within-set' variation
Even very simple non-linear models may do as well
as or better than much more complex models,
especially linear models
E.g. r = 0.48 (validation set r = 0.58)
Linear regression: r = 0.22 (validation set r =
0.20)
They may exclude highly-correlated variables
Different measures of successful prediction may
yield quite different results (e.g. test
correlation versus correlation after binning into
0.5 SD intervals)
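
The point about 'within-set' variation can be illustrated with pure noise. In the sketch below, none of the 50 predictors is related to the outcome, yet the best one in a small fitting set looks predictive until it is checked on unseen data. The sample sizes, the seed, and the 'best single predictor' selection rule are all illustrative choices, not the analysis behind the slide's figures.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
N_TRAIN, N_VALID, N_PREDICTORS = 20, 200, 50


def make_set(n):
    """Random predictors and a random outcome: no real relationship exists."""
    predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(N_PREDICTORS)]
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    return predictors, outcome


train_x, train_y = make_set(N_TRAIN)
valid_x, valid_y = make_set(N_VALID)

# Pick the predictor that looks best on the fitting set only.
best = max(range(N_PREDICTORS), key=lambda j: abs(pearson(train_x[j], train_y)))
train_r = abs(pearson(train_x[best], train_y))
valid_r = abs(pearson(valid_x[best], valid_y))
print(f"fitting set |r| = {train_r:.2f}, validation set |r| = {valid_r:.2f}")
```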

40
Some lessons about scientific prediction

Linear assumptions may be limiting: You may hide
variance just by taking on the assumption
More predictive power may sometimes (perhaps
often) be obtained by dropping the assumption
of linear relations between predictors and the
quality to be predicted
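
A toy case shows how a linear assumption can hide variance. In the sketch below the outcome is completely determined by the predictor, but through a squared relation, so the linear correlation is zero. The data are constructed for illustration only.

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


xs = list(range(-5, 6))          # predictor, symmetric around zero
ys = [x ** 2 for x in xs]        # outcome fully determined by the predictor

linear_r = pearson(xs, ys)                        # linear assumption
squared_r = pearson([x ** 2 for x in xs], ys)     # non-linear transform of the predictor
print(f"linear r = {linear_r:.2f}, r after squaring the predictor = {squared_r:.2f}")
```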
