Overview

Slides: 41
Provided by: chriswe

Transcript and Presenter's Notes

1
Predictive Tests
2
Overview
  • Introduction
  • Some theoretical issues
  • The failings of human intuitions in prediction
  • Issues in formal prediction
  • Inference from class membership: the individual
    versus group problem (and its only solution)
  • Some well-known predictive tests
  • Prediction in science and psychometrics

3
Predictive Tests
  • Many tests are used to make predictions, of
    levels of achievement or success, or of
    likelihood of recidivism, or diagnostic category
  • Two kinds of predictions:
  • Categorical: Predict which category this subject
    will fall into (diagnosis, occupation)
  • Numerical: Predict the value of a relevant
    numerical variable (GPA, economic return to company)

4
The failings of human intuition
  • We have already seen many ways in which humans
    succumb to errors in numerical reasoning
  • Kahneman & Tversky asked subjects about areas of
    graduate specialization, collecting base rate
    estimates, estimates (from a description) of
    similarity to other students in each field, and
    predictive estimates (also from a description)

5
Results
  • Results
  • Similarity and prediction correlate at 0.97
  • Similarity and base rates correlate at -0.65
  • What does this result remind you of?
  • What do these subjects need to be taught?

6
6 Errors discussed by Kahneman & Tversky
  • Representativeness error: Assumes predictions are
    not different from assessments of similarity
  • Insufficient regression error: People fail to
    take into account that when predictive validity
    is less than perfect, correlations between
    predictors and performance should be < 1
  • Central tendency error: Subjects making judgments
    tend to avoid extremes, and compress their
    judgments into a smaller range than the
    phenomenon being judged

7
6 Errors discussed by Kahneman & Tversky
  • Discounting of prior probabilities: Human
    predictors will throw out base rate information
    for almost any reason
  • Overweighting of coherence: There is greater
    confidence in predictions based on consistent
    input than inconsistent input with the same
    average (i.e. two B's are treated as better than
    a B and a C for predicting a B average)
  • Overweighting of extremes: Confidence in judgment
    is over-weighted at extremes, especially positive
    extremes (j-shaped confidence function)

8
What do we need to make good predictions?
  • We need three pieces of information:
  • 1.) Base rates
  • 2.) Relevant predictors in the individual case
  • 3.) Bounds on accuracy (cutting scores)
  • Kahneman & Tversky's experimental evidence
    (previous slides) shows that subjects usually fail
    to weight any of these three properly
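
As a rough illustration of why base rates matter, the sketch below combines a base rate with a predictor's accuracy using Bayes' rule. The function name `posterior` and every number in it are hypothetical, chosen only for illustration; none of them come from Kahneman & Tversky's data.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(category | positive evidence), computed with Bayes' rule."""
    true_pos = base_rate * hit_rate
    false_pos = (1 - base_rate) * false_alarm_rate
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers: identical evidence, two different base rates.
rare = posterior(base_rate=0.05, hit_rate=0.80, false_alarm_rate=0.20)
common = posterior(base_rate=0.50, hit_rate=0.80, false_alarm_rate=0.20)
print(f"rare category: {rare:.2f}, common category: {common:.2f}")
```

With the same evidence, the predicted probability differs sharply between the two cases, which is the information a predictor discards when ignoring base rates.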

9
Review: Measuring validation error
  • Coefficient of alienation (or coefficient of
    non-determination): k = 1 - r^2, where r is the
    correlation of test score with some predicted
    performance
  • k = the proportion of the error inherent in
    guessing that your estimate has (percent of
    variance not accounted for)
  • If k = 1.0, you have 100% of the error you'd have
    had if you just guessed (since this means your r
    was 0)
  • If k = 0, you have achieved perfection: your r
    was 1, and there was no error at all
  • If k = 0.6, you have 60% of the error you'd have
    had if you guessed

N.B. This never happens.
10
Why should we care?
  • We care because r and k are useful in interpreting
    the accuracy of an individual's scores
  • r = 0.6 (good), k = 0.64 (not good)
  • r = 0.7 (great), k = 0.51 (not so great)
  • r = 0.9 (fantastic!), k = 0.19 (so so)
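
The k values on this slide follow from k = 1 - r^2, and can be reproduced with a short sketch. The function name is an illustrative choice. Note that some textbooks reserve "coefficient of alienation" for the square root of this quantity; the values quoted here match 1 - r^2.

```python
def non_determination(r):
    """Proportion of criterion variance NOT accounted for: k = 1 - r^2."""
    return 1 - r ** 2

for r in (0.6, 0.7, 0.9):
    print(f"r = {r:.1f}  ->  k = {non_determination(r):.2f}")
```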

11
Why should we care?
  • Since even high values of r (0.9) leave a fairly
    large proportion of variance unaccounted for, the
    prediction of any individual's criterion score is
    always accompanied by a wide margin of error
  • Recall: Smr = S (1 - r)^0.5 → Individual error
    margins are a function of how good our
    correlation is
  • The moral: Predicting individual performance is
    really hard to do!
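
A minimal sketch of the error-margin formula recalled above, S (1 - r)^0.5. It assumes S is the standard deviation of the scores and r is the test's correlation coefficient as used on the slide; the SD of 100 points is an assumed value for illustration, and the function name is hypothetical.

```python
import math

def error_margin(sd, r):
    """Slide formula: S * (1 - r) ** 0.5."""
    return sd * math.sqrt(1 - r)

# Assumed SD of 100 points, for illustration only.
for r in (0.6, 0.7, 0.9):
    print(f"r = {r:.1f}  ->  error margin = {error_margin(100, r):.1f} points")
```

Even at r = 0.9 the margin is still roughly a third of a standard deviation, which is the point of the slide.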

12
What can we infer from class membership?
  • Some commentators have suggested that inference
    from class membership is inherently fallacious
  • E.g., 25% of first-degree relatives of those
    diagnosed with malignant melanoma (skin cancer)
    will also develop melanoma
  • I am a first-degree relative of two persons
    diagnosed with melanoma, so I take my odds of
    developing the disease to be > 25%
  • Critics of the inference say: No, it is either 0%
    (I don't develop the disease) or 100% (I do),
    i.e. group probabilities don't apply to
    individuals

13
Do group probabilities apply to individuals?
  • Meehl's response: "If nothing is rationally
    inferable from membership in a class, no
    empirical prediction is ever possible"
  • The argument is a re-statement of the necessity
    of inference: even in the case of predicting
    individual behavior from that individual's data,
    we need to consider the pattern over past data
  • Moreover, the claim of 'certainty' is philosophical,
    not real: in the absence of knowing which group
    you are in, there is only probability, not
    certain knowledge

14
"One incident that occurred while future Nobel
Laureate Kenneth Arrow was forecasting the
weather illustrates both uncertainty and the
human unwillingness to accept it. Some officers
had been assigned the task of forecasting the
weather a month ahead, but Arrow and his
statisticians found that their long-range
forecasts were no better than numbers pulled out
of a hat. The forecasters agreed and asked their
superiors to be relieved of this duty. The reply
was 'The Commanding General is well aware that
the forecasts are no good. However, he needs them
for planning purposes'."
Peter Bernstein, Against the Gods: The Remarkable Story of Risk
15
Some Predictive Tests: Standardized admission
tests
  • Standardized admission tests (SAT, GRE) are
    highly reliable tests developed to painstaking
    psychometric standards
  • The reference norm group changes every year: the
    reference group for 2003 scores was based on
    examinees from 1998-2001 and the reference group
    for 2004 scores was based on examinees from
    1999-2002.
  • For this reason, the same score may have a
    (slightly) different percentile rank in one year
    than in another

16
The Graduate Record Exam: General
  • The GRE is a computerized standardized test taken
    by individuals applying to graduate school.
  • Its purpose is to measure the acquired skills of
    the test taker, and to predict performance in
    graduate school.
  • The general GRE has four sections:
  • Verbal Section: 30 questions, 30 minutes
  • Quantitative Section: 28 questions, 45 minutes
  • Analytical Writing Section: 2 Analytical Writing
    Tasks
  • 45-minute "Present Your Perspective on an Issue"
    task
  • 30-minute "Analyze an Argument" task
  • Research sections
  • The test is timed, and corrected for guessing
  • It is also computer adaptive: questions depend
    on answers

17
The Graduate Record Exam: Writing
  • Scored on a 6-point scale (mean = 4.18, SD =
    0.97)
  • 6: Insightful analyses of complex ideas,
    logically compelling, well organized, skillful
    sentence variety, few or no usage errors
  • 5: Generally thoughtful analysis of ideas,
    logically sound reasons, generally well
    organized, sentence variety conveys meaning,
    minor usage errors
  • 4: Competent analysis of ideas, relevant
    reasons, adequately organized, satisfactory
    control of sentence structure, some usage errors
  • 3: Some competence but flawed by at least one
    of: limited analysis or development, weak
    organization or control of sentence structure,
    usage errors that result in vagueness
  • 2: Serious weakness in at least one of: lack
    of analysis, development, or organization,
    serious problems in sentence structure, usage
    errors that obscure meaning
  • 1: Fundamental deficiencies: content that is
    confusing or irrelevant, little or no
    development, pervasive errors that result in
    incoherence

18
Sample Verbal Questions
  • Analogies
  • ETERNAL : END
  • a. precursory : beginning
  • b. grammatical : sentence
  • c. implausible : credibility
  • d. invaluable : worth
  • e. frenetic : movement

19
Sample Verbal Questions
  • Sentence Completions
  • Museums, which house many paintings and
    sculptures, are good places for students of
    _____.
  • a. art
  • b. science
  • c. religion
  • d. dichotomy
  • e. democracy

20
Sample Verbal Questions
  • Antonyms
  • MALADROIT
  • a. ill-willed
  • b. dexterous
  • c. cowardly
  • d. enduring
  • e. sluggish

21
Sample Quantitative Questions
  • Quantitative Comparison
  • Column A: y - 6; Column B: -3
  • If y > 2
  • a. the quantity in column A is always greater
  • b. the quantity in column B is always greater
  • c. the quantities are always equal
  • d. It cannot be determined from the information
    given

22
Sample Quantitative Questions
  • Problem Solving
  • The sum of x distinct integers greater than zero
    is less than 75. What is the greatest possible
    value of x ?
  • a. 8
  • b. 9
  • c. 10
  • d. 11
  • e. 12
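
One way to reason about this item: the smallest possible sum of x distinct positive integers is 1 + 2 + ... + x, so the largest feasible x is the largest one whose minimal sum is still below 75. The sketch below checks this by accumulation; the function name is illustrative.

```python
def max_distinct_count(limit):
    """Largest n such that 1 + 2 + ... + n < limit, with that minimal sum."""
    n, total = 0, 0
    while total + (n + 1) < limit:
        n += 1
        total += n
    return n, total

count, minimal_sum = max_distinct_count(75)
print(count, minimal_sum)
```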

23
Sample Analytical Questions
  • A pastry shop will feature 5 desserts (V, W, X, Y,
    and Z) to be served Monday through Friday, one
    dessert a day, in an order that conforms to the
    following restrictions:
  • Y must be served before V.
  • X and Y must be served on consecutive days.
  • Z may not be the second dessert to be served.
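
The slide gives only the setup, not the question asked about it. As a sketch of how the three restrictions interact, the schedules that satisfy all of them can be enumerated by brute force. The helper name `valid_schedules` is an illustrative choice.

```python
from itertools import permutations

def valid_schedules():
    """All Monday-to-Friday orders of V, W, X, Y, Z meeting the three rules."""
    schedules = []
    for order in permutations("VWXYZ"):
        # Map each dessert to its day number (1 = Monday ... 5 = Friday).
        pos = {dessert: day for day, dessert in enumerate(order, start=1)}
        if (pos["Y"] < pos["V"]                    # Y served before V
                and abs(pos["X"] - pos["Y"]) == 1  # X and Y on consecutive days
                and pos["Z"] != 2):                # Z is not the second dessert
            schedules.append("".join(order))
    return schedules

schedules = valid_schedules()
print(len(schedules), "valid schedules, e.g.", schedules[:3])
```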

24
The Graduate Record Exam: Subject
  • The subject test has 220 5-choice multiple choice
    questions
  • There are currently subject tests in Biochemistry,
    Cell and Molecular Biology; Biology; Chemistry;
    Computer Science; Literature in English;
    Mathematics; Physics; Psychology
  • In psychology:
  • 43% Experimental/natural science
  • 43% social science
  • 14% general

25
Reliability
  • Within-test reliability: 0.9
  • Test re-test reliability is not so good: repeat
    test takers for both tests show an average score
    gain of 20-30 points
  • This may move a student by a large amount: more
    than 10 percentiles
  • Standard error of measurement of about 35 points

26
Validity
  • In one meta-analysis, Sternberg and Williams
    point out that empirical validities of the
    GRE vary somewhat by field
  • Tests correlate with each other:
  • Verbal and quantitative: 0.45
  • Quantitative and analytical: 0.66
  • Correlations between various combinations of
    GRE scores and grad school performance are only
    between 0.25 and 0.35, and only marginally better
    (0.4) if you include undergraduate grades

27
Validation: Correlations of GRE Scores
28
Correlations of GRE Scores
  • You can estimate your IQ from GRE/SAT scores at
  • http://members.shaw.ca/delajara/GREIQ.html
  • GRE V+Q = 1240 → IQ = 130
  • N.B. I have no idea how valid this site's
    claims are.

29
Subject Test Validity
  • Kuncel, N. R., Hezlett, S. A., & Ones, D. S.
    (2001). A comprehensive meta-analysis of the
    predictive validity of the Graduate Record
    Examinations: Implications for graduate student
    selection and performance. Psychological
    Bulletin, 127 (1), 162-181.
  • N = 1,753 studies, together covering 82,659
    graduate students
  • Subject Tests tended to be better predictors than
    the Verbal, Quantitative, and Analytical tests.
  • GRE correlations with degree attainment and
    research productivity were consistently positive;
    however, some lower 90% credibility intervals
    included 0.

30
Construct Validity
  • Does the GRE get at anything related to graduate
    school?
  • What about motivation, creativity, devotion,
    conscientiousness, and other aspects that make a
    successful graduate student?
  • Some complaints
  • Graduate assignments require that students
    develop research skills, but GRE does not test
    this
  • GRE is timed but real life is rarely timed
  • GRE is individualised but real work usually
    involves collaboration

31
Why is the GRE so popular?
  • Because it is in the public eye
  • Since average scores for admissions on tests such
    as the GRE are published, there is pressure on
    schools to keep the average scores of the
    students that they accept high so that they can
    remain competitive with other institutions in
    the public eye
  • One strength of the GRE is that it has specific
    regression equations by college, i.e. it can
    predict future performance at a particular
    college independently
  • Because there is relatively little variation in
    applicants' reference letters and undergraduate
    GPA, GRE scores are one of the main sources of
    the variation that is needed to rank applicants
  • P.S. A new GRE is scheduled to come out in 2006

32
The Scholastic Aptitude Test
  • The SAT is a set of tests
  • SAT I includes the Verbal and Math tests, whose
    scores are summed to get the total score
  • SAT II has tests in 12 subject fields
  • Like the GRE, the SAT test is timed and corrected
    for guessing
  • Range for each subtest (Verbal/Math) is 200-800
    (mean = 500, SD = 100)

33
The Scholastic Aptitude Test
  • First normed in 1941, re-normed in 1995 on a more
    carefully-chosen group
  • There was an 80-point increase in verbal at most
    score ranges (e.g., a 1941 score of 500 would
    now be 580)
  • Math scores were up by about 40 points at lower
    ranges only

34
Some Predictive Tests: The SAT
  • Internal reliability: 0.90
  • Standard error of about 30 points
  • SAT: r = 0.4 with university GPA
  • By comparison, high school grades: r = 0.48
  • Together: r = 0.55

35
Can you beat the standards?
  • Notwithstanding the huge industry waiting to take
    money from anxious high school students, studying
    for the SAT doesn't help much
  • SAT coaching increases scores by about 15 points,
    which is 0.15 SDs
  • Repeat testing increases it a little less, about
    12 points or 0.12 SDs
  • How much should we pay for 0.1 SDs?

36
Some Predictive Tests: Professional tests
  • Professional school tests (MCAT, LSAT)
  • MCAT: r in the low 0.80s
  • LSAT: r > 0.9
  • There is relatively little evidence of validity
  • They predict performance about as well as
    undergraduate GPA alone: r = 0.25-0.3

37
Some Predictive Tests: The Strong Interest
Inventory
  • The Strong (1927) Interest Inventory
    (Strong-Campbell, 1981): a widely used test of
    interests as predictors of professional aptitude
  • Empirically constructed with concurrent validity,
    comparing each vocational group to the overall
    average
  • Has 325 items, 162 scales covering 85 occupations
  • Reliability is high:
  • 0.9 test/retest over weeks; 0.6-0.7 over years,
    unless they were old (25 years) at first test,
    then 0.8 even after 20 years
  • Does not predict success or satisfaction in a
    profession
  • Does predict likelihood of entering and remaining
    in a profession: chances of 50% that a person
    will end up in a profession most strongly
    predicted (A score), and only 12% that he will
    end in one least predicted (C score)

38
Prediction in scientific psychology
  • Prediction and scientific explanation are related
  • We admire Newton's laws precisely because they
    are accurate in predicting real phenomena
  • Many cognitive models in psychology are weak
    because they are purely descriptive: they fail to
    make an effort to predict how a person will
    perform on unseen stimuli
  • There are many ways to do so, if you have
    sufficient variation in predictors: multiple
    regression, neural networks, 'cheap' methods
    (e.g. best single predictor)

39
Some lessons about scientific prediction
  • Models can 'cheat' by using variance in the input
    data set that does not transfer to unseen data;
    you must test your predictions on unseen data
    (cross-validation)
  • Some models that are very good may be very good
    precisely because they are very good at using
    this 'within-set' variation
  • Even very simple non-linear models may do as well
    as or better than much more complex models,
    especially linear models
  • E.g., r = 0.48 (validation set r = 0.58)
  • Linear regression: r = 0.22 (validation set r =
    0.20)
  • They may exclude highly-correlated variables
  • Different measures of successful prediction may
    yield quite different results (e.g. test
    correlation versus correlation after binning into
    0.5 SD intervals)
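
A minimal sketch of how a model can 'cheat' on within-set variation, and why testing on unseen data exposes it. Everything here is an assumption made for illustration: the data are simulated (criterion = predictor + noise), and the 'memorizer' is a deliberately extreme nearest-case model, not a method from the lecture.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def make_data(n, rng):
    """Simulated cases: criterion = predictor + noise (illustrative only)."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 2) for x in xs]
    return xs, ys

def memorizer(train_x, train_y):
    """A 'cheating' model: predict the criterion of the nearest training case."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict

rng = random.Random(0)
train_x, train_y = make_data(50, rng)
valid_x, valid_y = make_data(50, rng)   # unseen data for cross-validation

model = memorizer(train_x, train_y)
r_train = pearson_r([model(x) for x in train_x], train_y)
r_valid = pearson_r([model(x) for x in valid_x], valid_y)
print(f"within-set r = {r_train:.2f}, validation r = {r_valid:.2f}")
```

The memorizer reproduces its own training cases perfectly, so its within-set correlation says nothing about how it will perform on new cases; only the validation correlation does.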

40
Some lessons about scientific prediction
  • Linear assumptions may be limiting: you may hide
    variance just by taking on the assumption
  • More predictive power may sometimes (perhaps
    often) be obtained by dropping the assumptions
    of linear relations between predictors and the
    quality to be predicted
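
A small sketch of how a linear assumption can hide variance. The data are constructed for illustration (the criterion is exactly the square of the predictor), so they are an assumption and not results from the lecture.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]   # criterion depends on the predictor, but not linearly

r_linear = pearson_r(xs, ys)                       # linear relation assumed
r_nonlinear = pearson_r([x ** 2 for x in xs], ys)  # linear assumption dropped
print(f"linear r = {r_linear:.2f}, non-linear r = {r_nonlinear:.2f}")
```

Under the linear assumption the predictor looks useless, even though the criterion is fully determined by it.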