Mitch Robinson - PowerPoint PPT Presentation

1
Rank Correlation in Crystal Ball® Simulations
(or How We Overcame Our Fear of Spearman's R in
Cost Risk Analyses)
  • Mitch Robinson, Sandi Cole
  • June 11-14, 2002

2
Fear
Why should you fear rank correlation in cost risk
analyses?
3
Fear2
Fear of Rank Correlation
Cost risk researchers have recently questioned
the use of Monte Carlo tools that simulate cost
driver correlations using rank correlation
methods:
"Crystal Ball and @Risk use rank correlation
methods. Rank correlation is easier to simulate
than Pearson correlation; however, as we've seen,
rank correlation is not appropriate for cost risk
analyses."
Sources: 67th Military Operations Research
Symposium, 1999; 32nd Annual DoD Cost Analysis
Symposium, 1999; ISPA/SCEA Joint Meeting, 2001.
Why do they fear rank correlation?
4
Rank Correlation
Rank correlation measures how consistently one
variable increases (or decreases) in a second
variable, i.e., the monotonicity between two variables.
  • +1 if one of two variables strictly increases in
    the other
  • -1 if one of two variables strictly decreases in
    the other
  • ∈ (-1, 1) if one of two variables is constant or
    variably increases and decreases in the second.
  • The Spearman r is one rank correlation measure.
5
Spearman Rank Correlation
Y strictly decreases in X: Spearman r = -1.00.
6
Product Moment Correlation (1)
  • However, our statistical tools generally
    require linear association measures: how
    consistently do two variables covary linearly?
  • +1 if the two variables covary on a
    positively-sloped line
  • -1 if the two variables covary on a
    negatively-sloped line
  • ∈ (-1, 1) if one variable is constant in the
    second variable or the two don't covary
    linearly.
  • The Pearson r addresses linear covariation.

7
Pearson Product Moment Correlation
Same numbers: Pearson r = -0.18.
The regression line modeling linearity is about
Y = 142 - 19 × X.
8
Spearman vs. Pearson
Monotonicity does not imply linearity!
Spearman r = -1.00; Pearson r = -0.18.
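The gap between the two measures can be reproduced with a few lines of code. The sketch below uses hypothetical data (not the numbers plotted on the slides) in which Y strictly decreases in X along a sharply bent curve; the helper names `pearson`, `ranks`, and `spearman` are illustrative and assume no tied values.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; assumes no tied values (true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman r computed as the Pearson r of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y strictly decreases in x, but far from linearly.
x = list(range(1, 21))
y = [10.0 ** -v for v in x]

spearman_r = spearman(x, y)
pearson_r = pearson(x, y)
print(f"Spearman r = {spearman_r:.2f}, Pearson r = {pearson_r:.2f}")
```

Because the ranks of Y run exactly opposite to the ranks of X, the Spearman r sits at -1, while the Pearson r is much weaker because one extreme point dominates the linear fit.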
9
What's Up with Crystal Ball (1)
Crystal Ball® implements Iman and Conover's
(1982) method for simulating rank
correlation. Iman and Conover's step-by-step
mathematical logic proves their algorithm for
simulating Spearman correlations.
10
What's Up with Crystal Ball (2)
  • However, Crystal Ball® nominally uses the
    Iman-Conover algorithm to simulate Pearson r's.
  • This assumption does not follow from the
    Iman-Conover logic and is thus sensibly suspect.
  • We've seen the potential for bad disconnects
    between Spearman r's and Pearson r's for the same
    sets of numbers.
  • We should thus want to know: does Crystal Ball®
    produce correlations that accurately match our
    intended Pearson-sense target correlations?
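The question can be explored outside Crystal Ball with a simplified, two-variable rank-reordering sketch. This is not Crystal Ball's implementation and not the full Iman-Conover algorithm (which handles many variables through a matrix factorization and fixed scores); it only illustrates the core idea of rearranging independent samples so their ranks follow a set of correlated scores. The function names and the use of random normal scores are assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def reorder_to_match(sample, scores):
    """Rearrange `sample` so its rank order matches the rank order of `scores`."""
    sorted_sample = sorted(sample)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    reordered = [0.0] * len(sample)
    for rank, index in enumerate(order):
        reordered[index] = sorted_sample[rank]
    return reordered


def simulate_pair(target, trials, rng):
    """Return two triangular samples rearranged to follow correlated scores."""
    # Correlated standard-normal scores that carry the target dependence.
    z1 = [rng.gauss(0.0, 1.0) for _ in range(trials)]
    z2 = [target * a + (1.0 - target ** 2) ** 0.5 * rng.gauss(0.0, 1.0) for a in z1]
    # Independent draws from a triangular (0, 0.25, 1.0) distribution.
    x1 = [rng.triangular(0.0, 1.0, 0.25) for _ in range(trials)]
    x2 = [rng.triangular(0.0, 1.0, 0.25) for _ in range(trials)]
    return reorder_to_match(x1, z1), reorder_to_match(x2, z2)


rng = random.Random(1)
x1, x2 = simulate_pair(target=0.75, trials=10_000, rng=rng)
simulated_r = pearson(x1, x2)
print(f"target 0.75, simulated Pearson r = {simulated_r:.3f}")
```

The rearrangement leaves each variable's marginal distribution untouched, since only the order of the values changes; the Pearson r that results is then an empirical question rather than something the reordering guarantees.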

11
This Presentation
  • Can Crystal Ball® accurately simulate Pearson
    correlations?
  • Are there conditions or practices that contribute
    to better or worse accuracy performance?

12
The General Approach (1)
  • Define 33 inputs and respective probability
    distributions in an Excel® spreadsheet using
    Crystal Ball® (assumption cells).
  • Define 33 outputs in an Excel® spreadsheet using
    Crystal Ball®, each equal to an assumption
    cell (forecast cells).

13
The General Approach (2)
  • Define a target correlation matrix.
  • Configure the simulation using the Crystal Ball®
    Run Preferences menu.

14
The General Approach (3)
  • Run 10,000 simulation trials.
  • Calculate the simulated Pearson correlations by
    applying the Excel® correlation tool (in
    Tools > Data Analysis) to the extracted
    forecast cell outputs.
  • Compare the simulated Pearson correlations with
    their respective target correlations.
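The comparison step can be sketched in code as well. The study itself used the Excel correlation tool; the version below is only an equivalent illustration, and the function name `correlation_errors` and the tiny data set are hypothetical.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_errors(columns, target):
    """Map each variable pair (i, j) to simulated Pearson r minus target r."""
    return {
        (i, j): pearson(columns[i], columns[j]) - target[i][j]
        for i, j in combinations(range(len(columns)), 2)
    }


# Hypothetical extracted forecast values: three variables, five trials each.
columns = [
    [0.10, 0.40, 0.35, 0.80, 0.60],
    [0.20, 0.30, 0.50, 0.70, 0.65],
    [0.90, 0.15, 0.45, 0.30, 0.55],
]
target = [[1.0, 0.75, 0.75], [0.75, 1.0, 0.75], [0.75, 0.75, 1.0]]

errors = correlation_errors(columns, target)
for pair, error in errors.items():
    print(pair, f"{error:+.3f}")
```

A negative error means the simulated correlation fell short of its target, which is the direction the results slides tabulate.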

15
The First Study (1)
  • 33 variables yielded 528 correlations for
    accuracy tests.
  • Identical target correlations among the
    variables.
  • Identical triangular (0, 0.25, 1.0) probability
    distributions: slightly right skewed with mode
    0.25, mean 0.42.

16
The Tr (0, 0.25, 1.0) Distribution
17
The First Study (2)
  • 10,000 trials: 10,000 numbers for each variable.
  • Correlation sample = 10,000, i.e., apply the
    correlation algorithm to the entire set of
    numbers.
  • If correlation sample = 1000, Crystal Ball® would
    apply the algorithm 10 times, to successive
    batches comprising 1000 trials per variable and
    33 variables.
  • 3 x 4 study design: run the 10,000 simulation
    trials under each of 12 separate conditions
  • target correlation = 0.25, 0.50, or 0.75.
  • starting seed = 1; 2; 1,048,576; or 2,097,152;
    the four number streams are mutually nonoverlapping
    over their first 2 million members.
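The counts quoted for the first study follow from simple arithmetic, which the sketch below spells out. The variable names and the helper `batch_count` are illustrative only.

```python
from itertools import product
from math import comb

variables = 33
trials = 10_000
targets = [0.25, 0.50, 0.75]
seeds = [1, 2, 1_048_576, 2_097_152]

# Distinct variable pairs among 33 variables.
pairs = comb(variables, 2)

# 3 targets x 4 seeds gives the 12 study conditions.
conditions = list(product(targets, seeds))
total_correlations = pairs * len(conditions)


def batch_count(trials, correlation_sample):
    """Times the correlation algorithm is applied (assumes even division)."""
    return trials // correlation_sample


# Mean of a triangular distribution: (min + mode + max) / 3.
triangular_mean = (0.0 + 0.25 + 1.0) / 3

print(pairs, len(conditions), total_correlations)  # 528 12 6336
print(batch_count(trials, 1000), round(triangular_mean, 2))  # 10 0.42
```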

18
First Study Results (1)
  • More than 98% of the 6336 simulated correlations
    were within 0.03 of the target; all but 5 were
    within 0.05 of the target; all were within 0.06
    of the target.
  • Seed 2 produced 73 of the 99 simulated
    correlations that missed their target by more
    than 0.03, and all of the 5 that missed their
    target by more than 0.05.
  • Nearly 75% of the simulated correlations were
    less than their target; this varied only
    negligibly over the three targets; seed 2
    produced a 68/32 split.

19
First Study Results (2)
20
The Second Study (1)
The first study related every variable to every
other variable. Did this dense correlation
network (528 nonzero correlations interconnecting
all pairs of the 33 variables) drive the
correlation accuracy results?
21
The Second Study (2)
  • Assigned nonzero correlations only to (x1, x2),
    (x3, x4), ..., (x31, x32), reducing the correlation
    yield from 528 to 16 in each replication, i.e., 3
    targets x 4 seeds.
  • Other second study design choices are identical
    to their first study counterparts.
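The sparse target matrix of the second study can be written out explicitly. The sketch below builds it for one target value; the names are illustrative, and variable x33 is assumed to stay uncorrelated with everything because it has no partner in the pairing.

```python
variables = 33
target_r = 0.75  # one of the three study targets

# Start from an identity matrix: every off-diagonal target is zero.
target = [[1.0 if i == j else 0.0 for j in range(variables)] for i in range(variables)]

# Pair consecutive variables by 0-based index: (0, 1) is (x1, x2),
# (30, 31) is (x31, x32); index 32 (x33) is left unpaired.
paired = [(i, i + 1) for i in range(0, variables - 1, 2)]
for i, j in paired:
    target[i][j] = target[j][i] = target_r

nonzero_pairs = sum(
    1
    for i in range(variables)
    for j in range(i + 1, variables)
    if target[i][j] != 0.0
)

# 16 correlations per replication x 3 targets x 4 seeds.
print(len(paired), nonzero_pairs, nonzero_pairs * 3 * 4)  # 16 16 192
```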

22
Second Study Results (1)
  • All of the 192 simulated correlations were within
    0.02 of their target.
  • All of the simulated correlations were less than
    the target.

23
Second Study Results (2) (First Study Results)
24
The Third Study (1)
  • Are there conditions or practices that worsen
    accuracy performance?
  • "Simulating correlations requires a large
    sample of random values generated ahead of time.
    The values in the samples are rearranged to
    create the desired correlations. If the
    correlation sample size is smaller than the
    total number of trials, a next group of samples
    is generated and correlated."
  • Crystal Ball 2000 User's Manual, pp. 246-7.

25
The Third Study (2)
  • Are there conditions or practices that worsen
    accuracy performance?
  • "The sample size is initially set to 500. While
    any sample size greater than 100 should produce
    sufficiently acceptable results, you can set this
    number higher to maximize accuracy. The increased
    accuracy resulting from the use of larger
    samples, however, requires additional memory and
    reduces overall system responsiveness. If either
    of these become an issue, reduce the sample
    size."
  • Crystal Ball 2000 User's Manual, pp. 246-7.

26
The Third Study (3)
  • Correlation sample size = 100: configure Crystal
    Ball® to apply the correlation algorithm 100
    times, to successive batches comprising 100
    trials per variable and 33 variables.
  • Other third study design choices were identical
    to their first study counterparts.

27
Third Study Results (1) (First Study Results)
  • About 31% (98%) of the 6336 simulated
    correlations were within 0.03 of the target; 2817
    (5) were outside 0.05 of the target; all were
    within 0.29 (0.06) of the target.
  • About 92% (74%) of the simulated correlations
    were less than the target.

28
Third Study Results (2)
29
Third Study Results (4) (First Study Results)
30
The Fourth Study (1)
  • Does increasing the correlation sample size from
    100 to 500 improve accuracy?
  • 500 is Crystal Ball®'s installation default for
    the correlation sample: "The sample size is
    initially set to 500." Source: Crystal Ball 2000
    User's Manual, pp. 246-7; also see the Trials
    tab in the Crystal Ball® Run Preferences menu.

31
The Fourth Study (2)
  • Correlation sample size of 500, i.e., configure
    Crystal Ball® to apply the correlation algorithm
    20 times, to successive batches comprising 500
    trials per variable and 33 variables.
  • Examine only target correlation 0.75, for which
    we catastrophically lost accuracy in the third
    study.
  • Other fourth study design choices were identical
    to their first and third study counterparts.

32
Fourth Study Results (1) (First/Third Study
Results for Target 0.75)
  • About 79% (99%/2%) of the 2112 simulated
    correlations were within 0.03 of the target.
  • 100% were within 0.09 (0.05/0.29) of the target.
  • About 93% (74%/92%) of the simulated correlations
    were less than the target.

33
Fourth Study Results (2)
34
Fourth Study Results (3)
35
The Extended Fourth Study
  • Side-by-side examination of correlation sample
    sizes 100, 500, 1000, and 10,000 for target
    correlation 0.75 and accuracy bands ±0.02,
    ±0.04, and ±0.06.
  • Other fourth study design choices were identical
    to their first and third study counterparts.
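Tabulating results by accuracy band amounts to counting how many simulated correlations fall within each tolerance of the target. The sketch below shows one way to do that; the function name `accuracy_summary` and the four sample values are hypothetical and are not study results.

```python
def accuracy_summary(simulated, target, bands=(0.02, 0.04, 0.06)):
    """Return the share of correlations within each band and the share below target."""
    errors = [r - target for r in simulated]
    within = {
        band: sum(abs(e) <= band for e in errors) / len(errors)
        for band in bands
    }
    below = sum(e < 0 for e in errors) / len(errors)
    return within, below


# Hypothetical simulated correlations for a 0.75 target.
simulated = [0.745, 0.72, 0.70, 0.76]
within, below = accuracy_summary(simulated, target=0.75)

for band, share in within.items():
    print(f"within {band:.2f}: {share:.0%}")
print(f"below target: {below:.0%}")
```

Running the same summary once per correlation sample size (100, 500, 1000, 10,000) would give the side-by-side view this slide describes.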

36
Extended Fourth Study Results
37
Lesson Learned (1)
  • Don't fear rank order correlation as a general
    principle: Crystal Ball® produced pretty accurate
    Pearson correlations in the first and second
    studies.
  • This was surprising given the theory, only
    showing that we really don't understand the
    theory.
  • Correlation accuracy collapsed after minimizing
    the correlation sample size.
  • Accuracy fell apart asymmetrically,
    concentrating on where we most need accuracy,
    among the larger correlations.

38
Lesson Learned (2)
  • There was a clear tendency to undershoot target
    correlations.
  • This may have been predictable. Strong Spearman
    r's can accompany weak Pearson r's, but not
    vice versa.
  • Don't put all of your simulations in one seed
    basket.
  • Seed 2 provided somewhat weaker results than
    the other seeds in the first study. Replicating
    the simulation using other seeds exposed the
    weaker, atypical results.

39
Acknowledgements
To Ed Miller for encouraging this study. To Eric
Wainwright and Decisioneering, Inc. for supplying
us a Crystal Ball®.
40
References
Decisioneering, Inc. Crystal Ball® 2000 User
Manual. 1998-2000.
Iman, R.L. and W.J. Conover. A distribution-free
approach to inducing rank correlation among input
variables. Communications in Statistics, B11 (3),
pp. 311-334, 1982.
Kelton, W.D. and A.M. Law. Simulation Modeling and
Analysis. New York: McGraw-Hill, 1991.
41
The End