Title: Mitch Robinson
1. Rank Correlation in Crystal Ball® Simulations
(or How We Overcame Our Fear of Spearman's R in
Cost Risk Analyses)
- Mitch Robinson, Sandi Cole
- June 11-14, 2002
2. Fear
Why should you fear rank correlation in cost risk
analyses?
3. Fear (2)
Fear of Rank Correlation
Cost risk researchers have recently questioned
the use of Monte Carlo tools that simulate cost
driver correlations using rank correlation
methods:
"Crystal Ball and @RISK use rank correlation
methods. Rank correlation is easier to simulate
than Pearson correlation; however, as we've seen,
rank correlation is not appropriate for cost risk
analyses."
Sources: 67th Military Operations Research
Symposium, 1999; 32nd Annual DoD Cost Analysis
Symposium, 1999; ISPA/SCEA Joint Meeting, 2001.
Why do they fear rank correlation?
4. Rank Correlation
Rank correlation measures how consistently one
variable increases (or decreases) in a second
variable: monotonicity between two variables.
- 1 if one of two variables strictly increases in
the other
- -1 if one of two variables strictly decreases in
the other
- in (-1, 1) if one of two variables is constant
or variably increases and decreases in the second.
The Spearman r is one rank correlation measure.
5. Spearman Rank Correlation
Y strictly decreases in X: Spearman r = -1.00.
6. Product Moment Correlation (1)
- However, our statistical tools generally
require linear association measures: how
consistently do two variables covary linearly?
  - 1 if the two variables covary on a
  positively-sloped line
  - -1 if the two variables covary on a
  negatively-sloped line
  - in (-1, 1) if one variable is constant in the
  second variable or the two don't covary linearly.
- The Pearson r addresses linear covariation.
7. Pearson Product Moment Correlation
Same numbers: Pearson r = -0.18.
The regression line modeling linearity is about
Y = 142 - 19 × X.
8. Spearman vs. Pearson
Monotonicity does not imply linearity!
Spearman r = -1.00; Pearson r = -0.18
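The gap between the two measures can be reproduced with a few lines of code. The sketch below is illustrative only: it uses made-up data (Y = exp(-X) for X = 1..10), not the numbers behind the slide, and the helper names `ranks`, `pearson`, and `spearman` are our own. Y strictly decreases in X, so the Spearman r is -1, while the Pearson r is noticeably weaker because the relationship is far from a straight line.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman r is the Pearson r of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: strictly decreasing but strongly curved.
x = list(range(1, 11))
y = [math.exp(-v) for v in x]

print("Spearman r:", round(spearman(x, y), 2))
print("Pearson r: ", round(pearson(x, y), 2))
```

Any strictly monotone but curved relationship shows the same pattern; the more extreme the curvature, the weaker the Pearson r becomes while the Spearman r stays at -1 or 1.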
9. What's Up with Crystal Ball (1)
Crystal Ball® implements Iman and Conover's
(1982) method for simulating rank
correlation. Iman and Conover's step-by-step
mathematical logic proves their algorithm for
simulating Spearman correlations.
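To make the idea of rank reordering concrete, here is a deliberately simplified two-variable sketch. It is not Crystal Ball's implementation and not the full Iman-Conover procedure, whose published form includes refinements omitted here. It only shows the core move: draw each variable independently, then rearrange the draws so that their ranks follow a pair of correlated normal scores. The function name `induce_rank_correlation`, the sample size, and the seed are all assumptions made for illustration.

```python
import math
import random

def ranks(values):
    """0-based ranks; ties are not handled (continuous draws assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

def pearson(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def induce_rank_correlation(x, y, target, rng):
    """Reorder x and y so their ranks follow two correlated normal score columns."""
    n = len(x)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Second score column correlates with the first at `target` (2x2 Cholesky factor).
    s1 = z1
    s2 = [target * a + math.sqrt(1.0 - target ** 2) * b for a, b in zip(z1, z2)]
    x_sorted, y_sorted = sorted(x), sorted(y)
    # Put the k-th smallest draw where the k-th smallest score sits.
    # Only the order changes, so each marginal distribution is untouched.
    new_x = [x_sorted[r] for r in ranks(s1)]
    new_y = [y_sorted[r] for r in ranks(s2)]
    return new_x, new_y

rng = random.Random(1)  # arbitrary seed for repeatability
n = 5000
# random.triangular takes (low, high, mode).
x = [rng.triangular(0.0, 1.0, 0.25) for _ in range(n)]
y = [rng.triangular(0.0, 1.0, 0.25) for _ in range(n)]
new_x, new_y = induce_rank_correlation(x, y, 0.75, rng)

print("Pearson r before:", round(pearson(x, y), 3))
print("Pearson r after: ", round(pearson(new_x, new_y), 3))
```

Because the reordering acts on ranks, nothing in it guarantees that the Pearson r of the rearranged values equals the target, which is exactly the question the presentation sets out to test.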
10. What's Up with Crystal Ball (2)
- However, Crystal Ball® nominally uses the
Iman-Conover algorithm to simulate Pearson r's.
- This assumption does not follow from the
Iman-Conover logic and is thus sensibly suspect.
- We've seen the potential for bad disconnects
between Spearman r's and Pearson r's for the same
sets of numbers.
- We should thus want to know: does Crystal Ball®
produce correlations that accurately match our
intended Pearson-sense target correlations?
11. This Presentation
- Can Crystal Ball® accurately simulate Pearson
correlations?
- Are there conditions or practices that contribute
to better or worse accuracy performance?
12. The General Approach (1)
- Define 33 inputs and respective probability
distributions in an Excel® spreadsheet using
Crystal Ball® "assumption cells."
- Define 33 outputs in an Excel® spreadsheet using
Crystal Ball®, each equal to an assumption
cell: "forecast cells."
13. The General Approach (2)
- Define a target correlation matrix.
- Configure the simulation using the Crystal Ball®
Run Preferences menu.
14. The General Approach (3)
- Run 10,000 simulation trials.
- Calculate the simulated Pearson correlations by
applying the Excel® correlation tool (in
Tools-Data Analysis) to the extracted
forecast cell outputs.
- Compare the simulated Pearson correlations with
their respective target correlations.
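The comparison step can be sketched in code as well. The study itself used the Excel® correlation tool; the function below is only a hypothetical stand-in that computes every pairwise Pearson r from columns of simulated outputs and summarizes the misses against a single common target. The name `accuracy_summary`, its return keys, and the tiny example columns are assumptions for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def accuracy_summary(columns, target, bands=(0.03, 0.05)):
    """Compare every pairwise Pearson r among `columns` with one common target."""
    errors = [pearson(columns[i], columns[j]) - target
              for i, j in combinations(range(len(columns)), 2)]
    summary = {
        "pairs": len(errors),
        "max_abs_error": max(abs(e) for e in errors),
        "share_below_target": sum(e < 0 for e in errors) / len(errors),
    }
    for band in bands:
        within = sum(abs(e) <= band for e in errors)
        summary[f"share_within_{band}"] = within / len(errors)
    return summary

# Tiny made-up example: three output columns, common target 0.8.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
summary = accuracy_summary(columns, target=0.8)
for key, value in summary.items():
    print(key, value)
```

With 33 columns the same function would cover all 528 pairs at once; a full target matrix rather than a single target would need a per-pair lookup instead of the scalar `target`.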
15. The First Study (1)
- 33 variables yielded 528 correlations for
accuracy tests.
- Identical target correlations among the
variables.
- Identical triangular (0, 0.25, 1.0) probability
distributions: slightly right skewed with mode
0.25, mean 0.42.
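The two numbers on this slide follow from standard formulas, worked here as a quick check. The symbols a, c, and b for the minimum, mode, and maximum of the triangular distribution are our notation, not the slide's.

```latex
% Number of distinct pairs among 33 variables
\binom{33}{2} = \frac{33 \cdot 32}{2} = 528

% Mean of a triangular distribution with minimum a, mode c, maximum b
\mathrm{E}[X] = \frac{a + c + b}{3} = \frac{0 + 0.25 + 1.0}{3} \approx 0.42
```

The mode (0.25) lies below the mean (about 0.42), which is consistent with the right skew noted above.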
16. The Tr(0, 0.25, 1.0) Distribution
17. The First Study (2)
- 10,000 trials: 10,000 numbers for each variable.
- Correlation sample = 10,000, i.e., apply the
correlation algorithm to the entire set of
numbers.
- If correlation sample = 1000, Crystal Ball® would
apply the algorithm 10 times, to successive
batches comprising 1000 trials per variable and
33 variables.
- 3 x 4 study design: run the 10,000 simulation
trials under each of 12 separate conditions
  - target correlation = 0.25, 0.50, or 0.75.
  - starting seed = 1, 2, 1,048,576, or 2,097,152; the
  four number streams are mutually nonoverlapping
  over their first 2 million members.
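The batching arithmetic and the 3 x 4 design can be laid out in a short sketch. The helper `batch_ranges` is hypothetical; in particular, how Crystal Ball® treats a final partial batch is not stated in the source, so the truncation used here is an assumption.

```python
from itertools import product

def batch_ranges(n_trials, sample_size):
    """Yield (start, stop) trial indices for successive correlation batches.

    Assumption: a final partial batch is simply shorter than the rest.
    """
    for start in range(0, n_trials, sample_size):
        yield start, min(start + sample_size, n_trials)

# The 3 x 4 design: every target correlation under every starting seed.
targets = (0.25, 0.50, 0.75)
seeds = (1, 2, 1_048_576, 2_097_152)
conditions = list(product(targets, seeds))

print("batches at sample size 10,000:", len(list(batch_ranges(10_000, 10_000))))
print("batches at sample size 1,000: ", len(list(batch_ranges(10_000, 1_000))))
print("study conditions:             ", len(conditions))
print("correlations tested overall:  ", len(conditions) * 528)
```

With a correlation sample equal to the trial count, the algorithm sees all 10,000 values at once; smaller samples mean the reordering is applied separately inside each batch.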
18. First Study Results (1)
- More than 98% of the 6336 simulated correlations
were within 0.03 of the target; all but 5 were
within 0.05 of the target; all were within 0.06
of the target.
- Seed 2 produced 73 of the 99 simulated
correlations that missed their target by more
than 0.03, and all of the 5 that missed their
target by more than 0.05.
- Nearly 75% of the simulated correlations were
less than their target; this varied only
negligibly over the three targets; seed 2
produced a 68/32 split.
19. First Study Results (2)
20. The Second Study (1)
The first study related every variable to every
other variable. Did this dense correlation
network (528 nonzero correlations interconnecting
all pairs of the 33 variables) drive the
correlation accuracy results?
21. The Second Study (2)
- Assigned nonzero correlations only to (x1, x2),
(x3, x4), ..., (x31, x32), reducing the correlation
yield from 528 to 16 in each replication, i.e., 3
targets x 4 seeds.
- Other second study design choices are identical
to their first study counterparts.
22. Second Study Results (1)
- All of the 192 simulated correlations were within
0.02 of their target.
- All of the simulated correlations were less than
the target.
23. Second Study Results (2) (First Study Results)
24. The Third Study (1)
- Are there conditions or practices that worsen
accuracy performance?
- "Simulating correlations requires a large
sample of random values generated ahead of time.
The values in the samples are rearranged to
create the desired correlations. If the
correlation sample size is smaller than the
total number of trials a next group of samples
is generated and correlated."
- Crystal Ball 2000 User Manual, pp. 246-7.
25. The Third Study (2)
- Are there conditions or practices that worsen
accuracy performance?
- "The sample size is initially set to 500. While
any sample size greater than 100 should produce
sufficiently acceptable results, you can set this
number higher to maximize accuracy. The increased
accuracy resulting from the use of larger
samples, however, requires additional memory and
reduces overall system responsiveness. If either
of these become an issue, reduce the sample
size."
- Crystal Ball 2000 User Manual, pp. 246-7.
26. The Third Study (3)
- Correlation sample size = 100: configure Crystal
Ball® to apply the correlation algorithm 100
times, to successive batches comprising 100
trials per variable and 33 variables.
- Other third study design choices were identical
to their first study counterparts.
27. Third Study Results (1) (First Study Results)
- About 31% (98%) of the 6336 simulated
correlations were within 0.03 of the target; 2817
(5) were outside 0.05 of the target; all were
within 0.29 (0.06) of the target.
- About 92% (74%) of the simulated correlations
were less than the target.
28. Third Study Results (2)
29. Third Study Results (4) (First Study Results)
30. The Fourth Study (1)
- Does increasing the correlation sample size from
100 to 500 improve accuracy?
- 500 is Crystal Ball®'s installation default for
the correlation sample: "The sample size is
initially set to 500." Source: Crystal Ball 2000
User Manual, pp. 246-7; also see the Trials
tab in the Crystal Ball® Run Preferences menu.
31. The Fourth Study (2)
- Correlation sample size of 500, i.e., configure
Crystal Ball® to apply the correlation algorithm
20 times, to successive batches comprising 500
trials per variable and 33 variables.
- Examine only target correlation 0.75, for which
we catastrophically lost accuracy in the third
study.
- Other fourth study design choices were identical
to their first and third study counterparts.
32. Fourth Study Results (1) (First/Third Study
Results for Target 0.75)
- About 79% (99%/2%) of the 2112 simulated
correlations were within 0.03 of the target.
- 100% were within 0.09 (0.05/0.29) of the target.
- About 93% (74%/92%) of the simulated correlations
were less than the target.
33. Fourth Study Results (2)
34. Fourth Study Results (3)
35. The Extended Fourth Study
- Side-by-side examination of correlation sample
sizes 100, 500, 1000, and 10,000 for target
correlation 0.75 and accuracy bands 0.02,
0.04, and 0.06.
- Other fourth study design choices were identical
to their first and third study counterparts.
36. Extended Fourth Study Results
37. Lesson Learned (1)
- Don't fear rank order correlation as a general
principle: Crystal Ball® produced pretty accurate
Pearson correlations in the first and second
studies.
- This was surprising given the theory, which only
shows that we really don't understand the
theory.
- Correlation accuracy collapsed after minimizing
the correlation sample size.
- Accuracy fell apart asymmetrically,
concentrating where we most need accuracy:
among the larger correlations.
38. Lesson Learned (2)
- There was a clear tendency to undershoot target
correlations.
- This may have been predictable. Strong Spearman
r's can accompany weak Pearson r's, but not
vice versa.
- Don't put all of your simulations in one seed
basket.
- Seed 2 provided somewhat weaker results than
the other seeds in the first study. Replicating
the simulation using other seeds exposed the
weaker, atypical results.
39. Acknowledgements
To Ed Miller for encouraging this study. To Eric
Wainwright and Decisioneering, Inc. for supplying
us a Crystal Ball®.
40. References
- Decisioneering, Inc. Crystal Ball® 2000 User
Manual. 1998-2000.
- Iman, R.L. and W.J. Conover. "A distribution-free
approach to inducing rank correlation among input
variables." Communications in Statistics, B11 (3),
pp. 311-334, 1982.
- Kelton, W.D. and A.M. Law. Simulation Modeling &
Analysis. New York: McGraw-Hill, 1991.
41. The End