Title: Measurement Myth Busters 101
1Measurement Myth Busters 101
- Joe Adams, Ph.D.
- www.joeadams.net
2Things to keep in mind
- All measurement contains error.
- All measures are human creations.
- All measures require an observer or instrument
user.
3More things to keep in mind
- Measurement is a discipline.
- What do you see?
- What do you hear?
- How do you look and listen objectively?
- How do you describe/define the observation?
4Myth 1: You can't measure (fill in the blank).
5The Best Measures are Simple
- Measures are a shorthand for experience or
observations.
- Knowing your subject matter counts!
- If they can do it, you can too!
- Don't be fooled by naysayers.
6Gilley's song inspired four teams of researchers
to test his hypothesis!
7And he was almost right!
8The Beat Goes On
- And so did the research
- On Attractiveness
- On Mate Selection
- On Stability of Relationships
- On Genetic Cues, etc.
- On a lot of things you really don't want to know
9Myth 2: It's all subjective!
- Beauty is in the eye of the beholder!
10The Distorted Cultural Legacy of A.J. Ayer (1910-1989)
- Language, Truth, and Logic (1936)
- The most famous spokesman for the fact/value
dichotomy.
- Claimed that all statements about values are
merely expressions of emotion, with no logical
significance.
- Also a formidable opponent to Mike Tyson.
11Ayer v. Tyson
- Ayer taught or lectured several times in the
United States, including serving as a visiting
professor at Bard College in the fall of 1987. At
a party that same year held by fashion designer
Fernando Sanchez, Ayer, then 77, confronted Mike
Tyson harassing the (then little-known) model
Naomi Campbell. When Ayer demanded that Tyson
stop, the boxer said "Do you know who the f I
am? I'm the heavyweight champion of the world,"
to which Ayer replied "And I am the former
Wykeham Professor of Logic. We are both
pre-eminent in our field. I suggest that we talk
about this like rational men". Ayer and Tyson
then began to talk, while Naomi Campbell slipped
out. (Wikipedia)
- TKO First Round!
- Verifiable on Wikipedia
12"The fact of twilight does not prevent us from
distinguishing between day and night." Attributed
to Dr. Samuel Johnson (1709-1784)
13The Real Issues Are Validity and Reliability
14Validity Relevance - Logic
- DESIRABLE QUALITIES
- RELEVANCE: Measures should mean something
important to those who use them; performance
measures should drive performance!
- PURITY: Measures should deal with a clearly
defined domain or dimension of a particular
quality.
- REPRESENTATIVENESS: Measures should capture
something about a phenomenon without distorting
the phenomenon.
15Invalid Measures
- Tend to obscure reality, not illuminate it.
- May lead to erroneous, spurious, or absurd
conclusions.
16In Application of Measures: Internal Threats to
Validity
- Selection: picking facts that fit the hypothesis
- History: observations taken at different times
- Maturation Effect: subjects or effects mature
- Repeated Testing: subjects get test-wise
- Instrumentation: breaks down or is used
incorrectly
- Experimental Mortality: people drop out
- Experimenter Bias: creates expectations
17Threats to External Validity
- Generalizability of results may be limited by:
- TIME: Sample taken on Fat Tuesday!
- SETTING: During the Super Bowl.
- PLACES: As they come out of Sugars
- PEOPLE (SAMPLE): Inside Sugars
- OBSERVER: Barney Fife
18Threats to External Validity (Continued)
- Generalizability of results may be limited by:
- Placebo Effect: MSU Health Plan
- Novelty Effect: Ooo wow!
- Hawthorne Effect: More below.
19Summary of Validity Issues
- Does the measure capture what you intend it to
capture?
- Artifacts of measurement
20Artifacts of Measuring
- Measures that pretend to be one thing, but are
actually something else (e.g. pleasing answers).
- An artifact might mean that the act of measuring
caused something to register that wasn't there.
- The act of measurement disturbs the same reality
it is measuring, a problem commonly (if loosely)
called the Heisenberg Principle; more precisely,
it is the observer effect.
21The Hawthorne Effect
- Western Electric plant at Hawthorne Works, outside
Chicago in Cicero, Illinois
- A series of studies done by Harvard professors
between 1924 and 1932.
- They were testing hypotheses about working
conditions and productivity.
- Treatment groups increased productivity
regardless of conditions
22Why did they improve?
- They felt special for being chosen to
participate in the experiment.
- The experiments spawned the whole Human Relations
school of thought in the field of management.
23The Rosenthal Effect
- Studies done by Robert Rosenthal and Lenore
Jacobson (1968/1992).
- Also called the Pygmalion Effect.
- Observer / Teacher expectations improved student
results more than different treatments.
- That's the good news about teaching: it matters.
24Reliability - Consistency
- DESIRABLE QUALITIES
- ROBUSTNESS: Measures should work well under a
variety of extraneous conditions.
- PRECISENESS: Measures should differentiate
between different qualities or gradations.
- SENSITIVITY: Measures should detect change.
25Intercoder Reliability
- Inter-coder or inter-rater reliability: The
results of two or more people correlate with
each other on a particular item, using the same
scale or instrument.
- Problem: They see the same thing looking through
the same lenses (but they were drunk).
- In the example from "The Girls All Get Prettier at
Closing Time," inter-coder reliability on the
attractiveness of females typically reaches .90,
or 90 percent, depending on how you define
reliability. Most research in this area indicates
a high degree of consistency from both sexes.
Does drinking help?
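The two readings of "reliability" mentioned above (a .90 correlation versus 90 percent agreement) can be sketched as follows. This is only an illustration, not the procedure used in the studies cited: the two rating lists are invented data, and the function names `pearson_r` and `percent_agreement` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two raters' scores on the same items."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def percent_agreement(x, y):
    """Share of items on which both raters gave exactly the same score."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

# Invented 1-10 ratings from two observers of the same ten subjects.
rater_a = [7, 5, 8, 3, 6, 9, 4, 6, 7, 5]
rater_b = [7, 5, 8, 4, 6, 9, 4, 6, 7, 5]

print("correlation:", round(pearson_r(rater_a, rater_b), 3))
print("agreement:  ", percent_agreement(rater_a, rater_b))
```

Neither number says anything about whether both observers share the same bias; two raters looking through the same lenses can agree almost perfectly and still be wrong.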
26Internal Consistency
- Internal consistency: The results of one measure
correlate with other similar, but different,
measures measuring the same thing.
- Problem: Error in the measures may be correlated
more than the content. It's the correlation
between the measures that is the key to knowing
whether the measures are reliable, but that might
be a problem.
- The observer was drunk again. (GIGO)
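One common summary of internal consistency is Cronbach's alpha. The slide does not name a statistic, so treat the choice of alpha as an assumption. The sketch below uses invented scores to show the problem described above: adding the same observer error to every item raises alpha even though the content has not changed. The function name `cronbach_alpha` and all data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k lists, each holding one score per respondent."""
    k = len(items)
    item_var_sum = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Three invented survey items that barely relate to each other.
q1 = [3, 4, 2, 5, 3, 4]
q2 = [4, 2, 5, 3, 4, 3]
q3 = [2, 5, 3, 4, 4, 3]

# The same observer error added to every item for the last three respondents.
shared_error = [0, 0, 0, 3, 3, 3]
biased = [[s + e for s, e in zip(q, shared_error)] for q in (q1, q2, q3)]

alpha_raw = cronbach_alpha([q1, q2, q3])
alpha_biased = cronbach_alpha(biased)
print("alpha without shared error:", round(alpha_raw, 2))
print("alpha with shared error:   ", round(alpha_biased, 2))
```

The second alpha looks respectable only because the error is correlated across items, which is the garbage-in, garbage-out case.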
27Test-retest Reliability
- Test-retest reliability: Try measuring the same
thing with the same instrument more than once to
see if the results are the same.
- Problem: The Barney Fife problem: the person
using the instrument is part of the instrument
(retest won't catch this).
- Examples: Racial differences between interviewer
and subject may shift responses on surveys
dealing with race. Male versus female
interviewers asking about sexual issues: same
problem.
28Split-Half Reliability
- Split-half reliability: Use two equivalent forms
of a scale to see if they correlate.
- Example: Use two different questions in the
same survey to measure the same thing. If they
are correlated, you've demonstrated the
reliability of the instrument(s).
29Half Goofy: The MMPI
- The Minnesota Multiphasic Personality Inventory
(1952 - )
- It's the pattern, not the questions alone.
- Different axes (dimensions).
- The Diagnostic and Statistical Manual of Mental
Disorders (DSM)
- Provides standardized diagnoses.
- Describes some treatment protocols
30Resources for Testing Validity and Reliability
- G. David Garson, Quantitative Research in Public
Administration
http://www2.chass.ncsu.edu/garson/pA765/reliab.htm
- Wikipedia, Validity (Statistics)
http://en.wikipedia.org/wiki/Validity_(statistics)
- Wikipedia, Validity (Logic)
http://en.wikipedia.org/wiki/Validity
31Myth 3: Madison Avenue is home to the world's
greatest scientific minds (Data proves (fill in
the blank)).
32How often have you heard
- Scientific research proves.
33Science does not prove, it disproves.
- Key things to understand:
- In science, a null hypothesis is either rejected
or not rejected; it is never proven.
- The outcome of any experiment or statistical
comparison counts as only one observation,
regardless of the number of data points.
- Different observations at different times may
yield different results.
- Eternity is not ours to observe.
34Key References
- David Hume (1711-1776): Noted that there is
nothing logically necessary about the repetition
of a pattern continuing in the future.
- Ludwig Wittgenstein (1889-1951): Wrote the
Tractatus Logico-Philosophicus, which outlines
almost all of the rules of scientific endeavor.
One of its most important points is
that the notion of causation is a purely
intellectual construction and is never a fact.
35Myth 4: The whole is equal to the sum of the parts.
36AKA The Ecological Fallacy
- The Level of Measurement Matters
- (A Logical Validity Issue)
37Levels of Analysis Examples
- Individual: a person, single cell, atom, i.e. the
smallest discrete unit.
- Group: may meet face-to-face
- Organization: does not generally meet
face-to-face
- State: a geopolitical jurisdiction
- Nation: Like Texas, y'all.
38Aggregate measures cannot generally be used to
estimate disaggregated behavior. Conclusions
about individual-level behavior cannot be drawn
from aggregate comparisons. Example: Emile
Durkheim's study of suicide. Just because more
Bavarians commit suicide does NOT mean that
Catholics are more likely to commit suicide.
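A small numerical sketch shows how the aggregate pattern and the individual pattern can point in opposite directions. Every figure below is invented for illustration and is not Durkheim's data; the region names and the helper `aggregate_rate` are hypothetical.

```python
# Hypothetical regions: (share Catholic, Catholic rate, Protestant rate).
# Rates are per 100,000 people; all numbers are invented.
regions = {
    "Region A": (0.2, 10.0, 12.0),
    "Region B": (0.5, 14.0, 18.0),
    "Region C": (0.8, 18.0, 30.0),
}

def aggregate_rate(share_catholic, rate_catholic, rate_protestant):
    """Overall regional rate as a population-weighted average."""
    return (share_catholic * rate_catholic
            + (1 - share_catholic) * rate_protestant)

aggregate = {name: aggregate_rate(*vals) for name, vals in regions.items()}

for name, (share, rate_c, rate_p) in regions.items():
    print(f"{name}: {share:.0%} Catholic, overall rate {aggregate[name]:.1f}, "
          f"Catholic {rate_c:.1f} vs. Protestant {rate_p:.1f}")
```

In this made-up data the overall rate climbs as the Catholic share climbs, yet inside every region Catholics have the lower rate. The regional totals alone cannot reveal that.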
39Disaggregated data cannot generally be used to
estimate aggregate behavior. Conclusions about
aggregate behavior cannot be drawn from
individual-level data. Example: Hydrogen and
oxygen burn. H2O does not. Not ALL Texans carry
guns and wear cowboy hats. Not ALL Austinites
wear Speedos and ride 10-speeds downtown.
40Maybe Not?
- Gary King (1997). A Solution to the Ecological
Inference Problem, Princeton University Press.
- Within limits, there may be probable statements
about inferences between levels. The level of
certainty about such statements can be estimated.
- http://gking.harvard.edu/stats.shtml
41Myth 5: Attitudes indicate behavior.
42Attitudes ≠ Behavior
- Classic Case
- LaPiere, Richard T. Attitudes vs. Actions,
Social Forces, Vol. 13, No. 2. (Dec., 1934), pp.
230-237.
43Actual Behavior
44Customer Satisfaction?
- Case 2 (1983)
- Cenaré Italian Cuisine
- 404 East University Drive
- College Station, Texas
The tale of the half-price special!
45Dr. Robert A. Peterson
- Associate Dean for Research at the University of
Texas McCombs School of Business
- Robert A. Peterson and William R. Wilson (1992).
Measuring Customer Satisfaction: Fact and
Artifact, Journal of the Academy of Marketing
Science, Vol. 20, No. 1, 61-71.
- Customer satisfaction surveys may be measuring
how many happy people or unhappy people are in
the sample, nothing more.
46Myth 6: Quantitative data are different from
qualitative data.
47Developing Measures
- Quantification is merely a second order matching
of primary qualities.
- Karl Wolfgang Deutsch (1912-1992)
48Develop Powerful Measures!
- Three levels of measurement:
- Nominal: The weakest measure
- Ordinal: Mediocre, but not awful.
- Interval/Ratio: The best possible.
49Nominal Measures
- Nominal (Categorical): refers to opaque
qualities such as color, sex, nationality, groups, etc.
Must have no order or rank.
- Problem: There might be a hidden order to the
measure that is not immediately identifiable,
particularly in cases where social status may
correlate with other measures (income, education,
etc.). The existence of some hidden order is an
empirical question that can be tested.
50Ordinal Measures
- Ordinal Measures: have direction or
dimension, with greater and lesser ends to the
measure. Likert or Guttman scales (7-point,
5-point), but no specific distance between points.
Example: scalding, hot, warm, cool, cold,
freezing, etc.
- Problem: Survey question construction may prompt
an order (preference among candidates).
Randomization is a partial remedy.
51Interval/Ratio
- Interval / Ratio Measures: Most precise kind of
measures. They have a constant interval of some
kind and admit of degree and gradations, sometimes
referred to as a common metric.
- Problem: Intervals may not be constant (linear).
The measures may hide uneven increments. An
example is education in years. A year of college
is not equal to a year of elementary school
- (unless you went to t.u.)
52Develop Powerful Measures!
- The more precise the measure, the more powerful
the analytical techniques that can be used:
- Nominal: Crosstabs, Chi-square
- Ordinal: Tau-b, rank order correlation, etc.
- Interval/ratio: Regression, time-series, etc.
53Definitions Precision
- The precision of the measure depends on two
critical items:
- The quality of the definition, and
- The quality of the data collection system.
54Parts of a Good Definition
- A clear description of the purpose
- A clear description of what the measure is
supposed to measure
- A clear description of how the measure is to be
applied, which includes:
- Every step in the data collection process
- A means for identifying error in the collection
process (what the measure is not)
- An explanation of how the measure will be used.
55"There are no facts, only interpretations."
Friedrich Nietzsche (1844-1900)
56Context Matters
- What is the theory, hypothesis, or logic model
that makes this measure sensible?
- Is the measurement tied to a particular problem?
- Is the problem an intellectual/academic question
or a practical problem requiring a solution?
- What question is the measure supposed to answer?
57Some call them Paradigms
- Concept popularized by Thomas Kuhn in The
Structure of Scientific Revolutions (1962).
- The paradigm includes all the methods related to
the practice of a scientific endeavor, including
the instrumentation and operating assumptions.
- Example: "Tell me about your mother."
- http://en.wikipedia.org/wiki/Thomas_Samuel_Kuhn
58What is your context?
- Why do you need to measure something?
- To test a hypothesis?
- To make decisions about agency operations?
- To calculate cost/benefits?
- To demonstrate effectiveness?
- To understand what is happening?
- To find someone to blame?
59Theories that Work!
- On Good Theories: On the characteristics of a
good theory, see the work of Imre Lakatos,
especially his book, The Methodology of
Scientific Research Programmes: Philosophical
Papers Volume 1 (1977), and Harry G. Frankfurt's
On Truth. (See also On Bullshit.)
- Good theories exemplify the characteristics of
parsimony (simplicity, elegance), explanatory
power (apply in a wide variety of situations),
robustness (they operate in contaminated
environments), and empirical support (fit facts
better than others).
60Feeling Good
- was good enough for me
- and Bobby McGee
- Kris Kristofferson
- (b. 1936, Brownsville, Texas)
61Flow: The Psychology of Optimal Experience, by Mihaly
Csikszentmihalyi
[Figure: flow diagram. Axes: Challenges and Skills. Regions: Flow, Anxiety, Boredom.]
62The Good Work Project
- Recommended Reading: Martin E.P. Seligman,
Authentic Happiness.com (Book Website). See his
What You Can Change and What You Can't and The
Optimistic Child; also see The Science of
Optimism and Hope: Research Essays in Honor of
Martin E. P. Seligman, and Mihaly Csikszentmihalyi's
Flow: The Psychology of Optimal Experience.
- Also, see The Good Work Project website for
applications of these theories.
63Myth 7: Measures have to be exact.
64"...it is the mark of an educated man to look for
precision in each class of things just so far as
the nature of the subject admits..."
Aristotle, Nicomachean Ethics
65Special Cases for Estimation
- Measures that estimate ranges and compare
proportions across two or more dimensions.
- Measures that show relationships, trade-offs, and
thresholds.
- Measures that show what is not seen, residuals.
66Flight Envelope Summarizes
- Flight envelopes are estimated from available
data which show the following characteristics:
- a Take-off speed
- b Stalling speed
- c Ceiling, with corresponding speed
- d Maximum level speed
- e Maximum speed at altitude
- f Maximum sea level speed
67Two-Dimensional Flight Envelope
- Altitude (expressed in ranges)
- Speed (expressed in ranges)
68Comparing Flight Envelopes
- Combat helicopter (ex. Boeing AH-64 Apache)
- Cargo aircraft (ex. Lockheed C-130J)
- Subsonic transport aircraft (ex. Airbus A-300)
- Supersonic fighter aircraft (ex. Lockheed F-16C)
http://www.aerodyn.org/Atm-flight/flimit.html
69Measuring Inequality
- The Lorenz Curve describes any distribution of a
quantity across any population.
- The Gini coefficient provides a global estimation
of the degree of inequality within that
population.
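As a rough sketch of how the two fit together, the Gini coefficient can be computed as one minus twice the area under the Lorenz curve. The function names `lorenz_points` and `gini` and the sample incomes are hypothetical, chosen only for illustration.

```python
def lorenz_points(values):
    """Cumulative population share vs. cumulative share of the quantity."""
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points

def gini(values):
    """Gini = 1 - 2 * (area under the Lorenz curve), via trapezoids."""
    pts = lorenz_points(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return 1 - 2 * area

equal_incomes = [5, 5, 5, 5]      # everyone holds the same amount
one_has_all = [0, 0, 0, 10]       # one person holds everything

print("equal:      ", round(gini(equal_incomes), 3))
print("one has all:", round(gini(one_has_all), 3))
```

With only four people the most unequal case gives 0.75 rather than 1; the maximum approaches 1 as the population grows.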
70The Gini Coefficient
71Trade-offs
- Bounded by a zero point (no trade-off).
- Change in A Change in B 0
- Trade-offs between A and B may occur six ways:
- A increases, B decreases
- B increases, A decreases
- A increases more than B
- B increases more than A
- A decreases more than B
- B decreases more than A
72Four Trade-off Conditions
Potential Trade-offs    A Wins    B Wins
Net Increase            A > B     A < B
Net Decrease            A > B     A < B
73Four Basic Conditions
More on this later
74A Real-Life Measurement Problem
- The Mississippi Department of Wildlife,
Fisheries, and Parks has an 8-week backlog in
boat registration and sportsman's licenses.
- Delays do not discriminate between individuals,
whether they be:
- Farmers
- Bankers
- Legislators, or
- Governors.
75Myth 8: You have to observe subjects directly.
76The Sherlock Holmes Approach
- "We must fall back upon the old axiom that when
all other contingencies fail, whatever remains,
however improbable, must be the truth."
- Sherlock Holmes
- The Adventure of the Bruce-Partington Plans
- (Sir Arthur Conan Doyle)
77We're on the Case!
78Whatever is left
Using residuals to measure something indirectly
has been a very useful technique in several
arenas.
79The Most Famous Example
- The double helix of DNA was not observed
directly. In essence, Crick and Watson used
Rosalind Franklin's x-rays of wet and dry strands
of DNA.
- Essentially, they were looking at the shadow of
DNA, not the DNA itself.
80Example 2: Relative Political Capacity
- Initial observations:
- All political systems must have resources.
- Those that are able to obtain resources are
stronger than those that cannot.
- Wealthier populations are able to pay more taxes
than poorer populations.
- Some economies are easier to tax than others.
- People don't like to pay taxes, unless they know
they'll get the money back (e.g. Social Security).
81Predicted/Model vs. Actual
- Observations that fall on the regression line
were given a score of 1.00.
- Those above were scored as a ratio of their
predicted value: if double, then 2.00; if three times,
3.00; and so on.
- Those below their predicted tax rate were given
scores from 0 to .99, based on the percentage of
the predicted scores.
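The scoring rule above amounts to dividing each actual value by the value the regression predicts. The sketch below assumes a single predictor and an ordinary least-squares line; the real RPC model, its predictors, and its data are not given in these slides, so the variable names and numbers here are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def capacity_scores(x, actual):
    """Score = actual / predicted; 1.00 means exactly on the line."""
    slope, intercept = fit_line(x, actual)
    return [a / (intercept + slope * xi) for xi, a in zip(x, actual)]

# Invented data: a wealth index and the tax share actually collected.
wealth_index = [1, 2, 3, 4, 5]
tax_collected = [10, 25, 30, 35, 50]

scores = capacity_scores(wealth_index, tax_collected)
for w, s in zip(wealth_index, scores):
    print(f"wealth {w}: score {s:.2f}")
```

A score above 1.00 marks a government extracting more than its circumstances predict, and a score below 1.00 marks one extracting less. The score is a residual expressed as a ratio.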
82Results: Uses of RPCs
- Explains demographic transitions (population
explosions or lack thereof).
- Outcomes of wars between relatively
equal/unequal opponents.
- Black market exchange rates for currencies in
unstable countries.
83Let's Talk Performance!
84Real Men and Women Use Performance Measures!
(Weenies Don't)
- Performance measures should drive performance.
- There should be thresholds at which management
takes action to do something different.
- Example: Watch the altimeter for sudden drops;
pull up on the yoke if the numbers go down.
- Those actions should be defined in some sort of
plan.
- Example: At 500 feet, eject.
85A Barometer is not a Performance Measure!
86Benchmarking
- Choosing the Right Comparisons
87Myth 9: Collin County Community College is the
perfect peer.
88Peer-to-Peer
- Choose statistical neighbors (like you).
- Comparisons need to make sense.
- Choose those with a similar environment.
- Environments need to be controlled
analytically.
- Choose those who differ on performance.
- Variation requires explanation and understanding.
- Lack of variation means nothing can be learned.
89Best of Breed
- Choose those who out-perform the competition.
- That is the benchmark to beat.
- Include those who do not perform well.
- This avoids the mistake of Tom Peters.
- Compare environments, but choose on performance.
90Establish Baseline and Compare Trends
- Track your own performance over time.
- Identify key internal and external factors.
- Test explanations (hypotheses)
- Identify variations.
- If there are no variations, you cannot draw any
conclusions about causes. A constant explains
nothing.
91Myth 10: Good measures don't vary.
92Measures are Variables, and Variables Vary
- No Variance?
- No chance of improvement
- No Gains
- No Learning
93A Costly Example of No Variance
- Parties, Ideologies, and Budgets: A Study of
Budget Trade-offs in 18 OECD Countries
- Based on data from 1960 to 1990
- 65,000 cells of data drawn from more than 50
sources, taking six months to enter by hand.
94Results for Health vs. Defense
95Results for Education vs. Defense
96All is not lost
97Mona Lisa
98Discovery
99Myth 11: Performance measures will improve
performance.
100Do Performance Measures Improve Performance?
- The Case of Texas State Agencies
101Myth 12: Data integrity is exclusively a
reporting issue.
- Reporting is an operational issue.
102CREATING INTEGRITY BY DESIGN
- Alabama SMART Budgeting
- Training
103Qualities of Good Performance Measures in the
Real World
- RELIABILITY
- Consistency: Data can be replicated by a
competent, trained professional (e.g. Auditor).
- Accuracy: The indicators are true to the facts.
- VALIDITY
- Relevance: Measures relate to progress toward
realistic agency/organizational goals.
- Usefulness: They provide actionable indicators.
104Data Integrity Starts with People
- Checklist
- Are reporting roles clearly defined?
- Is there documentation?
- A paper trail for auditing?
- Written procedures for verifying data accuracy?
- Clear responsibility for reviewing and approving
performance measure reports?
- Is there management ownership for performance
measurement reports?
105(No Transcript)
106If the answer to any of the first five questions is
No:
- Go back to the beginning.
- Check every step from start to finish until the
error or problem is identified.
- If everything checks out, then it is time to look
at program operations for answers.
- This is a job that is the exclusive
responsibility of program management.
107Question 6: Identify Root Causes
- Is the change in performance the result of an
internal or external factor?
- Can the relationship between internal or external
factors and performance be demonstrated with
data?
- Do they correlate?
- What are the patterns, trends, etc.?
- What factors can be changed by management?
- Can staff, training, technology or funding change
the result?
- What do data indicate about these connections?
108Response to 6: Action Plan
- What is required to change results?
- What new activities will be required to make
those changes?
- What resources (or authority) would be required
to implement those new activities?
- Who will implement new activities?
- When can the new activities begin?
- How long will it take for the new activities to
have an effect?
109Measurement Disasters
- Tennessee Sour Mash
- Corn and Student Test Scores
110The Situation
- A University of Tennessee Ag Economics Professor
proposes using crop yield formulas for measuring
the value-added increases in student test
scores.
- The Tennessee General Assembly promptly enacts
the idea, granting the professor a contract as
the sole-source provider, naming him personally
in statute (name later removed in the Tennessee
Code).
111Question
- How do student test scores differ from corn?
[Images: Student Test Scores; Crop Yields. Credits: www.freephoto.com; Photo Credit Lloyd Wolf/U.S. Census Bureau]
112What Type of Measures Are They?
- Nominal?
- Ordinal?
- Interval?
- INTERVAL
- (BOTH MEASURES)
113Corn can always grow taller!
114Which School Would Do Better?
115"Not everything that counts can be measured, and
not everything that can be measured
counts." Attributed to Albert Einstein (1879-1955)
Before we accept the first premise, we have to
ask, "Have we tried?"
116Myth 13: Measures can't detect management
issues.
117Measuring What is Important
- Organizational Culture
- Turnover: Big Clue!
- Absenteeism: Big Clue 2!
- Lack of initiative, passivity: Clue 3
- Low morale: Starting to see a pattern?
- Anger, frustration, discipline problems
- Sense of hopelessness!!!!
- How do we measure this?
118Possible Index?
- TEN RULES FOR STIFLING INNOVATION
- Regard any new idea from below with
suspicion, because it's new, and because it's from
below.
- Insist that people who need your approval to act
first go through several other levels of
management to get their signatures.
- Ask departments or individuals to challenge and
criticize each other's proposals. (That saves you
the job of deciding; you just pick the survivor.)
- Discuss your criticisms freely, and withhold your
praise. That keeps people on their toes. Let them
know they can be fired at any time.
- Treat identification of problems as signs of
failure, to discourage people from letting you
know when something in their area isn't working.
119Cont
- TEN RULES FOR STIFLING INNOVATION (continued)
- Control everything carefully. Make sure people
count anything that can be counted, frequently.
- Make decisions to reorganize or change policies
in secret, and spring them on people
unexpectedly. (That also keeps people on their
toes.) Let them know that they can be fired at
any time.
- Make sure that requests for information are fully
justified, and make sure that it is not given out
to managers freely. (You don't want data to fall
into the wrong hands.)
- Assign to lower-level managers, in the name of
delegation and participation, responsibility for
figuring out how to cut back, lay off, move
people around, or otherwise implement threatening
decisions you have made, and get them to do it
quickly.
- And above all, never forget that you, the
higher-ups, already know everything important
about this business.
120- These rules reflect pure segmentalism in
action: a culture and an attitude that make it
unattractive and difficult for people in the
organization to take initiative to solve problems
and develop innovative solutions. Segmentalist
companies may not suffer from a lack of potential
innovators so much as from failure to make the
power available to those embryonic entrepreneurs
that they can use to innovate.
- And, when innovations do occur, segmentalist
organizations may not even be able to take
advantage of them.
- Rosabeth Moss Kanter, The Change Masters, 1982,
p. 101.
121Myth 14: Counting people is easy.
122How many people did you serve?
123Recidivism or Repeat Customers?
- Do unduplicated counts make more sense than
duplicated counts? Why?
- How do we count level of service?
- What if wrap-around services are effective and
one-shot taps on the head are not?
- What counts as service?
- How do we count costs for repeat customers or
those that consume more than one menu item?
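The difference between duplicated and unduplicated counts is easy to see on a small visit log. The records below are invented, and the client IDs and service names are hypothetical.

```python
from collections import Counter

# Each record is one service contact: (client ID, service received).
visits = [
    ("client_01", "meal"),
    ("client_02", "meal"),
    ("client_01", "housing"),
    ("client_03", "meal"),
    ("client_01", "counseling"),
    ("client_02", "housing"),
]

duplicated_count = len(visits)                              # every contact
unduplicated_count = len({client for client, _ in visits})  # distinct people
contacts_per_client = Counter(client for client, _ in visits)

print("duplicated (contacts):", duplicated_count)
print("unduplicated (people):", unduplicated_count)
print("contacts per client:  ", dict(contacts_per_client))
```

The same log answers "how many people did you serve?" with 6 or with 3, depending on the counting rule. The per-client tally is what lets you tell wrap-around service from one-shot contact.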
124"Life is not divided into federal block
grants." Robert Greenstein, Center on Budget and
Policy Priorities, NCSL Conference in Burlington,
VT, September 1995
125Myth 15: We've already counted everything that's
important.
126People are strange.
127Your measures need to capture reality!
- All relevant observations must fit somewhere on
the measure.
- If they don't, you're missing reality.
- Anomalies are as important as the normal
observations.
- We learn from measures when they help us see
something we would have missed.
128Outcome Measures: Telling the Tale that Wags
the Dog?
- Is anybody better off?
- Is anybody worse off?
- How can you tell?
- Adapted from Mark Friedman's Trying Hard Is Not
Good Enough.