Title: Measuring
1 Measuring
2 Got some spare time?
- Do people have more or less leisure time now than a generation ago?
  - The Overworked American says NO
  - Time for Life says YES
- To think about this we need to measure leisure time
  - Produce a number that can go up or down and corresponds to the amount of leisure time
- The first step in measurement is defining what you want to measure!
  - Time when you're not working, doing chores, commuting, etc.
3 Got some spare time?
- Once we've defined leisure time we have to actually produce the number that corresponds to it
  - Ask people: they forget
  - Have them write diaries to record their time: they forget, and the busier ones forget more
- It's hard to define the concept of leisure time, and it's hard to measure it even once we have a working concept
- Bottom line: measurement all by itself can be a very tricky process
4 Got some spare time?
- As we have learned, random samples and experiments are great, but before we can make progress with either of those we have to
  - Grapple with the concepts and definitions of what we want to measure with our variables, and
  - Actually attach numbers that go up and down to different levels of those things
- Don't trust measurements (numbers) until you know how they were measured
5 Measurement basics
- Once we have our sample or experimental subjects, we must measure whatever it is that our variables describe
- First: Are we measuring the right thing? Are there other important things that we are leaving out?
6 Example 1: But what about the patients?
- Clinical trials often measure obvious, easy-to-measure things that relate to a physical condition
  - Blood pressure
  - Tumor size
  - Virus concentration in the blood
- Often what matters most to patients is whether or not the treatment has a noticeable effect on their lives
- A study found that only 5% of published trials between 1980 and 1997 measured the effect of treatments on patients' emotional well-being and their ability to function day-to-day!
7 Measurement basics
- Second: Once we have decided what to measure, we must think hard about how to actually do the measurements
8 Example 2: Length, college readiness, highway safety
- To measure the length of a bed
  - The instrument you use is a tape measure
  - You can use inches or centimeters as the unit of measure
  - Your variable is the length of the bed in inches (if you chose inches as the unit)
- To measure a student's readiness for college
  - The SAT exam form is the instrument
  - The variable is the student's score on the SAT
9 Measurement basics
- These questions are useful when evaluating statistical studies
  - Exactly how is the variable defined?
  - Is the variable a valid way to describe the property it claims to measure?
  - How accurate are the measurements?
- It's unusual for us to design our own measurement instruments, so we won't go much further into this, apart from pointing out that we must understand how things are measured, along these lines
10 Know your variables
- Measurement is the process of turning concepts like length or college preparedness into numbers
- With length this is easy: we know exactly what length is
- This is much harder with many other things that are harder to define precisely
  - For example, what is necessary to be ready for college?
  - The SAT is one way of producing a number; there may be many others
11 Know your variables
- What about counting highway deaths? Are they
  - Pedestrians hit by cars
  - People in cars hit by other vehicles
- Do deaths that occur days after the accident count?
- This quickly becomes very murky
- Another example: what is a maternal death?
  - Just deaths during childbirth?
  - Deaths during pregnancy related to being pregnant?
  - Deaths days after childbirth due to complications with the birth?
12 Example 3: Measuring unemployment
- To be unemployed, someone must be in the labor force but without a job
- What is the labor force?
  - People who are employed, or who are available for work and actively looking for work
  - Retired people, students and others are not in the labor force, so they can't be counted as unemployed
- What is "without a job"?
  - If you are on strike but will return to your job, you are employed
- There are very careful and long definitions of "labor force" and "employed"
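As a rough sketch of the definition on this slide, the unemployment rate is the share of the labor force that is without a job. The function name `unemployment_rate` and the counts below are invented for illustration only; the official definitions are far more detailed.

```python
def unemployment_rate(employed, unemployed):
    """Percent of the labor force that is unemployed.

    People outside the labor force (retirees, students, ...) are
    deliberately not an input: they appear in neither count.
    """
    labor_force = employed + unemployed
    if labor_force == 0:
        raise ValueError("labor force is empty")
    return 100.0 * unemployed / labor_force


# Invented counts, for illustration only.
rate = unemployment_rate(employed=950, unemployed=50)
print(f"{rate:.1f}%")
```

Note that moving a person from "unemployed" to "not in the labor force" changes the rate even though nobody gained a job, which is why the categorizing questions matter.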
13 Example 3: Measuring unemployment
14 Know your variables
- The Bureau of Labor Statistics (BLS) uses information collected by the Current Population Survey (CPS) to calculate the unemployment rate
- One cannot simply ask "Are you in the labor force?"; instead, a series of questions is asked to properly categorize a person
- The graph on the previous slide shows what happened in 1994 when the BLS improved its questions for categorizing people in the labor force and introduced computer-assisted interviewing
- There is a big discontinuity in the unemployment rate in 1994 simply due to this change in the way unemployment is measured
15 Measurements, valid and invalid
- We all agree that using centimeters or inches to measure the length of something is okay
- Many people argue about using the SAT to measure college preparedness
- Why not just use something simple that we agree on?
  - Measure all prospective students' height in inches and admit the tallest 10%
  - Now that would create problems!
- Inches, or more properly length, is clearly valid as a measure for some things, but not for others
16 Measurements, valid and invalid
17 Example 4: Measuring highway safety
- Over time
- Roads get better
- SUVs replace cars
- Speed limits increase
- Enforcement reduces drunk driving
- How has highway safety changed during this time?
18 Example 4: Measuring highway safety
- Deaths have gone up
- The number of drivers has gone up
- The number of miles driven has gone up
- If more people are driving more miles, we expect
the number of deaths to go up
19 Example 4: Measuring highway safety
- The raw number of deaths is not a valid measure of highway safety, because just increasing the number of drivers and/or miles driven will increase the number of deaths
- How do we take account of the increased number of drivers and increased number of miles driven?
- Instead of a count, we use a rate
- The number of deaths per mile driven takes into account the fact that more people are driving more miles
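A minimal sketch of the count-versus-rate point. The function name `deaths_per_100m_miles`, the choice of "per 100 million miles" as the scale, and all figures are assumptions made up for illustration; they are not real highway statistics.

```python
def deaths_per_100m_miles(deaths, vehicle_miles):
    """Fatality rate per 100 million vehicle miles driven."""
    if vehicle_miles <= 0:
        raise ValueError("vehicle_miles must be positive")
    return deaths * 100_000_000 / vehicle_miles


# Invented figures for two hypothetical years, for illustration only.
earlier = {"deaths": 40_000, "vehicle_miles": 2_000_000_000_000}
later = {"deaths": 42_000, "vehicle_miles": 2_800_000_000_000}

rate_earlier = deaths_per_100m_miles(**earlier)
rate_later = deaths_per_100m_miles(**later)

# The count of deaths rises while the rate per mile falls.
print(later["deaths"] > earlier["deaths"])
print(rate_later < rate_earlier)
```

With these made-up numbers the count alone would suggest the roads got more dangerous, while the rate suggests the opposite.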
20 Example 4: Measuring highway safety
- Taking into account the increased potential for fatal road accidents in 2002, the rate at which deaths occurred actually fell
- Driving has been getting safer!
21 Measurements, valid and invalid
- Examples of invalid measures
- Height to measure college readiness
- Counts when rates are needed
- There are other measures that are harder to
classify as valid or invalid
22 Example 5: Achievement tests
- Largely valid and uncontroversial
- A statistics exam is a valid measure of a student's mastery of the course material if it asks about the main topics included in the syllabus for the course
- The SAT is largely valid as a measure of college readiness because it covers a well-defined set of topics that should be covered in most high school curricula
- Experts can judge the validity of measures like these by comparing the test questions with the syllabus in question
23 Example 6: IQ tests
- Much disagreement about validity
- Psychologists would like to measure aspects of human personality that cannot be directly observed, like intelligence
- Is an IQ test a valid measure of intelligence?
  - Some say YES with conviction
  - Some say NO with equal conviction
24 Example 6: IQ tests
- The YES camp is convinced that there is such a thing as "general intelligence" and that it can be reasonably measured
- The NO camp is convinced there is only a collection of various mental abilities, and that a single number cannot measure them all
- This is a serious disagreement about the nature of intelligence: what it is we are measuring
- This leads to a serious disagreement about the validity of IQ tests
25 Measurements, valid and invalid
- Statistics as a science does not help resolve questions of validity like this
- If the idea or definition of what is to be measured is vague, then validity is a matter of opinion
- However, statistics can help if we think more carefully about validity
26 Measurements, valid and invalid
- Is the SAT a valid measure of readiness for college?
- "Readiness for college academic work" is pretty vague
- It likely combines
  - Inborn intelligence
  - Learned knowledge
  - Study and test-taking skills
  - Motivation, etc.
- Instead, let's ask a simpler question
  - Do SAT scores predict students' success at college?
27 Measurements, valid and invalid
- Success at college is easy to measure
- Does a student graduate?
- College grades
- Students with higher SAT scores are
- More likely to graduate
- On average earn higher grades in college
- Because of this we say that SAT scores have
predictive validity as measures of college
readiness
28 Measurements, valid and invalid
- Predictive validity is clear and useful
- However, predictive validity does not provide a yes/no answer
- We still need to ask how accurately SAT scores predict college success
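One common way to put a number on "how accurately" is a correlation coefficient. The slides do not name a method, so treating Pearson correlation as the measure is an assumption here, and the scores below are invented for six hypothetical students.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented scores for six hypothetical students, for illustration only.
sat_scores = [1000, 1100, 1200, 1300, 1400, 1500]
college_gpa = [2.6, 3.1, 2.9, 3.4, 3.2, 3.7]

r = pearson_r(sat_scores, college_gpa)
print(round(r, 2))
```

A value near 1 would mean SAT scores track college grades closely; a value near 0 would mean they predict little. The same calculation can be repeated separately for different groups of students.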
29 Measurements, valid and invalid
- We must also ask for what groups SAT scores have predictive validity
- For example, SAT scores might have high predictive validity for women but not for men
30 Measurements, accurate and inaccurate
- Measurements can be valid without being accurate
- Think about your bathroom scale
  - It is an appropriate instrument that provides a valid measure of your weight
  - However, it may not be accurate; say it reads 3 pounds too heavy all the time, so
  - Measured weight = true weight + 3 pounds
- This is a consistent error, which you could correct for, but there may be other things too
  - Maybe the scale is rusty, so it sticks sometimes and gives slightly erratic readings
31 Measurements, accurate and inaccurate
- If you step on and off 3 times, you might get these weights
  - Measured weight = true weight + 3 pounds ± 0.5 pound
  - Measured weight = true weight + 3 pounds ± 0.1 pound
  - Measured weight = true weight + 3 pounds ± 0.2 pound
- This will go on forever if you keep stepping on and off the scale
32 Measurements, accurate and inaccurate
- The scale has two kinds of error
  - The 3 pounds that it adds to the real weight every time someone steps on is called bias: a consistent, systematic error
  - The erratic, unpredictable error caused by the sticking is called random error
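The two kinds of error can be sketched with a simulated scale. The 3-pound bias comes from the slides; the true weight of 150 pounds, the Gaussian shape of the random error, and its size (`noise_sd=0.3`) are assumptions chosen only for illustration.

```python
import random
import statistics


def read_scale(true_weight, bias, noise_sd, rng):
    """One reading: true weight + systematic bias + random error."""
    return true_weight + bias + rng.gauss(0.0, noise_sd)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_weight = 150.0     # hypothetical true weight in pounds
readings = [read_scale(true_weight, bias=3.0, noise_sd=0.3, rng=rng)
            for _ in range(10_000)]

errors = [r - true_weight for r in readings]
estimated_bias = statistics.mean(errors)   # settles near the 3-pound bias
random_spread = statistics.stdev(errors)   # settles near noise_sd
print(round(estimated_bias, 1), round(random_spread, 1))
```

Averaging many readings removes most of the random error but leaves the bias untouched, which is why the two are treated as separate problems.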
33 Measurements, accurate and inaccurate
- Reliability is the ability to produce repeated measurements that differ from each other very little
- An instrument can be both reliable and biased at the same time
34 Improving reliability, reducing bias
- Measuring time has been one of man's primary challenges and fascinations for a very long time
- So what time is it really?
- Astronomical measures of time have to do with the motion of the earth and other orbiting bodies
  - One year is one revolution of the earth around the sun
  - One day is one rotation of the earth on its axis, etc.
- But all these physical phenomena are erratic on short time scales: days get longer and shorter, and believe it or not so do years!
35 Improving reliability, reducing bias
- So, there's a better way of measuring time
- Since 1967, the standard second has served as the basic unit of time
  - A standard second is equivalent to 9,192,631,770 vibrations of an atom of the element cesium
  - Cesium atoms vibrate VERY regularly
- Physical clocks, including the planets, are affected by all kinds of things like temperature, humidity, gravity of other massive or moving objects, etc.
  - Cesium atoms don't care at all about these things
36 Example 9: Really accurate time
- The National Institute of Standards and Technology (NIST) has a cesium atomic clock that it uses to measure time very accurately
  - But it is not completely accurate
- World standard time is kept by the International Bureau of Weights and Measures (BIPM) in Sèvres, France
  - BIPM averages the times from 200 atomic clocks around the world
  - Then NIST compares its times to the BIPM average to see how well it's doing
37 Example 9: Really accurate time
- A sample of the differences between NIST's time and the BIPM average (in seconds) is
  - 0.0000000069
  - -0.0000000020
  - 0.0000000067
  - -0.0000000045
  - 0.0000000063
  - -0.0000000046
- It's clear that the average of these errors is near 0; that is, the NIST clock is not biased, because there is no systematic difference between NIST's times and the world standard time
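The "average near 0" claim can be checked directly from the six differences on this slide. Using the sample standard deviation as a rough gauge of the random error is an assumption of this sketch, not something the slides specify.

```python
import statistics

# Differences between NIST's time and the BIPM average, in seconds,
# copied from the slide.
differences = [
    0.0000000069,
    -0.0000000020,
    0.0000000067,
    -0.0000000045,
    0.0000000063,
    -0.0000000046,
]

mean_difference = statistics.mean(differences)  # rough estimate of bias
spread = statistics.stdev(differences)          # rough size of random error

print(f"mean   = {mean_difference:.2e} s")
print(f"spread = {spread:.2e} s")
```

The mean comes out smaller than the spread of the individual differences, which is consistent with the differences being mostly random error rather than a systematic offset.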
38 Improving reliability, reducing bias
- Scientists everywhere repeat measurements and use the average to get more reliable results
- Just as taking a larger sample reduces random variation in a sample statistic, averaging over many measurements reduces variation in the final result
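A small simulation can illustrate why averaging helps. Everything here is an assumption for the sake of the sketch: an unbiased instrument, Gaussian random error with standard deviation 1, a true value of 10, and averages of 25 readings.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable


def measure(true_value=10.0, noise_sd=1.0):
    """One unbiased measurement with random error (assumed Gaussian)."""
    return true_value + rng.gauss(0.0, noise_sd)


def averaged_measurement(n):
    """Average of n repeated measurements of the same quantity."""
    return statistics.mean(measure() for _ in range(n))


singles = [measure() for _ in range(2_000)]
averages = [averaged_measurement(25) for _ in range(2_000)]

sd_single = statistics.stdev(singles)    # variation of single readings
sd_average = statistics.stdev(averages)  # variation of 25-reading averages
print(sd_average < sd_single)
```

Under these assumptions the averages vary far less than single readings do. Averaging does nothing for bias, though: a biased instrument averaged many times is still biased.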
39 Pity the poor psychologist
- The Big Idea in both sampling and measurement is to ask "What would happen if we did this over and over again?"
- In sampling we want to estimate a population parameter, and we want that estimate to be unbiased and not vary too much from sample to sample
- In measurement we also want our measurement to be unbiased and not to vary too much from measurement to measurement
- In both cases we want to eliminate systematic error (bias) and random error (variability due to chance)
- This is all straightforward for things like weight and time
40 Pity the poor psychologist
- It gets much harder when we want to measure something like personality
- How would we measure an "authoritarian personality"?
- After WWII some psychologists wanted to study the authoritarian personality to see if it might help explain why some people are disposed to rigid thinking and blindly following strong leaders
41 Pity the poor psychologist
- They developed the F-scale to measure this; it asks how strongly you agree or disagree with statements like
  - "Obedience and respect for authority are the most important virtues children should learn"
  - "Science has its place, but there are many important things that can never be understood by the human mind"
- Strong agreement marks a person as authoritarian
- How would we think about bias and reliability with this?
42 Summary
- To measure means to assign a number to some property of an individual or thing
- A variable contains the values of measurements taken on many individuals or things
- Always ask how the variables are defined and what they might leave out
- A measure is valid if it properly describes the properties it purports to measure
- Predictive validity compares a measure to a future outcome; this is especially useful when validity is hard to pin down, as with the SAT
43 Summary
- Errors in measurement have two components
  - Systematic differences from the true value are bias
  - Variable differences from the true value are random error
- A measurement is composed of
  - measured value = true value + bias + random error
- A reliable measure is one that has small random error
- The thing we use to make a measurement is called an instrument