Title: MEASUREMENT THEORY AND ITS
1 MEASUREMENT THEORY AND ITS APPLICATIONS
FRED S. ROBERTS
RUTGERS UNIVERSITY
2 MEASUREMENT
Measurement has something to do with numbers.
Representational theory of measurement: Assign numbers to
objects being measured in such a way that certain empirical
relations are preserved. In measurement of temperature, we
preserve a relation "warmer than." In measurement of mass,
we preserve a relation "heavier than."
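3 $A$ = set of objects, $R$ = binary relation on $A$.
We seek a function $f: A \to \mathbb{R}$ such that
$aRb \Leftrightarrow f(a) > f(b)$.
$R$ could be preference. Then $f$ is a utility
function (ordinal utility function). In mass,
there is more going on: there is an operation of
combination of objects, and mass is additive.
$a \circ b$ means $a$ combined with $b$, and
$f(a \circ b) = f(a) + f(b)$.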
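4 Homomorphisms
Relational system: $\mathfrak{A} = (A, R_1, \ldots, R_p, \circ_1, \ldots, \circ_q)$,
where $A$ = set, $R_i$ = relation on $A$ (not necessarily binary),
$\circ_j$ = operation on $A$ (binary).
A homomorphism from $\mathfrak{A}$ into
$\mathfrak{B} = (B, R_1', \ldots, R_p', \circ_1', \ldots, \circ_q')$
is a function $f: A \to B$ such that
$(a_1, \ldots, a_m) \in R_i \Leftrightarrow (f(a_1), \ldots, f(a_m)) \in R_i'$
and $f(a \circ_j b) = f(a) \circ_j' f(b)$.
5 (This only makes sense if $R_i$ and $R_i'$ are both $m$-ary.)
Examples of Homomorphisms
Example 1: Let $a \bmod 3$ be the number $b$ in
$\{0,1,2\}$ such that $a \equiv b \pmod 3$. Let
$A = \{0,1,2,\ldots,26\}$ and $aRb \Leftrightarrow a \equiv b \pmod 3$.
Then $f(a) = a \bmod 3$ is a homomorphism from $(A,R)$ into
$(\{0,1,2\}, =)$: $f(11) = 2$, $f(6) = 0$, ...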
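6 Example 2: $A = \{a,b,c,d\}$,
$R = \{(a,b),(b,c),(a,c),(a,d),(b,d),(c,d)\}$.
$f(a) = 4$, $f(b) = 3$, $f(c) = 2$, $f(d) = 1$.
$f$ is a homomorphism from $(A,R)$ into $(\mathbb{R}, >)$.
Second homomorphism: $g(a) = 10$, $g(b) = 4$, $g(c) = 2$, $g(d) = 0$.
Example 3: $A = \{a,b,c\}$, $R = \{(a,b),(b,c),(c,a)\}$.
There is no homomorphism $f$ from $(A,R)$ into $(\mathbb{R}, >)$:
$aRb \Rightarrow f(a) > f(b)$ and $bRc \Rightarrow f(b) > f(c)$.
Therefore, $f(a) > f(c)$. But $cRa \Rightarrow f(c) > f(a)$,
a contradiction.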
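The checks in Examples 2 and 3 can be sketched in a few lines of Python. This is only an illustration, assuming the target system is $(\mathbb{R}, >)$; the helper name `is_order_homomorphism` is hypothetical. For Example 3 only the relative order of the three values matters, so trying every assignment of values from {0, 1, 2} covers all cases.

```python
from itertools import product

def is_order_homomorphism(elements, relation, f):
    """Return True if aRb <=> f(a) > f(b) for every ordered pair (a, b)."""
    return all(((a, b) in relation) == (f[a] > f[b])
               for a in elements for b in elements)

# Example 2 from the slides.
A2 = ["a", "b", "c", "d"]
R2 = {("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")}
f = {"a": 4, "b": 3, "c": 2, "d": 1}
g = {"a": 10, "b": 4, "c": 2, "d": 0}

# Example 3 from the slides: a 3-cycle.
A3 = ["a", "b", "c"]
R3 = {("a", "b"), ("b", "c"), ("c", "a")}

# Every possible ordering (with ties) of three values appears in {0,1,2}^3.
candidates = [dict(zip(A3, values)) for values in product(range(3), repeat=3)]
example3_has_homomorphism = any(
    is_order_homomorphism(A3, R3, h) for h in candidates
)

print(is_order_homomorphism(A2, R2, f))
print(is_order_homomorphism(A2, R2, g))
print(example3_has_homomorphism)
```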
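7 Example 4: $A = \{0,1,2,\ldots\}$, $B = \{0,2,4,\ldots\}$.
A homomorphism from $(A, >, +)$ into $(B, >, +)$:
$f(a) = 2a$, since $a > b \Leftrightarrow f(a) > f(b)$ and
$f(a + b) = f(a) + f(b)$. Second homomorphism: $g(a) = 4a$.
A homomorphism does not have to be onto.
Example 5: A homomorphism from $(\mathbb{R}, >, +)$ into
$(\mathbb{R}^+, >, \times)$: $f(a) = e^a$.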
8 Temperature Measurement: homomorphism from $(A, R)$ into
$(\mathbb{R}, >)$.
Mass Measurement: homomorphism from $(A, R, \circ)$ into
$(\mathbb{R}, >, +)$. Since negative numbers do not arise in
the case of mass, we can think of the homomorphism as being
into $(\mathbb{R}^+, >, +)$.
The Two Problems of Representational Measurement Theory
Representation Problem: Find conditions on $\mathfrak{A}$
(necessary and) sufficient for the existence of a
homomorphism from $\mathfrak{A}$ into $\mathfrak{B}$.
Subproblem: Find an algorithm for obtaining the homomorphism.
9 If $f$ is a homomorphism from $\mathfrak{A}$ into
$\mathfrak{B}$, we call $(\mathfrak{A}, \mathfrak{B}, f)$ a
scale. A representation theorem is a solution to the
representation problem.
Uniqueness Problem: Given a homomorphism from $\mathfrak{A}$
into $\mathfrak{B}$, how unique is it? A uniqueness theorem
is a solution to the uniqueness problem.
A representation theorem gives conditions under which
measurement can take place. A uniqueness theorem is used to
classify scales and to determine what assertions arising
from scales are meaningful in a sense we shall make precise.
10 The Theory of Uniqueness: Admissible Transformations
An admissible transformation sends one acceptable scale
into another. In most cases one can think of an admissible
transformation as defined on the range of a homomorphism.
Suppose $f$ is a homomorphism from $\mathfrak{A}$ into
$\mathfrak{B}$. A function $\varphi: f(A) \to B$ is called an
admissible transformation of $f$ if $\varphi \circ f$ is
again a homomorphism from $\mathfrak{A}$ into $\mathfrak{B}$.
11 Example It is a homomorphism ? is
admissible
12 Can every homomorphism $g$ be obtained from a
homomorphism $f$ by an admissible transformation
of scale? No. (Semiorders) Two homomorphisms from
$(A,R)$ into $(B,S)$: $f(0) = f(1/2) = 0$, $f(10) = 10$;
$g(0) = 0$, $g(1/2) = 1/2$, $g(10) = 10$. There is
no $\varphi: f(A) \to B$ such that $g = \varphi \circ f$,
because $f(0) = f(1/2)$ while $g(0) \neq g(1/2)$. A
homomorphism $f$ is called irregular if there is a
homomorphism $g$ not attainable from $f$ by an
admissible transformation.
13 Irregular scales are unusual and annoying. We
shall assume (until later) that all scales are
regular. Most common scales of measurement are.
In the case where all scales are regular, we can
develop the theory of uniqueness by studying
admissible transformations. A classification of
scales is obtained by studying the class of
admissible transformations associated with the
scale. This defines the scale type. (S.S.
Stevens)
14 Some Common Scale Types
Class of Adm. Transfs. | Scale Type | Example
$\varphi(x) = \alpha x$, $\alpha > 0$ | ratio | Mass; Temp. (Kelvin); Time (intervals); Loudness (sones)?; Brightness (brils)?
$\varphi(x) = \alpha x + \beta$, $\alpha > 0$ | interval | Temp. (F, C); Time (calendar); IQ tests (standard scores)?
15 Class of Adm. Transfs. | Scale Type | Example
$\varphi$ strictly increasing | ordinal | Preference?; Hardness; Grades of leather, wool, etc.; IQ tests (raw scores)?
$\varphi(x) = x$ | absolute | Counting
16 Theorem (Roberts and Franke 1976): If every
homomorphism from $\mathfrak{A}$ into $\mathfrak{B}$ is
regular and $f$ and $g$ are homomorphisms, then $f$ is a
ratio scale iff $g$ is a ratio scale. Same for
interval, ordinal, absolute, etc. However, this
fails in the irregular case.
17 Meaningful Statements. In measurement theory, we
speak of a statement as being meaningful if its
truth or falsity is not an artifact of the
particular scale values used. Assuming all
scales involved are regular, one can use the
following definition (due to Suppes 1959 and
Suppes and Zinnes 1963). Definition A
statement involving numerical scales is
meaningful if its truth or falsity is unchanged
after any (or all) of the scales is transformed
(independently?) by an admissible transformation.
18 If some scales are irregular, we have to modify
the definition. Alternate Definition: A
statement involving numerical scales is
meaningful if its truth or falsity is unchanged
after any (or all) of the scales is
(independently?) replaced by another acceptable
scale. We shall usually disregard the
complication. (But it does arise in practical
examples, for example those involving preference
judgments under the semiorder model.) Even this
alternative definition is not always applicable
in some situations. However, it is a useful
definition for a lot of situations and we shall
adopt it.
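19 "This talk will be three times as long as the
previous talk." Is this meaningful? We have a
ratio scale (time intervals).
(1) $f(a) = 3f(b)$.
This is meaningful if $f$ is a ratio scale. For, an
admissible transformation is $\varphi(x) = \alpha x$,
$\alpha > 0$. We want (1) to hold iff
(2) $(\varphi \circ f)(a) = 3(\varphi \circ f)(b)$.
But (2) becomes
(3) $\alpha f(a) = 3\alpha f(b)$,
and (1) holds iff (3) holds since $\alpha > 0$.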
20 "The high temperature today was three times the
high temperature yesterday." Meaningless. It
could be true with Fahrenheit and false with
Centigrade, or vice versa. In general:
For ratio scales, it is meaningful to compare
ratios: $f(a)/f(b) > f(c)/f(d)$.
For interval scales, it is meaningful to compare
intervals: $f(a) - f(b) > f(c) - f(d)$.
For ordinal scales, it is meaningful to compare
size: $f(a) > f(b)$.
21 "I weigh 1000 times what the Statue of Liberty
weighs." Meaningful. It involves ratio
scales. It is false no matter what the
unit. Meaningfulness is different from
truth. It has to do with what kinds of
assertions it makes sense to make, which
assertions are not accidents of the particular
choice of scale (units, zero points) in use.
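22 Average Performance
Study two groups of individuals, machines, algorithms, etc.
under different conditions. $f(a)$ is the time required
by $a$ to finish a task. Data suggests that the
average performance of individuals in the first
group is better than the average performance of
individuals in the second group.
$a_1, \ldots, a_n$ = individuals in first group;
$b_1, \ldots, b_m$ = individuals in second group.
(1) $\frac{1}{n}\sum_{i=1}^{n} f(a_i) > \frac{1}{m}\sum_{i=1}^{m} f(b_i)$
We are comparing arithmetic means.
23 Statement (1) is meaningful iff for all
admissible transformations of scale $\varphi$, (1) holds iff
(2) $\frac{1}{n}\sum_{i=1}^{n} (\varphi \circ f)(a_i) > \frac{1}{m}\sum_{i=1}^{m} (\varphi \circ f)(b_i)$
Performance = length of time, a ratio scale. Thus,
$\varphi(x) = \alpha x$, $\alpha > 0$, so (2) becomes
(3) $\frac{1}{n}\sum_{i=1}^{n} \alpha f(a_i) > \frac{1}{m}\sum_{i=1}^{m} \alpha f(b_i)$
Then $\alpha > 0$ implies (1) holds iff (3) holds.
Hence, (1) is meaningful.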
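24 Note: (1) is still meaningful if $f$ is an
interval scale. For example, the task could be
to heat something a certain number of degrees and
$f(a)$ = temperature increase produced by $a$.
Here, $\varphi(x) = \alpha x + \beta$, $\alpha > 0$.
Then (2) becomes
(4) $\frac{1}{n}\sum_{i=1}^{n} [\alpha f(a_i) + \beta] > \frac{1}{m}\sum_{i=1}^{m} [\alpha f(b_i) + \beta]$
This readily reduces to (1). However, (1) is meaningless
if $f$ is just an ordinal scale.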
25 To show that comparison of arithmetic means can
be meaningless for ordinal scales: Suppose $f(a)$
is measured on an ordinal scale, e.g., a 5-point
scale: 5 = excellent, 4 = very good, 3 = good, 2 = fair,
1 = poor. In such a scale, the numbers don't mean
anything; only their order matters.
Group 1: 5, 3, 1; average 3.
Group 2: 4, 4, 2; average 3.33 (greater).
Admissible transformation: $5 \to 100$, $4 \to 75$,
$3 \to 65$, $2 \to 40$, $1 \to 30$.
The new scale conveys the same information. New scores:
Group 1: 100, 65, 30; average 65 (greater).
Group 2: 75, 75, 40; average 63.33.
26 Thus, comparison of arithmetic means can be
meaningless for ordinal data. Of course, you may
argue that in the 5-point scale, at least equal
spacing between scale values is an inherent
property of the scale. In that case, the scale
is not ordinal and this example does not
apply. On the other hand, comparing medians is
meaningful with ordinal scales: To say that one
group has a higher median than another group is
preserved under admissible transformations.
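The 5-point example can be replayed numerically. The sketch below uses the group scores and the strictly increasing recoding 5→100, 4→75, 3→65, 2→40, 1→30 that the new scores imply; the variable names are illustrative only.

```python
from statistics import mean, median

group1 = [5, 3, 1]
group2 = [4, 4, 2]

# Strictly increasing recoding of the 5-point scale (admissible for ordinal data).
phi = {5: 100, 4: 75, 3: 65, 2: 40, 1: 30}

def transform(scores):
    """Apply the recoding phi to every score."""
    return [phi[s] for s in scores]

# "Group 2 has the higher mean" flips after the recoding.
mean_claim_before = mean(group1) < mean(group2)
mean_claim_after = mean(transform(group1)) < mean(transform(group2))

# "Group 2 has the higher median" survives the recoding.
median_claim_before = median(group1) < median(group2)
median_claim_after = median(transform(group1)) < median(transform(group2))

print(mean_claim_before, mean_claim_after)
print(median_claim_before, median_claim_after)
```

The mean comparison changes truth value under the recoding, while the median comparison does not.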
27 Importance Ratings/Performance Ratings
Suppose each of $n$ individuals is asked to rate each of
a collection of alternative variables, people,
machines, policies as to their relative
importance. Or we rate the alternatives on
different criteria or against different
benchmarks. Let $f_i(a)$ be the rating of
alternative $a$ by individual $i$ (under
criterion $i$). Is it meaningful to assert that
the average rating of alternative $a$ is higher
than the average rating of alternative $b$? A
similar question arises in performance ratings,
loudness ratings, brightness ratings, confidence
ratings, etc.
(1) $\frac{1}{n}\sum_{i=1}^{n} f_i(a) > \frac{1}{n}\sum_{i=1}^{n} f_i(b)$
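28 If each $f_i$ is a ratio scale, then we consider,
for $\alpha > 0$,
(2) $\frac{1}{n}\sum_{i=1}^{n} \alpha f_i(a) > \frac{1}{n}\sum_{i=1}^{n} \alpha f_i(b)$
Clearly, (1) holds iff (2) holds, so (1) is meaningful.
Problem: $f_1, f_2, \ldots, f_n$ might have independent
units. In this case, we want to allow independent admissible
transformations of the $f_i$. Thus, we must consider
(3) $\frac{1}{n}\sum_{i=1}^{n} \alpha_i f_i(a) > \frac{1}{n}\sum_{i=1}^{n} \alpha_i f_i(b)$
It is easy to see that there are $\alpha_i$
so that (1) holds and (3) fails. Thus, (1) is
meaningless.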
29 Motivation for considering different $\alpha_i$:
$n = 2$, $f_1(a)$ = weight of $a$, $f_2(a)$ = height
of $a$. Then (1) says that the average of
$a$'s weight and height is greater than the
average of $b$'s weight and height. This could be
true with one combination of weight and height
scales and false with another. Conclusion: Be
careful when comparing arithmetic mean importance
or performance ratings.
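30 In this context, it is safer to compare geometric
means (Dalkey).
$\sqrt[n]{\prod_{i=1}^{n} f_i(a)} > \sqrt[n]{\prod_{i=1}^{n} f_i(b)} \Leftrightarrow \sqrt[n]{\prod_{i=1}^{n} \alpha_i f_i(a)} > \sqrt[n]{\prod_{i=1}^{n} \alpha_i f_i(b)}$,
all $\alpha_i > 0$. Thus, if each $f_i$ is a
ratio scale, if individuals can change importance
rating scales (performance rating scales)
independently, then comparison of geometric means
is meaningful while comparison of arithmetic
means is not.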
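A small numerical sketch of the point about independent units. The ratings and the rescaling factors below are hypothetical, chosen only so that the arithmetic-mean comparison flips; the geometric-mean comparison cannot flip, because every geometric mean is multiplied by the same factor $\sqrt[n]{\alpha_1 \cdots \alpha_n}$.

```python
from math import prod

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

def rescale(xs, alphas):
    """Change the unit of each rating scale independently (alpha_i > 0)."""
    return [alpha * x for alpha, x in zip(alphas, xs)]

# Hypothetical ratings of alternatives a and b on n = 2 independent ratio scales.
ratings_a = [10.0, 1.0]
ratings_b = [4.0, 6.0]
alphas = [1.0, 10.0]  # hypothetical unit change on the second scale only

arith_before = arithmetic_mean(ratings_a) > arithmetic_mean(ratings_b)
arith_after = (arithmetic_mean(rescale(ratings_a, alphas))
               > arithmetic_mean(rescale(ratings_b, alphas)))

geom_before = geometric_mean(ratings_a) > geometric_mean(ratings_b)
geom_after = (geometric_mean(rescale(ratings_a, alphas))
              > geometric_mean(rescale(ratings_b, alphas)))

print(arith_before, arith_after)  # the arithmetic comparison flips
print(geom_before, geom_after)    # the geometric comparison does not
```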
31 Application: Roberts (1972, 1973). A panel of
experts each estimated the relative importance
of variables relevant to energy demand using the
magnitude estimation procedure. If magnitude
estimation leads to a ratio scale -- Stevens
presumes this -- then comparison of geometric
mean importance ratings is meaningful. However,
comparison of arithmetic means may not be.
Geometric means were used.
32 Magnitude Estimation by One Expert of Relative
Importance for Energy Demand of Variables Related
to Commuter Bus Transportation in a Given Region

Variable                                   Rel. Import. Rating
1.  No. bus passenger mi. annually          80
2.  No. trips annually                     100
3.  No. miles of bus routes                 50
4.  No. miles special bus lanes             50
5.  Average time home to office             70
6.  Average distance home to office         65
7.  Average speed                           10
8.  Average no. passengers per bus          20
9.  Distance to bus stop from home          50
10. No. buses in the region                 20
11. No. stops, home to office               20
33 Performance Evaluation
Fleming and Wallace (1986) make a similar point about
performance evaluation of computer systems. A number of
systems are tested on different benchmarks.
Their scores on each benchmark are normalized
relative to the score of one of the systems. The
normalized scores of a system are combined by
some averaging procedure. If the averaging is
the arithmetic mean, then the statement that one
system has a higher arithmetic mean normalized
score than another system is meaningless: The
system to which scores are normalized can
determine which has the higher arithmetic mean.
34 However, if the geometric mean is used, this problem
does not arise (under reasonable assumptions
about scores). performance of
system u on benchmark i. F is a combined
score such as arithmetic mean. (normalized
score relative to system x) (normalized score
relative to system y)
35 Raw scores by system

Benchmark   System R   System M   System Z
1                417        244        134
2                 83         70         70
3                 66        153        135
4              39449      33527      66000
5                772        368        369
36 Scores Normalized to System R

Benchmark        R      M      Z
1             1.00   0.59   0.32
2             1.00   0.84   0.85
3             1.00   2.32   2.05
4             1.00   0.85   1.67
5             1.00   0.48   0.45
Arith. Mean   1.00   1.01   1.07

37 Scores Normalized to System M

Benchmark        R      M      Z
1             1.71   1.00   0.55
2             1.19   1.00   1.00
3             0.43   1.00   0.88
4             1.18   1.00   1.97
5             2.10   1.00   1.00
Arith. Mean   1.32   1.00   1.08
38 Geometric Means

Geom. Mean of Scores Normalized Relative to R
   R      M      Z
1.00   0.86   0.84

Geom. Mean of Scores Normalized Relative to M
   R      M      Z
1.17   1.00   0.99

Similar situation if we test students on different benchmarks.
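The normalization effect can be recomputed from the raw scores. This is a sketch, not Fleming and Wallace's own procedure; the helper names are illustrative, and R's score on benchmark 5 is taken to be 772, the value implied by the normalized tables (772/368 ≈ 2.10 and 368/772 ≈ 0.48). Rankings are compared rather than rounded means, since the printed tables are rounded.

```python
from math import prod

# Raw benchmark scores; R's benchmark-5 entry of 772 is an assumption
# chosen to agree with the normalized tables.
raw = {
    "R": [417, 83, 66, 39449, 772],
    "M": [244, 70, 153, 33527, 368],
    "Z": [134, 70, 135, 66000, 369],
}

def normalized(system, base):
    """Scores of `system` divided benchmark by benchmark by those of `base`."""
    return [s / b for s, b in zip(raw[system], raw[base])]

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

def ranking(mean_fn, base):
    """Systems ordered from highest to lowest mean normalized score."""
    scores = {s: mean_fn(normalized(s, base)) for s in raw}
    return sorted(raw, key=lambda s: scores[s], reverse=True)

for base in ("R", "M"):
    print("normalized to", base,
          "arithmetic:", ranking(arithmetic_mean, base),
          "geometric:", ranking(geometric_mean, base))
```

With these numbers the arithmetic-mean ranking depends on the reference system, while the geometric-mean ranking is the same for both.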
39 How Should We Average or Merge Scores?
Sometimes arithmetic means are not a
good idea. Sometimes geometric means are. Are
there situations where the opposite is the case?
Or some other method is better? Can we lay down
some guidelines about when to use what merging
procedure? $a_1, a_2, \ldots, a_n$ are
scores. $u = F(a_1, a_2, \ldots, a_n)$, where $F$ is an
unknown merging function. Approaches to finding acceptable
merging functions: (1) axiomatic, (2) scale
types, (3) meaningfulness.
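40 An Axiomatic Approach
Theorem (Fleming and Wallace). Suppose
$F: (\mathbb{R}^+)^n \to \mathbb{R}^+$ has
the following properties:
(1) Reflexivity: $F(a, a, \ldots, a) = a$.
(2) Symmetry: $F(a_1, a_2, \ldots, a_n) = F(a_{\pi(1)}, a_{\pi(2)}, \ldots, a_{\pi(n)})$
for all permutations $\pi$ of $\{1, 2, \ldots, n\}$.
(3) Multiplicativity: $F(a_1 b_1, a_2 b_2, \ldots, a_n b_n) = F(a_1, a_2, \ldots, a_n) F(b_1, b_2, \ldots, b_n)$.
Then $F$ is the geometric mean. And conversely.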
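The "conversely" direction can be spot-checked numerically. The sketch below tests the three properties on hypothetical sample inputs only, so it illustrates the theorem rather than proving it; the checker names are illustrative. The arithmetic mean is included to show which property it loses.

```python
from itertools import permutations
from math import isclose, prod

def geometric_mean(xs):
    return prod(xs) ** (1 / len(xs))

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def is_reflexive(F, a=3.0, n=4):
    """F(a, a, ..., a) == a for one sample value."""
    return isclose(F([a] * n), a)

def is_symmetric(F, xs):
    """F is unchanged under every permutation of one sample input."""
    return all(isclose(F(list(p)), F(xs)) for p in permutations(xs))

def is_multiplicative(F, xs, ys):
    """F(x1*y1, ..., xn*yn) == F(xs) * F(ys) for one sample pair."""
    return isclose(F([x * y for x, y in zip(xs, ys)]), F(xs) * F(ys))

# Hypothetical positive scores used as test inputs.
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 1.0, 5.0, 2.0]

for F in (geometric_mean, arithmetic_mean):
    print(F.__name__, is_reflexive(F), is_symmetric(F, xs),
          is_multiplicative(F, xs, ys))
```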
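41 A Functional Equations Approach Using Scale Type
or Meaningfulness Assumptions
Unknown function: $u = F(a_1, a_2, \ldots, a_n)$.
Luce's idea: If you know the scale
types of the $a_i$ and the scale type of $u$, and
you assume that an admissible transformation of
each of the $a_i$ leads to an admissible
transformation of $u$, you can derive the form of
$F$. Example: $a_1, a_2, \ldots, a_n$ are
independent ratio scales, $u$ is a ratio scale.
42 Multiplying each $a_i$ by $\alpha_i > 0$ must multiply
$u$ by some $K(\alpha_1, \ldots, \alpha_n) > 0$. Thus, we
get the functional equation
(*) $F(\alpha_1 a_1, \ldots, \alpha_n a_n) = K(\alpha_1, \ldots, \alpha_n) F(a_1, \ldots, a_n)$.
Theorem (Luce 1964): If
$F: (\mathbb{R}^+)^n \to \mathbb{R}^+$ is continuous and
satisfies (*), then there are $\lambda > 0$ and
$c_1, c_2, \ldots, c_n$ so that
$F(a_1, a_2, \ldots, a_n) = \lambda a_1^{c_1} a_2^{c_2} \cdots a_n^{c_n}$.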
43 (Aczél, Roberts, Rosenbaum 1986) It is easy to
see that the assumption of continuity can be
weakened to continuity at a point, monotonicity,
or boundedness on an (arbitrarily small, open)
$n$-dimensional interval or on a set of positive
measure. Call any of these assumptions
regularity. Theorem (Aczél and Roberts 1989):
If in addition $F$ satisfies reflexivity and
symmetry, then $\lambda = 1$ and
$c_1 = c_2 = \cdots = c_n = 1/n$, so $F$ is the geometric
mean. Aside: Solution to (*) without Regularity
(Aczél, Roberts, and Rosenbaum)
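44 Second Example: $a_1, a_2, \ldots, a_n$ are ratio
scales, but have the same unit; $u$ is a ratio
scale. New functional equation:
$F(\alpha a_1, \ldots, \alpha a_n) = K(\alpha) F(a_1, \ldots, a_n)$,
$\alpha > 0$, $K(\alpha) > 0$.
Regular solutions (Aczél, Roberts, and Rosenbaum):
$F(a_1, \ldots, a_n) = a_1^{c} f(a_2/a_1, \ldots, a_n/a_1)$,
with $f > 0$ an arbitrary regular function on
$(\mathbb{R}^+)^{n-1}$ and $c$ an arbitrary constant.
(For $n = 1$, $f$ is a positive constant.)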
45Solutions to () without Regularity (Aczél,
Roberts, and Rosenbaum) Third Example
are independent ratio scales, u
is an interval scale New Functional
Equation ()
46Regular solutions to () (Aczél, Roberts, and
Rosenbaum) OR
47Solutions to () without Regularity (Aczél,
Roberts, and Rosenbaum) OR
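48 Sometimes You Get the Arithmetic Mean
Fourth Example: $a_1, a_2, \ldots, a_n$ interval scales
with the same unit and independent zero points;
$u$ an interval scale. Functional equation:
$F(\alpha a_1 + \beta_1, \ldots, \alpha a_n + \beta_n) = K(\alpha, \beta_1, \ldots, \beta_n) F(a_1, \ldots, a_n) + C(\alpha, \beta_1, \ldots, \beta_n)$,
$\alpha > 0$, $K > 0$.
Solutions without regularity assumed (Aczél, Roberts, and
Rosenbaum):
$F(a_1, \ldots, a_n) = \sum_{i=1}^{n} \lambda_i a_i + b$,
with $\lambda_1, \ldots, \lambda_n, b$ arbitrary constants.
49 Note that all solutions are regular. Theorem
(Aczél and Roberts): (1) If in addition $F$
satisfies reflexivity, then $\sum_{i=1}^{n} \lambda_i = 1$
and $b = 0$. (2) If in
addition $F$ satisfies reflexivity and symmetry,
then $\lambda_i = 1/n$ for all $i$ and $b = 0$, i.e.,
$F$ is the arithmetic mean.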
50 Meaningfulness Approach
While it is often reasonable to assume you know the scale
type of the independent variables
$a_1, a_2, \ldots, a_n$, it is not so often reasonable to
assume that you know the scale type of the dependent
variable $u$. However, it turns out that you can replace
the assumption that the scale type of $u$ is
of a particular kind by the assumption that a certain
statement involving $u$ is meaningful.
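51 Back to First Example: $a_1, a_2, \ldots, a_n$ are
independent ratio scales. Instead of assuming $u$
is a ratio scale, assume that the statement
$F(a_1, \ldots, a_n) = k F(b_1, \ldots, b_n)$ is
meaningful for all $a_1, \ldots, a_n$, $b_1, \ldots, b_n$
and $k > 0$. Then we get the same
results as before. Theorem (Roberts and
Rosenbaum 1986): Under these hypotheses and
regularity, $F(a_1, \ldots, a_n) = \lambda a_1^{c_1} a_2^{c_2} \cdots a_n^{c_n}$.
If in addition $F$ satisfies
reflexivity and symmetry, then $F$ is the
geometric mean.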
52 MEASUREMENT OF AIR POLLUTION
Various pollutants are present in the air: carbon monoxide
(CO), hydrocarbons (HC), nitrogen oxides (NOX), sulfur
oxides (SOX), particulate matter (PM). Also
damaging: products of chemical reactions among
pollutants, e.g., oxidants such as ozone produced
by HC and NOX reacting in the presence of
sunlight. Some pollutants are more serious in the
presence of others, e.g., SOX are more harmful in the
presence of PM. Can we measure pollution with
one overall measure?
53 To compare pollution control policies, we need to
compare effects of different pollutants. We
might allow increase of some pollutants to
achieve decrease of others. One single measure
could give an indication of how bad the pollution level
is and might help us determine if we have made
progress. Combining Weight of Pollutants: Measure
total weight of emissions of pollutant $i$ over a
fixed period of time and sum over $i$. $e(i,t,k)$ =
total weight of emissions of pollutant $i$ (per
cubic meter) over the $t$th time period and due to the
$k$th source or measured in the $k$th location.
$A(t,k) = \sum_i e(i,t,k)$.
54 Early uses of this simple index $A$ in the early
1970s led to the conclusion that transportation
is the largest source of air pollution,
accounting for over 50% of all pollution, with
stationary fuel combustion (especially by
electric power plants) second largest. Also, CO
accounts for over half of all emitted air
pollution. Are these meaningful
conclusions? All are meaningful if we
measure all $e(i,t,k)$ in the same units of mass
(e.g., milligrams per cubic meter), so that an
admissible transformation means multiplying every
$e(i,t,k)$ by the same constant.
55 These comparisons are meaningful in the technical
sense. But are they meaningful comparisons of
pollution level in a practical sense? A unit of
mass of CO is far less harmful than a unit of
mass of NOX. EPA standards based on health
effects for a 24-hour period allow 7800 units of CO
to 330 units of NOX. These are minimum acute
toxicity effluent tolerance factors (MATE
criteria). The tolerance factor is the level at which
adverse effects are known. Let $t(i)$ be the
tolerance factor for the $i$th pollutant. Severity
factor: $t(\mathrm{CO})/t(i)$ or $1/t(i)$.
56 One idea (Babcock and Nagda, Walther, Caretto and
Sawyer): Weight the emission levels (in mass) by the
severity factor and get a weighted sum. This
amounts to using the indices
Degree of hazard: $\frac{1}{t(i)} e(i,t,k)$
and the combined index
Pindex: $\sum_i \frac{1}{t(i)} e(i,t,k)$.
Under pindex, transportation is still the largest
source of pollutants, but now accounting for less
than 50%. Stationary sources fall to fourth
place. CO drops to the bottom of the list of pollutants,
accounting for just over 2% of the total.
57 These conclusions are again meaningful if all
emission weights are measured in the same units.
For an admissible transformation multiplies $t$
and $e$ by the same constant and thus leaves the
degree of hazard unchanged and pindex
unchanged. Pindex was introduced in the San
Francisco Bay Area in the 1960s. But are
comparisons using pindex meaningful in the
practical sense? Pindex amounts to this: for a given
pollutant, take the percentage of a given harmful
level of emissions that is reached in a given
period of time, and add up these percentages over
all pollutants. (The sum can be greater than 100% as
a result.)
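The invariance argument can be sketched as follows. The tolerance factors 7800 (CO) and 330 (NOX) are the ones quoted earlier in the talk; the emission levels and the unit-conversion factor are hypothetical, and the function names are illustrative.

```python
from math import isclose

# Tolerance factors quoted in the talk; emissions are hypothetical values
# expressed in the same mass units as the tolerance factors.
tolerance = {"CO": 7800.0, "NOX": 330.0}
emissions = {"CO": 3900.0, "NOX": 165.0}

def degree_of_hazard(pollutant, emissions, tolerance):
    """Emission level as a fraction of the tolerance factor: e / t."""
    return emissions[pollutant] / tolerance[pollutant]

def pindex(emissions, tolerance):
    """Sum of the degrees of hazard over all pollutants."""
    return sum(degree_of_hazard(p, emissions, tolerance) for p in emissions)

def change_units(values, k):
    """Admissible transformation for a common mass unit: multiply by k > 0."""
    return {p: k * v for p, v in values.items()}

k = 1000.0  # hypothetical unit change applied to both e and t
pindex_before = pindex(emissions, tolerance)
pindex_after = pindex(change_units(emissions, k), change_units(tolerance, k))

print(pindex_before, pindex_after)
```

Because $t$ and $e$ are rescaled together, each ratio $e/t$, and hence pindex, is unchanged.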
58 If 100% of the CO tolerance level is reached,
this is known to have some damaging effects.
Pindex implies that the effects are equally
severe if levels of five major pollutants are
relatively low, say 20% of their known harmful
levels. Severity tonnage of pollutant $i$ due
to a given source is actual tonnage times the
severity factor $1/t(i)$. Data from Walther 1972
suggests the following. It is an interesting exercise to
decide which of these conclusions are meaningful.
59
1. HC emissions are more severe (have greater
severity tonnage) than NOX emissions.
2. Effects of HC emissions from transportation are more
severe than those of HC emissions from industry.
(Same for NOX.)
3. Effects of HC emissions from
transportation are more severe than those of CO
emissions from industry.
4. Effects of HC
emissions from transportation are more than 20
times as severe as effects of CO emissions from
transportation.
5. The total effect of HC
emissions due to all sources is more than 8 times
as severe as the total effect of NOX emissions due to
all sources.
60 Social Networks and Blockmodeling
In studying social networks, a common approach is to start
with a matrix $M = (m_{ij})$, where $m_{ij}$
measures the friendship of the $i$th individual for the
$j$th. $M$ is used to derive such conclusions as:
the degree to which friendship is
propagated through the group (if $k$ is a friend
of $j$ and $j$ is a friend of $i$, is $k$ a
friend of $i$?); the degree to which friendship
is reciprocated (if $j$ is a friend of $i$, is $i$
a friend of $j$?); grouping of individuals into
friendship cliques or clusters or blocks.
61 Batchelder: Unless attention is paid to the
types of scales used to measure $m_{ij}$,
conclusions about propagation, reciprocation, and
grouping can be meaningless. Simple Example:
$\alpha$-Transitivity. The social network is called
$\alpha$-transitive ($\alpha > 0$) if $m_{ij} \geq \alpha$
and $m_{jk} \geq \alpha$ imply $m_{ik} \geq \alpha$.
The following $M$ is 5-transitive.

          1    2    3
M =  1    -   10    8
     2    4    -   12
     3    1    2    -

62 Suppose each individual makes judgments of
friendship on a ratio scale. Even if all the
ratio scales are the same, the conclusion of
$\alpha$-transitivity is meaningless: $\frac{1}{2}M$ is not
5-transitive.

            1    2    3
½M =   1    -    5    4
       2    2    -    6
       3   .5    1    -
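The two matrices can be checked mechanically. The sketch below assumes that $\alpha$-transitivity means "$m_{ij} \geq \alpha$ and $m_{jk} \geq \alpha$ imply $m_{ik} \geq \alpha$ for distinct $i, j, k$", which is the reading consistent with both matrices in the example; the function name is illustrative.

```python
def is_alpha_transitive(m, alpha):
    """Check m[i][j] >= alpha and m[j][k] >= alpha => m[i][k] >= alpha
    over all triples of distinct individuals (assumed definition)."""
    n = len(m)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue  # diagonal entries are undefined and skipped
                if m[i][j] >= alpha and m[j][k] >= alpha and m[i][k] < alpha:
                    return False
    return True

# Friendship matrix from the slides; None marks the undefined diagonal.
M = [
    [None, 10, 8],
    [4, None, 12],
    [1, 2, None],
]

# Same ratio scale for everyone, unit changed by the factor 1/2.
half_M = [[None if x is None else x / 2 for x in row] for row in M]

print(is_alpha_transitive(M, 5))
print(is_alpha_transitive(half_M, 5))
```

Halving every entry is an admissible change of unit, yet it changes the answer for $\alpha = 5$.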
63 How we present data can determine the
meaningfulness of conclusions. Suppose $m_{ij}$ is
not the numerical estimate of $i$'s friendship
for $j$, but the relative ranking by $i$ of $j$
among the other individuals. In the above
example $M$, suppose we use only the ranks: $n$
for first place, $n-1$ for second place, etc.
We get $M'$. This is 2-transitive. Even if the
scales are only ordinal, this $M'$ is unchanged
even when $M$ changes. Thus, $\alpha$-transitivity of
$M'$ is a meaningful conclusion. (Batchelder)

           1    2    3
M' =  1    -    3    2
      2    2    -    3
      3    2    3    -
64 Blockmodeling
The process of grouping individuals into blocks is called
blockmodeling. A widely used algorithm in blockmodeling is the
CONCOR algorithm of Breiger, Boorman, and Arabie
and Arabie, Boorman, and Levitt. In CONCOR, the
matrix $M$ is used to obtain a similarity measure
between individuals $i$ and $j$, using either
column by column or row by row product moment
correlations. The partition into blocks is
based on the similarities. Batchelder: If the
friendship rating scales are independent interval
scales, then the column by column product moment
correlation obtained from $M$ can change. Thus,
the blocks can also change.
65 Hence, the blocking conclusions from a procedure
such as CONCOR can be meaningless. E.g.,
Suppose is the Pearson column by column
product moment correlation obtained from matrix
M and suppose is the same correlation
obtained from M', where If
, ,
, then and
. This is as dramatic a change as possible.
Let $M = \begin{pmatrix} 0 & 2 & 2 & 1 \\ 2 & 0 & 6 & 3 \\ 2 & 4 & 0 & 5 \\ 1 & 1 & 3 & 0 \end{pmatrix}$
66 Note: Batchelder shows that if $r_{ij}$ is the
product moment correlation between the $i$th and
$j$th rows of $M$, then under the
transformation $m_{ij}' = \alpha_i m_{ij} + \beta_i$,
$\alpha_i > 0$, $r_{ij}$ is invariant. Thus,
conclusions from $M$ based on row by row
correlations are meaningful.
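The contrast between row and column correlations can be seen numerically. In the sketch below each individual's (row's) ratings are rescaled by an independent positive affine map $m_{ij}' = \alpha_i m_{ij} + \beta_i$; the $\alpha_i$ and $\beta_i$ are hypothetical, and the diagonal zeros of the 4 x 4 matrix $M$ are treated as ordinary entries, which is an assumption made here for simplicity.

```python
from math import isclose, sqrt

def pearson(xs, ys):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def column(m, j):
    return [row[j] for row in m]

# The 4 x 4 matrix M from the slides (diagonal zeros kept as data).
M = [
    [0, 2, 2, 1],
    [2, 0, 6, 3],
    [2, 4, 0, 5],
    [1, 1, 3, 0],
]

# Hypothetical independent interval-scale changes, one per row.
alphas = [1, 2, 1, 3]
betas = [0, 1, 5, 0]
M_prime = [[a * x + b for x in row] for row, a, b in zip(M, alphas, betas)]

row_before = pearson(M[0], M[1])
row_after = pearson(M_prime[0], M_prime[1])
col_before = pearson(column(M, 0), column(M, 1))
col_after = pearson(column(M_prime, 0), column(M_prime, 1))

print(row_before, row_after)  # equal up to rounding
print(col_before, col_after)  # different
```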
67 Meaningfulness of Statistical Tests (Marcus-Roberts
and Roberts)
For more than 40 years, there has
been considerable disagreement on the limitations
that scales of measurement impose on the statistical
procedures we may apply. The controversy stems
from the foundational work of Stevens (1946,
1951, 1959, ...), who developed the
classification of scales of measurement we have
given and then provided rules for the use of
statistical procedures which provided that
certain statistics were inappropriate at certain
levels of measurement. The application of
Stevens' ideas to descriptive statistics has been
widely accepted, but the application to
inferential statistics has been labeled by some
a misconception.
68 Descriptive Statistics
$P$ = population whose distribution we would like to
describe. We try to
capture some of the properties of P by finding
some descriptive statistic for P or by taking a
sample S from P and finding a descriptive
statistic for S. Our previous examples suggest
that certain descriptive statistics are
appropriate only for certain measurement
situations. This idea was originally due to
Stevens and was popularized by Siegel in his
well-known book Nonparametric Statistics (1956).
69 Our examples suggest the principle that
arithmetic means are appropriate statistics for
interval scales, medians for ordinal scales. The
other side of the coin: It is argued that it is
always appropriate to calculate means, medians,
and other descriptive statistics, no matter what
the scale of measurement. Frederic Lord: Famous
football player example. "The numbers don't
remember where they came from." I agree: It is
always appropriate to calculate means, medians,
... But is it appropriate to make certain
statements using these descriptive statistics?
70 My position: It is usually appropriate to make a
statement using descriptive statistics if and
only if the statement is meaningful. Why? A
statement which is true but meaningless gives
information which is an accident of the scale of
measurement used, not information which describes
the population in some fundamental way. So, it
is appropriate to calculate the mean of ordinal
data, it is just not appropriate to say that the
mean of one group is higher than the mean of
another group.
71 Inferential Statistics
Over the years, Stevens' ideas have also come to be
applied to inferential statistics -- inferences about an
unknown population $P$. They have led to such principles
as the following:
(1) Classical parametric
tests (e.g., t-test, Pearson correlation,
analysis of variance) are inappropriate for
ordinal data. They should be applied only to
data which define an interval or ratio scale.
(2) For ordinal scales, non-parametric
tests (e.g., Mann-Whitney U, Kruskal-Wallis,
Kendall's tau) can be used.
Not everyone agrees. Thus: controversy.
72 My View: The validity of a statistical test
depends on a statistical model, which includes
information about the distribution of the
population and about the sampling procedure. The
validity of the test does not depend on a
measurement model, which is concerned with the
admissible transformations and scale type. The
scale type enters in deciding whether the
hypothesis is worth testing at all -- is it a
meaningful hypothesis? The issue is: If we
perform admissible transformations of scale, is
the truth or falsity of the hypothesis unchanged?
73 A Brief Analysis of These Principles
Inferences about an unknown population $P$. The measurement
model for $P$ defines the scale type, i.e., tells the
admissible transformations. Null hypothesis $H_0$
about $P$. Suppose an admissible transformation
$\varphi$ transforms scale values. We get a new
population $\varphi(P) = \{\varphi(x) : x \in P\}$. We also
get a transformed null hypothesis $\varphi(H_0)$ by
applying $\varphi$ to all scales and scale values in $H_0$.
74Example mean of P is 0. Measurement
model P ordinal Then is
admissible. mean of ?(P) is Note
that if , then
is true. Also,
and so is
false. Thus, a null hypothesis can be
meaningless. Summary of my position: The major
restriction measurement theory places on
inferential statistics is on what hypotheses are
meaningful. We should be primarily concerned
with testing meaningful hypotheses.
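A numerical illustration of a meaningless null hypothesis, using a hypothetical finite population and a hypothetical strictly increasing transformation $\varphi(x) = e^x$ (neither is taken from the slide, whose specific numbers did not survive). The hypothesis "mean of $P$ is 0" is true of $P$, but the transformed hypothesis "mean of $\varphi(P)$ is $\varphi(0)$" is false. The corresponding hypothesis about the median survives the transformation.

```python
from math import exp
from statistics import mean, median

# Hypothetical ordinal-scale population and admissible transformation.
P = [-2, -1, 0, 1, 2]
phi = exp  # strictly increasing, hence admissible for an ordinal scale
phi_P = [phi(x) for x in P]

# H0: mean of P is 0; phi(H0): mean of phi(P) is phi(0).
h0_mean_holds = mean(P) == 0
phi_h0_mean_holds = abs(mean(phi_P) - phi(0)) < 1e-9

# H0': median of P is 0; phi(H0'): median of phi(P) is phi(0).
h0_median_holds = median(P) == 0
phi_h0_median_holds = median(phi_P) == phi(0)

print(h0_mean_holds, phi_h0_mean_holds)
print(h0_median_holds, phi_h0_median_holds)
```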
75 Can we test meaningless hypotheses? Sure. But I
question what information we get outside of
information about the population as
measured. More details: Testing $H_0$ about $P$:
1) Draw a random sample $S$ from $P$.
2) Calculate a test statistic based on $S$.
3) Calculate the probability that the test statistic is
what was observed given $H_0$ is true.
4) Accept or reject $H_0$ on the basis of the test.
Note: Calculation of the probability depends on a
statistical model, which includes information
about the distribution of $P$ and about the
sampling procedure. But the validity of the
test depends only on the statistical model, not
on the measurement model.
76 Thus, you can apply parametric tests to ordinal
data, provided the statistical model is
satisfied, in particular if the data is normally
distributed. (Thomas showed that ordinal data
can be normally distributed; it is a
misconception that it cannot be.) Where does the
scale type enter? In determining if the
hypothesis is worth testing at all, i.e., if it
is meaningful. For instance, consider ordinal
data and $H_0$: mean is 0. The hypothesis is meaningless.
But if the data meets certain distributional
requirements such as normality, we can apply a
parametric test, such as the t-test, to check if
the mean is 0.
77 The result gives us information about the
population as we have measured it, but no
intrinsic information about the population,
since its truth or falsity can depend on the
particular choice of scale. So, what is wrong
with using the t-test on ordinal data? Nothing.
The test is o.k. However, the hypothesis is
meaningless. Some Observations on Tests of
Meaningful Hypotheses: Suppose $H_0$ is meaningful. We can
test it in various ways. We cannot expect that
two different tests of $H_0$ will lead to the
same conclusion in all cases, even forgetting
about change of scale. For example, the t-test
can lead to rejection and the Wilcoxon test to
acceptance.
78 But what happens if we apply the same test to
data both before and after admissible
transformation? The best situation is if the
results of the test are invariant under
admissible changes of scale. More precisely:
Suppose $H_0$ is a meaningful hypothesis about
population $P$. Test $H_0$ by drawing a random
sample $S$ from $P$. Let $\varphi(S)$ be the set of all
$\varphi(x)$ for $x$ in $S$. We hope for a test $T$ for
$H_0$ with the following properties:
(a) The statistical model for $T$ is satisfied by $P$ and
also by $\varphi(P)$ for all admissible
transformations $\varphi$ as defined by the measurement
model for $P$.
(b) $T$ accepts $H_0$ with sample
$S$ at level of significance $\alpha$ if and only if $T$
accepts $\varphi(H_0)$ with sample $\varphi(S)$ at level of
significance $\alpha$.
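79 There are many examples in the statistical
literature where conditions (a) and (b) hold. We
give two examples. Example 1: t-test.
Measurement model: interval scale.
$H_0$: mean is $\mu_0$.
Statistical model: normal distribution.
Admissible transformations: $\varphi(x) = kx + c$, $k > 0$.
(a) Such $\varphi$ take normal
distributions into normal distributions.
80 (b) The test statistic is
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $n$ = sample
size, $\bar{x}$ = sample mean, $s^2$ = sample
variance. Transform by $\varphi(x) = kx + c$. The new
sample mean is $k\bar{x} + c$; the new sample
variance is $k^2 s^2$. $\varphi(H_0)$: mean is
$\varphi(\mu_0) = k\mu_0 + c$. New test statistic for
$\varphi(H_0)$:
$\frac{(k\bar{x} + c) - (k\mu_0 + c)}{ks/\sqrt{n}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = t$.
The test statistic is the same, so (b) follows.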
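The invariance of the one-sample t statistic under $\varphi(x) = kx + c$, $k > 0$, can be checked numerically. The sample, $\mu_0$, and the constants $k$ and $c$ below are hypothetical, and the statistic is computed with the usual $n - 1$ sample standard deviation, which is an assumption about the intended definition of $s$.

```python
from math import isclose, sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """One-sample t statistic (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Hypothetical interval-scale data and null value.
sample = [2.1, 3.4, 1.9, 4.0, 2.8]
mu0 = 2.0

# Hypothetical admissible transformation phi(x) = k*x + c with k > 0.
k, c = 1.8, 32.0

def phi(x):
    return k * x + c

t_before = t_statistic(sample, mu0)
t_after = t_statistic([phi(x) for x in sample], phi(mu0))

print(t_before, t_after)
```

Both the data and the hypothesized mean are transformed, matching the requirement that $T$ be applied to $\varphi(S)$ and $\varphi(H_0)$.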
81 Example 2: Sign Test.
Measurement model: ordinal scale.
$H_0$: median is $M_0$ (meaningful).
Statistical model: The literature of
nonparametric statistics is unclear. Most books
say continuity of the probability density function is
needed. In fact, it is only necessary to
assume $\Pr\{x = M_0\} = 0$.
(a) $\Pr\{x = M_0\} = 0$ iff
$\Pr\{\varphi(x) = \varphi(M_0)\} = 0$ ($\varphi$ is
strictly increasing). (Note that strictly
increasing $\varphi$ also take continuous data into
continuous data, since there are only countably
many gaps.)
82 (b) Test statistic: Number of sample points
above $M_0$ or number below $M_0$, whichever is
smaller. This test statistic does not change
when a strictly increasing $\varphi$ is applied to $S$
and $M_0$ is replaced by $\varphi(M_0)$. Thus, (b)
follows. What if Conditions (a) or (b)
Fail? What if (a) holds, (b) fails? We might
still find test $T$ useful, just as we sometimes
find it useful to apply different tests with
possibly different conclusions to the same data.
However, it is a little disturbing that seemingly
innocuous changes in how we measure things could
lead to different conclusions under the same
test.
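The sign-test statistic described in (b) above, the smaller of the counts above and below the hypothesized median, can be checked for invariance under a strictly increasing transformation. The sample, the hypothesized median, and the choice $\varphi(x) = e^x$ are hypothetical; no sample point equals the hypothesized median, in line with the assumption stated for the test.

```python
from math import exp

def sign_statistic(sample, m0):
    """Smaller of the number of points above m0 and the number below m0."""
    above = sum(1 for x in sample if x > m0)
    below = sum(1 for x in sample if x < m0)
    return min(above, below)

# Hypothetical ordinal data and hypothesized median (no ties with m0).
sample = [1.0, 2.0, 2.5, 4.0, 5.0, 5.5, 4.5, 6.0]
m0 = 3.0

phi = exp  # strictly increasing, hence admissible for an ordinal scale

stat_before = sign_statistic(sample, m0)
stat_after = sign_statistic([phi(x) for x in sample], phi(m0))

print(stat_before, stat_after)
```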
83 What if $H_0$ is meaningful but (a) fails? This
could happen if $P$ violates the statistical
model for $T$. We might still be able to test $H_0$
using $T$: Try to find an admissible $\varphi$ such that
$\varphi(P)$ satisfies the statistical model for $T$
and then apply $T$ to test $\varphi(H_0)$. Since
$H_0$ is meaningful, $H_0$ holds iff $\varphi(H_0)$ holds, so
$T$ applied to $\varphi(P)$ and $\varphi(H_0)$ gives a test
for $H_0$. For instance, if $H_0$ is meaningful, then
even for ordinal data, we can seek a
transformation which normalizes the data and
allows us to apply a parametric test.
84 What if conditions (a) or (b) fail and we still
apply test $T$ to test $H_0$? There is much
empirical work on this, especially emphasizing
the situation where (a) and (b) fail because we
have an ordinal scale but we treat it like an
interval scale. I don't think measurement theory
has much to say about this situation. But good
theoretical work does, and more is needed.
85 Summary of My Views on the Limitations on
Statistical Procedures Imposed by Scales of
Measurement (1). It is always appropriate to
calculate means, medians, and other descriptive
statistics. However, the key point is whether or
not it is appropriate to make certain
statements using these statistics. (2). If we
want to make statements which capture something
fundamental about the population being described,
it is appropriate to make statements using
descriptive statistics if and only if the
statements are meaningful.
86 (3). The appropriateness of a statistical test of a hypothesis is just a matter of whether the population and sampling procedure satisfy the appropriate statistical model, and is not influenced by the properties of the measurement scale used.
(4). However, if we want to draw conclusions about a population which say something basic about the population, rather than something which is an accident of the particular scale of measurement used, then we should only test meaningful hypotheses, and meaningfulness is determined by the properties of the measurement scale.
87 (5). A meaningful hypothesis can be tested by several different tests, which may or may not give the same conclusion even on the same data. It can also be tested by applying the same test to both given data and data transformed by an admissible transformation of scale, and in the best situation the two applications of the test will give the same conclusion. So long as the hypothesis is meaningful, it can also be tested by transforming the data by an admissible transformation and applying a test which might not be appropriate for the data in its initial form.
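Point (2) can be illustrated with a small sketch. The data below are hypothetical ordinal ratings invented for illustration, and the square root stands in for an arbitrary strictly increasing (admissible) transformation of an ordinal scale. The comparison of means reverses under the transformation, while the comparison of medians does not.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical ordinal ratings for two groups (illustrative data only).
group_a = [1, 1, 10]
group_b = [3, 3, 4]


def phi(x):
    """A strictly increasing map, i.e. admissible for an ordinal scale."""
    return sqrt(x)


a_transformed = [phi(x) for x in group_a]
b_transformed = [phi(x) for x in group_b]

# "Group A has the larger mean" holds before the transformation only.
mean_a_larger_before = mean(group_a) > mean(group_b)
mean_a_larger_after = mean(a_transformed) > mean(b_transformed)

# "Group A has the larger median" has the same truth value both times.
median_a_larger_before = median(group_a) > median(group_b)
median_a_larger_after = median(a_transformed) > median(b_transformed)

print("mean comparison:  ", mean_a_larger_before, mean_a_larger_after)
print("median comparison:", median_a_larger_before, median_a_larger_after)
```

A single transformation can only show that a statement fails to be invariant; it cannot by itself establish that a statement is meaningful.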
88 The First Representation Problem: Ordinal Utility Functions
Homomorphism from (A,R) into $(\mathbb{R}, >)$.
R = warmer than, louder than, preferred to, ...
If R is preference, then f is an ordinal utility function.
Definition: We say that (A,R) is a strict weak order if it satisfies the following two conditions:
(1). Asymmetry: $aRb \Rightarrow\ \sim bRa$
(2). Negative Transitivity: $\sim aRb\ \&\ \sim bRc \Rightarrow\ \sim aRc$
89 Example: $aRb \Leftrightarrow$ a is to the right of b.
[Figure: an arrangement of points (●), several of them stacked at the same horizontal position.]
Representation Theorem: Suppose A is a finite set. Then (A,R) is homomorphic to $(\mathbb{R}, >)$ iff (A,R) is a strict weak order.
This says that our example with the ●'s is the most general strict weak order on a finite set.
90 Application: Preferences Among Composers
Mo = Mozart, H = Haydn, Br = Brahms, Be = Beethoven, W = Wagner, Ba = Bach, Ma = Mahler, S = Strauss
Is there an ordinal utility function? How does one tell if (A,R) is strict weak?
(An entry 1 in row x, column y means xRy.)
      Mo   H  Br  Be   W  Ba  Ma   S
Mo     0   1   1   0   0   0   1   1
H      0   0   1   0   0   0   1   0
Br     0   0   0   0   0   0   0   0
Be     1   1   1   0   1   0   1   1
W      0   1   1   0   0   0   1   1
Ba     1   1   1   0   1   0   1   1
Ma     0   0   1   0   0   0   0   0
S      0   1   1   0   0   0   1   0
91 Proof of the Representation Theorem for A Finite
Necessity: straightforward.
Sufficiency: Define f(x) = the number of y in A such that xRy. If (A,R) is strict weak, then f is a homomorphism.
Suppose aRb. Then $\sim bRa$. Hence, by negative transitivity, $\sim aRy$ implies $\sim bRy$, i.e., bRy implies aRy. It follows that $f(a) \ge f(b)$. By asymmetry, $\sim bRb$. Since aRb, it follows that $f(a) > f(b)$.
Suppose $\sim aRb$. Then $\sim bRy$ implies $\sim aRy$, so aRy implies bRy, so $f(b) \ge f(a)$, i.e., $\sim[f(a) > f(b)]$.
92 Comment: Given any (A,R), we can define f this way. Then, if there is a homomorphism at all, f is one.
Application to our Example: Calculate row sums. Then f(x) = row sum of row x. If the matrix is rearranged so that the alternatives are listed in descending order of row sums (with arbitrary ordering in case of ties), then we can test if f is a homomorphism by checking that there are 1's in row x for all those y with $f(y) < f(x)$. This should give a block of 1's in each row from some point to the end.
93 Mo = Mozart, H = Haydn, Br = Brahms, Be = Beethoven, W = Wagner, Ba = Bach, Ma = Mahler, S = Strauss
Original Matrix with Row Sums
      Mo   H  Br  Be   W  Ba  Ma   S   Row Sum
Mo     0   1   1   0   0   0   1   1      4
H      0   0   1   0   0   0   1   0      2
Br     0   0   0   0   0   0   0   0      0
Be     1   1   1   0   1   0   1   1      6
W      0   1   1   0   0   0   1   1      4
Ba     1   1   1   0   1   0   1   1      6
Ma     0   0   1   0   0   0   0   0      1
S      0   1   1   0   0   0   1   0      3
Rearranged Matrix
      Be  Ba   W  Mo   S   H  Ma  Br
Be     0   0   1   1   1   1   1   1
Ba     0   0   1   1   1   1   1   1
W      0   0   0   0   1   1   1   1
Mo     0   0   0   0   1   1   1   1
S      0   0   0   0   0   1   1   1
H      0   0   0   0   0   0   1   1
Ma     0   0   0   0   0   0   0   1
Br     0   0   0   0   0   0   0   0
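The row-sum test can be sketched in a few lines. This is an illustrative sketch rather than the author's procedure verbatim: it computes f(x) as the row sum and checks directly that xRy holds exactly when f(x) > f(y). The Mahler row is entered with a single 1 (Mahler over Brahms), which is what its row sum of 1 and the rearranged matrix require.

```python
names = ["Mo", "H", "Br", "Be", "W", "Ba", "Ma", "S"]

# matrix[i][j] == 1 means names[i] R names[j] (row is preferred to column).
# The Ma row has one 1 (Ma over Br), consistent with its row sum of 1.
matrix = [
    [0, 1, 1, 0, 0, 0, 1, 1],  # Mo
    [0, 0, 1, 0, 0, 0, 1, 0],  # H
    [0, 0, 0, 0, 0, 0, 0, 0],  # Br
    [1, 1, 1, 0, 1, 0, 1, 1],  # Be
    [0, 1, 1, 0, 0, 0, 1, 1],  # W
    [1, 1, 1, 0, 1, 0, 1, 1],  # Ba
    [0, 0, 1, 0, 0, 0, 0, 0],  # Ma
    [0, 1, 1, 0, 0, 0, 1, 0],  # S
]

# f(x) = row sum of row x = number of alternatives that x is preferred to.
f = {name: sum(row) for name, row in zip(names, matrix)}


def is_homomorphism(names, matrix, f):
    """Check that x R y holds exactly when f(x) > f(y)."""
    n = len(names)
    return all(
        (matrix[i][j] == 1) == (f[names[i]] > f[names[j]])
        for i in range(n)
        for j in range(n)
    )


# Descending order of row sums; ties keep their original relative order.
order = sorted(names, key=lambda name: -f[name])

print(f)
print(order)
print(is_homomorphism(names, matrix, f))
```

Because f is a homomorphism whenever any homomorphism exists, a `True` result here also confirms that the relation is a strict weak order.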
94 Are the axioms reasonable? Prescriptive vs. descriptive.
As descriptive axioms for warmer than or preferred to:
Asymmetry: Seems ok for both.
Negative transitivity: Seems ok for warmer than. However, for preference, consider the following case. We choose on the basis of price, unless prices are close, in which case we choose by quality. Suppose a is higher quality than b, which is higher quality than c, with a higher in price than b and b higher in price than c. If a and b are close in price and so are b and c, but a and c are not, then we would choose a over b and b over c, but c over a. Thus, $\sim cRb$ and $\sim bRa$, but cRa.
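The price-quality case can be made concrete with a small sketch. The prices, quality scores, and the closeness threshold below are hypothetical numbers chosen only to match the description: a and b are close in price, b and c are close, a and c are not.

```python
from itertools import product

# Hypothetical values; prices count as "close" when they differ by at most CLOSE.
price = {"a": 12, "b": 11, "c": 10}
quality = {"a": 3, "b": 2, "c": 1}
CLOSE = 1


def prefers(x, y):
    """x R y: choose by quality when prices are close, otherwise the cheaper one."""
    if x == y:
        return False
    if abs(price[x] - price[y]) <= CLOSE:
        return quality[x] > quality[y]
    return price[x] < price[y]


items = list(price)

asymmetric = all(
    not (prefers(x, y) and prefers(y, x)) for x, y in product(items, repeat=2)
)

# Negative transitivity: not xRy and not yRz together imply not xRz.
negatively_transitive = all(
    not prefers(x, z)
    for x, y, z in product(items, repeat=3)
    if not prefers(x, y) and not prefers(y, z)
)

print(prefers("a", "b"), prefers("b", "c"), prefers("c", "a"))
print(asymmetric, negatively_transitive)
```

Under these assumed numbers the relation is asymmetric but not negatively transitive, so it is not a strict weak order and has no ordinal utility function.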
95 Strict Simple Orders
Let (A,R) be a strict weak order. Define E on A by $aEb \Leftrightarrow\ \sim aRb\ \&\ \sim bRa$.
(A,R) strict weak implies (A,E) is an equivalence relation.
If R is interpreted as preference, then E is interpreted as indifference. We are indifferent between a and b iff we prefer neither to the other.
Let $A^*$ be the set of equivalence classes. Define $R^*$ on $A^*$ by $a^* R^* b^* \Leftrightarrow aRb$.
96 Then, $R^*$ is well-defined and is strict weak. $(A^*, R^*)$ is called the reduction of (A,R) by the equivalence relation E. $(A^*, R^*)$ has no ties. A strict weak order with no ties allowed is called strict simple. $(A^*, R^*)$ tells us how to order the equivalence classes. It can be thought of as the relation strictly to the right of on a set of points on a line.
[Figure: points (●) on a line.]
97 Example: $aRb \Leftrightarrow a \bmod 3 > b \bmod 3$. Then the elements of $(A^*, R^*)$ are the residue classes mod 3, ordered by residue.
Formally, we say that a binary relation (A,R) is a strict simple order if it satisfies the following conditions:
(1). Asymmetry
(2). Transitivity: $aRb\ \&\ bRc \Rightarrow aRc$
(3). Completeness: $a \ne b \Rightarrow$ (aRb or bRa)
Theorem: If (A,R) is a strict weak order, then $(A^*, R^*)$ is a strict simple order.
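The reduction can be sketched for the mod 3 example. The ground set A = {0, 1, ..., 26} is an assumption carried over from the earlier mod 3 example in the deck; the helper names are hypothetical.

```python
A = list(range(27))  # assumed ground set {0, 1, ..., 26}


def R(a, b):
    """a R b iff a mod 3 > b mod 3."""
    return a % 3 > b % 3


def E(a, b):
    """Tie relation: neither element is R-related to the other."""
    return not R(a, b) and not R(b, a)


# Group A into equivalence classes of E (valid because (A, R) is strict weak).
classes = []
for a in A:
    for cls in classes:
        if E(a, cls[0]):
            cls.append(a)
            break
    else:
        classes.append([a])


def R_star(c1, c2):
    """R* on classes, defined through any representatives."""
    return R(c1[0], c2[0])


def is_strict_simple(elems, rel):
    """Check asymmetry, transitivity, and completeness on distinct elements."""
    asym = all(not (rel(x, y) and rel(y, x)) for x in elems for y in elems)
    trans = all(
        rel(x, z)
        for x in elems for y in elems for z in elems
        if rel(x, y) and rel(y, z)
    )
    complete = all(
        rel(x, y) or rel(y, x) for x in elems for y in elems if x is not y
    )
    return asym and trans and complete


print(len(classes))
print(is_strict_simple(classes, R_star))
print(is_strict_simple(A, R))
```

The original relation fails completeness because of ties, while the reduction passes all three conditions.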
98 Theorem: If A is a finite set, then there is a 1-1 homomorphism from (A,R) into $(\mathbb{R}, >)$ iff (A,R) is a strict simple order.
Weak Orders
$(\mathbb{R}, \ge)$ is not a strict simple order. It violates asymmetry: We can have aRb and bRa. Similarly, weakly to the right of on a set of points is not a strict weak order for the same reason.
[Figure: the arrangement of points (●) from the strict weak order example.]
99 This suggests that there should be concepts of weak order and simple order analogous to the concepts of strict weak order and strict simple order and bearing the same relation to these concepts as $\ge$ does to $>$.
Here are the formal definitions: (A,R) is called a weak order if it satisfies the following conditions:
(1). Transitivity
(2). Strong Completeness: for all a, b, (aRb or bRa)
Think of weakly preferred to or at least as warm as.
100 (A,R) is called a simple order if it satisfies:
(1). Transitivity
(2). Strong Completeness
(3). Antisymmetry: $aRb\ \&\ bRa \Rightarrow a = b$
Theorem: If A is finite, then there is a homomorphism from (A,R) into $(\mathbb{R}, \ge)$ iff (A,R) is a weak order.
Theorem: If A is finite, then there is a 1-1 homomorphism from (A,R) into $(\mathbb{R}, \ge)$ iff (A,R) is a simple order.
101 If (A,R) is a weak order, define a binary relation E on A by $aEb \Leftrightarrow aRb\ \&\ bRa$. Then (A,E) is an equivalence relation.
Again, we can define a binary relation $R^*$ on the set $A^*$ of equivalence classes, with $a^* R^* b^* \Leftrightarrow aRb$. Then $(A^*, R^*)$ is well-defined and is a simple order.
102 Generalizations of the Representation Theorem to Infinite Sets of Alternatives
Cantor's Theorem (1895): If A is a countable set, then (A,R) is homomorphic to $(\mathbb{R}, >)$ iff (A,R) is a strict weak order.
This theorem fails for arbitrary infinite sets. Use the lexicographic ordering of the plane: $(a,b)R(c,d) \Leftrightarrow a > c$ or ($a = c$ and $b > d$).
This is strict weak, but there is no homomorphism f. If there were, then $f(a,1) > f(a,0)$, so there is a rational number g(a) with $f(a,1) > g(a) > f(a,0)$. Then g is a 1-1 function from the reals into the rationals: $a > b \Rightarrow g(a) > f(a,0) > f(b,1) > g(b)$. This is impossible, since the reals are uncountable.
103 Birkhoff-Milgram Theorem
(A,R) binary relation, $B \subseteq A$. Say B is order-dense in A if $a, b \in A - B$, aRb and $\sim bRa$ imply that there is $c \in B$ such that aRcRb.
Theorem (Birkhoff-Milgram): (A,R) is homomorphic to $(\mathbb{R}, >)$ iff (A,R) is a strict weak order and $(A^*, R^*)$ has a countable set which is order-dense.
104 Q ? A is countable, order-dense Q ? A is
not order-dense But
is. A 0,1, R gt A is countable,
order-dense , Note The set
of equivalence classes with rational first
component is countable and order-dense.
105 Uniqueness Results
Uniqueness Theorem: Every homomorphism from (A,R) into $(\mathbb{R}, >)$ is regular and every homomorphism defines an ordinal scale.
Meaningfulness:
$f(a) > f(b)$ is meaningful.
$f(a) = 3f(b)$ is not meaningful.
$f(a) - f(b) > f(c) - f(d)$ is not meaningful.
We do not get an interval scale of temperature as we might have wanted. We'll return to this point.
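These three claims can be spot-checked numerically. The scale values and the list of strictly increasing transformations below are hypothetical choices made for illustration. Agreement across a finite list does not prove meaningfulness, but a single disagreement does show that a statement is not meaningful for an ordinal scale.

```python
import math

# Hypothetical scale values on an ordinal scale.
f = {"a": 3.0, "b": 1.0, "c": 10.0, "d": 9.0}

# A few strictly increasing maps, each admissible for an ordinal scale.
transforms = [
    lambda x: x,
    lambda x: x ** 3,
    math.exp,
]

statements = {
    "f(a) > f(b)": lambda g: g["a"] > g["b"],
    "f(a) = 3 f(b)": lambda g: g["a"] == 3 * g["b"],
    "f(a) - f(b) > f(c) - f(d)": lambda g: g["a"] - g["b"] > g["c"] - g["d"],
}


def truth_values(statement):
    """Evaluate a statement on phi(f) for every listed transformation phi."""
    return [statement({k: phi(v) for k, v in f.items()}) for phi in transforms]


# A statement is reported invariant if its truth value never changes.
invariant = {
    name: len(set(truth_values(stmt))) == 1 for name, stmt in statements.items()
}

for name, ok in invariant.items():
    print(name, "->", "invariant on these maps" if ok else "not invariant")
```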
106What Representations Lead to Ordinal Scales? Let
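M be a positive integer. An M-tuple $(x_1, \ldots, x_M)$
from $\mathbb{R}^M$ corresponds to a strict
weak order of $\{1, \ldots, M\}$: rank
first all i such that we never have $x_j > x_i$;
having ranked $i_1, \ldots, i_k$, rank next
all i such that for no unranked
j is $x_j > x_i$. Thus, (3,9,3,2,8) has 2 ranked first,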
5 ranked second, 1,3 tied for third, 4
last. Suppose ? is a strict weak order on
. Let on consist of all M-tuples
whose corresponding ranking
(strict weak order) is ?. E.g. If
. If ?
,
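The correspondence between an M-tuple and a ranking with ties can be sketched as follows, matching the worked example (3,9,3,2,8): larger entries are ranked earlier and equal entries are tied. The function name is hypothetical.

```python
def ranking_from_tuple(x):
    """Return tiers of 1-based positions, largest entry first; equal entries tie."""
    tiers = {}
    for position, value in enumerate(x, start=1):
        tiers.setdefault(value, []).append(position)
    return [tiers[value] for value in sorted(tiers, reverse=True)]


ranking = ranking_from_tuple((3, 9, 3, 2, 8))
print(ranking)  # [[2], [5], [1, 3], [4]]
```

Two tuples that yield the same list of tiers correspond to the same strict weak order on the positions.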
107Theorem (Roberts 1984) Suppose S is an M-ary
relation on and f is a homomorphism from
(A,R) into which defines a regular
ordinal scale. Suppose . Then
S is of the form where , ..., are
all the strict weak orders of and
is or For example, if M
2, then ,
, . So
108Note that the condition is necessary but not
sufficient does not lead to an
ordinal scale. It is an open question to
completely characterize those S so that
homomorphisms into lead to ordinal
scales. It is also an open question to obtain
similar results for homomorphisms into A
related open question If f is a homomorphism
from (A,R) into and S is of the
form , when is the statement $f(a) > f(b)$
meaningful? (Partial results: Harvey and
Roberts 1989; Roberts and Rosenbaum 1994.)
109 Second Representation Problem: Extensive Measurement
Consider the homomorphisms from $(A, R, \circ)$ into $(\mathbb{R}, >, +)$. This representation is known as extensive measurement.
Motivation: R is heavier than, $\circ$ is combination. Also preferred to and combination.
Historically, scales were called extensive if they were at least additive and a scale was not considered acceptable unless it was at least extensive.
Representation Theorem (Hölder 1901): Every Archimedean ordered group is homomorphic to $(\mathbb{R}, >, +)$.
110 Axioms for an Archimedean Ordered Group $(A, R, \circ)$
(1). $(A, \circ)$ is a group.
(2). (A,R) is a strict simple order.
(3). Monotonicity: $aRb \Leftrightarrow (a \circ c)R(b \circ c) \Leftrightarrow (c \circ a)R(c \circ b)$
(4). Archimedean: If e is the group identity and aRe, then there is a positive integer n such that naRb. (Here, na is $a \circ a \circ \cdots \circ a$, n times.)
Which of the axioms are necessary?
(1). Associativity, identity, inverse are not.
(2). Strict simple order is not: there could be ties.
(3). Monotonicity is.
(4). The Archimedean axiom as stated is not: it doesn't even make sense without e.
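111 Definition: $(A, R, \circ)$ is an extensive structure if it satisfies the following conditions:
(1). Weak Associativity: $a \circ (b \circ c)$ and $(a \circ b) \circ c$ are tied, i.e., neither is in the relation R to the other.
(2). (A,R) is a strict weak order.
(3). Monotonicity.
(4). Modified Archimedean Axiom: If aRb, then for all c, d there is a positive integer n such that $(na \circ c)R(nb \circ d)$.
(For the latter axiom, think of $n(a-b) > (d-c)$.)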
112 Theorem (Roberts and Luce 1968): $(A, R, \circ)$ is homomorphic to $(\mathbb{R}, >, +)$ iff $(A, R, \circ)$ is an extensive structure.
Are the axioms reasonable? Descriptive?
For mass: OK.
For preference: Strict weak could be questioned. (Recall price-quality example.)
Weak associativity could be questioned if combination allows physical interaction (a = flame, b = cloth, c = fire retardant).
113 Monotonicity could be questioned: a = black coffee, b = candy bar, c = sugar; bRa and $(a \circ c)R(b \circ c)$.
Modified Archimedean: a = 1, b = 0, c is life as a cripple, d is a long and healthy life. Is $na \circ c$ ever R to $nb \circ d$?
Uniqueness Theorem (Roberts and Luce 1968): Every homomorphism from $(A, R, \circ)$ into $(\mathbb{R}, >, +)$ is regular and defines a ratio scale.
Note: This gives the desired scale type for mass.
Note: $f(a) = 3f(b)$ is meaningful.
114 Question: What representations into $\mathfrak{B}$ give rise to ratio scales?
Roberts (1984) showed that (A,R) into $(\mathbb{R}, S)$ never gives rise to a ratio scale. Thus, $\mathfrak{B}$ requires at least two relations or at least one operation. Can we get a ratio scale without any operations in $\mathfrak{B}$?
115 Third Representation Problem: Difference Measurement
Define $\Delta$ on $\mathbb{R}$ by $\Delta(x,y,u,v) \Leftrightarrow x - y > u - v$.
Let D be a quaternary relation on A. Consider the representation $(A, D) \to (\mathbb{R}, \Delta)$. A homomorphism f satisfies $D(a,b,c,d) \Leftrightarrow f(a) - f(b) > f(c) - f(d)$.
This representation is called algebraic difference measurement.
Motivation: Temperature measurement. D is comparison of differences in warmth.
Related motivation: Utility with comparison of differences in value.
116 Representation Theorems
Sufficient conditions were given by Debreu (1958), Scott and Suppes (1958), Suppes and Zinnes (1963), Kristof (1967), Krantz, Luce, Suppes, and Tversky (1971), ...
Necessary and sufficient conditions when A is finite were given by Scott (1964).
A certain solvability condition plays an important role in the representation theorem of Krantz, Luce, Suppes, and Tversky and is basic for the corresponding uniqueness theorem.
Definition: (A,D) satisfies solvability if whenever $\sim D(s,t,a,b)$ and $\sim D(x,x,s,t)$, there are $u, v \in A$ such that $\sim D(a,u,s,t)$, $\sim D(s,t,a,u)$, $\sim D(v,b,s,t)$ and $\sim D(s,t,v,b)$.
117 What this says: $f(a) - f(b) \ge f(s) - f(t) \ge 0$ implies there are u, v such that $f(a) - f(u) = f(s) - f(t) = f(v) - f(b)$.
Solvability is not a necessary condition for the representation.
Uniqueness Theorem (Suppes and Zinnes 1963, Krantz, Luce, Suppes, Tversky 1971): Every homomorphism from (A,D) into $(\mathbb{R}, \Delta)$ is regular. Moreover, if (A,D) satisfies solvability, then every homomorphism defines an interval scale.
Note: This gives temperature as an interval scale.
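118 Example to Show Hypothesis of Solvability is Needed
A = {0, 1, 3}, with $D(a,b,c,d) \Leftrightarrow a - b > c - d$.
f(x) = x, all x; g(0) = 1, g(1) = 2, g(3) = 8.
Then f and g are homomorphisms from (A,D) into $(\mathbb{R}, \Delta)$. If $g = \varphi \circ f$, then $\varphi(0) = 1$, $\varphi(1) = 2$, $\varphi(3) = 8$ and we cannot find $\alpha > 0$, $\beta$ so that $\varphi(x) = \alpha x + \beta$. Thus, f is not an interval scale.
Comment: Solvability fails.
Open Question: What representations into $\mathfrak{B}$ lead to interval scales?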
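The example can be checked by brute force. The sketch below assumes A = {0, 1, 3}, the domain on which f and g are given, and assumes D is the relation induced by f(x) = x. It confirms that g induces the same quaternary relation and that no map of the form αx + β carries f to g.

```python
from itertools import product

A = [0, 1, 3]  # assumed: the points on which f and g are defined
f = {x: x for x in A}
g = {0: 1, 1: 2, 3: 8}


def induced_relation(h):
    """All (a, b, c, d) with h(a) - h(b) > h(c) - h(d)."""
    return {
        (a, b, c, d)
        for a, b, c, d in product(A, repeat=4)
        if h[a] - h[b] > h[c] - h[d]
    }


# f and g order all differences identically, so both represent the same D.
same_relation = induced_relation(f) == induced_relation(g)

# Fit alpha*x + beta through the first two points, then test the third.
alpha = (g[1] - g[0]) / (f[1] - f[0])
beta = g[0] - alpha * f[0]
affine_related = all(g[x] == alpha * f[x] + beta for x in A)

print(same_relation, alpha, beta, affine_related)
```

The fitted map has α = 1 and β = 1, which sends 3 to 4 rather than 8, so g is not a positive linear transformation of f.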
119 Fourth Representation Problem: Conjoint Measurement
Very important in the history of measurement theory was the introduction of conjoint measurement as an example of a fundamental measurement structure different from extensive measurement.
Conjoint measurement is concerned with multidimensional alternatives. The set A has a product structure: $A = A_1 \times A_2 \times \cdots \times A_n$.
In economics, we can think of an alternative $(a_1, a_2, \ldots, a_n)$ as a market basket.
In perception, $A_1$ could be intensities of sounds presented to the left ear and $A_2$ could be intensities of sounds presented to the right ear.
120 In testing, set $A_1$ could be a set of subjects and set $A_2$ a set of test items.
In studying response strength, we could take a combination of drive and incentive. And so on.
We have a binary relation R on A interpreted as preferred to, sounds louder, performs better, etc.
We seek functions $f_i: A_i \to \mathbb{R}$ so that $(a_1, \ldots, a_n)R(b_1, \ldots, b_n) \Leftrightarrow f_1(a_1) + \cdots + f_n(a_n) > f_1(b_1) + \cdots + f_n(b_n)$.
Sufficient conditions for the representation were given by Debreu 1960 and Luce and Tukey 1964; necessary and sufficient conditions by Scott 1964 in case each $A_i$ is finite.
121 Semiorders
Suppose R is preference. Recall that indifference corresponds to the relation $aIb \Leftrightarrow\ \sim aRb\ \&\ \sim bRa$.
Suppose f is an ordinal utility function: $aRb \Leftrightarrow f(a) > f(b)$.
Then $aIb \Leftrightarrow f(a) = f(b)$.
This implies that indifference is transitive.
122 Arguments against transitivity of indifference (or more generally of tying in a measurement context) go back to Armstrong in the 1930's, Wiener in the 1920's, even Poincaré (in the 19th century).
One argument against transitivity of indifference: The Coffee-Sugar example (Luce 1956). Let 0 = cup of coffee with no sugar, 5 = cup of coffee with 5 spoons of sugar. We have a preference between 0 and 5 -- we are not indifferent.
[Figure: two points on a line, labeled 0 and 5.]
123 The Coffee-Sugar example, continued: Add sugar one grain at a time. We are indifferent between successive cups, so if indifference were transitive, we would be indifferent between the first and the last.
[Figure: a line running from 0 to 5.]
124 Second argument against transitivity of indifference: Pony-bicycle Example (Armstrong 1939)
pony I bicycle
pony I bicycle + bell
not bicycle I bicycle + bell
The semiorder model we shall discuss is intended to account for the first kind of problem, but not the second.
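125 The Idea: Find a function f so that $aRb \Leftrightarrow f(a)$ is sufficiently larger than f(b).
Let $\delta > 0$ be a threshold or just noticeable difference. Find a function f on A so that $aRb \Leftrightarrow f(a) > f(b) + \delta$.
Define $>_\delta$ on $\mathbb{R}$ by $x >_\delta y \Leftrightarrow x > y + \delta$.
We are seeking a homomorphism from (A,R) into $(\mathbb{R}, >_\delta)$.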
126 Note that if such a homomorphism f exists for some $\delta > 0$, then another f ' exists for all $\delta' > 0$: Take $f' = (\delta'/\delta)f$.
Definition (Luce 1956): (A,R) is a semiorder if it satisfies the following conditions:
(1). Irreflexivity: $\sim aRa$
(2). $aRb\ \&\ cRd \Rightarrow aRd$ or $cRb$
(3). $aRb\ \&\ bRc \Rightarrow aRd$ or $dRc$
Representation Theorem (Scott and Suppes 1958): Suppose (A,R) is a binary relation on a finite set A. Then there is a homomorphism from (A,R) to $(\mathbb{R}, >_\delta)$ iff (A,R) is a semiorder.
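127 Necessity of the conditions
(1). Trivial.
(2). Suppose aRb and cRd. If $f(a) \ge f(c)$, then $f(a) \ge f(c) > f(d) + \delta$, so aRd. If $f(a) \le f(c)$, then $f(c) \ge f(a) > f(b) + \delta$, so cRb.
(3). Suppose aRb and bRc, so $f(a) > f(b) + \delta > f(c) + 2\delta$. For any d, either $f(d) < f(a) - \delta$, giving aRd, or $f(d) \ge f(a) - \delta > f(c) + \delta$, giving dRc.
[Figures: number lines with gaps of width $\delta$ marked, showing f(c), f(b), f(a) for (3), and f(d), f(c), f(a) and f(b), f(a), f(c) for the two cases of (2).]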
128 Idea of the Proof
A = {1,2,3,4,5}, R = {(1,3), (1,4), (1,5), (2,4), (2,5), (3,5)}
Is this a semiorder? Not easy to check by the axioms. However, it is easy to construct f. First, find the order of embedding.
Finding the Order:
1R3, not 2R3 $\Rightarrow$ f(1) > f(2).
1R3, not 1R2 $\Rightarrow$ f(2) > f(3).
3R5, not 4R5 $\Rightarrow$ f(3) > f(4).
3R5, not 3R4 $\Rightarrow$ f(4) > f(5).
Resulting embedding, with $\delta = 1$:
f(5) = 0, f(4) = .5, f(3) = 1.2, f(2) = 1.7, f(1) = 2.3
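Both routes can be checked mechanically for this example. The sketch below tests the semiorder axioms by brute force and, separately, verifies that the values read off the figure give a threshold representation with threshold 1. The function names are hypothetical.

```python
from itertools import product

A = [1, 2, 3, 4, 5]
R = {(1, 3), (1, 4), (1, 5), (2, 4), (2, 5), (3, 5)}

# Values taken from the embedding shown on the slide, with threshold 1.
f = {5: 0.0, 4: 0.5, 3: 1.2, 2: 1.7, 1: 2.3}
DELTA = 1.0


def is_threshold_representation(A, R, f, delta):
    """Check that a R b holds exactly when f(a) > f(b) + delta."""
    return all(((a, b) in R) == (f[a] > f[b] + delta) for a, b in product(A, repeat=2))


def is_semiorder(A, R):
    """Brute-force check of the three semiorder axioms."""
    def rel(x, y):
        return (x, y) in R

    irreflexive = all(not rel(a, a) for a in A)
    axiom2 = all(
        rel(a, d) or rel(c, b)
        for a, b, c, d in product(A, repeat=4)
        if rel(a, b) and rel(c, d)
    )
    axiom3 = all(
        rel(a, d) or rel(d, c)
        for a, b, c, d in product(A, repeat=4)
        if rel(a, b) and rel(b, c)
    )
    return irreflexive and axiom2 and axiom3


print(is_threshold_representation(A, R, f, DELTA))
print(is_semiorder(A, R))
```

The axiom check examines every 4-tuple of elements, which is fine for five elements but grows as the fourth power of the size of A.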
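129 Uniqueness Problem for Semiorders
There can be irregular homomorphisms from (A,R) to $(\mathbb{R}, >_\delta)$.
A = {0, 1/2, 10}, $aRb \Leftrightarrow a > b + 1$.
Two homomorphisms from (A,R) into $(\mathbb{R}, >_1)$: f(0) = 0, f(1/2) = 0, f(10) = 10; g(x) = x, all x.
There is no admissible transformation $\varphi$ so that $g = \varphi \circ f$, since f(0) = f(1/2) but g(0) $\ne$ g(1/2). So, the usual theory of scale type does not apply. There are various ways around this.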
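The irregularity can be verified directly. This sketch assumes the relation on A = {0, 1/2, 10} is the one given by a > b + 1, which is what makes g(x) = x a homomorphism with threshold 1. It checks that both f and g are homomorphisms and that g cannot be written as a function of f.

```python
from fractions import Fraction

A = [Fraction(0), Fraction(1, 2), Fraction(10)]

# Assumed relation: a R b iff a > b + 1.
R = {(a, b) for a in A for b in A if a > b + 1}

f = {A[0]: 0, A[1]: 0, A[2]: 10}
g = {a: a for a in A}


def is_homomorphism(h, delta=1):
    """Check that a R b holds exactly when h(a) > h(b) + delta."""
    return all(((a, b) in R) == (h[a] > h[b] + delta) for a in A for b in A)


def factors_through(h1, h2):
    """True if some function phi on the range of h1 satisfies h2 = phi o h1."""
    phi = {}
    for a in h1:
        # phi(h1(a)) must equal h2(a); a clash means no such phi exists.
        if phi.setdefault(h1[a], h2[a]) != h2[a]:
            return False
    return True


print(is_homomorphism(f), is_homomorphism(g))
print(factors_through(f, g), factors_through(g, f))
```

The asymmetry in the last line is the point: f is a function of g, but g is not a function of f, so f is an irregular representation.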
130 The Method of Perfect Substitutes
Define an equivalence relation E on A by $aEb \Leftrightarrow (\forall c)[(aRc \Leftrightarrow bRc)\ \&\ (cRa \Leftrightarrow cRb)]$.
In this case, a and b are perfect substitutes for each other with respect to the relation R.
Define $R^*$ on the collection $A^*$ of equivalence classes by $a^* R^* b^* \Leftrightarrow aRb$.
If (A,R) is a semiorder, $(A^*, R^*)$ is well-defined and is again a semiorder. Moreover, every homomorphism is now 1-1 and regular.
131 A similar trick of canceling out the perfect substitutes relation E always leads to regular representations for equivalence classes.
We can now ask: What is the scale type of a homomorphism from $(A^*, R^*)$ into $(\mathbb{R}, >_\delta)$? None of the common scale types applies and no succinct characterization of admissible transformations is known.
A Uniqueness Result: Suppose f is a homomorphism from (A,R) into $(\mathbb{R}, >_\delta)$. Define W on A by $aWb \Leftrightarrow f(a) > f(b)$. Then (A,W) is a strict weak order.
132 Theorem (Roberts 1971): The order (A,W) is essentially unique. The only changes allowed are to permute elements a and b which are equivalent under the perfect substitutes relation.
Meaningfulness: One can still ask, given a homomorphism f from (A,R) into $(\mathbb{R}, >_\delta)$, if the conclusion $f(a) > f(b)$ is meaningful in the broader sense of invariance under all possible homomorphisms. It isn't always. In the example above, we have g(1/2) > g(0), but not f(1/2) > f(0). However, it is easy to see that if f is a homomorphism, $f(a) > f(b)$ is a meaningful conclusion for all a, b if for all $x \ne y$, $\sim xEy$.
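133 Indifference Graphs
Consider the indifference representation corresponding to the semiorder representation: $aIb \Leftrightarrow |f(a) - f(b)| \le \delta$.
Define a closeness relation on $\mathbb{R}$ by relating x and y iff $|x - y| \le \delta$. We seek a homomorphism from (A,I) into $\mathbb{R}$ with this relation.
Graph-theoretic formulation: G = (V,E), V = A, edge a to b iff aIb. Assign numbers to vertices of G so that two vertices are adjacent iff their corresponding numbers are close (within $\delta$).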
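134 Here is an example of such a representation on a graph.
[Figure: a graph whose vertices are labeled 0, .7, 1.1, 1.6, 2.5, with $\delta = 1$.]
Definition: We say that (A,I) is an indifference graph if there is a homomorphism from (A,I) into $\mathbb{R}$ with the within-$\delta$ closeness relation.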
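The adjacency rule can be sketched by starting from the vertex labels in the figure and a threshold of 1. The edge set below is computed from those numbers under the rule "adjacent iff the labels differ by at most the threshold"; the edges drawn in the original figure are not recoverable from the text, so this is a reconstruction under that assumption.

```python
from itertools import combinations

# Vertex labels from the figure; threshold of 1 as stated there.
values = [0, 0.7, 1.1, 1.6, 2.5]
DELTA = 1

# Two vertices are adjacent iff their labels are within DELTA of each other.
edges = {
    (x, y) for x, y in combinations(values, 2) if abs(x - y) <= DELTA
}

for edge in sorted(edges):
    print(edge)
```

Going the other way, from a given graph to a valid labeling, is the representation problem itself and is not addressed by this sketch.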
135Examples of Graphs that are Not Indifference
Graphs C4 Similarly, C5, C6, ...
136 Representation Theorem (Roberts 1969): G = (A,I) is an indifference graph iff it has none of the above graphs as an induced subgraph.
Uniqueness: Irregularity can again be a problem. Now, the perfect substitutes relation is equivalent to $aEb \Leftrightarrow (\forall c)(aIc \Leftrightarrow bIc)$. (Assuming that aIa for all a, we have aEb iff a and b have the same closed neighborhoods.)
Define $I^*$ on $A^*$ by $a^* I^* b^* \Leftrightarrow aIb$. Then all homomorphisms from $(A^*, I^*)$ into $\mathbb{R}$ with the within-$\delta$ closeness relation are regular, but the class of admissible transformations is not known.
137Again, is
essentially unique, as in the case of
semiorders. Meaningfulness: $f(a) > f(b)$ is
never meaningful for indifference graphs since
-f is again a homomorphism. However, some
ordinal conclusions are meaningful. Say y is
between x and z if