Title: Issues in Measuring Behaviour
1 Issues in Measuring Behaviour
Why do we want to quantify everything?
Types of psychological test.
Factors affecting test reliability.
Factors affecting test validity.
2 Why quantify?
1. Science involves measurement - because measurements can be objectively obtained, are publicly available, and are potentially checkable by sceptical others.
2. Science often (but not invariably) involves experimentation - because the experimental method is good for identifying cause and effect.
3 Types of psychological test
Purpose of tests:
1. Research.
2. Practical applications - clinical, educational, occupational.
Psychological testing is a 20th-century phenomenon, dating back to Binet (1900s).
4 Dangers of testing
1. Untrained use - tests are easy to administer but hard to interpret.
2. Spurious precision - scores look exact because they are quantitative.
3. Misapplication of findings in a deterministic way.
4. Test results are essentially descriptions of groups; they are less reliable as descriptions of individuals.
5 Sex differences in reaction time?
6 Types of test
1. Performance (e.g. IQ tests). Problems - motivation; standardisation (culture-fair?).
2. Disposition (e.g. anxiety, extroversion). Problems - social desirability (criterion-keyed tests); ambiguity; need for appropriate norms.
7 Test reliability
A test is reliable if it gives consistent/reproducible results.
A score = true score + error.
Error is due to:
(a) natural performance variation;
(b) lack of precision in defining and measuring psychological constructs (e.g. what do we mean by terms like "aggression" or "intelligence"?).
8 Measures of reliability
(a) Test-retest (time to time).
(b) Alternate forms (version to version).
(c) Split-half (item to item).
(d) Inter-scorer (person to person).
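Two of these measures, test-retest and split-half, come down to correlating two sets of scores. The sketch below shows one way to compute them; all the numbers are made up for illustration, the function names are hypothetical, and the Spearman-Brown step-up applied to the half-test correlation is a standard correction that the slides themselves do not mention.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores holds one list of item scores (1 = correct) per person.

    Items 1, 3, 5, ... form one half and items 2, 4, 6, ... the other.
    The half-test correlation is stepped up with the Spearman-Brown
    formula to estimate the reliability of the full-length test.
    """
    first_half = [sum(person[0::2]) for person in item_scores]
    second_half = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(first_half, second_half)
    return 2 * r_half / (1 + r_half)

# Made-up scores for six people tested on two occasions (test-retest).
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]
test_retest = pearson_r(time_1, time_2)

# Made-up item responses for five people on a six-item test (split-half).
items = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
split_half = split_half_reliability(items)

print(f"test-retest r = {test_retest:.2f}")
print(f"split-half (Spearman-Brown) = {split_half:.2f}")
```

Alternate-forms and inter-scorer reliability could be estimated with the same `pearson_r` helper by passing in scores from two versions of the test or from two scorers.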
9 Factors affecting reliability
1. The phenomenon itself (traits vs. states).
2. Precision of measurement.
3. Length of test (long > short).
4. Time between tests (short > long).
5. Variability in performance (high > low).
6. Format: multiple choice with 5 answers per question = 20% correct by chance; true/false = 50% correct by chance. Multiple choice is therefore more reliable than true/false.
7. Inter-individual variability in scores (high > low).
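The chance levels quoted for the two formats follow from 1 divided by the number of answer options. A rough sketch, assuming every question is answered by pure guessing, questions are independent (a binomial model), and a hypothetical 40-item test:

```python
from math import sqrt

def guessing_stats(n_items, n_options):
    """Chance level, expected score and SD of the score under pure guessing."""
    p = 1 / n_options                      # probability of guessing one item right
    expected = n_items * p                 # mean number correct by chance
    sd = sqrt(n_items * p * (1 - p))       # binomial standard deviation
    return p, expected, sd

# The 40-item test length is an assumption for illustration only.
for label, n_options in [("5-option multiple choice", 5), ("true/false", 2)]:
    p, expected, sd = guessing_stats(n_items=40, n_options=n_options)
    print(f"{label}: {p:.0%} by chance, "
          f"about {expected:.0f} of 40 correct (SD {sd:.2f})")
```

Under these assumptions guessing contributes both a higher score and more random spread on the true/false test, which is one way to read the claim that multiple choice is the more reliable format.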
10 The greater the variability between individuals in test scores, the better the reliability.
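This relationship can be illustrated with a small simulation built on the idea that each score is a true score plus error. The sketch treats reliability as the share of observed-score variance that comes from true differences between people, which is the classical test theory definition rather than something stated in the slides. The mean of 100, the standard deviations, and the normally distributed scores are all assumptions chosen for illustration.

```python
import random
import statistics

def simulated_reliability(true_sd, error_sd, n_people=5000, seed=1):
    """Estimate reliability as var(true scores) / var(observed scores)."""
    rng = random.Random(seed)
    # Each simulated person has a true score; individuals differ by true_sd.
    true_scores = [rng.gauss(100, true_sd) for _ in range(n_people)]
    # Each observed score is the true score plus random measurement error.
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

# Same measurement error in both groups; only the spread between people differs.
narrow = simulated_reliability(true_sd=5, error_sd=5)
wide = simulated_reliability(true_sd=15, error_sd=5)

print(f"similar individuals: reliability ~ {narrow:.2f}")
print(f"varied individuals:  reliability ~ {wide:.2f}")
```

With the error held constant, the group whose members differ more from one another yields the higher estimate, because the same amount of error is a smaller fraction of the total variation.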
11 Test validity
A test is valid if it measures what it is supposed to be measuring.
Important - a test can be reliable without being valid (but not vice versa).
12 Example of reliable but invalid measurements
Paul Broca (1870s) searched for anthropometric measurements that correlated with what he took to be the "known" ranking of human races in terms of intelligence and civilisation.
e.g. He claimed that the ratio of forearm to upper arm was more "ape-like" in black people than in white people. (He abandoned this measure once he realised that, by this criterion, white people were more "ape-like" than Eskimos, aborigines and other groups he regarded as inferior!)
Brain weight (Broca's claimed ordering): men > women > black people > gorillas. Modern brains heavier than mediaeval brains. French brains heavier than German brains!
292 male brains: mean weight 1,325 grams. 140 female brains: mean weight 1,144 grams (14% difference). No account was taken of age at death (young men, old women) or body size.
"We might ask if the small size of the female brain depends exclusively upon the small size of her body...But we must not forget that women are, on the average, a little less intelligent than men...We are therefore permitted to suppose that the relatively small size of the female brain depends in part upon her physical inferiority and in part upon her intellectual inferiority" (1861, p.153).
13 Measures of validity
(a) Face validity (intuitively looks plausible).
(b) Content validity (test covers material which is considered relevant - e.g. statistics exams shouldn't contain history questions!).
(c) Criterion validity - predictive or concurrent. Problem - finding appropriate/decent criteria.
(d) Construct validity (does performance correlate well with known measures of the phenomenon?).
(e) Ecological / external validity.
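Criterion validity is usually summarised as a correlation between test scores and the chosen criterion. A minimal sketch, using made-up data: the test scores and the criterion (imagined here as a later performance rating, which would make this predictive rather than concurrent validity) are hypothetical, as is the function name.

```python
from math import sqrt

def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    n = len(test_scores)
    mean_t = sum(test_scores) / n
    mean_c = sum(criterion_scores) / n
    cov = sum((t - mean_t) * (c - mean_c)
              for t, c in zip(test_scores, criterion_scores))
    ss_t = sum((t - mean_t) ** 2 for t in test_scores)
    ss_c = sum((c - mean_c) ** 2 for c in criterion_scores)
    return cov / sqrt(ss_t * ss_c)

# Hypothetical data: six people's test scores and their later ratings.
test_scores = [52, 61, 47, 70, 58, 65]
later_ratings = [3.1, 3.8, 2.9, 4.5, 3.2, 4.0]

r = validity_coefficient(test_scores, later_ratings)
print(f"criterion validity coefficient r = {r:.2f}")
```

The coefficient is only as meaningful as the criterion itself, which is the problem the slide raises about finding appropriate criteria.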
14 Factors affecting validity
Norms and standardisation:
(a) How well was the test standardised? Stratified random sampling is ideal. Do sub-group norms exist?
(b) Are sufficient details given to ensure correct administration?
(c) How appropriate is the standardisation group as a baseline against which to compare your sample?
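Stratified random sampling means dividing the population into sub-groups (strata) and sampling at random within each one, so that the standardisation sample mirrors the population's make-up. A sketch with proportional allocation; the population, the age bands, the 10% sampling fraction, and the function name are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)   # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 people aged 16-39 and 400 aged 40+.
population = [
    {"id": i, "age_band": "16-39" if i < 600 else "40+"}
    for i in range(1000)
]

sample = stratified_sample(population, lambda p: p["age_band"], fraction=0.1)
younger = sum(1 for p in sample if p["age_band"] == "16-39")
print(len(sample), "sampled;", younger, "aged 16-39,", len(sample) - younger, "aged 40+")
```

Keeping the strata separate in this way also makes it straightforward to compute sub-group norms from the same sample.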
15 Ecological validity
To what extent are our results generalisable to the real world?
It depends - e.g. driving simulators are good for simulating vehicle control, but useless for simulating how riskily people are prepared to drive.
LED brake lights light up faster than conventional bulbs, but do the milliseconds make any practical difference to a following driver's braking times?