Title: Psychometrics
1 Psychometrics
- PSYC 325
- Paul Jose
- Thought for the day: Studies at Jikei University
in Tokyo found that people who employed 7 rules
for good health (e.g., adequate sleep, no
smoking) had about 6% higher blood pressure than
people who were not so concerned about their
health.
2 This is a big topic
- We could spend a whole semester on this topic, but
we'll devote a few days.
- Psychometrics is important to know about because
most of the work in psychology involves measures
of some kind, for which you need to know whether
they are reliable and valid. Quick review on this.
3 Scale reliability
- Internal reliability
- Cronbach's alpha, Kuder-Richardson 20,
split-half, etc.
- You want to know that all of your items are highly
intercorrelated. If not, then you have noise or
heterogeneity in your measure. You may think that
you're measuring depression, let's say, but
you're really measuring depression and anxiety.
- Let's take a look at an example. I mixed
depression items with coping items on the
following page.
4 Item-total Statistics

            Scale Mean    Scale Variance    Corrected      Alpha
            if Item       if Item           Item-Total     if Item
            Deleted       Deleted           Correlation    Deleted
CDI08        8.1450       11.0675            .4146         .6002
CDI09        8.0341       10.4920            .5248         .5768
CDI10        8.1938       10.8881            .4864         .5898
CDI11        7.9242       10.6614            .3930         .5967
CDI12        8.2340       11.9727            .2101         .6328
CDI07        8.0916       10.6471            .4964         .5833
COPE08       6.3352        9.6316            .3088         .6201
COPE11       5.6796       11.4301           -.0004         .7260
COPE15       6.6628        8.2480            .5157         .5485
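
For anyone who wants to see where these columns come from, here is a minimal sketch in Python (NumPy assumed). The function names and the simulated data are hypothetical, not the CDI/COPE data from the slide; the point is only that each column is the same statistic recomputed with one item left out.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the scale total
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def item_total_statistics(items):
    """One dict per item, holding the four 'if item deleted' columns."""
    items = np.asarray(items, dtype=float)
    rows = []
    for j in range(items.shape[1]):
        rest = np.delete(items, j, axis=1)  # the scale without item j
        rest_total = rest.sum(axis=1)
        rows.append({
            "scale_mean_if_deleted": rest_total.mean(),
            "scale_variance_if_deleted": rest_total.var(ddof=1),
            # "Corrected" means item j is correlated with the total of the OTHER items
            "corrected_item_total_r": np.corrcoef(items[:, j], rest_total)[0, 1],
            "alpha_if_deleted": cronbach_alpha(rest),
        })
    return rows

# Hypothetical data: five items driven by one trait plus one unrelated item.
rng = np.random.default_rng(0)
trait = rng.normal(size=300)
coherent_items = trait[:, None] + rng.normal(size=(300, 5))
unrelated_item = rng.normal(size=(300, 1))
scores = np.hstack([coherent_items, unrelated_item])

alpha_all = cronbach_alpha(scores)
stats = item_total_statistics(scores)
for j, row in enumerate(stats):
    print(j, round(row["corrected_item_total_r"], 3), round(row["alpha_if_deleted"], 3))
```

The unrelated item (index 5) should show a near-zero corrected item-total correlation and an alpha-if-deleted above the overall alpha, which is the same pattern COPE11 shows in the table.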
5 Criteria for Cronbach's alpha
- The minimum acceptable level is .70. You would
like to have the alpha be in the .80s, and you're
ecstatic if it goes into the .90s.
- Another example. A colleague and I have written a
new scale to measure parental facilitation of
literacy and numeracy in preschool children
(PFLNS), and we sought to compare it with a
pre-existing measure (Home Literacy Environment;
HLE).
6 Some critical facts
- The HLE is composed of 9 items, and the purported
Cronbach's alpha is .74. It has been shown by the
authors to predict reading scores on two commonly
used tests of reading, the PPVT and PIAT-R.
- The PFLNS is composed of 42 items, and we didn't
know what the reliability would be. No validation yet.
- Research plan: collect data from 200 parents on
the HLE and PFLNS in Chicago and Wellington, and
individually test their children (4- and
5-year-olds) on the TEMA and TERA.
- By so doing, we could examine internal
reliability, test-retest reliability, and
validity in one fell swoop. Let's see how it
turned out. Next page is alpha for the HLE.
7

            Scale Mean    Scale Variance    Corrected      Alpha
            if Item       if Item           Item-Total     if Item
            Deleted       Deleted           Correlation    Deleted
TELLY        9.8764        5.3247           -.0525         .5345
CHECKS      11.1437        4.9534            .0673         .5124
NEWSPAP     10.8764        4.0394            .3439         .4127
MAGADULT    10.5833        3.7250            .3360         .4117
MAGCHILD    11.2730        4.6025            .1802         .4781
MOTHREAD     9.9770        4.3222            .3020         .4343
FATHREAD    10.0201        4.2157            .2905         .4362
CHILREAD     9.9195        4.6794            .2367         .4608
NUMBOOKS     9.7557        5.0497            .2037         .4776

Reliability Coefficients
N of Cases = 348.0     N of Items = 9     Alpha = .4955
8 PFLNS
- Cronbach's alpha = .866 for 42 items.
- Inescapable conclusion: the PFLNS is internally
reliable and the HLE is definitely not. It doesn't
usually turn out to be quite so clean.
- Something to remember: the more items you have
(if they are similar), the higher your alpha. A
9-item scale must be very coherent in order to
have a good alpha. We have 42 items, and they
have 9.
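
To make the point about scale length concrete, here is a small sketch using the standardized-alpha formula, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean inter-item correlation. The value r = .15 is a made-up illustration, not an estimate from the HLE or PFLNS data, and the function names are mine.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def required_mean_r(k, target_alpha):
    """Mean inter-item correlation needed for k items to reach target_alpha."""
    return target_alpha / (k - target_alpha * (k - 1))

mean_r = 0.15  # hypothetical mean inter-item correlation
alpha_9 = standardized_alpha(9, mean_r)
alpha_42 = standardized_alpha(42, mean_r)
print(f"9 items:  alpha = {alpha_9:.3f}")
print(f"42 items: alpha = {alpha_42:.3f}")

# How coherent must the items be to reach the .70 minimum?
print(f"9 items need mean r  = {required_mean_r(9, 0.70):.3f}")
print(f"42 items need mean r = {required_mean_r(42, 0.70):.3f}")
```

With the same modest inter-item correlation, the longer scale clears .70 and the shorter one does not, so a short scale has to be much more coherent item by item.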
9 Reliability
- Okay, we've demonstrated good internal
reliability. Are we done yet?
- No, because we don't know if the scales have good
reliability over time, usually called
test-retest reliability. One simply correlates
the same individuals' scores from two
administrations a relatively short period of time
apart (a few weeks to a month).
- What are they for the HLE and PFLNS? Answer: we
don't know yet. We have the data, but have not
entered them yet. I would guess that the PFLNS
would be better, again because of the larger
number of items in it.
10 Reliability over time
- Why is this important? Because you want to know
that whatever you're measuring is relatively
stable over time.
- But is that true for all measures? In the case of
parental practices, the answer is yes. But in the
case of rapidly changing variables, such as mood,
you would not expect stability over time. So
think about this before you gather the data and
check it.
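
A minimal sketch of a test-retest check, assuming NumPy and entirely simulated scores (we have no retest data entered yet, so nothing here is a result). It contrasts a stable trait measured twice with a state that is redrawn at each occasion; the variable and function names are hypothetical.

```python
import numpy as np

def retest_correlation(time1, time2):
    """Pearson correlation between the same people's scores at two occasions."""
    time1 = np.asarray(time1, dtype=float)
    time2 = np.asarray(time2, dtype=float)
    if time1.shape != time2.shape:
        raise ValueError("Both occasions must contain the same respondents in the same order")
    return np.corrcoef(time1, time2)[0, 1]

rng = np.random.default_rng(1)
n = 200

# Stable construct: the same underlying level at both occasions, plus measurement noise.
stable_level = rng.normal(size=n)
practice_t1 = stable_level + rng.normal(scale=0.5, size=n)
practice_t2 = stable_level + rng.normal(scale=0.5, size=n)

# Rapidly changing construct: an independent draw at each occasion.
mood_t1 = rng.normal(size=n)
mood_t2 = rng.normal(size=n)

r_practice = retest_correlation(practice_t1, practice_t2)
r_mood = retest_correlation(mood_t1, mood_t2)
print(f"stable construct: r = {r_practice:.2f}")
print(f"changing state:   r = {r_mood:.2f}")
```

A low retest correlation for the second kind of variable says nothing bad about the measure, which is why you decide in advance whether stability is expected.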
11 Validity
- There are four kinds of validity. Let's review
them.
- Face validity: do the items look like they
measure what they're supposed to measure?
- Convergent validity: does the measure correlate
with similar measures and fail to correlate with
dissimilar measures?
- Criterion validity: does the measure predict
something that it is supposed to predict?
- Construct validity: the degree to which the
measure accurately measures the hypothetical
construct it is designed to measure.
12 Validity for the PFLNS
- So what kind of validity should we consider?
- Face validity: we created items that measured the
degree to which parents did educationally
enriching activities.
- Convergent validity: does our scale correlate
with the HLE? We could also have included a
measure of anxiety, or something else unrelated.
- Criterion validity: does the scale predict scores
from standardized tests of literacy and numeracy?
This is the most important goal.
- Construct validity: does the scale reflect the
hypothetical construct of parental behaviours
that facilitate academic skills? This would be
the long-term goal of a number of data
collections.
13 Face validity
- HLE
- Approximately how many books does your child own?
- How many hours of television does your child
watch each week?
- PFLNS
- Use maths in home routines, e.g., measuring
ingredients for cooking.
- Do alphabet workbooks or worksheets.
14 Convergent validity
- Correlation between the HLE and PFLNS:
r(322) = .245, p < .001.
- Correlation between the HLE and the PFLNS-Reading
sub-score: r(322) = .259, p < .001.
- Correlation between the HLE and the PFLNS-Maths
sub-score: r(322) = .190, p < .001.
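
A rough sketch of how a reported r and its degrees of freedom translate into a significance test, using only the Python standard library. The helper names are mine, and the p-value uses a normal approximation to the t distribution, an assumption that is reasonable here only because df = 322 is large; an exact test would use the t distribution itself.

```python
import math

def correlation_t(r, df):
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))

def approx_two_tailed_p(t):
    """Two-tailed p-value from the standard normal (large-df approximation only)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

reported = {
    "HLE with PFLNS": 0.245,
    "HLE with PFLNS-Reading": 0.259,
    "HLE with PFLNS-Maths": 0.190,
}

results = {}
for label, r in reported.items():
    t = correlation_t(r, df=322)
    results[label] = (t, approx_two_tailed_p(t))
    print(f"{label}: t = {t:.2f}, approximate p = {results[label][1]:.6f}")
```

Note that all three correlations are significant yet small: with this many cases, significance tells you little about how much the two scales overlap.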
15 Criterion validity
- Correlation of the HLE with
- Reading: b = .017, R² = .001, p = .82.
- Mathematics: b = .047, R² = .002, p = .35.
- Correlation of the PFLNS with
- Reading: b = .238, R² = .06, p = .001.
- Mathematics: b = .158, R² = .03, p = .003.
- Conclusion? The PFLNS seems to do a better job of
predicting maths and literacy scores than the HLE.
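
A small sketch of why a regression weight and an R² can be read together on this slide. With a single predictor and both variables standardized, the regression slope equals the Pearson correlation and R² equals its square. Treating the b values above as standardized weights is my assumption, not something the slide states, and the data below are simulated (NumPy assumed), not the TEMA/TERA scores.

```python
import numpy as np

def standardized_simple_regression(x, y):
    """Slope and R-squared from regressing standardized y on standardized x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    # Both z-scores have mean zero, so the least-squares line has no intercept.
    slope = (zx * zy).sum() / (zx * zx).sum()
    residuals = zy - slope * zx
    r_squared = 1.0 - (residuals ** 2).sum() / (zy ** 2).sum()
    return slope, r_squared

# Hypothetical scale scores and test scores with a weak positive relation.
rng = np.random.default_rng(2)
scale_scores = rng.normal(size=324)
test_scores = 0.25 * scale_scores + rng.normal(size=324)

slope, r_squared = standardized_simple_regression(scale_scores, test_scores)
pearson_r = np.corrcoef(scale_scores, test_scores)[0, 1]
print(f"slope = {slope:.3f}, r = {pearson_r:.3f}, R^2 = {r_squared:.3f}")
```

Under that assumption, a weight of about .24 goes with an R² of about .06, so even the better predictor explains only a small share of the variance in test scores.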
16 Construct validity
- This is not easily demonstrated. One needs to
have the results from a variety of studies, all
of which show that the new scale is a good
predictor/correlate of related constructs.
- Other than the HLE, no pre-existing measure of
parental behaviours exists with which we can
correlate our new measure. One really needs to
have 3-5 other measures to triangulate in on
the hypothetical construct.
- One cannot measure a hypothetical construct
directly, but one can use structural equation
modeling to estimate how strongly each measure
relates to it.
17 Example with a variety of similar tests
[Figure: diagram linking the construct Intelligence
to five measures (New test, Test A, Test B, Test C,
Test D). Values shown: .79, .75, .60, .32, .71.]
Test B is not a good example, but the new test is
a good example.
18 Other topics in psychometrics
- There are two high-powered techniques for choosing
items for scales:
- Item response theory (IRT)
- The Rasch model.
- I suspect that I'll use one or the other to
examine the specific items of our new scale to
determine whether 42 items are truly necessary.
- Also, there is the issue of whether certain items
are relevant for specific ages of children. I
need to examine whether parents change what they
are doing over the preschool age span (I know
they do).
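
For a flavour of what these techniques model, here is a minimal sketch of the dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between the person's level (theta) and the item's difficulty. This is the simplest textbook form; it is an assumption for illustration, since rating-scale items like ours would need a polytomous extension, and the difficulty values below are invented.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Hypothetical item difficulties on the same logit scale as theta.
item_difficulties = {"easy item": -1.5, "medium item": 0.0, "hard item": 1.5}

for theta in (-2.0, 0.0, 2.0):
    probabilities = {
        name: round(rasch_probability(theta, b), 2)
        for name, b in item_difficulties.items()
    }
    print(f"theta = {theta:+.1f}: {probabilities}")
```

Items whose difficulties sit very close together tell you much the same thing about a respondent, which is one way such a model can suggest that some items are redundant.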
19 Low level of psychometric knowledge out there
- My chief complaint about researchers who propose
new measures is that they are not systematic and
complete in doing all of the things that need to
be done.
- They don't report internal and test-retest
reliabilities, don't report validity data, and
don't factor analyze their data properly.
- This last item is our next concern.