Introduction to Psychometrics - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Psychometrics

Description:

Introduction to Psychometrics: Psychometrics & Measurement Validity; Constructs & Measurement; Kinds of Items; Properties of a good measure; Standardization

Slides: 19
Provided by: Gar68
Learn more at: https://psych.unl.edu

Transcript and Presenter's Notes



1
Introduction to Psychometrics
  • Psychometrics & Measurement Validity
  • Constructs & Measurement
  • Kinds of Items
  • Properties of a good measure
  • Standardization
  • Reliability
  • Validity
  • Standardization & Inter-rater Reliability

2
  • Psychometrics (Psychological Measurement)
  • The process of assigning a value to represent the
    amount or kind of a specific attribute of an
    individual.
  • Individuals can be participants, collectives,
    stimuli, or processes
  • We do not measure individuals
  • We measure specific attributes of an individual

E.g., each participant in the Heptagonal
Condition was presented with a 2-inch-wide
polygon to view for 10 seconds. Then this polygon
and four similar ones were presented, and the
participant's reaction time to identify the
polygon presented previously was recorded.
We will focus on measuring attributes of persons
in this introduction!
3
  • Psychometrics is the centerpiece of scientific
    empirical psychological research practice.
  • All psychological data result from some form of
    measurement
  • Behaviors are collected by observation,
    self-report or behavioral traces.
  • Measurement is the process of turning those
    behaviors into data for analysis
  • For those data to be useful, we need Measurement
    Validity
  • The better the measurement, the better the data,
    the more accurate and the more useful are the
    conclusions of the data analysis for the
    intended psychological research or application

Without Measurement Validity, there can't be
Internal Validity, External Validity, or
Statistical Conclusion Validity!
4
Most of what we try to measure in Psychology are
constructs. They're called this because most of
what we care about as psychologists are not
physical measurements, such as height, weight,
pressure & velocity. The stuff of
psychology (learning, motivation, anxiety,
social skills, depression, wellness, etc.) consists of
things that don't really "exist"; rather, they
are attributes and characteristics that we've
constructed to give organization and structure to
behavior. Essentially all of the things we
psychologists research, both as causes and
effects, are Attributive Hypotheses with
different levels of support and acceptance!!!!
5
Measurement of constructs is more difficult than
measurement of physical properties! We can't
just walk up to someone with a scale, ruler,
graduated cylinder or velocimeter and measure how
depressed they are. We have to figure out some
way to turn observations of their behavior,
self-reports or traces of their behavior into
variables that give values for the constructs we
want to measure. So, measurement is, just like
the rest of what we've learned about so far in this
course, all about representation!!! Measurement
Validity is the extent to which the data
(variable values) we have represent the behaviors
(constructs) we want to study.
6
  • What are the different types of constructs we
    measure from persons?
  • The most commonly discussed types are ...
  • Demographics: population/subpopulation
    identifiers
    • e.g., age, gender, race/ethnicity, history
      variables
  • Ability/Skill: performance broadly defined
    • e.g., scholastic skills, job-related skills,
      research DVs, etc.
  • Attitude/Opinion: how things are or should be
    • e.g., polls, product evaluations, etc.
  • Personality: characterological & contextual
    attributes of an individual
    • e.g., anxiety, psychoses, assertiveness,
      extroversion, etc.

7
  • However, it is difficult to categorize many of
    the things we psychologists measure...
  • Diagnostic Category
    • achievement: limits of what can be
      learned/expressed, and/or
    • personality: private & social expressions,
      and/or
    • attitude/opinion: beliefs & feelings
  • Social Skills
    • achievement: something that has been learned?
      and/or
    • personality: how we get along socially is part
      of who we are?
  • Intelligence
    • innate (biological) preparedness for learning,
      and/or
    • achievement: earlier learning leads to more
      intelligence
  • Aptitude
    • achievement: know things necessary to learn
      other things, and/or
    • specific capacity: the ability to learn certain
      skills

8
  • Each separate thing we measure is called an
    item
    • e.g., a question, a problem, a page, a trial,
      etc.
  • Collections of items are called many things
    • e.g., survey, questionnaire, instrument,
      measure, test, or scale
  • Three kinds of item collections you should know:
    • Scale (Test): all items are put together to
      get a single score
    • Subscale (Subtest): item sets are put together
      to get multiple separate scores
    • Survey: each item gives a specific piece of
      information
  • Most questionnaires, surveys or interviews
    are a combination of all three.
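The difference between a scale, subscales and survey items can be sketched in Python. This is a minimal illustration only, assuming a hypothetical six-item questionnaire in which items q1-q4 form one scale built from two hypothetical subscales (`subscale_A`, `subscale_B`) and q5-q6 are survey items; simple summing is assumed as the way items are put together.

```python
# Hypothetical answers from one respondent (all names and values are made up).
responses = {"q1": 4, "q2": 5, "q3": 2, "q4": 3, "q5": 34, "q6": "Psychology"}

# Hypothetical subscale structure: each item set is summed into its own score.
SUBSCALES = {
    "subscale_A": ["q1", "q2"],
    "subscale_B": ["q3", "q4"],
}

# Survey items: each one is kept as its own specific piece of information.
SURVEY_ITEMS = {"age": "q5", "major": "q6"}


def subscale_scores(resp: dict) -> dict:
    """Subscales: item sets put together to get multiple separate scores."""
    return {name: sum(resp[item] for item in items)
            for name, items in SUBSCALES.items()}


def scale_score(resp: dict) -> int:
    """Scale: all scale items put together to get a single score."""
    return sum(resp[item] for items in SUBSCALES.values() for item in items)


def survey_answers(resp: dict) -> dict:
    """Survey: each item reported as-is; nothing is aggregated."""
    return {label: resp[item] for label, item in SURVEY_ITEMS.items()}


print(subscale_scores(responses))  # {'subscale_A': 9, 'subscale_B': 5}
print(scale_score(responses))      # 14
print(survey_answers(responses))   # {'age': 34, 'major': 'Psychology'}
```

Note that the same four items yield either one score or two, depending only on how they are grouped; this is why one questionnaire can be a scale, a set of subscales and a survey at the same time.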

9
There are scads of ways of classifying or
categorizing items; here are three ways that I
want you to be familiar with.
  • Kind 1: objective items vs. subjective
    items
    • "objective" does not mean "true", "real" or
      "accurate"
    • "subjective" does not mean "made up" or
      "inaccurate"
    • defined by how the observer/interviewer/coder
      transforms participants' responses into data

Objective Items: no evaluation or decision is
needed; either the response is the data or the data are a
mathematical transformation of the response; e.g., multiple
choice, T/F, matching, fill-in-the-blanks (strict).
Subjective Items: the response must be evaluated and
a decision or judgment made about what the
data value should be (content coding, diagnostic systems,
behavioral taxonomies); e.g., essays, interview
answers, drawings, facial expressions.
10
A bit more about objective vs. subjective...
  • Seems simple
    • the objective measure IS the behavior of interest
    • e.g., impolite statements, GPA, hourly sales,
      publications
    • problems? "Objective" doesn't mean
      "representative"
  • Seems harder
    • the subjective rating of behavior IS the behavior of
      interest
    • e.g., friends' eval, advisor's eval, manager's
      eval, Chair's eval
    • problems? Good subjective measures are hard
      work, but
  • Hardest & most common
    • the construct of interest isn't a specific behavior
    • e.g., social skills, preparation for the
      professoriate, sales skill, contribution to the
      department
    • problems? What is the construct & how do we
      represent it?

11
  • Kind 2: Judgments, Sentiments & Scored
    Sentiments
  • Judgments: do have a correct answer
    (e.g., 2 + 2 = 4)
    • the behavior, response or trace must be
      scored (compared to the correct answer) to
      produce the variable/data
    • scoring may be objective or subjective,
      depending on the item
  • Scored Sentiments: do not have a correct answer
    but do have an indicative answer (e.g., "Do you
    prefer to be alone?")
    • the behavior, response or trace must be scored
      (compared to the indicative answer) to
      produce the variable/data
    • scoring may be objective or subjective,
      depending on the item
  • Sentiments: do not have a correct answer (e.g.,
    "Like Psyc350?") or have a correct answer, but we
    won't check (e.g., age)
    • the behavior, response or trace is the
      variable/data
    • scoring may be objective or subjective,
      depending on the item
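The three kinds of items differ in what happens between the response and the data value. A minimal Python sketch, assuming objectively scored items, a hypothetical `score_against_key` helper, and a made-up indicative answer of "yes" for the "prefer to be alone" item (as it might be keyed on a hypothetical introversion scale):

```python
def score_against_key(response, key) -> int:
    """Objective scoring: 1 if the response matches the key, else 0.

    For a judgment the key is the correct answer; for a scored sentiment
    the key is the indicative answer. The comparison is the same; only
    the meaning of the key differs.
    """
    return 1 if response == key else 0


# Judgment: "2 + 2 = ?" has a correct answer, so the response is scored.
judgment = score_against_key(4, key=4)

# Scored sentiment: "Do you prefer to be alone?" has no correct answer,
# but (hypothetically) "yes" is the answer indicative of introversion.
scored_sentiment = score_against_key("no", key="yes")

# Sentiments: a 1-5 rating of "Like Psyc350?" or a self-reported age is
# not compared to anything; the response itself is the data.
sentiment_rating = 5
age = 21

print(judgment, scored_sentiment, sentiment_rating, age)  # 1 0 5 21
```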

12
  • Using Judgments, Sentiments & Scored Sentiments
  • Judgments: do have a correct answer
    • Ability/skill
    • Intelligence
    • Diagnostic category
    • Aptitude
  • Scored Sentiments: do not have a correct answer
    but do have an indicative answer
    • Personality
    • Diagnostic category
    • Aptitude
  • Sentiments: do not have a correct answer or
    have a correct answer, but we won't check
    • Demographics
    • Attitude/Opinion

13
Kind 3: Direct Keying vs. Reverse Keying
We want the respondents to carefully read and
respond to each item of our scale/test. One
thing we do is to write the items so that some of
them are "backwards" or "reversed". Consider
these items from a depression measure:
1. It is tough to get out of bed some mornings.   disagree 1 2 3 4 5 agree
2. I'm generally happy about my life.   1 2 3 4 5
3. I sometimes just want to sit and cry.   1 2 3 4 5
4. Most of the time I have a smile on my face.   1 2 3 4 5
If the person is depressed, we would expect
them to give fairly high ratings on items 1 & 3,
but low ratings on items 2 & 4. Before
aggregating these items into a composite scale or
test score, we would direct key items 1 & 3
(1=1, 2=2, 3=3, 4=4, 5=5) and reverse key items 2 & 4
(1=5, 2=4, 3=3, 4=2, 5=1).
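The keying rule amounts to simple arithmetic: on a 1-to-5 scale, a reverse-keyed rating x becomes (1 + 5) - x = 6 - x. A minimal Python sketch, assuming the four depression items above, summing as the way items are aggregated, and two made-up respondents:

```python
SCALE_MIN, SCALE_MAX = 1, 5
REVERSE_KEYED = {2, 4}  # items 2 & 4 are worded in the "happy" direction


def key_item(item: int, rating: int) -> int:
    """Direct key (1=1 ... 5=5) or reverse key (1=5, 2=4, 3=3, 4=2, 5=1)."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating {rating} is outside {SCALE_MIN}-{SCALE_MAX}")
    if item in REVERSE_KEYED:
        return SCALE_MIN + SCALE_MAX - rating
    return rating


def depression_score(ratings: dict) -> int:
    """Sum the keyed items into a composite (higher = more depressed)."""
    return sum(key_item(item, rating) for item, rating in ratings.items())


# Made-up respondents: item number -> raw rating as circled on the form.
depressed = {1: 5, 2: 1, 3: 4, 4: 2}
not_depressed = {1: 1, 2: 5, 3: 2, 4: 4}

print(depression_score(depressed))      # 18
print(depression_score(not_depressed))  # 6
```

Without reverse keying, both made-up respondents would have the same raw total of 12, so the composite could not tell them apart.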
14
Desirable Properties of Psychological Measures
  • Interpretability of Individual and Group Scores
  • Population Norms
  • Validity
  • Reliability
  • Standardization
15
Desirable Properties of Psychological Measures
  • Interpretability of Individual & Group Scores
  • Population Norms: Scoring Distribution & Cutoffs
  • Validity: Face, Content, Criterion-Related,
    Construct
  • Reliability: Inter-rater, Internal Consistency,
    Test-Retest & Alternate Forms
  • Standardization: Administration & Scoring
16
  • Standardization
  • Administration: the test is given the same way
    every time
    • who administers the instrument
    • specific instructions, order of items, timing,
      etc.
    • varies greatly: a multiple-choice classroom test
      or the MMPI is just handed out; the WAIS
      requires whole books & courses
  • Scoring: the test is scored the same way every
    time
    • who scores the instrument
    • correct, partial and incorrect answers, points
      awarded, etc.
    • varies greatly: a multiple-choice test means
      filling in the bubble sheet; the MMPI and the
      WAIS require whole books & courses

17
  • We need to assess the inter-rater reliability of
    the scores from subjective items.
    • Have two or more raters score the same set of
      tests (usually 25-50 of the tests)
    • Assess the consistency of the scores in different
      ways for different types of items
    • Quantitative Items: correlation, intraclass
      correlation, RMSD
    • Ordered Categorical Items: agreement, Cohen's
      Kappa
  • Keep in mind: what we really want is rater
    validity
    • we don't really want raters to agree, we want
      them to be right!
    • so it is best to compare raters with a
      standard rather than just with each other
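Several of the consistency indices named above can be computed directly. A minimal Python sketch, assuming two raters and made-up scores; it covers correlation and RMSD for quantitative items, and percent agreement and unweighted Cohen's Kappa for categorical codes. The intraclass correlation is left out because it needs an ANOVA-style model. The same functions can compare a rater with a standard (e.g., an expert's key) instead of with another rater.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Correlation between two raters' quantitative scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsd(x, y):
    """Root-mean-square difference: how far apart the raters' values are."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def percent_agreement(a, b):
    """Proportion of cases given the same category by both raters."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    p_chance = sum(counts_a[c] * counts_b[c]
                   for c in counts_a.keys() | counts_b.keys()) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one and the same category")
    return (p_observed - p_chance) / (1 - p_chance)


# Made-up ratings of the same four tests by two raters.
quant_1, quant_2 = [3, 4, 5, 2], [3, 5, 5, 1]
codes_1, codes_2 = ["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]

print(round(pearson_r(quant_1, quant_2), 3))  # 0.944
print(round(rmsd(quant_1, quant_2), 3))       # 0.707
print(percent_agreement(codes_1, codes_2))    # 0.75
print(cohens_kappa(codes_1, codes_2))         # 0.5
```

Here the raters agree on 75% of the codes, but half of that agreement would be expected by chance alone, so Kappa is only 0.5.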

18
  • Ways to improve inter-rater reliability
  • improved standardization of the measurement
    instrument
    • do questions focus respondents' answers?
    • will single-sentence or other response
      limitations help?
  • instruction in the elements of the
    standardization
    • is complete explication possible? (borders on
      objective)
    • if not, need conceptual matches
  • practice with the instrument, with feedback
    • walk-through with experienced coders
    • practice with common problems or historical
      challenges
  • experience with the instrument
    • really no substitute
    • have to worry about drift & generational
      reinterpretation
  • use of the instrument with the intended population
    • different populations can have different
      response tendencies