Standardization of variables - PowerPoint PPT Presentation

1
Standardization of variables
  • Maarten Buis
  • 5-12-2005

2
Recap
  • Central tendency
  • Dispersion
  • SPSS

3
Standardization
  • Is used to improve interpretability of variables.
  • Some variables have a natural, interpretable
    metric, e.g. income, age, gender, country.
  • Others, primarily ordinal variables, do not, e.g.
    education, attitude items, intelligence.
  • Standardizing these variables makes them more
    interpretable.

4
Standardization
  • Transforming the variable to a comparable metric:
      • known unit
      • known mean
      • known standard deviation
      • known range
  • Three ways of standardizing:
      • P-standardization (percentile scores)
      • Z-standardization (z-scores)
      • D-standardization (dichotomize a variable)

5
When you should always standardize
  • When averaging multiple variables, e.g. when
    creating a socioeconomic status variable out of
    income and education.
  • When comparing the effects of variables with
    unequal units, e.g. does age or education have a
    larger effect on income?
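
The first case above can be sketched in Python. This is a minimal illustration only: the data are made up, the helper name `z_scores` is hypothetical, and it assumes z-scores (rather than percentile scores) and the population standard deviation, neither of which the slide specifies.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / SD for each value, using the population SD (an assumption)."""
    m = mean(values)
    sd = pstdev(values)
    return [(x - m) / sd for x in values]

# Hypothetical data: monthly income in guilders and years of education.
income = [1500, 2200, 2500, 3100, 4200]
education = [8, 10, 12, 14, 16]

# Averaging the raw values would let income dominate, because its unit is
# much larger. Averaging the z-scores gives both variables equal weight.
ses = [(zi + ze) / 2 for zi, ze in zip(z_scores(income), z_scores(education))]
print(ses)
```

Because both inputs have mean 0 after standardization, the composite also has mean 0; its standard deviation depends on how strongly the two variables are correlated.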

6
P-Standardization
  • Every observation is assigned a number between 0
    and 100, indicating the percentage of observations
    below it.
  • Can be read from the cumulative distribution
  • In case of ties, assign midpoints
  • The median, quartiles, quintiles, and deciles are
    special cases of P-scores.
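
A minimal Python sketch of percentile scores with midpoints follows. The function name `p_scores` and the data are hypothetical, and the midpoint rule used here (count half of the observations that share a value, including the observation itself) is one common convention, assumed rather than taken from the slides.

```python
def p_scores(values):
    """Percentile score per observation: the percentage of observations below it,
    plus half of the observations sharing its value (the assumed midpoint rule)."""
    n = len(values)
    scores = []
    for x in values:
        below = sum(1 for v in values if v < x)
        same = sum(1 for v in values if v == x)  # includes x itself
        scores.append(100 * (below + 0.5 * same) / n)
    return scores

education = [1, 2, 2, 3, 5]  # hypothetical ordinal codes
print(p_scores(education))   # [10.0, 40.0, 40.0, 70.0, 90.0]
```

The two observations with code 2 share the score 40, the midpoint of the 20 to 60 band they jointly occupy in the cumulative distribution.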

7
(No Transcript)
8
P-standardization
  • Turns the variable into a ranking, i.e. it turns
    the variable into an ordinal variable.
  • It is a non-linear transformation: relative
    distances change
  • Results in a fixed mean, range, and standard
    deviation: M = 50, SD = 28.6. This can change
    slightly due to ties
  • A histogram of a P-standardized variable
    approximates a uniform distribution
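
The sketch below illustrates both points with made-up data; the helper `p_scores_no_ties` is hypothetical and assumes all values are distinct. For a perfectly uniform distribution on 0 to 100 the standard deviation is 100/√12 ≈ 28.9, close to the figure on the slide; the exact value in a sample depends on the number of observations and on ties.

```python
from statistics import mean, pstdev

def p_scores_no_ties(values):
    """Percentile scores for distinct values: the midpoint of each rank's interval."""
    n = len(values)
    rank = {x: i for i, x in enumerate(sorted(values))}  # 0-based rank
    return [100 * (rank[x] + 0.5) / n for x in values]

# Hypothetical, strongly skewed incomes: the gaps between neighbours differ a lot.
income = [1000, 1100, 1200, 5000, 20000]
p = p_scores_no_ties(income)
print(p)  # equally spaced scores: the original relative distances are gone

# With many distinct observations the mean is 50 and the SD approaches 100 / sqrt(12).
big = p_scores_no_ties(list(range(1000)))
print(mean(big), pstdev(big), 100 / 12 ** 0.5)
```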

9
Linear transformation
  • Say you want income in thousands of guilders
    instead of guilders.
  • You divide INCMID by f1000,-

10
Linear transformation
  • Say you want to know the deviation from the mean
  • Subtract the mean (f2543,-) from INCMID

11
Recap: multiplication and addition and the number line
12
Linear transformation
  • Adding a constant (X' = X + c)
  • M(X') = M(X) + c
  • SD(X') = SD(X)
  • Multiplying by a constant (X' = X × c)
  • M(X') = M(X) × c
  • SD(X') = SD(X) × |c|
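
These rules can be checked numerically. The sketch below uses made-up incomes and the population standard deviation (an assumption; the rules hold for the sample standard deviation as well).

```python
from statistics import mean, pstdev

x = [1200, 1800, 2500, 3100, 4100]  # hypothetical incomes in guilders
c = 1000

shifted = [v + c for v in x]  # adding a constant
scaled = [v / c for v in x]   # multiplying by 1/c: guilders -> thousands of guilders

# Adding c moves the mean by c and leaves the SD unchanged.
print(mean(shifted) - mean(x), pstdev(shifted) - pstdev(x))

# Multiplying by 1/c multiplies both the mean and the SD by 1/c.
print(mean(scaled) * c, mean(x))
print(pstdev(scaled) * c, pstdev(x))
```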

13
Z-standardization
  • Z = (X - M) / SD
  • Two steps:
      • center the variable (mean becomes zero)
      • divide by the standard deviation (the unit
        becomes the standard deviation)
  • Results in a fixed mean and standard deviation:
    M = 0, SD = 1
  • Not in a fixed range!
  • Z-standardization is a linear transformation:
    relative distances remain intact.

14
Z-standardization
  • Step 1: subtract the mean
  • c = -M(X), so X' = X - M(X)
  • M(X') = M(X) + c
  • M(X') = M(X) - M(X) = 0
  • SD(X') = SD(X)

15
Z-standardization
  • Step 2: divide the centered variable X' by the
    standard deviation
  • c = 1/SD(X)
  • M(Z) = M(X') × c
  • M(Z) = 0 × 1/SD(X) = 0
  • SD(Z) = SD(X') × c
  • SD(Z) = SD(X) × 1/SD(X) = 1
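
The two steps translate directly into code. This is a sketch with made-up data; the function name `z_standardize` is hypothetical, and it uses the population standard deviation, which is an assumption since the slides do not say which version is meant.

```python
from statistics import mean, pstdev

def z_standardize(values):
    m = mean(values)
    sd = pstdev(values)
    centered = [x - m for x in values]  # step 1: subtract the mean
    return [d / sd for d in centered]   # step 2: divide by the standard deviation

x = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with mean 5 and SD 2
z = z_standardize(x)
print(z)                      # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))     # mean 0, SD 1
```

The z-scores are not confined to a fixed interval: a more extreme observation would simply get a larger absolute z-score.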

16
Normal distribution
  • Normal distribution, also called the Gauss curve
    or bell curve
  • Formula (McCall p. 120)
  • Note the (x - m)^2 part
  • Apart from that, all you have to remember is that
    the formula is complicated
  • The normal distribution occurs when a large number
    of small random events cause the outcome, e.g.
    measurement error
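
For reference, the normal density in its standard textbook form is given below. This is a reconstruction, since the formula itself did not survive in the transcript, and the notation in McCall may differ. Here μ is the mean (the m in the bullet above) and σ is the standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The squared term makes the curve symmetric around the mean: observations equally far above and below μ get the same density.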

17
Normal distribution
  • Other examples: the height of individuals,
    intelligence, attitude
  • But the variables education, income and age in
    Eenzaam98 are not normally distributed

18
Z-scores and the normal distribution
  • Z-standardization will not result in a normally
    distributed variable
  • Standardization is NOT the same as normalization
  • We will not discuss normalization (but it does
    exist)
  • But if the original variable is normally
    distributed, then the z-standardized variable
    will have a standard normal distribution.
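
A quick way to see that z-standardization leaves the shape of a distribution untouched is to compare skewness before and after. The sketch below uses made-up, right-skewed incomes; the helper names and the simple skewness measure (the mean cubed z-score) are choices made for this illustration only.

```python
from statistics import mean, pstdev

def z_standardize(values):
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]

def skewness(values):
    """Mean cubed z-score: 0 for symmetric data, positive for a long right tail."""
    return mean([z ** 3 for z in z_standardize(values)])

income = [1000, 1200, 1500, 1800, 2500, 4000, 12000]  # hypothetical, right-skewed
z = z_standardize(income)

# The skewness is the same before and after: the variable is still not normal.
print(skewness(income), skewness(z))
```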

19
Standard normal distribution
  • Normal distribution with M = 0 and SD = 1.
  • Table A in Appendix 2 of McCall
  • Important numbers (to be remembered):
      • 68% of the observations lie within ±1 SD
      • 90% of the observations lie within ±1.64 SD
      • 95% of the observations lie within ±1.96 SD
      • 99% of the observations lie within ±2.58 SD
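
These percentages can be reproduced without the table, using Python's standard library (`statistics.NormalDist`, available from Python 3.8). The helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(z):
    """Proportion of a standard normal distribution between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.64, 1.96, 2.58):
    print(f"within ±{z} SD: {100 * share_within(z):.1f}%")
```

The results round to the 68, 90, 95 and 99 percent listed above.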

20
Why bother?
  • If you know
      • that a variable is normally distributed
      • the mean and standard deviation
  • then you know the percentage of observations
    above or below an observation
  • These numbers are a good approximation, even if
    the variable is not exactly normally distributed
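
As an illustration, assume a test score that is normally distributed with mean 100 and standard deviation 15. These numbers are invented for the example and do not come from the slides.

```python
from statistics import NormalDist

# Hypothetical scale: mean 100, SD 15, assumed to be normally distributed.
scores = NormalDist(mu=100, sigma=15)

x = 130
z = (x - 100) / 15       # the observation lies 2 SD above the mean
below = scores.cdf(x)    # proportion of observations below x
above = 1 - below        # proportion of observations above x

print(f"z = {z:.1f}: {100 * below:.1f}% below, {100 * above:.1f}% above")
```

The same proportion follows from looking up z = 2 in a standard normal table, which is why knowing the mean and standard deviation is enough.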

21
P- and Z-standardization
  • Both give a distribution with fixed mean,
    standard deviation, and unit
  • P-standardization also gives a fixed range
  • Both are relative to the sample: if you take
    observations out, then you have to re-compute the
    standardized variables
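
The dependence on the sample is easy to demonstrate. In this sketch (made-up incomes, hypothetical helper `z_standardize`, population standard deviation assumed), removing one extreme observation changes the z-score of every remaining observation.

```python
from statistics import mean, pstdev

def z_standardize(values):
    m, sd = mean(values), pstdev(values)
    return [(x - m) / sd for x in values]

income = [1000, 1500, 2000, 2500, 13000]  # hypothetical; the last value is an outlier
z_full = z_standardize(income)
z_trimmed = z_standardize(income[:-1])     # same observations, outlier removed

# The first observation's raw value is unchanged, but its z-score is not.
print(z_full[0], z_trimmed[0])
```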

22
P- and Z-standardization
  • When interpreting Z-standardized variables one
    uses percentiles
  • With P-standardization one decreases the scale of
    measurement to ordinal, BUT this improves
    interpretability.

23
Student recap
24
Do before Wednesday
  • Read McCall chapter 5
  • Understand Appendix 2, table A
  • Do exercises 5.7-5.28