Describing Variation in Data Chapter 9 - PowerPoint PPT Presentation

Slides: 42
Provided by: kimberl71

Transcript and Presenter's Notes

1
Describing Variation in Data: Chapter 9
  • Kimberly R. Barber, PhD
  • HED 547

2
Variation
  • Variation is present in every aspect of every
    characteristic
  • In test measures, behaviors, environment.
  • All characteristics have some level of variation.
  • Level of variation depends on
  • Source
  • Population
  • Accuracy of measure

3
Variation Defined
  • Differences in the values of a characteristic.
  • Variation inherently always exists
  • Because individuals differ,
  • Because individuals are not all going to have the
    same exact value for every variable.
  • Student height
  • Student age
  • Student blood pressure

4
Sources of Variation
  • Biological differences
  • Measurement errors
  • Differences in measurement technique
  • Differences in measurement conditions
  • Random variation

5
Biological Variation
  • Differences in
  • Genes,
  • Nutrition,
  • Environmental exposures.
  • Although people tend to be similar through
    genetics, they are not exactly the same.
  • In addition, differing exposures create different
    outcomes
  • e.g., tall parents tend to have tall children,
  • But malnutrition can result in shortness even
    with tall parents.

6
Disease Variation
  • Differences in the presence / absence of disease.
  • Differences in the stages of disease.
  • Differences in co-morbidity.
  • Exercise: Using diabetes as an example, explain
    how individuals can differ on all of the above
    factors.

7
Measurement Variation
  • Differences in conditions during measurement
  • Ambient factors (e.g., temperature, noise)
  • Environment and blood pressure
  • Patient factors (e.g., fatigue, anxiety)
  • White coat syndrome and blood pressure.
  • Differences in methods of measurement
  • Instrument, technique, and operator.
  • Exercise: How can weight differ according to
    instrument, technique, or operator?

8
Measurement Error
  • Differences in instrument recordings.
  • Machine error, survey typos, etc.
  • Differences in instrument observation.
  • Operator interpretation errors,
  • Observer errors,
  • These measurement variations are all systematic
  • They occur for a reason.
  • We can control for them.

9
Random Error
  • Unexplained variation.
  • Also called background noise.
  • Unsystematic differences in values
  • For unknown, random reasons.
  • We cannot control for random errors.
  • We can estimate a level of random error
  • Poll results (± 5%).
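
As a rough illustration of estimating random error in a poll, the sketch below applies the common normal-approximation formula for the margin of error of a proportion. The function name, the sample size of 400, and the 50% result are illustrative assumptions, not figures from the slides.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 400 respondents, 50% in favor.
moe = margin_of_error(0.5, 400)
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
```

With these assumed inputs the margin comes out close to 5 percentage points. A larger sample shrinks the estimated random error but never removes it.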

10
Random Error, continued.
  • Random error produces a distribution of values
    even if all systematic error is eliminated.
  • Statistical tests look for systematic differences
    between samples beyond the random variation
    within a sample.

11
Statistics
  • Statistical methods explain variation in data
  • Describe the pattern of spread (variation) in
    your data set,
  • Compare the pattern of two or more groups,
  • Determine whether the differences between two
    patterns are real (significant) or random
    (nonsignificant).

12
Data
  • A variable is the characteristic
  • Skin color, blood type, age, cholesterol level.
  • The data is the value of that characteristic
  • White, pink, yellow, brown.
  • A, B, O, AB.
  • 20, 35, 50 years.
  • 136, 201, 400 mg/dL.
  • Exercise: Which two are qualitative data and
    which two are quantitative?

13
Types of Variables
  • Nominal
  • Naming (categorical)
  • No measurement scaling
  • Blood type: A, B, AB, O
  • Dichotomous (binary)
  • Categorical but only two levels
  • Indicates a direction (normal - abnormal, good -
    bad)
  • Cancer: Yes / No
  • Health status: Well / Sick

14
Types of Variables, cont.
  • Ordinal (ranked)
  • Naming with an order.
  • From better to worse.
  • Illness scale: no illness (1) / dizzy (2) /
    nauseated (3) / vomiting (4).
  • Test scale: excellent (A) / good (B) / fair (C) /
    poor (D).
  • Contains more information than nominal variables
    (ill / not ill) (passed / failed)

15
Continuous Variables
  • Measured on a scale
  • Height, weight, age, glucose level.
  • Provides even more information than ordinal
    variables
  • Shows position relative to each other,
  • Shows extent each observation differs from the
    other.
  • e.g., with age we know just how much older we are
    than the average student.

16
Units of Observation
  • Counts
  • Counts of a characteristic in persons, things.
  • Patterns presented in a frequency table (2 X 2).
  • Can compare proportions within or across groups
    (female, male).
  • Proportions (risk)
  • Number of persons with a characteristic.
  • Number of persons who died, who are ill, etc.
  • Can compare the ratio of counts between groups.
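
The comparison of proportions across groups can be sketched as follows. The counts in the 2 x 2 table and the names `table` and `risk` are made up for illustration and do not come from the slides.

```python
# Hypothetical 2 x 2 frequency table: counts of ill / not ill by sex.
table = {
    "female": {"ill": 30, "not_ill": 70},
    "male": {"ill": 45, "not_ill": 55},
}

def risk(group):
    """Proportion of the group that has the characteristic (here: ill)."""
    counts = table[group]
    return counts["ill"] / (counts["ill"] + counts["not_ill"])

risk_female = risk("female")
risk_male = risk("male")
risk_ratio = risk_male / risk_female  # ratio of the two proportions

print(f"Risk (female): {risk_female:.2f}")
print(f"Risk (male):   {risk_male:.2f}")
print(f"Risk ratio (male vs. female): {risk_ratio:.2f}")
```

Each proportion is a within-group comparison (ill versus everyone in that group), while the ratio compares the groups with each other.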

17
Collapsing Data in Variables (Combining data)
  • A continuous variable may be converted to an
    ordinal variable
  • By grouping values together,
  • To form categories.
  • This shrinks many values into a few values
    (collapsing).

18
Collapsing data, continued.
  • Disadvantage
  • Information is lost because,
  • Individual values are no longer apparent.
  • (500 g through 1200 g) becomes (< 501 g, > 500 g)
  • Advantage
  • Percentages can be created
  • Relationships easier to show.

19
Frequency Distributions
  • The number of persons with each value in a
    variable.
  • Age Distribution
  • 5 people are 22y
  • 10 people are 23y
  • 12 people are 24y
  • 15 people are 25y
  • 12 people are 26y
  • 10 people are 27y
  • 5 people are 28y

[Figure: age distribution plotted over ages 22 to 28]
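
The age distribution listed on this slide can be tabulated and drawn as a text histogram. The counts are the ones given on the slide; the variable names and the plotting style are illustrative choices.

```python
# Frequency distribution from the slide: age in years -> number of people.
age_counts = {22: 5, 23: 10, 24: 12, 25: 15, 26: 12, 27: 10, 28: 5}

total = sum(age_counts.values())

# Text histogram: one '#' per person.
for age, count in sorted(age_counts.items()):
    print(f"{age}y | {'#' * count} ({count})")

# Mean age, weighting each age by its frequency.
mean_age = sum(age * count for age, count in age_counts.items()) / total
print(f"n = {total}, mean age = {mean_age:.1f}")
```

Because the counts are symmetric around 25, the mean falls on the most common age.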
20
Frequency Distributions, cont.
  • Real distribution
  • The pattern obtained from the actual data in a
    population or sample population.
  • Can be one of many curve shapes.
  • Gaussian distribution
  • The distribution in a population expected
    (calculated) under normal or average conditions.
  • Is a smooth, bell-shaped curve.

21
Distributions
  • Discuss range of values histogram
  • Refer to Table 9-2 and Figure 9-2
  • Textbook page 141
  • Discuss normal distribution
  • Refer to Figure 9-3 and 9-4
  • Textbook page 142

22
Parameters of Frequency Distribution
  • Ways to summarize and define the distribution.
  • Measures of central tendency
  • Where among the values the commonest value lies.
  • Measures of dispersion
  • How widely the values are spread out.

23
Measures of Central Tendency
  • In a normal distribution
  • The density of observed values is greatest near
    the center.
  • Both tails of the curve fall off toward zero at a
    similar rate.
  • The mean, median, and mode are located in the
    center of the bell curve.

24
Measures of Central Tendency, continued.
  • Distribution is not normal if
  • mean, median, mode are in different locations
    on the curve.

[Figure: curve with the mean, median, and mode marked]
25
Skewed Distributions
  • Refer to Figure 9-8 of text.
  • Curve is pushed to the right, with a tail to the
    left (negative skew) (Fig. 9-8A).
  • Curve is pushed to the left, with a tail to the
    right (positive skew) (Fig. 9-8B).
  • A curve that is abnormally peaked is leptokurtic
    (Fig. 9-8C).
  • A curve that is abnormally flat is platykurtic
    (Fig. 9-8D).
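
A small sketch of how skew separates the three measures of central tendency, using made-up positively skewed values (most are small, a few are large).

```python
import statistics

# Hypothetical positively skewed data: the long tail is on the right.
values = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

print(f"mode = {mode}, median = {median}, mean = {mean}")
```

In this sample the few large values pull the mean above the median, and the median sits above the mode. In a normal distribution all three would coincide.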

26
Measures of Dispersion
  • Range of values: the amount of spread in the
    distribution.

[Figure: two distributions on an axis running from 20 to 60]
27
Variance
  • How far each value is from the mean.
  • How far in both directions.
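  • Example: age in two samples.
  • Sample 1: mean is 35 years
  • Mary is 20 yrs.
  • John is 50 yrs.
  • Sample 2: mean is 35 years
  • Mary is 30 yrs.
  • John is 40 yrs.
  • Both samples have the same mean, but the values
    in sample 1 lie farther from it.

[Figure: the two samples on an age axis, 20-35-50 and 30-35-40]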
28
How Variation is Measured
  • $s^2 = \dfrac{\sum (x_i - \mu)^2}{N - 1}$
  • See Box 9-3.
  • Degrees of freedom (N - 1)
  • A way to make up for small sample sizes, which
    throw off the estimates.
  • 200 - 1 = 199 (a 0.5% difference)
  • 2 - 1 = 1 (a 50% difference)
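
The variance calculation can be sketched with two small hypothetical samples built from the ages in the previous slide's example (20, 35, 50 and 30, 35, 40). The function name is an illustrative choice. It uses the sample's own mean as the center, as Python's `statistics.variance` does.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Two hypothetical samples; both have a mean of 35.
wide = [20, 35, 50]
narrow = [30, 35, 40]

for name, sample in (("wide", wide), ("narrow", narrow)):
    var = sample_variance(sample)
    sd = var ** 0.5  # standard deviation is the square root of the variance
    print(f"{name}: variance = {var}, SD = {sd}")
    print(f"  statistics.variance gives {statistics.variance(sample)}")
```

Both samples share the same mean, yet the wide sample's variance is nine times larger, which is the difference in spread that the mean alone hides.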

29
Standard Deviation
  • Also describes the amount of spread in the
    frequency distribution.
  • Is the square root of the variance.
  • $s = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N - 1}}$
  • Refer to Box 9-3

30
Standard Deviation, continued.
  • For normal distribution
  • 99% of values fall within ±2.58 SD of the mean.
  • 95% of values fall within ±1.96 SD of the mean.
  • 68% of values fall within ±1.0 SD of the mean.
  • Refer to Figure 9-6.
  • Symbols
  • µ = mean of the population (theoretical).
  • σ = standard deviation of the population.
  • x̄ = sample mean, s = sample standard deviation.
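
The coverage figures for a normal distribution can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `coverage` is an illustrative choice, and 2.58 is used as the multiplier that gives roughly 99% coverage.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1.0, 1.96, 2.58):
    print(f"within +/- {k} SD: {coverage(k) * 100:.1f}%")
```

The same fractions hold for any normal distribution, whatever its mean and standard deviation, because they depend only on the distance from the mean in SD units.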

31
Normalized Dataset
  • Choice of units affects the mean and std dev.
  • Can't compare two groups using two different
    measuring units.
  • For example
  • Weight of a group of elephants (µ = 5000 lbs).
  • Weight of a group of mice (µ = 5 mg).
  • To compare we have to equalize both into the same
    units.

32
Normalized Data, continued
  • To eliminate the effects produced by the choice
    of units
  • Data is put into a unit-free form (normalized).
  • Calculate a z-score
  • Z distributions have a mean of 0 and an SD of 1.
  • Values are expressed in terms of how many
    standard deviations each value is away from the
    mean.
  • $z_i = \dfrac{x_i - \mu}{\sigma}$, in SD units from the
    group mean.
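
A minimal sketch of normalizing two groups measured in different units. The weights are hypothetical values chosen so the group means match the elephant and mouse example (5000 lbs and 5 mg). The function name is an illustrative choice.

```python
import statistics

def z_scores(values):
    """Convert values to z-scores using the group's own mean and SD."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, so the z-scores have SD 1
    return [(x - mean) / sd for x in values]

# Hypothetical weights in two different units.
elephants_lb = [4500, 4800, 5000, 5200, 5500]
mice_mg = [3, 4, 5, 6, 7]

elephant_z = z_scores(elephants_lb)
mouse_z = z_scores(mice_mg)

print("elephants:", [round(z, 2) for z in elephant_z])
print("mice:     ", [round(z, 2) for z in mouse_z])
```

Once both groups are in z-score form, an elephant and a mouse can be compared by how unusual each is within its own group, with no reference to pounds or milligrams.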
33
Assumptions of Normal Distributions
  • In order to test whether a sample differs from
    the population it is drawn from
  • The data must meet some assumptions
  • That the values are normally distributed,
  • The sample size is sufficiently large,
  • Bias and error are minimized.
  • When these assumptions are met, you may conduct
    inferential analysis.

34
Effect of Error on Distributions
  • Random error affects the distribution differently
    than systematic error.
  • How well we can compare two distributions
  • How accurate our conclusions are,
  • Depend on how much error is in the data.
  • Random error does not affect the group average
    (regardless of the value)
  • Systematic error does affect the average and can
    really throw off the estimates.
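
The difference between the two kinds of error can be illustrated with a small simulation. Everything here is assumed for the sketch: the blood pressure values, the noise level, the 8-unit bias, and the fixed random seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical "true" systolic blood pressures for 10,000 people.
true_values = [random.gauss(120, 10) for _ in range(10_000)]

# Random error: zero-mean noise added independently to each measurement.
with_random_error = [v + random.gauss(0, 5) for v in true_values]

# Systematic error: every reading is 8 units too high (e.g. a miscalibrated cuff).
with_systematic_error = [v + 8 for v in true_values]

true_mean = statistics.mean(true_values)
random_mean = statistics.mean(with_random_error)
systematic_mean = statistics.mean(with_systematic_error)

print(f"true mean:             {true_mean:.2f}")
print(f"with random error:     {random_mean:.2f}")
print(f"with systematic error: {systematic_mean:.2f}")
print(f"SD true vs. random:    {statistics.stdev(true_values):.2f} "
      f"vs. {statistics.stdev(with_random_error):.2f}")
```

In this setup the random noise largely cancels out in the average but widens the spread, while the constant bias shifts the whole distribution and its average by the same amount.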

37
Reducing Measurement Error
  • Pilot test instruments.
  • Train the data collectors.
  • Double check or verify data entries.
  • Use multiple instrument measures.
  • Consult statistician about adjusting for
    measurement error.

38
Nonparametric Distributions
  • Data from categorical (nominal, ordinal)
    variables are not normally distributed.
  • Statistical tests based on assumptions of the
    normal curve cannot be used.
  • Their parameters are not the mean and standard
    deviation.
  • Their tests do not require that the data follow a
    particular distribution.
  • There are special tests called nonparametric
    statistics.

39
Nonparametric Distributions
  • The nonparametric distribution
  • Is not normal,
  • Is based on a distribution of counts,
  • There are special tests for nonparametric data,
  • These tests only require that the data be
    ordinal.
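
As one example of a test that needs only ordinal data, the sketch below computes the Mann-Whitney U statistic by direct pair counting. The slides do not name a specific test, so this choice is an assumption, as are the scores and group names. It computes only the statistic, not a p-value.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical ordinal illness scores (1 = no illness ... 4 = vomiting).
treated = [1, 1, 2, 2, 3]
control = [2, 3, 3, 4, 4]

u_treated = mann_whitney_u(treated, control)
u_control = mann_whitney_u(control, treated)

print(f"U (treated) = {u_treated}, U (control) = {u_control}")
```

The statistic uses only the ordering of the scores, never their distances, which is why it does not assume a normal distribution. The two U values always sum to the number of pairs (here 5 x 5 = 25).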

40
Summary
  • Fundamental to any analysis (descriptive or
    inferential) is
  • Understanding your variables and the type of data
    they are,
  • Continuous data is often normally distributed; a
    normal distribution has two parameters
  • Measures of central tendency (mean)
  • Measures of dispersion (variance)

41
Summary, continued
  • The normal distribution has
  • Mean, median, mode that coincide.
  • 95% of observations are within 1.96 standard
    deviations from the mean.
  • Some distributions are skewed
  • Outliers pull the mean away from the center.
  • Non-normal distributions require special
    nonparametric statistical tests.