Science of Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Science of Statistics


Slides: 31
Provided by: uky119
Learn more at: https://www.uky.edu

Transcript and Presenter's Notes

Title: Science of Statistics


1
Science of Statistics
  • Descriptive Statistics
      • methods of summarizing or describing a set of
        data
      • tables, graphs, numerical summaries
  • Inferential Statistics
      • methods of making inferences about a population
        based on the information in a sample

2
Variables
  • Individuals are the objects described by a set of
    data; they may be people, animals, or things
  • A variable is any characteristic of an individual

3
Statistical Data
  • What purpose do the data have?
  • Individuals: Who or what are they? How many?
  • Variables: How many? What is the exact definition
    of each? What is the unit of measurement?

4
Types of Variables
  • A categorical variable places an individual into
    one of several groups or categories
  • A quantitative variable takes numerical values for
    which arithmetic operations, such as adding and
    averaging, make sense
  • The distribution of a variable tells us what values
    it takes and how often it takes these values

5
Exploratory Data Analysis
  • Examine each variable by itself, then examine the
    relationships among the variables
  • Start with graphs, then add numerical summaries
    of specific aspects of the data

6
Levels of Measurement
  • Nominal
  • Ordinal
  • Interval
  • Ratio
  • It's important to recognize that there is a
    hierarchy implied in the levels of measurement.
    At each level up the hierarchy, the current
    level includes all of the qualities of the one
    below it and adds something new. In general, it
    is desirable to have a higher level of
    measurement, because it supports more kinds of
    analysis.

7
In nominal measurement the numerical values just
"name" the attribute uniquely. No ordering of the
cases is implied. For example, jersey numbers
in basketball are measured at the nominal level.
Is a player with number 30 more of anything than
a player with number 15?
8
In ordinal measurement the attributes can be
rank-ordered, but the distances between attributes
do not have any meaning. For example, on a
survey you might code Educational Attainment as
0 = less than H.S.; 1 = some H.S.; 2 = H.S. degree;
3 = some college; 4 = college degree; 5 = post college.
In this measure, higher numbers mean more
education. But is the distance from 0 to 1 the same
as the distance from 3 to 4?
9
In interval measurement the distance between
attributes does have meaning. For example, when
we measure temperature (in Fahrenheit), the
distance from 30 to 40 is the same as the distance
from 70 to 80. The interval between values is
interpretable. Because of this, it makes sense to
compute an average of an interval variable, whereas
it doesn't make sense to do so for ordinal
scales. Do ratios make sense at this level? For
example, is it twice as hot at 80 degrees as it
is at 40 degrees?
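The temperature question can be checked with a little arithmetic. This sketch is an illustration, not part of the original slides; it assumes the standard conversion from Fahrenheit to kelvins, a scale whose zero is absolute zero (a true ratio scale). The helper name `fahrenheit_to_kelvin` is hypothetical.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (0 K is absolute zero)."""
    return (f - 32) * 5 / 9 + 273.15


# Equal Fahrenheit intervals stay equal intervals on the kelvin scale.
gap_low = fahrenheit_to_kelvin(40) - fahrenheit_to_kelvin(30)
gap_high = fahrenheit_to_kelvin(80) - fahrenheit_to_kelvin(70)

# The naive Fahrenheit ratio is 2, but the ratio of absolute temperatures is not.
ratio_f = 80 / 40
ratio_k = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

print(f"10-degree gaps in kelvins: {gap_low:.3f} and {gap_high:.3f}")
print(f"Fahrenheit ratio: {ratio_f:.2f}, kelvin ratio: {ratio_k:.3f}")
```

The kelvin ratio comes out near 1.08, so 80 degrees Fahrenheit is not "twice as hot" as 40 degrees: differences carry over between the scales, but ratios do not.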
10
Finally, in ratio measurement there is always an
absolute zero that is meaningful. This means that
you can construct a meaningful ratio. Weight is
a ratio variable. In applied social research most
"count" variables are ratio. Is the number of
clients in the past six months a ratio variable? Why?
11
Describing Graphically
  • Bar Graph: count or percent in each category
  • Pie Chart: parts of the whole
  • Stem Plot: shape of the distribution
  • Histogram: works well when the values fall into
    many groups (classes)
  • Frequency Table
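A frequency table is the numeric counterpart of a bar graph: each category with its count and percent. A minimal sketch, assuming the data arrive as a Python list; the function name `frequency_table` and the `majors` data are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in sorted(counts.items())]


# Hypothetical categorical data, for illustration only
majors = ["Biology", "Math", "Biology", "History", "Math", "Biology"]

table = frequency_table(majors)
for value, count, pct in table:
    print(f"{value:<8} {count:>3} {pct:6.1f}%")
```

The percent column is what a bar graph of percents or a pie chart would display.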

12
Time Plots
  • Time Series: measurements of a variable taken at
    regular intervals over time
  • Residual Plots: checking model assumptions
  • Patterns such as trends and seasonal variation

13
Outliers
  • Extreme Values
  • What do you do with outliers?
      • Ignore them
      • Throw them out
      • ?

14
Graphical Examples
  • Let's Take a Look

15
Choosing a Summary
  • The five-number summary is usually better than
    the mean and standard deviation for describing a
    skewed distribution or a distribution with strong
    outliers.
  • Use the mean and standard deviation for
    reasonably symmetric distributions that are free
    of outliers.

16
Describing Distributions with Numbers
  • Mean: the simple (arithmetic) average
      • is sensitive to extreme scores
      • is not necessarily a possible value
      • To calculate: add the values and divide by the
        number of values
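The calculation rule for the mean translates directly into code. This sketch uses a hypothetical list of scores with one extreme value to show the sensitivity the slide mentions.

```python
def mean(values):
    """Arithmetic mean: add the values and divide by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


scores = [2, 3, 3, 4, 100]  # hypothetical data with one extreme score

print(mean(scores))       # pulled far upward by the single score of 100
print(mean(scores[:-1]))  # the same data without the extreme score
print(mean([1, 2, 3, 4])) # 2.5 is not one of the data values
```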

17
  • Median: the middle score
      • not sensitive to extreme scores
      • To calculate:
          • order the data from smallest to largest
          • if n is odd, the median is the middle score
          • if n is even, the median is the average of the
            two middle scores
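The median steps above (order, then take the middle score or the average of the two middle scores) can be sketched as follows; the example data are hypothetical.

```python
def median(values):
    """Middle score of the ordered data; average of the two middle scores if n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:            # n odd: the single middle score
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the two middle scores


print(median([2, 3, 3, 4, 100]))  # the extreme score of 100 does not move the median
print(median([4, 1, 3, 2]))
```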

18
  • Mode: the most frequent score
      • does not always exist
      • is unstable: small changes in the data can
        change it
      • can be used with qualitative data
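A sketch of finding the mode, assuming the convention that there is no mode when every value occurs only once, and that ties produce more than one mode. The function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treat the mode as not existing
    return [value for value, count in counts.items() if count == top]


print(modes(["red", "blue", "red"]))  # works with qualitative data
print(modes([1, 2, 3]))               # no mode
print(modes([1, 1, 2, 2, 3]))         # two modes
print(modes([1, 1, 2, 2, 2]))         # changing one value changes the mode
```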

19
Measures of Dispersion (Variability)
  • Range
      • completely determined by the extreme scores
      • easy to compute
      • To calculate: high score - low score
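The range is a one-line calculation. The sketch names it `data_range` (a hypothetical name chosen to avoid shadowing Python's built-in `range`) and uses hypothetical data to show how one extreme score dominates it.

```python
def data_range(values):
    """Range: high score minus low score."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)


print(data_range([2, 3, 3, 4]))       # small spread
print(data_range([2, 3, 3, 4, 100]))  # one extreme score changes everything
```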

20
  • Variance: based on the squared distances of the
    observations from the mean
      • large values of the variance suggest large
        variability
  • Standard Deviation: the square root of the variance
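The slide does not say whether to divide by n or n - 1; this sketch assumes the usual sample variance, s^2 = sum of (x - mean)^2 divided by (n - 1), and cross-checks it against Python's `statistics.variance` and `statistics.stdev`, which use the same n - 1 convention. The data list is hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean is 5

print(sample_variance(data), statistics.variance(data))
print(sample_sd(data), statistics.stdev(data))
```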

21
Empirical Rule
  • Should be used for mound-shaped data
  • Approximately 68% of the data fall within
    mean ± 1 SD
  • Approximately 95% of the data fall within
    mean ± 2 SD
  • Approximately 99.7% of the data fall within
    mean ± 3 SD
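One way to see the Empirical Rule at work is to simulate mound-shaped data and count how much of it falls within 1, 2, and 3 standard deviations of the mean. This sketch assumes normally distributed data generated with `random.gauss` (mean 100 and SD 15 are arbitrary choices); the helper `within_k_sd` is hypothetical.

```python
import random
import statistics


def within_k_sd(values, k):
    """Fraction of values within k standard deviations of the mean."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)  # with this many values, n vs n - 1 hardly matters
    return sum(1 for x in values if abs(x - m) <= k * s) / len(values)


random.seed(1)  # fixed seed so the run is reproducible
sample = [random.gauss(100, 15) for _ in range(100_000)]  # simulated mound-shaped data

for k in (1, 2, 3):
    print(k, round(within_k_sd(sample, k), 3))
```

The printed fractions land close to 0.68, 0.95, and 0.997. For strongly skewed data they can be far off, which is why the rule is reserved for mound-shaped distributions.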

22
Let's give it a try
  • Let's use faculty experience.
  • Why?
  • What should we do with it?

23
Quartiles and 5-Number Summary
  • Quartiles divide ordered numerical data into four
    equally sized parts.
  • 1st quartile, Q1: 25% below and 75% above
  • 2nd quartile, Q2 (the median): 50% below and
    50% above
  • 3rd quartile, Q3: 75% below and 25% above
  • The low score, Q1, Q2, Q3, and the high score are
    known as the five-number summary of a data set.
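Quartiles can be computed in several slightly different ways. This sketch assumes a common textbook convention: Q1 is the median of the observations below the overall median, and Q3 is the median of those above it (the middle value is left out of both halves when n is odd). Software such as `statistics.quantiles` or a spreadsheet may use another method and give slightly different quartiles. The function names are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (low, Q1, median, Q3, high) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]              # observations below the median position
    upper = ordered[half + n % 2:]      # skip the middle value when n is odd
    return (ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1])


print(five_number_summary([13, 1, 9, 3, 11, 5, 7]))  # n odd
print(five_number_summary([12, 2, 10, 4, 8, 6]))     # n even
```

These five numbers are exactly what a boxplot draws.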

24
Boxplots
  • Particularly helpful in comparing two or more
    groups
  • Box shows the central 50% of the data and the
    median
  • Whiskers show the extremes

25
Let's give it a try
  • Let's use the "in the pocket" data.
  • Why?
  • What should we do with it?

26
1.5 × IQR Criterion
  • The interquartile range (IQR = Q3 - Q1) is the
    distance between the 1st and 3rd quartiles
  • Call an observation a suspected outlier if it
    falls more than 1.5 × IQR above the 3rd quartile
    or below the 1st quartile
  • Example on page 46
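The 1.5 × IQR criterion can be sketched as below. It assumes the same median-of-halves quartile convention as a typical introductory textbook (other quartile methods can move the fences slightly), and the data list and function names are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """(Q1, Q3) as medians of the lower and upper halves of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    return _median(ordered[:half]), _median(ordered[half + n % 2:])


def suspected_outliers(values, factor=1.5):
    """Values more than factor * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - factor * iqr
    high_fence = q3 + factor * iqr
    return [x for x in values if x < low_fence or x > high_fence]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical data; Q1 = 3, Q3 = 8, IQR = 5
print(suspected_outliers(data))        # fences are -4.5 and 15.5
```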

27
Normal Distributions
  • A density curve can often describe the overall
    pattern of a distribution
  • The total area under the curve is exactly 1
  • Areas under the curve represent relative
    frequencies (proportions of observations)
  • The mean, median, and quartiles can be located by
    eye on a density curve.
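Both area properties can be checked numerically. This sketch assumes the standard normal curve as the example density (via Python's `statistics.NormalDist`) and approximates areas with the trapezoid rule; the helper `area_under` is hypothetical.

```python
from statistics import NormalDist


def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h


curve = NormalDist(mu=0, sigma=1)  # example density curve

print(area_under(curve.pdf, -10, 10))  # total area: essentially 1
# Area between -1 and 1 equals the proportion of observations in that interval.
print(area_under(curve.pdf, -1, 1), curve.cdf(1) - curve.cdf(-1))
```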

28
Normal Distributions
  • Bell-shaped, symmetric, unimodal curve
  • The mean and standard deviation completely
    specify the normal distribution
  • Mean is the center of symmetry
  • SD is the distance from the mean to the
    change-of-curvature (inflection) points on
    either side of the mean

29
Standardizing Observations
  • The z-score of an observation gives the number of
    standard deviations it is above or below the mean
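Standardizing uses z = (x - mean) / SD: positive z-scores lie above the mean, negative ones below. A minimal sketch with a hypothetical exam whose mean is 70 and SD is 10; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(85, 70, 10))  # 1.5 SDs above the mean
print(z_score(55, 70, 10))  # 1.5 SDs below the mean
```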

30
Standard Normal
  • The standard normal is the special case N(0, 1):
    the normal distribution with mean 0 and standard
    deviation 1
  • Let's do some examples. We will need to use
    Table A.
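Software can stand in for a printed table. This sketch assumes, as in many introductory texts, that Table A lists the area under the standard normal curve to the left of z; Python's `statistics.NormalDist().cdf` gives that same area. The helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return std_normal.cdf(z)


def area_right(z):
    """Area to the right of z."""
    return 1 - std_normal.cdf(z)


def area_between(z1, z2):
    """Area between z1 and z2 (z1 < z2)."""
    return std_normal.cdf(z2) - std_normal.cdf(z1)


print(round(area_left(1.5), 4))
print(round(area_right(1.5), 4))
print(round(area_between(-1, 1), 4))
print(round(std_normal.inv_cdf(0.975), 2))  # z with 97.5% of the area to its left
```

Looking up an area in the table corresponds to `cdf`; reading the table backward, from an area to a z value, corresponds to `inv_cdf`.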