STATISTICS FOR DESCRIBING, EXPLORING, AND COMPARING DATA - PowerPoint PPT Presentation

About This Presentation

Slides: 54

Provided by: wild6

Transcript and Presenter's Notes

1
STATISTICS FOR DESCRIBING, EXPLORING, AND
COMPARING DATA
2
Overview
"There are way too many numbers. The world would
be a better place if we lost half of them --
starting with 8. I've always hated 8."
(Homer J. Simpson)
3
Two Branches of Statistics

Descriptive statistics: methods used to
summarize or describe the important
characteristics of a set of data.
Inferential statistics: methods used with sample
data to make inferences (or generalizations)
about a population.

4
Measures of Center
5
Definition

A measure of center is a value at the center or
middle of a data set.

6
Mean

The arithmetic mean of a set of values is the
measure of center found by adding the values and
dividing the total by the number of values. It is
referred to simply as the mean.

7
Notation

$\sum$ denotes the sum of a set of values.
$x$ is the variable usually used to
represent the individual data
values.
$n$ represents the number of values in a
sample.
$N$ represents the number of values in a
population.
$\bar{x} = \dfrac{\sum x}{n}$ is the mean of a set of sample
values.
$\mu = \dfrac{\sum x}{N}$ is the mean of all values in a
population.

8
Median

The median of a data set is the measure of center
that is the middle value when the original data
values are arranged in order of increasing (or
decreasing) magnitude. The median is often
denoted by $\tilde{x}$.

9
Mode

The mode of a data set is the value that occurs
most frequently.

10
Midrange

The midrange is the measure of center that is the
value midway between the maximum and minimum
values in the original data set. It is found by
adding the maximum data value to the minimum data
value and then dividing the sum by 2, that is,
$\text{midrange} = \dfrac{\text{maximum value} + \text{minimum value}}{2}$

11
Round-Off Rule

A simple rule for rounding answers is this:
carry one more decimal place than is present in
the original set of values.

12
Example

Using Data Set 6 (Bears), find the four measures of
center for the weights of the bears in the sample.
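
A minimal Python sketch of the four measures of center defined above. The values in `sample` are made up for illustration and are not the bear weights from Data Set 6; `measures_of_center` is a hypothetical helper name.

```python
from statistics import mean, median, multimode


def measures_of_center(values):
    """Return the mean, median, mode(s), and midrange of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns every value tied for the highest frequency
        "modes": multimode(values),
        "midrange": (max(values) + min(values)) / 2,
    }


# Illustrative values only (an assumption, not the bear data).
sample = [2, 4, 4, 5, 7, 9, 11]
summary = measures_of_center(sample)
print(summary)
```

`multimode` is used rather than `mode` so that a data set with two or more equally frequent values reports all of them.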

13
The Best Measure of Center

14
Weighted Mean

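A weighted mean is a mean of x values computed with
the different values assigned different weights,
denoted by w, as given by
$\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w}$
In particular, the mean of a frequency distribution
can be found using
$\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$
where x is a class midpoint and f is the
corresponding class frequency.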

15
Example

Use the frequency distribution to estimate the
mean length of the bears in the sample.
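
As a rough sketch of estimating a mean from a frequency distribution, each class can be represented by its midpoint and weighted by its frequency. The class limits and frequencies below are invented for illustration (the slide's actual table is not reproduced here), and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical frequency table: (lower class limit, upper class limit, frequency)
classes = [(30, 39, 2), (40, 49, 5), (50, 59, 10), (60, 69, 3)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [freq for _, _, freq in classes]

# Using frequencies as weights gives the estimated mean of the grouped data.
estimated_mean = weighted_mean(midpoints, frequencies)
print(estimated_mean)
```

The result is only an estimate, because every value in a class is treated as if it equaled the class midpoint.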

16
Example (continued)
17
Skewness and Symmetry

A distribution of data is skewed if it is not
symmetric and extends more to one side than the
other. A distribution of data is symmetric if the
left half of its histogram is roughly a mirror
image of its right half.

18
Measures of Variation
19
Range

The range of a set of data is the difference
between the maximum value and the minimum value.

20
Measuring Deviation

Find the mean for each of the following two sets
of data

21
Measuring Deviation

Calculate the total deviation for each of the two
sets of data

22
Measuring Deviation

Calculate the total deviation for each of the two
sets of data

23
Standard Deviation

The standard deviation of a set of sample values
is a measure of variation of values about the
mean. It is a type of average deviation of values
from the mean that is calculated by
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

24
Standard Deviation of a Population

The standard deviation of a population is
calculated by
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

25
Variance of a Sample and Population

The variance of a set of values is a measure of
variation equal to the square of the standard
deviation.
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

26
Notation

Sample standard deviation: $s$
Sample variance: $s^2$
Population standard deviation: $\sigma$
Population variance: $\sigma^2$

Note: Articles in professional journals and
reports often use SD for standard deviation and
VAR for variance.
27
Round-Off Rule

We use the same round-off rule given in the
previous section:
Carry one more decimal place than is present in
the original set of values.
Round only the final answer, not in the middle of
a calculation. (If it becomes absolutely
necessary to round in the middle, carry at least
twice as many decimal places as will be used in
the final answer.)

28
Example

Using Data Set 6 (Bears), find the three measures of
variation (range, standard deviation, and variance)
for the weights of the bears in the sample.
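
The three measures of variation can be sketched in Python as follows. The data are illustrative values, not the bear weights, and `sample_std` is a hypothetical helper that spells out the sample formula with its $n - 1$ divisor.

```python
import math
from statistics import mean, pstdev, stdev


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - x_bar)^2) / (n - 1))."""
    x_bar = mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


# Illustrative values only (an assumption, not the bear data).
sample = [2, 4, 4, 5, 7, 9, 11]

value_range = max(sample) - min(sample)
s = sample_std(sample)
sample_variance = s ** 2

print("range:", value_range)
print("standard deviation:", round(s, 2))
print("variance:", round(sample_variance, 2))

# statistics.stdev uses the same n - 1 divisor; pstdev divides by N instead.
print("matches statistics.stdev:", math.isclose(s, stdev(sample)))
print("population standard deviation:", round(pstdev(sample), 2))
```

Rounding is applied only when printing, in line with the round-off rule of rounding the final answer rather than intermediate results.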

29
Range Rule of Thumb

For Estimating a Value of the Standard Deviation
s: To roughly estimate the standard deviation
from a collection of known sample data, use
$s \approx \dfrac{\text{range}}{4}$
where range = (maximum value) - (minimum value).

30
Range Rule of Thumb

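For Interpreting a Known Value of the Standard
Deviation: If the standard deviation is known,
use it to find rough estimates of the minimum and
maximum usual sample values by using the
following:
$\text{minimum usual value} = \text{mean} - 2 \times \text{standard deviation}$
$\text{maximum usual value} = \text{mean} + 2 \times \text{standard deviation}$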

31
Example

Use the Range Rule of Thumb to find the maximum
and minimum usual values for the weights of our
bears.
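
Both directions of the range rule of thumb can be sketched briefly. The function names and the numbers passed to them are assumptions chosen for illustration; they are not the bear statistics.

```python
def estimate_std(values):
    """Rough estimate of the standard deviation: range / 4."""
    return (max(values) - min(values)) / 4


def usual_value_limits(mean, std_dev):
    """Minimum and maximum 'usual' values: mean -/+ 2 standard deviations."""
    return mean - 2 * std_dev, mean + 2 * std_dev


# Illustrative numbers only.
rough_s = estimate_std([2, 4, 4, 5, 7, 9, 11])
low, high = usual_value_limits(mean=100, std_dev=15)

print("estimated s:", rough_s)
print("usual values lie roughly between", low, "and", high)
```

The estimate is deliberately crude: it depends only on the two extreme values, so a single extreme value can distort it.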

32
Empirical (or 68-95-99.7) Rule for Data with a
Bell-Shaped Distribution

Another rule that is helpful in interpreting
values for a standard deviation is the empirical
rule. This rule states that for data sets having
a distribution that is approximately bell-shaped,
the following properties apply.
About 68% of all values fall within 1 standard
deviation of the mean.
About 95% of all values fall within 2 standard
deviations of the mean.
About 99.7% of all values fall within 3 standard
deviations of the mean.
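
The three percentages can be checked against an exact normal distribution, which is the idealized bell shape the rule refers to. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; `proportion_within` is a hypothetical helper.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.1%}")
```

Real data that are only approximately bell-shaped will match these figures only approximately.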

33
Empirical (or 68-95-99.7) Rule for Data with a
Bell-Shaped Distribution

34
Chebyshev's Theorem

The proportion (or fraction) of any set of data
lying within K standard deviations of the mean
is always at least $1 - \dfrac{1}{K^2}$, where K is any
positive number greater than 1.
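
A short sketch of the bound for a few values of K. The function name is hypothetical; the formula is the one stated in the theorem.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"K = {k}: at least {chebyshev_lower_bound(k):.1%} of the values")
```

Because the theorem applies to any distribution, its guarantees (75% for K = 2) are weaker than the empirical rule's figures for bell-shaped data.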

35
Coefficient of Variation

The coefficient of variation (or CV) for a set of
nonnegative sample or population data, expressed
as a percent, describes the standard deviation
relative to the mean, and is given by the
following:
Sample: $CV = \dfrac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \dfrac{\sigma}{\mu} \cdot 100\%$
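
The coefficient of variation is most useful for comparing variation between data sets with different means or units. The two small data sets below are invented for illustration, and `coefficient_of_variation` is a hypothetical helper using the sample formula.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV as a percent: (s / x_bar) * 100."""
    return stdev(values) / mean(values) * 100


# Invented values: same spread, different centers.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.1f}%")
print(f"CV of weights: {cv_weights:.1f}%")
```

Both lists have the same standard deviation, yet the weights vary more relative to their mean, which is exactly what the CV is meant to expose.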

36
Measures of Relative Standing
37
z Scores

A z score (or standardized value) is the number
of standard deviations that a given value x is
above or below the mean. It is found by using the
following expressions:
Sample: $z = \dfrac{x - \bar{x}}{s}$
Population: $z = \dfrac{x - \mu}{\sigma}$
(Round z to two decimal places.)

38
Interpreting z Scores

Ordinary values: $-2 \le z \le 2$
Unusual values: $z < -2$ or $z > 2$
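
A minimal sketch of computing and interpreting z scores. The cutoff of 2 standard deviations is an assumption carried over from the range rule of thumb, and the mean, standard deviation, and x values are illustrative only.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x lies from the mean, rounded to two places."""
    return round((x - mean) / std_dev, 2)


def is_unusual(z, cutoff=2):
    """Flag a z score beyond the cutoff in either direction (cutoff assumed to be 2)."""
    return abs(z) > cutoff


for x in (110, 135):
    z = z_score(x, mean=100, std_dev=15)
    label = "unusual" if is_unusual(z) else "ordinary"
    print(f"x = {x}: z = {z} ({label})")
```

Because z scores are unit-free, they allow values from different data sets to be compared on the same scale.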

39
Percentiles

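The percentile that corresponds to a particular
value x is given by
$\text{percentile of value } x = \dfrac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$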

40
Percentiles

Notation
$n$: total number of values in the data set
$k$: percentile being used
$L$: locator that gives the position of a value
$P_k$: kth percentile

41
Percentiles

42
Quartiles

Q1 (First Quartile): Separates the bottom 25%
from the top 75%.
Q2 (Second Quartile): Same as the median;
separates the bottom 50% from the top 50%.
Q3 (Third Quartile): Separates the bottom 75%
from the top 25%.

43
Example

Use the bear data to find:
the percentile corresponding to a length of 63.5 in.,
the length corresponding to the 25th percentile,
the length corresponding to the first quartile.
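
Both directions of the percentile calculation can be sketched as below. The locator procedure in `kth_percentile` is an assumption: it follows one common textbook convention (compute L = (k/100) n; if L is whole, average the Lth and next values; otherwise round L up), and statistical software often uses other interpolation rules. The data are illustrative values, not the bear lengths.

```python
def percentile_of_value(values, x):
    """Percent of values strictly less than x."""
    below = sum(1 for v in values if v < x)
    return below / len(values) * 100


def kth_percentile(values, k):
    """P_k for an integer k from 1 to 99, using the assumed locator convention."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)
    # L = k * n / 100, kept in integer arithmetic to avoid float round-off
    whole, remainder = divmod(k * len(ordered), 100)
    if remainder == 0:
        # L is a whole number: average the Lth and (L + 1)th ordered values
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not whole: round up to position whole + 1, which is index `whole`
    return ordered[whole]


# Illustrative values only.
lengths = [2, 4, 4, 5, 7, 9, 11, 12]

print(percentile_of_value(lengths, 7))
print(kth_percentile(lengths, 25))
print(kth_percentile(lengths, 40))
```

Under this convention the first quartile is simply `kth_percentile(values, 25)`, which is why the last two parts of the example have the same answer.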

44
Interquartile Range (IQR)

The interquartile range (IQR) is given by
$IQR = Q_3 - Q_1$

45
Exploratory Data Analysis (EDA)
46
Exploratory Data Analysis (EDA)

Exploratory data analysis is the process of using
statistical tools (such as graphs, measures of
center, measures of variation) to investigate
data sets in order to understand their important
characteristics.

47
Outliers

Informally, an outlier is a value that is located
very far away from almost all other values.
An outlier can have a dramatic effect on the
mean.
An outlier can have a dramatic effect on the
standard deviation.
An outlier can have a dramatic effect on the
scale of the histogram so the true nature of the
distribution is totally obscured.

48
Boxplots

For a set of data, the 5-number summary consists
of the minimum value, the first quartile Q1, the
median (or second quartile Q2), the third
quartile Q3, and the maximum value.
A boxplot (or box-and-whisker diagram) is a graph
of a data set that consists of a line extending
from the minimum value to the maximum value, and
a box with lines drawn at the first quartile Q1,
the median, and the third quartile Q3.

49
Example

Find a five number summary for the bear data, and
use this information to draw a boxplot of the
lengths of the bears.
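
A five-number summary can be sketched with the standard library. This assumes Python 3.8 or later for `statistics.quantiles`; its "inclusive" interpolation is one of several quartile conventions, so hand-computed quartiles may differ slightly. The data are illustrative values, not the bear lengths.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # n=4 splits the data into quartiles and returns the three cut points
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Illustrative values only.
data = [2, 4, 4, 5, 7, 9, 11]
summary = five_number_summary(data)
print(summary)
```

These five numbers are exactly what a basic boxplot draws: the ends of the line are the minimum and maximum, and the box spans Q1 to Q3 with a line at the median.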

50
Example (continued)
51
Outliers

More formally, a data value is an outlier if it is
above Q3 by an amount greater than 1.5 × IQR, or
below Q1 by an amount greater than 1.5 × IQR.

52
Modified Boxplot

A modified boxplot is a boxplot constructed with
these modifications:
A special symbol (such as an asterisk) is used to
identify outliers as defined here, and
the solid horizontal line extends only as far as
the minimum data value that is not an outlier and
the maximum data value that is not an outlier.
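
The pieces needed for a modified boxplot (the IQR, the outliers, and the ends of the solid line) can be computed as in this sketch. It assumes `statistics.quantiles` with the "inclusive" method for the quartiles, which is one convention among several; the data and the function name are illustrative.

```python
from statistics import quantiles


def modified_boxplot_parts(values):
    """Return the IQR, the outliers, and the whisker ends for a modified boxplot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Outliers lie more than 1.5 * IQR below Q1 or above Q3
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    # The solid line stops at the most extreme values that are not outliers
    return {"iqr": iqr, "outliers": outliers, "whiskers": (min(inside), max(inside))}


# Illustrative values only; 40 is placed far from the rest on purpose.
data = [2, 4, 4, 5, 7, 9, 11, 40]
parts = modified_boxplot_parts(data)
print(parts)
```

Plotting the outliers as separate symbols keeps a single extreme value from stretching the line and hiding the shape of the rest of the data.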

53
Example