Classification of Variables - PowerPoint PPT Presentation

About This Presentation
Title:

Classification of Variables

Slides: 52
Provided by: Rafael116
Learn more at: https://www.ux1.eiu.edu

Transcript and Presenter's Notes

Title: Classification of Variables


1
Classification of Variables
  • Discrete Numerical Variable
  • A variable that produces a response that comes
    from a counting process.

2
Classification of Variables
  • Continuous Numerical Variable
  • A variable that produces a response that is the
    outcome of a measurement process.

3
Classification of Variables
  • Categorical Variables
  • Variables that produce responses that belong to
    groups (sometimes called classes) or categories.

4
Measurement Levels
  • Nominal and Ordinal Levels of Measurement refer
    to data obtained from categorical questions.
  • A nominal scale indicates assignments to groups
    or classes.
  • Ordinal data indicate rank ordering of items.

5
Frequency Distributions
  • A frequency distribution is a table used to
    organize data. The left column (called classes
    or groups) includes numerical intervals on a
    variable being studied. The right column is a
    list of the frequencies, or number of
    observations, for each class. Intervals are
    normally of equal size, must cover the range of
    the sample observations, and be non-overlapping.

6
Construction of a Frequency Distribution
  • Rule 1: Intervals (classes) must be inclusive
    and non-overlapping.
  • Rule 2: Determine k, the number of classes.
  • Rule 3: Intervals should be the same width, w.
    The width is determined by
    $w = \frac{\text{largest observation} - \text{smallest observation}}{k}$
  • Both k and w should be rounded upward, possibly
    to the next largest integer.

7
Construction of a Frequency Distribution
  • Quick Guide to Number of Classes for a Frequency
    Distribution
  • Sample Size: Number of Classes
  • Fewer than 50: 5 to 6 classes
  • 50 to 100: 6 to 8 classes
  • Over 100: 8 to 10 classes
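
The construction rules and the quick guide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the width is taken as (largest - smallest) / k rounded upward to a whole number, each class is read as lower <= x < upper, and the function names are hypothetical rather than taken from the slides.

```python
import math


def quick_guide_classes(n):
    """Return the (low, high) range of class counts from the quick guide."""
    if n < 50:
        return (5, 6)
    if n <= 100:
        return (6, 8)
    return (8, 10)


def frequency_distribution(data, k):
    """Tally data into k equal-width, non-overlapping classes.

    Returns a list of (lower, upper, count) tuples, where each class
    holds the observations with lower <= x < upper.
    """
    low, high = min(data), max(data)
    # Round the width upward to a whole number; the +1 guarantees that
    # the largest observation falls strictly inside the last class.
    w = math.floor((high - low) / k) + 1
    counts = [0] * k
    for x in data:
        counts[int((x - low) // w)] += 1
    return [(low + i * w, low + (i + 1) * w, counts[i]) for i in range(k)]


if __name__ == "__main__":
    sample = [12, 15, 21, 22, 27, 30, 34, 35, 41, 49]
    for lower, upper, count in frequency_distribution(sample, 4):
        print(f"{lower} to under {upper}: {count}")
```

Because the width is rounded up, the classes together always cover the full range of the sample, which is what Rule 1 asks for.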

8
Cumulative Frequency Distributions
  • A cumulative frequency distribution contains the
    number of observations whose values are less than
    the upper limit of each interval. It is
    constructed by adding the frequencies of all
    frequency distribution intervals up to and
    including the present interval.

9
Relative Cumulative Frequency Distributions
  • A relative cumulative frequency distribution
    converts all cumulative frequencies to cumulative
    percentages
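
A short sketch of both cumulative tables, assuming the class frequencies are already available as a list ordered from the lowest class to the highest (the function names are illustrative):

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Running total of the frequencies up to and including each class."""
    return list(accumulate(freqs))


def relative_cumulative_frequencies(freqs):
    """Cumulative frequencies expressed as cumulative percentages."""
    total = sum(freqs)
    return [100 * c / total for c in accumulate(freqs)]


if __name__ == "__main__":
    freqs = [3, 3, 3, 1]
    print(cumulative_frequencies(freqs))
    print(relative_cumulative_frequencies(freqs))
```

The last cumulative frequency equals the sample size, and the last cumulative percentage is 100. Plotting the cumulative percentages against the upper class limits gives the points of an ogive.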

10
Histograms and Ogives
  • A histogram is a bar graph that consists of
    vertical bars constructed on a horizontal line
    that is marked off with intervals for the
    variable being displayed. The intervals
    correspond to those in a frequency distribution
    table. The height of each bar is proportional to
    the number of observations in that interval.

11
Histograms and Ogives
  • An ogive, sometimes called a cumulative line
    graph, is a line that connects points that are
    the cumulative percentage of observations below
    the upper limit of each class in a cumulative
    frequency distribution.

12
Histogram and Ogive for Example 2.1
13
Stem-and-Leaf Display
  • A stem-and-leaf display is an exploratory data
    analysis graph that is an alternative to the
    histogram. Data are grouped according to their
    leading digits (called the stem) while listing
    the final digits (called leaves) separately for
    each member of a class. The leaves are displayed
    individually in ascending order after each of the
    stems.
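
As a rough illustration, the grouping can be sketched in Python. The sketch assumes non-negative whole numbers and treats the last digit as the leaf and everything before it as the stem; other splits are possible, and the function names are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (leading digits) to its leaves in ascending order."""
    groups = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # last digit is the leaf
        groups[stem].append(leaf)
    return dict(groups)


def render(groups):
    """Format the display with one line per stem."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(groups.items())
    )


if __name__ == "__main__":
    print(render(stem_and_leaf([21, 35, 27, 30, 22, 41])))
```

Unlike a histogram, the display keeps every individual observation visible.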

14
Stem-and-Leaf Display
Stem-and-Leaf Display for Gilotti's Deli Example
15
Tables- Bar and Pie Charts -
Frequency and Relative Frequency Distribution for
Top Company Employers Example
16
Tables- Bar and Pie Charts -
Figure 2.9 Bar Chart for Top Company Employers
Example
17
Tables- Bar and Pie Charts -
Figure 2.10 Pie Chart for Top Company Employers
Example
18
Pareto Diagrams
  • A Pareto diagram is a bar chart that displays the
    frequency of defect causes. The bar at the left
    indicates the most frequent cause and bars to the
    right indicate causes in decreasing frequency. A
    Pareto diagram is used to separate the vital few
    from the trivial many.
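
The ordering behind a Pareto diagram can be sketched as a small table. The defect causes and counts below are made up for illustration, and the cumulative percentage column is a common companion to the bars rather than something the slide specifies.

```python
def pareto_table(cause_counts):
    """Return (cause, count, cumulative %) rows, most frequent cause first."""
    total = sum(cause_counts.values())
    ordered = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for cause, count in ordered:
        running += count
        rows.append((cause, count, 100 * running / total))
    return rows


if __name__ == "__main__":
    # Hypothetical defect counts, not data from the presentation.
    defects = {"scratch": 5, "dent": 12, "misalignment": 3}
    for cause, count, cum_pct in pareto_table(defects):
        print(f"{cause:<14}{count:>4}{cum_pct:>8.1f}%")
```

Reading down the cumulative column shows how few causes account for most of the defects.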

19
Line Charts
  • A line chart, also called a time plot, is a
    series of data plotted at various time intervals.
    Measuring time along the horizontal axis and the
    numerical quantity of interest along the vertical
    axis yields a point on the graph for each
    observation. Joining points adjacent in time by
    straight lines produces a time plot.

20
Line Charts
21
Parameters and Statistics
  • A statistic is a descriptive measure computed
    from a sample of data.
  • A parameter is a descriptive measure computed
    from an entire population of data.

22
Measures of Central Tendency- Arithmetic Mean -
  • The arithmetic mean of a set of data is the sum
    of the data values divided by the number of
    observations.

23
Sample Mean
  • If the data set is from a sample, then the sample
    mean, $\bar{x}$, is
    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

24
Population Mean
  • If the data set is from a population, then the
    population mean, $\mu$, is
    $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

25
Measures of Central Tendency- Median -
  • An ordered array is an arrangement of data in
    either ascending or descending order. Once the
    data are arranged in ascending order, the median
    is the value such that 50% of the observations
    are smaller and 50% of the observations are
    larger.

26
Measures of Central Tendency- Median -
  • If the sample size n is an odd number, the
    median, Xm, is the middle observation. If the
    sample size n is an even number, the median, Xm,
    is the average of the two middle observations.
    The median will be located in the 0.50(n + 1)th
    ordered position.
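
A minimal sketch of the median rule, reading the ordered position as 0.50(n + 1) and counting positions from 1. A whole-number position picks the middle observation; a position ending in .5 averages the two observations on either side of it.

```python
def median(values):
    """Median located at the 0.50(n + 1)th ordered position (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = 0.50 * (n + 1)
    lower = ordered[int(position) - 1]
    if position == int(position):
        return lower  # odd n: the middle observation
    upper = ordered[int(position)]
    return (lower + upper) / 2  # even n: average of the two middle values


if __name__ == "__main__":
    print(median([7, 1, 5]))
    print(median([4, 1, 3, 2]))
```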

27
Measures of Central Tendency- Mode -
  • The mode, if one exists, is the most frequently
    occurring observation in the sample or
    population.
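
The phrase "if one exists" can be made concrete with a small sketch. It assumes one common convention: when no value occurs more than once there is no mode, and when several values tie for the highest frequency all of them are reported.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if none repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated here as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


if __name__ == "__main__":
    print(modes([1, 2, 2, 3]))
    print(modes([1, 2, 3]))
    print(modes([1, 1, 2, 2, 3]))
```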

28
Shape of the Distribution
  • The shape of the distribution is said to be
    symmetric if the observations are balanced, or
    evenly distributed, about the mean. In a
    symmetric distribution the mean and median are
    equal.

29
Shape of the Distribution
  • A distribution is skewed if the observations are
    not symmetrically distributed above and below the
    mean. A positively skewed (or skewed to the
    right) distribution has a tail that extends to
    the right in the direction of positive values. A
    negatively skewed (or skewed to the left)
    distribution has a tail that extends to the left
    in the direction of negative values.

30
Shapes of the Distribution
31
Measures of Central Tendency - Geometric Mean -
  • The Geometric Mean is the nth root of the product
    of n numbers:
    $\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$
  • The Geometric Mean is used to obtain mean growth
    over several periods given compounded growth from
    each period.
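
A brief sketch of both uses. For growth, it assumes each period's rate r is first turned into a growth factor 1 + r, and that the mean rate is the geometric mean of those factors minus 1; the function names are illustrative, and `math.prod` needs Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))


def mean_growth_rate(rates):
    """Mean compounded growth rate over several periods.

    Each rate is a decimal fraction, e.g. 0.10 for 10% growth.
    """
    factors = [1 + r for r in rates]
    return geometric_mean(factors) - 1


if __name__ == "__main__":
    print(geometric_mean([2, 8]))
    # Doubling and then halving leaves the value unchanged, so the mean
    # growth is zero even though the arithmetic mean of the rates is 25%.
    print(mean_growth_rate([1.0, -0.5]))
```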

32
Measures of Variability- The Range -
  • The range in a set of data is the difference
    between the largest and smallest observations.

33
Measures of Variability- Sample Variance -
  • The sample variance, $s^2$, is the sum of the
    squared differences between each observation and
    the sample mean divided by the sample size minus
    1:
    $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

34
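Measures of Variability- Short-cut Formulas for
$s^2$
  • Short-cut formulas for the sample variance, $s^2$,
    are
    $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
    and
    $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$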
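
The sketch below compares the definition of the sample variance with one standard short-cut form, (sum of x squared minus (sum of x) squared over n) divided by n - 1. The short-cut shown here is the usual textbook identity and is an assumption about what the slide displayed.

```python
def sample_variance(xs):
    """Definition: squared deviations from the sample mean over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Short-cut: (sum of x^2 - (sum of x)^2 / n) over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(sample_variance(data))
    print(sample_variance_shortcut(data))
    print(sample_variance(data) ** 0.5)  # sample standard deviation
```

Both functions agree up to floating-point rounding. The short-cut avoids computing each deviation, although with very large values it can lose precision, so the definition is the safer choice in code.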

35
Measures of Variability- Population Variance -
  • The population variance, $\sigma^2$, is the sum of the
    squared differences between each observation and
    the population mean divided by the population
    size, N:
    $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

36
Measures of Variability- Sample Standard
Deviation -
  • The sample standard deviation, s, is the positive
    square root of the variance, and is defined as
    $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

37
Measures of Variability- Population Standard
Deviation-
  • The population standard deviation, $\sigma$, is
    $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

38
The Empirical Rule (the 68%, 95%, or almost all
rule)
  • For a set of data with a mound-shaped histogram,
    the Empirical Rule is:
  • approximately 68% of the observations are
    contained within a distance of one standard
    deviation around the mean, $\mu \pm 1\sigma$
  • approximately 95% of the observations are
    contained within a distance of two standard
    deviations around the mean, $\mu \pm 2\sigma$
  • almost all of the observations are contained
    within a distance of three standard deviations
    around the mean, $\mu \pm 3\sigma$
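
One way to check a data set against the Empirical Rule is to count the share of observations within k standard deviations of the mean. The sketch treats the data as a whole population (so it uses the population standard deviation), and the function name is illustrative.

```python
import statistics


def share_within(xs, k):
    """Percentage of observations within k standard deviations of the mean."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # population standard deviation
    inside = sum(1 for x in xs if abs(x - mean) <= k * sd)
    return 100 * inside / len(xs)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    for k in (1, 2, 3):
        print(k, share_within(data, k))
```

For small or skewed data sets the shares can differ noticeably from 68% and 95%; the rule is an approximation for mound-shaped data.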

39
Coefficient of Variation
  • The Coefficient of Variation, CV, is a measure of
    relative dispersion that expresses the standard
    deviation as a percentage of the mean (provided
    the mean is positive).
  • The sample coefficient of variation is
    $CV = \frac{s}{\bar{x}} \times 100\%$

40
Coefficient of Variation
  • The population coefficient of variation is
    $CV = \frac{\sigma}{\mu} \times 100\%$
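
A small sketch of both versions, taking the coefficient of variation as the standard deviation divided by the mean, times 100. Raising an error for a non-positive mean is one way to enforce the "provided the mean is positive" condition; the function names are illustrative.

```python
import statistics


def coefficient_of_variation(xs, sample=True):
    """Standard deviation as a percentage of the mean (mean must be positive)."""
    mean = statistics.mean(xs)
    if mean <= 0:
        raise ValueError("the coefficient of variation needs a positive mean")
    sd = statistics.stdev(xs) if sample else statistics.pstdev(xs)
    return 100 * sd / mean


if __name__ == "__main__":
    print(coefficient_of_variation([10, 20, 30]))                       # sample
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9], sample=False))  # population
```

Because it is unit-free, the coefficient of variation lets data sets measured on different scales be compared.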

41
Percentiles and Quartiles
  • Data must first be in ascending order.
    Percentiles separate large ordered data sets into
    100ths. The Pth percentile is a number such that
    P percent of all the observations are at or below
    that number.
  • Quartiles are descriptive measures that separate
    large ordered data sets into four quarters.

42
Percentiles and Quartiles
  • The first quartile, Q1, is another name for the
    25th percentile. The first quartile divides the
    ordered data such that 25% of the observations
    are at or below this value. Q1 is located in the
    0.25(n + 1)st position when the data are in
    ascending order. That is,
    Q1 = the value in the 0.25(n + 1)th ordered position

43
Percentiles and Quartiles
  • The third quartile, Q3, is another name for the
    75th percentile. The third quartile divides the
    ordered data such that 75% of the observations
    are at or below this value. Q3 is located in the
    0.75(n + 1)st position when the data are in
    ascending order. That is,
    Q3 = the value in the 0.75(n + 1)th ordered position

44
Interquartile Range
  • The Interquartile Range (IQR) measures the spread
    in the middle 50% of the data; that is, the
    difference between the observations at the 25th
    and the 75th percentiles:
    IQR = Q3 - Q1

45
Five-Number Summary
  • The Five-Number Summary refers to the five
    descriptive measures: minimum, first quartile,
    median, third quartile, and maximum.
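
The quartile positions, the interquartile range, and the five-number summary can be sketched together. The sketch reads the positions as 0.25(n + 1), 0.50(n + 1), and 0.75(n + 1), counted from 1, and assumes linear interpolation when a position falls between two observations; software packages use several slightly different conventions.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    position = min(max(position, 1), len(ordered))  # stay inside the data
    whole = int(position)
    frac = position - whole
    if frac == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + frac * (above - below)


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    med = value_at_position(ordered, 0.50 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return (ordered[0], q1, med, q3, ordered[-1])


def interquartile_range(values):
    """IQR = Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(values)
    return q3 - q1


if __name__ == "__main__":
    print(five_number_summary([1, 2, 3, 4, 5, 6, 7]))
    print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))
```

These five values are exactly what a box-and-whisker plot draws: the box spans Q1 to Q3, a line marks the median, and the whiskers reach the minimum and maximum.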

46
Box-and-Whisker Plots
  • A Box-and-Whisker Plot is a graphical procedure
    that uses the Five-Number summary.
  • A Box-and-Whisker Plot consists of
  • an inner box that shows the numbers which span
    the range from Q1 to Q3.
  • a line drawn through the box at the median.
  • The "whiskers" are lines drawn from Q1 to the
    minimum value, and from Q3 to the maximum value.

47
Box-and-Whisker Plots (Excel)
48
Grouped Data Mean
  • For a population of N observations the mean is
    $\mu = \frac{\sum_{i=1}^{k} f_i m_i}{N}$
  • where the data set contains observation values
    m1, m2, . . ., mk occurring with frequencies f1,
    f2, . . ., fk respectively.

49
Grouped Data Mean
  • For a sample of n observations, the mean is
    $\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{n}$
  • where the data set contains observation values
    m1, m2, . . ., mk occurring with frequencies f1,
    f2, . . ., fk respectively.

50
Grouped Data Variance
  • For a population of N observations the variance
    is
    $\sigma^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \mu)^2}{N}$
  • where the data set contains observation values
    m1, m2, . . ., mk occurring with frequencies f1,
    f2, . . ., fk respectively.
51
Grouped Data Variance
  • For a sample of n observations, the variance is
    $s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1}$
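
The grouped-data measures can be sketched as frequency-weighted sums. The sketch assumes the mean is the sum of f times m divided by the number of observations, and the variance is the sum of f times the squared deviation of m from that mean, divided by N for a population or n - 1 for a sample. The function names and the `sample` flag are illustrative.

```python
def grouped_mean(values, freqs):
    """Mean of observation values m_i occurring with frequencies f_i."""
    total = sum(freqs)
    return sum(f * m for m, f in zip(values, freqs)) / total


def grouped_variance(values, freqs, sample=True):
    """Variance of grouped data; divides by n - 1 (sample) or N (population)."""
    total = sum(freqs)
    mean = grouped_mean(values, freqs)
    squared = sum(f * (m - mean) ** 2 for m, f in zip(values, freqs))
    return squared / (total - 1 if sample else total)


if __name__ == "__main__":
    m = [1, 2, 3]
    f = [1, 2, 1]
    print(grouped_mean(m, f))
    print(grouped_variance(m, f, sample=True))
    print(grouped_variance(m, f, sample=False))
```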