Lecture 1 - PowerPoint PPT Presentation

Slides: 90
Provided by: huraish


Transcript and Presenter's Notes

Title: Lecture 1


1
INEN 270
  • ENGINEERING STATISTICS
  • Fall 2011

2
Agenda
Purpose
Statistical Concepts
Descriptive Statistics and Some of Their Graphs
Inferential Statistics
Lecture Summary
3
Lecture 1 Introduction to Statistics
Purpose
Statistical Concepts
Descriptive Statistics and Some of Their Graphs
Inferential Statistics
Lecture Summary
4
Why Study Statistics?
  • You need to know how to evaluate published
    numerical facts.
  • Your profession or employment may require you to
    interpret the results of sampling or to employ
    statistical methods of analysis to make
    inferences in your work.

5
What Is the Purpose of Statistics?
  • One purpose of statistics is to make sense of
    your data.
  • Statistics provide information about your data so
    you can answer questions and make informed
    business decisions.

6
(No Transcript)
7
Lecture 1 Introduction to Statistics: Statistical Concepts
8
Objectives
  • Explain the uses of statistics.
  • Define population and sample.
  • Describe processes involved in statistical
    analysis.
  • Compare descriptive and inferential statistics.
  • Discuss the sampling plan.

9
Defining the Problem
  • Before you begin any analysis, you should
    complete certain tasks.
  • 1. Outline the purpose of the study.
  • 2. Document the study questions.
  • 3. Define the population of interest.
  • 4. Determine the need for sampling.
  • 5. Define the data collection protocol.

10
Example Speeding Data
11
Population and Sample
12
Basic Definition
  • STATISTICS: the area of science concerned with
    the extraction of information from numerical data
    and its use in making inferences about a population
    from data obtained from a sample.

13
Diagram: Population (set of all measurements) → Sample (set of measurements selected from the population); Extract Information from the sample → Make Inference about the population.
14
Basic Definition
  • Population and Parameter
  • Population: the set representing all measurements of
    interest to the investigator.
  • Parameter: an unknown population characteristic
    of interest to the investigator.
  • Sample and Statistic
  • Sample: a subset of measurements selected from the
    population of interest.
  • Statistic: a sample characteristic of interest to
    the investigator.
  • Descriptive Statistics
  • Center of location: mean, median, mode
  • Variability: variance, standard deviation
  • Distribution
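The parameter/statistic distinction can be made concrete with a minimal Python sketch. The population here is hypothetical (the integers 1 to 100), and the sample is a simple random sample of 10 drawn with `random.sample`. The population mean plays the role of the parameter, and the sample mean plays the role of the statistic.

```python
import random
import statistics

# Hypothetical population: every measurement of interest (here, the integers 1..100).
population = list(range(1, 101))

# Parameter: a population characteristic (normally unknown to the investigator).
mu = statistics.mean(population)

# Sample: a subset selected from the population by simple random sampling.
rng = random.Random(2011)              # fixed seed so the run is reproducible
sample = rng.sample(population, 10)    # 10 distinct units, each subset equally likely

# Statistic: the same characteristic computed from the sample; it estimates mu.
x_bar = statistics.mean(sample)

print(f"parameter mu = {mu}, statistic x_bar = {x_bar}")
```

If you change the seed, you get a different sample and a different `x_bar`, while `mu` stays fixed. The statistic varies from sample to sample; the parameter does not.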

15
Examples of Population and Sample
  • Selecting the proper diet for shrimp or other sea
    animals is an important aspect of sea farming. A
    researcher wishes to estimate the average weight
    of shrimp maintained on a specific diet for a
    period of 6 months. One hundred shrimp are
    randomly selected from an artificial pond and
    each is weighed.
  • Identify the population
  • Identify the sample
  • Identify the parameter
  • Identify the statistic

16
Simple Random Sampling
17
Convenience Sampling
18
Process of Statistical Data Analysis
Diagram: a Random Sample is drawn from the Population; the sample is described by Sample Statistics, which are used to Make Inferences about the population.
19
Sampling Plan
20
(No Transcript)
21
Lecture 1 Introduction to Statistics: Descriptive Statistics and Some of Their Graphs
22
Objectives
  • Compute and interpret statistics describing the
    location of a set of values, such as the mean,
    median, and mode.
  • Compute and interpret statistics describing the
    variability in a set of values, such as the range
    and standard deviation.
  • Compute and interpret the measures of shape:
    skewness and kurtosis.
  • Produce graphical displays of data.

23
Some Frequently Used Statistics and Parameters
24
Measure of Location
  • Descriptive statistics that locate the center of
    your data are called measures of central tendency.
  • Sample Mean
  • The sample mean of a set of n measurements
    (x_1, x_2, ..., x_n) is equal to the sum of the
    measurements divided by n:
    \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

25
Measure of Location
  • Sample Median
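  • Median: the middle value (also known as the
    50th percentile)
  • The median of a set of n measurements
    (x_1, x_2, ..., x_n) is the value that falls in the
    middle position when the measurements are ordered
    from the smallest to the largest:
    \tilde{x} = x_{((n+1)/2)} if n is odd, and
    \tilde{x} = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} if n is even,
  • where x_{(1)}, ..., x_{(n)} are the measurements
    arranged in increasing order of magnitude.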

26
RULE FOR CALCULATING THE MEDIAN
  • 1. Order the measurements from the smallest to
    the largest.
  • 2. A) If the sample size is odd, the median is
    the middle measurement.
  • B) If the sample size is even, the median is
    the average of the two middle measurements.
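The two-step rule above, together with the definition of the sample mean, translates directly into a short Python sketch. The function names are illustrative. The test data are the six values from the example on the next slide.

```python
def sample_mean(xs):
    """Sum of the measurements divided by n."""
    return sum(xs) / len(xs)

def sample_median(xs):
    """Median by the rule above: order the data, then take the middle
    value (n odd) or the average of the two middle values (n even)."""
    ordered = sorted(xs)               # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # step 2A: odd sample size
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 2B: even sample size

# The six values x_1, ..., x_6 from the example that follows.
data = [7, 1, 10, 8, 4, 12]
print(sample_mean(data), sample_median(data))   # 7.0 7.5
```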

27
(No Transcript)
28
Percentiles
29
Example
  • A random sample of six values was taken from a
    population. These values were x_1 = 7, x_2 = 1,
    x_3 = 10, x_4 = 8, x_5 = 4, and x_6 = 12.
  • What are the sample mean and sample median for
    these data?

30
Sample Mean
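For the six values in this example, the sample mean works out as follows (worked arithmetic added for illustration):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
        = \frac{7 + 1 + 10 + 8 + 4 + 12}{6}
        = \frac{42}{6} = 7
```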
31
(No Transcript)
32
CALCULATIONS FOR THE SAMPLE MEDIAN
1. Order Sample
2. Median
33
  • 1. Order Sample

x_2 = 1, x_5 = 4, x_1 = 7, x_4 = 8, x_3 = 10, x_6 = 12
MEDIAN = (7 + 8) / 2 = 7.5
34
Example
  • Given a set of data 1.7, 2.2, 3.11, 3.9, and
    14.7
  • Sample mean
  • Sample median
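One way to work this exercise is shown below. The arithmetic is added here for illustration; the slide's own answer was not captured. The single large value 14.7 pulls the mean well above the median.

```latex
\bar{x} = \frac{1.7 + 2.2 + 3.11 + 3.9 + 14.7}{5} = \frac{25.61}{5} = 5.122
\qquad
\tilde{x} = 3.11 \ \text{(the 3rd of the 5 ordered values)}
```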

35
(No Transcript)
36
Example
Consider the following sample: 4 18 36
39 41 42 43 44 44 45 46 47
48 49 49 50 51 53 54 60
Which measure of central tendency better describes
the central location of the data: the sample mean
or the sample median? Why?
37
The sample median
38
  • Why?
  • Because there is an outlier (extreme value), 4, in
    the data set, and the mean is heavily influenced by
    this single outlier.
  • Solution
  • Trimmed mean: drops the highest and lowest extreme
    values and averages the rest.
  • e.g., a 5% trimmed mean drops the highest 5% and
    the lowest 5% of the values and averages the rest.
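Here is a minimal sketch of a trimmed mean applied to the 20-value sample above. It assumes the convention of cutting `int(proportion * n)` values from each end, which is one common choice; other texts interpolate when `proportion * n` is not a whole number.

```python
def trimmed_mean(xs, proportion):
    """Drop int(proportion * n) values from EACH end of the ordered data,
    then average the rest. proportion=0.05 gives a 5% trimmed mean."""
    ordered = sorted(xs)
    k = int(proportion * len(ordered))      # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
        46, 47, 48, 49, 49, 50, 51, 53, 54, 60]

mean = sum(data) / len(data)
print(round(mean, 2))                        # 43.15 (pulled down by the 4)
print(round(trimmed_mean(data, 0.05), 2))    # 44.39 (the 4 and the 60 are dropped)
```

For comparison, the median of these data is (45 + 46)/2 = 45.5. The trimmed mean lies between the untrimmed mean and the median.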

39
Sample Mode
  • Sample Mode: the value that occurs most frequently
  • What is the mode for the previous example?
  • 44 (occurs twice)
  • 49 (occurs twice)
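Because two values tie for the highest count, these data have two modes. Python's standard `statistics.multimode` (available since Python 3.8) reports every tied value, which makes this easy to check:

```python
from statistics import multimode

data = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
        46, 47, 48, 49, 49, 50, 51, 53, 54, 60]

# multimode returns every value tied for the highest frequency,
# in the order in which each first appears in the data.
print(multimode(data))   # [44, 49]
```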

40
Measures of Central Tendency (Mode, Mean, and Median)
  • How are they related to a given data set?
  • It depends on the skewness of the population.

Figure (a): A bell-shaped distribution
41
Figure (b): A distribution skewed to the left (labeled A = mean, B = median, C = mode)
Figure (c): A distribution skewed to the right (labeled A = mode, B = median, C = mean)
42
  • Suppose the IRS wants to measure the central
    tendency of the income of the American population.
    Which measure would you recommend, and why?
  • Hint: Bill Gates
  • Income is skewed to the right.
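To see the right-skew ordering (mode < median < mean) on numbers, here is a small sketch. The incomes are entirely hypothetical, with one very large value standing in for the "Bill Gates" hint.

```python
from statistics import mean, median, mode

# Hypothetical annual incomes in thousands of dollars; the single very large
# value produces a long right tail.
incomes = [30, 35, 35, 40, 45, 50, 60, 2000]

print(mode(incomes), median(incomes), mean(incomes))   # 35 42.5 286.875
```

If you remove the 2000, the median drops only to 40, but the mean falls to about 42.1. This is why the median is usually preferred for right-skewed data such as income.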

43
Other Measures of Locations
  • Trimmed means
  • Computed by trimming away a certain percentage of
    both the largest and the smallest values.
  • Less sensitive to outliers than the mean but
    more so than the median.
  • What is the relationship between the trimmed mean
    and the median?
  • Example: 0.32 0.53 0.28 0.37
    0.47 0.43 0.36 0.42 0.38 0.43

44
Data: 0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43
Ordered: 0.28 0.32 0.36 0.37 0.38 0.42 0.43 0.43 0.47 0.53
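As an illustration, a 10% trimmed mean of these ten ordered values drops one value from each end (0.28 and 0.53). The arithmetic below is added for illustration:

```latex
\bar{x}_{tr(10)} = \frac{0.32 + 0.36 + 0.37 + 0.38 + 0.42 + 0.43 + 0.43 + 0.47}{8}
                 = \frac{3.18}{8} = 0.3975
\qquad
\bar{x} = \frac{3.99}{10} = 0.399,
\qquad
\tilde{x} = \frac{0.38 + 0.42}{2} = 0.40
```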
45
The Spread of a Distribution: Variation
Measure | Definition
range | the difference between the maximum and minimum data values
interquartile range (IR or IQR) | the difference between the 75th and 25th percentiles
variance | a measure of the dispersion of the data around the mean
standard deviation | a measure of dispersion expressed in the same units of measurement as your data (the square root of the variance)
coefficient of variation | the standard deviation as a percentage of the mean
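Several measures in this table can be computed with Python's standard `statistics` module, as in the sketch below. It assumes the sample (n - 1) variance, which is what `statistics.variance` and `statistics.stdev` use. The test data are the six values from the earlier example. The IQR is left out here because quartile conventions are covered later.

```python
import statistics

def spread_summary(xs):
    """Range, sample variance, sample standard deviation, and coefficient
    of variation (standard deviation as a percentage of the mean)."""
    data_range = max(xs) - min(xs)
    var = statistics.variance(xs)        # divides by n - 1
    sd = statistics.stdev(xs)            # square root of the sample variance
    cv = 100 * sd / statistics.mean(xs)  # percent of the mean
    return data_range, var, sd, cv

r, v, s, cv = spread_summary([7, 1, 10, 8, 4, 12])
print(r, v, s, round(cv, 1))
```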
46
Typical Variation Standard Deviation
  • The variance is a measure of variation. The
    square root of the variance, or standard
    deviation, is a measure of variation in terms of
    the original linear scale.
  • σ is the population standard
    deviation.
  • s is an estimate of the population standard
    deviation.

47
Typical Variation Average Squared Deviation
  • Consider the data 3, 4, 8

Obs | Data | Deviation | (Deviation)^2
1 | 3 | -2 | 4
2 | 4 | -1 | 1
3 | 8 | 3 | 9
Sum | 15 | 0 | 14
Average | 5 | 0 | 14/3
48
Measures of Variability
  • Sample Range
  • x_max - x_min
  • Sample Variance
  • s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
  • Sample Standard Deviation
  • s = \sqrt{s^2}

49
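Obs. | x_i | x_i^2
1 | 7 | 49
2 | 1 | 1
3 | 10 | 100
4 | 8 | 64
5 | 4 | 16
6 | 12 | 144
Sum | 42 | 374

Obs. | x_i | x_i - x̄ | (x_i - x̄)^2
1 | 7 | 0 | 0
2 | 1 | -6 | 36
3 | 10 | 3 | 9
4 | 8 | 1 | 1
5 | 4 | -3 | 9
6 | 12 | 5 | 25
Sum | | | 80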
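The x_i^2 column suggests the computational shortcut s^2 = (Σx_i^2 - (Σx_i)^2/n)/(n - 1). That formula is assumed here; the slide's own formula was not captured. The sketch below reproduces both table sums and checks that the definitional and shortcut forms give the same sample variance.

```python
data = [7, 1, 10, 8, 4, 12]
n = len(data)

total = sum(data)                          # column sum of x_i
total_sq = sum(x * x for x in data)        # column sum of x_i^2
x_bar = total / n
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations

# Definitional form: s^2 = sum (x_i - x_bar)^2 / (n - 1)
s2_def = ss / (n - 1)
# Computational shortcut: s^2 = (sum x_i^2 - (sum x_i)^2 / n) / (n - 1)
s2_short = (total_sq - total ** 2 / n) / (n - 1)

print(total, total_sq, ss, s2_def, s2_short, s2_def ** 0.5)
# 42 374 80.0 16.0 16.0 4.0
```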
50
Sample Variance
51
Unbiased Estimate of Population Variance
  • Calculate the unbiased estimate of the population
    variance by dividing the sum of squared deviations
    by n - 1 instead of n.
  • This estimator is unbiased because, on average,
    it equals the population variance.
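"On average" can be checked exactly on a tiny case. The sketch takes the data 3, 4, 8 from the earlier slide as a population and enumerates every ordered sample of size 3. It assumes independent draws (sampling with replacement), which is the setting in which the n - 1 result holds exactly. It then averages both estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [3, 4, 8]          # the small data set from the earlier slide
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance: 14/3

def ss(sample):
    """Sum of squared deviations about the sample mean (exact fractions)."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)

n = 3
# Every ordered sample of size n drawn WITH replacement is equally likely.
samples = list(product(population, repeat=n))
avg_div_n = sum(ss(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(ss(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)   # 14/3 28/9 14/3
```

Dividing by n underestimates the population variance on average, by the factor (n - 1)/n = 2/3. Dividing by n - 1 hits it exactly.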

52
Discrete and Continuous Data
  • Discrete Data
  • Counted: number of defective items, number of
    accidents
  • Continuous Data
  • Measured: any value in a range is possible, e.g.,
    heights, weights, distances

53
Distributions
  • When you examine the distribution of values for
    speed, you can determine:
  • the range of possible data values
  • the frequency of data values
  • whether the data values accumulate in the middle
    of the distribution or at one end.

54
Graphical Methods and Data Description
  • Stem and Leaf Plot
  • Relative Frequency distribution
  • Relative Frequency Histogram

55
Construction of a Stem-Leaf Display
  • List the stem values, in order, in a vertical
    column
  • Draw a vertical line to the right of the stem
    values
  • For each observation, record the leaf portion of
    the observation in the row corresponding to the
    appropriate stem
  • Reorder the leaves from the lowest to highest
    within each stem row
  • If the number of leaves appearing in each stem is
    too large, divide the stems into two groups, the
    first corresponding to leaves 0 through 4, and
    the second corresponding to leaves 5 through 9.
    (This subdivision can be increased to five groups
    if necessary).

56
Car Battery Life
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
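The construction steps for a stem-and-leaf display can be sketched in Python on this battery-life data. Here the stem is the integer part and the leaf is the tenths digit. The optional `split` flag is a hypothetical parameter that produces the double-stem version (leaves 0-4 and 5-9 on separate rows).

```python
from collections import defaultdict

battery = [2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
           3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
           2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
           3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
           4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5]

def stem_and_leaf(xs, split=False):
    """Stem = integer part, leaf = tenths digit. With split=True the key is
    (stem, leaf >= 5), giving separate rows for leaves 0-4 and 5-9."""
    rows = defaultdict(list)
    for x in xs:
        tenths = round(x * 10)            # integer tenths avoid float error
        stem, leaf = divmod(tenths, 10)
        key = (stem, leaf >= 5) if split else stem
        rows[key].append(leaf)
    # Reorder the leaves from lowest to highest within each row.
    return {key: sorted(leaves) for key, leaves in sorted(rows.items())}

for stem, leaves in stem_and_leaf(battery).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
# 1 | 6 9
# 2 | 2 5 6 6 9
# 3 | 0 0 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 6 7 7 7 8 8 9 9
# 4 | 1 1 2 3 4 5 7 7
```

Stem 3 holds 25 of the 40 leaves. That crowding is exactly the situation in which splitting each stem into two rows is recommended.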
57
Stem and Leaf Plot of Battery Life
58
Double-Stem and Leaf Plot of Battery Life
59
Relative Frequency Distribution
  • Group data into different classes or intervals
  • Counting leaves belonging to each stem
  • Each stem defines a class interval
  • Dividing each class frequency by the total number
    of observations gives the proportion of the set of
    observations in each of the classes.

60
Relative Frequency Distribution of Battery Life
Class Interval | Class Midpoint | Frequency, f | Relative Frequency
1.5-1.9 | 1.7 | 2 | 0.050
2.0-2.4 | 2.2 | 1 | 0.025
2.5-2.9 | 2.7 | 4 | 0.100
3.0-3.4 | 3.2 | 15 | 0.375
3.5-3.9 | ? | ? | ?
4.0-4.4 | ? | ? | ?
4.5-4.9 | ? | ? | ?
61
Class Interval | Class Midpoint | Frequency, f | Relative Frequency
1.5-1.9 | 1.7 | 2 | 0.050
2.0-2.4 | 2.2 | 1 | 0.025
2.5-2.9 | 2.7 | 4 | 0.100
3.0-3.4 | 3.2 | 15 | 0.375
3.5-3.9 | 3.7 | 10 | 0.250
4.0-4.4 | 4.2 | 5 | 0.125
4.5-4.9 | 4.7 | 3 | 0.075
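The completed table can be reproduced with a short sketch. It assumes classes of width 0.5 starting at 1.5, as on the slide, and data recorded to one decimal place. The parameter names (`start_tenths`, `width_tenths`, `n_classes`) are hypothetical.

```python
battery = [2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
           3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
           2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
           3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
           4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5]

def relative_frequency_table(xs, start_tenths=15, width_tenths=5, n_classes=7):
    """Count observations per class, then divide by the total number of
    observations. Class limits are handled in integer tenths."""
    counts = [0] * n_classes
    for x in xs:
        k = (round(x * 10) - start_tenths) // width_tenths   # class index
        counts[k] += 1
    n = len(xs)
    rows = []
    for k, f in enumerate(counts):
        lo = start_tenths + k * width_tenths       # lower class limit (tenths)
        hi = lo + width_tenths - 1                 # upper class limit (tenths)
        label = f"{lo / 10:.1f}-{hi / 10:.1f}"
        midpoint = (lo + hi) / 20                  # average of the two limits
        rows.append((label, midpoint, f, f / n))
    return rows

for label, mid, f, rel in relative_frequency_table(battery):
    print(f"{label} | {mid:.1f} | {f} | {rel:.3f}")
```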
62
Relative Frequency Histogram of Battery Life
63
Picturing Distributions Histogram
  • Each bar in the histogram represents a group of
    values (a bin).
  • The height of the bar is the percent of values in
    the bin.

Figure: histogram with Bins on the horizontal axis and PERCENT on the vertical axis.
64
Measures of Shape Skewness
65
Measures of Shape Kurtosis
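The slides' skewness and kurtosis formulas were not captured. The sketch below assumes one common convention, the moment coefficients g1 = m3 / m2^(3/2) and excess kurtosis g2 = m4 / m2^2 - 3, where m_k is the k-th central moment with divisor n. Many statistical packages report bias-adjusted versions, so their values differ for small samples.

```python
def moment_shape(xs):
    """Moment coefficient of skewness (g1) and excess kurtosis (g2).
    Assumed convention: central moments use divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5          # 0 for symmetric data, > 0 for a right tail
    g2 = m4 / m2 ** 2 - 3        # 0 for a normal distribution
    return g1, g2

print(moment_shape([1, 2, 3, 4, 5]))    # symmetric and flat: g1 = 0, g2 < 0
print(moment_shape([1, 2, 2, 3, 10]))   # long right tail: g1 > 0
```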
66
Data Displays and Graphical Methods
  • Box and Whisker Plot or Boxplot
  • Pth Percentile
  • The Pth percentile is the value X_p such that p%
    of the measurements fall below that value
    and (100 - p)% of the measurements fall above
    the value.
  • Quartile
  • Quartiles divide the measurements into four parts
    such that 25% of the measurements are contained
    in each part. The first quartile (Lower
    Quartile) is denoted by Q1, the second by Q2, and
    the third (Upper Quartile) by Q3.

Figure: p% of the measurements lie below X_p and (100 - p)% lie above it; Q1, Q2, and Q3 mark the quartiles.
67
  • InterQuartile Range (IQR)
  • IQR = Q3 - Q1
  • Outlier
  • Observations that are considered to be unusually
    far removed from the bulk of the data.
  • We label observations as outliers when their
    distance from the box exceeds 1.5 times the
    interquartile range (in either direction).
  • The box encloses the interquartile range of the
    data.
  • The whiskers show the extreme observations in the
    sample.

68
Box and Whiskers Plot or Boxplot
  • Calculating Fence Values
  • Lower Inner Fence
  • Q1 - 1.5(IQR)
  • Upper Inner Fence
  • Q3 + 1.5(IQR)
  • Lower Outer Fence
  • Q1 - 3(IQR)
  • Upper Outer Fence
  • Q3 + 3(IQR)

Figure: boxplot labeled, from top to bottom, Maximum, Upper Quartile, Median, Lower Quartile, Minimum.
69
A Quick Method
  • 1. Order the data from smallest to largest value.
  • 2. Divide the ordered data set into two data sets
    using the median as the dividing value.
  • 3. Let the lower quartile be the median of the
    set of values consisting of the smaller values.
  • 4. Let the upper quartile be the median of the
    set of values consisting of the larger values.
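The quick method can be written as a short Python sketch. The steps do not say what happens to the median itself when n is odd, so the sketch ASSUMES it is left out of both halves. Statistical software often uses interpolated percentiles (for example, Python's `statistics.quantiles`), which can give slightly different quartiles.

```python
def median(sorted_xs):
    """Median of an already-ordered list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def quick_quartiles(xs):
    """(Q1, Q2, Q3) by the quick method. ASSUMPTION: when n is odd,
    the median is excluded from both halves."""
    ordered = sorted(xs)                    # step 1: order the data
    n = len(ordered)
    lower = ordered[: n // 2]               # step 2: values below the median
    upper = ordered[(n + 1) // 2 :]         #         values above the median
    return median(lower), median(ordered), median(upper)   # steps 3 and 4

print(quick_quartiles([7, 1, 10, 8, 4, 12]))     # (4, 7.5, 10)
print(quick_quartiles([1, 2, 3, 4, 5, 6, 7]))    # (2, 4, 6)
```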

70
Example
  • Nicotine content was measured in a random sample
    of 40 cigarettes. The data are displayed below.

71
1. Order the data from the smallest to the largest value.
2. Divide the ordered data set into two data sets using the median as the dividing value.
0.72 0.85 1.09 1.24 1.37
1.40 1.47 1.51 1.58 1.63
1.64 1.64 1.67 1.68 1.69
1.69 1.70 1.74 1.75 1.75
1.79 1.79 1.82 1.85 1.86
1.88 1.90 1.92 1.93 1.97
2.03 2.08 2.09 2.11 2.17
2.28 2.31 2.37 2.46 2.55
72
  • Q2 = ?
  • Q1 = ?
  • Q3 = ?
  • IQR = Q3 - Q1 = ?
  • Q1 = (1.63 + 1.64)/2 = 1.635
  • Q2 = (1.75 + 1.79)/2 = 1.77
  • Q3 = (1.97 + 2.03)/2 = 2.000
  • IQR = Q3 - Q1 = 0.365
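Using the quartiles just computed (Q1 = 1.635, Q3 = 2.000), a short sketch applies the fence rules from the boxplot slide to the nicotine data. The sketch assumes the common reading that points beyond the outer fences are "extreme" outliers; the slides list the outer fences without naming that category.

```python
nicotine = [0.72, 0.85, 1.09, 1.24, 1.37, 1.40, 1.47, 1.51, 1.58, 1.63,
            1.64, 1.64, 1.67, 1.68, 1.69, 1.69, 1.70, 1.74, 1.75, 1.75,
            1.79, 1.79, 1.82, 1.85, 1.86, 1.88, 1.90, 1.92, 1.93, 1.97,
            2.03, 2.08, 2.09, 2.11, 2.17, 2.28, 2.31, 2.37, 2.46, 2.55]

q1, q3 = 1.635, 2.000            # quartiles from the quick method
iqr = q3 - q1                    # 0.365

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # lower / upper inner fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # lower / upper outer fences

# Beyond an inner fence: outlier. Beyond an outer fence: extreme outlier.
outliers = [x for x in nicotine if x < inner[0] or x > inner[1]]
extreme = [x for x in nicotine if x < outer[0] or x > outer[1]]

print(inner, outliers, extreme)
```

With these quartiles, the rule flags 0.72, 0.85, and 2.55. The value 2.55 sits only 0.0025 above the upper inner fence (2.5475), so a different quartile convention could change whether it is flagged.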

73
Box-whisker Plot
Figure: box-and-whisker plot with points labeled "Outlier".
74
Information Drawn from Boxplot
  • The center of the distribution is indicated by
    the median line in the box.
  • A measure of the variability is given by the
    interquartile range, the length of the box.
  • The relative position of the median line
    indicates the symmetry of the middle 50% of the
    data.
  • The relative lengths of the whiskers give an
    indication of skewness.
  • The presence of outliers can be examined.

75
Quantile Plot
A quantile plot simply plots the data values on
the vertical axis against an empirical assessment
of the fraction of observations exceeded by each
data value. This fraction is computed from i, the
rank of the observation when the observations are
ordered from low to high.
76
Quantile Plot for paint data (table 8.2 page 238)
77
Normal Quantile Plots
  • The normal quantile-quantile plot is a plot of
    y_(i) (ordered observations)
  • against the corresponding quantiles of the
    standard normal distribution.
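The plotting positions for a quantile plot and a normal quantile plot can be sketched in Python with the standard `statistics.NormalDist` (Python 3.8+). The slide's formula for the fraction was not captured. This sketch ASSUMES the plotting position f_i = (i - 3/8)/(n + 1/4); other texts use i/(n + 1). The paint data of Table 8.2 are not reproduced here, so a small made-up data set is used instead.

```python
from statistics import NormalDist

def quantile_plot_points(ys):
    """(f_i, y_(i)) pairs, with the ASSUMED plotting position
    f_i = (i - 3/8) / (n + 1/4), where i is the rank from low to high."""
    ordered = sorted(ys)
    n = len(ordered)
    return [((i - 3 / 8) / (n + 1 / 4), y) for i, y in enumerate(ordered, start=1)]

def normal_qq_points(ys):
    """(standard normal quantile of f_i, y_(i)) pairs for a normal Q-Q plot."""
    z = NormalDist()                      # standard normal: mean 0, sd 1
    return [(z.inv_cdf(f), y) for f, y in quantile_plot_points(ys)]

for q, y in normal_qq_points([5, 1, 3, 2, 4]):
    print(f"{q:6.3f}  {y}")
```

If the data come from a normal population, the (quantile, y_(i)) points should fall close to a straight line.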

78
Normal Quantile Plots
79
(No Transcript)
80
Lecture 1 Introduction to Statistics: Inferential Statistics
81
Objectives
  • Understand the importance of making inference.
  • Understand the steps in conducting a statistical
    study.

82
Statistical Inference
  • making an "INFORMED GUESS" about a parameter
    based on a statistic.
  • (This is the main objective of statistics.)

83
STATISTICAL INFERENCE
Diagram: GATHER DATA to compute SAMPLE STATISTICS, then MAKE INFERENCES about the population PARAMETERS.
84
Variable
  • A VARIABLE is a characteristic of an individual
    or object that may vary for different
    observations.
  • A QUANTITATIVE VARIABLE measures a variable on
    some sort of scale.
  • A QUALITATIVE VARIABLE categorizes the values of
    the variable.

85
RAISIN BRAN EXAMPLE
  • A cereal company claims that the average amount
    of raisins in its boxes of raisin bran is two
    scoops.
  • A random sample of five boxes was taken off the
    production line, and an analysis revealed an
    average of 1.9 scoops per box.

86
Components of the Problem
  • Identify the population
  • Identify the sample
  • Identify the symbol for the parameter
  • Identify the symbol for the statistic

87
Five Steps in a Statistical Study
  • 1. Stating the problem
  • 2. Gathering the data
  • 3. Summarizing the data
  • 4. Analyzing the data
  • 5. Reporting the results

88
Stating the Problem
  • Specifically identifying the population to be
    sampled
  • Identifying the parameter(s) being studied

89
Gathering the Data
  • SURVEYS
  • Random Sampling
  • Stratified Sampling
  • Cluster Sampling
  • Systematic sampling
  • EXPERIMENTS
  • Completely Randomized Design
  • Randomized Block Design
  • Factorial Design
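The four survey sampling schemes can be contrasted in a minimal sketch. The frame of 100 numbered items, the day/night-shift strata, and the 10-item "pallet" clusters are all hypothetical, and the function names are illustrative. The experimental designs listed above are not covered here.

```python
import random

def simple_random_sample(units, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Random start among the first k = len(units) // n units, then every k-th unit."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """A simple random sample of n_per_stratum units within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(270)                   # fixed seed for reproducibility
frame = list(range(1, 101))                # hypothetical items numbered 1..100
strata = {"day shift": frame[:60], "night shift": frame[60:]}
clusters = {f"pallet {p}": frame[10 * p : 10 * p + 10] for p in range(10)}

print(simple_random_sample(frame, 5, rng))
print(systematic_sample(frame, 5, rng))
print(stratified_sample(strata, 3, rng))
print(cluster_sample(clusters, 2, rng))
```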

90
(No Transcript)
91
Lecture 1 Introduction to Statistics: Lecture Summary
92
Summary
  • Basics of statistics
  • Descriptive statistics and graphs
  • Inferential statistics
  • Textbook
  • Chapter 1 (pages 1-28)
  • Chapter 8 (pages 229-243)

93
(No Transcript)