Title: Lecture 1
1INEN 270
- ENGINEERING STATISTICS
- Fall 2011
2Agenda
Purpose
Statistical Concepts
Descriptive Statistics and Some of Their Graphs
Inferential Statistics
Lecture Summary
3Lecture 1 Introduction to Statistics
Purpose
Statistical Concepts
Descriptive Statistics and Some of Their Graphs
Inferential Statistics
Lecture Summary
4Why Study Statistics?
- You need to know how to evaluate published numerical facts.
- Your profession or employment may require you to interpret the results of sampling or to employ statistical methods of analysis to make inferences in your work.
5What Is the Purpose of Statistics?
- One purpose of statistics is to make sense of your data.
- Statistics provide information about your data so you can answer questions and make informed business decisions.
8Objectives
- Explain the use of statistics.
- Define population and sample.
- Describe the processes involved in statistical analysis.
- Compare descriptive and inferential statistics.
- Discuss the sampling plan.
9Defining the Problem
- Before you begin any analysis, you should complete certain tasks.
- 1. Outline the purpose of the study.
- 2. Document the study questions.
- 3. Define the population of interest.
- 4. Determine the need for sampling.
- 5. Define the data collection protocol.
10Example Speeding Data
11Population and Sample
12Basic Definition
- STATISTICS: The area of science concerned with the extraction of information from numerical data and its use in making inferences about a population from data that are obtained from a sample.
13[Diagram: Population (set of all measurements) and Sample (set of measurements selected from the population), linked by two arrows labeled "Extract Information" and "Make Inference".]
14Basic Definition
- Population and Parameter
- Population: the set representing all measurements of interest to the investigator.
- Parameter: an unknown population characteristic of interest to the investigator.
- Sample and Statistic
- Sample: a subset of measurements selected from the population of interest.
- Statistic: a sample characteristic of interest to the investigator.
- Descriptive Statistics
- Center (location): mean, median, mode
- Variability: variance, standard deviation
- Distribution
15Examples of Population and Sample
- Selecting the proper diet for shrimp or other sea animals is an important aspect of sea farming. A researcher wishes to estimate the average weight of shrimp maintained on a specific diet for a period of 6 months. One hundred shrimp are randomly selected from an artificial pond and each is weighed.
- Identify the population
- Identify the sample
- Identify the parameter
- Identify the statistic
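The difference between a parameter and a statistic in the shrimp example can be illustrated with a small simulation. This is only a sketch: the pond size (5,000 shrimp) and the weight distribution (normal, mean 20 g, standard deviation 3 g) are invented for illustration, since the slide gives no data. Only the sample size of 100 comes from the example.

```python
import random
import statistics

rng = random.Random(270)  # fixed seed so the run is repeatable

# Hypothetical population: the weight (grams) of every shrimp in the pond.
# These numbers are made up for illustration; the slide provides no data.
population = [rng.gauss(20.0, 3.0) for _ in range(5000)]

# Parameter: a characteristic of the whole population (unknown in practice).
mu = statistics.mean(population)

# Sample: 100 shrimp selected at random, without replacement.
sample = rng.sample(population, 100)

# Statistic: the same characteristic computed from the sample only.
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f} g")
print(f"sample mean (statistic):     {x_bar:.2f} g")
```

In a real study only `x_bar` would be available; `mu` is computable here only because the population was simulated.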
16Simple Random Sampling
17Convenience Sampling
18Process of Statistical Data Analysis
[Diagram: Population, Random Sample, and Sample Statistics, linked by arrows labeled "Describe" and "Make Inferences".]
19Sampling Plan
22Objectives
- Compute and interpret statistics describing the location of a set of values, such as the mean, median, and mode.
- Compute and interpret statistics describing the variability in a set of values, such as the range and standard deviation.
- Compute and interpret the measures of shape, skewness and kurtosis.
- Produce graphical displays of data.
23Some Frequently Used Statistics and Parameters
24Measure of Location
- Descriptive statistics that locate the center of your data are called measures of central tendency.
- Sample Mean
- The sample mean of a set of n measurements (x1, x2, ..., xn) is equal to the sum of the measurements divided by n.
25Measure of Location
- Sample Median
- Median: the middle value (also known as the 50th percentile)
- The median of a set of n measurements (x1, x2, ..., xn) is the value that falls in the middle position when the measurements are ordered from the smallest to the largest.
- Here x1, ..., xn are arranged in increasing order of magnitude.
26RULE FOR CALCULATING THE MEDIAN
- 1. Order the measurements from the smallest to the largest.
- 2. A) If the sample size is odd, the median is the middle measurement.
- B) If the sample size is even, the median is the average of the two middle measurements.
28Percentiles
29Example
- A random sample of six values was taken from a population. These values were x1 = 7, x2 = 1, x3 = 10, x4 = 8, x5 = 4, and x6 = 12.
- What are the sample mean and sample median for these data?
30Sample Mean
32CALCULATIONS FOR THE SAMPLE MEDIAN
1. Order Sample
2. Median
33x2 = 1, x5 = 4, x1 = 7, x4 = 8, x3 = 10, x6 = 12
MEDIAN = (7 + 8) / 2 = 7.5
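The mean definition and the odd/even rule for the median can be checked on the six-value sample with a few lines of Python. This is a minimal sketch; the function names `sample_mean` and `sample_median` are chosen here for illustration and do not come from the lecture.

```python
def sample_mean(xs):
    """Sum of the measurements divided by n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the ordered sample (average of the two middle values if n is even)."""
    ordered = sorted(xs)          # step 1: order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # step 2A: odd n, take the middle measurement
        return ordered[mid]
    # step 2B: even n, average the two middle measurements
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [7, 1, 10, 8, 4, 12]       # x1 ... x6 from the example
print(sample_mean(data))          # 42 / 6 = 7.0
print(sample_median(data))        # (7 + 8) / 2 = 7.5
```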
34Example
- Given a set of data: 1.7, 2.2, 3.11, 3.9, and 14.7
- Sample mean
- Sample median
36Example
Consider the following sample:
4 18 36 39 41 42 43 44 44 45
46 47 48 49 49 50 51 53 54 60
Which measure of central tendency best describes the central location of the data: the sample mean or the sample median? Why?
37The median
38- Why?
- Because there is an outlier (extreme value), 4, in the data set, and the mean is heavily influenced by this single outlier.
- Solution
- A trimmed mean drops the highest and lowest extreme values and averages the rest.
- e.g., a 5% trimmed mean drops the highest 5% and the lowest 5% of the values and averages the rest.
39Sample Mode
- Sample Mode
- What is the mode for the previous example?
- 44 (occurs twice)
- 49 (occurs twice)
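The answers for the 20-value sample can be reproduced with Python's standard `statistics` module. This is a sketch: the helper `trimmed_mean` is written here for illustration (it is not a library function) and assumes the trimming fraction times n is a whole number, as it is for 5% of 20 values.

```python
import statistics

sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]


def trimmed_mean(xs, fraction):
    """Drop the given fraction of values from each end, then average the rest."""
    ordered = sorted(xs)
    k = int(len(ordered) * fraction)      # number of values dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


mean = statistics.mean(sample)            # pulled down by the outlier 4
median = statistics.median(sample)        # average of the 10th and 11th values
trimmed = trimmed_mean(sample, 0.05)      # drops 4 and 60
modes = statistics.multimode(sample)      # every value tied for most frequent

print(mean, median, round(trimmed, 2), modes)
```

The mean (43.15) sits below the median (45.5) because of the single low value, while the 5% trimmed mean (about 44.39) falls between the two.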
40Measures of Central Tendency (Mode, mean and
median)
- How are they related to a given data set?
- It depends on the skewness of the population.
(a) A bell-shaped distribution
41(b) A distribution skewed to the left: A = mean, B = median, C = mode
(c) A distribution skewed to the right: A = mode, B = median, C = mean
42- Suppose the IRS wants to measure the central tendency of the income of the American population. Which measure would you recommend, and why?
- Hint: Bill Gates
- Skewed to the right
43Other Measures of Locations
- Trimmed means
- Computed by trimming away a certain percent of both the largest and smallest set of values.
- Less sensitive to outliers than the mean, but more so than the median.
- What is the relationship between the trimmed mean and the median?
- Example: 0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43
44
Original order: 0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43
Ordered: 0.28 0.32 0.36 0.37 0.38 0.42 0.43 0.43 0.47 0.53
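One way to explore the relationship between the trimmed mean and the median is to trim the ten example values more and more heavily. The sketch below assumes trimming is specified as a count `k` of values dropped from each end (so `k = 1` is a 10% trimmed mean for n = 10); the slide does not state which trimming percentage it used, and `trimmed_mean_k` is a name invented here.

```python
data = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]


def trimmed_mean_k(xs, k):
    """Drop the k smallest and k largest values, then average the rest."""
    ordered = sorted(xs)
    if 2 * k >= len(ordered):
        raise ValueError("k removes every observation")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# k = 0 is the ordinary mean; the largest allowed k leaves only the
# two middle values, whose average is the median.
for k in range(5):
    print(k, round(trimmed_mean_k(data, k), 4))
```

With no trimming the result is the ordinary mean (0.399); with the maximum trimming for an even sample size only the two middle values remain, so the result is the median (0.40).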
45The Spread of a Distribution Variation
Measure                   Definition
range                     the difference between the maximum and minimum data values
interquartile range       the difference between the 75th and 25th percentiles (IR or IQR)
variance                  a measure of dispersion of the data around the mean
standard deviation        a measure of dispersion expressed in the same units of measurement as your data (the square root of the variance)
coefficient of variation  standard deviation as a percentage of the mean
46Typical Variation Standard Deviation
- The variance is a measure of variation. The square root of the variance, or standard deviation, is a measure of variation in terms of the original linear scale.
- σ is the population standard deviation.
- s is an estimate of the population standard deviation.
47Typical Variation Average Squared Deviation
- Consider the data 3, 4, 8
Obs      Data  Deviation  (Deviation)^2
1        3     -2         4
2        4     -1         1
3        8      3         9
Sum      15     0         14
Average  5      0         14/3
48Measures of Variability
- Sample Range
- Range = x_max - x_min
- Sample Variance
- Sample Standard Deviation
49
Obs.  x    x^2
1     7    49
2     1    1
3     10   100
4     8    64
5     4    16
6     12   144
Sum   42   374

Obs.  x    x - mean  (x - mean)^2
1     7     0        0
2     1    -6        36
3     10    3        9
4     8     1        1
5     4    -3        9
6     12    5        25
Sum                  80
50Sample Variance
51Unbiased Estimate of Population Variance
- Calculate the unbiased estimate of population variance by averaging with n-1 instead of n.
- This estimator is unbiased because, on average, it equals the population variance.
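The difference between dividing by n and by n-1 can be seen on the two small data sets used in this section (3, 4, 8 and the six-value sample 7, 1, 10, 8, 4, 12). The following is a sketch with function names chosen here for illustration.

```python
import math


def sum_of_squared_deviations(xs):
    """Sum of (x - mean)^2 over the data."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def average_squared_deviation(xs):
    """Divide by n: the plain average of the squared deviations."""
    return sum_of_squared_deviations(xs) / len(xs)


def sample_variance(xs):
    """Divide by n - 1: the unbiased estimate of the population variance."""
    return sum_of_squared_deviations(xs) / (len(xs) - 1)


small = [3, 4, 8]
six = [7, 1, 10, 8, 4, 12]

print(average_squared_deviation(small))   # 14 / 3
print(sample_variance(small))             # 14 / 2 = 7.0
print(sample_variance(six))               # 80 / 5 = 16.0
print(math.sqrt(sample_variance(six)))    # sample standard deviation, 4.0

# Computational shortcut using the column sums 42 and 374:
# (sum of x^2 - (sum of x)^2 / n) / (n - 1)
print((374 - 42 ** 2 / 6) / 5)            # 16.0
```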
52Discrete and Continuous Data
- Discrete Data
- Counted: number of defective items, number of accidents
- Continuous Data
- Measured: all possible heights, weights, distances, etc.
53Distributions
- When you examine the distribution of values for speed, you can determine
- the range of possible data values
- the frequency of data values
- whether the data values accumulate in the middle of the distribution or at one end.
54Graphical Methods and Data Description
- Stem and Leaf Plot
- Relative Frequency distribution
- Relative Frequency Histogram
55Construction of a Stem-Leaf Display
- List the stem values, in order, in a vertical column.
- Draw a vertical line to the right of the stem values.
- For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem.
- Reorder the leaves from the lowest to highest within each stem row.
- If the number of leaves appearing in each stem is too large, divide the stems into two groups, the first corresponding to leaves 0 through 4, and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)
56Car Battery Life
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
57Stem and Leaf Plot of Battery Life
58Double-Stem and Leaf Plot of Battery Life
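The plots themselves did not survive in this transcript, but the construction steps can be applied to the 40 battery-life values directly. The sketch below assumes every value has exactly one decimal place, so that the stem is the integer part and the leaf is the tenths digit; `stem_and_leaf` is a name chosen here.

```python
from collections import defaultdict

battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]


def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative values with one decimal place."""
    stems = defaultdict(list)
    for v in values:
        # Work in tenths so that 2.2 becomes 22 -> stem 2, leaf 2.
        stem, leaf = divmod(round(v * 10), 10)
        stems[stem].append(leaf)
    # Stems in increasing order, leaves ordered within each stem row.
    return {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}


plot = stem_and_leaf(battery_life)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Stem 3 collects 25 of the 40 leaves, which is the situation in which the construction rules suggest splitting each stem into two rows (leaves 0 through 4 and leaves 5 through 9).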
59Relative Frequency Distribution
- Group data into different classes or intervals
- Counting leaves belonging to each stem
- Each stem defines a class interval
- Dividing each class frequency by the total number of observations gives the proportion of the set of observations in each of the classes.
60Relative Frequency Distribution of Battery Life
Class Interval  Class midpoint  Frequency, f  Relative frequency
1.5-1.9         1.7             2             0.05
2.0-2.4         2.2             1             0.025
2.5-2.9         2.7             4             0.100
3.0-3.4         3.2             15            0.375
3.5-3.9         ?               ?             ?
4.0-4.4         ?               ?             ?
4.5-4.9         ?               ?             ?
61
Class Interval  Class midpoint  Frequency, f  Relative frequency
1.5-1.9         1.7             2             0.05
2.0-2.4         2.2             1             0.025
2.5-2.9         2.7             4             0.100
3.0-3.4         3.2             15            0.375
3.5-3.9         3.7             10            0.250
4.0-4.4         4.2             5             0.125
4.5-4.9         4.7             3             0.075
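The completed table can be reproduced by counting the battery-life values in each class and dividing by the number of observations. This sketch takes the class limits from the table; comparing values in tenths is a choice made here to avoid floating-point surprises at the class boundaries.

```python
battery_life = [
    2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
    3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
    2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
    3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
    4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5,
]

# (lower limit, upper limit) of each class interval, as in the table.
classes = [(1.5, 1.9), (2.0, 2.4), (2.5, 2.9), (3.0, 3.4),
           (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]


def relative_frequency_table(values, class_limits):
    """Return one (midpoint, frequency, relative frequency) row per class."""
    n = len(values)
    rows = []
    for low, high in class_limits:
        lo10, hi10 = round(low * 10), round(high * 10)
        f = sum(1 for v in values if lo10 <= round(v * 10) <= hi10)
        rows.append(((low + high) / 2, f, f / n))
    return rows


table = relative_frequency_table(battery_life, classes)
for (low, high), (mid, f, rel) in zip(classes, table):
    print(f"{low}-{high}  {mid:.1f}  {f:2d}  {rel:.3f}")
```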
62Relative Frequency Histogram of Battery Life
63Picturing Distributions Histogram
- Each bar in the histogram represents a group of values (a bin).
- The height of the bar is the percent of values in the bin.
[Figure: histogram; vertical axis: percent; horizontal axis: bins.]
64Measures of Shape Skewness
65Measures of Shape Kurtosis
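The formulas on these two slides did not survive in the transcript. As a rough illustration only, the sketch below uses the simple moment-based definitions (third and fourth central moments scaled by powers of the second); this is an assumption, not necessarily the formula used in the lecture, and statistical packages often apply small-sample adjustments that give slightly different numbers.

```python
def central_moment(xs, k):
    """Average of (x - mean)^k over the data (divides by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment-based skewness: negative for a long left tail, positive for a long right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment-based kurtosis minus 3, so that a normal distribution scores about 0."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


# The 20-value sample from the central tendency example (low outlier 4).
sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]

print(skewness(sample))          # negative: the tail extends to the left
print(skewness([1, 2, 3, 4, 5])) # symmetric data give 0
print(excess_kurtosis(sample))
```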
66Data Displays and Graphical Methods
- Box and Whisker Plot or Boxplot
- Pth Percentile
- The Pth percentile is the value Xp such that P% of the measurements fall below that value and (100-P)% of the measurements fall above it.
- Quartile
- Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (Lower Quartile) is denoted by Q1, the second by Q2, and the third (Upper Quartile) by Q3.
[Figure: number line marking Xp, with P% of the measurements below it and (100-P)% above it, and the quartiles Q1, Q2, Q3.]
67- InterQuartile Range (IQR)
- IQR = Q3 - Q1
- Outlier
- Observations that are considered to be unusually far removed from the bulk of the data.
- We label observations as outliers when their distance from the box exceeds 1.5 times the interquartile range (in either direction).
- The box encloses the interquartile range of the data.
- The whiskers show the extreme observations in the sample.
68Box and Whiskers Plot or Boxplot
- Calculating Fence Values
- Lower Inner Fence: Q1 - 1.5(IQR)
- Upper Inner Fence: Q3 + 1.5(IQR)
- Lower Outer Fence: Q1 - 3(IQR)
- Upper Outer Fence: Q3 + 3(IQR)
[Figure: boxplot labeled, from top to bottom, Maximum, Upper Quartile, Median, Lower Quartile, Minimum.]
69A Quick Method
- 1. Order the data from smallest to largest value.
- 2. Divide the ordered data set into two data sets using the median as the dividing value.
- 3. Let the lower quartile be the median of the set of values consisting of the smaller values.
- 4. Let the upper quartile be the median of the set of values consisting of the larger values.
70Example
- Nicotine content was measured in a random sample of 40 cigarettes. The data are displayed below.
71
1. Order the data from the smallest to the largest.
2. Divide the ordered data set into two data sets using the median as the dividing value.
0.72 0.85 1.09 1.24 1.37
1.40 1.47 1.51 1.58 1.63
1.64 1.64 1.67 1.68 1.69
1.69 1.70 1.74 1.75 1.75
1.79 1.79 1.82 1.85 1.86
1.88 1.90 1.92 1.93 1.97
2.03 2.08 2.09 2.11 2.17
2.28 2.31 2.37 2.46 2.55
72- Q2?
- Q1?
- Q3?
- IQR = Q3 - Q1?
- Q1 = (1.63 + 1.64)/2 = 1.635
- Q2 = (1.75 + 1.79)/2 = 1.77
- Q3 = (1.97 + 2.03)/2 = 2.000
- IQR = Q3 - Q1 = 0.365
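The quick method and the inner-fence rule can be applied to the 40 nicotine values as follows. This is a sketch under two assumptions made here: for an odd sample size the median itself is left out of both halves (the lecture does not say), and an observation is flagged when it lies strictly outside the inner fences. The names `quick_quartiles` and `inner_fences` are illustrative.

```python
nicotine = [
    0.72, 0.85, 1.09, 1.24, 1.37, 1.40, 1.47, 1.51, 1.58, 1.63,
    1.64, 1.64, 1.67, 1.68, 1.69, 1.69, 1.70, 1.74, 1.75, 1.75,
    1.79, 1.79, 1.82, 1.85, 1.86, 1.88, 1.90, 1.92, 1.93, 1.97,
    2.03, 2.08, 2.09, 2.11, 2.17, 2.28, 2.31, 2.37, 2.46, 2.55,
]


def median_of_sorted(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quick_quartiles(xs):
    """Q1, Q2, Q3 by the quick method (needs at least two observations)."""
    ordered = sorted(xs)
    half = len(ordered) // 2
    lower = ordered[:half]                  # the smaller values
    upper = ordered[len(ordered) - half:]   # the larger values
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


def inner_fences(q1, q3):
    """Lower and upper inner fences, 1.5 IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


q1, q2, q3 = quick_quartiles(nicotine)
low_fence, high_fence = inner_fences(q1, q3)
flagged = [x for x in nicotine if x < low_fence or x > high_fence]

print(q1, q2, q3, q3 - q1)
print(low_fence, high_fence, flagged)
```

Under these conventions the inner fences are 1.0875 and 2.5475, so the values 0.72, 0.85, and 2.55 are flagged. Software that interpolates quartiles differently may place the fences slightly differently.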
73Box-whisker Plot
[Figure: box-and-whisker plot with outliers marked.]
74Information Drawn from Boxplot
- The center of the distribution is indicated by the median line in the box.
- A measure of the variability is given by the interquartile range, the length of the box.
- The relative position of the median line indicates the symmetry of the middle 50% of the data.
- Skewness is indicated by the relative lengths of the whiskers.
- The presence of outliers can be examined.
75Quantile Plot
A quantile plot simply plots the data values on
the vertical axis against an empirical assessment
of the fraction of observations exceeded by the
data value.
Where i is the order of observations when they
are ranked from low to high.
76Quantile Plot for paint data (table 8.2 page 238)
77Normal Quantile Plots
- The normal quantile-quantile plot is a plot of y(i) (the ordered observations) against the corresponding quantiles of the standard normal distribution.
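The formulas for the plotting fraction and for the normal quantiles are missing from this transcript. The sketch below computes the coordinates of both plots under an explicit assumption: it uses the common plotting position f_i = (i - 3/8) / (n + 1/4), which may or may not be the one shown on the slides, and obtains exact standard normal quantiles from Python's `statistics.NormalDist`. The six-value sample from earlier in the lecture serves as the data.

```python
from statistics import NormalDist


def plotting_fractions(n):
    """Assumed plotting position for the i-th ordered value, i = 1..n."""
    return [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]


def quantile_plot_points(xs):
    """(fraction, ordered observation) pairs for a quantile plot."""
    ordered = sorted(xs)
    return list(zip(plotting_fractions(len(ordered)), ordered))


def normal_quantile_plot_points(xs):
    """(standard normal quantile, ordered observation) pairs for a normal Q-Q plot."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf(f), y) for f, y in quantile_plot_points(xs)]


data = [7, 1, 10, 8, 4, 12]
for (f, y), (z, _) in zip(quantile_plot_points(data),
                          normal_quantile_plot_points(data)):
    print(f"f = {f:.2f}   z = {z:6.3f}   y = {y}")
```

If the points (z, y) fall close to a straight line, the sample is consistent with a normal distribution.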
78Normal Quantile Plots
81Objectives
- Understand the importance of making inference.
- Understand the steps in conducting a statistical study.
82Statistical Inference
- Making an "INFORMED GUESS" about a parameter based on a statistic.
- (This is the main objective of statistics.)
83STATISTICAL INFERENCE
[Diagram with the labels "Gather Data", "Sample Statistics", "Make Inferences", and "Parameters".]
84Variable
- A VARIABLE is a characteristic of an individual or object that may vary for different observations.
- A QUANTITATIVE VARIABLE measures a variable on some sort of scale.
- A QUALITATIVE VARIABLE categorizes the values of the variable.
85RAISIN BRAN EXAMPLE
- A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops.
- A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.
86Components of the Problem
- Identify the population
- Identify the sample
- Identify the symbol for the parameter
- Identify the symbol for the statistic
87Five Steps in a Statistical Study
- 1. Stating the problem
- 2. Gathering the data
- 3. Summarizing the data
- 4. Analyzing the data
- 5. Reporting the results
88Stating the Problem
- Specifically identifying the population to be sampled
- Identifying the parameter(s) being studied
89Gathering the Data
- SURVEYS
- Random Sampling
- Stratified Sampling
- Cluster Sampling
- Systematic sampling
- EXPERIMENTS
- Completely Randomized Design
- Randomized Block Design
- Factorial Design
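The four survey sampling schemes named above can be sketched with Python's `random` module. Everything in this example is hypothetical: the population of 100 numbered units, the two strata, and the ten clusters are invented to show the mechanics, and the function names are chosen here rather than taken from the lecture.

```python
import random


def simple_random_sample(units, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Random start, then every step-th unit (assumes n <= len(units))."""
    step = len(units) // n
    start = rng.randrange(step)
    return [units[start + i * step] for i in range(n)]


def stratified_sample(strata, n_per_stratum, rng):
    """A simple random sample drawn separately within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(2011)                      # fixed seed for repeatability
units = list(range(100))                       # hypothetical population of 100 units
strata = {"low": units[:50], "high": units[50:]}
clusters = {f"c{i}": units[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(units, 10, rng)
syst = systematic_sample(units, 10, rng)
strat = stratified_sample(strata, 5, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, syst, strat, clus, sep="\n")
```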
92Summary
- Basics of statistics
- Descriptive statistics and graphs
- Inferential statistics
- Textbook
- Chapter 1 (pages 1-28)
- Chapter 8 (pages 229-243)