Title: Basic Statistics GB Module
1- Basic Statistics GB Module
2Continuous Improvement Road Map
Define
- Define CTQ
- Determine Current State
Measure
- Determine Key Input / Output Variables
- Perform MSA
- Calculate initial process capabilities
Analyze
- Evaluate Existing Control Plan
- Use statistical methods to determine potential key inputs
- Prioritize key input variables
Improve
- Verify Effects of Key inputs with DOEs
- Determine Optimum Settings
Control
- Update Control Plan
- Verify Improvements
3Customers and Variation
- Customers complain when they believe the product
or service they receive differs from their
expectations: there is variation
- Variation has many faces
- Missing functionality/actions
- Defects and faults
- Delays etc.
- All variation is caused
- Six Sigma is about reducing and controlling
variation
- We need to understand variation and the causes of
variation
4Causes of Variation
Process (Xs)
Input (Xs)
Output (Ys)
The variation in Y is caused by variation of the
Xs
Therefore we need to understand the Xs and
improve and control the ones with most influence
on Y
- Y
- Dependent
- Output
- Effect
- Symptom
- Monitor
- X1 . . . XN
- Independent
- Input-Process
- Cause
- Problem
- Control
5Long Term and Short Term Variation
Process Response Y
- Short-term variation includes common cause
variation only
- Long-term variation includes common cause plus (some)
special cause variation
EXAMPLE: I drive to work. It takes me 35 ± 3
minutes. This is the common cause variation. One
day it takes me 50 minutes due to road works;
this is a special cause.
6Examples of Special Causes
Special causes are assignable and can
include: weather (season, time of day), lighting
conditions, machine type, machine age,
maintenance, supplier, operator, etc.
[Figure: process response plotted against time, with special causes marked]
7Exercise Special Causes
- Consider the process in your project
- Make a list of the potential special causes
- Be prepared to share your list with the rest of
the group
- Time: 10 minutes
8Measuring Variation
- Variation is not simple to measure because it is
RANDOM
- Random does not mean erratic! While it may not be
possible to predict what an individual process
output will be, there is usually a pattern if we
measure a number of outputs
- Process outputs will group together and we are
interested in their central value (the value they
group around) and their spread
- This grouping forms a pattern that is often
predictable
9Example
- If a coin is tossed we cannot predict whether it
will be a head or a tail
- If we tossed the coin, say, 100 times we would
expect about 50 heads and about 50 tails
- So there is a pattern, but we cannot predict any
individual toss
- We relate the expectations to chance
(probability): there is a 50% chance it will be a
head
- Randomness is about chance
10Coin Toss Exercise
- Everyone needs a coin of some type
- Flip the coin 25 times and record the number of
heads
- Report the total number of heads obtained and
create a dot plot
- Repeat the experiment
- How do the dot plots compare?
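For anyone without a coin to hand, the exercise can be simulated. The sketch below is only an illustration: the class size of 20, the random seed and the helper names count_heads and dot_plot are assumptions, not part of the course material. Each simulated person flips a fair coin 25 times and the head counts are drawn as a text dot plot.

```python
import random
from collections import Counter


def count_heads(n_flips, rng):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def dot_plot(counts):
    """Return a text dot plot: one row per head count, one dot per result."""
    tally = Counter(counts)
    rows = []
    for heads in range(min(tally), max(tally) + 1):
        rows.append(f"{heads:2d} | " + "." * tally[heads])
    return "\n".join(rows)


rng = random.Random(1)  # fixed seed so the run is repeatable (assumption)
class_results = [count_heads(25, rng) for _ in range(20)]  # 20 people (assumption)
print(dot_plot(class_results))
```

Changing the seed plays the role of repeating the experiment: individual counts move around, but the dots keep clustering near 12 or 13 heads.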
11Randomness and Distributions
- Outputs group together to form a pattern
- This pattern describes the distribution of the
variation
- We cannot predict where an individual value will
fall, but we can predict the overall pattern
12Real Life and Distributions
- Distributions can be modelled mathematically
- If we collect data from a process or product we
can match it to a distribution and use the
properties of the distribution for analysis and
predictions
13Probability Distributions
- Standard distributions
- Attribute data
- Binomial
- Poisson
- Variable Data
- Normal (Gaussian)
- Lognormal (skewed)
- Student t
- F-distribution
- Exponential
- Probability distributions (normally just called
distributions) are a way of being able to make
predictions about random events
- There are many standard distributions which
enable us to model real world variation
14Key Properties of Distributions
Central tendency: the value the data groups
around
Spread: the dispersion of the values
15Measures of Central Tendency
- Mean ($\mu$)
- - sum of the values divided by the number of values: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Median
- - middle value of ranked data
- Mode
- - most frequently occurring value
16Measures of Spread
- Range
- R = biggest value - smallest value
- The range is susceptible to outlying values; as a
result we need a better measure
- One approach is to calculate the average
deviation from the mean. Positive and negative
deviations cancel, so the deviations are squared
before averaging
17Variance and Standard Deviation
- The average of the squared deviations is called
the variance and is a measure of spread
- It suffers from having units that are the square
of the units of the mean. To overcome this we take
the square root to give the standard deviation,
which has the symbol σ
- We use standard deviation as a measure of spread
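A small worked example may help tie the range, variance and standard deviation together. The data below are made-up journey times in minutes, chosen only so the arithmetic is easy to follow. The sketch also shows the sample version that divides by n - 1, which is what statistics packages usually report as StDev; that comparison is an addition, not something stated on the slides.

```python
import math

# Hypothetical journey times in minutes (illustrative data only)
data = [35, 33, 38, 36, 34, 37, 32, 35]
n = len(data)

mean = sum(data) / n                      # central value
value_range = max(data) - min(data)       # biggest value - smallest value

deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)       # signed deviations cancel out

# Variance as described on the slide: average of the squared deviations
variance = sum(d ** 2 for d in deviations) / n
std_dev = math.sqrt(variance)             # back in the original units

# Sample versions divide by n - 1 instead of n
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
sample_std_dev = math.sqrt(sample_variance)

print(f"mean = {mean}, range = {value_range}")
print(f"sum of deviations = {sum_of_deviations}")
print(f"variance = {variance}, standard deviation = {std_dev:.3f}")
print(f"sample variance = {sample_variance}, sample standard deviation = {sample_std_dev}")
```

The squared deviations sum to 28, giving a variance of 3.5 minutes squared and a standard deviation of about 1.87 minutes; dividing by n - 1 gives 4.0 and 2.0.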
18Descriptive Statistics
- Can be calculated using Minitab or Excel
- Gives information about a data set's central
tendency, spread and shape
19Descriptive Statistics
Descriptive Statistics: Data1

Variable    N     Mean   Median   TrMean   StDev  SE Mean
Data1     500   56.421   56.355   56.486   5.563    0.249

Variable  Minimum  Maximum       Q1       Q3
Data1      38.260   69.801   52.693   60.227

[Slide callouts on this output: Central Tendency, Spread, Shape]
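The same kind of summary can be sketched outside Minitab. The function below is an illustration under stated assumptions: the data are made up, describe is a hypothetical helper, the trimmed mean is assumed to drop 5% of the values at each end, and the quartile convention of Python's statistics.quantiles may not match Minitab's output exactly.

```python
import math
import statistics


def describe(data, trim_fraction=0.05):
    """Return a Minitab-style summary of a list of numbers (a sketch)."""
    xs = sorted(data)
    n = len(xs)
    k = round(n * trim_fraction)          # values trimmed from each end (assumption)
    trimmed = xs[k:n - k] if k else xs
    stdev = statistics.stdev(xs)          # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "N": n,
        "Mean": statistics.mean(xs),
        "Median": statistics.median(xs),
        "TrMean": statistics.mean(trimmed),
        "StDev": stdev,
        "SE Mean": stdev / math.sqrt(n),  # standard error of the mean
        "Minimum": xs[0],
        "Maximum": xs[-1],
        "Q1": q1,
        "Q3": q3,
    }


# Hypothetical measurements (illustrative data only)
sample = [52.1, 58.4, 55.0, 61.2, 49.8, 56.7, 57.3, 53.9, 60.1, 54.6,
          56.0, 59.2, 51.5, 62.8, 55.8, 57.9, 50.7, 58.8, 54.2, 56.4]

summary = describe(sample)
for name, value in summary.items():
    print(f"{name:8s} {value:.3f}" if isinstance(value, float) else f"{name:8s} {value}")
```

In the Minitab output above, SE Mean follows the same relationship: 5.563 divided by the square root of 500 is about 0.249.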
20Computer Exercise!
- Open the file 3L54 Stone.mpj
- Calculate the Descriptive Statistics for the data
set
- Additionally, create a graphical descriptive
statistics output
21Descriptive Statistics Result
22Descriptive Statistics Graphical Summary
23Descriptive Statistics Graphical Summary
24Mean and standard deviation tell us a great deal
about a process
Off-target: measured by the mean
Spread: measured by the standard deviation
25Distributions and Variation ..
26Normal Distribution
- Frequently occurs in practice
- Models random behaviour
- Shows that the variation groups
around the mean and tails off
- Symmetric about the mean, with a
50% chance of falling either side
of the mean
- Is the basis for Six Sigma and
many Six Sigma tools
27Area and Probability
Area under the Normal curve = probability or chance
of being in that region
Area = 1.0: probability = 1.0 or 100%
Area = 0.5: probability = 0.5 or 50%
Area = 0.159: probability = 0.159 or 15.9%
28Area, Probability Standard Deviation
29A Common Situation
30Standard Normal Distribution
Tables exist for the probability vs. number of
standard deviations for the case of a Standard
Normal distribution, which has a mean μ = 0 and a
standard deviation σ = 1
[Figure: Standard Normal curve with mean = 0 and σ = 1; the horizontal axis is the Z value, or number of standard deviations, of a point of interest X]
The tables give the probability of a value being
greater than or equal to the Z value of the point
of interest X
31P-values are Probabilities of Interest
- The probability values are often called p-values
- p-value = tail area
- Area under the curve beyond the point or value of
interest
- Probability of being at the value of interest or
beyond
- A small p-value (0 to 0.05) indicates
- The probability is small that the value of
interest comes from that distribution by chance
- Something else is going on
[Figure: three curves on a horizontal axis running from -4 to 4]
32Z Values
33Using Z Values
34Mini Exercise
What is the probability that an item is greater than 1.75?
Mean = 1.5
σ = 0.25
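The mini exercise can be checked without tables. This is a sketch that assumes the process is Normal and reads the 0.25 on the slide as the standard deviation; it uses the usual conversion Z = (X - mean) / standard deviation and Python's statistics.NormalDist in place of a Z-table.

```python
from statistics import NormalDist

mean = 1.5          # process mean from the slide
sigma = 0.25        # read as the standard deviation (assumption)
x = 1.75            # point of interest

# Number of standard deviations between the point of interest and the mean
z = (x - mean) / sigma

# Tail area beyond z under the Standard Normal curve (mean 0, sigma 1)
tail_probability = 1 - NormalDist().cdf(z)

print(f"Z = {z}")
print(f"P(item > {x}) = {tail_probability:.4f}")
```

Z works out to 1.0 and the tail area to about 0.159, the same 15.9% area shown on the Area and Probability slide.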
35Variation and 6-Sigma
This is a 6-Sigma (Process or Product)
Target
Customer Critical Requirements
USL
LSL
36Long-Term and Short-Term Variation
The presence of special causes will act to
increase the variation seen by the customer. A
rough working assumption is a 1.5 sigma shift.
37 1.5 Sigma Shift
This is a Six Sigma (Process or Product)
1.5σ
Target
Customer Critical Requirements
USL
LSL
38 1.5 Sigma Shift Demonstration
- Open Minitab file Glass Strength.mpj
- Calculate the subgroup standard deviations using
Descriptive Statistics
- Calculate the average standard deviation across
the subgroups
- Stack the subgroups
- Calculate the combined standard deviation of the
stacked data
- Divide the standard deviation of the stacked data
by the average standard deviation of the
subgroups
- What did you find?
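The steps of this demonstration can be rehearsed on simulated data. The sketch below does not use the Glass Strength file; the subgroup means, subgroup size and seed are invented so that the process centre drifts between subgroups while the spread within each subgroup stays the same.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for a repeatable run (assumption)

# Hypothetical subgroup means: the centre drifts between subgroups
# (special causes); the within-subgroup standard deviation is 1.0.
subgroup_means = [100.0, 101.5, 98.8, 100.9, 99.2, 101.8, 98.5, 100.4]
subgroups = [[rng.gauss(m, 1.0) for _ in range(25)] for m in subgroup_means]

# Short-term view: average of the subgroup standard deviations
subgroup_sds = [statistics.stdev(group) for group in subgroups]
average_subgroup_sd = statistics.mean(subgroup_sds)

# Long-term view: standard deviation of the stacked data
stacked = [value for group in subgroups for value in group]
stacked_sd = statistics.stdev(stacked)

ratio = stacked_sd / average_subgroup_sd
print(f"average subgroup SD = {average_subgroup_sd:.3f}")
print(f"stacked SD          = {stacked_sd:.3f}")
print(f"ratio               = {ratio:.2f}")
```

Because the drift between subgroups only shows up once the data are stacked, the ratio comes out above 1: the long-term variation is larger than the short-term variation.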
39Summary
- In the short term we need Zst = 6.0 to
guarantee a long-term Zlt = 4.5
- Note: achieving 3.4 defects per million requires
Zlt = 4.5; we should not strive to achieve Zst =
6.0 if Zshift = Zst - Zlt < 1.5
40Z values and Sigma levels
- Zst values are related to Sigma levels
- In 6-Sigma we look at short- and long-term values,
Zst and Zlt
- In 6-Sigma, if we cannot calculate long-term
variation we assume a 1.5 sigma shift
- Zlt = Zst - 1.5
- Note: Z-tables generally do not include the 1.5 sigma
shift and give Zlt
- Sigma/DPMO tables do include the 1.5 sigma shift and
give Zst
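The link between Zst, Zlt and defects per million can be sketched in a few lines. The function names are hypothetical, and the calculation assumes a Normal process with defects counted in one tail only, which is the convention behind the familiar 3.4 defects per million figure.

```python
import math


def upper_tail(z):
    """Area under the Standard Normal curve beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def dpmo_from_zst(z_st, z_shift=1.5):
    """Defects per million opportunities for a short-term Z value,
    assuming a one-sided tail and the stated long-term shift."""
    z_lt = z_st - z_shift                 # Zlt = Zst - 1.5 by default
    return 1_000_000 * upper_tail(z_lt)


for z_st in (3.0, 4.0, 5.0, 6.0):
    print(f"Zst = {z_st}: Zlt = {z_st - 1.5}, DPMO = {dpmo_from_zst(z_st):.1f}")
```

With Zst = 6.0 the long-term value is Zlt = 4.5 and the result is about 3.4 defects per million, matching the summary slide.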
41Testing for Normality
- The Normal distribution is important to 6-Sigma
since many of the tools and techniques are
affected by non-Normal data

Tool                        Consequence
Process sigma               Incorrect process sigma
Individuals control chart   False detection of special causes
Hypothesis testing          Incorrect conclusions
Regression                  False identification of important factors; poor predictive properties
DOE                         Incorrect conclusions about important factors; poor prediction abilities
42Effect of not checking Normality
- Example: effect of a skewed distribution on
calculating the process Sigma Level
- Process Sigma Level is determined by finding the
area beyond the specification limits using
Z-tables
- If the data is not Normal, the area will be
incorrectly estimated from the Z-tables and
will therefore give a misleading Process Sigma Level
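To put a number on this effect, the sketch below compares the true area beyond an upper specification limit for a skewed (lognormal) process with the area a Z-table would give if the same mean and standard deviation were treated as Normal. The lognormal parameters and the specification limit are invented for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical skewed process: log(X) is Normal with these parameters
mu_log, sigma_log = 0.0, 0.5
usl = 3.0  # hypothetical upper specification limit

# True fraction beyond the USL, worked out on the log scale
true_tail = 1 - std_normal.cdf((math.log(usl) - mu_log) / sigma_log)

# Mean and standard deviation of the skewed data on the original scale
mean = math.exp(mu_log + sigma_log ** 2 / 2)
sd = math.sqrt((math.exp(sigma_log ** 2) - 1) * math.exp(2 * mu_log + sigma_log ** 2))

# What a Z-table gives if the data are wrongly assumed to be Normal
z = (usl - mean) / sd
normal_tail = 1 - std_normal.cdf(z)

print(f"true fraction beyond USL    = {true_tail:.5f}")
print(f"Z-table estimate (Z = {z:.2f}) = {normal_tail:.5f}")
```

In this made-up case the Z-table estimate is more than ten times too small, so the process would look far more capable than it really is.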
43Exercise Normal?
- Look at each histogram on the following pages and
decide which data sets come from a Normal
distribution
- Circle or mark the ones you think are Normal
- Work in pairs to confirm your answers
- Be prepared to share your answers with the whole
group
- Time: 10 minutes
44Assess Data for Normality
Mark the histograms that you think come from a
Normal distribution
45Assess Data for Normality, cont.
50 Data Points
46Assess Data for Normality, cont.
100 Data Points
47Exercise Answers
- Just looking at histograms can be deceptive
- Each of the histograms on the previous pages was
randomly generated in Minitab as a Normal
distribution with a mean = 50 and a standard
deviation = 10; they are all Normal
- It is difficult to tell if data is Normal by
looking at histograms of n = 25, n = 50, and
sometimes even n = 100
- Plotting the data is very good practice, but do
not be misled by small amounts of data
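The point can be reproduced by drawing Normal samples and plotting them as rough text histograms. This is a sketch rather than the Minitab procedure used for the slides: the bin edges, the seed and the helper name bin_counts are assumptions; only the mean of 50 and standard deviation of 10 come from the exercise.

```python
import random


def bin_counts(values, low=20, high=80, bin_width=5):
    """Count values per bin; values outside the range go in the end bins."""
    counts = [0] * ((high - low) // bin_width)
    for v in values:
        index = int((v - low) // bin_width)
        index = min(max(index, 0), len(counts) - 1)  # clamp to the end bins
        counts[index] += 1
    return counts


rng = random.Random(7)  # fixed seed for a repeatable run (assumption)
for n in (25, 50, 100):
    sample = [rng.gauss(50, 10) for _ in range(n)]  # Normal, mean 50, sd 10
    print(f"n = {n}")
    for i, count in enumerate(bin_counts(sample)):
        print(f"{20 + 5 * i:3d}-{25 + 5 * i:3d} | " + "#" * count)
    print()
```

Running it with different seeds shows how ragged a genuinely Normal sample of 25 or 50 points can look.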
48Other Distributions
[Figure: histograms of Exponential, Poisson and Uniform data at sample sizes 25, 50 and 100]
49Data Not Normal
- If the data is not Normal there may be reasons
which can be corrected
- Extreme values, typographical errors: correct them
- Multiple modes: separate them
- Data rounded: increase precision
- Not enough data: collect more
- Special causes present: remove them
- Always check these first, before concluding that the
underlying distribution is not Normal
50Check for Normal Distribution
- Both variable and discrete data (if there is
enough) can often be modelled by a Normal
distribution
- Many statistical tools are based upon a Normal
distribution. However, many of them
will produce outputs even if the data is
not Normal. These outputs could be misleading
- Hence one of the first steps after collecting
data is to check for normality
- There is a test in Minitab for this
51Normality Test
In a normality test the data is plotted on
normal probability paper; if the data
follows a straight line it is Normal. The test
also includes a hypothesis test (see Week 2).
This test provides a quantitative measure of
whether the data is Normal through the
p-value. If p > 0.05 there is no significant
evidence against normality, so we treat the data
as normally distributed
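The straight-line idea behind the probability plot can be sketched numerically. The code below is not Minitab's normality test and does not produce a p-value; it only measures how close the plot is to a straight line, using the correlation between the sorted data and the matching Standard Normal quantiles. The plotting-position formula, the data and the function name are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def probability_plot_correlation(data):
    """Correlation between sorted data and Normal quantiles.
    Values close to 1 mean the probability plot is close to a straight line."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Plotting positions (i - 0.375) / (n + 0.25), one common choice (assumption)
    qs = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x = sum(xs) / n
    mean_q = sum(qs) / n
    sxy = sum((x - mean_x) * (q - mean_q) for x, q in zip(xs, qs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sqq = sum((q - mean_q) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)


rng = random.Random(11)  # fixed seed for a repeatable run (assumption)
normal_data = [rng.gauss(50, 10) for _ in range(100)]
skewed_data = [rng.expovariate(0.1) for _ in range(100)]

r_normal = probability_plot_correlation(normal_data)
r_skewed = probability_plot_correlation(skewed_data)
print(f"Normal data: r = {r_normal:.3f}")
print(f"Skewed data: r = {r_skewed:.3f}")
```

The Normal sample gives a correlation closer to 1 than the skewed sample, which is what a straighter probability plot looks like in numbers.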
52P-values
- The p-value can be read as the risk of making the
wrong decision if we reject normality: concluding
that the data is not normally distributed when it is
- In this case the p-value is 0.54: a 54% risk of
concluding that the data is not normally
distributed when it is
- This risk is too high, so we conclude the data is
normally distributed
- We can never be risk free or 100% certain. Hence
we need to set a decision level. The conventional
choice is 5% (or 95% confidence)
- Hence we test to see if p > 0.05; if it is, we treat
the data as normally distributed
53Using Minitab to Check for Normality
54Normality Test Exercise
- Using the data from Glass Strength.mpj, check for
normality for each of the subgroups
- Then check for normality on the combined data
- Be prepared to report your findings
55Non Normal Distributions
- Having checked the data for typographical errors
etc. and concluded that the data is not normally
distributed, progress can still be made
- In some cases of non-normal distributions
(typically skewed distributions) it is possible
to transform the data to make it Normal
- In some cases the data may be close enough to a
Normal distribution to use the statistical tools
with care
- In some cases it does not matter that the data is
not normally distributed
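As an illustration of transforming skewed data, the sketch below takes the logarithm of simulated right-skewed values and compares the skewness before and after. The lognormal data, the seed and the choice of a log transform are assumptions; the slides do not say which transformation to use, and other transformations may suit other data.

```python
import math
import random


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(3)  # fixed seed for a repeatable run (assumption)

# Hypothetical right-skewed data, for example cycle times
raw = [rng.lognormvariate(0.0, 0.8) for _ in range(500)]

# Log transform: compresses the long right tail
transformed = [math.log(v) for v in raw]

print(f"skewness before transform = {skewness(raw):.2f}")
print(f"skewness after transform  = {skewness(transformed):.2f}")
```

After the transform the skewness is much closer to zero, so tools that assume a Normal distribution can be applied to the transformed values, with results converted back to the original units where needed.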
56Summary
- This presentation has covered
- Variation
- Common and special causes, long- and short-term
data
- Distributions
- Central value (tendency): mean, median and mode
- Spread or dispersion: range, variance and
standard deviation
- Normal Distribution
- Z-values and p-values, Six Sigma, 1.5 sigma shift
and Z-values
- Checking for normality and dealing with non-Normal
data