Basic Statistics GB Module - PowerPoint PPT Presentation

About This Presentation

Title:

Basic Statistics GB Module

Slides: 57

Provided by: Sal76

Transcript and Presenter's Notes

Title: Basic Statistics GB Module

1

Basic Statistics GB Module

2
Continuous Improvement Road Map
Define
Define CTQ
Determine Current State

Measure
Determine Key Input / Output Variables
Perform MSA
Calculate initial process capabilities

Analyze
Evaluate Existing Control Plan
Use statistical methods to determine potential key inputs
Prioritize key input variables

Improve
Verify Effects of Key inputs with DOEs
Determine Optimum Settings

Control
Update Control Plan
Verify Improvements

3
Customers and Variation

Customers complain when they believe the product
or service they receive differs from their
expectations: there is variation
Variation has many faces:
Missing functionality/actions
Defects and faults
Delays, etc.
All variation is caused
Six Sigma is about reducing and controlling
variation
We need to understand variation and the causes of
variation

4
Causes of Variation
Process (Xs)
Input (Xs)
Output (Ys)
The variation in Y is caused by variation of the
Xs
Therefore we need to understand the Xs and
improve and control the ones with most influence
on Y

Y
Dependent
Output
Effect
Symptom
Monitor

X1 . . . XN
Independent
Input-Process
Cause
Problem
Control

5
Long-Term and Short-Term Variation
Process Response Y

Short-Term includes common cause
variation only
Long-Term includes common cause + (some)
special cause variation

EXAMPLE: I drive to work. It takes me 35 +/- 3
minutes. This is the common cause variation. One
day it takes me 50 minutes due to road works -
this is a special cause.
6
Examples of Special Causes
Special causes are assignable and can include:
Weather (season, time of day)
Lighting conditions
Machine type
Machine age
Maintenance
Supplier
Operator, etc.
[Figure: process response plotted against time, with special causes marked]
7
Exercise: Special Causes

Consider the process in your project
Make a list of the potential special causes
Be prepared to share your list with the rest of
the group
Time: 10 minutes

8
Measuring Variation

Variation is not simple to measure because it is
RANDOM
Random does not mean erratic! While it may not be
possible to predict what an individual process
output is, there is usually a pattern if we
measure a number of outputs
Process outputs will group together and we are
interested in their central value, the value they
group around, and their spread.
This grouping forms a pattern that is often
predictable

9
Example

If a coin is tossed we cannot predict whether it
will be a head or tail
If we tossed the coin, say, 100 times we would
expect that on about 50 occasions it would be a head
and on about 50 occasions it would be a tail
So there is a pattern - but we cannot predict any
individual toss
We relate the expectations to chance
(probability): there is a 50% chance it will be a
head
Randomness is about chance
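
As a rough illustration of "pattern but no certainty", the sketch below uses the binomial formula to work out how likely particular head counts are in 100 tosses of a fair coin. The function name `prob_heads` is an assumption made for this example; it is not part of the original material.

```python
from math import comb


def prob_heads(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of exactly k heads in n tosses, with P(head) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Chance of getting exactly 50 heads in 100 tosses of a fair coin.
exactly_50 = prob_heads(50, 100)

# Chance of landing somewhere in the range 40 to 60 heads.
between_40_and_60 = sum(prob_heads(k, 100) for k in range(40, 61))

print(f"P(exactly 50 heads in 100 tosses) = {exactly_50:.4f}")
print(f"P(40 to 60 heads in 100 tosses)   = {between_40_and_60:.4f}")
```

Exactly 50 heads turns up only about 8% of the time, yet the count almost always lands close to 50. That is the sense in which the overall pattern is predictable while any single result is not.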

10
Coin Toss Exercise

Everyone needs a coin of some type
Flip the coin 25 times and record the number of
heads
Report the total number of heads obtained and
create a dot plot
Repeat the experiment
How do the dot plots compare?
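
For anyone without a coin to hand, the exercise can be approximated in software. This is a minimal sketch that assumes a group of 20 people and a fixed random seed; both are choices made for the example, and the helper names `heads_in_flips` and `dot_plot` are hypothetical.

```python
import random
from collections import Counter


def heads_in_flips(n_flips: int, rng: random.Random) -> int:
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))


def dot_plot(counts: list[int]) -> str:
    """Text dot plot: one row per value, one dot per person reporting it."""
    tally = Counter(counts)
    lines = []
    for value in range(min(counts), max(counts) + 1):
        lines.append(f"{value:2d} | " + "." * tally[value])
    return "\n".join(lines)


rng = random.Random(1)  # fixed seed so the run is repeatable (assumption)
group_size = 20         # assumed number of people in the room

first_run = [heads_in_flips(25, rng) for _ in range(group_size)]
second_run = [heads_in_flips(25, rng) for _ in range(group_size)]

print("First experiment")
print(dot_plot(first_run))
print("Repeat experiment")
print(dot_plot(second_run))
```

The two plots will not be identical, but both should pile up around 12 or 13 heads.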

11
Randomness and Distributions

Outputs group together to form a pattern
This pattern describes the distribution of the
variation
We cannot predict where an individual value will
fall, but we can predict the overall pattern

12
Real Life and Distributions

Distributions can be modelled mathematically
If we collect data from a process or product we
can match it to a distribution and use the
properties of the distribution for analysis and
predictions

13
Probability Distributions

Standard distributions
Attribute data
Binomial
Poisson
Variable Data
Normal (Gaussian)
Lognormal (skewed)
Student t
F-distribution
Exponential

Probability distributions (normally just called
distributions) are a way of being able to make
predictions about random events
There are many standard distributions which
enable us to model real world variation

14
Key Properties of Distributions
Central Tendency: the value the data groups
around
Spread: the dispersion of the values
15
Measures of Central Tendency

Mean (mu)
- the sum of the values divided by the number of values: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median
- middle value of ranked data
Mode
- most frequently occurring value
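
The three measures of central tendency on this slide can be tried out with Python's standard `statistics` module. The data values below are invented purely for illustration.

```python
import statistics

# Hypothetical measurements, made up for this example.
data = [4, 7, 7, 8, 9, 10, 11]

mean_value = statistics.mean(data)      # sum of the values divided by n
median_value = statistics.median(data)  # middle value of the ranked data
mode_value = statistics.mode(data)      # most frequently occurring value

print(f"mean = {mean_value}, median = {median_value}, mode = {mode_value}")
```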
16
Measures of Spread

Range
R = biggest value - smallest value
The range is susceptible to outlying values; as a
result we need a better measure
One approach is to calculate the average
deviation from the mean

17
Variance and Standard Deviation

The average of the deviations squared is called
the variance and is a measure of spread
It suffers from having units that are the square
of the units of the mean. To overcome this we take
the square root to give the standard deviation,
which has the symbol σ
We use standard deviation as a measure of spread
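
The sketch below follows the range, variance and standard deviation definitions step by step on a small made-up data set. It divides by n, matching the "average of the deviations squared" wording; statistical packages usually report the sample version that divides by n - 1, shown here via `statistics.stdev` for comparison.

```python
import statistics

# Hypothetical measurements, made up for this example.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
deviations = [x - mean for x in data]

# Range: biggest value minus smallest value.
data_range = max(data) - min(data)

# The plain average deviation always cancels to zero,
# which is why the deviations are squared first.
average_deviation = sum(deviations) / len(data)

# Variance: the average of the squared deviations (dividing by n).
variance = sum(d**2 for d in deviations) / len(data)

# Standard deviation: square root of the variance, same units as the data.
std_dev = variance ** 0.5

# Sample standard deviation (divides by n - 1).
sample_std_dev = statistics.stdev(data)

print(f"range = {data_range}, variance = {variance}, sigma = {std_dev}")
print(f"sample standard deviation = {sample_std_dev:.3f}")
```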

18
Descriptive Statistics

Can be calculated using Minitab or Excel
Gives information about a data set's central
tendency, spread and shape

19
Descriptive Statistics

Descriptive Statistics: Data1

Variable    N     Mean     Median   TrMean   StDev   SE Mean
Data1       500   56.421   56.355   56.486   5.563   0.249

Variable    Minimum   Maximum   Q1       Q3
Data1       38.260    69.801    52.693   60.227

Callouts on the output: Spread, Central Tendency, Shape
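
Two of the figures in the Data1 output can be cross-checked by hand. The sketch below assumes the usual definitions: SE Mean is the standard deviation divided by the square root of N, and the interquartile range is Q3 minus Q1.

```python
from math import sqrt

# Values copied from the Data1 output on this slide.
n = 500
st_dev = 5.563
q1, q3 = 52.693, 60.227

# Standard error of the mean: StDev / sqrt(N).
se_mean = st_dev / sqrt(n)

# Interquartile range: another measure of spread, less sensitive to outliers.
iqr = q3 - q1

print(f"SE Mean = {se_mean:.3f}")
print(f"IQR     = {iqr:.3f}")
```

The computed SE Mean agrees with the 0.249 reported in the table.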
20
Computer Exercise!

Open the file 3L54 Stone.mpj
Calculate the Descriptive Statistics for the data
set
Additionally, create a graphical descriptive
statistics output

21
Descriptive Statistics Result
22
Descriptive Statistics Graphical Summary
23
Descriptive Statistics Graphical Summary
24
Mean and standard deviation tell us a great deal
about a process
Off-Target: measured by the mean
Spread: measured by the standard deviation
25
Distributions and Variation ..
26
Normal Distribution

Frequently occurs in practice
Models random behaviour
Shows that the variation groups
around the mean and tails off
Symmetric about the mean with a
50% chance of falling either side
of the mean
Is the basis for Six Sigma and
many Six Sigma tools

27
Area and Probability
Area under Normal curve = probability or chance
of being in that region
Area 1.0 = probability 1.0 or 100%
Area 0.5 = probability 0.5 or 50%
Area 0.159 = probability 0.159 or 15.9%
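
These areas can be reproduced with `statistics.NormalDist` from the Python standard library. The sketch assumes the 0.5 area is the half of the curve above the mean and the 0.159 area is the tail beyond one standard deviation above the mean.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)

# Half of the curve lies above the mean.
area_above_mean = 1 - curve.cdf(0)

# Tail area more than one standard deviation above the mean.
area_beyond_one_sigma = 1 - curve.cdf(1)

print(f"Area above the mean        = {area_above_mean:.3f}")
print(f"Area beyond +1 std. dev.   = {area_beyond_one_sigma:.3f}")
```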
28
Area, Probability and Standard Deviation
29
A Common Situation
30
Standard Normal Distribution
Tables exist for the probability vs. number of
standard deviations for the case of a Standard
Normal distribution, which has a mean μ = 0 and a
standard deviation σ = 1
[Figure: Standard Normal curve with mean = 0 and σ = 1; the horizontal axis is the Z value, the number of standard deviations between the mean and the point of interest X]
The tables give the probability of a value being
greater than or equal to the point of interest,
expressed as its Z value
31
P-values are Probabilities of Interest

The probability values are often called p-values
p-value = tail area
Area under curve beyond point or value of
interest
Probability of being at value of interest or
beyond
A small p-value (0 to 0.05) indicates
The probability is small that the value of
interest comes from that distribution by chance
Something else is going on

32
Z Values
33
Using Z Values
34
Mini Exercise
What is the probability that an item is greater than 1.75?
Mean = 1.5
σ = 0.25
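
One way to work the mini exercise (mean 1.5, standard deviation 0.25, point of interest 1.75) is to convert the point of interest to a Z value and read off the upper tail area. This sketch assumes the process output is Normally distributed and uses the standard relation Z = (X - mean) / standard deviation.

```python
from statistics import NormalDist

mean = 1.5
sigma = 0.25
x = 1.75  # point of interest

# Z value: how many standard deviations x lies above the mean.
z = (x - mean) / sigma

# Upper tail area of the Standard Normal distribution beyond z.
probability_greater = 1 - NormalDist().cdf(z)

print(f"Z = {z}")
print(f"P(item > {x}) = {probability_greater:.3f}")
```

The point of interest sits exactly one standard deviation above the mean, so the answer is the same 0.159 tail area quoted on the Area and Probability slide.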
35
Variation and 6-Sigma
This is a 6-Sigma (Process or Product)
Target
Customer Critical Requirements
USL
LSL
36
Long-Term and Short-Term Variation
The presence of special causes will act to
increase the variation seen by the customer. A
gross assumption is a 1.5 sigma shift.
37
1.5 Sigma Shift
This is a Six Sigma (Process or Product)
1.5σ
Target
Customer Critical Requirements
USL
LSL
38
1.5 Sigma Shift Demonstration

Open Minitab file Glass Strength.mpj
Calculate the subgroup standard deviations using
Descriptive Statistics
Calculate the average standard deviation across
the subgroups
Stack the subgroups
Calculate the combined Standard Deviation of the
stacked data
Divide the standard deviation of the stacked data
by the average standard deviation of the
subgroups.
What did you find?
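
The same steps can be sketched outside Minitab. The subgroup values below are invented for illustration and are not the contents of Glass Strength.mpj; they are chosen so that each subgroup has the same short-term spread but a different mean.

```python
import statistics

# Hypothetical subgroups (NOT the Glass Strength data): same spread, shifted means.
subgroups = {
    "A": [48, 50, 52, 49, 51],
    "B": [53, 55, 57, 54, 56],
    "C": [45, 47, 49, 46, 48],
}

# Steps 1-2: standard deviation of each subgroup, then their average (short term).
subgroup_sds = [statistics.stdev(values) for values in subgroups.values()]
average_subgroup_sd = statistics.mean(subgroup_sds)

# Steps 3-4: stack all subgroups and take the combined standard deviation (long term).
stacked = [value for values in subgroups.values() for value in values]
stacked_sd = statistics.stdev(stacked)

# Step 5: ratio of long-term to short-term standard deviation.
ratio = stacked_sd / average_subgroup_sd

print(f"Average subgroup StDev = {average_subgroup_sd:.3f}")
print(f"Stacked StDev          = {stacked_sd:.3f}")
print(f"Ratio                  = {ratio:.2f}")
```

When the subgroup means drift, the stacked standard deviation is larger than the average within-subgroup value, so the ratio comes out above 1. How far above depends entirely on the data.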

39
Summary

In the short term we need Zst = 6.0 to
guarantee a long-term Zlt = 4.5
Note: to achieve 3.4 defects per million requires
Zlt = 4.5 - we should not strive to achieve Zst =
6.0 if
Zshift = Zst - Zlt < 1.5

40
Z values and Sigma levels

Zst values are related to Sigma levels
In 6-Sigma we look at short and long term values
Zst and Zlt
In 6-Sigma if we cannot calculate long term
variation we assume a 1.5 sigma shift
Zlt = Zst - 1.5
Note Z-tables generally do not have the 1.5 sigma
shift and give Zlt
Sigma/DPMO tables do have the 1.5 sigma shift and
give Zst
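
The link between Zst, the 1.5 sigma shift and defects per million can be checked numerically. This is a sketch assuming a one-sided specification limit and a Normal distribution; the function names are made up for the example.

```python
from math import erfc, sqrt


def upper_tail(z: float) -> float:
    """Area under the Standard Normal curve beyond z."""
    return 0.5 * erfc(z / sqrt(2))


def dpmo_from_zst(z_st: float, shift: float = 1.5) -> float:
    """Defects per million opportunities for a short-term Z, after the assumed shift."""
    z_lt = z_st - shift
    return upper_tail(z_lt) * 1_000_000


for z_st in (3.0, 4.0, 5.0, 6.0):
    print(f"Zst = {z_st}: Zlt = {z_st - 1.5}, DPMO = {dpmo_from_zst(z_st):.1f}")
```

For Zst = 6.0 the long-term value is Zlt = 4.5, and the tail area works out at about 3.4 defects per million.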

41
Testing for Normality

The Normal distribution is important to 6-Sigma
since many of the tools and techniques are
affected by Non Normal data

Tool | Consequence
Process sigma | Incorrect process sigma
Individuals control chart | False detection of special causes
Hypothesis testing | Incorrect conclusions
Regression | False identification of important factors; poor predictive properties
DOE | Incorrect conclusions about important factors; poor prediction abilities
42
Effect of not checking Normality

Example: effect of a skewed distribution on
calculating the process Sigma Level
Process Sigma Level is determined by finding the
area beyond the specification limits using
Z-tables
If the data is not Normal, the area will be
incorrectly estimated from the Z-tables and
therefore give a misleading Process Sigma Level
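
To see the size of the error, the sketch below compares the true area beyond an upper specification limit with the area a Z-table would give. It assumes an exponential distribution with mean 1 as the skewed example and an upper limit of 4; both are illustrative choices, not values from the presentation.

```python
from math import exp
from statistics import NormalDist

# Assumed skewed process: exponential distribution, so mean = standard deviation.
mean = 1.0
std_dev = 1.0
usl = 4.0  # assumed upper specification limit

# True fraction beyond the limit for an exponential distribution.
true_tail = exp(-usl / mean)

# Fraction a Z-table would give if the data were wrongly treated as Normal.
normal_tail = 1 - NormalDist(mean, std_dev).cdf(usl)

print(f"True fraction beyond USL       = {true_tail:.5f}")
print(f"Normal (Z-table) estimate      = {normal_tail:.5f}")
print(f"True / estimated defect ratio  = {true_tail / normal_tail:.1f}")
```

In this case the Z-table understates the defect rate by more than a factor of ten, which would make the Process Sigma Level look much better than it really is.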

43
Exercise: Normal?

Look at each histogram on the following pages and
decide which data sets come from a Normal
distribution
Circle or mark the ones you think are Normal.
Work in pairs to confirm your answers
Be prepared to share your answers with the whole
group
Time: 10 minutes

44
Assess Data for Normality

25 Data Points

Mark the histograms that you think come from a
Normal distribution
45
Assess Data for Normality, cont.
50 Data Points
Mark the histograms that you think come from a
Normal distribution
46
Assess Data for Normality, cont.
100 Data Points
Mark the histograms that you think come from a
Normal distribution
47
Exercise Answers

Just looking at histograms can be deceptive
Each of the histograms on the previous pages was
randomly generated in Minitab from a Normal
distribution with a mean of 50 and a standard
deviation of 10: they are all Normal.
It is difficult to tell if data is Normal by
looking at histograms of n = 25, n = 50, and
sometimes even n = 100
Plotting the data is very good practice, but do
not be misled by small amounts of data
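
The exercise can be repeated in software to see how ragged genuinely Normal data looks at small sample sizes. This is a minimal sketch using Python's random number generator rather than Minitab; the seed, bin width and plotting range are assumptions made for the example.

```python
import random


def text_histogram(values, bin_width=5, low=20, high=80):
    """Crude text histogram; values outside [low, high) are simply not drawn."""
    lines = []
    for start in range(low, high, bin_width):
        count = sum(start <= v < start + bin_width for v in values)
        lines.append(f"{start:3d}-{start + bin_width:<3d} {'#' * count}")
    return "\n".join(lines)


rng = random.Random(0)  # fixed seed so the run is repeatable (assumption)

for n in (25, 50, 100):
    # Every sample is drawn from a Normal distribution: mean 50, std. dev. 10.
    sample = [rng.gauss(50, 10) for _ in range(n)]
    print(f"n = {n}")
    print(text_histogram(sample))
    print()
```

Changing the seed gives a different set of shapes each time, even though the underlying distribution never changes.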

48
Other Distributions
[Figure: histograms of Exponential, Poisson and Uniform data at sample sizes 25, 50 and 100]
49
Data Not Normal

If the data is not Normal there may be reasons
which can be corrected
Extreme values, typographical errors
- correct them
Multiple modes - separate them
Data rounded - increase precision
Not enough data - collect more
Special causes present - remove them
Underlying distribution is not normal

Always check the other reasons first, before
concluding that the underlying distribution is
not normal
50
Check for Normal Distribution

Both variable and discrete data (if there is
enough) can often be modelled by a Normal
distribution
Many Statistical tools are based upon a normal
distribution. However, many of the statistical
tools will produce outputs even if the data is
not normal. These outputs could be misleading
Hence one of the first steps having collected
data is to check for normality
There is a test in Minitab for this

51
Normality Test
In a normality test the data is plotted on
normal probability paper - if the data
follows a straight line it is normal. The test
also includes a hypothesis test (see Week 2).
This test provides a quantitative value as
to whether the data is normal through the
p-value. If p > 0.05 there is no evidence against
normality and we treat the data as
normally distributed
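
The "straight line on normal probability paper" idea can be put into numbers by correlating the ranked data with the Z values expected for each rank. The sketch below is only that straight-line check, not Minitab's normality test, and it does not produce a p-value. The plotting position (i - 0.375) / (n + 0.25) is one common convention and is an assumption here, as are the two artificial data sets.

```python
from math import log, sqrt
from statistics import NormalDist


def normal_scores(n):
    """Expected Z value for each rank 1..n (assumed plotting-position formula)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_correlation(data):
    """Correlation between ranked data and normal scores; 1.0 is a perfect line."""
    ys = sorted(data)
    xs = normal_scores(len(ys))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


n = 50
# Artificial data lying exactly on a Normal pattern (mean 50, std. dev. 10).
normal_like = [50 + 10 * z for z in normal_scores(n)]
# Artificial skewed data following an exponential pattern.
skewed = [-log(1 - (i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

r_normal = probability_plot_correlation(normal_like)
r_skewed = probability_plot_correlation(skewed)

print(f"Normal-pattern data: r = {r_normal:.4f}")
print(f"Skewed data:         r = {r_skewed:.4f}")
```

The closer the correlation is to 1, the straighter the probability plot. Skewed data bends away from the line and scores noticeably lower.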
52
P-values

The p-value is the risk of making the wrong
decision - in this case concluding that the data
is not normally distributed when it is.
In this case the p-value is 0.54 - a 54% risk of
making the wrong decision - in this case
concluding that the data is not normally
distributed when it is.
This risk is too high so we conclude the data is
normally distributed
We can never be risk free or 100% certain. Hence
we need to set a decision level. Experience shows
that this is 5% (or 95% confidence)
Hence we test to see if p > 0.05; if it is, we
treat the data as normally distributed

53
Using Minitab to Check for Normality
54
Normality Test Exercise

Using the data from Glass Strength.mpj, check for
normality for each of the subgroups
Then check for normality on the combined data
Be prepared to report your findings

55
Non Normal Distributions

Having checked the data for typographical errors
etc. and concluded that the data is not normally
distributed, progress can still be made
In some cases of non-normal distributions
(typically Skewed Distributions) it is possible
to transform the data to make it normal
In some cases the data may be close enough to a
normal distribution to use the statistical tools
with care
In some cases it does not matter that the data is
not normally distributed
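
As an illustration of transforming skewed data, the sketch below applies a log transform to artificial right-skewed values and compares the skewness before and after. The log transform is one common choice and is an assumption here; the presentation does not say which transformation to use, and the data is constructed so that the transform works perfectly.

```python
from math import exp, log
from statistics import NormalDist, mean


def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2**1.5


n = 99
# Evenly spaced Normal quantiles, then exponentiated to create right-skewed data.
z_values = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
skewed = [exp(z) for z in z_values]

# Log transform: pulls in the long right tail.
transformed = [log(v) for v in skewed]

print(f"Skewness before transform = {skewness(skewed):.2f}")
print(f"Skewness after transform  = {skewness(transformed):.2f}")
```

Real data will rarely transform this cleanly, so the transformed values should still be checked for normality before the usual tools are applied.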

56
Summary