Title: Quantitative
1 STATISTICS
Quantitative Graphical Analysis
DEPARTMENT OF STATISTICS
REDGEMAN_at_UIDAHO.EDU
OFFICE 1-208-885-4410
DR. RICK EDGEMAN, PROFESSOR AND CHAIR, SIX SIGMA BLACK BELT
2 Measurement
- Measurement: a numerical assignment to something, usually a non-numerical element.
- Scales of Measurement: Nominal, Ordinal, Interval, Ratio
3 Data Reliability and Validity
- Validity: An item measures what it is intended to measure.
- Reliability: A re-measurement would order individual responses in the same way.
- Bias: The difference between the average measured value and a reference value is called bias. Bias is controlled via calibration.
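As a small illustration of the bias definition above, the sketch below averages repeated readings of a reference standard and subtracts the reference value. The readings and the reference value are hypothetical, invented only to show the arithmetic.

```python
from statistics import mean

# Hypothetical example: five repeated readings of a reference standard
# whose accepted value is 10.00 (all numbers invented for illustration).
reference_value = 10.00
readings = [10.02, 10.05, 9.99, 10.04, 10.05]

# Bias = average measured value minus the reference value.
average_reading = mean(readings)
bias = average_reading - reference_value

print(f"average reading = {average_reading:.3f}, bias = {bias:+.3f}")
```

A positive bias here would mean the instrument reads high on average, which is the kind of offset calibration is meant to remove.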
4 Repeatability and Reproducibility
- Repeatability is the variation in measurements obtained with one measurement instrument when used several times by one appraiser while measuring the identical characteristic on the same part. It is the variation obtained when the measurement system is applied repeatedly under the same conditions, and it is often caused by conditions inherent in the measurement system.
- Precision is the closeness of agreement between randomly selected individual measurements or test results.
- Reproducibility is the variation in the average of the measurements made by different appraisers using the same measuring instrument when measuring the identical characteristic on the same item.
5 Stability and Linearity
- Stability is the total variation in the measurements obtained with a measurement system on the same master or parts when measuring a single characteristic over an extended time period. A system is said to be stable if the results are the same at different points in time.
- Linearity is the difference in the bias values through the expected operating range of the gage.
6 Inside Data
Pareto Diagrams
Process Flow Diagrams
Cause-and-Effect Diagrams
7 Cause-and-Effect Diagram
- Also called a fishbone or Ishikawa diagram
[Figure: fishbone diagram. Major cause limbs: Materials, Environment, Methods, Manpower-HR, Machinery, Measurement. Labeled parts: Major Cause Limb, Branch, Stem, Effect.]
8 Cause-and-Effect Diagram Thoughts
- This is a team tool - assemble the right team!
- After the team has finished the diagram, post it prominently in a department that either affects or is impacted by the process.
- Leave a blue marker by the diagram so that annotations can be made - give it a few days.
- Move from department to department, changing the color of the marker in each department.
- After this rotation is complete, have the team update the diagram.
- State effects in the positive - you'll get better results.
- Time-sequencing may be possible.
- Try balancing enablers (solutions) on top of the diagram and obstacles on the bottom of the diagram - similar to Force Field Analysis.
9 Pareto Analysis
- Graphically indicates which "diseases" or problems most infect the process. Pareto diagrams may be regarded as a special-use form of the histogram.
- This aids organizations in focusing efforts and resources, and is a commonly used quality improvement tool.
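The ordering behind a Pareto diagram can be sketched in a few lines: tally each cause, rank the causes from most to least frequent, and accumulate the percentage of the total. The cause names and counts below are hypothetical placeholders, not data from this presentation.

```python
# Hypothetical tallies of problem causes (invented for illustration only).
cause_counts = {
    "Cause A": 12,
    "Cause B": 45,
    "Cause C": 8,
    "Cause D": 30,
    "Cause E": 5,
}

total = sum(cause_counts.values())

# Rank causes from most to least frequent, as the bars of a Pareto diagram are.
ranked = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)

# Build (cause, count, cumulative percent) rows.
pareto_rows = []
running_count = 0
for cause, count in ranked:
    running_count += count
    pareto_rows.append((cause, count, 100 * running_count / total))

# Print a rough text version of the diagram.
for cause, count, cumulative_percent in pareto_rows:
    bar = "#" * (count // 2)
    print(f"{cause:<8} {count:>3} {cumulative_percent:6.1f}%  {bar}")
```

With these made-up counts, the two largest causes account for three quarters of the problems, which is the kind of concentration a team would look for when deciding where to focus.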
10 Pareto Analysis for Process Improvement
- A Useful Team Diagnostic and Improvement Tool
[Figure: Pareto chart titled "Cause of Excessive Photocopier Use". The category labels did not survive extraction intact.]
11 Process Flow Diagrams
- A visual portrayal of the sequential and parallel stages / steps, and the relationships between these steps, that make up a process or a portion of a process. Such diagrams can function as aids to understanding.
12 Process Flow Diagrams
[Figure: process flow diagram running from Start to Finish.]
- Recommendation: Current and Ideal!
13 Analyzing Data
Boxplots
Histograms
Scatter Diagrams
Time Series Plots
14 The Voice of the Process: A Statistical Perspective
15
- Statistics as a vehicle to search for, quantify, understand and respond to truth is explored.
- Foundational concepts are presented and illustrated, including process characterization and guidance.
- Measures of process centrality, variation, position, and shape are introduced and illustrated.
16 Application of Material
- Material from this segment is useful in most fields of endeavor, including business. Areas of application include finance, marketing, management, real estate, information systems, and accounting.
- READ:
- Journal of Financial and Quantitative Analysis
- Journal of Accounting Research
- Management Information Systems Quarterly
- Decision Sciences
- Journal of Marketing Research
- Quality Progress.
17 Presentation Content
- The Quest for Truth
- Sources of and Paths to Truth
- The Scientific Method of Inquiry
- Process Proactivity and Reactivity
- Wiretapping the Process
- Signal Processing
- Process Centrality, Consistency, Shape and the Unusual.
18 The Quest for Truth
- Eternal / Immutable Truth
- Changing and Changeable Truth
- Situational Truth - Process Control
- Sources and Paths to Truth
- Revelation / Providence
- Serendipity
- Tenacity / Persistence / Investigation
- Historical Research and Scientific Method
19 Correlation and Causality
- Correlation typically implies a relationship between traits and MAY be indicative of causality. If there is a relationship, then the behavior of one trait allows prediction of the behavior of the other trait.
- Causality implies both the ability to predict and (often) to control the behavior of a second trait.
20 Process Reactivity and Proactivity
- Statistical Methods are Used Both to Analyze and Respond to What has Occurred in the Past (Reactive Use) and to Plan and Study Process Interventions for the Purpose of Process Improvement (Proactive Use).
- "Even if you're on the right track, you'll get run over if you just sit there." Will Rogers
21 WIRETAPPING THE PROCESS: What is Our Interest in the Process?
- Understanding
- Prediction of Process Behavior
- Guidance and Control of Process Behavior
22 The Voice of the Process
- What Do We Measure?
- Process Performance Measures (PPM)
- These Pick Up the Heartbeat of the Process and May be DIRECT or SURROGATE.
- Why Do We Measure?
- How Do We Measure?
- Where and When in the Process Do We Measure?
23 Signal Processing
- Having addressed issues related to PPMs, we may now gather process data.
- Data should provide an image of the process from which the data originates.
- We will want to know:
- Where does the process live?
- How consistent is the process?
- What is the shape of the process?
- What is unusual about the process?
- Does process behavior vary over time?
24 Signal Processing and Data Analysis
- Signal processing, more commonly called data analysis, is often conducted with the aid of canned spreadsheet or statistical software packages such as MINITAB, SAS, SPSS, BMDP, or SYSTAT.
25 Copier Abuses
A firm occupies a multilevel building in which each level has its own photocopier, and usage is free. The tenth level is selected: each person there is given a key to the machine on their level and will have their usage tracked. Third level employee usage is tracked in aggregate, but those employees are unaware of the monitoring. Data for 50 days follows.
26 50 Days of Photocopier Usage (copies per day)
Day  10th  3rd  |  Day  10th  3rd  |  Day  10th  3rd
----------------+------------------+----------------
  1   500  440  |   18   360   20  |   35   150  370
  2   420  220  |   19   310  250  |   36   140  405
  3   440  360  |   20   320  350  |   37   130  130
  4   480  110  |   21   290  150  |   38   150  120
  5   450  240  |   22   290  250  |   39   130   70
  6   460  360  |   23   270  230  |   40   110  240
  7   450   80  |   24   250   90  |   41    90   20
  8   420  420  |   25   240   50  |   42    80  450
  9   410  310  |   26   250  320  |   43    90   20
 10   405   30  |   27   250  360  |   44    70   40
 11   380  290  |   28   230  450  |   45    20  320
 12   360  410  |   29   240  270  |   46    50  140
 13   360  460  |   30   220  380  |   47    40   90
 14   370  420  |   31   190  190  |   48    20  130
 15   350  150  |   32   150  500  |   49    30  480
 16   320  170  |   33   170  290  |   50    30  350
 17   350  250  |   34   120  150  |
27 Where Does the Process Live? Measures of Centrality
- In a sense we seek an address for the process: if forced to provide exactly one number which best represents the location of the process, what should we use? Traditional measures have included:
- The Mean or Average
- The Median
- The Midrange
- The Mode.
28 Determination of the Mean
- The true average of values for the process is symbolized by the Greek lower case letter mu, that is, True Process Mean = $\mu$.
- This value tends to be both unknown and unknowable and is generally estimated from the sample data. If we represent the PPM (for example, shelf life in days) as $X$, then the sample mean is "X-bar", written $\bar{X}$.
29 Sample Mean for 10th Level
- The sample mean is determined as the simple arithmetic average of the sample data values. That is,
- $\bar{X} = \sum X_i / n = (500 + 420 + \dots + 30)/50$, or
- $\bar{X} = 248.1$ copies/day
- The average number of copies per day is about 248.
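The sample mean above can be checked directly from the 10th level column of the 50-day usage table. This is a minimal sketch; the variable names are illustrative choices, and the list is transcribed from the table in day order.

```python
# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]

n = len(tenth_level)           # sample size
total_copies = sum(tenth_level)
sample_mean = total_copies / n  # X-bar = sum of the X_i divided by n

print(f"n = {n}, sum = {total_copies}, sample mean = {sample_mean:.1f} copies/day")
```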
30 Mode, Midrange and Median
- Three other common measures of process address (or centrality, as this concept is called in statistics) are the:
- MODE, or the most frequently occurring value (150, 250, and 360 each occur 3 times).
- MIDRANGE, or the average of the minimum and maximum process PPM values. This value is (20 + 500)/2 = 260.
- MEDIAN or Q2, the value for which half of the process values are larger and half are smaller. Q2 is the (n+1)/2-th ordered value, averaging the two surrounding ordered values if there is a fractional part. For the 10th level data (n+1)/2 = 25.5, so that Q2 is the average of the 25th and 26th ordered values, or (250 + 250)/2 = 250 copies/day.
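A short sketch of the three measures on the 10th level data follows. The names are illustrative; the median uses the (n+1)/2 position rule described above. On the table's values the 25th and 26th ordered values both come out as 250, which agrees with the Q2 = 250 listed in the boxplot summary.

```python
from collections import Counter

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]

# MODE: every value that attains the highest frequency.
frequencies = Counter(tenth_level)
highest_frequency = max(frequencies.values())
modes = sorted(v for v, count in frequencies.items() if count == highest_frequency)

# MIDRANGE: average of the smallest and largest values.
midrange = (min(tenth_level) + max(tenth_level)) / 2

# MEDIAN (Q2): the (n+1)/2-th ordered value; 25.5 here, so average the
# 25th and 26th ordered values (list indices 24 and 25).
ordered = sorted(tenth_level)
position = (len(ordered) + 1) / 2
value_25th, value_26th = ordered[24], ordered[25]
median = (value_25th + value_26th) / 2

print("modes:", modes, "each occurring", highest_frequency, "times")
print("midrange:", midrange)
print("median position:", position, "median:", median)
```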
31 Process Consistency
- This issue is like a coin: it is two-sided.
- On one side, the side generally emphasized in statistics, is the side labeled "variability", with statisticians discussing measures of variability - variability is an enemy!
- On the other side is process consistency, where consistency is a desirable trait!
32 Consistent With What?
- When measuring process consistency we generally are referring to how close to (i.e. consistent with) or how far away from (i.e. variable) the process is with respect to some point of reference or anchor.
- Typically the mean serves as this anchor.
- Though there are many measures of variability / consistency, only the most commonly applied measures are examined: range, interquartile range, variance and standard deviation.
33 Range and Interquartile Range
- The range is simply the distance between the largest and smallest values in the process and is estimated from the sample values. For the 10th level data, Range = max - min = 500 - 20 = 480 copies/day.
- The interquartile range (IQR) is given by IQR = (Q3 - Q1), where Q3 and Q1 are the values at the 75th and 25th percentiles, respectively. That is:
- Q3 = value with 75% of the sample smaller
- Q1 = value with 25% of the sample smaller.
34 Interquartile Range (IQR)
- Q3 = the 3(n+1)/4-th ordered value, where 3(n+1)/4 = 3(51)/4 = 38.25, so Q3 is the number that is one-fourth of the way from the 38th ordered value to the 39th ordered value. These values are 360 and 370, so that Q3 = 362.5.
- Q1 = the (n+1)/4-th ordered value, where (n+1)/4 = 51/4 = 12.75, so Q1 is three-fourths of the way from the 12th ordered value to the 13th ordered value. These are 120 and 130 for the 10th Level data, so that Q1 = 127.5.
- Thus IQR = (Q3 - Q1) = (362.5 - 127.5) = 235 copies/day.
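The (n+1)-based position rule for Q1 and Q3 can be sketched as a small interpolation helper. `ordered_value` is a hypothetical helper name introduced here for illustration. As a cross-check, Python's standard `statistics.quantiles` with its default "exclusive" method also interpolates at (n+1)-based positions, so it should agree on this data.

```python
import statistics

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]


def ordered_value(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data.

    A fractional part means: go that fraction of the way from the ordered
    value below the position to the ordered value above it.
    """
    whole = int(position)
    fraction = position - whole
    lower = ordered[whole - 1]
    if fraction == 0:
        return float(lower)
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


ordered = sorted(tenth_level)
n = len(ordered)

q1 = ordered_value(ordered, (n + 1) / 4)      # position 12.75
q3 = ordered_value(ordered, 3 * (n + 1) / 4)  # position 38.25
iqr = q3 - q1

# Cross-check against the standard library (default method="exclusive").
library_quartiles = statistics.quantiles(tenth_level, n=4)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print("statistics.quantiles:", library_quartiles)
```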
35 The Sample Variance
- The true variance of the process is represented by $\sigma^2$ and can be thought of as the average squared distance of observations in the process from the true process mean, $\mu$.
- Generally, the values of both $\mu$ and $\sigma^2$ are unknown and must be estimated, $\mu$ by $\bar{X}$ and $\sigma^2$ by $S^2$. This latter value, $S^2$, is called the sample variance.
36 Sample Variance - Calculation
- $S^2$ is defined as $S^2 = \sum (X_i - \bar{X})^2 / (n-1)$ but is better calculated using
- $S^2 = (\sum X_i^2 - n\bar{X}^2)/(n-1)$
- For the 10th Level data we have
- $S^2 = (4083225 - 50(248.1)^2)/49 \approx 20521.3$
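The two expressions for the sample variance are algebraically the same, which a quick sketch on the 10th level data can confirm. Variable names are illustrative; `statistics.variance` from the standard library is included only as an independent check.

```python
import statistics

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]

n = len(tenth_level)
x_bar = sum(tenth_level) / n

# Definitional form: sum of squared deviations from the mean, over n - 1.
variance_definition = sum((x - x_bar) ** 2 for x in tenth_level) / (n - 1)

# Computational form: (sum of squares - n * mean^2) over n - 1.
sum_of_squares = sum(x * x for x in tenth_level)
variance_shortcut = (sum_of_squares - n * x_bar ** 2) / (n - 1)

# Independent check with the standard library's sample variance.
variance_library = statistics.variance(tenth_level)

print(f"sum of squares      = {sum_of_squares}")
print(f"S^2 (definition)    = {variance_definition:.1f}")
print(f"S^2 (shortcut)      = {variance_shortcut:.1f}")
print(f"S^2 (statistics)    = {variance_library:.1f}")
```

The shortcut form subtracts two large numbers, so with floating-point data of very different magnitudes the definitional form can be numerically safer; for these integer counts the two agree closely.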
37 Sample Standard Deviation
- The sample variance, $S^2$, has desirable properties but is difficult to interpret since it is defined in terms of the square of the original PPM units.
- More understandable is the sample standard deviation, $S$, which is simply the positive square root of $S^2$ and can be thought of as representative (standard) of the distance (deviation) between values in the data set and the sample mean.
- This value estimates the true process standard deviation, $\sigma$.
38 Sample Standard Deviation
- The value of the sample standard deviation for the 10th Level data is
- $S = \sqrt{S^2} \approx 143.3$ copies/day
- That is, values in the 10th Level data set typically vary from the average by about 143.3 copies/day, more or less.
39 The Empirical Rule
- The Empirical Rule is applicable for approximately mound-shaped distributions and relates $\bar{X}$ and $S$ as follows (figures are rule of thumb):
- 67% of all values within $\bar{X} \pm S$
- 95% of all values within $\bar{X} \pm 2S$
- 99% of all values within $\bar{X} \pm 3S$
- Values outside this latter range are often considered unusual.
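One way to apply the rule is to count what fraction of a sample actually falls within one, two, and three standard deviations of the mean, and compare with the rule-of-thumb figures. The sketch below does this for the 10th level data; the names are illustrative. Because the rule assumes a roughly mound-shaped distribution, a sample with a different shape need not match it closely.

```python
from statistics import mean, stdev

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]

x_bar = mean(tenth_level)
s = stdev(tenth_level)  # sample standard deviation (n - 1 in the denominator)

# Fraction of observations within X-bar +/- k*S for k = 1, 2, 3.
fraction_within = {}
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    inside = sum(1 for x in tenth_level if low <= x <= high)
    fraction_within[k] = inside / len(tenth_level)
    print(f"within {k}S: [{low:7.1f}, {high:7.1f}] -> {fraction_within[k]:.0%}")
```

For this sample the one-standard-deviation share is somewhat below the rule-of-thumb figure and every value lies within two standard deviations, which hints that the data are flatter than a mound-shaped distribution.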
40 Process Shape
- It is often important to know not just how much the PPM varies with regard to some anchor point, say the mean, but also the particular pattern of variation followed by the PPM. This concept is often referred to as shape.
- Shape is often represented graphically through use of histograms or box-and-whisker plots (boxplots).
41 Process Histograms
- Such plots generally show the number of data values in successive categories, with categories being
- mutually exclusive
- and exhaustive.
- Construction of a histogram is both art and
science.
42 Histogram Construction
- Determine the range, R = max - min; for 10th Level data R = 500 - 20 = 480.
- Determine the number of categories to be used, k. A useful rule of thumb is $k \approx \log_2(n)$, where n is the sample size. For n = 50 this gives k = 5 (4 to 6).
- Determine W = R/k; this is the minimum width that a category should be, and usually W will be rounded to a convenient value. That is, W is between 480/4 = 120 and 480/6 = 80.
- Construct categories and classify data.
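The construction steps above can be sketched as follows for the 10th level data. The choice of k = 5, the rounded width of 100, and the category edges 0, 100, ..., 500 are assumptions made here for illustration; other convenient roundings would be equally defensible.

```python
import math

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]

n = len(tenth_level)
data_range = max(tenth_level) - min(tenth_level)  # R = 500 - 20 = 480

k_rule = math.log2(n)          # rule of thumb, about 5.6 for n = 50
k = 5                          # assumed choice within the suggested 4 to 6
minimum_width = data_range / k  # W = R / k = 96
width = 100                    # assumed convenient rounding of W

# Categories [0,100), [100,200), ..., [400,500]; the last one is closed on
# the right so that the maximum value (500) is classified.
counts = [0] * k
for x in tenth_level:
    index = min(x // width, k - 1)
    counts[index] += 1

print(f"log2(n) = {k_rule:.2f}, k = {k}, minimum W = {minimum_width}, W used = {width}")
for i, count in enumerate(counts):
    low, high = i * width, (i + 1) * width
    print(f"{low:>3}-{high:<3} {'#' * count} ({count})")
```

The categories are mutually exclusive and exhaustive: each value lands in exactly one of them, and the counts add up to the sample size.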
43 Histogram Categories
- Given these guidelines, the number of histogram categories, k, for a sample of n items is approximately:
n          k
1-7        don't bother
8-15       3 +/- 1
16-31      4 +/- 1
32-63      5 +/- 1
64-127     6 +/- 2
128-255    7 +/- 2
- These are rule-of-thumb values. We should not under- or over-resolve the data.
44 Process Shape Via the Histogram
[Figure: "Histogram of Photocopier Use". Horizontal axis: 10th (ticks 0 to 500); vertical axis: Frequency (ticks 0, 5, 10).]
45 Box-and-Whisker Plots
- Most commonly called boxplots.
- Consist of a five-number summary and four outlier points:
- min, Q1, Q2, Q3, max
- Inner LOP = Q1 - 1.5(IQR); Outer LOP = Q1 - 3(IQR)
- Inner UOP = Q3 + 1.5(IQR); Outer UOP = Q3 + 3(IQR)
- min = 20; max = 500
- Q1 = 127.5; Q2 = 250; Q3 = 362.5
- Inner LOP = 127.5 - 1.5(235) = -225 copies (N/A); Outer LOP = 127.5 - 3.0(235) = -577.5 copies (N/A)
- Inner UOP = 362.5 + 1.5(235) = 715 copies; Outer UOP = 362.5 + 3.0(235) = 1,067.5 copies
- NO OUTLIERS BY THESE MEASURES!
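The five-number summary and the four outlier points can be computed in a few lines. This is a sketch with illustrative names; it takes the quartiles from `statistics.quantiles`, whose default "exclusive" method uses (n+1)-based positions. Since 1.5 × 235 = 352.5, the inner points evaluate to 127.5 - 352.5 = -225 and 362.5 + 352.5 = 715.

```python
import statistics

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]

# Five-number summary.
q1, q2, q3 = statistics.quantiles(tenth_level, n=4)
five_number_summary = (min(tenth_level), q1, q2, q3, max(tenth_level))
iqr = q3 - q1

# Inner and outer lower/upper outlier points.
inner_lop = q1 - 1.5 * iqr
outer_lop = q1 - 3.0 * iqr
inner_uop = q3 + 1.5 * iqr
outer_uop = q3 + 3.0 * iqr

# Values beyond the inner points would be flagged as possible outliers.
flagged = [x for x in tenth_level if x < inner_lop or x > inner_uop]

print("five-number summary:", five_number_summary)
print(f"inner points: {inner_lop}, {inner_uop}")
print(f"outer points: {outer_lop}, {outer_uop}")
print("flagged values:", flagged if flagged else "none")
```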
46 Shape - The Rest of the Story?
47 Correlation and Causality
- PATTERNS
- The Graph at Left Indicates, in Part, a Relationship Between Two Variables, With One Increasing in Value as the Other Increases in Value. Cycling is Also Evident.
48 Scatter Diagrams: Correlation and Causality
- PATTERNS
- This graph shows an indirect (inverse) relation, with one trait increasing in value as the other trait decreases.
- Correlation sometimes indicates causality. Prediction, and possibly process guidance, may be possible.
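To put a number on an inverse relationship of this kind, one option is the Pearson correlation coefficient. The sketch below pairs day number with 10th level usage from the 50-day table; that pairing is an illustrative choice made here and is not necessarily what the slide's graph plotted. A coefficient near -1 indicates a strong tendency for one trait to fall as the other rises, and by itself says nothing about cause.

```python
from math import sqrt

# 10th level photocopier usage (copies/day), days 1 to 50, from the usage table.
tenth_level = [
    500, 420, 440, 480, 450, 460, 450, 420, 410, 405,
    380, 360, 360, 370, 350, 320, 350, 360, 310, 320,
    290, 290, 270, 250, 240, 250, 250, 230, 240, 220,
    190, 150, 170, 120, 150, 140, 130, 150, 130, 110,
    90, 80, 90, 70, 20, 50, 40, 20, 30, 30,
]
days = list(range(1, len(tenth_level) + 1))

n = len(days)
mean_day = sum(days) / n
mean_usage = sum(tenth_level) / n

# Sums of cross-products and squared deviations.
s_xy = sum((d - mean_day) * (u - mean_usage) for d, u in zip(days, tenth_level))
s_xx = sum((d - mean_day) ** 2 for d in days)
s_yy = sum((u - mean_usage) ** 2 for u in tenth_level)

# Pearson correlation coefficient, always between -1 and +1.
r = s_xy / sqrt(s_xx * s_yy)

print(f"correlation between day and 10th level usage: r = {r:.3f}")
```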
49 Time Series Plot
[Figure: time series plot.]
50 A Case Report - What Might We Conclude?
Let the punishment fit the crime!
51 Shelf Life of a Perishable Food Product - An Exercise
- Consider the shelf life of a perishable food product. A sample of n = 25 such products, taken under essentially uniform conditions, provided the following results, with days being the unit of measurement:
52 Data Analysis Assignment
- Determine mean, median, mode and midrange.
- Determine range, variance, standard deviation, and interquartile range.
- Apply the empirical rule.
- Construct a histogram, and a boxplot that includes all outlier point criteria.
- Construct a time series plot (the data should be read left-to-right).
- INTERPRET ALL RESULTS: what does all of this mean? (See the following slide.)
53 Written Case Guidelines
GOOD ADVICE!
A. One or two page written summary - concise: Key Issues, Data, Results, Recommendation(s), Limitations.
B. Only vitally important statistical / graphical output should be included in these two pages.
C. Otherwise important output should be placed in an appendix and should be clearly summarized.
D. Spell- and grammar-checked.
54 STATISTICS
Quantitative Graphical Analysis
End of Session