STAT 424/524 Statistical Design for Process Improvement - PowerPoint PPT Presentation

Author: Saihua Yu. Last modified by: SZhang. Created: 8/8/2006.

Transcript and Presenter's Notes

Title: STAT 424/524 Statistical Design for Process Improvement


1
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 1
  • An Overview of Statistical Process Control

2
Book online
  • http://books.google.com/books?id=tj1nMz8ajmQC&pg=PA77&lpg=PA77&dq=contaminated+production+process&source=web&ots=isGEpndi8a&sig=0XEQbXo2bDo7EZQ1HQvnS8b0baM&hl=en&sa=X&oi=book_result&resnum=1&ct=result#PPP1,M1

3
Homework 1 & 2
  • 1. Do Problems 1.1, 1.3, 1.5, 1.7.
  • 2. Do Problem 1.12, and derive Table 1.9 on page 36 of the text (also on slide 20) using the following formula.

4
1.1 Introduction
  • Diligence, a good attitude, and hard work are not
    sufficient for achieving quality control.
  • Statistical process control (SPC) is a method of quality control that enables us to seek steady improvement in the quality of a product. It monitors a process through the use of control charts.
  • Statistical Process Control was pioneered by
    Walter A. Shewhart in the early 1920s. W. Edwards
    Deming later applied SPC methods in the United
    States during World War II.

5
Differences between Quality Control and Quality
Assurance
  • Suppose that you are a PhD student about to graduate and applying for an academic position. If you were a product, your supervisor would be the quality control manager, and the search committee reviewing your application would be the quality assurance manager. Read the following:
  • http://www.builderau.com.au/strategy/projectmanagement/soa/Quality-control-vs-quality-assurance/0,339028292,339191784,00.htm

6
Core Steps of Statistical Process Control
  1. Flowcharting the production process
  2. Random sampling and measurement, at regular temporal intervals, at numerous stages of the production process
  3. Using the Pareto glitches discovered in this sampling to backtrack in time and discover their causes, so that the process can be improved

7
Self Reading
  • Section 1.2 to 1.7

8
1.8 White Balls, Black Balls
  • Recall that the second core step of statistical process control is random sampling and measurement at regular temporal intervals at numerous stages of the production process.
  • The following data table shows measurements of
    thickness in centimeters of 40 bolts in ten lots
    of 4 each.

9
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
10
Data Display
  • We construct a run chart of thickness against lot
    number for each bolt.

11
Question: Is the process in control?
12
A SAS Program
13
One Bad Lot
  • We add 0.500 to each of the 4 measurements in lot
    10. The new graph is generated.

14
One Bad Bolt
  • We add 0.500 only to the 4th measurements in lot
    10. The new graph is generated.

15
Run Chart of Means
  • We have constructed run charts of original
    measurements. We can also construct run charts of
    a summary statistic, say the lot mean.
  • To do this, we first find the mean for each lot.
    Then we plot the means against corresponding
    lots.
  • The run chart of the lot mean for the first data
    set is shown below.
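The lot means behind this run chart can be computed directly from the thickness table. Below is a small Python sketch (Python rather than the course's SAS, purely for illustration), using the values transcribed from the table above.

```python
# Thickness data: ten lots of four bolts each (from the table above).
lots = [
    [9.93, 10.04, 10.05, 10.09],
    [10.00, 10.03, 10.05, 10.12],
    [9.94, 10.06, 10.09, 10.10],
    [9.90, 9.95, 10.01, 10.02],
    [9.89, 9.93, 10.03, 10.06],
    [9.91, 10.01, 10.02, 10.09],
    [9.89, 10.01, 10.04, 10.09],
    [9.96, 9.97, 10.00, 10.03],
    [9.98, 9.99, 10.05, 10.11],
    [9.93, 10.02, 10.10, 10.11],
]

# Lot mean = average of the four bolt thicknesses in each lot.
means = [sum(lot) / len(lot) for lot in lots]
for lot_no, m in enumerate(means, start=1):
    print(f"Lot {lot_no}: mean = {m:.4f}")
```

Plotting `means` against the lot number 1, ..., 10 gives the run chart of lot means.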

16
(No Transcript)
17
  • We add 0.500 to each of the 4 measurements in lot
    10. The new run chart is generated.

18
1.9 The Basic Paradigm of Statistical Process
Control
  • We considered ten lots of 4 bolts each. We saw
    that there is variation within each lot and
    variation across lots (in terms of lot average).
  • A major task in SPC is to seek significantly
    outlying lots, good or bad. Once found, such lots
    can then be investigated to find out why they
    deviate from others.
  • This is the basic paradigm of SPC
  • 1. Find a Pareto glitch (a non-standard lot)
  • 2. Discover the causes of the glitch
  • 3. Use this information to improve the
    production process.
  • The variability across lots is the key notion in
    search for Pareto glitches.

19
1.10 Basic Statistical Procedures in Statistical
Process Control
  • Let's use the original thickness data.

Lot Bolt 1 Bolt 2 Bolt 3 Bolt 4
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
20
Control Chart on Lot Means
  • To construct control charts for lot means:
  • first calculate the mean and standard deviation of each lot;
  • then find the mean of the lot means (the grand mean) and the mean of the lot standard deviations, s-bar;
  • finally, find the acceptance interval on the mean, given by (grand mean) ± A3(n)·s-bar, where A3(n) can be read from the following table.

21
Multiplication Factors for Different Lot Sizes
n B3(n) B4(n) A3(n)
2 0.000 3.267 2.659
3 0.000 2.568 1.954
4 0.000 2.266 1.628
5 0.000 2.089 1.427
6 0.030 1.970 1.287
7 0.118 1.882 1.182
8 0.185 1.815 1.099
9 0.239 1.761 1.032
10 0.284 1.716 0.975
15 0.428 1.572 0.789
20 0.510 1.490 0.680
25 0.565 1.435 0.606
22
Mean Control Chart for Thickness Data
  • For the thickness data, we calculate the lot means and standard deviations, obtaining a grand mean of 10.015 and an average standard deviation of 0.0664.
  • The acceptance interval is 10.015 ± 1.628(0.0664), i.e., (9.907, 10.123), where 9.907 is called the Lower Control Limit (LCL) and 10.123 the Upper Control Limit (UCL).
  • Does any mean appear to be out of control?
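As a cross-check on these limits, here is a short Python sketch (illustrative only; the course uses SAS) that computes the grand mean, the average lot standard deviation, and the A3-based acceptance interval from the thickness data.

```python
from statistics import mean, stdev

# Thickness data: ten lots of four bolts (from the table above).
lots = [
    [9.93, 10.04, 10.05, 10.09], [10.00, 10.03, 10.05, 10.12],
    [9.94, 10.06, 10.09, 10.10], [9.90, 9.95, 10.01, 10.02],
    [9.89, 9.93, 10.03, 10.06],  [9.91, 10.01, 10.02, 10.09],
    [9.89, 10.01, 10.04, 10.09], [9.96, 9.97, 10.00, 10.03],
    [9.98, 9.99, 10.05, 10.11],  [9.93, 10.02, 10.10, 10.11],
]
A3 = 1.628  # multiplication factor for lot size n = 4 (table above)

xbarbar = mean(mean(lot) for lot in lots)   # grand mean of lot means
sbar = mean(stdev(lot) for lot in lots)     # mean of lot standard deviations
lcl = xbarbar - A3 * sbar
ucl = xbarbar + A3 * sbar
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

The result reproduces the limits quoted above: LCL about 9.907 and UCL about 10.123.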

23
(No Transcript)
24
Control Chart on Standard Deviation
25
Standard Deviation Control Chart for Thickness
Data
  • LCL = B3(4)·s-bar = 0; UCL = B4(4)·s-bar = 2.266(0.0664) = 0.150

26
(No Transcript)
27
Creating Control Charts for Means and Standard
Deviations Based on Summary Statistics
Lot x-bar s
1 146.21 0.12
2 146.18 0.09
3 146.22 0.13
4 146.31 0.10
5 146.20 0.08
6 146.15 0.11
7 145.93 0.18
8 145.96 0.18
9 145.88 0.16
10 145.98 0.21
11 146.08 0.11
12 146.12 0.12
13 146.26 0.21
14 146.32 0.18
15 146.00 0.32
16 145.83 0.19
17 145.76 0.12
18 145.90 0.17
19 145.94 0.10
20 145.97 0.09
  • Refer to Problem 1.4 on page 48 of Thompson's text. The summary statistics are reproduced here.

28
DATA problem1_4;
  INPUT lot AX AS @@;
  AN = 5;
  CARDS;
 1 146.21 0.12   2 146.18 0.09
 3 146.22 0.13   4 146.31 0.10
 5 146.20 0.08   6 146.15 0.11
 7 145.93 0.18   8 145.96 0.18
 9 145.88 0.16  10 145.98 0.21
11 146.08 0.11  12 146.12 0.12
13 146.26 0.21  14 146.32 0.18
15 146.00 0.32  16 145.83 0.19
17 145.76 0.12  18 145.90 0.17
19 145.94 0.10  20 145.97 0.09
;
SYMBOL v=dot c=red;
PROC SHEWHART HISTORY=problem1_4;
  xschart A*lot;
RUN;

29
(No Transcript)
30
1.11 Acceptance Sampling
  • Data, which are characterized as defective or
    not, are called acceptance/rejection data or
    failure data.
  • Suppose that a bolt is considered defective if it
    is smaller than 9.92 or greater than 10.08.
  • Then the first data set considered can be
    converted to acceptance/rejection data as follows.

31
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 0 0 0 1
2 0 0 0 1
3 0 0 1 1
4 1 0 0 0
5 1 0 0 0
6 1 0 0 1
7 1 0 0 1
8 0 0 0 0
9 0 0 0 1
10 0 0 1 1
1 = defective, 0 = nondefective
32
Overall Proportion of Defectives
  • In order to apply an SPC procedure to acceptance/rejection data, we note the proportion of defectives in each lot.
  • The overall proportion of defectives is a key statistic, calculated by averaging the lot proportions of defectives.
  • In the previous acceptance/rejection data set, the 10 lot proportions of defectives are 0.25, 0.25, 0.50, 0.25, 0.25, 0.50, 0.50, 0.00, 0.25, 0.50, which yields the overall proportion of defectives (0.25 + 0.25 + 0.50 + 0.25 + 0.25 + 0.50 + 0.50 + 0.00 + 0.25 + 0.50)/10 = 0.325.

33
Control Limits on the Proportion of Defectives
  • The lower and upper control limits are LCL = max(0, p-bar − 3√(p-bar(1 − p-bar)/n)) and UCL = min(1, p-bar + 3√(p-bar(1 − p-bar)/n)).
  • The rationale for choosing these two limits will be discussed in Section 1.13.
  • For our acceptance/rejection data, p-bar = 0.325 and n = 4, so both limits truncate: LCL = 0 and UCL = 1.
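A quick Python sketch of this calculation (an illustration, not part of the text) using the lot proportions listed on the previous slide:

```python
import math

# Lot proportions of defectives from the acceptance/rejection table (n = 4 per lot).
props = [0.25, 0.25, 0.50, 0.25, 0.25, 0.50, 0.50, 0.00, 0.25, 0.50]
n = 4

pbar = sum(props) / len(props)                    # overall proportion = 0.325
half_width = 3 * math.sqrt(pbar * (1 - pbar) / n)
lcl = max(0.0, pbar - half_width)                 # truncate at 0
ucl = min(1.0, pbar + half_width)                 # truncate at 1
print(pbar, lcl, ucl)
```

With only n = 4 bolts per lot the half-width exceeds both bounds, so the truncated limits are 0 and 1; larger lots give informative limits.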

34
Control Charts on the Proportion of Defectives
  • Control charts on the proportion of defectives
    are called p-charts.
  • One may create p-charts from count data or
    summary data.

35
Creating p Charts from Count Data
  • An electronics company manufactures circuits in
    batches of 500 and uses a p chart to monitor the
    proportion of failing circuits. Thirty batches
    are examined, and the failures in each batch are
    counted. The following statements create a SAS
    data set named CIRCUITS, which contains the
    failure counts, as shown below.
data circuits;
  input batch fail @@;
  datalines;
 1  5   2  6   3 11   4  6   5  4   6  9   7 17   8 10   9 12  10  9
11  8  12  7  13  7  14 15  15  8  16 18  17 12  18 16  19  4
20  7  21 17  22 12  23  8  24  7  25 15  26  6  27  8  28 12
29  7  30  9
;
run;

36
SAS Statements that Create the p-Chart
symbol color=salmon;
title 'p Chart for the Proportion of Failing Circuits';
proc shewhart data=circuits;
  pchart fail*batch / subgroupn = 500
                      cframe    = lib
                      cinfill   = bwh
                      coutfill  = yellow
                      cconnect  = salmon;
run;

37
Creating p Charts from Summary Data
  • The previous example illustrates how you can
    create p charts using raw data (counts of
    nonconforming items). However, in many
    applications, the data are provided in summarized
    form as proportions or percentages of
    nonconforming items. This example illustrates how
    you can use the PCHART statement with data of
    this type.
data cirprop;
  input batch pfailed @@;
  sampsize = 500;
  datalines;
 1 0.010   2 0.012   3 0.022   4 0.012   5 0.008   6 0.018   7 0.034
 8 0.020   9 0.024  10 0.018  11 0.016  12 0.014  13 0.014
14 0.030  15 0.016  16 0.036  17 0.024  18 0.032  19 0.008
20 0.014  21 0.034  22 0.024  23 0.016  24 0.014  25 0.030
26 0.012  27 0.016  28 0.024  29 0.014  30 0.018
;
run;

38
title 'p Chart for the Proportion of Failing Circuits';
symbol v=dot;
proc shewhart data=cirprop;
  pchart pfailed*batch / subgroupn = sampsize
                         dataunit  = proportion;
  label pfailed = 'Proportion for FAIL';
run;

39
What Data to Use, Failure Data or Measurement
Data?
  • The use of failure data is a very blunt
    instrument when compared to the use of
    measurement data.
  • This is because of information loss when items
    are characterized as defective or not, ignoring
    specific measurements.

40
Control Charts in Minitab
  • http://www.qualproxl.com/Control_Charts.html

41
1.12 The Case for Understanding Variation
  • Variation within any process, let alone a system
    of processes, is inevitable.
  • Large variation adds to complexity and
    inefficiency of a system.
  • Reducing variation of a process is an important
    issue.

42
Two Different Sources of Variation
  • According to Walter Shewhart, there are two qualitatively different sources of variation:
  • Common cause variation (also known as random variation, or noise)
  • Special cause variation (also known as assignable variation)
  • It is special cause variation that leads to Pareto glitches (also called signals), which can be detected using control charts. Special cause variation is caused by identifiable factors that result in a non-random disruption of output.
  • Special cause variation can be removed through the proper use of control charts.

43
Processes That Are in Control
  • A process that is already in a state of
    statistical control is not subject to special
    cause variation, but only subject to common cause
    or inherent variation, which is always present
    and can not be reduced unless the process itself
    is redesigned.
  • An in-control process is predictable, but it need not perform satisfactorily.
  • The first data set shown before is from an in-control process, but improvement of the process is still welcome.

44
Two Types of Error
  • Error of the first kind (also known as tampering): treating common causes as special ones.
  • Error of the second kind: treating special causes as common ones, i.e., disregarding signals.

45
Process Capability
  • An in-control process reveals only common cause
    variation. This variation is measured by process
    capability.
  • Reduction of common cause variation requires
    improvement of the process itself.

46
Improvement of a Stable Process
  • Shewhart and Deming developed the Plan-Do-Study-Act (PDSA) cycle for improvement of a stable process.
  • In order to improve a stable process, one first has to PLAN the change. Such a plan should be based on a mathematical model of the process under scrutiny; design of experiments (DOE) is also needed.
  • Then DO it on a small scale, that is, run a pilot study.
  • Then STUDY (or check) whether the changed process is in control.
  • Finally, ACT accordingly: adopt the change if successful, or try another.

47
1.13 Statistical Coda
  • The control limits on the mean are based on the following Central Limit Theorem.
  • If the number of previous lots is large, say 25 or more, the average of the lot means will give an excellent estimate of µ, and the average of the sample standard deviations is a good estimate of σ when multiplied by an unbiasing factor a(n).

48
  • Now replacing µ and σ by their estimates yields the control limits on the mean: (grand mean) ± A3(n)·s-bar.

49
  • Similarly, the CLT applies to the sample standard deviation, with mean E(s) and standard deviation sd(s).
  • Now replacing E(s) and sd(s) by their estimates yields the control limits on the standard deviation: LCL = B3(n)·s-bar and UCL = B4(n)·s-bar.

50
  • Finally, for failure data, the CLT also applies for large lot size n: the lot proportion of defectives is approximately N(p, p(1 − p)/n).

51
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 2
  • Acceptance-Rejection SPC

52
Homework 3
  • Pages 71-74, Problems 1 to 6

53
Sections 2.1 and 2.2
  • Self reading

54
2.3 Basic Tests with Equal Lot Size
  • Consider the following failure data:

DATA table2_1;
  Lot + 1;
  INPUT defectives @@;
  prop = defectives/100;
  DATALINES;
3 2 5 0 6 4 2 4 1 2 7 9 11 12 14 15 12 10 8 3 5
6 0 1 3 3 4 6 5 5 3 3 7 8 2 0 6 7 4 4
;
PROC PRINT DATA=table2_1 (obs=5);
RUN;

55
The SAS System

Obs Lot defective prop
1 1 3 0.03
2 2 2 0.02
3 3 5 0.05
4 4 0 0.00
5 5 6 0.06

56
symbol v=dot c=red;
PROC GPLOT DATA=table2_1;
  PLOT prop*Lot;
run; quit;
57
(No Transcript)
58
Control Limits on the Number of Defectives
  • Let n be the common lot size and p be the proportion of defectives in the product.
  • Let X = the number of defectives in a lot. Then X ~ B(n, p); that is, X follows a binomial distribution with parameters n and p.
  • By the CLT, Z = (X − np)/√(np(1 − p)) is approximately N(0, 1).

59
Control Limits on the Number of Defectives
(contd)
  • By the empirical rule in statistics, Z is between −3 and 3; that is, −3 ≤ (X − np)/√(np(1 − p)) ≤ 3.
  • Solving for X gives the Lower Control Limit and Upper Control Limit for the number of defectives: np ± 3√(np(1 − p)).
60
Control Limits on the Number of Defectives
(contd)
  • Since the LCL might be negative and the UCL might exceed the lot size n, the modified limits are LCL = max(0, np − 3√(np(1 − p))) and UCL = min(n, np + 3√(np(1 − p))).
  • A control chart based on these LCL and UCL is called an np chart.
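The np-chart limits above can be sketched in a few lines of Python (an illustration, not from the text), applied to the defectives-per-100 counts of table2_1 shown earlier in this lecture.

```python
import math

# Defectives per lot of n = 100, from the table2_1 data (40 lots).
defectives = [3, 2, 5, 0, 6, 4, 2, 4, 1, 2, 7, 9, 11, 12, 14, 15, 12, 10, 8, 3,
              5, 6, 0, 1, 3, 3, 4, 6, 5, 5, 3, 3, 7, 8, 2, 0, 6, 7, 4, 4]
n = 100

np_bar = sum(defectives) / len(defectives)   # average number of defectives per lot
p_bar = np_bar / n                           # estimated proportion of defectives
half = 3 * math.sqrt(np_bar * (1 - p_bar))
lcl = max(0.0, np_bar - half)                # truncate below at 0
ucl = min(float(n), np_bar + half)           # truncate above at n

# Lots whose defective count falls outside the control limits.
out = [i + 1 for i, d in enumerate(defectives) if d < lcl or d > ucl]
```

For these counts the average is 5.3 defectives per lot, and the lots with 14 and 15 defectives fall above the UCL.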

61
symbol color=salmon;
title 'np Chart for the Proportion Defectives';
proc shewhart data=table2_1;
  npchart defectives*Lot / subgroupn = 100
                           cframe    = lib
                           cinfill   = bwh
                           coutfill  = yellow
                           cconnect  = salmon;
run;

62
p Charts
  • Control limits on the proportion of defectives have the form p-bar ± 3√(p-bar(1 − p-bar)/n), truncated to [0, 1].
  • The corresponding charts are called p charts.

63
symbol color=salmon;
title 'p Chart for the Proportion Defectives';
proc shewhart data=table2_1;
  pchart defectives*Lot / subgroupn = 100
                          cframe    = lib
                          cinfill   = bwh
                          coutfill  = yellow
                          cconnect  = salmon;
run;

64
2.4 Testing with Unequal Lot Sizes
  • If the lot sizes are unequal, the control limits for the proportion of defectives have to be calculated for each lot separately.
  • The control limits for the p chart now take the form p-bar ± 3√(p-bar(1 − p-bar)/nk), where nk is the size of lot k.

65
Page 64
DATA table2_2;
  Month + 1;
  INPUT Patients Infections @@;
  DATALINES;
50 3  42 2  37 6  71 5  55 6  44 6  38 10  33 2
41 4  27 1  33 1  49 3  66 8  49 5  55 4  41 2
29 0  40 3  41 2  48 5  52 4  55 6  49 5  60 2
;

66
proc shewhart data=table2_2;
  pchart Infections*Month / subgroupn = Patients
                            outtable  = CLtable;
run;
DATA page64;
  MERGE table2_2 CLtable (KEEP = _SUBP_ _UCLP_);
run;
PROC PRINT;
RUN;
quit;

67
2.5 Testing with Open-Ended Count Data
  • Let X denote the number of items returned per week, say.
  • X roughly has a Poisson distribution; that is, P(X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, ….
68
Control Limits on the Number Returned Items
  • For a Poisson distribution, the mean equals the variance.
  • The control limits on the number of returned items are LCL = max(0, c-bar − 3√c-bar) and UCL = c-bar + 3√c-bar, where c-bar is the average count per week.
  • A chart with these control limits is called a c chart.
  • A c chart should not be used if lots are of unequal sizes; use a u chart instead.

69
data table2_4;
  Week + 1;
  input numReturned @@;
  datalines;
22 13 28 17 22 29 32 17 19 27 48 53 31 22 31
27 20 24 17 22 29 30 31 22 26 24
;
proc print; run;
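The c-chart limits for these weekly counts can be checked with a short Python sketch (an illustration, not part of the text):

```python
import math

# Weekly counts of returned items (table2_4 above).
counts = [22, 13, 28, 17, 22, 29, 32, 17, 19, 27, 48, 53, 31,
          22, 31, 27, 20, 24, 17, 22, 29, 30, 31, 22, 26, 24]

c_bar = sum(counts) / len(counts)             # average weekly count
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))  # lower limit, truncated at 0
ucl = c_bar + 3 * math.sqrt(c_bar)            # upper limit

# Weeks whose count falls outside the control limits.
out_weeks = [w + 1 for w, c in enumerate(counts) if c < lcl or c > ucl]
```

For these data the weeks with 48 and 53 returns fall above the UCL, matching the two spikes visible on the c chart.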

70
c Charts
symbol color=red h=.8;
title1 'c Chart for Number of Returned Items Per Week';
proc shewhart data=table2_4;
  cchart numReturned*Week;
run;

71
u Charts
  • Suppose the sample size in lot k is nk and the number of defects in lot k is ck; then the number of defects per unit in lot k is uk = ck/nk.
  • The control limits on the average number of defects per unit are u-bar ± 3√(u-bar/nk), where u-bar is the total number of defects divided by the total sample size.
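Because the limits depend on each lot's size nk, they must be computed lot by lot. The Python sketch below is illustrative; the counts and sizes are taken from the first five rolls of the fabric example that follows.

```python
import math

# Defect counts and sizes (square meters) for the first five fabric rolls.
defects  = [7, 11, 15, 6, 11]
sqmeters = [30.0, 27.6, 30.4, 34.8, 26.0]

# Overall defects per unit: total defects over total inspected size.
ubar = sum(defects) / sum(sqmeters)

# Per-lot control limits: ubar +/- 3*sqrt(ubar/n_k), truncated below at 0.
limits = []
for n_k in sqmeters:
    half = 3 * math.sqrt(ubar / n_k)
    limits.append((max(0.0, ubar - half), ubar + half))
```

Note that larger lots (bigger n_k) get tighter limits, which is why the u chart's limit lines step up and down from lot to lot.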

72
  • Example In a fabric manufacturing process, each
    roll of fabric is 30 meters long, and an
    inspection unit is defined as one square meter.
    Thus, there are 30 inspection units in each
    subgroup sample. Suppose now that the length of
    each piece of fabric varies. The following
    statements create a SAS data set (FABRICS) that
    contains the number of fabric defects and size
    (in square meters) of 25 pieces of fabric

73
data fabrics;
  input roll defects sqmeters @@;
  datalines;
 1  7 30.0   2 11 27.6   3 15 30.4   4  6 34.8   5 11 26.0
 6 15 28.6   7  5 28.0   8 10 30.2   9  8 28.2  10  3 31.4
11  3 30.3  12 14 27.8  13  3 27.0  14  9 30.0  15  7 32.1
16  6 34.8  17  7 26.5  18  5 30.0  19 14 31.3  20 13 31.6
21 11 29.4  22  6 28.6  23  6 27.5  24  9 32.6  25 11 31.7
;

74
  • The variable ROLL contains the roll number, the
    variable DEFECTS contains the number of defects
    in each piece of fabric, and the variable
    SQMETERS contains the size of each piece.
  • The following statements request a u chart for
    the number of defects per square meter
symbol color=vig;
title 'u Chart for Fabric Defects per Square Meter';
proc shewhart data=fabrics;
  uchart defects*roll / subgroupn = sqmeters
                        cframe    = steel
                        cinfill   = ligr
                        coutfill  = yellow
                        cconnect  = vig
                        outlimits = flimits;
run;

75
(No Transcript)
76
data abc;
  input lot @;
  do i = 1 to 5;
    input diamtr @;
    output;
  end;
  drop i;
  cards;
1 35.00 34.99 34.99 34.98 35.00
2 35.01 34.99 34.99 34.98 35.00
3 34.99 35.00 35.00 35.00 35.00
4 35.00 35.00 34.99 35.01 34.98
5 34.99 34.99 34.99 35.00 35.00
;
proc print data=abc noobs; run;
77
Constructing Control Charts With Summary Data In
SAS
data abc;
  input lot AX AS;
  AN = 5;
  cards;
1 34.992 0.02
2 34.994 0.03
3 34.998 0.01
4 34.996 0.03
5 34.994 0.01
;
title 'Mean and Standard Deviation Charts for Diameters';
symbol v=dot;
proc shewhart history=abc;
  xschart A*lot;
run;
quit;

78
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 3
  • The development of mean and standard deviation
    control charts

79
Homework 4
  • Pages 127-128: Problems 3.1(a), 3.4, 12, 13, 14

80
3.1 Introduction
  • In a process for manufacturing bolts that are required to have a 10 cm diameter, we often actually observe bolts with diameters other than 10 cm. This is a consequence of flaws in the production process.
  • These imperfections might be excessive lubricant temperature, bearing vibration, nonstandard raw materials, etc.
  • In SPC, these flaws can often be modeled as follows:
  • Let Y = the observed diameter. Then Y ~ N(µ, σ²).

81
  • To model a process while clearly taking account of possible individual flaws, we write the observed measurement in lot t, Y(t), as Y(t) = X0 + δ1X1 + … + δkXk, where each δi is 0 or 1 according to whether the ith flaw is present.
  • Here we have assumed additive flaws, representing k assignable causes. There may be, in any lot t, as many as 2^k possible combinations of flaws contributing to Y(t).

82
  • Let I be a subcollection of {1, 2, 3, ..., k}. Then Y(t) is the sum of X0 and the Xi with i in I.
  • In the special case where each distribution is normal, Y(t) is also normal.
  • In the above discussion, X0 accounts for the common cause, while the other Xi represent special causes. The major task of SPC is to identify these special causes and to take steps to remove them.

83
3.2 A Contaminated Production Process
  • We continue the discussion from Section 3.1. Let X0 ~ N(µ0, σ0²), where µ0 = 10 and σ0² = 0.01.
  • In addition, we have one special cause due to intermittent lubricant heating, say X1, which is N(0.4, 0.02), and another due to bearing vibration, say X2, which is N(−0.2, 0.08), with probabilities of occurrence p1 = 0.01 and p2 = 0.005, respectively.
  • So, for a sampled lot t, Y(t) can be written as Y(t) = X0 + δ1X1 + δ2X2.

84
Or, Y(t) has the following distribution
One can verify that
85
3.3 Estimation of Parameters of the Norm
Process
  • The norm process refers to a uncontaminated
    process whose mean and variance can be estimated
    respectively by
  • Properties
  • Both are unbiased
  • Both are asymptotically normal.

86
Estimating the Process Standard Deviation, s
  • An intuitive estimator for s is the square root
    of
  • A more commonly used estimator is the average of
    lot standard deviations

87
The Famous Result
88
  • One can thus show that
  • So, an unbiased estimator of the process standard
    deviation, s, is

89
  • One can also show that
  • This is because
  • So, another unbiased estimator of the process
    standard deviation, s, is

90
Which One is More Efficient?
91
(No Transcript)
92
Using the (Adjusted) Average of Lot Ranges to
Estimate the Process Standard Deviation
93
Using the (Adjusted) Median of Lot Standard
Deviations to Estimate the Process Standard
Deviation
94
3.4 Robust Estimators for Uncontaminated Process
Parameters
  • Suppose that the proportion of good lots is p and
    the proportion of bad lots is 1 p.
  • Suppose that data in a good lot come from a
    normal distribution with mean µ0 and standard
    deviation s0.
  • Suppose that data in a bad lot come from a normal
    distribution with mean µ1 and standard deviation
    s1.

95
  • How can we construct a control chart for lot
    means, based on data from the above contaminated
    distribution?
  • A solution The control limits are

96
A Simulation Study
  • Let's generate 90 lots of size 5 each. A lot is good with probability p = 0.70.
  • Data in a good lot are from N(10, 0.01).
  • Data in a bad lot are from N(9.8, 0.09).
  • Data were simulated in Excel.

97
A SAS Program Generate Good and Bad Lots
data table3_3;
  retain mu0 10;
  retain mu1 9.8;
  retain sigma0 0.1;
  retain sigma1 0.3;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.7) then
      do j = 1 to 5;
        x = mu0 + sigma0*rannor(1); /* good lot data */
        output;
      end;
    else
      do j = 1 to 5;
        x = mu1 + sigma1*rannor(1); /* bad lot data */
        output;
      end;
  end;
  keep Lot x;
run;
proc print; run;
symbol v=dot c=red;
proc shewhart;
  xschart x*Lot;
run; quit;
98
R Program Generate Data from Good and Bad Lots
# Note: rnorm() takes standard deviations, so we use 0.1 and 0.3
# (the square roots of the variances 0.01 and 0.09 quoted above).
mu0 <- 10.0; mu1 <- 9.8
sigma0 <- 0.1; sigma1 <- 0.3
p <- 0.7; n <- 90; k <- 5
x <- matrix(0, n, k)
for (i in 1:n) {
  u <- runif(1)
  if (u < p) {
    x[i, ] <- rnorm(k, mu0, sigma0)   # from a good lot
  } else {
    x[i, ] <- rnorm(k, mu1, sigma1)   # from a bad lot
  }
}
x
99
3.5 A Process with Mean Drift3.6 A Process with
Upward Drift in Variance
  • Skip

100
3.7 Charts for Individual Measurements
  • Grouping as many measurements as possible into a
    lot is important in order to get more accurate
    estimates of the population mean and standard
    deviation.
  • This is not possible in some situations.
  • The reason typically is that either the
    production rate is too slow or the production is
    performed under precisely the same conditions
    over short time intervals.

101
Construct Control Charts for Individual
Measurements
  • Let's consider 90 observations coming from
  • N(10, 0.01) with probability 0.855,
  • N(10.4, 0.03) with probability 0.095,
  • N(9.8, 0.09) with probability 0.045, and
  • N(10.2, 0.11) with probability 0.005.
  • We construct a scatterplot of the 90 measurements against corresponding lot numbers. The control limits are 10 ± 3(0.1), i.e., 9.7 and 10.3.

102
Moving Ranges Control Charts for Individual
Measurements
  • A better control chart for individual measurements is based on artificial lots of size 2 or 3 and the so-called moving ranges.
  • Given N individual measurements X1, X2, ..., XN, the ith moving range of n observations is defined as the difference between the largest and the smallest value in the ith artificial lot formed from the n measurements Xi, Xi+1, ..., Xi+n−1, where i = 1, 2, ..., N − n + 1.

103
  • The average of the N − n + 1 moving ranges is MR-bar = (MR1 + MR2 + … + MRN−n+1)/(N − n + 1).
  • The moving range control chart has LCL = D3(n)·MR-bar and UCL = D4(n)·MR-bar.
  • D3(n) and D4(n) are given in Table 3.9 of the text, p. 115.

104
  • Given N individual measurements X1, X2, ..., XN, the N − 1 moving ranges of n = 2 observations are defined as
  • MR1 = |X2 − X1|, MR2 = |X3 − X2|, ..., MRN−1 = |XN − XN−1|.
  • The average of the N − 1 moving ranges is MR-bar = (MR1 + … + MRN−1)/(N − 1).
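A small Python sketch of the moving-range computation (the five-point series below is made up purely for illustration):

```python
def moving_ranges(xs, n=2):
    """MR_i = max - min of the artificial lot (x_i, ..., x_{i+n-1})."""
    return [max(xs[i:i + n]) - min(xs[i:i + n]) for i in range(len(xs) - n + 1)]

# Illustrative individual measurements.
x = [10.02, 9.98, 10.05, 10.01, 9.97]

mrs = moving_ranges(x)            # N - 1 = 4 moving ranges for n = 2
mr_bar = sum(mrs) / len(mrs)      # average moving range
```

With N = 5 and n = 2 this yields four moving ranges, and `mr_bar` is the MR-bar used in the control limits above.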

105
X Charts based on Moving Ranges
  • For given individual measurements X1, X2, ..., XN, the X chart is constructed by estimating the population standard deviation σ as the product of b2 and the average of the moving ranges.
  • The control limits are X-bar ± 3·b2·MR-bar.

106
SAS Code for X Charts based on Moving Ranges
data table3_6;
  mu0 = 10; mu1 = 10.4; mu2 = 9.8; mu3 = 10.2;
  sigma0 = 0.1; sigma1 = sqrt(0.03);
  sigma2 = 0.3; sigma3 = sqrt(0.11);
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      x = mu0 + sigma0*rannor(1); output;
    end;
    else if (u < 0.95) then do;
      x = mu1 + sigma1*rannor(1); output;
    end;
    else if (u < 0.995) then do;
      x = mu2 + sigma2*rannor(1); output;
    end;
    else do;
      x = mu3 + sigma3*rannor(1); output;
    end;
  end;
  keep Lot x;
run;
proc print; run;
symbol v=dot c=red;
proc shewhart;
  xchart x*Lot;
run; quit;
107
3.8 Process Capability
  • We have discussed some control charts for detecting Pareto glitches in a process. Whether an in-control process meets some technological specification is another important issue.
  • It is summary statistics for lots that are examined for the purpose of controlling a process, while individual measurements are compared to specifications.
  • The capability of an in-control process in relation to technological specifications is measured by certain indices.
108
The Cp Index
109
The Cp Index for Nonnormal Data
110
The Cpk Index
  • The index Cp = (USL − LSL)/(6σ̂) does not account for process centering.
  • To account for process centering, use Cpk = min((USL − µ̂)/(3σ̂), (µ̂ − LSL)/(3σ̂)).
111
  • Example: Use Table 3.10 on textbook page 121 to calculate Cp and Cpk. Suppose that the diameter was specified to be 6.75 mm with tolerances ±0.1 mm; that is, LSL = 6.65 mm and USL = 6.85 mm.
  • Solution: From the table,
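The two indices can be computed in a few lines of Python. The LSL and USL below come from the problem statement; the mean and standard deviation estimates are placeholders only, since the actual values come from Table 3.10 of the text.

```python
# Specification limits from the problem statement.
LSL, USL = 6.65, 6.85

# Hypothetical process estimates (stand-ins for the Table 3.10 values).
mu_hat, sigma_hat = 6.74, 0.03

# Cp compares the spec width to the process spread; Cpk also penalizes
# off-center processes by taking the nearer specification limit.
Cp = (USL - LSL) / (6 * sigma_hat)
Cpk = min((USL - mu_hat) / (3 * sigma_hat),
          (mu_hat - LSL) / (3 * sigma_hat))
```

Note that Cpk ≤ Cp always, with equality only when the process is centered at the midpoint of the specification interval.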

112
  • http://www.itl.nist.gov/div898/handbook/pmc/section1/pmc16.htm

113
SAS proc capability
data amps;
  label decibels = 'Amplification in Decibels (dB)';
  input decibels @@;
  datalines;
4.54 4.87 4.66 4.90 4.68 5.22 4.43 5.14 3.07
4.22 5.09 3.41 5.75 5.16 3.96 5.37 5.70 4.11 4.83
4.51 4.57 4.16 5.73 3.64 5.48 4.95 4.57 4.46 4.75
5.38 5.19 4.35 4.98 4.87 3.53 4.46 4.57 4.69 5.27
4.67 5.03 4.50 5.35 4.55 4.05 6.63 5.32 5.24 5.73
5.08 5.07 5.42 5.05 5.70 4.79 4.34 5.06 4.64 4.82
3.24 4.79 4.46 3.84 5.05 5.46 4.64 6.13 4.31 4.81
4.98 4.95 5.57 4.11 4.15 5.95
;
run;

title 'Boosting Power of Telephone Amplifiers';
legend2 frame cframe=ligr cborder=black position=center;
proc capability data=amps noprint alpha=0.10;
  var decibels;
  spec target = 5  lsl = 4  usl = 6
       ltarget = 2 llsl = 3 lusl = 4
       ctarget = red clsl = yellow cusl = yellow;
  histogram decibels / cframe   = ligr
                       cfill    = steel
                       cbarline = white
                       legend   = legend2;
  inset cpklcl cpk cpkucl / header = '90% Confidence Interval'
                            cframe = black
                            ctext  = black
                            cfill  = ywh
                            format = 6.3;
run;
114
(No Transcript)
115
The following statements can be used to produce a table of process capability indices, including the index Cpk:

ods select indices;
proc capability data=amps alpha=0.10;
  spec target = 5 lsl = 4 usl = 6
       ltarget = 2 llsl = 3 lusl = 4;
  var decibels;
run;
116
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 4
  • Sequential Approaches

117
Homework 5
  • Page 166: Problems 4.1 (do the first part only) and 4.10

118
4.1 Introduction
  • The first three chapters deal with Shewhart control charts, which are useful in detecting special cause variation.
  • A major disadvantage of a Shewhart control chart is that it uses only the information about the process contained in the last sample observation, ignoring any information given by the entire sequence of points. This feature makes Shewhart control charts insensitive to small process shifts.
  • This chapter deals with two alternatives to the Shewhart control charts: cumulative sum (CUSUM) control charts and exponentially weighted moving average (EWMA) control charts, both of which are sensitive to small process drifts.

119
4.2 The Sequential Likelihood Ratio Test
  • Suppose we have a time-ordered data set x1, x2, ..., xn coming from a distribution with density f(x; θ). We may wish to test whether the true parameter is θ0 or θ1.
  • A natural criterion for deciding between the two parameters is the log-likelihood ratio z = ln[ f(x1; θ1)···f(xn; θ1) / (f(x1; θ0)···f(xn; θ0)) ].

120
Decision Rule
  • We propose the following decision rule:
  • When z ≤ ln(k0), with (x1, x2, ..., xn) said to be in region Gn0, we decide for θ0;
  • When z ≥ ln(k1), with (x1, x2, ..., xn) said to be in region Gn1, we decide for θ1;
  • Otherwise, we are in region Gn, and we continue sampling.

121
  • Before we make our decision, our sample (x1, x2, ..., xn) falls in one of three regions: Gn0, Gn1, and Gn.
  • Denote the true parameter by θ. The probability of ever declaring for θ0 is given by L(θ) = P(G10 | θ) + P(G20 | θ) + ···.
  • By the definition of Gn0, on that region the likelihood under θ1 is at most k0 times the likelihood under θ0.
122
Let us suppose that if θ is truly equal to θ0, we wish to have L(θ0) = 1 − α. Let us suppose that if θ is truly equal to θ1, we wish to have L(θ1) = β. Here α and β are customarily referred to as the Type I and Type II errors. Then we must have β = L(θ1) ≤ k0·L(θ0) = k0(1 − α), so k0 ≥ β/(1 − α). By a similar argument for Gn1, we have k1 ≤ (1 − β)/α. In practice, choose k0 = β/(1 − α) and k1 = (1 − β)/α.

123
We can show that the actual Type I and Type II errors, say α′ and β′, satisfy α′ ≤ α/(1 − β) and β′ ≤ β/(1 − α).

124
4.3 CUSUM Test for Shift of the Mean
  • To detect a shift of the mean of a production process from µ0 to some other value µ1, we consider a sequential test.
  • We assume that the process variance σ² is known and does not change.
  • We propose a test on the basis of the log-likelihood ratio of N sample means, each sample being of size n. The test statistic is R1 = (n(µ1 − µ0)/σ²) Σj=1..N (x̄j − (µ0 + µ1)/2).
125
The test statistic R1 is based on a cumulative
sum. It is not so much oriented to detecting
Pareto glitches, but rather to discovering a
persistent change in the mean.
126
  • For given Type I and Type II errors, say α = 0.01 and β = 0.01, the sequential test procedure gives the limits ln(k0) = ln(β/(1 − α)) ≈ −4.595 and ln(k1) = ln((1 − β)/α) ≈ 4.595.
  • The CUSUM chart is a plot of R1, based on data up to the jth sample, versus j, j = 1, 2, ..., N, together with the control limits ln(k0) and ln(k1).

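The R1 formula appears only as an image in the original slides. The Python sketch below gives the standard cumulative log-likelihood-ratio statistic it is built on for normal lot means with known σ; the constants (n = 5 per lot, σ = 0.1, μ0 = 10.0, shifted mean μ1 = 10.1) mirror the SAS example that follows, and the five lot means are invented for illustration.

```python
def cusum_r1(lot_means, mu0=10.0, mu1=10.1, sigma=0.1, n=5):
    """R1 after each lot: n*(mu1 - mu0)/sigma^2 times the cumulative
    deviation of the lot means from the midpoint (mu0 + mu1)/2."""
    coef = n * (mu1 - mu0) / sigma ** 2     # = 50 with these constants
    r1, total = [], 0.0
    for xbar in lot_means:
        total += xbar - (mu0 + mu1) / 2     # deviation from the midpoint
        r1.append(coef * total)
    return r1

r1 = cusum_r1([10.02, 10.05, 10.01, 10.12, 10.15])
flagged = [j for j, v in enumerate(r1, start=1) if v > 4.595]
print(flagged)
```

A lot is flagged once the cumulative statistic crosses the upper limit ln(k1) = 4.595; here the two high lot means at the end push R1 over the line at lot 5.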
127
data table3_6;
  retain mu0 10 mu1 10.4 mu2 9.8 mu3 10.2;
  retain sigma0 0.1 sigma1 0.1732 sigma2 0.3 sigma3 0.3316625;
  array x(5) x1 x2 x3 x4 x5;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      do i = 1 to 5;
        x(i) = mu0 + sigma0*rannor(1);
      end;
      output;
    end;
    else if (u < 0.95) then do;
      do i = 1 to 5;
        x(i) = mu1 + sigma1*rannor(1);
      end;
      output;
    end;
    else if (u < 0.995) then do;
      do i = 1 to 5;
        x(i) = mu2 + sigma2*rannor(1);
      end;
      output;
    end;
    else do;
      do i = 1 to 5;
        x(i) = mu3 + sigma3*rannor(1);
      end;
      output;
    end;
  end;
  keep Lot x1-x5;
run;

proc print; run;

data means;
  set table3_6;
  sum + mean(of x1-x5);
  m = sum/Lot;
  R1 = Lot*(5/0.1)*(m - (10 + 10.1)/2);
  LCL = -4.595;
  UCL = 4.595;
  keep Lot R1 LCL UCL;
run;

symbol1 v=dot c=blue r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=blue r=1;
proc gplot data=means;
  plot (LCL R1 UCL)*Lot / overlay;
  label R1 = 'R1 statistic';
run;

data long;
  set table3_6;
  Lot = _N_;
  array x{5} x1-x5;
  do i = 1 to 5;
    y = x(i);
    output;
  end;
  keep Lot y;
run;

proc cusum data=long;
  xchart y*Lot / mu0=10.0 sigma0=0.1 delta=1
                 alpha=0.1 vaxis=-20 to 80;
run;
quit;
128
4.4 Shewhart CUSUM Charts
  • A popular empirical alternative to the CUSUM
    chart is the Shewhart CUSUM chart. This chart is
    based on the pooled cumulative/running means
  • Suppose that all the lot means are iid with
    common mean μ0 and variance σ0²/n. Then
  • A Shewhart CUSUM chart for mean shift is one that
    plots zi against i, along with the horizontal
    lines -3 and 3.

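The statistic zi can be sketched directly: standardize the running mean of the first i lot means. This Python version is an illustrative translation of the SAS code below (n = 5, μ0 = 10, σ0 = 0.1); the lot means are invented.

```python
import math

def shewhart_cusum(lot_means, mu0=10.0, sigma0=0.1, n=5):
    """z_i = sqrt(i*n) * (running mean of first i lot means - mu0)/sigma0."""
    z, total = [], 0.0
    for i, xbar in enumerate(lot_means, start=1):
        total += xbar
        z.append(math.sqrt(i * n) * (total / i - mu0) / sigma0)
    return z

z = shewhart_cusum([10.02, 9.98, 10.00, 10.30])
out_of_control = [i for i, v in enumerate(z, start=1) if abs(v) > 3]
print(out_of_control)
```

The one inflated lot mean at position 4 drags the running mean far enough from μ0 that z4 exceeds 3, so the chart signals there.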
129
Shewhart CUSUM Charts
data means2;
  set table3_6;
  sum + mean(of x1-x5);
  m = sum/Lot;
  R2 = sqrt(Lot*5)/0.1*(m - 10);
  LCL = -3;
  UCL = 3;
  keep Lot sum m R2 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=means2;
  plot (R2 LCL UCL)*Lot / overlay;
  label R2 = 'R2 statistic';
run;
quit;
130
4.8 Acceptance-Rejection CUSUMs
  • Let p denote the proportion of defective goods
    from a production system.
  • Let p0 denote the target proportion deemed
    appropriate.
  • When p rises to p1, intervention will be
    introduced.
  • Let nj denote the size of lot j. Then the
    likelihood ratio is given by

131
132
CUSUM Test for Defect Data
  • To detect a drift in the process proportion, plot
    R5 versus lot number N. Two horizontal lines, R5 =
    -4.596 and R5 = 4.596, are also plotted.
  • Any point in the plot that is above the line R5 =
    4.596 indicates a drift in the process proportion.

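The binomial CUSUM can be sketched as follows; this Python version is an illustrative translation of the SAS code below, with target proportion p0 = 0.03, intervention level p1 = 0.05, lots of size 100, and the defective counts of the first 20 lots of Table 4.8.

```python
import math

def cusum_r5(defectives, size=100, p0=0.03, p1=0.05):
    """Cumulative binomial log-likelihood ratio R5 after each lot."""
    r5, xsum, nsum = [], 0, 0
    for d in defectives:
        xsum += d
        nsum += size
        r5.append(math.log(p1 / p0) * xsum
                  + math.log((1 - p1) / (1 - p0)) * (nsum - xsum))
    return r5

r5 = cusum_r5([3, 2, 5, 0, 6, 4, 2, 4, 1, 2,
               7, 9, 11, 12, 14, 15, 12, 10, 8, 3])
flagged = [j for j, v in enumerate(r5, start=1) if v > 4.596]
print(flagged[0])   # first lot at which the chart signals
```

The early lots keep R5 negative; the run of high-defect lots starting around lot 11 pushes the statistic over the 4.596 limit at lot 14.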
133
data table4_8;
  Lot = _N_;
  input defective proportion @@;
  size = 100;
  cards;
3 0.03  2 0.02  5 0.05  0 0.00  6 0.06  4 0.04  2 0.02  4 0.04  1 0.01  2 0.02
7 0.07  9 0.09 11 0.11 12 0.12 14 0.14 15 0.15 12 0.12 10 0.10  8 0.08  3 0.03
5 0.05  6 0.06  0 0.00  1 0.01  3 0.03  3 0.03  4 0.04  6 0.06  5 0.05  5 0.05
3 0.03  3 0.03  7 0.07  8 0.08  2 0.02  0 0.00  6 0.06  7 0.07  4 0.04  4 0.04
;
run;
data new;
  set table4_8;
  p1 = 0.05;
  p0 = 0.03;
  xsum + defective;
  nsum + size;
  R5 = log(p1/p0)*xsum + log((1-p1)/(1-p0))*(nsum - xsum);
  LCL = -4.596;
  UCL = 4.596;
  keep Lot R5 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=new;
  plot (R5 LCL UCL)*Lot / overlay;
run;
quit;
134
Shewhart CUSUM Test for Defect Data
Plot R6 against Lot, along with the two lines R6 =
-3 and R6 = 3.
135
data table4_8;
  Lot = _N_;
  input defective proportion @@;
  size = 100;
  cards;
3 0.03  2 0.02  5 0.05  0 0.00  6 0.06  4 0.04  2 0.02  4 0.04  1 0.01  2 0.02
7 0.07  9 0.09 11 0.11 12 0.12 14 0.14 15 0.15 12 0.12 10 0.10  8 0.08  3 0.03
5 0.05  6 0.06  0 0.00  1 0.01  3 0.03  3 0.03  4 0.04  6 0.06  5 0.05  5 0.05
3 0.03  3 0.03  7 0.07  8 0.08  2 0.02  0 0.00  6 0.06  7 0.07  4 0.04  4 0.04
;
run;
data new;
  set table4_8;
  p1 = 0.05;
  p0 = 0.03;
  xsum + defective;
  nsum + size;
  R6 = (xsum - p0*nsum)/sqrt(nsum*p0*(1-p0));
  LCL = -3;
  UCL = 3;
  keep Lot R6 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=new;
  plot (R6 LCL UCL)*Lot / overlay;
run;
quit;
136
STAT 424/524 Statistical Design for Process
Improvement
  • Lecture 5
  • Exploratory Techniques for Preliminary Analysis

137
Homework 6
  • Page 220 problems 2, 4, 5, 6

138
5.2 The Schematic Plot: The Boxplot
  • The schematic (box-and-whisker) plot displays, from
    top to bottom: the maximum observation, the upper
    whisker, the 75th percentile, the mean (specified
    with the SYMBOL1 statement), the median, the 25th
    percentile, the lower whisker, and the minimum
    observation.
  • The interquartile range (IQR) is the distance from
    the 25th percentile to the 75th percentile.
  • The upper fence (not drawn) lies 1.5 × IQR above
    the 75th percentile; the lower fence (not drawn)
    lies 1.5 × IQR below the 25th percentile.
  • BOXSTYLE = schematic (or schematicid, or
    schematicidfar, if an id statement is used)
  • Observations that fall outside the fences point to
    Pareto glitches.
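The schematic-boxplot quantities can be sketched in a few lines. The quantile rule below (no interpolation) is a simplification for illustration, and the data are invented; SAS uses its own percentile definitions.

```python
def box_summary(data):
    """Quartiles, fences at 1.5*IQR beyond the quartiles, and outliers."""
    xs = sorted(data)
    n = len(xs)

    def quantile(q):
        # crude index-based percentile, good enough for a sketch
        return xs[min(n - 1, int(q * n))]

    q1, med, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"q1": q1, "median": med, "q3": q3,
            "fences": (lower, upper),
            "outliers": [x for x in xs if x < lower or x > upper]}

summary = box_summary([9.9, 10.0, 10.1, 10.0, 9.8, 10.2, 10.1, 12.0])
print(summary["outliers"])
```

The single observation at 12.0 falls above the upper fence, so the schematic boxplot would flag it as a potential Pareto glitch.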
139
data myData;
  retain mu0 10 mu1 10.4 mu2 9.8 mu3 10.2;
  retain sigma0 0.1 sigma1 0.1732 sigma2 0.3 sigma3 0.3316625;
  array x(5) x1 x2 x3 x4 x5;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      do i = 1 to 5;
        x(i) = mu0 + sigma0*rannor(1);
      end;
      output;
    end;
    else if (u < 0.95) then do;
      do i = 1 to 5;
        x(i) = mu1 + sigma1*rannor(1);
      end;
      output;
    end;
    else if (u < 0.995) then do;
      do i = 1 to 5;
        x(i) = mu2 + sigma2*rannor(1);
      end;
      output;
    end;
    else do;
      do i = 1 to 5;
        x(i) = mu3 + sigma3*rannor(1);
      end;
      output;
    end;
  end;
  keep Lot x1-x5;
run;

data mean;
  set myData;
  lotMean = mean(of x1-x5);
  x = "-";
  keep Lot lotMean x;
run;

symbol v=plus c=blue;
title 'Box Plot of Lot Means';
proc boxplot;                        /* create side-by-side boxplot */
  plot lotMean*x / boxstyle=schematicidfar
                   idsymbol=circle;  /* identify obs. outside the fences */
  id Lot;
  label x = '';
run;
140
5.3 Smoothing by Threes
  • Signal is usually contaminated with noise. John
    Tukey developed the so-called 3R smoother, which
    repeatedly applies running medians of three until
    nothing changes; it removes the jitter and enables
    one to better approximate the signal.
  • http://www.galaxy.gmu.edu/ACAS/ACAS00-02/ACAS00/ThompsonJames/ThompsonJames.pdf

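The smoother described above fits in a dozen lines. This Python sketch is illustrative (the slide that follows refers to a SAS or R program); the input series is invented.

```python
def smooth_3r(x):
    """Tukey's 3R smooth: running medians of 3, Repeated until no change."""
    x = list(x)
    while True:
        y = x[:]
        for i in range(1, len(x) - 1):
            # replace each interior point by the median of its neighborhood
            y[i] = sorted((x[i - 1], x[i], x[i + 1]))[1]
        if y == x:           # a full pass changed nothing: converged
            return y
        x = y

print(smooth_3r([1, 9, 2, 3, 8, 3, 4]))
```

The isolated spikes at 9 and 8 are removed while the underlying monotone trend is preserved, which is exactly the behavior wanted when hunting for signal under jitter.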
141
3R SAS or R Program
142
5.4 Bootstrapping
  • Most of the standard testing in SPC is based on
    the assumption that lot means are normally
    distributed.
  • This assumption is questionable because
    measurements may not be normal and lot sizes are
    usually small, say less than 10.
  • To avoid the normality assumption, one uses
    resampling.
  • Bootstrapping is one of the resampling methods.

143
Bootstrapping Means
  • Suppose we have a data set of size n. We wish to
    construct a bootstrap confidence interval for the
    mean of the distribution from which the data were
    taken.
  • There are at least four methods for bootstrapping
    the mean.

144
The Percentile Method
  • The procedure is as follows
  • Select with replacement n of the original
    observations. Such a sample is called a bootstrap
    sample. Compute the mean of this bootstrap
    sample.
  • Repeat the resampling procedure B = 10,000 times.
  • The B means are denoted as x̄1*, x̄2*, …, x̄B*.
  • Order these means from smallest to largest.
  • Denote the 250th largest value and the 9750th
    largest value as a and b, respectively; then the
    95% percentile bootstrap confidence interval of
    the mean is [a, b].

145
R program for the Percentile Method
boot <- function(x, B) {
  n <- length(x)
  A <- matrix(0, B, n)
  for (i in 1:B) {
    A[i, ] <- sample(x, n, replace = TRUE)
  }
  A
}
x <- c(2, 5, 1, 8, 3, 2)
D <- boot(x, 10000)
y <- apply(D, 1, mean)  # y holds 10000 bootstrap means
y <- sort(y)            # order the means before taking y[250], y[9750]
confidenceInterval <- c(y[250], y[9750])
146
Lunneborg's Method
  • Denote the mean of the original sample by
  • Clifford Lunneborg proposed to use
    as the 95% confidence interval of the mean.

147
The Bootstrapped t Method
  • Denote the B bootstrap standard deviations as
  • Calculate the B t values
  • Order these t values from smallest to largest.
    Denote the 250th as a and the 9750th as b. Then
    the 95% bootstrapped t confidence interval is

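One common form of the bootstrapped-t interval sketched above can be written in Python for illustration (the slides use R); B is kept small so the sketch runs quickly, and the data are the same toy sample as in the percentile example.

```python
import random
import statistics

def bootstrap_t_interval(x, B=2000, seed=12345):
    """95% bootstrapped-t interval for the mean of x."""
    rng = random.Random(seed)
    n = len(x)
    xbar = statistics.mean(x)
    se = statistics.stdev(x) / n ** 0.5
    ts = []
    for _ in range(B):
        boot = [rng.choice(x) for _ in range(n)]
        sb = statistics.stdev(boot)
        if sb == 0:
            continue                    # degenerate resample, skip it
        ts.append((statistics.mean(boot) - xbar) / (sb / n ** 0.5))
    ts.sort()
    a, b = ts[int(0.025 * len(ts))], ts[int(0.975 * len(ts))]
    return xbar - b * se, xbar - a * se  # note the reversed quantiles

lo, hi = bootstrap_t_interval([2, 5, 1, 8, 3, 2])
print(lo < 3.5 < hi)                     # 3.5 is the sample mean
```

The interval endpoints use the upper t quantile on the left and the lower one on the right, which is the usual pivot inversion for the bootstrapped t.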
148
The BCa Method
  • A better confidence interval for a parameter is
    constructed using the BCa (bias correction and
    acceleration) method.
  • One may be concerned with two problems. One is
    that the sample estimate may be a biased estimate
    of the population parameter.
  • Another problem is that the standard deviation of
    the sample estimate usually depends on the
    unknown parameter we are trying to estimate.
  • To deal with the two problems, Bradley Efron
    proposed the BCa method.
  • For details, refer to this paper.

149
5.5 Pareto and Ishikawa Diagrams
  • The Pareto diagram tells top management where it
    is most appropriate to spend resources in finding
    problems.
  • The Ishikawa diagram, also known as fishbone
    diagram or cause and effect diagram, is favored
    by some as a tool for finding the ultimate cause
    of a system failure. See an example of such a
    diagram on page 197.

150
Create Pareto Charts Using SAS
data failure3;
  input cause $ 1-16 count;
  cards;
Contamination   14
Corrosion        2
Doping           1
Metallization    2
Miscellaneous    3
Oxide Defect     8
Silicon Defec    1
;
run;
title 'Analysis of IC Failures';
symbol color=salmon;
proc pareto data=failure3;
  vbar cause / freq=count
               scale=count
               interbar=1.0
               last='Miscellaneous'
               nlegend='Total Circuits'
               cframenleg=ywh
               cframe=green
               cbars=vigb;
run;
151
5.6 A Bayesian Pareto Analysis for System
Optimization of the Space Station
152
STAT 424 Statistical Design for Process
Improvement
  • Lecture 6
  • Introductory Statistical Inference and Regression
    Analysis

153
1.1 Elementary Statistical Inference
  • Population
  • Sample
  • Statistical inference
  • the endeavor that uses sample data to make
    decisions about a population.
  • Statistic
  • Estimators and estimates
  • Random variable

154
Unbiasedness and Efficiency
  • Unbiasedness

155
Suppose θ is an unknown parameter which is to be
estimated from measurements x, distributed
according to some probability density function
f(x; θ). It can be shown that the variance of any
unbiased estimator θ̂ of θ is bounded below by the
inverse of the Fisher information I(θ):
    Var(θ̂) ≥ 1/I(θ)  (the Cramér–Rao lower bound),
where the Fisher information is defined by
    I(θ) = E[(∂ ln L(θ)/∂θ)²],
ln L(θ) is the natural logarithm of the likelihood
function, and E denotes the expected value. The
efficiency of an unbiased estimator θ̂ is defined
to be the ratio
    e(θ̂) = [1/I(θ)] / Var(θ̂).
The sample mean and sample median of a normal
sample are both unbiased estimators of the
population mean. The sample mean is more
efficient.
156
Point and Interval Estimation
  • When we estimate a parameter θ by θ̂, we say
  • θ̂ is a point estimator of θ.
  • Alternatively, we use an interval to locate the
    unknown parameter θ. Such an interval contains
    the unknown parameter with some probability 1 -
    α. The interval is called a 1 - α confidence
    interval.
  • A 95% confidence interval means that, when the
    random sampling procedure is repeated 1000 times,
    among the 1000 confidence intervals, about 950
    will cover the unknown parameter θ.

157
Confidence Intervals for the Mean of a Normal
Population
  • We consider a population that is normally
    distributed as N(μ, σ²).
  • If the variance σ² is known, then the exact 1 - α
    confidence interval for μ is x̄ ± z(α/2) σ/√n.
  • But σ is usually unknown. We estimate it by the
    sample standard deviation s. The exact 1 - α
    confidence interval for μ is then
    x̄ ± t(α/2, n-1) s/√n.

158
Normal-theory Based Confidence Interval for a
Parameter ?
  • The point estimator θ̂ is usually approximately
    normally distributed when the sample size n is
    large (> 30), even for a non-normal population.
  • A 1 - α confidence interval is then constructed
    as θ̂ ± z(α/2) se(θ̂).

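The large-sample interval above can be sketched with only the standard library; as an illustration it is applied here to the first 30 of the female-height measurements from the example that follows.

```python
import statistics
from statistics import NormalDist

def normal_ci(x, alpha=0.05):
    """Large-sample CI: estimate +/- z_(alpha/2) * standard error."""
    n = len(x)
    se = statistics.stdev(x) / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = statistics.mean(x)
    return xbar - z * se, xbar + z * se

heights = [64.1, 60.9, 64.1, 64.7, 66.7, 65.0, 63.7, 67.4, 64.9, 63.7,
           64.0, 67.5, 62.8, 63.9, 65.9, 62.3, 64.1, 60.6, 68.6, 68.6,
           63.7, 63.0, 64.7, 68.2, 66.7, 62.8, 64.0, 64.1, 62.1, 62.9]
lo, hi = normal_ci(heights)
print(round(lo, 2), round(hi, 2))
```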
159
Examples
data Heights;
  label Height = 'Height (in)';
  input Height @@;
  datalines;
64.1 60.9 64.1 64.7 66.7 65.0 63.7 67.4 64.9
63.7 64.0 67.5 62.8 63.9 65.9 62.3 64.1 60.6
68.6 68.6 63.7 63.0 64.7 68.2 66.7 62.8 64.0
64.1 62.1 62.9 62.7 60.9 61.6 64.6 65.7 66.6
66.7 66.0 68.5 64.4 60.5 63.0 60.0 61.6 64.3
60.2 63.5 64.7 66.0 65.1 63.6 62.0 63.6 65.8
66.0 65.4 63.5 66.3 66.2 67.5 65.8 63.1 65.8
64.4 64.0 64.9 65.7 61.0 64.1 65.5 68.6 66.6
65.7 65.1 70.0
;
run;
title 'Analysis of Female Heights';
proc univariate data=Heights mu0=65 alpha=0.05 normal;
  var Height;
  histogram Height;
  qqplot Height;
  probplot Height;
run;
160
Confidence Interval for Difference between Two
Means of Normal Populations with Unequal Known
Variance
161
Confidence Interval for Difference between Two
Means of Normal Populations with Equal Unknown
Variance
162
Confidence Interval for Difference between Two
Means (Equal Unknown Variances), When Sample
Sizes Are Large
163
Examples
164
Confidence Interval for a Proportion
165
SAS Procedure for a Proportion PROC FREQ
data Color;
  input Region Eyes $ Hair $ Count @@;
  label Eyes   = 'Eye Color'
        Hair   = 'Hair Color'
        Region = 'Geographic Region';
  datalines;
1 blue  fair   23  1 blue  red     7  1 blue  medium 24
1 blue  dark   11  1 green fair   19  1 green red     7
1 green medium 18  1 green dark   14  1 brown fair   34
1 brown red     5  1 brown medium 41  1 brown dark   40
1 brown black   3  2 blue  fair   46  2 blue  red    21
2 blue  medium 44  2 blue  dark   40  2 blue  black   6
2 green fair   50  2 green red    31  2 green medium 37
2 green dark   23  2 brown fair   56  2 brown red    42
2 brown medium 53  2 brown dark   54  2 brown black  13
;
proc freq data=Color order=freq;
  weight Count;
  tables Eyes / binomial alpha=.1;
  tables Hair / binomial(p=.28);
  title 'Hair and Eye Color of European Children';
run;
166
Confidence Interval for the Difference between
Two Proportions (Independent Samples)
167
Confidence Interval for the Difference between
Two Proportions (Paired Samples)
168
Examples
169
Tests of Hypotheses
  • The null hypothesis
  • The alternative hypothesis
  • Type I and type II errors
  • Level of significance

170
One Sample t-Test
title 'One-Sample t Test';
data time;
  input time @@;
  datalines;
43 90 84 87 116 95 86 99 93 92 121 71 66 98 79 102 60 112 105 98
;
run;
proc ttest h0=80 alpha=0.05;
  var time;
run;
171
Two-Sample t-Test Comparing Group Means
  • Equal variance case
  • Unequal variance case

172
Two-Sample t-Test Comparing Group Means
title 'Comparing Group Means';
data OnyiahExample1_14;
  input machine speed @@;
  datalines;
1 1603 1 1604 1 1605 1 1605 1 1602 1 1601 1 1596 1 1598 1 1599 1 1602
1 1614 1 1612 1 1607 1 1593 1 1604 2 1602 2 1597 2 1596 2 1601 2 1599
2 1603 2 1604 2 1602 2 1601 2 1607 2 1600 2 1596 2 1595 2 1606 2 1597
;
run;
proc ttest;  /* produces results for both equal and unequal variances */
  class machine;
  var speed;
run;
Question: How can you find the p-value for a
one-sided test? Use symmetry.
173
Paired Comparison Paired t-Test
Pairs (i)   Before Treatment   After Treatment   Differences (di)
    1             Y11                Y12            Y11 - Y12
    2             Y21                Y22            Y21 - Y22
    3             Y31                Y32            Y31 - Y32
    .              .                  .                 .
    n             Yn1                Yn2            Yn1 - Yn2
174
Two-Sample Paired t-Test Comparing Group Means
title 'Paired Comparison';
data pressure;
  input SBPbefore SBPafter @@;
  d = SBPbefore - SBPafter;
  datalines;
120 128  124 131  130 131  118 127  140 132  128 125
140 141  135 137  126 118  130 132  126 129  127 135
;
run;
proc univariate;
  var d;
run;
proc ttest;
  paired SBPbefore*SBPafter;
run;
175
Operating Characteristic (OC) Curves
176
Find the Power of the Test for a Population Mean
  • Assume that we have the following
  • H0: μ = 1500 (1500 is called the claimed value)
  • H1: μ < 1500
  • Sample size n = 20
  • Significance level α = 0.05
  • The population standard deviation is known: σ =
    110.
  • Question
  • (1) Find the power of the test, which is the
    probability of rejecting the null hypothesis,
    given that the population mean is actually 1450
    (called the alternative value).
  • (2) Find the power corresponding to any
    alternative μ. Plot the power against μ.

177
  • Since the test is left-tailed, the rejection
    region is the left tail on the number line. The
    borderline value is -zα = -1.645. That is, the
    rejection region can be written as
    (x̄ - μ0)/(σ/√n) < -1.645.
  • Substitute μ0 = 1500, σ = 110, and n = 20 to
    solve the above inequality: x̄ < 1459.54.
  • When μ is actually 1450, x̄ follows a normal
    distribution with mean 1450 and standard
    deviation σ/√n = 24.6.
  • The power is P(x̄ < 1459.54 | μ = 1450) ≈ 0.65.
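The arithmetic above can be checked numerically. This Python sketch (illustrative, standard library only) reproduces the left-tailed power at the alternative μ = 1450.

```python
from statistics import NormalDist

# Left-tailed z test of H0: mu = 1500 vs H1: mu < 1500,
# sigma = 110, n = 20, alpha = 0.05, evaluated at mu = 1450.
nd = NormalDist()
mu0, mu1, sigma, n, alpha = 1500, 1450, 110, 20, 0.05
se = sigma / n ** 0.5
cutoff = mu0 - nd.inv_cdf(1 - alpha) * se    # reject H0 when xbar < cutoff
power = nd.cdf((cutoff - mu1) / se)
print(round(cutoff, 2), round(power, 3))
```

The cutoff comes out near 1459.54 and the power near 0.65, matching the slide's hand computation.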
178
R codes for Power Calculation and Plot
power.mean <- function(mu0 = 1500, mu1 = 1450, sigma = 110,
                       n = 20, level = 0.05,
                       tail = c("left", "two", "right")) {
  tail <- match.arg(tail)
  s <- sigma / sqrt(n)
  if (tail == "two") {
    E <- qnorm(1 - level/2) * s
    c1 <- mu0 - E
    c2 <- mu0 + E
    pL <- pnorm(c1, mu1, s)
    pR <- pnorm(c2, mu1, s)
    power <- 1 - pR + pL
  } else if (tail == "left") {
    E <- qnorm(1 - level) * s
    c1 <- mu0 - E
    power <- pnorm(c1, mu1, s)
  } else {
    E <- qnorm(1 - level) * s
    c2 <- mu0 + E
    power <- 1 - pnorm(c2, mu1, s)
  }
  return(power)
}

power.mean(mu0 = 1500, mu1 = 1450, sigma = 110, n = 20,
           level = 0.05, tail = "left")

mu0 <- 1500
mu <- seq(1350, 1550, by = 1)
n <- 20
level <- 0.05
tail <- "left"
power <- power.mean(mu0 = mu0, mu1 = mu, n = n, level = level, tail = tail)
plot(mu, power, type = "l", xlab = expression(mu), col = "blue", lwd = 3)
n1 <- 30
power <- power.mean(mu0 = mu0, mu1 = mu, n = n1, level = level, tail = tail)
lines(mu, power, type = "l", col = "red", lwd = 3)
abline(v = 1500)
legend(1352, 0.3,
       legend = c(paste("Claimed =", mu0), paste("Level =", level),
                  paste("Sample Size =", n), paste("Sample Size =", n1)),
       text.col = c(1, 1, 4, 2))
179
1.2 Regression Analysis
  • Suppose that the true relationship between a
    response variable y and a set of predictor
    variables x1, x2, …, xp is y = f(x1, x2, …, xp).
  • But, due to measurement error, y may be observed
    as
  • y = f(x1, x2, …, xp) + ε.     (*)
  • If ε is assumed to be distributed as N(0, σ²),
    then it is said that we have a normal regression
    model.
  • If f(x1, x2, …, xp) = β0 + β1x1 + β2x2 + ··· +
    βpxp, then the model is called a normal linear
    regression model.
180
The Ordinary Least Squares Method
  • Suppose that n observations (xi1, xi2, …, xip,
    yi), i = 1, 2, …, n, are available from an
    experiment or a purely observational study.
  • Then the model (*) can be written as
  • yi = f(xi1, xi2, …, xip) + εi, i = 1, 2, …, n.
  • Suppose that f has a known form. To estimate the
    function f, a traditional method is the least
    squares method.
  • The method starts from minimizing the error sum
    of squares Σi [yi - f(xi1, …, xip)]².

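For the one-predictor linear case, minimizing the error sum of squares has a closed form; the Python sketch below illustrates it with invented data.

```python
def ols_line(x, y):
    """Least squares slope and intercept for y = b0 + b1*x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # minimizing the error sum of squares gives b1 = Sxy/Sxx
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

b0, b1 = ols_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(round(b0, 2), round(b1, 2))
```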
181
  • Suppo