Title: STAT 424/524 Statistical Design for Process Improvement
STAT 424/524 Statistical Design for Process Improvement
- Lecture 1
- An Overview of Statistical Process Control
Book online
- http://books.google.com/books?id=tj1nMz8ajmQC&pg=PA77
Homework 1 & 2
- 1. Do Problems 1.1, 1.3, 1.5, 1.7.
- 2. Do Problem 1.12, and derive Table 1.9 on page 36 of the text (also shown on the mean-control-chart slide below), using the following formula:
1.1 Introduction
- Diligence, a good attitude, and hard work are not sufficient for achieving quality control.
- Statistical process control (SPC) is an approach to quality control that enables us to seek steady improvement in the quality of a product. It is an effective method of monitoring a process through the use of control charts.
- Statistical process control was pioneered by Walter A. Shewhart in the early 1920s. W. Edwards Deming later applied SPC methods in the United States during World War II.
Differences between Quality Control and Quality Assurance
- Suppose that you are a PhD student about to graduate and applying for an academic position. If you were a product, your supervisor would be the quality control manager, and the search committee reviewing your application would be the quality assurance manager. Read the following:
- http://www.builderau.com.au/strategy/projectmanagement/soa/Quality-control-vs-quality-assurance/0,339028292,339191784,00.htm
Core Steps of Statistical Process Control
- Flowcharting of the production process
- Random sampling and measurement at regular temporal intervals at numerous stages of the production process
- Using the Pareto glitches discovered in this sampling to backtrack in time to their causes, so that the process can be improved
Self Reading
1.8 White Balls, Black Balls
- Recall that the second core step of statistical process control is random sampling and measurement at regular temporal intervals at numerous stages of the production process.
- The following data table shows measurements of thickness, in centimeters, of 40 bolts in ten lots of 4 each.
Lot  Bolt 1 (smallest)  Bolt 2  Bolt 3  Bolt 4 (largest)
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
Data Display
- We construct a run chart of thickness against lot number for each bolt.
Question: Is the process in control?
A SAS Program
One Bad Lot
- We add 0.500 to each of the 4 measurements in lot 10. The new graph is generated.
One Bad Bolt
- We add 0.500 only to the 4th measurement in lot 10. The new graph is generated.
Run Chart of Means
- We have constructed run charts of the original measurements. We can also construct run charts of a summary statistic, say the lot mean.
- To do this, we first find the mean of each lot. Then we plot the means against the corresponding lots.
- The run chart of the lot means for the first data set is shown below.
- We add 0.500 to each of the 4 measurements in lot 10. The new run chart is generated.
1.9 The Basic Paradigm of Statistical Process Control
- We considered ten lots of 4 bolts each. We saw that there is variation within each lot and variation across lots (in terms of lot averages).
- A major task in SPC is to seek significantly outlying lots, good or bad. Once found, such lots can be investigated to find out why they deviate from the others.
- This is the basic paradigm of SPC:
- 1. Find a Pareto glitch (a non-standard lot).
- 2. Discover the causes of the glitch.
- 3. Use this information to improve the production process.
- The variability across lots is the key notion in the search for Pareto glitches.
1.10 Basic Statistical Procedures in Statistical Process Control
- Let's use the original thickness data.
Lot Bolt 1 Bolt 2 Bolt 3 Bolt 4
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
Control Chart on Lot Means
- To construct control charts for lot means,
- first calculate the mean and standard deviation of each lot.
- Then find the mean of the means (the grand mean) and the mean of the standard deviations, s̄.
- Finally, find the acceptance interval on the mean, given by (grand mean) ± A3(n)·s̄, where A3(n) can be read from the following table.
Multiplication Factors for Different Lot Sizes

n    B3(n)   B4(n)   A3(n)
2    0.000   3.267   2.659
3    0.000   2.568   1.954
4    0.000   2.266   1.628
5    0.000   2.089   1.427
6    0.030   1.970   1.287
7    0.118   1.882   1.182
8    0.185   1.815   1.099
9    0.239   1.761   1.032
10   0.284   1.716   0.975
15   0.428   1.572   0.789
20   0.510   1.490   0.680
25   0.565   1.435   0.606
Mean Control Chart for Thickness Data
- For the thickness data, we calculate the lot means and standard deviations, obtaining a grand mean of 10.015 and s̄ = 0.0664.
- The acceptance interval is 10.015 ± 1.628(0.0664), i.e., (9.907, 10.123), where 9.907 is called the Lower Control Limit (LCL) and 10.123 the Upper Control Limit (UCL).
- Does any mean appear to be out of control?
Control Chart on Standard Deviation
Standard Deviation Control Chart for Thickness Data
- LCL = B3(4)·s̄ = 0;  UCL = B4(4)·s̄ = 2.266(0.0664) = 0.15
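The mean and standard deviation chart computations above can be checked numerically. The sketch below is written in Python purely for illustration (the course's own programs are in SAS); it recomputes the lot means and standard deviations for the thickness data and applies the A3, B3, B4 factors for lot size n = 4 from the multiplication-factors table.

```python
from statistics import mean, stdev

# Thickness data: ten lots of four bolts each.
lots = [
    [9.93, 10.04, 10.05, 10.09], [10.00, 10.03, 10.05, 10.12],
    [9.94, 10.06, 10.09, 10.10], [9.90, 9.95, 10.01, 10.02],
    [9.89, 9.93, 10.03, 10.06], [9.91, 10.01, 10.02, 10.09],
    [9.89, 10.01, 10.04, 10.09], [9.96, 9.97, 10.00, 10.03],
    [9.98, 9.99, 10.05, 10.11], [9.93, 10.02, 10.10, 10.11],
]

A3, B3, B4 = 1.628, 0.000, 2.266          # factors for lot size n = 4

xbar_bar = mean(mean(lot) for lot in lots)   # grand mean of lot means
s_bar = mean(stdev(lot) for lot in lots)     # mean of lot standard deviations

# Mean chart: acceptance interval (grand mean) +/- A3(n) * s_bar
lcl_x, ucl_x = xbar_bar - A3 * s_bar, xbar_bar + A3 * s_bar
# Standard deviation chart: [B3(n) * s_bar, B4(n) * s_bar]
lcl_s, ucl_s = B3 * s_bar, B4 * s_bar

print(round(lcl_x, 3), round(ucl_x, 3))   # 9.907 10.123
print(round(lcl_s, 2), round(ucl_s, 2))   # 0.0 0.15
```

The printed limits reproduce the values quoted on the slides: (9.907, 10.123) for the mean chart and (0, 0.15) for the standard deviation chart.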
Creating Control Charts for Means and Standard Deviations Based on Summary Statistics
- Refer to Problem 1.4 on page 48 of Thompson's text. The summary statistics are reproduced here:

Lot  x-bar    s
1    146.21   0.12
2    146.18   0.09
3    146.22   0.13
4    146.31   0.10
5    146.20   0.08
6    146.15   0.11
7    145.93   0.18
8    145.96   0.18
9    145.88   0.16
10   145.98   0.21
11   146.08   0.11
12   146.12   0.12
13   146.26   0.21
14   146.32   0.18
15   146.00   0.32
16   145.83   0.19
17   145.76   0.12
18   145.90   0.17
19   145.94   0.10
20   145.97   0.09
- DATA problem1_4;
-   INPUT lot Ax As @@;
-   An = 5;
- CARDS;
- 1 146.21 0.12   2 146.18 0.09
- 3 146.22 0.13   4 146.31 0.1
- 5 146.2 0.08    6 146.15 0.11
- 7 145.93 0.18   8 145.96 0.18
- 9 145.88 0.16   10 145.98 0.21
- 11 146.08 0.11  12 146.12 0.12
- 13 146.26 0.21  14 146.32 0.18
- 15 146 0.32     16 145.83 0.19
- 17 145.76 0.12  18 145.9 0.17
- 19 145.94 0.1   20 145.97 0.09
- ;
- SYMBOL v=dot c=red;
- PROC SHEWHART HISTORY=problem1_4;
-   xschart A*lot;
- RUN;
1.11 Acceptance Sampling
- Data which are characterized as defective or not are called acceptance/rejection data, or failure data.
- Suppose that a bolt is considered defective if its thickness is smaller than 9.92 or greater than 10.08.
- Then the first data set considered can be converted to acceptance/rejection data as follows.
Lot  Bolt 1 (smallest)  Bolt 2  Bolt 3  Bolt 4 (largest)
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 0 0 0 1
2 0 0 0 1
3 0 0 1 1
4 1 0 0 0
5 1 0 0 0
6 1 0 0 1
7 1 0 0 1
8 0 0 0 0
9 0 0 0 1
10 0 0 1 1
1 = defective, 0 = nondefective
Overall Proportion of Defectives
- In order to apply an SPC procedure to acceptance/rejection data, we note the proportion of defectives in each lot.
- The overall proportion of defectives is a key statistic; it is calculated by averaging the lot proportions of defectives.
- In the previous acceptance/rejection data set, the 10 lot proportions of defectives are 0.25, 0.25, 0.50, 0.25, 0.25, 0.50, 0.50, 0.00, 0.25, 0.50, which yields the overall proportion of defectives (0.25 + 0.25 + 0.50 + 0.25 + 0.25 + 0.50 + 0.50 + 0.00 + 0.25 + 0.50)/10 = 0.325.
Control Limits on the Proportion of Defectives
- The lower and upper control limits are
- LCL = max(0, p̄ - 3·sqrt(p̄(1 - p̄)/n)),  UCL = min(1, p̄ + 3·sqrt(p̄(1 - p̄)/n)),
- where p̄ is the overall proportion of defectives and n is the lot size.
- The rationale for choosing these two limits will be discussed in Section 1.13.
- For our acceptance/rejection data, with p̄ = 0.325 and n = 4, this gives LCL = 0 and UCL = 1.
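The limits above are easy to verify. The following Python fragment (an illustration only; the course software is SAS) computes p̄ and the truncated three-sigma limits for the converted bolt data.

```python
from math import sqrt

# Lot proportions of defectives from the converted bolt data.
props = [0.25, 0.25, 0.50, 0.25, 0.25, 0.50, 0.50, 0.00, 0.25, 0.50]
n = 4                                   # bolts per lot

p_bar = sum(props) / len(props)         # overall proportion of defectives
sigma = sqrt(p_bar * (1 - p_bar) / n)   # standard error of a lot proportion

# Three-sigma limits, truncated to the feasible range [0, 1].
lcl = max(0.0, p_bar - 3 * sigma)
ucl = min(1.0, p_bar + 3 * sigma)

print(p_bar, lcl, ucl)                  # 0.325, 0, 1
```

With lots of only 4 bolts the truncated limits collapse to [0, 1], so this p chart can never flag a lot; usefully tight limits require larger lots.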
Control Charts on the Proportion of Defectives
- Control charts on the proportion of defectives are called p-charts.
- One may create p-charts from count data or from summary data.
Creating p Charts from Count Data
- An electronics company manufactures circuits in batches of 500 and uses a p chart to monitor the proportion of failing circuits. Thirty batches are examined, and the failures in each batch are counted. The following statements create a SAS data set named CIRCUITS, which contains the failure counts, as shown below.
- data circuits;
-   input batch fail @@;
- datalines;
- 1 5  2 6  3 11  4 6  5 4  6 9  7 17  8 10  9 12  10 9
- 11 8  12 7  13 7  14 15  15 8  16 18  17 12  18 16  19 4
- 20 7  21 17  22 12  23 8  24 7  25 15  26 6  27 8  28 12
- 29 7  30 9
- ;
- run;
SAS Statements that Create the p Chart
- symbol color=salmon;
- title 'p Chart for the Proportion of Failing Circuits';
- proc shewhart data=circuits;
-   pchart fail*batch / subgroupn=500
-                       cframe=lib
-                       cinfill=bwh
-                       coutfill=yellow
-                       cconnect=salmon;
- run;
Creating p Charts from Summary Data
- The previous example illustrates how you can create p charts using raw data (counts of nonconforming items). However, in many applications the data are provided in summarized form, as proportions or percentages of nonconforming items. This example illustrates how you can use the PCHART statement with data of this type.
- data cirprop;
-   input batch pfailed @@;
-   sampsize = 500;
- datalines;
- 1 0.010  2 0.012  3 0.022  4 0.012  5 0.008  6 0.018  7 0.034
- 8 0.020  9 0.024  10 0.018  11 0.016  12 0.014  13 0.014
- 14 0.030  15 0.016  16 0.036  17 0.024  18 0.032  19 0.008
- 20 0.014  21 0.034  22 0.024  23 0.016  24 0.014  25 0.030
- 26 0.012  27 0.016  28 0.024  29 0.014  30 0.018
- ;
- run;
- title 'p Chart for the Proportion of Failing Circuits';
- symbol v=dot;
- proc shewhart data=cirprop;
-   pchart pfailed*batch / subgroupn=sampsize
-                          dataunits=proportions;
-   label pfailed = 'Proportion for FAIL';
- run;
What Data to Use: Failure Data or Measurement Data?
- The use of failure data is a very blunt instrument when compared with the use of measurement data.
- This is because of the information lost when items are merely characterized as defective or not, ignoring the specific measurements.
Control Charts in Minitab
- http://www.qualproxl.com/Control_Charts.html
1.12 The Case for Understanding Variation
- Variation within any process, let alone a system of processes, is inevitable.
- Large variation adds to the complexity and inefficiency of a system.
- Reducing the variation of a process is therefore an important issue.
Two Different Sources of Variation
- According to Walter Shewhart, there are two qualitatively different sources of variation:
- Common cause variation (aka random variation, or noise)
- Special cause variation (aka assignable variation)
- It is special cause variation that leads to Pareto glitches (also called signals), which can be detected using control charts. Special cause variation is caused by identifiable factors that result in a non-random disruption of output.
- Special cause variation can be removed through the proper use of control charts.
Processes That Are in Control
- A process that is already in a state of statistical control is not subject to special cause variation; it is subject only to common cause (inherent) variation, which is always present and cannot be reduced unless the process itself is redesigned.
- An in-control process is predictable, but it need not perform satisfactorily.
- The first data set shown earlier is from an in-control process, but improvement of the process is still welcome.
Two Types of Error
- Error of the first kind (aka tampering): treating common causes as special ones.
- Error of the second kind: treating special causes as common ones, i.e., disregarding signals.
Process Capability
- An in-control process reveals only common cause variation. This variation is measured by the process capability.
- Reduction of common cause variation requires improvement of the process itself.
Improvement of a Stable Process
- Shewhart and Deming developed the Plan-Do-Study-Act (PDSA) cycle for improvement of a stable process.
- In order to improve a stable process, one first has to PLAN the change. Such a plan is recommended to be based on a mathematical model of the process under scrutiny; design of experiments (DOE) is also needed.
- Then DO it on a small scale; that is, run a pilot study.
- Then STUDY, or check, whether the changed process is in control.
- Finally, ACT accordingly: adopt the change if it is successful, or try something else.
1.13 Statistical Coda
- The control limits on the mean are based on the Central Limit Theorem.
- If the number of previous lots is large, say 25 or more, the average of the lot means (the grand mean) gives an excellent estimate of µ, and the average of the sample standard deviations, s̄, is a good estimate of σ when multiplied by an unbiasing factor a(n).
- Now replacing µ and σ by their estimates yields the control limits on the mean: (grand mean) ± A3(n)·s̄.
- Similarly, the CLT applies to the sample standard deviation s.
- It can be shown that s is approximately normal with mean E(s) and standard deviation sd(s).
- Now replacing E(s) and sd(s) by their estimates yields the control limits on the standard deviation: B3(n)·s̄ and B4(n)·s̄.
- Finally, for failure data, the CLT also applies for large lot size n, giving the limits p̄ ± 3·sqrt(p̄(1 - p̄)/n).
STAT 424/524 Statistical Design for Process Improvement
- Lecture 2
- Acceptance-Rejection SPC
Homework 3
- Pages 71-74, problems 1 to 6
Sections 2.1 and 2.2
2.3 Basic Tests with Equal Lot Size
- Consider the following failure data:
- DATA table2_1;
-   Lot + 1;
-   INPUT defectives @@;
-   prop = defectives/100;
- DATALINES;
- 3 2 5 0 6 4 2 4 1 2 7 9 11 12 14 15 12 10 8 3 5
- 6 0 1 3 3 4 6 5 5 3 3 7 8 2 0 6 7 4 4
- ;
- PROC PRINT DATA=table2_1 (obs=5);
- run;
The SAS System
Obs  Lot  defectives  prop
1 1 3 0.03
2 2 2 0.02
3 3 5 0.05
4 4 0 0.00
5 5 6 0.06
symbol v=dot c=red;
PROC GPLOT DATA=TABLE2_1;
  PLOT prop*Lot;
run; quit;
Control Limits on the Number of Defectives
- Let n be the common lot size and p be the proportion of defectives in the product.
- Let X = number of defectives in a lot. Then X ~ B(n, p); that is, X follows a binomial distribution with parameters n and p.
- By the CLT, Z = (X - np)/sqrt(np(1 - p)) is approximately standard normal.
Control Limits on the Number of Defectives (cont'd)
- By the empirical rule in statistics, Z is between -3 and 3; that is, np - 3·sqrt(np(1 - p)) ≤ X ≤ np + 3·sqrt(np(1 - p)).
- Solving for X, and replacing p by the overall proportion of defectives p̄, gives the Lower Control Limit and Upper Control Limit for the number of defectives:
- LCL = np̄ - 3·sqrt(np̄(1 - p̄)),  UCL = np̄ + 3·sqrt(np̄(1 - p̄)).
Control Limits on the Number of Defectives (cont'd)
- Since this LCL might be negative and this UCL might be greater than n, the modified LCL and UCL are
- LCL = max(0, np̄ - 3·sqrt(np̄(1 - p̄))),  UCL = min(n, np̄ + 3·sqrt(np̄(1 - p̄))).
- A control chart based on the above LCL and UCL is called an np chart.
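These np-chart limits can be computed directly from the 40 lots of the failure data above (n = 100 per lot). The Python sketch below is for illustration only; the course uses PROC SHEWHART for the actual chart.

```python
from math import sqrt

# Numbers of defectives in 40 lots of n = 100 (DATA table2_1).
defectives = [3, 2, 5, 0, 6, 4, 2, 4, 1, 2, 7, 9, 11, 12, 14, 15, 12, 10, 8, 3,
              5, 6, 0, 1, 3, 3, 4, 6, 5, 5, 3, 3, 7, 8, 2, 0, 6, 7, 4, 4]
n = 100

p_bar = sum(defectives) / (n * len(defectives))   # overall proportion, 0.053
half_width = 3 * sqrt(n * p_bar * (1 - p_bar))

lcl = max(0.0, n * p_bar - half_width)            # truncated at 0
ucl = min(float(n), n * p_bar + half_width)       # truncated at n

# Lots whose defective count falls outside the limits.
flagged = [i + 1 for i, d in enumerate(defectives) if d < lcl or d > ucl]
print(round(ucl, 2), flagged)
```

Here p̄ = 0.053, so LCL = 0 and UCL ≈ 12.02; lots 15 and 16 (with 14 and 15 defectives) fall above the upper limit.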
- symbol color=salmon;
- title 'np Chart for the Proportion Defectives';
- proc shewhart data=table2_1;
-   npchart defectives*Lot / subgroupn=100
-                            cframe=lib
-                            cinfill=bwh
-                            coutfill=yellow
-                            cconnect=salmon;
- run;
p Charts
- Control limits on the proportion of defectives have the form
- LCL = max(0, p̄ - 3·sqrt(p̄(1 - p̄)/n)),  UCL = min(1, p̄ + 3·sqrt(p̄(1 - p̄)/n)).
- The corresponding charts are called p charts.
- symbol color=salmon;
- title 'p Chart for the Proportion Defectives';
- proc shewhart data=table2_1;
-   pchart defectives*Lot / subgroupn=100
-                           cframe=lib
-                           cinfill=bwh
-                           coutfill=yellow
-                           cconnect=salmon;
- run;
2.4 Testing with Unequal Lot Sizes
- If the lot sizes are unequal, the control limits for the proportion of defectives have to be calculated for each lot separately.
- The control limits for the p chart now assume the forms
- LCL_k = max(0, p̄ - 3·sqrt(p̄(1 - p̄)/n_k)),  UCL_k = min(1, p̄ + 3·sqrt(p̄(1 - p̄)/n_k)),
- where n_k is the size of lot k and p̄ is the pooled proportion of defectives.
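Applied to the infection data on the next slide, these per-lot limits can be sketched as follows (Python for illustration; the SAS program below produces the actual chart):

```python
from math import sqrt

# Monthly patient counts and infection counts (DATA table2_2, page 64).
patients = [50, 42, 37, 71, 55, 44, 38, 33, 41, 27, 33, 49,
            66, 49, 55, 41, 29, 40, 41, 48, 52, 55, 49, 60]
infections = [3, 2, 6, 5, 6, 6, 10, 2, 4, 1, 1, 3,
              8, 5, 4, 2, 0, 3, 2, 5, 4, 6, 5, 2]

p_bar = sum(infections) / sum(patients)   # pooled proportion of infections

flagged = []
for month, (n_k, x_k) in enumerate(zip(patients, infections), start=1):
    sigma_k = sqrt(p_bar * (1 - p_bar) / n_k)
    lcl_k = max(0.0, p_bar - 3 * sigma_k)   # limits vary with the month's n_k
    ucl_k = min(1.0, p_bar + 3 * sigma_k)
    if not (lcl_k <= x_k / n_k <= ucl_k):
        flagged.append(month)

print(round(p_bar, 4), flagged)
```

With p̄ ≈ 0.086, only month 7 (10 infections among 38 patients, a proportion of 0.263) exceeds its upper limit.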
Page 64
- DATA table2_2;
-   Month + 1;
-   INPUT Patients Infections @@;
- DATALINES;
- 50 3  42 2  37 6  71 5  55 6  44 6  38 10  33 2
- 41 4  27 1  33 1  49 3  66 8  49 5  55 4  41 2  29 0  40 3
- 41 2  48 5  52 4  55 6  49 5  60 2
- ;
- proc shewhart data=table2_2;
-   pchart Infections*Month / subgroupn=Patients
-                             outtable=CLtable;
- run;
- DATA page64;
-   MERGE table2_2 CLtable (KEEP=_SUBP_ _UCLP_);
- run;
- PROC print;
- RUN;
- quit;
2.5 Testing with Open-Ended Count Data
- Let X denote the number of items returned per week, say.
- X roughly has a Poisson distribution.
Control Limits on the Number of Returned Items
- For the Poisson distribution, the mean equals the variance.
- The control limits on the number of returned items are
- LCL = max(0, c̄ - 3·sqrt(c̄)),  UCL = c̄ + 3·sqrt(c̄),
- where c̄ is the average weekly count.
- A chart with these control limits is called a c chart.
- A c chart should not be used if lots are of unequal sizes; instead, use a u chart.
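The c-chart limits for the weekly returns data below can be checked with a few lines of Python (an illustration; the SAS c chart follows):

```python
from math import sqrt

# Weekly counts of returned items (data table2_4).
returned = [22, 13, 28, 17, 22, 29, 32, 17, 19, 27, 48, 53, 31,
            22, 31, 27, 20, 24, 17, 22, 29, 30, 31, 22, 26, 24]

c_bar = sum(returned) / len(returned)     # mean weekly count
lcl = max(0.0, c_bar - 3 * sqrt(c_bar))   # Poisson: variance = mean
ucl = c_bar + 3 * sqrt(c_bar)

flagged = [week for week, c in enumerate(returned, start=1)
           if c < lcl or c > ucl]
print(round(c_bar, 2), round(lcl, 2), round(ucl, 2), flagged)
```

Here c̄ ≈ 26.27, giving limits of roughly (10.89, 41.65); weeks 11 and 12 (counts 48 and 53) fall above the upper limit.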
- data table2_4;
-   Week + 1;
-   input numReturned @@;
- datalines;
- 22 13 28 17 22 29 32 17 19 27 48 53 31 22 31 27 20 24 17 22 29 30 31 22 26 24
- ;
- proc print; run;
c Charts
- symbol color=red h=.8;
- title1 'c Chart for Number of Returned Items Per Week';
- proc shewhart data=table2_4;
-   cchart numReturned*Week;
- run;
u Charts
- Suppose the sample size in lot k is n_k and the number of defects in lot k is c_k; then the number of defects per unit in lot k is u_k = c_k/n_k.
- The control limits on the average number of defects per unit are
- LCL_k = max(0, ū - 3·sqrt(ū/n_k)),  UCL_k = ū + 3·sqrt(ū/n_k),
- where ū is the pooled number of defects per unit.
- Example: In a fabric manufacturing process, each roll of fabric is 30 meters long, and an inspection unit is defined as one square meter. Thus, there are 30 inspection units in each subgroup sample. Suppose now that the length of each piece of fabric varies. The following statements create a SAS data set (FABRICS) that contains the number of fabric defects and the size (in square meters) of 25 pieces of fabric:
- data fabrics;
-   input roll defects sqmeters @@;
- datalines;
- 1 7 30.0   2 11 27.6   3 15 30.4   4 6 34.8    5 11 26.0
- 6 15 28.6  7 5 28.0    8 10 30.2   9 8 28.2    10 3 31.4
- 11 3 30.3  12 14 27.8  13 3 27.0   14 9 30.0   15 7 32.1
- 16 6 34.8  17 7 26.5   18 5 30.0   19 14 31.3  20 13 31.6
- 21 11 29.4 22 6 28.6   23 6 27.5   24 9 32.6   25 11 31.7
- ;
- run;
- The variable ROLL contains the roll number, the variable DEFECTS contains the number of defects in each piece of fabric, and the variable SQMETERS contains the size of each piece.
- The following statements request a u chart for the number of defects per square meter:
- symbol color=vig;
- title 'u Chart for Fabric Defects per Square Meter';
- proc shewhart data=fabrics;
-   uchart defects*roll / subgroupn=sqmeters
-                         cframe=steel
-                         cinfill=ligr
-                         coutfill=yellow
-                         cconnect=vig
-                         outlimits=flimits;
- run;
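The same u-chart computation can be sketched in Python (for illustration only) using the per-piece limits from the u Charts slide:

```python
from math import sqrt

# Fabric defect counts and piece sizes in square meters (data fabrics).
defects = [7, 11, 15, 6, 11, 15, 5, 10, 8, 3, 3, 14, 3,
           9, 7, 6, 7, 5, 14, 13, 11, 6, 6, 9, 11]
sqmeters = [30.0, 27.6, 30.4, 34.8, 26.0, 28.6, 28.0, 30.2, 28.2, 31.4, 30.3,
            27.8, 27.0, 30.0, 32.1, 34.8, 26.5, 30.0, 31.3, 31.6, 29.4, 28.6,
            27.5, 32.6, 31.7]

u_bar = sum(defects) / sum(sqmeters)      # pooled defects per square meter

flagged = []
for roll, (c_k, n_k) in enumerate(zip(defects, sqmeters), start=1):
    half = 3 * sqrt(u_bar / n_k)          # limits vary with the piece size
    if not (max(0.0, u_bar - half) <= c_k / n_k <= u_bar + half):
        flagged.append(roll)

print(round(u_bar, 3), flagged)
```

Here ū ≈ 0.288 defects per square meter, and every roll falls within its own limits, so no Pareto glitch is signaled in this data set.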
data abc;
  input lot @;
  do i = 1 to 5;
    input diamtr @;
    output;
  end;
  drop i;
cards;
1 35.00 34.99 34.99 34.98 35.00
2 35.01 34.99 34.99 34.98 35.00
3 34.99 35.00 35.00 35.00 35.00
4 35.00 35.00 34.99 35.01 34.98
5 34.99 34.99 34.99 35.00 35.00
;
proc print data=abc noobs; run;
Constructing Control Charts with Summary Data in SAS
- data abc;
-   input lot Ax As;
-   An = 5;
- cards;
- 1 34.992 0.02
- 2 34.994 0.03
- 3 34.998 0.01
- 4 34.996 0.03
- 5 34.994 0.01
- ;
- title 'Mean and Standard Deviation Charts for Diameters';
- symbol v=dot;
- proc shewhart history=abc;
-   xschart A*lot;
- run;
- quit;
STAT 424/524 Statistical Design for Process Improvement
- Lecture 3
- The Development of Mean and Standard Deviation Control Charts
Homework 4
- Pages 127-128, problems 3.1(a), 3.4, 12, 13, 14
3.1 Introduction
- In a process of manufacturing bolts that are required to have a 10 cm diameter, we often actually observe bolts with diameters other than 10 cm. This is a consequence of flaws in the production process.
- These imperfections might be excessive lubricant temperature, bearing vibration, nonstandard raw materials, etc.
- In SPC, these flaws can often be modeled as follows:
- Let Y = observed diameter. Then Y ~ N(µ, σ²).
- To model a process while clearly taking account of possible individual flaws, we write the observed measurement in lot t, Y(t), as
- Y(t) = X0(t) + I1·X1(t) + I2·X2(t) + ... + Ik·Xk(t),
- where Ij = 1 if flaw j is present in lot t and Ij = 0 otherwise.
- Here we have assumed additive flaws, representing k assignable causes. There may be, in any lot t, as many as 2^k possible combinations of flaws contributing to Y(t).
- Let I be a subcollection of {1, 2, 3, ..., k}. Then, when exactly the flaws in I are present,
- Y(t) = X0(t) + sum over j in I of Xj(t).
- In the special case where each distribution is normal,
- Y(t) ~ N(µ0 + sum over j in I of µj, σ0² + sum over j in I of σj²).
- In the above discussion, X0 accounts for the common cause while the other X's represent special causes. The major task of SPC is to identify these special causes and to take steps to remove them.
3.2 A Contaminated Production Process
- Continuing the discussion of Section 3.1, let X0 ~ N(µ0, σ0²), where µ0 = 10 and σ0² = 0.01.
- In addition, we have one special cause due to intermittent lubricant heating, say X1, which is N(0.4, 0.02), and another due to bearing vibration, say X2, which is N(-0.2, 0.08), with probabilities of occurrence p1 = 0.01 and p2 = 0.005, respectively.
- So, for a sampled lot t, Y(t) can be written as Y(t) = X0(t) + B1·X1(t) + B2·X2(t), where Bj equals 1 with probability pj and 0 otherwise.
Or, Y(t) has the corresponding mixture distribution.
One can verify that E[Y(t)] = 10.003 and Var[Y(t)] ≈ 0.0124.
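The verification is a short exercise in the moments of a Bernoulli-thinned sum. The Python fragment below (an illustration, assuming the representation Y = X0 + B1·X1 + B2·X2 with independent Bernoulli indicators) computes the mean and variance exactly.

```python
# Contaminated process of section 3.2: Y = X0 + B1*X1 + B2*X2, where
# B1 ~ Bernoulli(p1) and B2 ~ Bernoulli(p2) indicate whether each flaw occurs.
mu0, var0 = 10.0, 0.01
mu1, var1, p1 = 0.4, 0.02, 0.01       # intermittent lubricant heating
mu2, var2, p2 = -0.2, 0.08, 0.005     # bearing vibration

# E(Y) = mu0 + p1*mu1 + p2*mu2
mean_Y = mu0 + p1 * mu1 + p2 * mu2

def var_bx(p, mu, var):
    # Var(B*X) = p*(var + mu^2) - (p*mu)^2 for B ~ Bernoulli(p), independent of X
    return p * (var + mu * mu) - (p * mu) ** 2

var_Y = var0 + var_bx(p1, mu1, var1) + var_bx(p2, mu2, var2)

print(round(mean_Y, 4), round(var_Y, 6))   # 10.003 0.012383
```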
3.3 Estimation of Parameters of the Norm Process
- The norm process refers to an uncontaminated process, whose mean and variance can be estimated, respectively, by
- Properties:
- Both are unbiased.
- Both are asymptotically normal.
Estimating the Process Standard Deviation, σ
- An intuitive estimator for σ is the square root of the pooled variance estimate.
- A more commonly used estimator is based on the average of the lot standard deviations, s̄.
The Famous Result
- One can thus show that
- So, an unbiased estimator of the process standard deviation, σ, is
- One can also show that
- This is because
- So, another unbiased estimator of the process standard deviation, σ, is
Which One is More Efficient?
Using the (Adjusted) Average of Lot Ranges to Estimate the Process Standard Deviation
Using the (Adjusted) Median of Lot Standard Deviations to Estimate the Process Standard Deviation
3.4 Robust Estimators for Uncontaminated Process Parameters
- Suppose that the proportion of good lots is p and the proportion of bad lots is 1 - p.
- Suppose that data in a good lot come from a normal distribution with mean µ0 and standard deviation σ0.
- Suppose that data in a bad lot come from a normal distribution with mean µ1 and standard deviation σ1.
- How can we construct a control chart for lot means based on data from the above contaminated distribution?
- A solution: the control limits are
A Simulation Study
- Let's generate 90 lots of size 5 each. A lot is good with probability p = 0.7 (70%).
- Data in a good lot are from N(10, 0.01).
- Data in a bad lot are from N(9.8, 0.09).
- Data simulated in Excel.
A SAS Program: Generate Good and Bad Lots
data table3_3;
retain mu0 10;
retain mu1 9.8;
retain sigma0 0.1;
retain sigma1 0.3;
do Lot = 1 to 90;
  u = ranuni(12345);
  if (u < 0.7) then
    do j = 1 to 5;
      x = mu0 + sigma0*rannor(1);  /* good lot data */
      output;
    end;
  else
    do j = 1 to 5;
      x = mu1 + sigma1*rannor(1);  /* bad lot data */
      output;
    end;
end;
keep Lot x;
Run;
proc print; run;
symbol v=dot c=red;
proc shewhart;
  xschart x*Lot;
run; quit;
R Program: Generate Data from Good and Bad Lots
mu0 <- 10; mu1 <- 9.8
sigma0 <- 0.1; sigma1 <- 0.3   # rnorm() takes standard deviations
p <- 0.7; n <- 90; k <- 5
x <- matrix(0, n, k)
for (i in 1:n) {
  u <- runif(1)
  if (u < p)
    x[i, ] <- rnorm(k, mu0, sigma0)   # from good lot
  else
    x[i, ] <- rnorm(k, mu1, sigma1)   # from bad lot
}
x
3.5 A Process with Mean Drift
3.6 A Process with Upward Drift in Variance
3.7 Charts for Individual Measurements
- Grouping as many measurements as possible into a lot is important in order to get more accurate estimates of the population mean and standard deviation.
- This is not possible in some situations.
- The reason typically is that either the production rate is too slow, or the production is performed under precisely the same conditions over short time intervals.
Constructing Control Charts for Individual Measurements
- Let's consider 90 observations coming from
- N(10, 0.01) with probability 0.855,
- N(10.4, 0.03) with probability 0.095,
- N(9.8, 0.09) with probability 0.045, and
- N(10.2, 0.11) with probability 0.005.
- We construct a scatterplot of the 90 measurements against the corresponding lot numbers. The control limits are
- µ0 ± 3σ0 = 10 ± 3(0.1), i.e., LCL = 9.7 and UCL = 10.3.
Moving Range Control Charts for Individual Measurements
- Better control charts for individual measurements are based on artificial lots of size 2 or 3 and the so-called moving ranges.
- Given N individual measurements X1, X2, ..., XN, the ith moving range of n observations is defined as the difference between the largest and the smallest value in the ith artificial lot formed from the n measurements Xi, X(i+1), ..., X(i+n-1), where i = 1, 2, ..., N - n + 1.
- The average of the N - n + 1 moving ranges is their arithmetic mean, MRbar.
- The moving range control chart has
- LCL = D3(n)·MRbar,  UCL = D4(n)·MRbar.
- D3(n) and D4(n) are given in Table 3.9 of the text, p. 115.
- For n = 2, given N individual measurements X1, X2, ..., XN, the N - 1 moving ranges are
- MR1 = |X2 - X1|, MR2 = |X3 - X2|, ..., MR(N-1) = |XN - X(N-1)|,
- and the average of the N - 1 moving ranges is MRbar = (MR1 + ... + MR(N-1))/(N - 1).
X Charts Based on Moving Ranges
- For given individual measurements X1, X2, ..., XN, the X chart is constructed by estimating the population standard deviation σ as the product of b2 and the average of the moving ranges, MRbar.
- The control limits are
- LCL = x̄ - 3·b2·MRbar,  UCL = x̄ + 3·b2·MRbar.
data table3_6 mu0 10 mu1 10.4 mu2 9.8
mu3 10.2 sigma0 0.1 sigma1 sqrt(0.03)
sigma2 0.3 sigma3 sqrt(0.11) do Lot 1 to
90 u ranuni(12345) if (u lt 0.855) then
do x mu0 sigma0rannor(1) output
end else if (u lt 0.95) then do x mu1
sigma1rannor(1) output end else if
(u lt 0.995) then do x mu2
sigma2rannor(1) output end else do
x mu3 sigma3rannor(1) output end end
keep Lot x Run proc print Run symbol v
dot c red proc shewhart xchart
xLot run quit
3.8 Process Capability
- We have discussed control charts for detecting Pareto glitches in a process. Whether an in-control process meets some technological specification is another important issue.
- It is summary statistics for lots that are examined for the purpose of controlling a process, while individual measurements are compared to specifications.
- The capability of an in-control process in relation to technological specifications is measured by capability indices.
The Cp Index
- Cp = (USL - LSL)/(6σ), where LSL and USL are the lower and upper specification limits.
The Cp Index for Nonnormal Data
The Cpk Index
- The index Cp does not account for process centering.
- To account for process centering, use
- Cpk = min(USL - µ, µ - LSL)/(3σ).
- Example: Use Table 3.10 on textbook page 121 to calculate Cp and Cpk. Suppose that the diameter was specified to be 6.75 mm with tolerances of ±0.1 mm; that is, LSL = 6.65 mm and USL = 6.85 mm.
- Solution: From the table,
- http://www.itl.nist.gov/div898/handbook/pmc/section1/pmc16.htm
SAS PROC CAPABILITY
data amps;
  label decibels = 'Amplification in Decibels (dB)';
  input decibels @@;
datalines;
4.54 4.87 4.66 4.90 4.68 5.22 4.43 5.14 3.07 4.22 5.09 3.41 5.75 5.16 3.96 5.37 5.70 4.11 4.83
4.51 4.57 4.16 5.73 3.64 5.48 4.95 4.57 4.46 4.75 5.38 5.19 4.35 4.98 4.87 3.53 4.46 4.57 4.69 5.27
4.67 5.03 4.50 5.35 4.55 4.05 6.63 5.32 5.24 5.73 5.08 5.07 5.42 5.05 5.70 4.79 4.34 5.06 4.64 4.82
3.24 4.79 4.46 3.84 5.05 5.46 4.64 6.13 4.31 4.81 4.98 4.95 5.57 4.11 4.15 5.95
;
run;
title 'Boosting Power of Telephone Amplifiers';
legend2 FRAME CFRAME=ligr CBORDER=black POSITION=center;
proc capability data=amps noprint alpha=0.10;
  var decibels;
  spec target=5 lsl=4 usl=6 ltarget=2 llsl=3 lusl=4
       ctarget=red clsl=yellow cusl=yellow;
  histogram decibels / cframe=ligr cfill=steel cbarline=white legend=legend2;
  inset cpklcl cpk cpkucl / header='90% Confidence Interval'
        cframe=black ctext=black cfill=ywh format=6.3;
run;
The following statements can be used to produce a table of process capability indices, including the index Cpk:
ods select indices;
proc capability data=amps alpha=0.10;
  spec target=5 lsl=4 usl=6 ltarget=2 llsl=3 lusl=4;
  var decibels;
run;
STAT 424/524 Statistical Design for Process Improvement
- Lecture 4
- Sequential Approaches
Homework 5
- Page 166, problems 4.1 (do the first part only) and 4.10
4.1 Introduction
- The first three chapters dealt with Shewhart control charts, which are useful in detecting special cause variation.
- A major disadvantage of a Shewhart control chart is that it uses only the information about the process contained in the last sample observation, ignoring any information given by the entire sequence of points. This feature makes Shewhart control charts insensitive to small process shifts.
- This chapter deals with two alternatives to the Shewhart control charts: cumulative sum (CUSUM) control charts and exponentially weighted moving average (EWMA) control charts, both of which are sensitive to small process drifts.
4.2 The Sequential Likelihood Ratio Test
- Suppose we have a time-ordered data set x1, x2, ..., xn coming from a distribution with density f(x; θ). We may wish to test whether the true parameter is θ0 or θ1.
- A natural criterion for deciding between the two parameters is the log-likelihood ratio
- z = sum over i of ln[f(xi; θ1)/f(xi; θ0)].
Decision Rule
- We propose the following decision rule:
- When z ≤ ln(k0), (x1, x2, ..., xn) being said to lie in region Gn0, we decide for θ0;
- When z ≥ ln(k1), (x1, x2, ..., xn) being said to lie in region Gn1, we decide for θ1;
- Otherwise, we are in region Gn, and we continue sampling.
- Before we make our decision, our sample (x1, x2, ..., xn) falls in one of three regions: Gn0, Gn1, and Gn.
- Denote the true parameter by θ. The probability of ever declaring for θ0 is given by
- L(θ) = P(G10 | θ) + P(G20 | θ) + ...
- By the definition of Gn0, the likelihood ratio is at most k0 on Gn0, so P(Gn0 | θ1) ≤ k0·P(Gn0 | θ0).
Let us suppose that if θ is truly equal to θ0, we wish to have L(θ0) = 1 - α. Let us suppose that if θ is truly equal to θ1, we wish to have L(θ1) = β. Here α and β are customarily referred to as the Type I and Type II errors. Then, we must have
β = L(θ1) ≤ k0·L(θ0) = k0(1 - α).
So, k0 ≥ β/(1 - α). By a similar argument for Gn1, we have k1 ≤ (1 - β)/α. In practice, choose
k0 = β/(1 - α),  k1 = (1 - β)/α.
We can show that the actual Type I and Type II errors, say α′ and β′, satisfy α′ ≤ α/(1 - β), β′ ≤ β/(1 - α), and α′ + β′ ≤ α + β.
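The sequential test above can be sketched in a few lines of Python (an illustration, using the normal-mean case θ0 = 0 vs θ1 = 1 with σ = 1; the deterministic input stream is a hypothetical example, not course data):

```python
from math import log

# Sequential likelihood ratio test for the mean of N(theta, sigma^2).
theta0, theta1, sigma = 0.0, 1.0, 1.0
alpha, beta = 0.01, 0.01
ln_k0 = log(beta / (1 - alpha))      # lower boundary, about -4.595
ln_k1 = log((1 - beta) / alpha)      # upper boundary, about  4.595

def sprt(xs):
    """Return (decision, observations used); decision is None if undecided."""
    z = 0.0
    for i, x in enumerate(xs, start=1):
        # log f(x; theta1) - log f(x; theta0) for normal densities
        z += (theta1 - theta0) / sigma**2 * (x - (theta0 + theta1) / 2)
        if z <= ln_k0:
            return 'theta0', i
        if z >= ln_k1:
            return 'theta1', i
    return None, len(xs)             # boundaries not reached: keep sampling

# A stream sitting exactly at theta1 adds 0.5 to z per observation,
# so the upper boundary ln(99) = 4.595 is first crossed at observation 10.
decision, n_used = sprt([1.0] * 12)
print(decision, n_used)              # theta1 10
```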
4.3 CUSUM Test for Shift of the Mean
- To detect a shift of the mean of a production process from µ0 to some other value µ1, we consider a sequential test.
- We assume that the process variance σ² is known and does not change.
- We propose a test on the basis of the log-likelihood ratio of N sample means, each sample being of size n. The test statistic is
- R1 = (n(µ1 - µ0)/σ²) · sum over j of (x̄j - (µ0 + µ1)/2).
The test statistic R1 is based on a cumulative sum. It is not so much oriented to detecting Pareto glitches, but rather to discovering a persistent change in the mean.
- For given Type I and Type II errors, say α = 0.01 and β = 0.01, the sequential test procedure gives ln(k0) = ln(β/(1 - α)) = -4.595 and ln(k1) = ln((1 - β)/α) = 4.595.
- The CUSUM chart is a plot of R1, based on data up to the jth sample, versus j, j = 1, 2, ..., N, together with the control limits ln(k0) and ln(k1).
data table3_6;
retain mu0 10 mu1 10.4 mu2 9.8 mu3 10.2;
retain sigma0 0.1 sigma1 0.1732 sigma2 0.3 sigma3 0.3316625;
array x(5) x1 x2 x3 x4 x5;
do Lot = 1 to 90;
  u = ranuni(12345);
  if (u < 0.855) then do;
    do i = 1 to 5; x(i) = mu0 + sigma0*rannor(1); end;
    output;
  end;
  else if (u < 0.95) then do;
    do i = 1 to 5; x(i) = mu1 + sigma1*rannor(1); end;
    output;
  end;
  else if (u < 0.995) then do;
    do i = 1 to 5; x(i) = mu2 + sigma2*rannor(1); end;
    output;
  end;
  else do;
    do i = 1 to 5; x(i) = mu3 + sigma3*rannor(1); end;
    output;
  end;
end;
Keep Lot x1-x5;
Run;
proc print; run;

Data means;
set table3_6;
sum + mean(of x1-x5);
m = sum/Lot;
R1 = Lot*(5/0.1)*(m - (10 + 10.1)/2);
LCL = -4.595;
UCL = 4.595;
Keep Lot R1 LCL UCL;
Run;

symbol1 v=dot c=blue r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=blue r=1;
proc gplot data=means;
  plot (LCL R1 UCL)*Lot / overlay;
  label R1 = 'R1 statistic';
run;

data long;
set table3_6;
Lot = _N_;
array x{5} x1-x5;
do i = 1 to 5;
  y = x{i};
  output;
end;
keep Lot y;
Run;
Proc cusum data=long;
  xchart y*Lot / mu0=10.0
                 sigma0=0.1
                 delta=1
                 alpha=0.1
                 vaxis=-20 to 80;
Run;
quit;
4.4 Shewhart CUSUM Charts
- A popular empirical alternative to the CUSUM chart is the Shewhart CUSUM chart. This chart is based on the pooled cumulative (running) means.
- Suppose that all the lot means are iid with common mean µ0 and variance σ0²/n. Then
- zi = sqrt(i·n)·(mi - µ0)/σ0 is approximately standard normal, where mi is the running mean of the first i lot means.
- A Shewhart CUSUM chart for mean shift is one that plots zi against i, along with the horizontal lines -3 and 3.
Data means2 set table3_6
sum mean(of x1-x5) m sum/Lot R2
sqrt(Lot5)/0.1(m-10) LCL -3 UCL
3 Keep Lot sum m R2 LCL UCL Run proc print
run symbol1 v dot c black r 1 symbol2 v
dot c red r 1 symbol3 v dot c red r
1 proc gplot data means2 plot (R2 LCL
UCL)Lot/overlay label R2 R2
statistic run quit
4.8 Acceptance-Rejection CUSUMs
- Let p denote the proportion of defective goods from a production system.
- Let p0 denote the target proportion deemed appropriate.
- When p rises to p1, intervention will be introduced.
- Let nj denote the size of lot j and xj the number of defectives in lot j. Then the log-likelihood ratio based on the first N lots is
- R5 = ln(p1/p0)·(sum of xj) + ln((1 - p1)/(1 - p0))·(sum of nj - sum of xj).
CUSUM Test for Defect Data
- To detect a drift in the process proportion, plot R5 versus lot number N. Two horizontal lines, R5 = -4.596 and R5 = 4.596, are also plotted.
- Any point in the plot that is above the line R5 = 4.596 indicates a drift in the process proportion.
133data table4_8;
  Lot = _N_;
  input defective proportion @@;
  size = 100;
cards;
3 0.03 2 0.02 5 0.05 0 0.00 6 0.06 4 0.04 2 0.02 4 0.04 1 0.01 2 0.02
7 0.07 9 0.09 11 0.11 12 0.12 14 0.14 15 0.15 12 0.12 10 0.10 8 0.08 3 0.03
5 0.05 6 0.06 0 0.00 1 0.01 3 0.03 3 0.03 4 0.04 6 0.06 5 0.05 5 0.05
3 0.03 3 0.03 7 0.07 8 0.08 2 0.02 0 0.00 6 0.06 7 0.07 4 0.04 4 0.04
;
run;
data new; set table4_8;
  p1 = 0.05; p0 = 0.03;
  xsum + defective; nsum + size;
  R5 = log(p1/p0)*xsum + log((1-p1)/(1-p0))*(nsum - xsum);
  LCL = -4.596; UCL = 4.596;
  keep Lot R5 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=new;
  plot (R5 LCL UCL)*Lot / overlay;
run; quit;
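The cumulative statistic R5 is easy to cross-check by hand. The following Python sketch is an editor's illustration (function name invented); the forty defect counts are the ones listed above, with lot size 100, p0 = 0.03, and p1 = 0.05.

```python
from math import log

def lr_cusum(defectives, size=100, p0=0.03, p1=0.05):
    """R5 after each lot: xsum*log(p1/p0) + (nsum - xsum)*log((1-p1)/(1-p0))."""
    r5, xsum, nsum = [], 0, 0
    for x in defectives:
        xsum += x          # cumulative defectives
        nsum += size       # cumulative items inspected
        r5.append(xsum * log(p1 / p0) + (nsum - xsum) * log((1 - p1) / (1 - p0)))
    return r5

defects = [3, 2, 5, 0, 6, 4, 2, 4, 1, 2, 7, 9, 11, 12, 14, 15, 12, 10, 8, 3,
           5, 6, 0, 1, 3, 3, 4, 6, 5, 5, 3, 3, 7, 8, 2, 0, 6, 7, 4, 4]
r5 = lr_cusum(defects)
# the high-defect lots (counts 11-15 in the middle) push R5 above the 4.596 line
```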
134Shewhart CUSUM Test for Defect Data
Plot R6 against Lot, along with the two horizontal lines R6 = −3 and R6 = 3.
135data table4_8;
  Lot = _N_;
  input defective proportion @@;
  size = 100;
cards;
3 0.03 2 0.02 5 0.05 0 0.00 6 0.06 4 0.04 2 0.02 4 0.04 1 0.01 2 0.02
7 0.07 9 0.09 11 0.11 12 0.12 14 0.14 15 0.15 12 0.12 10 0.10 8 0.08 3 0.03
5 0.05 6 0.06 0 0.00 1 0.01 3 0.03 3 0.03 4 0.04 6 0.06 5 0.05 5 0.05
3 0.03 3 0.03 7 0.07 8 0.08 2 0.02 0 0.00 6 0.06 7 0.07 4 0.04 4 0.04
;
run;
data new; set table4_8;
  p1 = 0.05; p0 = 0.03;
  xsum + defective; nsum + size;
  R6 = (xsum - p0*nsum)/sqrt(nsum*p0*(1-p0));
  LCL = -3; UCL = 3;
  keep Lot R6 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=new;
  plot (R6 LCL UCL)*Lot / overlay;
run; quit;
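The standardized statistic R6 admits an equally short check. This Python sketch is an editor's illustration (function name invented); the defect counts are the forty lot counts from the slide, with lot size 100 and p0 = 0.03.

```python
from math import sqrt

def shewhart_defect_cusum(defectives, size=100, p0=0.03):
    """R6 after each lot: (xsum - p0*nsum) / sqrt(nsum*p0*(1-p0))."""
    r6, xsum, nsum = [], 0, 0
    for x in defectives:
        xsum += x
        nsum += size
        r6.append((xsum - p0 * nsum) / sqrt(nsum * p0 * (1 - p0)))
    return r6

defects = [3, 2, 5, 0, 6, 4, 2, 4, 1, 2, 7, 9, 11, 12, 14, 15, 12, 10, 8, 3,
           5, 6, 0, 1, 3, 3, 4, 6, 5, 5, 3, 3, 7, 8, 2, 0, 6, 7, 4, 4]
r6 = shewhart_defect_cusum(defects)
```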
136STAT 424/524 Statistical Design for Process Improvement
- Lecture 5
- Exploratory Techniques for Preliminary Analysis
137Homework 6
- Page 220 problems 2, 4, 5, 6
1385.2 The Schematic Plot: The Boxplot
[Anatomy of a schematic boxplot, top to bottom: maximum observation; upper fence (not drawn), 1.5(IQR) above the 75th percentile; whisker; 75th percentile; mean (specified with the SYMBOL1 statement); median; interquartile range (IQR); 25th percentile; whisker; minimum observation; lower fence (not drawn), 1.5(IQR) below the 25th percentile.]
BOXSTYLE = schematic (or schematicid, or schematicidfar, if an ID statement is used).
Observations that fall outside the fences point to Pareto glitches.
139data myData;
  retain mu0 10 mu1 10.4 mu2 9.8 mu3 10.2;
  retain sigma0 0.1 sigma1 0.1732 sigma2 0.3 sigma3 0.3316625;
  array x{5} x1 x2 x3 x4 x5;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      do i = 1 to 5; x{i} = mu0 + sigma0*rannor(1); end; output;
    end;
    else if (u < 0.95) then do;
      do i = 1 to 5; x{i} = mu1 + sigma1*rannor(1); end; output;
    end;
    else if (u < 0.995) then do;
      do i = 1 to 5; x{i} = mu2 + sigma2*rannor(1); end; output;
    end;
    else do;
      do i = 1 to 5; x{i} = mu3 + sigma3*rannor(1); end; output;
    end;
  end;
  keep Lot x1-x5;
run;
data mean; set myData;
  lotMean = mean(of x1-x5);
  x = "-";
  keep Lot lotMean x;
run;
symbol v=plus c=blue;
title 'Box Plot of Lot Means';
proc boxplot;  /* create side-by-side boxplot */
  plot lotMean*x / boxstyle=schematicidfar idsymbol=circle;  /* identify obs. out of fences or extremes */
  id Lot;
  label x = ' ';
run;
1405.3 Smoothing by Threes
- A signal is usually contaminated with noise. John Tukey developed the so-called 3R smoothing method (repeated running medians of three), which removes the jitter and enables one to better approximate the signal.
- http://www.galaxy.gmu.edu/ACAS/ACAS00-02/ACAS00/ThompsonJames/ThompsonJames.pdf
1413R SAS or R Program
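Slide 141 refers to a SAS or R program, but no listing survives here. As a stand-in, the following is a minimal Python sketch of the 3R idea: take running medians of three, and repeat until the sequence stops changing. Function names are invented; the endpoints are left untouched, a simplification relative to Tukey's end-value rules.

```python
from statistics import median

def smooth_3(y):
    """One pass of running medians of three (endpoints left unchanged)."""
    s = list(y)
    for i in range(1, len(y) - 1):
        s[i] = median([y[i - 1], y[i], y[i + 1]])
    return s

def smooth_3R(y):
    """Repeat median-of-3 passes until the sequence no longer changes."""
    prev, cur = None, list(y)
    while cur != prev:
        prev, cur = cur, smooth_3(cur)
    return cur

smoothed = smooth_3R([1, 9, 1, 1, 5, 1, 1])  # isolated spikes are removed
```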
1425.4 Bootstrapping
- Most of the standard testing in SPC is based on the assumption that lot means are normally distributed.
- This assumption is questionable because measurements may not be normal and lot sizes are usually small, say less than 10.
- To avoid the normality assumption, one can use resampling.
- Bootstrapping is one of the resampling methods.
143Bootstrapping Means
- Suppose we have a data set of size n. We wish to
construct a bootstrap confidence interval for the
mean of the distribution from which the data were
taken. - There are at least four methods for bootstrapping
the mean.
144The Percentile Method
- The procedure is as follows.
- Select, with replacement, n of the original observations. Such a sample is called a bootstrap sample. Compute the mean of this bootstrap sample.
- Repeat the resampling procedure B = 10,000 times.
- Order the B bootstrap means from smallest to largest.
- Denote the 250th value and the 9750th value in this ordering as a and b, respectively; then the 95% percentile bootstrap confidence interval of the mean is [a, b].
145R program for the Percentile Method
boot <- function(x, B) {
  n <- length(x)
  A <- matrix(0, B, n)
  for (i in 1:B)
    A[i, ] <- sample(x, n, replace = T)
  A
}
x <- c(2, 5, 1, 8, 3, 2)
D <- boot(x, 10000)
y <- sort(apply(D, 1, mean))   # y holds the 10000 ordered bootstrap means
confidenceInterval <- c(y[250], y[9750])
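The same percentile interval can be sketched with Python's standard library. This is an editor's illustration (function name invented); B is reduced for speed, and the 250th/9750th indices are scaled to the chosen B.

```python
import random
from statistics import fmean

def percentile_ci(x, B=10000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean (250th/9750th indices, scaled to B)."""
    rng = random.Random(seed)
    n = len(x)
    # B bootstrap means, sorted from smallest to largest
    means = sorted(fmean(rng.choices(x, k=n)) for _ in range(B))
    a = means[int(B * alpha / 2) - 1]
    b = means[int(B * (1 - alpha / 2)) - 1]
    return a, b

x = [2, 5, 1, 8, 3, 2]          # the slide's toy sample
ci = percentile_ci(x, B=2000)
```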
146Lunneborg's Method
- Denote the mean of the original sample by
- Clifford Lunneborg proposed to use
- as the 95% confidence interval of the mean.
147The Bootstrapped t Method
- Denote the B bootstrap standard deviations as s*1, s*2, …, s*B.
- Calculate the B t values t*j = (x̄*j − x̄)/(s*j/√n), where x̄*j is the j-th bootstrap mean and x̄ is the original sample mean.
- Order these t values from smallest to largest. Denote the 250th as a and the 9750th as b. Then the 95% bootstrapped t confidence interval is
  [x̄ − b·s/√n, x̄ − a·s/√n],
where s is the standard deviation of the original sample.
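The three steps above can be sketched in Python. This is an editor's illustration (function name invented, B reduced for speed); percentile indices are scaled to the number of usable t values.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def boot_t_ci(x, B=10000, alpha=0.05, seed=1):
    """Bootstrapped t CI: [xbar - b*s/sqrt(n), xbar - a*s/sqrt(n)]."""
    rng = random.Random(seed)
    n = len(x)
    xbar, s = fmean(x), stdev(x)
    ts = []
    for _ in range(B):
        bs = rng.choices(x, k=n)         # bootstrap sample
        sb = stdev(bs)
        if sb == 0:
            continue                     # skip degenerate resamples with zero spread
        ts.append((fmean(bs) - xbar) / (sb / sqrt(n)))
    ts.sort()
    a = ts[int(len(ts) * alpha / 2)]
    b = ts[int(len(ts) * (1 - alpha / 2))]
    return xbar - b * s / sqrt(n), xbar - a * s / sqrt(n)

lo, hi = boot_t_ci([2, 5, 1, 8, 3, 2], B=2000)
```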
148The BCa Method
- A better confidence interval for a parameter is constructed using the BCa (bias-corrected and accelerated) method.
- One may be concerned with two problems. One is that the sample estimate may be a biased estimate of the population parameter.
- Another problem is that the standard deviation of the sample estimate usually depends on the unknown parameter we are trying to estimate.
- To deal with the two problems, Bradley Efron proposed the BCa method.
- For details, refer to this paper.
1495.5 Pareto and Ishikawa Diagrams
- The Pareto diagram tells top management where it is most appropriate to spend resources in finding problems.
- The Ishikawa diagram, also known as the fishbone diagram or cause-and-effect diagram, is favored by some as a tool for finding the ultimate cause of a system failure. See an example of such a diagram on page 197.
150Create Pareto Charts Using SAS
data failure3;
  input cause $ 1-16 count;
cards;
Contamination    14
Corrosion         2
Doping            1
Metallization     2
Miscellaneous     3
Oxide Defect      8
Silicon Defect    1
;
run;
title 'Analysis of IC Failures';
symbol color=salmon;
proc pareto data=failure3;
  vbar cause / freq=count
               scale=count
               interbar=1.0
               last='Miscellaneous'
               nlegend='Total Circuits'
               cframenleg=ywh
               cframe=green
               cbars=vigb;
run;
1515.6 A Bayesian Pareto Analysis for System
Optimization of the Space Station
152STAT 424 Statistical Design for Process Improvement
- Lecture 6
- Introductory Statistical Inference and Regression
Analysis
1531.1 Elementary Statistical Inference
- Population
- Sample
- Statistical inference
- the endeavor that uses sample data to make decisions about a population.
- Estimators and estimates
- Random variable
154Unbiasedness and Efficiency
155Suppose θ is an unknown parameter which is to be estimated from measurements x, distributed according to some probability density function f(x; θ). It can be shown that the variance of any unbiased estimator θ̂ of θ is bounded below by the inverse of the Fisher information I(θ):
  Var(θ̂) ≥ 1/I(θ)   (Cramér–Rao lower bound)
where the Fisher information I(θ) is defined by
  I(θ) = E[(∂ log L/∂θ)²],
log L is the natural logarithm of the likelihood function, and E denotes the expected value. The efficiency of an unbiased estimator θ̂ is defined to be the ratio
  e(θ̂) = [1/I(θ)] / Var(θ̂).
The sample mean and sample median of a normal sample are both unbiased estimators of the population mean. The sample mean is more efficient.
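The efficiency claim about the mean versus the median can be seen by simulation. This Python sketch is an editor's illustration: the function name, sample size, and replication count are arbitrary choices.

```python
import random
from statistics import fmean, median, pvariance

def sampling_variance(estimator, n=25, reps=4000, seed=7):
    """Monte Carlo variance of an estimator over N(0, 1) samples of size n."""
    rng = random.Random(seed)
    vals = [estimator([rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
    return pvariance(vals)

v_mean = sampling_variance(fmean)      # theory: 1/n = 0.04
v_median = sampling_variance(median)   # theory: about (pi/2)/n, roughly 0.063
```

The asymptotic efficiency of the median relative to the mean for normal data is 2/π ≈ 0.64, which the two simulated variances reflect.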
156Point and Interval Estimation
- When we estimate a parameter θ by θ̂, we say θ̂ is a point estimator of θ.
- Alternatively, we use an interval to locate the unknown parameter θ. Such an interval contains the unknown parameter with some probability 1 − α. The interval is called a 1 − α confidence interval.
- A 95% confidence interval means that, when the random sampling procedure is repeated 1000 times, among the 1000 resulting confidence intervals, about 950 will cover the unknown parameter θ.
157Confidence Intervals for the Mean of a Normal Population
- We consider a population that is normally distributed as N(µ, σ²).
- If the variance σ² is known, then the exact 1 − α confidence interval for µ is x̄ ± z_{α/2}·σ/√n.
- But σ is usually unknown. We estimate it by the sample standard deviation s. A new exact 1 − α confidence interval for µ is x̄ ± t_{α/2, n−1}·s/√n.
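For the known-σ case, the interval x̄ ± z_{α/2}·σ/√n can be computed directly. This Python sketch is an editor's illustration; the function name and the four data values are made up.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(x, sigma, alpha=0.05):
    """Exact 1 - alpha CI for a normal mean when sigma is known."""
    n = len(x)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 at alpha=0.05
    xbar = fmean(x)
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lo, hi = z_interval([9.9, 10.1, 10.0, 10.0], sigma=0.1)
```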
158Normal-theory Based Confidence Interval for a Parameter θ
- The point estimator θ̂ is usually approximately normally distributed when the sample size n is large (> 30), even for a non-normal population.
- A 1 − α confidence interval is then constructed as θ̂ ± z_{α/2}·se(θ̂), where se(θ̂) is the estimated standard error.
159Examples
data Heights;
  label Height = 'Height (in)';
  input Height @@;
datalines;
64.1 60.9 64.1 64.7 66.7 65.0 63.7 67.4 64.9
63.7 64.0 67.5 62.8 63.9 65.9 62.3 64.1 60.6
68.6 68.6 63.7 63.0 64.7 68.2 66.7 62.8 64.0
64.1 62.1 62.9 62.7 60.9 61.6 64.6 65.7 66.6
66.7 66.0 68.5 64.4 60.5 63.0 60.0 61.6 64.3
60.2 63.5 64.7 66.0 65.1 63.6 62.0 63.6 65.8
66.0 65.4 63.5 66.3 66.2 67.5 65.8 63.1 65.8
64.4 64.0 64.9 65.7 61.0 64.1 65.5 68.6 66.6
65.7 65.1 70.0
;
run;
title 'Analysis of Female Heights';
proc univariate data=Heights mu0=65 alpha=0.05 normal;
  var Height;
  histogram Height;
  qqplot Height;
  probplot Height;
run;
160Confidence Interval for Difference between Two
Means of Normal Populations with Unequal Known
Variance
161Confidence Interval for Difference between Two
Means of Normal Populations with Equal Unknown
Variance
162Confidence Interval for Difference between Two
Means (Equal Unknown Variances), When Sample
Sizes Are Large
163Examples
164Confidence Interval for a Proportion
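The interval behind this slide is the usual large-sample (Wald) formula p̂ ± z_{α/2}·√(p̂(1 − p̂)/n). A brief Python sketch, as an editor's illustration with hypothetical inputs (30 successes out of 100):

```python
from math import sqrt
from statistics import NormalDist

def prop_ci(x, n, alpha=0.05):
    """Large-sample (Wald) 1 - alpha CI for a proportion: x successes in n trials."""
    p = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = prop_ci(30, 100)
```

The Wald interval is only reliable when n·p̂ and n·(1 − p̂) are both reasonably large; PROC FREQ's binomial option, used on the next slide, handles the exact computations.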
165SAS Procedure for a Proportion PROC FREQ
data Color;
  input Region Eyes $ Hair $ Count @@;
  label Eyes   = 'Eye Color'
        Hair   = 'Hair Color'
        Region = 'Geographic Region';
datalines;
1 blue fair 23 1 blue red 7 1 blue medium 24
1 blue dark 11 1 green fair 19 1 green red 7
1 green medium 18 1 green dark 14 1 brown fair 34
1 brown red 5 1 brown medium 41 1 brown dark 40
1 brown black 3 2 blue fair 46 2 blue red 21
2 blue medium 44 2 blue dark 40 2 blue black 6
2 green fair 50 2 green red 31 2 green medium 37
2 green dark 23 2 brown fair 56 2 brown red 42
2 brown medium 53 2 brown dark 54 2 brown black 13
;
proc freq data=Color order=freq;
  weight Count;
  tables Eyes / binomial alpha=.1;
  tables Hair / binomial(p=.28);
  title 'Hair and Eye Color of European Children';
run;
166Confidence Interval for the Difference between
Two Proportions (Independent Samples)
167Confidence Interval for the Difference between
Two Proportions (Paired Samples)
168Examples
169Tests of Hypotheses
- The null hypothesis
- The alternative hypothesis
- Type I and type II errors
- Level of significance
170One Sample t-Test
title 'One-Sample t Test';
data time;
  input time @@;
datalines;
43 90 84 87 116 95 86 99 93 92 121 71 66 98 79 102 60 112 105 98
;
run;
proc ttest h0=80 alpha=0.05;
  var time;
run;
171Two-Sample t-Test Comparing Group Means
- Equal variance case
- Unequal variance case
172Two-Sample t-Test Comparing Group Means
title 'Comparing Group Means';
data OnyiahExample1_14;
  input machine speed @@;
datalines;
1 1603 1 1604 1 1605 1 1605 1 1602 1 1601 1 1596 1 1598 1 1599 1 1602
1 1614 1 1612 1 1607 1 1593 1 1604 2 1602 2 1597 2 1596 2 1601 2 1599
2 1603 2 1604 2 1602 2 1601 2 1607 2 1600 2 1596 2 1595 2 1606 2 1597
;
run;
proc ttest;  /* produces results for both equal and unequal variances */
  class machine;
  var speed;
run;
Question: How can you find the p-value for a one-sided test? Use symmetry.
173Paired Comparison: Paired t-Test
Pairs (i)   Before Treatment   After Treatment   Differences (di)
1           Y11                Y12               Y11 - Y12
2           Y21                Y22               Y21 - Y22
3           Y31                Y32               Y31 - Y32
...         ...                ...               ...
n           Yn1                Yn2               Yn1 - Yn2
174Two-Sample Paired t-Test Comparing Group Means
title 'Paired Comparison';
data pressure;
  input SBPbefore SBPafter @@;
  d = SBPbefore - SBPafter;
datalines;
120 128 124 131 130 131 118 127 140 132 128 125
140 141 135 137 126 118 130 132 126 129 127 135
;
run;
proc univariate;
  var d;
run;
proc ttest;
  paired SBPbefore*SBPafter;
run;
175Operating Characteristic (OC) Curves
176Find the Power of the Test for a Population Mean
- Assume that we have the following:
- H0: µ = 1500 (1500 is called the claimed value)
- H1: µ < 1500
- Sample size n = 20
- Significance level α = 0.05
- The population standard deviation is known: σ = 110.
- Question:
- (1) Find the power of the test, which is the probability of rejecting the null hypothesis, given that the population mean is actually 1450 (called the alternative value).
- (2) Find powers corresponding to any alternative µ. Plot the power against µ.
177- Since the test is left-tailed, the rejection region is the left tail on the number line. The borderline value is −zα = −1.645. That is, the rejection rule can be written as (x̄ − µ0)/(σ/√n) < −1.645, i.e. x̄ < µ0 − 1.645·σ/√n.
- Replacing µ0 = 1500, σ = 110, and n = 20 in the above inequality gives the rejection region x̄ < 1459.54.
- When µ is actually 1450, x̄ follows a normal distribution with mean 1450 and standard deviation σ/√n = 110/√20 ≈ 24.60.
- The power is P(x̄ < 1459.54) = Φ((1459.54 − 1450)/24.60) ≈ Φ(0.39) ≈ 0.65.
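The arithmetic above can be verified numerically. This Python check is an editor's illustration (function name invented); it reproduces the left-tailed case with the slide's inputs.

```python
from math import sqrt
from statistics import NormalDist

def power_left(mu0, mu1, sigma, n, level=0.05):
    """Power of the left-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    s = sigma / sqrt(n)
    cutoff = mu0 - NormalDist().inv_cdf(1 - level) * s   # rejection boundary for xbar
    return NormalDist(mu1, s).cdf(cutoff)                # P(xbar < cutoff | mu = mu1)

p = power_left(1500, 1450, 110, 20)   # roughly 0.65, matching the hand calculation
```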
178R codes for Power Calculation and Plot
power.mean <- function(mu0 = 1500, mu1 = 1450, sigma = 110, n = 20,
                       level = 0.05, tail = c("left", "two", "right")) {
  s <- sigma/sqrt(n)
  if (tail == "two") {
    E <- qnorm(1 - level/2)*s
    c1 <- mu0 - E
    c2 <- mu0 + E
    pL <- pnorm(c1, mu1, s)
    pR <- pnorm(c2, mu1, s)
    power <- 1 - pR + pL
  } else if (tail == "left") {
    E <- qnorm(1 - level)*s
    c1 <- mu0 - E
    pL <- pnorm(c1, mu1, s)
    power <- pL
  } else {
    E <- qnorm(1 - level)*s
    c2 <- mu0 + E
    pR <- pnorm(c2, mu1, s)
    power <- 1 - pR
  }
  return(power)
}
power.mean(mu0 = 1500, mu1 = 1450, sigma = 110, n = 20, level = 0.05, tail = "left")
mu0 <- 1500
mu <- seq(1350, 1550, by = 1)
n <- 20
level <- 0.05
tail <- "left"
power <- power.mean(mu0 = mu0, mu1 = mu, n = n, level = level, tail = tail)
plot(mu, power, type = "l", xlab = expression(mu), col = "blue", lwd = 3)
n1 <- 30
power <- power.mean(mu0 = mu0, mu1 = mu, n = n1, level = level, tail = tail)
lines(mu, power, type = "l", col = "red", lwd = 3)
abline(v = 1500)
legend(1352, 0.3,
       legend = c(paste("Claimed =", mu0), paste("Level =", level),
                  paste("Sample Size =", n), paste("Sample Size =", n1)),
       text.col = c(1, 1, 4, 2))
1791.2 Regression Analysis
- Suppose that the true relationship between a response variable y and a set of predictor variables x1, x2, …, xp is y = f(x1, x2, …, xp).
- But, due to measurement error, y may be observed as
  y = f(x1, x2, …, xp) + ε.   (*)
- If ε is assumed to be distributed as N(0, σ²), then it is said that we have a normal regression model.
- If f(x1, x2, …, xp) = β0 + β1x1 + β2x2 + … + βpxp, then the model is called a normal linear regression model.
180The Ordinary Least Squares Method
- Suppose that n observations (xi1, xi2, …, xip, yi), i = 1, 2, …, n, are available from an experiment or a purely observational study.
- Then the model (*) can be written as
  yi = f(xi1, xi2, …, xip) + εi,  i = 1, 2, …, n.
- Suppose that f has a known form. To estimate the function f, a traditional method is the least squares method.
- The method starts from minimizing the error sum of squares
  SSE = Σi [yi − f(xi1, xi2, …, xip)]².
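For the simple linear case f(x) = β0 + β1x, minimizing the error sum of squares has a familiar closed form. The following Python sketch is an editor's illustration (function name invented, data made up):

```python
from statistics import fmean

def ols_simple(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x + error."""
    xbar, ybar = fmean(x), fmean(y)
    # b1 = sum (xi - xbar)(yi - ybar) / sum (xi - xbar)^2
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])   # exact fit: y = 1 + 2x
```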
181