STAT 424/524 Statistical Design for Process Improvement - PowerPoint PPT Presentation

Author: Saihua Yu. Last modified by: SZhang. Created: 8/8/2006.

Transcript and Presenter's Notes

Title: STAT 424/524 Statistical Design for Process Improvement


1
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 1
  • An Overview of Statistical Process Control

2
Book online
  • http://books.google.com/books?id=tj1nMz8ajmQC&pg=PA77&lpg=PA77&dq=contaminated+production+process&source=web&ots=isGEpndi8a&sig=0XEQbXo2bDo7EZQ1HQvnS8b0baM&hl=en&sa=X&oi=book_result&resnum=1&ct=result#PPP1,M1

3
Homework 1 & 2
  • 1. Do Problems 1.1, 1.3, 1.5, 1.7.
  • 2. Do Problem 1.12, and derive Table 1.9 on page 36 of the text (also on slide 20) using the following formula.

4
1.1 Introduction
  • Diligence, a good attitude, and hard work are not
    sufficient for achieving quality control.
  • Statistical process control (SPC) is a method of quality control that enables us to seek steady improvement in the quality of a product. It monitors a process through the use of control charts.
  • Statistical Process Control was pioneered by
    Walter A. Shewhart in the early 1920s. W. Edwards
    Deming later applied SPC methods in the United
    States during World War II.

5
Differences between Quality Control and Quality
Assurance
  • Suppose that you are a PhD student about to graduate and applying for an academic position. If you were a product, your supervisor would be the quality control manager, and the search committee reviewing your application would be the quality assurance manager. Read the following:
  • http://www.builderau.com.au/strategy/projectmanagement/soa/Quality-control-vs-quality-assurance/0,339028292,339191784,00.htm

6
Core Steps of Statistical Process Control
  1. Flowcharting the production process
  2. Random sampling and measurement, at regular temporal intervals, at numerous stages of the production process
  3. Using the Pareto glitches discovered in this sampling to backtrack in time and discover their causes, so that the process can be improved

7
Self Reading
  • Section 1.2 to 1.7

8
1.8 White Balls, Black Balls
  • Recall that the second core step of statistical process control is random sampling and measurement at regular temporal intervals at numerous stages of the production process.
  • The following data table shows measurements of
    thickness in centimeters of 40 bolts in ten lots
    of 4 each.

9
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
10
Data Display
  • We construct a run chart of thickness against lot
    number for each bolt.

11
Question: Is the process in control?
12
A SAS Program
13
One Bad Lot
  • We add 0.500 to each of the 4 measurements in lot
    10. The new graph is generated.

14
One Bad Bolt
  • We add 0.500 only to the 4th measurements in lot
    10. The new graph is generated.

15
Run Chart of Means
  • We have constructed run charts of original
    measurements. We can also construct run charts of
    a summary statistic, say the lot mean.
  • To do this, we first find the mean for each lot.
    Then we plot the means against corresponding
    lots.
  • The run chart of the lot mean for the first data
    set is shown below.
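The lot means behind this run chart can be computed directly from the thickness table. Below is a small Python sketch (Python rather than the course's SAS, purely for illustration), using the values transcribed from the table above.

```python
# Thickness data: ten lots of four bolts each (from the table above).
lots = [
    [9.93, 10.04, 10.05, 10.09],
    [10.00, 10.03, 10.05, 10.12],
    [9.94, 10.06, 10.09, 10.10],
    [9.90, 9.95, 10.01, 10.02],
    [9.89, 9.93, 10.03, 10.06],
    [9.91, 10.01, 10.02, 10.09],
    [9.89, 10.01, 10.04, 10.09],
    [9.96, 9.97, 10.00, 10.03],
    [9.98, 9.99, 10.05, 10.11],
    [9.93, 10.02, 10.10, 10.11],
]

# Lot mean = average of the four bolt thicknesses in each lot.
means = [sum(lot) / len(lot) for lot in lots]
for lot_no, m in enumerate(means, start=1):
    print(f"Lot {lot_no}: mean = {m:.4f}")
```

Plotting `means` against the lot number 1, ..., 10 gives the run chart of lot means.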

16
(No Transcript)
17
  • We add 0.500 to each of the 4 measurements in lot
    10. The new run chart is generated.

18
1.9 The Basic Paradigm of Statistical Process
Control
  • We considered ten lots of 4 bolts each. We saw
    that there is variation within each lot and
    variation across lots (in terms of lot average).
  • A major task in SPC is to seek significantly
    outlying lots, good or bad. Once found, such lots
    can then be investigated to find out why they
    deviate from others.
  • This is the basic paradigm of SPC
  • 1. Find a Pareto glitch (a non-standard lot)
  • 2. Discover the causes of the glitch
  • 3. Use this information to improve the
    production process.
  • The variability across lots is the key notion in
    search for Pareto glitches.

19
1.10 Basic Statistical Procedures in Statistical
Process Control
  • Let's use the original thickness data.

Lot Bolt 1 Bolt 2 Bolt 3 Bolt 4
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
20
Control Chart on Lot Means
  • To construct control charts for lot means:
  • first calculate the mean and standard deviation of each lot;
  • then find the mean of the lot means (the grand mean) and the mean of the lot standard deviations, s-bar;
  • finally, find the acceptance interval on the mean, given by (grand mean) ± A3(n)·s-bar, where A3(n) can be read from the following table.

21
Multiplication Factors for Different Lot Sizes
n B3(n) B4(n) A3(n)
2 0.000 3.267 2.659
3 0.000 2.568 1.954
4 0.000 2.266 1.628
5 0.000 2.089 1.427
6 0.030 1.970 1.287
7 0.118 1.882 1.182
8 0.185 1.815 1.099
9 0.239 1.761 1.032
10 0.284 1.716 0.975
15 0.428 1.572 0.789
20 0.510 1.490 0.680
25 0.565 1.435 0.606
22
Mean Control Chart for Thickness Data
  • For the thickness data, we calculate the lot means and standard deviations, obtaining a grand mean of 10.015 and an average standard deviation of 0.0664.
  • The acceptance interval is 10.015 ± 1.628(0.0664), i.e., (9.907, 10.123), where 9.907 is called the Lower Control Limit (LCL) and 10.123 the Upper Control Limit (UCL).
  • Does any mean appear to be out of control?
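As a cross-check on these limits, here is a short Python sketch (illustrative only; the course uses SAS) that computes the grand mean, the average lot standard deviation, and the A3-based acceptance interval from the thickness data.

```python
from statistics import mean, stdev

# Thickness data: ten lots of four bolts (from the table above).
lots = [
    [9.93, 10.04, 10.05, 10.09], [10.00, 10.03, 10.05, 10.12],
    [9.94, 10.06, 10.09, 10.10], [9.90, 9.95, 10.01, 10.02],
    [9.89, 9.93, 10.03, 10.06],  [9.91, 10.01, 10.02, 10.09],
    [9.89, 10.01, 10.04, 10.09], [9.96, 9.97, 10.00, 10.03],
    [9.98, 9.99, 10.05, 10.11],  [9.93, 10.02, 10.10, 10.11],
]
A3 = 1.628  # multiplication factor for lot size n = 4 (table above)

xbarbar = mean(mean(lot) for lot in lots)   # grand mean of lot means
sbar = mean(stdev(lot) for lot in lots)     # mean of lot standard deviations
lcl = xbarbar - A3 * sbar
ucl = xbarbar + A3 * sbar
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

The result reproduces the limits quoted above: LCL about 9.907 and UCL about 10.123.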

23
(No Transcript)
24
Control Chart on Standard Deviation
25
Standard Deviation Control Chart for Thickness
Data
  • LCL = B3(4)·s-bar = 0; UCL = B4(4)·s-bar = 2.266(0.0664) = 0.150

26
(No Transcript)
27
Creating Control Charts for Means and Standard
Deviations Based on Summary Statistics
Lot x-bar s
1 146.21 0.12
2 146.18 0.09
3 146.22 0.13
4 146.31 0.10
5 146.20 0.08
6 146.15 0.11
7 145.93 0.18
8 145.96 0.18
9 145.88 0.16
10 145.98 0.21
11 146.08 0.11
12 146.12 0.12
13 146.26 0.21
14 146.32 0.18
15 146.00 0.32
16 145.83 0.19
17 145.76 0.12
18 145.90 0.17
19 145.94 0.10
20 145.97 0.09
  • Refer to Problem 1.4 on page 48 of Thompson's text. The summary statistics are reproduced here.

28
DATA problem1_4;
  INPUT lot AX AS @@;
  AN = 5;
  CARDS;
 1 146.21 0.12   2 146.18 0.09
 3 146.22 0.13   4 146.31 0.10
 5 146.20 0.08   6 146.15 0.11
 7 145.93 0.18   8 145.96 0.18
 9 145.88 0.16  10 145.98 0.21
11 146.08 0.11  12 146.12 0.12
13 146.26 0.21  14 146.32 0.18
15 146.00 0.32  16 145.83 0.19
17 145.76 0.12  18 145.90 0.17
19 145.94 0.10  20 145.97 0.09
;
SYMBOL v=dot c=red;
PROC SHEWHART HISTORY=problem1_4;
  xschart A*lot;
RUN;

29
(No Transcript)
30
1.11 Acceptance Sampling
  • Data, which are characterized as defective or
    not, are called acceptance/rejection data or
    failure data.
  • Suppose that a bolt is considered defective if it
    is smaller than 9.92 or greater than 10.08.
  • Then the first data set considered can be
    converted to acceptance/rejection data as follows.

31
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 9.93 10.04 10.05 10.09
2 10.00 10.03 10.05 10.12
3 9.94 10.06 10.09 10.10
4 9.90 9.95 10.01 10.02
5 9.89 9.93 10.03 10.06
6 9.91 10.01 10.02 10.09
7 9.89 10.01 10.04 10.09
8 9.96 9.97 10.00 10.03
9 9.98 9.99 10.05 10.11
10 9.93 10.02 10.10 10.11
Lot Bolt 1 (smallest) Bolt 2 Bolt 3 Bolt 4 (largest)
1 0 0 0 1
2 0 0 0 1
3 0 0 1 1
4 1 0 0 0
5 1 0 0 0
6 1 0 0 1
7 1 0 0 1
8 0 0 0 0
9 0 0 0 1
10 0 0 1 1
1 = defective, 0 = nondefective
32
Overall Proportion of Defectives
  • In order to apply an SPC procedure to acceptance/rejection data, we note the proportion of defectives in each lot.
  • The overall proportion of defectives is a key statistic, calculated by averaging the lot proportions of defectives.
  • In the previous acceptance/rejection data set, the 10 lot proportions of defectives are 0.25, 0.25, 0.50, 0.25, 0.25, 0.50, 0.50, 0.00, 0.25, 0.50, which yields the overall proportion of defectives (0.25 + 0.25 + 0.50 + 0.25 + 0.25 + 0.50 + 0.50 + 0.00 + 0.25 + 0.50)/10 = 0.325.

33
Control Limits on the Proportion of Defectives
  • The lower and upper control limits are LCL = max(0, p-bar − 3√(p-bar(1 − p-bar)/n)) and UCL = min(1, p-bar + 3√(p-bar(1 − p-bar)/n)).
  • The rationale for choosing these two limits will be discussed in Section 1.13.
  • For our acceptance/rejection data, p-bar = 0.325 and n = 4, so both limits truncate: LCL = 0 and UCL = 1.
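A quick Python sketch of this calculation (an illustration, not part of the text) using the lot proportions listed on the previous slide:

```python
import math

# Lot proportions of defectives from the acceptance/rejection table (n = 4 per lot).
props = [0.25, 0.25, 0.50, 0.25, 0.25, 0.50, 0.50, 0.00, 0.25, 0.50]
n = 4

pbar = sum(props) / len(props)                    # overall proportion = 0.325
half_width = 3 * math.sqrt(pbar * (1 - pbar) / n)
lcl = max(0.0, pbar - half_width)                 # truncate at 0
ucl = min(1.0, pbar + half_width)                 # truncate at 1
print(pbar, lcl, ucl)
```

With only n = 4 bolts per lot the half-width exceeds both bounds, so the truncated limits are 0 and 1; larger lots give informative limits.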

34
Control Charts on the Proportion of Defectives
  • Control charts on the proportion of defectives
    are called p-charts.
  • One may create p-charts from count data or
    summary data.

35
Creating p Charts from Count Data
  • An electronics company manufactures circuits in
    batches of 500 and uses a p chart to monitor the
    proportion of failing circuits. Thirty batches
    are examined, and the failures in each batch are
    counted. The following statements create a SAS
    data set named CIRCUITS, which contains the
    failure counts, as shown below.
data circuits;
  input batch fail @@;
  datalines;
 1  5   2  6   3 11   4  6   5  4   6  9   7 17   8 10   9 12  10  9
11  8  12  7  13  7  14 15  15  8  16 18  17 12  18 16  19  4
20  7  21 17  22 12  23  8  24  7  25 15  26  6  27  8  28 12
29  7  30  9
;
run;

36
SAS Statements that Create the p-Chart
symbol color=salmon;
title 'p Chart for the Proportion of Failing Circuits';
proc shewhart data=circuits;
  pchart fail*batch / subgroupn = 500
                      cframe    = lib
                      cinfill   = bwh
                      coutfill  = yellow
                      cconnect  = salmon;
run;

37
Creating p Charts from Summary Data
  • The previous example illustrates how you can
    create p charts using raw data (counts of
    nonconforming items). However, in many
    applications, the data are provided in summarized
    form as proportions or percentages of
    nonconforming items. This example illustrates how
    you can use the PCHART statement with data of
    this type.
data cirprop;
  input batch pfailed @@;
  sampsize = 500;
  datalines;
 1 0.010   2 0.012   3 0.022   4 0.012   5 0.008   6 0.018   7 0.034
 8 0.020   9 0.024  10 0.018  11 0.016  12 0.014  13 0.014
14 0.030  15 0.016  16 0.036  17 0.024  18 0.032  19 0.008
20 0.014  21 0.034  22 0.024  23 0.016  24 0.014  25 0.030
26 0.012  27 0.016  28 0.024  29 0.014  30 0.018
;
run;

38
title 'p Chart for the Proportion of Failing Circuits';
symbol v=dot;
proc shewhart data=cirprop;
  pchart pfailed*batch / subgroupn = sampsize
                         dataunit  = proportion;
  label pfailed = 'Proportion for FAIL';
run;

39
What Data to Use, Failure Data or Measurement
Data?
  • The use of failure data is a very blunt
    instrument when compared to the use of
    measurement data.
  • This is because of information loss when items
    are characterized as defective or not, ignoring
    specific measurements.

40
Control Charts in Minitab
  • http://www.qualproxl.com/Control_Charts.html

41
1.12 The Case for Understanding Variation
  • Variation within any process, let alone a system
    of processes, is inevitable.
  • Large variation adds to complexity and
    inefficiency of a system.
  • Reducing variation of a process is an important
    issue.

42
Two Different Sources of Variation
  • According to Walter Shewhart, there are two qualitatively different sources of variation:
  • Common cause variation (also known as random variation, or noise)
  • Special cause variation (also known as assignable variation)
  • It is special cause variation that leads to Pareto glitches (also called signals), which can be detected using control charts. Special cause variation is caused by identifiable factors that result in a non-random disruption of output.
  • Special cause variation can be removed through the proper use of control charts.

43
Processes That Are in Control
  • A process that is already in a state of
    statistical control is not subject to special
    cause variation, but only subject to common cause
    or inherent variation, which is always present
    and can not be reduced unless the process itself
    is redesigned.
  • An in-control process is predictable, but it need not perform satisfactorily.
  • The first data set shown before is from an in-control process, but improvement of the process is still welcome.

44
Two Types of Error
  • Error of the first kind (also known as tampering): treating common causes as special ones.
  • Error of the second kind: treating special causes as common ones, i.e., disregarding signals.

45
Process Capability
  • An in-control process reveals only common cause
    variation. This variation is measured by process
    capability.
  • Reduction of common cause variation requires
    improvement of the process itself.

46
Improvement of a Stable Process
  • Shewhart and Deming developed the Plan-Do-Study-Act (PDSA) cycle for improvement of a stable process.
  • In order to improve a stable process, one first has to PLAN the change. Such a plan should be based on a mathematical model of the process under scrutiny; design of experiments (DOE) is also needed.
  • Then DO it on a small scale, that is, run a pilot study.
  • Then STUDY (or check) whether the changed process is in control.
  • Finally, ACT accordingly: adopt the change if successful, or try another.

47
1.13 Statistical Coda
  • The control limits on the mean are based on the following Central Limit Theorem.
  • If the number of previous lots is large, say 25 or more, the average of the lot means will give an excellent estimate of µ, and the average of the sample standard deviations is a good estimate of σ when multiplied by an unbiasing factor a(n).

48
  • Now replacing µ and σ by their estimates yields the control limits on the mean: (grand mean) ± A3(n)·s-bar.

49
  • Similarly, the CLT applies to the sample standard deviation, with mean E(s) and standard deviation sd(s).
  • Now replacing E(s) and sd(s) by their estimates yields the control limits on the standard deviation: LCL = B3(n)·s-bar and UCL = B4(n)·s-bar.

50
  • Finally, for failure data, the CLT also applies for large lot size n: the lot proportion of defectives is approximately N(p, p(1 − p)/n).

51
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 2
  • Acceptance-Rejection SPC

52
Homework 3
  • Pages 71-74, Problems 1 to 6

53
Sections 2.1 and 2.2
  • Self reading

54
2.3 Basic Tests with Equal Lot Size
  • Consider the following failure data:

DATA table2_1;
  Lot + 1;
  INPUT defectives @@;
  prop = defectives/100;
  DATALINES;
3 2 5 0 6 4 2 4 1 2 7 9 11 12 14 15 12 10 8 3 5
6 0 1 3 3 4 6 5 5 3 3 7 8 2 0 6 7 4 4
;
PROC PRINT DATA=table2_1 (obs=5);
RUN;

55
The SAS System

Obs Lot defective prop
1 1 3 0.03
2 2 2 0.02
3 3 5 0.05
4 4 0 0.00
5 5 6 0.06

56
symbol v=dot c=red;
PROC GPLOT DATA=table2_1;
  PLOT prop*Lot;
run; quit;
57
(No Transcript)
58
Control Limits on the Number of Defectives
  • Let n be the common lot size and p be the proportion of defectives in the product.
  • Let X = the number of defectives in a lot. Then X ~ B(n, p); that is, X follows a binomial distribution with parameters n and p.
  • By the CLT, Z = (X − np)/√(np(1 − p)) is approximately N(0, 1).

59
Control Limits on the Number of Defectives
(contd)
  • By the empirical rule in statistics, Z is between −3 and 3; that is, −3 ≤ (X − np)/√(np(1 − p)) ≤ 3.
  • Solving for X gives the Lower Control Limit and Upper Control Limit for the number of defectives: np ± 3√(np(1 − p)).
60
Control Limits on the Number of Defectives
(contd)
  • Since the LCL might be negative and the UCL might exceed the lot size n, the modified limits are LCL = max(0, np − 3√(np(1 − p))) and UCL = min(n, np + 3√(np(1 − p))).
  • A control chart based on these LCL and UCL is called an np chart.
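The np-chart limits above can be sketched in a few lines of Python (an illustration, not from the text), applied to the defectives-per-100 counts of table2_1 shown earlier in this lecture.

```python
import math

# Defectives per lot of n = 100, from the table2_1 data (40 lots).
defectives = [3, 2, 5, 0, 6, 4, 2, 4, 1, 2, 7, 9, 11, 12, 14, 15, 12, 10, 8, 3,
              5, 6, 0, 1, 3, 3, 4, 6, 5, 5, 3, 3, 7, 8, 2, 0, 6, 7, 4, 4]
n = 100

np_bar = sum(defectives) / len(defectives)   # average number of defectives per lot
p_bar = np_bar / n                           # estimated proportion of defectives
half = 3 * math.sqrt(np_bar * (1 - p_bar))
lcl = max(0.0, np_bar - half)                # truncate below at 0
ucl = min(float(n), np_bar + half)           # truncate above at n

# Lots whose defective count falls outside the control limits.
out = [i + 1 for i, d in enumerate(defectives) if d < lcl or d > ucl]
```

For these counts the average is 5.3 defectives per lot, and the lots with 14 and 15 defectives fall above the UCL.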

61
symbol color=salmon;
title 'np Chart for the Proportion Defectives';
proc shewhart data=table2_1;
  npchart defectives*Lot / subgroupn = 100
                           cframe    = lib
                           cinfill   = bwh
                           coutfill  = yellow
                           cconnect  = salmon;
run;

62
p Charts
  • Control limits on the proportion of defectives have the form p-bar ± 3√(p-bar(1 − p-bar)/n), truncated to [0, 1].
  • The corresponding charts are called p charts.

63
symbol color=salmon;
title 'p Chart for the Proportion Defectives';
proc shewhart data=table2_1;
  pchart defectives*Lot / subgroupn = 100
                          cframe    = lib
                          cinfill   = bwh
                          coutfill  = yellow
                          cconnect  = salmon;
run;

64
2.4 Testing with Unequal Lot Sizes
  • If the lot sizes are unequal, the control limits for the proportion of defectives have to be calculated for each lot separately.
  • The control limits for the p chart now take the form p-bar ± 3√(p-bar(1 − p-bar)/nk), where nk is the size of lot k.

65
Page 64
DATA table2_2;
  Month + 1;
  INPUT Patients Infections @@;
  DATALINES;
50 3  42 2  37 6  71 5  55 6  44 6  38 10  33 2
41 4  27 1  33 1  49 3  66 8  49 5  55 4  41 2
29 0  40 3  41 2  48 5  52 4  55 6  49 5  60 2
;

66
proc shewhart data=table2_2;
  pchart Infections*Month / subgroupn = Patients
                            outtable  = CLtable;
run;
DATA page64;
  MERGE table2_2 CLtable (KEEP = _SUBP_ _UCLP_);
run;
PROC PRINT;
RUN;
quit;

67
2.5 Testing with Open-Ended Count Data
  • Let X denote the number of items returned per week, say.
  • X roughly has a Poisson distribution; that is, P(X = x) = e^(−λ) λ^x / x! for x = 0, 1, 2, ….
68
Control Limits on the Number Returned Items
  • For a Poisson distribution, the mean equals the variance.
  • The control limits on the number of returned items are LCL = max(0, c-bar − 3√c-bar) and UCL = c-bar + 3√c-bar, where c-bar is the average count per week.
  • A chart with these control limits is called a c chart.
  • A c chart should not be used if lots are of unequal sizes; use a u chart instead.

69
data table2_4;
  Week + 1;
  input numReturned @@;
  datalines;
22 13 28 17 22 29 32 17 19 27 48 53 31 22 31
27 20 24 17 22 29 30 31 22 26 24
;
proc print; run;
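The c-chart limits for these weekly counts can be checked with a short Python sketch (an illustration, not part of the text):

```python
import math

# Weekly counts of returned items (table2_4 above).
counts = [22, 13, 28, 17, 22, 29, 32, 17, 19, 27, 48, 53, 31,
          22, 31, 27, 20, 24, 17, 22, 29, 30, 31, 22, 26, 24]

c_bar = sum(counts) / len(counts)             # average weekly count
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))  # lower limit, truncated at 0
ucl = c_bar + 3 * math.sqrt(c_bar)            # upper limit

# Weeks whose count falls outside the control limits.
out_weeks = [w + 1 for w, c in enumerate(counts) if c < lcl or c > ucl]
```

For these data the weeks with 48 and 53 returns fall above the UCL, matching the two spikes visible on the c chart.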

70
c Charts
symbol color=red h=.8;
title1 'c Chart for Number of Returned Items Per Week';
proc shewhart data=table2_4;
  cchart numReturned*Week;
run;

71
u Charts
  • Suppose the sample size in lot k is nk and the number of defects in lot k is ck; then the number of defects per unit in lot k is uk = ck/nk.
  • The control limits on the average number of defects per unit are u-bar ± 3√(u-bar/nk), where u-bar is the total number of defects divided by the total sample size.
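Because the limits depend on each lot's size nk, they must be computed lot by lot. The Python sketch below is illustrative; the counts and sizes are taken from the first five rolls of the fabric example that follows.

```python
import math

# Defect counts and sizes (square meters) for the first five fabric rolls.
defects  = [7, 11, 15, 6, 11]
sqmeters = [30.0, 27.6, 30.4, 34.8, 26.0]

# Overall defects per unit: total defects over total inspected size.
ubar = sum(defects) / sum(sqmeters)

# Per-lot control limits: ubar +/- 3*sqrt(ubar/n_k), truncated below at 0.
limits = []
for n_k in sqmeters:
    half = 3 * math.sqrt(ubar / n_k)
    limits.append((max(0.0, ubar - half), ubar + half))
```

Note that larger lots (bigger n_k) get tighter limits, which is why the u chart's limit lines step up and down from lot to lot.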

72
  • Example In a fabric manufacturing process, each
    roll of fabric is 30 meters long, and an
    inspection unit is defined as one square meter.
    Thus, there are 30 inspection units in each
    subgroup sample. Suppose now that the length of
    each piece of fabric varies. The following
    statements create a SAS data set (FABRICS) that
    contains the number of fabric defects and size
    (in square meters) of 25 pieces of fabric

73
data fabrics;
  input roll defects sqmeters @@;
  datalines;
 1  7 30.0   2 11 27.6   3 15 30.4   4  6 34.8   5 11 26.0
 6 15 28.6   7  5 28.0   8 10 30.2   9  8 28.2  10  3 31.4
11  3 30.3  12 14 27.8  13  3 27.0  14  9 30.0  15  7 32.1
16  6 34.8  17  7 26.5  18  5 30.0  19 14 31.3  20 13 31.6
21 11 29.4  22  6 28.6  23  6 27.5  24  9 32.6  25 11 31.7
;

74
  • The variable ROLL contains the roll number, the
    variable DEFECTS contains the number of defects
    in each piece of fabric, and the variable
    SQMETERS contains the size of each piece.
  • The following statements request a u chart for
    the number of defects per square meter
symbol color=vig;
title 'u Chart for Fabric Defects per Square Meter';
proc shewhart data=fabrics;
  uchart defects*roll / subgroupn = sqmeters
                        cframe    = steel
                        cinfill   = ligr
                        coutfill  = yellow
                        cconnect  = vig
                        outlimits = flimits;
run;

75
(No Transcript)
76
data abc;
  input lot @;
  do i = 1 to 5;
    input diamtr @;
    output;
  end;
  drop i;
  cards;
1 35.00 34.99 34.99 34.98 35.00
2 35.01 34.99 34.99 34.98 35.00
3 34.99 35.00 35.00 35.00 35.00
4 35.00 35.00 34.99 35.01 34.98
5 34.99 34.99 34.99 35.00 35.00
;
proc print data=abc noobs; run;
77
Constructing Control Charts With Summary Data In
SAS
data abc;
  input lot AX AS;
  AN = 5;
  cards;
1 34.992 0.02
2 34.994 0.03
3 34.998 0.01
4 34.996 0.03
5 34.994 0.01
;
title 'Mean and Standard Deviation Charts for Diameters';
symbol v=dot;
proc shewhart history=abc;
  xschart A*lot;
run;
quit;

78
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 3
  • The development of mean and standard deviation
    control charts

79
Homework 4
  • Pages 127-128: Problems 3.1(a), 3.4, 12, 13, 14

80
3.1 Introduction
  • In a process for manufacturing bolts that are required to have a 10 cm diameter, we often actually observe bolts with diameters other than 10 cm. This is a consequence of flaws in the production process.
  • These imperfections might be excessive lubricant temperature, bearing vibration, nonstandard raw materials, etc.
  • In SPC, these flaws can often be modeled as follows:
  • Let Y = the observed diameter. Then Y ~ N(µ, σ²).

81
  • To model a process while clearly taking account of possible individual flaws, we write the observed measurement in lot t, Y(t), as Y(t) = X0 + δ1X1 + … + δkXk, where each δi is 0 or 1 according to whether the ith flaw is present.
  • Here we have assumed additive flaws, representing k assignable causes. There may be, in any lot t, as many as 2^k possible combinations of flaws contributing to Y(t).

82
  • Let I be a subcollection of {1, 2, 3, ..., k}. Then Y(t) is the sum of X0 and the Xi with i in I.
  • In the special case where each distribution is normal, Y(t) is also normal.
  • In the above discussion, X0 accounts for the common cause, while the other Xi represent special causes. The major task of SPC is to identify these special causes and to take steps to remove them.

83
3.2 A Contaminated Production Process
  • We continue the discussion from Section 3.1. Let X0 ~ N(µ0, σ0²), where µ0 = 10 and σ0² = 0.01.
  • In addition, we have one special cause due to intermittent lubricant heating, say X1, which is N(0.4, 0.02), and another due to bearing vibration, say X2, which is N(−0.2, 0.08), with probabilities of occurrence p1 = 0.01 and p2 = 0.005, respectively.
  • So, for a sampled lot t, Y(t) can be written as Y(t) = X0 + δ1X1 + δ2X2.

84
Or, Y(t) has the following distribution
One can verify that
85
3.3 Estimation of Parameters of the Norm
Process
  • The norm process refers to a uncontaminated
    process whose mean and variance can be estimated
    respectively by
  • Properties
  • Both are unbiased
  • Both are asymptotically normal.

86
Estimating the Process Standard Deviation, s
  • An intuitive estimator for s is the square root
    of
  • A more commonly used estimator is the average of
    lot standard deviations

87
The Famous Result
88
  • One can thus show that
  • So, an unbiased estimator of the process standard
    deviation, s, is

89
  • One can also show that
  • This is because
  • So, another unbiased estimator of the process
    standard deviation, s, is

90
Which One is More Efficient?
91
(No Transcript)
92
Using the (Adjusted) Average of Lot Ranges to
Estimate the Process Standard Deviation
93
Using the (Adjusted) Median of Lot Standard
Deviations to Estimate the Process Standard
Deviation
94
3.4 Robust Estimators for Uncontaminated Process
Parameters
  • Suppose that the proportion of good lots is p and
    the proportion of bad lots is 1 p.
  • Suppose that data in a good lot come from a
    normal distribution with mean µ0 and standard
    deviation s0.
  • Suppose that data in a bad lot come from a normal
    distribution with mean µ1 and standard deviation
    s1.

95
  • How can we construct a control chart for lot
    means, based on data from the above contaminated
    distribution?
  • A solution The control limits are

96
A Simulation Study
  • Let's generate 90 lots of size 5 each. A lot is good with probability p = 0.70.
  • Data in a good lot are from N(10, 0.01).
  • Data in a bad lot are from N(9.8, 0.09).
  • Data were simulated in Excel.

97
A SAS Program Generate Good and Bad Lots
data table3_3;
  retain mu0 10;
  retain mu1 9.8;
  retain sigma0 0.1;
  retain sigma1 0.3;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.7) then
      do j = 1 to 5;
        x = mu0 + sigma0*rannor(1); /* good lot data */
        output;
      end;
    else
      do j = 1 to 5;
        x = mu1 + sigma1*rannor(1); /* bad lot data */
        output;
      end;
  end;
  keep Lot x;
run;
proc print; run;
symbol v=dot c=red;
proc shewhart;
  xschart x*Lot;
run; quit;
98
R Program Generate Data from Good and Bad Lots
# Note: rnorm() takes standard deviations, so we use 0.1 and 0.3
# (the square roots of the variances 0.01 and 0.09 quoted above).
mu0 <- 10.0; mu1 <- 9.8
sigma0 <- 0.1; sigma1 <- 0.3
p <- 0.7; n <- 90; k <- 5
x <- matrix(0, n, k)
for (i in 1:n) {
  u <- runif(1)
  if (u < p) {
    x[i, ] <- rnorm(k, mu0, sigma0)   # from a good lot
  } else {
    x[i, ] <- rnorm(k, mu1, sigma1)   # from a bad lot
  }
}
x
99
3.5 A Process with Mean Drift3.6 A Process with
Upward Drift in Variance
  • Skip

100
3.7 Charts for Individual Measurements
  • Grouping as many measurements as possible into a
    lot is important in order to get more accurate
    estimates of the population mean and standard
    deviation.
  • This is not possible in some situations.
  • The reason typically is that either the
    production rate is too slow or the production is
    performed under precisely the same conditions
    over short time intervals.

101
Construct Control Charts for Individual
Measurements
  • Let's consider 90 observations coming from
  • N(10, 0.01) with probability 0.855,
  • N(10.4, 0.03) with probability 0.095,
  • N(9.8, 0.09) with probability 0.045, and
  • N(10.2, 0.11) with probability 0.005.
  • We construct a scatterplot of the 90 measurements against corresponding lot numbers. The control limits are 10 ± 3(0.1), i.e., 9.7 and 10.3.

102
Moving Ranges Control Charts for Individual
Measurements
  • A better control chart for individual measurements is based on artificial lots of size 2 or 3 and the so-called moving ranges.
  • Given N individual measurements X1, X2, ..., XN, the ith moving range of n observations is defined as the difference between the largest and the smallest value in the ith artificial lot formed from the n measurements Xi, Xi+1, ..., Xi+n−1, where i = 1, 2, ..., N − n + 1.

103
  • The average of the N − n + 1 moving ranges is MR-bar = (MR1 + MR2 + … + MRN−n+1)/(N − n + 1).
  • The moving range control chart has LCL = D3(n)·MR-bar and UCL = D4(n)·MR-bar.
  • D3(n) and D4(n) are given in Table 3.9 of the text, p. 115.

104
  • Given N individual measurements X1, X2, ..., XN, the N − 1 moving ranges of n = 2 observations are defined as
  • MR1 = |X2 − X1|, MR2 = |X3 − X2|, ..., MRN−1 = |XN − XN−1|.
  • The average of the N − 1 moving ranges is MR-bar = (MR1 + … + MRN−1)/(N − 1).
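A small Python sketch of the moving-range computation (the five-point series below is made up purely for illustration):

```python
def moving_ranges(xs, n=2):
    """MR_i = max - min of the artificial lot (x_i, ..., x_{i+n-1})."""
    return [max(xs[i:i + n]) - min(xs[i:i + n]) for i in range(len(xs) - n + 1)]

# Illustrative individual measurements.
x = [10.02, 9.98, 10.05, 10.01, 9.97]

mrs = moving_ranges(x)            # N - 1 = 4 moving ranges for n = 2
mr_bar = sum(mrs) / len(mrs)      # average moving range
```

With N = 5 and n = 2 this yields four moving ranges, and `mr_bar` is the MR-bar used in the control limits above.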

105
X Charts based on Moving Ranges
  • For given individual measurements X1, X2, ..., XN, the X chart is constructed by estimating the population standard deviation σ as the product of b2 and the average of the moving ranges.
  • The control limits are X-bar ± 3·b2·MR-bar.

106
SAS Code for X Charts based on Moving Ranges
data table3_6;
  mu0 = 10; mu1 = 10.4; mu2 = 9.8; mu3 = 10.2;
  sigma0 = 0.1; sigma1 = sqrt(0.03);
  sigma2 = 0.3; sigma3 = sqrt(0.11);
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      x = mu0 + sigma0*rannor(1); output;
    end;
    else if (u < 0.95) then do;
      x = mu1 + sigma1*rannor(1); output;
    end;
    else if (u < 0.995) then do;
      x = mu2 + sigma2*rannor(1); output;
    end;
    else do;
      x = mu3 + sigma3*rannor(1); output;
    end;
  end;
  keep Lot x;
run;
proc print; run;
symbol v=dot c=red;
proc shewhart;
  xchart x*Lot;
run; quit;
107
3.8 Process Capability
  • We have discussed some control charts for detecting Pareto glitches in a process. Whether an in-control process meets some technological specification is another important issue.
  • It is summary statistics for lots that are examined for the purpose of controlling a process, while individual measurements are compared to specifications.
  • The capability of an in-control process in relation to technological specifications is measured by certain indices.
108
The Cp Index
109
The Cp Index for Nonnormal Data
110
The Cpk Index
  • The index Cp = (USL − LSL)/(6σ̂) does not account for process centering.
  • To account for process centering, use Cpk = min((USL − µ̂)/(3σ̂), (µ̂ − LSL)/(3σ̂)).
111
  • Example: Use Table 3.10 on textbook page 121 to calculate Cp and Cpk. Suppose that the diameter was specified to be 6.75 mm with tolerances ±0.1 mm; that is, LSL = 6.65 mm and USL = 6.85 mm.
  • Solution: From the table,
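The two indices can be computed in a few lines of Python. The LSL and USL below come from the problem statement; the mean and standard deviation estimates are placeholders only, since the actual values come from Table 3.10 of the text.

```python
# Specification limits from the problem statement.
LSL, USL = 6.65, 6.85

# Hypothetical process estimates (stand-ins for the Table 3.10 values).
mu_hat, sigma_hat = 6.74, 0.03

# Cp compares the spec width to the process spread; Cpk also penalizes
# off-center processes by taking the nearer specification limit.
Cp = (USL - LSL) / (6 * sigma_hat)
Cpk = min((USL - mu_hat) / (3 * sigma_hat),
          (mu_hat - LSL) / (3 * sigma_hat))
```

Note that Cpk ≤ Cp always, with equality only when the process is centered at the midpoint of the specification interval.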

112
  • http://www.itl.nist.gov/div898/handbook/pmc/section1/pmc16.htm

113
SAS proc capability
data amps;
  label decibels = 'Amplification in Decibels (dB)';
  input decibels @@;
  datalines;
4.54 4.87 4.66 4.90 4.68 5.22 4.43 5.14 3.07
4.22 5.09 3.41 5.75 5.16 3.96 5.37 5.70 4.11 4.83
4.51 4.57 4.16 5.73 3.64 5.48 4.95 4.57 4.46 4.75
5.38 5.19 4.35 4.98 4.87 3.53 4.46 4.57 4.69 5.27
4.67 5.03 4.50 5.35 4.55 4.05 6.63 5.32 5.24 5.73
5.08 5.07 5.42 5.05 5.70 4.79 4.34 5.06 4.64 4.82
3.24 4.79 4.46 3.84 5.05 5.46 4.64 6.13 4.31 4.81
4.98 4.95 5.57 4.11 4.15 5.95
;
run;

title 'Boosting Power of Telephone Amplifiers';
legend2 frame cframe=ligr cborder=black position=center;
proc capability data=amps noprint alpha=0.10;
  var decibels;
  spec target = 5  lsl = 4  usl = 6
       ltarget = 2 llsl = 3 lusl = 4
       ctarget = red clsl = yellow cusl = yellow;
  histogram decibels / cframe   = ligr
                       cfill    = steel
                       cbarline = white
                       legend   = legend2;
  inset cpklcl cpk cpkucl / header = '90% Confidence Interval'
                            cframe = black
                            ctext  = black
                            cfill  = ywh
                            format = 6.3;
run;
114
(No Transcript)
115
The following statements can be used to produce a table of process capability indices, including the index Cpk:

ods select indices;
proc capability data=amps alpha=0.10;
  spec target = 5 lsl = 4 usl = 6
       ltarget = 2 llsl = 3 lusl = 4;
  var decibels;
run;
116
STAT 424/524Statistical Design for Process
Improvement
  • Lecture 4
  • Sequential Approaches

117
Homework 5
  • Page 166: Problems 4.1 (do the first part only) and 4.10

118
4.1 Introduction
  • The first three chapters deal with Shewhart control charts, which are useful in detecting special cause variation.
  • A major disadvantage of a Shewhart control chart is that it uses only the information about the process contained in the last sample observation, ignoring any information given by the entire sequence of points. This feature makes Shewhart control charts insensitive to small process shifts.
  • This chapter deals with two alternatives to the Shewhart control charts: cumulative sum (CUSUM) control charts and exponentially weighted moving average (EWMA) control charts, both of which are sensitive to small process drifts.

119
4.2 The Sequential Likelihood Ratio Test
  • Suppose we have a time-ordered data set x1, x2, ..., xn coming from a distribution with density f(x; θ). We may wish to test whether the true parameter is θ0 or θ1.
  • A natural criterion for deciding between the two parameters is the log-likelihood ratio z = ln[ f(x1; θ1)···f(xn; θ1) / (f(x1; θ0)···f(xn; θ0)) ].

120
Decision Rule
  • We propose the following decision rule:
  • When z ≤ ln(k0), with (x1, x2, ..., xn) said to be in region Gn0, we decide for θ0;
  • When z ≥ ln(k1), with (x1, x2, ..., xn) said to be in region Gn1, we decide for θ1;
  • Otherwise, we are in region Gn, and we continue sampling.

121
  • Before we make our decision, our sample (x1, x2, ..., xn) falls in one of three regions: Gn0, Gn1, and Gn.
  • Denote the true parameter by θ. The probability of ever declaring for θ0 is given by L(θ) = P(G10 | θ) + P(G20 | θ) + ···.
  • By the definition of Gn0, on that region the likelihood under θ1 is at most k0 times the likelihood under θ0.
122
Let us suppose that if θ is truly equal to θ0, we wish to have L(θ0) = 1 − α. Let us suppose that if θ is truly equal to θ1, we wish to have L(θ1) = β. Here α and β are customarily referred to as the Type I and Type II errors. Then we must have β = L(θ1) ≤ k0·L(θ0) = k0(1 − α), so k0 ≥ β/(1 − α). By a similar argument for Gn1, we have k1 ≤ (1 − β)/α. In practice, choose k0 = β/(1 − α) and k1 = (1 − β)/α.

123
We can show that the actual Type I and Type II errors, say α′ and β′, satisfy α′ ≤ α/(1 − β) and β′ ≤ β/(1 − α).

124
4.3 CUSUM Test for Shift of the Mean
  • To detect a shift of the mean of a production process from µ0 to some other value µ1, we consider a sequential test.
  • We assume that the process variance σ² is known and does not change.
  • We propose a test on the basis of the log-likelihood ratio of N sample means, each sample being of size n. The test statistic is R1 = (n(µ1 − µ0)/σ²) Σj=1..N (x̄j − (µ0 + µ1)/2).
125
The test statistic R1 is based on a cumulative
sum. It is not so much oriented to detecting
Pareto glitches, but rather to discovering a
persistent change in the mean.
126
  • For given Type I and Type II errors, say α = 0.01 and β = 0.01, the sequential test procedure gives the limits ln(k0) = ln(β/(1 − α)) ≈ −4.595 and ln(k1) = ln((1 − β)/α) ≈ 4.595.
  • The CUSUM chart is a plot of R1, based on data up to the jth sample, versus j, j = 1, 2, ..., N, together with the control limits ln(k0) and ln(k1).

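The R1 formula appears only as an image in the original slides. The Python sketch below gives the standard cumulative log-likelihood-ratio statistic it is built on for normal lot means with known σ; the constants (n = 5 per lot, σ = 0.1, μ0 = 10.0, shifted mean μ1 = 10.1) mirror the SAS example that follows, and the five lot means are invented for illustration.

```python
def cusum_r1(lot_means, mu0=10.0, mu1=10.1, sigma=0.1, n=5):
    """R1 after each lot: n*(mu1 - mu0)/sigma^2 times the cumulative
    deviation of the lot means from the midpoint (mu0 + mu1)/2."""
    coef = n * (mu1 - mu0) / sigma ** 2     # = 50 with these constants
    r1, total = [], 0.0
    for xbar in lot_means:
        total += xbar - (mu0 + mu1) / 2     # deviation from the midpoint
        r1.append(coef * total)
    return r1

r1 = cusum_r1([10.02, 10.05, 10.01, 10.12, 10.15])
flagged = [j for j, v in enumerate(r1, start=1) if v > 4.595]
print(flagged)
```

A lot is flagged once the cumulative statistic crosses the upper limit ln(k1) = 4.595; here the two high lot means at the end push R1 over the line at lot 5.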
127
data table3_6;
  retain mu0 10 mu1 10.4 mu2 9.8 mu3 10.2;
  retain sigma0 0.1 sigma1 0.1732 sigma2 0.3 sigma3 0.3316625;
  array x(5) x1 x2 x3 x4 x5;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      do i = 1 to 5;
        x(i) = mu0 + sigma0*rannor(1);
      end;
      output;
    end;
    else if (u < 0.95) then do;
      do i = 1 to 5;
        x(i) = mu1 + sigma1*rannor(1);
      end;
      output;
    end;
    else if (u < 0.995) then do;
      do i = 1 to 5;
        x(i) = mu2 + sigma2*rannor(1);
      end;
      output;
    end;
    else do;
      do i = 1 to 5;
        x(i) = mu3 + sigma3*rannor(1);
      end;
      output;
    end;
  end;
  keep Lot x1-x5;
run;

proc print; run;

data means;
  set table3_6;
  sum + mean(of x1-x5);
  m = sum/Lot;
  R1 = Lot*(5/0.1)*(m - (10 + 10.1)/2);
  LCL = -4.595;
  UCL = 4.595;
  keep Lot R1 LCL UCL;
run;

symbol1 v=dot c=blue r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=blue r=1;
proc gplot data=means;
  plot (LCL R1 UCL)*Lot / overlay;
  label R1 = 'R1 statistic';
run;

data long;
  set table3_6;
  Lot = _N_;
  array x{5} x1-x5;
  do i = 1 to 5;
    y = x(i);
    output;
  end;
  keep Lot y;
run;

proc cusum data=long;
  xchart y*Lot / mu0=10.0 sigma0=0.1 delta=1
                 alpha=0.1 vaxis=-20 to 80;
run;
quit;
128
4.4 Shewhart CUSUM Charts
  • A popular empirical alternative to the CUSUM
    chart is the Shewhart CUSUM chart. This chart is
    based on the pooled cumulative/running means
  • Suppose that all the lot means are iid with
    common mean μ0 and variance σ0²/n. Then
  • A Shewhart CUSUM chart for mean shift is one that
    plots zi against i, along with the horizontal
    lines -3 and 3.

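The statistic zi can be sketched directly: standardize the running mean of the first i lot means. This Python version is an illustrative translation of the SAS code below (n = 5, μ0 = 10, σ0 = 0.1); the lot means are invented.

```python
import math

def shewhart_cusum(lot_means, mu0=10.0, sigma0=0.1, n=5):
    """z_i = sqrt(i*n) * (running mean of first i lot means - mu0)/sigma0."""
    z, total = [], 0.0
    for i, xbar in enumerate(lot_means, start=1):
        total += xbar
        z.append(math.sqrt(i * n) * (total / i - mu0) / sigma0)
    return z

z = shewhart_cusum([10.02, 9.98, 10.00, 10.30])
out_of_control = [i for i, v in enumerate(z, start=1) if abs(v) > 3]
print(out_of_control)
```

The one inflated lot mean at position 4 drags the running mean far enough from μ0 that z4 exceeds 3, so the chart signals there.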
129
Shewhart CUSUM Charts
data means2;
  set table3_6;
  sum + mean(of x1-x5);
  m = sum/Lot;
  R2 = sqrt(Lot*5)/0.1*(m - 10);
  LCL = -3;
  UCL = 3;
  keep Lot sum m R2 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=means2;
  plot (R2 LCL UCL)*Lot / overlay;
  label R2 = 'R2 statistic';
run;
quit;
130
4.8 Acceptance-Rejection CUSUMs
  • Let p denote the proportion of defective goods
    from a production system.
  • Let p0 denote the target proportion deemed
    appropriate.
  • When p rises to p1, intervention will be
    introduced.
  • Let nj denote the size of lot j. Then the
    likelihood ratio is given by

131
132
CUSUM Test for Defect Data
  • To detect a drift in the process proportion, plot
    R5 versus lot number N. Two horizontal lines, R5 =
    -4.596 and R5 = 4.596, are also plotted.
  • Any point in the plot that is above the line R5 =
    4.596 indicates a drift in the process proportion.

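The binomial CUSUM can be sketched as follows; this Python version is an illustrative translation of the SAS code below, with target proportion p0 = 0.03, intervention level p1 = 0.05, lots of size 100, and the defective counts of the first 20 lots of Table 4.8.

```python
import math

def cusum_r5(defectives, size=100, p0=0.03, p1=0.05):
    """Cumulative binomial log-likelihood ratio R5 after each lot."""
    r5, xsum, nsum = [], 0, 0
    for d in defectives:
        xsum += d
        nsum += size
        r5.append(math.log(p1 / p0) * xsum
                  + math.log((1 - p1) / (1 - p0)) * (nsum - xsum))
    return r5

r5 = cusum_r5([3, 2, 5, 0, 6, 4, 2, 4, 1, 2,
               7, 9, 11, 12, 14, 15, 12, 10, 8, 3])
flagged = [j for j, v in enumerate(r5, start=1) if v > 4.596]
print(flagged[0])   # first lot at which the chart signals
```

The early lots keep R5 negative; the run of high-defect lots starting around lot 11 pushes the statistic over the 4.596 limit at lot 14.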
133
data table4_8;
  Lot = _N_;
  input defective proportion @@;
  size = 100;
  cards;
3 0.03  2 0.02  5 0.05  0 0.00  6 0.06  4 0.04  2 0.02  4 0.04  1 0.01  2 0.02
7 0.07  9 0.09 11 0.11 12 0.12 14 0.14 15 0.15 12 0.12 10 0.10  8 0.08  3 0.03
5 0.05  6 0.06  0 0.00  1 0.01  3 0.03  3 0.03  4 0.04  6 0.06  5 0.05  5 0.05
3 0.03  3 0.03  7 0.07  8 0.08  2 0.02  0 0.00  6 0.06  7 0.07  4 0.04  4 0.04
;
run;
data new;
  set table4_8;
  p1 = 0.05;
  p0 = 0.03;
  xsum + defective;
  nsum + size;
  R5 = log(p1/p0)*xsum + log((1-p1)/(1-p0))*(nsum - xsum);
  LCL = -4.596;
  UCL = 4.596;
  keep Lot R5 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=new;
  plot (R5 LCL UCL)*Lot / overlay;
run;
quit;
134
Shewhart CUSUM Test for Defect Data
Plot R6 against Lot, along with the two lines R6 =
-3 and R6 = 3.
135
data table4_8;
  Lot = _N_;
  input defective proportion @@;
  size = 100;
  cards;
3 0.03  2 0.02  5 0.05  0 0.00  6 0.06  4 0.04  2 0.02  4 0.04  1 0.01  2 0.02
7 0.07  9 0.09 11 0.11 12 0.12 14 0.14 15 0.15 12 0.12 10 0.10  8 0.08  3 0.03
5 0.05  6 0.06  0 0.00  1 0.01  3 0.03  3 0.03  4 0.04  6 0.06  5 0.05  5 0.05
3 0.03  3 0.03  7 0.07  8 0.08  2 0.02  0 0.00  6 0.06  7 0.07  4 0.04  4 0.04
;
run;
data new;
  set table4_8;
  p1 = 0.05;
  p0 = 0.03;
  xsum + defective;
  nsum + size;
  R6 = (xsum - p0*nsum)/sqrt(nsum*p0*(1-p0));
  LCL = -3;
  UCL = 3;
  keep Lot R6 LCL UCL;
run;
proc print; run;
symbol1 v=dot c=black r=1;
symbol2 v=dot c=red r=1;
symbol3 v=dot c=red r=1;
proc gplot data=new;
  plot (R6 LCL UCL)*Lot / overlay;
run;
quit;
136
STAT 424/524 Statistical Design for Process
Improvement
  • Lecture 5
  • Exploratory Techniques for Preliminary Analysis

137
Homework 6
  • Page 220 problems 2, 4, 5, 6

138
5.2 The Schematic Plot: The Boxplot
  • The schematic (box-and-whisker) plot displays, from
    top to bottom: the maximum observation, the upper
    whisker, the 75th percentile, the mean (specified
    with the SYMBOL1 statement), the median, the 25th
    percentile, the lower whisker, and the minimum
    observation.
  • The interquartile range (IQR) is the distance from
    the 25th percentile to the 75th percentile.
  • The upper fence (not drawn) lies 1.5 × IQR above
    the 75th percentile; the lower fence (not drawn)
    lies 1.5 × IQR below the 25th percentile.
  • BOXSTYLE = schematic (or schematicid, or
    schematicidfar, if an id statement is used)
  • Observations that fall outside the fences point to
    Pareto glitches.
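The schematic-boxplot quantities can be sketched in a few lines. The quantile rule below (no interpolation) is a simplification for illustration, and the data are invented; SAS uses its own percentile definitions.

```python
def box_summary(data):
    """Quartiles, fences at 1.5*IQR beyond the quartiles, and outliers."""
    xs = sorted(data)
    n = len(xs)

    def quantile(q):
        # crude index-based percentile, good enough for a sketch
        return xs[min(n - 1, int(q * n))]

    q1, med, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {"q1": q1, "median": med, "q3": q3,
            "fences": (lower, upper),
            "outliers": [x for x in xs if x < lower or x > upper]}

summary = box_summary([9.9, 10.0, 10.1, 10.0, 9.8, 10.2, 10.1, 12.0])
print(summary["outliers"])
```

The single observation at 12.0 falls above the upper fence, so the schematic boxplot would flag it as a potential Pareto glitch.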
139
data myData;
  retain mu0 10 mu1 10.4 mu2 9.8 mu3 10.2;
  retain sigma0 0.1 sigma1 0.1732 sigma2 0.3 sigma3 0.3316625;
  array x(5) x1 x2 x3 x4 x5;
  do Lot = 1 to 90;
    u = ranuni(12345);
    if (u < 0.855) then do;
      do i = 1 to 5;
        x(i) = mu0 + sigma0*rannor(1);
      end;
      output;
    end;
    else if (u < 0.95) then do;
      do i = 1 to 5;
        x(i) = mu1 + sigma1*rannor(1);
      end;
      output;
    end;
    else if (u < 0.995) then do;
      do i = 1 to 5;
        x(i) = mu2 + sigma2*rannor(1);
      end;
      output;
    end;
    else do;
      do i = 1 to 5;
        x(i) = mu3 + sigma3*rannor(1);
      end;
      output;
    end;
  end;
  keep Lot x1-x5;
run;

data mean;
  set myData;
  lotMean = mean(of x1-x5);
  x = "-";
  keep Lot lotMean x;
run;

symbol v=plus c=blue;
title 'Box Plot of Lot Means';
proc boxplot;                        /* create side-by-side boxplot */
  plot lotMean*x / boxstyle=schematicidfar
                   idsymbol=circle;  /* identify obs. outside the fences */
  id Lot;
  label x = '';
run;
140
5.3 Smoothing by Threes
  • Signal is usually contaminated with noise. John
    Tukey developed the so-called 3R smoother, which
    repeatedly applies running medians of three until
    nothing changes; it removes the jitter and enables
    one to better approximate the signal.
  • http://www.galaxy.gmu.edu/ACAS/ACAS00-02/ACAS00/ThompsonJames/ThompsonJames.pdf

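The smoother described above fits in a dozen lines. This Python sketch is illustrative (the slide that follows refers to a SAS or R program); the input series is invented.

```python
def smooth_3r(x):
    """Tukey's 3R smooth: running medians of 3, Repeated until no change."""
    x = list(x)
    while True:
        y = x[:]
        for i in range(1, len(x) - 1):
            # replace each interior point by the median of its neighborhood
            y[i] = sorted((x[i - 1], x[i], x[i + 1]))[1]
        if y == x:           # a full pass changed nothing: converged
            return y
        x = y

print(smooth_3r([1, 9, 2, 3, 8, 3, 4]))
```

The isolated spikes at 9 and 8 are removed while the underlying monotone trend is preserved, which is exactly the behavior wanted when hunting for signal under jitter.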
141
3R SAS or R Program
142
5.4 Bootstrapping
  • Most of the standard testing in SPC is based on
    the assumption that lot means are normally
    distributed.
  • This assumption is questionable because
    measurements may not be normal and lot sizes are
    usually small, say less than 10.
  • To avoid the normality assumption, one uses
    resampling.
  • Bootstrapping is one of the resampling methods.

143
Bootstrapping Means
  • Suppose we have a data set of size n. We wish to
    construct a bootstrap confidence interval for the
    mean of the distribution from which the data were
    taken.
  • There are at least four methods for bootstrapping
    the mean.

144
The Percentile Method
  • The procedure is as follows
  • Select with replacement n of the original
    observations. Such a sample is called a bootstrap
    sample. Compute the mean of this bootstrap
    sample.
  • Repeat the resampling procedure B = 10,000 times.
  • The B means are denoted as x̄1*, x̄2*, …, x̄B*.
  • Order these means from smallest to largest.
  • Denote the 250th largest value and the 9750th
    largest value as a and b, respectively; then the
    95% percentile bootstrap confidence interval of
    the mean is [a, b].

145
R program for the Percentile Method
boot <- function(x, B) {
  n <- length(x)
  A <- matrix(0, B, n)
  for (i in 1:B) {
    A[i, ] <- sample(x, n, replace = TRUE)
  }
  A
}
x <- c(2, 5, 1, 8, 3, 2)
D <- boot(x, 10000)
y <- apply(D, 1, mean)  # y holds 10000 bootstrap means
y <- sort(y)            # order the means before taking y[250], y[9750]
confidenceInterval <- c(y[250], y[9750])
146
Lunneborg's Method
  • Denote the mean of the original sample by
  • Clifford Lunneborg proposed to use
    as the 95% confidence interval of the mean.

147
The Bootstrapped t Method
  • Denote the B bootstrap standard deviations as
  • Calculate the B t values
  • Order these t values from smallest to largest.
    Denote the 250th as a and the 9750th as b. Then
    the 95% bootstrapped t confidence interval is

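One common form of the bootstrapped-t interval sketched above can be written in Python for illustration (the slides use R); B is kept small so the sketch runs quickly, and the data are the same toy sample as in the percentile example.

```python
import random
import statistics

def bootstrap_t_interval(x, B=2000, seed=12345):
    """95% bootstrapped-t interval for the mean of x."""
    rng = random.Random(seed)
    n = len(x)
    xbar = statistics.mean(x)
    se = statistics.stdev(x) / n ** 0.5
    ts = []
    for _ in range(B):
        boot = [rng.choice(x) for _ in range(n)]
        sb = statistics.stdev(boot)
        if sb == 0:
            continue                    # degenerate resample, skip it
        ts.append((statistics.mean(boot) - xbar) / (sb / n ** 0.5))
    ts.sort()
    a, b = ts[int(0.025 * len(ts))], ts[int(0.975 * len(ts))]
    return xbar - b * se, xbar - a * se  # note the reversed quantiles

lo, hi = bootstrap_t_interval([2, 5, 1, 8, 3, 2])
print(lo < 3.5 < hi)                     # 3.5 is the sample mean
```

The interval endpoints use the upper t quantile on the left and the lower one on the right, which is the usual pivot inversion for the bootstrapped t.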
148
The BCa Method
  • A better confidence interval for a parameter is
    constructed using the BCa (bias correction and
    acceleration) method.
  • One may be concerned with two problems. One is
    that the sample estimate may be a biased estimate
    of the population parameter.
  • Another problem is that the standard deviation of
    the sample estimate usually depends on the
    unknown parameter we are trying to estimate.
  • To deal with the two problems, Bradley Efron
    proposed the BCa method.
  • For details, refer to this paper.

149
5.5 Pareto and Ishikawa Diagrams
  • The Pareto diagram tells top management where it
    is most appropriate to spend resources in finding
    problems.
  • The Ishikawa diagram, also known as fishbone
    diagram or cause and effect diagram, is favored
    by some as a tool for finding the ultimate cause
    of a system failure. See an example of such a
    diagram on page 197.

150
Create Pareto Charts Using SAS
data failure3;
  input cause $ 1-16 count;
  cards;
Contamination   14
Corrosion        2
Doping           1
Metallization    2
Miscellaneous    3
Oxide Defect     8
Silicon Defec    1
;
run;
title 'Analysis of IC Failures';
symbol color=salmon;
proc pareto data=failure3;
  vbar cause / freq=count
               scale=count
               interbar=1.0
               last='Miscellaneous'
               nlegend='Total Circuits'
               cframenleg=ywh
               cframe=green
               cbars=vigb;
run;
151
5.6 A Bayesian Pareto Analysis for System
Optimization of the Space Station
152
STAT 424 Statistical Design for Process
Improvement
  • Lecture 6
  • Introductory Statistical Inference and Regression
    Analysis

153
1.1 Elementary Statistical Inference
  • Population
  • Sample
  • Statistical inference
  • the endeavor that uses sample data to make
    decisions about a population.
  • Statistic
  • Estimators and estimates
  • Random variable

154
Unbiasedness and Efficiency
  • Unbiasedness

155
Suppose θ is an unknown parameter which is to be
estimated from measurements x, distributed
according to some probability density function
f(x; θ). It can be shown that the variance of any
unbiased estimator θ̂ of θ is bounded below by the
inverse of the Fisher information I(θ):
    Var(θ̂) ≥ 1/I(θ)  (the Cramér–Rao lower bound),
where the Fisher information is defined by
    I(θ) = E[(∂ ln L(θ)/∂θ)²],
ln L(θ) is the natural logarithm of the likelihood
function, and E denotes the expected value. The
efficiency of an unbiased estimator θ̂ is defined
to be the ratio
    e(θ̂) = [1/I(θ)] / Var(θ̂).
The sample mean and sample median of a normal
sample are both unbiased estimators of the
population mean. The sample mean is more
efficient.
156
Point and Interval Estimation
  • When we estimate a parameter θ by θ̂, we say
  • θ̂ is a point estimator of θ.
  • Alternatively, we use an interval to locate the
    unknown parameter θ. Such an interval contains
    the unknown parameter with some probability 1 -
    α. The interval is called a 1 - α confidence
    interval.
  • A 95% confidence interval means that, when the
    random sampling procedure is repeated 1000 times,
    among the 1000 confidence intervals, about 950
    will cover the unknown parameter θ.

157
Confidence Intervals for the Mean of a Normal
Population
  • We consider a population that is normally
    distributed as N(μ, σ²).
  • If the variance σ² is known, then the exact 1 - α
    confidence interval for μ is x̄ ± z(α/2) σ/√n.
  • But σ is usually unknown. We estimate it by the
    sample standard deviation s. The exact 1 - α
    confidence interval for μ is then
    x̄ ± t(α/2, n-1) s/√n.

158
Normal-theory Based Confidence Interval for a
Parameter ?
  • The point estimator θ̂ is usually approximately
    normally distributed when the sample size n is
    large (> 30), even for a non-normal population.
  • A 1 - α confidence interval is then constructed
    as θ̂ ± z(α/2) se(θ̂).

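The large-sample interval above can be sketched with only the standard library; as an illustration it is applied here to the first 30 of the female-height measurements from the example that follows.

```python
import statistics
from statistics import NormalDist

def normal_ci(x, alpha=0.05):
    """Large-sample CI: estimate +/- z_(alpha/2) * standard error."""
    n = len(x)
    se = statistics.stdev(x) / n ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = statistics.mean(x)
    return xbar - z * se, xbar + z * se

heights = [64.1, 60.9, 64.1, 64.7, 66.7, 65.0, 63.7, 67.4, 64.9, 63.7,
           64.0, 67.5, 62.8, 63.9, 65.9, 62.3, 64.1, 60.6, 68.6, 68.6,
           63.7, 63.0, 64.7, 68.2, 66.7, 62.8, 64.0, 64.1, 62.1, 62.9]
lo, hi = normal_ci(heights)
print(round(lo, 2), round(hi, 2))
```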
159
Examples
data Heights;
  label Height = 'Height (in)';
  input Height @@;
  datalines;
64.1 60.9 64.1 64.7 66.7 65.0 63.7 67.4 64.9
63.7 64.0 67.5 62.8 63.9 65.9 62.3 64.1 60.6
68.6 68.6 63.7 63.0 64.7 68.2 66.7 62.8 64.0
64.1 62.1 62.9 62.7 60.9 61.6 64.6 65.7 66.6
66.7 66.0 68.5 64.4 60.5 63.0 60.0 61.6 64.3
60.2 63.5 64.7 66.0 65.1 63.6 62.0 63.6 65.8
66.0 65.4 63.5 66.3 66.2 67.5 65.8 63.1 65.8
64.4 64.0 64.9 65.7 61.0 64.1 65.5 68.6 66.6
65.7 65.1 70.0
;
run;
title 'Analysis of Female Heights';
proc univariate data=Heights mu0=65 alpha=0.05 normal;
  var Height;
  histogram Height;
  qqplot Height;
  probplot Height;
run;
160
Confidence Interval for Difference between Two
Means of Normal Populations with Unequal Known
Variance
161
Confidence Interval for Difference between Two
Means of Normal Populations with Equal Unknown
Variance
162
Confidence Interval for Difference between Two
Means (Equal Unknown Variances), When Sample
Sizes Are Large
163
Examples
164
Confidence Interval for a Proportion
165
SAS Procedure for a Proportion PROC FREQ
data Color;
  input Region Eyes $ Hair $ Count @@;
  label Eyes   = 'Eye Color'
        Hair   = 'Hair Color'
        Region = 'Geographic Region';
  datalines;
1 blue  fair   23  1 blue  red     7  1 blue  medium 24
1 blue  dark   11  1 green fair   19  1 green red     7
1 green medium 18  1 green dark   14  1 brown fair   34
1 brown red     5  1 brown medium 41  1 brown dark   40
1 brown black   3  2 blue  fair   46  2 blue  red    21
2 blue  medium 44  2 blue  dark   40  2 blue  black   6
2 green fair   50  2 green red    31  2 green medium 37
2 green dark   23  2 brown fair   56  2 brown red    42
2 brown medium 53  2 brown dark   54  2 brown black  13
;
proc freq data=Color order=freq;
  weight Count;
  tables Eyes / binomial alpha=.1;
  tables Hair / binomial(p=.28);
  title 'Hair and Eye Color of European Children';
run;
166
Confidence Interval for the Difference between
Two Proportions (Independent Samples)
167
Confidence Interval for the Difference between
Two Proportions (Paired Samples)
168
Examples
169
Tests of Hypotheses
  • The null hypothesis
  • The alternative hypothesis
  • Type I and type II errors
  • Level of significance

170
One Sample t-Test
title 'One-Sample t Test';
data time;
  input time @@;
  datalines;
43 90 84 87 116 95 86 99 93 92 121 71 66 98 79 102 60 112 105 98
;
run;
proc ttest h0=80 alpha=0.05;
  var time;
run;
171
Two-Sample t-Test Comparing Group Means
  • Equal variance case
  • Unequal variance case

172
Two-Sample t-Test Comparing Group Means
title 'Comparing Group Means';
data OnyiahExample1_14;
  input machine speed @@;
  datalines;
1 1603 1 1604 1 1605 1 1605 1 1602 1 1601 1 1596 1 1598 1 1599 1 1602
1 1614 1 1612 1 1607 1 1593 1 1604 2 1602 2 1597 2 1596 2 1601 2 1599
2 1603 2 1604 2 1602 2 1601 2 1607 2 1600 2 1596 2 1595 2 1606 2 1597
;
run;
proc ttest;  /* produces results for both equal and unequal variances */
  class machine;
  var speed;
run;
Question: How can you find the p-value for a
one-sided test? Use symmetry.
173
Paired Comparison Paired t-Test
Pairs (i)   Before Treatment   After Treatment   Differences (di)
    1             Y11                Y12            Y11 - Y12
    2             Y21                Y22            Y21 - Y22
    3             Y31                Y32            Y31 - Y32
    .              .                  .                 .
    n             Yn1                Yn2            Yn1 - Yn2
174
Two-Sample Paired t-Test Comparing Group Means
title 'Paired Comparison';
data pressure;
  input SBPbefore SBPafter @@;
  d = SBPbefore - SBPafter;
  datalines;
120 128  124 131  130 131  118 127  140 132  128 125
140 141  135 137  126 118  130 132  126 129  127 135
;
run;
proc univariate;
  var d;
run;
proc ttest;
  paired SBPbefore*SBPafter;
run;
175
Operating Characteristic (OC) Curves
176
Find the Power of the Test for a Population Mean
  • Assume that we have the following
  • H0: μ = 1500 (1500 is called the claimed value)
  • H1: μ < 1500
  • Sample size n = 20
  • Significance level α = 0.05
  • The population standard deviation is known: σ =
    110.
  • Question
  • (1) Find the power of the test, which is the
    probability of rejecting the null hypothesis,
    given that the population mean is actually 1450
    (called the alternative value).
  • (2) Find the power corresponding to any
    alternative μ. Plot the power against μ.

177
  • Since the test is left-tailed, the rejection
    region is the left tail on the number line. The
    borderline value is -zα = -1.645. That is, the
    rejection region can be written as
    (x̄ - μ0)/(σ/√n) < -1.645.
  • Substitute μ0 = 1500, σ = 110, and n = 20 to
    solve the above inequality: x̄ < 1459.54.
  • When μ is actually 1450, x̄ follows a normal
    distribution with mean 1450 and standard
    deviation σ/√n = 24.6.
  • The power is P(x̄ < 1459.54 | μ = 1450) ≈ 0.65.
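The arithmetic above can be checked numerically. This Python sketch (illustrative, standard library only) reproduces the left-tailed power at the alternative μ = 1450.

```python
from statistics import NormalDist

# Left-tailed z test of H0: mu = 1500 vs H1: mu < 1500,
# sigma = 110, n = 20, alpha = 0.05, evaluated at mu = 1450.
nd = NormalDist()
mu0, mu1, sigma, n, alpha = 1500, 1450, 110, 20, 0.05
se = sigma / n ** 0.5
cutoff = mu0 - nd.inv_cdf(1 - alpha) * se    # reject H0 when xbar < cutoff
power = nd.cdf((cutoff - mu1) / se)
print(round(cutoff, 2), round(power, 3))
```

The cutoff comes out near 1459.54 and the power near 0.65, matching the slide's hand computation.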
178
R codes for Power Calculation and Plot
power.mean <- function(mu0 = 1500, mu1 = 1450, sigma = 110,
                       n = 20, level = 0.05,
                       tail = c("left", "two", "right")) {
  tail <- match.arg(tail)
  s <- sigma / sqrt(n)
  if (tail == "two") {
    E <- qnorm(1 - level/2) * s
    c1 <- mu0 - E
    c2 <- mu0 + E
    pL <- pnorm(c1, mu1, s)
    pR <- pnorm(c2, mu1, s)
    power <- 1 - pR + pL
  } else if (tail == "left") {
    E <- qnorm(1 - level) * s
    c1 <- mu0 - E
    power <- pnorm(c1, mu1, s)
  } else {
    E <- qnorm(1 - level) * s
    c2 <- mu0 + E
    power <- 1 - pnorm(c2, mu1, s)
  }
  return(power)
}

power.mean(mu0 = 1500, mu1 = 1450, sigma = 110, n = 20,
           level = 0.05, tail = "left")

mu0 <- 1500
mu <- seq(1350, 1550, by = 1)
n <- 20
level <- 0.05
tail <- "left"
power <- power.mean(mu0 = mu0, mu1 = mu, n = n, level = level, tail = tail)
plot(mu, power, type = "l", xlab = expression(mu), col = "blue", lwd = 3)
n1 <- 30
power <- power.mean(mu0 = mu0, mu1 = mu, n = n1, level = level, tail = tail)
lines(mu, power, type = "l", col = "red", lwd = 3)
abline(v = 1500)
legend(1352, 0.3,
       legend = c(paste("Claimed =", mu0), paste("Level =", level),
                  paste("Sample Size =", n), paste("Sample Size =", n1)),
       text.col = c(1, 1, 4, 2))
179
1.2 Regression Analysis
  • Suppose that the true relationship between a
    response variable y and a set of predictor
    variables x1, x2, …, xp is y = f(x1, x2, …, xp).
  • But, due to measurement error, y may be observed
    as
  • y = f(x1, x2, …, xp) + ε.     (*)
  • If ε is assumed to be distributed as N(0, σ²),
    then it is said that we have a normal regression
    model.
  • If f(x1, x2, …, xp) = β0 + β1x1 + β2x2 + ··· +
    βpxp, then the model is called a normal linear
    regression model.
180
The Ordinary Least Squares Method
  • Suppose that n observations (xi1, xi2, …, xip,
    yi), i = 1, 2, …, n, are available from an
    experiment or a purely observational study.
  • Then the model (*) can be written as
  • yi = f(xi1, xi2, …, xip) + εi, i = 1, 2, …, n.
  • Suppose that f has a known form. To estimate the
    function f, a traditional method is the least
    squares method.
  • The method starts from minimizing the error sum
    of squares Σi [yi - f(xi1, …, xip)]².

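For the one-predictor linear case, minimizing the error sum of squares has a closed form; the Python sketch below illustrates it with invented data.

```python
def ols_line(x, y):
    """Least squares slope and intercept for y = b0 + b1*x + e."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # minimizing the error sum of squares gives b1 = Sxy/Sxx
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

b0, b1 = ols_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(round(b0, 2), round(b1, 2))
```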
181
  • Suppo