Title: Statistical Analysis
Session VI: Statistical Analysis
Manufacturing Scenario
- Aluminum castings
- Important factor: hardness
- Measured in Brinell units (HB)
- Possibly affected by
  - Machine and/or Operator
  - Chemistry (Iron, Zinc, Manganese)
  - Physics (Pressure, Temperature)
- Minimum acceptable hardness is 70 HB
Design of Experiments
- Many possible types of designs (random, blocked, Latin square, etc.)
- Should be driven by a theory or hypothesis
- Make sure that, if the hypothesized effect is in fact present,
  - the design used has a good chance of detecting it (small chance of Type II error; more on that later), and
  - there will be no other reasonable explanation (a key driver of DOE)
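As a minimal sketch of one design named above, the Python below builds a randomized Latin square, in which each treatment appears exactly once in every row and every column, so two nuisance factors are balanced at the same time. The function name `latin_square` and the machine labels are hypothetical, not taken from the session. Note that a Latin square assumes the blocking factors do not interact with the treatment.

```python
import random

def latin_square(treatments, seed=None):
    """Randomized Latin square: each treatment appears once per row and once per column."""
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic construction: row i is the treatment list shifted left by i positions.
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    # Permuting whole rows, then whole columns, keeps the Latin property.
    rng.shuffle(square)
    col_order = list(range(n))
    rng.shuffle(col_order)
    return [[row[c] for c in col_order] for row in square]

# Hypothetical layout: rows = operators, columns = days, cells = machine to use.
for row in latin_square(["M1", "M2", "M3"], seed=1):
    print(row)
```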
Design of Experiments
- Include measurements of the factors you would like to test
- No need for independent variables that do not vary
- Matched pairs > independent samples
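To illustrate why matched pairs usually beat independent samples, the sketch below compares standard errors for the same readings analyzed both ways. The numbers are invented toy values (not the casting data) in which each pair shares a hypothetical batch, so pairing removes the batch-to-batch variation from the comparison.

```python
import math
import statistics

# Hypothetical hardness readings (HB): pair i is one batch processed two ways,
# so both readings in a pair share that batch's noise.
method_a = [70, 75, 80, 85, 90]
method_b = [72, 76, 83, 86, 93]

n = len(method_a)
diffs = [b - a for a, b in zip(method_a, method_b)]
mean_diff = statistics.mean(diffs)

# Matched pairs: standard error of the mean within-pair difference.
paired_se = statistics.stdev(diffs) / math.sqrt(n)

# Independent samples: unpooled standard error of the difference between two
# means, which ignores the pairing and keeps the batch-to-batch spread.
indep_se = math.sqrt(statistics.variance(method_a) / n + statistics.variance(method_b) / n)

print(f"mean difference = {mean_diff:.2f} HB")
print(f"paired SE = {paired_se:.3f}, t = {mean_diff / paired_se:.2f}")
print(f"independent SE = {indep_se:.3f}, t = {mean_diff / indep_se:.2f}")
```

The same 2 HB difference is about 4.5 standard errors when paired but well under one standard error when the pairing is ignored.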
Design of Experiments
- One possible approach for the aluminum problem
- Process is highly variable
- Some castings do not meet the 70 HB target
Inferential Statistics
- Estimation
  - Confidence Intervals
  - Sample Size Determination
  - Design of Experiments
- Hypothesis Testing
  - Classical Method
  - p-Values
  - Analysis of Variance
- Regression Analysis
  - Analysis of Variance in Regression
  - High-Level Measures
  - Hypothesis Tests for Independent Variables
Estimation
- Fundamental difference between probability and statistics
  - Probability makes an inference about an unknown sample from a known population (useful for developing theory)
  - Statistics makes an inference about an unknown population from a known sample (useful in the real world)
- Estimation is a statistical tool that uses sample data to make a probabilistic statement about some unknown population parameter:
  - Mean
  - Variance
  - Proportion
  - Differences between parameters
Estimation: Confidence Intervals
General form of a confidence interval:
Measure of central tendency ± (number of standard errors) × (measure of dispersion)
- Measure of central tendency: sample mean, sample proportion, etc.
- Number of standard errors: usually z or t
- Measure of dispersion: standard error of the mean, etc.
Example: Confidence Interval for Population Mean
We are 95% confident that the true population mean is between 69.75 and 84.27 HB.
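The general form above can be sketched in Python for a population mean. This assumes a t multiplier read from a t table (2.306 for 95% with 8 degrees of freedom) and takes the sample mean as the midpoint of the reported interval, since the session does not state it; the standard deviation of 9.441 HB is the value used in the sample-size calculation.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Two-sided CI for a population mean: mean ± t * (sd / sqrt(n))."""
    se = sd / math.sqrt(n)      # measure of dispersion: standard error of the mean
    margin = t_crit * se        # (number of standard errors) x (measure of dispersion)
    return mean - margin, mean + margin

n, s = 9, 9.441                 # 9 castings, sample standard deviation in HB
x_bar = (69.75 + 84.27) / 2     # assumed: midpoint of the reported interval (77.01 HB)
t_975_8 = 2.306                 # t-table value, 95% two-sided, df = n - 1 = 8
low, high = t_confidence_interval(x_bar, s, n, t_975_8)
print(f"95% CI: {low:.2f} to {high:.2f} HB")   # 95% CI: 69.75 to 84.27 HB
```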
Estimation: Sample Size
Our sample of 9 castings gives a 95% confidence interval about 15 HB wide, which may be too wide for managerial decision making. How much data would we need for a 95% confidence interval within ±1 HB?
- We would need 343 observations (assuming the standard deviation is no more than 9.441 HB).
- Slightly different formula for proportions
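The 343 figure follows from the usual formula n = (z × σ / E)², rounded up, where E is the desired margin of error (1 HB here). The sketch below uses only the Python standard library; the proportion version applies the standard n = z² × p(1 − p) / E², and the 3-percentage-point margin in its usage line is a hypothetical example. Function names are illustrative.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

def n_for_proportion(margin, p=0.5, confidence=0.95):
    """Standard formula for a proportion; p = 0.5 is the most conservative guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(n_for_mean(sigma=9.441, margin=1.0))   # 343
print(n_for_proportion(margin=0.03))         # 1068
```

Halving the margin of error roughly quadruples the required sample size, since n grows with 1 / E².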
Hypothesis Testing: Classical Method
Example: Step 1
Example: Step 2
Example: Step 3
- t distribution centered on 80 HB
- 5% probability in the lower tail
- Critical value: 1.86 standard errors below the hypothesized mean
Example: Step 4
- Observed value: 0.95 standard errors below the hypothesized mean
- Critical value: 1.86 standard errors below the hypothesized mean
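Steps 3 and 4 amount to comparing the observed t statistic with the critical value. The sketch below assumes H0: μ = 80 HB against H1: μ < 80 HB at α = 5%, with n = 9 and s = 9.441 HB, and a sample mean of 77.01 HB (the midpoint of the 95% confidence interval reported earlier, an assumption since the mean is not stated here). The critical value 1.860 is the t-table value for 5% in one tail with df = 8.

```python
import math

mu_0, n, s = 80.0, 9, 9.441
x_bar = 77.01       # assumed: midpoint of the earlier 95% confidence interval
t_crit = -1.860     # t-table value: 5% in the lower tail, df = n - 1 = 8

se = s / math.sqrt(n)           # standard error of the mean
t_obs = (x_bar - mu_0) / se     # how many standard errors below 80 HB
reject = t_obs < t_crit         # lower-tail decision rule

print(f"t = {t_obs:.2f}, critical value = {t_crit:.2f}")
print("reject H0" if reject else "do not reject H0")
```

Because -0.95 is not beyond -1.86, the observed mean falls in the "do not reject" region.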
Hypothesis Testing: p-Values
- Note that the classical method only yields a "reject" or "do not reject" decision
  - Not helpful in situations where different people have different tolerances for Type I error risk
- We would like to know:
  - How far from the hypothesized value was the sample result, in standardized terms? (provided by the test statistic)
  - How unlikely would this result be if the null hypothesis were true? (provided by the p-value)
If the null hypothesis is true, we would see a sample mean this low or lower 18.5% of the time.
Hypothesis Testing: p-Values
- English translation: If the true mean were really 80 HB, we would see a sample mean this far below 80 HB, or farther, 18.5% of the time.
- Since our alpha is 5%, we don't consider this to be strong evidence against the null hypothesis.
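The 18.5% can be checked with a small, dependency-free Student's t CDF. The helpers `t_pdf` and `t_cdf` are hypothetical names written for this sketch (numerical integration by Simpson's rule rather than a statistics-library call), and the sample mean of 77.01 HB is assumed to be the midpoint of the 95% confidence interval reported earlier.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

t_obs = (77.01 - 80) / (9.441 / math.sqrt(9))   # about -0.95 standard errors
p_value = t_cdf(t_obs, df=8)                     # lower-tail p-value
print(f"p-value = {p_value:.3f}")                # p-value = 0.185
```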
Hypothesis Testing: Type II Errors
- Any time we fail to reject, we might be committing a Type II error.
- In this case, maybe the true mean is less than 80 HB and our sample didn't provide enough information for us to realize this.
- What if the true mean had drifted down to 75 HB? Would our test be able to detect this shift?
Hypothesis Testing: Type II Errors
- Hypothesized distribution centered on 80 HB
- True distribution centered on 75 HB
- Critical value: 1.86 standard errors below the hypothesized mean, which is 0.27 standard errors below the true mean of 75 HB
Hypothesis Testing: Type II Errors
60.3% chance of not rejecting a false null hypothesis!
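The 60.3% figure can be reproduced the way the slides frame it: find the cutoff sample mean under the hypothesized 80 HB, then measure it against a t distribution shifted to the true mean of 75 HB. This is a sketch with hypothetical helper names (`t_pdf`, `t_cdf`); an exact power calculation would use the noncentral t distribution, so treat the result as the slides' approximation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mu_0, mu_true, n, s = 80.0, 75.0, 9, 9.441
se = s / math.sqrt(n)
t_crit = 1.860                         # t-table value: 5% in the lower tail, df = 8
cutoff = mu_0 - t_crit * se            # reject H0 only if the sample mean falls below this
z_true = (cutoff - mu_true) / se       # cutoff in standard errors from the true mean
beta = 1 - t_cdf(z_true, df=n - 1)     # P(do not reject H0 | true mean = 75 HB)
print(f"cutoff = {cutoff:.2f} HB, beta = {beta:.3f}, power = {1 - beta:.3f}")
```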
Hypothesis Testing: Analysis of Variance
- Useful for testing for differences between more than two means
- The F test, named for Fisher
The F Distribution
- Has only one tail (can't be negative)
- Central to ANOVA and regression analysis
- Based on the ratio of explained to unexplained variability
- Two degrees-of-freedom numbers (numerator and denominator)
The F Test
- Null hypothesis: The three types of machines produce aluminum castings with equal mean hardness.
- Alternative hypothesis: At least one of the machines produces aluminum castings with mean hardness not equal to the others.
- Test statistic: F
- Decision rule: The critical value depends on the numerator and denominator degrees of freedom, and on our acceptable risk of Type I error.
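The F statistic described above is the ratio of explained (between-group) to unexplained (within-group) mean squares. The sketch below computes it from raw samples; the three groups are toy numbers for hypothetical machines (not the casting data), the function name `one_way_anova` is illustrative, and the critical value 5.14 is an F-table value for α = 0.05 with (2, 6) degrees of freedom.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Explained variability: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Unexplained variability: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Toy hardness-like values for three hypothetical machines.
machines = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat, df1, df2 = one_way_anova(machines)
f_crit = 5.14   # F-table value, alpha = 0.05, (2, 6) degrees of freedom
print(f"F({df1}, {df2}) = {f_stat:.2f}; reject H0: {f_stat > f_crit}")
```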
One-way ANOVA
- In Excel: Tools > Data Analysis (requires the Analysis ToolPak, installed via Tools > Add-Ins)
Two-way ANOVA
ANOVA
- Advantages
  - Good for qualitative (categorical) data
  - Can easily handle multiple categories
  - Flexible in terms of sample sizes
- Disadvantages
  - Not especially useful for continuous data (or discrete data with many possible values)
  - Most ANOVA procedures can be done equivalently using regression, which is not true in reverse
Regression Analysis
Correlation Analysis
Regression Analysis
Regression Analysis: ANOVA
Regression Analysis: High-Level Measures
Regression Analysis: Tests for Independent Variables
Model Building
- Enter
  - Start with one independent variable (logically, the one with the strongest correlation with the dependent variable)
  - Add new independent variables one by one, in order of correlation strength with the dependent variable
  - Try to maximize adjusted R square
- Remove
  - Start with all possible independent variables
  - Remove independent variables (logically, on the basis of highest p-value)
  - Try to maximize adjusted R square
- In both procedures
  - Watch out for multicollinearity
7 Variables
6 Variables
5 Variables
4 Variables (a)
4 Variables (b)
4 Variables (c)
Summary of Six Models
Regression Analysis
- Problems with these data
  - Multicollinearity between zinc and machine types
  - Too few degrees of freedom (because of sample size)
  - Possible Type I and Type II errors
Conclusions?
- Preliminary findings
  - Machines matter (Burugula machine really sucks)
  - Possible interactions between operators and machines
  - Pressure also seems to matter
  - Zinc?
- Next steps
  - Look for root causes of low pressure
  - Study best practices of operators
  - Collect more data
  - What's the deal with zinc?
  - DOE to avoid problems with multicollinearity
  - Test theories about pressure
  - Test theories about operator/machine interactions