Statistical Analysis

Provided by: david529

1
Session VI Statistical Analysis
2
Manufacturing Scenario

Aluminum castings
Important factor: hardness
Measured in Brinell units (HB)
Possibly affected by:
Machine and/or Operator
Chemistry (Iron, Zinc, Manganese)
Physics (Pressure, Temperature)
Minimum acceptable hardness is 70 HB

3
Design of Experiments

Many possible types of design (randomized, blocked,
Latin square, etc.)
Should be driven by a theory or hypothesis
Make sure that, if the hypothesized effect is in
fact present:
the design used has a good chance of detecting
it (a small chance of Type II error; more on that
later), and
there will be no other reasonable explanation (a
key driver of DOE)

4
Design of Experiments

Include measurement of factors you would like to
test for
No need for independent variables that do not
vary
Matched pairs > independent samples
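
The preference for matched pairs can be illustrated with a small sketch. The readings below are invented for illustration (they are not from the presentation) and assume the same five castings are measured under two hypothetical process settings. Pairing removes the casting-to-casting variation, so the same data give a much larger test statistic when analyzed as pairs.

```python
import math
import statistics

# Hypothetical hardness readings (HB) on the same five castings under two
# process settings. These numbers are invented for illustration only.
setting_a = [70, 75, 80, 85, 90]
setting_b = [72, 78, 82, 88, 93]
n = len(setting_a)

# Matched pairs: treat the per-casting differences as one sample.
diffs = [b - a for a, b in zip(setting_a, setting_b)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Independent samples: pooled-variance two-sample t statistic on the same data
# (equal group sizes, so the pooled variance is the average of the two).
pooled_var = (statistics.variance(setting_a) + statistics.variance(setting_b)) / 2
se_indep = math.sqrt(pooled_var * (2 / n))
t_indep = (statistics.mean(setting_b) - statistics.mean(setting_a)) / se_indep

print(f"paired t = {t_paired:.2f}, independent-samples t = {t_indep:.2f}")
```

With these made-up numbers the mean difference is identical in both analyses; only the standard error changes.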

5
Design of Experiments

One possible approach for the aluminum problem

Process is highly variable
Some castings do not meet the 70 HB target

7
Inferential Statistics

Estimation
Confidence Intervals
Sample Size Determination
Design of Experiments
Hypothesis Testing
Classical Method
p-Values
Analysis of Variance
Regression Analysis
Analysis of Variance in Regression
High Level Measures
Hypothesis Tests for Independent Variables

8
Estimation

Fundamental difference between Probability and
Statistics
Probability is making an inference about an
unknown sample from a known population (useful
for developing theory)
Statistics is making an inference about an
unknown population from a known sample (useful in
the real world)
Estimation is a statistical tool using sample
data to make a probabilistic statement about some
unknown population parameter
Mean
Variance
Proportion
Differences between Parameters

9
Estimation: Confidence Intervals
General form of a confidence interval:
Measure of Central Tendency ± (Number of Standard
Errors) × (Measure of Dispersion)
Measure of Central Tendency: sample mean, sample proportion, etc.
Number of Standard Errors: usually z or t
Measure of Dispersion: standard error of the mean, etc.
10
Estimation: Confidence Intervals
11
Example: Confidence Interval for Population Mean
We are 95% confident that the true population
mean is between 69.75 and 84.27 HB.
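
The interval above can be reproduced from summary statistics. This is a minimal sketch under stated assumptions: the sample size (9) and standard deviation (9.441 HB) are quoted elsewhere in the deck, the sample mean of 77.01 HB is inferred here as the midpoint of the reported interval, and 2.306 is the standard table value of t for 95% confidence with 8 degrees of freedom.

```python
import math

# Summary statistics (taken or inferred from the slides; see assumptions above).
n = 9            # number of castings sampled
s = 9.441        # sample standard deviation, HB
x_bar = 77.01    # sample mean, HB (inferred as the midpoint of the interval)
t_crit = 2.306   # two-tailed 95% t value for n - 1 = 8 degrees of freedom

std_error = s / math.sqrt(n)      # standard error of the mean
margin = t_crit * std_error       # half-width of the interval
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: {lower:.2f} to {upper:.2f} HB")
```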
12
Estimation: Sample Size
Our sample of 9 castings has a confidence
interval about 15 HB wide, which may be too wide for
managerial decision making. How many observations would
we need for a 95% confidence interval within
±1 HB?
13
Estimation: Sample Size
14
Estimation: Sample Size

We would need 343 observations (assuming the
standard deviation is no more than 9.441 HB).
A slightly different formula applies to proportions.
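
The slides that carried the formula have no transcript, so the following is a sketch using the usual textbook formulas rather than the presenter's own working: n = (z·σ/E)² for a mean and n = z²·p(1−p)/E² for a proportion, each rounded up. The function names, the 5-percentage-point margin, and the worst-case p = 0.5 in the proportion example are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Observations needed so the CI half-width for a mean is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Same idea for a proportion; p = 0.5 is the most conservative guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hardness example: standard deviation 9.441 HB, interval within +/- 1 HB.
n_mean = sample_size_for_mean(sigma=9.441, margin=1.0)

# Hypothetical proportion example: within +/- 5 percentage points.
n_prop = sample_size_for_proportion(margin=0.05)

print(n_mean, n_prop)
```

For the hardness example this reproduces the 343 observations quoted above.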

15
Hypothesis Testing: Classical Method
16
Hypothesis Testing: Classical Method
17
Example: Step 1
18
Example: Step 2
19
Example: Step 3
20
Example: Step 3
t distribution centered on 80 HB
5% probability in lower tail
Critical value: 1.86 standard errors
below hypothesized mean
21
Example: Step 4
22
Example: Step 4
Observed value: 0.95 standard errors
below hypothesized mean
Critical value: 1.86 standard errors
below hypothesized mean
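
The comparison of the observed and critical values can be sketched as follows. The hypotheses themselves are not in the transcript, so this assumes a lower-tail test of a hypothesized mean of 80 HB at alpha = 5%, with n = 9, a standard deviation of 9.441 HB, and a sample mean of 77.01 HB (the mean is inferred from the confidence interval reported earlier in the deck).

```python
import math

# Assumed inputs (see the lead-in): summary statistics and test setup.
n, x_bar, s = 9, 77.01, 9.441
mu_0 = 80.0        # hypothesized mean, HB
t_crit = -1.86     # lower-tail critical value, 8 degrees of freedom, alpha = 5%

std_error = s / math.sqrt(n)
t_obs = (x_bar - mu_0) / std_error   # how many standard errors below 80 HB

# Lower-tail decision rule: reject only if the observed value is beyond the
# critical value.
reject_null = t_obs < t_crit

print(f"observed t = {t_obs:.2f}, critical t = {t_crit}, reject: {reject_null}")
```

The observed value falls short of the critical value, so the null hypothesis is not rejected.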
25
Hypothesis Testing: p-values

Note that the classical method yields only a
"reject" or "do not reject" decision
Not helpful in situations where different people
have different tolerances for Type I error risk
We would like to know:
How far from the hypothesized value was the result, in
standardized terms? (provided by the test
statistic)
How unlikely would this result be, if the null
hypothesis were true? (provided by the p-value)

26
If the null hypothesis is true, we would see a
sample mean this low or lower 18.5% of the time.
27
Hypothesis Testing: p-values

English translation: If the true mean were really
80 HB, we would see a sample mean this far below
80 or farther 18.5% of the time.
Since our alpha is 5%, we don't consider this to
be strong evidence against the null hypothesis.
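
The 18.5% figure can be checked numerically. This sketch assumes the same summary statistics as the rest of the example (n = 9, standard deviation 9.441 HB, sample mean 77.01 HB inferred from the reported confidence interval). Because the Python standard library has no t distribution, the helper functions `t_pdf` and `t_cdf` below are illustrative: they integrate the t density with Simpson's rule. A statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Assumed summary statistics and hypothesized mean.
n, x_bar, s, mu_0 = 9, 77.01, 9.441, 80.0
t_obs = (x_bar - mu_0) / (s / math.sqrt(n))

# Lower-tail p-value: chance of a sample mean this low or lower if mu = 80 HB.
p_value = t_cdf(t_obs, df=n - 1)
print(f"p-value = {p_value:.3f}")
```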

28
Hypothesis Testing: Type II Errors

Any time we fail to reject, we might be
committing a Type II error.
In this case, maybe the true mean is less than 80 HB
and our sample didn't provide enough information
for us to realize this.
What if the true mean had drifted down to 75 HB?
Would our test be able to detect this shift?

29
Hypothesis Testing: Type II Errors
Hypothesized distribution centered on 80 HB
True distribution centered on 75 HB
Critical value: 1.86 standard errors below the
hypothesized mean = 0.27 standard errors below the
true mean of 75 HB
30
Hypothesis Testing: Type II Errors
60.3% chance of not rejecting a false hypothesis!
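
The Type II error probability can be traced step by step. This sketch follows the picture described on the slides, a t distribution with 8 degrees of freedom shifted to a true mean of 75 HB; an exact power calculation would use the noncentral t distribution, so treat this as an approximation. The inputs (n = 9, standard deviation 9.441 HB) come from the deck, and `t_pdf` and `t_cdf` are illustrative helpers written for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, s = 9, 9.441
mu_0, mu_true = 80.0, 75.0
std_error = s / math.sqrt(n)

# Rejection cutoff on the hardness scale: 1.86 standard errors below 80 HB.
cutoff = mu_0 - 1.86 * std_error

# The same cutoff expressed in standard errors from the true mean of 75 HB.
shifted = (cutoff - mu_true) / std_error

# Type II error: the sample mean lands above the cutoff, so we fail to reject.
beta = 1 - t_cdf(shifted, df=n - 1)
print(f"cutoff = {cutoff:.2f} HB, shifted = {shifted:.2f}, beta = {beta:.3f}")
```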
31
Hypothesis Testing: Analysis of Variance

Useful for testing for differences between more
than two means
The F test, named for Fisher

32
The F Distribution
33
The F Distribution

Has only one tail; can't be negative
Central to ANOVA and regression analysis
Based on the ratio of explained to unexplained
variability
Two degrees-of-freedom numbers (numerator and denominator)

34
The F Test

Null Hypothesis: Three types of machines produce
aluminum castings with equal mean hardness.
Alternative Hypothesis: At least one of the
machines produces aluminum castings with mean
hardness not equal to the others.
Test Statistic: F
Decision Rule: Critical value depends on
numerator and denominator degrees of freedom, and
our acceptable risk of Type I error.
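
A one-way ANOVA F statistic for this kind of test can be sketched as the ratio of explained (between-machine) to unexplained (within-machine) variability. The data for the presentation's worked example are not in the transcript, so the hardness readings below are invented, and the critical value 5.14 is the standard table value for F with 2 and 6 degrees of freedom at alpha = 5%.

```python
from statistics import mean

# Hypothetical hardness readings (HB) for three machine types. Invented data.
groups = {
    "machine_1": [72, 75, 78],
    "machine_2": [80, 82, 84],
    "machine_3": [70, 71, 75],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Explained variability: group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Unexplained variability: observations around their own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1                 # numerator degrees of freedom
df_within = len(all_values) - len(groups)    # denominator degrees of freedom

f_stat = (ss_between / df_between) / (ss_within / df_within)
f_crit = 5.14   # table value for F(2, 6) at alpha = 0.05

print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df; reject: {f_stat > f_crit}")
```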

35
One-way ANOVA
36
One-way ANOVA
37
One-way ANOVA
38
One-way ANOVA
39
One-way ANOVA
40
One-way ANOVA
41
One-way ANOVA
42
One-way ANOVA
43
One-way ANOVA
44
One-way ANOVA
45
One-way ANOVA
46
One-way ANOVA
47
One-way ANOVA
48
One-way ANOVA
49
One-way ANOVA
Tools > Data Analysis (need to have the Analysis
ToolPak installed: Tools > Add-Ins)
50
One-way ANOVA
51
One-way ANOVA
52
Two-way ANOVA
53
Two-way ANOVA
54
ANOVA

Advantages
Good for qualitative (categorical) data
Can easily handle multiple categories
Flexible in terms of sample sizes
Disadvantages
Not especially useful for continuous data (or
discrete data with many possible values)
Most ANOVA procedures can be done equivalently
using regression, which is not true in reverse

55
Regression Analysis
56
Regression Analysis
57
Correlation Analysis
58
Correlation Analysis
59
Regression Analysis
60
Regression Analysis
61
Regression Analysis
62
Regression Analysis
63
Regression Analysis
64
Regression Analysis: ANOVA
65
Regression Analysis: High-level Measures
66
Regression Analysis: Tests for Independent
Variables
67
Model Building

Enter:
Start with one independent variable (logically,
the one with the strongest correlation with the
dependent variable)
Add new independent variables one by one, in
order of correlation strength with the dependent
variable
Try to maximize adjusted R-squared
Remove:
Start with all possible independent variables
Remove independent variables one by one (logically, on the
basis of highest p-value)
Try to maximize adjusted R-squared
In both procedures:
Watch out for multicollinearity
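
Why both procedures track adjusted R-squared rather than plain R-squared can be shown with a short sketch. The formula is the standard one, 1 − (1 − R²)(n − 1)/(n − k − 1); the R-squared values, the sample size of 20, and the correlation of 0.95 are hypothetical. The variance inflation factor is a common multicollinearity diagnostic added here for illustration; it is not mentioned in the slides, and the form shown applies only when a model has exactly two predictors.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of independent variables used."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def variance_inflation_factor(r_between_predictors):
    """VIF for a model with exactly two predictors correlated at r."""
    return 1 / (1 - r_between_predictors ** 2)

# Hypothetical fits on 20 observations: a third variable nudges R-squared up...
adj_two = adjusted_r_squared(0.800, n_obs=20, n_predictors=2)
adj_three = adjusted_r_squared(0.805, n_obs=20, n_predictors=3)

# ...but adjusted R-squared falls, so the third variable would not be entered.
print(f"2 variables: {adj_two:.4f}, 3 variables: {adj_three:.4f}")

# Two predictors correlated at 0.95 inflate coefficient variances about tenfold.
vif = variance_inflation_factor(0.95)
print(f"VIF = {vif:.1f}")
```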