
Title: Data

1
Data: Sampling, Summary Statistics, Models for
Parameter Estimation and Multifactor Effects

Engineering Experimental Design
Valerie Young

2
In Today's Lecture

Principles of sampling
Summary statistics
Location
Variability
Models

3
Principles of Sampling
4
Sampling

Goal: use statistics based on analysis of a sample to estimate the same characteristics for the whole population
Random sampling: best choice
Likely to be representative of whole population
Ironically, requires careful planning
Uses a consistent, predetermined protocol
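
As a rough illustration of the points above, the following Python sketch draws a simple random sample under a protocol fixed in advance (sample size and seed) and compares the sample mean with the population mean. The population values, the function name `draw_random_sample`, and the seeds are assumptions made up for this example; none of them come from the lecture.

```python
import random
import statistics

def draw_random_sample(population, n, seed):
    """Draw n elements without replacement; a fixed seed keeps the protocol reproducible."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: 1000 made-up measurements (not data from the lecture).
pop_rng = random.Random(0)
population = [pop_rng.gauss(50.0, 5.0) for _ in range(1000)]

# The sample size and seed are decided before looking at any data.
sample = draw_random_sample(population, n=30, seed=42)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```

The two means will usually be close but not identical; the gap is the price of sampling instead of measuring every element.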

5
Questions

True or false: you can always get a more accurate characterization of a population if you measure every element instead of sampling.
Which is a better example of random sampling?
The operator takes a sample at 10 past the hour every hour.
The operator takes a sample once an hour at whatever time he gets a chance.

6
Replicate Measurements

Goal: determine how much of the variability in y is due to the way the experiment is done, and not due to the factors you want to test.
Your definition of "replicate" determines what effects (or sources of variability) are included in the uncertainty you determine.
Watching the flow meter for a while could be considered making replicate measurements.

7
Questions

Suppose that you watch a thermocouple reading for several minutes, and observe that it varies by no more than ±1 °C while you are watching.
What might cause this variability?
The manufacturer specifies an accuracy of 2 °C.
What additional sources of uncertainty might be included in the manufacturer's estimate that you cannot see by watching this single readout?

8
Summary Statistics
9
Location

To define a typical value for your data, try:
Mean, x̄
Sum of values / Number of values
Susceptible to any outliers
Median, x0.5
Middle value
Not altered by a couple of outliers
Typical value alone loses information about
variability, time dependence, etc.
The typical value for your sample is your best
estimate of the true value (or location for
whole population).

Example: 1, 2, 3, 4, 10. Mean = ? Median = ?
10
Location

Example: 1, 2, 3, 4, 10. Mean = 20/5 = 4. Median = 3.
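
The example above can be checked with a short Python sketch using the standard `statistics` module. The second data set, with the outlier made more extreme, is a hypothetical variation added here to show how the mean moves while the median does not.

```python
import statistics

data = [1, 2, 3, 4, 10]
mean = statistics.mean(data)      # sum of values / number of values = 20 / 5
median = statistics.median(data)  # middle value of the sorted data

# Hypothetical variation (not from the slide): make the outlier more extreme.
with_outlier = [1, 2, 3, 4, 100]
mean_outlier = statistics.mean(with_outlier)      # 110 / 5
median_outlier = statistics.median(with_outlier)  # middle value is unchanged

print("mean:", mean, "median:", median)
print("mean with outlier:", mean_outlier, "median with outlier:", median_outlier)
```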
11
Variability

To define the variability of your data, try:
Standard deviation, s
s = sqrt(Sxx / (n-1))
Variance, s²
s² = Sxx / (n-1)
Interquartile range, IQR
IQR = x0.75 − x0.25
Less susceptible to outliers than s or s²

Sxx means the sum of the squared differences between each data point (xi) and the mean of all data points (x̄): Sxx = Σ((xi − x̄)²).
Thus, s and s² measure how widely distributed the data are around their mean.
Example: 1, 4, 15, 16, 17, 25, 50, 90. x0.25 = ?, x0.75 = ?
12
Variability

Example: 1, 4, 15, 16, 17, 25, 50, 90. x0.25 = 15, x0.75 = 25.
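
A minimal Python sketch of the Sxx-based definitions of s and s² follows. The helper names (`sxx`, `sample_variance`, `sample_std`) are chosen here for illustration, and the data set is the 1, 2, 3, 4, 10 example from the Location slide because its mean is a whole number.

```python
import math
import statistics

def sxx(values):
    """Sum of squared differences between each data point and the mean."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)

def sample_variance(values):
    """s^2 = Sxx / (n - 1)."""
    return sxx(values) / (len(values) - 1)

def sample_std(values):
    """s = sqrt(Sxx / (n - 1))."""
    return math.sqrt(sample_variance(values))

data = [1, 2, 3, 4, 10]  # mean is 4, so Sxx = 9 + 4 + 1 + 0 + 36 = 50
print("Sxx:", sxx(data))
print("s^2:", sample_variance(data))
print("s:  ", sample_std(data))

# Cross-check against the standard library, which also divides by n - 1.
print("library s:", statistics.stdev(data))
```

The IQR is left out of the sketch on purpose: textbooks and libraries use different conventions for locating quartiles in small data sets, so a library call may not reproduce the quartiles quoted on the slide.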
13
Getting Uncertainty from Replicate Measurements

With replicate measurements, the mean is commonly
reported as the best estimate of the true value.
Uncertainty may be described by standard
deviations or confidence limits (covered later).
1 s.d. or 2 s.d. are both common choices
Propagation of error on the calculation of the
mean is NOT appropriate.
Error propagation makes the uncertainty grow with
more math operations. You should be more certain
of your answer with more replicates, not less.
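
As a sketch of the reporting convention described above, the following Python snippet takes the mean of a set of replicate measurements as the best estimate and quotes the uncertainty as 1 s.d. or 2 s.d. The readings and the helper name `report` are made up for illustration.

```python
import statistics

# Hypothetical replicate readings (made up for illustration), e.g. in grams.
replicates = [10.12, 10.08, 10.15, 10.11, 10.09]

best_estimate = statistics.mean(replicates)
s = statistics.stdev(replicates)  # sample standard deviation, divides by n - 1

def report(mean, sd, k):
    """Format 'mean ± k s.d.' so the reader knows which choice was made."""
    return f"{mean:.2f} ± {k * sd:.2f} ({k} s.d.)"

print(report(best_estimate, s, 1))
print(report(best_estimate, s, 2))
```

Stating whether 1 s.d. or 2 s.d. is quoted matters, since both are common and the reader cannot tell them apart from the number alone.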

14
Questions

After carefully controlling all the chemical
reagents and conditions during a reaction, the
researcher weighs the product on an electronic
balance five times, removing and replacing the
same sample on the balance each time.
What measurement is being replicated?
What sources of uncertainty are characterized by
the standard deviation of the five weighings?
What would you do to determine the uncertainty on
the reaction yield?

15
Mathematical Models for Experimental Data
16
Single-Factor Experiment

Hypothesis: The height of a chemical engineering student depends on the student's gender.
Population: All U.S. chemical engineering undergraduates
Sample: Students in ChE 408 W03 at OU
Whether this sample is representative of all ChE students could certainly be questioned, but let's go with it.
Factor (independent variable to be investigated): Gender
Response (dependent variable to be investigated): Height

17
Model for Single-Factor Experiment
Table 1. Self-reported heights of students in
ChE 408 in Winter 03
From this sample of 10 women and 19 men, male chemical engineering students are taller than their female counterparts, with heights of (72 ± 2) in and (64 ± 3) in, respectively. The uncertainties represent the standard deviations of the data.
The Model:
Height_female,j = 64 inches + ε_j
Height_male,i = 72 inches + ε_i

Every model consists of 2 parts:
The predictable relationship between the factor
and response.
The random variability
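
The two parts of the model can be sketched in Python as a group mean (the predictable part) plus a residual for each student (the random part). The heights below are made up, since the Table 1 data are not included in this transcript, and `fit_single_factor` is a name invented for this example.

```python
import statistics

# Hypothetical heights in inches (made up; the Table 1 data are not in the transcript).
heights = {
    "female": [62, 64, 66, 63, 65],
    "male": [70, 72, 74, 71, 73],
}

def fit_single_factor(groups):
    """Return (group means, residuals) so that height = group mean + residual."""
    means = {level: statistics.mean(vals) for level, vals in groups.items()}
    residuals = {
        level: [v - means[level] for v in vals] for level, vals in groups.items()
    }
    return means, residuals

means, residuals = fit_single_factor(heights)
for level in heights:
    sd = statistics.stdev(heights[level])
    print(f"Height_{level} = {means[level]} in + error (s.d. {sd:.1f} in)")
```

Within each group the residuals sum to zero, which is what makes the group mean the natural estimate of the predictable part.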

18
Two-Factor Experiment

Hypothesis: The height of a chemical engineering student depends on the student's gender and whether his/her last name starts with A-L or M-Z.
Population: All U.S. chemical engineering undergraduates
Sample: Students in ChE 408 W03 at OU
Factors (independent variables to be investigated): Gender, First Letter of Last Name
Response (dependent variable to be investigated): Height

19
Two-Factor Experiment

Gender appears to have an important effect.
Alphabet appears not to have an important effect.
How do we quantify these effects?
What about interaction?
Does alphabet modify the effect of gender?

20
Two-Factor Experiment
[Figure: plot of mean height; image not included in transcript]
Crossing here doesn't count. There is no gender after Male.
Page 304 of text shows a plot with interaction.
Lines don't cross, so no interaction.
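
One common way to put a number on "does alphabet modify the effect of gender?" is to compare the gender effect within each alphabet group. The Python sketch below does this with made-up cell data; the values and variable names are assumptions for illustration, not the class data.

```python
import statistics

# Hypothetical cell data in inches (made up for illustration).
cells = {
    ("female", "A-L"): [63, 65],
    ("female", "M-Z"): [64, 66],
    ("male", "A-L"): [71, 73],
    ("male", "M-Z"): [72, 74],
}
cell_means = {key: statistics.mean(vals) for key, vals in cells.items()}

# Effect of gender (male minus female) within each alphabet group.
gender_effect_a_to_l = cell_means[("male", "A-L")] - cell_means[("female", "A-L")]
gender_effect_m_to_z = cell_means[("male", "M-Z")] - cell_means[("female", "M-Z")]

# If alphabet does not modify the effect of gender, this difference is near zero,
# which corresponds to roughly parallel lines on a plot of the cell means.
interaction = gender_effect_m_to_z - gender_effect_a_to_l
print(gender_effect_a_to_l, gender_effect_m_to_z, interaction)
```

With real data the difference will rarely be exactly zero, so it has to be judged against the random variability in the measurements.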
21
Model for Two-Factor Experiment with No
Interaction

Height_female,A-L,i = 69.4 in + (−5.597 in) + (−1.014 in) + ε_i
Every model consists of two parts:
Variability due to predictable relationship
between response and factors (often the only part
of the model that is written)
Random variability (also called error or
uncertainty)

Terms of the model, in order:
69.4 in: overall mean of all data in sample (best estimate of true mean of population)
−5.597 in: (mean height of females) − (overall mean of data); best estimate of main effect of being female
−1.014 in: (mean height of A-L) − (overall mean of data); best estimate of main effect of being A-L
ε_i: error
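
The no-interaction model can be sketched in Python by computing the overall mean and each main effect as (mean at that factor level) minus (overall mean). The records below are made up and balanced for simplicity, and `main_effect` is a helper name invented here, so the numbers will not match the 69.4, −5.597 and −1.014 in values from the class data.

```python
import statistics

# Hypothetical (gender, alphabet group, height in inches) records, made up for illustration.
records = [
    ("female", "A-L", 63), ("female", "A-L", 65),
    ("female", "M-Z", 64), ("female", "M-Z", 66),
    ("male", "A-L", 71), ("male", "A-L", 73),
    ("male", "M-Z", 72), ("male", "M-Z", 74),
]

overall_mean = statistics.mean([h for _, _, h in records])

def main_effect(records, position, level, overall):
    """(Mean height at this factor level) - (overall mean of data)."""
    level_heights = [rec[2] for rec in records if rec[position] == level]
    return statistics.mean(level_heights) - overall

effect_female = main_effect(records, 0, "female", overall_mean)
effect_a_to_l = main_effect(records, 1, "A-L", overall_mean)

# No-interaction model: prediction = overall mean + gender effect + alphabet effect.
predicted = overall_mean + effect_female + effect_a_to_l

# The error term is whatever is left over for each student in that group.
errors = [h - predicted for g, a, h in records if g == "female" and a == "A-L"]
print(overall_mean, effect_female, effect_a_to_l, predicted, errors)
```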
