Data - PowerPoint PPT Presentation

Slides: 22
Provided by: ohio8
Transcript and Presenter's Notes

Title: Data


1
Data: Sampling, Summary Statistics, Models for
Parameter Estimation and Multifactor Effects
  • Engineering Experimental Design
  • Valerie Young

2
In Today's Lecture
  • Principles of sampling
  • Summary statistics
  • Location
  • Variability
  • Models

3
Principles of Sampling
4
Sampling
  • Goal: use statistics based on analysis of a
    sample to estimate the same characteristics for
    the whole population
  • Random sampling is the best choice
  • Likely to be representative of whole population
  • Ironically, requires careful planning
  • Uses a consistent, predetermined protocol
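The bullets above can be sketched in code. This is a minimal illustration, not part of the lecture: the population of 500 numbered items, the sample size, and the seed are all hypothetical, and `draw_random_sample` is a name made up for this sketch. Fixing the seed and sample size in advance stands in for the "consistent, predetermined protocol".

```python
import random


def draw_random_sample(population, sample_size, seed):
    """Draw a simple random sample without replacement.

    The seed and sample size are chosen before looking at the data,
    so nobody picks elements by hand.
    """
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical population: 500 numbered items (an assumption for illustration)
population = list(range(1, 501))
sample = draw_random_sample(population, sample_size=20, seed=2003)

# Use the sample statistic to estimate the population characteristic
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean = {population_mean:.1f}, sample mean = {sample_mean:.1f}")
```

Re-running with the same seed reproduces the same sample, which makes the protocol easy to audit.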

5
Questions
  • True or false: you can always get a more accurate
    characterization of a population if you measure
    every element instead of sampling.
  • Which is a better example of random sampling?
  • The operator takes a sample at 10 past the hour
    every hour.
  • The operator takes a sample once an hour at
    whatever time he gets a chance.

6
Replicate Measurements
  • Goal: determine how much of the variability in y
    is due to the way the experiment is done, and not
    due to the factors you want to test.
  • Your definition of "replicate" determines what
    effects (or sources of variability) are included
    in the uncertainty you determine.
  • Watching the flow meter for a while could be
    considered making replicate measurements.

7
Questions
  • Suppose that you watch a thermocouple reading for
    several minutes, and observe that it varies by no
    more than ±1 °C while you are watching.
  • What might cause this variability?
  • The manufacturer specifies an accuracy of ±2 °C.
    What additional sources of uncertainty might be
    included in the manufacturer's estimate that you
    cannot see by watching this single readout?

8
Summary Statistics
9
Location
  • To define a typical value for your data, try:
  • Mean, x̄
  • Sum of values / Number of values
  • Susceptible to any outliers
  • Median, x_0.5
  • Middle value
  • Not altered by a couple of outliers
  • Typical value alone loses information about
    variability, time dependence, etc.
  • The typical value for your sample is your best
    estimate of the true value (or location for
    whole population).

1, 2, 3, 4, 10: Mean = ? Median = ?
10
Location
1, 2, 3, 4, 10: Mean = 20/5 = 4, Median = 3
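The worked answer can be checked with Python's standard `statistics` module. The second data set, with the outlier inflated to 100, is an assumption added here to illustrate the "susceptible to outliers" point; it does not appear on the slide.

```python
from statistics import mean, median

# Data from the slide
data = [1, 2, 3, 4, 10]
data_mean = mean(data)      # 20 / 5
data_median = median(data)  # middle value of the sorted data
print(data_mean, data_median)

# Hypothetical variation: make the outlier ten times larger
data_big_outlier = [1, 2, 3, 4, 100]
big_mean = mean(data_big_outlier)
big_median = median(data_big_outlier)
print(big_mean, big_median)
```

The mean moves with the outlier while the median stays at the middle value.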
11
Variability
  • To define the variability of your data, try:
  • Standard deviation, s
  • sqrt(Sxx / (n-1))
  • Variance, s²
  • Sxx / (n-1)
  • Interquartile range, IQR
  • x_0.75 − x_0.25
  • Less susceptible to outliers than s or s²

Sxx means the sum of the squared differences
between each data point (xi) and the mean of all
data points (x̄, "x-bar"): Sxx = Σ(xi − x̄)².
Thus, s and s² measure how widely distributed the
data are around their mean.
1, 4, 15, 16, 17, 25, 50, 90: x_0.25 = ?, x_0.75 = ?
12
Variability
1, 4, 15, 16, 17, 25, 50, 90: x_0.25 = 15, x_0.75 = 25
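As a rough sketch, the variability measures can be computed for the same eight values. The quartiles are taken as stated on the slide rather than recomputed, because quartile conventions differ between textbooks and software and the slide does not say which one it uses. Variable names are chosen for this example only.

```python
from math import sqrt
from statistics import mean

# Data from the slide
data = [1, 4, 15, 16, 17, 25, 50, 90]
n = len(data)
x_bar = mean(data)

# Sxx: sum of squared differences between each point and the mean
s_xx = sum((x - x_bar) ** 2 for x in data)

variance = s_xx / (n - 1)  # s squared
std_dev = sqrt(variance)   # s

# Quartiles as given on the slide (assumed, not recomputed here)
x_25, x_75 = 15, 25
iqr = x_75 - x_25

print(f"Sxx = {s_xx}, s^2 = {variance:.1f}, s = {std_dev:.1f}, IQR = {iqr}")
```

The two largest values (50 and 90) dominate Sxx, so s is large compared with the IQR.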
13
Getting Uncertainty from Replicate Measurements
  • With replicate measurements, the mean is commonly
    reported as the best estimate of the true value.
  • Uncertainty may be described by standard
    deviations or confidence limits (covered later).
  • 1 s.d. or 2 s.d. are both common choices
  • Propagation of error on the calculation of the
    mean is NOT appropriate.
  • Error propagation makes the uncertainty grow with
    more math operations. You should be more certain
    of your answer with more replicates, not less.
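A minimal sketch of reporting replicate measurements as mean ± k standard deviations follows. The five readings are hypothetical, and `summarize_replicates` is a helper name invented for this illustration.

```python
from statistics import mean, stdev


def summarize_replicates(values, k=1):
    """Return (best estimate, uncertainty) as (mean, k * sample s.d.)."""
    return mean(values), k * stdev(values)


# Hypothetical replicate readings (e.g. a flow rate in L/min)
replicates = [10.2, 9.8, 10.1, 9.9, 10.0]

best, unc_1sd = summarize_replicates(replicates, k=1)
_, unc_2sd = summarize_replicates(replicates, k=2)

print(f"{best:.2f} ± {unc_1sd:.2f} (1 s.d.)")
print(f"{best:.2f} ± {unc_2sd:.2f} (2 s.d.)")
```

Whichever multiple is used, the report should say whether the quoted uncertainty is 1 s.d. or 2 s.d.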

14
Questions
  • After carefully controlling all the chemical
    reagents and conditions during a reaction, the
    researcher weighs the product on an electronic
    balance five times, removing and replacing the
    same sample on the balance each time.
  • What measurement is being replicated?
  • What sources of uncertainty are characterized by
    the standard deviation of the five weighings?
  • What would you do to determine the uncertainty on
    the reaction yield?

15
Mathematical Models for Experimental Data
16
Single-Factor Experiment
  • Hypothesis: The height of a chemical engineering
    student depends on the student's gender.
  • Population: All U.S. chemical engineering
    undergraduates
  • Sample: Students in ChE 408 W03 at OU
  • Whether this sample is representative of all ChE
    students could certainly be questioned, but let's
    go with it.
  • Factor (independent variable to be investigated):
    Gender
  • Response (dependent variable to be investigated):
    Height

17
Model for Single-Factor Experiment
Table 1. Self-reported heights of students in
ChE 408 in Winter 03
From this sample of 10 women and 19 men, male
chemical engineering students are taller than
their female counterparts, with heights of (72 ±
2) in and (64 ± 3) in, respectively. The
uncertainties represent the standard deviations
of the data.
The Model:
Height_female,j = 64 inches + ε_j
Height_male,i = 72 inches + ε_i
  • Every model consists of 2 parts
  • The predictable relationship between the factor
    and response.
  • The random variability
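The two parts of a single-factor model can be separated in a few lines. The heights below are hypothetical and are NOT the class data from Table 1; they were chosen only so that the group means match the 64 in and 72 in quoted above.

```python
from statistics import mean

# Hypothetical heights in inches (not the ChE 408 class data)
heights = {
    "female": [62, 64, 66],
    "male": [70, 72, 74],
}

# Predictable part: one mean per level of the factor
group_means = {group: mean(values) for group, values in heights.items()}

# Random part: what is left after subtracting the predictable part
residuals = {
    group: [h - group_means[group] for h in values]
    for group, values in heights.items()
}

for group in heights:
    print(group, group_means[group], residuals[group])
```

Within each group the residuals sum to zero, because the group mean is the fitted value for every member of that group.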

18
Two-Factor Experiment
  • Hypothesis: The height of a chemical engineering
    student depends on the student's gender and
    whether his/her last name starts with A-L or M-Z.
  • Population: All U.S. chemical engineering
    undergraduates
  • Sample: Students in ChE 408 W03 at OU
  • Factors (independent variables to be
    investigated): Gender, First Letter of Last Name
  • Response (dependent variable to be investigated):
    Height

19
Two-Factor Experiment
  • Gender appears to have an important effect.
  • Alphabet appears not to have an important effect.
  • How do we quantify these effects?
  • What about interaction?
  • Does alphabet modify the effect of gender?

20
Two-Factor Experiment
[Figure: plot of mean height; vertical axis: Mean]
Crossing here doesn't count. There is no gender
after Male.
Page 304 of text shows a plot with interaction.
Lines don't cross, so no interaction.
21
Model for Two-Factor Experiment with No
Interaction
  • Height_female,A-L,i = 69.4 in + (-5.597 in) +
    (-1.014 in) + ε_i
  • Every model consists of two parts
  • Variability due to predictable relationship
    between response and factors (often the only part
    of the model that is written)
  • Random variability (also called error or
    uncertainty)

Terms in the model, in order:
  • 69.4 in: overall mean of all data in sample (best
    estimate of true mean of population)
  • -5.597 in: (mean height of females) − (overall
    mean of data), the best estimate of the main
    effect of being female
  • -1.014 in: (mean height of A-L) − (overall mean
    of data), the best estimate of the main effect of
    being A-L
  • Last term: error
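The no-interaction model is additive, so the predictable part for a female student with an A-L last name is just the overall mean plus the two main effects. The sketch below uses the three numbers from the slide; the observed height of 64.0 in is a hypothetical value added to show how the error term falls out, and `predicted_height` is a name made up for this example.

```python
# Values quoted on the slide, in inches
overall_mean = 69.4
effect_female = -5.597
effect_a_to_l = -1.014


def predicted_height(overall, *main_effects):
    """Predictable part of an additive (no-interaction) model."""
    return overall + sum(main_effects)


prediction = predicted_height(overall_mean, effect_female, effect_a_to_l)

# Hypothetical observation, used only to illustrate the error term
observed = 64.0
error = observed - prediction

print(f"predicted = {prediction:.3f} in, error = {error:.3f} in")
```

With an interaction term, the effect of gender would depend on the alphabet group and a simple sum of main effects would no longer be enough.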