Title: 2DS00
12DS00
- Statistics 1 for Chemical Engineering
2Lecturers
- Dr. A. Di Bucchianico
- Department of Mathematics,
- Statistics group
- HG 9.24
- phone (040) 247 2902
- a.d.bucchianico_at_tue.nl
- Ir. G.D. Mooiweer,
- Department of Mathematics
- ICTOO
- HG 9.12
- phone 040 247 4277 (Thursdays)
- g.d.mooiweer_at_tue.nl
- Dr. R.W. van der Hofstad
- Department of Mathematics,
- Statistics group
- HG 9.04
- phone (040) 247 2910
- rhofstad_at_win.tue.nl
3Goals of this course
- to prepare students for (first-year) laboratory assignments
- to teach students how to perform basic statistical analyses of experiments
- to teach students how to use software for data analysis
- to teach students how to avoid pitfalls in analysing measurements
4Important to remember
- Web site for this course: www.win.tue.nl/sandro/2DS00/
- No textbook, but handouts (Word) and PowerPoint sheets through the web site
- Bring notebook to both lectures and self-study
- (Optional) buy lecture notes 2256 Statgraphics voor regulier onderwijs
- (Optional) buy lecture notes 2218 Statistisch Compendium
5How to study
- read lecture notes briefly before lecture
- ask questions during lecture
- study lecture notes carefully after lecture
- do exercises during guided self-study
- reread lecture notes after guided self-study
- try out previous examinations shortly before the examination
- N.B. Lecture notes (pdf documents) ≠ PowerPoint files
6Week schedule
- Week 1 Measurement and statistics
- Week 2 Error propagation
- Week 3 Simple linear regression analysis
- Week 4 Multiple linear regression analysis
- Week 5 Nonlinear regression analysis
7Detailed contents of week 1
- measurement errors
- graphical displays of data
- summary statistics
- normal distribution
- confidence intervals
- hypothesis testing
8Measurements and statistics
- perfect measurements do not exist
- possible sources of measurement errors
- reading
- environment
- temperature
- humidity
- ...
- impurities
- ...
9Necessity of good measurement system
10Three experiments
11Types of measurement errors
- Random errors
- always present
- reduce influence by averaging repeated measurements
- Systematic errors
- require adjustment/repair of measuring devices
- Outliers
- recording errors
- mistakes in applying procedures
12Illustration of measurement concepts
13Accuracy
difference between average of measured values and
true value
14Accuracy
- relates to systematic errors
- absolute error
- relative error
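As a small illustration of the two error measures, the following Python sketch computes the absolute and relative error of an average of repeated measurements. The measurements and the true value are made up for this example; in practice the true value is usually known only for a reference standard.

```python
from statistics import mean

# Hypothetical repeated measurements of a quantity whose true value is 5.0
measurements = [5.1, 4.9, 5.2, 5.0, 5.3]
true_value = 5.0

average = mean(measurements)
absolute_error = average - true_value          # same unit as the measurements
relative_error = absolute_error / true_value   # dimensionless fraction

print(f"average        = {average:.3f}")
print(f"absolute error = {absolute_error:.3f}")
print(f"relative error = {relative_error:.1%}")
```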
15Location statistics
- mean
- median
- trimmed means
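The three location statistics can be compared on a small data set. This is a sketch only: the data are hypothetical, and `trimmed_mean` is a helper written for this example (statistical packages may use a slightly different trimming rule).

```python
from statistics import mean, median

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number of values cut at each end
    return mean(ordered[k:len(ordered) - k])

# Hypothetical data with one outlier (50.0)
data = [4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 5.2, 5.3, 50.0]

print(mean(data))                 # pulled upwards by the outlier
print(median(data))               # hardly affected by the outlier
print(trimmed_mean(data, 0.10))   # drops the smallest and the largest value
```

The mean is dominated by the single outlier, while the median and the trimmed mean stay close to the bulk of the data.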
16Precision
- the degree to which consistent results are obtained
17Accurate and precise
18Statistics for precision
- standard deviation
- standard error
- variation coefficient
- variance
- range
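The formulas for these statistics are not reproduced here, so the following Python sketch assumes the usual sample definitions (divisor n − 1 for the variance, standard error s/√n, variation coefficient s divided by the mean). The data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [5.1, 4.9, 5.2, 5.0, 5.3]   # hypothetical repeated measurements

n = len(data)
s = stdev(data)                  # sample standard deviation (divisor n - 1)
s2 = variance(data)              # sample variance, equals s**2
standard_error = s / sqrt(n)     # standard deviation of the sample mean
cv = s / mean(data)              # variation coefficient (relative spread)
data_range = max(data) - min(data)

print(f"s = {s:.4f}, s^2 = {s2:.4f}, SE = {standard_error:.4f}")
print(f"CV = {cv:.2%}, range = {data_range:.1f}")
```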
19Robust statistics for precision
- robust statistics
- less sensitive to outliers
- difficult mathematical theory
- requires use of statistical software
- interquartile range
- IQR = 75% quantile − 25% quantile = 3rd quartile − 1st quartile
- mean absolute deviation
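A minimal sketch of both statistics in Python, on hypothetical data with one outlier. Quartile conventions differ between software packages; the default method of `statistics.quantiles` is used here, so Statgraphics may report slightly different quartiles. The mean absolute deviation is taken around the mean, which is an assumption.

```python
from statistics import mean, quantiles

# Hypothetical data with one outlier (50.0)
data = [4.8, 4.9, 5.0, 5.0, 5.1, 5.1, 5.2, 5.2, 5.3, 50.0]

q1, q2, q3 = quantiles(data, n=4)   # quartiles, default "exclusive" method
iqr = q3 - q1                       # interquartile range

centre = mean(data)
mean_abs_dev = mean(abs(x - centre) for x in data)   # mean absolute deviation

print(f"IQR = {iqr:.3f}")
print(f"mean absolute deviation = {mean_abs_dev:.3f}")
print(f"range = {max(data) - min(data):.1f}")   # for comparison: dominated by the outlier
```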
20Graphical displays
- always make graphical displays for first impression
- one picture says more than 1000 words
21Basic graphical displays
- scatter plot
- watch out for scale (automatic resizing)
- time sequence plot
- for detecting time effects like warming up
- Box-and-Whisker plot
- outliers
- quartiles
- skewness
22Time sequence plot
23Box-and-Whisker plot
25Probability theory
- (cumulative) distribution function
- density
- density to distribution function
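26The concept of probability density
[Figure: density function with points a and b marked on the horizontal axis]
the area under the density function between a and b denotes the probability that an observation falls between a and b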
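In formula form, with f the density and F the (cumulative) distribution function of an observation X (symbols chosen here for illustration):

```latex
P(a < X \le b) \;=\; \int_a^b f(x)\,dx \;=\; F(b) - F(a),
\qquad
F(x) \;=\; \int_{-\infty}^{x} f(t)\,dt .
```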
27Normal distribution
28Normal distribution
- bell shaped curve
- Important because of Central Limit Theorem
- Normal distribution
- symmetric around µ (location of centre)
- spread parametrised by σ²
- http://www.win.tue.nl/marko/statApplets/functionPlots.html
- http://www-stat.stanford.edu/naras/jsm/NormalDensity/NormalDensity.html
- µ = 0 and σ² = 1: standard normal distribution Z
29More on normal distribution
- Area between
- µ ± 0.67σ is 0.500
- µ ± 1.00σ is 0.683
- µ ± 1.645σ is 0.900
- µ ± 1.96σ is 0.950
- µ ± 2.00σ is 0.954
- µ ± 2.33σ is 0.980
- µ ± 2.58σ is 0.990
- µ ± 3.00σ is 0.997
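These areas can be checked numerically. The sketch below uses the fact that, for a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2); the function name `central_area` is chosen for this example.

```python
from math import erf, sqrt

def central_area(k):
    """Probability that a normal observation lies within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (0.67, 1.00, 1.645, 1.96, 2.00, 2.33, 2.58, 3.00):
    print(f"mu +/- {k:5.3f} sigma: {central_area(k):.3f}")
```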
30Standardisation
- X normally distributed with parameters µ and σ², then (X − µ)/σ is standard normal
- suppose
- µ = 3
- σ² = 4
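A worked example of standardisation, taking µ = 3 and σ² = 4 (so σ = 2) as the example values. The threshold x = 5 is a hypothetical choice made for this sketch.

```python
from statistics import NormalDist

mu, sigma = 3.0, 2.0          # sigma**2 = 4
x = 5.0                       # hypothetical threshold chosen for illustration

z = (x - mu) / sigma          # standardised value
p_via_z = NormalDist().cdf(z)               # P(Z <= z) for standard normal Z
p_direct = NormalDist(mu, sigma).cdf(x)     # P(X <= x), the same probability

print(f"z = {z}, P(X <= {x}) = {p_via_z:.4f}")
```

Both routes give the same probability, which is why a single table of the standard normal distribution suffices for every normal distribution.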
31Testing normality
- many statistical procedures implicitly assume normality
- if data are not normally distributed, then outcome of procedure may be completely wrong
- user is always responsible for checking assumptions of statistical procedures
- Graphical checks
- normal probability plot
- density trace
- Formal check
- Shapiro-Wilks test
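The idea behind a normal probability plot can be sketched as follows: the ordered observations are plotted against the corresponding quantiles of the standard normal distribution, and normally distributed data then lie close to a straight line. The plotting position (i − 0.5)/n is one common convention and an assumption here; Statgraphics may use a different one.

```python
from statistics import NormalDist

def normal_plot_points(values):
    """Pair each ordered observation with the standard normal quantile of its plotting position."""
    ordered = sorted(values)
    n = len(ordered)
    # Plotting position (i - 0.5) / n is one common convention; others exist.
    return [(NormalDist().inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Hypothetical measurements
for quantile, value in normal_plot_points([5.1, 4.9, 5.2, 5.0, 5.3]):
    print(f"{quantile:6.3f}  {value}")
```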
32Estimation of density function: histogram
curve: normal distribution with sample mean and variance as parameters
33Drawbacks of the histogram
- misused for investigating normality
- time ordering of data is lost
- shape depends heavily on bin width and bin location
[Figure: two histograms for strength of the same data set (frequency versus strength)]
- shape is stable for data sets of size 75 or larger
- optimal number of bins ≈ √n
34Alternative to histogram: Density Trace
- Density Trace (also called naive density estimator)
- use moving bins instead of fixed bins
- choose bin width (automatically in Statgraphics)
- count number of observations in bin at each point
- divide by length of bin
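The steps above can be sketched in Python. The count is also divided by the number of observations, which is the usual normalisation of the naive estimator so that the curve has total area one; the data and the bin width are hypothetical.

```python
def density_trace(data, x, width):
    """Naive density estimate at x: share of observations in a bin of the
    given width centred at x, divided by the bin width."""
    half = width / 2
    count = sum(1 for value in data if x - half <= value < x + half)
    return count / (len(data) * width)

data = [1.2, 1.9, 2.1, 2.4, 2.6, 3.0, 3.3, 4.1, 5.2]   # hypothetical observations
for x in (1, 2, 3, 4, 5, 6):
    print(x, round(density_trace(data, x, width=1.0), 3))
```

Evaluating the function on a fine grid of x values instead of six points gives the moving-bin curve.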
35Density Trace
[Figure: density trace; vertical axis from 1/9 to 4/9, horizontal axis from 1 to 6]
36Choice of bin widths in density trace
- too small bin width yields too fluctuating curve
- too large bin width yields too smooth curve
37Patterns in distribution: normal curve
- Depicted by a bell-shaped curve
- Indicates that measurement process is running normally
38Patterns in distribution: bi-modal curve
- Distribution appears to have two peaks
- May indicate that data from more than one process are mixed together
39Patterns in distribution: saw-toothed
- Also commonly referred to as a comb distribution, appears as an alternating jagged pattern
- Often indicates a measuring problem
- improper gauge readings
- gauge not sensitive enough for readings
40Testing normality
41Normal Probability Plot
42Normally distributed?
43Normal Probability Plot of non-normally distributed data
44Test for Normality: Shapiro-Wilks
- statistical test for Normality: Shapiro-Wilks
- idea: sophisticated regression analysis in the spirit of normal probability plot
- makes Normal Probability Plot objective
- check outliers (measurement error? normality is sometimes disturbed by a single observation)
- analyse if not normally distributed
45Statgraphics: Shapiro-Wilks
Tests for Normality for width
Computed Chi-Square goodness-of-fit statistic = 254.667
P-Value = 0.0
Shapiro-Wilks W statistic = 0.921395
P-Value = 0.000722338
- Interpretation
- value of the statistic itself cannot and need not be interpreted
- P-value indicates how compatible the data are with a normal distribution; a small P-value is evidence against normality
- use α = 0.01 as critical value in order to avoid too strict rejections of normality
46Dixon's test
- Box-and-Whisker plot: graphical test for outliers
- if data are normally distributed, then a formal test (Dixon's test) may be used
47Disadvantages of point estimators
48Confidence intervals
- 95% confidence interval for µ: probability 0.95 that interval contains true value µ
- more observations → narrower interval (effect in particular for n < 20)
- higher confidence → wider interval
- example: α = 0.05 gives a 95% confidence interval
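A minimal sketch of a 95% confidence interval for µ, assuming the standard t-based interval: mean ± t · s/√n. The data are hypothetical, and the critical value 2.262 (0.975 quantile of Student's t with 9 degrees of freedom) is copied from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements, n = 10
data = [5.1, 4.9, 5.2, 5.0, 5.3, 4.8, 5.0, 5.1, 4.9, 5.2]

n = len(data)
t_crit = 2.262    # 0.975 quantile of Student's t, n - 1 = 9 degrees of freedom (from a table)

centre = mean(data)
half_width = t_crit * stdev(data) / sqrt(n)
lower, upper = centre - half_width, centre + half_width

print(f"95% confidence interval for mu: ({lower:.3f}, {upper:.3f})")
```

For other sample sizes the critical value has to be looked up again; it decreases quickly for small n, which is one reason the interval narrows noticeably when n < 20.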
49Confidence intervals example
50Hypothesis testing
- example: test whether there is a systematic error
Hypothesis Tests for meting
Sample mean = 4.994
Sample median = 5.01
t-test
------
Null hypothesis: mean = 5.0
Alternative: not equal
Computed t statistic = -0.155011
P-Value = 0.880233
Do not reject the null hypothesis for alpha = 0.05.
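The t statistic in such output follows from t = (sample mean − hypothesised mean) / (s/√n). The sketch below applies this to hypothetical data, not the data behind the Statgraphics output above, and compares |t| with a critical value taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(values, mu0):
    """One-sample t statistic for the null hypothesis that the mean equals mu0."""
    n = len(values)
    return (mean(values) - mu0) / (stdev(values) / sqrt(n))

# Hypothetical measurements of a reference sample with nominal value 5.0
data = [5.1, 4.9, 5.2, 5.0, 5.3, 4.8, 5.0, 5.1, 4.9, 5.2]

t = t_statistic(data, mu0=5.0)
t_crit = 2.262             # two-sided 5% critical value, 9 degrees of freedom (from a table)
reject = abs(t) > t_crit   # True would indicate a systematic error

print(f"t = {t:.3f}, reject null hypothesis: {reject}")
```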