Title: Matlab Statistics
1. Matlab Statistics
- Statistics toolbox
- Probability distributions
- Hypothesis tests
- Linear regression and response surface modeling
- Statistical process control
- Design of experiments
2. Statistics Toolbox Capabilities
- Descriptive statistics
- Statistical visualization
- Probability distributions
- Hypothesis tests
- Linear models
- Nonlinear Models
- Multivariate Statistics
- Statistical process control
- Design of experiments
- Hidden Markov Models
3. Probability Distributions
- Continuous distributions for data analysis: 21 distributions
  - Includes normal distribution
- Continuous distributions for statistics: 6 distributions
  - Includes chi-square and Student's t distributions
- Discrete distributions: 8 distributions
  - Includes binomial and Poisson distributions
- Multivariate distributions: 10 distributions
  - Includes multivariate extension of normal distribution
- Each distribution has functions for
  - pdf: Probability density function
  - cdf: Cumulative distribution function
  - inv: Inverse cumulative distribution function
  - stat: Distribution statistics function
  - fit: Distribution fitting function
  - like: Negative log-likelihood function
  - rnd: Random number generator
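The suffix convention above can be tried on any distribution by prepending its short name. The sketch below uses the binomial prefix `bino`; the parameter values (10 trials, success probability 0.3) are illustrative assumptions, not values from the slides.

```matlab
% Binomial distribution with n = 10 trials and success probability p = 0.3
% (illustrative values chosen for this sketch).
n = 10;
p = 0.3;

y_pdf  = binopdf(3, n, p);     % P(X = 3)
y_cdf  = binocdf(3, n, p);     % P(X <= 3)
k_inv  = binoinv(0.5, n, p);   % smallest k with P(X <= k) >= 0.5
[m, v] = binostat(n, p);       % mean n*p and variance n*p*(1-p)
r      = binornd(n, p, 1, 5);  % five random draws, each between 0 and n

fprintf('pdf = %.4f, cdf = %.4f, median = %d\n', y_pdf, y_cdf, k_inv);
fprintf('mean = %.2f, variance = %.2f\n', m, v);
```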
4. Normal Distribution Functions
- normpdf: probability density function
- normcdf: cumulative distribution function
- norminv: inverse cumulative distribution function
- normstat: mean and variance
- normfit: parameter estimates and confidence intervals for normally distributed data
- normlike: negative log-likelihood for maximum likelihood estimation
- normrnd: random numbers from normal distribution
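As a quick illustration of how these functions relate to each other, the sketch below evaluates a few of them for the standard normal distribution. The evaluation points (1.96, 0.975) and the `normstat` arguments are illustrative choices, not values from the slides.

```matlab
mu = 0;
sigma = 1;

f0 = normpdf(0, mu, sigma);       % density at the mean, equals 1/sqrt(2*pi)
p  = normcdf(1.96, mu, sigma);    % P(X <= 1.96), roughly 0.975
z  = norminv(p, mu, sigma);       % inverse of the cdf, recovers 1.96
[m, v] = normstat(2, 3);          % mean and variance of N(2, 3^2)

fprintf('pdf(0) = %.4f, cdf(1.96) = %.4f, inv = %.4f\n', f0, p, z);
fprintf('mean = %g, variance = %g\n', m, v);
```

Note that `normstat` returns the variance (sigma squared), while the other functions take the standard deviation as input.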
5. Confidence Intervals Example
- Polymer molecular weight (scaled by 10^-5)
- >> [muhat,sigmahat,muci,sigmaci] = normfit(data,alpha)
  - data: vector or matrix of data
  - alpha: significance level (confidence level = 1 - alpha)
  - muhat: estimated mean
  - sigmahat: estimated standard deviation
  - muci: confidence interval on the mean
  - sigmaci: confidence interval on the standard deviation
- >> x = [1.25 1.36 1.22 1.19 1.33 1.12 1.27 1.27 1.31 1.26]
- >> [muhat,sigmahat,muci,sigmaci] = normfit(x,0.05)
- muhat = 1.2580
- sigmahat = 0.0697
- muci = [1.2081; 1.3079]
- sigmaci = [0.0480; 0.1273]
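The intervals reported above can be reproduced by hand, which makes clear what `normfit` is computing. The sketch below assumes the standard textbook intervals for normal data: a t-based interval for the mean and a chi-square-based interval for the standard deviation.

```matlab
x = [1.25 1.36 1.22 1.19 1.33 1.12 1.27 1.27 1.31 1.26];
alpha = 0.05;
n = numel(x);

muhat    = mean(x);
sigmahat = std(x);                       % sample standard deviation (n-1 divisor)

% Mean: muhat +/- t(1-alpha/2, n-1) * sigmahat / sqrt(n)
tcrit = tinv(1 - alpha/2, n - 1);
muci  = muhat + [-1 1] * tcrit * sigmahat / sqrt(n);

% Standard deviation: sqrt((n-1) * s^2 / chi-square quantile)
chi2hi  = chi2inv(1 - alpha/2, n - 1);   % upper quantile gives the lower limit
chi2lo  = chi2inv(alpha/2, n - 1);       % lower quantile gives the upper limit
sigmaci = sigmahat * sqrt((n - 1) ./ [chi2hi chi2lo]);

fprintf('muhat = %.4f, muci = [%.4f %.4f]\n', muhat, muci);
fprintf('sigmahat = %.4f, sigmaci = [%.4f %.4f]\n', sigmahat, sigmaci);
```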
6. Hypothesis Tests
- 17 hypothesis tests available
- chi2gof: chi-square goodness-of-fit test. Tests if a sample comes from a specified distribution, against the alternative that it does not come from that distribution.
- ttest: one-sample or paired-sample t-test. Tests if a sample comes from a normal distribution with unknown variance and a specified mean, against the alternative that it does not have that mean.
- vartest: one-sample chi-square variance test. Tests if a sample comes from a normal distribution with specified variance, against the alternative that it comes from a normal distribution with a different variance.
7. Mean Hypothesis Test Example
- >> h = ttest(data,m,alpha,tail)
  - data: vector or matrix of data
  - m: expected mean
  - alpha: significance level (confidence level = 1 - alpha)
  - tail: 'left' (left-tailed alternative), 'right' (right-tailed alternative) or 'both' (two-sided alternative)
  - h: 1 (reject hypothesis) or 0 (do not reject hypothesis)
- Measurements of polymer molecular weight
- Hypothesis: μ0 = 1.3 instead of μ1 = 1.2
- >> h = ttest(x,1.3,0.05,'left')
- h = 1
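To see why the test rejects, the decision can be recomputed from the t statistic directly. This is a minimal sketch of the standard one-sample, left-tailed t-test applied to the same ten measurements; it does not call `ttest` itself.

```matlab
x = [1.25 1.36 1.22 1.19 1.33 1.12 1.27 1.27 1.31 1.26];
m0 = 1.3;        % hypothesized mean
alpha = 0.05;    % significance level
n = numel(x);

% t statistic: distance of the sample mean from m0 in standard-error units
tstat = (mean(x) - m0) / (std(x) / sqrt(n));

% Left-tailed p-value: probability of a t value this small or smaller
p = tcdf(tstat, n - 1);

% Reject the hypothesis when the p-value falls below alpha
h = double(p < alpha);

fprintf('tstat = %.4f, p = %.4f, h = %d\n', tstat, p, h);
```

The sample mean of 1.258 lies about 1.9 standard errors below 1.3, which is just past the 5% left-tail critical value for 9 degrees of freedom.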
8. Variance Hypothesis Test Example
- >> h = vartest(data,v,alpha,tail)
  - data: vector or matrix of data
  - v: expected variance
  - alpha: significance level (confidence level = 1 - alpha)
  - tail: 'left' (left-tailed alternative), 'right' (right-tailed alternative) or 'both' (two-sided alternative)
  - h: 1 (reject hypothesis) or 0 (do not reject hypothesis)
- Hypothesis: σ² = 0.0049 and not a different variance
- >> h = vartest(x,0.0049,0.05,'both')
- h = 0
- >> h = vartest(x,0.0049,0.9,'both')
- h = 1
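The two results above differ only in alpha, which is easier to interpret once the p-value is visible. The sketch below recomputes the two-sided chi-square variance test by hand, assuming the usual convention of doubling the smaller tail probability.

```matlab
x = [1.25 1.36 1.22 1.19 1.33 1.12 1.27 1.27 1.31 1.26];
v0 = 0.0049;     % hypothesized variance
n = numel(x);

% Test statistic: (n-1) * s^2 / v0 follows chi-square with n-1 degrees of freedom
chi2stat = (n - 1) * var(x) / v0;

% Two-sided p-value: twice the smaller tail probability
pLow = chi2cdf(chi2stat, n - 1);
p = 2 * min(pLow, 1 - pLow);

h05 = double(p < 0.05);   % decision at alpha = 0.05
h90 = double(p < 0.9);    % decision at alpha = 0.9

fprintf('chi2stat = %.4f, p = %.4f, h(0.05) = %d, h(0.9) = %d\n', ...
        chi2stat, p, h05, h90);
```

The sample variance is very close to 0.0049, so the p-value is large; the hypothesis is rejected only when alpha is set unusually high.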
9. Goodness of Fit Example
- >> h = chi2gof(data)
  - data: vector of data
  - h: 1 (reject hypothesis) or 0 (do not reject hypothesis)
- Default values: alpha = 0.05, nbins = 10
- >> h = chi2gof(x)
- Warning: After pooling, some bins still have low expected counts. The chi-square approximation may not be accurate.
- >> x90 = [x-0.2 x-0.15 x-0.1 x-0.05 x x+0.05 x+0.1 x+0.15 x+0.2]
- >> h = chi2gof(x90)
- h = 0
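The enlarged sample above can be rebuilt and tested with the defaults written out explicitly, and with the p-value requested as a second output. This is a sketch only: the name-value spellings 'NBins' and 'Alpha' follow the toolbox documentation as far as I know, and the exact p-value may vary between releases.

```matlab
x = [1.25 1.36 1.22 1.19 1.33 1.12 1.27 1.27 1.31 1.26];

% Nine shifted copies of the 10 measurements give 90 points,
% enough to avoid the low-expected-count warning seen with x alone.
x90 = [x-0.2 x-0.15 x-0.1 x-0.05 x x+0.05 x+0.1 x+0.15 x+0.2];

% Test against a normal distribution with mean and variance estimated from x90
[h, p] = chi2gof(x90, 'NBins', 10, 'Alpha', 0.05);

fprintf('n = %d, h = %d, p = %.4f\n', numel(x90), h, p);
```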
10. Linear Models
- Linear regression
- Multiple linear regression: build linear models between a group of input variables (factors) and an output variable (response)
- Quadratic response surface models: build response surface models between a group of input variables (factors) and an output variable (response)
- Stepwise regression: determine the most significant terms (linear, interaction, quadratic) to include in a regression model
- Generalized linear models: techniques for nonlinear models linear in the unknown parameters
- Robust and nonparametric methods: techniques that are insensitive to data outliers (robust) or do not assume any underlying distribution (nonparametric)
- Analysis of variance: determine whether data from several groups have a common mean
11. Linear Regression Example
- >> [b,bint] = regress(y,x)
  - x: vector or matrix of input values
  - y: vector of output values
  - b: linear model slope and intercept
  - bint: 95% confidence limits on the slope and intercept
- Reaction rate data
- >> x = [0.1 0.3 0.5 0.7 0.9 1.2 1.5 2.0]'
- >> x = [x ones(size(x))]
- >> y = [2.3 5.7 10.7 13.1 18.5 25.4 32.1 45.2]'
- >> [b,bint] = regress(y,x)
- b =
- 22.5315
- -1.1533
- bint =
- 21.0040 24.0590
- -2.8038 0.4972
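The same slope, intercept and confidence limits can be obtained from ordinary least squares without `regress`, which shows what the function computes. The sketch below assumes the standard t-based interval with n - p residual degrees of freedom.

```matlab
x = [0.1 0.3 0.5 0.7 0.9 1.2 1.5 2.0]';
y = [2.3 5.7 10.7 13.1 18.5 25.4 32.1 45.2]';

% Design matrix: first column is the input, second column of ones is the intercept
X = [x ones(size(x))];

% Least-squares coefficients (slope first, intercept second)
b = X \ y;

% Residual variance and standard errors of the coefficients
[n, p] = size(X);
resid = y - X * b;
s2 = sum(resid.^2) / (n - p);
se = sqrt(s2 * diag(inv(X' * X)));

% 95% confidence limits, one row per coefficient
tcrit = tinv(0.975, n - p);
bint = [b - tcrit * se, b + tcrit * se];

disp(b);
disp(bint);
```

The intercept interval contains zero, so these data do not rule out a line through the origin.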
12. Response Surface Model Example
- >> rstool(x,y,model)
  - x: vector or matrix of input values
  - y: vector or matrix of output values
  - model: 'linear' (constant and linear terms), 'interaction' (linear model plus interaction terms), 'quadratic' (interaction model plus quadratic terms), 'purequadratic' (quadratic model minus interaction terms)
- Creates graphical user interface for model analysis
- VLE data: liquid composition held constant
- x = [300 1; 275 1; 250 1; 300 0.75; 275 0.75; 250 0.75; 300 1.25; 275 1.25; 250 1.25]
- y = [0.75 0.77 0.73 0.81 0.80 0.76 0.72 0.74 0.71]'
13. Response Surface Model Example cont.
- >> rstool(x,y,'linear')
- >> beta = 0.7411 (bias)
- 0.0005 (T)
- -0.1333 (P)
- >> rstool(x,y,'interaction')
- >> beta2 = 0.3011 (bias)
- 0.0021 (T)
- 0.3067 (P)
- -0.0016 (TP)
- >> rstool(x,y,'quadratic')
- >> beta3 = -2.4044 (bias)
- 0.0227 (T)
- 0.0933 (P)
- -0.0016 (TP)
- -0.0000 (TT)
- 0.1067 (PP)
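Because `rstool` is interactive, it can be useful to reproduce the fitted coefficients from the command line. The sketch below is one way to do that, assuming `x2fx` is used to build the model matrix and plain least squares to fit it; it is not necessarily what `rstool` does internally. The variable names `Xlin` and `Xint` are introduced here for illustration.

```matlab
% VLE data: columns of x are T and P
x = [300 1; 275 1; 250 1; 300 0.75; 275 0.75; 250 0.75; ...
     300 1.25; 275 1.25; 250 1.25];
y = [0.75 0.77 0.73 0.81 0.80 0.76 0.72 0.74 0.71]';

% Linear model: columns are constant, T, P
Xlin = x2fx(x, 'linear');
beta = Xlin \ y;

% Interaction model: columns are constant, T, P, T*P
Xint = x2fx(x, 'interaction');
beta2 = Xint \ y;

disp(beta');
disp(beta2');
```

The 'quadratic' model can be fitted the same way, although the large T values make that model matrix less well conditioned.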
14. Statistical Process Control
- Can produce a wide variety of quality control charts
- >> controlchart(data,param1,val1,param2,val2,...)
  - data: data matrix with each row a subgroup of measurements containing replicate observations taken at the same time, and the rows in time order
  - param, val: matching sets of adjustable parameters and their values
  - 'charttype': 'xbar' (mean, default), 's' (standard deviation) or 'i' (individual observation)
  - 'nsigma': number of sigma multiples from the center line (default 3)
- Many other options available
15. Control Chart Example
- >> load parts
- >> who
- Your variables are:
- runout
- >> controlchart(runout)
- >> controlchart(runout,'chart','s')
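To make the role of the subgroup layout and of nsigma concrete, the sketch below computes x-bar chart limits by hand. Everything here is an assumption for illustration: the data are synthetic (not the `parts` data set), and sigma is estimated with a simple pooled within-subgroup standard deviation, which may differ from the estimator `controlchart` uses by default.

```matlab
rng(0);                               % reproducible synthetic data
data = 10 + 0.5 * randn(20, 4);       % hypothetical: 20 subgroups (rows), 4 replicates each
nsigma = 3;                           % sigma multiples from the center line

nrep   = size(data, 2);
xbar   = mean(data, 2);               % one mean per subgroup, in time order
center = mean(xbar);                  % center line of the x-bar chart

% Pooled within-subgroup standard deviation (assumed estimator for this sketch)
sigmaPooled = sqrt(mean(var(data, 0, 2)));

% Control limits for the subgroup means
halfWidth = nsigma * sigmaPooled / sqrt(nrep);
lcl = center - halfWidth;
ucl = center + halfWidth;

% Subgroups whose mean falls outside the limits
outOfControl = find(xbar < lcl | xbar > ucl);

fprintf('center = %.3f, LCL = %.3f, UCL = %.3f, violations = %d\n', ...
        center, lcl, ucl, numel(outOfControl));
```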
16. Design of Experiments
- Full factorial designs
- Fractional factorial designs
- Response surface designs
- Central composite designs
- Box-Behnken designs
- D-optimal designs minimize the volume of the
confidence ellipsoid of the regression estimates
of the linear model parameters
17. Full Factorial Designs
- >> d = fullfact([L1,...,Lk])
  - L1: number of levels for first factor
  - Lk: number of levels for last (kth) factor
  - d: design matrix
- >> d = ff2n(k)
  - k: number of factors
  - d: design matrix for two levels
- >> d = ff2n(3)
- d =
- 0 0 0
- 0 0 1
- 0 1 0
- 0 1 1
- 1 0 0
- 1 0 1
- 1 1 0
- 1 1 1
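A short sketch of both functions follows. The level counts passed to `fullfact` (2 and 3) and the conversion to coded -1/+1 units are illustrative additions, not part of the slides.

```matlab
% Mixed-level full factorial: factor 1 has 2 levels, factor 2 has 3 levels.
% Levels are numbered 1..L, one row per run.
d23 = fullfact([2 3]);

% Two-level full factorial for 3 factors, with levels coded 0 and 1
d2 = ff2n(3);

% Convert 0/1 coding to the -1/+1 coding used by the fractional designs
dCoded = 2 * d2 - 1;

fprintf('fullfact([2 3]) has %d runs; ff2n(3) has %d runs\n', ...
        size(d23, 1), size(d2, 1));
```

The number of runs is the product of the level counts, so two-level designs grow as 2^k with the number of factors.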
18. Fractional Factorial Designs
- >> [d,conf] = fracfact(gen)
  - gen: generator string for the design
  - d: design matrix
  - conf: cell array that describes the confounding pattern
- >> [x,conf] = fracfact('a b c abc')
- x =
- -1 -1 -1 -1
- -1 -1 1 1
- -1 1 -1 1
- -1 1 1 -1
- 1 -1 -1 1
- 1 -1 1 -1
- 1 1 -1 -1
- 1 1 1 1
- conf =
- 'Term' 'Generator' 'Confounding'
- 'X1' 'a' 'X1'
- 'X2' 'b' 'X2'
- 'X3' 'c' 'X3'
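The generator 'abc' means the fourth column is the elementwise product of the first three, which can be checked directly on the design matrix. The check below is a small sketch; the variable names `isProduct` and `gram` are introduced for illustration.

```matlab
[x, conf] = fracfact('a b c abc');

% The fourth factor is generated as the product of factors a, b and c
isProduct = isequal(x(:,4), x(:,1) .* x(:,2) .* x(:,3));

% Columns of a regular two-level design are balanced and mutually orthogonal,
% so x'*x is the number of runs times the identity matrix
gram = x' * x;

fprintf('4th column = a*b*c: %d\n', isProduct);
disp(gram);
```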
19. Fractional Factorial Designs cont.
- >> gens = fracfactgen(model,K,res)
  - model: string containing terms that must be estimable in the design
  - K: 2^K total experiments in the design
  - res: resolution of the design
  - gens: generator strings for use in fracfact
- >> fracfactgen('a b c d e f g',4,4)
- ans =
- 'a'
- 'b'
- 'c'
- 'd'
- 'bcd'
- 'acd'
- 'abd'
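The generators returned above are meant to be passed on to `fracfact`. The sketch below joins them into a single space-separated string first, which is a cautious choice made here so that the call matches the string form used in the earlier `fracfact` example.

```matlab
% Generators for 7 factors in 2^4 = 16 runs at resolution 4
gens = fracfactgen('a b c d e f g', 4, 4);

% Join the cell array into one generator string, e.g. 'a b c d bcd acd abd'
genString = strjoin(gens(:)', ' ');

% Build the design: one row per run, one column per factor
d = fracfact(genString);

fprintf('generators: %s\n', genString);
fprintf('design has %d runs and %d factors\n', size(d, 1), size(d, 2));
```

A full factorial in seven two-level factors would need 128 runs, so this design uses one eighth of them.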
20. Central Composite Designs
- >> d = ccdesign(nfactors,param1,val1,param2,val2,...)
  - nfactors: number of factors
  - d: design matrix
  - param, val: matching sets of adjustable parameters and their values
  - 'fraction': fraction of full factorial design for cube portion expressed as an exponent of ½: 0 (full, default), 1 (½ design), 2 (¼ design)
  - 'type': 'inscribed', 'circumscribed' (default), or 'faced'
  - 'center': number of center points: m (force m center points), 'orthogonal' (achieve orthogonal design, default), 'uniform' (achieve uniform precision)
21. Central Composite Designs cont.
- >> d = ccdesign(2)
- d =
- -1.0000 -1.0000
- -1.0000 1.0000
- 1.0000 -1.0000
- 1.0000 1.0000
- -1.4142 0
- 1.4142 0
- 0 -1.4142
- 0 1.4142
- 0 0
- 0 0
- 0 0
- 0 0
- 0 0
- 0 0
- 0 0
- 0 0
- >> d = ccdesign(5,'fraction',1,'type','inscribed','center',3)
- d =
- -0.5000 -0.5000 -0.5000 -0.5000 0.5000
- -0.5000 -0.5000 -0.5000 0.5000 -0.5000
- -0.5000 -0.5000 0.5000 -0.5000 -0.5000
- -0.5000 -0.5000 0.5000 0.5000 0.5000
- -0.5000 0.5000 -0.5000 -0.5000 -0.5000
- -0.5000 0.5000 -0.5000 0.5000 0.5000
- -0.5000 0.5000 0.5000 -0.5000 0.5000
- -0.5000 0.5000 0.5000 0.5000 -0.5000
- 0.5000 -0.5000 -0.5000 -0.5000 -0.5000
- 0.5000 -0.5000 -0.5000 0.5000 0.5000
- 0.5000 -0.5000 0.5000 -0.5000 0.5000
- 0.5000 -0.5000 0.5000 0.5000 -0.5000
- 0.5000 0.5000 -0.5000 -0.5000 0.5000
- 0.5000 0.5000 -0.5000 0.5000 -0.5000
- 0.5000 0.5000 0.5000 -0.5000 -0.5000
- 0.5000 0.5000 0.5000 0.5000 0.5000
- -1.0000 0 0 0 0
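The rows of a central composite design fall into three groups: cube (factorial) points, axial (star) points, and center points. The sketch below separates them for the two-factor default design shown earlier; the logical masks and the tolerance are helper choices made for this illustration.

```matlab
d = ccdesign(2);
tol = 1e-10;

% Center points: every factor at 0
isCenter = all(abs(d) < tol, 2);

% Cube points: every factor at -1 or +1
isCube = all(abs(abs(d) - 1) < tol, 2);

% Axial points: whatever remains (one factor at +/- the axial distance, the rest 0)
isAxial = ~isCenter & ~isCube;
axialDist = max(abs(d(isAxial, :)), [], 2);

fprintf('cube = %d, axial = %d, center = %d\n', ...
        nnz(isCube), nnz(isAxial), nnz(isCenter));
fprintf('axial distance = %.4f\n', axialDist(1));
```

For the circumscribed type the axial points lie outside the cube, here at sqrt(2); the 'inscribed' type rescales the whole design so the axial points sit at -1 and +1 instead.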
22. Box-Behnken Designs
- >> d = bbdesign(nfactors)
  - nfactors: number of factors
  - d: design matrix
- >> d = bbdesign(3)
- d =
- -1 -1 0
- -1 1 0
- 1 -1 0
- 1 1 0
- -1 0 -1
- -1 0 1
- 1 0 -1
- 1 0 1
- 0 -1 -1
- 0 -1 1
- 0 1 -1
- 0 1 1
- 0 0 0
- 0 0 0
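The matrix above shows the defining property of a Box-Behnken design: every factor takes only the levels -1, 0 and +1, and no run sets all factors to an extreme at once. A small sketch that checks this for three factors follows; the variable names are introduced for illustration.

```matlab
d = bbdesign(3);

% Each run varies at most two factors; the remaining factor stays at 0
nonzeroPerRun = sum(d ~= 0, 2);

% Center points have every factor at 0
isCenter = nonzeroPerRun == 0;

fprintf('runs = %d, center points = %d\n', size(d, 1), nnz(isCenter));
fprintf('max factors varied in one run = %d\n', max(nonzeroPerRun));
```

Because the corners of the cube are never run, this design suits cases where extreme combinations of all factors are impractical.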