Matlab Training Session 12: Statistics II

Provided by: DML92

1
Matlab Training Session 12: Statistics II
Course Website: http://www.queensu.ca/neurosci/Matlab Training Sessions.htm
2
  • Course Outline
  • Term 1
  • 1. Introduction to Matlab and its Interface
  • 2. Fundamentals (Operators)
  • 3. Fundamentals (Flow)
  • 4. Importing Data
  • 5. Functions and M-Files
  • 6. Plotting (2D and 3D)
  • 7. Plotting (2D and 3D)
  • 8. Statistical Tools in Matlab
  • Term 2
  • 9. Term 1 review
  • 10. Loading Binary Data
  • 11. Nonlinear Curve Fitting
  • 12. Statistical Tools in Matlab II
  • 13.
  • 14.

3
  • Week 12 Lecture Outline
  • Statistics II
  • A. Basic Matlab Statistics Review
  • Mean, Median, Variance
  • Statistics Toolbox
  • B. Simple Parametric and Non-parametric statistical
    tests
  • C. Simple Statistical Plotting
  • Histograms
  • Box Plots
  • D. Anovas
  • 1 Way Unrelated Design
  • Post Hoc vs A Priori Comparisons
  • N-Way Anovas
  • Related (Repeated Measures) Design
  • Unrelated (Between Groups) Design

4
  • Week 12 Lecture Outline
  • Required Toolboxes
  • Statistics Toolbox

5
  • Week 12 Lecture Outline
  • Statistics II
  • Part A: Basic Matlab Statistics Review

6
Part A Basics
  • The base Matlab installation contains basic
    statistical tools,
  • including mean, median, standard deviation,
    variance, and correlations
  • More advanced statistics are available from the
    Statistics Toolbox and include parametric and
    non-parametric comparisons, analysis of variance,
    and curve fitting tools

7
Mean and Median
Mean: Average or mean value of a distribution
Median: Middle value of a sorted distribution
M = mean(A), M = median(A)
M = mean(A,dim), M = median(A,dim)
M = mean(A), M = median(A) returns the mean or median value of
vector A. If A is a matrix, mean/median works down each column and
returns a row vector of values; dim selects the dimension to work along.
Example
A = [0 2 5 7 20]
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7]
mean(A) = 6.8
mean(B) = 3.0000 4.5000 6.0000 (column-wise mean)
mean(B,2) = 2.0000 4.0000 6.0000 6.0000 (row-wise mean)
8
Mean and Median
Examples
A = [0 2 5 7 20]
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7]
Mean
mean(A) = 6.8
mean(B) = 3.0 4.5 6.0 (column-wise mean)
mean(B,2) = 2.0 4.0 6.0 6.0 (row-wise mean)
Median
median(A) = 5
median(B) = 3.5 4.5 6.5 (column-wise median)
median(B,2) = 2.0 3.0 6.0 7.0 (row-wise median)
9
Standard Deviation and Variance
  • Standard deviation is calculated using the std()
    function
  • std(X) Calculates the standard deviation of
    vector X
  • If X is a matrix, std() will return the standard
    deviation of each column
  • Variance (defined as the square of the standard
    deviation) is calculated using the var() function
  • var(X) Calculates the variance of vector X
  • If X is a matrix, var() will return the
    variance of each column
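
The column-wise behaviour of std() and var() can be checked with a short script. This is a minimal sketch; the matrix X is made-up data, with one variable per column and one observation per row.

```matlab
% Hypothetical data: each column is one variable, each row one observation.
X = [2 10; 4 20; 4 30; 4 40; 5 50; 5 60; 7 70; 9 80];

s = std(X);   % 1-by-2 row vector: standard deviation of each column
v = var(X);   % 1-by-2 row vector: variance of each column

% Both functions normalise by N-1 by default, so var equals std squared.
maxDiff = max(abs(v - s.^2));
fprintf('std: %s\nvar: %s\n', mat2str(s, 4), mat2str(v, 4));
fprintf('largest |var - std^2| = %g\n', maxDiff);
```

The last line should print a value at or very near zero, which confirms the "square of the standard deviation" definition above.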

10
Standard Error of the Mean
  • Often the most appropriate measure of
    error/variance is the standard error of the mean
  • Matlab does not contain a standard error function
    so it is useful to create your own.
  • The standard error of the mean is defined as the
    standard deviation divided by the square root of
    the number of samples
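
A standard error function along these lines might look like the sketch below. The name sem is a hypothetical choice, and the helper assumes the samples run down the rows (a column vector, or a matrix with one variable per column).

```matlab
% Hypothetical helper: standard deviation divided by sqrt(number of samples).
% Assumes samples are stored along the rows of x.
sem = @(x) std(x) ./ sqrt(size(x, 1));

data = [2; 4; 4; 4; 5; 5; 7; 9];   % made-up sample of 8 observations
se = sem(data);
fprintf('standard error of the mean = %.4f\n', se);
```

For a row vector, size(x,1) is 1, so transpose the data first or replace size(x,1) with numel(x).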

11
  • Week 12 Lecture Outline
  • Statistics II
  • Part B: Parametric and Non-parametric
    statistical tests

12
Comparison of Means
  • A wide variety of mathematical methods exist for
    determining whether the means of different groups
    are statistically different
  • Methods for comparing means can be either
    parametric (assume the data are normally distributed)
    or non-parametric (do not assume a normal
    distribution)


13
Parametric Tests - TTEST
  • [H,P] = ttest2(X,Y)
  • Determines whether the means of samples X and
    Y are statistically different.
  • H returns 0 or 1: 0 means the null hypothesis
    (that the means are the same) cannot be rejected,
    1 means it is rejected at the 5% significance level
  • P returns the p-value of the test
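
A small illustration of the call, assuming the Statistics Toolbox is installed. The two samples x and y are made-up numbers standing in for real data.

```matlab
% Made-up samples with clearly different means.
x = [5.1 4.9 5.0 5.2 4.8 5.3 4.7 5.0];
y = [7.1 6.9 7.0 7.2 6.8 7.3 6.7 7.0];

[h, p] = ttest2(x, y);   % two-sample t-test at the default 5% level

if h == 1
    fprintf('Reject the null hypothesis of equal means (p = %.3g)\n', p);
else
    fprintf('Cannot reject the null hypothesis (p = %.3g)\n', p);
end
```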


15
Parametric Tests - TTEST
  • Example
  • For the data from Week 8, exercise 3
  • >> [H,P] = ttest2(var1,var2)
  • H = 1
  • P = 0.00000000000014877

[Figure: plots of Variable 1 and Variable 2]

16
Non-Parametric Tests - Ranksum
  • The Wilcoxon rank sum test assesses whether two
    groups are statistically different from each
    other; it compares their medians rather than
    their means.
  • This test is non-parametric and should be used
    when data is not normally distributed
  • Matlab implements the Wilcoxon rank sum test using
    the ranksum() function
  • ranksum(X,Y) statistically compares the medians of
    two data distributions X and Y
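
A short sketch of ranksum() on made-up data, assuming the Statistics Toolbox is installed. The sample y contains one extreme value, the kind of case where a rank-based test is often preferred. Note that ranksum returns the p-value first and the decision second.

```matlab
% Made-up samples; y includes one extreme value (20.0).
x = [1.2 2.3 1.8 2.9 2.1 1.5];
y = [3.6 4.1 3.9 5.2 4.4 20.0];

[p, h] = ranksum(x, y);   % p-value first, then the 0/1 decision at the 5% level
fprintf('p = %.3g, h = %d\n', p, h);
```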


17
Non-Parametric Tests - RankSum
  • Example
  • For the data from week 8, exercise 3
  • >> [P,H] = ranksum(var1,var2)
  • P = 1.1431e-014
  • H = 1

[Figure: plots of Variable 1 and Variable 2]
18
  • Week 12 Lecture Outline
  • Statistics II
  • Part C: Simple Statistical Plotting
  • Histograms
  • Box Plots

19
Histograms
  • Histograms are useful for showing the pattern of
    the whole data set
  • Allows the shape of the distribution to be easily
    visualized

20
Histograms
  • Matlab's hist(y,m) command will generate a
    frequency histogram of vector y distributed among
    m bins
  • You can also use hist(y,x), where x is a vector
    defining the bin centers
  • Example
  • >> b = sin(2*pi*t)
  • >> hist(b,10)
  • >> hist(b,[-1 -0.75 0 0.25 0.5 0.75 1])
21
Histograms
  • The histc function is a bit more powerful and
    allows bin edges to be defined
  • [n, bin] = histc(x, binrange)
  • x: statistical distribution
  • binrange: the range of bins to plot, e.g. 1:1:10
  • n: the number of elements in each bin from
    vector x
  • bin: the bin number each element of x belongs to
  • Use the bar function to plot the histogram

22
Histograms
  • The histc function is a bit more powerful and
    allows bin edges to be defined
  • Example
  • >> test = round(rand(100,1)*10)
  • >> n = histc(test,1:1:10)
  • >> bar(1:10,n)
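
The two outputs of histc are easier to see on a small fixed vector than on random data. The sketch below uses made-up values; note that the counts returned by histc are what get passed to bar, not the raw data.

```matlab
x = [1 2 2 3 5 10 0];     % made-up data; 0 lies below the first edge
edges = 1:1:10;           % bin edges

[n, bin] = histc(x, edges);
% n   = [1 2 1 0 1 0 0 0 0 1]  counts per bin (last bin counts x == 10)
% bin = [1 2 2 3 5 10 0]       bin index per element (0 = outside all bins)

figure('Visible', 'off');  % hidden figure so the script also runs unattended
bar(edges, n);             % plot the counts against the bin edges
```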

23
Box Plots
  • Box plots are useful to graphically display the
    median and spread of distributions, as well as
    the interquartile range and outliers

24
Box Plots
  • The Matlab function boxplot(x) will generate a
    box plot of the distribution defined by x
  • Example
  • % add outlier to test distribution
  • >> test(101) = 16
  • >> boxplot(test)

25
Box Plots
  • The box has lines at the lower quartile, median,
    and upper quartile values.
  • The whiskers are lines extending from each end of
    the box to show the extent of the rest of the
    data.
  • Outliers are data with values beyond the ends of
    the whiskers.
  • If there is no data outside the whisker, a dot is
    placed at the bottom whisker.
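
The quantities a box plot draws can also be computed directly. This is a rough sketch rather than a description of boxplot's internals: it assumes the Statistics Toolbox function prctile and a whisker limit of 1.5 times the interquartile range, and the sample x is made up.

```matlab
% Made-up sample with one extreme value appended.
x = [1 2 3 4 5 6 7 8 9 10 30]';

q = prctile(x, [25 50 75]);   % lower quartile, median, upper quartile
boxHeight = q(3) - q(1);      % interquartile range

% Assumption: whiskers reach at most 1.5 * IQR beyond the box edges.
lowerLimit = q(1) - 1.5 * boxHeight;
upperLimit = q(3) + 1.5 * boxHeight;
outliers = x(x < lowerLimit | x > upperLimit);

fprintf('quartiles: %s\n', mat2str(q, 4));
fprintf('outliers:  %s\n', mat2str(outliers'));
```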


26
Box Plots
  • boxplot(X,notch) with notch = 1 produces a
    notched-box plot.
  • Notches graph a robust estimate of the
    uncertainty about the medians for box-to-box
    comparison. The default, notch = 0, produces a
    rectangular box plot.
  • Example
  • >> test2 = test + (rand*10)
  • >> boxplot([test test2],1)

27
  • Week 12 Lecture Outline
  • Statistics II
  • D. Anovas
  • 1 Way Unrelated Design
  • Post Hoc vs A Priori Comparisons
  • N-Way Anovas
  • Unrelated (Between Groups) Design
  • Related (Repeated Measures) Design

28
Anovas
  • ANOVAs are tests used to make direct comparisons
    between the amount by which sample means vary and
    the amount that values in each sample vary around
    the group means

30
Anovas
  • Terminology
  • Null Hypothesis: Both means are the same
  • Type I error
  • Rejecting the null hypothesis when it is true, e.g.
    the means are not actually different when p < 0.05
  • Type II error
  • Accepting the null hypothesis when it is false, e.g.
    the means are actually different when p > 0.05

31
Anovas
Beta: Probability of making a Type II error
Alpha: Probability of making a Type I error
P < 0.05
32
Anovas
  • Terminology
  • Family Wise Error
  • The probability of making at least 1 Type I
    error while making multiple comparisons
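
How quickly this probability grows can be sketched numerically. The calculation below assumes the comparisons are independent and each uses the same alpha, in which case the family-wise error rate is 1 - (1 - alpha)^m for m comparisons. The values of m are arbitrary examples.

```matlab
alpha = 0.05;        % per-comparison Type I error rate
m = [1 3 6 10];      % example numbers of comparisons (arbitrary choices)

% Chance of at least one Type I error across m independent comparisons.
fwer = 1 - (1 - alpha).^m;

for k = 1:numel(m)
    fprintf('%2d comparisons: family-wise error = %.3f\n', m(k), fwer(k));
end
```

With 3 comparisons the rate is already about 0.143, nearly three times the nominal 0.05.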

33
1 way Anovas
  • The Matlab function anova1 calculates a 1 way
    anova
  • p = anova1(X) performs a balanced 1-way ANOVA
    comparing the means of the columns of data in the
    matrix X
  • Each column must represent an independent
    sample containing m mutually independent
    observations.
  • The function returns the p-value for the null
    hypothesis that all column means are equal
  • p = anova1(X,group)
  • group: Each row of group contains the data
    label for the corresponding column of X
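
A minimal sketch of the matrix form, assuming the Statistics Toolbox is installed. The data and the group labels are made up. The third argument 'off' suppresses the ANOVA table and box plot figures that anova1 opens by default.

```matlab
% Made-up data: three independent samples (columns), five observations each.
X = [10 10 20;
     11  9 21;
      9 11 19;
     10 10 22;
     12 11 20];

group = {'A'; 'B'; 'C'};        % hypothetical labels, one row per column of X

p = anova1(X, group, 'off');    % 'off' = no table or box plot figures
fprintf('p-value for equal column means: %.3g\n', p);
```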

34
1 way Anovas
  • Assumptions
  • All sample populations are normally distributed
  • All sample populations have equal variance
  • All observations are mutually independent
  • The ANOVA test is known to be robust to modest
    violations of the first two assumptions.
35
1 way Anovas
  • The standard ANOVA table divides the variability
    of the data in X into two parts
  • Variability due to the differences among the
    column means (variability between groups)
  • Variability due to the differences between the
    data in each column and the column mean
    (variability within groups)

36
1 way Anovas
  • The ANOVA table has six columns
  • Source of the variability
  • The Sum of Squares (SS) due to each source.
  • The degrees of freedom (df) associated with each
    source.
  • Mean Squares (MS) for each source, which is the
    ratio SS/df.
  • F statistic, which is the ratio of the MS's.
  • The p-value, which is derived from the cdf of F.
    As F increases, the p-value decreases.
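
The table entries can be reproduced by hand for a balanced design, which makes the SS, df, MS and F relationships concrete. This is a sketch on made-up data, not the toolbox's own implementation. It assumes fcdf from the Statistics Toolbox for the final p-value.

```matlab
% Made-up balanced data: k = 3 groups (columns), m = 5 observations each.
X = [10 10 20; 11 9 21; 9 11 19; 10 10 22; 12 11 20];
[m, k] = size(X);

grandMean = mean(X(:));
colMeans  = mean(X);                                   % 1-by-k group means

ssBetween = m * sum((colMeans - grandMean).^2);        % between-groups SS
ssWithin  = sum(sum((X - repmat(colMeans, m, 1)).^2)); % within-groups SS
ssTotal   = sum((X(:) - grandMean).^2);                % equals the two above added

dfBetween = k - 1;
dfWithin  = k * m - k;

msBetween = ssBetween / dfBetween;                     % MS = SS / df
msWithin  = ssWithin / dfWithin;
F = msBetween / msWithin;                              % ratio of the mean squares
p = 1 - fcdf(F, dfBetween, dfWithin);                  % upper tail of the F cdf

fprintf('F(%d,%d) = %.2f, p = %.3g\n', dfBetween, dfWithin, F, p);
```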

37
1 way Anovas
Example 1: The following example comes from a
study of the material strength of structural
beams in Hogg (1987). The vector strength
measures the deflection of a beam in thousandths
of an inch under 3,000 pounds of force. Stronger
beams deflect less. The civil engineer performing
the study wanted to determine whether the
strength of steel beams was equal to the strength
of two more expensive alloys.
38
1 way Anovas
Example 1: Steel is coded 'st' in the vector
alloy. The other materials are coded 'al1' and
'al2'.
>> strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
               79 77 78 82 79];
>> alloy = {'st','st','st','st','st','st','st','st',...
            'al1','al1','al1','al1','al1','al1',...
            'al2','al2','al2','al2','al2','al2'};
Though alloy is sorted in this example, you
do not need to sort the grouping variable.
39
1 way Anovas
Solution
>> p = anova1(strength,alloy)
p = 1.5264e-004
The p-value indicates that the
three alloys are significantly different. The box
plot confirms this graphically and shows that the
steel beams deflect more than the more expensive
alloys.
40
1 way Anovas
41
Post Hoc and A Priori Comparisons
  • If a 1 way anova test indicates a significant
    difference, at least one mean differs from the
    others, but the test does not say which
  • Post Hoc Comparisons: The decision to compare
    means is made after a significant 1 way anova is
    calculated. When all possible comparisons are
    made after the fact, the chances of Type I error
    become high.
  • A Priori Comparisons: Comparisons decided upon
    before the 1 way anova is performed, based on the
    general theory of the study. This minimizes
    possible Type I error.
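
One way to run post hoc comparisons in Matlab is the Statistics Toolbox function multcompare, which is not mentioned in the slides and is shown here only as a possible approach. It takes the stats output of anova1 and adjusts the comparisons for the number of pairs tested. The data are made up.

```matlab
% Made-up data: three groups (columns), five observations each.
X = [10 10 20; 11 9 21; 9 11 19; 10 10 22; 12 11 20];

% [] keeps the default column labels; 'off' suppresses the figures.
[p, tbl, stats] = anova1(X, [], 'off');

% All pairwise comparisons; the interactive graph is switched off.
c = multcompare(stats, 'display', 'off');

% Each row of c describes one pair: columns 1 and 2 hold the group indices.
disp(c(:, 1:2));
```

Three groups give three pairwise comparisons, so c has three rows.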

42
N-way Anovas
  • Unrelated (Between Groups) Design
  • p = anovan(X,group) performs a balanced or
    unbalanced multi-way ANOVA for comparing the means
    of the observations in vector X with respect to N
    different factors.
  • The factors and factor levels of the observations
    in X are assigned by the cell array group.
  • Each of the N cells in group contains a list of
    factor levels identifying the observations in X
    with respect to one of the N factors.
  • The list within each cell can be a vector,
    character array, or cell array of strings, and
    must have the same number of elements as X.
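
A two-factor sketch of the call, assuming the Statistics Toolbox is installed. The observations and factor levels are illustrative numbers only. One factor is given as a numeric vector and the other as a cell array of strings, to show that the list types can be mixed.

```matlab
% Illustrative observations and two grouping factors, one entry per observation.
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2]';                          % factor 1: numeric levels
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};   % factor 2: string levels

% One p-value per factor (main effects only); the table figure is switched off.
p = anovan(y, {g1, g2}, 'display', 'off');
fprintf('p(factor 1) = %.3g, p(factor 2) = %.3g\n', p(1), p(2));
```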

43
N-way Anovas
  • Related (Repeated Measures) Design
  • NOT IMPLEMENTED IN THE STATISTICS TOOLBOX!!

44
Exercise
  • Load testdata2.txt from week 8
  • Assume the data columns represent independent
    normally distributed variables
  • Perform a 1 way ANOVA on the data and interpret
    the results

45
Getting Help
  • Help and Documentation
  • Digital
  • Accessible Help from the Matlab Start Menu
  • Updated online help from the Matlab Mathworks
    website
  • http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
  • Matlab command prompt function lookup
  • Built in Demos
  • Websites
  • Hard Copy
  • Books, Guides, Reference
  • The Student Edition of Matlab pub. Mathworks Inc.