Title: Matlab Training Sessions 8: Introduction to Statistics
1 Matlab Training Sessions 8: Introduction to Statistics
2 Course Outline
- Weeks
  - Introduction to Matlab and its Interface (Jan 13 2009)
  - Fundamentals (Operators)
  - Fundamentals (Flow)
  - Functions and M-Files
  - Importing Data
  - Plotting (2D and 3D)
  - Plotting (2D and 3D)
  - Statistical Tools in Matlab
- Additional classes will begin next week (Feb 10 2009) and will continue from where the first 8 sessions left off.
- These sessions will be run by Andrew Pruszynski (4jap1_at_qlink.queensu.ca)
- Course Website
  - http://www.queensu.ca/neurosci/matlab.php
3 Week 8 Lecture Outline
- Basic Matlab Statistics
  - Mean, Median, Variance
  - Correlations
- Statistics Toolbox
  - Parametric and Non-parametric statistical tests
  - Curve fitting
4 Part A: Basics
- The Matlab installation contains basic statistical tools, including mean, median, standard deviation, variance, and correlations.
- More advanced statistics are available from the Statistics Toolbox and include parametric and non-parametric comparisons, analysis of variance, and curve fitting tools.
5 Mean and Median
- Mean: the average value of a distribution
- Median: the middle value of a sorted distribution
- M = mean(A), M = median(A)
- M = mean(A,dim), M = median(A,dim)
- M = mean(A) or M = median(A) returns the mean or median value of vector A. If A is a matrix or multidimensional array, mean/median returns an array of values (for a matrix, one value per column). With dim, the statistic is taken along dimension dim (1 = down each column, 2 = across each row).
- Example
    A = [0 2 5 7 20]
    B = [1 2 3
         3 3 6
         4 6 8
         4 7 7]
    mean(A)   = 6.8
    mean(B)   = 3.0000 4.5000 6.0000           (column-wise mean)
    mean(B,2) = 2.0000 4.0000 6.0000 6.0000    (row-wise mean)
6 Mean and Median
- Examples
    A = [0 2 5 7 20]
    B = [1 2 3
         3 3 6
         4 6 8
         4 7 7]
- Mean
    mean(A)   = 6.8
    mean(B)   = 3.0 4.5 6.0        (column-wise mean)
    mean(B,2) = 2.0 4.0 6.0 6.0    (row-wise mean)
- Median
    median(A)   = 5
    median(B)   = 3.5 4.5 6.5      (column-wise median)
    median(B,2) = 2.0 3.0 6.0 7.0  (row-wise median)
7 Standard Deviation and Variance
- Standard deviation is calculated using the std() function
  - std(X): calculate the standard deviation of vector X
  - If X is a matrix, std() will return the standard deviation of each column
- Variance (defined as the square of the standard deviation) is calculated using the var() function
  - var(X): calculate the variance of vector X
  - If X is a matrix, var() will return the variance of each column
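As a quick check of the two functions above, here is a minimal sketch that reuses the A and B arrays from the Mean and Median slides. The variable names sA, vA, sB and vB are chosen here for illustration only. Both std() and var() normalise by N-1 by default.

```matlab
% Data reused from the Mean and Median slides
A = [0 2 5 7 20];
B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];

sA = std(A);   % standard deviation of a vector
vA = var(A);   % variance of a vector (61.7 for this A)

sB = std(B);   % 1-by-3 row vector: one standard deviation per column
vB = var(B);   % 1-by-3 row vector: one variance per column

% The variance is the square of the standard deviation
disp(abs(sA^2 - vA) < 1e-10)
disp(all(abs(sB.^2 - vB) < 1e-10))
```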
8 Standard Error of the Mean
- Often the most appropriate measure of error/variance is the standard error of the mean
- Matlab does not contain a standard error function so it is useful to create your own.
- The standard error of the mean is defined as the standard deviation divided by the square root of the number of samples
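Written as a formula, with s for the sample standard deviation and n for the number of samples (both symbols are chosen here and do not come from the slides):

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}}
```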
9 Standard Error of the Mean
- In Class Exercise 1
- Create a function called se that calculates the standard error of some vector supplied to the function
- E.g. se(x) should return the standard error of vector x
10 Standard Error of the Mean
- In Class Exercise 1 Solution
    function result = se(input_vect)
    % SE  Standard error of the mean of a vector
    result = std(input_vect)/sqrt(length(input_vect));
    return
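The solution above is meant for vectors: length() returns the largest dimension of a matrix, so it is not a reliable sample count for matrix input. A minimal sketch of a column-wise variant follows, assuming each column holds one subject and each row one sample. The name se_cols and the data are hypothetical and not part of the course material.

```matlab
% Hypothetical column-wise standard error (assumes samples run down the rows).
% std(x, 0, 1) is the default N-1 standard deviation along dimension 1,
% and size(x, 1) is the number of samples per column.
se_cols = @(x) std(x, 0, 1) ./ sqrt(size(x, 1));

% Made-up data: 3 samples (rows) for 2 subjects (columns)
data = [1 2; 3 4; 5 9];
result = se_cols(data);   % 1-by-2 row vector, one standard error per column
disp(result)
```

A row vector passed to se_cols would be treated as one sample per column, so it should be transposed first.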
11 In Class Exercise 2
- From the class website download the file testdata1.txt (http://www.queensu.ca/neurosci/matlab.php)
- This text file contains data from two subjects arranged in columns
- Load the text file into Matlab using any method you like (load, import, textread(), fscanf())
- Calculate the mean and standard error for each subject
- In figure 1, plot the data distribution for each subject using the hist() plotting function
- In figure 2, plot the mean and standard error of each subject using a bar graph (bar() and errorbar() functions).
12 In Class Exercise 2 Solution
    % read data
    [subj1, subj2] = textread('testdata1.txt','%f%f','headerlines',1);
    % plot distributions of each subject
    figure(1)
    subplot(2,1,1)
    hist(subj1)
    subplot(2,1,2)
    hist(subj2)
    % plot mean and standard error on bar graph
    figure(2)
    hold on
    bar([1,2],[mean(subj1),mean(subj2)])
    errorbar([1,2],[mean(subj1),mean(subj2)],[se(subj1),se(subj2)],'r')
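MathWorks no longer recommends textread() in recent Matlab releases and points to textscan() instead. The sketch below shows the same two-column read with textscan(). Because testdata1.txt is not reproduced here, it first writes a small stand-in file with made-up numbers; the file name testdata_demo.txt and the assumed layout (one header line, two numeric columns) are illustrative guesses based on the textread call in the solution.

```matlab
% Write a small stand-in data file (made-up values, one header line)
fname = fullfile(tempdir, 'testdata_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'subj1 subj2\n');
fprintf(fid, '%f %f\n', [1.5 2.5; 2.0 3.0; 2.5 3.5]');   % one row per line
fclose(fid);

% Read two numeric columns, skipping the header line
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f', 'HeaderLines', 1);
fclose(fid);

subj1 = C{1};   % first column as a column vector
subj2 = C{2};   % second column as a column vector
delete(fname);  % remove the temporary file

disp([mean(subj1), mean(subj2)])
```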
13 In Class Exercise 2 Solution
[Figure: output plots for Subject 1 and Subject 2]
16 Data Correlations
- Matlab can calculate statistical correlations using the corrcoef() function
- [R,P] = corrcoef(A,B)
- Calculates a matrix R of correlation coefficients and a matrix P of significance values (p-values, conventionally judged against the 95% confidence level, i.e. P < 0.05) for variables A and B

    R =        A        B
         A   AcorA    BcorA     =     1        BcorA
         B   AcorB    BcorB           AcorB    1

    P =        A             B
         A   sig(AcorA)   sig(BcorA)     =     1             sig(BcorA)
         B   sig(AcorB)   sig(BcorB)           sig(AcorB)    1
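To make the layout of R and P concrete, here is a minimal sketch on synthetic data. The vectors x and y are invented for illustration and are unrelated to the class data files.

```matlab
% Synthetic data, invented for illustration only
x = (1:10)';
y = 2*x + [0.5 -0.3 0.2 -0.6 0.1 0.4 -0.2 0.3 -0.5 0.1]';

[R, P] = corrcoef(x, y);   % both outputs are 2-by-2 matrices

r_xy = R(1,2);   % correlation between x and y (same value as R(2,1))
p_xy = P(1,2);   % p-value for the null hypothesis of no correlation

fprintf('r = %.4f, p = %.3g\n', r_xy, p_xy);
```

The off-diagonal entries carry the information of interest; the diagonal of R is always 1 because each variable is perfectly correlated with itself.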
20 Data Correlations
    % Compute sample correlation
    [r, p] = corrcoef(var1,var2)
    r =
        1.0000    0.7051
        0.7051    1.0000
    p =
        1.0000    0.0000
        0.0000    1.0000
[Figure: plot of Variable 1 and Variable 2]
21 In Class Exercise 3
- From the class website download the file testdata2.txt (http://www.queensu.ca/neurosci/matlab.php)
- This text file contains data from two variables arranged in columns
- Load the text file into Matlab using any method you like (load, import, textread(), fscanf())
- Plot the data points
- Calculate the correlation
22 In Class Exercise 3 Solution
    % read data
    [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
    % Compute sample correlation
    r = corrcoef(var1,var2)
    % Plot data points
    figure(1)
    plot(var1,var2,'ro')
[Figure: plot of Variable 1 and Variable 2]
23 Part B: Statistics Toolbox
- The Statistics Toolbox contains a large array of statistical tools.
- This lecture will concentrate on some of the most commonly used statistics for research
  - Parametric and non-parametric comparisons
  - Curve Fitting
24 Comparison of Means
- A wide variety of mathematical methods exist for determining whether the means of different groups are statistically different
- Methods for comparing means can be either parametric (assumes the data are normally distributed) or non-parametric (does not assume a normal distribution)
25 Parametric Tests - TTEST
- [H,P] = ttest2(X,Y)
- Determines whether the means of samples X and Y are statistically different (two-sample t-test).
- H returns 0 or 1: 0 means the null hypothesis (that the means are the same) cannot be rejected, 1 means it is rejected at the default 5% significance level
- P returns the p-value of the test
27 Parametric Tests - TTEST
- Example
- For the data from exercise 3
    >> [H,P] = ttest2(var1,var2)
    H = 1
    P = 0.00000000000014877
[Figure: plot of Variable 1 and Variable 2]
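For readers without the class data file, the same call can be tried on made-up numbers. This is a minimal sketch that assumes the Statistics Toolbox is installed; groupA and groupB are invented samples whose means are clearly different.

```matlab
% Made-up samples with clearly different means (illustration only)
groupA = [5.1 4.9 5.6 5.8 5.2 5.5 5.0 5.3];
groupB = [6.9 7.2 6.8 7.5 7.1 6.7 7.3 7.0];

% Two-sample t-test at the default 5% significance level
[H, P] = ttest2(groupA, groupB);

% H is 1 when the null hypothesis of equal means is rejected
fprintf('H = %d, P = %.3g\n', H, P);
```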
28 Non-Parametric Tests - Ranksum
- The Wilcoxon rank sum test assesses whether two groups are statistically different from each other by comparing ranks; its null hypothesis is that the two groups have equal medians.
- This test is non-parametric and should be used when the data are not normally distributed
- Matlab implements the Wilcoxon rank sum test using the ranksum() function
- [P,H] = ranksum(X,Y) statistically compares two data distributions X and Y
29 Non-Parametric Tests - RankSum
- Example
- For the data from exercise 3
    >> [P,H] = ranksum(var1,var2)
    P = 1.1431e-014
    H = 1
[Figure: plot of Variable 1 and Variable 2]
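The rank sum test can be tried in the same way on made-up numbers. This minimal sketch again assumes the Statistics Toolbox; groupA and groupB are invented samples. Note that ranksum() returns the p-value first and the decision H second, the opposite order to ttest2().

```matlab
% Made-up samples with no overlap between the groups (illustration only)
groupA = [5.1 4.9 5.6 5.8 5.2 5.5 5.0 5.3];
groupB = [6.9 7.2 6.8 7.5 7.1 6.7 7.3 7.0];

% Wilcoxon rank sum test: p-value first, then the 0/1 decision
[P, H] = ranksum(groupA, groupB);

fprintf('P = %.3g, H = %d\n', P, H);
```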
30 Curve Fitting
- Plotting a line of best fit in Matlab can be performed using either a traditional least squares fit or a robust fitting method.
[Figure: data with two lines of best fit, labelled "Least squares" and "Robust" (x axis 1 to 10, y axis -2 to 12)]
31 Curve Fitting
- A least squares linear fit minimizes the sum of the squared distances between the data points and the line of best fit
- polyfit(X,Y,N) finds the coefficients of a polynomial P(X) of degree N that fits the data
  - Uses least-squares minimization
  - N = 1 (linear fit)
- P = polyfit(X,Y,N) returns P, a vector containing the slope and the y intercept (in that order) for a linear fit
- Y = polyval(P,X) calculates the Y values for every X point on the line of best fit
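The coefficient order returned by polyfit() is easy to confirm on data that lie exactly on a known line. A minimal sketch with made-up values (slope 3, y intercept 2); the variable names are illustrative.

```matlab
% Points on the exact line y = 3x + 2 (made-up example)
x = 1:10;
y = 3*x + 2;

P = polyfit(x, y, 1);    % degree 1: P(1) is the slope, P(2) is the y intercept
yfit = polyval(P, x);    % fitted values at each x

fprintf('slope = %.4f, intercept = %.4f\n', P(1), P(2));
```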
32 Curve Fitting
- Example
- Draw a line of best fit using least squares approximation for the data in exercise 3
    [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
    P = polyfit(var1,var2,1);
    Y = polyval(P,var1);
    close all
    figure(1)
    hold on
    plot(var1,var2,'ro')
    plot(var1,Y)
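33 Curve Fitting
- A least squares linear fit minimizes the sum of the squared distances between the data points and the line of best fit, so a few outliers can pull the line strongly
- A robust fit reduces the influence of outliers on the line of best fit
- P = robustfit(X,Y) returns the vector P containing the y intercept and the slope (in that order), obtained by performing a robust linear fit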
34 Curve Fitting
- Example
- Draw a line of best fit using robust fit approximation for the data in exercise 3
    [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
    P = robustfit(var1,var2);           % P(1) = y intercept, P(2) = slope
    Y = polyval([P(2),P(1)],var1);      % polyval expects [slope, intercept]
    close all
    figure(1)
    hold on
    plot(var1,var2,'ro')
    plot(var1,Y)
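The difference between the two fitting methods shows up most clearly when the data contain an outlier. The following is a minimal sketch on synthetic data, assuming the Statistics Toolbox is available for robustfit(). The data follow a line of slope 2 and y intercept 1 with small made-up noise, and the last point is replaced by an outlier. All names and values are illustrative.

```matlab
% Synthetic data near y = 2x + 1 with small made-up noise
x = (1:10)';
y = 2*x + 1 + [0.1 -0.2 0.15 -0.1 0.05 -0.15 0.2 -0.05 0.1 0]';
y(10) = 60;                      % outlier (the line would give about 21)

P_ls = polyfit(x, y, 1);         % least squares: [slope, intercept]
B    = robustfit(x, y);          % robust fit:    [intercept; slope]

slope_ls     = P_ls(1);
slope_robust = B(2);

fprintf('least squares slope = %.3f, robust slope = %.3f\n', ...
        slope_ls, slope_robust);

% Fitted lines for plotting, e.g. plot(x, y, 'ro', x, Y_ls, x, Y_robust)
Y_ls     = polyval(P_ls, x);
Y_robust = polyval([B(2), B(1)], x);
```

The least squares slope is pulled upward by the single outlier, while the robust slope stays close to the slope of the remaining points.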
35 Ideas for Next Term?
- Additional Statistics, ANOVAs, etc.
- Curve fitting with quadratic functions and cubic splines
- Algorithms and Data structures
- Improving Program Execution Time
- Assistance Tutorials for individual programming problems
- Any Suggestions?
36 Getting Help
- Help and Documentation
  - Digital
    - Accessible Help from the Matlab Start Menu
    - Updated online help from the Matlab Mathworks website
      - http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
    - Matlab command prompt function lookup
    - Built in Demos
    - Websites
  - Hard Copy
    - Books, Guides, Reference
    - The Student Edition of Matlab pub. Mathworks Inc.