Matlab Training Sessions 8: Introduction to Statistics

Slides: 37
Provided by: DML92
1
Matlab Training Sessions 8: Introduction to Statistics
2
  • Course Outline
  • Weeks
  • Introduction to Matlab and its Interface (Jan 13
    2009)
  • Fundamentals (Operators)
  • Fundamentals (Flow)
  • Functions and M-Files
  • Importing Data
  • Plotting (2D and 3D)
  • Plotting (2D and 3D)
  • Statistical Tools in Matlab
  • Additional classes will begin next week (Feb 10
    2009) and will continue from where the first 8
    sessions left off.
  • These sessions will be run by Andrew Pruszynski
    (4jap1_at_qlink.queensu.ca)
  • Course Website
  • http://www.queensu.ca/neurosci/matlab.php

3
  • Week 8 Lecture Outline
  • Basic Matlab Statistics
  • Mean, Median, Variance
  • Correlations
  • Statistics Toolbox
  • Parametric and Non-parametric statistical tests
  • Curve fitting

4
Part A: Basics
  • The Matlab installation contains basic
    statistical tools.
  • These include the mean, median, standard deviation,
    variance, and correlations
  • More advanced statistics are available from the
    statistics toolbox and include parametric and
    non-parametric comparisons, analysis of variance
    and curve fitting tools

5
Mean and Median
Mean: Average or mean value of a distribution
Median: Middle value of a sorted distribution
M = mean(A), M = median(A)
M = mean(A,dim), M = median(A,dim)
M = mean(A), M = median(A): Returns the mean or median value of
vector A. If A is a matrix, mean/median returns a row vector
containing the mean or median of each column.
Example
A = [0 2 5 7 20]
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7]
mean(A) = 6.8
mean(B) = 3.0000 4.5000 6.0000 (column-wise mean)
mean(B,2) = 2.0000 4.0000 6.0000 6.0000 (row-wise mean)
6
Mean and Median
Examples
A = [0 2 5 7 20]
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7]
Mean
mean(A) = 6.8
mean(B) = 3.0 4.5 6.0 (column-wise mean)
mean(B,2) = 2.0 4.0 6.0 6.0 (row-wise mean)
Median
median(A) = 5
median(B) = 3.5 4.5 6.5 (column-wise median)
median(B,2) = 2.0 3.0 6.0 7.0 (row-wise median)
7
Standard Deviation and Variance
  • Standard deviation is calculated using the std()
    function
  • std(X) Calculates the standard deviation of
    vector X
  • If X is a matrix, std() will return the standard
    deviation of each column
  • Variance (defined as the square of the standard
    deviation) is calculated using the var() function
  • var(X) Calculates the variance of vector X
  • If X is a matrix, var() will return the
    variance of each column
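
The behaviour of std() and var() on vectors and matrices can be checked with a short sketch. It reuses the A and B values from the mean and median example; the variable names sA, vA, sB, vB and sB_rows are made up for illustration.

```matlab
% Sketch: std() and var() on a vector and on a matrix.
A = [0 2 5 7 20];
B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];

sA = std(A);             % standard deviation of the vector (normalized by N-1)
vA = var(A);             % variance of the vector, equal to sA^2
sB = std(B);             % 1-by-3 row vector: standard deviation of each column
vB = var(B);             % 1-by-3 row vector: variance of each column
sB_rows = std(B, 0, 2);  % 4-by-1 column vector: standard deviation of each row
```

As with mean() and median(), a dimension argument selects rows instead of columns; for std() it is the third argument, after the normalization flag.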

8
Standard Error of the Mean
  • Often the most appropriate measure of
    error/variance is the standard error of the mean
  • Matlab does not contain a standard error function
    so it is useful to create your own.
  • The standard error of the mean is defined as the
    standard deviation divided by the square root of
    the number of samples

9
Standard Error of the Mean
  • In Class Exercise 1
  • Create a function called se that calculates the
    standard error of some vector supplied to the
    function
  • E.g. se(x) should return the standard error of
    vector x

10
Standard Error of the Mean
  • In Class Exercise 1 Solution
  • function result = se(input_vect)
  • result = std(input_vect)/sqrt(length(input_vect));
  • return
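
The se() solution is written for a vector. For a matrix, length() returns the largest dimension, which is not always the number of samples. A possible column-wise variant is sketched here; the helper name se_cols and the sample values are hypothetical and not part of the lecture, and the sketch assumes samples are stored in rows with one subject per column.

```matlab
% Hypothetical helper (not from the lecture): standard error of each column.
% Assumes each row is one sample and each column is one subject or variable.
se_cols = @(x) std(x, 0, 1) ./ sqrt(size(x, 1));

x = [2 10; 4 20; 6 30; 8 40];   % made-up data: two subjects, four samples each
result = se_cols(x);            % 1-by-2 row vector, one standard error per column
```

Using size(x, 1) makes the sample count explicit, so the result does not depend on whether the matrix has more rows or more columns.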

11
In Class Exercise 2
  • From the class website download the file
    testdata1.txt (http://www.queensu.ca/neurosci/matlab.php)
  • This text file contains data from two subjects
    arranged in columns
  • Load the text file into matlab using any method
    you like (load, import, textread(), fscanf())
  • Calculate the mean and standard error for each
    subject
  • In figure 1, plot the data distribution for each
    subject using the hist() plotting function
  • In figure 2, plot the mean and standard error of
    each subject using a bar graph (bar() and
    errorbar() functions).

12
In Class Exercise 2: Solution
% read data
[subj1, subj2] = textread('testdata1.txt','%f%f','headerlines',1);
% plot distributions of each subject
figure(1)
hold on
subplot(2,1,1)
hist(subj1)
subplot(2,1,2)
hist(subj2)
% plot mean and standard error on bar graph
figure(2)
hold on
bar([1,2],[mean(subj1),mean(subj2)])
errorbar([1,2],[mean(subj1),mean(subj2)],[se(subj1),se(subj2)],'r')
13
In Class Exercise 2: Solution
[Figures: data distribution for Subject 1 and Subject 2; mean and standard error for Subject 1 and Subject 2]
16
Data Correlations
  • Matlab can calculate statistical correlations
    using the corrcoef() function
  • [R,P] = corrcoef(A,B)
  • Calculates a matrix R of correlation
    coefficients and a matrix P of significance values
    (p-values; a correlation is considered significant
    at the 5% level when its P value is below 0.05)
    for variables A and B
  •          A            B                   A            B
  • R =  A   AcorA        BcorA        =  A   1            BcorA
  •      B   AcorB        BcorB           B   AcorB        1
  •          A            B                   A            B
  • P =  A   sig(AcorA)   sig(BcorA)   =  A   1            sig(BcorA)
  •      B   sig(AcorB)   sig(BcorB)      B   sig(AcorB)   1
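
A minimal sketch of reading the two output matrices of corrcoef(). The vectors a and b are made-up values chosen so that b is roughly 2*a; they are not the course data.

```matlab
% Made-up data for illustration only
a = [1 2 3 4 5 6 7 8]';
b = [2.1 3.9 6.2 7.8 10.1 12.2 13.8 16.1]';   % roughly 2*a

[R, P] = corrcoef(a, b);

r_ab = R(1, 2);                 % correlation between a and b (off-diagonal element)
p_ab = P(1, 2);                 % p-value for the hypothesis of no correlation
is_significant = p_ab < 0.05;   % true when significant at the 5% level
```

Both matrices are symmetric, so R(1,2) and R(2,1) hold the same value; the diagonal only describes each variable against itself.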

17
Data Correlations
[Figure: Variable 1 plotted against Variable 2]
20
Data Correlations
% Compute sample correlation
[r, p] = corrcoef(var1,var2)
r =
    1.0000    0.7051
    0.7051    1.0000
p =
    1.0000    0.0000
    0.0000    1.0000
[Figure: Variable 1 plotted against Variable 2]
21
In Class Exercise 3
  • From the class website download the file
    testdata2.txt (http://www.queensu.ca/neurosci/matlab.php)
  • This text file contains data from two variables
    arranged in columns
  • Load the text file into matlab using any method
    you like (load, import, textread(), fscanf())
  • Plot the data points
  • Calculate the Correlation

22
In Class Exercise 3: Solution
% read data
[var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
% Compute sample correlation
r = corrcoef(var1,var2)
% Plot data points
figure(1)
plot(var1,var2,'ro')
[Figure: Variable 1 plotted against Variable 2]
23
Part B: Statistics Toolbox
  • The Statistics Toolbox contains a large array of
    statistical tools.
  • This lecture will concentrate on some of the most
    commonly used statistics for research
  • Parametric and non-parametric comparisons
  • Curve Fitting

24
Comparison of Means
  • A wide variety of mathematical methods exist for
    determining whether the means of different groups
    are statistically different
  • Methods for comparing means can be either
    parametric (assume the data are normally distributed)
    or non-parametric (do not assume a normal
    distribution)


25
Parametric Tests - TTEST
  • [H,P] = ttest2(X,Y)
  • Determines whether the means of samples X and
    Y are statistically different (two-sample t-test).
  • H returns 0 or 1: 0 means the null hypothesis
    (that the means are the same) cannot be rejected,
    1 means it is rejected
  • P returns the p-value of the test
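
A short sketch of calling ttest2() and reading its outputs. The samples x and y are made-up numbers with clearly different means, not the course data, and the call assumes the Statistics Toolbox is installed.

```matlab
% Made-up samples for illustration; ttest2 requires the Statistics Toolbox
x = [4.8 5.1 5.3 4.9 5.0 5.2 4.7 5.1]';
y = [6.0 6.2 5.9 6.3 6.1 5.8 6.4 6.0]';

[h, p] = ttest2(x, y);   % two-sample t-test at the default 5% significance level

if h == 1
    disp('Means differ significantly at the 5% level');
else
    disp('No significant difference between the means');
end
```

ttest2() treats x and y as independent samples; paired measurements on the same subjects would call for a paired test instead.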


27
Parametric Tests - TTEST
  • Example
  • For the data from exercise 3
  • [H,P] = ttest2(var1,var2)
  • >> [H,P] = ttest2(var1,var2)
  • H = 1
  • P = 0.00000000000014877

[Figure: Variable 1 and Variable 2]

28
Non-Parametric Tests - RankSum
  • The Wilcoxon rank sum test assesses whether two
    groups come from distributions with equal
    medians.
  • This test is non-parametric and should be used
    when data are not normally distributed
  • Matlab implements the Wilcoxon rank sum test using
    the ranksum() function
  • ranksum(X,Y) statistically compares the medians of
    two data distributions X and Y
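
A sketch of the ranksum() call on made-up samples (not the course data), assuming the Statistics Toolbox is installed. Note that ranksum() returns the p-value first and the test decision second, the reverse of ttest2().

```matlab
% Made-up samples for illustration; ranksum requires the Statistics Toolbox
x = [1 2 3 4 5 6 7 8 9 10]';
y = [11 12 13 14 15 16 17 18 19 20]';

[p, h] = ranksum(x, y);   % p is the p-value, h is 1 if the null hypothesis is rejected

fprintf('p = %g, h = %d\n', p, h);
```

Because every value in y is larger than every value in x, the two samples are fully separated and the test rejects the null hypothesis at the 5% level.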


29
Non-Parametric Tests - RankSum
  • Example
  • For the data from exercise 3
  • [P,H] = ranksum(var1,var2)
  • P = 1.1431e-014
  • H = 1

[Figure: Variable 1 and Variable 2]
30
Curve Fitting
  • Plotting a line of best fit in Matlab can be
    performed using either a traditional least
    squares fit or a robust fitting method.

[Figure: data points with two lines of best fit, labelled Least squares and Robust]
31
Curve Fitting
  • A least squares linear fit minimizes the sum of
    the squared distances between the data points and
    the line of best fit
  • polyfit(X,Y,N) finds the coefficients of a
    polynomial P(X) of degree N that fits the data
  • Uses least-squares minimization
  • N = 1 (linear fit)
  • P = polyfit(X,Y,N) returns P, a row vector
    containing the slope and the y-intercept for a
    linear fit
  • Y = polyval(P,X) calculates the Y values for
    every X point on the line of best fit
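
The order of the coefficients returned by polyfit() can be checked on data whose answer is known in advance. The points below are made up so that they lie exactly on y = 2x + 1.

```matlab
% Made-up points that lie exactly on the line y = 2x + 1
x = 1:10;
y = 2*x + 1;

P = polyfit(x, y, 1);     % linear fit: P(1) is the slope, P(2) is the y-intercept
y_fit = polyval(P, x);    % fitted y value at every x

slope = P(1);
intercept = P(2);
```

For a degree N fit, polyfit() returns N+1 coefficients ordered from the highest power down to the constant term, which is why the slope comes first.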

32
Curve Fitting
  • Example
  • Draw a line of best fit using least squares
    approximation for the data in exercise 3
  • [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
  • P = polyfit(var1,var2,1)
  • Y = polyval(P,var1)
  • close all
  • figure(1)
  • hold on
  • plot(var1,var2,'ro')
  • plot(var1,Y)

33
Curve Fitting
  • A least squares linear fit minimizes the sum of
    the squared distances between the data points and
    the line of best fit, so outliers can pull the line
    strongly; a robust fit gives outliers less weight
  • P = robustfit(X,Y) returns the vector P containing
    the y-intercept and slope, obtained by performing
    a robust linear fit
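
A sketch comparing the two fitting methods on made-up data with one deliberate outlier, assuming the Statistics Toolbox is installed for robustfit(). The data values and variable names are illustrative only. Note that robustfit() returns the intercept first and the slope second, the opposite order to polyfit().

```matlab
% Made-up data near the line y = 2x + 1, with one deliberate outlier
x = (1:10)';
y = 2*x + 1 + 0.1*(-1).^x;
y(5) = 40;

p_ls  = polyfit(x, y, 1);    % least squares: [slope, intercept]
b_rob = robustfit(x, y);     % robust fit:    [intercept; slope]

y_ls  = polyval(p_ls, x);           % least squares line
y_rob = b_rob(1) + b_rob(2) * x;    % robust line

figure(1)
hold on
plot(x, y, 'ro')
plot(x, y_ls, 'b')
plot(x, y_rob, 'g')
legend('Data', 'Least squares', 'Robust')
```

With a single large outlier, the robust slope is expected to stay closer to the underlying value of 2 than the least squares slope does.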

34
Curve Fitting
  • Example
  • Draw a line of best fit using robust fit
    approximation for the data in exercise 3
  • [var1, var2] = textread('testdata2.txt','%f%f','headerlines',1);
  • P = robustfit(var1,var2)
  • Y = polyval([P(2),P(1)],var1)
  • close all
  • figure(1)
  • hold on
  • plot(var1,var2,'ro')
  • plot(var1,Y)

35
Ideas for Next Term?
  • Additional Statistics, ANOVAs, etc.
  • Curve fitting with quadratic functions and cubic
    splines
  • Algorithms and Data structures
  • Improving Program Execution Time
  • Assistance Tutorials for individual programming
    problems
  • Any Suggestions?

36
Getting Help
  • Help and Documentation
  • Digital
  • Accessible Help from the Matlab Start Menu
  • Updated online help from the Matlab Mathworks
    website
  • http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html
  • Matlab command prompt function lookup
  • Built in Demos
  • Websites
  • Hard Copy
  • Books, Guides, Reference
  • The Student Edition of Matlab pub. Mathworks Inc.