Statistical tests and data fitting - PowerPoint PPT Presentation

Slides: 24
Provided by: RobertM7
Transcript and Presenter's Notes

Title: Statistical tests and data fitting


1
Statistical tests and data fitting
2
Tests
  • T-test: compares groups of data by comparing their mean
    values
  • Correlation and regression: compare data sets
    and find best fits
  • ANOVA (analysis of variance): evaluates
    differences between groups of data

3
T-test in Excel
  • Can use Tools, Data Analysis
  • Tails
      • Two-tailed: is one mean greater or less than the other?
      • One-tailed: is one mean greater only?
  • Type
      • Do the data sets have the same variance?
      • Type 1 (paired): same samples, measured at different times
      • Type 2 (two samples, same variance)
      • Type 3 (two samples, different variance)
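
The three Excel types correspond to three different t statistics. A minimal Python sketch follows, using only the standard library. The function names and the grain-size numbers are hypothetical, chosen for illustration, and are not from the slides. Turning a t statistic into a P value needs the t distribution, which the standard library does not provide; SciPy's `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind` (with `equal_var=True` or `False`) report both.

```python
import math
from statistics import mean, variance


def paired_t(a, b):
    """Type 1: t statistic for paired samples (same items, two occasions)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / math.sqrt(variance(diffs) / n)


def pooled_t(a, b):
    """Type 2: two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def welch_t(a, b):
    """Type 3: two independent samples, variances allowed to differ."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / na + variance(b) / nb)


# Hypothetical grain sizes (mm) from one beach, before and after a storm.
before = [0.31, 0.28, 0.35, 0.30, 0.33]
after = [0.36, 0.30, 0.41, 0.33, 0.39]

print(paired_t(before, after))
print(pooled_t(before, after))
print(welch_t(before, after))
```

With equal sample sizes the Type 2 and Type 3 statistics coincide; the two tests then differ only in the degrees of freedom used to look up the P value.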

4
Results
P value: the probability associated with a Student's
t-test. It is the probability of seeing a difference in
means at least this large if both data sets are from
the same population. In this case, the data are
identical, so the P value is 1.0, or 100%.
5
(No Transcript)
6
T-tests
  • You sample the same beach before and after a
    hurricane. Which one?
  • You compare sand grains in a suspect's car with
    sand grains from a beach?
  • You compare sand grains taken from the same beach.

7
What are they?
  • Correlation: tells how strongly two variables are
    related
      • X and Y measured independently
  • Line fitting: derives a best-fitting model
    between two variables
      • Least squares (linear regression, straight line)
      • Curved lines (polynomial or spline fit)
      • Typically for known X and measured Y (a function
        of time, etc.)

8
correlation
9
Correlation coefficient
Varies between -1 and 1: -1 is perfectly
anti-correlated, 0 is no correlation, 1 is
perfectly correlated
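
As a rough illustration of where the -1 to 1 range comes from, here is a small Python sketch of the Pearson correlation coefficient. The function name `pearson_r` and the sample numbers are assumptions made for this example, not part of the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # points on a rising straight line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # points on a falling straight line
```

Points lying exactly on a rising line give a coefficient of 1, and points lying exactly on a falling line give -1.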
10
correlation
Use the CORREL function in Excel
correlation = 0.98
correlation = 1
correlation = -1
correlation = 0.01
11
Confidence interval for correlation
  • Possible to define a variable w

w has a normal distribution with a defined mean
and variance
12
Use this mean and variance to set the normal
distribution
  • Now can check confidence intervals
  • Often useful to check the confidence interval of the
    null hypothesis (r_xy = 0)
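
The definition of w did not survive in this transcript. One standard construction that fits the description is Fisher's transformation, w = atanh(r), which is approximately normal with variance 1/(n - 3). The Python sketch below assumes that this is the variable intended; the function name `correlation_ci` and the example values r = 0.6, n = 30 are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation (Fisher transform)."""
    w = math.atanh(r)                          # transformed sample correlation
    se = 1.0 / math.sqrt(n - 3)                # approximate std. deviation of w
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided normal quantile
    # Build the interval for w, then map it back to the correlation scale.
    return math.tanh(w - z * se), math.tanh(w + z * se)


low, high = correlation_ci(r=0.6, n=30)
print(low, high)
# If 0 lies outside the interval, r_xy = 0 is rejected at this level.
print(not (low <= 0.0 <= high))
```

The interval is not symmetric about r because the back-transformation is nonlinear.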

13
Least squares line fitting (linear regression)
  • For perfect linear correlation, it is
    straightforward to define an equation so that y = Ax + B
  • Need to determine the coefficient A and constant
    B so that they define a straight line that fits
    the data as well as possible
  • We are estimating the best values of A and B.
  • We are assuming that the x value is known
    exactly and that the y value is uncertain.

14
Least squares fit
  • Common to use a least-squares fit.
  • The error between the best-fitting line and each
    data point is (y - ŷ), where y is the data and ŷ
    is the best fit (a vertical distance).
  • We seek to minimize the sum of all the squared
    errors.
  • Why squared? Well, it has some nice properties.
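
One of the nice properties is that the minimum has a closed form. The Python sketch below computes the slope A and constant B of the line y = Ax + B that minimize the summed squared vertical errors; the function names and the data points are hypothetical, made up for illustration.

```python
def fit_line(xs, ys):
    """Return (A, B) minimizing sum((y - (A*x + B))**2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx        # slope
    b = my - a * mx      # constant: the line passes through the mean point
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3, 4]
ys = [0.1, 0.9, 2.2, 2.9, 4.1]  # hypothetical measurements
a, b = fit_line(xs, ys)
print(a, b, sum_squared_error(xs, ys, a, b))
```

Nudging either A or B away from the returned values can only increase the summed squared error.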

15
Some details
regression line
Error between data and best-fit.
Y-intercept (in this case, close to zero)
16
More details
  • We can think of the best fit line as a sort of
    mean value.
  • The scatter is measured by the estimated standard
    error.
  • This is analogous to the standard deviation.
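
To make the analogy concrete, here is a short Python sketch of the estimated standard error for a fitted line y = Ax + B. It assumes the usual definition with n - 2 in the denominator (two fitted parameters); the function name and data are hypothetical.

```python
import math


def standard_error_of_estimate(xs, ys, a, b):
    """sqrt(sum of squared residuals / (n - 2)) for the line y = a*x + b."""
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))


xs = [0, 1, 2, 3, 4]
ys = [0.1, 0.9, 2.2, 2.9, 4.1]  # hypothetical measurements
a, b = 1.0, 0.04                # least-squares slope and constant for these points
print(standard_error_of_estimate(xs, ys, a, b))
```

Like a standard deviation, it is a root-mean-square scatter, but measured about the line instead of about a single mean.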

17
Confidence intervals
  • 95% confidence interval for y (i.e., we are 95%
    sure that y lies between the values a and b) is
    defined by
  • (a, b) = (y - k, y + k), where k is

18
Some problems
  • Outliers tend to skew the line away from other
    data.
  • Results in a poor fit.
  • Line is weighted by the square of the vertical
    distance between the data point and the trend.
  • One large offset counts more than several small
    ones.
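
A small Python sketch of the effect, using made-up data: six points on the line y = x, then the same points with the last one displaced upward by 10. The helper `fit_line` is a hypothetical name for the usual closed-form least-squares fit.

```python
def fit_line(xs, ys):
    """Least-squares slope and constant for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


xs = [1, 2, 3, 4, 5, 6]
clean = [1, 2, 3, 4, 5, 6]          # exactly on y = x
with_outlier = [1, 2, 3, 4, 5, 16]  # last point displaced by +10

print(fit_line(xs, clean))
print(fit_line(xs, with_outlier))
```

A single displaced point more than doubles the fitted slope here, because its offset enters the sum as a square.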

19
Why square?
  • Could use 3rd power
  • Or just absolute value
  • These also provide a straight line
  • More complicated and less elegant mathematics.
  • May be useful for some data
  • Absolute value handles outliers better.
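
The outlier behaviour is easiest to see in the one-dimensional analogue, where a single constant is fitted instead of a line: the mean minimizes the summed squared error and the median minimizes the summed absolute error. The Python sketch below uses that simpler case, not a full absolute-value line fit; the data and the function name `total_cost` are hypothetical.

```python
from statistics import mean, median


def total_cost(values, centre, power):
    """Sum of |value - centre| raised to the given power."""
    return sum(abs(v - centre) ** power for v in values)


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # one large outlier

squared_best = mean(data)     # minimizes the squared cost
absolute_best = median(data)  # minimizes the absolute cost

print(squared_best, absolute_best)
print(total_cost(data, squared_best, 2), total_cost(data, absolute_best, 2))
print(total_cost(data, squared_best, 1), total_cost(data, absolute_best, 1))
```

The squared-error answer is dragged far toward the outlier, while the absolute-error answer stays with the bulk of the data.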

20
Least-squares fit and Excel
  • Three ways (at least) to make a least-squares fit
    to data in Excel.
  • Use LINEST(y, x, b, stats) and then plot.
      • Allows calculation of statistics
      • Powerful but complicated
  • Use Regression in the Analysis ToolPak add-in.
  • Make a data plot (without a line), then left-click on
    a data point and add a trend line. Much easier, but
    it is not clear how it does it.

21
Excel output for regression
70% of the variance is explained.
If you use this line, you could be off by this
much. It is the square root of MS.
Probability of how significant the fit is
The y intercept is -0.58833 and the constant (b)
is 0.99256 so the equation is Y = -0.58833X +
0.99156.
Upper and lower bound on coefficient.
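
The "variance explained" figure is the R squared value. A minimal Python sketch of how it can be computed for a fitted line y = Ax + B is given below; the function name and data are hypothetical and unrelated to the Excel output on the slide.

```python
def r_squared(xs, ys, a, b):
    """Fraction of the variance in y explained by the line y = a*x + b."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - my) ** 2 for y in ys)                       # total
    return 1.0 - ss_res / ss_tot


xs = [0, 1, 2, 3, 4]
ys = [0.1, 0.9, 2.2, 2.9, 4.1]  # hypothetical measurements
a, b = 1.0, 0.04                # least-squares slope and constant for these points
print(r_squared(xs, ys, a, b))
```

A value of 0.70 would correspond to the "70% of the variance is explained" annotation.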
22
Fitting a curved line
  • Suppose the data are exponential or something you
    expect is curved.
  • Use a polynomial fit - click box under add
    trendline
  • Spline fit
  • Nonlinear least squares
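
A polynomial fit is still a least-squares problem, only with more coefficients. The Python sketch below solves the normal equations directly with the standard library, as an illustration of the idea; it is not a description of what Excel does internally. The function name `polyfit` and the test data are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ..."""
    m = degree + 1
    # Normal equations: sums of powers of x on the left, sums of y*x**i on the right.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - known) / aug[i][i]
    return coeffs


xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x * x for x in xs]  # exact quadratic, no noise
print(polyfit(xs, ys, 2))
```

For high degrees the normal equations become badly conditioned, so a library routine such as `numpy.polyfit` is usually the safer choice in practice.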

23
ANOVA and F-test
  • Analysis of variance
  • Does the variance of two or more datasets differ
    significantly?
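
In one-way ANOVA, the F statistic is the ratio of the variance between group means to the variance within groups; a large ratio suggests the groups differ. The Python sketch below computes that ratio under this standard definition. The function name `one_way_f` and the three groups are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical samples
print(one_way_f(groups))
```

Converting F into a probability requires the F distribution with (k - 1, N - k) degrees of freedom, which is what Excel's ANOVA tool reports.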
