MULTIVARIATE STATISTICS - PowerPoint PPT Presentation

Slides: 19
Provided by: brucek64
Transcript and Presenter's Notes

1
MULTIVARIATE STATISTICS
  • A collection of techniques to help us understand
    patterns in and make predictions with large
    datasets with many variables
  • Ordination: find a (hopefully small) number of
    composite variables that capture most of the
    variability among data points
  • Cluster Analysis: discover natural groupings of
    similar data points
  • Discriminant Analysis: find a (hopefully small)
    number of composite variables that can be used to
    predict the levels of a categorical dependent
    variable
  • Canonical Correlation Analysis: find
    relationships between two groups of variables
  • Dependent variable is multivariate
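
As a rough illustration of ordination, the sketch below runs a two-variable principal components calculation and reports how much of the total variability the first composite variable captures. PCA is only one ordination method and the slides do not name a specific one; the function names and the data are made up for this example.

```python
import math

def covariance_terms(xs, ys):
    """Sample variances and covariance of two equally long variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def principal_variances(xs, ys):
    """Eigenvalues of the 2x2 covariance matrix, largest first (closed form)."""
    sxx, sxy, syy = covariance_terms(xs, ys)
    centre = (sxx + syy) / 2
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return centre + half_gap, centre - half_gap

def share_of_first_axis(xs, ys):
    """Fraction of total variance captured by the first composite variable."""
    first, second = principal_variances(xs, ys)
    return first / (first + second)

# Hypothetical data: two perfectly redundant variables versus two unrelated ones.
redundant = share_of_first_axis([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
unrelated = share_of_first_axis([1, -1, 1, -1], [1, 1, -1, -1])
```

When the two variables are redundant, one composite variable carries all of the variability; when they are unrelated with equal variance, it carries only half.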

2
WHAT CAN MULTIVARIATE STATISTICS DO?
  • Reflect more accurately the true multidimensional
    nature of environmental systems
  • Provide a way to handle large datasets with large
    numbers of variables by summarizing the
    redundancy
  • Provide rules for combining variables in an
    optimal way
  • Provide a means of detecting and quantifying
    truly multivariate patterns that arise out of
    the correlational structure of the variable set
  • Provide a means of exploring complex data sets
    for patterns and relationships from which
    hypotheses can be generated and subsequently
    tested experimentally

3
Time series analysis
  • A Time Series is a list of observations of a
    variable (or variables) through time.
  • Most analyses require equal spacing in time
  • Time series usually violate the assumptions of
    OLS and ANOVA
  • Adjacent observations are not independent
  • In contrast to regression models, Time Series
    Analysis exploits temporal patterns in data.
  • Major uses: (1) analyze trends and cycles, (2)
    describe relationships within time series, and
    (3) forecasting
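
The dependence between adjacent observations can be measured with the lag-1 autocorrelation. The sketch below is a minimal version using the usual sample estimator; the function name and both example series are hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of an equally spaced series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    shared = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return shared / total

# Hypothetical series: a steady upward trend and a strict alternation.
trending = [float(t) for t in range(20)]
alternating = [1.0, -1.0] * 10

r_trend = autocorrelation(trending, 1)      # strongly positive
r_flip = autocorrelation(alternating, 1)    # strongly negative
```

Values far from zero in either direction signal that neighbouring observations carry information about each other, which is what OLS and ANOVA assume away.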

4
ARIMA model
  • Auto Regressive Integrated Moving Average
  • Auto Regressive means current value depends on
    past values
  • Integrated means take differences (to remove
    linear or seasonal trend)
  • Moving Average means current value depends on
    current and past random shocks (forecast errors)
  • Need to specify orders for each of these:
  • Number of AR coefficients
  • Number of times differenced
  • Number of MA coefficients
  • Possible seasonality in each
  • Choosing an ARIMA model used to be a black art
  • Now just try lots of possibilities and select the
    one with the lowest AIC
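
The "try several and keep the lowest AIC" idea can be sketched in a deliberately reduced form. The example below differences a simulated series once, then compares only two candidates, AR(0) and AR(1), fitted by least squares. It has no moving-average or seasonal terms and is not a full ARIMA fit; all names and the simulated data are assumptions made for illustration.

```python
import math
import random

def difference(series, lag=1):
    """The 'Integrated' step: subtract the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors and k parameters."""
    return n * math.log(rss / n) + 2 * k

def fit_ar0(series):
    """Constant-mean model for series[1:], the same points fit_ar1 predicts."""
    target = series[1:]
    mu = sum(target) / len(target)
    return mu, sum((x - mu) ** 2 for x in target)

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]."""
    prev, curr = series[:-1], series[1:]
    n = len(curr)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    phi = sxy / sxx
    c0 = mean_curr - phi * mean_prev
    rss = sum((c - c0 - phi * p) ** 2 for p, c in zip(prev, curr))
    return c0, phi, rss

# Hypothetical data: increments follow an AR(1) process with phi = 0.7 and
# the observed series is their running sum, so one difference is needed.
rng = random.Random(1)
increments = [0.0]
for _ in range(500):
    increments.append(0.7 * increments[-1] + rng.gauss(0, 1))
observed, level = [], 0.0
for step in increments:
    level += step
    observed.append(level)

stationary = difference(observed)
n_fit = len(stationary) - 1
_, rss0 = fit_ar0(stationary)
_, phi_hat, rss1 = fit_ar1(stationary)
aic_ar0 = gaussian_aic(rss0, n_fit, k=2)  # mean + error variance
aic_ar1 = gaussian_aic(rss1, n_fit, k=3)  # intercept + phi + error variance
best = "AR(1)" if aic_ar1 < aic_ar0 else "AR(0)"
```

In practice the same loop would run over many combinations of AR order, differencing order, and MA order, with a proper likelihood-based fit for each candidate.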

5
pH of Norwegian Lakes
6
Questions about Norwegian lake data
  • Is there spatial autocorrelation in pH values?
  • How can we interpolate and smooth those values?

7
Semivariogram
  • Semivariance is half the average squared
    difference between pairs of lakes a certain
    distance apart
  • Measures variance among sites as a function of
    distance
  • Also called empirical variogram
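
The definition above translates almost directly into code. This is a minimal sketch that groups pairs of sites into distance bins of equal width and returns half the average squared difference in each bin; the binning scheme, the function name, and the three-site example are assumptions for illustration.

```python
import math

def empirical_semivariogram(points, values, bin_width, n_bins):
    """Semivariance per distance bin; bin k covers [k*w, (k+1)*w).

    Returns None for bins that contain no pairs of sites.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            k = int(h // bin_width)
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Hypothetical example: three sites on a line, one unit apart.
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ph = [1.0, 2.0, 4.0]
gamma_hat = empirical_semivariogram(sites, ph, bin_width=1.0, n_bins=3)
```

Semivariance that rises with distance, as in this toy example, is the signature of spatial autocorrelation: nearby sites are more alike than distant ones.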

8
Theoretical variogram
(Figure labels: sill, range, nugget)
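
The three labels are the parameters of a theoretical variogram: the nugget is the semivariance just above zero distance, the sill is the plateau the curve levels off at, and the range is the distance at which it reaches that plateau. The slides do not say which model family was used; the sketch below assumes the common spherical model, and the parameter values are hypothetical.

```python
def spherical_variogram(h, nugget, sill, range_):
    """Spherical model: rises from the nugget to the sill at distance range_."""
    if h == 0:
        return 0.0          # a site has no variance with itself
    if h >= range_:
        return sill         # beyond the range the curve is flat
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)

# Hypothetical parameters: nugget 0.1, sill 1.0, range 10 distance units.
halfway = spherical_variogram(5.0, nugget=0.1, sill=1.0, range_=10.0)
plateau = spherical_variogram(25.0, nugget=0.1, sill=1.0, range_=10.0)
```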
10
Kriging
  • Minimum mean-squared-error method of spatial
    prediction
  • Interpolates and smooths
  • Named after South African mining engineer D.G.
    Krige
  • Uses a theoretical variogram
  • Have to fit the theoretical variogram first
  • Produces a smooth surface that fits the data
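
A minimal sketch of ordinary kriging at a single target location follows. It assumes a spherical variogram whose parameters have already been fitted, and it solves the kriging equations with a small hand-written linear solver so that it needs no external libraries. The site coordinates, pH values, variogram parameters, and function names are all hypothetical.

```python
import math

def spherical(h, nugget, sill, range_):
    """Spherical variogram model (assumed here; other models are possible)."""
    if h == 0:
        return 0.0
    if h >= range_:
        return sill
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Return (prediction, weights); the weights are constrained to sum to 1."""
    n = len(points)
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    # Variogram between every pair of sites, bordered by the unbiasedness row.
    a = [[gamma(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]   # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values)), weights

# Hypothetical lakes on the corners of a unit square.
lakes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ph = [5.0, 6.0, 6.5, 7.0]
gamma = lambda h: spherical(h, nugget=0.1, sill=1.0, range_=3.0)

centre_ph, centre_w = ordinary_kriging(lakes, ph, (0.5, 0.5), gamma)
corner_ph, _ = ordinary_kriging(lakes, ph, (0.0, 0.0), gamma)
```

At the centre of the square every lake is equally far away, so each gets weight 0.25 and the prediction is the plain average. At a sampled lake the prediction reproduces the observed value.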

11
pH
12
Other complications
  • Censored data
  • Some sample units disappear before you get the
    data
  • Example: a rat dies early in the study; would it
    have gotten a tumor?
  • Throwing that sample out of the analysis may bias
    results
  • Measurement error
  • Creates bias in estimates of regression
    parameters
  • There is a large set of tools for dealing with
    these
  • Need to know something about the statistical
    properties of the measurement process or the
    causes of censoring
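
The bias from measurement error can be seen in a small simulation. The sketch below assumes classical measurement error in the predictor (independent noise added to the true value), in which case the fitted slope shrinks toward zero by the factor var(x) / (var(x) + var(error)). The data are simulated and every name is hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(42)
true_slope = 2.0

# True predictor with variance 1; response with a little independent noise.
x_true = [rng.gauss(0, 1) for _ in range(5000)]
y = [true_slope * x + rng.gauss(0, 0.5) for x in x_true]

# Observed predictor = truth + measurement error with variance 1,
# so the expected attenuation factor is 1 / (1 + 1) = 0.5.
x_observed = [x + rng.gauss(0, 1) for x in x_true]

slope_clean = ols_slope(x_true, y)       # close to the true slope
slope_noisy = ols_slope(x_observed, y)   # close to half the true slope
```

Correcting for this requires knowing the error variance, which is why the statistical properties of the measurement process matter.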

13
Congratulations!!!
14
WHAT IS STATISTICS GOOD FOR?
  • Transforms data into information
  • Describe patterns in data
  • What is the trend in CO2 emissions over time?
  • What is the covariation between temperature and
    rainfall?
  • Enhance scientific understanding
  • Is there a relationship between gasoline taxes
    and gasoline consumption?
  • What is the nature of the relationship between
    fish stocks and fish recruits?
  • Make predictions
  • Given the observed relationship between
    investment in green technologies and share price,
    what will be the effect on a firm's share price
    of a one-unit increase in its green technology
    investment?
  • Need to distinguish between interpolation and
    extrapolation
  • Make decisions
  • Are we sufficiently confident that arsenic levels
    exceed a regulatory threshold that we should take
    action?

15
Enhance understanding
Make predictions
Estimate parameters
Describe patterns and relationships in data
Select models
Test statistical hypotheses
Test theories
Make decisions
16
A SYSTEMATIC APPROACH TO STATISTICAL ANALYSIS
  • Clearly formulate the problem, question, or
    decision that you are facing
  • What are the quantities that you need to
    estimate?
  • Write down a statistical model that relates the
    quantities of interest to the data you will
    collect (or have collected)
  • This model will include a random component that
    represents natural variability or sampling error
  • Estimate the parameters of the statistical model
  • In addition to the estimate of the most likely
    value, quantify your uncertainty in that estimate
  • Use the results to address your problem,
    question, or decision
  • Your report should include a quantification of
    the probability that your answer is incorrect
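
The steps above can be walked through on a toy version of the arsenic question. The sketch below assumes the simplest possible model (each measurement is the true mean plus independent random error) and uses a normal approximation, which is rough for a sample this small. The measurements, the threshold, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def estimate_mean(samples):
    """Estimate the parameter (the mean) and its standard error."""
    return mean(samples), stdev(samples) / math.sqrt(len(samples))

def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation interval quantifying uncertainty in the estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

def false_alarm_probability(estimate, std_error, threshold):
    """One-sided p-value: chance of a sample mean at least this high if the
    true mean were exactly at the threshold."""
    return 1.0 - NormalDist().cdf((estimate - threshold) / std_error)

# Hypothetical arsenic measurements and regulatory threshold (same units).
arsenic = [11.2, 12.5, 10.8, 13.1, 11.9, 12.2, 10.5, 12.8]
threshold = 10.0

est, se = estimate_mean(arsenic)
low, high = confidence_interval(est, se)
p_wrong = false_alarm_probability(est, se, threshold)
take_action = low > threshold
```

The report would then state the estimate, the interval, and the probability of a false alarm rather than a bare yes or no.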

17
HYPOTHESIS TESTING OVERVIEW
18
Often start with OLS, but...