Trend Analysis - PowerPoint PPT Presentation

Slides: 32

Provided by: Owne655

Transcript and Presenter's Notes

1
Trend Analysis

Step vs. monotonic trends
Approaches to trend testing
Trend tests with and without exogenous variables
Dealing with seasonality
Introduction to time series analysis
Step trends

2
Testing for Trends

Purpose
To determine whether a series of observations of a
random variable is generally increasing or
decreasing with time.
Or: has the probability distribution changed with
time?
We may also want to describe the amount or rate
of change in terms of some central value of the
distribution, such as the mean or median.

3
Monotonic Trend vs. Step Trend: Some Rules

Situation                                          Monotonic   Step
Long record with a known event that naturally
divides the period of record into a pre and
post period.                                                    X
Record broken into two segments with a long
gap between them.                                               X
Unbroken or nearly unbroken long record.               X
Multiple records with a variety of lengths and
timing of data gaps.                                   X
Unbroken record that shows a sudden jump in
magnitude of the r.v. for no known reason.             X

4
Approaches to Monotonic Trend Testing

Where
Y = r.v. of interest in the trend test
(e.g. concentration, biomass, etc.)
X = an exogenous variable expected to affect
Y (e.g. flow rate, etc.)
R = residuals from a regression or LOWESS
of Y vs. X
T = time (often expressed in years)

5
Trend tests with No Exogenous Variable

Nonparametric Mann-Kendall test
Same test as Kendall's τ (discussed in the next
few slides).
The test is invariant to power transformations.
Kendall's S statistic is computed from the Y, T
data pairs. H0 of no change is rejected when S
(and therefore Kendall's τ of Y vs. T) is
significantly different from zero.
If H0 is rejected, we conclude that there is a
monotonic trend in Y over time T.

6
Kendall's Tau (τ)

Tau (τ) measures the strength of the monotonic
relationship between X and Y. Tau is a rank-based
procedure and is therefore resistant to the
effect of a small number of unusual values.
Because τ depends only on the ranks of the data
and not the values themselves, it can be used
even in cases where some of the data are
censored.
In general, for linear associations, τ < r.
Strong linear correlations of r > 0.9 correspond
to τ > 0.7.
Tau is easy to compute by hand, resistant to
outliers, measures all monotonic correlations,
and is invariant to power transformations of X or Y
or both.

7
Computation of Tau (τ)

First order all data pairs by increasing x. If a
positive correlation exists, the y's will
increase more often than decrease as x
increases.
For a negative correlation, the y's will decrease
more often than increase.
If no correlation exists, the y's will increase
and decrease about the same number of times.
A 2-sided test for correlation will evaluate
H0: no correlation exists between x and y (τ = 0)
Ha: x and y are correlated (τ ≠ 0)

The test statistic S measures the monotonic
dependence of y on x:
S = P - M
where P = number of (+), the number of times the y's
increase as the x's increase, or the number of yi < yj
for all i < j;
M = number of (-), the number of times the y's decrease as
the x's increase, or the number of yi > yj for
all i < j;
i = 1, 2, ..., (n-1) and j = (i+1), ..., n.
There are n(n-1)/2 possible comparisons to be
made among the n data pairs. If all y values
increase along with the x values, S = n(n-1)/2 and
τ = +1; if all y values decrease, S = -n(n-1)/2 and
τ = -1.
Therefore dividing S by n(n-1)/2 gives
-1 ≤ τ ≤ 1.
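
The counting procedure above can be sketched in Python as follows. This is a minimal illustration that assumes no tied values in x or y; the function name `kendall_s_tau` and the four sample pairs are made up for the example and do not come from the slides.

```python
from itertools import combinations

def kendall_s_tau(x, y):
    """Return (P, M, S, tau) for paired data, assuming no ties."""
    # Order the pairs by increasing x and keep the y sequence.
    ys = [yi for _, yi in sorted(zip(x, y))]
    n = len(ys)
    # combinations() yields every (y_i, y_j) with i < j.
    P = sum(1 for yi, yj in combinations(ys, 2) if yj > yi)  # pluses
    M = sum(1 for yi, yj in combinations(ys, 2) if yj < yi)  # minuses
    S = P - M
    tau = S / (n * (n - 1) / 2)
    return P, M, S, tau

# Hypothetical data: one of the six comparisons is a decrease (30 -> 20).
P, M, S, tau = kendall_s_tau([1, 2, 3, 4], [10, 30, 20, 40])
print(P, M, S, round(tau, 3))  # P=5, M=1, S=4, tau=4/6
```

With tied values the denominator n(n-1)/2 is no longer the right normaliser, so a tie-corrected version would be needed for real data.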

Hence the definition of τ is
$$\tau = \frac{S}{n(n-1)/2}$$
To test for the significance of τ, S is compared
to what would be expected when the null
hypothesis is true. If it is further from 0 than
expected, H0 is rejected.
For n ≤ 10, an exact test should be computed.
The table of exact critical values is given in
Table 1. For n > 10, we can use a large sample
approximation for the test statistic.

11
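Large sample approximation - τ

The large sample approximation Zs is given by
$$Z_S = \begin{cases} \dfrac{S-1}{\sigma_S} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S+1}{\sigma_S} & \text{if } S < 0 \end{cases}$$
where
$$\sigma_S = \sqrt{\frac{n}{18}(n-1)(2n+5)}$$
The null hypothesis is rejected at significance
level α if |Zs| > Zcrit, where Zcrit is the critical
value of the standard normal distribution with
probability of exceedance of α/2.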
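
As a rough sketch of this approximation, the snippet below uses the standard no-ties form: S is moved one unit towards zero (a continuity correction) and divided by σs = sqrt(n(n-1)(2n+5)/18). The function name `mann_kendall_z` is an assumption made for illustration, and the two-sided p-value comes from the standard library's complementary error function rather than a printed normal table.

```python
import math

def mann_kendall_z(S, n):
    """Return (Zs, two-sided p) for Kendall's S with n observations, no ties."""
    sigma_s = math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    if S > 0:
        z = (S - 1) / sigma_s   # continuity correction towards zero
    elif S < 0:
        z = (S + 1) / sigma_s
    else:
        z = 0.0
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = mann_kendall_z(S=21, n=10)
print(round(z, 2), round(p, 3))
```

For S = 21 and n = 10 this gives Zs of about 1.79 and a two-sided p-value of about 0.074.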

12
Example: 10 pairs of x and y are given below,
ordered by increasing x.
y   1.22  2.20  4.80  1.28  1.97  1.46  2.64  2.34  4.84  2.96
x   2     24    99    197   377   544   632   3452  6587  53170
(Figure: plot of y against x, with one point labelled "Outlier".)
13

To compute S, first compare y1 = 1.22 with all
subsequent y's.
2.20 > 1.22, hence +
4.80 > 1.22, hence +, etc.
Move on to i = 2, and compare y2 = 2.20 to all
subsequent y's.
4.80 > 2.20, hence +
1.28 < 2.20, hence -, etc.
For i = 2, there are 5 +'s and 3 -'s. It is
convenient to write all + and - below their
respective yi, as shown on the next slide.
In total there are 33 +'s (P = 33) and 12 -'s
(M = 12). Therefore
S = 33 - 12 = 21, and there are 10(9)/2 = 45 possible
comparisons, so τ = 21/45 = 0.47. From Table
1, for n = 10 and S = 21, the exact p-value is
2(0.036) = 0.072.

14
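Table of + and - signs

yi   1.22  2.20  4.80  1.28  1.97  1.46  2.64  2.34  4.84  2.96
     +     +     -     +     -     +     -     +     -
     +     -     -     +     +     +     +     +
     +     -     -     +     +     +     +
     +     -     -     +     +     +
     +     +     -     +     +
     +     +     +     +
     +     +     -
     +     +
     +
Each column lists, from top to bottom, the comparisons of that yi
with each subsequent y.
33 (+) and 12 (-), S = 33 - 12 = 21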
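
A sign table of this kind can be generated mechanically, which is a convenient check on hand counts. The sketch below uses the ten y values listed on this slide (already ordered by increasing x); the helper name `sign_columns` is made up for the example.

```python
y = [1.22, 2.20, 4.80, 1.28, 1.97, 1.46, 2.64, 2.34, 4.84, 2.96]

def sign_columns(ys):
    """For each y_i, list '+' or '-' for every later y_j ('0' for a tie)."""
    cols = []
    for i, yi in enumerate(ys):
        col = []
        for yj in ys[i + 1:]:
            col.append('+' if yj > yi else '-' if yj < yi else '0')
        cols.append(col)
    return cols

cols = sign_columns(y)
P = sum(c.count('+') for c in cols)
M = sum(c.count('-') for c in cols)

# One line per y_i: the value followed by its comparisons with later y's.
for yi, c in zip(y, cols):
    print(f"{yi:5.2f}  {' '.join(c)}")
print("P =", P, " M =", M, " S =", P - M)
```

The totals reproduce P = 33, M = 12 and S = 21.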

15
Large sample approximation

The large sample approximation is
$$Z_S = \frac{21 - 1}{\sqrt{\frac{10}{18}(10-1)(20+5)}} = \frac{20}{11.18} = 1.79$$
From the table of the normal distribution, the
1-sided quantile for 1.79 is 0.963, so that
p = 2(1 - 0.963) = 0.074.
The large sample approximation is quite good even
for a small sample of size 10.

16
Kendall-Theil Robust Line (Non-parametric)

The K-T robust line is related to Kendall's
correlation coefficient tau (τ) and is
applicable when Y is linearly related to X.
This line is not
dependent on the normality of residuals for the
validity of significance tests,
strongly affected by outliers.
The Kendall-Theil line is of the form
$$\hat{Y} = b_0 + b_1 X$$
This line is closely related to Kendall's τ, in
that the significance of the test for H0: slope
β1 = 0 is identical to the test for H0: τ = 0.
The slope estimate b1 is computed by comparing
each data pair to all others in a pairwise
fashion.
The median of all pairwise slopes is taken to be
the non-parametric estimate of slope b1:
$$b_1 = \operatorname{median}\left(\frac{Y_j - Y_i}{X_j - X_i}\right) \quad \text{for all } i < j$$
The intercept b0 is defined as follows:
$$b_0 = Y_{med} - b_1 X_{med}$$
18

where Ymed and Xmed are the medians of Y and X.
The formula ensures that the fitted line goes
through the point (Xmed, Ymed). This is
analogous to OLS, where the fitted line always
goes through the means of X and Y.
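
The slope and intercept described here (median of all pairwise slopes, line forced through the medians of X and Y) can be sketched as below. The function name `kendall_theil_line` and the seven data pairs are illustrative assumptions, not values taken from the slides; the data are a straight line with one deliberately inflated y value.

```python
from itertools import combinations
from statistics import median

def kendall_theil_line(x, y):
    """Return (b0, b1): intercept and slope of the Kendall-Theil line."""
    # Slope between every pair of points (i < j); skip pairs with equal x.
    slopes = [(yj - yi) / (xj - xi)
              for (xi, yi), (xj, yj) in combinations(zip(x, y), 2)
              if xj != xi]
    b1 = median(slopes)
    # The line passes through (median of x, median of y).
    b0 = median(y) - b1 * median(x)
    return b0, b1

# Illustrative data: y = x except for one outlying value at x = 6.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 2, 3, 4, 5, 16, 7]
print(kendall_theil_line(x, y))   # (0.0, 1.0)

# Making the outlier more extreme leaves the fitted line unchanged.
y_extreme = [1, 2, 3, 4, 5, 60, 7]
print(kendall_theil_line(x, y_extreme))   # (0.0, 1.0)
```

Only 6 of the 21 pairwise slopes involve the outlying point, so the median slope stays at 1 however large that point becomes.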

Example 1: Given 7 data pairs,
there are n(n-1)/2 = 7(6)/2 = 21 pairs.
19
Test of Significance

The test is identical to Kendall's τ. That is,
first compute S, then check Table 1 if n ≤ 10, or
use the large sample approximation for n > 10.
For the example, S = 20 - 1 = 19, and there are 21
pairwise slopes. τ = 19/21 = 0.90. From Table 1,
with n = 7 and S = 19, the exact 2-sided p-value is
2(0.0014) = 0.003.
Note: If the Y value were 60 instead of 16, a
clear outlier, the estimate of the slope would
not change. This shows that the Kendall-Theil
line is resistant to outliers.

20
Parametric Regression of Y on T

Simple regression of Y on T is a test for trend.
H0 is that the slope coefficient β1 = 0.
All assumptions of regression must be met:
normality of residuals, constant variance,
linearity of relationship, and independence.
Y needs to be transformed if the assumptions are not met.
If H0 is rejected, we conclude that there is a
linear trend in Y over time T.
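
A minimal sketch of the regression trend test, using only the standard library, is given below. It computes the ordinary least squares slope of Y on T and its t statistic; the function name `trend_regression` and the five data points are hypothetical. The p-value step is left out: the t statistic would be compared with Student's t distribution on n - 2 degrees of freedom.

```python
import math

def trend_regression(t, y):
    """OLS of y on t; return (b0, b1, t_statistic for H0: slope = 0)."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    # Residual variance uses n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * ti)) ** 2 for ti, yi in zip(t, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b0, b1, b1 / se_b1

# Hypothetical series with a clear upward trend of about 2 units per year.
years = [0, 1, 2, 3, 4]
conc = [1.1, 2.9, 5.2, 6.8, 9.0]
b0, b1, t_stat = trend_regression(years, conc)
print(round(b0, 2), round(b1, 2), round(t_stat, 1))
```

The sketch does not check the regression assumptions; residual plots would still be needed before trusting the test.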

21
Comparison of Simple Tests for Trends

If regression assumptions are OK, then
regression is best. It is also good if there is more
than one exogenous variable.
If the assumptions of regression are not met (outliers,
censored data, non-normality, etc.), Mann-Kendall will be
OK or better.
Transformation of Y will affect regression, but
not Mann-Kendall.
Best to try both methods.

24
Accounting for Exogenous Variables

Exogenous variable: a variable other than the time
trend that may have influence on Y. These
variables are usually natural, random phenomena
such as rainfall, temperature or streamflow.
By removing the variation in Y caused by these
variables, the background variability or noise
is reduced so that any trend signal present is
not masked. The ability of a trend test to
discern changes in Y with T is then increased.

The removal process involves modelling, and thus
explaining, the effect of exogenous variables with
regression or LOWESS.
When removing the effect of one or more
exogenous variables X, the probability
distribution of the X's is assumed to be
unchanged over the period of record.
If the probability distribution of X has
changed, a trend in the residuals may not
necessarily be due to a trend in Y. Care is needed
in choosing the exogenous variable.

26
Nonparametric approach - LOWESS

LOWESS describes the relationship between Y
and X without assuming linearity or normality of
residuals.
The LOWESS pattern should be smooth enough that it
doesn't have several local minima and maxima, but
not so smooth as to eliminate the true change in
slope.
LOWESS residuals: R = Y - Ŷ, where Ŷ is the
LOWESS fitted value of Y at X.
Then the Kendall S statistic is computed from the R and
T pairs to test for trend.
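
The two steps (smooth Y against X, then test the residuals against T) might look roughly like the sketch below. It is a simplified stand-in for LOWESS: a local linear fit with tricube weights and no robustness iterations, so it is more sensitive to outliers than a full implementation. The names `lowess_fit` and `kendall_s`, the span `frac=0.5`, and all data values are assumptions made for illustration.

```python
import math
from itertools import combinations

def lowess_fit(x, y, frac=0.5):
    """Tricube-weighted local linear fit at each x (no robustness passes)."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))   # points in each window
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1] or 1.0        # window half-width
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(yw + slope * (x0 - xw))
    return fitted

def kendall_s(t, r):
    """Kendall's S for values r ordered by time t (ties contribute zero)."""
    rs = [ri for _, ri in sorted(zip(t, r))]
    return sum((rj > ri) - (rj < ri) for ri, rj in combinations(rs, 2))

# Hypothetical data: T = time, X = exogenous variable, Y = variable of interest.
T = [1, 2, 3, 4, 5, 6, 7, 8]
X = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]
Y = [7.2, 3.5, 9.9, 5.1, 12.4, 20.9, 7.3, 15.8]

R = [yi - fi for yi, fi in zip(Y, lowess_fit(X, Y))]   # LOWESS residuals
S = kendall_s(T, R)                                    # trend statistic on (T, R)
print("S =", S)
```

The span `frac` plays the role of the smoothness choice discussed above: a larger value gives a smoother curve, a smaller one follows local wiggles more closely.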

27
Mixed Approach

First do a regression of Y on X (can have more
than one X).
Check all regression assumptions: normality,
linearity, constant variance, significant β1,
etc.
Then residuals R = Y - (b0 + b1 X) (from
regression).
Then Kendall's S is computed from the R, T pairs to
test for trend.

28
Parametric approach

Uses regression of Y on T and X in one go.
This tests for trend and simultaneously
compensates for the effects of exogenous
variables.
Must check the assumptions of regression. If β1
(the coefficient on T) is significantly different
from zero, then there is a trend. β2 (the coefficient
on X) should be significant as well;
otherwise there is no point including X.
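
One way to sketch this one-step approach is to fit Y = b0 + b1·T + b2·X by least squares and inspect the t statistics of b1 and b2. The code below solves the normal equations directly with the standard library; the names `solve` and `trend_with_exogenous` and the data are hypothetical, and the assumed model form (one T term, one X term, additive errors) is an illustration rather than something stated on the slide.

```python
import math

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def trend_with_exogenous(t, x, y):
    """Fit Y = b0 + b1*T + b2*X; return (coefficients, t statistics)."""
    rows = [(1.0, ti, xi) for ti, xi in zip(t, x)]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    b = solve(XtX, Xty)
    resid = [yi - sum(bj * rj for bj, rj in zip(b, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (len(y) - 3)   # residual variance
    t_stats = []
    for j in range(3):
        # j-th diagonal element of (X'X)^-1, from one unit-vector solve.
        unit = [1.0 if i == j else 0.0 for i in range(3)]
        var_j = s2 * solve(XtX, unit)[j]
        t_stats.append(b[j] / math.sqrt(var_j))
    return b, t_stats

# Hypothetical data built as 1 + 0.5*T + 2*X plus small noise.
T = [1, 2, 3, 4, 5, 6, 7, 8]
X = [3, 1, 4, 1, 5, 9, 2, 6]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
Y = [1 + 0.5 * ti + 2 * xi + e for ti, xi, e in zip(T, X, noise)]

coef, t_stats = trend_with_exogenous(T, X, Y)
print([round(c, 2) for c in coef], [round(s, 1) for s in t_stats])
```

The t statistic for the T coefficient is the trend test, and the one for the X coefficient indicates whether X is worth keeping; both would be compared with Student's t on n - 3 degrees of freedom.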
