Title: CHAPTER 9 Random Processes and Random Walks
Slide 1: CHAPTER 9 Random Processes and Random Walks

Slide 2: Random Processes and Random Walks
- Basic Longitudinal Data Vocabulary
- Identifying and Summarizing Data
- Random Process and Random Walk Models
- Inference using Random Walk Models
- Detecting Nonstationarity
- Filtering to Achieve Stationarity
- Forecast Evaluation
Slide 3: Basic Longitudinal Data Vocabulary
- Longitudinal data are numerical realizations of a process that evolves over time.
- Ordering is the key, not time itself. The ordering could also be spatial (oil exploration).
- A process that is stable over time is called stationary.
- Successive samples of modest size should have approximately the same distribution.
- We are particularly concerned with the mean level and the variation (we use histograms).
- If the process is stationary, we may define a distribution.
- Cross-sectional data: a collection of observations for which there is no natural ordering, such as space or time.
Slide 4: Graphically Summarizing Data
- Illustration 9.1 (Sum of Two Dice): an example of a stationary time series.
- TABLE 9.1 FIRST FIVE OF THE FIFTY ROLLS
    t    1   2   3   4   5
    yt  10   6  11   3   9
- A time series plot is a scatter plot of the response versus time.
- We usually connect the points to help detect patterns over time.
- Illustration 9.2 (Domestic Beer Prices): an example of a nonstationary time series. This time series plot may suggest a trend in time.
Slide 5: Stationary time series [figure]

Slide 6: Presence of a trend Tt: nonstationarity of the time series [figure]

Slide 7: Histogram is meaningful [figure]
Slide 8: Modeling a series
- Three important components:
  1. Trends in time, Tt
  2. Seasonal patterns, St
  3. Random patterns, et
- Yt = Tt + St + et
- or
- Yt = Tt × St × et
Slide 9: Trend in time
- Lack of trend:
- Yt = β0 + et
- Fitting a polynomial function of time:
- Yt = β0 + β1 t + et
- or
- Yt = β0 + β1 t + β2 t² + et
- or
- Yt = β0 + β1 t + β2 t² + β3 t³ + et
- And so on.
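These polynomial trend models can be fit by ordinary least squares. A minimal sketch in Python; the data are simulated, and the true intercept 5.0 and slope 0.8 are made up for illustration:

```python
import numpy as np

# Simulated series with a linear trend (intercept and slope are invented).
rng = np.random.default_rng(0)
t = np.arange(1, 51, dtype=float)
y = 5.0 + 0.8 * t + rng.normal(0.0, 1.0, size=t.size)

# Fit Yt = b0 + b1*t by least squares; np.polyfit returns coefficients
# from the highest degree down, so deg=1 gives [b1, b0].
b1, b0 = np.polyfit(t, y, deg=1)

# Higher-order trends are fit the same way, e.g. the cubic model:
coeffs_cubic = np.polyfit(t, y, deg=3)
```

With 50 observations the fitted slope lands close to the true 0.8.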
Slide 10: Fitting Trends in Time
- One can think of forecasting as simply trend extrapolation.
- To extrapolate trends into the future, we must first identify trends (from 1952 until 1988).
- To be complete, we must identify the complete lack of trends as one type of trend.
- ADJBEERt = 161.94 − 1.7482 t
- (Std Error)   (0.932)  (0.0452)
- R² = 96.4%, sy = 19.28 versus s = 3.716. A big improvement.
- For prediction, ADJBEER38 = 161.94 − 1.7482(38) = 95.508.
- In Section 9.3 we will argue that a useful identification technique is the control chart: a time series plot with superimposed control limits, useful for ascertaining stationarity.
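The point prediction above is simple arithmetic with the fitted coefficients:

```python
# Reproducing ADJBEER_38 from the fitted line ADJBEER_t = 161.94 - 1.7482 t.
b0, b1 = 161.94, -1.7482
adjbeer_38 = b0 + b1 * 38   # evaluates to 95.508, matching the slide
```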
Slide 11: Fitting a regression model [figure]

Slide 12: Fitting a cubic model [figure]

Slide 13: Fitting seasonal trends: periodic behavior (percentage of qualified voters) [figure]

Slide 14: Quadratic trend in time [figure]

Slide 15: (continued) [figure]
Slide 16: Fitting seasonal trends
- Yt = β0 + β1 t + β2 t² + β3 zt + et,
- where zt = 1 in a presidential election year and 0 otherwise.
- β3 zt captures the seasonal component in this model.
- Class activity: fit this model.
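One way to fit this model is least squares with a 0/1 seasonal dummy. The data below are simulated, every coefficient value is invented, and "every 4th year is an election year" is an assumption made for illustration:

```python
import numpy as np

# Hypothetical yearly data; assume every 4th year is an election year.
rng = np.random.default_rng(1)
t = np.arange(1, 21, dtype=float)
z = (np.arange(1, 21) % 4 == 0).astype(float)   # zt = 1 in election years
y = 2.0 + 0.5 * t + 0.01 * t**2 + 3.0 * z + rng.normal(0.0, 0.5, t.size)

# Design matrix for Yt = b0 + b1 t + b2 t^2 + b3 zt + et.
X = np.column_stack([np.ones_like(t), t, t**2, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # [b0, b1, b2, b3]
```

The fitted beta[3] estimates the seasonal jump in election years.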
Slide 17: Random Process Models
- A random process is a stationary process that displays no apparent patterns through time.
- It is the link between cross-sectional and longitudinal models. In Section 2 we called these random errors (yi = μ + ei).
- Thus, if you identify a series as a random process, use Chapter 2 tools for inference. For example, for forecasting, the approximate 95% prediction interval is
- ȳ ± 2 sy
- (more precisely, ȳ ± t-value × sy).
- A filter is a procedure that reduces a process to a random process.
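For a series identified as a random process, the interval is just ȳ ± 2 sy. A sketch with illustrative dice-sum observations:

```python
import statistics

y = [10, 6, 11, 3, 9, 7, 8, 5, 7, 9]   # illustrative observations
ybar = statistics.mean(y)
sy = statistics.stdev(y)                # sample standard deviation

# Approximate 95% prediction interval for the next observation.
lower, upper = ybar - 2 * sy, ybar + 2 * sy
```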
Slide 18: Random Walk Model
- A random walk is our first model that is not a random process. We start with it because there is a very easy rule to reduce, or filter, it to a random process: Ct = Ct−1 + yt*.
- TABLE 9.2 WINNINGS FOR FIVE OF THE FIFTY ROLLS
    t      1    2    3    4    5
    yt    10    6   11    3    9
    yt*    3   −1    4   −4    2
    Ct   103  102  106  102  104
- Here yt* = yt − 7, and Ct = Ct−1 + yt* (suppose that C0 = 100).
- A random walk process may be defined by the partial sums of a random process: Ct = C0 + y1 + y2 + ... + yt.
- Differencing is the procedure (filter) that reduces a random walk to a random process.
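The partial-sum construction and the differencing filter can be checked directly against Table 9.2:

```python
# Winnings example: yt are the dice sums, yt* = yt - 7 the changes, C0 = 100.
y = [10, 6, 11, 3, 9]
y_star = [v - 7 for v in y]

C0 = 100
C, level = [], C0
for change in y_star:                 # Ct = C_{t-1} + yt*
    level += change
    C.append(level)

# Differencing recovers the random process: Ct - C_{t-1} = yt*.
diffs = [b - a for a, b in zip([C0] + C[:-1], C)]
```

The computed path C matches the table row 103, 102, 106, 102, 104.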
Slide 19: Inference using Random Walk Models
- Model properties: suppose that yt is a random process. Then Ct = C0 + y1 + ... + yt is a random walk.
- Using results from mathematical statistics, we can show that
- E Ct = C0 + t μy and Var Ct = t σy².
- The random walk process is nonstationary in the variance. Further, it is nonstationary in the mean if μy ≠ 0.
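These two moment results are easy to check by simulation; the parameter values below (μy = 0.5, σy = 2, t = 30) are arbitrary:

```python
import numpy as np

# Simulate many random walks Ct = C0 + y1 + ... + yt with iid yt.
rng = np.random.default_rng(42)
mu_y, sigma_y, C0, t = 0.5, 2.0, 100.0, 30
n_paths = 200_000

y = rng.normal(mu_y, sigma_y, size=(n_paths, t))
Ct = C0 + y.sum(axis=1)               # each walk's value at time t

mean_Ct = Ct.mean()                   # should be near C0 + t*mu_y = 115
var_Ct = Ct.var()                     # should be near t*sigma_y**2 = 120
```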
Slide 20: Forecasting with a Random Walk Model
- Suppose that we wish to forecast C_{T+L}. First,
- C_{T+L} = C_{T+L−1} + y_{T+L} = C_{T+L−2} + y_{T+L−1} + y_{T+L} = ...
- = CT + y_{T+L} + y_{T+L−1} + ... + y_{T+1}.
- Because a good forecast of y_{T+L} is ȳ, a good forecast for C_{T+L} is CT + L ȳ.
- An approximate 95% prediction interval for the forecast of C_{T+L} is CT + L ȳ ± 2 sy L^{1/2}.
- Note that the width of the prediction interval, 4 sy L^{1/2}, grows as the lead time L grows.
Slide 21: Illustration 9.1 Sum of Two Dice
- Start with C0 = 100; from the data, we have C50 = 93. Thus, the average change was ȳ = −7/50 = −0.14, with standard deviation sy = 2.703.
- The forecast of our sum of capital at time 60, for example, is 93 + 10(−0.14) = 91.6.
- The corresponding 95% prediction interval is
- 91.6 ± 2(2.703)(10)^{1/2} = 91.6 ± 17.1 = (74.5, 108.7).
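The arithmetic on this slide can be reproduced directly:

```python
import math

C0, C50, sy = 100, 93, 2.703
ybar = (C50 - C0) / 50                  # average change = -0.14
L = 10                                  # forecasting C60, so lead time 10

forecast = C50 + L * ybar               # point forecast, 91.6
half_width = 2 * sy * math.sqrt(L)      # prediction-interval half width
interval = (forecast - half_width, forecast + half_width)
```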
Slide 22: Random Walk versus Linear Trend in Time Model
- For the linear trend in time model, we have Ct = β0 + β1 t + et, where et is a random error process.
- If Ct is a random walk, then it can be modelled as a partial sum, that is, Ct = C0 + y1 + ... + yt.
- We can also decompose the random process into a drift term μy plus a random process, that is, yt = μy + et.
- Combining these two ideas, we see that the random walk model can be written as
- Ct = C0 + μy t + ut, where ut = Σ_{j=1..t} ej.
- Comparing these expressions, we see that the two models are the same in that the deterministic portion is an unknown linear function of time. The difference is in the error component.
Slide 23: Detecting Nonstationarity
- A (retrospective) control chart is a time series plot with control limits (for example, ±3 SDs) superimposed.
- These control limits are useful for deciding whether or not a process is stationary.
- For a given series of observations, calculate the mean ȳ and the standard deviation (SD).
- Define the "upper control limit" by UCL = ȳ + 3 SD and the "lower control limit" by LCL = ȳ − 3 SD.
- Control limits calculated at plus or minus three standard deviations are called 3-sigma limits.
- We use "individual" control charts. Other types of control charts include Xbar, R, and s charts.
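A minimal individual-chart computation, using illustrative data:

```python
import statistics

y = [10, 6, 11, 3, 9, 7, 8, 5, 7, 9]   # illustrative series
ybar = statistics.mean(y)
sd = statistics.stdev(y)

ucl = ybar + 3 * sd                     # upper control limit
lcl = ybar - 3 * sd                     # lower control limit

# Points outside the 3-sigma limits suggest nonstationarity.
flagged = [(t, v) for t, v in enumerate(y, start=1) if v > ucl or v < lcl]
```

Here every point falls inside the limits, consistent with a stationary series.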
Slide 24: Identifying a Random Walk
- Because E Ct = C0 + t μy, if the series follows a linear trend in time, this may suggest a random walk model.
- Because Var Ct = t σy², if the variability of the series gets larger as time t gets large, this may suggest a random walk model.
- Because Var Ct = t σy² > σy² = Var yt, if you difference the data and greatly reduce the standard deviation, this may suggest a random walk model.
- If the original data follow a random walk model, then the differenced series will follow a random process model. Try differencing; if you come up with a random process, then a random walk is a good model for the original series.
- In Chapter 10, we will discuss two additional identification devices:
- scatter plots of the series versus a lagged version of the series, and
- summarizing these scatter plots with statistics called autocorrelations.
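The third bullet suggests a simple numeric check: compare the standard deviation of the series with that of its differences. A sketch on a simulated walk:

```python
import random
import statistics

# Simulated random walk with standard normal innovations.
random.seed(3)
C = [100.0]
for _ in range(199):
    C.append(C[-1] + random.gauss(0.0, 1.0))

diffs = [b - a for a, b in zip(C, C[1:])]

sd_level = statistics.stdev(C)      # inflated by the wandering level
sd_diff = statistics.stdev(diffs)   # near the innovation sd of 1
```

A large drop from sd_level to sd_diff is the random-walk signature described above.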
Slide 25: Filtering to Achieve Stationarity
- Filters are procedures for reducing observations to a random process.
- Three types of filters:
- introducing explanatory variables (x's) to control for known variables (as in cross-sectional regression),
- differencing, and
- transformations.
- When filtering is done to reduce the series to stationarity, Box and Jenkins called the filtering the pre-processing stage.
Slide 26: Transformations
- The power family is a useful class of nonlinear transforms.
- In particular, we will use logarithms to shrink "spread out" data.
- Differences of logs are particularly pleasing because they can be interpreted as percentage changes. To see this, define pchanget = yt / yt−1 − 1. Then
- ln yt − ln yt−1 = ln (yt / yt−1) = ln (1 + pchanget) ≈ pchanget.
- Consider the Standard and Poor's Composite Quarterly Index. Here, the graphs illustrate going from a nonstationary series to a stationary one by using differences of logs.
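The approximation ln yt − ln yt−1 ≈ pchanget is easy to verify numerically; the two values below echo the index illustration:

```python
import math

y_prev, y_now = 951.0, 965.2
pchange = y_now / y_prev - 1                    # ~ 0.0149
log_diff = math.log(y_now) - math.log(y_prev)   # ln(yt) - ln(yt-1)
gap = abs(pchange - log_diff)                   # small for small changes
```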
Slide 27: Standard and Poor's Composite Quarterly Index
- The graphs illustrate going from a nonstationary series to a stationary one by using differences of logs.
- Control charts also help us see patterns of nonstationarity.
- In particular, R and s charts display the increasing variability through time.
- Forecasting: the average proportional change was 0.01493. The most recent value of the index was 951.
- The first forecast value of the proportional change is 0.01493. This translates into a forecast value of the index equal to 951(1 + 0.01493) = 965.2.
- The second forecast value of the proportional change is also 0.01493. This translates into a forecast value of the index equal to 965.2(1 + 0.01493) = 979.61.
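The two forecasts iterate the same multiplication; a sketch:

```python
# Each forecast multiplies the previous level by (1 + g), where g is the
# average proportional change from the slide.
g = 0.01493
index = 951.0
forecasts = []
for _ in range(2):
    index *= 1 + g
    forecasts.append(round(index, 2))
```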
Slide 28: Forecast Evaluation
- Hold out a portion of the data: fit models to one portion and validate on the other portion.
- Step 1. Begin with a sample of size T = T1 + T2 and divide it into two subsamples: observations i = 1, ..., T1 form the first subsample, and observations i = T1+1, ..., T1+T2 form the second subsample.
- Step 2. For the first sample, fit a candidate model to the data set i = 1, ..., T1.
- Step 3. Use the model created in Step 2 to "predict" the dependent variables yi, where i = T1+1, ..., T1+T2.
- Step 4. Compute one or more forecast evaluation statistics.
- Repeat Steps 2-4 for various candidate models.
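The four steps can be sketched as follows, fitting a linear trend on simulated data; the split sizes T1 = 40 and T2 = 10 are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
T1, T2 = 40, 10                                  # Step 1: two subsamples
t = np.arange(1, T1 + T2 + 1, dtype=float)
y = 20.0 + 0.3 * t + rng.normal(0.0, 1.0, t.size)

b1, b0 = np.polyfit(t[:T1], y[:T1], deg=1)       # Step 2: fit on subsample 1
y_hat = b0 + b1 * t[T1:]                         # Step 3: predict subsample 2
errors = y[T1:] - y_hat                          # Step 4: evaluate these errors
```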
Slide 29: Forecast Evaluation Statistics
- Mean Error (ME), Mean Percent Error (MPE), Mean Square Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percent Error (MAPE).
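With forecast errors ei = yi − ŷi from the holdout sample, the five statistics are computed as follows; the actuals and forecasts are toy numbers chosen only to illustrate the formulas:

```python
# Toy actuals and forecasts for the holdout sample.
y_true = [10.0, 12.0, 11.0, 13.0]
y_hat = [9.0, 12.5, 11.5, 12.0]

e = [a - f for a, f in zip(y_true, y_hat)]
n = len(e)

ME = sum(e) / n                                           # mean error
MPE = 100 * sum(ei / a for ei, a in zip(e, y_true)) / n   # mean percent error
MSE = sum(ei**2 for ei in e) / n                          # mean square error
MAE = sum(abs(ei) for ei in e) / n                        # mean absolute error
MAPE = 100 * sum(abs(ei) / a for ei, a in zip(e, y_true)) / n
```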