F. Jay Breidt*,** - PowerPoint PPT Presentation

About This Presentation

Title:

F. Jay Breidt*,**

Description:

Title: No Slide Title Author: nsu Last modified by: nsu Created Date: 8/5/2002 7:17:04 PM Document presentation format: On-screen Show Other titles – PowerPoint PPT presentation

Slides: 37

Provided by: NSU88

Learn more at: https://www.stat.colostate.edu

Transcript and Presenter's Notes

1
Nonparametric Survey Regression Estimation Using
Penalized Splines

F. Jay Breidt
Colorado State University
Jean D. Opsomer
Iowa State University
(more folks acknowledged soon)
Research supported by EPA STAR Grants
R-82909501 (CSU) and R-82909601 (OSU)

2
The Usual Disclaimer

The work reported here was developed under STAR
Research Assistance Agreements CR-829095 and
CR-829096 awarded by the U.S. Environmental
Protection Agency (EPA) to Colorado State
University and Oregon State University. This
presentation has not been formally reviewed by
EPA. The views expressed here are solely those of
the authors. EPA does not endorse any products or
commercial services mentioned in this report.

3
Outline

Background
Scales of inference
Specific versus generic
Model-assisted and model-based inference
Penalized splines
Comparison to other smoothers; two-stage; small
area
Variations: network data, increment data
Other
Non-Gaussian time series
Summary
Status of STARMAP.2 and DAMARS.5

4
Scales of Inference in Surveys

Large area
sample itself suffices for inference
no model needed
Medium area
use auxiliary information through a model
model helps inference but is not critical
Small area
sample size is small or zero
inference must be based on a model

5
Specific and Generic Inference

Specific: one study variable, few population
parameters
lots of modeling resources to specify, estimate,
and diagnose a model
willingness to defend the model
Generic: many study variables, many population
parameters
no resources to model every variable
no single model is adequate/defensible

6
Generic Inferences in Aquatic Resources

Generic inference is a common problem for
federal, state, and tribal agencies
Example: conduct a survey and prepare a report
analyze large numbers of chemical, biological,
and physical variables
estimate means, quantiles, and distribution
functions
break down both by political classifications and
by various ecological classifications

7
Model-Assisted Survey Inference

Scarce modeling resources for generic inference,
so we don't trust models
Can we use a model without depending on the
model?
Model-assisted inference
efficiency gains if model is right
sensible inference even if model is wrong

8
Model-Assisted Estimators

Form of model-assisted estimator
(model-based prediction) + (design bias adjustment)
model incorporates auxiliary information
bias adjustment corrects for bad models
Classical: parametric model-assisted
prediction from linear regression model
Our idea: nonparametric model-assisted
prediction from kernel regression or other
smoother (JB and JO (2000), Annals of Statistics)
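
The "prediction plus design bias adjustment" form can be sketched in a few lines. This is a minimal illustration only, assuming a linear working model fitted by design-weighted least squares; the function names and the toy population are hypothetical and are not taken from the talk.

```python
import random

def fit_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def model_assisted_total(x_pop, sample_idx, y_sample, pi):
    """Model predictions summed over the population, plus the
    design-weighted sum of sample residuals (the bias adjustment)."""
    xs = [x_pop[i] for i in sample_idx]
    w = [1.0 / pi[i] for i in sample_idx]
    a, b = fit_line(xs, y_sample, w)
    prediction = sum(a + b * xi for xi in x_pop)
    adjustment = sum(wi * (yi - (a + b * xi))
                     for wi, xi, yi in zip(w, xs, y_sample))
    return prediction + adjustment

def horvitz_thompson_total(sample_idx, y_sample, pi):
    """Design-only estimator: no model, just inverse-probability weights."""
    return sum(yi / pi[i] for i, yi in zip(sample_idx, y_sample))

# Toy population (assumed): N = 1000, simple random sample of n = 50.
random.seed(1)
N, n = 1000, 50
x_pop = [i / N for i in range(N)]
y_pop = [2.0 + 3.0 * xi + random.gauss(0.0, 0.1) for xi in x_pop]
pi = [n / N] * N
sample_idx = random.sample(range(N), n)
y_sample = [y_pop[i] for i in sample_idx]

print("true total     ", sum(y_pop))
print("model-assisted ", model_assisted_total(x_pop, sample_idx, y_sample, pi))
print("Horvitz-Thompson", horvitz_thompson_total(sample_idx, y_sample, pi))
```

Swapping `fit_line` for a kernel or spline smoother gives the nonparametric version; the prediction and adjustment terms keep the same roles.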

9
Why Nonparametric?

More flexible model specification
smooth mean function, positive variance function
Approximately correct more often
more opportunities for efficiency gains from
auxiliary information
often, not a large efficiency loss if parametric
specification is correct

10
Goals of Our Research

Focus on generic inference
Use flexible nonparametric models to reduce
misspecification bias
model-assisted: medium area problem
model-based: small area problem
Make the methods operationally feasible for state
and tribal agencies
linear smoothers generate generic weights

11
Penalized Splines

Very useful class of linear smoothers
Readily fits into standard linear mixed model
framework
Modular, extensible, computationally convenient
Automated smoothing parameter selection and
fitting with standard software
Several ongoing projects
Model-assisted p-spline estimation (Gerda
Claeskens, JO, JB); two-stage extensions (Mark
Delorey)
Small area p-spline estimation (Gerda, Giovanna
Ranalli, Goran Kauermann, JO, JB)
Smoothing on networks (Giovanna, JB)
Semiparametric mixed models for
increment-averaged core data (Nan-Jung Hsu, Steve
Ogle, JB)

12
Penalized Splines

Truncated linear basis allows slope changes at
each of many knots

Penalize for unnecessary slope changes

13
P-Splines: Influence of Penalty

Fits with increasing penalty parameter

14
Penalized Splines: Computation

Computation using S-Plus
Set up design matrix of truncated linear splines (one is a vector of ones)
Z <- outer(x, knots, "-")
Z <- Z * (Z > 0)
C <- cbind(one, x, Z)
Solve for spline with fixed degrees of freedom (pi is the vector of inclusion probabilities, K the number of knots)
D <- diag(c(rep(0, 2), rep(1, K)))
mhat <- C %*% solve(t(C) %*% diag(1/pi) %*% C +
lambda^2 * D) %*% t(C) %*% diag(1/pi) %*% y
For data-determined df/roughness penalty, can use
lme() to select via REML
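
The same computation can be sketched without S-Plus. The following is a minimal standard-library Python version, assuming the penalty enters as lambda squared on the knot coefficients only; the function names, toy data, and knot locations are hypothetical.

```python
def truncated_linear_design(x, knots):
    """Rows [1, x_i, (x_i - k_1)_+, ..., (x_i - k_K)_+]."""
    return [[1.0, xi] + [max(xi - k, 0.0) for k in knots] for xi in x]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def pspline_fit(x, y, pi, knots, lam):
    """Coefficients solving (C'WC + lam^2 D) beta = C'Wy, W = diag(1/pi)."""
    C = truncated_linear_design(x, knots)
    w = [1.0 / p for p in pi]
    dim = 2 + len(knots)
    A = [[sum(wi * ci[j] * ci[k] for wi, ci in zip(w, C))
          for k in range(dim)] for j in range(dim)]
    for j in range(2, dim):
        A[j][j] += lam ** 2  # intercept and slope are left unpenalized
    b = [sum(wi * ci[j] * yi for wi, ci, yi in zip(w, C, y))
         for j in range(dim)]
    return solve(A, b)

def pspline_predict(beta, x_new, knots):
    """Evaluate the fitted spline at new x values."""
    rows = truncated_linear_design(x_new, knots)
    return [sum(bj * cj for bj, cj in zip(beta, row)) for row in rows]

# Toy data (assumed): a quadratic observed on a grid, equal inclusion probabilities.
x = [i / 20 for i in range(21)]
y = [xi ** 2 for xi in x]
pi = [0.05] * len(x)
knots = [0.25, 0.5, 0.75]
beta = pspline_fit(x, y, pi, knots, lam=0.1)
mhat = pspline_predict(beta, x, knots)
```

Because the intercept and slope are unpenalized, a very large `lam` shrinks the knot coefficients toward zero and the fit approaches a weighted straight-line regression.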

15
Model-Assisted P-Spline Estimator

Model-based prediction + design bias adjustment

Asymptotically design-unbiased and
design-consistent
Asymptotic variance given by
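
The displayed formulas for this slide are not preserved in the transcript. As a hedged sketch based on the standard generalized-difference form of a model-assisted estimator, and not a verbatim copy of the slide, the estimator of the population mean and its asymptotic design variance can be written as below. The notation is an assumption: U is the population of size N, s the sample, \pi_i and \pi_{ij} the first- and second-order inclusion probabilities, \hat m_i the sample-based p-spline fit, and m_i the corresponding fit computed from the whole population.

```latex
\hat{\bar{y}}_{\mathrm{MA}}
  = \frac{1}{N}\sum_{i \in U} \hat m_i
  + \frac{1}{N}\sum_{i \in s} \frac{y_i - \hat m_i}{\pi_i},
\qquad
\mathrm{AV}\bigl(\hat{\bar{y}}_{\mathrm{MA}}\bigr)
  = \frac{1}{N^2}\sum_{i \in U}\sum_{j \in U}
    (\pi_{ij} - \pi_i \pi_j)\,
    \frac{y_i - m_i}{\pi_i}\,
    \frac{y_j - m_j}{\pi_j}.
```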

16
Design of Simulation Study

Model-assisted estimators
Polynomial regression
Poststratification (piecewise constant)
Local polynomial regression (kernel)
Penalized spline
Model-based estimator
Penalized spline
All use common degrees of freedom: 3 or 6
Eight response variables on one population
Two noise levels
N = 1000
Designs: SI or STSI
1000 replicate samples of size n = 50

17
Estimator Comparisons: Common Degrees of Freedom
18
MSE Ratio Relative to Model-Assisted Penalized
Splines
19
Further Results from Simulation

Variance estimation
For all estimators, variance estimator has
negative bias
Weighted residual variance estimator performs
better
Confidence interval coverage
Somewhat less than nominal for all estimators
(90-92%)
Undercoverage not as severe as bias would suggest
Negative weights: (2 df) x (2 designs) x (1000
reps) x (50 weights) = 200,000 weights
902 negative REG weights
145 negative LLR weights
2 negative MA weights

20
Two-Stage P-Spline Estimation

Available auxiliary information in two-stage
sampling
All clusters
All elements
All elements in sampled clusters
Mark Delorey (poster): focus on first case
Simulation study comparing Horvitz-Thompson,
regression, model-based p-spline, model-assisted
p-spline with and without cluster random effects
Operational issues with df, cluster variance
component
Some results: p-spline is good!

21
Semiparametric Small Area Estimation

Gerda, Giovanna, Goran Kauermann, JO, JB
Example: ANC level for Northeastern lakes
557 observations over 113 HUCs
Average sample size per HUC: 4.9
64 HUCs contain fewer than 5 observations
Site-specific covariates: lake location and
elevation
Simple way to capture spatial effects?

22
Semiparametric Small Area Model

Replace linear function of covariates by more
general model
direct estimator = truth + sampling error
truth = semiparametric regression + area-specific
deviation
Semiparametric regression expressed as linear
mixed model
Thin plate splines
Low-rank radial basis functions

23
Small Area Estimation Results

EBLUP for this model easily handled with
standard software (SAS PROC MIXED, S-Plus lme())

24
P-Splines for Increment Data

Common for soil, sediment core data
Datum represents not a single depth point but a
depth increment (e.g., cylinder of soil 2.5 cm in
diameter x 15 cm high, collected at 20-35 cm)
Ignoring increment structure leads to biased,
inconsistent estimators
Integrate linear mixed model representation
Definite integral of truncated linear basis
(x-κ)+ becomes differenced quadratic basis
(top-κ)+^2 - (bottom-κ)+^2
Immediate extension to small area estimation
E.g., soil mapping by map unit symbol
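
The integration step can be checked numerically. This is a minimal sketch, assuming an increment runs from depth `lower` to depth `upper` with `lower < upper`; the function names are hypothetical. The closed form carries a factor of 1/2, which can be absorbed into the spline coefficients.

```python
def integrated_truncated_linear(lower, upper, knot):
    """Closed form of the integral of (x - knot)_+ over [lower, upper]."""
    return 0.5 * (max(upper - knot, 0.0) ** 2 - max(lower - knot, 0.0) ** 2)

def midpoint_integral(lower, upper, knot, steps=10000):
    """Numerical check: midpoint rule applied to (x - knot)_+."""
    h = (upper - lower) / steps
    return h * sum(max(lower + (i + 0.5) * h - knot, 0.0)
                   for i in range(steps))

def increment_design_row(lower, upper, knots):
    """Design row for an observation that averages the profile over
    [lower, upper]: averages of 1, x, and each truncated linear term."""
    width = upper - lower
    return ([1.0, 0.5 * (lower + upper)]
            + [integrated_truncated_linear(lower, upper, k) / width
               for k in knots])

# Increment from the slide's example: a core collected at 20-35 cm.
closed = integrated_truncated_linear(20.0, 35.0, 25.0)
numeric = midpoint_integral(20.0, 35.0, 25.0)
row = increment_design_row(20.0, 35.0, [10.0, 25.0, 40.0])
```

Stacking one such row per core gives a design matrix that plugs into the same linear mixed model machinery as point observations.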

25
Carbon Sequestration

(Nan-Jung Hsu, Steve Ogle, JB) Broad class of
semiparametric mixed models for
increment-averaged data

26
Smoothing on Networks

Current research with post-doc, Giovanna Ranalli
have noisy data on stream network
have within-network distance measure (rather
than as the crow flies)
want interpolations at unsampled locations in
network
Semiparametric methodology readily extends to
this setting
low-rank radial basis functions
Possible real data from EPA (John Faustini)

27
Smoothing on Stream Networks

Toy stream network

Two first-order, one second-order stream segment
Regression function is exponential along
straight reach (two segments), constant along
remaining segment, continuous at intersection
n = 150 noisy observations obtained along network

28
Toy Network Results

Noisy observations smoothed via
Low-rank thin plate spline (2D, ignoring network
structure)
Within-network radial basis functions (1D,
accounts for network structure)
Network smooth offers 25-30% reduction in MISE
over spatial smooth
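
The difference between the two distance measures can be illustrated on a toy network. This is a minimal sketch under stated assumptions: three segments meet at a single junction, a location is a (segment, distance from junction) pair, the 30-degree angle is invented for illustration, and the cubic radial basis is one common choice rather than necessarily the one used in the talk.

```python
import math

# Hypothetical toy network: three segments that meet at one junction (the origin).
# Each entry is the unit direction in which the segment leaves the junction.
ANGLE = math.radians(30)  # assumed angle between the two first-order segments
SEGMENTS = {
    "first_order_a": (-1.0, 0.0),
    "first_order_b": (-math.cos(ANGLE), math.sin(ANGLE)),
    "second_order": (1.0, 0.0),
}

def to_xy(point):
    """Map (segment, distance from junction) to planar coordinates."""
    segment, d = point
    ux, uy = SEGMENTS[segment]
    return (d * ux, d * uy)

def euclidean_distance(p, q):
    """Distance as the crow flies."""
    (xp, yp), (xq, yq) = to_xy(p), to_xy(q)
    return math.hypot(xp - xq, yp - yq)

def network_distance(p, q):
    """Distance travelled along the network, passing through the
    junction when the two points lie on different segments."""
    (seg_p, d_p), (seg_q, d_q) = p, q
    if seg_p == seg_q:
        return abs(d_p - d_q)
    return d_p + d_q

def radial_basis(points, knots, dist):
    """Low-rank radial basis matrix with entries dist(point, knot)^3."""
    return [[dist(p, k) ** 3 for k in knots] for p in points]

p = ("first_order_a", 1.0)
q = ("first_order_b", 1.0)
print("Euclidean:", euclidean_distance(p, q))
print("network:  ", network_distance(p, q))
```

Two sites on neighboring tributaries are close in the plane but far apart along the stream, which is why a smoother built on `network_distance` does not borrow strength across the gap between them the way a 2D spatial smoother would.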

29
Non-Gaussian Time Series

Potential models for one-dimensional spatial
processes

30
Identification and Estimation

In Gaussian case, models of differing
causality/invertibility cannot be identified
Identification in non-Gaussian case
Fit causal/invertible ARMA via Gaussian quasi-MLE
Examine residuals for IID-ness
If not IID, fit All-Pass model (LAD: Breidt,
Davis, Trindade, Ann. Stat. (2001); MLE; rank
estimation) to determine order of non-causality
or non-invertibility
Prediction and Estimation in non-Gaussian case
Best MS prediction requires trickery
Exact MLE, Bayes for non-Gaussian MA
Exact and conditional MLE for MA with roots near
unit circle (Rosenblatt, Davis, Breidt, Hsu)

31
Asymptotic Results for All-Pass
32
Where Are We Now?

DAMARS.5: Nonparametric model-assisted
1. Extensions
1.1 continuous spatial domains (Siobhan, poster;
Giovanna, work in progress)
1.2 multiple phases (Kim (PhD 2004, ISU), working
paper)
1.3 multiple auxiliary variables (gam: Gretchen,
Goran, JO, JB, JASA 2nd submission)
1.3-1.4 alternative smoothing (Gerda, JO, JB,
p-splines, Biometrika 2nd submission; Ranalli and
Montanari, neural nets, JASA 2nd submission)
Other: two-stage kernels (Kim, JO, JB, JRSS
submission); two-stage splines (Mark, JB, poster)
2. Applications
2.1 CDF estimation (Alicia, JO, JB, poster; CJS
submission)
2.2 Medium area (Siobhan, JO, JB, poster)
2.3 Surveys over time (Jehad Al-Jararha, JO, JB,
spam with partial overlap)
2.4 Nonresponse (da Silva and Opsomer, Survey
Methodology 2004)

33
Where Are We Now?

STARMAP.2: Local Inferences
1. Small area
1.1-1.4 Nonparametric model-assisted for spatial
(Siobhan, poster; Giovanna, work in progress);
Semiparametric (Gerda, Giovanna, Goran, JO, JB,
working paper); Increments (Nan-Jung, Steve, JB,
working paper)
1.1 MLE for all-pass (Beth, RD, JB, JMVA
submission); rank for all-pass (Beth, RD, JB,
working paper); Prediction for MA (Breidt and
Hsu, Stat Sinica 2004); Exact MLE for MA
(Nan-Jung, RD, JB)
Spatial trend detection (Hsin-Cheng Huang)
Design aspects (Bill, JB, poster)
2. Deconvolution
Formulated as another small area estimation
problem using constrained Bayes methods (Mark,
JB, poster)
Methodology seems OK; example (88 HUCs in MAHA)
still being tweaked; work in progress
3. Causal inference
3.1-3.3 (Alix G)

34
Some Summaries (these projects only)

Some Invited Talks and Seminars
Winemiller Symposium (Columbia, MO)
Computational Environmetrics (Chicago, IL)
Monitoring Symposium (Denver, CO)
ICSA (Singapore)
EMAP 2004 (Newport, RI)
ENAR (Pittsburgh PA)
IWAP (Piraeus, Greece)
IMS-ASA (Calcutta, India)
Western Ecology Division, EPA (Corvallis, OR)
University of Maryland (Baltimore County, MD)
Jean's talks

35
More Summaries (these projects only)

People
Students: Ji-Yeon Kim, ISU PhD completed Spring
2004 (JO and JB); Bill Coar, Mark Delorey, Jehad
Al-Jararha, CSU PhD work in progress; ISU
student?
Post-Doctoral Research Associate: Giovanna
Ranalli
Visiting Research Scientists: Nan-Jung Hsu and
Hsin-Cheng Huang
Unsuspecting Collaborators: Gerda Claeskens and
Goran Kauermann
Papers
2 appeared, 2 tentatively accepted, 1 invited
revision, 4 submitted, n working papers

36
Optimal Sampling Design under Frame Imperfections

Motivated by problems with RF3 perennial
classification
About 20% errors of omission and of commission!
Previous work: logistic regression for
probability of perennial as function of
covariates (Bill Coar)
Compare optimal biased and unbiased designs using
anticipated MSE criterion
Account for differential costs (in frame, not in
frame; perennial, non-perennial)
Minimize AMSE for fixed cost
Further work
Asymptotic results for cases of negligible,
non-negligible bias
Empirical results