Title: Introduction to ObjectOriented Programming for Statistical Applications
1Introduction to Object-Oriented Programming for
Statistical Applications
- E. Andres Houseman
- April 19, 2004
2Typical Statistical Development
Software Development/ Data Analysis
Testing and New Development
Theoretical Investigation
Pilot/Proof-of-Concept
Theoretical Enhancements
Total Effort 70
3Ideal Statistical Development
Software Development/ Data Analysis
Testing and New Development
Theoretical Investigation
Pilot/Proof-of-Concept
Theoretical Enhancements
Total Effort 58
4Goals for Today
- Understand the motivation for using OOP in a
statistical context - Basic understanding of OOP concepts
- Understanding of the architecture of complex R
functions such as glm() and lme()
5Object-Oriented Programming
The object-oriented approach relies on .. a
systems-oriented approach that treats a system as
an organized entity made of components that can
be defined with respect to one another. It
proposes a method of decomposition that is based
not only on what the system does, but more
importantly, on the integration of what the
system is and does. -- Muller (1997), Instant
UML, Wrox Press Ltd.
6Objects
7Object-Oriented Programming (2)
Door
Open
Elevator Cabin
Light
Go to first floor
Button
Illuminate
8Why OOP?
- Stability of models with respect to real-world
entities - Iterative construction, which is made easier by
the weak coupling between components - The ability to reuse elements across development
projects
-- Muller (1997)
9Why OOP for Statisticians?
- Code reusability
- Enforced organization of development process
- Architecture of software more closely resembles
the statistical relationships - Protection against errors
10Why OOP for Statisticians?
Data
Model
Fit
Estimate
Assess
Inference
Goodnessof Fit
Predict
Predictions
11Simple Example to Fix Concepts
gt Sample data for toy examples gt gt
binomial.data lt- rbinom(1000, size1, p.5) gt gt
Stats lt- function(x) x should be
binomial return( list(p mean(x), n
length(x)) ) gt gt Stats( binomial.data
) p 1 0.494 n 1 1000
12Problem
gt What is wrong with this? gt gt normal.data lt-
rnorm(1000, mean5, sd2) gt gt Stats( normal.data
) p 1 4.998116 n 1 1000
13Simple Solution
gt Stats.modBinomial lt- function(x) x
should be binomial return( list(p
mean(x), n length(x)) ) gt gt
Stats.modNormal lt- function(x) x should
be normal return( list(mu mean(x),
sigmasqrt(var(x)), n length(x))
) gt
14Simple Solution (2)
gt statBinomial lt- Stats.modBinomial (
binomial.data ) gt statNormal lt- Stats.modNormal (
normal.data ) gt
gt statNormal mu 1 4.998116 sigma 1
2.013784 n 1 1000
gt statBinomial p 1 0.494 n 1 1000
15Mistakes Happen
gt A careless person could do this gt
Stats.modNormal ( binomial.data ) mu 1
0.494 sigma 1 0.5002142 n 1 1000
16Additional Functionality
gt Expectation lt- function(stat)
if(!is.null(statp)) statp binomial
else if(!is.null(statmu)) statmu normal
else stop("Unknown model type.") something
else gt gt Variance lt- function(stat)
if(!is.null(statp)) statp (1 - statp)
binomial else if(!is.null(statmu))
statsigma2 normal else stop("Unknown
model type.") something else gt gt
ConfidenceInterval lt- function(stat)
Wald-type confidence interval est lt-
Expectation(stat) se lt- sqrt(Variance(stat)
/ statn) return (c(est - 1.96se, est
1.96se))
17Additional Functionality (2)
gt Expectation(statBinomial) 1 0.494 gt
Expectation(statNormal) 1 4.998116 gt gt
Variance(statBinomial) 1 0.249964 gt
Variance(statNormal) 1 4.055326 gt gt
ConfidenceInterval(statBinomial) 1 0.4630119
0.5249881 gt ConfidenceInterval(statNormal) 1
4.873300 5.122931
18Additional Problems
gt What's wrong with this? gt gt stats lt-
Stats.modBinomial ( normal.data ) gt
ConfidenceInterval( stats ) 1 NaN NaN Warning
message NaNs produced in sqrt(Variance(stat)/st
atn)
19Solution Objects
gt as.modBinomial lt- function(x)Class
Constructor class(x) lt- "modBinomial"
return( x ) gt gt as.modNormal lt-
function(x)Class Constructor class(x) lt-
"modNormal" return( x ) gt gt
binomial.data lt- as.modBinomial( binomial.data
) gt normal.data lt- as.modNormal( normal.data
) gt gt binomial.data 1 0 1 1 1 0 0 1 ...
... 1000 0 attr(,"class") 1 "modBinomial"
20Objects and Methods
gt Stats lt- function(x) Method
if(class(x)"modBinomial") binomial
object lt- list(p mean(x), n
length(x)) class(object) lt-
"statBinomial" return(object)
else if(class(x)"modNormal") normal
object lt- list(mu mean(x),
sigmasqrt(var(x)), n
length(x)) class(object) lt-
"statNormal" return(object)
else stop("Unknown model type.") something
else gt
21Objects and Methods (2)
gt statBinomial lt- Stats ( binomial.data ) gt
statNormal lt- Stats ( normal.data ) gt gt
statBinomial p 1 0.494 n 1
1000 attr(,"class") 1 "statBinomial"
gt statNormal mu 1 4.998116 sigma 1
2.013784 n 1 1000 attr(,"class") 1
"statNormal"
22Classes and Methods in R
gt Stats lt- function(x) Method
if(class(x)"modBinomial") binomial
... else if(class(x)"modNormal")
normal ... else if(class(x
)"modPoisson") poisson ... ...
many many many more classes else
stop("Unknown model type.") something else
gt
23Classes and Methods in R (2)
gt Stats lt- function(x) UseMethod("Stats") gt gt
Stats.modBinomial lt- function(x) object lt-
list(p mean(x), n length(x))
class(object) lt- "statBinomial"
return(object) gt gt Stats.modNormal lt-
function(x) object lt- list(mu mean(x),
sigmasqrt(var(x)), n
length(x)) class(object) lt- "statNormal"
return(object) gt gt Stats.default lt-
function(x) stop("Unknown model type.")
24Classes and Objects in R (2)
gt Stats ( binomial.data ) p 1 0.494 n 1
1000 attr(,"class") 1 "statBinomial"
gt Stats ( normal.data ) mu 1
4.998116 sigma 1 2.013784 n 1
1000 attr(,"class") 1 "statNormal"
25Statistical Objects
Estimate
26Inheritance
gt as.modUnitNormal lt- function(x) x lt-
as.modNormal(x) class(x) lt-
c("modUnitNormal", class(x)) return( x )
gt gt Stats.modUnitNormal lt- function(x)
object lt- Stats.modNormal(x) objectsigma lt-
1 class(object) lt- c("statUnitNormal",
class(object)) return(object) gt
27Inheritance (2)
gt unitNormal.data lt- as.modStdNormal (
rnorm(1000, mean5, sd1) ) gt Stats(
unitNormal.data ) mu 1 5.0274 sigma 1
1 n 1 1000 attr(,"class") 1
"statUnitNormal" "statNormal" gt gt
ConfidenceInterval( Stats( unitNormal.data )
) 1 4.965420 5.089381
28Inheritance (3)
statistical model
normal
binomial
unit-variance normal
29Example lm() and glm()
gt ex.ols lt- lm(normal.data 1) gt ex.glm lt-
glm(binomial.data 1, familybinomial()) gt gt
class(ex.ols) 1 "lm" gt class(ex.glm) 1 "glm"
"lm" gt print function (x, ...)
UseMethod("print") gt summary function (object,
...) UseMethod("summary") gt predict function
(object, ...) UseMethod("predict")
30Methods for glm() and lm()
gt summary(ex.glm) Call glm(formula
binomial.data 1, family binomial()) Devianc
e Residuals Min 1Q Median 3Q
Max -1.167 -1.167 -1.167 1.188 1.188
Coefficients Estimate Std. Error
Pr(gtz) (Intercept) -0.02400 0.06325
0.704 (Dispersion parameter for binomial family
taken to be 1) Null deviance 1386.2 on
999 degrees of freedom Residual deviance
1386.2 on 999 degrees of freedom AIC 1388.2
gt summary.lm(ex.glm) Call glm(formula
binomial.data 1, family binomial()) Residua
ls Min 1Q Median 3Q Max
-0.9881 -0.9881 -0.9881 1.0121 1.0121
Coefficients Estimate Std. Error
Pr(gtt) (Intercept) -0.02400 0.06329
0.705 Residual standard error 1.001 on 999
degrees of freedom
31Using the Default Print Method
gt print.default(ex.ols) coefficients (Intercept)
4.998116 residuals 1
2 ... 0.617956019 1.297557292 ...
... 999 5.88151405 1000 7.81920766 attr(,"clas
s") 1 "lm"
gt print(ex.ols) Call lm(formula normal.data
1) Coefficients (Intercept) 4.998 gt
32Objects and Encapsulation
An object is an atomic entity, formed from the
union of state and behavior. It provides an
encapsulation relationship that ensures a strong
internal cohesion, and a weak coupling with the
outside.
33Objects and Encapsulation
Q
Land
Take-off
Control Tower
34Encapsulation in Programming Environments
- Java and C enforce encapsulation by default
- R has to be tricked into encapsulation
- Use the environment to encapsulate in R
35R Environments
- An environment is a name-space whose access is
tightly controlled - Functions that live within a specific environment
can only see data within that environment (or a
parent environment) - Use get and assign for communication with
other environments
36R Environments (2)
assign
37Objects and Encapsulation
statistical model
Sufficient statistics Order statistics
normal
binomial
unit-variance normal
38Encapsulation Example
Default modelled data class as.model lt-
function( data )Constructor object lt-
new.env() assign("data", data, envobject)
assign("n", length(data), envobject)
class(object) lt- "model" object N lt-
function(x) UseMethod("Size") Stats lt-
function(x) UseMethod("Stats") Mean lt-
function(x) UseMethod("Mean") Variance lt-
function(x) UseMethod("Variance")
39Encapsulation Example (2)
print.model lt- function(object) cat("Model
for data with", N(object),"observations.\n") Si
ze.model lt- function(object) get("n",
envobject) Stats.model lt-
function(object) sort( get("data",
envobject) ) Mean.model lt- function(object)
mean(get("data", envobject))
Variance.model lt- function(object)
var(get("data", envobject))
40Encapsulation Example (3)
gt myData lt- as.model( rnorm(100, 10, 2) ) gt
myData Model for data with 100 observations. gt gt
print.default(myData) ltenvironment
017DA720gt attr(,"class") 1 "model" gt gt Mean(
myData ) 1 9.970916 gt Variance( myData ) 1
4.203072 gt gt Stats( myData ) 1 4.463604
5.246475 5.276268 5.929042 6.031831 ... 8
7.032259 7.175986 7.224143 7.269212 7.828484
... 99 13.969482 14.986617
41Encapsulation Example (4)
as.modelNormal lt- function( data )Constructor
object lt- new.env() if( inherits(data,"model")
) "data" is a model object data lt-
get("data", data) COPY data over Note
this is wrong object lt- data This would
mean that "data" and this object
reference the same environment!
assign("data", data, envobject) class(object)
lt- c("modelNormal", "model") assign("mu",
mean(data), envobject) assign("sigma",
var(data).5, envobject) assign("n",
length(data), envobject) object
42Encapsulation Example (5)
gt myNormalData lt- as.modelNormal ( myData ) gt
myNormalData Model for data with 100
observations. gt Stats.modelNormal lt-
function(object) stats lt- c(Mean(object),
sqrt( Variance(object) )) names( stats) lt-
c("mu", "sigma") stats gt Stats(
myNormalData ) mu sigma 9.970916
2.050140 gt
43Encapsulation Example (6)
more efficient Mean.modelNormal lt-
function(object) get("mu", envobject)
Variance.modelNormal lt- function(object)
get("sigma", envobject)2 Stats.modelNormal
lt- function(object) get("mu", envobject)
get("sigma", envobject) stats lt-
c(mu, sigma) names( stats) lt- c("mu",
"sigma") stats
44Encapsulation Example (7)
updating objects update.model lt-
function(object, newdata) data lt-
c(get("data", envobject), newdata)
assign("data", data, envobject) assign("n",
length(data), envobject) Does the update
method need to be modified for modelNormal?
45Encapsulation Example (8)
gt myData Model for data with 100 observations. gt
gt someNewData lt- rnorm(50, 10, 2) gt update(
myData, someNewData ) gt gt myData Model for data
with 150 observations. gt myNormalData Model for
data with 100 observations. gt Stats(myNormalData)
mu sigma 9.970916 2.050140 gt update(
myNormalData, someNewData ) gt Stats( myNormalData
) mu sigma 9.970916 2.050140
46Encapsulation Example (9)
- update.modelNormal lt- function(object, newdata)
- update.model( object, newdata )
- data lt- get("data", envobject)
- assign("mu", mean(data), envobject)
- assign("sigma", var(data).5, envobject)
-
- gt Reset the data for this example
- gt myNormalData lt- as.modelNormal ( get("data",
myNormalData)1100 ) - gt update( myNormalData, someNewData )
- gt Stats(myNormalData)
- mu sigma
- 9.851004 2.014527
47Why Use Environments?
Function
Input Data
Output Data
Environment Data
48Example Architecture of lme()
- The mixed model function lme() returns objects of
class lme - Correlation models are passed into lme() as
objects belonging to the class corStruct - Random effects structures are stored efficiently
as objects belonging to the class pdMat
49pdMat
gt myPDMat lt- pdMat( matrix(c(2,1,1,2), 2, 2) ,
pdClass "pdCompSymm") gt myPDMat Positive
definite matrix structure of class pdCompSymm
representing ,1 ,2 1, 2 1 2,
1 2 gt print.default(myPDMat) 1 0.3465736
1.0986123 attr(,"Dimnames") attr(,"Dimnames")1
NULL attr(,"Dimnames")2 NULL attr(,"ncol") 1
2 attr(,"class") 1 "pdCompSymm" "pdMat"
50pdMat (2)
gt myPDMat lt- pdMat( matrix(c(2,1,1,3), 2, 2) ,
pdClass "pdNatural") gt myPDMat Positive
definite matrix structure of class pdNatural
representing ,1 ,2 1, 2 1 2,
1 3 gt print.default(myPDMat) 1 0.3465736
0.5493061 0.8670147 attr(,"Dimnames") attr(,"Dimna
mes")1 NULL attr(,"Dimnames")2 NULL attr(,
"class") 1 "pdNatural" "pdMat"
51pdMat (3)
gt myPDMat lt- pdMat( matrix(c(2,1,1,3), 2, 2) ,
pdClass "pdCompSymm") gt gt myPDMat Positive
definite matrix structure of class pdCompSymm
representing ,1 ,2 1, 2.500000
1.020621 2, 1.020621 2.500000
52corStruct
gt myCorr lt- corAR1(.5) gt myCorr Correlation
structure of class corAR1 representing Phi 0.5
gt print.default(myCorr) 1 1.098612 attr(,"formu
la") 1 attr(,"formula")attr(,"class") 1
"formula" attr(,"formula")attr(,".Environment") lte
nvironment 04332600gt attr(,"fixed") 1
FALSE attr(,"class") 1 "corAR1" "corStruct"
53Summary
- Why OOP?
- Code reusability and faster development
- Enforce intelligent design considerations
- Protection against errors
- OOP Basic Concepts
- Classes and Inheritance
- Objects and Encapsulation