Title: Statistics and Stamp Collecting
1Statistics and Stamp Collecting
- Haim Shore
- Ben-Gurion University, Israel
- Auburn Univ. Symposium, March, 2006
2- All science is either physics or
- stamp collecting
- Ernest Rutherford
- (discoverer of the proton, 1919)
3What is Stamp Collecting?
- Definition
- Scientific research restricted to the Discovery and Classification of the separate objects of inquiry
- Alternative notation: Taxonomy, Bug Collecting
4Examples of Stamp Collecting in Science
- Chemistry (until recently): Discovery of substances and their classification into separate groups of materials (as in the Periodic Table)
- Biology (until recently): Discovery of new species and their classification into groups
5How has physics evolved differently?
- At first, they had five Stamps
- The electric force
- The magnetic force
- The weak nuclear force
- The strong nuclear force
- Gravitation
6Then physicists started to DERIVE the
different Stamps from unifying theory.
Highlights
- Unifying the theory of the electric field (Faraday) and the theory of the magnetic field (Maxwell) via Maxwell's theory of electromagnetism (in the late 1860s)
- Unifying the theory of the weak force with the theory of the electromagnetic force into the Electroweak Theory by Weinberg and Salam (1967-1968)
- Most recently: attempts to unify gravity with the other forces of nature (superstring theories).
- Yet, physicists are still concerned.
7- How many Stamps do we have in Statistics?
8 9Stamps in Statistics
- The normal distribution, the gamma distribution,
the beta distribution, the Weibull distribution,
the Poisson distribution, the binomial
distribution, the negative binomial distribution,
the geometric distribution, the hypergeometric
distribution, the Pareto distribution, the
multinomial distribution, the Burr distribution,
the logistic distribution, Johnson distributions,
the Wishart distribution, the extreme-value
distribution, the t-distribution, the F
distribution, Gumbel distribution, the Beta-Stacy
distribution, the Laplace distribution, the
noncentral t, F and χ² distributions, various
distributions of the correlation coefficient,
Gompertz distribution, the Lognormal
distribution, Kolmogorov-Smirnov distributions,
Planck distributions, Polya distributions, Ord's
distributions, Pearson distributions, Lagrangian
distributions, Gould and Abel distributions,
Factorial Series distributions, Kemp
distributions, Digamma and Trigamma
distributions, Gram-Charlier Type B
distributions, discrete Ades distribution, Naor
distribution, Runs distributions, generalized
Polya-Aeppli distribution, Neyman Type A
distribution, Thomas distribution, Cauchy
distribution, Fisher-Stevens distribution, Furry
distribution, Borel distribution, Bravais
distribution, Tukey lambda distribution,
Dirichlet distribution, uniform distribution...
- We have run out of albums. Yet everybody is happy...
10- How can one expect a practitioner to identify
the correct distribution from such a multitude of
Stamps????
11- We are holding this symposium because
Statistics, from its very origin, has engaged in
Stamp Collecting
12- Response Modeling Methodology (RMM)-
- Perhaps an initial attempt to escape Stamp
Collecting in Statistics via a new approach to
modeling monotone convexity
13References
- Book
- 1. Shore, H. (2005). Response Modeling Methodology (RMM) - Empirical Modeling for Engineering and the Sciences. World Scientific Publishing Company, Singapore.
- Published Papers
- 1. Shore, H. (2000a). General control charts for variables. International Journal of Production Research, 38(8), 1875-1897.
- 2. Shore, H. (2001a). Modeling a non-normal response for quality improvement. International Journal of Production Research, 39(17), 4049-4063.
- 3. Shore, H. (2001b). Inverse normalizing transformations and a normalizing transformation. In Advances in Methodological and Applied Aspects of Probability and Statistics, N. Balakrishnan (Ed.), Gordon and Breach Science Publishers, Canada. V. 2, 131-146.
- 4. Shore, H. (2002a). Modeling a response with self-generated and externally-generated sources of variation. Quality Engineering, 14(4), 563-578.
- 5. Shore, H. (2002b). Response Modeling Methodology (RMM) - Exploring the implied error distribution. Communications in Statistics (Theory and Methods), 31(12), 2225-2249.
- 6. Shore, H., Brauner, N., and Shacham, M. (2002). Modeling physical and thermodynamic properties via inverse normalizing transformations. Industrial and Engineering Chemistry Research, 41, 651-656.
- 7. Shore, H. (2003). Response Modeling Methodology (RMM) - A new approach to model a chemo-response for a monotone convex/concave relationship. Computers and Chemical Engineering, 27(5), 715-726.
- 8. Shore, H. (2004a). Response Modeling Methodology (RMM) - Validating evidence from engineering and the sciences. Quality and Reliability Engineering International, 20, 61-79.
- 9. Shore, H. (2004c). Response Modeling Methodology (RMM) - Current distributions, transformations, and approximations as special cases of the RMM error distribution. Communications in Statistics (Theory and Methods), 33(7), 1491-1510.
- 10. Shore, H. (2004d). Response Modeling Methodology (RMM) - Maximum likelihood estimation procedures. Computational Statistics and Data Analysis. In press.
- 11. Shore, H. (2005). Accurate RMM-based approximations for the CDF of the normal distribution. Communications in Statistics (Theory and Methods), 34(3). In press.
14Two-Way Distinction (for modeling variation)
- Modeling Systematic Variation vs. Modeling Random Variation
- Theory-based Modeling vs. Empirical Modeling
15Two-way Partition of Modeling Variation
16Scientific Relational Models
- Kinetic Energy: $E_k(V) = M V^2/2$
- Radioactive Decay: $R(t) = R_0 \exp(-kt)$
- Antoine Equation: $\log(P) = A + B/(T + C)$
- Arrhenius Formula: $R_e(T) = A \exp[-E_a/(k_B T)]$
- Gompertz Growth-Model: $Y = b_1 \exp[-b_2 \exp(-b_3 x)]$
- Einstein's: $E = M C^2 / [1 - (v/C)^2]^{1/2}$
17Models of Random Variation
- Normal Distribution
- Exponential Distribution
- Cauchy Distribution
- Pearson Family of Distributions
- Johnson Families (SB, SU, Log Normal)
- Tukey's g- and h-Systems of Distributions
- Box-Cox transformations (and their inverse)
18Empirical Modeling of Random Variation (Current)
19Empirical Modeling of Systematic Variation
(Current)
20Empirical Modeling of Systematic Variation
(Current)
21Empirical Modeling of Variation (RMM)
22The RMM Model
- Two questions:
- - What is the error-structure of the relational models?
- - Are there certain patterns that repeatedly appear in current relational models? (assuming monotone convexity)
23The RMM Model (1st question)
- Antoine Equation (again)
- $\log(P) = A + B/(T + C)$, $B < 0$
- P: pressure, T: temperature (°K)
- What are the random errors associated with this model?
- Suppose that T was frozen (namely, constant). Then (one option):
- $\log(P) = A + B/(T + C) + \epsilon_2$, $B < 0$
- $\epsilon_2$: random error, representing
- Self-generated Variation
24The RMM Model (Contd)
- Since T is not constant (though assumed stable), we can write
- $T = \mu_T + \epsilon_1$
- $\epsilon_1$: a random additive error, representing
- Externally-generated random variation
- Antoine Equation, with the errors:
- $\log(P) = A + B/(\mu_T + \epsilon_1 + C) + \epsilon_2$, $B < 0$
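The two error sources in the Antoine equation can be illustrated with a small simulation. This is only a sketch: the constants `A`, `B`, `C`, the mean temperature and both standard deviations are hypothetical values chosen for illustration, and the two errors are drawn here as independent normal variables, which is a simplifying assumption.

```python
import random

def simulate_log_pressure(mu_T, A, B, C, sd_e1, sd_e2, rng):
    """One draw of log(P) = A + B/(mu_T + e1 + C) + e2."""
    e1 = rng.gauss(0.0, sd_e1)  # externally-generated: noise entering through T
    e2 = rng.gauss(0.0, sd_e2)  # self-generated: noise in the response itself
    return A + B / (mu_T + e1 + C) + e2

# Hypothetical illustrative values (not taken from the slides).
A, B, C = 10.0, -1500.0, -40.0
mu_T = 350.0

rng = random.Random(0)
draws = [simulate_log_pressure(mu_T, A, B, C, 2.0, 0.01, rng)
         for _ in range(10_000)]
mean_log_p = sum(draws) / len(draws)
noise_free = A + B / (mu_T + C)  # the error-free Antoine value
print(mean_log_p, noise_free)
```

Note that `e1` acts inside the nonlinear term while `e2` is simply added, so the two errors affect the shape of the distribution of log(P) differently.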
25The RMM Model (2nd question)
- What are the repeated patterns?
- Kinetic Energy: $E_k(V) = M V^2/2$
- Radioactive Decay: $R(t) = R_0 \exp(-kt)$
- Antoine Equation: $\log(P) = A + B/(T + C)$
- Arrhenius Formula: $R_e(T) = A \exp[-E_a/(k_B T)]$
- Gompertz Growth-Model: $Y = b_1 \exp[-b_2 \exp(-b_3 x)]$
26The RMM Model (2nd question)
- The Ladder ($V = \eta + \epsilon_1$)
- .
- .
- Exponential-exponential: $\exp[a \exp(V)]$
- Exponential-power: $\exp(a V^b)$
- Exponential: $\exp(V)$
- Power: $V^a$
- Linear: $V$
- $\eta$: the linear predictor
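One way to read the Ladder is that familiar relational models sit on its rungs, up to a multiplicative constant. The sketch below checks this numerically for two of them, under my own choice of how to map each model onto a rung; the constants are hypothetical and the function names `exp_power` and `exp_exp` are introduced here for illustration only.

```python
import math

def exp_power(v, a, b):
    """Exponential-power rung: exp(a * v**b)."""
    return math.exp(a * v ** b)

def exp_exp(v, a):
    """Exponential-exponential rung: exp(a * exp(v))."""
    return math.exp(a * math.exp(v))

# Hypothetical constants, chosen only for illustration.
A, Ea_over_kB = 2.0, 500.0
b1, b2, b3 = 10.0, 3.0, 0.5

def arrhenius(T):
    return A * math.exp(-Ea_over_kB / T)

def gompertz(x):
    return b1 * math.exp(-b2 * math.exp(-b3 * x))

# Arrhenius as an exponential-power rung with V = T, a = -Ea/kB, b = -1.
arrhenius_on_ladder = A * exp_power(300.0, -Ea_over_kB, -1)
# Gompertz as an exponential-exponential rung with V = -b3*x, a = -b2.
gompertz_on_ladder = b1 * exp_exp(-b3 * 4.0, -b2)

print(arrhenius(300.0), arrhenius_on_ladder)
print(gompertz(4.0), gompertz_on_ladder)
```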
27The RMM Model (axiomatic derivation)
- Assume that Self-generated and Externally-generated components of variation interact:
- $Y = f_1(\eta, \epsilon_1; \theta_1) \, f_2(\epsilon_2; \theta_2)$
- $f_1$: modeling Externally-generated variation
- $f_2$: modeling Self-generated variation
- We now wish to model $f_1$ and $f_2$
28The RMM Model (axiomatic derivation)
- Modeling Self-generated random variation ($f_2$)
- One option (a proportional model):
- $f_2(\epsilon_2; \theta_2) = M(1 + \epsilon_2) = M(1 + \sigma_{\epsilon_2} Z_2)$
- M: a central value, for example the median
- $Z_2$: a standard normal r.v.
- Alternatively ($\epsilon_2 \ll 1$): $f_2(\epsilon_2; \theta_2) = \exp(\mu_2 + \sigma_{\epsilon_2} Z_2)$
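The step from the proportional form to the exponential form can be sketched with a first-order Taylor expansion. Identifying $\mu_2$ with $\log(M)$ is an assumption made here to connect the two forms.

```latex
\begin{align*}
\exp(\epsilon_2) &= 1 + \epsilon_2 + \tfrac{1}{2}\epsilon_2^2 + \cdots
  \approx 1 + \epsilon_2 \quad (\epsilon_2 \ll 1), \\
M(1 + \epsilon_2) &\approx M \exp(\epsilon_2)
  = \exp\bigl(\log M + \sigma_{\epsilon_2} Z_2\bigr)
  = \exp\bigl(\mu_2 + \sigma_{\epsilon_2} Z_2\bigr),
  \qquad \mu_2 = \log M .
\end{align*}
```

The neglected term is of order $\epsilon_2^2$, so the two forms agree closely only when the self-generated error is small relative to 1.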
29The RMM Model (axiomatic derivation)
- Modeling Externally-generated random and systematic variation ($f_1$)
- The Ladder of fundamental monotone convex/concave functions
30The RMM Model (axiomatic derivation)
- The Ladder ($V = \eta + \epsilon_1$)
- .
- .
- Exponential-exponential: $\exp[a \exp(V)]$
- Exponential-power: $\exp(a V^b)$
- Exponential: $\exp(V)$
- Power: $V^a$
- Linear: $V$
31The RMM Model (axiomatic derivation)
- Modeling Externally-generated random and systematic variation
- $f_1(\eta, \epsilon_1; \theta_1) = \exp\{(\alpha/\lambda)[(\eta + \epsilon_1)^\lambda - 1]\}$
- $\eta$: the Linear Predictor
- Does this represent well the Ladder?
32The RMM Model (axiomatic derivation)
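- $f_1(\eta, \epsilon_1; \theta_1) = \exp\{(\alpha/\lambda)[(\eta + \epsilon_1)^\lambda - 1]\}$
- Special Cases
- Linear: $\lambda = 0$, $\alpha = 1$
- Power: $\lambda = 0$, $\alpha \neq 1$
- Exponential: $\lambda = 1$
- Exponential-power: $\lambda \neq 0$, $\lambda \neq 1$
- Exponential-exponential: ??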
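The special cases can be checked numerically. The sketch below evaluates exp{(alpha/lam)[v^lam - 1]} at v = eta + e1, treating lam = 0 as the limit in which (v^lam - 1)/lam tends to log(v); the function name `f1`, the tolerance `eps` and the test value of `v` are choices made here for illustration.

```python
import math

def f1(v, alpha, lam, eps=1e-12):
    """exp{(alpha/lam) * (v**lam - 1)}, with the lam -> 0 limit v**alpha."""
    if v <= 0:
        raise ValueError("v = eta + e1 must be positive")
    if abs(lam) < eps:
        # (v**lam - 1)/lam -> log(v), so the model reduces to v**alpha
        return v ** alpha
    return math.exp((alpha / lam) * (v ** lam - 1.0))

v = 2.5
linear = f1(v, alpha=1.0, lam=0.0)       # reduces to v
power = f1(v, alpha=3.0, lam=0.0)        # reduces to v**3
exponential = f1(v, alpha=1.0, lam=1.0)  # reduces to exp(v - 1)
near_zero = f1(v, alpha=3.0, lam=1e-6)   # approaches the power case
print(linear, power, exponential, near_zero)
```

The last value shows the continuity in `lam`: a very small positive `lam` gives nearly the same result as the power case at `lam = 0`.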
33The RMM Model (axiomatic derivation)
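- $f_1(\eta, \epsilon_1; \theta_1) = \exp\{(\alpha/\lambda)[(\eta + \epsilon_1)^\lambda - 1]\}$
- Insert (in place of $\eta + \epsilon_1$): $\exp\{(\beta/\kappa)[(\eta + \epsilon_1)^\kappa - 1]\}$
- For $\kappa = 0$, $\beta = 1$:
- $\exp\{(\beta/\kappa)[(\eta + \epsilon_1)^\kappa - 1]\} = \eta + \epsilon_1$
- Adding two parameters allows a repeat of the cycle Linear-power-exponential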
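A minimal sketch of the nesting idea follows, assuming the inserted expression replaces eta + e1 inside the outer function. The helper names `rung` and `f1_extended` are hypothetical. With kappa = 0 and beta = 1 the inner layer is the identity, and with kappa = 1 the composite has a double-exponential shape.

```python
import math

def rung(v, scale, power, eps=1e-12):
    """exp{(scale/power) * (v**power - 1)}; the power -> 0 limit is v**scale."""
    if v <= 0:
        raise ValueError("argument must be positive")
    if abs(power) < eps:
        return v ** scale
    return math.exp((scale / power) * (v ** power - 1.0))

def f1_extended(v, alpha, lam, beta, kappa):
    """Outer layer (alpha, lam) applied to an inner layer (beta, kappa)."""
    inner = rung(v, beta, kappa)   # always positive for v > 0
    return rung(inner, alpha, lam)

v = 2.5
# kappa = 0, beta = 1: the inner layer is the identity, so the original
# two-parameter model is recovered (here its exponential case, lam = 1).
same_as_original = f1_extended(v, alpha=1.0, lam=1.0, beta=1.0, kappa=0.0)
# kappa = 1, beta = 1, lam = 1: exp{alpha * [exp(v - 1) - 1]}.
double_exponential = f1_extended(v, alpha=2.0, lam=1.0, beta=1.0, kappa=1.0)
print(same_as_original, double_exponential)
```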
34The RMM Model (axiomatic derivation)
- Significance of the model for $f_1$:
- Via RMM, the Ladder becomes a conveyor that takes the modeler, in a continuous manner, from one point to the next on the Spectrum of convexity intensity
- Most important:
- The structure of the model is determined from the data
- (not a priori, as in GLM)
35The RMM Model (axiomatic derivation)
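- The complete model:
- $f_1(\eta, \epsilon_1; \theta_1) \, f_2(\epsilon_2; \theta_2) = \exp\{(\alpha/\lambda)[(\eta + \epsilon_1)^\lambda - 1] + \mu_2 + \epsilon_2\}$
- $\epsilon_1 = \sigma_{\epsilon_1} Z_1$, $\epsilon_2 = \sigma_{\epsilon_2} Z_2$.
- Assumption: $Z_1$, $Z_2$ from a bivariate standard normal distribution with correlation $\rho$
- Three sets of parameters:
- - The linear predictor: $\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots$
- - Structural parameters: $\alpha$, $\lambda$, $\mu_2$
- - Error parameters: $\rho$, $\sigma_{\epsilon_1}$, $\sigma_{\epsilon_2}$
- Not all parameters were created equal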
36The RMM Model (axiomatic derivation)
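- Write $(Z_2 \mid Z_1 = z_1) = \rho z_1 + (1 - \rho^2)^{1/2} Z$
- The model, expressed in terms of independent standard normal variables ($Z_1$, $Z_2$ now independent):
- $W = \log(Y)$
- $= (\alpha/\lambda)[(\eta + \sigma_{\epsilon_1} Z_1)^\lambda - 1] + \mu_2 + \sigma_{\epsilon_2}[\rho Z_1 + (1 - \rho^2)^{1/2} Z_2]$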
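The conditional representation is the standard way to build a correlated pair of standard normal variables from two independent ones. The sketch below generates such pairs and checks that the sample correlation is close to the target; the target value 0.6, the sample size and the seed are arbitrary choices for illustration.

```python
import math
import random

def correlated_pair(rho, rng):
    """Return (z1, z2), standard normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)  # independent of z1
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z
    return z1, z2

def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
pairs = [correlated_pair(0.6, rng) for _ in range(20_000)]
r_hat = sample_correlation(pairs)
print(r_hat)
```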
37The RMM Model (axiomatic derivation)
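- $W = \log(Y)$
- $= (\alpha/\lambda)[(\eta + \sigma_{\epsilon_1} Z_1)^\lambda - 1] + \mu_2 + \sigma_{\epsilon_2}[\rho Z_1 + (1 - \rho^2)^{1/2} Z_2]$
- For $\rho = \pm 1$, $\eta = 1$ ($\mu_2 = \log(\text{Med.})$, and 4 parameters):
- $W = (\alpha/\lambda)[(1 + \sigma_{\epsilon_1} Z)^\lambda - 1] + \mu_2 + \sigma_{\epsilon_2} \rho Z$
- Inverse Normalizing Transformation (INT)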
38The RMM Model (axiomatic derivation)
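- Eight variations of the RMM model
- Two ways to write each error ($\epsilon_2 \ll 1$, $\epsilon_1 \ll \eta$):
- $1 + \epsilon_2 \approx \exp(\epsilon_2)$,
- $(\eta + \epsilon_1)^\lambda = \eta^\lambda (1 + \epsilon_1/\eta)^\lambda \approx \exp[\lambda \log(\eta) + \lambda \epsilon_1/\eta]$
- Two equivalent forms to express the standard normal variables as independent (uncorrelated):
- $(Z_2 \mid Z_1 = z_1) = \rho z_1 + (1 - \rho^2)^{1/2} Z$
- $(Z_1 \mid Z_2 = z_2) = \rho z_2 + (1 - \rho^2)^{1/2} Z$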
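How good the two small-error approximations are depends on the size of the errors. The sketch below measures the relative gap between the exact and approximate expressions for a few error magnitudes; the values of `eta` and `lam` are hypothetical.

```python
import math

def relative_gap(exact, approx):
    """Relative difference between an exact value and its approximation."""
    return abs(approx - exact) / abs(exact)

eta, lam = 5.0, 0.7  # hypothetical values for illustration

def gap_self_generated(e2):
    """Gap between 1 + e2 and exp(e2)."""
    return relative_gap(1.0 + e2, math.exp(e2))

def gap_external(e1):
    """Gap between (eta + e1)**lam and exp(lam*log(eta) + lam*e1/eta)."""
    exact = (eta + e1) ** lam
    approx = math.exp(lam * math.log(eta) + lam * e1 / eta)
    return relative_gap(exact, approx)

for err in (0.01, 0.1, 0.5):
    print(err, gap_self_generated(err), gap_external(err))
```

Both gaps grow roughly with the square of the error, which is why the alternative forms are interchangeable only under the stated conditions on the error sizes.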
39RMM Estimation
- Phase 1: Estimating the linear predictor (Set I)
- Canonical Correlation Analysis plus Linear Regression
- Phase 2: Iteratively alternating between estimating Set II (the Structural Parameters) via W-NL-LS and Set III (the Error Parameters) via maximization of the log-likelihood
40RMM Modeling Random Variation
- Modeling via the INT ($\rho = \pm 1$, one variation):
- $Y = \exp\{(A/\lambda)[(1 + BZ)^\lambda - 1] + \log(M) + DZ\}$
- Z: standard normal
- Examples:
- - Approximations for the Poisson quantile and the CDF of the Normal distribution
- - Approximations for gamma
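Because Z is standard normal, the INT gives quantiles of Y directly: substitute the standard normal quantile for Z. The sketch below does this with the standard library; the function name `int_quantile` and all parameter values are hypothetical, and the case lam = 0 is handled as the logarithmic limit.

```python
import math
from statistics import NormalDist

def int_quantile(p, M, A, lam, B, D):
    """p-th quantile of Y = exp{(A/lam)[(1 + B*z)**lam - 1] + log(M) + D*z}."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    base = 1.0 + B * z
    if base <= 0:
        raise ValueError("1 + B*z must be positive")
    if lam == 0:
        core = A * math.log(base)  # limit of (A/lam)(base**lam - 1)
    else:
        core = (A / lam) * (base ** lam - 1.0)
    return math.exp(core + math.log(M) + D * z)

# Hypothetical parameter values, for illustration only.
params = dict(M=3.0, A=0.5, lam=0.8, B=0.1, D=0.2)
q10 = int_quantile(0.10, **params)
q50 = int_quantile(0.50, **params)  # z = 0, so this is the median M
q90 = int_quantile(0.90, **params)
print(q10, q50, q90)
```

At p = 0.5 the exponent reduces to log(M), so M plays the role of the median.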
41RMM Modeling (INT)- The Poisson
Source: Communications in Statistics (Theory and Methods), 33(7), 2004
42RMM Modeling (INT)- CDF of Standard Normal
Distribution
For example: $y = -\log(1 - P) = -\log(1/2) \exp\{B[\exp(Cz) - 1] + Dz\}$
(Source: Communications in Statistics (Theory and Methods), 34(3), 2005)
43RMM Modeling (INT)- Gamma
44RMM Modeling (for sample data)
- Fitting and estimating procedures for modeling random variation have been developed:
- Maximum likelihood estimation
- Estimation by Percentile-Matching
- Estimation by Moment-Matching (two-moment, partial and complete)
45Comparative Evaluation
- Five families of distributions were fitted to 20 distributions. The parameters of the FITTED distributions, θ, were determined so as to minimize the L2 norm:
- L2 (formula not preserved in this transcript)
46The Fitted Families
- Pearson
- Burr F(x)
- Generalized Lambda
47The Fitted Families (Contd)
- Shore family of distributions
- RMM
- $\log(X) = \log(M) + (A/B)[\exp(BZ) - 1] + CZ$
48The Approximated Distributions
- No.  Distribution                   Skewness     Kurtosis
- 1    Chi-Square(5)                  1.265        2.4
- 2    Log-Normal(0, 1/3)             1.0687       5.097
- 3    Exponential(0.5)               2            6
- 4    F-Ratio(5,10)                  3.867        50.86153
- 5    Inverse-Gamma(3,5)             11.1287      471.557
- 6    Maxwell-Boltzmann(4)           0.485693     0.108164
- 7    Inverse-Weibull(4,2)           5.53489      199.792
- 8    Mielke's Beta-Kappa(1,2,2)     18.6826      730.993
- 9    Pareto-II(3, 1)                14.6373      976936
- 10   Half-Normal(4)                 0.995272     0.869177
- 11   Gompertz(2,3)                  0.970789     0.631851
- 12   Standard Half-Logistic         1.54033      3.58374
- 13   Exponential-Power(2, 4)        -0.648501    0.110401
- 14   Chi(4)                         0.405696     0.0592951
- 15   Pareto-III(1, 1)               2.53252      10.1756
- 16   Pareto(10, 5)                  4.64758      67.8
- 17   Weibull(3,4)                   0.168103     -0.270536
49Results (based on optimal L2 values)
50Results (based on optimal L2 values)
Box and Whisker diagram for the five families of
distributions compared
51 52Intra-Galactic Velocities (modeling random var.)
Average relative error: Beta (KD) 0.28; RMM 0.39.
Source: Computational Statistics and Data Analysis, in press
53The Wave-Soldering Process (modeling systematic
variation)
- GLM: Myers, Montgomery, Vining (2002, Book, John Wiley & Sons)
- $\log(\mu) = 3.08 + 0.44C - 0.40G + 0.28AC - 0.31BD$
- $\mu^{1/2} = 5.02 + 0.43A - 0.40B + 1.19C - 0.39E - 1.10G + 0.58AC - 0.96BD$ (Poisson Model)
- RMM: Shore (2005, Book, World Scientific Publishing)
- $\eta = -0.1786 + 0.16A - 0.14B + 0.53C - 0.16E - 0.49G + 0.22AC - 0.43BD$ ($R^2$-adj. = 0.848)
- Comparison with the square-root link:
- For A/C: RMM 0.164/0.526 = 0.31; GLM 0.43/1.19 = 0.36
- For (AC)/(BD): RMM 0.224/(-0.431) = -0.52; GLM 0.58/(-0.96) = -0.60.
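The comparison rests on coefficient ratios, presumably because the two linear predictors are on different scales and only relative effect sizes are comparable. The arithmetic can be reproduced as follows, using the coefficient values quoted on the slide; the dictionary layout and the helper `ratio` are illustrative choices.

```python
# Coefficients quoted on the slide for the terms used in the comparison.
rmm = {"A": 0.164, "C": 0.526, "AC": 0.224, "BD": -0.431}
glm_sqrt = {"A": 0.43, "C": 1.19, "AC": 0.58, "BD": -0.96}

def ratio(coefs, numerator, denominator):
    """Ratio of two coefficients within the same fitted model."""
    return coefs[numerator] / coefs[denominator]

for name, coefs in (("RMM", rmm), ("GLM", glm_sqrt)):
    a_over_c = ratio(coefs, "A", "C")
    ac_over_bd = ratio(coefs, "AC", "BD")
    print(f"{name}: A/C = {a_over_c:.2f}, AC/BD = {ac_over_bd:.2f}")
```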
54RMM Modeling in Chemical Engineering
- DIPPR (Design Institute for Physical Property Data): a leading database for chemical properties and their modeling
- Notation
- - The acceptable model: model with best fit relative to the data set in the database
- - Goodness-of-fit: by various measures of fit
- (for example R², AIC, MSE)
55RMM Modeling in Chemical Engineering (Contd)
- Substances examined (5): Water, Acetone, Benzoic acid, Ethane, Iso-propanol
- Properties examined (13): Solid density, Liquid density, Solid vapor pressure, Vapor pressure, Heat of vaporization, Solid heat capacity, Liquid heat capacity, Ideal gas heat capacity, Liquid viscosity, Vapor viscosity, Liquid thermal conductivity, Vapor thermal conductivity, Surface tension
56RMM Modeling in Chemical Engineering (Contd)
Preferred Model (by substance, across properties
and goodness-of-fit measures)
57RMM Modeling in Chemical Engineering (Contd)
Preferred Model (by property, across substances
and goodness-of-fit measures)
58Software Reliability-Growth Models
Example 1- Number of cumulative detected failures
vs. number of software testing hours (Musa's M1
data set)
59Software Reliability-Growth Models
- IV. The Power-Law Process (PLP), proposed by Duane (1964):
- $m(t) = a t^b$, $a > 0$, $b > 0$. (2.29)
- V. The log-power model introduced by Xie and Zhao (1993):
- $m(t) = a[\log(1 + t)]^b$, $a > 0$, $b > 0$. (2.30)
- VI. The generalized power family models (GPFM) by Knafl and Morgan (1996):
- $m(t) = a[k(t)]^b$, (2.31)
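Read as mean-value functions m(t) for the cumulative number of detected failures, the three models share one shape: a constant times a function of t raised to a power. The sketch below codes them on that reading; the function names and parameter values are hypothetical, and treating models IV and V as the choices k(t) = t and k(t) = log(1 + t) follows from inspecting the formulas rather than from the slide.

```python
import math

def gpfm_mean(t, a, b, k):
    """Generalized power family: m(t) = a * k(t)**b, for a callable k."""
    return a * k(t) ** b

def plp_mean(t, a, b):
    """Power-Law Process: m(t) = a * t**b."""
    return a * t ** b

def log_power_mean(t, a, b):
    """Log-power model: m(t) = a * log(1 + t)**b."""
    return a * math.log(1.0 + t) ** b

# Hypothetical parameter values, for illustration only.
a, b = 2.0, 0.5
hours = [1.0, 10.0, 100.0]
plp = [plp_mean(t, a, b) for t in hours]
log_power = [log_power_mean(t, a, b) for t in hours]
# The same curves obtained through the generalized family.
plp_via_gpfm = [gpfm_mean(t, a, b, lambda s: s) for t in hours]
log_power_via_gpfm = [gpfm_mean(t, a, b, lambda s: math.log(1.0 + s))
                      for t in hours]
print(plp, log_power)
```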
60Software Reliability-Growth Models
- Figure 16.4. Example 2 - Scatter plots of residuals (vs. exact value of Y), from fitting (clockwise) Model IV, Model V and the RMM model (16.3) and (16.10)
61RMM Modeling
- Thank you
- (shor_at_bgu.ac.il)