Title: Statistics and Stamp Collecting


1
Statistics and Stamp Collecting
  • Haim Shore
  • Ben-Gurion University, Israel
  • Auburn Univ. Symposium, March, 2006

2
  • "All science is either physics or stamp collecting."
  • Ernest Rutherford
  • (discoverer of the proton, 1919)

3
What is Stamp Collecting?
  • Definition
  • Scientific research restricted to the Discovery
    and Classification of the separate objects of
    inquiry
  • Alternative names: Taxonomy, Bug Collecting

4
Examples of Stamp Collecting in Science
  • Chemistry (until recently): Discovery of
    substances and their classification into separate
    groups of materials (as in the Periodic Table)
  • Biology (until recently): Discovery of new
    species and their classification into groups

5
How has physics evolved differently?
  • At first, they had five Stamps:
  • The electric force
  • The magnetic force
  • The weak nuclear force
  • The strong nuclear force
  • Gravitation

6
Then physicists started to DERIVE the
different Stamps from a unifying theory.
Highlights:
  • Unifying the theory of the electric field
    (Faraday) and the theory of the magnetic field
    (Maxwell) via Maxwell's theory of electromagnetism
    (in the late 1860s)
  • Unifying the theory of the weak force with the
    theory of the electromagnetic force into the
    Electroweak Theory by Weinberg and Salam
    (1967-1968)
  • Most recently: attempts to unify gravity with the
    other forces of nature (superstring theories).
  • Yet, physicists are still concerned.

7
  • How many Stamps do we have in Statistics?

8
  • A Partial List

9
Stamps in Statistics
  • The normal distribution, the gamma distribution,
    the beta distribution, the Weibull distribution,
    the Poisson distribution, the binomial
    distribution, the negative binomial distribution,
    the geometric distribution, the hypergeometric
    distribution, the Pareto distribution, the
    multinomial distribution, the Burr distribution,
    the logistic distribution, Johnson distributions,
    the Wishart distribution, the extreme-value
    distribution, the t-distribution, the F
    distribution, Gumbel distribution, the Beta-Stacy
    distribution, the Laplace distribution, the
    noncentral t, F and χ² distributions, various
    distributions of the correlation coefficient,
    Gompertz distribution, the Lognormal
    distribution, Kolmogorov-Smirnov distributions,
    Planck distributions, Polya distributions, Ord's
    distributions, Pearson distributions, Lagrangian
    distributions, Gould and Abel distributions,
    Factorial Series distributions, Kemp
    distributions, Digamma and Trigamma
    distributions, Gram-Charlier Type B
    distributions, discrete Ades distribution, Naor
    distribution, Runs distributions, generalized
    Polya-Aeppli distribution, Neyman Type A
    distribution, Thomas distribution, Cauchy
    distribution, Fisher-Stevens distribution, Furry
    distribution, Borel distribution, Bravais
    distribution, Tukey lambda distribution,
    Dirichlet distribution, uniform distribution.....
  • We have run out of albums. Yet everybody is
    happy...

10
  • How can one expect a practitioner to identify
    the correct distribution from such a multitude of
    Stamps????

11
  • We are holding this symposium because
    Statistics, from its very origin, engaged in
    Stamp Collecting

12
  • Response Modeling Methodology (RMM)-
  • Perhaps an initial attempt to escape Stamp
    Collecting in Statistics via a new approach to
    modeling monotone convexity

13
References
  • Book
  • [1] Shore, H. (2005). Response Modeling
    Methodology (RMM): Empirical Modeling for
    Engineering and the Sciences. World Scientific
    Publishing Company, Singapore.
  • Published Papers
  • [1] Shore, H. (2000a). General control charts for
    variables. International Journal of Production
    Research, 38(8), 1875-1897.
  • [2] Shore, H. (2001a). Modeling a non-normal
    response for quality improvement. International
    Journal of Production Research, 39(17),
    4049-4063.
  • [3] Shore, H. (2001b). Inverse normalizing
    transformations and a normalizing transformation.
    In Advances in Methodological and Applied
    Aspects of Probability and Statistics, N.
    Balakrishnan (Ed.), Gordon and Breach Science
    Publishers, Canada. V. 2, 131-146.
  • [4] Shore, H. (2002a). Modeling a response with
    self-generated and externally-generated sources
    of variation. Quality Engineering, 14(4),
    563-578.
  • [5] Shore, H. (2002b). Response Modeling
    Methodology (RMM): Exploring the implied error
    distribution. Communications in Statistics
    (Theory and Methods), 31(12), 2225-2249.
  • [6] Shore, H., Brauner, N., and Shacham, M.
    (2002). Modeling physical and thermodynamic
    properties via inverse normalizing
    transformations. Industrial and Engineering
    Chemistry Research, 41, 651-656.
  • [7] Shore, H. (2003). Response Modeling
    Methodology (RMM): A new approach to model a
    chemo-response for a monotone convex/concave
    relationship. Computers and Chemical Engineering,
    27(5), 715-726.
  • [8] Shore, H. (2004a). Response Modeling
    Methodology (RMM): Validating evidence from
    engineering and the sciences. Quality and
    Reliability Engineering International, 20, 61-79.
  • [9] Shore, H. (2004c). Response Modeling
    Methodology (RMM): Current distributions,
    transformations, and approximations as special
    cases of the RMM error distribution.
    Communications in Statistics (Theory and
    Methods), 33(7), 1491-1510.
  • [10] Shore, H. (2004d). Response Modeling
    Methodology (RMM): Maximum likelihood estimation
    procedures. Computational Statistics and Data
    Analysis. In press.
  • [11] Shore, H. (2005). Accurate RMM-based
    approximations for the CDF of the normal
    distribution. Communications in Statistics
    (Theory and Methods), 34(3). In press.

14
Two-Way Distinction (for modeling variation)
  • Modeling Systematic Variation vs. Modeling Random
    Variation
  • Theory-based Modeling vs. Empirical Modeling

15
Two-way Partition of Modeling Variation
16
Scientific Relational Models
  • Kinetic Energy: $E_k(V) = M(V^2/2)$
  • Radioactive Decay: $R(t) = R_0 \exp(-kt)$
  • Antoine Equation: $\log(P) = A + B/(T + C)$
  • Arrhenius Formula: $Re(T) = A \exp[-E_a/(k_B T)]$
  • Gompertz Growth-Model: $Y = b_1 \exp[-b_2 \exp(-b_3 x)]$
  • Einstein's: $E = MC^2 / [1 - (v/C)^2]^{1/2}$

17
Models of Random Variation
  • Normal Distribution
  • Exponential Distribution
  • Cauchy Distribution
  • Pearson Family of Distributions
  • Johnson Families ($S_B$, $S_U$, Log-Normal)
  • Tukey's g- and h-Systems of Distributions
  • Box-Cox transformations (and their inverse)

18
Empirical Modeling of Random Variation (Current)
19
Empirical Modeling of Systematic Variation
(Current)
20
Empirical Modeling of Systematic Variation
(Current)
21
Empirical Modeling of Variation (RMM)
22
The RMM Model
  • Two questions
  • - What is the error-structure of the relational
    models?
  • - Are there certain patterns that repeatedly
    appear in current relational models? (assuming
    monotone convexity)

23
The RMM Model (1st question)
  • Antoine Equation (again):
  • $\log(P) = A + B/(T + C)$, $B < 0$
  • P: pressure, T: temperature (°K)
  • What are the random errors associated with this
    model?
  • Suppose that T was frozen (namely, constant).
    Then (one option):
  • $\log(P) = A + B/(T + C) + \varepsilon_2$, $B < 0$
  • $\varepsilon_2$: Random error, representing
  • Self-generated Variation

24
The RMM Model (Contd)
  • Since T is not constant (though assumed stable),
    we can write:
  • $T = \mu_T + \varepsilon_1$
  • $\varepsilon_1$: a random additive error, representing
  • Externally-generated random variation
  • Antoine Equation, with the errors:
  • $\log(P) = A + B/(\mu_T + \varepsilon_1 + C) + \varepsilon_2$, $B < 0$
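
A minimal simulation sketch of the two error sources in the Antoine equation may help here. It assumes both errors are normal with mean zero; the constants A, B, C, the mean temperature, and the standard deviations are hypothetical values chosen only for illustration and are not taken from the presentation.

```python
import math
import random

def antoine_log_pressure(T, A, B, C):
    """Antoine equation: log(P) = A + B / (T + C), with B < 0."""
    return A + B / (T + C)

def simulate_log_pressure(mu_T, A, B, C, sd_e1, sd_e2, rng):
    """Draw one noisy observation of log(P) with both error sources."""
    e1 = rng.gauss(0.0, sd_e1)  # externally-generated: T fluctuates around mu_T
    e2 = rng.gauss(0.0, sd_e2)  # self-generated: present even if T were frozen
    return antoine_log_pressure(mu_T + e1, A, B, C) + e2

# Hypothetical constants (illustration only, not from the slides)
A, B, C = 8.0, -1700.0, -40.0
mu_T = 373.0

rng = random.Random(0)
draws = [simulate_log_pressure(mu_T, A, B, C, 0.5, 0.01, rng)
         for _ in range(10_000)]
mean_draw = sum(draws) / len(draws)
noise_free = antoine_log_pressure(mu_T, A, B, C)
```

With small errors the simulated mean stays close to the noise-free value, while the spread of `draws` mixes both sources of variation.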

25
The RMM Model (2nd question)
  • What are the repeated patterns?
  • Kinetic Energy: $E_k(V) = M(V^2/2)$
  • Radioactive Decay: $R(t) = R_0 \exp(-kt)$
  • Antoine Equation: $\log(P) = A + B/(T + C)$
  • Arrhenius Formula: $Re(T) = A \exp[-E_a/(k_B T)]$
  • Gompertz Growth-Model: $Y = b_1 \exp[-b_2 \exp(-b_3 x)]$

26
The RMM Model (2nd question)
  • The Ladder ($V = \eta + \varepsilon_1$):
  • .
  • .
  • Exponential-exponential: $\exp[\alpha \exp(V)]$
  • Exponential-power: $\exp(\alpha V^\beta)$
  • Exponential: $\exp(V)$
  • Power: $V^\alpha$
  • Linear: $V$
  • $\eta$: The linear predictor

27
The RMM Model (axiomatic derivation)
  • Assume that Self-generated and Externally-generated
    components of variation interact:
  • $Y = f_1(\eta, \varepsilon_1; \theta_1) \, f_2(\varepsilon_2; \theta_2)$
  • $f_1$: Modeling Externally-generated variation
  • $f_2$: Modeling Self-generated variation
  • We now wish to model $f_1$ and $f_2$

28
The RMM Model (axiomatic derivation)
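  • Modeling Self-generated random variation ($f_2$)
  • One option (a proportional model):
  • $f_2(\varepsilon_2; \theta_2) = M(1 + \varepsilon_2) = M(1 + \sigma_{\varepsilon_2} Z_2)$
  • M: A central value, for example the median
  • $Z_2$: A standard normal r.v.
  • Alternatively ($\varepsilon_2 \ll 1$): $f_2(\varepsilon_2; \theta_2) = \exp(\mu_2 + \sigma_{\varepsilon_2} Z_2)$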

29
The RMM Model (axiomatic derivation)
  • Modeling Externally-generated random and
    systematic variation (f1)
  • The Ladder of fundamental monotone
    convex/concave functions

30
The RMM Model (axiomatic derivation)
  • The Ladder ($V = \eta + \varepsilon_1$):
  • .
  • .
  • Exponential-exponential: $\exp[\alpha \exp(V)]$
  • Exponential-power: $\exp(\alpha V^\beta)$
  • Exponential: $\exp(V)$
  • Power: $V^\alpha$
  • Linear: $V$

31
The RMM Model (axiomatic derivation)
  • Modeling Externally-generated random and
    systematic variation
  • $f_1(\eta, \varepsilon_1; \theta_1) = \exp\{(\alpha/\lambda)[(\eta + \varepsilon_1)^\lambda - 1]\}$
  • $\eta$: The Linear Predictor
  • Does this represent well the Ladder?

32
The RMM Model (axiomatic derivation)
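  • $f_1(\eta, \varepsilon_1; \theta_1) = \exp\{(\alpha/\lambda)[(\eta + \varepsilon_1)^\lambda - 1]\}$
  • Special Cases:
  • Linear: $\lambda = 0$, $\alpha = 1$
  • Power: $\lambda = 0$, $\alpha \neq 1$
  • Exponential: $\lambda = 1$
  • Exponential-power: $\lambda \neq 0$, $\lambda \neq 1$
  • Exponential-exponential: ??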
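
The special cases can be checked numerically. The sketch below reads the model for f1 as exp{(α/λ)[V^λ - 1]} with V = η + ε1, and treats λ = 0 as the limit λ → 0, where (V^λ - 1)/λ tends to log(V) and f1 reduces to V^α. The function name and the test values are illustrative assumptions.

```python
import math

def f1(v, alpha, lam):
    """exp{(alpha/lam) * (v**lam - 1)} for v > 0.

    lam = 0 is handled as the limit lam -> 0, which gives v**alpha.
    """
    if v <= 0:
        raise ValueError("v = eta + e1 must be positive")
    if abs(lam) < 1e-12:
        return v ** alpha
    return math.exp((alpha / lam) * (v ** lam - 1.0))

v = 2.5
linear = f1(v, alpha=1.0, lam=0.0)        # equals v
power = f1(v, alpha=3.0, lam=0.0)         # equals v**3
exponential = f1(v, alpha=1.0, lam=1.0)   # equals exp(v - 1)
near_zero = f1(v, alpha=2.0, lam=1e-6)    # close to v**2 (continuity in lam)
```

Moving λ continuously between 0 and 1 moves the function between the power and exponential rungs without switching model form.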

33
The RMM Model (axiomatic derivation)
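  • $f_1(\eta, \varepsilon_1; \theta_1) = \exp\{(\alpha/\lambda)[(\eta + \varepsilon_1)^\lambda - 1]\}$
  • Insert $\exp\{(\beta/\kappa)[(\eta + \varepsilon_1)^\kappa - 1]\}$ in place of $(\eta + \varepsilon_1)$
  • For $\kappa = 0$, $\beta = 1$:
  • $\exp\{(\beta/\kappa)[(\eta + \varepsilon_1)^\kappa - 1]\} = \eta + \varepsilon_1$
  • Adding two parameters allows a repeat of the
    cycle Linear-power-exponential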
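
As a rough illustration of the repeated cycle, the sketch below nests the same building block twice: the inner block (parameters β, κ) replaces η + ε1 inside the outer block (parameters α, λ). This nesting is one reading of the slide; the helper names are hypothetical.

```python
import math

def box_cox_exp(v, a, p):
    """exp{(a/p) * (v**p - 1)} for v > 0; the limit p -> 0 is v**a."""
    if abs(p) < 1e-12:
        return v ** a
    return math.exp((a / p) * (v ** p - 1.0))

def f1_extended(v, alpha, lam, beta, kappa):
    """Outer block applied to the inner block instead of to v directly."""
    inner = box_cox_exp(v, beta, kappa)
    return box_cox_exp(inner, alpha, lam)

v = 2.0
# kappa = 0, beta = 1: the inner block is v itself, so the original f1 returns
same_as_f1 = f1_extended(v, 1.5, 0.5, beta=1.0, kappa=0.0)
# kappa = 1, lam = 1: an exponential of an exponential
exp_exp = f1_extended(v, 0.3, 1.0, beta=1.0, kappa=1.0)
```

With κ = λ = 1 the result has the form exp{α[exp(V - 1) - 1]}, which is how two extra parameters can reach the exponential-exponential rung.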

34
The RMM Model (axiomatic derivation)
  • Significance of the model for f1
  • Via RMM, the Ladder becomes a conveyor that
    takes the modeler, in a continuous manner, from
    one point to the next on the Spectrum of
    convexity intensity
  • Most important
  • The structure of the model is determined from the
    data
  • (not a-priori, as in GLM)

35
The RMM Model (axiomatic derivation)
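  • The complete model:
  • $f_1(\eta, \varepsilon_1; \theta_1) \, f_2(\varepsilon_2; \theta_2) = \exp\{(\alpha/\lambda)[(\eta + \varepsilon_1)^\lambda - 1] + \mu_2 + \varepsilon_2\}$
  • $\varepsilon_1 = \sigma_{\varepsilon_1} Z_1$, $\varepsilon_2 = \sigma_{\varepsilon_2} Z_2$.
  • Assumption: $Z_1$, $Z_2$ from a bivariate standard
    normal distribution with correlation $\rho$
  • Three sets of parameters:
  • - The linear predictor: $\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots$
  • - Structural parameters: $\alpha$, $\lambda$, $\mu_2$
  • - Error parameters: $\rho$, $\sigma_{\varepsilon_1}$, $\sigma_{\varepsilon_2}$
  • Not all parameters were created equal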

36
The RMM Model (axiomatic derivation)
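  • Write $Z_2 \mid (Z_1 = z_1) = \rho z_1 + (1 - \rho^2)^{1/2} Z$
  • The model, expressed in terms of independent
    standard normal variables:
  • $W = \log(Y) = (\alpha/\lambda)[(\eta + \sigma_{\varepsilon_1} Z_1)^\lambda - 1] + \mu_2 + \sigma_{\varepsilon_2}[\rho Z_1 + (1 - \rho^2)^{1/2} Z_2]$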
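
The conditional representation can be verified by simulation: build the second standard normal variable from the first plus an independent one, and check that the sample correlation is close to the target. The value 0.6, the seed, and the sample size below are arbitrary illustrative choices.

```python
import math
import random

def correlated_pair(rho, rng):
    """Return (z1, z2), standard normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)  # independent of z1
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z
    return z1, z2

def sample_correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs)
    v1 = sum((a - m1) ** 2 for a, _ in pairs)
    v2 = sum((b - m2) ** 2 for _, b in pairs)
    return cov / math.sqrt(v1 * v2)

rng = random.Random(42)
pairs = [correlated_pair(0.6, rng) for _ in range(20_000)]
r_hat = sample_correlation(pairs)
```

Writing the model in terms of independent variables is what makes simulation, and a likelihood for the error parameters, straightforward to set up.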

37
The RMM Model (axiomatic derivation)
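  • $W = \log(Y) = (\alpha/\lambda)[(\eta + \sigma_{\varepsilon_1} Z_1)^\lambda - 1] + \mu_2 + \sigma_{\varepsilon_2}[\rho Z_1 + (1 - \rho^2)^{1/2} Z_2]$
  • For $\rho = \pm 1$, $\eta = 1$ ($\mu_2 = \log(\text{Med.})$, and 4 parameters):
  • $W = (\alpha/\lambda)[(1 + \sigma_{\varepsilon_1} Z)^\lambda - 1] + \mu_2 + \sigma_{\varepsilon_2} \rho Z$,
  • Inverse Normalizing Transformation (INT)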

38
The RMM Model (axiomatic derivation)
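  • Eight variations of the RMM model
  • Two ways to write each error ($\varepsilon_2 \ll 1$, $\varepsilon_1 \ll \eta$):
  • $1 + \varepsilon_2 \approx \exp(\varepsilon_2)$,
  • $(\eta + \varepsilon_1)^\lambda = \eta^\lambda (1 + \varepsilon_1/\eta)^\lambda \approx \exp[\lambda \log(\eta) + \lambda \varepsilon_1/\eta]$
  • Two equivalent forms to express the standard
    normal variables as independent (uncorrelated):
  • $Z_2 \mid (Z_1 = z_1) = \rho z_1 + (1 - \rho^2)^{1/2} Z$
  • $Z_1 \mid (Z_2 = z_2) = \rho z_2 + (1 - \rho^2)^{1/2} Z$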
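
A quick numerical check shows how close the two small-error approximations are. Both rest on log(1 + x) ≈ x for small x; the values of η, the errors, and λ below are arbitrary illustrations.

```python
import math

def exact_power(eta, e1, lam):
    """(eta + e1)**lam, the exact form."""
    return (eta + e1) ** lam

def approx_power(eta, e1, lam):
    """exp[lam*log(eta) + lam*e1/eta], valid when e1 is small relative to eta."""
    return math.exp(lam * math.log(eta) + lam * e1 / eta)

eta, e1, lam, e2 = 10.0, 0.1, 0.5, 0.01

# Relative error of each approximation
rel_err_power = abs(approx_power(eta, e1, lam) / exact_power(eta, e1, lam) - 1.0)
rel_err_prop = abs(math.exp(e2) / (1.0 + e2) - 1.0)
```

Both relative errors are of second order in the error term, so they shrink quickly as the errors become small relative to η and to 1.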

39
RMM Estimation
  • Phase 1: Estimating the linear predictor (Set I)
  • Canonical Correlation Analysis plus Linear Regression
  • Phase 2: Iteratively alternating between
    estimating Set II (the Structural Parameters)
    via W-NL-LS and Set III (the Error Parameters)
    via maximization of the log-likelihood

40
RMM Modeling Random Variation
  • Modeling via the INT ($\rho = \pm 1$, one variation):
  • $Y = \exp\{(A/\lambda)[(1 + BZ)^\lambda - 1] + \log(M) + DZ\}$
  • Z: Standard normal
  • Examples:
  • - Approximations for the Poisson quantile and CDF
    of the Normal distribution
  • - Approximations for gamma
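
Because the INT expresses Y as a function of a standard normal Z, quantiles of Y follow directly from standard normal quantiles. The sketch below reads the INT as Y = exp{(A/λ)[(1 + BZ)^λ - 1] + log(M) + DZ}; the parameter values are hypothetical and not fitted to any distribution.

```python
import math
from statistics import NormalDist

def int_quantile(p, M, A, B, D, lam):
    """p-th quantile of Y under the INT, using the standard normal quantile z_p."""
    z = NormalDist().inv_cdf(p)
    base = 1.0 + B * z
    if base <= 0:
        raise ValueError("1 + B*z must be positive")
    if abs(lam) < 1e-12:
        core = A * math.log(base)  # limit lam -> 0
    else:
        core = (A / lam) * (base ** lam - 1.0)
    return math.exp(core + math.log(M) + D * z)

# Hypothetical parameters (illustration only)
params = dict(M=3.0, A=0.8, B=0.1, D=0.2, lam=0.5)
q10 = int_quantile(0.10, **params)
q50 = int_quantile(0.50, **params)  # z = 0, so this is the median M
q90 = int_quantile(0.90, **params)
```

At p = 0.5 the normal quantile is zero and the expression collapses to M, which is why M is interpreted as the median.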

41
RMM Modeling (INT)- The Poisson
Source: Communications in Statistics (Theory and
Methods), 33(7), 2004
42
RMM Modeling (INT)- CDF of Standard Normal
Distribution
For example: $y = -\log(1 - P) = -\log(1/2) \exp\{B[\exp(Cz) - 1] + Dz\}$
(Source: Communications in Statistics (Theory and
Methods), 34(3), 2005)
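
To see how this form yields a CDF value, one can invert y = -log(1 - P) to get P = 1 - exp(-y). The sketch below reads the approximation as y = -log(1/2) exp{B[exp(Cz) - 1] + Dz}. The values of B, C, D are placeholders, not the published coefficients, so the output illustrates only the structure and not the accuracy of the approximation.

```python
import math

def rmm_normal_cdf(z, B, C, D):
    """P = 1 - exp(-y), with y = log(2) * exp{B*[exp(C*z) - 1] + D*z}."""
    y = math.log(2.0) * math.exp(B * (math.exp(C * z) - 1.0) + D * z)
    return 1.0 - math.exp(-y)

# Placeholder coefficients (hypothetical, not from the cited paper)
B, C, D = 0.3, 0.2, 1.0

p_at_median = rmm_normal_cdf(0.0, B, C, D)
p_low = rmm_normal_cdf(-1.0, B, C, D)
p_high = rmm_normal_cdf(1.0, B, C, D)
```

Whatever the coefficients, z = 0 gives y = log(2) and hence P = 1/2, so the form is anchored at the median by construction.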
43
RMM Modeling (INT)- Gamma
44
RMM Modeling (for sample data)
  • Fitting and estimating procedures for modeling
    random variation have been developed
  • Maximum likelihood estimation
  • Estimation by Percentile-Matching
  • Estimation by Moment-Matching (two- moment,
    partial and complete)

45
Comparative Evaluation
  • Five families of distributions were fitted to 20
    distributions. The parameters of the FITTED
    distributions, θ, were determined so as to
    minimize the L2 norm:
  • L2

46
The Fitted Families
  • Pearson
  • Burr F(x)
  • Generalized Lambda

47
The Fitted Families (Contd)
  • Shore family of distributions
  • RMM:
  • $\log(X) = \log(M) + (A/B)[\exp(BZ) - 1] + CZ$

48
The Approximated Distributions
  •     Distribution                    Skewness    Kurtosis
  •  1  Chi-Square(5)                   1.265       2.4
  •  2  Log-Normal(0, 1/3)              1.0687      5.097
  •  3  Exponential(0.5)                2           6
  •  4  F-Ratio(5,10)                   3.867       50.86153
  •  5  Inverse-Gamma(3,5)              11.1287     471.557
  •  6  Maxwell-Boltzmann(4)            0.485693    0.108164
  •  7  Inverse-Weibull(4,2)            5.53489     199.792
  •  8  Mielke's Beta-Kappa(1,2,2)      18.6826     730.993
  •  9  Pareto-II(3, 1)                 14.6373     976936
  • 10  Half-Normal(4)                  0.995272    0.869177
  • 11  Gompertz(2,3)                   0.970789    0.631851
  • 12  Standard Half-Logistic          1.54033     3.58374
  • 13  Exponential-Power(2, 4)         -0.648501   0.110401
  • 14  Chi(4)                          0.405696    0.0592951
  • 15  Pareto-III(1, 1)                2.53252     10.1756
  • 16  Pareto(10, 5)                   4.64758     67.8
  • 17  Weibull(3,4)                    0.168103    -0.270536

49
Results (based on optimal L2 values)
50
Results (based on optimal L2 values)
Box and Whisker diagram for the five families of
distributions compared
51
  • Thank you

52
Intra-Galactic Velocities (modeling random var.)
Average relative error: Beta (KD): 0.28; RMM:
0.39. Source: Computational Statistics and Data
Analysis, in press
53
The Wave-Soldering Process (modeling systematic
variation)
  • GLM: Myers, Montgomery, Vining (2002, Book, John
    Wiley & Sons)
  • $\log(\mu) = 3.08 + 0.44C - 0.40G + 0.28AC - 0.31BD$
  • $\mu^{1/2} = 5.02 + 0.43A - 0.40B + 1.19C - 0.39E - 1.10G + 0.58AC - 0.96BD$ (Poisson Model)
  • RMM: Shore (2005, Book, World Scientific
    Publishing)
  • $\eta = -0.1786 + 0.16A - 0.14B + 0.53C - 0.16E - 0.49G + 0.22AC - 0.43BD$ ($R^2$-adj. = 0.848)
  • Comparison with the square-root link:
  • For A/C: RMM 0.164/0.526 = 0.31; GLM 0.43/1.19
    = 0.36
  • For (AC)/(BD): RMM 0.224/(-0.431) = -0.52; GLM
    0.58/(-0.96) = -0.60.
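
The coefficient-ratio comparison can be reproduced with a few lines of arithmetic. The coefficients below are the ones quoted on the slide for the two ratios; the dictionary layout and helper name are illustrative.

```python
# Coefficients quoted on the slide for the terms used in the two ratios
rmm = {"A": 0.164, "C": 0.526, "AC": 0.224, "BD": -0.431}
glm = {"A": 0.43, "C": 1.19, "AC": 0.58, "BD": -0.96}  # square-root link

def ratio(coefs, numerator, denominator):
    """Ratio of two fitted coefficients within the same model."""
    return coefs[numerator] / coefs[denominator]

rmm_a_over_c = round(ratio(rmm, "A", "C"), 2)
glm_a_over_c = round(ratio(glm, "A", "C"), 2)
rmm_ac_over_bd = round(ratio(rmm, "AC", "BD"), 2)
glm_ac_over_bd = round(ratio(glm, "AC", "BD"), 2)
```

Ratios are compared rather than raw coefficients because the two models put the linear predictor on different scales, while the relative weights of the factors remain comparable.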

54
RMM Modeling in Chemical Engineering
  • DIPPR (Design Institute for Physical Property
    Data): A leading data-base for chemical
    properties and their modeling
  • Notation:
  • - The acceptable model: Model with best fit
    relative to the data set in the data-base
  • - Goodness-of-fit: By various measures for fit
  • (for example R², AIC, MSE)

55
RMM Modeling in Chemical Engineering (Contd)
  • Substances examined (5): Water, Acetone, Benzoic
    acid, Ethane, Iso-propanol
  • Properties examined (13): Solid density, Liquid
    density, Solid vapor pressure, Vapor pressure,
    Heat of vaporization, Solid heat capacity, Liquid
    heat capacity, Ideal gas heat capacity, Liquid
    viscosity, Vapor viscosity, Liquid thermal
    conductivity, Vapor thermal conductivity, Surface
    tension

56
RMM Modeling in Chemical Engineering (Contd)
Preferred Model (by substance, across properties
and goodness-of-fit measures)
57
RMM Modeling in Chemical Engineering (Contd)
Preferred Model (by property, across substances
and goodness-of-fit measures)
58
Software Reliability-Growth Models
Example 1- Number of cumulative detected failures
vs. number of software testing hours (Musa's M1
data set)
59
Software Reliability-Growth Models
  • IV. Duane (1964), who proposed the Power-Law
    Process (PLP):
  • $m(t) = a t^b$, $a > 0$, $b > 0$. (2.29)
  • V. The log-power model introduced by Xie and Zhao
    (1993):
  • $m(t) = a[\log(1 + t)]^b$, $a > 0$, $b > 0$. (2.30)
  • VI. The generalized power family models (GPFM) by
    Knafl and Morgan (1996):
  • $m(t) = a[k(t)]^b$, (2.31)
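
The three mean-value functions can be written side by side. The sketch reads them as m(t) = a·t^b, m(t) = a·[log(1 + t)]^b, and m(t) = a·[k(t)]^b, where k(t) is a user-supplied function; under that reading the first two are special cases of the third. The parameter values are arbitrary.

```python
import math

def plp(t, a, b):
    """Power-Law Process (model IV): m(t) = a * t**b."""
    return a * t ** b

def log_power(t, a, b):
    """Log-power model (model V): m(t) = a * [log(1 + t)]**b."""
    return a * math.log(1.0 + t) ** b

def gpfm(t, a, b, k):
    """Generalized power family (model VI): m(t) = a * [k(t)]**b."""
    return a * k(t) ** b

# Arbitrary illustrative values: expected cumulative failures after t hours
t, a, b = 10.0, 2.0, 0.5
m_plp = plp(t, a, b)
m_log = log_power(t, a, b)

# Choosing k(t) recovers the two simpler models
m_gpfm_plp = gpfm(t, a, b, lambda s: s)
m_gpfm_log = gpfm(t, a, b, lambda s: math.log(1.0 + s))
```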

60
Software Reliability-Growth Models
  • Figure 16.4. Example 2 - Scatter plots of
    residuals (vs. exact value of Y), from fitting
    (clockwise) Model IV, Model V and the RMM model
    (16.3) and (16.10)

61
RMM Modeling
  • Thank you
  • (shor_at_bgu.ac.il)