ROBUST STATISTICS
(transcript of a PowerPoint presentation)

1
ROBUST STATISTICS

2
INTRODUCTION
  • Robust statistics provides an alternative
    approach to classical statistical methods. The
    motivation is to produce estimators that are not
    excessively affected by small departures from
    model assumptions. These departures may include
    departures from an assumed sample distribution or
    data departing from the rest of the data (i.e.
    outliers).
  • The goal is to produce statistical procedures
    which still behave fairly well under deviations
    from the assumed model.

3
INTRODUCTION
  • Mosteller and Tukey define two types of
    robustness:
  • resistance means that changing a small part of
    the data, even by a large amount, does not cause
    a large change in the estimate
  • robustness of efficiency means that the statistic
    has high efficiency in a variety of situations
    rather than in any one situation. Efficiency
    means that the estimate is close to the optimal
    estimate given that we know what distribution
    the data come from. A useful measure of
    efficiency is
  • Efficiency = (lowest variance feasible) / (actual
    variance)
  • Many statistics have one of these properties.
    However, it can be difficult to find statistics
    that are both resistant and have robustness of
    efficiency.
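
As an illustration of this efficiency ratio, here is a
minimal R sketch (added here, not from the slides) that
estimates by simulation the efficiency of the median
relative to the mean at the normal model, where the mean
attains the lowest variance feasible:

    ## Monte Carlo efficiency of the median at N(0,1)
    set.seed(1)
    est <- replicate(10000, {
      x <- rnorm(50)
      c(mean = mean(x), median = median(x))
    })
    ## Efficiency = var(mean) / var(median);
    ## about 2/pi = 0.64 asymptotically
    var(est["mean", ]) / var(est["median", ])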

5
MEAN VS MEDIAN
  • The mean uses all observations equally, so a
    single wild observation can move it arbitrarily
    far. The median, the middle order statistic, is
    resistant to such changes.
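
A quick R illustration (a sketch added here, reusing the
log-income data from the sensitivity-curve example later in
the deck):

    ## effect of one outlier on mean vs. median
    x <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92)
    c(mean(x), median(x))                       # both near 10
    c(mean(c(x, 15.21)), median(c(x, 15.21)))   # mean shifts, median barely moves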

7
ROBUST MEASURE OF VARIABILITY
  • The sample standard deviation is not resistant:
    one outlier can inflate it without bound. Common
    robust alternatives are the median absolute
    deviation, MAD = median |x_i - median(x)|
    (multiplied by 1.4826 for consistency at the
    normal), and the interquartile range IQR.
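
A short R sketch (added here) comparing the classical and
robust scale estimates on the same data with the outlier
included:

    x <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92, 15.21)
    sd(x)     # inflated by the outlier
    mad(x)    # 1.4826 * median absolute deviation
    IQR(x)    # interquartile range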

8
ROBUST MEASURE OF CORRELATION
  • The Pearson correlation coefficient is an optimal
    estimator for Gaussian data. However, it is not
    resistant and it does not have robustness of
    efficiency.
  • The percentage bend correlation estimator,
    discussed by Shoemaker and Hettmansperger and
    also by Wilcox, is both resistant and has
    robustness of efficiency.
  • When the underlying data are bivariate normal,
    ρpb gives essentially the same values as
    Pearson's ρ. In general, ρpb is more robust to
    slight changes in the data than ρ.

9
Example
  • We use a simple dataset from Wright and London
    (2009) where we are interested in whether the
    length and heat of a chile are related. The
    length was measured in centimeters, the heat on
    a scale from 0 (for sissies) to 11 (nuclear).
  • head(chile)
  •    name length heat
  • 1 Afric   5.00  5.0
  • 2 Aji     7.50  7.0
  • 3 Aji_A  11.43  7.5
  • 4 Aji_C   3.81  9.0
  • 5 Aji_F   6.00  2.0
  • 6 Aji_L   6.35  7.0
  • library(WRS2)    # provides pbcor()
  • library(vioplot)
  • attach(chile)
  • plot(length, heat, xlim = c(0, 32), ylim = c(0, 11))
  • vioplot(length, col = "red", at = 0, horizontal = TRUE, add = TRUE)
  • vioplot(heat, col = "blue", at = 0, horizontal = FALSE, add = TRUE)
  • with(chile, pbcor(length, heat))
  • Call:
  • pbcor(x = length, y = heat)
  • Robust correlation coefficient: -0.3785

10
ROBUST MEASURE OF CORRELATION
  • A second robust correlation measure is the
    Winsorized correlation ρw, which requires the
    specification of the amount of Winsorization.
  • The computation is simple: it applies Pearson's
    correlation formula to the Winsorized data.
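
A minimal sketch of this computation (the helper wins() is
defined here only for illustration; WRS2 provides wincor()
and winall() for actual use):

    ## Winsorize each variable at proportion tr, then apply Pearson
    wins <- function(x, tr = 0.2) {
      n <- length(x); g <- floor(tr * n)
      xs <- sort(x)
      ## pull the g smallest values up to xs[g+1] and the
      ## g largest values down to xs[n-g]
      pmin(pmax(x, xs[g + 1]), xs[n - g])
    }
    wincor_sketch <- function(x, y, tr = 0.2) cor(wins(x, tr), wins(y, tr))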

11
Example
  • In a study on the effect of consuming alcohol,
    the number of hangover symptoms was measured for
    two independent groups, with each subject
    consuming alcohol and being measured on three
    different occasions. One group consisted of sons
    of alcoholics and the other was a control group.
    Here we are interested in the Winsorized
    correlations across the three time points for
    the participants in the alcoholic group.
  • library(WRS2)
  • library(reshape)
  • hangctr <- subset(hangover, subset = group == "alcoholic")
  • hangctr
  •    symptoms     group time id
  • 61        0 alcoholic    1 21
  • 62        0 alcoholic    1 22
  • 63        0 alcoholic    1 23
  • 64        0 alcoholic    1 24
  • 65        0 alcoholic    1 25
  • 66        0 alcoholic    1 26
  • 67        0 alcoholic    1 27
  • 68        0 alcoholic    1 28
  • hangwide <- cast(hangctr, id ~ time, value = "symptoms")[,-1]
  • hangwide
  •   1 2 3
  • 1 0 2 1
  • 2 0 0 0

(cast() reshapes a molten data frame into the wide or
aggregated form you want.)
12
  • winall(hangwide)
  • Call:
  • winall(x = hangwide)
  • Robust correlation matrix:
  •        1      2      3
  • 1 1.0000 0.2651 0.4875
  • 2 0.2651 1.0000 0.6791
  • 3 0.4875 0.6791 1.0000
  • p-values:
  •         1       2       3
  • 1      NA 0.27046 0.03935
  • 2 0.27046      NA 0.00284
  • 3 0.03935 0.00284      NA

13
ORDER STATISTICS AND ROBUSTNESS
  • Order statistics and their functions are usually
    somewhat robust (e.g. median, MAD, IQR), but not
    all of them are robust (e.g. X(1), X(n), and the
    range R = X(n) - X(1)).

14
  • α-trimmed mean: with g = ⌊αn⌋, discard the g
    smallest and g largest order statistics and
    average the remaining n - 2g observations.
  • α-Winsorized mean: replace the g smallest order
    statistics by X(g+1) and the g largest by
    X(n-g), then average all n values.
15
  • REMARK: the mean, the α-trimmed mean, the
    α-Winsorized mean, and the median are particular
    cases of L-statistics.
  • L-statistics: linear combinations of order
    statistics.
  • Classical estimators are highly influenced by
    outliers.
  • A robust estimate computed from all observations
    is comparable with the classical estimate
    applied to the non-outlying data.
  • How can we compare robust estimators?

16
(Standardized) sensitivity curve (SC)
  • For an estimator T and observations
    x_1, ..., x_{n-1}, the standardized sensitivity
    curve is
    SC_n(x) = (T_n(x_1, ..., x_{n-1}, x) - T_{n-1}(x_1, ..., x_{n-1})) / (1/n),
    the scaled change in the estimate when one
    observation at x is added.

17
Sensitivity curve (example)
  • Data x_1, ..., x_10 are the natural logs of the
    annual incomes of 10 people:
  • 9.52 9.68 10.16 9.96 10.08 9.99 10.47 9.91 9.92
    15.21
  • y_9 consists of the 9 regular observations; the
    last value, 15.21, is an outlier.
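
A small R sketch (added here) that draws the sensitivity
curves of the mean and the median for these data:

    y9 <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92)
    ## SC_n(x) = n * (T(y, x) - T(y)), with n = length(y) + 1
    sc <- function(est, xs, y) {
      n <- length(y) + 1
      sapply(xs, function(x) n * (est(c(y, x)) - est(y)))
    }
    xs <- seq(8, 16, 0.1)
    plot(xs, sc(mean, xs, y9), type = "l", xlab = "x", ylab = "SC(x)")  # unbounded
    lines(xs, sc(median, xs, y9), lty = 2)                              # bounded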

18
(Finite sample) breakdown point
  • Given a data set with n observations.
  • If the estimator stays in a bounded set when we
    replace m of the observations by arbitrary
    outliers, but does not when we replace (m + 1),
    the breakdown point of the estimator at that
    data set is m/n.
  • The breakdown point of the mean is 0: a single
    observation sent to infinity carries the mean
    with it.
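
A quick illustration (a sketch added here) of the two
breakdown behaviors on the income data:

    x <- c(9.52, 9.68, 10.16, 9.96, 10.08, 9.99, 10.47, 9.91, 9.92, 15.21)
    for (out in c(1e2, 1e4, 1e6)) {
      x[10] <- out   # push one observation further and further out
      cat("outlier:", out, " mean:", mean(x), " median:", median(x), "\n")
    }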

19
(Finite sample) breakdown point of the median
  • n even: the median stays bounded for m = n/2 - 1
    replaced observations, so the breakdown point is
    (n/2 - 1)/n
  • n odd: it stays bounded for m = (n - 1)/2, so the
    breakdown point is ((n - 1)/2)/n
  • In both cases the breakdown point tends to 1/2 as
    n grows, the best possible value.

20
INFLUENCE FUNCTION (Hampel, 1974)
  • IF(x; T, F) = lim_{ε→0+} [T((1 - ε)F + ε δ_x) - T(F)] / ε,
    where δ_x is the point mass at x: the
    standardized effect on the estimator T of an
    infinitesimal contamination at the point x.

23
PLOT OF THE INFLUENCE FUNCTION AT N(0,1)
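
The original slide shows a figure; the following R sketch
(added here) reproduces the standard picture, using the
known influence functions IF(x) = x for the mean and
IF(x) = sign(x)/(2 φ(0)) for the median at N(0,1):

    curve(1 * x, -4, 4, xlab = "x", ylab = "IF(x)")        # mean: IF(x) = x
    curve(sign(x) / (2 * dnorm(0)), add = TRUE, lty = 2)   # median
    legend("topleft", c("mean", "median"), lty = 1:2)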

24
SC and IF
  • IF: the effect of a small fraction ε of identical
    outliers
  • SC: the fraction of contamination is fixed at 1/n

25
M-ESTIMATORS
  • An M-estimator generalizes maximum likelihood:
    T_n is defined as the value of θ minimizing
    Σ ρ(x_i; θ) or, equivalently for smooth ρ,
    solving Σ ψ(x_i; θ) = 0, where ψ = ∂ρ/∂θ.
  • The MLE is the special case ρ(x; θ) = -log f(x; θ).
  • Robustness is obtained by choosing a bounded ψ,
    which bounds the influence function.
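
As a sketch of how such an estimator is computed in
practice (an illustration added here, assuming Huber's ψ
with a fixed MAD scale; MASS's huber() is the packaged
equivalent):

    ## Huber M-estimate of location by iteratively reweighted means
    huber_loc <- function(x, k = 1.345, tol = 1e-6) {
      mu <- median(x)
      s <- mad(x)                     # fixed robust scale
      repeat {
        u <- (x - mu) / s
        w <- pmin(1, k / abs(u))      # Huber weights psi(u)/u
        mu_new <- sum(w * x) / sum(w)
        if (abs(mu_new - mu) < tol) return(mu_new)
        mu <- mu_new
      }
    }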

29
M-ESTIMATOR
  • When an estimator is robust, the influence of any
    single observation is insufficient to produce a
    significant offset. There are several constraints
    that a robust M-estimator should meet:
  • 1. The first is of course to have a bounded
    influence function.
  • 2. The second is naturally the requirement that
    the robust estimator be unique.

30
  • (table of common ρ-functions such as L2, L1, and
    L1-L2)
32
  • Briefly, we give a few indications about these
    functions:
  • L2 (least-squares) estimators are not robust
    because their influence function is not bounded.
  • L1 (absolute value) estimators are not stable
    because the ρ-function |x| is not strictly
    convex in x. Indeed, the second derivative at
    x = 0 is unbounded, and an indeterminate
    solution may result.
  • L1-L2 estimators reduce the influence of large
    errors, but such errors still have an influence
    because the influence function has no cutoff
    point.

33
EXAMPLES OF M-ESTIMATORS
  • The mean corresponds to ρ(x) = x², and the median
    to ρ(x) = |x|. (For even n any median will solve
    the problem.) The function
  • ρ(x) = x² if |x| < c, and ρ(x) = c² otherwise
  • corresponds to metric trimming, and large
    outliers have no influence at all. The function
  • ρ(x) = x² if |x| < c, and ρ(x) = c(2|x| - c) otherwise
  • is known as metric Winsorizing and brings in
    extreme observations to µ ± c.

34
EXAMPLES OF M-ESTIMATORS
  • The corresponding -log f is
  • -log f(x) = x² if |x| < c, and c(2|x| - c)
    otherwise (up to an additive constant),
  • which corresponds to a density with a Gaussian
    center and double-exponential tails. This
    estimator is due to Huber.

35
EXAMPLES OF M-ESTIMATORS
  • Tukey's biweight has
  • ψ(t) = t (1 - (t/R)²)² for |t| ≤ R, and ψ(t) = 0
    otherwise,
  • i.e. the squared bracket is the positive part of
    1 - (t/R)². This implements soft trimming. The
    value R = 4.685 gives 95% efficiency at the
    normal.
  • Hampel's ψ has several linear pieces: linear near
    zero, then constant, then redescending linearly
    to zero,

for example, with a = 2.2s, b = 3.7s, c = 5.9s.
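
The following sketch (added here) plots these ψ functions
using psi.huber, psi.bisquare, and psi.hampel from MASS;
note that these functions return ψ(u)/u, so we multiply by
u to recover ψ:

    library(MASS)
    curve(x * psi.huber(x, k = 1.345), -6, 6, ylab = "psi(x)")
    curve(x * psi.bisquare(x, c = 4.685), add = TRUE, lty = 2)   # Tukey's biweight
    curve(x * psi.hampel(x, a = 2, b = 4, c = 8), add = TRUE, lty = 3)
    legend("topleft", c("Huber", "biweight", "Hampel"), lty = 1:3)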
37
ROBUST REGRESSION
  • Robust estimation procedures dampen the influence
    of outlying cases, as compared to ordinary least
    squares, in an effort to provide a better fit
    for the majority of cases.
  • LEAST ABSOLUTE RESIDUALS (LAR) REGRESSION
    estimates the regression coefficients by
    minimizing the sum of the absolute deviations of
    the Y observations from their fitted means.
  • Since absolute rather than squared deviations are
    involved, LAR places less emphasis on outlying
    observations than does the method of least
    squares. The residuals ordinarily will not sum
    to 0, and the solution for the estimated
    coefficients may not be unique.

38
ROBUST REGRESSION
  • ITERATIVELY REWEIGHTED LEAST SQUARES (IRLS)
    ROBUST REGRESSION uses the weighted least
    squares procedure, as sketched below.
  • The weights are based on how far outlying a case
    is, as measured by the residual for that case,
    and are revised with each iteration until a
    robust fit has been obtained.
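
A bare-bones illustration of the idea (a sketch added
here, assuming Huber weights and a MAD residual scale;
rlm() in MASS is the proper implementation):

    irls <- function(X, y, k = 1.345, iters = 20) {
      b <- lm.fit(X, y)$coefficients          # start from OLS
      for (i in seq_len(iters)) {
        r <- as.vector(y - X %*% b)
        s <- mad(r)                           # robust scale of the residuals
        w <- pmin(1, k / abs(r / s))          # Huber weights
        b <- lm.wfit(X, y, w)$coefficients    # weighted LS refit
      }
      b
    }
    ## e.g. irls(cbind(1, phones$year), phones$calls)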

39
ROBUST REGRESSION
  • LEAST MEDIAN OF SQUARES (LMS) REGRESSION
    estimates the coefficients by minimizing the
    median, rather than the sum, of the squared
    residuals (Rousseeuw); see the sketch after this
    list.
  • Other robust regression methods: some involve
    trimming extreme squared deviations before
    applying least squares, others are based on
    ranks. Many of the robust regression procedures
    require intensive computing.
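
For reference, MASS's lqs() fits LMS directly (a sketch
using the phones data from the example that follows):

    library(MASS)
    fit.lms <- lqs(calls ~ year, data = phones, method = "lms")
    coef(fit.lms)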

40
EXAMPLE
  • This data set gives n = 24 observations of the
    annual number of telephone calls made (calls, in
    millions of calls) in Belgium; year gives the
    last two digits of the year. See Rousseeuw and
    Leroy (1987) and Venables and Ripley (2002). As
    can be noted in the figure, there are several
    outliers in the y-direction in the late 1960s.

41
  • Let us start the analysis with the classical OLS
    fit.
  • library(MASS)
  • attach(phones)
  • plot(year, calls)
  • fit.ols <- lm(calls ~ year)
  • summary(fit.ols, cor = FALSE)
  • Residuals:
  •     Min      1Q  Median      3Q     Max
  •  -78.97  -33.52  -12.04   23.38  124.20
  • Coefficients:
  •             Estimate Std. Error t value Pr(>|t|)
  • (Intercept) -260.059    102.607  -2.535   0.0189 *
  • year           5.041      1.658   3.041   0.0060 **
  • ---
  • Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Residual standard error: 56.22 on 22 degrees of
    freedom
  • Multiple R-squared: 0.2959, Adjusted
    R-squared: 0.2639

42
  • abline(fit.ols$coef)
  • par(mfrow = c(1, 4))
  • plot(fit.ols, 1:2)
  • plot(fit.ols, 4)
  • hmat.p <- hat(model.matrix(fit.ols))
  • h.phone <- hmat.p    # leverage values h_i
  • cook.d <- cooks.distance(fit.ols)
  • plot(h.phone/(1 - h.phone), cook.d, xlab = "h/(1-h)",
    ylab = "Cook distance")

43
  • In order to take into account the observations
    with large residuals, i.e. the outliers in the
    late 1960s, consider a robust regression based
    on Huber-type estimates:
  • fit.hub <- rlm(calls ~ year, maxit = 50)
  • fit.hub2 <- rlm(calls ~ year, scale.est = "proposal 2")
  • summary(fit.hub, cor = FALSE)
  • Residuals:
  •      Min       1Q   Median       3Q      Max
  •  -18.314   -5.953   -1.681   26.460  173.769
  • Coefficients:
  •                Value  Std. Error  t value
  • (Intercept) -102.6222    26.6082  -3.8568
  • year           2.0414     0.4299   4.7480
  • Residual standard error: 9.032 on 22 degrees of
    freedom
  • summary(fit.hub2, cor = FALSE)
  • Residuals:
  •     Min      1Q  Median      3Q     Max
  •  -68.15  -29.46  -11.52   22.74  132.67
  • Coefficients:

44
  • > summary(rlm(calls ~ year, psi = psi.bisquare), cor = FALSE)
  • Residuals:
  •      Min       1Q   Median       3Q      Max
  •  -1.6585  -0.4143   0.2837  39.0866 188.5376
  • Coefficients:
  •               Value  Std. Error  t value
  • (Intercept) -52.3025     2.7530  -18.9985
  • year          1.0980     0.0445   24.6846
  • Residual standard error: 1.654 on 22 degrees of
    freedom

45
  • From the results and also from the previous plot,
    we note that there are some differences from the
    OLS estimates; in particular this is true for
    the Huber-type estimator with MAD scale.
  • Consider again some classic diagnostic plots for
    the robust fit: the plot of the observed values
    versus the fitted values, the plot of the
    residuals versus the fitted values, the normal
    QQ-plot of the residuals, and the fit weights of
    the robust estimator. Note that there are some
    observations with low Huber-type weights which
    were not identified by the classical Cook's
    statistic.
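
A sketch of these four diagnostics for the Huber fit above
(fit.hub), using the IRLS weights stored in the rlm object:

    par(mfrow = c(1, 4))
    plot(calls, fitted(fit.hub), xlab = "observed", ylab = "fitted")
    plot(fitted(fit.hub), residuals(fit.hub), xlab = "fitted", ylab = "residuals")
    qqnorm(residuals(fit.hub)); qqline(residuals(fit.hub))
    plot(year, fit.hub$w, ylab = "Huber weight")   # low weights flag the outliers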