STT 430/530, Nonparametric Statistics

Provided by: darganfr
1
STT 430/530, Nonparametric Statistics
  • The empirical cumulative distribution function
    (ecdf), F-hat(x), counts the fraction of
    observations less than or equal to x.
  • Note that the graph of this ecdf is a step
    function that takes a step at each observed data
    value. Also notice that if all n data values are
    distinct, then every step has size 1/n, and
    wherever k values are tied, the step at that
    value has size k/n.
  • F-hat(x) is an estimate of the true c.d.f., F(x) - in
    fact,
  • E(F-hat(x)) = F(x) and
  • SD(F-hat(x)) ≈ sqrt(F-hat(x)(1 - F-hat(x))/n), as
    you would expect for a binomial proportion like
    F-hat(x), with n = # of obs. and
    p = P(an obs. <= x) = F(x).
  • We can use SAS to sketch a plot of the ecdf and
    compare it with several theoretical
    distributions. Of course, we are most interested
    in whether the data follow the Normal
    distribution, so I show you how to check for that
    one:
  • proc capability;
  • cdfplot sodium / normal(color=red);
  • This statement plots the ecdf and overlays a
    theoretical normal cdf with mean and sd estimated
    from the data.
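
The ecdf bullets on this slide can be checked with a few lines of code. The following is a minimal sketch in Python rather than SAS, written only to make the arithmetic concrete; the function names `ecdf_steps` and `ecdf_se` and the sample values are hypothetical and do not come from the course's sodium data.

```python
from collections import Counter
from math import sqrt


def ecdf_steps(data):
    """Return (value, F_hat(value), step_size) for each distinct data value."""
    n = len(data)
    counts = Counter(data)          # number of tied observations per value
    steps = []
    cumulative = 0
    for value in sorted(counts):
        k = counts[value]           # k tied values give a step of size k/n
        cumulative += k
        steps.append((value, cumulative / n, k / n))
    return steps


def ecdf_se(f_hat, n):
    """Estimated SD of F_hat(x), using the binomial-proportion formula."""
    return sqrt(f_hat * (1 - f_hat) / n)


# Hypothetical sample (not the sodium data from the slides).
sample = [72, 74, 74, 75, 77, 77, 77, 80]
steps = ecdf_steps(sample)
for value, f_hat, step in steps:
    print(value, f_hat, step, round(ecdf_se(f_hat, len(sample)), 4))
```

In this sample the value 74 appears twice and 77 three times, so the ecdf jumps by 2/8 and 3/8 at those points and by 1/8 everywhere else. The estimated SD is 0 at the largest value, where F-hat(x) = 1.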

2
  • Another important graph for checking normality of
    data is called a normal quantile plot. This
    plots the sorted data values against the
    corresponding normal quantiles. That is,
  • first, sort the data from smallest to largest;
  • second, for each data point find the ecdf (i.e.,
    the fraction of the data <= that point);
  • third, get the corresponding standard normal
    z-score for that fraction.
  • Try this SAS code to check it out (recall that
    the sodium values are already sorted from
    smallest to largest; if they weren't, then you'd
    have to use PROC SORT and OUTPUT the sorted data
    to a SAS data set):
  • fract = _n_/40; z = probit(fract);
  • probit is the SAS function that returns the
    z-score corresponding to a given cumulative
    probability (a value between 0 and 1) under the
    standard normal curve.
  • PROC UNIVARIATE PLOT will give you a normal
    quantile plot, but not a very nice one. Try this
    code to make it better:
  • proc capability;
  • qqplot sodium / normal(mu=76 sigma=2.25);
  • This last option tells SAS to put in a reference
    line with intercept = 76 (the mean) and
    slope = 2.25 (the sd). I remembered these values
    from PROC UNIVARIATE output.
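
The three steps for the normal quantile plot (sort, take the fraction, convert to a z-score) can be sketched as follows. This is an illustrative Python version, not the SAS data step from the slide; the function name `normal_quantile_points` and the sample values are hypothetical. It uses the same fraction i/n as the slide's `_n_/40`, which equals 1 for the largest observation. Python's `NormalDist.inv_cdf` only accepts probabilities strictly between 0 and 1, so the sketch records no z-score for that last point.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (value, fraction, z) triples for a normal quantile plot."""
    ordered = sorted(data)                  # step 1: sort smallest to largest
    n = len(ordered)
    std_normal = NormalDist()               # standard normal: mean 0, sd 1
    points = []
    for i, value in enumerate(ordered, start=1):
        fract = i / n                       # step 2: fraction of data up to this point
        # step 3: z-score for that fraction; undefined when fract == 1
        z = std_normal.inv_cdf(fract) if fract < 1 else None
        points.append((value, fract, z))
    return points


# Hypothetical sample (not the sodium data from the slides).
sample = [77, 72, 80, 75]
points = normal_quantile_points(sample)
for value, fract, z in points:
    print(value, fract, z)
```

If the data are roughly normal, plotting each value against its z should give points close to a straight line whose intercept is the mean and whose slope is the sd, which is what the reference line in the QQPLOT statement draws.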