ETSU Math Department Colloquium: Got R?

1
ETSU Math Department Colloquium: Got R?
  • Edith Seier
  • 10/31/05
  • If you wish, go to
  • http://www.etsu.edu/math/seier/Rtalk.htm

2
  • What is R?
  • What can we do with R?
  • How to work with R
  • Who can use R?
  • Who really needs R?
  • Learning R.
  • Should we teach R at ETSU?

3
What is R?
  • Its developers (Robert Gentleman and Ross Ihaka,
    Statistics Department of the University of
    Auckland) define it as an environment in which
    statistical techniques are implemented. The
    programming language R started as a free teaching
    version of the S language (developed by Chambers,
    Becker and Wilks at Bell Laboratories). It is
    considered a dialect of S, and in 1995 it was
    released as free software. (The commercial version
    of S used in academia is S-Plus.)
  • New packages of functions are being created, and
    its development is an international effort. All
    the information about the R project is available
    from

  • http://www.r-project.org
  • Its use in advanced statistical topics,
    bioinformatics and biostatistics is constantly
    growing, and
  • It is free!!!
  • Notes:
  • There are versions for Mac, Windows and Linux
  • It can be linked to C or Fortran for
    computationally intensive tasks
  • The language has some affinities to Lisp
    (Lisp-Stat?) and APL

4
Where can we get R?
  • The web page shown at the right is
  • http://www.r-project.org
  • There we can find
  • The basic program
  • Additional packages of functions
  • Manuals in several languages

5
What can we do with R?
  • Basic statistical calculations and graphs
  • Write functions that are not included in
    commercial statistical software.
  • Use it to teach upper division/graduate courses
    when we want the students to be aware of the
    steps needed to solve a problem instead of using
    a totally menu-driven (point-and-click)
    program (at least sometimes).
  • Use packages written in R for specialized
    areas such as spatial statistics, survival
    analysis, microarray data analysis, generalized
    linear models, etc.

6
How to work with R
  • Using R
  • Getting started and data entry
  • Operations, transformations.
  • Descriptive statistics and graphs
  • More graphs and calculations
  • Writing our own functions in R
  • Interactive Statistics with R (Motoya Machida,
    TNTechU)
  • Using packages in R

7
Starting and entering data
  • 1) Once you have downloaded R from
  • http://www.r-project.org
  • click on the icon
  • You can
  • Use data sets that come with R, data()
  • Type your own data
  • Read ASCII data files
  • Import data from other software
  • Generate random numbers and sequences
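A minimal sketch of three of these routes. The file contents and the object names `tmp` and `smoke` are made up for illustration; `women` is one of the data sets shipped with R. Importing from other statistical software is not shown here; the `foreign` package listed later in the talk is meant for that.

```r
# 1) A data set that ships with R (the 'datasets' package)
data(women)
head(women)                         # first rows: height and weight

# 2) Reading an ASCII file: a small temporary file written on the fly
tmp <- tempfile(fileext = ".txt")   # hypothetical file, for illustration only
writeLines(c("country cigarette",
             "Australia 480",
             "Canada 500",
             "Denmark 380"), tmp)
smoke <- read.table(tmp, header = TRUE)
smoke

# 3) Generating a sequence and some random numbers
s <- seq(1, 9, by = 2)
z <- rnorm(5)
```

In practice `tmp` would simply be the path of an existing text file.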

8
Typing data in R
  • Decide the name of the object where you will
    store the data
  • name <- c( , , , , )
  • Example: Cigarette consumption per capita in 1930
    (Freeman et al. (1970) Statistics)
  • Categorical data
  • country <- c("Australia", "Canada", "Denmark",
    "Finland", "England", "Iceland", "Netherlands",
    "Norway", "Sweden", "Switzerland", "USA")
  • Numerical data
  • cigarette <- c(480,500,380,1100,1100,230,490,250,300,
    510,1300)
  • Labels for the data
  • If you want to attach labels to the data, use the
    command names.
  • names(cigarette) <- country
  • Now type cigarette to see the data; on the
    screen you will see
  • Australia Canada Denmark Finland England
    Iceland Netherlands
  • 480 500 380 1100
    1100 230 490
  • Norway Sweden Switzerland USA
  • 250 300 510 1300
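Once the labels are attached, they can be used to look values up. The following is a small sketch of that idea using the same data; the particular queries are only illustrations and are not part of the original slide.

```r
country <- c("Australia", "Canada", "Denmark", "Finland", "England",
             "Iceland", "Netherlands", "Norway", "Sweden",
             "Switzerland", "USA")
cigarette <- c(480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300)
names(cigarette) <- country

cigarette["Finland"]            # look up one country by its label
cigarette[cigarette > 1000]     # countries above 1000 per capita
names(which.max(cigarette))     # label of the largest value
sort(cigarette)                 # values in increasing order, labels kept
```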

9
Generating data
  • a) Sequences
  • i <- seq(0,10,by=2) will create the sequence
  • 0,2,4,6,8,10
  • b) Evaluating functions. First we need to create
    the argument of the function
  • t <- seq(0,60,by=0.01)
  • Y <- cos(t)
  • c) Generating random numbers
  • x <- rnorm(500)
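The three commands can be tried together as below. This is only a sketch: the seed value is arbitrary and is there just to make the random draw repeatable, and the last three lines are simple checks added for illustration.

```r
i <- seq(0, 10, by = 2)        # 0 2 4 6 8 10
t <- seq(0, 60, by = 0.01)     # a fine grid on which to evaluate a function
y <- cos(t)

set.seed(1)                    # arbitrary seed, only for repeatability
x <- rnorm(500)                # 500 standard normal values

length(t)                      # number of grid points
range(y)                       # cos stays between -1 and 1
c(mean(x), sd(x))              # should be near 0 and 1
```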

10
Operations, transformations
  • The arithmetic operations are done with the usual
    symbols + - * / . For example, if we want to
    convert weights in pounds into kilos:
    weightkl <- weightlb/2.2
  • Some useful transformations are
  • exp( ) for the exponential function, ^ for powers
  • log( ) for natural logarithms
  • log10( ) for base 10 logarithms
  • Trigonometric functions
  • sin(), cos(), tan(), asin(), atan(), acos()
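These operations work element by element on whole vectors, which the following sketch illustrates. The weights and the vector `x` are made-up values.

```r
weightlb <- c(110, 154, 198)      # made-up weights in pounds
weightkl <- weightlb / 2.2        # every element is divided by 2.2
weightkl                          # prints 50 70 90

x <- c(1, 10, 100)
log10(x)                          # 0 1 2
exp(log(x))                       # recovers x (up to rounding)
2^10                              # 1024
sin(pi / 2)                       # 1
```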

11
Descriptive statistics
  • tasamort <- c(180,150,170,350,460,60,240,90,110,250,200)
  • > length(tasamort)
  • [1] 11
  • > sum(tasamort)
  • [1] 2260
  • > mean(tasamort)
  • [1] 205.4545
  • > median(tasamort)
  • [1] 180
  • > sd(tasamort)
  • [1] 117.2488
  • > var(tasamort)
  • [1] 13747.27
  • > quantile(tasamort,0.5)
  • 50%
  • 180
  • > min(tasamort)
  • [1] 60
  • > max(tasamort)
  • [1] 460
  • > summary(tasamort)
  •    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  •    60.0   130.0   180.0   205.5   245.0   460.0
  • > cor(x,y)
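The slide ends with cor(x,y) but shows no data for it. As a sketch, the correlation can be computed between the two vectors typed earlier, under the assumption that `cigarette` and `tasamort` list the same eleven countries in the same order (the slides do not state this explicitly).

```r
cigarette <- c(480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300)
tasamort  <- c(180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200)

r <- cor(cigarette, tasamort)   # Pearson correlation by default
round(r, 2)

# Mean and standard deviation of both variables in one call
sapply(list(cigarette = cigarette, tasamort = tasamort),
       function(v) c(mean = mean(v), sd = sd(v)))
```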
12
Statistical graphs and plots of functions
hist(variable)
plot(cigarette,deathrate); abline(67.56, 0.22844)
boxplot(variable)
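The intercept and slope passed to abline can be obtained with lm. The sketch below assumes that `deathrate` holds the same values as the `tasamort` vector from the previous slide. The `pdf(NULL)` and `dev.off()` lines only make the example runnable without a screen and can be dropped in an interactive session.

```r
cigarette <- c(480, 500, 380, 1100, 1100, 230, 490, 250, 300, 510, 1300)
deathrate <- c(180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200)  # assumed = tasamort

fit <- lm(deathrate ~ cigarette)   # least-squares line
coef(fit)                          # intercept and slope

pdf(NULL)                          # null graphics device (batch-friendly)
hist(deathrate)
boxplot(cigarette)
plot(cigarette, deathrate)
abline(fit)                        # draws the fitted line on the scatterplot
dev.off()
```

With these data the fitted coefficients agree with the values 67.56 and 0.22844 used on the slide.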
13
  • Among the graphs for which R has commands are
  • Practically all the graphs from Multivariate
    Analysis
  • Mosaic graphs (for categorical variables)
  • Smoothing in regression
  • Time series plots
  • Plots from Microarray data analysis
  • If in the R menu we click on Help, then
    Search help, and type plot, a list of graphs in
    R will appear.
  • In the next slide we have copied and pasted the
    plot displays from the R screenshots at
    http://www.r-project.org

14
  • R can be used to do statistical calculations
    related to
  • Tests of hypotheses
  • Analysis of Variance and Covariance
  • Probability Distributions
  • Tests for two-way tables (chi-square, McNemar,
    etc.)
  • Multiple regression
  • Calculation of sample size
  • Logistic regression
  • Survival analysis
  • All these topics are explained in
  • Dalgaard, P. (2002) Introductory Statistics with
    R. Springer-Verlag.
  • There are also several manuals and other books on
    specific topics such as Linear Models,
    Bioinformatics, etc. Some general
    methods books also have instructions in R. For
    example
  • Heiberger & Holland (2004) Statistical Analysis
    and Data Display: An Intermediate Course with
    Examples in S-Plus, R, and SAS. Springer-Verlag.
  • At http://www.r-project.org you can find manuals
    and tutorials that can be downloaded for free
    (not only in English but in some other languages)

15
Writing our own functions in R. Example:
calculating a confidence interval for the mean
absolute deviation
  • In a sample, the mean absolute deviation (MAD) is
    calculated as the average of the distances of the
    values to the median.
  • In Bonett & Seier (2003), Confidence Intervals
    for Mean Absolute Deviations, The American
    Statistician, Vol. 57, No. 4, the following
    formula for the confidence interval for the
    population mean absolute deviation was derived
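The formula itself did not survive in this transcript. The version below is a reconstruction from the quantities computed in the CODCI program later in the talk (c, tau, del and gam), so it is an assumption that should be checked against the paper. Here the hatted eta, mu and sigma squared denote the sample median, mean and variance.

```latex
\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \hat{\eta} \rvert, \qquad
\hat{\delta} = \frac{\hat{\mu} - \hat{\eta}}{\hat{\tau}}, \qquad
\hat{\gamma} = \frac{\hat{\sigma}^{2}}{\hat{\tau}^{2}}, \qquad
c = \frac{n}{n-1}

\exp\!\left[\, \ln(c\,\hat{\tau}) \;\pm\; z_{\alpha/2}
  \sqrt{\frac{\hat{\delta}^{2} + \hat{\gamma} - 1}{n}} \,\right]
```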

16

Once we have copied and pasted the function citau
into R, we simply write citau(x,1.96) to
calculate the CI for the data in x.
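The code of citau is not preserved in this transcript. What follows is a sketch of a function with the same call, citau(x, z), rebuilt from the quantities that the CODCI program on the next slide computes (tau, del, gam and the factor n/(n-1)). It is an assumed reconstruction and not the author's original code. The example data are the `tasamort` values used earlier.

```r
# Sketch only: reconstructed from the CODCI program, not copied from the slide.
citau <- function(x, z) {
  n   <- length(x)
  md  <- median(x)
  tau <- mean(abs(x - md))            # sample mean absolute deviation
  del <- (mean(x) - md) / tau         # standardized mean-median difference
  gam <- var(x) / tau^2               # variance relative to tau^2
  cn  <- n / (n - 1)                  # small-sample adjustment
  se  <- sqrt((del^2 + gam - 1) / n)  # approximate std. error of log(tau)
  c(lower    = exp(log(cn * tau) - z * se),
    estimate = tau,
    upper    = exp(log(cn * tau) + z * se))
}

x <- c(180, 150, 170, 350, 460, 60, 240, 90, 110, 250, 200)
ci <- citau(x, 1.96)                  # z = 1.96 gives roughly 95% confidence
ci
```

The interval is built on the log scale and then exponentiated, so the limits are always positive.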
17
  • In Bonett, D.G. and Seier, E. (2005)
    Confidence Interval for a Coefficient of
    Dispersion in Non-normal Distributions.
    Biometrical Journal 47 (5) pp 1-5, we included
    the following program to calculate the confidence
    interval for the COD we were defining in the paper
  • CODCI <- function(x,z) { md = median(x); m = mean(x);
    v = var(x); tau = mean(abs(x-md));
  • del = (m-md)/tau; n = length(x); c = n/(n-1);
    gam = v/(tau^2);
  • cod = tau/median(x); a1 = round(((n+1)/2)-sqrt(n));
    a2 = n-a1+1;
  • sx = sort(x); l1 = log(sx[a1]); u1 = log(sx[a2]);
    se1 = (u1-l1)/4; se2 = sqrt((gam+(del^2)-1)/n);
  • fm = sqrt(1/(4*n*(se1^2))); covtm = (m-md)/(2*n*fm*tau);
    k = sqrt(se1^2+se2^2-2*covtm)/(se1+se2);
  • b1 = round((n+1)/2-k*z*sqrt(n/4)); b2 = n-b1+1;
  • l1 = log(sx[b1]); u1 = log(sx[b2]);
    l2 = log(c*tau)-k*z*se2; u2 = log(c*tau)+k*z*se2;
  • ll1 = exp(l2-u1); ul1 = exp(u2-l1);
    ci <- c(ll1,cod,ul1); ci }

18
The output of a function could be a plot.
Example: periodogram
  • # this function calculates the periodogram and
    displays its graph
  • perioplot <- function(x) {
  • adjx = x-mean(x)   # subtracts the mean of
    the series
  • tf = fft(adjx)   # calculates the FFT
  • nf = length(tf); n2 = nf/2+1   # decides
    the number of frequencies
  • pritf <- tf[c(1:n2)]   # takes the elements of
    the FFT
  • intensity <- (abs(pritf^2))/nf   # calculates the
    ordinates of the periodogram
  • nyquist = 1/2; pfreq <- seq(0,nf/2,by=1)
    # preparation for frequencies
  • freq <- pfreq/(length(pfreq)-1)*nyquist
    # calculates frequencies
  • plot(freq,intensity,type="l") }
  • After reading a data set, for example sunspots,
    we type
  • perioplot(sunspots)
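To check that such a function behaves as expected, it can be tried on a series whose period is known. The sketch below uses a hypothetical helper, `periodogram_values`, that does essentially the same arithmetic as perioplot but returns the numbers instead of plotting them. For a series of even length its frequencies coincide with those of perioplot. The sine series is made up.

```r
# Hypothetical helper: returns the periodogram instead of plotting it
periodogram_values <- function(x) {
  adjx <- x - mean(x)                  # remove the mean of the series
  tf   <- fft(adjx)                    # discrete Fourier transform
  nf   <- length(tf)
  n2   <- floor(nf / 2) + 1            # frequencies from 0 up to 1/2
  intensity <- Mod(tf[1:n2])^2 / nf    # periodogram ordinates
  freq <- (0:(n2 - 1)) / nf            # cycles per observation
  data.frame(freq = freq, intensity = intensity)
}

tt <- 1:100
x  <- sin(2 * pi * tt / 10)            # made-up series with period 10
p  <- periodogram_values(x)
peak <- p$freq[which.max(p$intensity)] # frequency with the largest ordinate
peak
```

A series with period 10 should show its peak at frequency 1/10.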

19
Things I am still avoiding in R
  • Loops (they can be done...), but there are ways to
    get around them sometimes by using
    matrices. Example:

estspec <- function(au) {
M <- length(au)   # counts how many autocorrelations (M) we read
j <- seq(1,M,by=1)   # creates sub-indexes j = 1:M
lam = 0.5*(1+cos(j*pi/M))   # calculates Tukey's weights
w <- seq(0,pi,by=pi/50)   # calculates angular frequencies
lac = t(lam)*au   # multiplies each weight by the corresponding autocorrelation
f <- function(j,w) cos(j*w)
z <- outer(j,w,f)   # calculates cos(j w) for all values of w and j
sz <- lac%*%z   # obtains the sum of weights x correlations x cos(j w)
h <- (1/(2*pi))*(1+2*sz)   # calculates h(w)
plot(w,h,type="l",main="Estimated spectral density") }
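The trick in estspec is that outer builds the whole table of cos(j w) values in one call, and a matrix product then does the summation. The sketch below compares that with an explicit double loop on a small made-up grid; the values in `lac` are invented.

```r
j <- 1:4                       # a few lags, for illustration
w <- seq(0, pi, by = pi / 4)   # a coarse grid of angular frequencies

# Loop version: fill the matrix one entry at a time
z_loop <- matrix(0, nrow = length(j), ncol = length(w))
for (a in seq_along(j)) {
  for (b in seq_along(w)) {
    z_loop[a, b] <- cos(j[a] * w[b])
  }
}

# Vectorized version, as in estspec
z_outer <- outer(j, w, function(j, w) cos(j * w))
all.equal(z_loop, z_outer)     # TRUE

# One matrix product gives the weighted sum over lags for every frequency
lac <- c(0.8, 0.5, 0.2, 0.1)   # made-up weighted autocorrelations
sz  <- lac %*% z_outer         # a 1 x length(w) matrix
```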
20
A user friendly version for teaching STATS
without writing commands
  • Interactive Statistics with R developed by
  • Motoya Machida, Math Department, Tennessee
    Technological University
  • http://www.math.tntech.edu/ISR/index.html

21
Packages in R. Example: Microarray data analysis
  • The packages DNAMR and DNAMRWeb, developed by
    J. Cabrera at Rutgers University, can be found at
    the address below. At the left there is a program.
  • http://www.rci.rutgers.edu/cabrera/DNAMR/
  • The graph at the right was obtained with
    DNAMRWeb for clustering the most important genes
    for the Kahn data (each row is a gene and each
    column is a microarray)

22
Other Packages of functions written in R
  • Packages in library 'C:/PROGRA~1/R/rw2001/library'
  • base         The R Base Package
  • boot         Bootstrap R (S-Plus) Functions (Canty)
  • class        Functions for Classification
  • cluster      Functions for clustering (by Rousseeuw et al.)
  • datasets     The R Datasets Package
  • foreign      Read Data Stored by Minitab, S, SAS, SPSS,
                 Stata, Systat, ...
  • graphics     The R Graphics Package
  • grDevices    The R Graphics Devices and Support for Colours
                 and Fonts
  • grid         The Grid Graphics Package
  • KernSmooth   Functions for kernel smoothing for Wand &
                 Jones (1995)
  • lattice      Lattice Graphics
  • MASS         Main Package of Venables and Ripley's MASS
  • methods      Formal Methods and Classes
  • mgcv         GAMs with GCV smoothness estimation and GAMMs
                 by REML/PQL
  • nlme         Linear and nonlinear mixed effects models
  • nnet         Feed-forward Neural Networks and Multinomial
                 Log-Linear Models
  • rpart        Recursive Partitioning
  • spatial      Functions for Kriging and Point Pattern
                 Analysis
  • splines      Regression Spline Functions and Classes
  • stats        The R Stats Package
  • stats4       Statistical functions using S4 classes
  • survival     Survival analysis, including penalised
                 likelihood.
  • tcltk        Tcl/Tk Interface
  • tools        Tools for Package Development
  • utils        The R Utils Package
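A package from this list is made available in a session with library(). As a sketch, the example below loads `splines`, which is part of the standard R distribution, and fits a regression spline to made-up data. The seed, the data and the choice of four basis functions are arbitrary.

```r
library(splines)               # load the package for this session

set.seed(2)                    # arbitrary seed for a repeatable toy data set
x <- seq(0, 10, length.out = 50)
y <- sin(x) + rnorm(50, sd = 0.2)

basis <- bs(x, df = 4)         # cubic B-spline basis with 4 columns
fit   <- lm(y ~ basis)         # regression spline fitted by least squares
dim(basis)
summary(fit)$r.squared

# Packages that are not installed yet can be fetched from CRAN with
# install.packages("name") and then loaded the same way with library(name).
```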

23
Who can use R?
  • Anybody who wants to have free statistical
    software at home or at the office
  • Schools that cannot afford to buy licenses for
    statistical software
  • Those of us who teach courses off campus, in places
    without the software we have on campus, for
    instance to teachers (I prefer R to Excel)
  • Statisticians who write programs to do
    calculations that are not included in commercial
    software

24
Who really needs R?
  • Those of us working in an area of statistics that
    has been developed more in R than in commercial
    software (spatial statistics, etc.)
  • People working in bioinformatics, microarray
    data analysis, etc.

25
Will R push out commercial statistical software?
  • Probably not in general, but its use will increase
    in the academic environment.
  • Minitab is user friendly for classroom use in
    Intro STATS courses, but R could be an option for
    schools that cannot afford Minitab.
  • People in the social sciences are very used to SPSS.
  • SAS has very good data management options for
    large data sets, so SAS will probably prevail in
    big corporations, but in the academic
    environment R could be an option for advanced
    courses (SAS is quite expensive).
  • At least for a few years, R will probably be used
    in environments with statistical sophistication
    and some programming knowledge (statistics
    departments of universities and research
    institutes) until more user-friendly versions are
    developed. Maybe commercial software will
    turn more toward the business world and R
    will take more of the academic/research world.

26
Should we teach R at ETSU?
  • Yes!! Maybe not in Math 1530, but we could
    introduce it in upper division courses. Why?
  • Students planning to go to grad school in
    Statistics, Bioinformatics, Biostatistics, or
    Biomath would benefit from it.
  • They would benefit from being able to work at home
    without buying or renting a statistical program.
  • They would become familiar with an object-oriented
    programming language.
  • Programming is good for you! Why? Because you
    really have to understand a problem before
    writing a program to solve it.

27
Learning R
  • There are several free manuals available on the
    page of the R project
  • http://www.r-project.org
  • Books about R
  • Introductory Statistics with R, Peter Dalgaard
  • Statistical Analysis and Data Display: An
    Intermediate Course with Examples in S-Plus, R,
    and SAS
  • A Handbook of Statistical Analyses Using R,
    Everitt & Hothorn (to appear in 2006)
  • Bioinformatics for R, Wiley, 2006
  • There is a tutorial at
  • http://www.etsu.edu/math/seier/R.htm
  • In Module 1 we give the basic commands to do
    calculations and graphs; the reader can search
    for more information using the Help of R.
  • In Module 2 we will learn how to write functions,
    i.e. we will learn how to program in R.
  • You are welcome to use it.

28
  • Now we will browse the manual. Then we can open R
    (it is already installed in Room 205),
  • see some demos
  • and try some of the commands in
  • http://www.etsu.edu/math/seier/commandsRtalk.doc