Introduction to R - PowerPoint PPT Presentation

Provided by: tommyga

1
Introduction to R
Fish 507 F Lecture 1
2
Introduction
  • Course website
  • http://students.washington.edu/gtommy/FISH507F/
  • Syllabus
  • Introductions

3
Recommended reading
  • An Introduction to R (R Development Core Team)
  • http://cran.r-project.org/doc/manuals/R-intro.pdf
  • Chapter 1
  • Chapter 2: 2.1-2.3, 2.8
  • Chapter 5: 5.4
  • Chapter 6: 6.2

4
What is R?
  • R is a language and environment for statistical
    computing, graphics and much more
  • It is an open-source GNU project, similar to the
    S language and environment developed at Bell
    Laboratories (formerly AT&T, now Lucent
    Technologies) by John Chambers
  • R can be considered a different implementation
    of S, with more flexibility and power gained from
    contributions by other users

5
What is R?
  • an effective data handling and storage facility
  • a large, coherent, integrated collection of tools
    for data analysis
  • graphical facilities for data analysis and
    display either on-screen or on hardcopy
  • a well-developed, simple and effective
    programming language including traditional
    statements such as conditionals, loops,
    user-defined functions, and input and output
    facilities (Covered in FISH 507 G).

6
Get the right tool for the job
  • There might be better options than Excel

7
Where to get R
  • The R-project web site
  • http://www.r-project.org
  • The program can be downloaded from any one of the
    official mirrors of CRAN
  • http://cran.r-project.org
  • Download the compiled binary code for your
    operating system
  • See supplemental material on website describing
    how to download and install R

8
What can R do?
  • R provides a comprehensive set of statistical
    analysis techniques
  • Classical statistical tests
  • Linear and nonlinear modeling
  • Time-series analysis
  • Classification and cluster analysis
  • Spatial statistics
  • Bayesian statistics
  • ... any statistical technique you use is
    likely built into R or a user-contributed package

9
What can R do?
  • Contributed Packages are a salient feature of R
  • The community of users contributes these
    packages.
  • 1851 Contributed Packages
  • New packages are continuously being added

10
What can R do?
  • Publication-quality plots can be produced
  • Many default graphing choices
  • The user retains full control of the graphics
  • Even rudimentary plots like histograms can be
    made vibrant and exciting

11
Learning R
  • As with other computing languages, the initial
    learning curve can be steep, but there are many
    help files, online sources, books and teachers
  • Becoming fluent can be very rewarding
  • Be patient and creative!

12
R Reference Material
  • No required/recommended textbook in this class
  • Intro to R (PDF available from help menu)
  • Many books to reference
  • Data Analysis and Graphics Using R, 2nd ed.
    (Maindonald & Braun)
  • The R Book (Crawley)
  • A Primer of Ecology with R (Stevens)
  • R Ref card
  • http://cran.r-project.org/doc/contrib/Short-refcard.pdf

14
Online Reference Material
  • R Website
  • http://www.r-project.org/
  • R Seek (specific R search engine)
  • http://www.rseek.org/
  • R Wiki
  • http://wiki.r-project.org/rwiki/doku.php
  • The Ecological Detective (Hilborn and Mangel)
  • http://wiki.r-project.org/rwiki/doku.php?id=guides:tutorials:ecological_detective

15
Help within R?
  • Searching help
  • help.search("logarithm")
  • Finding functions
  • > apropos("log")
  • [1] ".__C__logical" ".__C__logLik" ".__M__Logic:base"
  • [4] ".__M__logLik:stats" ".__T__Logic:base" ".__T__logLik:stats"
  • [7] "as.data.frame.logical" "as.logical" "dlogis"
  • . . . . .
  • Getting help for a function
  • help(log)
  • ?log

This font means this is an R command
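
The help commands above can be tried together in one short session. This is only a sketch: the variable name log_names is made up for illustration, and the exact list that apropos() returns depends on which packages are loaded.

```r
# List every object on the search path whose name contains "log".
log_names <- apropos("log")
print(head(log_names))

# Open the help page for one function; the two forms are equivalent.
help(log)
?log
```

help.search("logarithm") is the tool to reach for when the function name is not yet known. The search term must be a quoted string.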
16
?log
  • The help file is broken down into several
    components. This information can be dense, but
    is very useful.

Discover other functions
17
Using R: Hands-on Introduction
Prompt
18
R workspaces
  • All analyses can be saved in an R workspace at
    the end of a session
  • I usually do not do this
  • The location of the workspace should be specified
    at the beginning of an R session

20
R Session Workspaces
  • setwd() specifies the working directory of the
    current R session
  • This is useful for saving the workspace in a
    specific location and this command will also be
    useful in several other contexts
  • setwd("C:/gnu/home")
  • Alternatively
  • setwd("C:\\gnu\\home")
  • To find the current working directory
  • > getwd()
  • [1] "C:/gnu/home"

Note that the slashes are the opposite of those
in Windows Explorer (C:\gnu\home)
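
A minimal sketch of changing and restoring the working directory. tempdir() stands in for a real project folder because it is guaranteed to exist; old_dir, current_dir and lab_path are illustrative names, not part of the lecture.

```r
# Remember the current working directory so it can be restored afterwards.
old_dir <- getwd()

# tempdir() is used only as a folder that certainly exists;
# in practice this would be a project folder such as "C:/gnu/home".
setwd(tempdir())
current_dir <- getwd()
print(current_dir)

# file.path() joins path components with forward slashes on every platform.
lab_path <- file.path(current_dir, "Lab1.R")

# Return to the original directory.
setwd(old_dir)
```

Building paths with file.path() avoids having to remember which direction the slashes go.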
21
Scripts
  • Reproducible work (Scientific method)
  • Handy for rerunning similar analysis later
  • Save your scripts !
  • e.g. Lab1.R

All saved scripts should have .R extension
23
Scripts
  • Normal keyboard shortcuts apply
  • Ctrl+C: copy
  • Ctrl+H: find and replace . . . .
  • Right-click a line of code and click "Run line or
    selection"
  • Ctrl+R
  • This also works when highlighting a chunk of code

24
Tinn-R Text Editor Program
  • Free basic code editor for R
  • http://www.sciviews.org/Tinn-R/

25
New Script
27
Start R in Tinn-R
28
Press R Send line to send commands to R
Type commands up here
29
Some simple R Commands
  • Good programmers always insert comments into
    their code
  • The # sign denotes a comment. Everything after it
    on the same line is ignored by R
  • # This is a comment
  • Rule of thumb
  • Always use more comments than you think are
    necessary!
  • We tend to forget a lot

30
Some Simple R Commands
  • > 2 + 2
  • [1] 4
  • > 2 * 2
  • [1] 4
  • > 2 * (1 + 1)
  • [1] 4

Result
Grouping and ordering
> 2 * 1 + 1
[1] 3
31
Some simple R commands
> exp(0)
[1] 1
> log(2.718282)
[1] 1
> log(2.718282, base = 10)
[1] 0.4342945
> log(2.718282
+ , base = 10)
[1] 0.4342945
Optional argument: what's the default?
Incomplete command (R shows the + continuation prompt)
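
A short sketch of operator grouping and of the optional base argument of log(). The variable names are illustrative only.

```r
# Multiplication binds tighter than addition.
a <- 2 * 1 + 1      # evaluated as (2 * 1) + 1
b <- 2 * (1 + 1)    # parentheses force the addition first

# log() returns the natural logarithm unless base is supplied.
natural <- log(exp(1))
common  <- log(100, base = 10)

print(c(a, b, natural, common))
```

If a command is sent before its closing parenthesis is typed, R shows a + prompt and waits for the rest of the expression.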
32
In-class Exercise 1
  • Use R to compute the following
  • 1 + 2(3 + 4)
  • log(43 321)

33
Object Oriented Language
  • R is an object-oriented programming (OOP)
    language
  • OOP is very powerful, but we will rarely use
    its common features in this class
  • For simplicity, think of every programming result
    as something that can be stored as an
    accessible object
  • Numbers
  • Tables
  • Matrices
  • Plots
  • Statistical output
  • Objects are stored in the R workspace; saving the
    workspace writes them to the directory you have
    told R to work in

34
Assigning Values
  • myObject <- log(2.718282)
  • myObject = log(2.718282)
  • myObject = log(2.718282, base = 10)

Assign the value of log(2.718282) to a new object
named myObject
= can be used instead of <-. However, it is harder
to read alongside other arguments
Optional argument
35
Assigning values
  • Characters may also be assigned to objects
  • myName <- "Tommy"
  • myName <- 'Tommy'
  • > ( myName <- "Tommy Garrison" )
  • [1] "Tommy Garrison"

Single or double quotes may be used
Character values may contain spaces
Note that putting a command in ( ) will display
the assigned object in the R Console
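
The assignment forms can be sketched in one place. The names same and scaled are illustrative and do not come from the slides.

```r
# Assign with <- ; the result is stored silently.
myObject <- log(2.718282, base = 10)

# Wrapping the assignment in parentheses also prints the value.
(myName <- "Tommy Garrison")

# Single and double quotes create the same character string.
same <- identical("Tommy", 'Tommy')

# Stored objects can be reused in later expressions.
scaled <- myObject * 10
print(scaled)
```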
36
Viewing objects
  • There are several ways to display an object that
    has just been assigned. I prefer ( <- )
  • > print(myObject)
  • [1] 0.4342945
  • > myObject
  • [1] 0.4342945
  • > myObject * 10
  • [1] 4.342945

print is a useful function within functions (FISH
507 G)
manipulate an object
37
Removing objects
  • To list the objects in the current workspace
  • > ls()
  • [1] "myName" "myObject"
  • To remove an object
  • > rm(myObject)
  • > ls()
  • [1] "myName"
  • To remove all objects
  • rm(list = ls())

Be very careful when doing this; it is useful when
starting a new analysis in the same R session
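
Because rm(list = ls()) wipes everything, it is safer to experiment somewhere disposable. The sketch below uses a separate environment (called scratch here, an assumption made for illustration) so that the real workspace is left untouched.

```r
# A scratch environment keeps this demonstration away from the real workspace.
scratch <- new.env()
assign("myName", "Tommy", envir = scratch)
assign("myObject", log(2.718282), envir = scratch)

before <- ls(envir = scratch)                     # both objects are listed

rm("myObject", envir = scratch)                   # remove a single object
after_one <- ls(envir = scratch)

rm(list = ls(envir = scratch), envir = scratch)   # remove everything
after_all <- ls(envir = scratch)

print(list(before, after_one, after_all))
```

Without the envir argument, the same calls act on the current workspace, as on the slide.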
38
Data types
  • Data types describe how objects are stored in a
    computer's memory
  • When storing an object in R, you do not need to
    specify the data type
  • Common data types include
  • Integers
  • Booleans (TRUE / FALSE)
  • Floating point numbers (double)
  • Characters
  • An object's type is not always obvious
    (particularly when reading in data from external
    sources) and knowing exactly what it is can be
    very useful
  • myObject <- log(2.718282)

39
Data types
  • > myObject
  • [1] 1
  • > mode(myObject)
  • [1] "numeric"
  • > is.numeric(myObject)
  • [1] TRUE
  • > typeof(myObject)
  • [1] "double"
  • myObject <- as.integer(myObject)

is. functions are useful for determining modes of
objects returned by R functions
Memory is cheap! R stores a number as a double
unless told otherwise. as. functions coerce an
object's data type or mode
40
Data types
  • Similar functions can be applied to character
    variables
  • > is.character(myObject)
  • [1] FALSE
  • > is.character(myName)
  • [1] TRUE
  • > typeof(myName)
  • [1] "character"
  • Character and numeric (double) storage modes will
    be the most commonly encountered in this class
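
A small sketch of checking and coercing types. myInteger, checks and not_a_number are illustrative names, and the "abc" input is an invented example of text that cannot be read as a number.

```r
myObject <- log(2.718282)
myName <- "Tommy"

# Numbers are stored as doubles by default.
type_before <- typeof(myObject)

# as.integer() drops the fractional part, so 1.0000001 becomes 1L.
myInteger <- as.integer(myObject)
type_after <- typeof(myInteger)

# is. functions test the type; as. functions convert it.
checks <- c(is.numeric(myObject), is.character(myObject), is.character(myName))

# Coercing text that is not a number yields NA (normally with a warning).
not_a_number <- suppressWarnings(as.numeric("abc"))

print(list(type_before, type_after, checks, not_a_number))
```

Checks like these are most useful right after reading data from a file, where a numeric column may arrive as character.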

41
The following are equivalent
  • > c(1,2,3,4,5,6,7,8,9,10)
  • [1] 1 2 3 4 5 6 7 8 9 10
  • c(1:10)
  • (1:10)
  • 1:10
  • seq(from = 1, to = 10, by = 1)
  • > seq(1, 10, 1)

concatenate: combine elements to form a vector
from:to (by 1)
use the seq function for more flexibility
by default R expects arguments in this order
42
The following are equivalent
  • seq(1, 10)
  • seq(to = 10)
  • seq(length.out = 10)
  • seq(l = 10)

the default argument for by is 1
the default argument for from is 1
generate a vector with ten elements
R is smart! You don't need to spell out the full
argument name
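
One way to confirm that the forms on the last two slides produce the same values is to compare each against a reference vector. The list name candidates and its element names are made up for this sketch.

```r
reference <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Each entry is one of the forms listed on the slides.
candidates <- list(
  colon      = 1:10,
  seq_full   = seq(from = 1, to = 10, by = 1),
  seq_short  = seq(1, 10),
  seq_to     = seq(to = 10),
  seq_length = seq(length.out = 10),
  seq_abbrev = seq(l = 10)   # "l" is partially matched to length.out
)

# all(x == reference) compares the values element by element.
same_values <- sapply(candidates, function(x) all(x == reference))
print(same_values)

# The values agree, but the storage type can differ.
print(c(typeof(reference), typeof(1:10)))
```

"Equivalent" here means equal in value: c(1, 2, ...) is stored as double while 1:10 is stored as integer.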
43
The following are NOT equivalent
  • > x <- 1:3; x
  • [1] 1 2 3
  • > rep(x, length = 10)
  • [1] 1 2 3 1 2 3 1 2 3 1
  • > rep(x, times = 2)
  • [1] 1 2 3 1 2 3
  • > rep(x, each = 2)
  • [1] 1 1 2 2 3 3

put two commands on the same line with ;
rep replicates the values in x according to the
chosen argument; the default is times
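
The three rep() arguments can be compared side by side. The combined call at the end goes slightly beyond the slides and is included only to show how each and times interact.

```r
x <- 1:3

by_times  <- rep(x, times = 2)        # repeat the whole vector
by_each   <- rep(x, each = 2)         # repeat each element in turn
by_length <- rep(x, length.out = 10)  # recycle until ten values are produced

# With no argument name, the second argument is taken as times.
by_default <- rep(x, 2)

# each and times can be combined; each is applied first.
combined <- rep(x, each = 2, times = 2)

print(list(by_times, by_each, by_length, by_default, combined))
```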
44
Operations on vectors
  • Operations on vectors work element-wise
  • > log(x)
  • [1] 0.0000000 0.6931472 1.0986123
  • > x + 1
  • [1] 2 3 4
  • > x * 2
  • [1] 2 4 6

45
Operations on vectors
  • > y <- 4:6
  • > x + y
  • [1] 5 7 9
  • > x - y
  • [1] -3 -3 -3
  • > x / y
  • [1] 0.25 0.40 0.50
  • > x * y
  • [1] 4 10 18
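
A compact sketch of element-wise arithmetic on the same x and y. The result names are illustrative.

```r
x <- 1:3
y <- 4:6

sums     <- x + y    # 1+4, 2+5, 3+6
products <- x * y    # 1*4, 2*5, 3*6
shifted  <- x + 1    # the single value 1 is recycled across x
logs     <- log(x)   # functions such as log() act on every element

print(list(sums, products, shifted, logs))
```

No loop is needed: each operator pairs up the elements by position.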

46
In-class Exercise 2
  • Create vectors using seq() and rep() and only c()
    if necessary
  • Positive integers from 1 to 99
  • Odd integers between 1 and 99
  • The numbers 1,1,1,2,2,2,3,3,3
  • The numbers 1,2,3,4,5,4,3,2,1,0
  • The fractions 1, 1/2, 1/3, ..., 1/10
  • The numbers 1,8,27,64,125,216

47
Entering Data
  • Data are usually read in from a .txt or .csv
    file, but can be entered manually
  • co2 <- scan()
  • Famous Mauna Loa CO2 data
  • co2 <- c(316, 316.91, 317.63, 318.46, . . .

Hit Enter twice to stop
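
scan() with no arguments waits for keyboard input, which cannot be shown in a script. The sketch below feeds it a string instead and then round-trips the same values through a temporary .csv file. Only the four values visible on the slide are used; co2_text, co2_typed, csv_path and co2_file are illustrative names.

```r
# The four values shown on the slide; the rest of the series is not reproduced.
co2_text <- "316 316.91 317.63 318.46"

# text = lets scan() parse a string instead of reading from the keyboard.
co2_typed <- scan(text = co2_text, quiet = TRUE)

# Reading the same values from a .csv file (a temporary file stands in
# for a real data file).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(co2 = co2_typed), csv_path, row.names = FALSE)
co2_file <- read.csv(csv_path)$co2

print(co2_typed)
print(co2_file)
```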
48
Useful arithmetic functions
  • > min(co2)
  • [1] 316
  • > max(co2)
  • [1] 377.43
  • > mean(co2)
  • [1] 342.4361
  • > median(co2)
  • [1] 340.52
  • > var(co2)
  • [1] 348.3905
  • > sd(co2)
  • [1] 18.66522
  • > range(co2)
  • [1] 316.00 377.43

> quantile(co2)
      0%      25%      50%      75%     100%
316.0000 325.8175 340.5200 356.9150 377.4300
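
How these functions relate to each other can be checked on a shortened series. Only the first six values printed on the slides are used, so the numbers will not match the full 46-year output above. The probs call is an extra illustration.

```r
# First six annual values shown on the slides (the full series has 46).
co2 <- c(316.00, 316.91, 317.63, 318.46, 319.02, 319.52)

spread    <- c(var = var(co2), sd = sd(co2))   # sd is the square root of var
limits    <- range(co2)                        # same as c(min(co2), max(co2))
quartiles <- quantile(co2)                     # 0%, 25%, 50%, 75%, 100%

# Other cut points can be requested through the probs argument.
deciles <- quantile(co2, probs = c(0.1, 0.9))

print(spread)
print(limits)
print(quartiles)
print(deciles)
```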
49
The length function
  • length returns the number of elements in a vector
  • Very useful for flexible programming
  • > length(co2)
  • [1] 46
  • > nyears <- length(co2)
  • > years <- seq(from = 1959, length = nyears)
  • > years
  • [1] 1959 1960 1961 . . . . 2002 2003 2004
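
A sketch of why length() helps flexible programming: nothing below is hard-coded to the number of observations. Only the first six slide values are used, and seq_along() is an addition not shown in the lecture.

```r
co2 <- c(316.00, 316.91, 317.63, 318.46, 319.02, 319.52)  # first six values only

nyears <- length(co2)

# One year per observation, starting in 1959.
years <- seq(from = 1959, length.out = nyears)

# seq_along() gives the index positions 1, 2, ..., length(co2).
positions <- seq_along(co2)

print(years)
```

If more observations are appended to co2, years and positions grow with it when the lines are rerun.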

50
Data frames
  • It's convenient to store data as a collection of
    variables
  • > co2.df <- data.frame(years, co2)
  • head() is a quick way to view the first few
    observations of a data frame
  • > head(co2.df)
  • years co2
  • 1 1959 316.00
  • 2 1960 316.91
  • 3 1961 317.63
  • 4 1962 318.46
  • 5 1963 319.02
  • 6 1964 319.52

variables must be of the same length
column names
row names
51
Data frames
  • Extract the names of a data frame
  • > names(co2.df)
  • [1] "years" "co2"
  • Modify the names of the data frame
  • > names(co2.df) <- c("Year", "co.2")
  • > head(co2.df)
  • Year co.2
  • 1 1959 316.00
  • 2 1960 316.91
  • This can also be done within the data.frame()
    statement by naming each variable as it is
    passed in
  • co2.df <- data.frame(Year = years, co.2 = co2)

52
Data frames
  • What's stored in your workspace?
  • > Year
  • Error: object 'Year' not found
  • > ls()
  • [1] "co2" "co2.df" "myName" . . . .
  • attach() adds the variables of a data frame to
    the R search path. Be careful: attached names
    can clash with previously defined variables
  • > attach(co2.df)
  • > Year
  • [1] 1959 1960 1961 .
  • > co2.df$Year
  • [1] 1959 1960 1961 .

$ extracts columns from a data frame
53
Data frames
  • If you're done working with a particular data
    frame that you attached, you should always detach
    it
  • > detach(co2.df)
  • > Year
  • Error: object 'Year' not found
  • The safest approach is to use the $ operator to
    extract variables from data frames
  • > co2.df$Year
  • [1] 1959 1960 1961 . . . .
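
A minimal sketch of working with data frame columns without attach(). It uses the first six slide values. with() is an alternative that the lecture does not cover, shown here as one possible way to avoid name clashes.

```r
years <- 1959:1964
co2 <- c(316.00, 316.91, 317.63, 318.46, 319.02, 319.52)  # first six values only

# Name the columns when the data frame is created.
co2.df <- data.frame(Year = years, co.2 = co2)

# $ extracts a single column without attaching anything.
first_years <- head(co2.df$Year, n = 3)

# with() evaluates an expression using the columns, then forgets them again.
mean_co2 <- with(co2.df, mean(co.2))

print(names(co2.df))
print(first_years)
print(mean_co2)
```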

54
Matrices
  • Matrices represent another way to store
    collections of variables. Whereas data frames
    can store objects of multiple types (numeric,
    character, . . . . ), a matrix must be of a
    single type or R will coerce variables
    accordingly
  • The matrix() statement
  • matrix(data = NA,
  • nrow = 1, ncol = 1,
  • byrow = FALSE)

A vector needs to be given or R will generate NA
(Not Available)
Only one of these arguments needs to be
specified. If both are specified then
length(data) = nrow * ncol
R will fill the matrix by column unless
byrow = TRUE is specified
> matrix(1:4, ncol = 2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4
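
The fill order and the single-type rule can be seen in a few lines. The object names are illustrative, and the mixed example is an invented input.

```r
by_column <- matrix(1:6, nrow = 2)                 # default: fill down the columns
by_row    <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across the rows

# Only nrow was given; ncol is worked out from length(data).
dims <- dim(by_column)

# Mixing numbers and text forces every element to character.
mixed <- matrix(c(1, "a", 2, "b"), ncol = 2)

print(by_column)
print(by_row)
print(typeof(mixed))
```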
55
Matrices
  • Create a matrix with the co2 data
  • > co2.mat <- matrix(c(years, co2), ncol = 2,
    nrow = length(years))
  • > head(co2.mat, n = 3)
  • [,1] [,2]
  • [1,] 1959 316.00
  • [2,] 1960 316.91
  • [3,] 1961 317.63
  • A fast way to create matrices is by binding the
    columns of two (or more) vectors (or matrices)
  • > co2.mat <- cbind(years, co2)
  • > head(co2.mat, n = 3)
  • years co2
  • [1,] 1959 316.00
  • [2,] 1960 316.91
  • [3,] 1961 317.63

56
Matrices
  • rbind() forms a matrix by binding two (or more)
    vectors (or matrices) together as rows
  • > co2.row.mat <- rbind(years, co2)
  • > t(co2.row.mat)
  • years co2
  • [1,] 1959 316.00
  • [2,] 1960 316.91
  • [3,] 1961 317.63
  • ...
  • Common functions to extract dimensions
  • > dim(co2.mat)
  • [1] 46 2
  • > dim(co2.row.mat)
  • [1] 2 46
  • > nrow(co2.mat)
  • [1] 46
  • > ncol(co2.mat)
  • [1] 2

t(): transpose a matrix
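
A sketch comparing cbind() and rbind() on the first three slide values. The names by_cols, by_rows and same_layout are illustrative.

```r
years <- 1959:1961
co2 <- c(316.00, 316.91, 317.63)   # first three values from the slides

by_cols <- cbind(years, co2)   # 3 x 2: each vector becomes a column
by_rows <- rbind(years, co2)   # 2 x 3: each vector becomes a row

# Transposing the row-bound matrix recovers the column-bound layout.
same_layout <- all(t(by_rows) == by_cols)

print(dim(by_cols))
print(dim(by_rows))
print(same_layout)
```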
57
apply function
  • Very flexible function
  • apply(X, MARGIN, FUN, ...)
  • X: a matrix
  • MARGIN: 1 = rows, 2 = columns
  • FUN: an R function
  • ...: additional arguments to that function
  • > apply(co2.mat, 2, mean)
  • years co2
  • 1981.5000 342.4361
  • What does this return?
  • apply(co2.row.mat, 1, mean)

Apply the mean to the columns of co2.mat
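
A sketch of apply() over columns, including how extra arguments reach FUN. It uses the first six slide values, so the means differ from the full-series output above. The result names are illustrative.

```r
years <- 1959:1964
co2 <- c(316.00, 316.91, 317.63, 318.46, 319.02, 319.52)  # first six values only
co2.mat <- cbind(years, co2)

col_means <- apply(co2.mat, 2, mean)   # one value per column
col_maxes <- apply(co2.mat, 2, max)

# Arguments after FUN are passed on to it, e.g. probs for quantile().
col_medians <- apply(co2.mat, 2, quantile, probs = 0.5)

print(col_means)
print(col_maxes)
print(col_medians)
```

The result keeps the column names of the matrix, which makes the output self-describing.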
58
Arrays
  • N-dimensional matrices
  • > co2.array <- array(data = c(years, co2),
    dim = c(nyears, 2))
  • > head(co2.array, n = 3)
  • [,1] [,2]
  • [1,] 1959 316.00
  • [2,] 1960 316.91
  • [3,] 1961 317.63
  • A matrix is a special case of an array, but a
    data frame is not
  • > is.array(co2.array)
  • [1] TRUE
  • > is.array(co2.mat)
  • [1] TRUE
  • > is.array(co2.df)
  • [1] FALSE

Create an array of dimension nyears x 2
59
Arrays
  • 3-dimensional array
  • > array(1:24, dim = c(3, 4, 2))
  • , , 1
  • [,1] [,2] [,3] [,4]
  • [1,] 1 4 7 10
  • [2,] 2 5 8 11
  • [3,] 3 6 9 12
  • , , 2
  • [,1] [,2] [,3] [,4]
  • [1,] 13 16 19 22
  • [2,] 14 17 20 23
  • [3,] 15 18 21 24

R fills an array down each column first, then
across the columns, and then through the higher
dimensions
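
The fill order can be verified by indexing into the same 3 x 4 x 2 array. The variable names are illustrative.

```r
a <- array(1:24, dim = c(3, 4, 2))

# Index with [row, column, layer]; the first index varies fastest when filling.
second_value <- a[2, 1, 1]   # the 2nd value placed in the array
fourth_value <- a[1, 2, 1]   # the column advances after 3 rows are filled
layer_start  <- a[1, 1, 2]   # the second layer starts after 3 * 4 values

# Leaving an index blank keeps that whole dimension.
first_layer <- a[, , 1]

print(c(second_value, fourth_value, layer_start))
print(dim(first_layer))
```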
60
Lists
  • Most flexible structure
  • Each element can be of a different mode and
    varying length
  • > description <- "Atmospheric CO2 concentrations
    (ppmv) derived from in situ air samples
    collected at Mauna Loa Observatory, Hawaii.
    Source: C.D. Keeling"
  • co2.list <- list(Meta = description, Years = nyears,
    data = co2.mat)

61
Lists
  • > co2.list
  • $Meta
  • [1] "Atmospheric CO2 concentrations (ppmv)
    derived from in situ air samples collected at
    Mauna Loa Observatory, Hawaii. Source: C.D.
    Keeling"
  • $Years
  • [1] 46
  • $data
  • years co2
  • [1,] 1959 316.00
  • [2,] 1960 316.91
  • [3,] 1961 317.63
  • . . .
  • > co2.list$Years
  • [1] 46

The $ operator also works for lists
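
A sketch of building a list and reading from it. It uses the first three slide values and a shortened description. The Units element is a hypothetical extra, added only to show that assigning to a new name appends to the list.

```r
years <- 1959:1961
co2 <- c(316.00, 316.91, 317.63)   # first three values from the slides

co2.list <- list(Meta = "Mauna Loa CO2 (ppmv)",   # shortened description
                 Years = length(co2),
                 data = cbind(years, co2))

n_years <- co2.list$Years        # extract by name
same    <- co2.list[["Years"]]   # equivalent double-bracket form

# Assigning to a new name appends an element (Units is a made-up example).
co2.list$Units <- "ppmv"

print(names(co2.list))
print(length(co2.list))
```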
62
In-class Exercise 3
  • Look up help on the summary command and try it
    out on the co2 object. Now replicate this output
    without using the summary command and include it
    in co2.list as the 4th element in the list.
  • You may or may not need some extra
    functions/operations that we have not discussed.
    Use the help files if necessary.