Title: Introduction to R
1Introduction to R
Fish 507 F Lecture 1
2Introduction
- Course website
- http://students.washington.edu/gtommy/FISH507F/
- Syllabus
- Introductions
3Recommended reading
- An Introduction to R (R Development Core Team)
- http://cran.r-project.org/doc/manuals/R-intro.pdf
- Chapter 1
- Chapter 2: 2.1-2.3 / 2.8
- Chapter 5: 5.4
- Chapter 6: 6.2
4What is R?
- R is a language and environment for statistical computing, graphics and much more
- It is an (open source) GNU project which is similar to the S language and environment developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers
- R can be considered a different implementation of S, with more flexibility and power gained from contributions by other users
5What is R?
- An effective data handling and storage facility
- A large, coherent, integrated collection of tools for data analysis
- Graphical facilities for data analysis and display either on-screen or on hardcopy
- A well-developed, simple and effective programming language including traditional statements such as conditionals, loops, user-defined functions, and input and output facilities (covered in FISH 507 G)
6Get the right tool for the job
- There might be better options than Excel
7Where to get R
- The R-project web site
- http://www.r-project.org
- The program can be downloaded from any one of the official mirrors of CRAN
- http://cran.r-project.org
- Download the compiled binary code for your operating system
- See the supplemental material on the course website describing how to download and install R
8What can R do?
- R provides a comprehensive set of statistical analysis techniques
- Classical statistical tests
- Linear and nonlinear modeling
- Time-series analysis
- Classification, cluster analysis
- Spatial statistics
- Bayesian statistics
- Any statistical technique you use is likely built into R or available in a user-contributed package
9What can R do?
- Contributed packages are a salient feature of R
- The community of users contributes these packages
- 1851 contributed packages
- New packages are continuously being added
10What can R do?
- Publication-quality plots can be produced
- Many default graphing choices
- The user retains full control of the graphics
- Even rudimentary plots like histograms can be made vibrant and exciting
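A minimal sketch of dressing up a basic histogram with base graphics. The data are simulated with rnorm() purely for illustration, and the colours, bin count and output file name are arbitrary assumptions; the plot is written to a temporary PDF so the snippet also runs without a screen.

```r
# Simulated data, for illustration only
set.seed(507)
x <- rnorm(200, mean = 10, sd = 2)

# Send the plot to a temporary PDF file (hypothetical file name)
out_file <- file.path(tempdir(), "histogram_example.pdf")
pdf(out_file)

# hist() draws the plot and also returns the computed bins invisibly
h <- hist(x,
          breaks = 20,            # a suggested number of bins
          col    = "steelblue",   # bar fill colour
          border = "white",       # bar outline colour
          main   = "Simulated data",
          xlab   = "Value")
abline(v = mean(x), col = "orange", lwd = 3)  # mark the mean with a thick line

invisible(dev.off())  # close the PDF device
```

Every call here is standard base R graphics; only the styling choices are taste.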
11Learning R
- Just like with other computing languages, the initial learning curve can be steep, but there are myriad help files, online sources, books and teachers
- The benefits of becoming fluent can be very rewarding
- Be patient and creative!
12R Reference Material
- No required/recommended textbook in this class
- Intro to R (PDF available from help menu)
- Many books to reference
- Data Analysis and Graphics Using R, 2nd ed. (Maindonald and Braun)
- The R Book (Crawley)
- A Primer of Ecology with R (Stevens)
- R Ref card
- http://cran.r-project.org/doc/contrib/Short-refcard.pdf
14Online Reference Material
- R Website
- http://www.r-project.org/
- R Seek (specific R search engine)
- http://www.rseek.org/
- R Wiki
- http://wiki.r-project.org/rwiki/doku.php
- The Ecological Detective (Hilborn and Mangel)
- http://wiki.r-project.org/rwiki/doku.php?id=guides:tutorials:ecological_detective
15Help within R?
- Searching help
- help.search("logarithm")
- Finding functions
- > apropos("log")
- [1] ".__C__logical" ".__C__logLik" ".__M__Logic:base"
- [4] ".__M__logLik:stats" ".__T__Logic:base" ".__T__logLik:stats"
- [7] "as.data.frame.logical" "as.logical" "dlogis"
- . . . . .
- Getting help for a function
- help(log)
- ?log
This font means this is an R command
16?log
- The help file is broken down into several
components. This information can be dense, but
is very useful.
Discover other functions
17Using R: Hands-on Introduction
[Figure: the R console, with the prompt labeled]
18R workspaces
- All analyses can be saved in an R workspace at the end of a session
- I usually do not do this
- The location of the workspace should be specified at the beginning of an R session
20R Session Workspaces
- setwd() specifies the working directory of the current R session
- This is useful for saving the workspace in a specific location, and this command will also be useful in several other contexts
- setwd("C:/gnu/home")
- Alternatively
- setwd("C:\\gnu\\home")
- To find the current working directory
- > getwd()
- [1] "C:/gnu/home"
Note that the slashes are the opposite of those in Windows Explorer (C:\gnu\home); inside an R string a single backslash must be written as \\
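A minimal sketch of setting the working directory, saving the workspace there, and switching back. A temporary folder from tempdir() stands in for a real project folder such as C:/gnu/home; the folder name, object name and file name are hypothetical.

```r
# A temporary folder acts as a stand-in for a project directory
project_dir <- file.path(tempdir(), "fish507_demo")
dir.create(project_dir, showWarnings = FALSE)

# setwd() invisibly returns the previous working directory,
# so it can be stored and restored afterwards
old_dir <- setwd(project_dir)
getwd()

# Create an object and save the whole workspace into project_dir
demo_value <- log(2.718282, base = 10)
save.image(file = "demo_workspace.RData")

# Return to the original working directory
setwd(old_dir)

# The saved file lives in project_dir, not in the current directory
saved_file <- file.path(project_dir, "demo_workspace.RData")
file.exists(saved_file)
```

Storing the old directory keeps a script from leaving the session somewhere unexpected.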
21Scripts
- Reproducible work (Scientific method)
- Handy for rerunning similar analyses later
- Save your scripts!
- e.g. Lab1.R
All saved scripts should have the .R extension
23Scripts
- Normal keyboard shortcuts apply
- Ctrl+C: copy
- Ctrl+H: find and replace . . . .
- Right-click a line of code and click "Run line or selection"
- Ctrl+R
- This also works when highlighting a chunk of code
24Tinn-R Text Editor Program
- Free basic code editor for R
- http://www.sciviews.org/Tinn-R/
25New Script
27Start R in Tinn-R
28Press R Send line to send commands to R
Type commands up here
29Some simple R Commands
- Good programmers always insert comments into their code
- The # sign denotes a comment. Everything after it on the same line is not interpreted by R
- # This is a comment
- Rule of thumb
- Always use more comments than you think are necessary!
- We tend to forget a lot
30Some Simple R Commands
- > 2 + 2
- [1] 4
- > 2 * 2
- [1] 4
- > 2 * (1 + 1)
- [1] 4
Result
Grouping and ordering
> 2 * 1 + 1
[1] 3
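The same kind of arithmetic can be kept in a script, with # comments explaining each line. This short sketch shows that multiplication is evaluated before addition unless parentheses group the terms; the object names are illustrative.

```r
# Addition and multiplication
a <- 2 + 2               # 4
b <- 2 * 2               # 4

# Parentheses are evaluated first
grouped <- 2 * (1 + 1)   # 2 * 2 = 4

# Without parentheses, * is evaluated before +
ordered <- 2 * 1 + 1     # (2 * 1) + 1 = 3

# ^ (power) binds more tightly than *
power <- 2 * 3^2         # 2 * 9 = 18

print(c(a, b, grouped, ordered, power))
```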
31Some simple R commands
> exp(0)
[1] 1
> log(2.718282)
[1] 1
> log(2.718282, base = 10)
[1] 0.4342945
> log(2.718282,
+ base = 10)
[1] 0.4342945
Optional argument: what's the default?
Incomplete command (R waits with the + continuation prompt)
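One way to look up a default without opening the help page is args(), which prints a function's argument list. The sketch also shows that an argument can be supplied by name or by position, and that log10() and log2() are shortcuts for common bases.

```r
# Print the arguments of log() together with their default values
args(log)

# Natural logarithm (the default base)
log(exp(1))

# The base can be supplied by name or by position
log(100, base = 10)
log(100, 10)

# Shortcuts for common bases
log10(100)
log2(8)
```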
32In-class Exercise 1
- Use R to compute the following
- 1 2(3 4)
- log(43 321)
-
-
33Object Oriented Language
- R is an object oriented programming (OOP) language
- OOP is very powerful, but we will rarely use common object oriented features in this class
- For simplicity, think of every programming outcome as something that can be stored as an accessible object
- Numbers
- Tables
- Matrices
- Plots
- Statistical output
- Objects are stored in the current R workspace; they are written to the working directory you have told R to use only when the workspace is saved
34Assigning Values
- myObject <- log(2.718282)
- myObject = log(2.718282)
- myObject = log(2.718282, base = 10)
Assign the value of log(2.718282) to a new object named myObject
= can be used instead of <-. However, it is harder to read when other arguments also use =
Optional argument
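A small sketch contrasting the two assignment operators. At the top level they behave the same; inside a function call, = names an argument rather than creating an object, which is why <- reads more clearly. The name myObject2 is illustrative.

```r
# Two equivalent ways to assign the same value
myObject  <- log(2.718282)
myObject2 =  log(2.718282)
identical(myObject, myObject2)    # TRUE

# Inside a call, "base = 10" matches the argument named base;
# it does not create an object called base
myObject <- log(2.718282, base = 10)
myObject                          # 0.4342945
```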
35Assigning values
- Characters may also be assigned to objects
- myName <- "Tommy"
- myName <- 'Tommy'
- > ( myName <- "Tommy Garrison" )
- [1] "Tommy Garrison"
Single or double quotes may be used
Character values may contain spaces
Note that wrapping an assignment in ( ) will display the assigned object in the R console
36Viewing objects
- There are several ways to display an object that has just been assigned. I prefer ( <- )
- > print(myObject)
- [1] 0.4342945
- > myObject
- [1] 0.4342945
- > myObject * 10
- [1] 4.342945
print() is a useful function within functions (FISH 507 G)
Manipulate an object
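The note about print() matters inside functions: at the console R prints the value of each top-level expression automatically, but inside a function body only an explicit print() produces output. A minimal sketch; show_scaled() is a hypothetical helper, not a built-in.

```r
# Hypothetical helper: only the print() call produces console output
show_scaled <- function(value) {
  value * 10            # computed but NOT printed inside the function
  print(value * 10)     # explicitly printed
  invisible(value * 10) # returned without auto-printing
}

myObject <- log(2.718282, base = 10)
result <- show_scaled(myObject)   # prints [1] 4.342945
```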
37Removing objects
- To list the objects in the current workspace
- > ls()
- [1] "myName" "myObject"
- To remove an object
- > rm(myObject)
- > ls()
- [1] "myName"
- To remove all objects
- rm(list = ls())
Be very careful when doing this; it is useful when starting a new analysis in the same R session
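Because rm(list = ls()) deletes everything in the workspace, it can help to practise on a throwaway environment first. This sketch uses new.env() as a stand-in workspace; the object names simply mirror the slide.

```r
# A separate environment acts as a scratch workspace
scratch <- new.env()
assign("myName", "Tommy", envir = scratch)
assign("myObject", 0.4342945, envir = scratch)

ls(scratch)                        # "myName" "myObject"

# Remove one object
rm("myObject", envir = scratch)
ls(scratch)                        # "myName"

# Remove everything that is left
rm(list = ls(scratch), envir = scratch)
ls(scratch)                        # character(0)
```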
38Data types
- Data types describe how objects are stored in a computer's memory
- When storing an object in R, you do not need to specify the data type
- Common data types include
- Integers
- Booleans (TRUE / FALSE)
- Floating point numbers (double)
- Characters
- An object's type is not always obvious (particularly when reading in data from external sources), and knowing exactly what it is can be very useful
- myObject <- log(2.718282)
39Data types
- > myObject
- [1] 1
- > mode(myObject)
- [1] "numeric"
- > is.numeric(myObject)
- [1] TRUE
- > typeof(myObject)
- [1] "double"
- myObject <- as.integer(myObject)
is.* functions are useful for determining the modes of objects returned by R functions
Memory is cheap! R will always store a numeric object as a double unless specified otherwise. as.* functions coerce an object's data type or mode
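A short sketch of how the storage type changes: numbers default to double, while as.integer() and the L suffix give integers, which still count as numeric. Note that as.integer() truncates toward zero rather than rounding.

```r
myObject <- log(2.718282)      # slightly above 1
typeof(myObject)               # "double"

myInt <- as.integer(myObject)  # coerce to integer
typeof(myInt)                  # "integer"
is.numeric(myInt)              # TRUE: integers are still numeric
mode(myInt)                    # "numeric"

typeof(5L)                     # "integer": the L suffix requests an integer

as.integer(2.9)                # 2 (truncated, not rounded)
as.integer(-2.9)               # -2 (truncated toward zero)
```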
40Data types
- Similar functions can be applied to character variables
- > is.character(myObject)
- [1] FALSE
- > is.character(myName)
- [1] TRUE
- > typeof(myName)
- [1] "character"
- Character and numeric (double) storage modes will be the most common encountered in this class
41The following are equivalent
- > c(1,2,3,4,5,6,7,8,9,10)
- [1] 1 2 3 4 5 6 7 8 9 10
- c(1:10)
- (1:10)
- 1:10
- seq(from = 1, to = 10, by = 1)
- > seq(1, 10, 1)
c(): concatenate, i.e. combine elements to form a vector
from:to (by 1)
Use the seq() function for more flexibility
By default R expects arguments in this order
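These expressions give the same values, but not necessarily the same storage type: c(1, 2, ...) and seq(1, 10, 1) produce doubles, while 1:10 produces integers. A quick check using all.equal(), which compares values, and identical(), which also compares types:

```r
v1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
v2 <- 1:10
v3 <- seq(from = 1, to = 10, by = 1)

# Same values ...
all.equal(v1, v2)    # TRUE
all(v1 == v3)        # TRUE

# ... but different storage types
typeof(v1)           # "double"
typeof(v2)           # "integer"
identical(v1, v2)    # FALSE, because the types differ
```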
42The following are equivalent
- seq(1, 10)
- seq(to = 10)
- seq(length.out = 10)
- seq(l = 10)
The default argument for by is 1
The default argument for from is 1
Generate a vector with ten elements
R is smart! You don't need to type the full argument name
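A few more seq() calls, written with full argument names for clarity: by sets the step size, while length.out fixes the number of elements and lets R work out the step.

```r
seq(from = 1, to = 10, by = 3)         # 1 4 7 10
seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
seq(from = 10, to = 1)                 # counts down: 10 9 ... 1
seq(length.out = 10)                   # 1 2 ... 10
seq(l = 10)                            # same, via partial argument matching
```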
43The following are NOT equivalent
- > x <- 1:3; x
- [1] 1 2 3
- > rep(x, length = 10)
- [1] 1 2 3 1 2 3 1 2 3 1
- > rep(x, times = 2)
- [1] 1 2 3 1 2 3
- > rep(x, each = 2)
- [1] 1 1 2 2 3 3
; puts two commands on the same line
rep() replicates the values in x according to a given argument; the default is times
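rep() also accepts a vector for times, giving a separate repeat count to each element, and each and times can be combined (each is applied first):

```r
x <- 1:3
rep(x, times = c(3, 2, 1))   # 1 1 1 2 2 3
rep(x, each = 2, times = 2)  # 1 1 2 2 3 3 1 1 2 2 3 3
rep(x, length.out = 7)       # 1 2 3 1 2 3 1
```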
44Operations on vectors
- Operations on vectors work element wise
- > log(x)
- [1] 0.0000000 0.6931472 1.0986123
- > x + 1
- [1] 2 3 4
- > x * 2
- [1] 2 4 6
45Operations on vectors
- > y <- 4:6
- > x + y
- [1] 5 7 9
- > x - y
- [1] -3 -3 -3
- > x / y
- [1] 0.25 0.40 0.50
- > x * y
- [1] 4 10 18
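The element-wise rules combine naturally: each expression below works on all elements at once without a loop, and a summary function such as sum() then reduces the result to a single number.

```r
x <- 1:3
y <- 4:6

x * y          # 4 10 18 (element by element)
sum(x * y)     # 32: the products added together
x^2            # 1 4 9
(x + y) / 2    # 2.5 3.5 4.5: element-wise average
```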
46In-class Exercise 2
- Create vectors using seq() and rep(), and c() only if necessary
- Positive integers from 1 to 99
- Odd integers between 1 and 99
- The numbers 1,1,1,2,2,2,3,3,3
- The numbers 1,2,3,4,5,4,3,2,1,0
- The fractions 1, 1/2, 1/3, ..., 1/10
- The numbers 1,8,27,64,125,216
47Entering Data
- Data are usually read in from a .txt or .csv file, but can be entered manually
- co2 <- scan()
- Famous Mauna Loa CO2 data
- co2 <- c(316, 316.91, 317.63, 318.46, . . .
Hit Enter twice to stop scan()
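A minimal sketch of the more usual route, reading a .csv file with read.csv(). To keep it self-contained it first writes a small file to a temporary location; the file name is hypothetical and only the first four CO2 values from these notes are used.

```r
# Write a small example file (hypothetical name, temporary location)
csv_file <- file.path(tempdir(), "co2_example.csv")
writeLines(c("year,co2",
             "1959,316.00",
             "1960,316.91",
             "1961,317.63",
             "1962,318.46"),
           csv_file)

# Read it back: the first line is used for the column names
co2_data <- read.csv(csv_file)
str(co2_data)

# Extract the CO2 column as a numeric vector
co2_first <- co2_data$co2
```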
48Useful arithmetic functions
- > min(co2)
- [1] 316
- > max(co2)
- [1] 377.43
- > mean(co2)
- [1] 342.4361
- > median(co2)
- [1] 340.52
- > var(co2)
- [1] 348.3905
- > sd(co2)
- [1] 18.66522
- > range(co2)
- [1] 316.00 377.43
> quantile(co2)
      0%      25%      50%      75%     100%
316.0000 325.8175 340.5200 356.9150 377.4300
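Several of these functions are related. A quick check on only the first six CO2 values (1959 to 1964, as listed in the data frame section of these notes), so the numbers differ from the full-series results: sd() is the square root of var(), median() equals the 50% quantile, and range() returns min() and max().

```r
co2_head <- c(316.00, 316.91, 317.63, 318.46, 319.02, 319.52)

sd(co2_head)
sqrt(var(co2_head))              # same value as sd()

median(co2_head)
quantile(co2_head, probs = 0.5)  # same value, labelled "50%"

range(co2_head)                  # same as c(min(co2_head), max(co2_head))
```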
49The length function
- length() returns the number of elements in a vector
- Very useful for flexible programming
- > length(co2)
- [1] 46
- > nyears <- length(co2)
- > years <- seq(from = 1959, length = nyears)
- > years
- [1] 1959 1960 1961 . . . . 2002 2003 2004
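Tying other vectors to length() means the code keeps working when the data change. A sketch with a short stand-in vector holding only the first four CO2 values; seq_along() is a related base function that builds 1, 2, ..., length(x) directly.

```r
co2_short <- c(316.00, 316.91, 317.63, 318.46)

nyears <- length(co2_short)
years  <- seq(from = 1959, length.out = nyears)
years                        # 1959 1960 1961 1962

# Index positions 1..n, which also behaves sensibly for an empty vector
seq_along(co2_short)         # 1 2 3 4

# The last year adapts automatically to the length of the data
years[length(years)]         # 1962
```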
50Data frames
- It's convenient to store data as a collection of variables
- > co2.df <- data.frame(years, co2)
- head() is a quick way to view the first few observations of a data frame
- > head(co2.df)
-   years    co2
- 1  1959 316.00
- 2  1960 316.91
- 3  1961 317.63
- 4  1962 318.46
- 5  1963 319.02
- 6  1964 319.52
The vectors must be of the same length
Column names
Row names
51Data frames
- Extract the names of the data frame
- > names(co2.df)
- [1] "years" "co2"
- Modify the names of the data frame
- > names(co2.df) <- c("Year", "co.2")
- > head(co2.df)
-   Year   co.2
- 1 1959 316.00
- 2 1960 316.91
- ...
- This can also be done within the data.frame() statement by naming each variable as it is supplied
- co2.df <- data.frame(Year = years, co.2 = co2)
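A compact sketch of naming the columns while building a data frame and then checking its shape, using short stand-ins for the full years and co2 vectors:

```r
years <- 1959:1962
co2   <- c(316.00, 316.91, 317.63, 318.46)

# Name the columns directly in the data.frame() call
co2.df <- data.frame(Year = years, co.2 = co2)

names(co2.df)   # "Year" "co.2"
dim(co2.df)     # 4 2 (rows, columns)
str(co2.df)     # column types at a glance
```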
52Data frames
- What's stored in your workspace?
- > Year
- Error: object 'Year' not found
- > ls()
- [1] "co2" "co2.df" "myName" . . . .
- attach() adds the variables of a data frame to the R search path. Be careful not to confuse them with previously named variables
- > attach(co2.df)
- > Year
- [1] 1959 1960 1961 . . .
- > co2.df$Year
- [1] 1959 1960 1961 . . .
$ extracts columns from a data frame
53Data frames
- If you're done working with a particular data frame and attached it, you should always detach it
- > detach(co2.df)
- > Year
- Error: object 'Year' not found
- The safest approach is to use the $ operator to extract variables from data frames
- > co2.df$Year
- [1] 1959 1960 1961 . . . .
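The difference between attaching and using $ can be checked with exists(). A minimal sketch on a stand-in data frame; it assumes no other object called Year is visible in the session.

```r
co2.df <- data.frame(Year = 1959:1961, co.2 = c(316.00, 316.91, 317.63))

exists("Year")      # FALSE: Year is only a column, not an object

attach(co2.df)
exists("Year")      # TRUE: the columns are now on the search path
mean(Year)
detach(co2.df)

exists("Year")      # FALSE again after detach()

# $ works without attaching anything
mean(co2.df$Year)   # 1960
```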
54Matrices
- Matrices represent another way to store collections of variables. Whereas data frames can store objects of multiple types (numeric, character, . . . . ), a matrix must be of a single type or R will coerce variables accordingly
- The matrix() statement
- matrix(data = NA,
- nrow = 1, ncol = 1,
- byrow = FALSE)
A vector needs to be given or R will generate NA (Not Available)
Only one of these arguments needs to be specified. If both are specified, then length(data) should equal nrow * ncol
R will fill the matrix by column unless byrow = TRUE is specified
> matrix(1:4, ncol = 2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4
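A sketch of the two rules above: byrow = TRUE fills along rows instead of down columns, and mixing types in one matrix coerces everything to a single type (here, character).

```r
m_col <- matrix(1:4, ncol = 2)                # filled by column
m_row <- matrix(1:4, ncol = 2, byrow = TRUE)  # filled by row
m_col
m_row

# Mixing numbers and strings: every element becomes character
m_mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(m_mixed)   # "character"
m_mixed[1, 1]     # "1", a string rather than a number
```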
55Matrices
- Create a matrix with the co2 data
- > co2.mat <- matrix(c(years, co2), ncol = 2, nrow = length(years))
- > head(co2.mat, n = 3)
-      [,1]   [,2]
- [1,] 1959 316.00
- [2,] 1960 316.91
- [3,] 1961 317.63
- A fast way to create matrices is by binding the columns of two (or more) vectors (or matrices)
- > co2.mat <- cbind(years, co2)
- > head(co2.mat, n = 3)
-      years    co2
- [1,]  1959 316.00
- [2,]  1960 316.91
- [3,]  1961 317.63
56Matrices
- rbind() forms a matrix by binding two (or more) vectors (or matrices) as rows
- > co2.row.mat <- rbind(years, co2)
- > t(co2.row.mat)
-      years    co2
- [1,]  1959 316.00
- [2,]  1960 316.91
- [3,]  1961 317.63
- ...
- Common functions to extract dimensions
- > dim(co2.mat)
- [1] 46 2
- > dim(co2.row.mat)
- [1] 2 46
- > nrow(co2.mat)
- [1] 46
- > ncol(co2.mat)
- [1] 2
t(): transpose a matrix
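A small sketch of how cbind(), rbind() and t() relate, using short stand-in vectors: transposing the rbind() result gives the same values and column names as cbind().

```r
years <- 1959:1961
co2   <- c(316.00, 316.91, 317.63)

by_col <- cbind(years, co2)   # 3 rows x 2 columns
by_row <- rbind(years, co2)   # 2 rows x 3 columns

dim(by_col)   # 3 2
dim(by_row)   # 2 3

# Transposing swaps rows and columns
all(t(by_row) == by_col)   # TRUE
```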
57apply function
- Very flexible function
- apply(X, MARGIN, FUN, ...)
- X: a matrix
- MARGIN: 1 = rows / 2 = columns
- FUN: an R function
- ...: additional arguments to that function
- > apply(co2.mat, 2, mean)
-     years       co2
- 1981.5000  342.4361
- What does this return?
- apply(co2.row.mat, 1, mean)
Apply the mean function to the columns of co2.mat
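The ... argument passes extra arguments through to FUN. A sketch on a small hypothetical matrix with one missing value, where na.rm = TRUE is handed on to mean() for each column:

```r
# 3 x 2 matrix: column 1 = 1, 2, NA; column 2 = 4, 5, 6
m <- matrix(c(1, 2, NA, 4, 5, 6), ncol = 2)

apply(m, 2, mean)                # NA 5: the NA propagates in column 1
apply(m, 2, mean, na.rm = TRUE)  # 1.5 5
apply(m, 1, sum)                 # row sums: 5 7 NA
```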
58Arrays
- N-dimensional matrices
- > co2.array <- array(data = c(years, co2), dim = c(nyears, 2))
- > head(co2.array, n = 3)
-      [,1]   [,2]
- [1,] 1959 316.00
- [2,] 1960 316.91
- [3,] 1961 317.63
- A matrix is a special case of an array, but a data frame is not
- > is.array(co2.array)
- [1] TRUE
- > is.array(co2.mat)
- [1] TRUE
- > is.array(co2.df)
- [1] FALSE
Create an array of dimension nyears x 2
59Arrays
- 3-dimensional array
- > array(1:24, dim = c(3, 4, 2))
- , , 1
-      [,1] [,2] [,3] [,4]
- [1,]    1    4    7   10
- [2,]    2    5    8   11
- [3,]    3    6    9   12
- , , 2
-      [,1] [,2] [,3] [,4]
- [1,]   13   16   19   22
- [2,]   14   17   20   23
- [3,]   15   18   21   24
R fills an array column by column (the first index varies fastest), then moves through the higher dimensions
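Elements of an array are indexed with one position per dimension. In the 3 x 4 x 2 array above, the element in row 2, column 3 of the second layer is 20, and leaving an index blank selects a whole slice:

```r
a <- array(1:24, dim = c(3, 4, 2))

dim(a)        # 3 4 2
a[2, 3, 2]    # 20
a[, , 1]      # the first 3 x 4 layer, returned as a matrix
a[1, , ]      # row 1 from both layers: a 4 x 2 matrix
```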
60Lists
- Most flexible structure
- Each element can be of a different mode and varying length
- > description <- "Atmospheric CO2 concentrations (ppmv) derived from in situ air samples collected at Mauna Loa Observatory, Hawaii. Source: C.D. Keeling"
- > co2.list <- list(Meta = description, Years = nyears, data = co2.mat)
61Lists
- > co2.list
- $Meta
- [1] "Atmospheric CO2 concentrations (ppmv) derived from in situ air samples collected at Mauna Loa Observatory, Hawaii. Source: C.D. Keeling"
- $Years
- [1] 46
- $data
-      years    co2
- [1,]  1959 316.00
- [2,]  1960 316.91
- [3,]  1961 317.63
- . . .
- > co2.list$Years
- [1] 46
The $ operator works for lists
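Elements of a list can be selected by name with $ or [[ ]], and a new element can be added by assigning to a new name. A sketch with a shortened stand-in for co2.list; the element name Notes is hypothetical.

```r
co2.list <- list(Meta  = "Atmospheric CO2 concentrations (ppmv)",
                 Years = 3,
                 data  = cbind(years = 1959:1961,
                               co2   = c(316.00, 316.91, 317.63)))

names(co2.list)          # "Meta" "Years" "data"
co2.list$Years           # 3
co2.list[["Years"]]      # the same element, selected by name with [[ ]]

# Add a fourth element by assigning to a new name
co2.list$Notes <- "Source: C.D. Keeling"
length(co2.list)         # 4
```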
62In-class Exercise 3
- Look up help on the summary command and try it out on the co2 object. Now replicate this output without using the summary command and include it in co2.list as the 4th element in the list.
- You may or may not need some extra functions/operations that we have not discussed. Use the help files if necessary.