Title: A Tutorial on R Programming
1. A Tutorial on R Programming
- Courtesy of Ping Ma's class notes
2. Introduction
- "GNU S": a free implementation of the S language (S-Plus is the commercial one).
- A flexible programming language for statistical computing.
- A multitude of packages exists for computational biology analyses.
  - BioConductor Project.
- Some programming gems:
  - Graphics.
  - Extensibility: ports to Perl, Python, Java, HTML, etc.
  - Support: an active user community, especially in computational biology.
  - Open source in design and nature.
- http://www.r-project.org
- http://cran.r-project.org
3. R Projects
- The BioConductor Project
  - www.bioconductor.org
  - A suite of statistical and graphical methods for analyzing genomic data.
  - For example, software is available for DNA microarray analysis: normalization, CGH data, GO analysis, tiling arrays, plus some proteomics analyses.
- CRAN: Comprehensive R Archive Network
  - All areas of mathematical and statistical software applications.
  - Finance modeling, time series, spatial modeling, high-performance parallel computing, ...
4. Outline
- Data Structures
- Functionality
- Input/Output
- Workspace Management
5. Getting Started
- Installation is (usually) a snap: download the file, unzip it, and run the wizard.
- Start up via the icon or inside a shell:
  > R
6. R Basics
> x <- 1 + 5
> x
[1] 6
> y <- c(1,2,3,4)
> y
[1] 1 2 3 4
> z <- 1:4
> z
[1] 1 2 3 4
> z[1]
[1] 1
- Note: everything in R is case sensitive.
- Assignments can also be made using =.
- Variable names may be delimited by a period (.).
  > a.meaningful.name <- 6
- Indices always begin with 1.
- Comments begin with #.
7. Mathematical Operators
R as a calculator:
> 2 + 3
[1] 5
> 3*4/6 + 2*(1 + 9)
[1] 22
> A %*% B   # matrix multiplication
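The difference between element-wise and matrix multiplication is easy to miss, so here is a minimal sketch. The matrices A and B below are small hypothetical examples, not objects from the slides.

```r
# Hypothetical 2x2 matrices, chosen only for illustration.
A <- matrix(1:4, nrow = 2)   # filled column-wise: columns are (1, 2) and (3, 4)
B <- diag(2)                 # 2x2 identity matrix

elementwise <- A * B         # multiplies matching entries only
matprod     <- A %*% B       # true matrix product; A times the identity is A

print(elementwise)
print(matprod)
```

With an identity matrix on the right, the matrix product returns A unchanged, while the element-wise product keeps only the diagonal of A.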
8. Built-In R Functions
R comes with a suite of built-in mathematical and statistical functions.
> sqrt(54)
[1] 7.348469
> mean(1:5)
[1] 3
> lm(y ~ x)   # simple linear regression
For more specialized functions, look at CRAN or BioConductor.
9. Matrices
Matrices are two-dimensional vectors.
> A <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> row.names(A) <- c("a", "b", "c")
> colnames(A) <- c("f", "g", "h")
> A
  f g h
a 1 2 3
b 4 5 6
c 7 8 9
10. Extracting and Extending Matrices
Extract information from the matrix using indices.
> A[,1]
a b c
1 4 7
> A[1,]
f g h
1 2 3
Extend the matrix by adding rows or columns.
> B <- cbind(A, c(-10,-20,-30))
> B
  f g h
a 1 2 3 -10
b 4 5 6 -20
c 7 8 9 -30
> C <- rbind(A, c(-10,-20,-30))
> C
    f   g   h
a   1   2   3
b   4   5   6
c   7   8   9
  -10 -20 -30
A matrix can consist of only one data type, e.g. numeric or character.
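The one-data-type rule has a practical consequence: binding a character column onto a numeric matrix silently converts every entry to a string. A small sketch, where the column of letters is a hypothetical addition:

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# Binding a character column forces every entry to become a character string.
D <- cbind(A, c("x", "y", "z"))

print(class(A[1, 1]))   # type of an entry before binding
print(class(D[1, 1]))   # type of an entry after binding
print(D)
```

If mixed types are needed in one table, a data frame or a list is the usual alternative.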
11. Interrogating a Matrix Object
Useful functions are:
> dim(A)
[1] 3 3
> ncol(A)
[1] 3
> nrow(A)
[1] 3
> length(A)
[1] 9
Similarly for a vector object:
> length(x)
12. Operating on Matrices
A really useful function for matrices is the apply function. It lets us apply a specific function row-wise or column-wise.
> apply(A, 1, mean)
[1] 2 5 8
# the 1 means row-wise; use 2 for column-wise.
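A short sketch of apply in both directions, using the same 3x3 matrix as the earlier slides. The anonymous range function is an illustrative assumption; any function that accepts a vector would work in its place.

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

row.means <- apply(A, 1, mean)   # margin 1: one value per row
col.means <- apply(A, 2, mean)   # margin 2: one value per column

# Any function of a vector can be supplied, including one written inline.
row.range <- apply(A, 1, function(r) max(r) - min(r))

print(row.means)
print(col.means)
print(row.range)
```

For plain means and sums, the built-in rowMeans, colMeans, rowSums and colSums give the same results and are usually faster.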
13. Data Frame
A data frame is a collection of column vectors.
              Gpdh  Sod  Xdh AvRate Myr
Drosophila    1.50 25.7 30.4   22.4  55
Fungi        40.0  24.9 13.7   21.4 300
Animal Phyla 13.2  19.2 19.2   17.5 600
A useful way to store table-like information.
> molclock <- data.frame(Gpdh=c(1.50, 40, 13.2),
+     Sod=c(25.7, 24.9, 19.2), Xdh=c(30.4, 13.7, 19.2),
+     AvRate=c(22.4, 21.4, 17.5), Myr=c(55, 300, 600),
+     row.names=c("Drosophila", "Fungi", "Animal Phyla"))
14. Working with Data Frame
Extracting data from a data frame object: by column, we can use indices or names.
> molclock[,1]
[1]  1.5 40.0 13.2
> molclock[,"Gpdh"]
[1]  1.5 40.0 13.2
For rows, we use row indices (row names also work).
> molclock[2,]
      Gpdh  Sod  Xdh AvRate Myr
Fungi   40 24.9 13.7   21.4 300
> class(molclock[,1])
[1] "numeric"
> class(molclock[2,])
[1] "data.frame"
Recall: a data.frame object is a collection of column vectors.
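Beyond numeric indices, data frames are often subset with the $ operator, by row name, or by a logical condition. The sketch below rebuilds the molclock table from the slides; the threshold of 100 Myr is an arbitrary value picked for illustration.

```r
molclock <- data.frame(Gpdh = c(1.50, 40, 13.2),
                       Sod = c(25.7, 24.9, 19.2),
                       Xdh = c(30.4, 13.7, 19.2),
                       AvRate = c(22.4, 21.4, 17.5),
                       Myr = c(55, 300, 600),
                       row.names = c("Drosophila", "Fungi", "Animal Phyla"))

gpdh  <- molclock$Gpdh                   # column by name: a numeric vector
fungi <- molclock["Fungi", ]             # row by row name: still a data frame
old   <- molclock[molclock$Myr > 100, ]  # rows picked by a logical condition

print(gpdh)
print(fungi)
print(old)
```

A single column comes back as a plain vector, while a row stays a data frame because its columns may hold different types.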
15. List Structures
- Up until now, all our data structure objects have needed a uniform data type.
- List structures are powerful because we can store multiple data types in the same object.
  > miscObjs <- list("actin"=c(1.3, 99.6, 2.45),
  +     "gapdh"=matrix(rnorm(100), nrow=10),
  +     "atp"=molclock)
- We extract data from a list using names or indices.
  > names(miscObjs)
  [1] "actin" "gapdh" "atp"
  > miscObjs$actin
  [1]  1.30 99.60  2.45
  > miscObjs[[1]]
  [1]  1.30 99.60  2.45
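Single and double brackets behave differently on lists, which is a common source of confusion. A minimal sketch, using a cut-down version of miscObjs (the zero matrix is a placeholder, not the data from the slides):

```r
# Cut-down list for illustration; the matrix contents are placeholders.
miscObjs <- list(actin = c(1.3, 99.6, 2.45),
                 gapdh = matrix(0, nrow = 2, ncol = 2))

one.element <- miscObjs[["actin"]]  # the stored vector itself
sub.list    <- miscObjs["actin"]    # a list of length 1 that holds the vector

print(class(one.element))
print(class(sub.list))
```

Use double brackets (or $) when the element itself is wanted, and single brackets when a smaller list is wanted.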
16. Visualizing Data: Plot Function
A simple scatter plot:
> x.dat <- rnorm(100)   # 100 N(0,1) rvs
> plot(x.dat, xlab="Index", ylab="Normal RVS", main="Figure 1: Scatter Plot")
17. Exporting Graphics
- In Windows:
  - Right mouse click to copy to clipboard.
- For most operating systems:
  > bitmap("file.bmp")
  > plot(x.dat)   # <- insert code for making plot here
  > dev.off()
- You can export graphics to many file formats: bitmap, jpeg, gif, postscript, etc.
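The open-device, plot, close-device pattern is the same for every file format. The sketch below uses the pdf device, which ships with base R, instead of bitmap; the temporary file name is an assumption made so the example can run anywhere.

```r
x.dat <- rnorm(100)

# Hypothetical output location; replace with any path you like.
out.file <- tempfile(fileext = ".pdf")

pdf(out.file, width = 6, height = 6)   # 1. open the file device
plot(x.dat, xlab = "Index", ylab = "Normal RVS")   # 2. draw into it
invisible(dev.off())                   # 3. close the device to finish the file

print(file.exists(out.file))
```

Nothing is written to disk reliably until dev.off() is called, so forgetting it is the usual reason for an empty or corrupt output file.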
18. Classes
- A class describes the way an object in R is stored.
  - Strings: "Homo sapiens"
  - Numeric: 3.141593
  - Boolean: TRUE, FALSE
- We can interrogate an object to find out its class.
  > a <- FALSE
  > class(a)
  [1] "logical"
  > is.numeric(a)
  [1] FALSE
- Classes also reflect their data structure, e.g. matrix, data.frame, function.
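Alongside class() and the is.* tests, the as.* functions convert between classes. A brief sketch with made-up values:

```r
a <- "3.14"            # a number stored as a string
b <- as.numeric(a)     # convert the string to a number

print(class(a))
print(class(b))
print(is.numeric(b))

flag <- TRUE
print(as.numeric(flag))   # logical values convert to 1 (TRUE) or 0 (FALSE)
```

Conversions that cannot succeed, such as as.numeric("abc"), return NA with a warning rather than stopping.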
19. Working with Strings
- While Perl or Python are more competent languages for text parsing, R does have capabilities for manipulating and creating strings.
- Pasting strings together:
  > paste("Cat", "Dog", sep="")
  [1] "CatDog"
- Splitting strings:
  > strsplit("Seuss", "")
  [[1]]
  [1] "S" "e" "u" "s" "s"
- Searching for patterns:
  > grep("and", "Brown eggs and ham")
  [1] 1
  # grep also lets you search with regexp patterns
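The sep and collapse arguments of paste are easy to confuse: sep joins separate arguments, while collapse joins the elements of one vector. A minimal sketch, with grepl added as the TRUE/FALSE counterpart of grep:

```r
words <- c("Cat", "Dog")

joined.args   <- paste("Cat", "Dog", sep = "")   # separate arguments are joined
joined.vector <- paste(words, collapse = "")     # one vector is collapsed
unchanged     <- paste(words, sep = "")          # sep alone leaves elements apart

print(joined.args)
print(joined.vector)
print(unchanged)

# grepl returns TRUE/FALSE per element instead of matching positions.
has.and       <- grepl("and", "Brown eggs and ham")
starts.with.b <- grepl("^B", c("Brown", "eggs"))   # "^" anchors the regexp at the start
print(has.and)
print(starts.with.b)
```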
20. Boolean Algebra
- In R, to test for equality use "==".
  > 1 == 3
  [1] FALSE
  > 1 != 3
  [1] TRUE   # inequality
- Another powerful tip: we can test for inclusion in a vector by asking with "%in%".
  > x <- 1:10; even.numbers <- seq(from=2, to=10, by=2)
  > x
  [1]  1  2  3  4  5  6  7  8  9 10
  > even.numbers
  [1]  2  4  6  8 10
  > x %in% even.numbers
  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
- We can subset vectors with TRUE/FALSE flags.
  > x[x %in% even.numbers]
  [1]  2  4  6  8 10
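Logical flags can be negated, counted, and turned into positions. A short sketch that builds on the same x and even.numbers vectors:

```r
x <- 1:10
even.numbers <- seq(from = 2, to = 10, by = 2)

flags <- x %in% even.numbers   # one TRUE/FALSE per element of x

odd.numbers <- x[!flags]       # "!" negates the flags
positions   <- which(flags)    # indices where the flag is TRUE
n.even      <- sum(flags)      # TRUE counts as 1, FALSE as 0

print(odd.numbers)
print(positions)
print(n.even)
```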
21. Missing Values
- NA is the all-inclusive symbol for a missing value in R.
  > mean(c(1, 4, NA))
  [1] NA
  > mean(c(1, 4, NA), na.rm=TRUE)
  [1] 2.5
- We can test whether an object is a missing value.
  > NA == NA
  [1] NA   # this doesn't work!
  > is.na(NA)
  [1] TRUE
  > na.omit(c(1, 4, NA))
  [1] 1 4
- Other objects: NaN, Inf.
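A small sketch of how NA, NaN and Inf arise and how to detect each one. The vector v is a made-up example.

```r
v <- c(1, 4, NA, 7)

n.missing <- sum(is.na(v))         # count the missing entries
complete  <- v[!is.na(v)]          # keep only the observed entries
v.mean    <- mean(v, na.rm = TRUE) # or let the function skip them

print(n.missing)
print(complete)
print(v.mean)

not.a.number <- 0 / 0   # NaN: the result is undefined
infinite     <- 1 / 0   # Inf: the result overflows
print(is.nan(not.a.number))
print(is.infinite(infinite))
print(is.na(not.a.number))   # is.na() is also TRUE for NaN
print(is.nan(NA))            # but is.nan() is FALSE for NA
```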
22. For Loops
- For loops are very simple in R.
  > for( m in 1:3 ) {
  +     print(m)
  + }
  [1] 1
  [1] 2
  [1] 3
  > for( m in c("actin", "myosin", "gapdh") ) {
  +     print(m)
  + }
  [1] "actin"
  [1] "myosin"
  [1] "gapdh"
- Note: R does not process for loops very quickly; try to avoid them for large data if you can (e.g. use apply).
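To make the advice about avoiding loops concrete, here is the same computation written three ways. This is only a sketch; the task (squaring the numbers 1 to 1000) is an arbitrary example.

```r
n <- 1000

# 1. Explicit loop, with the result vector allocated up front.
squares.loop <- numeric(n)
for (i in 1:n) {
  squares.loop[i] <- i^2
}

# 2. Vectorized arithmetic: the operator acts on the whole vector at once.
squares.vec <- (1:n)^2

# 3. sapply: applies a function to each element and collects the results.
squares.sapply <- sapply(1:n, function(i) i^2)

print(head(squares.vec))
```

When a vectorized form exists it is usually both the shortest and the fastest option.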
23. Conditional Statements
- We can use conditional statements to automate tasks and functions.
- If..Else block:
  - If( condition 1 holds ) then do task 1. Else, do task 2.
  > if( x > 0 ) print("positive") else print("negative")
- While block:
  - While( condition 1 holds ) do task 1. If condition 1 no longer holds, stop.
  > while( x > 0 ) x <- x + rnorm(1)
- You can put the break command inside an if( ) to break out of the loop.
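A sketch combining while, if and break. The starting value, the fixed seed and the cap of 100 steps are assumptions added so that the example is repeatable and always terminates.

```r
set.seed(1)   # fixed seed so repeated runs give the same walk

x <- 1        # hypothetical starting value
steps <- 0

while (x > 0) {
  x <- x + rnorm(1)         # take one random step
  steps <- steps + 1
  if (steps >= 100) break   # safety stop: leave the loop after 100 steps
}

# With braces, the else can safely sit on its own line.
if (x > 0) {
  label <- "positive"
} else {
  label <- "non-positive"
}

print(steps)
print(label)
```

At the interactive prompt, an else on a new line only works inside braces; otherwise R treats the if statement as finished before it sees the else.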
24. Writing Your Own Functions
- Imagine you need to write a simple function that returns both the mean and the standard deviation of a vector in a list structure.
  > mean.and.sd <- function(x) {
  +     res.mean <- mean(x); res.sd <- sd(x)
  +     res <- list(mean=res.mean, sd=res.sd)
  +     return(res)
  + }
  > mean.and.sd(rpois(10,5))
  $mean
  [1] 4.4
  $sd
  [1] 0.9660918
- You can use the args function to find out what arguments a function needs.
  > args(mean.and.sd)
  function (x)
  NULL
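Functions can also take arguments with default values. The sketch below is a variation on mean.and.sd; the na.rm argument is a hypothetical extension, not part of the function in the slides.

```r
# Variation on the slide's function; na.rm is an added, optional argument.
mean.and.sd <- function(x, na.rm = FALSE) {
  list(mean = mean(x, na.rm = na.rm),
       sd   = sd(x, na.rm = na.rm))
}

values <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)   # made-up data with one missing value

res.default <- mean.and.sd(values)                # default: the NA propagates
res.clean   <- mean.and.sd(values, na.rm = TRUE)  # override the default

print(res.default$mean)
print(res.clean$mean)
print(res.clean$sd)
```

The last expression evaluated in a function body is returned automatically, so an explicit return() is optional.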
25. Inputting Data into R
- R has capabilities for reading in data files of many different formats.
- For simple ASCII text files we can use the read.table function.
- The arguments specified by read.table are:
  > args(read.table)
  function (file, header = FALSE, sep = "",
      quote = "\"'", dec = ".", row.names, col.names,
      as.is = FALSE, na.strings = "NA", colClasses = NA,
      nrows = -1, skip = 0, check.names = TRUE,
      fill = !blank.lines.skip, strip.white = FALSE,
      blank.lines.skip = TRUE, comment.char = "#",
      allowEscapes = FALSE)
- Other read-in functions: read.csv, scan, readLines.
26. Outputting Data from R
- To output data to a simple table text file, we can use write.table.
  > args(write.table)
  function (x, file = "", append = FALSE, quote = TRUE,
      sep = " ", eol = "\n", na = "NA", dec = ".",
      row.names = TRUE, col.names = TRUE,
      qmethod = c("escape", "double"))
- Other write functions: write, cat.
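A round trip through write.table and read.table shows how the main arguments fit together. This is a minimal sketch: the scores table and the temporary file are made up for the example.

```r
# Hypothetical table to write out and read back.
scores <- data.frame(gene = c("actin", "gapdh"),
                     level = c(1.3, 2.45),
                     stringsAsFactors = FALSE)

out.file <- tempfile(fileext = ".txt")   # temporary file, removed when R exits

# Tab-separated, no quotes around strings, no row-name column.
write.table(scores, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# header = TRUE tells read.table that the first line holds the column names.
scores.back <- read.table(out.file, header = TRUE, sep = "\t",
                          stringsAsFactors = FALSE)

print(scores.back)
```

Keeping sep the same on both sides is the important part; a mismatch typically yields a single wide column or a column-count error.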
27. Porting to Other Languages
- A port is a piece of software that provides a means to get one programming language to communicate with another.
- The Omega Project for Statistical Computing:
  - An umbrella project to link different programming languages seamlessly.
  - Some packages available: RSPython, RSPerl, RMatlab (plus a variety of others).
- Example: RSPython
  - To call Python from R: load RSPython, then call py commands using .Python(func, args1, args2, ...)
  - To call R from Python: load the RS module, then RS.call("plot", x, y).
28. Workspace Management
- Where am I?
  > getwd()           # returns the working directory
  > setwd("C:/Jess")  # sets the working directory
  > dir()             # lists files in working directory
  > list.files()
- How can I tell what objects I have?
  > ls()
- To remove individual objects use rm():
  > rm("name.of.object")
- To save specific objects use save():
  > save(x, file="fileName.RData")
- At a later date, you can load this into your workspace:
  > load("fileName.RData")
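The save, remove and load cycle can be tried safely in a temporary directory. A minimal sketch, assuming a small vector x as the object to keep:

```r
x <- c(1, 2, 3)

# Hypothetical location: a file inside R's per-session temporary directory.
rdata.file <- file.path(tempdir(), "fileName.RData")

save(x, file = rdata.file)   # write x to disk
rm(x)                        # remove it from the workspace
gone <- !exists("x", inherits = FALSE)

loaded <- load(rdata.file)   # restores x; returns the names of restored objects

print(gone)
print(loaded)
print(x)
```

Note that load() restores objects under their original names and will overwrite any object of the same name already in the workspace.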
29. Libraries
- Libraries are collections of R functions that together perform a specialized analysis or task.
- For example: the genetics package.
  - CRAN description: Classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, ...
- Consult CRAN for more: http://cran.us.r-project.org/
30. Helpful Functions
- To boot up HTML help files:
  > help.start()
- To pop up a help file on an individual function:
  > help(function.name)   # e.g. help(plot)
- To search for help on a topic or function:
  > help.search("plot")
- To search for objects whose names contain a string:
  > apropos("string")
31. More Info: Resources
- For R tutorials and simple documents to learn more about R, consult the R website for lots of resources: www.r-project.org/ (go to Documentation > Other > Contributed Documentation).
- Really great HTML tutorial: "Kickstarting R" by Jim Lemon
  - http://cran.r-project.org/doc/contrib/Lemon-kickstart/index.html
- "R for Beginners" by Emmanuel Paradis (short pdf)
- There are also reference cards that contain the most important R functions (and their descriptions) you need to know (like a cheat sheet).
  - "R Reference Card" by Jonathan Baron (1-page list)