A Tutorial on R Programming - PowerPoint PPT Presentation

Provided by: Jes347

1
A Tutorial on R Programming
  • Courtesy of Ping Ma's class notes

2
Introduction
  • GNU S: a free implementation of the S language,
    the language behind the commercial S-PLUS.
  • A flexible programming language for statistical
    computing.
  • A multitude of packages exists for computational
    biology analyses.
  • BioConductor Project.
  • Some programming gems:
  • Graphics
  • Extensibility: ports to Perl, Python, Java,
    HTML, etc.
  • Support: an active user community, especially in
    computational biology.
  • Open source in design and nature.
  • http://www.r-project.org
  • http://cran.r-project.org

3
R Projects
  • The BioConductor Project
  • www.bioconductor.org
  • A suite of statistical and graphical methods for
    analyzing genomic data.
  • For example, software is available for DNA
    microarray analysis (normalization), CGH data, GO
    analysis, tiling arrays, plus some proteomics
    analyses.
  • CRAN: Comprehensive R Archive Network
  • All areas of mathematical and statistical
    software applications.
  • Finance modeling, time series, spatial modeling,
    high-performance parallel computing, and more.

4
Outline
  • Data Structures
  • Functionality
  • Input/Output
  • Workspace Management

5
Getting Started
Installation is (usually) a snap: download the file,
unzip it and run the wizard. Start R via its icon or
from inside a shell:
> R
6
R Basics
> x <- 1 + 5
> x
[1] 6
> y <- c(1,2,3,4)
> y
[1] 1 2 3 4
> z <- 1:4
> z
[1] 1 2 3 4
> z[1]
[1] 1
  • Note: everything in R is case sensitive.
  • Assignments can also be made using =.
  • Variable names may be delimited by a period (.):
  • > a.meaningful.name <- 6
  • Indices always begin with 1.
  • Comments start with #.
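A minimal sketch of these basics, assuming a fresh R session; the extra variables (Y, first.z, last.z, empty) are illustrative names, not part of the original slide:

```r
# Assignment: "<-" is the conventional operator; "=" also works at top level
x <- 1 + 5
y = c(1, 2, 3, 4)
z <- 1:4                 # the colon operator builds the integer sequence 1, 2, 3, 4

# Names are case sensitive: Y is a separate (illustrative) variable from y
Y <- 100

# Indexing starts at 1; index 0 selects nothing
first.z <- z[1]          # 1
last.z  <- z[length(z)]  # 4
empty   <- z[0]          # integer(0), an empty vector

a.meaningful.name <- 6   # periods are legal inside variable names
```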

7
Mathematical Operators
R as a calculator:
> 2 + 3
[1] 5
> 3*4/6 + 2*(1 + 9)
[1] 22
> A %*% B   # matrix multiplication
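The slide's A %*% B assumes two conformable matrices named A and B, which it does not define. A small sketch with a hypothetical 2 x 2 matrix contrasts %*% with element-wise *:

```r
# Standard precedence: * and / bind tighter than +, parentheses first
calc <- 3 * 4 / 6 + 2 * (1 + 9)   # 2 + 20 = 22

A <- matrix(1:4, nrow = 2)        # hypothetical matrix, filled column-wise: rows (1, 3) and (2, 4)
elementwise <- A * A              # "*" multiplies entry by entry
product <- A %*% A                # "%*%" is the true matrix product
```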
8
Built-In R Functions
R comes with a suite of built-in mathematical and
statistical functions.
> sqrt(54)
[1] 7.348469
> mean(1:5)
[1] 3
> lm(y ~ x)   # simple linear regression
For more specialized functions, look at CRAN or BioConductor.
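The slide's lm(y ~ x) assumes vectors x and y already exist. A hedged sketch with simulated data (the true intercept 2 and slope 3 are arbitrary choices) shows the formula interface end to end:

```r
set.seed(1)                               # make the simulated data reproducible
x <- 1:20
y <- 2 + 3 * x + rnorm(20, sd = 0.5)      # hypothetical data: intercept 2, slope 3, plus noise

fit <- lm(y ~ x)                          # "y ~ x" reads "y modeled as a linear function of x"
coef(fit)                                 # estimated intercept and slope
summary(fit)$r.squared                    # proportion of variance explained
```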
9
Matrices
Matrices are 2-dimensional vectors.
> A <- matrix(1:9, nrow=3, ncol=3, byrow=TRUE)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> rownames(A) <- c("a", "b", "c")
> colnames(A) <- c("f", "g", "h")
> A
  f g h
a 1 2 3
b 4 5 6
c 7 8 9
10
Extracting and Extending Matrices
Extract information from the matrix using indices.
> A[,1]
a b c
1 4 7
> A[1,]
f g h
1 2 3
Extend the matrix by adding rows or columns.
> B <- cbind(A, c(-10,-20,-30))
> B
  f g h
a 1 2 3 -10
b 4 5 6 -20
c 7 8 9 -30
> C <- rbind(A, c(-10,-20,-30))
> C
    f   g   h
a   1   2   3
b   4   5   6
c   7   8   9
  -10 -20 -30
A matrix can only consist of one data type,
e.g. numeric or character.
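A sketch of name-based indexing and of the single-data-type rule, rebuilding A as on the previous slide; the label column is a made-up example:

```r
A <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
rownames(A) <- c("a", "b", "c")
colnames(A) <- c("f", "g", "h")

A["b", "g"]                   # 5: rows and columns can be selected by name
A[, "h", drop = FALSE]        # drop = FALSE keeps a 3 x 1 matrix instead of a vector

# A matrix holds one data type, so adding a (hypothetical) character column
# coerces every entry to character
mixed <- cbind(A, label = c("x", "y", "z"))
typeof(mixed)                 # "character"
```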
11
Interrogating a Matrix Object
Useful functions are:
> dim(A)
[1] 3 3
> ncol(A)
[1] 3
> nrow(A)
[1] 3
> length(A)
[1] 9
Similarly for a vector object:
> length(x)
12
Operating on Matrices
A really useful function for matrices is the
apply function. It applies a given function
row-wise or column-wise.
> apply(A, 1, mean)
a b c
2 5 8
The 1 means row-wise; use 2 for column-wise.
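A sketch of apply over both margins, assuming a matrix like A but without row and column names, so results print as plain vectors. rowMeans and colMeans are base R shortcuts for the common case:

```r
A <- matrix(1:9, nrow = 3, byrow = TRUE)

row.means <- apply(A, 1, mean)    # MARGIN = 1: one result per row    -> 2 5 8
col.means <- apply(A, 2, mean)    # MARGIN = 2: one result per column -> 4 5 6

# Any function works, including an anonymous one
col.ranges <- apply(A, 2, function(v) max(v) - min(v))   # 6 6 6

# For means, the built-in rowMeans gives the same answer
all.equal(row.means, rowMeans(A))  # TRUE
```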
13
Data Frame
A data frame is a collection of column vectors.

               Gpdh    Sod    Xdh AvRate    Myr
Drosophila     1.50   25.7   30.4   22.4     55
Fungi          40.0   24.9   13.7   21.4    300
Animal Phyla   13.2   19.2   19.2   17.5    600

A useful way to store table-like information.
> molclock <- data.frame(Gpdh=c(1.50, 40, 13.2),
+   Sod=c(25.7, 24.9, 19.2), Xdh=c(30.4, 13.7, 19.2),
+   AvRate=c(22.4, 21.4, 17.5), Myr=c(55, 300, 600),
+   row.names=c("Drosophila", "Fungi", "Animal Phyla"))
14
Working with Data Frame
To extract data from a data frame object by
column, we can use indices or names.
> molclock[,1]
[1]  1.5 40.0 13.2
> molclock[,"Gpdh"]
[1]  1.5 40.0 13.2
For rows, use row indices (row names also work).
> molclock[2,]
      Gpdh  Sod  Xdh AvRate Myr
Fungi   40 24.9 13.7   21.4 300
> class(molclock[,1])
[1] "numeric"
> class(molclock[2,])
[1] "data.frame"
Recall that a data.frame object is a collection of
column vectors.
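A sketch that rebuilds molclock and shows two more access styles, $ for columns and a logical condition for rows; the Myr > 100 threshold and the name old.lineages are arbitrary examples:

```r
molclock <- data.frame(
  Gpdh   = c(1.50, 40, 13.2),
  Sod    = c(25.7, 24.9, 19.2),
  Xdh    = c(30.4, 13.7, 19.2),
  AvRate = c(22.4, 21.4, 17.5),
  Myr    = c(55, 300, 600),
  row.names = c("Drosophila", "Fungi", "Animal Phyla")
)

molclock$Gpdh                   # a column as a vector, via "$"
molclock["Fungi", ]             # a row selected by its row name

# Rows meeting a condition, restricted to two columns
old.lineages <- molclock[molclock$Myr > 100, c("Gpdh", "Myr")]
old.lineages
```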
15
List Structures
  • Up until now, all our data structure objects have
    needed a uniform data type.
  • List structures are powerful because we can store
    multiple data types in the same object.
  • > miscObjs <- list("actin"=c(1.3, 99.6, 2.45),
    "gapdh"=matrix(rnorm(100), nrow=10),
    "atp"=molclock)
  • We extract data from a list using names or
    indices.
  • > names(miscObjs)
  • [1] "actin" "gapdh" "atp"
  • > miscObjs$actin
  • [1]  1.30 99.60  2.45
  • > miscObjs[[1]]
  • [1]  1.30 99.60  2.45
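A sketch of the difference between [[ ]] and [ ] on a list. It assumes a simplified miscObjs whose third element is a hypothetical note string rather than molclock, so the block runs on its own:

```r
set.seed(42)
miscObjs <- list(actin = c(1.3, 99.6, 2.45),
                 gapdh = matrix(rnorm(100), nrow = 10),
                 note  = "any type can go in a list")   # hypothetical element

single  <- miscObjs[["actin"]]   # [[ ]] returns the element itself: a numeric vector
sublist <- miscObjs["actin"]     # [ ] returns a smaller list containing that element

miscObjs$count <- 3L             # add a new element by assigning to a new name
lens <- sapply(miscObjs, length) # apply a function to every element
lens                             # actin 3, gapdh 100, note 1, count 1
```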

16
Visualizing Data: Plot Function
A simple scatter plot:
> x.dat <- rnorm(100)   # 100 N(0,1) random variates
> plot(x.dat, xlab="Index", ylab="Normal RVS",
+   main="Figure 1: Scatter Plot")
17
Exporting Graphics
  • In Windows:
  • right-click the plot to copy it to the clipboard.
  • For most operating systems:
  • > bmp("file.bmp")
  • > plot(x.dat)   # <- insert code for making plot
    here
  • > dev.off()
  • You can export graphics to many file
    formats: BMP, JPEG, PNG, PostScript, PDF, etc.
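A self-contained sketch of the open/plot/close pattern. It uses pdf(), which works without extra system graphics libraries, and writes to R's temporary directory (the file name scatter.pdf is hypothetical); png(), jpeg() and bmp() follow the same pattern but may need platform graphics support:

```r
x.dat <- rnorm(100)

# Open a file device, draw, then close it with dev.off() so the file is written
pdf.file <- file.path(tempdir(), "scatter.pdf")   # hypothetical output file
pdf(pdf.file, width = 6, height = 4)              # size in inches
plot(x.dat, xlab = "Index", ylab = "Normal RVS",
     main = "Figure 1: Scatter Plot")
invisible(dev.off())

file.exists(pdf.file)
```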

18
Classes
  • A class describes the way an object in R is
    stored.
  • Strings (class "character"): "Homo sapiens"
  • Numeric: 3.141593
  • Boolean (class "logical"): TRUE, FALSE
  • We can interrogate an object to find out its
    class:
  • > a <- FALSE
  • > class(a)
  • [1] "logical"
  • > is.numeric(a)
  • [1] FALSE
  • Classes also reflect an object's data structure, e.g.
    matrix, data.frame, function.
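A sketch that inspects the class of several kinds of object and converts between classes with as.* functions; the list of sample objects is illustrative:

```r
objs <- list("Homo sapiens", 3.141593, FALSE, 2L,
             matrix(1:4, nrow = 2), data.frame(a = 1), mean)

# class() of a matrix is c("matrix", "array") in R >= 4.0, so keep only the first entry
sapply(objs, function(o) class(o)[1])

# as.* functions convert between classes; failed conversions give NA with a warning
as.numeric("3.5")
as.numeric(TRUE)
suppressWarnings(as.numeric("abc"))
```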

19
Working with Strings
  • While Perl or Python are better suited to text
    parsing, R does have capabilities for
    manipulating and creating strings.
  • Pasting strings together:
  • > paste("Cat", "Dog", sep="")
  • [1] "CatDog"
  • Splitting strings:
  • > strsplit("Seuss", "")
  • [[1]]
  • [1] "S" "e" "u" "s" "s"
  • Searching for patterns:
  • > grep("and", "Brown eggs and ham")
  • [1] 1   # grep also lets you search with regexp
    patterns
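A few more string functions, sketched on a hypothetical vector of gene names. Note the difference between sep (joins separate arguments) and collapse (joins the elements of one vector):

```r
genes <- c("actin", "myosin", "gapdh")   # hypothetical example vector

paste("Cat", "Dog", sep = "")        # "CatDog": sep joins separate arguments
paste(genes, collapse = ", ")        # "actin, myosin, gapdh": collapse joins one vector
paste0("gene_", 1:3)                 # "gene_1" "gene_2" "gene_3"
strsplit("Seuss", "")[[1]]           # strsplit returns a list; [[1]] takes the first entry
nchar(genes)                         # 5 6 5
toupper(genes)                       # "ACTIN" "MYOSIN" "GAPDH"
grepl("^my", genes)                  # regexp "starts with my": FALSE TRUE FALSE
sub("in$", "IN", genes)              # replace a trailing "in": "actIN" "myosIN" "gapdh"
```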

20
Boolean Algebra
  • In R, to test for equality use "==".
  • > 1 == 3
  • [1] FALSE
  • > 1 != 3
  • [1] TRUE   # inequality
  • Another powerful tip: we can test for inclusion
    in a vector with %in%.
  • > x <- 1:10
  • > even.numbers <- seq(from=2, to=10, by=2)
  • > x
  • [1]  1  2  3  4  5  6  7  8  9 10
  • > even.numbers
  • [1]  2  4  6  8 10
  • > x %in% even.numbers
  • [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
    TRUE FALSE  TRUE
  • We can subset vectors with TRUE/FALSE flags:
  • > x[x %in% even.numbers]
  • [1]  2  4  6  8 10
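A sketch of other common operations on logical vectors, reusing x and even.numbers from the slide:

```r
x <- 1:10
even.numbers <- seq(from = 2, to = 10, by = 2)
is.even <- x %in% even.numbers

which(is.even)            # positions of the TRUE entries: 2 4 6 8 10
sum(is.even)              # TRUE counts as 1, so this counts matches: 5
x[x %% 2 == 0]            # the same subset using the modulo operator %%
x[x > 3 & x < 7]          # element-wise AND: 4 5 6
x[x < 2 | x > 9]          # element-wise OR: 1 10
any(x > 9)                # TRUE
all(x > 0)                # TRUE
```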

21
Missing Values
  • NA is the all-inclusive symbol for a missing
    value in R.
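  • > mean(c(1, 4, NA))
  • [1] NA
  • > mean(c(1, 4, NA), na.rm=TRUE)
  • [1] 2.5
  • We can test whether an object is a missing value.
  • > NA == NA
  • [1] NA   # this doesn't work!
  • > is.na(NA)
  • [1] TRUE
  • > na.omit(c(1, 4, NA))
  • [1] 1 4   # printed along with an "na.action"
    attribute recording the dropped position
  • Other special values: NaN (not a number), Inf (infinity).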
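A sketch contrasting NA, NaN and Inf; the vector v is an illustrative example:

```r
v <- c(1, 4, NA)

mean(v)                       # NA: one missing value makes the result missing
mean(v, na.rm = TRUE)         # 2.5
is.na(v)                      # FALSE FALSE TRUE
sum(is.na(v))                 # number of missing values: 1
v[!is.na(v)]                  # drop NAs without the na.omit attribute: 1 4

0 / 0                         # NaN, "not a number"
1 / 0                         # Inf
is.na(NaN)                    # TRUE: is.na() also flags NaN
is.nan(NA)                    # FALSE: but a plain NA is not NaN
```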

22
For Loops
  • For loops are very simple in R.
  • > for (m in 1:3) print(m)
  • [1] 1
  • [1] 2
  • [1] 3
  • > for (m in c("actin", "myosin", "gapdh")) print(m)
  • [1] "actin"
  • [1] "myosin"
  • [1] "gapdh"
  • Note: R does not process for loops very quickly;
    try to avoid them for large data if you can (e.g.
    use vectorized operations or apply).
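A sketch of the same computation written three ways: an explicit loop, a vectorized expression, and sapply. The loop preallocates its result, which matters for speed on large inputs:

```r
n <- 10

# Loop version: preallocate the result, then fill it in
squares.loop <- numeric(n)
for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0, unlike 1:n
  squares.loop[i] <- i^2
}

squares.vec    <- seq_len(n)^2                         # vectorized: one operation on the whole vector
squares.sapply <- sapply(seq_len(n), function(i) i^2)  # apply-family version
```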

23
Conditional Statements
  • We can use conditional statements to automate
    tasks and functions.
  • If..Else block:
  • If condition 1 holds, do task 1. Otherwise, do
    task 2.
  • > if (x > 0) print("positive") else print("negative")
  • While block:
  • While condition 1 holds, do task 1. When
    condition 1 no longer holds, stop.
  • > while (x > 0) x <- x + rnorm(1)
  • You can put the break command inside an if( )
    to break out of the loop.
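A sketch of a multi-branch if/else inside a function, its vectorized cousin ifelse(), and a while loop that uses break as a safety exit. The function name classify, the starting value 5 and the 1000-step cap are all hypothetical choices:

```r
# Hypothetical helper: an if / else if / else chain inside a function
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# ifelse() is the vectorized form: it tests every element at once
ifelse(c(-2, 0, 3) > 0, "positive", "non-positive")

# A while loop with break as a safety exit
set.seed(1)
x <- 5
steps <- 0
while (x > 0) {
  x <- x + rnorm(1)               # random walk step
  steps <- steps + 1
  if (steps >= 1000) break        # stop even if x never drops to 0 or below
}
```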

24
Writing Your Own Functions
  • Imagine you need to write a simple function that
    returns both the mean and the standard deviation
    of a vector in a list structure.
  • > mean.and.sd <- function(x) {
    +   res.mean <- mean(x)
    +   res.sd <- sd(x)
    +   res <- list(mean=res.mean, sd=res.sd)
    +   return(res)
    + }
  • > mean.and.sd(rpois(10, 5))
  • $mean
  • [1] 4.4
  • $sd
  • [1] 0.9660918
  • You can use the args function to find out what
    arguments a function needs.
  • > args(mean.and.sd)
  • function (x)
  • NULL
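A variant of mean.and.sd, sketched with an extra na.rm argument that defaults to FALSE; this is an illustrative extension, not part of the original slide. formals() lists a function's arguments and defaults:

```r
# Same idea as mean.and.sd, with an illustrative na.rm argument defaulting to FALSE
mean.and.sd <- function(x, na.rm = FALSE) {
  list(mean = mean(x, na.rm = na.rm),
       sd   = sd(x, na.rm = na.rm))   # the last expression is returned automatically
}

res <- mean.and.sd(c(2, 4, 4, 4, 5, 5, 7, 9))
res$mean                                     # 5
res$sd                                       # sample SD, sqrt(32 / 7)

mean.and.sd(c(1, NA, 3), na.rm = TRUE)$mean  # 2
names(formals(mean.and.sd))                  # "x" "na.rm"
```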

25
Inputting Data into R
  • R has capabilities for reading in data files of
    many different formats.
  • For simple ASCII text files we can use the
    read.table function.
  • The arguments accepted by read.table are:
  • > args(read.table)
  • function (file, header = FALSE, sep = "",
    quote = "\"'", dec = ".", row.names, col.names,
    as.is = FALSE, na.strings = "NA", colClasses = NA,
    nrows = -1, skip = 0, check.names = TRUE,
    fill = !blank.lines.skip, strip.white = FALSE,
    blank.lines.skip = TRUE, comment.char = "#",
    allowEscapes = FALSE)
  • Other read-in functions: read.csv, scan, readLines.
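A sketch of read.table on a small tab-separated file written to a temporary location; the gene table is made-up data. The comment line is skipped because comment.char defaults to "#":

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\tlength\tchrom",        # made-up example data
             "actin\t1500\t7",
             "gapdh\t1200\t12",
             "# lines starting with the comment character are skipped",
             "myosin\tNA\t17"), tmp)

genes <- read.table(tmp, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)   # keep text columns as character
str(genes)
```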

26
Outputting Data from R
  • To output data to a simple text table file, we
    can use write.table.
  • > args(write.table)
  • function (x, file = "", append = FALSE, quote = TRUE,
    sep = " ", eol = "\n", na = "NA", dec = ".",
    row.names = TRUE, col.names = TRUE,
    qmethod = c("escape", "double"))
  • Other write functions: write, cat.
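A sketch of a write.table round trip to a temporary file, using a two-column version of molclock. Because row names are written without a header entry, read.table picks them up again as row names:

```r
molclock <- data.frame(Gpdh = c(1.50, 40, 13.2), Myr = c(55, 300, 600),
                       row.names = c("Drosophila", "Fungi", "Animal Phyla"))

out <- tempfile(fileext = ".tsv")
write.table(molclock, file = out, sep = "\t", quote = FALSE)  # row names become the first column

# The header has one field fewer than the data rows, so read.table
# uses the first column as row names
back <- read.table(out, header = TRUE, sep = "\t")

cat("Wrote", nrow(molclock), "rows to", out, "\n")   # cat() writes plain text to the console
```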

27
Porting to Other Languages
  • A port is a piece of software that provides a
    means to get one programming language to
    communicate with another.
  • The Omega Project for Statistical Computing
  • An umbrella project to link different programming
    languages seamlessly.
  • Some available packages: RSPython, RSPerl,
    RMatlab.
  • (Plus a variety of others.)
  • Example: RSPython
  • To call Python from R: load RSPython, then call Python
    commands using .Python(func, args1, args2, ...)
  • To call R from Python: load the RS module, then use
    RS.call("plot", x, y).

28
Workspace Management
  • Where am I?
  • > getwd()   # returns the working directory
  • > setwd("C:/Jess")   # sets the working directory
  • > dir()   # lists files in the working directory
  • > list.files()   # same as dir()
  • How can I tell what objects I have?
  • > ls()
  • To remove individual objects use rm():
  • > rm("name.of.object")
  • To save specific objects use save():
  • > save(x, file="fileName.RData")
  • At a later date, you can load this into your
    workspace:
  • > load("fileName.RData")
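A sketch of save() and load() that writes to R's temporary directory instead of the working directory; saveRDS() and readRDS() are shown as an alternative for a single object (the file names and the name actin are hypothetical):

```r
x <- c(1.3, 99.6, 2.45)
rdata.file <- file.path(tempdir(), "fileName.RData")   # temp directory instead of the working directory

save(x, file = rdata.file)     # store x under its own name
rm(x)                          # remove it from the workspace
exists("x")                    # FALSE
load(rdata.file)               # recreates x in the workspace
exists("x")                    # TRUE

# saveRDS()/readRDS() store one object; you choose its name when reading it back
rds.file <- file.path(tempdir(), "actin.rds")          # hypothetical file name
saveRDS(x, rds.file)
actin <- readRDS(rds.file)
```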

29
Libraries
  • Libraries (R calls them packages) are collections of
    R functions that together perform a specialized
    analysis or task.
  • For example: the genetics package.
  • CRAN description:
  • "Classes and methods for handling genetic data.
    Includes classes to represent genotypes and
    haplotypes at single markers up to multiple
    markers on multiple chromosomes. Functions include
    allele frequencies, flagging homo/heterozygotes,
    flagging carriers of certain alleles, estimating
    and testing for Hardy-Weinberg disequilibrium,
    estimating and testing for linkage
    disequilibrium, ..."
  • Consult CRAN for more: http://cran.us.r-project.org/
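A sketch of attaching a package. It uses splines, which ships with R, so it runs offline; the install.packages() line for genetics is left commented out because it needs internet access:

```r
# splines ships with every R installation, so it can be attached directly
library(splines)
"package:splines" %in% search()     # TRUE: the package is now on the search path
exists("bs")                        # TRUE: its functions (e.g. bs, a B-spline basis) are visible

# A CRAN package such as genetics must be installed once (needs internet access),
# then attached in each session that uses it:
# install.packages("genetics")
# library(genetics)
```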

30
Helpful Functions
  • To boot up the HTML help files:
  • > help.start()
  • To pop up a help file on an individual function:
  • > help(mean)   # or equivalently ?mean
  • To search for help on a topic or function:
  • > help.search("plot")
  • To find objects whose names contain a string:
  • > apropos("string")
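A sketch of apropos() with a regular expression, assuming the utils package is attached (as it is in a default session):

```r
# apropos() matches object names on the search path against a regular expression
read.funs <- apropos("^read\\.")
"read.table" %in% read.funs      # TRUE
args(read.csv)                   # the arguments read.csv accepts
# help(read.csv) or ?read.csv opens the full documentation page
```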

31
More Info Resources
  • For R tutorials and simple documents to learn
    more about R, consult the R website, which has lots of
    resources: www.r-project.org/
  • (go to Documentation > Other > Contributed
    Documentation)
  • Really great HTML tutorial: Kickstarting R by Jim
    Lemon
  • http://cran.r-project.org/doc/contrib/Lemon-kickstart/index.html
  • "R for Beginners" by Emmanuel Paradis: short PDF
  • There are also reference cards that contain the
    most important R functions (and their
    descriptions) you need to know (like a cheat
    sheet).
  • "R Reference Card" by Jonathan Baron: 1-page
    list