Introduction to R

1
Introduction to R
3.11.13
  • Dror Hollander
  • Gil Ast Lab
  • Sackler Medical School

2
Lecture Overview
  • What is R and why use it?
  • Setting up R & RStudio for use
  • Calculations, functions and variable classes
  • File handling, plotting and graphic features
  • Statistics
  • Packages and writing functions

3
What is R?
  • R is a freely available language and environment
    for statistical computing and graphics
  • Much like [image missing], but better!

4
Why use R?
  • SPSS and Excel users are limited in their ability
    to change their environment. The way they
    approach a problem is constrained by how Excel
    and SPSS were programmed to approach it
  • The users have to pay money to use the software

5
R's Strengths
  • Data management & manipulation
  • Statistics
  • Graphics
  • Programming language
  • Active user community
  • Free

6
R's Weaknesses
  • Not very user friendly at start
  • No commercial support
  • Substantially slower than other programming
    languages (e.g. Perl, Java, C)

7
Lecture Overview
  • What is R and why use it?
  • Setting up R & RStudio for use
  • Calculations, functions and variable classes
  • File handling, plotting and graphic features
  • Statistics
  • Packages and writing functions

8
Installing R
  • Go to the R homepage: http://www.r-project.org/
  • And just follow the installation instructions

9
Installing RStudio
  • RStudio is an integrated development environment
    (IDE) for R
  • Install the desktop edition from this link:
    http://www.rstudio.org/download/

10
Using RStudio
  • Script editor
  • View variables in workspace and history file
  • View help, plots & files; manage packages
  • R console

11
Set Up Your Workspace
  • Create your working directory
  • Open a new R script file

12
Lecture Overview
  • What is R and why use it?
  • Setting up R & RStudio for use
  • Calculations, functions and variable classes
  • File handling, plotting and graphic features
  • Statistics
  • Packages and writing functions

13
R - Basic Calculations
  • Operators take values (operands), operate on
    them, and produce a new value
  • Basic calculations (numeric operators):
  • + , - , / , * , ^
  • Let's try an example. Run this:
    > (17*0.35)^(1/3)
  • Before you do, note in RStudio:
  • Script editor: use # to write comments (script
    lines that are ignored when run)
  • Click Run / press Ctrl+Enter to run code
  • The result appears in the R console

14
R - Basic Functions
  • All R operations are performed by functions
  • Calling a function:
    > function_name(x)
  • For example:
    > sqrt(9)
    [1] 3
  • Reading a function's help file:
    > ?sqrt
  • Also, when in doubt, Google it!

15
Variables
  • A variable is a symbolic name given to stored
    information
  • Variables are assigned using either = or <-
    > x<-12.6
    > x
    [1] 12.6

16
Variables - Numeric Vectors
  • A vector is the simplest R data structure. A
    numeric vector is a single entity consisting of a
    collection of numbers.
  • It may be created:
  • Using the c() function (concatenate)
    > x=c(3,7.6,9,11.1)
    > x
    [1]  3.0  7.6  9.0 11.1
  • Using the rep(what,how_many_times) function
    (replicate)
    > x=rep(10.2,3)
  • Using the : operator, signifying a series of
    integers
    > x=4:15
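
The three ways of building a numeric vector can be tried side by side. This is only a minimal sketch; the variable names a, b and d are illustrative and do not come from the slides.

```r
# c() concatenates individual values into one vector
a <- c(3, 7.6, 9, 11.1)

# rep(what, how_many_times) repeats a value
b <- rep(10.2, 3)

# the : operator builds a series of consecutive integers
d <- 4:15

# length() reports how many elements each vector holds
length(a)
length(b)
length(d)
```

Because 4:15 includes both end points, d holds 12 elements rather than 11.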

17
Variables - Character Vectors
  • Character strings are entered in quotes and are
    always printed double quoted
  • Vectors made of character strings:
    > y=c("I","want","to","go","home")
    > y
    [1] "I"    "want" "to"   "go"   "home"
  • Using rep():
    > rep("bye",2)
    [1] "bye" "bye"
  • Notice the difference using paste() (1 element):
    > paste("I","want","to","go","home")
    [1] "I want to go home"

18
Variables - Boolean Vectors
  • Logical: either FALSE or TRUE
    > 5>3
    [1] TRUE
    > x=1:5
    > x
    [1] 1 2 3 4 5
    > x<3
    [1]  TRUE  TRUE FALSE FALSE FALSE
    > z=x<3

19
RStudio - Workspace & History
  • Let's review the workspace and history tabs
    in RStudio

20
Manipulation of Vectors
  • Our vector: x=c(101,102,103,104)
  • [] are used to access elements in x
  • Extract the 2nd element in x:
    > x[2]
    [1] 102
  • Extract the 3rd and 4th elements in x:
    > x[3:4] or x[c(3,4)]
    [1] 103 104

21
Manipulation of Vectors Cont.
    > x
    [1] 101 102 103 104
  • Add 1 to all elements in x:
    > x+1
    [1] 102 103 104 105
  • Multiply all elements in x by 2:
    > x*2
    [1] 202 204 206 208

22
More Operators
  • Comparison operators
  • Equal ==
  • Not equal !=
  • Less / greater than < / >
  • Less / greater than or equal <= / >=
  • Boolean (either FALSE or TRUE)
  • And &
  • Or |
  • Not !

23
Manipulation of Vectors Cont.
  • Our vector: x=100:150
  • Elements of x higher than 145:
    > x[x>145]
    [1] 146 147 148 149 150
  • Elements of x higher than 135 and lower than 140:
    > x[x>135 & x<140]
    [1] 136 137 138 139

24
Manipulation of Vectors Cont.
  • Our vector:
    > x=c("I","want","to","go","home")
  • Elements of x that do not equal "want":
    > x[x != "want"]
    [1] "I"    "to"   "go"   "home"
  • Elements of x that equal "want" and "home":
    > x[x %in% c("want","home")]
    [1] "want" "home"

Note: use == for 1 element and %in% for
several elements
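
The note above is easier to see with a short experiment. The sketch below reuses the two vectors from the slides (renamed words for the character vector so both can coexist); the helper names keep, pairwise and in_set are illustrative only.

```r
x <- 100:150
keep <- x > 135 & x < 140      # one TRUE/FALSE per element of x
between <- x[keep]             # only the elements where keep is TRUE
n_between <- sum(keep)         # TRUE counts as 1 when summed

words <- c("I", "want", "to", "go", "home")

# == against a 2-element vector compares position by position (with
# recycling), so it does not test membership; R also warns about the lengths
pairwise <- suppressWarnings(words == c("want", "home"))

# %in% asks, for each element, whether it appears anywhere in the set
in_set <- words %in% c("want", "home")
others <- words[!in_set]       # everything except "want" and "home"
```

Here pairwise contains no TRUE at all, because "want" and "home" never line up with the recycled positions, while in_set flags both words correctly.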
25
Variables - Data Frames
  • A data frame is simply a table
  • Each column may be of a different class (e.g.
    numeric, character, etc.)
  • The number of elements in each row must be
    identical
  • Accessing elements in a data frame:
    x[row,column]
  • The age column:
    > x$age
    or
    > x[,"age"]
    or
    > x[,1]
  • All male rows:
    > x[x$gender=="M",]
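
The table shown on this slide is not part of the transcript, so the sketch below builds a small stand-in. The data frame name people and all of its values are made up for illustration; only the column names age and gender follow the slide.

```r
# A hypothetical 4-row table with one numeric and one character column
people <- data.frame(
  age    = c(25, 31, 47, 19),
  gender = c("M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Three equivalent ways to pull out the age column
by_dollar <- people$age
by_name   <- people[, "age"]
by_index  <- people[, 1]

# Leaving the column position empty keeps every column of the matching rows
males <- people[people$gender == "M", ]
nrow(males)
```

The trailing comma in the last subset matters: it is what tells R that the condition selects rows.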

26
Variables - Matrices
  • A matrix is also a table, but of a different class
  • All columns must be of the same class (e.g.
    numeric, character, etc.)
  • The number of elements in each row must be
    identical
  • Accessing elements in matrices:
    x[row,column]
  • The Height column:
    > x[,"Height"]
    or
    > x[,2]
  • Note: you cannot use
    > x$Weight
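
A small made-up matrix shows the same access rules in action. The name m, the row labels and the numbers are assumptions for illustration; the column names Weight and Height follow the slide.

```r
# matrix() fills column by column: Weight first, then Height
m <- matrix(c(60, 72, 55, 165, 180, 158), nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("Weight", "Height")))

by_name  <- m[, "Height"]
by_index <- m[, 2]

# $ is not defined for matrices, so this call fails; try() captures the error
res <- try(m$Weight, silent = TRUE)
inherits(res, "try-error")

# Mixing numbers and text in one matrix converts everything to character
mixed <- cbind(c(1, 2), c("a", "b"))
is.character(mixed)
```

If columns of different classes are needed, a data frame is the better fit.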

27
Exercise
  • Construct the character vector pplNames
    containing 5 names: Srulik, Esti, Shimshon,
    Shifra, Ezra
  • Construct the numeric vector ages that includes
    the following numbers: 21, 12 (twice), 35
    (twice)
  • Use the data.frame() function to construct the
    pplAges table out of pplNames & ages
  • Access the pplAges rows with ages values
    greater than 19

28
Lecture Overview
  • What is R and why use it?
  • Setting up R & RStudio for use
  • Calculations, functions and variable classes
  • File handling, plotting and graphic features
  • Statistics
  • Packages and writing functions

29
Working With a File
  • For example analysis of a gene expression file
  • Workflow
  • Save file in workspace directory
  • Read / load file to R
  • Analyze the gene expression table
  • 305 gene expression reads in 48 tissues (log10
    values compared to a mixed tissue pool)
  • Values > 0 → over-expressed genes
  • Values < 0 → under-expressed genes
  • File includes 306 rows X 49 columns

30
File Handling - Read File
  • Read file to R
  • Use the read.table() function
  • Note: each function receives input (arguments)
    and produces output (return value)
  • The function returns a data frame
  • Run:
    > geneExprss = read.table(file = "geneExprss.txt",
        sep = "\t", header = T)
  • Check table:
    > dim(geneExprss)    # table dimensions
    > geneExprss[1,]     # 1st line
    > class(geneExprss)  # check variable class
  • Or double click on variable name in workspace tab
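
Without the course file at hand, the same read-and-check steps can be rehearsed on a tiny tab-delimited file written to a temporary location. The file contents, the numbers and the variable name toy are made up for this sketch.

```r
# Write a small tab-delimited file: one header line, three data lines
tmp <- tempfile(fileext = ".txt")
writeLines(c("Lung\tLiver",
             "0.31\t-0.12",
             "-0.25\t0.05",
             "0.02\t0.40"), tmp)

# Same call pattern as on the slide; TRUE is spelled out because T is an
# ordinary variable that can be reassigned
toy <- read.table(file = tmp, sep = "\t", header = TRUE)

dim(toy)     # rows, columns
toy[1, ]     # 1st line
class(toy)   # read.table() returns a data frame
```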

31
Plotting - Pie Chart
  • What fraction of lung genes are over-expressed?
  • What about the under-expressed genes?
  • A pie chart can illustrate our findings

32
Using the pie() Function
  • Let's regard values > 0.2 as over-expressed
  • Let's regard values < (-0.2) as under-expressed
  • Let's use length(), which retrieves the number of
    elements in a vector
    > up = length(geneExprss$Lung[geneExprss$Lung > 0.2])
    > down = length(geneExprss$Lung[geneExprss$Lung < (-0.2)])
    > mid = length(geneExprss$Lung[geneExprss$Lung <= 0.2 &
        geneExprss$Lung >= (-0.2)])
    > pie(c(up,down,mid), labels = c("up","down","mid"))
  • More on saving plots to files in a few slides
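
The counting step can be checked on a handful of made-up values before running it on the real table. In this sketch the vector lung and its numbers are invented, and sum() over a logical vector is used as a shorter alternative to length() on a subset; both give the same counts.

```r
# Hypothetical log10 expression values, including both cutoffs themselves
lung <- c(0.35, -0.41, 0.05, 0.2, -0.2, 0.9, -0.03)

up   <- sum(lung > 0.2)                   # strictly above the upper cutoff
down <- sum(lung < -0.2)                  # strictly below the lower cutoff
mid  <- sum(lung >= -0.2 & lung <= 0.2)   # everything in between, cutoffs included

# The three groups should account for every value exactly once
up + down + mid == length(lung)

pie(c(up, down, mid), labels = c("up", "down", "mid"))
```

Including the cutoff values in the middle group is a choice made here so that no value is left uncounted.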

33
Plotting - Scatter Plot
  • How similar is the gene expression profile of the
    Hippocampus (brain) to that of the
    Thalamus (brain)?
  • A scatter plot is ideal for the visualization of
    the correlation between two variables

34
Using the plot() Function
  • Plot the gene expression profile of
    Hippocampus.brain against that of Thalamus.brain
    > plot(geneExprss$Hippocampus.brain,
        geneExprss$Thalamus.brain, xlab="Hippocampus",
        ylab="Thalamus")

35
File Handling - Load File to R
  • .RData files contain saved R environment data
  • Load .RData file to R
  • Use the load() function
  • Note: each function receives input (arguments)
    and produces output (return value)
  • Run:
    > load(file = "geneExprss.RData")
  • Check table:
    > dim(geneExprss)    # table dimensions
    > geneExprss[1,]     # 1st line
    > class(geneExprss)  # check variable class
  • Or double click on variable name in workspace tab

36
Plotting - Bar Plot
  • How does the expression profile of NOVA1 differ
    across several tissues?
  • A bar plot can be used to compare two or more
    categories

37
Using the barplot() Function
  • Compare NOVA1 expression in Spinalcord, Kidney,
    Heart and Skeletal.muscle by plotting a bar plot
  • Sort the data before plotting using the sort()
    function
  • barplot() works on a variable of a matrix class
    > tissues = c("Spinalcord", "Kidney",
        "Skeletal.muscle", "Heart")
    > barplot(sort(geneExprss["NOVA1",tissues]))
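
The sort-then-plot pattern can be tried on a small made-up matrix. The matrix name expr, the gene names GENE_A and GENE_B and all values are hypothetical; only the tissue names come from the slide.

```r
# A 2-gene x 4-tissue matrix with row and column names
expr <- matrix(c(0.4, -0.1, 0.25, -0.3,
                 0.1,  0.2, -0.2,  0.0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("GENE_A", "GENE_B"),
                               c("Spinalcord", "Kidney",
                                 "Skeletal.muscle", "Heart")))

tissues <- c("Spinalcord", "Kidney", "Heart")

# Selecting one row of a matrix by name gives a named numeric vector;
# sort() orders it from lowest to highest and keeps the names attached
vals <- sort(expr["GENE_A", tissues])
names(vals)

# The names become the bar labels
barplot(vals, ylab = "log10 expression ratio")
```

This relies on the table being a matrix, as the slide notes; the same subset of a data frame would return a one-row data frame instead of a vector.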

38
More Graphic Functions to Keep in Mind
  • hist()
  • boxplot()
  • plotmeans()
  • scatterplot()

39
Exercise
  • Use barplot() to compare PTBP1 & PTBP2 gene
    expression in Hypothalamus.brain
  • Use barplot() to compare PTBP1 & PTBP2 gene
    expression in Lung
  • What are the differences between the two plots
    indicative of?

40
Save Plot to File - RStudio
  • Create a .PNG file
  • Create a .PDF file

41
Save Plot to File in R
  • Before running the visualizing function, redirect
    all plots to a file of a certain type
  • jpeg(filename)
  • png(filename)
  • pdf(filename)
  • postscript(filename)
  • After running the visualization function, close
    the graphic device using dev.off() or
    graphics.off()
  • For example:
  • > load(file="geneExprss.RData")
  • > tissues = c("Spinalcord", "Kidney",
    "Skeletal.muscle", "Heart")
  • > pdf("Nova1BarPlot.PDF")
  • > barplot(sort(geneExprss["NOVA1", tissues]))
  • > graphics.off()
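
The open-plot-close sequence works the same way with any data. A minimal sketch, assuming a throwaway output path in the session's temporary directory and made-up bar heights:

```r
# Hypothetical output file in the temporary directory
out <- file.path(tempdir(), "example_barplot.pdf")

pdf(out)                           # open a PDF device: plots now go to the file
barplot(c(a = 1, b = 3, c = 2))    # draw as usual
dev.off()                          # close the device so the file is completed

file.exists(out)
```

dev.off() closes only the current device, while graphics.off() closes every open device, so either works when a single file is open.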

42
Lecture Overview
  • What is R and why use it?
  • Setting up R & RStudio for use
  • Calculations, functions and variable classes
  • File handling, plotting and graphic features
  • Statistics
  • Packages and writing functions

43
Statistics - cor.test()
  • A few slides back we compared the expression
    profiles of the Hippocampus.brain and the
    Thalamus.brain
  • But is that correlation statistically
    significant?
  • R can help with this sort of question as well
  • To answer that specific question we'll use the
    cor.test() function
    > geneExprss = read.table(file = "geneExprss.txt",
        sep = "\t", header = T)
    > cor.test(geneExprss$Hippocampus.brain,
        geneExprss$Thalamus.brain, method = "pearson")
    > cor.test(geneExprss$Hippocampus.brain,
        geneExprss$Thalamus.brain, method = "spearman")
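
cor.test() returns an object whose parts can be read individually, which is handy when the numbers are needed later in a script. The sketch below uses simulated data in place of the expression table; a and b are made-up vectors, not the tissue columns.

```r
set.seed(1)                      # make the simulated data reproducible
a <- rnorm(50)                   # 50 random values
b <- a + rnorm(50, sd = 0.5)     # b follows a, plus some noise

res <- cor.test(a, b, method = "pearson")
res$estimate                     # the correlation coefficient
res$p.value                      # the p-value of the test

# Spearman works on ranks, so it is less sensitive to outliers
res_s <- cor.test(a, b, method = "spearman")
res_s$estimate
```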

44
Statistics - More Testing, FYI
  • t.test(): Student's t test
  • wilcox.test(): Mann-Whitney test
  • kruskal.test(): Kruskal-Wallis rank sum test
  • chisq.test(): chi squared test
  • cor.test(): Pearson / Spearman correlations
  • lm(), glm(): linear and generalized linear
    models
  • p.adjust(): adjustment of P-values for multiple
    testing (multiple testing correction) using FDR,
    Bonferroni, etc.
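
As a small illustration of multiple testing correction, p.adjust() can be applied to a vector of p-values. The five p-values below are made up for the example.

```r
# Hypothetical raw p-values from five separate tests
p <- c(0.001, 0.01, 0.02, 0.04, 0.5)

# Bonferroni multiplies each p-value by the number of tests (capped at 1)
bonf <- p.adjust(p, method = "bonferroni")

# "fdr" is the Benjamini-Hochberg correction, which is less conservative
fdr <- p.adjust(p, method = "fdr")

bonf
fdr
```

With these values, four tests pass a 0.05 cutoff before correction and after the FDR correction, but only two still do after Bonferroni.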

45
Statistics - Examine the Distribution of
Your Data
  • Use the summary() function
  • > geneExprss = read.table(file =
    "geneExprss.txt", sep = "\t", header = T)
  • > summary(geneExprss$Liver)
        Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
    -1.84400 -0.17290 -0.05145 -0.08091  0.05299  0.63950

46
Statistics - More Distribution Functions
  • mean()
  • median()
  • var()
  • min()
  • max()
  • When using most of these functions remember to
    use the argument na.rm=T
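
The effect of na.rm is easiest to see on a vector with one missing value. The vector v below is made up for the example.

```r
v <- c(2, 4, NA, 6)

mean(v)                 # a single NA makes the whole result NA
mean(v, na.rm = TRUE)   # drop missing values first
max(v, na.rm = TRUE)
var(v, na.rm = TRUE)
```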

47
Lecture Overview
  • What is R and why use it?
  • Setting up R & RStudio for use
  • Calculations, functions and variable classes
  • File handling, plotting and graphic features
  • Statistics
  • Packages and writing functions

48
Functions & Packages
  • All operations are performed by functions
  • All R functions are stored in packages
  • Base packages are installed along with R
  • Packages including additional functions can be
    downloaded by user
  • Functions can also be written by user
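
Writing a function takes only a few lines. The sketch below wraps the over-expression count from the pie chart slides into a reusable function; the name count_over and its cutoff argument are hypothetical and not part of the original lecture.

```r
# Count how many values lie above a cutoff, ignoring missing values.
# cutoff has a default, so it can be left out when calling the function.
count_over <- function(values, cutoff = 0.2) {
  sum(values > cutoff, na.rm = TRUE)
}

count_over(c(0.5, 0.1, NA, 0.3))                 # uses the default cutoff
count_over(c(0.5, 0.1, NA, 0.3), cutoff = 0.4)   # overrides it
```

The value of the last expression in the body is what the function returns, so no explicit return() is needed here.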

49
Install & Load Packages - RStudio

50
Install & Load Packages - R
  • Use the functions:
  • install.packages("package_name")
  • update.packages(oldPkgs = "package_name")
  • library(package_name) Load a package

51
Final Tips
  • Read the function's help file
    (> ?function_name)
  • Run the help file examples
  • Use http://www.rseek.org/
  • Google what you're looking for
  • Post on the R forum webpage
  • And most importantly: play with it, get the hang
    of it, and do NOT despair