Bioconductor


1
Bioconductor

Sandrine Dudoit
Division of Biostatistics, UC Berkeley
www.stat.berkeley.edu/sandrine
MGED7
September 8, 2004
Toronto, Canada

2
Core Development Team

Douglas Bates, University of Wisconsin, Madison, USA.
Benjamin Bolstad, Division of Biostatistics, UC Berkeley, USA.
Vincent Carey, Harvard Medical School, USA.
Marcel Dettling, Federal Inst. Technology, Switzerland.
Sandrine Dudoit, Division of Biostatistics, UC Berkeley, USA.
Byron Ellis, Department of Statistics, Harvard University, USA.
Laurent Gautier, Technical University of Denmark, Denmark.
Robert Gentleman, Harvard Medical School, USA.
Jeff Gentry, Dana-Farber Cancer Institute, USA.
Kurt Hornik, Technische Universität Wien, Austria.
Torsten Hothorn, Institut fuer Medizininformatik, Biometrie und Epidemiologie, Germany.
Wolfgang Huber, DKFZ Heidelberg, Germany.
Stefano Iacus, University of Milan, Italy.
Rafael Irizarry, Department of Biostatistics, Johns Hopkins University, USA.
Friedrich Leisch, Technische Universität Wien, Austria.
James MacDonald, University of Michigan, USA.
Martin Maechler, Federal Inst. Technology, Switzerland.
Crispin Miller, The Paterson Institute Bioinformatics Group, UK.
Colin Smith, NASA Center for Astrobioinformatics, USA.

3
References

Bioconductor: www.bioconductor.org
  software, data, and documentation (vignettes);
  training materials from short courses;
  mailing list.
R: www.r-project.org, cran.r-project.org
  software base and contributed packages (CRAN);
  documentation;
  newsletter R News;
  mailing list.
Bioconductor Project Working Papers:
  www.bepress.com/bioconductor.
Personal:
  www.stat.berkeley.edu/sandrine.

4
Outline

Overview of the Bioconductor Project.
Getting Started: R and Bioconductor.
Hands On!

5
Overview of the Bioconductor Project
6
Bioconductor

Bioconductor is an open-source and open-development software project for the analysis of biomedical and genomic data.
The project was started in the Fall of 2001 and includes 25 core developers in the US, Europe, and Australia.
R and the R package system are used to design and distribute software.
Semi-annual releases:
  v1.0: May 2nd, 2002, 15 packages.
  v1.4: May 17th, 2004, 81 packages.
ArrayAnalyzer: commercial port of Bioconductor packages in S-Plus.

7
Goals

Provide access to powerful statistical and graphical methods for the analysis of biomedical and genomic data.
Facilitate the integration of biological metadata from the WWW in the analysis of experimental data.
  E.g., GenBank, GO, LocusLink, PubMed.
Allow the rapid development of extensible, interoperable, and scalable software.
Promote high-quality documentation and reproducible research.
Provide training in computational and statistical methods.

8
Bioconductor Packages

Bioconductor software consists of R add-on packages.
An R package is a structured collection of code (R, C, or other), documentation, and/or data for performing specific types of analyses.
  E.g., the affy, cluster, graph, and hexbin packages provide implementations of specialized statistical and graphical methods.

9
Bioconductor Packages

Statistical methods: cluster analysis, estimation and testing for linear and non-linear models (with possibly censored continuous and polychotomous outcomes), multiple hypothesis testing, resampling, visualization, etc.
Biological assays: cell-based assays, DNA microarrays (transcript levels, DNA copy number from CGH), proteomics, SAGE, SELDI-TOF, SNP, etc.
Biological metadata from the WWW: GenBank, GO, KEGG, PubMed, etc.
Interfaces with other languages: C, Java, Perl, Python, XML, etc. (Omega Project, www.omegahat.org).
Interactions with other projects: BGL, GeneSpring, Graphviz, MAGE-ML, Resourcerer, etc.
  R as a broker.

10
Bioconductor Packages

Data packages:
  Biological metadata: mappings between different gene identifiers (e.g., AffyID, GO ID, LocusID, PMID), CDF and probe sequence information for Affy arrays.
    E.g., hgu95av2, GO, KEGG.
  Experimental data: code, data, and documentation for specific experiments or projects.
    ALL: Chiaretti et al. (2004) ALL data.
    golubEsets: Golub et al. (2000) ALL/AML data.
    yeastCC: Spellman et al. (1998) yeast cell cycle.
  Course packages: code, data, documentation, and labs for the instruction of a particular course.
    E.g., EMBO03 course package.

11
Bioconductor Packages

Bioconductor provides two main classes of software packages.
End-user packages:
  aimed at users unfamiliar with R or computer programming;
  polished and easy-to-use interfaces to a wide variety of computational and statistical methods for the analysis of biomedical and genomic data.
Developer packages: aimed at software developers, in the sense that they provide software for writing software.

12
Bioconductor Packages: Release 1.4, May 17th, 2004. Over 80 packages!

General infrastructure: Biobase, Biostrings, DynDoc, reposTools, rhdf5, ruuid, tkWidgets, widgetTools.
Annotation: annotate, AnnBuilder, metadata packages.
Graphics: geneplotter, hexbin.
Pre-processing Affymetrix oligonucleotide chip data: affy, affycomp, affydata, affylmGUI, affyPLM, annaffy, gcrma, makecdfenv, vsn.
Pre-processing two-color spotted DNA microarray data: arrayMagic, arrayQuality, limma, limmaGUI, marray, vsn.
Other assays: aCGH, DNAcopy, prada, PROcess, RSNPer, SAGElyzer.
Differential gene expression: EBarrays, edd, factDesign, genefilter, limma, limmaGUI, multtest, ROC.
Graphs and networks: graph, RBGL, Rgraphviz.
Gene Ontology: GOstats, goTools.
MAGE: RMAGEML.

N.B. Many new packages are in the Bioconductor development version.
13
Ongoing Efforts
Many methods are already implemented in CRAN packages.

Variable/model selection
Prediction
Cluster analysis
Resampling: bootstrap, cross-validation
Multiple testing procedures
Quality measures for microarray data
Other biological data types, e.g., proteomics, sequence analysis
Interactions with other projects
Web services.

14
Microarray Data Analysis
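[Figure: microarray data analysis workflow]
Spotted arrays (.gpr, .Spot files): pre-processing with marray, limma, vsn.
Affymetrix arrays (CEL, CDF files): pre-processing with affy, vsn.
Pre-processed data: exprSet. Downstream analyses:
  Annotation: annotate, annaffy, metadata packages.
  Differential expression: edd, genefilter, limma, multtest, ROC; CRAN packages.
  Graphs and networks: graph, RBGL, Rgraphviz.
  Cluster analysis: CRAN packages class, cluster, MASS, mva.
  Prediction: CRAN packages class, e1071, ipred, LogitBoost, MASS, nnet, randomForest, rpart.
  Graphics: geneplotter, hexbin; CRAN packages.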
15
Microarray Data Analysis

Pre-processing of
  spotted array data with the marray packages;
  Affymetrix array data with the affy packages.
List of differentially expressed genes from the genefilter, limma, or multtest packages.
Prediction of tumor class using the randomForest package.
Clustering of genes using the cluster package.
Use of the annotate package
  to retrieve and search PubMed abstracts;
  to generate an HTML report with links to LocusLink for each gene.

16
marray

Pre-processing two-color spotted array data:
  diagnostic plots,
  robust adaptive normalization (lowess, loess).

[Figure panels: maImage, maBoxplot, maPlot, hexbin]
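The intensity-dependent normalization idea above can be sketched in base R. This is a minimal illustration under stated assumptions, not the marray implementation (marray's own entry point is maNorm): it simulates log-ratios M = log2(R/G) with a hypothetical linear dye bias in the average log-intensity A = (1/2) log2(R G), fits a robust loess curve of M on A with stats::loess, and keeps the residuals as normalized log-ratios. The bias shape, span, and sample size are arbitrary choices.

```r
# Intensity-dependent (loess) normalization of two-color log-ratios.
# Simulated data; the dye-bias model below is a hypothetical example.
set.seed(123)
n <- 2000
A <- runif(n, min = 6, max = 14)       # average log2 intensity, (1/2) * log2(R * G)
dye_bias <- 0.2 * (A - 6)              # hypothetical intensity-dependent bias
M <- dye_bias + rnorm(n, sd = 0.2)     # observed log-ratio, log2(R / G)

# Robust local regression of M on A ("symmetric" = iteratively reweighted fit)
fit <- loess(M ~ A, span = 0.4, degree = 1, family = "symmetric")

# Normalized log-ratios: what remains after removing the fitted trend
M_norm <- residuals(fit)

round(c(mean_before = mean(M), mean_after = mean(M_norm)), 3)
```

Fitting the curve separately within each print-tip group is a common refinement for spatial effects; it is not shown in this sketch.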
17
arrayMagic
[Figure: spatial effects. Panels: R, Rb, and R-Rb (color scale by rank); another array, by print-tip; color scale log(G); color scale rank(G)]
18
affy

Pre-processing oligonucleotide chip data:
  diagnostic plots,
  background correction,
  probe-level normalization,
  computation of expression measures.

[Figure panels: plotAffyRNADeg, barplot.ProbeSet, image, plotDensity]
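One common choice for the probe-level normalization step is quantile normalization, which the RMA procedure in affy uses. The base-R sketch below illustrates the technique rather than the affy code path: each array (column) is mapped onto the same reference distribution, the mean of the sorted columns, while each probe keeps its within-array rank. The simulated intensities and the function name quantile_normalize are hypothetical.

```r
# Quantile normalization: give every array (column) the same intensity distribution.
quantile_normalize <- function(x) {
  # rank of each probe within its own array; "first" breaks ties by position
  ranks <- apply(x, 2, rank, ties.method = "first")
  # reference distribution: mean of the sorted values across arrays
  reference <- rowMeans(apply(x, 2, sort))
  # replace each value by the reference value at the same rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(7)
# 1000 probes x 3 arrays on different scales (hypothetical intensities)
raw <- cbind(array1 = rlnorm(1000, meanlog = 7, sdlog = 1),
             array2 = 1.5 * rlnorm(1000, meanlog = 7, sdlog = 1),
             array3 = 0.7 * rlnorm(1000, meanlog = 7, sdlog = 1))
qn <- quantile_normalize(log2(raw))

apply(qn, 2, median)   # identical medians after normalization
```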
19
vsn

Variance stabilization (shrinkage): more stable expression estimates in cases where there are few replicates.
Model-based normalization: parameter estimation for affine calibration and an additive-multiplicative error model.
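vsn's transformation belongs to the arsinh (generalized logarithm) family, which suits the additive-multiplicative error model y = mu * exp(eta) + epsilon. The sketch below is a simplification: it fixes the scale by hand at sd(epsilon) / sd(eta) instead of estimating calibration parameters from the data as vsn does, then compares the spread of raw and transformed values at a low and a high true intensity. All error parameters are hypothetical.

```r
set.seed(2004)
n <- 10000
sigma_eta <- 0.1   # sd of the multiplicative error (hypothetical)
sigma_eps <- 50    # sd of the additive error (hypothetical)

# Additive-multiplicative error model: y = mu * exp(eta) + eps
simulate <- function(mu) mu * exp(rnorm(n, sd = sigma_eta)) + rnorm(n, sd = sigma_eps)
low  <- simulate(100)
high <- simulate(10000)

# arsinh transform with a hand-picked scale; defined for negative y as well
h <- function(y) asinh(y / (sigma_eps / sigma_eta))

# Raw spread grows with intensity; transformed spread is roughly constant
ratios <- c(raw_sd_ratio = sd(high) / sd(low),
            arsinh_sd_ratio = sd(h(high)) / sd(h(low)))
round(ratios, 2)
```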

20
limma

LInear Models for MicroArrays: pre-processing and differential expression.
  Pre-processing: background correction, normalization.
  Complex experimental designs, e.g., multifactorial.
  Empirical Bayes methods for identifying differentially expressed genes: t-statistics, F-statistics, posterior odds.
  Inference methods for duplicate spots and technical replicates.
  Analysis based on log-ratios or absolute log-intensities.
  Spot quality weights.
  Graphics: heat diagrams, Venn diagrams.
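The empirical Bayes idea behind limma's moderated t-statistics is to shrink each gene's residual variance toward a common prior value, which matters most when there are few replicates. This base-R sketch covers only a two-group comparison and, unlike limma (which estimates the prior degrees of freedom d0 and prior variance s0^2 from all genes), uses a hand-picked d0 and the median gene-wise variance as s0^2; both are assumptions for illustration. The shrunken variance is (d0 * s0^2 + d * s^2) / (d0 + d), and moderated_t is a hypothetical helper name.

```r
moderated_t <- function(x, group, d0 = 4, s0_sq = NULL) {
  g <- factor(group)
  stopifnot(nlevels(g) == 2)
  in1 <- g == levels(g)[1]
  in2 <- !in1
  n1 <- sum(in1); n2 <- sum(in2)
  x1 <- x[, in1, drop = FALSE]; x2 <- x[, in2, drop = FALSE]
  lfc <- rowMeans(x2) - rowMeans(x1)                 # log-fold change per gene
  ss <- rowSums((x1 - rowMeans(x1))^2) + rowSums((x2 - rowMeans(x2))^2)
  d <- n1 + n2 - 2                                   # residual df per gene
  s_sq <- ss / d                                     # gene-wise pooled variance
  if (is.null(s0_sq)) s0_sq <- median(s_sq)          # crude prior guess (assumption)
  s_sq_post <- (d0 * s0_sq + d * s_sq) / (d0 + d)    # shrunken variance
  se_unit <- sqrt(1 / n1 + 1 / n2)
  cbind(ordinary  = lfc / (sqrt(s_sq) * se_unit),
        moderated = lfc / (sqrt(s_sq_post) * se_unit))
}

set.seed(11)
x <- matrix(rnorm(500 * 6), nrow = 500)              # 500 genes, 3 + 3 arrays
x[1, ] <- c(0, 0.001, -0.001, 0.05, 0.051, 0.049)    # tiny variance, tiny change
res <- moderated_t(x, group = rep(c("ctrl", "trt"), each = 3))
res[1, ]   # huge ordinary t, small moderated t
```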

21
limmaGUI
22
aCGH

Pre-processing: imputation of missing values (lowess), filtering.
Visualization: measured and derived information as a function of genomic position.
HMM-based algorithm for finding genomic events, e.g., copy number transitions and high-level amplifications.
Perform and interpret tests for associations between clinical variables and copy number of individual loci as well as collective features of genomic profiles.

23
[Figure panels: statistics and significance cut-off; copy number transitions; frequency plot; genomic profile]
24
annotate, annaffy, and AnnBuilder
Metadata package hgu95av2: mappings between different gene identifiers for the hgu95av2 chip.

Assemble and process genomic annotation data from public repositories.
Build annotation data packages or XML data documents.
Associate experimental data in real time to biological metadata from web databases such as GenBank, GO, KEGG, LocusLink, and PubMed.
Process and store query results, e.g., search PubMed abstracts.
Generate HTML reports of analyses.

GENENAME: zinc finger protein 261
LOCUSID: 9203
ACCNUM: X95808
MAP: Xq13.1
AffyID: 41046_s_at
SYMBOL: ZNF261
PMID: 10486218 9205841 8817323
GO: GO:0003677 GO:0007275 GO:0016021
Many other mappings.
25
stats

[Figure: heatmap]
26
R Cluster Analysis Packages

cclust: convex clustering methods.
class: self-organizing maps (SOM).
cluster:
  AGglomerative NESting (agnes),
  Clustering LARge Applications (clara),
  DIvisive ANAlysis (diana),
  Fuzzy Analysis (fanny),
  MONothetic Analysis (mona),
  Partitioning Around Medoids (pam).
e1071:
  fuzzy C-means clustering (cmeans),
  bagged clustering (bclust).
flexmix: flexible mixture modeling.
fpc: fixed point clusters, clusterwise regression, and discriminant plots.
GeneSOM: self-organizing maps.
mclust, mclust98: model-based cluster analysis.
mva:
  hierarchical clustering (hclust),
  k-means (kmeans).

Download these and other packages from CRAN.
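A minimal sketch of gene clustering with hclust and kmeans, the two mva functions listed above (in current R versions both are in the stats package). The simulated matrix with two hypothetical groups of co-expressed genes, the Euclidean distance, and average linkage are all illustrative choices.

```r
set.seed(42)
# Hypothetical expression matrix: 20 genes x 6 samples, two groups of co-expressed genes
group_a <- matrix(rnorm(10 * 6, mean = 0), nrow = 10)
group_b <- matrix(rnorm(10 * 6, mean = 5), nrow = 10)
x <- rbind(group_a, group_b)
rownames(x) <- paste0("gene", 1:20)

# Hierarchical clustering of genes (rows) on Euclidean distances, average linkage
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 2)

# k-means with 2 centers; several random starts guard against poor local optima
km <- kmeans(x, centers = 2, nstart = 10)

# Cross-tabulate the two partitions (cluster labels themselves are arbitrary)
table(hierarchical = hc_groups, kmeans = km$cluster)
```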
27
R Class Prediction Packages
Download these and other packages from CRAN.

class:
  k-nearest neighbor (knn),
  learning vector quantization (lvq).
classPP: projection pursuit.
e1071: support vector machines (svm).
ipred: bagging, resampling-based estimation of prediction error.
knnTree: k-NN classification with variable selection inside leaves of a tree.
LogitBoost: boosting for tree stumps.
MASS: linear and quadratic discriminant analysis (lda, qda).
mlbench: machine learning benchmark problems.
nnet: feed-forward neural networks and multinomial log-linear models.
pamR: prediction analysis for microarrays.
randomForest: random forests.
rpart: classification and regression trees.
sma: diagonal linear and quadratic discriminant analysis, naïve Bayes (stat.diag.da).
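A short sketch of two of the classifiers listed above: knn from class and lda from MASS (both are recommended packages distributed with R). The built-in iris data stand in for an expression matrix with known classes; the 100/50 train/test split is an arbitrary assumption, and a real analysis would estimate prediction error more carefully, e.g., by resampling as in ipred.

```r
library(class)  # knn
library(MASS)   # lda

set.seed(1)
train_idx <- sample(nrow(iris), 100)          # arbitrary 100/50 split
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# k-nearest neighbor classification (k = 3) on the four numeric measurements
knn_pred <- knn(train = train[, 1:4], test = test[, 1:4], cl = train$Species, k = 3)

# Linear discriminant analysis fitted on the training set
lda_fit  <- lda(Species ~ ., data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# Test-set accuracy of each classifier
c(knn = mean(knn_pred == test$Species), lda = mean(lda_pred == test$Species))
```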

28
Getting Started: R and Bioconductor
29
About R

R Project (r-project.org): language and environment for statistical computing and graphics.
R is an open-source implementation of the S language; S-Plus is a commercial implementation.
Comprehensive R Archive Network, CRAN (cran.r-project.org): source code and pre-compiled binaries for Linux, Windows, and MacOS; contributed packages; documentation; FAQs; mailing lists.
Omega Project (www.omegahat.org): bi-directional intersystem interfaces, e.g., R/Java, R/Perl, R/Python, R/XML.

30
Installation

Main R software: download from CRAN (cran.r-project.org); use the latest release, now 1.9.1.
Bioconductor packages: download from Bioconductor (www.bioconductor.org); use the latest release, now 1.4.
Available for Linux/Unix, Windows, and MacOS.

31
Installing R

The latest release is R 1.9.1.
From CRAN:
  Sources.
  Linux: Debian (apt-get), Mandrake, RedHat RPMs, Suse, Vine.
  Windows: installer rw1091.exe; double-click on the icon and follow the instructions.
  MacOS X: RAqua.
To customize the installation, see the R FAQs.
You may need to set some environment variables, e.g., R_HOME, R_LIBS, R_PROFILE.
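From inside a running R session, the environment variables mentioned above and the library directories R searches for packages can be inspected as sketched here. This only reads the settings; variables such as R_LIBS are normally set before R starts (for example in the shell or in a .Renviron file), and unset variables show up as NA.

```r
# Read the installation-related environment variables (NA = not set)
env_vars <- Sys.getenv(c("R_HOME", "R_LIBS", "R_PROFILE"), unset = NA)
print(env_vars)

# Library trees searched by library() and used by default by install.packages()
lib_dirs <- .libPaths()
print(lib_dirs)
```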

32
Installing Bioconductor

After installing R, install the Bioconductor packages using the getBioC install script.
From R:
> source("http://www.bioconductor.org/getBioC.R")
> getBioC()
The installation can be customized via arguments of getBioC.
Other packages (biological metadata, experimental data, courses) can be installed as described below, using the Windows pull-down menus or the R functions install.packages or installDataPackage.

33
R Packages

An R package is a structured collection of code (R, C, or other), documentation, and/or data for performing specific types of analyses.
Packages:
  Base packages (CRAN), e.g., base, methods, nls, stats.
  Contributed packages (CRAN), e.g., ellipse, XML.
  Bioconductor packages, e.g., annotate, affy, marray, multtest, hgu95av2, ALL.
In Linux, have a look at the directory /usr/lib/R (or wherever you've installed R).
In Windows, have a look at the folders in C:\Program Files\R\rw1091.

34
Installing vs. Loading

Packages only need to be installed once, but they must be loaded with each new R session.
Installing:
  functions install.packages, installDataPackage;
  Unix command R INSTALL;
  Windows Packages pull-down menu.
Loading:
  function library;
  Windows Packages pull-down menu.
  > library(Biobase)
Updating:
  function update.packages;
  Windows Packages pull-down menu.
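The install-once, load-every-session pattern can be written as follows. The package name is only an example (cluster is a recommended package that ships with most R installations); requireNamespace checks whether the package is installed without attaching it, so install.packages runs only when the package is missing.

```r
pkg <- "cluster"   # example package name

# Install only if the package is not already available (needed once per library)
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load (attach) the package; this is needed in every new R session
library(pkg, character.only = TRUE)

"package:cluster" %in% search()
```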

35
Starting and Quitting R

Start: R command.
Quit: q(). You are prompted to save the workspace image.
Save:
  the current environment with save.image (the default is the .RData file);
  specific R objects with save.
  Both can be read back using load.
Working directory: getwd, setwd.
List objects: ls, objects.
Search path for R objects: search, searchpaths, attach, detach.
Function arguments: e.g., ?lm or args(lm).
R for Windows provides pull-down menus for the above actions.
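A small round trip with save and load, as described above. A temporary file is used so the example does not touch the working directory's .RData; the saved objects are arbitrary.

```r
set.seed(1)
x <- rnorm(5)
fit <- lm(dist ~ speed, data = cars)   # built-in cars data
x_copy <- x

f <- file.path(tempdir(), "session-objects.RData")
save(x, fit, file = f)                 # save specific objects

rm(x, fit)                             # remove them from the workspace
restored <- load(f)                    # load() returns the names it restored
print(restored)
print(ls())
```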

36
Documentation and Help

Manuals, FAQs, and tutorials are available from the R and Bioconductor websites and on-line in an R session.
R on-line help system: detailed on-line documentation, available in text, HTML, PDF, and LaTeX formats.
> help.start()
> help(lm)
> ?hclust
> help.search("aproposprint")
> apropos("mean")
> example(hclust)
> demo()
> demo(image)
> data()
R and Bioconductor mailing lists: search archives, post.
Short courses: lecture notes, computer labs, and course packages available on the WWW for self-instruction.
Vignettes: openVignette(), vExplorer().
Google.
All on the WWW.

37
Vignettes

Bioconductor has adopted a new documentation
paradigm, the vignette.
A vignette is an executable document consisting
of a collection of code chunks and documentation
text chunks.
Vignettes provide dynamic, integrated, and
reproducible statistical documents that can be
automatically updated if either data or analyses
are changed.
Vignettes can be generated using the Sweave
function from the R tools package.

38
Vignettes

Each Bioconductor package contains at least one
vignette, providing task-oriented descriptions of
the package's functionality.
Vignettes are located in the doc subdirectory of
an installed package and are accessible from the
help browser.
Vignettes can be used interactively.
Vignettes are also available separately from the
Bioconductor website.

39
Vignettes

Tools are being developed for managing and using this repository of step-by-step tutorials:
  Biobase: openVignette, a menu of available vignettes and an interface for viewing vignettes (PDF).
  tkWidgets: vExplorer, interactive use of vignettes.
  reposTools.

40
Vignettes

HowTos: task-oriented descriptions of package functionality.
Executable documents consisting of documentation text and code chunks.
Dynamic, integrated, and reproducible statistical documents.
Can be used interactively: vExplorer.
Generated using Sweave (tools package).

[Figure: vExplorer]
41
Hands On!
42
Extra Slides
43
Annotation

One of the greatest challenges in analyzing
genomic data is associating the experimental data
with the available biological metadata, e.g.,
sequence, gene annotation, chromosomal maps,
literature.
It is essential to make these data available for
computation.
Bioconductor provides three main packages for this purpose:
annotate (end-user)
AnnBuilder (developer)
annaffy (end-user).

44
WWW Resources

Nucleotide databases, e.g., GenBank.
Gene databases, e.g., LocusLink, UniGene.
Protein sequence and structure databases, e.g., Protein DataBank (PDB), SwissProt.
Literature databases, e.g., PubMed, OMIM.
Chromosome maps, e.g., NCBI Map Viewer.
Pathways, e.g., KEGG.
Entrez is a search and retrieval system that
integrates information from databases at NCBI
(National Center for Biotechnology Information).

45
annotate: Matching IDs

Important tasks:
  Associate manufacturers' or in-house probe identifiers to other available identifiers.
    E.g.,
    Affymetrix IDs → LocusLink LocusID
    Affymetrix IDs → GenBank accession number.
  Associate probes with biological data such as chromosomal position, pathway membership.
  Associate probes with published literature data via PubMed (need PMID).

46
annotate: Matching IDs
47
annotate: Versioning

It is important to keep version information for
the mappings.
It is important to allow for new mappings to be
used when they become available.
There are some interesting challenges and
concerns that arise when comparing the strategies
of on-line mappings versus compiled mappings.

48
Annotation Data Packages

The Bioconductor project provides annotation data packages that contain many different mappings.
  Mappings between Affy IDs and other probe IDs: hgu95av2 for the HGU95Av2 GeneChip series; also hgu133a, hu6800, mgu74a, rgu34a, YG.
  Affy CDF data packages.
  Probe sequence data packages.
These packages are updated and expanded regularly as new data become available.
They can be downloaded from the Bioconductor website and also using installDataPackage.
DPExplorer: a widget for interacting with data packages.
AnnBuilder: tools for building annotation data packages.

49
annotate: Matching IDs

Much of what annotate does relies on matching symbols.
This is basically the role of a hash table in most programming languages.
In R, we rely on environments.
The annotation data packages provide R environment objects containing key and value pairs for the mappings between two sets of probe identifiers.
Keys can be accessed using the R ls function.
Matching values in different environments can be accessed using the get or multiget functions.
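The environment-as-hash-table idea can be tried without any annotation package installed. In this sketch, symbol_env and locusid_env are hypothetical stand-ins for a data package's SYMBOL and LOCUSID environments, filled with the single entry shown for probe 41046_s_at in these slides; base R's mget provides a vectorized lookup similar in spirit to multiget.

```r
# Hypothetical stand-ins for two annotation environments (SYMBOL and LOCUSID maps)
symbol_env  <- new.env(hash = TRUE)
locusid_env <- new.env(hash = TRUE)
assign("41046_s_at", "ZNF261", envir = symbol_env)
assign("41046_s_at", "9203",   envir = locusid_env)

ls(symbol_env)                                           # keys
get("41046_s_at", envir = symbol_env)                    # single lookup
exists("no_such_probe", envir = symbol_env, inherits = FALSE)

# Vectorized lookup; missing keys return NA instead of raising an error
mget(c("41046_s_at", "no_such_probe"), envir = locusid_env, ifnotfound = NA)
```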

50
annotate: Matching IDs

> library(hgu95av2)
> get("41046_s_at", envir = hgu95av2ACCNUM)
[1] "X95808"
> get("41046_s_at", envir = hgu95av2LOCUSID)
[1] "9203"
> get("41046_s_at", envir = hgu95av2SYMBOL)
[1] "ZNF261"
> get("41046_s_at", envir = hgu95av2GENENAME)
[1] "zinc finger protein 261"
> get("41046_s_at", envir = hgu95av2SUMFUNC)
[1] "Contains a putative zinc-binding motif (MYM)Proteome"
> get("41046_s_at", envir = hgu95av2UNIGENE)
[1] "Hs.9568"

51
annotate: Matching IDs

> get("41046_s_at", envir = hgu95av2CHR)
[1] "X"
> get("41046_s_at", envir = hgu95av2CHRLOC)
        X
-68692698
> get("41046_s_at", envir = hgu95av2MAP)
[1] "Xq13.1"
> get("41046_s_at", envir = hgu95av2PMID)
[1] "10486218" "9205841"  "8817323"
> get("41046_s_at", envir = hgu95av2GO)
         TAS          TAS          IEA
"GO:0003677" "GO:0007275" "GO:0016021"

52
annotate: Matching IDs

Instead of relying on the general R functions for environments, new user-friendly functions have been written for accessing and working with specific identifiers.
E.g., getGO, getGOdesc, getLL, getPMID, getSYMBOL.

53
annotate: Matching IDs

> getSYMBOL("41046_s_at", data = "hgu95av2")
41046_s_at
  "ZNF261"
> gg <- getGO("41046_s_at", data = "hgu95av2")
> getGOdesc(gg[[1]], "MF")
"GO:0003677"
"DNA binding activity"
> getLL("41046_s_at", data = "hgu95av2")
41046_s_at
      9203
> getPMID("41046_s_at", data = "hgu95av2")
"41046_s_at"
[1] 10486218  9205841  8817323

54
annotate: WWW Queries

The annotate package provides tools for:
  querying and processing information from various WWW biological databases:
    GenBank,
    LocusLink,
    PubMed;
  regular expression searching of PubMed abstracts;
  generating nice HTML reports of analyses, with links to biological databases.

55
annotate: WWW Queries

Functions for querying WWW databases from R rely on the browseURL function:
> browseURL("www.r-project.org")
Other tools: HTMLPage class, getTDRows, getQueryLink, getQuery4UG, getQuery4LL, makeAnchor.
The XML package is used to parse query results.

56
annotate: Querying GenBank (www.ncbi.nlm.nih.gov/Genbank/index.html)

Given a vector of GenBank accession numbers or NCBI UIDs, the genbank function:
  opens a browser at the URLs for the corresponding GenBank queries;
  returns an XMLdoc object with the same data.
> genbank("X95808", disp = "browser")
http://www.ncbi.nih.gov/entrez/query.fcgi?tool=bioconductor&cmd=Search&db=Nucleotide&term=X95808
> genbank(1430782, disp = "data", type = "uid")

57
annotate: Querying LocusLink (www.ncbi.nlm.nih.gov/LocusLink/)

locuslinkByID: given one or more LocusIDs, the browser is opened at the URL corresponding to the first gene.
> locuslinkByID(9203)
http://www.ncbi.nih.gov/LocusLink/LocRpt.cgi?l=9203
locuslinkQuery: given a search string, the results of the LocusLink query are displayed in the browser.
> locuslinkQuery("zinc finger")
http://www.ncbi.nih.gov/LocusLink/list.cgi?Q=zinc finger&ORG=Hs&V=0
getQuery4LL.

58
annotate: Querying PubMed (www.ncbi.nlm.nih.gov)

For any gene there is often a large amount of data available from PubMed.
The annotate package provides the following tools for interacting with PubMed:
  pubMedAbst: a class structure for PubMed abstracts in R.
  pubmed: the basic engine for talking to PubMed (pmidQuery).

59
annotate: pubMedAbst Class

Class structure for storing and processing PubMed abstracts in R, with slots:
  pmid
  authors
  abstText
  articleTitle
  journal
  pubDate

60
annotate: High-Level Tools for PubMed

pm.getabst: download the specified PubMed abstracts (stored in XML) and create a list of pubMedAbst objects.
pm.titles: extract the titles from a list of PubMed abstracts.
pm.abstGrep: regular expression matching on the abstracts.

61
annotate: PubMed Example

> pmid <- getPMID("41046_s_at", data = "hgu95av2")
> pubmed(pmid, disp = "browser")
http://www.ncbi.nih.gov/entrez/query.fcgi?tool=bioconductor&cmd=Retrieve&db=PubMed&list_uids=10486218%2c9205841%2c8817323
> absts <- pm.getabst("41046_s_at", base = "hgu95av2")
> pm.titles(absts)
> pm.abstGrep("mouse", absts[[1]])

62
annotate: PubMed Example
63
annotate: PubMed HTML Report

The function pmAbst2HTML takes a list of pubMedAbst objects and generates an HTML report with the titles of the abstracts and links to their full page on PubMed.
> pmAbst2HTML(absts[[1]], filename = "pm.html")

64
[Figure: pm.html, generated by the pmAbst2HTML function from the annotate package]
65
annotate: Analysis Reports

A simple interface, ll.htmlpage, can be used to generate an HTML report of analysis results.
The page consists of a table with one row per gene, with links to LocusLink.
Entries can include various gene identifiers and statistics.
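The one-row-per-gene report described above can be approximated in base R. The function write_gene_report, its column layout, and the base URL are hypothetical (this is not ll.htmlpage, and the URL is a placeholder); the single row uses the identifiers for probe 41046_s_at from earlier slides plus a made-up statistic.

```r
# Hypothetical sketch of an HTML gene table with one row per gene and a link per gene
write_gene_report <- function(genes, file, base_url) {
  links <- sprintf('<a href="%s%s">%s</a>', base_url, genes$locusid, genes$locusid)
  rows <- sprintf("<tr><td>%s</td><td>%s</td><td>%s</td><td>%.2f</td></tr>",
                  genes$probe, genes$symbol, links, genes$stat)
  html <- c("<html><body>",
            "<table border=\"1\">",
            "<tr><th>Probe</th><th>Symbol</th><th>LocusID</th><th>Statistic</th></tr>",
            rows,
            "</table>",
            "</body></html>")
  writeLines(html, file)
  invisible(file)
}

genes <- data.frame(probe = "41046_s_at", symbol = "ZNF261", locusid = 9203,
                    stat = 2.5,                      # made-up statistic for illustration
                    stringsAsFactors = FALSE)
out <- write_gene_report(genes, file.path(tempdir(), "genelist.html"),
                         base_url = "https://example.org/gene?id=")  # placeholder URL
readLines(out)
```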

66
[Figure: genelist.html, generated by the ll.htmlpage function from the annotate package]
67
Data Complexity

Dimensionality.
Dynamic/evolving data, e.g., gene annotation, sequence, literature.
Multiple data sources and locations: in-house, WWW.
Multiple data types: numeric, textual, graphical.
No longer an n x p data matrix X!
We distinguish between biological metadata and experimental metadata.

68
Experimental Metadata

Gene expression measures:
  scanned images, i.e., raw data;
  image quantitation data, i.e., output from image analysis;
  normalized expression measures, i.e., log-ratios or Affy expression measures.
Reliability/quality information for the expression measures.
Information on the probe sequences printed on the arrays (array layout).
Information on the target samples hybridized to the arrays.
See the Minimum Information About a Microarray Experiment (MIAME) standards and the new MAGEML package.

69
Biological Metadata

Biological attributes that can be applied to the experimental data.
E.g., for genes:
  chromosomal location;
  gene annotation (LocusLink, GO);
  relevant literature (PubMed).
Biological metadata sets are large, evolving rapidly, and typically distributed via the WWW.
Tools: annotate, annaffy, and AnnBuilder packages, and annotation data packages.

70
OOP

The Bioconductor project has adopted the object-oriented programming (OOP) paradigm proposed in J. M. Chambers (1998), Programming with Data.
This object-oriented class/method design allows efficient representation and manipulation of large and complex biological datasets of multiple types.
Tools for programming using the class/method mechanism are provided in the R methods package.
Tutorial: www.omegahat.org/RSMethods/index.html.

71
OOP Classes

A class provides a software abstraction of a real-world object. It reflects how we think of certain objects and what information these objects should contain.
Classes are defined in terms of slots, which contain the relevant data.
An object is an instance of a class.
A class defines the structure, inheritance, and initialization of objects.
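The ideas of slots, instances, inheritance, and initialization map directly onto the S4 mechanism in the methods package. This sketch defines a hypothetical, much simplified class loosely modeled on exprSet (it is not Biobase's definition) and a subclass that adds a standard-error slot; the validity functions run when new() is called with slot values.

```r
library(methods)

# Hypothetical, simplified class loosely modeled on exprSet (not Biobase's definition)
setClass("toyExprSet",
         slots = c(exprs = "matrix", annotation = "character"),
         validity = function(object) {
           if (!is.numeric(object@exprs)) "exprs must be a numeric matrix" else TRUE
         })

# Subclass that inherits both slots and adds standard errors
setClass("toyExprSetSE",
         contains = "toyExprSet",
         slots = c(se.exprs = "matrix"),
         validity = function(object) {
           if (!identical(dim(object@se.exprs), dim(object@exprs)))
             "se.exprs must have the same dimensions as exprs"
           else TRUE
         })

e <- matrix(c(5.1, 7.3, 6.2, 8.0), nrow = 2,
            dimnames = list(c("gene1", "gene2"), c("sample1", "sample2")))
obj <- new("toyExprSetSE", exprs = e, se.exprs = e / 10, annotation = "hgu95av2")

is(obj, "toyExprSet")     # TRUE: an instance of the subclass is also a toyExprSet
obj@exprs["gene1", ]      # slot access

# Initialization runs the validity checks, so inconsistent slots are rejected
bad <- tryCatch(new("toyExprSetSE", exprs = e, se.exprs = e[1, , drop = FALSE],
                    annotation = "hgu95av2"),
                error = function(err) "rejected")
bad
```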

72
OOP Methods

A method is a function that performs an action on data (objects).
Methods define how a particular function should behave depending on the class of its arguments.
Methods allow computations to be adapted to particular data types, i.e., classes.
A generic function is a dispatcher: it examines its arguments and determines the appropriate method to invoke.
Examples of generic functions in R include plot, summary, and print.
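A minimal sketch of generic-function dispatch with the methods package. The generic describe and its methods are hypothetical names chosen for illustration; plot, summary, and print follow the same principle but are far more elaborate.

```r
library(methods)

# A new generic function: standardGeneric() dispatches on the class of 'object'
setGeneric("describe", function(object) standardGeneric("describe"))

# Methods: how 'describe' behaves for particular classes of argument
setMethod("describe", "numeric", function(object) {
  sprintf("numeric vector of length %d, mean %.2f", length(object), mean(object))
})
setMethod("describe", "character", function(object) {
  sprintf("character vector: %s", paste(object, collapse = ", "))
})
# Fallback used when no more specific method matches
setMethod("describe", "ANY", function(object) {
  paste("object of class", class(object)[1])
})

describe(c(2.5, 3.5))      # numeric method
describe("ZNF261")         # character method
describe(list(1, 2))       # falls back to the ANY method
```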

73
exprSet Class
Processed Affymetrix or spotted array data.
  exprs: matrix of expression measures, genes x samples.
  se.exprs: matrix of SEs for expression measures, genes x samples.
  phenoData: sample-level covariates, instance of class phenoData.
  annotation: name of annotation data.
  description: MIAME information.
  notes: any notes.

Use of object-oriented programming to deal with data complexity: S4 class/method mechanism (methods package).
74
marrayRaw Class
Pre-normalization intensity data for a batch of arrays.
  maRf, maGf: matrices of red and green foreground intensities.
  maRb, maGb: matrices of red and green background intensities.
  maW: matrix of spot quality weights.
  maLayout: array layout parameters, marrayLayout.
  maGnames: description of spotted probe sequences, marrayInfo.
  maTargets: description of target samples, marrayInfo.
  maNotes: any notes.
75
AffyBatch Class
Probe-level intensity data for a batch of arrays (same CDF).
  cdfName: name of CDF file for arrays in the batch.
  nrow, ncol: dimensions of the array.
  exprs, se.exprs: matrices of probe-level intensities and SEs; rows correspond to probe cells, columns to arrays.
  phenoData: sample-level covariates, instance of class phenoData.
  annotation: name of annotation data.
  description: MIAME information.
  notes: any notes.
76
Sweave

The Sweave system allows the generation of dynamic, integrated, and reproducible statistical documents intermixing text, code, and code output (textual and graphical).
Functions are available in the R tools package.
See ?Sweave and the manual at www.ci.tuwien.ac.at/leisch/Sweave/.

77
Sweave Input

Input: a text file consisting of a sequence of code chunks and documentation text chunks (noweb file).
  Documentation chunks:
    start with @;
    text in a markup language like LaTeX.
  Code chunks:
    start with <<name>>=;
    R or S-Plus code.
  File extensions: .rnw, .Rnw, .snw, .Snw.

78
Sweave Output

Output: a single document, e.g., a .tex or .pdf file, containing
  the documentation text,
  the R code,
  the code output text and graphs.
The document can be automatically regenerated whenever the data, code, or documentation text change.
Stangle or tangleToR extract only the code chunks.
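The input/output cycle described above can be exercised from R. This sketch writes a hypothetical minimal .Rnw file (one documentation chunk, one code chunk opened with <<name>>= and closed with @, and an inline \Sexpr), then runs Sweave to produce the .tex file and Stangle to extract the code. In current R versions Sweave and Stangle live in the utils package rather than tools; running latex or pdflatex on the result is not shown.

```r
# A minimal noweb-style source file (backslashes are escaped inside R strings)
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "A documentation chunk, followed by a code chunk:",
  "<<mean-chunk>>=",
  "x <- c(1, 2, 3, 4)",
  "mean(x)",
  "@",
  "The mean is \\Sexpr{mean(c(1, 2, 3, 4))}.",
  "\\end{document}"
)

old_wd <- setwd(tempdir())          # Sweave/Stangle write into the working directory
writeLines(rnw_lines, "minimal.Rnw")
utils::Sweave("minimal.Rnw")        # weave: writes minimal.tex with code, output, and text
utils::Stangle("minimal.Rnw")       # tangle: writes minimal.R with only the code chunks
setwd(old_wd)
```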

79
Sweave
[Figure: Sweave workflow]
main.Rnw → Stangle → main.R
main.Rnw → Sweave → main.tex, fig.pdf, fig.eps
main.tex → latex → main.dvi → dvips → main.ps
main.tex → pdflatex → main.pdf
80
Widgets

Widgets: small-scale graphical user interfaces (GUIs) providing point-and-click access for specific tasks.
  E.g., file browsing and selection for data input, basic analyses.
Packages:
  tkWidgets: dataViewer, fileBrowser, fileWizard, importWizard, objectBrowser.
  widgetTools.

81
Widgets
[Figure: reading in phenoData with the tkSampleNames, tkphenoData, and tkMIAME widgets]
