Hierarchical Clustering in R - PowerPoint PPT Presentation

Slides: 12
Provided by: CarolB170

Transcript and Presenter's Notes



1
Hierarchical Clustering in R
2
Quick R Tips
  • How to find out what packages are installed
    locally (available to load)
  • library()
  • How to find out what packages are currently
    attached in this session
  • (.packages())
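
The sketch below illustrates the difference between installed and attached packages. It uses only base R. `.packages(all.available = TRUE)` and `installed.packages()` return the installed package names as a vector or a matrix, instead of the listing that `library()` prints.

```r
# Names of all packages installed in the library trees (what library() lists)
installed <- .packages(all.available = TRUE)
head(installed)

# Names of the packages attached in the current session;
# .packages() returns invisibly, so the parentheses force printing
attached <- (.packages())

# installed.packages() gives the same names as row names of a matrix,
# together with columns such as Version and LibPath
ip <- installed.packages()
head(ip[, c("Package", "Version")])
```

A package can be installed without being attached. It only shows up in `(.packages())` after a `library()` call.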

3
Hierarchical Clustering
  • A type of cluster analysis
  • There are both divisive and agglomerative HC;
    agglomerative is most commonly used
  • Group objects that are close to one another
    based on some distance/similarity metric
  • Clusters are created and linked based on a metric
    that evaluates the cluster-to-cluster distance
    (the linkage method, e.g., single, complete, or average)
  • Results are displayed as a dendrogram
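
The agglomerative idea can be made concrete with a naive, from-scratch sketch in base R. It is for illustration only; in practice you would call `hclust()`. The sketch starts with every object in its own cluster and repeatedly merges the closest pair of clusters. It assumes complete linkage, where the cluster-to-cluster distance is the largest pairwise distance. The toy data `x` are made up.

```r
# Made-up 1-D toy data: five objects
x <- c(1, 2, 6, 7, 15)
d <- as.matrix(dist(x))              # object-to-object Euclidean distances

clusters <- as.list(seq_along(x))    # start: every object is its own cluster
merge_heights <- numeric(0)

while (length(clusters) > 1) {
  best <- c(NA, NA)
  best_d <- Inf
  # Find the pair of clusters with the smallest linkage distance
  for (i in 1:(length(clusters) - 1)) {
    for (j in (i + 1):length(clusters)) {
      # Complete linkage: largest distance between members of the two clusters
      dij <- max(d[clusters[[i]], clusters[[j]]])
      if (dij < best_d) {
        best_d <- dij
        best <- c(i, j)
      }
    }
  }
  # Merge cluster j into cluster i, then drop cluster j from the list
  clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
  clusters[[best[2]]] <- NULL
  merge_heights <- c(merge_heights, best_d)  # height of this merge in the dendrogram
}

merge_heights
```

On this toy data the merge heights are 1, 1, 6 and 14. These match the heights that `hclust(dist(x), method = "complete")` reports, and they are the levels at which the dendrogram branches join.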

5
Step 1 Data matrix
  • First you need a numeric matrix
  • A typical array data set has samples as
    columns and genes as rows
  • We want to be sure our data are in the form of an
    expression matrix
  • Use Biobase library/package
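  • See http://www.bioconductor.org/packages/2.2/bioc/
    vignettes/Biobase/inst/doc/ExpressionSetIntroduction.pdf
  • > exprs <- as.matrix(read.table(data, header=TRUE,
    sep="\t", row.names=1, as.is=TRUE))
  • Here data is the path to a tab-delimited file with
    gene IDs in the first column and one column per sample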
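
You can try the `read.table()` line without a real array file. Write a small matrix to a temporary tab-delimited file, then read it back the same way. The gene and sample names and the values are hypothetical, and `data` here is simply the temporary file's path.

```r
set.seed(1)
# Hypothetical toy data: 5 genes (rows) x 4 samples (columns)
toy <- matrix(round(rnorm(20, mean = 8), 2), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))

# Write it as a tab-delimited file with gene IDs in the first column
data <- tempfile(fileext = ".txt")
write.table(toy, file = data, sep = "\t", quote = FALSE)

# Read it back into a numeric expression matrix (genes x samples)
exprs <- as.matrix(read.table(data, header = TRUE, sep = "\t",
                              row.names = 1, as.is = TRUE))
str(exprs)
```

Because `row.names = 1` moves the gene IDs into the row names, every remaining column is numeric. As a result, `as.matrix()` yields a numeric matrix rather than a character one.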

6
Step 2 Calculate Distance Matrix
  • The default dist() method in R computes distances
    between rows, but we want the distances between
    samples, i.e., the columns of our matrix
  • MD Anderson provides a handy package to help us,
    called oompaBase
  • source("http://bioinformatics.mdanderson.org/OOMPA/oompaLite.R")
  • oompaLite()
  • oompainstall(groupName="all")
  • Once installed, be sure to locally activate the
    libraries
  • library(oompaBase)
  • library(ClassDiscovery)
  • library(ClassComparison)
  • oompaBase also requires the mclust and cobs
    packages; download these from CRAN
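
Because `dist()` works row by row, one base-R way to get sample-to-sample distances is to transpose the matrix first with `t()`. The minimal sketch below uses a made-up 6-by-3 matrix and `dist()`'s default Euclidean distance.

```r
set.seed(42)
# Hypothetical toy expression matrix: 6 genes (rows) x 3 samples (columns)
exprs <- matrix(rnorm(18), nrow = 6,
                dimnames = list(paste0("gene", 1:6), c("s1", "s2", "s3")))

d_genes   <- dist(exprs)     # distances between rows (genes)
d_samples <- dist(t(exprs))  # transpose first: distances between samples

attr(d_genes, "Size")    # 6 objects compared
attr(d_samples, "Size")  # 3 objects compared
d_samples
```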

7
  • Use the function distanceMatrix() to create a
    distance matrix of your samples.
  • Uses the expression matrix created in Step 1 as
    input
  • Remember that there are many different types of
    distance metrics to choose from!
  • See help(distanceMatrix)
  • > x <- distanceMatrix(exprs, 'pearson')
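
If the OOMPA packages are not available, a similar Pearson-based distance can be sketched in base R. `cor()` on a matrix correlates its columns (the samples), and `as.dist()` turns a square matrix into a `dist` object. The transform 1 - r is an assumption here, and `distanceMatrix()`'s exact definition may differ (see `help(distanceMatrix)`). The data are simulated.

```r
set.seed(7)
# Hypothetical toy expression matrix: 10 genes (rows) x 4 samples (columns)
exprs <- matrix(rnorm(40), nrow = 10,
                dimnames = list(paste0("gene", 1:10), paste0("s", 1:4)))

# cor() on a matrix correlates its COLUMNS, i.e. the samples
r <- cor(exprs, method = "pearson")

# Assumed transform: distance = 1 - correlation (0 = identical profiles, 2 = opposite)
x <- as.dist(1 - r)
x
```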

8
Step 3 Cluster
  • Use the hclust() function to create a
    hierarchical cluster based on your distance
    matrix, x, created in Step 2.
  • > y <- hclust(x, method="complete")
  • > plot(y)
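
Steps 1 to 3 can be put together in an end-to-end sketch in base R on simulated data. The six samples and their two planted groups are made up. The distance is assumed to be 1 - Pearson correlation; swap in `distanceMatrix(exprs, 'pearson')` if you use ClassDiscovery. `cutree()` then cuts the dendrogram into a chosen number of groups.

```r
set.seed(123)
# Hypothetical data: 20 genes x 6 samples.
# Samples 1-3 follow one profile, samples 4-6 the mirror-image profile.
base_profile <- rnorm(20)
exprs <- cbind(sapply(1:3, function(i) base_profile + rnorm(20, sd = 0.1)),
               sapply(1:3, function(i) -base_profile + rnorm(20, sd = 0.1)))
colnames(exprs) <- paste0("s", 1:6)

x <- as.dist(1 - cor(exprs))          # sample-to-sample distance (assumed 1 - r)
y <- hclust(x, method = "complete")   # agglomerative clustering, complete linkage
plot(y)                               # draw the dendrogram

groups <- cutree(y, k = 2)            # cut the tree into 2 clusters
groups
```

With `method = "complete"` the linkage is the largest pairwise distance between two clusters. `"average"` and `"single"` are other `hclust()` options and can give differently shaped trees on the same distance matrix.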

9
Testing for Differential Gene Expression with the t-test
10
  • Get the multtest package from Bioconductor
  • Package contains data from the Golub leukemia
    microarray data set (ALL vs AML)
  • 38 arrays
  • 27 from acute lymphoblastic leukemia (ALL)
  • 11 from acute myeloid leukemia (AML)

http://people.cryst.bbk.ac.uk/wernisch/macourse/
11
  • library(multtest)
  • data(golub)
  • golub.cl (class labels: 0 = ALL, 1 = AML)
  • Generate the (equal-variance) t statistic
  • teststat <- mt.teststat(golub, golub.cl,
    test="t.equalvar")
  • Convert into P-values
  • rawp0 <- 2*pt(abs(teststat), lower.tail=FALSE,
    df=38-2)
  • Correct for multiple testing and show the ten
    most significant genes
  • procs <- c("Bonferroni", "BH")
  • res <- mt.rawp2adjp(rawp0, procs)
  • res$adjp[1:10, ]
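
The same workflow can be sketched in base R without multtest. The data below are simulated stand-ins for `golub` and `golub.cl`: 200 genes, 27 vs 11 arrays, and ten genes given a planted shift. Base R's `t.test(var.equal = TRUE)` and `p.adjust()` are used in place of `mt.teststat()` and `mt.rawp2adjp()`.

```r
set.seed(2024)
# Hypothetical stand-in for golub / golub.cl: 200 genes x 38 arrays,
# class 0 (27 arrays) vs class 1 (11 arrays)
cl <- c(rep(0, 27), rep(1, 11))
X <- matrix(rnorm(200 * 38), nrow = 200)
X[1:10, cl == 1] <- X[1:10, cl == 1] + 3   # genes 1-10 are truly differential

# Equal-variance two-sample t statistic for each gene (row)
teststat <- apply(X, 1, function(g)
  t.test(g[cl == 1], g[cl == 0], var.equal = TRUE)$statistic)

# Two-sided p-values from the t distribution with n - 2 = 36 degrees of freedom
rawp0 <- 2 * pt(abs(teststat), df = length(cl) - 2, lower.tail = FALSE)

# Multiple-testing corrections with base R's p.adjust()
adjp <- cbind(rawp       = rawp0,
              Bonferroni = p.adjust(rawp0, method = "bonferroni"),
              BH         = p.adjust(rawp0, method = "BH"))

# Ten most significant genes, smallest raw p-value first
top <- order(rawp0)[1:10]
adjp[top, ]
```

Bonferroni controls the family-wise error rate and is the more conservative correction. BH controls the false discovery rate, so its adjusted p-values are never larger than Bonferroni's.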
