BIOS816/VBMS818 Lecture 8 - PowerPoint PPT Presentation

Slides: 31
Provided by: bioinfo5
Transcript and Presenter's Notes

1
BIOS816/VBMS818 Lecture 8 Microarray Analysis
  • Guoqing Lu
  • Office: E115 Beadle Center
  • Tel: (402) 472-4982
  • Email: glu3_at_unl.edu
  • Website: http://biocore.unl.edu

2
Introduction to DNA Microarray
  • Microarrays revolutionized biological and
    medical research
  • Previously one gene was studied at a time; now
    tens of thousands are measured simultaneously
  • Gene expression
  • Measure the expression levels of many thousands
    of genes in only a few biological samples
  • E.g., sample from specific organ to show which
    genes are expressed
  • E.g., compare samples from healthy and sick host
    to find gene-disease connection
  • E.g., use probes for sets of human pathogens to
    detect disease

3
Introduction to DNA Microarray (cont'd)
  • Replicates are needed
  • Technical replicates, i.e. measuring gene
    expression with the same starting material on
    independent arrays
  • Biological replicates, e.g. measuring gene
    expression from multiple cell lines
  • The challenge to the biologist is to apply
    appropriate statistical techniques to determine
    which changes are relevant

Affymetrix U133 Plus 2.0: 47,000 x 11 x 2
4
Microarray Experiment
  • http://www.bio.davidson.edu/courses/genomics/chip/chip.html

Flash Animation
5
Microarray Technology
  • Basic principle is the same
  • DNA complementary to genes of interest is
    generated and laid out in microscopic quantities
    on solid surfaces at defined positions
  • DNA from samples is washed over the surface, and
    complementary DNA binds (hybridizes)
  • Presence of bound DNA is detected by fluorescence
    following laser excitation

6
Two Different Techniques
  • Spotted cDNA
  • DNA sequences are laid down through spotting
  • Complete sequences are laid down
  • Cheaper
  • Usually measures relative expression in two
    samples
  • E.g., Synteni/Stanford
  • Oligonucleotide arrays
  • DNA sequences are laid down through
    photolithography
  • A series of fragments is laid down
  • Probably give higher quality results
  • Usually measures expression in a single sample
  • Mainly supplied by Affymetrix Inc.

7
Spotted cDNA
  • Uses available cDNA libraries to create the array
  • Quality depends on choice of cDNAs
  • Cross hybridization and non-specific binding can
    be a problem
  • Reduce cross hybridization by choosing highly
    gene specific DNA for the spots

8
Oligonucleotide arrays
  • Each gene is represented by 11-20 paired oligos
  • Each represents a different part of the gene
  • Oligos are produced in situ on the chip
  • Each pair comprises two 25-mers
  • Perfect match (PM)
  • Mismatch (MM)

Affymetrix uses a unique combination of
photolithography and combinatorial chemistry to
manufacture GeneChip Arrays.
http://www.affymetrix.com/technology/manufacturing/index.affx
9
PM and MM oligos
PM: ATCTGCGTGTCGTAGTGTGACCCCA
MM: ATCTGCGTGTCGAAGTGTGACCCCA
  • By measuring the difference in hybridization
    to the PM and MM oligos, the effect of
    non-specific and cross hybridization is minimised
  • Using 11-20 pairs from different parts of the
    gene, these effects are further reduced

https://www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf
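
As a rough illustration of the PM/MM idea on this slide, the Python sketch below finds the mismatched base in the probe pair shown above and averages the PM minus MM differences over a probe set. The function names and the intensity values are made up for illustration, and the plain average difference is an assumption, not the signal algorithm Affymetrix actually uses.

```python
PM = "ATCTGCGTGTCGTAGTGTGACCCCA"
MM = "ATCTGCGTGTCGAAGTGTGACCCCA"


def mismatch_positions(pm, mm):
    """Return the 1-based positions where two equal-length oligos differ."""
    if len(pm) != len(mm):
        raise ValueError("PM and MM oligos must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(pm, mm)) if a != b]


def average_difference(pm_intensities, mm_intensities):
    """Mean of (PM - MM) over all probe pairs of one gene."""
    diffs = [p - m for p, m in zip(pm_intensities, mm_intensities)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for three probe pairs of one gene.
pm_values = [1200.0, 950.0, 1430.0]
mm_values = [300.0, 410.0, 380.0]

print(len(PM), mismatch_positions(PM, MM))         # 25 [13]
print(average_difference(pm_values, mm_values))    # 830.0
```

The two 25-mers differ only at the central (13th) base, which is what makes the MM oligo a control for non-specific binding.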
10
Microarray Data Analysis
  • Data preprocessing
  • allow data sets from two (or more) samples to be
    compared to each other
  • Inferential statistics
  • hypothesis testing
  • the likelihood that particular genes are
    significantly regulated
  • Descriptive (exploratory) statistics
  • clustering and principal components analysis
  • inspect the complex data set for biologically
    meaningful patterns

11
Microarray data analysis
  • Begin with a data matrix (gene expression values
    versus samples)
  • Typically, there are many genes (>> 10,000) and
    few samples (~ 10)

12
Preprocessing: normalization
  • Normalization is needed
  • To compare signal intensities on two arrays
  • To compare two mRNA samples on the same array
  • Make sure the samples are equivalent in some sense

13
Normalization methods
  • Adjust spot values to show
  • Same total mRNA in all samples
  • Or the same expression level for certain
    housekeeping genes
  • Use spiked controls
  • Add equal amounts of a different mRNA to each
    sample and normalize to equalize intensity for
    these spots

cDNA Array
Oligo Array
Both Arrays
14
Data analysis: global normalization
  • Global normalization procedure
  • Step 1: subtract background intensity values (use
    a blank region of the array)
  • Step 2: globally normalize so that the average
    ratio = 1 (apply this to 1-channel or 2-channel
    data sets)
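
The two steps above can be sketched in Python for a two-channel array. This is a minimal sketch under stated assumptions: the function name, the single background value per channel, and the choice to drop spots at or below background are all illustrative. In practice the scaling is often done on log ratios instead.

```python
def global_normalize(red, green, red_background, green_background):
    """Background-subtract both channels, then rescale so the mean ratio is 1."""
    ratios = []
    for r, g in zip(red, green):
        r_corr = r - red_background      # Step 1: subtract background
        g_corr = g - green_background
        if r_corr <= 0 or g_corr <= 0:
            continue  # assumption: skip spots at or below background
        ratios.append(r_corr / g_corr)
    mean_ratio = sum(ratios) / len(ratios)
    # Step 2: divide by the mean so the average ratio becomes 1.
    return [ratio / mean_ratio for ratio in ratios]


# Hypothetical raw intensities for three spots.
red = [600.0, 1100.0, 2100.0]
green = [300.0, 600.0, 1100.0]
normalized = global_normalize(red, green, 100.0, 100.0)
print(normalized)
```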

Do Exercise!
Affymetrix scaling!!!
15
Scatter plots
  • Useful to represent gene expression values from
    two microarray experiments (e.g. control,
    experimental)
  • Each dot corresponds to a gene expression value
  • Most dots fall along a line
  • Outliers represent up-regulated or down-regulated
    genes
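
One way to pick out the outliers on such a scatter plot is to compute a log2 ratio per gene and flag the genes that lie far from the diagonal. The sketch below assumes a two-fold cutoff, which is a common convention but not something this lecture specifies; the gene names and values are hypothetical.

```python
import math


def flag_outliers(control, experimental, fold_threshold=2.0):
    """Label each gene 'up', 'down', or 'unchanged' by its fold change."""
    cutoff = math.log2(fold_threshold)
    labels = {}
    for gene in control:
        log_ratio = math.log2(experimental[gene] / control[gene])
        if log_ratio >= cutoff:
            labels[gene] = "up"
        elif log_ratio <= -cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical expression values from two arrays.
control = {"geneA": 100.0, "geneB": 200.0, "geneC": 400.0}
experimental = {"geneA": 110.0, "geneB": 800.0, "geneC": 100.0}
labels = flag_outliers(control, experimental)
print(labels)
```

A fold-change cutoff alone ignores variability between replicates, which is why statistical testing is also needed.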

16
Inferential statistics
  • Inferential statistics are used to make
    inferences about a population from a sample.
  • Hypothesis testing is a common form of
    inferential statistics
  • A null hypothesis is stated, such as "There is
    no difference in signal intensity for the gene
    expression measurements in normal and diseased
    samples."
  • The alternative hypothesis is that there is a
    difference.
  • We use a test statistic to decide whether to
    reject the null hypothesis. For many
    applications, we set the significance level α to
    0.05 and reject the null hypothesis when p < 0.05.

17
Inferential statistics
Paradigm                       Parametric test    Nonparametric
Compare two unpaired groups    Unpaired t-test    Mann-Whitney test
Compare two paired groups      Paired t-test      Wilcoxon test
Compare 3 or more groups       ANOVA
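
As an example of the first row of the table, the sketch below computes the unpaired (Student's) t statistic for one gene using only the Python standard library. It assumes equal variances in the two groups, and the expression values are hypothetical. Turning the statistic into a p-value needs the t distribution, for which a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used.

```python
import math
from statistics import mean, variance


def unpaired_t_statistic(group1, group2):
    """Student's t statistic and degrees of freedom for two unpaired groups."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance, assuming both groups share the same variance.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical expression values for one gene.
normal = [1.0, 2.0, 3.0]
diseased = [4.0, 5.0, 6.0]
t, df = unpaired_t_statistic(normal, diseased)
print(round(t, 3), df)   # -3.674 4
```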
18
Significance analysis of microarrays (SAM)
  • SAM
  • an Excel plug-in
  • modified t-test
  • adjustable false discovery rate

http://www-stat.stanford.edu/~tibs/SAM/
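
The "modified t-test" in SAM adds a small positive constant s0 to the denominator of the t statistic, so that genes with very small variance do not produce inflated scores. The sketch below shows only that statistic. SAM chooses s0 from the data and estimates the false discovery rate by permutation, neither of which is shown here; the value of s0 and the data are hypothetical.

```python
import math
from statistics import mean


def sam_d_statistic(group1, group2, s0):
    """Relative difference d = (mean1 - mean2) / (s + s0) for one gene."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    sum_sq = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    # Gene-specific scatter: the pooled standard error of the difference.
    s = math.sqrt((1 / n1 + 1 / n2) * sum_sq / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


# Hypothetical expression values for one gene.
diseased = [4.0, 5.0, 6.0]
normal = [1.0, 2.0, 3.0]
d_plain = sam_d_statistic(diseased, normal, s0=0.0)   # ordinary t statistic
d_sam = sam_d_statistic(diseased, normal, s0=0.5)     # shrunk toward zero
print(round(d_plain, 3), round(d_sam, 3))
```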
19
SAM
Figure: SAM plot of observed versus expected scores,
with up-regulated and down-regulated genes marked
20
Descriptive statistics
  • Microarray data are high-dimensional: there are
    many thousands of measurements made from a small
    number of samples.
  • Descriptive (exploratory) statistics help you to
    find meaningful patterns in the data.
  • A first step is to arrange the data in a matrix.
  • Next, use a distance metric to define the
    relatedness of the different data points.
  • Two commonly used distance metrics are:
  • Euclidean distance
  • Pearson correlation coefficient
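
The two metrics can behave very differently on the same data, as the following Python sketch shows. The expression profiles are hypothetical: the two genes follow the same pattern across four samples but at different magnitudes.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Two hypothetical genes with the same shape but different magnitude.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [10.0, 20.0, 30.0, 40.0]
print(euclidean_distance(gene1, gene2))    # large: the profiles are far apart
print(pearson_correlation(gene1, gene2))   # close to 1: the shapes match
```

Since correlation is a similarity rather than a distance, clustering tools commonly use 1 minus the correlation as the distance.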

21
Descriptive statistics: clustering
  • Clustering algorithms offer useful visual
    descriptions of microarray data.
  • Genes may be clustered, or samples, or both.
  • This may be agglomerative (building up the
    branches of a tree, beginning with the two most
    closely related objects) or divisive (building
    the tree by finding the most dissimilar objects
    first).
  • In each case, we end up with a tree having
    branches and nodes.
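
The agglomerative strategy can be sketched in a few lines of Python. This is a minimal sketch, not the algorithm of any particular package: it assumes single linkage (the distance between two clusters is the distance between their closest members) and uses five hypothetical one-dimensional objects named a to e.

```python
def agglomerative_merges(points):
    """Single-linkage agglomerative clustering; returns the merges in order."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(points[p] - points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append((sorted(merged), d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical one-dimensional values for five objects.
points = {"a": 1.0, "b": 1.5, "c": 5.0, "d": 7.0, "e": 7.75}
merges = agglomerative_merges(points)
for members, distance in merges:
    print(members, distance)
```

Each merge corresponds to one node of the tree, and the merge distance gives the height of that node. A divisive method would produce a tree by splitting the full set instead.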

22
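Figure: tree of five objects (a, b, c, d, e) over
steps 0 to 4, with intermediate clusters {a,b}, {d,e},
and {c,d,e}, and the full set {a,b,c,d,e}.
Agglomerative clustering reads the steps from 0 to 4;
divisive clustering reads them from 4 back to 0.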
23
Cluster and TreeView
  • Perform a variety of types of cluster analysis
    and other types of processing on large microarray
    datasets
  • Clustering
  • K-means
  • SOM
  • PCA

http://rana.lbl.gov/EisenSoftware.htm
24
Cluster and TreeView
http://rana.lbl.gov/manuals/ClusterTreeView.pdf
25
Cluster and TreeView
26
Two-way clustering of genes (y-axis) and cell
lines (x-axis) (Alizadeh et al., 2000)
27
K-means clustering
  • Clusters the expression profiles into K clusters
  • You have to specify K
  • Produces clusters that are as tight as possible
  • Each cluster has a centroid or mean expression
    profile
  • Tightness of clusters measured by the sum of
    squared distances between each gene and the
    centroid of its cluster
  • Algorithm tries to minimise this sum
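
A minimal Python sketch of the usual iterative procedure (often called Lloyd's algorithm) is shown below. The function names, the fixed number of iterations, and the choice of starting centroids are assumptions made for illustration; real tools usually pick starting centroids at random and repeat the run several times.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(profiles, initial_centroids, iterations=10):
    """Return cluster assignments, centroids, and within-cluster sum of squares."""
    centroids = [list(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(profiles, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    # Tightness: sum of squared distances to each profile's own centroid.
    within_ss = sum(squared_distance(p, centroids[a])
                    for p, a in zip(profiles, assignments))
    return assignments, centroids, within_ss


# Hypothetical expression profiles (two samples per gene), with K = 2.
profiles = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
assignments, centroids, within_ss = k_means(profiles, [profiles[0], profiles[2]])
print(assignments, centroids, within_ss)
```

The result depends on K and on the starting centroids, so the procedure finds a local minimum of the sum, not necessarily the best possible clustering.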

28
Self-organizing maps (SOM)
  • Unlike k-means clustering, which is unstructured,
    SOMs allow one to impose partial structure on the
    clusters.
  • The principle of SOMs
  • One chooses an initial geometry of nodes such
    as a 3 x 2 rectangular grid (indicated by solid
    lines in the figure connecting the nodes).
  • Hypothetical trajectories of nodes as they
    migrate to fit data during successive iterations
    of SOM algorithm are shown.
  • Data points are represented by black dots, six
    nodes of SOM by large circles, and trajectories
    by arrows.
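
The migration of nodes toward the data can be sketched as follows. This is a simplified illustration, not the algorithm of any specific SOM package: the learning-rate schedule, the neighbourhood rule, the starting positions, and the data are all assumptions.

```python
import random


def train_som(data, rows=2, cols=3, iterations=200, seed=0):
    """Train a small rectangular SOM; returns {(row, col): node position}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Assumption: start every node at a randomly chosen data point.
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        fraction_left = 1.0 - t / iterations
        learning_rate = 0.5 * fraction_left        # shrinks over time
        radius = 1 if fraction_left > 0.5 else 0   # neighbourhood shrinks too
        point = rng.choice(data)
        # Winner: the node currently closest to the chosen data point.
        winner = min(nodes, key=lambda rc: sum((a - b) ** 2
                                               for a, b in zip(nodes[rc], point)))
        for (r, c), position in nodes.items():
            grid_distance = abs(r - winner[0]) + abs(c - winner[1])
            if grid_distance <= radius:
                # Grid neighbours move too, but less than the winner.
                step = learning_rate / (1 + grid_distance)
                for i in range(dim):
                    position[i] += step * (point[i] - position[i])
    return nodes


# Hypothetical two-dimensional data with three loose groups.
data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.5], [9.1, 0.4]]
som_nodes = train_som(data)
for grid_position, location in sorted(som_nodes.items()):
    print(grid_position, [round(v, 2) for v in location])
```

Because neighbouring nodes on the grid are pulled along with the winner, nodes that are adjacent on the grid tend to end up representing similar clusters. This is the partial structure that k-means lacks.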

29
Microarray Software
30
Exercise
  • http://pevsnerlab.kennedykrieger.org/hinxton.html
  • Thanks to Dr. Jonathan Pevsner