Title: Introduction into Micro Array Analysis
1Introduction into Micro Array Analysis
- Winfried Krueger, Ph. D.
- UCHC M.A.C.
2Analysis of Single Nucleotide Polymorphisms
3Genotyping through Oligonucleotide Micro Arrays
Photolithographic mask
0.8 cm
Array Production
gene specific arrayed oligonucleotides
Target synthesis
4cDNA vs Oligonucleotide Arrays
Feature                                   | cDNA arrays      | Oligonucleotide arrays
Probe size                                | 400 to > 2000 bp | 25 - 70 nts
GC content among and within probes        | variable         | predetermined
Probe dependence on labeling protocol     | no               | yes
Access to tissue specific arrays          | fast             | slow
Spotting amounts with simple chemistry    | 100-200 ng/ml    | 1 mg/ml
Binding chemistry (low probe amounts)     | simple           | complex
Expression profiles for sequenced genes   | yes              | yes
Expression profiles for part. seq. genes  | yes              | no
Direct detection of SNPs                  | no               | yes
Enzyme mediated detection of SNPs         | yes              | yes
5eSensor Nanotechnology based Switch
6- Basic Concepts in Molecular Biology
- Introduction into Micro Array Methodology
- Image and Data Analysis
7Micro Array Imaging
- Micro array imaging methods
- Phosphoimaging (33P)
- Fluorescence imaging
- - Confocal
- - Epifluorescence microscopy
- - CCD technology
8Array Analysis of P2 brain mRNA
Cy3 channel P2 RNA
Cy5 channel P2 RNA
9Micro Array Analysis
- Micro array data analysis consists of two main procedures
- Image analysis
- Pattern recognition
- - Data normalization
- - Data curation
- - Cluster analysis
10Generation of a simplified Expression Profile
Profile
Series of transcriptogram
11Comparative Analysis of cDNA Microarrays
- Microarrays contain large amounts of system specific expression data
- Linkage of related experimental systems increases the information value of each individual experiment and reduces redundancy
- Replacement of one experimental condition with a unified reference (RNA sample or general oligonucleotide) provides the standardization required for comparative analyses through use of
- - Microarray image databases (AMAD/NOMAD etc.)
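A minimal sketch of why a unified reference standardizes comparisons, assuming each array reports a log2 ratio of sample to reference: the reference cancels when two such ratios are subtracted. The intensities and function names are hypothetical and only illustrate the arithmetic.

```python
import math

def log_ratio(sample, reference):
    """log2(sample / reference) for one spot."""
    return math.log2(sample / reference)

def indirect_comparison(a_vs_ref, b_vs_ref):
    """Compare samples A and B through a shared reference.

    log2(A/R) - log2(B/R) = log2(A/B), so the reference cancels.
    """
    return a_vs_ref - b_vs_ref

# Hypothetical background-corrected intensities for one gene.
a, b, ref = 800.0, 200.0, 400.0
m_a = log_ratio(a, ref)   # array 1: sample A vs reference
m_b = log_ratio(b, ref)   # array 2: sample B vs reference
m_ab = indirect_comparison(m_a, m_b)
print(m_ab)  # log2 ratio of A to B, although A and B never shared an array
```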
12Image Analysis
15Selected Methods for Background Computation
Standard background determination
Radial background determination
Autocorrecting background determination
16Image analysis - General Flowdiagram
Compute pixel intensities across ROI (defined manually or by grid file) -> compute signal area
Compute pixel intensities across preset square background area -> compute corrected signal intensities per pixel
Compute pixel intensities across preset radial area 1 -> compute corrected signal intensities per pixel
Compute background intensity from area 2 and area 1 intensities -> compute corrected signal intensities per pixel
Compute pixel intensities across preset radial area 2 -> compute corrected signal intensities per pixel
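The radial branch of the flow diagram can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of any particular image analysis package: the image is a plain list of rows, the background is taken as the median of the ring between two radii, and all function names and the synthetic image are hypothetical.

```python
import math
from statistics import median

def spot_pixels(image, cx, cy, radius):
    """Pixels whose center lies within `radius` of the spot center."""
    return [image[y][x]
            for y in range(len(image))
            for x in range(len(image[0]))
            if math.hypot(x - cx, y - cy) <= radius]

def annulus_pixels(image, cx, cy, r_inner, r_outer):
    """Pixels in the ring between radial area 1 and radial area 2."""
    return [image[y][x]
            for y in range(len(image))
            for x in range(len(image[0]))
            if r_inner < math.hypot(x - cx, y - cy) <= r_outer]

def corrected_signal(image, cx, cy, r_signal, r_inner, r_outer):
    """Mean signal per pixel minus the median of the surrounding ring."""
    signal = spot_pixels(image, cx, cy, r_signal)
    background = annulus_pixels(image, cx, cy, r_inner, r_outer)
    return sum(signal) / len(signal) - median(background)

# Synthetic 9x9 image: uniform background of 10 with a bright spot of 110.
image = [[10.0] * 9 for _ in range(9)]
for y in range(9):
    for x in range(9):
        if math.hypot(x - 4, y - 4) <= 2:
            image[y][x] = 110.0

print(corrected_signal(image, 4, 4, 2, 3, 4))
```

The square background variant differs only in which pixels are collected for the background estimate.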
17Autocorrecting Image analysis - Flowdiagram
Compare pixel intensities across ROI -> compute signal area
Determine spot morphology parameters of signal area -> center, radius, circumference
Determine intensity distributions across background and signal areas -> correlate with optimized distribution
Calculate aberration of experimental vs optimized signal distribution -> substitute outlying pixels with anticipated pixel intensities
Calculate corrected background and signal intensities -> compute corrected signal intensities per pixel
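The outlier substitution step can be illustrated with a simplified sketch. The slides do not specify the optimized distribution, so this example assumes the region median as the anticipated intensity and flags pixels that lie more than k scaled median absolute deviations away from it. The function names, the threshold and the data are all hypothetical.

```python
from statistics import median

def substitute_outliers(pixels, k=3.0):
    """Replace pixels far from the median with the median itself.

    'Far' means more than k scaled median absolute deviations (MAD);
    the median stands in for the anticipated intensity (assumption).
    """
    center = median(pixels)
    mad = median(abs(p - center) for p in pixels)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return list(pixels)
    return [center if abs(p - center) > k * scale else p for p in pixels]

def autocorrected_signal(signal_pixels, background_pixels, k=3.0):
    """Mean of the cleaned signal minus mean of the cleaned background."""
    sig = substitute_outliers(signal_pixels, k)
    bkg = substitute_outliers(background_pixels, k)
    return sum(sig) / len(sig) - sum(bkg) / len(bkg)

signal = [100.0, 102.0, 98.0, 101.0, 99.0, 400.0]  # 400 mimics a dust speck
background = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0]
print(substitute_outliers(signal))
print(autocorrected_signal(signal, background))
```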
19Comparative Analysis of an Array Image
P2VsP2 ImaGene
P2VsP2 Gleams
20Data Variability
21Data Curation
- Array inherent error sources interfere with accurate signal quantitation even if sophisticated image analysis algorithms are employed
- - Target synthesis effects (cDNA length, integrity and ribosomal cDNA contamination)
- - Labeling effects (labeling efficiencies)
- - Dye effects (higher incorporation rates for one dye)
- - Substrate effects (DNA binding capacity, coating homogeneity etc.)
- - Dye gene effects (preferential labeling of one gene with one dye)
- Data curation requires
- - Experimental approaches
- - Statistical analysis
22Considerations for Micro Array Experiments
- High throughput format/transcript coverage in micro array experiments is inversely correlated to experimental parameters that depend on the amount of substrate immobilized probe
- - Sensitivity (function of scanner, probe and target amounts)
- - Reproducibility (function of substrate, target and arrayer)
- - Signal/noise ratios (function of substrate, target, hybridization specificity and sequence)
- - Breadth (function of RNA abundance)
- Spot density reduction increases reliability and forces the development of customized clonesets
23Experimental Approaches to Data Curation
- Replicate and flip dye experiments
- Competitive hybridization or simultaneous hybridization of both experimental and control samples
- - A second experimental condition
- - Universal control RNA/oligonucleotide from a large mixture of different cell types
- - Simultaneous hybridization to chip immobilized sense probes and scrambled sense probes
- Internal controls
- - Arraying of replicate probes within each array
- - Incorporation of genes with constant expression levels into the micro array
- - Arraying of heterologous cDNAs and spiking of target RNA with respective in vitro transcripts for positive/negative controls and for normalization of the fluorescence intensities
- - Arraying of sense oligonucleotides and randomized sense oligonucleotides
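Two of the approaches above lend themselves to a short numerical sketch: normalization against control spots with constant expression, and combination of a forward array with its flip dye replicate. The code assumes log2(Cy5/Cy3) ratios per spot; the function names and numbers are hypothetical.

```python
import math
from statistics import median

def log_ratios(cy5, cy3):
    """Per-spot log2(Cy5 / Cy3)."""
    return [math.log2(r / g) for r, g in zip(cy5, cy3)]

def normalize_to_controls(m_values, control_idx):
    """Shift log ratios so the control spots have a median of zero."""
    offset = median(m_values[i] for i in control_idx)
    return [m - offset for m in m_values]

def dye_swap_average(m_forward, m_swapped):
    """Combine a forward array and its flip dye replicate.

    In the swapped array the sample sits in the other channel, so its log
    ratio changes sign while a gene specific dye bias does not; half the
    difference keeps the biological effect and cancels that bias.
    """
    return [(f - s) / 2 for f, s in zip(m_forward, m_swapped)]

# Spots 0 and 1 are assumed constant-expression controls.
m = log_ratios([400.0, 200.0, 800.0], [200.0, 100.0, 100.0])
print(normalize_to_controls(m, control_idx=[0, 1]))

# True log ratios 1, 0, -1 with a dye bias of +0.5 on every spot.
forward = [1.5, 0.5, -0.5]
swapped = [-0.5, 0.5, 1.5]
print(dye_swap_average(forward, swapped))
```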
24Statistical Methods for Data Curation
- Currently implemented statistical methods for data significance calculation
- - Holm's p-value
- - t-test, Bayesian t-test (2 categories)
- - Mann-Whitney test (2 categories)
- - ANOVA (> 2 categories; Kerr and Churchill)
- - Kruskal-Wallis test (> 2 categories)
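Because a p-value is computed for every gene on the array, the raw values need a multiple testing adjustment. The following is a minimal sketch of the Holm step-down adjustment in plain Python; the function name and the example p-values are illustrative and not taken from the slides.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
print(holm_adjust(raw))
```

A gene is then called significant when its adjusted value falls below the chosen threshold, for example 0.05.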
25Pattern Recognition
- Micro array image data specify genes for further functional investigation based on their regulation.
- Filtering of gene groups through pattern recognition
- - Expression profiling
- Expression profiles represent fingerprints characteristic for the transcriptional state of cells
- The significance of expression profiles is measured as the probability for their non random occurrence, specified through a P (probability) value. This value is generally computed through massive permutation of data points within and across replicate experiments.
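A permutation based P value can be sketched for the simplest case: one gene measured in two groups of replicates. The choice of test statistic (difference of means), the number of permutations and the data are assumptions made for this illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)

control = [0.1, -0.2, 0.0, 0.15, -0.05]   # hypothetical log ratios
treated = [1.9, 2.1, 2.0, 1.8, 2.2]
print(permutation_p_value(control, treated))
```

With five replicates per group only 252 distinct label assignments exist, which limits how small the P value can become regardless of the number of permutations drawn.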
26Pattern recognition - Cluster Analysis
- Cluster analysis is a method of organizing genes (objects) into groups whose members demonstrate similar expression profiles (relatedness)
- Genes within a group demonstrate expression profiles more similar among the members of that group than among members of different groups
- Clustering procedures are predominantly based on hierarchical clustering or partitioning
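Relatedness between two expression profiles has to be expressed as a number before genes can be grouped. One common choice, used here as an assumption since the slides do not name a metric, is a distance derived from the Pearson correlation. The gene profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles.

    A constant profile has zero variance and would divide by zero here.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_distance(x, y):
    """0 for identical profile shapes, 2 for perfectly opposite ones."""
    return 1.0 - pearson(x, y)

gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # same shape, larger amplitude
gene_c = [4.0, 3.0, 2.0, 1.0]   # opposite regulation
print(correlation_distance(gene_a, gene_b))
print(correlation_distance(gene_a, gene_c))
```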
27Interpretation of Primary Data by Cluster Analysis
Primary microarray data
- yield near quantitative transcriptional and (indirectly) translational activation data
- specify genes for further functional characterization on the basis of compelling expression profiles
but most genes are functionally uncharacterized, and a targeted research approach requires further data analysis
- Cluster analysis
- - genes with similar expression profiles often participate in related biochemical pathways (guilt by association principle)
28Cluster Analysis - Definition
- Cluster analysis is a method of organizing genes (objects) into groups whose members demonstrate similar expression profiles (relatedness)
- Genes within a group demonstrate expression profiles more similar among the members of that group than among members of different groups
- Clustering procedures are predominantly based on hierarchical clustering or partitioning
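The hierarchical approach can be sketched as a small agglomerative procedure. This illustration assumes Euclidean distance and average linkage as the merge criterion; other distances and linkage rules are equally possible, and the profiles are hypothetical.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of profile indices) and the merge
    history, where each entry corresponds to one node of the tree.
    """
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > n_clusters:
        pairs = [(average_linkage(clusters[a], clusters[b], profiles), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        merged = clusters[a] + clusters[b]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

profiles = [[0.0, 0.0, 0.0], [0.2, 0.0, 0.1], [5.0, 5.0, 5.0], [5.1, 4.9, 5.0]]
clusters, merges = agglomerate(profiles, n_clusters=2)
print(clusters)
print(merges)
```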
29Types of Cluster Algorithms
A) Hierarchical clustering
B) Partitional clustering (traditional methods)
- Nearest neighbor clustering
- K-means
- Self organizing maps (SOM)/unsupervised neural network
C) Partitional clustering (hypergraph based methods)
- Support vector machines
- C4.5 decision tree algorithms
30Hierarchical and K-means/SOM Cluster Analysis
Hierarchical
K-means/SOM
- K-means and SOMs calculate a mean of group members (prototype) for each cluster and define a pattern according to the similarity between prototypes
- K-means uses a predefined number of clusters for pattern definition
- SOMs maximize the similarity of group members within each cluster and then minimize the number of clusters that create the single partition
- Hierarchical clustering finds a sequence of groups by merging two groups at every step according to some criterion, and each node represents a cluster
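A minimal K-means sketch shows the role of the prototype and of the predefined number of clusters. It assumes squared Euclidean distance and random initial prototypes drawn from the data; the function names, seed and profiles are hypothetical.

```python
import random

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean_profile(members):
    """Component-wise mean of a list of profiles (the cluster prototype)."""
    return [sum(col) / len(members) for col in zip(*members)]

def k_means(profiles, k, n_iter=100, seed=0):
    """Assign each profile to its nearest prototype, then update prototypes."""
    rng = random.Random(seed)
    prototypes = rng.sample(profiles, k)  # k is fixed in advance
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k),
                          key=lambda c: squared_distance(p, prototypes[c]))
                      for p in profiles]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old prototype if a cluster empties
                prototypes[c] = mean_profile(members)
    return labels, prototypes

profiles = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [5.0, 5.0], [5.2, 4.9], [4.9, 5.3]]
labels, prototypes = k_means(profiles, k=2)
print(labels)
print(prototypes)
```

The result depends on the initial prototypes when groups are less clearly separated than in this example, which is one reason K-means is often run several times with different seeds.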
31Hypergraph based Clustering Methods
Similarity is determined by association rules in an n-dimensional hyperspace, not by Euclidean distance between data points
- implementation complex
- number of clusters independent of data set (multidimensional scaling)
- avoid data that are far from mean (K-means) or binary groups with highly dissimilar data (hierarchical)
- maximize information
32Cluster Analysis generates Molecular Signatures
time, cell types etc.
genes
subgroups
33Requirement for Improved Data Analysis
What is the biological significance of gene expression profiling?
- Conventional cluster analysis identifies groups of genes with similar transcriptional activation profiles
- Significance of cluster association for individual members only by the guilt by association principle
- Association does not reflect true linkage to biochemical pathways
- Development of a resource that links expression profiling with genetic networks and biochemical pathways
34Databases with Information pertinent to
Microarray Experiments
Clone Repositories
PubMed
Sequence DBs
I.M.A.G.E./dbest
N.H.G.R.I./HGR
Entrez
M.G.S.C./MGR
Gene DBs
Genome Mapping DBs
UniGene
RefSeq
Locuslink
InterPro
HomoloGene
TransCompel/Transfac(BioBase)
TCluster @ States-UMich
ProTSite @ CBIL-UPenn
Ms Genome DB (MGD) @ Jax.
Ensembl DBs @ EBI
Genome DBs @ T.I.G.R.
Genome DBs @ UCSC
HuGenome DB @ UToronto (hGDB)
Pathway DBs
TransPath(BioBase)
K.E.G.G.
BioCarta
35Association of Array and Annotation Data
upload array data to AIDB
Query Editor