Title: Analysis of High-throughput Gene Expression Profiling
1Analysis of High-throughput Gene Expression
Profiling
2Why Measure Gene Expression
- 1. Determines which genes are induced/repressed in response to a developmental phase or to an environmental change.
- 2. Sets of genes whose expression rises and falls under the same condition are likely to have a related function.
- 3. Features such as a common regulatory motif can be detected within co-expressed genes.
- 4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation.
- A useful tool for cancer diagnosis
3Traditional vs. High-throughput Approaches
Why Measure Gene Expression on a Large Scale?
4Techniques Used to Detect Gene Expression Level
- Microarray (single or dual channel)
- SAGE
- EST/cDNA library
- Northern Blots
- Subtractive hybridisation
- Differential hybridisation
- Representational difference analysis (RDA)
- DNA/RNA Fingerprinting (RAP-PCR)
- Differential Display (DD-PCR)
- aCGH: array CGH (DNA level)
High-throughput
5Basic Information of Microarray, SAGE and cDNA
Library
6(DNA) Microarray
- 1. Developed around 1987.
- 2. Employs methods previously exploited in the immunoassay context: specific binding and marking techniques.
- 3. Two types of probes
- Format I: probe cDNA (500 to 5,000 bases long) is immobilized on a solid surface such as glass; widely considered to have been developed at Stanford University. Traditionally called DNA microarrays.
- Format II: an array of oligonucleotide (20 to 80-mer oligos) probes is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; developed at Affymetrix, Inc. Many companies are manufacturing oligonucleotide-based chips using alternative in situ synthesis or deposition technologies. Historically called DNA chips.
7Microarray
- Single channel: sub-type classification
- Dual channel: differential expression gene screening
- Tissue microarray
- Protein microarray
8Array CGH
- Detects DNA copy-number variation via a microarray approach
- A hot topic in recent research, especially in cancer research
9Microarray Analysis
Which genes are up-regulated, down-regulated,
co-regulated, not-regulated?
- gene discovery
- pattern discovery
- inferences about biological processes
- classification of biological processes
10SAGE
- An experimental technique designed to obtain a quantitative measure of gene expression.
- Tags of 10-20 bases are produced (immediately adjacent to the 3' end of the 3'-most NlaIII restriction site).
- The SAGE technique does not measure the expression level of a gene directly; it quantifies a "tag" that represents the transcription product of a gene.
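As a rough illustration of where a SAGE tag comes from, the sketch below locates the 3'-most NlaIII site (recognition sequence CATG) in a transcript and returns the bases immediately 3' of it. The function name, the default tag length of 10, and the example sequence are assumptions made for illustration; real pipelines also handle linker removal, ditags, and sequencing errors.

```python
from typing import Optional

NLAIII_SITE = "CATG"  # recognition sequence of the NlaIII anchoring enzyme


def extract_sage_tag(transcript: str, tag_length: int = 10) -> Optional[str]:
    """Return the tag immediately 3' of the 3'-most NlaIII site, or None.

    tag_length=10 is an illustrative default within the 10-20 base range.
    """
    seq = transcript.upper()
    site_start = seq.rfind(NLAIII_SITE)  # 3'-most occurrence of CATG
    if site_start == -1:
        return None  # no anchoring site, so no tag can be produced
    tag_start = site_start + len(NLAIII_SITE)
    tag = seq[tag_start:tag_start + tag_length]
    # A short tag means the site lies too close to the 3' end of the sequence.
    return tag if len(tag) == tag_length else None


# Hypothetical transcript with two CATG sites; only the 3'-most one is used.
print(extract_sage_tag("GGCATGAAACATGTTTTTCCCCCGGGAAAA"))
```

Counting how often each tag occurs in a library then gives the raw values that later slides normalize and compare.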
11SAGE
Tags are isolated and concatemerized. Relative
expression levels can be compared between cells
in different states.
12SAGEmap (http://cgap.nci.nih.gov)
13SAGE comparing two relational libraries
14EST library (UniGene)
15Gene expression info from UniGene Library
16An Example of In-house EST Library Analysis
17The Algorithms and Challenges of High-throughput
Gene Expression Analysis
18Seeing is believing?
No, need to correct errors.
19SAGE
- A typical experiment requires 30,000 gene expression comparisons between a normal cell and a diseased cell.
- The results are subject to the size and reliability of the SAGE libraries.
- Statistical measures are used to filter candidate genes and reduce the dimensionality of the data, but it is tedious and time-consuming to adjust these measures until a good set is found.
20SAGE
- TPM (tags per million): a simple normalization method
- TPM = Count × 1,000,000 / TotalCount
- Bayesian approach: http://cancerres.aacrjournals.org/cgi/content/full/59/21/5403
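The TPM normalization can be sketched in a few lines. The function name and the example library are hypothetical; the only formula used is tag count times 1,000,000 divided by the total tag count of the library.

```python
from typing import Dict


def tags_per_million(tag_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw SAGE tag counts to tags per million (TPM)."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 1_000_000 / total for tag, count in tag_counts.items()}


# Hypothetical library with 50 sequenced tags in total.
library = {"AAAAAAAAAA": 5, "CCCCCCCCCC": 15, "GGGGGGGGGG": 30}
print(tags_per_million(library))
```

Because every library is rescaled to the same nominal size, TPM values from libraries of different sequencing depth can be compared directly, although the scaling does not account for sampling noise in small libraries.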
21Microarray: Sources of Errors
[Figure; axes: log signal intensity, log RNA abundance]
22Sources of Errors (Cont.)
- Printing and/or tip problems
- Labeling and dye effects (differing amounts of RNA labeled between the 2 channels)
- Differences in the power of the two lasers (or other scanner problems)
- Difference in DNA concentration on arrays (plate effects)
- Spatial biases in ratios across the surface of the microarray due to uneven hybridization
- cDNA array cannot distinguish alternatively spliced forms
23Errors that cannot be corrected by statistics
- Competitive hybridization of different targets on the chip
- Failure to distinguish different splicing forms
- Misinterpretation of time course data when there are not sufficient points
- Misinterpretation of relative intensity
24Does clustered time course really mean
co-expression?
Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html
Yes, you can study a known system (such as the cell
cycle) this way, but what about unknown
systems?
25Normalization by iterative linear regression
- Fit a line (y = mx + b) to the data set
- Set aside outliers (residuals > 2 × s.e.)
- D. Finkelstein et al.
- http://www.camda.duke.edu/CAMDA00/abstracts.asp
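The two steps above (fit a line, set aside outliers, repeat) might look like the following minimal sketch. It is not the cited authors' implementation: reading "s.e." as the residual standard error of the retained points, the stopping rule, the iteration cap, and all function names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, my - m * mx


def iterative_linear_fit(xs, ys, n_se=2.0, max_iter=20):
    """Refit after setting aside points whose residual exceeds n_se * s.e.

    Returns (m, b, keep), where keep[i] is False for points set aside.
    Expects at least three points with non-identical x values.
    """
    keep = [True] * len(xs)
    m, b = fit_line(xs, ys)
    for _ in range(max_iter):
        residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
        kept = [r for r, k in zip(residuals, keep) if k]
        # Assumption: s.e. = residual standard error with n - 2 degrees of freedom.
        se = sqrt(sum(r * r for r in kept) / (len(kept) - 2))
        new_keep = [abs(r) <= n_se * se for r in residuals]
        if new_keep == keep or sum(new_keep) < 3:
            break  # converged, or too few points left to refit
        keep = new_keep
        m, b = fit_line([x for x, k in zip(xs, keep) if k],
                        [y for y, k in zip(ys, keep) if k])
    return m, b, keep
```

With x and y holding the log intensities of the two channels, one plausible use of the result is to rescale the second channel as (y - b) / m, so that the retained, presumably unchanged, genes fall on the identity line.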
26Normalization (Curvilinear)
G Tseng et al., NAR 2001
27After Normalization
- Differentially Expressed (DE) Gene screening
- T-test
- T-statistics
- SVM
- Clustering
- Hierarchical
- SOM
- K-means
- Network (Pathway) analysis
- BioCarta, KEGG, GO databases
- Bayesian network learning
- Topology
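As an illustration of t-statistic-based DE gene screening, the sketch below scores each gene by a two-sample t-statistic and ranks genes by its absolute value. The slides do not say which variant of the test is meant; Welch's unequal-variance form is an assumption, and the function names and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances assumed).

    Each group needs at least two values, and not all values may be constant.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(group_a, group_b):
    """Rank genes by |t|, largest first.

    group_a and group_b map gene name -> list of normalized expression values.
    """
    scores = {g: welch_t(group_a[g], group_b[g]) for g in group_a if g in group_b}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical normalized log-expression values for two conditions.
normal = {"geneA": [1.0, 1.2, 0.8], "geneB": [2.0, 2.5, 1.5]}
tumor = {"geneA": [3.0, 3.2, 2.8], "geneB": [2.1, 2.4, 1.8]}
for gene, t in rank_genes(normal, tumor):
    print(gene, round(t, 3))
```

Turning the statistics into P values requires the t distribution (for example via a statistics package), and with thousands of genes the P values need a multiple-testing correction before a cutoff is chosen.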
28Bioinformatics challenges
- 1. data management
- 2. utilizing data from multiple experiments
- 3. utilizing data from multiple groups
- with different technologies
- with only processed data available
29Bioinformatics for Integrated Analysis of
Gene Expression Profiling
30- Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression
- Daniel R. Rhodes et al. PNAS, 2004(101), 9309-9314
- T-test
- Q values (estimated false discovery rates) were calculated as Q = (P × n) / i, where P is the P value, n is the total number of genes, and i is the sorted rank of the P value.
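A Q-value calculation of this kind can be sketched as follows, assuming the estimate takes the rank-based form Q = P × n / i, which is consistent with the definitions of P, n, and i given here. Capping Q at 1.0 is an extra assumption added for readability, and the function name and example P values are hypothetical.

```python
def q_values(p_values):
    """Estimate false discovery rates as Q = P * n / i.

    n is the number of genes and i is the rank of each P value when sorted
    in ascending order (smallest P value has rank 1). Results are returned
    in the same order as the input.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda k: p_values[k])
    q = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Assumption: cap at 1.0 so that Q stays interpretable as a rate.
        q[idx] = min(1.0, p_values[idx] * n / rank)
    return q


# Hypothetical P values for four genes.
print(q_values([0.01, 0.04, 0.03, 0.5]))
```

Note that this plain form does not force Q to be monotone in P; step-up adjustments add that extra pass.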
31Cont. Meta-Profiling.
- The purpose of meta-profiling is to address the
hypothesis that a selected set of differential
expression signatures shares a significant
intersection of genes (a meta-signature), thus
inferring a biological relatedness.
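The idea of a meta-signature can be sketched by counting, for each gene, how many differential expression signatures contain it, and then asking how often random signatures of the same sizes share as many genes. This is only an illustration: the random-sampling null model, the function names, and the toy gene sets are assumptions, not the procedure of the cited study.

```python
import random
from collections import Counter


def meta_signature(signatures, min_signatures):
    """Genes present in at least min_signatures of the given signatures."""
    counts = Counter(gene for sig in signatures for gene in set(sig))
    return {gene for gene, c in counts.items() if c >= min_signatures}


def intersection_p_value(signatures, all_genes, min_signatures,
                         n_perm=1000, seed=0):
    """Fraction of random draws whose shared-gene count reaches the observed one.

    Assumption: each random signature is drawn uniformly, without replacement,
    from all_genes and has the same size as the real signature it replaces.
    """
    rng = random.Random(seed)
    universe = sorted(all_genes)
    observed = len(meta_signature(signatures, min_signatures))
    hits = 0
    for _ in range(n_perm):
        random_sigs = [rng.sample(universe, len(set(sig))) for sig in signatures]
        if len(meta_signature(random_sigs, min_signatures)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids P = 0
```

For example, three hypothetical signatures {A, B, C}, {B, C, D}, and {B, C, E} yield the meta-signature {B, C} when a gene is required in all three.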
3267 genes were screened by meta-analysis
33Integrated Cancer Gene Expression Map
347 genes were discovered by the system
35THANX!!