Analysis of Exon Arrays

Description: Analysis of Exon Arrays. Slides provided by Dr. Yi Xing ... Probes from 600 bps near 3' end. Probes from each putative exon. Probeset has 11 PM, 11 MM probes ...



1
Analysis of Exon Arrays
  • Slides provided by Dr. Yi Xing

2
Outline
  • Design of exon arrays
  • Background correction
  • Probe selection, expression index computation
  • Evaluation of gene level index
  • Exon level analysis
  • Conclusion

3
1. Basic design of Exon Array
4
Exon Array Probesets Classified by Annotation
Confidence
  • Core probesets target exons supported by RefSeq
    mRNAs.
  • Extended probesets target exons supported by ESTs
    or partial mRNAs.
  • Full probesets target exons supported purely by
    computational predictions.

5
2. Background modeling: predict non-specific
hybridization from probe sequence
  • Wu and Irizarry (2005) use probe effect modeling
    to obtain a more accurate expression index on 3'
    arrays
  • Johnson et al. (2006) use probe effect modeling
    to detect ChIP peaks on tiling arrays
  • Kapur et al. (2007) use probe effect modeling to
    correct background on exon arrays

6
Background modeling in Exon Arrays
  • Model: $\log B_i = \alpha n_{iT} + \sum_{j,k} \beta_{jk} I_{ijk} + \sum_{k} \gamma_k n_{ik}^2 + \epsilon_i$
  • Estimate parameters from either background
    probes (n = 37,687) or full probes (n = 400,000)
  • Test on a different array (with a single scaling
    constant)
  • Full probes are useful for modeling background
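
A minimal sketch of how a sequence-based background model of this kind could be fit by ordinary least squares. The slide does not spell out the summation ranges, so the feature layout here is an assumption borrowed from the Johnson et al. (2006) style of model: the T count, per-position indicators for A, C, and G (with T as the reference base), and squared counts of each base. The function names are hypothetical and are not part of any released software.

```python
import numpy as np

BASES = "ACGT"

def probe_features(seq):
    """Features for one probe: T count, per-position A/C/G indicators, squared base counts."""
    counts = {b: seq.count(b) for b in BASES}
    feats = [float(counts["T"])]                         # n_iT
    for base in seq:                                     # I_ijk, T is the reference base (assumption)
        feats.extend(1.0 if base == k else 0.0 for k in "ACG")
    feats.extend(float(counts[k]) ** 2 for k in BASES)   # n_ik^2
    return np.array(feats)

def design_matrix(seqs):
    """Stack the feature vectors of all probes into one matrix (probes x features)."""
    return np.vstack([probe_features(s) for s in seqs])

def fit_background(seqs, log_intensity):
    """Least-squares estimate of the model coefficients from training probes."""
    y = np.asarray(log_intensity, dtype=float)
    coef, *_ = np.linalg.lstsq(design_matrix(seqs), y, rcond=None)
    return coef

def predict_background(seqs, coef):
    """Predicted log background intensity for new probes."""
    return design_matrix(seqs) @ coef
```

Training on one set of probes (for example background or full probes) and predicting on another array mirrors the train/test split described on the slide; the single scaling constant between arrays is not modeled in this sketch.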

7
Promoter array may be used to train exon array
background
8
Preliminary conclusions
  • Background correction based on background probe
    effect modeling can greatly reduce background
    noise
  • Model parameters are similar for different
    ChIP-DNA samples, or for different RNA samples,
    but not across DNA and RNA.
  • The data may be rich enough to support learning
    of more complex models with even better
    predictive power.

9
3. Probe selection and expression index
computation
10
Gene-level visualization: Heatmap of Intensities
major histocompatibility complex, class II, DM
beta
(Figure: heatmap of probe intensities; axes are probes and samples, with core probes marked)
11
Heatmap of Pairwise Correlations
HLA_DMB
(Figure: probe-by-probe correlation heatmap)
12
First observations
  • Heatmap of correlations is a useful complement to
    heatmap of intensities
  • Core probes have higher intensity than extended
    and full probes
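
A probe-by-probe correlation matrix of the kind shown in the correlation heatmap can be computed directly from a probes-by-samples intensity matrix. This is a small sketch, assuming intensities are held in a NumPy array with one row per probe; the function name is hypothetical.

```python
import numpy as np

def probe_correlations(intensity):
    """Pearson correlation between every pair of probes.

    intensity: array of shape (n_probes, n_samples).
    Returns an (n_probes, n_probes) symmetric matrix with ones on the diagonal.
    """
    return np.corrcoef(np.asarray(intensity, dtype=float))
```

Probes that track the same transcript across samples show up as a block of high correlations even when their absolute intensities differ, which is why this view complements the intensity heatmap.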

13
Probe selection for gene-level expression
  • Most full and extended probes are not suitable
    for estimating gene-level expression
    • Probes may target false exon predictions
  • Even some core probes may not be suitable
    • Bad probes: low affinity or
      cross-hybridization
    • Probes targeting differentially spliced exons
  • Probe selection
    • Select a suitably large subset of good probes
      targeting constitutively spliced regions of the
      gene
    • Use only the selected probes to estimate gene
      expression

14
Heatmap of CD44 core probes (Ordered By Genomic
Locations)
(Figure: regions along the gene are labeled constitutive, alternatively spliced, constitutive)
15
ataxin 2-binding protein 1 
16
These examples motivated our Probe Selection
Strategy
  • Probe selection procedure (on core probes)
    • Hierarchically cluster the probe intensities
      across 11 tissues (33 samples), and cut the
      tree at various heights (0.1, 0.2, ..., 1.0).
    • Choose a height cutoff to strike a balance
      between the size of the largest sub-group and
      the correlation within the sub-group.
    • Iteratively remove probes that do not
      correlate well with the current expression
      index.
  • At least 11 core probes need to be chosen.
  • If the total number of core probes is less than
    11 for the entire transcript cluster, we skip
    probe selection.

(Xing Y, Kapur K, Wong WH. PLoS ONE. 2006; 1: e88)
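
The clustering step of the procedure can be sketched as follows: the distance between probes is 1 minus their correlation, groups are merged by average linkage, and merging stops once the closest pair of groups is farther apart than the chosen height. This is an illustrative, naive implementation written for readability, not the authors' code; the function names are hypothetical, and the rule for picking the final height is left to the reader.

```python
import numpy as np

def average_linkage_groups(dist, height):
    """Merge groups by average linkage until the closest pair exceeds `height`."""
    groups = [[i] for i in range(dist.shape[0])]
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Average linkage: mean distance over all cross-group probe pairs.
                d = dist[np.ix_(groups[a], groups[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > height:
            break
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups

def largest_group_by_height(intensity, heights):
    """For each cut height, return the sorted probe indices of the largest group."""
    dist = 1.0 - np.corrcoef(np.asarray(intensity, dtype=float))
    summary = {}
    for h in heights:
        groups = average_linkage_groups(dist, h)
        summary[h] = sorted(max(groups, key=len))
    return summary
```

Comparing the size of the largest group across heights gives the trade-off named on the slide: a low height keeps only tightly correlated probes, a high height keeps more probes at the cost of weaker within-group correlation. For real transcript clusters a library routine such as `scipy.cluster.hierarchy.linkage` with `fcluster` would be the more efficient choice.
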
17
Hierarchical Clustering of CD44 Core Probes
(distance = 1 - corr, average linkage)
h = 0.1: 44 (42) probes
18
Computation of gene level expression index
Background correction
Normalization
(linear scaling or none)
Probe selection
Computation of Overall Gene Expression Indexes
(dChip type model)
optional
Gene level quantile normalization
GeneBASE: Gene-level Background Adjusted Selected
probe Expression
Download: http://biogibbs.stanford.edu/kkapur/GeneBASE/
Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
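
The "dChip type model" step can be illustrated with the multiplicative model used by dChip, in which the intensity of probe j in sample i is modeled as theta_i * phi_j plus noise, with the probe affinities constrained so that the sum of phi_j squared equals the number of probes. The sketch below fits it by alternating least squares. It is a simplified illustration under that assumption and omits refinements such as outlier handling; the function name is hypothetical.

```python
import numpy as np

def fit_dchip_type_model(Y, n_iter=50):
    """Fit Y[i, j] ~ theta[i] * phi[j] by alternating least squares.

    Y: array of shape (n_samples, n_probes) with background-corrected intensities.
    Returns (theta, phi): per-sample expression indexes and per-probe affinities,
    with phi scaled so that sum(phi**2) == n_probes.
    """
    Y = np.asarray(Y, dtype=float)
    n_probes = Y.shape[1]
    phi = np.ones(n_probes)
    for _ in range(n_iter):
        theta = Y @ phi / (phi @ phi)              # update expression given affinities
        phi = Y.T @ theta / (theta @ theta)        # update affinities given expression
        phi *= np.sqrt(n_probes / (phi @ phi))     # enforce the scale constraint
    theta = Y @ phi / (phi @ phi)                  # final expression indexes
    return theta, phi
```

Only the probes retained by probe selection would be passed in as columns of `Y`, so the expression index reflects the constitutive part of the gene.
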
19
In most cases selection does not affect fold
changes
20
Sometimes, selection changes the fold change
significantly
Example: spectrin, beta, non-erythrocytic 4 (SPTBN4).
BetaIV spectrins are essential for membrane
stability and the molecular organization of nodes
of Ranvier along neuronal axons.
21
4. Evaluations of gene level index
22
1st evaluation: tissue fold change
Fold change of liver over muscle, in 438 genes
with high fold change in 3' expression array data
(Figure panels: after selection; before selection)
23
Probe selection allows more sensitive detection
of fold changes
(Figure, zoom-in; panels: after selection; before selection)
24
Fold change (FC) of muscle over liver, in 500
genes detected to be overexpressed in muscle over
liver by 3' array
(Figure panels: after selection; before selection)
25
FC of muscle over liver
(Figure, zoom-in; panels: after selection; before selection)
26
2nd evaluation: Presence/Absence calls
  • Use SAGE data to construct a gold standard
    • Presence in a tissue if ≥ 100 tags per million
      (tpm)
    • Absence if no tags in the given tissue but
      > 100 tpm in at least one other tissue
  • Exon array A/P calls use the sum of z-scores for
    core probes (the z-score is computed based on
    the background model)
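
The two ingredients of this evaluation can be sketched in a few lines. The exact threshold comparisons and the z-score definition are not fully legible on the slide, so both are assumptions here: presence is taken as at least 100 tpm, and a probe's z-score is taken as its observed log intensity minus the background model's prediction, divided by the model's residual standard deviation. All names are hypothetical.

```python
def sage_gold_standard(tpm_by_tissue, tissue, threshold=100.0):
    """Label a gene in one tissue as 'present', 'absent', or None (not in the gold standard).

    tpm_by_tissue: dict mapping tissue name to SAGE tags per million for one gene.
    """
    tpm = tpm_by_tissue[tissue]
    if tpm >= threshold:
        return "present"
    others = [v for t, v in tpm_by_tissue.items() if t != tissue]
    if tpm == 0 and any(v > threshold for v in others):
        return "absent"
    return None  # ambiguous genes are left out of the evaluation

def presence_score(log_intensity, predicted_log_background, residual_sd):
    """Sum of per-probe z-scores relative to the predicted background (assumed definition)."""
    return sum(
        (y - b) / residual_sd
        for y, b in zip(log_intensity, predicted_log_background)
    )
```

Sweeping a cutoff over `presence_score` for the gold-standard genes and counting true and false presence calls at each cutoff yields an ROC curve.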

27
(Figure: ROC curves in three panels, (a), (b), (c), for cerebellum, heart, and kidney)
ROC curves show that background correction
improves A/P calls. Red: Exon, z-score call.
Blue: Exon, Affy call. Brown: 3', Affy call,
max probeset. Purple: 3', Affy call, min probeset.
28
3rd evaluation: Cross-species conservation
  • 3' and exon array data for six adult tissues in
    both human and mouse
  • Expression computed for about 10,000
    human-mouse ortholog pairs

29
Similarity of gene expression profiles in six
human tissues and six corresponding mouse
tissues. For each ortholog pair we calculated
the Pearson correlation coefficient (PCC) of
expression indexes across six tissues (solid
line). We also permuted ortholog relationships
and calculated the PCC for random human-mouse
gene pairs (dashed line).
(Figure panels: 3' arrays; exon arrays)
(Xing Y, Ouyang Z, Kapur K, Scott MP, Wong WH.
Mol Biol Evol. April 2007)
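
The comparison in this caption can be sketched as a row-wise Pearson correlation between matched human and mouse expression matrices, followed by the same calculation after shuffling the ortholog pairing. This is a minimal illustration, assuming both matrices have one row per ortholog pair and one column per tissue in the same tissue order; the function names and the seed parameter are hypothetical.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of a and b (rows must not be constant)."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def ortholog_vs_permuted(human, mouse, seed=0):
    """PCC per true ortholog pair, and per pair after permuting the mouse genes."""
    human = np.asarray(human, dtype=float)
    mouse = np.asarray(mouse, dtype=float)
    rng = np.random.default_rng(seed)
    observed = rowwise_pearson(human, mouse)
    permuted = rowwise_pearson(human, mouse[rng.permutation(len(mouse))])
    return observed, permuted
```

Plotting the distribution of `observed` against `permuted` gives the solid and dashed curves described in the caption; a larger gap between them indicates that the platform recovers more of the conserved tissue profile.
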
30
Exon arrays also reveal conservation of absolute
abundance of transcripts in individual tissues!
(Figure panels: scatter plots and correlations, for 3' arrays and for exon arrays)
31
4th evaluation: qPCR
On the log scale, the exon array fold change
estimate is correlated with the qPCR fold change
(correlation 0.9)
32
5. Issues in exon level analysis
33
Challenges
  • The experimental validation rates in several
    published exon array studies are highly variable.
    • Gardina et al. BMC Genomics 7:325, 21%
    • Kwan et al. Genome Res 17:1210, 45%
    • Hung et al. RNA 14:284, 22-56%
    • Clark et al. Genome Biol 8:R64, 84%
  • Most exons are targeted by no more than four
    probes. There are no probes for splice junctions.
  • Noise in observed probe intensities (due to
    background, cross-hybridization) can make the
    inferred splicing pattern unreliable.

34
MADS: Microarray Analysis of Differential Splicing
1. Correction for background (non-specific
hybridization)
References:
1. Kapur, Xing, Wong, Genome Biology, 8:R82, 2007
2. Xing, Kapur, Wong, PLoS ONE, 1:e88, 2006
3. Xing et al., RNA, 14(8):1470-1479, 2008
35
Splicing Index = Corrected Probe Intensity /
Estimated Gene Expression Level
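
A small sketch of the splicing index as a ratio of background-corrected probe intensity to the estimated gene expression level in the same sample, followed by a simple between-group comparison on the log2 scale. The group comparison is only an illustration of how the index is used and is not the statistical test applied by MADS; the function names are hypothetical.

```python
import numpy as np

def splicing_index(corrected_probe_intensity, gene_expression):
    """Probe intensity divided by gene-level expression, sample by sample.

    corrected_probe_intensity: array of shape (n_probes, n_samples).
    gene_expression: array of shape (n_samples,) for the same gene.
    """
    probes = np.asarray(corrected_probe_intensity, dtype=float)
    gene = np.asarray(gene_expression, dtype=float)
    return probes / gene  # broadcasts the gene level across probes

def log2_si_change(si, is_treated):
    """Mean log2 splicing index in treated samples minus that in control samples."""
    log_si = np.log2(si)
    is_treated = np.asarray(is_treated, dtype=bool)
    return log_si[:, is_treated].mean(axis=1) - log_si[:, ~is_treated].mean(axis=1)
```

A probe in a constitutive exon follows the gene level, so its change is near zero; a probe in an exon that is skipped more often after treatment shows a negative change even when the gene itself is upregulated.
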
36
Analysis of gold-standard alternative splicing
data via PTB knockdown experiments
  • Our gold standard: a list of exons with
    pre-determined inclusion/exclusion profiles in
    response to PTB depletion (Boutz P, et al. Genes
    Dev. 2007, 21(13):1636-52.)
  • We used shRNA to knock down PTB, generated exon
    array data, and analyzed the data on
    gold-standard exons.
  • MADS detected all exons with large changes
    (> 25%) in transcript inclusion levels, and
    offered improvement over Affymetrix's analysis
    procedure.

Collaboration with Douglas Black (UCLA)
Boutz P, et al. Genes Dev. 2007, 21(13):1636-52.
37
MADS sensitivity correlates with the magnitude of
change in exon inclusion levels of gold-standard
exons
Xing et al., RNA, 2008, 14(8):1470-1479
38
Exon array detection of novel PTB-dependent
splicing events
(Figure: control vs. shRNA knockdown of splicing repressor PTB)
39
Detection of alternative 3'-UTR and poly-A sites
of Ncam1
30 differentially spliced exons were tested; 27
were validated. Validation rate: 27/30 = 90%
40
Cross-Hybridization
  • Probes are designed to hybridize to their target
    transcripts
  • Often probes have 0, 1, 2, or 3 base-pair
    mismatches to non-target transcripts
  • Cross-hybridization seriously complicates
    exon-level analysis.

41
Mapping mismatches to probes
  • 6,000,000 probes
  • Each 25 bp long
  • 3,000,000,000 bp genome sequence
  • For 1-bp mismatch, a naïve search needs on the
    order of 6M x 3G x 25 operations, i.e. years of
    CPU time
  • Fast matching algorithm (by Hui Jiang) makes this
    feasible in hours
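
The slides do not describe the fast matching algorithm itself. As a generic illustration of why the naïve cost can be avoided, the sketch below uses a standard seed-and-verify (pigeonhole) idea, which is a swapped-in technique and not necessarily the one used: a probe with at most one mismatch must match exactly in at least one of its two halves, so only genome windows that share an exact half with some probe need a full comparison. It scans the forward strand only and assumes all probes have the same length; the function names are hypothetical.

```python
from collections import defaultdict

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def find_near_matches(probes, genome, max_mismatches=1):
    """Return {probe: [(position, mismatches), ...]} for hits within max_mismatches."""
    probe_len = len(probes[0])
    n_seeds = max_mismatches + 1
    # Split each probe into n_seeds contiguous seeds; one seed must match exactly.
    bounds = [probe_len * s // n_seeds for s in range(n_seeds + 1)]
    index = defaultdict(list)  # (seed offset, seed sequence) -> probes containing it
    for probe in probes:
        for s in range(n_seeds):
            start, end = bounds[s], bounds[s + 1]
            index[(start, probe[start:end])].append(probe)
    hits = defaultdict(set)
    for pos in range(len(genome) - probe_len + 1):
        window = genome[pos:pos + probe_len]
        for s in range(n_seeds):
            start, end = bounds[s], bounds[s + 1]
            for probe in index.get((start, window[start:end]), ()):
                mm = hamming(probe, window)  # verify only seed-matching candidates
                if mm <= max_mismatches:
                    hits[probe].add((pos, mm))
    return {probe: sorted(found) for probe, found in hits.items()}
```

With a hash lookup per window, the work is dominated by one pass over the genome plus verification of the comparatively rare seed hits, rather than a comparison of every probe against every position.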

42
Distribution of Number of Cross-hyb Transcripts
(Figure panels: full probes; core probes)
43
Correction of sequence-specific
cross-hybridization to off-target transcripts
44
Conclusion
  • Gene level index is accurate and reflects
    absolute abundance
  • We show that sequence-specific modeling of
    microarray noise (background and
    cross-hybridization) improves the precision of
    exon-level analysis of exon array data.
  • Overall, our data demonstrate that the exon array
    design is an effective approach to study gene
    expression and differential splicing.
  • Development of future probe-rich exon arrays,
    with increased probe density on exons and
    inclusion of splice junction probes, will offer
    more powerful tools for global or targeted
    analysis of alternative splicing.