Alternative Splicing from ESTs

1
Alternative Splicing from ESTs
  • Eduardo Eyras
  • Bioinformatics UPF February 2004

2
  • Intro
  • ESTs
  • Prediction of Alternative Splicing from ESTs

3
[Figure: sequence ends labelled 5' and 3'; 5' CAP and poly(A) tail (AAAAAAA)]
4
[Figure: Transcription. Strand ends labelled 5' and 3'; pre-mRNA with exons and introns, 5' CAP and poly(A) tail (AAAAAAA)]
5
Alternative splicing as a mechanism of gene regulation
  • Functional domains can be added/subtracted → protein diversity
  • Can introduce early stop codons, resulting in truncated proteins or unstable mRNAs
  • Can modify the activity of transcription factors, affecting the expression of genes
  • Observed in nearly all metazoans
  • Estimated to occur in 30-60% of human genes
6
Forms of alternative splicing
Exon skipping / inclusion
Alternative 3' splice site
Alternative 5' splice site
Mutually exclusive exons
Intron retention
Constitutive exon
Alternatively spliced exons
7
  • How to study alternative splicing?

8
ESTs (Expressed Sequence Tags)
  • Single-pass sequencing of a small (end) piece of cDNA
  • Typically 200-500 nucleotides long
  • May contain coding and/or non-coding regions
9
ESTs
Cells from a specific organ, tissue or developmental stage
  • mRNA extraction
  • Add oligo-dT primer (TTTTTT)
  • Reverse transcriptase: RNA-DNA hybrid
  • Ribonuclease H
  • DNA polymerase, Ribonuclease H
  • Double stranded cDNA (AAAAAA / TTTTTT)
[Figure: strands at each step, with ends labelled 5' and 3']
10
ESTs
  • Clone cDNA into a vector
  • Multiple cDNA clones
  • Single-pass sequence reads: 5' EST and 3' EST
[Figure: double stranded cDNA (AAAAAA / TTTTTT, ends labelled 5' and 3') cloned and read from either end]
11
Genomic → Primary transcript → Splicing → Splice variants
oligo-dT primer, Reverse transcriptase
cDNA clones (double stranded)
EST sequences (single-pass sequence reads): 5' and 3' ESTs
12
(No Transcript)
13
EST sequencing
  • Is fast and cheap
  • Gives direct information about the gene sequence
  • Partial information

Resulting ESTs:
  • Known gene (DB searches)
  • Similar to known gene
  • Contaminant
  • Novel gene
14
dbEST release 20 February 2004
  • Number of public entries: 20,039,613
  • Summary by organism:
    Homo sapiens (human)                    5,472,005
    Mus musculus domesticus (mouse)         4,056,481
    Rattus sp. (rat)                          583,841
    Triticum aestivum (wheat)                 549,926
    Ciona intestinalis                        492,511
    Gallus gallus (chicken)                   460,385
    Danio rerio (zebrafish)                   450,652
    Zea mays (maize)                          391,417
    Xenopus laevis (African clawed frog)      359,901

15
EST lengths
[Figure: human EST length distribution (dbEST, Sep. 2003); 450 bp marked]
16
ESTs provide expression data
eVOC Ontologies: http://www.sanbi.ac.za/evoc/
J Kelso et al. Genome Research 2002
17
ESTs provide expression data
eVOC Ontologies: http://www.sanbi.ac.za/evoc/
  • Developmental Stage
  • Anatomical System (e.g. nervous > brain > cerebellum)
  • Pathology
  • Cell Type
  • Pooling
[Figure: ontology terms linked to Library 1 and Library 2, and each library to its ESTs]
18
Linking the expression vocabulary to gene annotations
[Figure: ESTs linked to Genes]
V Curwen et al. Genome Research (2004)
19
Gene expression vocabulary
20
(No Transcript)
21
The down side of the ESTs
  • Cannot detect lowly/rarely expressed genes or
    non-expressed sequences (regulatory)

Random sampling: the more ESTs we sequence, the fewer new useful sequences we get
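The diminishing return of random sampling can be illustrated with a small calculation. The sketch below assumes each EST is an independent draw from a fixed set of genes with known expression fractions; the function name and the toy library composition are hypothetical and not taken from the slides.

```python
def expected_distinct_genes(expression_fractions, n_ests):
    """Expected number of genes hit at least once after n_ests random reads.

    A gene with expression fraction p is missed by all n reads with
    probability (1 - p) ** n, so it is seen with probability 1 - (1 - p) ** n.
    """
    return sum(1.0 - (1.0 - p) ** n_ests for p in expression_fractions)


# Hypothetical library: 10 abundant genes and 1000 rare ones (fractions sum to 1).
fractions = [0.05] * 10 + [0.0005] * 1000

previous = 0.0
for n in (1000, 2000, 4000, 8000):
    seen = expected_distinct_genes(fractions, n)
    # The gain per doubling shrinks as the abundant genes saturate.
    print(n, round(seen, 1), "gain:", round(seen - previous, 1))
    previous = seen
```

Under these assumptions the abundant genes are found almost immediately, and further sequencing mostly re-samples them, which is why rarely expressed genes are hard to detect.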
22
  • Using ESTs to study Alternative Splicing

23
ESTs aligned to the genome
[Figure: EST aligned to the genome; labels: Stop, GT, AG, PolyA. Candidate matches: true match (best in genome), paralog, processed pseudogene]
Must clip poly(A) tails before aligning
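A minimal sketch of poly(A) clipping, assuming the tail appears as a terminal run of A's (or a leading run of T's when the read is from the opposite strand). The function name and the minimum run length of 8 are illustrative assumptions, not values from the slides.

```python
import re

# Hypothetical threshold: shorter terminal runs are left untouched because
# they may be genuine transcript sequence.
MIN_RUN = 8


def clip_poly_a(seq, min_run=MIN_RUN):
    """Remove a trailing poly(A) run and a leading poly(T) run from an EST."""
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)   # 3' poly(A) tail
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)   # same tail seen on the other strand
    return seq


print(clip_poly_a("ACGTACGT" + "A" * 12))
```

Without this step the tail can align to A-rich genomic regions or, as the slide notes, favour processed pseudogenes that retain a genomic poly(A) stretch.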
24
Alternative exons / 3' polyA sites from ESTs
ESTs can also provide information about potential alternative splicing when aligned to the genome (and when aligned to mRNA data)
25
Aligning ESTs to the Genome
  • Many ESTs → fast programs, fast computers
  • Nearly exact matches: coverage > 97%, percent_id > 97%
  • Splice sites: GT-AG, AT-AC, GC-AG
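The acceptance criteria above can be sketched as a simple filter. The `EstAlignment` record and its field names are hypothetical; only the thresholds and the three splice-site pairs come from the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Donor/acceptor dinucleotide pairs accepted on the slide.
ALLOWED_SPLICE_SITES = {("GT", "AG"), ("AT", "AC"), ("GC", "AG")}


@dataclass
class EstAlignment:
    """Hypothetical summary of one EST-to-genome alignment."""
    est_id: str
    coverage: float                      # percent of the EST that is aligned
    percent_id: float                    # percent identity of the aligned part
    intron_ends: List[Tuple[str, str]]   # (donor, acceptor) dinucleotide per intron


def passes_filter(aln, min_coverage=97.0, min_percent_id=97.0):
    """Keep nearly exact matches whose introns all use an accepted splice-site pair."""
    if aln.coverage <= min_coverage or aln.percent_id <= min_percent_id:
        return False
    return all((donor.upper(), acceptor.upper()) in ALLOWED_SPLICE_SITES
               for donor, acceptor in aln.intron_ends)


good = EstAlignment("est1", 99.0, 98.5, [("GT", "AG"), ("GC", "AG")])
bad = EstAlignment("est2", 99.0, 98.5, [("CT", "AC")])
print(passes_filter(good), passes_filter(bad))
```

An unspliced alignment has no introns and passes the splice-site check trivially; whether such alignments should be kept is a separate decision.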

26
  • Development of special software: fast versus accurate alignment
  • Development of special technology: efficient use of computer farms (~2000 CPUs)
27
  • Recovering full transcripts from ESTs

28
Recover the mRNA from the ESTs
29
The Problem
ESTs
Genome
What are the transcripts represented in this set
of mapped ESTs?
30
Predict Transcripts from ESTs
ESTs
Transcript predictions
Merge ESTs according to splicing structure
compatibility
31
Redundant ESTs
Consider 2 ESTs, x and z, in a genomic cluster with more ESTs
[Figure: x and z aligned to the genome; z lies within x]
z gives redundant splicing information, so we could keep only x
[Figure: x, z and a third EST, w]
However, the relation with other ESTs in the cluster is important: a third EST, w, is compatible with z but not with x → keep all relations
32
Extension of the exon structure
Consider 2 ESTs, x and y, in a genomic cluster with more ESTs
[Figure: x and y aligned to the genome; y extends x]
y extends x, so we can assume that they are from the same mRNA
[Figure: ESTs x, z and w]
Our success will depend on the coverage of the exons. However, ESTs are 3'- and 5'-biased (ESTs like z are not so frequent), hence we will have fragmentation.
33
Representation
For every 2 ESTs in a genomic cluster, we decide if they represent equivalent splicing structures
The compatibility relation is a graph
[Figure: Extension (x, y) and Inclusion (x, z), each shown as an alignment and as an edge in the graph]
E Eyras et al. Genome Research (2004)
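The pairwise decision can be sketched as follows. This is a simplified illustration, not the published ClusterMerge code: it assumes both ESTs are on the same strand, represents each as a sorted list of exon (start, end) coordinates, requires introns to match exactly (no mismatch tolerance), and uses hypothetical function and label names.

```python
def introns(exons):
    """Intron (start, end) intervals implied by consecutive exons."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]


def compare(x, y):
    """Classify the relation between two ESTs given as sorted exon lists."""
    x_start, x_end = x[0][0], x[-1][1]
    y_start, y_end = y[0][0], y[-1][1]
    lo, hi = max(x_start, y_start), min(x_end, y_end)
    if lo > hi:
        return "no_overlap"

    def shared(exons):
        # Introns that intersect the genomic range covered by both ESTs.
        return {(s, e) for s, e in introns(exons) if s <= hi and e >= lo}

    if shared(x) != shared(y):
        return "clash"              # different splicing in the common region
    if x_start <= y_start and y_end <= x_end:
        return "x_includes_y"       # y adds no new splicing information
    if y_start <= x_start and x_end <= y_end:
        return "y_includes_x"
    return "extension"              # each reaches beyond the other on one side


x = [(100, 200), (300, 400)]
z = [(150, 200), (300, 350)]
w = [(350, 400), (500, 600)]
v = [(150, 250), (300, 400)]
print(compare(x, z), compare(x, w), compare(x, v))
```

Storing the result of each comparison as a labelled edge between the two ESTs gives the graph the slide refers to.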
34
Criteria of merging
  • Allow edge-exon mismatches
  • Allow internal mismatches
  • Allow intron mismatches
Is this intron real?
35
Transitivity
[Figure: Extension and Inclusion relations among ESTs x, y, z and w, illustrating transitivity]
This reduces the number of comparisons needed
36
ClusterMerge graph
Each node defines an inclusion sub-tree
[Figure: ESTs x, y and z with their inclusion sub-tree]
Extensions form acyclic graphs
[Figure: ESTs x, y, z and w with their extension graph]
E Eyras et al. Genome Research (2004)
37
Mergeable sets
Example
[Figure: seven ESTs, numbered 1 to 7, aligned to the genome]
38
Mergeable sets
Example
[Figure: ESTs 1 to 7 and the graph of their relations]
39
Mergeable sets
Example
[Figure: ESTs 1 to 7 and the graph of their relations, with the Root and the Leaves marked]
40
Mergeable sets
Example
[Figure: ESTs 1 to 7 and the graph of their relations, with the Root and the Leaves marked]
Lists produced: (1,2,3,5,6,7), (1,2,3,4,5,7)
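One ingredient of producing such lists is walking an acyclic graph from a root to every leaf. The sketch below shows only that step on a toy graph with hypothetical node names; it does not reproduce the graph in the figure, and it ignores the inclusion sub-trees that the full method also collects.

```python
def root_to_leaf_paths(edges, root):
    """Return every path from root to a leaf in an acyclic graph.

    edges maps a node to the list of nodes that extend it.
    """
    paths = []

    def walk(node, path):
        children = edges.get(node, [])
        if not children:            # a leaf: the path is complete
            paths.append(tuple(path))
            return
        for child in children:
            walk(child, path + [child])

    walk(root, [root])
    return paths


# Toy extension graph (hypothetical): a is extended by b or c, both extended by d.
toy_edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(root_to_leaf_paths(toy_edges, "a"))
# → [('a', 'b', 'd'), ('a', 'c', 'd')]
```

Each path is a candidate set of mutually compatible ESTs that can be merged into one transcript.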
41
Deriving the transcripts from the lists
Internal splice sites: external coordinates of the 5' and 3' exons are not allowed to contribute
42
Deriving the transcripts from the lists
  • Splice sites are set to the most common coordinate
  • 5' and 3' coordinates are set to the exon coordinate that extends the potential UTR the most
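These two rules can be sketched in a few lines. The sketch assumes the coordinates observed for one splice site have already been grouped together, and that ESTs are given as sorted exon lists on the forward strand; both function names are hypothetical.

```python
from collections import Counter


def consensus_splice_site(observed):
    """Most common coordinate among the ESTs supporting one splice site."""
    return Counter(observed).most_common(1)[0][0]


def transcript_ends(ests):
    """Outermost start and end over all ESTs in the list.

    Taking the extremes extends the potential UTRs as far as the data allow.
    """
    start = min(exons[0][0] for exons in ests)
    end = max(exons[-1][1] for exons in ests)
    return start, end


merged = [
    [(100, 200), (300, 400)],
    [(80, 200), (300, 350)],
    [(150, 200), (300, 450)],
]
print(consensus_splice_site([200, 200, 201, 200, 199]))
print(transcript_ends(merged))
```

Only internal exon boundaries should be passed to `consensus_splice_site`, since the outer ends of terminal exons reflect where a read happened to stop rather than a splice site.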
43
Single exon transcripts
Reject resulting single exon transcripts when
using ESTs
44
  • Alternative splicing and comparative genomics

45
  • Conservation of Alternative Splicing

Degree of conservation: 30-60%
Methods:
  1. Compare single events
  2. Cross-alignment of full transcripts
46
  • Exon Skipping Events

Introns flanking alternatively spliced (skipped) exons have high sequence conservation, higher on average than introns flanking constitutive exons.
R Sorek, G Ast. Genome Research 13:1631-1637, 2003
47
  • Sequences regulating the (Alternative) splicing

[Figure: conserved alternative exon with its flanking introns; overrepresented hexamer (downstream)]
Overrepresented sequences in conserved introns (between human and mouse) may be involved in the regulation of alternative splicing.
Overrepresented: found in these introns more often than expected at random AND not found in intronic sequences flanking constitutive exons (and upstream of skipped ones).
R Sorek, G Ast. Genome Research (2003) 13:1631-1637
48
  • Sequences regulating the (Alternative) splicing

[Figure: conserved alternative exon with its flanking introns; overrepresented hexamer]
Not all types of events are equally conserved. Introns flanking alternative 5' and 3' exons, and retained introns, have higher sequence conservation.
Sugnet CW, Kent WJ, Ares M Jr, Haussler D. Pac Symp Biocomput. 2004:66-77
49
  • Frame preservation

Frame preserving (%)    Constitutive exons      Alternative exons
                        Human      Mouse        Human      Mouse
All exons               39.7       39.5         41.6       44.7
Conserved exons         40.9       38           51.8       51.9
A Resch et al. Nucleic Acids Research 2004, 32(4):1261-1269
50
  • Predicting alternative exons

51
  • Features differentiating between alternatively spliced and constitutively spliced exons

Feature                                                   Alternative exons   Constitutive exons
Average size (bp)                                         87                  128
Length multiple of 3                                      73%                 37%
Average human-mouse exon conservation                     94%                 89%
(A) Exons with upstream intron conserved in mouse         92%                 45%
(B) Exons with downstream intron conserved in mouse       82%                 35%
(A) and (B)                                               77%                 17%
(A), (B): an intron is considered conserved if there are at least 12 consecutive matches over 100 bp of the intron
R Sorek et al. Genome Research (2004) 14:1617-1623
52
  • Build a classifier to make predictions
  • Rule: set of conditions over the parameters,
    e.g. at least 99% conservation with mouse AND divisible by 3, etc.
  • Try all the possible combinations of parameters
  • Select the rule that correctly identifies a maximum number of true
    alternative exons while minimizing the number of false positives

This rule achieved 31% sensitivity and no false positives in a set of known exons:
  • At least 95% identity with mouse orthologous exon
  • Exon size is a multiple of 3
  • An upstream intronic alignment of at least 15 bp with at least 85% identity
  • A downstream intronic exact alignment of at least 12 bp
R Sorek et al. Genome Research (2004) 14:1617-1623
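The rule is a conjunction of four conditions, so it can be written directly as a predicate. The thresholds are those on the slide; the `ExonFeatures` container and its field names are hypothetical, and computing the features themselves (alignments to mouse) is outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class ExonFeatures:
    """Hypothetical container for the per-exon features used by the rule."""
    length: int                    # exon size in bp
    mouse_identity: float          # percent identity with the mouse orthologous exon
    upstream_aln_len: int          # length (bp) of the upstream intronic alignment
    upstream_aln_identity: float   # percent identity of that alignment
    downstream_exact_len: int      # length (bp) of the exact downstream intronic alignment


def predicted_alternative(exon):
    """Apply the four conditions; all must hold for a positive prediction."""
    return (
        exon.mouse_identity >= 95.0
        and exon.length % 3 == 0
        and exon.upstream_aln_len >= 15
        and exon.upstream_aln_identity >= 85.0
        and exon.downstream_exact_len >= 12
    )


candidate = ExonFeatures(87, 96.5, 20, 90.0, 14)
print(predicted_alternative(candidate))
```

Requiring every condition at once is what keeps false positives low, at the cost of the modest sensitivity reported on the slide.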
53
  • Summary

  • Alternative splicing is a mechanism to generate functional diversity
  • We can study alternative splicing using ESTs (Expressed Sequence Tags)
  • EST data is fragmented and full of noise: it needs to be processed
  • Some alternative splicing is conserved across species (human-mouse)
  • Prediction of alternative (conserved) exons is possible (a classifier), but not ab initio
  • Evolution of alternative splicing?
54
  • THE END