Title: GenePC and Aspic
1GenePC and Aspic
Integrating gene predictions with EST alignments
to predict alternative transcripts
- Tyler Alioto
- Center for Genomic Regulation
- Barcelona, Spain
2The Splicing Code
3Problem with Combiners
- Combiners generally look for consensus, not
alternatives
- SOLUTION
- Provide mutually incompatible constraints which
are known to be of high quality and run
prediction iteratively or in parallel with each
set of constraints.
4Problem with EST-based transcript predictors
- Quality of predicted transcripts dependent on
often poor EST sequence quality
- Transcripts can be incomplete
- SOLUTION
- Fill in with ab initio exon predictions
5Advantages of Aspic-GenePC combo
- Aspic performs high-quality multiple alignment of
ESTs to a genomic locus, reducing number of
artifacts
- GenePC is well suited for incorporating introns
as evidence
- Aspic introns are incorporated if and only if
they are compatible with the gene prediction
evidence
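The "if and only if compatible" rule can be illustrated with a minimal sketch. It assumes one possible reading of compatibility: an intron is kept when its donor side directly follows the end of a predicted exon and its acceptor side directly precedes the start of another. The coordinate convention, tuple layout and function name are hypothetical and not taken from GenePC.

```python
from typing import List, Tuple

# (start, end), 1-based inclusive, forward strand (assumed convention)
Interval = Tuple[int, int]

def compatible_introns(introns: List[Interval], exons: List[Interval]) -> List[Interval]:
    """Keep introns whose two boundaries abut predicted exon boundaries."""
    exon_ends = {end for _, end in exons}
    exon_starts = {start for start, _ in exons}
    kept = []
    for start, end in introns:
        # The intron must begin right after some exon ends
        # and finish right before some exon starts.
        if (start - 1) in exon_ends and (end + 1) in exon_starts:
            kept.append((start, end))
    return kept

exons = [(100, 200), (301, 400), (501, 600)]
introns = [(201, 300), (201, 500), (250, 300)]
print(compatible_introns(introns, exons))
```

In this toy input the third intron is dropped because no predicted exon ends at position 249, while the second (an exon-skipping intron) is kept.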
7The Aspic algorithm
- The MEFC problem
- Minimum EST Factorization Compatible with a
genomic sequence
- Optimal solution
- Minimize the number of distinct pseudo-exons in
the gene-factorization of the genomic sequence
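The objective being minimized can be illustrated with a small sketch. It only evaluates the count of distinct pseudo-exons for a given set of EST factorizations; it does not solve the MEFC problem itself. Representing a factorization as a list of genomic intervals per EST, and the names used, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) on the genomic sequence

def count_pseudo_exons(factorizations: Dict[str, List[Interval]]) -> int:
    """Number of distinct genomic intervals used across all EST factorizations."""
    return len({iv for factors in factorizations.values() for iv in factors})

# Two candidate ways of aligning the same two ESTs to the genome.
ragged = {"est1": [(100, 200), (301, 400)], "est2": [(100, 198), (301, 400)]}
refined = {"est1": [(100, 200), (301, 400)], "est2": [(100, 200), (301, 400)]}
print(count_pseudo_exons(ragged), count_pseudo_exons(refined))
```

Under this objective the second candidate is preferred, since both ESTs share the same pseudo-exons.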
8Aspic optimization
9Intron Boundary Refinement
Example of intron detection in the human ATP1B1
(UG:Hs.291196) gene without (A) or with (B) the
refinement of exon-intron boundaries. The first
row shows the genomic sequence aligned to the EST
sequences (below). In (A) four different introns
are detected (A, B, C, D), which can be merged into
only two (A, D) in (B). Absolute coordinates (NCBI
35 assembly) are shown for each intron, and
acceptor/donor splice sites are shown on a black
background.
Bonizzoni et al. BMC Bioinformatics 2005, 6:244.
doi:10.1186/1471-2105-6-244
10GenePC uses GeneID architecture
- GeneID follows a hierarchical structure
- signal → exon → gene
- GenePC exon scoring replaces GeneID exon scoring
- Dynamic programming algorithm
- max score of assembled exons → assembled gene
(Figure labels: predicted exons, Aspic introns)
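The dynamic programming step can be sketched as a simple exon-chaining routine: choose the set of non-overlapping exons with the maximum total score. This is a simplified quadratic-time illustration, not GeneID's or GenePC's actual algorithm; it ignores reading frame, strand, exon types and intron evidence, and all names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int, float]  # (start, end, score)

def best_chain(exons: List[Exon]) -> Tuple[float, List[Exon]]:
    """Return the highest-scoring chain of non-overlapping exons."""
    exons = sorted(exons, key=lambda e: e[1])  # order by end coordinate
    best = []  # best[i] = (score of best chain ending at exon i, previous index)
    for i, (start, _, score) in enumerate(exons):
        top, prev = score, -1
        for j in range(i):
            # Exon j may precede exon i only if it ends before i starts.
            if exons[j][1] < start and best[j][0] + score > top:
                top, prev = best[j][0] + score, j
        best.append((top, prev))
    if not best:
        return 0.0, []
    i = max(range(len(best)), key=lambda k: best[k][0])
    total, chain = best[i][0], []
    while i != -1:  # trace back through stored predecessors
        chain.append(exons[i])
        i = best[i][1]
    return total, chain[::-1]

candidates = [(1, 100, 2.0), (50, 150, 3.5), (160, 250, 1.0), (120, 300, 2.0)]
total, gene = best_chain(candidates)
print(total, gene)
```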
11GenePC weighted linear combination mode
12Combining gene predictions
- Method 1
- Ad hoc linear combination of 3 factors
- Normalized self-reported scores
- Sum of distances between programs
- Performance of programs
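A weighted linear combination of the three factors could look like the sketch below. The slide does not give the weights, their signs, or how each factor is scaled, so every number here is a placeholder and the function name is hypothetical.

```python
from typing import Sequence

def linear_combination(factors: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of factor values; weights are placeholders, not GenePC's."""
    if len(factors) != len(weights):
        raise ValueError("factors and weights must have the same length")
    return sum(w * f for w, f in zip(weights, factors))

# Factor order (assumed): normalized self-reported score,
# sum of distances between programs, performance of the program.
exon_score = linear_combination([0.8, 0.5, 0.9], [0.5, 0.25, 0.25])
print(exon_score)
```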
13GenePC weighted linear combination mode
14Combining gene predictions
- Method 2: combining exon probabilities
- c: self-reported confidence
- SPe: exon-level specificity
- rij = (SNe + SPe)/2 between prediction sets i
and j, where SNe is exon-level sensitivity
- Correlation matrix R treated as a distance matrix
for UPGMA tree.
- Terminal branches x and y with max r are
collapsed
- R is updated with the r of resulting branch with
each remaining branch z
- recalculated as average of rxz and ryz
- Calculate p for node
- Repeat until only one branch left
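The collapsing loop can be sketched as follows. The sketch records only the merge order and the r value at each merge; the slide does not say how p is calculated for a node, so that step is left out. The dictionary-based matrix, node naming and function name are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

def collapse_order(r: Dict[FrozenSet[str], float],
                   names: List[str]) -> List[Tuple[str, str, float]]:
    """Repeatedly collapse the two branches with the highest r."""
    r = dict(r)  # work on a copy
    branches = list(names)
    merges = []
    while len(branches) > 1:
        # Pick the pair of current branches with maximal r.
        x, y = max(combinations(branches, 2), key=lambda p: r[frozenset(p)])
        merges.append((x, y, r[frozenset((x, y))]))
        node = f"({x},{y})"
        for z in branches:
            if z not in (x, y):
                # r of the new branch with z is the average of r_xz and r_yz.
                r[frozenset((node, z))] = (r[frozenset((x, z))] + r[frozenset((y, z))]) / 2
        branches = [z for z in branches if z not in (x, y)] + [node]
    return merges

R = {frozenset(("A", "B")): 0.9, frozenset(("A", "C")): 0.5, frozenset(("B", "C")): 0.7}
merges = collapse_order(R, ["A", "B", "C"])
print(merges)
```

With three prediction sets, the two most similar ones (A and B) are collapsed first, and the resulting branch is then joined with C.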
15Final Score
- Expressed as log-likelihood ratio
16Results ad hoc method
- 5 point gain in average sensitivity and
specificity at the transcript level
17GenePC Example VPS25
18Results ad hoc method
- Increased sensitivity offset by loss in
specificity
- Probably due in part to introns in non-coding
regions
- Intron sensitivity is higher
- Driven by prediction of more transcripts per
locus, not refinement of transcript predictions
19Acknowledgments
- GENEID TEAM
- Roderic Guigó
- Enrique Blanco
- Genís Parra
- Francisco Camara
- ASPIC TEAM
- Graziano Pesole
- Ernesto Picardi
- Paola Bonizzoni
- Raffaella Rizzi