Title: ASTD
1 ASTD
- Alternative Splicing and Transcript Diversity database
2 What/who are we?
- Firstly, AltExtron
- Secondly, ASD (Alternative Splicing Database) and the AltSplice pipeline: a database of alternative splice events and the resultant isoform splice patterns of genes from human and other model species
- Thirdly, for grant purposes, ATD (Alternative Transcript Diversity database) and the AltTrans pipeline: formation of transcript isoforms on a genome-wide scale by creating a value-added database of full-length alternate transcripts from human and other model species
- We also host the AEdb database (manual annotations)
- The two, ASD and ATD, have been blended into one pipeline, so now we are ASTD
- Alternative Splicing and Transcript Diversity database
- www.ebi.ac.uk/astd
3 Pipeline in a nutshell
1. Ensembl gene slices; EMBL EST/mRNA/HTC/HInv download
2. Immunoglobulin filtering (Blast)
3. Redundant gene filtering (Blat)
4. Genes vs EST/mRNA alignment (Blast)
5. HSP collection
6. Intron/exon delineation
7. Splice pattern delineation
8. Event prediction
9. Data generation
- Other pipelines shown in the diagram: Poly(A), TSS, Peptide, SNP, Conservation
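Steps 5 and 6 can be illustrated with a minimal sketch. It assumes each HSP has already been reduced to a (start, end) interval on the gene slice, 1-based and inclusive. The function names and the merging rule are hypothetical illustrations, not the ASTD pipeline code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive coordinates


def merge_hsps(hsps: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent HSP intervals into candidate exons."""
    exons: List[Interval] = []
    for start, end in sorted(hsps):
        if exons and start <= exons[-1][1] + 1:
            # Overlaps or abuts the previous interval: extend it.
            exons[-1] = (exons[-1][0], max(exons[-1][1], end))
        else:
            exons.append((start, end))
    return exons


def introns_between(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons as candidate introns."""
    return [
        (left_end + 1, right_start - 1)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    ]


# Made-up HSP coordinates for illustration only.
hsps = [(100, 250), (240, 300), (900, 1050), (2000, 2200)]
exons = merge_hsps(hsps)
introns = introns_between(exons)
print(exons)
print(introns)
```

A real pipeline would also need to use the alignment coordinates on the EST/mRNA side and refine the boundaries to splice sites; the sketch only shows the interval bookkeeping.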
4 Limitations of the pipeline
- Pipeline defines consensus splice sites
- True biology is removed
- Dicistronic transcripts
- Nested genes
- Single exon genes
- Small exons
- Large introns
- Manual annotation would resolve these issues
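To make the first limitation concrete, a filter that keeps only introns with consensus boundaries can be sketched as below. The sketch assumes the canonical GT...AG dinucleotides as the consensus; which signals the ASTD pipeline actually accepts is not stated on the slide, and the function name and sequences are hypothetical.

```python
def has_consensus_splice_sites(intron_seq: str) -> bool:
    """Return True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron_seq.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Made-up intron sequences for illustration only.
canonical = "GTAAGTCTTTCAG"
non_canonical = "ATATCCTTTTAC"  # AT...AC boundaries

print(has_consensus_splice_sites(canonical))
print(has_consensus_splice_sites(non_canonical))
```

Any genuine intron with non-consensus boundaries would be rejected by such a filter, which is one way real biology can be lost.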
5 Improvements
- New, user-friendly web interfaces
- New database schema that is normalised, extendable and maintainable
- Pipeline improvements: some steps now automated, bugs corrected, and Blat replaces Blast for filtering redundant genes
- Database allows external features to be included (Ensembl and VEGA annotations) for comparison with our transcripts
- Schema allows export of data in standard formats: GTF2 and GFF3, EMBL flat file format, FASTA format, and Excel spreadsheet
- Transcripts for the complete genome, not restricted to those with alternative splice events
- Introduction of unique identifiers
- Addition of datasets as input to the pipeline: HTC and HInv
- Extension of 5' and 3' UTRs to capture more TSS and poly(A) sites
- Annotation of TSS (aligning 5' capped mRNAs from human and mouse to transcripts) and poly(A) sites to generate full-length transcripts
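As an illustration of the GFF3 export mentioned above, the following minimal sketch writes a transcript and one exon as GFF3 records (nine tab-separated columns). The identifiers, the "ASTD" source label, and the coordinates are placeholders, not real ASTD output, and the helper name is hypothetical.

```python
from typing import Dict


def gff3_line(seqid: str, source: str, feature_type: str, start: int, end: int,
              strand: str, attributes: Dict[str, str],
              score: str = ".", phase: str = ".") -> str:
    """Format one feature as a GFF3 record with nine tab-separated columns."""
    # Attributes are written as key=value pairs separated by semicolons.
    # This sketch does not escape reserved characters in attribute values.
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               score, strand, phase, attr_text]
    return "\t".join(columns)


# Placeholder identifiers and coordinates for illustration only.
lines = [
    "##gff-version 3",
    gff3_line("chr1", "ASTD", "mRNA", 100, 2200, "+", {"ID": "transcript1"}),
    gff3_line("chr1", "ASTD", "exon", 100, 300, "+",
              {"ID": "exon1", "Parent": "transcript1"}),
]
print("\n".join(lines))
```

The Parent attribute is what links an exon to its transcript in GFF3, so alternative transcripts of one gene can share or differ in their exon records.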
6 www.ebi.ac.uk/astd - Query tools
- Three query tools are available to retrieve entries:
- Simple text search on the main page
- Genome browsing
- Advanced search
7 Gene information
8 Genomic region information 1
9 Genomic region information 2
10 Transcript information
11 Evidence for transcript 1
12 Evidence for transcript 2
13 Expression information
14 Splice event 1
15 Splice event 2
16 Peptide information
17 Statistics
- Human
  - Number of genes with an ASTD transcript: 16715
  - Number of genes with an ASTD transcription_start_site: 4936
  - Number of genes with an ASTD polyA_site: 15376
  - Number of genes with an ASTD splicing event: 11316
  - Number of genes with multiple ASTD transcripts: 14101
  - Proportion of genes undergoing alternative splicing: 68%
  - Proportion of genes undergoing alternative polyadenylation: 92%
  - Proportion of genes undergoing alternative transcription_start_sites: 30%
- Mouse
  - Number of genes with an ASTD transcript: 16491
  - Number of genes with an ASTD transcription_start_site: 948
  - Number of genes with an ASTD polyA_site: 13556
  - Number of genes with an ASTD splicing event: 9474
  - Number of genes with multiple ASTD transcripts: 13028
  - Proportion of genes undergoing alternative splicing: 57%
  - Proportion of genes undergoing alternative polyadenylation: 82%
  - Proportion of genes undergoing alternative transcription_start_sites: 6%
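The proportions above are consistent with dividing each count by the number of genes with an ASTD transcript. That denominator is inferred from the figures rather than stated on the slide, so the short check below is a sketch under that assumption; the dictionary keys are hypothetical names.

```python
# Counts copied from the statistics slide.
counts = {
    "human": {"transcript": 16715, "tss": 4936, "polya": 15376, "splicing_event": 11316},
    "mouse": {"transcript": 16491, "tss": 948, "polya": 13556, "splicing_event": 9474},
}


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


proportions = {
    species: {
        feature: percent(values[feature], values["transcript"])
        for feature in ("splicing_event", "polya", "tss")
    }
    for species, values in counts.items()
}
print(proportions)
```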
18 Graph of human growth
19 Controlled vocabularies/ontologies
- GO
- SOFA
- eVOC
- Splice event ontology
- MeSH terms
20 Future 1
- Addition of new species
- Experimental validation of transcript structure and alternative poly(A)s
- Use EMBL CDS as another source of alignments to the genome
- More frequent releases, every 3 months
- Addition of regulatory motifs: ESS, ESE, ISS and ISE
- microRNA target sites from the EURASNET NoE (University of Basel)
21 Future 2
- Introduction of unique identifiers means:
- Addition as xref in EMBL so transcripts in the INSDC can be grouped into one gene
- Addition into UniParc so translations can be linked to UniProt IsoIds and again grouped as being variants of one gene
- UniParc translations also undergo full InterPro scan, TM and SignalP predictions, so data is precomputed and not done on the fly
22 Future 3
- The EBI sequence database group and Ensembl have merged, making the Hinxton Sequencing Forum (HSF)
- Outcome is that ASTD will be the vehicle to augment the Ensembl transcript views
- Full-length transcripts with TSS, splice events and polyA
- Definition of the major transcript set using annotation of features to transcripts, e.g. expression state, exon array, splice junction array evidence etc.
- VEGA/Havana annotations also included
- Time scale: within 2 years
23 Acknowledgements
- The ASTD Team
- Gautier Koscielny
- Vincent Le Texier
- Eleanor Whitfield
- Chellapa Gopalakrishnan
- Vasudev Kumanduri
- Sequence Database Group and External Services
- ASD consortium (Stefan Stamm for AEdb)
- ATD consortium (Daniel Gautheret for AltPAS)
- EURASNET consortium
- The ASTD project at EBI is supported by a grant
from the EC Eurasnet Network of Excellence
(LSHG-CT-2005-518238).