Title: Alternative Splicing and Disease: an overview
1Alternative Splicing and Disease: an overview
- Shoba Ranganathan
- Professor and Chair, Bioinformatics, Dept. of Chemistry and Biomolecular Sciences and ARC CoE in Bioinformatics, Macquarie University, Sydney, Australia (shoba.ranganathan_at_mq.edu.au)
- Adjunct Professor, Dept. of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore (shoba_at_bic.nus.edu.sg)
- Visiting scientist at the Institute for Infocomm Research (I2R), Singapore
2Outline of the talk
- Background
- Determining gene architecture
- Graph theory in AS
- Whole genome analysis results
- AS and disease
3Unexpectedly low number of genes in the human
genome
Drosophila 14,000 genes
C. elegans 19,000 genes
Human 22,000-25,000 genes
- How can the genome of Drosophila contain fewer genes than the undoubtedly simpler organism C. elegans?
- This raises the possibility of expanded diversity leading to biological complexity
www.utexas.edu, www.sih.m.u-tokyo.ac.jp, http://pub.tv2.no/multimedia/na/archive/
4Sources of Biological complexity
- With a limited number of genes
- Enhanced regulation of genes and pathways
- Post-translational modifications
- Alternative splicing
5A Genomic View
6Spliceosomal splicing
7Maniatis and Tasic, 2002
8Protein Diversity
10Alternative splicing
- Splicing is a regulated process that removes the non-coding sequences (introns) from transcripts to produce mRNA (Bernot, 2004).
- Alternative splicing contradicts the "one gene, one protein" rule often presented alongside the central dogma of molecular biology.
11Why AS?
- Protein diversity (Neverov et al., 2005).
- Form of spatial and temporal regulation (Lopez, 1995)
- Errors in splicing lead to diseases (Orengo and Cooper, 2007)
- Drug discovery (Levanon and Sorek, 2003)
12Usual way of studying AS
- One gene at a time: tedious for whole genomes
- Collect intron-exon structures for all isoforms
- Try to analyze them, again one isoform at a time and then gene by gene.
- Unsuitable for genes with large numbers of transcripts.
13Usual way of studying AS
14Why use bioinformatics?
- Most research into alternative splicing is limited to a few genes (reductionist approach)
- Bioinformatics overcomes this by facilitating a systems biology approach
- Information can be obtained for all genes in a genome
- This can be done for many genomes, allowing for comparative genomics
15Where is the splicing?
- Information on the intron-exon (non-coding/coding) arrangement of a gene is essential.
- Aligning mRNA/EST sequences to their corresponding genomic sequences gives the arrangement of exons in a gene (MGAlign, Ranganathan et al., 2003; MGAlignIt, Lee et al., 2003).
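To make the idea concrete, here is a minimal Python sketch of how exon and intron coordinates fall out of an mRNA-to-genome alignment. It does not reproduce MGAlign; it assumes the aligner has already reported the genomic blocks matched by the mRNA as 1-based inclusive (start, end) pairs on the forward strand. The function name `gene_structure` and the coordinates are made up for illustration.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (genomic_start, genomic_end), 1-based inclusive


def gene_structure(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Return (exons, introns) for one transcript from its aligned genomic blocks."""
    exons = sorted(blocks)
    introns = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start <= prev_end:
            raise ValueError("aligned blocks overlap")
        if next_start > prev_end + 1:
            # The unaligned genomic gap between two blocks is an intron.
            introns.append((prev_end + 1, next_start - 1))
    return exons, introns


# Hypothetical alignment blocks for a three-exon transcript.
blocks = [(101, 200), (501, 509), (901, 1000)]
exons, introns = gene_structure(blocks)
print("exons:  ", exons)
print("introns:", introns)
```

The middle block in this example is only 9 bp long, the kind of short internal exon that alignment heuristics can miss.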
16Outline of the talk
- Background
- Determining gene architecture
- Graph theory in AS
- Whole genome analysis results
- AS and disease
17MGAlignIt (Lee et al., 2003)
- Fast heuristic approach and highly accurate
- Capitalizes on the fact that the mRNA sequence
constitutes a very small percentage of the
genomic sequence
18MGAlign's biological alignment strategy
19MGAlignIt web service
http://origin.bic.nus.edu.sg/mgalign
20Benchmarking
- Dataset: human Chr 22 from the Sanger Centre (Collins et al., 2003)
- 936 annotated mRNAs (5176 exons)
- 48 Mbp long human Chr 22 genomic sequence
21Some successes
- Short internal exons (exon 2: 9 bp; exon 9: 21 bp)
- Short terminal exons (exon 1: 15 bp)
22MGAlign performance
- Greater savings in computer time with longer gDNA sequences
- Based on 41 randomly chosen genomic fragments
Programs compared in the chart: sim4, spidey, mgalign
23Outline of the talk
- Background
- Determining gene architecture
- Graph theory in AS
- Whole genome analysis results
- AS and disease
24Problem: Königsberg bridges (1700s)
- The residents of Königsberg, Prussia (now Kaliningrad, Russia), wondered if it was possible to take a walking tour of the town that crossed each of the seven bridges over the Pregel river exactly once.
- Leonhard Euler (father of graph theory) showed in 1736 that no such tour exists.
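Euler's argument can be checked in a few lines of Python. A connected graph has a walk that uses every edge exactly once only if it has zero or two nodes of odd degree. The sketch below encodes the four land masses as nodes A to D (these labels are an illustrative assumption) and the seven bridges as edges, and it takes connectivity for granted rather than testing it.

```python
from collections import Counter

# A = central island, B and C = the two river banks, D = eastern land mass.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges):
    """True if a connected multigraph can be walked using every edge exactly once."""
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(bridges))
print(has_euler_walk(bridges))
```

All four land masses have odd degree (5, 3, 3, 3), so the walking tour is impossible.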
25Graph theory for AS
- First used for AS by Heber et al. (2002).
- Each independent segment is represented as a node, connected by arrows.
- A node here is not necessarily based on introns and exons; it is simply a common contiguous segment of the gene.
- Example: human ADSL (adenylosuccinate lyase) gene
26Our splicing graph approach
- A biologist's viewpoint: each exon should be a node and each intron an edge (connection).
- Automatic generation of AS clusters from gene structure.
- Identifying the reference distinct exon and its associated variants.
- Simple rules for classifying alternative splicing events, and a visualization system for studying all variants from a single gene.
- Single-line diagram: the experimentalist's way of alternative splicing analysis
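The exon-as-node, intron-as-edge view can be sketched as follows. This is a minimal illustration, not the talk's actual implementation: the transcript names, exon labels, and the helper names `build_splicing_graph` and `skipped_exons` are all hypothetical, and the skipping check only detects a single bypassed exon.

```python
def build_splicing_graph(transcripts):
    """Exons become nodes; consecutive exons in a transcript are joined by a directed edge (an intron)."""
    graph = {}
    for exons in transcripts.values():
        for exon in exons:
            graph.setdefault(exon, set())
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph


def skipped_exons(graph):
    """For each edge a->c, list exons b such that a->b and b->c also exist (b is skipped)."""
    result = {}
    for a, targets in graph.items():
        for c in targets:
            bypassed = sorted(b for b in targets if b != c and c in graph[b])
            if bypassed:
                result[(a, c)] = bypassed
    return result


# Hypothetical transcripts of one gene, each written as an ordered list of exons.
transcripts = {
    "t1": ["E1", "E2", "E3", "E4"],
    "t2": ["E1", "E2", "E4"],  # skips E3
    "t3": ["E1", "E3", "E4"],  # skips E2
}
graph = build_splicing_graph(transcripts)
print({node: sorted(targets) for node, targets in graph.items()})
print(skipped_exons(graph))
```

Because every transcript is merged into one graph, shared exons appear once, and an alternative event shows up as a node with more than one outgoing edge.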
27Making the splicing graph
28Usual classification of AS events (Leipzig et al., 2004)
29Representing splice variants of the same gene as
a splicing graph
30Normal representation of transcripts: human hyaluronidase HYAL1 gene ENSG00000114378 (an early version)
www.ebi.ac.uk/asd
31Splicing Graph representation of the same gene
Intron retention
Alternative Termination site
Exon skipping
Transcripts are shown as exon numbers: 5239, 639, 1734, 1834, 124, 134.
32Single-line Splice Diagram
Patterns using the above exon numbers are shown as 5239, 639, 1734, 1834, 124, 134.
- A digraph or DAG (Directed Acyclic Graph)
- Graphs for which every unilateral orientation is traceable
- The experimentalist's way of alternative splicing analysis (for a gene of interest with all transcripts), for validating splice junctions
- Intron retention is clearly visible
33Our extended classification
Automatic rule-based classification
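The slide text does not list the rules themselves, so the following is only a generic sketch of what a rule-based classifier can look like. It is not the talk's extended classification. It compares one variant exon with a reference exon, both given as hypothetical (start, end) coordinates on the forward strand, and assumes both are internal exons.

```python
def classify_exon(ref, var):
    """Label a variant exon relative to a reference exon; both are (start, end) tuples."""
    ref_start, ref_end = ref
    var_start, var_end = var
    if var_end < ref_start or var_start > ref_end:
        return "no overlap"
    if (var_start, var_end) == (ref_start, ref_end):
        return "identical"
    if var_start == ref_start:
        # Same acceptor, different exon end: the donor (5' splice site) moved.
        return "alternative 5' splice site (donor)"
    if var_end == ref_end:
        # Same donor, different exon start: the acceptor (3' splice site) moved.
        return "alternative 3' splice site (acceptor)"
    return "both splice sites differ"


print(classify_exon((100, 200), (100, 180)))
print(classify_exon((100, 200), (120, 200)))
print(classify_exon((100, 200), (300, 400)))
```

For first or last exons, a differing boundary would indicate an alternative initiation or termination site rather than a splice site, so a fuller rule set would need the exon's position in the transcript.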
34Our extended classification
35Where to make your splicing graphs
36Outline of the talk
- Background
- Determining gene architecture
- Graph theory in AS
- Whole genome analysis results
- AS and disease
37AS Databases (Of men and mice)
Database (reference)                            Focus                                         Genomes covered
ASAP II (Kim et al., 2007)                      Comparative and evolutionary studies          17 genomes
ECgene (Lee et al., 2007)                       Provides functional annotation for AS genes   9 genomes
ASTD (previously ASD) (Thanaraj et al., 2004)   Genome-wide analysis                          Human, mouse and rat
ASTALAVISTA (Foissac et al., 2007)              Visual summary of the AS landscape            Mainly for human genome
- These databases do not provide sufficient information for multi-gene comparison to understand the phenomenon of AS.
38Genome-wide AS analysis: "I", said the fly
39Homology
- Similarity between biological sequences due to shared ancestry
- Orthology
  - Homologous sequences are orthologous if separated by a speciation event
  - The divergent copies of a single gene in the resulting species are orthologous genes.
  - At least 25-30% similarity at the protein level
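As a rough illustration of how such a threshold is applied, the sketch below computes percent identity between two protein sequences that have already been aligned (gaps written as "-"). Identity is a simpler stand-in for the similarity measure named on the slide, which would also credit conservative substitutions through a scoring matrix. The function name and the two sequences are invented for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Invented pre-aligned protein fragments.
score = percent_identity("MKT-AYIAK", "MRTLAYLSK")
print(score)
print(score >= 25.0)  # above the lower bound quoted on the slide
```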
40Gene Ontology
- Provides a controlled vocabulary to describe gene and gene product attributes in organisms.
- Three organizing principles
  - Cellular component: a component of a cell, e.g. nucleus
  - Biological process: series of events accomplished by one or more ordered assemblies, e.g. signal transduction
  - Molecular function: describes activities, e.g. catalytic activity
41AS genes in Bovine genome
- Part of bovine annotation project
- 16560 human genes, 15986 mouse genes, 4567 bovine genes
- Data extracted from ASTD and Ensembl (Hubbard et al., 2002)
- Orthologous genes found using BioMart from Ensembl
- Gene Ontology using Blast2GO (Conesa et al., 2005)
- 2458 (out of 4567) Ensembl AS genes have GO annotations
- 1716 AS genes can be further annotated
42Percentage of AS genes and orthologous spliced
genes in bovine, human and mouse
- Orthologous genes were analysed in order to
reduce bias in the data.
43Gene Level AS Analysis of orthologous subset
- The percentage of bovine genes showing AS events is lower than in human.
44AS Event Analysis of the orthologous subset
- AS events in bovine are at a level similar to human
- This implies that more splice variants are obtained from fewer bovine genes.
45Gene Ontology analysis
- Gene Ontology using Blast2GO (Conesa et al., 2005)
- 2458 (out of 4567) AS genes have GO annotations in Ensembl
- 1716 AS genes can be further annotated
46Outline of the talk
- Background
- Determining gene architecture
- Graph theory in AS
- Whole genome analysis results
- AS and disease
47Implications for disease
- Diagnostics from early recognition of splice variants associated with disease, based on nucleotide detection
- Treatment options using siRNA
- Aberrant splicing in survival of motor neuron 1 gene (SMN1) in spinal muscular atrophy (Cartegni and Krainer, 2002)
- Suppressing anti-apoptotic AS variant of Bcl-x pre-mRNA in prostate and breast cancer cells (Mercatante et al., 2001)
- Correcting CFTR mis-splicing (Friedman et al., 1999)
48Many diseases are caused by AS
Myotonic dystrophy
49Why study farm animals?
- Provide valuable insights into gene function and into genetic and environmental influences on animal production and human diseases (Roberts et al., 2009).
- Because of their size and relatively long intervals between generations, domestic species are widely used to unravel the mechanisms involved in programming the development of an embryo and fetus, resulting in adult onset of diseases (King et al., 2007; Padmanabhan et al., 2007).
- Mapping human disease genes to bovine orthologous genes is an excellent way to carry out analytical work and verify the suitability of the cow as a model organism.
50Mapping human disease genes to bovine genome
- 94 human disease genes were extracted from the NCBI Genes and Disease database to analyse which of these genes were alternatively spliced in the human and bovine genomes.
- AS analysis was conducted on 66 spliced genes.
- 17 orthologous spliced genes were observed in bovine.
51Human disease genes: Conservation of cassette exons in bovine orthologous genes
- Cassette exons occur in 38 human disease genes and 14 orthologous bovine genes.
Category                                                     Count
Number of cassette exons in 38 AS human disease genes        120
Exons present and constitutive in bovine orthologous gene    90
Exons present and regulated in bovine orthologous gene       7
Exons absent in bovine orthologous gene                      23
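The three categories account for all 120 human cassette exons. A short Python check, using only the counts given on this slide, confirms the total and expresses each category as a share:

```python
# Counts of human cassette exons by their status in the bovine orthologue (from the slide).
counts = {
    "present and constitutive in bovine": 90,
    "present and regulated in bovine": 7,
    "absent in bovine": 23,
}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(total)
for label, share in shares.items():
    print(f"{label}: {share}%")
```

Three quarters of the exons are therefore present but constitutive in bovine, and fewer than 6% remain regulated.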
52Human disease genes: Cassette exons present and regulated in bovine orthologous genes
- Cassette exons from 3 human genes were present and regulated in bovine.
Disease                    Gene name   Cassette exons
Colon cancer               MLH1        Exon9, Exon10
Spinal muscular atrophy    SMN1        Exon6, Exon5, Exon32
Tangier disease            ABC1        Exon2, Exon10
53Human disease genes: Intron retention present and constitutive in bovine orthologues
- Intron retention (IR) occurs in nine human genes; in five of these, the IR was present and constitutive in bovine.
Disease                               Gene name   Intron retention
Glaucoma                              GLC1A       Exon1
Spinocerebellar ataxia                SCA1        Exon9
Polycystic kidney disease             PKD1        Exon23, Exon15
Autoimmune polyglandular syndrome     APS1        Exon10
Wilson's disease                      ATP7B       Exon2
54Protein domain analysis of the orthologous
disease gene set
- Carried out a Pfam domain search on 8 human disease genes to identify the effects of alternative splicing on the functional protein domains.
- Genes responsible for spinal muscular atrophy and colon cancer are spliced in bovine, resulting in probable disruption of structure and function.
- 4 disease genes (glaucoma, Tangier disease, spinal muscular atrophy and colon cancer) had all the domains from their human counterparts conserved in bovine.
55Conclusion
Our results provide a window of opportunity for
more in-depth analysis over a larger dataset,
where the cow can serve as a model organism for
many more human diseases.
56Acknowledgements
- PhD students at the
  - National University of Singapore (Bernett T.K. Lee)
  - Macquarie University (Durgaprasad Bollina and Elsa Chacko)
- Colleagues and A/Prof. Tin Wee Tan, NUS
- All of you
57Invitation to attend InCoB2009: International Conference in Bioinformatics (incob.apbionet.org)
Singapore, 7-11 Sept. at Matrix, Biopolis
Keynote: Nobel Laureate Robert Huber
58
- 7 Sept: Tutorials and Bioinformatics Education workshop (WEBCB)
- 8 Sept: Clinical Bioinformatics (CBAS) and SYMBIO (Students) Symposia
- 9-11 Sept: Scientific Meeting