Title: Alternative Splicing
- As an introduction to microarrays
Human Genome
- About 90,000 human proteins; the number of genes was initially assumed to be near that (initial estimates: 153,000)
- The 1,000-cell roundworm Caenorhabditis elegans has 19,500 genes; corn has 40,000 genes
- Current estimates are 25,000 or fewer genes
- Alternative splicing allows different tissue types to perform different functions with the same assortment of genes
Implications
- 75% of human genes are subject to alternative splicing
- Faulty gene splicing leads to cancer and congenital diseases
- Gene therapy can make use of splicing
Application
- We talked before about apoptosis, which occurs when the cell determines it can't be repaired
- Bcl-x, a regulator of apoptosis, is alternatively spliced to produce either Bcl-x(L), which suppresses apoptosis, or Bcl-x(S), which promotes it
Spliceosome
- Five snRNA molecules (U1, U2, U4, U5, and U6) combine with as many as 150 proteins to form the spliceosome
- It recognizes the sites where introns begin and end
- Cuts introns out of the pre-mRNA
- Joins the exons
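A minimal Python sketch of the net effect of splicing (not the spliceosome's chemistry): given a pre-mRNA and 0-based intron coordinates, cut out the introns and join the exons. It also checks the GU...AG dinucleotides that most introns begin and end with. The function name `splice` and the example sequence are hypothetical, made up for illustration.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each intron (0-based, half-open [start, end)) and join the exons."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Most introns start with GU (5' splice site) and end with AG (3' splice site).
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {start}-{end}: {intron}")
        exons.append(pre_mrna[pos:start])  # exon upstream of this intron
        pos = end
    exons.append(pre_mrna[pos:])  # last exon
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1 + intron + exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "GAUUAA"
print(splice(pre_mrna, [(6, 18)]))  # AUGGCCGAUUAA
```

The sketch assumes introns do not overlap; it does not model branch points or alternative site choice.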
Spliceosome
- The 5′ splice site is at the beginning of the intron; the 3′ splice site is at the end
- The average human protein-coding gene is 28,000 nucleotides long, with 8.8 exons separated by 7.8 introns
- Exons are about 120 nucleotides long on average, while introns range from 100 to 100,000 nucleotides
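The averages on this slide can be combined into a rough worked estimate. This assumes the averages multiply cleanly, which ignores how skewed real exon and intron length distributions are, so treat the result as an order of magnitude only.

```python
gene_nt = 28_000        # average protein-coding gene length (nucleotides)
n_exons = 8.8           # average exons per gene
n_introns = 7.8         # average introns per gene
mean_exon_nt = 120      # approximate average exon length

exonic_nt = n_exons * mean_exon_nt        # nucleotides kept in the mature mRNA
intronic_nt = gene_nt - exonic_nt         # nucleotides spliced out
mean_intron_nt = intronic_nt / n_introns  # implied average intron length

print(f"exonic: {exonic_nt:.0f} nt ({exonic_nt / gene_nt:.1%} of the gene)")
print(f"average intron: {mean_intron_nt:.0f} nt")
```

With these numbers only about 4% of a typical gene (1,056 nt) ends up in the mRNA, and the implied average intron is roughly 3,450 nt.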
Splicing errors
- Familial dysautonomia results from a single-nucleotide mutation that causes a gene (IKBKAP) to be alternatively spliced in nervous system tissue
- The resulting decrease in IKBKAP protein leads to abnormal development of the nervous system (half of patients die before age 30)
- More than 15% of the gene mutations that cause genetic diseases and cancers are caused by splicing errors
Why splicing
- Each gene generates 3 alternatively spliced mRNAs on average
- Why so much intron (only 1–2% of the genome is exons)?
- Differences between mouse and human are almost all in splicing
- Half of the human genome is made up of transposable elements, with Alus being the most abundant (1.4 million copies)
- They continue to multiply and insert themselves into the genome at a rate of one insertion per 100 human births
- Mutations in an Alu element can create a 5′ or 3′ splice site in an intron, causing part of the intron to be used as an exon
- This mutation doesn't affect existing exons
- It has an effect only when the new exon is alternatively spliced in
Microarrays For Alt. Splicing
- Use short oligonucleotides
- Each probe gives an estimate of the expression level of the sequence it targets
[Figure: a gene with five exons, Exon 1 to Exon 5]
Affymetrix Microarrays For Alt. Splicing
[Figure: a gene with Exons 1–5 and two of its isoforms]
- Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5
- Isoform 2: Exon 1, Exon 3, Exon 5
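To tell the two isoforms apart, a probe must target a sequence found in only one of them. The sketch below lists, for each isoform, the exon bodies and exon-exon junctions no other isoform contains. Targeting junctions as well as exons is an assumption here; the slide only shows exons, and the names `features` and `isoform_specific` are hypothetical.

```python
ISOFORMS = {
    "Isoform 1": [1, 2, 4, 5],
    "Isoform 2": [1, 3, 5],
}


def features(exons):
    """Exon bodies plus exon-exon junctions an oligo probe could target."""
    bodies = {f"E{e}" for e in exons}
    junctions = {f"E{a}-E{b}" for a, b in zip(exons, exons[1:])}
    return bodies | junctions


def isoform_specific(isoforms):
    """Features present in exactly one isoform (candidates for distinguishing probes)."""
    feats = {name: features(ex) for name, ex in isoforms.items()}
    out = {}
    for name, fs in feats.items():
        others = set().union(*(f for n, f in feats.items() if n != name))
        out[name] = sorted(fs - others)
    return out


for name, specific in isoform_specific(ISOFORMS).items():
    print(name, specific)
```

Exons 1 and 5 are shared, so probes on them measure total gene expression rather than either isoform.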
Ideal Microarray Readings
[Figure: ideal expression (y-axis) for probes a–e (x-axis). Isoform 1 (Exons 1, 2, 4, 5) is shown with probes a, b, c; Isoform 2 (Exons 1, 3, 5) with probes a, d, e.]
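Under ideal, noise-free conditions, each probe's reading is the total abundance of every isoform that contains its target. The sketch below computes those readings. The exact target of each probe is a hypothetical assignment; it only respects which isoform each probe reports on in the figure (a shared, b and c for Isoform 1, d and e for Isoform 2).

```python
# Hypothetical probe targets (exon bodies or exon-exon junctions).
PROBE_TARGETS = {"a": "E1", "b": "E2", "c": "E4", "d": "E3", "e": "E3-E5"}

ISOFORMS = {"Isoform 1": [1, 2, 4, 5], "Isoform 2": [1, 3, 5]}


def targets_in(exons):
    """Exon bodies and exon-exon junctions contained in one isoform."""
    bodies = {f"E{e}" for e in exons}
    junctions = {f"E{a}-E{b}" for a, b in zip(exons, exons[1:])}
    return bodies | junctions


def ideal_readings(abundance):
    """Noise-free probe signals: each probe sums the abundance of every isoform
    that contains its target sequence."""
    contents = {iso: targets_in(exons) for iso, exons in ISOFORMS.items()}
    return {
        probe: sum(abundance[iso] for iso, feats in contents.items() if target in feats)
        for probe, target in PROBE_TARGETS.items()
    }


print(ideal_readings({"Isoform 1": 10.0, "Isoform 2": 2.0}))
# {'a': 12.0, 'b': 10.0, 'c': 10.0, 'd': 2.0, 'e': 2.0}
```

Real readings add noise and probe-specific binding affinity, so in practice the isoform balance is inferred from ratios such as d/b rather than read off directly.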
Motivation
- Why alternatively splice?
- How does it affect the resulting proteins?
- Look at domains
- High-level summary of a protein
- 80% of eukaryotic proteins are multi-domain
- Domains are big relative to an exon
Some Previous Work
- Signatures of domain shuffling in the human genome. Kaessmann, 2002.
- Intron phase symmetry around domain boundaries
- The Effects of Alternative Splicing On Transmembrane Proteins in the Mouse Genome. Cline, 2004.
- Half of TM proteins studied affected by alt-splicing.
Method
- Predict alternative splicing
- Predict protein domains
- Look for effects of alt-splicing on predicted domains: swapping, knockout, clipping
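The slide names three effects but does not define them. The sketch below uses assumed definitions at exon granularity: a domain is knocked out when none of the exons encoding it are kept, clipped when only some are, and swapping is flagged when one isoform loses an intact domain while gaining a different one. The domain-to-exon assignments are hypothetical, and reading frame and exact domain boundaries are ignored.

```python
def domain_status(domain_exons: set[int], isoform_exons: set[int]) -> str:
    """Status of one domain in one isoform, judged only by which exons it keeps."""
    kept = domain_exons & isoform_exons
    if kept == domain_exons:
        return "intact"
    if not kept:
        return "knocked out"
    return "clipped"


def splicing_effects(domains, iso_a, iso_b):
    """Compare every domain between two isoforms.

    domains maps a domain name to the set of exons encoding it. Swapping is flagged
    when one isoform loses an intact domain while gaining a different intact one.
    """
    status = {name: (domain_status(ex, iso_a), domain_status(ex, iso_b))
              for name, ex in domains.items()}
    lost = {n for n, (a, b) in status.items() if a == "intact" and b != "intact"}
    gained = {n for n, (a, b) in status.items() if a != "intact" and b == "intact"}
    return status, bool(lost) and bool(gained)


# Hypothetical domains on the two isoforms from the Affymetrix slide.
domains = {"D1": {2}, "D2": {3}, "D3": {4, 5}}
status, swapping = splicing_effects(domains, iso_a={1, 2, 4, 5}, iso_b={1, 3, 5})
print(status)
print(swapping)  # True
```

Here D1 is knocked out and replaced by D2 (swapping), while D3 loses exon 4 and is clipped.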
Microarray Design
- Genes based on mRNA and EST data in mouse
- Mapped to Feb. 2002 mouse genome freeze
- 500,000 probes (66,000 sets)
- 100,000 transcripts
- 13,000 gene models
Technical work
[Figure: overlaps computed in genome space between gene models, transcripts (generated data) and probes (provided data), yielding a probe-to-transcript mapping]
Example probe-to-transcript mapping:
- E_at_NM_021320 → cc-chr10-000017.82.0
- G6836022_at_J911445 → cc-chr10-000017.91.1
- G6807921_at_J911524_RC → cc-chr10-000018.4.0
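The overlap step can be sketched as interval intersection on the genome: a probe maps to every transcript that has an exon overlapping the probe's position. The class `Interval`, the IDs and the coordinates below are hypothetical; the slide does not describe the implementation.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def map_probes(probes, transcripts):
    """probes: probe id -> Interval; transcripts: transcript id -> list of exon Intervals.
    Returns probe id -> sorted ids of transcripts with an exon overlapping the probe."""
    return {
        pid: sorted(tid for tid, exons in transcripts.items()
                    if any(overlaps(p, ex) for ex in exons))
        for pid, p in probes.items()
    }


# Hypothetical IDs and coordinates, for illustration only.
probes = {"probe1": Interval("chr10", 100, 125), "probe2": Interval("chr10", 300, 325)}
transcripts = {
    "tx1": [Interval("chr10", 50, 150), Interval("chr10", 280, 400)],
    "tx2": [Interval("chr10", 290, 330)],
}
print(map_probes(probes, transcripts))  # {'probe1': ['tx1'], 'probe2': ['tx1', 'tx2']}
```

This compares every probe with every transcript; at the scale of 500,000 probes, sorting by position or using an interval tree would be the more practical design.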
Predicting Alternative Splicing
- Using mouse alt-splicing microarrays
- Data from Manny Ares
- 8 tissues
- 3 replicates of each tissue
Predicting Alternative Splicing
- General approach: clustering, then anti-clustering
[Figure: 107 clusters, with a detail view]
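The slide gives only the outline "clustering, then anti-clustering". One plausible reading, sketched here as an assumption rather than the authors' procedure: summarize a gene by the consensus of its probes across the 8 tissues (after averaging the 3 replicates), then flag probes whose tissue profile does not follow that consensus, since such probes may sit on alternatively spliced exons. The threshold of 0.5 and all data are hypothetical.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is flat."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0


def divergent_probes(profiles, threshold=0.5):
    """profiles: probe id -> expression per tissue (replicates already averaged).

    Step 1 ("clustering"): summarize the gene by the per-tissue median over its probes.
    Step 2 ("anti-clustering"): report probes that do not follow that consensus.
    """
    n_tissues = len(next(iter(profiles.values())))
    consensus = [median(p[t] for p in profiles.values()) for t in range(n_tissues)]
    return sorted(pid for pid, p in profiles.items() if pearson(p, consensus) < threshold)


base = [1, 8, 2, 9, 3, 7, 2, 6]  # made-up constitutive pattern across 8 tissues
profiles = {
    "a": base,
    "b": [2 * v for v in base],
    "c": [v + 1 for v in base],
    "d": [9, 1, 1, 1, 1, 1, 1, 1],  # hypothetical exon included mainly in the first tissue
}
print(divergent_probes(profiles))  # ['d']
```

Correlation ignores scale and offset, so probes b and c still count as following the gene even though their raw intensities differ from a.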
Gene Expression Measurement
- mRNA expression represents the dynamic aspects of the cell
- mRNA expression can be measured with the latest technology
- mRNA is isolated and fluorescently labeled
- The labeled mRNA is hybridized to the target; the level of hybridization corresponds to light emission, which is measured with a laser
Gene Expression Microarrays
- The main types of gene expression microarrays
- Short oligonucleotide arrays (Affymetrix)
- cDNA or spotted arrays (Brown/Botstein).
- Long oligonucleotide arrays (Agilent Inkjet)
- Fiber-optic arrays
- ...
Affymetrix Microarrays
[Figure: raw image of a 1.28 cm chip]
- 10^7 oligonucleotides, half Perfectly Match the mRNA (PM), half have one Mismatch (MM)
- Raw gene expression is the intensity difference PM - MM
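A probe set contains several PM/MM pairs, so the per-pair differences have to be summarized into one value per gene. The sketch below takes a plain, unweighted mean of PM - MM; early Affymetrix software reported an "average difference" along these lines but with outlier handling this sketch omits. The intensities are made up.

```python
def average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean of PM - MM over the probe pairs of one probe set."""
    if not pm or len(pm) != len(mm):
        raise ValueError("need matching, non-empty PM and MM lists")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Made-up intensities for a probe set with four PM/MM pairs.
pm = [1200, 950, 1100, 800]
mm = [300, 400, 250, 500]
print(average_difference(pm, mm))  # 650.0
```

When mismatch probes are brighter than their perfect-match partners the result is negative, so raw expression values of this kind can fall below zero.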
Microarray Potential Applications
- Biological discovery
- new and better molecular diagnostics
- new molecular targets for therapy
- finding and refining biological pathways
- Recent examples
- molecular diagnosis of leukemia, breast cancer, ...
- appropriate treatment for genetic signature
- potential new drug targets
Microarray Data Analysis Types
- Gene Selection
- find genes for therapeutic targets
- avoid false positives (FDA approval?)
- Classification (Supervised)
- identify disease
- predict outcome / select best treatment
- Clustering (Unsupervised)
- find new biological classes / refine existing ones
- exploration
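For gene selection, one simple option (an illustrative choice, not necessarily what these slides use) is to rank genes by a signal-to-noise score, (mean of class A - mean of class B) / (sd of class A + sd of class B), and keep the top k. The gene names and values below are made up.

```python
from statistics import mean, stdev


def signal_to_noise(a: list[float], b: list[float]) -> float:
    """(mean_a - mean_b) / (sd_a + sd_b); needs >= 2 samples per class and some spread."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def top_genes(expr, labels, k, pos="ALL", neg="AML"):
    """Rank genes by |signal-to-noise| between two classes and keep the top k."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == pos]
        b = [v for v, lab in zip(values, labels) if lab == neg]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


# Made-up values for three genes across six labeled samples.
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expr = {
    "g1": [100, 110, 105, 20, 25, 30],
    "g2": [50, 60, 55, 52, 58, 54],
    "g3": [10, 12, 11, 80, 90, 85],
}
print(top_genes(expr, labels, k=2))  # ['g3', 'g1']
```

With thousands of genes and few samples, some genes will score highly by chance, so a ranking like this should be checked against permuted labels or held-out samples before it is trusted.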
Microarray Data Mining Challenges
- too few records (samples), usually < 100
- too many columns (genes), usually > 1,000
- Too many columns are likely to lead to false positives
- for exploration, a large set of all relevant genes is desired
- for diagnostics or identification of therapeutic targets, the smallest set of genes is needed
- the model needs to be explainable to biologists
Microarray Data Classification
[Figure: microarray chips → images scanned by laser → datasets → data mining model → prediction for a new sample: ALL or AML]
Example dataset values:
Gene               Value
D26528_at            193
D26561_cds1_at       -70
D26561_cds2_at       144
D26561_cds3_at        33
D26579_at            318
D26598_at           1764
D26599_at           1537
D26600_at           1204
D28114_at            707
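The slide does not say which data mining model is used. As a stand-in, the sketch below uses a nearest-centroid classifier: average the training samples of each class (ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia) and assign a new sample to the closest class average. The two-gene training data are made up.

```python
from math import dist
from statistics import mean


def train_centroids(samples, labels):
    """Average the expression vectors of each class (e.g. 'ALL' and 'AML')."""
    centroids = {}
    for cls in sorted(set(labels)):
        rows = [s for s, lab in zip(samples, labels) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, sample):
    """Assign the new sample to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda cls: dist(centroids[cls], sample))


# Made-up training data: two selected genes per sample.
samples = [[100, 10], [110, 12], [105, 11], [20, 80], [25, 90], [30, 85]]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
centroids = train_centroids(samples, labels)
print(predict(centroids, [95, 15]))  # ALL
print(predict(centroids, [28, 88]))  # AML
```

On real data the genes would normally be standardized first, since raw values span very different ranges, and the model would be evaluated on samples it was not trained on.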