Title: Gene Finding with HMMs
1. SLAMming the human and mouse genomes
Marina Alexandersson, Simon Cawley, Lior Pachter
2. Example: a human/mouse ortholog
Proliferating cell nuclear antigen (PCNA)
[Figure: human locus and mouse locus, with the alignment and the CDS marked]
3. Observation
- Finding the genes will help to find biologically meaningful alignments.
- Finding a good alignment will help in finding the genes.
5. Using GPHMMs (generalized pair HMMs) for cross-species gene finding
- Given a pair of syntenic sequences, predict genes by estimating the hidden state sequence.
- Predict exon pairs using the single most likely sequence of hidden states (Viterbi).
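The Viterbi step can be illustrated on a much simpler model than a GPHMM. The sketch below is an ordinary single-sequence HMM with two hypothetical states, "C" (coding-like, GC-rich) and "N" (noncoding-like, AT-rich); the state names and all probabilities are invented for illustration and are not SLAM parameters. A GPHMM version would additionally emit pairs of subsequences and model state durations, which this sketch does not attempt.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for an ordinary single-sequence HMM."""
    # best[i][s]: log-probability of the best path ending in state s at position i
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]  # back[i][s]: predecessor of s on that best path
    for i in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[i - 1][p] + log_trans[p][s])
            best[i][s] = best[i - 1][prev] + log_trans[prev][s] + log_emit[s][obs[i]]
            back[i][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(obs) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return best[-1][last], path

def logs(table):
    """Convert a dict of probabilities into a dict of log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Toy parameters (assumptions, chosen only to make the example readable).
STATES = ["C", "N"]
LOG_START = logs({"C": 0.5, "N": 0.5})
LOG_TRANS = {"C": logs({"C": 0.9, "N": 0.1}), "N": logs({"C": 0.1, "N": 0.9})}
LOG_EMIT = {
    "C": logs({"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}),
    "N": logs({"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}),
}

score, path = viterbi("AAAAGGGG", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print("".join(path))  # NNNNCCCC
```

Working in log space avoids underflow on long sequences, which matters at genomic scale.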
6. A GPHMM implementation: SLAM
- SLAM components
  - Splice sites (variable-length Markov models).
  - Introns and intergenic regions (2nd-order Markov models, independent geometric lengths, CNS states).
  - Coding sequences (3-periodic Markov models, generalized length distributions, protein-based pair HMM).
- Input
  - Pair of syntenic genomic sequences.
  - Approximate alignment.
- Output
  - CNS and CDS predictions in both sequences.
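To make the "3-periodic Markov model" component concrete, here is a minimal sketch of the idea: the emission distribution depends on the codon position (0, 1, or 2) of each base. For brevity the sketch uses order 0 (no dependence on preceding bases), and the probability tables are invented; both are assumptions and say nothing about SLAM's actual parameters or model order.

```python
import math

# One hypothetical base distribution per codon position (toy numbers).
POSITION_MODELS = [
    {"A": 0.5, "C": 0.2, "G": 0.2, "T": 0.1},  # codon position 0
    {"A": 0.2, "C": 0.2, "G": 0.1, "T": 0.5},  # codon position 1
    {"A": 0.1, "C": 0.2, "G": 0.5, "T": 0.2},  # codon position 2
]

def periodic_log_prob(seq, position_models, frame=0):
    """Log-probability of seq under a 3-periodic, order-0 model.

    frame is the codon position assigned to the first base of seq.
    """
    return sum(
        math.log(position_models[(frame + i) % 3][base])
        for i, base in enumerate(seq)
    )

def best_frame(seq, position_models):
    """Return the frame (0, 1, or 2) under which seq is most probable."""
    return max(range(3), key=lambda f: periodic_log_prob(seq, position_models, f))

print(best_frame("ATGATG", POSITION_MODELS))  # 0
```

Because each frame assigns different distributions to the same bases, comparing the three scores is what lets a periodic model keep track of reading frame across an exon.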
7. CNS
[Figure: diagram with labels D, Y, Z, M, I]
8. Approximate alignment
Currently generated by running AVID and then relaxing
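The slide does not say how the relaxation is carried out. One plausible reading, shown here purely as an assumption, is that a single alignment is widened into a band: each position in one sequence may pair with any position in the other that lies within a fixed window of its aligned partner. The function names, the pair-list input format, and the fixed window are all hypothetical and are not taken from AVID or SLAM.

```python
def relax_alignment(aligned_pairs, len_y, window):
    """Widen an alignment into a band of allowed cells.

    aligned_pairs: list of (i, j) index pairs, position i in x aligned to j in y.
    Returns a dict mapping i to an inclusive range (lo, hi) of positions in y,
    clamped to the valid index range 0 .. len_y - 1.
    """
    band = {}
    for i, j in aligned_pairs:
        lo = max(0, j - window)
        hi = min(len_y - 1, j + window)
        band[i] = (lo, hi)
    return band

def in_band(band, i, j):
    """True if cell (i, j) is allowed by the relaxed alignment."""
    lo, hi = band[i]
    return lo <= j <= hi

band = relax_alignment([(0, 0), (1, 1), (2, 3)], len_y=5, window=1)
print(band)
```

Restricting dynamic programming to such a band is what keeps a pair model tractable on long sequences: the work grows with the band area rather than with the product of the two sequence lengths.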
10. Example: Rosetta set
Nucleotide level
          Sn     Sp     AC
Genscan   .975   .908   .929
SLAM      .951   .981   .960
Rosetta   .935   .978   .949
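The slide reports Sn, Sp, and AC without defining them. The sketch below assumes the usual gene-finding conventions at the nucleotide level: Sn = TP/(TP+FN), Sp = TP/(TP+FP), and AC as the approximate correlation of Burset and Guigó. The function name and the toy annotation are hypothetical, and the code assumes none of the four denominators is zero.

```python
def nucleotide_accuracy(truth, pred):
    """Compute (Sn, Sp, AC) from per-nucleotide coding labels.

    truth, pred: equal-length sequences of booleans, True where the
    nucleotide is annotated or predicted as coding.
    """
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    sn = tp / (tp + fn)  # fraction of true coding bases that were predicted
    sp = tp / (tp + fp)  # fraction of predicted coding bases that are true
    # Approximate correlation: average of four conditional probabilities,
    # rescaled to the range [-1, 1].
    acp = (tp / (tp + fn) + tp / (tp + fp) + tn / (tn + fp) + tn / (tn + fn)) / 4
    ac = 2 * (acp - 0.5)
    return sn, sp, ac

# Toy example: the prediction is shifted one base to the right.
truth = [c == "1" for c in "0011110000"]
pred = [c == "1" for c in "0001111000"]
print(nucleotide_accuracy(truth, pred))
```

Note that Sp in this convention is what other fields call precision, which is why a predictor can trade Sn against Sp as in the table.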
11. Human genome and mouse assembly
http://pipeline.lbl.gov/
- Build an approximate alignment for every sequence pair.
- Realign and annotate with SLAM.
12. http://bio.math.berkeley.edu/slam/mouse/