Welcome to BNFO 601

Slides: 106
Provided by: paulfa2
Transcript and Presenter's Notes

Title: Welcome to BNFO 601


1
Welcome to BNFO 601: Integrated Bioinformatics
The villains gallery
Paul Fawcett pfawcett_at_vcu.edu
Jeff Elhai elhaij_at_vcu.edu
2
Welcome to BNFO 601: Course Organization
www.vcu.edu/csbc/bnfo601
  • Scientific problems - bioinformatic solutions!
  • For each topic
  • Lecture and web supplements
  • Discussion and computer time
  • Problem sets

Focus on principles, not soon-to-be obsolete software!!
3
Optional Textbook BNFO 601
Beginning Perl by Simon Cozens and Peter Wainwright. 700 pages. Wrox Press Inc.
Also available for free on the web!!! http://learn.perl.org/library/beginning_perl
4
What is bioinformatics, anyway?
One reasonable definition: the study and application of computational and
statistical methods for the management and analysis of biological information.
This is a very broad definition, so there are many flavours of bioinformatics.
5
Why do we need bioinformatics?
Year     Base pairs    Sequences
1982         680338          606
1983        2274029         2427
1984        3368765         4175
1985        5204420         5700
1986        9615371         9978
1987       15514776        14584
1988       23800000        20579
1989       34762585        28791
1990       49179285        39533
1991       71947426        55627
1992      101008486        78608
1993      157152442       143492
1994      217102462       215273
1995      384939485       555694
1996      651972984      1021211
1997     1160300687      1765847
1998     2008761784      2837897
1999     3841163011      4864570
2000    11101066288     10106023
2001    15849921438     14976310
2002    28507990166     22318883
Available biological data is growing exponentially!
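One way to put a number on "exponentially" is to fit a straight line to the logarithm of the base-pair counts and read off a doubling time. The sketch below does that in Python (the course itself uses Perl). The figures are copied from the slide; the name `doubling_time_years` is ours and is not part of the course material.

```python
import math

# Base-pair totals per year, copied from the slide's table.
BASE_PAIRS = {
    1982: 680338, 1983: 2274029, 1984: 3368765, 1985: 5204420,
    1986: 9615371, 1987: 15514776, 1988: 23800000, 1989: 34762585,
    1990: 49179285, 1991: 71947426, 1992: 101008486, 1993: 157152442,
    1994: 217102462, 1995: 384939485, 1996: 651972984, 1997: 1160300687,
    1998: 2008761784, 1999: 3841163011, 2000: 11101066288,
    2001: 15849921438, 2002: 28507990166,
}

def doubling_time_years(series):
    """Least-squares slope of log2(value) against year, inverted."""
    years = sorted(series)
    logs = [math.log2(series[year]) for year in years]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx  # doublings per year
    return 1.0 / slope

print("Estimated doubling time (years):", round(doubling_time_years(BASE_PAIRS), 2))
```

A roughly straight line on the log scale is what "exponential" means here; for these figures the fitted doubling time falls between one and two years.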
6
Why do we need bioinformatics?
Humans have a hard time finding patterns in large
or complex data sets!
7
MDVEEFLSRVDAGELVISLGDLSGAILSEVDLSGINLSGANLSGLWKNLS
TILSNTLWDIKEADALATIREIQDESNRAHALIALADKISLPPDLLSEAL
TVARVDEADCADALIALARKLPPDLLSEALATAAEREIQDEYFRTSTLIE
LKLPSVLSEALAAAREIQDEYFRASTLIADEYLAEKLPSVLSEALAASRE
IQFRADALRELAQKLPPDLLSEALAAVREIQPEYLRADALIALVEKLPSV
LSEALAAIREIQDEYLHADALRELVQKLPPDLLGEVLAAATEIRGGYPHT
NPLRELAEKLPPDLLSEALAAAREIQDESNRAHALRELAEKLPPDLLSEA
LTATREIQSEYHRASTLRALAQKLPPDLLSEALAAAREIQDESNRASTLR
ELAEKLPSVLPEALAAVRKIRHKSNRAYGLIALAEKLPSVLPEALAAATE
IEPEYHRASTLRELAEKLPPDLLSELTAISEIQPKSNRADALIALAEKLP
PDLLSEALAAIREIQDESNRAHALIALAEKLPPDLLSEALAAIREIQDES
NRAHALIALAQKLPPDLLSEALAATREIQSKSNRVHALIALAQKLPSVLP
EALAAATEIQDESNRASTLRELAEKLPPDLLSEALAAIREIQPKSNRVHA
LIALAQKLPSVLPEALAAIREIHHEYHRDNALRELAEKLPPNLLSEALAV
IREIHYESNRTNALIALAKKLPSVLPEALAAVRKIRDKSNRIYALRELAD
KLPSVLPEALATAREIHDESYRADALKELAEKLPPDLLSEALTAIREIHD
ESYRADALIALAEKLPSVLPEALAAATVIRPESYRADALRDLAQKLPPDL
LSEALAAIREIQSESNRAHALIALAEKMSLHNPSLSNVSANCVNLNHSTL
TEAKLNQSDLRYGNLKGANLNKANLSRAFLNHADLSNTMLAQSNLSGTNL
RNANLRNANLIMREEIRKVNQSLGESPKFGPFTGRQFVIFAGIFCIVFGL
LCLIIGLDIFWGLGFAFWSSFSVALLSGDQPYIYWSKVYPIVPRWTRGYA
TYTSPHLKKKVGTRKVKLTRSSKPKTLNPFEDWLDLTTIVRLKKDAYTVG
AYLLSKKNLTDSNNTLQLIFGFSCTGIHPLFNSEQEIEAVAKIFESGCKE
IPPGEKITFRWSSFCDDSDAEQYLMQRINNSSSLECEFLDWGRLARTQKL
TNQRARKDIKLNIYWSFTVSSEALETSDPVDKFLAKLANFVQRRFTDSGV
NQLTKKRFTQILTKALEASLRYQQILTEMGLNPQPKTDKDLWQELCKNIG
AKTVIAPHTLVFDEQGVREEIDEKAVFDKPIEIINQPHLSSIILNNGVPF
ADKRWICLPTGENKKFVGVMVLTRKPEIFASTKHQIRFLWDLFSRNNIFD
VEIITEFSPADRGITRAAQQMITKRSRALDLNVQQKKSIDVSAQINVERS
VEAQRQLYTGDVPLNLSLVVLVYRDTPEEIDDACRLISGYISQPTELTRE
VEYAWLIWLQTLLIRLEPILLRPYNRRLTFFASEILGLTNIVQNSPADEQ
GFELIADESDSPLHLDLSKTKNILILGTTGSGKSVLVSSIIGECQAQDMS
VLMIDLPNDDGTGTFGDYTPYHNGFYFDISKESNNLVQPLDLSKIPPDEW
EDRLQAHRNDVNLIVLQLVLGSQTFDGFLSQTIESLIPLGTKAFYDHADI
QRRFAKAKKDGLGSAAWDDTPTLADMERFFSKEHISLGYEDENVDRALNY
IRLRFQYWRNSSIGNAICRPSTFDTDAKLITFALTNLQSSKDAEVFGMSA
YIAASRQSLSAPNSVFFMDEASVLLRFAALSRLVGRKCATARKGGCRVML
AAQDILSIANSEAGEQILQNMPCRLIGRIVPGAAKSFTEHLGIPKDIIDK
NESFRPNIKQLYTLWLLDYNN
Biological data can be confusing!
8
But it is rich in information content!
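As a small illustration of pulling a pattern out of such a sequence, the Python sketch below (the course itself uses Perl) counts every overlapping 9-residue substring in a two-line excerpt of the protein shown on the slide and reports those that occur more than once. The helper name `kmer_counts` and the window length of 9 are our own choices for illustration.

```python
from collections import Counter

# Two consecutive lines of the protein sequence, copied from the slide.
EXCERPT = (
    "TILSNTLWDIKEADALATIREIQDESNRAHALIALADKISLPPDLLSEAL"
    "TVARVDEADCADALIALARKLPPDLLSEALATAAEREIQDEYFRTSTLIE"
)

def kmer_counts(sequence, k):
    """Count every overlapping substring of length k."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = kmer_counts(EXCERPT, 9)
repeated = {kmer: n for kmer, n in counts.items() if n > 1}
for kmer, n in sorted(repeated.items()):
    print(kmer, n)
```

Even in this short excerpt the stretch PPDLLSEAL turns up twice, a repeat that is easy to miss by eye.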
9
Where did bioinformatics come from?
Evolved from, but is distinct from, the intellectual traditions of:
  • Genetics
  • Biochemistry
  • Molecular Biology
  • Computer Science
  • Probability and Statistics
  • Genomics

10
Pre-genomic Molecular Biology
The cell as a factory
11
Pre-genomic Molecular Biology
12
Pre-genomic Molecular Biology
13
Pre-genomic Molecular Biology
14
Pre-genomic Molecular Biology
15
Pre-genomic Molecular Biology
The cell as a black box
16
Pre-genomic Molecular Biology
How do we figure out how cars are made?
Genetic approach
Biochemical approach
17
Pre-genomic Molecular Biology: Biochemist's Approach
18
Pre-genomic Molecular Biology: Biochemist's Approach
19
Pre-genomic Molecular Biology: Biochemist's Approach
20
Pre-genomic Molecular Biology: Biochemist's Approach
An inherently reductionist approach!
21
Pre-genomic Molecular Biology
How do we figure out how cars are made?
Genetic approach
Biochemical approach
22
Pre-genomic Molecular Biology: Geneticist's Approach
23
Pre-genomic Molecular Biology: Geneticist's Approach
24
Pre-genomic Molecular Biology: Geneticist's Approach
Isolation of a Defective Gene
25
Pre-genomic Molecular Biology: How we viewed the world
  • One component at a time
  • Highly filtered perception
  • Many local viewpoints
  • Subject to ascertainment bias

26
Post-genomic Molecular Biology
A major goal is to achieve a synoptic, integrated
understanding of cell function
27
Post-genomic Molecular Biology: Bioinformaticist's Approach (short term)
28
Post-genomic Molecular Biology: Bioinformaticist's Approach (long term)
29
What is Bioinformatics?
30
TGAGACACATATTTTTGATATTCCAGTTGTTGCAATC GAATGTAAAACA
TATTTAGATCTTTAAATGTATGGTAC ATTCAAGATCCAACCTTCATTCT
AGTGTTTAAAGAGAAC TGATTTGTTTGCAGGGGCAGGAGGCTTTGGTTT
AGGTTTTG AAATGGCAGGCTTCTCTGTACCTTTATCTGTTGAAATTGAT
ACCTGGGCTTGTGATACACTACGCTACAACCGCCCTGATTCAACAGTTAT
TCAAAATGATATCGGTAACTTTAGTACAGAAAATGACGTTAAGAATATCT
GCAACTTTAAACCTGATATTATTATTGGCGGGCCTCCATGCCAGGGATTT
AGTATTGCTGGGCCAGCCCAAAAAGATCCTAAAGATCCTAGAAATGG AA
TTATCAAACAAATCATATGATCAGAATAATCGCCGTTTAAATCCTCATAA
AACT TTTATTCATCAACTTTGCACAATGGATAAAATTTCTTGAACCTAA
AGCGTTTGTCATGGAAAACGTAAAAGGATTGCTATCAAGGAAAAATGCAG
AAGGTTTTAAAGTTATAGATA CTTCTCACTAAATATAAAGATTTTTTAG
ATCAGCAGCATTATGCAGAAAAATTTGATTCA AGACGACGGTACTGGTT
TAACCAGCCAAATGTTCTTTCTACTACCCACCGTTTGGGCAAAACCT TT
ATTAAGAAAACATTTGAAGAACTTGGTTATTTTGTCGAAGTATGGGTTTT
AAATGCTGCGGAATATGGCATTCCGCAAATTAGAGAACGTATTTTTATTG
TTGGCAATAAAAAAGGTAAAGTACTAGGTATGAGTATTATACCTGCACTA
ACTTTGTGGGACGCAATATCAGACTTACCAGAACTTAATGCGCGTGAAGG
AAGTGAAGAGCAACCCTATCATTTAAAACCTCAAAATACTTATCAGACTT
GGGCTAGAAATGGTAGTGCTACGCTTTACAATCATGTTGCAATGGAACAT
TCTGACCGTTTAGTAGAACGTTTCCGGCATATAAAATGGGGTGAATCCAG
TTCGGATGTATCTAAAGAACATGGAGCTAGACGACGTAGTGGTAATGGTG
AATTATCAAACAAATCATATGATCAGAATAATCGCCGTTTAAATCCTCAT
AAACCGTCTCACACTATTGCTGCGTCATTCTATGCTAATTTTGTCCATCC
TTTTCAACATCGAAATTTAACAGCCCGTGAAGGAGCTAGAATCCAATCTT
TTCCAGATAACTATAGATTTTTTGGAAAAAAAACTGTCGTATCTCATAAA
CTATTGCATCGAGAAGAAAGATTTGATGAAAAATTTCTTTGTCAATATAA
TCAAATCGGTAATGCTGTACCCCCTCTTCTCGCTAAAGTAATTGCACATC
ATCTTCTAGAGAAATTAGAGTTATGCCAACAACTGATAGAAATCCTCTAG
TGCATGGATCAAATCTTGAACAAAAAGAGAATCATCGTACAAAATACAGA
GATACTGAAAGCAGGACTTTCCTTAGAGAAATCAGAACTGAATATGACAA
ATGGCATAAAGCAAATATGAACCTGGTTGGACCAAAATCAGAAATTACTG
ACCAAGATGATTCAATTATTACTCAAAGAGTGGAACTTCTCACTAAATAT
AAAGATTTTTTAGATCAGCAGCATTATGCAGAAAAATTTGATTCAAGATC
CAACCTTCATTCTAGTGTTTTAGAGACCATTTATAAAGTAAATCTTTAGA
CGACTAGACGACGTAGCATAATACGAGTCATAACGGCATATATGGCAGCC
TCACTCATTTCTGGGAGACGCTCATAATCCTTACTGAGACGACGGTACTG
GTTTAACCAGCCAAATGTTCTTTCTACTACCCACCGTTTGGGCAAAACCT
GAAATTCTTGATTAGTACGCCGGATTACCTCAACATGAGCTTGAATCATC
AGCCAAACAGAGAGCGCAAATTTATCACCGTCATAGCCGGAATCAACCCA
GATGACTTCAACTTTTTCCAGTAATTCTGGACGCTCTTCTAACAGTTCCA
TCAAAGTATAGGCGGCAAGTAATCTTTCTCCAGCATTTGCTTCACTTACA
ACCACTTTTAACAAAAGTCCCAGACTATCAACCAAAGTTTGCCGCTTTCG
TCCTTTTACCTTCTTGCCACCATCAAAACCGTACACATCCCCCTTTTTTC
AGTCGTTTTTACCGACTGGCTGTCTGCCGCGATCGCCGTGGGTTGAGTTG
ACTTCCCCATTTTTTGACGAACTTGATCGCGCAAAGTATGATTCATTTCA
GTTGAACTAGGAGGAAAATCCCCTGGAAGCATATCCCACTGACAACCTGT
TTTCAGATGGTAGTAGATAGCGTTGCATACTTCTCGCATATCAGTTGTTC
GGGGATGCCCACCGCATTTAGCGGGTGGAATCAAAGGAGCTAAAATTGCC
CATTCTGAGTCATTAAGGTCTGTAGAATAAGACTTTCGTCTCATTGTTTC
CTATGTAAATACACTCTACAAACAGTATCTTATCGCTGCCTTTTTATCTT
AGCTCTCCTTTAGATTTACTTTATAAATAGCCTCTTAGAAGAATTTCTTT
ATTATTTATTTAAAGATTTAGTACAAGATTTCGGGCAGAACGCTCTTATT
GGTAAGTCACACACGTTCAAAGATATTTTCTTCGTACCACCAAAATATTC
TGAAATGCTCAAGCGACCTTATGCGCGAATTGAGAGAAAAGATCATGATT
TCGTAATTGGTGCAACTGTTCAAGCATCGCTTGAAGCAGCACCTCCTCCA
GAACAAAACCATGCTTGAGGGATCTTCACGCGCAGCAGAGGATTTAAAAG
CGAGAAATCCTAACAGTTTATACCTTGTGGTTATGGAATGGATAAAACTG
ACCAATGATGTAAATTTACGAAAATATAAAGTTGATCAAATTTATGTACT
ACGTCAGCAAAAAAATACTGATAGAGAGTTTAGGTATGAGTCAACTTACA
TAAAAAAT
What is genomic data?
31
Partial Hierarchy of Genomic Data
  • DNA Sequences
  • Contigs of assembled sequences
  • Predicted introns, exons, promoters, etc.
  • Genes
  • RNA sequences
  • Predicted gene products, proteins
  • Chromosomes
  • Genome

Sequence Analysis is therefore a fundamental
component of bioinformatics!
32
E. coli: What makes it kill?
33
E. coli: What makes it kill?
Escherichia coli . . . haemorrhagic colitis
34
E. coli: What makes it kill?
E. coli K-12
E. coli O157:H7
35
E. coli: What makes it kill?
E. coli K-12
E. coli O157:H7
36
E. coli: What makes it kill?
37
Metabolomics
What is Bioinformatics?
38
Towards a Treatment for Sleeping Sickness
Prevalence: 66 million sufferers
Standard treatment: derivative of arsenic
39
Towards a Treatment for Sleeping Sickness
Trypanosomes: dependent on glycolysis
Humans: dependent on glycolysis OR oxidative metabolism
IDEA: Identify a drug that selectively blocks glycolysis
40
Towards a Treatment for Sleeping Sickness
How to block glycolysis?
  • A dozen enzyme targets
  • $1 billion per target

Need a method to predict effectiveness!!
41
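Towards a Treatment for Sleeping Sickness
Glucose + ATP → Glucose-6-phosphate + ADP (hexokinase)
$d[\mathrm{G6P}]/dt = k_3\,[\mathrm{glucose}][\mathrm{ATP}]$
42
Towards a Treatment for Sleeping Sickness
Glucose + ATP → Glucose-6-phosphate + ADP (hexokinase)
Model of glycolysis
$d[\mathrm{G6P}]/dt = k_3\,[\mathrm{glucose}][\mathrm{ATP}]$
$d[\mathrm{F6P}]/dt = k_4\,[\mathrm{G6P}]$
$d[\mathrm{FDP}]/dt = k_6\,[\mathrm{F6P}][\mathrm{ATP}]$
...
$d[\mathrm{pyruvate}]/dt = k_{20}\,[\mathrm{PEP}][\mathrm{ADP}]$
43
Towards a Treatment for Sleeping Sickness
Glucose + ATP → Glucose-6-phosphate + ADP (hexokinase)
Model of glycolysis
$d[\mathrm{G6P}]/dt = k_3\,[\mathrm{glucose}][\mathrm{ATP}]$
$d[\mathrm{F6P}]/dt = k_4\,[\mathrm{G6P}]$
$d[\mathrm{FDP}]/dt = k_6\,[\mathrm{F6P}][\mathrm{ATP}]$
...
$d[\mathrm{pyruvate}]/dt = k_{20}\,[\mathrm{PEP}][\mathrm{ADP}]$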
44
Towards a Treatment for Sleeping Sickness
Run model with different realities
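A toy version of "running the model with different realities" can be sketched in Python (the course itself uses Perl). It covers only the first three rate equations named on the slides and makes several assumptions that are ours, not the lecture's: glucose and ATP are held constant, each intermediate is consumed by the next step, all rate constants and concentrations are arbitrary illustrative numbers, and integration is simple forward Euler. The function name `simulate_fdp` is hypothetical.

```python
def simulate_fdp(k3, k4, k6, glucose=5.0, atp=1.0, dt=0.001, steps=10000):
    """Forward-Euler integration of a three-step chain; returns FDP produced.

    Assumed chain: glucose -> G6P -> F6P -> FDP, with mass-action rates
    k3*[glucose][ATP], k4*[G6P] and k6*[F6P][ATP].
    """
    g6p = f6p = fdp = 0.0
    for _ in range(steps):
        v3 = k3 * glucose * atp  # hexokinase step
        v4 = k4 * g6p            # G6P -> F6P
        v6 = k6 * f6p * atp      # F6P + ATP -> FDP
        g6p += (v3 - v4) * dt
        f6p += (v4 - v6) * dt
        fdp += v6 * dt
    return fdp

# Three "realities": no drug, 90% inhibition of step 3, 90% inhibition of step 4.
normal = simulate_fdp(k3=1.0, k4=1.0, k6=1.0)
block_k3 = simulate_fdp(k3=0.1, k4=1.0, k6=1.0)
block_k4 = simulate_fdp(k3=1.0, k4=0.1, k6=1.0)

print("FDP, no inhibition:  ", round(normal, 2))
print("FDP, step 3 inhibited:", round(block_k3, 2))
print("FDP, step 4 inhibited:", round(block_k4, 2))
```

In this toy setting, inhibiting the first step cuts output more than inhibiting the second, because the substrate of the second step simply accumulates until flux recovers. A real model would need measured kinetics before drawing any conclusion about drug targets.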
45
Metabolomics
What is bioinformatics?
46
What is bioinformatics, revisited
How to extract biological meaning from
overwhelming information
47
A Walk in the Forest
Photo courtesy of www.webshots.com
48
Observation
Photos courtesy of www.webshots.com and Peter
Smallwood
49
Observation
50
Observation
51
Observation
52
Experiment
53
Filters: Information reducers
A squirrel filter!
54
Filters: Information reducers
A molecule filter
55
Filters: Information reducers
A sequence filter
How organism is made / How organism works
56
From Sequence to Organism: How does Nature do it?
57
From Sequence to Organism: How does Nature do it?
Genetic code
Rules of folding
58
From Sequence to Organism: How does Nature do it?
Genetic code
Gives us
  • Custom antibiotics

59
From Sequence to Organism: How does Nature do it?
ATGACTTATGATCAACGCACAGGGCTA
Gives us
  • Custom antibiotics
  • Custom antibodies
  • Custom enzymes
  • New materials

60
From Sequence to Organism: How does Nature do it?
ATGACTTATGATCAACGCACAGGGCTA
  • Begin transcription
  • End transcription
  • Splice transcript
  • Begin translation

Rules of transcriptional and post-transcriptional control
61
From Sequence to Organism: How does Nature do it?
ATGACTTATGATCAACGCACAGGGCTA
  • Begin transcription
  • End transcription
  • Splice transcript
  • Begin translation

Rules of transcriptional and post-transcriptional control
62
From Sequence to Organism: How does Nature do it?
DNA → Functional protein
Natural filters/transformations
  • Selective transcription
  • Selective processing
  • Translation
  • Folding

63
From Sequence to Organism: How can we do it?
DNA → Functional protein
Natural filters/transformations

Simulation of Nature
Surrogate Processes
64
From Sequence to Organism: How can we do it?
Simulation of Nature
"Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune..."
"We must give our military every tool and weapon it needs to prevail..."
???
65
From Sequence to Organism: How can we do it?
Surrogate Processes
"Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune..."
Utterance of W. Shakespeare
"We must give our military every tool and weapon it needs to prevail..."
Utterance of George W. Bush
Word frequency
66
From Sequence to Organism: How can we do it?
Surrogate Processes
"Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune..."
Utterance of W. Shakespeare
"We must give our military every tool and weapon it needs to prevail..."
Utterance of George W. Bush
Word frequency, words/sentence
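The two surrogate measures named on the slide, word frequency and words per sentence, are easy to compute without any understanding of what the text means. The Python sketch below (the course itself uses Perl) shows one way, using the two quotations from the slide. The function name `surrogate_stats` and the simple tokenising rules are our assumptions.

```python
import re
from collections import Counter

SHAKESPEARE = ("Whether tis nobler in the mind to suffer the slings and "
               "arrows of outrageous fortune...")
BUSH = ("We must give our military every tool and weapon it needs to "
        "prevail...")

def surrogate_stats(text):
    """Return word frequencies and mean words per sentence for a text."""
    # A sentence is taken to be anything between runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "word_freq": Counter(words),
        "words_per_sentence": len(words) / len(sentences),
    }

for label, text in (("Shakespeare", SHAKESPEARE), ("Bush", BUSH)):
    stats = surrogate_stats(text)
    print(label, stats["words_per_sentence"], stats["word_freq"].most_common(3))
```

Two single sentences are far too little to tell authors apart. The point is only that such statistics can stand in for a process we cannot simulate directly.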
67
From Sequence to Organism: How can we do it?
Surrogate filters
Natural filters/transformations
  • Selective transcription
  • Selective processing
  • Translation
  • Folding/function

TCTACTTATA TTCAATCCAC AGGGCTACAC CTAGTTCTTG
AAGAGTCTGT TGAATGAACA CATACATGGT TTATCTGTTT
TTCTGTCTGC TCTGACCTCT GGCAGCTTTC CACTAGTTTC
TGGATTTCGG AACTCTAGCC TGCCCCACTC
My sequence
68
From Sequence to Organism: How can we do it?
Surrogate filters
  • Gene finders
Natural filters/transformations
  • Selective transcription
  • Selective processing
  • Translation
  • Folding/function

Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu...
Function?
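The translation step is the one natural filter whose rules are fully known, so it can be reproduced exactly. The Python sketch below (the course itself uses Perl) applies the standard genetic code to the short sequence ATGACTTATGATCAACGCACAGGGCTA shown earlier in the deck. The names `CODON_TABLE` and `translate` are ours, and only the standard code is assumed.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

def translate(dna):
    """Translate a DNA string, reading codons from position 0."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codons]

print("-".join(translate("ATGACTTATGATCAACGCACAGGGCTA")))
# prints Met-Thr-Tyr-Asp-Gln-Arg-Thr-Gly-Leu
```

Getting from that string of amino acids to a function is the hard part, which is where the surrogate filters on the following slides come in.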
69
From Sequence to Organism: How can we do it?
Surrogate filters
  • Gene finders
  • Similarity finders
Natural filters/transformations
  • Selective transcription
  • Selective processing
  • Translation
  • Folding/function

globin?
globin
70
Surrogate Filters: Gene finders
Start/Stop codon search
Look for stop codons (TAA, TAG, TGA)
CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAA
TGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA
71
Surrogate Filters: Gene finders
Start/Stop codon search
Look for stop codons (TAA, TAG, TGA)
CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAA
TGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA
TTGCATATTTCCATTTGTACATGTCACAAAGTTTATTCATTTTTAATATT
TGCTGAGATCATGTTAGAGGTGTACGGAGGGGCGTGGAG
Highly inaccurate
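A stop-codon search of this kind can be sketched in a few lines of Python (the course itself uses Perl). The code scans all three reading frames on both strands of the sequence from the slide. The function names are ours, and the reverse strand is computed rather than copied from the slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Top strand as shown on the slide.
SLIDE_SEQ = (
    "CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAA"
    "TGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA"
)

def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]

def stop_positions(dna, frame):
    """0-based start positions of stop codons in reading frame 0, 1 or 2."""
    return [i for i in range(frame, len(dna) - 2, 3)
            if dna[i:i + 3] in STOP_CODONS]

def six_frame_stops(dna):
    """Stop-codon positions for all six reading frames."""
    strands = {"+": dna, "-": reverse_complement(dna)}
    return {(strand, frame): stop_positions(seq, frame)
            for strand, seq in strands.items()
            for frame in range(3)}

for key, positions in sorted(six_frame_stops(SLIDE_SEQ).items()):
    print(key, positions)
```

A frame with a long stretch free of stops is a candidate open reading frame. Three of the 64 codons are stops, so short stop-free stretches arise by chance all the time, which is one reason this filter on its own is so inaccurate.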
72
Surrogate Filters: Gene finders
Hidden Markov Model (HMM)-based recognition
73
Surrogate Filters: Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Step 1: Create model through extensive training set
AAA AAC AAG AAT ACA . . . TTG TTT
Training set
AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATC
AGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTAC
CATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTC
TGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC
ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCT
CCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACT
TCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAAC
AAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG
ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACT
TTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTA
GTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTG
AGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA
TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTA
ATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGT
GGAGATGATTCGGTAGCTTT
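Step 1 can be illustrated with a much simpler model than a full HMM: a fixed-order Markov chain trained by counting. The Python sketch below (the course itself uses Perl) estimates, for every three-base context, how often each base follows it in the training sequences. The function name `train_markov` is ours, and the sketch omits the pseudocounts a practical gene finder would likely add for contexts that are rare in the training set.

```python
from collections import Counter, defaultdict

def train_markov(sequences, order=3):
    """Estimate P(next base | previous `order` bases) by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = seq[i:i + order]
            counts[context][seq[i + order]] += 1
    model = {}
    for context, following in counts.items():
        total = sum(following.values())
        model[context] = {base: following[base] / total for base in "ACGT"}
    return model

# Tiny made-up training sequence, for illustration only.
model = train_markov(["AAAGAAAC"])
print(model["AAA"])
```

In the toy training sequence the context AAA is followed once by G and once by C, so each gets probability 0.5. A real training set of known genes fills in all 64 contexts.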
74
Surrogate Filters: Gene finders
75
Surrogate Filters: Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
76
Surrogate Filters: Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
Context    A     C     G     T
AAA       0.33  0.25  0.12  0.30
AAC       0.30  0.20  0.15  0.35
AAG       0.35  0.15  0.20  0.30
AAT       0.30  0.15  0.20  0.25
ACA       0.25  0.20  0.15  0.35
. . .
TTG       0.25  0.30  0.15  0.30
TTT       0.30  0.25  0.10  0.35
Candidate gene: AAAGCAA
0.12 x 0.15
77
Surrogate Filters: Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
Context    A     C     G     T
AAA       0.33  0.25  0.12  0.30
AAC       0.30  0.20  0.15  0.35
AAG       0.35  0.15  0.20  0.30
AAT       0.30  0.15  0.20  0.25
ACA       0.25  0.20  0.15  0.35
. . .
TTG       0.25  0.30  0.15  0.30
TTT       0.30  0.25  0.10  0.35
Candidate gene: AAAGCTA
0.12 x 0.15 . . .
So far, not a good candidate!
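The running product on the slide can be reproduced with a short Python sketch (the course itself uses Perl). Only the table rows visible on the slide are included, so scoring stops as soon as a context is missing. The function name `score_prefix` is ours.

```python
# Rows as they appear on the slide; the remaining contexts are elided there.
MODEL = {
    "AAA": {"A": 0.33, "C": 0.25, "G": 0.12, "T": 0.30},
    "AAC": {"A": 0.30, "C": 0.20, "G": 0.15, "T": 0.35},
    "AAG": {"A": 0.35, "C": 0.15, "G": 0.20, "T": 0.30},
    "AAT": {"A": 0.30, "C": 0.15, "G": 0.20, "T": 0.25},
    "ACA": {"A": 0.25, "C": 0.20, "G": 0.15, "T": 0.35},
    "TTG": {"A": 0.25, "C": 0.30, "G": 0.15, "T": 0.30},
    "TTT": {"A": 0.30, "C": 0.25, "G": 0.10, "T": 0.35},
}

def score_prefix(candidate, model, order=3):
    """Multiply P(base | previous `order` bases) until a context is missing.

    Returns the running product and the number of factors used.
    """
    probability = 1.0
    factors = 0
    for i in range(order, len(candidate)):
        context = candidate[i - order:i]
        if context not in model:
            break
        probability *= model[context][candidate[i]]
        factors += 1
    return probability, factors

probability, factors = score_prefix("AAAGCTA", MODEL)
print(probability, factors)
```

For AAAGCTA the first two factors are P(G after AAA) = 0.12 and P(C after AAG) = 0.15, the same numbers shown on the slide. Products of many small probabilities underflow quickly, so working implementations usually sum logarithms instead and compare the total against a background model.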
78
Surrogate Filters: Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
Candidate genes
Predicted genes
79
Surrogate Filters: Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Step 2: Assess candidate genes
3rd order Markov model
Conform to standard model
Challenge accepted beliefs
Predicted genes
Candidate genes
Predicted genes
80
Computers are an ideal tool
81
The Crisis in Bioinformatics
1. Need high-level filters
2. Need access to raw phenomena
3. Need new tools for new phenomena
4. Need ability to build new tools
Need a new generation!!
82
AATAAAGCTTTACAAACCAAACTCTGGCTTCAATTGTGTAACCCAAGCTT
TGATTCTTTCCTCTGTTAAATCGGATTGATTATCTTCATCAAGGGCAAGA
CCTACAAATTTACCATCACGAACAGCTTTAGACTCACTGAATTCATAACC
TTCTGTAGGCCAATAGCCAACTGTTTCACCACCATTTTCTGAAATTTTTT
CCTCTAGAATACCGAGGGCATCTTGAAATGTATCAGGATAACCAACCTGG
TCTCCAGGAGCAAAATAAGCAACTTTTTTGCCGATGAAGTCAATGTTATC
TAACTCATCATAAAAATTTTCCCAATCACTTTGCAATTCTCCAACATTCC
AGGTAGGACAACCAACAACGATATAATCGTAGTTATTGAAATCACTTGGT
TCAGCTTGTGAAATATCATATAAAGTTACAACACTATCACCACCAAACTC
CTTCTGAATTATTTCTGATTCAGTTTGGGTATTGCCTGTTTGAGTACCAA
AAAATAAACCAATATTAGACATTTTTACTCCTTTTATGTATTTGCAAAAT
TATTTCAATTAAAATATTTAGTAATAATTAATTGTTAGCTAGCTAATAAT
TAAATTTTTATTACAATCATTGTAAAAGGCATTGAAAAAGTAAATAAAAA
TTTTTATTCTACGTTATTTCAAAAATATTTACTTACATATACTTAACCTT
TATAGTGATGTAATATACTCTAATTCCTATTTTACTTATAAATACCATCT
CAGCTTAATGTAACGAATTTTTCTGTTTATCTTTAAATACAAAAAATTCA
ACAAAACTACAGAAAATTAATCTTAATAACACAAAACAAGTATCAATCTG
TAATACAACTAAGCTTAAATAAATTAATAGAAAGCTTCATCTATCTAATA
GGTTGAGAATAGTTTATGTCTAATGACATAAATTCATTCGTGTTGATTTC
ATTTGGGTATATTCATCTGATTTAGGATTTACTCCATTAAGTTTGTACTC
ATCAATGCCCGCCTGTTGGTATCCACAATTCTCATACAGTGCGCGAGCAA
AGTAATCAATCGTTCGTCGCCATATCTAACTTTGAGTCAAACAAACCAGT
TGGATTACCAACCCTCAACTAATCGCTTCTTTAAGGCGAGCGATCGCACA
TTTAACTGTTGGTTGTCACAAGAGAACTAATACTACAGCAGTATATTTAA
CAACTAAGGGTGGTTCAACTTTCGCTGCGACTCCTCCAACGCGCTGAAAT
ACACAGGACTGATGCGATCGCAAACTCTTTGACTAAATTCCATACATTAT
CATGACCATCTCCCAAACAAACAAGTGGGTTAACCAGATGCTGACTATTA
ACATCCCCTGAGTTCGGAGTTGTAGGTCTATTTGACTGGTTCAAAGCGAT
GATGGAACGGCTTTGTTGCATGAATTAAAAAAAGACACACCATCACCTAC
TTCTAGGATAGACACATCAAACGTCCCACCGCCTAAGTCAAATACCAAGA
TAATTTCGTTAGTTTTCTTGTCAAGTCCGTAAGCGAGGGCCGCCGCCGTG
GGCTAGTTGATAATTCGCAGAACTTTAATCCCGGCAATTCTACTGGCATC
TTTGGTAGCCTGCCGTTGAGAGTCATTGAAATAGGCAGGGGTGGTAATTA
CCGCTTGCCTCACTGGTTCCCCCAGATATGTGCTGGCATCATCTATCAGC
TTGCGGACTACCTCATACCATTTCACGAAAAACCTGATACACATGTAAAC
TCTGAAACCCTTGCTGTATCAAAGTTTTGTAATTACGAATTACGAATTAC
GAATTGATATCAGCCGAGATTTCTTCGGGTGAAAATTCCTTGTTCAGAGC
GGGACAGTGTAGCTTGACATTGCCATTACTGTCACGTACCACTTTGTAAG
TAACTTGTTTTGCCTCTTGCGTAACTTCATCATACCTGCGCCCGATGAAC
CGCTTCACAGAATAAAAAGTGTTTTCTGGGTTCATTACACCCTGGCGCTT

Future Biology
83
How to get there?
Bioinformatics
84
How to get there?
The Challenge
  • Some expert molecular biologists
  • Some master programmers
  • Some knowledgeable in the statistical arts
  • Most have little experience with bioinformatic
    tools

Overall goals of the course
98
How to get there?
Overall goals of the course
99
How to get there?
Overall goals of the course
Introduction to the questions and tools of
bioinformatics
  • Through specific scientific scenarios
  • Through consideration of how common tools work
  • Through manipulation of the tools to solve
    problems
  • Through computer programming

100
Can normal people program?
Sample problem: What's the probability of getting at least one pair when
rolling five dice?
101
Can normal people program?
###### MAIN PROGRAM ######
# Simulate the roll of many dice ($number_of_trials)
# Count successes (how many trials the conditions are met)
# Note: any_matches() is not shown in this transcript.

my $successes = 0;
foreach my $trial (1..$number_of_trials) {
    roll_dice();
    if (any_matches()) { $successes = $successes + 1 }
}
print "Number of successes: ", $successes, "\n";
print "Number of trials: ", $number_of_trials, "\n";
print "Fraction successful: ", $successes/$number_of_trials, "\n";
102
Can normal people program?
sub roll_dice {
    # Roll some number of dice, count ones, twos, ... sixes
    # Note: random_integer() is not shown in this transcript.
    $number_of_ones = 0;
    $number_of_twos = 0;
    $number_of_threes = 0;
    $number_of_fours = 0;
    $number_of_fives = 0;
    $number_of_sixes = 0;
    foreach my $roll (1..$number_of_dice) {
        my $die_value = random_integer(1,6);
        if ($die_value == 1) { $number_of_ones = $number_of_ones + 1 }
        if ($die_value == 2) { $number_of_twos = $number_of_twos + 1 }
        if ($die_value == 3) { $number_of_threes = $number_of_threes + 1 }
        if ($die_value == 4) { $number_of_fours = $number_of_fours + 1 }
        if ($die_value == 5) { $number_of_fives = $number_of_fives + 1 }
        if ($die_value == 6) { $number_of_sixes = $number_of_sixes + 1 }
    }
}
103
Can normal people program?
###### CONSTANTS ######
my $number_of_trials = 10000;
my $number_of_dice = 5;
my $matches_wanted = 2;
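For comparison, here is a compact Python sketch of the same simulation (the course itself uses Perl), together with the exact answer it should approach. We assume "at least one pair" means that some face value appears at least `matches_wanted` times; the transcript does not show how the course's `any_matches()` defines it. The function names and the fixed random seed are ours.

```python
import random
from collections import Counter

def exact_probability(number_of_dice=5, sides=6):
    """P(at least two dice share a value) = 1 - P(all values distinct)."""
    all_distinct = 1.0
    for i in range(number_of_dice):
        all_distinct *= max(sides - i, 0) / sides
    return 1.0 - all_distinct

def simulate(number_of_trials=10000, number_of_dice=5, matches_wanted=2, seed=601):
    """Fraction of trials in which some value appears >= matches_wanted times."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    successes = 0
    for _ in range(number_of_trials):
        rolls = [rng.randint(1, 6) for _ in range(number_of_dice)]
        if max(Counter(rolls).values()) >= matches_wanted:
            successes += 1
    return successes / number_of_trials

print("Exact:    ", exact_probability())
print("Simulated:", simulate())
```

The exact value is 1 - (6 x 5 x 4 x 3 x 2)/6^5 = 1 - 720/7776, about 0.907, and with 10000 trials the simulated fraction should land close to it.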
104
How to get there?
Computer programming
Goals of course
  • Be able to understand well-written programs in
    Perl
  • Be able to modify working programs
  • Gain increasing skill in writing programs from
    scratch

105
How to get there?
What do you do?
  • Read notes before coming to class
  • Respond to questionnaire by 7:00 AM, day of class
  • Attend to problem set questions (particularly
    those you can't do)
  • Serve as TA in area of your expertise