Tale 1: To Identify an Unknown Gene - PowerPoint PPT Presentation

Title: Tale 1: To Identify an Unknown Gene


4
Observation
Photos courtesy of www.webshots.com and Peter
Smallwood
8
Experiment
Photos courtesy of www.webshots.com and Peter
Smallwood
9
Filters: Information reducers
Squirrel filter
10
Filters: Information reducers
Molecule filter
11
Filters: Information reducers
Sequence filter
How organism is made
How organism works
12
From Sequence to Organism
How does Nature do it?
13
From Sequence to Organism
How does Nature do it?
Genetic code
Rules of folding
14
From Sequence to Organism
How does Nature do it?
ATGACTTATGATCAACGCACAGGGCTA
  • Transcriptional initiation
  • Transcriptional termination / polyA tailing
  • Splicing
  • Translational initiation

Rules of transcriptional and post-transcriptional
control
15
From Sequence to Organism
How does Nature do it?
  • Natural filters/transformations
  • Selective transcription
  • Selective processing
  • Translation
  • Folding

Functional protein
DNA
16
From Sequence to Organism
How does Nature do it?
Natural filters/transformations
Functional protein
DNA
17
From Sequence to Organism
How can WE do it?
Simulation of Nature
"Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune..."
"We must give our military every tool and weapon it needs to prevail..."
???
18
From Sequence to Organism
How can WE do it?
Surrogate Processes
"Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune..."
Utterance of Wm Shakespeare
"We must give our military every tool and weapon it needs to prevail..."
Utterance of George W Bush
Words/sentence, choice of words, sentence structure
23
From Sequence to Organism
How can WE do it?
  • Surrogate filters
  • Gene finders
  • Similarity finders
  • Feature finders
  • Pattern finders
  • 2nd Most powerful tool
  • Natural filters/transformations
  • Selective transcription
  • Selective processing
  • Translation
  • Folding

24
Surrogate Filters
You do it
25
Surrogate Filters
Gene finders
Class 1: Start/Stop codon search (Map, Frames, OrfFinder)
Look for stop codons (TAA, TAG, TGA)
CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAA
TGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA
26
Surrogate Filters
Gene finders
Class 1: Start/Stop codon search (Map, Frames, OrfFinder)
Look for stop codons (TAA, TAG, TGA)
CTCCACGCCCCTCCGTACACCTCTAACATGATGTCAGCAAATATTAAAAA
TGAATAAACTTTGTGACATGTACAAATGGAAATATGCAA
TTGCATATTTCCATTTGTACATGTCACAAAGTTTATTCATTTTTAATATT
TGCTGAGATCATGTTAGAGGTGTACGGAGGGGCGTGGAG
27
Surrogate Filters
Gene finders
Class 1: Start/Stop codon search (Map, Frames, OrfFinder)
Pro: Quick, simple
Con: Useless for eukaryotic genomic sequences (introns)
     Inaccurate (start codon problem)
     Inaccurate (doubtful short open reading frames)
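The Class 1 idea can be sketched in a few lines of Python. This is only an illustration of a start/stop codon scan over six reading frames, not the algorithm used by Map, Frames, or OrfFinder. The function names, the `min_codons` cutoff, and the choice to accept only ATG as a start codon are assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=2):
    """Scan all six reading frames for ATG ... stop open reading frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and for the "-" strand they refer to the reverse
    complement, not to the original sequence.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # first ATG opens a candidate ORF
                elif codon in STOP_CODONS:
                    # keep the ORF only if it has enough codons before the stop
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC"))
```

The sketch also shows where the listed weaknesses come from: it always takes the first ATG as the start (the start codon problem), and a small `min_codons` lets doubtful short ORFs through.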
28
Surrogate Filters
Gene finders
Do it
Class 1: Start/Stop codon search (Map, Frames, OrfFinder)
1. Go to http://www.vcu.edu/elhaij/BioInf
2. Open 2nd and 3rd browsers (Ctrl-N in Netscape). Go to same site (copy and paste URL)
3. In 1st browser, go to Program List. Click on Gene Finders, then scroll down. Open OrfFinder
4. In 2nd browser, open sample sequence
29
Surrogate Filters
Gene finders
Do it
Class 1: Start/Stop codon search (Map, Frames, OrfFinder)
5. Paste sample sequence into window
6. Choose Bacterial Code in Genetic codes window
7. Click on OrfFind
31
Surrogate Filters
Gene finders
Class 2: Codon bias recognition (TestCode)
Are codons equally used?
The code is degenerate
32
Surrogate Filters
Gene finders
Class 2: Codon bias recognition (TestCode)
Most frequently used codons
Codon bias universal?
Yes/No (basis for determining foreign genes)
Codon usage is biased
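Codon bias can be made concrete by counting how often each synonymous codon is chosen in a coding sequence. The sketch below is not the TestCode statistic. It only tabulates, for each codon seen, its share among the codons for the same amino acid. The names `CODON_TABLE` and `codon_usage` are made up for the example, and the input is assumed to be an in-frame sequence containing only A, C, G, T.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def codon_usage(cds):
    """Return {codon: fraction among synonymous codons} for an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]]
            for codon, n in counts.items()}


if __name__ == "__main__":
    # Leucine is encoded three times as CTG and once as TTA here.
    for codon, share in sorted(codon_usage("CTGCTGCTGTTAAAA").items()):
        print(codon, CODON_TABLE[codon], round(share, 2))
```

If codons were used equally, each of the six leucine codons would have a share near 1/6. Strong departures from the host's usual pattern are the basis for flagging possibly foreign genes.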
33
Surrogate Filters
Gene finders
Class 2: Codon bias recognition (TestCode)
Pro: Quick, simple, available through GCG
     Better than Class 1 in excluding false open reading frames
34
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Principle
Step 1: Create model through extensive training set
        Training set: proven or suspected genes
        Organism-specific
Step 2: Assess candidate genes through filter of model
35
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Step 1: Create model through extensive training set
AAA AAC AAG AAT ACA . . . TTG TTT
Training set
AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATC
AGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTAC
CATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTC
TGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC
ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCT
CCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACT
TCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAAC
AAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG
ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACT
TTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTA
GTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTG
AGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA
TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTA
ATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGT
GGAGATGATTCGGTAGCTTT
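Step 1 amounts to counting, for every three-base context in the training set, which base follows it. A minimal sketch of that counting is below. The function name `train_markov`, the `pseudocount` parameter, and the decision to skip windows containing non-ACGT characters are assumptions for illustration. Real gene finders such as GeneMark use considerably more elaborate models.

```python
from collections import defaultdict


def train_markov(sequences, order=3, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from training sequences.

    Returns {context: {"A": p, "C": p, "G": p, "T": p}} for every context
    that occurs in the training data.
    """
    counts = defaultdict(lambda: dict.fromkeys("ACGT", 0.0))
    valid = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            if nxt in valid and set(context) <= valid:
                counts[context][nxt] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values()) + 4 * pseudocount
        model[context] = {base: (n + pseudocount) / total
                          for base, n in row.items()}
    return model


if __name__ == "__main__":
    toy_training_set = ["AAAGAAAGAAAC"]  # toy data, not the Nostoc sequence
    model = train_markov(toy_training_set, pseudocount=0.0)
    print(model["AAA"])
```

The pseudocount keeps a base that never followed a given context in the training set from receiving probability zero, which would otherwise veto any candidate containing it.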
38
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Step 2: Assess candidate genes
3rd-order Markov model
        A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
. . .
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCAA
39
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Step 2: Assess candidate genes
3rd-order Markov model
        A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
. . .
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCAA
0.12 x 0.15
40
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Step 2: Assess candidate genes
3rd-order Markov model
        A     C     G     T
AAA   0.33  0.25  0.12  0.30
AAC   0.30  0.20  0.15  0.35
AAG   0.35  0.15  0.20  0.30
AAT   0.30  0.15  0.20  0.25
ACA   0.25  0.20  0.15  0.35
. . .
TTG   0.25  0.30  0.15  0.30
TTT   0.30  0.25  0.10  0.35
Candidate gene: AAAGCTA
0.12 x 0.15 . . .
So far, not a good candidate!
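The product 0.12 x 0.15 comes from reading the table one base at a time: after AAA the next base is G (0.12), and after AAG the next base is C (0.15). The sketch below reproduces that arithmetic using only rows that appear in the table above. Because most rows are elided, the hypothetical helper `score_prefix` stops at the first context it cannot look up. Comparing against a uniform 0.25 per base is an assumption chosen to illustrate why the candidate looks poor so far.

```python
# Rows copied from the table above; all other contexts are elided there.
TABLE = {
    "AAA": {"A": 0.33, "C": 0.25, "G": 0.12, "T": 0.30},
    "AAC": {"A": 0.30, "C": 0.20, "G": 0.15, "T": 0.35},
    "AAG": {"A": 0.35, "C": 0.15, "G": 0.20, "T": 0.30},
}


def score_prefix(candidate, table, order=3):
    """Multiply transition probabilities until a context is missing.

    Returns (likelihood, number_of_transitions_scored).
    """
    likelihood, steps = 1.0, 0
    for i in range(order, len(candidate)):
        row = table.get(candidate[i - order:i])
        if row is None:  # context not in the (partial) table
            break
        likelihood *= row[candidate[i]]
        steps += 1
    return likelihood, steps


if __name__ == "__main__":
    likelihood, steps = score_prefix("AAAGCAA", TABLE)
    uniform = 0.25 ** steps  # what equal base usage would give
    print(likelihood, uniform, likelihood < uniform)
```

In practice the probabilities are usually summed as logarithms rather than multiplied, since the product over a whole gene underflows quickly.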
41
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Step 2: Assess candidate genes
42
Surrogate Filters
Gene finders
Class 3: Markov Model-based recognition
Step 2: Assess candidate genes
3rd order Markov model
Conform to standard model
Challenge accepted beliefs
Predicted genes
43
Surrogate Filters
Gene finders
Class 3: Hidden Markov Model (HMM)-based recognition
Pro: Nearly the most accurate method known
Con: Needs big training set
     May miss genes of foreign origin
     Will miss very small genes
44
Surrogate Filters
Gene finders
Do it
Class 3: Hidden Markov Model (HMM)-based recognition
1. Go to course web page (3rd browser)
2. Go to Program List. Click on Gene Finders, then GeneMark
3. Click on "here" in Gene Prediction in Bacteria and Archaea
4. Paste in sample sequence
45
Surrogate Filters
Gene finders
Do it
Class 3: Hidden Markov Model (HMM)-based recognition
5. Choose Nostoc PCC 7120 as species
6. Check Generate PDF graphics (screen) and Print GeneMark 2.4 predictions
7. Click Start GeneMark.hmm
47
Surrogate Filters
Scenario I: Case of the Hidden Heterocyst
48
Case of the Hidden Heterocyst
NH3
N2
O2
Matveyev and Elhai (unpublished)
49
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
1. Use transposon mutagenesis
50
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
Nostoc genome
Transposon
1. Use transposon mutagenesis
to find a mutant defective in heterocyst
differentiation
51
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
Nostoc genome
AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATC
AGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTAC
CATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTC
TGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC
ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCT
CCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACT
TCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAAC
AAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG
ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACT
TTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTA
GTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTG
AGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA
TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTA
ATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGT
GGA
1. Use transposon mutagenesis
to find a mutant defective in heterocyst
differentiation
2. Sequence out from transposon
52
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
Nostoc genome
Do it
AAGCTTGACCAAAAAGTTAAAACACTGACGGCAAATAATCAATGACTATC
AGACAGAGAATCATCGTGCTGTCAGTAAAACCTCTGATTTCGATCTTTAC
CATAATTGTTATGTTGTAATGACTAACCAGACTATCTTTTACAGAGCTTC
TGGTTAACACTTGTCTAATTAGACATTGATAATGTTTGTGGGGGTTGGTC
ATCAGGAATGGTAAATAGCAATTACCCTTCAGACTTTCCTATGAGACGCT
CCGCCAACGAGCAGTGTCTCTTAAAGAACGTTATGAGCGCTCAGTTAACT
TCAGAAATTCACGGCGGAAATCCATAGTTATTATTACTTATGACTAAAAC
AAAATTACTATGGCGGCTTGTTTAATATAGATTCTGTGTTCTGAGAAATG
ACTTTTAAAGTCCCACTAACTTTTTTCTCATCTATTGCTATATTTCGACT
TTAAAACTTATAGTAGATGGCTTAATTCTCAAATAACAAACTCATTTTTA
GTAGATATTTCATGCAAACTGAGGTTTTTAGTGATATTTTCCCCTTATTG
AGTACAGCCACTCCACAAACCTTAGAATGGCTACTCAATATTGCAATTGA
TCATGAATATCCCACTGGTAGAGCAGTTTTAATGGAAGATGCCTGGGGTA
ATGCAGTTTATTTCGTTGTATCTGGATGGGTAAAAGTTCGGCGCACCTGT
GGA
1. Use transposon mutagenesis
to find a mutant defective in heterocyst
differentiation
2. Sequence out from transposon
3. Find gene boundaries
4. Identify gene
53
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
1. Go to course web page (http://www.vcu.edu/elhaij/BioInf)
2. Open Nostoc sequence
3. Do what you need to do to find the gene
55
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
Mission successful
>Translation 358..513 (direct), 51 amino acids
VQLAKQAQTAEGTLQIVTNARVTQTVKLVRLEKFLSLQKSVEEALENVK
... or was it?
Check predicted protein against databases
56
Surrogate Filters
Similarity finders
Do it
  • Blast
    • BlastP: Protein sequence to search protein database
    • BlastN: Nucleotide sequence to search nucleotide database
    • BlastX: Nucleotide sequence (translated) to search protein database
    • TBlastN: Protein sequence to search (translated) nucleotide database
    • Blast2Seq: Compare two sequences you specify
  • FastA
    • (Various flavors)

Pfam (Protein motif families): Finds conserved motifs similar to protein sequence
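The choice among the Blast flavors listed above depends only on the type of the query and the type of the database, which can be written as a small lookup table. The sketch below is an illustration, not part of any Blast distribution. `guess_molecule` is a crude heuristic invented for the example (a sequence made only of A, C, G, T, U, N is treated as nucleotide), and it would misclassify a short peptide built only from those letters.

```python
# (query type, database type) -> Blast flavor, as listed above.
BLAST_FLAVORS = {
    ("protein", "protein"): "BlastP",
    ("nucleotide", "nucleotide"): "BlastN",
    ("nucleotide", "protein"): "BlastX",
    ("protein", "nucleotide"): "TBlastN",
}


def guess_molecule(seq):
    """Heuristic only: call a sequence nucleotide if it uses just ACGTUN."""
    return "nucleotide" if set(seq.upper()) <= set("ACGTUN") else "protein"


def choose_blast(query, database_type):
    """Pick the Blast flavor for a query sequence and a database type."""
    return BLAST_FLAVORS[(guess_molecule(query), database_type)]


if __name__ == "__main__":
    print(choose_blast("ATGACTTATGATCAACGCACAGGGCTA", "protein"))
    print(choose_blast("VQLAKQAQTAEGTLQIVTNARVTQT", "protein"))
```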
61
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
Mission successful
>Translation 397..639 (direct), 51 amino acids
VQLAKQAQTAEGTLQIVTNARVTQTVKLVRLEKFLSLQSTVDAAVENIKGA
62
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
What happened?
  • GeneMark is correct: Conservation of noncoding regions
  • GeneMark is wrong: Fooled by weird aa sequence or start codon

63
Case of the Hidden Heterocyst
Strategy to find heterocyst differentiation genes
What happened?
  • GeneMark is correct: Conservation of noncoding regions
  • GeneMark is wrong: Fooled by weird aa sequence or start codon

Moral: Automated gene finders are wonderful, but common sense is better
Don't trust automated annotation
64
Surrogate Filters
Feature finders
  • Hidden Markov model-based methods
    • Good for contiguous features (e.g. signal sequences)
    • Problems with features having gaps (e.g. promoters)
  • Ad hoc methods
    • Feature-specific rules (e.g. tandem repeats, terminators)

Position-dependent frequency tables
Position-specific scoring matrix (PSSM)
Weight table
65
Surrogate Filters
Feature finders
Position-dependent frequency tables
Consensus: TATAAA
66
Surrogate Filters
Feature finders
Position-dependent frequency tables
67
Surrogate Filters
Feature finders
Position-Specific Scoring Matrix in action
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA
Experimentally proven start sites
68
Surrogate Filters
Feature finders
Position-Specific Scoring Matrix in action
?
Unknown start site
aceB  ACTATGGAGCATCTGCACATGAAAACC
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA
Experimentally proven start sites
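A position-specific scoring matrix can be built directly from the nine proven start sites and then used to score the aceB candidate. The sketch below is one common formulation (log-odds against a uniform background, with a pseudocount). The slides do not say which formulation was used, so the `pseudocount` and `background` values and the function names are assumptions. The sequences are copied from the slide, ungapped.

```python
import math
from collections import Counter

PROVEN_SITES = {
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}
ACEB_CANDIDATE = "ACTATGGAGCATCTGCACATGAAAACC"  # unknown start site


def frequency_table(seqs):
    """Position-dependent frequency table: one {base: fraction} per column."""
    n = len(seqs)
    return [{b: Counter(col)[b] / n for b in "ACGT"} for col in zip(*seqs)]


def build_pssm(seqs, pseudocount=1.0, background=0.25):
    """Log2-odds score per base per position, relative to the background."""
    n = len(seqs)
    pssm = []
    for col in zip(*seqs):
        counts = Counter(col)
        pssm.append({
            b: math.log2((counts[b] + pseudocount) / (n + 4 * pseudocount) / background)
            for b in "ACGT"
        })
    return pssm


def score(seq, pssm):
    """Sum the per-position scores of a sequence of the same length."""
    return sum(column[base] for column, base in zip(pssm, seq))


if __name__ == "__main__":
    sites = list(PROVEN_SITES.values())
    pssm = build_pssm(sites)
    for name, seq in PROVEN_SITES.items():
        print(name, round(score(seq, pssm), 2))
    print("aceB", round(score(ACEB_CANDIDATE, pssm), 2))
```

Because the sequences are compared column by column without gaps, a ribosome binding site that sits one base closer to or farther from the start codon scores poorly here.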
70
Surrogate Filters
Feature finders
Position-Specific Scoring Matrix in action
atpI  ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB    ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA      ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ   TTCACACAGGAAACAG....CTATGACCATG
rpsJ       AATTGGAGCTCTGGTCTCATGCAGAAC
serC    GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA   GATGCTTAAGGGATCA....CGATGCAGAAC
trpE    CAAAATTAGAGAATA...ACAATGCAAACA
71
Surrogate Filters
Feature finders
Position-Specific Scoring Matrix in action
aceB  ACCACATAACTATGGAGCATCTGCACATGAAAACC
atpI     ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB       ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA         ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH     TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ      TTCACACAGGAAACAG....CTATGACCATG
rpsJ          AATTGGAGCTCTGGTCTCATGCAGAAC
serC       GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA      GATGCTTAAGGGATCA....CGATGCAGAAC
trpE       CAAAATTAGAGAATA...ACAATGCAAACA
72
Surrogate Filters
Feature finders
Position-Specific Scoring Matrix in action
aceB  ACCACATAACTATGGAGCATCT.GCACATGAAAACC
atpI      ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB        ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA          ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH      TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ       TTCACACAGGAAACAG....CTATGACCATG
rpsJ           AATTGGAGCTCTGGTCTCATGCAGAAC
serC        GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA       GATGCTTAAGGGATCA....CGATGCAGAAC
trpE        CAAAATTAGAGAATA...ACAATGCAAACA
73
Surrogate Filters
Pattern finders
New pattern discovery (Meme, Gibbs sampler, BioProspector)
79
Surrogate Filters
Pattern finders
How do pattern finders work?
snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. If probability score high, remember pattern and score
Step 6. Repeat Steps 1 - 5
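The six steps can be sketched as a small search loop. This is a simplified, deterministic stand-in and not the algorithm of Meme, the Gibbs sampler, or BioProspector. Instead of choosing candidates arbitrarily and repeating, it tries every window of every sequence in turn, and instead of a relative probability it scores each frequency table by its information content in bits. The motif `width` of 6 and all function names are assumptions made for the example.

```python
import math
from collections import Counter

PROMOTERS = [  # the five sequences shown above
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
]


def best_match(pattern, seq):
    """Step 2: the window of seq sharing the most positions with pattern."""
    width = len(pattern)
    windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
    return max(windows, key=lambda w: sum(a == b for a, b in zip(w, pattern)))


def frequency_table(matches):
    """Step 3: per-position base frequencies of the matched windows."""
    n = len(matches)
    return [{b: c / n for b, c in Counter(col).items()} for col in zip(*matches)]


def information_content(table):
    """Step 4 (stand-in score): bits of information, 2 per fully conserved position."""
    return sum(2.0 + sum(p * math.log2(p) for p in col.values()) for col in table)


def find_motif(seqs, width=6):
    """Steps 1, 5, 6: try every window as a candidate, keep the best-scoring one."""
    best_score, best_pattern, best_matches = -1.0, None, None
    for source in seqs:
        for start in range(len(source) - width + 1):
            pattern = source[start:start + width]
            matches = [best_match(pattern, s) for s in seqs]
            score = information_content(frequency_table(matches))
            if score > best_score:
                best_score, best_pattern, best_matches = score, pattern, matches
    return best_score, best_pattern, best_matches


if __name__ == "__main__":
    score, pattern, matches = find_motif(PROMOTERS)
    print(round(score, 2), pattern, matches)
```

Exhaustive enumeration is affordable for five 40-base sequences. The sampling-based tools named on the earlier slide exist because it is not affordable for realistic inputs.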
80
Surrogate Filters
Scenario II: Case of the Masked Motif
  • You've found a gene related to Purple Tongue Syndrome
  • BlastP: Encoded protein related to cAMP-binding proteins
  • Are the similarities trivial? Related to cAMP binding?
  • Does your protein contain cAMP-binding site?
  • What IS a cAMP-binding site?
  • Task
    • Determine what is a cAMP-binding site
    • Determine if your protein has one

81
Surrogate Filters
Scenario II: Case of the Masked Motif
Do it
Strategy
  1. Collect sequences of known cAMP-binding proteins
  2. Run Meme, a pattern-finding program. Ask it to find any significant motifs
  3. Rerun Meme. Demand that every protein has identified motifs
  4. Run Pfam over known sequence to check

83
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
  • Progressive External Ophthalmoplegia (PEO)
  • Slow paralysis of voluntary eye muscles
  • Many other symptoms (e.g., frequent deafness)
  • Loss of mitochondrial DNA

84
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
  • Progressive External Ophthalmoplegia (PEO)
  • Slow paralysis of voluntary eye muscles
  • Many other symptoms (e.g., frequent deafness)
  • Loss of mitochondrial DNA
  • Inheritance
  • Mendelian
  • Autosomal dominant
  • Linked to chromosome 4q34

85
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
  • Progressive External Ophthalmoplegia (PEO)
  • Slow paralysis of voluntary eye muscles
  • Many other symptoms (e.g., frequent deafness)
  • Loss of mitochondrial DNA
  • Inheritance
  • Mendelian
  • Autosomal dominant
  • Linked to chromosome 4q34

Your task
  • Examine sequence of 4q34 region
  • Assess likelihood that a gene in the area could
    cause disease symptoms

86
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Examining Sequence of 4q34 Region
tctacttatattcaatccacagggctacacctagttcttggtacacagta
catgctcagcaagagtctgttgaatgaacacatacatggtttatctgttt
gtctcttccgagttcttgacttctgtctgctctgacctctggcagctttc
cactagtttctagctttcattctgcttacctggatttcggaactctagcc
tgccccactcttagataaacgcatgccctctgtggccctggaaccttagt
gacttctgctataccaaagtctccacgcccagggtgacacgcagctgcag
ctccgtaaacctctaacatgatgtcagcaaatattaaaaaaaaaaagttt
ataaaaacaatgaataaactttgttaaaggtacaaatgaaaattagcaaa
catgggaagataattgagtaaagagtttaaagttaaaaacgaattgcagt
cattctaggggaaggaacagttgtatttgaaaacctgtatggttacatga
actgcctaaaaaacaagctaaggaaaattaaagctcagatttatatattt
taagaaattaattgcaattaatttcctgggattaaatagcatttcctcaa
ccccagctgtcattaaaaagaggcaaatacagccaaggactggatcttct
ccggaaggctgacagcactgaccctcaagaaggcaccggctgacagacag
aacattctgccctaatatgtgctgaaattccgctgagagcagagtggtac
attgaaccctttaggggcttacaaaagaagtgtcctgtgttttagagtca
cagagttttgcagaaacaagtatgaattcacctagtggccccctgcacca
ggtctttcctgtgggcactgagtgcagacacatcaatatgtaatagcaga
atgaatgactgaacgaacgattgaatgaaaagaaatgagaggcagcaggt
tgtcagattctatgaggcaatcacagcatcaggtgaccttagtatctatt
tgagaggactgccatttattctcgggagcgcacggctctaaagaggccca
tatccaggcagtgagctctggtggggggcgcctttagatgcaagaaggag
gaaacagctcgaaatccctgggcctgagcgcggcccgtgcaggccggagg
gtcaagaactctccaccggcggcagcggcccggtgtctgccccggcttcg
ccccggcctaaggctgcctgtgctataaatacgcggcccacatgccgcgg
tgacacggtgttccctgggctcggcgggacagataacatgaatgtgccct
ttaaacgtcccaagttgcagggacagcccccggcccagcctcgctcccgg
aagcgccttcgcccccgatgccctctgcagctgggaggagggggcgcccc
gcacctgcccagccaatgcgcggcgcgagcgccggccgcgacccgcctcc
tctcgcgagagcccggcggggatataagggggagctgcgggccaggcggc
ggccccctagcgtcgcgcagggtcggggactgcgcgcggtgccaggccgg
gcgtgggcgagagcacgaacgggctgcctgcgggctgagagcgtcgagct
gtcaccatgggtgatcacgcttggagcttcctaaaggacttcctggccgg
gggcgtcgccgctgccgtctccaagaccgcggtcgcccccatcgagaggg
tcaaactgctgctgcaggtgaggaccgcgcggtgcaagaggcgggcgcgg
gcgcggcgggccgggcggggcgcgcgatgcggcgcgagctgcagggcgcg
gggcgccgcggaaaatctgcgccaggccacaggcccgggcgcccgcccgc
ccgcgggggaagaaggtgccctctgcgtagagacaggtccagcgtcagtc
gcagattcctggtgtcgggtggcgcccggcgttcgggtgtctatatatgg
aaacccacccggagccggtttacgtgtgccagatcctgcgcccgtgacag
cacgggcgtgcactcaggcccggaggcacctagtgattgccagtattttt
ggcaccgtcttatgcgcacgcacctttacaataaaaacatcaaaataatc
atcacccaagaattcccttatcgtatctcatgcacaatgctgtatgtagg
ctgacgccttcatctttatgtaacctctgtgagagagttattcttctcca
ttttacagatgaagctgaggttttgaaatattaagaaacaattttcggaa
taaactcagatcatcctgtctccaaatcttttcctcccctacctggtcgc
tgaatggtttatcatcctctcgtgttttcctccacctgcccaaaaggtca
gggcccctcaatgaggaagagcccaatttgggagtcagaattactaacaa
caaaacccccacaaattgctcacaacggcagcaaacccttaataattgat
tacttggattatctgcttgaaaactttggaggcctaatgtttagtggatt
tattctccttcctctattagagcatctagtagagatcctcatctccaggg
tgatcagagtgacactgagaaattgtcattttttggccatcatgtctatt
aaatccaaagccctttgaagcagggagtgttactcatttctgtcccccag
taagcccctcatacagttctcaaacctagggaaagtgaaataaataaatg
gctatagctttatataattcaatcaccttttcagtttatttggggcaata
cctttccctcaaataccctaataattgaagcaacattggattattttggc
ttgttatccagtaactaacatggataacagtatccatttacacgtcctcg
tatccatttgatttcctcatcctttttttcttcaaaaaaaaaatctagga
agtgcaaaccttttttttttctcctgtcctcttcccttctctctaccctg
cctgtcctctgtcacccaccctcccctccaccaggtccagcatgccagca
aacagatcagtgctgagaagcagtacaaagggatcattgattgtgtggtg
agaatccctaaggagcagggcttcctctccttctggaggggtaacctggc
caacgtgatccgttacttccccacccaagctctcaacttcgccttcaagg
acaagtacaagcagctcttcttagggggtgtggatcggcataagcagttc
tggcgctactttgctggtaacctggcgtccggtggggccgctggggccac
ctccctttgctttgtctacccgctggactttgctaggaccaggttggctg
ctgatgtgggcaagggcgccgcccagcgtgagttccatggtctgggcgac
tgtatcatcaagatcttcaagtctgatggcctgagggggctctaccaggg
tttcaacgtctctgtccaaggcatcattatctatagagctgcctacttcg
gagtctatgatactgccaagggtgagagaggggcatcggggagaaggagg
gtggtgtggaaagaggatcctatgggatctataactcacaaaggacctga
tatatattgatcttgttttttctagtctctgggataattgaggcttctga
atgaggaggtgatgtgcataagttaatagctgaagcgttccttgtgtcct
ctactgaaataaactctggcctttagttattcagagaggaggagggggga
gcctgtctccctctagacacagccatagcagttactgagtttaacttgaa
gccacttccaatgccctgtatacaagctgagcactgcccctccggggtcc
ggagagggcagcagccacctttgctgtctgcctggtcatatgtgaagcac
ctgcacaggggcaggttccccgcaaggtcagagcatggagctggaggtgc
agtggcctctctccctccacctgctttctgctgagaacaggcacttcata
gccgttcggcttctgggctctgtccacagggatgctgcctgaccccaaga
acgtgcacatttttgtgagctggatgattgcccagagtgtgacggcagtc
gcagggctggtgtcctacccctttgacactgttcgtcgtagaatgatgat
gcagtccggccggaaagggggtaagcttgtgctctactcatctaaacttg
tttggttttgcccgaggagaacattttacagggctcctttcagtcttcct
tactggaaattaattttcaaaattatttgataaggacttagggaagaaag
atggtattaattccccctaacgttctcaactatcctattagggaaaagta
ttttccattttattagagatgataagaacatgaatagtaagacatttaga
tgtgaatttaactaggtatccagcattatagagaccctaggccctcttcc
cttagagcctgggtgcaaaagctagggaaaagaagtagttagctacttct
tacaaagaactcttgcttccctcctagttacaggtgttagtgggatgggg
tgtttagctgggtagagatggcctgaagcaatctgttgtgccagagaaag
ttttggcttctataggttgaaccatatgaaattgccactttaaaagtcaa
aaacagtccaatgttagcagtttcgtatgtttcaacgaatagttacagcc
ttttatttagactgcataacctcgtgcaggatcatctgaggctcagcctc
agttcggtcctccataaaaaaaggtaaccgcgtagcataatactcctgct
ccactgcgcccttcttgtttcgcagttgggcagtccatgaattacttggt
taattgccccagttcttcactgaccttgaactaatggagtaggaatgaca
ggagacccagcctgccagtgaagcaaggaaggagatgtccagtgggatgt
tgcatggagctgggactccatgcccagatgaccctgattttataaaactg
gtaacagtgtgtacagatatgtttcaggggaaaagtctctttcctccagc
gttacggagccctcaccagcatttgtttccacagccgatattatgtacac
ggggacagttgactgctggaggaagattgcaaaagacgaaggagccaagg
ccttcttcaaaggtgcctggtccaatgtgctgagaggcatgggcggtgct
tttgtattggtgttgtatgatgagatcaaaaaatatgtctaatgtaatta
aaacacaagttcacagatttacatgaacttgatctacaagttcacagatc
cattgtgtggtttaatagactattcctaggggaagtaaaaagatctggga
taaaaccagactgaaggaatacctcagaagagatgcttcattgagtgttc
attaaaccacacatgtattttgtatttattttacatttaaattcccacag
caaatagaaaataatttatcatacttgtacaattaactgaagaattgata
ataactgaatgtgaaacatcaataaagaccacttaatgcacgctttctat
tttattgaactcttattaactgtaaaatgcatttttaaaagatcaaaaat
gcatattttctagcatgattcatgtatcagtcagcagccaagcttctaaa
tgccagatattatattgagaatgtattatatgagaacgtacaatgcttaa
agttccggttttcaaacttaggcaggtcatattctatctatcttatccag
cgttactgtaggctagaaagtgataatggctttcataatcctgccttgtc
ttaggcactttcctgcag
87
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Do it
Strategy
  • Assume that encoded protein is in mitochondria
  • Protein has function associated with mitochondrial location?
    • Use Gene finder to identify protein sequence(s)
    • Use Similarity finder to identify possible function
  • Protein has structure associated with mitochondrial location?
    • Use Feature finders to identify pertinent regions
    • (What ARE pertinent regions?)

89
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Run 4q34 region through FGeneSH
Name: PEO-related_gene?
First three lines of sequence:
tctacttatattcaatccacagggctacacctagttcttggtacacagtacatgctcagcaagagtctgttgaat
gaacacatacatggtttatctgtttgtctcttccgagttcttgacttctgtctgctctgacctctggcagctttc
cactagtttctagctttcattctgcttacctggatttcggaactctagcctgccccactcttagataaacgcatg
Fgenesh Wed Feb 27 16:59:14 GMT 2002
FGENESH 1.0 Prediction of potential genes in Human genomic DNA
Time: Wed Feb 27 16:59:14 2002
Seq name: PEO-related_gene?
Length of sequence: 5768  GC content: 48  Zone: 2
Positions of predicted genes and exons:
  G Str   Feature    Start     End    Score        ORF        Len
  1 +       TSS       1216            -2.70
  1 +   1   CDSf      1607 -  1717    18.01    1607 -  1717   111
  1 +   2   CDSi      2985 -  3471    52.41    2985 -  3470   486
  1 +   3   CDSi      3980 -  4120    20.99    3982 -  4119   138
  1 +   4   CDSl      5035 -  5192     2.32    5037 -  5192   156
  1 +       PolA      5471             0.92
Predicted protein(s):
>FGENESH 1 4 exon (s) 1607 - 5192 298 aa, chain +
MGDHAWSFLKDFLAGGVAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVR
IPKEQGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQLFLGGVDRHKQFWRYFAGNLASG
GAAGATSLCFVYPLDFARTRLAADVGKGAAQREFHGLGDCIIKIFKSDGLRGLYQGFNVS
VQGIIIYRAAYFGVYDTAKGMLPDPKNVHIFVSWMIAQSVTAVAGLVSYPFDTVRRRMMM
QSGRKGADIMYTGTVDCWRKIAKDEGAKAFFKGAWSNVLRGMGGAFVLVLYDEIKKYV
/ 3
Translated message
?
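One quick sanity check on a spliced gene prediction is arithmetic: the exon lengths should add up to a whole number of codons, and that number, less the stop codon, should equal the reported protein length. The sketch below applies that check to the four exon coordinates in the FGENESH output above. The function name `coding_summary` is made up for the example, and the check says nothing about whether the exon boundaries are biologically correct.

```python
# Exon start/end positions (1-based, inclusive) from the FGENESH output above.
EXONS = [(1607, 1717), (2985, 3471), (3980, 4120), (5035, 5192)]
REPORTED_PROTEIN_LENGTH = 298  # "298 aa" in the predicted protein header


def coding_summary(exons):
    """Return (exon lengths, total coding length, whole codons, leftover bases)."""
    lengths = [end - start + 1 for start, end in exons]
    total = sum(lengths)
    codons, leftover = divmod(total, 3)
    return lengths, total, codons, leftover


if __name__ == "__main__":
    lengths, total, codons, leftover = coding_summary(EXONS)
    print("exon lengths:", lengths)
    print("coding length:", total, "nt =", codons, "codons, leftover", leftover)
    # One codon is the stop, so the protein should be one residue shorter.
    print("consistent with report:", codons - 1 == REPORTED_PROTEIN_LENGTH)
```

Passing this check only shows the prediction is internally consistent. Comparing against mRNA or EST sequence is what tests the exon boundaries themselves.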
90
How to decide where exons are?
Strategy
Do it
  • Compare sequence of 4q34 region to sequence of
    mRNA
  • Sequence of mRNA may be in cDNA library
  • Expressed Sequence Tag (EST) library

Problems
  • Library may not exist
  • Expression of gene may be low

93
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Run 4q34 region through BlastN (vs. human ESTs)
MORAL: Trust, but verify.
94
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Strategy
  • Assume that encoded protein is in mitochondria
  • Protein has function associated with
    mitochondrial location?

?
  • Use Gene finder to identify protein sequence(s)
  • Use Similarity finder to identify possible
    function
  • Protein has structure associated with
    mitochondrial location?
  • Use Feature finders to identify pertinent
    structures
  • (What ARE pertinent structures?)

95
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Run 4q34 region through BlastP
Name: PEO-related_gene?
First three lines of sequence:
tctacttatattcaatccacagggctacacctagttcttggtacacagtacatgctcagcaagagtctgttgaat
gaacacatacatggtttatctgtttgtctcttccgagttcttgacttctgtctgctctgacctctggcagctttc
cactagtttctagctttcattctgcttacctggatttcggaactctagcctgccccactcttagataaacgcatg
Fgenesh Wed Feb 27 16:59:14 GMT 2002
FGENESH 1.0 Prediction of potential genes in Human genomic DNA
Time: Wed Feb 27 16:59:14 2002
Seq name: PEO-related_gene?
Length of sequence: 5768  GC content: 48  Zone: 2
Positions of predicted genes and exons:
  G Str   Feature    Start     End    Score        ORF        Len
  1 +       TSS       1216            -2.70
  1 +   1   CDSf      1607 -  1717    18.01    1607 -  1717   111
  1 +   2   CDSi      2985 -  3471    52.41    2985 -  3470   486
  1 +   3   CDSi      3980 -  4120    20.99    3982 -  4119   138
  1 +   4   CDSl      5035 -  5192     2.32    5037 -  5192   156
  1 +       PolA      5471             0.92
Predicted protein(s):
>FGENESH 1 4 exon (s) 1607 - 5192 298 aa, chain +
MGDHAWSFLKDFLAGGVAAAVSKTAVAPIERVKLLLQVQHASKQISAEKQYKGIIDCVVR
IPKEQGFLSFWRGNLANVIRYFPTQALNFAFKDKYKQLFLGGVDRHKQFWRYFAGNLASG
GAAGATSLCFVYPLDFARTRLAADVGKGAAQREFHGLGDCIIKIFKSDGLRGLYQGFNVS
VQGIIIYRAAYFGVYDTAKGMLPDPKNVHIFVSWMIAQSVTAVAGLVSYPFDTVRRRMMM
QSGRKGADIMYTGTVDCWRKIAKDEGAKAFFKGAWSNVLRGMGGAFVLVLYDEIKKYV
Translated message
96
Surrogate Filters
Scenario III: Case of the Mortal Mitochondrion
Strategy
  • Assume that encoded protein is in mitochondria
  • Protein has function associated with
    mitochondrial location?

?
  • Use Gene finder to identify protein sequence(s)
  • Use Similarity finder to identify possible
    function

?
  • Protein has structure associated with
    mitochondrial location?
  • Use Feature finders to identify pertinent
    structures
  • (What ARE pertinent structures?)

98
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
100
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
Escherichia coli . . .
haemorrhagic colitis
101
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
E. coli K12
E. coli O157:H7
104
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
E. coli K12
E. coli O157:H7
What tool to use?
Go to http://www.vcu.edu/elhaij/BioInf
105
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
E. coli K12
E. coli O157:H7
ASSIGN K12-set FROM Gene-finder (K12-DNA)
ASSIGN O157-set FROM Gene-finder (O157-DNA)
CONSIDER EACH protein IN O157-set
  WHEN Constituent-of (K12-set, protein) = FALSE
  COLLECT protein
106
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
E. coli K12
E. coli O157:H7
FUNCTION Constituent-of (set, item)
  CONSIDER EACH protein IN set
    WHEN protein = item
      RETURN TRUE
    OTHERWISE RETURN FALSE
107
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
E. coli K12
E. coli O157:H7
FUNCTION Constituent-of (set, item)
  CONSIDER EACH protein IN set
    WHEN protein = item
      RETURN TRUE
  FINALLY RETURN FALSE
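The change from OTHERWISE to FINALLY matters. Under one plausible reading of the pseudocode, OTHERWISE returns FALSE as soon as the first protein fails to match, while FINALLY returns FALSE only after every protein has been considered. The Python sketch below contrasts the two readings. The function names are invented for the example, and the mapping from the slide's pseudocode to Python is an interpretation, not something the slides spell out.

```python
def constituent_of_otherwise(collection, item):
    """OTHERWISE reading: gives up after looking at the first element only."""
    for protein in collection:
        if protein == item:
            return True
        else:
            return False  # bug: later elements are never examined
    return False


def constituent_of_finally(collection, item):
    """FINALLY reading: answers False only after the whole collection is checked."""
    for protein in collection:
        if protein == item:
            return True
    return False


if __name__ == "__main__":
    k12 = ["proteinA", "proteinB", "proteinC"]  # toy stand-ins for sequences
    print(constituent_of_otherwise(k12, "proteinB"))
    print(constituent_of_finally(k12, "proteinB"))
```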
108
2nd Most Powerful Tool
Scenario IV: Case of the Lethal Look-alike
E. coli K12
E. coli O157:H7
ASSIGN K12-set FROM Gene-finder (K12-DNA)
ASSIGN O157-set FROM Gene-finder (O157-DNA)
CONSIDER EACH protein IN O157-set
  WHEN Constituent-of (K12-set, protein) = FALSE
  COLLECT protein
FUNCTION Constituent-of (set, item)
  CONSIDER EACH
  FINALLY RETURN FALSE
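The whole program reduces to a set difference: collect every O157 protein that is not found in the K12 set. A minimal Python rendering is sketched below. It assumes the gene finder has already produced two lists of protein sequences and that "found in" means exact string identity. Both are simplifications, since real look-alike proteins would need a similarity search rather than equality. The function name and the toy data are hypothetical.

```python
def proteins_unique_to(target_proteins, reference_proteins):
    """Return proteins in target_proteins that are absent from reference_proteins.

    Order of the target list is preserved. Membership is exact string equality.
    """
    reference = set(reference_proteins)  # a set makes each lookup fast
    return [protein for protein in target_proteins if protein not in reference]


if __name__ == "__main__":
    # Toy stand-ins for the outputs of a gene finder on each genome.
    k12_set = ["MKTAYIAK", "MSDNELLR", "MAVTKQLP"]
    o157_set = ["MKTAYIAK", "MTOXINXX", "MAVTKQLP", "MADHESIN"]
    print(proteins_unique_to(o157_set, k12_set))
```

Building the reference as a set replaces the inner CONSIDER EACH loop with a single lookup, which matters when each genome contributes thousands of proteins.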
109
2nd Most Powerful Tool
Computer programming
  • Make your own tools
  • Change them at will
  • Use them to teach you

110
FIRST Most Powerful Tool
Your brain
  • Keep your nonsense detector on high alert

111
FIRST Most Powerful Tool
Your brain
  • Keep your nonsense detector on high alert
  • Appreciate the limitations of bioinformatic tools

112
FIRST Most Powerful Tool
Your brain
  • Keep your nonsense detector on high alert
  • Appreciate the limitations of bioinformatic tools
  • Look out for surprises in the underlying data
