Title: BioInformatics - What and Why?
1 BioInformatics - What and Why?
- The following PowerPoint presentation is designed to give some background information on bioinformatics.
- This presentation is modified from information supplied by Dr. Bruno Gaeta, with permission from eBioInformatics Pty Ltd (c) Copyright
2 The need for bioinformaticians
The number of entries in gene sequence databases is increasing exponentially. Bioinformaticians are needed to understand and use this information.
[Figure: GenBank growth, 1982 to 1999]
3 Genome sequencing projects, including the Human Genome Project, are producing vast amounts of information. The challenge is to put this information to good use.
Publicly available genomes (April 1998)
- COMPLETE/PENDING PUBLICATION
- Rickettsia prowazekii
- Pseudomonas aeruginosa
- Pyrococcus abyssi
- Bacillus sp. C-125
- Ureaplasma urealyticum
- Pyrobaculum aerophilum
- ALMOST/PUBLIC
- Pyrococcus furiosus
- Mycobacterium tuberculosis H37Rv
- Mycobacterium tuberculosis CSU93
- Neisseria gonorrhoeae
- Neisseria meningitidis
- Streptococcus pyogenes
- COMPLETE/PUBLIC
- Aquifex aeolicus
- Pyrococcus horikoshii
- Bacillus subtilis
- Treponema pallidum
- Borrelia burgdorferi
- Helicobacter pylori
- Archaeoglobus fulgidus
- Methanobacterium thermo.
- Escherichia coli
- Mycoplasma pneumoniae
- Synechocystis sp. PCC6803
- Methanococcus jannaschii
- Saccharomyces cerevisiae
- Mycoplasma genitalium
- Haemophilus influenzae
Terry Gaasterland, Siv Andersson, Christoph Sensen
http://www.mcs.anl.gov/home/gaasterl/genomes.html
4 Towards a paradigm shift in biology (Nature News and Views 349:99)
Bioinformatics impacts on all aspects of
biological research.
"... We must hook our individual computers into the worldwide network that gives us access to daily changes in the databases and also makes immediate our communications with each other. The programs that display and analyze the material for us must be improved - and we must learn to use them more effectively. Like the purchased kits, they will make our life easier, but also like the kits, we must understand enough of how they work to use them effectively."
Walter Gilbert (1991)
5 Promises of genomics and bioinformatics
- Medicine
- Knowledge of protein structure facilitates drug design
- Understanding of genomic variation allows medical treatment to be tailored to the individual's genetic make-up
- Genome analysis allows the targeting of genetic diseases
- The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
- The same techniques can be applied to biotechnology, crop and livestock improvement, etc.
6 What is bioinformatics?
- Application of information technology to the storage, management and analysis of biological information
- Facilitated by the use of computers
7 What is bioinformatics?
- Sequence analysis
- Geneticists and molecular biologists analyse genome sequence information to understand disease processes
- Molecular modeling
- Crystallographers and biochemists design drugs using computer-aided tools
- Phylogeny/evolution
- Geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences
- Ecology and population studies
- Bioinformatics is used to handle large amounts of data obtained in population studies
- Medical informatics
- Personalised medicine
8 Sequence analysis overview
[Flowchart: sequence analysis workflow, summarised by stage]
Sequence entry
- Manual sequence entry
- Sequence database browsing
- Sequencing project management
Nucleotide sequence analysis (input: nucleotide sequence file)
- Search for protein coding regions (coding / non-coding)
- Search databases for similar sequences
- Search for known motifs
- Sequence comparison
- RNA structure prediction
- Design further experiments: restriction mapping, PCR planning
- Translate into protein
Protein sequence analysis (input: protein sequence file)
- Search databases for similar sequences
- Search for known motifs
- Sequence comparison
- Predict secondary structure
- Predict tertiary structure
Multiple sequence analysis
- Create a multiple sequence alignment
- Edit the alignment
- Format the alignment for publication
- Molecular phylogeny
- Protein family analysis
9 Gene sequencing
Automated chemical sequencing methods allow rapid generation of large databanks of gene sequences.
10 Database similarity searching
The BLAST program was written to allow rapid comparison of a new gene sequence with the hundreds of thousands of gene sequences in databases.
Sequences producing significant alignments                           Score (bits)  E value
gnl|PID|e252316 (Z74911) ORF YOR003w Saccharomyces cerevisiae             112      7e-26
gi|603258 (U18795) Prb1p vacuolar protease B Saccharomyces ce...          106      5e-24
gnl|PID|e264388 (X59720) YCR045c, len:491 Saccharomyces cerevi...          69      7e-13
gnl|PID|e239708 (Z71514) ORF YNL238w Saccharomyces cerevisiae              30      0.66
gnl|PID|e239572 (Z71603) ORF YNL327w Saccharomyces cerevisiae              29      1.1
gnl|PID|e239737 (Z71554) ORF YNL278w Saccharomyces cerevisiae              29      1.5

gnl|PID|e252316 (Z74911) ORF YOR003w Saccharomyces cerevisiae
Length = 478
Score = 112 bits (278), Expect = 7e-26
Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

Query   2  QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV  50
              PWG  RV        G           G GV   VLDTGI T H D   R
Sbjct 174  EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI  233

Query  51  PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE  110
           P      D NGHGTH AG I          GVA                        G E
Sbjct 234  PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE  288
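The "Identities" and "Gaps" figures in a report like the one above are simple column counts over the aligned sequences. The following is a minimal sketch of that bookkeeping, not BLAST's own code; the function name alignment_stats is hypothetical, and it is applied only to the first 60-column block shown, so its totals are smaller than the 259-column totals in the report header.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identities and gaps over two equal-length aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    # A column is an identity when both rows carry the same residue (not a gap).
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # A column is a gap when either row carries the gap character.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    length = len(query)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length,
    }


# First alignment block from the report (query residues 2-50, subject 174-233).
query = "QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV"
sbjct = "EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI"
stats = alignment_stats(query, sbjct)
print(stats)
```

The "Positives" count additionally needs a substitution matrix to decide which mismatched pairs score above zero, so it is left out of this sketch.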
11 Sequence comparison
Gene sequences can be aligned to see similarities between genes from different sources.
768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
 87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
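An alignment such as the one above places gaps (shown as dots) so that as many matching bases as possible line up. As an illustration only, the sketch below uses the textbook Needleman-Wunsch global alignment with a linear gap penalty; the slide does not say which program or scoring scheme produced its alignment, and the scores used here (match +1, mismatch -1, gap -2) are arbitrary assumptions.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        else:
            diag = None
        if diag is not None and score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append(".")  # gap in the second sequence
            i -= 1
        else:
            top.append(".")  # gap in the first sequence
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = global_alignment("ACGT", "AGT")
print(best)
print(row1)
print(row2)
```

Production alignment programs typically use affine gap penalties and tuned scoring tables, so their output can differ from this sketch.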
12 Restriction mapping
Gene sequences can be analysed to find the sites at which restriction enzymes cleave them.
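At its simplest, restriction mapping is a search for each enzyme's recognition sequence in the DNA. The sketch below assumes a small hand-picked table of three common enzymes (EcoRI, BamHI, HindIII) and a hypothetical function name find_sites; it reports where each recognition sequence starts, not the exact cut position within the site.

```python
# Recognition sequences for three widely used enzymes (illustrative subset).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(dna: str, enzymes: dict = SITES) -> dict:
    """Return 1-based start positions of each recognition sequence in dna."""
    dna = dna.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based coordinates
            start = dna.find(site, start + 1)
        hits[name] = positions
    return hits


example = "ttGAATTCaaGGATCCttGAATTC"
print(find_sites(example))
```

All three sites in the table are palindromic, so scanning one strand finds them on both. Enzymes with non-palindromic or degenerate sites would need the reverse complement and pattern matching as well.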
13 PCR primer design
- Oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs
OPTIMAL primer length                  --> 20
MINIMUM primer length                  --> 18
MAXIMUM primer length                  --> 22
OPTIMAL primer melting temperature     --> 60.000
MINIMUM acceptable melting temp        --> 57.000
MAXIMUM acceptable melting temp        --> 63.000
MINIMUM acceptable primer GC%          --> 20.000
MAXIMUM acceptable primer GC%          --> 80.000
Salt concentration (mM)                --> 50.000
DNA concentration (nM)                 --> 50.000
MAX no. unknown bases (Ns) allowed     --> 0
MAX acceptable self-complementarity    --> 12
MAXIMUM 3' end self-complementarity    --> 8
GC clamp: how many 3' bases            --> 0
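The parameters above act as filters on candidate primers. The sketch below shows how the length, GC content, unknown-base and melting temperature limits could be applied. It is an assumption-laden illustration: the melting temperature uses the simple Wallace rule (2 degrees per A/T, 4 degrees per G/C), not the salt- and concentration-dependent model the program on the slide would use, and the self-complementarity checks are omitted. All function names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def acceptable(primer: str, min_len: int = 18, max_len: int = 22,
               min_tm: float = 57.0, max_tm: float = 63.0,
               min_gc: float = 20.0, max_gc: float = 80.0,
               max_n: int = 0) -> bool:
    """Apply the length, N-count, GC% and Tm limits from the parameter list."""
    primer = primer.upper()
    return (min_len <= len(primer) <= max_len
            and primer.count("N") <= max_n
            and min_gc <= gc_percent(primer) <= max_gc
            and min_tm <= wallace_tm(primer) <= max_tm)


candidate = "ATGCATGCATGCATGCATGC"
print(len(candidate), gc_percent(candidate), wallace_tm(candidate), acceptable(candidate))
```

Because the Wallace rule ignores salt and DNA concentration, a primer that passes here could still fall outside the 57 to 63 degree window under a more detailed model.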
14 Gene discovery
Computer programs can be used to recognise the protein coding regions in DNA.
Plot created using codon preference (GCG)
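The codon preference plot on this slide relies on codon usage statistics. A much simpler way to flag candidate coding regions is to scan for open reading frames (ORFs): stretches that begin with ATG and run in frame to a stop codon. The sketch below implements that simpler ORF scan, not the codon preference method, and covers only the three forward-strand frames; the function name and the min_codons threshold are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list:
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    Coordinates are 1-based and inclusive of the stop codon; frame is 1, 2 or 3.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                # Length is counted in codons, excluding the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None
    return orfs


toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

A fuller search would also scan the reverse complement, and real gene finders add statistical evidence because long ORFs can occur by chance.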
15 RNA structure prediction
Structural features of RNA can be predicted.
[Figure: predicted RNA secondary structure, drawn base by base]
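Secondary structure prediction asks which bases of an RNA pair with each other. As an illustration only, the sketch below uses the classic Nussinov dynamic programme, which maximises the number of base pairs. This is a stand-in technique: the slide does not name its method, and practical tools usually minimise free energy instead. The allowed pairs (A-U, G-C, G-U) and the minimum hairpin loop of three bases are stated assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style), given a minimum loop size."""
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = most pairs that can form within rna[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, leaving at least min_loop bases between.
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

The function returns only the pair count; recovering the actual pairs would need a traceback through the table.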
16 Protein structure prediction
Particular structural features can be recognised in protein sequences.
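[Figure: protein structure prediction plot along the sequence (axis ticks at residues 50 and 100). Panels: KD hydrophobicity (-5.0 to 5.0), surface probability (0.0 to 10), flexibility (0.8 to 1.2), antigenic index (-1.7 to 1.7), CF turns, CF alpha helices, CF beta sheets, GOR turns, GOR alpha helices, GOR beta sheets, glycosylation sites]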
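The hydrophobicity panel in a plot like this is typically a sliding-window average of per-residue values. The sketch below computes such a profile with the Kyte-Doolittle (KD) hydropathy scale. The window size of 9 and the function name are assumptions, and the sketch does not try to reproduce the other panels.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Average KD hydropathy over each full window along the sequence."""
    protein = protein.upper()
    if window < 1 or window > len(protein):
        return []
    values = [KD[aa] for aa in protein]  # raises KeyError on non-standard residues
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = hydropathy_profile("MKTLLVAGIIAALLSGCSTQDE", window=9)
print([round(v, 2) for v in profile])
```

Positive averages suggest hydrophobic stretches and negative averages hydrophilic ones; the choice of window width changes how smooth the curve looks.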
17 Protein structure
The 3-D structure of proteins is used to understand protein function and to design new drugs.
18 Multiple sequence alignment
Sequences of proteins from different organisms can be aligned to see similarities and differences.
Alignment formatted using MacBoxshade
19 Phylogeny inference
Analysis of sequences allows evolutionary relationships to be determined.
[Figure: phylogenetic tree of E.coli, C.botulinum, C.cadavers, C.butyricum, B.subtilis and B.cereus]
Phylogenetic tree constructed using the Phylip package
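Distance-based tree building starts from a table of pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the p-distance (the fraction of compared positions that differ), for a set of aligned sequences. It illustrates the input to tree building only, not the tree construction itself and not the specific Phylip programs used for this slide; the function names and toy sequences are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences, skipping gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-." or y in "-.":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (name, name) for a dict of aligned sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


toy_alignment = {
    "seqA": "ACGTACGT",
    "seqB": "ACGTACGA",
    "seqC": "AC-TTCGA",
}
matrix = distance_matrix(toy_alignment)
print(matrix[("seqA", "seqB")], matrix[("seqA", "seqC")])
```

Real analyses usually correct raw p-distances for multiple substitutions at the same site before building a tree.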
20 Large-scale bioinformatics: genome projects
- Mapping
- Identifying the location of clones and markers on the chromosome by genetic linkage analysis and physical mapping
- Sequencing
- Assembling clone sequence reads into large (eventually complete) genome sequences
- Gene discovery
- Identifying coding regions in genomic DNA by database searching and other methods
- Function assignment
- Using database searches, pattern searches, protein family analysis and structure prediction to assign a function to each predicted gene
- Data mining
- Searching for relationships and correlations in the information
- Genome comparison
- Comparing different complete genomes to infer evolutionary history and genome rearrangements
21 Challenges in bioinformatics
- Explosion of information
- Need for faster, automated analysis to process large amounts of data
- Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
- Need for smarter software to identify interesting relationships in very large data sets
- Lack of bioinformaticians
- Software needs to be easier to access, use and understand
- Biologists need to learn about the software, its limitations, and how to interpret its results