BioInformatics - What and Why? - PowerPoint PPT Presentation

Provided by: segundoenc
1
BioInformatics - What and Why?
  • The following power point presentation is
    designed to give some background information on
    Bioinformatics.
  • This presentation is modified from information
    supplied by Dr. Bruno Gaeta, with permission
    from eBioInformatics Pty Ltd (c) Copyright

2
The need for bioinformaticians
The number of entries in databases of gene
sequences is increasing exponentially.
Bioinformaticians are needed to understand and
use this information.
(Figure: GenBank growth, 1982 to 1999)
3
Genome sequencing projects, including the Human
Genome Project, are producing vast amounts of
information. The challenge is to use this
information in a useful way.
Publicly available genomes (April 1998)
  • COMPLETE/PENDING PUBLICATION
  • Rickettsia prowazekii
  • Pseudomonas aeruginosa
  • Pyrococcus abyssi
  • Bacillus sp. C-125
  • Ureaplasma urealyticum
  • Pyrobaculum aerophilum
  • ALMOST/PUBLIC
  • Pyrococcus furiosus
  • Mycobacterium tuberculosis H37Rv
  • Mycobacterium tuberculosis CSU93
  • Neisseria gonorrhoeae
  • Neisseria meningitidis
  • Streptococcus pyogenes
  • COMPLETE/PUBLIC
  • Aquifex aeolicus
  • Pyrococcus horikoshii
  • Bacillus subtilis
  • Treponema pallidum
  • Borrelia burgdorferi
  • Helicobacter pylori
  • Archaeoglobus fulgidus
  • Methanobacterium thermo.
  • Escherichia coli
  • Mycoplasma pneumoniae
  • Synechocystis sp. PCC6803
  • Methanococcus jannaschii
  • Saccharomyces cerevisiae
  • Mycoplasma genitalium
  • Haemophilus influenzae

Terry Gaasterland, Siv Andersson, Christoph Sensen
http://www.mcs.anl.gov/home/gaasterl/genomes.html
4
Towards a paradigm shift in biology
Bioinformatics impacts on all aspects of
biological research.
"... We must hook our individual computers into the
worldwide network that gives us access to daily
changes in the databases and also makes immediate
our communications with each other. The programs
that display and analyze the material for us must
be improved - and we must learn to use them more
effectively. Like the purchased kits, they will
make our life easier, but also like the kits, we
must understand enough of how they work to use
them effectively."
Walter Gilbert (1991), Towards a paradigm shift in
biology, Nature News and Views 349:99
5
Promises of genomics and bioinformatics
  • Medicine
  • Knowledge of protein structure facilitates drug
    design
  • Understanding of genomic variation allows the
    tailoring of medical treatment to the
    individual's genetic make-up
  • Genome analysis allows the targeting of genetic
    diseases
  • The effect of a disease or of a therapeutic on
    RNA and protein levels can be elucidated
  • The same techniques can be applied to
    biotechnology, crop and livestock improvement,
    etc...

6
What is bioinformatics?
  • Application of information technology to the
    storage, management and analysis of biological
    information
  • Facilitated by the use of computers

7
What is bioinformatics?
  • Sequence analysis
  • Geneticists/molecular biologists analyse genome
    sequence information to understand disease
    processes
  • Molecular modeling
  • Crystallographers/biochemists design drugs using
    computer-aided tools
  • Phylogeny/evolution
  • Geneticists obtain information about the
    evolution of organisms by looking for
    similarities in gene sequences
  • Ecology and population studies
  • Bioinformatics is used to handle large amounts of
    data obtained in population studies
  • Medical informatics
  • Personalised medicine

8
Sequence analysis overview
(Flowchart; boxes grouped by stage)
  • Sequence entry
  • Manual sequence entry, sequence database
    browsing, sequencing project management
  • Nucleotide sequence analysis (nucleotide
    sequence file)
  • Search for protein coding regions (coding /
    non-coding), search databases for similar
    sequences, search for known motifs, sequence
    comparison, RNA structure prediction
  • Design further experiments: restriction mapping,
    PCR planning
  • Translate into protein
  • Protein sequence analysis (protein sequence file)
  • Search databases for similar sequences, search
    for known motifs, sequence comparison, predict
    secondary structure, predict tertiary structure
  • Multiple sequence analysis
  • Create a multiple sequence alignment, edit the
    alignment, format the alignment for publication
  • Molecular phylogeny, protein family analysis
9
Gene sequencing
Automated chemical sequencing methods allow rapid
generation of large data banks of gene sequences.
10
Database similarity searching
The BLAST program was written to allow rapid
comparison of a new gene sequence with the
hundreds of thousands of gene sequences in
databases.

Sequences producing significant alignments:

Sequence                                                          Score (bits)  E value
gnl|PID|e252316 (Z74911) ORF YOR003w Saccharomyces cerevisiae              112  7e-26
gi|603258 (U18795) Prb1p vacuolar protease B Saccharomyces ce...           106  5e-24
gnl|PID|e264388 (X59720) YCR045c, len491 Saccharomyces cerevi...            69  7e-13
gnl|PID|e239708 (Z71514) ORF YNL238w Saccharomyces cerevisiae               30  0.66
gnl|PID|e239572 (Z71603) ORF YNL327w Saccharomyces cerevisiae               29  1.1
gnl|PID|e239737 (Z71554) ORF YNL278w Saccharomyces cerevisiae               29  1.5

gnl|PID|e252316 (Z74911) ORF YOR003w Saccharomyces cerevisiae
Length = 478
Score = 112 bits (278), Expect = 7e-26
Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

Query: 2   QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV 50
              PWG  RV        G           G GV   VLDTGI T H D   R
Sbjct: 174 EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI 233

Query: 51  PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE 110
           P      D NGHGTH AG I          GVA                        G E
Sbjct: 234 PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE 288
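The idea of comparing one new sequence against many database entries can be illustrated with a toy word-matching search. This is a minimal sketch and not the BLAST algorithm itself: it only ranks entries by how many short words (k-mers) they share with the query, with no scoring matrix, extension step or E-value. The function names, the word length of 3 and the three-entry database are assumptions made for illustration; entry_A and entry_C are short fragments of the subject sequence in the alignment above, and entry_B is invented.

```python
def kmers(seq, k=3):
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_by_shared_words(query, database, k=3):
    """Rank database entries by the number of k-mers shared with the query."""
    query_words = kmers(query, k)
    scores = [(len(query_words & kmers(seq, k)), name)
              for name, seq in database.items()]
    # Highest number of shared words first; ties are broken by entry name.
    return sorted(scores, key=lambda item: (-item[0], item[1]))


# Hypothetical mini-database (illustration only).
database = {
    "entry_A": "VLDTGIDTEHEDF",
    "entry_B": "MKKLLPTAAAGLL",
    "entry_C": "NGHGTHCAGIIGS",
}

hits = rank_by_shared_words("VLDTGISTHPDL", database)
for shared, name in hits:
    print(name, shared)
```

Real database search programs refine such word hits into scored alignments; the sketch stops at the ranking step.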
11
Sequence comparison
Gene sequences can be aligned to see similarities
between genes from different sources.

768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
 87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
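A pairwise alignment like the one above can be produced by dynamic programming. The following is a minimal sketch of Needleman-Wunsch global alignment, assuming a simple scoring scheme (match +1, mismatch -1, and a linear gap penalty of -1) chosen for illustration; the program that produced the slide's alignment is not named and is likely to have used different scores and gap penalties. Gaps are written as "." to match the slide.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b), with gaps written as '.'.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(".")
            i -= 1
        else:
            out_a.append(".")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_alignment("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```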
12
Restriction mapping
Gene sequences can be analysed to detect sites
that can be cleaved by restriction enzymes.
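Finding restriction sites amounts to searching a DNA string for each enzyme's recognition sequence. The sketch below is a minimal illustration, assuming three common enzymes and an invented example sequence; the function name and the choice to report 1-based positions of the first base of each site are also assumptions. Because these three recognition sequences are palindromic, searching one strand is enough.

```python
# Recognition sequences (5' to 3') for three common restriction enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(dna, sites=RECOGNITION_SITES):
    """Return {enzyme: [1-based start positions]} for every site in dna."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + 1)  # convert to 1-based numbering
            start = dna.find(site, start + 1)
        hits[enzyme] = positions
    return hits


# Invented example sequence (illustration only).
site_map = find_sites("ccGAATTCaaggatccttGAATTCg")
for enzyme, positions in site_map.items():
    print(enzyme, positions)
```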
13
PCR Primer Design
  • Oligonucleotides for use in the polymerase chain
    reaction can be designed using computer-based
    programs

OPTIMAL primer length                   --> 20
MINIMUM primer length                   --> 18
MAXIMUM primer length                   --> 22
OPTIMAL primer melting temperature      --> 60.000
MINIMUM acceptable melting temp         --> 57.000
MAXIMUM acceptable melting temp         --> 63.000
MINIMUM acceptable primer GC (%)        --> 20.000
MAXIMUM acceptable primer GC (%)        --> 80.000
Salt concentration (mM)                 --> 50.000
DNA concentration (nM)                  --> 50.000
MAX no. unknown bases (Ns) allowed      --> 0
MAX acceptable self-complementarity     --> 12
MAXIMUM 3' end self-complementarity     --> 8
GC clamp: how many 3' bases             --> 0
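Several of the constraints above are easy to check for a candidate primer. The sketch below is an illustration under stated assumptions: it uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), as a rough melting temperature estimate, which is almost certainly not the calculation used by the program on the slide (that program also takes salt and DNA concentration into account). The function names and the example primers are hypothetical, and self-complementarity and the GC clamp are not checked.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer, min_len=18, max_len=22, min_gc=20.0, max_gc=80.0,
                 min_tm=57.0, max_tm=63.0, max_unknown=0):
    """Return a dict of pass/fail flags, one per constraint checked."""
    primer = primer.upper()
    return {
        "length": min_len <= len(primer) <= max_len,
        "gc": min_gc <= gc_percent(primer) <= max_gc,
        "tm": min_tm <= wallace_tm(primer) <= max_tm,
        "unknown_bases": primer.count("N") <= max_unknown,
    }


# Hypothetical candidate primers (illustration only).
report = check_primer("ATGCATGCATGCATGCATGC")
poor_report = check_primer("ATATATATATATATATAT")
print(report)
print(poor_report)
```

The default limits mirror the values listed on the slide, so a primer that fails any flag would fall outside that parameter set.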
14
Gene discovery
Computer programs can be used to recognise the
protein coding regions in DNA.
Plot created using codon preference (GCG)
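The simplest computational signal for a protein coding region is an open reading frame (ORF): a start codon followed in the same frame by a stop codon. The sketch below scans the three forward frames for such ORFs. It is a deliberately simpler technique than the codon preference plot mentioned above, which relies on codon usage statistics; the function name, the minimum length of 3 codons and the example sequence are assumptions for illustration, and the reverse strand is not scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find ATG...stop ORFs in the three forward reading frames.

    Returns a list of (start, end, frame): start is the 0-based index of
    the A in ATG, end is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                # Count codons from ATG up to, but not including, the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Invented example sequence (illustration only).
orfs = find_orfs("CCATGAAACCCGGGTAGTT")
print(orfs)
```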
15
RNA structure prediction
Structural features of RNA can be predicted.
(Figure: predicted RNA secondary structure)
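One classic way to predict RNA secondary structure is to find the largest set of nested base pairs by dynamic programming. The sketch below uses the Nussinov base-pair maximisation approach as a simple stand-in; the slide does not say which method produced its figure, and practical tools usually minimise free energy instead. The function name, the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 unpaired bases are assumptions for illustration.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion)."""
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k, leaving a loop of at
            # least min_loop unpaired bases between them.
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Invented hairpin sequence (illustration only).
pairs = max_base_pairs("GGGAAAUCC")
print(pairs)
```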
16
Protein structure prediction
Particular structural features can be recognised
in protein sequences.
(Figure: profile plots along the sequence, with
residue positions 50 and 100 marked. Panels: KD
Hydrophobicity (-5.0 to 5.0), Surface Prob. (0.0
to 10), Flexibility (0.8 to 1.2), Antigenic Index
(-1.7 to 1.7), CF Turns, CF Alpha Helices, CF Beta
Sheets, GOR Turns, GOR Alpha Helices, GOR Beta
Sheets, Glycosylation Sites)
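A hydrophobicity profile such as the KD (Kyte-Doolittle) panel can be computed by averaging per-residue hydropathy values over a sliding window. The sketch below uses the published Kyte-Doolittle scale; the function name, the window size of 7 and the example peptide are assumptions for illustration, and the plotting step is left out.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=7):
    """Mean Kyte-Doolittle hydropathy for each window along the sequence.

    Returns one value per window position; the list is empty if the
    sequence is shorter than the window.
    """
    values = [KD_SCALE[aa] for aa in protein.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Invented peptide (illustration only).
profile = hydropathy_profile("MKTLLILAVVAAALA", window=7)
print(len(profile))
print([round(v, 2) for v in profile])
```

Positive window averages indicate hydrophobic stretches and negative ones hydrophilic stretches, which is how the plotted panel would be read.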
17
Protein structure
The 3-D structure of proteins is used to
understand protein function and design new drugs.
18
Multiple sequence alignment
Sequences of proteins from different organisms
can be aligned to see similarities and differences.
Alignment formatted using MacBoxshade
19
Phylogeny inference
Analysis of sequences allows evolutionary
relationships to be determined.
(Figure: phylogenetic tree of E. coli,
C. botulinum, C. cadaveris, C. butyricum,
B. subtilis and B. cereus, constructed using the
Phylip package)
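Distance-based tree building starts from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the p-distance (the fraction of compared positions that differ). It is an illustration only and does not reproduce what the Phylip package does; the function names, the taxon labels and the three aligned sequences are invented, and positions with a gap ("-") in either sequence are skipped.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {(n1, n2): p_distance(aligned[n1], aligned[n2])
            for n1, n2 in combinations(sorted(aligned), 2)}


# Invented aligned sequences (illustration only).
aligned = {
    "taxon_1": "ACGTACGT",
    "taxon_2": "ACGTACGA",
    "taxon_3": "TCGAACCA",
}
distances = distance_matrix(aligned)
for pair, d in distances.items():
    print(pair, d)
```

A tree-building method such as neighbour joining would take a matrix like this as input; the smallest distance (taxon_1 and taxon_2 here) marks the most similar pair.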
20
Large-scale bioinformatics: genome projects
  • Mapping
  • Identifying the location of clones and markers on
    the chromosome by genetic linkage analysis and
    physical mapping
  • Sequencing
  • Assembling clone sequence reads into large
    (eventually complete) genome sequences
  • Gene discovery
  • Identifying coding regions in genomic DNA by
    database searching and other methods
  • Function assignment
  • Using database searches, pattern searches,
    protein family analysis and structure prediction
    to assign a function to each predicted gene
  • Data mining
  • Searching for relationships and correlations in
    the information
  • Genome comparison
  • Comparing different complete genomes to infer
    evolutionary history and genome rearrangements

21
Challenges in bioinformatics
  • Explosion of information
  • Need for faster, automated analysis to process
    large amounts of data
  • Need for integration between different types of
    information (sequences, literature, annotations,
    protein levels, RNA levels, etc.)
  • Need for smarter software to identify
    interesting relationships in very large data sets
  • Lack of bioinformaticians
  • Software needs to be easier to access, use and
    understand
  • Biologists need to learn about the software, its
    limitations, and how to interpret its results