Bioinformatics - PowerPoint PPT Presentation

About This Presentation

Title:

Bioinformatics

Slides: 33

Provided by: mooki

Learn more at: http://www.columbia.edu

Transcript and Presenter's Notes

Title: Bioinformatics

1
Bioinformatics
- Science Honors Program - Computer Modeling and
Visualization in Chemistry

2
What is Bioinformatics?

It's at the intersection of biotechnology and
computer science: analyzing the enormous amount
of sequence and structural data that we have
generated over the past decades.
Computational tools to mine this enormous
amount of data.

3
Bioinformatics is multidisciplinary
Mathematics/computer science
Genomics

Molecular biology Biomedicine
Bioinformatics
Biophysics
Ethical, legal, and social implications
Molecular evolution
4

Biological data
Huge data sets
Complexity of biological systems

5
What are we trying to find out?
KEGG: Kyoto Encyclopedia of Genes and Genomes.
"A grand challenge in the post-genomic era is a
complete computer representation of the cell and
the organism, which will enable computational
prediction of higher-level complexity of cellular
processes and organism behaviors from genomic
information." http://www.genome.jp/kegg/
6
Where Does the Data Come From?

Primary protein structure determination: a
variety of chemical techniques
Nucleic Acid Sequencing
PCR (polymerase chain reaction)
3D Structure
X-Ray Crystallography
Nuclear Magnetic Resonance (NMR)

7
Biological information: from genes to proteins
Gene
DNA
Transcription
genomics molecular biology
RNA
Translation
structural biology biophysics
Protein
Protein folding
8
Eukaryotic Genome: DNA
Structure
Nucleotides (bases): Adenine (A), Cytosine
(C), Guanine (G), Thymine (T)
Sequence data: strings of letters
Triplet codons: the genetic code
20 amino acids (A, L, V, S, etc.)
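The triplet-codon idea on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original presentation: the function name `translate` and the example DNA string are made up, and the table is the standard genetic code (other organisms and organelles use variants).

```python
# Standard genetic code, packed as one string of 64 one-letter amino acids.
# Codons are ordered with bases T, C, A, G at each of the three positions.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical example sequence: ATG (Met), GCC (Ala), TAA (stop).
print(translate("ATGGCCTAA"))
```

The 64 codons map onto only 20 amino acids plus stop, which is why several codons share the same letter in the table.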
9
Three-dimensional protein structure: atomic
coordinates in 3D space
Measured in Angstroms
Conversion into metric units:
1 Angstrom = 10^-8 cm = 0.1 nm
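As a small worked example of these units, the sketch below computes the distance between two atoms from their 3D coordinates and converts it from Angstroms to nanometers and centimeters. The coordinates and function names are invented for illustration and do not come from any real structure.

```python
import math

# Conversion factors from the slide: 1 Angstrom = 1e-8 cm = 0.1 nm.
ANGSTROM_TO_CM = 1e-8
ANGSTROM_TO_NM = 0.1


def distance_angstrom(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) coordinates given in Angstroms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(atom_a, atom_b)))


# Hypothetical atomic coordinates in Angstroms (not from a real protein).
atom_1 = (0.0, 0.0, 0.0)
atom_2 = (1.0, 2.0, 2.0)

d = distance_angstrom(atom_1, atom_2)
print(f"{d:.2f} Angstrom = {d * ANGSTROM_TO_NM:.2f} nm = {d * ANGSTROM_TO_CM:.2e} cm")
```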
10
Proteins: prediction of biochemical function

Relationships between
DNA or amino acid
sequence, 3D structure,
and protein function
Use of this knowledge for prediction of function,
molecular modelling, and design (e.g., new
therapies)

CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG

TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAI
STAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVL
VTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNST
DEPSEKDALQPGRNLVAAGYALYGSATML
11
DNA Sequence Gene Protein
Sequence Function
12
Sequence Databases

Protein Sequences
ExPASy Molecular Biology Server (SWISS-PROT):
http://expasy.ch
Protein Information Resource (PIR):
http://pir.georgetown.edu
Protein Research Foundation (PRF):
http://www.prf.or.jp/en
NR from NCBI: www.ncbi.nlm.nih.gov
OMIM: Online Mendelian Inheritance in Man.
Genetic diseases.
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

13
Other Databases

GenBank (Nucleotide): www.ncbi.nlm.nih.gov/Genbank
NCBI Entrez: an integrated, text-based search and
retrieval system used at NCBI for the major
databases, including PubMed, Nucleotide and
Protein Sequences, Protein Structures, Complete
Genomes, Taxonomy, and others.
http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
PDB: Protein Databank. Protein structures.
www.rcsb.org/pdb
KEGG: Kyoto Encyclopedia of Genes and Genomes
http://www.genome.jp
PubMed: Literature references
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
NCBI primers: bottom of page. This page has
education resources.
http://www.ncbi.nih.gov/Education

14
Genome sequencing and analysis (genomics)
Genomics generates a vast amount of DNA sequence
data. Sophisticated algorithms are used to
predict gene regions. Only 3% of the vertebrate
genome codes for proteins.

GenBank holds sequences from over 800 organisms.
There are currently 113 complete genomes.
The completion of a "working draft" of the human
genome was announced in June 2000.
Estimates of 38,000 - 120,000 genes (~40,000)

15
Explore the following databases

KEGG: Kyoto Encyclopedia of Genes and Genomes
Main Map: http://www.genome.jp/kegg/pathway/map/map01100.html
Main Page: http://www.genome.jp/kegg/
GenBank
What is it? An annotated collection of all
publicly available DNA sequences
www.ncbi.nlm.nih.gov/Genbank
PDB: The Protein Databank
What is it? A database of 3D structures of
proteins (and some DNA or other molecules)
www.rcsb.org/pdb

16
What is homology?

Homology: common ancestry
Why homology?
It is much easier to get a DNA or protein sequence than
to experimentally determine the structure or function
of a biological molecule.
The rapid expansion of sequence databases (DNA,
protein, RNA) is far greater than that of structural or
functional databases.
Develop computational methods to infer
biologically relevant information from sequence
alone.
E.g., if we know that two protein sequences are
homologous, then we can infer that the two
proteins may share the same protein fold, active
site, and even function.
In simple terms, much of today's class involves
Darwinian evolution discussed at the microscopic
genetic level.

17
Stochastic Evolutionary Forces Act on Genomes

Forces that alter a genetic sequence
Random Mutation
Natural Selection
Genetic Drift
Comparison of protein sequences can be used to
infer evolutionary events that happened possibly
billions of years ago.
If we find a protein sequence homologous to a given
protein, often from very divergent organisms,
then a common ancestral protein must have
existed.
Homologous proteins share similar 3-D structures.
Homologous proteins can have seemingly very
different sequences.

18
Evolutionary Tree
19
Modes of Evolution
Globin Family Evolution
Orthologous: differ because of speciation
Paralogous: differ because of gene duplication
20
Orthologous Sequences: Cytochrome c Family
21
Sequence Alignment
Optimal alignments of human myoglobin and human
hemoglobin (alpha chain)
Algorithms: Needleman-Wunsch, Smith-Waterman
Heuristic algorithms:
Pairwise alignment: BLAST (www.ncbi.nlm.nih.gov)
Multiple sequence alignment: ClustalW (www.ebi.ac.uk)
PSI-BLAST
22
Structural Alignment
Comparison of dihydrofolate reductases from
Mycobacterium tuberculosis (1DF7) and Escherichia
coli (1DRE) with 41% sequence identity. After
the sequences of these two proteins were aligned,
the alpha-carbons of the backbone were
structurally aligned.
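A figure like "41% sequence identity" comes from counting identical positions in an alignment. The sketch below shows one way to compute it, assuming the two sequences are already aligned to equal length with "-" for gaps. The function name and the short example strings are hypothetical, and conventions differ on the denominator: here, columns containing a gap are skipped.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments, not the real 1DF7/1DRE sequences.
print(percent_identity("ACDE-G", "ACDFHG"))
```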
23
Amino Acid Similarity Matrix
A similarity matrix incorporates information
about the likelihood that one amino acid will be
mutated into another over evolutionary time.
Shown here is the PAM250 matrix. Another common
one is BLOSUM50.
24
Sequence Alignment: Needleman-Wunsch
25
Global Alignment
Needleman-Wunsch Algorithm
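A minimal sketch of Needleman-Wunsch global alignment follows. It assumes a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1) rather than a similarity matrix such as PAM250, and the function name and test sequences are illustrative only. When several alignments tie for the best score, this traceback returns just one of them.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # The first row and column correspond to aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Because the alignment is global, every residue of both sequences appears in the output, padded with gaps where needed.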
26
Global Alignment Example
27
Local Alignment
Smith-Waterman Algorithm
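A minimal sketch of Smith-Waterman local alignment, under the same kind of assumptions: a simple scoring scheme (match +2, mismatch -1, linear gap penalty -1) instead of a similarity matrix, with an illustrative function name and made-up test sequences. The differences from global alignment are that cell scores never drop below zero and the traceback starts at the highest-scoring cell.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for the best local alignment of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A score of 0 means "start a fresh local alignment here".
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero-scoring cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGGACGGGG"))
```

Only the best-matching region is reported; the unrelated flanking residues are left out of the alignment.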
28
Local Alignment Example
29
Multiple Sequence Alignment
30
Gene Doping
31
Gene Therapy
32
Gene Doped Mice
HEAVY WORKOUT. This rat, injected with a
muscle-enhancing gene, boosts its strength by
lugging weights up a ladder.
http://www.sciencenews.org/articles/20041030/bob9.asp
