Bioinformatics - PowerPoint PPT Presentation

Slides: 33
Provided by: mooki
Learn more at: http://www.columbia.edu

1
Bioinformatics
- Science Honors Program - Computer Modeling and
Visualization in Chemistry

2
What is Bioinformatics?
  • It's at the intersection of biotechnology and
    computer science, and is used to analyze the
    enormous amount of sequence and structural data
    that we have generated over the past decades.
  • Computational tools to mine this enormous
    amount of data.

3
Bioinformatics is multidisciplinary
  • Mathematics/computer science
  • Genomics
  • Molecular biology
  • Biomedicine
  • Biophysics
  • Ethical, legal, and social implications
  • Molecular evolution
4
  • Biological data
  • Huge data sets
  • Complexity of biological systems

5
What are we trying to find out?
KEGG: Kyoto Encyclopedia of Genes and Genomes.
A grand challenge in the post-genomic era is a
complete computer representation of the cell and
the organism, which will enable computational
prediction of higher-level complexity of cellular
processes and organism behaviors from genomic
information. http://www.genome.jp/kegg/
6
Where Does the Data Come From?
  • Primary Protein Structure Determination: a
    variety of chemical techniques
  • Nucleic Acid Sequencing
  • PCR (polymerase chain reaction)
  • 3D Structure
  • X-Ray Crystallography
  • Nuclear Magnetic Resonance (NMR)

7
Biological information: From genes to proteins
Gene (DNA) -> Transcription -> RNA -> Translation
-> Protein -> Protein folding
Fields involved: genomics, molecular biology,
structural biology, biophysics
8
Eukaryotic Genome DNA
Structure
Nucleotides (bases): Adenine (A), Cytosine (C),
Guanine (G), Thymine (T)
Sequence data: strings of letters
Triplet codons: the genetic code
20 amino acids (A, L, V, S, etc.)
9
Three-dimensional protein structure: atomic
coordinates in 3D space
Measured in Angstroms
Conversion into metric units: 1 Angstrom
= 10^-8 cm = 0.1 nm
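
As a small worked example of these units, the sketch below computes the distance between two atoms from their 3D coordinates and converts it from Angstroms to nanometers and centimeters. The coordinates are hypothetical values chosen for easy arithmetic, not taken from any real structure.

```python
import math

ANGSTROM_TO_NM = 0.1   # 1 Angstrom = 0.1 nm
ANGSTROM_TO_CM = 1e-8  # 1 Angstrom = 10^-8 cm


def distance_angstrom(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) positions given in Angstroms."""
    return math.dist(atom_a, atom_b)


# Hypothetical atom positions (in Angstroms), for illustration only.
atom_1 = (0.0, 0.0, 0.0)
atom_2 = (3.0, 4.0, 0.0)

d = distance_angstrom(atom_1, atom_2)
print(d, "Angstrom")
print(d * ANGSTROM_TO_NM, "nm")
print(d * ANGSTROM_TO_CM, "cm")
```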
10
Proteins: Prediction of biochemical function
  • Relationships between DNA or amino acid
    sequence, 3D structure, and protein function
  • Use of this knowledge for prediction of function,
    molecular modelling, and design (e.g., new
    therapies)

CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG

TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAI
STAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVL
VTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNST
DEPSEKDALQPGRNLVAAGYALYGSATML
11
DNA Sequence -> Gene -> Protein Sequence -> Function
12
Sequence Databases
  • Protein Sequences
  • ExPASy Molecular Biology Server (SWISS-PROT):
    http://expasy.ch
  • Protein Information Resource (PIR):
    http://pir.georgetown.edu
  • Protein Research Foundation (PRF):
    http://www.prf.or.jp/en
  • NR from NCBI: www.ncbi.nlm.nih.gov
  • OMIM: Online Mendelian Inheritance in Man.
    Genetic diseases.
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

13
Other Databases
  • GenBank (Nucleotide): www.ncbi.nlm.nih.gov/Genbank
  • NCBI Entrez: an integrated, text-based search and
    retrieval system used at NCBI for the major
    databases, including PubMed, Nucleotide and
    Protein Sequences, Protein Structures, Complete
    Genomes, Taxonomy, and others.
    http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
  • PDB: Protein Databank. Protein structures.
    www.rcsb.org/pdb
  • KEGG: Kyoto Encyclopedia of Genes and Genomes.
    http://www.genome.jp
  • PubMed: Literature references.
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  • NCBI primers: bottom of page. This page has
    education resources.
    http://www.ncbi.nih.gov/Education

14
Genome sequencing and analysis (genomics)
Genomics generates a vast amount of DNA sequence
data. Sophisticated algorithms are used to
predict gene regions. Only 3% of the vertebrate
genome codes for proteins.
  • GenBank holds sequences from over 800 organisms.
    There are currently 113 complete genomes.
  • The completion of a "working draft" of the human
    genome was announced in June 2000.
  • Estimates range from 38,000 to 120,000 genes (40,000)
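
To give a feel for what "predicting gene regions" means at its very simplest, here is a toy open reading frame (ORF) scan. It is only a sketch under strong assumptions: forward strand only, no introns, and an ORF defined as ATG followed by an in-frame stop codon. Real gene-prediction programs are far more sophisticated. The function name `find_orfs` and the `min_codons` cutoff are illustrative choices, not something from the slides.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of simple ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon and
    includes the stop codon (end is exclusive). ORFs with fewer than
    min_codons codons before the stop are skipped. An ATG that lies
    inside a longer ORF is reported as its own, shorter ORF.
    """
    dna = dna.upper()
    orfs = []
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break  # stop at the first in-frame stop codon
    return orfs


# Made-up sequence for illustration only.
sequence = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```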

15
Explore the following databases
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
  • Main Map:
    http://www.genome.jp/kegg/pathway/map/map01100.html
  • Main Page: http://www.genome.jp/kegg/
  • GenBank
  • What is it? An annotated collection of all
    publicly available DNA sequences
  • www.ncbi.nlm.nih.gov/Genbank
  • PDB: The Protein Databank
  • What is it? A database of 3D structures of
    proteins (and some DNA or other molecules)
  • www.rcsb.org/pdb

16
What is homology?
  • Homology: common ancestry
  • Why homology?
  • much easier to get a DNA or protein sequence than
    to experimentally determine structure or function
    of a biological molecule
  • Rapid expansion of databases of sequences (DNA,
    protein, RNA) far greater than structural or
    functional databases.
  • Develop computational methods to infer
    biologically relevant information from sequence
    alone.
  • E.g., if we know that two protein sequences are
    homologous, then we can infer that the two
    proteins may share the same protein fold, active
    site, and even function.
  • In simple terms, much of today's class involves
    Darwinian evolution discussed at the microscopic
    genetic level.

17
Stochastic Evolutionary Forces Act on Genomes
  • Forces that alter a genetic sequence
  • Random Mutation
  • Natural Selection
  • Genetic Drift
  • Comparison of protein sequences can be used to
    infer evolutionary events that happened possibly
    billions of years ago.
  • If we find a homologous protein sequence to a
    given protein, often from very divergent
    organisms, then a common ancestral protein must
    have existed.
  • Homologous proteins share similar 3-D structure.
  • Homologous proteins can have seemingly very
    different sequences.
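
A small simulation can make the last point concrete. The sketch below lets two copies of one ancestral sequence accumulate random point mutations independently and then measures how much of the ancestor each still shares with the other. It models random mutation only; natural selection and genetic drift are deliberately left out, and the mutation rate, generation count, and function names are assumptions made for this illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rate, rng):
    """Replace each residue with a different one with probability `rate`."""
    out = []
    for residue in seq:
        if rng.random() < rate:
            out.append(rng.choice([aa for aa in AMINO_ACIDS if aa != residue]))
        else:
            out.append(residue)
    return "".join(out)


def fraction_identical(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def diverge(ancestor, generations, rate, seed=0):
    """Evolve two independent lineages from one ancestor; return both."""
    rng = random.Random(seed)
    lineage_a = lineage_b = ancestor
    for _ in range(generations):
        lineage_a = mutate(lineage_a, rate, rng)
        lineage_b = mutate(lineage_b, rate, rng)
    return lineage_a, lineage_b


# Short example peptide used as the "ancestor".
ancestor = "TDQAAFDTNIVTLTRFVMEQ"
a, b = diverge(ancestor, generations=50, rate=0.05)
print(a)
print(b)
print(fraction_identical(a, b))
```

With enough generations the two descendants usually look unrelated at the sequence level, even though they share an ancestor by construction.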

18
Evolutionary Tree
19
Modes of Evolution.
Globin Family Evolution
Orthologous: differ because of speciation
Paralogous: differ because of gene duplication
20
Orthologous Sequences Cytochrome c Family
21
Sequence Alignment
Optimal alignments of human myoglobin and human
hemoglobin (alpha chain)
Algorithms: Needleman-Wunsch, Smith-Waterman
Heuristic Algorithms:
Pairwise Alignment: BLAST (www.ncbi.nlm.nih.gov)
Multiple Sequence Alignment: ClustalW (www.ebi.ac.uk)
PSI-BLAST
22
Structural Alignment
Comparison of dihydrofolate reductases from
Mycobacterium tuberculosis (1DF7) and Escherichia
coli (1DRE) with 41% sequence identity. After
the sequences of these two proteins were aligned,
the alpha-carbons of the backbone were
structurally aligned.
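
A figure such as percent sequence identity is computed from an alignment. The sketch below shows one common way to do it, assuming the two sequences are already aligned and padded with "-" for gaps. Conventions differ on the denominator; this version divides by the full alignment length, which is an assumption of the example rather than the method used for the structures above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: three identical columns out of four.
print(percent_identity("ACGT", "A-GT"))  # 75.0
```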
23
Amino Acid Similarity Matrix
A similarity matrix incorporates information
about the likelihood that one amino acid will be
mutated into another over evolutionary time.
Shown here is the PAM250 matrix. Another common
one is BLOSUM50.
24
Sequence Alignment: Needleman-Wunsch
25
Global Alignment
Needleman-Wunsch Algorithm
26
Global Alignment Example
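
A minimal sketch of Needleman-Wunsch global alignment follows. It uses a simple match/mismatch score and a linear gap penalty instead of a similarity matrix such as PAM250, and the score values (+1, -1, -2) are arbitrary choices for illustration. When several alignments tie, the traceback prefers the diagonal move, then a gap in the second sequence.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

For protein sequences, the match/mismatch term would normally be replaced by a lookup in a similarity matrix.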
27
Local Alignment
Smith-Waterman Algorithm
28
Local Alignment Example
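
A matching sketch of Smith-Waterman local alignment is given below. Compared with global alignment, cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and ends at the first zero. The scoring values (+2, -1, -2) are again arbitrary illustrative choices, not parameters from the slides.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Locally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                          # start a new local alignment
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```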
29
Multiple Sequence Alignment
30
Gene Doping
31
Gene Therapy
32
Gene Doped Mice
HEAVY WORKOUT. This rat, injected with a
muscle-enhancing gene, boosts its strength by
lugging weights up a ladder.
http://www.sciencenews.org/articles/20041030/bob9.asp