CS 177 Introduction to Bioinformatics Fall 2005 - PowerPoint PPT Presentation

Provided by: mad81

1
CS 177 Introduction to Bioinformatics, Fall 2005
  • Instructor: Anna Panchenko (hcnap2003_at_yahoo.com)
  • Instructor: Tom Madej (tom_ncbi_at_yahoo.com)

2
  • Lecture 1: Introduction
    • Instructors
    • Course goals
    • Grading policy
    • Motivating problem
    • Course overview
    • Molecular basis of cellular processes
    • Historical timeline

3
Course Goals
  • The student will be introduced to the fundamental
    problems and methods of bioinformatics.
  • The student will become thoroughly familiar with
    on-line public bioinformatics databases and their
    available software tools.
  • The student will acquire a background knowledge
    of biological systems so as to be able to
    interpret the results of database searches, etc.
  • The student will also acquire a general
    understanding of how important bioinformatics
    algorithms/software tools work, and how the
    databases are organized.

4
Grading Policy
  • Homework: 50% (weekly assignments)
  • Mid-term exam: 20%
  • Final exam: 30%

All examinations, papers, and other graded work
products and assignments are to be completed in
conformance with The George Washington
University Code of Academic Integrity.
5
Important!
  • Please get computer accounts for Tompkins 405 by
    filling out a form in the TA room on the 4th
    floor of Tompkins.
  • Office hours: AP is available before class, TM
    after class. If you want to see AP or TM before
    class, please ask in advance.
  • We will also accept questions by email, although
    we may not be able to reply immediately.

6
Homework
  • Homework assignments are due by the start of the
    next class period (3:30 pm Monday).
  • Assignments turned in up to one week late receive
    a 20% penalty.
  • Homework more than one week late: no credit!
  • Assignments/exams are to be done individually; no
    copying of assignments is allowed!
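The grading weights (homework 50%, mid-term 20%, final 30%) and the late-homework rule can be combined into one calculation. This is a hypothetical helper, not an official course tool, and it rests on stated assumptions: lateness is counted in days, the 20% penalty removes 20% of the earned score, and all homework assignments count equally.

```python
# Hypothetical helper: combines the Grading Policy weights
# (homework 50%, mid-term 20%, final 30%) with the late-homework rule.
WEIGHTS = {"homework": 0.50, "midterm": 0.20, "final": 0.30}


def late_adjusted(score: float, days_late: float) -> float:
    """Return a homework score after the late rule is applied.

    Assumption: the 20% penalty means losing 20% of the earned score.
    """
    if days_late <= 0:        # on time: full credit
        return score
    if days_late <= 7:        # up to one week late: 20% penalty
        return score * 0.8
    return 0.0                # more than one week late: no credit


def course_grade(homework: list[tuple[float, float]],
                 midterm: float, final: float) -> float:
    """Weighted course grade on a 0-100 scale.

    homework: (score out of 100, days late) for each assignment;
    all assignments are assumed to count equally.
    """
    adjusted = [late_adjusted(score, late) for score, late in homework]
    hw_average = sum(adjusted) / len(adjusted) if adjusted else 0.0
    return (WEIGHTS["homework"] * hw_average
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["final"] * final)


# Example: three perfect homeworks (on time, 3 days late, 10 days late),
# 90 on the mid-term and 80 on the final.
print(round(course_grade([(100, 0), (100, 3), (100, 10)], 90, 80), 1))
```

In the example the three homeworks count as 100, 80 and 0, so the homework average is 60 and the grade is 0.5 × 60 + 0.2 × 90 + 0.3 × 80 = 72.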

7
(No Transcript)
8
NCBI Books
  • NCBI home page: http://www.ncbi.nlm.nih.gov
  • Follow the Books link.
  • 45 books available (currently).
  • Many specialty topics.
  • Also useful general references.
  • Searchable!
  • Exercise: search the books with the query
    "phylogenetic tree".

9
What is Bioinformatics?
  • A merger of biology, computer science, and
    information technology.
  • Enables the discovery of new biological insights
    and unifying principles.
  • Born from necessity, because of the massive
    amount of information required to describe
    biological organisms and processes.

10
Severe Acute Respiratory Syndrome (SARS)
  • SARS is a respiratory illness caused by a
    previously unrecognized coronavirus that first
    appeared in Southern China in Nov. 2002.
  • Between Nov. 2002 and July 2003, there were 8,098
    cases worldwide and 774 fatalities (WHO).
  • The global outbreak was over by late July 2003.
    A few new cases have arisen sporadically since
    then in China.
  • There is currently no vaccine or cure available.

11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
Fig. 2 from Rota et al.
15
Phylogenetic analysis of coronavirus proteins
Fig. 2 from Rota et al.
16
(No Transcript)
17
Conserved motifs in coronavirus S proteins.
Fig. 2 from Rota et al.
18
  • Exercise!
  • Look up the SARS genome on the NCBI website:
    www.ncbi.nlm.nih.gov
  • Notice that you get 2 hits on the Genome
    database!

19
The (ever-expanding) Entrez System
20
NCBI Databases
  • Databases are indexed for quick and efficient
    searching.
  • Databases are cross-linked to each other.

21
Exercise!
  • Search the Entrez Protein database with the
    keyword interleukin.
  • Follow the link, then look at the different
    report formats.
  • Also try a search of Protein with interleukin
    AND human[orgn].
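The same Entrez searches can also be issued programmatically through NCBI's E-utilities web service. The sketch below assumes the `esearch.fcgi` endpoint with its `db`, `term` and `retmax` parameters (this interface is not part of the slides); `esearch_url` and `run_search` are illustrative names. It only builds and prints the query URLs, so it runs without network access; the `[orgn]` field tag restricts matches to the named organism.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities "esearch" endpoint (assumed usage; not covered in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL: db is the Entrez database, term the query
    (boolean operators and field tags such as [orgn] are allowed)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


def run_search(url: str) -> str:
    """Fetch the XML result (a list of matching record IDs); needs network access."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# The two searches from the exercise; only the URLs are printed here.
print(esearch_url("protein", "interleukin"))
print(esearch_url("protein", "interleukin AND human[orgn]"))
```

`urlencode` takes care of escaping the spaces and square brackets in the query. If you actually call `run_search`, NCBI's usage guidelines for E-utilities (request rate limits) apply.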

22
Course Overview
  • Lecture 1: Introduction
    • Instructors
    • Grading policy
    • Motivating problem
    • Course overview
    • Molecular basis of cellular processes
    • Historical timeline

23
  • Lecture 2: General principles of DNA/RNA
    structure and stability
    • Physico-chemical properties of nucleic acids
    • RNA folding and structure prediction
    • Gene identification
    • Genome analysis
  • Lecture 3: General principles of protein
    structure and stability
    • Physico-chemical properties of proteins
    • Prediction of protein secondary structure
    • Protein domains and prediction of domain
      boundaries
    • Protein structure-function relationships

24
  • Lecture 4: Sequence alignment algorithms
    • The alignment problem
    • Pairwise sequence alignment algorithms
    • Multiple sequence alignment algorithms
    • Sequence profiles and profile alignment methods
    • Alignment statistics

25
  • Lecture 5: Computational aspects of protein
    structure, part I
    • Protein folding problem
    • Problem of protein structure prediction
    • Homology modeling
    • Protein design
    • Prediction of functionally important sites
  • Lecture 6: Computational aspects of protein
    structure, part II
    • Structure-structure alignment algorithms
    • Significance of structure-structure similarity
    • Protein structure classification

26
  • Lecture 7: Bioinformatics databases
    • Sequence and sequence alignment formats, data
      exchange
    • Public sequence databases
    • Sequence retrieval and examples
    • Public protein structure databases
    • Lab exercises
  • Lecture 8: Bioinformatics database search tools
    • Sequence database search tools
    • Structure database search tools
    • Assessment of results, ROC analysis
    • Lab exercises

27
  • Lecture 9: Phylogenetic analysis, part I
    • Molecular basis of evolution
    • Taxonomy and phylogenetics
    • Phylogenetic trees and phylogenetic inference
    • Software tools for phylogenetic analysis
  • Lecture 10: Phylogenetic analysis, part II
    • Accuracies and statistical tests of phylogenetic
      trees
    • Genome comparisons
    • Protein structure evolution

28
  • Lecture 11: Experimental techniques for
    macromolecular analysis
    • Sequencing, PCR
    • Protein crystallography
    • Mass spectrometry
    • Microarrays
    • RNA interference

29
  • Lecture 12: Systems biology
    • Genomic circuits
    • Modeling complex integrated circuits
    • Protein-protein interaction
    • Metabolic networks
  • Lecture 13: Review

30
Molecular Biology Background
  • Cells: general structure/organization
  • Molecules that make up cells
  • Cellular processes: what makes the cell alive

31
Two Cell Organizations
  • Prokaryotes: lack a nucleus, have a simpler
    internal structure, and are generally much smaller
  • Eukaryotes: have a nucleus (containing DNA) and
    various organelles

32
(No Transcript)
33
(No Transcript)
34
Selected organelles
  • Nucleus: contains the chromosomes/DNA
  • Mitochondria: generate energy for the cell;
    contain mitochondrial DNA
  • Ribosomes: where translation from mRNA to
    protein takes place (protein synthesis machinery)
  • Lysosomes: where protein degradation takes place

35
Cells can become specialized
36
Three domains of life
  • Prokarya
    • Bacteria
    • Archaea
  • Eukarya
    • Eukaryotes

37
Universal phylogenetic tree.
Fig. 1 from N.R. Pace, Science 276 (1997)
734-740.
38
Molecules in the cell
  • Proteins: catalyze reactions, form structures,
    control membrane permeability, mediate cell
    signaling, recognize/bind other molecules, and
    control gene function
  • Nucleic acids: DNA and RNA encode information
    about proteins
  • Lipids: make up biomembranes
  • Carbohydrates: energy sources, energy storage,
    constituents of nucleic acids and surface
    membranes
  • Other small molecules: e.g., ATP, water, ions

39
  • Exercise!
  • Retrieve a protein structure from the SARS
    coronavirus from the NCBI website; you can use
    www.ncbi.nlm.nih.gov/Structure/
  • Look at the structure for the SARS protease
    using Cn3D.

40
The Central Dogma of Molecular Biology
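The flow of information described by the central dogma (DNA is transcribed into mRNA, which ribosomes then translate into protein using the genetic code) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the input DNA is the coding (sense) strand written 5' to 3', reading starts at the first base, and there is no start-codon search, splicing, or handling of ambiguous bases. `transcribe` and `translate` are illustrative names; the codon table is the standard genetic code (NCBI translation table 1).

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order U, C, A, G at each position; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA. Assumes the input is the coding (sense) strand, 5' to 3',
    so the mRNA has the same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein (one-letter codes), reading codons from the first
    base until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":   # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
print(mrna, translate(mrna))
```

For the example, the mRNA is AUGGCCUGGUAA and the codons AUG, GCC, UGG are read as Met, Ala, Trp before the UAA stop codon, giving the peptide MAW.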
41
(No Transcript)
42
(No Transcript)
43
Timeline
  • 1859 Darwin publishes On the Origin of Species
  • 1865 Mendel's experiments with peas show that
    hereditary traits are passed on to offspring in
    discrete units.
  • 1869 Miescher isolates DNA.
  • 1895 Röntgen discovers X-rays.
  • 1902 Sutton proposes the chromosome theory of
    heredity.

44
Timeline (cont.)
  • 1911 Morgan and co-workers establish the
    chromosome theory of heredity, working with fruit
    flies.
  • 1943 Astbury observes the first X-ray pattern of
    DNA.
  • 1944 Avery, MacLeod, and McCarty show that DNA
    transmits heritable traits (not proteins!).
  • 1951 Pauling and Corey predict the structure of
    the alpha-helix and beta-sheet.

45
Timeline (cont.)
  • 1953 Watson and Crick propose the double helix
    model for DNA based on X-ray data from Franklin
    and Wilkins.
  • 1955 Sanger announces the sequence of the first
    protein to be analyzed, bovine insulin.
  • 1955 Kornberg and co-workers isolate the enzyme
    DNA polymerase (used for copying DNA, e.g. in
    PCR).
  • 1958 The first integrated circuit is constructed
    by Kilby at Texas Instruments.

46
Timeline (cont.)
  • 1960 Perutz and Kendrew obtain the first X-ray
    structures of proteins (hemoglobin and
    myoglobin).
  • 1961 Brenner, Jacob, and Meselson discover that
    mRNA transmits the information from the DNA in
    the nucleus to the cytoplasm.
  • 1965 Dayhoff starts the Atlas of Protein Sequence
    and Structure.
  • 1966 Nirenberg, Khorana, Ochoa and colleagues
    crack the genetic code!
  • 1970 The Needleman-Wunsch algorithm for sequence
    comparison is published.

47
Timeline (cont.)
  • 1972 Dayhoff develops the Protein Sequence
    Database (PSD).
  • 1972 Berg and colleagues create the first
    recombinant DNA molecule.
  • 1973 Cohen invents DNA cloning.
  • 1975 Sanger and others (Maxam, Gilbert) invent
    rapid DNA sequencing methods.

48
Timeline (cont.)
  • 1977 The first complete genome sequence
    (bacteriophage ΦX174) is published. The
    genome consists of 5,386 bases coding 9 proteins.
  • 1981 The Smith-Waterman algorithm for sequence
    alignment is published.
  • 1981 IBM introduces its Personal Computer to the
    market.
  • 1982 The GenBank sequence database is created at
    Los Alamos National Laboratory.

49
Timeline (cont.)
  • 1983 Mullis and co-workers describe the PCR
    reaction.
  • 1985 The FASTP algorithm is published by Lipman
    and Pearson.
  • 1986 The SWISS-PROT database is created.
  • 1986 The Human Genome Initiative is announced by
    DOE.
  • 1988 The National Center for Biotechnology
    Information (NCBI) is established at the National
    Library of Medicine in Bethesda.

50
Timeline (cont.)
  • 1992 Human Genome Sciences, in Gaithersburg, MD,
    is founded by Haseltine.
  • 1992 The Institute for Genomic Research (TIGR) is
    established by Venter in Rockville, MD.
  • 1995 The Haemophilus influenzae genome is
    sequenced (1.8 Mb).
  • 1996 Affymetrix produces the first commercial DNA
    chips.

51
Timeline (cont.)
  • 1988 The FASTA algorithm for sequence comparison
    is published by Pearson and Lipman.
  • 1990 Official launch of the Human Genome Project.
  • 1990 The BLAST program by Altschul et al. is
    published.
  • 1991 The CERN research institute in Geneva
    announces the creation of the protocols which
    make up the World Wide Web.

52
Timeline (cont.)
  • 1996 The yeast genome is sequenced: the first
    complete eukaryotic genome.
  • 1996 Human DNA sequencing begins.
  • 1997 The E. coli genome is sequenced (4.6 Mb,
    approx. 4k genes).
  • 1998 The C. elegans genome is sequenced (97 Mb,
    approx. 20k genes): the first genome of a
    multicellular organism.

53
Timeline (cont.)
  • 1998 Venter founds Celera in Rockville, MD.
  • 1998 The Swiss Institute of Bioinformatics is
    established in Geneva.
  • 1999 The HGP completes the first human chromosome
    (no. 22).
  • 2000 The Drosophila genome is completed.

54
Timeline (cont.)
  • 2000 Human chromosome no. 21 is completed.
  • 2001 A draft of the entire human genome (3,000
    Mb) is published.
  • 2003 The Human Genome is completed! Approx.
    30,000 genes (estimated).

55
(No Transcript)