Eric C. Rouchka, D.Sc.

CECS 694-02 Introduction to Bioinformatics



1
  • Eric C. Rouchka, D.Sc.
  • Vogt Building, Room 205
  • (502) 852-0467
  • eric.rouchka@uofl.edu
  • http://kbrin.a-bldg.louisville.edu/CECS694/

2
Course Overview
  • Syllabus
  • Structure of Class
  • Expectations

3
Contact Information
  • INSTRUCTOR
  • Dr. Eric Rouchka
  • Phone: 852-0467 or 852-3835
  • Email: eric.rouchka@louisville.edu
  • OFFICE HOURS
  • Vogt Building
  • Room 205
  • T, Th 1:30-3:00 pm
  • or by appointment
  • http://kbrin.a-bldg.louisville.edu/CECS694/

4
Required Texts
  • Bioinformatics: Sequence and Genome Analysis.
    David Mount. 2001. ISBN 0-87969-608-7.
  • Biological Sequence Analysis: Probabilistic
    models of proteins and nucleic acids. R. Durbin,
    S. Eddy, A. Krogh and G. Mitchison. 1998. ISBN
    0-521-62971-3.
  • In addition, a number of journal articles will be
    handed out in class.

5
Required Texts

Image Source: http://www.amazon.com/
6
Other Bioinformatics Books
Image Source: http://www.amazon.com/
7
Other Reference Books
Image Source: http://www.amazon.com/
8
Tentative Schedule of Topics
  • Overview of molecular biology 
  • Pairwise sequence alignment
  • Multiple sequence alignment
  • Sequence Databases
  • Database searching  
  • Construction of phylogenetic trees
  • RNA secondary structure prediction
  • Microarray image analysis
  • Sequence assembly techniques
  • Gene Prediction
  • Protein Folding Prediction

9
Course Assignments
  • 4-5 written homework assignments, 3-4 programming
    assignments, a midterm test, a final project,
    and bioinformatics seminars.
  • Homework assignments must be turned in at the
    beginning of class on the date they are due.
    Late homework assignments will not be accepted,
    since the solutions will be posted to the course
    website.
  • Programming assignments are due at the beginning
    of class on the due date. The programs
    may be written in the language of your choice.
    Late programming assignments will be accepted,
    with a 10% per day deduction for a maximum of two
    days.
  • Reading assignments from the two selected texts
    and journal articles will be assigned.

10
Grading
  • Programming Projects (3-4): 25% of final grade
  • Homework (4-5): 15% of final grade
  • Midterm Test: 25% of final grade
  • Final Project: 25% of final grade
  • One page seminar reports (3): 10% of final grade
  • Final grades will be given using a plus/minus
    scale. The cutoffs for grades will be roughly as
    follows:
  • 90-100: A
  • 80-89: B
  • 70-79: C
  • 60-69: D
  • 0-59: F

11
Class Structure
  • Introduction of a Topic
  • Description of algorithms
  • Available tools
  • Make sure to ask questions!

12
What is Bioinformatics / Computational Biology?
  • Bioinformatics: collection and storage of
    biological information
  • Computational biology: development of algorithms
    and statistical models to analyze biological data
  • The terms Bioinformatics and Computational Biology
    will be used interchangeably

13
What is Bioinformatics?
Source: http://ccb.wustl.edu/
14
Why should I care?
  • SmartMoney ranks Bioinformatics as #1 among next
    Hot Jobs
  • Business Week: 50 Masters of Innovation
  • Jobs available, exciting research potential
  • Important information waiting to be decoded!

http://smartmoney.com/consumer/index.cfm?story=working-june02
15
Why is bioinformatics hot?
  • Supply/demand: few people adequately trained in
    both biology and computer science
  • Genome sequencing, microarrays, etc. lead to large
    amounts of data to be analyzed
  • Leads to important discoveries
  • Saves time and money

16
What skills are needed?
  • Well-grounded in one of the following areas:
  • Computer science
  • Molecular biology
  • Statistics
  • Working knowledge and appreciation of the others!

17
Where Can I Learn More?
  • ISCB: http://www.iscb.org/
  • NCBI: http://ncbi.nlm.nih.gov/
  • http://www.bioinformatics.org/
  • Journals
  • Conferences (ISMB, RECOMB, PSB)

18
Overview of Molecular Biology
  • Cells
  • Chromosomes
  • DNA
  • RNA
  • Amino Acids
  • Proteins
  • Genome/Transcriptome/Proteome

19
Cells
  • Complex system enclosed in a membrane
  • Organisms are unicellular (bacteria, baker's
    yeast) or multicellular
  • Humans:
  • 60 trillion cells
  • 320 cell types

Example Animal Cell: www.ebi.ac.uk/microarray/biology_intro.htm
20
Organisms
  • Classified into two types
  • Eukaryotes: contain a membrane-bound nucleus and
    organelles (plants, animals, fungi, ...)
  • Prokaryotes: lack a true membrane-bound nucleus
    and organelles (single-celled, includes bacteria)
  • Not all single celled organisms are prokaryotes!

21
Chromosomes
  • In eukaryotes, nucleus contains one or several
    double stranded DNA molecules organized as
    chromosomes
  • Humans:
  • 22 pairs of autosomes
  • 1 pair of sex chromosomes

Human Karyotype: http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html
22
Image source: www.biotec.or.th/Genome/whatGenome.html
23
What is DNA?
  • DNA: Deoxyribonucleic Acid
  • Single stranded molecule (oligomer,
    polynucleotide): chain of nucleotides
  • 4 different nucleotides, named for their bases:
  • Adenine (A)
  • Cytosine (C)
  • Guanine (G)
  • Thymine (T)

24
Nucleotide Bases
  • Purines (A and G)
  • Pyrimidines (C and T)
  • Difference is in base structure

Image Source: www.ebi.ac.uk/microarray/biology_intro.htm
25
DNA
  • Can be thought of as an alphabet with 4
    characters
  • 4 letter alphabet with sufficiently long words
    contains information to create complex organisms
  • Not unlike a computer, which also works with a
    small (binary) alphabet
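
The alphabet analogy can be made concrete with a small sketch. It assumes an arbitrary 2-bit code for each base (the mapping below is an illustration, not a standard) and shows that a DNA word of length n has 4^n possibilities. The function names are hypothetical.

```python
# Hypothetical 2-bit code per base; the assignment of bits to bases is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(seq):
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value, length):
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def count_words(length):
    """Number of distinct DNA words of a given length: 4 ** length."""
    return 4 ** length


word = "GATTACA"
packed = encode(word)
print(decode(packed, len(word)))  # recovers the original word
print(count_words(10))            # number of possible 10-base words
```

The length has to be stored alongside the packed integer, because leading A bases encode as zero bits.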

26
DNA polynucleotides (oligomers)
  • Different nucleotides are strung together to form
    polynucleotides
  • Ends of the polynucleotide are different
  • A directionality is present
  • Convention is to label the coding strand from 5'
    to 3'

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html
27
Single Strand Polynucleotide
  • Example polynucleotide:
  • 5' G-T-A-A-A-G-T-C-C-C-G-T-T-A-G-C 3'

28
Double Stranded DNA
  • DNA can be single-stranded or double-stranded
  • Double stranded DNA: second strand is the
    reverse complement strand
  • Reverse complement runs in opposite direction and
    bases are complementary
  • Complementary bases:
  • A pairs with T
  • C pairs with G

29
Double Stranded Sequence
  • Example double stranded polynucleotide:
  • 5' G-T-A-A-A-G-T-C-C-C-G-T-T-A-G-C 3'
  • 3' C-A-T-T-T-C-A-G-G-G-C-A-A-T-C-G 5'

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html
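
The second strand of the example can be derived from the first. The following is a minimal sketch, assuming plain A/C/G/T input with no ambiguity codes; the function name is illustrative.

```python
# Translation table that swaps each base for its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


top = "GTAAAGTCCCGTTAGC"            # top strand from the slide, 5' to 3'
bottom = reverse_complement(top)    # bottom strand, also written 5' to 3'
print(bottom)                       # GCTAACGGGACTTTAC
print(bottom[::-1])                 # CATTTCAGGGCAATCG, the 3' to 5' form shown on the slide
```

Applying the function twice returns the original strand, which is a quick sanity check.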
30
Double Stranded DNA
Source: unknown
31
Double Helix
  • Two complementary DNA strands form a stable DNA
    double helix
  • Spring 2003 marked the 50th anniversary of its
    discovery

Image source: www.ebi.ac.uk/microarray/biology_intro.htm
32
RNA
  • Ribonucleic Acid
  • Similar to DNA
  • Thymine (T) is replaced by uracil (U)
  • RNA can be:
  • Single stranded
  • Double stranded
  • Hybridized with DNA

33
RNA
  • RNA is generally single stranded
  • Forms secondary or tertiary structures
  • RNA folding will be discussed later
  • Important in a variety of ways, including protein
    synthesis

34
RNA secondary structure
  • E. coli RNase P RNA secondary structure

Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/lecture05/lecture05.htm
35
mRNA
  • Messenger RNA
  • Linear molecule encoding genetic information
    copied from DNA molecules
  • Transcription: process in which DNA is copied
    into an RNA molecule

36
mRNA processing
  • Eukaryotic genes can be pieced together
  • Exons: coding regions
  • Introns: non-coding regions
  • mRNA processing removes introns, splices exons
    together
  • Processed mRNA can be translated into a protein
    sequence
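
The processing steps above can be sketched on a toy gene. This assumes the exon boundaries are already known and given as coordinates; the gene sequence, the coordinates, and the function names are all made up for illustration, and real splicing depends on splice-site signals that are not modeled here.

```python
def transcribe(coding_strand):
    """Return the RNA copy of a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna, exons):
    """Join exon segments given as 0-based, half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical gene: two exons (upper case) around one intron (lower case).
gene = "ATGGCC" + "gtaagtttcag" + "TTTTAA"
exons = [(0, 6), (17, 23)]   # assumed to be known in advance

pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, exons)
print(pre_mrna)      # AUGGCCGUAAGUUUCAGUUUUAA
print(mature_mrna)   # AUGGCCUUUUAA
```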

37
mRNA Processing
Image source: http://departments.oxy.edu/biology/Stillman/bi221/111300/processing_of_hnrnas.htm
38
ESTs
  • Expressed Sequence Tags
  • Basically sequence of processed mRNA

39
tRNA
  • Transfer RNA
  • Well-defined three-dimensional structure
  • Critical for creation of proteins

40
tRNA structure
Source: http://www.tulane.edu/biochem/nolan/lectures/rna/frames/trnabtx2.htm
41
tRNA
  • Amino acid attached to each tRNA
  • Determined by 3 base anticodon sequence
    (complementary to mRNA)
  • Translation: process in which the nucleotide
    sequence of the processed mRNA is used
    to join amino acids together into a protein with
    the help of ribosomes and tRNA
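
Translation can be sketched as a table lookup over successive codons. The sketch below uses the standard genetic code written as a 64-letter string in U, C, A, G order, starts at the first AUG, and stops at the first stop codon. It ignores ribosomes, tRNA, and everything else the cell actually does; the function name is illustrative.

```python
from itertools import product

# Standard genetic code: codons in UCAG order, third base varying fastest.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""            # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":    # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUUUUAA"))   # MAF
```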

42
Genetic Code
  • 4 possible bases (A, C, G, U)
  • 3 bases in the codon
  • 4 x 4 x 4 = 64 possible codon sequences
  • Start codon: AUG
  • Stop codons: UAA, UAG, UGA
  • 61 codons code for amino acids (including AUG)
  • 20 amino acids: redundancy in genetic code
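
The counts on this slide can be checked by enumerating every codon. This is only a counting sketch; the variable names are illustrative.

```python
from itertools import product

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Every ordered triple of bases is one codon.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
sense_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons))              # 64
print(len(sense_codons))        # 61
print(len(sense_codons) / 20)   # 3.05 codons per amino acid on average
```

Since 61 codons map onto 20 amino acids, most amino acids must be encoded by more than one codon.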

43
20 Amino Acids
  • Glycine (G, GLY)
  • Alanine (A, ALA)
  • Valine (V, VAL)
  • Leucine (L, LEU)
  • Isoleucine (I, ILE)
  • Phenylalanine (F, PHE)
  • Proline (P, PRO)
  • Serine (S, SER)
  • Threonine (T, THR)
  • Cysteine (C, CYS)
  • Methionine (M, MET)
  • Tryptophan (W, TRP)
  • Tyrosine (Y, TYR)
  • Asparagine (N, ASN)
  • Glutamine (Q, GLN)
  • Aspartic acid (D, ASP)
  • Glutamic Acid (E, GLU)
  • Lysine (K, LYS)
  • Arginine (R, ARG)
  • Histidine (H, HIS)

44
Amino Acids
  • building blocks for proteins (20 different)
  • vary by side chain groups
  • Hydrophilic amino acids are water soluble
  • Hydrophobic amino acids are not
  • Linked via a single chemical bond (peptide bond)
  • Peptide: short linear chain of amino acids (< 30)
  • Polypeptide: long chain of amino acids (which
    can be upwards of 4000 residues long)

45
Proteins
  • Polypeptides having a three dimensional
    structure.
  • Primary: sequence of amino acids constituting the
    polypeptide chain
  • Secondary: local organization into secondary
    structures such as α helices and β sheets
  • Tertiary: three dimensional arrangement of the
    amino acids as they react to one another due to
    the polarity and resulting interactions between
    their side chains
  • Quaternary: number and relative positions of the
    protein subunits

46
Protein Structure
Image source: www.ebi.ac.uk/microarray/biology_intro.html
47
Central Dogma
  • DNA
  • ↓
  • RNA
  • ↓
  • PROTEIN

Image source: unknown
48
Central Dogma
49
What is a Gene?
  • the physical and functional unit of heredity that
    carries information from one generation to the
    next
  • DNA sequence necessary for the synthesis of a
    functional protein or RNA molecule

50
Genome
  • chromosomal DNA of an organism
  • number of chromosomes and genome size varies
    quite significantly from one organism to another
  • Genome size and number of genes do not
    necessarily determine organism complexity

51
Genome Comparison
52
Transcriptome
  • complete collection of all possible mRNAs
    (including splice variants) of an organism.
  • regions of an organism's genome that get
    transcribed into messenger RNA.
  • transcriptome can be extended to include all
    transcribed elements, including non-coding RNAs
    used for structural and regulatory purposes.

53
Proteome
  • the complete collection of proteins that can be
    produced by an organism.
  • can be studied either as static (sum of all
    proteins possible) or dynamic (all proteins found
    at a specific time point) entity

54
Brief History of Sequencing
  • Discovery of Complementary Bases
  • Erwin Chargaff, 1950
  • Discovery of DNA Double Helix
  • 1953: only 50 years ago
  • James Watson
  • Francis Crick
  • Rosalind Franklin

Image: www.simr.org.uk/pages/biotechnology/biotechnology_2.html
55
History Of Genetic Code
  • Genetic Code Completely uncovered (1965)
  • Marshall Nirenberg

56
Genetic Code
  • 4 possible bases (A, C, G, U)
  • 4 x 4 x 4 = 64 possible codon sequences
  • Start codon: AUG
  • Stop codons: UAA, UAG, UGA
  • 61 codons code for amino acids (including AUG)
  • 20 amino acids: redundancy in genetic code

57
Brief History of Sequencing
  • First Protein Sequence
  • 1955: Bovine Insulin (Fred Sanger)
  • First Nucleic Acid Sequence
  • 1965: yeast alanine tRNA (77 bases)
  • Development of DNA sequencing
  • Maxam-Gilbert and Sanger Methods (1977)

58
Sanger Sequencing Method
  • (Quicktime Movie)
  • SOURCE Molecular Cell Biology

59
Improving Sanger's Method
  • Dideoxynucleotides fluorescently labeled (1986)
  • Reaction cut by ¼
  • Sequencing Automated by machine (1986)
  • Laser detects fluorescence

60
Image Source: plantbio.berkeley.edu/bruns/tour3.html
62
Genetic Mapping
  • Sex-linked genes studied since early 1900s
  • Gene mapping takes off in late 1970s
  • David Botstein (RFLPs, 1978)
  • 1979: 579 Genes Mapped
  • 2003: 30,000 Genes Mapped
  • Mapping of Huntington's Disease (First Disease
    Gene)
  • Triplet Repeat
  • 1983
  • Nancy Wexler

63
Mapping of Markers
  • Sequence Tagged Sites (STS)
  • Sequences occurring only once in the human genome
  • Help to map locations
  • 52,000 STS in Humans
  • 1 every 62,000 bases

64
Cloning Techniques
  • Plasmid Cloning Introduced (1973)
  • Region of Interest duplicated by inclusion
  • Yeast Artificial Chromosomes (YACs) described (1987)
  • BACs introduced (1992)
  • 30,000 to 100,000 bases can be cloned

65
Hierarchical (Clone-based) Approach
  • Know location of 30,000-100,000 bp region
  • Break into 500-700 bp fragments
  • Sequence Fragments
  • Assemble based on similarity
  • 8-10x coverage
  • Current Price: $0.09 / base
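
The fragment sizes and coverage above translate into a rough read count. The sketch below assumes every read has the same length and that coverage is simply total bases sequenced divided by region length; the function names are illustrative.

```python
import math


def reads_needed(region_bp, read_bp, coverage):
    """Smallest number of reads whose total length reaches the target coverage."""
    return math.ceil(coverage * region_bp / read_bp)


def coverage_achieved(n_reads, read_bp, region_bp):
    """Average number of times each base of the region is sequenced."""
    return n_reads * read_bp / region_bp


# A 100,000 bp clone sequenced with 600 bp reads.
print(reads_needed(100_000, 600, 8))    # 1334 reads for 8x
print(reads_needed(100_000, 600, 10))   # 1667 reads for 10x
```

So a single 100,000 bp clone needs on the order of 1,300 to 1,700 reads before assembly starts.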

66
Hierarchical (clone-based) approach
  • generate overlapping set of clones
  • select a minimum tiling path
  • shotgun sequence each clone

67
Hierarchical (clone-based) approach
  • MINUS
  • map generation requires resources, time and money
  • Some regions not cloned
  • PLUS
  • easier to assemble smaller pieces
  • less chance for assembly error

68
Shotgun Sequencing Approach
  • Developed 1991 TIGR
  • Craig Venter, Hamilton Smith
  • Break genome into millions of pieces
  • Sequence each piece
  • Reassemble into full genomes
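
The break, sequence, and reassemble idea can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix/prefix overlap. This is only a sketch under strong assumptions (error-free reads, a single strand, no repeats) and is not the algorithm of any assembler named in these slides; all names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # remaining reads do not overlap; leave them as separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["CCCGTTAGC", "GTAAAGTC", "AAGTCCCG"]
print(greedy_assemble(reads))   # ['GTAAAGTCCCGTTAGC']
```

With repeated sequence, the largest overlap is not always the correct one, so a greedy merge like this can join the wrong reads.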

69
Whole Genome Shotgun Approach
  • reads generated directly from a whole-genome
    library
  • assemble the genome all at once

70
Whole Genome Shotgun Approach
  • MINUS
  • more prone to assembly error
  • computationally intensive
  • cannot effectively handle repeats
  • PLUS
  • Less overhead time up front

71
Base calling and Assembly Software
  • PHRED and PHRAP Developed (1998)
  • PHRED: Base calling software
  • PHRAP: Assists in assembly of sequenced data

72
Available Assemblers
  • SEQAID (Peltola et al., 1984)
  • CAP (Huang, 1992)
  • PHRAP (Green, 1994)
  • TIGR Assembler (Sutton et al., 1995)
  • AMASS (Kim et al., 1999)
  • CAP3 (Huang and Madan, 1999)
  • Celera Assembler (Myers et al., 2000)
  • EULER (Pevzner et al., 2001)
  • ARACHNE (Batzoglou et al., 2002)

73
History of Genome Projects
  • First Genome Sequence
  • ΦX174 Phage: 5,386 bp, 9 proteins (1980)
  • Haemophilus influenzae Sequenced
  • First non-viral genome (1.8 Mb) (1995)

74
History of Genome Projects
  • Saccharomyces cerevisiae sequenced
  • First eukaryotic genome (12.1 Mb) (1996)
  • Caenorhabditis elegans sequence released
  • First animal genome (about 100 Mb) (1998)

75
History of Genome Projects
  • Arabidopsis thaliana sequence released
  • First publicly available plant genome (1999)
  • Rough Draft of Human Genome Reported (2001)
  • Finished 2003

76
Human Genome Project
  • Began in 1990 (US DOE 15 years)
  • Identify all genes in human DNA
  • Determine sequence of human genome
  • Develop faster sequencing technologies
  • Develop tools for data analysis
  • Address ethical, legal, and social issues (ELSI)

77
Microbial Genomes
  • 122 Complete Genomes in CMR
  • http://www.tigr.org/tigr-scripts/CMR2/CMR_Content.spl

78
Genomes
  • Fruit Fly
  • Mouse
  • Rat
  • Rice
  • Zebra fish
  • Puffer fish
  • Chicken
  • Dog
  • Frog

79
Growth of GenBank
  • 1982: 600,000 Bases
  • 2002: 28.5 Billion Bases

Image source: www.ncbi.nlm.nih.gov
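
The two data points above imply a doubling time, assuming growth was steadily exponential between them (a simplification; the function name is illustrative).

```python
import math


def doubling_time(start_size, end_size, years):
    """Years per doubling, assuming constant exponential growth."""
    doublings = math.log2(end_size / start_size)
    return years / doublings


# 600,000 bases in 1982 and 28.5 billion bases in 2002.
years_per_doubling = doubling_time(600_000, 28.5e9, 2002 - 1982)
print(round(years_per_doubling, 2))        # about 1.29 years
print(round(years_per_doubling * 12, 1))   # about 15.4 months
```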
80
Other Notables
  • Dayhoff ATLAS Database of Proteins (1960s)
  • Sequence Comparison Algorithms
  • 1970, Needleman-Wunsch (global alignment)
  • Protein Databank
  • Brookhaven PDB (1973)

81
Other Notables
  • NMR for protein structure identification (1980)
  • IntelliGenetics Founded
  • DNA and Protein sequence analysis (1980)

82
Other Notables
  • Smith-Waterman algorithm
  • Local sequence alignment (1981)
  • GenBank Database created (1982)
  • Genetics Computer Group Founded
  • GCG suite (1982)
  • PCR First Described (1985)

83
Other Notables
  • FASTP Algorithm
  • Protein database searching (1985)
  • SWISS-PROT
  • Protein Database (1986)

84
Other Notables
  • PERL Programming Language
  • Allows for sequence manipulation (1987)
  • NCBI Established (1988)
  • Human Genome Initiative (1988)

85
Other Notables
  • FASTA Program released (1988)
  • DNA and Protein sequence database searches
  • BLAST Program released (1990)
  • Allows for quick database searches
  • Informax Founded (1990)
  • Human Genome Project Begins (1990)

86
Other Notables
  • Creation and Use of ESTs Described (1991)
  • Incyte Pharmaceuticals Founded (1991)
  • TIGR Established (1992)
  • Shotgun sequencing methods

87
Other Notables
  • Affymetrix founded (1993)
  • PRINTS protein motif database (1994)

88
Other Notables
  • First Commercial Microarray chips produced (1996)
  • Dolly Cloned (1997)
  • Capillary Sequencing machines introduced (1997)

89
Other Notables
  • Celera Genomics Formed (1998)

90
More Detailed Histories
  • http://www.netsci.org/Science/Bioinform/feature06.html
  • http://www.dhgp.de/intro/history/history.html

91
Microarrays
  • Microarray
  • New Technology (less than 10 years old)
  • Allows study of thousands of genes at same time
  • Study genes under different conditions
  • Glass slide of DNA molecules
  • Molecule: string of bases
  • uniquely identifies gene or unit to be studied

92
Microarray Image Analysis
  • Microarrays detect gene interactions: 4 colors
  • Green: high control
  • Red: high sample
  • Yellow: equal
  • Black: none
  • Problem is to quantify image signals
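
One simple way to quantify a spot is the log ratio of its red (sample) and green (control) intensities. The sketch below maps that ratio back to the four colors; the background level, the two-fold cutoff, and the function name are assumptions chosen for illustration, not values from the course.

```python
import math


def classify_spot(red, green, background=100.0, fold=2.0):
    """Label a spot from its red (sample) and green (control) intensities."""
    if red <= background and green <= background:
        return "black"    # no signal in either channel
    # A pseudocount of 1 avoids dividing by zero or taking log of zero.
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    threshold = math.log2(fold)
    if log_ratio >= threshold:
        return "red"      # higher in the sample
    if log_ratio <= -threshold:
        return "green"    # higher in the control
    return "yellow"       # roughly equal in both


print(classify_spot(4000, 500))    # red
print(classify_spot(500, 4000))    # green
print(classify_spot(2000, 2100))   # yellow
print(classify_spot(50, 40))       # black
```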