Provided by: ElliotLe6

Title: Genomics


1
Genomics
2
The Human Genome Project
  • Mapping and Sequencing the Genomes of Model
    Organisms
  • Data Collection and Distribution
  • Ethical, Legal, and Social Considerations
  • Research Training
  • Technology Development
  • Technology Transfer

3
A Few Genome Resources
  • NCBI Genome Resources
  • UCSC Human Genome Browser
  • Ensembl Human Genome Server

4
Genome Sequencing Progress
  • NCBI Genome Sequence Repository
  • All organisms
  • Eukaryotic genomes
  • Prokaryotic genomes
  • Archaea genomes
  • Viruses

5
Genome Sequencing
From NCBI, 5/2001
6
Human Genome Sequencing 2/11/2001
From NCBI
7
Human Genome Progress 2/11/2001
From NCBI
8
Microbial Genomes
  • Published complete microbial genomes
  • Microbial genomes and chromosomes in progress

9
Genome Informatics
  • Annotation and Analysis
  • Data Handling
  • Metabolic Reconstruction
  • Comparative Genomics
  • Functional Genomics

10
Genome Project Organization
  • Cloning
  • Mapping
  • Sequencing
  • Annotation
  • Analysis

11
Cloning and Mapping
12
Cloning
  • Large
  • YACs
  • 1 Mb
  • BACs
  • 100 - 200 Kb
  • Intermediate
  • Cosmids
  • Lambda clones
  • Small
  • Plasmids, M13

13
Mapping
  • Establishment of Guideposts
  • Aids in Assembly
  • Error Checking
  • Useful in mapping of genetic disorders

14
Genetic Maps
  • Cytogenetic markers
  • Linkage maps
  • Polymorphic loci screened by PCR to determine
    inheritance patterns
  • Produce linkage map with nearby loci

15
Physical Maps
  • Radiation Hybrid/YACs/Cosmids
  • Restriction Sites
  • Sequence Tagged Sites
  • 100 Kb resolution needed
  • 30,000 STSs
  • Expressed Sequence Tags
  • Detection
  • PCR
  • Hybridization
  • FISH
  • Fluorescent in situ Hybridization

16
Human Genome STS Mapping Strategy
  • STS Content Mapping
  • Screen YACs by PCR
  • Radiation Hybrid Mapping
  • Screen RH Cell lines by PCR
  • Genetic Mapping
  • PCR Screening of polymorphic loci
  • Combine above to produce an integrated map

17
Mapping Resolution
  • YAC mapping
  • 1 Mb
  • Radiation hybrid mapping
  • 10 Mb
  • Genetic map
  • 30 Mb

18
GeneMap98
  • Integrated Human Genetic Map
  • Over 30,000 unique gene-based markers
  • 100 Kb resolution
  • http://www.ncbi.nlm.nih.gov/genemap98/

19
Map Integration
20
Human Chromosome 1 Genetic Map
21
Human Chromosome 1 Combination Map
22
Sequencing
23
Sequencing Methods
  • Random Shotgun
  • Ordered Shotgun
  • Directed
  • Primer Walking
  • Direct genomic sequencing

24
Random Shotgun Sequencing
  • Randomly shear or cut DNA into small pieces
  • 2-4 Kb
  • Clone into M13, pUC or some other sequencing
    vector
  • Sequence the clones from both ends
  • Rely on the computer to assemble the sequences
    into one (or as few as possible) contigs

25
Shotgun Sequencing Statistics
  • Lander and Waterman equation
  • Poisson distribution
  • $P_0 = e^{-m}$
  • probability that a base is not sequenced, where
    $m$ = sequence coverage
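
The Poisson relation on this slide can be tried out numerically. The following is a minimal sketch, not part of the original presentation; the function names are illustrative. It evaluates P0 = e^(-m) for a few coverage values and inverts the formula to estimate the coverage needed for a target fraction of sequenced bases.

```python
import math

def fraction_unsequenced(coverage: float) -> float:
    """P0 = e^(-m): chance that a given base is hit by zero reads at coverage m."""
    return math.exp(-coverage)

def coverage_needed(target_fraction: float) -> float:
    """Coverage m at which the sequenced fraction 1 - e^(-m) equals target_fraction."""
    return -math.log(1.0 - target_fraction)

# Print the unsequenced fraction for a few illustrative coverage levels.
for m in (1, 2, 5, 8):
    print(f"{m}X coverage: {fraction_unsequenced(m):.4f} of bases unsequenced")

print(f"coverage for 99% of bases: {coverage_needed(0.99):.2f}X")
```

The model assumes reads land uniformly at random, so real projects with cloning bias tend to need more coverage than this estimate suggests.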

26
H. influenzae Sequencing
  • For 1X random sequence coverage: 1.8 Mb
  • $P_0 = 0.37$ (63% of the bases are sequenced)
  • To get > 99% of the bases sequenced
  • 5X coverage: 8.74 Mb of sequence
  • $P_0 = e^{-5} = 0.0067$
  • This coverage would leave approx. 128 gaps of
    about 100 bp in size
  • From Science 269:496-512, 1995
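
The gap estimate on this slide can be roughly reproduced from the same Lander and Waterman model, in which the expected number of gaps is about N * e^(-m) for N reads, and the mean gap length is about the read length divided by the coverage. The sketch below is an illustration only: the 460 bp average read length is an assumed value chosen for the example, not a figure given on the slide.

```python
import math

def expected_gaps(total_bases: float, read_length: float, coverage: float) -> float:
    """Expected number of gaps, approximately N * e^(-m), with N = number of reads."""
    n_reads = total_bases / read_length
    return n_reads * math.exp(-coverage)

def mean_gap_size(read_length: float, coverage: float) -> float:
    """Mean gap length, approximately genome size / N = read length / coverage."""
    return read_length / coverage

TOTAL_BASES = 8.74e6   # total sequence generated, from the slide
COVERAGE = 5.0         # coverage m, from the slide
READ_LENGTH = 460.0    # ASSUMED average read length in bp (not on the slide)

print(round(expected_gaps(TOTAL_BASES, READ_LENGTH, COVERAGE)), "gaps expected")
print(round(mean_gap_size(READ_LENGTH, COVERAGE)), "bp mean gap size")
```

The estimate ignores the minimum overlap an assembler needs to join two reads, so it is a lower bound on the number of gaps.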

27
Ordered Sequencing
  • Generate a set of large sequence clones in lambda
    phage
  • May be subcloned from YACs or BACs as necessary
  • End sequence the lambda clones and order the
    clones to produce a map of the genome
  • Choose a minimal tiling path of the genome from
    the ordered lambda clones
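
Choosing a minimal tiling path from ordered clones can be viewed as an interval-cover problem. The sketch below is one simple greedy way to do it, assuming each clone has already been placed on the map as a (name, start, end) interval; the clone names and coordinates are hypothetical, and the slides do not say which selection method was actually used.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end) in map coordinates, end exclusive

def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy cover: among clones starting at or before the current position,
    take the one that reaches furthest right, then repeat from its end."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    position = region_start
    i = 0
    while position < region_end:
        best = None
        # Consider every not-yet-examined clone that starts at or before position.
        while i < len(ordered) and ordered[i][1] <= position:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= position:
            raise ValueError(f"no clone covers position {position}")
        path.append(best)
        position = best[2]
    return path

# Hypothetical lambda clones on a 60 kb region (coordinates in kb).
clones = [("L1", 0, 18), ("L2", 5, 20), ("L3", 15, 35),
          ("L4", 19, 40), ("L5", 30, 55), ("L6", 38, 60)]
print([name for name, _, _ in minimal_tiling_path(clones, 0, 60)])
```

A real project would also require a minimum overlap between neighbouring clones so that their sequences can be joined; this sketch accepts clones that merely abut.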

28
Ordered Sequencing...
  • Shear and subclone the lambda inserts that
    comprise the minimal tiling set into sequencing
    vectors
  • Shotgun sequence and assemble each of these
    lambda inserts individually
  • Assemble all sequences into one, contiguous
    genome

29
Directed Sequencing
  • Process used for finishing following the shotgun
    sequencing phase
  • Gap closure
  • Use specific sequencing primers to extend
    appropriate clones into gap regions
  • Use specific sequencing primers to sequence
    directly from genomic DNA

30
Sequence Assembly
31
Assembly of Shotgun Fragments
  • For H. influenzae (TIGR) 1.8 Mb
  • 24,304 Sequence fragments were generated for the
    random assembly phase
  • 11,631,485 bases
  • Generated 140 contigs
  • Assembled using the TIGR Assembler
  • 30 hours of cpu time
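
To give a feel for what assembling shotgun fragments into contigs involves, here is a toy greedy overlap-merge sketch. It is not the TIGR Assembler or phrap algorithm: it assumes error-free reads from a single strand and simply merges the pair with the longest exact suffix-prefix overlap until none remain. The sequence and reads are made up for illustration.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the two sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no overlaps left: remaining sequences are the contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads taken from a made-up 24-base sequence.
genome = "ATGCGTACGTTAGCCGATAGGCTA"
reads = [genome[0:10], genome[6:16], genome[12:24]]
print(greedy_assemble(reads))
```

The all-pairs comparison is quadratic in the number of reads, which is one reason production assemblers index overlaps and use base-quality information rather than following this approach directly.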

32
phred/phrap/consed
  • Widely used programs for sequence
  • base calling (phred)
  • assembly (phrap)
  • editing (consed)
  • Developed at the University of Washington
  • Phil Green (phrap)
  • Brent Ewing (phred)
  • David Gordon (consed)

33
Genome Annotation and Analysis
  • Pattern Matching

34
Sequence Annotation
  • ORF identification
  • Frameshift resolution
  • Genome map construction
  • Functional assignments
  • Metabolic pathway assignment
  • Metabolic pathway Reconstruction
  • Comparative analysis
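
ORF identification, the first step listed above, can be sketched as a six-frame scan for ATG...stop stretches. This is a minimal illustration with made-up sequences, not the method of any particular annotation tool. The stop-codon set is a parameter because Mycoplasma and Ureaplasma read TGA as tryptophan rather than stop, so a scan with the standard code would truncate many of their genes.

```python
from typing import FrozenSet, List, Tuple

STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
MYCOPLASMA_STOPS = frozenset({"TAA", "TAG"})  # TGA encodes Trp in these organisms
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 3,
              stop_codons: FrozenSet[str] = STANDARD_STOPS) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for ATG...stop ORFs in all six frames.
    Coordinates are 0-based, end-exclusive, on the strand that was scanned;
    the length threshold counts codons including the stop."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in stop_codons:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs

print(find_orfs("CCATGAAATTTGGGTAACC"))
print(find_orfs("ATGTGAAAATAA"))                                # TGA treated as stop
print(find_orfs("ATGTGAAAATAA", stop_codons=MYCOPLASMA_STOPS))  # TGA read through
```

ORFs that run off the end of the sequence without a stop codon are not reported, and alternative start codons are ignored, so this is a starting point rather than a gene finder.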

35
(No Transcript)
36
Annotation Tools
  • Semi-automated
  • Manual

37
MAGPIE
  • Multipurpose Automated Genome Project
    Investigation Environment
  • Terry Gaasterland et al.
  • http://genomes.rockefeller.edu/magpie/magpie.html
  • Automated
  • Semi-automated analysis tool for microbial genome
    projects

38
MAGPIE Example
39
Non-Automated Analysis and Prediction
  • The Ureaplasma urealyticum genome database
  • Run analysis tool
  • Parse results
  • Dump results into the database
  • View results
  • Manually annotate

40
Genomic Sequence Database
  • Data Storage
  • Sequence
  • Gene Map
  • Annotation
  • User Interface
  • Web browser
  • Customizable

41
The Ureaplasma urealyticum Genome Project
  • Uu - 751,719 bp
  • http://genome.microbio.uab.edu/uu/uugen.htm
  • Web-based genome analysis tool

42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
Annotation Problems
  • Problems with existing sequence databases
  • Incomplete datasets
  • Skewed datasets
  • Incorrectly annotated records
  • Annotations based on experimental vs. predicted
    data
  • Nomenclature differences
  • Transitive errors in gene function predictions
  • Functional predictions for hypothetical genes

47
Metabolic Pathway Reconstruction
48
Metabolic Pathway Reconstruction
  • Role assignment
  • Extract metabolic pathways from genomes
  • Navigation and analysis
  • Pathway editing

49
Metabolic Assignments
  • Amino acid Biosynthesis
  • Biosynthesis of cofactors, prosthetic groups, and
    carriers
  • Cell envelope
  • Cellular processes
  • Central intermediary metabolism
  • Energy metabolism
  • Fatty acid and phospholipid metabolism
  • Purines, pyrimidines, nucleosides, and
    nucleotides
  • Regulatory functions
  • Replication
  • Transcription
  • Translation
  • Transport and binding proteins
  • Other categories, Unassigned
  • Hypothetical

50
Ureaplasma urealyticum Gene Map
[Figure: gene map of the 751,719 bp genome, drawn in rows of 50,000 bp (1 to 50,000, 50,001 to 100,000, ... 750,001 to 751,719). Legend, by role: Other, Cofactor Biosynthesis, Energy Metabolism, Replication, Cell envelope, Fatty Acid Metabolism, Transcription, RNA, Cellular processes, Hypothetical, Translation, Central Intermediary Metabolism, Nucleotide Metabolism, Transport, tRNA]
51
Role                              Uu Genes   Uu %   Mg Genes   Mg %
Amino acid Biosynthesis                  1    0.2          0    0.0
Biosynthesis of cofactors               10    1.7          7    1.5
Cell envelope                           19    3.1         26    5.4
Cellular processes                      13    2.1         15    3.1
Central intermediary metabolism         15    2.5          7    1.5
Energy metabolism                       23    3.8         30    6.3
Fatty acid - phospholipids               6    1.0          7    1.5
Hypothetical                           293   48.3        169   35.3
Other categories                         1    0.2          3    0.6
Purines, pyrimidines                    18    3.0         20    4.2
Regulatory functions                     4    0.7          4    0.8
Replication                             45    7.4         31    6.5
Transcription                           17    2.8         19    4.0
Translation                            100   16.5         99   20.7
Transport and binding proteins          37    6.1         35    7.3
Unassigned                               4    0.7          7    1.5
Total                                  606  100.0        479  100.0
52
EcoCyc
  • Peter D. Karp, PhD
  • SRI International
  • Menlo Park, CA
  • http://ecocyc.pangeasystems.com/ecocyc/ecocyc.html

53
Pathway Reconstruction
Cell
Annotated Genome
Adapted from P. Karp, Pangea Systems
54
(No Transcript)
55
(No Transcript)
56
(No Transcript)
57
(No Transcript)
58
(No Transcript)
59
(No Transcript)
60
(No Transcript)
61
Glycolysis in Uu?
glucose-1-phosphate
?
phosphoglucomutase
glucose-6-phosphate
phosphoglucose isomerase
fructose-6-phosphate
6-phosphofructokinase
fructose-1,6-bisphosphate
fructose bisphosphate aldolase
glyceraldehyde-3-phosphate
glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12)
glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.9)
3-phospho-D-glyceroyl-phosphate
phosphoglycerate kinase
pyruvate
3-phosphoglycerate
62
Uu Energy Metabolism
  • Glycolysis
  • Missing several components
  • Pentose-phosphate pathway
  • Only 2/8 enzyme complexes present
  • Proton motive force - ATP synthase complex
  • Urease Gene Complex
  • Biologically relevant

63
Comparative Genomics
  • What makes one organism different from all other
    organisms?
  • Molecular Biology
  • Physiology
  • Pathogenesis
  • Epidemiology
  • Genetics

64
Ortholog Comparisons
  • Uu to Mg genes: 324
  • 53% of Uu, 67% of Mg
  • 71 hypothetical
  • Mh to Mg genes: 314
  • 41% of Mh, 57% of Mg
  • 55 hypothetical (2 unique hypothetical)
  • Mh to Uu genes: 330
  • 47% of Uu, 43% of Mh
  • 82 hypothetical (19 unique hypothetical)
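
The slide does not say how orthologs were called. One common approach, shown here purely as an assumption, is the reciprocal best hit: gene A in one genome and gene B in the other are paired when each is the other's top-scoring match. The gene identifiers and scores below are hypothetical.

```python
from typing import Dict, List, Tuple

Hits = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> similarity score

def best_hits(hits: Hits) -> Dict[str, str]:
    """For each query gene, keep the subject gene with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in hits.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)

# Hypothetical similarity scores between genes of two genomes.
uu_vs_mg = {("UU1", "MG1"): 250.0, ("UU1", "MG2"): 90.0,
            ("UU2", "MG2"): 180.0, ("UU3", "MG2"): 60.0}
mg_vs_uu = {("MG1", "UU1"): 245.0, ("MG2", "UU2"): 175.0, ("MG2", "UU3"): 65.0}
print(reciprocal_best_hits(uu_vs_mg, mg_vs_uu))
```

In this toy input UU3 is left unpaired because its best hit, MG2, prefers UU2.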

65
M. genitalium - M. pneumoniae Gene Order
M. genitalium Gene Position
M. pneumoniae Gene Position
66
M. genitalium - U. urealyticum Gene Order
M. genitalium Gene Position
U. urealyticum Gene Position
67
Paralog Analysis
  • Identification of conserved, paralogous groups
  • All against All comparison
  • Genes within one organism
  • Identifies groups of related genes
  • Primary sequence
  • Structure
  • Function
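
Once an all-against-all comparison within one genome has produced a list of similar gene pairs, grouping them into paralogous clusters is a connected-components problem. The sketch below uses single-linkage clustering with a small union-find structure; the gene names and pairs are hypothetical, and the actual clustering criteria behind the slides are not stated.

```python
from typing import Dict, List, Sequence, Tuple

def paralog_clusters(genes: Sequence[str],
                     similar_pairs: Sequence[Tuple[str, str]],
                     min_size: int = 2) -> List[List[str]]:
    """Group genes linked by any chain of similar pairs (single linkage).
    Every gene named in similar_pairs must also appear in genes."""
    parent: Dict[str, str] = {g: g for g in genes}

    def find(g: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        if a != b:  # ignore self-hits from the all-against-all search
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for g in parent:
        groups.setdefault(find(g), []).append(g)
    clusters = [sorted(members) for members in groups.values() if len(members) >= min_size]
    return sorted(clusters, key=lambda c: (-len(c), c))  # largest clusters first

genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5"), ("g6", "g6")]
print(paralog_clusters(genes, pairs))
```

Single linkage can chain weakly related genes into one large group, so a stricter similarity cutoff or a different clustering rule may be preferable on real data.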

68
Uu Paralogous Clusters > 3
  • 4 tRNA synthetase
  • 4 Translation factors
  • 4 Hypothetical membrane lipoprotein
  • 5 ATP synthase alpha, beta chains
  • 6 MBA
  • 7 Hypothetical membrane lipoprotein
  • 8 Hypothetical
  • 10 Iron transporters
  • 13 Transporters

69
Functional Genomics
  • Gene Expression
  • Gene Regulation
  • Genome-wide Mutagenesis

70
Expression Arrays
  • Cell growth in different environments
  • Isolate cDNAs
  • Measure expression using array technology
  • Create database of expression information
  • Display information in an easy-to-use format
  • Show ratio of expression under different
    conditions
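
The ratio of expression between two conditions is often reported on a log2 scale, so that a two-fold increase and a two-fold decrease are symmetric around zero. The following is a minimal sketch with hypothetical gene names and intensities; the pseudocount added to avoid division by zero is an assumption of this example, not something specified in the slides.

```python
import math
from typing import Dict

def log2_ratios(condition_a: Dict[str, float], condition_b: Dict[str, float],
                pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((a + pseudocount) / (b + pseudocount)) for genes measured in both conditions."""
    shared = condition_a.keys() & condition_b.keys()
    return {
        gene: math.log2((condition_a[gene] + pseudocount) / (condition_b[gene] + pseudocount))
        for gene in sorted(shared)
    }

# Hypothetical array intensities under two growth conditions.
condition_a = {"geneA": 799.0, "geneB": 99.0, "geneC": 49.0}
condition_b = {"geneA": 99.0, "geneB": 99.0, "geneC": 199.0}

for gene, ratio in log2_ratios(condition_a, condition_b).items():
    print(f"{gene}: {ratio:+.2f}")
```

Positive values mean higher expression in the first condition, negative values higher expression in the second.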

71
Putting it all together
72
From F. Blattner, U. Wisc.
73
Chromosome Views
  • Ensembl view
  • UC Santa Cruz view
  • NCBI View

74
(No Transcript)
75
(No Transcript)
76
(No Transcript)
77
A Final Caveat
  • The difficulty of identifying genes in anonymous
    vertebrate sequences
  • Claverie JM, Poirot O, Lopez F
  • Comput Chem 1997; 21(4): 203-14

78
The identification of genes in newly determined
vertebrate genomic sequences can range from a
trivial to an impossible task. In a statistical
preamble, we show how "insignificant" are the
individual features on which gene identification
can be rigorously based: promoter signals, splice
sites, open reading frames, etc. The practical
identification of genes is thus ultimately a
tributary of their resemblance to those already
present in sequence databases, or incorporated
into training sets. The inherent conservatism of
the currently popular methods (database
similarity search, GRAIL) will greatly limit our
capacity for making unexpected biological
discoveries from increasingly abundant genomic
data. Beyond a very limited subset of trivial
cases, the automated interpretation (i.e. without
experimental validation) of genomic data, is
still a myth. On the other hand, characterizing
the 60,000 to 100,000 genes thought to be hidden
in the human genome by the mean of individual
experiments is not feasible. Thus, it appears
that our only hope of turning genome data into
genome information must rely on drastic
progresses in the way we identify and analyze
genes in silico.
79
Only One Final Word of Wisdom...
  • ...although the computer is a wonderful helpmate
    for the sequence searcher and comparer,
    biochemists and molecular biologists must guard
    against the blind acceptance of any algorithmic
    output; given the choice, think like a biologist
    and not a statistician.
  • - Russell F. Doolittle, 1990

80
(No Transcript)