Human Genome Lecture - PowerPoint PPT Presentation

Provided by: jamess74

Title: Human Genome Lecture

1
Human Genome Lecture

Historical aspects of the HGP
EST sequencing
Finding new genes faster than ever
Using 3′ ESTs to generate human gene maps
First comprehensive genome-wide human gene maps
Sequence of human genome
Complex genomic regions and sequence limitations

2
Key pre-HGP scientific advances

Structure of DNA determined (1953)
- Watson & Crick
Recombinant DNA created (1972)
- P. Berg; Cohen and Boyer
Methods for DNA sequencing developed (1977)
- Maxam & Gilbert; F. Sanger
PCR invented (1985)
- K. Mullis
Automated DNA sequencer developed (1986)
- L. Hood

3
Obstacles to formation of the HGP

1) Financial/political: "Big biology is bad biology"
-departure from cottage industry culture of biology
-devoid of hypothesis-driven research
-what will it cost?
-will it take away from other programs?
2) Why sequence the "junk"?
-protein-coding regions make up <1.5% of the genome
-waste of time/money to sequence repetitive, hard-to-sequence regions
3) "It is impossible to do"
-mid-1980s
-primitive sequencing capabilities (500 bp/day/lab)
-primitive computer capabilities/bioinformatics resources
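
A back-of-the-envelope sketch of the "impossible to do" objection. It assumes a haploid genome of roughly 3 billion bp (the figure used elsewhere in this lecture), a single pass over every base, and a 365-day year. The function name and the 1,000-lab scenario are illustrative assumptions, not part of the original slides.

```python
def years_to_sequence(genome_bp: int, bp_per_day: float, labs: int = 1) -> float:
    """Years needed to read genome_bp bases once at bp_per_day per lab."""
    days = genome_bp / (bp_per_day * labs)
    return days / 365

GENOME_BP = 3_000_000_000   # assumed haploid genome size
MID_1980S_RATE = 500        # bp/day/lab, from the slide

print(round(years_to_sequence(GENOME_BP, MID_1980S_RATE)))        # one lab: 16438
print(round(years_to_sequence(GENOME_BP, MID_1980S_RATE, 1000)))  # 1,000 labs: 16
```

Even with a thousand labs working in parallel, a single pass would have taken on the order of 16 years at mid-1980s throughput, before accounting for the redundancy needed for accuracy.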

4
Significance of the HGP

"The book of life", "The grail of human biology", "Code of codes"
The instructions to create a human being
The genome is a product of evolution
- molecular replicator (DNA) + heritable variation + time + changing environment = genome
- record of the evolutionary history of our species
Comparative genomics: the genes that make us human
The genome: unparalleled system of information storage
- 70 trillion cells in human body
- each cell stores 3 billion units of information
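
The storage figures above can be turned into rough numbers. This is a minimal sketch using the slide's own values (70 trillion cells, 3 billion units per cell); the 2-bits-per-base encoding is an assumption made here (four possible bases, no compression), and the function names are illustrative.

```python
BASES_PER_GENOME = 3_000_000_000     # units of information per cell (slide)
CELLS_PER_BODY = 70_000_000_000_000  # 70 trillion cells (slide)
BITS_PER_BASE = 2                    # assumption: 4 bases -> log2(4) = 2 bits

def genome_megabytes(bases: int = BASES_PER_GENOME) -> float:
    """Uncompressed size of one genome copy in megabytes (10**6 bytes)."""
    return bases * BITS_PER_BASE / 8 / 1e6

def total_bases_in_body() -> int:
    """Total number of stored bases across all cells, using the slide's figures."""
    return BASES_PER_GENOME * CELLS_PER_BODY

print(genome_megabytes())              # 750.0
print(f"{total_bases_in_body():.1e}")  # 2.1e+23
```

Under these assumptions one copy of the sequence is about 750 MB, and the body as a whole holds on the order of 10^23 bases.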

5
Significance of the HGP (cont)

Biology in the 21st century
- equivalent of learning to read a new language
The genome as dynamic, not static
- perspective on past/future of the species
Implications for health and disease
-Genetic disease gene discovery: single-gene diseases, multifactorial diseases
-DNA-based diagnostics
-New drug targets
-Gene therapy implications
-Therapeutic uses vs. enhancements
Accumulation of a molecular parts list of human physiology and anatomy
- Lander: Periodic Table of the Elements analogy

7
Genomics Timelines
8
Rapid Gene Identification & Mapping: ESTs and Gene-based STSs

Single-pass sequencing of randomly selected cDNA clones
Obtain sequences from 5′ and 3′ ends of cDNA inserts
Rapidly and cheaply identify human genes
Alzheimer's gene discovered by EST database search
3′ UT (untranslated) sequence ideal for STS development
PCR-based gene mapping
Readily scaled up for development of most comprehensive human gene maps (Science 1996, 1998)

13
One gene, one STS

"Gene-based STSs as the basis for a human gene map"
Berry et al., Nature Genetics 1995
"ESTablishing a human transcript map"
Boguski and Schuler, Nature Genetics 1995

15
Boguski and Schuler, Nat. Genet. 1995
18
Size and gene content of the 24 human
chromosomes. A, Size of each human chromosome,
in millions of base pairs (1 million base pairs =
1 Mb). Chromosomes are ordered left to right by
size. B, Number of genes identified on each human
chromosome. Chromosomes are ordered left to right
by gene content. (Based on www.ensembl.org, v36.)
20
Genomic sequencing vs EST sequencing

EST (single-pass cDNA) sequencing
- very fast but not error-free (e.g. 99% accuracy)
- very rapid gene identification (reliance on mRNA)
- cDNA abundance influences coverage
  - some genes will be missed
  - normalized cDNA libraries improve coverage
  - provides a gene expression profile
Genomic sequencing
- pre-2001: much slower method for gene finding
- must do gene identification by computer prediction
- will generate complete gene and genome information, e.g. introns, regulatory regions, intergenic regions, repeats, etc.
- more expensive way to identify genes
- independent of gene expression level concerns
- highly accurate when complete
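
Two of the EST points above (single-pass error rate, and coverage depending on cDNA abundance) lend themselves to a quick calculation. The sketch below assumes clones are drawn independently from the library; the 400 bp read length, the transcript frequencies, and the number of clones sequenced are illustrative assumptions, not values from the lecture.

```python
def expected_errors(read_length_bp: int, accuracy: float) -> float:
    """Expected number of miscalled bases in one single-pass read."""
    return read_length_bp * (1.0 - accuracy)

def prob_gene_missed(transcript_fraction: float, clones_sequenced: int) -> float:
    """Chance that no sequenced clone comes from a given transcript,
    assuming clones are drawn independently from the library."""
    return (1.0 - transcript_fraction) ** clones_sequenced

# A 400 bp read at 99% accuracy (read length is an illustrative assumption)
print(expected_errors(400, 0.99))

# Rare transcript (1 in a million clones) versus the same gene in a
# normalized library (assumed here to raise it to 1 in 50,000)
print(prob_gene_missed(1e-6, 100_000))
print(prob_gene_missed(1 / 50_000, 100_000))
```

With these assumed numbers the rare transcript is missed about 90% of the time in the unnormalized library and roughly 14% of the time after normalization, which is the sense in which normalized libraries improve coverage.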

26
Significant findings arising from analysis of the
draft sequence of the human genome

The genomic landscape shows marked variation in
the distribution of a number of features,
including genes, transposable elements, GC
content, CpG islands and recombination rate. This
gives us important clues about function. For
example, the developmentally important HOX gene
clusters are the most repeat-poor regions of the
human genome, probably reflecting the very
complex coordinate regulation of the genes in the
clusters.
There appear to be about 30,000–40,000
protein-coding genes in the human genome, only
about twice as many as in worm or fly. However,
the genes are more complex, with more alternative
splicing generating a larger number of protein
products.
The full set of proteins (the 'proteome')
encoded by the human genome is more complex than
those of invertebrates. This is due in part to
the presence of vertebrate-specific protein
domains and motifs (an estimated 7% of the
total), but more to the fact that vertebrates
appear to have arranged pre-existing components
into a richer collection of domain architectures.
Hundreds of human genes appear likely to have
resulted from horizontal transfer from bacteria
at some point in the vertebrate lineage. Dozens
of genes appear to have been derived from
transposable elements.
Although about half of the human genome derives
from transposable elements, there has been a
marked decline in the overall activity of such
elements in the hominid lineage. DNA transposons
appear to have become completely inactive and
long-terminal repeat (LTR) retroposons may also
have done so.
The pericentromeric and subtelomeric regions of
chromosomes are filled with large recent
segmental duplications of sequence from elsewhere
in the genome. Segmental duplication is much more
frequent in humans than in yeast, fly or worm.
Analysis of the organization of Alu elements
explains the longstanding mystery of their
surprising genomic distribution, and suggests
that there may be strong selection in favour of
preferential retention of Alu elements in GC-rich
regions and that these 'selfish' elements may
benefit their human hosts.
The mutation rate is about twice as high in
male as in female meiosis, showing that most
mutation occurs in males.
Cytogenetic analysis of the sequenced clones
confirms suggestions that large GC-poor regions
are strongly correlated with 'dark G-bands' in
karyotypes.
Recombination rates tend to be much higher in
distal regions (around 20 megabases (Mb)) of
chromosomes and on shorter chromosome arms in
general, in a pattern that promotes the
occurrence of at least one crossover per
chromosome arm in each meiosis.
More than 1.4 million single nucleotide
polymorphisms (SNPs) in the human genome have
been identified. This collection should allow the
initiation of genome-wide linkage disequilibrium
mapping of the genes in the human population.

27
Patterns of intrachromosomal and interchromosomal
duplication in the human genome
Bailey et al., Science, 2002
28
Distribution of >50 kb gaps in HapMap phase 1 - CEU
[Figure legend: HapMap phase 1 chromosome lengths; >50 kb gaps between SNPs, excluding centromere gaps; heterochromatin]
T. Hudson
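
A minimal sketch of how gaps of this kind could be located from a list of SNP coordinates on one chromosome. The function name and the toy positions are hypothetical; the sketch does not mask centromeres or heterochromatin, which the figure excludes.

```python
from typing import List, Tuple

def find_gaps(snp_positions: List[int], min_gap_bp: int = 50_000) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of adjacent SNPs separated by more than min_gap_bp."""
    ordered = sorted(snp_positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap_bp]

# Toy coordinates (bp) on a single hypothetical chromosome
toy_snps = [1_000, 12_000, 40_000, 130_000, 135_000, 200_000]
print(find_gaps(toy_snps))  # [(40000, 130000), (135000, 200000)]
```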
29
Bailey et al., Science, 2002
30
Genome Structural Variation

Broadest sense: all changes in the genome not due to single base-pair substitutions
Copy number variations (CNVs)
CNV loci may cover 12% of genome
Insertions/Deletions (indels)
e.g. repeats: STRs, VNTRs
Inversions
Duplications and translocations

31
Limitations of Genome Sequencing

Next-gen sequencers are short-read
Repeated/duplicated sequences often can't be positioned
Segmental duplications make up 5% of genome
>95% identity, >20 kb
Smaller-size, highly duplicated sequence families exist
Complex, duplication-rich regions
>200 gaps (>50 kb each) in human genome
Difficult to accurately assemble
Linked to many human diseases
Linked to evolutionary adaptation
Location of missing heritability of GWAS?
Are critical regions of the genome being missed/ignored?
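
Why short reads cannot position duplicated sequence can be shown on a toy example. The sketch below builds a made-up 44 bp "genome" containing two exact copies of a 12 bp segment and measures, for several read lengths, the fraction of reads whose sequence occurs at more than one location. The sequences and function name are illustrative assumptions; real aligners and assemblers are far more involved.

```python
from collections import Counter

def ambiguous_read_fraction(genome: str, read_length: int) -> float:
    """Fraction of read start positions whose sequence occurs more than once."""
    reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    counts = Counter(reads)
    ambiguous = sum(1 for r in reads if counts[r] > 1)
    return ambiguous / len(reads)

# Toy genome: two exact copies of a 12 bp "duplication" separated by unique sequence
dup = "ACGTTGCAAGCT"
toy_genome = "TTAGGC" + dup + "GATCCATG" + dup + "CCTAAG"

for k in (4, 8, 12, 16):
    print(k, round(ambiguous_read_fraction(toy_genome, k), 2))
# 4 0.44
# 8 0.27
# 12 0.06
# 16 0.0
```

Only reads longer than the duplicated segment reach into unique flanking sequence and can be placed unambiguously. Scaled up to duplications longer than 20 kb, reads of a few hundred bases or less fall well short of that.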

32
Limitations of next-generation genome sequence assembly
Can Alkan, Saba Sajjadian & Evan E. Eichler
Nature Methods, Volume 8, Pages 61–65 (2011). DOI: 10.1038/nmeth.1527
Published online 21 November 2010
Abstract: High-throughput sequencing
technologies promise to transform the fields of
genetics and comparative biology by delivering
tens of thousands of genomes in the near future.
Although it is feasible to construct de novo
genome assemblies in a few months, there has been
relatively little attention to what is lost by
sole application of short sequence reads. We
compared the recent de novo assemblies using the
short oligonucleotide analysis package (SOAP),
generated from the genomes of a Han Chinese
individual and a Yoruban individual, to
experimentally validated genomic features. We
found that de novo assemblies were 16.2% shorter
than the reference genome and that 420.2 megabase
pairs of common repeats and 99.1% of validated
duplicated sequences were missing from the
genome. Consequently, over 2,377 coding exons
were completely missing. We conclude that
high-quality sequencing approaches must be
considered in conjunction with high-throughput
sequencing for comparative genomics analyses and
studies of genome evolution.
