Title: The Human Genome Project
1. The Human Genome Project
- Lecture 4
- Strachan and Read Chapter 8
2. The HGP's primary aims
- The main aims of the Human Genome Project (HGP) were to:
  - Construct maps of the genome (genetic and physical)
  - Identify all the genes (early estimates were about 30,000; current estimates are around 20,000 protein-coding genes)
  - Determine the entire DNA sequence (about 3,000,000,000 bp)
3. Other aims of the HGP
- As well as the genome sequence, the aims included:
  - Technology development
  - Model organism genome projects (E. coli, yeast, mouse, fruit fly, C. elegans)
  - Ethical, legal and societal implications (ELSI)
4. The linkage map
- The map was built by linkage studies in 60 large families with grandparents and large numbers of children, collected by the University of Utah and the Centre d'Étude du Polymorphisme Humain (CEPH), Paris.
- Families were typed with over 5000 polymorphic DNA sequences; 60% were microsatellite repeats (mostly dinucleotide (CA) repeats, also some tri- and tetranucleotide repeats). Only about 400 of them were actual genes.
- Construction of the genetic map:
  - Obtain genotypes of all markers in all family members (PCR and gel electrophoresis, using robots and automated gel apparatus)
  - Calculate recombination fractions between markers
  - Observe crossovers between closely linked markers, and use this information to confirm the order of the markers
- Constructing the linkage map is a very big computational problem: sophisticated software, using advanced statistical methods and algorithms, was needed to work out the "best fit" map of all the markers.
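The recombination-fraction step can be sketched in Python. This is a simplified illustration that assumes phase-known meioses, each of which can be scored directly as recombinant or non-recombinant; real linkage software works from whole pedigrees with unknown phase and missing genotypes. The function names are hypothetical. The LOD score compares the likelihood of linkage at a given θ with free recombination (θ = 0.5), and Haldane's mapping function converts θ into a map distance in centimorgans (cM).

```python
import math


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Estimate theta as the proportion of scored meioses that are recombinant."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    return recombinants / meioses


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known meioses.

    L(theta) = theta**R * (1 - theta)**(N - R); with no linkage L(0.5) = 0.5**N.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = meioses - recombinants
    log_l = 0.0
    if recombinants:
        if theta == 0.0:
            return float("-inf")  # a recombinant is impossible at theta = 0
        log_l += recombinants * math.log10(theta)
    if non_recombinants:
        log_l += non_recombinants * math.log10(1.0 - theta)
    return log_l - meioses * math.log10(0.5)


def haldane_cm(theta: float) -> float:
    """Haldane map function: d = -0.5 * ln(1 - 2*theta) Morgans, returned in cM."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


if __name__ == "__main__":
    theta = recombination_fraction(3, 50)
    print(theta, round(lod_score(3, 50, theta), 2), round(haldane_cm(theta), 2))
    # 0.06 10.12 6.39
```

With 3 recombinants in 50 informative meioses, θ = 0.06, the LOD score at that θ is about 10.1 (well above the conventional linkage threshold of 3), and the Haldane distance is about 6.4 cM.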
5. STSs and ESTs
- Sequence tagged sites (STSs) are specific loci in the genome for which enough DNA sequence is available to design PCR primers that amplify the locus (usually as a fragment of a few hundred bp). These include microsatellites (e.g. CA repeats) that can be used for linkage studies.
- The only information required to use an STS is the sequences of the PCR primers, so it is very easy to make databases of STSs that can be used by anyone. No actual DNA needs to change hands. This is crucial in allowing genome projects to proceed as international collaborations, with many laboratories participating in a co-ordinated way.
- ESTs (expressed sequence tags) act as specific tags for each human gene, since they are derived by sequencing cDNA clones made from mRNA and therefore represent actual transcribed sequences (as opposed to STSs, which can be derived from anywhere in the genome and are mostly non-coding). They allow rapid access to the genes themselves, ignoring introns and junk DNA.
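Because an STS is defined entirely by its primer pair, a database entry can be checked against any sequence with a simple text search. The Python sketch below is a minimal "in silico PCR": it looks for an exact match to the forward primer and, downstream of it, for the reverse complement of the reverse primer, then reports the predicted product size. The `STS` record, the function names and the primer sequences used in testing are hypothetical; practical tools also tolerate mismatches and search both strands.

```python
from dataclasses import dataclass
from typing import Optional

# Complement table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class STS:
    """Hypothetical STS database entry: only the primer pair is stored."""
    name: str
    forward_primer: str  # 5'->3', matches the top strand
    reverse_primer: str  # 5'->3', matches the bottom strand


def in_silico_pcr(sts: STS, template: str) -> Optional[int]:
    """Return the predicted product length in bp, or None if there is no product.

    Only exact primer matches on the given (top) strand are considered.
    """
    top = template.upper()
    fwd = sts.forward_primer.upper()
    # The reverse primer anneals to the bottom strand, so on the top strand
    # we look for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(sts.reverse_primer.upper())
    start = top.find(fwd)
    if start == -1:
        return None
    end = top.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start
```

Any laboratory holding only the two primer sequences can run the same check against its own sequence data, which is the property that makes STS databases easy to share.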
6. ESTs can be 3' or 5', depending on which end of the cDNA was sequenced. Because of the methods used to make cDNA libraries (reverse transcription usually starts at the poly(A) tail and often stops before reaching the 5' end of the mRNA), parts of the 5' end of the gene are often lost during cloning, whereas the 3' end is more reliably represented. Therefore, the same gene may give different 5' ESTs, and it will be difficult to deduce whether they have come from the same gene. This is shown on the diagram by the white boxes, representing cDNA clones, being of different lengths. Another complication is alternative splicing. On the left of the diagram is the genomic structure of a gene, with the exons shown as boxes; the red one is subject to alternative splicing.
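The difficulty of deciding whether two ESTs come from the same gene can be illustrated with a toy clustering in Python. As a simplifying assumption, two ESTs are grouped if they share any exact k-mer (default k = 16), and groups are merged transitively with union-find; real EST clustering relied on sequence alignment, so this is only a sketch, and all names and sequences are hypothetical. In the test, two cDNA clones of one simulated mRNA are truncated at different 5' points: their 3' ESTs overlap and cluster together, while their 5' ESTs do not overlap and end up in separate clusters even though they come from the same gene.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_ests(ests: dict[str, str], k: int = 16) -> list[set[str]]:
    """Group ESTs that share at least one exact k-mer, merging groups transitively."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: dict[str, str] = {}  # k-mer -> first EST seen containing it
    for name, seq in ests.items():
        for kmer in kmer_set(seq, k):
            if kmer in first_owner:
                union(first_owner[kmer], name)
            else:
                first_owner[kmer] = name

    clusters: dict[str, set[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())
```

Alternative splicing adds a further complication that this sketch ignores: two ESTs spanning different exon combinations may share no k-mer at all.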
7. X-ray hybrid mapping
- X-ray hybrids (also called radiation hybrids) are made by irradiating a human cell line with 3000 rad of X-rays, fusing the irradiated cells to hamster cells, and isolating hybrid cell lines in culture.
- A panel of 100-200 hybrids, each containing 5-10 different fragments of human DNA, gives about 1000 fragments in total, i.e. the human genome has been divided into about 1000 pieces.
- The closer together two markers are in the genome, the more likely it is that they will be present in the same hybrids (since they are less likely to be separated by an X-ray-induced break).
- By doing a PCR assay for each marker on all the hybrids, a map can be made. The units are called cR (centirays), where 1 cR corresponds to a 1% chance that the markers will be separated by X-ray breakage.
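The link between breakage probability and map distance can be written as a small function. This Python sketch assumes the Poisson mapping function commonly used for radiation hybrid maps, D = -ln(1 - θ) rays, where θ is the probability that at least one break separates the two markers; multiplying by 100 gives cR. For small θ this reduces to the rule that 1 cR corresponds to a 1% chance of breakage, while larger probabilities give disproportionately larger distances. Because the number of breaks depends on the X-ray dose, distances are quoted for a particular dose (e.g. cR3000 for 3000 rad). The function name is hypothetical, and estimating θ itself from a hybrid panel (which also requires modelling fragment retention) is beyond this sketch.

```python
import math


def breakage_to_cr(theta: float) -> float:
    """Convert a breakage probability theta into a distance in cR.

    Assumes breaks occur as a Poisson process along the chromosome, so
    P(at least one break between the markers) = 1 - exp(-D) with D in rays,
    giving D = -ln(1 - theta). 1 ray = 100 cR.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


for theta in (0.01, 0.10, 0.50):
    print(f"theta = {theta:.2f} -> {breakage_to_cr(theta):.1f} cR")
```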
9. For each pair of markers in turn, the
"co-retention frequency" is the number of hybrids
in which both markers are present, divided by the
number of hybrids in which one or other (or both)
markers are present. On the figure, there are 5
hybrids containing both markers B and C, and 6
containing B and/or C. Therefore the co-retention
frequency is 5/6 or 0.83. Likewise it is 6/7 for
markers E and F, and 2/10 for markers C and E.
This shows that B and C are close together, E and
F are close together, but C and E are further
apart. The analysis is extended to all the
markers and their order is worked out by
considering all the co-retention frequencies.
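The co-retention calculation, and a crude way of using it to order markers, can be sketched in Python. The retention table is hypothetical: it was chosen only so that the pairs quoted in the text give the same ratios (5/6 for B and C, 6/7 for E and F, 2/10 for C and E), and it is not the data from the original figure. The ordering step is a toy assumption, not the method used by mapping software: it tries every order and keeps the one whose neighbouring markers have the highest total co-retention, which is only feasible for a handful of markers.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical PCR results for 12 hybrids (1 = marker present in that hybrid).
PANEL = {
    "B": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    "E": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
    "F": [0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
}


def co_retention(a: list[int], b: list[int]) -> Fraction:
    """Hybrids containing both markers / hybrids containing either (or both)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        raise ValueError("neither marker was retained in any hybrid")
    return Fraction(both, either)


def best_order(panel: dict[str, list[int]]) -> tuple[str, ...]:
    """Toy ordering: exhaustive search for the order that maximises the summed
    co-retention of neighbouring markers."""
    def score(order: tuple[str, ...]) -> Fraction:
        return sum(
            (co_retention(panel[a], panel[b]) for a, b in zip(order, order[1:])),
            Fraction(0),
        )
    return max(permutations(panel), key=score)


print(co_retention(PANEL["B"], PANEL["C"]))  # 5/6
print(best_order(PANEL))  # ('B', 'C', 'E', 'F')
```

Exact fractions are used so that the printed values match the ratios in the text; an order and its reverse always score the same, so the map orientation is arbitrary.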
10. Clone contigs
- A clone contig is a series of cloned DNA segments that overlap each other, assembled in the correct order along the genome.
- The clones are made using vectors:
  - cosmids (capacity about 45 kb)
  - BACs or YACs (bacterial or yeast artificial chromosomes), which can clone hundreds of kb of DNA and are more suitable for dealing with large stretches of mammalian DNA.
11. Making a clone contig by fingerprinting
12. Putting it together
- The physical map consists of thousands of cloned genomic DNA fragments in E. coli host cells (BACs and cosmids, 40-250 kb) or yeast (yeast artificial chromosomes or YACs, 100-1500 kb), together with X-ray hybrids and hundreds of thousands of STSs and ESTs.
- The linkage map contains several thousand STSs.
- All of these can be linked together to produce an integrated genome map.
- The presence or absence of each STS or EST in each X-ray hybrid and cloned DNA is simply determined by PCR.
- Because of the huge numbers involved, the assays have to be automated.
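Because each clone is scored by PCR for the STSs it contains, clones that share an STS must overlap, and following these shared STSs links clones into contigs (STS-content mapping). The Python sketch below finds those groups as connected components with a breadth-first search. It rests on simplifying assumptions: every PCR result is taken as correct and every STS as unique in the genome, whereas real data include false positives and chimeric clones (a known problem with YACs). Clone and STS names are hypothetical.

```python
from collections import defaultdict, deque


def sts_content_contigs(pcr_hits: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs: clones sharing any PCR-positive STS overlap.

    pcr_hits maps each clone name to the set of STSs detected in it by PCR.
    """
    # Index: STS -> all clones in which it was detected.
    clones_by_sts: dict[str, set[str]] = defaultdict(set)
    for clone, stss in pcr_hits.items():
        for sts in stss:
            clones_by_sts[sts].add(clone)

    seen: set[str] = set()
    contigs: list[set[str]] = []
    for start in pcr_hits:
        if start in seen:
            continue
        # Breadth-first search over clones linked by shared STSs.
        contig: set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            clone = queue.popleft()
            contig.add(clone)
            for sts in pcr_hits[clone]:
                for neighbour in clones_by_sts[sts]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        contigs.append(contig)
    return contigs
```

For example, BAC1 (STSs 1 and 2), BAC2 (STSs 2 and 3) and BAC3 (STS 3) form one contig, while a BAC containing only an unrelated STS stays on its own until more markers bridge the gap.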
13. Sequencing
- There was a great deal of human genome to sequence (3000 Mb, or 3 × 10^9 bp).
- Due to the limitations of the techniques, each sequencing reaction could generate only up to about 700 bp of DNA sequence.
- So the total sequence had to be assembled from millions of short, overlapping pieces of sequence. The starting point for this was the contigs of overlapping BAC clones.
- Each clone in the contig was subcloned into hundreds of smaller fragments, using a plasmid vector suitable for preparing templates for the DNA sequencing reactions.
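The numbers on this slide can be turned into a quick estimate of how many reads were needed. The Python sketch below assumes a 150 kb BAC insert and 8-fold coverage (illustrative values, not from the lecture) and uses the Lander-Waterman (Poisson) approximation, under which the expected fraction of bases covered by no read is e^(-c) for average coverage c. The function names are hypothetical.

```python
import math


def reads_needed(target_bp: int, read_len_bp: int, coverage: float) -> int:
    """Number of reads so that total read length = coverage x target length."""
    return math.ceil(coverage * target_bp / read_len_bp)


def fraction_unsequenced(coverage: float) -> float:
    """Expected fraction of bases covered by no read, assuming reads start at
    random positions (Poisson / Lander-Waterman model): e**(-coverage)."""
    return math.exp(-coverage)


# Whole genome, each base read once on average (1x):
print(reads_needed(3_000_000_000, 700, 1))  # 4285715
# One BAC, assuming a 150 kb insert sequenced to 8x coverage:
print(reads_needed(150_000, 700, 8))  # 1715
print(f"{fraction_unsequenced(8):.5f}")  # 0.00034
```

Even a single pass over the genome needs over four million 700 bp reads, which is why the sequence had to be assembled from millions of short overlapping pieces; the e^(-c) term shows why several-fold redundant coverage was needed to leave only small gaps.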