Title: Genomics
1 Genomics
2 Aims of genomics I
- Establishing integrated databases that are far more than mere storage
- Linking genomic and expressed gene sequences (cDNA)
3 Aims of genomics II
- Describing every gene
  - function / expression data / relationships / phenotype
  - 3-D structure and features (introns/exons, domains, repeats)
  - similarities to other genes
- Characterizing sequence diversity in the population
4 Genomics can be
- Structural
  - where is it?
- Functional
  - what does it do?
  - DNA microarrays
- Comparative
  - finding important fragments
5 Mapping genomes
- Past
  - Genetic maps: distance between simple markers, expressed in units of recombination
  - Cytological maps: stained chromosomes, observable under a microscope
- Present
  - Physical maps: distance between nucleotides, expressed in bases
  - Comparative maps: detection of corresponding genes and of regulatory sequences
6 Genome sizes
Organism                   DNA length                 Genes
Mycoplasma genitalium      0.5 Mb                     470
Deinococcus radiodurans    3 Mb (in 4-10 copies!)     3 200
Escherichia coli           4.5 Mb                     4 400
Saccharomyces cerevisiae   12 Mb                      6 200
Caenorhabditis elegans     97 Mb                      22 000
Drosophila melanogaster    120 Mb                     18 000
Homo sapiens               3200 Mb                    32 000
7 Genetic differences among humans
- Goals
  - Genetic diseases
  - Identifying criminals
- Methods
  - Genetic markers (fingerprints) and DNA sequence
- Repeats
  - Microsatellites (repeat units of 1-12 nucleotides)
  - Minisatellites (repeat units > 12 nucleotides)
- Other types of variation
  - Genome rearrangements
  - Single nucleotide mutations
8 Microsatellites and disease
- Huntington's disease
  - Huntingtin gene of unknown (!) function
  - Repeats: 6-35 normal, 36-120 disease
- Friedreich's ataxia
  - GAA repeat in a non-coding (intron) region
  - Repeats: 7-34 normal, 35 and up disease
  - Repeat expansion reduces expression of the frataxin gene
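
The repeat thresholds above can be turned into a small check. The following is only a sketch: the sequences are made up, the helper names are hypothetical, and the CAG unit for Huntington's disease is an assumption not stated on the slide (the GAA unit for Friedreich's ataxia is).

```python
import re

def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

def classify(repeat_count: int, normal_max: int) -> str:
    """Label a repeat count using the upper bound of the normal range."""
    return "normal" if repeat_count <= normal_max else "disease"

# Toy sequences (invented for illustration), flanked by arbitrary bases.
hd_like = "ATG" + "CAG" * 40 + "CCGCCA"   # 40 CAG repeats
fa_like = "TTC" + "GAA" * 9 + "AATAAA"    # 9 GAA repeats

# Normal upper bounds taken from the slide: 35 (Huntington's), 34 (Friedreich's).
print(classify(longest_repeat_run(hd_like, "CAG"), normal_max=35))
print(classify(longest_repeat_run(fa_like, "GAA"), normal_max=34))
```

Real repeat genotyping works on reads or fragment lengths rather than on a clean string, so this only illustrates the counting step.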
9 SNP - Single Nucleotide Polymorphism
- Definition
- SNP and phenotype
- Occurrence in genome
- Rarity of most SNPs (agrees with the neutral theory of molecular evolution)
- SNPs in the human population
- High variance across the genome!
- Detection of SNPs: hybridization

Inter-genic regions    Coding regions
Every 1400 bp          Every 1430 bp
10 Sickle cell anemia
[Figure: what a sickle cell looks like]
- SNP in the beta-globin gene; the sickle allele is recessive
- 2 faulty copies: red blood cells change shape under stress, causing anemia
- 1 faulty copy: red blood cells change shape only under heavy stress, but the carrier gains resistance to the malaria parasite
11 SNPs and haplotypes
- Passengers and their evolutionary vehicles
12 SNP - Phase inference
- In genome sequencing data, the chromosome of origin of each SNP allele is scrambled

                 Possibility 1      Possibility 2
chromosome       ...CTGACGGT...     ...CTGACAGT...
chromosome       ...CTTACAGT...     ...CTTACGGT...

- Which SNP alleles are on the same chromosome (are in phase)?
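
To make the ambiguity concrete, here is a rough sketch that lists every pair of haplotypes consistent with a set of unphased genotypes. The function name and the input layout (one allele pair per SNP site) are assumptions made for this illustration.

```python
from itertools import product

def possible_phasings(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    `genotypes` holds one (allele, allele) tuple per SNP site.
    """
    phasings = set()
    # At each site, either allele may sit on the first chromosome.
    for choice in product(*[((a, b), (b, a)) for a, b in genotypes]):
        hap1 = "".join(pair[0] for pair in choice)
        hap2 = "".join(pair[1] for pair in choice)
        # frozenset: the order of the two chromosomes does not matter.
        phasings.add(frozenset((hap1, hap2)))
    return phasings

# The two variable sites from the slide: G/T at the first, G/A at the second.
for pair in possible_phasings([("G", "T"), ("G", "A")]):
    print(sorted(pair))
```

With two heterozygous sites there are two candidate phasings, matching the two possibilities on the slide; each additional heterozygous site doubles the count.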
13 SNP phase inference: determining the parent of origin for each SNP
[Figure: haplotypes GG and TA]
- Phase inference is the reason why SNP sequencing is often done for a child and both parents.
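
A minimal sketch of how parental genotypes can resolve phase in a child, assuming no genotyping errors and no new mutations. The function names and the toy genotypes are hypothetical; sites that the parents cannot resolve are marked with "?".

```python
def phase_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one site, or None if ambiguous."""
    a, b = child
    options = set()
    if a in mother and b in father:
        options.add((a, b))
    if b in mother and a in father:
        options.add((b, a))
    # Exactly one consistent assignment means the site is resolved.
    return options.pop() if len(options) == 1 else None

def phase_trio(child, mother, father):
    """Build the child's maternal and paternal haplotypes site by site."""
    maternal, paternal = [], []
    for c, m, f in zip(child, mother, father):
        resolved = phase_site(c, m, f)
        maternal.append(resolved[0] if resolved else "?")
        paternal.append(resolved[1] if resolved else "?")
    return "".join(maternal), "".join(paternal)

# Toy genotypes (invented): one (allele, allele) tuple per SNP site.
child = [("G", "T"), ("G", "A")]
mother = [("G", "G"), ("G", "G")]
father = [("T", "T"), ("A", "G")]
print(phase_trio(child, mother, father))
```

When both parents are heterozygous for the same alleles at a site, the site stays ambiguous, which is one reason statistical phasing is still needed alongside trio data.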
14 Linkage disequilibrium, intro: how hard is it to break a chromosome?
- Alleles (traits, SNP variants) A and a occupy the same position in the genome (locus), so a single chromosome of an individual can carry either of them but not both
- $f_A$: frequency of occurrence of A in the population
- $f_a = 1 - f_A$
- $f_B$ and $f_b = 1 - f_B$ are the frequencies of occurrence of B and b
- Probabilities of both alleles occurring on the same chromosome: $f_{AB}$, $f_{Ab}$, $f_{aB}$, $f_{ab}$
- LD and genomic recombination
15 Linkage disequilibrium, calculation
- When these alleles are not correlated, we expect them to occur together by chance alone:
  $f_{AB} = f_A f_B$, $f_{Ab} = f_A f_b$, $f_{aB} = f_a f_B$, $f_{ab} = f_a f_b$
- But if A and B occur together more often (disequilibrium state), we can write:
  $f_{AB} = f_A f_B + D$, $f_{Ab} = f_A f_b - D$, $f_{aB} = f_a f_B - D$, $f_{ab} = f_a f_b + D$
- where D is called the measure of disequilibrium
- Of course, from the definitions above we have $D = f_{AB} - f_A f_B$
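
As a worked example of the last formula, the sketch below estimates D from a list of phased two-locus haplotypes. The function name, the two-letter haplotype encoding ("AB", "Ab", "aB", "ab") and the sample counts are all assumptions made for this illustration.

```python
from collections import Counter
from fractions import Fraction

def disequilibrium(haplotypes):
    """Return D = f_AB - f_A * f_B for two-locus haplotypes such as "AB" or "aB"."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    f_AB = Fraction(counts["AB"], total)
    f_A = Fraction(counts["AB"] + counts["Ab"], total)  # A with either B or b
    f_B = Fraction(counts["AB"] + counts["aB"], total)  # B with either A or a
    return f_AB - f_A * f_B

# Invented sample of 100 chromosomes.
sample = ["AB"] * 50 + ["Ab"] * 10 + ["aB"] * 10 + ["ab"] * 30
d = disequilibrium(sample)
print(d)  # 7/50, i.e. 0.5 - 0.6 * 0.6 = 0.14
```

Exact fractions are used so that the identity for the other haplotypes can be checked without rounding: here the frequency of ab is 0.3, which equals 0.4 * 0.4 + 0.14.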
16 How can we use it?
- Phase inference tells us how SNPs are organized on a chromosome
- Linkage disequilibrium measures the correlation between SNPs
17 Back to SNPs
[Figure: Daly et al. (2001), Figure 1]
18 Haplotypes - vehicles for SNPs
- Daly et al. (2001) were able to infer offspring haplotypes largely from the parents. They say that "it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity"
- The haplotype blocks
  - Up to 100 kb
  - 5 or more SNPs
- For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes
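
The "lack of diversity" in a block can be expressed as the share of chromosomes covered by the few most common haplotypes. The sketch below shows one way to compute it; the function name and the block data are invented for illustration and are not taken from Daly et al.

```python
from collections import Counter

def top_haplotype_coverage(chromosomes, k):
    """Fraction of chromosomes that carry one of the k most common haplotypes.

    Assumes `chromosomes` is a non-empty list of haplotype strings.
    """
    counts = Counter(chromosomes)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(chromosomes)

# Invented block of 5 SNPs observed on 100 chromosomes.
block = ["GGACA"] * 60 + ["AATTC"] * 35 + ["GGATC"] * 3 + ["AATCA"] * 2
print(top_haplotype_coverage(block, k=2))
```

In this toy block the two most common haplotypes cover 95 of 100 chromosomes, mirroring the kind of figure quoted on the slide.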
19 Haplotypes on the genome fragment
- Observed haplotypes, with dotted lines wherever the probability of switching to another line is > 2%
- Percent of explanation by haplotypes
- Contribution of specific haplotypes
20 Another genetic test: do haplotypes exist?
- Each row represents an SNP
  - Blue dot: major allele
  - Yellow dot: minor allele
- Each column represents a single chromosome
- The 147 SNPs are divided into 18 blocks, delimited by black lines.
- The expanded box on the right is an SNP block of 26 SNPs over 19 kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs
21 How many SNPs can we ignore?
- and still predict haplotypes with high accuracy?
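
One way to approach this question is to look for the smallest set of SNP positions that still tells the common haplotypes apart. The brute-force sketch below is only an illustration under simple assumptions (hypothetical function name, invented haplotypes coded 0 for major and 1 for minor allele); it is not claimed to be the method used in the cited papers, and it scales poorly beyond a few dozen SNPs.

```python
from itertools import combinations

def minimal_tag_snps(haplotypes):
    """Return the smallest tuple of SNP positions that tells all haplotypes apart."""
    n_sites = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    # Try subsets of increasing size; the first hit is a smallest one.
    for size in range(n_sites + 1):
        for positions in combinations(range(n_sites), size):
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == n_distinct:
                return positions

# Four invented haplotypes over 6 SNPs.
common = ["000000", "110100", "001011", "111111"]
print(minimal_tag_snps(common))
```

Here two positions are enough to separate four haplotypes, so the remaining four SNPs could be skipped when only these haplotypes need to be recognized.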
22 Literature
- Gibson, Muse. A Primer of Genome Science.
- N. Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294 (2001) 1719-1723.
- M. J. Daly et al. High-resolution haplotype structure in the human genome. Nat. Genet. 29 (2001) 229-232.