Title: Comparing the Mouse and Human Genomes: A First Look
1. Comparing the Mouse and Human Genomes: A First Look
- Michael Kamal
- Whitehead Institute/MIT Center for Genome Research
2. Overview
- Mouse genome assembly
- Global comparison: Synteny
- Local comparison: Conserved elements
3. Mouse Genome Arachne Assembly
Emphasize dramatic increase in supercontig size
4. >95% of Mouse Genome in 89 Ultracontigs
5. Assembly Agrees Well with Mouse Genetic Map
6. Overview
- Mouse assembly
- Global comparison: Synteny
- Local comparison: Conserved elements
7. Anchoring Mouse and Human Genomes: Method
Figure labels: Syntenic human BACs; Mouse chromosome; Other human BACs
8. Anchoring Mouse and Human Genomes: Example
Figure labels: Human; Mouse
>94% of human BACs on the TPF can be anchored to mouse
9. Human Chromosome 20
Axes: Mouse (Mb); BACs on human TPF (Mb)
10. Human Chromosome 17
Axes: Mouse (Mb); BACs on human TPF (Mb)
11. Mouse Chromosome 1
Axes: BACs on human TPF (Mb); Mouse chromosome (Mb)
12. Mouse/Human Synteny
Reveals 23 new segments (arrows)
Figure label: mouse chromosomes
We have been able to identify at least 23 new syntenic segments (shown with arrows)
13. Synteny for Mouse Chromosome 1
Interchromosomal: at least 8 conserved segments
Intrachromosomal: at least 21 conserved segments
14. Size Distribution of Syntenic Blocks in Mouse
Intrachromosomal
Interchromosomal
142 segments
303 segments
- Only considering segments >200k
- Last bin in left graph (70) includes a point for all of chrX
- Logarithmic scale for frequency
- Reasonable fit to an exponential curve, consistent with a random breakage model
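A rough illustration of the random breakage idea on this slide: if breakpoints fall uniformly at random along a genome, segment lengths are approximately exponentially distributed, so a histogram with a logarithmic frequency axis should look roughly linear. The sketch below simulates that under made-up parameters (genome length, number of breaks, bin width, and seed are all assumptions, not values from the talk) and estimates the exponential rate from segments above a cutoff. The function names are hypothetical.

```python
import math
import random


def simulate_segments(genome_len, n_breaks, seed=0):
    """Cut [0, genome_len] at n_breaks uniform random points; return segment lengths."""
    rng = random.Random(seed)
    cuts = sorted(rng.uniform(0, genome_len) for _ in range(n_breaks))
    edges = [0.0] + cuts + [float(genome_len)]
    return [right - left for left, right in zip(edges, edges[1:])]


def fit_exponential_rate(lengths, min_len=0.0):
    """Maximum-likelihood exponential rate using only lengths > min_len.

    The exponential is memoryless, so (length - min_len) for lengths above
    the cutoff follows the same exponential; the estimate is n / sum(x - min_len).
    """
    shifted = [x - min_len for x in lengths if x > min_len]
    if not shifted:
        raise ValueError("no segments above the cutoff")
    return len(shifted) / sum(shifted)


def log10_histogram(lengths, bin_width, min_len=0.0):
    """Map bin index -> log10(count) for lengths > min_len (empty bins omitted)."""
    counts = {}
    for x in lengths:
        if x > min_len:
            index = int((x - min_len) // bin_width)
            counts[index] = counts.get(index, 0) + 1
    return {index: math.log10(c) for index, c in sorted(counts.items())}


if __name__ == "__main__":
    # Illustrative assumptions only, not data from the talk.
    segments = simulate_segments(genome_len=2.5e9, n_breaks=400, seed=1)
    rate = fit_exponential_rate(segments, min_len=200_000)
    print(f"estimated mean segment length: {1 / rate:,.0f} bp")
    for index, log_count in log10_histogram(segments, 5e6, 200_000).items():
        print(index, round(log_count, 2))
```

Under random breakage the printed log counts should fall off roughly linearly with bin index. A real test of the model would use the observed syntenic segment lengths instead of simulated ones.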
15. GC Content: Humans Have Longer Tails Than Mice
Human: mean 41.0%, std 5.4%; Mouse: mean 41.7%, std 4.3%
16. Comparing Mouse/Human GC Content
Panel titles: GC content, Mouse chr 14; GC content, Human-Mouse (20K syntenic bins)
Plot labels: Mouse; GC content; Human
Mouse GC content systematically higher in this region
Spin: a lot of GC variation along the chromosome and complicated synteny, but remarkably uniform when compared syntenically. The blue region is chosen just for illustration. Will look at all syntenic regions...
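A minimal sketch of the binned GC comparison described on this slide, assuming sequences are plain strings and that syntenic bins have already been paired one-to-one (the talk does not say how bins were paired). The function names and the default 20,000-base bin size are assumptions for illustration.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); NaN if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else float("nan")


def gc_by_bin(seq, bin_size=20_000):
    """GC fraction for consecutive, non-overlapping bins along a sequence."""
    return [gc_fraction(seq[i:i + bin_size]) for i in range(0, len(seq), bin_size)]


def gc_difference(human_bins, mouse_bins):
    """Per-bin human minus mouse GC fraction for already-paired syntenic bins."""
    if len(human_bins) != len(mouse_bins):
        raise ValueError("bins must be paired one-to-one")
    return [h - m for h, m in zip(human_bins, mouse_bins)]


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    human = "GGCCAATT" * 3
    mouse = "GGCCGATT" * 3
    print(gc_difference(gc_by_bin(human, 8), gc_by_bin(mouse, 8)))
```

Negative differences correspond to bins where mouse GC content is higher than human, as in the highlighted region.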
17. Human Genome is Larger than Mouse Genome
Human syntenic regions are 21% larger on average. However, the X chromosome is only 8% larger.
Perhaps we should understand why this is happening (repeats?) before showing the results.
18. Overview
- Mouse assembly
- Global comparison: Synteny
- Local comparison: Conserved elements
19. Conserved Elements
- Conserved elements (PatternHunter)
- Medium scoring threshold: 608,000
- High scoring threshold: 484,000
20. Coding Sequences Are Only Half the Story
PPARg
Large conserved elements (>100 bp)
Exons
Non-exons
75 90
21. Not All Conserved Elements Are Exons
30,000 genes x 8 exons/gene
Only 240K conserved exons
- Other conserved elements?
- Regulatory elements: enhancers, insulators
- RNA genes
- Chromosome structure and mobility
- Missed genes?
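The arithmetic behind this slide can be spelled out using the gene and exon counts above and the PatternHunter element counts from the Conserved Elements slide. This is a back-of-the-envelope sketch that assumes every exon is conserved and is counted as exactly one element, so it gives an upper bound on the exon share. The function name is hypothetical.

```python
def non_exon_estimate(total_conserved, genes=30_000, exons_per_gene=8):
    """Conserved elements left over after subtracting the expected exon count."""
    expected_exons = genes * exons_per_gene  # 30,000 * 8 = 240,000
    return total_conserved - expected_exons


if __name__ == "__main__":
    # Element counts at the two scoring thresholds quoted in the talk.
    thresholds = {"medium": 608_000, "high": 484_000}
    for name, total in thresholds.items():
        remainder = non_exon_estimate(total)
        print(f"{name} threshold: {remainder:,} of {total:,} elements "
              f"are not accounted for by exons")
```

At the high threshold the remainder is 244,000, in line with the roughly 250,000 conserved non-coding elements cited in the conclusions.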
22. Coding vs. Noncoding: Filtering Methodology
Inputs: Human genome; Mouse genome
Alignment: PatternHunter
Output: Conserved sequences
Known genes: RefSeq; annotated genes with EST support; Riken cDNA
Possible genes: IPI protein database; GenScan (including suboptimal); FirstEF
Remaining: Noncoding conserved
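The filtering cascade on this slide can be sketched as an interval-overlap classification. This is a simplified illustration, not the pipeline used in the talk: it assumes all features sit on one chromosome, are half-open (start, end) coordinate pairs, and that any overlap at all is enough to assign an element to a category. The names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def classify_elements(conserved, known_genes, possible_genes):
    """Label each conserved element by the first filter it overlaps.

    Order mirrors the slide: known genes first, then possible genes;
    whatever remains is called noncoding conserved.
    """
    labels = []
    for element in conserved:
        if any(overlaps(element, g) for g in known_genes):
            labels.append("known gene")
        elif any(overlaps(element, g) for g in possible_genes):
            labels.append("possible gene")
        else:
            labels.append("noncoding conserved")
    return labels


if __name__ == "__main__":
    # Toy coordinates, not real annotations.
    conserved = [(100, 250), (400, 520), (900, 1000)]
    known = [(200, 300)]        # e.g. RefSeq exons
    possible = [(500, 600)]     # e.g. GenScan predictions
    for element, label in zip(conserved, classify_elements(conserved, known, possible)):
        print(element, label)
```

The linear scan is fine for a toy example. Genome-scale annotation sets would call for sorted intervals or an interval tree.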
23. Filtering Results for Human chr21 and chr22
24. TNFa enhancer
Track labels: Conserved; RefSeq; Genscan; Human; Mouse
ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC
ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC
NFat/Ets CRE k3-Nfat
Ets Nfat AP1 SP1
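To put a number on how conserved the aligned enhancer region is, one option is percent identity over the ungapped columns. The sketch below uses the two aligned rows from this slide. Which row is human and which is mouse is not stated explicitly in the transcript, so they are named by position only. The choice to skip gap columns is an assumption, and other identity definitions exist.

```python
# The two aligned rows from the slide, rejoined across the line breaks.
TOP_ROW = (
    "ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCC"
    "GAGGGTTGAATGAGAGCTTTTCCCCGCCC"
)
BOTTOM_ROW = (
    "ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCA"
    "AGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC"
)


def alignment_identity(row_a, row_b, gap="-"):
    """Fraction of identical bases over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


if __name__ == "__main__":
    identity = alignment_identity(TOP_ROW, BOTTOM_ROW)
    print(f"identity over ungapped columns: {identity:.1%}")
```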
25. Conclusions
- Most of the human and mouse genomes (>94%) lie in identified syntenic blocks
- About 500,000 conserved elements have been identified
- Challenge: assign function to 250,000 conserved non-coding elements
26. Acknowledgements
Whitehead Institute: Kerstin Lindblad-Toh, Michael Zody, David Jaffe, Jonathan Butler, Sante Gnerre, Evan Mauceli, Dan Brown, Robert Nicol, Tim Holzer, Nicole Stange-Thomann, Toby Bloom, Jill Mesirov, Bruce Birren, Chad Nusbaum, Eric Lander
Washington University: John McPherson, Rick Wilson, M Sekhon, K Wylie, Bob Waterston
Homology Group: Jim Mullikin (SC), Jim Kent (UCSC), David Haussler (UCSC), Webb Miller (PSU), Ross Hardison (PSU), Michael Brent (WU)
Sanger Centre: Jim Mullikin, R Ainscough, C Clee, B Plumb, B Mortimore, D Willey, Jane Rogers
BSI (PatternHunter): Ming Li
CSHL: Ivo Grosse, Michael Zhang
Funding: NHGRI, Mouse Sequencing Consortium
British Columbia: Marco Marra