Title: Comparative Genomics
2. Overview of the Talk
- Comparing Genomes
- Homologies and Families
- Sequence Alignments
3. Evolution at the DNA Level
- Sequence edits: mutation, deletion
  ACTGACATGTACCA
  AC----CATGCACCA
- Rearrangements: inversion, translocation, duplication
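The edit types above can be made concrete on a DNA string. This is a toy illustration, not a model of real mutation processes. The helper names are hypothetical, coordinates are 0-based and end-exclusive, and inversion is assumed to mean replacing a segment with its reverse complement, which is what happens when a double-stranded segment flips.

```python
# Toy illustrations of sequence edits and rearrangements (hypothetical helpers).
# Coordinates are 0-based and end-exclusive, like Python slices.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutate(seq: str, pos: int, base: str) -> str:
    """Point mutation: substitute the base at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, start: int, end: int) -> str:
    """Deletion: remove seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Inversion: replace seq[start:end] with its reverse complement."""
    segment = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: insert a copy of seq[start:end] right after itself."""
    return seq[:end] + seq[start:end] + seq[end:]


def translocate(seq: str, start: int, end: int, dest: int) -> str:
    """Translocation: cut seq[start:end] and reinsert it at index `dest` of the remainder."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:dest] + segment + rest[dest:]
```

For example, `invert("AACCGG", 1, 4)` turns the segment `ACC` into `GGT`, giving `AGGTGG`.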
4. Why Compare Genomes?
- We can better understand evolution and speciation.
- We can find important, functional regions of the sequence (codons, promoters, regulatory regions).
- It can help us locate genes in other species that are missing or not well defined (also through comparison and alignments).
5. Comparing Genomes
- Mammals have roughly 3 billion base pairs in their genomes.
- Over 98% of human genes are shared with primates, with more than 95-98% similarity between genes.
- Even the fruit fly shares 60% of its genes with humans! (March 2000)
- Differences: gene structure, sequence
- Remember: one nucleotide change can cause disease, such as sickle cell anemia and cancer.
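Similarity figures like the ones above depend on how identity is counted. A minimal sketch, assuming two sequences that are already aligned (gaps written as `-`) and counting identity only over gap-free columns. Other conventions divide by the full alignment length or by the shorter sequence, and give different numbers for the same alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap in either sequence.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```

A single substitution moves this number by only one column, even though, as noted above, one nucleotide change can be enough to cause disease.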
6. How Does Ensembl Predict Homology?
- Uses all the species
- Uses a representative protein (the longest) for every gene
- Builds a gene tree
- EnsemblCompara GeneTrees: Analysis of complete, duplication aware phylogenetic trees in vertebrates. Vilella AJ, Severin J, Ureta-Vidal A, Durbin R, Heng L, Birney E. Genome Res. 2008 Nov 24.
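Choosing a representative protein amounts to keeping one translation per gene. A minimal sketch with a hypothetical `Translation` record and made-up identifiers. How Ensembl breaks ties between equally long translations is not stated here, so this version simply keeps the first one seen.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Translation:
    gene_id: str      # hypothetical gene identifier
    protein_id: str   # hypothetical protein identifier
    sequence: str     # amino acid sequence


def longest_per_gene(translations: Iterable[Translation]) -> Dict[str, Translation]:
    """Keep the longest translation for each gene (first one wins on ties)."""
    best: Dict[str, Translation] = {}
    for t in translations:
        current = best.get(t.gene_id)
        if current is None or len(t.sequence) > len(current.sequence):
            best[t.gene_id] = t
    return best
```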
7. Steps in Homology Prediction
- Load genes and the longest translation from all species
- WU-BLASTP with Smith-Waterman: the longest translation of every gene against every other (Blast Reciprocal Hit / Blast Score Ratio)
- Protein clustering
- For each cluster, build a multiple alignment (MUSCLE) based on the protein sequences
- From each alignment, build a gene tree
- Reconcile each gene tree with the species tree to call duplication events on internal nodes (njTree)
- Inference of homologues
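The all-against-all step scores protein pairs with a Smith-Waterman local alignment. The sketch below computes only the best local score with a linear gap penalty. The match, mismatch and gap values are arbitrary assumptions; real protein searches such as WU-BLASTP use a substitution matrix (e.g. BLOSUM62) and affine gap costs.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (no traceback), linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0, so an alignment can
            # start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```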
8. Viewing Trees in Ensembl
9. Types of Homologues
- Orthologues: any pairwise gene relation where the most recent common ancestor node is a speciation event
- Paralogues: any pairwise gene relation where the most recent common ancestor node is a duplication event
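The two definitions differ only in the label of the pair's most recent common ancestor, so classifying a pair is a lowest-common-ancestor lookup. A sketch, assuming the gene tree is stored as a child-to-parent map and each internal node is already labelled `speciation` or `duplication`. The node and gene names are hypothetical.

```python
from typing import Dict, List


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Most recent common ancestor of two nodes."""
    seen = set(ancestors(a, parent))
    for n in ancestors(b, parent):
        if n in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def homology_type(a: str, b: str, parent: Dict[str, str], event: Dict[str, str]) -> str:
    """Orthologue if the common ancestor is a speciation, else paralogue."""
    return "orthologue" if event[lca(a, b, parent)] == "speciation" else "paralogue"


# Hypothetical tree: a speciation (n1) separates human from mouse,
# then a mouse-specific duplication (n2) produces two mouse copies.
parent = {"human_g": "n1", "n2": "n1", "mouse_g1": "n2", "mouse_g2": "n2"}
event = {"n1": "speciation", "n2": "duplication"}
```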
10. The Gene Tree for INS (insulin precursor)
A blue square is a speciation event (Orthologues)
A red square is a duplication event (Paralogues)
Nodes (squares) are ancestors. INS (insulin
precursor) is compared across species.
11. Reconciliation
Figure: a species tree with leaves M, R and H, and an unrooted gene tree with leaves M, H and R.
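Reconciliation can be sketched with the classic LCA-mapping approach. Each gene-tree node is mapped to the lowest species-tree node containing the species of all its genes, and a node is called a duplication when it maps to the same species node as one of its children. This assumes an already rooted gene tree. The tree in the figure is unrooted, and reconciling an unrooted tree also requires choosing a root, which this sketch does not do. Tree encodings and names are hypothetical.

```python
from typing import Dict, List


def species_lca(a: str, b: str, sp_parent: Dict[str, str]) -> str:
    """Lowest common ancestor of two nodes in the species tree."""
    path = [a]
    while path[-1] in sp_parent:
        path.append(sp_parent[path[-1]])
    seen = set(path)
    node = b
    while node not in seen:
        node = sp_parent[node]
    return node


def reconcile(node: str, children: Dict[str, List[str]], gene_species: Dict[str, str],
              sp_parent: Dict[str, str], events: Dict[str, str]) -> str:
    """Map gene-tree `node` onto the species tree; record each internal node's event."""
    if node not in children:  # leaf: maps to its own species
        return gene_species[node]
    mapped = [reconcile(c, children, gene_species, sp_parent, events)
              for c in children[node]]
    m = mapped[0]
    for other in mapped[1:]:
        m = species_lca(m, other, sp_parent)
    # Same species node as a child means the lineages split before speciation.
    events[node] = "duplication" if m in mapped else "speciation"
    return m


# Hypothetical species tree ((M,R),H) as a child-to-parent map.
sp_parent = {"M": "MR", "R": "MR", "MR": "root", "H": "root"}
```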
12. Orthologue Types
What is 1 to 1?
What is 1 to many?
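One way to read "1 to 1" and "1 to many" off a list of orthologue pairs between two species is to count each gene's partners in the other species. A simplified sketch: it classifies each pair by those counts. This matches a group-based definition when, as for tree-derived orthologues, every gene in a group is paired with every gene of the other species in that group. The gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (gene in species A, gene in species B)


def orthologue_types(pairs: List[Pair]) -> Dict[Pair, str]:
    """Label each orthologue pair one-to-one, one-to-many or many-to-many."""
    partners_of_a = Counter(a for a, _ in pairs)  # partners each A gene has in B
    partners_of_b = Counter(b for _, b in pairs)  # partners each B gene has in A
    result: Dict[Pair, str] = {}
    for a, b in pairs:
        na, nb = partners_of_a[a], partners_of_b[b]
        if na == 1 and nb == 1:
            result[(a, b)] = "one-to-one"
        elif na == 1 or nb == 1:
            result[(a, b)] = "one-to-many"
        else:
            result[(a, b)] = "many-to-many"
    return result
```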
13.
- Find the cow MYL6 gene and go to its gene summary.
- How many paralogues does it have? Find them in the gene tree.
- Is there an orthologue predicted for this gene in zebrafish? Jump to its gene summary.
14.
- From the fish page you are on, click on Protein Families on the left.
- How were these families determined? (Click the Help button.)
15. Protein Families
- How? Cluster proteins for every isoform in every species, plus UniProt proteins.
- BLASTP comparison of:
  - all Ensembl proteins (ENSP)
  - all metazoan (animal) proteins in UniProt
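Families come from an all-against-all BLASTP comparison followed by clustering. The clustering method is not described on this slide, so the sketch below uses single-linkage clustering (union-find over hits that pass an E-value cutoff) purely as a simple stand-in. The identifiers, hits and the `1e-5` cutoff are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, float]  # (protein A, protein B, E-value)


def cluster_hits(proteins: Iterable[str], hits: Iterable[Hit],
                 max_evalue: float = 1e-5) -> List[Set[str]]:
    """Single-linkage clusters: proteins joined by any hit with E-value <= cutoff."""
    proteins = list(proteins)
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= max_evalue:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two clusters

    clusters: Dict[str, Set[str]] = {}
    for p in proteins:
        clusters.setdefault(find(p), set()).add(p)
    # Sort for a deterministic output order.
    return sorted(clusters.values(), key=lambda c: sorted(c))
```

Single linkage chains clusters together through any one significant hit, which is why production pipelines usually prefer graph-based methods that are less sensitive to spurious links.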
16. Families
17. Ensembl Proteins in the Family
18. Overview of the Talk
- Comparing Genomes
- Homologies and Families
- Sequence Alignments
19. Non-Coding Regions
- Large stretches of non-coding regions in vertebrates
- Regulatory regions of:
  - developmental genes
  - transcription factors
  - miRNA
Kikuta et al., Genome Research, May 2007
20. Comparative Genomics Today
21. Aligning Whole Genomes: Why?
- To identify homologous regions
- To spot problematic gene predictions
- Conserved regions could be functional
- To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)
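Syntenic regions can be sketched as runs of homologous anchors (for example, orthologous genes or aligned blocks) whose order and orientation are preserved. A minimal sketch, assuming anchors given as `(position_in_genome_a, position_in_genome_b, strand)` on a single chromosome pair, with a hypothetical `max_gap` distance limit. Real synteny pipelines also require a minimum number of anchors per block and tolerate small local rearrangements.

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # (pos in genome A, pos in genome B, strand +1/-1)


def syntenic_blocks(anchors: List[Anchor], max_gap: int = 100_000) -> List[List[Anchor]]:
    """Group anchors into runs with conserved order and orientation."""
    blocks: List[List[Anchor]] = []
    for anchor in sorted(anchors):  # walk along genome A
        pos_a, pos_b, strand = anchor
        if blocks:
            last_a, last_b, last_strand = blocks[-1][-1]
            # On the + strand positions in B must increase; on - they must decrease.
            step_b = (pos_b - last_b) * strand
            if (strand == last_strand and 0 < step_b <= max_gap
                    and pos_a - last_a <= max_gap):
                blocks[-1].append(anchor)
                continue
        blocks.append([anchor])  # order or orientation broken: start a new block
    return blocks
```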
22. Aligning large genomic sequences
- Difficulties:
  - Requires significant computing resources
  - Scalability, as more and more genomes are sequenced
  - Time constraints
  - As the true alignment is not known, it is difficult to measure alignment accuracy and to choose the right method
23. Methods of Alignment: Ensembl
- BLASTZ-net (comparison at the nucleotide level) is used for evolutionarily close species, e.g. human and mouse
- Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human and zebrafish
- EPO/PECAN global alignment is used for multispecies alignments
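Comparing at the amino acid level, as translated BLAT does, starts from translating DNA. BLAT's internals are not shown here, so the sketch below only illustrates that translation step: the standard genetic code and a six-frame translation (three forward frames and three on the reverse complement). Trailing partial codons are dropped, and codons containing characters other than A, C, G, T become `X`.

```python
from typing import List

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate whole codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> List[str]:
    """Frames +1, +2, +3 on the given strand, then on the reverse complement."""
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna[f:]) for f in range(3)] + [translate(rev[f:]) for f in range(3)]
```

For example, `GAG` and `GAA` both translate to `E` (glutamate), so a synonymous change that is a mismatch at the nucleotide level disappears at the amino acid level, one reason translated comparisons suit more distant species. Conversely, the sickle cell mutation in the β-globin gene changes `GAG` to `GTG`, i.e. glutamate to valine.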
24. Conserved Regions Exercise
- Find the Ensembl MYH2 gene for cow and go to Region in Detail.
- Turn on a pairwise alignment (H.sap-B.tau).
- Would you expect an orthologous gene in human?
25. Conserved Regions Exercise
- Jump to the human region.
- What is the constrained elements track?
26. Acknowledgements
- Javier Herrero
- Benoît Ballester
- Kathryn Beal
- Stephen Fitzgerald
- Albert Vilella
- Ensembl team