Comparative Genomics - PowerPoint PPT Presentation

About This Presentation
Title:

Comparative Genomics

Description:

Mammals have roughly 3 billion base pairs in their genomes ... To identify homologous regions. To spot problematic gene predictions ...

Slides: 27
Provided by: Giuliett4

Transcript and Presenter's Notes

Title: Comparative Genomics


1
Comparative Genomics
2
Overview of the Talk
  • Comparing Genomes
  • Homologies and Families
  • Sequence Alignments

3
Evolution at the DNA Level
  • Sequence edits: deletion, mutation
    (figure: ACTGACATGTACCA aligned against AC----CATGCACCA)
  • Rearrangements: inversion, translocation, duplication
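The edit types on this slide can be made concrete as string operations. The sketch below is a toy model and does not show how real mutations are detected. Sequences are plain Python strings, coordinates are 0-based and half-open, and the function names are hypothetical. An inversion is modelled as the reverse complement of the segment, because that is what flipping a stretch of double-stranded DNA produces.

```python
# Toy models of the edit types on the slide, applied to plain DNA strings.
# Coordinates are 0-based and half-open [start, end), like Python slices.
# Function names are illustrative, not from any library.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitute(seq: str, pos: int, base: str) -> str:
    """Point mutation: replace the single base at `pos`."""
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, start: int, end: int) -> str:
    """Deletion: drop seq[start:end]."""
    return seq[:start] + seq[end:]


def invert(seq: str, start: int, end: int) -> str:
    """Inversion: replace the segment by its reverse complement."""
    flipped = seq[start:end].translate(COMPLEMENT)[::-1]
    return seq[:start] + flipped + seq[end:]


def duplicate(seq: str, start: int, end: int) -> str:
    """Tandem duplication: insert a copy of the segment right after it."""
    return seq[:end] + seq[start:end] + seq[end:]


def translocate(seq: str, start: int, end: int, dest: int) -> str:
    """Move the segment to offset `dest` of the sequence left after cutting
    it out (a single-sequence stand-in for a translocation)."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:dest] + segment + rest[dest:]


s = "ACTGACATGTACCA"
print(substitute(s, 9, "C"))                # ACTGACATGCACCA
print(invert(s, 0, 4))                      # CAGTACATGTACCA
print(duplicate(s, 0, 2))                   # ACACTGACATGTACCA
print(translocate(s, 0, 2, 3))              # TGAACCATGTACCA
print(delete(substitute(s, 9, "C"), 2, 5))  # ACCATGCACCA
```

The last line chains two operations: a substitution at position 9, followed by deletion of positions 2 to 4 (0-based). Together they turn ACTGACATGTACCA into ACCATGCACCA, which is the second sequence on the slide with its gap characters removed.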
4
Why Compare Genomes?
  • We can better understand evolution and speciation
  • We can find important, functional regions of the
    sequence (codons, promoters, regulatory regions)
  • It can help us locate genes that are missing or
    poorly defined in the annotation of other species
    (again through comparison and alignment).

5
Comparing Genomes
  • Mammals have roughly 3 billion base pairs in
    their genomes
  • Over 98% of human genes are shared with primates,
    with more than 95-98% similarity between genes.
  • Even the fruit fly shares 60% of its genes with
    humans! (March 2000)
  • Differences: gene structure, sequence
  • Remember that a single nucleotide change can cause
    disease, such as sickle cell anemia, and can
    contribute to cancer.
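The similarity figures above are percent identities between aligned sequences. Percent identity has several competing definitions. This sketch assumes one of them: identical columns divided by the columns where neither sequence has a gap. It also assumes the two sequences are already aligned, with "-" as the gap character.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumed convention: identical columns / columns where neither sequence
    has a gap. Dividing by the full alignment length instead gives a
    different (lower) number whenever gaps are present.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGTACGT", "ACGAACGT"))  # 87.5
print(percent_identity("AC--GT", "ACTTGT"))      # 100.0 (gap columns skipped)
```

A high identity says little about function. One substitution in a 1,000-base alignment still scores 99.9%, and a single-base change is enough to cause sickle cell anemia.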

6
How Does Ensembl Predict Homology?
  • Uses all the species
  • Uses a representative protein (the longest) for
    every gene
  • Builds a gene tree
  • EnsemblCompara GeneTrees: Analysis of complete,
    duplication-aware phylogenetic trees in
    vertebrates. Vilella AJ, Severin J, Ureta-Vidal
    A, Durbin R, Heng L, Birney E. Genome Res. 2008
    Nov 24.

7
Steps in Homology Prediction
  • Load genes and the longest translation of each
    gene from all species
  • WU-BLASTP/Smith-Waterman comparison of the
    longest translation of every gene against every
    other (Best Reciprocal Hit / BLAST Score Ratio)
  • Protein clustering
  • For each cluster, build a multiple alignment
    (MUSCLE) of the protein sequences
  • From each alignment, build a gene tree
  • Reconcile each gene tree with the species tree to
    call duplication events on internal nodes (njTree)
  • Inference of homologues
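The clustering step can be sketched as a graph problem. Genes are nodes. An edge joins two genes if they are best reciprocal hits or if their BLAST score ratio passes a threshold. Clusters are then the connected components, found by single linkage with union-find.

This is a simplified illustration, not Ensembl's code:
  • the score dictionary stands in for real BLAST output;
  • the BSR definition (hit score divided by the larger of the two self-hit scores) is an assumption;
  • the 0.33 default threshold is an assumption;
  • a production pipeline must also keep cluster sizes manageable.

```python
from itertools import combinations


def cluster_proteins(genes, scores, bsr_threshold=0.33):
    """Single-linkage clustering of proteins from all-against-all scores.

    genes  -- list of gene identifiers (one representative protein each)
    scores -- dict (query, target) -> alignment score; must include the
              self-hit (g, g) for every gene, used for the score ratio.
    Two genes are linked if they are best reciprocal hits (BRH) or if
    their BLAST score ratio (BSR) reaches `bsr_threshold` (assumed value).
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Best non-self hit for each query.
    best = {}
    for (query, target), score in scores.items():
        if query != target and (query not in best
                                or score > scores[(query, best[query])]):
            best[query] = target

    for a, b in combinations(genes, 2):
        reciprocal = best.get(a) == b and best.get(b) == a
        hit = max(scores.get((a, b), 0.0), scores.get((b, a), 0.0))
        # Assumed BSR definition: hit score over the larger self-hit score.
        bsr = hit / max(scores[(a, a)], scores[(b, b)])
        if reciprocal or bsr >= bsr_threshold:
            union(a, b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), set()).add(g)
    return sorted(clusters.values(), key=sorted)


# Hypothetical genes: human/mouse copies of two unrelated genes A and B.
genes = ["hA", "mA", "hB", "mB"]
scores = {("hA", "hA"): 100, ("mA", "mA"): 100, ("hB", "hB"): 100,
          ("mB", "mB"): 100, ("hA", "mA"): 90, ("mA", "hA"): 90,
          ("hB", "mB"): 80, ("mB", "hB"): 80, ("hA", "hB"): 10,
          ("hB", "hA"): 10}
print(cluster_proteins(genes, scores))  # two clusters: {hA, mA} and {hB, mB}
```

Lowering the threshold far enough lets the weak hA/hB hit merge everything into one cluster. That is why the ratio cut-off matters.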
8
Viewing Trees in Ensembl
9
Types of Homologues
  • Orthologues: any pairwise gene relation where
    the last common ancestor node is a speciation event
  • Paralogues: any pairwise gene relation where the
    last common ancestor node is a duplication event
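The two definitions above can be applied mechanically once every internal node of a gene tree is labelled. In this sketch a rooted gene tree is a nested tuple. A leaf is a gene name, and an internal node is (event, left, right), where event is either "speciation" or "duplication". The tree and gene names are hypothetical.

```python
from itertools import product


def leaves(node):
    """Leaf gene names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    _event, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, relations=None):
    """Label every gene pair by the event at their last common ancestor.

    Every pair with one gene under `left` and one under `right` has this
    node as its last common ancestor.
    """
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    relation = "orthologue" if event == "speciation" else "paralogue"
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = relation
    classify_pairs(left, relations)
    classify_pairs(right, relations)
    return relations


# Hypothetical tree: a duplication, then a human/mouse speciation on each copy.
tree = ("duplication",
        ("speciation", "human_1", "mouse_1"),
        ("speciation", "human_2", "mouse_2"))
for pair, relation in classify_pairs(tree).items():
    print(sorted(pair), relation)
```

The pair human_1 and mouse_2 comes out as paralogues even though the two genes sit in different species. Their last common ancestor is the duplication node.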

10
The Gene Tree for INS (insulin precursor)
A blue square is a speciation event (Orthologues)
A red square is a duplication event (Paralogues)
Internal nodes (squares) are ancestral genes; the tree
compares INS across species.
11
Reconciliation
(figure: a species tree with leaves M, R and H, and an
unrooted gene tree with leaves M, H and R)
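Reconciliation is what supplies the speciation and duplication labels. A common textbook approach is LCA mapping. Each gene-tree node is mapped to the smallest species-tree clade that contains all of its species. A node is called a duplication when it maps to the same clade as one of its children.

Some assumptions in this sketch:
  • The gene tree is already rooted. The slide's gene tree is unrooted, and rooting is a separate step.
  • The slide's leaf labels M, R and H are used purely as species names.
  • The gene names are made up.

This is not the njTree algorithm.

```python
def clades(species_tree):
    """Species sets under every node of a species tree, root first.
    A species tree is a species name or a (left, right) tuple."""
    if isinstance(species_tree, str):
        return [frozenset([species_tree])]
    left, right = species_tree
    left_clades, right_clades = clades(left), clades(right)
    return [left_clades[0] | right_clades[0]] + left_clades + right_clades


def reconcile(gene_tree, species_tree, species_of):
    """Label each internal node of a rooted binary gene tree.

    Returns the gene tree rebuilt with internal nodes as
    (event, left, right); `species_of` maps gene name -> species name.
    """
    all_clades = clades(species_tree)

    def lca(species_set):
        # Smallest species-tree clade containing every species in the set.
        return min((c for c in all_clades if species_set <= c), key=len)

    def walk(node):
        if isinstance(node, str):
            return frozenset([species_of[node]]), node
        left, right = node
        left_species, left_tree = walk(left)
        right_species, right_tree = walk(right)
        both = left_species | right_species
        here = lca(both)
        event = ("duplication" if here in (lca(left_species), lca(right_species))
                 else "speciation")
        return both, (event, left_tree, right_tree)

    return walk(gene_tree)[1]


species = (("M", "R"), "H")
species_of = {"m1": "M", "r1": "R", "h1": "H"}
print(reconcile((("m1", "r1"), "h1"), species, species_of))
# ('speciation', ('speciation', 'm1', 'r1'), 'h1')
print(reconcile((("m1", "h1"), "r1"), species, species_of))
# ('duplication', ('speciation', 'm1', 'h1'), 'r1')
```

In the second call the gene tree groups M with H, while the species tree groups M with R. The only way the mapping can explain that disagreement is a duplication at the root. Errors in the gene tree therefore show up as extra duplications, which is why production tools add refinements this sketch omits.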
12
Orthologue Types
What is 1 to 1?
What is 1 to many?
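One way to read the orthologue types is by the size of each connected group of orthologue pairs between two species:
  • one gene on each side is one-to-one;
  • a single gene on one side with several on the other is one-to-many;
  • several genes on both sides is many-to-many.

The sketch below assumes that reading and classifies whole connected groups rather than single pairs. The input pairs and gene names are hypothetical.

```python
from collections import defaultdict


def orthologue_types(pairs):
    """pairs: (gene in species A, gene in species B) orthologue pairs.
    Returns {pair: "one-to-one" | "one-to-many" | "many-to-many"}."""
    pairs = list(pairs)
    # Bipartite graph; nodes are tagged with their side so names can't clash.
    graph = defaultdict(set)
    for a, b in pairs:
        graph[("A", a)].add(("B", b))
        graph[("B", b)].add(("A", a))

    label_of = {}
    for start in graph:
        if start in label_of:
            continue
        # Collect the connected group containing `start`.
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        n_a = sum(1 for side, _ in seen if side == "A")
        n_b = len(seen) - n_a
        if n_a == 1 and n_b == 1:
            label = "one-to-one"
        elif n_a == 1 or n_b == 1:
            label = "one-to-many"
        else:
            label = "many-to-many"
        for node in seen:
            label_of[node] = label
    return {(a, b): label_of[("A", a)] for a, b in pairs}


pairs = [("h1", "z1"),
         ("h2", "z2a"), ("h2", "z2b"),
         ("h3", "z3a"), ("h3", "z3b"), ("h4", "z3b")]
for pair, kind in orthologue_types(pairs).items():
    print(pair, kind)
```

Grouping matters. h4 has a single orthologue (z3b), yet the pair is many-to-many because z3b is also linked to h3, which has a second orthologue.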
13
  • Find the cow MYL6 gene and go to its gene summary.
  • How many paralogues does it have? Find them in
    the gene tree.
  • Is there an orthologue predicted for this gene in
    zebrafish? Jump to its gene summary.

14
  • From the zebrafish gene page you are on, click
    Protein families in the menu on the left.
  • How were these families determined? (Click the
    Help button).

15
Protein Families
  • How: cluster the proteins for every isoform in
    every species, together with UniProt proteins
  • BLASTP comparison of:
      • all Ensembl proteins (ENSP)
      • all metazoan (animal) proteins in UniProt
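The slide does not say how the BLASTP hits are turned into families. To our understanding, Ensembl uses the Markov Cluster algorithm (MCL) for this step. The sketch below is a toy MCL in plain Python. It alternates two operations until the matrix stops changing:
  • expansion: squaring a column-stochastic matrix, i.e. taking random-walk steps;
  • inflation: raising entries to a power and renormalising, which strengthens strong connections.

The adjacency matrix stands in for BLASTP similarity scores. The parameter values are illustrative assumptions, not Ensembl's settings.

```python
def _normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Toy Markov clustering on a small, symmetric (weighted) adjacency matrix.
    Returns clusters as frozensets of node indices."""
    n = len(adjacency)
    # Add self-loops, then make the matrix column-stochastic.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                                     # expansion
        inflated = [[x ** inflation for x in row] for row in expanded]  # inflation
        new = _normalize_columns(inflated)
        converged = all(abs(new[i][j] - m[i][j]) < tol
                        for i in range(n) for j in range(n))
        m = new
        if converged:
            break
    # Non-empty rows ("attractors") list the members of each cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two unconnected triangles of similar proteins (hypothetical data).
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(mcl(adj))  # [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
```

Real runs use sparse matrices and pruning. A dense matrix over every protein in Ensembl and UniProt would not fit in memory.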

16
Families
17
Ensembl Proteins in the Family
18
Overview of the Talk
  • Comparing Genomes
  • Homologies and Families
  • Sequence Alignments

19
Non-Coding Regions
  • Large stretches of non-coding regions in
    vertebrates
  • Regulatory regions of:
      • developmental genes
      • transcription factors
      • miRNA

Kikuta et al., Genome Research, May 2007
20
Comparative Genomics today
21
Aligning Whole Genomes: Why?
  • To identify homologous regions
  • To spot problematic gene predictions
  • Conserved regions could be functional
  • To define syntenic regions (long regions of DNA
    sequences where order and orientation is highly
    conserved)
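The definition of syntenic regions above (conserved order and orientation) suggests a simple scan. The input is a list of homologous gene pairs given as (position in genome A, position in genome B), on one chromosome each. The sketch collects runs where positions advance together in A and, allowing small gaps, either:
  • advance in B (same orientation, "+"), or
  • retreat in B (inverted, "-").

The function, its max_gap and min_size parameters and the example data are hypothetical. Dedicated synteny tools are far more tolerant of noise and rearrangements.

```python
def syntenic_blocks(anchors, max_gap=2, min_size=3):
    """anchors: (gene-order index in genome A, gene-order index in genome B).
    Returns [(list_of_anchors, "+" or "-"), ...] for collinear runs."""
    anchors = sorted(anchors)
    blocks, current, direction = [], [], 0
    for a, b in anchors:
        if current:
            prev_a, prev_b = current[-1]
            step_b = b - prev_b
            new_dir = 1 if step_b > 0 else -1 if step_b < 0 else 0
            fits = (0 < a - prev_a <= max_gap
                    and 0 < abs(step_b) <= max_gap
                    and (direction == 0 or new_dir == direction))
            if fits:
                current.append((a, b))
                direction = new_dir
                continue
            # Run broken: keep it if long enough, then start a new run.
            if len(current) >= min_size:
                blocks.append((current, "+" if direction >= 0 else "-"))
            current, direction = [], 0
        current.append((a, b))
    if len(current) >= min_size:
        blocks.append((current, "+" if direction >= 0 else "-"))
    return blocks


anchors = [(0, 10), (1, 11), (2, 12),   # same order and orientation
           (3, 40), (4, 39), (5, 38),   # inverted segment
           (6, 70)]                     # isolated pair
for block, orientation in syntenic_blocks(anchors):
    print(orientation, block)
```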

22
Aligning large genomic sequences
  • Difficulties:
      • Requires significant computing resources
      • Scalability, as more and more genomes are
        sequenced
      • Time constraints
      • As the true alignment is not known, it is
        difficult to measure alignment accuracy and
        choose the right method
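To put "significant computing resources" in numbers, consider an exact dynamic-programming alignment (Smith-Waterman or Needleman-Wunsch). Such an alignment fills one matrix cell per pair of positions. Take two genomes of roughly 3 billion bases each, the size quoted earlier for mammals. A back-of-the-envelope estimate, assuming one byte per cell and a hypothetical throughput of 10^9 cell updates per second, gives:

```latex
N_{\text{cells}} = \left(3 \times 10^{9}\right)^{2} = 9 \times 10^{18}
\quad\Rightarrow\quad 9 \times 10^{18}\ \text{bytes} = 9\ \text{EB},
\qquad
t \approx \frac{9 \times 10^{18}}{10^{9}\ \text{s}^{-1}}
  = 9 \times 10^{9}\ \text{s} \approx 285\ \text{years}
```

This is why whole-genome aligners do not fill the full matrix. Instead they use seed-and-extend heuristics, which first find short matching seeds and then extend them; this is the approach BLASTZ takes.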

23
Methods of Alignment: Ensembl
  • BLASTZ-net (comparison at the nucleotide level) is
    used for evolutionarily close species,
    e.g. human and mouse
  • Translated BLAT (comparison at the amino acid
    level) is used for more distantly related species,
    e.g. human and zebrafish
  • EPO/PECAN global alignment is used for
    multi-species alignments
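Translated comparison helps for distant species because many nucleotide changes are synonymous. They alter a codon without altering the amino acid, so two sequences that have drifted apart at the DNA level can still encode the same protein. The sketch below translates only reading frame 1 with the standard genetic code, whereas translated BLAT considers all six frames. The example sequences are made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate reading frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identity(x: str, y: str) -> float:
    """Fraction of identical positions in two equal-length sequences."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


seq_a = "ATGGCTGAA"   # hypothetical sequence in one species
seq_b = "ATGGCCGAG"   # hypothetical orthologue with two synonymous changes
print(translate(seq_a), translate(seq_b))               # MAE MAE
print(round(identity(seq_a, seq_b), 3))                 # 0.778 at DNA level
print(identity(translate(seq_a), translate(seq_b)))     # 1.0 at protein level
```

The same table shows the opposite case. The sickle cell mutation is a single change, GAG to GTG, in codon 6 of the beta-globin gene (HBB), and it replaces glutamate (E) with valine (V).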

24
Conserved Regions Exercise
  • Find the Ensembl MYH2 gene for cow and go to
    Region in Detail.
  • Turn on a pairwise alignment (H.sap-B.tau)
  • Would you expect an orthologous gene in human?

25
Conserved Regions Exercise
  • Jump to the human region.
  • What is the constrained elements track?

26
Acknowledgements
  • Javier Herrero
  • Benoît Ballester
  • Kathryn Beal
  • Stephen Fitzgerald
  • Albert Vilella
  • Ensembl team