Bioinformatics and Evolutionary Genomics - PowerPoint PPT Presentation

Slides: 39
Provided by: beren

1
Bioinformatics and Evolutionary Genomics
2
Request
  • We are a small group
  • and also a heterogeneous one with respect to
    previous knowledge
  • PLEASE interrupt / ask questions when I am going
    too fast, when I use jargon, or when I make
    jumps/conclusions that seem 100% logical to me
    but erratic to you; please point out my implicit
    assumptions about what everybody knows

3
Lectures and computer exercises
  • Homology, trees
  • Genomic context, genome evolution, pathway
    evolution
  • HTP (high-throughput) data
  • Eukaryotic genome evolution, tree of life
  • Exercises: basic skills, plus an impression of
    what is possible and how this type of research is
    done (albeit on a larger scale)

4
Literature Discussion
  • Each (set of) article(s) will be introduced
    (presented) by one or two people; the presentation
    should last approximately half an hour and is
    followed by a discussion
  • What to discuss:
  • What are the articles actually saying? What have
    the authors done? (so that everybody knows)
  • What does this mean in a larger context? (e.g. a
    discussion of the discussion)

5
Homology and Domains
6
Gene / protein sequence evolution: what is
homology?
  • Definition of homology (biology):
  • structures are said to be homologous if they are
    alike because of shared ancestry.
  • Classic example: arms, bird wings, bat wings
    (vertebrate forelimbs)
  • Genes / proteins / stretches of DNA: sequence
    similarity because they derive from the same
    ancestral sequence
  • The sequence counterpart of analogy is
    convergence, but it is thought to be limited to
    specific cases (e.g. coiled coils, regulatory
    motifs); for function we do have analogy, e.g.
    analogous enzymes

7
Why are we interested in homology?
  • Function prediction: homologous proteins tend to
    have similar functions
  • Evolutionary dynamics: tracing the evolution of
    genes (duplication, gene trees, origin of new
    gene families)

8
How do we detect homology?
  • Similarity of:
  • 3D structure: the most conserved aspect, yet not
    all structures are available. Structures are
    compared and classified by eye and by software
    packages (Dali). (NB: the classical homology
    criterion is shared idiosyncratic features that
    are not strictly necessary for function, whether
    structural or sequence features.)
  • Sequence: less conserved, but many sequences are
    available. Homology determination is mainly based
    on models of sequence evolution and on the
    probability that, when you compare a sequence to
    a database, you would find a sequence of at least
    that similarity by chance.
  • NB: manually curated databases of 3D structure
    similarity are used as a benchmark for the
    detection of homology by sequence similarity
    (SCOP, "Blundell's bus").
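The "probability of finding a sequence of at least that similarity by chance" is what BLAST reports as the E-value. Under Karlin-Altschul statistics for local alignments, the expected number of chance hits with score at least S is E = K·m·n·e^(−λS), for query length m and database length n. A minimal sketch, assuming illustrative values λ = 0.267 and K = 0.041 (the values commonly reported for BLOSUM62 with gap costs 11/1; real values depend on the scoring system, and BLAST also corrects m and n for edge effects):

```python
import math

def evalue(raw_score, query_len, db_len, lam=0.267, k=0.041):
    """Expected number of chance local alignments scoring >= raw_score.

    Karlin-Altschul: E = K * m * n * exp(-lambda * S).
    lam and k are ASSUMED illustrative defaults; they depend on the
    substitution matrix and gap penalties actually used.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam=0.267, k=0.041):
    """Normalised (bit) score S' such that E = m * n * 2 ** (-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# A 300-residue query against a hypothetical database of 10^8 residues.
for s in (40, 80, 120):
    print(s, round(bit_score(s), 1), f"{evalue(s, 300, 10**8):.1e}")
```

Because E scales with m·n, the same alignment score is less significant when searched against a larger database.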

9
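Gene / protein evolution beyond BLAST: distant
homology
  • Not obvious by BLAST
  • Substantial divergence, due to time and/or rate
    of evolution
  • Use profiles (HMMer or PSI-BLAST)
  • These generally work better because a profile
    records which positions are conserved in the
    family and which residues occur there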

[Figure: example alignment fragments ECGHR, ECNHN, TCQQL, SIGNL]
10
Gene / protein evolution beyond BLAST: distant
homology
  • PSI-BLAST: a multiple sequence alignment is
    generated on the fly to detect which
    residues/positions characterize the family.
  • Or use CDD, PFAM or SMART:
  • experts have collected representative and
    divergent members of a gene family and use HMMer
    or RPS-BLAST to see whether your query sequence
    belongs to this gene family (i.e. is homologous
    to the members)
  • Results are clearer/cleaner than with PSI-BLAST
    or BLAST.
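To make the profile idea concrete: a position-specific scoring matrix turns an alignment into per-position scores, so a match at a conserved position counts for more than a match at a variable one. A minimal sketch, assuming an ungapped toy alignment, a uniform background of 1/20 per amino acid and one pseudocount per residue type; PSI-BLAST itself uses sequence weighting, real background frequencies and more elaborate pseudocounts. The family below is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Log-odds (bits) scoring matrix from an ungapped alignment.

    ASSUMES a uniform background (1/20 per residue) for simplicity.
    """
    length = len(alignment[0])
    assert all(len(s) == length for s in alignment), "sequences must be aligned"
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score(pssm, query):
    """Sum of per-position log-odds scores for an ungapped query."""
    return sum(column[aa] for column, aa in zip(pssm, query))

family = ["ECGHR", "ECNHN", "ECGHN", "ECAHR"]  # hypothetical toy family
pssm = build_pssm(family)
print(round(score(pssm, "ECWHK"), 2))  # keeps the conserved E, C and H
print(round(score(pssm, "WKGTR"), 2))  # matches only at variable positions
```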

11
How to detect very distant homology /
superfamilies
  • When two protein families are homologous but the
    homology is not obvious, they are said to belong
    to the same superfamily
  • How to detect:
  • In-depth PSI-BLAST searches
  • Reciprocal searches
  • Use of the right seed (query) sequence
  • Hopping via intermediate sequences (homology is
    by definition transitive)
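Hopping relies on transitivity: if A is homologous to B and B to C, then A is homologous to C even when A and C show no detectable similarity. A minimal sketch that groups sequences into candidate superfamilies with a union-find structure, assuming the input is a list of pairs already judged homologous (for example, hits below an E-value cutoff of your choosing); the sequence names are hypothetical. Caveat: a multidomain protein can link families that share only some other domain, so in practice hopping is best done per domain.

```python
def group_by_transitivity(hits):
    """Cluster sequences so that A~B and B~C puts A, B and C together.

    `hits` is an iterable of (seq_a, seq_b) pairs judged homologous.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return sorted(clusters.values(), key=sorted)

hits = [("famA_1", "famA_2"), ("famA_2", "bridge"),
        ("bridge", "famB_1"), ("famC_1", "famC_2")]
for cluster in group_by_transitivity(hits):
    print(sorted(cluster))
```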

12
Gene / protein evolution: distant homology
  • Alignment-vs-alignment, profile-vs-profile and
    HMM-vs-HMM comparison (whereas HMMer and PSI-BLAST
    compare a profile to a single sequence)
  • Unfortunately, the statistics are still poor
  • Works because the conserved positions of one
    family can be matched against those of the other

[Figure: example alignment fragments ACRNG, ACGNR, TCQQL, TFQQI, TCILL]
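Profile-vs-profile comparison can be sketched by turning each family alignment into per-column residue frequencies and scoring aligned columns against each other. A minimal, ungapped sketch, assuming both toy alignments have the same length and using a column score of the form log2(Σ_a p(a)·q(a)/f(a)) with a uniform background f(a) = 1/20; real tools such as HHsearch align HMMs with gaps and use calibrated statistics, which this does not attempt. The three families are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profiles(alignment, pseudocount=0.5):
    """Per-column residue frequencies (with pseudocounts) of an ungapped alignment."""
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profiles = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        profiles.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profiles

def profile_profile_score(prof_a, prof_b):
    """Sum over aligned columns of log2(sum_a p(a) q(a) / f(a)), uniform f(a)."""
    background = 1.0 / len(AMINO_ACIDS)
    return sum(
        math.log2(sum(p[aa] * q[aa] / background for aa in AMINO_ACIDS))
        for p, q in zip(prof_a, prof_b)
    )

family_1 = ["ACRNG", "ACGNG", "SCRNG"]
family_2 = ["ACKNG", "TCRNG", "ACRDG"]
unrelated = ["WLPEY", "WMPEF", "YLPEW"]

p1, p2, p3 = (column_profiles(f) for f in (family_1, family_2, unrelated))
print(round(profile_profile_score(p1, p2), 2))  # related families
print(round(profile_profile_score(p1, p3), 2))  # unrelated family
```

A column pair scores positive when both families favour the same residues, so two families that share conserved positions accumulate a high total even if no single pair of sequences is very similar.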
13
Gene / protein evolution: distant homology
  • 3D structure comparison/alignment plus visual
    inspection of multiple sequence alignments, by
    Alexey Murzin
  • The results are stored in the SCOP database
    ("Blundell's bus")

14
Structural alignment
  • Secondary structure elements
  • Alpha-helices
  • Beta strands (beta sheets)
  • Loops
  • Fold vs superfamily?

15
An example of distant homology
  • E.g. the P-loop containing nucleoside
    triphosphate hydrolase superfamily
  • In humans: AAA 130, ABC_tran 182, SMC_N 29
  • Families in the superfamily include: Zot,
    UPF0079, TraG, SMC_N, SKI, Sigma54_activat,
    Rep_fac_C, Rad17, NACHT, Mg_chelatase, MCM,
    KTI12, IstB, GSPII_E, DUF853, DNA_pol3_delta,
    Bac_DnaA, APS_kinase, ABC_tran, AAA_PrkA, AAA_5,
    AAA_3, AAA_2, AAA

16
Apart from sequence and structural features:
conservation of basic molecular function
17
Distant homology: applications to function
prediction
  • Bacterial protein of unknown function (DUF853)
  • Member of the P-loop containing nucleoside
    triphosphate hydrolase superfamily
  • Thus thought to be an ATPase
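The DUF853 prediction rests on superfamily membership detected with profiles. A crude complementary check is to look for the Walker A (P-loop) motif, G-x(4)-G-K-[T/S], after which the superfamily is named. A minimal sketch with a regular expression; the test fragment is a short Ras-like illustrative sequence. The motif is short and degenerate, so a match can occur by chance and its absence does not rule out membership: it supports, rather than replaces, the profile-based assignment.

```python
import re

# Walker A / P-loop consensus G-x(4)-G-K-[T/S]; the lookahead allows overlapping hits.
WALKER_A = re.compile(r"(?=(G.{4}GK[ST]))")

def find_walker_a(sequence):
    """Return (1-based start, matched motif) for every Walker A match."""
    return [(m.start() + 1, m.group(1)) for m in WALKER_A.finditer(sequence.upper())]

print(find_walker_a("MTEYKLVVVGAGGVGKSALTIQ"))  # [(10, 'GAGGVGKS')]
```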

21
Relevance of homology for function prediction:
similar function? What is function?
  • Function can be described at various levels
  • Sequence similarity / homology is most relevant
    for molecular function. This is the aspect of
    protein function that is best conserved, and
    protein sequence and structure can often be
    interpreted in terms of it.

22
Using distant homology for function prediction:
an example from (just) before PSI-BLAST / HMMer
  • Secreted Fringe-like Signaling Molecules May Be
    Glycosyltransferases. 
  • Cell. 1997 Jan 10;88(1):9-11.
  • Y. Yuan, J. Schultz, M. Mlodzik, P. Bork

23
Distant homology: application to evolution
  • Invention vs (duplication and) divergence
  • First determine homology, before putting
    sequences into multiple sequence alignment and
    tree-building software
  • Two (or more) protein families that are present
    in all three kingdoms of life and that can be
    shown to be homologous to each other carry
    information from before the Last Universal
    Common Ancestor, i.e. about very early evolution

24
Protein domains, structural definition:
separate in structure
  • a structural domain ("domain") is an element of
    overall structure that is self-stabilizing and
    often folds independently of the rest of the
    protein chain

25
Protein domains, sequence/evolutionary
definition: separate in evolution
  • Homologous parts of proteins that occur with
    different partners
  • Mobile
  • Modules
  • Almost always the same as the structural
    definition

26
Implications of domains for homology
  • The shared ancestry is not a property of the
    whole gene but only of part of the gene.
  • When studying the evolution of gene families,
    consider fusions / domain combinations (also when
    making trees etc.)
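Because shared ancestry may hold for only part of a gene, it helps to compare proteins by domain architecture rather than as whole sequences. A minimal sketch, assuming each protein is represented as an ordered (N- to C-terminal) list of domain family names, such as a PFAM or SMART scan would give you; the names D1, D2, D3 and the classification labels are hypothetical and illustrative.

```python
def domain_combinations(architecture):
    """Adjacent domain pairs (N- to C-terminal order) in one protein."""
    return set(zip(architecture, architecture[1:]))

def compare_architectures(arch_a, arch_b):
    """Rough classification of how two proteins relate at the domain level."""
    if arch_a == arch_b:
        return "same architecture: candidate full-length homologs"
    if set(arch_a) == set(arch_b):
        return "same domains, different arrangement or copy number"
    shared = sorted(set(arch_a) & set(arch_b))
    if shared:
        return "share only domain(s): " + ", ".join(shared)
    return "no shared domain"

print(compare_architectures(["D1", "D2"], ["D1", "D2"]))
print(compare_architectures(["D1", "D2"], ["D2", "D3"]))  # a fusion/recombination case
print(sorted(domain_combinations(["D1", "D2", "D3"])))
```

When building a gene tree, proteins in the "share only domain(s)" category should be aligned over the shared domain only, not over their full length.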

27
Domain repeats. Homology?
  • BLAST homology vs. the real homology unit
  • Q8TKV1 (Methanosarcina acetivorans)
  • ?

28
Q8TKV1
29
Ramifications for function prediction and
understanding of cellular processes: one domain,
one (molecular) function (in contrast to one
gene, one function)
  • This bit does this and that bit does that
  • E.g.
  • multidomain enzymes
  • Transcriptional regulators

30
Example multidomain enzyme: TrpG (E. coli)
31
Ramifications for function prediction: when doing
BLAST, mind the domains
  • Protein B is wrongly annotated as having the
    function of domain 1, based on homology with the
    multidomain protein A, although B is not
    homologous to domain 1
  • (the multidomain architecture problem when
    annotating proteins via BLAST)

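[Figure: multidomain protein A (domains 1 and 2) and protein B, which is homologous to A but not to domain 1]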
32
Ramifications for function prediction: when doing
BLAST, mind the domains
  • Protein B is incompletely annotated as having
    only the function of domain 2, based on homology
    with the single-domain protein A; B's other
    domain is missed in the annotation

[Figure: single-domain protein A (domain 2) and two-domain protein B, which shares only domain 2 with A]
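Both annotation pitfalls can be caught by checking which domains a BLAST hit actually covers. A minimal sketch, assuming the subject protein's domains are known as (start, end, function) with 1-based inclusive coordinates, and assuming a hypothetical 50% coverage cutoff for transferring a domain's function and a 50-residue minimum for flagging unaligned query regions that might hold another domain. The coordinates in the example are hypothetical.

```python
def transferable_functions(subject_domains, hit_start, hit_end, min_coverage=0.5):
    """Functions of subject domains that the alignment actually covers.

    subject_domains: list of (start, end, function), 1-based inclusive.
    hit_start, hit_end: aligned region on the subject, 1-based inclusive.
    min_coverage: ASSUMED fraction of a domain that must be aligned.
    """
    functions = []
    for start, end, function in subject_domains:
        overlap = min(end, hit_end) - max(start, hit_start) + 1
        if overlap > 0 and overlap / (end - start + 1) >= min_coverage:
            functions.append(function)
    return functions

def uncovered_query_regions(query_len, hit_query_start, hit_query_end, min_len=50):
    """Unaligned query segments long enough (ASSUMED min_len) to hold another
    domain; these need their own search, e.g. against PFAM or SMART."""
    regions = []
    if hit_query_start - 1 >= min_len:
        regions.append((1, hit_query_start - 1))
    if query_len - hit_query_end >= min_len:
        regions.append((hit_query_end + 1, query_len))
    return regions

protein_a = [(1, 150, "function of domain 1"), (151, 300, "function of domain 2")]
# Protein B aligns only to positions 160-295 of A: transfer domain 2's function only.
print(transferable_functions(protein_a, 160, 295))
# B is 400 residues long and only its positions 1-140 are aligned: flag the rest.
print(uncovered_query_regions(400, 1, 140))
```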
33
Ramifications for function prediction: when doing
BLAST, use PSI-BLAST or CDD / PFAM instead
  • Rather than discover the domain structure by
    BLAST yourself, use e.g. SMART / PFAM / CDD to do
    it for you
  • NB CDD

34
Domains and distant homologies
  • Promiscuous domains (i.e. those present in many
    proteins) are often quite diverged and thus need
    sensitive homology detection tools in order to be
    recognized.
  • Moreover, it is often only the most general
    functional property of the domain that is
    conserved over such long evolutionary distances
  • Over long evolutionary distances genes are often
    only homologous in the sense that they share a
    domain, rather than being homologous over their
    full length
  • We THUS use PFAM/SMART etc.:
  • for the domains
  • to improve upon BLAST / be cleaner than
    PSI-BLAST
  • because most sequences are covered by these
    databases, so there is no need to reinvent the
    wheel. The sequences that are not covered are
    often non-globular, recent inventions, or very
    fast evolving

35
Disclaimer: non-globular regions
  • Low complexity
  • Unstructured, elongated (as opposed to globular)
  • Many polar/charged residues, few hydrophobic
    residues
  • Parts of proteins that do not possess a clear 3D
    structure
  • Convergence
  • Do not obey PAM or BLOSUM substitution matrices
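Low-complexity regions are dominated by a few residue types, which shows up as low Shannon entropy of the residue composition in a sliding window. A minimal sketch, loosely in the spirit of filters such as SEG but not the SEG algorithm itself; the window length of 12 and the 2.2-bit cutoff are assumptions to be tuned.

```python
import math
from collections import Counter

def window_entropy(seq, window=12):
    """Shannon entropy (bits) of the residue composition of each sliding window."""
    values = []
    for i in range(len(seq) - window + 1):
        counts = Counter(seq[i:i + window])
        values.append(-sum(c / window * math.log2(c / window) for c in counts.values()))
    return values

def low_complexity_windows(seq, window=12, max_entropy=2.2):
    """1-based start positions of windows below an ASSUMED entropy cutoff."""
    return [i + 1 for i, h in enumerate(window_entropy(seq, window)) if h < max_entropy]

print(low_complexity_windows("ACDEFGHIKLMN" + "QQQQQQQQQQQQ"))
```

A window of 12 different residues has the maximum entropy log2(12) ≈ 3.58 bits, while a homopolymer run scores 0, so such regions are easy to flag and mask before a similarity search.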

36
Disclaimer: coiled coils
  • All-alpha; thought to arise independently
    (convergence)
  • Hypothesis: a reservoir for new all-alpha folds
    (Koonin EV)
  • E.g. ras / rho / rab / ran / -GAPs

37
Disclaimer: other protein motifs
  • Signal peptides
  • Lipid anchoring
  • Convergence, yet still important to predict
  • Transmembrane segments?
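A classic first-pass transmembrane check is a Kyte-Doolittle hydropathy plot: a window of about 19 residues whose mean hydropathy exceeds roughly 1.6 suggests a membrane-spanning helix. A minimal sketch using those values (treat them as adjustable assumptions); dedicated predictors for transmembrane helices and signal peptides do considerably better. The test sequence is artificial.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each sliding window (unknown residues score 0)."""
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """1-based (start, end) of windows above the ASSUMED threshold,
    with overlapping windows merged into one segment."""
    segments = []
    for i, h in enumerate(hydropathy_profile(seq, window)):
        if h > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current segment
            else:
                segments.append((start, end))
    return segments

toy = "KDEKRDEKRD" + "L" * 20 + "KDEKRDEKRD"  # artificial: hydrophobic core, charged flanks
print(candidate_tm_segments(toy))
```

The merged segment is wider than the hydrophobic core itself because windows that overlap the charged flanks can still average above the threshold; real predictors refine segment boundaries.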

38
Interesting result on protein evolution regarding
domains and duplications: neutral?
[Figure legend: black = observed; blue = model of
recombination with separate duplication; red = model
that also includes duplication of combinations]