Identification of Protein Domains - PowerPoint PPT Presentation

Slides: 45
Provided by: science4
Transcript and Presenter's Notes

Title: Identification of Protein Domains


1
Identification of Protein Domains
2
Orthologs and Paralogs
  • Describing evolutionary relationships among genes
    (proteins)
  • The two major ways of creating homologous genes
    are gene duplication and speciation.
  • Homology alone is not sufficiently well defined,
    so additional terms are used.

3
  • Orthologs are two genes from two different
    species that derive from a single gene in the
    last common ancestor of the species.

(Figure labels: ortho, para)
  • Paralogs are genes that derive from a single gene
    that was duplicated within a genome.

(Figure label: ortho)
4
Co-orthologs are paralogs produced by
duplications of orthologs subsequent to a given
speciation event.
(Figure label: co-ortho)
5
Inparalogs are paralogs in a given lineage
that all evolved by gene duplications that
happened after the speciation event.
(Figure labels: in-para, out-para)
  • Outparalogs are paralogs in the given lineage
    that evolved by gene duplications that happened
    before the speciation event
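
The definitions on slides 3 to 5 can be summarized in a short sketch. It assumes each pair of homologous genes is described by the event that separated them and, for duplications, by whether that duplication happened before or after the speciation event of interest. The function name and its arguments are hypothetical and only illustrate the terminology.

```python
def classify_homologs(divergence_event, duplication_after_speciation=None):
    """Classify a pair of homologous genes relative to one speciation event.

    divergence_event: "speciation" or "duplication", the event that
        separated the two genes (hypothetical encoding).
    duplication_after_speciation: for duplications only, True if the
        duplication happened after the speciation event, False if before,
        None if the timing is unknown.
    """
    if divergence_event == "speciation":
        return "orthologs"
    if divergence_event == "duplication":
        if duplication_after_speciation is None:
            return "paralogs"
        return "inparalogs" if duplication_after_speciation else "outparalogs"
    raise ValueError(f"unknown event: {divergence_event!r}")


print(classify_homologs("speciation"))
print(classify_homologs("duplication", duplication_after_speciation=True))
print(classify_homologs("duplication", duplication_after_speciation=False))
```

In practice the divergence event is inferred from a gene tree reconciled with a species tree; the sketch takes it as given.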

6
Orthologs and Paralogs
  • Orthologs are evolutionary functional
    counterparts in different species.
  • Inparalogs are important for detecting
    lineage-specific adaptations.

7
Proteins
  • Rapidly growing databases of protein sequences
    due to genome sequencing projects.
  • Many new proteins belong to protein families with
    known functions (they show significant sequence
    similarity to family members).
  • Only a small fraction of known proteins have
    functions determined by experiment.
  • Databases providing computational sequence
    analysis allow us to assign new proteins to
    known families, and thus to infer their function.

8
Protein Domains
  • A domain is an independent structural unit which
    can be found alone or in conjunction with other
    domains or repeats.
  • Module: a mobile domain.
  • Different domains have distinct functions.
  • Many eukaryotic proteins have multiple domains.

9
Protein Domains
PX domain with ligand
SH3 domain with ligand
10
Identifying Protein Domains
  • Problems:
    • Defining the members of each family.
    • Building multiple alignments of the members.
    • Finding the boundaries of the domain.

11
(No Transcript)
12
Identifying Protein Domains
  • Little structural data is available, so domains
    are identified by sequence analysis.
  • Even when the structure of the domain is not
    known, it may be possible to define its boundaries
    from sequence alone.
  • Sequence characterization of families helps
    determine 3D structure and molecular functions.

13
Identifying Protein Domains
Motif matches are often useful for indicating
functional sites; however:
  • They do not give a clear picture of the domain
    boundaries.
  • They lack sensitivity.

14
Identifying Protein Domains
  • Automatic methods:
    • Fast and effective; deal with a lot of information.
    • Might fragment domain families.
    • Might cause fusion of domain families.
  • Manual methods:
    • Put the knowledge of protein experts to use.
    • Slow; require a lot of manpower.

15
(No Transcript)
16
SMART (Simple Modular Architecture Research
Tool)
  • Web-based resource used for:
    • rapid annotation of protein domains.
    • analysis of domain architectures.

17
Domain Architecture
  • Protein PA-3427CG, species Drosophila melanogaster
  • Protein ENSMUSP00000023109, species Mus musculus
  • Protein ENSANGP00000009529, species Anopheles gambiae
18
SMART (Simple Modular Architecture Research Tool)
  • There are over 600 domain families.
  • Provides information about:
    • function.
    • subcellular localization.
    • phyletic distribution.
    • tertiary structure.
  • Based on HMMs (Hidden Markov Models).

19
SMART (Simple Modular Architecture Research Tool)
  • HMM based on seed alignment.
  • Threshold values used to determine homology of
    domains.
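
The idea of building a model from a seed alignment and applying a score threshold can be sketched with a much simpler model than an HMM. The example below uses an ungapped position-specific scoring matrix with a uniform background, so it has no insert or delete states and is not how SMART itself is implemented. The seed sequences, the pseudocount, and the threshold are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(seed_alignment, pseudocount=1.0):
    """Build per-column log-odds scores from an ungapped seed alignment.

    Assumes a uniform background (1/20 per residue), a simplification.
    """
    length = len(seed_alignment[0])
    if any(len(seq) != length for seq in seed_alignment):
        raise ValueError("seed sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(seed_alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in seed_alignment)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm, sequence):
    """Return (score, start) of the best-scoring window, or None if too short."""
    width = len(pssm)
    hits = [
        # Residues outside the 20-letter alphabet score 0.0.
        (sum(pssm[i].get(sequence[start + i], 0.0) for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits) if hits else None


def is_domain_member(pssm, sequence, threshold):
    """Apply the family-specific score threshold to the best window."""
    hit = best_hit(pssm, sequence)
    return hit is not None and hit[0] >= threshold


# Hypothetical seed alignment and threshold.
seed = ["GDWVK", "GDWIK", "GEWVR"]
model = build_pssm(seed)
print(best_hit(model, "MMAGDWVKLL"))
print(is_domain_member(model, "MMAGDWVKLL", threshold=5.0))
print(is_domain_member(model, "MMAAAAAALL", threshold=5.0))
```

A real profile HMM also scores insertions and deletions, which is what allows gapped alignment of whole domains.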

20
SMART (Simple Modular Architecture Research Tool)
  • Alignments of proteins are built to:
    • Minimize insertions/deletions in conserved
      alignment blocks.
    • Optimize amino acid property conservation.
    • Close unnecessary gaps.
  • Gapped alignments are preferred over ungapped ones:
    • prediction of domain boundaries.
    • greater information content.
  • Alignment of entire structural domains.

21
(No Transcript)
22
(No Transcript)
23
PROSITE - database of protein families and
domains
  • Database of biologically significant sites and
    patterns. Contains 1,609 profiles.
  • Pattern: a conserved sequence of a few amino
    acids.
  • Identifies to which known family of proteins (if
    any) the new sequence belongs.
  • Used to determine the function of uncharacterized
    proteins translated from genomic or cDNA
    sequences.

24
PROSITE - database of protein families and domains
  • A protein too distant from any other for its
    resemblance to be detected by overall sequence
    alignment can still be classified according to a
    pattern.
  • Patterns arise because the requirements of binding
    sites impose very tight constraints on the
    evolution of portions of the protein.

25
PROSITE: how is a pattern developed?
  • As short as possible.
  • Detects all/most sequences it describes.
  • As few false results as possible.

26
PROSITE: how is a pattern developed?
  • First, study reviews on a protein family.
  • Then build an alignment table, with particular
    attention to residues and regions important to
    the biological function of that family:
    • Enzyme catalytic sites.
    • Prosthetic group attachment sites (heme).
    • Amino acids involved in binding a metal ion.
    • Cysteines involved in disulfide bonds.
    • Regions involved in binding a molecule
      (ADP/ATP, GDP/GTP, calcium, DNA, etc.) or another
      protein.

27
PROSITE: steps in the development of a pattern
  • Find a core pattern of 4-5 biologically
    significant residues.
  • Test the pattern on a large database.
  • If lucky, there is a correlation in this region,
    which indicates a good pattern.
  • Mostly, there is no correlation:
    • Gradually increase the size of the pattern.
    • Search over other patterns.
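
The test-and-extend loop above can be sketched as follows. The example assumes patterns are written as plain regular expressions and that the database is a list of sequences labeled as family members or not. The three member sequences are the ones shown in the PROSITE example of this deck; the two non-member sequences and the function names are hypothetical.

```python
import re


def evaluate(regex, database):
    """Count hits of a regex in a labeled database.

    database: list of (sequence, is_family_member) pairs.
    Returns (true_positives, false_positives, false_negatives).
    """
    compiled = re.compile(regex)
    tp = fp = fn = 0
    for sequence, is_member in database:
        hit = compiled.search(sequence) is not None
        if hit and is_member:
            tp += 1
        elif hit:
            fp += 1
        elif is_member:
            fn += 1
    return tp, fp, fn


def first_clean_pattern(candidates, database):
    """Return the first candidate with no false hits and no misses.

    candidates should be ordered from shortest to longest, so the
    shortest acceptable pattern is returned.
    """
    for regex in candidates:
        _, fp, fn = evaluate(regex, database)
        if fp == 0 and fn == 0:
            return regex
    return None


# Toy database: non-member sequences are made up for illustration.
database = [
    ("ALRDFATHDDF", True),
    ("SMTAEATHDSI", True),
    ("ECDQAATHEAS", True),
    ("GGATHKLLPQ", False),
    ("MKVLSTPWDE", False),
]

for candidate in ["ATH", "ATH[DE]"]:
    print(candidate, evaluate(candidate, database))
print(first_clean_pattern(["ATH", "ATH[DE]"], database))
```

On a real database a perfectly clean pattern is rare, so the acceptance rule would normally tolerate a small number of false hits rather than require zero.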

28
PROSITE: an example
This pattern is small and would probably pick up
too many false positive results:
  • ALRDFATHDDF
  • SMTAEATHDSI
  • ECDQAATHEAS
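
To make the example concrete, the sketch below translates a PROSITE-style pattern into a regular expression and scans the three sequences. It assumes the pattern the slide refers to is the conserved stretch visible in all three sequences, written here as A-T-H-[DE]; that pattern string is inferred from the sequences, not stated in the transcript. The translator is a minimal one that handles only the common syntax elements.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only the common elements: '-' separators, 'x' wildcards,
    [..] allowed sets, {..} forbidden sets, (n) or (n,m) repeats, and
    the '<' and '>' terminus anchors.
    """
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")
    # Forbidden sets first, so the braces are free for repeat counts.
    regex = regex.replace("{", "[^").replace("}", "]")
    regex = regex.replace("(", "{").replace(")", "}")
    regex = regex.replace("x", ".")
    regex = regex.replace("<", "^").replace(">", "$")
    return regex


sequences = ["ALRDFATHDDF", "SMTAEATHDSI", "ECDQAATHEAS"]
core = re.compile(prosite_to_regex("A-T-H-[DE]"))
for seq in sequences:
    match = core.search(seq)
    print(seq, match.start() if match else None)
```

A four-residue pattern like this matches by chance in many unrelated proteins, which is why the slide warns about false positives.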

29
  • Patterns describe small regions of high sequence
    similarity.
  • Profiles characterize a protein family or
    domain over its entire length.

30
(No Transcript)
31
Research: Finding new domain families: Automatic
methods
  • The team started with 107 nuclear domains.
  • Using SMART, they retrieved all proteins with at
    least one of these domains and characterized
    their complete domain structure.
  • Regions not annotated using known SMART domain
    models were extracted with their domain context.
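
Extracting the regions not covered by known domain models can be sketched as an interval computation. The example assumes domain annotations are given as 1-based inclusive (start, end) coordinates. The function name and the minimum-length cutoff are hypothetical, and the sketch does not record the flanking domain context that the study kept alongside each region.

```python
def unannotated_regions(protein_length, domains, min_length=50):
    """Return regions of a protein not covered by any annotated domain.

    domains: list of (start, end) pairs, 1-based inclusive coordinates.
    min_length: hypothetical cutoff; shorter gaps are ignored.
    """
    regions = []
    position = 1  # first residue not yet known to be covered
    for start, end in sorted(domains):
        if start - position >= min_length:
            regions.append((position, start - 1))
        # Overlapping or nested domains must not move the cursor backwards.
        position = max(position, end + 1)
    if protein_length - position + 1 >= min_length:
        regions.append((position, protein_length))
    return regions


# Hypothetical 600-residue protein with three annotated domains.
print(unannotated_regions(600, [(100, 180), (150, 260), (400, 450)]))
```

Sorting the annotations and tracking the furthest covered residue keeps the function correct when domain predictions overlap.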

32
Finding new domain families: Automatic methods
  • Grouping proteins by region similarity.
  • Finding homologs using PSI-BLAST on the longest
    member of every group (threshold E-value ...).
  • Finding domain organization via SMART.
  • Homologous regions are candidates for a novel
    domain family.

33
Finding new domain families
34
Finding new domain families: Manual confirmation
  • Occurrence in different contexts indicates a novel
    module family.
  • Proteins with nuclear AND extracellular domains
    are excluded.
  • Multiple alignments and known locations of
    domains are used to define domain borders.
  • Automatic searches to find more members, E-value
  • Marginal similarity to a known domain family
    indicates a possible divergent family.

35
Prediction of Function: Chromatin-Binding Domains
  • The protein SPT6, which contains a CSZ domain,
    regulates transcription through a histone-binding
    capability.
  • It also contains two other types of domains,
    which are unlikely to bind histones.
  • Therefore it was predicted that the CSZ domain
    provides the histone-binding function.

36
Research
  • Search of the C-terminal region by PSI-BLAST
    (E-value ...) found UBX-containing proteins and
    metazoan homologs of PNGases.
  • PNGases: proteins involved in the UPR.
  • UPR: unfolded protein response.
  • PUG: the name given to the homologous regions.
  • PUG domains are found in proteins with
    domains central to ubiquitin-mediated
    proteolysis (UBA and UBX).

37
  • Conclusion:
    • PUG-containing proteins might link the UPR to
      ubiquitin-mediated protein degradation.

38
PUG
UBA
Believed to have a role in the UPR
39
(No Transcript)
40
Apoptosis: UBX domain from human FAF1
DNA-binding protein: C-terminal UBA domain of the
human homologue of RAD23A (hHR23A)
41
  • Orthologs of PNGases in metazoans are present
    singly (not as multiple paralogs) and are likely
    to have similar cellular localization.
  • The ortholog in Saccharomyces cerevisiae is known
    to be localized mainly in the nucleus.

42
  • An HMM built from the PUG domain shows marginal
    similarity to IRE1p-like kinases, which are also
    known to initiate the UPR.
  • This suggests the presence of divergent PUG
    domains in the C termini of these proteins.
  • Analysis revealed a conserved region in metazoan
    PNGases; it was named PAW and added to SMART.

43
  • The team found 28 novel nuclear domain families.
  • Most of them have representatives in diverse
    molecular contexts in different species.
  • Some are specific to a single species.
  • Others are divergent members of previously
    recognized families.

44
The End