Structural Genomics and the Protein Folding Problem

1
Structural Genomics and the Protein Folding
Problem
  • George N. Phillips, Jr.
  • University of Wisconsin-Madison
  • February 15, 2006

2
From DNA to biological function
Modeling Inference
Basic Understanding/ Applications (e.g.
therapeutics)
Gene Model
Functional Assignments
High-throughput DNA Sequencing
Structure Determination Experimental Analysis
3
Developing a gene model
Glimmer (Gene Locator and Interpolated Markov ModelER);
GlimmerHMM for eukaryotic genomes (more advanced)
Genome sequencing, genome assembly, regulatory
elements, identification of ORFs
All but the simplest genomes are works in
progress. It is estimated that 80% of gene
models have errors at present! Comparative
genomics should help the process, as will
sequencing of expressed sequence tags and other
genomics projects.
Efficient implementation of a generalized pair
hidden Markov model for comparative gene
finding. W.H. Majoros, M. Pertea, and S.L.
Salzberg. Bioinformatics 21(9) (2005), 1782-88.
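
The "identification of ORFs" step can be illustrated with a naive scan for open reading frames. This is only a sketch under simplifying assumptions: it looks at the forward strand, treats ATG as the only start codon, and ignores introns. It is not how Glimmer or GlimmerHMM work (those use interpolated and hidden Markov models), and the function name and the `min_codons` parameter are illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start/end are 0-based, end-exclusive, and include the stop codon.
    min_codons counts ATG plus the sense codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence one codon at a time in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence invented for illustration only.
    print(find_orfs("CCATGAAATTTGGGTAACC"))
```

A scan like this over-predicts badly on real genomes, which is one reason statistical gene finders and comparative evidence are needed.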
4
The sequence-space of proteins
PSI-BLAST HMM
Pfam Many others
Universe of all protein sequences
[Figure: families of aligned sequences within the universe of all protein sequences, e.g.]
HYSIELNASLLERGV
HLNIEDNPSCNAMGV
PLNIELNASLNEPGV
WERIELNASLNER--
HQRIEL--SLMMRG-
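
The aligned family on this slide can be summarized column by column, which is the starting point for the profile methods named above. The sketch below counts residues per column and derives a simple majority consensus. It is a deliberately minimal stand-in: PSI-BLAST and the profile HMMs behind Pfam use weighted, probabilistic models, and the function names here are illustrative.

```python
from collections import Counter

# The five aligned sequences shown on the slide ("-" marks a gap).
ALIGNMENT = [
    "HYSIELNASLLERGV",
    "HLNIEDNPSCNAMGV",
    "PLNIELNASLNEPGV",
    "WERIELNASLNER--",
    "HQRIEL--SLMMRG-",
]


def column_counts(alignment):
    """Return one Counter of characters per alignment column."""
    return [Counter(column) for column in zip(*alignment)]


def consensus(alignment, gap="-"):
    """Majority residue per column, ignoring gaps.

    Ties are broken alphabetically so the result is deterministic.
    """
    result = []
    for counts in column_counts(alignment):
        residues = {r: n for r, n in counts.items() if r != gap}
        if not residues:
            result.append(gap)
            continue
        best = min(residues, key=lambda r: (-residues[r], r))
        result.append(best)
    return "".join(result)


if __name__ == "__main__":
    print(consensus(ALIGNMENT))
```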
5
PFAM domains
Alex Bateman, Lachlan Coin, Richard Durbin,
Robert D. Finn, Volker Hollich, Sam
Griffiths-Jones, Ajay Khanna, Mhairi Marshall,
Simon Moxon, Erik L. L. Sonnhammer, David J.
Studholme, Corin Yeats and Sean R. Eddy, Nucleic
Acids Research (2004) Database Issue 32:D138-D141
6
Flow of information from DNA to functional
understanding
Modeling Inference
Basic Understanding/ Applications (e.g.
therapeutics)
Gene Model
Functional Assignments
High-throughput DNA Sequencing
Structure Determination Experimental Analysis
7
X-ray Laboratory
8
Crystallography reveals the locations of the electron
clouds of the atoms, and the polypeptide chain
can be traced through space
9
The fold-space of proteins
SCOP, CATH
Universe of all protein structures
10
Murzin et al. http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html
11
Glimpses of the fold space of proteins
Hou, Sims, Zhang, and Kim, PNAS 100:2386 (2003)
12
Flow of information from DNA to functional
understanding
Modeling Inference
Basic Understanding/ Applications (e.g.
therapeutics)
Gene Model
Functional Assignments
High-throughput DNA Sequencing
Structure Determination Experimental Analysis
13
Connections between sequence and structure
Universe of sequences
Universe of structures
14
Connections between sequence and structure
?
Universe of sequences
Universe of structures
15
At what level of homology can one trust a
structural inference?
Redfern, Orengo et al., J. Chromatography B
815:97 (2005)
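
One common way to quantify the "level of homology" is percent identity over the aligned, ungapped columns of a pairwise alignment. The sketch below computes that number. The cutoff is left as a caller-supplied parameter because the slide does not state one; the function names are illustrative, and identity alone is a crude proxy for the profile-based measures used in practice.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def passes_cutoff(seq_a, seq_b, cutoff_percent):
    """True if the pair meets a caller-chosen identity cutoff.

    The cutoff is an assumption the caller must supply; no value is
    taken from the talk.
    """
    return percent_identity(seq_a, seq_b) >= cutoff_percent


if __name__ == "__main__":
    # Two rows of the aligned family shown earlier in the deck.
    print(percent_identity("HLNIEDNPSCNAMGV", "PLNIELNASLNEPGV"))
```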
16
What is structural genomics?
  • Experimental determination of key structures
    (target selection is a key part of the idea)
  • Modeling of family members
  • Inferring function (note: "infer")
  • Making direct use of the new structures

17
Protein Sequences and Folds
  • 100,000 families of proteins that cannot be
    reliably modeled at present (modeling families
    structure)
  • 50% of all domain families can be assigned to a
    structure under CATH

18
Protein Structure Initiative (PSI) Mission
Statement
To make the three-dimensional atomic level
structures of most proteins easily available from
knowledge of their corresponding DNA sequences.
19
From John Norvell - NIH
20
Generation of new structures
Chandonia and Brenner, Science 311:347 (2006).
21
Center for Eukaryotic Structural Genomics
  • Exclusively eukaryotic targets
  • 60% fold-space targets (emphasis on
    eukaryote-only families)
  • 20% disease relevant
  • 20% outreach targets from the community
  • Overall goals are to reduce the costs of
    determining structures of proteins from
    eukaryotes by refining all steps in the pipeline
  • Supported by National Institutes of Health
  • John Markley, PI; George Phillips and Brian Fox,
    Co-PIs

22
University of Wisconsin's Center for Eukaryotic
Structural Genomics (75 total, 3/4 unique)
23
How does one clone, express, purify, and solve
structures not previously studied? An
industry-style pipeline
24
Pipeline details: cell-based and cell-free
protein production for X-ray and NMR
Note: the project involves sequencing, which aids
gene modeling!
25
Sesame: integrated LIMS in use at CESG
Open access to the public: structures, protocols,
reagents, progress
http://www.uwstructuralgenomics.org
Zolnai et al., J. Struct. Funct. Genomics 4:11
(2003)
26
At1g18200
  • Mis-annotated prior to our work, but structure
    led to discovery of function.

27
Pfam B 13 and 136 matches to s 7198 and 11634
Alignment of GalP_UDP_transf vs 1Z84:A|PDBID|CHAIN|SEQUENCE/15-196
(model consensus on the upper line, 1Z84 chain A on the lower line, residue numbers at both ends)

model       kkfsplDhvhrrynpLtlvwilVsphrakRPikqsqsLidlkkeLwq
1Z84:A   15 GDSVENQSPELRKDPVTNRWVIFSPARAKRP---------------- 45

model       gavetpkvptdplhdp.dcysakLcpg........atratgevNPdyest
1Z84:A   46 -TDFKSKSPQNPNPKPsSCP---FCIGreqecapeLFRVP-DHDPNWKLR 90

model       yvLkspkkftndFyalseDnpyikvsvSNeaIaknplfqlksvrGhelci
1Z84:A   91 VI-------ENLYPALSRN---LETQ------------STQPETG--TSR 116

model       VI...CF......SKPehDptlpalakeeirevvdaWqlcteelGyegre
1Z84:A  117 TIvgfGFhdvvieS-PVHSIQLSDIDPVGIGDILIAYKKRINQIA----- 160

model       nhpayqnvqIFEmNkGaemGcsnpHPYaYFnEHGQvwatsfiP
1Z84:A  161 QHDSINYIQVFK-NQGASAGASMSHS------HSQMMALPVVP 196

http://www.sanger.ac.uk/Software/Pfam/
28
Blind prediction of structure: CASP and At5g18200
29
Flow of information from DNA to functional
understanding
Modeling Inference
Basic Understanding/ Applications (e.g.
therapeutics)
Gene Model
Functional Assignments
High-throughput DNA Sequencing
Structure Determination Experimental Analysis
30
Function space of proteins
KEGG: Kyoto Encyclopedia of Genes and Genomes
The Gene Ontology project (GO)
Metabolism
Cellular Processes
Enzymes
Signal Processing
Don't forget that protein-protein interactions exist
also!
31
At2g17340
  • Related to a human protein associated with
    Hallervorden-Spatz syndrome, a neurological
    disorder?

32
Parallel Enzyme Activity Testing (collaboration
with University of Toronto)
81 protein samples sent to Toronto: 8 solved
CESG structures, 73 randomly chosen
Generalized assays for phosphatase, esterase,
phosphodiesterase, protease, amino acid
dehydrogenase, alcohol dehydrogenase, organic
acid dehydrogenase, amino acid oxidase, alcohol
oxidase, organic acid oxidase, beta-lactamase,
beta-galactosidase, arylsulfatase, lipase.
Results
  • Solid hits: 3 phosphatases, 5 esterases
  • Weaker hits: 9 more esterases, 6
    phosphodiesterases
  • No hits: all others
A. Yakunin et al., Current Opinion in Chemical
Biology 8:42 (2004)
33
Target At2g17340/JR5670
Initial Assay: Wide-spectrum
  • Absorbance 0.25 is a tentative signal, 0.5 is
    a strong signal.
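
The two absorbance levels on this slide can be written as a small classifier. This is a minimal sketch: it assumes the two values act as inclusive lower bounds, which the slide does not state, and the constant and function names are illustrative rather than part of any CESG or Toronto assay software. The readings in the usage example are invented placeholders, not data from the talk.

```python
TENTATIVE_THRESHOLD = 0.25  # from the slide: tentative signal
STRONG_THRESHOLD = 0.5      # from the slide: strong signal


def classify_absorbance(absorbance):
    """Label a reading as 'strong', 'tentative', or 'none'.

    Assumes thresholds are inclusive lower bounds (not stated on the slide).
    """
    if absorbance >= STRONG_THRESHOLD:
        return "strong"
    if absorbance >= TENTATIVE_THRESHOLD:
        return "tentative"
    return "none"


def screen(readings):
    """Map each assay name to its signal label."""
    return {assay: classify_absorbance(value) for assay, value in readings.items()}


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    example = {"phosphatase": 0.62, "esterase": 0.31, "protease": 0.04}
    print(screen(example))
```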

34
Flow of information from DNA to functional
understanding
Modeling Inference
Basic Understanding/ Applications (e.g.
therapeutics)
Gene Model
Functional Assignments
High-throughput DNA Sequencing
Structure Determination Experimental Analysis
35
At2g17340
  • Enzyme of unknown specificity.

36
A functional annotation lesson
37
Functional Annotation by Inference
From raw DNA sequences, one looks for genomic
features such as promoters, alternative splicing
of mRNAs, retrotransposons, pseudogenes, tandem
duplications, synteny, and homology. It is
homology, both from sequence and from structure,
that allows functional inferences to be made.
Prosite, Dali, VAST, FFAS03
Some tools integrate knowledge from many sources
into one place, acting as meta-servers of clues.
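
The idea of a meta-server, pooling clues from several sources, can be sketched as a weighted tally of candidate annotations. Everything here is an assumption for illustration: the code does not call Prosite, Dali, VAST, or FFAS03, the source labels are plain strings, and the weights and candidate functions in the usage example are invented. Real meta-servers use their own scoring schemes.

```python
from collections import defaultdict


def combine_annotations(hits):
    """Rank candidate annotations by summed evidence weight.

    hits: iterable of (source, annotation, weight) tuples.
    Returns a list of (annotation, total_weight, sorted_sources),
    highest total first; ties are ordered alphabetically.
    """
    totals = defaultdict(float)
    sources = defaultdict(set)
    for source, annotation, weight in hits:
        totals[annotation] += weight
        sources[annotation].add(source)
    ranked = sorted(totals, key=lambda name: (-totals[name], name))
    return [(name, totals[name], sorted(sources[name])) for name in ranked]


if __name__ == "__main__":
    # Hypothetical evidence: labels and weights are made up.
    evidence = [
        ("sequence motif search", "phosphatase", 1.0),
        ("structure comparison", "phosphatase", 2.0),
        ("fold recognition", "esterase", 1.5),
    ]
    for annotation, score, supporting in combine_annotations(evidence):
        print(annotation, score, supporting)
```

A ranked list like this is still only an inference; as the assay slides show, the candidate function has to be tested experimentally.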
38
Connections between structure and function
Universe of functions
Universe of structures
39
Connections between structure and function
Convergent evolution
Universe of functions
Universe of structures
40
Connections between structure and function
Divergent evolution
Universe of functions
Universe of structures
41
At1g18200
  • Misleading annotation prior to our work, but
    structure led to discovery of function.

42
Flow of information from DNA to functional
understanding
Modeling Inference
Basic Understanding/ Applications (e.g.
therapeutics)
Gene Model
Functional Assignments
High-throughput DNA Sequencing
Structure Determination Experimental Analysis
43
Summary
  • Structural genomics efforts are gaining momentum
    and helping to assign new functions to ORFs and
    to fill in the space of all possible protein
    folds.

44
The Center for Eukaryotic Structural
Genomics (supported by NIH GM64598 and GM074901)
  • Administration: Madison (Primm, Troestler,
    Markley, Phillips, Fox)
  • Cloning/sequencing pipeline: Madison (Wrobel, Fox)
  • Expression pipeline: Madison (Frederick, Fox,
    Riters)
  • E. coli cell growth pipeline: Madison (Sreenath,
    Burns, Seder, Fox)
  • Cell-Free System: Madison (Vinarov, Markley,
    Newman)
  • Protein purification pipeline: Madison (Vojtik,
    Phillips, Fox, Ellefson, Jeon)
  • Mass spectrometry: Madison (Aceti, Sabat, Sussman)
  • NMR spectroscopy: Madison NMRFAM (Song, Tyler,
    Cornilescu, Markley); Milwaukee MCW (Peterson,
    Volkman, Lytle)
  • Crystallization / crystallography: Madison
    (Bingman, Phillips, Bitto, Han, Bae, Meske);
    Argonne (Advanced Photon Source)
  • Bioinformatics: Madison (Bingman, Sun, Phillips,
    Wesenberg); Indianapolis (Dunker); Milwaukee MCW
    (Twigger, de la Cruz)
  • Computational support: Madison (Bingman, Ramirez,
    Phillips)
  • Sesame: Madison (Zolnai, Markley, Lee)