Title: Proteomics
1Proteomics
- Dr. Caroline Clower
- Assistant Professor of Chemistry
- Department of Natural Sciences
- Clayton State University
- ITFN 4800
- April 23, 2009
2Bioinformatics
[Diagram: Bioinformatics linked to Statistics, Evolutionary Biology, Genetics, Mathematics, Biochemistry/Molecular Biology, Computer Science, Chemistry, Medicine, and Physics]
3Bioinformatics vs. Computational Biology
- Used interchangeably
- Actually different terms
- Distinction made by National Institutes of Health
- Bioinformatics
- Refers to the creation and advancement of algorithms
- Computational and statistical techniques to solve problems arising from management and analysis of biological data
- Computational Biology
- Refers to hypothesis-driven investigation of a specific problem using computers, using experimental or simulated data to advance scientific knowledge
4Questions and Answers
- Fundamental questions in science and medicine
- What are the evolutionary origins of this protein?
- What gene does this DNA sequence code for?
- What does this gene do?
- How does this enzyme work and what does it look like?
- When is this gene expressed?
- What genes are expressed before the onset of cancer?
- What drugs can be used to treat this disease?
- What mutations are responsible for this genetic disorder?
- Analytical tools
- Experimental (electrophoresis, spectroscopy, etc.)
- Computational (programs, databases, internet, etc.)
- No single comprehensive database exists for accessing all the information needed to manage data
5Applications
- Locate mutations responsible for genetic diseases and disorders
- Aids in treatment and diagnosis
- Pharmacogenomics
- Designer drugs and therapies
- Biotechnology
- Discover and exploit new enzymes
- Environmental clean-up
- Antibiotics and other chemotherapeutic agents
- Useful products
6The Omics
- Genomics
- DNA sequence: homology, locations of genes and functional sites, phylogeny, mapping, inferring function
- Transcriptomics
- mRNA sequence and structure
- Metabolomics
- Proteins and enzymatic pathways involved in cell metabolism
- Glycomics
- Carbohydrates of a cell
- Interactomics
- Complex interactions of protein networks in a cell
- Nutrigenomics
- Interactions between diet and genes
- Proteomics
7Introduction to Proteomics
- Proteome
- Sum total of an organism's proteins
- Essential to understanding how organisms work (more so than genomics)
- Characterization difficult
- Protein structure is complicated
- Chemical modification can occur
- Proteins must be isolated and purified
- Proteins must be crystallized
- Structure predictions (bioinformatics) are
difficult and often inaccurate
8Proteomics
- Large scale analysis of proteins
- Amino acid sequence and protein structure
- Named in 1995; development began in the 1960s
- Macromolecules carry information
- Availability of computers
- Development of algorithms
- Information
- Protein sequences
- Primary sequence (SWISS-PROT and PIR databases)
- Direct submissions - protein sequencing
- Secondary sequence (GenPept and TrEMBL databases)
- Translations - putative proteins resulting from modifying (i.e. intron splicing) nucleic acid sequence
- Predicted structure (many algorithms)
- Solved structure (Protein Data Bank)
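The "translations" entry above can be illustrated with a minimal sketch. It assumes an intron-free coding sequence and the standard genetic code; the function name `translate` and the example DNA string are made up for illustration and are not taken from any database.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate an intron-free coding sequence, stopping at the first stop codon."""
    seq = coding_dna.upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical coding sequence, chosen only for illustration.
print(translate("ATGGCTGAAGTTACTGATCCTGGTTAA"))  # MAEVTDPG
```

A real pipeline would first have to locate the reading frame and remove introns; this sketch skips both steps.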
9Protein Data Bank
- Archive of 3D structural data of biological macromolecules
- Based on experimental data
- Managed by the Research Collaboratory for Structural Bioinformatics (RCSB)
- Rutgers, UCSD, UW-Madison
- As of March 31, 2009 contained 56751 structures
10Applications of Information
- Structure reveals function
- Porins
- Enzymes
- Knowledge of structure and function allows other applications
- Pharmacotherapy, etc.
11Protein Structure and Proteomics
12Levels of Protein Structure
- Primary
- Linear sequence of amino acids
- Secondary
- Local structure; certain motifs are common
- Tertiary
- Complete 3D shape
- Quaternary
- More than one peptide chain
13Primary Structure
- Linear sequence of amino acids
- Ala-Glu-Val-Thr-Asp-Pro-Gly or AEVTDPG
- Cannot be predicted by any algorithm
- Experimental
- Protein sequencing
- First protein sequenced: insulin
- Approach
- Denature protein
- Break into small segments
- Determine sequences of segments
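The two notations for a primary sequence shown above (three-letter and one-letter codes) map onto each other directly. The following is a small sketch of that conversion; the function name `to_one_letter` is an illustrative choice, and only the 20 standard amino acids are covered.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split("-"))


print(to_one_letter("Ala-Glu-Val-Thr-Asp-Pro-Gly"))  # AEVTDPG
```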
14Example
Margaret Dayhoff's FORTRAN programs
15Secondary Structure
- Regular repeating structure
- Helices (coil)
- Sheets (extended zig-zag)
- Turns (short loops)
- Rotation of backbone
- Only some allowed angles
- Ramachandran diagram
16Classification of Proteins by Secondary Structure
- Fibrous
- High composition of single secondary structure
- Strong and flexible
- Collagen (connective tissue, skin, tendons, cartilage)
- Triple helix
- Silk fibroin
- β-sheet
- α-Keratin (hair, wool, skin, nails)
- Coiled coil
- β-Keratin (scales, feathers)
- β-sheet
17Classification of Proteins by Secondary Structure
- Globular
- Majority of all proteins
- Contain several types of secondary structure
- Percentage of protein (on average)
- 31% α-helix
- 28% β-sheet
- 13% turns/bends
- 28% loops and random coil
- Compact spherical shapes
18Secondary Structure Prediction
- Based on observed frequencies in known structures
- Chou-Fasman algorithm
- P(α), P(β), P(turn)
- Probability of an amino acid (AA) participating in various secondary structures
- Uses a window of 6 residues
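As a rough illustration of the window idea, the sketch below scans a sequence with a 6-residue window and flags possible helix starts. It is a simplification, not the full Chou-Fasman procedure: only the helix propensities are used, the P(α) values are the commonly quoted ones and should be treated as illustrative, and the "at least 4 helix formers out of 6" rule and the function name are assumptions made for this example.

```python
# Commonly quoted Chou-Fasman helix propensities P(alpha); treat as illustrative.
P_ALPHA = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}
WINDOW = 6       # window size named on the slide
MIN_FORMERS = 4  # assumed nucleation rule: 4 of 6 residues favor a helix


def helix_nucleation_windows(sequence: str) -> list[int]:
    """Return start indices of windows that look like helix nucleation sites."""
    starts = []
    for start in range(len(sequence) - WINDOW + 1):
        window = sequence[start:start + WINDOW]
        formers = sum(1 for aa in window if P_ALPHA[aa] > 1.0)
        if formers >= MIN_FORMERS:
            starts.append(start)
    return starts


print(helix_nucleation_windows("AEVTDPG"))  # [0]
```

The same scan could be repeated with P(β) and P(turn) tables to compare the competing structures at each position.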
19Secondary Structure Prediction
- PELE: results from many different algorithms
- H = α helix
- E = β strand
- T = β turn
- C = random coil
- Algorithms listed to the right (initials indicate authors)
- JOI = joint prediction
- Assigns the structure using a "winner takes all" procedure
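A "winner takes all" joint prediction can be sketched as a per-position majority vote over the individual predictions. This is an assumed reading of the procedure, not PELE's actual code; the function name and the three example prediction strings are hypothetical, and ties are broken in favor of the state seen first.

```python
from collections import Counter


def joint_prediction(predictions: list[str]) -> str:
    """Per-position majority vote over equal-length prediction strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        # most_common(1) returns the most frequent state; on a tie,
        # the state encountered first in the column wins.
        state, _count = Counter(column).most_common(1)[0]
        consensus.append(state)
    return "".join(consensus)


# Hypothetical outputs of three predictors for a 5-residue stretch.
print(joint_prediction(["HHHEC", "HHEEC", "HCEEC"]))  # HHEEC
```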
20Tertiary Structure
- 3D structure (overall shape) of an entire polypeptide
- Interactions between secondary structural components
- No algorithm predicts the 3D shape with high accuracy
- Can predict cellular location and sequence motifs
- Experimentally determined by
- X-ray crystallography (85% of structures in the PDB)
- 2D Nuclear Magnetic Resonance (15%)
- Computational modeling/Homology modeling (<1%)
21X-ray crystallography
- Precise positions of atoms
- Mathematical analysis of film
- Fourier transform
- Creates electron density map
- Combine with principles of chemical bonding to
create structure
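The Fourier step can be illustrated with a one-dimensional toy model. The sketch below assumes a hypothetical "crystal" with two point atoms, computes structure factors from them, and then sums the Fourier series back into a density profile. Real crystallography is three-dimensional and measures only amplitudes, not phases, so this shows the mathematics of the synthesis rather than the experimental workflow.

```python
import cmath
import math

# Hypothetical 1D "crystal": (fractional position, scattering weight) per atom.
ATOMS = [(0.25, 6.0), (0.70, 8.0)]
MAX_INDEX = 10     # use reflections h = -10 .. 10
GRID_POINTS = 100  # sample the unit cell at x = k / 100


def structure_factor(h: int) -> complex:
    """F(h) = sum over atoms of f * exp(2*pi*i*h*x)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS)


def electron_density(x: float) -> float:
    """Fourier synthesis: rho(x) = sum over h of F(h) * exp(-2*pi*i*h*x)."""
    total = sum(
        structure_factor(h) * cmath.exp(-2j * math.pi * h * x)
        for h in range(-MAX_INDEX, MAX_INDEX + 1)
    )
    return total.real


density = [electron_density(k / GRID_POINTS) for k in range(GRID_POINTS)]
peak = max(range(GRID_POINTS), key=lambda k: density[k])
print(peak / GRID_POINTS)  # 0.7, the position of the heavier atom
```

The density peaks sit at the atom positions, which is why the map can be combined with bonding rules to build a model.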
222D-NMR NOESY
- Graphically displays atoms in close proximity
- Calculates structure
23Computational Modeling
- Sequence alignment to find homologous protein
- BLAST algorithm
- Searches for maximal local alignments
- Inserts gaps to optimize alignment
- Breaks sequence down into subsequences (words)
- Searches for those words in database
- Extends sequence on either side of word to look
for certain score (threshold)
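The word-lookup stage described above can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it finds only exact word matches, assumes a word length of 3, and stops before the extension step. The function names and the one-entry database are hypothetical.

```python
from collections import defaultdict

WORD_SIZE = 3  # assumed word length for this sketch


def words(sequence: str, size: int = WORD_SIZE) -> list[tuple[int, str]]:
    """Break a sequence into overlapping words, keeping each word's offset."""
    return [(i, sequence[i:i + size]) for i in range(len(sequence) - size + 1)]


def build_index(database: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map every word in the database to the (sequence name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, subject in database.items():
        for offset, word in words(subject):
            index[word].append((name, offset))
    return index


def seed_hits(query: str, database: dict[str, str]) -> list[tuple[str, int, int]]:
    """Return (subject name, query offset, subject offset) for each shared word."""
    index = build_index(database)
    hits = []
    for q_offset, word in words(query):
        for name, s_offset in index.get(word, []):
            hits.append((name, q_offset, s_offset))
    return hits


database = {"crayfish": "IVGGTDAVLGEFPYQ"}
print(seed_hits("IVGGYNCEENSVPYQ", database))
# [('crayfish', 0, 0), ('crayfish', 1, 1), ('crayfish', 12, 12)]
```

Each hit would then be extended in both directions and kept only if the alignment score reaches the threshold.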
24Scoring Alignments
- Scoring matrices based on similarity and probability of substitutions/mutations
- Example
- Alignment of mouse and crayfish trypsin
- Raw score: 30
- Evaluate alignment with
- Alignment score (S)
- E score: expected number of sequences with scores of at least S that would be found randomly
- Low value for E indicates search result is not random and sequences are likely to be related
Mouse      I   V   G   G   Y   N   C   E   E   N   S   V   P   Y   Q
Score      5   4   5   5  -3   2  -2  -2  -3   0   0  -1   6  10   4
Crayfish   I   V   G   G   T   D   A   V   L   G   E   F   P   Y   Q
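The raw score is the sum of the substitution-matrix entries for each aligned pair. The sketch below assumes the per-pair values are PAM250 entries (the slide does not name the matrix) and includes only the pairs that occur in this alignment. With E/V = -2 and E/L = -3, the column scores add up to the stated raw score of 30.

```python
# Assumed PAM250 entries, limited to the residue pairs in this alignment.
# Keys are alphabetically sorted pairs so that lookup is order-independent.
PAM250_SUBSET = {
    ("I", "I"): 5, ("V", "V"): 4, ("G", "G"): 5, ("T", "Y"): -3,
    ("D", "N"): 2, ("A", "C"): -2, ("E", "V"): -2, ("E", "L"): -3,
    ("G", "N"): 0, ("E", "S"): 0, ("F", "V"): -1, ("P", "P"): 6,
    ("Y", "Y"): 10, ("Q", "Q"): 4,
}


def pair_score(a: str, b: str) -> int:
    """Look up the substitution score for one aligned residue pair."""
    return PAM250_SUBSET[tuple(sorted((a, b)))]


def raw_score(seq_a: str, seq_b: str) -> int:
    """Sum the pair scores of an ungapped alignment of equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


mouse = "IVGGYNCEENSVPYQ"
crayfish = "IVGGTDAVLGEFPYQ"
print(raw_score(mouse, crayfish))  # 30
```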
25Protein Modeling Programs
- Investigate and manipulate structure
- Structure overlays and alignment
- Pharmacophores
- Drug design from protein structure
26Websites of Interest
- ExPASy proteomics tools (http://us.expasy.org/tools/)
- NCBI (http://www.ncbi.nlm.nih.gov/)
- PDB (www.pdb.org)
- Protein Explorer (www.proteinexplorer.org)