Bioinformatics Tools



Slides: 49

Provided by: chemistry60

Learn more at: https://chemistry.gsu.edu


Transcript and Presenter's Notes


1
Bioinformatics Tools
2
Bioinformatics

The use of computer science, mathematics, and
information theory to model and analyze
biological systems, especially systems involving
genetic material.
Things I can do with a computer to improve and
accelerate my work.

3
Applications of Bioinformatics

Manage and analyze data in very large databases
genetic info (DNA)
protein sequences, structures
Collections of scientific papers, experimental
results
Compare sequences and structures
Do similar sequences or folding indicate proteins
have similar functions?
Modeling and prediction
Predict 3D structure from known structures
(homology) or based on some computational
approach without modeling (ab initio)
Prediction of function from structure
Molecular mechanics/ molecular dynamics
Prediction of molecular interactions, docking
Perform energy minimization calculations
Predict useful mutations for protein engineering

4
Sources of Data

Sequence databases (EBI)
FASTA (sequence similarity)
http://www.ebi.ac.uk/Tools/fasta33/
SwissProt (database of protein sequences)
http://expasy.org/sprot/
3D structure database: the RCSB PDB
http://www.rcsb.org/pdb/home/home.do

5
Sequence Analyses

Sequence Alignment
Single or Multiple Sequences
Motif or Pattern Search
Prediction of Secondary Structure
>1E9N:A|PDBID|CHAIN|SEQUENCE
MPKRGKKGAVAEDGDELRTEPEA
KKSKTAAKKNDKEAAGEGPALYEDPPDQKTSPSGKPATLKICSWNVDGLR
AWIKKKGLDWVKEEAPDILCLQETKCSENKLPAELQELPGLSHQYWLAL
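
Most of the tools in this deck take a FASTA record as input. The following is a minimal sketch of reading one, assuming the header uses the `>ID:CHAIN|PDBID|CHAIN|SEQUENCE` layout; the function name `parse_fasta` and the shortened example record are illustrative, not part of any tool mentioned here.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Shortened example record (first two sequence lines only).
example = """>1E9N:A|PDBID|CHAIN|SEQUENCE
MPKRGKKGAVAEDGDELRTEPEA
KKSKTAAKKNDKEAAGEGPALYEDPPDQKTSPSGKPATLKICSWNVDGLR
"""

records = parse_fasta(example)
header, sequence = records[0]
pdb_id = header.split("|")[0].split(":")[0]
print(pdb_id, len(sequence))  # 1E9N 73
```

Sequence lines are joined without separators, so the wrapped lines of a record become one continuous string.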

6
Sequence Alignment

Usually first step in analysis of any
new/unidentified sequence is to perform
comparisons with sequence databases to find
existing homologues.
This might give you some idea of
How the protein might potentially fold
What other proteins it is related to
What its function might be
FASTA (http://www.ebi.ac.uk/Tools/fasta33/)
One of several web servers you can use for this.
Provides similarity search against protein
database.
Lets you select substitution matrix (BLOSUM50,
BLOSUM62, etc.) for search.
A substitution matrix describes the rate at which
one character in a sequence changes to other
character states over time.
One would use a higher numbered BLOSUM matrix for
aligning two closely related sequences and a
lower number for more divergent sequences.

7
Gaps In Sequence Alignments

When aligning sequences, score is affected by how
much penalty is assigned to gaps in sequence.
For larger gaps
Assumes greater evolutionary distance between
sequences
Probably should be assigned a higher penalty

Smaller gap, smaller penalty:
ATCTTCAGTGTTTCCCCTGTTTTGCCC.ATTTAGTTCGCTC
ATCTTCAGTGTTTCCCCTGTTTTGCCCGATTTAGTTCGCTC

ATCTTCAGTGTTTCCCCTGTTTTGCCC....................ATTTAGTTCGCTC
ATCTTCAGTGTTTCCCCTGTTTTGCCCGCCCCCCCCCCCCCCCCCCCATTTAGTTCGCTC
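
One common way to make larger gaps cost more is an affine penalty: a fixed cost to open a gap plus a smaller cost for each additional position. The sketch below applies that idea to a 1-position and a 20-position gap. The function names and the `gap_open` / `gap_extend` values are illustrative assumptions, not the settings of FASTA, BLAST, or any other server.

```python
def gap_runs(aligned, gap_chars=".-"):
    """Return the lengths of consecutive gap runs in one aligned sequence."""
    runs, current = [], 0
    for ch in aligned:
        if ch in gap_chars:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def affine_gap_penalty(aligned, gap_open=10.0, gap_extend=0.5):
    """Each gap run costs gap_open, plus gap_extend per position after the first."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(aligned))


prefix = "ATCTTCAGTGTTTCCCCTGTTTTGCCC"
suffix = "ATTTAGTTCGCTC"
short_gap = prefix + "." * 1 + suffix
long_gap = prefix + "." * 20 + suffix

print(affine_gap_penalty(short_gap))  # 10.0
print(affine_gap_penalty(long_gap))   # 19.5
```

With these placeholder values the 20-position gap is penalized more than the single-position gap, but far less than 20 separate gaps would be.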
8
Other Sequence Databases

BLAST and PSI-BLAST also commonly used.
BLAST can be found at
http://www.ncbi.nlm.nih.gov/BLAST/
PSI-BLAST can be found at
http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on

9
BLAST Entry Window

Enter FASTA sequence or upload file
Choose your search set
Select Program

10
Results for mutant 1exr

Calmodulin At 1.68 Angstroms Resolution
Length = 148
Score = 267 bits (683), Expect = 2e-70,
Method: Compositional matrix adjust.
Identities = 142/153 (92%), Positives = 142/153
(92%), Gaps = 8/153 (5%)

Query  1    AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN  60
            AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
Sbjct  1    AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN  60

Query  61   GTIDFPEFLSLMARKMKEQDDQFDQSEEELIEAFKVFDRFFFGLISAAELRHV---LGEK  117
            GTIDFPEFLSLMARKMKEQD     SEEELIEAFKVFDR    GLISAAELRHV   LGEK
Sbjct  61   GTIDFPEFLSLMARKMKEQD-----SEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEK  115

Query  118  LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK  150
            LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK
Sbjct  116  LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK  148

Labeled regions: Deleted, Inserted, Mutated
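
The Identities and Gaps figures in a BLAST report can be recounted directly from the two aligned strings. The sketch below does this for the query/subject pair on this slide; `alignment_stats` is a hypothetical helper, and truncating the percentages to integers is an assumption chosen because it reproduces the 92% and 5% quoted in the report.

```python
def alignment_stats(query, subject, gap="-"):
    """Count identities, mismatches and gap columns in a pairwise alignment."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identities = mismatches = gaps = 0
    for q, s in zip(query, subject):
        if q == gap or s == gap:
            gaps += 1
        elif q == s:
            identities += 1
        else:
            mismatches += 1
    length = len(query)
    return {
        "length": length,
        "identities": identities,
        "mismatches": mismatches,
        "gaps": gaps,
        # Integer percentages, truncated rather than rounded.
        "identity_pct": 100 * identities // length,
        "gap_pct": 100 * gaps // length,
    }


query = (
    "AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN"
    "GTIDFPEFLSLMARKMKEQDDQFDQSEEELIEAFKVFDRFFFGLISAAELRHV---LGEK"
    "LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK"
)
subject = (
    "AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN"
    "GTIDFPEFLSLMARKMKEQD-----SEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEK"
    "LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK"
)

stats = alignment_stats(query, subject)
print(stats["identities"], stats["gaps"], stats["length"])  # 142 8 153
```

The three mismatched columns correspond to the mutated residues, and the eight gap columns to the inserted and deleted stretches.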
11
Swiss-Prot/UniProt Database

Central hub for the collection of functional
information on proteins.
amino acid sequence
protein name or description
taxonomic data and citation information
Access point to ProSite
http://www.expasy.ch/tools/scanprosite/
ProSite identifies sequences and displays any
associated motifs, or accepts a motif and returns
related sequences.

12
What are Motifs?

A different approach for incorporating multiple
sequence information into a database search is to
use a Motif.
Motifs do not assign a score at every position in
an alignment, but describe key residues that are
conserved and define the family. Sometimes this
is called a "signature".
Example of pseudo-EF-Hand motif (Calciomics
Pattern Search, developed in Dr. Yang's lab)
[LMVITNF]-[FY]-X(2)-[YHIVF]-[SAITV]-X(5,9)-[LIMV]-
X(3)-[EDS]-[LFM]-[KRQL]-X(20,28)-[LQKF]-[DNG]-X(1)-
[DNSC]-X(1)-[DKN]-X(4)-[FY]-X(1)-[EKS]
Specific residues can also be excluded by
enclosing in curly brackets, e.g. {DE}
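
A pattern written this way maps almost directly onto a regular expression. The sketch below assumes the usual PROSITE-style conventions (square brackets for allowed residues, curly brackets for excluded residues, `x(n)` or `x(n,m)` for any residue repeated) and uses a short made-up pattern, not the pseudo-EF-hand signature. The helper name `prosite_to_regex` is hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    compact = "".join(pattern.split())  # drop line breaks and spaces
    parts = []
    for element in compact.split("-"):
        if not element:
            continue
        # Split "core(n)" or "core(n,m)" into the core and its repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core.lower() == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # one of the listed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = re.escape(core)           # a literal residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


regex = prosite_to_regex("[FY]-x(2)-{DE}-[LIMV]")
print(regex)  # [FY].{2}[^DE][LIMV]

hit = re.search(regex, "AAFGGKLAA")
print(hit.group(0), hit.start())  # FGGKL 2
```

A sequence with D or E at the excluded position, such as "AAFGGDLAA", produces no match.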

13
Multiple Sequence Alignment (MSA)

Alignments can provide information on
domain structure
location of residues likely to be involved in
protein function
Solvent exposure of residues
Evolutionary relationships
Build profiles for more sensitive searches
What you can do with this information
Create signatures for pattern searching
Identify conserved vs. variable regions
Identify structural and/or functional motifs

14
MSA with ClustalW
http://www.ebi.ac.uk/Tools/clustalw2/index.html
15
Ca-O-C angles for a) Non-EF-Hand and b) EF-Hand
16
Distribution of SC angles
Figure: unrooted N-J phylogenetic tree generated by Treeview. Leaf labels: S100, Calbindin d9k, Penta-EF, Parvalbumin, Osteonectin, Polcalcin, 1WDC C, 2BL0 C, 2HQ8, 2H2K. Marked regions: R1, R2.
17
Distribution of MC angles
Figure: unrooted N-J phylogenetic tree generated by Treeview. Leaf labels: S100, Calbindin d9k, Penta-EF, Parvalbumin, Osteonectin, Polcalcin, 1WDC C, 2BL0 C, 2HQ8, 2H2K. Marked regions: R1, R2.
18
Secondary Structure: PDBSum

http://www.ebi.ac.uk/pdbsum/
Predicted secondary (2°) structure from sequence
Either enter PDB file or can load new/existing
sequence

19
Secondary Structure: PDBSum
2oky
20
Protein Data Bank

http://www.rcsb.org/pdb/home/home.do
Comprehensive database of protein structures
Provides
3D structural data
FASTA sequence
Citation Info (who solved it, related
publications, etc.)
experimental methods (X-Ray Diffraction, NMR)
resolution
classification (e.g. metal transporter)
ligands, cofactors
Related PDB entries

21
PDB ATOM/HETATM Record Format
Data Record Partitioning
Columns  Field
1-6      Record name "ATOM  " or "HETATM"
7-11     Atom serial number
13-14    Chemical symbol (right justified)
18-20    Residue name
22       Chain identifier
23-26    Residue sequence number
31-38    X-coordinate
39-46    Y-coordinate
47-54    Z-coordinate
55-60    Occupancy
61-66    Isotropic B-factor
77-78    Element symbol
Occupancy: Indicates frequency an atom is detected in specific location. Where occupancy < 1.00, x-ray diffraction indicates more than 1 position, i.e. there is flexibility or disorder.
B-Factor: Thermal motion of atom. High B-factor implies uncertainty.
Text View of PDB File
ATOM      1  N   ALA A  43      69.834  21.345  42.623  1.00 76.76           N
ATOM      2  CA  ALA A  43      69.016  22.376  41.988  1.00 72.63           C
ATOM      3  C   ALA A  43      67.991  21.777  41.038  1.00 63.96           C
ATOM      4  O   ALA A  43      66.942  22.368  40.784  1.00 56.68           O
ATOM      5  CB  ALA A  43      69.924  23.339  41.198  1.00 72.97           C
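
Because ATOM/HETATM records are fixed-width, they are read by column position rather than by splitting on whitespace. The sketch below slices one record using the column ranges listed on this slide; it assumes the atom name occupies columns 13-16 (as in the wwPDB format description), and `parse_atom_record` is an illustrative helper, not part of any PDB toolkit. The sample line is assembled field by field so the column widths stay visible.

```python
def parse_atom_record(line):
    """Parse one fixed-width ATOM/HETATM line into a dictionary."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "record": record,                   # columns 1-6
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16 (assumed width)
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21],                  # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
        "occupancy": float(line[54:60]),    # columns 55-60
        "b_factor": float(line[60:66]),     # columns 61-66
        "element": line[76:78].strip(),     # columns 77-78
    }


# The C-alpha record from the slide, built from its fixed-width fields.
line = (
    "ATOM  " + "    2" + " " + " CA " + " " + "ALA" + " " + "A" + "  43"
    + " " * 4
    + "  69.016" + "  22.376" + "  41.988"
    + "  1.00" + " 72.63"
    + " " * 10 + " C"
)

atom = parse_atom_record(line)
print(atom["name"], atom["x"], atom["b_factor"])  # CA 69.016 72.63
```

Splitting on whitespace instead would fail on files where adjacent fields touch, for example large coordinates or four-digit residue numbers.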
22
PyMOL Viewer
Can save session, including labels, angles,
distances, etc. These features can be turned on
or off without loss of data.
23
Proteomics Tools: External tools to extract PDB
Data

http://bip.weizmann.ac.il/oca-bin/lpccsu/
LPC: Analysis of interatomic Contacts in
Ligand-Protein complexes
CSU: Analysis of interatomic contacts in protein
entries
OCA allows the user to rapidly search through the
contents of the entire PDB Archive for entries
obeying certain constraints
Ex. I want to find all proteins that have Zn2+
bound to structure, deposited in PDB between
certain dates

24
Revising the PDB File

Adding Hydrogen Atoms (Required for using DelPhi)
Reduce (http://kinemage.biochem.duke.edu/software/reduce.php)
Runs on Mac, Linux, Windows
Free to download
Sybyl (http://www.tripos.com)
Runs on Linux
Not free
Calculating Electrostatic Potential
DelPhi (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:DelPhi)
Runs on Mac, Linux, Windows (C and Fortran
compilers required)
Free to download

25
Protein Structure: Adding Hydrogen (SYBYL)

In addition to adding Hydrogen atoms to a PDB
file, Sybyl can be used to compare structures,
calculate RMSD values between structures, and
perform minimization calculations.
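
For reference, the RMSD between two structures is the square root of the mean squared distance between matched atoms. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and listed in the same atom order; it does not perform the optimal superposition that structure-comparison software normally applies first. The function name and the toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: the second model is the first shifted by 1.0 along z.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
model_b = [(x, y, z + 1.0) for x, y, z in model_a]

print(rmsd(model_a, model_b))  # 1.0
```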

26
Protein Structure Analysis

PONDR (Predictor of Naturally Disordered Regions)
(http://www.pondr.com/)
Internet-based
Not free
VADAR (Volume, Area, Dihedral Angle Reporter)
(http://redpoll.pharmacy.ualberta.ca/vadar/)
Internet-based
Free to use

Leigh Willard, Anuj Ranjan, Haiyan Zhang, Hassan
Monzavi, Robert F. Boyko, Brian D. Sykes, and
David S. Wishart. "VADAR: a web server for
quantitative evaluation of protein structure
quality." Nucleic Acids Res. 2003 July 1; 31(13):
3316-3319
27
Protein Structure Analysis: PONDR
Uses a series of neural network predictors (NNPs)
that use sequence data to predict disorder (i.e.
lack of fixed tertiary (3°) structure) in a given region.
28
Protein Structure Analysis: VADAR

A compilation of 15 algorithms for analyzing and
assessing peptide and protein structures from PDB
data.
Ramachandran plot
Shows possible conformations of phi and psi
angles for residues in a protein based on energy
considerations.
Very useful for determining whether model
structures are likely conformations
Disallowed regions involve steric clash (VDW
distances)

Plot regions labeled: β-sheet, LH α-helix, RH α-helix
http://www.bmb.uga.edu/wampler/tutorial/prot2.html
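
Each point on a Ramachandran plot is a pair of backbone dihedral angles: phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The sketch below computes a dihedral angle from four points using plain vector arithmetic. The helper names and the test coordinates are illustrative, and the sign convention is the one produced by this particular formula.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points with the last one rotated a quarter turn about the central bond.
angle = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(round(angle, 6))  # 90.0
```

Applied to consecutive backbone atoms of a model, the resulting (phi, psi) pairs can be compared against the allowed regions of the plot.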
29
Visualizing Electrostatic Potential: DelPhi and
Grasp

DelPhi
(http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:DelPhi)
SGI Unix
Free to download
GRASP (Graphical Representation and Analysis of
Structural Properties)
(http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:GRASP)
SGI Unix
Free to download

30
Visualizing Electrostatic Potential: DelPhi and
Grasp
DelPhi takes as input a coordinate file of
a molecule, or equivalent data for geometrical
objects (PDB file), and calculates the electrostatic
potential in and around the system, using a
finite difference solution to the
Poisson-Boltzmann equation. It produces a modified
PDB file and an emap file as input to 3rd party
visualization software (e.g. GRASP). GRASP then
displays and manipulates the surfaces of
molecules and their electrostatic properties.
31
Proteomics Tools: GetArea 1.1
Total Area
Area by Residue

To quickly calculate solvent accessible surface
area or solvation energy of a protein molecule.
Ex. Is a proposed metal-binding site solvent
accessible?

http://pauli.utmb.edu/cgi-bin/get_a_form.tcl
32
Prediction and Design

Prediction of protein functional site
Prediction of protein structure
Design of protein functional site
Design of protein structure
Why prediction and design?

33
Protein Structure Prediction

Modeller
http://www.salilab.org/modeller/
Homology modeling
Tasser
http://zhang.bioinformatics.ku.edu/I-TASSER/
Threading
Rosetta
http://robetta.bakerlab.org/
Ab initio
CASP (many others)
http://predictioncenter.org/
A center providing objective testing of
prediction programs

34
Protein Structure Homology Modeling: SWISS-MODEL

http://swissmodel.expasy.org/
Submit a FASTA sequence (known or unknown)
Swiss-Model conducts BLAST search to align
sequence with known structures
Build 3D output model that can be viewed using
DeepView (expasy)
Graphic file can be saved
Many other features including alignment modeling
with MSAs.

1EXR.pdb viewed using DeepView
35
Protein Structure Homology Modeling:
PredictProtein
Similar to Swiss-Model, Modeller. Requires
registration/login
36
Protein Structure Homology Modeling: Modeller
37
Modeller
Šali and Blundell, JMB, 1993, Comparative protein
modeling by satisfaction of spatial restraints
38
TASSER
1. Find templates (seq. with known structure) that share seq. similarity (global or local) with query seq.
2. Based on 1, query seq. is divided into aligned segments (have template) and unaligned segments.
3. A Monte Carlo method is used to connect the aligned segments.
4. Outputs (multiple possible structures) are clustered and the final structure is obtained.
Zhang and Skolnick, PNAS, 2004. Automated structure prediction of weakly homologous proteins on a genomic scale
39
Rosetta
1. Construct a fragment library for each three- and nine-residue segment. The fragments are extracted from observed structures in PDB.
2. Model the structure of the fragments from the library.
3. Connect the fragments.
4. Rank the predicted structures according to a scoring function.
40
Programs for Predicting Metal Binding Site

FEATURE
http://feature.stanford.edu/webfeature/
Machine learning (Bayesian method)
MUG
http://chemistry.gsu.edu/faculty/Yang/Calciomics.htm
Geometric search to predict calcium binding site
CHED
http://ligin.weizmann.ac.il/ched
Combines machine learning and geometric search to
predict zinc and other transition metal binding
sites.

41
FEATURE

1. Designed and tested their algorithm on
protein holo structures.
2. The protein structure is embedded into a 3D
grid.
3. Each grid point is evaluated by a probability
scoring function (Wei and Altman)
4. The points of high score are the predicted
Ca2+ locations

Wei and Altman, Protein Science, 1998
42
MUG
Figure labels: Observation; panels A, B, C, D; filters; <6.0 Å
Wang, Kirberger, Qiu, Chen and Yang, Proteins,
2009
43
CHED

1. Use protein apo structures
2. Geometric search for a qualified triad of C,
H, E, D
3. Side-chain rotation of an unqualified triad
4. Apply filters to resulting qualified triad to
classify the triad as binding triad or
non-binding triad

Figure labels: d1, d2, d3; qualified, unqualified; output: binding/nonbinding triad
Babor et al., Proteins, 2008
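
The geometric-search step can be pictured as looking for three Cys, His, Glu or Asp residues that lie close enough together to share a metal ion. The sketch below is a naive pairwise-distance search over one representative point per residue. The 6.0 Å cutoff, the residue tuples and the function name are placeholders for illustration, not values or code from CHED, and the side-chain rotation and filtering steps are omitted.

```python
import math
from itertools import combinations

CHED_TYPES = {"CYS", "HIS", "GLU", "ASP"}


def find_triads(residues, max_distance=6.0):
    """Return residue-number triples whose three pairwise distances are all within the cutoff.

    residues: iterable of (residue_name, residue_number, (x, y, z)).
    """
    candidates = [r for r in residues if r[0] in CHED_TYPES]
    triads = []
    for trio in combinations(candidates, 3):
        distances = [math.dist(a[2], b[2]) for a, b in combinations(trio, 2)]
        if max(distances) <= max_distance:
            triads.append(tuple(r[1] for r in trio))
    return triads


# Made-up coordinates: three nearby His, one distant Asp, one nearby Ala.
residues = [
    ("HIS", 94, (0.0, 0.0, 0.0)),
    ("HIS", 96, (3.0, 0.0, 0.0)),
    ("HIS", 119, (0.0, 3.0, 0.0)),
    ("ASP", 200, (20.0, 20.0, 20.0)),
    ("ALA", 5, (1.0, 1.0, 0.0)),
]

print(find_triads(residues))  # [(94, 96, 119)]
```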
44
Design Program

DEZYMER (Hellinga)
Given a ligand and a protein with known
structure, suggest residues to be mutated so that
the resulting protein binds the ligand.
ORBIT (Mayo)
Given a backbone structure, design a sequence
such that it folds to that backbone.
Rosetta (Baker)
One program to treat diverse problems
Prediction and design

45
DEZYMER
1. Define the expected binding geometry
2. Find backbone places where, if appropriate side chains are added, the predefined geometry is satisfied
3. Place the side chains and ligand, and optimize their positions
4. Repack residues in positions other than binding residues. If necessary, change residue type
Hellinga and Richards, JMB, 1991. Construction of new ligand binding sites in proteins of known structure
46
ORBIT
1. Divide the target structure into three parts: core, surface and boundary
2. Core: Ala, Val, Leu, Ile, Phe, Tyr, Trp
Surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln, Lys, and Arg
Boundary: union of the above two
3. 1.9 × 10^27 possible sequences
4. Select best sequence efficiently, using dead end elimination (DEE)
Solution structure of the designed protein. Stereoview showing the best-fit superposition of the 41
Comparison between the designed backbone (averaged NMR structure, blue) and the target backbone (red)
Dahiyat and Mayo, Science, 1997. De Novo Protein Design: Fully Automated Sequence Selection
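
The size of the sequence space follows from multiplying the number of allowed residues at each position. The slide does not say how many positions fall in each class; the sketch below assumes 1 core, 18 surface and 7 boundary positions, chosen here because they reproduce the quoted figure of about 1.9 × 10^27, so treat those counts as an assumption rather than as data from the slide.

```python
CORE = {"Ala", "Val", "Leu", "Ile", "Phe", "Tyr", "Trp"}
SURFACE = {"Ala", "Ser", "Thr", "His", "Asp", "Asn", "Glu", "Gln", "Lys", "Arg"}
BOUNDARY = CORE | SURFACE  # union of the two sets; Ala is shared, so 16 residues


def count_sequences(n_core, n_surface, n_boundary):
    """Number of sequences when each position picks independently from its class."""
    return (
        len(CORE) ** n_core
        * len(SURFACE) ** n_surface
        * len(BOUNDARY) ** n_boundary
    )


# Assumed position counts (not stated on the slide).
total = count_sequences(n_core=1, n_surface=18, n_boundary=7)
print(f"{total:.1e}")  # 1.9e+27
```

A search space of this size cannot be enumerated, which is why a pruning method such as dead end elimination is needed to find the best sequence.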
47
Supplemental Slides
48
Calciomics

Calciomics is a specialized area of biochemistry
focusing on the study of calcium-binding
biological macromolecules and proteins to
understand the factors that contribute to
calcium-binding affinity and the selectivity of
proteins and calcium-dependent conformational
change.
http://lithium.gsu.edu/faculty/Yang/Calciomics.htm
