Bioinformatics Tools - PowerPoint PPT Presentation

About This Presentation
Title:

Bioinformatics Tools

Slides: 49
Provided by: chemistry60

Transcript and Presenter's Notes

Title: Bioinformatics Tools


1
Bioinformatics Tools
2
Bioinformatics
  • The use of computer science, mathematics, and
    information theory to model and analyze
    biological systems, especially systems involving
    genetic material.
  • Things I can do with a computer to improve and
    accelerate my work.

3
Applications of Bioinformatics
  • Manage and analyze data in very large databases
  • genetic info (DNA)
  • protein sequences, structures
  • Collections of scientific papers, experimental
    results
  • Compare sequences and structures
  • Do similar sequences or folding indicate proteins
    have similar functions?
  • Modeling and prediction
  • Predict 3D structure from known structures
    (homology) or from a computational approach that
    uses no template structure (ab initio)
  • Prediction of function from structure
  • Molecular mechanics/ molecular dynamics
  • Prediction of molecular interactions, docking
  • Perform energy minimization calculations
  • Predict useful mutations for protein engineering

4
Sources of Data
  • Sequence databases (EBI)
  • FASTA (sequence similarity)
  • http://www.ebi.ac.uk/Tools/fasta33/
  • SwissProt (database of protein sequences)
  • http://expasy.org/sprot/
  • 3D structure database: the RCSB PDB
  • http://www.rcsb.org/pdb/home/home.do

5
Sequence Analyses
  • Sequence Alignment
  • Single or Multiple Sequences
  • Motif or Pattern Search
  • Prediction of Secondary Structure
  • >1E9N:A|PDBID|CHAIN|SEQUENCE
    MPKRGKKGAVAEDGDELRTEPEAKKSKTAAKKNDKEAAGEGPALYEDPPDQKTSPSGKPATLKICSWNVDGLR
    AWIKKKGLDWVKEEAPDILCLQETKCSENKLPAELQELPGLSHQYWLAL
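
A FASTA record consists of a header line beginning with ">" followed by one or more sequence lines. The following is a minimal sketch of reading such records in Python. The function name parse_fasta is illustrative, and the header layout used in the example is an assumption about how the 1E9N record on this slide was originally written.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Header layout is assumed; the sequence is the start of the slide's example.
example = """>1E9N:A|PDBID|CHAIN|SEQUENCE
MPKRGKKGAVAEDGDELRTEPEA
KKSKTAAKKNDKEAAGEGPALYEDPPDQKTSPSGKPATLKICSWNVDGLR
"""
records = parse_fasta(example)
header, sequence = records[0]
pdb_id = header.split(":")[0]
```

Wrapped sequence lines are joined into a single string, so downstream tools see one continuous sequence per record.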

6
Sequence Alignment
  • Usually first step in analysis of any
    new/unidentified sequence is to perform
    comparisons with sequence databases to find
    existing homologues.
  • This might give you some idea of
  • How the protein might potentially fold
  • What other proteins it is related to
  • What its function might be
  • FASTA (http://www.ebi.ac.uk/Tools/fasta33/)
  • One of several web servers you can use for this.
  • Provides similarity search against protein
    database.
  • Lets you select substitution matrix (BLOSUM50,
    BLOSUM62, etc.) for search.
  • a substitution matrix describes the rate at which
    one character in a sequence changes to other
    character states over time.
  • One would use a higher numbered BLOSUM matrix for
    aligning two closely related sequences and a
    lower number for more divergent sequences.

7
Gaps In Sequence Alignments
  • When aligning sequences, score is affected by how
    much penalty is assigned to gaps in sequence.
  • For larger gaps
  • Assumes greater evolutionary distance between
    sequences
  • Probably should be assigned a higher penalty

Smaller gap, smaller penalty:
ATCTTCAGTGTTTCCCCTGTTTTGCCC.ATTTAGTTCGCTC
ATCTTCAGTGTTTCCCCTGTTTTGCCCGATTTAGTTCGCTC

Larger gap:
ATCTTCAGTGTTTCCCCTGTTTTGCCC....................ATTTAGTTCGCTC
ATCTTCAGTGTTTCCCCTGTTTTGCCCGCCCCCCCCCCCCCCCCCCCATTTAGTTCGCTC
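
One common way to make a longer gap cost more is an affine penalty: a fixed cost to open a gap plus a smaller cost for each additional gap position. The sketch below scores the two alignments from this slide that way. The numeric values (match +1, mismatch -1, gap open -5, gap extend -1) are illustrative assumptions, not values from the slides or from any particular server.

```python
def score_alignment(top, bottom, match=1, mismatch=-1,
                    gap_open=-5, gap_extend=-1, gap_chars=".-"):
    """Score two aligned strings of equal length with an affine gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a in gap_chars or b in gap_chars:
            # The first gap column pays gap_open; later columns pay gap_extend.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


PREFIX = "ATCTTCAGTGTTTCCCCTGTTTTGCCC"
SUFFIX = "ATTTAGTTCGCTC"

small_gap = score_alignment(PREFIX + "." + SUFFIX,
                            PREFIX + "G" + SUFFIX)
large_gap = score_alignment(PREFIX + "." * 20 + SUFFIX,
                            PREFIX + "G" + "C" * 19 + SUFFIX)
```

With these assumed values the 1-position gap costs 5 and the 20-position gap costs 24, so the second alignment scores lower even though both share the same 40 matching columns.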
8
Other Sequence Databases
  • BLAST and PSI-BLAST also commonly used.
  • BLAST can be found at
  • http://www.ncbi.nlm.nih.gov/BLAST/
  • PSI-BLAST can be found at
  • http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on

9
BLAST Entry Window
  • Enter FASTA sequence or upload file
  • Choose your search set
  • Select Program

10
Results for mutant 1exr
Calmodulin At 1.68 Angstroms Resolution
Length=148

Score = 267 bits (683), Expect = 2e-70, Method: Compositional matrix adjust.
Identities = 142/153 (92%), Positives = 142/153 (92%), Gaps = 8/153 (5%)

Query  1    AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN  60
            AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN
Sbjct  1    AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN  60

Query  61   GTIDFPEFLSLMARKMKEQDDQFDQSEEELIEAFKVFDRFFFGLISAAELRHV---LGEK  117
            GTIDFPEFLSLMARKMKEQD     SEEELIEAFKVFDR   GLISAAELRHV   LGEK
Sbjct  61   GTIDFPEFLSLMARKMKEQD-----SEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEK  115

Query  118  LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK  150
            LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK
Sbjct  116  LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK  148

Figure labels: Deleted, Inserted, Mutated
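
The Identities and Gaps figures in a BLAST report can be recomputed directly from the aligned Query and Sbjct strings. The sketch below does this for the alignment on this slide; the function name alignment_stats is illustrative, and the percentages are truncated to whole numbers, which is an assumption chosen because it reproduces the values shown on the slide.

```python
def alignment_stats(query, subject, gap="-"):
    """Return (identities, gaps, length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return identities, gaps, len(query)


# The three alignment blocks from the slide, concatenated.
query = (
    "AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN"
    "GTIDFPEFLSLMARKMKEQDDQFDQSEEELIEAFKVFDRFFFGLISAAELRHV---LGEK"
    "LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK"
)
sbjct = (
    "AEQLTEEQIAEFKEAFALFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGN"
    "GTIDFPEFLSLMARKMKEQD-----SEEELIEAFKVFDRDGNGLISAAELRHVMTNLGEK"
    "LTDDEVDEMIREADIDGDGHINYEEFVRMMVSK"
)

identities, gaps, length = alignment_stats(query, sbjct)
identity_pct = 100 * identities // length   # truncated to a whole percent
gap_pct = 100 * gaps // length
```

The three columns that are neither identical nor gapped (FFF in the query against DGN in the subject) are the mutated positions; the gapped columns are the inserted and deleted ones.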
11
Swiss-Prot/UniProt Database
  • Central hub for the collection of functional
    information on proteins.
  • amino acid sequence
  • protein name or description
  • taxonomic data and citation information
  • Access point to ProSite
  • http://www.expasy.ch/tools/scanprosite/
  • Prosite identifies sequences and displays any
    associated motifs, or accepts a motif and returns
    related sequences.

12
What are Motifs?
  • A different approach for incorporating multiple
    sequence information into a database search is to
    use a Motif.
  • Motifs do not assign a score at every position in
    an alignment, but describe key residues that are
    conserved and define the family. Sometimes this
    is called a "signature".
  • Example of pseudo-EF-Hand motif (Calciomics
    Pattern Search, developed in Dr. Yang's lab)
  • [LMVITNF]-[FY]-X(2)-[YHIVF]-[SAITV]-X(5,9)-[LIMV]-
    X(3)-[EDS]-[LFM]-[KRQL]-X(20,28)-[LQKF]-[DNG]-X(1)
    -[DNSC]-X(1)-[DKN]-X(4)-[FY]-X(1)-[EKS]
  • Specific residues can also be excluded by
    enclosing in curly brackets, e.g. {DE}
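
A PROSITE-style pattern maps closely onto a regular expression: square brackets list allowed residues, curly brackets list excluded residues, X matches any residue, and a number or range in parentheses gives the repeat count. The sketch below performs that translation in Python. The function name prosite_to_regex is illustrative, only the syntax elements named here are handled, and the square brackets in the pseudo-EF-hand pattern are an assumed reconstruction of the slide text.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        if match is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = match.groups()
        if core.upper() == "X":        # X: any residue
            token = "."
        elif core.startswith("{"):     # {DE}: any residue except D or E
            token = "[^" + core[1:-1] + "]"
        else:                          # [FY] or a single residue letter
            token = core
        if low is not None:            # (2) or (5,9): repeat count
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


regex = prosite_to_regex("[FY]-X(2)-{DE}-X(5,9)-K")
hit = re.search(regex, "AAFGGKAAAAAKAA")
miss = re.search(regex, "AAFGGDAAAAAKAA")

# Assumed bracketed form of the pseudo-EF-hand pattern from the slide.
pseudo_ef_hand = re.compile(prosite_to_regex(
    "[LMVITNF]-[FY]-X(2)-[YHIVF]-[SAITV]-X(5,9)-[LIMV]-X(3)-[EDS]-[LFM]-"
    "[KRQL]-X(20,28)-[LQKF]-[DNG]-X(1)-[DNSC]-X(1)-[DKN]-X(4)-[FY]-X(1)-[EKS]"
))
```

The second test sequence fails to match because it has D at the position where {DE} excludes D and E.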

13
Multiple Sequence Alignment (MSA)
  • Alignments can provide information on
  • domain structure
  • location of residues likely to be involved in
    protein function
  • Solvent exposure of residues
  • Evolutionary relationships
  • Build profiles for more sensitive searches
  • What you can do with this information
  • Create signatures for pattern searching
  • Identify conserved vs. variable regions
  • Identify structural and/or functional motifs

14
MSA with ClustalW
http://www.ebi.ac.uk/Tools/clustalw2/index.html
15
Ca-O-C angles for a) Non-EF-Hand and b) EF-Hand
16
Distribution of SC angles
[Figure: Unrooted N-J phylogenetic tree generated by
TreeView. Leaf labels: S100, Calbindin d9k, Penta-EF,
Parvalbumin, Osteonectin, Polcalcin, 1WDC C, 2BL0 C,
2HQ8, 2H2K. Regions R1 and R2 are marked.]
17
Distribution of MC angles
[Figure: Unrooted N-J phylogenetic tree generated by
TreeView. Leaf labels: S100, Calbindin d9k, Penta-EF,
Parvalbumin, Osteonectin, Polcalcin, 1WDC C, 2BL0 C,
2HQ8, 2H2K. Regions R1 and R2 are marked.]
18
Secondary Structure PDBSum
  • http://www.ebi.ac.uk/pdbsum/
  • Predicted 2° structure from sequence
  • Either enter a PDB file or load a new/existing
    sequence

19
Secondary Structure PDBSum
2oky
20
Protein Data Bank
  • http://www.rcsb.org/pdb/home/home.do
  • Comprehensive database of protein structures
  • Provides
  • 3D structural data
  • FASTA sequence
  • Citation Info (who solved it, related
    publications, etc.)
  • experimental methods (X-Ray Diffraction, NMR)
  • resolution
  • classification (e.g. metal transporter)
  • ligands, cofactors
  • Related PDB entries

21
PDB ATOM/HETATM Record Format
Data Record Partitioning
  • Occupancy: Indicates the frequency with which an
    atom is detected in a specific location. Where
    occupancy < 1.00, x-ray diffraction indicates
    more than 1 position, i.e. there is flexibility
    or disorder.
  • B-Factor: Thermal motion of the atom. A high
    B-factor implies uncertainty.

Text View of PDB File

Columns  Field
1-6      Record name "ATOM  " or "HETATM"
7-11     Atom serial number
13-14    Chemical symbol (right justified)
18-20    Residue name
22       Chain identifier
23-26    Residue sequence number
31-38    X-coordinate
39-46    Y-coordinate
47-54    Z-coordinate
55-60    Occupancy
61-66    Isotropic B-factor
77-78    Element symbol

ATOM      1  N   ALA A  43      69.834  21.345  42.623  1.00 76.76           N
ATOM      2  CA  ALA A  43      69.016  22.376  41.988  1.00 72.63           C
ATOM      3  C   ALA A  43      67.991  21.777  41.038  1.00 63.96           C
ATOM      4  O   ALA A  43      66.942  22.368  40.784  1.00 56.68           O
ATOM      5  CB  ALA A  43      69.924  23.339  41.198  1.00 72.97           C
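
Because ATOM and HETATM records use fixed columns, they are best read by slicing on column positions instead of splitting on whitespace. The sketch below parses the first ATOM record from this slide using the column ranges listed above. The function name parse_atom_line is illustrative, and the code reads the full atom-name field in columns 13-16 of the standard PDB format, of which the chemical symbol in columns 13-14 is the first part.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into fields using 1-based PDB columns."""
    def col(start, end):
        return line[start - 1:end].strip()

    return {
        "record": col(1, 6),
        "serial": int(col(7, 11)),
        "atom_name": col(13, 16),
        "res_name": col(18, 20),
        "chain": col(22, 22),
        "res_seq": int(col(23, 26)),
        "x": float(col(31, 38)),
        "y": float(col(39, 46)),
        "z": float(col(47, 54)),
        "occupancy": float(col(55, 60)),
        "b_factor": float(col(61, 66)),
        "element": col(77, 78),
    }


# First ATOM record from the slide, assembled field by field so that the
# column widths are explicit.
LINE = (
    "ATOM  "       # 1-6   record name
    "    1"        # 7-11  atom serial number
    " "            # 12    unused
    " N  "         # 13-16 atom name
    " "            # 17    unused here
    "ALA"          # 18-20 residue name
    " "            # 21    unused
    "A"            # 22    chain identifier
    "  43"         # 23-26 residue sequence number
    "    "         # 27-30 unused here
    "  69.834"     # 31-38 x
    "  21.345"     # 39-46 y
    "  42.623"     # 47-54 z
    "  1.00"       # 55-60 occupancy
    " 76.76"       # 61-66 B-factor
    "          "   # 67-76 unused here
    " N"           # 77-78 element symbol
)
atom = parse_atom_line(LINE)
```

Slicing by column keeps the parser correct for records where adjacent fields touch, for example a four-digit residue number directly after the chain identifier.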
22
Pymol Viewer
Can save session, including labels, angles,
distances, etc. These features can be turned on
or off without loss of data.
23
Proteomics Tools: External tools to extract PDB
Data
  • http://bip.weizmann.ac.il/oca-bin/lpccsu/
  • LPC: Analysis of interatomic Contacts in
    Ligand-Protein complexes
  • CSU: Analysis of interatomic contacts in protein
    entries
  • OCA allows the user to rapidly search through the
    contents of the entire PDB Archive for entries
    obeying certain constraints
  • Ex. I want to find all proteins that have Zn2+
    bound to the structure, deposited in PDB between
    certain dates

24
Revising the PDB File
  • Adding Hydrogen Atoms (Required for using DelPhi)
  • Reduce (http://kinemage.biochem.duke.edu/software/reduce.php)
  • Runs on Mac, Linux, Windows
  • Free to download
  • Sybyl (http://www.tripos.com)
  • Runs on Linux
  • Not free
  • Calculating Electrostatic Potential
  • DelPhi (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:DelPhi)
  • Runs on Mac, Linux, Windows (C and Fortran
    compilers required)
  • Free to download

25
Protein Structure: Adding Hydrogen with SYBYL
  • In addition to adding hydrogen atoms to a PDB
    file, Sybyl can be used to compare structures,
    calculate RMSD values between structures, and
    perform minimization calculations.

26
Protein Structure Analysis
  • PONDR (Predictor of Naturally Disordered Regions)
  • (http://www.pondr.com/)
  • Internet-based
  • Not free
  • VADAR (Volume, Area, Dihedral Angle Reporter)
  • (http://redpoll.pharmacy.ualberta.ca/vadar/)
  • Internet-based
  • Free to use

Leigh Willard, Anuj Ranjan, Haiyan Zhang, Hassan
Monzavi, Robert F. Boyko, Brian D. Sykes, and
David S. Wishart. "VADAR: a web server for
quantitative evaluation of protein structure
quality." Nucleic Acids Res. 2003 July 1; 31(13):
3316-3319
27
Protein Structure Analysis: PONDR
Uses a series of neural network predictors (NNPs)
that use sequence data to predict disorder (i.e.
lack of fixed 3° structure) in a given region.
28
Protein Structure Analysis VADAR
  • A compilation of 15 algorithms for analyzing and
    assessing peptide and protein structures from PDB
    data.
  • Ramachandran plot
  • Shows possible conformations of phi and psi
    angles for residues in a protein based on energy
    considerations.
  • Very useful for determining whether model
    structures are likely conformations
  • Disallowed regions involve steric clash (VDW
    distances)
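
Each point on a Ramachandran plot is a pair of backbone dihedral angles: phi is defined by the atoms C(i-1), N, CA, C and psi by N, CA, C, N(i+1). The sketch below computes a dihedral angle from four atom positions using plain Python tuples. The function name dihedral and the test coordinates are illustrative and are not taken from any structure in the slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
mirrored = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1))
```

Applying the function to consecutive backbone atoms of a model gives the (phi, psi) pairs that can then be checked against the allowed regions of the plot.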

[Figure: Ramachandran plot with the β-sheet, LH
α-helix, and RH α-helix regions labeled]
http://www.bmb.uga.edu/wampler/tutorial/prot2.html
29
Visualizing Electrostatic Potential: DelPhi and
GRASP
  • DelPhi
  • (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:DelPhi)
  • SGI Unix
  • Free to download
  • GRASP (Graphical Representation and Analysis of
    Structural Properties)
  • (http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:GRASP)
  • SGI Unix
  • Free to download

30
Visualizing Electrostatic Potential: DelPhi and
GRASP
DelPhi takes as input a coordinate file of a
molecule, or equivalent data for geometrical
objects (PDB file), and calculates the
electrostatic potential in and around the system
using a finite-difference solution to the
Poisson-Boltzmann equation. It produces a modified
PDB file and an emap file as input to third-party
visualization software (e.g. GRASP). GRASP then
displays and manipulates the surfaces of
molecules and their electrostatic properties.
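
To illustrate what a finite-difference solution means, the sketch below solves a much simpler problem than the one DelPhi handles: the one-dimensional Poisson equation with a uniform dielectric constant and no ionic term, relaxed on a grid by Gauss-Seidel iteration. Every name and parameter here is an illustrative assumption; this is not DelPhi's implementation, which works on a 3D grid with the full Poisson-Boltzmann equation.

```python
def solve_poisson_1d(charge, spacing, left, right, epsilon=1.0, sweeps=2000):
    """Relax d2(phi)/dx2 = -charge/epsilon on a 1D grid with fixed end values."""
    n = len(charge)
    phi = [0.0] * n
    phi[0], phi[-1] = left, right
    for _ in range(sweeps):
        for i in range(1, n - 1):
            # Central difference: (phi[i-1] - 2*phi[i] + phi[i+1]) / h**2
            # equals -charge[i] / epsilon, solved here for phi[i].
            phi[i] = 0.5 * (phi[i - 1] + phi[i + 1]
                            + spacing ** 2 * charge[i] / epsilon)
    return phi


# No charge: the potential falls linearly between the two boundary values.
empty = solve_poisson_1d([0.0] * 11, spacing=1.0, left=0.0, right=1.0)

# A single positive charge at the centre of a grounded box.
charges = [0.0] * 11
charges[5] = 1.0
peaked = solve_poisson_1d(charges, spacing=1.0, left=0.0, right=0.0)
```

The same idea extends to three dimensions by averaging over six neighbours instead of two, with the dielectric constant allowed to differ between the protein interior and the solvent.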
31
Proteomics Tools GetArea 1.1
Total Area
Area by Residue
  • To quickly calculate solvent accessible surface
    area or solvation energy of a protein molecule.
  • Ex. Is a proposed metal-binding site solvent
    accessible?

http://pauli.utmb.edu/cgi-bin/get_a_form.tcl
32
Prediction and Design
  • Prediction of protein functional site
  • Prediction of protein structure
  • Design of protein functional site
  • Design of protein structure
  • Why prediction and design?

33
Protein Structure Prediction
  • Modeller
  • http://www.salilab.org/modeller/
  • Homology modeling
  • TASSER
  • http://zhang.bioinformatics.ku.edu/I-TASSER/
  • Threading
  • Rosetta
  • http://robetta.bakerlab.org/
  • Ab initio
  • CASP (many others)
  • http://predictioncenter.org/
  • A center providing objective testing of
    prediction programs

34
Protein Structure Homology Modeling: SWISS-MODEL
  • http://swissmodel.expasy.org/
  • Submit a FASTA sequence (known or unknown)
  • Swiss-Model conducts a BLAST search to align the
    sequence with known structures
  • Builds a 3D output model that can be viewed using
    DeepView (ExPASy)
  • Graphic file can be saved
  • Many other features, including alignment modeling
    with MSAs.

1EXR.pdb viewed using DeepView
35
Protein Structure Homology Modeling:
PredictProtein
  • Similar to Swiss-Model, Modeller
  • Requires registration/login
36
Protein Structure Homology Modeling Modeller
37
Modeller
Šali and Blundell, JMB, 1993, Comparative protein
modeling by satisfaction of spatial restraints
38
TASSER
1. Find templates (sequences with known structure)
   that share sequence similarity (global or local)
   with the query sequence.
2. Based on step 1, the query sequence is divided
   into aligned segments (which have a template) and
   unaligned segments.
3. Use a Monte Carlo method to connect the aligned
   segments.
4. Outputs (multiple possible structures) are
   clustered and the final structure is obtained.
Zhang and Skolnick, PNAS, 2004. Automated
structure prediction of weakly homologous
proteins on a genomic scale
39
Rosetta
1. Construct a fragment library for each three-
   and nine-residue segment. The fragments are
   extracted from observed structures in the PDB.
2. Model the structure of the fragments from the
   library.
3. Connect the fragments.
4. Rank the predicted structures according to a
   scoring function.
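
The first step implies enumerating every three- and nine-residue window of the query sequence, since a fragment library is built for each one. The sketch below covers only that enumeration; it does not select fragments from the PDB or do any modeling, and the function name fragment_windows and the test sequence are illustrative assumptions.

```python
def fragment_windows(sequence, sizes=(3, 9)):
    """Map each window size to a list of (start_index, window) pairs."""
    return {
        size: [(i, sequence[i:i + size])
               for i in range(len(sequence) - size + 1)]
        for size in sizes
    }


windows = fragment_windows("AEQLTEEQIAEF")
three_mers = windows[3]
nine_mers = windows[9]
```

A sequence of length L yields L - 2 three-residue windows and L - 8 nine-residue windows, so the number of fragment lookups grows linearly with sequence length.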
40
Programs for Predicting Metal Binding Site
  • FEATURE
  • http://feature.stanford.edu/webfeature/
  • Machine learning (Bayesian method)
  • MUG
  • http://chemistry.gsu.edu/faculty/Yang/Calciomics.htm
  • Geometric search to predict calcium binding site
  • CHED
  • http://ligin.weizmann.ac.il/ched
  • Combine machine learning and geometric search to
    predict zinc and other transition metal binding
    sites.

41
FEATURE
  • 1. Designed and tested their algorithm on
    protein holo structures.
  • 2. The protein structure is embedded into a 3D
    grid.
  • 3. Each grid point is evaluated by probability
    scoring function (Wei and Altman)
  • 4. The points with high scores are the predicted
    Ca2+ locations
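
Steps 2 to 4 amount to a grid scan: lay a regular 3D grid over the structure, score every grid point, and report the highest-scoring points. The sketch below shows that control flow only. The scoring function simply counts atoms within a radius and stands in for the probability score of Wei and Altman, which is not reproduced here; the spacing, margin, radius, and coordinates are all illustrative assumptions.

```python
import itertools
import math


def grid_points(atoms, spacing=1.0, margin=2.0):
    """Return regular grid points covering the atoms plus a margin."""
    axes = []
    for values in zip(*atoms):
        low = min(values) - margin
        high = max(values) + margin
        count = int(math.floor((high - low) / spacing)) + 1
        axes.append([low + i * spacing for i in range(count)])
    return list(itertools.product(*axes))


def toy_score(point, atoms, radius=3.0):
    """Placeholder score: number of atoms within `radius` of the point."""
    return sum(1 for atom in atoms if math.dist(point, atom) <= radius)


def top_sites(atoms, score=toy_score, n=1):
    """Return the n grid points with the highest score."""
    points = grid_points(atoms)
    return sorted(points, key=lambda p: score(p, atoms), reverse=True)[:n]


# Six hypothetical oxygen atoms arranged 2.4 Å from the point (5, 5, 5).
atoms = [(7.4, 5.0, 5.0), (2.6, 5.0, 5.0), (5.0, 7.4, 5.0),
         (5.0, 2.6, 5.0), (5.0, 5.0, 7.4), (5.0, 5.0, 2.6)]
best = top_sites(atoms, n=1)[0]
```

Passing a different callable as score is the intended extension point, so a trained scoring function could replace the placeholder without changing the scan.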

Wei and Altman, Protein Science, 1998
42
MUG
[Figure: panels A, B, C, and D labeled "Observation",
with filters and a < 6.0 Å cutoff]
Wang, Kirberger, Qiu, Chen and Yang, Proteins,
2009
43
CHED
  • 1. Use protein apo structures
  • 2. Geometric search for a qualified triad of C,
    H, E, D
  • 3. Side-chain rotation of an unqualified triad
  • 4. Apply filters to resulting qualified triad to
    classify the triad as binding triad or
    non-binding triad
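
The geometric search in step 2 can be pictured as testing every combination of three Cys, His, Glu, or Asp residues against a distance criterion. The sketch below implements only that step, using one representative atom per residue and a single pairwise cutoff. The cutoff value, the data layout, and the function name candidate_triads are illustrative assumptions; side-chain rotation (step 3) and the filters (step 4) are not modeled.

```python
import math
from itertools import combinations

LIGAND_RESIDUES = {"CYS", "HIS", "GLU", "ASP"}


def candidate_triads(residues, cutoff):
    """Return triads of C/H/E/D residues whose atoms all lie within cutoff.

    `residues` is a list of (name, number, (x, y, z)) tuples holding one
    representative liganding atom per residue.
    """
    eligible = [r for r in residues if r[0] in LIGAND_RESIDUES]
    triads = []
    for trio in combinations(eligible, 3):
        close = all(math.dist(a[2], b[2]) <= cutoff
                    for a, b in combinations(trio, 2))
        if close:
            triads.append(tuple((name, number) for name, number, _ in trio))
    return triads


# Hypothetical coordinates in Å, for illustration only.
residues = [
    ("CYS", 10, (0.0, 0.0, 0.0)),
    ("HIS", 25, (3.0, 0.0, 0.0)),
    ("ASP", 40, (0.0, 3.0, 0.0)),
    ("GLU", 90, (20.0, 20.0, 20.0)),
    ("ALA", 11, (1.0, 1.0, 0.0)),
]
triads = candidate_triads(residues, cutoff=5.0)
```

Testing all combinations is cubic in the number of eligible residues, which is usually acceptable for a single protein but could be reduced with a spatial index.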

[Figure: triads defined by distances d1, d2, and d3;
qualified and unqualified triads; output is a
binding/nonbinding triad]
Babor et al., Proteins, 2008
44
Design Program
  • DEZYMER (Hellinga)
  • Given a ligand and a protein with known
    structure, suggest residues to be mutated so that
    the resulting protein binds the ligand.
  • ORBIT (Mayo)
  • Given a backbone structure, design a sequence
    such that it folds to that backbone.
  • Rosetta (Baker)
  • One program to treat diverse problems
  • Prediction and design

45
DEZYMER
1. Define the expected binding geometry.
2. Find backbone positions where, if appropriate
   side chains are added, the predefined geometry is
   satisfied.
3. Place the side chains and ligand, and optimize
   their positions.
4. Repack residues in positions other than the
   binding residues. If necessary, change the
   residue type.
Hellinga and Richards, JMB, 1991. Construction of
new ligand binding sites in proteins of known
structure
46
ORBIT
1. Divide the target structure into three parts:
   core, surface, and boundary.
2. Allowed residues by part:
   Core: Ala, Val, Leu, Ile, Phe, Tyr, Trp
   Surface: Ala, Ser, Thr, His, Asp, Asn, Glu, Gln,
   Lys, and Arg
   Boundary: union of the above two
3. 1.9 × 10^27 possible sequences
4. Select the best sequence efficiently, using
   dead-end elimination (DEE)
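
Dead-end elimination avoids enumerating every sequence by discarding any choice at a position that can never be part of the lowest-energy solution. The sketch below applies the original elimination criterion to a toy problem: a choice r is removed if its best-case energy is still worse than the worst-case energy of some alternative t at the same position. The energy tables are invented for illustration, and this is not the ORBIT implementation.

```python
from itertools import product


def pair_energy(pair_e, i, r, j, s):
    """Interaction energy of choice r at position i with choice s at j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dead_end_elimination(self_e, pair_e):
    """Return, per position, the set of choices that survive elimination."""
    n = len(self_e)
    alive = [set(range(len(options))) for options in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair_e, i, r, j, s) for s in alive[j])
                    for j in others)
                for t in sorted(alive[i] - {r}):
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair_e, i, t, j, s) for s in alive[j])
                        for j in others)
                    if best_r > worst_t:   # r can never beat t: eliminate r
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(choice, self_e, pair_e):
    energy = sum(self_e[i][r] for i, r in enumerate(choice))
    for (i, j), table in pair_e.items():
        energy += table[choice[i]][choice[j]]
    return energy


# Toy energies (hypothetical): three positions with 3, 3, and 2 choices.
self_e = [[0.0, 1.0, 10.0], [0.0, 2.0, 10.0], [0.0, 10.0]]
pair_e = {
    (0, 1): [[0.0, 0.5, 0.0], [-0.5, 0.0, 0.0], [0.0, 0.0, 0.0]],
    (0, 2): [[0.0, 0.0], [0.2, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
}
alive = dead_end_elimination(self_e, pair_e)
best = min(product(*(range(len(o)) for o in self_e)),
           key=lambda c: total_energy(c, self_e, pair_e))
```

The brute-force search at the end is only a check that elimination kept the true optimum; for realistic problem sizes it is exactly the enumeration that dead-end elimination is meant to avoid.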
Solution structure of the designed protein.
Stereoview showing the best-fit superposition of
the 41
Comparison between the designed backbone
(averaged NMR structure, blue) and the target
backbone (red)
Dahiyat and Mayo, Science, 1997. De Novo Protein
Design: Fully Automated Sequence Selection
47
Supplemental Slides
48
Calciomics
  • Calciomics is a specialized area of biochemistry
    focusing on the study of calcium-binding
    biological macromolecules and proteins to
    understand the factors that contribute to
    calcium-binding affinity and the selectivity of
    proteins and calcium-dependent conformational
    change.
  • http://lithium.gsu.edu/faculty/Yang/Calciomics.htm