Title: Sequence Based Analysis Tutorial
1. Sequence Based Analysis Tutorial
- NIH Proteomics Workshop
- Lai-Su L. Yeh, Ph.D.
- Protein Science Team Lead
- Protein Information Resource at Georgetown University Medical Center
2. Retrieval, Sequence Search and Classification Methods
- Retrieve protein information by text or UID
- Sequence Similarity Search
- BLAST, FASTA, Dynamic Programming
- Family Classification
- Patterns, Profiles, Hidden Markov Models, Sequence Alignments, Neural Networks
- Integrated Search and Classification System
3. Sequence Similarity Search
- Based on Pair-Wise Comparisons
- Dynamic Programming Algorithms
- Global Similarity: Needleman-Wunsch
- Local Similarity: Smith-Waterman
- Heuristic Algorithms
- FASTA: Based on k-Tuples (2 Amino Acids)
- BLAST: Triples of Conserved Amino Acids
- Gapped BLAST: Allows Gaps in Segment Pairs
- PHI-BLAST: Pattern-Hit Initiated Search
- PSI-BLAST: Position-Specific Iterated Search
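The two dynamic programming algorithms differ mainly in how the score table is filled. The sketch below computes only the optimal score for each. It assumes a simple hypothetical scheme (match +1, mismatch -1, linear gap -2) standing in for a real substitution matrix and gap model. The only difference in the Smith-Waterman version is that cells are floored at 0 and the best cell anywhere is reported.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score: every residue of a and b is aligned or gapped."""
    rows, cols = len(a) + 1, len(b) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        f[i][0] = i * gap  # a prefix of a aligned entirely to gaps
    for j in range(1, cols):
        f[0][j] = j * gap  # a prefix of b aligned entirely to gaps
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          f[i - 1][j] + gap,    # gap in b
                          f[i][j - 1] + gap)    # gap in a
    return f[-1][-1]


def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Local alignment score: best-scoring pair of substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Flooring at 0 lets a local alignment start fresh anywhere.
            h[i][j] = max(0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(needleman_wunsch("GATTACA", "GATTACA"))  # 7
print(smith_waterman("AAACGT", "TTCGTT"))      # 3 (the shared "CGT")
```

Both fill an (m+1) x (n+1) table, so the cost grows with the product of the sequence lengths. That is why the heuristic methods listed above are used for database-scale searches.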
4. Sequence Similarity Search
- Similarity Search Parameters
- Scoring Matrices Based on Conserved Amino Acid Substitution
- Dayhoff Mutation Matrix, e.g., PAM250 (20% Identity)
- Henikoff Matrix from Ungapped Alignments, e.g., BLOSUM62
- Gap Penalty
- Search Time Comparisons
- Smith-Waterman: 10 Min
- FASTA: 2 Min
- BLAST: 20 Sec
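The gap penalty is usually affine: opening a gap costs more than lengthening one. NCBI BLASTP's defaults with BLOSUM62, for example, are a gap existence cost of 11 and an extension cost of 1, so a gap of length k costs 11 + k. The sketch below scores an existing pairwise alignment under that convention. The identity scores (match +5, mismatch -4) are a hypothetical stand-in for a full substitution matrix.

```python
def score_alignment(top: str, bottom: str, match: int = 5, mismatch: int = -4,
                    gap_open: int = 11, gap_extend: int = 1) -> int:
    """Score two aligned rows ('-' = gap) with affine gaps: open + k * extend."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    open_gap_in = None  # which row currently holds an open gap: "top", "bottom", or None
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("column is gapped in both rows")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            if open_gap_in != row:
                score -= gap_open   # a new gap starts here
            score -= gap_extend     # every gapped column pays the extension cost
            open_gap_in = row
        else:
            score += match if x == y else mismatch
            open_gap_in = None
    return score


print(score_alignment("AC--GT", "ACTTGT"))  # 4 matches (20) minus one 2-long gap (13) = 7
```

One long gap is cheaper than several short ones with the same total length, which favours alignments that keep insertions together.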
5. Feature Representation
- Features: Residue Physicochemical Properties, Context (Local and Global) Features, Evolutionary Features
- Alternative Alphabets: Classification of Amino Acids to Capture Different Features of Amino Acid Residues
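An alternative alphabet maps the 20 amino acids onto a few classes, so sequences that share physicochemical character become similar even without identical residues. The six-class grouping below is a hypothetical illustration and not a standard scheme. Published reduced alphabets differ in how they group residues such as His, Cys and Gly.

```python
# Hypothetical six-letter reduced alphabet (one of many possible groupings).
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # special backbone conformations
}
# Invert to residue -> class code; all 20 standard residues are covered.
REDUCE = {aa: code for code, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet ('x' for unknown letters)."""
    return "".join(REDUCE.get(aa, "x") for aa in seq.upper())


print(reduce_sequence("KILE"), reduce_sequence("RVMD"))  # +hh- +hh-
```

In this example, KILE and RVMD share no identical residue but become the same string once reduced.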
6. Substitution Matrix
- Likelihood of One Amino Acid Being Mutated into Another over Evolutionary Time
- Negative Score: Substitution Unlikely to Occur (e.g., Gly/Trp, -7 in PAM250)
- Positive Score: Conservative Substitution (e.g., Lys/Arg, 3 in PAM250)
- High Scores for Identical Matches of Rare Amino Acids (e.g., Trp, Cys)
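A substitution matrix is symmetric, so a lookup only needs one triangle. The sketch below uses a small excerpt of PAM250. It contains the two off-diagonal scores quoted above plus a few diagonal entries, such as Trp/Trp 17 and Cys/Cys 12, which show the high scores for rare residues. It then sums the scores over an ungapped alignment. A full 20 x 20 table would be needed in practice; pairs missing from the excerpt raise `KeyError`.

```python
# Excerpt of PAM250; only one triangle is stored because the matrix is symmetric.
PAM250_EXCERPT = {
    ("G", "W"): -7, ("K", "R"): 3,
    ("W", "W"): 17, ("C", "C"): 12, ("G", "G"): 5, ("K", "K"): 5, ("R", "R"): 6,
}


def pair_score(x: str, y: str, matrix=PAM250_EXCERPT) -> int:
    """Look up a residue pair in either order."""
    return matrix[(x, y)] if (x, y) in matrix else matrix[(y, x)]


def ungapped_score(s1: str, s2: str, matrix=PAM250_EXCERPT) -> int:
    """Sum of pair scores along two equal-length, gap-free segments."""
    if len(s1) != len(s2):
        raise ValueError("segments must have equal length")
    return sum(pair_score(x, y, matrix) for x, y in zip(s1, s2))


print(ungapped_score("KWC", "RWC"))  # 3 + 17 + 12 = 32
```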
7. BLAST
- BLAST (Basic Local Alignment Search Tool)
- Extremely fast
- Robust
- Most frequently used
- It finds very short segment pairs (seeds) between the query and the database sequence
- These seeds are then extended in both directions until the maximum score for extensions of that seed is reached (extension stops once the score falls too far below the best value found)
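The seed-and-extend idea can be sketched as follows. This is a simplified illustration and not NCBI's implementation. Exact 3-letter word matches stand in for BLAST's neighborhood words (words scoring at least a threshold under a substitution matrix), and a hypothetical identity score (+2/-1) replaces the matrix. Each seed is extended without gaps until the running score drops more than `x_drop` below the best score seen.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, w: int = 3):
    """(query_pos, subject_pos) pairs, 0-based, where a length-w word matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query, subject, i, j, w=3, match=2, mismatch=-1, x_drop=3):
    """Ungapped two-way extension; returns (score, query_start, subject_start, length)."""
    def s(a, b):
        return match if a == b else mismatch

    def extend(pairs):
        best = run = best_len = 0
        for length, (a, b) in enumerate(pairs, start=1):
            run += s(a, b)
            if run > best:
                best, best_len = run, length
            elif best - run > x_drop:
                break  # score fell too far below the best seen: stop extending
        return best, best_len

    seed_score = sum(s(query[i + k], subject[j + k]) for k in range(w))
    right, right_len = extend(zip(query[i + w:], subject[j + w:]))
    left, left_len = extend(zip(reversed(query[:i]), reversed(subject[:j])))
    return (seed_score + left + right, i - left_len, j - left_len,
            left_len + w + right_len)


def hsps(query: str, subject: str, w: int = 3):
    """Distinct high-scoring segment pairs; seeds on one diagonal collapse to one hit."""
    return sorted({extend_seed(query, subject, i, j, w)
                   for i, j in find_seeds(query, subject, w)}, reverse=True)


print(hsps("MKTAYIAKQR", "GGMKTAYIAKGG"))  # [(16, 0, 2, 8)]
```

All six seeds in the example lie on one diagonal and extend to the same segment pair. Real BLAST avoids this repeated work by tracking diagonals that have already been extended.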
8. BLAST Search
- From BLAST Search Interface
- Table-Format Result with BLAST Output and SSEARCH (Smith-Waterman) Pair-Wise Alignment
9. BLAST/SSEARCH Results
10. Family Classification Methods
- Based on Family Information
- ClustalW Multiple Sequence Alignment
- ProSite Pattern Search
- Profile Search
- Hidden Markov Models (HMMs)
- Neural Networks
- Integrated Analysis
11. Multiple Sequence Alignment
- ClustalW
- Progressive Pairwise Approach
- Based on Exhaustive Pairwise Alignments
- Neighbor Joining
- Joining Order Corresponding to a Tree
- Alignment Varies
- Dependent on Joining Order
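ClustalW builds its joining order (guide tree) by neighbor joining on pairwise distances. The sketch below shows only the topology-building loop. At each step it joins the pair that minimises Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r is a node's total distance to all others. It then replaces the pair with a new node whose distance to each remaining node c is (d(a, c) + d(b, c) - d(a, b)) / 2. Branch lengths are omitted, and ties are broken by input order, which is an arbitrary choice.

```python
def neighbor_joining(names, dist):
    """Unrooted NJ topology as a Newick string (no branch lengths).

    dist maps each unordered pair (a, b) of names to a distance.
    """
    d = {a: {} for a in names}
    for (a, b), value in dist.items():
        d[a][b] = d[b][a] = value
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        pairs = [(a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]]
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        new = f"({a},{b})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to each remaining node.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return f"({nodes[0]},{nodes[1]});"


# Made-up distances for five sequences a..e.
D = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
     ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining(list("abcde"), D))  # ((c,(a,b)),(d,e));
```

A progressive aligner would then align a with b first, add c, align d with e, and finally merge the two groups. This is why the final alignment depends on the joining order.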
12. How do you build a tree?
- Pick sequences to align
- Align them
- Verify the alignment
- Keep the parts that are aligned correctly
- Build and evaluate a phylogenetic tree
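The "keep the parts that are aligned correctly" step can be approximated in several ways. The sketch below uses one simple, assumed criterion: drop alignment columns in which more than half of the sequences have a gap. The 0.5 threshold is hypothetical. It then computes the p-distance (fraction of differing residues over positions where both sequences have a residue), which can feed a distance-based tree method such as neighbor joining.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns whose fraction of '-' characters exceeds max_gap_fraction."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    keep = [c for c in range(length)
            if sum(row[c] == "-" for row in alignment) / len(alignment) <= max_gap_fraction]
    return ["".join(row[c] for c in keep) for row in alignment]


def p_distance(s1: str, s2: str) -> float:
    """Fraction of mismatches over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


trimmed = trim_gappy_columns(["AC-GT", "ACTGT", "A--GA"])
print(trimmed)                             # ['ACGT', 'ACGT', 'A-GA']
print(p_distance(trimmed[0], trimmed[2]))  # 1 mismatch in 3 compared columns
```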
13. Multiple Alignment and Tree
- From Text/Sequence Search Result or ClustalW Alignment Interface
15. Motif Patterns (Regular Expressions)
- Signature Patterns for Functional Motifs
- ProClass Motif Alignments
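PROSITE patterns translate directly into regular expressions. Elements are separated by "-"; "x" means any residue; [..] lists allowed residues and {..} forbidden ones; (n) or (n,m) gives a repeat count; and "<" or ">" anchors the pattern to the N- or C-terminus. The converter below handles only that core syntax. Less common forms, such as ">" inside brackets, are rejected rather than guessed. Sites are reported as 1-based start positions, and overlapping matches are included.

```python
import re

ELEMENT = re.compile(
    r"(<?)"                             # optional N-terminal anchor
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"  # any residue, one residue, allowed set, forbidden set
    r"(?:\((\d+)(?:,(\d+))?\))?"        # optional repeat count (n) or range (n,m)
    r"(>?)"                             # optional C-terminal anchor
)


def prosite_to_regex(pattern: str) -> str:
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} -> [^P]
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        regex.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(regex)


def find_sites(pattern: str, sequence: str):
    """1-based start positions of all (possibly overlapping) matches."""
    rx = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in rx.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))            # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}", "MNGSANPTQNKTV"))  # [2, 10]
```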
16. PIR Pattern Search
- From Text/Sequence Search Result or Pattern Search Interface
- One Query Sequence Against PROSITE Pattern Database
- One Query Pattern (PROSITE or User-Defined) Against Sequence DB
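The two search directions are the same matching operation looped two ways. In the sketch below, the pattern "database" holds three real PROSITE patterns, hand-translated to Python regular expressions. The sequence database and the function names are hypothetical.

```python
import re

# Three PROSITE patterns, hand-translated to Python regular expressions.
PATTERN_DB = {
    "PS00001": "N[^P][ST][^P]",  # N-{P}-[ST]-{P}   N-glycosylation site
    "PS00005": "[ST].[RK]",      # [ST]-x-[RK]      protein kinase C phosphorylation site
    "PS00006": "[ST].{2}[DE]",   # [ST]-x(2)-[DE]   casein kinase II phosphorylation site
}


def sites(regex: str, seq: str):
    """1-based start positions of all (possibly overlapping) matches."""
    return [m.start() + 1 for m in re.finditer(f"(?={regex})", seq)]


def sequence_vs_patterns(seq: str, patterns=PATTERN_DB):
    """One query sequence against a pattern database: {accession: sites}."""
    return {acc: hits for acc, rx in patterns.items() if (hits := sites(rx, seq))}


def pattern_vs_sequences(regex: str, seq_db: dict):
    """One query pattern against a sequence database: {sequence name: sites}."""
    return {name: hits for name, seq in seq_db.items() if (hits := sites(regex, seq))}


db = {"p1": "MSAKTE", "p2": "MNGSANPTQNKTV"}    # hypothetical sequences
print(sequence_vs_patterns(db["p2"]))           # {'PS00001': [2, 10]}
print(pattern_vs_sequences("[ST].[RK]", db))    # {'p1': [2]}
```

Short, permissive patterns such as [ST]-x-[RK] match very often by chance. Hits for them are weak evidence on their own.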
17. Pattern Search Result (I)
- One Query Sequence Against PROSITE Pattern Database
18. Pattern Search Result (II)
- One Query Pattern Against Sequence Database
19. Profile Method
- Profile: A Table of Scores Expressing the Family Consensus, Derived from Multiple Sequence Alignments
- Number of Rows = Number of Aligned Positions
- Each row contains a score for aligning each possible residue at that position
- Profile Searching
- Summation of Scores for Each Amino Acid Residue along the Query Sequence
- Higher Match Values at Conserved Positions
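A minimal profile can be built and searched as sketched below. The sketch assumes an ungapped alignment, adds a pseudocount of 1 to every residue count, and uses log2 odds against a uniform background of 1/20. Real profile methods use better background frequencies and position weighting. Each row holds a score for all 20 residues, and searching slides the profile along the query and sums one row score per residue.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One row per aligned position; each row maps every residue to a log2-odds score."""
    background = 1 / len(AMINO_ACIDS)  # assumed uniform background frequency
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                        for aa in AMINO_ACIDS})
    return profile


def best_window(profile, query: str):
    """Slide the profile along the query, summing row scores; return (score, 1-based start)."""
    width = len(profile)
    best = None
    for start in range(len(query) - width + 1):
        score = sum(profile[k][query[start + k]] for k in range(width))
        if best is None or score > best[0]:
            best = (score, start + 1)
    return best


profile = build_profile(["CKWG", "CRWG", "CKWA"])
print(best_window(profile, "AACKWGAA"))  # best window starts at position 3 (CKWG)
```

Fully conserved columns such as the C and W positions give the largest positive scores, which matches the slide's point that conserved positions carry higher match values.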
20. PIR HMM Domain/Motif Search
- From Text/Sequence Search Result or HMM Search Interface
- HMMER Model Building and Sequence Search
- Search One Query Protein Against All HMMs
- Search One HMM Against Sequence DB
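An HMM scores a sequence by the total probability of all state paths that could have emitted it (the forward algorithm). That probability is then compared with a null model as a log-odds score in bits, which is the kind of score HMMER reports. HMMER's profile HMMs have match, insert and delete states for every alignment column. The sketch below instead uses a hypothetical two-state toy model over a reduced h/p (hydrophobic/polar) alphabet, only to show the forward recursion and the log-odds step.

```python
import math


def forward_log_prob(seq, states, start, trans, emit):
    """Natural-log P(seq | model), summed over every state path (forward algorithm)."""
    # f[s] = P(symbols so far, current state is s)
    f = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        f = {s: emit[s][symbol] * sum(f[r] * trans[r][s] for r in states)
             for s in states}
    return math.log(sum(f.values()))


def log_odds_bits(seq, model, null_emit):
    """log2 of P(seq | model) / P(seq | i.i.d. null model)."""
    null = sum(math.log(null_emit[c]) for c in seq)
    return (forward_log_prob(seq, *model) - null) / math.log(2)


# Hypothetical toy model: h = hydrophobic, p = polar.
STATES = ("core", "loop")
START = {"core": 0.5, "loop": 0.5}
TRANS = {"core": {"core": 0.9, "loop": 0.1}, "loop": {"core": 0.1, "loop": 0.9}}
EMIT = {"core": {"h": 0.8, "p": 0.2}, "loop": {"h": 0.3, "p": 0.7}}
MODEL = (STATES, START, TRANS, EMIT)
NULL = {"h": 0.5, "p": 0.5}

print(round(log_odds_bits("hhhhhhhh", MODEL, NULL), 2))  # positive: fits the model better than the null
```

Multiplying raw probabilities underflows for real protein lengths. Practical implementations work in log space or rescale at each step.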
21. HMM Search Result (I)
- One Query Protein Against All Pfam HMMs
22. HMM Search Result (II)
- Search User-Built HMM Against Protein Sequence DB
- Input Sequences (Optional Residue Ranges) -> Multiple Sequence Alignment -> Model Building -> HMM Search
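The user-built HMM pipeline can be reproduced locally with command-line tools. The sketch below makes several assumptions. Clustal Omega (`clustalo`) stands in for the alignment step, the output file names are hypothetical, and the tools must already be installed. It does use HMMER's `hmmbuild <hmmfile> <msafile>` and `hmmsearch --tblout <file> <hmmfile> <seqdb>` argument order. The residue-range step simply slices each sequence before alignment.

```python
import shutil
import subprocess
from pathlib import Path


def format_fasta(records, ranges=None):
    """records: {name: sequence}; ranges: optional {name: (start, end)}, 1-based inclusive."""
    ranges = ranges or {}
    lines = []
    for name, seq in records.items():
        start, end = ranges.get(name, (1, len(seq)))
        lines += [f">{name}/{start}-{end}", seq[start - 1:end]]
    return "\n".join(lines) + "\n"


def pipeline_commands(fasta, msa, hmm, seqdb, tblout):
    """Alignment -> model building -> HMM search, as argument lists."""
    return [
        ["clustalo", "-i", fasta, "-o", msa, "--outfmt=stockholm"],
        ["hmmbuild", hmm, msa],
        ["hmmsearch", "--tblout", tblout, hmm, seqdb],
    ]


def run_pipeline(records, seqdb, workdir, ranges=None):
    """Write the (trimmed) input, run each step, and return the hit-table path."""
    work = Path(workdir)
    fasta, msa, hmm, tbl = (str(work / n) for n in
                            ("input.fa", "input.sto", "family.hmm", "hits.tbl"))
    Path(fasta).write_text(format_fasta(records, ranges))
    for cmd in pipeline_commands(fasta, msa, hmm, seqdb, tbl):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not on PATH")
        subprocess.run(cmd, check=True)
    return tbl
```

The opposite direction, one protein against all Pfam HMMs, corresponds to HMMER's `hmmscan`. That command expects an HMM database prepared with `hmmpress`.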
23. Secondary Structure Features
- α Helix: Patterns of Hydrophobic Residue Conservation Showing an i, i+3, i+4, i+7 Pattern Are Highly Indicative of an α Helix (Amphipathic)
- β Strands That Are Half Buried in the Protein Core Will Tend to Have Hydrophobic Residues at Positions i, i+2, i+4, i+6
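Both periodicities can be detected with a hydrophobic moment. Each residue is placed on a circle, advancing 100° per residue for an α helix (3.6 residues per turn) or about 180° for a β strand, and the hydrophobicity vectors are summed. The sketch below simplifies this with an assumed binary scale (1 for the residues in `HYDROPHOBIC`, 0 otherwise) instead of a published hydrophobicity scale. It normalises the moment by sequence length.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")  # assumed binary hydrophobicity scale


def hydrophobic_moment(seq: str, angle_deg: float) -> float:
    """|sum_n h_n * exp(i * n * angle)| / len(seq), with h_n in {0, 1}."""
    theta = math.radians(angle_deg)
    x = sum(math.cos(n * theta) for n, aa in enumerate(seq) if aa in HYDROPHOBIC)
    y = sum(math.sin(n * theta) for n, aa in enumerate(seq) if aa in HYDROPHOBIC)
    return math.hypot(x, y) / len(seq)


helix_like = "LKKLLKKL"   # hydrophobic at i, i+3, i+4, i+7
strand_like = "VKVKVKVK"  # hydrophobic at i, i+2, i+4, i+6
for name, seq in (("helix-like", helix_like), ("strand-like", strand_like)):
    print(name, round(hydrophobic_moment(seq, 100), 3), round(hydrophobic_moment(seq, 180), 3))
```

The helix-like segment has a large moment at 100° and almost none at 180°, while the strand-like segment shows the reverse. In other words, each pattern places its hydrophobic residues on one face of the structure.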
24. Integrated Bioinformatics System for Function and Pathway Discovery
- Data Integration
- Associative Analysis
25. Analytical Pipeline
26. Integrated Bioinformatics System
- Global Bioinformatics Analysis of Thousands of Genes and Proteins
- Pathway Discovery, Target Identification
27. Lab Section
28. Peptide Search Results
29. BLAST Similarity Search
30. BLAST Search Results
31. Pair-Wise Alignment
32. Multiple Sequence Alignment
33. Pattern Search Results
34. HMM Domain Search Result
35. Building an HMM Profile
36. Using an HMM Profile for Searching
37. Rabbit Alpha Crystallin A Chain: An iProClass View of the Entry
38. alpha-Crystallin and Related Proteins