Title: Introduction to Bioinformatics
1. Introduction to Bioinformatics
- Phylogenetics
- Part III
- Character-Based Methods
2. Distance Methods
- Complexity
  - Distance-based methods are much faster than other methods
  - Commonly used in multiple sequence alignment
    - UPGMA: PILEUP
    - Neighbor-joining: CLUSTALW (http://www.ebi.ac.uk/clustalw/)
- Problems
  - Both UPGMA and neighbor-joining are greedy heuristics
  - Possible to be trapped in local optima (no backtracking)
  - Output is a single tree, even if there are many equal-cost alternatives
  - Big picture only
- May use as starting point
  - Tree generated provides an upper bound for branch-and-bound
  - Initial tree for probabilistic branch-swapping techniques
- Other approaches?
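To make the "greedy, no backtracking" point concrete, here is a minimal UPGMA sketch. The function name `upgma`, the distance-dictionary layout, and the toy distances are assumptions chosen for illustration; they are not taken from PILEUP or any other package.

```python
from itertools import combinations

def upgma(names, dist):
    """Greedy UPGMA clustering (topology only, no branch lengths).

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns the tree as nested tuples.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Greedy step: merge the closest pair; the choice is never revisited.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Size-weighted average of the distances to the merged clusters.
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))

# Hypothetical toy distances for four taxa.
toy = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("AD"): 10,
    frozenset("BC"): 6, frozenset("BD"): 10, frozenset("CD"): 10,
}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

Ties in `min` are broken by iteration order, which is one way a single tree is reported even when equal-cost alternatives exist.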
3. Character-Based Methods
- Maximum Parsimony (MP) [Fitch 1971]
  - Minimize the number of sequence changes in the tree
  - Assume the fewest changes (mutations) are the most likely (evolution)
- Informative site
  - Position with useful change information (for parsimony)
  - I.e., the number of changes at the position depends on the tree chosen
  - Must have at least 2 different bases / residues, such that each of these bases / residues appears in at least 2 sequences
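The informative-site rule can be checked column by column. The sketch below assumes an ungapped alignment given as equal-length strings; the function name and the toy alignment are hypothetical and are not the example from the slides.

```python
from collections import Counter

def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative columns."""
    sites = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(column)
        # Informative: at least two states, each seen in at least two sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites

# Hypothetical toy alignment (one string per sequence).
alignment = ["AAGCT", "AAGGT", "ATCCT", "ACCGA"]
print(informative_sites(alignment))
```

Columns where only one state repeats, such as the second column here (A, A, T, C), cost the same number of changes on every tree and are therefore skipped.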
4. MP Example
- What are the informative sites in this example?
- Build distance matrix
6. MP Example
- Most parsimonious tree
  - Tree with the fewest total number of changes at informative sites
- Continue with our example
  - Informative sites
    - Seq1: GG
    - Seq2: GT
    - Seq3: AG
    - Seq4: AT
  - Number of site changes
    - Tree 1: __
    - Tree 2: __
    - Tree 3: __
- Which tree?
7. MP Method
- Algorithm
  - Generate all possible tree topologies
  - Count the number of changes required
  - Select the tree with the minimum number of changes
- Use branch-and-bound to reduce the search
  - Search trees with an increasing number of leaves
  - Abandon a subtree when its number of changes exceeds that of the best completed tree
- Characteristics
  - Computationally expensive
  - Analyzes only informative sites
  - Misleading if rates of change vary among branches
  - Evolution is not always parsimonious
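To see why enumerating every topology is computationally expensive, one can count the trees. The sketch below uses the standard result that n labeled leaves admit (2n - 5)!! unrooted binary topologies for n >= 3; the function name is an assumption for illustration.

```python
def num_unrooted_trees(n_leaves):
    """Number of unrooted binary topologies: (2n - 5)!! = 1 * 3 * ... * (2n - 5)."""
    count = 1
    # The range is empty for n_leaves <= 3, so the result is 1 in those cases.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, num_unrooted_trees(n))
```

The count grows faster than exponentially with the number of leaves, which is the motivation for pruning the search with branch-and-bound.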
8. MP Method
- Can infer ancestors
- An internal node
  - Intersection of two children, if it is not empty
  - Union of two children, otherwise
- Number of unions = number of substitutions
- Example internal-node state set from the slide figure: (GT)
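The intersection / union rule can be sketched as a short recursive function. The nested-tuple tree encoding and the function names are assumptions for illustration. The sequences are the two informative sites listed on the earlier MP Example slide. The three topologies are simply the three possible groupings of four sequences, rooted arbitrarily; how they correspond to "Tree 1" to "Tree 3" on that slide is not known, because the drawings did not survive.

```python
def fitch(tree, states):
    """Return (state_set, changes) for one site on a rooted binary tree.

    tree:   a leaf name (str) or a pair (left, right) of subtrees.
    states: dict mapping leaf name -> observed base at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: no extra substitution is needed here.
        return common, left_changes + right_changes
    # Empty intersection: take the union and count one substitution.
    return left_set | right_set, left_changes + right_changes + 1

def parsimony_score(tree, sequences):
    """Sum the per-site change counts over all sites."""
    n_sites = len(next(iter(sequences.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in sequences.items()})[1]
        for i in range(n_sites)
    )

sequences = {"Seq1": "GG", "Seq2": "GT", "Seq3": "AG", "Seq4": "AT"}
topologies = {
    "((Seq1,Seq2),(Seq3,Seq4))": (("Seq1", "Seq2"), ("Seq3", "Seq4")),
    "((Seq1,Seq3),(Seq2,Seq4))": (("Seq1", "Seq3"), ("Seq2", "Seq4")),
    "((Seq1,Seq4),(Seq2,Seq3))": (("Seq1", "Seq4"), ("Seq2", "Seq3")),
}
scores = {label: parsimony_score(t, sequences) for label, t in topologies.items()}
for label, score in scores.items():
    print(label, score)
```

Each union at an internal node adds one substitution, so the returned count equals the number of unions taken on the way up the tree.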
9. Tree Construction Issues
- Selecting tree construction algorithm
  - If strong sequence similarity → maximum parsimony
  - If clearly recognizable sequence similarity → distance methods
  - Otherwise → maximum likelihood
- Determining statistical significance
  - Multiple tree shapes possible
  - Find probability that tree shape is as described
  - Sample by bootstrapping [Efron & Tibshirani 1993]
    - Generate an artificial data set (pseudo-alignment) by repeatedly selecting random columns of the alignment with replacement
    - Build trees for the pseudo-alignments many (1000) times
    - Frequency with which a phylogenetic feature appears → confidence level
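The bootstrap steps above can be sketched as follows. All names are hypothetical. To keep the example self-contained, tree building is replaced by a stand-in test (`first_two_closest`) that asks whether the first two sequences are the closest pair by Hamming distance. In practice, a tree would be built from each pseudo-alignment and checked for the feature of interest.

```python
import random

def pseudo_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original width."""
    n_cols = len(alignment[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_support(alignment, has_feature, replicates=1000, seed=0):
    """Fraction of pseudo-alignments for which has_feature(...) is true."""
    rng = random.Random(seed)
    hits = sum(
        bool(has_feature(pseudo_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return hits / replicates

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def first_two_closest(aln):
    """Stand-in feature: sequences 0 and 1 are strictly the closest pair."""
    others = [
        hamming(aln[i], aln[j])
        for i in range(len(aln))
        for j in range(i + 1, len(aln))
        if (i, j) != (0, 1)
    ]
    return hamming(aln[0], aln[1]) < min(others)

# Hypothetical toy alignment.
alignment = ["AAGCTTGA", "AAGCTAGA", "ATCCGTCA", "ACCGGTCA"]
print(bootstrap_support(alignment, first_two_closest, replicates=1000))
```

Fixing the seed makes the estimate reproducible. The reported fraction plays the role of the bootstrap value attached to a branch.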
10. Tree with Bootstrap Values
Source: http://fungal.genome.duke.edu/images/fungi_subset_tree.jpg
11. Phylogenetics Issues
- Gene trees vs. species trees
  - Gene duplication can complicate phylogenetic analysis
  - Paralogues (duplicated genes) do not fit in the evolutionary tree
- Choice of target sequence type
  - Ribosomal RNA (slowest change / mutation rate)
    - Use for very long-term evolutionary studies, spanning species boundaries and biological kingdoms
  - DNA / RNA (fastest change / mutation rate)
    - Use for short-term studies of closely related species
    - Contains more evolutionary information than protein
  - Protein (medium change / mutation rate)
    - Use for wide species comparisons
    - More reliable alignment than DNA
12. Phylogenetics Summary
- Phylogenetic prediction
  - Infer evolutionary relationships from shared features
  - May have application to sequence alignment, epidemiology
  - Phenotypic vs. genetic (i.e., molecular) characteristics
- Phylogenetic trees
  - May be ultrametric and / or additive
- Tree construction
  - Inexpensive: distance-based (UPGMA, neighbor-joining)
  - Expensive: (exhaustive) tree searches (parsimony, likelihood)
- Assessing phylogenetic trees
  - Algorithms always produce some tree (of varying accuracy)
  - Expert biology knowledge is needed to assess correctness / significance
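As a reminder of what "ultrametric" and "additive" mean for a distance matrix, the sketch below tests the standard three-point and four-point conditions. It assumes a symmetric matrix with a zero diagonal that already satisfies the triangle inequality. The function names and both toy matrices are hypothetical.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        _, mid, big = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(big - mid) > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairings, the two largest sums are equal."""
    for i, j, k, m in combinations(range(len(d)), 4):
        sums = sorted((
            d[i][j] + d[k][m],
            d[i][k] + d[j][m],
            d[i][m] + d[j][k],
        ))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Toy matrix consistent with a clock-like tree.
clock_like = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
# Toy matrix from a tree ((A,B),(C,D)) with unequal branch lengths.
unequal_rates = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
print(is_ultrametric(clock_like), is_additive(clock_like))
print(is_ultrametric(unequal_rates), is_additive(unequal_rates))
```

The second matrix fits a tree exactly but not a molecular clock, which is the situation where UPGMA can be misled while neighbor-joining is not.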
13. Phenetics vs. Cladistics
- Phenetics uses all the data
  - Uses overall similarity to group taxa, not necessarily evolutionary relationships
  - Any kind of object could be subjected to a phenetic analysis
  - Taxa that are more similar are grouped together
- Cladistics uses only informative sites
  - Taxa are grouped together based on patterns of sharing of derived character states
  - Taxa sharing a derived character state do so because they inherited this character state through a common ancestor
- Advantages of the cladistic approach
  - Less susceptible to rate variation
  - Shared, derived character states won't mislead you
- Birds are dinosaurs: the cladistic perspective
14. Applications: Building Tree of Life
15. Applications: Building Tree of Life
Source: http://www.isem.univ-montp2.fr/PPP/PM/RES/Phylo/Mamm/PHYLMOL-Placentalia7EEnglish.jpg
16. Application: CSI
- Which patients are more likely to have been infected by the dentist?
Source: http://trc.ucdavis.edu/djbegun/Lect_12.1.html
17. Application: Human Migration
- Based on mtDNA genome
  - Africans have twice as much diversity among them as do non-Africans → Africans have a longer genetic history
  - More recent population expansion for non-Africans
  - Africans and non-Africans diverged recently
- Out of Africa
Source: Discovering Genomics, Proteomics, and Bioinformatics, by Campbell and Heyer
18. Software
- http://evolution.genetics.washington.edu/phylip/software.html
  - 301 of the phylogeny packages
  - 39 free servers