1
Computing the Tree of Life

The University of Texas at Austin
Department of Computer Sciences
Tandy Warnow

2
Phylogeny
From the Tree of Life Website, University of Arizona
Orangutan
Human
Gorilla
Chimpanzee
3
DNA Sequence Evolution
4
Molecular Phylogenetics
[Figure: five taxa labeled U, V, W, X, Y, with the sequences TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT, and an unrooted tree on the five taxa]
(Tree is unrooted)
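
To make the input to a distance-based method concrete, here is a minimal sketch that computes pairwise Hamming distances (the number of differing sites) between the five sequences on this slide. The names s1 to s5 are placeholders, since no pairing of taxon labels with sequences is assumed here, and real analyses typically use model-corrected distances rather than raw counts.

```python
from itertools import combinations

# The five sequences from the slide. The keys s1..s5 are placeholder
# names (an assumption), not the taxon labels U..Y.
sequences = {
    "s1": "TAGCCCA",
    "s2": "TAGACTT",
    "s3": "TGCACAA",
    "s4": "TGCGCTT",
    "s5": "AGGGCAT",
}


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: dict) -> dict:
    """Hamming distance for every unordered pair of sequences."""
    return {(p, q): hamming(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}


if __name__ == "__main__":
    for (p, q), d in distance_matrix(sequences).items():
        print(p, q, d)
```

A matrix like this is the kind of input that distance-based methods such as neighbor joining (NJ) start from.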
6
Evolution informs about everything in biology

Big genome sequencing projects just produce data -- so what?
Evolutionary history relates all organisms and genes, and helps us understand and predict:
interactions between genes (genetic networks)
drug design
predicting functions of genes
influenza vaccine development
origins and spread of disease
origins and migrations of humans

7
Evolutionary trees and the pharmaceutical industry

Big genome sequencing projects just produce data -- so what? Evolutionary history relates all organisms and genes, and evolutionary trees are used to make important biological discoveries.
The pharmaceutical industry uses phylogenies for many applications, such as the development of influenza vaccines!
Inaccuracies in the phylogenies lead to inaccurate predictions (e.g., vaccines that don't work, drugs that don't have the required properties). Current software isn't accurate enough, or fast enough!
This means !

8
NSF's program for Assembling the Tree of Life

The Tree of Life has proven useful in many fields, such as
choosing experimental systems for biological research,
determining which genes are common to many kinds of organisms and which are unique,
tracking the origin and spread of emerging diseases and their vectors,
bio-prospecting for pharmaceutical and agrochemical products,
developing databases for genetic information, and
evaluating risk factors for species conservation and ecosystem restoration.

9
Computational challenges for Assembling the Tree of Life (NSF)

8 million species for the Tree of Life -- we cannot currently analyze more than a few hundred (and even this takes years)
We need new methods for inferring large phylogenies - hard optimization problems!
We need new software for visualizing large trees
We need new database technology
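
One way to see why these optimization problems are hard: the number of distinct unrooted binary tree topologies on n labeled leaves is (2n-5)!!, a standard combinatorial result that is not stated on the slide. A minimal sketch of that count, with a hypothetical function name:

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies on n labeled leaves: (2n-5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(n, num_unrooted_binary_trees(n))
```

Already at 10 taxa there are 2,027,025 topologies, so exhaustive search is out of reach long before a few hundred taxa, let alone millions.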

10
We are world leaders in research in Computational Phylogenetics

DCM-boosting for phylogeny reconstruction - improves accuracy and speeds up heuristics for NP-hard problems (Warnow, UT-Austin)
GRAPPA -- software for whole genome phylogeny (Moret, UNM)
Visualization of large trees and sets of trees (Amenta, UC Davis)
Phylogenetic databases (Miranker)

11
DCM-boosting improves methods (Nakhleh et al., ISMB 2001)

Random trees
K2P+Gamma model
Sequence length = 1000
Average branch length = 0.05

12
(Figure) Nakhleh et al., ISMB 2001

Random trees
K2P+Gamma model
Sequence length = 1000
Average branch length = 0.05

[Figure: Avg. RF (y-axis, 0 to 0.8) versus No. Taxa (x-axis, 0 to 1600); curve labels: NJ, DCM-NJ, DCM-NJ+MP, HGT-FP]
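
The "Avg. RF" axis refers to the Robinson-Foulds (RF) measure, which compares an inferred tree with the true tree by counting the bipartitions (splits) found in one tree but not the other. The sketch below is a minimal illustration, not the code used in the study. It assumes trees are written as nested tuples of leaf names, and it normalizes by 2(n-3), the maximum for two binary trees on n leaves; whether the figure uses exactly this normalization is an assumption.

```python
def leaves(tree) -> frozenset:
    """All leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree) -> set:
    """Nontrivial splits of the tree, each stored as the side without the anchor leaf."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)  # fixed leaf used to pick one side of each split
    splits = set()

    def visit(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1, t2) -> int:
    """Number of splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


def rf_rate(t1, t2) -> float:
    """RF distance divided by 2(n-3), assuming binary trees on n >= 4 leaves."""
    n = len(leaves(t1))
    return rf_distance(t1, t2) / (2 * (n - 3))


if __name__ == "__main__":
    true_tree = (("A", "B"), "C", ("D", "E"))
    inferred = (("A", "C"), "B", ("D", "E"))
    print(rf_distance(true_tree, inferred), rf_rate(true_tree, inferred))
```

In the example, the two trees share the split separating D and E from the rest but disagree on the other one, giving an RF distance of 2 and a rate of 0.5.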
13
DCM-boosting phylogenetic reconstruction methods (Nakhleh et al., ISMB 2001)

DCM-boosting makes fast methods more accurate
DCM-boosting speeds up heuristics for hard optimization problems

[Figure: Error Rate (y-axis, 0 to 0.8) versus No. Taxa (x-axis, 0 to 1600) for NJ and DCM-NJ]
14
Whole-Genome Phylogenetics
17
Benchmark gene order dataset: Campanulaceae

12 genomes + 1 outgroup (Tobacco), 105 gene segments
NP-hard optimization problems: breakpoint and inversion phylogenies
1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)
2003: Using latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
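
For readers unfamiliar with gene-order data, the sketch below illustrates the breakpoint distance that underlies breakpoint phylogenies. It is a minimal illustration under stated assumptions, not GRAPPA's or BPAnalysis's code: genomes are circular, genes are nonzero signed integers, both genomes contain the same genes exactly once, and there are at least three genes. An adjacency (a, b) counts as conserved if the other genome contains a followed by b, or -b followed by -a.

```python
def adjacencies(genome: list) -> set:
    """Adjacencies of a circular signed gene order, in a strand-independent form."""
    n = len(genome)
    adj = set()
    for i in range(n):
        a, b = genome[i], genome[(i + 1) % n]  # wrap around: genome is circular
        # (a, b) read on one strand is (-b, -a) on the other; store one canonical form.
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(g1: list, g2: list) -> int:
    """Number of adjacencies of g1 that are not conserved in g2."""
    if sorted(map(abs, g1)) != sorted(map(abs, g2)):
        raise ValueError("genomes must contain the same set of genes")
    return len(adjacencies(g1) - adjacencies(g2))


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 reversed and sign-flipped
    print(breakpoint_distance(reference, inverted))
```

A single inversion breaks the two adjacencies at its endpoints, which is why the example yields a distance of 2. Finding the tree that minimizes the total of such distances over its edges is the NP-hard problem the benchmark refers to.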

18
GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms)

http://www.cs.unm.edu/moret/GRAPPA/
Heuristics for NP-hard optimization problems
Fast polynomial time distance-based methods
Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna, Italy
Fastest and most accurate software for whole genome phylogeny worldwide

19
Opportunities

New phylogenetic reconstruction software can improve pharmaceutical R&D (making more accurate solutions achievable in hours or days, rather than months or years)
Software for researchers is available for free (open source), but users need the latest tools now, with proper interfaces -- a business opportunity.

20
Participants and Funding

University of Texas computer scientists: Warnow, Dhillon, Hunt, and Miranker
University of Texas biologists: Jansen, Linder, and Hillis
Other institutions: UNM, UC Davis, Central Washington, CUNY, JGI
Funding: Three NSF ITR grants, NSF Biocomplexity, David and Lucile Packard Foundation

21
Phylolab, U. Texas
Please visit us at http://www.cs.utexas.edu/users/phylo/
