Computing the Tree of Life

1
Computing the Tree of Life
  • The University of Texas at Austin
  • Department of Computer Sciences
  • Tandy Warnow

2
Phylogeny
From the Tree of Life Website, University of Arizona
(Figure: phylogeny of Orangutan, Gorilla, Chimpanzee, and Human)
3
DNA Sequence Evolution
(Figure: DNA sequences AAGACTT, TGGACTT, AAGGCCT, ... on a time axis: -3 mil yrs, -2 mil yrs, ...)
4
Molecular Phylogenetics
Taxa: U, V, W, X, Y
Sequences: TAGCCCA, TAGACTT, TGCACAA, TGCGCTT, AGGGCAT
(Figure: tree on the leaves X, U, Y, V, W; the tree is unrooted)
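
One simple way to compare aligned DNA sequences like the five on this slide is to count the positions at which they differ (the Hamming distance). The sketch below is an illustration only: the slide does not say which distance or method is used, the function names are hypothetical, and the sequences are keyed by their own text because this transcript does not pair them with taxon labels.

```python
from itertools import combinations

# The five sequences shown on the slide.
SEQUENCES = ["TAGCCCA", "TAGACTT", "TGCACAA", "TGCGCTT", "AGGGCAT"]


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(seqs):
    """Map each unordered pair of sequences to its Hamming distance."""
    return {(a, b): hamming(a, b) for a, b in combinations(seqs, 2)}


if __name__ == "__main__":
    for (a, b), d in pairwise_distances(SEQUENCES).items():
        print(a, b, d)
```

Distance-based methods such as neighbor joining (NJ) start from a matrix of this kind, usually after correcting the raw counts under a model of sequence evolution.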
5
Phylogeny
From the Tree of Life Website, University of Arizona
(Figure: phylogeny of Orangutan, Gorilla, Chimpanzee, and Human)
6
Evolution informs about everything in biology
  • Big genome sequencing projects just produce data
    -- so what?
  • Evolutionary history relates all organisms and
    genes, and helps us understand and predict:
      • interactions between genes (genetic networks)
      • drug design
      • functions of genes
      • influenza vaccine development
      • origins and spread of disease
      • origins and migrations of humans

7
Evolutionary trees and the pharmaceutical industry
  • Big genome sequencing projects just produce data
    -- so what? Evolutionary history relates all
    organisms and genes, and evolutionary trees are
    used to make important biological discoveries.
  • The pharmaceutical industry uses phylogenies for
    many applications, such as the development of
    influenza vaccine!
  • Inaccuracies in the phylogenies lead to
    inaccurate predictions (e.g., vaccines that don't
    work, drugs that don't have the required
    properties). Current software isn't accurate
    enough, or fast enough!
  • This means !

8
NSF's program for Assembling the Tree of Life
  • The Tree of Life has proven useful in many
    fields, such as:
      • choosing experimental systems for biological
        research,
      • determining which genes are common to many kinds
        of organisms and which are unique,
      • tracking the origin and spread of emerging
        diseases and their vectors,
      • bio-prospecting for pharmaceutical and
        agrochemical products,
      • developing databases for genetic information, and
      • evaluating risk factors for species conservation
        and ecosystem restoration.

9
Computational challenges for Assembling the Tree
of Life (NSF)
  • 8 million species for the Tree of Life -- cannot
    currently analyze more than a few hundred (and
    even this takes years)
  • We need new methods for inferring large
    phylogenies - hard optimization problems!
  • We need new software for visualizing large trees
  • We need new database technology
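
To give a feel for why inferring large phylogenies is hard, the sketch below counts the unrooted binary trees on n labeled leaves using the standard formula (2n-5)!! = 3 x 5 x ... x (2n-5). The function name is hypothetical and the code is only an illustration of how fast the search space grows; it is not taken from the slides.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Count unrooted binary trees on n labeled leaves: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (an empty product when n == 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, num_unrooted_binary_trees(n))
```

Even at 10 leaves there are already more than two million candidate trees, so exhaustive search is out of reach long before the dataset sizes mentioned above.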

10
We are world leaders in research in Computational
Phylogenetics
  • DCM-boosting for phylogeny reconstruction -
    improves accuracy and speeds up heuristics for
    NP-hard problems (Warnow, UT-Austin)
  • GRAPPA -- software for whole genome phylogeny
    (Moret, UNM)
  • Visualization of large trees, and sets of trees
    (Amenta, UC Davis)
  • Phylogenetic databases (Miranker)

11
DCM-boosting improves methods (Nakhleh et al.,
ISMB 2001)
  • Random trees
  • K2P+Gamma model
  • Sequence length: 1000
  • Average branch length: 0.05

12
(Figure) Nakhleh et al., ISMB 2001
  • Random trees
  • K2P+Gamma model
  • Sequence length: 1000
  • Average branch length: 0.05

(Figure: Avg. RF, from 0 to 0.8, versus No. Taxa, from 0 to 1600, for DCM-NJ, NJ, DCM-NJ+MP, and HGT-FP)
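
The "Avg. RF" axis refers to the Robinson-Foulds (RF) measure, which compares two trees by the bipartitions (splits) of the leaf set that their internal edges induce. The sketch below is a minimal illustration under stated assumptions: trees are written as nested Python tuples with string leaf names, both trees have the same leaves, and the error is normalized by the total number of non-trivial splits in the two trees. The exact normalization used in the cited study is not given in this transcript, and all function names here are hypothetical.

```python
def leaf_set(tree):
    """All leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Non-trivial splits of the leaf set, each stored as the side
    that does NOT contain a fixed anchor leaf (so rooting is ignored)."""
    leaves = leaf_set(tree)
    anchor = min(leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_rate(tree_a, tree_b) -> float:
    """Symmetric difference of the split sets, divided by their total size."""
    a, b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


if __name__ == "__main__":
    true_tree = (("U", "V"), "W", ("X", "Y"))
    estimate = (("U", "W"), "V", ("X", "Y"))
    print(rf_rate(true_tree, estimate))
```

A value of 0 means the two trees share every non-trivial split, and a value of 1 means they share none.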
13
DCM-boosting phylogenetic reconstruction
methods (Nakhleh et al., ISMB 2001)
  • DCM-boosting makes fast methods more accurate
  • DCM-boosting speeds up heuristics for hard
    optimization problems

(Figure: Error Rate, from 0 to 0.8, versus No. Taxa, from 0 to 1600, for NJ and DCM-NJ)
14
Whole-Genome Phylogenetics
15
Benchmark gene order dataset: Campanulaceae
  • 12 genomes + 1 outgroup (Tobacco), 105 gene
    segments
  • NP-hard optimization problems: breakpoint and
    inversion phylogenies
  • 1997: BPAnalysis (Blanchette and Sankoff): 200
    years (est.)

16
Benchmark gene order dataset: Campanulaceae
  • 12 genomes + 1 outgroup (Tobacco), 105 gene
    segments
  • NP-hard optimization problems: breakpoint and
    inversion phylogenies
  • 1997: BPAnalysis (Blanchette and Sankoff): 200
    years (est.)
  • 2000: Using GRAPPA v1.1 on the 512-processor Los
    Lobos Supercluster machine: 2 minutes
    (200,000-fold speedup per processor)

17
Benchmark gene order dataset: Campanulaceae
  • 12 genomes + 1 outgroup (Tobacco), 105 gene
    segments
  • NP-hard optimization problems: breakpoint and
    inversion phylogenies
  • 1997: BPAnalysis (Blanchette and Sankoff): 200
    years (est.)
  • 2000: Using GRAPPA v1.1 on the 512-processor Los
    Lobos Supercluster machine: 2 minutes
    (200,000-fold speedup per processor)
  • 2003: Using latest version of GRAPPA: 2 minutes
    on a single processor (1-billion-fold speedup per
    processor)
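
Breakpoint phylogenies are built on the breakpoint distance between gene orders. The sketch below shows one common definition for signed circular genomes: an adjacency in the first genome counts as a breakpoint if it appears in the second genome neither as is nor reversed with both signs flipped. This is an illustration under those assumptions, with genes encoded as nonzero integers; it is not the code used in BPAnalysis or GRAPPA, and the function name is hypothetical.

```python
def breakpoint_distance(g1, g2) -> int:
    """Breakpoints of signed circular gene order g1 relative to g2.

    Each genome is a list of nonzero ints over the same gene set;
    the sign gives the gene's orientation.
    """
    if sorted(abs(x) for x in g1) != sorted(abs(x) for x in g2):
        raise ValueError("genomes must contain the same genes")
    n = len(g2)
    adjacencies = set()
    for i in range(n):
        a, b = g2[i], g2[(i + 1) % n]  # wrap around: the genome is circular
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))      # the same adjacency read on the other strand
    return sum(
        1
        for i in range(len(g1))
        if (g1[i], g1[(i + 1) % len(g1)]) not in adjacencies
    )


if __name__ == "__main__":
    reference = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted
    print(breakpoint_distance(reference, inverted))
```

A single inversion creates at most two breakpoints, which is why the breakpoint count is often used as a cheap stand-in for the inversion distance.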

18
GRAPPA (Genome Rearrangement Analysis under
Parsimony and other Phylogenetic Algorithms)
  • http://www.cs.unm.edu/moret/GRAPPA/
  • Heuristics for NP-hard optimization problems
  • Fast polynomial-time distance-based methods
  • Contributors: U. New Mexico, U. Texas at Austin,
    Università di Bologna, Italy
  • Fastest and most accurate software for whole
    genome phylogeny worldwide

19
Opportunities
  • New phylogenetic reconstruction software can
    improve pharmaceutical R&D (making more accurate
    solutions achievable in hours or days, rather
    than months or years)
  • Software for researchers is freely available
    (open source), but users need the latest tools
    now, with proper interfaces -- business
    opportunity.

20
Participants and Funding
  • University of Texas computer scientists: Warnow,
    Dhillon, Hunt, and Miranker
  • University of Texas biologists: Jansen,
    Linder, and Hillis
  • Other institutions: UNM, UC Davis, Central
    Washington, CUNY, JGI
  • Funding: Three NSF ITR grants, NSF Biocomplexity,
    David and Lucile Packard Foundation

21
Phylolab, U. Texas
Please visit us at http://www.cs.utexas.edu/users/phylo/