Title: RNAsim/CRIMSON Algorithm Benchmark Suite
1. RNAsim/CRIMSON Algorithm Benchmark Suite
- U Penn: Junhyong Kim, Sampath Kannan, Susan Davidson, Steve Fisher, Sheng Guo
- U Texas: David Hillis, Lauren Meyers, Tracey Heath, Derrick Zwickl
- NC State: Spencer Muse
- Florida State: Mark Holder
- Yale: Paul Turner
2. Goal
Develop validated datasets of sufficient complexity and scale to realistically benchmark the latest tree algorithms.
3. Benchmark Infrastructure
[Diagram. Box labels: Model Characterization; Simulators; Character Evolution Simulators; Taxon Sampling; Database; Tree Topology Simulators; Data Subset with Associated Subtree; Others (Tree/Char Combined, Experimental Evolution, Virtual Cell, etc.); Model Sampling; Format Translators; RNAsim; CRIMSON; PAUP, etc.]
4. Benchmark Scheme
- Generate a very large dataset (>10^6 positions) over a very large tree (>10^6 taxa) using various models of evolution
- Store the data in a database
- Retrieve subsets of the data by various sampling schemes
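The store-then-sample workflow above can be sketched on a toy scale. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in database, the sequences are random rather than evolved under any model, and the table name `alignment`, its columns, and the helpers `build_toy_database` and `sample_taxa` are hypothetical names, not part of the actual infrastructure.

```python
import random
import sqlite3

def build_toy_database(num_taxa=50, num_positions=20, seed=1):
    """Store random toy sequences (no evolutionary model) in an in-memory DB."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alignment (taxon TEXT PRIMARY KEY, seq TEXT)")
    rows = [
        (f"t{i}", "".join(rng.choice("ACGU") for _ in range(num_positions)))
        for i in range(num_taxa)
    ]
    conn.executemany("INSERT INTO alignment VALUES (?, ?)", rows)
    conn.commit()
    return conn

def sample_taxa(conn, k, seed=2):
    """Retrieve a uniform random subset of k taxa together with their sequences."""
    rng = random.Random(seed)
    names = [r[0] for r in conn.execute("SELECT taxon FROM alignment ORDER BY taxon")]
    chosen = rng.sample(names, k)
    marks = ",".join("?" for _ in chosen)  # one placeholder per selected taxon
    query = f"SELECT taxon, seq FROM alignment WHERE taxon IN ({marks})"
    return dict(conn.execute(query, chosen))
```

Calling `sample_taxa(build_toy_database(), 5)` returns a dictionary mapping five taxon names to their stored sequences; other sampling schemes would replace only the selection step.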
5. RNA macro-evolution simulation (Sheng Guo, Lisan Wang)
- Incorporates secondary-structure constraints and indels, using a simulator based on edit mutations.
- A set of edit operators is implemented (for example, stem edits), each of which operates on the evolving strings with a characteristic waiting time.
- The ancestral molecule is based on a known rRNA gene with a putative secondary structure.
- Evolution of the secondary structure is tracked.
[Diagram: ancestor (anc) to descendant (desc) through edit operators: delete stem pair, change base, initiate new stem, insert base, delete base, add stem pair]
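As a rough illustration of edit operators acting with characteristic waiting times, the sketch below evolves a plain sequence with three of the named operators (change base, insert base, delete base). It makes several assumptions not stated in the slides: waiting times are exponential, the rates in `RATES` are made-up values, and secondary structure and the stem operators are ignored entirely. It is not the RNAsim implementation.

```python
import random

# Hypothetical per-operator rates (events per unit time); not from the slides.
RATES = {"change_base": 1.0, "insert_base": 0.2, "delete_base": 0.2}

def apply_edit(seq, op, rng):
    """Apply one edit operator at a uniformly chosen position."""
    pos = rng.randrange(len(seq))
    if op == "change_base":
        new = rng.choice([b for b in "ACGU" if b != seq[pos]])
        return seq[:pos] + new + seq[pos + 1:]
    if op == "insert_base":
        return seq[:pos] + rng.choice("ACGU") + seq[pos:]
    # delete_base: keep at least one base so later edits stay valid
    return seq[:pos] + seq[pos + 1:] if len(seq) > 1 else seq

def evolve(seq, total_time, seed=0):
    """Evolve seq for total_time; return the descendant and the edit history."""
    rng = random.Random(seed)
    ops, weights = zip(*RATES.items())
    total_rate = sum(weights)
    t, history = 0.0, []
    while True:
        # Competing exponential clocks: the next event arrives at the summed rate,
        # and the operator is chosen in proportion to its own rate.
        t += rng.expovariate(total_rate)
        if t > total_time:
            return seq, history
        op = rng.choices(ops, weights=weights)[0]
        seq = apply_edit(seq, op, rng)
        history.append((t, op))
```

For example, `evolve("GGGAAACCC", 5.0)` returns a descendant string and a time-ordered list of `(time, operator)` events; a structure-aware version would additionally restrict which operators are legal at paired positions.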
6. Fixation probability as a function of fitness
Parameters: Ne = effective population size; ? = neutral mutation rate; s = fitness change
Neutral; Advantageous (s > 0) / Deleterious (s < 0); Compensatory Mutation
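The slide does not state which expression is used for the fixation probability. A common choice, assumed here purely for illustration, is Kimura's diffusion approximation for a diploid population, u(p) = (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)), where p is the initial frequency of the mutant allele. The function name and argument names below are hypothetical.

```python
import math

def fixation_probability(s, ne, p0):
    """Kimura's diffusion approximation (assumed model, diploid population).

    s  : selection coefficient (fitness change)
    ne : effective population size
    p0 : initial frequency of the mutant allele
    """
    if abs(s) < 1e-12:
        return p0  # neutral limit: fixation probability equals initial frequency
    # expm1 keeps precision when 4*ne*s*p0 is small; for strongly deleterious
    # mutations with very large ne*|s| it can overflow.
    return math.expm1(-4.0 * ne * s * p0) / math.expm1(-4.0 * ne * s)
```

With `ne = 1000` and `p0 = 1 / 2000`, a neutral mutation fixes with probability `p0`, an advantageous one (`s = 0.01`) with probability close to `2 * s`, and a deleterious one (`s = -0.01`) with a far smaller probability.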
7. One-step mutation ensemble of an RNA
8. Weaker Selection
9. Calibration on Empirical Data
[Figure panels: Simulated RNA; 100 Eukaryotic ssRNA]
10. Example: Pairwise Similarity of 1000 locally optimal ML trees (MDS plot)
[Figure panels: Empirical Data; RNAsim; ROSE; SeqGen]
11. CPU Time to reach local optimum (PAUP ML, TBR)
12. 1 Million Leaves (Tracey Heath: Birth-Death Model with variable rates)
20 Data Replicate Partitions Simulated and Stored at SDSC
13. Crimson (Stephen Fisher, Susan Davidson, Junhyong Kim)
- Facilitates the extraction of sub-trees from very large phylogenetic trees
- Trees loaded into a shared database (Oracle or MySQL)
- Extensive tree sampling options
- Save query output to NEXUS or PHYLIP files
- Include PAUP commands in query output files
- Comprehensive graphical dialogs
- Command line interface allowing Python-like scripting
- Display trees with Walrus 3D Viewer
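The core operation in the list above, extracting the sub-tree for a sample of leaves from a larger tree, can be sketched as follows. This is a minimal in-memory illustration, not Crimson's database-backed method: trees are represented as nested tuples of leaf names, branch lengths are ignored, and `induced_subtree` and `to_newick` are hypothetical helper names.

```python
def induced_subtree(tree, keep):
    """Return the subtree induced by the leaves in `keep`, or None if empty."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (induced_subtree(c, keep) for c in tree) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # suppress nodes left with a single child
    return tuple(kids)

def to_newick(tree):
    """Serialize a nested-tuple tree as a Newick string without the final ';'."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(c) for c in tree) + ")"
```

For the tree `((("A", "B"), "C"), (("D", "E"), "F"))` and the sample `{"A", "C", "E"}`, `to_newick(induced_subtree(...)) + ";"` gives `((A,C),E);`, a string that can be placed in the trees block of a NEXUS file.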
14. Query Options
- Species Selection
  - Select All
  - Random Selection
  - Select By Temporal Depth
    - Same number of samples per sub-tree
    - Weight sampling of sub-trees by number of leaves
  - Select By Species Level
    - Same number of samples per sub-tree
    - Weight sampling of sub-trees by number of leaves
  - Manual Selection
- Sequence Selection
  - Select All
  - Random Selection
  - Manual Selection
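The two sub-tree sampling modes listed above (same number of samples per sub-tree versus sampling weighted by number of leaves) can be contrasted in a short sketch. It assumes the sub-trees have already been identified, for example by cutting the tree at a depth threshold, and are given simply as lists of leaf names. The function names are hypothetical, and weighting by the remaining leaf count is one possible reading of the weighted option.

```python
import random

def sample_equal(clades, per_clade, rng):
    """Draw the same number of leaves from every sub-tree (fewer if it is small)."""
    picked = []
    for leaves in clades:
        picked.extend(rng.sample(leaves, min(per_clade, len(leaves))))
    return picked

def sample_weighted(clades, total, rng):
    """Draw `total` leaves, choosing sub-trees in proportion to their leaf count."""
    pools = [list(c) for c in clades]
    picked = []
    while len(picked) < total and any(pools):
        weights = [len(p) for p in pools]  # remaining leaves in each sub-tree
        i = rng.choices(range(len(pools)), weights=weights)[0]
        j = rng.randrange(len(pools[i]))
        picked.append(pools[i].pop(j))  # sample without replacement
    return picked
```

Equal sampling over-represents small sub-trees relative to their size, while weighted sampling behaves like a uniform draw over all leaves; which is preferable depends on the benchmark question.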
15. Depth Threshold Distribution
16. Crimson Interface
17. Current Benchmarking Effort
- Sample 1
  - 10 leaves per sampled tree
  - Repeat taxon sampling 40 times per replicate data partition
- Sample 2
  - 100 leaves per sampled tree
  - Repeat taxon sampling 30 times per replicate data partition
- Sample 3
  - 1,000 leaves per sampled tree
  - Repeat taxon sampling 20 times per replicate data partition
- Sample 4
  - 10,000 leaves per sampled tree
  - Repeat taxon sampling 10 times per replicate data partition
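To get a feel for the size of this sampling plan, the arithmetic can be written out. The sketch assumes that all four samples are applied to each of the 20 replicate data partitions mentioned earlier in the deck, which the slides do not state explicitly; the names `SAMPLING_PLAN`, `NUM_PARTITIONS`, and `workload` are illustrative.

```python
# (leaves per sampled tree, repeats per replicate data partition), from the list above
SAMPLING_PLAN = [(10, 40), (100, 30), (1_000, 20), (10_000, 10)]
NUM_PARTITIONS = 20  # assumed: every replicate partition receives the full plan

def workload(plan, partitions):
    """Return (number of sampled trees, total number of sampled leaves)."""
    trees = partitions * sum(repeats for _, repeats in plan)
    leaves = partitions * sum(n * repeats for n, repeats in plan)
    return trees, leaves
```

Under that assumption, `workload(SAMPLING_PLAN, NUM_PARTITIONS)` gives 2,000 sampled trees containing 2,468,000 sampled leaves in total.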
18. Algorithms (to be expanded)
- Neighbor Joining (PAUP)
  - breakties=random
- Parsimony (PAUP)
  - set maxtrees=200 increase=no
  - hsearch timelimit=432000
  - contree all / strict=no majrule=yes
- RAxML (raxmlHPC)
  - -f a
  - - 100
  - -m GTRGAMMA
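Since query output files can carry PAUP commands, the parsimony settings above could be emitted as a PAUP block with a small helper. The sketch writes the options in `name=value` form; the function name `paup_parsimony_block` and its defaults are illustrative and not taken from Crimson.

```python
def paup_parsimony_block(maxtrees=200, timelimit=432000):
    """Build a PAUP block with the parsimony settings listed on the slide.

    timelimit is in seconds; 432000 s corresponds to 5 days.
    """
    lines = [
        "begin paup;",
        f"  set maxtrees={maxtrees} increase=no;",
        f"  hsearch timelimit={timelimit};",
        "  contree all / strict=no majrule=yes;",
        "end;",
    ]
    return "\n".join(lines)
```

The returned text can be appended to a NEXUS file after the data block so that the search settings travel with each sampled dataset.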
19. Benchmarking Stats
20. Distribution of False Positive Edges
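One common way to count false positive edges, assumed here since the slides do not define the statistic, is to count the non-trivial bipartitions (splits) of the estimated tree that are absent from the true tree. The sketch represents trees as nested tuples of leaf names over the same leaf set and treats them as unrooted; all function names are hypothetical.

```python
def leaf_set(tree):
    """Collect all leaf names of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(c) for c in tree))

def splits(tree):
    """Return the non-trivial bipartitions of the tree, viewed as unrooted."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)  # each split is stored as the side without this leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(c) for c in node))
        side = all_leaves - below if ref in below else below
        if 1 < len(side) < len(all_leaves) - 1:  # skip trivial splits
            found.add(side)
        return below

    visit(tree)
    return found

def false_positive_edges(estimated, true):
    """Splits present in the estimated tree but not in the true tree."""
    return splits(estimated) - splits(true)
```

For the true tree `(("A", "B"), ("C", ("D", "E")))` and the estimate `(("A", "C"), ("B", ("D", "E")))`, the only false positive edge is the split separating `{B, D, E}` from `{A, C}`.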
21. Computational Difficulty of Dataset Versus Accuracy
[Figure: axis labels sec, hr, hr]
22. RAxML Computation Time (Heuristic) Over 30 Random 100-taxon Trees
[Figure: axis label Replicates]
23. Thanks to
- Davidson, Susan
- Fisher, Steve
- Guo, Sheng
- Hillis, David
- Heath, Tracey
- Wang, Lisan
- Zhang, Yifeng
- Zwickl, Derrick
Please Ask and Talk to
- Steve Fisher
- Sheng Guo
- Lisan Wang
Please See CRIMSON Demo by Steve Fisher