Title: BIOLOGO
1BIOLOGO
- A domain-specific language for morphogenesis.
- Trevor M. Cickovski
- December 9, 2004
2What is Morphogenesis?
- A stage in embryonic development
- Mesenchymal cells begin to cluster and form patterns
- Chemical secretion/resorption/diffusion
- Cell differentiation
- Cell growth
- Cell division
- Cell migration
3Chemical secretion/resorption/diffusion
- Single Dictyostelium cell undergoing chemotaxis (Source: DictyBase)
4Cell Growth and Division
- Division of a Dictyostelium cell (Source: DictyBase)
5Cell migration
- Migration of a Dictyostelium slug (Source: DictyBase)
6Example: Avian Limb Bud
- Chick limb bud, after 3.5 days
- Source: Hinchliffe and Johnson, 1980
7Avian Limb Stages
- Schematic Representation
- Forelimb Pattern Formation Order
- Humerus
- Radius/Ulna
- Carpals/Metacarpals
- Digits
8Example: Dictyostelium discoideum
Dictyostelium life cycle from 12h to 24h.
The multicellular slug body migrates along a
chemical gradient to the right. Eventually the
slug culminates in a fruiting body, with a
spore, stalk and tip. (Source:
Dictyostelium-homepage zl-munich).
9Software Modeling of Morphogenesis
- CompuCell3D, a C++ framework for three-dimensional simulation of morphogenesis
- BIOLOGO, a domain-specific language for morphogenesis, used to extend CompuCell3D
10Motivation For BIOLOGO
- The Problem That Arises
- Morphogenesis researchers may not have C++ experience
- CompuCell3D is extended through plugins, a C++ design pattern
- This poses a serious challenge for extensibility
- Namely, the real users won't be able to extend it!
- That's because the plugin method for extending CompuCell3D is too low-level for them if they don't know C++
11Solution
- Extend CompuCell3D through a domain-specific language which we call BIOLOGO
- Make syntax of BIOLOGO understandable to morphogenesis researchers
- Create a higher level of abstraction
- Allow representation of biological phenomena in a more structured and straightforward manner
- Generate plugin extensions for CompuCell3D, readable and with no performance penalty
12Validation
- Generality
- We apply BIOLOGO to three biologically relevant validation simulations.
- Accuracy
- We compare the visualization of results using BIOLOGO-extended CompuCell3D with visualization using handcoded extensions, and verify they are the same.
13Related Work: Similar Domains, Different Design Patterns
- Cell Programming Language (CPL) (Agarwal, 1994)
- Simulates morphogenetic cellular behavior
- Each cell is a program; different cell types run different programs, represented as pixels
- CPL instructions are mapped to Objective C functions; we allocate objects dynamically
- Nonscalable move, divide and grow algorithms
14Related Work: Similar Domains, Different Design Patterns
- CellML (Nelson, 2004)
- Also based on XML
- Very high flexibility by allowing embedding/inclusion of other supported languages
- For simulation of cellular/subcellular processes
- Cell cycle, cell electrophysiology, muscle methods
- Operates at a different level of modeling than BIOLOGO
15Related Work: Similar Design Patterns, Different Domains
- SPIRAL (Püschel et al., 2004)
- Domain: Signal Processing
- Implements a compiler for the DSL SPL
- Translates SPL source to C/Fortran source with optimizations
- TCE (Baumgartner et al., 2002)
- Domain: Tensor Contractions (Chem/Phys)
- Compiler for a Mathematica-style DSL
- Runs generalized matrix multiplications
- Translates DSL source to Fortran source with optimizations
16The CPM: A Mathematical Model for Morphogenesis
- Cells represented in a 3D lattice
- Each unique cell given a different integer index, indices stored in pixels
- Extracellular matrix has 0 index
- Neighbors and levels (1-4) given for a pixel S
17Metropolis Algorithm in the CPM
- Choose a pixel at random
- Propose to change the pixel index to that of one of its neighbors (index flip)
- Execute the flip with Monte Carlo probability based on the resulting energy from the flip
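The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not CompuCell3D code: it uses a 2D lattice instead of the 3D one for brevity, and the periodic wrap-around, the `boundary_energy` toy energy, and the `temperature` parameter are assumptions introduced only for the example.

```python
import math
import random

def metropolis_step(lattice, energy, temperature, rng=random):
    """Attempt one index flip on a 2D lattice of cell indices (0 = medium)."""
    height, width = len(lattice), len(lattice[0])
    # 1. Choose a pixel at random.
    y, x = rng.randrange(height), rng.randrange(width)
    # 2. Propose the index of one nearest neighbor (periodic wrap is an assumption).
    dy, dx = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
    new_index = lattice[(y + dy) % height][(x + dx) % width]
    old_index = lattice[y][x]
    if new_index == old_index:
        return False
    # 3. Compute the energy change caused by the proposed flip.
    before = energy(lattice)
    lattice[y][x] = new_index
    delta = energy(lattice) - before
    # 4. Accept with Monte Carlo (Boltzmann) probability.
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return True
    lattice[y][x] = old_index  # rejected: undo the flip
    return False

def boundary_energy(lattice):
    """Toy adhesion energy (hypothetical): count unlike nearest-neighbor pairs."""
    height, width = len(lattice), len(lattice[0])
    total = 0
    for y in range(height):
        for x in range(width):
            total += lattice[y][x] != lattice[y][(x + 1) % width]
            total += lattice[y][x] != lattice[(y + 1) % height][x]
    return total
```

Recomputing the full energy on every attempt keeps the sketch short; a real implementation would evaluate only the local change around the flipped pixel.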
18CPM Energy Calculation
- Three terms
- One results from adhesion between adjacent pixels
- One results from cells' deviation from their target volume and surface area
- One results from cell chemotaxis or haptotaxis to a secreted or diffusing chemical
19CPM Energy Equations
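For orientation, a commonly used form of the three CPM energy terms and the flip acceptance rule is sketched here. The symbols ($J$, $\tau$, $\lambda_v$, $\lambda_s$, $\mu$, $T$) follow general CPM literature conventions and are assumptions, not a transcription of the original slide.

```latex
E = E_{\text{adhesion}} + E_{\text{volume}} + E_{\text{chemical}}

E_{\text{adhesion}} = \sum_{(i,j)\ \text{neighbors}}
    J\big(\tau(\sigma_i), \tau(\sigma_j)\big)\,\big(1 - \delta_{\sigma_i,\sigma_j}\big)

E_{\text{volume}} = \sum_{\sigma}
    \Big[ \lambda_v\,(v_\sigma - V_\sigma)^2 + \lambda_s\,(s_\sigma - S_\sigma)^2 \Big]

E_{\text{chemical}} = \mu\, C(\vec{x})

P(\text{flip}) =
\begin{cases}
  1 & \Delta E \le 0 \\
  e^{-\Delta E / T} & \Delta E > 0
\end{cases}
```

Here $\sigma_i$ is the cell index stored at pixel $i$, $\tau(\sigma)$ is the cell's type, $v_\sigma$ and $s_\sigma$ are its current volume and surface area with targets $V_\sigma$ and $S_\sigma$, and $C(\vec{x})$ is the chemical concentration at the pixel being flipped.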
20Computational Modeling Issues
- Software must be extensible, flexible and easy to use, allowing:
- Extensible CPM Hamiltonians
- Cell Type Maps for various organisms
- Arbitrary number of superimposed chemical fields
- Large 3D CPM lattices
- Speed and memory usage concerns
21Miscellaneous Data
- Class Count
- CompuCell3D: 134
- BIOLOGO: 85
- Lines of code
- CompuCell3D: 18347
- BIOLOGO: 9197
- Plus external Xerces libraries (Apache, 2003)
22CompuCell3D Overview
23Customizing CompuCell3D
CompuCell3D defines a set of classes that can be extended to add features to a simulation.
Steppables execute once per Monte Carlo step and once before and after the main loop. They are the main hooks for initialization and rendering.
Plugins load at runtime. They are the main way of adding new features to CompuCell3D. They can be Steppables, Steppers, CellChangeWatchers, or Automatons.
CellChangeWatchers execute once per successful spin flip. They are useful for adjusting values that depend on the number of lattice points in a cell.
Steppers execute once per spin flip attempt. They are the main hooks for energy functions.
Some simulation features, such as renderers, are so common that they are built into the system.
Automatons enable cells to change their state as the simulation evolves.
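The timing of these hooks can be illustrated with a small Python sketch. Only the class names Steppable, Stepper and CellChangeWatcher come from the slide. The method names, the `run` driver and the `attempt_flip` callback are hypothetical and do not reproduce the CompuCell3D C++ interfaces.

```python
class Steppable:
    def start(self): pass            # once before the main loop
    def step(self, mcs): pass        # once per Monte Carlo step
    def finish(self): pass           # once after the main loop

class Stepper:
    def step(self): pass             # once per spin flip attempt

class CellChangeWatcher:
    def cell_changed(self, old_cell, new_cell): pass  # once per successful flip

def run(steppables, steppers, watchers, attempt_flip, mcs_count, flips_per_mcs):
    """Hypothetical driver: attempt_flip() returns (old, new) on success, else None."""
    for steppable in steppables:
        steppable.start()
    for mcs in range(mcs_count):
        for _ in range(flips_per_mcs):
            for stepper in steppers:
                stepper.step()
            result = attempt_flip()
            if result is not None:
                old_cell, new_cell = result
                for watcher in watchers:
                    watcher.cell_changed(old_cell, new_cell)
        for steppable in steppables:
            steppable.step(mcs)
    for steppable in steppables:
        steppable.finish()
```

Under this reading, an energy function would subclass `Stepper`, while bookkeeping such as a per-cell volume counter would subclass `CellChangeWatcher`.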
24What BIOLOGO Saves
- Requirements to add a new Cell Type Map to CompuCell3D
- Example: simulating a brand new organism with different cell types and transitions
- 5 C++ classes plus 1 for each transition, and appropriate methods and implementation of rules
25What BIOLOGO Saves
- Requirements to add a new Hamiltonian to CompuCell3D
- Example: we want to add an energy due to cell chemotaxis
- 2 C++ classes including full implementation of the energy calculations, plus any superimposed fields
26Generated Code
- In the form of CompuCell3D plugins
- Plugins are the standard method for extending CompuCell3D
- Plugins are dynamically loaded libraries
- They are loaded upon reference in the configuration file
27Three Validation Simulations
- Cell Sorting (Beysens et al., 2000)
- Basic CPM adhesion and volume
- No chemical energy or superimposed chemicals
- Avian Limb (Cickovski et al., 2004)
- Cells undergo haptotaxis with chemical fibronectin
- Domain grows in time
- Dictyostelium discoideum (Maree, 2000)
- Polarity within the slug
- Dynamic activator field
28Predefined BioLogo variables
- Reference: Cellular Potts Model algorithm
- potts.cellfield: 3D field of type cell. The central Potts lattice.
- pt: The selected Potts pixel
- oldcell: The selected Potts cell
- newcell: The candidate Potts cell
29Validation Simulation: Avian Limb (Cickovski et al., 2004)
- Cells haptotax to secreted chemical fibronectin
- Secretion stimulated by exterior activator
- Cells can be one of two types: Condensing and NonCondensing
- Condensing: location of high activator
- Condensing: more adhesive
30Validation Simulation Avian Limb Growth
(Cickovski et al., 2004)
32Validation Simulation: Avian Limb Growth (Cickovski et al., 2004)
<cellmodel name="ChickGrowth">
  <useplugin name="LimbChemical" />
  <celltype name="NonCondensing">
    <updatecelltypes>
      <changeif currenttype="Condensing"
                condition="LimbChemical.ReactionDiffusion[pt.x][pt.y][pt.z] less LimbChemical.ActivatorThreshold" />
    </updatecelltypes>
  </celltype>
  <celltype name="Condensing">
    <updatecelltypes>
      <changeif currenttype="NonCondensing"
                condition="LimbChemical.ReactionDiffusion[pt.x][pt.y][pt.z] greater LimbChemical.ActivatorThreshold" />
    </updatecelltypes>
  </celltype>
</cellmodel>
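Read as plain rules, the cell model above says that a cell becomes NonCondensing when it is currently Condensing and the activator falls below the threshold, and becomes Condensing in the opposite case. A minimal Python sketch of that transition logic follows. The function name and arguments are hypothetical, and the sketch assumes the activator value has already been looked up at the cell's pixel.

```python
def update_cell_type(current_type, activator, threshold):
    """Apply the two changeif rules of the ChickGrowth cell model."""
    # Rule under <celltype name="NonCondensing">: Condensing -> NonCondensing
    if current_type == "Condensing" and activator < threshold:
        return "NonCondensing"
    # Rule under <celltype name="Condensing">: NonCondensing -> Condensing
    if current_type == "NonCondensing" and activator > threshold:
        return "Condensing"
    # Neither condition holds (including activator == threshold): keep the type.
    return current_type
```

Because both comparisons are strict, a cell sitting exactly at the threshold keeps its current type.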
33Validation Simulation Avian Limb Growth
(Cickovski et al., 2004)
Generated Code
38Validation Simulation Avian Limb Growth
(Cickovski et al., 2004)
41Validation Simulation: Avian Limb Growth (Cickovski et al., 2004)
<Hamiltonian name="LimbChemical">
  <Input name="ActivatorThreshold" type="double" />
  <Input name="Lambda" type="int" />
  <Input name="FibroRate" type="double" />
  <Input name="ConcentrationFile" type="file"
         fieldname="ReactionDiffusion" fieldtype="float" />
  <Field name="FibroField" type="float" />
  <Step>
    <if condition="oldcell.type notequal Medium">
      <declare><float name="fibro" value="FibroField[pt.x][pt.y][pt.z]" /></declare>
      <if condition="ReactionDiffusion[pt.x][pt.y][pt.z] greaterequal ActivatorThreshold">
        <copy to="FibroField[pt.x][pt.y][pt.z]" from="fibro+FibroRate" />
        <copy to="fibro" from="FibroField[pt.x][pt.y][pt.z]" />
        <return value="fibro*Lambda" />
      </if>
    </if>
    <return value="0" />
  </Step>
</Hamiltonian>
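To make the control flow of this Step block easier to follow, here is a hedged Python paraphrase. It assumes the secretion update adds FibroRate to the local fibronectin value and that the returned energy is that value scaled by Lambda. The function name, argument names and nested-list fields are hypothetical and are not the C++ that BIOLOGO generates.

```python
def limb_chemical_step(old_cell_type, pt, reaction_diffusion, fibro_field,
                       activator_threshold, lam, fibro_rate):
    """Energy contribution of the LimbChemical Hamiltonian at pixel pt."""
    x, y, z = pt
    if old_cell_type != "Medium":
        fibro = fibro_field[x][y][z]
        if reaction_diffusion[x][y][z] >= activator_threshold:
            # Secrete fibronectin where the activator is high enough (assumed '+').
            fibro_field[x][y][z] = fibro + fibro_rate
            fibro = fibro_field[x][y][z]
            # Haptotaxis energy proportional to local fibronectin (assumed '*').
            return fibro * lam
    # Medium pixels and low-activator pixels contribute no chemical energy.
    return 0
```

Note that the step has a side effect: it updates the fibronectin field while computing the energy, which is why FibroField is declared as a Field rather than an Input.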
42Validation Simulation: Avian Limb Growth (Cickovski et al., 2004)
Instantiation in CompuCell3D Configuration File
<Plugin Name="LimbChemical">
  <ActivatorThreshold>0.7</ActivatorThreshold>
  <Lambda>10</Lambda>
  <FibroRate>0.01</FibroRate>
  <ConcentrationFile>bnewSys123_71x31x281.dat</ConcentrationFile>
</Plugin>
43Validation Simulation: Dictyostelium discoideum (Maree, 2000)
- Dictyostelium discoideum is a type of slime mould
- Begins as unicellular
- Develops into a multicellular fruiting body
- Follows a cyclic AMP (cAMP) gradient that leads the slug to the soil surface
- Maree's simulation starts from this point
44Validation Simulation: Dictyostelium discoideum (Maree, 2000)
- Piecewise Puschino kinetics determine cAMP gradient
- This field is input from a file, but is dynamic
- This means we need a file read frequency
- Cell type map has three cell types
- Autocycling (tip)
- Prestalk (form an upper stalk, initially 40% of body)
- Prespore (form a lower spore, initially 60% of body)
- Simple map with no type transitions
47Validation Simulation: Dictyostelium discoideum (Maree, 2000)
<Hamiltonian name="DictyChemical">
  <Input name="Lambda" type="int" />
  <Input name="ConcentrationFile" type="file"
         fieldname="ReactionDiffusion" fieldtype="float"
         frequency="steps" />
  <Step>
    <if condition="oldcell.type equal Autocycling">
      <return value="ReactionDiffusion[pt.x][pt.y][pt.z]*Lambda" />
    </if>
    <return value="0.0" />
  </Step>
</Hamiltonian>
48Validation Simulation: Dictyostelium discoideum (Maree, 2000)
Reference in CompuCell3D Configuration File
<Plugin Name="DictyChemical">
  <Lambda>200</Lambda>
  <ConcentrationFile>bincactivator</ConcentrationFile>
  <ConcentrationFileFrequency>100</ConcentrationFileFrequency>
</Plugin>
49Validation Simulation Dictyostelium discoideum
(Maree, 2000)
50Implications
- Extension: Core of CompuCell3D is untouched by BIOLOGO
- Modeling: BIOLOGO is a modeling language; it does not need to be run multiple times for the same features
- Instantiation: BIOLOGO generates extensions to CompuCell3D that can be instantiated through the CompuCell3D configuration file
51Implementation Details
- Framework consists of two modules
- Compiler
- Generates Intermediate File
- Code Generator
- Generates CompuCell3D Source
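The two-module structure can be illustrated with a toy pipeline: one function parses BIOLOGO-style XML into an intermediate form, and a second emits C++-looking source text from it. Everything here is a hypothetical sketch in Python using the standard `xml.etree.ElementTree` parser. The intermediate dictionary, the `CPP_TYPES` table and the emitted class layout are assumptions, not BIOLOGO's actual intermediate file format or generated code.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from BIOLOGO input types to C++ member types.
CPP_TYPES = {"double": "double", "int": "int", "float": "float", "file": "std::string"}

def compile_to_intermediate(source):
    """Stage 1 (compiler): parse the XML source into a simple intermediate form."""
    root = ET.fromstring(source)
    return {
        "name": root.get("name"),
        "inputs": [(e.get("name"), e.get("type")) for e in root.findall("Input")],
    }

def generate_cpp(intermediate):
    """Stage 2 (code generator): emit a C++ class skeleton as text."""
    lines = [f"class {intermediate['name']}Plugin {{", "public:"]
    for name, type_name in intermediate["inputs"]:
        lines.append(f"  {CPP_TYPES[type_name]} {name};")
    lines.append("};")
    return "\n".join(lines)

example = """
<Hamiltonian name="LimbChemical">
  <Input name="ActivatorThreshold" type="double" />
  <Input name="Lambda" type="int" />
</Hamiltonian>
"""
print(generate_cpp(compile_to_intermediate(example)))
```

Separating the two stages means the front end only has to understand the DSL, while the back end only has to know the target framework's conventions.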
52Known DSL Design Patterns Used
- Source-To-Source Transformation
- Translation is done from BIOLOGO source to CompuCell3D source
- Data Structure
- Complex data structures (e.g. Cell Type Maps) represented as simpler BIOLOGO cell models
- Language Extension
- BIOLOGO extends XML
53Video Demo
54Future Work
- 1) Higher abstractions
- Have Hamiltonians use mathematical notation
- Make BIOLOGO a visual language
- Interface to a GUI (e.g., in Java)
- Potentially follow the example of Modeler
(Michel et al., 2001)
55Future Work
- 2) Web service
- Help for those who do not have built-in compilers
- Use compile farm
- Compile and create plugins on the server, client
downloads them and runs CompuCell3D
56Future Work
- 3) Management of Changes
- If a user runs BIOLOGO and wants to permanently commit their new feature to CompuCell3D, this could affect other users
- Need a change manager
- Distributed plugin repository
57Future Work
- 4) Implement a hybrid XML and scripting DSL
- XML is not as ideal for computation, better for structural modules
- Computation can be done through scripting
- For example, embedding Lua or Python
- Pros: More flexibility, less setup time
- Cons: Interpreted
58Future Work
- 5) Intermediate-level optimizations (cf. Aho et al., 1986)
- Right now there is no performance penalty for generated code, but this could ensure high efficiency
- Examples
- Optimal algorithm selection
- Optimize code for CompuCell3D compiler on target machine
59Future Work
- 6) Improve DSL performance by making BIOLOGO a telescoping language (Kennedy et al., 2000)
- Generate a custom compiler for intermediate language with extensive knowledge of underlying libraries on the target machine
- Generates code with maximum efficiency
- Downside: Creating a custom compiler takes a long time
60Publications
- 1) R. Chaturvedi, J. A. Izaguirre, C. Huang, T. Cickovski, P. Virtue, G. Thomas, G. Forgacs, M. Alber, S. A. Newman, and J. A. Glazier, "Multi-Model Simulations of Chicken Limb Morphogenesis." Springer Verlag LNCS 2659, Computational Science - ICCS 2003, International Conference, Melbourne, Australia and St. Petersburg, Russia, Part II, pp. 39-49, June 2003.
- 2) J. A. Izaguirre, R. Chaturvedi, C. Huang, T. Cickovski, J. Coffland, G. Thomas, G. Forgacs, M. Alber, S. A. Newman, and J. A. Glazier, "CompuCell, A Multi-Model Framework for Simulation of Morphogenesis." Bioinformatics, 20(7):1129-1137, 2004.
- 3) T. Cickovski, T. Matthey and J. A. Izaguirre, "Design Patterns for Generic Object-Oriented Software." University of Notre Dame Technical Report 2004-29, November 2004.
61Submitted/In Progress
- 1) T. Cickovski, C. Huang, R. Chaturvedi, T. Glimm, H. G. E. Hentschel, M. Alber, J. A. Glazier, S. A. Newman, J. A. Izaguirre, "A Framework for Three-Dimensional Simulation of Morphogenesis." Submitted to IEEE/ACM Transactions on Computational Biology and Biocomplexity, 2004.
- 2) T. Cickovski and J. A. Izaguirre, "BIOLOGO, A Domain-Specific Language for Morphogenesis." In progress, to be submitted to ACM Transactions on Programming Languages and Systems.
62Conferences
- Already Attended
- Midwest Numerical Analysis Day (Northern Illinois University, May 2003)
- Biocomplexity IV (Indiana University, May 2003)
- Biocomplexity VI (Indiana University, May 2004)
- To Attend
- SIAM Computational Science and Engineering
(CSE05) (Orlando, February 2005)
63Acknowledgements
- Special thanks to the following
- Mark Alber, Kedar Aras, Brian Bien, Rajiv Chaturvedi, David Cieslak, Joseph Coffland, James A. Glazier, Tilmann Glimm, George Hentschel, Chengbang Huang, Thierry Matthey, Chris Mueller, Stuart A. Newman, Troy Raeder, Matthew Rissler and Todd Schneider.
- Last and most
- To my advisor, Jesus A. Izaguirre
- Funding
- My funding was provided by a Schmitt Fellowship and NSF grants IBN-0083653 and IBN-0313730
64Supplementary Material Starts Here
- Slides Available Upon Request
- Cell Sorting Example
- More Syntax
- CompuCell3D Applied Patterns
- Generated C++ For Avian Limb Chemical Hamiltonian
- Miscellaneous Data
65Example: Basic Cell Sorting
- Two different types of cells
- One type is very adhesive to other cells of the same type
- All cells are repelled by the medium
Source: Beysens et al., 2000, from the experiments of Steinberg et al., 1963 and 1998.
66Validation Simulation: Cell Sorting (Beysens et al., 2000)
- A basic model of cells of two different types (Light and Dark), clustering.
- Dark cells are highly adhesive
- Light cells are less adhesive
- All cells are repelled by the medium
67Validation Simulation: Cell Sorting (Beysens et al., 2000)
<cellmodel name="CellSort">
  <declare><int name="flag" /></declare>
  <celltype name="Light">
    <creation>
      <copy to="flag" from="0" />
    </creation>
    <updatevariables>
      <copy to="flag" from="1" />
    </updatevariables>
  </celltype>
  <celltype name="Dark">
    <updatecelltypes>
      <changeif currenttype="Light"
                condition="((flag equal 0) and (drand48() greaterequal .5))" />
    </updatecelltypes>
  </celltype>
</cellmodel>
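One way to read this cell model: every cell is created Light with flag 0, each Light cell with flag 0 gets a single 50% chance to become Dark, and the flag is then set to 1 so the coin is never tossed again. The Python sketch below illustrates that reading. It assumes that type updates are evaluated before variable updates (the slide does not state the order), and it substitutes Python's `random` for `drand48()`.

```python
import random

def assign_types(cell_count, rng):
    """Return cells after creation and one update pass of the CellSort model."""
    # <creation>: every cell starts as Light with flag = 0.
    cells = [{"type": "Light", "flag": 0} for _ in range(cell_count)]
    for cell in cells:
        # <updatecelltypes> (assumed to run first): one coin toss per Light cell.
        if cell["type"] == "Light" and cell["flag"] == 0 and rng.random() >= 0.5:
            cell["type"] = "Dark"
        # <updatevariables> of the Light type: mark the cell as already decided.
        if cell["type"] == "Light":
            cell["flag"] = 1
    return cells
```

With this ordering roughly half of the cells end up Dark, giving the random initial mixture that the sorting simulation starts from.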
68Validation Simulation I: Cell Sorting (Beysens et al., 2000)
- Run on CompuCell3D, visualized with Ogle
69Syntax: Basic Statements
70Syntax: Declare Stmt
- <declare><type name value></declare>
- Types (data members use . operator)
- boolean, int, float, double, char, string
- pixel
- Data members: x, y, z
- cell
- Data members are cell state variables
- Defined by the cell model
71Syntax: Expressions
- Infix notation
- Symbols
- +, -, *, /, %, (, )
- greater, less, greaterequal, lessequal, equal, notequal
- Quotation marks enclose characters and strings
- Can embed C++ functions as well
- A note: do so minimally and with care. Right now, BIOLOGO does not have complete built-in C++ parsing. A future goal is to improve this.
72Syntax: ForNeighbors
- <forneighbors variable point grid distance depth checkbounds>
- ... BIOLOGO statements ...
- </forneighbors>
- Function
- At each iteration select a neighbor to <point> within <grid>, store in <variable>
- Store distance between <variable> and <point> in <distance>
- Loop until <distance> hits <depth>
- Implement neighbor bounds checking if <checkbounds> is true
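The loop semantics described above can be sketched as a Python generator. This is an illustration under stated assumptions: distance is taken to be Euclidean distance between lattice points, neighbors are visited nearest first, and out-of-bounds neighbors are skipped when bounds checking is on. The function name and signature are hypothetical.

```python
import math
from itertools import product

def for_neighbors(point, grid_shape, depth, check_bounds=True):
    """Yield (neighbor, distance) pairs around point, nearest first, distance <= depth."""
    reach = int(math.floor(depth))
    # All integer offsets in the surrounding cube, excluding the point itself.
    offsets = [o for o in product(range(-reach, reach + 1), repeat=3) if any(o)]
    offsets.sort(key=lambda o: math.dist((0, 0, 0), o))
    for offset in offsets:
        distance = math.dist((0, 0, 0), offset)
        if distance > depth:
            break  # offsets are sorted, so every later one is farther away
        neighbor = tuple(p + d for p, d in zip(point, offset))
        if check_bounds and not all(0 <= c < n for c, n in zip(neighbor, grid_shape)):
            continue  # skip neighbors that fall outside the grid
        yield neighbor, distance
```

For example, `list(for_neighbors((1, 1, 1), (3, 3, 3), 1.0))` visits the six face neighbors of the center pixel of a 3x3x3 grid.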
73CompuCell3D Features/Patterns
- Allows different boundary conditions per axis through the Strategy and Factory design patterns
- Dynamic class nodes contiguously allocate all attributes of a particular cell, reducing cache misses and page faults
- Singleton object for medium pixels
- Lazy pixel neighbor evaluation
- Factory pattern for cell object creation
74C++ Code Generation For Avian Limb Chemical Hamiltonian
- BIOLOGO code
<Hamiltonian name="LimbChemical">
  <Input name="ActivatorThreshold" type="double" />
  <Input name="Lambda" type="int" />
  <Input name="FibroRate" type="double" />
  <Input name="ConcentrationFile" type="file"
         fieldname="ReactionDiffusion" fieldtype="float" />
  <Field name="FibroField" type="float" />

  <Step>
    <if condition="oldcell.type notequal Medium">
      <declare><float name="fibro" value="FibroField[pt.x][pt.y][pt.z]" /></declare>
      <if condition="ReactionDiffusion[pt.x][pt.y][pt.z] greaterequal ActivatorThreshold">
        <copy to="FibroField[pt.x][pt.y][pt.z]" from="fibro+FibroRate" />
        <copy to="fibro" from="FibroField[pt.x][pt.y][pt.z]" />
        <return value="fibro*Lambda" />
      </if>
    </if>
    <return value="0" />
  </Step>
</Hamiltonian>
75C++ Code Generation For Avian Limb Chemical Hamiltonian (1)
76C++ Code Generation For Avian Limb Chemical Hamiltonian (2)
77C++ Code Generation For Avian Limb Chemical Hamiltonian (3)
78C++ Code Generation For Avian Limb Chemical Hamiltonian (4a)
79C++ Code Generation For Avian Limb Chemical Hamiltonian (4b)
80C++ Code Generation For Avian Limb Chemical Hamiltonian (5a)
81C++ Code Generation For Avian Limb Chemical Hamiltonian (5b)
82Miscellaneous Data