Title: Simulation of the Heart and other Organs
1. Simulation of the Heart and other Organs
- Kathy Yelick
- EECS Department
- U.C. Berkeley
2. The 20 Year Vision
- Imagine a digital body double
- 3D image-based medical record
- Includes diagnostic, pathologic, and other information
- Used for
- Diagnosis
- Less invasive surgery-by-robot
- Experimental treatments
- Digital Human Effort
- Led by the Federation of American Scientists
3. Digital Human Today: Imaging
- The Visible Human Project
- 18,000 digitized sections of the body
- Male: 1mm sections, released in 1994
- Female: .33mm sections, released in 1995
- Goals
- study of human anatomy
- testing medical imaging algorithms
- Current applications
- educational, diagnostic, treatment planning, virtual reality, artistic, mathematical and industrial
- Used by > 1,400 licensees in 42 countries
Image source: www.madsci.org
4. Digital Human Roadmap
- 1995: 1 organ, 1 model
- 2000: multiple models
- 2005: organ system
- 2010: digital human
5. Organ Simulation
- Lung transport: Vanderbilt
- Cardiac flow: NYU, UCB, UCD
- Cardiac cells/muscles: SDSC, Auckland, UW, Utah
- Kidney mesh generation: Dartmouth
- Electrocardiography: Johns Hopkins
- Skeletal mesh generation
Just a few of the efforts at understanding and simulating parts of the human body
6. Immersed Boundaries within the Body
- Fluid flow within the body is one of the major challenges, e.g.,
- Blood through the heart
- Coagulation of platelets in clots
- Effect of sound waves on the inner ear
- Movement of bacteria
- A key problem is modeling an elastic structure immersed in a fluid
- Irregular moving boundaries
- Wide range of scales
- Vary by structure, connectivity, viscosity, external forces, internally-generated forces, etc.
7. Heart Simulation
- Developed by Peskin and McQueen at NYU
- Ran on vector and shared memory machines
- 100 CPU hours on a Cray C90
- Models blood flow in the heart
- Immersed boundaries are individual muscle fibers
- Rules for contraction, valves, etc. included
- Applications
- Understanding structural abnormalities
- Evaluating artificial heart valves
- Eventually, artificial hearts
Source: www.psc.org
8. Platelet Coagulation
- Developed by Fogelson and Peskin
- Simulation of blood clotting in 2D
- Immersed boundaries are
- Cell walls, represented by polygons
- Artery walls
- Rules added to simulate adhesion
- For vector and shared memory machines
- We did earlier work on this 2D problem in Split-C
9. Cochlea Simulation
- Simulation by Givelberg and Bunn
- In OpenMP
- 18 hours on HP Superdome
- Embedded 2D structures are
- Elastic membranes and shells
- Simulates fluid-structure interactions due to incoming sound waves
- Potential applications: design of cochlear implants
10. Insect Flight Simulation
- Work on insect flight
- Wings are immersed 2D structures
- Under development
- UW (Wang) and NYU (Miller)
- Applications to
- Insect robot design
Source: Dickinson, UCB
11. Small Animal Motion
- Simulation of small animal motion by Fauci, Dillon, and others
- Swimming of eels, sperm, and bacteria
- Crawling motion of amoeba
- Biofilms, such as plaque, with multiple micro-organisms
- Applications at a smaller scale
- Molecular motors, fluctuations in DNA
- Thermal properties may become important
- Brownian motion extension by Kramer, RPI
12. Other Applications
- The immersed boundary method has also been used, or is being applied, to
- Flags and parachutes
- Flagella
- Embryo growth
- Valveless pumping (E. Jung)
- Paper making
- Whirling instability of an elastic filament (S. Lim)
- Flow in collapsible tubes (M. Rozar)
- Flapping of a flexible filament in a flowing soap film (L. Zhu)
- Deformation of red blood cells in shear flow (Eggleton and Popel)
13. Immersed Boundary Simulation Framework
- Model Builder (C; workstation)
- Immersed Boundary Simulation (Titanium; vector machines, shared- and distributed-memory parallel machines, PC clusters)
- Visualization / Data Analysis (C/OpenGL, Java3D; workstation, PC)
14. Building 3D Models from Images
- Image data from
- Visible Human
- MRI
- Laboratory experiments
- Automatic construction
- Surface mesh
- Volume mesh
- John Sullivan et al., WPI
Image source: John Sullivan et al., WPI
15. Heart Structure Model
- Current model is based on three types of cones to construct the ventricles
- Outer/Inner layer
- Right-Inner/Left-Outer
- Left Cylinder layer
- Advantages: simple model
- Disadvantages: unrealistic and time-consuming to compute
16. Old Heart Model
- Full structure shows cone shape
- Includes atria, ventricles, valves, and some arteries
- The rest of the circulatory system is modeled by
- sources (inflow)
- sinks (outflow)
17. New Heart Model
- New model replaces the geodesics with triangulated surfaces
- Based on CT scans from a healthy human
- Triangulated surface of left ventricle is shown
- Work by
- Peskin & McQueen, NYU
- Paragios & O'Donnell, Siemens
- Setser, Cleveland Clinic
Slides 18-32: image-only slides (no transcript).
33. Structure of the Ear
- The inner ear is a fluid-filled cavity containing
- the cochlea
- the semi-circular canals, responsible for balance
- The fluid is viscous and incompressible
- Sound → cochlear waves
- Ossicles in the middle ear: malleus, incus, stapes
- Input is from the stapes knocking on the oval window
- The round window is covered by a membrane to conserve volume
Diagram labels: cochlear canal, ear drum
34. Apical Turn of the Cochlea
35. Schematic Description of the Cochlea
Schematic labels: oval window, scala vestibuli, the cochlear partition, scala tympani, helicotrema, round window (scale: 0.15 mm)
36. Geometry of the Cochlea Model
Immersed boundaries are 2D
37. Immersed Boundary Equations
- Navier-Stokes equations for the fluid
- Force on the fluid from the material
- Movement of the material (the standard equations are given below)
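These equations appeared only as images on the original slide; for reference, the standard immersed boundary formulation (Peskin's) behind the three labels is:
\[
\rho\left(\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u}\right) = -\nabla p + \mu\,\Delta\mathbf{u} + \mathbf{f}, \qquad \nabla\cdot\mathbf{u} = 0,
\]
\[
\mathbf{f}(\mathbf{x},t) = \int \mathbf{F}(s,t)\,\delta\big(\mathbf{x}-\mathbf{X}(s,t)\big)\,ds,
\]
\[
\frac{\partial \mathbf{X}(s,t)}{\partial t} = \mathbf{u}\big(\mathbf{X}(s,t),t\big) = \int \mathbf{u}(\mathbf{x},t)\,\delta\big(\mathbf{x}-\mathbf{X}(s,t)\big)\,d\mathbf{x},
\]
where u and p are the fluid velocity and pressure, X(s,t) is the configuration of the immersed material, F is the elastic force density it generates, and the Dirac delta couples the Eulerian and Lagrangian descriptions.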
38. Immersed Boundary Method
- Compute the force f the immersed material applies to the fluid
- Compute the force applied to the fluid grid
- Solve the Navier-Stokes equations
- Move the material
39. Immersed Boundary Method
Combines Lagrangian and Eulerian components:
- Viscous incompressible fluid: Navier-Stokes equations discretized on a periodic rectangular 3D grid
- Elastic fibers: Hooke's spring law, discretized on a 1D grid (see the example force below)
- Elastic shells: fourth-order PDE, discretized on a 2D grid
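As an example of the fiber elasticity (a generic Hookean form; the stiffness K and rest length L_0 are illustrative symbols, not values from the talk), the force at fiber point X_i of the 1D discretization is the sum of the spring forces from its two neighbors:
\[
\mathbf{F}_i = K\left(\left|\mathbf{X}_{i+1}-\mathbf{X}_i\right| - L_0\right)\frac{\mathbf{X}_{i+1}-\mathbf{X}_i}{\left|\mathbf{X}_{i+1}-\mathbf{X}_i\right|}
 + K\left(\left|\mathbf{X}_{i-1}-\mathbf{X}_i\right| - L_0\right)\frac{\mathbf{X}_{i-1}-\mathbf{X}_i}{\left|\mathbf{X}_{i-1}-\mathbf{X}_i\right|}.
\]
In the heart model the stiffness is also modulated in time to represent muscle activation.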
40. Immersed Boundary Method Structure
The material points and the fluid lattice interact through a Dirac delta function (illustrated in 2D on the slide):
1. Material activation and force calculation (on the material points)
2. Spread force onto the fluid lattice
3. Navier-Stokes solver (on the fluid lattice)
4. Interpolate velocity and move the material (steps 2 and 4 are sketched in code below)
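As a minimal illustration of steps 2 and 4 (a Java sketch written for this transcript, not the project's Titanium code; the grid size N, unit periodic box, point-force treatment, and all class/method names are assumptions), the spread and interpolate kernels with a 4-point cosine approximation of the delta function look roughly like this:

    // Illustrative Java sketch (not the project's Titanium code): spreading a point
    // force from a material point onto a periodic fluid grid and interpolating the
    // fluid velocity back, using a 4-point cosine approximation of the delta function.
    public class IBInteraction {
        static final int N = 64;          // fluid grid points per dimension (assumed)
        static final double H = 1.0 / N;  // grid spacing for a unit box (assumed)

        // One-dimensional 4-point cosine kernel; r is measured in grid units.
        static double phi(double r) {
            r = Math.abs(r);
            return (r >= 2.0) ? 0.0 : 0.25 * (1.0 + Math.cos(Math.PI * r / 2.0));
        }

        // Step 2: spread a point force F located at X onto the fluid force grid f[i][j][k][dim].
        static void spread(double[] X, double[] F, double[][][][] f) {
            int i0 = (int) Math.floor(X[0] / H) - 1;   // lower corner of the 4x4x4 support
            int j0 = (int) Math.floor(X[1] / H) - 1;
            int k0 = (int) Math.floor(X[2] / H) - 1;
            for (int i = i0; i < i0 + 4; i++)
                for (int j = j0; j < j0 + 4; j++)
                    for (int k = k0; k < k0 + 4; k++) {
                        double w = phi(i - X[0] / H) * phi(j - X[1] / H) * phi(k - X[2] / H);
                        int ip = Math.floorMod(i, N), jp = Math.floorMod(j, N), kp = Math.floorMod(k, N);
                        for (int d = 0; d < 3; d++)
                            f[ip][jp][kp][d] += F[d] * w / (H * H * H);  // delta_h = phi*phi*phi / h^3
                    }
        }

        // Step 4: interpolate the fluid velocity u at material point X (the weights sum to 1).
        static double[] interpolate(double[] X, double[][][][] u) {
            double[] U = new double[3];
            int i0 = (int) Math.floor(X[0] / H) - 1;
            int j0 = (int) Math.floor(X[1] / H) - 1;
            int k0 = (int) Math.floor(X[2] / H) - 1;
            for (int i = i0; i < i0 + 4; i++)
                for (int j = j0; j < j0 + 4; j++)
                    for (int k = k0; k < k0 + 4; k++) {
                        double w = phi(i - X[0] / H) * phi(j - X[1] / H) * phi(k - X[2] / H);
                        int ip = Math.floorMod(i, N), jp = Math.floorMod(j, N), kp = Math.floorMod(k, N);
                        for (int d = 0; d < 3; d++)
                            U[d] += u[ip][jp][kp][d] * w;   // h^3 factors cancel in the interpolation sum
                    }
            return U;
        }
    }

In the real code these loops run over all fiber and shell points owned by a processor; the 4x4x4 support of the delta function is what drives the interaction data structures and communication patterns on the following slides.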
41. Challenges to Parallelization
- Irregular material points need to interact with the regular fluid lattice
- Efficient scatter-gather across processors
- Placement of materials across processors
- Locality: store material points with the underlying fluid and with nearby material points
- Load balance: distribute points evenly
- Need a scalable fluid solver
- Currently based on 3D FFT
- Multigrid and AMR explored by others
42. Current Parallel Algorithm
- Immersed materials are described by a hierarchy of 1D or 2D arrays
- Each resides on a single processor
- Use the hierarchy in the cochlea for work distribution
- 3D fluid grid uses a 1D distribution (slabs)
- Interactions between the fluid and material require inter-processor communication
- Maintain data structures to aggregate communication
- Optimized for high overhead/latency networks
43. Data Structures for Interaction
- Metadata and indexing overhead can be high
- Very little real work per fluid point
- Old method: send the entire bounding box, faster than the exact set of points (former CS267 project)
- New method: a logical grid of 4x4x4 cubes is used (see the bucketing sketch below)
- Communication aggregation is also performed
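To make the aggregation idea concrete (an illustrative Java sketch only; the grid size, the slab-ownership rule, and all names here are assumptions, not the project's data structures), material points can be bucketed by the processor that owns the fluid cube their delta-function support touches, so everything bound for one processor goes into a single message:

    import java.util.*;

    // Illustrative sketch: group material points by the owner of the 4x4x4 fluid cube
    // they interact with, so their interaction data can be sent in one aggregated message.
    public class InteractionBuckets {
        static final int N = 256;     // fluid grid points per dimension (assumed)
        static final int CUBE = 4;    // logical cube edge, matching the 4-point delta support
        static final int PROCS = 64;  // number of processors (assumed to divide N/CUBE evenly)

        // Owner of a cube under a 1D (slab) decomposition of the fluid grid along z.
        static int ownerOfCube(int cubeZ) {
            int cubesPerDim = N / CUBE;
            return cubeZ * PROCS / cubesPerDim;
        }

        // points[p] = {x, y, z} in grid coordinates; returns owner -> indices of the points
        // whose interaction data should be packed into that owner's message.
        static Map<Integer, List<Integer>> bucketByOwner(double[][] points) {
            Map<Integer, List<Integer>> buckets = new HashMap<>();
            for (int p = 0; p < points.length; p++) {
                int cubeZ = (int) (points[p][2] / CUBE);
                int owner = ownerOfCube(cubeZ);
                buckets.computeIfAbsent(owner, o -> new ArrayList<>()).add(p);
            }
            return buckets;
        }
    }

Aggregating this way trades a little packing work for far fewer, larger messages, which matches the slide's point about optimizing for high-overhead networks.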
44. Computer Science Research Problems
- Communication performance
- Expose the best features of the network
- Load balancing and locality
- Divide work evenly while minimizing communication
- Programming language support
- Make these programs easier to write
- Compilers
- Compiling parallel shared memory code
- Architecture-specific tuning
- Self tuning code for key computational kernels
- Performance modeling and prediction
45. Fluid Solver
- Incompressible fluid needs an elliptic solver (see the Fourier-space form below)
- High communication demand
- Information propagates across the domain
- FFT-based solver divides the domain into slabs
- Transposes before the last direction of the FFT
- Limits parallelism to n processors for an n^3 problem
- Former CS267 project: decompose slabs within an SMP node
Figure: slabs of 1D FFTs
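As a concrete instance of the elliptic solve (the standard FFT-based Poisson solution, not spelled out on the slide), a periodic Poisson problem diagonalizes under the 3D FFT, so after the forward transforms the solve is a pointwise division followed by inverse transforms:
\[
\nabla^2 \phi = g \quad\Longrightarrow\quad \hat{\phi}(\mathbf{k}) = -\,\frac{\hat{g}(\mathbf{k})}{|\mathbf{k}|^2} \qquad (\mathbf{k}\neq\mathbf{0}),
\]
with the zero mode fixed separately. The communication cost is dominated by the transposes between the 1D FFT directions, which is why bisection bandwidth limits the solver at scale.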
46. Load Balancing on the Cray T3E
- Fluid grid is divided in slabs
- Communication required in fibers
- T3E had lightweight communication
Figure: two ways of slicing the fibers across slabs ("pizza cutter" vs. "egg slicer")
47. Load Balancing, Version 2
- Trend towards higher communication overhead
- Moore's Law
- Commoditization
- Avoid splitting fibers
- Splitting leads to fine-grained communication during spring force evaluation
- Still split surfaces in the cochlea, but aggregate communication
Figure shows fibers owned by one processor
48. Load Balancing 3: Clever Fiber Splitting
- Can take advantage of linearity in the spread operation (see the identity below)
- Avoid communication across split fibers
- Instead, split a point by replicating it
- Still a cost to splitting: redundant work
- Cheaper than communication
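The property being exploited, stated explicitly (notation chosen here): the spread operator S that maps material forces onto the fluid grid is linear,
\[
S(\mathbf{F})(\mathbf{x}) = \int \mathbf{F}(s)\,\delta\big(\mathbf{x}-\mathbf{X}(s)\big)\,ds, \qquad
S(\mathbf{F}_a + \mathbf{F}_b) = S(\mathbf{F}_a) + S(\mathbf{F}_b),
\]
so the force at a shared (replicated) point can be written as F = F_a + F_b, each processor spreads its own share locally, and the contributions simply add on the fluid grid, at the cost of some redundant work instead of fine-grained communication.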
49. Load Balancing 4: Graph Partitioning
- Formalize as a graph partitioning problem
- Material/fiber points form graph nodes
- Edges defined by connectivity of the material
- Also by overlap in the spread operation
- Use the Metis graph partitioner
- Simultaneously optimizes for an even partition and minimum edge cuts (the standard objective is stated below)
- Amount of overlap between points k1 = (x_k1, y_k1, z_k1) and k2 = (x_k2, y_k2, z_k2):
  (4 - |x_k1 - x_k2|) x (4 - |y_k1 - y_k2|) x (4 - |z_k1 - z_k2|)
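For reference (this is the standard k-way partitioning objective that Metis optimizes, not a formula from the slide): choose parts V_1, ..., V_P to minimize the weight of cut edges subject to a balance constraint,
\[
\min_{V_1,\dots,V_P}\;\sum_{\substack{(i,j)\in E \\ i\in V_p,\; j\in V_q,\; p\neq q}} w_{ij}
\qquad \text{subject to} \qquad |V_p| \le (1+\varepsilon)\,\frac{|V|}{P} \;\;\text{for all } p,
\]
where here the edge weights w_ij come from the fiber connectivity and the spread-overlap measure above.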
50. Results of Fiber Splitting
- 15% faster than the previous solution on a modest number of processors
Figure shows the partition on a subset of fiber points
51. Titanium Overview
- Object-oriented language based on Java with
- Scalable parallelism
- Single Program Multiple Data (SPMD) model of parallelism, 1 thread per processor
- Global address space
- Processors can read/write memory on other processors
- Pointer dereference can cause communication
- Intermediate point between message passing and shared memory
52. Language Support for Performance
- Multidimensional arrays
- Contiguous storage
- Support for sub-array operations without copying
- Support for small objects
- E.g., complex numbers
- Called immutables in Titanium
- Sometimes called value classes
- Semi-automatic memory management
- Create named regions for new and delete
- Avoids distributed garbage collection
- Optimizations on parallel code
- Communication and memory hierarchies
53. Distributed Arrays
- Titanium allows construction of distributed arrays in the shared Global Address Space
Figure: pieces of a distributed array owned by threads t0, t1, and t2
54. Domain Calculus and Array Copy
- Full power of Titanium arrays combined with the PGAS model
- Titanium allows set operations on RectDomains
- // update overlapping ghost cells of neighboring block
- data[neighborPos].copy(myData.shrink(1))
- The copy is only done on the intersection of the array RectDomains
- Titanium also supports nonblocking array copy
Figure: the intersection (copied area) of the non-ghost (shrunken) cells of myData fills in the ghost cells of data[neighborPos]
55. The Local Keyword and Compiler Optimizations
- The local keyword ensures that the compiler statically knows that data is local
- double [3d] myData = (double [3d] local) data[myBlockPos];
- This allows the compiler to use more efficient native pointers to reference the array
- Avoid runtime check for local/remote
- Use a more compact pointer representation
- The Titanium optimizer can often automatically propagate locality info using Local Qualifier Inference (LQI)
56. Immutable Classes
- For small objects, would sometimes prefer
- to avoid level of indirection and allocation overhead
- to pass by value (copying of entire object)
- especially when immutable (fields never modified)
- Extends idea of primitives to user-defined data types
- Example: Complex number class
  immutable class Complex {
    // Complex class is now unboxed
    public double real, imag;
  }
- No assignment to fields outside of constructors
57. Cross-Language Calls
- Titanium supports efficient calls to kernels/libraries in other languages
- No data copying required
- Example: the FT benchmark calls the FFTW library to perform the local 1D FFTs
- This encourages
- shorter, cleaner, and more modular code
- the use of tested, highly-tuned libraries
58. Are These Features Expressive?
- Compared line counts of the timed, uncommented portion of each program
- MG and FT disparities are mostly due to Ti domain calculus and array copy
- CG line counts are similar since the Fortran version is already compact
59. FT Speedup
- All versions of the code use FFTW 2.1.5 for the serial 1D FFTs
- Nonblocking array copy allows for comp/comm overlap

Max Mflops/proc   Fortran   Ti (blocking)   Ti (nonblocking)
G5                238       294             350
Opt.              203       268             318
60. MG Speedup
61. CG Speedup
62. Case Study in Block-Structured AMR (Wen and Colella)
- Adaptive Mesh Refinement (AMR) is challenging
- Irregular data accesses and control from boundaries
- Mixed global/local view is useful
63. AMR in Titanium
- C++/Fortran/MPI AMR
- Chombo package from LBNL
- Bulk-synchronous communication
- Manually pack boundary data
- Titanium AMR
- Entirely in Titanium
- Finer-grained communication
- No explicit pack/unpack code
- Automated in the runtime system

Code size in lines       C++/Fortran/MPI   Titanium
AMR data structures      35000             2000
AMR operations           6500              1200
Elliptic PDE solver      4200              1500

10X reduction in lines of code! (Somewhat more functionality in the PDE part of the Chombo code.)

Elliptic PDE solver running time (secs)   C++/Fortran/MPI   Titanium
Serial                                    57                53
Parallel (28 procs)                       113               126

Comparable running time
Work by Tong Wen and Philip Colella
Communication optimizations joint with Jimmy Su
64. Use of AMR in Heart Simulation
- PhD thesis by Boyce Griffith at NYU
- Combines use of PETSc and SAMRAI in a parallel implementation
- Note: the Titanium version is not adaptive
Source: Boyce E. Griffith, http://www.math.nyu.edu/griffith/
65. Software Architecture
- Application models: Heart (Titanium), Cochlea (Titanium + C), Flagellate Swimming
- Built on a generic Immersed Boundary Method layer (Titanium)
- Extensible simulation solvers: Spectral (Titanium), AMR
- Can add new models by extending material points
- Uses Java inheritance for simplicity
66. Immersed Boundary Simulation in Titanium
- Using Seaborg (Power3) at NERSC and DataStar (Power4) at SDSC
Code size in lines: Fortran 8000, Titanium 4000
Joint work with Ed Givelberg, Armando Solar-Lezama
67. Performance Analysis
68. Building a Performance Model
- Based on measurements/scaling of components
- FFT time is
- 5 n log n flops / (flops/sec, measured for the FFT)
- Other costs are linear in either material or fluid points
- Measure constants
- flops/point (independent of machine or problem size)
- flops/sec (measured per machine, per phase)
- Time is a + b × points
- Communication done similarly
- Find a formula for message size as a function of problem size
- Check the formula using tracing of some kind
- Use the a/b model to predict running time: a + b × size (summarized below)
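Putting the slide's pieces together in one formula (my restatement; the symbols are chosen here, not taken from the talk):
\[
T_{\text{step}} \;\approx\; \frac{5\,N\log_2 N}{R_{\text{FFT}}}
\;+\; \sum_{\text{compute phases } i} \big(a_i + b_i\,n_i\big)
\;+\; \sum_{\text{messages } j} \big(\alpha + \beta\, s_j\big),
\]
where N is the number of fluid grid points, R_FFT the measured FFT flop rate, n_i the number of material or fluid points processed in phase i with measured constants a_i and b_i, and (α, β) the per-message latency and inverse bandwidth applied to message sizes s_j derived from the problem size.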
69. A Performance Model
- 512^3 in < 1 second per timestep is not possible
- Primarily limited by bisection bandwidth
70. Simulation Results
71. Traveling Wave in the Cochlea
Figure: four successive wave snapshots (labeled 1-4) and the wave envelope along the basilar membrane, from the stapes (high frequencies) to the helicotrema (low frequencies); the envelope peaks at the characteristic frequency location.
72. Sound Wave Propagation in the Cochlea
- Centerline of the basilar membrane
- Response to a 10 kHz frequency input
73. Heart Simulation
- Animation of the lower portion of the heart
Source: www.psc.org
74. Future Research in Immersed Boundaries
- Additional applications
- In Immersed Boundaries
- Variable timestepping
- Improved scalability (underway)
- More accurate methods
- Higher accuracy IB methods
- Adaptive Mesh Refinement
- Multi-physics models
- Incorporating diffusion, electrical models, etc.
- Needed for cardiac muscle contraction
- Needed for Outer Hair Cell in cochlea
75. Collaborators
- Titanium Faculty
- Susan Graham
- Paul Hilfinger
- Alex Aiken
- Phillip Colella, LBNL
- Bebop faculty
- Jim Demmel
- Eun-Jin Im, Kookmin
- NYU IB Method
- Charlie Peskin
- Dave McQueen
- Students, Postdocs, Staff
- Christian Bell
- Wei Chen
- Greg Balls, SDSC
- Dan Bonachea
- Ed Givelberg
- Peter McCorquodale, LBNL
- Tong Wen, LBNL
- Mike Welcome, LBNL
- Jason Duell, LBNL
- Paul Hargrove, LBNL
- Sabrina Merchant
- Kaushik Datta
- Rich Vuduc
- Amir Kamil
- Omair Kamil
- Ben Liblit
- Meling Ngo
- Geoff Pike
- Jimmy Su
- Siu Man Yau
- Shoaib Kamil
- Benjamin Lee
- Rajesh Nishtala
- Costin Iancu, LBNL
- David Gay, Intel
- Armando Solar-Lezama
- Hormozd Gahvari
http://titanium.cs.berkeley.edu/   http://upc.lbl.gov   http://bebop.cs.berkeley.edu
Primary researchers on the IB simulation
76. Plate Simulation (Synthetic Problem)
- Embedded structures are
- Elastic membranes
- Shells
- Variable elasticity
- Capture shape of cochlea
- Shown here is a plate
- Implemented as a CS267 project in S01
Source: Titanium simulation on a T3E
77. Torus Simulation (Synthetic Problem)
- Model problem based on a torus simulation
- Uses fibers (1D immersed structures)
- Similar to the heart, but without the full model
- Useful in
- Performance analysis
- Some validation
- Problem can be scaled relatively easily
- Was run on the Millennium cluster and other machines
Source: Titanium simulation on Millennium
78. The IB software package
- Basic objects
- Fluid
- MaterialType
- Grid, GridPoint, GridArray
- Control, MailBox
- Usage
- FluidWithSources extends IB.Fluid
- ShellPoint extends IB.GridPoint
- Shell extends IB.Grid
- MaterialType.register(ShellPoint sp, Shell s)
- Tissue extends IB.GridArray
- IB.Control.register(MyFluid, MyTissue)
- IB.Control.go(NumberOfTimeSteps)