Title: Database Methods for Scientific Computing
- David R. O'Hallaron
- Associate Professor of CS and ECE
- Carnegie Mellon University
- (joint work with Tiankai Tu and Julio Lopez)
The Scientific Computing Process
Figure: the process runs from the physical model through mesh generation to a mesh, then through the solver to simulation results, which feed visualization.
The Euclid Project
- Goal: Run large-scale physical simulations on PCs with limited physical memory.
- Approach: Index and store the input and output datasets in databases, and compute on the databases directly.
- Requires research at the intersection of scientific computing, algorithms, databases, and systems.
Figure: the same process, with the physical model, meshes, and simulation results held in a physical model DB, mesh DBs, and a simulation results DB, on which mesh generation, the solver, and visualization operate.
David O'Hallaron, Jacobo Bielak, Omar Ghattas (Carnegie Mellon); Jonathan Shewchuk (UC Berkeley); Steven Day (SD State)
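Teora, Italy 1980
San Fernando Valley
San Fernando Valley (Top View)
Figure: top view of the valley showing regions of hard rock and soft soil, with the epicenter marked x.
San Fernando Valley (Side View)
Figure: side view showing the soft-soil and hard-rock layers.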
Node Distribution
Partitioned Unstructured Mesh
Figure: a partitioned unstructured mesh, with its nodes and one element labeled.
Simulation and Visualization
Scientific Computing with Euclid
- Represent the physical model, mesh, and simulation results on disk in spatial database structures called etrees (Euclid trees).
  - Linear octree indexed by standard Morton-based locational codes.
  - Disk pages indexed by a standard B-tree indexing structure.
- Perform the entire process out-of-core by querying and updating the etrees.
Figure: mesh generation, the solver, and visualization operating on a physical model etree, mesh node and element etrees, and a simulation results etree.
Octrees
- Octree mesh generation
- Balance requirement for meshes (2-to-1 constraint)
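The 2-to-1 constraint can be made concrete with a small check. This is a hypothetical 2D (quadtree) illustration, not code from the etree library: it assumes each leaf is stored as its lower-left corner plus its edge length, and that only face-adjacent leaves are constrained (some mesh generators also constrain edge- or vertex-adjacent leaves). `quadrant_t`, `face_adjacent`, and `is_balanced` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical leaf record: lower-left corner and edge length. */
typedef struct { uint32_t x, y, size; } quadrant_t;

/* True if a and b share an edge segment of positive length
   (corner-only contact does not count). */
static bool face_adjacent(quadrant_t a, quadrant_t b) {
    bool x_overlap = a.x < b.x + b.size && b.x < a.x + a.size;
    bool y_overlap = a.y < b.y + b.size && b.y < a.y + a.size;
    bool touch_x = a.x + a.size == b.x || b.x + b.size == a.x;
    bool touch_y = a.y + a.size == b.y || b.y + b.size == a.y;
    return (touch_x && y_overlap) || (touch_y && x_overlap);
}

/* 2-to-1 constraint (assumed face-adjacency version): adjacent leaves may
   differ in edge length by at most a factor of 2. O(n^2) for clarity. */
static bool is_balanced(const quadrant_t *leaves, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (!face_adjacent(leaves[i], leaves[j])) continue;
            uint32_t big = leaves[i].size > leaves[j].size ? leaves[i].size : leaves[j].size;
            uint32_t small = leaves[i].size > leaves[j].size ? leaves[j].size : leaves[i].size;
            if (big > 2 * small) return false;
        }
    }
    return true;
}
```

A real balancer would find neighbors through the octree or the locational codes rather than by the all-pairs scan used here.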
Linear Octrees
Figure: a quadtree over an 8 x 8 grid with leaf octants labeled a through m, stored in order across B-tree pages under a B-tree index.
Addressing Linear Octree Elements
- Morton code: maps n-dimensional points to one-dimensional scalars.
- Locational code: appends an octant's level to the Morton code of its left-lower corner.
Example (figure: octant d in the quadtree over the 8 x 8 grid):
- d's left-lower corner is (2, 2).
- Binary form: (010, 010).
- Interleave the bits to obtain the Morton code: 00 11 00.
- Append the level of d (3, binary 11) to obtain the locational code: 001100_11.
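The interleave-then-append steps above can be sketched in C for the 2D case. This is a minimal sketch, not the etree library's encoder: it assumes the x bit comes first within each bit pair (the example above, with equal coordinates, does not settle the order) and that the level is printed in a fixed number of bits. `morton2d` and `format_locational` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interleave the low `bits` bits of x and y, most significant bits first.
   Within each pair the x bit is placed first (assumed convention). */
static uint64_t morton2d(uint32_t x, uint32_t y, int bits) {
    uint64_t code = 0;
    for (int i = bits - 1; i >= 0; i--) {
        uint64_t xb = (x >> i) & 1u, yb = (y >> i) & 1u;
        code = (code << 2) | (xb << 1) | yb;
    }
    return code;
}

/* Write "<morton bits>_<level bits>" into buf, e.g. "001100_11". */
static void format_locational(uint64_t morton, int morton_bits,
                              unsigned level, int level_bits,
                              char *buf, size_t len) {
    size_t pos = 0;
    if (len == 0) return;
    for (int i = morton_bits - 1; i >= 0 && pos + 1 < len; i--)
        buf[pos++] = ((morton >> i) & 1u) ? '1' : '0';
    if (pos + 1 < len) buf[pos++] = '_';
    for (int i = level_bits - 1; i >= 0 && pos + 1 < len; i--)
        buf[pos++] = ((level >> i) & 1u) ? '1' : '0';
    buf[pos] = '\0';
}
```

For octant d, `morton2d(2, 2, 3)` interleaves 010 and 010 into 001100 (decimal 12), and formatting it with level 3 in two bits gives the locational code 001100_11.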
Nice Properties of Linear Octrees
- An addressing scheme that clusters nearby octants.
- An octant can be found without knowing its exact locational code.
- The order imposed by the locational code is the same as the preorder traversal of the leaves in the octree.
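The preorder property can be checked on the keys from the Mesh Element Etree example (leaves A through G). This sketch assumes a hypothetical `octant_t` record that keeps the 4-bit Morton code and the level as separate fields. Sorting by Morton code, with the level as a tie-breaker (shallower first), yields the leaves in preorder; the tie-breaker would also place an ancestor before its descendants if internal octants were stored.

```c
#include <stdlib.h>

/* Hypothetical record: Morton code of the lower-left corner at the finest
   level, the octant's level, and a label for display. */
typedef struct { unsigned morton; unsigned level; char label; } octant_t;

/* Order by Morton code, then by level (an ancestor sorts before the
   descendant that shares its lower-left corner). */
static int cmp_locational(const void *pa, const void *pb) {
    const octant_t *a = pa, *b = pb;
    if (a->morton != b->morton) return a->morton < b->morton ? -1 : 1;
    return (a->level > b->level) - (a->level < b->level);
}

static void sort_by_locational_code(octant_t *oct, size_t n) {
    qsort(oct, n, sizeof *oct, cmp_locational);
}
```

Given the keys 1100_01 (G), 0101_10 (C), 0000_01 (A), 1000_01 (F), 0111_10 (E), 0100_10 (B), and 0110_10 (D) in that scrambled order, the sort returns A, B, C, D, E, F, G, the left-to-right order of the leaves in the tree.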
Etree Mesh Generator
Figure: application-specific input passes through three steps, each built on the etree library: construct (producing an unbalanced octree), balance (producing a balanced octree), and transform (producing the element database and the node database).
Etree Library: A Framework in C for Manipulating Etrees on Disk
- Etree API: octant-level (insert) and octree-level (balance) operations.
- Linear octree: well-known coding scheme to assign keys to octants.
- Auto navigation: new algorithm for constructing octrees automatically.
- Local balancing: new algorithm to speed up the balancing operation.
- B-tree: well-known DB indexing structure.
Figure: layered architecture, with the application (e.g., construct, balance) on top of the Etree API, which sits on the Etree Library (linear octree, auto navigation, local balancing, B-tree).
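Mesh Element Etree
Figure: a quadtree whose root has children 00, 01, 10, and 11. Children 00, 10, and 11 are the level-1 leaves A, F, and G; child 01 is split into the level-2 leaves B, C, D, and E (00, 01, 10, 11).
B-tree page (locational code keys):
  0000_01  A
  0100_10  B
  0101_10  C
  0110_10  D
  0111_10  E
  1000_01  F
  1100_01  G
- Exact hit: the query X = 0101_10 matches the key of C.
- Aggregate hit: the query Y = 1010_10 has no key of its own; it lies inside F (1000_01), the largest key not exceeding it.
KEY FACT: Leaf nodes and aggregated nodes can be located within a B-tree page with a fast binary search, without traversing the edges of the octree.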
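Both kinds of hit come from a single binary search: find the last key that does not exceed the query, then test whether that octant equals or contains the query. The sketch below runs over an in-memory copy of the page above and is hypothetical, not the etree library's search; it assumes 2D codes with a maximum level of 2 (4-bit Morton codes) and a page holding only non-overlapping leaves.

```c
#include <stddef.h>

#define MAX_LEVEL 2 /* assumed depth of the example quadtree: 4-bit codes */

typedef struct { unsigned morton; unsigned level; char label; } octant_t;
typedef enum { MISS, EXACT_HIT, AGGREGATE_HIT } hit_t;

/* True if octant a equals or contains the octant (m, lv): a must be no
   deeper, and the Morton bits above a's level must match. */
static int contains(const octant_t *a, unsigned m, unsigned lv) {
    if (a->level > lv) return 0;
    unsigned shift = 2 * (MAX_LEVEL - a->level);
    return (a->morton >> shift) == (m >> shift);
}

/* Binary search a page sorted by (morton, level) for the query (m, lv). */
static hit_t page_search(const octant_t *page, size_t n,
                         unsigned m, unsigned lv, const octant_t **found) {
    size_t lo = 0, hi = n; /* find the first key greater than the query */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const octant_t *p = &page[mid];
        if (p->morton < m || (p->morton == m && p->level <= lv)) lo = mid + 1;
        else hi = mid;
    }
    if (lo == 0) return MISS;
    const octant_t *cand = &page[lo - 1]; /* last key <= query */
    if (!contains(cand, m, lv)) return MISS;
    *found = cand;
    return (cand->morton == m && cand->level == lv) ? EXACT_HIT : AGGREGATE_HIT;
}
```

On the page above, querying X = 0101_10 returns an exact hit on C, and querying Y = 1010_10 lands on F (1000_01), whose level-1 prefix 10 matches, so it is an aggregate hit. Because leaves do not overlap, no key can fall between a containing leaf and the query.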
Mesh Node Etree
Figure: mesh nodes a(0,0), b(0,2), c(0,3), d(1,2), e(1,3), f(2,0), g(2,2), h(2,3), i(0,4), j(1,4), k(2,4), l(4,0), m(4,2), and n(4,4), stored in two B-tree leaf pages keyed by Morton code.
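A minimal sketch of building node etree keys: compute each node's Morton code from its coordinates, sort by that key, and look nodes up by coordinates with `bsearch`. It assumes 3-bit coordinates and x-first bit interleaving (an assumption, not taken from the slides); under that convention the labels a through n come out in increasing Morton order. The in-memory array stands in for the B-tree leaf pages, and `node_t`, `build_node_index`, and `find_node` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define COORD_BITS 3 /* assumed: node coordinates fit in 3 bits (0..7) */

typedef struct { uint32_t x, y; char label; uint64_t key; } node_t;

/* Interleave coordinate bits, x bit first in each pair (assumed). */
static uint64_t morton2d(uint32_t x, uint32_t y, int bits) {
    uint64_t code = 0;
    for (int i = bits - 1; i >= 0; i--) {
        uint64_t xb = (x >> i) & 1u, yb = (y >> i) & 1u;
        code = (code << 2) | (xb << 1) | yb;
    }
    return code;
}

static int cmp_key(const void *pa, const void *pb) {
    uint64_t a = ((const node_t *)pa)->key, b = ((const node_t *)pb)->key;
    return (a > b) - (a < b);
}

/* Assign Morton keys and sort the nodes into key order. */
static void build_node_index(node_t *nodes, size_t n) {
    for (size_t i = 0; i < n; i++)
        nodes[i].key = morton2d(nodes[i].x, nodes[i].y, COORD_BITS);
    qsort(nodes, n, sizeof *nodes, cmp_key);
}

/* Exact-match lookup by coordinates; returns NULL if no node is there. */
static const node_t *find_node(const node_t *nodes, size_t n,
                               uint32_t x, uint32_t y) {
    node_t probe = { .key = morton2d(x, y, COORD_BITS) };
    return bsearch(&probe, nodes, n, sizeof *nodes, cmp_key);
}
```

Since a node is a point rather than an octant, no level is appended: the Morton code alone is the key.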
Auto Navigation
- Navigation octree
- Guided by an application function
- An in-memory pointer-based octree
- Dynamically grows in depth-first fashion
- Leaf octants are pruned and flushed to disk in preorder (in increasing locational code order)
- Appends the octants to the etree database to avoid database searches
Figure legend: octants not yet processed (in memory); non-leaf octants being decomposed (in memory); leaf octants (flushed to database).
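The depth-first construction can be sketched as a recursion in which only the current path is in memory and each leaf is appended to the output as soon as it is final. This is a simplification, not the etree library's algorithm: the call stack stands in for the pointer-based navigation octree, an array stands in for the etree database, and `needs_refinement` is a hypothetical application function that refines every octant containing an assumed point of interest (5, 2) in an assumed 8 x 8 domain. Visiting children in Morton order is what makes the appended leaves arrive in increasing locational-code order.

```c
#include <stdint.h>

#define MAX_LEVEL 3 /* assumed finest level: an 8 x 8 domain */

typedef struct { uint32_t x, y; int level; uint64_t code; } leaf_t;
typedef struct { leaf_t *out; int n, cap; } sink_t; /* stand-in for the DB */

static uint64_t morton2d(uint32_t x, uint32_t y, int bits) {
    uint64_t code = 0;
    for (int i = bits - 1; i >= 0; i--) {
        uint64_t xb = (x >> i) & 1u, yb = (y >> i) & 1u;
        code = (code << 2) | (xb << 1) | yb; /* x bit first (assumed) */
    }
    return code;
}

/* Hypothetical application function: refine octants containing (5, 2). */
static int needs_refinement(uint32_t x, uint32_t y, uint32_t size, int level) {
    const uint32_t px = 5, py = 2;
    return level < MAX_LEVEL && x <= px && px < x + size && y <= py && py < y + size;
}

static void navigate(sink_t *s, uint32_t x, uint32_t y, int level) {
    uint32_t size = 1u << (MAX_LEVEL - level);
    if (needs_refinement(x, y, size, level)) {
        uint32_t h = size / 2;
        /* Children in Morton order (x bit first): 00, 01, 10, 11. */
        navigate(s, x,     y,     level + 1);
        navigate(s, x,     y + h, level + 1);
        navigate(s, x + h, y,     level + 1);
        navigate(s, x + h, y + h, level + 1);
    } else if (s->n < s->cap) {
        /* Leaf: append ("flush") it; no search of the output is needed. */
        s->out[s->n++] = (leaf_t){ x, y, level, morton2d(x, y, MAX_LEVEL) };
    }
}

/* Build the octree from the root and return the number of leaves. */
static int auto_navigate(leaf_t *out, int cap) {
    sink_t s = { out, 0, cap };
    navigate(&s, 0, 0, 0);
    return s.n;
}
```

With the assumed point of interest, the recursion emits 10 leaves, each with a larger code than the one before, so every write is an append.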
Local Balancing
- Operational steps:
  - Partition the entire domain into equal-size blocks.
  - Perform internal balancing to enforce the 2-to-1 constraint within each block (in a memory-resident blocking array).
  - Perform boundary balancing to resolve interactions between adjacent blocks.
Key Fact: Interactions between adjacent blocks are always absorbed by boundary octants and are not propagated into the blocks.
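The partitioning step, and the notion of a boundary octant, can be sketched as follows. This hypothetical 2D sketch assumes an 8 x 8 domain split into 2 x 2 equal-size blocks and octants no larger than a block; it does not implement internal or boundary balancing. Because each block is an aligned square, its id is a prefix of the Morton code, so a block's octants are contiguous in locational-code order and can be gathered in a single pass over the etree. `block_of` and `is_boundary_octant` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEVEL 3   /* assumed: 8 x 8 domain */
#define BLOCK_LEVEL 1 /* assumed: 2 x 2 blocks, each 4 x 4 */

static uint64_t morton2d(uint32_t x, uint32_t y, int bits) {
    uint64_t code = 0;
    for (int i = bits - 1; i >= 0; i--) {
        uint64_t xb = (x >> i) & 1u, yb = (y >> i) & 1u;
        code = (code << 2) | (xb << 1) | yb; /* x bit first (assumed) */
    }
    return code;
}

/* Block id: the top 2*BLOCK_LEVEL bits of the corner's Morton code. */
static uint64_t block_of(uint32_t x, uint32_t y) {
    return morton2d(x, y, MAX_LEVEL) >> (2 * (MAX_LEVEL - BLOCK_LEVEL));
}

/* An octant is a boundary octant if any of its faces lies on a block
   boundary (this also flags faces on the outer domain boundary). */
static bool is_boundary_octant(uint32_t x, uint32_t y, uint32_t size) {
    const uint32_t b = 1u << (MAX_LEVEL - BLOCK_LEVEL); /* block edge */
    return x % b == 0 || (x + size) % b == 0 || y % b == 0 || (y + size) % b == 0;
}
```

Only octants for which `is_boundary_octant` is true can take part in boundary balancing; interior octants such as the unit square at (5, 2) are settled entirely by internal balancing.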
Some Evaluation Questions
- Is etree mesh generation feasible?
- How does running time vary with the physical memory size?
- What is the performance impact of auto navigation?
- What is the performance impact of local balancing?
Evaluation Methodology
We used the etree mesh generator to build a family of finite element meshes for San Fernando Valley earthquake ground motion simulations.
Mesh   Elements     Nodes        Slave nodes
SF10   7,940        12,118       4,432
SF5    76,330       105,886      34,858
SF2    1,838,524    2,213,035    407,336
SF1    13,579,124   15,097,365   1,649,855
SFx: a mesh of the 50 km x 50 km x 12 km San Fernando Valley region that resolves seismic waves with periods as short as x seconds.
Evaluation Setup
All experiments were conducted on a 1 GHz Pentium III machine running Linux 2.4.17. The machine's physical memory for the experiments ranged from 128 MB to 880 MB. Before each experiment, two 1.5 GB files were scanned sequentially to flush the operating system's buffer cache.
Etree Feasibility
All experiments were performed with 128 MB of physical memory.
Mesh   Elements     DB size (MB)   Time (s)   Throughput (elements/s)
SF10   7,940        2.5            40         199
SF5    76,330       24             186        410
SF2    1,838,524    583            1,637      1,123
SF1    13,579,124   4,300          9,449      1,439
- Generating a mesh with 13.6 million elements and a size of 4.3 GB in 2.6 hours seems reasonable.
- The overall throughput increases with mesh size.
Impact of Physical Memory Size
- Memory size does not have a significant impact on the running time.
- The etree method does not rely on the operating system's internal caching mechanism to achieve its performance.
Impact of Auto Navigation
- Reducing the B-tree buffer size does not increase the construction time.
- Auto navigation is not sensitive to the B-tree buffer size.
Impact of Local Balancing
- Achieves speedups ranging from 8 (SF1) to 28 (SF10).
- Benefits from the one-time scan of the database and the efficient array-based neighbor-finding algorithm.
Some Related Work
- General octree algorithms: Samet 90
- Octree mesh: Shephard & Georges 91, Bern et al. 90, Young et al. 91, Wang 99
- Out-of-core octree solver method: Salmon 97
- Linear quadtree: Gargantini 82, Morton 66
- Space-filling curves: Orenstein 84, Orenstein 86, Faloutsos & Roseman 89
- Large dataset processing: Freitag & Loy 99, Seamons & Winslett 96, Ferreira et al. 99, Kurc et al. 01, Choudhary et al. 99, Parashar & Browne 97
Summary and Conclusions
- The Euclid project aims to recast the entire scientific computing process in terms of database operations.
- Combining existing database techniques (linear octrees and B-trees) with new algorithms (auto navigation and local balancing) in a unified framework (the etree) can deliver new capabilities.
- On the horizon:
  - Caching and prefetching for the etree solver
  - Remote access and derived-value caching for visualization
  - Parallel visualization system based on etrees
  - Unstructured tetrahedral mesh generation using R-trees
Etree API
Unix file I/O style, with three levels of abstraction:
- Initialization and cleanup, e.g.,
  etree_t *etree_open(const char *path, int flag, ...)
- Octant-level operations, e.g.,
  int etree_insert(etree_t *ep, location_t loc, void *value)
- Octree-level operations, e.g.,
  int etree_balance(etree_t *ep, decom_t baldecom)