Title: Indexing Data Relationships
1 Indexing Data Relationships
- Michael J. Franklin
- University of California, Berkeley
- RightOrder Inc.
2 Overview
- Data relationships can be complex.
- Hierarchical views, e.g., XML, LDAP
- Semistructured data: dynamic schema
- Approach: encode paths as tagged strings
- raw paths encode structure
- refined paths accelerate lookups
- Index strings in a highly compact structure.
- Lives on top of, next to, or inside the DBMS.
- Benefits
- Performance, scalability, adaptivity
- Leverages mature DBMS technology
3 Raw paths w/ Designators
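The idea of encoding a raw path as a tagged string can be sketched as follows. This is a minimal illustration, not the product's actual encoding: the designator letters, the sample invoice document, and the function name `raw_path_keys` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical designator table: one short symbol per tag name.
DESIGNATORS = {"invoice": "I", "buyer": "B", "seller": "S", "name": "N", "item": "T"}

def raw_path_keys(element, prefix=""):
    """Yield one string per leaf: the designators along the path, then the leaf text."""
    path = prefix + DESIGNATORS[element.tag]
    children = list(element)
    if not children:
        yield f"{path} {(element.text or '').strip()}"
    for child in children:
        yield from raw_path_keys(child, path)

# Hypothetical sample document.
doc = ET.fromstring(
    "<invoice>"
    "<buyer><name>ABC Corp</name></buyer>"
    "<seller><name>Acme Inc</name></seller>"
    "<item>nails</item>"
    "</invoice>"
)
keys = sorted(raw_path_keys(doc))
print(keys)
```

Each resulting string carries both the structure (the designator prefix) and the value, so an ordinary string index can answer path queries by prefix.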
4 Refined paths
- Optimize specific access paths
- Find invoices where X sold to Y
- Find invoices where X bought Y and Z
- Find invoices where a buyer bought X, Y and Z
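A refined path for the first query above might look like the following sketch. It assumes a dedicated designator (`Z` here, chosen arbitrarily) for the "X sold to Y" access path, and uses a plain dict as a stand-in for the string index; the invoice data is invented for illustration.

```python
# Hypothetical invoices: (invoice id, seller, buyer).
invoices = [
    (1, "Acme Inc", "ABC Corp"),
    (2, "Acme Inc", "XYZ Ltd"),
    (3, "Bolt Co", "ABC Corp"),
]

REFINED = "Z"  # assumed designator reserved for the "X sold to Y" access path

def refined_key(seller, buyer):
    """Build one key that answers the whole query with a single lookup."""
    return f"{REFINED} {seller}|{buyer}"

# A dict stands in for the string index in this sketch.
index = {}
for invoice_id, seller, buyer in invoices:
    index.setdefault(refined_key(seller, buyer), []).append(invoice_id)

def invoices_sold(seller, buyer):
    return index.get(refined_key(seller, buyer), [])

print(invoices_sold("Acme Inc", "ABC Corp"))
```

The point of the design is that a frequent multi-step query collapses into one key lookup, at the cost of storing extra keys.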
5 Index Fabric
- An index structure for long strings.
- Provides fast lookups
- Handles long strings
- Ideal substrate for designated keys
- Based on Patricia tries
- Highly compressed string representation
- Cost in index independent of string length
- But the trie needs to be balanced.
6 Patricia tries
Indexes first point of difference between keys
greenbeans
greentea
D. R. Morrison. PATRICIA: Practical Algorithm To Retrieve Information Coded in Alphanumeric. J. ACM, 15 (1968), pp. 514-534.
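A minimal sketch of indexing only the first point of difference, assuming a character-level simplification (Morrison's PATRICIA branches on bits, not characters). The names `build` and `lookup` and the node layout are illustrative choices, not taken from the slides.

```python
def build(keys, depth=0):
    """Build a Patricia-style trie over a non-empty collection of distinct keys.

    A leaf is the key itself; an internal node is (position, {char: child}),
    where position is the first index at which the keys below it differ.
    """
    keys = sorted(set(keys))
    if len(keys) == 1:
        return keys[0]
    # In sorted order, the common prefix of all keys equals that of first and last.
    first, last = keys[0], keys[-1]
    pos = depth
    while pos < len(first) and pos < len(last) and first[pos] == last[pos]:
        pos += 1
    groups = {}
    for key in keys:
        branch = key[pos] if pos < len(key) else ""  # "" marks a key that ends here
        groups.setdefault(branch, []).append(key)
    return (pos, {ch: build(group, pos + 1) for ch, group in groups.items()})

def lookup(node, key):
    """Follow branch characters only; skipped characters are verified at the leaf."""
    while isinstance(node, tuple):
        pos, children = node
        branch = key[pos] if pos < len(key) else ""
        if branch not in children:
            return False
        node = children[branch]
    return node == key

trie = build(["greenbeans", "greentea"])
print(trie)  # the only internal node tests position 5
print(lookup(trie, "greentea"), lookup(trie, "greenpeas"))
```

Because internal nodes store a position rather than the shared prefix, the structure stays small no matter how long the keys are; the price is the final comparison against the stored key.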
7 Multiple Hierarchical Views
- Can store multiple permutations of relationships
- Find animals and the plants they eat
- Find plants and the animals that eat them
- Represent as a new set of keys
- Store data once using permutation records
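One way to picture the two views above: each record is stored once, and two keys with different designator orders point to it. This is only a sketch; the designators `A` and `P`, the sample records, and the sorted-scan stand-in for the index are assumptions, and it does not model the permutation-record format itself.

```python
# Hypothetical records, each stored exactly once.
records = {
    1: {"animal": "cow", "plant": "grass"},
    2: {"animal": "cow", "plant": "corn"},
    3: {"animal": "panda", "plant": "bamboo"},
}

# Two key permutations per record, both pointing at the same record id.
index = {}
for rid, rec in records.items():
    index[f"A {rec['animal']} P {rec['plant']}"] = rid   # animal-first view
    index[f"P {rec['plant']} A {rec['animal']}"] = rid   # plant-first view

def prefix_scan(prefix):
    """Return the records whose key starts with prefix, in key order."""
    return [records[rid] for key, rid in sorted(index.items()) if key.startswith(prefix)]

print([r["plant"] for r in prefix_scan("A cow ")])     # plants that cows eat
print([r["animal"] for r in prefix_scan("P grass ")])  # animals that eat grass
```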
8 Example
[Figure: trie diagram; visible edge labels: a, b, c, o, w]
9 Example
[Figure: trie diagram, continued; visible edge labels: a, b, c, o, w]
10 Balancing Patricia tries
11 Balancing Patricia tries
Step 1: divide trie into blocks
12 Balancing Patricia tries
Step 2: build another layer
[Figure: blocked trie with two layers, labeled Layer 1 and Layer 0]
13 Balancing Patricia tries
Search for cash
[Figure: Layer 1 and Layer 0 of the trie; visible labels: g, e, greenbeans]
14 Balancing Patricia tries
Search for cash
[Figure: Layer 1 and Layer 0 of the trie over the keys corn, cow, grass, greenbeans, greentea; nodes labeled 0 and 2]
15 Balancing Patricia tries
Search for cash
[Figure: Layer 1 and Layer 0 of the trie over the keys corn, cow, grass, greenbeans, greentea; nodes labeled 0 and 2]
16 Balancing Patricia tries
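The two steps (divide into blocks, then build another layer on top) can be illustrated with a deliberately simplified sketch. It swaps the trie blocks for sorted lists of keys and uses the first key of each block as its separator, so it shows only the layering and the one-block-per-layer search, not the Patricia compression. The function names and the tiny block size are assumptions.

```python
from bisect import bisect_right

def build_layers(sorted_keys, block_size):
    """layers[0] holds the keys in blocks; each higher layer holds the
    first key of every block in the layer below, again split into blocks."""
    layers = [[sorted_keys[i:i + block_size]
               for i in range(0, len(sorted_keys), block_size)]]
    while len(layers[-1]) > 1:
        firsts = [block[0] for block in layers[-1]]
        layers.append([firsts[i:i + block_size]
                       for i in range(0, len(firsts), block_size)])
    return layers

def search(layers, key, block_size):
    """Descend from the top layer, reading one block per layer.
    Returns (found, blocks_read)."""
    block_no = 0
    for depth in range(len(layers) - 1, 0, -1):
        block = layers[depth][block_no]
        slot = max(bisect_right(block, key) - 1, 0)  # last separator <= key
        block_no = block_no * block_size + slot      # block to read one layer down
    return key in layers[0][block_no], len(layers)

keys = sorted(["greenbeans", "greentea", "grass", "corn", "cow"])
layers = build_layers(keys, block_size=2)
print(search(layers, "cash", 2))
print(search(layers, "greentea", 2))
```

With realistic block sizes the upper layers shrink by the fan-out at each step, which is what keeps the number of layers small.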
17 Performance
- Number of layers is small
- Fixed (small) space per key
- High branching factor per block
- Bushy, shallow tree
- Example
- 8 KB blocks
- 32-bit pointers, 2 bytes for keys/structure
- 1000 pointers per block
- 3 layers for 1 billion pointers to data (1000^3)
- Upper layers are tiny (10 megabytes), in RAM
- Only layer 0 on disk
- Usually one index I/O per key lookup
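The arithmetic behind the example can be checked with a short calculation. It uses only the figures quoted above (8 KB blocks, 1000 pointers per block, 1 billion pointers to data) and assumes every block is full.

```python
BLOCK_BYTES = 8 * 1024        # 8 KB blocks
POINTERS_PER_BLOCK = 1000     # fan-out quoted above
DATA_POINTERS = 10**9         # 1 billion pointers to data

blocks_per_layer = []         # index 0 is layer 0
entries = DATA_POINTERS
while True:
    blocks = -(-entries // POINTERS_PER_BLOCK)   # ceiling division
    blocks_per_layer.append(blocks)
    if blocks == 1:
        break
    entries = blocks          # the next layer up needs one pointer per block

upper_bytes = sum(blocks_per_layer[1:]) * BLOCK_BYTES
print(len(blocks_per_layer), blocks_per_layer, upper_bytes)
```

Under these assumptions there are 3 layers with 1,000,000, 1,000 and 1 blocks. The two upper layers total about 8.2 million bytes, consistent with the roughly 10 megabytes held in RAM, while layer 0 (about 8.2 GB) stays on disk.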
18 Find publications by co-authors
10,000 queries
RDBMS Edge mapping
19 Find publications by co-authors
10,000 queries
20 Conclusion
- Index arbitrary relationships
- Encode as designated strings
- Relationships and structures can be complex
- Index many data access paths
- No need for DTD or pre-defined schema
- Index Fabric
- Special data structure for long keys
- High performance key lookups
- Supports designator encoding
21 For more information
- technology_at_rightorder.com
- www.rightorder.com