Thor: a Fast, Distributed, Persistent Object System

1
Thor: a Fast, Distributed, Persistent Object System
  • Andrew Myers
  • CS 632: Advanced Database Systems
  • 22 Feb 01

2
Persistence
  • Question: what is the right programming model
    for accessing persistent data? What is the right
    data model?
  • Current persistent data:
    - much data stored in relational databases
    - much less data stored in object databases
    - lots of data in flat files in file systems (every
      Windows machine in the world!)
    - structure of data sometimes encoded in directory
      structure (roughly analogous to relations)
    - more often implicit in application code

3
Structuring Persistent Data
  • Huge amounts of data are moving into digital
    formats (e.g., digital libraries)
  • Defining suitable models for persistent data is
    important: the earlier, the better
  • Models must be flexible and extensible
  • Should support safe sharing of data across
    applications and across distributed computing
    environments
  • Good performance also important

4
Impedance mismatch
  • Problem: popular persistent data formats don't
    look much like popular programming-language
    models
  • Persistent data: no pointers, no object identity,
    weak referential integrity, no garbage
    collection, no type checking
  • Are these features important only for volatile data?

5
Effect on Programs
  • Program reads file of persistent data
  • Creates convenient volatile in-memory data
    structures using parsing routines
  • Data manipulated in volatile form
  • Explicitly saved by converting back to
    persistent format (unparsing)
  • Extra parsing/unparsing code with no support
    for correctness
  • No fine-grained, concurrent sharing
  • No pointers, no garbage collection
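The read/parse/manipulate/unparse cycle above can be sketched in a few lines. This is a hypothetical example (the `Doc` record and its comma-separated file format are assumptions, not from the slides); it shows the hand-written conversion code the slide complains about, where nothing checks that `unparse` stays the inverse of `parse` and the owner is stored as a string rather than a pointer.

```python
# Hypothetical flat-file format: one "name,owner" record per line.
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    owner: str  # just a string: no object identity, no referential integrity


def parse(text: str) -> list:
    """Hand-written parsing: file text -> volatile in-memory objects."""
    docs = []
    for line in text.splitlines():
        if line:
            # A name containing ',' breaks this format; nothing enforces it.
            name, owner = line.split(",")
            docs.append(Doc(name, owner))
    return docs


def unparse(docs: list) -> str:
    """Hand-written unparsing: objects -> file text.
    Nothing checks that this remains the inverse of parse()."""
    return "".join(f"{d.name},{d.owner}\n" for d in docs)


text = "report,alice\nslides,bob\n"
docs = parse(text)        # read + parse into volatile form
docs[1].owner = "carol"   # manipulate in volatile form
saved = unparse(docs)     # explicit save back to persistent format
```

Every persistent data structure needs its own pair of such routines, which is the cost orthogonal persistence aims to remove.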

6
Orthogonal Persistence
  • Idea: write the application in any language you like
    (e.g., Java)
  • Objects manipulated by the program are transparently
    persistent or volatile
  • Persistence is defined by reachability from a root,
    not by type or explicit annotation
  • Result: persistence for free; lower-cost software
    development; more robust code

7
Thor
  • Provides a standard single-machine programming
    model, but supports distributed persistent data
    transparently:
    - persistent objects with semantics
    - rich type system (Java)
    - referential integrity
    - garbage collection
    - distributed storage and caching
    - sequential consistency: hides concurrent
      access and failures
    - heterogeneous language support

8
Thor architecture
  • Front ends (FEs) do computation, cache objects, and
    provide the application interface to persistence
  • Object repositories (ORs) provide persistent
    storage of objects

[Diagram: several clients, each with its own FE; the FEs share a set of ORs]
9
Programming model
  • Each FE caches part of object universe
  • Objects automatically fetched as needed
  • 64-bit persistent object ids; 32-bit in-memory
    pointers
  • Safe languages supported: Java, Theta

[Figure: an FE caching up to 2^31 objects out of an object universe of 2^64 objects]
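A minimal sketch of the FE programming model described above, assuming hypothetical `ObjectRepository` and `FrontEnd` classes (not Thor's implementation): the FE keeps a cache indexed by small local indices, standing in for 32-bit in-memory pointers, and fetches an object from the OR by its 64-bit persistent id the first time the application touches it. Eviction is deliberately left out.

```python
# Sketch only: class and method names are hypothetical, not Thor's API.

class ObjectRepository:
    """Stores persistent objects keyed by 64-bit oid."""

    def __init__(self):
        self.store = {}    # oid -> dict of fields
        self.fetches = 0   # number of fetches served to FEs

    def fetch(self, oid):
        self.fetches += 1
        return dict(self.store[oid])  # ship a copy to the FE


class FrontEnd:
    MAX_LOCAL = 2**31  # assumed limit on locally cached objects

    def __init__(self, orepo):
        self.orepo = orepo
        self.local = []      # local index (stand-in for 32-bit pointer) -> copy
        self.index_of = {}   # 64-bit oid -> local index

    def get(self, oid):
        """Return the cached copy of oid, fetching it from the OR on a miss."""
        assert 0 <= oid < 2**64, "persistent ids are 64-bit"
        if oid not in self.index_of:
            if len(self.local) >= self.MAX_LOCAL:
                raise MemoryError("FE cache full; eviction not sketched")
            self.index_of[oid] = len(self.local)
            self.local.append(self.orepo.fetch(oid))
        return self.local[self.index_of[oid]]


orepo = ObjectRepository()
orepo.store[0x1_0000_0001] = {"name": "root"}
fe = FrontEnd(orepo)
fe.get(0x1_0000_0001)  # miss: fetched from the OR
fe.get(0x1_0000_0001)  # hit: served from the FE cache
```

The application only ever sees cached copies; whether an object was already local is invisible to it, which is what "automatically fetched as needed" means here.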
10
Veneers
  • Applications may be written in an unsafe language (C, C++)
  • Object operations invoked via a veneer:
    - automatically generated stubs for the application language
  • Reflective object system:
    - objects point to their own implementations
    - implementations are objects in the OR
    - interfaces can be discovered dynamically

[Diagram: client (unsafe language), veneer, shared memory/pipe, FE (safe language), ORs]
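The veneer idea can be illustrated with a hypothetical sketch in which all names (`Account`, `FrontEnd`, `Stub`) are assumptions, and a direct method call stands in for the shared-memory/pipe channel. The client holds only an opaque handle; a stub is generated at run time from the interface the FE reports through reflection, and every call is checked against that interface before the FE runs it.

```python
# Sketch only: names are hypothetical; a direct call replaces the real channel.

class Account:
    """An object living inside the (safe-language) FE."""

    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance


class FrontEnd:
    def __init__(self):
        self.handles = {}  # opaque handle -> FE object

    def register(self, obj):
        h = len(self.handles)
        self.handles[h] = obj
        return h

    def interface(self, handle):
        """Reflection: list the public methods of the object's class."""
        cls = type(self.handles[handle])
        return sorted(n for n, v in vars(cls).items()
                      if callable(v) and not n.startswith("_"))

    def invoke(self, handle, method, args):
        """Run a method only if it belongs to the object's interface."""
        if method not in self.interface(handle):
            raise AttributeError(method)
        return getattr(self.handles[handle], method)(*args)


class Stub:
    """Generated at run time from the interface the FE reports."""

    def __init__(self, fe, handle):
        self._fe, self._handle = fe, handle
        for name in fe.interface(handle):
            setattr(self, name, self._make(name))

    def _make(self, name):
        return lambda *args: self._fe.invoke(self._handle, name, args)


fe = FrontEnd()
acct = Stub(fe, fe.register(Account(100)))
```

Because the client never holds a raw pointer into FE memory, a bug in unsafe client code cannot corrupt persistent objects directly.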
11
Object References
[Diagram: clients above FEs holding cached object copies, above ORs holding persistent objects; labels: surrogate object (node marking), unswizzled pointer (edge marking), intra-node reference (32-bit), inter-node reference (64-bit, via forwarding object)]
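Lazy pointer swizzling with edge marking, one of the two schemes labeled in the diagram, can be sketched as follows. The layout is assumed, not Thor's: a reference field holds either an `Oid` (unswizzled) or the cached object itself (swizzled), and an `Oid` pairs an OR number with a 32-bit local name to form a 64-bit id. The sketch ignores the surrogate/forwarding-object indirection Thor uses for inter-OR references.

```python
# Sketch only: Oid layout and class names are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Oid:
    or_num: int  # which OR (upper 32 bits)
    local: int   # name within that OR (lower 32 bits)

    def as_int(self) -> int:
        return (self.or_num << 32) | self.local


class Cached:
    def __init__(self, oid, fields):
        self.oid = oid
        self.fields = fields  # name -> plain value, Oid, or Cached


class FE:
    def __init__(self, ors):
        self.ors = ors    # or_num -> {local name: fields}
        self.cache = {}   # Oid -> Cached
        self.fetches = 0

    def lookup(self, oid):
        if oid not in self.cache:
            self.fetches += 1
            fields = dict(self.ors[oid.or_num][oid.local])
            self.cache[oid] = Cached(oid, fields)
        return self.cache[oid]

    def deref(self, obj, field):
        """Follow a reference, swizzling the edge the first time it is used."""
        ref = obj.fields[field]
        if isinstance(ref, Oid):        # unswizzled edge
            ref = self.lookup(ref)
            obj.fields[field] = ref     # overwrite with a direct pointer
        return ref


a, b = Oid(1, 7), Oid(2, 3)             # b lives on a different OR
ors = {1: {7: {"next": b}}, 2: {3: {"val": 42}}}
fe = FE(ors)
root = fe.lookup(a)
```

With edge marking, only the individual reference is marked as swizzled; node marking would instead swizzle to a surrogate node that stands in for the target object.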
12
Transactions
  • Computation at the FE is broken up into transactions
    separated by checkpoints
  • Each transaction is committed atomically at all
    participating ORs via two-phase commit

[Diagram: client and FE committing a transaction to several ORs]
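A minimal two-phase commit over the ORs a transaction touched might look like the sketch below. The `Participant` class and its methods are hypothetical, and the sketch omits what makes the real protocol robust: durable logging of votes and decisions, and recovery from coordinator or participant failures.

```python
# Sketch only: no stable logging, no failure recovery.

class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes  # stands in for real validation
        self.state = "idle"
        self.data = {}
        self.pending = {}

    def prepare(self, writes):
        """Phase 1: validate and stage the writes, then vote."""
        if not self.will_vote_yes:
            self.state = "aborted"
            return False
        self.pending = dict(writes)
        self.state = "prepared"
        return True

    def commit(self):
        """Phase 2 (commit): install the staged writes."""
        self.data.update(self.pending)
        self.pending = {}
        self.state = "committed"

    def abort(self):
        """Phase 2 (abort): discard the staged writes."""
        self.pending = {}
        self.state = "aborted"


def two_phase_commit(writes_by_participant):
    """writes_by_participant: {Participant: {key: value}}.
    Returns True iff every participant voted yes and the commit happened."""
    parts = list(writes_by_participant)
    votes = [p.prepare(writes_by_participant[p]) for p in parts]  # phase 1
    if all(votes):
        for p in parts:
            p.commit()                                            # phase 2
        return True
    for p in parts:
        p.abort()
    return False
```

Either every OR installs the transaction's writes or none does, which is the atomicity the slide refers to.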
13
Persistence by Reachability
  • An OR has a root object that is
    - always persistent
    - always reachable
    - a lightweight directory
  • Any object reachable from the root becomes persistent
    at transaction commit
  • No explicit declaration of persistence needed
  • No type distinction between persistent and
    volatile objects: orthogonal persistence
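Persistence by reachability amounts to a graph traversal at commit time. In this minimal sketch, which assumes a hypothetical `Obj` class with a `refs` list and a Python set standing in for the OR's contents, everything reachable from the root that is not yet persistent becomes persistent. A real system would avoid rescanning the whole graph, for example by starting only from objects modified in the transaction.

```python
# Sketch only: Obj and the set-based "OR contents" are illustrative assumptions.

class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []  # outgoing object references


def reachable(root):
    """All objects reachable from root (iterative DFS; cycles are fine)."""
    seen, stack = {root}, [root]
    while stack:
        for ref in stack.pop().refs:
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return seen


def commit(root, persistent):
    """At commit, every object reachable from root becomes persistent.
    Returns the set of newly persistent objects."""
    newly = reachable(root) - persistent
    persistent.update(newly)
    return newly


root, doc, scratch = Obj("root"), Obj("doc"), Obj("scratch")
root.refs.append(doc)    # doc hangs off the root -> becomes persistent
doc.refs.append(root)    # a cycle back to the root is harmless
persistent = {root}      # the root is always persistent
made = commit(root, persistent)
```

Here `scratch` stays volatile only because nothing reachable from the root points to it; its type plays no role.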

14
  • Convenient programming model, strong semantic
    guarantees, and high performance too?

15
Performance
  • Performance comparison: OO7 benchmark
    - the most generally accepted object-oriented database
      benchmark
    - similar to a CAD database: a good model
    - mixture of very small and large objects (4W-32K)
    - various recursive traversals (with and without
      modification) of a complex pointer structure
    - must run in a fixed amount of memory (so that
      only a fraction of the database can fit in memory)

16
Implementation options
  • Relational database
  • Conventional file system with read/write
  • Conventional file system with memory-mapped files
  • Object-oriented database
  • Distributed object-oriented database (Thor)

17
Using Relational Database
[Figure labeled "15 levels"]
  • Problem: relational databases don't implement
    pointers (object references) efficiently
  • Must introduce extra keys and use an index to find
    the appropriate records: extra storage, locality
    problems
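The contrast can be made concrete with a hypothetical schema: following an object reference is a single dereference, while the relational encoding replaces the reference with an extra key column and needs an index probe for every hop. Both functions below walk the same three-element chain.

```python
# Sketch only: the Part class and the (pid, next_pid) rows are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Part:                       # object model: direct reference
    pid: int
    next: Optional["Part"] = None


def chain_length_objects(head):
    n = 0
    while head is not None:
        n += 1
        head = head.next          # one pointer dereference per hop
    return n


# Relational model: rows (pid, next_pid) plus an index on pid.
rows = [(1, 2), (2, 3), (3, None)]
index = {row[0]: row for row in rows}


def chain_length_rows(start_pid, index):
    n, pid = 0, start_pid
    while pid is not None:
        n += 1
        _, pid = index[pid]       # extra key + index probe per hop
    return n


p1 = Part(1, Part(2, Part(3)))
```

On disk, the index is itself extra storage, and the rows it points to need not be near each other, which is where the locality problems come from.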

18
Memory-mapped files
  • Memory-mapped files (mmap) avoid data
    duplication between the application and the OS file
    buffer cache
    - buffer cache memory is mapped directly into the
      application's virtual memory
  • Conventional file I/O uses twice the memory, so it can
    cache only half as much persistent data in
    memory

[Diagram: application (volatile data) above the OS kernel (buffer cache)]
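As a small illustration using Python's standard `mmap` and `struct` modules (not the OO7 code from the slides), the sketch below maps a file of fixed-size records and updates one record in place through the mapping. No separate application-side copy is made with `read()`/`write()`. The 16-byte `(id, value)` record layout is an assumption made for the example.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<qq")  # hypothetical record: (id, value), 16 bytes

path = os.path.join(tempfile.mkdtemp(), "objects.db")
with open(path, "wb") as f:
    f.write(bytes(RECORD.size * 4))  # four zeroed records

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:               # length 0: map whole file
        RECORD.pack_into(mm, 2 * RECORD.size, 7, 99)   # update record 2 in place
        mm.flush()                                     # write dirty pages back

# Read back through ordinary file I/O to confirm the update reached the file.
with open(path, "rb") as f:
    data = f.read()
rec2 = RECORD.unpack_from(data, 2 * RECORD.size)
```

The in-place update goes straight into the pages the kernel already caches for the file, which is why memory-mapped files avoid the double buffering described above.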
19
Relative Performance for OO7
[Figure: relative OO7 performance of non-distributed object databases, memory-mapped files, simple file I/O, Thor, object-relational databases, and relational databases]
20
Relative Performance
  • Object data in OO7 does not fit in memory, so
    fetches of persistent data into memory dominate
    performance
  • The system with the fewest fetches wins

21
OO7 in C, memory-mapped
  • C/OS application implementing the OO7 benchmark
  • Objects in a memory-mapped file
  • close() on the file flushes memory to disk
  • Weak semantic guarantees:
    - no concurrency control
    - no array bounds checks
    - no support for failure during write

22
Traversals
  • Sparse vs. dense traversals
    - dense traversals use every page of disk storage
      effectively (unrealistic) (91%)
    - sparse traversals touch only a few objects on
      each page (3%)
    - realistic bound (TN92): 15-41% hit rate per page
  • Read-only vs. read-write traversals
    - read-write traversals accumulate changes that
      must be written back to disk

23
Thor vs. C/mmap (dense)
[Chart: time in seconds for Thor and C/mmap on traversals T2a and T2b; annotation "18MB"]
24
Dense read-only traversal
[Chart: time in seconds vs. FE cache size (10-50 MB) for Thor and C/mmap; annotations: 25% speedup, 15% speedup, 40% slowdown]
25
Other traversals
  • C/OS does best on unrealistically dense
    traversals
  • On sparse traversals, Thor achieves up to 1000%
    relative performance
  • The C NFS server was given much more memory than
    the Thor OR server (137MB vs. 36MB)

26
Conclusion
  • File systems are obsolete: they provide
    sub-optimal performance and an even worse
    interface for programmers writing applications

27
Thor vs. Quickstore
  • Quickstore (commercial object-oriented database)
    has the best published performance results of any
    OODB
  • Not a distributed system
  • Built on memory-mapped files: uses page-based
    memory management

28
Results
  • Number of fetches:

                   sparse    dense
      Thor            506    10.2k
      Quickstore      610    13.2k

  • Thor has 21-25% fewer fetches
  • No Quickstore results for medium-sized
    traversals, which are even more advantageous for Thor
  • Conclusion: object caching beats page caching
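Why object caching can need fewer fetches than page caching shows up in a toy simulation. This is not HAC or Quickstore, and every parameter is made up. A client cache with room for 50 objects is managed either as whole pages of 10 objects (so only 5 pages fit) or as individual objects. A sparse traversal then uses one object on each of 40 pages, twice over. Every miss costs one fetch from the server, and both caches use LRU replacement.

```python
from collections import OrderedDict

OBJS_PER_PAGE = 10  # illustrative page size, in objects


class LRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # mark most recently used
            return
        self.misses += 1                     # miss -> one fetch from server
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used


def run(workload, slots):
    """Return (page-cache fetches, object-cache fetches) for the same memory."""
    page_cache = LRU(slots // OBJS_PER_PAGE)  # whole pages occupy slots
    object_cache = LRU(slots)                 # only used objects occupy slots
    for obj in workload:
        page_cache.access(obj // OBJS_PER_PAGE)
        object_cache.access(obj)
    return page_cache.misses, object_cache.misses


# Sparse traversal: object 0 of each of 40 pages, performed twice.
hot = [page * OBJS_PER_PAGE for page in range(40)]
page_fetches, object_fetches = run(hot * 2, slots=50)
```

The 40 hot objects fit in the object cache, so the second pass hits every time. Their 40 pages do not fit in 5 page frames, so the page cache misses on every access. The result is 80 fetches for the page cache versus 40 for the object cache. With dense traversals the gap closes, consistent with C/OS doing best on dense traversals.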

29
Front end features
  • Object storage managed by Hybrid Adaptive Caching
    (HAC) algorithm
  • CLOCC optimistic concurrency control algorithm
    provides sequential consistency and the best performance
  • Techniques may be applicable to more conventional
    databases
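The core of optimistic concurrency control can be sketched with generic backward validation. This is not CLOCC itself, which adds client caching and invalidation messages that are omitted here. Transactions run without locks and record what they read. At commit, the server aborts a transaction if any transaction that committed after it began wrote something it read. `Server` and `Txn` are hypothetical names.

```python
# Sketch of generic OCC with backward validation; not the CLOCC protocol.

class Server:
    def __init__(self):
        self.data = {}
        self.version = 0  # number of committed transactions
        self.log = []     # (commit version, set of keys written)

    def begin(self):
        return Txn(self, self.version)

    def validate_and_commit(self, txn):
        for ver, written in self.log:
            if ver > txn.start and written & txn.reads:
                return False  # read a value overwritten since txn began: abort
        self.version += 1
        self.data.update(txn.writes)
        self.log.append((self.version, set(txn.writes)))
        return True


class Txn:
    def __init__(self, server, start):
        self.server, self.start = server, start
        self.reads, self.writes = set(), {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.server.data.get(key, 0))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.server.validate_and_commit(self)


srv = Server()
t1, t2 = srv.begin(), srv.begin()
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 10)
ok1 = t1.commit()  # first committer wins
ok2 = t2.commit()  # read a stale x, so validation fails
```

No locks are taken during execution. Conflicts are detected only at commit, which pays off when conflicts are rare.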

30
Object repository
  • Server cache speeds up client fetches
  • Modified Object Buffer (MOB):
    - keeps track of object modifications separately from the cache
    - defers writes until necessary
    - reduces installation reads, allows write
      absorption

[Diagram: OR server components: page cache (read), log, MOB (commit, abort), flusher]
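The MOB's benefits can be sketched under simplifying assumptions. The names and page layout are hypothetical, and the stable log a real OR writes before acknowledging a commit is omitted. Committed modifications go into the MOB rather than straight into their pages. A later flush installs them page by page, so repeated updates to one object are absorbed and each dirty page costs a single installation read and write.

```python
from collections import defaultdict


class Disk:
    def __init__(self, pages):
        self.pages = pages  # page id -> {oid: value}
        self.reads = self.writes = 0

    def read(self, pid):
        self.reads += 1
        return dict(self.pages[pid])

    def write(self, pid, page):
        self.writes += 1
        self.pages[pid] = page


class MOB:
    def __init__(self, disk, page_of):
        self.disk, self.page_of = disk, page_of  # page_of: oid -> page id
        self.mods = {}                           # oid -> latest committed value

    def commit(self, writes):
        self.mods.update(writes)  # a newer write absorbs an older one

    def flush(self):
        """Install buffered modifications, one read and one write per dirty page."""
        by_page = defaultdict(dict)
        for oid, val in self.mods.items():
            by_page[self.page_of[oid]][oid] = val
        for pid, objs in by_page.items():
            page = self.disk.read(pid)  # installation read
            page.update(objs)
            self.disk.write(pid, page)
        self.mods.clear()


disk = Disk({0: {"a": 0, "b": 0}, 1: {"c": 0}})
mob = MOB(disk, {"a": 0, "b": 0, "c": 1})
for i in range(100):
    mob.commit({"a": i})  # 100 committed updates to the same object
mob.commit({"b": 5})
mob.flush()
```

In this run, 101 committed modifications reach the disk as one page read plus one page write, and the untouched page 1 is never read.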
31
More OR features
  • Replicated ORs (log stability via replication)
  • Referential integrity:
    - object mobility (multiple oids per object),
      supported through OR surrogate objects and lazy
      forwarding
    - no centralized location service
    - distributed GC algorithm collects cycles
      efficiently

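Object mobility with lazy forwarding can be illustrated with a hypothetical sketch; the `OR`, `Surrogate`, and `resolve` names are assumptions. When an object moves, its old OR keeps a surrogate naming the new location. A reference is updated only when someone actually follows it, so no centralized location service is consulted.

```python
# Sketch only: classes and the [or, name] reference format are illustrative.

class Surrogate:
    """Left behind at the old location; names where the object went."""

    def __init__(self, new_or, new_name):
        self.new_or, self.new_name = new_or, new_name


class OR:
    def __init__(self, ident):
        self.ident = ident
        self.objects = {}  # local name -> object or Surrogate

    def move(self, name, dest, new_name):
        """Move an object to another OR, leaving a forwarding surrogate."""
        dest.objects[new_name] = self.objects[name]
        self.objects[name] = Surrogate(dest, new_name)


def resolve(ref):
    """ref is a mutable [or, name] pair. Follow surrogates to the object and
    overwrite ref with the final location (lazy forwarding).
    Returns (object, number of surrogate hops taken)."""
    or_, name = ref
    hops = 0
    while isinstance(or_.objects[name], Surrogate):
        s = or_.objects[name]
        or_, name = s.new_or, s.new_name
        hops += 1
    ref[0], ref[1] = or_, name  # later lookups go straight to the object
    return or_.objects[name], hops


or1, or2, or3 = OR(1), OR(2), OR(3)
or1.objects["x"] = "payload"
or1.move("x", or2, "x2")
or2.move("x2", or3, "x3")  # the object has now moved twice
ref = [or1, "x"]
obj, hops = resolve(ref)
```

The object now has several names (one per location it has occupied), which is the "multiple oids per object" mentioned on the slide. Stale references still work, and each one is repaired the first time it is used.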
32
Other Issues
  • Queries are not directly supported in standard PLs or
    in Thor
    - can be coded using conventional data structures,
      but can high-performance queries be achieved?
    - may require moving code to data (function
      shipping); Thor's model is data shipping
    - relational databases are not obsolete
  • Schema evolution: how to handle changes to
    software and data objects?
  • Disconnected operation/long transactions

33
Reading
  • Providing Persistent Objects in Distributed
    Systems (ECOOP 99)
  • Hybrid Adaptive Caching for Distributed Storage
    Systems (SOSP 97)
  • Safe and Efficient Sharing of Persistent Objects
    in Thor (SIGMOD 96)
  • The Language-Independent Interface of the Thor
    Persistent Object System (in Object-Oriented
    Multidatabase Systems)