Title: Client-server caching and object stores
1Client-server caching and object stores
- Benjamin Atkin
- batkin_at_cs.cornell.edu
2Client-server database design
- Low-level considerations
- How can database systems exploit powerful client machines?
- What implementation techniques are required?
- High-level considerations
- What interface is provided to applications?
- How can we efficiently implement it?
3Overview
- Client-server systems
- Advantages of caching
- Object-oriented databases
- Wisconsin's Exodus storage manager
- Cache consistency and transactions
- Implementation of programming interface: QuickStore
4Client-server systems
- Simplify the client machines
- Share services: filesystem, database, ...
- Run on powerful, dedicated hosts
- User machines are "clients" of servers
- enables data sharing
- centralised maintenance
- greater security
- e.g. Sun's Network File System
5Networks of workstations
- c. 1990: more powerful clients
- Move some processing to clients
- faster response time
- better utilisation of client machines
- less load on server
- greater scalability
- autonomy in the face of server failure
6Naive client-server data access
[Diagram: two successive "read blue object" requests from the client]
7Client-server caching
[Diagram: two successive "read blue object" requests from the client]
8Caching principles
- Analogous to hardware caching
- Server stores the canonical copy of data
- Client caches the results of each read
- Subsequent accesses served from cache
- What if the data changes? Alternatives:
- "cheap to detect incorrect data", e.g. DNS
- "validate before use"
- "notify on change"
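The last two alternatives can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the protocol of any particular system: the class names (`Server`, `ValidatingCache`, `NotifiedCache`) and the per-object version counter are hypothetical, and the "cheap" version check is modelled as a direct attribute lookup rather than a network message.

```python
class Server:
    """Holds the canonical copy of each object plus a version counter."""

    def __init__(self):
        self.data = {}       # key -> value
        self.version = {}    # key -> int, bumped on every write
        self.reads = 0       # number of full reads served
        self.watchers = {}   # key -> set of caches to notify on change

    def read(self, key, watcher=None):
        self.reads += 1
        if watcher is not None:
            self.watchers.setdefault(key, set()).add(watcher)
        return self.data[key], self.version[key]

    def write(self, key, value):
        self.data[key] = value
        self.version[key] = self.version.get(key, 0) + 1
        # "Notify on change": tell every registered cache to drop its copy.
        for cache in self.watchers.pop(key, set()):
            cache.invalidate(key)


class ValidatingCache:
    """'Validate before use': compare versions with the server on each access."""

    def __init__(self, server):
        self.server = server
        self.entries = {}  # key -> (value, version)

    def get(self, key):
        entry = self.entries.get(key)
        # Stand-in for a small "is version v still current?" message.
        if entry is not None and entry[1] == self.server.version[key]:
            return entry[0]
        self.entries[key] = self.server.read(key)
        return self.entries[key][0]


class NotifiedCache:
    """'Notify on change': trust the cached copy until the server invalidates it."""

    def __init__(self, server):
        self.server = server
        self.entries = {}  # key -> value

    def invalidate(self, key):
        self.entries.pop(key, None)

    def get(self, key):
        if key not in self.entries:
            value, _ = self.server.read(key, watcher=self)
            self.entries[key] = value
        return self.entries[key]
```

In both variants a repeated read of an unchanged object costs no full read at the server; they differ in who does the work when the data changes (the reader on every access, or the server on every write).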
9Caching in distributed file systems
- CMU's Andrew File System
- clients cache all files on local disks
- 50 client machines for each server
- UC Berkeley's Sprite OS
- file cache competes with virtual memory
- Coda (follow-up to AFS), UCLA's Ficus
- client can completely disconnect from server
- prediction algorithms to determine what to cache
10Disadvantages of caching
- Increases client workload, complexity
- We may cache the wrong data!
- potentially wasted network traffic
- uses valuable space in the cache
- Data consistency problem
- stale cached data
- simultaneous writeback
11Client-server caching revisited
Michael J. Franklin and Michael J. Carey
12Dividing the work
- Query shipping
- clients send queries to the server
- Data shipping
- clients request data from server
- transactions run locally
- potential for caching
13Why cache data?
- A client may
- read a data object repeatedly
- read and write an object
- execute multiple transactions on an object
- Cache an object and execute transactions locally
- Write back final value on commit
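A minimal sketch of this idea, assuming a single client and no concurrency control (so the consistency problem is deliberately ignored here). `ObjectServer`, `CachingClient`, and the `requests` counter are hypothetical names used only for illustration.

```python
class ObjectServer:
    """Canonical store; counts the messages it receives from clients."""

    def __init__(self, objects):
        self.objects = dict(objects)
        self.requests = 0

    def fetch(self, name):
        self.requests += 1
        return self.objects[name]

    def store(self, updates):
        self.requests += 1
        self.objects.update(updates)


class CachingClient:
    """Runs transactions against a local cache and writes back on commit."""

    def __init__(self, server):
        self.server = server
        self.cache = {}     # name -> locally cached value
        self.dirty = set()  # names modified since the last commit

    def read(self, name):
        if name not in self.cache:           # only a cache miss goes to the server
            self.cache[name] = self.server.fetch(name)
        return self.cache[name]

    def write(self, name, value):
        self.cache[name] = value             # update the cached copy only
        self.dirty.add(name)

    def commit(self):
        if self.dirty:                       # one message carries the final values
            self.server.store({n: self.cache[n] for n in self.dirty})
            self.dirty.clear()
```

However many times a transaction reads and writes an object, the server sees one fetch and one store; a later transaction on the same client can reuse the cached copy without contacting the server at all.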
14Database client caching
client
server
begin transaction
read A
cache A
write A
end transaction
store A
begin transaction
read A
read B
...
15The downside
- Introduces a consistency problem
- Increases work at client
- Slower under some conditions
- Potentially higher abort rates ...
- ...
16Caching in EXODUS
- Small objects are grouped in fixed-size disk pages
- Caching and locking at the page level
- Client has buffer manager, lock manager
- Franklin & Livny investigate the best strategy for caching with transactions
17Alternatives for caching
- Intra-transaction versus inter-transaction caching
- Caching locks as well as data
- Local versus global locking
- Optimistic versus centralised locking
- Invalidation versus propagation of updates
18What to do on writeback?
begin transaction ... fetch blue object
19What to do on writeback?
commit transaction
propagate or invalidate?
?
20A taxonomy of strategies
- Primary-copy server 2PL
- Caching 2PL
- no lock caching, validate data before use
- Optimistic 2PL variants
- O2PL-Dynamic, O2PL-New Dynamic
- Callback locking
21Optimistic 2PL
- During a transaction, acquire local locks
- At commit, validate with server
- Propagation variant requires 2PC
- Dynamic variant's propagation heuristic
- page is resident at receiving site
- accessed since last propagation of page
- previously invalidated this page incorrectly
22Callback locking
- Global locks required during transaction
- On lock conflict, server callback to revoke other locks
- No validation required on commit
- CB-Read: cache only read locks
- CB-All: cache write locks as well, lock downgrade on conflict
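The CB-Read flow above can be sketched as follows. This is a simplified illustration, not the EXODUS implementation: it assumes every callback succeeds immediately (a real client may have to finish a transaction that is using the page first), and the names `CallbackServer`, `Client`, and `callbacks_sent` are hypothetical.

```python
class CallbackServer:
    """Tracks which clients hold a cached read lock on each page."""

    def __init__(self):
        self.readers = {}        # page -> set of clients with a cached read lock
        self.callbacks_sent = 0

    def read_lock(self, client, page):
        self.readers.setdefault(page, set()).add(client)

    def write_lock(self, client, page):
        # Lock conflict: call back every other client caching this page.
        for other in list(self.readers.get(page, set())):
            if other is not client:
                self.callbacks_sent += 1
                other.callback(page)
        self.readers[page] = {client}


class Client:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.cached = set()      # pages whose read lock is cached locally

    def read(self, page):
        if page not in self.cached:          # a cached lock needs no server round trip
            self.server.read_lock(self, page)
            self.cached.add(page)

    def write(self, page):
        self.server.write_lock(self, page)   # write locks are always obtained globally
        self.cached.add(page)

    def callback(self, page):
        self.cached.discard(page)            # give up the cached lock and the page
```

Because a writer cannot proceed until all other cached copies are revoked, a client that still holds a cached lock knows its copy is valid, which is why no validation step is needed at commit.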
23Experiments
- Vary data access patterns
- Vary bottlenecks in the system
24HOTCOLD workload, slow network
25FEED workload, slow network
26HICON workload, fast network
27Summary
- CB-Read, O2PL-ND come out best
- CB-Read implemented in EXODUS
- lower abort rate than O2PL-ND
- scales better with data contention
- Natural consequences of the optimistic approach?
28QuickStore: a high-performance mapped object store
Seth J. White and David J. DeWitt
29Object-oriented versus object-relational DBs
- Distributed application support
- Persistent store for program data
- Access through programming language (C++), not SQL
- Transactions over objects
30The programming interface
- Application manipulates object identifiers (OIDs)
- "Swizzling" resolves OID to the object
- hardware swizzling: OID is a pointer, use VM manipulations to do mapping
- software swizzling: OID contains a pointer, indirection
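A minimal sketch of the software-swizzling side, assuming a reference object that carries the OID and, once resolved, a direct pointer to the in-memory object. `ObjectStore`, `Session`, and `Ref` are hypothetical names; this is not the interface of E or QuickStore. Hardware swizzling is not shown, since it relies on virtual-memory protection rather than an explicit check.

```python
class ObjectStore:
    """Stand-in for the persistent store: maps OIDs to stored objects."""

    def __init__(self, objects):
        self.objects = objects   # oid -> dict of fields
        self.loads = 0

    def load(self, oid):
        self.loads += 1
        return dict(self.objects[oid])


class Session:
    """Keeps one in-memory copy per OID so all references agree."""

    def __init__(self, store):
        self.store = store
        self.resident = {}       # oid -> in-memory object

    def resolve(self, oid):
        if oid not in self.resident:
            self.resident[oid] = self.store.load(oid)
        return self.resident[oid]


class Ref:
    """Software-swizzled reference: an OID plus a lazily filled pointer."""

    def __init__(self, session, oid):
        self.session = session
        self.oid = oid
        self.target = None       # unswizzled until first dereference

    def deref(self):
        # The explicit check on every dereference is the indirection cost.
        if self.target is None:
            self.target = self.session.resolve(self.oid)
        return self.target
```

Two references to the same OID resolve to the same in-memory object, and the store is consulted only on the first dereference.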
31Design alternatives
- White & DeWitt compare QuickStore, E
- QuickStore uses hardware swizzling
- E uses software swizzling, interpreter
- Both extend C++, over EXODUS storage manager (ESM)
- All objects are accessed in transactions
32QuickStore structure
client
frame A
page a
buffer pool
object store
ESM
33Fine points of pointer swizzling
- Complication: objects can contain pointers to other objects
- When a page is mapped to a frame
- if a pointer in page points to a mapped page, make it point to the correct frame
- otherwise, make it point to a new frame
- Use page protection to catch accesses to non-mapped pages (Unix mmap)
34Page faults
- Page frames in memory have protection bits: read-only, no access, etc.
- Incorrect access generates a "fault"
- Protection faults can be handled by the application itself
- In QS, reference to no-access frame => bring the page from the object server
35QuickStore page faults
client
0x180
?
fault
object store
36The mapping procedure
- Fault on a pointer dereference
- Request a page from the server
- Load into buffer pool
- Rewrite pointers in page
- Map buffer slot to required frame
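The five steps can be simulated without real virtual memory. In this sketch an explicit "is the frame loaded?" check stands in for the hardware protection fault, frames are plain integers, and on-disk pointers are `("ptr", page_id, slot)` tuples; all of these, along with the names `PageServer` and `MappedClient`, are assumptions made for illustration.

```python
class PageServer:
    def __init__(self, pages):
        # page id -> list of slots; a slot is a value or ("ptr", page_id, slot)
        self.pages = pages

    def fetch(self, page_id):
        return list(self.pages[page_id])


class MappedClient:
    def __init__(self, server):
        self.server = server
        self.frame_of = {}   # page id -> frame number (reserved, maybe not loaded)
        self.page_of = {}    # frame number -> page id
        self.loaded = {}     # frame number -> swizzled slots (the "buffer pool")
        self.faults = 0

    def frame_for(self, page_id):
        """Return the frame reserved for a page, reserving a new one if needed."""
        if page_id not in self.frame_of:
            frame = len(self.frame_of)
            self.frame_of[page_id] = frame
            self.page_of[frame] = page_id
        return self.frame_of[page_id]

    def _handle_fault(self, frame):
        self.faults += 1
        raw = self.server.fetch(self.page_of[frame])    # request page from server
        swizzled = []
        for slot in raw:                                # rewrite pointers in page
            if isinstance(slot, tuple):
                _, target_page, target_slot = slot
                slot = ("vptr", self.frame_for(target_page), target_slot)
            swizzled.append(slot)
        self.loaded[frame] = swizzled                   # map buffer slot to frame

    def deref(self, frame, slot):
        if frame not in self.loaded:                    # stands in for the fault
            self._handle_fault(frame)
        return self.loaded[frame][slot]
```

Rewriting a pointer only reserves a frame for its target page; the target page itself is fetched later, the first time something dereferences into that frame.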
37The ESM buffer manager
- Limited buffer pool space available
- Frame-to-page mapping may need to be removed to reclaim a buffer slot
- Modified clock algorithm for page replacement
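For reference, the standard (unmodified) clock algorithm can be sketched as below. The slides do not say how ESM's version is modified, so this shows only the textbook second-chance scheme; `ClockBuffer` and its fields are hypothetical names.

```python
class ClockBuffer:
    """Fixed-size buffer pool with textbook clock (second-chance) replacement."""

    def __init__(self, capacity):
        self.slots = [None] * capacity        # page id held in each buffer slot
        self.referenced = [False] * capacity  # one reference bit per slot
        self.hand = 0                         # position of the clock hand
        self.where = {}                       # page id -> slot index

    def access(self, page):
        """Touch a page; return the page evicted to make room, or None."""
        if page in self.where:
            self.referenced[self.where[page]] = True
            return None
        # Sweep: clear reference bits until an empty or unreferenced slot is found.
        while self.slots[self.hand] is not None and self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.slots)
        victim = self.slots[self.hand]
        if victim is not None:
            del self.where[victim]            # the victim's mapping is removed
        self.slots[self.hand] = page
        self.referenced[self.hand] = True
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % len(self.slots)
        return victim
```

In a mapped store, evicting the victim is also the point at which its frame-to-page mapping has to be removed, so a later access to that frame faults again.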
38Optimisations
- Rewriting pointers is expensive
- Store pointers in disk pages
- Try to remap page to its previous frame
- Changing protection bits is expensive!
- Try to change many at a time
- Log optimisation with page diffs
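The idea behind logging page diffs can be sketched as follows, assuming the client keeps a copy of the page as it was before modification and compares it byte by byte at commit. The functions `page_diff` and `apply_diff` and the `(offset, bytes)` record format are illustrative assumptions, not QuickStore's actual log format.

```python
def page_diff(before, after):
    """Return (offset, new_bytes) runs covering every byte that changed."""
    if len(before) != len(after):
        raise ValueError("page images must have the same size")
    runs = []
    i, n = 0, len(before)
    while i < n:
        if before[i] == after[i]:
            i += 1
            continue
        start = i
        while i < n and before[i] != after[i]:   # extend the run of changed bytes
            i += 1
        runs.append((start, bytes(after[start:i])))
    return runs


def apply_diff(page, runs):
    """Redo: apply logged runs to an old page image to get the new one."""
    page = bytearray(page)
    for offset, data in runs:
        page[offset:offset + len(data)] = data
    return bytes(page)
```

When a transaction changes only a few bytes of a page, the log records total a few bytes plus offsets rather than a full page image.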
39Hardware versus software swizzling
- Page-level swizzling obscures object identity
- Pointers to deleted objects are still valid
- Using VM pointers allows a more compact OID
representation
40Comparison: the OO7 benchmark
- Parts database representative of "CAD/CAM/CASE application"
- Multiple possible database sizes
- Hierarchical structure, composite parts
- Benchmark operations specified
- traversals of parts tree
- queries which retrieve random parts
41Cold times, small database
42Hot times, small database
43Cold times, medium database
44QuickStore and E compared
- QuickStore is not necessarily better!
- E performs better with low locality
- Compact representation
- small database: 6.6MB versus 10.5MB
- medium database: 54.2MB versus 94.1MB
- Log optimisation reduces commit times