Title: From Boxwood to Eclipse
1. From Boxwood to Eclipse
2. A Quick Overview of Boxwood
- Virtualized distributed storage that provides high-level abstractions
- An evolution path for distributed storage
Figure (diagram label): Storage Applications
3. A Quick Overview of Boxwood
Figure (diagram labels): Storage Applications, Virtual Disk
4. A Quick Overview of Boxwood
Figure (diagram labels): Storage Applications, Tree, Table, List
5. Boxwood Architecture
Figure (layered stack, top to bottom):
- Storage Application
- B-Tree (high-level storage abstractions)
- Chunk Store
- Replicated Logical Device (reliable media)
- Magnetic Media
6. Open Question
- What happens if the system experiences massive failures?
- Graceful degradation (or mitigating the availability cliff)
7. Availability Cliff
Figure: availability (y-axis, from 1 down to 0) versus failures (x-axis, increasingly severe), with a curve labeled "Graceful Degradation".
8. Boxwood is Inherently NOT Gracefully Degradable
Figure (diagram labels): machine M; A, B, C, D, E; Paxos Service/Monitor Service; B-Tree
Dependencies are the main problem!
9. Eclipse: Gracefully Degradable Storage Abstractions
Image courtesy of Shirin Observatory and Science Center, MIT: http://web.mit.edu/taalebi/www/soscof/solarEclipse/images/eclipse8_jpg.jpg
10. Why Eclipse?
Figure: "System View" versus failures (increasingly severe), with a point labeled "current scenario".
11. Degraded Availability
- A client might have a partial view of data
- A client might be allowed to perform a subset of operations on data
- We are NOT talking about graceful performance degradation!
12. Benefits of Graceful Degradation
- Seamless disaster recovery from massive permanent failures
- During massive transient failures or network partitions:
  - Offer a partial system view until the system heals
  - Return to a consistent/complete state when the system heals
13. Eclipse Concepts
- Gracefully degradable storage abstractions
  - Sets or a collection of sets
  - A subset can be considered a degradation
- Gracefully degradable and self-restoring system architecture
  - Failure isolation: failure of one unit should not cause the information on other units to be inaccessible
  - Paxos for the self-restoring point, but operational even without Paxos
14. Is Set Abstraction Useful?
- A mail service, where each mailbox can be implemented as a set
- A data retention/backup system
- MSN Spaces
- An emerging trend: a flat structure with search capability to replace a hierarchical structure
15. Availability Cliff Revisited
Figure: availability (y-axis, from 1 down to 0) versus failures (x-axis, increasingly severe), with labels "Fault tolerance", "Fault isolation", and "Self-restoration".
16. Storing a Set
- Set elements are stored on multiple servers
- A local index is maintained for each locally stored element
- As long as a server is available, its local elements are accessible
- A global view can be constructed from local views
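The points above can be illustrated with a minimal sketch. It is not Eclipse's implementation: the `Server` class, the `global_view` function, and the mailbox data are hypothetical names chosen for illustration, and the sketch assumes a global view is simply the union of the local views of the servers that are currently reachable.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical storage server holding part of one or more sets."""
    name: str
    available: bool = True
    # Local index: set id -> {element id -> element payload}
    local_index: dict = field(default_factory=dict)

    def store(self, set_id, element_id, payload):
        self.local_index.setdefault(set_id, {})[element_id] = payload

    def local_view(self, set_id):
        """Return a copy of the locally stored elements of a set."""
        return dict(self.local_index.get(set_id, {}))


def global_view(servers, set_id):
    """Union of the local views of every reachable server.

    The result is complete only if all servers holding elements of the
    set are available; otherwise it is a subset (a degraded view).
    """
    view = {}
    for server in servers:
        if server.available:
            view.update(server.local_view(set_id))
    return view


servers = [Server("A"), Server("B"), Server("C")]
servers[0].store("inbox", "m1", "hello")
servers[1].store("inbox", "m2", "status report")
servers[2].store("inbox", "m3", "lunch?")

full = global_view(servers, "inbox")
servers[1].available = False      # simulate the failure of server B
degraded = global_view(servers, "inbox")
```

With server B down, `degraded` still contains the elements held by A and C. Under these assumptions, the failure of one server removes only that server's elements from the view.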
17. Global Index as Soft State
- Global index for each set
  - Can also maintain metadata for each element
  - Soft state, for performance improvements
  - Can support more complex data structures
- Map set id to the server maintaining the set's global index
  - Paxos maintains the authoritative mapping
  - Mapping disseminated to all servers as hints
  - Same set might have multiple index servers during massive failures
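One way to read this slide is sketched below. The slides do not describe the actual lookup protocol, so everything here is an assumption: `IndexServer`, `rebuild_index`, and `lookup` are hypothetical names, the Paxos-maintained mapping is modeled as a plain dictionary, and a lost global index is assumed to be recoverable by scanning the reachable local indexes.

```python
class IndexServer:
    """Hypothetical server holding the global index of some sets (soft state)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.global_index = {}   # set id -> set of element ids


def rebuild_index(local_indexes, set_id):
    """Recompute a set's global index from the reachable local indexes."""
    element_ids = set()
    for local in local_indexes:
        element_ids |= set(local.get(set_id, ()))
    return element_ids


def lookup(set_id, hints, authoritative, servers, local_indexes):
    """Resolve a set's element ids, trying the hint before the authority."""
    for mapping in (hints, authoritative):
        server = servers.get(mapping.get(set_id))
        if server is not None and server.available:
            if set_id not in server.global_index:
                # Soft state is missing: rebuild it instead of failing.
                server.global_index[set_id] = rebuild_index(local_indexes, set_id)
            return server.name, server.global_index[set_id]
    # No index server is reachable: answer directly from local indexes.
    return None, rebuild_index(local_indexes, set_id)


servers = {"X": IndexServer("X"), "Y": IndexServer("Y")}
local_indexes = [{"inbox": ["m1"]}, {"inbox": ["m2", "m3"]}]
authoritative = {"inbox": "Y"}   # stands in for the Paxos-maintained mapping
hints = {"inbox": "X"}           # possibly stale hint cached on a server

servers["X"].available = False   # the hinted index server has failed
used, ids = lookup("inbox", hints, authoritative, servers, local_indexes)

servers["Y"].available = False   # now no index server is reachable
fallback_used, fallback_ids = lookup(
    "inbox", hints, authoritative, servers, local_indexes
)
```

Because the index is soft state in this sketch, losing every index server costs performance (a scan of local indexes) but not access to the elements that are still reachable.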
18. Replication Strategies
- Limited operations during degraded mode (e.g., no updates)
- Optimistic replication: hard to figure out the stabilization point
- Inherently weak semantics: immutable elements, tolerance to re-appearance of deleted elements
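The third strategy can be illustrated with a small sketch. The slides do not say how replicas are reconciled, so the union-based `merge` function below is an assumption, as are the replica names and element ids. It shows why immutable elements make reconciliation simple and why a deleted element may re-appear after the system heals.

```python
def merge(replica_a, replica_b):
    """Reconcile two replicas of a set of immutable elements by union.

    Because elements never change, there are no conflicting versions to
    resolve. Deletions are not tracked, so an element deleted on one
    replica re-appears if the other replica still holds it.
    """
    return replica_a | replica_b


replica_a = frozenset({"m1", "m2", "m3"})
replica_b = frozenset({"m1", "m2", "m3"})

# During a partition, replica A deletes m2 and replica B adds m4.
replica_a = replica_a - {"m2"}
replica_b = replica_b | {"m4"}

# After the partition heals, the replicas are merged.
healed = merge(replica_a, replica_b)
```

In this sketch `healed` contains `m2` again even though replica A deleted it. An application that tolerates such re-appearance avoids having to determine a stabilization point for deletions.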
19. Related Work
- Optimistic Replication (Saito and Shapiro)
  - Bayou (Terry et al.), Coda (Kumar and Satyanarayanan), Ficus (Reiher et al.), and Locus (Walker et al.)
- Fault Isolation
  - Hive (Chapin et al.), Archipelago (Ji et al.), Porcupine (Saito et al.), Pangaea (Saito et al.), and D-GRAID (Sivathanu et al.)
- Other Related Work
  - Harvest, Yield, and Scalable Tolerant Systems (Fox and Brewer), TACT (Yu and Vahdat)