Transcript and Presenter's Notes

Title: PETAL: DISTRIBUTED VIRTUAL DISKS


1
PETAL: DISTRIBUTED VIRTUAL DISKS
  • E. K. Lee, C. A. Thekkath
  • DEC SRC

2
Highlights
  • Paper presents a distributed storage management
    system
  • Petal consists of a collection of
    network-connected servers that cooperatively
    manage a pool of physical disks
  • Clients see Petal as highly available
    block-level storage partitioned into virtual
    disks

3
Introduction
  • Petal is a distributed storage system that
  • Tolerates single component failures
  • Can be geographically distributed to tolerate
    site failures
  • Transparently reconfigures to expand in
    performance or capacity
  • Uniformly balances load and capacity
  • Provides fast efficient support for backup and
    recovery

4
Petal User Interface
  • Petal appears to its clients as a collection of
    virtual disks
  • Block-level interface
  • Lower-level service than a DFS
  • Makes the system easier to model, design,
    implement, and tune
  • Can support heterogeneous clients and applications
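
  A minimal sketch (in Python) of what such a block-level virtual disk
  interface can look like; the class name, method names, block size, and
  the in-memory backing store are illustrative assumptions, not Petal's
  actual RPC interface:

    # Illustrative block-level interface; an in-memory dict stands in
    # for the pool of physical disks managed by the Petal servers.
    class VirtualDisk:
        BLOCK_SIZE = 512                      # assumed block size

        def __init__(self, vdisk_id):
            self.vdisk_id = vdisk_id          # virtual disk identifier
            self.blocks = {}                  # block number -> bytes

        def read(self, block, nblocks=1):
            empty = b"\x00" * self.BLOCK_SIZE
            return b"".join(self.blocks.get(block + i, empty)
                            for i in range(nblocks))

        def write(self, block, data):
            # Store the data one BLOCK_SIZE chunk at a time.
            for i in range(0, len(data), self.BLOCK_SIZE):
                self.blocks[block + i // self.BLOCK_SIZE] = data[i:i + self.BLOCK_SIZE]

    # A file system (NTFS, ext2, ...) would sit on top of this interface,
    # just as it would on a locally attached disk.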

5
Client view
[Diagram: client machines running NTFS and EXT2 FS access Petal virtual disks over a scalable network]
6
Physical view
[Diagram: the same clients connect over the scalable network to the Petal servers and the physical disks they manage]
7
Petal Server Modules
  • Global State Module
  • Recovery Module
  • Liveness Module
  • Virtual-to-Physical Translation Module
  • Data Access Module
8
Overall design (I)
  • All state information is maintained on servers
  • Clients maintain only hints
  • Liveness module ensures that all servers
    agree on the system's operational status
  • Uses majority consensus and periodic exchanges
    of "I'm alive" / "You're alive?" messages
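
  A minimal sketch of the heartbeat and majority-consensus idea above; the
  timeout value, class name, and message handling are assumptions for
  illustration, not Petal's actual liveness protocol:

    import time

    HEARTBEAT_TIMEOUT = 5.0                   # seconds; illustrative value

    class LivenessModule:
        def __init__(self, my_id, all_servers):
            self.my_id = my_id
            self.all_servers = set(all_servers)
            self.last_heard = {s: 0.0 for s in all_servers}

        def on_heartbeat(self, sender):
            # Called when an "I'm alive" / "You're alive?" message arrives.
            self.last_heard[sender] = time.time()

        def live_servers(self):
            now = time.time()
            return {s for s in self.all_servers
                    if s == self.my_id
                    or now - self.last_heard[s] < HEARTBEAT_TIMEOUT}

        def has_majority(self):
            # Servers agree the system is operational only while a
            # majority of them are reachable.
            return len(self.live_servers()) > len(self.all_servers) // 2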

9
Overall design (II)
  • Information describing
  • the current members of the storage system and
  • the currently supported virtual disks
  • is replicated across all servers
  • Global state module keeps this information
    consistent
  • Uses Lamport's Paxos algorithm
  • Assumes fail-silent failures of servers

10
Overall design (III)
  • Data access and recovery modules
  • Control how client data are distributed and
    stored
  • Support
  • Simple data striping w/o redundancy
  • Chained declustering
  • It distributes mirrored data in a way that
    balances load in the event of a failure

11
Address translation (I)
  • Must translate virtual addresses
  • <virtual-disk ID, offset>
  • into physical addresses
  • <server ID, disk ID, offset>
  • Mechanism should be fast and fault-tolerant

12
Address translation (II)
  • Uses three replicated data structures
  • Virtual disk directory (VDir): translates a
    virtual disk ID into a global map ID
  • Global map (GMap): locates the server responsible
    for translating the given offset (block number)
  • Physical map (PMap): locates the physical disk and
    computes the physical offset within that disk

13
Virtual to physical mapping
[Diagram: a (vdiskID, offset) pair is translated via the VDir, GMap, and PMap into a diskID and diskOffset on the responsible server]
14
Address translation (III)
  • Three step process
  • VDir translates virtual disk ID given by client
    into a GMap ID
  • The specified GMap finds the server that can
    translate the given offset
  • That server's PMap maps the GMap ID and offset to
    a physical disk and a disk offset
  • Last two steps are almost always performed by
    same server
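
  A minimal sketch of the three-step VDir -> GMap -> PMap lookup using plain
  Python dictionaries; the table layouts, field names, and striping rule are
  assumptions for illustration, not Petal's actual data structures:

    # Illustrative replicated structures (contents are made up).
    VDIR = {"vdisk7": "gmap3"}                        # virtual disk ID -> GMap ID
    GMAP = {"gmap3": {"servers": ["srv0", "srv1"]}}   # GMap ID -> server tuple
    # One PMap per server: (GMap ID, offset) -> (disk ID, disk offset).
    PMAP = {
        "srv0": {("gmap3", 0): ("disk2", 4096)},
        "srv1": {("gmap3", 1): ("disk5", 8192)},
    }

    def translate(vdisk_id, offset):
        # Step 1: the VDir maps the virtual disk ID to a GMap ID.
        gmap_id = VDIR[vdisk_id]
        # Step 2: the GMap picks the server responsible for this offset
        # (here: simple striping across the server tuple).
        servers = GMAP[gmap_id]["servers"]
        server = servers[offset % len(servers)]
        # Step 3: that server's PMap yields the physical disk and offset.
        disk_id, disk_offset = PMAP[server][(gmap_id, offset)]
        return server, disk_id, disk_offset

    # translate("vdisk7", 1) -> ("srv1", "disk5", 8192)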

15
Address translation (IV)
  • There is one GMap per virtual disk
  • That GMap specifies
  • Tuple of servers spanned by the virtual disk
  • Redundancy scheme used to protect data
  • GMaps are immutable
  • Cannot be modified
  • Must create a new GMap

16
Address translation (V)
  • PMaps are similar to page tables
  • Each PMap entry maps 64 KB of physical disk
    space
  • Server that performs the translation will usually
    perform the disk I/O
  • Keeping GMaps and PMaps separate minimizes amount
    of global information that must be replicated
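
  For example, with 64 KB PMap entries a byte offset within a virtual disk
  splits into a PMap entry index and an offset within the mapped extent;
  a simple worked calculation, not Petal code:

    PMAP_ENTRY_SIZE = 64 * 1024               # each PMap entry maps 64 KB

    def split_offset(byte_offset):
        # Which PMap entry covers this byte, and where inside its 64 KB?
        return byte_offset // PMAP_ENTRY_SIZE, byte_offset % PMAP_ENTRY_SIZE

    # split_offset(200_000) -> (3, 3392): entry 3, byte 3392 within it.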

17
Support for backups
  • Petal supports snapshots of virtual disks
  • Snapshots are immutable copies of virtual disks
  • Created using copy-on-write
  • The VDir maps <virtual-disk ID, epoch> into
    <GMap ID, epoch>
  • Epoch identifies current version of virtual disks
    and snapshots of past versions
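
  A minimal sketch of epoch-based, copy-on-write snapshots; the data layout
  and method names are illustrative assumptions, not Petal's implementation:

    class SnapshottingDisk:
        def __init__(self):
            self.epoch = 0
            self.versions = {0: {}}           # epoch -> {block: data}

        def snapshot(self):
            # Freeze the current epoch; new writes go to a fresh, empty map.
            self.epoch += 1
            self.versions[self.epoch] = {}
            return self.epoch - 1             # epoch ID of the snapshot

        def write(self, block, data):
            self.versions[self.epoch][block] = data

        def read(self, block, epoch=None):
            # Copy-on-write: blocks not modified since a snapshot are
            # shared with older epochs, so fall back until one is found.
            e = self.epoch if epoch is None else epoch
            while e >= 0:
                if block in self.versions[e]:
                    return self.versions[e][block]
                e -= 1
            return None                       # block was never written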

18
Incremental reconfiguration (I)
  • Used to add or remove servers and disks
  • Three simple steps
  • Create new GMap
  • Update VDir entries
  • Redistribute the data
  • Challenge is to perform the reconfiguration
    concurrently with normal client requests

19
Incremental reconfiguration (II)
  • To solve this problem
  • Read requests will
  • first try the new GMap
  • fall back to the old GMap if the new GMap has no
    appropriate translation
  • Write requests will always use the new GMap
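
  A minimal sketch of these read and write rules during reconfiguration;
  old_map and new_map stand in for the translation paths through the old
  and new GMaps:

    # During reconfiguration two GMaps coexist for a virtual disk.
    def reconfig_read(block, new_map, old_map):
        data = new_map.get(block)             # try the new GMap first
        if data is None:
            data = old_map.get(block)         # not moved yet: use the old GMap
        return data

    def reconfig_write(block, data, new_map):
        new_map[block] = data                 # writes always use the new GMap

    # A background task moves the remaining blocks from old_map into
    # new_map, after which the old GMap can be retired.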

20
Incremental reconfiguration (III)
  • Observe that the new GMap must be created before
    any data are moved
  • If the whole disk were remapped at once, too many
    read requests would have to consult both GMaps
  • This would seriously degrade system performance
  • Instead, Petal makes incremental changes over a
    fenced region of the virtual disk

21
Chained declustering (I)
[Diagram: the blocks of a virtual disk are chained-declustered across the servers, with each block's primary copy on one server and its secondary copy on the next server in the chain]
22
Chained declustering (II)
  • If one server fails, its workload will be almost
    equally distributed among remaining servers
  • Petal uses a primary/secondary scheme for
    managing copies
  • Read requests can go to either the primary or the
    secondary copy
  • Write requests must go first to the primary copy
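
  A minimal sketch of chained-declustered placement and the read/write
  rules above; the placement function follows the standard
  chained-declustering layout, and the names are chosen for illustration:

    NUM_SERVERS = 4

    def replicas(block):
        # The primary copy of block i lives on server i mod N and the
        # secondary copy on the next server in the chain.
        primary = block % NUM_SERVERS
        secondary = (primary + 1) % NUM_SERVERS
        return primary, secondary

    def read_server(block, failed=frozenset()):
        # Reads may go to either copy; use the secondary if the primary is down.
        primary, secondary = replicas(block)
        return secondary if primary in failed else primary

    def write_servers(block):
        # Writes go to the primary copy first, then to the secondary.
        return replicas(block)

    # If one server fails, its neighbors hold the surviving copies of its
    # blocks, and shifting some reads along the chain spreads the extra
    # load across all remaining servers.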

23
Petal prototype
  • Four servers
  • Each has fourteen 4.3 GB disks
  • Four clients
  • Links are 155 Mb/s ATM links
  • Petal RPC interface has 24 calls

24
Latency of a virtual disk
25
Throughput of a virtual disk
Throughput is mostly limited by CPU overhead
(233 MHz CPUs!)
26
File system performance
(Modified Andrew Benchmark)
27
Conclusion
  • A block-level interface is simpler and more
    flexible than a file system interface
  • Use of distributed software solutions allows
    geographic distribution
  • Petal performance is acceptable except for write
    requests
  • Must wait for primary and secondary copies to be
    successfully updated

28
Paxos: the main idea
  • Proposers propose decision values from an
    arbitrary input set and try to collect
    acceptances from a majority of the accepters
  • Learners observe this ratification process and
    attempt to detect that ratification has occurred
  • Agreement is enforced because only one proposal
    can get the votes of a majority of accepters
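
  A minimal single-decree Paxos sketch in these terms (a proposer and
  in-process accepters); messaging, failure detection, and retries with
  higher ballot numbers are omitted, and all names are illustrative:

    class Accepter:
        def __init__(self):
            self.promised = -1                # highest ballot promised
            self.accepted = (-1, None)        # (ballot, value) last accepted

        def prepare(self, ballot):
            # Phase 1: promise to ignore lower-numbered proposals.
            if ballot > self.promised:
                self.promised = ballot
                return True, self.accepted
            return False, None

        def accept(self, ballot, value):
            # Phase 2: accept unless a higher ballot has been promised.
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return True
            return False

    def propose(ballot, value, accepters):
        majority = len(accepters) // 2 + 1
        # Phase 1: collect promises from a majority of accepters.
        replies = [a.prepare(ballot) for a in accepters]
        granted = [acc for ok, acc in replies if ok]
        if len(granted) < majority:
            return None                       # retry later with a higher ballot
        # If some accepter already accepted a value, adopt the value with
        # the highest ballot instead of our own (the key safety rule).
        prev_ballot, prev_value = max(granted, key=lambda bv: bv[0])
        if prev_ballot >= 0:
            value = prev_value
        # Phase 2: ask the accepters to accept (ballot, value).
        votes = sum(a.accept(ballot, value) for a in accepters)
        return value if votes >= majority else None

    # Learners detect that a value is chosen once a majority of accepters
    # report accepting the same ballot and value.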

29
Paxos: the assumptions
  • Algorithm for consensus in a message-passing
    system
  • Assumes the existence of Failure Detectors that
    let processes give up on stalled processes after
    some amount of time
  • Processes can act as proposers, accepters, and
    learners
  • A process may combine all three roles

30
Paxos: the tricky part
  • The tricky part is to avoid deadlocks when
  • There are more than two proposals
  • Some of the processes fail
  • Paxos lets
  • Proposers make new proposals
  • Accepters release their earlier votes for losing
    proposals