Title: Erasure Code Replication
1 Erasure Code Replication
- Presenter: W.K Lin
- (The Chinese University of Hong Kong)
2 Why do we need replication?
- Storage devices can fail.
- Use replication to increase data availability, e.g. RAID.
- The basic idea of replication: place copies of the data in different places to increase the chance of finding it.
- P2P systems often provide replication.
3 Server-less VoD Architecture
- No centralized video server provides the video streaming.
- Each client in the system stores a subset of the video blocks.
- The video blocks are stored using an erasure code.
- It is not necessary to stream from all peers for complete video playback.
- The clients can stream the video from other clients.
4 Some Terminology
- Peers are the computers/storage devices that store the data.
- Peer availability µ is the fraction of time that a peer is up/online.
- File availability A is the probability that the file can be recovered from the replicated data.
- Storage overhead S is the ratio of the storage required with replication to the storage required without replication.
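5 Whole File Replication
- Whole file replication replicates the complete file.
- If the storage overhead is S, then there are S copies of the data in the system.
- File availability A_w: assuming peers are up independently, the file is available when at least one of the S copies is on an online peer, so $A_w = 1 - (1 - \mu)^S$.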
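The whole-file case can be checked numerically. The sketch below assumes each copy sits on an independent peer that is online with probability µ, so the file is unavailable only when all S copies are offline. The function names and the 0.999 target are illustrative choices, not taken from the slides.

```python
import math


def whole_file_availability(mu, copies):
    """Probability that at least one of `copies` independent replicas is online."""
    return 1.0 - (1.0 - mu) ** copies


def copies_needed(mu, target):
    """Smallest number of whole-file copies whose availability reaches `target`."""
    if not (0.0 < mu < 1.0 and 0.0 < target < 1.0):
        raise ValueError("mu and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - mu)^S >= target for the integer S.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - mu))


if __name__ == "__main__":
    for mu in (0.3, 0.5, 0.9):
        s = copies_needed(mu, 0.999)
        print(f"mu={mu}: S={s}, A_w={whole_file_availability(mu, s):.5f}")
```

With poorly available peers the required number of full copies grows quickly, which is the storage cost that whole file replication has to pay.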
6 Whole File Replication
- It is not storage efficient.
Adapted from "Replication Strategies for Highly Available Peer to Peer Networks", Ranjita Bhagwan et al.
7 Erasure Code Replication
- Instead of replicating the whole file, replicate a portion of the file.
- Principle:
- A file is divided into b blocks.
- Use an erasure code to add redundancy to these b blocks. We then have n blocks in total.
- The n blocks are made dependent on each other: each block carries partial information about the other blocks.
- Any b out of the n blocks are enough to recover the original file.
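The "any b out of n" property can be illustrated with the smallest possible case. The sketch below uses a single XOR parity block (b = 2, n = 3), which is only a toy stand-in for the erasure codes the slides refer to. The function names are hypothetical and the padding scheme is an assumption made for this example.

```python
def _xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data):
    """Split `data` into b = 2 blocks and append one parity block (n = 3)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length; decode() trims it again
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    return [d0, d1, _xor(d0, d1)]


def decode(blocks, length):
    """Rebuild the file from any 2 surviving blocks.

    `blocks` maps block index (0 and 1 are data, 2 is parity) to its bytes;
    `length` is the original file length, used to drop the padding.
    """
    if len(blocks) < 2:
        raise ValueError("need at least b = 2 blocks to recover the file")
    if 0 in blocks and 1 in blocks:
        d0, d1 = blocks[0], blocks[1]
    elif 0 in blocks:
        d0 = blocks[0]
        d1 = _xor(d0, blocks[2])  # missing data block = other block XOR parity
    else:
        d1 = blocks[1]
        d0 = _xor(d1, blocks[2])
    return (d0 + d1)[:length]


if __name__ == "__main__":
    original = b"erasure"
    stored = encode(original)
    for lost in range(3):
        survivors = {i: blk for i, blk in enumerate(stored) if i != lost}
        assert decode(survivors, len(original)) == original
    print("file recovered from every pair of blocks")
```

Here the storage overhead is n/b = 1.5, yet the file survives the loss of any single block; two full copies would cost an overhead of 2 for the same guarantee.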
8 Erasure Code Replication
- Storage overhead S = n/b, or n = Sb.
- Since we need any b out of the Sb blocks to recover the file, the file availability A_b is $A_b = \sum_{i=b}^{Sb} \binom{Sb}{i} \mu^i (1 - \mu)^{Sb - i}$.
- Notice that whole file replication is a special case of erasure code replication with b = 1.
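A minimal sketch of the file availability under erasure coding, assuming the Sb blocks are stored on independent peers that are each online with probability µ, so that the number of online blocks is binomial and the file is available when at least b of them are online. It assumes an integer S; the function name is illustrative.

```python
from math import comb


def erasure_availability(mu, S, b):
    """P(at least b of the n = S*b blocks are online), peers independent."""
    n = S * b
    return sum(comb(n, i) * mu**i * (1.0 - mu) ** (n - i) for i in range(b, n + 1))


if __name__ == "__main__":
    S = 2
    for mu in (0.3, 0.5, 0.9):
        row = ", ".join(
            f"b={b}: {erasure_availability(mu, S, b):.4f}" for b in (1, 4, 16)
        )
        print(f"mu={mu}, S={S} -> {row}")
```

Setting b = 1 gives the whole file replication value, matching the special case noted on this slide.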
9 Erasure Code Replication
- Erasure code replication is more storage efficient.
Adapted from "Replication Strategies for Highly Available Peer to Peer Networks", Ranjita Bhagwan et al.
10 Effectiveness of Erasure Code Replication
- The effectiveness of erasure code replication is determined by two factors:
- the combinatorial effect, i.e. $\binom{Sb}{b} \gg \binom{S}{1}$
- the peer availability factor $\mu^b (1 - \mu)^{Sb - b}$
- Erasure code replication depends on S, b, and µ.
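The two factors pull in opposite directions as b grows, which can be seen by evaluating them in log space. The sketch below reads them as the two pieces of the probability that exactly b of the Sb blocks are online; that reading is an interpretation, not something stated on the slide, and the function names are illustrative.

```python
from math import lgamma, log


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def factors(mu, S, b):
    """Return (log combinatorial factor, log peer-availability factor)."""
    n = S * b
    combinatorial = log_comb(n, b)
    peer_factor = b * log(mu) + (n - b) * log(1.0 - mu)
    return combinatorial, peer_factor


if __name__ == "__main__":
    S, mu = 3, 0.5
    for b in (1, 10, 100):
        c, p = factors(mu, S, b)
        print(f"b={b}: log C(Sb,b)={c:.2f}, log peer factor={p:.2f}, sum={c + p:.2f}")
```

The combinatorial term grows with b while the peer availability term shrinks, so whether a larger b helps depends on S and µ together.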
11 Effectiveness of Erasure Code Replication
12 How Does Erasure Code Replication Perform?
- File availability A (A_w or A_b) by varying µ and S
13 A Related Problem
- Lee and Liew's paper "Parallel Communications for ATM Network Control and Management" points out a similar problem:
- An information string is divided into b parts, then encoded into n parts.
- Any b out of the n parts are enough to recover the original information.
- Very similar to our problem!
- They prove a necessary bound Sµ > 1 for reliable communication.
14 Erasure Code Bound (Sµ > 1)
- The area above the curve defines the region in which erasure code replication is preferred for large b.
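The role of the Sµ > 1 bound for large b can be observed numerically. The sketch below assumes independent peers and evaluates the probability that at least b of the Sb blocks are online, working in log space so that large b does not underflow. The values S = 3 and µ = 0.40 / 0.30 are illustrative picks on either side of the bound.

```python
from math import exp, lgamma, log


def log_binom_pmf(n, i, mu):
    """Log of P(X = i) for X ~ Binomial(n, mu)."""
    return (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(mu) + (n - i) * log(1.0 - mu)
    )


def erasure_availability(mu, S, b):
    """P(at least b of the n = S*b blocks are online), peers independent."""
    n = S * b
    total = sum(exp(log_binom_pmf(n, i, mu)) for i in range(b, n + 1))
    return min(1.0, total)  # guard against rounding slightly above 1


if __name__ == "__main__":
    S = 3
    for mu in (0.40, 0.30):  # S*mu = 1.2 and 0.9
        row = ", ".join(
            f"b={b}: {erasure_availability(mu, S, b):.4f}" for b in (1, 10, 100, 1000)
        )
        print(f"S*mu={S * mu:.2f} -> {row}")
```

Above the bound the availability climbs towards 1 as b grows; below it, the same increase in b drives the availability towards 0.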
15 Erasure Code Replication Sensitivity Analysis
- We need to use a large b in order to benefit from erasure code replication.
- If the system is operating at a level Sµ ≈ 1, a small fluctuation in a system parameter will harm the system.
16 Erasure Code Replication Sensitivity Analysis
- The system is targeted to operate at S = 3, µ = 0.35.
- Sµ > 1
- 10% measurement error in µ.
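A rough sketch of this scenario, using a normal approximation to the binomial number of online blocks (mean Sbµ, variance Sbµ(1-µ)). It assumes the measurement error means the true µ is 10% lower than the design value of 0.35, i.e. 0.315, which pushes Sµ from 1.05 down to about 0.945. The approximation is only reasonable for large b, and the function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def approx_availability(mu, S, b):
    """Normal approximation to P(X >= b) for X ~ Binomial(S*b, mu)."""
    n = S * b
    mean = n * mu
    std = sqrt(n * mu * (1.0 - mu))
    return normal_cdf((mean - b) / std)


if __name__ == "__main__":
    S, mu_design = 3, 0.35
    mu_actual = 0.9 * mu_design  # assumed: true mu is 10% below the measured value
    for b in (10, 100, 1000):
        designed = approx_availability(mu_design, S, b)
        actual = approx_availability(mu_actual, S, b)
        print(f"b={b}: designed A={designed:.3f}, actual A={actual:.3f}")
```

The larger b is, the wider the gap between the designed and the actual availability, because the two values of µ fall on opposite sides of Sµ = 1.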
17 Related Work I
- Markov chain model for a simple birth/death model
Adapted from "Design and Analysis of a Fault-Tolerant Mechanism for a Server-Less Video-On-Demand System", Lee and Yeung
18 Related Work I
- Mean time to failure of the model
19 Related Work II
- c: connected state, mean time to stay ?
- u: disconnected state, mean time to stay µ
- d: dead state
- a: the probability of going to disconnected state d
Adapted from "Data Durability in Peer to Peer Storage Systems", Gil Utard, Antoine Vernois
20 Related Work II
Storage overhead S = 3
21 Conclusion
- Traditionally, erasure code replication has been very successful, e.g. RAID.
- A strict bound, Sµ > 1, has to be satisfied for replication to gain from erasure coding.
- Erasure code replication is sensitive to system measurement errors.
- This partly explains why erasure code replication is not seen in P2P systems.
22 Future Directions
- Most analyses are based on the assumption that all peers have the same availability level.
- In a real system, peers might have different failure and recovery rates.
- Replica distribution and discovery are open for research:
- How to place/locate the replicas if the peers have different availabilities?
- If the system fails, how to recover the lost replicas from the system?
23 End of presentation
24 Appendix
Let X be a binomial random variable having mean $E[X] = Sb\mu$ and variance $\sigma^2 = Sb\mu(1 - \mu)$.
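One way this binomial variable leads to the Sµ > 1 bound is through a normal approximation for large b. The following is a sketch under that assumption, with X read as the number of online blocks out of n = Sb; it is not necessarily the derivation on the original slide.

```latex
% X ~ Binomial(Sb, \mu): number of online blocks out of n = Sb.
% \Phi is the standard normal CDF; the approximation assumes large b.
\begin{align*}
A_b = P(X \ge b)
  &\approx 1 - \Phi\!\left(\frac{b - Sb\mu}{\sqrt{Sb\mu(1-\mu)}}\right) \\
  &= \Phi\!\left(\sqrt{b}\,\frac{S\mu - 1}{\sqrt{S\mu(1-\mu)}}\right)
\end{align*}
% As b \to \infty:
%   A_b \to 1    if S\mu > 1,
%   A_b \to 1/2  if S\mu = 1,
%   A_b \to 0    if S\mu < 1.
```

The sign of Sµ - 1 therefore decides whether increasing b helps or hurts, and the factor of sqrt(b) explains why the transition sharpens as b grows.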
25 Appendix