A Server-less Architecture for Building Scalable, Reliable, and Cost-Effective Video-on-demand Systems

1
  • A Server-less Architecture for Building Scalable,
    Reliable, and Cost-Effective Video-on-demand
    Systems
  • Raymond Leung and Jack Y.B. Lee
  • Department of Information Engineering
  • The Chinese University of Hong Kong

2
Contents
  • Introduction
  • Server-less Architecture
  • Performance Evaluation
  • System Scalability
  • Summary

3
Client-Server Architecture
Introduction
  • Traditional client-server architecture
  • clients connect to server for streaming
  • system capacity limited by server capacity

4
Motivation
Introduction
  • Limitations of the client-server system
  • system capacity limited by server capacity
  • a high-capacity server is very expensive
  • Availability of powerful client-side devices,
    also called set-top boxes (STBs)
  • home entertainment center: VCD/DVD player,
    digital music jukebox, etc.
  • relatively high processing capability and local
    hard-disk storage
  • Server-less architecture
  • eliminates the dedicated server
  • each user node (STB) serves both as a client and
    as a mini-server
  • fully distributed storage, processing, and
    streaming

5
Server-less Architecture
Architecture
  • Basic principles
  • dedicated server is eliminated
  • users are divided into clusters
  • video data is distributed to nodes in a cluster

6
Challenges
Architecture
  • Data placement policy
  • Retrieval and transmission scheduling
  • Fault tolerance
  • Distributed directory service
  • System adaptation and dynamic reconfiguration
  • etc.

7
Data Placement Policy
Architecture
  • Block-based striping
  • video data is divided into fixed-size blocks and
    then distributed among the nodes in the cluster
    (see the placement sketch below)
  • low storage requirement, load balanced
  • capable of fault tolerance using redundant
    unit(s)
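
A minimal sketch of the block placement, assuming
simple round-robin striping; the slides only state
that fixed-size blocks are distributed among the
nodes, so the modulo rule and helper names here are
illustrative:

```python
# Round-robin block striping over the n nodes of a cluster.
# BLOCK_SIZE (Q = 4 KB) matches the value used later in the evaluation;
# the modulo placement rule is an illustrative assumption.
BLOCK_SIZE = 4 * 1024

def node_for_block(block_index: int, n_nodes: int) -> int:
    """Return the node that stores a given block of a video."""
    return block_index % n_nodes

def blocks_on_node(video_size: int, node: int, n_nodes: int) -> list[int]:
    """List the block indices one node stores for a video of video_size bytes."""
    n_blocks = (video_size + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    return [b for b in range(n_blocks) if node_for_block(b, n_nodes) == node]

# A 2-hour video at 150 KB/s spread over 100 nodes: each node holds ~1/100 of it.
video_bytes = 150 * 1024 * 2 * 3600
print(len(blocks_on_node(video_bytes, node=0, n_nodes=100)))  # 2700 blocks
```

Because consecutive blocks land on different nodes,
each node stores roughly 1/n of every video and
serves an even share of the streaming load.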

8
Retrieval and Transmission Scheduling
Architecture
  • Round-based scheduler
  • retrieves a data block in each micro-round
  • transmission starts at the end of the micro-round

9
Retrieval and Transmission Scheduling
Architecture
  • Disk retrieval scheduling
  • Grouped Sweeping Scheme¹ (GSS)
  • able to control the tradeoff between disk
    efficiency and buffer requirement
  • Transmission scheduling
  • Macro round length
  • time required for every node to send out a data
    block of Q bytes
  • depends on system scale, data block size, and
    video bitrate

T_f = nQ / R_v, where T_f is the macro round
length, n the number of nodes within a cluster, Q
the data block size, and R_v the video bit-rate.
¹ P.S. Yu, M.S. Chen, and D.D. Kandlur, "Grouped
Sweeping Scheduling for DASD-based Multimedia
Storage Management," ACM Multimedia Systems, vol.
1, pp. 99-109, 1993.
10
Retrieval and Transmission Scheduling
Architecture
  • Transmission scheduling
  • Micro round length
  • under GSS scheduling, the duration of one GSS
    group within each macro round
  • depends on macro round length and number of GSS
    groups

T_g = T_f / g, where T_g is the micro round
length, T_f the macro round length, and g the
number of GSS groups.
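
A small numeric check of the two round-length
formulas, using the parameter values that appear
later in the evaluation (n = 100 nodes, Q = 4 KB,
R_v = 150 KB/s; the function names are mine):

```python
# Numeric check of the macro and micro round-length formulas above.
def macro_round(n: int, Q: float, Rv: float) -> float:
    """T_f = n * Q / R_v: time for each of the n nodes to send one Q-byte block."""
    return n * Q / Rv

def micro_round(Tf: float, g: int) -> float:
    """T_g = T_f / g: duration of one GSS group within a macro round."""
    return Tf / g

Tf = macro_round(n=100, Q=4, Rv=150)  # Q in KB, Rv in KB/s: units cancel
print(Tf)                             # 2.667 s macro round
print(micro_round(Tf, g=100))         # 0.0267 s micro round when g = n
```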
11
Fault Tolerance
Architecture
  • Node characteristics
  • lower reliability than high-end server
  • shorter mean time to failure (MTTF)
  • system fails if any one of the nodes fails
  • Fault tolerance mechanism
  • erasure correction code to implement fault
    tolerance
  • Reed-Solomon Erasure code² (RSE)
  • retrieve and transmit coded data at a higher
    data rate
  • recover data blocks at the receiver node

² A.J. McAuley, "Reliable Broadband Communication
Using a Burst Erasure Correcting Code," in Proc.
ACM SIGCOMM '90, Philadelphia, PA, September 1990,
pp. 287-306.
12
Fault Tolerance
Architecture
  • Redundancy
  • encode redundant data from the video data
  • recover lost data in case of node failure(s), as
    in the simplified sketch below
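
The paper uses the Reed-Solomon Erasure code cited
above; as a simplified stand-in, the sketch below
shows the h = 1 special case with a single XOR
parity block, which already illustrates encoding
redundant data and recovering one lost block:

```python
# Simplified stand-in for RSE: one XOR parity block (the h = 1 case).
def encode(blocks: list[bytes]) -> bytes:
    """Compute one parity block over equal-size data blocks."""
    parity = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            parity[i] ^= byte
    return bytes(parity)

def recover(received: list, parity: bytes) -> list:
    """Rebuild the single missing block (marked None) from the survivors."""
    missing = received.index(None)
    rebuilt = bytearray(parity)
    for j, blk in enumerate(received):
        if j != missing:
            for i, byte in enumerate(blk):
                rebuilt[i] ^= byte
    out = list(received)
    out[missing] = bytes(rebuilt)
    return out

data = [b"node0dat", b"node1dat", b"node2dat"]
p = encode(data)
print(recover([data[0], None, data[2]], p))  # the lost block reappears
```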

13
Performance Evaluation
Performance Evaluation
  • Storage capacity
  • Network capacity
  • Disk access bandwidth
  • Buffer requirement
  • System response time

14
Storage Capacity
Performance Evaluation
  • What is the minimum number of nodes required to
    store a given amount of video data?
  • For example (worked out in the sketch below)
  • video bitrate: 150 KB/s
  • video length: 2 hours
  • storage required for 100 videos: 102.9 GB
  • If each node can allocate 1 GB for video
    storage, then
  • 103 nodes are needed (without redundancy) or
  • 108 nodes are needed (with 5 nodes added for
    redundancy)
  • This sets the lower limit on the cluster size.
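
The storage arithmetic above, reproduced as a
short script (1024-based units, which match the
102.9 GB figure):

```python
# Storage sizing from the example above (1024-based units).
KB, GB = 1024, 1024**3
video_bytes = 150 * KB * 2 * 3600        # one 2-hour video at 150 KB/s
library_bytes = 100 * video_bytes        # 100 videos
per_node = 1 * GB                        # storage each node can allocate

print(library_bytes / GB)                # 102.99... GB, the slide's 102.9 GB
nodes = -(-library_bytes // per_node)    # ceiling division
print(nodes, nodes + 5)                  # 103 without, 108 with 5 redundant nodes
```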

15
Network Capacity
Performance Evaluation
  • How many nodes can be connected given a certain
    network switching capacity?
  • For example (see the sketch below)
  • video bitrate: 150 KB/s
  • If the network switching capacity is 32 Gbps,
    and assuming 60% utilization
  • up to 8388 nodes (without redundancy)
  • Network switching capacity is not a bottleneck.
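
A reproduction of this bound. One assumption of
mine: each node both sends and receives a full
stream through the switch, consuming 2 × R_v of
switching capacity; with that reading the slide's
8388-node figure comes out exactly:

```python
# Network-capacity bound from the example above. Assumption (mine): each
# node sends one stream and receives one stream through the switch, so it
# consumes 2 * Rv of switching capacity.
Rv_bps = 150 * 1024 * 8        # 150 KB/s as bits per second
capacity = 32 * 2**30          # 32 Gbps switching capacity (1024-based)
usable = 0.60 * capacity       # 60% utilization

print(int(usable // (2 * Rv_bps)))  # 8388 nodes, matching the slide
```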

16
Disk Access Bandwidth
Performance Evaluation
  • Recall the retrieval and transmission scheduling
  • Continuous data transmission constraint
  • must finish retrieval before transmission in each
    micro-round
  • need to quantify the disk retrieval round length
    and verify against the above constraint

17
Disk Access Bandwidth
Performance Evaluation
  • Disk retrieval round length
  • time required to retrieve data blocks for
    transmission
  • depends on seek overhead, rotational latency,
    and data block size
  • suppose k requests are served per GSS group
  • Continuous data transmission constraint
  • retrieval must finish within each micro round,
    i.e. the disk retrieval round length must not
    exceed T_g

18
Disk Access Bandwidth
Performance Evaluation
  • Example
  • Disk: Quantum Atlas 10K³
  • Data block size (Q): 4 KB
  • Video bitrate (R_v): 150 KB/s
  • Number of nodes: N
  • Number of GSS groups (g): N (reduces to FCFS
    scheduling)
  • Micro round length: T_g = Q / R_v ≈ 0.027 s
  • Disk retrieval round length: 0.017 s < 0.027 s
  • Therefore the constraint is satisfied even if
    the FCFS scheduler is used (checked in the
    sketch below).

³ G. Ganger and J. Schindler, "Database of
Validated Disk Parameters for DiskSim,"
http://www.ece.cmu.edu/~ganger/disksim/diskspecs.html
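
A one-line check of the example: with g = N the
micro round reduces to T_g = Q / R_v, independent
of the cluster size:

```python
# Continuous-transmission check for the example: with g = N, T_g = Q / Rv.
Q, Rv = 4, 150                 # KB and KB/s
T_g = Q / Rv                   # time available per retrieval round, ~0.027 s
t_retrieval = 0.017            # s, Quantum Atlas 10K figure from the slide

print(round(T_g, 3), t_retrieval <= T_g)  # 0.027 True: FCFS already suffices
```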
19
Buffer Requirement
Performance Evaluation
  • Receiver buffer requirement
  • double-buffering scheme (sketched below)
  • one buffer stores data received from the network
    plus locally retrieved data blocks
  • the other feeds the video decoder
  • Sender buffer requirement
  • follows from the GSS scheduling parameters
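
A minimal sketch of the receiver's double-buffering
scheme; the class shape and method names are
illustrative, not from the paper:

```python
# Double-buffering on the receiver: the 'filling' buffer collects blocks
# arriving from the other nodes plus the locally retrieved block, while the
# 'draining' buffer feeds the video decoder; they swap each round.
class DoubleBuffer:
    def __init__(self, size: int):
        self.filling = bytearray(size)
        self.draining = bytearray(size)

    def put(self, offset: int, block: bytes) -> None:
        """Place one received (or locally retrieved) block into the filling buffer."""
        self.filling[offset:offset + len(block)] = block

    def swap(self) -> bytes:
        """At a round boundary, hand the assembled buffer to the decoder."""
        self.filling, self.draining = self.draining, self.filling
        return bytes(self.draining)
```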

20
Buffer Requirement
Performance Evaluation
  • Total buffer requirement versus system scale
    (figure omitted: data block size Q = 4 KB,
    number of GSS groups g = N)

21
System Response Time
Performance Evaluation
  • System response time
  • time from sending out a request until playback
    begins
  • scheduling delay + prefetch delay
  • Scheduling delay under GSS
  • time from sending out a request until data
    retrieval starts
  • can be analyzed using an urn model
  • detailed derivation available elsewhere⁴
  • Prefetch delay (estimated in the sketch below)
  • time from the start of data retrieval until
    playback begins
  • one micro round to retrieve a data block and one
    macro round to transmit the whole block to the
    client node

⁴ J.Y.B. Lee, "Concurrent Push: A Scheduling
Algorithm for Push-based Parallel Video Servers,"
IEEE Transactions on Circuits and Systems for
Video Technology, vol. 9, no. 3, pp. 467-477,
April 1999.
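
A numeric sketch of the prefetch-delay term
(Q = 4 KB, R_v = 150 KB/s, g = N). The scheduling
delay from the urn model⁴ is omitted here, which is
why the slide's 5.615 s figure at 200 nodes is
slightly larger than the value printed below:

```python
# Prefetch delay = one micro round + one macro round, per the slide above.
Q, Rv = 4, 150                       # KB and KB/s

for N in (50, 100, 200):
    T_f = N * Q / Rv                 # macro round
    T_g = Q / Rv                     # micro round (g = N)
    print(N, round(T_g + T_f, 3))    # grows linearly with N; ~5.36 s at N = 200
```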
22
System Response Time
Performance Evaluation
  • For example: data block size Q = 4 KB
    (figure omitted: response time versus system
    scale)

23
System Scalability
System Scalability
  • Not limited by network or disk bandwidth
  • favors the FCFS disk scheduler over SCAN
  • Limited by system response time
  • prefetch delay increases linearly with system
    scale
  • e.g. a response time of 5.615 s at a scale of
    200 nodes
  • Solutions
  • form new clusters to expand the system scale
  • use a smaller block size (limited by disk
    efficiency)

24
Summary
Summary
  • Server-less architecture proposed for VoD
  • dedicated server is eliminated
  • each node serves as both a client and a
    mini-server
  • inherently scalable
  • Challenges addressed
  • data placement policy
  • retrieval and transmission scheduling
  • fault tolerance
  • Performance evaluation
  • acceptable storage and buffer requirements
  • scalability limited by system response time

25
  • End of Presentation
  • Thank you
  • Question & Answer Session

26
Reliability
Appendix
  • Higher reliability achieved by redundancy
  • each node has independent failure and recovery
    rates, λ and μ respectively
  • let state i be the system state where i out of
    the N nodes have failed
  • at state i, the transition rates to states (i+1)
    and (i-1) are (N-i)λ and iμ respectively
  • assume the system can tolerate up to h failures
    using redundancy
  • the resulting state diagram is a birth-death
    chain from state 0 to the failure state (h+1)
    (diagram omitted)

27
Reliability
Appendix
  • System mean time to failure (MTTF)
  • can be analyzed with a continuous-time Markov
    chain model
  • solve for the expected time from state 0 to
    state (h+1) in the previous diagram (see the
    sketch below)
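
A sketch of the MTTF computation, assuming the
birth-death rates given on the previous slide
((N-i)λ up, iμ down); those rates and the example
numbers below are my reading of the model, not
values from the paper. The expected absorption time
of a CTMC solves A·t = -1 over the transient states:

```python
import numpy as np

# MTTF of the birth-death chain above: expected time from state 0 (all nodes
# working) to the absorbing failure state (h+1).
def mttf(N: int, h: int, lam: float, mu: float) -> float:
    states = h + 1                       # transient states 0..h
    A = np.zeros((states, states))       # generator restricted to transient states
    for i in range(states):
        up, down = (N - i) * lam, i * mu
        A[i, i] = -(up + down)
        if i + 1 < states:
            A[i, i + 1] = up             # the i = h 'up' jump is absorption
        if i > 0:
            A[i, i - 1] = down
    # Expected absorption times t of a CTMC satisfy A @ t = -1.
    t = np.linalg.solve(A, -np.ones(states))
    return t[0]

# 100 nodes tolerating h = 4 failures; node MTTF 50,000 h, repair time 24 h
# (illustrative assumptions).
print(mttf(N=100, h=4, lam=1 / 50_000, mu=1 / 24))  # system MTTF in hours
```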

28
Impact of Redundancy
Appendix
  • Bandwidth requirement (without redundancy)
  • (N-1) sub-streams received from the network and
    one block retrieved locally from disk
  • Bandwidth requirement (with h redundancy)
  • additional network bandwidth is needed for
    transmitting the redundant data (see the sketch
    below)

Per-node network bandwidth: (N-1)R_v/N without
redundancy, (N-1)R_v/(N-h) with h redundancy,
where R_v is the video bit-rate.
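
A tiny calculator for the per-node network
bandwidth under the expressions above; the formulas
are my reconstruction from the slide's legend, not
quoted from the paper:

```python
# Per-node network bandwidth using the reconstructed expressions above.
def net_bw(N: int, h: int, Rv: float) -> float:
    """(N - 1) sub-streams received from the network at rate Rv / (N - h) each."""
    return (N - 1) * Rv / (N - h)

Rv = 150  # KB/s
print(net_bw(N=200, h=0, Rv=Rv))   # 149.25 KB/s without redundancy
print(net_bw(N=200, h=10, Rv=Rv))  # 157.1 KB/s with h = 10: the extra bandwidth
```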
29
Impact of Redundancy
Appendix
  • Data block size (without redundancy)
  • block size: Q bytes
  • Data block size (with h redundancy)
  • block size grows to Q·N/(N-h) bytes so the coded
    data fits the same round length