Dynamic Resource Management in Internet Hosting Platforms - PowerPoint PPT Presentation

Slides: 56
Provided by: psu114
Learn more at: https://www.cse.psu.edu
1
Dynamic Resource Management in Internet Hosting Platforms
  • Ph.D. Thesis Defense
  • Bhuvan Urgaonkar
  • Advisor: Prashant Shenoy

2
Internet Applications
  • Proliferation of Internet applications
  • E.g., auction sites, online games, online retail stores
  • Growing significance in personal and business
    affairs
  • Focus: Internet server applications

3
Hosting Platforms
  • Data Centers
  • Clusters of servers
  • Storage devices
  • High-speed interconnect
  • Hosting platforms
  • Rent resources to third-party applications
  • Performance guarantees in return for revenue
  • Benefits
  • Applications don't need to maintain their own
    infrastructure
  • Rent server resources, possibly on demand
  • Platform provider generates revenue by renting
    resources

4
Goals of a Hosting Platform
  • Meet service-level agreements
  • Satisfy application performance guarantees
  • E.g., average response time, throughput
  • Maximize revenue
  • E.g., maximize the number of hosted applications
  • Question: How should a hosting platform manage
    its resources to meet these goals?

5
Challenge 1 Dynamic Workloads
  • Multi-time-scale variations
  • Time-of-day, hour-of-day
  • Overloads
  • E.g., Flash crowds
  • User threshold for response time: 8-10 s
  • Key issue: How to provide good response time
    under varying workloads?

[Figure: two workload traces, arrivals per min vs. time in days (0 to 5, y-axis up to 1200) and vs. time in hours (0 to 24, y-axis up to 140K)]
6
Challenge 2 Complexity of Applications
  • Complex software architecture
  • Diverse software components
  • Web servers, Java application servers, databases
  • Multiple classes of clients
  • How to provide differentiated service?
  • Replicable components
  • How many replicas to have?
  • Tunable configuration parameters
  • E.g., MaxClients in Apache
  • How to set these parameters?
  • Key issue: How to capture all this complexity?

7
Talk Outline
  • Motivation
  • Thesis contributions
  • Application modeling
  • Dynamic provisioning
  • Scalable request policing
  • Conclusions

8
Hosting Platform Models
  • Small applications
  • Require only a fraction of a server
  • Shared Web hosting: $20/month to run your own Web site
  • Shared hosting: multiple applications on a server
  • Co-located applications compete for server
    resources

9
Hosting Platform Models
  • Large applications
  • May span multiple servers
  • eBay site uses thousands of servers!
  • Dedicated hosting: at most one application per
    server
  • Allocation at the granularity of a single server

10
Thesis Contributions
  • Dynamic resource management in hosting platforms
  • Shared Hosting
  • Statistical multiplexing and under-provisioning
    (OSDI 2002)
  • Application placement (PDCS 2004)
  • Dedicated Hosting
  • Analytical model for an Internet application
    (SIGMETRICS 2005)
  • Dynamic provisioning (Autonomic Computing 2005)
  • Scalable request policing (PODC 2004, WWW 2005)

11
Talk Outline
  • Motivation
  • Thesis contributions
  • Application modeling
  • Dynamic provisioning
  • Scalable request policing
  • Conclusions

12
Internet Application Architecture
[Figure: request processing in an online bookstore; the query "search moby" passes through the HTTP, J2EE, and database tiers, and the response lists Melville's Moby Dick and music CDs by Moby]
  • Multi-tier architecture
  • Each tier uses services provided by its successor
  • Session-based workloads

13
Baseline Application Model
(SIGMETRICS 2005)
[Figure: a clients sub-system connected to an application sub-system]
  • Model consists of two components
  • Sub-system to capture behavior of clients
  • Sub-system to capture request processing inside
    the application

14
Modeling Clients
[Figure: clients 1 to N, each with think time Z, modeled as queue Q0 in front of the application]
  • Clients think between successive requests
  • Infinite server system to capture think time Z
  • Captures independence of Z from processing in
    application

15
Modeling Request Processing
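[Figure: tiers 1 to M modeled as queues Q1, ..., QM with service times S1, ..., SM; transition probabilities p1, ..., pM+1 move requests between tiers; N requests circulate]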
  • Transitions defined to capture circulation of
    requests
  • Request may move to next queue or previous queue
  • Multiple requests are processed concurrently at
    tiers
  • Processor sharing scheduling discipline
  • Caching effects get captured implicitly!

16
Putting It All Together
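[Figure: the complete closed network: client think-time queue Q0 (think time Z) plus tier queues Q1, ..., QM with service times S1, ..., SM and transition probabilities p1, ..., pM+1; N sessions]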
  • A closed-queuing model that captures a given
    number of simultaneous sessions being served

17
Mean-value Analysis
[Figure: a client arriving at queue Q_m of a network with n+1 clients sees A_m(n+1) clients; L_m(n) is the mean length of Q_m when the network holds n clients]
  • Product-form closed queuing network
  • $L_m$: average length of $Q_m$
  • $A_m$: average number of clients in $Q_m$ seen by an
    arriving client
  • Arrival theorem: $A_m(n+1) = L_m(n)$
  • Iterative algorithm to compute mean queue
    lengths and sojourn times
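The iterative algorithm can be sketched as exact mean-value analysis. This is a minimal sketch under some assumptions: the closed network of the previous slides, with one think-time (infinite-server) station of mean Z and M processor-sharing tiers with mean service times S_m and visit ratios V_m. It ignores concurrency limits and replication, as the baseline model does. The function name `mva` and its arguments are hypothetical.

```python
def mva(service_times, visit_ratios, think_time, num_sessions):
    """Exact mean-value analysis for a closed network with one think-time
    (infinite-server) station and M processor-sharing tiers.

    Returns (mean response time, throughput, per-tier mean queue lengths)
    for num_sessions concurrent sessions.
    """
    queue_len = [0.0] * len(service_times)  # L_m(0) = 0: empty network
    resp, tput = 0.0, 0.0
    for n in range(1, num_sessions + 1):
        # Arrival theorem: a request arriving at tier m sees L_m(n-1) requests.
        residence = [s * (1.0 + q) for s, q in zip(service_times, queue_len)]
        # Mean response time: per-visit residence times weighted by visit ratios.
        resp = sum(v * r for v, r in zip(visit_ratios, residence))
        # Little's law over the whole loop (think time plus response time).
        tput = n / (think_time + resp)
        # Little's law per tier: L_m(n) = X(n) * V_m * R_m(n).
        queue_len = [tput * v * r for v, r in zip(visit_ratios, residence)]
    return resp, tput, queue_len


# Hypothetical three-tier example: service times in seconds, 1 s think time.
r, x, q = mva([0.005, 0.02, 0.01], [1.0, 1.0, 1.0], 1.0, 100)
print(f"R = {r:.3f} s, X = {x:.1f} req/s")
```

Each iteration adds one session and reuses the previous queue lengths. This is why the cost grows only linearly in the number of sessions.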

18
Parameter Estimation
  • Visit ratios
  • Equivalent to transition probabilities for MVA
  • $V_i = \lambda_i / \lambda_{req}$: $\lambda_{req}$ measured at the sentry, $\lambda_i$ from logs
  • Service times
  • Use residence time $X_i$ logged at tier $i$
  • For the last tier, $S_M = X_M$
  • $S_i = X_i - (V_{i+1} / V_i) X_{i+1}$
  • Think time
  • Measured at the application sentry
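The estimation rules above can be applied mechanically once the rates and residence times are measured. This is a minimal sketch. It assumes the measurements are already aggregated into per-tier averages, ordered from tier 1 to tier M. The function name is hypothetical.

```python
def estimate_parameters(sentry_rate, tier_rates, residence_times):
    """Estimate visit ratios V_i and service times S_i from measurements.

    sentry_rate:      request arrival rate lambda_req seen at the sentry
    tier_rates:       arrival rate lambda_i at each tier (e.g., from logs)
    residence_times:  mean residence time X_i logged at each tier
    """
    # V_i = lambda_i / lambda_req
    visits = [lam / sentry_rate for lam in tier_rates]
    m = len(residence_times)
    service = [0.0] * m
    # The last tier calls no other tier, so S_M = X_M.
    service[-1] = residence_times[-1]
    # Earlier tiers: subtract the time spent waiting on the next tier,
    # S_i = X_i - (V_{i+1} / V_i) * X_{i+1}.
    for i in range(m - 2, -1, -1):
        ratio = visits[i + 1] / visits[i]
        service[i] = residence_times[i] - ratio * residence_times[i + 1]
    return visits, service


# Hypothetical measurements for a three-tier application.
print(estimate_parameters(100.0, [100.0, 200.0, 400.0], [1.0, 0.25, 0.0625]))
```

Working backwards from the last tier matters: each S_i depends on the residence time of the tier after it.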

19
Evaluation of Baseline Model
  • Auction site RUBiS
  • One server per tier

[Figure: baseline model evaluated on RUBiS with Apache, JBOSS, and Mysql tiers]
  • Concurrency limits not captured

20
Handling Concurrency Limits
[Figure: the closed network (think-time queue Q0, tier queues Q1 to QM, N sessions) with some requests being dropped]
  • Requests may be dropped due to concurrency limits
  • Need to model the finiteness of queues!

21
Handling Concurrency Limits
[Figure: the same network with a drop subsystem attached to each tier, receiving the requests dropped at tiers 1 to M]
  • Approach: add subsystems to capture dropped requests
  • Distinguish the processing of dropped requests

22
Estimating Drop Probabilities and Delay Values
  • Drop probability
  • Step 1: Estimate throughput using MVA, assuming no
    concurrency limits
  • Step 2: Estimate $p_i^{drop}$ as the drop probability
    of an M/M/1/$K_i$ queue
  • Delay value for tier $i$
  • Subject the application to an offline workload that
    causes the limit to be exceeded only at tier $i$, and
    record the response time of failed requests

[Figure: tier i with concurrency limit K_i and MVA-estimated throughput τ; τ(1 − p_i^drop) is served and τ·p_i^drop is dropped; panels labeled High limit and Low limit]
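Step 2 uses the standard blocking probability of an M/M/1/K queue: p = (1 − ρ)ρ^K / (1 − ρ^(K+1)), or 1/(K+1) when ρ = 1. This is a minimal sketch. The usage lines assume that tier i's arrival rate is the MVA throughput τ times its visit ratio V_i, and that its service rate is 1/S_i. Those inputs, the numbers, and the function name are illustrative assumptions.

```python
import math


def mm1k_drop_probability(arrival_rate, service_rate, k):
    """Blocking (drop) probability of an M/M/1/K queue.

    k is the maximum number of requests the tier holds at once (its
    concurrency limit K_i); an arrival that finds k requests present is dropped.
    """
    rho = arrival_rate / service_rate
    if math.isclose(rho, 1.0):
        # The stationary distribution is uniform over 0..k when rho == 1.
        return 1.0 / (k + 1)
    return (1.0 - rho) * rho**k / (1.0 - rho ** (k + 1))


# Hypothetical tier: throughput tau from MVA, visit ratio V_i, service time S_i.
tau, v_i, s_i, k_i = 40.0, 1.0, 0.02, 10
p_drop = mm1k_drop_probability(tau * v_i, 1.0 / s_i, k_i)
print(f"p_drop = {p_drop:.4f}, served = {tau * (1 - p_drop):.2f} req/s")
```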
23
Response Time Prediction
  • Enhanced model can capture concurrency limits

24
Replication and Load Imbalances
[Figure: replicated tiers (Apache, JBOSS, Mysql)]
  • Causes of imbalance
  • Sticky sessions
  • Variation in session durations and resource
    requirements
  • Imbalance factor for the jth most-loaded replica of
    tier i:
    $\text{imbalance}(i, j) = \text{num\_arrivals}(i, j) / \text{num\_arrivals}(i)$
  • Scale the visit ratio:
    $V_{i,j} = V_i \cdot \text{imbalance}(i, j)$
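The imbalance factors and scaled visit ratios follow directly from per-replica arrival counts. This is a minimal sketch. It assumes the counts are taken over the same measurement window for all replicas of tier i. The function names and sample counts are hypothetical.

```python
def imbalance_factors(replica_arrivals):
    """imbalance(i, j) for j = 1..r, most-loaded replica first.

    replica_arrivals: arrivals observed at each replica of tier i over a window.
    """
    total = sum(replica_arrivals)
    return [a / total for a in sorted(replica_arrivals, reverse=True)]


def scaled_visit_ratios(visit_ratio, replica_arrivals):
    """V_{i,j} = V_i * imbalance(i, j) for each replica j of tier i."""
    return [visit_ratio * f for f in imbalance_factors(replica_arrivals)]


# Hypothetical counts for a three-replica tier with V_i = 2.0.
print(scaled_visit_ratios(2.0, [200, 500, 300]))
```

The scaled ratios always sum to V_i. Perfect load balancing is the special case in which every factor equals 1/r.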

25
Capturing Load Imbalance
[Figure, left: number of requests per replica vs. time (sec) for Replica 1 (least loaded), Replica 2 (medium loaded), Replica 3 (most loaded), and their average. Right: average response time (msec) vs. time (sec), comparing observed, perfect load balancing, and enhanced model]
  • Session affinity causes load imbalance
  • Imbalance shifts among replicas
  • Our enhancement helps improve
    response time prediction

26
Talk Outline
  • Motivation
  • Thesis contributions
  • Application modeling
  • Dynamic provisioning
  • Scalable request policing
  • Conclusions

27
Dynamic Provisioning
(Autonomic Computing 2005)
[Figure: control loop: monitor workload, compute current/future demand, adjust allocation]
  • Key idea: increase or decrease allocated servers
    to handle workload fluctuations
  • Monitor incoming workload
  • Compute current or future demand
  • Match number of allocated servers to demand
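One pass of the monitor/compute/adjust loop can be sketched as follows. The sketch assumes a known per-server capacity: the largest request rate one server can sustain while meeting the response time target, which could, for instance, be derived from the application model. Provisioning for the larger of the observed and predicted rates is also an assumption, as are all names.

```python
import math


def servers_needed(demand_rate, per_server_capacity, min_servers=1):
    """Smallest number of servers whose combined capacity covers the demand."""
    return max(min_servers, math.ceil(demand_rate / per_server_capacity))


def provisioning_step(current_servers, observed_rate, predicted_rate,
                      per_server_capacity):
    """One iteration of monitor -> compute demand -> adjust allocation.

    Returns the change in allocation: positive means add servers,
    negative means release servers.
    """
    # Provision for the larger of the current and the predicted demand.
    demand = max(observed_rate, predicted_rate)
    return servers_needed(demand, per_server_capacity) - current_servers


# Hypothetical: 3 servers, 250 req/s now, 420 req/s predicted, 100 req/s each.
print(provisioning_step(3, 250.0, 420.0, 100.0))
```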

28
Dynamic Provisioning at Multiple Time-scales
  • Predictive provisioning
  • Certain Internet workload patterns can be
    predicted
  • E.g., time-of-day effects, increased workload
    during Thanksgiving
  • Provision using model at time-scale of hours or
    days
  • Reactive provisioning
  • Applications may see unpredictable fluctuations
  • E.g., Increased workload to news-sites after an
    earthquake
  • Detect such anomalies and react fast (minutes)
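One simple way to exploit time-of-day effects in a predictor is sketched below. It predicts the next hour's arrival rate from the rates observed at the same hour on past days, taking a high percentile rather than the mean so the estimate is conservative. This particular predictor is an illustrative assumption, not necessarily the thesis's model. The function name and data layout are hypothetical.

```python
import math


def predict_rate(history, hour, percentile=0.95):
    """Predict the arrival rate for a given hour of the day.

    history: list of past days, each a list of 24 hourly arrival rates.
    Uses the nearest-rank percentile of the rates seen at that hour, so the
    prediction reflects the tail of past demand rather than its mean.
    """
    samples = sorted(day[hour] for day in history)
    rank = max(1, math.ceil(percentile * len(samples)))
    return samples[rank - 1]


# Hypothetical history: five days, nonzero traffic only at noon.
days = [[0.0] * 12 + [v] + [0.0] * 11 for v in [100.0, 120.0, 90.0, 150.0, 110.0]]
print(predict_rate(days, 12))
```

The predicted rate would then feed the server-count computation at the hours-to-days time scale.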

29
Request Policing
[Figure: a sentry polices incoming requests and drops the excess]
  • Key idea: if the incoming request rate exceeds the current
    capacity
  • Turn away excess requests
  • Why police when you can provision?
  • Provisioning is not instantaneous
  • Residual sessions on reallocated server
  • Application and OS installation and configuration
    overheads
  • Overhead of several (5-30) minutes

30
Existing Work
  • Lots of existing work on request policing
  • Kanodia00, Li00, Verma03, Welsh03, Abdelzaher99,
  • Shortcomings of existing work
  • Does not attempt to integrate policing and
    provisioning
  • Does not address scalability of the policer!
  • The policer itself may become the bottleneck
    during overloads

31
Policer Design Goals
  • Each class should sustain its guaranteed
    admission rate
  • Class-based differentiation and revenue
    maximization
  • Challenging due to online nature of the problem
  • An admitted request may cause a more important
    request arriving later to be dropped
  • Approach: preferential admission to higher-class
    requests
  • Scalability
  • The policer should remain operational even under
    extremely high arrival rates

32
Overview of Policer Design
(PODC 2004 / WWW 2005)
[Figure: policer pipeline: a classifier feeds per-class leaky buckets and class-specific queues (gold, silver, bronze, processed with delays d_gold, d_silver, d_bronze); admission control then admits or drops each request]
  • Our policer has three components
  • Request classifier and per-class leaky buckets
  • Class-specific queues
  • Admission control

33
Class-based Differentiation
  • Each incoming request undergoes classification
  • Per-class leaky buckets ensure that the rates
    guaranteed in the SLA are admitted
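A per-class leaky bucket can be sketched as a rate meter, written here in its equivalent token-bucket form. This is a minimal sketch. It assumes requests arrive in time order starting at time 0. Requests beyond the guaranteed rate are simply returned as "excess", standing in for whatever the policer does with them next. The class, function names, and parameters are hypothetical.

```python
class LeakyBucket:
    """Per-class rate meter (token-bucket formulation of a leaky bucket).

    rate:  guaranteed admission rate from the SLA (requests/s)
    burst: largest number of requests admitted back to back
    """

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = 0.0

    def conforms(self, now):
        """Return True if a request arriving at time `now` is within the SLA rate."""
        # Refill tokens for the time elapsed since the last arrival.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def classify_and_meter(requests, buckets):
    """Split (time, class) requests into SLA-conforming and excess ones."""
    conforming, excess = [], []
    for t, cls in requests:
        if buckets[cls].conforms(t):
            conforming.append((t, cls))
        else:
            excess.append((t, cls))
    return conforming, excess
```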

34
Revenue Maximization
  • Idea: different delays in processing requests of
    different classes
  • More important requests processed more frequently
  • Methodology to compute delay values in an online
    manner
  • Bounds the probability of a request denying admission
    to a more important request (Appendix B of thesis)
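The effect of class-specific delays can be illustrated with a toy model. Each class queue is examined every d_class milliseconds, and a fixed admission budget stands in for the remaining server capacity. This is a hypothetical sketch with hand-picked delays. It does not reproduce the thesis's online computation of delay values, and all names are illustrative.

```python
def admit_with_delays(arrivals, delays, priority, capacity, horizon):
    """Toy model of class-specific processing delays.

    arrivals: list of (time_ms, cls); delays: cls -> d_cls in ms
    priority: cls -> rank (0 = most important; breaks ties at equal times)
    capacity: number of requests that can still be admitted
    Each class queue is examined every d_cls ms up to `horizon`; queued
    requests are admitted while capacity remains, otherwise dropped.
    """
    events = sorted((k * d, priority[c], c)
                    for c, d in delays.items()
                    for k in range(1, horizon // d + 1))
    pending = sorted(arrivals)
    queues = {c: [] for c in delays}
    admitted, dropped = [], []
    i = 0
    for t, _, cls in events:
        # Move every request that has arrived by time t into its class queue.
        while i < len(pending) and pending[i][0] <= t:
            at, ac = pending[i]
            queues[ac].append(at)
            i += 1
        # Examine this class's queue.
        for at in queues[cls]:
            if capacity > 0:
                admitted.append((at, cls))
                capacity -= 1
            else:
                dropped.append((at, cls))
        queues[cls] = []
    return admitted, dropped


arrivals = [(0, "bronze"), (50, "bronze"), (500, "gold"), (600, "gold")]
prio = {"gold": 0, "bronze": 2}
print(admit_with_delays(arrivals, {"gold": 100, "bronze": 1000}, prio, 2, 1000))
```

With a long bronze delay, the gold requests that arrive later still find capacity. If both classes are examined every 100 ms, the earlier bronze requests use up the budget and the gold requests are dropped. This is the situation the delays are meant to make unlikely.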

35
Admission Control
  • Goal: ensure that an admitted request meets its
    response time target
  • Measurement-based admission control algorithm
  • Use information about current load on servers and
    estimated size of new request to make decision
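A measurement-based admission test can be sketched as follows. It combines the measured outstanding work on each server with the estimated size of the new request. This is a hypothetical illustration, not the thesis's algorithm. It assumes request sizes are estimated in abstract work units, each server processes a known number of units per second, and the least-loaded server is chosen. All names and units are assumptions.

```python
class AdmissionController:
    """Toy measurement-based admission control.

    Tracks the outstanding work on each server (in hypothetical work units)
    and admits a request only if, on the least-loaded server, the queued work
    plus the new request's estimated size can finish within the target.
    """

    def __init__(self, num_servers, capacity_per_server, target_response_time):
        self.work = [0.0] * num_servers          # outstanding work per server
        self.capacity = capacity_per_server      # work units processed per second
        self.target = target_response_time       # seconds

    def try_admit(self, estimated_size):
        """Return the chosen server index, or None if the request is dropped."""
        server = min(range(len(self.work)), key=self.work.__getitem__)
        predicted = (self.work[server] + estimated_size) / self.capacity
        if predicted <= self.target:
            self.work[server] += estimated_size
            return server
        return None

    def complete(self, server, size):
        """Record that a request of `size` finished on `server`."""
        self.work[server] -= size
```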

36
Scalability of Admission Control
  • Idea 1: Reduce the per-request admission control
    cost
  • Admission control on every request may be
    expensive
  • Bursty arrivals during overloads ⇒ batches get
    formed
  • Delays for class-based differentiation ⇒ batches
    get formed
  • Admission control that operates on batches
    instead of requests
  • Idea 2: Sacrifice accuracy to reduce computational
    overhead
  • When batch-based processing becomes prohibitive
  • Threshold-based scheme
  • E.g., admit all Gold requests, drop all Silver
    and Bronze requests
  • Thresholds chosen based on observed arrival rates
    and service times
  • Extremely efficient
  • Wrong threshold ⇒ bad response times or fewer
    requests admitted
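The threshold-based scheme can be sketched as a periodic threshold choice plus a trivial per-request test. This is a minimal sketch. It assumes capacity is normalized to a single server with a target utilization bound. The rule of admitting classes in order of importance while their combined offered load (arrival rate times service time) fits under that bound is one plausible way to use observed rates and service times, not necessarily the thesis's rule. Names and numbers are hypothetical.

```python
def choose_threshold(classes, max_utilization=0.9):
    """Pick which classes to admit outright under the threshold scheme.

    classes: list of (name, arrival_rate, mean_service_time), ordered from
             most to least important; rates in req/s, service times in s.
    Classes are admitted in order of importance for as long as their combined
    offered load (sum of rate * service time) stays within max_utilization.
    """
    admitted, load = [], 0.0
    for name, rate, service in classes:
        load += rate * service
        if load > max_utilization:
            break
        admitted.append(name)
    return admitted


def threshold_admit(request_class, admitted_classes):
    """Per-request decision: a membership test, with no per-request modeling."""
    return request_class in admitted_classes


classes = [("gold", 100.0, 0.004), ("silver", 200.0, 0.002), ("bronze", 500.0, 0.002)]
print(choose_threshold(classes))
```

The per-request cost is a single membership test, which is why the scheme scales so well. Its accuracy is only as good as the observed rates and service times used to choose the threshold.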

37
Scaling Even Further
  • Protocol processing overheads will saturate
    sentry resources at extremely high arrival rates
  • Indiscriminate dropping of requests will occur
  • Important requests may be turned away without
    even undergoing the admission control test
  • Loss in revenue!
  • Sentry should still be able to process each
    arriving request!
  • Idea Dynamic capacity provisioning for sentry
  • Pull in an additional sentry if CPU utilization
    of existing sentries exceeds a threshold (e.g.,
    90%)
  • Round-robin DNS to load balance among sentries
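Sentry provisioning and round-robin DNS can be sketched together. This is a minimal sketch. It assumes the trigger compares the mean CPU utilization of the existing sentries against the threshold, and that spare sentries come from a simple pool. The DNS here is a toy resolver, not a real DNS server. Addresses are from the documentation range, and all names are hypothetical.

```python
class RoundRobinDNS:
    """Toy round-robin name resolution over the active sentries."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self._next = 0

    def resolve(self):
        addr = self.addresses[self._next % len(self.addresses)]
        self._next += 1
        return addr

    def add(self, address):
        self.addresses.append(address)


def maybe_add_sentry(dns, cpu_utilizations, spare_sentries, threshold=0.9):
    """Pull in a spare sentry if the mean CPU utilization of the existing
    sentries exceeds the threshold; return the added address or None."""
    mean_util = sum(cpu_utilizations) / len(cpu_utilizations)
    if mean_util > threshold and spare_sentries:
        new_sentry = spare_sentries.pop(0)
        dns.add(new_sentry)
        return new_sentry
    return None


dns = RoundRobinDNS(["192.0.2.1"])
maybe_add_sentry(dns, [0.95], ["192.0.2.2"])
print([dns.resolve() for _ in range(4)])
```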

38
Class-based Differentiation
  • Three classes of requests Gold, Silver, Bronze
  • Policer successful in providing preferential
    admission to important requests

39
Threshold-based: Higher Scalability
  • Threshold-based processing allows the policer to
    handle up to 4 times higher arrival rates
  • Single sentry can handle about 19000 req/s

40
Threshold-based: Loss of Accuracy
  • Higher scalability comes at a loss in accuracy of
    admission control
  • More violations of response time targets

41
Talk Outline
  • Motivation
  • Thesis contributions
  • Application modeling
  • Dynamic provisioning
  • Scalable request policing
  • Summary and Future Research

42
Thesis Contributions
  • Dynamic resource management in hosting platforms
  • Shared Hosting
  • Statistical multiplexing and under-provisioning
    (OSDI 2002)
  • Application placement (PDCS 2004)
  • Dedicated Hosting
  • Analytical model for Internet applications
    (SIGMETRICS 2005)
  • Dynamic provisioning (Autonomic Computing 2005)
  • Scalable request policing (PODC 2004, WWW 2005)

43
Future Research Directions
  • Virtual machine based hosting
  • Recent research has shown feasibility of
    migrating VMs across nodes
  • Adds a new dimension to the capacity provisioning
    problem
  • Characterizing multi-tier workloads
  • Workloads for standalone Web servers are
    well-characterized
  • E.g., typical service times at Java tier or query
    processing times?
  • Offshoot of this study: workload generators for
    multi-tier applications
  • Automated determination of provisioning
    parameters
  • Predictor and reactor invoked based on manually
    chosen frequencies
  • System administrators use rules of thumb ⇒
    error-prone

44
Thanks to
  • Advisor
  • Prashant Shenoy
  • Thesis committee
  • Emery Berger, Jim Kurose, Don Towsley, Tilman
    Wolf
  • Collaborators
  • Abhishek Chandra, Pawan Goyal, Giovanni
    Pacifici, Timothy Roscoe, Arnold
    Rosenberg, Mike Spreitzer, Asser Tantawi
  • All my teachers
  • Paul Cohen, Mani Krishna, Don Towsley
  • Friends and family

45
  • Questions or comments?

46
Query Caching at the Database
  • Caching effects
  • Captured by tuning Vi and/or Si
  • Bulletin-board site RUBBoS
  • 50 sessions
  • SELECT SQL_NO_CACHE causes Mysql to not cache the
    response to a query

47
Agile Switching Using Virtual Machine Monitors
  • VMMs allow multiple virtual machines on a server
  • E.g., Xen, VMware

[Figure: two servers, each running a VMM that hosts VM1, VM2, and VM3, with some VMs active and others dormant]
  • Use VMMs to enable fast switching of servers
  • Switching time only limited by residual sessions

48
Prototype Data Center
[Figure: architecture with a control plane (application placement, dynamic provisioning) and server nodes (application capsules, sentries, resource monitoring, parameter estimation)]
  • 40 Linux servers
  • Gigabit switches
  • Multi-tier applications
  • Auction (RUBiS)
  • Bulletin-board (RUBBoS)
  • Apache, JBOSS (replicable)
  • Mysql database

49
Sentry Provisioning (XXX)
50
System Overview
  • Control Plane
  • Centralized resource manager
  • Nucleus
  • Per-server measurements and resource management
  • Sentry
  • Per-application admission control
  • Capsule
  • Component of an application running on a server

51
Existing Application Models
  • Models for Web servers (Chandra03, Doyle03)
  • Do not model the Java server, database, etc.
  • Black-box models (Kamra04, Ranjan02)
  • Unaware of bottleneck tier
  • Extensions of single-tier models (Welsh03)
  • Fail to capture interactions between tiers
  • Existing models inadequate for multi-tier
    Internet applications

52
Existing Work
  • Predictable resource management within a single
    server
  • Proportional-share schedulers for CPU, network
    (Duda, Goyal, Waldspurger)
  • Multi-processors (Chandra)
  • Memory management (Berger, Waldspurger)
  • Disk scheduling (Shenoy)
  • Hosting platforms and Internet applications
  • Rice, Duke, Penn State: shared platforms for Web
    servers
  • IBM, HP Labs: shared platforms, workload
    prediction
  • Berkeley: novel architecture for Internet
    applications
  • Main shortcomings
  • Possible statistical multiplexing gains in shared
    platforms unexplored
  • Most work assumes simplistic applications (e.g.,
    only Web servers)
  • Provisioning either purely reactive or purely
    predictive
  • Handling of extreme overloads not addressed
    satisfactorily

53
Predictive Provisioning
[Figure: number of allocated servers over time]
54
Reactive Provisioning
[Figure: time series of actual (λ_actual) and predicted (λ_pred) arrival rates; when the prediction error λ_error exceeds a threshold τ, the reactor is invoked to allocate servers]
  • Idea: react to current conditions
  • Useful for capturing significant short-term
    fluctuations
  • Can correct errors in predictions
  • Track the error between long-term predictions and
    the actual workload
  • Allocate additional servers if the error exceeds a
    threshold
  • Can also be invoked if the request drop rate exceeds a
    threshold
  • Operates over a time scale of a few minutes
  • Pure reactive provisioning lags the workload
  • Reactive + predictive provisioning is more effective!
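A single invocation of the reactor can be sketched as follows. The sketch assumes rates are in req/s and that the per-server capacity is known. Adding enough servers to cover the unpredicted excess, and at least one server when only the drop-rate trigger fires, is an illustrative choice. Function and parameter names are hypothetical.

```python
import math


def reactive_allocation(predicted_rate, actual_rate, per_server_capacity,
                        error_threshold, drop_rate=0.0, drop_threshold=None):
    """Extra servers the reactor requests in one invocation.

    Fires when the actual load exceeds the long-term prediction by more than
    error_threshold (req/s), or when the observed drop rate exceeds
    drop_threshold. Returns 0 when the reactor does not fire.
    """
    error = actual_rate - predicted_rate
    fired = error > error_threshold or (
        drop_threshold is not None and drop_rate > drop_threshold)
    if not fired:
        return 0
    # Cover the unpredicted excess; add at least one server if triggered by drops.
    return max(1, math.ceil(max(error, 0.0) / per_server_capacity))


# Hypothetical: 400 req/s predicted, 700 req/s observed, 100 req/s per server.
print(reactive_allocation(400.0, 700.0, 100.0, 150.0))
```

Because the reactor only corrects the gap between prediction and observation, it complements the predictor rather than replacing it.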

55
Dynamic Capacity Provisioning
  • Auction application RUBiS
  • Workload increases by a factor of 4 in 30 min

[Figure: workload, server allocations, and response time over time]
  • Server allocations increased to match increased
    workload
  • Response time kept below 2 seconds