Rebecca Isaacs - PowerPoint PPT Presentation

1
Magpie: Distributed request tracking for
realistic performance modelling
  • Rebecca Isaacs, Paul Barham, Richard Mortier,
    Dushyanth Narayanan (Microsoft Research Cambridge)
  • James Bulpin (University of Cambridge)

2
Performance in distributed systems
  • Faults in distributed systems are notoriously
    hard to diagnose
  • Performance problems are even more subtle to
    debug
  • Often transient or affect only a subset of
    requests / users
  • Frequently involve complex interactions between
    multiple machines
  • Aggregate statistics (e.g. utilization) may look
    perfectly normal

3
Magpie Approach
  • Track individual requests end to end
  • Observe control flow (causality)
  • Monitor resource consumption: CPU, bandwidth,
    disk
  • Debug performance in the small
  • Build a probabilistic workload model from the
    aggregate requests
  • Cluster similar requests according to their
    observed behaviour
  • Debug performance in the large

4
How do we use this information?
  • Performance debugging
  • Why did this request take much longer than that
    request?
  • Fault detection
  • Configuration and management
  • Performance prediction
  • Realistic workload models for capacity planning
  • Obtain automatically on a live system

5
Magpie components
  • Instrumentation
  • System activity recorded to logs
  • Generic request parser
  • Extract individual requests from logs according
    to an event schema
  • Model construction
  • Behavioural clusters
  • Probabilistic state machine

6
Outline
  • Introduction
  • What is a request?
  • Instrumentation
  • Request extraction
  • Modelling
  • Current status

7
What is a request?
  • System activity which takes place in response to
    an action initiated by the application being
    traced
  • HTTP request
  • Database query
  • File open request
  • We describe a request as
  • The sequence of application components involved
    in its processing
  • The resource consumed at each stage
  • CPU, bandwidth, disk transfer size, (latency)

8
A typical e-commerce site (1)
[Figure: Internet, Web Front Ends, SQL Servers, Storage]

9
A typical e-commerce site (2)
[Figure: component diagram of the two tiers. Web Server labels: IIS, Filter, Static Content, CLR, ASP.NET, Application Logic, ADO.NET. SQL Server labels: Stored procedures, Data. Each machine also shows the WinSock2 API and the Kernel.]

10
HTTP request: detailed view
[Figure: timeline of a single HTTP request. Labels: WEB.eec, WEB.398, SQL.9c4; Disk, Net RX and Net TX rows; time axis ticks at 10.051s, 10.100s and 10.155s. Key: IIS, ASP.NET, SQL, Disk, Blocked, Other.]

11
Why is request tracking hard?
  • Many components, multiple machines
  • Must track control flow across machines
  • No globally unique request ID
  • Components are developed independently
  • Multiple thread pools
  • Many threads participate in processing a request
  • Asynchronous communication
  • Must match send/recvs between threads/machines
  • Hand-rolled synchronization primitives
  • SQL server has user-mode scheduler

12
Outline
  • Introduction
  • What is a request?
  • Instrumentation
  • Request extraction
  • Modelling
  • Current status

13
Event Tracing for Windows
  • Low-overhead event mechanism
  • Events timestamped with cycle counter
  • Global ordering on events on a single machine
  • Can enable/disable sets of events at runtime
  • Using ETW in Magpie
  • Each instrumentation point posts an event
  • Events are logged to disk
  • Logs are post-processed to extract requests
  • Can also consume events in real time

14
Instrumentation points
  • Existing ETW event providers
  • IIS, kernel
  • App-specific hooks
  • IIS, ASP.NET, SQL Server
  • Detours
  • Wrap dlls to trap Win32 and WinSock2 calls
  • WinPcap
  • Capture packets on the wire

15
CPU usage from kernel events
  • The ETW kernel logger records every context
    switch
  • How do we know which cycles are used for which
    request?
  • We can attribute cycles to a request by
  • An application-specific event which occurs within
    a delimited sector of CPU time, or
  • The current context of execution, e.g. thread id
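
The second attribution rule (current context of execution) can be sketched as follows. This is a minimal illustration, not Magpie's implementation: the event tuple layout, the `thread_to_request` mapping and the function name are all assumptions made for the example, and a real trace would bind threads to requests dynamically rather than through a fixed table.

```python
from collections import defaultdict

def attribute_cycles(cswitches, thread_to_request, end_time):
    """Charge the cycles between consecutive context switches on one CPU
    to the request that the running thread is working for.

    cswitches: list of (timestamp_in_cycles, new_thread_id), sorted by time
               (hypothetical layout for this sketch).
    thread_to_request: dict thread id -> request id (assumed to be known).
    end_time: timestamp at which the trace ends.
    """
    usage = defaultdict(int)
    following = cswitches[1:] + [(end_time, None)]
    for (start, tid), (stop, _) in zip(cswitches, following):
        request = thread_to_request.get(tid)
        if request is not None:  # cycles of unrelated threads are ignored
            usage[request] += stop - start
    return dict(usage)

# Thread 0xa runs for 100 cycles, then 0xb for 150, then 0xa again for 50.
switches = [(0, 0xA), (100, 0xB), (250, 0xA)]
cycles = attribute_cycles(switches, {0xA: "request 1", 0xB: "request 2"}, 300)
```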

16
Example: protocol processing in a DPC
[Figure: timeline of events (cswitch, DPC start, pkt recv, DPC end, cswitch) with the cycle counts attributed to Request 1 and Request 2]

17
Application and middleware events
  • Cover points where flow of control moves between
    components
  • Cover points where resources are multiplexed and
    demultiplexed
  • E.g. user-level scheduling primitives
  • Propagation of a global request id is not
    required!
  • Magpie used to do this but not any more

18
Instrumenting a web service
[Figure: the Web Server / SQL Server component diagram (IIS, Filter, Static Content, CLR, ASP.NET, Application Logic, ADO.NET, Stored procedures, Data, WinSock2 API, Kernel) annotated with instrumentation labels: ISAPI Filter, HTTPModule, CLR profiler, Extended SPs, Intercept, Event Tracing for Windows, Packet capture]

19
Outline
  • Introduction
  • What is a request?
  • Instrumentation
  • Request extraction
  • Modelling
  • Current status

20
Generic request extraction
  • No inbuilt assumptions about the system or the
    application
  • No common unique identifier
  • Schema specifies semantics of events
  • Easy to add new event types
  • Parser stitches events into requests based on
    event semantics

21
Terminology
  • Namespace
  • Event parameter which references an entity in the
    system, e.g. thread id
  • Timeline
  • Instantiation of a namespace with a unique value,
    e.g. thread id 0xa
  • Events bind or unbind requests to timelines
  • Bindings capture the semantics of each event for
    a particular request type
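
The bind/unbind idea can be sketched in a few lines. This is an illustration under stated assumptions, not the Magpie parser: the event dictionaries, the three actions ("start", "bind", "unbind") and the function name `stitch` are hypothetical, and a real schema would attach bindings per event type rather than per event instance.

```python
def stitch(events):
    """Group events into requests by following timeline bindings.

    events: list of dicts with keys 'name', 'action' and 'timelines',
            where each timeline is a (namespace, value) pair such as
            ('tid', 0xa). The layout is an assumption of this sketch.
    """
    bound = {}     # timeline -> request id currently bound to it
    requests = {}  # request id -> ordered list of event names
    next_id = 0
    for ev in events:
        if ev["action"] == "start":
            next_id += 1
            rid = next_id
            requests[rid] = []
        else:
            # The event joins whichever request one of its timelines carries.
            rid = next((bound[t] for t in ev["timelines"] if t in bound), None)
        if rid is None:
            continue  # not part of any request seen so far
        requests[rid].append(ev["name"])
        for t in ev["timelines"]:
            if ev["action"] == "unbind":
                bound.pop(t, None)
            else:
                bound[t] = rid
    return requests

trace = [
    {"name": "TCP pkt", "action": "start", "timelines": [("connid", 0xD)]},
    {"name": "Recv returns", "action": "bind",
     "timelines": [("connid", 0xD), ("tid", 0xA)]},
    {"name": "Work", "action": "bind", "timelines": [("tid", 0xA)]},
    {"name": "Done", "action": "unbind", "timelines": [("tid", 0xA)]},
    {"name": "Unrelated", "action": "bind", "timelines": [("tid", 0xA)]},
]
stitched = stitch(trace)
```

Here the connection timeline carries the request to thread 0xa when the receive returns, so no request identifier has to be propagated by the application.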

22
Example: connecting events
[Figure: events (TCP pkt, DPC start, DPC end, cswitch, Enter Recv, Recv returns, cswitch) placed on the timelines Cpuid 0, Tid 0xa, Tid 0xb and Connid 0xd, and joined into Request 1 and Request 2]

23
End-to-end request extraction
  • An instance of the request parser runs on each
    machine in the distributed system
  • Online or offline mode
  • Offline post-processing connects request
    fragments from each node according to a globally
    unique namespace, e.g. packet IP identifier
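
A possible shape for the offline join step, assuming each per-machine fragment records the set of packet identifiers it sent or received. The fragment layout and the function name are hypothetical; the sketch simply merges fragments that share an identifier, using a small union-find.

```python
from collections import defaultdict

def join_fragments(fragments):
    """Merge per-machine request fragments that share a packet identifier.

    fragments: list of dicts with 'machine' and 'packet_ids' (a set).
    Returns the machines involved in each end-to-end request.
    """
    parent = list(range(len(fragments)))

    def find(i):
        # Find the representative fragment, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # packet id -> index of the first fragment that had it
    for i, frag in enumerate(fragments):
        for pid in frag["packet_ids"]:
            if pid in first_seen:
                parent[find(i)] = find(first_seen[pid])
            else:
                first_seen[pid] = i

    groups = defaultdict(list)
    for i, frag in enumerate(fragments):
        groups[find(i)].append(frag["machine"])
    return sorted(sorted(machines) for machines in groups.values())

fragments = [
    {"machine": "WEB", "packet_ids": {17, 18}},
    {"machine": "SQL", "packet_ids": {18}},
    {"machine": "WEB", "packet_ids": {99}},
]
joined = join_fragments(fragments)
```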

24
Outline
  • Introduction
  • What is a request?
  • Instrumentation
  • Request extraction
  • Modelling
  • Current status

25
Clustering for workload generation
  • Target the Indy performance modelling tool
  • Calculates throughput, bottlenecks
  • Needs transaction mix, resource consumption
  • Previously: microbenchmark approach
  • Run 10000 of each transaction type (URL)
  • Divide aggregate resource usage by 10000
  • Aim: provide realistic workload models
  • From real, mixed workloads
  • Derive transaction types automatically

26
Single request: cartoon view
  • Partial ordering of events
  • Annotated with resource usage

[Figure labels: IIS CPU, ASP.NET CPU, SQL Server CPU]

27
Behavioural clustering of requests
  • Represent requests as event strings
  • Flatten out any concurrency
  • Use Levenshtein string edit distance
  • Modified to factor in resource usage vectors
  • Cluster requests based on this distance
  • Linear-time algorithm
  • Each cluster is a request type
  • Select representative from near centroid
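
A rough sketch of the distance and clustering steps. The slides do not say how the edit distance is modified or which linear-time algorithm is used, so both choices here are assumptions: the resource term is simply added to the standard Levenshtein distance with a weight, and the clustering is a single-pass "leader" scheme in which each request joins the nearest existing cluster representative or starts a new cluster.

```python
def levenshtein(a, b):
    """Standard string edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def distance(r1, r2, weight=1.0):
    """Edit distance plus a weighted L1 gap between resource vectors.
    The additive form and the weight are assumptions of this sketch."""
    (s1, v1), (s2, v2) = r1, r2
    return levenshtein(s1, s2) + weight * sum(abs(x - y) for x, y in zip(v1, v2))

def cluster(requests, threshold):
    """Single pass: join the nearest leader within threshold, else lead."""
    leaders, labels = [], []
    for req in requests:
        best = min(range(len(leaders)),
                   key=lambda k: distance(req, leaders[k]), default=None)
        if best is not None and distance(req, leaders[best]) <= threshold:
            labels.append(best)
        else:
            leaders.append(req)
            labels.append(len(leaders) - 1)
    return labels

# Each request: (event string, (cpu, disk)) with made-up values.
requests = [("ABCD", (1.0, 0.0)), ("ABCE", (1.0, 0.5)),
            ("XYZ", (5.0, 5.0)), ("ABCD", (1.2, 0.0))]
labels = cluster(requests, threshold=2.0)
```

With a bounded number of clusters the pass is linear in the number of requests, although the result depends on arrival order.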

28
Build a workload model by clustering similar
requests
  • Requests in the same cluster often have
    different URLs, and one URL may appear in many
    clusters

[Figure: request clusters labelled A to E]

29
Taking it further: work in progress
  • Online and incremental modelling
  • Detect component failure
  • Detect sudden shifts in workload
  • More sophisticated models
  • Learn the probabilistic state machine for each
    request
  • cf. flowcharts annotated with performance
    information
  • Bayesian watchdogs
  • Compute the likelihood of a request's behaviour
    as it moves through the system
  • Deal with unlikely requests appropriately

30
Outline
  • Introduction
  • What is a request?
  • Instrumentation
  • Request extraction
  • Modelling
  • Current status

31
Current status
  • Recent focus has been developing a generic
    request extraction scheme
  • Prototype for 2-machine e-commerce site
  • TPC-W style workload
  • Prototype for single machine SQL Server 2000
  • Challenge is user mode scheduler
  • TPC-C workload
  • Other applications on the way
  • Large-scale
  • Real systems with real performance problems

32
Conclusion
  • Magpie is a tool for performance analysis in a
    distributed system
  • Bottom up, per-request approach
  • Complementary to existing techniques
  • Performance counters
  • Program profiling
  • Feeds into performance debugging and prediction
    tools

33
Work in progress: learning the probabilistic
state machine
  • Infer a stochastic context-free grammar from a
    sample set of strings
  • Each state transition emits a character and has
    an associated probability
  • Use the Alergia algorithm (Carrasco and Oncina, 1994)
  • Construct a prefix tree from the sample set
  • Merge similar subtrees
  • Apply to Magpie requests
  • Just event strings
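
The first two steps can be sketched as below: building a frequency-annotated prefix tree from event strings, and testing whether two subtrees look similar enough to merge. The similarity test is the Hoeffding-bound comparison usually associated with Alergia, with confidence parameter `alpha`; the class and function names are made up for this sketch, and the merge step itself is left out.

```python
import math

class Node:
    def __init__(self):
        self.count = 0      # strings that pass through this node
        self.ends = 0       # strings that end at this node
        self.children = {}  # symbol -> Node

def build_prefix_tree(samples):
    root = Node()
    for s in samples:
        node = root
        node.count += 1
        for ch in s:
            node = node.children.setdefault(ch, Node())
            node.count += 1
        node.ends += 1
    return root

def different(f1, n1, f2, n2, alpha):
    """True if the frequencies f1/n1 and f2/n2 differ by more than
    the Hoeffding bound allows at confidence parameter alpha."""
    if n1 == 0 or n2 == 0:
        return False
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) > bound

def compatible(a, b, alpha=0.05):
    """Recursively compare termination and transition frequencies."""
    if different(a.ends, a.count, b.ends, b.count, alpha):
        return False
    for ch in set(a.children) | set(b.children):
        ca, cb = a.children.get(ch), b.children.get(ch)
        fa = ca.count if ca is not None else 0
        fb = cb.count if cb is not None else 0
        if different(fa, a.count, fb, b.count, alpha):
            return False
        if ca is not None and cb is not None and not compatible(ca, cb, alpha):
            return False
    return True

tree = build_prefix_tree(["ab", "ab", "abb", "a"])
```

A larger `alpha` makes the test stricter, so fewer subtrees are judged similar.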

34
Ongoing work with Alergia
  • Tuning the similarity criterion
  • Factoring in resource usage information
  • Can we identify event sequences with suspiciously
    low probability?
  • Run online for anomaly detection?
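
One way the anomaly check could look, assuming a probabilistic state machine has already been learned. The model below is hand-written and entirely hypothetical, as are the state names, the dictionary layout and the threshold; the sketch only shows how a request's event string would be scored and flagged.

```python
import math

def log_probability(model, events, start="s0"):
    """Log-probability that the state machine emits exactly `events`."""
    state, total = start, 0.0
    for ev in events:
        step = model[state]["next"].get(ev)
        if step is None:
            return float("-inf")  # transition never seen in training
        prob, state = step
        total += math.log(prob)
    stop = model[state]["stop"]
    return total + math.log(stop) if stop > 0 else float("-inf")

def is_suspicious(model, events, threshold):
    return log_probability(model, events) < threshold

# Hypothetical model: every request starts with A, then usually B, rarely C.
model = {
    "s0": {"next": {"A": (1.0, "s1")}, "stop": 0.0},
    "s1": {"next": {"B": (0.95, "s2"), "C": (0.05, "s2")}, "stop": 0.0},
    "s2": {"next": {}, "stop": 1.0},
}
threshold = math.log(0.1)
```

Run online, the same score could be updated one event at a time as a request moves through the system.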