CS556: Distributed Systems - PowerPoint PPT Presentation

Slides: 21

Provided by: dimp9

Transcript and Presenter's Notes

Title: CS556: Distributed Systems

1
CS-556 Distributed Systems
Condor, Mariposa

Manolis Marazakis
maraz_at_csd.uoc.gr

2
Condor

Distributed batch processing
Large-scale numerical simulations
U. Wisconsin and Fermi Labs
Schedule job execution on idle workstations
More efficient resource utilization
Suspend execution when the owner begins to use the workstation
Either migrate to another idle workstation
Checkpoint of process state
Data and stack segments, CPU state, pending signals, open FDs
Executables need to be re-linked (not re-compiled)
Allow access to files even without a DFS
remote system calls
Unmodified OS kernel
There are some limitations
Or queue until an idle workstation is available

3
Checkpointing (I)

UNIX process state
Text
Data
Initialized (static) data
Segment starting at first page-size boundary
above text area
Uninitialized data
Heap
Grows toward higher addresses
Stack
Scratch space for function call mechanism
Automatic variables, arguments, return values
Complications with user-level thread packages
The stack may be in the process data segment
Kernel state
Signals, open FDs, registers
Mapped segments
Dynamically linked libraries

4
Checkpointing (II)

setjmp() and longjmp() for manipulating the stack
ioctl() in /proc file system to find mapped
memory segments
write() to produce checkpoint file
or write to socket
Ensure that dynamic library text and data are included
Must be at the same addresses at restart
mmap() for /dev/zero, read() to restore
checkpoint
Checkpoint initiated by signal (SEGV)
restart as if returning from signal handler
Re-open active file descriptors
Recorded using a modified version of open()
Uses syscall()
sigprocmask(), sigaction(), sigpending()

5
Checkpointing (III)

Condor init. handler is linked with the user's code
Initialize data structures and install signal handler checkpoint()
Invoke user's main()
Upon signal reception, perform checkpoint
Continue execution (periodic snapshot)
or vacate workstation
restart() invoked by Condor code
Instead of user's main()
Overwrites its own data segment
Returns to checkpoint(), which is a signal handler
When checkpoint() returns, the user's code is resumed
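
The checkpoint()/restart() control flow on this slide can be illustrated with a small sketch. It is only an analogy, not Condor's mechanism: Condor saves the data and stack segments of a re-linked UNIX executable, whereas this sketch pickles a hypothetical `state` dictionary. The names `state`, `CHECKPOINT_PATH`, `install_handler()` and `user_main()` are assumptions made for illustration, and the choice of SIGUSR1 is arbitrary.

```python
import os
import pickle
import signal
import tempfile

# Hypothetical application state. In Condor this would be the process's
# data and stack segments plus CPU state, not a Python dict.
state = {"step": 0, "partial_sum": 0}

CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "sketch.ckpt")


def checkpoint(signum=None, frame=None, path=CHECKPOINT_PATH):
    """Signal handler: write the current state to a checkpoint file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # rename last, so a crash never leaves a torn file


def restart(path=CHECKPOINT_PATH):
    """Overwrite the in-memory state with the checkpointed one."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
    state.clear()
    state.update(saved)


def install_handler(sig=None):
    """Install checkpoint() as a signal handler, as the init code would."""
    if sig is None:
        sig = signal.SIGUSR1  # assumption: POSIX host, arbitrary signal choice
    signal.signal(sig, checkpoint)


def user_main(steps):
    """Stand-in for the user's main(): a loop that resumes from `state`."""
    while state["step"] < steps:
        state["partial_sum"] += state["step"]
        state["step"] += 1
    return state["partial_sum"]
```

A fresh run would call install_handler() and then user_main(); a restarted run would call restart() first, so that user_main() continues from the saved step instead of from zero.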

6
Checkpointing (IV)

Shadow process
Handles RPCs for operations on file descriptors
opened on a host from which the process was
vacated
Useful when there is no common FS
Modified read() and write()
Using syscall()
Limitations
Communicating processes
sockets, signals, pipes
Programs that invoke fork() / exec()
What about code that cannot be re-linked ?
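
The shadow-process idea can be sketched as a pair of objects: stubs that replace the job's file calls, and a shadow that executes them on the submitting host. This is a simplified in-process model, not Condor's RPC layer. The class names `Shadow` and `RemoteSyscalls`, the request format, and the descriptor numbering are all assumptions, and a direct function call stands in for the network.

```python
class Shadow:
    """Runs on the submitting host and performs file operations locally."""

    def __init__(self):
        self._files = {}     # remote descriptor -> open file object
        self._next_fd = 3    # arbitrary starting number for this sketch

    def handle(self, op, *args):
        """Dispatch one remote system call request."""
        if op == "open":
            path, mode = args
            fd = self._next_fd
            self._next_fd += 1
            self._files[fd] = open(path, mode)
            return fd
        if op == "read":
            fd, size = args
            return self._files[fd].read(size)
        if op == "write":
            fd, data = args
            return self._files[fd].write(data)
        if op == "close":
            (fd,) = args
            self._files.pop(fd).close()
            return 0
        raise ValueError(f"unsupported remote call: {op}")


class RemoteSyscalls:
    """Stubs used by the job in place of the normal open/read/write."""

    def __init__(self, send):
        self._send = send  # callable that delivers a request to the shadow

    def open(self, path, mode="rb"):
        return self._send("open", path, mode)

    def read(self, fd, size):
        return self._send("read", fd, size)

    def write(self, fd, data):
        return self._send("write", fd, data)

    def close(self, fd):
        return self._send("close", fd)
```

Passing `Shadow().handle` as `send` wires the two together in one process; in a real deployment `send` would serialize the request and ship it to the host the job was submitted from.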

7
Top-down approaches to resource management

Optimization of system-wide metrics
Average response time and throughput
Overall
OR per-class
Representing requirements of individual
application classes
Centralized approaches
Resource Manager
Co-operative approaches
Consensus
Multicast
Consistent global state?

8
Resource management in open systems

Heterogeneity of applications and components
How to define a global performance metric?
Conflicting goals
Dynamic changes in the environment
Dynamic participation of providers
New application classes
Changes in resource consumption patterns
Communication overhead
Limited resilience to failures

9
Bottom-up resource management

Market
A system where independent individuals interact
via trading to achieve a fair allocation of
resources
Mechanisms and protocols
Price systems, auctions
Economic theory
Analytical framework for reasoning about
properties of resource allocations
No attempt to optimize a global metric
Each producer/consumer has its own goals and definition of optimality
Independent decisions
Competition, selfish optimization
Global coherent behavior emerges when an
equilibrium state is reached

10
Pareto optimality

An allocation is Pareto-optimal if no agent can improve on its allocation
without making some other agent worse off
No requirement for
Comparable preferences (utility functions)
Central co-ordination
Multiple independent optimization problems
One for each decision agent
Trading reveals no private state
Only exchanges acceptance/rejection of offers
Initial endowments to agents
Reflect relative priorities

11
General equilibrium

Perfect balance of supply and demand
For all traded goods
How to make allocation decisions ?
Some approaches require that an equilibrium state
is reached
Tâtonnement process (Walras, 1874)
Arbitrary ordering of resources
Adjust price of a resource so as to balance
demand with supply, given the prices of all other
resources
Multiple rounds, as a change in the price of a
resource may trigger a change of the excess
demand for all resources
Other approaches are more dynamic
Stock market metaphor
Basic assumption: an agent that selfishly competes will not voluntarily
trade with others unless it is made better off by trading
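
The multi-round price adjustment can be made concrete with a toy exchange economy. The sketch below is an illustration under stated assumptions, not the procedure of any particular system: agents have Cobb-Douglas preferences (each spends a fixed share of its wealth on every good), the last good serves as numeraire, and each price moves by a small step proportional to excess demand rather than being solved exactly in each round. The names `excess_demand`, `tatonnement`, `step` and `tol` are hypothetical.

```python
def excess_demand(prices, agents):
    """Aggregate demand minus aggregate supply for every good.

    Each agent is a (shares, endowment) pair with Cobb-Douglas preferences:
    it spends the fraction shares[j] of its wealth on good j.
    """
    n = len(prices)
    z = [0.0] * n
    for shares, endowment in agents:
        wealth = sum(p * e for p, e in zip(prices, endowment))
        for j in range(n):
            z[j] += shares[j] * wealth / prices[j] - endowment[j]
    return z


def tatonnement(agents, n_goods, step=0.5, tol=1e-9, max_rounds=10_000):
    """Adjust prices good by good until excess demand (nearly) vanishes.

    The last good is the numeraire (price fixed at 1), since only relative
    prices matter.
    """
    prices = [1.0] * n_goods
    for _ in range(max_rounds):
        for j in range(n_goods - 1):  # arbitrary but fixed ordering of goods
            z_j = excess_demand(prices, agents)[j]
            prices[j] = max(prices[j] + step * z_j, 1e-12)  # keep positive
        if max(abs(z) for z in excess_demand(prices, agents)) < tol:
            break
    return prices


agents = [((0.7, 0.3), (1.0, 0.0)),  # owns good 0, spends 70% on good 0
          ((0.4, 0.6), (0.0, 1.0))]  # owns good 1, spends 60% on good 1
prices = tatonnement(agents, n_goods=2)
```

For this two-agent example the market for good 0 clears when 0.7 + 0.4/p = 1, so the iteration should settle at a relative price of 4/3 for good 0.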

12
Auctions

Mechanisms for adjusting prices so as to match supply and demand
English auction
Dutch auction
Sealed bid
Double auction
Stock and commodity exchanges
Vickrey auction
Similar to sealed-bid, but the winner pays the 2nd-highest bid
The optimal strategy is to reveal true valuations,
thus avoiding multiple rounds
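
A Vickrey auction is short enough to sketch in full. The function name `vickrey_auction` and the tie-breaking rule (the alphabetically first bidder wins a tie) are assumptions made for this illustration.

```python
def vickrey_auction(bids):
    """Return (winner, price) for a sealed-bid second-price auction.

    bids maps a bidder name to its bid. The highest bidder wins but pays
    the second-highest bid. Ties are broken by bidder name (an assumption).
    """
    if len(bids) < 2:
        raise ValueError("need at least two bids")
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


result = vickrey_auction({"a": 10, "b": 7, "c": 3})
```

Here bidder "a" wins and pays 7. Because the price does not depend on the winner's own bid, shading a bid below the true valuation can only lose the auction, never lower the price paid.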

13
A load-balancing economy

Allocation of CPU time and link bandwidth between competing jobs
d_ij: delay over link (i, j)
r_i: processing rate of node i
µ_j: processing requirement of job j
Preferences of jobs
Min C_k: cost of processing at node k, including cost of
communication B/W from origin node to node k
Min ST_k: service time at node k
ST_k
Min C_k + a·ST_k
Auctions held by processors, bidding by the jobs
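
The job preferences listed above can be sketched as a single selection rule, assuming the third preference is a weighted sum of cost and service time with weight a. The function name `best_node` and the shape of `offers` are hypothetical, and the cost and service-time figures are supplied as inputs because the transcript does not preserve how ST_k is computed.

```python
def best_node(offers, a):
    """Pick the node that minimizes cost + a * service_time.

    offers maps a node id to a (cost, service_time) pair.
    a = 0 reduces to "minimize cost"; a large a approaches
    "minimize service time".
    """
    return min(offers, key=lambda k: offers[k][0] + a * offers[k][1])


offers = {"n1": (4.0, 10.0), "n2": (6.0, 5.0)}
cheapest = best_node(offers, a=0.0)   # price-sensitive job
balanced = best_node(offers, a=1.0)   # job that also values time
```

With these numbers a price-sensitive job picks n1 (cost 4 versus 6), while a job with a = 1 picks n2 (6 + 5 = 11 versus 4 + 10 = 14).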

14
A data management economy

Management of data migration and replication
Minimize expected Tx response time
Control variables
Number of copies of each data object
Assignment of copies to nodes
Pricing strategies of data suppliers
Txs pay for data access at a processor,
which leases copies of objects from data managers
The number of read-only copies adapts to the
read/write ratio
Without any further coordination

15
Mariposa

DDBMS built based on Distributed INGRES
Wide-area distribution
Local autonomy
Drops the conventional DDBMS assumptions
Static data allocation
Single administration authority
Uniformity in CPU, network connectivity, query
processing capacity
Microeconomic resource management
Query routing and scheduling
Replication of data fragments
Naming service
Execute queries within their budget
By contracting processing sites for query
fragments

16
Dynamic environment

Naming service
advertisements for available data objects
Replicated (not centralized)
Contract other instances to receive updates
(asynchronously)
A server can join the system by buying copies of
data objects and advertising its services
Per-site bidder and storage manager
Attempt to maximize profit per unit of processing time
Total autonomy
Some queries may not be completed
Some data objects may be dropped
Data mobility
No notion of home node

17
Replica management (I)

Storage managers contract others to receive
asynchronous notifications of updates
Define payment stream for updates delivered
within a specified time interval
Trade-off between currency of data and replication cost
Updates are merged
Data returned by queries may be out of date
by varying degrees
Budget: a non-increasing function of time
Administered by the query's broker
Obtains bids <C_i, D_i, E_i>
for each of the sub-queries in the query's
execution plan
C_i: cost for processing a sub-query within D_i sec
after its receipt
E_i: expiration time (validity period)
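
The broker's job can be sketched as a bid-selection routine. This is a deliberately simple illustration and not Mariposa's actual bidding algorithm: it assumes sub-queries run one after another (so delays add up), takes the cheapest unexpired bid for each sub-query, and accepts the plan only if the total cost fits the budget at the total delay. The names `select_bids`, `Bid` and the shape of `bids` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (site, cost C, delay D in seconds, expiration time E)
Bid = Tuple[str, float, float, float]


def select_bids(
    bids: Dict[str, List[Bid]],
    budget: Callable[[float], float],
    now: float,
) -> Optional[Dict[str, Bid]]:
    """Choose one bid per sub-query, or return None if the plan is too costly.

    budget(t) is the amount the user will pay for an answer delivered after
    t seconds. It is expected to be non-increasing in t.
    """
    chosen = {}
    for subquery, offers in bids.items():
        valid = [b for b in offers if b[3] >= now]   # drop expired bids
        if not valid:
            return None                               # nobody will run it
        chosen[subquery] = min(valid, key=lambda b: b[1])
    total_cost = sum(b[1] for b in chosen.values())
    total_delay = sum(b[2] for b in chosen.values())  # sequential execution
    if total_cost <= budget(total_delay):
        return chosen
    return None


def linear_budget(t: float) -> float:
    """Example budget: starts at 100 and loses 2 units per second."""
    return max(0.0, 100.0 - 2.0 * t)


bids = {
    "scan": [("s1", 20.0, 5.0, 50.0), ("s2", 10.0, 8.0, 50.0)],
    "join": [("s1", 30.0, 10.0, 50.0), ("s3", 25.0, 4.0, 1.0)],
}
plan = select_bids(bids, linear_budget, now=10.0)
```

In this example the bid from s3 has already expired, so the plan uses s2 for the scan and s1 for the join: total cost 40 after 18 seconds, which fits the budget of 64 available at that delay.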

18
Replica management (II)

Per-site billing rate for each data object
Allows site admin. to express bias towards
specific objects
Hot list
Data objects for which the site always issues
bids
Actual bid = Computed bid × Load average
Low price when idle
High price when overloaded

19
B/W management

Table with entries of the form <BW, t1, t2>
Available B/W between sites, for a given time
interval
Network bid requests
<Tx, request, from, to>
Network bid
<Tx, request, reservation price, time>
Step-by-step calculation of available B/W
Starting at destination
Propagation of B/W profiles
When the B/W profile reaches the source, it
provides the minimum B/W over all links along the
path
Source-to-destination pass to determine the price
for B/W along the path
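
The first pass, which collects the minimum available B/W along the path, can be sketched as follows. The data layout is an assumption: each link carries a table of (BW, t1, t2) entries with non-overlapping intervals, and time not covered by any entry counts as zero bandwidth. The function names `available_bw` and `path_bw` are hypothetical, and the pricing pass is not modelled.

```python
def available_bw(link_table, start, end):
    """Bandwidth one link can sustain over the whole interval [start, end).

    link_table is a list of (bw, t1, t2) entries. Requires start < end.
    """
    covered = start
    bw = float("inf")
    for entry_bw, t1, t2 in sorted(link_table, key=lambda e: e[1]):
        if t2 <= covered or t1 >= end:
            continue                 # entry lies outside the remaining window
        if t1 > covered:
            return 0.0               # gap: no bandwidth known for some time
        bw = min(bw, entry_bw)
        covered = t2
        if covered >= end:
            break
    return bw if covered >= end else 0.0


def path_bw(path_tables, start, end):
    """Minimum available bandwidth over all links of a path.

    path_tables is ordered from source to destination. The walk starts at
    the destination, and the value that reaches the source is the result.
    """
    profile = float("inf")
    for table in reversed(path_tables):
        profile = min(profile, available_bw(table, start, end))
    return profile


links = [
    [(100.0, 0, 20)],                  # link next to the source
    [(100.0, 0, 5), (40.0, 5, 20)],    # middle link, throttled after t=5
    [(60.0, 0, 20)],                   # link next to the destination
]
bottleneck = path_bw(links, 0, 10)
```

For the interval from 0 to 10 the middle link drops to 40 after t=5, so 40 is the bandwidth the whole path can guarantee.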

20
References

M. Litzkow, T. Tannenbaum, J. Basney, M. Livny,
"Checkpointing and migration of UNIX processes in
the Condor distributed processing system",
Technical Report 1346, U. Wisconsin-Madison,
Computer Sciences Dept., 1997.
M. Stonebraker, "Mariposa: A wide-area
distributed database system", VLDB Journal,
October, 1996.
