CS 603 Review

Slides: 68

Provided by: clif8

Learn more at: https://www.cs.purdue.edu

1
CS 603 Review

April 24, 2002

2
Seminar Announcements

Saurabh Bagchi, Hierarchical Error Detection in
a Distributed Software Implemented Fault
Tolerance (SIFT) Environment
April 25, 10:30-11:30, MSEE 239
Fabian E. Bustamante, The Active Streams
Approach to Adaptive Distributed Systems
April 29, 10:30-11:30, CS 101

3
Review

Why do we want distributed systems?
Scaling
Heterogeneity
Geographic Distribution
What is a distributed system?
Transparency vs. Exposing Distribution
Hardware Basics
Communication Mechanisms

4
Basic Software Concepts

Hiding vs. Exposing
Distribution hidden → Distributed OS
Location hidden, but not distribution → Middleware
Nothing hidden → Network OS
Concurrency Primitives
Semaphores
Monitors
Distributed System Models
Client-Server
Multi-Tier
Peer to Peer

5
Communication Mechanisms

Shared Memory
Enforcement of single-system view
Delayed consistency d-Common Storage
Message Passing
Reliability and its limits
Stream-oriented Communications
Remote Procedure Call
Remote Method Invocation

6
RPC Mechanisms

DCE
Language / Platform Independent
Implementation Issues
Data Conversion
Underlying Mechanisms
Fault Tolerance Approaches
Java RMI
SOAP
Interoperable
Language independent
Transport independent (anything that moves XML)

7
Naming Requirements

Disambiguate only
Access resource given the name
Build a name to find a resource
Do humans need to use name?
Static/Dynamic Resource
Performance Requirements

8
Registry Example: X.500

Goal: Global white pages
Lookup anyone, anywhere
Developed by Telecommunications Industry
ISO standard directory for OSI networks
Idea: Distributed Directory
Application uses Directory User Agent to access a
Directory Access Point
Basis for LDAP, ActiveDirectory

9
Directory Information Base (X.501)

Tree structure
Root is entire directory
Levels are groups
Country
Organization
Individual
Entry structure
Unique name
Built from tree
Attributes: Type/value pairs
Schema enforces type rules
Alias entries

10
X.500

Directory Entry
Organization level: CN=Purdue University, L=West Lafayette
Person level: CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor
Directory Operations
Query, Modify
Authorization / Access control
To directory
Directory as mechanism to implement for others

11
X.500 Distributed Directory

Directory System Agent
Referrals
Replication
Cache vs. Shadow copy
Access control
Modifications at Master only
Consistency
Each entry must be internally consistent
DSA giving copy must identify as copy

12
Clock Synchronization

Definition: All nodes agree on time
What do we mean by time?
What do we mean by agree?
Lamport definition: Events
Events partially ordered
Clock counts the order

13
Event-based definition (Lamport 78)

Define partial order of events
$A \to B$: A happened before B. Smallest relation such that
If A and B are in the same process and A occurs first, $A \to B$
If A is the sending of a message and B is the receipt of that message, $A \to B$
If $A \to B$ and $B \to C$, then $A \to C$
Clock $C(x)$: time at which x occurs
$C(x) = C_i(x)$, where x runs on node i
Clocks correct if $\forall a, b: a \to b \Rightarrow C(a) < C(b)$

14
Lamport Clock Implementation

Node i increments $C_i$ between any two successive events
If event a is the sending of a message m from i to j, m contains timestamp $T_m = C_i(a)$
Upon receiving m, set $C_j \ge$ current $C_j$ and $C_j > T_m$
Can now define total ordering: $a \Rightarrow b$ iff
$C_i(a) < C_j(b)$, or
$C_i(a) = C_j(b)$ and $P_i < P_j$
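
A minimal Python sketch of the clock rules above, assuming a single-threaded simulation in which "sending a message" just returns its timestamp; the class and function names are hypothetical. The receive step folds "set C_j at least its current value and greater than T_m" together with the per-event increment into max(C_j, T_m) + 1, and ties in the total order are broken by process number.

```python
from dataclasses import dataclass


@dataclass
class LamportProcess:
    """Node i with logical clock C_i (illustrative sketch, hypothetical names)."""
    pid: int        # process number P_i, used only to break ties
    clock: int = 0  # C_i

    def local_event(self) -> int:
        # Increment C_i between any two successive events
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is an event; the message carries timestamp T_m = C_i(a)
        return self.local_event()

    def receive(self, t_m: int) -> int:
        # C_j becomes >= its current value and > T_m; receipt is itself an event
        self.clock = max(self.clock, t_m) + 1
        return self.clock


def total_order_key(timestamp: int, pid: int) -> tuple:
    # a => b iff C_i(a) < C_j(b), or C_i(a) == C_j(b) and P_i < P_j
    return (timestamp, pid)


p1, p2 = LamportProcess(pid=1), LamportProcess(pid=2)
a = p1.send()         # C_1 = 1, message carries T_m = 1
x = p2.local_event()  # concurrent event at p2, C_2 = 1
b = p2.receive(a)     # C_2 = max(1, 1) + 1 = 2, so C(a) < C(b)
events = [(b, p2.pid), (a, p1.pid), (x, p2.pid)]
print(sorted(events, key=lambda e: total_order_key(*e)))  # [(1, 1), (1, 2), (2, 2)]
```

Events a and x carry the same clock value because neither happened before the other; the process number gives them a consistent, if arbitrary, place in the total order.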

15
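What if we want wall clock time?

$C_i$ must run at correct rate
$\exists \kappa \ll 1$ such that $|dC_i(t)/dt - 1| < \kappa$
Synchronized
$\exists$ small $\epsilon$ such that $\forall i, j: |C_i(t) - C_j(t)| < \epsilon$
Assume transmission time between $\mu$ and $\mu + \xi$
Algorithm: Upon receiving message m, set $C_j(t) = \max(C_j(t), T_m + \mu)$
Theorem: Assume every $\tau$ seconds a message with unpredictable delay less than $\xi$ is sent over every arc, and let d be the diameter of the network graph.
Then $\forall t \gtrsim t_0 + \tau d$, $\epsilon \approx d(2\kappa\tau + \xi)$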
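
The receive rule and the theorem's error bound can be written out directly. This is a sketch under stated assumptions: clock values are plain numbers in milliseconds, mu is the minimum transmission delay, d is the diameter of the process graph, and the function names are hypothetical.

```python
def on_receive(c_j: float, t_m: float, mu: float) -> float:
    """Receive rule: C_j(t) = max(C_j(t), T_m + mu), mu = minimum transmission delay."""
    return max(c_j, t_m + mu)


def skew_bound(d: int, kappa: float, tau: float, xi: float) -> float:
    """Approximate epsilon = d * (2 * kappa * tau + xi) from the theorem."""
    return d * (2 * kappa * tau + xi)


# Hypothetical values in milliseconds, for illustration only
print(on_receive(c_j=1000, t_m=995, mu=10))  # 1005: receiver's clock jumps forward
print(on_receive(c_j=1010, t_m=995, mu=10))  # 1010: receiver already ahead, unchanged
print(skew_bound(d=3, kappa=1e-6, tau=10_000, xi=5))  # roughly 15.06 ms
```

Because the rule takes a maximum, a receiver's clock is only ever pushed forward, never set back. The bound also shows where the skew comes from: drift accumulated between messages (2κτ) plus the unpredictable part of the delay (ξ), multiplied by the number of hops d.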

16
Clock Synchronization: Limits

Best possible: Delay uncertainty
Actually $\epsilon(1 - 1/n)$
Synchronization with Faults
Faulty clock
Communication Failure
Malicious processor
Worst case: Can only synchronize if fewer than 1/3 of processors are faulty
Better if clocks can be authenticated

17
Process Synchronization

Problem: Shared Resources
Model as sequential or parallel process
Assumes global state!
Alternative: Mutual Exclusion when Needed
Coordinator approach
Token Passing
Timestamp
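
A minimal sketch of the coordinator approach, assuming one shared resource and reliable messages modeled as method calls (the class and method names are hypothetical). A FIFO queue makes the scheme fair and prevents starvation, but the coordinator itself is a single point of failure.

```python
from collections import deque
from typing import Deque, Optional


class Coordinator:
    """Central lock manager for one shared resource (illustrative sketch)."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: Deque[str] = deque()  # FIFO order gives fairness

    def request(self, pid: str) -> bool:
        """Grant immediately if free; otherwise queue the request."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid: str) -> Optional[str]:
        """Release the resource and grant it to the next waiter, if any."""
        if self.holder != pid:
            raise ValueError(f"{pid} does not hold the resource")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
print(c.request("A"), c.request("B"), c.request("C"))  # True False False
print(c.release("A"))  # B
print(c.release("B"))  # C
```

Each entry to the critical section costs three messages (request, grant, release), which is why the approach is simple but scales only as far as the coordinator does.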

18
Mutual Exclusion

Requirements
Does it guarantee mutual exclusion?
Does it prevent starvation?
Is it fair?
Does it scale?
Does it handle failures?

19
Mutual Exclusion: Colored Ticket Algorithm

Goals
Decentralized
Fair
Fault tolerant
Space Efficient
Idea: Numbered Tickets
Next number gets resource
Problem: Unbounded Space
Solution: Reissue blocks
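
The "numbered tickets" idea can be sketched with a shared dispenser, assuming a single process simulates all clients. This is the plain ticket scheme, not the colored ticket algorithm itself, and the names are hypothetical; it shows the counter growing without bound, which is the space problem that reissuing ticket blocks addresses.

```python
import itertools


class TicketDispenser:
    """Plain numbered tickets: the next number in line gets the resource."""

    def __init__(self) -> None:
        self._next_ticket = itertools.count()  # unbounded counter
        self.now_serving = 0

    def take(self) -> int:
        return next(self._next_ticket)

    def may_enter(self, ticket: int) -> bool:
        return ticket == self.now_serving

    def done(self) -> None:
        self.now_serving += 1  # hand the resource to the next ticket holder


d = TicketDispenser()
t_a, t_b = d.take(), d.take()
print(d.may_enter(t_a), d.may_enter(t_b))  # True False
d.done()
print(d.may_enter(t_b))  # True
```

Service is strictly first-come, first-served, so the scheme is fair; the cost is that both counters keep growing for as long as the system runs.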

20
Multi-Resource Mutual Exclusion

New Problem: Deadlock
Processes using all resources
Each needs additional resource to proceed
Dining Philosophers Problem
Coordinated vs. truly distributed solutions
Problems with deterministic solutions
Probabilistic solution: Lehmann-Rabin
Starvation / fairness properties
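
A simplified sketch in the spirit of the Lehmann-Rabin randomized solution, using Python threads: each philosopher picks its first fork at random, waits for it, then takes the second fork only if it is free, otherwise puts the first one down and tries again. Randomizing the choice breaks the symmetry that defeats deterministic solutions. The sketch omits the additional mechanism Lehmann and Rabin use for lockout-freedom, and philosopher/dine are hypothetical names.

```python
import random
import threading
import time
from typing import List


def philosopher(i: int, forks: List[threading.Lock], meals: List[int],
                rounds: int, rng: random.Random) -> None:
    n = len(forks)
    left, right = forks[i], forks[(i + 1) % n]
    while meals[i] < rounds:
        # Free choice: pick the first fork at random
        first, second = (left, right) if rng.random() < 0.5 else (right, left)
        with first:                             # wait for the chosen fork
            if second.acquire(blocking=False):  # take the other only if free
                try:
                    meals[i] += 1               # eat
                finally:
                    second.release()
        # If the second fork was busy, the first has been put down; retry
        time.sleep(0)                           # let neighbours run


def dine(n: int = 5, rounds: int = 100) -> List[int]:
    forks = [threading.Lock() for _ in range(n)]
    meals = [0] * n
    threads = [threading.Thread(target=philosopher,
                                args=(i, forks, meals, rounds, random.Random(i)))
               for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals


print(dine())  # [100, 100, 100, 100, 100]
```

No philosopher ever waits on its second fork while holding the first, so the circular wait behind deadlock cannot form; what remains is the chance of repeated collisions, which randomization makes vanishingly unlikely.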

21
Distributed Transactions

ACID properties
Issues
Commit Protocols
Fault Tolerance
Why is this enough?
Failure Models and Limitations
Mechanisms
Two-phase commit
Three-phase commit

22
Two-Phase Commit (Lampson 76, Gray 79)

Central coordinator initiates protocol
Phase 1
Coordinator asks if participants can commit
Participants respond yes/no
Phase 2
If all votes yes, coordinator sends Commit; otherwise it sends Abort
Participants respond when done
Blocks on failure
Participants must replace coordinator
If participant and coordinator fail, wait for
recovery
While blocked, transaction must remain Isolated
Prevents other transactions from completing
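
The coordinator's side of the protocol can be sketched as follows, assuming votes and messages are synchronous function calls (all names hypothetical). Timeouts, stable logging, and the phase-2 acknowledgements are omitted.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Vote(Enum):
    YES = "yes"
    NO = "no"


def two_phase_commit(participants: Dict[str, Callable[[], Vote]],
                     send: Callable[[str, str], None]) -> str:
    """Coordinator side of 2PC (sketch: no timeouts, logging, acks, or recovery)."""
    # Phase 1: ask every participant whether it can commit and collect votes
    votes = {name: ask() for name, ask in participants.items()}
    # Phase 2: commit only if every vote is yes; a single no means abort
    decision = "commit" if all(v is Vote.YES for v in votes.values()) else "abort"
    for name in participants:
        send(name, decision)
    return decision


outbox: List[Tuple[str, str]] = []


def record(site: str, msg: str) -> None:
    outbox.append((site, msg))


print(two_phase_commit({"A": lambda: Vote.YES, "B": lambda: Vote.YES}, record))  # commit
print(two_phase_commit({"A": lambda: Vote.YES, "B": lambda: Vote.NO}, record))   # abort
print(outbox)  # [('A', 'commit'), ('B', 'commit'), ('A', 'abort'), ('B', 'abort')]
```

The blocking problem is visible in the structure: a participant that has voted yes has no rule for deciding on its own if the coordinator disappears between the two phases.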

23
Transaction Model

Transaction Model
Global Transaction State
Reachable State Graph
Local states potentially concurrent if a
reachable global state contains both local states
Concurrency set C(s): all states potentially concurrent with s
Sender set S(s): local states {t | t sends m and s can receive m}
Failure Model
Site failure assumed when expected message not
received in time
Independent Recovery

24
Problems with 2-PC

Blocking on failure
3-PC as solution
Theorems on recovery limits
Independent recovery: No two-site failure
Non-independent recovery
Anything short of total failure okay
Recovery protocol for total failure

25
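3PC assuming timeout on receipt of message
[Figure: Coordinator and Participant state diagrams; transitions labeled message received / message sent]
Coordinator: q1 → w1 (xact request / start xact); w1 → abort (no / abort); w1 → p1 (yes / pre-commit); p1 → commit (ack / commit)
Participant: q2 → abort (start xact / no); q2 → w2 (start xact / yes); w2 → abort (abort / -); w2 → p2 (pre-commit / ack); p2 → commit (commit / -)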
26
Termination Protocol

If participant times out in w2 or p2
Elect new Coordinator
If coordinator alive, would have
committed/aborted
New coordinator requests state of all processes.
Termination rules
If any aborted, broadcast abort
If any committed, broadcast commit
If all w2, broadcast abort
If any p2, send pre-commit and enter state p1
Complete failure protocol
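
The termination rules map to a small decision function. This is a sketch, assuming the new coordinator has already collected the local states of the operational sites as strings such as "w2" or "p2" (hypothetical labels); election and the complete-failure protocol are not modeled. The checks run in the order the rules are listed.

```python
from typing import Iterable


def termination_decision(states: Iterable[str]) -> str:
    """New coordinator's decision from the reported local states (sketch)."""
    states = list(states)
    if "aborted" in states:
        return "broadcast abort"
    if "committed" in states:
        return "broadcast commit"
    if all(s == "w2" for s in states):
        return "broadcast abort"
    if "p2" in states:
        # Move every site into the pre-commit state first, then commit
        return "send pre-commit, enter p1"
    raise ValueError(f"states not covered by these rules: {states}")


print(termination_decision(["w2", "w2", "w2"]))   # broadcast abort
print(termination_decision(["w2", "p2", "p2"]))   # send pre-commit, enter p1
print(termination_decision(["p2", "committed"]))  # broadcast commit
```

Aborting when everyone is in w2 is safe because no site can have committed without first passing through p2; that extra state is what lets 3PC terminate without blocking.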

27
Data Replication

Fault Tolerance
Hot backup
Catastrophic failure
Performance
Parallelism
Decreased reliance on network
Correctness criterion: Replication invisible
One-copy serializability (1SR)

28
Data Replication How?

Goal: Ensure one-copy serializability
Write-all solution: All copies identical
Write goes to every site
Read from any site
Standard single-copy concurrency control
Guarantees 1SR
Single-copy concurrency control gives
serializable execution
Equivalent to serial execution where all writes
happen in one transaction
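
A minimal sketch of write-all / read-any replication over in-memory sites. It assumes no concurrency control (a real system still needs single-copy locking to obtain 1SR), and the names are hypothetical. A write refuses to proceed if any site is down, which is the blocking problem raised by site failures.

```python
from typing import Dict, List


class WriteAllReplicas:
    """Write-all / read-any replication across in-memory sites (sketch)."""

    def __init__(self, site_names: List[str]) -> None:
        self.copies: Dict[str, Dict[str, int]] = {s: {} for s in site_names}
        self.up = set(site_names)

    def write(self, item: str, value: int) -> None:
        down = sorted(set(self.copies) - self.up)
        if down:
            # Write-all cannot finish while any copy is unreachable: it blocks
            raise RuntimeError(f"write blocked, sites down: {down}")
        for store in self.copies.values():
            store[item] = value

    def read(self, item: str, site: str) -> int:
        return self.copies[site][item]  # all copies identical, so any site works


r = WriteAllReplicas(["A", "B", "C"])
r.write("x", 3)
r.write("x", 5)
print([r.read("x", s) for s in "ABC"])  # [5, 5, 5]
r.up.discard("B")
try:
    r.write("x", 7)
except RuntimeError as err:
    print(err)  # write blocked, sites down: ['B']
```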

29
Write All Approach
[Figure: Writer updates all copies (values 3 and 5 shown); Reader reads any one copy]
30
Problem: Site Failure

Failure causes write to block
Must maintain locks
Clogs up entire system
Is this fault tolerance?
What about write all available?
T0: w0[xA] w0[xB] w0[yC] c0
B fails
T1: r1[yC] w1[xA] c1
B recovers
T2: r2[xB] w2[yC] c2
What is the serial equivalent order?
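
The question about the write-all-available history can be checked mechanically. The sketch below, with hypothetical helper names, records which transaction wrote each physical copy a read saw, then tries every serial order on a single logical copy and keeps those producing the same reads-from relation (a necessary condition for equivalence).

```python
from itertools import permutations

# (transaction, op, item, site) in execution order. B was down during T1,
# so T1's write of x reached only the copy at A.
HISTORY = [
    ("T0", "w", "x", "A"), ("T0", "w", "x", "B"), ("T0", "w", "y", "C"),
    ("T1", "r", "y", "C"), ("T1", "w", "x", "A"),
    ("T2", "r", "x", "B"), ("T2", "w", "y", "C"),
]


def actual_reads_from(history):
    """(reader, item, writer) triples for the physical copies actually read."""
    last_writer, result = {}, set()
    for txn, op, item, site in history:
        if op == "w":
            last_writer[(item, site)] = txn
        else:
            result.add((txn, item, last_writer.get((item, site))))
    return result


def serial_reads_from(order, history):
    """Reads-from triples if the transactions ran serially on ONE copy."""
    last_writer, result = {}, set()
    for t in order:
        for txn, op, item, _site in history:
            if txn != t:
                continue
            if op == "w":
                last_writer[item] = t
            else:
                result.add((t, item, last_writer.get(item)))
    return result


target = actual_reads_from(HISTORY)
matches = [order for order in permutations(["T0", "T1", "T2"])
           if serial_reads_from(order, HISTORY) == target]
print(matches)  # [] : no one-copy serial order is equivalent
```

No order survives: T2 read T0's x, so T2 must precede T1, while T1 read T0's y, so T1 must precede T2.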

31
Write All Available Fails: Even if no recovery!
32
Solutions

Validate availability on commit
Check if any failed writes now available
Check that all sites read or written still
available
Enforces serializability for site failures
Doesn't work with communication failures!

33
Formalisms for Relaxed Consistency

Goal: Relaxed consistency constraints
Meet application needs
Outperform true transparent replication
How do we ensure constraints meet needs?
Formalisms to describe application needs
Methods to prove constraints adequate

34
Quasi-Copies (Alonso, Barbará, Garcia-Molina 90)

Data Caching
Each site keeps copy of data likely to be used
locally
Propagation cost of writes high
User-Defined Cache
Controlled Divergence
Weak consistency constraints
Bounds on the differences between copies
User defines constraints

35
Assumptions

Read-only copies
Updates sent to master copy
E.g., Oracle materialized views
User Specified Coherency
Strict limits
Hints
Example: Stock Purchase
Place order based on delayed price
Limit order to ensure price paid okay

36
Selection Conditions

Identification clause
Select/Project Query
Modifier Clause
Add / drop from cache
Compulsory or advisory cache
Static / Dynamic: As new objects meet the identification clause, are they cached?
Triggering delay on dynamic

37
Coherency Conditions

Default (always enforced): Value was true once
Delay W(x, a): Max time lag
Version V(x): Number of updates
Periodic P(x): Time for refresh
Arithmetic A(x): Bounded difference
Combine conditions with logical operators
Multi-object conditions
Consistency conditions on a group
Order of application in a group
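
The single-object coherency conditions can be expressed as predicates. This sketch is written from an omniscient viewpoint that sees both the master and the cached copy (a real cache would enforce the conditions by scheduling refreshes); the class, function, and parameter names are hypothetical, and the periodic condition is omitted.

```python
from dataclasses import dataclass


@dataclass
class Master:
    value: float
    version: int          # number of updates applied at the master


@dataclass
class QuasiCopy:
    value: float
    version: int          # master version this cached copy reflects
    refreshed_at: float   # time the copy was last refreshed from the master


def delay_ok(c: QuasiCopy, now: float, max_lag: float) -> bool:
    # Delay: the cached value was the master's value no more than max_lag ago
    return now - c.refreshed_at <= max_lag


def version_ok(c: QuasiCopy, m: Master, max_versions: int) -> bool:
    # Version: at most max_versions master updates not yet in the copy
    return m.version - c.version <= max_versions


def arithmetic_ok(c: QuasiCopy, m: Master, max_diff: float) -> bool:
    # Arithmetic: copy's value within max_diff of the master's value
    return abs(m.value - c.value) <= max_diff


# Stock-price example with hypothetical numbers
master = Master(value=101.5, version=12)
cached = QuasiCopy(value=100.0, version=10, refreshed_at=50.0)
now = 58.0
print(delay_ok(cached, now, max_lag=10.0))          # True
print(version_ok(cached, master, max_versions=3))   # True
print(arithmetic_ok(cached, master, max_diff=1.0))  # False
# Conditions combine with logical operators, e.g. "delay AND arithmetic":
print(delay_ok(cached, now, 10.0) and arithmetic_ok(cached, master, 1.0))  # False
```

In the stock-purchase setting, the arithmetic condition is the one that protects a limit order: a copy that is recent enough may still be too far from the current price.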

38
CS 603 Review

April 26, 2002

39
Remote Operation Mechanisms

Client-Server Model
Remote Procedure Call
Problem: Remote site must already know what we want to do!
Process consists of
Code
Resources (files, devices, etc.)
Execution (data, stack, registers, etc.)
Fork copies everything
Is this needed?
Solution: Copy part of the process

40
So where are we?

Models for Remote Processing
Server: Request documented service
RPC: Request execution of existing procedure
What if the operation we want isn't offered remotely?
Solution: Agents / Code Migration

41
Types of Code Migration
From Andrew Tanenbaum, Distributed Operating
Systems, 1995.
42
Types of Code Migration

Weak mobility: Copy only code
Program starts from initial state
Example: Java applets
Strong mobility: Copy code and execution
Resume execution where it stopped
But doesn't necessarily have same resources (less than fork)
Example: D'Agents (later), cluster computing (Condor, LSF)

43
Types of Code Migration

Sender Initiated
Receiver Initiated
Examples
Java Applets
Receiver Initiated
Cluster computing
Sender Initiated?
Central manager initiated?

44
Types of Code Migration

Where executed?
In target process
In new process
Strong Mobility: Move vs. Copy
Migrate process: Ceases at originating site
Clone process: Two copies in parallel

45
Resource Binding
                       Resource-to-machine binding
Process-to-resource    Unattached        Fastened            Fixed
By identifier          Move              Global reference    Global reference
By value               Copy value        Global reference    Global reference
By type                Rebind locally    Rebind locally      Rebind locally
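
The binding table can be encoded as a lookup, which makes the migration decision explicit. The keys and action strings are hypothetical labels, and only the single action listed in each table cell is recorded.

```python
# (process-to-resource binding, resource-to-machine binding) -> action on migration
ACTION = {
    ("identifier", "unattached"): "move",
    ("identifier", "fastened"): "global reference",
    ("identifier", "fixed"): "global reference",
    ("value", "unattached"): "copy value",
    ("value", "fastened"): "global reference",
    ("value", "fixed"): "global reference",
    ("type", "unattached"): "rebind locally",
    ("type", "fastened"): "rebind locally",
    ("type", "fixed"): "rebind locally",
}


def migration_action(binding: str, attachment: str) -> str:
    """Look up what happens to a resource when the process migrates."""
    return ACTION[(binding, attachment)]


print(migration_action("identifier", "fastened"))  # global reference (e.g., a database)
print(migration_action("type", "fixed"))           # rebind locally (e.g., a local printer)
```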
46
The Hard Part: Resources

Migrated process still needs resources
Options to Connect to a Resource (Fuggetta et al., 1998)
Binding by identifier (e.g., URL)
Attach to the same resource
Binding by value (e.g., standard libraries)
Bind to equivalent resource
Binding by type (e.g., local printer)
Bind to resource with same function

47
The Hard Part: Resources

Alternative: Move the Resource
Unattached resources (e.g., data files)
Relatively easy to move
Fastened resource (e.g., database)
Expensive to move
Fixed resource (e.g., communications endpoint)
Can't be moved

48
DCOM: What is it?

Start with COM: Component Object Model
Language-independent object interface
Add interprocess communication

49
DCOM: Distributed COM

Looks like COM to the client
Built on DCE RPC
Extends to support full COM functionality

50
DCOM Architecture
51
Locating Objects: Activation

CoCreateInstance(Ex)(<CLSID>)
Interface pointer to uninitialized instance
Same as COM
CoGetInstanceFromFile, CoGetInstanceFromIStorage
Create new instance
CoGetClassObject(<CLSID>)
Factory object that creates objects of <CLSID>
CoGetClassObjectFromURL
Downloads necessary code from URL and instantiates
Can take server name as parameter
Or default to server specified in DCOM configuration on client machine
HKEY_CLASSES_ROOT\APPID\<appid-guid>
"RemoteServerName"="<DNS name>"
Also store information in ActiveDirectory

52
DCOM vs. CORBA

CORBA
Single interface name
Multiple inheritance
Dynamic Invocation Interface
C++-style Exception Handling
Explicit and Implicit reference counts
Implemented by ORB with replaceable services

DCOM
Distinction between Class and Instance Identifier
Implement multiple interfaces
Type libraries for on-demand marshaling
32-bit Error Code
Explicit reference count only
Implemented by many independent services

53
What is .NET?

Language for distributed computation
C#, VB.NET, JScript
Protocols
SOAP, HTTP
Run-time environment
Common Language Runtime (CLR)
ActiveDirectory
Web Servers (ASP.NET)

54
COM/DCOM → .NET

DCOM
IDL
Name, Monikers
Registry / ActiveDirectory
C++, Visual Basic
DCE RPC
DCOM Network protocol (based on DCE standards)

.NET
Web Services Description Language (WSDL)
DISCO (URI grammar)
Universal Description Discovery and Integration
(UDDI)
C#, VB.NET
SOAP
HTTP (presumed ubiquitous), SMTP (!?)

55
How .NET works

Query UDDI directory to get service location
Query service to get WSDL (interface
specification)
Build call (XML) based on WSDL spec.
Make call using SOAP
Parse XML results based on WSDL spec.
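
The "build call, make call, parse results" steps can be sketched with Python's standard xml.etree.ElementTree. Assumptions: the SOAP 1.1 envelope namespace, a hypothetical service namespace urn:example:stockquote, a hypothetical GetQuote operation, and a canned response in place of the HTTP round trip; the UDDI and WSDL lookups are not modeled.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:stockquote"                      # hypothetical service namespace


def build_call(operation: str, **params: str) -> bytes:
    """Build a SOAP request for operation (the shape a WSDL spec would dictate)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def parse_result(xml_bytes: bytes, element: str) -> str:
    """Pull one result element out of a SOAP message."""
    node = ET.fromstring(xml_bytes).find(f".//{{{SVC_NS}}}{element}")
    if node is None or node.text is None:
        raise ValueError(f"{element} not found")
    return node.text


request = build_call("GetQuote", symbol="MSFT")  # would be sent over HTTP
response = (                                     # canned reply for illustration
    f'<soap:Envelope xmlns:soap="{SOAP_NS}"><soap:Body>'
    f'<q:GetQuoteResponse xmlns:q="{SVC_NS}"><q:price>42.0</q:price>'
    "</q:GetQuoteResponse></soap:Body></soap:Envelope>"
).encode("utf-8")
print(parse_result(response, "price"))  # 42.0
```

Everything crossing the wire is plain XML text, which is what makes SOAP transport and language independent.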

56
Jini: Java Middleware

Tools to construct federation
Multiple devices, each with Java Virtual Machine
Multiple services
Uses (doesn't replace) Java RMI
Adds infrastructure to support distribution
Registration
Lookup
Security

57
Service

Basic unit of JINI system
Members provide services
Federate to share access to services
Services combined to accomplish tasks
Communicate using service protocol
Initial set defined
Add more on the fly

58
Infrastructure: Key Components

RMI
Basic communication model
Distributed Security System
Integrated with RMI
Extends JVM security model
Discovery/join protocol
How to register and advertise services
Lookup services
Returns object implementing service (really a
local proxy)

59
Programming Model

Lookup
Leasing
Extends Java reference with notion of time
Events
Extends JavaBeans event model
Adds third-party transfer, delivery and
timeliness guarantees, possibility of delay
Transaction Interfaces

60
Jini Component Categories

Infrastructure: Base features
Programming Model: How you use them
Services: What you build
Java / Jini Comparison

61
Failure Models

Failure: System doesn't give desired behavior
Component-level failure (can compensate)
System-level failure (incorrect result)
Fault: Cause of failure (component-level)
Transient: Not repeatable
Intermittent: Repeats, but (apparently) independent of system operations
Permanent: Exists until component repaired
Failure Model: How the system behaves when it doesn't behave properly

62
Failure Model (Flaviu Cristian, 1991)

Dependency
Proper operation of Database depends on proper
operation of processor, disk
Failure Classification
Type of response to failure
Failure semantics
State of system after given class of failure
Failure masking
High-level operation succeeds even if it depends on failed services

63
Failure Classification

Correct
In response to inputs, behaves in a manner
consistent with the service specification
Omission Failure
Doesnt respond to input
Crash: After first omission failure, subsequent requests result in omission failure
Timing failure (early, late)
Correct response, but outside required time window
Response failure
Value: Wrong output for inputs
State Transition: Server ends in wrong state

64
Crash Failure types (based on recovery behavior)

Amnesia
Server recovers to predefined state independent
of operations before crash
Partial amnesia
Some part of state is as before crash, rest to
predefined state
Pause
Recovers to state before omission failure
Halting
Never restarts

65
Failure Semantics
[Figure: timeline of request sr and reply f(sr), then resent request sr′ and reply f(sr′)]

Max delay on link: d; max service time: p
Should get response within 2d + p
Assume omission failure only
If no response within 2d + p, resend request
What if performance failure possible?
Must distinguish between response to sr and sr′
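
A client following the timeout rule might look like this sketch, assuming times in milliseconds and hypothetical names. Tagging each (re)sent request with a fresh id is one way to tell the response to sr from the response to sr′; the policy chosen here is to discard a late reply to an abandoned request.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Client:
    d: float  # maximum link delay
    p: float  # maximum service time
    _ids: itertools.count = field(default_factory=itertools.count)
    pending: Dict[int, str] = field(default_factory=dict)

    @property
    def timeout(self) -> float:
        # Request and reply each cross the link once, plus service time
        return 2 * self.d + self.p

    def send(self, request: str) -> int:
        rid = next(self._ids)
        self.pending[rid] = request
        return rid

    def resend(self, old_rid: int) -> int:
        """No reply within the timeout: retry under a fresh request id."""
        return self.send(self.pending.pop(old_rid))

    def on_reply(self, rid: int, result: str) -> Optional[str]:
        # A late reply to an abandoned id (performance failure) is discarded
        if rid not in self.pending:
            return None
        del self.pending[rid]
        return result


c = Client(d=10, p=50)
print(c.timeout)                 # 70
r1 = c.send("sr")
r2 = c.resend(r1)                # nothing arrived within 70 ms: send sr'
print(c.on_reply(r1, "f(sr)"))   # None: late response to sr is ignored
print(c.on_reply(r2, "f(sr')"))  # f(sr')
```

Under omission failures alone the ids are unnecessary, since a reply can only be to the outstanding request; they matter once a slow server can answer after the client has already given up.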

66
Failure Semantics

Specification for service must include
Failure-free (normal) semantics
Failure semantics (likely failure behaviors)
Multiple semantics
Combine to give (weaker) semantics
Arbitrary failure semantics: Weakest possible
Choice of failure semantics
Is class of failure likely?
Probability of type of failure
What is the cost of failure?
Catastrophic?

67
Failure Masking

Hierarchical failure masking
Dependency: Higher level gets (at best) failure semantics of lower level
Can compensate for lower level failure to improve
this
Group Failure Masking
Redundant servers
Allows failure semantics of group to be higher
than individuals
k-fault tolerant
Group can mask k concurrent group member failures
from client

68
Fault Tolerance

A distributed program A is said to tolerate
faults from a fault class F for an invariant P
iff there exists a predicate T for which
At any configuration where P holds, T also holds
(i.e., $P \Rightarrow T$)
Starting from any state where T holds, if any
actions of A or F are executed, the resulting
state will always be one in which T holds (i.e.,
T is closed in A and T is closed in F)
Starting from any state where T holds, every
computation that executes actions from A alone
eventually reaches a state where P holds
If a program A tolerates faults from a fault
class F for invariant P, we say that A is
F-tolerant for P.
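
For small finite-state programs, the three conditions can be checked by brute force. The sketch assumes actions are given as successor functions and that no fairness is assumed (so any cycle outside P counts as failing to converge); the toy program, fault, and helper names are hypothetical. The toy is nonmasking: faults can leave P, but the program's own actions restore it.

```python
from typing import Callable, Dict, Set

State = int
Action = Callable[[State], Set[State]]  # state -> possible next states


def is_closed(T: Set[State], step: Action) -> bool:
    """Every transition from a state in T stays in T."""
    return all(step(s) <= T for s in T)


def converges(T: Set[State], P: Set[State], step: Action) -> bool:
    """Every computation of step alone, started in T, reaches P.

    Fails on a state outside P with no successor (stuck) or on a cycle
    that avoids P.
    """
    memo: Dict[State, bool] = {}
    active: Set[State] = set()

    def reaches_p(s: State) -> bool:
        if s in P:
            return True
        if s in memo:
            return memo[s]
        if s in active:
            return False              # cycle that avoids P
        succ = step(s)
        if not succ:
            memo[s] = False           # stuck outside P
            return False
        active.add(s)
        ok = all(reaches_p(t) for t in succ)
        active.discard(s)
        memo[s] = ok
        return ok

    return all(reaches_p(s) for s in T - P)


def tolerates(P: Set[State], T: Set[State], program: Action, faults: Action) -> bool:
    return (P <= T and is_closed(T, program) and is_closed(T, faults)
            and converges(T, P, program))


def program(x: State) -> Set[State]:
    return {x - 1} if x > 0 else set()  # A: decrement toward 0


def faults(x: State) -> Set[State]:
    return {1, 2}                       # F: corrupt x to 1 or 2


P = {0}
print(tolerates(P, {0, 1, 2}, program, faults))  # True
print(tolerates(P, {0, 1}, program, faults))     # False: T not closed under F
```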

69
Forms of fault tolerance
             Live          Not live
Safe         Masking       Fail safe
Not safe     Nonmasking    None

For each entry, determine
F: Fault class handled
T: Set of states that can be reached

70
Reliable Multicast

Classes
Sender-initiated: Acknowledge all packets
Scales poorly in normal operation
Receiver-initiated: Request missing packets
Sender doesn't need receiver list
Scales poorly on failure (cascading failure?)
Tree-based, Ring-based protocols

71
Tree-based Protocols

Organize multicast group into tree
Children acknowledge to parent
Parent acknowledges when all children have
acknowledged
Advantages
Sender doesn't need to know full group
Solves unbounded memory
Scalable
Disadvantages
Rate paced by slowest acknowledgement path in tree
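
The tree acknowledgement rule is naturally recursive. This sketch assumes a static tree and a known set of nodes already holding the packet (names hypothetical); the root, which is the sender, only ever hears from its direct children.

```python
from typing import Dict, List, Set


def subtree_acked(tree: Dict[str, List[str]], node: str, received: Set[str]) -> bool:
    """True once node and every descendant hold the packet.

    tree maps a parent to its children; received is the set of nodes holding
    the packet. A parent acknowledges only after all its children have.
    """
    if node not in received:
        return False
    return all(subtree_acked(tree, child, received) for child in tree.get(node, []))


# Hypothetical group with the sender S at the root
tree = {"S": ["A", "B"], "A": ["A1", "A2"]}
print(subtree_acked(tree, "S", {"S", "A", "B", "A1", "A2"}))  # True
print(subtree_acked(tree, "S", {"S", "A", "B", "A1"}))        # False: A2 missing
print(subtree_acked(tree, "B", {"S", "A", "B", "A1"}))        # True: B's subtree is done
```

The recursion also shows the stated disadvantage: the sender's acknowledgement waits on the slowest path to a leaf.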

72
Ring-based protocols

Idea: Token site responsible for retransmit
Sender multicasts
Token site multicasts ACK
Receivers request retransmit from token site if ACK doesn't match what they have
Can only accept token if you've received everything acknowledged
everything acknowledged
Keep packets since last time you had token
Advantages
Space
Low load on sender
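
The receiver-side rule can be sketched as follows, assuming sequence numbers start at 0 and the token site's ACK is cumulative (both hypothetical conventions). A non-empty missing set is what the receiver would request from the token site.

```python
from typing import Set


def missing(received: Set[int], acked_up_to: int) -> Set[int]:
    """Sequence numbers the token site has acknowledged but this receiver lacks."""
    return set(range(acked_up_to + 1)) - received


def can_accept_token(received: Set[int], acked_up_to: int) -> bool:
    # Only take over as token site if everything acknowledged so far is here
    return not missing(received, acked_up_to)


have = {0, 1, 2, 4}
print(missing(have, 4))            # {3}: request retransmission of 3
print(can_accept_token(have, 4))   # False
print(can_accept_token(have | {3}, 4))  # True
```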

73
Disaster Recovery

Problem: Complete failure at single site
Must have multiple sites
Thus a distributed problem
Two examples
Distributed Storage: Palladio
Think wide-area RAID
Distributed Transactions: Epoch algorithm

74
Epoch Algorithm (Garcia-Molina, Polyzois, and
Hagmann 1990)

1-Safe backup
No performance penalty
Multiple transaction streams
Use distribution to improve performance
Multiple Logs
Avoid single bottleneck

75
Algorithm Overview

Idea: Transactions that can be committed together are grouped into epochs
Primaries write marker in log
Must agree when safe to write marker
Keep track of current epoch number
Master broadcasts when to end epoch
Backups commit epoch when all backups have
received marker
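
The backup-side commit rule can be sketched as follows, assuming each backup reports the highest epoch whose end marker it has logged and each transaction is tagged with its epoch. This representation is hypothetical; log shipping and the master's end-of-epoch broadcast are not modeled.

```python
from typing import Dict, List, Tuple


def committable_epoch(markers: Dict[str, int]) -> int:
    """Latest epoch whose end marker has reached every backup (sketch).

    markers: backup name -> highest epoch whose end marker that backup has logged.
    """
    return min(markers.values())


def installable(txns: List[Tuple[str, int]], markers: Dict[str, int]) -> List[str]:
    """Transactions (name, epoch) a backup may commit: epochs closed at every backup."""
    limit = committable_epoch(markers)
    return [name for name, epoch in txns if epoch <= limit]


markers = {"B1": 3, "B2": 2, "B3": 3}  # hypothetical backups
txns = [("T1", 1), ("T2", 2), ("T3", 3), ("T4", 4)]
print(committable_epoch(markers))  # 2
print(installable(txns, markers))  # ['T1', 'T2']
```

Taking the minimum is what keeps backups mutually consistent: no backup installs epoch 3 until the slowest one has also seen its marker.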

76
Correctness Criteria

Atomicity: If any writes of a transaction appear at backup, all must appear
If $\exists W(T_x, d)$ at backup, then every write $W(T_x, d')$ of $T_x$ exists at backup
Consistency: If $T_i \to T_j$ at primary, then
Local: $T_j$ installed at backup $\Rightarrow$ $T_i$ installed at backup
Mutual: If $W(T_i, d)$ and $W(T_j, d)$, then $W(T_i, d) \to W(T_j, d)$
Minimum Divergence: If $T_j$ is at the backup and does not depend on a missing transaction, then it should be installed at the backup

77
Single-Mark Algorithm