Transcript and Presenter's Notes

Title: Byzantine Techniques II


1
Byzantine Techniques II
  • Justin W. Hart
  • CS 614
  • 12/01/2005

2
Papers
  • BAR Fault Tolerance for Cooperative Services.
    Amitanand S. Aiyer et al. (SOSP 2005)
  • Fault-scalable Byzantine Fault-Tolerant Services.
    Michael Abd-El-Malek et al. (SOSP 2005)

3
BAR Fault Tolerance for Cooperative Services
  • BAR Model
  • General Three-Level Architecture
  • BAR-B

4
Motivation
  • General approach to constructing cooperative
    services that span multiple administrative
    domains (MADs)

5
Why is this difficult?
  • Nodes are under the control of multiple
    administrators
  • Broken nodes: Byzantine behaviors
  • Misconfigured, or configured with malicious
    intent
  • Selfish nodes: rational behaviors
  • Alter the protocol to increase local utility

6
Other models?
  • Byzantine models: account for Byzantine
    behavior, but do not handle rational behavior
  • Rational models: account for rational behavior,
    but may break with Byzantine behavior

7
BAR Model
  • Byzantine
  • Behaving arbitrarily or maliciously
  • Altruistic
  • Execute the proposed program, whether it benefits
    them or not
  • Rational
  • Deviate from the proposed program for purposes of
    local benefit

8
BART: BAR Tolerant
  • It's a cruel world
  • At most (n-2)/3 nodes in the system are Byzantine
  • The rest are rational

9
Two classes of protocols
  • Incentive-Compatible Byzantine Fault Tolerant
    (IC-BFT)
  • Guarantees a set of safety and liveness
    properties
  • It is in the best interest of rational nodes to
    follow the protocol exactly
  • Byzantine Altruistic Rational Tolerant (BART)
  • Guarantees a set of safety and liveness
    properties despite the presence of rational nodes
  • IC-BFT is a subset of BART

10
An important concept
  • It isn't enough for a protocol to survive drills
    of a handful of attacks. It must provably
    provide its guarantees.

11
A flavor of things to come
  • Protocol builds on Practical Byzantine Fault
    Tolerance in order to combat Byzantine behavior
  • Protocol uses game theoretical concepts in order
    to combat rational behavior

12
A taste of Nash Equilibrium
The game of chicken (row player's payoff, column player's payoff):

                 Swerve        Go Straight
  Swerve         0, 0          -1, 1
  Go Straight    1, -1         -100, -100
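For concreteness, here is a minimal sketch in Python (not from the paper) of the chicken payoff matrix above, with a brute-force check of which strategy profiles are Nash equilibria, i.e. profiles from which neither player can gain by deviating unilaterally:

STRATEGIES = ["Swerve", "Straight"]

# payoff[(row, col)] = (row player's payoff, column player's payoff)
PAYOFF = {
    ("Swerve", "Swerve"):     (0, 0),
    ("Swerve", "Straight"):   (-1, 1),
    ("Straight", "Swerve"):   (1, -1),
    ("Straight", "Straight"): (-100, -100),
}

def is_nash(row, col):
    """No unilateral deviation by either player improves that player's payoff."""
    r_pay, c_pay = PAYOFF[(row, col)]
    best_row = all(PAYOFF[(alt, col)][0] <= r_pay for alt in STRATEGIES)
    best_col = all(PAYOFF[(row, alt)][1] <= c_pay for alt in STRATEGIES)
    return best_row and best_col

for r in STRATEGIES:
    for c in STRATEGIES:
        print(f"({r}, {c}) Nash equilibrium: {is_nash(r, c)}")
# True only for (Swerve, Straight) and (Straight, Swerve): once following the
# protocol is an equilibrium, rational players have no reason to deviate.
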
13
and the nodes are starving!
  • Nodes require access to a state machine in order
    to complete their objectives
  • Protocol contains methods for punishing rational
    nodes, including denying them access to the state
    machine

14
An expensive notion of identity
  • Identity is established through cryptographic
    keys assigned through a trusted authority
  • Prevents Sybil attacks
  • Bounds the number of Byzantine nodes
  • Gives rational nodes reason to consider long-term
    consequences of their actions
  • Gives real world grounding to identity

15
Assumptions about rational nodes
  • Receive long-term benefit from staying in the
    protocol
  • Conservative when computing the impact of
    Byzantine nodes on their utility
  • If the protocol provides a Nash equilibrium,
    then all rational nodes will follow it
  • Rational nodes do not collude; colluding nodes
    are classified as Byzantine

16
Byzantine nodes
  • Byzantine fault model
  • Strong adversary
  • Adversary can coordinate collusion attacks

17
Important concepts
  • Promptness principle
  • Proof of Misbehavior (POM)
  • Cost balancing

18
Promptness principle
  • If a rational node gains no benefit from delaying
    a message, it will send it as soon as possible

19
Proof of Misbehavior (POM)
  • Self-contained, cryptographic proof of wrongdoing
  • Provides accountability to nodes for their actions

20
Example of POM
  • Node A requests that Node B store a chunk
  • Node B replies that it has stored the chunk
  • Later Node A requests that chunk back
  • Node B sends back random garbage (it hadn't
    stored the chunk) and a signature
  • Because Node A stored a hash of the chunk, it can
    demonstrate misbehavior on the part of Node B
    (see the sketch below)
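A minimal sketch of this POM idea in Python; the message format, and the use of an HMAC as a stand-in for Node B's digital signature, are assumptions for illustration, not BAR-B's actual wire format:

import hashlib, hmac, os

KEY_B = os.urandom(32)          # stands in for B's signing key

def sign(key, data):            # assumption: the real protocol uses public-key signatures
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key, data, sig):
    return hmac.compare_digest(sign(key, data), sig)

# 1. A stores a chunk on B and remembers its hash (part of the receipt).
chunk = b"backup data from node A"
receipt_hash = hashlib.sha256(chunk).digest()

# 2. Later, B answers a retrieve with garbage, but still signs the response.
garbage = os.urandom(len(chunk))
response = {"data": garbage, "sig": sign(KEY_B, garbage)}

# 3. Anyone holding the receipt can now check the proof of misbehavior:
def is_pom(receipt_hash, response, key_b):
    signed_ok = verify(key_b, response["data"], response["sig"])        # B really sent this
    wrong_data = hashlib.sha256(response["data"]).digest() != receipt_hash
    return signed_ok and wrong_data                                      # signed, but wrong

print("Proof of misbehavior:", is_pom(receipt_hash, response, KEY_B))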

21
but it's a bit more complicated than that!
  • This corresponds to a rather simple behavior to
    combat: aggressively Byzantine behavior.

22
Passive-aggressive behaviors
  • Harder cases than aggressively Byzantine
  • A malicious Node A could merely lie about
    misbehavior on the part of Node B
  • A node could exploit non-determinism in order to
    shirk work

23
Cost Balancing
  • If two behaviors have the same cost, there is no
    reason to choose the wrong one

24
Three-Level Architecture
25
Level 1
  • Unilaterally deny service to nodes that fail to
    deliver messages
  • Tit-for-Tat
  • Balance costs
  • No incentive to make the wrong choice
  • Penance
  • Unilaterally impose extra work on nodes with
    untimely responses

26
Level 2
  • Failure to respond to a request by a state
    machine will generate a POM from a quorum of
    nodes in the state machine

27
Level 3
  • Makes use of reliable work assignment
  • Needs only to provide sufficient information to
    identify valid request/response pairs

28
Nuts and Bolts
  • Level 1
  • Level 2

29
Level 1
  • Ensure long-term benefit to participants
  • The RSM rotates the leadership role to
    participants.
  • Participants want to stay in the system in order
    to control the RSM and complete their protocols
  • Limit non-determinism
  • Self interested nodes could hide behind
    non-determinism to shirk work
  • Use Terminating Reliable Broadcast, rather than
    consensus.
  • In TRB, only the sender can propose a value
  • Other nodes can only adopt this value, or choose
    a default value

30
Level 1
  • Mitigate the effects of residual non-determinism
  • Cost balancing
  • The protocol's preferred choice is no more
    expensive than any other
  • Encouraging timeliness
  • Nodes can inflict sanctions on untimely messages
  • Enforce predictable communication patterns
  • Nodes have to have participated at every step in
    order to have the opportunity to issue a command

31
Terminating Reliable Broadcast
32
3f+2 nodes, rather than 3f+1
  • Suppose a sender s is slow
  • The same group of nodes now want to determine
    that s is slow
  • A new leader is elected
  • Every node but s wants a timely conclusion to
    this, in order to get their turn to propose a
    value to the state machine
  • s is not allowed to participate in this quorum

33
TRB provides a few guarantees
  • They differ during periods of synchrony and
    periods of asynchrony

34
In synchrony
  • Termination
  • Every non-Byzantine process delivers exactly one
    message
  • Agreement
  • If one non-Byzantine process delivers a message m,
    then all non-Byzantine processes eventually
    deliver m

35
In asynchrony
  • Integrity
  • If a non-Byzantine process delivers m, then the
    sender sent m
  • Non-Triviality
  • If the sender is non-Byzantine and sends m, then
    the sender eventually delivers m

36
Message Queue
  • Enforces predictable communication patterns
  • Bubbles
  • A simple retaliation policy
  • Node A's message queue is filled with messages
    that it intends to send to Node B
  • This message queue is interleaved with bubbles
  • Bubbles contain predicates indicating messages
    expected from B
  • No message except the expected one from B
    can fill the bubble
  • No messages in A's queue will go to B until B
    fills the bubble (see the sketch below)
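A minimal sketch of the bubble mechanism in Python; the class and method names are assumptions, not the paper's code:

class OutgoingQueue:
    def __init__(self):
        self.items = []          # either ("msg", payload) or ("bubble", predicate)

    def enqueue_message(self, payload):
        self.items.append(("msg", payload))

    def enqueue_bubble(self, predicate):
        self.items.append(("bubble", predicate))

    def deliver_from_peer(self, incoming):
        """B's message can only fill the first unfilled bubble, and only if it matches."""
        for i, (kind, pred) in enumerate(self.items):
            if kind == "bubble":
                if pred(incoming):
                    del self.items[i]     # bubble filled, queue unblocks
                    return True
                return False              # wrong message: bubble stays, queue stays blocked
        return False

    def sendable(self):
        """Messages A may send to B right now: everything before the first bubble."""
        out = []
        for kind, payload in self.items:
            if kind == "bubble":
                break
            out.append(payload)
        return out

q = OutgoingQueue()
q.enqueue_message("set-turn 7")
q.enqueue_bubble(lambda m: m.startswith("ack 7"))   # expect B's ack for turn 7
q.enqueue_message("agree 8")

print(q.sendable())                  # ['set-turn 7']  -- 'agree 8' is held back
q.deliver_from_peer("ack 7 signed")  # B fills the bubble
print(q.sendable())                  # ['set-turn 7', 'agree 8']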

37
Balanced Messages
  • We've already discussed this quite a bit
  • We assure this at this level of the protocol
  • This is where we get our gigantic timeout message

38
Penance
  • Untimely vector
  • Tracks a node's perception of the responsiveness
    of other nodes
  • When a node becomes a sender, it includes its
    untimely vector with the message

39
Penance
  • All nodes but the sender receive penance messages
    from each node.
  • Because of bubbles, each untimely node must send
    a penance message back in order to continue using
    the system
  • This provides a penalty to those nodes
  • The sender is excluded from this process, because
    it may be motivated to lie in its penance vector,
    in order to avoid the work of transmitting
    penance messages

40
Timeouts and Garbage Collection
  • Set-turn timeout
  • Timeout to take leadership away from the sender
  • Initially 10 seconds in this implementation, in
    order to overcome all expected network delays
  • Can only be changed by the sender
  • Max_response_time
  • Time at which a node is removed from the system,
    its messages discarded and its resources garbage
    collected
  • Set to 1 week or 1 month in the prototypes

41
Global Punishment
  • Badlists
  • Transform local suspicion into POMs
  • Suspicion is recorded in a local node's badlist
  • Sender includes its badlist with its message
  • If, over time, recipients see a node in f + 1
    different senders' badlists, then they, too,
    consider that node to be faulty (sketched below)
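A minimal sketch of the badlist rule in Python; the bookkeeping is an assumption, not the paper's data structures. The point of the f + 1 threshold is that at least one accuser must be non-Byzantine:

from collections import defaultdict

F = 2                                       # tolerated Byzantine nodes

class BadlistTracker:
    def __init__(self, f):
        self.f = f
        self.accusers = defaultdict(set)    # suspected node -> set of accusing senders

    def record_sender_badlist(self, sender, badlist):
        for suspect in badlist:
            self.accusers[suspect].add(sender)

    def globally_faulty(self, node):
        return len(self.accusers[node]) >= self.f + 1

t = BadlistTracker(F)
t.record_sender_badlist("s1", {"nodeX"})
t.record_sender_badlist("s2", {"nodeX", "nodeY"})
print(t.globally_faulty("nodeX"))   # False: only 2 accusers, need f + 1 = 3
t.record_sender_badlist("s3", {"nodeX"})
print(t.globally_faulty("nodeX"))   # True: treat nodeX as faulty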

42
Proof
  • Real proofs do not appear in this paper; they
    appear in the technical report

43
but here's a bit
  • Theorem 1: The TRB protocol satisfies
    Termination, Agreement, Integrity and
    Non-Triviality

44
and a bit more
  • Theorem 2: No node has a unilateral incentive to
    deviate from the protocol
  • Lemma 1: No rational node r benefits from
    delaying sending the set-turn message
  • Follows from penance
  • Lemma 2: No rational node r benefits from sending
    the set-turn message early
  • Sending early could result in a senderTO being
    sent (this protocol uses synchronized clocks, and
    all messages are cryptographically signed)

45
and the rest that's mentioned in the paper
  • Lemma 3: No rational node r benefits from sending
    a malformed set-turn message.
  • The set-turn message only contains the turn
    number. Because of this, doing so reduces to
    either sending late (dealt with in Lemma 1) or
    sending early (dealt with in Lemma 2)

46
Level 2
  • State machine replication is sufficient to
    support a backup service, but the overhead is
    unacceptable
  • 100 participants each backing up 100 MB would
    require 10 GB of drive space per node
  • Assign work to individual nodes, using arithmetic
    codes to provide low-overhead fault-tolerant
    storage

47
Guaranteed Response
  • Direct communication is insufficient when nodes
    can behave rationally
  • We introduce a witness that overhears the
    conversation
  • This eliminates ambiguity
  • Messages are routed through this intermediary

48
Guaranteed Response
49
Guaranteed Response
  • Node A sends a request to Node B through the
    witness
  • The witness stores the request, and enters
    RequestReceived state
  • Node B sends a response to Node A through the
    witness
  • The witness stores the response, and enters the
    ResponseReceived state (sketched below)
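A minimal sketch of the witness's bookkeeping in Python; the two recorded states follow the slide, but the timeout handling and method names are assumptions:

import time

class Witness:
    def __init__(self, timeout_s=10.0):
        self.state = "Idle"
        self.timeout_s = timeout_s
        self.request = None
        self.request_time = None

    def forward_request(self, request):
        # A's request is recorded before being passed on to B
        self.request, self.request_time = request, time.monotonic()
        self.state = "RequestReceived"
        return request

    def forward_response(self, response):
        # B's response is recorded before being passed back to A
        if self.state != "RequestReceived":
            raise RuntimeError("response without a pending request")
        self.state = "ResponseReceived"
        return response

    def check_timeout(self):
        # if B never answered, the witness itself can attest to a NoResponse
        if (self.state == "RequestReceived"
                and time.monotonic() - self.request_time > self.timeout_s):
            self.state = "NoResponse"
        return self.state

w = Witness(timeout_s=0.005)
w.forward_request({"from": "A", "to": "B", "work": "store chunk 17"})
time.sleep(0.01)
print(w.check_timeout())   # 'NoResponse': grounds for a POM against B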

50
Guaranteed Response
  • Deviation from this protocol will cause the
    witness to notice either a timeout from Node B
    or lying on the part of Node A

51
Implementation
  • The system must remain incentive-compatible
  • Communication with the witness node is not in the
    form of actual message sending; it is in the form
    of a command to the RSM
  • Theorem 3: If the witness node enters the
    RequestReceived state for some work w assigned to
    rational node b, then b will execute w
  • Holds if sufficient sanctions exist to motivate
    b to do so

52
State limiting
  • State is limited by limiting the number of slots
    (nodes with which a node can communicate)
    available to a node
  • Applies a limit to the memory overhead
  • Limits the rate at which requests are inserted
    into the system
  • Forces nodes to acknowledge responses to requests
  • Nodes want their slots back

53
Optimization through Credible Threats
54
Optimization through Credible Threats
  • Returns to game theory
  • Protocol is optimized so nodes can communicate
    directly. Add a fast path
  • Nodes register vows with the witness
  • If recipient does not respond, nodes proceed to
    the unoptimized case
  • Analogous to a driver in chicken throwing their
    steering wheel out the window

55
Periodic Work Protocol
  • Witness checks that periodic tasks, such as
    system maintenance are performed
  • It is expected that, with a certain frequency,
    each node in the system will perform such a task
  • Failure to perform one will generate a POM from
    the witness

56
Authoritative Time Service
  • Maintains authoritative time
  • Binds messages sent to that time
  • Guaranteed response protocol relies on this for
    generating NoResponses

57
Authoritative Time Service
  • Each submission to the state machine contains the
    timestamp of the proposer
  • Timestamp is taken to be the maximum of the
    median of timestamps of the previous f + 1
    decisions
  • If no decision is reached, the timestamp
    is the previous authoritative time
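Read literally, the rule above can be sketched as follows in Python; the function and argument names are assumptions:

from statistics import median

def authoritative_time(prev_authoritative, recent_decision_timestamps, f):
    """recent_decision_timestamps: proposer timestamps of the last f + 1 decisions."""
    window = recent_decision_timestamps[-(f + 1):]
    if not window:                       # no decision reached this turn
        return prev_authoritative
    # never allowed to move backwards past the previous authoritative time
    return max(prev_authoritative, median(window))

print(authoritative_time(100.0, [98.0, 104.0, 101.0], f=2))   # 101.0 (median wins)
print(authoritative_time(100.0, [90.0, 91.0, 92.0], f=2))     # 100.0 (cannot go back)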

58
Level 3: BAR-B
  • BAR-B is a cooperative backup system
  • Three operations
  • Store
  • Retrieve
  • Audit

59
Storage
  • Nodes break files up into chunks
  • Chunks are encrypted
  • Chunks are stored on remote nodes
  • Remote nodes send signed receipts and store
    StoreInfos
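A minimal sketch of the store path above in Python; the chunk size, the toy XOR "encryption", and the local record layout are assumptions, not BAR-B's formats:

import hashlib, os

CHUNK_SIZE = 4096

def xor_encrypt(data, key):
    # stand-in for real encryption; only the originator holds the key
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def store_file(data, remote_nodes, key):
    """Split, encrypt, and place chunks; keep a local record per chunk."""
    records = []
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for i, chunk in enumerate(chunks):
        ciphertext = xor_encrypt(chunk, key)
        node = remote_nodes[i % len(remote_nodes)]     # round-robin placement
        # in the real protocol the StoreInfo goes to `node`, which replies with
        # a signed receipt; here we just keep the hash for later audits/POMs
        records.append({"index": i,
                        "node": node,
                        "chunk_hash": hashlib.sha256(ciphertext).hexdigest()})
    return records

records = store_file(os.urandom(10_000), ["nodeB", "nodeC", "nodeD"], os.urandom(32))
print(len(records), records[0]["node"])   # 3 chunks, first placed on nodeB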

60
Retrieval
  • A node storing a chunk can respond to a request
    for a chunk with
  • The chunk
  • A demonstration that the chunk's lease has
    expired
  • A more recent StoreInfo

61
Auditing
  • Receipts constitute audit records
  • Nodes will exchange receipts in order to verify
    compliance with storage quotas

62
Arithmetic Coding
  • Arithmetic coding is used to keep storage size
    reasonable
  • Backing up 1 GB of data requires roughly 1.3 GB
    of storage
  • Keeping this ratio reasonable is crucial to
    motivate self-interested nodes to participate

63
Request-Response pattern
  • Store
  • Retrieve
  • Audit

64
Retrieve
  • Originator sends a Receipt for the StoreInfo to
    be retrieved
  • Storage node can send
  • A RetrieveConfirm
  • Containing the data and the receipt
  • A RetrieveDeny
  • Containing a receipt and a proof regarding why
  • Anything else
  • Generates a POM

65
Store
  • Originator sends a StoreInfo to be stored
  • Storage node can send
  • A receipt
  • A StoreReject
  • Demonstrates that the node has reached its
    storage commitment
  • Anything else
  • Generates a POM

66
Audit
  • Three phases
  • Auditor requests both OwnList and StoreList from
    auditee
  • Does this for random nodes in the system
  • Lists are checked for inconsistencies
  • Inconsistencies result in a POM

67
Time constraints
  • Data is stored for 30 days
  • After this, it is garbage collected
  • Nodes must renew their leases on stored chunks
    before this expiration in order to keep them in
    the system

68
Sanctions
  • Periodic work protocol forces generation of POMs
    or special NoPOMs
  • POMs and NoPOMs are balanced
  • POMs evict nodes from the system

69
Recovery
  • Nodes must be able to recover after failures
  • Chained membership certificates are used in order
    to allow them to retrieve their old chunks
  • Use of certificate later in the chain is regarded
    as a new node entering the system
  • The old node is regarded as dead
  • The new node is allowed to view the old node's
    chunks

70
Recovery
  • This forces nodes to redistribute the chunks
    they had stored on that node
  • Length of chains is limited, in order to prevent
    nodes from shirking work by using a certificate
    later in the chain

71
Guarantees
  • Data on BAR-B can be retrieved within the lease
    period
  • No POM can be gathered against a node that does
    not deviate from the protocol
  • No node can store more than its quota
  • A time window is available to nodes with
    catastrophic failures for recovery

72
Evaluation
  • Performance is inferior to protocols that do not
    make these guarantees, but acceptable

73
Impact of additional nodes
74
Impact of rotating leadership
75
Impact of fast path optimization
76
Fault-Scalable Byzantine Fault-Tolerant Services
  • Query/Update (Q/U) protocol
  • Optimistic quorum based protocol
  • Better throughput and fault-scalability than
    Replicated State Machines
  • Introduces preferred quorum as an optimization on
    quorum protocols

77
Motivation
  • Compelling need for services and distributed data
    structures to be efficient and fault-tolerant
  • In Byzantine fault-tolerant systems, performance
    drops off sharply as more faults are tolerated

78
Fault Scalability
  • A fault-scalable service is one in which
    performance degrades gracefully as more server
    faults are tolerated

79
Operations-based interface
  • Provides an interface similar to RSMs
  • Exports interfaces comprised of deterministic
    methods
  • Queries
  • Do not modify data
  • Updates
  • Modify data
  • Multi-object updates
  • Allow a set of objects to be updated together

80
Properties
  • Operates correctly under an asynchronous model
  • Queries and updates are strictly serializable
  • In benign execution, they are obstruction-free
  • Cost is an increase in the number of required
    servers: 5b + 1 servers, rather than 3b + 1 servers
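As a back-of-the-envelope comparison in Python (the 4b + 1 quorum size is my reading of the Q/U construction and should be treated as an assumption): Q/U buys single-phase optimism by requiring 5b + 1 servers instead of the 3b + 1 needed by agreement-based replicated state machines.

def servers_needed(b, protocol):
    if protocol == "rsm":       # e.g. PBFT-style replicated state machine
        return 3 * b + 1
    if protocol == "qu":        # Query/Update
        return 5 * b + 1
    raise ValueError(protocol)

def qu_quorum_size(b):
    return 4 * b + 1            # assumption: quorum size used by Q/U

for b in range(1, 5):
    print(f"b={b}: RSM {servers_needed(b, 'rsm')} servers, "
          f"Q/U {servers_needed(b, 'qu')} servers (quorum {qu_quorum_size(b)})")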

81
Optimism
  • Servers store a version history of objects
  • Updates are non-destructive to the objects
  • Use of logical timestamps based on contents of
    update and object state upon which the update is
    conditioned

82
Speedups
  • Preferred quorum, rather than random quorum
  • Addressed later
  • Efficient cryptographic techniques
  • Addressed later

83
Efficiency and Scalability
84
Efficiency
  • Most failure-atomic protocols require at least a
    two-phase commit
  • Prepare
  • Commit
  • The optimistic approach does not need a prepare
    phase
  • This introduces the need for clients to repair
    inconsistent objects
  • The optimistic approach also obviates the need
    for locking!

85
Versioning Servers
  • In order to allow for this, versioning servers
    are employed
  • Each update creates a new version on the server
  • Updates contain information about the version to
    be updated.
  • If no update has been committed since that
    version, the update goes through unimpeded.

86
Throughput-scalability
  • Additional servers, beyond those necessary to
    provide the desired fault tolerance, can provide
    additional throughput

87
Scaleup pitfall?
  • Encourage the use of fine-grained objects, which
    reduce per-object contention
  • If the majority of accesses touch individual
    objects, or only a few objects, then the scale-up
    pitfall can be avoided
  • In the example applications, this holds.

88
No need to partition
  • Other systems achieve throughput-scalability by
    partitioning services
  • This is unnecessary in this system

89
The Query/Update Protocol
90
System model
  • Asynchronous timing
  • Clients and servers may be Byzantine faulty
  • Clients and servers assumed to be computationally
    bounded, assuring effectiveness of cryptography
  • Failure model is a hybrid failure model
  • Benign
  • Malevolent
  • Faulty

91
System model
  • Extends the definition of a fail-prone system
    given by Malkhi and Reiter

92
System model
  • Point-to-point authenticated channels exist
    between all clients and servers
  • Infrastructure deploying symmetric keys on all
    channels
  • Channels are assumed unreliable
  • but, of course, they can be made reliable

93
Overview
  • Clients update objects by issuing requests
    stamped with object versions to version servers.
  • Version servers evaluate these requests.
  • If the request is over an out-of-date version,
    the client's version is corrected and the request
    reissued
  • If an out-of-date server is required to reach a
    quorum, it retrieves an object history from a
    group of other servers
  • If the version matches the server version, of
    course, it is executed
  • Everything else is a variation upon this theme

94
Overview
  • Queries are read only methods
  • Updates modify an object
  • Methods exported take arguments and return
    answers
  • Clients perform operations by issuing requests to
    a quorum
  • A server receives a request. If it accepts it, it
    invokes a method
  • Each update creates a new object version

95
Overview
  • The object version is kept with its logical
    timestamp in a version history called the replica
    history
  • Servers return replica histories in response to
    requests
  • Clients store replica histories in their object
    history set, an array of replica histories
    indexed by server

96
Overview
  • Timestamps in these histories are candidates for
    future operations
  • Candidates are classified in order to determine
    which object version a method should be executed
    upon

97
Overview
  • In non-optimistic operation, a client may need to
    perform a repair
  • Addressed later
  • To perform an operation, a client first retrieves
    an object history set. The client's operation is
    conditioned on this set, which is transmitted
    with the operation.

98
Overview
  • The client sends this operation to a quorum of
    servers.
  • To promote efficiency, the client sends the
    request to a preferred quorum
  • Addressed later
  • Single-phase operation hinges on the availability
    of a preferred quorum, and on concurrency-free
    access.

99
Overview
  • Before executing a request, servers first
    validate its integrity.
  • This is important: servers do not communicate
    object histories directly to each other, so the
    client's data must be validated.
  • Servers use authenticators to do this: lists of
    HMACs that prevent malevolent nodes from
    fabricating replica histories (sketched below).
  • Servers cull replica histories from the
    conditioned-on OHS that they cannot validate
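A minimal sketch of an authenticator in Python; key distribution is simplified to a shared dictionary and the serialization is an assumption, but the shape, one HMAC per server over the replica history, follows the description above:

import hashlib, hmac, os, pickle

SERVERS = ["s0", "s1", "s2", "s3", "s4"]
# pairwise symmetric keys, keyed by unordered server pair
KEYS = {frozenset((a, b)): os.urandom(32) for a in SERVERS for b in SERVERS if a < b}

def make_authenticator(issuer, replica_history):
    """One HMAC per peer server, all over the same replica history."""
    blob = pickle.dumps(replica_history)
    return {peer: hmac.new(KEYS[frozenset((issuer, peer))], blob, hashlib.sha256).digest()
            for peer in SERVERS if peer != issuer}

def validate(receiver, issuer, replica_history, authenticator):
    """A server checks that the history a client relays really came from issuer."""
    blob = pickle.dumps(replica_history)
    expected = hmac.new(KEYS[frozenset((issuer, receiver))], blob, hashlib.sha256).digest()
    return hmac.compare_digest(expected, authenticator[receiver])

history = [("ts", 3, "objA")]                       # toy replica history
auth = make_authenticator("s1", history)
print(validate("s4", "s1", history, auth))                          # True
print(validate("s4", "s1", history + [("ts", 9, "objA")], auth))    # False: forged history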

100
Overview: the last bit
  • Servers validate that they do not have a higher
    timestamp in their local replica histories
  • Failing this, the client repairs
  • Passing this, the method is executed, and a new
    timestamp is created
  • Timestamps are crafted such that they always
    increase in value

101
Preferred Quorums
  • Traditional quorum systems use random quorums,
    but this means that servers frequently need to be
    synced
  • This is to distribute the load
  • Preferred quorums choose to access servers with
    the most up-to-date data, assuring that syncs
    happen less often

102
Preferred Quorums
  • If a preferred quorum cannot be met, clients
    probe for additional servers to add to the quorum
  • Authenticators make it impossible to forge object
    histories for benign servers
  • The new host syncs with b + 1 host servers, in
    order to validate that the data is correct
  • In the prototype, probing selects servers such
    that the load is distributed, using a method
    parameterized on object ID and server ID (see the
    sketch below)
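A minimal sketch of preferred-quorum selection in Python; the hashing scheme is an assumption, the slide only says selection is parameterized on object ID and server ID. Every client derives the same order for an object, so the same servers stay up to date, and probing substitutes members only when one does not respond:

import hashlib

def ranked_servers(object_id, servers):
    """Deterministic, load-spreading order of servers for this object."""
    return sorted(servers,
                  key=lambda s: hashlib.sha256(f"{object_id}:{s}".encode()).hexdigest())

def preferred_quorum(object_id, servers, quorum_size, unresponsive=frozenset()):
    quorum = []
    for s in ranked_servers(object_id, servers):     # probe in deterministic order
        if s not in unresponsive:
            quorum.append(s)
        if len(quorum) == quorum_size:
            return quorum
    raise RuntimeError("not enough responsive servers for a quorum")

servers = [f"s{i}" for i in range(6)]                # n = 5b + 1 with b = 1
print(preferred_quorum("objA", servers, quorum_size=5))
print(preferred_quorum("objA", servers, quorum_size=5, unresponsive={"s2"}))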

103
Concurrency and Repair
  • Concurrent access to an object may fail
  • Two operations
  • Barrier
  • Barrier candidates have no data associated with
    them, and so are safe to select during periods of
    contention
  • Barrier advances the logical clock so as to
    prevent earlier timestamps from completing
  • Copy
  • Copies the latest object data past the barrier,
    so it can be acted upon

104
Concurrency and Repair
  • Clients may repeatedly barrier each other; to
    combat this, an exponential backoff strategy is
    enforced

105
Classification and Constraints
  • Based on partial observations of the global
    system state, an operation may be
  • Complete
  • Repairable
  • Can be repaired using the copy and barrier
    strategy
  • Incomplete
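A minimal sketch of this classification in Python; the exact thresholds are my reading of the quorum-intersection argument and should be treated as assumptions rather than the paper's definitions:

def classify(appearances, quorum_size, b):
    """appearances: number of replica histories in the quorum containing the candidate."""
    if appearances >= quorum_size - b:     # survives intersection with any other quorum
        return "complete"
    if appearances >= b + 1:               # at least one non-faulty server vouches for it
        return "repairable"
    return "incomplete"

Q, B = 5, 1                                # e.g. quorum of 5 with b = 1
for seen in (5, 4, 2, 1):
    print(seen, "->", classify(seen, Q, B))
# 5 -> complete, 4 -> complete, 2 -> repairable, 1 -> incomplete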

106
Multi-Object Updates
  • In this case, servers lock their local copies; if
    they approve the OHS, the update goes through
  • If not, a multi-object repair protocol is run
  • In this case, repair depends on the ability to
    establish all objects in the set
  • Objects in the set are repairable only if all of
    them are repairable; otherwise, objects in the set
    that would individually be repairable are
    reclassified as incomplete.

107
An example of all of this
108
Implementation details
109
Cached object history set
  • Clients cache object history sets during
    execution, and execute updates without first
    querying.
  • If the request fails because of an out-of-date
    OHS, the server returns an up-to-date OHS with
    the failure

110
Optimistic query execution
  • If a client has not accessed an object recently,
    it is still possible to complete in a single
    phase.
  • Servers execute the query on the latest object
    version that they store. Clients then evaluate
    the result normally.

111
Inline repair
  • Does not require a barrier and copy
  • Repairs the candidate in-place, obviating the
    need for a round trip
  • Only possible in cases where there is no
    contention

112
Handling repeated requests
  • Mechanisms may cause requests to be repeated
  • In order to shortcut other checks, the timestamp
    is checked first

113
Retry and backoff policies
  • Update-update requires retry, and backoff to
    avoid livelock
  • Update-query does not; the query can be updated
    in place

114
Object syncing
  • Only 1 server needs to send the entire object
    version state
  • Others send hashes
  • Syncing server then calculates the hash and
    compares it against all others (sketched below)
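A minimal sketch of object syncing in Python; the message shapes and the match threshold are assumptions. Only one server ships the full object version, the rest ship hashes, and the syncing server accepts the data only if enough hashes match it:

import hashlib

def sync_object(full_copy, peer_hashes, required_matches):
    """full_copy: bytes from one server; peer_hashes: hashes from the others."""
    digest = hashlib.sha256(full_copy).hexdigest()
    matches = sum(1 for h in peer_hashes if h == digest)
    if matches >= required_matches:
        return full_copy                    # data vouched for by enough servers
    raise RuntimeError("object data does not match enough peer hashes")

obj = b"object version @ timestamp 42"
good = hashlib.sha256(obj).hexdigest()
print(len(sync_object(obj, [good, good], required_matches=2)))   # accepted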

115
Other speedups
  • Authenticators
  • Authenticators use HMACs rather than digital
    signatures
  • Compact timestamps
  • A collision-resistant hash of the object history
    is used in timestamps, rather than the history
    itself
  • Compact replica histories
  • Replica histories are pruned based on the
    conditioned-on timestamp after updates

116
Malevolent components
  • The astute among you must have noticed the
    possibility of a DoS attack by clients refusing
    to perform exponential backoff
  • Servers could rate-limit clients
  • Clients could also issue updates to a subset of a
    quorum, forcing incomplete updates
  • Lazy verification can be used to verify
    correctness of client operations in the
    background
  • The amount of unverified work by a client can
    then be limited

117
Correctness
  • Operations are strictly serializable
  • To understand, consider the conditioned-on chain.
  • All operations chain back to the initial
    candidate, and a total order is imposed
    on all established operations
  • Operations occur atomically, including those
    spanning multiple objects
  • If no operations span multiple objects, then
    correct operations that complete are also
    linearizable

118
Tests
  • Tests performed on a rack of 76 Intel Pentium 4
    2.8 GHz machines
  • Implemented an increment method and an NFSv3
    metadata service

119
Fault Scalability
120
More fault-scalability
121
Isolated vs Contending
122
NFSv3 metadata
123
References
  • Text and images have been borrowed directly from
    both papers.