5. Consistency and Replication

Provided by: george591

1
5. Consistency and Replication
  • Introduction
  • Consistency Models
  • Distribution Protocols
  • Consistency Protocols

2
Learning Objectives
  • To understand the main reasons for replicating
    data in distributed systems (DS), and the
    consistency issue that replication raises.
  • To study the major data-centric consistency
    models, such as strict consistency, sequential
    consistency, and weak consistency with
    synchronization variables.
  • To get a good understanding of various ways to
    distribute updates to replicas, independent of
    the underlying consistency model.
  • To examine several consistency protocols to show
    the actual implementation of consistency models.

3
Introduction: Reasons and Main Issue
  • Two primary reasons for replicating data in DS:
    reliability and performance.
  • Reliability: the system can continue working
    after one replica crashes by simply switching to
    one of the other replicas. Replication also makes
    it possible to provide better protection against
    corrupted data.
  • Performance: when the number of processes that
    access data managed by a server increases,
    performance can be improved by replicating the
    server and dividing the work among the replicas.
    Also, a copy of the data can be placed close to
    the process using it, to reduce data access time.
  • Consistency issue: keeping all replicas
    up-to-date.

4
Introduction: Consistency Issue
  • Intuitively, a collection of copies is consistent
    when the copies are always the same: a read
    operation performed at any copy always returns
    the same result.
  • Consequently, when an update operation is
    performed on one copy, the update should be
    propagated to all copies before a subsequent
    operation takes place.
  • Achieving such tight consistency is costly
    because updates need to be executed as atomic
    operations, and global synchronization is
    required.
  • The only real solution is to loosen the
    consistency constraints. Various consistency
    models have been proposed and used for
    replication in DS.

5
Data-Centric Consistency Models
  • Consistency model: a contract between processes
    and the data store. If processes agree to obey
    certain rules, the store promises to work
    correctly.
  • Data store: any data made available by means of
    shared memory, a shared database, or a file
    system in DS. A data store may be physically
    distributed across multiple machines.
  • Each process that can access data from the store
    is assumed to have a local (or nearby) copy of
    the entire store available. Write operations are
    propagated to the other copies.
  • Consistency is discussed in the context of read
    and write operations on the data store. An
    operation that changes the data is classified as
    a write operation; otherwise, it is regarded as a
    read operation.

6
Data-Centric Consistency Models
  • The general organization of a logical data store,
    physically distributed and replicated across
    multiple processes.

7
Consistency Models: Strict Consistency
  • Strict consistency: any read on a data item x
    returns a value corresponding to the result of
    the most recent write on x.
  • The definition implicitly assumes the existence
    of absolute global time, so that the
    determination of "most recent" is unambiguous.
  • Uniprocessor systems have traditionally observed
    strict consistency. However, in DS it is
    impossible to make all writes instantaneously
    visible to all processes and to maintain an
    absolute global time order.
  • In the following, the symbol Wi(x)a (Ri(x)b) is
    used to indicate a write by (read from) process
    Pi to data item x with value a (returning value
    b).

8
Strict Consistency
  • Behavior of two processes, operating on the same
    data item.
  • (a) A strictly consistent store.
  • (b) A store that is not strictly consistent.

9
Consistency Models: Sequential Consistency
  • It is impossible to implement strict consistency
    in DS (why?). Furthermore, experience shows that
    users can often manage quite well with weaker
    consistency models.
  • Sequential consistency (Lamport, 1979): a data
    store is said to be sequentially consistent if it
    satisfies the following condition:
  • The result of any execution is the same as if
    the (read and write) operations by all processes
    on the data store were executed in some
    sequential order and the operations of each
    individual process appear in this sequence in the
    order specified by its program.
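
The condition above can be checked by brute force for tiny histories. The following is a minimal sketch, not an efficient algorithm: it tries every ordering of all operations, keeps only those that respect each process's program order, and accepts if some ordering makes every read return the latest written value. The function name and the tuple encoding of operations are assumptions made for illustration.

```python
from itertools import permutations

def is_sequentially_consistent(history, initial=None):
    """history maps a process name to its operations in program order.
    Each operation is ("w", item, value) or ("r", item, value_returned)."""
    ops = [(p, i) for p, prog in history.items() for i in range(len(prog))]
    for order in permutations(ops):
        next_index = {p: 0 for p in history}
        store = {}
        ok = True
        for p, i in order:
            if i != next_index[p]:        # violates program order of p
                ok = False
                break
            next_index[p] += 1
            kind, item, value = history[p][i]
            if kind == "w":
                store[item] = value
            elif store.get(item, initial) != value:
                ok = False                # read would not see this value
                break
        if ok:
            return True                   # one legal sequential order exists
    return False

# Both readers see b before a: some single order explains this.
consistent = {
    "P1": [("w", "x", "a")],
    "P2": [("w", "x", "b")],
    "P3": [("r", "x", "b"), ("r", "x", "a")],
    "P4": [("r", "x", "b"), ("r", "x", "a")],
}
# The readers disagree on the order of the two writes.
inconsistent = {
    "P1": [("w", "x", "a")],
    "P2": [("w", "x", "b")],
    "P3": [("r", "x", "b"), ("r", "x", "a")],
    "P4": [("r", "x", "a"), ("r", "x", "b")],
}
```

The search is factorial in the number of operations, so it is only usable for examples of the size shown on a slide.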

10
Consistency Models: Sequential Consistency
  • In the sequential consistency model, when
    processes run concurrently, possibly on different
    machines, any interleaving of read and write
    operations is acceptable behavior, but all
    processes must see the same interleaving of
    operations.
  • The examples on the next slide show that time
    does not play a role in sequential consistency.
  • Sequential consistency is comparable to
    serializability in the case of transactions. The
    former is defined in terms of read/write
    operations, while the latter is defined in terms
    of transactions.
  • The sequential consistency model is useful
    because programmers are taught to write programs
    in such a way that the exact order of statement
    execution does not matter. When such an order is
    essential, synchronization operations should be
    used.

11
Sequential Consistency (1)
  • (a) A sequentially consistent data store.
  • (b) A data store that is not sequentially
    consistent.

12
Sequential Consistency (2)
  • Three concurrently executing processes. The
    variables x, y, and z are initially 0.

13
Sequential Consistency (3)
  • Four valid execution sequences for the processes
    of the previous slide. The vertical axis is time.
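
The figures for these two slides did not survive in the transcript, so the sketch below assumes a commonly used form of the example: each of three processes sets one variable to 1 and then prints the other two. Under that assumption, it enumerates every interleaving that preserves program order and collects the printed output ("signature") of each. All names are illustrative.

```python
def interleavings(progs):
    """Yield every merge of the statement lists that keeps each list's order."""
    if not any(progs):
        yield []
        return
    for k, prog in enumerate(progs):
        if prog:
            rest = progs[:k] + [prog[1:]] + progs[k + 1:]
            for tail in interleavings(rest):
                yield [prog[0]] + tail

# Assumed program text (not recoverable from the transcript).
P1 = [("set", "x"), ("print", "P1", "y", "z")]
P2 = [("set", "y"), ("print", "P2", "x", "z")]
P3 = [("set", "z"), ("print", "P3", "x", "y")]

def run(schedule):
    mem = {"x": 0, "y": 0, "z": 0}
    out = {}
    for stmt in schedule:
        if stmt[0] == "set":
            mem[stmt[1]] = 1
        else:
            _, who, a, b = stmt
            out[who] = f"{mem[a]}{mem[b]}"
    # Signature: output of P1, then P2, then P3.
    return out["P1"] + out["P2"] + out["P3"]

schedules = list(interleavings([P1, P2, P3]))
signatures = {run(s) for s in schedules}
```

With two statements per process there are 6!/(2!·2!·2!) = 90 such interleavings. A signature of all zeros never appears, because every print follows at least one assignment that some later print must observe.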

14
Consistency Models: Weak Consistency
  • It is reasonable to let a process finish its
    critical section (of read/write operations) and
    only then make sure that the final results are
    sent everywhere.
  • Using synchronization variables, weak consistency
    models have the following three properties:
  • (1) Accesses to synchronization variables
    associated with a data store are sequentially
    consistent.
  • (2) No operation on a synchronization variable
    is allowed to be performed until all previous
    writes have been completed everywhere.
  • (3) No read or write operation on data items
    is allowed to be performed until all previous
    operations on synchronization variables have been
    performed.

15
Consistency Models: Weak Consistency
  • In weak consistency, all processes see all
    operations on synchronization variables in the
    same order (Property 1). When a synchronization
    is done, all previous writes are guaranteed to be
    done as well (Property 2). By synchronizing
    before reading shared data, a process can be sure
    of getting the most recent values (Property 3).
  • Weak consistency enforces (sequential)
    consistency on a group of operations rather than
    on individual reads and writes. It is most
    useful when isolated accesses to shared data are
    rare, with most accesses coming in clusters.
  • In weak consistency, we limit only the time
    when consistency holds, rather than limiting the
    form of consistency.
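
A toy model may help make the role of the synchronization variable concrete. In this sketch, which is a deliberate simplification and not a real protocol, each process writes only to its own copy, and a call to synchronize() completes that process's pending writes at every copy. The class and method names are assumptions.

```python
class WeakStore:
    """Each process has a local copy; writes stay local until synchronize()."""

    def __init__(self, n_processes):
        self.copies = [dict() for _ in range(n_processes)]
        self.pending = [dict() for _ in range(n_processes)]

    def write(self, pid, item, value):
        self.copies[pid][item] = value
        self.pending[pid][item] = value   # not yet visible elsewhere

    def read(self, pid, item):
        return self.copies[pid].get(item)

    def synchronize(self, pid):
        # Property 2: all earlier writes by pid are completed everywhere
        # before the synchronization is considered done.
        for copy in self.copies:
            copy.update(self.pending[pid])
        self.pending[pid].clear()

store = WeakStore(2)
store.write(0, "x", "a")
store.write(0, "x", "b")
before = store.read(1, "x")   # no guarantee yet: process 1 sees nothing
store.synchronize(0)
store.synchronize(1)
after = store.read(1, "x")    # after synchronizing, the latest value
```

Only the final value "b" is shipped, which illustrates why grouping writes inside a critical section can be cheaper than propagating each write separately.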

16
Weak Consistency (1)
    int a, b, c, d, e, x, y;        /* variables */
    int *p, *q;                     /* pointers */
    int f(int *p, int *q);          /* function prototype */

    a = x * x;                      /* a stored in register */
    b = y * y;                      /* b as well */
    c = a*a*a + b*b + a*b;          /* used later */
    d = a * a * c;                  /* used later */
    p = &a;                         /* p gets address of a */
    q = &b;                         /* q gets address of b */
    e = f(p, q);                    /* function call */
  • A program fragment in which some variables may be
    kept in registers.

17
Weak Consistency (2)
  • A valid sequence of events for weak consistency.
  • An invalid sequence for weak consistency.

18
Summary of Consistency Models

19
Distribution Protocols: Replica Placement
  • Several ways of distributing (propagating)
    updates to replicas, independent of the supported
    consistency model, have been proposed.
  • Replica placement: deciding where, when, and by
    whom copies of the data store are to be placed.
  • Three different types of copies can be
    distinguished: permanent replicas,
    server-initiated replicas, and client-initiated
    replicas. They are logically organized as shown
    on the next slide.
  • Permanent replicas: the initial set of replicas
    constituting a distributed data store.

20
Replica Placement
  • The logical organization of different kinds of
    copies of a data store into three concentric
    rings.

21
Distribution Protocols: Replica Placement
  • Server-initiated replicas: copies of a data store
    that exist to enhance performance. They are
    created at the initiative of the (owner of the)
    data store.
  • For example, it may be worthwhile to install a
    number of such replicas of a Web server in
    regions where many requests are coming from.
  • One of the major problems with such replicas is
    deciding exactly where and when the replicas
    should be created or deleted.
  • Server-initiated replication is gradually
    increasing in popularity, especially in the
    context of Web hosting services. Such hosting
    services can dynamically replicate files to
    servers close to demanding clients.

22
Server-Initiated Replicas
  • Counting access requests from different clients.
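
One way to turn access counts into a placement decision is to compare them against two thresholds. The sketch below is only an illustration of that idea: the threshold parameters, the region labels, and the three outcomes are assumptions, not details given on the slide.

```python
from collections import Counter

def placement_decision(requests, delete_below, replicate_above):
    """requests: one region name per access to a file at this server.
    Both thresholds are hypothetical tuning parameters."""
    per_region = Counter(requests)
    total = sum(per_region.values())
    if total < delete_below:
        return ("delete", None)            # too few requests to justify a copy
    if total > replicate_above:
        region, _ = per_region.most_common(1)[0]
        return ("replicate", region)       # copy toward the busiest region
    return ("keep", None)
```

For example, with delete_below=5 and replicate_above=10, eight requests from one region and four from another would suggest replicating toward the first region.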

23
Distribution Protocols: Replica Placement
  • Client-initiated replicas: copies created at the
    initiative of clients, commonly known as caches.
  • In principle, managing the cache is left entirely
    to the client, but there are many occasions in
    which the client can rely on the data store to
    inform it when the cached data has become stale.
  • Placement of client caches is relatively simple:
    a cache is normally placed on the same machine as
    its client, or on a machine shared by clients on
    the same LAN.
  • Data are generally kept in a cache for a limited
    amount of time to prevent extremely stale data
    from being used, or simply to make room for other
    data.
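
Keeping data for a limited time can be sketched as a cache whose entries expire after a fixed time-to-live. This is a minimal illustration; the class name, the ttl_seconds parameter, and the injectable clock (used here so the behavior can be shown without waiting) are assumptions.

```python
import time

class TTLCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}                 # key -> (value, time stored)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        hit = self.entries.get(key)
        if hit is None:
            return None
        value, stored = hit
        if self.clock() - stored > self.ttl:
            del self.entries[key]         # too old: treat as a miss
            return None
        return value

# Usage with a fake clock that the example advances by hand.
now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: now[0])
cache.put("page", "v1")
now[0] = 5.0
fresh = cache.get("page")     # still within the time limit
now[0] = 11.0
expired = cache.get("page")   # older than the limit, so dropped
```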

24
Distribution Protocols: Update Propagation
  • Update operations on replicas are generally
    initiated at a client and subsequently forwarded
    to one of the copies. From there, the update
    should be propagated to the other copies, while
    ensuring consistency.
  • What is to be propagated: there are three
    possibilities:
  • (1) a notification of an update
    (invalidation protocol)
  • (2) data from one copy to another
  • (3) the update operation to other copies
    (active replication).
  • In the invalidation protocol in (1), other copies
    are informed that an update has taken place and
    that the data they hold are no longer valid. It
    uses little network bandwidth and is suitable
    when the read-to-write ratio is relatively small,
    that is, when there are many updates compared to
    reads.

25
Distribution Protocols: Update Propagation
  • Transferring the modified data in (2) is useful
    when the read-to-write ratio is relatively high.
    It is also possible to log the changes and
    transfer only those logs to save bandwidth, and
    multiple modifications can be packed into a
    single message to save communication overhead.
  • In active replication in (3), updates can
    often be propagated at minimal bandwidth cost,
    provided the parameters associated with an
    operation are relatively small. However, more
    processing power may be required by each replica,
    especially for complex operations.
  • Whether updates are pushed or pulled: the
    push-based approach is referred to as a
    server-based protocol, while the pull-based one
    is referred to as a client-based protocol.
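
The three choices of what to propagate can be contrasted from the receiving replica's point of view. The sketch below is illustrative only; the class and handler names are assumptions, and a default of 0 is assumed for items that have no value yet.

```python
import operator

class Replica:
    def __init__(self):
        self.data = {}
        self.valid = set()

    def on_invalidate(self, key):
        # (1) Notification only: mark the local copy as no longer valid.
        self.valid.discard(key)

    def on_state(self, key, value):
        # (2) The modified data itself is shipped and stored.
        self.data[key] = value
        self.valid.add(key)

    def on_operation(self, key, op, arg):
        # (3) The operation is shipped and re-executed locally.
        self.data[key] = op(self.data.get(key, 0), arg)
        self.valid.add(key)

r = Replica()
r.on_state("x", 5)                    # whole value transferred
r.on_operation("x", operator.add, 3)  # only "add 3" transferred
r.on_invalidate("x")                  # only a small notice transferred
```

The message sizes differ: (1) carries just a key, (2) carries the full value, and (3) carries the operation and its parameters but costs computation at every replica.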

26
Distribution Protocols: Update Propagation
  • Push-based approach: updates are propagated to
    other replicas without those replicas even asking
    for them. It is often used between permanent and
    server-initiated replicas, when a relatively high
    degree of consistency is needed.
  • Pull-based approach: a server or client requests
    another server to send it any updates it has at
    that moment. It is often used by client caches
    and is efficient when the read-to-update ratio is
    relatively low.
  • Whether unicast or multicast should be used: with
    unicast, if a server that updates a replica sends
    its update to N other servers, it does so by
    sending N separate update messages, one to each
    server. With multicast, the underlying network
    takes care of sending a multicast message
    efficiently to multiple receivers.

27
Pull versus Push Protocols
  • A comparison between push-based and pull-based
    protocols in the case of multiple-client,
    single-server systems.
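
The comparison table itself is missing from the transcript. As a rough illustration of the difference, the sketch below puts both styles on one hypothetical server: pushing requires the server to remember its clients, while pulling leaves the server stateless with respect to clients but costs a poll per check. All names are assumptions.

```python
class Server:
    def __init__(self):
        self.value, self.version = None, 0
        self.subscribers = []             # state only the push style needs

    def update(self, value):
        self.value, self.version = value, self.version + 1
        for client in self.subscribers:   # push: one unicast per client
            client.receive(self.value, self.version)

    def poll(self, known_version):        # pull: answer only when asked
        if known_version < self.version:
            return self.value, self.version
        return None

class Client:
    def __init__(self):
        self.value, self.version = None, 0

    def receive(self, value, version):
        self.value, self.version = value, version

    def pull(self, server):
        fresh = server.poll(self.version)
        if fresh is not None:
            self.receive(*fresh)

server = Server()
pushed, pulled = Client(), Client()
server.subscribers.append(pushed)
server.update("v1")
stale = pulled.value        # the pulling client has not asked yet
pulled.pull(server)
```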

28
Consistency Protocols: Primary-Based Protocols
  • Consistency protocol: a description of an
    implementation of a specific consistency model,
    such as sequential consistency, weak consistency
    with synchronization variables, or atomic
    transactions.
  • Primary-based protocol: each data item x in the
    data store has an associated primary, which is
    responsible for coordinating write operations on
    x.
  • A distinction can be made as to whether the
    primary is fixed at a remote server or whether
    write operations can be carried out locally after
    moving the primary to the process where the write
    operation is initiated.

29
Consistency Protocols: Primary-Based Protocols
  • The simplest primary-based (remote-write)
    protocol is the one in which all read and write
    operations are carried out at a single (remote)
    server. Data are not replicated at all. This
    model is traditionally used in client-server
    systems.
  • The primary-backup (remote-write) protocols allow
    processes to perform read operations on a locally
    available copy, but require them to forward write
    operations to a (fixed) primary copy (see the
    following slides).
  • The primary-backup protocols provide a
    straightforward implementation of sequential
    consistency, as the primary can order all
    incoming writes.
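
The ordering role of the primary can be sketched as follows. This is a minimal, single-threaded illustration that assumes a blocking variant in which the primary updates every backup before acknowledging the write; the class names and the sequence-number log are assumptions.

```python
class Backup:
    def __init__(self):
        self.data = {}
        self.log = []                     # sequence numbers applied so far

    def apply(self, seq, key, value):
        self.data[key] = value
        self.log.append(seq)

    def read(self, key):                  # reads are served locally
        return self.data.get(key)

class Primary(Backup):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups
        self.next_seq = 0

    def write(self, key, value):
        seq = self.next_seq               # the primary fixes one global order
        self.next_seq += 1
        self.apply(seq, key, value)
        for backup in self.backups:       # update all backups before acking
            backup.apply(seq, key, value)
        return seq

b1, b2 = Backup(), Backup()
primary = Primary([b1, b2])
primary.write("x", "a")
primary.write("x", "b")
```

Because every write passes through one place, all backups apply the same writes in the same order, which is what sequential consistency requires of the writes.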

30
Remote-Write Protocols (1)
  • Primary-based remote-write protocol with a fixed
    server to which all read and write operations are
    forwarded.

31
Remote-Write Protocols (2)
  • The principle of primary-backup protocol.

32
Consistency Protocols: Primary-Based Protocols
  • In the simple primary-based (local-write)
    protocols, there is only a single copy of each
    data item x. Whenever a process wants to perform
    an operation on some data item, the item is first
    transferred to the process, and then the
    operation is performed.
  • In the primary-backup (local-write) protocols,
    the primary copy migrates between processes that
    wish to perform a write operation. The main
    advantage is that multiple, successive write
    operations can be carried out locally, while
    reading processes can still access their local
    copies.
  • Consistency is straightforward, as all write
    operations are carried out at a single replica,
    the current primary.
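
A small sketch of the migrating primary, under the assumption that the writer first receives the current primary's data and then becomes the primary itself. The names, the migration counter, and the explicit propagate() step are illustrative, not part of the slides.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class LocalWriteStore:
    def __init__(self, nodes):
        self.nodes = nodes
        self.primary = nodes[0]
        self.migrations = 0

    def write(self, node, key, value):
        if node is not self.primary:
            node.data.update(self.primary.data)  # bring the writer up to date
            self.primary = node                  # move the primary role
            self.migrations += 1
        node.data[key] = value                   # the write itself is local

    def propagate(self):
        # Bring the other copies up to date from the current primary.
        for n in self.nodes:
            if n is not self.primary:
                n.data.update(self.primary.data)

a, b = Node("a"), Node("b")
store = LocalWriteStore([a, b])
store.write(b, "x", 1)
store.write(b, "x", 2)
store.write(b, "y", 3)
migrations_for_three_writes = store.migrations
store.propagate()
```

Three successive writes by the same process cost a single migration, which is the advantage the slide describes.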

33
Local-Write Protocols (1)
  • Primary-based local-write protocol in which a
    single copy is migrated between processes.

34
Local-Write Protocols (2)
  • Primary-backup protocol in which the primary
    migrates to the process wanting to perform an
    update.

35
Consistency Protocols: Replicated-Write Protocols
  • Replicated-write protocols: write operations can
    be carried out at multiple replicas.
  • Active replication: each replica has an
    associated process that carries out update
    operations. Updates are generally propagated by
    means of the write operation that causes the
    update. It is also possible to send the update
    itself.
  • Quorum-based protocols: the basic idea is to
    require clients to request and acquire the
    permission of multiple servers before reading or
    writing a replicated data item.
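
The slide gives only the basic idea of quorums. A commonly used concrete form, assumed here, is that with N replicas a read quorum N_R and a write quorum N_W must satisfy N_R + N_W > N and N_W > N/2, so that any read quorum overlaps the latest write quorum. The function names and the (version, value) representation are illustrative.

```python
def valid_quorum(n, n_read, n_write):
    """Assumed constraints: read and write quorums overlap, and two
    write quorums overlap."""
    return n_read + n_write > n and 2 * n_write > n

def quorum_read(contacted):
    """contacted: (version, value) pairs from the replicas in a read
    quorum. The highest version is the most recent write."""
    return max(contacted, key=lambda pair: pair[0])[1]

# Five replicas; the last write (version 2) reached a write quorum of 3.
replicas = [(1, "old"), (1, "old"), (2, "new"), (2, "new"), (2, "new")]
# Any 3 replicas must include at least one with version 2.
result = quorum_read(replicas[:3])
```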

36
Consistency Protocols: Cache Coherence
  • In middleware-based DSs built on top of
    general-purpose OSs, software-based solutions for
    caches are more feasible than hardware-based
    ones.
  • Coherence detection strategy: decides when
    inconsistencies are actually detected. When a
    cached data item is accessed, the client needs to
    verify whether the data item is still consistent
    with the version stored at the server. If not,
    the new version at the server should be obtained.
  • Coherence enforcement strategy: determines how
    caches are kept consistent with the copies stored
    at the servers.
  • Two major methods for coherence enforcement: the
    first is to let a server send an invalidation to
    all caches whenever a data item is modified; the
    second is to simply propagate the update.
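
Detection on access can be sketched with version numbers: before using a cached item, the client compares its cached version with the server's and refetches on a mismatch. This is one possible detection strategy under assumed names (OriginServer, CachingClient); it is not presented as the method the slides have in mind.

```python
class OriginServer:
    def __init__(self):
        self.items = {}                   # key -> (version, value)

    def write(self, key, value):
        version = self.items.get(key, (0, None))[0] + 1
        self.items[key] = (version, value)

    def version_of(self, key):
        return self.items[key][0]

    def fetch(self, key):
        return self.items[key]

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.refetches = 0

    def read(self, key):
        cached = self.cache.get(key)
        # Verify the cached copy against the server before using it.
        if cached is None or cached[0] != self.server.version_of(key):
            self.cache[key] = self.server.fetch(key)
            self.refetches += 1
        return self.cache[key][1]

origin = OriginServer()
origin.write("x", "a")
client = CachingClient(origin)
first = client.read("x")      # miss: fetched from the server
second = client.read("x")     # hit: version still matches
origin.write("x", "b")
third = client.read("x")      # stale copy detected and refreshed
```

The version check is much smaller than the data item, so validation stays cheap even when the item is large.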

37
Summary I
  • Two major reasons for replicating data are
    improving reliability and enhancing performance
    in a DS.
  • Replication introduces a consistency problem:
    whenever a replica is updated, it becomes
    different from the others.
  • To keep replicas consistent, we need to propagate
    updates in such a way that temporary
    inconsistencies are not noticed, but doing so
    incurs a high cost (and may even be impossible in
    DS). The only solution is to see whether
    consistency can be somewhat relaxed.
  • Strict consistency states that a read operation
    always returns the most recent value written.
    Because DS lack a global time, it cannot be
    realized.

38
Summary II
  • Sequential consistency provides the semantics
    that users expect in concurrent programming: all
    write operations are seen by everyone in the same
    order.
  • Weak consistency assumes that each series of
    read/write operations is appropriately
    bracketed by accompanying operations on
    synchronization variables, such as locks. Such
    models are generally easier to implement
    efficiently than stronger models such as
    sequential consistency.
  • To propagate (distribute) updates, a distinction
    needs to be made concerning what exactly is
    propagated, to where updates are propagated, and
    by whom propagation is initiated.

39
Summary III
  • In update propagation, notifications, operations,
    or state can be propagated. Which replica is
    updated at which time depends on the distribution
    protocol. Finally, a choice can be made whether
    updates are pushed to other replicas, or a
    replica pulls in updates from another replica.
  • Primary-based and replicated-write protocols are
    consistency protocols for sequential consistency
    and its variants.
  • In primary-based protocols, all update
    operations are forwarded to a primary copy that
    subsequently ensures the update is properly
    ordered and forwarded.
  • In replicated-write protocols, an update is
    forwarded to several replicas at the same time.
    Correctly ordering operations often becomes more
    difficult.