1
Distributed Systems: Message Passing, Clusters,
and Implementation of Clusters in
Representative Operating Systems

2
Distributed message passing
  • Communication and synchronization mechanisms in
    distributed systems
  • Distributed message passing
  • Remote procedure call
  • An implementation approach for message passing
  • Use the services of a message-passing module
  • Service is requested in the form of primitives
    and parameters

3
Distributed message passing (cont.)
  • Send primitive
  • Parameters
  • Destination process identifier
  • The message contents
  • Operation
  • Sending process uses Send primitive
    (destination, message contents)
  • Message-passing module constructs data unit with
    destination and contents
  • Data unit is sent to the destination machine
    using communication facility (e.g., TCP/IP)
  • Data unit is received by the destination machine
    and is routed by the communication facility to
    the message-passing module
  • The message-passing module stores the message in
    the buffer for the destination process
  • Receive primitive
  • Operation
  • Destination process allocates a buffer area for
    messages and issues a Receive primitive to the
    message-passing module
  • Alternatively, the message-passing module signals
    the destination process when a message arrives
    and places the message in a shared buffer
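
A minimal sketch of this Send/Receive flow, assuming a TCP-based
communication facility and an in-memory per-process buffer; the function
names, port handling, and wire format are illustrative, not taken from the
slides:

    # Sketch of Send/Receive primitives over a TCP communication facility.
    import pickle
    import queue
    import socket

    inboxes = {}   # per-process message buffers kept by the message-passing module

    def send(dest_host, dest_port, dest_pid, contents):
        """Send primitive: build a data unit (destination id + contents)
        and hand it to the communication facility."""
        data_unit = pickle.dumps((dest_pid, contents))
        with socket.create_connection((dest_host, dest_port)) as s:
            s.sendall(data_unit)

    def message_passing_module(listen_port):
        """Receive incoming data units and store each message in the
        buffer of its destination process."""
        srv = socket.create_server(("", listen_port))
        while True:
            conn, _ = srv.accept()
            with conn:
                # A single recv suffices for the small messages in this sketch.
                dest_pid, contents = pickle.loads(conn.recv(65536))
                inboxes.setdefault(dest_pid, queue.Queue()).put(contents)

    def receive(pid):
        """Receive primitive: take the next message from this process's buffer."""
        return inboxes.setdefault(pid, queue.Queue()).get()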

4
Distributed message passing (cont.)
  • Design issues
  • Reliability vs. unreliability
  • Blocking vs. non-blocking
  • Reliability vs. unreliability
  • Reliable message passing
  • Guarantees delivery if possible
  • Uses a reliable transport protocol
  • Performs error checking, acknowledgment,
    retransmission, and reordering of messages if
    delivered out of sequence
  • Acknowledgment to the sending process that
    delivery either succeeded or failed (e.g., due to
    network failure)
  • Unreliable message passing
  • Message-passing facility sends the message
    without reporting success or failure
  • Message passing facility has a simple design and
    low overhead
  • Applications may use Request and Reply to
    acknowledge delivery

5
Distributed message passing (cont.)
  • Blocking vs. non-blocking
  • Blocking or synchronous primitives
  • Blocking Send does not return control to the
    sending process (process suspended)
  • until
  • Message has been transmitted (unreliable
    service), or
  • Message has been sent and an acknowledgment
    received (reliable service)
  • Blocking Receive does not return control to the
    receiving process until
  • Message has been placed in the allocated buffer

6
Distributed message passing (cont.)
  • Blocking vs. non-blocking
  • Non-blocking or asynchronous primitives
  • Send primitive does not suspend process
  • Control returned to the process as soon as the
    message has been queued for transmission or a
    copy has been made
  • After the message has been transmitted or copied
    to a safe place for later transmission, sending
    process is interrupted to be informed that the
    message buffer is available
  • Receive primitive does not suspend process
  • Process is sent an interrupt upon message arrival
    or process can poll periodically for messages
  • Advantages/disadvantages
  • Efficient use of the message-passing mechanism
  • Difficult to test and debug; time-dependent
    sequences can lead to obscure bugs
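
A small sketch contrasting the two Receive styles against the kind of
per-process buffer described above; the buffer layout and polling interval
are assumptions:

    # Blocking vs. non-blocking Receive against a per-process message buffer.
    import queue
    import time

    inbox = queue.Queue()    # buffer filled by the message-passing module

    def receive_blocking():
        # Suspends the caller until a message has been placed in the buffer.
        return inbox.get(block=True)

    def receive_nonblocking():
        # Returns immediately; None means "no message yet".
        try:
            return inbox.get_nowait()
        except queue.Empty:
            return None

    def poll_for_messages():
        # Non-blocking style: the process polls periodically for messages.
        while True:
            msg = receive_nonblocking()
            if msg is not None:
                return msg
            time.sleep(0.1)   # do other useful work between polls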

7
Remote procedure calls
  • Provides access to remote services using simple
    procedure call/return semantics, similar to those
    used for local services
  • Advantages
  • The procedure call is a widely used and
    well-understood abstraction
  • Remote interfaces can be specified and clearly
    documented as a set of named operations with
    designated types
  • The interface is standardized
  • The communication code for an application can be
    generated automatically
  • Client/server modules can be easily ported
    between different OSs and target systems
  • Example of a procedure call in the calling
    program
  • CALL P(X, Y)
  • where P is the procedure name
  • X the passed arguments
  • Y the returned values

8
Remote procedure calls (cont.)
  • Dummy or stub procedure on the local machine
    Included in the caller's address space or
    dynamically linked at call time
  • Creates message identifying remote procedure and
    includes parameters
  • Sends message to remote system and waits for
    reply
  • When reply arrives, it returns to the calling
    program providing the returned values
  • Dummy or stub procedure on the remote machine
  • Upon receiving the message, generates a local
    CALL P (X, Y)
  • Returns reply
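
The stub mechanism can be sketched with Python's standard xmlrpc library:
the server-side stub dispatches the incoming message to a local call of P,
while the client-side proxy packs the parameters, sends the request, and
waits for the reply. The host, port, and procedure P are illustrative:

    # RPC sketch: client-side and server-side stubs using the standard library.
    from xmlrpc.client import ServerProxy
    from xmlrpc.server import SimpleXMLRPCServer

    def P(x):
        # The remote procedure; the server-side stub generates this local call.
        return x * 2

    def run_server():
        server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
        server.register_function(P)        # expose P through the server stub
        server.serve_forever()

    def run_client():
        proxy = ServerProxy("http://localhost:8000")   # client-side stub
        y = proxy.P(21)    # looks like a local CALL P(X, Y); blocks until reply
        print(y)           # 42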

9
Remote procedure calls (cont.)
  • Design issues
  • Parameter passing
  • Call by value (parameters passed as values)
  • Parameters copied into the message and sent to
    remote system
  • Easy to implement for RPCs
  • Call by reference (pointers to a location that
    contains the value)
  • More difficult to implement for RPCs
  • Parameters and results representation
  • No problem if the calling and called programs use
    the same language and run on the same type of OSs
    and machines
  • If there are differences, the remote procedure
    call mechanism must provide conversion to a
    standardized format for common objects (e.g.,
    integers, characters)
  • Client/server binding
  • A client/server binding is established after the
    two applications have made a logical connection
    and are ready to exchange commands and data
  • Non-persistent binding: Logical connection
    between the two processes is established at the
    time of the RPC and disconnected after the values
    are returned
  • Persistent binding: Connection set up for the RPC
    remains up after the return

10
Remote procedure calls (cont.)
  • Design issues (cont.)
  • Synchronous vs. asynchronous
  • Synchronous RPC
  • Calling process waits for the returned values
  • Traditional, functions like a subroutine call
  • Easy to understand and test but leads to lower
    performance
  • Asynchronous RPC
  • Calling process is not blocked
  • Methods for synchronizing the client and the
    server
  • Higher-layer applications in both client and
    server initiate the exchange and then verify
    that all actions have been completed
  • Client uses a series of asynchronous RPCs
    followed by a synchronous RPC
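
The second synchronization method above, a series of asynchronous RPCs
followed by a synchronous RPC, can be sketched as follows; rpc_call is a
hypothetical stand-in for a real client-side stub:

    # A batch of asynchronous RPCs followed by one synchronous, confirming RPC.
    from concurrent.futures import ThreadPoolExecutor

    def rpc_call(operation, argument):
        # Placeholder for a real client-side stub (hypothetical).
        return ("ok", operation, argument)

    def update_all(items):
        with ThreadPoolExecutor() as pool:
            # Asynchronous RPCs: the caller is not blocked per call.
            futures = [pool.submit(rpc_call, "update", item) for item in items]
            # Final synchronous RPC: the caller waits for the returned value,
            # confirming that the earlier updates have completed.
            for f in futures:
                f.result()
            return rpc_call("commit", None)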

11
Remote procedure calls (cont.)
  • Design issues (cont.)
  • Object-oriented mechanisms
  • Operation
  • Client sends request to an object request broker
  • Broker acts as a directory of all remote services
    on the network. Broker calls appropriate remote
    object and passes data.
  • Remote object services the request and replies to
    the broker, which returns the response to the
    client
  • Competing approaches
  • Common Object Request Broker Architecture (CORBA)
    from the Object Management Group, backed by IBM,
    Apple, Sun
  • Component Object Model (COM), the basis for
    Object Linking and Embedding (OLE) from Microsoft

12
Clusters
  • Cluster: a group of interconnected computers
    (nodes) working together as a unified computing
    resource and creating the illusion of being one
    machine
  • Advantages of clusters
  • Absolute scalability
  • Clusters can consist of hundreds of machines,
    each being a multiprocessor
  • Incremental scalability
  • A cluster can grow in small increments with
    minimum service disruption
  • High availability
  • Fault-tolerant operation in software
  • Superior price/performance
  • Off-the-shelf building blocks

13
Clusters (cont.)
  • Cluster configurations
  • Passive standby
  • Active system processes the entire load; the
    standby takes over in case of failure of the
    primary
  • Active sends heartbeat messages to the standby to
    indicate continued operation (sketched below)
  • High cost; no task sharing
  • Easy to implement
  • Active secondary
  • Secondary server is also used for processing
    tasks
  • Reduced cost due to task sharing
  • Increased complexity
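
A sketch of the passive-standby heartbeat mentioned above: the active node
sends periodic heartbeats, and the standby takes over after a period of
silence. The interval, timeout, and the send/receive helpers are
assumptions not specified in the slides:

    # Passive standby with heartbeat messages.
    import time

    HEARTBEAT_INTERVAL = 1.0   # seconds between heartbeat messages
    FAILOVER_TIMEOUT = 3.0     # silence after which the standby takes over

    def active_node(send_heartbeat):
        while True:
            send_heartbeat()               # "I am still operating"
            time.sleep(HEARTBEAT_INTERVAL)

    def standby_node(wait_for_heartbeat, take_over):
        last_seen = time.monotonic()
        while True:
            if wait_for_heartbeat(timeout=HEARTBEAT_INTERVAL):
                last_seen = time.monotonic()
            elif time.monotonic() - last_seen > FAILOVER_TIMEOUT:
                take_over()                # primary presumed failed
                return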

14
Clusters (cont.)
  • Cluster configurations (cont.)
  • Separate servers
  • Each server has its own disk, no disks shared
  • Data copied between servers periodically
  • Scheduling assigns client requests to servers to
    balance the load
  • High availability
  • High server and network overhead due to data
    copying
  • Shared disks, non-shared volumes (shared nothing)
  • Common disks are partitioned into volumes, each
    volume owned by only one computer
  • On computer failure, cluster is reconfigured to
    assign volumes to remaining computers
  • Shared disks, shared volumes
  • Each computer has access to all volumes on all
    disks
  • Locking mechanism used to ensure that data is
    accessed by one computer at a time
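
The locking idea for shared volumes can be sketched with an advisory file
lock as a stand-in; a real cluster would use a distributed lock manager
rather than fcntl on one host, and the path and data layout are
illustrative:

    # One-writer-at-a-time access to shared data, sketched with a file lock.
    import fcntl

    def update_shared(path: str, new_data: bytes) -> None:
        with open(path, "r+b") as f:
            fcntl.flock(f, fcntl.LOCK_EX)   # exclusive access while writing
            try:
                f.seek(0)
                f.write(new_data)
                f.truncate()
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)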

15
Clusters (cont.)
  • OS design issues
  • Failure management
  • Highly available clusters
  • High probability that all resources will be in
    service
  • In case of failure, the queries in progress are
    lost
  • If retried, the query will be serviced by another
    computer in the cluster
  • Fault-tolerant clusters
  • Redundant shared disks and fault-tolerant
    operations
  • Fail-over: switching an application from a failed
    system to an alternative system
  • Fail-back: restoration of applications and data
    resources to the originally failed system after
    it recovers
  • Load balancing
  • Load must be balanced among available computers
  • When a new computer is added to the cluster, the
    load needs to be rebalanced to include the new
    computer

16
Clusters (cont.)
  • OS design issues (cont.)
  • Parallelizing computation: executing software
    from a single application in parallel
  • Parallelizing compiler
  • It is determined, at compile time, which parts of
    the application can be run in parallel
  • The parallel parts are assigned to different
    computers in the cluster
  • Parallelized application
  • The application is designed to run on the cluster
    and uses message passing for communication
  • Most powerful approach to exploit clusters
  • Parametric computing
  • Useful for programs that must be executed a large
    number of times, each time with a different set
    of parameters (e.g., a simulation model)
  • Parametric processing tools are needed to
    organize, run, and manage the jobs
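
Parametric computing can be sketched as fanning the same program out over a
grid of parameter sets; here a local process pool stands in for the
cluster's job-management tools, and simulate() is a hypothetical model:

    # Run one simulation model many times, each with a different parameter set.
    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    def simulate(params):
        # Hypothetical single run of the simulation model.
        return params["rate"] * params["seed"]

    if __name__ == "__main__":
        grid = [{"rate": r, "seed": s}
                for r, s in product((0.1, 0.5, 0.9), range(10))]
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(simulate, grid))
        print(len(results), "runs completed")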

17
Clusters (cont.)
  • Cluster computer architecture
  • All computers are interconnected by a high-speed
    LAN or switch
  • Each computer is capable of operating
    independently
  • A middleware layer of software runs on each
    computer to implement the cluster functionality
  • Provides a unified system image to the user,
    called a single-system image
  • Is responsible for providing load balancing and
    high availability
  • Middleware services and functions
  • Single entry point: A user logs into the cluster,
    not on a specific computer
  • Single file hierarchy: The user sees only a
    single file hierarchy, under one root directory
  • Single control point: A default workstation is
    used for cluster management and control
  • Single virtual networking: There is a single
    virtual network connecting the cluster computers,
    even if it consists of multiple interconnected
    networks

18
Clusters (cont.)
  • Middleware services and functions (cont.)
  • Single memory space: A distributed shared memory
    is used to share variables
  • Single job-management system: The cluster has a
    job scheduler, and jobs are submitted to the
    cluster and not to individual computers
  • Single user interface: A common graphic interface
    is used for all users, regardless of the
    workstation they use to enter the cluster
  • Single I/O space: Any node can access any I/O
    device
  • Single process space: A process on any node can
    create or communicate with any other process in
    the cluster
  • Check-pointing: Process states and intermediate
    results are saved periodically, permitting
    rollback recovery after failures
  • Process migration: Processes can move inside the
    cluster to provide load balancing

19
Clusters (cont.)
  • Clusters compared with SMPs
  • SMPs
  • Easier to manage and configure than clusters
  • Much closer to the original uniprocessor model
  • Major difference from the uniprocessor is the
    scheduler function
  • Uses less physical space and requires less energy
    than a comparable cluster
  • SMP products are well established and stable
  • Clusters
  • Far superior to SMPs in terms of absolute and
    incremental scalability
  • Far superior in terms of availability
  • Clusters are likely to dominate the
    high-performance server market

20
Windows 2000 Cluster Server
  • The configuration is a shared-nothing cluster,
    where each volume and other resources are owned
    by a single system at a time (initially
    code-named Wolfpack)
  • Main concepts
  • Cluster Service
  • The software on each node responsible for
    cluster-specific activities
  • Resource
  • These are the resources managed by the cluster
    service
  • They are objects representing either physical
    hardware devices (e.g., disk drives, network
    cards) or logical items (e.g., disk volumes, IP
    addresses, applications, databases)
  • Resources are implemented as dynamically linked
    libraries (DLLs) and managed by a resource
    monitor
  • Online: A resource is online at a node if it
    provides a service at that node
  • Group
  • A collection of resources that are managed as a
    single entity
  • Consists of all elements needed to run a specific
    application and to allow the client systems to
    connect to the service provided by that
    application
  • Operations can be performed on the entire group
    (e.g., transfer to another node)

21
Windows 2000 Cluster Server (cont.)
22
Windows 2000 Cluster Server (cont.)
  • The W2K Cluster Server components and their
    relationship in a single node of a cluster
  • Node manager
  • Responsible for maintaining this node's
    membership in the cluster
  • It sends periodic heartbeat messages to the node
    managers of the other nodes in the same cluster
  • If it detects the loss of heartbeat messages from
    another node
  • It broadcasts a message to the entire cluster
  • All members exchange messages to verify their
    view of current cluster membership
  • If a node manager does not reply, it is removed
    from the cluster and its active groups are
    transferred to one or more of the other nodes in
    the cluster
  • Configuration database manager
  • Responsible for the cluster configuration
    database
  • The database has information about all cluster
    resources, groups, and node ownership of groups
  • Database managers on all nodes communicate with
    each other to maintain a consistent view of
    configuration information in the cluster
  • The integrity of the database is maintained by
    using fault-resistant software for all changes to
    cluster configuration

23
Windows 2000 Cluster Server (cont.)
  • The W2K Cluster Server components and their
    relationship in a single node of a cluster
    (cont.)
  • Resource manager / fail-over manager
  • Responsible for management of resource groups
  • Initiates actions such as startup, reset, and
    fail-over
  • In case of fail-over, the fail-over managers on
    the active nodes negotiate the redistribution of
    resource groups from the failed node to the
    remaining active ones
  • When the node that failed has recovered, the
    fail-over managers may decide to move back some
    groups
  • Event processor
  • Connects all the components of the cluster
    service
  • Handles common operations
  • Controls cluster service initialization
  • Communications manager
  • Provides the facilities for message exchange with
    other nodes in the cluster
  • Global update manager
  • Provides an update service for other components

24
Sun cluster
  • Solaris UNIX has been extended to make the Sun
    Cluster distributed operating system
  • It appears to users and applications as a single
    computer running the Solaris OS
  • Components
  • Object and communications support
  • Process management
  • Networking
  • Global distributed file system

25
Sun cluster (cont.)
  • Object and communications support
  • Object-oriented: uses the CORBA object model to
    define objects and the remote procedure call
    (RPC) mechanism
  • Global process management
  • The location of a process is transparent to the
    user
  • Each process has a unique identifier within the
    cluster
  • Process migration is possible: a process can move
    from node to node for load balancing and for
    fail-over (caveat: the threads of a single
    process must stay on the same node)
  • Networking
  • Strategy
  • A packet filter is used to route packets to the
    proper node
  • Cluster appears externally as a single server
    with a single IP address
  • Operation
  • Incoming packets are received on the node that
    has the network adapter, filtered, and delivered
    over the cluster interconnect to the correct
    target node for protocol processing
  • For outgoing packets, the originating node
    performs protocol processing and transfers the
    packet over the cluster interconnect to the node
    that has the external network connection
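
The packet-filter idea can be sketched as picking a target node per
connection and forwarding over the interconnect; the hashing policy and the
forward helper are assumptions, not Sun Cluster's actual mechanism:

    # Route each incoming connection to one cluster node for protocol processing.
    NODES = ["node1", "node2", "node3"]

    def route(src_ip, src_port, dst_port):
        # Keep all packets of one connection on the same node.
        return NODES[hash((src_ip, src_port, dst_port)) % len(NODES)]

    def forward(target_node, packet):
        # Stand-in for transfer over the cluster interconnect (hypothetical).
        print("forwarding to", target_node, len(packet["payload"]), "bytes")

    def deliver(packet):
        target = route(packet["src_ip"], packet["src_port"], packet["dst_port"])
        forward(target, packet)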

26
Sun cluster (cont.)
  • Global file system
  • Like standard Solaris, Sun Cluster is based on
    the concepts of the virtual node (vnode) and the
    virtual file system (vfs)
  • Standard Solaris
  • Vnode
  • The vnode structure is used to provide a
    general-purpose interface to all types of file
    systems
  • A vnode provides mapping to an object in any file
    system type (by contrast, an inode in UNIX can
    provide mapping to UNIX files only)
  • The vnode interface accepts general-purpose file
    manipulation commands (e.g., read, write) and
    translates them into the actions appropriate for
    the respective file system
  • Vfs
  • Vfs structures are used to describe entire file
    systems
  • The vfs interface accepts general-purpose
    commands that operate on entire file systems and
    translates them into actions appropriate for a
    particular file system
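
The idea of a general-purpose file interface can be sketched as an abstract
class whose operations are translated into actions for a particular
file-system type; the class and method names are illustrative, not
Solaris's actual vnode operations:

    # A vnode-like general-purpose interface, with one concrete file-system type.
    from abc import ABC, abstractmethod

    class VNode(ABC):
        """General-purpose interface to an object in any file-system type."""

        @abstractmethod
        def read(self, offset: int, length: int) -> bytes: ...

        @abstractmethod
        def write(self, offset: int, data: bytes) -> None: ...

    class LocalFileVNode(VNode):
        """Translates the generic operations into local file actions."""

        def __init__(self, path: str):
            self.path = path

        def read(self, offset, length):
            with open(self.path, "rb") as f:
                f.seek(offset)
                return f.read(length)

        def write(self, offset, data):
            with open(self.path, "r+b") as f:
                f.seek(offset)
                f.write(data)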

27
Sun cluster (cont.)
  • Global file system (cont.)
  • Global file access
  • The global file system provides a uniform
    interface to files distributed over the cluster
  • Processes on all nodes use the same pathname to
    locate a file and can open any file
  • Implementation
  • A proxy file system was built on top of the
    existing Solaris file system at the vnode
    interface
  • Vfs/vnode operations are converted by the proxy
    layer into object invocations
  • The invoked object may reside on any node in the
    cluster; it performs a local vnode/vfs operation
    on the underlying file system
  • Caching is used for file contents, directory
    information, and file attributes
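
Building on the VNode sketch above, the proxy layer can be pictured as a
vnode whose operations are converted into invocations on an object that may
live on another node; the remote reference (e.g., an RPC or CORBA object)
is hypothetical:

    # Proxy vnode: generic operations become invocations on a (possibly remote)
    # object, which performs the local vnode operation on its own node.
    class ProxyVNode(VNode):
        def __init__(self, remote_object):
            self.remote = remote_object     # e.g., an RPC/CORBA object reference

        def read(self, offset, length):
            return self.remote.read(offset, length)   # forwarded to the owner node

        def write(self, offset, data):
            return self.remote.write(offset, data)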

28
Beowulf and Linux clusters
  • Beowulf
  • Beowulf project
  • Initiated under the NASA High Performance
    Computing and Communications (HPCC) project
  • Goal: expand the capabilities of clustered PCs
    for performing important computational tasks
  • Widely implemented; the most important new
    cluster technology available
  • Beowulf features
  • Use of off-the-shelf components, no custom
    components, available from many vendors
  • Dedicated processors
  • Dedicated private network (LAN or WAN or
    inter-networked combination)
  • Scalable I/O
  • Free software base and distributed computing
    tools
  • Return of the design and improvements to the
    community

29
Beowulf and Linux clusters (cont.)
30
Beowulf and Linux clusters (cont.)
  • Most Beowulf implementations use a cluster of
    Linux workstations or PCs
  • A representative Linux implementation of Beowulf
    contains
  • A number of workstations (not necessarily the
    same platform) all running Linux
  • Secondary storage at each workstation can be
    available for distributed access (e.g.,
    distributed file sharing)
  • The Linux nodes are interconnected with an
    off-the-shelf network (e.g., Ethernet switch or
    an interconnected set of Ethernet switches)
  • Beowulf software
  • Open-source Beowulf software
  • Beowulf tools and utilities
  • Linux kernel, modified to allow the individual
    nodes to participate in a number of global
    namespaces

31
Beowulf and Linux clusters (cont.)
  • Examples of Beowulf system software
  • Beowulf distributed process space (BPROC)
  • Allows a process to span multiple nodes in a
    cluster environment
  • Provides a mechanism for starting a process on
    another node without logging into that node
  • Makes all remote processes visible in the process
    table of the cluster's front-end node
  • Beowulf Ethernet channel bonding
  • Mechanism that joins multiple networks into a
    single logical network with high bandwidth
  • Distributes packets over the available device
    transmit queues
  • Provides load balancing over multiple Ethernets
    connected to Linux workstations
  • PVMSYNC
  • Provides a synchronization mechanism and shared
    data objects within a cluster
  • EnFusion
  • Set of tools for parametric computing, i.e.,
    execution of a program as a large number of jobs,
    each with different parameters