Title: Distributed Processing and Networking
1Distributed Processing and Networking
- Chapter 13
- A Brief Overview
2Centralized Systems v. Distributed Systems
- Centralized system (uniprocessor, SMP)
- shared memory, tightly coupled hardware, single clock
- user applications run on the central computer; data storage is centralized as well
- users may have a terminal or low-end PC for communication with the central computing facility
- some processes run locally, others on the large centralized system.
- processes communicate using the shared memory model
3Centralized Systems v. Distributed Systems
- Distributed Systems
- multiple separate, complete computers; each has its own memory and clock
- individual computers (nodes) are connected by some kind of communications network
- nodes share resources: disk storage, I/O facilities, etc.
- processes at separate nodes communicate using the message passing model
- some processes run locally; some could benefit from being distributed across several processors.
4Software for Distributed Systems
- Distributed systems can be connected by software in a number of different ways.
- A variety of connective techniques exist, as the following examples show:
- communications architecture
- network operating system (NOS)
- distributed operating system (DOS)
5Communications Architecture
- Software to connect computers that are primarily stand-alone.
- The connection is designed to support applications such as email, file transfer, etc.
- Computers on the network may be heterogeneous and have different operating systems.
- TCP/IP is the most common example
6Network Operating System (NOS)
- The distributed system consists of a network of machines, including servers
- Servers support file system, printers, etc.
- The NOS is an add-on to the local OS
- The user is fully aware that there are separate machines in the system.
- A common communications architecture supports the NOS
- NFS (Network File System) and Windows NT are examples
7Distributed Operating System (DOS)
- A DOS tries to make a distributed system look like a centralized system
- Users can transparently access all system resources as if they were local; there is no need to name remote sites explicitly.
- A DOS must still use some kind of communications architecture.
- True distributed operating systems are still mostly experimental.
8Review
- Centralized versus distributed architectures
- Architectural differences
- System software for distributed architectures
- Communications software (e.g., TCP/IP)
- NOS (TCP/IP plus non-transparent resource sharing)
- DOS (transparent resource sharing)
9The Need for Communication Protocols
- A protocol is a formal set of rules that governs interaction between two entities (here, processes or computers)
- Issues include:
- agreement on data format and message format
- a negotiation to make sure the receiver is ready to accept a message
- a routing mechanism to forward the message across the network
- numerous other details
10Communication Protocols
- Communication protocols are designed as layered systems.
- The same set of protocols exists on each machine; communication is between peer layers on the communicating machines.
- Layers on the sender side add information to the message; the corresponding layer on the receiver side uses that information.
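The peer-layer idea above can be sketched in a few lines of Python. This is only an illustration: the layer names and the bracketed header format are assumptions made up for the example, not a real protocol's wire format.

```python
# Hypothetical layer names and header format, chosen only for illustration.
LAYERS = ["application", "transport", "internet", "network-access"]

def send_down(payload: bytes) -> bytes:
    """Sender side: each layer, top to bottom, wraps the data in its own header."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]".encode() + frame
    return frame

def receive_up(frame: bytes) -> bytes:
    """Receiver side: each peer layer, bottom to top, checks and strips its header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"{layer} layer: unexpected header")
        frame = frame[len(header):]
    return frame
```

Each layer on the receiver only looks at the header written by its peer on the sender, which is why the two stacks must run the same set of protocols.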
11Protocol Architecture
- A protocol architecture describes the functions that must be performed to support computer communication.
- The architecture structures the functions into a set of layered modules.
- The next slide shows a simplified architecture for accomplishing a file transfer.
12File Transfer
13The File Transfer Protocol
- In the previous slide, each module incorporates several logical functions.
- e.g., the file transfer application is concerned with things like passwords and specific file operations
- Each module provides services to, or receives services from, other modules in the stack.
14TCP/IP Protocol Architecture
- TCP/IP (Transmission Control Protocol/Internet Protocol) was developed by DARPA to provide networking for defense-related projects.
- Today, TCP/IP is the basic communication architecture for the Internet.
15Packet Switching
- TCP/IP is an example of a packet-switching protocol.
- The original message is broken up into small units (packets) of limited size.
- Headers are added to each packet incrementally by the sender's protocol layers.
- The receiver uses the headers to interpret the message.
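A minimal sketch of packetizing, assuming a made-up header with just a sequence number and a packet count (real IP and TCP headers carry much more):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    seq: int      # header field: position of this packet in the message
    total: int    # header field: number of packets in the whole message
    data: bytes   # payload

def packetize(message: bytes, size: int = 8) -> list[Packet]:
    """Split a message into packets of at most `size` payload bytes."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]

def reassemble(packets: list[Packet]) -> bytes:
    """Use the header fields to restore the original message, whatever the arrival order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("missing packets")
    return b"".join(p.data for p in ordered)
```

Because each packet carries its own header, the receiver can rebuild the message even when packets arrive out of order.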
16TCP/IP Layers
- One view of TCP/IP divides it into 5 layers, from the bottom (hardware dependent) to the top (abstractions):
- Physical layer
- Network access layer
- Internet layer (IP)
- Transport layer (TCP)
- Application layer
17TCP/IP Layers
- Physical layer: governs the physical interface between a computer and the network
- Network access layer: handles details of data exchange between a computer and the network. This layer is network dependent (e.g., Ethernet versus Myrinet).
18TCP/IP Layers
- Internet (network) layer: handles routing (point-to-point transmission) of packets.
- Packets are routed from sender to receiver, possibly through multiple steps.
- Routers are special processors that connect two networks and switch a packet from one network to the next.
- At each step the network layer handles the transmission.
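Hop-by-hop forwarding can be sketched with a next-hop table per node. The topology (hosts A and B, routers R1 and R2) and the table layout are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical topology: each node maps a destination to the next hop toward it.
ROUTES = {
    "A":  {"B": "R1"},
    "R1": {"B": "R2"},
    "R2": {"B": "B"},
}

def route(src: str, dst: str, routes: dict, max_hops: int = 16) -> list[str]:
    """Follow next-hop entries from src to dst and return the path taken."""
    path = [src]
    node = src
    while node != dst:
        if len(path) > max_hops:
            raise RuntimeError("too many hops; possible routing loop")
        node = routes[node][dst]   # each step is one point-to-point transmission
        path.append(node)
    return path
```

No node needs to know the whole path; each one only decides where the packet goes next.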
19TCP/IP Layers
- Transport (host-to-host) layer: responsible for providing reliable transmission.
- Packets are numbered and transmitted; when they are received, they are reassembled in the original order.
- Each packet must be acknowledged. If the sender fails to get an ack, the transport layer will re-transmit the message.
- Applications that don't want the added overhead can use UDP instead of TCP at this level.
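The acknowledge-and-retransmit idea can be sketched as a stop-and-wait loop over a simulated lossy channel. This is a deliberate simplification, not how TCP itself works (TCP uses sliding windows and timers); the loss model, `loss_rate`, and `max_tries` are assumptions for the example.

```python
import random

def reliable_send(chunks, loss_rate=0.3, max_tries=20, seed=1):
    """Send numbered chunks one at a time, retransmitting until each is acknowledged.

    Returns the chunks as delivered (in order) and the total number of transmissions.
    """
    rng = random.Random(seed)      # simulated channel, seeded for repeatability
    received = {}
    attempts = 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            attempts += 1
            if rng.random() >= loss_rate:   # packet arrived and its ack came back
                received[seq] = chunk
                break
            # otherwise: no ack, so the sender transmits the same packet again
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return [received[i] for i in range(len(chunks))], attempts
```

The extra transmissions are the overhead that an application avoids by choosing UDP, at the cost of handling loss itself.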
20TCP/IP Layers
- Application layer: contains the code needed to support network applications.
- There must be a separate module for each application.
- Applications that run on TCP/IP include SMTP (Simple Mail Transfer Protocol), FTP, and TELNET.
21Ports
- Messages sent from one host machine to another are associated with a specific process at each end.
- The network layer only needs to know the sending and receiving hosts, to get the message to the right computer.
- The transport layer needs to know the process identity
- A port is associated with a particular process, so messages are actually sent from Host X, Port Y to Host I, Port J.
22Sockets
- Socket concept developed at Berkeley.
- Every message has a source port and a destination port.
- Host IP address + port value = socket; consequently, a socket is unique throughout the Internet
- Sockets act as communication endpoints.
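A small sketch using Python's standard `socket` module shows the two endpoints of one connection as (IP address, port) pairs. It runs entirely on the loopback address; the one-shot echo server is an assumption made for the example.

```python
import socket
import threading

def echo_once(server_sock):
    """Accept a single connection and send back whatever arrives."""
    conn, _ = server_sock.accept()
    with conn:
        conn.sendall(conn.recv(1024))

def demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
        server.listen(1)
        server_addr = server.getsockname()   # (host, port) of the server endpoint
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        with socket.create_connection(server_addr) as client:
            client_addr = client.getsockname()   # (host, port) of the client endpoint
            client.sendall(b"ping")
            reply = client.recv(1024)
        worker.join()
    return server_addr, client_addr, reply
```

Both endpoints share the same host here, so only the port values distinguish the two processes.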
23Distributed Processing, Client/Server, and Clusters
24Distributed Processing
- A category of processing in which various parts of an application may be processed at different nodes in a network.
- Location of processing will ideally be determined by such things as load balancing and the choice of the most appropriate platform for a task.
25Client-Server Computing
- Client-server processing is based on the following model: client processes request services from server processes.
- Client machines are often single-user systems
- Server machines support multiple users (clients) and provide specific services
26Generic Client/Server Environment
27Client/Server Computing
- Client machines are connected to servers through some type of communication software, probably TCP/IP.
- Applications are divided into tasks; each task executes where it can be handled most efficiently.
- For example, the client will provide the user interface to the system.
- The server may do all or part of the processing.
28Fat Client/Thin Client
- In the fat-client model, much of the processing is done locally on the client.
- Requires high-end PCs or workstations.
- Maintaining large numbers of client machines is hard: upgrades, etc. must be applied locally to each machine.
- The thin-client model does most processing on the server
- Maintenance is centralized, therefore simpler
- Client machines can be much simpler
30Client/server Applications
- File servers store files for a distributed system.
- Print servers allow multiple users to share a set of printers.
- Web servers provide documents and forms
- Database servers store and process data for large applications.
- Name servers map domain names to IP addresses.
31Database Applications
- In business applications the database is often the primary computing application.
- The server is a database server
- Interaction between client and server is in the form of transactions
- the client makes a database request and receives a database response
- Server maintains the database
33Cache Consistency
- Clients may cache files/data from the server in client caches to reduce network transmission time.
- Several clients may cache the same data.
- Caches are consistent if they contain exact copies of the remote (server-based) data.
- If one client updates a file, other copies are now stale (out of date).
- The cache consistency problem: how to maintain local caches in a consistent state.
35Caches in a Distributed System
- Client caches and the server cache are all in primary memory; no special hardware is needed. (On the client side, caching may also use disk.)
- When a client process accesses a file:
- Check local cache(s)
- If not present, check server cache
- If not present, retrieve from server disk (the primary copy)
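The lookup order above can be sketched with plain dictionaries standing in for the caches and the disk. Filling the caches on the way back is an assumption of this sketch; the slide only gives the search order.

```python
def read_file(name, client_cache, server_cache, server_disk):
    """Return (data, where_found), checking client cache, server cache, then server disk."""
    if name in client_cache:
        return client_cache[name], "client cache"
    if name in server_cache:
        data, where = server_cache[name], "server cache"
    else:
        data, where = server_disk[name], "server disk"
        server_cache[name] = data      # assumption: server caches what it reads
    client_cache[name] = data          # assumption: client caches what it receives
    return data, where
```

A second read by the same client is served locally, and a first read by another client stops at the server cache instead of going to disk.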
36Distributed File Caches
- The advantage of client-side caching is a reduction in network access time and a reduction of network load.
- The disadvantage is the possibility of inconsistency.
- Note that if one client modifies cached data, the server's copy is stale and so are any other copies cached at other clients.
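The staleness problem can be made concrete with a tiny simulation. The file name, version strings, and helper function are hypothetical, and the sketch only detects stale copies; it does not implement any consistency protocol.

```python
def stale_holders(name, latest, copies):
    """List every holder whose copy of `name` differs from the latest version."""
    return [holder for holder, store in copies.items() if store.get(name) != latest]

server = {"report.txt": "v1"}
client_a = dict(server)    # both clients cache the server's copy
client_b = dict(server)
copies = {"server": server, "client A": client_a, "client B": client_b}

before = stale_holders("report.txt", "v1", copies)   # everything consistent
client_a["report.txt"] = "v2"                        # client A updates only its cache
after = stale_holders("report.txt", "v2", copies)    # server and client B are now stale
```

After the local update, every copy except client A's is out of date, including the primary copy on the server.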
37Middleware
- How do clients communicate with servers from different vendors with different APIs?
- One approach: middleware, software that glues together two different applications.
- Middleware becomes another layer in the architecture of a client/server system
39Middleware
- Client/server products and communication architectures are not standardized.
- Ideally, developers should be able to design applications that use uniform methods to access data regardless of the platform or system that supplies the data.
- Middleware provides a standard programming interface to support this uniformity.
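As a rough sketch of a uniform interface, the code below wraps two made-up vendor APIs behind one `get` call. Every class, method, and data value here is hypothetical; real middleware such as CORBA is far more elaborate.

```python
class VendorADatabase:
    """Hypothetical vendor API: rows are fetched by a string key."""
    def __init__(self, rows):
        self._rows = rows
    def fetch_row(self, key):
        return self._rows[key]

class VendorBDatabase:
    """Hypothetical vendor API: queries need a table name and a numeric id."""
    def __init__(self, tables):
        self._tables = tables
    def query(self, table, record_id):
        return self._tables[table][record_id]

class Middleware:
    """Uniform access layer: callers use get(source, key) for every data source."""
    def __init__(self):
        self._adapters = {}
    def register(self, source, adapter):
        self._adapters[source] = adapter
    def get(self, source, key):
        return self._adapters[source](key)

mw = Middleware()
mw.register("hr", VendorADatabase({"42": {"name": "Ada"}}).fetch_row)
vendor_b = VendorBDatabase({"staff": {7: {"name": "Grace"}}})
mw.register("payroll", lambda key: vendor_b.query("staff", int(key)))
```

The application code calls `mw.get(...)` the same way for both sources; only the adapters know about the vendor-specific calls.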
40Middleware
- A set of tools that provide a uniform way to access system resources across all platforms
- Enables programmers to use the same method to access data, regardless of where it is located.
- Example: middleware products that link a database system to a Web server.
- Users can request data from the database using forms displayed on a browser. The Web server can return dynamic Web pages based on the user's requests.
41Logical View of Middleware
- (Figure not transcribed; surviving labels: APIs, Platform Interfaces)
42Middleware
- There are both client and server components to the middleware (both client and server must be able to interact with this level)
- Objective: provide uniform access to different systems.
- Examples: CORBA, SOAP, DCOM
- Middleware is typically based on either message passing or remote procedure calls.
43Peer to Peer (P2P) Processing
- P2P is an alternative to client/server processing.
- Client/server has a non-symmetric structure: different nodes have different capabilities.
- In P2P processing every node has the capability of acting as a client or a server.
- Most familiar in music-sharing services, but not limited to that application.
44P2P
- P2P systems are less structured than client/server.
- They distribute load more evenly across the network, don't suffer from congestion around server nodes, etc.
- Resources from many different host machines can be shared
45P2P
- Napster made the term popular, although strictly speaking it did not have a true P2P structure (it used a central server to locate resources).
- P2P systems are more loosely structured than traditional C/S: they include nodes that are only intermittently connected to the network, are not as reliable as managed servers, and may even be malicious.
46P2P
- A drawback to P2P is the difficulty of locating
resources (because of the lack of centralized
servers)
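To see why locating resources is hard without a central index, consider one possible approach, assumed here purely for illustration: flooding a query to neighbours with a hop limit. The peer graph, file names, and `ttl` value are hypothetical.

```python
from collections import deque

def flood_search(peers, files, start, wanted, ttl=3):
    """Breadth-first query flooding: return (holder, hops), or None if not found within ttl hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if wanted in files.get(peer, ()):
            return peer, hops
        if hops < ttl:                       # forward the query only while hops remain
            for neighbour in peers.get(peer, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, hops + 1))
    return None

peers = {"p1": ["p2", "p3"], "p2": ["p1", "p4"], "p3": ["p1"], "p4": ["p2"]}
files = {"p4": {"song.mp3"}}
```

With a small hop limit the query can miss a resource that does exist, and with a large one it touches many peers, which is the trade-off a central directory avoids.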
47Distributed Communication
- Message passing is the only communication technique for processes in distributed systems, since there is no shared memory.
- Remote Procedure Calls (RPC) provide an interface to message passing, so processes can interact using call/return semantics, as in ordinary procedure or function calls.
48Message Passing
- Message passing was covered in Chapter 5 as a contrast to shared memory communication between processes.
- In some systems it merely provides an alternate communication mode (e.g., client/server operating systems support message passing between modules)
- In a distributed system there is no other choice.
49Basic Message-Passing Primitives
50Message Passing Review
- Message passing primitives
- Send (message, destination)
- Receive (message, source)
- In a network, TCP/IP protocols typically govern message formats.
- Messages are typically broken into smaller pieces (packets) which are transmitted over the network
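The two primitives can be sketched in-process, with threads standing in for nodes and a queue standing in for the network. The `Node` class and the uppercase "service" are assumptions for the example; in Python it is more natural for receive to return the (source, message) pair than to fill in arguments.

```python
import queue
import threading

class Node:
    """A process with a mailbox; threads stand in for separate machines."""
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()

    def send(self, message, destination):
        destination.inbox.put((self.name, message))   # tag the message with its source

    def receive(self, timeout=2.0):
        return self.inbox.get(timeout=timeout)        # returns (source, message)

def serve_one(server, directory):
    """Handle one request and reply to whoever sent it."""
    source, request = server.receive()
    server.send(request.upper(), directory[source])

client, server = Node("client"), Node("server")
worker = threading.Thread(target=serve_one, args=(server, {"client": client}))
worker.start()
client.send("hello", server)
reply_source, reply = client.receive()
worker.join()
```

The client and server never touch each other's data directly; everything they share travels inside a message.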
51Design Issues for Messages
- Reliability versus unreliability
- Reliable message passing guarantees that the message will be received
- Reliable message passing usually relies on a reliable communication protocol, such as TCP
- Unreliable message passing makes no such guarantee (which makes it faster)
- However, since network communication is subject to failure, results aren't guaranteed.
52Design Issues for Messages
- Blocking versus Nonblocking
- Nonblocking (asynchronous) primitives return control as soon as the OS has processed the command
- Sender regains control when the message has been copied to a kernel buffer (or queued for transmission)
- Blocking (synchronous)
- Sender blocks until the message has been sent (unreliable) or acknowledged (reliable)
- Receiver blocks until a message is received.
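The difference on the receiving side can be sketched with Python's `queue.Queue` as a stand-in mailbox. The helper names and the delayed sender are assumptions for the example.

```python
import queue
import threading

def receive_nonblocking(mailbox):
    """Return a message if one is already waiting, otherwise None, without waiting."""
    try:
        return mailbox.get_nowait()
    except queue.Empty:
        return None

def receive_blocking(mailbox, timeout=None):
    """Wait until a message arrives (or raise queue.Empty after `timeout` seconds)."""
    return mailbox.get(timeout=timeout)

mailbox = queue.Queue()
first_try = receive_nonblocking(mailbox)      # nothing has been sent yet
# A sender that delivers its message a little later, from another thread.
threading.Timer(0.05, mailbox.put, args=("late message",)).start()
second_try = receive_blocking(mailbox, timeout=2)   # waits for the delayed send
```

The nonblocking call lets the receiver do other work and poll again later, while the blocking call keeps the code simple at the cost of stalling the caller.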
53Remote Procedure Calls (RPC)
- Allow programs on different machines to interact using simple procedure call/return semantics
- Widely accepted
- Standardized
- Client and server modules can be moved among
computers and operating systems easily
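A minimal sketch of the mechanism behind call/return semantics: a client stub packs the call into a message, and the server unpacks it, runs the procedure, and sends the result back. The JSON message layout, the `PROCEDURES` registry, and the in-process "transport" are assumptions for the example, not any particular RPC standard.

```python
import json

def add(a, b):
    return a + b

PROCEDURES = {"add": add}      # hypothetical server-side registry

def server_handle(request_bytes: bytes) -> bytes:
    """Server side: unmarshal the request, run the procedure, marshal the result."""
    request = json.loads(request_bytes)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

class ClientStub:
    """Client side: makes a remote procedure look like a local method call."""
    def __init__(self, transport):
        self._transport = transport   # any function that carries bytes to the server

    def __getattr__(self, proc):
        def call(*args):
            request = json.dumps({"proc": proc, "args": args}).encode()
            reply = self._transport(request)       # message passing happens here
            return json.loads(reply)["result"]
        return call

remote = ClientStub(server_handle)   # here the "network" is a direct function call
```

The caller writes `remote.add(2, 3)` as if `add` were local; swapping the transport for a real network connection would not change that line.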
55Synchronous vs. Asynchronous
- Synchronous RPC
- Behaves much like a subroutine call
- Caller must wait for results before proceeding
- Asynchronous RPC
- Does not block the caller
- Enables client execution to proceed locally in parallel with the server invocation
- Suitable for some applications (e.g., don't wait for a print server)
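The contrast can be sketched with a thread pool standing in for the remote server. `remote_print` is a hypothetical placeholder for a slow remote call; no real network or printer is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_print(document):
    """Hypothetical stand-in for a slow call to a print server."""
    time.sleep(0.1)
    return f"printed {document}"

def synchronous_call(document):
    """Caller waits here until the result comes back."""
    return remote_print(document)

def asynchronous_call(executor, document):
    """Caller gets a future immediately and can collect the result later."""
    return executor.submit(remote_print, document)

with ThreadPoolExecutor(max_workers=1) as pool:
    future = asynchronous_call(pool, "report.pdf")   # returns without waiting
    local_work = sum(range(1000))                    # client keeps working meanwhile
    status = future.result()                         # collect the reply when needed
```

In the asynchronous case the local computation overlaps with the simulated server delay instead of following it.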
56Cluster Computing
- Alternative to symmetric multiprocessing (SMP)
- Group of interconnected computers working together as a unified computing resource
- Illusion is one machine (ideally)
- Individual nodes in a cluster may, themselves, be
multiprocessors.
57Benefits of Clustering
- Absolute scalability: a cluster can have much more computing power than any standalone machine
- Incremental scalability: cluster size can increase as needs increase; small clusters can grow.
- High availability: if one node fails, the others can continue to process (fault tolerance)
- Superior price/performance: a cluster can be built more cheaply than a multiprocessor of equivalent power.
58Applications
- Cluster servers
- Provide redundancy for fault tolerance
- Partition workload across several servers
- Server clusters can share large RAID disk clusters and/or have private disks
- Parallel programming: large scientific or engineering applications require huge amounts of processing power
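Workload partitioning and failover can be sketched as simple bookkeeping over task lists. Round-robin assignment and the node names are assumptions for the example; real cluster managers use more sophisticated policies.

```python
def partition(tasks, nodes):
    """Spread tasks across nodes round-robin."""
    assignment = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return assignment

def fail_over(assignment, failed):
    """Return a new assignment with the failed node's tasks spread over the survivors."""
    survivors = [node for node in assignment if node != failed]
    if not survivors:
        raise RuntimeError("no surviving nodes")
    updated = {node: list(assignment[node]) for node in survivors}
    for i, task in enumerate(assignment[failed]):
        updated[survivors[i % len(survivors)]].append(task)
    return updated
```

For example, six tasks on three nodes give each node two tasks; if one node fails, its two tasks move to the remaining nodes and processing continues.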
59Cluster Computer Architecture
- Machines in a cluster are generally connected by a high-speed network, which may or may not be connected to the outside world
- Each node runs its own OS, but also shares software (middleware) to support internode communication and interoperability.
60Cluster Computer Architecture
61Cluster Computing
- The middleware layer may not provide full transparency.
- Many parallel programming applications are structured using PVM or MPI (message passing packages) as the support structure for managing parallel operations.
62Beowulf Clusters
- A Beowulf cluster can be homemade; it is characterized by being composed of off-the-shelf components, both hardware and OS software.
- For example, PCs running Linux
63Beowulf Features
- Mass market commodity components
- Dedicated processors (as opposed to sharing CPU time with local users)
- Dedicated high-speed network
- No custom components