Title: Introduction to Distributed Systems
1Introduction to Distributed Systems
- What is a Distributed System?
- Why do we want to have distributed systems?
- Different Types of Distributed Computer Systems
- Middleware: DOS Vs. NOS
- Some Example Applications
- Resources Sharing in a Distributed System
- Challenges and Problems
- New Developments in Distributed Systems
2- What is a distributed system?
- A definition based on what we want to have
- From Single Processor System ->
- Distributed System Middleware
- What is a middleware?
- A definition based on what we want to have and
the current situations
3What is a Distributed System?
- What is a distributed system?
- A system which is distributed
- What is a system? Multiple components, connected, with defined functions, etc.
- What is a centralized system? A standalone computer?
- Centralized Vs. distributed (performance and structures)
- What is the meaning of distributed? Separated but connected?
- What are the implications (problems) of being distributed? Many!
- A distributed system is a collection of independent computers that appear to the users of the system as a single computer (physically distributed but logically centralized) (abstraction) (Tanenbaum 2002)
- A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages (components) (connection) (Dollimore et al. 2005)
4Abstraction Vs. Connection
- Which definition is better?
- A. Tanenbaum (functionally single but physically distributed)
- B. Dollimore (physically distributed)
- A or B?
- Why do we have these two definitions?
- Any more? Any suggestions?
- What are the differences between these two definitions in:
- System development and implementation
- Services provided
- Performance
5Why do we want to have distributed systems?
- Two basic reasons for going distributed
- Performance reasons
- Reduce response time (better performance)
- Distributed systems normally give better performance
- More processing units, larger memory, more data for processing
- Performance tradeoffs (security, reliability, etc.)
- Resource sharing
- More resources (data, hardware, memory, computing units, etc.) for sharing across the network
- E.g., sharing of a printer, memory, disk storage, CPUs, etc.
6Different Types of Computer Systems
- What is a computer (computing system)?
- Hardware + software (system boundary)
- Functionally
- A machine that can perform computation
- What is the meaning of computation or compute?
- Structurally
- A specially designed machine: a CPU, memory devices, I/O devices, etc.
- Mostly, a computer can be used for multiple (general) purposes (loading different programs to execute for different purposes)
- Do we have computers for specific purposes? Yes? No?
- Hardware
- Single processor, multiple processors, multiple computers; loosely coupled or tightly coupled hardware
- Software (supported by OS + applications)
- Single process, multiple processes, concurrent processes
7Performance Issues
- In the old days, a computer had a single CPU and processed jobs sequentially, one by one, without interleaving
- How to improve the performance of a computer system?
- More and better (faster) hardware, operating environment, efficient coding, etc.
- How to measure the performance of a computer system?
- E.g., response time, throughput (number of jobs completed per unit time), utilization
- Limitations of machines: adding more resources (faster CPU, more CPUs, more memory, etc.)
- Performance is limited by the bottleneck resource when a job uses resources sequentially
- Limitations of the operating environment: concurrent execution of processes may not be allowed or may be limited (only one job can be served at a time although multiple jobs may be active)
- What are other considerations in addition to these performance measures? Reliability, security, availability, etc.
8Single Thread and Single CPU
What is the meaning of a thread here? A single active process.
$U = S / A$, $Q = U / (1 - U)$, $R = Q S + S$
where $U$ = utilization, $S$ = service time, $A$ = inter-arrival time, $Q$ = queue length, $R$ = response time
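The single-server formulas on this slide can be tried out numerically. The following is a minimal sketch, assuming S and A are averages in the same time unit and that R stands for response time; the function name `single_server_metrics` is made up for illustration.

```python
def single_server_metrics(service_time, inter_arrival_time):
    """Return (U, Q, R) for one CPU serving one job at a time."""
    utilization = service_time / inter_arrival_time          # U = S / A
    if utilization >= 1:
        # Jobs arrive faster than they can be served, so the queue grows forever.
        raise ValueError("utilization must be below 1")
    queue_length = utilization / (1 - utilization)            # Q = U / (1 - U)
    response_time = queue_length * service_time + service_time  # R = QS + S
    return utilization, queue_length, response_time


# Same service time, jobs arriving closer and closer together.
for inter_arrival in (4.0, 2.0, 1.25):
    u, q, r = single_server_metrics(1.0, inter_arrival)
    print(f"A={inter_arrival:.2f}  U={u:.2f}  Q={q:.2f}  R={r:.2f}")
```

The loop shows the point behind the formula: as utilization approaches 1, queue length and response time grow much faster than the workload itself.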
9Multiple Threads and Single CPU
[Figure: Processes A to E and a single CPU; the CPU executes by switching among Processes A, B and C (multiple active processes)]
How to determine the switching order and the number of active processes?
What is the main benefit of having multiple threads? If Process A is suspended, e.g., while waiting for input data, the CPU may execute Process B.
What are the overheads? Context switching.
10Concurrent Processes: Multiple Threads and Multiple CPUs
[Figure: Processes A to E and three CPUs (CPU 1, CPU 2, CPU 3)]
Processes A, B and C are executed concurrently: shorter response time (waiting time).
What is ignored in this figure? Data structure + algorithm (modeling of the application environment into the computer's virtual environment).
11Different Types of Computer Systems
- Centralized Computer Systems
- Computing units are physically located at the same site
- Note: the users may be distributed
- No network delay in processing (communication/memory data access delay -> 0)
- What are the implications? Timing and synchronization are easier
- Single processor or multiprocessors
12Centralized Computer System
[Figure: a user submits a request over the network through a simple user interface; the computing units (may be multi-processor and multithreading) return the result]
13Centralized Computer Systems
- Performance problems of centralized computer systems
- All requests (jobs) are performed at a centralized site
- Workload at the site could be very heavy (overloaded) (unpredictable performance)
- $Q = U / (1 - U)$
- Transmission delay of job requests and results between the originating site and the centralized site (requests may be from remote users)
- Scalability problem
- Management of a very large amount of data (e.g., a large database)
- E.g., making a phone call requires location management of millions of mobile users (searching a tree)
- Reliability problem: single point of failure
- Price/performance: a powerful machine (mainframe) (millions of HKD) Vs. several cheap machines (PCs) (a few thousand HKD)
14Multicomputer/Multiprocessor Systems
- Multiprocessors aim to resolve the performance problem (i.e., shorter response time and higher throughput) of centralized computer systems (CCS)
- Note: if a single processor can complete a program in 10 sec, it does not mean that two processors can finish it in 5 sec (why?)
- Different architectures of multiprocessor/multicomputer systems
- Different degrees of sharing of hardware resources
- Varieties in machine architecture and operating environment of different machines (how to organize the processors)
- Multiprocessor system (tightly coupled)
- All processors map to the same memory address space
- Multicomputer system (loosely coupled)
- Each processor (computer) has its own private memory
- Heterogeneous and homogeneous
- How to make the sharing?
- Needs the redesign of the whole architecture of the processors and computer, and the support of the operating system
- What are the functions of an operating system?
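One common way to reason about the "10 sec on one processor, 5 sec on two?" question is Amdahl's law. The slides do not name it, so it is brought in here as an assumption: only the parallelizable fraction of a program speeds up with more processors, and coordination overhead is ignored. The sketch below uses a made-up function name and made-up fractions.

```python
def run_time(single_cpu_time, parallel_fraction, processors):
    """Estimated run time when only part of the program can run in parallel."""
    serial_part = single_cpu_time * (1 - parallel_fraction)
    parallel_part = single_cpu_time * parallel_fraction / processors
    return serial_part + parallel_part


# A 10-second program on two processors, for different parallel fractions.
for fraction in (1.0, 0.5, 0.0):
    seconds = run_time(10.0, fraction, 2)
    print(f"parallel fraction {fraction:.1f}: {seconds:.2f} sec")
```

Only a fully parallel program reaches 5 sec under this model; real systems add communication and synchronization costs on top.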
15Shared-Memory Architecture
16Shared-Disk Architecture
17Shared-Nothing Architecture
Are they distributed systems? Structurally YES
18[Figure labels: single thread, single processor system; multiple threads, single processor system; multiple threads, multiple processors system; tightly coupled system; loosely coupled system; distributed computers]
19What are distributed?
- Hardware resources
- Software resources (various types of services)
- What is software? It specifies the functions to be performed, normally in steps
- How to divide a single software program into several software programs to be executed by different computing units?
- How to implement an algorithm as distributed processes? E.g., a searching algorithm becomes a distributed searching algorithm
- Data
- E.g., a large database is partitioned into several fragments to be maintained by different local database systems
- How to process the distributed data? E.g., a SELECT statement that accesses a distributed database
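The idea of a SELECT over a partitioned database can be sketched as follows. This is an illustration only: the rows, the partitioning by `id`, and the function names are all hypothetical, and threads in one program stand in for separate local database systems.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_fragments):
    """Split one table into fragments by hashing the key (here: id modulo n)."""
    fragments = [[] for _ in range(n_fragments)]
    for row in rows:
        fragments[row["id"] % n_fragments].append(row)
    return fragments


def local_select(fragment, predicate):
    """What each local database system does: scan only its own fragment."""
    return [row for row in fragment if predicate(row)]


def distributed_select(fragments, predicate):
    """Send the same query to every fragment, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partial_results = list(
            pool.map(lambda fragment: local_select(fragment, predicate), fragments)
        )
    merged = [row for part in partial_results for row in part]
    return sorted(merged, key=lambda row: row["id"])


rows = [{"id": i, "balance": i * 10} for i in range(10)]
fragments = partition(rows, 3)
matches = distributed_select(fragments, lambda row: row["balance"] >= 70)
print([row["id"] for row in matches])
```

The merge step is where a centralized SELECT and a distributed one differ: every fragment answers independently, and some coordinator has to combine the answers.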
20Operating Systems
- How to connect the different machines together?
- What are the tasks of an OS?
- Distributed operating systems (DOS)
- Network operating systems (NOS)
- Middleware
21Distributed Systems Services
- DOS
- An operating system for distributed computers
- Not intended for independent computers (which can join and leave independently)
- The computers have a high degree of coupling and similarity in structure, architecture and operating environment
- NOS
- An operating system for loosely connected computers, which could be very different in structure, architecture and operating environment
- Not intended to provide a view of a single coherent system
- Add an additional layer (middleware) to achieve the two objectives
- To hide the heterogeneity (differences) and provide a high degree of transparency
22- Why does a DOS have these limitations, such as a high degree of coupling and not supporting independent or heterogeneous computers?
- If you are asked to design a new OS, will you choose to build a new OS which is a DOS, or use a NOS and add a new layer as middleware?
23Network Operating Systems
- In principle, there is NO distributed operating system (DOS). Why?
- A DOS is an operating system that produces a single system image for all the resources in a distributed system
- The DOS has total control over all the nodes in the system and it transparently locates new processes and resources at whatever node suits its scheduling policies
- Examples of NOS: Unix and Windows
- They provide networking capability and can access remote resources
- Each NOS node retains autonomy in managing its own resources. A process that creates a process on another machine has no control over that child process
24Middleware Positioning
- A distributed system organized as middleware on top of a network operating system to hide the heterogeneity of the underlying platform from the applications
- The middleware layer extends over multiple machines
- Applications become operating system independent but middleware dependent
- The primary function provided by the middleware is the various types of transparency services (What is the meaning of transparency? Transparent to whom? What are the benefits?)
- To the user program, the machines are logically a single machine (Why?)
- Each local operating system, forming a part of the entire network operating system, provides local resource management
25Middleware Positioning
[Figure: positioning of the middleware layer relative to the NOS]
26Transparencies Provided by Middleware
Different forms of transparency in a distributed
system
27Middleware Services
In an open middleware-based distributed system,
the protocols used by each middleware layer
should be the same, as well as the interfaces
they offer to applications.
28A Comparison of Different Architectures
29Middleware Services
- Some common services from middleware
- Distributed file systems (accessing a remote file like accessing a local file)
- Remote procedure calls (RPC) (calling a procedure supported by a remote node is similar to calling a local procedure)
- Distributed objects
- Distributed documents
- High-level communication facilities that hide the low-level message passing
- Naming services that allow the search of remote entities
- Persistent storage of data
- Distributed transaction management
- Security
- Note: many of them are resource management jobs
30- What is a distributed system?
- By connecting existing Computer Systems
- A definition based on the existing
architecture/structure
31Distributed Systems Concepts of Networked
Computers
- Components -> processes (communicating processes)
- Networked computers -> connected (loosely coupled) computers for sharing of resources
- Networked computers
- Similar to loosely coupled hardware
- Spatially (physically) separated
- Communication delays are long and unpredictable -> when to decide on a time-out (in case of network failure, worst-case estimation)
- Concurrent execution of processes is common (concurrency) -> performance
- No global clocks
- Coordinating processes at different networked computers
- What are the problems of lacking a global clock?
- What is the main function of a global clock? Event sequencing
- Independent failures
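Without a global clock, event sequencing has to be built from messages. One well-known technique is Lamport's logical clock. The slides do not mention it, so it is shown here only as one possible illustration, with a made-up class name.

```python
class LamportClock:
    """A per-process counter that orders events without a shared clock."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Every local event advances the local counter.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.local_event()

    def receive(self, message_time):
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, message_time) + 1
        return self.time


p = LamportClock()
q = LamportClock()
p.local_event()                 # p's counter becomes 1
stamp = p.send()                # p's counter becomes 2, message carries 2
q.local_event()                 # q's counter becomes 1
received_at = q.receive(stamp)  # q's counter becomes max(1, 2) + 1 = 3
print(stamp, received_at)
```

The counters say nothing about real time; they only guarantee that a message is received "after" it was sent, which is the sequencing role the slide assigns to a global clock.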
32Examples of Distributed Systems
- The Internet
- Variety: a large number of different types of networked computers connected using a set of standard communication protocols
- Mostly a sharing of information and resources
- A lot of read requests
- Use the same interfaces and protocols to access remote resources
- Intranets
- A portion of the Internet that is separately administered and has a boundary
- Configured with a local security policy
- Connected to the Internet through a router and protected with a firewall
- A firewall filters incoming and outgoing messages
- Mobile computing and ubiquitous computing
- Mobile computing provides computing services while the application is moving (also called nomadic computing)
- Ubiquitous computing provides computing services everywhere (smart spaces) (also called pervasive computing)
33Examples of Distributed Systems
A typical portion of the Internet
34Examples of Distributed Systems
A typical intranet
35Examples of Distributed Systems
Portable and handheld devices in a distributed
system
36Examples of Distributed Systems
- Note: computers are NOT just Internet computers
- What are the applications of computer systems?
- Personal, commercial, government and many others
- Computers are not just for entertainment (e.g., playing games, chatting with people); there are many other applications such as controlling a flight, weather forecasting, stock trading, etc.
- Many of these applications are distributed in nature, e.g., stock trading systems and ticket booking systems
- Our real world -> virtual world in computers
- Our world is distributed. We are the computing units. Our brain is the memory unit and we have communication facilities
- They are better supported by a distributed architecture than by a centralized architecture
- Distributed users, distributed data and distributed resources
- We used a single computer in the past mainly because building distributed computer systems was expensive
37Some Benefits of Distributed Systems
- Price/performance
- Computers were expensive in the past
- It is easier to manage a centralized computer system
- A cost-effective way to build a larger system (higher performance) is to use several cheap CPUs or to connect existing small computers to form a large system
- Reliability
- If one machine crashes, the system as a whole can still survive
- What are the different types of failures? Different degrees of reliability -> some functions fail, multiple components provide the same function
- Nature of some applications
- Some applications are inherently distributed (e.g., banking and supermarket chains)
- Some applications are moving (Examples? Why?)
38Example
39Some Benefits of Distributed Systems
- Communication
- It provides communication facilities (e.g., the same communication protocol)
- Sending emails and transmitting documents to different users
- Flexibility
- Load balancing
- It spreads the workload over the available machines in the most cost-effective way
- Dynamic workload management (performance Vs. workload)
- Performance -> response time
- Given a workload, under what situation is the response time the smallest?
- Different nodes have similar utilization -> minimum response time
- Note: these two are not benefits (Why?)
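The claim that similar utilization gives the minimum response time can be checked with the single-server formula used earlier in these slides (Q = U / (1 - U)). The sketch below assumes two identical servers that each behave like that single-server queue; the workload numbers and function names are made up.

```python
def response_time(service_time, arrival_rate):
    """R = QS + S with Q = U / (1 - U), which simplifies to S / (1 - U)."""
    utilization = service_time * arrival_rate   # U = S / A, with A = 1 / arrival_rate
    if utilization >= 1:
        raise ValueError("server is overloaded")
    return service_time / (1 - utilization)


def mean_response_time(service_time, total_rate, share_to_first):
    """Average response time when the workload is split between two servers."""
    first = response_time(service_time, total_rate * share_to_first)
    second = response_time(service_time, total_rate * (1 - share_to_first))
    return share_to_first * first + (1 - share_to_first) * second


balanced = mean_response_time(1.0, 1.2, 0.5)    # both servers at U = 0.6
skewed = mean_response_time(1.0, 1.2, 0.75)     # one at U = 0.9, one at U = 0.3
print(f"balanced: {balanced:.2f}  skewed: {skewed:.2f}")
```

With the same total workload, the skewed split is several times slower on average, because the heavily loaded server dominates the mean.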
40Resources Sharing in a Distributed System
- Many physical resources are distributed in nature (devices)
- The sources for generating soft resources (information/data) are also distributed in nature
- E.g., weather, news, sports results, ticket information, etc.
- A natural trend to share resources
- Data and software sharing
- It allows many users to access a remote database or even download a program for local execution
- Device sharing
- It allows many users to share expensive peripherals
- E.g., printers and other peripherals
- Computation power
- Computation may be performed by a remote computer
- Incremental growth and scalability
- Computing power can be added in small increments
41Resources Sharing in a Distributed System
- Note: resource sharing is NOT always good
- Why do you want to share resources with other users?
- Although you can access other users' resources, you also need to provide your resources for other users to access
- If you have all your required resources, what do you want? Sharing? No sharing?
- What are the problems associated with resource sharing?
- Security, management problems, access problems, reliability, etc.
42Resources Sharing in a Distributed System
- How to access remote resources? Through a resource manager
- What is a resource manager?
- A program that offers a communication interface enabling the resource to be accessed, manipulated and updated reliably and consistently
- What should the resource manager do?
- Provide resource names (naming services)
- Identify resource locations (distributed directory management)
- Map resource names to communication addresses (naming directory management)
- Coordinate concurrent accesses to ensure consistency (correctness)
- Different scales in sharing
- The Internet and computer-supported cooperative working (CSCW)
- Resource encapsulation (security)
- Only the resource manager can access the resource
- Other users send requests to the resource manager using a standard method and protocol
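The duties listed above (naming, name-to-address mapping, coordinated access, encapsulation) can be sketched as a small class. This is a single-process illustration, not a networked implementation; the class name, the resource name and the address are hypothetical.

```python
import threading


class ResourceManager:
    """Sole owner of its resources; users reach them only through these methods."""

    def __init__(self):
        self._directory = {}            # resource name -> communication address
        self._values = {}               # resource name -> current state
        self._lock = threading.Lock()   # coordinates concurrent accesses

    def register(self, name, address, initial_value):
        with self._lock:
            self._directory[name] = address
            self._values[name] = initial_value

    def locate(self, name):
        """Map a resource name to its communication address."""
        return self._directory[name]

    def update(self, name, change):
        """Apply one change at a time so concurrent updates stay consistent."""
        with self._lock:
            self._values[name] = change(self._values[name])
            return self._values[name]

    def read(self, name):
        with self._lock:
            return self._values[name]


manager = ResourceManager()
manager.register("printer-quota", "10.0.0.5:9100", 100)   # made-up address

# Fifty users each consume one page of quota at the same time.
users = [
    threading.Thread(target=manager.update, args=("printer-quota", lambda pages: pages - 1))
    for _ in range(50)
]
for user in users:
    user.start()
for user in users:
    user.join()
print(manager.locate("printer-quota"), manager.read("printer-quota"))
```

Because every update passes through the manager's lock, no decrement is lost regardless of how the threads interleave.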
43Example Association (Group Management)
- Multiple objects
- Multiple objects co-exist in a distributed environment. Some of them are service providers and the others are users
- Association: at least one of a given pair of components communicates with the other within the system (to cooperatively perform a task or provide services)
- After association -> interoperation: the interaction during association
- Association is spontaneous (without user intervention)
- Network bootstrapping
- Communication takes place over a local network within the system
- The device acquires an address (ID and a name) on the local network
- Who determines the assignment and manages the network?
44Centralized Vs. Distributed Management
- Management (algorithm) -> centralized OR distributed
- Centralized approach: use a powerful server to manage the space status and connection information
- Distributed approach: multiple devices (service providers) manage the information
- Comparisons
- Problems in distributed computing
- Perform operations at device level because of limited bandwidth
- Due to the dynamic properties of the objects, a lot of updates need to be generated
- A distributed approach can make the management of objects localized and adaptive to the changing system status (in-network processing). But the communication overhead could be very heavy
- A hierarchical approach: multiple levels with different levels of coordinators may be used
45Example: Jini's Discovery System
- Java-based system for mobile and pervasive computing systems
- Components: lookup services (discovery services), Jini services and Jini clients
- A Jini service provides services
- The lookup service stores services
- Jini clients request services
- A lookup service allows Jini services to register the services they offer
- A Jini service may be registered with one or more lookup services
- Jini clients request services that match their requirements
- If a match is found, the Jini client downloads an object that provides access to the service from the lookup service
46Example: Jini's Discovery System
- When a Jini client or service starts up, it sends a request to a well-known IP multicast address
- Any lookup service that receives the request sends its address, enabling the requester to perform a remote invocation to look up or register a service with it
- The client requires a lookup service in the finance group, so it multicasts a request with that group name
- Only one lookup service is bound to the group name, and that service responds with its address
- The client communicates with it directly using RMI to locate all services of type printing
- Only one printing service has registered with the lookup service
- The client then uses the printing service directly
47Service Discovery in Jini
[Figure: a client, lookup services (groups "admin", "finance" and "admin, finance"), printing services and a corporate infoservice attached to a network. Steps: 1. the client asks for a "finance" lookup service; 2. the lookup service answers "Here I am ....."; 3. the client requests the printing service; 4. the client uses the printing service]
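The four discovery steps can be mimicked in a few lines. This is not the Jini API (Jini is a Java system using IP multicast and RMI); it is a plain Python sketch in which every class, method and address is invented, and a list of objects stands in for the network.

```python
class LookupService:
    """Stores registered services for one or more groups (e.g. "finance")."""

    def __init__(self, address, groups):
        self.address = address
        self.groups = set(groups)
        self._registry = {}   # service type -> list of service objects

    def register(self, service_type, service):
        self._registry.setdefault(service_type, []).append(service)

    def lookup(self, service_type):
        return list(self._registry.get(service_type, []))


class Network:
    """Stands in for the well-known multicast address."""

    def __init__(self):
        self._lookup_services = []

    def attach(self, lookup_service):
        self._lookup_services.append(lookup_service)

    def multicast_discovery(self, group):
        # Steps 1 and 2: only lookup services bound to the group answer.
        return [ls for ls in self._lookup_services if group in ls.groups]


class PrintingService:
    def print_document(self, text):
        return f"printed: {text}"


network = Network()
admin_lookup = LookupService("lookup-a.example", ["admin"])
finance_lookup = LookupService("lookup-f.example", ["finance"])
network.attach(admin_lookup)
network.attach(finance_lookup)
finance_lookup.register("printing", PrintingService())

replies = network.multicast_discovery("finance")   # steps 1 and 2
printers = replies[0].lookup("printing")           # step 3
result = printers[0].print_document("report")      # step 4
print(replies[0].address, result)
```

The client never needs to know in advance where the printing service lives; it only needs the group name and the service type.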
48- How to satisfy the definitions (requirements) of a distributed system?
- Tanenbaum's requirements and others
- Challenges?
49Challenges of Distributed Systems
- Heterogeneity
- Openness
- Security
- Scalability
- Failure handling
- Concurrency
- Transparency
50Heterogeneity
- One of the most important aims of the middleware is to hide the differences in underlying systems
- Applications access remote objects and resources in a standard way (interface and protocols), as if they were managed locally
- Heterogeneity Vs. Transparency
- Differences in
- Networks (LAN, WAN, wireless LAN, GSM, etc.)
- Computer hardware (different types of CPUs and machines)
- Operating systems (Unix, Windows, WinCE, etc.)
- Programming languages (C, Java, C++, etc.)
- Implementations by different developers
- Standardization: although the Internet consists of different types of networks, all the communications use the same set of Internet protocols
51Openness
- Expandability: the characteristic that determines whether the system can be extended in various ways and connected to other systems (interoperability)
- New users can join the Internet at any time
- New resources can be added and be made available for use
- Portability: an existing application developed for a specific distributed system can be moved to work in another distributed system
- Standardization of interfaces for accessing the resources
- Flexibility: a distributed system should be easily configured (reconfigured) even if the system components are from different developers
- Need to provide definitions not only for the high-level interfaces but also for interfaces to internal parts of the system, and describe how those parts interact
- Monolithic systems tend to be closed
52Security
- Security for information has three components
- Confidentiality: protection against disclosure to unauthorized individuals
- Integrity: protection against alteration or corruption (correctness)
- Availability: protection against interference with the means to access the resources
- Specification of what services and resources are provided to each user or each group of users (levels of access and authority)
- Methods (encryption and decryption) for encoding the messages transmitted over the network
- Identification of the right users
- Denial of service
- A user may wish to disrupt the service by bombarding it with a large number of pointless requests
- Security of mobile code
- E.g., receiving an executable program as an email attachment
53Concurrency -> Consistency
- Processes access the same resources (or different resources) at the same time
- The server serves the processes concurrently (why?)
- Parallel executions occur for two reasons
- Many users simultaneously invoke commands or interact with application programs
- Many server processes run concurrently, each responding to different requests from client processes
- A higher concurrency in general implies a better performance (shorter waiting time for services)
- In a distributed system with M computers, up to M processes can execute in parallel
- However, this may not be true in many cases (why?)
- Two processes may each alter resources that will be used by the other
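Why concurrency leads to consistency concerns can be seen in a classic lost update. The sketch below replays one possible interleaving by hand rather than using real threads, so the outcome is deterministic; the account balance and amounts are made up.

```python
def run_interleaved(balance):
    """Both processes read the shared value before either one writes it back."""
    read_by_p1 = balance           # P1 reads
    read_by_p2 = balance           # P2 reads the same old value
    balance = read_by_p1 + 100     # P1 writes its deposit
    balance = read_by_p2 - 30      # P2 overwrites it, so P1's update is lost
    return balance


def run_serialized(balance):
    """Each process finishes its read-modify-write before the next one starts."""
    balance = balance + 100        # P1 runs completely
    balance = balance - 30         # then P2 runs completely
    return balance


print(run_interleaved(500), run_serialized(500))   # prints "470 570"
```

Serializing the two updates fixes the result but removes the parallelism, which is one reason M computers do not always give M-fold concurrency.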
54Example
[Figure: global data item X is duplicated at several sites, so the copies of X require data synchronization]
55Scalability
- A distributed system is scalable if it remains effective (providing a similar quality of service) when there is a significant increase in the number of resources and users
- There are 3 scales
- The smallest: 2 workstations + 1 file server
- Local area network (LAN): up to hundreds of workstations and several file servers and print servers (fax servers, etc.)
- Internetworking: several interconnected LANs may contain thousands of computers and share resources
- The Internet
- What will be the consequence of doubling the number of users?
- Requesting the same set of data
- Requesting to connect to the same server
- Requesting to transmit data through the same segment of the network
56Scalability
- To resolve the performance problem, the system configuration may need to be changed
- Adding more servers to balance the workload
- Duplicating data to resolve the problem in data synchronization
- Caching data to reduce the transmission workload
- Note: mostly, a solution creates another problem
- The applications should not be affected by the change in system configuration
57Failure Handling
- Failures are possible at any time (plan for the worst) (unavoidable)
- Mostly, the failures in a distributed system are partial and occur one by one
- Failure handling consists of
- Detection of failures
- Masking of failures
- Recovery from failures
- The design of fault-tolerant computer systems is based on redundancy
- Hardware redundancy: the use of redundant components
- Software redundancy and data redundancy
- Software recovery: the design of programs to tolerate (process group) or recover from faults
- Availability: measures the proportion of time that the system is available for service
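Availability as a proportion of time, and the effect of redundancy on it, can be put into numbers. The second function rests on assumptions the slides do not state: replicas fail independently and any single working replica is enough to provide the service. All figures are made up.

```python
def availability(uptime_hours, downtime_hours):
    """Proportion of time that the system is available for service."""
    return uptime_hours / (uptime_hours + downtime_hours)


def replicated_availability(single_availability, replicas):
    """Availability with redundant components, assuming independent failures.

    The service is down only when every replica is down at the same time.
    """
    return 1 - (1 - single_availability) ** replicas


one_machine = availability(uptime_hours=90, downtime_hours=10)
for copies in (1, 2, 3):
    print(copies, f"{replicated_availability(one_machine, copies):.3f}")
```

Real failures are often correlated (shared power, shared network), so these numbers should be read as an upper bound on what redundancy can deliver.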
58Transparency
- Hiding the separation of components from the user and the application programmer
- Achieve a single system image, making everyone think that the collection of machines is simply an old-fashioned time-sharing system
- Using the same access method even when the system configuration has been changed
- Logical system design Vs. physical implementation
- Layered structure to hide the details
- Access transparency: enables local and remote information to be accessed using identical operations
- Location transparency: enables information objects to be accessed without knowledge of their location (users need not tell where resources are located)
- Who knows the locations?
59Transparency
- Concurrency transparency: enables several processes to operate concurrently using shared information objects without interference (multiple users can share resources automatically)
- Replication transparency: enables multiple replicas to be used to increase reliability and performance without user knowledge of how many replicas exist
- Why do we need replicated data?
- Failure transparency: enables the concealment of faults, allowing users to complete their tasks despite the failure of hardware or software components
- Migration transparency: allows information objects to move within a system without changing their names or affecting users
- Why do we need to migrate data objects?
60Transparency
- Performance transparency: allows the system to be reconfigured to improve (or maintain the guaranteed) performance as loads vary
- Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms
- Parallelism transparency: allows a program to be executed in parallel without the users' knowledge
61Some Basic Techniques for Building a Distributed
System
- Replicate to increase availability
- Trade off availability against consistency
- Exploit cache locality to reduce access delay
- Use time-out for revocation
- Use a standard remote invocation mechanism
- Use encryption for authentication and data security
- Distributed Vs. centralized resource management
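Two of the techniques above, caching to reduce access delay and time-outs for revocation, combine naturally: a cached copy is trusted only for a limited time. The sketch below is one way this could look; the class name, the fetch function and the 10-second lifetime are assumptions, and a fake clock is injected so the behaviour is repeatable.

```python
class TimeoutCache:
    """Keeps local copies of remote data and discards them after a time-out."""

    def __init__(self, fetch, time_to_live, clock):
        self._fetch = fetch              # function that performs the remote access
        self._time_to_live = time_to_live
        self._clock = clock              # function returning the current time
        self._entries = {}               # key -> (value, time it was cached)
        self.remote_fetches = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._time_to_live:
            return entry[0]              # still valid: no remote access needed
        value = self._fetch(key)         # missing or timed out: go to the source
        self.remote_fetches += 1
        self._entries[key] = (value, now)
        return value


current_time = [0.0]                      # fake clock, advanced by hand
cache = TimeoutCache(
    fetch=lambda key: f"contents of {key}",
    time_to_live=10.0,
    clock=lambda: current_time[0],
)

cache.get("report.txt")                   # first access goes to the remote source
cache.get("report.txt")                   # second access is served locally
current_time[0] = 11.0                    # the cached copy has now timed out
cache.get("report.txt")                   # fetched again from the remote source
print(cache.remote_fetches)               # prints 2
```

The time-out is the trade-off knob: a longer lifetime saves more remote accesses but lets a stale copy survive longer after the original changes.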
62New Development in Distributed Systems
- Computing units are getting smaller and smaller but with higher computation power and energy supply
- Extremely large memory storage units
- Networks everywhere: both wired and wireless networks
- Performance of mobile networks has improved greatly
- Applications: both commercial and personal (the personal computer has become one of our essential units at home)
- Computation everywhere (mobile games and mobile phones)
- New applications
- Real-time systems: distributed real-time multimedia systems
- Many small computation units: peer-to-peer systems
- Interaction with the environment: sensor network systems
- Multiple information streams: information integration and filtering
- What will be the FUTURE?
63References
- Readings
- Dollimore Ch1
- Tanenbaum Ch1 (except 1.5)