Title: Distributed Systems
1Distributed Systems
Hochschule der Medien
Walter Kriha
2Overall Goals
- Learn the basic concepts of Distributed Systems
like concurrency and remoteness
- Understand different programming models for
Distributed Systems
- Understand interdependencies between technical
means of distribution and distribution as a
business or social model
3Goal for today
- Give an overview of distributed systems. Later
lectures will dig into the gory details like
security, transactions, remote calling mechanisms
etc.
4Introduction
- What is a Distributed System (DS)
- Why distribute?
- Types of DS
- Characteristics of DS
- Middleware for DS
- Resources
- Exercises
5Definition of a Distributed System
Independent agents repeatedly interacting in such a
way that coherent behavior (a system) emerges
6Why learn about Distributed Systems?
Because most IT systems ARE ALREADY DISTRIBUTED
SYSTEMS (and not only the IT Systems)
7Types of Distributed Systems
- Energy grid, telecom network
- Villages, towns and big cities
- IT infrastructure of large companies
- High-performance clusters
- The WWW
- The human body, organizations, states
- A flock of birds
8The energy grid now: hub and spoke
(Diagram: a central power plant feeding an office, a factory and several homes.)
Electricity flows in one direction only, with a
lot of it lost during transport. Control resides
with the power plant.
www.wired.com/wired/archive/9.07/juice.html
9The future grid: micropower
(Diagram: power plant, offices, factory, homes and cars, interconnected with a number of fuel cells.)
Power flows in many directions, controlled by
independent sensors in the grid. A tenfold
increase of transactions. Modern GRID computing
allows users to tap into a wealth of distributed
computing resources. http://www.thegridreport.com/
10The power law of settlements
There are many villages, quite a few towns but
only a small number of big cities
11IT-Infrastructure of large corporations
Un-trusted clients
Customer data zone
Processing zone
12World Wide Web
Internet
DNS
Intranet
Dial-up clients live at the edges of the
internet (no fixed IP address, slow upload). How
many graphs are layered on top of the physical
network structure? (Hyperlinks, search engines,
DNS)
13The New Web
P2p overlay
Social network overlay
Mobile PAN network
mashups
Internet
Location based service
Sensors
Cams
Real world
Aggregation of external information and
collaboration based on social networks will bring
new forms of content production and consumption.
Consumer areas will in turn influence companies
("consumerization", Gartner Group). More
interconnection between different types of networks
brings more emergent phenomena.
14Why Distribute?
- Risk: avoid single points of failure (e.g. use
hot stand-by data centers)
- Performance: run tasks on several nodes
- Security: create different security domains
15Application Structure
16Views of Distributed Systems
- Enterprise View: the role within an enterprise
- Information View: the flow of information in the
DS (information architecture)
- Computational View: the processing of information
in the DS (logical architecture)
- Engineering View: the system infrastructure
(nodes, connections, system management, replicas
etc.) (physical or distribution architecture)
- Technology View: the specific technology used to
build the DS
(From Open Distributed Processing plus my
architecture categories)
17Characteristics of Distributed Systems
- re-definition of programming language concepts
- distribution topology
- emergent behavior
- autonomous components
- heterogeneous components
- a strong need for security
- Concurrency
- Scale
- Remoteness
- global naming and addressing
- ownership and control
- transactions across nodes
- no global state
- no centralized control
- many points of failure
- Asynchronous communication
Don't worry, we'll dig into all this another day!
18Programming Languages and Distributed Systems
DS
- system defines security
- objects are versioned
- global identity
- components
PL
- design security (private, protected etc.)
- no versioning, no components as types
- memory address used for identity
- platform dependent basic type size (int)
PL and DS are orthogonal.
19Distribution Topology
It takes only a small number of intermediate
persons to connect any person in the world to
any other one. (A knows B, B knows C, ... F
knows G.)
From the Milgram experiments on social networks
(Andy Oram, Peer-To-Peer: Harnessing the Power of
Disruptive Technologies). OpenBC or LinkedIn
create a social network from distributed
participants.
20Systems showing the small world effect
High local clustering
How efficiently can such a DS transport messages
or queries? How robust is it against random
attacks on nodes, and against targeted attacks on
the important connecting nodes?
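The efficiency question can be explored with a small experiment. The following is only a sketch under assumptions: the ring size, the neighbourhood width and the three shortcut links are arbitrary illustrative choices, and `ring_lattice` and `average_path_length` are hypothetical helper names. It shows that a few long-range links added to a highly clustered ring shorten the average path between nodes.

```python
from collections import deque

def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on each side (high local clustering)."""
    graph = {node: set() for node in range(n)}
    for node in range(n):
        for step in range(1, k + 1):
            other = (node + step) % n
            graph[node].add(other)
            graph[other].add(node)
    return graph

def average_path_length(graph):
    """Mean shortest-path length over all reachable pairs, using breadth-first search."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in graph[current]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

ring = ring_lattice(40, 2)
before = average_path_length(ring)

# Add three arbitrary long-range "shortcut" links across the ring.
for a, b in [(0, 20), (10, 30), (5, 25)]:
    ring[a].add(b)
    ring[b].add(a)
after = average_path_length(ring)

print(f"average path length: {before:.2f} without shortcuts, {after:.2f} with shortcuts")
```

Removing the shortcut endpoints again would be a simple way to imitate a targeted attack on the connecting nodes.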
21The power law of DS: a law of nature?
Cities, companies, power, social networks etc.
seem to exhibit the power law: each size is a
tenth of the next bigger size but has ten times
more instances. (Clay Shirky, In Defense of
Cities, www.openp2p.com)
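As a rough illustration of the "a tenth of the size, ten times the instances" rule, the sketch below generates such size classes. The starting size of one million and the four levels are made-up numbers, not data from the slide, and `power_law_classes` is a hypothetical helper.

```python
def power_law_classes(largest_size, levels, factor=10):
    """Return (size, count) pairs: each step down is 1/factor the size, factor times as many."""
    return [(largest_size / factor**k, factor**k) for k in range(levels)]

# Illustrative numbers only: think of one metropolis, ten cities, a hundred towns, ...
classes = power_law_classes(largest_size=1_000_000, levels=4)
for size, count in classes:
    print(f"size {size:>9.0f}: {count:>4} instances")
```

In this idealized form the product of size and count is the same for every class.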
22Metcalfe's law - Network Effects
- The usefulness of a network grows with the square
of the number of users (think about a fax machine:
how useful is a single one?)
- The adoption rate of a network increases in
proportion to the utility provided by the
network. (That's why companies give away software,
for example.)
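One common way to motivate the square law is to count the possible pairwise connections between n users, n(n-1)/2, which grows roughly with n squared. The sketch below assumes this reading. `possible_links` is a hypothetical helper, not part of any library.

```python
def possible_links(users):
    """Number of distinct pairwise connections among `users` participants."""
    return users * (users - 1) // 2

for n in (1, 2, 10, 100):
    print(n, "users ->", possible_links(n), "possible connections")
```

A single fax machine has nobody to talk to; ten times the users gives roughly a hundred times the possible connections.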
23Emergent Behavior: a flock of birds
There is no central controller, no "super-bird".
No bird has a representation of the figure in its
head. Instead, every bird follows very simple
rules. The resulting figure shows EMERGENT
behavior. Many distributed systems show it as
well, for good or for bad. (Kevin Kelly, Out of
Control: The biology of the new machines. Peter
Wegner, Interaction vs Algorithm.)
24Heterogeneous Components
One system: unreliable hardware, frequent downtimes,
little endian byte order, Java data types, no
callbacks, slow, no access control
Another system: fault tolerant hardware, system
management, big endian byte order, C data types,
fast, access controlled
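The byte order difference is easy to demonstrate. The sketch below uses Python's standard `struct` module; the value 0x12345678 is just an arbitrary example. It shows why heterogeneous systems have to agree on a wire representation.

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)     # big endian, often called network byte order
little = struct.pack("<I", value)  # little endian

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Reading big endian bytes as little endian silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x78563412
```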
25Security in Distributed Systems
Authentication Authorization
Integrity, Confidentiality
But sometimes anonymity is needed!! (peer-to-peer
systems)
Authentication Authorization
26Security Topics
- Firewalls
- Certificates, Public Key Infrastructure, Digital
Signature
- Encryption (methods and devices)
- Software Architecture
- Intrusion Detection
- Sniffing
- PGP, SSL etc.
- Denial of Service attacks
- Authentication (who are you?)
- Authorization (what can you do?)
- Confidentiality (can someone spy on us?)
- Integrity (Did somebody change your message?)
- Non-repudiation (It was you who ordered X)
- Privacy/Anonymity
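To make the integrity question concrete, here is a minimal sketch using a keyed hash (HMAC) from Python's standard library. The shared key, the messages and the helper names `sign` and `verify` are assumptions made for illustration; real systems also need key distribution, which is not shown.

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key-known-to-both-sides"  # hypothetical key, for illustration only

def sign(message: bytes) -> bytes:
    """Compute an authentication tag over the message."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; fails if the message was changed."""
    return hmac.compare_digest(sign(message), tag)

order = b"order 10 units of X"
tag = sign(order)

print(verify(order, tag))                     # True
print(verify(b"order 1000 units of X", tag))  # False
```

Because both sides hold the same key, this gives integrity but not non-repudiation; for the latter a digital signature with a private key is needed.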
27Important Programming Terms for DS
- Identity
- Value vs. Reference
- Exception
- Interface vs. Implementation
- Interface Definition Language (IDL)
- Quality of Service (QOS)
- Stubs/Proxies
28Distributed System Design
- Common Problems (performance, fail-over,
maintenance, policies, security integration)
- Information Architecture (define and qualify the
information fragments and flows)
- Distribution Architecture (create a map of all
participating systems and their quality of
service)
- Policy-Driven Architectures
29Middleware for Distributed Systems
30What is Middleware?
Software that helps two separate systems
communicate seamlessly.
(www.knownow.com/middleware/lexicon.html)
In a strict sense, middleware is transport
software that is used to move information from
one program to one or more other programs,
shielding the developer from dependencies on
communication protocols, operating systems and
hardware platforms ("plumbing"). (www.talarian.com)
31Positioning Middleware
- General structure of a distributed system as
middleware (figure 1-22, from van Steen/Tanenbaum)
32The Transparency Dogma
- Middleware is supposed to hide remoteness and
concurrency by hiding distribution behind local
programming language constructs
Critique (Jim Waldo, Sun): full transparency is
impossible and the price is too high
33Distribution Transparencies
- Access: mask differences in data representation
and invocation mechanisms between heterogeneous
systems
- Failure: mask failures to enable fault tolerance
(e.g. intelligent load-balancing)
- Location: use logical, not physical names to
access services
- Migration: hide the true location of a service or
object from clients. If the location changes, the
client won't notice it.
34Distribution Transparencies cont'd
- Replication: hide a group of equal objects behind
an interface (performance, availability)
- Persistence: hide the storage mechanisms and
internal policies from a client. Make a remote
object look like it is persistently activated.
- Transaction: hide the complex coordination
necessary to achieve consistency.
Source: ISO/IEC 10746-1 Open Distributed
Processing, www.iso.org
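Location and migration transparency can be sketched with a tiny in-memory name service. Everything here is hypothetical (the `NameService` class, the service name "quotes", the addresses), and no real network call is made; the point is only that the client code uses a logical name and does not change when the service moves.

```python
class NameService:
    """Maps logical service names to physical addresses (hypothetical, in-memory)."""

    def __init__(self):
        self._bindings = {}

    def bind(self, name, address):
        self._bindings[name] = address

    def resolve(self, name):
        return self._bindings[name]

def call(names, service_name, request):
    """Client side: knows only the logical name, never a fixed host."""
    host, port = names.resolve(service_name)
    return f"sent {request!r} to {host}:{port}"  # stands in for a real remote call

names = NameService()
names.bind("quotes", ("10.0.0.5", 9000))
first = call(names, "quotes", "IBM")

# The service migrates to another node; only the binding changes.
names.bind("quotes", ("10.0.0.9", 9000))
second = call(names, "quotes", "IBM")

print(first)
print(second)
```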
35Where do we find Middleware?
LDAP or DCE
Quotes
Distributed Cache
Directory
WebService
JMS
JNDI
Application Server Web- Tier
Application Server EJB Tier
Web Server
JDBC
RMI
XML-RPC
CORBA
News
E-bank
Part of a Portal running on a Web Cluster.
36Classification
- Socket Based Services
- Remote Procedure Calls (RPCs)
- Object Request Brokers (CORBA, RMI)
- Message Oriented Middleware (MOMs)
- Web-Services (XML-RPC, SOAP, UDDI)
- Component Systems (Enterprise Java Beans, J2EE)
- Peer-To-Peer (Napster, Gnutella, Freenet,
SETI@home)
- Agent based (Jini, Aglets)
37The ilities
- Reliability
- Availability
- Security
- Scalability
- Quality
- Performance
- Maintainability
Before using a specific middleware, always make
sure that the "ilities", aka non-functional
requirements, are met. The implementation quality
of middleware almost always differs between vendors.
38Real-World Problems
- Skills/understanding: best practice patterns?
- Single points of failure: replication,
load-balancing etc.
- Tooling: generators, deployment tools
- Brittleness if interfaces change (compiler
illusion)
39RPC type Middleware
- E.g. Sun-RPC, OSF DCE
- Main idea: distribute functions, use concurrent
processing
- On top of it: Distributed Directory, File system,
Security (cells, principals)
- XML-RPC over HTTP (www.userland.com)
Layer foundations: UUIDs, value vs. reference,
marshaling, versioning etc.
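Marshaling is the step where a call and its arguments are turned into a wire format and back. As a sketch, the XML-RPC codec in Python's standard library (`xmlrpc.client.dumps` and `xmlrpc.client.loads`) can be used without any network. The `add` function and the `functions` dispatch table are hypothetical stand-ins for a server.

```python
import xmlrpc.client

# Client side: marshal the call into an XML-RPC request document.
request_xml = xmlrpc.client.dumps((2, 3), methodname="add")

# Server side: a hypothetical local function and dispatch table.
def add(a, b):
    return a + b

functions = {"add": add}

# Server side: unmarshal the request, dispatch, marshal the reply.
params, method = xmlrpc.client.loads(request_xml)
reply = functions[method](*params)
response_xml = xmlrpc.client.dumps((reply,), methodresponse=True)

# Client side: unmarshal the reply.
(result,), _ = xmlrpc.client.loads(response_xml)
print(method, params, "->", result)
```

A generated stub would hide exactly these marshal and unmarshal steps behind an ordinary-looking function call.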
40Distributed Objects
RMI
- Java only (e.g. introspection used)
- Lightweight method call semantics
- Java implementations
- Wire protocol: now mostly RMI over IIOP
CORBA
- Object Request Broker
- Multi-language support (platform independence)
- Interface Definition Language
- Wire protocol: IIOP, GIOP
Both try to preserve object semantics.
Interface/implementation separation
41Distributed Components
- Objects are too granular: performance and
maintenance problems
- Programmers need more help: separation of
concerns and context
- Solutions: Enterprise Java Beans, CORBA
Components, COM
42Example: Enterprise Java Beans
EJB Framework (separation of concerns)
- Automatic transaction management
- Concurrency control
- Automatic, method-level security
Deployment (separation of context)
- System management defines data sources and
containers
- System management defines pool sizes
- System management defines role/user binding
43EJB Container
Client
Entity Bean
invoke
Load/ persist
delegate
At the point of interception the container
provides the following services to the bean:
resource management, life-cycle,
state management, transactions, security,
persistence
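The interception idea can be sketched in a few lines. This is not the EJB API; it is a simplified, hypothetical `Container` written in Python that wraps a plain object and adds a role check and a begin/commit/rollback log around every call, so that the business object itself stays free of those concerns.

```python
class Account:
    """Plain business object, unaware of transactions or security."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance

class Container:
    """Hypothetical interceptor standing in for what a container does."""

    def __init__(self, bean, allowed_roles):
        self._bean = bean
        self._allowed_roles = allowed_roles
        self.log = []

    def invoke(self, role, method, *args):
        # Security check happens before the bean is ever reached.
        if role not in self._allowed_roles:
            raise PermissionError(f"role {role!r} may not call {method}")
        self.log.append("begin")
        try:
            result = getattr(self._bean, method)(*args)  # delegate to the bean
        except Exception:
            self.log.append("rollback")
            raise
        self.log.append("commit")
        return result

container = Container(Account(), allowed_roles={"teller"})
new_balance = container.invoke("teller", "deposit", 50)
print(new_balance, container.log)
```

In a real container the allowed roles and transaction settings would come from deployment information rather than from code.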
44Distributed Messages (MOM)
Asynchronous, loosely-coupled (fault tolerant),
persistent messages with either publish/subscribe
(topics) or queuing semantics. Scales well.
Delivery guarantees differ.
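(Diagram: publishers (Pub) publish messages to a topic on the MOM, and the subscribers (Sub) receive them; senders put messages M1 and M2 on a queue on the MOM, from which receivers get them.)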
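The difference between queuing and publish/subscribe semantics can be sketched with a toy in-memory broker. The `Broker` class and its method names are hypothetical and ignore persistence, delivery guarantees and networking.

```python
from collections import deque

class Broker:
    """Toy message broker: queues deliver once, topics deliver to every subscriber."""

    def __init__(self):
        self._queues = {}
        self._topics = {}

    def put(self, queue, message):
        self._queues.setdefault(queue, deque()).append(message)

    def get(self, queue):
        pending = self._queues.get(queue)
        return pending.popleft() if pending else None

    def subscribe(self, topic):
        inbox = deque()
        self._topics.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic, message):
        for inbox in self._topics.get(topic, []):
            inbox.append(message)

broker = Broker()

# Queue: each message is consumed by exactly one receiver.
broker.put("orders", "M1")
broker.put("orders", "M2")
first = broker.get("orders")
second = broker.get("orders")

# Topic: every subscriber gets its own copy.
sub1 = broker.subscribe("news")
sub2 = broker.subscribe("news")
broker.publish("news", "M1")

print(first, second, list(sub1), list(sub2))
```

Sender and receiver never call each other directly, which is where the loose coupling comes from.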
45Distributed Code I (Agents, Aglets)
The problem: who wants a new runtime system?
(Diagram: an agent is packed into a serialized agent, sent over a channel from one agent runtime to another (each running on its own OS), unpacked there, performs work and comes back with results.)
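The pack/unpack cycle can be sketched as follows. This is a simplification under assumptions: only the agent's state travels (as JSON), so the hypothetical `WordCountAgent` class has to exist on both sides already, which is exactly the "new runtime" that every host would have to install.

```python
import json

class WordCountAgent:
    """Hypothetical agent that counts words on each host it visits."""

    def __init__(self, results=None):
        self.results = results or {}

    def work(self, host_name, local_text):
        self.results[host_name] = len(local_text.split())

    def pack(self) -> bytes:
        """Serialize the agent state for transport over a channel."""
        return json.dumps({"results": self.results}).encode("utf-8")

    @classmethod
    def unpack(cls, data: bytes):
        """Rebuild an agent from its serialized state inside the receiving runtime."""
        return cls(json.loads(data.decode("utf-8"))["results"])

# Home runtime: pack the agent and "send" it.
wire = WordCountAgent().pack()

# Remote runtime: unpack, let it work on local data, pack it again.
visitor = WordCountAgent.unpack(wire)
visitor.work("host-b", "to be or not to be")
wire_back = visitor.pack()

# Home runtime: the agent comes back with results.
returned = WordCountAgent.unpack(wire_back)
print(returned.results)
```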
46Distributed Code II (Jini): The End of Protocols?
Jini Lookup Service
Proxy moves to lookup service during registration
Proxy moves to client during service lookup
Jini Client
Jini Service
Service private protocol
Service Proxy Code
47Peer 2 Peer
SETI@home, Freenet, JXTA etc.
INTERNET DNS
Nodes have no fixed IP address and frequent
down-times
ISP
ISP
ISP
P2P uses the cycles of peer nodes and provides file
sharing and anonymity because no central servers
are used.
Problems: How do you version files? Overhead?
48WebServices
Promises de-coupling of service provider and
requester, document interfaces,
machine-to-machine communication and ease of use
compared to distributed objects.
Core services: security, transactions etc.
Registry (advertise): Universal Description,
Discovery and Integration (UDDI)
Service features: Web Services Description
Language (WSDL)
Exchange messages: SOAP
Wire format/transport: XML syntax/HTTP
Web Server
Broker
Service Granularity? Application, Component,
Object or Request?
Use your de-hyper generously!
49GRID Computing
(Diagram: a Grid providing OGSA, with resources from different companies.)
The problem: requires a massive amount of system
resources
The abstraction: single system image
Grid computing promises the just-in-time
availability of vast amounts of computing
resources, easily accessible through a single
system image. Scientific simulation or even game
construction (www.butterfly.net) are possible
applications. See
http://www.globus.org/research/papers/anatomy.pdf,
"The Anatomy of the Grid".
50(Tuple) Spaces
(Diagram: a space providing tuple storage; users or agents storing or finding tuples and interacting through the space.)
The abstraction: anything can be stored as long
as it is addressable.
The world's largest space is the WWW. Other spaces
are WIKI-WIKI collaboration systems or more
traditional tuple spaces like tspaces or jspaces.
The principle is always the same: a few simple
methods (put/take/find) which let users or
machines store or find content. The content
itself is returned as a representation of a
resource. That's why some people call those
systems REST (Representational State Transfer)
architectures, after the thesis by Roy Fielding,
one of the principal authors of HTTP.
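A minimal sketch of the put/take/find idea follows. The `TupleSpace` class is hypothetical and purely in-memory; it uses `None` as a wildcard in patterns, which is an assumption of this sketch and not the matching rule of any particular product.

```python
class TupleSpace:
    """Toy tuple space: store tuples, then find or take them by pattern."""

    def __init__(self):
        self._tuples = []

    @staticmethod
    def _matches(pattern, candidate):
        # None in the pattern acts as a wildcard for that position.
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def put(self, item):
        self._tuples.append(tuple(item))

    def find(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def take(self, pattern):
        """Return the first matching tuple and remove it from the space, or None."""
        found = self.find(pattern)
        if found is not None:
            self._tuples.remove(found)
        return found

space = TupleSpace()
space.put(("quote", "IBM", 82.5))
space.put(("quote", "SUN", 4.1))

found = space.find(("quote", "SUN", None))
taken = space.take(("quote", None, None))
gone = space.find(("quote", "IBM", None))
print(found, taken, gone)
```

Producers and consumers interact only through the space and never need to know about each other.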
51Others
- Internet Games
- Portal Architectures
- Java Communicating Sequential Processes (JCSP), a
library implementing Hoare's CSP
- Pi-Calculus for mobility
- Mozart/OZ, http://www.mozart-oz.org/
- E-language, http://www.erights.org/
- Erlang, a language for distributed telco systems
with asynchronous message passing,
http://www.erlang.org/
- Parallel Processing: PVM, MPI (e.g. for Linux
Beowulf clusters)
- Wireless mobile communication, Bluetooth
- System Management
- Jiro/FMA/JMX
- Group Computing (virtual synchrony): Horus, iBus,
javagroups (www.javagroups.org, good for
building distributed caches or HA
infrastructures)
- Distribution Subsystem (DSS) middleware library
- Simjava (discrete event simulation)
- Gridsim (grid simulation package)
- Teatime (www.opencroquet.org)
52Future Applications
- Collective Intelligence: collaborative production
of content
- Mashups: dynamic integration of external sources
- Social Network Analysis: the use of information
and knowledge from many people and their personal
networks
- Sensor Mesh Networks: ad hoc (self-organizing)
networks formed by dynamic meshes of peer nodes,
each of which includes simple networking,
computing and sensing capabilities
(real-world web)
- Event-driven Applications: an architectural style
for distributed applications, in which certain
discrete functions are packaged into modular,
encapsulated, shareable components, some of which
are triggered by the arrival of one or more event
objects
- Web 2.0: represents a broad collection of recent
trends in Internet technologies and business
models. Particular focus has been given to
user-created content, lightweight technology,
service-based access and shared revenue models
(technical base: AJAX, Ruby on Rails etc.)
Roughly taken from Gartner's 2006 Emerging
Technologies, http://www.gartner.com/it/page.jsp?id=495475
53Resources (Technologies)
- Jim Waldo, End of Protocols
- Java Data Objects (JDO) specification
(www.java.sun.com)
- ObjectSpectrum 7/2001, WebServices
- Jim Waldo, A Note on Distributed Computing
(please read till next session)
- Java Magazine 7/2001, Java Message Service
- www.theserverside.com on Enterprise Java Beans
- Clay Shirky, What is P2P and what isn't
(www.openp2p.com)
- S. Tai, I. Rouvellou, Strategies for Integrating
Messaging and Distributed Object Transactions
54Resources (Programming)
- Wolfgang Emmerich, Engineering Distributed
Objects (www.distributed-objects.com). With slides
and tests.
- Marco Boger, Java in verteilten Systemen
- Mastering Enterprise Java Beans
(www.theserverside.com), free!
- Ted Neward, Java Server Side Programming
(sockets, servlets etc.), www.manning.com/neward
- www.swarm.org, portal for swarm programming. Also
used as a simulation tool for research in
economics and finance.
55Resources (Systems)
- Coulouris et al., Distributed Systems
- Andrew Tanenbaum, Maarten van Steen, Distributed
Systems. Get this one or Coulouris for a long
term effect.
http://ajax.prenhall.com/divisions/esm/app/author_tanenbaum/custom/dist_sys_1e/index.html
(slides and book chapters)
- Ken Birman, Building Secure and Reliable Network
Applications (a good draft existed once)
- Gray/Reuter, Transaction Processing
- Jiro/Federated Management Architecture (FMA)
- Open Grid Service Architecture,
http://www.gridforum.org/ogsi-wg/drafts/ogsa_draft2.9_2002-06-22.pdf,
explains the services needed in the new GRID
computing paradigm
56Resources (Theory)
- Designing Distributed Systems, A Conversation
with Ken Arnold, Part III,
http://www.artima.com/intv/distribP.html, shows the
importance of failures and state in DS
- The Paradigm Shift from Algorithms to
Interaction, Peter Wegner, 1996, a provocative
short essay on why interactive systems are much
more powerful than Turing machines. Shows that DS
is more than just concurrency and remoteness. The
basics of emergence and non-algorithmic behavior.
Good for agent systems as well.
- Reliable Distributed Systems: Technologies, Web
Services, and Applications, Kenneth P. Birman
- Phillip J. Windley, Digital Identity. Contains
the architecture of identity repositories including
federation aspects. Network effects and their
effects against bilateral identity management.
57Resources (Scale-Free)
- Stability and topology of scale-free networks
under attack and defense strategies, Lazaros K.
Gallos et al., http://xxx.lanl.gov/pdf/cond-mat/0505201
- Albert-Laszlo Barabasi, Linked. Investigates
small worlds, scale free networks etc. Basically
moves from random networks to hub/spoke
architectures. Discovered how the WWW space is
organized (in/out/core/islands etc.). A must read
for everybody interested in the effects of
topology (e.g. on virus spreads).
58Resources (Web)
- Tim O'Reilly's famous article on Web 2.0,
http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
- Gartner's 2006 Emerging Technologies Hype Cycle
Highlights Key Technology Themes,
http://www.gartner.com/it/page.jsp?id=495475
- Mashups, Duane Merrill,
http://www-128.ibm.com/developerworks/library/x-mashups.html?ca=dnw-727
59Resources (Events, Simulation)
- Simjava, discrete event simulation package.
Tutorial at http://www.dcs.ed.ac.uk/home/simjava/tutorial/
- GridSim, Grid Simulation Package,
http://www.gridbus.org/gridsim/gridsim2.2/