Title: Architectures
1Architectures
- Introduction to Distributed Systems, CS 457/557, Fall 2008, Kenneth Chiu
2Software Designs
3Software Design
- What is kernel mode?
- At a high level, how is the software for a computer system organized?
- CPU
- Microcode
- ROM
- BIOS
- Firmware? Who has upgraded?
- Hardware abstraction layer
- Kernel
- Libraries
- Middleware
- Applications
4- What is software design? What is the design space?
- What are the questions relevant to the previous slide?
- What to put in kernel mode, library?
- Can something be a library, or does it need to be a process/service/daemon?
- If a daemon, what privilege does it need?
- The crucial question is: what layers to have, and what functionality to put in what layer?
- If the wrong decision is made, what happens?
- What happens if too much functionality is put into ROM?
- What happens if too little?
- What happens if too many layers?
5Basic Categories of Software for Distributed
Systems
6Uniprocessor Operating Systems
1.11
- Microkernel design
- What does a monolithic design look like?
- What does an SMP OS look like?
- Are any microkernels commercially successful?
7Multicomputer Operating Systems
1.14
- Is the middle layer in kernel mode? Could it be? What is an example of something that could be put in the kernel?
8Communications
- Fundamentally, how is communication done on an SMP vs. a cluster (multicomputer)?
- From the concurrent programming point of view, what are the two fundamental ways to communicate?
- Is it possible to communicate using the network within an SMP?
- Is it possible to provide a shared memory model on a cluster?
9Distributed Shared Memory
- What is virtual memory?
- Where are the pages stored when paged (swapped) out?
- Distributed shared memory (DSM) is a way of making a machine that does not inherently have shared memory look like it has shared memory.
10Distributed Shared Memory Systems
- Pages of address space distributed among four
machines
Situation after CPU 1 references page 10
Situation if page 10 is read only and replication
is used
- What kinds of inefficient things can happen?
- A page can be shared between two processes, possibly falsely shared. We may not want to always migrate a page.
- Using shared memory to communicate is normal on a multiprocessor (MP) or uniprocessor (UP). Using it to communicate on DSM would perform very badly.
- DSM is a leaky abstraction.
11- How could performance be improved?
- If a page is read-only, can we have more than one copy?
- What if it is read-write?
- Is strict consistency necessary?
- Relaxed consistency may be okay, if the application can choose and adjust the consistency model dynamically.
12False Sharing
- False sharing of a page between two independent processes.
- How big should a page be?
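One way to see why page size matters: two unrelated data items are falsely shared whenever their addresses fall on the same page. The sketch below is illustrative only. `page_of` and `may_falsely_share` are hypothetical helper names, and addresses are treated as plain integers rather than real pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page number that holds a given address (page_size must be nonzero). */
uint64_t page_of(uint64_t addr, uint64_t page_size)
{
    assert(page_size > 0);
    return addr / page_size;
}

/* Two independent data items can be falsely shared only if they
 * live on the same page. */
bool may_falsely_share(uint64_t addr_a, uint64_t addr_b, uint64_t page_size)
{
    return page_of(addr_a, page_size) == page_of(addr_b, page_size);
}
```

With 4096-byte pages, addresses 4095 and 4096 land on different pages; with 8192-byte pages they land on the same one. Larger pages therefore make false sharing more likely, which is one side of the page-size trade-off.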
13Network Operating System
- General structure of a network operating system.
- Generally, services are not in the kernel.
- Services are remote login, file access, etc.
14- So what is a NOS? Probably easiest to describe by example.
- Suppose you have files on bingsuns, but you want to work at home on your laptop. What do you do?
- Using things like FTP is inconvenient. There are things like NFS and CIFS. These are NOS capabilities.
- A NOS does not provide true distribution transparency, but does provide convenience.
15File I/O in a NOS
- Two clients and a server in a network operating system.
- What is the request for?
16File Systems in a NOS
17Examples of NOS
- What are some NOSs? Why are they NOSs?
18DOS vs. NOS
- What is good/bad about DOS?
- Transparency
- Hard to make work right
- Lack of support for apps
- Problems are often socio-technological.
- What is good/bad about NOS?
- Simple.
- Decoupled, easy to add/remove.
- Lack of transparency.
- No integration for your apps
- Have NOSs been successful?
- Have DOSs been successful? Why or why not?
- DSM has a hard time.
- An abstraction that is too leaky?
- Sometimes, trying too hard doesn't work very well. Better to set a lower goal and do it well.
19Research Versus Practice
- What are some examples of things that were developed a lot as research, but not used in practice?
- Why were they not adopted?
- Wrong problem
- Solution too complicated
- Social/political reasons, chicken-and-egg problems, etc.
20Leaky Abstractions
- What is an abstraction?
- What is encapsulation?
- How well does this work in practice?
- High-level languages
- TCP/IP, reliable byte sequence
- Packet size?
- Ignore packet loss?
- Reliability?
- You can abstract too much! When have you abstracted too much?
- Are DOSs an abstraction that is too leaky?
- End-to-end principle
21What Now?
- Let's say you were a software company. You had 3 different applications.
- You could not adopt a DOS as a platform. Why?
- But you needed to do some kind of integration and communications between your apps. What would you do?
22Middleware
- What is middleware?
- What is its function?
- Where is it located?
- Why does it exist?
23Positioning Middleware
- General structure of a distributed system as
middleware.
24Middleware and Openness
1.23
- In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as well as the interfaces they offer to applications.
- Why should they be open? Why should they not be open?
25Typical Middleware Services
- Communication
- Naming
- Persistence
- Distributed transactions
- Security
26Middleware Models
- Distributed files
- Examples?
- Remote procedure call
- Examples?
- Distributed objects
- Examples?
- Distributed documents
- Examples?
- Others?
- Message-oriented middleware (MOM)
- Service-oriented architecture (SOA)
27Middleware Discussion
- What is good/bad about middleware?
- Easy to make multiplatform.
- Easy to start something new.
- But this can also be bad.
- Examples of middleware?
- What do people use instead of DSM?
- Has middleware been successful?
28Comparison between Systems
29Example
- A user wants to access a file on a cluster. How should we design this?
- Issues?
- Transparency
- Fault tolerance
- Possibilities
- SAN
- Shared?
- GFS
- NFS
- Middleware?
- What should we do differently if it is on a WAN?
30Example
- A process on a cluster needs to start another process on the cluster. How should we design this?
- Issues
- Load balancing
- Transparency, networking
- Possibilities
- Do nothing.
- User picks machine, uses ssh.
- System call.
- Disadvantages?
- Library function.
- Transparency?
31Styles
32Styles
- Layered
- Object-based
- Data-centered
- Event-based
33Layered
- Components can only call down.
34Layering
- Why layer?
- Flexible
- You can add functionality without changing underlying layers.
- Reuse
- Many applications can use Java jars, for example.
- Helps you solve the problem.
- Too hard to hold everything in your head at once.
35- Bottom layer
    void topo_sort(int (*cmp)(double x, double y), double *array, int size);
- Top layer
    int forward_cmp(double x, double y) { return x > y; }
    int reverse_cmp(double x, double y) { return y > x; }
    void top_layer_func() {
        topo_sort(reverse_cmp, array, 10);
    }
- Is this proper layering?
36- You can also think about layering purely in terms of dependencies.
- A layer A is above layer B if changes to the interfaces provided by layer A do not require changes to the code of layer B.
- By this definition, is the previous one proper layering?
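The standard C library's `qsort` has the same shape as the callback example: the lower layer calls back up through a function pointer, yet it does not depend on any interface defined by the upper layer. A minimal sketch, where `descending` and `sort_descending` are names made up for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Upper-layer comparator: orders doubles from largest to smallest. */
static int descending(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Upper-layer function: hands its own comparator down to qsort. */
void sort_descending(double *values, size_t n)
{
    assert(values != NULL || n == 0);
    qsort(values, n, sizeof values[0], descending);
}
```

Replacing or rewriting `descending` requires no change to `qsort`, even though `qsort` is the one making the call. Under the dependency-based definition, that is what keeps the caller of `qsort` in the upper layer.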
37Object-Based
- Like objects, except distributed. Defines distributed object systems.
- In my experience, these tend to get very complicated and messy.
38Data-Centered
- Processes communicate through a shared repository.
- Examples: WebDAV, Linda, tuple spaces.
39Event-Based Architectures
- What is the key point of events, and publish-subscribe?
- Decoupled in space (anonymous) and time (asynchronous).
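Both kinds of decoupling can be seen in a toy, single-process event bus. This is only a sketch under stated assumptions: the `bus_*` functions, the fixed table sizes, and `sum_handler` are all hypothetical, and a real system would deliver events across machines. Publishers name a topic rather than a receiver (space), and events wait in a queue until they are dispatched (time).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SUBS  8
#define MAX_QUEUE 16

typedef void (*handler_fn)(const char *topic, int payload, void *ctx);

struct subscription { const char *topic; handler_fn fn; void *ctx; };
struct event        { const char *topic; int payload; };

struct bus {
    struct subscription subs[MAX_SUBS];
    size_t nsubs;
    struct event queue[MAX_QUEUE];   /* FIFO ring buffer of pending events */
    size_t head, count;
};

/* Register interest in a topic. Returns 0, or -1 if the table is full. */
int bus_subscribe(struct bus *b, const char *topic, handler_fn fn, void *ctx)
{
    assert(b && topic && fn);
    if (b->nsubs == MAX_SUBS) return -1;
    b->subs[b->nsubs++] = (struct subscription){ topic, fn, ctx };
    return 0;
}

/* Queue an event without naming any receiver. Returns 0, or -1 if full. */
int bus_publish(struct bus *b, const char *topic, int payload)
{
    assert(b && topic);
    if (b->count == MAX_QUEUE) return -1;
    size_t tail = (b->head + b->count) % MAX_QUEUE;
    b->queue[tail] = (struct event){ topic, payload };
    b->count++;
    return 0;
}

/* Deliver all queued events; returns the number of handler calls made. */
size_t bus_dispatch(struct bus *b)
{
    size_t delivered = 0;
    while (b->count > 0) {
        struct event ev = b->queue[b->head];
        b->head = (b->head + 1) % MAX_QUEUE;
        b->count--;
        for (size_t i = 0; i < b->nsubs; i++) {
            if (strcmp(b->subs[i].topic, ev.topic) == 0) {
                b->subs[i].fn(ev.topic, ev.payload, b->subs[i].ctx);
                delivered++;
            }
        }
    }
    return delivered;
}

/* Example subscriber: adds each payload to the int pointed to by ctx. */
void sum_handler(const char *topic, int payload, void *ctx)
{
    (void)topic;
    *(int *)ctx += payload;
}
```

A subscriber registered with `bus_subscribe(&b, "price", sum_handler, &total)` sees nothing when `bus_publish` is called; the event is only delivered on the next `bus_dispatch`.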
40- Can do both event and data-centered.
41Architectures
42Centralized
43Client-Server
- What is a client?
- What is a server?
- What would be the alternative? Suppose you were designing an airline reservation system. What are your choices?
- Can something be both client and server?
44Clients and Servers
- General interaction between a client and a server.
- How is something like this implemented?
[Figure: timeline in which the client sends a request and waits for the result while the server provides the service and sends a reply.]
45Delivery Failures
- How can a client tell that a request message was lost?
- Timeout is one approach.
- How can a client detect the difference between a request message that was lost, and a reply message that was lost?
- No great answer; usually can offer only at-most-once service or at-least-once service.
- Does using a connection-oriented protocol like TCP help?
- Book is misleading.
46- What guarantees does TCP provide?
- Ordered, reliable, byte-sequence.
- When a TCP write call returns, can you discard the data?
    int important_data[100];
    while (some_condition) {
        // Call below overwrites array.
        prep_important_data(important_data);
        write(connfd, important_data, 100 * sizeof(int));
    }
- If you do discard immediately, what bad things might happen?
47- TCP provides guarantees only in the absence of faults.
- Packets can be lost, but this can be thought of as normal operation.
- If you want to make sure that the data actually got there, and got processed, you need to wait for an application-level acknowledgement from the receiver.
- Why doesn't TCP do this for you?
- Because it requires too much application knowledge. Do you want the ack when it gets to the app, or when written to disk, or RDBMS, etc.?
48Idempotency
- Can you categorize these into two categories?
- Read my account balance.
- Transfer 100 from savings to checking.
- Change block 100 of file A to read abcdef.
- Copy block 100 of file A to block 200.
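Reads and overwrite-style writes can simply be retried after a timeout, but the transfer cannot. One common workaround, shown here as an assumption rather than something taken from these slides, is to tag each request with a unique ID and have the server remember which IDs it has already applied. The struct, the `handle_transfer` name, and the fixed-size table are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SEEN_CAP 64

struct account_server {
    long savings;
    long checking;
    unsigned long seen[SEEN_CAP];   /* request IDs already applied */
    size_t nseen;
};

static bool already_applied(const struct account_server *s, unsigned long id)
{
    for (size_t i = 0; i < s->nseen; i++)
        if (s->seen[i] == id) return true;
    return false;
}

/* Returns true if the transfer has taken effect exactly once (now or on
 * an earlier attempt with the same ID), false if it was rejected. */
bool handle_transfer(struct account_server *s, unsigned long request_id,
                     long amount)
{
    assert(s != NULL);
    if (already_applied(s, request_id))
        return true;                /* duplicate: acknowledge, do not redo */
    if (amount < 0 || amount > s->savings || s->nseen == SEEN_CAP)
        return false;
    s->savings  -= amount;
    s->checking += amount;
    s->seen[s->nseen++] = request_id;
    return true;
}
```

A client that times out can resend the same request ID as often as it likes; only the first copy that arrives moves money. A real server would also need to expire old IDs and keep the table across crashes, which this sketch ignores.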
49 3-Tier Architectures
- Client-server is somewhat simplistic. A three-tier architecture has emerged:
- User interface
- Processing (business logic)
- Data (database)
- What are examples of each of these layers?
50- Internet search engines are one example.
User-interface level
Processing level
Data level
51- Other examples
- Stock brokerage decision support
- User interface
- Financial database
- Analysis
- Data level is typically an RDBMS, so will include
replication and consistency functionality.
52Logical Architecture vs. Physical Architecture
- Physical architecture may or may not match the logical architecture.
- Could have just two types
- Client machine containing interface
- Server machine running all else
- Or could have other partitionings.
55[Figure with panels (a) through (e).]
- Examples
- (a): server side has some control over the UI.
- (c): form checking.
- (d): banking application just uploads transactions.
- (e): local cache.
- What's good about moving things out to desktop machines? What's bad?
- Thin clients are popular. Why?
- Less management.
56Physical 3-Tiered
- An example of a server acting as a client.
- Web server
- TPM
57Another Description of 3-Tier Architecture
58 3-Tier Example: Web Proxy Server
[Figure: clients, a proxy server, and web servers; the legend distinguishes processes from computers.]
59 3-Tier Example: Clients Invoke Individual Servers
[Figure: clients send invocations to individual servers and receive results; the legend distinguishes processes from computers.]
60Web Applets
- Client performs some action which causes applet code to be downloaded from the web server.
- Client then interacts with the applet.
61Example Client and Server
- Header file
    /* Definitions needed by clients and servers. */
    #define MAX_PATH     255   /* maximum length of file name */
    #define BUF_SIZE    1024   /* how much data to transfer at once */
    #define FILE_SERVER  243   /* file server's network address */

    /* Definitions of the allowed operations */
    #define CREATE  1   /* create a new file */
    #define READ    2   /* read data from a file and return it */
    #define WRITE   3   /* write data to a file */
    #define DELETE  4   /* delete an existing file */

    /* Error codes. */
    #define OK            0   /* operation performed correctly */
    #define E_BAD_OPCODE -1   /* unknown operation requested */
    #define E_BAD_PARAM  -2   /* error in a parameter */
    #define E_IO         -3   /* disk error or other I/O error */

    /* Definition of the message format */
    struct message {
        long source;           /* sender's identity */
        long dest;             /* receiver's identity */
        long opcode;           /* requested operation */
        long count;            /* number of bytes to transfer */
        long offset;           /* position in file to start I/O */
        long result;           /* result of the operation */
        char name[MAX_PATH];   /* name of file being operated on */
        char data[BUF_SIZE];   /* data to be read or written */
    };
62- Server code
    #include "header.h"

    int main(void) {
        struct message m1, m2;   /* incoming and outgoing messages */
        int r;                   /* result code */

        while (1) {                        /* server runs forever */
            receive(FILE_SERVER, &m1);     /* block waiting for a message */
            switch (m1.opcode) {           /* dispatch on type of request */
                case CREATE: r = do_create(&m1, &m2); break;
                case READ:   r = do_read(&m1, &m2);   break;
                case WRITE:  r = do_write(&m1, &m2);  break;
                case DELETE: r = do_delete(&m1, &m2); break;
                default:     r = E_BAD_OPCODE;
            }
            m2.result = r;                 /* return result to client */
            send(m1.source, &m2);          /* send reply */
        }
    }
63- Client code, copying a file
    #include <string.h>
    #include "header.h"

    int copy(char *src, char *dst) {   /* procedure to copy file using the server */
        struct message m1;             /* message buffer */
        long position;                 /* current file position */
        long client = 110;             /* client's address */

        initialize();                  /* prepare for execution */
        position = 0;
        do {
            m1.opcode = READ;          /* operation is a read */
            m1.offset = position;      /* current position in the file */
            m1.count = BUF_SIZE;       /* how many bytes to read */
            strcpy(m1.name, src);      /* copy name of file to be read to message */
            send(FILE_SERVER, &m1);    /* send the message to the file server */
            receive(client, &m1);      /* block waiting for the reply */

            /* Write the data just received to the destination file. */
            m1.opcode = WRITE;         /* operation is a write */
            m1.offset = position;      /* current position in the file */
            m1.count = m1.result;      /* how many bytes to write */
            strcpy(m1.name, dst);      /* copy name of file to be written to buf */
            send(FILE_SERVER, &m1);    /* send the message to the file server */
            receive(client, &m1);      /* block waiting for reply */
            position += m1.result;     /* m1.result is number of bytes written */
        } while (m1.result > 0);       /* iterate until done */
        return (m1.result >= 0 ? OK : m1.result);   /* return OK or error code */
    }
64Buying an airline ticket
- How would you design the system?
- A terminal on one end, write a single program on the other end.
- A single program at the agent end. All things are broadcast to everyone.
65Decentralized Architectures
66Horizontal vs. Vertical Distribution
- Previously, we have looked at what is known as vertical distribution. What is it?
- It is taking an application and splitting it vertically, so that logically different layers run on different machines.
- We can also have horizontal distribution. What is that?
- Things like replication and clusters.
67- An example of horizontal distribution of a Web
service.
68- Horizontally distributed servers may talk to each
other.
69Peer-to-Peer
- How does it differ from previous?
- Can all apps be done as P2P?
- Generally, always on an overlay network.
- What is an overlay network?
- An overlay network is a logical network.
- Are neighbors in the overlay network connected by a real link?
- Are nodes that are close in the overlay network close in the physical network?
70Distributed Hash Tables
- Let's say that you have a lot of data items that you want to distribute over a P2P network.
- Assume that for each data object, there is an associated key that is an integer.
- How do you find something? It's on some node out there somewhere.
- Basic operation: map a key to a node.
71Structured P2P DHTs
- Assign keys from a large ID space.
- Assign nodes from the same space.
- A key is found on the lowest ID node greater than the key.
- How to join?
- Generate key
- Lookup
- Collisions?
- 2^128 ≈ 3 × 10^38
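The key-to-node rule can be sketched as a search over a sorted list of node IDs. This is a simplification under stated assumptions: a real DHT node knows only a few other nodes and routes the lookup in several hops, whereas `dht_successor` (a hypothetical name) sees every ID at once. The sketch also assumes that a node whose ID equals the key owns it, and that keys above the highest ID wrap around to the lowest one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* node_ids must be sorted in ascending order and n must be > 0.
 * Returns the lowest node ID that is >= key, wrapping around to the
 * smallest ID when the key is larger than every node ID. */
uint64_t dht_successor(const uint64_t *node_ids, size_t n, uint64_t key)
{
    assert(node_ids != NULL && n > 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {                    /* binary search for first id >= key */
        size_t mid = lo + (hi - lo) / 2;
        if (node_ids[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return node_ids[lo == n ? 0 : lo];   /* wrap around the ID space */
}
```

With nodes 10, 40, and 90, key 25 maps to node 40 and key 95 wraps around to node 10.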
72CAN
- Space is n-D.
- Keys have coordinates.
- Search proceeds by a kind of geometric routing.
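[Figure: the unit square from (0,0) to (1,1) partitioned into six zones, one per node: [.8, 1] × [.8, 1], [0, .4] × [.6, 1], [.8, 1] × [.4, .8], [.4, .8] × [.4, 1], [0, .4] × [0, .6], and [.4, 1] × [0, .4]. One node is labeled as the querying node and one zone as holding the data.]
Ignore the book's diagram on this; it is more or less wrong.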
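73- Node joins by picking a random point, and querying it.
- This will route a message to the node responsible for that point.
- Region is then split in half.
- Neighbor pointers need to be maintained.
[Figure: the zones after a join: [.8, 1] × [.8, 1], [0, .4] × [.6, 1], [.4, .8] × [.4, 1], [.8, 1] × [.4, .8], [.4, 1] × [0, .4], and the former [0, .4] × [0, .6] zone split into [0, .4] × [.3, .6] and [0, .4] × [0, .3].]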
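The join step can be sketched for the two-dimensional case. Everything here is an assumption made for illustration: the `zone` struct, half-open intervals, splitting along the longer side, and giving the upper half to the joining node. The slide only says that the region is split in half.

```c
#include <assert.h>
#include <stdbool.h>

/* A rectangular zone covering [x_lo, x_hi) x [y_lo, y_hi). */
struct zone { double x_lo, x_hi, y_lo, y_hi; };

/* Is the point (x, y) the responsibility of this zone's node? */
bool zone_contains(const struct zone *z, double x, double y)
{
    return x >= z->x_lo && x < z->x_hi && y >= z->y_lo && y < z->y_hi;
}

/* Halve *z along its longer side. *z keeps the lower half; the upper
 * half is returned so it can be handed to the joining node. */
struct zone zone_split(struct zone *z)
{
    assert(z->x_lo < z->x_hi && z->y_lo < z->y_hi);
    struct zone upper = *z;
    if (z->x_hi - z->x_lo >= z->y_hi - z->y_lo) {
        double mid = (z->x_lo + z->x_hi) / 2.0;
        z->x_hi = mid;
        upper.x_lo = mid;
    } else {
        double mid = (z->y_lo + z->y_hi) / 2.0;
        z->y_hi = mid;
        upper.y_lo = mid;
    }
    return upper;
}
```

Splitting a zone that spans 0 to .4 in one dimension and 0 to .6 in the other cuts the longer side at .3, which matches the two half-zones in the figure for this slide. After the split, both nodes would still have to update their neighbor pointers, which the sketch leaves out.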
74- Suppose the node responsible for .4 .8 leaves.
Responsible node leaves
75Unstructured P2P
- Basic idea here is how to construct a random overlay network.
- What's so hard about that? Just pick a random set of nodes, and make them your neighbors.
- But how do you pick a random set of nodes when you don't have a list of the nodes?
76- Basic idea
- Suppose you always maintain a list of c neighbors. The list is supposed to be random.
- Now, periodically, send your list to your neighbors.
- If you receive a list from your neighbor, what should you do?
77Actions by active thread (periodically repeated):
    select a peer P from the current partial view
    if PUSH_MODE
        mybuffer = [(MyAddress, 0)]
        permute partial view
        move H oldest entries to the end
        append first c/2 entries to mybuffer
        send mybuffer to P
    else
        send trigger to P
    if PULL_MODE
        receive P's buffer
    construct a new partial view from the current one and P's buffer
    increment the age of every entry in the new partial view
Actions by passive thread:
    receive buffer from any process Q
    if PULL_MODE
        mybuffer = [(MyAddress, 0)]
        permute partial view
        move H oldest entries to the end
        append first c/2 entries to mybuffer
        send mybuffer to Q
    construct a new partial view from the current one and Q's buffer
    increment the age of every entry in the new partial view
78Hybrid Approaches
- Have a lower layer form random partial views.
- Pass to upper layer which makes a more structured decision.
- Note that the lower layer operates independently.
79- Interesting behaviors.
- Nodes on a grid.
- Each node maintains a list of nearest neighbors, using the Manhattan distance.
- Initially, the links are random.
- Completely different ranking functions can be used, such as those based on semantic distance, to form semantic overlay networks.
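The upper layer's ranking step can be sketched as follows, assuming nodes are identified by integer grid coordinates. The `node` struct and the `keep_nearest` helper are hypothetical. Given the candidates handed up by the random lower layer, the node keeps the k that rank best under Manhattan distance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node { int x, y; };

/* Manhattan (grid) distance between two nodes. */
int manhattan(struct node a, struct node b)
{
    return abs(a.x - b.x) + abs(a.y - b.y);
}

/* Reorders cand[0..n) so that the k candidates nearest to self come
 * first, in ascending order of distance. Returns how many were kept,
 * which is min(k, n). */
size_t keep_nearest(struct node self, struct node *cand, size_t n, size_t k)
{
    assert(cand != NULL || n == 0);
    size_t kept = k < n ? k : n;
    for (size_t i = 0; i < kept; i++) {          /* partial selection sort */
        size_t best = i;
        for (size_t j = i + 1; j < n; j++)
            if (manhattan(self, cand[j]) < manhattan(self, cand[best]))
                best = j;
        struct node tmp = cand[i];
        cand[i] = cand[best];
        cand[best] = tmp;
    }
    return kept;
}
```

Swapping `manhattan` for a different ranking function, such as a semantic distance, changes the shape of the resulting overlay without touching the lower layer.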
80Superpeers
- Having all nodes equal sometimes is problematic.
- Superpeers are special nodes that can serve as
brokers or maintain indices.
81- Superpeers can be static, or selected dynamically from the other peers.
- How do you pick a superpeer? Can use leader election.
82Hybrid Architectures
83Hybrid Edge Server Architecture
- Edge servers connect users to the Internet.
- Can collaborate
84A distributed application based on peer processes
85Collaborative Distributed Systems
- BitTorrent
- How do you get people to give content, instead of just taking content?
- Exchange incentive system
- Trackers are centralized.
- They are also a bottleneck.
86Architecture Versus Middleware
87- We have talked about the physical architecture.
- Does middleware also have an architectural style?
- If it does, how does it affect flexibility, extensibility?
- Sometimes, the native style may not be optimal.
- Can we build messaging over RPC?
- Can we build RPC over messaging?
88Interceptors
- Request-level interceptors could handle replication.
- Message-level interceptors could handle fragmentation.
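A message-level interceptor can be pictured as a function that sits in front of the real send routine and has the same calling shape. The sketch below is hypothetical throughout: `send_fn`, `fragmenting_send`, and the `counting_send` stand-in for the network are invented for illustration and do not correspond to any particular middleware API.

```c
#include <assert.h>
#include <stddef.h>

/* Shape of the next stage in the send path. Returns 0 on success. */
typedef int (*send_fn)(const char *data, size_t len, void *ctx);

struct send_stats { size_t calls; size_t bytes; size_t largest; };

/* Stand-in for the real network send: just records what it was given. */
int counting_send(const char *data, size_t len, void *ctx)
{
    struct send_stats *st = ctx;
    (void)data;
    st->calls++;
    st->bytes += len;
    if (len > st->largest) st->largest = len;
    return 0;
}

/* Interceptor: splits one message into pieces of at most mtu bytes and
 * passes each piece to the next stage. The caller above never sees it. */
int fragmenting_send(const char *data, size_t len, size_t mtu,
                     send_fn next, void *ctx)
{
    assert(data != NULL && next != NULL);
    if (mtu == 0) return -1;
    for (size_t off = 0; off < len; off += mtu) {
        size_t chunk = len - off < mtu ? len - off : mtu;
        int rc = next(data + off, chunk, ctx);
        if (rc != 0) return rc;          /* stop at the first failure */
    }
    return 0;
}
```

Sending a 10-byte message with an `mtu` of 4 results in three calls to the next stage, carrying 4, 4, and 2 bytes.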
89Adaptive Software
- Separation of concerns
- Computational reflection
- Multi-stage compilation
- Component-based design
90Self-Management in Distributed Systems
- Also known as autonomic computing, or self-* systems.
91Feedback Control Model
92Astrolabe
- A hierarchy of zones.
- At the bottom, each host is a zone by itself.
- Higher zones aggregate lower zones.
93- Each upper zone aggregates the lower zones.
- Most interesting part is how to query. An SQL model is adopted. For example, an average:
    SELECT AVG(procs) AS avg_procs FROM hostinfo
- Such a query would be running on a node.
- Information needs to be propagated. Done through gossiping.
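An average does not compose directly: the average of two zone averages is wrong when the zones have different numbers of hosts. One way to aggregate up a hierarchy, sketched here as an assumption and not as a description of how Astrolabe itself does it, is to carry a (sum, count) pair and divide only at the point of the query. The `agg` type and functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Partial aggregate for AVG: enough information to merge correctly. */
struct agg { double sum; size_t count; };

/* Leaf zone: a single host reporting its process count. */
struct agg agg_host(double procs)
{
    return (struct agg){ procs, 1 };
}

/* Parent zone: combine the aggregates of two child zones. */
struct agg agg_merge(struct agg a, struct agg b)
{
    return (struct agg){ a.sum + b.sum, a.count + b.count };
}

/* Final answer to the AVG query at whatever zone it is asked. */
double agg_avg(struct agg a)
{
    assert(a.count > 0);
    return a.sum / (double)a.count;
}
```

For a zone with hosts running 2 and 4 processes and a sibling zone with one host running 9, the merged average is 15 / 3 = 5, whereas averaging the two zone averages (3 and 9) would give 6.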
94Globule
- Pages are read and also written to.
- Origin server, source of web page.
- Replica servers, at the edge.
- Problem, how and when to replicate pages?
95- What happens if you replicate too much? What do you need to do to a replicated web page when it is updated?
- What happens if you replicate too little?
- Solution?
96- First, you need some metrics. Also, you need some decision as to how to weight them.
- cost = (w1 × m1) + (w2 × m2) + ... + (wn × mn)
- Now the question is, how do you minimize this?
- You have various policies of replicating pages at your disposal. You want the best policy.
- One approach is trace-driven.
- Record what happened for a period of time D.
- Test different policies p1, p2, etc., on the recorded traces.
- Use the one that worked the best.
- When you are doing this testing, how long should the trace be?
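The selection step can be sketched as follows, assuming each candidate policy has already been replayed against the recorded trace and has produced one value per metric. The function names and the row-major layout of `metrics` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* cost = w[0]*m[0] + w[1]*m[1] + ... + w[n-1]*m[n-1] */
double weighted_cost(const double *w, const double *m, size_t n)
{
    double cost = 0.0;
    for (size_t i = 0; i < n; i++)
        cost += w[i] * m[i];
    return cost;
}

/* metrics holds npolicies rows of n values each (row p = policy p's
 * measured metrics on the trace). Returns the index of the policy
 * with the lowest weighted cost; ties go to the lower index. */
size_t best_policy(const double *w, const double *metrics,
                   size_t npolicies, size_t n)
{
    assert(npolicies > 0);
    size_t best = 0;
    double best_cost = weighted_cost(w, metrics, n);
    for (size_t p = 1; p < npolicies; p++) {
        double c = weighted_cost(w, metrics + p * n, n);
        if (c < best_cost) {
            best_cost = c;
            best = p;
        }
    }
    return best;
}
```

With weights (1, 2) and three policies measuring (10, 1), (4, 3), and (6, 6), the costs are 12, 10, and 18, so the second policy is chosen. Changing the weights can change the winner, which is why the weighting decision matters as much as the metrics.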