Title: Linda: A Dataspace Approach to Parallel Programming
1Linda A Data-space Approach to Parallel
Programming
- CSE60771 Distributed Systems
- David Moore
2Problem How to ImplementParallel Algorithms
Naturally
- Classic approaches to parallel programming are
clunky and difficult to use for many tasks - Logically concurrent algorithms benefit immensely
from simultaneous threads of computation - Distributed systems seek the utility of
interconnected, yet independent subsystems - Linda simplifies the task of writing parallel
programs
3Common ApproachesRemote Procedure Calls
Synchronous RPC
Time
Time
Remote Call
Call Serviced
Blocking
Control Returned
- A process blocks to execute a function on a
remote machine
4Common ApproachesRemote Procedure Calls
Asynchronous RPC
Time
Remote Call
Call Serviced
Executes other things
Interrupted with complete signal
- A process makes a remote function call, but
continues execution until interrupted by the
callee
5Common ApproachesMessage Passing
Time
Information Passed
- Systems may send, check, and wait for messages
from specified peers
6A New Approach
- RPC and Message-passing are limited
- One to one communication
- RPC control transfers
- Message-passing data is transient
- Linda provides a metaphor that is parallel in
nature, multi-node in scope, does not relinquish
the processor, and preserves data. - Its spirit is a shared data-space
7Overview of Linda
- Provides shared-memory style interface
- Extends existing programming languages
- Adds 3 commands in, out, and read
- Alters syntax slightly, hence requiring language
changes, not just a library addition - Implemented extensions include C and Fortran
8System Layout
- Linda provides a network accessible Tuple
Space. - Any node in the network may add, read, and remove
information to and from the shared space in the
form of tuples. - Done via output, input, and read commands.
- A tuple contains a key, and one or more fields of
data. - This is an example of a data-space architecture.
9System Layout Diagram
Time
Time
output tuple A
input tuple A
output tuple B
input tuple B
- Imagine a bucket from which everyone may add and
remove slips of paper with messages
10Out() CommandWriting a Tuple to the TS
- out(logical name, data1 , data2, ...)
- // If int a, char b, and long c are defined
- out(a, b, c) //Add a tuple to the TS named to
a's value, containing the information in b and c - out('A', b, 7) //Add a tuple to the TS with 'A'
as its name, containing b's value and the value 7 - No restriction on multiple tuples with the same
name or data - All arguments must be of unambiguous type
- For every tuple field, both value and type are
stored - If both int b and char b existed, we must make a
distinction
11In() CommandRetrieving a Tuple from the TS
- in(logical name, ltargument listgt)
- Valid arguments
- type and target variable if a tuple matching
the logical name and with a field in this
position with this type exists, that field's
value is put in the target variable - matching variable or constant restricts the set
of tuples considered to be read to those whose
value in this field matches the value provided - If there is more than one match, selection is
arbitrary - This command removes from the TS any tuple it
returns. - This command suspends until a matching tuple if
found - A type and variable name as an argument violate
standard C structure. The is the reason it
cannot be library-implemented.
12In() CommandExamples
- If these tuples are in the TS
- 'P', 5, 8
- 'P', 4, 7
- 'P', 'S'
- Valid commands and results
- in('P', int a, int b) //either a5, b8, or
(arbitrarily) a4, b7 - in('P', int a, 8) //a5
- in('P', char c) //c 'S'
- in('P', char c, int a) //block until a (char,
int) tuple is added to the TS
13Read() CommandExamining a Tuple in the TS
- in(logical name, ltargument listgt)
- Same arguments as in()
- Same results, except the tuple is not removed
from the Tuple Space.
14Implementation
- Each node in the network maintains a copy of the
TS. - out() and read() the simple cases
- out()
- Transmit to all nodes the tuple to be added
- Each node adds it to their local TS cache
- Constant time operation assuming a multi-cast
command - read()
- Examine the local TS cache
- Find any tuple that matches the specified types,
and possibly specified values if given. - Return its data.
- Order O(tuple_count) operation
- Searches and tuples are complex, hence keying,
hashing, and sorting cannot improve search time
15Implementation
- in()
- in(), while similar to read(), requires a
node-wide deletion - First search the local TS cache for a matching
tuple - If not found, block until one is added
- If found, contact the tuple's originator
- Obtain permission from that node to delete
- This prevents two nodes from performing
simultaneous in()'s on the same tuple - Inform all nodes of the deletion
- Return the data
- Order O(tuple_count) operation
16Example 1 A Matrix Multiplier
- Three Component Processes
- Initialization Process Loads all input data
into tuples by row or column in the TS. Spawns
multiple workers and a cleanup process. - Worker Process
- in()'s a position tuple indicating the next
un-calculated cell in the solution matrix. - Stores its value, increments it, and replaces it
into the TS. - Calculates the data for that cell, outputting the
results to the TS. - Cleanup Process Attempts to read in all the
cells in the solution matrix, repeatedly blocking
until all cells are obtained.
17Example 2 A Web Server
- Initializer Process Contacted by clients
requesting files. Creates tuples containing the
client's information and requested files. - Worker Process Examines the TS for requests for
files it is responsible for. Removes tuples
fulfilling this requirement, and services the
client requesting those files. - This would allow the addition of new files
seamlessly, as new workers to handle the new
files could be added without other system
modifications. - To achieve this without changing the initializer,
a clean-up process that deleted requests for
non-existent files would be needed, and modified
when new files became valid requests.
18Hard Cases
- This was implemented on ATT's S/Net bus network
it was extremely reliable and reportedly never
lost any information. Today's networks make no
such guarantee, yet Linda makes no allowance for
lost information. - All nodes require communication preprocessors to
avoid interruption of their execution when tuples
enter and leave the TS. - Large applications will create enormous TS's,
possibly overloading network traffic and
exhausting local storage on some nodes. - The failure or shutdown of a node prevents any
tuples it added from being removed.
19Hard Cases S/Net Specifics
Node 1
Node 2
Node 3
Node 4
Node 5
VAX Supercomputer
- Collection of up to 64 disjoint inexpensive nodes
- Connected by a fast, word-parallel broadcast bus
- The addition of new nodes is simple
- May have a super-computer node, but not required
- Perfect reliability
- If input buffer is full on a node and it is
forced to dump a message, it turns on a bus-wide
negative acknowledgment signal, all transmitting
nodes should repeat their messages until the
signal ceases
20Results Linda's Performance
- Hard Numbers Linda was capable of 720 in-out
pairs executed per second. - Effective performance As the number of workers
approaches the number of job sections, the time
to complete the total task approaches the time
for one worker to complete one job section. - System overhead is generally minimal in
comparison to job time
21Linda's Performance andGranularity of Parallelism
- If a job has 10 segments and 5 workers it takes
as long with 9 workers. - If we subdivide the job further to avoid this
problem, we incur additional overhead.
22Overview
- New approach to distributed computing
- Shared data-space architecture
- Assumes perfect reliability
- Difficulty scaling to large data sets
23Discussion
- What common tasks would be suited to this
approach? - What are the effects and varieties of failure in
the base Linda implementation? - How would Linda be implemented over the Internet?
- What semantics of failure would this
implementation present?