Title: Cougaar Design Case Study: Mandelbrot GUI Application
1. Cougaar Design Case Study: Mandelbrot GUI Application
- Todd Wright
- Feb 5th, 2007
2. Overview
- These slides present ten alternate designs for an example Cougaar application, a Mandelbrot fractal GUI
- Each design explores various tradeoffs
- Design complexity
- Modularity and how the modules (plugins/agents) interact with one another
- Parallel / distributed processing support
- As we go, we'll summarize what we've learned and outline basic design patterns
3. Mandelbrot Application
- Basic idea
- The user submits an image calculation request
- Cougaar application code (plugins and agents) computes the image data
- The image is displayed to the user in a GUI
- The example image is a Mandelbrot fractal
- Given an (x, y) range and image size, e.g.
- Range: (-1.5, -1.0) to (1.5, 1.0)
- Image: 1024 x 768
- Compute the image using the Mandelbrot algorithm
- Simple math
- Entirely compute-bound (possibly network-bound if we make it distributed)
[Diagram: GUI, I/O, Nodes containing Agents containing Plugins]
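The "simple math" behind the image can be sketched as a standard escape-time loop. This is a minimal illustration, not code from the slides: the class name, method names, and the maxIter parameter are assumptions, and the result is one iteration count per pixel rather than a finished image.

```java
// Minimal escape-time sketch; all names here are illustrative assumptions.
final class MandelbrotSketch {
    /** Returns the iteration at which c = (cx, cy) escapes, or maxIter if it stays bounded. */
    static int escapeTime(double cx, double cy, int maxIter) {
        double x = 0.0;
        double y = 0.0;
        int i = 0;
        while (i < maxIter && x * x + y * y <= 4.0) {
            double nx = x * x - y * y + cx; // real part of z^2 + c
            y = 2.0 * x * y + cy;           // imaginary part of z^2 + c
            x = nx;
            i++;
        }
        return i;
    }

    /** Computes one iteration count per pixel, in row-major order. */
    static int[] compute(double minX, double minY, double maxX, double maxY,
                         int width, int height, int maxIter) {
        int[] data = new int[width * height];
        for (int row = 0; row < height; row++) {
            double cy = minY + (maxY - minY) * row / Math.max(1, height - 1);
            for (int col = 0; col < width; col++) {
                double cx = minX + (maxX - minX) * col / Math.max(1, width - 1);
                data[row * width + col] = escapeTime(cx, cy, maxIter);
            }
        }
        return data;
    }
}
```

Every pixel is independent of every other pixel, which is why the work is compute-bound and why the later designs can split it across agents.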
4. Design Comparison Matrix
- For each of the following designs, we'll rank the design on the following scales of 1 to 10, with 10 being ideal
- Simplicity
- How easy is the code to understand?
- Modularity
- Can we easily replace parts of our solution with alternative implementations?
- Scalability
- Can we distribute our solution across multiple hosts?
- Inter-job Parallelism
- Can separate jobs run in parallel?
- Intra-job Parallelism
- Can a single job be subdivided and run in parallel?
- Adaptability
- Can we customize the behavior, e.g. using policies or runtime metrics?
- This will allow us to better see the tradeoffs between the designs.
5. Design 1: Just a Servlet
- Design: Do everything in a self-contained Servlet
- Listens for browser HTTP requests
- Computes image data in the servlet doGet thread
- Writes the image result as a JPG
- Characteristics
- Easy to implement and configure
- Few Cougaar dependencies (no need for a blackboard or other plugins)
- No synchronization or threading issues (runs in the Servlet request thread)
Node
  Servlet
    public void doGet(..)
      read params
      compute image data
      write image as JPG
http
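The three doGet steps can be sketched in plain Java. This is a hedged illustration only: it uses a Map of parameters and a returned byte array in place of the real servlet request and response, and the parameter names, defaults, and grayscale coloring are assumptions rather than anything stated in the slides.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import javax.imageio.ImageIO;

// Hypothetical stand-in for the servlet: no servlet or Cougaar API is used.
final class JustAServlet {
    static byte[] doGet(Map<String, String> params) {
        // 1. read params (parameter names and defaults are assumptions)
        double minX = Double.parseDouble(params.getOrDefault("minX", "-1.5"));
        double minY = Double.parseDouble(params.getOrDefault("minY", "-1.0"));
        double maxX = Double.parseDouble(params.getOrDefault("maxX", "1.5"));
        double maxY = Double.parseDouble(params.getOrDefault("maxY", "1.0"));
        int width = Integer.parseInt(params.getOrDefault("width", "64"));
        int height = Integer.parseInt(params.getOrDefault("height", "48"));
        int maxIter = 255;

        // 2. compute image data in the calling (request) thread
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int row = 0; row < height; row++) {
            double cy = minY + (maxY - minY) * row / Math.max(1, height - 1);
            for (int col = 0; col < width; col++) {
                double cx = minX + (maxX - minX) * col / Math.max(1, width - 1);
                double x = 0.0;
                double y = 0.0;
                int n = 0;
                while (n < maxIter && x * x + y * y <= 4.0) {
                    double nx = x * x - y * y + cx;
                    y = 2.0 * x * y + cy;
                    x = nx;
                    n++;
                }
                int gray = 255 - n; // points that never escape come out black
                image.setRGB(col, row, (gray << 16) | (gray << 8) | gray);
            }
        }

        // 3. write image as JPG
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(image, "jpg", out)) {
                throw new IOException("no JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Everything lives in one method and runs on one thread, which is exactly what makes this design simple and also what makes it hard to reuse or distribute.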
6. Analysis: Design 1
7. Design 2: Servlet UI + Calculator Service
- Design: Move the compute() code out of the Servlet and into a separate Component
- Primarily a refactor of the prior design
- Use a service to advertise the compute() method
- This is a typical solution for wrapping library code
- Characteristics
- Still fairly easy to implement and configure
- Improved modularity
- Can replace UI code while keeping calculator code (e.g. make a popup Swing UI)
- Can replace calculator code while keeping UI code (e.g. compute a different fractal design)
- No threading or synchronization issues (runs in the Servlet request thread)
Node
  Servlet
    public void load()
      calc = getService(Calc..)
    public void doGet(..)
      read params
      calc.compute(..)
      write image as JPG
  Calculator
    public void load()
      advertise(Calc, this)
    public byte[] compute(..)
      compute image data
http
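The advertise / getService interaction can be sketched with a tiny in-memory registry. This is a simplified stand-in written for illustration; it is not Cougaar's actual service broker API, and CalcService, ServiceRegistry, and the compute(..) signature are all assumed names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service API; the slides only name a compute(..) method.
interface CalcService {
    byte[] compute(double minX, double minY, double maxX, double maxY, int width, int height);
}

// Simplified stand-in for a service broker (an assumption, not the Cougaar API).
final class ServiceRegistry {
    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    <T> void advertise(Class<T> type, T impl) { services.put(type, impl); }

    /** Returns the advertised implementation, or null if none was advertised. */
    <T> T getService(Class<T> type) { return type.cast(services.get(type)); }
}

final class Calculator implements CalcService {
    void load(ServiceRegistry registry) { registry.advertise(CalcService.class, this); }

    @Override
    public byte[] compute(double minX, double minY, double maxX, double maxY,
                          int width, int height) {
        byte[] data = new byte[width * height];
        for (int row = 0; row < height; row++) {
            double cy = minY + (maxY - minY) * row / Math.max(1, height - 1);
            for (int col = 0; col < width; col++) {
                double cx = minX + (maxX - minX) * col / Math.max(1, width - 1);
                double x = 0.0;
                double y = 0.0;
                int n = 0;
                while (n < 255 && x * x + y * y <= 4.0) {
                    double nx = x * x - y * y + cx;
                    y = 2.0 * x * y + cy;
                    x = nx;
                    n++;
                }
                data[row * width + col] = (byte) n; // iteration count as an unsigned byte
            }
        }
        return data;
    }
}

final class ServletUi {
    private CalcService calc;

    void load(ServiceRegistry registry) { calc = registry.getService(CalcService.class); }

    // Runs in the caller's thread, like the blocking service call in the slide.
    byte[] doGet(int width, int height) {
        return calc.compute(-1.5, -1.0, 1.5, 1.0, width, height);
    }
}
```

In this sketch the Calculator must be loaded before the ServletUi, otherwise getService returns null. The UI depends only on the CalcService interface, so either side can be swapped independently.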
8. Analysis: Design 2
9. Design Point: Inlined Code vs. Services
- Key design points
- Design 1
- Summary: The plugin directly calls the inlined / library code
- Benefit: Easy to implement, self-contained
- Downside: Difficult to switch between alternate library implementations, awkward to share non-static library instances between plugins
- Design 2
- Summary: One plugin advertises a service, other plugin(s) obtain and use it
- Benefit: Supports shared, pluggable services, cleans up the code
- Downside: Must refactor / wrap library code into service API(s), plus add new plugins to advertise these services
- Example of interest
- Plugin A advertises a WindowManagerService and pops up an empty Swing Panel
- Subsequent plugins obtain this service and add their Swing JComponent panels to the service by calling an add(..) method (instead of popping up their own windows)
- The window manager plugin decides where to place the sub-frames
10. Design 3: Servlet UI + Blackboard-based Calculator Plugin
- Design: Instead of a service, publish the request on the blackboard
- Use non-blocking blackboard operations (pub/sub) instead of a blocking method call
- Characteristics
- The calculate() method runs in a separate plugin thread
- We're using the blackboard as both a communication and thread-switch layer
- We no longer have a simple, blocking calculate() service API
- We now have a blackboard representation of the Job
- Defines our data-oriented API between our plugins
- Other plugins can observe this interaction (e.g. for debugging, management, etc.)
Node
  Agent
    Servlet
      public void doGet(..)
        Job job = new Job(params)
        publishAdd(job)
        job.waitForCompletion()
        write image as JPG
    Calculator
      public void setupSubs()
        subscribe to Jobs
      public void execute()
        for all added Jobs
          compute job
          notify of completion
    Blackboard
      Job
http
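The publishAdd / waitForCompletion handshake can be sketched with a lock-based Job and a queue standing in for the blackboard. This is an assumption-laden illustration: BlockingJob, the queue, and the placeholder "computation" are hypothetical, and a real blackboard delivers subscription changes rather than queue items.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical Job with a lock-based completion signal.
final class BlockingJob {
    final int width;
    final int height;
    private byte[] image; // guarded by this

    BlockingJob(int width, int height) {
        this.width = width;
        this.height = height;
    }

    /** Called by the calculator thread: "notify of completion". */
    synchronized void complete(byte[] data) {
        image = data;
        notifyAll();
    }

    /** Called by the request thread; blocks until complete(..) has run. */
    synchronized byte[] waitForCompletion() {
        try {
            while (image == null) { // loop guards against spurious wakeups
                wait();
            }
            return image;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for job", e);
        }
    }
}

final class BlockingServletSketch {
    static byte[] doGet(int width, int height) {
        // A queue stands in for the blackboard (an assumption for this sketch).
        BlockingQueue<BlockingJob> blackboard = new LinkedBlockingQueue<>();

        // Calculator "plugin": runs in its own thread and handles one added Job.
        Thread calculator = new Thread(() -> {
            try {
                BlockingJob added = blackboard.take();
                // Placeholder for the real image computation.
                added.complete(new byte[added.width * added.height]);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        calculator.setDaemon(true);
        calculator.start();

        BlockingJob job = new BlockingJob(width, height);
        blackboard.add(job);            // corresponds to publishAdd(job)
        return job.waitForCompletion(); // the request thread blocks here
    }
}
```

The request thread does no computing of its own; it only publishes and waits, which is the thread switch the slide describes.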
11. Analysis: Design 3
12. Design Point: Services vs. Blackboards
- Key design points
- Design 2
- Summary: Plugins interact through blocking service method calls
- Benefit: Easy blocking method APIs
- Downside: Method calls run in the caller's thread and are blocking. Use of callbacks to support non-blocking APIs requires awkward thread switching.
- Design 3
- Summary: Plugins interact through asynchronous blackboard pub/sub operations
- Benefit: Non-blocking and parallelized, plugin execute() methods are single-threaded, Job state is visible on the blackboard
- Downside: Must reorganize code to fit the pub/sub execute() pattern. This can introduce bookkeeping state, where a service-based design would keep this state for free on the method-call stack.
- The prior example shows an awkward mixed design
- Servlet doGet() callbacks are blocking and must complete in that thread
- The blackboard is an asynchronous pub/sub interaction
- Hence the odd job.waitForCompletion() solution.
13. Design 4: Non-Blocking UI
- Design: Replace Servlet UI with Plugin Screensaver UI
- The servlet case is odd, in that the doGet(..) request method is a blocking, external Thread call
- As a point of comparison, create a Plugin-based UI client that uses a non-blocking Cougaar thread and standard blackboard pub/sub operations
- Characteristics
- The UI plugin listens for a subscription change instead of a lock notify()
- This is a more standard Cougaar interaction pattern
- This approach isn't applicable for our Servlet-based UI (but might fit a Swing UI)
Node
  Agent
    Requestor
      public void setupSubs()
        subscribe to Jobs
        publishAdd(new Job)
      public void execute(..)
        for all changed Jobs
          write image as JPG (to /tmp/out.jpg)
    Calculator
      public void setupSubs()
        subscribe to Jobs
      public void execute()
        for all added Jobs
          compute image data
          publishChange(job)
    Blackboard
      Job
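The all-blackboard interaction can be sketched with a toy blackboard that records added and changed Jobs and two plugins that react in their execute() methods. This is a single-threaded simulation under stated assumptions: MiniBlackboard and its drain methods are hypothetical and much simpler than real Cougaar subscriptions, and the "image" is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;

final class PubSubJob {
    final int id;
    byte[] image; // filled in by the calculator

    PubSubJob(int id) { this.id = id; }
}

// Toy blackboard (an assumption): tracks added and changed Jobs since the last drain.
final class MiniBlackboard {
    private final List<PubSubJob> added = new ArrayList<>();
    private final List<PubSubJob> changed = new ArrayList<>();

    void publishAdd(PubSubJob job) { added.add(job); }

    void publishChange(PubSubJob job) { changed.add(job); }

    List<PubSubJob> drainAdded() {
        List<PubSubJob> copy = new ArrayList<>(added);
        added.clear();
        return copy;
    }

    List<PubSubJob> drainChanged() {
        List<PubSubJob> copy = new ArrayList<>(changed);
        changed.clear();
        return copy;
    }
}

final class CalculatorPlugin {
    void execute(MiniBlackboard blackboard) {
        for (PubSubJob job : blackboard.drainAdded()) {
            job.image = new byte[] {1, 2, 3}; // placeholder for computed image data
            blackboard.publishChange(job);
        }
    }
}

final class RequestorPlugin {
    final List<Integer> written = new ArrayList<>(); // ids of Jobs whose image was "written"

    void setupSubs(MiniBlackboard blackboard) {
        blackboard.publishAdd(new PubSubJob(1));
    }

    void execute(MiniBlackboard blackboard) {
        for (PubSubJob job : blackboard.drainChanged()) {
            written.add(job.id); // stands in for "write image as JPG"
        }
    }
}
```

Neither plugin ever waits on the other: the requestor publishes and returns, and only acts again when a later execute() call sees the changed Job.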
14. Analysis: Design 4
15. Design Point: Mixed Services/BB vs. All BB
- Key design points
- Design 3
- Summary: Servlet doGet() callback uses awkward lock wait/notify to detect blackboard work completion instead of an asynchronous subscription
- Benefit: Required due to limitations of the blocking Servlet callback API
- Downside: Awkward mixed metaphor of wait/notify and subscription changes
- Design 4
- Summary: All interaction is through blackboard pub/sub operations
- Benefit: Easy integration via subscriptions, completely asynchronous
- Downside: Not applicable in the Servlet case.
- Most applications fit entirely into the blackboard-friendly pub/sub pattern
- The design often gets awkward when plugins must interact with both blocking/callback services and blackboard pub/sub operations
- Typically results in awkward todo lists to switch threads
- Ideally this can be avoided
16. Design 5: Separate Job/Result Objects
- Design: Instead of changing the Job, publish a separate Result object
- This makes it clear that the result is a separate data structure
- We'll assume that we're using the non-servlet Requestor, as in Design 4
- Characteristics
- The subscriptions now look for different data structures (notice that arrows are one-way)
- The Result object should have a pointer to the Job, or have a shared unique job identifier
Node
  Agent
    Requestor
      public void setupSubs()
        subscribe to Results
        publishAdd(new Job)
      public void execute(..)
        for all added Results
          write image as JPG (to /tmp/out.jpg)
    Calculator
      public void setupSubs()
        subscribe to Jobs
      public void execute()
        for all added Jobs
          compute image data
          publishAdd(new Result)
    Blackboard
      Job
      Result
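The "shared unique job identifier" option can be sketched as two immutable objects plus a small lookup on the requestor side. All names here (JobRequest, JobResult, ResultMatcher) are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical request object; published once and never modified.
final class JobRequest {
    final String jobId;
    final int width;
    final int height;

    JobRequest(String jobId, int width, int height) {
        this.jobId = jobId;
        this.width = width;
        this.height = height;
    }
}

// Hypothetical result object; refers back to its Job by identifier.
final class JobResult {
    final String jobId;
    final byte[] image;

    JobResult(String jobId, byte[] image) {
        this.jobId = jobId;
        this.image = image;
    }
}

// Requestor-side bookkeeping: matches each added Result to its pending Job.
final class ResultMatcher {
    private final Map<String, JobRequest> pending = new HashMap<>();

    void jobPublished(JobRequest job) { pending.put(job.jobId, job); }

    /** Returns the Job this Result answers, or null if no such Job is pending. */
    JobRequest resultAdded(JobResult result) { return pending.remove(result.jobId); }
}
```

Because each object has a single writer, the Calculator never touches the Job and the Requestor never touches the Result, which is the one-way arrow property noted above.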
17. Analysis: Design 5
18. Design Point: Separate Results Object
- Key design points
- Design 4
- Summary: Job has a field for results data
- Benefit: Fewer blackboard objects
- Downside: Multiple writers to the same object, to fill in the result slot
- Design 5
- Summary: Calculator publishes a separate Results object
- Benefit: Finer-grain subscriptions, publishAdd driven
- Downside: More blackboard objects
- This is more a matter of style
19. Design 6: Remote Processing
- Design: Transfer the job to a remote agent
- Wrap the job in a relay
- We'll assume that the master knows a priori about the single slave
- Characteristics
- Can run the slave on a remote host (supports remote processing)
- Adds a layer of Relay wrapping and processing code to do our data transfer
- Must transfer both the Job and its result data (two-way comms instead of shared memory)
Node 1
  Master Agent
    Servlet (http)
    Blackboard
      Relay
        Job
Node 2
  Slave Agent
    Calculator
    Blackboard
      Relay copy
        Job
Messages (between Node 1 and Node 2)
20. Analysis: Design 6
21. Design Point: Centralized vs. Distributed
- Key design points
- Design 5
- Summary: Single agent with shared blackboard
- Benefit: Plugins can assume that everything is on their local blackboard
- Downside: Limited to a single host
- Design 6
- Summary: Wrap job in Relay, transfer to remote agent for processing
- Benefit: Distributed, partitions work and memory across hosts
- Downside: Clutters plugin code with Relay wrapping and addressing. No longer a shared memory, so Relays must transfer data back and forth.
- Relays (or a similar mechanism) are used to transfer data between blackboards
- Required because agents don't support shared-memory blackboards
- Anytime you make something distributed you run into well-known distributed processing limitations (latency, robustness, etc.)
- The next design separates the Relay wrapping/addressing from the non-transfer-related plugin work
22. Design 7: Remote Processing with Dispatcher
- Design: Introduce the concept of a Dispatcher Plugin
- Separates Servlet/Calculator code from remote transfer code
- Still use relays to transfer jobs (an equivalent option is to use task/allocation)
- Characteristics
- Can implement different kinds of dispatch policies as pluggable Dispatchers
- Adds job management control in the Dispatcher code
- One more layer of thread-switching indirection (but that's often a good thing)
Node 1
  Master Agent
    Servlet (http)
    Dispatcher
    Blackboard
      Job
      Relay
Node 2
  Slave Agent
    Receiver
    Calculator
    Blackboard
      Relay copy
      Job
Messages (between Node 1 and Node 2)
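The separation between domain code and transfer code can be sketched with a pluggable Dispatcher interface that owns the wrapping and addressing. This is an illustrative sketch: RelayEnvelope is a hypothetical stand-in for a Relay (it is not the Cougaar Relay interface), and the agent names and outbox list are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Relay: content plus source and target agent names.
final class RelayEnvelope<T> {
    final String source;
    final String target;
    final T content;

    RelayEnvelope(String source, String target, T content) {
        this.source = source;
        this.target = target;
        this.content = content;
    }
}

// Pluggable dispatch strategy: decides where a Job goes and wraps it for transfer.
interface Dispatcher {
    RelayEnvelope<String> dispatch(String jobId);
}

// Design 6/7 behavior: the master knows a single slave a priori.
final class FixedTargetDispatcher implements Dispatcher {
    private final String self;
    private final String slave;

    FixedTargetDispatcher(String self, String slave) {
        this.self = self;
        this.slave = slave;
    }

    @Override
    public RelayEnvelope<String> dispatch(String jobId) {
        return new RelayEnvelope<>(self, slave, jobId);
    }
}

// Domain side: publishes Jobs and never mentions addressing.
final class MasterSide {
    final List<RelayEnvelope<String>> outbox = new ArrayList<>();
    private final Dispatcher dispatcher;

    MasterSide(Dispatcher dispatcher) { this.dispatcher = dispatcher; }

    void submit(String jobId) { outbox.add(dispatcher.dispatch(jobId)); }
}
```

Swapping FixedTargetDispatcher for another Dispatcher implementation changes where Jobs go without touching MasterSide, which is the modularity gain this design is after.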
23. Analysis: Design 7
24. Design Point: Use of Dispatcher Plugins
- Key design points
- Design 6
- Summary: Domain plugins do Relay wrapping and addressing
- Benefit: Fewer plugins
- Downside: Clutters domain code, difficult to enhance
- Design 7
- Summary: Introduce Dispatcher / Receiver plugins to handle Relay details
- Benefit: Cleans up design, supports pluggable dispatch options
- Downside: Adds more indirection, more objects on the blackboard.
- The Dispatcher design is often a good idea, except in trivial cases where the added flexibility would be overkill.
25. Design 8: Load-balancing
- Design: Support multiple worker agents
- Dispatcher can choose between slaves
- A job can be sent to any slave
- This allows us to balance work between our slaves
- Allow multiple, dynamic slaves
- Slaves register with the master agent via a Relay
- Slave pulls down a job, replies with results, and pulls the next job
- Add concept of separate relays for master-to-slave vs. slave-to-master comms
- Slave sends registration and results via its relay
- Master sends new jobs via its relay
- Creates more of a unified comms channel, for better error processing
- Characteristics
- Can balance jobs between slaves (if we have more jobs than slaves)
- Ideally one agent per CPU, distributed across hosts according to per-host CPU count
- If we only have one job then this doesn't help, since (in this design) we can't reduce jobs into smaller tasks
- Simple configuration via slave registration, instead of hard-coding slave names in the master
- More adaptive: we can dynamically support added/removed slaves
Illustration on next slide.
26. Design 8: Load-balancing (2)
Node 0
  Master Agent
    Servlet (http)
    Dispatcher
    Blackboard
      Job A
      Job B
      Relays: from Slave1, from Slave2, to Slave1, to Slave2
Node 1
  Slave1 Agent
    Receiver
    Calculator
    Blackboard
      Job A
      Relays: to Master, from Master
Node 2
  Slave2 Agent
    Receiver
    Calculator
    Blackboard
      Job B
      Relays: to Master, from Master
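The register / pull / reply cycle can be sketched as dispatcher-side bookkeeping: a backlog of Jobs, a queue of idle slaves, and a map of work in progress. This is a simplified illustration with assumed names; it models only the assignment logic and leaves out the Relays that would carry the messages.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical dispatcher bookkeeping for the "allocate to next available slave" rule.
final class PullDispatcher {
    private final Queue<String> backlog = new ArrayDeque<>();
    private final Queue<String> idleSlaves = new ArrayDeque<>();
    private final Map<String, String> inProgress = new HashMap<>(); // jobId -> slave name

    /** A new Job arrives from the UI. */
    void submit(String jobId) {
        backlog.add(jobId);
        assign();
    }

    /** A slave announces itself; no slave names are hard-coded in the master. */
    void register(String slave) {
        idleSlaves.add(slave);
        assign();
    }

    /** A slave replies with a result and thereby becomes free for the next Job. */
    void resultReceived(String jobId) {
        String slave = inProgress.remove(jobId);
        if (slave != null) {
            idleSlaves.add(slave);
            assign();
        }
    }

    /** Returns the slave working on the Job, or null if it is still in the backlog. */
    String slaveFor(String jobId) { return inProgress.get(jobId); }

    private void assign() {
        while (!backlog.isEmpty() && !idleSlaves.isEmpty()) {
            inProgress.put(backlog.remove(), idleSlaves.remove());
        }
    }
}
```

With two registered slaves and three submitted Jobs, the third Job waits in the backlog until one slave reports a result. With a single Job, one slave is busy and the rest sit idle, which is the granularity limit this design cannot get around.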
27. Analysis: Design 8
28. Design Point: Single vs. Load-balanced
- Key design points
- Design 7
- Summary: Work is offloaded to a single remote worker
- Benefit: Offloads work, relatively simple design
- Downside: Only computes one job at a time, only supports a single worker
- Design 8
- Summary: Work is dispatched to one of many workers
- Benefit: Load-balances work, supports an arbitrary number of workers
- Downside: More complex design, must choose which slave to send work to. Parallelism is limited to our job backlog.
- The load-balanced solution is a general-purpose, parallelized grid computer
- However, we're still limited by the granularity of our Jobs.
29. Design 9: Fine-Grained Parallel Processing
- Design: Divide the job into subtasks, allocate tasks to remote agents
- Add concept of Job-to-Task decomposition
- New Expander plugin decomposes Job into Tasks
- These Tasks are published on the blackboard
- Expander detects when all tasks have been completed, aggregates the result, and completes the job
- Can divide our Job into an arbitrary number of Tasks, but ideally this is guided by the Dispatcher's knowledge of how many slaves we have
- Characteristics
- Maximum parallelism.
- We can split a single Job across an arbitrary number of slaves
- We are no longer limited by our Job backlog
- Note that a complex Job representation is required to support Task decomposition and incremental result updates
Illustration on next slide.
30. Design 9: Fine-Grained Parallel Processing (2)
Node 0
  Master Agent
    Servlet (http)
    Expander
    Dispatcher
    Blackboard
      Job
      Task 0, Task 1, ..., Task N
      Relays: from Slave1, from Slave2, to Slave1, to Slave2
Node 1
  Slave1 Agent
    Receiver
    Calculator
    Blackboard
      Task
      Relays: to Master, from Master
Node 2
  Slave2 Agent
    Receiver
    Calculator
    Blackboard
      Task
      Relays: to Master, from Master
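One possible Job-to-Task decomposition is to split the image into horizontal bands of rows and reassemble the bands as they complete. The slides do not say how the Expander divides a Job, so the row-band scheme and all names below (RowBand, Expander.expand, Aggregator) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Task: a contiguous band of image rows.
final class RowBand {
    final int firstRow;
    final int rowCount;

    RowBand(int firstRow, int rowCount) {
        this.firstRow = firstRow;
        this.rowCount = rowCount;
    }
}

final class Expander {
    /** Splits height rows into at most taskCount bands of nearly equal size. */
    static List<RowBand> expand(int height, int taskCount) {
        List<RowBand> tasks = new ArrayList<>();
        int base = height / taskCount;
        int extra = height % taskCount; // the first 'extra' bands get one more row
        int row = 0;
        for (int i = 0; i < taskCount; i++) {
            int rows = base + (i < extra ? 1 : 0);
            if (rows > 0) {
                tasks.add(new RowBand(row, rows));
            }
            row += rows;
        }
        return tasks;
    }
}

// Tracks incremental results and detects when the whole Job is complete.
final class Aggregator {
    private final byte[] image;
    private final int width;
    private final int height;
    private int rowsDone;

    Aggregator(int width, int height) {
        this.width = width;
        this.height = height;
        this.image = new byte[width * height];
    }

    /** Copies one band's pixels into place; returns true once every row has arrived. */
    boolean taskCompleted(RowBand band, byte[] pixels) {
        System.arraycopy(pixels, 0, image, band.firstRow * width, band.rowCount * width);
        rowsDone += band.rowCount;
        return rowsDone == height;
    }

    byte[] image() { return image; }
}
```

Bands may complete in any order, so the Aggregator counts rows rather than relying on arrival order. A natural choice for taskCount is the number of registered slaves, in line with the Dispatcher guidance above.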
31. Analysis: Design 9
32. Design Point: Load-balanced vs. Parallel
- Key design points
- Design 8
- Summary: Entire jobs are load-balanced between workers
- Benefit: Offloads work, relatively simple design
- Downside: No intra-job parallelism (but separate jobs may run in parallel)
- Design 9
- Summary: Uses task decomposition and balanced remote task allocation
- Benefit: Highly parallelized, can parallelize a single job across multiple workers
- Downside: More complex design, must track and re-assemble subtask results. Only works if the job can be decomposed into independent, parallelizable subtasks.
- The primary tradeoff in this case is design complexity.
- This also assumes that we can decompose our Jobs into arbitrarily small subtasks, which is not true for all applications.
33. Design 10: Support Dispatch Policies
- All the prior designs featured hard-coded behaviors
- Hard-coded or parameterized list of slave agents
- Simple allocation rule: allocate to the next available slave
- As an enhancement, we could modify our plugins to support more complex, policy-based behaviors.
- Example Policies
- Time out calculations and re-allocate to an alternate slave
- Send the same task to multiple slaves, to reduce latency and add fail-over
- Send multiple outstanding subtasks per slave, to reduce network latency effects (i.e. keep working while the results are being sent on the wire)
- Allocate according to slave host metadata (e.g. CPU speed, network latency, scheduling relative to other work)
- All of the above examples illustrate QoS adaptation
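Two of the example policies can be sketched as small pluggable objects: a slave-selection policy driven by host metadata, and a timeout rule that flags work for re-allocation. The interface, the metadata fields, and the thresholds are all hypothetical; the slides do not define a policy API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical slave host metadata.
final class SlaveInfo {
    final String name;
    final double cpuGhz;
    final long latencyMillis;

    SlaveInfo(String name, double cpuGhz, long latencyMillis) {
        this.name = name;
        this.cpuGhz = cpuGhz;
        this.latencyMillis = latencyMillis;
    }
}

// Pluggable allocation policy: picks a slave from those currently available.
interface DispatchPolicy {
    Optional<SlaveInfo> choose(List<SlaveInfo> available);
}

// The hard-coded rule from the earlier designs: next available slave.
final class NextAvailablePolicy implements DispatchPolicy {
    @Override
    public Optional<SlaveInfo> choose(List<SlaveInfo> available) {
        return available.stream().findFirst();
    }
}

// Metadata-driven alternative: prefer the fastest CPU.
final class FastestCpuPolicy implements DispatchPolicy {
    @Override
    public Optional<SlaveInfo> choose(List<SlaveInfo> available) {
        return available.stream().max(Comparator.comparingDouble(s -> s.cpuGhz));
    }
}

// Timeout rule: decides when an outstanding calculation should go to an alternate slave.
final class TimeoutRule {
    private final long timeoutMillis;

    TimeoutRule(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    boolean shouldReallocate(long sentAtMillis, long nowMillis) {
        return nowMillis - sentAtMillis > timeoutMillis;
    }
}
```

A Dispatcher that takes a DispatchPolicy as a parameter can switch between these behaviors at configuration time, or at runtime in response to metrics.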
34. Analysis: Design 10
35. Design Point: Hard-coded Behavior vs. Policies
- Key design points
- Design 9 (and prior)
- Summary: Behavior is hard-coded or only supports trivial parameterization.
- Benefit: Good enough for most applications.
- Downside: Inflexible behavior.
- Design 10
- Summary: Add plugin behavior options controlled through policies
- Benefit: Pluggable / adaptive behavior
- Downside: More complex to implement.
- The introduction of policies and behavior options allows for a smarter application.
36. Conclusions
37. Design Analysis Summary
38. Conclusions
- The prior slides showed many different ways to build the same application but with different system properties
- The first couple of designs are relatively simple
- Subsequent designs supported parallelism but are more complex
- Each design is valid and ideal in certain environments
- Each split of code/data introduces design complexity
- Splitting a plugin into multiple plugins requires data coordination between the plugins, requiring
- Coordination API (either a service or blackboard pub/sub)
- Data structures (must be internally synchronized)
- Splitting data across agents requires data partitioning and transfer code
- Must decide which data resides on which agent(s)
- Must transfer the data, typically via the blackboard (e.g. Relays)
39. Conclusions (2)
- Service-based APIs are useful in limited cases
- Ideal for wrapping simple libraries (e.g. log4j)
- Should be non-blocking and not require blackboard access
- Don't block pooled threads
- Blackboard access requires a thread switch, otherwise you'll get blackboard nested-transaction problems
- See the todo pattern and other (awkward) workarounds
- In contrast, blackboard interactions are non-blocking
- This is good in that it switches threads and avoids blocking the plugin when performing remote I/O, which increases parallelism
- It's bad in that the plugin code must support an asynchronous call and a subsequent execute() method resume when the result is published
- The result is sometimes added bookkeeping state in the plugin, to remember where prior async calls left off. This is effectively a continuation.