Title: Cyber Entity Directory Service
1Cyber Entity Directory Service
2Goal
- To provide a directory service for rapid location
of Cyber Entities within the Bionet architecture.
- Tasks
- Prove the directory is scalable.
- Prove the directory is efficient.
- Prove the directory is fault tolerant.
3Motivation
- In BIONET a CE may only maintain a finite number
of relations.
- These relationships stabilize based upon
similarity and usefulness.
- If a CE attempts to locate another CE that is not
closely related or directly useful, the search may
take a long time.
- What if a CE application provided assistance to
the search by organizing a distributed directory?
- The CE directory provides a fast lookup service
for other Cyber Entities in the hope of making
discovery more efficient.
- Note: Purpose in Bionet: reduces the amount of
stabilization time in a network because CEs have
an ordered lookup directory. This eliminates the
unbounded search.
4Motivation (cont.)
- What if the network is highly dynamic and is
constantly undergoing stabilization?
- Suppose a network is designed in such a fashion
that CEs are born and die frequently.
- These CEs may not have enough time to establish
meaningful relationships.
- The network may be slow because too many CEs are
searching too much of the network (the relationships
developed are not efficient enough).
- Ex: a PDA / cellphone network with PDAs and
cellphones constantly coming on and off of the
network.
- The directory could allow direct access to
necessary application CEs, thus facilitating
stability.
5Proposed Purpose
- To provide a mechanism for fast lookup of CEs
based on a CE's published keyword.
- CEs will not have the capacity to publish an
entire list of keywords. This facilitates
modular design. Each CE should have a single
purpose. That purpose is described by the
keyword.
- Allowing CEs to use multiple keywords would make
the directory load far more demanding.
- The mechanism is NOT the standard query method.
It is meant only to enhance the relationship
method.
- Suppose it takes 30 hops to find something in the
directory while it takes 4 hops to find the same
CE through a relationship you already have.
Clearly the relational query is faster.
- Suppose it takes 30 hops to find something in the
directory but 100 hops to find a CE you are
not directly related to (or do not have links
established to). Clearly it is better to use
the directory.
- In the Bionet, CEs must make their own decisions
on how to use the directory. The choice of when
to use it is up to the CE designer. However,
policies should be developed to guide the
designer towards the most efficient use.
- Use the directory to look up CEs not closely
related to currently established relationships.
- Use currently established relationships as a
cache mechanism.
- Use the directory at CE birth to establish
relationships.
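The hop comparison above can be sketched as a small decision rule. This is only an illustration: the function name `choose_lookup` and the idea that a CE has hop estimates available are assumptions, not part of the proposal.

```python
from typing import Optional


def choose_lookup(directory_hops: int, relationship_hops: Optional[int]) -> str:
    """Pick the cheaper lookup path for one query (hypothetical policy).

    relationship_hops is None when the CE has no estimate for reaching the
    target through its existing relationships.
    """
    if relationship_hops is None or directory_hops < relationship_hops:
        return "directory"
    # Ties favour the relationship method, which stays the standard query path.
    return "relationship"


print(choose_lookup(30, 4))    # closely related target -> relationship
print(choose_lookup(30, 100))  # distant target -> directory
```

With the numbers from the slide, a target 4 hops away through relationships is queried relationally, while a target 100 hops away goes through the 30-hop directory.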
6Approach
- The directory is composed entirely of the CEs
that choose to be a part of the directory.
- The choice is made by the CE designer.
- In order to be a part of the directory, a CE must
inherit properties from a special CE class.
- The system has 3 types of CEs.
- The base level CE that contains methods and
properties to maintain the database. The other 2
CEs inherit from this CE/class.
- Daemon CEs that facilitate communication across
platforms. (Daemon CEs implement the base level
directory CE.)
- Application level CEs. (Application level CEs
implement the base level directory CE.)
7Approach
- How it works
- All CEs in the system have special relationships.
These relationships are permanent.
- Left Child
- Right Child
- Parent
- When a platform is started, it connects to other
platforms. A bootstrap process establishes a
Daemon directory CE on the platform. Only one
instance of a Daemon may run on a platform at a
time. The Daemon may not die.
- Daemon CEs work to set up and maintain a distributed
tree structure. The concept must be thought of
recursively.
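A minimal sketch of the three permanent relationships, assuming each directory member can be modelled as a plain object. The class name `DirectoryNode` and the `is_daemon` flag are hypothetical; the slides only name the left child, right child, and parent relationships.

```python
from dataclasses import dataclass
from typing import Optional


# eq=False keeps identity comparison, which avoids recursing through the
# parent/child cycle when two nodes are compared.
@dataclass(eq=False)
class DirectoryNode:
    keyword: str                      # the single published keyword
    is_daemon: bool = False           # assumed flag: Daemon CE vs. application CE
    left: Optional["DirectoryNode"] = None
    right: Optional["DirectoryNode"] = None
    parent: Optional["DirectoryNode"] = None

    def relationships(self):
        """Return the (at most three) directory relationships that are set."""
        links = {"left": self.left, "right": self.right, "parent": self.parent}
        return {name: node for name, node in links.items() if node is not None}


top = DirectoryNode("daemon-0", is_daemon=True)
child = DirectoryNode("calendar")
top.left, child.parent = child, top
print(sorted(top.relationships()), sorted(child.relationships()))
```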
8Approach
- If a distributed tree exists, then new Daemons /
platforms insert themselves into the tree by
contacting a Daemon on the platform the new
Daemon migrated from.
- CEs in the directory are ordered by a single
published keyword.
- They insert themselves in binary tree fashion.
Since each CE in the directory has a left, right,
and parent relationship, it can search for the
location where the new CE belongs.
- When the location is found, the CE is inserted at
that spot, and a balancing algorithm progresses
up to balance the tree.
- If no tree exists (the first instance of the
directory structure), then create a new tree with
the current Daemon as the top of the tree.
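The insertion steps above can be sketched as an ordinary binary search tree insert keyed on the keyword. This is a local, single-process illustration under stated assumptions: the names `Node`, `insert`, `search`, and `keywords_in_order` are hypothetical, messages between platforms are replaced by direct references, and the balancing pass that follows an insert is left out.

```python
class Node:
    """A directory member ordered by its single published keyword."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.left = None
        self.right = None
        self.parent = None


def insert(root, new):
    """Insert `new` below `root` in binary-search-tree order; return the root."""
    if root is None:
        return new  # no tree exists yet: the new node becomes the top
    current = root
    while True:
        side = "left" if new.keyword < current.keyword else "right"
        child = getattr(current, side)
        if child is None:
            setattr(current, side, new)
            new.parent = current
            return root
        current = child


def search(root, keyword):
    """Return (node, hops); node is None if the keyword is not present."""
    current, hops = root, 0
    while current is not None and current.keyword != keyword:
        current = current.left if keyword < current.keyword else current.right
        hops += 1
    return current, hops


def keywords_in_order(node):
    """List keywords by in-order traversal (sorted if the tree is valid)."""
    if node is None:
        return []
    return keywords_in_order(node.left) + [node.keyword] + keywords_in_order(node.right)


root = None
for kw in ["mail", "calendar", "printer", "auth", "dns"]:
    root = insert(root, Node(kw))
print(keywords_in_order(root))
```

Without the balancing step the depth depends on insertion order, which is why the slides pair insertion with a balancing algorithm.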
9Approach
- When the Daemon has been established on a
platform, CEs that implement the base level
directory CE may now insert themselves into the
directory. They do so in the same fashion that
the Daemon CEs did, as described above, by
contacting the local Daemon.
- When a CE is dying, it sends a message to its
children and parent. The message tells the
parent and children to coordinate in such a
fashion as to replace the dying node (i.e., the
left child moves up). At this point a
rebalancing occurs.
- When a CE migrates, it sends a message to its
children and parent. The message tells the
parent and children the new location that the CE
is migrating to.
10Approach
- Tree balancing.
- Tree balancing is needed to handle the following
cases:
- Host failure: When a host goes down, the tree
cannot fall apart. Sub trees must realize that
there has been a host failure and merge with the
main tree. The main tree can be found using the
Daemon nodes.
- Multiple directory locations: If a failure occurs
in such a fashion that the Bionet splits and
multiple directories reestablish themselves, the
directories must be able to merge themselves
together efficiently.
- Insertion / Deletion: When insertion and deletion
happen, there needs to be a weight balance
algorithm that ensures the tree has O(log(n))
depth.
- Unexpected CE death: If a CE dies for some reason
without completing its death state, the sub trees
of the CE must be merged back into the tree
(similar to a host failure).
11Approach
- Tree balancing should guarantee some type of
bound on the time for:
- Insertion
- Deletion
- Search
- Merging
- The bound can either be rigid or amortized.
12Approach
- Tree synchronization and locking
- Due to the balancing and merging functionality,
certain locking procedures must be in place.
- What if a CE is searching the tree while a local
node is rebalancing? The CE may take the wrong
path down the tree.
- What if 2 sub trees try to merge at the same
point in the tree?
13Approach
- I am currently looking at algorithms for
maintaining parallel trees, mergeable trees, and
amortized trees. The balancing algorithm that
the directory tree implicitly maintains must be
designed by drawing on these fields.
14Approach (cont.)
- Distribute the CEs over the network.
- The CEs are now free to behave like all other
CEs. They can migrate over the network as much as
they want because each migration requires only 3
updates to the tree (left child, right child,
parent). Each update is done in constant time
because a direct relationship is maintained.
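A minimal sketch of why a migration touches at most three nodes, assuming each neighbour simply records the platform it last heard for the CE. The names `CE`, `link`, `migrate`, and `known_platform` are hypothetical, keywords are assumed unique, and direct references stand in for the update messages.

```python
class CE:
    def __init__(self, keyword, platform):
        self.keyword = keyword
        self.platform = platform
        self.left = self.right = self.parent = None
        # Assumed bookkeeping: neighbour keyword -> platform last reported.
        self.known_platform = {}


def link(parent, child, side):
    """Create a parent/child relationship and exchange current locations."""
    setattr(parent, side, child)
    child.parent = parent
    parent.known_platform[child.keyword] = child.platform
    child.known_platform[parent.keyword] = parent.platform


def migrate(ce, new_platform):
    """Move `ce` and notify its direct relationships; return the update count."""
    ce.platform = new_platform
    updates = 0
    for neighbour in (ce.left, ce.right, ce.parent):
        if neighbour is not None:
            neighbour.known_platform[ce.keyword] = new_platform
            updates += 1
    return updates


top = CE("mail", "host-A")
mid = CE("calendar", "host-A")
low_left = CE("auth", "host-B")
low_right = CE("dns", "host-B")
link(top, mid, "left")
link(mid, low_left, "left")
link(mid, low_right, "right")

print(migrate(mid, "host-C"))   # interior node: three neighbours to update
print(migrate(low_left, "host-C"))  # leaf: only its parent
```

The number of updates does not depend on the size of the directory, only on how many of the three relationships the migrating CE currently holds.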
15Approach (cont.)
- Make the application an extension of Bionet that
CEs choose to participate in rather than being
forced to.
- If CEs can satisfy all their needs effectively
without the directory, then they can exclude
themselves from the directory. This leaves fewer
CEs in the directory, thus reducing search
resources overall.
16Efficiency
- Search, insert, delete time: Sufficient
algorithms exist to allow search times to be
guaranteed at O(log(n)) to O(sqrt(n)).
- Weight balance trees (BB[alpha], AVL).
- Red-Black trees / 2-3-4 trees.
17Efficiency
- Merge time: Using the data structures listed
previously, we can guarantee a bound on merging
two trees.
- a is the number of nodes in tree A.
- b is the number of nodes in tree B.
- Inserting each value of A into B takes
O(a log(a)) + O(a log(b)).
- If a is larger: O(a log(a)).
- If b is larger: O(a log(b)).
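A short derivation of this bound, assuming tree B stays balanced while the a values are inserted, so that one insertion into a tree of m nodes costs at most c log(m + 1). The constant c and the balance assumption are not stated on the slide.

```latex
\begin{align*}
T_{\text{merge}}(a, b)
  &\le \sum_{i=0}^{a-1} c \,\log(b + i + 1)
   \le c\, a \log(a + b) \\
  &\le c\, a \log\bigl(2 \max(a, b)\bigr)
   = O\bigl(a \log \max(a, b)\bigr)
\end{align*}
```

For a larger than b this gives O(a log(a)), and for b larger than a it gives O(a log(b)), matching the two cases listed above.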
18Efficiency
- This is the naive algorithm, however. Do better
algorithms exist? (Probably.) I am looking into
this.
- However, we can assume that large merge
operations do not occur frequently, since they
only occur during host failure or CE failure.
- Given M hosts with CEs distributed equally among
them, the probability of a top level failure is
1 / M for any given host failure. For a top level
failure we must merge two trees of size n/2. This
takes O(n/2 log(n)) + O(n/2 log(n)) time, resulting
in O(n log(n)).
- Furthermore, if a failure occurs, the sub trees
exist in a special state that may be more easily
repaired than just merging two random binary
trees.
19Efficiency
Failure occurs at the red relationship.
If a failure occurs at the red relationship, the
resulting sub trees have a special property: the
left and right sub trees each maintain a somewhat
disjoint domain.
20Efficiency
[Figure: sub trees A and B, with A < B]
The two trees can be merged together because
their domains do not overlap (provided the tree
has not changed).
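A minimal sketch of how non-overlapping domains simplify the merge, assuming every keyword in A is smaller than every keyword in B. The names `Node`, `build`, and `join_disjoint` are hypothetical, and this linking rule is one possible choice rather than the algorithm the slides settle on. Rebalancing after the join is omitted.

```python
class Node:
    def __init__(self, keyword):
        self.keyword = keyword
        self.left = self.right = self.parent = None


def build(keywords):
    """Build an unbalanced binary search tree from keywords; return its root."""
    root = None
    for kw in keywords:
        new = Node(kw)
        if root is None:
            root = new
            continue
        current = root
        while True:
            side = "left" if kw < current.keyword else "right"
            child = getattr(current, side)
            if child is None:
                setattr(current, side, new)
                new.parent = current
                break
            current = child
    return root


def extreme(node, side):
    """Follow `side` links ('left' or 'right') to the smallest/largest node."""
    while getattr(node, side) is not None:
        node = getattr(node, side)
    return node


def join_disjoint(a_root, b_root):
    """Link B under the largest node of A; requires max(A) < min(B)."""
    if a_root is None:
        return b_root
    if b_root is None:
        return a_root
    a_max = extreme(a_root, "right")
    if not a_max.keyword < extreme(b_root, "left").keyword:
        raise ValueError("domains overlap; fall back to element-wise merge")
    a_max.right = b_root  # a_max has no right child by definition
    b_root.parent = a_max
    return a_root


def in_order(node):
    return [] if node is None else in_order(node.left) + [node.keyword] + in_order(node.right)


merged = join_disjoint(build(["b", "a", "c"]), build(["y", "x", "z"]))
print(in_order(merged))
```

The join itself only walks one path in each tree, so its cost is proportional to the tree heights rather than to the number of nodes, in contrast to inserting every value of one tree into the other.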
21Alternate Solutions
- Allow only BIONET relationships and current
discovery mechanisms to handle discovery.
- Does not allow for rapid lookup of unrelated
entities.
- Requires a period of time before the network
becomes stable enough to search efficiently.
22Alternate Solutions
- Distributed Consistent Hashing (Chord protocol
applied to BIONET: http://pdos.ics.mit.edu/chord/).
- Provides a good upper bound on both the maximum
number of links to other agents and the search
time: approx. O(log(n)).
- n agents require n log(n) total relationships in
the system.
- Our system needs 3n.
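The two relationship counts can be compared with a few lines of arithmetic. This is a rough illustration: the function names are hypothetical, the logarithm is assumed to be base 2, and constant factors in the Chord figure are ignored.

```python
import math


def chord_relationships(n: int) -> int:
    """Approximate total links for n agents at about log2(n) links each."""
    return n * math.ceil(math.log2(n))


def tree_relationships(n: int) -> int:
    """Upper bound for the directory tree: left, right, and parent per CE."""
    return 3 * n


for n in (16, 1024):
    print(n, chord_relationships(n), tree_relationships(n))
```

For 1024 agents this gives 10240 relationships against 3072, and the gap widens as n grows because the per-CE count in the tree stays constant.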
23Alternate Solutions (cont.)
- Adapt a graph theory approach like the HITS or
HyperClass algorithm for web crawling.
- Must be implemented at the CE level so all CEs
are a part of the directory, not just a subset.
- Similar to Freenet, but instead of a node becoming
good at searching an area, a node gets ranked on
how well it searches.
J. Kleinberg, S.R. Kumar, P. Raghavan, S.
Rajagopalan, and A. Tomkins, "The web as a graph:
Measurements, models and methods," Proceedings of
the International Conference on Combinatorics and
Computing, 1999.
24Advantages
- Applications that wish to be diverse may
implement a protocol to talk to the CE directory
service. This would link highly mobile and
diverse agents together, while leaving relatively
stagnant agents out of the directory, keeping the
size of the directory limited and thus faster.
25Advantages
- Provable bounds
- Search times of O(log(n)).
- Insertion / Deletion times of O(log(n)).
- Merge times of O(n log(n)) with unrelated trees
(two independent directories merging).
- Merge times of O(log(n)) with tree failures.
26Summary
- To provide a directory service for rapid location
of Cyber Entities within the Bionet architecture.
- Tasks
- Prove the directory is scalable.
- Prove the directory is efficient.
- Prove the directory is fault tolerant.
27Summary
- This all boils down to finding good merging
algorithms.