A Symmetric and Polyvalent Resource Location System - PowerPoint PPT Presentation


Provided by: lya1

Learn more at: http://people.cs.uchicago.edu


Transcript and Presenter's Notes


1
A Symmetric and Polyvalent Resource Location System

Candidate: Chuang Liu
Advisor: Ian Foster
University of Chicago

2
Growth of the Internet

The broad deployment of the Internet and the
emergence of service-oriented architectures have
led to a remarkable increase in the number of
resources to which a user, program, or community
may have access.

3
Infrastructures of Resource Pools
Condor: 1,079 pools, 105,146 CPUs
Globus: 100 sites, 100,000 CPUs
PlanetLab: 298 sites, 629 CPUs
Gnutella: 1.5 M nodes
4
Applications and Challenges

Applications
Scientific computing applications
Content distribution systems
On-demand and utility computing
Challenges
Applications need to run on one resource (or
resource collection) with desired individual and
aggregation properties to achieve good
performance or efficiency
Resources are heterogeneous and dynamic
Large number of resources → selection is expensive
Resource owners impose policies concerning, e.g.,
who can use a resource and for what purpose
In Internet environments, resources are
distributed

5
We Hypothesize a Unifying Mechanism: Resource Location Service

We need efficient algorithms for polyvalent
queries, e.g.
Resource sets selected by their aggregation properties
Resource sets selected by their network locations

In Internet environments, resources are
distributed
Organization of information, distributed query
evaluation
→ A scalable Internet resource location service

6
Outline

Resource and query description
Data Model
Syntax
Search algorithms
A computer location service
Summary

7
Requirements

Resource description
Resource properties
Access policies: constraints on user properties
Query description
User properties
Search condition: constraints on resource properties
Traditionally, resources and queries look different
MDS, UDDI, etc.
We want to treat resources and queries symmetrically
Condor pioneered such an approach (matchmaking), but its features have many limitations
8
Symmetric Data Model

A (query or resource) description
Data section
attribute/value pairs (resource/query properties)
Constraint section
constraints on properties (access policy / search condition)
Rank section
Symmetric evaluation
A query and a (set of) resource(s) match each other if all constraints in their descriptions are satisfied
Focus here on 1-1 and 1-N matches
N-N matches are addressed in other work (CCGrid 2005)
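The following is a minimal sketch of the symmetric 1-1 evaluation described above. It assumes that constraints can be modeled as Python predicates over the other party's data section. The `Description`, `satisfies`, and `symmetric_match` names are hypothetical, and the rank section is omitted. A match requires both the query's search condition and the resource's access policy to hold:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical encoding: each constraint is a predicate over the *other*
# party's data section. This is not the author's XML syntax.
Constraint = Callable[[Dict[str, Any]], bool]


@dataclass
class Description:
    """A query or resource description: data section plus constraint section."""
    data: Dict[str, Any]
    constraints: List[Constraint] = field(default_factory=list)


def satisfies(desc: Description, other_data: Dict[str, Any]) -> bool:
    """True if every constraint of `desc` holds on the other party's data."""
    try:
        return all(check(other_data) for check in desc.constraints)
    except KeyError:
        # A constraint that refers to an attribute the other side lacks fails.
        return False


def symmetric_match(query: Description, resource: Description) -> bool:
    """1-1 match: the query's search condition must hold on the resource's
    properties AND the resource's access policy must hold on the user's."""
    return satisfies(query, resource.data) and satisfies(resource, query.data)


resource = Description(
    data={"os": "linux", "memory": 512},
    constraints=[lambda user: user["org"] == "A"],  # access policy
)
query_a = Description(
    data={"org": "A"},  # user properties
    constraints=[lambda r: r["os"] == "linux", lambda r: r["memory"] >= 256],
)
query_b = Description(data={"org": "B"}, constraints=query_a.constraints)

print(symmetric_match(query_a, resource))  # True
print(symmetric_match(query_b, resource))  # False: rejected by the access policy
```

Because both directions are checked with the same `satisfies` function, neither side is privileged. This is the point of treating resources and queries symmetrically.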

9
Syntax

Description uses an XML-based syntax
Extensibility
XML Schema
Defines the valid attributes and values that can appear in the description of a particular type of resource or query

<rl:description type="computer">
  <rl:data_section>
    <comp:os>linux</comp:os>
    <rl:if condition='user:org="A"'>
      <comp:disksize>100</comp:disksize>
      <comp:bandwidth>100</comp:bandwidth>
    </rl:if>
  </rl:data_section>
  <rl:requirement_section>
    <rl:if condition='user:org="B"'>
      <rl:constraint name="access time" errmsg="not accessible">
        rq.user:accesstime between (6:00PM, 6:00AM)
      </rl:constraint>
    </rl:if>
  </rl:requirement_section>
  <rl:rank_section></rl:rank_section>
</rl:description>
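The conditional blocks in this description (one for users of org A and one for users of org B) let a single resource expose different data to different users. The sketch below models that condition structure, along with the option structure listed among the new features. It uses hypothetical Python encodings: `visible_alternatives`, `conditionals`, `options`, and the `cpus`/`memory` attributes are illustrative and are not part of the described language.

```python
from typing import Any, Callable, Dict, List, Tuple

Attrs = Dict[str, Any]
# Hypothetical encoding of the condition structure: a predicate on the
# requesting user paired with the attributes shown when it holds.
Conditional = Tuple[Callable[[Attrs], bool], Attrs]


def visible_alternatives(base: Attrs, conditionals: List[Conditional],
                         options: List[Attrs], user: Attrs) -> List[Attrs]:
    """Return every attribute set the resource can present to `user`.

    Condition structure: a block is included only if its condition holds.
    Option structure: each alternative combination yields a separate view.
    """
    view = dict(base)
    for condition, attrs in conditionals:
        if condition(user):
            view.update(attrs)
    if not options:
        return [view]
    return [{**view, **alternative} for alternative in options]


base = {"os": "linux"}
conditionals = [
    # Mirrors the example: users from org A also see disksize and bandwidth.
    (lambda u: u.get("org") == "A", {"disksize": 100, "bandwidth": 100}),
]
# Hypothetical option structure with two alternative attribute pairs.
options = [{"cpus": 2, "memory": 4}, {"cpus": 4, "memory": 2}]

for view in visible_alternatives(base, conditionals, options, {"org": "A"}):
    print(view)
print(visible_alternatives(base, conditionals, [], {"org": "B"}))
# [{'os': 'linux'}]
```

A matcher would then test a query against each returned view and treat the resource as matching if any one view satisfies it.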
10
New Features (Relative to Previous Approaches)

Resources may show different properties or different access policies to different users
Condition structure
If (condition1) attribute = value1
If (condition2) attribute = value2
Option structure
attribute1 = value2, attribute2 = value3
or
attribute1 = value3, attribute2 = value4
Queries for resource sets
Constraints on aggregation properties of a resource set
rs1 ISASET computer
sum(rs1.memorysize) > 100
rank = -count(rs1)
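To pin down what this set query means, here is an exhaustive sketch. It assumes that `rank = -count(rs1)` expresses a preference for the smallest qualifying set. `best_resource_set` and the sample pool are hypothetical, and practical evaluation needs the faster algorithms discussed later:

```python
from itertools import combinations
from typing import Any, Dict, List, Optional, Sequence

Computer = Dict[str, Any]


def best_resource_set(computers: Sequence[Computer],
                      min_total_memory: int) -> Optional[List[Computer]]:
    """Find rs1 with sum(rs1.memorysize) > min_total_memory that maximizes
    rank = -count(rs1), i.e. the smallest qualifying set.

    Subsets are enumerated by increasing size, so the first hit has the best
    rank. The search is exponential and only pins down the query semantics.
    """
    for size in range(1, len(computers) + 1):
        for rs1 in combinations(computers, size):
            if sum(c["memorysize"] for c in rs1) > min_total_memory:
                return list(rs1)
    return None  # no set satisfies the aggregation constraint


pool = [
    {"name": "c1", "memorysize": 40},
    {"name": "c2", "memorysize": 70},
    {"name": "c3", "memorysize": 30},
    {"name": "c4", "memorysize": 20},
]
print([c["name"] for c in best_resource_set(pool, 100)])  # ['c1', 'c2']
```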

A Constraint Language Approach to Matchmaking.
Liu, C., Foster, I., 14th Intl Workshop on
Research Issues on Data Engineering (RIDE 2004),
Boston, 2004.
11
Outline

Resource and query description
Search algorithms
A computer location service
Summary and future work

12
Search Algorithms

Locating one resource with desired properties
MDS, Condor Matchmaker, RGIS, Gnutella, UDDI
Relational and other databases
Locating a resource set with desired properties (polyvalent queries)
a) Resource sets with required aggregation
properties
b) Resource sets with required network connections

13
Queries with Aggregation Properties: Extending Relational Databases

A query for a resource set with aggregation
properties can be represented by a database query
requiring the simultaneous satisfaction of
arithmetic constraints on multiple attributes
(ACMA) from different relations
Database search engines solve ACMA queries with join operations. Unfortunately, current algorithms perform poorly.
→ We introduce an ACMA join operator and an ACMA query evaluation plan

SELECT * FROM T AS A, T AS B, T AS C, T AS D
WHERE A.price + B.price + C.price + D.price < 5
AND A.cpuSpeed + B.cpuSpeed + C.cpuSpeed + D.cpuSpeed > 100
AND A.memory + B.memory + C.memory + D.memory > 100
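The query can be run directly on a small table to show the semantics that a conventional engine implements as a 4-way self-join. This sketch uses Python's built-in `sqlite3` module with made-up rows. It selects row ids rather than `*`. The `A.id < B.id < ...` ordering is an added assumption that prevents the same set from being reported several times:

```python
import sqlite3

# Made-up sample rows (price, cpuSpeed, memory) for a hypothetical table T.
rows = [(1.0, 30, 20), (1.0, 25, 30), (0.5, 40, 35), (2.0, 20, 40), (3.0, 50, 50)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, price REAL, "
             "cpuSpeed INTEGER, memory INTEGER)")
conn.executemany("INSERT INTO T (price, cpuSpeed, memory) VALUES (?, ?, ?)", rows)

# The ACMA query as a 4-way self-join. The id ordering is an added
# assumption so that each set of four distinct resources appears once.
ACMA_QUERY = """
SELECT A.id, B.id, C.id, D.id
FROM T AS A, T AS B, T AS C, T AS D
WHERE A.id < B.id AND B.id < C.id AND C.id < D.id
  AND A.price + B.price + C.price + D.price < 5
  AND A.cpuSpeed + B.cpuSpeed + C.cpuSpeed + D.cpuSpeed > 100
  AND A.memory + B.memory + C.memory + D.memory > 100
"""
matches = conn.execute(ACMA_QUERY).fetchall()
print(matches)  # [(1, 2, 3, 4)]
```

Without extra knowledge about the arithmetic constraints, the engine has to consider combinations of rows from all four copies of T before filtering. That cost is what the ACMA join operator targets.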
14
Execution Plan of ACMA Join

SELECT * FROM T AS A, T AS B, T AS C, T AS D
WHERE A.price + B.price + C.price + D.price < 5
AND A.cpuSpeed + B.cpuSpeed + C.cpuSpeed + D.cpuSpeed > 100
AND A.memory + B.memory + C.memory + D.memory > 100
15
Implementation of ACMA Join

Selection operators
Use a consistency algorithm to initialize the selection conditions in the selection operators; these conditions are range constraints on single attributes
Constrained join operator
Extends the nested-loop join operator
Uses consistency algorithms to predict whether an intermediate result can lead to any final query result

[Figure: ACMA query → consistency algorithm → range constraints on single attributes; ACMA query + intermediate result → consistency algorithm → yes/no]
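Below is a minimal sketch of both ideas for the special case of sum constraints over k copies of one table. It assumes a simple bounds-based consistency test, in which every value not yet chosen lies between the column minimum and maximum. The names `consistent` and `acma_join` are hypothetical. The code illustrates selection pruning plus a constrained nested-loop join; it is not the author's implementation:

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]
# Hypothetical ACMA encoding over k copies of one table: each constraint
# reads "sum of `attribute` over the k chosen rows  <op>  bound".
SumConstraint = Tuple[str, str, float]  # (attribute, "<" or ">", bound)


def consistent(partial: float, remaining: int, lo: float, hi: float,
               op: str, bound: float) -> bool:
    """Can `remaining` more values in [lo, hi] still satisfy `sum op bound`?"""
    if op == "<":
        return partial + remaining * lo < bound
    return partial + remaining * hi > bound


def acma_join(table: Sequence[Row], k: int,
              constraints: List[SumConstraint]) -> Tuple[List[Tuple[int, ...]], int]:
    """Return matching index tuples and the number of rows read by the join."""
    ranges = {a: (min(r[a] for r in table), max(r[a] for r in table))
              for a, _, _ in constraints}

    def ok(sums: Dict[str, float], remaining: int) -> bool:
        return all(consistent(sums[a], remaining, *ranges[a], op, b)
                   for a, op, b in constraints)

    # Selection operators: single-attribute range filters derived from the
    # query, applied to each row before any join work happens.
    candidates = [i for i, row in enumerate(table) if ok(row, k - 1)]

    results: List[Tuple[int, ...]] = []
    reads = 0

    def extend(chosen: List[int], sums: Dict[str, float], start: int) -> None:
        nonlocal reads
        if len(chosen) == k:
            # With remaining == 0 the consistency test was the exact check.
            results.append(tuple(chosen))
            return
        for pos in range(start, len(candidates)):
            i = candidates[pos]
            reads += 1
            new_sums = {a: sums[a] + table[i][a] for a in sums}
            # Constrained nested-loop join: drop this intermediate result
            # unless some completion could still satisfy every constraint.
            if ok(new_sums, k - len(chosen) - 1):
                extend(chosen + [i], new_sums, pos + 1)

    extend([], {a: 0.0 for a, _, _ in constraints}, 0)
    return results, reads


table = [
    {"price": 1.0, "cpuSpeed": 30, "memory": 20},
    {"price": 1.0, "cpuSpeed": 25, "memory": 30},
    {"price": 0.5, "cpuSpeed": 40, "memory": 35},
    {"price": 2.0, "cpuSpeed": 20, "memory": 40},
    {"price": 3.0, "cpuSpeed": 50, "memory": 50},
]
constraints = [("price", "<", 5), ("cpuSpeed", ">", 100), ("memory", ">", 100)]
matches, reads = acma_join(table, 4, constraints)
print(matches)  # [(0, 1, 2, 3)]
```

The test is only a necessary condition, so pruning never discards a real answer. Tighter bounds, for example from a real constraint solver, would prune more.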
16
Evaluation of Our Method

Plan I: traditional plan
Plan II: plan with selection operators
Plan III: plan with selection operators and constrained join operator

17
Performance Experiments: Example Results

Plan I reads 10^4 to 10^6 times more tuples than the other two plans
Plan III performs about ten times fewer tuple reads than Plan II

Efficient Combinatorial Search in Relational
Databases, Liu, C., Yang, L., Foster, I., 9th
International Database Applications and
Engineering Symposium (IDEAS 2005), Montreal, 2005
18
Outline

Resource and query description
Search algorithms
Resource sets with required aggregation
properties
→ Resource sets with required network connections
A computer location service
Summary

19
Resource Set with Required Network Connection

Locate a set of resources with particular network
connections in the Internet.
Q1 Find a set of R resources close to each
other
The network latency between any pair of those
resources is less than L milliseconds
Useful for e.g. computational applications
Q2 Find a set of R resources far from each
other
The network latency between any pair of those
resources is more than L milliseconds
Useful for e.g. content distribution applications

20
Challenges

Direct computation
e.g., a tree search algorithm
Challenges
It is an NP-hard problem
It may require a large number of measurements
Unstable networks and resources may cause individual measurements to fail → only partial data
Network latency data is noisy because network resources are shared among users

21
Intuition of Our Heuristic Method

Clustering
We partition resources into clusters based on
end-to-end network latency
A cluster is a set of resources that have much smaller latency to each other than to other resources
Search based on the cluster structure
Q1. Search for resources in a cluster
Q2. Search for resources from different clusters

22
Outline

Resource and query description
Search algorithms
Resource sets with required aggregation
properties
Resource sets with required network connection
Cluster Algorithms
Cluster Algorithm I
Cluster Algorithm II
Search Algorithm
A computer location service
Summary

23
Cluster Algorithm I: Resource Pool

Resource pools such as OSG, PlanetLab, etc.
Hundreds of resources
Resources are relatively stable
Latency measurements between resources exist
Available latency measurements are only a subset
of all possible measurements

Latency data on PlanetLab Collected by Stribling
24
Cluster Algorithm I

Markov cluster algorithm (van Dongen, 2000)
If there are many short paths between two resources, those two resources are likely to have a small latency and therefore belong to the same cluster
Details in
S. van Dongen, A cluster algorithm for graphs, 2000
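Here is a compact pure-Python sketch of Markov clustering, which alternates expansion and inflation of a column-stochastic flow matrix. It assumes that latencies have already been converted into similarities, with 0 meaning no measurement. That conversion, the inflation parameter of 2, and the pruning threshold are illustrative choices, not taken from the work described here:

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (columns of zeros are left as-is)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(sim: Matrix, inflation: float = 2.0,
                   max_iter: int = 100, eps: float = 1e-6) -> List[Set[int]]:
    """Cluster a symmetric similarity graph; sim[i][j] == 0 means no edge
    (for example, a latency pair that was never measured)."""
    n = len(sim)
    # Self-loops are customary in MCL and stabilize the random walk.
    m = normalize_columns([[sim[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step paths
        inflated = normalize_columns(  # inflation: strengthen strong flows
            [[v ** inflation for v in row] for row in expanded])
        pruned = normalize_columns(  # drop negligible flows
            [[v if v > eps else 0.0 for v in row] for row in inflated])
        converged = all(abs(pruned[i][j] - m[i][j]) < eps
                        for i in range(n) for j in range(n))
        m = pruned
        if converged:
            break
    clusters: List[Set[int]] = []
    for row in m:  # rows that keep mass are attractors; their columns form a cluster
        members = {j for j, v in enumerate(row) if v > 0}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two tight groups {0,1,2} and {3,4,5} joined by one weak link (2-3).
sim = [[0.0] * 6 for _ in range(6)]
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                sim[i][j] = 1.0
sim[2][3] = sim[3][2] = 0.1
print(markov_cluster(sim))
```

Unmeasured pairs simply contribute no edge. Two resources can still end up in the same cluster if many short paths connect them, which fits the intuition stated on the slide.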

25
Effectiveness of the Cluster Algorithm

Compute cluster structures using 10-90% of the data.
Quantify the difference between each structure and the structure obtained with all data as the fraction of changes, D
→ We conclude that the cluster algorithm remains effective when run on an incomplete set of data

Fraction of data   90%    80%    70%    60%    50%    40%    30%    20%    10%
D                  0.06   0.145  0.152  0.161  0.198  0.228  0.336  0.38   0.46
26
Variation of the Cluster Structure

Compare each clustering structure with those computed from data collected one, two, and four hours earlier.

30% of cluster structures change by less than 10% from one hour earlier
>60% of cluster structures change by 10% to 15% from one hour earlier
The difference does not increase over time

Efficient and Robust Computation of Resource
Clusters in the Internet, Liu, C., Foster, I. 6th
IEEE International Conference on Cluster
Computing (Cluster 2005), Boston, 2005
27
Outline

Resource and query description
Search algorithms
Resource sets with required aggregation
properties
Resource sets with required network connection
Cluster Algorithms
Cluster Algorithm I
Cluster Algorithm II
Search Algorithm
A computer location service
Summary

28
Cluster Algorithm II: Resource Pool

Resource pools such as Gnutella, Kazaa, etc.
Resources join the resource pool incrementally
Very large number of resources
Very expensive to measure and store latency
between all resources
Requirements
Incrementally modify cluster structure when
resources leave and join the resource pool
Only a modest number of latency measurements
Need small storage space

29
Hierarchical Cluster Structure

Storage space O(N)

[Figure: average and standard deviation]
30
Incremental Cluster Algorithm

Number of Measurements Log(N)
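The slides give only the costs: O(N) storage and a logarithmic number of measurements. One way an incremental hierarchy could work is sketched below as a hypothetical design. A joining resource probes the representative of each child cluster, level by level, and descends into the nearest one. If no child is within that level's latency radius, it starts a new cluster. `ClusterNode`, `insert`, and the radii are assumptions. The probe count per insertion is logarithmic only when the tree stays balanced with bounded fan-out:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ClusterNode:
    representative: str                      # resource used for measurements
    children: List["ClusterNode"] = field(default_factory=list)
    members: List[str] = field(default_factory=list)  # filled at leaves only


def insert(root: ClusterNode, resource: str,
           latency: Callable[[str, str], float], radii: List[float]) -> int:
    """Insert `resource` top-down; radii[d] is the (hypothetical) latency
    radius of clusters at depth d + 1. Returns the measurements taken."""
    node, measurements = root, 0
    for radius in radii:
        nearest, nearest_latency = None, float("inf")
        for child in node.children:      # one probe per child representative
            d = latency(resource, child.representative)
            measurements += 1
            if d < nearest_latency:
                nearest, nearest_latency = child, d
        if nearest is None or nearest_latency > radius:
            nearest = ClusterNode(representative=resource)  # start a new cluster
            node.children.append(nearest)
        node = nearest
    node.members.append(resource)
    return measurements


def leaves(node: ClusterNode) -> List[List[str]]:
    if not node.children:
        return [node.members]
    return [group for child in node.children for group in leaves(child)]


position = {"a": 0, "b": 1, "c": 100, "d": 101, "e": 2}  # hypothetical 1-D coordinates


def latency(x: str, y: str) -> float:
    return abs(position[x] - position[y])


root = ClusterNode(representative="root")
costs = [insert(root, r, latency, radii=[50.0, 5.0]) for r in "abcde"]
print(costs)         # [0, 2, 1, 3, 3]
print(leaves(root))  # [['a', 'b', 'e'], ['c', 'd']]
```

Joining or leaving only touches one root-to-leaf path. This is the kind of locality that the incremental requirement on the previous slide calls for.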

31
Outline

Resource and query description
Search algorithms
Resource sets with required aggregation
properties
Resource sets with required network connections
Cluster Algorithms
→ Search Algorithm
A computer location service
Summary

32
Modified Tree Search Algorithm

Tree search algorithm
Starts with an empty set
Repeatedly picks from available resources one
resource that has required connections with
current members in the set, and adds it to the
set
Rolls back the addition in previous step if no
such resource exists
Finishes when the set contains all required
resources
Modified tree search algorithm
Q1: pick resources from the same cluster
Q2: pick resources from different clusters
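The following sketch shows the backtracking tree search with cluster-based candidate ordering. It assumes that cluster labels are already known and that Q1/Q2 are expressed as a pairwise predicate `ok`. `tree_search` and its parameters are hypothetical. Because the heuristic only reorders candidates, the search still finds a set whenever one exists:

```python
from typing import Callable, Dict, List, Optional, Sequence


def tree_search(resources: Sequence[str], r: int, ok: Callable[[str, str], bool],
                cluster: Optional[Dict[str, int]] = None,
                close: bool = True) -> Optional[List[str]]:
    """Backtracking search for r resources whose every pair satisfies ok().

    If `cluster` is given, candidates are reordered before each step: for Q1
    (close=True) resources in clusters already used come first; for Q2
    (close=False) resources from unused clusters come first. Ordering only
    changes which branch is tried first, so the search stays complete.
    """
    chosen: List[str] = []

    def order(candidates: List[str]) -> List[str]:
        if cluster is None or not chosen:
            return candidates
        used = {cluster[c] for c in chosen}

        def key(x: str) -> bool:  # sorted() is stable; False sorts first
            in_used = cluster[x] in used
            return not in_used if close else in_used

        return sorted(candidates, key=key)

    def extend(remaining: List[str]) -> bool:
        if len(chosen) == r:
            return True                      # set complete
        ordered = order(remaining)
        for i, c in enumerate(ordered):
            if all(ok(c, m) for m in chosen):
                chosen.append(c)
                if extend(ordered[i + 1:]):  # later candidates only: each set once
                    return True
                chosen.pop()                 # roll back the previous addition
        return False

    return list(chosen) if extend(list(resources)) else None


position = {"a": 0, "b": 1, "c": 2, "x": 100, "y": 101, "z": 102}  # hypothetical
cluster = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}


def q1(p: str, q: str) -> bool:  # close: latency below L = 5
    return abs(position[p] - position[q]) < 5


def q2(p: str, q: str) -> bool:  # far: latency above L = 50
    return abs(position[p] - position[q]) > 50


candidates = ["a", "x", "b", "y", "c", "z"]
print(tree_search(candidates, 3, q1, cluster))               # ['a', 'b', 'c']
print(tree_search(candidates, 2, q2, cluster, close=False))  # ['a', 'x']
```

In this toy input, the cluster ordering makes each step try the right kind of candidate first. That ordering is where the speedup reported on the next slide would come from.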

33
Evaluation of Performance

Cumulative distribution of execution time
Our algorithm answers 70% of queries within a few milliseconds

Algorithm   70% of queries answered within   90% of queries answered within
tree        0.6 s                            26 s
modified    1.6 ms                           0.4 s
34
Outline

Resource and query description
Search algorithms
Resource sets with required aggregation
properties
Resource sets with required network connections
→ A computer location service
Summary

35
Computer Location Service

Build a resource location service for computers connected by the Internet
Requirements
Support polyvalent queries for computer sets
Support queries for one computer with requirements on multiple properties
Support queries based on network locations
Support resource access policies
Scalable to handle a large number of computers and queries

36
Related Work
We need a new service
37
System Structures
Centralized structure: short response time; poor scalability (e.g., MDS2, Napster, UDDI)
Peer-to-peer structure: good scalability; long response time; poor support for queries for resource sets (e.g., Gnutella 0.4, SWORD)
Super-peer structure: medium response time; good scalability; good support for queries for resource sets (e.g., Gnutella 0.6, Kazaa)
38
Super-peer Structure

Partition computers based on the latency hierarchy
One computer in each group acts as the super-peer
Advantages
Answer polyvalent queries locally
Support queries for computers based on their network location
Low network traffic
Limitation
Cannot find solutions that span groups

39
Load Balance

Update of computer information
Each computer reports to the super-peer of its group
Query processing
Each computer knows about K super-peers and sends each query to one of them, chosen at random
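A small simulation of these two rules follows. It rests on two hypothetical assumptions: each computer's K known super-peers include its own group's super-peer, and "randomly" means uniformly at random. The `Computer` class is illustrative:

```python
import random
from collections import Counter
from typing import List, Sequence


class Computer:
    """Hypothetical client view of the load-balancing rules on the slide."""

    def __init__(self, name: str, home: str, known: Sequence[str],
                 rng: random.Random) -> None:
        self.name = name
        self.home = home          # super-peer of its own group: gets updates
        self.known = list(known)  # K super-peers this computer knows about
        self.rng = rng

    def update_target(self) -> str:
        return self.home

    def query_target(self) -> str:
        # Each query goes to one known super-peer chosen uniformly at random.
        return self.rng.choice(self.known)


rng = random.Random(7)
super_peers = [f"sp{i}" for i in range(5)]
K = 3
computers: List[Computer] = []
for n in range(100):
    home = super_peers[n % len(super_peers)]
    others = rng.sample([s for s in super_peers if s != home], K - 1)
    computers.append(Computer(f"c{n}", home, [home] + others, rng))

load = Counter(c.query_target() for c in computers for _ in range(10))
print(sorted(load.items()))
```

Updates stay local to a group, while queries spread across K super-peers. Query load therefore evens out even when some groups issue many more queries than others.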

40
Fault Tolerance

Restart of a super-peer
A super-peer periodically sends out a backup list
to each computer managed by it
If a super-peer fails, all related computers
report to the first computer in the backup list
Recovery of data in a super-peer
Each computer reports to the new super-peer its
clusterID that will be used to reconstruct the
cluster structure
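Here is a sketch of the failover and recovery steps. The class names are hypothetical, the backup list is assumed to be ordered by computer name, and dotted cluster IDs such as `1.2` stand in for a position in the cluster hierarchy:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Computer:
    name: str
    cluster_id: str                                  # position in the cluster hierarchy
    backup: List[str] = field(default_factory=list)  # last backup list received


@dataclass
class SuperPeer:
    name: str
    cluster_ids: Dict[str, str] = field(default_factory=dict)  # computer -> clusterID

    def send_backup_list(self, managed: List[Computer]) -> None:
        # Hypothetical ordering: managed computers sorted by name.
        backup = sorted(c.name for c in managed)
        for c in managed:
            c.backup = list(backup)

    def cluster_structure(self) -> Dict[str, List[str]]:
        """Rebuild cluster membership from the reported clusterIDs."""
        groups: Dict[str, List[str]] = defaultdict(list)
        for name, cid in sorted(self.cluster_ids.items()):
            groups[cid].append(name)
        return dict(groups)


def fail_over(managed: List[Computer]) -> SuperPeer:
    """After the super-peer fails, every computer contacts the first entry of
    its backup list; each then reports its clusterID to the new super-peer."""
    new = SuperPeer(managed[0].backup[0])
    for c in managed:
        new.cluster_ids[c.name] = c.cluster_id
    return new


group = [Computer("c2", "1.3"), Computer("c1", "1.2"), Computer("c3", "1.2")]
old = SuperPeer("sp")
old.send_backup_list(group)
# ... the old super-peer fails here ...
new = fail_over(group)
print(new.name)                 # c1
print(new.cluster_structure())  # {'1.2': ['c1', 'c3'], '1.3': ['c2']}
```

All computers receive the same list from the same broadcast, so they agree on the successor without further coordination.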

41
Work Remaining to be Done

Measure
Query success rates
Query response times
Average and maximum input/output traffic
For
Our super-peer structure and algorithm
Random super-peer structure and our algorithm
Others?
Using
Workloads TBD
Assuming
Computer characteristics change randomly

42
Outline

Description of resources and queries
Matchmaking algorithms
An algorithm to locate resource sets with
required aggregation properties
Algorithms for locating resource sets with
required network connection
A matchmaking service
Summary

43
My Contributions

A matchmaking language to describe resources and
queries
Symmetric mechanism that enables both resource owners and requesters to control matching between resources and queries
Support polyvalent queries
Fast algorithms to solve polyvalent queries that
search for a resource set with desired
aggregation properties and network connections
Order-of-magnitude(s) faster than other
approaches
Scalable resource location service that supports
a large set of queries for networked computers
Evaluation in progress

44
Publications

Liu, C., Foster, I. Efficient and Robust Computation of Resource Clusters in the Internet. 6th IEEE International Conference on Cluster Computing (Cluster 2005), Boston, 2005.
Liu, C., Foster, I. Matchmaking Systems: A Survey. Unpublished document, 2005.
Liu, C., Yang, L., Foster, I. Efficient Combinatorial Search in Relational Databases. 9th International Database Applications and Engineering Symposium (IDEAS 2005), Montreal, 2005.
Naik, V., Liu, C., Yang, L., Wagner, J. Online Resource Matching in a Heterogeneous Grid Environment. 6th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005), Cardiff, UK, 2005.
Liu, C., Foster, I. DB_CSP: A Framework and Algorithms for Applying Constraint Solving within Relational Databases. 19th Workshop on (Constraint) Logic Programming (WLP 2005), Ulm, Germany, 2005.
Liu, C., Foster, I. A Constraint Language Approach to Matchmaking. 14th International Workshop on Research Issues on Data Engineering (RIDE 2004), Boston, 2004.
Dail, H., Sievert, O., Berman, F., Casanova, H., Yarkhan, A., Vadhiyar, S., Dongarra, J., Liu, C., Yang, L., Angulo, D., Foster, I. Scheduling in the Grid Application to Grid Resource Selection. In Grid Resource Management, Kluwer Publishing, 2003.
Liu, C., Yang, L., Foster, I., Angulo, D. Design and Evaluation of a Resource Selection Framework. 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, 2002.
Allen, G., Angulo, D., Foster, I., Lanfermann, G., Liu, C., Radke, T., Seidel, E., Shalf, J. The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in Grid Environments. International Journal of Supercomputer Applications, 15(4), Winter 2001.

Questions?
Thank you

46
Consistency algorithm

ACMA
Consistency algorithm

Logic operators, such as >, <, =, etc.
attributes
constants
47
To Do

Refine slide 23.

48
Infrastructures
Sites: 27, users: 100, CPUs: 2,700
Sites: 100, users: 25,000, CPUs: 100,000, data: 10 PB
Sites: 298, CPUs: 629
Condor: 1,079 pools, 105,146 CPUs
Nodes: 1.5 M
49
Protocol
50
System Structures

Centralized vs. peer-to-peer vs. super-peer
structure
Reasons to choose super-peer structure
It is necessary to aggregate computer information
to process polyvalent queries efficiently
Balance between scalability and efficiency
Suitable for queries with high selectivity

51
Incremental Cluster Algorithm

Number of Measurements N Log(N)

52
Benchmark

Three relations A, B, and C, each with two attributes, K1000 and K10000
Values of K1000 (K10000) are uniformly distributed from 1 to 1000 (10000) (Wisconsin benchmark)
Values of K1000 (K10000) follow a normal distribution with mean 500 and standard deviation 250 (mean 5000 and standard deviation 2500)
Query
SELECT * FROM A, B, C WHERE
A.K1000 + B.K1000 + C.K1000 > N1
AND
A.K1000 + B.K1000 + C.K1000 < N2
AND
A.K10000 + B.K10000 + C.K10000 > N3
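Below is a sketch of the benchmark setup. It assumes that the two distributions define alternative data sets and that normal draws are rounded and clipped to the valid range. `make_relation` and `count_matches` are hypothetical. The naive triple loop corresponds to a plan without any pruning:

```python
import random
from itertools import product
from typing import Dict, List

Relation = List[Dict[str, int]]


def make_relation(n_rows: int, dist: str, rng: random.Random) -> Relation:
    """Rows with attributes K1000 and K10000 drawn as on the slide."""
    def draw(k: int) -> int:
        if dist == "uniform":
            return rng.randint(1, k)  # uniform on 1..k (Wisconsin style)
        # Normal with mean k/2 and standard deviation k/4; rounding and
        # clipping to 1..k are added assumptions.
        return min(k, max(1, round(rng.gauss(k / 2, k / 4))))
    return [{"K1000": draw(1000), "K10000": draw(10000)} for _ in range(n_rows)]


def count_matches(a: Relation, b: Relation, c: Relation,
                  n1: int, n2: int, n3: int) -> int:
    """Naive evaluation of the benchmark query over A x B x C."""
    return sum(
        1
        for x, y, z in product(a, b, c)
        if n1 < x["K1000"] + y["K1000"] + z["K1000"] < n2
        and x["K10000"] + y["K10000"] + z["K10000"] > n3
    )


rng = random.Random(0)
A, B, C = (make_relation(20, "normal", rng) for _ in range(3))
print(count_matches(A, B, C, n1=1400, n2=1600, n3=15000))
```

Varying N1, N2, and N3 changes the query's selectivity. This is how such a benchmark can contrast plans that read every combination with plans that prune early.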

53
New Features

Resources may show different properties or
different access policies to different users
Condition structure
Option structure
Queries for resource sets
constraints on aggregation properties of resource
set, such as connected(), etc

A Constraint Language Approach to Matchmaking.
Liu, C., Foster, I., Proceedings of the 14th
International Workshop on Research Issues on Data
Engineering (RIDE 2004), Boston, 2004.
54
Cluster Structure of Resources on PlanetLab