Title: Querying the Internet with PIER (PIER = Peer-to-peer Information Exchange and Retrieval)
1 Querying the Internet with PIER (PIER = Peer-to-peer Information Exchange and Retrieval)
2 What is PIER?
- Peer-to-Peer Information Exchange and Retrieval
- A query engine that runs on top of a P2P network
- A step toward distributed query processing at a much larger scale
- A way to query massively distributed, heterogeneous data
- An architecture that marries traditional database query processing with recent peer-to-peer technologies
3
- Key goal: a scalable indexing system for large-scale decentralized storage applications on the Internet
- Examples: P2P, large-scale storage management systems (OceanStore, Publius), wide-area name resolution services
4 What is Very Large? Depends on Who You Are
- Internet-scale systems vs. hundred-node systems
  - Database community: hundred-node systems
  - Network community: Internet-scale systems
- How to run DB-style queries at Internet scale!
5 What are the Key Properties?
- Lots of data that is
  - Naturally distributed (stored where it is generated)
  - Undesirable to collect centrally
  - Homogeneous in schema
  - More useful when viewed as a whole
6 Who Needs Internet Scale? Example 1: Filenames
- Simple, ubiquitous schemas
  - Filenames, sizes, ID3 tags
- Born from early P2P systems such as Napster, Gnutella, etc.
- Content is shared by normal, non-expert home users
- Systems were built by a few individuals in their garages → low barrier to entry
7 Example 2: Network Traces
- Schemas are mostly standardized
  - IP, SMTP, HTTP, SNMP log formats
- Network administrators look for patterns within their site AND across other sites
  - DoS attacks cross administrative boundaries
  - Tracking virus/worm infections
- Timeliness is very helpful
- It might surprise you how useful this is
  - Network bandwidth on PlanetLab (a world-wide distributed research testbed) is mostly filled by people monitoring network status
8 Our Challenge
- Our focus is on the challenge of scale
- Applications are homogeneous and distributed
- Already have significant interest
- Provide a flexible framework for a wide variety of applications
9 Four Design Principles (I)
- Relaxed consistency
  - ACID transactions severely limit the scalability and availability of distributed databases
  - We provide best-effort results instead
- Organic scaling
  - Applications may start small, without a priori knowledge of their eventual size
10 Four Design Principles (II)
- Natural habitat
  - No CREATE TABLE/INSERT
  - No publishing to a web server
  - Wrappers or gateways allow the information to be accessed where it is created
- Standard schemas via grassroots software
  - Data is produced by widespread software, providing a de facto schema to utilize
11 >> based on CAN
12 Applications
- P2P databases
  - Highly distributed and available data
- Network monitoring
  - Intrusion detection
  - Fingerprint queries
13 DHTs
- Implemented with CAN (Content Addressable Network)
- Each node is identified by a hyper-rectangle (zone) in d-dimensional space
- A key is hashed to a point and stored at the node whose zone contains that point
- A routing table of O(d) neighbours is maintained
14 Given a message with an ID, route the message to the computer currently responsible for that ID
[Figure: a 2-D CAN coordinate space spanning (0,0) to (16,16), partitioned into zones]
15 DHT Design
- Routing layer
  - Mapping for keys (dynamic as nodes leave and join)
- Storage manager
  - Storage for DHT-based data
- Provider
  - Storage access interface for higher levels
16 DHT Routing
- Routing layer
  - Maps a key to the IP address of the node currently responsible for that key
  - Provides exact lookups; calls back to higher levels when the set of keys the node is responsible for changes
- Routing layer API
  - lookup(key) → ipaddr (asynchronous)
  - join(landmarkNode)
  - leave()
  - locationMapChange()
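To make the shape of this API concrete, here is a minimal sketch of the routing layer as a Python interface. The method names follow the slide; the callback-based signatures and everything else are illustrative assumptions, not PIER's actual code.

```python
from abc import ABC, abstractmethod
from typing import Callable

class RoutingLayer(ABC):
    """Sketch of the DHT routing-layer API listed above (hypothetical signatures)."""

    @abstractmethod
    def lookup(self, key: bytes, on_result: Callable[[str], None]) -> None:
        """Asynchronously resolve `key` to the IP address of the responsible node,
        delivering the result through the `on_result` callback."""

    @abstractmethod
    def join(self, landmark_node: str) -> None:
        """Join the overlay by contacting a known landmark node."""

    @abstractmethod
    def leave(self) -> None:
        """Leave the overlay gracefully."""

    @abstractmethod
    def location_map_change(self, callback: Callable[[], None]) -> None:
        """Register a callback fired when the set of keys this node owns changes."""
```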
17 DHT Storage
- The storage manager stores and retrieves records, which consist of key/value pairs
- Keys are used to locate items and can be any supported data type or structure
- Storage Manager API
  - store(key, item)
  - retrieve(key) → item
  - remove(key)
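A minimal in-memory sketch of such a storage manager (a single-node illustration; the real component also deals with item lifetimes and soft state):

```python
from collections import defaultdict
from typing import Any, Hashable

class StorageManager:
    """Sketch of the local storage manager: a per-node map from keys to items."""

    def __init__(self) -> None:
        self._items: dict[Hashable, list[Any]] = defaultdict(list)

    def store(self, key: Hashable, item: Any) -> None:
        self._items[key].append(item)

    def retrieve(self, key: Hashable) -> list[Any]:
        return list(self._items.get(key, []))

    def remove(self, key: Hashable) -> None:
        self._items.pop(key, None)
```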
18 DHT Provider (1)
- Provider
  - Ties the routing and storage manager layers together and provides the interface used by higher levels
- Each object in the DHT has a namespace, resourceID and instanceID
  - DHT key = hash(namespace, resourceID)
  - namespace: application or group of objects, e.g. a table or relation
  - resourceID: primary key or any attribute of the object
  - instanceID: an integer used to separate items with the same namespace and resourceID
  - Lifetime: how long the item is stored
- CAN's mapping of resourceID → object is equivalent to an index
19 DHT Provider (2)
- Provider API
  - get(namespace, resourceID) → item
  - put(namespace, resourceID, item, lifetime)
  - renew(namespace, resourceID, instanceID, lifetime) → bool
  - multicast(namespace, resourceID, item)
  - lscan(namespace) → items
  - newData(namespace, item)
[Figure: table R (the namespace) with tuples 1..n and n1..m partitioned by resourceID (rID1, rID2, rID3) across nodes R1 and R2]
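A minimal sketch of how a provider might sit on top of the routing and storage layers, keyed by hash(namespace, resourceID) as described above. The local-only storage, the lifetime handling and the names are simplifying assumptions; a real put would ship the item to the node returned by the routing layer.

```python
import hashlib
import time
from typing import Any

class Provider:
    """Sketch of the provider tying the routing and storage layers together."""

    def __init__(self, routing, storage) -> None:
        self.routing = routing   # resolves a DHT key to the responsible node
        self.storage = storage   # StorageManager on that node (local here)

    @staticmethod
    def _key(namespace: str, resource_id: str) -> bytes:
        # DHT key = hash(namespace, resourceID)
        return hashlib.sha1(f"{namespace}/{resource_id}".encode()).digest()

    def put(self, namespace: str, resource_id: str, item: Any, lifetime: float) -> None:
        key = self._key(namespace, resource_id)
        # Stored with an expiry time; a real system would route to the owner first.
        self.storage.store(key, (item, time.time() + lifetime))

    def get(self, namespace: str, resource_id: str) -> list[Any]:
        key = self._key(namespace, resource_id)
        now = time.time()
        return [item for item, expiry in self.storage.retrieve(key) if expiry > now]
```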
20 Query Processor
- How does it work?
  - Performs selection, projection, joins, grouping and aggregation → operators
  - Operators push and pull data
  - Multiple operators execute simultaneously, pipelined together
  - Results are produced and queued as quickly as possible
- How does it modify data?
  - Inserts, updates and deletes items via the DHT interface
- How does it select the data to process?
  - A dilated-reachable snapshot: the data published by reachable nodes at query arrival time
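As a rough illustration of operator pipelining, here is a toy pull-style pipeline built from Python generators. PIER's operators actually push and pull data between asynchronous operators over the DHT; the relation, predicate and column names below are made up.

```python
from typing import Callable, Iterable, Iterator

def scan(tuples: Iterable[dict]) -> Iterator[dict]:
    """Leaf operator: stream tuples from local (DHT-resident) storage."""
    yield from tuples

def select(pred: Callable[[dict], bool], child: Iterator[dict]) -> Iterator[dict]:
    """Selection: pass through tuples satisfying the predicate."""
    return (t for t in child if pred(t))

def project(cols: list[str], child: Iterator[dict]) -> Iterator[dict]:
    """Projection: keep only the requested columns."""
    return ({c: t[c] for c in cols} for t in child)

# Toy pipeline: SELECT src FROM packets WHERE dport = 80
packets = [{"src": "10.0.0.1", "dport": 80}, {"src": "10.0.0.2", "dport": 22}]
plan = project(["src"], select(lambda t: t["dport"] == 80, scan(packets)))
print(list(plan))   # [{'src': '10.0.0.1'}]
```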
21 Join Algorithms
- Limited bandwidth
  - Symmetric hash join: rehashes both tables
  - Semi-joins: transfer only the matching tuples
- At around 40% selectivity, the bottleneck switches from the computation nodes to the query sites (a sketch of the symmetric hash join follows below)
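A single-process sketch of the symmetric hash join, assuming dict-based hash tables and two already-delivered input streams. In PIER both relations are rehashed into a shared DHT namespace and the probing happens at whichever node owns each join key.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Any, Iterable, Iterator, Tuple

_SENTINEL = object()

def symmetric_hash_join(
    r: Iterable[Tuple[Any, dict]],   # (join_key, tuple) stream for relation R
    s: Iterable[Tuple[Any, dict]],   # (join_key, tuple) stream for relation S
) -> Iterator[Tuple[dict, dict]]:
    """Build hash tables on *both* inputs and probe the opposite table as each
    tuple arrives, so join results stream out before either input finishes."""
    r_table: dict[Any, list[dict]] = defaultdict(list)
    s_table: dict[Any, list[dict]] = defaultdict(list)
    for r_item, s_item in zip_longest(r, s, fillvalue=_SENTINEL):
        if r_item is not _SENTINEL:
            key, tup = r_item
            r_table[key].append(tup)
            for match in s_table.get(key, []):
                yield tup, match
        if s_item is not _SENTINEL:
            key, tup = s_item
            s_table[key].append(tup)
            for match in r_table.get(key, []):
                yield match, tup
```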
22 Future Research
- Routing, Storage and Layering
- Catalogs and Query Optimization
- Hierarchical Aggregations
- Range Predicates
- Continuous Queries over Streams
- Sharing between Queries
- Semi-structured Data
26 Distributed Hash Tables (DHTs)
- What is a DHT?
  - Take an abstract ID space and partition it among a changing set of computers (nodes)
  - Given a message with an ID, route the message to the computer currently responsible for that ID
  - Messages can be stored at the nodes
  - This behaves like a distributed hash table
    - Provides a put()/get() API
    - Cheap maintenance when nodes come and go
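A toy, single-process illustration of that idea, using a one-dimensional circular ID space split among a fixed set of nodes (CAN, which PIER uses, partitions a d-dimensional space instead; the node and key names are made up):

```python
import hashlib
from bisect import bisect_right

class ToyDHT:
    """Toy DHT: a circular ID space partitioned among named nodes."""

    def __init__(self, nodes: list[str]) -> None:
        # Each node is responsible for IDs up to (and wrapping past) its position.
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._store: dict[str, dict] = {n: {} for n in nodes}

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def _owner(self, key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        idx = bisect_right(positions, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def put(self, key: str, value: dict) -> None:
        self._store[self._owner(key)][key] = value

    def get(self, key: str):
        return self._store[self._owner(key)].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("filename:song.mp3", {"size": 4_200_000})
print(dht.get("filename:song.mp3"))   # {'size': 4200000}
```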
27 Distributed Hash Tables (DHTs)
- Lots of effort is put into making DHTs better
  - Scalable (thousands → millions of nodes)
  - Resilient to failure
  - Secure (anonymity, encryption, etc.)
  - Efficient (fast access with minimal state)
  - Load balanced
  - etc.
28 PIER's Three Uses for DHTs
- A single elegant mechanism with many uses
  - Search: index
    - Like a hash index
  - Partitioning: value (key)-based routing
    - Like Gamma/Volcano
  - Routing: network routing for QP messages
    - Query dissemination
    - Bloom filters
    - Hierarchical QP operators (aggregation, join, etc.)
- It is not clear there is another substrate that supports all these uses
29 Metrics
- We are primarily interested in 3 metrics
  - Answer quality (recall and precision)
  - Bandwidth utilization
  - Latency
- Different DHTs provide different properties
  - Resilience to failures (recovery time) → answer quality
  - Path length → bandwidth and latency
  - Path convergence → bandwidth and latency
- Different QP join strategies
  - Symmetric hash join, fetch matches, symmetric semi-join, Bloom filters, etc.
- Big-picture tradeoff: bandwidth (extra rehashing) vs. latency
30 Symmetric Hash Join (SHJ)
31 Fetch Matches (FM)
32 Symmetric Semi-Join (SSJ)
- Both R and S are projected to save bandwidth
- The complete R and S tuples are fetched in parallel to improve latency
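A rough sketch of that idea, reusing the symmetric_hash_join sketch from slide 21: join only the narrow (join key, tuple id) projections, then fetch the full tuples for the survivors. The fetch_r/fetch_s callables stand in for DHT get() calls and are assumptions.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

def symmetric_semi_join(
    r_proj: Iterable[Tuple[Any, Any]],   # (join_key, r_tuple_id) projections of R
    s_proj: Iterable[Tuple[Any, Any]],   # (join_key, s_tuple_id) projections of S
    fetch_r: Callable[[Any], dict],      # fetch the full R tuple by id (e.g. a DHT get)
    fetch_s: Callable[[Any], dict],      # fetch the full S tuple by id (e.g. a DHT get)
) -> Iterator[Tuple[dict, dict]]:
    """Join the cheap-to-ship projections first, then fetch the complete tuples
    only for matching pairs; the two fetches could be issued in parallel.
    Assumes symmetric_hash_join from the earlier sketch is in scope."""
    for r_id, s_id in symmetric_hash_join(r_proj, s_proj):
        yield fetch_r(r_id), fetch_s(s_id)
```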
36 Overview
- CAN is a distributed system that maps keys onto values
- Keys are hashed into a d-dimensional space
- Interface
  - insert(key, value)
  - retrieve(key)
37 Overview
[Figure: state of the system at time t — a 2-dimensional coordinate space (x and y axes) divided into zones, with peers and resources marked]
In this 2-dimensional space a key is mapped to a point (x, y)
38 DESIGN
- A d-dimensional Cartesian coordinate space (a d-torus)
- Every node owns a distinct zone
- A key k1 is mapped onto a point p1 using a uniform hash function
- (k1, v1) is stored at the node Nx that owns the zone containing p1
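A small sketch of that mapping, assuming a 2-dimensional unit space and coordinates derived from slices of a SHA-1 digest; the hard-coded zones and node names are illustrative assumptions, not CAN's actual layout.

```python
import hashlib

def hash_to_point(key: str, d: int = 2) -> tuple[float, ...]:
    """Uniformly hash a key to a point in the d-dimensional unit space [0, 1)^d."""
    digest = hashlib.sha1(key.encode()).digest()
    # Use 4 bytes of the digest per coordinate (assumes d <= 5 here).
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32 for i in range(d)
    )

class Zone:
    """A hyper-rectangle [lo_i, hi_i) in each dimension, owned by one node."""

    def __init__(self, lo: tuple[float, ...], hi: tuple[float, ...]) -> None:
        self.lo, self.hi = lo, hi

    def contains(self, p: tuple[float, ...]) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

# (k1, v1) is stored at the node whose zone contains p1 = hash_to_point(k1)
p1 = hash_to_point("k1")
left_half, right_half = Zone((0.0, 0.0), (0.5, 1.0)), Zone((0.5, 0.0), (1.0, 1.0))
owner = "node-A" if left_half.contains(p1) else "node-B"
```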
39
- Each node maintains a routing table with its neighbors
- Example: node A holds B, C, D, E
- Routing follows the straight-line path through the Cartesian space
40 Routing
- A d-dimensional space with n zones
- Two zones are neighbors if they overlap in d-1 dimensions
- Routing path length: O(d · n^(1/d)) hops on average
- Algorithm
  - Forward to the neighbor nearest to the destination point
[Figure: a query Q(x,y) routed hop by hop across zones toward the peer that owns the point (x,y)]
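A greatly simplified sketch of that greedy forwarding rule, reusing the Zone sketch above. Each hop picks the neighbor whose zone center is closest to the destination; the torus wrap-around, tie-breaking and failure handling of real CAN routing are omitted, and the helper names are assumptions.

```python
import math

def zone_center(zone) -> tuple[float, ...]:
    """Center point of a Zone (midpoint of each dimension's interval)."""
    return tuple((l + h) / 2 for l, h in zip(zone.lo, zone.hi))

def route(dest_point, current, neighbors, zone_of):
    """Greedy CAN-style routing toward the node whose zone contains dest_point.
    `neighbors(node)` lists a node's neighbors; `zone_of(node)` returns its Zone."""
    path = [current]
    while not zone_of(current).contains(dest_point):
        current = min(
            neighbors(current),
            key=lambda n: math.dist(zone_center(zone_of(n)), dest_point),
        )
        path.append(current)
    return path
```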
41 CAN construction
[Figure: a new node and the bootstrap node in the CAN coordinate space]
42 CAN construction
1) Discover some node I already in the CAN (via the bootstrap node)
43 CAN construction
2) Pick a random point (x,y) in the space
44 CAN construction
3) I routes to (x,y) and discovers node J, the current owner of that point
45 CAN construction
4) Split J's zone in half; the new node owns one half
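A condensed sketch of that join procedure, building on the Zone sketch above; the split is always along dimension 0 for simplicity (CAN cycles through dimensions), and route_to_owner stands in for the routing step.

```python
import random

def split_zone(zone: Zone) -> tuple[Zone, Zone]:
    """Split a zone in half along dimension 0 (CAN alternates split dimensions)."""
    mid = (zone.lo[0] + zone.hi[0]) / 2
    return Zone(zone.lo, (mid,) + zone.hi[1:]), Zone((mid,) + zone.lo[1:], zone.hi)

def join(new_node, bootstrap_node, zones, route_to_owner, d=2):
    """Sketch of a node join: 1) contact the bootstrap node, 2) pick a random
    point, 3) route to that point's current owner J, 4) split J's zone in half
    and give one half to the new node. `zones` maps node -> Zone."""
    p = tuple(random.random() for _ in range(d))               # 2) random point in [0,1)^d
    owner = route_to_owner(bootstrap_node, p)                  # 1) + 3) find the owner J
    zones[owner], zones[new_node] = split_zone(zones[owner])   # 4) split and share
```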
46 Maintenance
- Use zone takeover when a node fails or leaves
- At a discrete time interval t, send your neighbor table to your neighbors to show that you are alive
- If a neighbor has not reported being alive within time t, take over its zone
- Zone reassignment is then needed
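A toy sketch of that liveness check, assuming a fixed interval and in-memory timestamps; the real protocol exchanges full neighbor tables and coordinates the takeover among the failed node's neighbors.

```python
import time

T = 5.0                               # heartbeat interval in seconds (assumption)
last_seen: dict[str, float] = {}      # neighbor -> time of its last heartbeat

def on_heartbeat(neighbor: str, neighbor_table: list[str]) -> None:
    """Record that `neighbor` is alive (it just sent us its neighbor table)."""
    last_seen[neighbor] = time.time()

def presumed_failed(now: float) -> list[str]:
    """Neighbors with no heartbeat in the last interval; their zones become
    candidates for takeover and later zone reassignment."""
    return [n for n, t in last_seen.items() if now - t > T]
```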
47 Node Departure
- Someone has to take over the zone
- The zone is explicitly handed over to one of its neighbors
  - Merge it into a valid zone if possible
  - If that is not possible, the two zones are temporarily handled by the smallest neighbor
48 Zone reassignment
[Figure: partition tree and the corresponding zoning for zones 1, 2, 3, 4]
49 Zone reassignment
[Figure: partition tree and zoning after reassignment, with zones 1, 3 and 4 remaining]
50 Design Improvements
- Multi-Dimension
- Multi-Coordinate Spaces
- Overloading the Zones
- Multiple Hash Functions
- Topologically Sensitive Construction
- Uniform Partitioning
- Caching
51 Multi-Dimension
- Increasing the number of dimensions reduces the path length
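A quick back-of-the-envelope check of that claim, using CAN's average path length of roughly (d/4)·n^(1/d) hops; the node count is just an example.

```python
n = 1_000_000                       # example: one million nodes
for d in (2, 4, 10):
    hops = (d / 4) * n ** (1 / d)   # CAN's average routing path length in hops
    print(f"d={d}: ~{hops:.0f} hops")
# d=2: ~500 hops, d=4: ~32 hops, d=10: ~10 hops
```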
52 Multi-Coordinate Spaces
- Maintain multiple coordinate spaces
- Each node is assigned a different zone in each of them
- Increases availability and reduces the path length
53 Overloading the Zones
- More than one peer is assigned to each zone
- Increases availability
- Reduces the path length
- Reduces per-hop latency
54 Uniform Partitioning
- Instead of directly splitting the occupant node's zone on a join
- Compare the volume of its zone with its neighbors' zones
- The zone that is split is the one with the biggest volume