Peer-to-Peer (P2P) and Sensor Networks


Slides: 88

Provided by: ShivkumarK7


1
Peer-to-Peer (P2P) and Sensor Networks

Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
shivkuma@ecse.rpi.edu
http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Don Towsley, Ion
Stoica, Scott Shenker, Joe Hellerstein, Jim
Kurose, Hung-Chang Hsiao, Chung-Ta King

2
Overview

P2P networks: Napster, Gnutella, Kazaa
Distributed Hash Tables (DHTs)
Database perspectives: data-centricity, data independence
Sensor networks and their connection to P2P

3
P2P: Key Idea

Share the content, storage and bandwidth of
individual (home) users

6
What is P2P (Peer-to-Peer)?

P2P as a mindset: Slashdot
P2P as a model: Gnutella
P2P as an implementation choice: application-layer multicast
P2P as an inherent property: ad-hoc networks

7
P2P Application Taxonomy
P2P systems:
Distributed computing: SETI@home
File sharing: Gnutella
Collaboration: Jabber
Platforms: JXTA
8
How to Find an Object in a Network?
9
A Straightforward Idea
Use a BIG server
Store the object
Provide a directory
How to do it in a distributed way?
10
Why Distributed?

Client-server model
Client is dumb
Server does most things (compute, store, control)
Centralization makes things simple, but introduces:
Single point of failure, performance bottleneck, tighter control, access fees and management cost, ad hoc participation?
Estimate of networked PCs:
10 billion MHz of CPU
10,000 terabytes of storage
Clients are not that dumb after all
Use the resources in the clients (at net edges)

13
First Idea: Napster

Distributing objects, centralizing directory

17
Today: P2P video traffic is dominant

Source: CacheLogic
Video, BitTorrent, eDonkey!

18
40-60% P2P traffic
19
2006 P2P Data

Between 50 and 65 percent of all download traffic is P2P-related. Between 75 and 90 percent of all upload traffic is P2P-related.
And it seems that more people are using P2P today:
In 2004, one CacheLogic server registered 3 million IP addresses in 30 days. In 2006, one CacheLogic server registered 3 million IP addresses in 8 days.
So what do people download?
61.4 percent video, 11.3 percent audio, 27.2 percent games/software/etc.
The average file size of shared files is 1 gigabyte!
Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/

21
A More Aggressive Idea

Distributing objects and directory

How to find objects without a directory?
Blind flooding!
23
Gnutella

Distribute file location
Idea: flood the request
How to find a file:
Send request to all neighbors
Neighbors recursively multicast the request
Eventually a machine that has the file receives the request, and it sends back the answer
Advantages
Totally decentralized, highly robust
Disadvantages
Not scalable: the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
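The flooding procedure above can be sketched as a breadth-first search in which each hop decrements the query's TTL. This is a toy model under stated assumptions, not the Gnutella wire protocol: the overlay graph, the function name `flood_query`, and the one-message-per-forward cost model are all hypothetical. Each node remembers queries it has already seen, so it forwards a given query at most once.

```python
from collections import deque

def flood_query(graph, source, holders, ttl):
    """Flood a query from `source` over the overlay `graph` (node -> neighbors).

    Returns (hits, messages): the nodes holding the file that the query reached
    within `ttl` hops, and the number of query messages sent.
    """
    seen = {source}                  # nodes that have already seen this query
    frontier = deque([(source, ttl)])
    hits, messages = set(), 0
    while frontier:
        node, remaining = frontier.popleft()
        if node in holders:
            hits.add(node)           # this node would send an answer back
        if remaining == 0:
            continue                 # TTL exhausted: do not forward further
        for nbr in graph[node]:
            messages += 1            # every forward costs a message, even duplicates
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, remaining - 1))
    return hits, messages

# Hypothetical overlay: the only copy of the file sits on E, three hops from A.
overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
print(flood_query(overlay, "A", {"E"}, ttl=2))  # (set(), 6): file not found
print(flood_query(overlay, "A", {"E"}, ttl=3))  # ({'E'}, 9)
```

Raising the TTL from 2 to 3 is what lets the query reach E, at the price of more messages: the TTL bounds the traffic, but it also means there is no guarantee of finding an item that lies farther away.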

24
Gnutella: Unstructured P2P

Ad-hoc topology
Queries are flooded for a bounded number of hops
No guarantees on recall

[Figure: Query "xyz" flooded through the overlay toward the node holding xyz]
25
Now: BitTorrent, eDonkey2000! (2006)
26
Lessons and Limitations

Client-Server performs well
But not always feasible
Ideal performance is often not the key issue!
Things that flood-based systems do well
Organic scaling
Decentralization of visibility and liability
Finding popular stuff
Fancy local queries
Things that flood-based systems do poorly
Finding unpopular stuff [Loo et al., VLDB '04]
Fancy distributed queries
Vulnerabilities data poisoning, tracking, etc.
Guarantees about anything (answer quality,
privacy, etc.)

27
Detour: BitTorrent
30
BitTorrent: joining a torrent
[Figure labels: metadata file, join, peer list, data request]

Peers are divided into:
seeds: have the entire file
leechers: still downloading

1. obtain the metadata file
2. contact the tracker
3. obtain a peer list (contains seeds and leechers)
4. contact peers from that list for data
31
BitTorrent: exchanging data
[Figure: peers exchanging "I have" messages]
Verify pieces using hashes
Download sub-pieces in parallel
Advertise received pieces to the entire peer list
Look for the rarest pieces
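The data-exchange rules above can be sketched as two small functions: hash verification of a received piece and rarest-first selection over the pieces that peers have advertised. This is an illustrative sketch, not a real client; the function names and the dictionary-of-sets representation of the peers' "I have" advertisements are hypothetical, and sub-piece pipelining is omitted. SHA-1 is the per-piece hash used by the original (v1) BitTorrent metadata format.

```python
import hashlib
from collections import Counter

def piece_digest(data: bytes) -> bytes:
    # BitTorrent v1 metadata stores a 20-byte SHA-1 digest for every piece
    return hashlib.sha1(data).digest()

def verify_piece(data: bytes, expected: bytes) -> bool:
    """Accept a downloaded piece only if it matches the digest from the metadata."""
    return piece_digest(data) == expected

def rarest_first(have, peer_pieces):
    """Choose the missing piece advertised by the fewest peers (ties: lowest index).

    have: set of piece indices we already hold
    peer_pieces: peer id -> set of piece indices that peer advertised ("I have")
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)        # how many peers hold each piece
    missing = [p for p in counts if p not in have]
    if not missing:
        return None                  # nothing new is available from these peers
    return min(missing, key=lambda p: (counts[p], p))

advertised = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {1, 3}}
print(rarest_first({0}, advertised))  # 2: pieces 2 and 3 have one copy each; lower index wins
```

Fetching rare pieces first spreads them before their few holders leave, which keeps every piece available in the swarm.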
32
BitTorrent: unchoking
Periodically calculate data-receiving rates
Upload to (unchoke) the fastest downloaders
Optimistic unchoking: periodically select a peer at random and upload to it, to continuously look for the fastest partners
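One unchoking round, as described above, can be sketched as "keep the fastest peers, plus one random extra". The number of regular slots (3 here), the function name, and the rate units are hypothetical choices for illustration, not values taken from the BitTorrent specification.

```python
import random

def choose_unchoked(rates, regular_slots=3, rng=random):
    """Pick the peers to upload to for the next period.

    rates: peer id -> data-receiving rate measured from that peer (e.g. bytes/s)
    regular_slots: hypothetical number of peers unchoked purely on rate
    """
    by_rate = sorted(rates, key=rates.get, reverse=True)
    unchoked = set(by_rate[:regular_slots])   # reciprocate: fastest peers first
    choked = by_rate[regular_slots:]
    if choked:
        unchoked.add(rng.choice(choked))      # optimistic unchoke: one random choked peer
    return unchoked

rates = {"a": 50, "b": 10, "c": 40, "d": 30, "e": 5}
picked = choose_unchoked(rates, rng=random.Random(7))
print(sorted(picked))  # a, c, d, plus one of b or e chosen at random
```

Calling this once per period with fresh rate measurements is what lets an optimistically unchoked peer that turns out to be fast earn a regular slot in a later round.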
33
End of Detour
34
Back to P2P Structures

Unstructured P2P architecture
Napster, Gnutella, Freenet
No logically deterministic structures to organize the participating peers
No guarantee that objects will be found
How to find objects within a bounded number of hops?
Extend hashing
Structured P2P architecture
CAN, Chord, Pastry, Tapestry, Tornado, etc.
Viewed as a distributed hash table for the directory

35
How to Bound Search Quality?

Many ideas, again

Work on placement!
36
High-Level Idea: Indirection

Indirection in space
Logical (content-based) IDs, routing to those IDs
Content-addressable network
Tolerant of churn: nodes joining and leaving the network
Indirection in time
Want some scheme to temporally decouple send and receive
Persistence required. Typical Internet solution: soft state
Combination of persistence via storage and via retry
Publisher requests TTL on storage
Republishes as needed
Metaphor: Distributed Hash Table
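Soft state with a publisher-chosen TTL can be sketched as a store whose entries expire unless they are refreshed. This is a single-node sketch under simplifying assumptions: time is passed in explicitly as `now` (in seconds) to keep it deterministic, and the class and method names are hypothetical rather than any particular DHT's API.

```python
class SoftStateStore:
    """Key-value storage whose entries expire unless the publisher refreshes them."""

    def __init__(self):
        self._entries = {}                     # key -> (value, expiry time)

    def put(self, key, value, ttl, now):
        # A (re)publish simply resets the expiry time.
        self._entries[key] = (value, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires = entry
        if now >= expires:
            del self._entries[key]             # state silently disappears after its TTL
            return None
        return value

store = SoftStateStore()
store.put("song.mp3", "peer-17", ttl=60, now=0)    # publisher requests a 60 s TTL
store.put("song.mp3", "peer-17", ttl=60, now=50)   # ...and republishes before it lapses
print(store.get("song.mp3", now=100))              # peer-17 (would be None without the republish)
```

Nothing ever has to delete stale entries explicitly: if the publisher disappears, its state times out on its own.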

37
Basic Idea
[Figure: in the P2P network, object y is published at key H(y) and peer x joins at key H(x), both on the same hash-key axis]
Objects have hash keys
Peer nodes also have hash keys in the same hash space
Place each object at the peer with the closest hash key
38
Distributed Hash Tables (DHTs)

Abstraction: a distributed hash-table data structure
insert(id, item)
item = query(id) (or lookup(id))
Note: item can be anything: a data object, document, file, pointer to a file
Proposals
CAN, Chord, Kademlia, Pastry, Tapestry, etc.
Goals
Make sure that an identified item (file) is always found
Scales to hundreds of thousands of nodes
Handles rapid arrival and failure of nodes

39
Viewed as a Distributed Hash Table
[Figure: hash table with keys from 0 to $2^{128}-1$, divided among peer nodes]
Each peer is responsible for a range of the hash table, according to its hash key
Objects are placed at the peer with the closest key
Note that peers are Internet edges
40
How to Find an Object?
[Figure: hash table with keys from 0 to $2^{128}-1$, divided among peer nodes]
Simplest idea: everyone knows everyone else!
One hop to find the object
But we want to keep only a few entries!
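The "everyone knows everyone else" idea, combined with insert/query and closest-key placement from the preceding slides, can be sketched as a global-view simulation. Assumptions: MD5 is used only because it yields 128-bit keys matching a key space of 0 to 2^128 - 1, distance wraps around the key space, and the names `H`, `ToyDHT`, `insert`, and `query` are hypothetical. Items are not migrated when a later peer joins, and there is no routing: that is exactly the cost (a full membership table) that real DHTs avoid.

```python
import hashlib

SPACE = 2 ** 128                         # key space 0 .. 2^128 - 1

def H(name: str) -> int:
    # MD5 happens to produce 128 bits; any uniform hash would do for this sketch
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")

def key_distance(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, SPACE - d)             # treat the key space as a circle

class ToyDHT:
    """Global-view simulation: every call can see all peers (one hop, no routing)."""

    def __init__(self):
        self.peers = {}                  # peer key -> that peer's local storage

    def join(self, peer_name):
        self.peers[H(peer_name)] = {}    # note: existing items are not migrated

    def _closest_peer(self, key):
        return min(self.peers, key=lambda p: key_distance(p, key))

    def insert(self, obj_name, item):    # "publish"
        key = H(obj_name)
        self.peers[self._closest_peer(key)][key] = item

    def query(self, obj_name):           # "lookup"
        key = H(obj_name)
        return self.peers[self._closest_peer(key)].get(key)

dht = ToyDHT()
for name in ["peer-a", "peer-b", "peer-c"]:
    dht.join(name)
dht.insert("song.mp3", "pointer to song.mp3 on some host")
print(dht.query("song.mp3"))             # pointer to song.mp3 on some host
```

Because inserter and querier compute the same hash and the same closest peer, no directory server is needed; the remaining problem is reaching that peer without knowing every other peer.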
41
Structured Networks

Distributed Hash Tables (DHTs)
Hash table interface: put(key, item), get(key)
O(log n) hops
Guarantees on recall

42
Content Addressable Network, CAN

Distributed hash table
Hash table as a Cartesian coordinate space
A peer only needs to know its logical neighbors
Dimensional-ordered multihop routing

43
Content Addressable Network (CAN)

Associate to each node and item a unique id in a d-dimensional Cartesian space on a d-torus
Properties
Routing table size O(d)
Guarantees that a file is found in at most $d\,n^{1/d}$ steps, where n is the total number of nodes
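Plugging numbers into the path-length bound above shows how the dimension d trades per-node routing state, which grows as O(d), against path length. The network size n = 10^6 is a hypothetical value chosen to give round numbers.

```latex
% Bound d n^{1/d} on lookup steps for a hypothetical network of n = 10^6 nodes
\begin{aligned}
d = 2:&\quad 2\,(10^6)^{1/2} = 2 \cdot 1000 = 2000 \text{ steps}\\
d = 3:&\quad 3\,(10^6)^{1/3} = 3 \cdot 100  = 300  \text{ steps}\\
d = 6:&\quad 6\,(10^6)^{1/6} = 6 \cdot 10   = 60   \text{ steps}
\end{aligned}
```

Raising d shrinks the worst-case path quickly, while each node only has to keep track of a few more neighbors.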

44
CAN Example Two Dimensional Space

Space divided between nodes
All nodes together cover the entire space
Each node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
Example
Node n1(1, 2): first node that joins → covers the entire space

[Figure: 8-by-8 grid, coordinates 0 to 7; n1 owns the entire space]
45
CAN Example Two Dimensional Space

Node n2(4, 2) joins → space is divided between n1 and n2

[Figure: 8-by-8 grid, coordinates 0 to 7; space split between n1 and n2]
46
CAN Example Two Dimensional Space

Node n3(3, 5) joins

[Figure: 8-by-8 grid, coordinates 0 to 7; zones of n1, n2, n3]
47
CAN Example Two Dimensional Space

Nodes n4(5, 5) and n5(6, 6) join

[Figure: 8-by-8 grid, coordinates 0 to 7; zones of n1 to n5]
48
CAN Example Two Dimensional Space

Nodes: n1(1, 2), n2(4, 2), n3(3, 5), n4(5, 5), n5(6, 6)
Items: f1(2, 3), f2(5, 1), f3(2, 1), f4(7, 5)

[Figure: 8-by-8 grid, coordinates 0 to 7; nodes n1 to n5 and items f1 to f4 at their coordinates]
49
CAN Example Two Dimensional Space

Each item is stored by the node that owns its point in the space

[Figure: 8-by-8 grid, coordinates 0 to 7; items f1 to f4 inside the zones of the nodes that store them]
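The join sequence above can be replayed in code under assumptions the slides do not state: coordinates range over [0, 8) x [0, 8), a zone is split in half along its longer side (which preserves the square, 1:2, or 2:1 shapes mentioned earlier), and the previous owner keeps the half containing its own point. The names `join`, `split`, and `owner_of` are hypothetical, and real CAN has a joining node pick a random point rather than a chosen coordinate.

```python
SIDE = 8  # hypothetical coordinate space [0, 8) x [0, 8), matching the 0..7 grid

def contains(zone, p):
    x0, x1, y0, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1

def split(zone):
    """Halve a zone along its longer side (x on ties), keeping 1:1, 1:2 or 2:1 shapes."""
    x0, x1, y0, y1 = zone
    if x1 - x0 >= y1 - y0:
        mid = (x0 + x1) // 2
        return (x0, mid, y0, y1), (mid, x1, y0, y1)
    mid = (y0 + y1) // 2
    return (x0, x1, y0, mid), (x0, x1, mid, y1)

def join(nodes, name, point):
    """nodes: name -> (point, zone). The newcomer takes half of the zone it lands in."""
    if not nodes:
        nodes[name] = (point, (0, SIDE, 0, SIDE))   # first node covers the whole space
        return
    owner = next(n for n, (_, z) in nodes.items() if contains(z, point))
    owner_point, zone = nodes[owner]
    low, high = split(zone)
    # Assumption: the old owner keeps the half containing its own point.
    if contains(low, owner_point):
        nodes[owner], nodes[name] = (owner_point, low), (point, high)
    else:
        nodes[owner], nodes[name] = (owner_point, high), (point, low)

def owner_of(nodes, item_point):
    """An item is stored by the node whose zone contains the item's point."""
    return next(n for n, (_, z) in nodes.items() if contains(z, item_point))

nodes = {}
for name, pt in [("n1", (1, 2)), ("n2", (4, 2)), ("n3", (3, 5)), ("n4", (5, 5)), ("n5", (6, 6))]:
    join(nodes, name, pt)
items = {"f1": (2, 3), "f2": (5, 1), "f3": (2, 1), "f4": (7, 5)}
print({f: owner_of(nodes, p) for f, p in items.items()})
# {'f1': 'n1', 'f2': 'n2', 'f3': 'n1', 'f4': 'n5'}
```

Under these assumptions n1 ends up with [0, 4) x [0, 4), n3 with [0, 4) x [4, 8), n2 with [4, 8) x [0, 4), n4 with [4, 6) x [4, 8), and n5 with [6, 8) x [4, 8); the zone boundaries in the original figure may differ.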
50
CAN Query Example

Each node knows its neighbors in the d-space
Forward query to the neighbor that is closest to
the query id
Example: assume n1 queries f4
Can route around some failures

[Figure: 8-by-8 grid, coordinates 0 to 7; nodes n1 to n5 and items f1 to f4; the query from n1 travels toward f4]
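Greedy forwarding to the neighbor closest to the query point can be sketched as follows. The zones below are hypothetical: they assume each join halved a zone along its longer side, since the figure's exact boundaries did not survive. Wrap-around on the torus is ignored, and the helper names are illustrative only.

```python
# Hypothetical zones (x0, x1, y0, y1) for n1..n5 in an [0, 8) x [0, 8) space.
ZONES = {"n1": (0, 4, 0, 4), "n2": (4, 8, 0, 4), "n3": (0, 4, 4, 8),
         "n4": (4, 6, 4, 8), "n5": (6, 8, 4, 8)}

def contains(z, p):
    return z[0] <= p[0] < z[1] and z[2] <= p[1] < z[3]

def dist2(z, p):
    """Squared distance from point p to zone z (0 if p is inside)."""
    dx = max(z[0] - p[0], 0, p[0] - z[1])
    dy = max(z[2] - p[1], 0, p[1] - z[3])
    return dx * dx + dy * dy

def overlaps(a0, a1, b0, b1):
    return min(a1, b1) - max(a0, b0) > 0

def are_neighbors(a, b):
    """Zones are neighbors if they abut along one axis and overlap along the other."""
    abut_x = (a[1] == b[0] or b[1] == a[0]) and overlaps(a[2], a[3], b[2], b[3])
    abut_y = (a[3] == b[2] or b[3] == a[2]) and overlaps(a[0], a[1], b[0], b[1])
    return abut_x or abut_y

def route(zones, start, target):
    """Greedy CAN-style routing: hand the query to the neighbor closest to `target`."""
    path, current = [start], start
    while not contains(zones[current], target):
        nbrs = [n for n in zones if n != current and are_neighbors(zones[current], zones[n])]
        best = min(nbrs, key=lambda n: dist2(zones[n], target))
        if dist2(zones[best], target) >= dist2(zones[current], target):
            break                        # no neighbor is closer: greedy routing is stuck
        path.append(best)
        current = best
    return path

print(route(ZONES, "n1", (7, 5)))        # n1 queries f4 at (7, 5): ['n1', 'n2', 'n5']
```

Each node only consults its own neighbor list, which is why the routing table stays O(d) regardless of the network size. If a neighbor has failed, dropping it from the candidate list lets the query take another neighbor, which is the sense in which CAN can route around some failures.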
51
Another Design: Chord

Node and object keys:
random location around a circle
Neighbors:
nodes $2^{-i}$ of the way around the circle
found by routing to the desired key
Routing: greedy
pick the neighbor closest to the destination
Storage: own interval
a node owns the key range between her key and the previous node's key

[Figure: nodes and keys on the identifier circle, with one node's ownership range marked]
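The Chord rules above can be sketched on a small identifier circle. Assumptions: 6-bit identifiers (a circle of 64 positions), a hypothetical set of node IDs, and a global view of the ring so each node's successor and fingers can be computed directly; real Chord nodes learn these through their join and maintenance protocols. Finger i of node n points at the owner of n + 2^i, so the farthest finger reaches about halfway round the circle, the next about a quarter, and so on, matching the $2^{-i}$ spacing.

```python
import bisect

M = 6                      # hypothetical: 6-bit identifiers
RING = 2 ** M              # identifier circle with positions 0 .. 63

def successor(nodes, key):
    """Owner of `key`: the first node at or clockwise after it (`nodes` sorted)."""
    i = bisect.bisect_left(nodes, key % RING)
    return nodes[i % len(nodes)]

def in_open(x, a, b):
    """True if x lies strictly inside the clockwise arc from a to b."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the arc wraps past 0

def fingers(nodes, n):
    """finger[i] = successor(n + 2^i); the last finger reaches about halfway round."""
    return [successor(nodes, n + 2 ** i) for i in range(M)]

def lookup(nodes, start, key):
    """Greedy lookup; returns the nodes visited, ending at the key's owner."""
    path, n = [start], start
    while True:
        succ = successor(nodes, n + 1)
        if in_open(key, n, succ) or key == succ:   # key in (n, succ]: succ owns it
            if succ != n:
                path.append(succ)
            return path
        # Jump to the farthest finger that does not overshoot the key.
        nxt = next((f for f in reversed(fingers(nodes, n)) if in_open(f, n, key)), n)
        if nxt == n:
            raise RuntimeError("lookup made no progress")
        path.append(nxt)
        n = nxt

NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # hypothetical node IDs, sorted
print(successor(NODES, 54))      # 56 owns key 54 (its range is 52..56)
print(lookup(NODES, 8, 54))      # [8, 42, 51, 56]
```

Each jump at least halves the remaining distance to the key, which is where the O(log n) hop count of structured DHTs comes from.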
52
OpenDHT

A shared DHT service
The Bamboo DHT
Hosted on PlanetLab
Simple RPC API
You don't need to deploy or host to play with a real DHT!

53
Review: DHTs vs. Unstructured P2P

DHTs good at
exact match for rare items
DHTs bad at
keyword search, etc. (can't construct a DHT-based Google)
tolerating extreme churn
Gnutella etc. (unstructured P2P) good at
general search
finding common objects
very dynamic environments
Gnutella etc. bad at
finding rare items

54
Distributed Systems: Pre-Internet

Connected by LANs (low loss and delay)
Small scale (10s, maybe 100s per server)
PODC literature focused on algorithms to achieve
strict semantics in the face of failures
Two-phase commits
Synchronization
Byzantine agreement
Etc.

55
Distributed Systems: Post-Internet

Very different context
Huge scales (thousands if not millions)
Highly variable connectivity
Failures common
Organic growth
Abandoned distributed strict semantics
Adaptive apps rather than guaranteed
infrastructure
Adopted pairwise client-server approach
Server is centralized (even if server farm)
Relatively primitive approach (no sophisticated distributed algorithms)
Little support from infrastructure or middleware

56
A Database Viewpoint on DHTs: Towards Data-Centricity, Data Independence
57
Host-centric Protocols

Protocols defined in terms of IP addresses
Unicast IP address: host
Multicast IP address: set of hosts
Destination address is given to protocol
Protocol delivers data from one host to another
unicast: conceptually trivial
multicast: address is logical, not physical

58
Host-centric Applications

Classic applications: destination is intrinsic
telnet: target machine
FTP: location of files
electronic mail: email address turns into mail server
multimedia conferencing: machines of participants
Destination is specified by user (not network)
Usually specified by hostname not address
DNS translates names into addresses

59
Domain Name System (DNS)

DNS is built around recursive delegation
Top-level domains (TLDs): .com, .net, .edu, etc.
TLDs delegate authority to subdomains, e.g., berkeley.edu
Subdomains can further delegate, e.g., cs.berkeley.edu
Hierarchy fits host administrative structure
Local decentralized control
Crucial to efficient hostname resolution
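Recursive delegation can be sketched as a walk down the name hierarchy, one label at a time from the right. This is a toy model, not the DNS protocol: the zone contents, the dictionary layout, and `resolve` are hypothetical, there is no caching, and the leftmost label is simply looked up as a host name in the last zone reached. Addresses come from the 192.0.2.0/24 documentation range.

```python
# Hypothetical zone data: each zone delegates child labels and/or answers for hosts.
ZONES = {
    ".": {"delegate": {"edu": "edu."}},
    "edu.": {"delegate": {"berkeley": "berkeley.edu."}},
    "berkeley.edu.": {"delegate": {"cs": "cs.berkeley.edu."},
                      "hosts": {"www": "192.0.2.10"}},
    "cs.berkeley.edu.": {"hosts": {"www": "192.0.2.20"}},
}

def resolve(name):
    """Follow delegations from the root, one label at a time from the right.

    Returns (address or None, list of zones consulted)."""
    labels = list(reversed(name.rstrip(".").split(".")))
    zone, consulted = ".", ["."]
    for i, label in enumerate(labels):
        data = ZONES[zone]
        if i == len(labels) - 1:                   # leftmost label: ask for the host
            return data.get("hosts", {}).get(label), consulted
        child = data.get("delegate", {}).get(label)
        if child is None:
            return None, consulted                 # no such subdomain
        zone = child                               # referral to the child zone
        consulted.append(zone)

print(resolve("www.cs.berkeley.edu"))
# ('192.0.2.20', ['.', 'edu.', 'berkeley.edu.', 'cs.berkeley.edu.'])
```

Each step only needs the zone one level up, which is why the administrative hierarchy makes resolution efficient; a flat namespace for data offers no such labels to walk.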

60
Modern Web → Data-Centricity

URLs often function as names of data
users think of www.cnn.com as data, not a host
Fact that www.cnn.com is a hostname is irrelevant
Users want data, not access to particular host
The web is now data-centric

61
Data-centric App in Host-centric World

Data still associated with host names (URLs)
administrative structure of data is the same as that of hosts
weak point in current web
Key enabler: search engines
Searchable databases map keywords to URLs
Allowed users to find desired data
Networkers focused on technical problems
HTTP, persistence (URNs), replication (CDNs), ...

62
A DNS for Data? DHTs

Can we map data names into addresses?
a data-centric DNS, distributed and scalable
doesn't alter net protocols, but aids data location
not just about stolen music, but a general facility
A formidable challenge
Data does not have a clear administrative hierarchy
Likely need to support a flat namespace
Can one do this scalably?
Data-centrism requires scalable flat lookups ⇒ DHTs

63
Data Independence In DB Design

Decouple app-level API from data organization
Can make changes to data layout without modifying applications
Simple version: location-independent names
Fancier: declarative queries

"As clear a paradigm shift as we can hope to find in computer science" - C. Papadimitriou
64
The Pillars of Data Independence

Indexes
Value-based lookups have to compete with direct
access
Must adapt to shifting data distributions
Must guarantee performance
Query Optimization
Support declarative queries beyond lookup/search
Must adapt to shifting data distributions
Must adapt to changes in environment

65
Generalizing Data Independence

A classic level-of-indirection scheme
Indexes are exactly that
Complex queries are a richer indirection
The key for data independence:
It's all about rates of change
Hellerstein's Data Independence Inequality:
Data independence matters when
$d(\text{environment})/dt \gg d(\text{app})/dt$

66
Data Independence in Networks

$d(\text{environment})/dt \gg d(\text{app})/dt$
In databases, the RHS is unusually small
This drove the relational database revolution
In extreme networked systems, the LHS is unusually high
And the applications are increasingly complex and data-driven
Simple indirections (e.g., local lookaside tables) are insufficient

67
Hierarchical Networks (and Queries)

IP
Hierarchical name space (www.vldb.org, 141.12.12.51)
Hierarchical routing
Autonomous Systems correlate with name space (though not perfectly)
DNS
Hierarchical name space (clients and a hierarchy of servers)
Hierarchical routing w/ aggressive caching
13 managed root servers
Traditional pros/cons of hierarchical data management
Works well for things aligned with the hierarchy
Esp. physical locality à la Astrolabe
Inflexible
No data independence!

68
The Pillars of Data Independence

Indexes
Value-based lookups have to compete with direct
access
Must adapt to shifting data distributions
Must guarantee performance
Query Optimization
Support declarative queries beyond lookup/search
Must adapt to shifting data distributions
Must adapt to changes in environment

69
Sensor Networks: The Internet Meets the Environment
70
Today: Internet Meets Mobile Wireless Computing
iPod: impact of disk size/cost
Samsung camera phone w/ camcorder

Computing: smaller, faster
Disks: larger size, small form
Communications: wireless voice, data
Multimedia integration: voice, data, video, games

Sony PSP: mobile gaming
BlackBerry: phone + PDA
71
Tomorrow: Embedded Networked Sensing Apps

Micro-sensors, on-board processing, and wireless interfaces are feasible at very small scale and can monitor phenomena up close
Enables spatially and temporally dense environmental monitoring
Embedded Networked Sensing will reveal previously unobservable phenomena

Seismic structure response
Contaminant transport
Ecosystems, biocomplexity
Marine microorganisms
72
Embedded Networked Sensing: Motivation

Imagine
high-rise buildings self-detect structural faults (e.g., weld cracks)
schools detect airborne toxins at low concentrations, trace contaminant transport to source
buoys alert swimmers to dangerous bacterial levels
earthquake-rubbled building infiltrated with robots and sensors: locate survivors, evaluate structural damage
ecosystems infused with chemical, physical, acoustic, image sensors to track global change parameters
battlefield sprinkled with sensors that identify and track friendly/foe air and ground vehicles, personnel

73
Embedded Sensor Nets: Enabling Technologies
Embed numerous distributed devices to monitor and interact with the physical world
Network devices to coordinate and perform higher-level tasks
Embedded: control system w/ small form factor, untethered nodes
Networked: exploit collaborative sensing, action
Sensing: tightly coupled to the physical world; exploit spatially/temporally dense, in situ/remote, sensing/actuation
74
Sensornets

Vision
Many sensing devices with radio and processor
Enable fine-grained measurements over large areas
Huge potential impact on science and society
Technical challenges
untethered: power consumption must be limited
unattended: robust and self-configuring
wireless: ad hoc networking

75
Similarity w/ P2P Networks

Sensornets are inherently data-centric
Users know what data they want, not where it is
Estrin, Govindan, Heidemann (2000, etc.)
Centralized database infeasible
vast amount of data, constantly being updated
small fraction of data will ever be queried
sending to single site expends too much energy

76
Sensor Nets: New Design Themes

Self configuring systems that adapt to
unpredictable environment
dynamic, messy (hard to model), environments
preclude pre-configured behavior
Leverage data processing inside the network
exploit computation near data to reduce
communication
collaborative signal processing
achieve desired global behavior with localized
algorithms (distributed control)
Long-lived, unattended, untethered, low duty
cycle systems
energy a central concern
communication primary consumer of scarce energy
resource

77
From Embedded Sensing to Embedded Control

embedded in unattended control systems
control network, and act in environment
critical apps extend beyond sensing to control
and actuation
transportation, precision agriculture, medical
monitoring and drug delivery, battlefield apps
concerns extend beyond traditional networked systems and apps: usability, reliability, safety
need a systems architecture to manage interactions
current system development: one-off, incrementally tuned, stove-piped
repercussions of piecemeal, uncoordinated design: insufficient longevity, interoperability, safety, robustness, scaling

78
Why can't we simply adapt Internet protocols and the end-to-end architecture?

Internet routes data using IP addresses in packets and lookup tables in routers
humans get data by naming data to a search engine
many levels of indirection between name and IP address
embedded, energy-constrained (untethered, small-form-factor), unattended systems can't tolerate the communication overhead of indirection
special-purpose system function(s) don't need or want Internet general-purpose functionality designed for elastic applications.

79
Sample Layered Architecture
User Queries, External Database
In-network: Application processing, Data aggregation, Query processing
Data dissemination, storage, caching
Adaptive topology, Geo-Routing
MAC, Time, Location
Phy: comm, sensing, actuation, SP
Resource constraints call for more tightly integrated layers
Open Question: What are the defining architectural principles?
80
Coverage measures

area coverage: fraction of area covered by sensors
detectability: probability that sensors detect moving objects
node coverage: fraction of sensors covered by other sensors
control:
where to add new nodes for max coverage
how to move existing nodes for max coverage

Given sensor field (either known sensor locations, or spatial density)
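Two of the coverage measures above can be estimated numerically once a sensing model is fixed. This sketch assumes a simple disk model (a sensor covers every point within a fixed radius), known sensor locations, and a rectangular field sampled on a grid; the function names and parameters are hypothetical. Detectability of moving objects and the placement/control questions are not modeled.

```python
import math

def area_coverage(sensors, radius, width, height, step=0.1):
    """Fraction of the field within `radius` of at least one sensor.

    Estimated by testing the center of every step x step grid cell."""
    nx, ny = round(width / step), round(height / step)
    covered = 0
    for i in range(nx):
        for j in range(ny):
            x, y = (i + 0.5) * step, (j + 0.5) * step
            if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in sensors):
                covered += 1
    return covered / (nx * ny)

def node_coverage(sensors, radius):
    """Fraction of sensors lying within `radius` of some other sensor."""
    n = len(sensors)
    covered = sum(
        1 for i in range(n)
        if any(math.dist(sensors[i], sensors[j]) <= radius for j in range(n) if j != i)
    )
    return covered / n

print(area_coverage([(5.0, 5.0)], 2.0, 10.0, 10.0))   # close to pi * 2**2 / 100, about 0.126
print(node_coverage([(0, 0), (1, 0), (10, 10)], 1.5))  # 2/3: the third sensor is isolated
```

A finer `step` tightens the area estimate at the cost of more distance checks.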
81
In Network Processing

communication expensive when limited
power
bandwidth
perform (data) processing in network
close to (at) data
forward fused/synthesized results
e.g., find max. of data
distributed data, distributed computation
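Finding the maximum with in-network processing can be sketched over a routing tree rooted at the sink: each node folds its children's partial results into its own reading and sends a single value upward. The tree, the readings, and the function names are hypothetical, and cost is counted as one message per link traversal; the second function counts what relaying every raw reading to the sink would cost instead.

```python
def in_network_max(children, readings, root):
    """Each node merges its children's partial maxima with its own reading and
    sends ONE value to its parent. Returns (max, messages sent)."""
    messages = 0

    def aggregate(node):
        nonlocal messages
        best = readings[node]
        for child in children.get(node, []):
            best = max(best, aggregate(child))
            messages += 1          # child -> node: a single partial result
        return best

    return aggregate(root), messages

def centralized_messages(children, root):
    """Link traversals if every raw reading is relayed hop by hop to the root."""
    total = 0

    def walk(node, depth):
        nonlocal total
        total += depth             # this node's reading crosses `depth` links
        for child in children.get(node, []):
            walk(child, depth + 1)

    walk(root, 0)
    return total

tree = {"sink": ["a", "b"], "a": ["c", "d"], "d": ["e"]}
readings = {"sink": 20, "a": 22, "b": 19, "c": 31, "d": 25, "e": 28}
print(in_network_max(tree, readings, "sink"))  # (31, 5): one message per link
print(centralized_messages(tree, "sink"))      # 9
```

The gap widens with tree depth, since in-network aggregation sends one message per link no matter how deep the readings originate.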

82
Distributed Representation and Storage

Data-centric protocols, in-network processing
goal:
Interpretation of spatially distributed data
(Per-node processing alone is not enough)
network does in-network processing based on
distribution of data
Queries automatically directed towards nodes that
maintain relevant/matching data
pattern-triggered data collection
Multi-resolution data storage and retrieval
Distributed edge/feature detection
Index data for easy temporal and spatial
searching
Finding global statistics (e.g., distribution)

83
Directed Diffusion: Data-Centric Routing

Basic idea
name data (not nodes) with externally relevant attributes: data type, time, location of node, SNR, etc.
diffuse requests and responses across the network using application-driven routing (e.g., geo-sensitive or not)
support in-network aggregation and processing
data sources publish data, data clients subscribe to data
however, all nodes may play both roles
a node that aggregates/combines/processes incoming sensor node data becomes a source of new data
a node that only publishes when a combination of conditions arises is a client for triggering event data
true peer-to-peer system?
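The naming and publish/subscribe roles described above can be sketched with attribute-based matching on a single node. This covers only the data-naming and matching idea; it omits how directed diffusion actually propagates interests through the network, sets up gradients, and reinforces paths. The attribute names, the threshold of 40, and the class and function names are hypothetical. The aggregator shows one node acting as both a client of raw readings and a source of new data.

```python
def matches(interest, data):
    """interest: attribute -> predicate; data: attribute -> value."""
    return all(attr in data and pred(data[attr]) for attr, pred in interest.items())

class DiffusionNode:
    """Delivers published data to every locally registered matching interest."""

    def __init__(self):
        self.interests = []                  # (interest, callback) pairs

    def subscribe(self, interest, callback):
        self.interests.append((interest, callback))

    def publish(self, data):
        delivered = 0
        for interest, callback in self.interests:
            if matches(interest, data):
                callback(data)
                delivered += 1
        return delivered

node = DiffusionNode()
alarms = []

def aggregator(reading):
    # Client of raw readings and source of new, derived data
    if reading["value"] > 40:
        node.publish({"type": "alarm", "x": reading["x"], "y": reading["y"]})

node.subscribe({"type": lambda t: t == "temperature",
                "x": lambda x: 0 <= x <= 50, "y": lambda y: 0 <= y <= 50}, aggregator)
node.subscribe({"type": lambda t: t == "alarm"}, alarms.append)

node.publish({"type": "temperature", "x": 10, "y": 20, "value": 45})
print(alarms)   # [{'type': 'alarm', 'x': 10, 'y': 20}]
```

Neither the publisher nor the subscriber names the other: they meet only through the attributes of the data.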

84
Traditional Approach: Warehousing

data extracted from sensors, stored on server
Query processing takes place on server

85
Sensor Database System

Sensor Database System supports distributed query
processing over sensor network

86
Sensor Database System

Can existing database techniques be reused? What
are the new problems and solutions?
Representing sensor data
Representing sensor queries
Processing query fragments on sensor nodes
Distributing query fragments
Adapting to changing network conditions
Dealing with site and communication failures
Deploying and Managing a sensor database system