1
Peer-to-Peer (P2P) and Sensor Networks
  • Shivkumar Kalyanaraman
  • Rensselaer Polytechnic Institute
  • shivkuma@ecse.rpi.edu
  • http://www.ecse.rpi.edu/Homepages/shivkuma
  • Based in part upon slides of Don Towsley, Ion
    Stoica, Scott Shenker, Joe Hellerstein, Jim
    Kurose, Hung-Chang Hsiao, Chung-Ta King

2
Overview
  • P2P networks: Napster, Gnutella, Kazaa
  • Distributed Hash Tables (DHTs)
  • Database perspectives: data-centricity, data-independence
  • Sensor networks and their connection to P2P

3
P2P Key Idea
  • Share the content, storage and bandwidth of
    individual (home) users

(figure: individual home users sharing resources over the Internet)
6
What is P2P (Peer-to-Peer)?
  • P2P as a mindset: Slashdot
  • P2P as a model: Gnutella
  • P2P as an implementation choice: application-layer multicast
  • P2P as an inherent property: ad-hoc networks

7
P2P Application Taxonomy
P2P Systems:
  • Distributed Computing: SETI@home
  • File Sharing: Gnutella
  • Collaboration: Jabber
  • Platforms: JXTA
8
How to Find an Object in a Network?
(figure: an object stored somewhere in the network)
9
A Straightforward Idea
  • Use a BIG server: store the object, or provide a directory
  • How to do it in a distributed way?
10
Why Distributed?
  • Client-server model:
  • Client is dumb
  • Server does most things (compute, store, control)
  • Centralization makes things simple, but introduces:
  • single point of failure, performance bottleneck, tighter control, access fees and management costs
  • ad hoc participation?
  • Estimate of net PCs:
  • 10 billion MHz of CPUs
  • 10,000 terabytes of storage
  • Clients are not that dumb after all:
  • use the resources in the clients (at the net edges)

13
First Idea: Napster
  • Distributing objects, centralizing directory (see the sketch below)

(figure: objects stored at peers; a central server holds the directory)
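Below is a minimal sketch of Napster's split: a central directory indexes which peers hold which files, while the files themselves stay on (and transfer between) peers. The class and method names are illustrative assumptions, not Napster's actual protocol.

# Sketch of a Napster-style central directory (illustrative names only;
# the real Napster protocol differs).
class CentralDirectory:
    def __init__(self):
        self.index = {}                      # file name -> set of peers holding it

    def publish(self, peer, files):
        """A peer registers the files it is willing to share."""
        for f in files:
            self.index.setdefault(f, set()).add(peer)

    def lookup(self, f):
        """The directory answers lookups; the transfer itself is peer-to-peer."""
        return self.index.get(f, set())

d = CentralDirectory()
d.publish("peer1", ["song.mp3"])
d.publish("peer2", ["song.mp3", "talk.avi"])
print(d.lookup("song.mp3"))                  # {'peer1', 'peer2'}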
17
Today: P2P Video Traffic is Dominant
  • Source: CacheLogic. Video, BitTorrent, eDonkey!

18
40-60% P2P traffic
19
2006 P2P Data
  • Between 50 and 65 percent of all download traffic is P2P related. Between 75 and 90 percent of all upload traffic is P2P related.
  • And it seems that more people are using P2P today:
  • In 2004, one CacheLogic server registered 3 million IP addresses in 30 days. In 2006, one CacheLogic server registered 3 million IP addresses in 8 days.
  • So what do people download?
  • 61.4 percent video, 11.3 percent audio, 27.2 percent games/software/etc.
  • The average file size of shared files is 1 gigabyte!
  • Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/

21
A More Aggressive Idea
  • Distributing objects and directory
  • How to find objects w/o a directory? Blind flooding!
23
Gnutella
  • Distribute file location
  • Idea: flood the request (see the sketch below)
  • How to find a file:
  • Send the request to all neighbors
  • Neighbors recursively multicast the request
  • Eventually a machine that has the file receives the request, and it sends back the answer
  • Advantages:
  • Totally decentralized, highly robust
  • Disadvantages:
  • Not scalable: the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)

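A minimal sketch of the TTL-bounded flooding described above, assuming a simple adjacency-map topology; the default TTL of 7 and the data structures are illustrative, not Gnutella's wire format.

# Sketch of Gnutella-style TTL-bounded flooding (illustrative only).
def flood_query(graph, start, wanted, ttl=7, seen=None):
    """Recursively forward a query to all neighbors until the TTL expires.

    graph  : dict mapping node -> {"files": set, "neighbors": list}
    start  : node where the query originates (or is being forwarded)
    wanted : file name being searched for
    Returns the set of reached nodes that hold the file.
    """
    if seen is None:
        seen = set()
    if start in seen or ttl < 0:
        return set()
    seen.add(start)

    hits = {start} if wanted in graph[start]["files"] else set()
    for nbr in graph[start]["neighbors"]:
        hits |= flood_query(graph, nbr, wanted, ttl - 1, seen)
    return hits

# Tiny example topology:
net = {
    "A": {"files": set(),        "neighbors": ["B", "C"]},
    "B": {"files": {"song.mp3"}, "neighbors": ["A", "C"]},
    "C": {"files": set(),        "neighbors": ["A", "B"]},
}
print(flood_query(net, "A", "song.mp3"))     # {'B'}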
24
Gnutella: Unstructured P2P
  • Ad-hoc topology
  • Queries are flooded for a bounded number of hops
  • No guarantees on recall

(figure: a query for "xyz" flooded across the ad-hoc topology)
25
Now: BitTorrent and eDonkey2000! (2006)
26
Lessons and Limitations
  • Client-Server performs well
  • But not always feasible
  • Ideal performance is often not the key issue!
  • Things that flood-based systems do well:
  • Organic scaling
  • Decentralization of visibility and liability
  • Finding popular stuff
  • Fancy local queries
  • Things that flood-based systems do poorly:
  • Finding unpopular stuff [Loo et al., VLDB 04]
  • Fancy distributed queries
  • Vulnerabilities: data poisoning, tracking, etc.
  • Guarantees about anything (answer quality, privacy, etc.)

27
Detour: BitTorrent
30
BitTorrent: Joining a Torrent
  • Peers are divided into:
  • seeds: have the entire file
  • leechers: still downloading

1. obtain the metadata file
2. contact the tracker
3. obtain a peer list (contains seeds and leechers)
4. contact peers from that list for data
31
BitTorrent: Exchanging Data
  • Verify pieces using hashes
  • Download sub-pieces in parallel
  • Advertise received pieces to the entire peer list
  • Look for the rarest pieces (see the sketch below)
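A small sketch of rarest-first piece selection, assuming each peer's advertised pieces are available as a set of indices; the function name and data layout are illustrative, not the BitTorrent spec.

from collections import Counter

def pick_rarest(needed, peer_bitfields):
    """Rarest-first: among pieces we still need, pick the one advertised
    by the fewest peers (illustrative sketch)."""
    counts = Counter()
    for bitfield in peer_bitfields:          # one set of piece indices per peer
        counts.update(p for p in bitfield if p in needed)
    if not counts:
        return None
    return min(counts, key=counts.get)       # least-replicated needed piece

# Example: we need pieces {0, 1, 2}; piece 2 is held by only one peer.
print(pick_rarest({0, 1, 2}, [{0, 1}, {0, 2}, {0, 1}]))   # -> 2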
32
BitTorrent: Unchoking
  • Periodically calculate data-receiving rates
  • Upload to (unchoke) the fastest downloaders
  • Optimistic unchoking: periodically select a peer at random and upload to it; continuously look for the fastest partners (see the sketch below)
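A sketch of the unchoking policy described above, assuming measured per-peer download rates; the three regular unchoke slots and the single optimistic slot are illustrative simplifications of what real clients do.

import random

def choose_unchoked(rates, k=3):
    """Unchoke the k peers that gave us data fastest, plus one random
    'optimistic' peer so newcomers get a chance to prove themselves.

    rates : dict peer -> observed download rate from that peer
    """
    fastest = sorted(rates, key=rates.get, reverse=True)[:k]
    others = [p for p in rates if p not in fastest]
    optimistic = random.sample(others, 1) if others else []
    return fastest + optimistic

print(choose_unchoked({"p1": 50, "p2": 90, "p3": 10, "p4": 70, "p5": 30}))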
33
End of Detour
34
Back to P2P Structures
  • Unstructured P2P architecture:
  • Napster, Gnutella, Freenet
  • No logically deterministic structures to organize the participating peers
  • No guarantee that objects will be found
  • How to find objects within some number of hops?
  • Extend hashing
  • Structured P2P architecture:
  • CAN, Chord, Pastry, Tapestry, Tornado, ...
  • Viewed as a distributed hash table for the directory

35
How to Bound Search Quality?
  • Many ideas, again
  • Work on placement!
36
High-Level Idea: Indirection
  • Indirection in space:
  • Logical (content-based) IDs, routing to those IDs
  • Content-addressable network
  • Tolerant of churn (nodes joining and leaving the network)
  • Indirection in time:
  • Want some scheme to temporally decouple send and receive
  • Persistence required. Typical Internet solution: soft state
  • Combo of persistence via storage and via retry
  • Publisher requests a TTL on storage
  • Republishes as needed
  • Metaphor: Distributed Hash Table
37
Basic Idea
  • Objects have hash keys; peer nodes also have hash keys in the same hash space
  • Peer x joins at Join(H(x)); object y is published at Publish(H(y))
  • Place an object at the peer with the closest hash key (see the sketch below)

(figure: object y and peer x both hashed into the key space of the P2P network)
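A minimal sketch of this placement rule, hashing peers and objects into one key space and assigning each object to the peer with the closest key; the 32-bit ring and SHA-1 truncation are assumptions for illustration.

import hashlib

def h(name, bits=32):
    """Map a name into the shared key space (truncated SHA-1; the 32-bit
    space is an illustrative assumption)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

def closest_peer(peers, key):
    """Place/find an object at the peer whose hash key is closest to the
    object's key, measuring distance around the ring."""
    space = 1 << 32
    ring_dist = lambda a, b: min((a - b) % space, (b - a) % space)
    return min(peers, key=lambda p: ring_dist(h(p), key))

peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
obj_key = h("object-y")
print("object-y lives on", closest_peer(peers, obj_key))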
38
Distributed Hash Tables (DHTs)
  • Abstraction: a distributed hash-table data structure
  • insert(id, item)
  • item = query(id) (or lookup(id))
  • Note: item can be anything: a data object, document, file, pointer to a file
  • Proposals:
  • CAN, Chord, Kademlia, Pastry, Tapestry, etc.
  • Goals:
  • Make sure that an item (file) identified is always found
  • Scale to hundreds of thousands of nodes
  • Handle rapid arrival and failure of nodes

39
Viewed as a Distributed Hash Table
  • The hash table spans the key space 0 to 2^128 - 1
  • Each peer node is responsible for a range of the hash table, according to the peer's hash key
  • Objects are placed at the peer with the closest key
  • Note that peers are Internet edges
40
How to Find an Object?
  • Simplest idea: everyone knows everyone else! Then it takes one hop to find the object
  • But we want to keep only a few entries per node!
41
Structured Networks
  • Distributed Hash Tables (DHTs)
  • Hash table interface: put(key, item), get(key)
  • O(log n) hops
  • Guarantees on recall

42
Content Addressable Network (CAN)
  • Distributed hash table
  • Hash table as in a Cartesian coordinate space
  • A peer only needs to know its logical neighbors
  • Dimension-ordered multihop routing

43
Content Addressable Network (CAN)
  • Associate to each node and item a unique id in a d-dimensional Cartesian space on a d-torus
  • Properties:
  • Routing table size O(d)
  • Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes

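A sketch of one greedy CAN forwarding step on a 2-d torus, assuming each node knows its neighbors' zone coordinates; the wrap-around side length of 8 matches the coordinate range of the examples that follow but is otherwise an assumption.

# One greedy CAN routing step (sketch; zone ownership and joins omitted).
def can_route(position, target, neighbors, side=8):
    """Forward to the neighbor whose coordinates are closest to the target
    point on the torus; stay put if no neighbor improves the distance."""

    def torus_dist(a, b):
        # Per-dimension distance with wrap-around, summed over dimensions.
        return sum(min(abs(x - y), side - abs(x - y)) for x, y in zip(a, b))

    best = min(neighbors, key=lambda n: torus_dist(n, target))
    return best if torus_dist(best, target) < torus_dist(position, target) else position

# n1 at (1,2) forwarding a query for f4 at (7,5) via two known neighbors:
print(can_route((1, 2), (7, 5), [(4, 2), (3, 5)]))   # -> (3, 5)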
44
CAN Example: Two-Dimensional Space
  • Space divided between nodes
  • All nodes together cover the entire space
  • Each node covers either a square or a rectangular area of ratio 1:2 or 2:1
  • Example:
  • Node n1 (1,2) is the first node that joins → covers the entire space

(figure: coordinate space owned entirely by n1)
45
CAN Example: Two-Dimensional Space
  • Node n2 (4,2) joins → the space is divided between n1 and n2

(figure: the space split between n1 at (1,2) and n2 at (4,2))
46
CAN Example: Two-Dimensional Space
  • Node n3 (3,5) joins → the space is divided again

(figure: n3 at (3,5) takes over part of the space)
47
CAN Example: Two-Dimensional Space
  • Nodes n4 (5,5) and n5 (6,6) join

(figure: the space now divided among n1, n2, n3, n4, and n5)
48
CAN Example: Two-Dimensional Space
  • Nodes: n1 (1,2), n2 (4,2), n3 (3,5), n4 (5,5), n5 (6,6)
  • Items: f1 (2,3), f2 (5,1), f3 (2,1), f4 (7,5)

(figure: nodes and items plotted in the coordinate space)
49
CAN Example: Two-Dimensional Space
  • Each item is stored by the node that owns its mapping in the space

(figure: each item placed in the zone of the node that owns it)
50
CAN Query Example
  • Each node knows its neighbors in the d-space
  • Forward the query to the neighbor that is closest to the query id
  • Example: assume n1 queries f4
  • Can route around some failures

(figure: the query for f4 forwarded hop-by-hop across neighboring zones)
51
Another Design: Chord
  • Node and object keys: random locations around a circle
  • Neighbors: nodes 2^-i around the circle, found by routing to the desired key
  • Routing: greedy; pick the neighbor closest to the destination (see the sketch below)
  • Storage: own interval; a node owns the key range between her key and the previous node's key

(figure: Chord ring of nodes, with one node's ownership range highlighted)
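A sketch of one greedy Chord-style routing step, assuming a node forwards to the finger that gets closest to the key without passing it clockwise; real Chord additionally maintains successor lists and periodically fixes fingers.

# Greedy Chord-style forwarding step (sketch only).
def chord_step(key, node, fingers, space=2**32):
    """Pick the known node closest to key without passing it clockwise.

    fingers : node IDs this node knows (roughly powers-of-two apart)
    """
    def clockwise(a, b):
        return (b - a) % space               # distance going clockwise a -> b

    candidates = [f for f in fingers if clockwise(node, f) <= clockwise(node, key)]
    return max(candidates, key=lambda f: clockwise(node, f)) if candidates else node

# Node 10 forwarding a lookup for key 200 via fingers at 20, 50, 150, 300:
print(chord_step(200, 10, [20, 50, 150, 300]))   # -> 150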
52
OpenDHT
  • A shared DHT service
  • The Bamboo DHT
  • Hosted on PlanetLab
  • Simple RPC API
  • You don't need to deploy or host anything to play with a real DHT!

53
Review: DHTs vs Unstructured P2P
  • DHTs good at:
  • exact match for rare items
  • DHTs bad at:
  • keyword search, etc.: can't construct a DHT-based Google
  • tolerating extreme churn
  • Gnutella etc. (unstructured P2P) good at:
  • general search
  • finding common objects
  • very dynamic environments
  • Gnutella etc. bad at:
  • finding rare items

54
Distributed Systems: Pre-Internet
  • Connected by LANs (low loss and delay)
  • Small scale (10s, maybe 100s per server)
  • PODC literature focused on algorithms to achieve strict semantics in the face of failures:
  • Two-phase commits
  • Synchronization
  • Byzantine agreement
  • Etc.

55
Distributed Systems: Post-Internet
  • Very different context:
  • Huge scales (thousands if not millions)
  • Highly variable connectivity
  • Failures common
  • Organic growth
  • Abandoned distributed strict semantics:
  • Adaptive apps rather than guaranteed infrastructure
  • Adopted a pairwise client-server approach:
  • Server is centralized (even if a server farm)
  • Relatively primitive approach (no sophisticated dist. algms.)
  • Little support from infrastructure or middleware

56
A Database Viewpoint on DHTs: Towards Data-Centricity and Data Independence
57
Host-centric Protocols
  • Protocols defined in terms of IP addresses:
  • Unicast: IP address = host
  • Multicast: IP address = set of hosts
  • Destination address is given to the protocol
  • Protocol delivers data from one host to another:
  • unicast: conceptually trivial
  • multicast: address is logical, not physical

58
Host-centric Applications
  • Classic applications: destination is intrinsic
  • telnet: target machine
  • FTP: location of files
  • electronic mail: email address turns into mail server
  • multimedia conferencing: machines of participants
  • Destination is specified by the user (not the network)
  • Usually specified by hostname, not address:
  • DNS translates names into addresses

59
Domain Name System (DNS)
  • DNS is built around recursive delegation:
  • Top level domains (TLDs): .com, .net, .edu, etc.
  • TLDs delegate authority to subdomains: berkeley.edu
  • Subdomains can further delegate: cs.berkeley.edu
  • Hierarchy fits host administrative structure:
  • Local decentralized control
  • Crucial to efficient hostname resolution

60
Modern Web → Data-Centricity
  • URLs often function as names of data:
  • users think of www.cnn.com as data, not a host
  • The fact that www.cnn.com is a hostname is irrelevant
  • Users want data, not access to a particular host
  • The web is now data-centric

61
Data-centric App in a Host-centric World
  • Data still associated with host names (URLs):
  • administrative structure of data is the same as for hosts
  • a weak point in the current web
  • Key enabler: search engines
  • Searchable databases map keywords to URLs
  • Allowed users to find desired data
  • Networkers focused on technical problems:
  • HTTP, persistence (URNs), replication (CDNs), ...

62
A DNS for Data? DHTs
  • Can we map data names into addresses?
  • a data-centric DNS, distributed and scalable
  • doesn't alter net protocols, but aids data location
  • not just about stolen music, but a general facility
  • A formidable challenge:
  • Data does not have a clear administrative hierarchy
  • Likely need to support a flat namespace
  • Can one do this scalably?
  • Data-centrism requires scalable flat lookups ⇒ DHTs

63
Data Independence in DB Design
  • Decouple the app-level API from data organization
  • Can make changes to data layout without modifying applications
  • Simple version: location-independent names
  • Fancier: declarative queries

"As clear a paradigm shift as we can hope to find in computer science" - C. Papadimitriou
64
The Pillars of Data Independence
  • Indexes:
  • Value-based lookups have to compete with direct access
  • Must adapt to shifting data distributions
  • Must guarantee performance
  • Query Optimization:
  • Support declarative queries beyond lookup/search
  • Must adapt to shifting data distributions
  • Must adapt to changes in the environment

65
Generalizing Data Independence
  • A classic level-of-indirection scheme:
  • Indexes are exactly that
  • Complex queries are a richer indirection
  • The key for data independence:
  • It's all about rates of change
  • Hellerstein's Data Independence Inequality:
  • Data independence matters when d(environment)/dt >> d(app)/dt

66
Data Independence in Networks
  • d(environment)/dt >> d(app)/dt
  • In databases, the RHS is unusually small:
  • This drove the relational database revolution
  • In extreme networked systems, the LHS is unusually high:
  • And the applications are increasingly complex and data-driven
  • Simple indirections (e.g. local lookaside tables) are insufficient

67
Hierarchical Networks (& Queries)
  • IP:
  • Hierarchical name space (www.vldb.org, 141.12.12.51)
  • Hierarchical routing:
  • Autonomous Systems correlate with the name space (though not perfectly)
  • DNS:
  • Hierarchical name space (clients + hierarchy of servers)
  • Hierarchical routing with aggressive caching
  • 13 managed root servers
  • Traditional pros/cons of hierarchical data management:
  • Works well for things aligned with the hierarchy
  • Esp. physical locality a la Astrolabe
  • Inflexible
  • No data independence!

68
The Pillars of Data Independence
  • Indexes:
  • Value-based lookups have to compete with direct access
  • Must adapt to shifting data distributions
  • Must guarantee performance
  • Query Optimization:
  • Support declarative queries beyond lookup/search
  • Must adapt to shifting data distributions
  • Must adapt to changes in the environment

69
Sensor Networks: The Internet Meets the Environment
70
Today: Internet Meets Mobile Wireless Computing
  • Computing: smaller, faster
  • Disks: larger size, small form
  • Communications: wireless voice, data
  • Multimedia integration: voice, data, video, games

(figure: iPod (impact of disk size/cost), Samsung cameraphone with camcorder, Sony PSP mobile gaming, BlackBerry phone + PDA)
71
Tomorrow: Embedded Networked Sensing Apps
  • Micro-sensors, on-board processing, and wireless interfaces are feasible at very small scale; they can monitor phenomena up close
  • Enables spatially and temporally dense environmental monitoring
  • Embedded Networked Sensing will reveal previously unobservable phenomena

(examples: seismic structure response, contaminant transport, ecosystems and biocomplexity, marine microorganisms)
72
Embedded Networked Sensing: Motivation
  • Imagine:
  • high-rise buildings that self-detect structural faults (e.g., weld cracks)
  • schools that detect airborne toxins at low concentrations and trace contaminant transport to its source
  • buoys that alert swimmers to dangerous bacterial levels
  • an earthquake-rubbled building infiltrated with robots and sensors: locate survivors, evaluate structural damage
  • ecosystems infused with chemical, physical, acoustic, and image sensors to track global change parameters
  • a battlefield sprinkled with sensors that identify and track friendly/foe air and ground vehicles and personnel

73
Embedded Sensor Nets: Enabling Technologies
  • Embedded: embed numerous distributed devices to monitor and interact with the physical world; control systems with small form factor, untethered nodes
  • Networked: network devices to coordinate and perform higher-level tasks; exploit collaborative sensing and action
  • Sensing: tightly coupled to the physical world; exploit spatially/temporally dense, in situ/remote, sensing/actuation
74
Sensornets
  • Vision:
  • Many sensing devices with radio and processor
  • Enable fine-grained measurements over large areas
  • Huge potential impact on science and society
  • Technical challenges:
  • untethered: power consumption must be limited
  • unattended: must be robust and self-configuring
  • wireless: ad hoc networking

75
Similarity w/ P2P Networks
  • Sensornets are inherently data-centric:
  • Users know what data they want, not where it is
  • [Estrin, Govindan, Heidemann (2000, etc.)]
  • A centralized database is infeasible:
  • vast amount of data, constantly being updated
  • a small fraction of the data will ever be queried
  • sending everything to a single site expends too much energy

76
Sensor Nets: New Design Themes
  • Self-configuring systems that adapt to an unpredictable environment:
  • dynamic, messy (hard to model) environments preclude pre-configured behavior
  • Leverage data processing inside the network:
  • exploit computation near the data to reduce communication
  • collaborative signal processing
  • achieve desired global behavior with localized algorithms (distributed control)
  • Long-lived, unattended, untethered, low duty cycle systems:
  • energy is a central concern
  • communication is the primary consumer of the scarce energy resource

77
From Embedded Sensing to Embedded Control
  • embedded in unattended control systems
  • control the network, and act in the environment
  • critical apps extend beyond sensing to control and actuation:
  • transportation, precision agriculture, medical monitoring and drug delivery, battlefield apps
  • concerns extend beyond traditional networked systems and apps: usability, reliability, safety
  • need a systems architecture to manage interactions:
  • current system development: one-off, incrementally tuned, stove-piped
  • repercussions of piecemeal, uncoordinated design: insufficient longevity, interoperability, safety, robustness, scaling

78
Why can't we simply adapt Internet protocols and the end-to-end architecture?
  • The Internet routes data using IP addresses in packets and lookup tables in routers:
  • humans get data by naming data to a search engine
  • many levels of indirection between name and IP address
  • embedded, energy-constrained (untethered, small-form-factor), unattended systems can't tolerate the communication overhead of indirection
  • special-purpose system functions don't need or want the Internet's general-purpose functionality, which was designed for elastic applications

79
Sample Layered Architecture
  • User Queries, External Database
  • In-network: Application processing, Data aggregation, Query processing
  • Data dissemination, storage, caching
  • Adaptive topology, Geo-Routing
  • MAC, Time, Location
  • Phy: comm, sensing, actuation, SP

Resource constraints call for more tightly integrated layers. Open Question: What are the defining Architectural Principles?
80
Coverage Measures
  • area coverage: fraction of the area covered by sensors (see the sketch below)
  • detectability: probability that sensors detect moving objects
  • node coverage: fraction of sensors covered by other sensors
  • control:
  • where to add new nodes for maximum coverage
  • how to move existing nodes for maximum coverage
  • Given: a sensor field (either known sensor locations, or a spatial density)

(figure: a sensor field S with a detection region D around a moving object x)
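A Monte Carlo sketch of the area-coverage measure, assuming a unit-disk sensing model on a square field; the field size, sensing radius, and sample count are illustrative assumptions.

import random

def area_coverage(sensors, radius, side=10.0, samples=100_000):
    """Estimate the fraction of a square field within sensing radius of
    at least one sensor (disk-sensing model, Monte Carlo sampling)."""
    covered = 0
    for _ in range(samples):
        x, y = random.uniform(0, side), random.uniform(0, side)
        if any((x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2 for sx, sy in sensors):
            covered += 1
    return covered / samples

print(area_coverage([(2, 2), (5, 5), (8, 8)], radius=2.0))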
81
In-Network Processing
  • communication is expensive when power and bandwidth are limited
  • perform (data) processing in the network:
  • close to (at) the data
  • forward fused/synthesized results
  • e.g., find the max of the data (sketched below)
  • distributed data, distributed computation

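A sketch of in-network aggregation for the max example above: each node fuses its children's partial results with its own reading and forwards a single value upstream, instead of shipping every raw reading to the sink. The tree shape and readings are illustrative.

# In-network max aggregation over a routing tree (sketch).
def innet_max(tree, node, readings):
    """tree     : dict parent -> list of children
    readings : dict node -> local sensor reading
    """
    partial = readings[node]
    for child in tree.get(node, []):
        partial = max(partial, innet_max(tree, child, readings))
    return partial                            # one fused value travels upstream

tree = {"sink": ["a", "b"], "a": ["c", "d"]}
readings = {"sink": 17, "a": 21, "b": 5, "c": 42, "d": 3}
print(innet_max(tree, "sink", readings))      # 42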
82
Distributed Representation and Storage
  • Data-centric protocols, in-network processing goal:
  • interpretation of spatially distributed data (per-node processing alone is not enough)
  • the network does in-network processing based on the distribution of data
  • Queries automatically directed towards nodes that maintain relevant/matching data:
  • pattern-triggered data collection
  • Multi-resolution data storage and retrieval
  • Distributed edge/feature detection
  • Index data for easy temporal and spatial searching
  • Finding global statistics (e.g., distribution)

83
Directed Diffusion: Data-Centric Routing
  • Basic idea:
  • name data (not nodes) with externally relevant attributes: data type, time, location of node, SNR, ...
  • diffuse requests and responses across the network using application-driven routing (e.g., geo-sensitive or not)
  • support in-network aggregation and processing
  • data sources publish data, data clients subscribe to data (see the sketch below):
  • however, all nodes may play both roles
  • a node that aggregates/combines/processes incoming sensor node data becomes a source of new data
  • a node that only publishes when a combination of conditions arises is a client for the triggering event data
  • a true peer-to-peer system?

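A tiny sketch of the data-centric naming idea: interests and data are attribute-value sets, and data matches an interest when the named attributes agree. The attribute names and the exact-match rule are illustrative simplifications of directed diffusion's matching.

# Attribute-based matching in the spirit of directed diffusion (sketch).
def matches(interest, data):
    """Data satisfies an interest if every attribute the interest names
    is present in the data with the same value."""
    return all(data.get(k) == v for k, v in interest.items())

interest = {"type": "four-legged animal", "region": "grid(10,20)"}
data = {"type": "four-legged animal", "region": "grid(10,20)",
        "confidence": 0.85, "timestamp": 1234}
print(matches(interest, data))   # True -> route the data along this gradient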
84
Traditional Approach: Warehousing
  • data is extracted from sensors and stored on a server
  • query processing takes place on the server

85
Sensor Database System
  • A sensor database system supports distributed query processing over a sensor network

(figure: queries distributed across the sensor nodes)
86
Sensor Database System
  • Can existing database techniques be reused? What are the new problems and solutions?
  • Representing sensor data
  • Representing sensor queries
  • Processing query fragments on sensor nodes
  • Distributing query fragments
  • Adapting to changing network conditions
  • Dealing with site and communication failures
  • Deploying and managing a sensor database system
  • Characteristics of a sensor network:
  • Streams of data
  • Uncertain data
  • Large number of nodes
  • Multi-hop network
  • No global knowledge about the network
  • Node failure and interference are common
  • Energy is the scarce resource
  • Limited memory
  • No administration, ...

87
Summary
  • P2P networks: Napster, Gnutella, Kazaa
  • Distributed Hash Tables (DHTs)
  • Database perspectives: data-centricity, data-independence
  • Sensor networks and their connection to P2P